Czech Named Entity Corpus

The newest version of the Czech Named Entity Corpus (Czech Named Entity Corpus 2.0) contains 8993 Czech sentences with 35220 manually annotated named entities, classified according to a two-level hierarchy of 46 named entity types.

Current version download: Czech Named Entity Corpus 2.0.

A detailed description of the corpus, its file formats, the two-level named entity hierarchy, and download links are available for every released version:

Work Published using CNEC




  • SVV project number 267 314 (Teoretické základy informatiky a výpočetní lingvistiky; in English: Theoretical Foundations of Computer Science and Computational Linguistics)
  • LINDAT/CLARIN (Large infrastructural grant for language resources, data access and distribution, and related research), project LM2010013 of the Ministry of Education of the Czech Republic
  • GAČR 406/12/P175 project (Vybrané derivační vztahy pro automatické zpracování češtiny; in English: Selected Derivational Relations for Automatic Processing of Czech) of the Grant Agency of the Czech Republic
  • PRVOUK P46 project