The Czech Named Entity Corpus 1.1 is a corpus of 5868 Czech sentences containing 33662 manually annotated Czech named entities, classified according to a two-level hierarchy of 62 named entity types. It is a minor update to the Czech Named Entity Corpus 1.0, the first publicly available corpus providing a large body of manually annotated named entities in Czech sentences, including a fine-grained classification. The corpus is available under the CC BY-NC-SA 3.0 license.
The named entities are classified according to a two-level hierarchy taken from Ševčíková et al. (2007). The hierarchy is the same as in CNEC 1.0.
Named entities are saved in the following formats:
The Czech Named Entity Corpus 1.1 can be downloaded from the LINDAT/CLARIN repository.
Results on the Czech Named Entity Corpus 1.1 are evaluated using the canonical script distributed with the corpus. The evaluation metric is strict span-based micro F1: a predicted entity counts as correct only if both its span and its type match the gold annotation.
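To make the metric concrete, here is a minimal Python sketch of strict span-based micro F1. It is not the canonical evaluation script and should not be used to report results. It assumes each entity is represented as a hypothetical (sentence index, start token, end token, type) tuple, and the function name and type labels are illustrative placeholders only.

```python
from typing import Iterable, Tuple

# Hypothetical entity representation (an assumption of this sketch):
# (sentence index, first token index, last token index, entity type)
Entity = Tuple[int, int, int, str]


def strict_micro_f1(gold: Iterable[Entity],
                    predicted: Iterable[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under strict matching.

    A predicted entity is a true positive only if an identical tuple
    (same sentence, same span, same type) exists in the gold set.
    Counts are pooled over all sentences and types (micro-averaging).
    """
    gold_set = set(gold)
    pred_set = set(predicted)

    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0

    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Usage with placeholder type labels "A", "B", "C":
gold = [(0, 0, 1, "A"), (0, 3, 3, "B"), (1, 2, 2, "C")]
pred = [(0, 0, 1, "A"), (0, 3, 3, "C"), (1, 2, 2, "C"), (1, 4, 5, "A")]
precision, recall, f1 = strict_micro_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this example the second prediction has the right span but the wrong type, so under strict matching it counts as both a false positive and a missed gold entity. Because entities are compared as whole tuples, overlapping or nested spans are simply treated as distinct entities.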
The differences between the Czech Named Entity Corpus 1.1 and 1.0 are the following: