The latest version of the Czech Named Entity Corpus, Czech Named Entity Corpus 2.0, consists of 8,993 Czech sentences with 35,220 manually annotated named entities. The entities are classified according to a two-level hierarchy of 46 named entity types.
Current version download: Czech Named Entity Corpus 2.0.
A detailed description of the corpus, its file formats, the two-level named entity hierarchy, and download links are available for every released version.
CNEC 1.0 Types | CNEC 1.0 Supertypes | CNEC 2.0 Types | CNEC 2.0 Supertypes | CNEC 1.0 Extended | CNEC 2.0 Extended | Publication | Code | Method |
---|---|---|---|---|---|---|---|---|
– | – | – | – | – | 86.39 | Müller, 2020 (bachelor's thesis; a rerun of Straková et al., 2019) | Straková et al., 2019 | LSTM-CRF+BERT |
86.88 | 89.91 | 86.23 | 89.37 | – | – | Straka et al., 2019 | – | Seq2seq+BERT |
86.88 | – | – | – | – | – | Straková et al., 2019 | GitHub | Seq2seq+BERT |
83.15 | 86.30 | – | – | 83.27 | 84.22 | Hluboké učení v automatické analýze českého textu [Deep Learning in Automatic Analysis of Czech Text]. In: Slovo a slovesnost, ISSN 0037-7031, vol. 80, no. 4, pp. 306–327 | – | Deep NN |
– | – | – | – | – | 81.05 | Güngör, 2018 | – | RNN+WE+CLE |
81.20 | 84.68 | 79.23 | 82.78 | 80.88 | 80.79 | Straková et al., 2016 | GitHub | RNN+WE+CLE |
– | – | – | – | 74.08 | – | Konkol et al., 2015 | – | Latent semantics |
– | – | – | – | 75.61 | – | Demir and Özgür, 2014 | – | NN+WE |
– | – | – | – | 74.23 | 74.37 | Konkol and Konopík, 2014 | – | CRF+stemming |
79.23 | 82.82 | – | – | – | – | Straková et al., 2013 | NameTag | Simple NN |
– | 79.00 | – | – | 74.08 | – | Konkol and Konopík, 2013 | – | CRF |
– | 72.94 | – | – | – | – | Konkol and Konopík, 2011 | – | Maximum entropy |
68.00 | 71.00 | – | – | – | – | Kravalová and Žabokrtský, 2009 | – | SVM |
62.00 | 68.00 | – | – | – | – | Ševčíková et al., 2007 | – | Decision trees |
Please let us know if you have a contribution to this table. Thanks!
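The Types and Supertypes columns report results at the two levels of the named entity hierarchy. The following is a minimal sketch of how such a pair of scores could be computed. It makes three assumptions: the scores are entity-level F1 over exact span and type matches, a supertype is the first character of a two-letter type tag (e.g. `gu` → `g`), and the helper names `to_supertype` and `f1` are hypothetical. The official CNEC evaluation may differ, for example in how it handles nested entities. The annotations in the example are invented for illustration.

```python
from typing import Iterable, Set, Tuple

# An entity mention: (start_token, end_token_exclusive, type_tag), e.g. (0, 1, "pf").
Entity = Tuple[int, int, str]


def to_supertype(entities: Iterable[Entity]) -> Set[Entity]:
    """Collapse fine-grained types to supertypes.

    Assumption: the supertype is the first character of the type tag
    (e.g. "gu" -> "g", "ps" -> "p").
    """
    return {(start, end, tag[0]) for start, end, tag in entities}


def f1(gold: Iterable[Entity], predicted: Iterable[Entity]) -> float:
    """Micro-averaged F1 (in percent) over exact span + type matches."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_set)
    recall = correct / len(gold_set)
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted annotations for one sentence.
gold = [(0, 1, "pf"), (1, 2, "ps"), (4, 5, "gu")]
pred = [(0, 1, "pf"), (1, 2, "pm"), (4, 5, "gc")]

print(f"types:      {f1(gold, pred):.2f}")                              # 33.33
print(f"supertypes: {f1(to_supertype(gold), to_supertype(pred)):.2f}")  # 100.00
```

In this example, two of the three predictions have the correct span but the wrong fine-grained type. They count as errors at the type level but as matches once both sides are collapsed to supertypes. This is why supertype scores in the table are consistently higher than type scores.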
Ševčíková, M., Žabokrtský, Z., Krůza, O.: Named Entities in Czech: Annotating Data and Developing NE Tagger. In: Matoušek, V., Mautner, P. (eds.) TSD 2007. LNCS (LNAI), vol. 4629, pp. 188–195. Springer, Heidelberg (2007).
@inproceedings{SevcikovaEtAl2007CNEC,
booktitle = {Lecture Notes in Artificial Intelligence, Proceedings of the 10th International Conference on Text, Speech and Dialogue},
series = {Lecture Notes in Computer Science},
title = {Named Entities in Czech: Annotating Data and Developing {NE} Tagger},
editor = {V{\'{a}}clav Matou{\v{s}}ek and Pavel Mautner},
author = {Magda {\v{S}}ev{\v{c}}{\'{\i}}kov{\'{a}} and Zden{\v{e}}k {\v{Z}}abokrtsk{\'{y}} and Old{\v{r}}ich Kr{\r{u}}za},
year = {2007},
publisher = {Springer},
address = {Berlin / Heidelberg},
volume = {4629},
number = {{XVII}},
pages = {188--195},
isbn = {978-3-540-74627-0},
issn = {0302-9743},
}