Hlava Cor
Human Label Variation in Coreference (Hlava Cor) is a collection of coreference annotations for Czech in which each case was annotated by three annotators and accompanied by their comments. Coreference annotation marks expressions that refer to the same extra-linguistic entity, concept, or situation. Given an anaphoric expression, annotators were instructed to identify a coreferential expression in the preceding context (if one exists) and to comment on their decision. The main aim of the annotation is to capture variation in how readers interpret coreference. The dataset includes both written and spoken contexts.
Hlava Cor releases
Hlava Cor 1.0 (March 2026) [URL]
Authors
Anna Nedoluzhko, Jiří Mírovský, Šárka Zikánová, Eva Hajičová, Bianca Chuffartová, Šárka Dohnalová, Lucie Hartmanová, Eliška Nodlová, Dominik Teska, Františka Zikánová
Feel free to write to us if you have any questions: Anna Nedoluzhko, Jiří Mírovský, Šárka Zikánová.
Data Description
Hlava Cor comprises 1,024 cases, each consisting of a sentence together with its adjacent and distant contexts. Each case was annotated independently by three annotators in parallel, and their decisions are documented in the "Commentary" columns.
The texts come from three sources:
- the Prague Dependency Treebank - Consolidated 1.0 (Hajič et al., 2020) (this is the main resource)
- Czech Academic Corpus 2.0 (Hladká et al., 2008)
- iRozhlas archive
Data Source and Format
After downloading the corpus from http://hdl.handle.net/11234/1-6131, the annotations can be found in the data directory.
The annotations are available in four data formats:
- Hlava_Cor.ods - Open Document Format Spreadsheet
- Hlava_Cor.xlsx - MS Excel table
- Hlava_Cor.tsv - TAB-separated values, UTF-8, no column headings
- Hlava_Cor.odb - Open Document Format Database
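Because the TSV file has no column headings, it helps to inspect its shape before relying on column positions. The following is a minimal sketch in Python, assuming the file sits at data/Hlava_Cor.tsv after download and that fields are not quoted; the helper names load_cases and column_counts are hypothetical and not part of the release. The meaning of each column is not given here, so the sketch only reports row and column counts.

```python
import csv
from pathlib import Path


def load_cases(path):
    """Read a headerless, TAB-separated UTF-8 file into a list of rows."""
    with Path(path).open(encoding="utf-8", newline="") as handle:
        # QUOTE_NONE treats quote characters as ordinary text. This is an
        # assumption about how the TSV was exported; adjust if fields are quoted.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader]


def column_counts(rows):
    """Map each observed row width to the number of rows with that width."""
    counts = {}
    for row in rows:
        counts[len(row)] = counts.get(len(row), 0) + 1
    return counts


if __name__ == "__main__":
    # Assumed location of the file inside the downloaded package.
    tsv_path = Path("data") / "Hlava_Cor.tsv"
    if tsv_path.exists():
        rows = load_cases(tsv_path)
        print("rows:", len(rows))
        print("row widths:", column_counts(rows))
```

If all rows report the same width, columns can then be addressed by position; the spreadsheet versions (.ods, .xlsx) can be consulted to see what each position holds.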
Annotation instructions (in Czech) can be found in the doc directory.
How to cite Hlava Cor
If you use the data in your research or need to cite it for any reason, please cite the dataset directly:
Anna Nedoluzhko, Jiří Mírovský, Šárka Zikánová, Eva Hajičová, Bianca Chuffartová, Šárka Dohnalová, Lucie Hartmanová, Eliška Nodlová, Dominik Teska and Františka Zikánová: Human Label Variation in Coreference. Data/software, ÚFAL MFF UK, Prague, Czech Republic, LINDAT/CLARIAH-CZ: http://hdl.handle.net/11234/1-6131, March 2026.
Acknowledgements
The development of Hlava Cor was financed by the GAČR project 24-11132S.
This work has used language resources developed, stored, or distributed by the LINDAT/CLARIAH-CZ project of the Ministry of Education of the Czech Republic (project LM2023062).
Licence
This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence.
© 2026 Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic.
References
Hajič, J., et al. 2020. Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0). Data/software, LINDAT/CLARIAH-CZ, URL: http://hdl.handle.net/11234/1-3185.
Hladká, B., et al. 2008. The Czech Academic Corpus 2.0 guide. The Prague Bulletin of Mathematical Linguistics, 89:41–96.


