Hlava Cor

Human Label Variation in Coreference (Hlava Cor) is a collection of coreference annotations in Czech, each produced independently by three annotators and accompanied by their comments. Coreference annotation links expressions that refer to the same extra-linguistic entity, concept, or situation. Given an anaphoric expression, annotators were instructed to identify a coreferential expression in the preceding context (if there is one) and to comment on their decision. The main aim of the annotation is to capture how readers differ in their interpretation of coreference. The dataset includes both written and spoken texts.

Hlava Cor releases

Hlava Cor 1.0 (March 2026) [URL]

Authors

Anna Nedoluzhko, Jiří Mírovský, Šárka Zikánová, Eva Hajičová, Bianca Chuffartová, Šárka Dohnalová, Lucie Hartmanová, Eliška Nodlová, Dominik Teska, Františka Zikánová

Feel free to write to us if you have any questions: Anna Nedoluzhko, Jiří Mírovský, or Šárka Zikánová.

Data Description

Hlava Cor comprises 1,024 cases, each consisting of a sentence together with its adjacent and more distant context. Every case was annotated independently and in parallel by three annotators, whose comments on their decisions are recorded in the "Commentary" columns.
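Because every case carries three independent decisions, one simple way to look at the variation is to count, per case, how many distinct antecedents the annotators chose, and to compute raw pairwise agreement. The sketch below is only an illustration: it assumes the three decisions have already been extracted as one label per annotator (a hypothetical antecedent ID, or None when no coreferential expression was marked). The input shape and the function name agreement_profile are assumptions, not part of the release.

```python
from itertools import combinations
from collections import Counter

def agreement_profile(cases):
    """Summarise variation among three annotators (hypothetical input shape).

    `cases` is a list of 3-tuples, one label per annotator: the chosen
    antecedent ID (assumed string) or None if no antecedent was marked.
    Returns (counts of unanimous/majority/all_differ cases, pairwise agreement rate).
    """
    kinds = {1: "unanimous", 2: "majority", 3: "all_differ"}
    profile = Counter()
    agreeing_pairs = 0
    for labels in cases:
        # Number of distinct labels: 1 = all agree, 2 = two agree, 3 = all differ
        profile[kinds[len(set(labels))]] += 1
        # Each case contributes 3 annotator pairs: (A,B), (A,C), (B,C)
        agreeing_pairs += sum(a == b for a, b in combinations(labels, 2))
    total_pairs = 3 * len(cases)
    rate = agreeing_pairs / total_pairs if cases else 0.0
    return dict(profile), rate

# Hypothetical example: one unanimous case, one 2-vs-1 case, one full disagreement
cases = [("e1", "e1", "e1"), ("e1", "e2", "e1"), ("e1", "e2", None)]
profile, rate = agreement_profile(cases)
```

Here the pairwise rate is 4/9: three agreeing pairs in the first case, one in the second, none in the third. Raw agreement ignores chance and treats "no antecedent" as an ordinary label; a chance-corrected coefficient would be a natural refinement.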

The texts come from three sources:

Data Source and Format

After you download the corpus from http://hdl.handle.net/11234/1-6131, the annotations are in the directory data.

The annotations are available in four data formats:

  • Hlava_Cor.ods  - Open Document Format Spreadsheet
  • Hlava_Cor.xlsx - MS Excel table
  • Hlava_Cor.tsv  - TAB-separated values, UTF-8, no column headings
  • Hlava_Cor.odb  - Open Document Format Database
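For scripted processing, the TSV file is the easiest entry point. Because it has no column headings, columns have to be addressed by position. The sketch below reads it with Python's standard csv module. It assumes fields are not quoted and contain no embedded tabs or newlines. If the file turns out to quote fields (for example, multi-line comments), drop quoting=csv.QUOTE_NONE. Which position holds which field is also an assumption to check against Hlava_Cor.ods or Hlava_Cor.xlsx. The function name load_hlava_cor_tsv is hypothetical.

```python
import csv
from pathlib import Path

def load_hlava_cor_tsv(path):
    """Read Hlava_Cor.tsv (UTF-8, tab-separated, no header) into a list of rows.

    Each row is a list of strings; columns are positional because the file
    has no heading row. Assumes unquoted fields (csv.QUOTE_NONE).
    """
    with Path(path).open(encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Skip completely empty lines, if any
        return [row for row in reader if row]
```

Usage, assuming the archive was unpacked into the current directory: rows = load_hlava_cor_tsv("data/Hlava_Cor.tsv"). It is worth checking len(rows) and the length of the first row before relying on any column positions.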

The annotation instructions (in Czech) are in the directory doc.

How to cite Hlava Cor

If you use the data in your research or need to cite it for any reason, please cite the dataset directly:

Anna Nedoluzhko, Jiří Mírovský, Šárka Zikánová, Eva Hajičová, Bianca Chuffartová, Šárka Dohnalová, Lucie Hartmanová, Eliška Nodlová, Dominik Teska and Františka Zikánová: Human Label Variation in Coreference. Data/software, ÚFAL MFF UK, Prague, Czech Republic, LINDAT/CLARIAH-CZ: http://hdl.handle.net/11234/1-6131, March 2026.

Acknowledgements

The development of Hlava Cor was funded by the Czech Science Foundation (GAČR), project 24-11132S.

This work has used language resources developed, stored, or distributed by the LINDAT/CLARIAH-CZ project of the Ministry of Education of the Czech Republic (project LM2023062).

Licence


This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
© 2026 Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic.

References

Hajič, J. et al. 2020. Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0).
Data/software, LINDAT-CLARIAH, URL: http://hdl.handle.net/11234/1-3185.

Hladká, B. et al. 2008. The Czech Academic Corpus 2.0 guide.
The Prague Bulletin of Mathematical Linguistics, 89:41–96.