Coreference resolution is the task of clustering together multiple mentions of the same entity appearing in a textual document (e.g. Joe Biden, the U.S. President and he). This CodaLab-powered shared task deals with multilingual coreference resolution and is associated with the CRAC 2022 Workshop (the Fifth Workshop on Computational Models of Reference, Anaphora and Coreference) held at COLING 2022.
Recently, inspired by the Universal Dependencies initiative (UD) [1], the coreference community has started discussions on establishing a universal annotation scheme and using it to harmonize existing corpora. The discussions at the CRAC 2020 workshop led to proposing the Universal Anaphora initiative. One of the lines of effort related to Universal Anaphora resulted in CorefUD, which is a multilingual collection of coreference data resources harmonized under a common scheme [2]. The public edition of CorefUD 1.0 contains 13 datasets for 10 languages, namely Catalan, Czech (2×), English (2×), French, German (2×), Hungarian, Lithuanian, Polish, Russian, and Spanish. The CRAC 2022 shared task deals with coreference resolution in all these languages.
The file format used in CorefUD 1.0 represents coreference using a bracketing notation inspired by the CoNLL-2011 and CoNLL-2012 shared tasks [3], inserted into the MISC column of CoNLL-U, the file format used in UD. The content of the other columns is fully compatible with the morphological and syntactic annotations of the UD framework (with, for instance, automatically parsed trees added to resources that lack manual syntactic annotation). Thus, shared task participants can easily employ UD-style morphosyntactic features for coreference prediction across all resources in a unified way, if they want to (pilot studies of the relation between coreference and dependency syntax can be found in [4] and [5]). CorefUD tokenization is UD-compliant, too.
The main rules of the CRAC 2022 shared task are described in the sections below (data, evaluation, and submission instructions).

The shared task is organized by the following teams:
Charles University (Prague, Czechia): Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Daniel Zeman
Polish Academy of Sciences (Warsaw, Poland): Maciej Ogrodniczuk
Georgetown University (Washington D.C., USA): Yilun Zhu
University of West Bohemia (Pilsen, Czechia): Miloslav Konopík, Ondřej Pražák, Jakub Sido
If you are interested in participating in this shared task, please fill in the registration form as soon as possible.
Technically, this registration is not connected to participants' CodaLab accounts in any way; in other words, it is possible to upload your CodaLab submissions without being registered here. However, we strongly recommend that at least one person from each participating team fill in this registration form so that we can keep you informed about all updates regarding the shared task.
In addition, you can send any questions about the shared task to the organizers via corefud@googlegroups.com.
This shared task is supported by Grant No. 20-16819X (LUSyD) of the Czech Science Foundation and Grant No. LM2018101 (LINDAT/CLARIAH-CZ) of the Ministry of Education, Youth and Sports of the Czech Republic.
The public edition of CorefUD 1.0 data is used in this shared task, both for training and evaluation purposes. CorefUD 1.0 is a collection of previously existing datasets annotated with coreference, converted into a common annotation scheme. The datasets are enriched with automatic morphological and syntactic annotations that are fully compliant with the standards of the Universal Dependencies project. All the datasets are stored in the CoNLL-U format, with coreference-specific information captured in the MISC column.
The public edition of CorefUD 1.0 contains 13 datasets for 10 languages, labeled as follows: ca_ancora, cs_pcedt, cs_pdt, de_parcorfull, de_potsdamcc, en_gum, en_parcorfull, es_ancora, fr_democrat, hu_szegedkoref, lt_lcc, pl_pcc, and ru_rucor.
(There is also a non-public edition of CorefUD 1.0 containing 4 more datasets; however, they cannot be used for the purposes of this shared task because of their license limitations.)
The full specification of the CoNLL-U format is available at the website of Universal Dependencies. In a nutshell: every token has its own line; lines starting with # are sentence-level comments, and empty lines terminate a sentence. Regular token lines start with an integer number. There are also lines starting with intervals (e.g. 4-5), which introduce what UD calls “multi-word tokens”; these lines must be preserved in the output, but otherwise participants do not have to care about them (coreference annotation does not occur on them). Finally, there are also lines starting with decimal numbers (e.g. 2.1), which correspond to empty nodes in the dependency graph; these nodes may represent zero mentions and may contain coreference annotation.

Every token/node line contains 10 tab-separated fields (columns). The first column is the numeric ID of the token/node, the next column contains the word FORM; any coreference annotation, if present, appears in the last column, which is called MISC. The file must use Linux-style line breaks, i.e. a single LF character, rather than the CR LF sequence common on Windows.
The MISC column is either a single underscore (_), meaning there is no extra annotation, or one or more pieces of annotation (typically in the Attribute=Value form), separated by vertical bars (|). The annotation pieces relevant for this shared task always start with Entity=; these should be learned from the training data and predicted for the test data. Any other annotation present in the MISC column of the input file should be preserved in the output (especially note that if you discard a SpaceAfter=No, or introduce a new one, the validator may report the file as invalid).

For more information on the Entity attribute, see the PDF with the description of the CorefUD 1.0 format.
Example:
# global.Entity = eid-etype-head-minspan-infstat-link-identity
# sent_id = GUM_academic_art-3
# text = Claire Bailey-Ross xxx@port.ac.uk University of Portsmouth, United Kingdom
1 Claire Claire PROPN NNP Number=Sing 0 root 0:root Entity=(e5-person-1-1,2,4-new-coref|Discourse=attribution:3->57:7
2 Bailey Bailey PROPN NNP Number=Sing 1 flat 1:flat SpaceAfter=No
3 - - PUNCT HYPH _ 4 punct 4:punct SpaceAfter=No
4 Ross Ross PROPN NNP Number=Sing 2 flat 2:flat Entity=e5)
5 xxx@port.ac.uk xxx@port.ac.uk PROPN NNP Number=Sing 1 list 1:list Entity=(e6-abstract-1-1-new-sgl)
6 University university NOUN NNP Number=Sing 1 list 1:list Entity=(e7-organization-1-3,5,6-new-sgl-University_of_Portsmouth
7 of of ADP IN _ 8 case 8:case _
8 Portsmouth Portsmouth PROPN NNP Number=Sing 6 nmod 6:nmod:of Entity=(e8-place-1-3,4-new-sgl-Portsmouth|SpaceAfter=No
9 , , PUNCT , _ 11 punct 11:punct _
10 United unite VERB NNP Tense=Past|VerbForm=Part 11 amod 11:amod Entity=(e9-place-2-1,2-new-coref-United_Kingdom
11 Kingdom kingdom NOUN NNP Number=Sing 1 list 1:list Entity=e9)e8)e7)
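For instance, the Entity annotations can be pulled out of a CoNLL-U file along the following lines (a bare-bones sketch for illustration only; for serious work, a full reader such as Udapi, introduced below, is preferable):

def entity_annotations(conllu_path):
    # Yield (token_id, form, entity_value) for every token/node
    # whose MISC column carries an Entity= annotation piece.
    with open(conllu_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip empty lines and sentence-level comments
            cols = line.split("\t")  # 10 tab-separated fields
            for piece in cols[9].split("|"):  # MISC is the last column
                if piece.startswith("Entity="):
                    yield cols[0], cols[1], piece[len("Entity="):]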
Each CorefUD dataset is divided into a training section, a development section, and a test section (train/dev/test for short). Technically, each CorefUD dataset consists of three CoNLL-U files containing disjoint sets of documents; boundaries between the three sections can be placed only on document boundaries.
Training and development files containing gold coreference annotations are identical to the CoNLL-U files available in CorefUD 1.0 (the link leads to the LINDAT/CLARIAH-CZ repository from which the data can be downloaded). In addition, the development set with gold coreference annotation stripped off, which might be useful for development purposes, is available for download.
Test sets without gold coreference annotations were made available to participants at the beginning of the evaluation phase. Test data with gold coreference annotations will be used internally in CodaLab for evaluation of submissions.
Submissions of all participants on the dev set were published after the shared task.
The official scorer for the shared task is corefud-scorer.py. Run the following command to calculate the primary score (CoNLL score) that will be used to rank the submissions (KEY_FILE is the file with gold annotations, RESPONSE_FILE is the file with your predictions):
python corefud-scorer.py KEY_FILE RESPONSE_FILE
The main evaluation metric for the task is the CoNLL score, which is an unweighted average of the F1 values of MUC, B-cubed, and CEAFe scores. To encourage the participants to develop multilingual systems, the primary ranking score will be computed by macro-averaging CoNLL F1 scores over all datasets.
For the same reason, singletons (entities with a single mention) will not be taken into account in the calculation of the primary score, as many of the datasets do not have singletons annotated.
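For illustration, the primary ranking score can thus be computed as follows (a minimal sketch; the function names are ours, not the official scorer's):

def conll_f1(muc_f1, b3_f1, ceafe_f1):
    # CoNLL score: unweighted average of the three F1 values
    return (muc_f1 + b3_f1 + ceafe_f1) / 3

def primary_score(conll_f1_per_dataset):
    # macro-average of the CoNLL F1 scores over all 13 datasets
    return sum(conll_f1_per_dataset) / len(conll_f1_per_dataset)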
Since mention heads are annotated in CorefUD 1.0, we can perform partial matching of mentions during evaluation. A response (predicted) mention matches a key (gold) mention partially if all words of the response mention also belong to the key mention and, at the same time, one of these words is the key mention's head. The primary score will be computed using this partial matching (i.e. exact matching will not be used).
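In other words, the criterion can be sketched as follows (our illustration, not the official scorer's code; words are assumed to be represented by their positions in the document):

def matches_partially(response_words, key_words, key_head):
    # response_words, key_words: collections of word positions of the two mentions;
    # key_head: the position of the key mention's annotated head word
    response = set(response_words)
    return response <= set(key_words) and key_head in response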
Although some of the datasets also comprise annotation of split antecedents, bridging and other anaphoric relations, these are not going to be evaluated.
Besides the primary ranking, the overview paper on the shared task will also introduce multiple secondary rankings, e.g. by CoNLL score for individual languages.
In a typical case, shared task participants should proceed as follows.
In the development phase (Phase 1), participants upload their predictions for the development sets to CodaLab.
The zip file uploaded to CodaLab must contain the following 13 files, without any other files or subdirectories (a packaging sketch follows the list):
ca_ancora-corefud-dev.conllu
cs_pcedt-corefud-dev.conllu
cs_pdt-corefud-dev.conllu
de_parcorfull-corefud-dev.conllu
de_potsdamcc-corefud-dev.conllu
en_gum-corefud-dev.conllu
en_parcorfull-corefud-dev.conllu
es_ancora-corefud-dev.conllu
fr_democrat-corefud-dev.conllu
hu_szegedkoref-corefud-dev.conllu
lt_lcc-corefud-dev.conllu
pl_pcc-corefud-dev.conllu
ru_rucor-corefud-dev.conllu
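One possible way to package such a submission, using only Python's standard library (a sketch; the archive name submission.zip is just an example):

import glob
import zipfile

# pack all 13 dev-set prediction files, flat (no subdirectories), into one archive
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in sorted(glob.glob("*-corefud-dev.conllu")):
        zf.write(path)  # a bare file name, so it is stored without any directory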
In the evaluation phase (Phase 2), participants upload their predictions for the test sets, which are released without gold coreference annotation at the beginning of the phase.
Let us emphasize that even though multiple submissions are possible within the evaluation phase, their number is limited; they are allowed mainly for resolving unexpected situations and definitely must not be used for systematic optimization of parameters or hyperparameters of your model against the scores shown by CodaLab.
Participants who have developed multiple coreference prediction systems are encouraged to submit their predictions separately, up to 3 systems per team, as long as the systems differ in some interesting way (e.g. different architectures, not just different hyperparameter settings). In order to submit an additional system, please create an additional team account at CodaLab.
Many things can go wrong when filling the predicted coreference annotation into the CoNLL-U format (incorrect syntax in the MISC column, unmatched brackets, etc.). It is highly recommended to always check validity before submitting the files, so that you do not run out of the maximum daily (2) and total (10) submission attempts specified for the shared task.
That said, even files that do not pass the validation tests will be considered for the evaluation and will contribute to the final score (provided the evaluation script does not fail on them).
There are two basic requirements for each submitted CoNLL-U file:
The official UD validator will be used to check the validity of the CoNLL-U format. Anyone can obtain it by cloning the UD tools repository from GitHub and running the script validate.py. Python 3 is needed to run the script (depending on your system, it may be available under the command python or python3; if in doubt, try python -V to see the version).
$ git clone git@github.com:UniversalDependencies/tools.git
$ cd tools
$ python3 validate.py -h
In addition, a third-party module called regex must be installed via pip. Try this if you do not have the module already:
$ sudo apt-get install python3-pip; python3 -m pip install regex
The validation script distinguishes several levels of validity; level 2 is sufficient for the shared task, as the higher levels deal with morphosyntactic requirements on UD-released treebanks. On the other hand, we will use the --coref option to turn on tests specific to coreference annotation. The validator also requires the option --lang xx, where xx is the ISO language code of the dataset.
$ python3 validate.py --level 2 --coref --lang cs cs_pdt-corefud-test.conllu
*** PASSED ***
If there are errors, the script will print messages describing the location and the nature of each error, it will print *** FAILED *** with (number of) errors, and it will return a non-zero exit value. If the file is OK, the script will print *** PASSED *** and return zero as its exit value. The script may also print warning messages that point to potential problems in the file; these are not considered errors and will not make the file invalid.
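To validate all files of a submission in one go, a simple loop may help (our sketch; it assumes validate.py and the data files sit in the current directory, and it derives the language code from each file name):

import glob
import subprocess
import sys

failed = []
for path in sorted(glob.glob("*-corefud-dev.conllu")):
    lang = path.split("_")[0]  # e.g. "cs" from "cs_pdt-corefud-dev.conllu"
    cmd = [sys.executable, "validate.py", "--level", "2", "--coref", "--lang", lang, path]
    if subprocess.run(cmd).returncode != 0:  # non-zero exit value means errors
        failed.append(path)
print("Invalid files:", ", ".join(failed) if failed else "none")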
The baseline system is based on the multilingual coreference resolution system presented in [7]. The model uses multilingual BERT in an end-to-end setting: put simply, it considers all potential spans and, for each span, maximizes the probability of its gold antecedents. The same system is used for all the languages. More details can be found in [7].
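For illustration, such end-to-end systems are typically trained with the marginal log-likelihood of gold antecedents; a schematic sketch of this objective (our code, assuming PyTorch, not the baseline's actual implementation):

import torch

def antecedent_loss(scores, gold_mask):
    # scores:    [num_spans, num_candidates + 1] antecedent scores, where
    #            column 0 is a dummy antecedent (the span starts a new entity
    #            or is not a mention at all)
    # gold_mask: boolean tensor of the same shape; True marks gold antecedents
    #            (the dummy is marked True for spans with no gold antecedent)
    log_probs = torch.log_softmax(scores, dim=-1)
    # log of the total probability mass assigned to gold antecedents, per span
    marginal = torch.logsumexp(log_probs.masked_fill(~gold_mask, float("-inf")), dim=-1)
    return -marginal.mean()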
The simplified system, adapted to CorefUD 1.0, is publicly available on GitHub, along with tagged dev data and its dev-data results.
Files with coreference predicted by the baseline system can be downloaded directly as zip files (dev set and test set), so you do not have to run the baseline system yourself and can instead just try to improve its outputs. The structure of the zip files is identical to what the CodaLab submission system expects. The files with baseline predictions were post-processed by Udapi to make them pass the pre-submission validation tests:

udapy -s read.Conllu split_docs=1 corefud.MergeSameSpan corefud.IndexClusters < orig.conllu > fixed.conllu
Udapi is a Python API for reading, writing, querying and editing Universal Dependencies data in the CoNLL-U format (and several other formats). It has recently gained support for coreference annotations (and it was used for producing CorefUD). Even if you decide not to build your system by extending the baseline system, you can use Udapi to access the CorefUD data in a comfortable way.
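For example, iterating over the coreference annotation may look roughly like this (a sketch assuming the current Udapi coreference API; please consult the Udapi documentation for details):

from udapi.core.document import Document

doc = Document("en_gum-corefud-dev.conllu")   # load a CorefUD file
for entity in doc.coref_entities:             # one entity = one coreference cluster
    for mention in entity.mentions:
        print(entity.eid, [w.form for w in mention.words])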
@misc{11234/1-4698,
  title  = {Coreference in Universal Dependencies 1.0 ({CorefUD} 1.0)},
  author = {Nedoluzhko, Anna and Nov{\'a}k, Michal and Popel, Martin and {\v Z}abokrtsk{\'y}, Zden{\v e}k and Zeldes, Amir and Zeman, Daniel and Bourgonje, Peter and Cinkov{\'a}, Silvie and Haji{\v c}, Jan and Hardmeier, Christian and Krielke, Pauline and Landragin, Fr{\'e}d{\'e}ric and Lapshinova-Koltunski, Ekaterina and Mart{\'{\i}}, M. Ant{\`o}nia and Mikulov{\'a}, Marie and Ogrodniczuk, Maciej and Recasens, Marta and Stede, Manfred and Straka, Milan and Toldova, Svetlana and Vincze, Veronika and {\v Z}itkus, Voldemaras},
  url    = {http://hdl.handle.net/11234/1-4698},
  note   = {{LINDAT}/{CLARIAH}-{CZ} digital library at the Institute of Formal and Applied Linguistics ({{\'U}FAL}), Faculty of Mathematics and Physics, Charles University},
  year   = {2022}
}

@misc{11234/1-3510,
  title     = {Coreference in Universal Dependencies 0.1 ({CorefUD} 0.1)},
  author    = {Nedoluzhko, Anna and Nov{\'a}k, Michal and Popel, Martin and {\v Z}abokrtsk{\'y}, Zden{\v e}k and Zeman, Daniel},
  url       = {http://hdl.handle.net/11234/1-3510},
  note      = {{LINDAT}/{CLARIAH}-{CZ} digital library at the Institute of Formal and Applied Linguistics ({{\'U}FAL}), Faculty of Mathematics and Physics, Charles University},
  copyright = {License {CorefUD} v0.1},
  year      = {2021}
}