CoNLL 2018 Shared Task

Tags:

Annotations, Machine Learning, Morphology, Multilingual, Parsers, Tools

A CoNLL 2018 shared task.

The proposed task is a follow-up to the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. We first summarize the aspects that will be new in 2018; then we provide a more detailed description of the shared task for readers who are not familiar with the 2017 task.

There will be three main evaluation metrics. None of them is more important than the others and we will not combine them into a single ranking. Participants who want to decrease task complexity may concentrate on improvements in just one metric; however, all participating systems will be evaluated with all three metrics, and participants are strongly encouraged to output all relevant annotation (syntax + morphology), even if they just copy values predicted by the baseline model.

The three metrics are described in more detail on the shared task website. All three include word segmentation and labeled dependency relations. One of them is identical to the 2017 main metric so that results can be compared. The other two metrics focus on content words and include morphological features and lemmatization, respectively.
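To illustrate how a metric can cover both word segmentation and labeled dependency relations, here is a minimal sketch of a labeled-attachment F1 score. It is a simplification under stated assumptions, not the official evaluation script: the `Word` type, the `las_f1` function, and the alignment of words by exact character span are all hypothetical choices made for this example, and multiword tokens are ignored.

```python
from typing import List, NamedTuple, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets in the raw text


class Word(NamedTuple):
    """Hypothetical word record used only for this sketch."""
    span: Span             # where the word sits in the raw text
    head: Optional[Span]   # span of the head word, None for the root
    deprel: str            # dependency relation label


def las_f1(gold: List[Word], system: List[Word]) -> float:
    """Labeled-attachment F1 over words aligned by identical span.

    A system word counts as correct only if a gold word has the same
    span (segmentation), the same head span, and the same relation label.
    """
    gold_by_span = {w.span: w for w in gold}
    correct = 0
    for w in system:
        g = gold_by_span.get(w.span)
        if g is not None and g.head == w.head and g.deprel == w.deprel:
            correct += 1
    if correct == 0:
        return 0.0
    precision = correct / len(system)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


# Raw text "The dog barks": three gold words.
gold = [
    Word((0, 3), (4, 7), "det"),
    Word((4, 7), (8, 13), "nsubj"),
    Word((8, 13), None, "root"),
]

# Same segmentation, but one wrong relation label.
wrong_label = [
    Word((0, 3), (4, 7), "amod"),
    Word((4, 7), (8, 13), "nsubj"),
    Word((8, 13), None, "root"),
]

# Segmentation error: "The dog" predicted as a single word.
wrong_segmentation = [
    Word((0, 7), (8, 13), "nsubj"),
    Word((8, 13), None, "root"),
]

print(las_f1(gold, wrong_label))
print(las_f1(gold, wrong_segmentation))
```

Because precision is computed over system words and recall over gold words, a segmentation error lowers the score even when the remaining attachments are right, which is why parsing from raw text is scored with F1 rather than plain accuracy.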

Instead of surprise languages, there will be a category of low-resource languages that have little or no training data. The names of these languages, as well as whatever sample data may be available, will be announced in advance rather than kept as a surprise.

There will be new languages that were not part of the 2017 evaluation (Afrikaans and Serbian already satisfy the requirements; others may be available when the training data is released).

Organizing committee: Daniel Zeman (chair, ÚFAL), Jan Hajič (ÚFAL), Joakim Nivre (Uppsala University), Filip Ginter (University of Turku), Slav Petrov (Google), Milan Straka (ÚFAL), Martin Popel (ÚFAL).

For more details see the main website of the shared task at http://universaldependencies.org/conll18/.
