Ten years ago, two CoNLL shared tasks were a major milestone for parsing research in general and dependency parsing in particular. For the first time, dependency treebanks in more than ten languages were available for learning parsers; many of them were used in follow-up work; evaluating parsers on multiple languages became standard; and multiple state-of-the-art, open-source parsers became available, facilitating the production of dependency structures for use in downstream applications. While the 2006 and 2007 tasks were extremely important in setting the scene for the following years, they also had limitations that complicated the application of their results: (1) gold-standard tokenization and tags in the test data moved the tasks away from real-world scenarios, and (2) incompatible annotation schemes made cross-linguistic comparison impossible. CoNLL 2017 will pick up the threads of the pioneering tasks and address the two issues just mentioned.
The focus of the 2017 task is learning syntactic dependency parsers that work in a real-world setting, starting from raw text, and that work across many typologically different languages, including surprise languages for which there is little or no training data, by exploiting a common syntactic annotation standard. This task has been made possible by the Universal Dependencies (UD) initiative, which has developed treebanks for 40+ languages with cross-linguistically consistent annotation and recoverability of the original raw texts. The shared task will use the annotation scheme Universal Dependencies version 2 (UD v2 for short).
Organizing committee: Jan Hajič (chair, ÚFAL), Daniel Zeman (ÚFAL), Joakim Nivre (Uppsala University), Filip Ginter (University of Turku), Slav Petrov (Google), Milan Straka (ÚFAL), Martin Popel (ÚFAL).
For more details, see the main website of the shared task at http://universaldependencies.org/conll17/.
@inproceedings{biblio6072989545437349713,
  author = {Daniel Zeman and Martin Popel and Milan Straka and Jan Hajič and Joakim Nivre and Filip Ginter and Juhani Luotolahti and Sampo Pyysalo and Slav Petrov and Martin Potthast and Francis Tyers and Elena Badmaeva and Memduh Gökırmak and Anna Nedoluzhko and Silvie Cinková and Jan Hajič, jr. and Jaroslava Hlaváčová and Václava Kettnerová and Zdeňka Urešová and Jenna Kanerva and Stina Ojala and Anna Missilä and Christopher Manning and Sebastian Schuster and Siva Reddy and Dima Taji and Nizar Habash and Herman Leung and Marie-Catherine de Marneffe and Manuela Sanguinetti and Maria Simi and Hiroshi Kanayama and Valeria de Paiva and Kira Droganova and Héctor Martínez Alonso and Çağrı Çöltekin and Umut Sulubacak and Hans Uszkoreit and Vivien Macketanz and Aljoscha Burchardt and Kim Harris and Katrin Marheinecke and Georg Rehm and Tolga Kayadelen and Mohammed Attia and Ali Elkahky and Zhuoran Yu and Emily Pitler and Saran Lertpradit and Michael Mandl and Jesse Kirchner and Hector Fernandez Alcalde and Jana Strnadová and Esha Banerjee and Ruli Manurung and Antonio Stella and Atsuko Shimada and Sookyoung Kwak and Gustavo Mendonça and Tatiana Lando and Rattima Nitisaroj and Josie Li},
  year = 2017,
  title = {CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies},
  booktitle = {Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies},
  pages = {1--19},
  publisher = {Association for Computational Linguistics},
  address = {Stroudsburg, PA, USA},
  isbn = {978-1-945626-70-8},
}