Deep Universal Dependencies

Guidelines

Universal Dependencies (UD, Nivre et al. 2016) is an international initiative that strives 1) to define a cross-linguistically applicable annotation scheme for the morphology and surface dependency syntax of natural languages, and 2) to provide annotated data in as many languages as possible. One limitation of UD is that it stops at the level of surface syntax and does not annotate aspects of deep syntax and semantics that are important for natural language understanding, such as predicate-argument structure, ellipsis, and coreference. There seems to be some demand in the research community for a “deep” extension of UD, but no such initiative is currently running, at least not at a level of multilinguality comparable to UD. The main obstacle is that annotating data at a deep-syntactic or semantic level is even more costly than surface-syntactic and morphological annotation. The research topic is therefore to explore how, and to what extent, a deep-syntactic representation can be acquired automatically or semi-automatically for many different languages. At least three lines of research come to mind: 1) For some languages (Czech, English, Finnish), corpora with various kinds of deep-syntactic representation already exist; convert them to a common annotation style. 2) Explore ways of deducing approximate deep annotation from existing surface annotation (UD). 3) Explore ways of transferring deep models from resource-rich languages to related resource-poor languages.
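To make the second line of research more concrete, the following is a minimal sketch of how approximate predicate-argument structure might be deduced from a surface UD tree. It is an illustration under stated assumptions, not a method proposed by this project: the labels arg1 and arg2, the mapping table SURFACE_TO_DEEP, and the helper functions parse_conllu and deep_arguments are all hypothetical names chosen for this example. Only the CoNLL-U column layout and the relation labels nsubj, obj, nsubj:pass and obl:agent come from UD itself.

```python
from collections import defaultdict

# Hypothetical mapping from surface UD relations to approximate deep
# argument labels. The labels arg1/arg2 are assumptions for this sketch.
SURFACE_TO_DEEP = {
    "nsubj": "arg1",       # active subject
    "obj": "arg2",         # direct object
    "nsubj:pass": "arg2",  # passive subject corresponds to the deep object
    "obl:agent": "arg1",   # agent phrase of a passive clause
}

# "The book was read by Mary" in CoNLL-U (10 tab-separated columns).
CONLLU = (
    "1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_\n"
    "2\tbook\tbook\tNOUN\t_\t_\t4\tnsubj:pass\t_\t_\n"
    "3\twas\tbe\tAUX\t_\t_\t4\taux:pass\t_\t_\n"
    "4\tread\tread\tVERB\t_\t_\t0\troot\t_\t_\n"
    "5\tby\tby\tADP\t_\t_\t6\tcase\t_\t_\n"
    "6\tMary\tMary\tPROPN\t_\t_\t4\tobl:agent\t_\t_\n"
)


def parse_conllu(text):
    """Read one CoNLL-U sentence into a list of token dictionaries."""
    tokens = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword token ranges (1-2) and empty nodes (1.1)
        tokens.append({
            "id": int(cols[0]),
            "lemma": cols[2],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    return tokens


def deep_arguments(tokens):
    """Collect approximate deep arguments for each predicate lemma.

    Simplification: predicates are keyed by lemma, so two predicates with
    the same lemma in one sentence would be merged.
    """
    by_id = {t["id"]: t for t in tokens}
    frames = defaultdict(dict)
    for t in tokens:
        label = SURFACE_TO_DEEP.get(t["deprel"])
        if label is None or t["head"] == 0:
            continue
        predicate = by_id[t["head"]]["lemma"]
        frames[predicate][label] = t["lemma"]
    return dict(frames)


frames = deep_arguments(parse_conllu(CONLLU))
print(frames)
```

In this sketch the passive clause yields the same frame that its active counterpart ("Mary read the book") would: read has Mary as arg1 and book as arg2. A realistic approach would need to go well beyond such a lookup table, for example to handle control, coordination, and elided arguments.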

References

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, Reut Tsarfaty, Daniel Zeman (2016): Universal Dependencies v1: A Multilingual Treebank Collection. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pp. 1659–1666, European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1.