UDPipe 2 is a Python prototype capable of performing tagging, lemmatization and syntactic analysis of CoNLL-U input. It took part in several competitions, reaching excellent results in all of them.
Compared to UDPipe 1, it is Python-only, it does not perform tokenization, and its models require more computational power.
The UDPipe 2 models are currently available from the LINDAT UDPipe REST Service. Apart from the web interface, you can use the following Python client script to process your files. TODO
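Until a client script is published, a request to the REST service can be sketched with the Python standard library alone. This is a minimal illustration, not the official client. It assumes that the service endpoint is `https://lindat.mff.cuni.cz/services/udpipe/api/process`, that it accepts the form fields `data`, `input`, `tagger`, `parser` and `model`, and that it replies with a JSON object whose `result` field holds the processed CoNLL-U. Check these assumptions against the service documentation. The helper names `build_request`, `extract_result` and `process` are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint of the LINDAT UDPipe REST service (verify in the service docs).
API_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_request(conllu, model=None, url=API_URL):
    """Build a POST request asking the service to tag and parse CoNLL-U input.

    The field names below are assumptions about the REST API.
    """
    params = {
        "data": conllu,     # the CoNLL-U text to process
        "input": "conllu",  # input is already tokenized CoNLL-U
        "tagger": "",       # empty value: run tagging and lemmatization
        "parser": "",       # empty value: run dependency parsing
    }
    if model is not None:
        params["model"] = model  # otherwise the service picks its default model
    body = urllib.parse.urlencode(params).encode("utf-8")
    # Supplying a request body makes urllib send the request as POST.
    return urllib.request.Request(url, data=body)


def extract_result(response_body):
    """Return the processed CoNLL-U from the (assumed) JSON response."""
    return json.loads(response_body)["result"]


def process(conllu, model=None):
    """Send CoNLL-U text to the service and return the analysed CoNLL-U."""
    with urllib.request.urlopen(build_request(conllu, model)) as response:
        return extract_result(response.read().decode("utf-8"))
```

Calling `process(text)` with the contents of a CoNLL-U file would then return the analysed CoNLL-U as a string. The optional `model` argument should be one of the model names listed by the service.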
The available models are described on a separate page.
This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic, Project No. LM2018101 LINDAT/CLARIAH-CZ.
@InProceedings{straka-2018-udpipe,
  title = "{UDP}ipe 2.0 Prototype at {C}o{NLL} 2018 {UD} Shared Task",
  author = "Straka, Milan",
  booktitle = "Proceedings of the {C}o{NLL} 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies",
  month = oct,
  year = "2018",
  address = "Brussels, Belgium",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/K18-2020",
  doi = "10.18653/v1/K18-2020",
  pages = "197--207",
}