Status: released
OS: Linux, Windows, OS X

UDPipe

We are currently preparing the UDPipe 1.1 release; it should be available shortly.

1. Introduction

UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files. UDPipe is language-agnostic and can be trained given only annotated data in CoNLL-U format. Trained models are provided for nearly all UD treebanks. UDPipe is available as a binary, as a library for C++, Python, Perl, Java and C#, and as a web service.
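To make the CoNLL-U format mentioned above more concrete, the following is a minimal sketch in Python of reading CoNLL-U text into token records. It does not use UDPipe or its bindings; the function name `parse_conllu`, the `FIELDS` list and the sample sentence are illustrative assumptions, and the sample is hand-written rather than real UDPipe output. The sketch also treats multiword-token and empty-node lines like ordinary token lines, which a full reader would need to handle separately.

```python
from typing import Dict, List

# The ten tab-separated CoNLL-U columns, in order.
FIELDS = ["id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            # Sentence-level comment (e.g. "# text = ..."); skipped here.
            continue
        else:
            current.append(dict(zip(FIELDS, line.split("\t"))))
    if current:
        # Handle input that lacks the final blank line.
        sentences.append(current)
    return sentences


# Hand-written illustrative sample, not real UDPipe output.
SAMPLE = (
    "# text = Dogs bark.\n"
    "1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)

for token in parse_conllu(SAMPLE)[0]:
    print(token["form"], token["lemma"], token["upos"],
          token["head"], token["deprel"])
```

Each record maps a column name to its string value, so the tagger's output (lemma, upos, feats) and the parser's output (head, deprel) can be read from the same token.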

UDPipe is free software under Mozilla Public License 2.0. The linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions. UDPipe is versioned using Semantic Versioning.

Copyright 2016 by Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic.

2. Online

2.1. Online Demo

LINDAT/CLARIN hosts the UDPipe Online Demo.

2.2. Web Service

LINDAT/CLARIN also hosts the UDPipe Web Service.

3. Release

3.1. Download

UDPipe releases are available on GitHub, either as a pre-compiled binary package or as source code only. The binary package contains Linux, Windows and OS X binaries, the Java bindings binary, the C# bindings binary, and the source code of UDPipe and all language bindings. While the binary packages do not contain compiled Python or Perl bindings, packages for those languages are available in the standard package repositories, i.e. on PyPI and CPAN.

3.1.1. Language Models

To use UDPipe, a language model is needed. The language models are available from the LINDAT/CLARIN infrastructure and are described further in the UDPipe User's Manual. Currently the following language models are available:

3.2. License

UDPipe is an open-source project and is freely available for non-commercial purposes. The library is distributed under Mozilla Public License 2.0 and the associated models and data under CC BY-NC-SA, although for some models the original data used to create the model may impose additional licensing conditions.

If you use this tool for scientific work, please give credit to us by referencing Straka et al. 2016 and the UDPipe website.

3.3. Platforms and Requirements

UDPipe is available as a standalone tool and as a library for Linux, Windows and OS X. It does not require any additional libraries. Like any supervised machine learning tool, it needs trained linguistic models.

4. UDPipe Installation

UDPipe Installation is described on a separate page.

5. UDPipe User's Manual

The UDPipe User's Manual is on a separate page.

6. UDPipe API Reference

The UDPipe API Reference is on a separate page.

7. Contact

Authors:

UDPipe website.

UDPipe LINDAT/CLARIN entry.

8. Acknowledgements

This work has been using language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education of the Czech Republic (project LM2010013).

Acknowledgements for individual language models are listed in the UDPipe User's Manual.

8.1. Publications

  • (Straka et al. 2016) Straka Milan, Hajič Jan, Straková Jana. UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, May 2016.

8.2. Bibtex for Referencing

@InProceedings{udpipe:2016,
  author    = {Straka, Milan and Haji\v{c}, Jan and Strakov\'{a}, Jana},
  title     = {{UDPipe:} Trainable Pipeline for Processing {CoNLL-U} Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing},
  booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)},
  year      = {2016},
  month     = {May},
  date      = {23-28},
  location  = {Portoro\v{z}, Slovenia},
  publisher = {European Language Resources Association (ELRA)},
  address   = {Paris, France},
  isbn      = {978-2-9517408-9-1}
}

8.3. Persistent Identifier

If you prefer to reference UDPipe by a persistent identifier (PID), you can use http://hdl.handle.net/11234/1-1702.
