Status:

released

OS:

Linux, Windows, OS X

License:

MPL 2.0

Tags:

Morphology, Parsers, Taggers, Tools

UDPipe 1

Be aware that this page describes UDPipe 1. You might also be interested in visiting the UDPipe 2 page.

1. Introduction
2. Online Web Application and Web Service
3. Release
   - 3.1. Download
     - 3.1.1. Available Models
   - 3.2. License
4. UDPipe Installation
5. UDPipe Models
6. UDPipe User's Manual
7. UDPipe API Reference
8. Contact
9. Acknowledgements

1. Introduction

UDPipe is a trainable pipeline for tokenization, tagging, lemmatization and dependency parsing of CoNLL-U files. UDPipe is language-agnostic and can be trained on annotated data in CoNLL-U format. Trained models are provided for nearly all UD treebanks. UDPipe is available as a binary for Linux, Windows and OS X, as a library for C++, Python, Perl, Java and C#, and as a web service. A third-party R package is also available on CRAN.
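The CoNLL-U format that UDPipe reads and writes is line-based: comment lines start with #, sentences are separated by blank lines, and each word line has ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC). The following is a minimal reader written only to illustrate the format; it is a sketch, not part of UDPipe, and the function name read_conllu is hypothetical. For simplicity it skips multiword-token ranges (IDs like 1-2) and empty nodes (IDs like 5.1).

```python
from typing import Dict, List

# The ten tab-separated CoNLL-U columns, in order.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]


def read_conllu(text: str) -> List[List[Dict[str, str]]]:
    """Split CoNLL-U text into sentences, each a list of word dictionaries.

    Hypothetical illustration helper: comment lines are ignored, and
    multiword-token ranges and empty nodes are skipped.
    """
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line terminates the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments such as "# text = ..." are ignored here.
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 columns, got {len(columns)}: {line!r}")
        if "-" in columns[0] or "." in columns[0]:
            # Multiword-token range (e.g. "1-2") or empty node (e.g. "5.1").
            continue
        current.append(dict(zip(CONLLU_FIELDS, columns)))
    if current:
        # The last sentence may lack a trailing blank line.
        sentences.append(current)
    return sentences
```

Each word is kept as a dictionary of strings; the HEAD column (the ID of the governing word, 0 for the root) could be converted to an integer if tree operations are needed.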

UDPipe is free software distributed under the Mozilla Public License 2.0. The linguistic models are free for non-commercial use and distributed under the CC BY-NC-SA license, although for some models the original data used to create the model may impose additional licensing conditions. UDPipe is versioned using Semantic Versioning.

2. Online Web Application and Web Service

The UDPipe web application is available at http://lindat.mff.cuni.cz/services/udpipe/, hosted on the LINDAT/CLARIN infrastructure.

A UDPipe REST web service is also available; its API documentation is at http://lindat.mff.cuni.cz/services/udpipe/api-reference.php.
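As a rough sketch of calling the REST web service from Python using only the standard library: the endpoint URL below, the parameter names (data, model, tokenizer, tagger, parser), the convention that an empty value enables a component with its default options, and the JSON "result" field are assumptions based on the API reference linked above and should be checked against it. The helper names build_process_params and process are hypothetical.

```python
import json
from typing import Dict, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the REST "process" method; verify against the API reference.
PROCESS_URL = "https://lindat.mff.cuni.cz/services/udpipe/api/process"


def build_process_params(text: str, model: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical helper: parameters asking the service to tokenize, tag and parse."""
    params: Dict[str, str] = {"data": text}
    if model is not None:
        # Without a model, the service is assumed to pick its default model.
        params["model"] = model
    # Assumption: an empty value enables a component with its default options.
    for component in ("tokenizer", "tagger", "parser"):
        params[component] = ""
    return params


def process(text: str, model: Optional[str] = None, timeout: float = 30.0) -> str:
    """Hypothetical helper: send a GET request and return the CoNLL-U output.

    Requires network access; the response is assumed to be JSON whose
    "result" field holds the processed text.
    """
    url = PROCESS_URL + "?" + urlencode(build_process_params(text, model))
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload["result"]
```

For example, process("Hello world.") would return the annotated text as a CoNLL-U string. For long inputs, a POST request is likely preferable to a long query string.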

3. Release

3.1. Download

UDPipe releases are available on GitHub, both as source code and as a pre-compiled binary package. The binary package contains Linux, Windows and OS X binaries, the Java and C# binding binaries, and the source code of UDPipe and all language bindings. While the binary package does not contain compiled Python or Perl bindings, packages for those languages are available in the standard package repositories, namely PyPI and CPAN.
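A minimal sketch of using the Python bindings, assuming the PyPI package ufal.udpipe is installed and a .udpipe model file has already been downloaded. The helper name annotate and the model path are hypothetical; the Model, Pipeline and ProcessingError classes come from the bindings.

```python
import os


def annotate(model_path: str, text: str) -> str:
    """Hypothetical helper: tokenize, tag and parse raw text into CoNLL-U.

    Assumes the ufal.udpipe package (PyPI) is installed and model_path
    points to a downloaded .udpipe model file.
    """
    if not os.path.isfile(model_path):
        raise FileNotFoundError(f"model file not found: {model_path}")

    # Imported lazily so this module loads even without the bindings.
    from ufal.udpipe import Model, Pipeline, ProcessingError

    model = Model.load(model_path)
    if model is None:
        # Model.load returns None when the file cannot be read as a model.
        raise RuntimeError(f"cannot load UDPipe model from {model_path}")

    # Input format "tokenize" runs the tokenizer on raw text; the model's
    # default tagger and parser are used and the output format is CoNLL-U.
    pipeline = Pipeline(model, "tokenize", Pipeline.DEFAULT, Pipeline.DEFAULT, "conllu")
    error = ProcessingError()
    result = pipeline.process(text, error)
    if error.occurred():
        raise RuntimeError(error.message)
    return result
```

With a model from one of the collections listed below saved locally, annotate(path_to_model, "Hello world.") would return the annotated sentence as CoNLL-U text. Loading a model is relatively expensive, so a real application would load it once and reuse the pipeline.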

You might also be interested in the contributed package spacy-udpipe, which wraps UDPipe in the spaCy API.

3.1.1. Available Models

To use UDPipe, a model is needed. The models are available from the LINDAT/CLARIN infrastructure and are described further on the UDPipe Models page. Currently, the following models are available:

Universal Dependencies 2.5 Models: udpipe-ud2.5-191206 (documentation)
Universal Dependencies 2.4 Models: udpipe-ud2.4-190531 (documentation)
Universal Dependencies 2.3 Models: udpipe-ud2.3-181115 (documentation)
CoNLL18 Shared Task Baseline UD 2.2 Models: udpipe-ud2.2-conll18-180430 (documentation)
Universal Dependencies 2.0 Models: udpipe-ud2.0-170801 (documentation)
CoNLL17 Shared Task Baseline UD 2.0 Models: udpipe-ud2.0-conll17-170315 (documentation)
Universal Dependencies 1.2 Models: udpipe-ud1.2-160523 (documentation)

3.2. License

UDPipe is an open-source project and is freely available for non-commercial purposes. The library is distributed under Mozilla Public License 2.0 and the associated models and data under CC BY-NC-SA, although for some models the original data used to create the model may impose additional licensing conditions.

If you use this tool for scientific work, please give credit to us by referencing Straka et al. 2016 and the UDPipe website.

4. UDPipe Installation

UDPipe Installation is available on a separate page.

5. UDPipe Models

UDPipe Models are available on a separate page.

6. UDPipe User's Manual

UDPipe User's Manual is available on a separate page.

7. UDPipe API Reference

UDPipe API Reference is available on a separate page.

8. Contact

Authors:

Milan Straka, straka@ufal.mff.cuni.cz

UDPipe website.

UDPipe LINDAT/CLARIN entry.

9. Acknowledgements

This work has used language resources developed and/or stored and/or distributed by the LINDAT/CLARIN project of the Ministry of Education of the Czech Republic (project LM2010013).

Acknowledgements for individual models are listed on the UDPipe User's Manual page.

9.1. Publications

(Straka and Straková 2017) Milan Straka and Jana Straková. Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Vancouver, Canada, August 2017.
(Straka et al. 2016) Milan Straka, Jan Hajič and Jana Straková. UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, May 2016.

9.2. Bibtex for Referencing

@InProceedings{udpipe:2017,
  author    = {Straka, Milan  and  Strakov\'{a}, Jana},
  title     = {Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe},
  booktitle = {Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies},
  month     = {August},
  year      = {2017},
  address   = {Vancouver, Canada},
  publisher = {Association for Computational Linguistics},
  pages     = {88--99},
  url       = {http://www.aclweb.org/anthology/K/K17/K17-3009.pdf}
}

9.3. Persistent Identifier

If you prefer to reference UDPipe by a persistent identifier (PID), you can use http://hdl.handle.net/11234/1-1702.


Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
