Udapi Tutorial

by Martin Popel

Udapi is an API and framework for processing Universal Dependencies, available for Python, Perl and Java. This tutorial uses the Python version and assumes Linux with Bash and Python 3.3 or higher.

You can download my slides about UD and Udapi.

Step 1: Install Udapi

Follow the instructions at https://github.com/udapi/udapi-python
pip3 install --user --upgrade git+https://github.com/udapi/udapi-python.git
export PATH="$HOME/.local/bin/:$PATH"
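
To confirm that the udapy command ended up on your PATH after the install, a quick check from Python can help. This is only a sketch using the standard library; the helper name find_tool is made up for illustration and is not part of Udapi.

```python
import shutil


def find_tool(name="udapy"):
    """Return the full path of an executable found on PATH, or None if missing.

    find_tool is a hypothetical helper for this tutorial, not a Udapi function.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path:
        print("udapy found at", path)
    else:
        print("udapy not found; check that $HOME/.local/bin is on your PATH")
```

If the check reports that udapy is missing, re-run the export line above in the same shell and try again.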

Step 2: Download sample data

Download and extract ud14sample.tgz. It contains just 10 sentences for each language, plus one bigger file (dev.conllu) for English. For the full UD v1.4 release, go to Lindat.
wget http://ufal.mff.cuni.cz/~popel/udapi/ud14sample.tgz
tar -xf ud14sample.tgz
cd sample
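
If you want to sanity-check the extracted files, you can count the sentences in a CoNLL-U file yourself. In CoNLL-U, sentences are separated by blank lines and comment lines start with #. The sketch below uses plain Python rather than Udapi; count_sentences and SAMPLE are illustrative names and the sample text is invented.

```python
def count_sentences(conllu_text):
    """Count sentences in CoNLL-U text (blank-line separated blocks of tokens)."""
    count = 0
    in_sentence = False
    for line in conllu_text.splitlines():
        if line.strip() == "":
            # A blank line ends the current sentence.
            in_sentence = False
        elif not line.startswith("#"):
            # The first token line after a blank line starts a new sentence.
            if not in_sentence:
                count += 1
                in_sentence = True
    return count


# Tiny invented sample with two sentences (10 tab-separated columns per token).
SAMPLE = (
    "# text = Well , yes\n"
    "1\tWell\twell\tINTJ\t_\t_\t3\tdiscourse\t_\t_\n"
    "2\t,\t,\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "3\tyes\tyes\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# text = Well done\n"
    "1\tWell\twell\tADV\t_\t_\t2\tadvmod\t_\t_\n"
    "2\tdone\tdo\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    print(count_sentences(SAMPLE))
```

To try it on real data, pass the contents of a file instead, e.g. open("UD_English/sample.conllu", encoding="utf-8").read(). If the sample matches the description above, each sample.conllu should give 10.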

Step 3: Browse your favorite language

Use the udapy commands from my slides.
cat */sample.conllu | udapy -T | less -R

This concatenates the samples of all languages and pipes them to udapy and then to less (type q to exit). To browse a single language, replace * with its directory, e.g. UD_English. The -R option tells less to display colors (instead of their raw ANSI codes).

The -T option prints the trees in text mode; it is actually a shortcut for udapy write.TextModeTrees color=1. Run udapy --help to see other useful shortcuts, e.g.

cat UD_English/sample.conllu | udapy -H > en.html
This creates an HTML version, which you can open in any modern browser. With -HA, the HTML output also includes all the nodes' attributes.

Step 4: Find out what the discourse deprel (dependency relation) means

Option A: search the documentation.

See the documentation of the discourse deprel.

Option B: browse UD_English/dev.conllu as in the previous step and find the occurrences of discourse.

udapy -T < UD_English/dev.conllu | less -R
In less, type /discourse and press Enter to search; press n to jump to the next occurrence.

Option C: extract all word forms and UPOS tags of nodes annotated with the discourse deprel in UD_English/dev.conllu. Hints: use udapy util.Eval node='PYTHON_CODE' and replace PYTHON_CODE with code that uses node.deprel, node.form and node.upos. The standard Unix idiom for frequency analysis is sort | uniq -c | sort -rn.

udapy util.Eval node='if node.deprel == "discourse": print(node.form, node.upos)' < UD_English/dev.conllu > disc.txt
cat disc.txt | sort | uniq -c | sort -rn | less
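
For comparison, the same frequency analysis can be sketched in plain Python without Udapi, reading the CoNLL-U columns directly (FORM is column 2, UPOS column 4, DEPREL column 8). This is only an illustration of what the pipeline above computes; the function name discourse_counts and the sample sentences are made up.

```python
from collections import Counter


def discourse_counts(conllu_text, deprel="discourse"):
    """Count (form, upos) pairs of tokens whose DEPREL column equals deprel."""
    counts = Counter()
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        form, upos, rel = cols[1], cols[3], cols[7]
        if rel == deprel:
            counts[(form, upos)] += 1
    return counts


# Tiny invented sample: "Well" is a discourse INTJ twice and an ADV once.
SAMPLE = (
    "# text = Well , yes\n"
    "1\tWell\twell\tINTJ\t_\t_\t3\tdiscourse\t_\t_\n"
    "2\t,\t,\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "3\tyes\tyes\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# text = Well done\n"
    "1\tWell\twell\tADV\t_\t_\t2\tadvmod\t_\t_\n"
    "2\tdone\tdo\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
    "# text = Well , no\n"
    "1\tWell\twell\tINTJ\t_\t_\t3\tdiscourse\t_\t_\n"
    "2\t,\t,\tPUNCT\t_\t_\t3\tpunct\t_\t_\n"
    "3\tno\tno\tINTJ\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    # most_common() plays the role of sort | uniq -c | sort -rn.
    for (form, upos), n in discourse_counts(SAMPLE).most_common():
        print(n, form, upos)
```

To run it on the tutorial data, pass open("UD_English/dev.conllu", encoding="utf-8").read() instead of SAMPLE. Like the udapy command, it compares the whole DEPREL value, so a subtyped relation would not be counted as plain discourse.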

TextLink training school, Prague, February 9, 2017