UDPipe 2 Models
Universal Dependencies 2.10 Models
Universal Dependencies 2.10 Models are distributed under the CC BY-NC-SA licence. The models are based solely on Universal Dependencies 2.10 treebanks, and additionally use multilingual BERT and RobeCzech.
The models require UDPipe 2.
Download
The latest version 220711 of the Universal Dependencies 2.10 models can be downloaded from LINDAT/CLARIN repository.
The models are also available in the REST service.
Acknowledgements
This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic, Project No. LM2018101 LINDAT/CLARIAH-CZ.
The models were trained on Universal Dependencies 2.10 treebanks.
For the UD treebanks which do not contain original plain text version, raw text is used to train the tokenizer instead. The plain texts were taken from the W2C -- Web to Corpus.
Finally, multilingual BERT and RobeCzech are used to provide contextualized word embeddings.
Publications
- Milan Straka, Jakub Náplava, Jana Straková, David Samuel (2020): RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model. In: Text, Speech, and Dialogue. TSD 2021. Lecture Notes in Computer Science, vol 12848. Springer, Cham.
- Milan Straka, Jana Straková, Jan Hajič (2019): Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing. In: ArXiv.org Computing Research Repository, ISSN 2331-8422, 1904.02099
- Milan Straka (2018): UDPipe 2.0 Prototype at CoNLL 2018 UD Shared Task. In: Proceedings of CoNLL 2018: The SIGNLL Conference on Computational Natural Language Learning, pp. 197-207, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-72-8
Model Description
The Universal Dependencies 2.10 models contain 123 models of 69 languages, each consisting of a tokenizer, tagger, lemmatizer and dependency parser, all trained using the UD data. We used the original train-dev-test split, but for treebanks with only train and no dev data we used last 10% of the train data as dev data. We produce models only for treebanks with at least 1000 training words.
The tokenizer is trained using the SpaceAfter=No
features. If the features
are not present in the data, they can be filled in using raw text in the
language in question.
The tagger, lemmatizer and parser are trained using gold UD data.
Model Performance
We present the tokenizer, tagger, lemmatizer and parser performance, measured on
the testing portion of the data, evaluated both on the raw text and using the
gold tokenization. The results are F1 scores measured by the conll18_ud_eval.py
script.
Model | Mode | Words | Sents | UPOS | XPOS | UFeats | AllTags | Lemma | UAS | LAS | MLAS | BLEX |
---|---|---|---|---|---|---|---|---|---|---|---|---|
afrikaans-afribooms-ud-2.10-220711 | Raw text | 98.59 | 99.78 | 98.58 | 95.46 | 98.13 | 95.33 | 97.43 | 90.10 | 87.23 | 78.64 | 78.59 |
afrikaans-afribooms-ud-2.10-220711 | Gold tokenization | — | — | 98.77 | 95.62 | 98.31 | 95.50 | 97.53 | 90.72 | 87.80 | 79.23 | 78.99 |
ancient_greek-perseus-ud-2.10-220711 | Raw text | 98.85 | 99.97 | 92.83 | 85.55 | 91.45 | 84.87 | 86.68 | 80.13 | 74.36 | 54.62 | 55.72 |
ancient_greek-perseus-ud-2.10-220711 | Gold tokenization | — | — | 92.88 | 85.60 | 91.47 | 84.90 | 86.70 | 80.32 | 74.53 | 54.73 | 55.87 |
ancient_greek-proiel-ud-2.10-220711 | Raw text | 48.02 | 100.00 | 97.77 | 98.05 | 92.35 | 91.05 | 94.71 | 79.82 | 76.06 | 60.08 | 65.75 |
ancient_greek-proiel-ud-2.10-220711 | Gold tokenization | — | — | 97.87 | 98.14 | 92.49 | 91.26 | 94.73 | 86.05 | 82.14 | 67.03 | 71.90 |
ancient_hebrew-ptnk-ud-2.10-220711 | Raw text | 98.06 | 68.76 | 56.80 | 56.94 | 55.13 | 50.80 | 49.88 | 38.73 | 34.67 | 18.47 | 17.69 |
ancient_hebrew-ptnk-ud-2.10-220711 | Gold tokenization | — | — | 68.03 | 67.97 | 66.97 | 56.15 | 53.35 | 63.31 | 51.61 | 28.08 | 24.34 |
arabic-padt-ud-2.10-220711 | Raw text | 82.09 | 94.58 | 91.72 | 89.01 | 89.14 | 88.69 | 90.41 | 78.63 | 74.54 | 65.84 | 67.88 |
arabic-padt-ud-2.10-220711 | Gold tokenization | — | — | 97.02 | 94.38 | 94.53 | 94.08 | 95.31 | 88.11 | 83.49 | 74.57 | 76.13 |
armenian-armtdp-ud-2.10-220711 | Raw text | 95.70 | 99.28 | 96.07 | — | 91.39 | 90.28 | 95.04 | 86.84 | 82.22 | 69.53 | 74.39 |
armenian-armtdp-ud-2.10-220711 | Gold tokenization | — | — | 96.63 | — | 92.03 | 90.77 | 95.70 | 88.50 | 83.81 | 70.18 | 75.42 |
armenian-bsut-ud-2.10-220711 | Raw text | 98.73 | 99.79 | 97.31 | — | 92.01 | 91.24 | 96.62 | 90.02 | 85.75 | 71.20 | 78.86 |
armenian-bsut-ud-2.10-220711 | Gold tokenization | — | — | 97.53 | — | 92.24 | 91.48 | 96.82 | 90.56 | 86.29 | 71.73 | 79.32 |
basque-bdt-ud-2.10-220711 | Raw text | 99.83 | 99.94 | 96.25 | — | 92.69 | 90.69 | 96.36 | 87.40 | 84.28 | 73.94 | 79.81 |
basque-bdt-ud-2.10-220711 | Gold tokenization | — | — | 96.30 | — | 92.73 | 90.72 | 96.39 | 87.48 | 84.36 | 73.99 | 79.86 |
belarusian-hse-ud-2.10-220711 | Raw text | 83.97 | 99.47 | 98.30 | 96.26 | 94.38 | 92.37 | 93.35 | 86.84 | 84.85 | 76.00 | 76.01 |
belarusian-hse-ud-2.10-220711 | Gold tokenization | — | — | 98.81 | 96.74 | 94.94 | 92.87 | 93.86 | 89.55 | 87.38 | 78.23 | 78.11 |
bulgarian-btb-ud-2.10-220711 | Raw text | 94.17 | 99.91 | 99.19 | 97.20 | 97.97 | 96.85 | 97.99 | 94.41 | 91.67 | 85.89 | 86.31 |
bulgarian-btb-ud-2.10-220711 | Gold tokenization | — | — | 99.29 | 97.30 | 98.07 | 96.96 | 98.09 | 95.24 | 92.44 | 86.52 | 87.03 |
catalan-ancora-ud-2.10-220711 | Raw text | 99.08 | 99.95 | 99.07 | 97.21 | 98.70 | 96.96 | 99.40 | 94.86 | 93.14 | 87.45 | 88.92 |
catalan-ancora-ud-2.10-220711 | Gold tokenization | — | — | 99.14 | 97.32 | 98.78 | 97.07 | 99.46 | 95.02 | 93.30 | 87.69 | 89.13 |
chinese-gsdsimp-ud-2.10-220711 | Raw text | 99.10 | 90.29 | 87.21 | 87.16 | 89.74 | 86.42 | 90.29 | 73.11 | 70.62 | 63.58 | 67.09 |
chinese-gsdsimp-ud-2.10-220711 | Gold tokenization | — | — | 96.14 | 96.04 | 99.45 | 95.30 | 99.99 | 87.28 | 84.07 | 78.56 | 82.64 |
chinese-gsd-ud-2.10-220711 | Raw text | 99.10 | 90.27 | 87.15 | 87.05 | 89.71 | 86.36 | 90.27 | 72.85 | 70.29 | 63.41 | 66.89 |
chinese-gsd-ud-2.10-220711 | Gold tokenization | — | — | 96.21 | 96.08 | 99.40 | 95.34 | 99.99 | 87.15 | 83.96 | 78.41 | 82.59 |
classical_chinese-kyoto-ud-2.10-220711 | Raw text | 40.71 | 97.26 | 87.40 | 86.48 | 89.97 | 83.27 | 96.78 | 67.56 | 62.17 | 58.02 | 60.60 |
classical_chinese-kyoto-ud-2.10-220711 | Gold tokenization | — | — | 92.30 | 90.87 | 93.94 | 88.19 | 99.47 | 83.16 | 77.63 | 73.15 | 76.42 |
coptic-scriptorium-ud-2.10-220711 | Raw text | 33.87 | 74.49 | 72.43 | 72.34 | 72.53 | 71.54 | 72.91 | 51.25 | 49.43 | 36.55 | 39.14 |
coptic-scriptorium-ud-2.10-220711 | Gold tokenization | — | — | 96.94 | 96.78 | 97.49 | 95.50 | 97.02 | 90.48 | 87.70 | 76.04 | 79.57 |
croatian-set-ud-2.10-220711 | Raw text | 94.79 | 99.93 | 98.48 | 95.72 | 96.23 | 95.49 | 97.60 | 92.17 | 89.27 | 81.53 | 84.23 |
croatian-set-ud-2.10-220711 | Gold tokenization | — | — | 98.54 | 95.80 | 96.30 | 95.56 | 97.68 | 92.67 | 89.75 | 81.92 | 84.69 |
czech-pdt-ud-2.10-220711 | Raw text | 93.74 | 99.94 | 99.37 | 98.40 | 98.33 | 98.02 | 99.21 | 94.90 | 93.50 | 90.28 | 91.88 |
czech-pdt-ud-2.10-220711 | Gold tokenization | — | — | 99.45 | 98.47 | 98.40 | 98.09 | 99.28 | 95.63 | 94.23 | 90.88 | 92.50 |
czech-cac-ud-2.10-220711 | Raw text | 99.68 | 99.99 | 99.72 | 98.57 | 98.37 | 98.12 | 99.18 | 96.12 | 94.76 | 91.09 | 92.67 |
czech-cac-ud-2.10-220711 | Gold tokenization | — | — | 99.73 | 98.58 | 98.38 | 98.13 | 99.19 | 96.12 | 94.76 | 91.11 | 92.69 |
czech-cltt-ud-2.10-220711 | Raw text | 97.79 | 99.71 | 99.22 | 95.32 | 95.23 | 95.03 | 99.18 | 90.77 | 89.24 | 81.35 | 86.22 |
czech-cltt-ud-2.10-220711 | Gold tokenization | — | — | 99.47 | 95.47 | 95.40 | 95.18 | 99.47 | 91.20 | 89.68 | 81.67 | 86.73 |
czech-fictree-ud-2.10-220711 | Raw text | 98.95 | 99.99 | 99.17 | 97.06 | 97.83 | 96.86 | 99.35 | 96.38 | 94.91 | 89.61 | 92.81 |
czech-fictree-ud-2.10-220711 | Gold tokenization | — | — | 99.18 | 97.08 | 97.84 | 96.88 | 99.36 | 96.46 | 94.97 | 89.71 | 92.91 |
danish-ddt-ud-2.10-220711 | Raw text | 89.78 | 99.81 | 97.95 | — | 97.29 | 96.54 | 97.26 | 88.27 | 86.25 | 79.22 | 80.96 |
danish-ddt-ud-2.10-220711 | Gold tokenization | — | — | 98.16 | — | 97.53 | 96.79 | 97.45 | 89.46 | 87.42 | 80.42 | 82.17 |
dutch-alpino-ud-2.10-220711 | Raw text | 88.98 | 99.83 | 97.86 | 96.79 | 97.80 | 96.29 | 95.11 | 92.95 | 90.58 | 83.15 | 79.88 |
dutch-alpino-ud-2.10-220711 | Gold tokenization | — | — | 97.97 | 96.87 | 97.91 | 96.41 | 95.26 | 94.00 | 91.63 | 84.17 | 80.83 |
dutch-lassysmall-ud-2.10-220711 | Raw text | 74.93 | 99.80 | 96.98 | 95.62 | 96.63 | 94.88 | 95.70 | 90.61 | 87.94 | 79.67 | 78.25 |
dutch-lassysmall-ud-2.10-220711 | Gold tokenization | — | — | 97.25 | 96.43 | 97.36 | 95.83 | 95.97 | 94.51 | 91.66 | 84.48 | 83.08 |
english-ewt-ud-2.10-220711 | Raw text | 87.02 | 98.95 | 96.39 | 96.13 | 96.53 | 94.80 | 97.13 | 90.07 | 88.10 | 81.47 | 83.42 |
english-ewt-ud-2.10-220711 | Gold tokenization | — | — | 97.35 | 97.06 | 97.52 | 95.71 | 98.07 | 92.62 | 90.56 | 84.02 | 85.98 |
english-atis-ud-2.10-220711 | Raw text | 81.96 | 100.00 | 98.97 | — | 98.54 | 98.13 | 99.94 | 94.39 | 92.92 | 87.85 | 90.39 |
english-atis-ud-2.10-220711 | Gold tokenization | — | — | 98.97 | — | 98.56 | 98.15 | 99.94 | 95.88 | 94.26 | 89.80 | 92.40 |
english-gum-ud-2.10-220711 | Raw text | 95.36 | 99.64 | 97.95 | 97.91 | 97.88 | 96.91 | 98.77 | 92.30 | 90.35 | 84.63 | 86.31 |
english-gum-ud-2.10-220711 | Gold tokenization | — | — | 98.27 | 98.26 | 98.22 | 97.24 | 99.09 | 93.17 | 91.19 | 85.42 | 87.04 |
english-lines-ud-2.10-220711 | Raw text | 87.45 | 99.92 | 97.71 | 96.77 | 97.02 | 94.41 | 98.40 | 91.17 | 88.22 | 80.27 | 83.45 |
english-lines-ud-2.10-220711 | Gold tokenization | — | — | 97.79 | 96.84 | 97.07 | 94.48 | 98.47 | 92.10 | 89.17 | 81.05 | 84.36 |
english-partut-ud-2.10-220711 | Raw text | 100.00 | 99.72 | 97.23 | 97.11 | 96.35 | 95.26 | 98.14 | 94.24 | 92.21 | 83.35 | 87.34 |
english-partut-ud-2.10-220711 | Gold tokenization | — | — | 97.48 | 97.36 | 96.60 | 95.51 | 98.42 | 94.48 | 92.46 | 83.74 | 87.62 |
estonian-edt-ud-2.10-220711 | Raw text | 92.03 | 99.95 | 97.68 | 98.31 | 96.28 | 95.07 | 95.36 | 88.81 | 86.16 | 79.92 | 79.56 |
estonian-edt-ud-2.10-220711 | Gold tokenization | — | — | 97.81 | 98.36 | 96.36 | 95.19 | 95.43 | 89.71 | 87.03 | 80.77 | 80.37 |
estonian-ewt-ud-2.10-220711 | Raw text | 75.26 | 98.82 | 95.41 | 96.29 | 94.06 | 91.92 | 93.86 | 82.62 | 79.30 | 71.40 | 72.35 |
estonian-ewt-ud-2.10-220711 | Gold tokenization | — | — | 96.65 | 97.43 | 95.15 | 93.10 | 94.97 | 86.76 | 83.25 | 74.79 | 75.57 |
faroese-farpahc-ud-2.10-220711 | Raw text | 92.77 | 99.74 | 97.44 | 93.04 | 94.43 | 92.50 | 99.74 | 85.76 | 82.13 | 68.07 | 75.34 |
faroese-farpahc-ud-2.10-220711 | Gold tokenization | — | — | 97.64 | 93.28 | 94.68 | 92.72 | 100.00 | 86.82 | 83.10 | 69.17 | 76.51 |
finnish-tdt-ud-2.10-220711 | Raw text | 90.82 | 99.70 | 97.58 | 98.18 | 95.99 | 95.10 | 92.14 | 90.20 | 88.18 | 82.19 | 78.16 |
finnish-tdt-ud-2.10-220711 | Gold tokenization | — | — | 97.92 | 98.49 | 96.29 | 95.43 | 92.46 | 91.51 | 89.46 | 83.20 | 79.17 |
finnish-ftb-ud-2.10-220711 | Raw text | 86.84 | 99.91 | 96.69 | 95.14 | 96.83 | 94.02 | 95.57 | 89.80 | 87.18 | 80.04 | 80.49 |
finnish-ftb-ud-2.10-220711 | Gold tokenization | — | — | 97.00 | 95.36 | 96.92 | 94.32 | 95.67 | 91.91 | 89.23 | 82.55 | 82.84 |
french-gsd-ud-2.10-220711 | Raw text | 94.69 | 98.78 | 97.26 | — | 97.35 | 96.63 | 97.55 | 92.76 | 90.82 | 84.55 | 86.32 |
french-gsd-ud-2.10-220711 | Gold tokenization | — | — | 98.44 | — | 98.47 | 97.71 | 98.75 | 94.55 | 92.71 | 86.34 | 87.59 |
french-parisstories-ud-2.10-220711 | Raw text | 87.87 | 99.49 | 96.24 | — | 94.41 | 92.17 | 97.55 | 79.95 | 74.84 | 61.23 | 68.35 |
french-parisstories-ud-2.10-220711 | Gold tokenization | — | — | 96.81 | — | 94.90 | 92.68 | 97.98 | 81.67 | 76.50 | 62.46 | 69.32 |
french-partut-ud-2.10-220711 | Raw text | 100.00 | 99.48 | 97.26 | 96.76 | 94.72 | 93.96 | 97.33 | 94.72 | 92.81 | 81.09 | 86.22 |
french-partut-ud-2.10-220711 | Gold tokenization | — | — | 97.89 | 97.35 | 95.27 | 94.51 | 97.89 | 95.62 | 93.85 | 82.18 | 87.24 |
french-rhapsodie-ud-2.10-220711 | Raw text | 99.47 | 99.22 | 97.20 | 97.45 | 96.12 | 93.30 | 98.26 | 88.71 | 84.99 | 75.15 | 79.88 |
french-rhapsodie-ud-2.10-220711 | Gold tokenization | — | — | 98.00 | 98.13 | 96.89 | 93.97 | 98.99 | 89.89 | 86.08 | 75.91 | 80.35 |
french-sequoia-ud-2.10-220711 | Raw text | 84.02 | 99.15 | 98.32 | — | 97.15 | 96.68 | 98.33 | 93.60 | 92.22 | 86.08 | 89.00 |
french-sequoia-ud-2.10-220711 | Gold tokenization | — | — | 99.24 | — | 97.95 | 97.54 | 99.13 | 95.43 | 94.11 | 88.00 | 90.34 |
galician-ctg-ud-2.10-220711 | Raw text | 97.22 | 99.22 | 97.28 | 97.05 | 99.06 | 96.70 | 98.04 | 85.59 | 83.20 | 72.11 | 76.94 |
galician-ctg-ud-2.10-220711 | Gold tokenization | — | — | 98.01 | 97.78 | 99.84 | 97.41 | 98.79 | 87.31 | 84.80 | 74.04 | 78.88 |
galician-treegal-ud-2.10-220711 | Raw text | 87.99 | 98.74 | 96.00 | 93.69 | 94.85 | 92.82 | 96.67 | 83.44 | 79.36 | 67.82 | 71.68 |
galician-treegal-ud-2.10-220711 | Gold tokenization | — | — | 97.19 | 94.83 | 95.94 | 93.91 | 97.86 | 86.75 | 82.40 | 71.30 | 75.54 |
german-hdt-ud-2.10-220711 | Raw text | 92.34 | 99.90 | 98.51 | 98.45 | 94.08 | 93.70 | 97.16 | 96.94 | 96.04 | 84.79 | 90.40 |
german-hdt-ud-2.10-220711 | Gold tokenization | — | — | 98.62 | 98.58 | 94.22 | 93.83 | 97.26 | 97.63 | 96.75 | 85.46 | 91.08 |
german-gsd-ud-2.10-220711 | Raw text | 81.12 | 99.81 | 95.78 | 97.68 | 90.23 | 87.27 | 96.75 | 87.32 | 83.12 | 63.79 | 75.00 |
german-gsd-ud-2.10-220711 | Gold tokenization | — | — | 95.94 | 97.87 | 90.60 | 87.60 | 96.96 | 89.28 | 85.04 | 65.33 | 76.75 |
gothic-proiel-ud-2.10-220711 | Raw text | 31.12 | 100.00 | 96.48 | 96.98 | 90.08 | 88.19 | 94.62 | 74.17 | 68.40 | 55.39 | 62.02 |
gothic-proiel-ud-2.10-220711 | Gold tokenization | — | — | 96.97 | 97.42 | 90.90 | 89.33 | 94.71 | 84.11 | 78.29 | 65.73 | 71.10 |
greek-gdt-ud-2.10-220711 | Raw text | 90.19 | 99.87 | 98.09 | 98.10 | 95.60 | 95.01 | 95.61 | 93.05 | 91.24 | 81.58 | 81.04 |
greek-gdt-ud-2.10-220711 | Gold tokenization | — | — | 98.23 | 98.24 | 95.79 | 95.20 | 95.70 | 93.85 | 92.04 | 82.28 | 81.75 |
hebrew-htb-ud-2.10-220711 | Raw text | 99.39 | 85.05 | 82.78 | 82.80 | 81.23 | 80.53 | 82.91 | 70.63 | 68.13 | 55.31 | 59.44 |
hebrew-htb-ud-2.10-220711 | Gold tokenization | — | — | 97.44 | 97.42 | 95.73 | 95.05 | 97.34 | 92.71 | 90.24 | 78.77 | 81.80 |
hebrew-iahltwiki-ud-2.10-220711 | Raw text | 97.16 | 88.54 | 85.97 | 86.00 | 80.55 | 79.47 | 87.15 | 76.16 | 74.19 | 56.91 | 66.92 |
hebrew-iahltwiki-ud-2.10-220711 | Gold tokenization | — | — | 97.09 | 97.10 | 91.59 | 90.41 | 98.24 | 93.88 | 91.45 | 74.27 | 85.44 |
hindi-hdtb-ud-2.10-220711 | Raw text | 98.90 | 100.00 | 97.57 | 97.12 | 94.16 | 92.23 | 98.92 | 95.30 | 92.32 | 79.20 | 87.66 |
hindi-hdtb-ud-2.10-220711 | Gold tokenization | — | — | 97.58 | 97.14 | 94.18 | 92.26 | 98.92 | 95.42 | 92.44 | 79.35 | 87.81 |
hungarian-szeged-ud-2.10-220711 | Raw text | 95.89 | 99.85 | 96.68 | — | 94.22 | 93.53 | 94.92 | 88.81 | 85.09 | 75.22 | 78.19 |
hungarian-szeged-ud-2.10-220711 | Gold tokenization | — | — | 96.79 | — | 94.36 | 93.64 | 95.04 | 89.31 | 85.54 | 75.51 | 78.47 |
icelandic-icepahc-ud-2.10-220711 | Raw text | 92.15 | 99.82 | 96.90 | 93.24 | 91.32 | 86.42 | 95.99 | 87.21 | 83.36 | 65.98 | 74.25 |
icelandic-icepahc-ud-2.10-220711 | Gold tokenization | — | — | 97.08 | 93.45 | 91.47 | 86.61 | 96.15 | 87.78 | 83.87 | 66.49 | 74.84 |
icelandic-modern-ud-2.10-220711 | Raw text | 99.22 | 99.92 | 99.07 | 98.14 | 98.38 | 97.88 | 98.91 | 94.41 | 93.17 | 89.31 | 90.07 |
icelandic-modern-ud-2.10-220711 | Gold tokenization | — | — | 99.14 | 98.21 | 98.45 | 97.95 | 98.98 | 94.50 | 93.26 | 89.41 | 90.16 |
indonesian-gsd-ud-2.10-220711 | Raw text | 92.90 | 99.48 | 94.23 | 93.81 | 95.53 | 88.78 | 98.13 | 87.65 | 81.59 | 72.35 | 77.02 |
indonesian-gsd-ud-2.10-220711 | Gold tokenization | — | — | 94.66 | 94.26 | 95.99 | 89.17 | 98.53 | 88.57 | 82.42 | 73.21 | 77.88 |
indonesian-csui-ud-2.10-220711 | Raw text | 91.01 | 99.45 | 96.05 | 96.14 | 96.85 | 95.43 | 98.23 | 86.38 | 82.10 | 76.54 | 78.80 |
indonesian-csui-ud-2.10-220711 | Gold tokenization | — | — | 96.56 | 96.72 | 97.37 | 95.99 | 98.87 | 87.77 | 83.28 | 77.62 | 79.92 |
irish-idt-ud-2.10-220711 | Raw text | 97.25 | 99.72 | 95.63 | 94.76 | 90.33 | 87.14 | 95.30 | 86.74 | 81.10 | 64.20 | 71.52 |
irish-idt-ud-2.10-220711 | Gold tokenization | — | — | 95.89 | 95.07 | 90.60 | 87.46 | 95.54 | 87.28 | 81.64 | 64.56 | 71.85 |
italian-isdt-ud-2.10-220711 | Raw text | 98.76 | 99.84 | 98.57 | 98.50 | 98.25 | 97.67 | 98.79 | 94.66 | 93.01 | 86.61 | 88.00 |
italian-isdt-ud-2.10-220711 | Gold tokenization | — | — | 98.72 | 98.65 | 98.41 | 97.83 | 98.95 | 94.96 | 93.34 | 86.97 | 88.40 |
italian-markit-ud-2.10-220711 | Raw text | 98.24 | 99.59 | 96.76 | 97.00 | 93.80 | 92.08 | 88.18 | 88.36 | 84.51 | 69.95 | 77.77 |
italian-markit-ud-2.10-220711 | Gold tokenization | — | — | 97.15 | 97.40 | 94.10 | 92.35 | 88.54 | 89.13 | 85.26 | 70.51 | 78.46 |
italian-partut-ud-2.10-220711 | Raw text | 100.00 | 99.73 | 98.43 | 98.43 | 98.35 | 97.61 | 98.68 | 96.21 | 94.18 | 87.87 | 89.09 |
italian-partut-ud-2.10-220711 | Gold tokenization | — | — | 98.54 | 98.57 | 98.49 | 97.69 | 98.93 | 96.26 | 94.15 | 87.68 | 89.07 |
italian-postwita-ud-2.10-220711 | Raw text | 28.11 | 99.40 | 96.43 | 96.18 | 96.30 | 94.79 | 96.72 | 80.61 | 76.89 | 65.29 | 66.90 |
italian-postwita-ud-2.10-220711 | Gold tokenization | — | — | 97.04 | 96.82 | 96.80 | 95.29 | 97.31 | 88.34 | 84.19 | 75.32 | 77.32 |
italian-twittiro-ud-2.10-220711 | Raw text | 39.36 | 99.14 | 95.92 | 95.92 | 95.07 | 93.46 | 94.50 | 82.23 | 77.79 | 64.50 | 65.42 |
italian-twittiro-ud-2.10-220711 | Gold tokenization | — | — | 96.91 | 96.61 | 96.00 | 94.15 | 95.16 | 88.07 | 83.53 | 71.89 | 72.69 |
italian-vit-ud-2.10-220711 | Raw text | 96.73 | 99.76 | 98.14 | 97.39 | 97.64 | 96.21 | 98.89 | 92.08 | 89.16 | 80.93 | 83.70 |
italian-vit-ud-2.10-220711 | Gold tokenization | — | — | 98.36 | 97.71 | 97.85 | 96.53 | 99.10 | 92.88 | 89.97 | 81.91 | 84.63 |
japanese-gsd-ud-2.10-220711 | Raw text | 100.00 | 96.17 | 94.93 | 94.18 | 96.16 | 93.81 | 95.05 | 87.68 | 86.85 | 80.43 | 80.78 |
japanese-gsd-ud-2.10-220711 | Gold tokenization | — | — | 98.55 | 97.50 | 99.99 | 97.13 | 98.47 | 94.73 | 93.75 | 88.50 | 88.34 |
japanese-gsdluw-ud-2.10-220711 | Raw text | 99.72 | 95.18 | 93.81 | 93.54 | 95.18 | 93.46 | 93.66 | 86.22 | 85.54 | 76.27 | 76.58 |
japanese-gsdluw-ud-2.10-220711 | Gold tokenization | — | — | 98.36 | 98.05 | 100.00 | 97.93 | 97.89 | 95.23 | 94.18 | 86.38 | 85.19 |
korean-kaist-ud-2.10-220711 | Raw text | 100.00 | 100.00 | 95.88 | 87.74 | — | 87.56 | 94.17 | 89.33 | 87.47 | 82.15 | 80.14 |
korean-kaist-ud-2.10-220711 | Gold tokenization | — | — | 95.88 | 87.74 | — | 87.56 | 94.17 | 89.33 | 87.47 | 82.15 | 80.14 |
korean-gsd-ud-2.10-220711 | Raw text | 93.93 | 99.87 | 96.57 | 90.27 | 99.67 | 88.02 | 93.57 | 88.54 | 84.91 | 80.73 | 77.23 |
korean-gsd-ud-2.10-220711 | Gold tokenization | — | — | 96.73 | 90.43 | 99.80 | 88.20 | 93.69 | 89.27 | 85.61 | 81.45 | 77.93 |
latin-ittb-ud-2.10-220711 | Raw text | 91.21 | 99.99 | 98.91 | 96.58 | 96.75 | 95.19 | 99.18 | 90.53 | 88.53 | 82.07 | 86.07 |
latin-ittb-ud-2.10-220711 | Gold tokenization | — | — | 98.92 | 96.57 | 96.78 | 95.20 | 99.18 | 91.50 | 89.51 | 82.63 | 86.59 |
latin-llct-ud-2.10-220711 | Raw text | 99.49 | 100.00 | 99.68 | 97.14 | 97.26 | 96.89 | 97.78 | 95.55 | 94.56 | 89.80 | 90.95 |
latin-llct-ud-2.10-220711 | Gold tokenization | — | — | 99.68 | 97.15 | 97.27 | 96.90 | 97.78 | 95.55 | 94.57 | 89.81 | 90.97 |
latin-perseus-ud-2.10-220711 | Raw text | 98.46 | 100.00 | 91.83 | 80.66 | 86.12 | 78.56 | 88.13 | 77.98 | 68.59 | 52.30 | 55.51 |
latin-perseus-ud-2.10-220711 | Gold tokenization | — | — | 91.85 | 80.66 | 86.12 | 78.55 | 88.16 | 78.14 | 68.71 | 52.39 | 55.58 |
latin-proiel-ud-2.10-220711 | Raw text | 36.81 | 99.87 | 96.69 | 96.87 | 90.56 | 89.54 | 96.21 | 74.07 | 69.56 | 56.74 | 63.93 |
latin-proiel-ud-2.10-220711 | Gold tokenization | — | — | 97.12 | 97.32 | 91.19 | 90.27 | 96.44 | 83.20 | 78.50 | 66.34 | 73.00 |
latin-udante-ud-2.10-220711 | Raw text | 98.81 | 99.61 | 90.58 | 75.59 | 81.31 | 71.62 | 87.25 | 75.26 | 67.81 | 43.95 | 50.36 |
latin-udante-ud-2.10-220711 | Gold tokenization | — | — | 90.82 | 75.70 | 81.53 | 71.70 | 87.44 | 75.50 | 67.97 | 44.08 | 50.51 |
latvian-lvtb-ud-2.10-220711 | Raw text | 97.83 | 99.31 | 96.51 | 89.83 | 93.86 | 89.08 | 95.92 | 88.75 | 85.79 | 76.04 | 80.25 |
latvian-lvtb-ud-2.10-220711 | Gold tokenization | — | — | 97.14 | 90.43 | 94.50 | 89.67 | 96.55 | 89.84 | 86.82 | 77.09 | 81.31 |
lithuanian-alksnis-ud-2.10-220711 | Raw text | 87.87 | 99.91 | 95.94 | 90.44 | 91.03 | 89.52 | 93.60 | 82.45 | 78.64 | 67.97 | 71.37 |
lithuanian-alksnis-ud-2.10-220711 | Gold tokenization | — | — | 96.04 | 90.52 | 91.16 | 89.63 | 93.69 | 83.70 | 79.88 | 68.98 | 72.36 |
lithuanian-hse-ud-2.10-220711 | Raw text | 97.30 | 97.30 | 89.28 | 90.21 | 83.13 | 78.38 | 88.16 | 70.27 | 61.79 | 45.67 | 54.04 |
lithuanian-hse-ud-2.10-220711 | Gold tokenization | — | — | 91.23 | 92.36 | 85.19 | 80.09 | 90.57 | 73.96 | 64.53 | 47.54 | 56.10 |
maltese-mudt-ud-2.10-220711 | Raw text | 86.29 | 99.84 | 95.80 | 95.79 | — | 95.35 | — | 84.96 | 80.07 | 68.98 | 72.86 |
maltese-mudt-ud-2.10-220711 | Gold tokenization | — | — | 95.95 | 95.92 | — | 95.48 | — | 85.65 | 80.70 | 69.40 | 73.33 |
marathi-ufal-ud-2.10-220711 | Raw text | 92.63 | 90.25 | 76.50 | — | 65.25 | 60.75 | 80.75 | 60.75 | 50.75 | 28.39 | 38.00 |
marathi-ufal-ud-2.10-220711 | Gold tokenization | — | — | 82.52 | — | 67.96 | 62.86 | 80.83 | 68.93 | 58.50 | 29.46 | 38.17 |
naija-nsc-ud-2.10-220711 | Raw text | 100.00 | 99.94 | 98.03 | — | 98.94 | 97.53 | 99.32 | 93.65 | 90.99 | 88.13 | 89.60 |
naija-nsc-ud-2.10-220711 | Gold tokenization | — | — | 98.08 | — | 99.00 | 97.58 | 99.38 | 93.75 | 91.08 | 88.21 | 89.68 |
north_sami-giella-ud-2.10-220711 | Raw text | 98.79 | 99.87 | 91.77 | 93.54 | 89.30 | 85.36 | 87.01 | 75.16 | 70.43 | 59.76 | 58.27 |
north_sami-giella-ud-2.10-220711 | Gold tokenization | — | — | 91.91 | 93.67 | 89.45 | 85.52 | 87.13 | 75.47 | 70.76 | 60.05 | 58.56 |
norwegian-bokmaal-ud-2.10-220711 | Raw text | 96.05 | 99.77 | 98.35 | — | 97.43 | 96.82 | 98.57 | 93.62 | 92.16 | 86.91 | 88.74 |
norwegian-bokmaal-ud-2.10-220711 | Gold tokenization | — | — | 98.61 | — | 97.68 | 97.07 | 98.82 | 94.40 | 92.91 | 87.59 | 89.43 |
norwegian-nynorsk-ud-2.10-220711 | Raw text | 94.17 | 99.93 | 98.24 | — | 97.34 | 96.55 | 98.40 | 93.89 | 92.18 | 86.03 | 88.36 |
norwegian-nynorsk-ud-2.10-220711 | Gold tokenization | — | — | 98.41 | — | 97.50 | 96.73 | 98.53 | 94.63 | 92.93 | 86.93 | 89.20 |
norwegian-nynorsklia-ud-2.10-220711 | Raw text | 99.53 | 99.91 | 96.61 | — | 95.71 | 93.75 | 98.05 | 81.18 | 76.61 | 66.01 | 69.68 |
norwegian-nynorsklia-ud-2.10-220711 | Gold tokenization | — | — | 96.72 | — | 95.80 | 93.85 | 98.14 | 81.42 | 76.84 | 66.23 | 69.90 |
old_church_slavonic-proiel-ud-2.10-220711 | Raw text | 41.43 | 100.00 | 96.72 | 96.90 | 90.37 | 89.19 | 93.13 | 77.71 | 73.92 | 63.82 | 68.87 |
old_church_slavonic-proiel-ud-2.10-220711 | Gold tokenization | — | — | 97.08 | 97.28 | 91.06 | 89.93 | 93.14 | 88.30 | 84.18 | 74.01 | 77.39 |
old_french-srcmf-ud-2.10-220711 | Raw text | 100.00 | 99.70 | 96.68 | 96.50 | 97.70 | 95.72 | 99.65 | 91.17 | 87.38 | 80.76 | 84.40 |
old_french-srcmf-ud-2.10-220711 | Gold tokenization | — | — | 96.99 | 96.82 | 98.01 | 96.03 | 99.95 | 91.58 | 87.82 | 81.20 | 84.85 |
old_russian-torot-ud-2.10-220711 | Raw text | 29.60 | 100.00 | 94.39 | 94.70 | 87.56 | 85.23 | 85.92 | 71.00 | 65.32 | 51.64 | 53.64 |
old_russian-torot-ud-2.10-220711 | Gold tokenization | — | — | 95.06 | 95.29 | 88.50 | 86.60 | 85.96 | 83.30 | 77.24 | 64.09 | 62.94 |
old_russian-rnc-ud-2.10-220711 | Raw text | 84.03 | 97.48 | 90.94 | 86.55 | 76.51 | 67.15 | 75.31 | 61.28 | 55.93 | 33.24 | 34.04 |
old_russian-rnc-ud-2.10-220711 | Gold tokenization | — | — | 93.29 | 88.93 | 78.48 | 68.86 | 76.77 | 67.13 | 61.08 | 37.15 | 37.24 |
old_east_slavic-birchbark-ud-2.10-220711 | Raw text | 16.73 | 99.98 | 89.24 | 99.35 | 76.11 | 72.43 | 65.88 | 63.41 | 56.50 | 32.53 | 27.14 |
old_east_slavic-birchbark-ud-2.10-220711 | Gold tokenization | — | — | 89.37 | 99.37 | 76.54 | 72.82 | 66.05 | 76.31 | 69.00 | 41.63 | 33.60 |
persian-perdt-ud-2.10-220711 | Raw text | 99.83 | 99.66 | 97.48 | 97.36 | 97.61 | 95.60 | 98.88 | 93.63 | 91.42 | 86.18 | 88.66 |
persian-perdt-ud-2.10-220711 | Gold tokenization | — | — | 97.78 | 97.65 | 97.90 | 95.89 | 99.19 | 94.18 | 91.95 | 86.72 | 89.23 |
persian-seraji-ud-2.10-220711 | Raw text | 98.75 | 99.65 | 97.91 | 97.94 | 97.95 | 97.48 | 96.52 | 91.68 | 88.84 | 84.21 | 82.83 |
persian-seraji-ud-2.10-220711 | Gold tokenization | — | — | 98.24 | 98.28 | 98.28 | 97.78 | 96.80 | 92.36 | 89.48 | 84.82 | 83.40 |
polish-pdb-ud-2.10-220711 | Raw text | 97.33 | 99.85 | 98.89 | 95.89 | 96.11 | 95.26 | 98.10 | 94.22 | 92.19 | 85.44 | 88.36 |
polish-pdb-ud-2.10-220711 | Gold tokenization | — | — | 99.05 | 96.03 | 96.24 | 95.40 | 98.24 | 94.72 | 92.69 | 85.83 | 88.78 |
polish-lfg-ud-2.10-220711 | Raw text | 99.65 | 99.85 | 99.00 | 96.08 | 96.57 | 95.16 | 98.24 | 96.86 | 95.51 | 89.80 | 92.34 |
polish-lfg-ud-2.10-220711 | Gold tokenization | — | — | 99.17 | 96.25 | 96.74 | 95.33 | 98.38 | 97.25 | 95.89 | 90.19 | 92.66 |
pomak-philotis-ud-2.10-220711 | Raw text | 94.49 | 99.98 | 98.86 | — | 95.62 | 95.30 | 96.67 | 88.24 | 83.26 | 71.19 | 74.14 |
pomak-philotis-ud-2.10-220711 | Gold tokenization | — | — | 98.90 | — | 95.65 | 95.33 | 96.69 | 88.68 | 83.75 | 71.48 | 74.42 |
portuguese-gsd-ud-2.10-220711 | Raw text | 97.28 | 99.87 | 98.51 | 98.51 | 99.74 | 98.41 | 99.27 | 94.50 | 93.41 | 88.76 | 89.96 |
portuguese-gsd-ud-2.10-220711 | Gold tokenization | — | — | 98.65 | 98.64 | 99.89 | 98.55 | 99.40 | 94.90 | 93.81 | 89.23 | 90.36 |
portuguese-bosque-ud-2.10-220711 | Raw text | 89.89 | 99.68 | 97.87 | — | 96.95 | 96.00 | 98.35 | 92.35 | 90.07 | 81.38 | 84.69 |
portuguese-bosque-ud-2.10-220711 | Gold tokenization | — | — | 98.22 | — | 97.23 | 96.28 | 98.66 | 93.50 | 91.16 | 82.47 | 85.87 |
romanian-nonstandard-ud-2.10-220711 | Raw text | 96.77 | 98.83 | 96.18 | 91.87 | 90.53 | 89.18 | 94.90 | 88.85 | 84.82 | 68.21 | 76.36 |
romanian-nonstandard-ud-2.10-220711 | Gold tokenization | — | — | 97.30 | 92.86 | 91.49 | 90.10 | 95.99 | 90.57 | 86.50 | 69.69 | 77.68 |
romanian-rrt-ud-2.10-220711 | Raw text | 95.16 | 99.71 | 97.90 | 97.21 | 97.40 | 96.98 | 97.96 | 91.97 | 88.44 | 81.66 | 83.13 |
romanian-rrt-ud-2.10-220711 | Gold tokenization | — | — | 98.19 | 97.45 | 97.65 | 97.22 | 98.22 | 92.72 | 89.13 | 82.15 | 83.70 |
romanian-simonero-ud-2.10-220711 | Raw text | 100.00 | 99.84 | 98.45 | 97.97 | 97.56 | 97.25 | 98.91 | 94.08 | 92.13 | 85.52 | 88.32 |
romanian-simonero-ud-2.10-220711 | Gold tokenization | — | — | 98.61 | 98.12 | 97.70 | 97.40 | 99.07 | 94.42 | 92.45 | 85.81 | 88.62 |
russian-syntagrus-ud-2.10-220711 | Raw text | 98.31 | 99.67 | 98.46 | — | 93.96 | 93.71 | 98.18 | 93.84 | 91.70 | 82.72 | 88.90 |
russian-syntagrus-ud-2.10-220711 | Gold tokenization | — | — | 98.79 | — | 94.28 | 94.03 | 98.46 | 94.56 | 92.39 | 83.28 | 89.44 |
russian-gsd-ud-2.10-220711 | Raw text | 96.49 | 99.50 | 98.11 | 97.55 | 94.71 | 93.61 | 97.01 | 91.44 | 88.55 | 81.04 | 84.62 |
russian-gsd-ud-2.10-220711 | Gold tokenization | — | — | 98.58 | 97.98 | 95.17 | 94.01 | 97.43 | 92.67 | 89.69 | 82.00 | 85.65 |
russian-taiga-ud-2.10-220711 | Raw text | 86.33 | 98.12 | 95.65 | — | 93.13 | 92.06 | 94.73 | 83.08 | 79.57 | 70.60 | 73.88 |
russian-taiga-ud-2.10-220711 | Gold tokenization | — | — | 97.34 | — | 94.90 | 93.72 | 96.37 | 85.64 | 81.92 | 72.82 | 76.10 |
sanskrit-vedic-ud-2.10-220711 | Raw text | 27.18 | 100.00 | 89.16 | — | 81.61 | 76.76 | 87.05 | 60.92 | 50.04 | 41.66 | 44.99 |
sanskrit-vedic-ud-2.10-220711 | Gold tokenization | — | — | 89.97 | — | 83.02 | 78.34 | 87.34 | 73.74 | 62.01 | 52.00 | 55.41 |
scottish_gaelic-arcosg-ud-2.10-220711 | Raw text | 60.89 | 97.47 | 93.78 | 89.29 | 90.91 | 88.21 | 95.08 | 81.24 | 75.60 | 62.73 | 69.22 |
scottish_gaelic-arcosg-ud-2.10-220711 | Gold tokenization | — | — | 96.62 | 92.24 | 94.02 | 91.39 | 97.59 | 87.33 | 81.65 | 69.25 | 75.23 |
serbian-set-ud-2.10-220711 | Raw text | 93.00 | 99.99 | 99.09 | 96.00 | 96.21 | 95.75 | 97.76 | 93.63 | 91.20 | 83.76 | 87.00 |
serbian-set-ud-2.10-220711 | Gold tokenization | — | — | 99.13 | 96.01 | 96.20 | 95.75 | 97.78 | 94.26 | 91.80 | 84.32 | 87.60 |
slovak-snk-ud-2.10-220711 | Raw text | 81.69 | 100.00 | 97.65 | 90.35 | 93.50 | 89.56 | 96.46 | 91.39 | 89.65 | 80.43 | 84.44 |
slovak-snk-ud-2.10-220711 | Gold tokenization | — | — | 97.88 | 90.55 | 93.69 | 89.80 | 96.50 | 93.91 | 92.08 | 82.89 | 86.95 |
slovenian-ssj-ud-2.10-220711 | Raw text | 98.95 | 99.94 | 98.97 | 96.97 | 97.15 | 96.63 | 98.58 | 93.99 | 92.60 | 86.83 | 88.91 |
slovenian-ssj-ud-2.10-220711 | Gold tokenization | — | — | 99.03 | 97.02 | 97.23 | 96.69 | 98.63 | 94.15 | 92.76 | 86.99 | 89.02 |
slovenian-sst-ud-2.10-220711 | Raw text | 23.14 | 99.85 | 94.82 | 92.71 | 92.43 | 89.84 | 97.38 | 65.69 | 60.84 | 50.88 | 54.78 |
slovenian-sst-ud-2.10-220711 | Gold tokenization | — | — | 95.62 | 93.09 | 92.84 | 90.89 | 97.56 | 78.39 | 73.07 | 63.39 | 68.33 |
spanish-ancora-ud-2.10-220711 | Raw text | 98.78 | 99.95 | 99.06 | 96.02 | 98.74 | 95.59 | 99.37 | 93.70 | 91.79 | 86.41 | 87.88 |
spanish-ancora-ud-2.10-220711 | Gold tokenization | — | — | 99.11 | 96.07 | 98.79 | 95.63 | 99.42 | 93.88 | 91.97 | 86.59 | 88.04 |
spanish-gsd-ud-2.10-220711 | Raw text | 95.62 | 99.75 | 97.15 | — | 96.94 | 95.27 | 98.72 | 91.87 | 89.57 | 78.63 | 84.25 |
spanish-gsd-ud-2.10-220711 | Gold tokenization | — | — | 97.39 | — | 97.19 | 95.53 | 98.97 | 92.66 | 90.32 | 79.43 | 85.04 |
swedish-talbanken-ud-2.10-220711 | Raw text | 96.53 | 99.84 | 98.44 | 97.33 | 97.32 | 96.51 | 98.15 | 92.23 | 89.85 | 83.92 | 85.97 |
swedish-talbanken-ud-2.10-220711 | Gold tokenization | — | — | 98.61 | 97.52 | 97.51 | 96.72 | 98.32 | 92.68 | 90.30 | 84.48 | 86.54 |
swedish-lines-ud-2.10-220711 | Raw text | 88.00 | 99.96 | 97.66 | 95.51 | 90.84 | 88.14 | 97.72 | 90.60 | 87.38 | 71.82 | 82.17 |
swedish-lines-ud-2.10-220711 | Gold tokenization | — | — | 97.73 | 95.52 | 90.87 | 88.15 | 97.76 | 91.44 | 88.19 | 72.50 | 82.95 |
tamil-ttb-ud-2.10-220711 | Raw text | 97.52 | 94.26 | 84.29 | 83.18 | 84.64 | 78.22 | 89.45 | 70.43 | 61.88 | 50.61 | 55.39 |
tamil-ttb-ud-2.10-220711 | Gold tokenization | — | — | 89.29 | 87.78 | 89.99 | 82.70 | 94.42 | 78.13 | 68.78 | 56.87 | 61.48 |
telugu-mtg-ud-2.10-220711 | Raw text | 96.62 | 99.58 | 93.63 | 93.63 | 98.61 | 93.49 | — | 90.72 | 84.63 | 77.14 | 81.14 |
telugu-mtg-ud-2.10-220711 | Gold tokenization | — | — | 94.04 | 94.04 | 99.03 | 93.90 | — | 91.68 | 85.58 | 77.98 | 81.98 |
turkish-boun-ud-2.10-220711 | Raw text | 86.93 | 98.83 | 91.56 | 92.51 | 91.72 | 86.56 | 93.23 | 78.48 | 72.40 | 59.77 | 65.11 |
turkish-boun-ud-2.10-220711 | Gold tokenization | — | — | 92.53 | 93.47 | 92.67 | 87.31 | 94.26 | 81.07 | 74.73 | 61.33 | 66.92 |
turkish-atis-ud-2.10-220711 | Raw text | 80.20 | 100.00 | 98.96 | — | 98.46 | 98.25 | 99.15 | 89.22 | 87.49 | 85.12 | 86.08 |
turkish-atis-ud-2.10-220711 | Gold tokenization | — | — | 99.02 | — | 98.52 | 98.32 | 99.13 | 91.11 | 89.30 | 86.98 | 87.93 |
turkish-framenet-ud-2.10-220711 | Raw text | 100.00 | 100.00 | 96.86 | — | 94.89 | 94.21 | 96.66 | 93.39 | 84.25 | 73.98 | 77.64 |
turkish-framenet-ud-2.10-220711 | Gold tokenization | — | — | 96.86 | — | 94.89 | 94.21 | 96.66 | 93.39 | 84.25 | 73.98 | 77.64 |
turkish-imst-ud-2.10-220711 | Raw text | 96.97 | 98.30 | 94.38 | 93.98 | 90.92 | 88.60 | 94.54 | 74.73 | 69.04 | 58.25 | 63.10 |
turkish-imst-ud-2.10-220711 | Gold tokenization | — | — | 95.94 | 95.49 | 92.40 | 89.97 | 96.13 | 78.07 | 72.09 | 60.26 | 65.33 |
turkish-kenet-ud-2.10-220711 | Raw text | 98.12 | 100.00 | 93.71 | — | 92.05 | 90.86 | 93.33 | 83.91 | 71.18 | 61.81 | 64.77 |
turkish-kenet-ud-2.10-220711 | Gold tokenization | — | — | 93.72 | — | 92.06 | 90.87 | 93.33 | 84.07 | 71.29 | 61.92 | 64.89 |
turkish-penn-ud-2.10-220711 | Raw text | 80.59 | 99.34 | 95.60 | — | 94.41 | 93.33 | 94.36 | 84.22 | 71.67 | 62.21 | 64.53 |
turkish-penn-ud-2.10-220711 | Gold tokenization | — | — | 96.30 | — | 95.11 | 94.02 | 95.01 | 86.76 | 73.91 | 63.63 | 66.02 |
turkish-tourism-ud-2.10-220711 | Raw text | 99.86 | 99.96 | 98.80 | — | 95.08 | 94.67 | 98.36 | 97.20 | 91.52 | 81.98 | 87.38 |
turkish-tourism-ud-2.10-220711 | Gold tokenization | — | — | 98.85 | — | 95.12 | 94.73 | 98.40 | 97.25 | 91.58 | 82.04 | 87.45 |
turkish_german-sagt-ud-2.10-220711 | Raw text | 99.44 | 98.91 | 90.21 | — | 80.32 | 75.60 | 90.82 | 71.14 | 60.98 | 41.12 | 51.00 |
turkish_german-sagt-ud-2.10-220711 | Gold tokenization | — | — | 91.09 | — | 80.89 | 76.08 | 91.52 | 72.69 | 62.06 | 41.64 | 51.71 |
ukrainian-iu-ud-2.10-220711 | Raw text | 96.61 | 99.81 | 97.90 | 94.35 | 94.18 | 93.12 | 97.34 | 90.61 | 88.27 | 78.92 | 83.01 |
ukrainian-iu-ud-2.10-220711 | Gold tokenization | — | — | 98.08 | 94.54 | 94.34 | 93.29 | 97.53 | 91.12 | 88.72 | 79.21 | 83.36 |
urdu-udtb-ud-2.10-220711 | Raw text | 98.31 | 100.00 | 93.91 | 92.15 | 82.83 | 78.40 | 97.41 | 88.15 | 82.49 | 56.62 | 74.68 |
urdu-udtb-ud-2.10-220711 | Gold tokenization | — | — | 93.93 | 92.17 | 82.86 | 78.43 | 97.41 | 88.23 | 82.58 | 56.67 | 74.77 |
uyghur-udt-ud-2.10-220711 | Raw text | 81.81 | 99.54 | 89.33 | 91.75 | 88.12 | 79.98 | 94.67 | 76.66 | 64.87 | 46.84 | 55.29 |
uyghur-udt-ud-2.10-220711 | Gold tokenization | — | — | 89.71 | 92.30 | 88.59 | 80.50 | 95.14 | 78.38 | 66.49 | 47.83 | 56.56 |
vietnamese-vtb-ud-2.10-220711 | Raw text | 93.46 | 85.37 | 78.21 | 76.76 | 85.12 | 76.57 | 85.16 | 52.68 | 47.84 | 41.55 | 44.29 |
vietnamese-vtb-ud-2.10-220711 | Gold tokenization | — | — | 90.36 | 88.55 | 99.72 | 88.32 | 99.59 | 72.88 | 65.41 | 58.76 | 62.51 |
welsh-ccg-ud-2.10-220711 | Raw text | 97.37 | 99.42 | 95.33 | 94.40 | 89.82 | 87.61 | 93.93 | 86.61 | 80.67 | 63.31 | 69.02 |
welsh-ccg-ud-2.10-220711 | Gold tokenization | — | — | 95.84 | 94.87 | 90.31 | 88.07 | 94.44 | 87.85 | 81.83 | 64.36 | 70.21 |
western_armenian-armtdp-ud-2.10-220711 | Raw text | 98.68 | 99.89 | 96.82 | — | 92.51 | 91.83 | 97.14 | 89.39 | 84.66 | 69.84 | 76.01 |
western_armenian-armtdp-ud-2.10-220711 | Gold tokenization | — | — | 96.90 | — | 92.60 | 91.93 | 97.22 | 89.64 | 84.89 | 70.07 | 76.23 |
wolof-wtb-ud-2.10-220711 | Raw text | 91.95 | 99.23 | 94.20 | 94.15 | 93.50 | 91.41 | 95.20 | 84.15 | 78.69 | 66.75 | 70.23 |
wolof-wtb-ud-2.10-220711 | Gold tokenization | — | — | 95.17 | 95.07 | 94.32 | 92.31 | 95.96 | 86.27 | 80.75 | 68.70 | 72.06 |
Universal Dependencies 2.6 Models
Universal Dependencies 2.6 Models are distributed under the CC BY-NC-SA licence. The models are based solely on Universal Dependencies 2.6 treebanks, and additionally use multilingual BERT.
The models require UDPipe 2.
Download
The latest version 200831 of the Universal Dependencies 2.6 models can be downloaded from LINDAT/CLARIN repository.
The models are also available in the REST service.
Acknowledgements
This work has been supported by the Ministry of Education, Youth and Sports of the Czech Republic, Project No. LM2018101 LINDAT/CLARIAH-CZ.
The models were trained on Universal Dependencies 2.6 treebanks.
For the UD treebanks which do not contain original plain text version, raw text is used to train the tokenizer instead. The plain texts were taken from the W2C -- Web to Corpus.
Finally, multilingual BERT is used to provide contextualized word embeddings.
Publications
- Milan Straka, Jana Straková, Jan Hajič (2019): Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing. In: ArXiv.org Computing Research Repository, ISSN 2331-8422, 1904.02099
- Milan Straka (2018): UDPipe 2.0 Prototype at CoNLL 2018 UD Shared Task. In: Proceedings of CoNLL 2018: The SIGNLL Conference on Computational Natural Language Learning, pp. 197-207, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-72-8
Model Description
The Universal Dependencies 2.6 models contain 99 models of 63 languages, each consisting of a tokenizer, tagger, lemmatizer and dependency parser, all trained using the UD data. We used the original train-dev-test split, but for treebanks with only train and no dev data we used last 10% of the train data as dev data. We produce models only for treebanks with at least 1000 training words.
The tokenizer is trained using the SpaceAfter=No
features. If the features
are not present in the data, they can be filled in using raw text in the
language in question.
The tagger, lemmatizer and parser are trained using gold UD data.
Model Performance
We present the tokenizer, tagger, lemmatizer and parser performance, measured on
the testing portion of the data, evaluated both on the raw text and using the
gold tokenization. The results are F1 scores measured by the conll18_ud_eval.py
script.
Model | Mode | Words | Sents | UPOS | XPOS | UFeats | AllTags | Lemma | UAS | LAS | MLAS | BLEX |
---|---|---|---|---|---|---|---|---|---|---|---|---|
afrikaans-afribooms-ud-2.6-200830 | Raw text | 99.82 | 98.25 | 98.55 | 95.42 | 98.27 | 95.33 | 97.52 | 90.34 | 87.93 | 80.33 | 79.91 |
afrikaans-afribooms-ud-2.6-200830 | Gold tokenization | — | — | 98.70 | 95.56 | 98.41 | 95.48 | 97.61 | 90.80 | 88.40 | 80.78 | 80.31 |
ancient_greek-perseus-ud-2.6-200830 | Raw text | 99.97 | 98.85 | 93.20 | 86.01 | 91.59 | 85.27 | 86.81 | 79.57 | 73.90 | 54.80 | 55.63 |
ancient_greek-perseus-ud-2.6-200830 | Gold tokenization | — | — | 93.24 | 86.03 | 91.62 | 85.30 | 86.84 | 79.74 | 74.06 | 54.92 | 55.72 |
ancient_greek-proiel-ud-2.6-200830 | Raw text | 100.00 | 48.02 | 97.73 | 98.04 | 92.36 | 91.01 | 94.71 | 79.98 | 75.99 | 60.15 | 65.88 |
ancient_greek-proiel-ud-2.6-200830 | Gold tokenization | — | — | 97.91 | 98.18 | 92.59 | 91.30 | 94.76 | 85.95 | 81.90 | 66.72 | 71.84 |
arabic-padt-ud-2.6-200830 | Raw text | 94.58 | 82.09 | 91.68 | 88.96 | 89.14 | 88.65 | 90.36 | 78.86 | 74.85 | 66.06 | 68.12 |
arabic-padt-ud-2.6-200830 | Gold tokenization | — | — | 96.87 | 94.20 | 94.36 | 93.82 | 95.23 | 87.60 | 83.14 | 74.51 | 76.09 |
armenian-armtdp-ud-2.6-200830 | Raw text | 99.34 | 97.85 | 95.64 | — | 90.30 | 88.94 | 94.45 | 85.07 | 79.97 | 66.54 | 71.61 |
armenian-armtdp-ud-2.6-200830 | Gold tokenization | — | — | 96.11 | — | 90.90 | 89.37 | 95.04 | 86.31 | 81.18 | 67.00 | 72.22 |
basque-bdt-ud-2.6-200830 | Raw text | 99.94 | 99.83 | 96.44 | — | 93.60 | 91.69 | 96.40 | 87.24 | 84.15 | 74.97 | 79.94 |
basque-bdt-ud-2.6-200830 | Gold tokenization | — | — | 96.48 | — | 93.64 | 91.72 | 96.43 | 87.30 | 84.21 | 75.00 | 79.96 |
belarusian-hse-ud-2.6-200830 | Raw text | 99.84 | 78.70 | 96.14 | 31.78 | 82.07 | 26.98 | 81.48 | 75.81 | 71.05 | 49.78 | 50.93 |
belarusian-hse-ud-2.6-200830 | Gold tokenization | — | — | 96.39 | 31.85 | 82.19 | 27.08 | 81.53 | 80.18 | 75.18 | 52.92 | 53.60 |
bulgarian-btb-ud-2.6-200830 | Raw text | 99.91 | 94.17 | 99.15 | 97.19 | 97.95 | 96.84 | 97.97 | 94.35 | 91.61 | 85.92 | 86.43 |
bulgarian-btb-ud-2.6-200830 | Gold tokenization | — | — | 99.27 | 97.30 | 98.05 | 96.95 | 98.07 | 95.17 | 92.41 | 86.62 | 87.17 |
catalan-ancora-ud-2.6-200830 | Raw text | 99.98 | 99.43 | 99.05 | 98.99 | 98.63 | 98.14 | 99.31 | 94.53 | 92.86 | 87.63 | 89.24 |
catalan-ancora-ud-2.6-200830 | Gold tokenization | — | — | 99.09 | 99.03 | 98.67 | 98.18 | 99.34 | 94.60 | 92.95 | 87.73 | 89.35 |
chinese-gsdsimp-ud-2.6-200830 | Raw text | 90.29 | 99.10 | 87.32 | 87.20 | 89.73 | 86.54 | 90.29 | 72.68 | 70.32 | 63.38 | 66.94 |
chinese-gsdsimp-ud-2.6-200830 | Gold tokenization | — | — | 96.32 | 96.15 | 99.43 | 95.50 | 99.99 | 86.89 | 83.93 | 78.52 | 82.60 |
chinese-gsd-ud-2.6-200830 | Raw text | 90.27 | 99.10 | 87.27 | 87.18 | 89.74 | 86.50 | 90.27 | 72.99 | 70.50 | 63.83 | 67.21 |
chinese-gsd-ud-2.6-200830 | Gold tokenization | — | — | 96.30 | 96.16 | 99.42 | 95.45 | 99.99 | 87.30 | 84.22 | 78.63 | 82.84 |
classical_chinese-kyoto-ud-2.6-200830 | Raw text | 99.46 | 46.22 | 90.91 | 90.91 | 93.43 | 88.00 | 99.42 | 72.75 | 67.18 | 63.67 | 66.02 |
classical_chinese-kyoto-ud-2.6-200830 | Gold tokenization | — | — | 93.55 | 93.24 | 95.01 | 90.86 | 99.96 | 85.49 | 80.20 | 76.42 | 79.25 |
coptic-scriptorium-ud-2.6-200830 | Raw text | 71.91 | 35.97 | 69.61 | 68.00 | 63.06 | 60.16 | 70.51 | 47.75 | 45.89 | 25.42 | 35.81 |
coptic-scriptorium-ud-2.6-200830 | Gold tokenization | — | — | 96.15 | 92.53 | 87.75 | 81.98 | 96.70 | 89.14 | 85.79 | 57.57 | 76.42 |
croatian-set-ud-2.6-200830 | Raw text | 99.95 | 94.41 | 98.18 | 95.91 | 96.40 | 95.27 | 97.58 | 92.20 | 88.40 | 80.16 | 83.07 |
croatian-set-ud-2.6-200830 | Gold tokenization | — | — | 98.23 | 96.00 | 96.52 | 95.38 | 97.64 | 92.72 | 88.89 | 80.66 | 83.53 |
czech-pdt-ud-2.6-200830 | Raw text | 99.93 | 93.35 | 99.23 | 97.61 | 97.59 | 97.13 | 99.09 | 93.81 | 92.03 | 87.79 | 89.88 |
czech-pdt-ud-2.6-200830 | Gold tokenization | — | — | 99.30 | 97.71 | 97.70 | 97.24 | 99.17 | 94.60 | 92.81 | 88.45 | 90.57 |
czech-cac-ud-2.6-200830 | Raw text | 99.98 | 99.68 | 99.52 | 97.33 | 97.05 | 96.64 | 98.93 | 94.31 | 92.48 | 87.56 | 89.76 |
czech-cac-ud-2.6-200830 | Gold tokenization | — | — | 99.54 | 97.36 | 97.07 | 96.67 | 98.95 | 94.37 | 92.54 | 87.63 | 89.83 |
czech-fictree-ud-2.6-200830 | Raw text | 99.99 | 98.95 | 98.68 | 95.80 | 96.79 | 95.38 | 99.20 | 94.83 | 92.66 | 85.35 | 89.58 |
czech-fictree-ud-2.6-200830 | Gold tokenization | — | — | 98.69 | 95.82 | 96.80 | 95.40 | 99.21 | 94.92 | 92.74 | 85.47 | 89.69 |
czech-cltt-ud-2.6-200830 | Raw text | 99.65 | 97.40 | 99.21 | 95.00 | 94.98 | 94.76 | 99.06 | 91.37 | 89.67 | 82.08 | 86.96 |
czech-cltt-ud-2.6-200830 | Gold tokenization | — | — | 99.49 | 95.19 | 95.16 | 94.95 | 99.30 | 91.91 | 90.21 | 82.25 | 87.31 |
danish-ddt-ud-2.6-200830 | Raw text | 99.81 | 89.78 | 98.01 | — | 97.52 | 96.72 | 97.31 | 88.56 | 86.46 | 79.62 | 81.12 |
danish-ddt-ud-2.6-200830 | Gold tokenization | — | — | 98.26 | — | 97.73 | 96.99 | 97.53 | 89.82 | 87.67 | 80.73 | 82.27 |
dutch-alpino-ud-2.6-200830 | Raw text | 99.83 | 88.59 | 97.41 | 95.98 | 97.02 | 95.36 | 97.32 | 92.79 | 90.38 | 81.53 | 83.18 |
dutch-alpino-ud-2.6-200830 | Gold tokenization | — | — | 97.57 | 96.13 | 97.18 | 95.53 | 97.46 | 93.93 | 91.53 | 82.72 | 84.42 |
dutch-lassysmall-ud-2.6-200830 | Raw text | 99.83 | 75.40 | 96.58 | 95.42 | 96.41 | 94.73 | 97.21 | 90.36 | 87.66 | 78.84 | 80.17 |
dutch-lassysmall-ud-2.6-200830 | Gold tokenization | — | — | 96.79 | 96.05 | 96.97 | 95.40 | 97.33 | 94.26 | 91.24 | 83.56 | 84.84 |
english-ewt-ud-2.6-200830 | Raw text | 98.95 | 86.60 | 96.36 | 96.06 | 96.56 | 94.88 | 97.64 | 89.55 | 87.43 | 80.50 | 83.29 |
english-ewt-ud-2.6-200830 | Gold tokenization | — | — | 97.29 | 97.03 | 97.57 | 95.84 | 98.57 | 92.24 | 90.05 | 83.33 | 86.07 |
english-gum-ud-2.6-200830 | Raw text | 99.81 | 83.66 | 96.79 | 96.76 | 97.55 | 95.88 | 97.35 | 90.02 | 87.52 | 79.41 | 80.43 |
english-gum-ud-2.6-200830 | Gold tokenization | — | — | 96.99 | 96.93 | 97.75 | 96.09 | 97.56 | 91.93 | 89.36 | 81.20 | 82.25 |
english-lines-ud-2.6-200830 | Raw text | 99.92 | 87.45 | 97.60 | 95.86 | 96.88 | 93.39 | 98.34 | 89.36 | 86.45 | 79.35 | 82.87 |
english-lines-ud-2.6-200830 | Gold tokenization | — | — | 97.67 | 95.90 | 96.92 | 93.41 | 98.41 | 90.26 | 87.36 | 80.24 | 83.79 |
english-partut-ud-2.6-200830 | Raw text | 99.72 | 100.00 | 97.37 | 97.08 | 96.29 | 95.38 | 98.23 | 94.12 | 92.09 | 83.04 | 87.20 |
english-partut-ud-2.6-200830 | Gold tokenization | — | — | 97.62 | 97.33 | 96.54 | 95.63 | 98.50 | 94.40 | 92.37 | 83.44 | 87.48 |
estonian-edt-ud-2.6-200830 | Raw text | 99.96 | 91.56 | 97.65 | 98.25 | 96.44 | 95.19 | 95.34 | 88.75 | 86.18 | 80.12 | 79.65 |
estonian-edt-ud-2.6-200830 | Gold tokenization | — | — | 97.75 | 98.29 | 96.48 | 95.28 | 95.40 | 89.66 | 87.06 | 80.93 | 80.44 |
estonian-ewt-ud-2.6-200830 | Raw text | 98.96 | 70.09 | 95.00 | 96.30 | 93.74 | 91.31 | 93.81 | 81.07 | 77.55 | 69.14 | 70.69 |
estonian-ewt-ud-2.6-200830 | Gold tokenization | — | — | 96.22 | 97.37 | 94.65 | 92.37 | 94.83 | 86.37 | 82.55 | 73.03 | 74.39 |
finnish-tdt-ud-2.6-200830 | Raw text | 99.70 | 88.64 | 97.63 | 98.25 | 96.05 | 95.11 | 92.06 | 90.11 | 88.10 | 82.04 | 77.91 |
finnish-tdt-ud-2.6-200830 | Gold tokenization | — | — | 97.97 | 98.56 | 96.37 | 95.48 | 92.38 | 91.69 | 89.63 | 83.30 | 79.18 |
finnish-ftb-ud-2.6-200830 | Raw text | 99.91 | 86.84 | 96.52 | 95.08 | 96.72 | 93.82 | 95.73 | 89.93 | 87.32 | 80.13 | 80.74 |
finnish-ftb-ud-2.6-200830 | Gold tokenization | — | — | 96.85 | 95.31 | 96.87 | 94.16 | 95.83 | 91.99 | 89.34 | 82.64 | 83.05 |
french-gsd-ud-2.6-200830 | Raw text | 98.87 | 94.67 | 97.23 | 98.86 | 96.65 | 96.00 | 97.69 | 92.77 | 90.82 | 83.14 | 86.08 |
french-gsd-ud-2.6-200830 | Gold tokenization | — | — | 98.29 | 99.99 | 97.63 | 96.94 | 98.80 | 94.46 | 92.63 | 84.72 | 87.21 |
french-sequoia-ud-2.6-200830 | Raw text | 99.09 | 87.50 | 98.33 | — | 97.25 | 96.79 | 98.16 | 93.90 | 92.45 | 86.54 | 89.25 |
french-sequoia-ud-2.6-200830 | Gold tokenization | — | — | 99.32 | — | 98.19 | 97.78 | 99.09 | 95.80 | 94.43 | 88.78 | 90.78 |
french-partut-ud-2.6-200830 | Raw text | 99.42 | 100.00 | 97.28 | 96.93 | 94.17 | 93.63 | 95.59 | 94.71 | 92.71 | 80.18 | 83.34 |
french-partut-ud-2.6-200830 | Gold tokenization | — | — | 97.89 | 97.54 | 94.74 | 94.20 | 96.20 | 95.47 | 93.62 | 81.20 | 84.28 |
french-spoken-ud-2.6-200830 | Raw text | 99.06 | 21.15 | 96.49 | 96.44 | — | 93.98 | 97.48 | 79.23 | 74.91 | 64.48 | 66.67 |
french-spoken-ud-2.6-200830 | Gold tokenization | — | — | 97.63 | 97.31 | — | 95.00 | 98.28 | 87.27 | 82.51 | 74.23 | 75.56 |
galician-ctg-ud-2.6-200830 | Raw text | 99.22 | 97.22 | 97.30 | 97.07 | 99.05 | 96.71 | 98.07 | 85.45 | 83.07 | 72.03 | 76.75 |
galician-ctg-ud-2.6-200830 | Gold tokenization | — | — | 98.04 | 97.79 | 99.83 | 97.43 | 98.82 | 87.22 | 84.73 | 74.05 | 78.78 |
galician-treegal-ud-2.6-200830 | Raw text | 98.74 | 87.99 | 95.99 | 93.58 | 94.72 | 92.63 | 96.71 | 83.26 | 79.23 | 67.54 | 71.73 |
galician-treegal-ud-2.6-200830 | Gold tokenization | — | — | 97.23 | 94.65 | 95.76 | 93.73 | 97.89 | 86.57 | 82.30 | 71.04 | 75.71 |
german-hdt-ud-2.6-200830 | Raw text | 99.91 | 92.34 | 98.51 | 98.45 | 94.09 | 93.69 | 97.23 | 96.88 | 95.96 | 84.87 | 90.41 |
german-hdt-ud-2.6-200830 | Gold tokenization | — | — | 98.62 | 98.57 | 94.21 | 93.81 | 97.32 | 97.57 | 96.67 | 85.53 | 91.10 |
german-gsd-ud-2.6-200830 | Raw text | 99.58 | 80.90 | 94.39 | 97.51 | 91.14 | 85.97 | 96.58 | 87.06 | 82.93 | 62.33 | 74.97 |
german-gsd-ud-2.6-200830 | Gold tokenization | — | — | 94.73 | 97.96 | 91.65 | 86.51 | 96.95 | 89.36 | 85.31 | 64.33 | 77.26 |
gothic-proiel-ud-2.6-200830 | Raw text | 100.00 | 31.12 | 96.39 | 96.90 | 90.18 | 88.05 | 94.70 | 74.10 | 68.48 | 55.16 | 62.26 |
gothic-proiel-ud-2.6-200830 | Gold tokenization | — | — | 96.81 | 97.26 | 91.12 | 89.28 | 94.77 | 83.73 | 77.93 | 65.37 | 70.85 |
greek-gdt-ud-2.6-200830 | Raw text | 99.87 | 90.19 | 97.99 | 98.00 | 95.57 | 94.91 | 95.53 | 93.00 | 91.16 | 81.28 | 80.73 |
greek-gdt-ud-2.6-200830 | Gold tokenization | — | — | 98.14 | 98.14 | 95.69 | 95.02 | 95.61 | 93.82 | 91.95 | 82.03 | 81.53 |
hebrew-htb-ud-2.6-200830 | Raw text | 85.04 | 99.39 | 82.79 | 82.76 | 81.31 | 80.57 | 82.97 | 69.85 | 67.39 | 54.79 | 59.16 |
hebrew-htb-ud-2.6-200830 | Gold tokenization | — | — | 97.48 | 97.48 | 96.03 | 95.36 | 97.23 | 91.83 | 89.25 | 78.52 | 81.02 |
hindi-hdtb-ud-2.6-200830 | Raw text | 100.00 | 98.90 | 97.64 | 97.29 | 94.18 | 92.32 | 98.78 | 95.32 | 92.37 | 79.24 | 87.69 |
hindi-hdtb-ud-2.6-200830 | Gold tokenization | — | — | 97.65 | 97.29 | 94.21 | 92.35 | 98.78 | 95.44 | 92.49 | 79.41 | 87.84 |
hungarian-szeged-ud-2.6-200830 | Raw text | 99.85 | 95.89 | 96.77 | — | 94.32 | 93.51 | 94.97 | 87.78 | 84.24 | 74.80 | 77.84 |
hungarian-szeged-ud-2.6-200830 | Gold tokenization | — | — | 96.87 | — | 94.45 | 93.61 | 95.09 | 88.28 | 84.73 | 75.27 | 78.26 |
indonesian-gsd-ud-2.6-200830 | Raw text | 100.00 | 94.13 | 93.89 | 94.28 | 95.55 | 89.00 | 99.61 | 86.07 | 79.97 | 69.25 | 77.74 |
indonesian-gsd-ud-2.6-200830 | Gold tokenization | — | — | 93.90 | 94.26 | 95.52 | 88.98 | 99.61 | 86.32 | 80.18 | 69.51 | 78.00 |
irish-idt-ud-2.6-200830 | Raw text | 99.71 | 97.36 | 94.35 | 94.30 | 73.43 | 70.38 | 93.18 | 84.47 | 77.88 | 40.78 | 65.74 |
irish-idt-ud-2.6-200830 | Gold tokenization | — | — | 94.59 | 94.60 | 73.65 | 70.63 | 93.41 | 84.98 | 78.30 | 40.94 | 65.87 |
italian-isdt-ud-2.6-200830 | Raw text | 99.84 | 98.76 | 98.52 | 98.44 | 98.23 | 97.66 | 98.65 | 94.77 | 93.12 | 86.91 | 87.85 |
italian-isdt-ud-2.6-200830 | Gold tokenization | — | — | 98.68 | 98.60 | 98.38 | 97.81 | 98.81 | 95.07 | 93.44 | 87.20 | 88.19 |
italian-partut-ud-2.6-200830 | Raw text | 99.73 | 100.00 | 98.41 | 98.52 | 98.27 | 97.77 | 98.74 | 96.07 | 93.90 | 87.45 | 88.95 |
italian-partut-ud-2.6-200830 | Gold tokenization | — | — | 98.54 | 98.65 | 98.38 | 97.88 | 98.93 | 96.18 | 93.98 | 87.48 | 89.15 |
italian-postwita-ud-2.6-200830 | Raw text | 99.47 | 30.49 | 96.53 | 96.28 | 96.43 | 94.89 | 96.76 | 80.97 | 76.94 | 65.79 | 67.44 |
italian-postwita-ud-2.6-200830 | Gold tokenization | — | — | 97.06 | 96.79 | 96.89 | 95.41 | 97.18 | 88.04 | 83.76 | 75.23 | 76.98 |
italian-twittiro-ud-2.6-200830 | Raw text | 99.06 | 36.80 | 95.99 | 95.86 | 95.22 | 93.37 | 94.68 | 81.69 | 77.38 | 64.34 | 65.32 |
italian-twittiro-ud-2.6-200830 | Gold tokenization | — | — | 97.01 | 96.77 | 96.14 | 94.42 | 95.50 | 87.84 | 83.43 | 71.64 | 72.68 |
italian-vit-ud-2.6-200830 | Raw text | 99.69 | 94.69 | 97.86 | 97.07 | 97.38 | 95.76 | 98.64 | 92.03 | 89.20 | 80.39 | 83.83 |
italian-vit-ud-2.6-200830 | Gold tokenization | — | — | 98.16 | 97.49 | 97.66 | 96.16 | 98.92 | 92.77 | 89.91 | 81.15 | 84.53 |
japanese-gsd-ud-2.6-200830 | Raw text | 95.34 | 94.61 | 93.67 | 93.56 | 95.32 | 92.74 | 95.02 | 85.11 | 84.01 | 76.23 | 77.83 |
japanese-gsd-ud-2.6-200830 | Gold tokenization | — | — | 98.03 | 97.71 | 99.99 | 96.83 | 99.61 | 94.73 | 93.41 | 87.64 | 89.28 |
korean-kaist-ud-2.6-200830 | Raw text | 99.95 | 100.00 | 95.89 | 87.82 | — | 87.62 | 94.23 | 89.41 | 87.58 | 82.32 | 80.34 |
korean-kaist-ud-2.6-200830 | Gold tokenization | — | — | 95.94 | 87.85 | — | 87.66 | 94.27 | 89.51 | 87.67 | 82.42 | 80.42 |
korean-gsd-ud-2.6-200830 | Raw text | 99.87 | 93.93 | 96.61 | 90.19 | 99.69 | 88.03 | 93.51 | 88.68 | 85.04 | 80.93 | 77.36 |
korean-gsd-ud-2.6-200830 | Gold tokenization | — | — | 96.74 | 90.32 | 99.82 | 88.16 | 93.64 | 89.50 | 85.84 | 81.76 | 78.14 |
latin-ittb-ud-2.6-200830 | Raw text | 99.99 | 92.44 | 98.54 | 96.35 | 96.92 | 95.12 | 98.94 | 90.31 | 88.16 | 82.19 | 85.37 |
latin-ittb-ud-2.6-200830 | Gold tokenization | — | — | 98.52 | 96.37 | 96.92 | 95.11 | 98.93 | 91.24 | 89.07 | 82.62 | 85.88 |
latin-llct-ud-2.6-200830 | Raw text | 100.00 | 99.49 | 99.60 | 97.13 | 97.11 | 96.63 | 97.68 | 95.48 | 94.35 | 89.31 | 90.44 |
latin-llct-ud-2.6-200830 | Gold tokenization | — | — | 99.60 | 97.14 | 97.11 | 96.63 | 97.68 | 95.54 | 94.40 | 89.40 | 90.53 |
latin-proiel-ud-2.6-200830 | Raw text | 99.87 | 36.81 | 96.67 | 96.81 | 90.71 | 89.59 | 96.16 | 74.44 | 69.97 | 57.51 | 64.96 |
latin-proiel-ud-2.6-200830 | Gold tokenization | — | — | 97.07 | 97.16 | 91.53 | 90.52 | 96.42 | 83.78 | 79.04 | 67.58 | 73.88 |
latin-perseus-ud-2.6-200830 | Raw text | 100.00 | 98.46 | 91.65 | 81.18 | 86.33 | 78.75 | 88.05 | 78.09 | 68.97 | 52.82 | 56.03 |
latin-perseus-ud-2.6-200830 | Gold tokenization | — | — | 91.64 | 81.17 | 86.33 | 78.74 | 88.04 | 78.21 | 69.07 | 52.84 | 55.99 |
latvian-lvtb-ud-2.6-200830 | Raw text | 99.32 | 98.74 | 96.28 | 89.64 | 93.79 | 88.84 | 95.81 | 88.31 | 85.26 | 75.23 | 79.56 |
latvian-lvtb-ud-2.6-200830 | Gold tokenization | — | — | 96.92 | 90.24 | 94.40 | 89.43 | 96.45 | 89.33 | 86.23 | 76.29 | 80.60 |
lithuanian-alksnis-ud-2.6-200830 | Raw text | 99.91 | 87.87 | 95.97 | 90.37 | 91.07 | 89.41 | 93.61 | 82.54 | 78.70 | 67.95 | 71.30 |
lithuanian-alksnis-ud-2.6-200830 | Gold tokenization | — | — | 96.04 | 90.40 | 91.18 | 89.49 | 93.70 | 83.93 | 80.08 | 69.02 | 72.43 |
lithuanian-hse-ud-2.6-200830 | Raw text | 97.30 | 97.30 | 89.66 | 89.28 | 81.45 | 77.07 | 87.98 | 70.92 | 62.53 | 44.26 | 53.76 |
lithuanian-hse-ud-2.6-200830 | Gold tokenization | — | — | 91.23 | 91.32 | 83.21 | 78.40 | 90.28 | 73.77 | 64.53 | 45.25 | 54.68 |
maltese-mudt-ud-2.6-200830 | Raw text | 99.84 | 86.29 | 95.77 | 95.66 | — | 95.30 | — | 84.76 | 79.76 | 68.39 | 72.24 |
maltese-mudt-ud-2.6-200830 | Gold tokenization | — | — | 95.88 | 95.77 | — | 95.40 | — | 85.46 | 80.38 | 68.69 | 72.66 |
marathi-ufal-ud-2.6-200830 | Raw text | 90.25 | 92.63 | 78.50 | — | 65.25 | 61.50 | 80.00 | 61.25 | 53.50 | 31.73 | 40.92 |
marathi-ufal-ud-2.6-200830 | Gold tokenization | — | — | 84.22 | — | 68.69 | 63.83 | 80.10 | 70.39 | 60.92 | 31.95 | 42.32 |
naija-nsc-ud-2.6-200830 | Raw text | 100.00 | 99.56 | 98.14 | — | 99.16 | 97.77 | 99.27 | 92.46 | 89.81 | 84.18 | 86.20 |
naija-nsc-ud-2.6-200830 | Gold tokenization | — | — | 98.14 | — | 99.16 | 97.77 | 99.27 | 92.50 | 89.84 | 84.25 | 86.26 |
north_sami-giella-ud-2.6-200830 | Raw text | 99.87 | 98.79 | 92.35 | 93.57 | 89.40 | 85.61 | 86.85 | 76.66 | 71.84 | 60.71 | 58.95 |
north_sami-giella-ud-2.6-200830 | Gold tokenization | — | — | 92.47 | 93.70 | 89.56 | 85.75 | 86.96 | 76.97 | 72.16 | 60.95 | 59.19 |
norwegian-bokmaal-ud-2.6-200830 | Raw text | 99.83 | 95.63 | 98.37 | — | 97.52 | 96.86 | 98.55 | 93.74 | 92.26 | 87.03 | 88.76 |
norwegian-bokmaal-ud-2.6-200830 | Gold tokenization | — | — | 98.57 | — | 97.71 | 97.05 | 98.75 | 94.48 | 93.00 | 87.67 | 89.43 |
norwegian-nynorsk-ud-2.6-200830 | Raw text | 99.91 | 94.11 | 98.36 | — | 97.38 | 96.67 | 98.37 | 93.86 | 92.11 | 86.07 | 88.11 |
norwegian-nynorsk-ud-2.6-200830 | Gold tokenization | — | — | 98.50 | — | 97.51 | 96.80 | 98.50 | 94.66 | 92.93 | 86.95 | 89.01 |
norwegian-nynorsklia-ud-2.6-200830 | Raw text | 99.91 | 99.53 | 96.45 | — | 95.71 | 93.62 | 98.05 | 80.90 | 76.53 | 65.74 | 69.55 |
norwegian-nynorsklia-ud-2.6-200830 | Gold tokenization | — | — | 96.55 | — | 95.79 | 93.72 | 98.14 | 81.15 | 76.76 | 65.94 | 69.80 |
old_church_slavonic-proiel-ud-2.6-200830 | Raw text | 100.00 | 41.43 | 96.58 | 96.83 | 90.44 | 89.17 | 93.19 | 77.42 | 73.57 | 63.51 | 68.53 |
old_church_slavonic-proiel-ud-2.6-200830 | Gold tokenization | — | — | 96.89 | 97.09 | 91.22 | 89.97 | 93.20 | 87.95 | 83.81 | 73.92 | 77.26 |
old_french-srcmf-ud-2.6-200830 | Raw text | 99.93 | 100.00 | 96.40 | 96.27 | 97.80 | 95.58 | — | 92.28 | 87.74 | 81.08 | 84.17 |
old_french-srcmf-ud-2.6-200830 | Gold tokenization | — | — | 96.47 | 96.33 | 97.86 | 95.64 | — | 92.36 | 87.81 | 81.17 | 84.26 |
old_russian-torot-ud-2.6-200830 | Raw text | 100.00 | 29.60 | 94.33 | 94.39 | 87.51 | 85.16 | 85.82 | 70.66 | 65.18 | 51.26 | 53.18 |
old_russian-torot-ud-2.6-200830 | Gold tokenization | — | — | 94.93 | 94.99 | 88.44 | 86.35 | 85.77 | 83.15 | 77.17 | 63.78 | 62.66 |
old_russian-rnc-ud-2.6-200830 | Raw text | 98.15 | 85.46 | 91.80 | 87.74 | 75.83 | 66.63 | 74.94 | 63.08 | 57.53 | 33.85 | 35.04 |
old_russian-rnc-ud-2.6-200830 | Gold tokenization | — | — | 93.34 | 89.43 | 77.09 | 67.76 | 76.13 | 66.86 | 60.73 | 36.05 | 37.07 |
persian-seraji-ud-2.6-200830 | Raw text | 99.65 | 98.75 | 97.69 | 97.66 | 97.75 | 97.29 | 96.67 | 91.09 | 88.15 | 83.43 | 82.26 |
persian-seraji-ud-2.6-200830 | Gold tokenization | — | — | 97.98 | 97.97 | 98.07 | 97.60 | 96.94 | 91.74 | 88.76 | 84.00 | 82.82 |
polish-pdb-ud-2.6-200830 | Raw text | 99.85 | 97.33 | 98.88 | 95.73 | 95.84 | 95.03 | 98.05 | 94.02 | 92.01 | 84.93 | 88.08 |
polish-pdb-ud-2.6-200830 | Gold tokenization | — | — | 99.04 | 95.88 | 95.99 | 95.18 | 98.19 | 94.51 | 92.50 | 85.36 | 88.52 |
polish-lfg-ud-2.6-200830 | Raw text | 99.85 | 99.65 | 98.92 | 95.99 | 96.51 | 95.06 | 98.27 | 96.89 | 95.52 | 89.73 | 92.45 |
polish-lfg-ud-2.6-200830 | Gold tokenization | — | — | 99.09 | 96.18 | 96.70 | 95.25 | 98.41 | 97.29 | 95.91 | 90.12 | 92.77 |
portuguese-gsd-ud-2.6-200830 | Raw text | 99.84 | 97.50 | 98.53 | 98.52 | 99.71 | 98.43 | 99.33 | 94.57 | 93.47 | 88.69 | 90.02 |
portuguese-gsd-ud-2.6-200830 | Gold tokenization | — | — | 98.69 | 98.69 | 99.87 | 98.59 | 99.49 | 94.94 | 93.82 | 89.11 | 90.36 |
portuguese-bosque-ud-2.6-200830 | Raw text | 99.55 | 90.64 | 97.19 | — | 96.17 | 94.79 | 97.98 | 92.32 | 89.72 | 79.29 | 84.22 |
portuguese-bosque-ud-2.6-200830 | Gold tokenization | — | — | 97.60 | — | 96.49 | 95.11 | 98.42 | 93.53 | 90.80 | 80.42 | 85.51 |
romanian-rrt-ud-2.6-200830 | Raw text | 99.69 | 95.28 | 97.79 | 97.18 | 97.32 | 96.81 | 98.20 | 91.83 | 87.56 | 80.00 | 82.17 |
romanian-rrt-ud-2.6-200830 | Gold tokenization | — | — | 98.08 | 97.44 | 97.60 | 97.08 | 98.49 | 92.74 | 88.38 | 80.82 | 82.88 |
romanian-nonstandard-ud-2.6-200830 | Raw text | 98.35 | 96.73 | 95.61 | 91.38 | 90.03 | 88.67 | 94.23 | 88.89 | 84.47 | 67.59 | 75.81 |
romanian-nonstandard-ud-2.6-200830 | Gold tokenization | — | — | 97.21 | 92.90 | 91.53 | 90.13 | 95.74 | 91.00 | 86.49 | 69.53 | 77.28 |
russian-syntagrus-ud-2.6-200830 | Raw text | 99.60 | 98.80 | 98.86 | — | 97.60 | 97.38 | 98.33 | 94.22 | 92.97 | 89.27 | 90.35 |
russian-syntagrus-ud-2.6-200830 | Gold tokenization | — | — | 99.27 | — | 97.98 | 97.76 | 98.68 | 94.99 | 93.72 | 89.90 | 90.95 |
russian-gsd-ud-2.6-200830 | Raw text | 99.50 | 96.22 | 98.03 | 97.51 | 94.76 | 93.60 | 96.89 | 91.66 | 88.38 | 80.67 | 84.18 |
russian-gsd-ud-2.6-200830 | Gold tokenization | — | — | 98.49 | 97.98 | 95.17 | 93.97 | 97.27 | 92.77 | 89.43 | 81.44 | 85.05 |
russian-taiga-ud-2.6-200830 | Raw text | 97.16 | 82.69 | 94.13 | 95.72 | 90.01 | 87.50 | 93.05 | 81.17 | 76.99 | 65.28 | 69.94 |
russian-taiga-ud-2.6-200830 | Gold tokenization | — | — | 96.47 | 98.56 | 92.72 | 89.87 | 95.68 | 85.57 | 80.81 | 68.93 | 73.90 |
sanskrit-vedic-ud-2.6-200830 | Raw text | 100.00 | 27.18 | 89.50 | — | 81.72 | 77.12 | 87.11 | 60.79 | 49.75 | 41.65 | 44.67 |
sanskrit-vedic-ud-2.6-200830 | Gold tokenization | — | — | 90.01 | — | 83.11 | 78.58 | 87.24 | 73.34 | 61.55 | 51.87 | 54.91 |
scottish_gaelic-arcosg-ud-2.6-200830 | Raw text | 99.58 | 55.57 | 93.63 | 87.07 | 89.78 | 85.43 | 95.41 | 77.66 | 71.86 | 55.15 | 60.51 |
scottish_gaelic-arcosg-ud-2.6-200830 | Gold tokenization | — | — | 94.26 | 87.84 | 90.23 | 86.30 | 95.85 | 83.77 | 77.61 | 62.05 | 68.26 |
serbian-set-ud-2.6-200830 | Raw text | 99.99 | 93.00 | 98.98 | 95.75 | 95.92 | 95.35 | 97.82 | 93.66 | 91.18 | 83.18 | 86.80 |
serbian-set-ud-2.6-200830 | Gold tokenization | — | — | 99.01 | 95.78 | 95.94 | 95.39 | 97.83 | 94.33 | 91.82 | 83.84 | 87.45 |
slovak-snk-ud-2.6-200830 | Raw text | 100.00 | 85.28 | 97.19 | 87.79 | 92.66 | 86.71 | 96.52 | 91.71 | 89.60 | 78.75 | 84.54 |
slovak-snk-ud-2.6-200830 | Gold tokenization | — | — | 97.30 | 88.06 | 92.84 | 86.98 | 96.60 | 93.68 | 91.57 | 80.55 | 86.59 |
slovenian-ssj-ud-2.6-200830 | Raw text | 97.99 | 67.98 | 96.93 | 94.35 | 94.56 | 93.95 | 96.59 | 88.09 | 86.65 | 80.90 | 83.58 |
slovenian-ssj-ud-2.6-200830 | Gold tokenization | — | — | 98.86 | 96.44 | 96.69 | 96.01 | 98.54 | 94.41 | 92.96 | 86.68 | 89.30 |
slovenian-sst-ud-2.6-200830 | Raw text | 99.85 | 23.14 | 94.70 | 92.70 | 92.52 | 89.74 | 97.14 | 64.23 | 59.57 | 49.25 | 52.95 |
slovenian-sst-ud-2.6-200830 | Gold tokenization | — | — | 95.71 | 93.11 | 92.94 | 90.90 | 97.46 | 77.81 | 72.24 | 62.71 | 67.18 |
spanish-ancora-ud-2.6-200830 | Raw text | 99.95 | 98.32 | 99.09 | 99.02 | 98.87 | 98.33 | 99.36 | 93.62 | 91.78 | 86.82 | 88.06 |
spanish-ancora-ud-2.6-200830 | Gold tokenization | — | — | 99.14 | 99.06 | 98.91 | 98.37 | 99.40 | 93.83 | 91.97 | 87.01 | 88.24 |
spanish-gsd-ud-2.6-200830 | Raw text | 99.76 | 94.54 | 97.17 | — | 97.05 | 95.32 | 98.80 | 92.00 | 89.70 | 79.23 | 84.49 |
spanish-gsd-ud-2.6-200830 | Gold tokenization | — | — | 97.40 | — | 97.27 | 95.54 | 99.02 | 92.73 | 90.38 | 79.93 | 85.13 |
swedish-talbanken-ud-2.6-200830 | Raw text | 99.89 | 96.13 | 98.41 | 97.26 | 97.33 | 96.46 | 98.19 | 91.99 | 89.68 | 83.63 | 85.82 |
swedish-talbanken-ud-2.6-200830 | Gold tokenization | — | — | 98.51 | 97.38 | 97.44 | 96.57 | 98.30 | 92.46 | 90.14 | 84.12 | 86.33 |
swedish-lines-ud-2.6-200830 | Raw text | 99.96 | 87.20 | 97.71 | 95.47 | 90.89 | 88.10 | 97.76 | 89.14 | 85.80 | 71.44 | 81.67 |
swedish-lines-ud-2.6-200830 | Gold tokenization | — | — | 97.75 | 95.48 | 90.91 | 88.09 | 97.79 | 89.91 | 86.52 | 72.13 | 82.48 |
tamil-ttb-ud-2.6-200830 | Raw text | 94.51 | 97.52 | 88.39 | 82.92 | 85.30 | 82.11 | 89.15 | 70.28 | 64.91 | 54.93 | 58.46 |
tamil-ttb-ud-2.6-200830 | Gold tokenization | — | — | 93.36 | 87.28 | 90.10 | 86.22 | 93.97 | 78.03 | 71.79 | 61.09 | 64.80 |
telugu-mtg-ud-2.6-200830 | Raw text | 99.58 | 96.62 | 93.63 | 93.63 | 98.48 | 93.63 | — | 90.17 | 83.52 | 76.00 | 79.62 |
telugu-mtg-ud-2.6-200830 | Gold tokenization | — | — | 94.04 | 94.04 | 98.89 | 94.04 | — | 91.12 | 84.47 | 76.84 | 80.46 |
turkish-imst-ud-2.6-200830 | Raw text | 98.30 | 96.97 | 94.48 | 93.69 | 92.06 | 89.95 | 94.41 | 72.63 | 66.80 | 58.31 | 61.57 |
turkish-imst-ud-2.6-200830 | Gold tokenization | — | — | 96.10 | 95.32 | 93.66 | 91.50 | 96.00 | 76.10 | 69.93 | 60.33 | 63.83 |
ukrainian-iu-ud-2.6-200830 | Raw text | 99.81 | 96.61 | 97.89 | 94.22 | 94.18 | 93.13 | 97.39 | 90.59 | 88.24 | 78.76 | 83.19 |
ukrainian-iu-ud-2.6-200830 | Gold tokenization | — | — | 98.10 | 94.42 | 94.34 | 93.30 | 97.56 | 91.11 | 88.75 | 79.11 | 83.59 |
urdu-udtb-ud-2.6-200830 | Raw text | 100.00 | 98.31 | 94.10 | 92.27 | 82.89 | 78.41 | 97.38 | 88.27 | 82.63 | 56.79 | 74.77 |
urdu-udtb-ud-2.6-200830 | Gold tokenization | — | — | 94.08 | 92.26 | 82.92 | 78.43 | 97.39 | 88.37 | 82.74 | 56.90 | 74.92 |
uyghur-udt-ud-2.6-200830 | Raw text | 99.54 | 81.81 | 89.24 | 91.70 | 88.47 | 80.04 | 94.76 | 76.58 | 64.72 | 46.67 | 55.08 |
uyghur-udt-ud-2.6-200830 | Gold tokenization | — | — | 89.67 | 92.21 | 88.92 | 80.47 | 95.27 | 78.39 | 66.27 | 47.53 | 56.23 |
vietnamese-vtb-ud-2.6-200830 | Raw text | 85.37 | 93.46 | 78.19 | 76.69 | 85.11 | 76.53 | 85.15 | 52.80 | 47.90 | 41.56 | 44.31 |
vietnamese-vtb-ud-2.6-200830 | Gold tokenization | — | — | 90.56 | 88.69 | 99.72 | 88.47 | 99.58 | 72.63 | 65.26 | 58.85 | 62.42 |
welsh-ccg-ud-2.6-200830 | Raw text | 99.42 | 96.28 | 94.02 | 92.96 | 89.04 | 86.39 | 92.88 | 85.79 | 79.16 | 60.98 | 66.81 |
welsh-ccg-ud-2.6-200830 | Gold tokenization | — | — | 94.54 | 93.51 | 89.52 | 86.83 | 93.46 | 87.04 | 80.35 | 62.14 | 68.05 |
wolof-wtb-ud-2.6-200830 | Raw text | 99.23 | 91.95 | 94.25 | 94.12 | 93.37 | 91.19 | 95.22 | 83.79 | 78.59 | 66.50 | 70.09 |
wolof-wtb-ud-2.6-200830 | Gold tokenization | — | — | 95.19 | 95.03 | 94.22 | 92.10 | 95.97 | 85.98 | 80.75 | 68.61 | 72.08 |
EvaLatin 2020 Models
EvaLatin 2020 Models are distributed under the CC BY-NC-SA licence. The models are based solely on EvaLatin 2020 treebanks, and additionally use multilingual BERT.
The models require UDPipe 2.
Download
The latest version 200831 of the EvaLatin 2020 models can be downloaded from LINDAT/CLARIN repository.
The models are also available in the REST service.
Acknowledgements
This work was supported by the grant no. GX20-16819X of the Grant Agency of the Czech Republic, and has been using language resources stored and distributed by the LINDAT/CLARIAH-CZ project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2018101).
The models were trained on EvaLatin 2020 treebanks.
Finally, multilingual BERT is used to provide contextualized word embeddings.
Publications
- Milan Straka, Jana Straková (2020): UDPipe at EvaLatin 2020: Contextualized Embeddings and Treebank Embeddings. In: ArXiv.org Computing Research Repository, ISSN 2331-8422, 2006.03687
Model Performance
Model | Dataset | UPOS | Lemma |
---|---|---|---|
latin-evalatin20-200830 | test classical | 96.73 | 96.39 |
latin-evalatin20-200830 | test cross-genre | 90.47 | 86.89 |
latin-evalatin20-200830 | test cross-time | 87.58 | 90.59 |