Number of unique verbs (predicates) in each treebank. Plain predicates consist of a lemma alone. Non-plain predicates are various kinds of compound verbs, including verbs with particles (separable prefixes) and pronominal verbs with a lexicalized reflexive marker. The counts are slightly biased because a predicate occasionally belongs to two categories. For example, Dutch mee|brengen zich “bring themselves” belongs to two non-plain categories: compound:prt and expl:pv.
Treebank | Predicates (total) | Plain | Non-plain (by relation) |
---|---|---|---|
Afrikaans AfriBooms | 866 | 750 | compound:prt 116 |
Akkadian PISANDUB | 97 | 91 | compound 6 |
Amharic ATT | 660 | 621 | compound 15, compound:svc 24 |
Ancient Greek PROIEL | 2861 | 2861 | |
Ancient Greek Perseus | 3923 | 3923 | |
Arabic PADT | 1569 | 1569 | |
Arabic PUD | 747 | 533 | compound:prt 214 |
Armenian ArmTDP | 1202 | 965 | compound 2, compound:lvc 209, compound:svc 26 |
Assyrian AS | 52 | 52 | |
Basque BDT | 1111 | 1016 | compound 95 |
Belarusian HSE | 587 | 587 | |
Breton KEB | 282 | 282 | |
Bulgarian BTB | 2780 | 2780 | |
Buryat BDT | 405 | 393 | compound 12 |
Catalan AnCora | 2066 | 1970 | compound 96 |
Chinese CFL | 480 | 368 | compound:dir 25, compound:ext 6, compound:vo 22, compound:vv 59 |
Chinese GSD | 4642 | 4642 | |
Classical Chinese Kyoto | 1443 | 1370 | compound:redup 73 |
Coptic Scriptorium | 556 | 552 | compound 4 |
Croatian SET | 2732 | 2026 | compound 35, expl:pv 671 |
Czech CAC | 4340 | 3294 | expl:pv 1046 |
Czech CLTT | 282 | 265 | expl:pv 17 |
Czech FicTree | 4038 | 3057 | expl:pv 981 |
Czech PDT | 6513 | 5125 | expl:pv 1388 |
Czech PUD | 777 | 637 | compound 8, expl:pv 132 |
Danish DDT | 1746 | 1528 | compound:prt 218 |
Dutch Alpino | 3385 | 2353 | compound:prt 831, expl:pv 201 |
Dutch LassySmall | 1656 | 1217 | compound:prt 358, expl:pv 81 |
English EWT | 2454 | 2013 | compound 102, compound:prt 339 |
English GUM | 1908 | 1635 | compound 11, compound:prt 262 |
English LinES | 1667 | 1434 | compound 20, compound:prt 213 |
English PUD | 784 | 717 | compound 10, compound:prt 57 |
English ParTUT | 1075 | 1021 | compound:prt 54 |
Erzya JR | 785 | 775 | compound 3, compound:svc 7 |
Estonian EDT | 3571 | 2300 | compound 15, compound:prt 1256 |
Estonian EWT | 877 | 643 | compound:prt 234 |
Faroese OFT | 218 | 218 | |
Finnish FTB | 2721 | 2557 | compound:prt 164 |
Finnish PUD | 616 | 608 | compound:prt 8 |
Finnish TDT | 2542 | 2484 | compound:prt 58 |
French FQB | 383 | 383 | |
French GSD | 2235 | 2235 | |
French ParTUT | 641 | 641 | |
French Sequoia | 980 | 980 | |
French Spoken | 638 | 637 | compound 1 |
Galician CTG | 1625 | 1625 | |
Galician TreeGal | 693 | 692 | compound 1 |
German GSD | 3186 | 2634 | compound 18, compound:prt 519, expl:pv 15 |
German HDT | 6652 | 4707 | compound:prt 1388, expl:pv 557 |
German LIT | 998 | 937 | compound:prt 61 |
German PUD | 806 | 710 | compound 3, compound:prt 92, expl:pv 1 |
Gothic PROIEL | 1163 | 1163 | |
Greek GDT | 952 | 952 | |
Hebrew HTB | 1918 | 1840 | compound:affix 15, compound:smixut 63 |
Hindi HDTB | 3672 | 563 | compound 3109 |
Hungarian Szeged | 1457 | 1149 | compound:preverb 308 |
Indonesian GSD | 2869 | 2695 | compound 174 |
Irish IDT | 289 | 254 | compound 33, compound:prt 2 |
Italian ISDT | 2143 | 2143 | |
Italian PUD | 677 | 677 | |
Italian ParTUT | 973 | 973 | |
Italian PoSTWITA | 1330 | 1328 | compound 2 |
Italian VIT | 2060 | 2060 | |
Japanese GSD | 3426 | 3360 | compound 66 |
Japanese Modern | 509 | 455 | compound 54 |
Japanese PUD | 946 | 938 | compound 8 |
Karelian KKPP | 178 | 176 | compound 2 |
Kazakh KTB | 427 | 420 | compound 1, compound:lvc 6 |
Komi Zyrian IKDP | 108 | 108 | |
Komi Zyrian Lattice | 213 | 213 | |
Korean GSD | 9120 | 9120 | |
Korean Kaist | 21647 | 20315 | compound 1332 |
Kurmanji MG | 246 | 172 | compound 4, compound:lvc 70 |
Latin ITTB | 1267 | 1267 | |
Latin PROIEL | 2190 | 2190 | |
Latin Perseus | 1403 | 1403 | |
Latvian LVTB | 3719 | 3718 | compound 1 |
Lithuanian ALKSNIS | 2010 | 2010 | |
Lithuanian HSE | 359 | 359 | |
Marathi UFAL | 185 | 122 | compound:lvc 36, compound:svc 27 |
Mbya Guarani Thomas | 102 | 85 | compound:svc 17 |
Naija NSC | 344 | 232 | compound 4, compound:prt 18, compound:svc 90 |
North Sami Giella | 786 | 786 | |
Norwegian Bokmaal | 3052 | 1981 | compound 2, compound:prt 1069 |
Norwegian Nynorsk | 2725 | 1649 | compound 1, compound:prt 1075 |
Norwegian NynorskLIA | 899 | 423 | compound:prt 476 |
Old Church Slavonic PROIEL | 1108 | 1108 | |
Old Russian RNC | 433 | 433 | |
Old Russian TOROT | 2619 | 2619 | |
Polish LFG | 3960 | 3067 | expl:pv 893 |
Polish PDB | 5875 | 4265 | expl:pv 1610 |
Polish PUD | 756 | 610 | expl:pv 146 |
Portuguese Bosque | 1939 | 1939 | |
Romanian Nonstandard | 2756 | 1199 | compound 40, expl:pv 1517 |
Romanian RRT | 2673 | 1777 | expl:pv 896 |
Russian GSD | 2046 | 2046 | |
Russian PUD | 875 | 875 | |
Russian SynTagRus | 7841 | 7832 | compound 9 |
Russian Taiga | 2044 | 2044 | |
Sanskrit UFAL | 155 | 155 | |
Serbian SET | 1555 | 1092 | compound 463 |
Slovak SNK | 2884 | 2194 | compound 5, expl:pv 685 |
Slovenian SSJ | 2365 | 2365 | |
Slovenian SST | 726 | 726 | |
Spanish AnCora | 2499 | 2414 | compound 85 |
Spanish GSD | 4177 | 4174 | compound 3 |
Swedish LinES | 1881 | 1458 | compound:prt 423 |
Swedish PUD | 716 | 596 | compound 1, compound:prt 119 |
Swedish Talbanken | 1604 | 1174 | compound:prt 430 |
Tagalog TRG | 17 | 17 | |
Tamil TTB | 318 | 306 | compound 11, compound:prt 1 |
Turkish GB | 574 | 458 | compound 108, compound:redup 8 |
Turkish IMST | 955 | 800 | compound 117, compound:lvc 15, compound:redup 23 |
Ukrainian IU | 3659 | 3648 | compound 2, compound:svc 9 |
Upper Sorbian UFAL | 369 | 324 | expl:pv 45 |
Urdu UDTB | 2185 | 355 | compound 1830 |
Vietnamese VTB | 1859 | 1844 | compound 15 |
Warlpiri UFAL | 23 | 23 | |
Welsh CCG | 119 | 119 | |
Wolof WTB | 1581 | 1476 | compound 3, compound:prt 41, compound:svc 61 |
Yoruba YTB | 140 | 92 | compound 3, compound:prt 4, compound:svc 41 |
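The distinction between plain and non-plain predicates in the caption can be illustrated with a minimal Python sketch over CoNLL-U input. It assumes that a verb (UPOS `VERB`) forms a non-plain predicate whenever it has a dependent attached by `compound`, any `compound:*` subtype, or `expl:pv`. The helper names, the `|`-joined predicate key and the toy Dutch sample are hypothetical; this is not the procedure used to produce the table. Because the caption does not say how predicates carrying two relations were tallied, the sketch simply counts such a predicate once under each of its relations.

```python
from collections import Counter

# Toy Dutch CoNLL-U sample (hypothetical, for illustration only).
SAMPLE = (
    "# text = Hij brengt zich mee.\n"
    "1\tHij\thij\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbrengt\tbrengen\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\tzich\tzich\tPRON\t_\t_\t2\texpl:pv\t_\t_\n"
    "4\tmee\tmee\tADP\t_\t_\t2\tcompound:prt\t_\t_\n"
    "5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
    "# text = Hij slaapt.\n"
    "1\tHij\thij\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tslaapt\tslapen\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
)


def is_predicate_relation(deprel: str) -> bool:
    """Assumption: compound, any compound:* subtype and expl:pv form non-plain predicates."""
    return deprel in ("compound", "expl:pv") or deprel.startswith("compound:")


def read_sentences(conllu: str):
    """Yield each sentence as a list of token rows (10 tab-separated fields)."""
    sentence = []
    for line in conllu.splitlines():
        if not line.strip():
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            fields = line.split("\t")
            # Keep only syntactic words: skip multiword ranges ("1-2")
            # and empty nodes ("1.1").
            if fields[0].isdigit():
                sentence.append(fields)
    if sentence:
        yield sentence


def extract_predicates(conllu: str):
    """Yield (predicate key, set of non-plain relations) for every verb token."""
    for sentence in read_sentences(conllu):
        for token in sentence:
            tok_id, lemma, upos = token[0], token[2], token[3]
            if upos != "VERB":
                continue
            # Dependents of this verb (HEAD = column 7, DEPREL = column 8)
            # attached by a predicate-forming relation, in surface order.
            parts = [(t[7], t[2]) for t in sentence
                     if t[6] == tok_id and is_predicate_relation(t[7])]
            # Hypothetical key format: verb lemma followed by dependent lemmas.
            key = "|".join([lemma] + [dep_lemma for _, dep_lemma in parts])
            yield key, frozenset(rel for rel, _ in parts)


def count_predicates(conllu: str):
    """Return (unique predicates, plain predicates, Counter of non-plain relations)."""
    unique = {}
    for key, relations in extract_predicates(conllu):
        unique.setdefault(key, set()).update(relations)
    plain = sum(1 for rels in unique.values() if not rels)
    # A predicate with two relations is counted under both of them.
    non_plain = Counter(rel for rels in unique.values() for rel in rels)
    return len(unique), plain, non_plain


if __name__ == "__main__":
    total, plain, non_plain = count_predicates(SAMPLE)
    print(total, plain, dict(non_plain))
```

On the toy sample, the sketch finds two unique predicates: slapen is plain, while the brengen predicate with zich and mee is counted under both compound:prt and expl:pv, so the plain and relation counts add up to more than the total. This is one way the slight bias mentioned in the caption can arise.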