Verbal Predicates in Deep UD Treebanks

Number of unique verbs (predicates) in each treebank. Plain predicates are simply lemmas. Non-plain predicates are various kinds of compound verbs, including verbs with particles (separable prefixes) and pronominal verbs with a lexicalized reflexive marker. The per-category counts are slightly biased because a predicate occasionally belongs to two categories and is then counted in both. For example, Dutch mee|brengen zich “bring themselves” belongs to two non-plain categories: compound:prt and expl:pv.
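The classification above can be illustrated with a minimal Python sketch. It assumes that each verb comes with its lemma and with the lemma and dependency relation of each of its children, as in CoNLL-U. The function names, the input structure, and the "particle|lemma reflexive" key format (modelled on mee|brengen zich) are hypothetical. This is not the tooling that produced the table.

```python
from collections import defaultdict


def predicate_key(verb, children):
    """Build a predicate string and its set of non-plain categories.

    HYPOTHETICAL format: compound-type dependents are joined to the verb
    lemma with '|', and expl:pv reflexives are appended after a space.
    """
    compounds = [c for c in children if c["deprel"].startswith("compound")]
    reflexives = [c for c in children if c["deprel"] == "expl:pv"]
    key = "|".join([c["lemma"] for c in compounds] + [verb["lemma"]])
    if reflexives:
        key += " " + " ".join(c["lemma"] for c in reflexives)
    categories = {c["deprel"] for c in compounds + reflexives}
    return key, categories


def count_predicates(verbs):
    """verbs: iterable of (verb, children) pairs.

    Returns (unique predicates, unique plain predicates, per-category counts).
    """
    all_preds, plain = set(), set()
    by_category = defaultdict(set)
    for verb, children in verbs:
        key, categories = predicate_key(verb, children)
        all_preds.add(key)
        if not categories:
            plain.add(key)
        for cat in categories:
            # A predicate in two categories is counted in both.
            by_category[cat].add(key)
    return len(all_preds), len(plain), {c: len(s) for c, s in by_category.items()}


# Toy Dutch input: brengen, mee|brengen, mee|brengen zich.
sample = [
    ({"lemma": "brengen"}, [{"lemma": "ik", "deprel": "nsubj"}]),
    ({"lemma": "brengen"}, [{"lemma": "mee", "deprel": "compound:prt"}]),
    ({"lemma": "brengen"}, [{"lemma": "mee", "deprel": "compound:prt"},
                            {"lemma": "zich", "deprel": "expl:pv"}]),
]
print(count_predicates(sample))
```

In this toy input, mee|brengen zich contributes to both compound:prt and expl:pv. As a result, Plain plus the category counts (1 + 2 + 1) exceeds the three unique predicates, which is the kind of bias described above.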

| Treebank | Predicates | Plain | Non-plain |
|---|---:|---:|---|
| Afrikaans AfriBooms | 866 | 750 | compound:prt 116 |
| Akkadian PISANDUB | 97 | 91 | compound 6 |
| Amharic ATT | 660 | 621 | compound 15, compound:svc 24 |
| Ancient Greek PROIEL | 2861 | 2861 | |
| Ancient Greek Perseus | 3923 | 3923 | |
| Arabic PADT | 1569 | 1569 | |
| Arabic PUD | 747 | 533 | compound:prt 214 |
| Armenian ArmTDP | 1202 | 965 | compound 2, compound:lvc 209, compound:svc 26 |
| Assyrian AS | 52 | 52 | |
| Basque BDT | 1111 | 1016 | compound 95 |
| Belarusian HSE | 587 | 587 | |
| Breton KEB | 282 | 282 | |
| Bulgarian BTB | 2780 | 2780 | |
| Buryat BDT | 405 | 393 | compound 12 |
| Catalan AnCora | 2066 | 1970 | compound 96 |
| Chinese CFL | 480 | 368 | compound:dir 25, compound:ext 6, compound:vo 22, compound:vv 59 |
| Chinese GSD | 4642 | 4642 | |
| Classical Chinese Kyoto | 1443 | 1370 | compound:redup 73 |
| Coptic Scriptorium | 556 | 552 | compound 4 |
| Croatian SET | 2732 | 2026 | compound 35, expl:pv 671 |
| Czech CAC | 4340 | 3294 | expl:pv 1046 |
| Czech CLTT | 282 | 265 | expl:pv 17 |
| Czech FicTree | 4038 | 3057 | expl:pv 981 |
| Czech PDT | 6513 | 5125 | expl:pv 1388 |
| Czech PUD | 777 | 637 | compound 8, expl:pv 132 |
| Danish DDT | 1746 | 1528 | compound:prt 218 |
| Dutch Alpino | 3385 | 2353 | compound:prt 831, expl:pv 201 |
| Dutch LassySmall | 1656 | 1217 | compound:prt 358, expl:pv 81 |
| English EWT | 2454 | 2013 | compound 102, compound:prt 339 |
| English GUM | 1908 | 1635 | compound 11, compound:prt 262 |
| English LinES | 1667 | 1434 | compound 20, compound:prt 213 |
| English PUD | 784 | 717 | compound 10, compound:prt 57 |
| English ParTUT | 1075 | 1021 | compound:prt 54 |
| Erzya JR | 785 | 775 | compound 3, compound:svc 7 |
| Estonian EDT | 3571 | 2300 | compound 15, compound:prt 1256 |
| Estonian EWT | 877 | 643 | compound:prt 234 |
| Faroese OFT | 218 | 218 | |
| Finnish FTB | 2721 | 2557 | compound:prt 164 |
| Finnish PUD | 616 | 608 | compound:prt 8 |
| Finnish TDT | 2542 | 2484 | compound:prt 58 |
| French FQB | 383 | 383 | |
| French GSD | 2235 | 2235 | |
| French ParTUT | 641 | 641 | |
| French Sequoia | 980 | 980 | |
| French Spoken | 638 | 637 | compound 1 |
| Galician CTG | 1625 | 1625 | |
| Galician TreeGal | 693 | 692 | compound 1 |
| German GSD | 3186 | 2634 | compound 18, compound:prt 519, expl:pv 15 |
| German HDT | 6652 | 4707 | compound:prt 1388, expl:pv 557 |
| German LIT | 998 | 937 | compound:prt 61 |
| German PUD | 806 | 710 | compound 3, compound:prt 92, expl:pv 1 |
| Gothic PROIEL | 1163 | 1163 | |
| Greek GDT | 952 | 952 | |
| Hebrew HTB | 1918 | 1840 | compound:affix 15, compound:smixut 63 |
| Hindi HDTB | 3672 | 563 | compound 3109 |
| Hungarian Szeged | 1457 | 1149 | compound:preverb 308 |
| Indonesian GSD | 2869 | 2695 | compound 174 |
| Irish IDT | 289 | 254 | compound 33, compound:prt 2 |
| Italian ISDT | 2143 | 2143 | |
| Italian PUD | 677 | 677 | |
| Italian ParTUT | 973 | 973 | |
| Italian PoSTWITA | 1330 | 1328 | compound 2 |
| Italian VIT | 2060 | 2060 | |
| Japanese GSD | 3426 | 3360 | compound 66 |
| Japanese Modern | 509 | 455 | compound 54 |
| Japanese PUD | 946 | 938 | compound 8 |
| Karelian KKPP | 178 | 176 | compound 2 |
| Kazakh KTB | 427 | 420 | compound 1, compound:lvc 6 |
| Komi Zyrian IKDP | 108 | 108 | |
| Komi Zyrian Lattice | 213 | 213 | |
| Korean GSD | 9120 | 9120 | |
| Korean Kaist | 21647 | 20315 | compound 1332 |
| Kurmanji MG | 246 | 172 | compound 4, compound:lvc 70 |
| Latin ITTB | 1267 | 1267 | |
| Latin PROIEL | 2190 | 2190 | |
| Latin Perseus | 1403 | 1403 | |
| Latvian LVTB | 3719 | 3718 | compound 1 |
| Lithuanian ALKSNIS | 2010 | 2010 | |
| Lithuanian HSE | 359 | 359 | |
| Marathi UFAL | 185 | 122 | compound:lvc 36, compound:svc 27 |
| Mbya Guarani Thomas | 102 | 85 | compound:svc 17 |
| Naija NSC | 344 | 232 | compound 4, compound:prt 18, compound:svc 90 |
| North Sami Giella | 786 | 786 | |
| Norwegian Bokmaal | 3052 | 1981 | compound 2, compound:prt 1069 |
| Norwegian Nynorsk | 2725 | 1649 | compound 1, compound:prt 1075 |
| Norwegian NynorskLIA | 899 | 423 | compound:prt 476 |
| Old Church Slavonic PROIEL | 1108 | 1108 | |
| Old Russian RNC | 433 | 433 | |
| Old Russian TOROT | 2619 | 2619 | |
| Polish LFG | 3960 | 3067 | expl:pv 893 |
| Polish PDB | 5875 | 4265 | expl:pv 1610 |
| Polish PUD | 756 | 610 | expl:pv 146 |
| Portuguese Bosque | 1939 | 1939 | |
| Romanian Nonstandard | 2756 | 1199 | compound 40, expl:pv 1517 |
| Romanian RRT | 2673 | 1777 | expl:pv 896 |
| Russian GSD | 2046 | 2046 | |
| Russian PUD | 875 | 875 | |
| Russian SynTagRus | 7841 | 7832 | compound 9 |
| Russian Taiga | 2044 | 2044 | |
| Sanskrit UFAL | 155 | 155 | |
| Serbian SET | 1555 | 1092 | compound 463 |
| Slovak SNK | 2884 | 2194 | compound 5, expl:pv 685 |
| Slovenian SSJ | 2365 | 2365 | |
| Slovenian SST | 726 | 726 | |
| Spanish AnCora | 2499 | 2414 | compound 85 |
| Spanish GSD | 4177 | 4174 | compound 3 |
| Swedish LinES | 1881 | 1458 | compound:prt 423 |
| Swedish PUD | 716 | 596 | compound 1, compound:prt 119 |
| Swedish Talbanken | 1604 | 1174 | compound:prt 430 |
| Tagalog TRG | 17 | 17 | |
| Tamil TTB | 318 | 306 | compound 11, compound:prt 1 |
| Turkish GB | 574 | 458 | compound 108, compound:redup 8 |
| Turkish IMST | 955 | 800 | compound 117, compound:lvc 15, compound:redup 23 |
| Ukrainian IU | 3659 | 3648 | compound 2, compound:svc 9 |
| Upper Sorbian UFAL | 369 | 324 | expl:pv 45 |
| Urdu UDTB | 2185 | 355 | compound 1830 |
| Vietnamese VTB | 1859 | 1844 | compound 15 |
| Warlpiri UFAL | 23 | 23 | |
| Welsh CCG | 119 | 119 | |
| Wolof WTB | 1581 | 1476 | compound 3, compound:prt 41, compound:svc 61 |
| Yoruba YTB | 140 | 92 | compound 3, compound:prt 4, compound:svc 41 |
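Because a predicate in two categories is counted in both, a row's category counts can add up to more than Predicates minus Plain. The following hypothetical Python helpers check this for a single row and compute the share of non-plain predicates. The function names and the cell format they parse are assumptions. The figures in the usage lines are copied from the Dutch Alpino and Hindi HDTB rows of the table.

```python
def parse_non_plain(cell):
    """Parse a Non-plain cell such as 'compound:prt 831, expl:pv 201'
    into {'compound:prt': 831, 'expl:pv': 201}. An empty cell gives {}."""
    counts = {}
    for part in (p.strip() for p in cell.split(",")):
        if not part:
            continue
        relation, number = part.rsplit(" ", 1)
        counts[relation] = int(number)
    return counts


def summarize(predicates, plain, cell):
    """Return (non-plain predicates, double-counted memberships, non-plain share)."""
    non_plain = predicates - plain
    # Positive when some predicates are listed under more than one category.
    double_counted = sum(parse_non_plain(cell).values()) - non_plain
    return non_plain, double_counted, non_plain / predicates


for name, row in [("Dutch Alpino", (3385, 2353, "compound:prt 831, expl:pv 201")),
                  ("Hindi HDTB", (3672, 563, "compound 3109"))]:
    non_plain, extra, share = summarize(*row)
    print(f"{name}: {non_plain} non-plain ({share:.1%}), {extra} double-counted")
```

Computing the non-plain total as Predicates minus Plain, rather than as the sum of the categories, keeps the share unaffected by any double counting.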