PDT-Vallex 4.0

The PDT-Vallex valency lexicon has been built in close connection with the tectogrammatical (syntax/semantics) annotation of the Prague Dependency Treebanks: the Prague Dependency Treebank (PDT 3.5) and its predecessors, the Prague Czech-English Dependency Treebank (PCEDT 2.0), and the Prague Dependency Treebank of Spoken Czech (PDTSC 2.0) and its predecessor. It was also built in connection with a small corpus of user-generated text from the EC-funded "Faust" project. These four corpora are now available in a single package, the Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0).

The current (latest) version, PDT-Vallex: Czech Valency lexicon linked to treebanks 4.0 (PDT-Vallex 4.0), is linked to all the aforementioned Czech corpora at the tectogrammatical layer of annotation, and it is itself part of PDT-C 1.0. PDT-Vallex 4.0 contains 17341 valency frames for 13027 words (verbs, nouns, adjectives and adverbs); of these, the 14528 verbal valency frames for the 8498 verbs that occur in PDT-C 1.0 are also accessible online. PDT-Vallex 4.0 is an extended version of the original PDT-Vallex, which contained 7121 verb entries with 11933 frames.

PDT-Vallex 4.0 is available in an electronically processable format (XML) together with the aforementioned treebanks, to be viewed and edited in TrEd, the main annotation tool of the Prague Dependency Treebanks. It is also available in a more human-readable form (see the links below). The main feature of this valency lexicon is its linking to the annotated corpora: each occurrence of each verb is linked to the appropriate valency frame (which roughly corresponds to a particular verb sense), together with additional, generalized information about its usage and about the alternative surface morphosyntactic forms of its arguments.
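To make the corpus-to-lexicon linking more concrete, here is a minimal Python sketch that indexes valency frames from an XML lexicon and resolves annotated occurrences to them. It assumes a simplified, hypothetical XML layout (`word`, `frame` and `element` elements with `lemma`, `id` and `functor` attributes) and hypothetical reference strings of the form `v#<frame id>`; the real PDT-Vallex schema and the annotation files in PDT-C 1.0 may differ in detail. The functor labels (ACT, ADDR, PAT) follow the tectogrammatical tag set, but the sample frames themselves are invented, not copied from the lexicon.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, simplified PDT-Vallex-style XML. Element and attribute names
# are assumptions for illustration, not the official schema shipped with
# PDT-C 1.0. The frames are invented examples.
SAMPLE_LEXICON = """
<valency_lexicon>
  <word id="v-w1" lemma="dát" POS="V">
    <frame id="v-w1f1">
      <element functor="ACT" type="oblig"/>
      <element functor="ADDR" type="oblig"/>
      <element functor="PAT" type="oblig"/>
    </frame>
    <frame id="v-w1f2">
      <element functor="ACT" type="oblig"/>
      <element functor="PAT" type="oblig"/>
    </frame>
  </word>
</valency_lexicon>
"""


def index_frames(xml_text):
    """Return {frame_id: (lemma, [functors])} for every frame in the lexicon."""
    root = ET.fromstring(xml_text.strip())
    frames = {}
    for word in root.iter("word"):
        lemma = word.get("lemma")
        for frame in word.iter("frame"):
            functors = [el.get("functor") for el in frame.iter("element")]
            frames[frame.get("id")] = (lemma, functors)
    return frames


def frame_id_of(frame_ref):
    """Strip an assumed 'lexicon#' prefix: 'v#v-w1f2' -> 'v-w1f2'.
    A bare frame id is returned unchanged."""
    return frame_ref.split("#")[-1]


def resolve(frames, frame_ref):
    """Look up the lexicon entry that a corpus occurrence points to."""
    return frames[frame_id_of(frame_ref)]


def count_frame_usage(occurrences):
    """Count how many annotated occurrences are linked to each frame."""
    return Counter(frame_id_of(ref) for _node_id, ref in occurrences)


if __name__ == "__main__":
    frames = index_frames(SAMPLE_LEXICON)
    # Hypothetical corpus occurrences: (tectogrammatical node id, frame reference).
    occurrences = [("t-1", "v#v-w1f1"), ("t-2", "v#v-w1f2"), ("t-3", "v#v-w1f1")]
    for node_id, ref in occurrences:
        lemma, functors = resolve(frames, ref)
        print(node_id, lemma, " ".join(functors))
    print(count_frame_usage(occurrences))
```

Because each occurrence points to a frame rather than the other way round, the sketch builds a frame index once and resolves every reference against it; counting references per frame then gives a simple per-sense usage profile.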

Resources and access

PDT-Vallex is available in several forms.

Related projects

EngVallex

PDT-Vallex is complemented by EngVallex, a valency lexicon for English that follows the structure and labeling scheme of PDT-Vallex but is based on the English PropBank frame files; it was used for the tectogrammatical annotation of the English side of the Prague Czech-English Dependency Treebank (PCEDT 3.0). It can be downloaded here or browsed here.

CzEngVallex

CzEngVallex is a bilingual valency dictionary built over the PCEDT parallel treebank; in its electronic version, it explicitly links Czech and English verb senses and their arguments. It can also be downloaded here or browsed here.
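As a rough illustration of what explicitly linking verb senses and their arguments across languages can look like as a data structure, the sketch below models one aligned pair of frames with a Czech-to-English argument mapping. The `FramePair` class, the frame ids and the functor mapping are all hypothetical and are not taken from CzEngVallex; its actual file format is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FramePair:
    """Hypothetical record for one aligned Czech-English verb-sense pair.

    Frame ids and the functor mapping are invented for illustration;
    they are not taken from CzEngVallex itself.
    """
    cs_frame: str
    en_frame: str
    # Czech argument functor -> English argument functor,
    # or None when the Czech argument has no English counterpart.
    arg_map: Dict[str, Optional[str]] = field(default_factory=dict)

    def english_counterpart(self, cs_functor: str) -> Optional[str]:
        """Return the aligned English functor, or None if there is none."""
        return self.arg_map.get(cs_functor)

    def translate(self, cs_functors: List[str]) -> List[str]:
        """Map Czech functors to English ones, dropping unaligned arguments."""
        mapped = (self.arg_map.get(f) for f in cs_functors)
        return [e for e in mapped if e is not None]

    def unmatched_czech(self) -> List[str]:
        """Czech arguments explicitly recorded as having no English counterpart."""
        return sorted(f for f, e in self.arg_map.items() if e is None)


if __name__ == "__main__":
    pair = FramePair("cs-frame-1", "en-frame-1",
                     {"ACT": "ACT", "PAT": "ADDR", "EFF": None})
    print(pair.translate(["ACT", "PAT", "EFF"]))  # ['ACT', 'ADDR']
    print(pair.unmatched_czech())  # ['EFF']
```

Storing the mapping per frame pair, rather than per verb, reflects that argument correspondences can differ between senses of the same verb.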

SynSemClass

The SynSemClass Lexicon consists of bilingual (Czech-English) synonym classes of verbs (verb senses). It is also available from the LINDAT repository (current version: SynSemClass3.0).

The older versions are SynSemClass1.0 (searchable here) and SynSemClass2.0 (searchable here).

How to cite

If you use PDT-Vallex, please cite the data directly as follows (see the Licence tab for additional formats and options):

Urešová, Zdeňka; Bémová, Alevtina; Fučíková, Eva; Hajič, Jan; Kolářová, Veronika; Mikulová, Marie; Pajas, Petr; Panevová, Jarmila and Štěpánek, Jan (2021). PDT-Vallex: Czech Valency lexicon linked to treebanks 4.0. LINDAT/CLARIAH-CZ digital library at Institute of Formal and Applied Linguistics, Charles University in Prague, http://hdl.handle.net/11234/1-3499.

@misc{11234/1-3499,
 title = {{PDT}-{V}allex: {C}zech Valency lexicon linked to treebanks 4.0 ({PDT}-{V}allex 4.0)},
 author = {Ure{\v s}ov{\'a}, Zde{\v n}ka and B{\'e}mov{\'a}, Alevtina and Fu{\v c}{\' \i}kov{\'a}, Eva and Haji{\v c}, Jan and Kol{\'a}{\v r}ov{\'a}, Veronika and Mikulov{\'a}, Marie and Pajas, Petr and Panevov{\'a}, Jarmila and {\v S}t{\v e}p{\'a}nek, Jan},
 url = {http://hdl.handle.net/11234/1-3499},
 note = {{LINDAT}/{CLARIAH-CZ} digital library at Institute of Formal and Applied Linguistics, Charles University in Prague},
 year = {2021}
}