The Valency Lexicon of Czech Verbs with Complex Syntactic-Semantic Annotation

The Valency Lexicon of Czech Verbs is a collection of linguistically annotated data and documentation; it provides a formal, machine-readable description of valency frames of Czech verbs and additional syntactico-semantic information useful for the analysis and synthesis of Czech texts as well as other applied tasks in NLP. It covers the common senses of the most frequent Czech verbs (in total over 11,080 senses of almost 4,700 lemmas, i.e., more than 6,850 verb senses counting perfective and imperfective verbs as forming a single lexeme).

The lexicon provides:

  • valency frames with syntactico-semantic characterization of the most frequent verbs in their particular senses (number of complementations, their morphological forms and obligatoriness);
  • glosses, examples;
  • additional characteristics – control, diatheses, reflexivity, reciprocity, reflexive verbs, lexicalized alternations, syntactico-semantic class of verbs, including idioms and multiword expressions (light verb constructions).
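
As a rough illustration of the kind of information listed above, the following minimal Python sketch models a single verb sense with its complementations, their morphological forms and obligatoriness. The class names, field names, role labels and the toy entry are assumptions made for this example only; they do not reproduce VALLEX's actual data format or annotation labels.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Complementation:
    # Plain-language role label (hypothetical; not VALLEX notation).
    role: str
    # Admissible morphological forms of the complementation.
    forms: Tuple[str, ...]
    # Whether the complementation is obligatory in this sense.
    obligatory: bool


@dataclass(frozen=True)
class LexicalUnit:
    lemma: str
    gloss: str
    example: str
    frame: Tuple[Complementation, ...]

    def obligatory_roles(self) -> List[str]:
        """Return the roles of all obligatory complementations, in frame order."""
        return [c.role for c in self.frame if c.obligatory]


# Toy entry for illustration only; not copied from the lexicon.
give = LexicalUnit(
    lemma="dát",
    gloss="to give",
    example="Petr dal Marii knihu.",
    frame=(
        Complementation("giver", ("nominative",), True),
        Complementation("recipient", ("dative",), True),
        Complementation("thing given", ("accusative",), True),
    ),
)

number_of_complementations = len(give.frame)
```

A structure of this shape makes the "number of complementations" a simple length of the frame, while obligatoriness and forms stay attached to each complementation.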

The lexicon is available in three formats:

  • an HTML version for comfortable browsing and sorting according to various criteria (link will be added soon);
  • a printed version of VALLEX 3.0;
  • XML and/or JSON data format for further applications (links will be added soon).

 

VALLEX 4.5

Under construction; links will be added soon.

VALLEX 4.5 is an enhanced successor of VALLEX 3.0, 3.5, and 4.0. In addition to the information stored there, VALLEX 4.5 provides a detailed description of reflexive verbs, i.e., verbs with the reflexive se or si as an obligatory part of their verb lexemes. VALLEX 4.5 covers 1,525 reflexive verbs in 1,545 lexical units (2,501 when aspectual counterparts are counted separately).

 

How to cite the VALLEX lexicon

If you make use of VALLEX, please cite (at least one of) the following papers:

  • Kettnerová, V., Lopatková, M.: Reflexives in Czech from a Dependency Perspective. In Gerdes, K., Kahane, S. (eds.) Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, Syntaxfest 2019), pp. 14-25, Paris, France, Association for Computational Linguistics, 2019 [bib]

In addition, you can cite the data downloadable from LINDAT/CLARIN digital library:

  • Lopatková, M., Kettnerová, V., Mírovský, J., Vernerová, A., Bejček, E., Žabokrtský, Z.: VALLEX 4.5, LINDAT/CLARIAH-CZ Digital Library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, Prague, http://hdl.handle.net/11234/1-4756, 2022 [bib]

 


VALLEX Archive

VALLEX 4.0

VALLEX 4.0 is an enhanced successor of VALLEX 3.0 and 3.5. In addition to the information stored there, VALLEX 4.0 also contains a detailed classification of verbs expressing reciprocity and reflexivity. VALLEX 4.0 covers 324 lexical units for inherently reciprocal verbs; further, it identifies almost 2,750 lexical units allowing for syntactic reciprocalization and almost 2,050 lexical units allowing for syntactic reflexivization.

The annotation of reflexivity and reciprocity has been developed within the project Between Reciprocity and Reflexivity: The Case of Czech Reciprocal Constructions supported by the Grant Agency of the Czech Republic, grant No. 18-03984S.

The theoretical part of the lexicon (including the Grammar Component) has been published as a Technical report in the ÚFAL series.

VALLEX 3.5

VALLEX 3.5 is an enhanced successor of VALLEX 3.0. In addition to the information stored in VALLEX 3.0, VALLEX 3.5 contains an annotation of light verb constructions, covering almost 3,000 collocations of predicative nouns with light verbs (counted as combinations of a lemma of a light verb and a lemma of a predicative noun), which correspond to almost 1,500 light verb constructions (counted as individual combinations of a lexical unit of a light verb and a lexical unit of a predicative noun).

The annotation of light verb constructions has been developed within the project Combining Words: Syntactic Properties of Czech Multiword Expressions with Light Verbs supported by the Grant Agency of the Czech Republic, grant No. GA15-09979S.

VALLEX 3.0

VALLEX 3.0 is an enhanced, cleaned and corrected successor of VALLEX 2.5. In addition to the information stored in VALLEX 2.5, it contains:

  • annotation of grammaticalized alternations (diatheses and reciprocity) and lexicalized alternations,
  • links to real-world sentences annotated by the lexicon entries for more than one hundred Czech verbs, and
  • links to PDT-Vallex, a lexicon connected with the Prague Dependency Treebank.

VALLEX 3.0 has been developed within the project Delving Deeper: Lexicographic Description of Syntactic and Semantic Properties of Czech Verbs supported by the Grant Agency of the Czech Republic, grant No. GA P406/12/0557.

VALLEX 2.7

VALLEX 2.7 is an enhanced, cleaned and corrected successor of VALLEX 2.5. In addition to the information stored in VALLEX 2.5, it contains:

  • annotation of grammaticalized alternations (diatheses and reciprocity) and lexicalized alternations,
  • links to real-world sentences annotated by the lexicon entries for more than one hundred Czech verbs, and
  • links to PDT-Vallex, a lexicon connected with the Prague Dependency Treebank.

VALLEX 2.5

VALLEX 2.5 is a cleaned and corrected successor of VALLEX 2.0. It was released electronically at the end of 2007, and since spring 2008 it has also been available as a book issued by Karolinum Press, the publishing house of Charles University in Prague.

VALLEX 2.0

In VALLEX 2.0, there are roughly 2,730 lexeme entries containing together around 6,460 lexical units ("senses"). Unlike traditional dictionaries and unlike VALLEX 1.0, VALLEX 2.0 treats a pair of perfective and imperfective aspectual counterparts as a single lexeme (if perfective and imperfective verbs were counted separately, the size of VALLEX 2.0 would grow to 4,250 verb entries).
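
The difference between the two ways of counting can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Lexeme class, its fields, and the three toy entries (with made-up sense counts) are hypothetical and do not come from the VALLEX data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Lexeme:
    # Lemmas grouped into one lexeme: an aspectual pair, or a single verb.
    lemmas: Tuple[str, ...]
    # Number of lexical units ("senses"); toy values, not real counts.
    lexical_units: int


# Hypothetical toy lexicon for illustration only.
toy_lexicon: List[Lexeme] = [
    Lexeme(("dát", "dávat"), 3),       # perfective + imperfective counterpart
    Lexeme(("koupit", "kupovat"), 1),  # perfective + imperfective counterpart
    Lexeme(("spát",), 2),              # no aspectual counterpart listed
]


def count_lexemes(lexicon: List[Lexeme]) -> int:
    """Aspectual counterparts count once, as a single lexeme."""
    return len(lexicon)


def count_verb_entries(lexicon: List[Lexeme]) -> int:
    """Every perfective and imperfective lemma counts separately."""
    return sum(len(lexeme.lemmas) for lexeme in lexicon)


def count_lexical_units(lexicon: List[Lexeme]) -> int:
    """Total number of senses across all lexemes."""
    return sum(lexeme.lexical_units for lexeme in lexicon)
```

In this toy lexicon there are three lexemes but five verb entries, which mirrors on a small scale the gap between the lexeme count and the separate-lemma count reported for VALLEX 2.0.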

VALLEX 1.0

VALLEX 1.0 contains roughly 1,400 verbs (counting perfective and imperfective verbs, but not their iterative counterparts). The 1,000 most frequent Czech verbs were selected according to their number of occurrences in a part of the Czech National Corpus (only 'být' (to be) was excluded); their perfective or imperfective aspectual counterparts were then added if they were missing.

License

VALLEX can be used under the Creative Commons BY-NC-SA 4.0 license.

VALLEX can be used free of charge by any academic, educational or research institution, or other organization or individual making use of VALLEX for non-commercial research and/or education purposes. Legal use of VALLEX is conditional on completing the registration form.