Valency Lexicon of Czech Verbs

Markéta Lopatková, Václava Kettnerová, Eduard Bejček, Karolína Skwarska, Zdeněk Žabokrtský








The Valency Lexicon of Czech Verbs, Version 2 (VALLEX 2.x), is a collection of linguistically annotated data and documentation resulting from an attempt at a formal description of the valency frames of Czech verbs. VALLEX has been developed at the Institute of Formal and Applied Linguistics at the Faculty of Mathematics and Physics, Charles University in Prague. VALLEX 2.x is the successor of VALLEX 1.0, extended in both theoretical and quantitative respects.

VALLEX 2.x provides information on the valency structure of verbs in their particular meanings (senses), the possible morphological forms of their complementations, and additional syntactic information, accompanied by glosses and examples. All lexeme entries in VALLEX are created manually. Manual annotation with an emphasis on consistency is highly time-consuming and limits the speed of quantitative growth, but it makes it possible to reach the desired quality.

VALLEX is closely related to the Prague Dependency Treebank (PDT) project. The Functional Generative Description (FGD), developed by Petr Sgall and his collaborators since the 1960s, serves as the background theory for both PDT and VALLEX. In PDT, FGD is verified by a complex annotation of large amounts of textual data, whereas in VALLEX it is used only for the description of the valency frames of selected verbs.

VALLEX 2.x contains roughly 2,730 lexeme entries, which together comprise around 6,460 lexical units ("senses"). Note that VALLEX 2.x, in accordance with FGD and unlike traditional dictionaries, treats a pair of perfective and imperfective aspectual counterparts as a single lexeme. Therefore, if perfective and imperfective verbs are counted separately, the size of VALLEX 2.x effectively rises to about 4,250 entries (still without counting iteratives).

The verbs contained in VALLEX 2.x were selected as follows: (1) We gradually processed the roughly 2,500 most frequent Czech verbs, ranked by the number of their occurrences in a part of the Czech National Corpus. (2) Simultaneously, we added their perfective or imperfective aspectual counterparts (if they were not already present in the list of the most frequent verbs), and occasionally also their iterative counterparts.
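The two-step selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name select_verbs, the frequency counts, and the counterpart table are all invented toy data, not figures from the Czech National Corpus or the procedure actually used by the VALLEX team.

```python
def select_verbs(frequencies, counterparts, top_n):
    """Pick the top_n most frequent verbs, then add missing aspectual counterparts.

    frequencies:  dict mapping verb -> occurrence count (toy values here)
    counterparts: dict mapping verb -> list of its aspectual counterparts
    """
    # Step 1: rank verbs by descending frequency (ties broken alphabetically).
    ranked = sorted(frequencies, key=lambda verb: (-frequencies[verb], verb))
    selected = ranked[:top_n]
    chosen = set(selected)

    # Step 2: add each counterpart that is not already in the selection.
    for verb in list(selected):
        for other in counterparts.get(verb, []):
            if other not in chosen:
                chosen.add(other)
                selected.append(other)
    return selected


# Invented toy counts, for illustration only.
toy_frequencies = {"říci": 900, "dát": 800, "říkat": 500, "dávat": 300, "spát": 100}
toy_counterparts = {
    "říci": ["říkat"], "říkat": ["říci"],
    "dát": ["dávat"], "dávat": ["dát"],
}

result = select_verbs(toy_frequencies, toy_counterparts, top_n=2)
print(result)
```

With a cutoff of two, the less frequent counterparts of the two selected verbs are still pulled in, which is why the final list is longer than the frequency cutoff alone would suggest.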

The preparation of the present version of VALLEX has taken more than five years. Although it is still a work in progress that requires further linguistic research, we believe that even now it can be useful, or at least interesting, for other researchers in the field.

From the very beginning, VALLEX has been designed with an emphasis on both human and machine readability. Both linguists and developers of Natural Language Processing applications can therefore use and critically evaluate its content (any feedback from them will be a valuable source of information for us, as well as a great motivation for further work). To meet the different needs of these potential users, VALLEX 2.x is distributed in the following three forms:

  • Book version. VALLEX 2.5 was issued in the form of a traditional printed dictionary.
  • Browsable version. The HTML version of the data allows easy and fast navigation through the lexicon. Lexemes and lexical units are organized in several ways, following various criteria.
  • XML version. Programmers can run sophisticated queries (e.g., based on the XPath query language) on this machine-tractable data, or use it in their applications.
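As a rough idea of what XPath-style querying of lexicon data in XML could look like, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which supports a limited XPath subset. The element and attribute names (lexicon, lexeme, lemma, unit, slot, functor, aspect) and the two entries are invented for illustration and are not the actual VALLEX XML schema; consult the distributed documentation for the real structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature lexicon. The tag and attribute names are assumptions
# made for this sketch; they do NOT reproduce the real VALLEX XML format.
SAMPLE = """
<lexicon>
  <lexeme id="lex-1">
    <lemma aspect="impf">dávat</lemma>
    <lemma aspect="pf">dát</lemma>
    <unit id="lu-1">
      <slot functor="ACT"/>
      <slot functor="ADDR"/>
      <slot functor="PAT"/>
    </unit>
    <unit id="lu-2">
      <slot functor="ACT"/>
      <slot functor="PAT"/>
    </unit>
  </lexeme>
  <lexeme id="lex-2">
    <lemma aspect="impf">spát</lemma>
    <unit id="lu-3">
      <slot functor="ACT"/>
    </unit>
  </lexeme>
</lexicon>
"""


def unit_ids_with_functor(root, functor):
    """Return ids of lexical units whose frame contains the given functor."""
    # Find matching slots anywhere below root, then step up to their parent unit.
    units = root.findall(f".//slot[@functor='{functor}']/..")
    return [unit.get("id") for unit in units]


def lemmas_by_aspect(root, aspect):
    """Return all lemma strings marked with the given aspect value."""
    return [lemma.text for lemma in root.findall(f".//lemma[@aspect='{aspect}']")]


root = ET.fromstring(SAMPLE)
print(unit_ids_with_functor(root, "ADDR"))  # units with an addressee slot
print(lemmas_by_aspect(root, "pf"))         # perfective lemmas
print(len(root.findall("lexeme")), len(root.findall(".//unit")))
```

In this toy layout a single lexeme element holds both aspectual counterparts, so counting lexeme elements and counting lemma elements give different totals, mirroring the distinction between lexeme entries and separately counted verbs. For full XPath 1.0 support, a third-party library such as lxml would be needed instead of ElementTree.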