The Valency Lexicon of Czech Verbs, Version 1.0 (VALLEX 1.0) is a collection of linguistically annotated data and documentation, resulting from an attempt at formal description of valency frames of Czech verbs. VALLEX 1.0 was developed at the Center of Computational Linguistics, Faculty of Mathematics and Physics, Charles University, Prague, with the support of the project MSMT LN00A063.
VALLEX 1.0 is closely related to the Prague Dependency Treebank (PDT) project. The Functional Generative Description (FGD), which has been developed by Petr Sgall and his collaborators since the 1960s, is used as the background theory in both PDT and VALLEX 1.0. In PDT, FGD is being verified by a complex annotation of large amounts of textual data, whereas in VALLEX 1.0 it is used only for the description of valency frames of selected verbs.
VALLEX 1.0 contains roughly 1400 verbs (counting only perfective and imperfective verbs, but not their iterative counterparts). They were selected as follows: (1) We started with about the 1000 most frequent Czech verbs, according to their number of occurrences in a part of the Czech National Corpus (only 'být' (to be) and some modal verbs were excluded from this set because of their non-trivial status on the tectogrammatical level of FGD). (2) We then added their perfective or imperfective aspectual counterparts where they were missing; in other words, the set of verbs in VALLEX 1.0 is closed under the relation of 'aspectual pair'.
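The two-step selection described above can be illustrated with a minimal sketch. The function name `close_under_aspect_pairs`, the seed set, and the list of aspectual pairs are all hypothetical and chosen only for illustration; they are not taken from the VALLEX data or tools.

```python
def close_under_aspect_pairs(seed, pairs):
    """Return the seed verbs plus every verb reachable through aspectual pairs.

    seed  -- iterable of verb lemmas (step 1: the frequent verbs)
    pairs -- iterable of (verb_a, verb_b) aspectual pairs (assumed input)
    """
    # Build a symmetric lookup: each verb maps to its aspectual counterparts.
    counterparts = {}
    for a, b in pairs:
        counterparts.setdefault(a, set()).add(b)
        counterparts.setdefault(b, set()).add(a)

    closed = set(seed)
    worklist = list(closed)
    # Step 2: keep adding missing counterparts until nothing new appears.
    while worklist:
        verb = worklist.pop()
        for other in counterparts.get(verb, ()):
            if other not in closed:
                closed.add(other)
                worklist.append(other)
    return closed


# Illustrative toy data (hypothetical, not the actual VALLEX verb list).
toy_pairs = [("dát", "dávat"), ("koupit", "kupovat")]
toy_seed = {"dát", "kupovat", "jít"}
toy_result = close_under_aspect_pairs(toy_seed, toy_pairs)
```

In this toy run, 'dávat' and 'koupit' are added because their counterparts are in the seed, while 'jít' stays as it is because the toy pair list contains no counterpart for it.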
The preparation of the first version of VALLEX has taken more than two years. Although it is still a work in progress requiring further linguistic research, we believe that it will be useful or at least interesting for other researchers in the field. That is why we are already making it available to them, as well as to anyone else using it for non-commercial purposes (under the terms of the license agreement below).
From the very beginning, VALLEX 1.0 was designed with an emphasis on both human and machine readability. Therefore both linguists and developers of applications within the Natural Language Processing domain can use and critically evaluate its content. Of course, any feedback from them will be a valuable source of information to us, as well as a great motivation for further work.
the Authors
No matter what purpose you want to use the data for, we encourage you to get acquainted with the logical structure of the VALLEX data first.
In the following paragraphs, the links marked with (R) are accessible only to registered users.
In order to satisfy different needs of different potential users, we distribute the lexicon in the following formats:
XML is the primary data format of VALLEX; both printable and browsable versions were automatically generated from the XML.
Note: The data format used during the manual annotation of VALLEX is not presented here. It is just a simple plain text format based on several notation rules, which allow for deterministic conversion to XML as well as syntax highlighting in an off-the-shelf text editor (therefore no specially designed annotation software was needed).
To obtain VALLEX 1.0, fill in the registration form below. After the registration, VALLEX 1.0 can be used by any academic, educational or research institution, or other organization or individual making use of VALLEX 1.0 for research and/or education purposes. Any other use is subject to explicit negotiations. Please read the license agreement carefully.
After the registration, users will receive an e-mail with instructions on how to download VALLEX 1.0 from the Internet. It is normally sent within one or two working days. No response will be sent to those who fill in the form improperly (e.g. by entering a senseless name or an anonymous e-mail address).
If you do not get the message within a reasonable time, please contact us at zabokrtsky@ckl.mff.cuni.cz.
Although we have put considerable effort into removing annotation errors, some of them still remain in the lexicon. If you find any, you are kindly asked to report them to zabokrtsky@ckl.mff.cuni.cz. Any other comments are welcome as well.
Backward compatibility of future versions of VALLEX is not guaranteed. Besides the removal of potential annotation errors in VALLEX 1.0, some changes to the annotation scheme are expected due to progress in the theoretical model of valency.
Unlike the big Czech printed dictionaries such as 'Slovník spisovného jazyka českého', VALLEX is not intended to serve as a binding norm of contemporary Czech.
The work on VALLEX 1.0 was carried out under the project MSMT LN00A063.
Many thanks for extensive linguistic and technical advice go to our colleagues from CKL and UFAL, especially to Professor Jarmila Panevová.