valency - the range of syntactic elements either required or specifically permitted by a verb or other lexical unit ...
Concise Oxford Dictionary of Linguistics

Preface

The Valency Lexicon of Czech Verbs, Version 1.0 (VALLEX 1.0) is a collection of linguistically annotated data and documentation, resulting from an attempt at formal description of valency frames of Czech verbs. VALLEX 1.0 was developed at the Center of Computational Linguistics, Faculty of Mathematics and Physics, Charles University, Prague, with the support of the project MSMT LN00A063.

VALLEX 1.0 is closely related to the Prague Dependency Treebank (PDT) project. The Functional Generative Description (FGD), developed by Petr Sgall and his collaborators since the 1960s, is used as the background theory in both PDT and VALLEX 1.0. In PDT, FGD is being verified by a complex annotation of large amounts of textual data, whereas in VALLEX 1.0 it is used only for the description of valency frames of selected verbs.

VALLEX 1.0 contains roughly 1400 verbs (counting only perfective and imperfective verbs, but not their iterative counterparts). They were selected as follows: (1) We started with about 1000 of the most frequent Czech verbs, ranked by their number of occurrences in a part of the Czech National Corpus (only 'být' (to be) and some modal verbs were excluded from this set because of their non-trivial status on the tectogrammatical level of FGD). (2) We then added their perfective or imperfective aspectual counterparts where these were missing; in other words, the set of verbs in VALLEX 1.0 is closed under the relation of 'aspectual pair'.
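The second selection step can be pictured as a closure computation. The following is only a minimal sketch under stated assumptions, not the procedure the authors actually used: the function name, the tiny seed set, and the list of aspectual pairs are all hypothetical and chosen purely for illustration.

```python
def close_under_aspectual_pairs(verbs, pairs):
    """Return `verbs` extended with every missing aspectual counterpart.

    `pairs` is a hypothetical list of (imperfective, perfective) tuples;
    it is an assumption of this sketch, not a VALLEX data structure.
    """
    # Build a symmetric lookup: each verb maps to its counterpart(s).
    counterpart = {}
    for imperfective, perfective in pairs:
        counterpart.setdefault(imperfective, set()).add(perfective)
        counterpart.setdefault(perfective, set()).add(imperfective)

    # Worklist loop: keep adding counterparts until nothing new appears,
    # so the result is closed under the pair relation.
    closed = set(verbs)
    worklist = list(closed)
    while worklist:
        verb = worklist.pop()
        for other in counterpart.get(verb, ()):
            if other not in closed:
                closed.add(other)
                worklist.append(other)
    return closed


# Illustrative input only: three "frequent" verbs and three aspectual pairs.
FREQUENT = {"dělat", "koupit", "dát"}
PAIRS = [("dělat", "udělat"), ("kupovat", "koupit"), ("dávat", "dát")]

LEXICON = close_under_aspectual_pairs(FREQUENT, PAIRS)
```

Applying the function again to its own result adds nothing, which is what "closed under the relation" means here.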

Preparing the first version of VALLEX took more than two years. Although it is still a work in progress that requires further linguistic research, we believe it will be useful, or at least interesting, to other researchers in the field. That is why we are already making it available to them, as well as to anyone else using it for non-commercial purposes (under the terms of the license agreement below).

From the very beginning, VALLEX 1.0 was designed with an emphasis on both human and machine readability. Therefore both linguists and developers of applications within the Natural Language Processing domain can use and critically evaluate its content. Of course, any feedback from them will be a valuable source of information to us, as well as a great motivation for further work.

the Authors

Data

No matter what purpose you want to use the data for, we encourage you to get acquainted with the logical structure of the VALLEX data first.

In the following paragraphs, the links marked with (R) are accessible only to registered users.

In order to satisfy different needs of different potential users, we distribute the lexicon in the following formats:

XML is the primary data format of VALLEX; both printable and browsable versions were automatically generated from the XML.

Note: The data format used during the manual annotation of VALLEX is not presented here. It is just a simple plain text format based on several notation rules, which allow for deterministic conversion to XML as well as syntax highlighting in an off-the-shelf text editor (therefore no specially designed annotation software was needed).

Documentation

Besides the logical structure of VALLEX mentioned above, selected publications are included in VALLEX 1.0 (although some of the claims they contain may by now be obsolete). By far the most comprehensive text is the Technical Report from 2002 (100 pages).

Registration and Download

To obtain VALLEX 1.0, fill in the registration form below. After registration, VALLEX 1.0 can be used by any academic, educational or research institution, or by any other organization or individual, for research and/or educational purposes. Any other use is subject to explicit negotiation. Please read the license agreement carefully.

After registration, users will receive an e-mail with instructions on how to download VALLEX 1.0 from the Internet. It is normally sent within one or two working days. No response will be sent to those who fill in the form improperly (e.g. by entering a nonsensical name or an anonymous e-mail address).

VALLEX 1.0 - Registration Form
Name:
E-mail:
Institution:
Address:

I hereby confirm that I accept the license agreement.
If you do not get the message within a reasonable time, please contact us at zabokrtsky@ckl.mff.cuni.cz .

Disclaimer

Although we have put considerable effort into removing annotation errors, some still remain in the lexicon. If you find any, please report them to zabokrtsky@ckl.mff.cuni.cz. Any other comments are welcome as well.

Backward compatibility of future versions of VALLEX is not guaranteed. Besides the removal of annotation errors remaining in VALLEX 1.0, some changes to the annotation scheme are expected due to progress in the theoretical model of valency.

Unlike the large printed Czech dictionaries such as 'Slovník spisovného jazyka českého', VALLEX is not intended to serve as a binding norm of contemporary Czech.

Acknowledgements

The work on VALLEX 1.0 was carried out under the project MSMT LN00A063.

Many thanks for extensive linguistic and technical advice go to our colleagues from CKL and UFAL, especially to Professor Jarmila Panevová.


