valency - the range of syntactic elements either required or specifically permitted by a verb or other lexical unit ...

Concise Oxford Dictionary of Linguistics

Preface

The Valency Lexicon of Czech Verbs, Version 1.0 (VALLEX 1.0) is a collection of linguistically annotated data and documentation, resulting from an attempt at a formal description of the valency frames of Czech verbs. VALLEX 1.0 was developed at the Center of Computational Linguistics, Faculty of Mathematics and Physics, Charles University, Prague, with the support of the project MSMT LN00A063.

VALLEX 1.0 is closely related to the Prague Dependency Treebank (PDT) project. The Functional Generative Description (FGD), developed by Petr Sgall and his collaborators since the 1960s, serves as the background theory in both PDT and VALLEX 1.0. In PDT, FGD is being verified by complex annotation of large amounts of textual data, whereas in VALLEX 1.0 it is used only for the description of valency frames of selected verbs.

VALLEX 1.0 contains roughly 1400 verbs (counting only perfective and imperfective verbs, but not their iterative counterparts). They were selected as follows: (1) We started with roughly the 1000 most frequent Czech verbs, ranked by their number of occurrences in a part of the Czech National Corpus; only 'být' (to be) and some modal verbs were excluded from this set because of their non-trivial status on the tectogrammatical level of FGD. (2) We then added their perfective or imperfective aspectual counterparts wherever these were missing; in other words, the set of verbs in VALLEX 1.0 is closed under the relation of 'aspectual pair'.
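The closure property in step (2) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the procedure the authors actually used: the function name `close_under_aspectual_pairs`, the toy seed set, and the hand-listed pairs are all hypothetical and do not come from the VALLEX data.

```python
def close_under_aspectual_pairs(seed_verbs, aspectual_pairs):
    """Return seed_verbs plus every aspectual counterpart that was missing.

    aspectual_pairs is an iterable of (imperfective, perfective) tuples.
    Both the function and its inputs are hypothetical illustrations.
    """
    # Build a symmetric lookup: each verb maps to its counterpart(s).
    counterpart = {}
    for imperfective, perfective in aspectual_pairs:
        counterpart.setdefault(imperfective, set()).add(perfective)
        counterpart.setdefault(perfective, set()).add(imperfective)

    closed = set(seed_verbs)
    pending = list(seed_verbs)
    while pending:
        verb = pending.pop()
        for other in counterpart.get(verb, ()):
            if other not in closed:
                closed.add(other)
                pending.append(other)
    return closed


# Toy data (assumption): two aspectual pairs and a small frequency-based seed.
pairs = [("dělat", "udělat"), ("kupovat", "koupit")]
seed = {"dělat", "koupit", "jít"}

closed = close_under_aspectual_pairs(seed, pairs)
print(sorted(closed))
```

In this toy run the seed contains only one member of each pair, so the missing counterparts are added, while a verb without a listed counterpart is kept unchanged.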

The preparation of the first version of VALLEX took more than two years. Although it is still a work in progress requiring further linguistic research, we believe that it will be useful, or at least interesting, for other researchers in the field. That is why we are already making it available to them, as well as to anyone else using it for non-commercial purposes (under the terms of the license agreement below).

From the very beginning, VALLEX 1.0 was designed with an emphasis on both human and machine readability. Both linguists and developers of Natural Language Processing applications can therefore use and critically evaluate its content. Any feedback from them will be a valuable source of information for us, as well as a great motivation for further work.

the Authors

Data

No matter what purpose you want to use the data for, we encourage you to get acquainted with the logical structure of the VALLEX data first.

In the following paragraphs, the links marked with ^(R) are accessible only to registered users.

In order to satisfy different needs of different potential users, we distribute the lexicon in the following formats:

Browsable version (preview). The HTML version of the data allows easy and fast navigation through the lexicon. Verbs and frames are organized in several ways, following various criteria. You can start with the alphabetically sorted verbs^(R) (remember: almost everything is clickable!).
Printable version (preview). You can print your own hard copy of VALLEX (beware: it is more than 200 pages!).
- PostScript (supported e.g. by GSView)
  data/printable/vallex.ps^(R)
- Portable Document Format (supported e.g. by Adobe Reader)
  data/printable/vallex.pdf^(R)
XML version (preview). Mostly for programmers: you can run sophisticated queries on the data (e.g. using the XSH environment, see the Technical Report, pp. 89-94) or use the data in your applications, as long as this does not contradict the license agreement below.
- Document Type Definition data/xml/vallex.dtd^(R)
- XML Data data/xml/vallex.xml^(R)

XML is the primary data format of VALLEX; both printable and browsable versions were automatically generated from the XML.
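As a rough idea of what querying the XML from an application might look like, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It runs on an invented toy fragment, not on the real data: the element and attribute names (`lexicon`, `verb`, `lemma`, `frame`, `slot`, `functor`, `type`) and the helper `lemmas_with_functor` are assumptions made for illustration only. The actual names are defined in vallex.dtd and may differ.

```python
import xml.etree.ElementTree as ET

# Invented toy fragment (assumption): NOT the real VALLEX schema.
# Consult data/xml/vallex.dtd for the actual element and attribute names.
SAMPLE_XML = """
<lexicon>
  <verb lemma="dát">
    <frame>
      <slot functor="ACT" type="obl"/>
      <slot functor="ADDR" type="obl"/>
      <slot functor="PAT" type="obl"/>
    </frame>
  </verb>
  <verb lemma="spát">
    <frame>
      <slot functor="ACT" type="obl"/>
    </frame>
  </verb>
</lexicon>
"""


def lemmas_with_functor(root, functor):
    """Return lemmas of verbs having at least one frame slot with the functor."""
    result = []
    for verb in root.iter("verb"):
        # ElementTree supports a limited XPath subset, including [@attr='value'].
        if verb.find(f"./frame/slot[@functor='{functor}']") is not None:
            result.append(verb.get("lemma"))
    return result


root = ET.fromstring(SAMPLE_XML)
addr_verbs = lemmas_with_functor(root, "ADDR")
print(addr_verbs)
```

To run a query like this on the real lexicon, the sample string would be replaced by parsing the downloaded file, and the element names adjusted to match the DTD.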

Note: The data format used during the manual annotation of VALLEX is not presented here. It is just a simple plain text format based on several notation rules, which allow for deterministic conversion to XML as well as syntax highlighting in an off-the-shelf text editor (therefore no specially designed annotation software was needed).

Documentation

Besides the logical structure of VALLEX mentioned above, selected publications are included in VALLEX 1.0 (although some claims contained in them might be obsolete at this moment). By far the most comprehensive text is the Technical Report from 2002 (100 pages).

Lopatková, M. (2003) Valency in the Prague Dependency Treebank: Building the Valency Lexicon. PBML 79-80.
doc/pbml-2003.doc
doc/pbml-2003.pdf
Lopatková, M., Žabokrtský, Z., Skwarska, K., Benešová, V. (2002) Tektogramaticky anotovaný valenční slovník českých sloves. ÚFAL-CKL Technical Report TR-2002-15.
doc/tech-rep-2002.ps
doc/tech-rep-2002.pdf
Straňáková-Lopatková, M., Žabokrtský, Z. (2002b) Valency Dictionary of Czech Verbs: Complex Tectogrammatical Annotation. In: LREC2002, Proceedings, vol.III. (eds. M. González Rodríguez, C. Paz Suárez Araujo), ELRA, pp. 949-956.
doc/lrec-2002.pdf
Straňáková-Lopatková, M., Žabokrtský, Z. (2002a) Valenční slovník stokrát jinak: co je pod povrchem? In: Čeština - univerzália a specifika 4. Sborník konference ve Šlapanicích u Brna (ed. Zdeňka Hladká, Petr Karlík).
doc/slap-2002.doc
doc/slap-2002.pdf

Registration and Download

To obtain VALLEX 1.0, fill in the registration form below. After the registration, VALLEX 1.0 can be used by any academic, educational or research institution, or other organization or individual making use of VALLEX 1.0 for research and/or education purposes. Any other use is subject to explicit negotiations. Please read the license agreement carefully.

After registration, users will receive an e-mail with instructions on how to download VALLEX 1.0 from the Internet. It will normally be sent within one or two working days. No response will be sent to those who fill in the form improperly (e.g. by entering a nonsensical name or an anonymous e-mail address).

If you do not get the message within a reasonable time, please contact us at zabokrtsky@ckl.mff.cuni.cz.

Disclaimer

Although we have put considerable effort into removing annotation errors, some still remain in the lexicon. If you find any, please report them to zabokrtsky@ckl.mff.cuni.cz. Any other comments are welcome as well.

Backward compatibility of future versions of VALLEX is not guaranteed. Besides removing potential annotation errors in VALLEX 1.0, some changes of the annotation scheme are expected due to progress in the theoretical model of valency.

Unlike the large printed Czech dictionaries such as 'Slovník spisovného jazyka českého', VALLEX is not intended to serve as a binding norm of contemporary Czech.

Acknowledgements

VALLEX 1.0 was developed under the project MSMT LN00A063.

Many thanks for extensive linguistic and technical advice go to our colleagues from CKL and UFAL, especially to Professor Jarmila Panevová.