Grant id: GA P406/12/0557
Duration: 2012-2015

VALLEX

Delving Deeper: Lexicographic Description of Syntactic and Semantic Properties of Czech Verbs

Language phenomena at the syntax-semantics interface have been studied extensively, yet an adequate framework for their lexicographic description is still missing. The goal of the project is to propose such a framework and to apply it in the lexicographic processing of language data. Both theoretical and applied research are pursued, since each benefits from the other. The research focuses on two areas. First, it aims at deepening the insight into various changes in the valency structure of verbs; a formal model for the lexicographic representation of such changes is designed. The model is used for the description of grammatical, syntactic and semantic diatheses in the Valency Lexicon of Czech Verbs (VALLEX). Second, the project deals with the mapping of lexical resources, aiming primarily at enhancing VALLEX with a semantic classification based on the FrameNet lexical database. The main applied output of the project is a qualitatively and quantitatively enhanced version of VALLEX, available to a wide professional audience, to students and other language users, and to NLP applications.

The main goal of the project is to propose an adequate framework for the theoretical description of language phenomena at the syntax-semantics interface; this framework will be applied in the lexicographic processing of language data. A close interplay between theoretical research and its application to extensive data annotation is a fruitful strategy that strengthens both.

The following areas are addressed in the project:

  1. The lexicographic representation of various changes in valency structure of verbs
    1. Theoretical research; design of a formal model for lexicographic description
    2. Grammatical and syntactic diatheses: theoretical and practical aspects
    3. Semantic diatheses: theoretical and practical aspects
    4. Other types of changes in valency structure of verbs
    5. Comparative aspects of diatheses
    6. Application in an electronic language resource
  2. Mapping lexical resources: an effective way of enriching lexical information
    1. Enhancing Czech valency lexicon with semantic classes and semantic roles
    2. Strengthening lexical resources with corpus evidence

The full project description can be found in the project proposal.

Project Summary

The accomplishments of the four-year project Delving Deeper: Lexicographic Description of Syntactic and Semantic Properties of Czech Verbs are twofold:

1. The theoretical insight into various language phenomena at the syntax-semantics interface has been deepened: both grammaticalized alternations (diatheses and reciprocity) and lexicalized alternations (lexical-semantic conversions, structural splitting of a situational participant and multiple structural realization of a situational participant) were put under scrutiny. A contrastive perspective was also applied: we focused in particular on differences in the syntactic behavior of Czech, Polish and Russian verbs undergoing changes in their valency structure. Moreover, a typologically new type of syntactic change, related to syntactic reflexivity in Czech, was identified and studied in detail, especially changes in the morphemic expression of verbal complementations conditioned by the long and clitic variants of the reflexive pronoun.

Based on these theoretical achievements, the formal model of the lexicon has been refined and enriched; as a result, the final model provides an adequate and economical lexicographic representation of the studied phenomena. In addition, elaborate lexicographic rules for describing changes in the valency behavior of verbs undergoing alternations have been designed.

2. The electronic Valency Lexicon of Czech Verbs (VALLEX) has been substantially enhanced, both qualitatively and quantitatively, in two dimensions. First, information on the applicability of individual alternations (both grammaticalized and lexicalized) was added to its data component. Moreover, the lexicon has been enriched with sample annotations of Polish and Russian lexical units undergoing alternations. Second, VALLEX has been interlinked with the annotation lexicon of the Prague Dependency Treebank; as a result of the interlinking, the lexicon has been enriched with examples from the treebank. The lexicon is accessible at http://ufal.mff.cuni.cz/vallex/3.0/.
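
To make the notion of an alternation rule and its applicability more concrete, the following Python sketch shows one way such a rule could be represented and applied to a valency frame. It is an illustration only, not the VALLEX data format or the project's formal model: the names Slot, PASSIVE_RULE, apply_rule and is_applicable are hypothetical, and the rule encodes only the textbook Czech passive, in which the actor (ACT) changes from nominative (1) to instrumental (7) and the patient (PAT) from accusative (4) to nominative (1).

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Slot:
    """One valency complementation (hypothetical, simplified)."""
    functor: str              # semantic-syntactic label, e.g. "ACT", "PAT"
    forms: Tuple[str, ...]    # morphemic forms: "1" nominative, "4" accusative, "7" instrumental
    obligatory: bool = True


# Hypothetical rule format: functor -> {form in the base frame: form after the alternation}.
Rule = Dict[str, Dict[str, str]]
PASSIVE_RULE: Rule = {"ACT": {"1": "7"}, "PAT": {"4": "1"}}


def is_applicable(frame: List[Slot], rule: Rule) -> bool:
    """A rule applies only if every functor it mentions occurs with a form it can rewrite."""
    for functor, mapping in rule.items():
        if not any(s.functor == functor and any(f in mapping for f in s.forms)
                   for s in frame):
            return False
    return True


def apply_rule(frame: List[Slot], rule: Rule) -> List[Slot]:
    """Return a derived frame; slots and forms not mentioned in the rule are left unchanged."""
    derived = []
    for slot in frame:
        mapping = rule.get(slot.functor, {})
        new_forms = tuple(mapping.get(form, form) for form in slot.forms)
        derived.append(replace(slot, forms=new_forms))
    return derived


# Assumed base frame for a transitive verb: actor in nominative, patient in accusative.
active_frame = [Slot("ACT", ("1",)), Slot("PAT", ("4",))]
passive_frame = apply_rule(active_frame, PASSIVE_RULE)
```

Storing only the base frame together with a list of applicable rules, and deriving the alternated frames on demand, is one way to keep a lexicon entry compact; whether VALLEX is organized this way is not stated here.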

The results were made available to the research community in journals dedicated to Czech and other Slavic languages (especially categories Jimp, Jneimp and Jrec; 5 articles already published and 3 accepted for publication), in thematic anthologies and as chapters of monographs (category C; 1 published, 4 accepted for publication) and in 1 theoretical monograph (already published). In addition, these results were presented at international and Czech conferences on both theoretical and computational linguistics (especially at those with proceedings monitored in WoS, category D; 5 published texts).

The main applied output of the project is a qualitatively and quantitatively enhanced valency lexicon of Czech verbs, available to a wide professional audience as well as to students and other language users. Emphasis was placed on both human and machine readability, so that both linguists and developers of Natural Language Processing applications can use it.

The lexicon is prepared for publication as a monograph (category B) and has already been released as an electronic language resource (software, category R).

Another positive aspect of the project was the involvement of students, which resulted in 1 MSc thesis (Gregoire Labbé) and 2 PhD theses (Václava Kettnerová and Eduard Bejček). Five more PhD students also participated in the project (Anna Vernerová, Marie Podobová, Natalia Klyueva, Katarzyna Vaculová and Adriana Filas) and thus gained important research experience.

Publications

2016

  • book
    • Lopatková, M., Kettnerová, V., Bejček, E., Vernerová, A., Žabokrtský, Z.: Valenční slovník českých sloves VALLEX. Karolinum, Praha, 698 pp., 2016.

2013

  • proceedings/collections
    • Kettnerová, V., Lopatková, M.: The Representation of Czech Light Verb Constructions in a Valency Lexicon. In Hajičová, E., Gerdes, K., Wanner, L. (eds.) Proceedings of the Second International Conference on Dependency Linguistics, Depling 2013, pp. 147-156, 2013. Matfyzpress, Charles University in Prague, Prague, Czech Republic.
    • Kettnerová, V., Lopatková, M., Bejček, E., Vernerová, A., Podobová, M.: Corpus Based Identification of Czech Light Verbs. In Gajdošová, K., Žáková, A. (eds.) Proceedings of the Seventh International Conference Slovko 2013; Natural Language Processing, Corpus Linguistics, E-learning, pp. 118-128, 2013. RAM-Verlag, Lüdenscheid, Germany.
    • Vernerová, A., Lopatková, M.: Towards Automatic Detection of Applicable Diatheses. In Vinař, T. (ed.) ITAT 2013: Information Technologies – Applications and Theory (Proceedings), pp. 10-17, 2013. Slovenská spoločnosť pre umelú inteligenciu. CreateSpace Independent Publishing Platform.
  • others
    • Labbé, G.: Traitement de la valence de verbes de mouvement slovènes sur la base de la valence de verbes tchèques. Mémoire de master 1. Institut National des Langues et Civilisations Orientales / Univerzita Karlova v Praze, 64 pp., 2013.

Data Release

  • Lopatková, M., Kettnerová, V., Bejček, E., Vernerová, A., Žabokrtský, Z.: VALLEX 3.0 - Valenční slovník českých sloves. Data/software, Charles University in Prague, Faculty of Mathematics and Physics, http://ufal.mff.cuni.cz/vallex/3.0/, 2015.
  • Lopatková, M., Kettnerová, V., Bejček, E., Skwarska, K., Žabokrtský, Z.: VALLEX 2.6. Data/software, ÚFAL MFF UK, http://ufal.mff.cuni.cz/vallex/2.6/, Dec 2012.