Principal investigator (ÚFAL): 
Grant id: 


Corpus-based Valency Lexicon of Czech Nouns

The project dealt with the lexicographic treatment of valency of Czech deverbal nouns.

The lexicon, called NomVallex I., is available in three formats:


An example of two related lexicon entries:

The main characteristics of the NomVallex I. lexicon can be summarized as follows:

  1. The lexicon captures valency of Czech deverbal nouns belonging in at least one of their meanings to one of the following semantic classes: Communication (e.g. dotaz ‘question’, dotazování (se) – dotázání (se) ‘asking’), Mental Action (e.g. plán ‘plan’, plánování ‘planning’) or Psych State (e.g. nenávist ‘hatred’, nenávidění ‘hating’). In total, the lexicon includes 505 lexical units in 248 lexemes (when considering aspectual counterparts, such as namítání [impf] – namítnutí [pf] ‘expressing objections’, to be individual lexical units, the number rises to 655 lexical units covering a total of 297 lemmas).
  2. The lexicon is created within the theoretical framework of Functional Generative Description. It is founded on data from the SYN series of corpora from the Czech National Corpus and the Araneum Bohemicum Maximum corpus.
  3. The lexicon follows in the footsteps of the VALLEX lexicon (Lopatková et al., 2016b); it adopts the VALLEX annotation scheme, and in relevant cases, deverbal nouns captured in NomVallex I. mirror the division of lexemes into lexical units and the assignment of lexical units to semantic classes of the base verbs captured in the VALLEX lexicon. (In its electronic version, NomVallex I. also provides links to the valency lexicon PDT-Vallex.)
  4. It captures all lexical meanings of the nouns, differentiating between basic “categorial” meanings, i.e. action (e.g. žádání (si) ‘asking’, dovtípení (se) ‘inferring’), abstract result of an action (e.g. žádost ‘request’), property/quality (e.g. důvtip ‘ingenuity’), material object (e.g. pohled ‘postcard’) and container/quantity (e.g. počet ‘number’).
  5. Considering morphosyntactic properties of the studied nouns, the lexicon differentiates three basic types of noun derivates, namely syntactic, syntacticolexical and lexical derivates. In order to be able to compare valency behaviour of different types of noun derivates, a decision was made to include both stem-nominals (derived from verbs by suffixes -ní/-tí and containing a theme suffix, e.g. žádání (si) [impf] ‘asking’, navrhování [impf] – navrhnutí [pf1] – navržení [pf2] ‘suggesting/proposing’, namítání [impf] – namítnutí [pf] ‘expressing objections’) and root-nominals (derived from verbs by various suffixes, including the zero suffix, but not containing a theme suffix, e.g. žádost [no-aspect] ‘request’, návrh [no-aspect] ‘proposal’, námitka [no-aspect] ‘objection’).
  6. A noun was included in the lexicon if it meets the following criteria: its semantic class is Communication, Mental Action or Psych State, its categorial meaning is action or abstract result of an action, and it exhibits non-systemic valency behaviour (especially non-systemic forms of participants). When both stem-nominals and root-nominals derived from the same verb are available, both are included if at least one of them satisfies the above criteria in at least one of its senses.
  7. Valency properties are captured in the form of a valency frame (in which valency slots are specified by a functor and a list of morphemic forms), and examples which occurred in the corpus data. The lexicon aims to illustrate the full range of syntactic structures of noun phrases, and thus the syntactic behaviour of every lexical unit is exemplified with all combinations of its participants (in all forms specified in the valency frame) which were found in the corpus data.
  8. In accordance with valency lexicons VALLEX and PDT-Vallex, the NomVallex I. lexicon assumes that every lexical meaning (sense) is linked to a corresponding valency frame, and vice versa, a difference in valency frames proves a difference in meaning. Morphosyntactic properties are taken into account when there is ambiguity about whether separate senses should be distinguished or not.
  9. Along with the printed version, the NomVallex I. lexicon is also available in an electronic form, both as publicly available web pages and as machine-readable data suitable for further research into valency of Czech deverbal nouns and for other NLP applications. The online version and an offline application allow for formulating specific and complex queries based on a wide range of criteria, e.g. the type of derivation of the noun (stem vs. root nominals), its aspectual characteristics, categorial meaning, semantic class, types of its valency complementations and their morphemic forms (including their distribution depending on the type of the noun and/or the type of the complementation itself, individually and in combinations), and the relation of the noun to its base verb including the differences in valency behaviour. Last but not least, the lexicon also quotes rich corpus evidence supporting the valency characteristics described in the given valency frames.
  10. A comparison of valency frames of nouns in the NomVallex I. lexicon and valency frames of their base verbs in VALLEX enables us to gain insight into systemic and non-systemic valency behaviour of Czech deverbal nouns.
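
As a rough illustration of points 7, 9 and 10, the Python sketch below models a valency frame as a list of slots (a functor plus a set of morphemic forms), filters lexical units by derivation type and semantic class, and reports forms that a caller-supplied reference does not list. Everything in it is an assumption made for illustration: the class names `Slot` and `LexicalUnit`, the helpers `select` and `unexpected_forms`, the form labels, and the toy frame values are hypothetical and do not reproduce the actual NomVallex I. data format, its query interface, or its entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    functor: str            # participant label, e.g. "ACT" or "PAT"
    forms: frozenset        # morphemic forms allowed for the slot


@dataclass(frozen=True)
class LexicalUnit:
    lemma: str
    derivation: str         # "stem" or "root" nominal
    semantic_class: str     # e.g. "Communication"
    frame: tuple            # tuple of Slot objects


def select(units, derivation=None, semantic_class=None):
    """Return the units matching every criterion that is not None."""
    return [
        u for u in units
        if (derivation is None or u.derivation == derivation)
        and (semantic_class is None or u.semantic_class == semantic_class)
    ]


def unexpected_forms(unit, expected):
    """Map functor -> forms in the unit's frame that `expected` does not list."""
    result = {}
    for slot in unit.frame:
        extra = slot.forms - expected.get(slot.functor, frozenset())
        if extra:
            result[slot.functor] = extra
    return result


# Toy values for illustration only; NOT copied from NomVallex I.
units = [
    LexicalUnit("namítání", "stem", "Communication",
                (Slot("ACT", frozenset({"genitive"})),
                 Slot("PAT", frozenset({"genitive"})))),
    LexicalUnit("námitka", "root", "Communication",
                (Slot("ACT", frozenset({"genitive"})),
                 Slot("PAT", frozenset({"genitive", "proti+dative"})))),
]

# Hypothetical reference forms supplied by the caller (an assumption).
expected = {"ACT": frozenset({"genitive"}), "PAT": frozenset({"genitive"})}

roots = select(units, derivation="root")
print([u.lemma for u in roots])
print(unexpected_forms(roots[0], expected))
```

The reference forms are passed in explicitly here, so the sketch takes no position on which forms count as systemic; in practice that decision would come from the comparison with the base-verb frames in VALLEX.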






