Czech title: Vybrané derivační vztahy pro automatické zpracování češtiny
Postdoc project GA ČR P406/12/P175
Principal investigator: Magda Ševčíková
The project deals with selected word-formation relations in Czech, namely the relations between adjectives and their derivatives. On the basis of their semantic relation to the base adjective, derivatives were classified into two groups: syntactic derivatives (which have the same meaning as their base adjectives but differ in syntactic function) and lexical derivatives (which differ from their base adjectives in meaning). Based on our theoretical findings, an annotation proposal reflecting the derivational relations was integrated into the deep-syntactic annotation of the Prague Dependency Treebank (PDT) and included in PDT 3.0.
A database of words derived from adjectives (AdjDeriNet) was created under the project:
AdjDeriNet: Words Derived from Adjectives in Czech
Authors: Magda Ševčíková, Zdeněk Žabokrtský
The data consists of pairs of base adjectives and their derivatives. It contains 17,942 base adjectives (1st column in the tsv file; source_lemma in the xml file) that are base words for 26,329 lexemes of several parts of speech (2nd column in tsv; target_lemma in xml); the part of speech of the derivative is specified (3rd column in tsv; target_pos in xml):
The most productive base adjectives are:
The list of the most productive affixes:
Nouns ending in -as (e.g. kliďas ‘phlegmatic person’), verbs in -at (zelenat ‘to turn green’), and adjectives with the suffix -ičký (maličký ‘very small’) are among the least frequent deadjectival derivatives in the database.
The development procedure focused on precision rather than recall (for instance, derivatives formed by prefixation, or by prefixation combined with suffixation, were omitted).
The AdjDeriNet data can be downloaded from the LINDAT-Clarin infrastructure either in a simple plain-text format (tab-separated columns) or in a self-documenting XML-based format. It can be used under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License (CC-BY-NC-SA 3.0).
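The three-column layout of the tab-separated file (base adjective, derivative, part of speech of the derivative) can be read with a few lines of Python. The following is only a minimal sketch under assumptions: the sample rows and the part-of-speech tags (N, V, A, D) are made up for illustration and are not taken from the released data, and the function names are hypothetical. It groups derivatives by their base adjective, which is one way to rank base adjectives by productivity.

```python
import csv
import io
from collections import Counter, defaultdict

# Illustrative sample only: these rows and POS tags are assumptions,
# not copied from the released AdjDeriNet file.
SAMPLE = (
    "klidný\tkliďas\tN\n"
    "zelený\tzelenat\tV\n"
    "zelený\tzeleně\tD\n"
    "malý\tmaličký\tA\n"
)


def read_pairs(lines):
    """Yield (base_adjective, derivative, derivative_pos) from tab-separated lines."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        yield row[0], row[1], row[2]


def summarize(pairs):
    """Group derivatives by base adjective and count derivatives per part of speech."""
    derivatives_by_base = defaultdict(list)
    pos_counts = Counter()
    for base, derivative, pos in pairs:
        derivatives_by_base[base].append(derivative)
        pos_counts[pos] += 1
    return derivatives_by_base, pos_counts


derivatives_by_base, pos_counts = summarize(read_pairs(io.StringIO(SAMPLE)))

# Base adjectives ordered by the number of derivatives recorded for them.
most_productive = sorted(
    derivatives_by_base.items(), key=lambda item: len(item[1]), reverse=True
)

for base, derivatives in most_productive:
    print(base, len(derivatives), ", ".join(derivatives))
print(dict(pos_counts))
```

To run the same functions on a downloaded file, the `io.StringIO(SAMPLE)` argument could be replaced with a file object opened with `encoding="utf-8"` and `newline=""`; the file name depends on the download and is not assumed here.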