DeriNet

Lexical Network of Word-Formation Relations in Czech

DeriNet is a lexical network which models word-formation relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent derivational links (relations between derivatives and their base lexemes) or links connecting compounds or univerbated words with their base words.
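The node-and-edge structure described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the `Lexeme` class, its fields and its methods are hypothetical and are not the DeriNet API or file format. The example words are chosen by hand rather than read from the DeriNet data. The sketch covers derivational links only, where each derivative has a single base lexeme. A compound would need links to several base words.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical illustration class; not part of any DeriNet tool or library.
@dataclass(eq=False)
class Lexeme:
    """A node of the network: one lexeme plus a link to its base lexeme."""
    lemma: str
    pos: str
    parent: Optional["Lexeme"] = None            # base lexeme; None for a tree root
    children: List["Lexeme"] = field(default_factory=list)

    def derive_from(self, base: "Lexeme") -> None:
        """Add a derivational edge from this derivative to its base lexeme."""
        self.parent = base
        base.children.append(self)

    def root(self) -> "Lexeme":
        """Follow parent links up to the root of the derivational tree."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    def family(self) -> List[str]:
        """Return the lemmas of this lexeme and all its derivatives (pre-order)."""
        lemmas = [self.lemma]
        for child in self.children:
            lemmas.extend(child.family())
        return lemmas


# Hand-picked example: učit 'to teach' -> učitel 'teacher' -> učitelka, učitelský
ucit = Lexeme("učit", "V")
ucitel = Lexeme("učitel", "N")
ucitelka = Lexeme("učitelka", "N")
ucitelsky = Lexeme("učitelský", "A")

ucitel.derive_from(ucit)
ucitelka.derive_from(ucitel)
ucitelsky.derive_from(ucitel)

print(ucitelka.root().lemma)
print(ucit.family())
```

Storing one parent per node keeps each derivational family a tree, so finding the root of a lexeme is a simple walk up the parent links.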

The current version, DeriNet 2.3, contains over 1 million lexemes (sampled from the MorfFlex dictionary) and introduces several key improvements over version 2.2:

(a) the set of 1,040,126 lexemes is aligned with the latest version of MorfFlex CZ (version 2.1),

(b) 5,781 derivational trees containing loanwords are enriched with etymological information specifying their origins, adopted from the Czech Etymological Lexicon,

(c) 8,867 new derivational and 1,262 new compound relations have been identified, resulting in a total of 791,771 derivational and 7,598 compound relations, and

(d) the morphological segmentation and classification of morphs have been significantly enhanced.

More details on the current file format of DeriNet releases, which has been used since version 2.0, can be found in the paper by Jonáš Vidra et al. presented at the DeriMo 2019 workshop.

DeriNet 2.3 was released in January 2025. It is available in the LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

For older versions of the DeriNet data see here.

Online DeriNet Tools

DeriNet data can be searched online using two versions of DeriNet Search. DeriSearch v2 shows all pieces of information stored in the data, while DeriSearch v1 displays only derivational relations (not compounding relations). The data can also be viewed online using DeriNet Viewer.

Related projects

Universal Derivations (UDer)

DeriNet 2.1 is part of Universal Derivations (UDer), a collection of harmonized derivational resources for multiple languages. The current version of UDer contains derivational resources for a number of languages, all harmonized to the DeriNet file format. See the UDer page for more details.

Word-formation networks for other languages (created in the DeriNet-like format)

The following resources were created in cooperation with Poznan University of Technology:

  • DeriNet-style derivational networks for Czech, French, Polish, and Spanish created by a semi-supervised approach using a sequential pattern mining technique, as described in an article currently under review in the LRE journal: semi-supervised.zip (four generated networks plus our hand-annotated samples, for individual licenses see README)
  • Polish Word-Formation Network v. 0.5 (under CC-BY-NC-SA): polish-wfn-0.5.zip
  • Spanish Word-Formation Network v. 0.5 (under CC-BY-ND): spanish-wfn-0.5.zip
    • by Mateusz Lango

Related publications: