DeriNet

Lexical Network of Word-Formation Relations in Czech

DeriNet is a lexical network which models word-formation relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent derivational links (relations between derivatives and their base lexemes) or links connecting compounds with their base words.
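The following minimal Python sketch makes the node-and-edge model concrete. It is not taken from the DeriNet tooling. It assumes that each lexeme points to at most one base lexeme, so a derivational family forms a tree. The class and function names (`Lexeme`, `link`, `base_chain`, `family`) are hypothetical. The Czech lemmas (učit 'to teach', učitel 'teacher', učitelka 'female teacher', učitelský 'teacher's') are illustrative examples, not records copied from the data. Compound links, which connect one lexeme to several base words, are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Lexeme:
    """A node of the network (hypothetical structure, for illustration only)."""
    lemma: str
    pos: str
    # repr/compare disabled so the parent <-> children cycle never recurses
    parent: Optional[Lexeme] = field(default=None, repr=False, compare=False)
    children: list[Lexeme] = field(default_factory=list, repr=False, compare=False)


def link(derivative: Lexeme, base: Lexeme) -> None:
    """Add a derivational edge between a base lexeme and its derivative."""
    derivative.parent = base
    base.children.append(derivative)


def base_chain(lexeme: Lexeme) -> list[str]:
    """Follow base links upwards until a lexeme without a base is reached."""
    chain = [lexeme.lemma]
    while lexeme.parent is not None:
        lexeme = lexeme.parent
        chain.append(lexeme.lemma)
    return chain


def family(root: Lexeme) -> list[str]:
    """List all lemmas in the tree rooted at `root`, in pre-order."""
    lemmas = [root.lemma]
    for child in root.children:
        lemmas.extend(family(child))
    return lemmas


# Illustrative derivational family (not extracted from DeriNet)
ucit = Lexeme("učit", "V")
ucitel = Lexeme("učitel", "N")
ucitelka = Lexeme("učitelka", "N")
ucitelsky = Lexeme("učitelský", "A")
link(ucitel, ucit)
link(ucitelka, ucitel)
link(ucitelsky, ucitel)

print(base_chain(ucitelka))  # ['učitelka', 'učitel', 'učit']
print(family(ucit))          # ['učit', 'učitel', 'učitelka', 'učitelský']
```

Storing a single parent reference per lexeme keeps the path from a derivative to its root a simple loop. Representing compounds would require an additional list of base words per node.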

The present version, DeriNet 2.1, contains over 1 million lexemes (sampled from the MorfFlex dictionary) connected by 782 thousand derivational relations, 144 relations of conversion, 295 relations of univerbisation, 1,952 links pointing from compounds to their base words, and 50,533 links connecting orthographic variants.

Major changes include autogenerated full morphological segmentations of all lemmas, 202 affixoid nodes serving as bases for (neoclassical) compounding, annotation of the corpus frequency of lexemes, annotation of verb conjugation classes, links between orthographic variants of lexemes, and a pilot annotation of univerbisation.

More details on the file format used in DeriNet releases since version 2.0 can be found in the paper by Jonáš Vidra et al. presented at the DeriMo 2019 workshop.
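The paper is the authoritative description of the format. The following Python sketch is only a rough illustration of reading such data. It assumes a tab-separated layout with one lexeme per line, blank lines between derivational trees, and ten columns. The sketch uses only four of them: the node ID (column 1), lemma (column 3), part of speech (column 4), and parent ID (column 7, empty for tree roots). These column positions are our assumption and should be checked against the paper. The function names, the "tree.node"-style IDs, and the three sample rows (with their remaining columns left empty) are hypothetical.

```python
# Column positions are ASSUMED (0-based) from our reading of the DeriNet 2.0
# format; verify them against the DeriMo 2019 paper before real use.
COL_ID, COL_LEMMA, COL_POS, COL_PARENT = 0, 2, 3, 6


def read_lexemes(lines):
    """Map node ID -> (lemma, POS, parent ID or None) for tab-separated lines."""
    lexemes = {}
    for line in lines:
        line = line.rstrip("\r\n")  # keep trailing tabs: empty columns matter
        if not line.strip():
            continue  # blank lines separate derivational trees
        cols = line.split("\t")
        parent_id = cols[COL_PARENT] or None  # empty parent ID marks a root
        lexemes[cols[COL_ID]] = (cols[COL_LEMMA], cols[COL_POS], parent_id)
    return lexemes


def path_to_root(lexemes, node_id):
    """Collect lemmas from a node up to the root of its tree."""
    lemmas = []
    while node_id is not None:
        lemma, _pos, node_id = lexemes[node_id]
        lemmas.append(lemma)
    return lemmas


# Hypothetical toy rows (10 columns each; unused columns left empty).
rows = [
    ["0.0", "", "učit", "V", "", "", "", "", "", ""],
    ["0.1", "", "učitel", "N", "", "", "0.0", "", "", ""],
    ["0.2", "", "učitelka", "N", "", "", "0.1", "", "", ""],
]
sample = ["\t".join(row) + "\n" for row in rows] + ["\n"]

lexemes = read_lexemes(sample)
print(path_to_root(lexemes, "0.2"))  # ['učitelka', 'učitel', 'učit']
```

For a real release, `sample` would be replaced by an open text file (read as UTF-8). Splitting on tabs instead of stripping whole lines preserves the positions of empty columns.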

DeriNet 2.1 was released in July 2021. It is available in the LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License (CC-BY-NC-SA).

For older versions of the DeriNet data see here.

Online DeriNet Tools

DeriNet data can be searched online using two versions of DeriNet Search: DeriSearch v2 displays all information stored in the data, while DeriSearch v1 displays only derivational relations (not compounding relations). The data can also be browsed online using DeriNet Viewer.

Related projects

Universal Derivations (UDer)

DeriNet 2.1 is part of Universal Derivations (UDer), a collection of harmonized derivational resources for multiple languages. The current version of UDer contains derivational resources for several languages, all harmonized to the DeriNet file format. See the UDer page for more details.

Word-formation networks for other languages (created in the DeriNet-like format)

The following resources were created in cooperation with Poznan University of Technology:

  • DeriNet-style derivational networks for Czech, French, Polish, and Spanish, created by a semi-supervised approach using a sequential pattern mining technique, as described in an article currently under review in the Language Resources and Evaluation (LRE) journal: semi-supervised.zip (four generated networks plus our hand-annotated samples; see README for individual licenses)
  • Polish Word-Formation Network v. 0.5 (under CC-BY-NC-SA): polish-wfn-0.5.zip
  • Spanish Word-Formation Network v. 0.5 (under CC-BY-ND): spanish-wfn-0.5.zip
    • by Mateusz Lango

Related publications: