DeriNet

Lexical Network of Word-Formation Relations in Czech

DeriNet is a lexical network which models word-formation relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent derivational links (relations between derivatives and their base lexemes) or links connecting compounds or univerbated words with their base words.
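
The node-and-edge structure described above can be illustrated with a minimal Python sketch. It is not the DeriNet API or file format: the `Lexeme` class, the `link`, `root_of`, and `subtree` helpers, the relation labels, and the example words are all hypothetical and chosen only to show how lexemes and their base words might be connected.

```python
class Lexeme:
    """A node of the network (hypothetical class, not part of DeriNet)."""

    def __init__(self, lemma, pos):
        self.lemma = lemma
        self.pos = pos
        self.parents = []    # base lexeme(s): one for a derivative, several for a compound
        self.children = []   # lexemes formed from this one
        self.relation = None # illustrative label: "derivation" or "compounding"


def link(child, parents, relation):
    """Add an edge from each base lexeme to the word formed from it."""
    child.parents = list(parents)
    child.relation = relation
    for parent in parents:
        parent.children.append(child)


def root_of(lexeme):
    """Follow the first parent upward until a lexeme with no base word is reached."""
    while lexeme.parents:
        lexeme = lexeme.parents[0]
    return lexeme


def subtree(lexeme):
    """Yield the lexeme and everything formed from it, depth-first."""
    yield lexeme
    for child in lexeme.children:
        yield from subtree(child)


# Illustrative derivational tree: učit 'teach' > učitel 'teacher' > učitelka, učitelský
ucit = Lexeme("učit", "VERB")
ucitel = Lexeme("učitel", "NOUN")
ucitelka = Lexeme("učitelka", "NOUN")
ucitelsky = Lexeme("učitelský", "ADJ")
link(ucitel, [ucit], "derivation")
link(ucitelka, [ucitel], "derivation")
link(ucitelsky, [ucitel], "derivation")

# Illustrative compound with two base words: velký 'big' + město 'city'
velky = Lexeme("velký", "ADJ")
mesto = Lexeme("město", "NOUN")
velkomesto = Lexeme("velkoměsto", "NOUN")
link(velkomesto, [velky, mesto], "compounding")

tree_lemmas = [lexeme.lemma for lexeme in subtree(ucit)]
print(root_of(ucitelka).lemma, tree_lemmas)
```

In this sketch a derivative has exactly one parent while a compound has several, which is one simple way to keep derivational trees and compound links in the same structure.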

The current version, DeriNet 2.3, contains over 1 million lexemes (sampled from the MorfFlex dictionary) and introduces several key improvements over version 2.2:

(a) the set of 1,040,126 lexemes is aligned with the latest version of MorfFlex CZ (version 2.1),

(b) 5,781 derivational trees containing loanwords are enriched with etymological information specifying their origins, adopted from the Czech Etymological Lexicon,

(c) 8,867 new derivational and 1,262 new compound relations have been identified, resulting in a total of 791,771 derivational and 7,598 compound relations, and

(d) the morphological segmentation and classification of morphs have been significantly enhanced.

The current file format of DeriNet releases, in use since version 2.0, is described in the paper by Jonáš Vidra et al. presented at the DeriMo 2019 workshop.

DeriNet 2.3 was released in January 2025. It is available in the LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

For older versions of the DeriNet data see here.

Online DeriNet Tools

DeriNet data can be searched online using two versions of DeriNet Search. DeriSearch v2 shows all pieces of information stored in the data, while DeriSearch v1 displays only derivational relations (not compounding relations). The data can be viewed online using DeriNet Viewer.

Related projects

Universal Derivations (UDer)

DeriNet 2.1 is part of Universal Derivations (UDer), a collection of derivational resources for multiple languages, all harmonized to the DeriNet file format. See the UDer page for more details.

Word-formation networks for other languages (created in the DeriNet-like format)

DeriNet.RU 0.5 for Russian (under CC-BY-NC-SA 3.0 license): derinet-ru-0.5.zip
- by Lukáš Kyjánek et al.; part of the UDer collection
DeriNet.ES 0.6 for Spanish (under CC-BY-NC-SA 3.0 license): derinet-es-2019-06-10.tsv
- by Ján Faryad; part of the UDer collection
DeriNet.FA 0.5 for Farsi (under CC-BY-NC-SA 4.0 license)
- by Ebrahim Ansari et al.; part of the UDer collection

The following resources were created in cooperation with Poznan University of Technology:

DeriNet-style derivational networks for Czech, French, Polish, and Spanish created by a semi-supervised approach using a sequential pattern mining technique, as described in an article currently under review in the LRE journal: semi-supervised.zip (four generated networks plus our hand-annotated samples, for individual licenses see README)
Polish Word-Formation Network v. 0.5 (under CC-BY-NC-SA): polish-wfn-0.5.zip
- by Mateusz Lango; part of the UDer collection
Spanish Word-Formation Network v. 0.5 (under CC-BY-ND): spanish-wfn-0.5.zip
- by Mateusz Lango

Related publications:

Emil Svoboda & Magda Ševčíková. PaReNT (Parent Retrieval Neural Tool): A Deep Dive into Word Formation Across Languages. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Torino: ELRA and ICCL, 2024, pp. 12611–12621.
Emil Svoboda & Magda Ševčíková. Compounds in Universal Dependencies: A Survey in Five European Languages. In Proceedings of the 6th Workshop on Research in Computational Linguistic Typology and Multilingual Natural Language Processing. Stroudsburg: ACL, 2024, pp. 88–99.
Lukáš Kyjánek et al. Constructing a Lexical Resource of Russian Derivational Morphology. In Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022). Marseille, 2022, pp. 2788-2797.
Magda Ševčíková & Zdeněk Žabokrtský. Lexikální databáze DeriNet jako formální model tvoření slov v češtině. Seminář Ústavu Českého národního korpusu, 23 November 2021.
Jonáš Vidra & Zdeněk Žabokrtský. Transferring Word-Formation Networks Between Languages. In Proceedings of the Third Workshop on Resources and Tools for Derivational Morphology (DeriMo 2021). France, 2021, pp. 135-144.
Emil Svoboda & Magda Ševčíková. Splitting and Identifying Czech Compounds: A Pilot Study. In Proceedings of the Third Workshop on Resources and Tools for Derivational Morphology (DeriMo 2021). France, 2021, pp. 125-134.
Magda Ševčíková et al. Agent noun formation in Czech: An empirical study on suffix rivalry. In Second Workshop on Paradigmatic Word Formation Modelling, 2021, pp. 65-68.
Mateusz Lango et al. Semi-automatic construction of word-formation networks. Language Resources and Evaluation, 54, 2020, pp. 1-30.
Jan Bodnár et al. Semi-supervised Induction of Morpheme Boundaries in Czech Using a Word-Formation Network. In Proceedings of the 23rd International Conference Text, Speech, and Dialogue (TSD 2020), 2020, pp. 189-196.
Jonáš Vidra & Zdeněk Žabokrtský. Next Step in Online Querying and Visualization of Word-Formation Networks. In Proceedings of the 23rd International Conference Text, Speech, and Dialogue (TSD 2020), 2020, pp. 114-152.
Lukáš Kyjánek et al. Universal Derivations 1.0, A Growing Collection of Harmonised Word-Formation Resources. The Prague Bulletin of Mathematical Linguistics, 2020, 115(2), pp. 5-30.
Hamid Haghdoost et al. Morphological Networks for Persian and Turkish: What Can Be Induced from Morpheme Segmentation? The Prague Bulletin of Mathematical Linguistics, 2020, 115(2), pp. 105-127.
Lukáš Kyjánek. Harmonisation of Language Resources for Word-Formation of Multiple Languages. Master’s thesis, supervised by Magda Ševčíková. Prague, 2020. Unpublished thesis.
Magda Ševčíková & Lukáš Kyjánek. Introducing Semantic Labels into the DeriNet Network. Journal of Linguistics, 2019, 70(2), pp. 412-423.
Lukáš Kyjánek et al. Universal Derivations Kickoff: A Collection of Harmonized Derivational Resources for Eleven Languages. In Proceedings of the Second Workshop on Resources and Tools for Derivational Morphology (DeriMo 2019). Prague, 2019, pp. 101-110.
Jonáš Vidra et al. DeriNet 2.0: Towards an All-in-One Word-Formation Resource. In Proceedings of the Second Workshop on Resources and Tools for Derivational Morphology (DeriMo 2019). Prague, 2019, pp. 81-89.
Rudolf Rosa & Zdeněk Žabokrtský. Attempting to separate inflection and derivation using vector space representations. In Proceedings of the Second Workshop on Resources and Tools for Derivational Morphology (DeriMo 2019). Prague, 2019, pp. 61-70.
Hamid Haghdoost et al. Building a Morphological Network for Persian on Top of a Morpheme-Segmented Lexicon. In Proceedings of the Second Workshop on Resources and Tools for Derivational Morphology (DeriMo 2019). Prague, 2019, pp. 91-100.
Ján Faryad. Identifikace derivačních vztahů ve španělštině. ÚFAL Technical Report TR-2019-63. Prague, 2019.
Lukáš Kyjánek. Morphological Resources of Derivational Word-Formation Relations. ÚFAL Technical Report TR-2018-61. Prague, 2018.
Mateusz Lango et al. Semi-Automatic Construction of Word-Formation Networks (for Polish and Spanish). In Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, 2018, pp. 1853-1860.
Jonáš Vidra. Morphological segmentation of Czech words. Master's thesis, supervised by Zdeněk Žabokrtský. Prague, 2018. Unpublished thesis.
Magda Ševčíková et al. A language resource specialized in Czech word-formation: Recent achievements in developing the DeriNet database. Presented at the SlaviCorp 2018 conference. Prague, 2018.
Jonáš Vidra & Zdeněk Žabokrtský. Online Software Components for Accessing Derivational Networks. In Proceedings of the Workshop on Resources and Tools for Derivational Morphology (DeriMo 2017). Milano, 2017, pp. 129-139.
Magda Ševčíková et al. Identification of aspectual pairs of verbs derived by suffixation in the lexical database DeriNet. In Proceedings of the Workshop on Resources and Tools for Derivational Morphology (DeriMo 2017). Milano, 2017, pp. 105-116.
Magda Ševčíková. Modelování slovotvorných vztahů ve slovní zásobě češtiny. Talk at the Seminar of Formal Linguistics, Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague, May 2017.
Magda Ševčíková et al. Lexikální síť DeriNet: elektronický zdroj pro výzkum derivace v češtině. Časopis pro moderní filologii, 98:1, 2016, pp. 62-76.
Zdeněk Žabokrtský et al. Merging Data Resources for Inflectional and Derivational Morphology in Czech. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016). Portorož, 2016, pp. 1307-1314.
Magda Ševčíková & Zdeněk Žabokrtský. Word-Formation Network for Czech. In Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014). Reykjavík, 2014, pp. 1087-1093.
Magda Ševčíková & Zdeněk Žabokrtský. Talk at the Seminar of Formal Linguistics, Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Prague, December 2014 (synchronized with DeriNet 0.9):
- Magda Ševčíková's slides
- Zdeněk Žabokrtský's slides
