Word Formation Analyzer for Czech: Automatic Parent Retrieval and Classification of Word Formation Processes

Emil Svoboda, Magda Ševčíková

References:

  1. Marenglen Biba and Eva Gjati. Boosting text classification through stemming of composite words In Recent Advances in Intelligent Informatics, pages 185–194, Springer, 2014. (http://doi.org/10.1007/978-3-319-01778-5_19)
  2. Ivana Bozděchová. Tvoření slov skládáním, Institut sociálních vztahů, Praha, 1997.
  3. Johan Carlberger, Hercules Dalianis, Martin Duneld, and Ola Knutsson. Improving precision in information retrieval for Swedish using stemming In Proceedings of the 13th Nordic Conference of Computational Linguistics (NODALIDA 2001), pages 17–22, 2001.
  4. Elizaveta L. Clouet and Béatrice Daille. Splitting of compound terms in non-prototypical compounding languages In Workshop on Computational Approaches to Compound Analysis, pages 11–19, 2014. (http://doi.org/10.3115/v1/W14-5702)
  5. Jonáš Vidra, Zdeněk Žabokrtský, Magda Ševčíková, and Lukáš Kyjánek. DeriNet 2.0: Towards an All-in-One Word-Formation Resource In Proceedings of the 2nd Workshop on Resources and Tools for Derivational Morphology, pages 81–89, Charles University, 2019.
  6. Jonáš Vidra, Zdeněk Žabokrtský, Lukáš Kyjánek, Magda Ševčíková, Šárka Dohnalová, Emil Svoboda, and Jan Bodnár. DeriNet 2.1, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, 2021.
  7. Miloš Dokulil. Tvoření slov v češtině 1: Teorie odvozování slov, Academia, Praha, 1962.
  8. Ljiljana Dolamic and Jacques Savoy. Indexing and stemming approaches for the Czech language Information Processing & Management 45, pages 714–720, Elsevier, 2009. (http://doi.org/10.1016/j.ipm.2009.06.001)
  9. Petr Chmelař, David Hellebrand, Michal Hrušecký, and Vladimír Bartík. Nalezení slovních kořenů v češtině In Znalosti 2011: Sborník příspěvků 10. ročníku konference, pages 66–77, VŠB-Technical University of Ostrava, 2011.
  10. Birgit Hamp and Helmut Feldweg. GermaNet – a lexical-semantic net for German In Automatic information extraction and building of lexical semantic resources for NLP applications, pages 9–15, 1997.
  11. Oliver Hellwig and Sebastian Nehrdich. Sanskrit word segmentation using character-level recurrent and convolutional neural networks In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2754–2763, 2018. (http://doi.org/10.18653/v1/D18-1295)
  12. Verena Henrich and Erhard Hinrichs. Determining immediate constituents of compounds in GermaNet In Proceedings of the International Conference on Recent Advances in Natural Language Processing 2011, pages 420–426, 2011.
  13. Jack Hoeksema. Elative compounds in Dutch: Properties and developments In Intensivierungskonzepte bei Adjektiven und Adverben im Sprachenvergleich, pages 97–142, Kovač Verlag, Hamburg, 2012.
  14. Gérard Huet. A functional toolkit for morphological and phonological processing, application to a Sanskrit tagger Journal of Functional Programming 15, pages 573–614, Cambridge University Press, 2005. (http://doi.org/10.1017/S0956796804005416)
  15. Irina Krotova, Sergey Aksenov, and Ekaterina Artemova. A Joint Approach to Compound Splitting and Idiomatic Compound Detection In Proceedings of the 12th Language Resources and Evaluation Conference, pages 4410–4417, 2020.
  16. Jianqiang Ma, Verena Henrich, and Erhard Hinrichs. Letter sequence labeling for compound splitting In Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 76–81, 2016. (http://doi.org/10.18653/v1/W16-2012)
  17. Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, and Alexandra Birch. Marian: Fast Neural Machine Translation in C++ In Proceedings of ACL 2018, System Demonstrations, pages 116–121, 2018. (http://doi.org/10.18653/v1/P18-4020)
  18. Miloš Dokulil, Karel Horálek, Jiřina Hůrková, Miloslava Knappová, and Jan Petr. Mluvnice češtiny 1. Fonetika, fonologie, morfonologie a morfematika, tvoření slov, Academia, Praha, 1986.
  19. Martin Ološtiak and Marta Vojteková. Kompozitnosť a kompozícia: príspevok k charakteristike zložených slov na materiáli západoslovanských jazykov. Slovo a slovesnost 82, pages 95–117, 2021.
  20. Karel Pala and Pavel Šmerk. Derivancze – derivational analyzer of Czech In International Conference on Text, Speech, and Dialogue, pages 515–523, 2015. (http://doi.org/10.1007/978-3-319-24033-6_58)
  21. Martin F Porter. An algorithm for suffix stripping Program: electronic library and information systems 14, pages 130–137, MCB UP Ltd, 1980. (http://doi.org/10.1108/eb046814)
  22. Martin F. Porter. Snowball: A language for stemming algorithms, Accessed 21.01.2022, 15.00h, 2001.
  23. František Štícha, Miloslav Vondráček, Ivana Kolářová, Jana Bílková, and Ivana Svobodová. Akademická gramatika spisovné češtiny, Academia, Praha, 2013.
  24. František Štícha, Ivana Kolářová, Miloslav Vondráček, Ivana Bozděchová, Jana Bílková, Klára Osolsobě, Pavla Kochová, Zdeňka Opavská, Josef Šimandl, Lucie Kopášková, and Vojtěch Veselý. Velká akademická gramatika spisovné češtiny 1: Morfologie: Druhy slov / Tvoření slov, Academia, Praha, 2018.
  25. Kyoko Sugisaki and Don Tuggener. German compound splitting using the compound productivity of morphemes In 14th Conference on Natural Language Processing, pages 141–147, 2018.
  26. Emil Svoboda and Magda Ševčíková. Splitting and Identifying Czech Compounds: A Pilot Study In Proceedings of the Third International Workshop on Resources and Tools for Derivational Morphology (DeriMo 2021), pages 129–138, 2021.
  27. Michal Křen, Václav Cvrček, Tomáš Čapka, Anna Čermáková, Milena Hnátková, Lucie Chlumská, Tomáš Jelínek, Dominika Kováříková, Vladimír Petkevič, Pavel Procházka, Hana Skoumalová, Michal Škrabal, Petr Truneček, Pavel Vondřička, and Adrian Jan Zasina. SYN2015: Representative Corpus of Contemporary Written Czech In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2522–2528, 2016.
  28. Salvador Valera and Alba Ruz. Conversion in English: homonymy, polysemy and paronymy English Language and Linguistics 25, pages 181–204, 2021. (http://doi.org/10.1017/S1360674319000546)