Transferring Word-Formation Networks Between Languages

Jonáš Vidra, Zdeněk Žabokrtský

References:

  1. Marion Baranes and Benoît Sagot. A Language-independent Approach to Extracting Derivational Relations from an Inflectional Lexicon. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2793–2799, European Language Resources Association (ELRA), Reykjavik, Iceland, 2014.
  2. Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Abbott Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóga, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, and Ekaterina Vylomova. UniMorph 4.0: Universal Morphology. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 840–855, European Language Resources Association, Marseille, France, 2022.
  3. Khuyagbaatar Batsuren, Gábor Bella, and Fausto Giunchiglia. CogNet: A Large-Scale Cognate Database. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3136–3145, Association for Computational Linguistics, Florence, Italy, 2019. (http://doi.org/10.18653/v1/P19-1302)
  4. Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics 5, pages 135–146, 2017. (http://doi.org/10.1162/tacl_a_00051)
  5. Yoeng-Jin Chu and Tseng-Hong Liu. On the shortest arborescence of a directed graph. Science Sinica 14, pages 1396–1400, 1965.
  6. Kenneth Ward Church. Char_align: A Program for Aligning Parallel Texts at the Character Level. In 31st Annual Meeting of the Association for Computational Linguistics, pages 1–8, Association for Computational Linguistics, Columbus, Ohio, USA, 1993. (http://doi.org/10.3115/981574.981575)
  7. Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. Unsupervised Cross-lingual Representation Learning at Scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8440–8451, Association for Computational Linguistics, Online, 2020. (http://doi.org/10.18653/v1/2020.acl-main.747)
  8. Chris Dyer, Victor Chahuneau, and Noah A. Smith. A Simple, Fast, and Effective Reparameterization of IBM Model 2. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 644–648, Association for Computational Linguistics, Atlanta, Georgia, 2013.
  9. Prakhar Gupta and Martin Jaggi. Obtaining Better Static Word Embeddings Using Contextual Embedding Models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 5241–5253, Association for Computational Linguistics, Online, 2021. (http://doi.org/10.18653/v1/2021.acl-long.408)
  10. Katharina Hämmerl, Jindřich Libovický, and Alexander Fraser. Combining Static and Contextualised Multilingual Embeddings. In Findings of the Association for Computational Linguistics: ACL 2022, pages 2316–2329, Association for Computational Linguistics, Dublin, Ireland, 2022. (http://doi.org/10.18653/v1/2022.findings-acl.182)
  11. Nabil Hathout. Acquisition of morphological families and derivational series from a machine readable dictionary. In Proceedings of the 6th Décembrettes, pages 166–180, Cascadilla, Bordeaux, France, 2008.
  12. Nabil Hathout and Fiammetta Namer. Démonette, A French Derivational Morpho-Semantic Network. Linguistic Issues in Language Technology 11, pages 125–162, 2014. (http://doi.org/10.33011/lilt.v11i.1369)
  13. Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, and Alexandra Birch. Marian: Fast Neural Machine Translation in C++. In Proceedings of ACL 2018, System Demonstrations, pages 116–121, Association for Computational Linguistics, Melbourne, Australia, 2018. (http://doi.org/10.18653/v1/P18-4020)
  14. Lukáš Kyjánek. Morphological Resources of Derivational Word-Formation Relations, ÚFAL MFF UK, Praha, Czechia, 2018.
  15. Lukáš Kyjánek, Zdeněk Žabokrtský, Magda Ševčíková, and Jonáš Vidra. Universal Derivations Kickoff: A Collection of Harmonized Derivational Resources for Eleven Languages. In Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology (DeriMo 2019), pages 101–110, ÚFAL MFF UK, Praha, Czechia, 2019.
  16. Mateusz Lango, Zdeněk Žabokrtský, and Magda Ševčíková. Semi-Automatic Construction of Word-Formation Networks. Language Resources and Evaluation 55, pages 3–32, 2021. (http://doi.org/10.1007/s10579-019-09484-2)
  17. Ryan McDonald, Slav Petrov, and Keith Hall. Multi-Source Transfer of Delexicalized Dependency Parsers. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 62–72, Association for Computational Linguistics, Edinburgh, Scotland, UK, 2011.
  18. Tomáš Musil, Jonáš Vidra, and David Mareček. Derivational Morphological Relations in Word Embeddings. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 173–180, Association for Computational Linguistics, Florence, Italy, 2019. (http://doi.org/10.18653/v1/W19-4818)
  19. Joakim Nivre and others. Universal Dependencies v1: A Multilingual Treebank Collection. In Proceedings of the 10th International Conference on Language Resources and Evaluation, pages 1659–1666, ELRA, 2016.
  20. Rudolf Rosa and Zdeněk Žabokrtský. Unsupervised Lemmatization as Embeddings-Based Word Clustering, 2019.
  21. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12, pages 2825–2830, 2011.
  22. Rico Sennrich, Barry Haddow, and Alexandra Birch. Improving Neural Machine Translation Models with Monolingual Data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 86–96, Association for Computational Linguistics, Berlin, Germany, 2016. (http://doi.org/10.18653/v1/P16-1009)
  23. Michel Simard, George F. Foster, and Pierre Isabelle. Using cognates to align sentences in bilingual corpora. In Proceedings of the Fourth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Montréal, Canada, 1992.
  24. Milan Straka and Jana Straková. Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe. In Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 88–99, Association for Computational Linguistics, Vancouver, Canada, 2017. (http://doi.org/10.18653/v1/K17-3009)
  25. Emil Svoboda and Magda Ševčíková. Word Formation Analyzer for Czech: Automatic Parent Retrieval and Classification of Word Formation Processes. The Prague Bulletin of Mathematical Linguistics 118, pages 55–73, 2022. (http://doi.org/10.14712/00326585.019)
  26. Jörg Tiedemann. Parallel Data, Tools and Interfaces in OPUS. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2214–2218, European Language Resources Association (ELRA), Istanbul, Turkey, 2012.
  27. Zdeněk Žabokrtský, Magda Ševčíková, Milan Straka, Jonáš Vidra, and Adéla Limburská. Merging Data Resources for Inflectional and Derivational Morphology in Czech. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pages 1307–1314, European Language Resources Association, Paris, France, 2016.
  28. Britta Zeller, Jan Šnajder, and Sebastian Padó. DErivBase: Inducing and Evaluating a Derivational Morphology Resource for German. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 1201–1211, Sofia, Bulgaria, 2013.
  29. Jiajun Zhang and Chengqing Zong. Exploiting Source-side Monolingual Data in Neural Machine Translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1535–1545, Association for Computational Linguistics, Austin, Texas, 2016. (http://doi.org/10.18653/v1/D16-1160)
  30. Yuan Zhang, David Gaddy, Regina Barzilay, and Tommi Jaakkola. Ten Pairs to Tag – Multilingual POS Tagging via Coarse Mapping between Embeddings. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1307–1317, Association for Computational Linguistics, San Diego, California, 2016. (http://doi.org/10.18653/v1/N16-1156)