Morphological Networks for Persian and Turkish: What Can Be Induced from Morpheme Segmentation?

Hamid Haghdoost, Ebrahim Ansari, Zdeněk Žabokrtský, Mahshid Nikravesh, Mohammad Mahmoudi

References:

  1. Ebrahim Ansari, Zdeněk Žabokrtský, Mohammad Mahmoudi, Hamid Haghdoost, and Jonáš Vidra. Supervised Morphological Segmentation Using Rich Annotated Lexicon. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 52–61, INCOMA Ltd., Varna, Bulgaria, 2019. (http://doi.org/10.26615/978-954-452-056-4_007)
  2. Mohsen Arabsorkhi and Mehrnoush Shamsfard. Unsupervised Discovery of Persian Morphemes. In Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations, pages 175–178, Association for Computational Linguistics, Stroudsburg, PA, USA, 2006. (http://doi.org/10.3115/1608974.1609002)
  3. Marion Baranes and Benoît Sagot. A Language-independent Approach to Extracting Derivational Relations from an Inflectional Lexicon. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2793–2799, European Language Resources Association (ELRA), Reykjavik, Iceland, 2014.
  4. Mahmood Bijankhan, Javad Sheykhzadegan, Mohammad Bahrani, and Masood Ghayoomi. Lessons from building a Persian written corpus: Peykare. Language Resources and Evaluation 45, pages 143–164, 2011.
  5. Stig-Arne Grönroos, Sami Virpioja, Peter Smit, and Mikko Kurimo. Morfessor FlatCat: An HMM-Based Method for Unsupervised and Semi-Supervised Learning of Morphology. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 1177–1185, Dublin City University and Association for Computational Linguistics, Dublin, Ireland, 2014.
  6. Kris Cao and Marek Rei. A Joint Model for Word Embedding and Word Morphology. In Proceedings of the 1st Workshop on Representation Learning for NLP, pages 18–26, Association for Computational Linguistics, Berlin, Germany, 2016. (http://doi.org/10.18653/v1/W16-1603)
  7. Mathias Creutz and Krista Lagus. Unsupervised Discovery of Morphemes. In Proceedings of the ACL-02 Workshop on Morphological and Phonological Learning, pages 21–30, Association for Computational Linguistics, 2002. (http://doi.org/10.3115/1118647.1118650)
  8. Mathias Creutz, Teemu Hirsimäki, Mikko Kurimo, Antti Puurula, Janne Pylkkönen, Vesa Siivola, Matti Varjokallio, Ebru Arisoy, Murat Saraçlar, and Andreas Stolcke. Morph-based Speech Recognition and Modeling of Out-of-vocabulary Words Across Languages. ACM Trans. Speech Lang. Process. 5, pages 3:1–3:29, ACM, New York, NY, USA, 2007. (http://doi.org/10.1145/1322391.1322394)
  9. John Goldsmith. Unsupervised Learning of the Morphology of a Natural Language. Computational Linguistics 27, pages 153–198, MIT Press, Cambridge, MA, USA, 2001. (http://doi.org/10.1162/089120101750300490)
  10. Nizar Habash and Bonnie Dorr. A categorial variation database for English. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, pages 17–23, 2003. (http://doi.org/10.3115/1073445.1073458)
  11. Hamid Haghdoost, Ebrahim Ansari, Zdeněk Žabokrtský, and Mahshid Nikravesh. Building a Morphological Network for Persian on Top of a Morpheme-Segmented Lexicon. In Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology, pages 91–100, 2019.
  12. Zellig Harris. From phoneme to morpheme. Language 31, pages 209–221, Linguistic Society of America, 1955. (http://doi.org/10.1007/978-94-017-6059-1_2)
  13. Nabil Hathout and Fiammetta Namer. Démonette, a French derivational morpho-semantic network. Linguistic Issues in Language Technology 11, pages 125–168, 2014.
  14. William Jones. A grammar of the Persian language 5, John Stockdale, 1807.
  15. Zbigniew Kaleta. Automatic Pairing of Perfective and Imperfective Verbs in Polish. In Proceedings of the 8th Language and Technology Conference, 2017.
  16. Akbar Karimi, Ebrahim Ansari, and Bahram Sadeghi Bigham. Extracting an English-Persian Parallel Corpus from Comparable Corpora. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7–12, 2018.
  17. Lívia Kőrtvélyessy. Cross-linguistic research into derivational networks. In Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology, pages 1–4, 2019.
  18. Oskar Kohonen, Sami Virpioja, Laura Leppänen, and Krista Lagus. Semi-supervised extensions to Morfessor baseline. In Proceedings of the Morpho Challenge 2010 Workshop, pages 30–34, 2010.
  19. Lukáš Kyjánek, Zdeněk Žabokrtský, Magda Ševčíková, and Jonáš Vidra. Universal Derivations Kickoff: A Collection of Harmonized Derivational Resources for Eleven Languages. In Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology, pages 101–110, 2019.
  20. Mateusz Lango, Magda Ševčíková, and Zdeněk Žabokrtský. Semi-Automatic Construction of Word-Formation Networks (for Polish and Spanish). In Proceedings of the 11th Language Resources and Evaluation Conference, European Language Resource Association, Miyazaki, Japan, 2018.
  21. Yoong Keok Lee, Aria Haghighi, and Regina Barzilay. Modeling syntactic context improves morphological segmentation. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning, pages 1–9, 2011.
  22. Ebrahim Ansari, Zdeněk Žabokrtský, Hamid Haghdoost, and Mahshid Nikravesh. Persian Morphologically Segmented Lexicon 0.5. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, 2019.
  23. Eleonora Litta, Marco Passarotti, and Chris Culy. Formatio formosa est. Building a Word Formation Lexicon for Latin. In Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) & Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016), Napoli, Italy, December 5–7, 2016.
  24. Magda Ševčíková and Zdeněk Žabokrtský. Word-Formation Network for Czech. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1087–1093, European Language Resources Association (ELRA), Reykjavik, Iceland, 2014.
  25. Karthik Narasimhan, Regina Barzilay, and Tommi Jaakkola. An unsupervised method for uncovering morphological chains. Transactions of the Association for Computational Linguistics 3, pages 157–167, MIT Press, 2015. (http://doi.org/10.1162/tacl_a_00130)
  26. Kemal Oflazer. Two-level description of Turkish morphology. Literary and Linguistic Computing 9, pages 137–148, Oxford University Press, 1994. (http://doi.org/10.1093/llc/9.2.137)
  27. Maciej Piasecki, Radoslaw Ramocki, and Marek Maziarz. Recognition of Polish Derivational Relations Based on Supervised Learning Scheme. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), pages 916–922, European Language Resources Association (ELRA), Istanbul, Turkey, 2012.
  28. Hoifung Poon, Colin Cherry, and Kristina Toutanova. Unsupervised Morphological Segmentation with Log-linear Models. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 209–217, Association for Computational Linguistics, Stroudsburg, PA, USA, 2009. (http://doi.org/10.3115/1620754.1620785)
  29. Hanieh Poostchi, Ehsan Zare Borzeshi, and Massimo Piccardi. BiLSTM-CRF for Persian Named-Entity Recognition ArmanPersoNERCorpus: the First Entity-Annotated Persian Dataset. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018, Miyazaki, Japan, May 7–12, 2018.
  30. Ahmed A Rafea and Khaled F Shaalan. Lexical analysis of inflected Arabic words using exhaustive search of an augmented transition network. Software: Practice and Experience 23, pages 567–588, Wiley Online Library, 1993. (http://doi.org/10.1002/spe.4380230602)
  31. Mohammad Sadegh Rasooli, Ahmed El Kholy, and Nizar Habash. Orthographic and Morphological Processing for Persian-to-English Statistical Machine Translation. In Proceedings of the Sixth International Joint Conference on Natural Language Processing, pages 1047–1051, Asian Federation of Natural Language Processing, Nagoya, Japan, 2013.
  32. Haşim Sak, Tunga Güngör, and Murat Saraçlar. Turkish language resources: Morphological parser, morphological disambiguator and web corpus. In International Conference on Natural Language Processing, pages 417–427, 2008. (http://doi.org/10.1007/978-3-540-85287-2_40)
  33. Elnaz Shafaei, Diego Frassinelli, Gabriella Lapesa, and Sebastian Padó. DErivCELEX: Development and Evaluation of a German Derivational Morphology Lexicon based on CELEX. In Proceedings of the DeriMo workshop, 2017.
  34. Jan Šnajder. DerivBase.hr: A High-Coverage Derivational Morphology Resource for Croatian. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3371–3377, European Language Resources Association (ELRA), Reykjavik, Iceland, 2014.
  35. Hossein Taghi-Zadeh, Mohammad Hadi Sadreddini, Mohammad Hasan Diyanati, and Amir Hossein Rasekh. A new hybrid stemming method for Persian language. Digital Scholarship in the Humanities 32, pages 209–221, 2015. (http://doi.org/10.1093/llc/fqv053)
  36. Robert Underhill. Turkish grammar. MIT Press, Cambridge, MA, 1976.
  37. Jesús Vilares, David Cabrero, and Miguel A. Alonso. Applying Productive Derivational Morphology to Term Indexing of Spanish Texts. In Computational Linguistics and Intelligent Text Processing, pages 336–348, Springer Berlin Heidelberg, Berlin, Heidelberg, 2001. (http://doi.org/10.1007/3-540-44686-9_34)
  38. Sami Virpioja, Ville T. Turunen, Sebastian Spiegler, Oskar Kohonen, and Mikko Kurimo. Empirical Comparison of Evaluation Methods for Unsupervised Learning of Morphology. Traitement Automatique des Langues 52, pages 45–90, Association pour le Traitement Automatique des Langues (ATALA), 2011.
  39. Krešimir Šojat, Matea Srebačić, Tin Pavelić, and Marko Tadić. CroDeriV: a new resource for processing Croatian morphology. In Proceedings of the Language Resources and Evaluation-LREC 14, pages 3366–3370, Citeseer, 2014.
  40. Zdeněk Žabokrtský, Magda Ševčíková, Milan Straka, Jonáš Vidra, and Adéla Limburská. Merging Data Resources for Inflectional and Derivational Morphology in Czech. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 1307–1314, European Language Resources Association (ELRA), Portorož, Slovenia, 2016.
  41. Britta Zeller, Jan Šnajder, and Sebastian Padó. DErivBase: Inducing and Evaluating a Derivational Morphology Resource for German. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1201–1211, Association for Computational Linguistics, Sofia, Bulgaria, 2013.