Project Manager (ÚFAL): 
Provider: 
Grant id: START/HUM/010
Duration: 2021-2023

A data-based approach to competition in word-formation: selected semantic categories across seven languages

The project pursues data-based research into competition in word-formation. It aims to compare the word-formation processes and strategies that speakers employ to express the semantic concepts of diminutiveness and femaleness in seven European languages (two Slavic, three Germanic, and two Romance). Derivatives, compounds, and syntactic phrases expressing these concepts in the analysed languages (cf. 'Polizistin' in German, 'policewoman' in English, and 'mujer policía' in Spanish) will be identified either by exploiting available language resources and tools (some of which were developed by project team members) or by using tools and methods designed specifically for the project. A team of four PhD students of computational linguistics will develop machine learning models that simulate how these semantic concepts are expressed in the languages studied and reveal which linguistic properties influence native speakers' choices among the competing alternatives. The results are expected to be relevant both to the linguistic discussion of competition in word-formation and to the modelling of word-formation in Natural Language Processing.

Reg. No. CZ.02.2.69/0.0/0.0/19_073/0016935.

Publications

Journal, Proceedings & Reports

Data & Software

  • Kyjánek, L.; Bonami, O. 2022. Package of word embeddings of Czech from a large corpus, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-4920.
  • Kyjánek, L. 2022. Web-based Annotation Interface for Derivational Morphology. GitHub: https://github.com/lukyjanek/uder-annotation-interface. Online application: https://lukyjanek.github.io/subpages/uder-annotation-interface/UDerAnnotation.html.
  • Žabokrtský, Z.; Bafna, N.; Bodnár, J.; Kyjánek, L.; Svoboda, E.; Ševčíková, M.; Vidra, J. et al. 2022. Universal Segmentations 1.0, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-4629.
  • Kyjánek, L.; Lyashevskaya, O.; Nedoluzhko, A.; Vodolazsky, D.; Žabokrtský, Z. 2021. DeriNet.RU 0.5, Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, DeriNetRU-0.5.zip. Also released in the Universal Derivations collection v1.1.
  • Vidra, J.; Žabokrtský, Z.; Kyjánek, L.; Ševčíková, M.; Dohnalová, Š.; Svoboda, E.; Bodnár, J. 2021. DeriNet 2.1, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-3765.
  • Kyjánek, L.; Žabokrtský, Z.; Vidra, J.; Ševčíková, M. 2021. Universal Derivations v1.1, LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University, http://hdl.handle.net/11234/1-3247.

Presentations & Posters