A data-based approach to competition in word-formation: selected semantic categories across seven languages

The project takes a data-based approach to competition in word-formation. It aims to compare the word-formation processes and strategies that speakers use to express two semantic concepts, diminutiveness and femaleness, in seven European languages (two Slavic, three Germanic, and two Romance). The derivatives, compounds, and syntactic phrases that express these concepts in the languages studied (cf. German 'Polizistin', English 'policewoman', and Spanish 'mujer policía') will be identified in one of two ways. The first is to exploit existing language resources and tools, some of them developed by members of the project team. The second is to use tools and methods designed specifically for the project. A team of four PhD students in computational linguistics will develop machine learning models that simulate how these concepts are expressed in the languages studied. The models will also reveal which linguistic properties influence native speakers' choices among the competing alternatives. The results are expected to inform both the linguistic debate on competition in word-formation and the modelling of word-formation in Natural Language Processing.