Principal investigator (ÚFAL): 
Project Manager (ÚFAL): 
Provider: 
Grant id: 384826
Duration: 2026-2027

The proposed project focuses on computational modeling of competition in natural languages, with particular emphasis on the morpheme, the smallest meaningful unit of word formation. Competition is understood here by analogy with the evolutionary competition of biological species for the same resources. We aim to apply computational models of population dynamics, in particular Lotka–Volterra models, to linguistic resources and use them to describe competitive relationships between morphemes. In biology, such models are commonly used to describe phenomena such as population growth or disease spread; in linguistics, similar approaches have so far rarely been applied.
Our data-driven approach will use both diachronic and multilingual resources, including corpora as well as lexical and typological databases. We then intend to compare the results with theoretical formulations of competitive mechanisms such as competitive exclusion, the Elsewhere Principle, and the Tolerance Principle.

The aim of the project is to develop computational models that capture competitive dynamics in language. The main focus will be on the morpheme as the basic, meaning-bearing element of language. To this end, the project will develop suitable software tools, including diachronic and multilingual language models. Using diachronic embedding vectors, we will examine phenomena such as semantic shift, the gradual change in the meaning of a word, morpheme, or phrase. The same vectors will also be used to quantify the degree of competition among the examined units. Combined with corpus frequency, which serves as an analogue of population size, these competition measures will help estimate the parameters of the differential equations underlying the competition models. The resulting models will then be compared with existing linguistic hypotheses and interpreted in the context of current linguistic theory.
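
As a rough illustration of the kind of model described above, the following sketch integrates the textbook competitive Lotka–Volterra system, dN_i/dt = r_i N_i (1 - sum_j alpha_ij N_j / K_i), for two competing units. It is not the project's implementation. The reading of N_i as the corpus frequency of a morpheme follows the description above; all numbers, the function names, and the choice of simple Euler integration are assumptions made for the example. In particular, the competition coefficients alpha_ij are set by hand here, whereas the project intends to derive the degree of competition from embedding vectors.

```python
def competitive_lv_step(n, r, k, alpha, dt):
    """Advance the competitive Lotka-Volterra system by one explicit Euler step.

    n     -- current "population sizes" (here: hypothetical corpus frequencies)
    r     -- intrinsic growth rates
    k     -- carrying capacities
    alpha -- competition matrix; alpha[i][j] is the pressure of unit j on unit i
    """
    updated = []
    for i in range(len(n)):
        pressure = sum(alpha[i][j] * n[j] for j in range(len(n)))
        dn = r[i] * n[i] * (1.0 - pressure / k[i])
        # Frequencies cannot be negative, so clamp at zero.
        updated.append(max(n[i] + dt * dn, 0.0))
    return updated


def simulate(n0, r, k, alpha, dt=0.01, steps=20000):
    """Integrate the system from the initial state n0 and return the final state."""
    n = list(n0)
    for _ in range(steps):
        n = competitive_lv_step(n, r, k, alpha, dt)
    return n


# Hypothetical scenario: two affixes with fully overlapping function
# (alpha = 1 everywhere) but different carrying capacities.
n0 = [50.0, 50.0]
r = [1.0, 1.0]
k = [1000.0, 800.0]
alpha = [[1.0, 1.0],
         [1.0, 1.0]]

final = simulate(n0, r, k, alpha)
print(final)
```

With full overlap and unequal carrying capacities, the unit with the larger capacity approaches its capacity while the other is driven towards zero, which is the competitive exclusion scenario mentioned above. Lowering the off-diagonal entries of alpha below 1 in this toy setting allows both units to coexist.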