Monday, 3 May, 2021 - 14:00

UFAL PhD conference - Part I

Jan Bodnár
Lukáš Kyjánek
Emil Svoboda (ÚFAL MFF UK)


Jan Bodnár: Morphological segmentation across languages



Morphemes are the smallest parts of a word that carry meaning. They can serve various purposes: in the word “sleep|er|s”, for example, “sleep” carries the core of the meaning, “er” is a derivational morpheme signifying a person who performs the given activity, and “s” is an inflectional ending marking the plural form.

The goal of my dissertation is to develop a general tool for the morphological segmentation of words from various languages, helping other algorithms see the internal structure of words and thus better recognize their meanings.

I will talk about the general approaches to the problem, the experiments I have done so far, and the general challenges connected with it.




Lukáš Kyjánek: Derivational semantics in language resources



Derivation, which is the major source of new lexemes in the Czech language, is a process that adds affixes to existing lexemes to change their meanings, e.g., 'stavitel' ('builder') > 'stavitelka' ('female builder'). The affix -ka also derives nouns with other meanings, for instance, the diminutive 'skříňka' ('small wardrobe') and the instrument noun 'žehlička' ('iron'). These meanings added through derivation are explicitly captured in only a few language resources. The aim of my dissertation is to find a set of derivational meanings, implement it in the data resources and, finally, compare how the same derivational meanings vary across different languages.

In the talk, I will introduce existing approaches to derivational semantics across language resources and linguistic theories. I will present several pilot experiments on labelling a few selected derivational meanings using (un)supervised computational methods.




Emil Svoboda: Modelling compounds across languages



Composition is a word-formation process wherein new naming units are created by combining existing lexical items, as in the Czech rybolov = [‘ryba’ + ‘lov’] (fishery), the English flowerpot = [‘flower’ + ‘pot’], the Bulgarian четоводство [чета + водя] (bookkeeping), or the Ancient Greek φιλοσοφία [‘φίλειν’ + ‘σοφία’] (philosophy).

The problem to be tackled in my dissertation is to construct a theoretical model of this word-formation process that is general enough to be applicable to a range of different languages, but grounded enough to be usable for computational methods. This is a challenging task, because languages tend to differ significantly in how much they use compounding, in what types of compounds they exhibit and how often, and even in how their respective linguistic traditions conceptualize and define compound words.

The model is intended to serve the enrichment and harmonization of multilingual data resources, which is to be carried out in parallel with the development of the model. For this reason, I am also to build tools that automatically detect compound words and find their constituent words.

In my presentation, I will set forth the numerous obstacles that await me in the pursuit of the aforementioned goal, how I plan on overcoming these, what tools, methods and knowledge I have at my disposal to do so and, finally, how successful I have been so far.




