Monday, 6 January, 2020 - 14:00

ÚFAL PhD Students Microconference

Ivana Kvapilíková, Jonáš Vidra (ÚFAL MFF UK)

Ivana Kvapilíková: Unsupervised machine translation

Unsupervised machine translation is the task of performing machine translation (MT) without any translation resources at training time. Existing approaches show that it is possible to design an MT system that learns entirely from monolingual corpora. However, the translation quality still lags far behind that of supervised MT systems.

In this talk, I will present ideas for improving the existing models through better initialization with cross-lingual representations or through generating cleaner synthetic data for further training. I will focus on what I learned when exploring representations from multilingual language models such as mBERT and XLM.


Jonáš Vidra: Semi-supervised machine learning methods for developing derivational networks

Creating a derivational network, such as the Czech DeriNet, requires a large amount of time from skilled linguists, who have to annotate the word-formational links manually. My dissertation topic is to develop methods that make this process easier and thus allow us to create such networks for more languages in less time.

The two major pathways to be explored are monolingual derivation prediction (preliminary research by me, Lukáš Kyjánek, Mateusz Lango and others already shows that even simple machine learning methods are viable when some training data is available) and cross-lingual transfer (which could be used to bootstrap the monolingual methods for languages without any existing word-formational data).

In this talk, I will introduce the existing research in this area and the options I currently see for further development.