Exploring multilingual representations of language units in a shared vector space

Guidelines

The goal of the thesis is to study the internal representations of multilingual neural models and/or the mapping of several monolingual representations into one shared multilingual vector space. The thesis will try to answer the following questions:
How do multilingual models deal with different linguistic phenomena?
How well can monolingual embedding spaces be mapped into a single shared space? (A sketch of one such mapping follows the list of questions.)
Can lexical concepts and the relations between them be represented independently of language?
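
The second question refers to a concrete technique: aligning independently trained monolingual embedding spaces with an orthogonal mapping. The following is a minimal Python sketch of the supervised variant (orthogonal Procrustes, used for example as the refinement step in Conneau et al., 2017). The matrices src and tgt are hypothetical placeholders for embeddings of seed dictionary word pairs; the random data merely illustrates the call.

import numpy as np

def procrustes_mapping(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Return the orthogonal matrix W minimising ||src @ W - tgt||_F."""
    # The SVD of the cross-covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

# Toy usage: random vectors stand in for real monolingual embeddings.
rng = np.random.default_rng(0)
src = rng.standard_normal((1000, 300))           # source-language word vectors
true_w = np.linalg.qr(rng.standard_normal((300, 300)))[0]
tgt = src @ true_w                               # target vectors = rotated source vectors
w = procrustes_mapping(src, tgt)
print(np.allclose(src @ w, tgt))                 # mapped vectors match the target space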

References

Ian Goodfellow, Yoshua Bengio, Aaron Courville: Deep Learning, MIT Press, 2016

Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou: Word Translation Without Parallel Data, arXiv, 2017

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, arXiv, 2018

Mikel Artetxe, Holger Schwenk: Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond, arXiv, 2019

Telmo Pires, Eva Schlinger, Dan Garrette: How multilingual is Multilingual BERT?, arXiv, 2019