Guidelines
The goal of the thesis is to study the internal representations of multilingual neural models and/or the mapping of several monolingual representation spaces into one shared multilingual vector space. The thesis will try to answer the following questions:
How do multilingual models deal with different language phenomena?
How well can monolingual embedding spaces be mapped into one shared space? (A minimal mapping sketch follows this list.)
Can lexical concepts and relations between them be represented independently of language?
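As an illustration of the second question, here is a minimal sketch of supervised cross-lingual mapping via the orthogonal Procrustes solution, which underlies the refinement step of Conneau et al. (2017). The array shapes, the dimensionality, and the random stand-in vectors are hypothetical placeholders for real monolingual embeddings and a seed translation dictionary.

    import numpy as np

    def procrustes_map(X, Y):
        # X, Y: (n, d) arrays whose i-th rows embed a translation pair from a
        # seed dictionary (source and target language, respectively).
        # Returns an orthogonal W minimizing ||X W - Y||_F, so that X @ W ~ Y.
        U, _, Vt = np.linalg.svd(X.T @ Y)   # SVD of the cross-covariance matrix
        return U @ Vt                       # closed-form Procrustes solution

    # Hypothetical usage with random stand-ins for real fastText/word2vec vectors:
    rng = np.random.default_rng(0)
    src = rng.standard_normal((5000, 300))  # e.g. Czech word vectors
    tgt = rng.standard_normal((5000, 300))  # e.g. English word vectors
    W = procrustes_map(src, tgt)
    mapped = src @ W                        # source vectors in the target space

Because W is orthogonal, distances and cosine similarities within the source space are preserved; this closed-form supervised mapping is a standard baseline against which the unsupervised approach of Conneau et al. can be compared.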
References
Ian Goodfellow, Yoshua Bengio, Aaron Courville: Deep Learning, MIT Press, 2016
Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou: Word Translation Without Parallel Data, arXiv, 2017
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, arXiv, 2018
Mikel Artetxe, Holger Schwenk: Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond, arXiv, 2019
Telmo Pires, Eva Schlinger, Dan Garrette: How multilingual is Multilingual BERT?, arXiv, 2019