Multilingual pre-trained encoders: How far can we get with multilingual data?

Monday, 18 December, 2023 - 14:00



Jindřich Libovický (ÚFAL MFF UK)

Multilingual encoders pre-trained solely on monolingual data show surprising cross-lingual abilities. One thing that makes such monolingually trained multilingual encoders attractive is that they do not require explicit cross-lingual alignment using parallel data. Avoiding parallel data might have the advantage of not imposing the culture of the highest-resourced language on the model. But is that really so? In the talk, we will discuss several ways of improving cross-lingual alignment using monolingual data only. Further, we will present two case studies showing how the decision whether to use parallel data affects the way the models capture culture-related aspects of meaning, using an (almost) unsupervised interpretability method.

*** The talk will be delivered in person (MFF UK, Malostranské nám. 25, 4th floor, room S1) and will be streamed via Zoom. For details on how to join the Zoom meeting, please write to sevcikova et ufal.mff.cuni.cz ***

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
