Principal investigator (ÚFAL): 
Project Manager (ÚFAL): 
Provider: 
Grant id: 
PRIMUS/23/SCI/023
ÚFAL budget: 
9158000
Duration: 
2023-2026

Language Neutral and Culturally Aware Multilingual Neural Sentence Representations

Multilingual sentence representations have recently made it possible to represent many languages in a single model and thus to transfer task-specific models between languages in a zero-shot manner. These methods could revolutionize computational linguistics and natural language processing (NLP) by unifying the processing of all languages in a single framework. Yet the level of cross-lingual alignment in current models is not sufficient for that.

We believe two points were neglected in previous work. First, theoretical work suggests that physical perception might help to ground meaning and eventually improve the language neutrality of multilingual representations. Second, language meaning is socially constructed and inseparable from culture, which sets inherent limits on language neutrality. Multilingual representations must therefore be aware of the cultural dimension of meaning, which should be interpretable and controllable.

In this project, we tackle these two issues of multilingual representation. As a result, we want to make NLP models available in many languages without the need for explicit translation or task-specific data in multiple languages.

Publications

  1. Katharina Hämmerl, Björn Dieseroth, Patrick Schramowski, Jindřich Libovický, Constantin A. Rothkopf, Alexander Fraser, Kristian Kersting (2023): Speaking Multiple Languages Affects the Moral Bias of Language Models. In: Findings of the Association for Computational Linguistics: ACL 2023, pp. 2137-2156, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-959429-62-3 (url, bibtex)
  2. Katharina Hämmerl, Alina Fastowski, Jindřich Libovický, Alexander Fraser (2023): Exploring Anisotropy and Outliers in Multilingual Language Models for Cross-Lingual Semantic Sentence Similarity. In: Findings of the Association for Computational Linguistics: ACL 2023, pp. 7023-7037, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-959429-62-3 (url, bibtex)
  3. Jindřich Helcl, Jindřich Libovický (2023): CUNI Submission to MRL 2023 Shared Task on Multi-lingual Multi-task Information Retrieval. In: Proceedings of the 2nd Workshop on Multi-lingual Representation Learning (MRL), pp. 302-309, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 979-8-89176-056-1 (pdf, local PDF, bibtex)
  4. Hynek Kydlíček, Jindřich Libovický (2023): A Dataset and Strong Baselines for Classification of Czech News Texts. In: 26th International Conference, TSD 2023, pp. 33-44, Springer, Cham, Switzerland, ISBN 978-3-031-40497-9 (url, bibtex)
  5. Jindřich Libovický (2023): Is a Prestigious Job the same as a Prestigious Country? A Case Study on Multilingual Sentence Embeddings and European Countries. In: Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 1000-1010, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-71-1 (pdf, local PDF, bibtex)