Principal investigator (ÚFAL): 
Project Manager (ÚFAL): 
Provider: 
Grant id: PRIMUS/23/SCI/023
ÚFAL budget: 9158000
Duration: 2023–2026

Recently, multilingual sentence representations have made it possible to represent many languages in a single model and thus to transfer task-specific models between languages in a zero-shot fashion. These methods could revolutionize computational linguistics and natural language processing (NLP) by unifying the processing of all languages in a single framework. However, the language neutrality of current models is not yet sufficient for that.

We believe two points were neglected in previous work. First, theoretical work suggests that physical perception might help ground meaning and thereby improve the language neutrality of multilingual representations. Second, language meaning is socially constructed and inseparable from culture, which sets inherent limits on language neutrality. Multilingual representations must therefore be aware of the cultural dimension of meaning, which should be interpretable and controllable.

In this project, we tackle these two issues of multilingual representation. As a result, we aim to make NLP models available in many languages without the need for explicit translation or for task-specific training data in multiple languages.