Jindřich Libovický

office: N233
email: libovicky@ufal.mff.cuni.cz
phone: +420 951 552 954
address: IMPAKT – „N“
V Holešovičkách 747/2
180 00 Praha 8
Czech Republic

Main Research Interests

multilingual language modeling, machine translation, multilingual tokenization, combining language and vision, cross-lingual fairness

I am a researcher at the institute, where my group works on multilinguality and cross-lingual fairness, with a focus on multilingual language modeling and machine translation. Our research covers several key areas: how language models align across different languages, tokenization methods that work well for many languages at once, and how language model performance differs across languages, with the aim of making models fairer.

Projects

Current projects as principal investigator

Language Neutral and Culturally Aware Multilingual Neural Sentence Representations (2023 – 2026)
This project is funded by PRIMUS, a Charles University program that supports young PIs in establishing their own research groups. The project investigates how multilingual neural language models represent and transfer knowledge across languages, with a particular focus on cross-lingual alignment and semantic similarity in encoder models and sentence embeddings.

Better Tokenization for Multilingual Language Models and Machine Translation (2025 – 2027)
This project is funded by the Czech Science Foundation. It aims to develop semantically grounded subword segmentation techniques that produce more meaningful units that are easier to align across languages, thereby reducing vocabulary size and improving parameter efficiency in massively multilingual language models.

As a team member

Linguistics, Artificial Intelligence, and Language and Speech Technologies: From Research to Applications (2025 – 2028)
This project aims to strengthen collaboration between two academic institutions and three innovative companies in language and speech technologies for AI systems. It will bridge classical linguistics with modern data-driven approaches to enable the widespread deployment of AI applications across all economic and social sectors, while respecting legal frameworks and societal priorities. I am a work package leader in this project.

Curriculum Vitae

Experience

Research Associate @Charles University (since 2022)

Researcher @Ludwig-Maximilians-Universität München (2019 – 2021)

Research Assistant @Charles University (2013 – 2019)

Software Engineering Intern @Google (2017)

Analytic Linguist Intern @Google (2016)

Research Development Support @IBM Czech Republic (2012 – 2015)

Education

Ph.D. in Computational Linguistics, Charles University, Faculty of Mathematics and Physics (2013 – 2019)

Master's degree in Media Studies, Charles University, Faculty of Social Sciences (2014 – 2017)

Master's degree in Computational Linguistics, Charles University, Faculty of Mathematics and Physics (2011 – 2013)

Bachelor's degree in Media Studies, Charles University, Faculty of Social Sciences (2011 – 2014)

Bachelor's degree in Computer Science, Charles University, Faculty of Mathematics and Physics (2007 – 2011)

Teaching

NPFL129 Introduction to Machine Learning with Python (lectures)

NPFL124 Introduction to Natural Language Processing (lectures 9, 10, 13)

NPFL140 Large Language Models

Co-organizer of AI in Context: a series of invited talks and follow-up discussion seminars

I am happy to supervise NLP-related bachelor's and master's theses. Have a look at some prospective topics.

Selected Bibliography

Google Scholar
ORCID: 0000-0001-7717-4090
Scopus ID: 56875384500
Researcher ID: D-4799-2017
The full list of publications is on a separate page.

Felix Friedrich, Katharina Hämmerl, Patrick Schramowski, Manuel Brack, Jindřich Libovický, Kristian Kersting, Alexander Fraser.
Multilingual Text-to-Image Generation Magnifies Gender Stereotypes.
In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025

Jindřich Libovický, Jindřich Helcl.
Lexically Grounded Subword Segmentation.
In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024

Jindřich Libovický, Rudolf Rosa, Alexander Fraser.
On the Language Neutrality of Pre-trained Multilingual Representations.
In: Findings of the Association for Computational Linguistics: EMNLP 2020. 2020

Shruti Palaskar, Jindřich Libovický, Spandana Gella, Florian Metze.
Multimodal Abstractive Summarization for How2 Videos.
In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019

Jindřich Libovický, Jindřich Helcl.
End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification.
In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2018

Jindřich Libovický, Jindřich Helcl.
Attention Strategies for Multi-Source Sequence-to-Sequence Learning.
In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2017

See the full list on my Google Scholar profile or our institute's database.

Visit my blog at jlibovicky.github.io.

Students

Currently supervised PhD students

Andrei Manea (since 2023)

Gianluca Vico (since 2024)

Adnan Al Ali (since 2026)

Katharina Hämmerl (with Alexander Fraser at TUM, since 2021)

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
