Jindřich Libovický

office
N233
email
libovicky@ufal.mff.cuni.cz
phone
+420 951 552 954
address
IMPAKT – „N“
V Holešovičkách 747/2
180 00 Praha 8
Czech Republic

Main Research Interests

multilingual language modeling, machine translation, multilingual tokenization, combining language and vision, cross-lingual fairness

I am a researcher at the institute, where I lead a group working on multilinguality and cross-lingual fairness, with a focus on multilingual language modeling and machine translation. Our research covers how language models align across languages, tokenization methods that work well for multiple languages, and how model performance differs across languages, with the goal of improving fairness.

Projects

Current projects as principal investigator

Language Neutral and Culturally Aware Multilingual Neural Sentence Representations (2023 – 2026)
This grant comes from PRIMUS, a Charles University program that supports young PIs in starting their own groups. The project investigates how multilingual neural language models represent and transfer knowledge across languages, with a particular focus on cross-lingual alignment and semantic similarity in encoder models and sentence embeddings.

Better Tokenization for Multilingual Language Models and Machine Translation (2025 – 2027)
This is a grant from the Czech Science Foundation. The project aims to develop semantically-grounded subword segmentation techniques that create more meaningful and cross-linguistically alignable units, thereby reducing vocabulary size and improving parameter efficiency in massively multilingual language models.

As a team member

Linguistics, Artificial Intelligence, and Language and Speech Technologies: From Research to Applications (2025 – 2028)
This project aims to strengthen collaboration between two academic institutions and three innovative companies in language and speech technologies for AI systems. It will bridge classical linguistics with modern data-driven approaches to enable widespread AI application deployment across all economic and social sectors while respecting legal frameworks and societal priorities. I am a work package leader in this project.

Curriculum Vitae

Experience

  • Research Associate @Charles University (from 2022)
  • Researcher @Ludwig-Maximilians-Universität München (2019 – 2021)
  • Research Assistant @Charles University (2013 – 2019)
  • Software Engineering Intern @Google (2017)
  • Analytic Linguist Intern @Google (2016)
  • Research Development Support @IBM Czech Republic (2012 – 2015)

Education

  • Ph.D. in Computational Linguistics, Charles University, Faculty of Mathematics and Physics (2013 – 2019)
  • Master's degree in Media Studies (2014 – 2017), Charles University, Faculty of Social Sciences
  • Master's degree in Computational Linguistics (2011 – 2013), Charles University, Faculty of Mathematics and Physics
  • Bachelor's degree in Media Studies (2011 – 2014), Charles University, Faculty of Social Sciences
  • Bachelor's degree in Computer Science (2007 – 2011), Charles University, Faculty of Mathematics and Physics

Teaching

I am happy to supervise NLP-related bachelor's and master's theses. Have a look at some prospective topics.

Selected Bibliography

The full list of publications is on a separate page.

Felix Friedrich, Katharina Hämmerl, Patrick Schramowski, Manuel Brack, Jindřich Libovický, Kristian Kersting, Alexander Fraser.
Multilingual Text-to-Image Generation Magnifies Gender Stereotypes.
In: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025

Jindřich Libovický, Jindřich Helcl.
Lexically Grounded Subword Segmentation.
In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024

Jindřich Libovický, Rudolf Rosa, Alexander Fraser.
On the Language Neutrality of Pre-trained Multilingual Representations.
In: Findings of the Association for Computational Linguistics: EMNLP 2020. 2020

Shruti Palaskar, Jindřich Libovický, Spandana Gella, Florian Metze.
Multimodal Abstractive Summarization for How2 Videos.
In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019

Jindřich Libovický, Jindřich Helcl.
End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification.
In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2018

Jindřich Libovický, Jindřich Helcl.
Attention Strategies for Multi-Source Sequence-to-Sequence Learning.
In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2017

 

See the full list on my Google Scholar profile or our institute's database.

Visit my blog at jlibovicky.github.io.

Students

Currently supervised PhD students

Andrei Manea (since 2023)

Gianluca Vico (since 2024)

Adnan Al Ali (since 2026)

Katharina Hämmerl (with Alexander Fraser at TUM, since 2021)