Jindřich Libovický
Main Research Interests
multilingual language modeling, machine translation, multilingual tokenization, combining language and vision, cross-lingual fairness
I am a researcher at the institute, where my group and I work on multilinguality and cross-lingual fairness, mainly in multilingual language modeling and machine translation. Our research covers how language models align across languages, tokenization methods that work well for multiple languages, and how model performance differs across languages, with the aim of improving fairness.
Projects
Current projects as principal investigator
Language Neutral and Culturally Aware Multilingual Neural Sentence Representations (2023 – 2026)
This is a grant from PRIMUS, a Charles University program that supports young PIs in starting their own groups. The project investigates how multilingual neural language models represent and transfer knowledge across different languages, with a particular focus on cross-lingual alignment and semantic similarity in encoder models and sentence embeddings.
Better Tokenization for Multilingual Language Models and Machine Translation (2025 – 2027)
This is a grant from the Czech Science Foundation. The project aims to develop semantically grounded subword segmentation techniques that create more meaningful and cross-linguistically alignable units, thereby reducing vocabulary size and improving parameter efficiency in massively multilingual language models.
As a team member
Linguistics, Artificial Intelligence, and Language and Speech Technologies: From Research to Applications (2025 – 2028)
This project aims to strengthen collaboration between two academic institutions and three innovative companies in language and speech technologies for AI systems. It will bridge classical linguistics with modern data-driven approaches to enable widespread AI application deployment across all economic and social sectors while respecting legal frameworks and societal priorities. I am a work package leader in this project.
Curriculum Vitae
Experience
- Research Associate @Charles University (from 2022)
- Researcher @Ludwig-Maximilians-Universität München (2019 – 2021)
- Research Assistant @Charles University (2013 – 2019)
- Software Engineering Intern @Google (2017)
- Analytic Linguist Intern @Google (2016)
- Research Development Support @IBM Czech Republic (2012 – 2015)
Education
- Ph.D. in Computational Linguistics, Charles University, Faculty of Mathematics and Physics (2013 – 2019)
- Master's degree in Media Studies, Charles University, Faculty of Social Sciences (2014 – 2017)
- Master's degree in Computational Linguistics, Charles University, Faculty of Mathematics and Physics (2011 – 2013)
- Bachelor's degree in Media Studies, Charles University, Faculty of Social Sciences (2011 – 2014)
- Bachelor's degree in Computer Science, Charles University, Faculty of Mathematics and Physics (2007 – 2011)
Teaching
- NPFL129 Introduction to Machine Learning with Python (lectures)
- NPFL124 Introduction to Natural Language Processing (lectures 9, 10, 13)
- NPFL140 Large Language Models
- Co-organizing AI in Context: a series of invited talks and follow-up discussion seminars
I am happy to supervise NLP-related bachelor's and master's theses. Have a look at some prospective topics.
Selected Bibliography
- Google Scholar
- ORCID: 0000-0001-7717-4090
- Scopus ID: 56875384500
- Researcher ID: D-4799-2017
The full list of publications is on a separate page.
Jindřich Libovický, Helmut Schmid, Alexander Fraser.
Why don't people use character-level machine translation?
In: Findings of the Association for Computational Linguistics: ACL 2022. 2022
Jindřich Libovický, Alexander Fraser.
Neural String Edit Distance.
In: Proceedings of the Sixth Workshop on Structured Prediction for NLP. 2022
Katharina Hämmerl, Jindřich Libovický, Alexander Fraser.
Combining Static and Contextualised Multilingual Embeddings.
In: Findings of the Association for Computational Linguistics: ACL 2022. 2022
Jindřich Libovický, Rudolf Rosa, Alexander Fraser.
On the Language Neutrality of Pre-trained Multilingual Representations.
In: Findings of the Association for Computational Linguistics: EMNLP 2020. 2020
Shruti Palaskar, Jindřich Libovický, Spandana Gella, Florian Metze.
Multimodal Abstractive Summarization for How2 Videos.
In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019
Jindřich Libovický, Jindřich Helcl.
End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification.
In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2018
Jindřich Libovický, Jindřich Helcl, David Mareček.
Input Combination Strategies for Multi-Source Transformer Decoder.
In: Proceedings of the Third Conference on Machine Translation. 2018
Jindřich Libovický, Jindřich Helcl.
Attention Strategies for Multi-Source Sequence-to-Sequence Learning.
In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2017
See the full list on my Google Scholar profile or our institute's database.
Recent Blog Posts
Visit my blog at jlibovicky.github.io.
Students
Currently supervised PhD students
Andrei Manea (since 2023)
Gianluca Vico (since 2024)
Katharina Hämmerl (with Alexander Fraser at TUM, since 2021)