MT Marathon 2024 Talks
MT Marathon includes keynote talks every day from Monday through Saturday.
Confirmed Speakers
Ona de Gibert, Joseph Attieh (University of Helsinki)
Knowledge distillation (TBA)
TBA
Raj Dabre (NICT, Japan)
Advances in Multilingual Machine Translation and Evaluation for Indian Languages
Given the proliferation of internet usage in India, machine translation of Indian languages has become an increasingly important topic. In this talk, I will cover recent advances in multilingual machine translation and evaluation for Indian languages. Specifically, I will focus on two major efforts: IndicTrans2 and IndicMT Eval. For IndicTrans2, I will describe how we scaled up both human and automatically mined data, and how we then used this data to develop robust open-source machine translation systems that outperform previously existing models, both open and closed-source. I will then discuss MT evaluation for Indian languages, covering how we developed meta-evaluation benchmarks and how we analyzed a large number of metrics to establish their efficacy. I will also briefly talk about IndicComet, a COMET model specially designed for Indian languages. Towards the end of my talk, I will briefly cover the future of Indian language machine translation, especially in the context of LLMs.
Liz Salesky (JHU)
Pixel models (TBA)
TBA
Ricardo Rei, Nuno Guerreiro, Sweta Agrawal (Unbabel, Instituto de Telecomunicações)
Tower LLM (TBA)
TBA
Vilém Zouhar (ETH Zürich)
Token(s) of Appreciation for BPE
Tokenization is present in almost all NLP pipelines, but it is rarely examined mathematically. During the talk, we'll formalize, show boundaries of, and overall grok the most popular tokenization algorithm, Byte-Pair Encoding. Using information theory, we'll also show what makes some tokenizations better than others and how to use this as a metric before training your expensive models. Lastly, we'll cover stochastic tokenization variants and talk about how the tokenization story is far from over...
Laurie Burchell (University of Edinburgh)
Language ID (TBA)
TBA
Tsz Kin Lam (University of Edinburgh)
Speech Translation (TBA)
TBA
Julius Cheng, Andreas Vlachos (University of Cambridge)
Decoding (TBA)
TBA