Babel Octopus: Robust Multi-Source Speech Translation

At present, events such as conferences, international negotiations, and business meetings rely on human interpreters to bridge the language barrier. Major events are simultaneously interpreted into the world languages, but national languages are often neglected for capacity reasons. Recent advances in artificial intelligence have produced a promising technology: speech translation (reminiscent of the Babel Fish from The Hitchhiker’s Guide to the Galaxy). Still, the technology is not yet sufficiently robust to background noise, varied speaker accents, disfluent speech, or a lack of language resources (e.g. unknown words or translations).

We propose Babel Octopus, a robust speech translation system that leverages multiple parallel audio streams in different languages (e.g. the original speaker and an interpreter). We focus primarily on speech-to-text translation.

This goal will be achieved by: (1) using multiple concurrent speech sources; (2) utilizing multilingual input to help with disambiguation and increase translation quality; and (3) overcoming insufficient, imbalanced, or domain-mismatched text and speech data through unsupervised training and data augmentation.

Publications

Polák, P., & Bojar, O. (2021). Coarse-To-Fine And Cross-Lingual ASR Transfer. In Proceedings of the 21st Conference Information Technologies – Applications and Theory (ITAT 2021).
Polák, P., Singh, M., & Bojar, O. (2021). Explainable Quality Estimation: CUNI Eval4NLP Submission. In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems.

Reg. No. CZ.02.2.69/0.0/0.0/19_073/0016935

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
