At present, events such as conferences, international negotiations, and business meetings rely on human interpreters to bridge the language barrier. Major events are simultaneously interpreted into the world languages, but national languages are often neglected for capacity reasons. Recent advances in artificial intelligence have given rise to a promising technology: speech translation (popularly likened to the Babel Fish from The Hitchhiker’s Guide to the Galaxy). Still, the technology is not yet sufficiently robust against background noise, varied speaker accents, and disfluent speech, and it suffers from a lack of language resources (e.g. unknown words or translations).
We propose Babel Octopus: a robust speech translation system that leverages multiple parallel audio streams in different languages (e.g. the original source and an interpreter). We focus primarily on speech-to-text translation.
This goal will be achieved by: (1) using multiple concurrent speech sources; (2) utilizing multilingual input to help with disambiguation and increase translation quality; (3) overcoming insufficient, imbalanced, or domain-mismatched text and speech data through unsupervised training and data augmentation.
Reg. No. CZ.02.2.69/0.0/0.0/19_073/0016935