Principal investigator (ÚFAL): 
Provider: 
Grant id: 825460
Duration: 2019-2021
Projects: 

ELITR

European Live Translator

The goal of the ELITR project is to remove the language barrier in communication within and among European citizens, companies, institutes and organizations, whether at large assemblies such as conferences, in smaller live discussions such as workshops, or in discussions held over long distances such as formal or informal online meetings. Within the project, we will create an automatic subtitling system for live meetings and conference presentations and provide a spoken language translation (interpreting) system. Furthermore, the project will design and implement automatic minuting, i.e. the creation of structured summaries from the automatic transcript of a discussion. We also aim to advance the state of the art in delivering high-quality machine translation at the level of whole documents and in scaling it up to multilingual settings with dozens of source and/or target languages. The technologies will be tested by the Supreme Audit Office of the Czech Republic and by alfaview®, a German online conferencing system.

You can find more at: https://elitr.eu/

  • Univerzita Karlova
  • NKÚ
  • The University of Edinburgh
  • Karlsruher Institut für Technologie
  • PerVoice SPA
  • Alfatraining Bildungszentrum GmbH

Publications

  1. Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Věra Kloudová, Surafel Melaku Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nadejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alex Waibel, Changhan Wang, Shinji Watanabe (2022): FINDINGS OF THE IWSLT 2022 EVALUATION CAMPAIGN. In: Proceedings of the 19th International Conference on Spoken Language Translation, pp. 98-157, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-41-4
  2. Muskan Garg, Seema Wazarkar, Muskaan Singh, Ondřej Bojar (2022): Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pp. 6837-6847, European Language Resources Association, Marseille, France, ISBN 979-10-95546-72-6
  3. Peter Polák, Muskaan Singh, Anna Nedoluzhko, Ondřej Bojar (2022): ALIGNMEET: A Comprehensive Tool for Meeting Annotation, Alignment, and Evaluation. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pp. 1771-1779, European Language Resources Association, Marseille, France, ISBN 979-10-95546-72-6
  4. Antonios Anastasopoulos, Ondřej Bojar, Jacob Bremerman, Roldano Cattoni, Maha Elbayad, Marcello Federico, Xutai Ma, Satoshi Nakamura, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Alex Waibel, Changhan Wang, Matthew Wiesner (2021): FINDINGS OF THE IWSLT 2021 EVALUATION CAMPAIGN. In: Proceedings of the 18th International Conference on Spoken Language Translation, pp. 1-29, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-954085-74-9
  5. Ebrahim Ansari, Ondřej Bojar, Barry Haddow, Mohammad Mahmoudi (2021): SLTev: Comprehensive Evaluation of Spoken Language Translation. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pp. 71-79, Association for Computational Linguistics (ACL), Stroudsburg, PA, USA, ISBN 978-1-954085-05-3
  6. Ondřej Bojar, Dominik Macháček, Sangeet Sagar, Otakar Smrž, Jonáš Kratochvíl, Peter Polák, Ebrahim Ansari, Mohammad Mahmoudi, Rishu Kumar, Dario Franceschini, Chiara Canton, Ivan Simonini, Thai-Son Nguyen, Felix Schneider, Sebastian Stüker, Alex Waibel, Barry Haddow, Rico Sennrich, Philip Williams (2021): ELITR Multilingual Live Subtitling: Demo and Strategy. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pp. 271-277, Association for Computational Linguistics (ACL), Stroudsburg, PA, USA, ISBN 978-1-954085-05-3
  7. Ondřej Bojar, Vojtěch Srdečný, Rishu Kumar, Otakar Smrž, Felix Schneider, Barry Haddow, Phil Williams, Chiara Canton (2021): Operating a Complex SLT System with Speakers and Human Interpreters. In: Proceedings of Machine Translation Summit XVIII: 1st Workshop on Automatic Spoken Language Translation in Real-World Settings, pp. 23-34, Association for Machine Translation in the Americas, Stroudsburg, PA, USA
  8. Tirthankar Ghosal, Muskaan Singh, Anna Nedoluzhko, Ondřej Bojar (2021): Report on the SIGDial 2021 Special Session on Summarization of Dialogues and Multi-Party Meetings (SummDial). In: ACM SIGIR Forum, ISSN 0163-5840, December 2021, pp. 1-17
  9. Matyáš Kopp, Vladislav Stankov, Jan Oldřich Krůza, Pavel Straňák, Ondřej Bojar (2021): ParCzech 3.0: A Large Czech Speech Corpus with Rich Metadata. In: 24th International Conference on Text, Speech and Dialogue, pp. 293-304, Springer, Cham, Switzerland, ISBN 978-3-030-83526-2
  10. Dominik Macháček, Matúš Žilinec, Ondřej Bojar (2021): Lost in Interpreting: Speech Translation from Source or Interpreter? In: Proceedings of INTERSPEECH 2021, pp. 2376-2380, ISCA, Baixas, France
  11. Muskaan Singh, Tirthankar Ghosal, Ondřej Bojar (2021): An Empirical Performance Analysis of State-of-the-Art Summarization Models for Automatic Minuting. In: Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation, pp. 50-60, Association for Computational Linguistics, Stroudsburg, PA, USA
  12. Erion Çano, Ondřej Bojar (2020): How Many Pages? Paper Length Prediction from the Metadata. In: 4th International Conference on Natural Language Processing and Information Retrieval, pp. 91-95, ACM, New York, USA, ISBN 978-1-4503-7760-7
  13. Erion Çano, Ondřej Bojar (2020): Human or Machine: Automating Human Likeliness Evaluation of NLG Texts (Electronic). In: arXiv.org Computing Research Repository, ISSN 2331-8422
  14. Erion Çano, Ondřej Bojar (2020): Automating Text Naturalness Evaluation of NLG Systems (Electronic). In: arXiv.org Computing Research Repository, ISSN 2331-8422
  15. Erion Çano, Ondřej Bojar (2020): Two Huge Title and Keyword Generation Corpora of Research Articles. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 6663-6671, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4
  16. Ebrahim Ansari, Amittai Axelrod, Nguyen Bach, Ondřej Bojar, Roldano Cattoni, Fahim Dalvi, Nadir Durrani, Marcello Federico, Christian Federmann, Jiatao Gu, Fei Huang, Kevin Knight, Xutai Ma, Ajay Nagesh, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Xing Shi, Sebastian Stüker, Marco Turchi, Alex Waibel, Changhan Wang (2020): FINDINGS OF THE IWSLT 2020 EVALUATION CAMPAIGN. In: Proceedings of the 17th International Conference on Spoken Language Translation, pp. 1-34, Association for Computational Linguistics, Online, ISBN 978-1-952148-07-1
  17. Loïc Barrault, Magdalena Biesialska, Ondřej Bojar, Marta R. Costa-Jussà, Christian Federmann, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Matthias Huck, Eric Joanis, Tom Kocmi, Philipp Koehn, Chi-kiu Lo, Nikola Ljubešić, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Santanu Pal, Matt Post, Marcos Zampieri (2020): Findings of the 2020 Conference on Machine Translation (WMT20). In: Fifth Conference on Machine Translation - Proceedings of the Conference, pp. 1-55, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-81-0
  18. Ondřej Bojar, Dominik Macháček, Sangeet Sagar, Otakar Smrž, Jonáš Kratochvíl, Ebrahim Ansari, Dario Franceschini, Chiara Canton, Ivan Simonini, Thai-Son Nguyen, Felix Schneider, Sebastian Stüker, Alex Waibel, Barry Haddow, Rico Sennrich, Philip Williams (2020): ELITR: European Live Translator. In: Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (2020), pp. 463-464, European Association for Machine Translation, Lisboa, Portugal, ISBN 978-989-33-0589-8
  19. Dario Franceschini, Chiara Canton, Ivan Simonini, Armin Schweinfurth, Adelheid Glott, Sebastian Stüker, Thai-Son Nguyen, Felix Schneider, Thanh-Le Ha, Alex Waibel, Barry Haddow, Phil Williams, Rico Sennrich, Ondřej Bojar, Sangeet Sagar, Dominik Macháček, Otakar Smrž (2020): Removing European Language Barriers with Innovative Machine Translation Technology. In: Proceedings of the 1st International Workshop on Language Technology Platforms, pp. 44-49, ELRA, Paris, France, ISBN 979-10-95546-64-1
  20. Jonáš Kratochvíl, Peter Polák, Ondřej Bojar (2020): Large Corpus of Czech Parliament Plenary Hearings. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 6363-6367, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4
  21. Dominik Macháček, Ondřej Bojar (2020): Presenting Simultaneous Translation in Limited Space. In: Proceedings of the 20th Conference Information Technologies - Applications and Theory (ITAT 2020), pp. 32-37, Tomáš Horváth, Košice, Slovakia
  22. Dominik Macháček, Jonáš Kratochvíl, Sangeet Sagar, Matúš Žilinec, Ondřej Bojar, Thai-Son Nguyen, Felix Schneider, Philip Williams, Yuekun Yao (2020): ELITR Non-Native Speech Translation at IWSLT 2020. In: Proceedings of the 17th International Conference on Spoken Language Translation, pp. 200-208, Association for Computational Linguistics, Online, ISBN 978-1-952148-07-1
  23. Peter Polák, Sangeet Sagar, Dominik Macháček, Ondřej Bojar (2020): CUNI Neural ASR with Phoneme-Level Intermediate Step for Non-Native SLT at IWSLT 2020. In: Proceedings of the 17th International Conference on Spoken Language Translation, pp. 191-199, Association for Computational Linguistics, Online, ISBN 978-1-952148-07-1
  24. Rudolf Rosa (2020): Deliverable D7.2: Report on NLP Technologies Workshop at EUROSAI Congress (technical report)
  25. Vilém Zouhar, Tereza Vojtěchová, Ondřej Bojar (2020): WMT20 Document-Level Markable Error Exploration. In: Fifth Conference on Machine Translation - Proceedings of the Conference, pp. 371-380, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-81-0
  26. Erion Çano, Ondřej Bojar (2019): Keyphrase Generation: A Multi-Aspect Survey. In: Proceedings of the 25th Conference of Open Innovations Association FRUCT 2019, pp. 85-94, Finnish-Russian University Cooperation in Telecommunications, Helsinki, Finland, ISBN 978-952-69244-0-3
  27. Erion Çano, Ondřej Bojar (2019): Keyphrase Generation: A Text Summarization Struggle. In: The 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 666-672, NAACL-HLT 2019, Minneapolis, USA, ISBN 978-1-950737-13-0
  28. Erion Çano, Ondřej Bojar (2019): Efficiency Metrics for Data-Driven Models: A Text Summarization Case Study. In: Proceedings of the 12th International Conference on Natural Language Generation (INLG 2019), pp. 229-239, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-94-9
  29. Ivana Kvapilíková, Dominik Macháček, Ondřej Bojar (2019): CUNI Systems for the Unsupervised News Translation Task in WMT 2019. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 241-248, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7
  30. Dominik Macháček, Jonáš Kratochvíl, Tereza Vojtěchová, Ondřej Bojar (2019): A Speech Test Set of Practice Business Presentations with Additional Relevant Texts. In: Statistical Language and Speech Processing, pp. 151-161, Springer Nature Switzerland AG, Cham, Switzerland, ISBN 978-3-030-31371-5
  31. Anna Nedoluzhko, Ondřej Bojar (2019): Towards Automatic Minuting of Meetings. In: Proceedings of the 19th Conference ITAT 2019: Slovenskočeský NLP workshop (SloNLP 2019), pp. 112-119, CreateSpace Independent Publishing Platform, Košice, Slovakia
  32. Thuong-Hai Pham, Dominik Macháček, Ondřej Bojar (2019): Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed. In: Computación y Sistemas, ISSN 1405-5546, vol. 23, no. 3, pp. 923-934
  33. Kateřina Rysová, Magdaléna Rysová, Tomáš Musil, Lucie Poláková, Ondřej Bojar (2019): A Test Suite and Manual Evaluation of Document-Level NMT at WMT19. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 455-463, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7
  34. Tereza Vojtěchová, Michal Novák, Miloš Klouček, Ondřej Bojar (2019): SAO WMT19 Test Suite: Machine Translation of Audit Reports. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 680-692, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7

Keywords:

  • Machine translation
  • Multilingualism, language diversity
  • Automated translation
  • Natural language processing
  • Service oriented architectures