Principal investigator (ÚFAL): 
Provider: 
Grant id: 825303
Duration: 2019-2021

Bergamot

Browser-based Multilingual Translation

The Bergamot project will add and improve client-side machine translation in a web browser.  Unlike current cloud-based options, running directly on users' machines empowers citizens to preserve their privacy and increases the uptake of language technologies in Europe in various sectors that require confidentiality.  Free software integrated with an open-source web browser, such as Mozilla Firefox, will enable bottom-up adoption by non-experts, resulting in cost savings for private and public sector users who would otherwise procure translation or operate monolingually.  
To understand and support non-expert users, our user experience work package researches their needs and creates the user interface.  Rather than simply translating text, this interface will expose improved quality estimates, addressing the rising public debate on algorithmic trust.  Building on quality estimation research, we will enable users to confidently generate text in a language they do not speak, which makes cross-lingual online form filling possible.  To improve quality overall, dynamic domain adaptation research addresses the peculiar writing style of a website or user by adapting translation on the fly using local information too private to upload to the cloud.  These applications require adaptation and inference to run on desktop hardware with compact model downloads, which we address with neural network efficiency research.  Our combined research on user experience, domain adaptation, quality estimation, outbound translation, and efficiency supports a broad browser-based innovation plan.

You can find more at: https://browser.mt.

  • The University of Edinburgh
  • Univerzita Karlova
  • The University of Sheffield
  • Tartu Ülikool
  • MZ Denmark GmbH

Publications

The following publications received support from Bergamot. Only publications in which the ÚFAL team participated are listed, not all Bergamot publications.

  1. Josef Jon, Dušan Variš, Michal Novák, Joao Paulo Aires, Ondřej Bojar (2023): Negative Lexical Constraints in Neural Machine Translation. In: Proceedings of Machine Translation Summit XIX vol. 1: Research Track, pp. 372-384, Asia-Pacific Association for Machine Translation (AAMT), Kyoto, Japan, ISBN 978-4-9913461-0-1 (pdf, bibtex)
  2. Dušan Variš (2023): Learning capabilities in Transformer Neural Networks (PhD thesis). (url, bibtex)
  3. Niyati Bafna, Martin Vastl, Ondřej Bojar (2022): Constrained Decoding for Technical Term Retention in English-Hindi MT. In: Proceedings of ICON 2021: 18th International Conference on Natural Language Processing, pp. 1-6, NLP Association India, Centre for Natural Language Processing, Department of Computer Science and Engineering, Silchar, India (local PDF, bibtex)
  4. Jindřich Helcl, Barry Haddow, Alexandra Birch (2022): Non-Autoregressive Machine Translation: It's Not as Fast as it Seems. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1780-1790, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-71-1 (local PDF, bibtex)
  5. Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Amir Zeldes, Daniel Zeman (2022): CorefUD 1.0: Coreference Meets Universal Dependencies. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pp. 4859-4872, European Language Resources Association, Marseille, France, ISBN 979-10-95546-72-6 (pdf, bibtex)
  6. Michael Hanna, Ondřej Bojar (2021): A Fine-Grained Analysis of BERTScore. In: Proceedings of the Sixth Conference on Machine Translation, pp. 507-517, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, bibtex)
  7. Josef Jon, João Paulo de Souza Aires, Dušan Variš, Ondřej Bojar (2021): End-to-End Lexically Constrained Machine Translation for Morphologically Rich Languages. In: Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pp. 4019-4033, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-954085-52-7 (url, local PDF, bibtex)
  8. Josef Jon, Michal Novák, João Paulo de Souza Aires, Dušan Variš, Ondřej Bojar (2021): CUNI systems for WMT21: Multilingual Low-Resource Translation for Indo-European Languages Shared Task. In: Proceedings of the Sixth Conference on Machine Translation, pp. 354-361, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, bibtex)
  9. Josef Jon, Michal Novák, João Paulo de Souza Aires, Dušan Variš, Ondřej Bojar (2021): CUNI systems for WMT21: Terminology translation Shared Task. In: Proceedings of the Sixth Conference on Machine Translation, pp. 828-834, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, bibtex)
  10. Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Daniel Zeman (2021): Is one head enough? Mention heads in coreference annotations compared with UD-style heads. In: Proceedings of the Sixth International Conference on Dependency Linguistics (Depling, SyntaxFest 2021), pp. 101-114, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-14-8 (pdf, local PDF, bibtex)
  11. Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Daniel Zeman (2021): Coreference meets Universal Dependencies – a pilot experiment on harmonizing coreference datasets for 11 languages (technical report). (pdf, local PDF, bibtex)
  12. Peter Polák, Muskaan Singh, Ondřej Bojar (2021): Explainable Quality Estimation: CUNI Eval4NLP Submission. In: Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, pp. 250-255, Association for Computational Linguistics, Stroudsburg, PA, USA (pdf, local PDF, bibtex)
  13. Martin Popel, Zdeněk Žabokrtský, Anna Nedoluzhko, Michal Novák, Daniel Zeman (2021): Do UD Trees Match Mention Spans in Coreference Annotations? In: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 3570-3576, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-10-0 (url, local PDF, bibtex)
  14. Vilém Zouhar, Michal Novák, Matúš Žilinec, Ondřej Bojar, Mateo Obregón, Robin L. Hill, Frédéric Blain, Marina Fomicheva, Lucia Specia, Lisa Yankovskaya (2021): Backtranslation Feedback Improves User Confidence in MT, Not Quality. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 151-161, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-954085-46-6 (url, local PDF, bibtex)
  15. Petra Barančíková, Ondřej Bojar (2020): Costra 1.1: An Inquiry into Geometric Properties of Sentence Spaces. In: 23rd International Conference on Text, Speech and Dialogue, pp. 135-143, Springer, Cham, Switzerland, ISBN 978-3-030-58322-4 (local PDF, bibtex)
  16. Petra Barančíková, Ondřej Bojar (2020): COSTRA 1.0: A Dataset of Complex Sentence Transformations. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 3535-3541, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4 (url, local PDF, bibtex)
  17. Loïc Barrault, Magdalena Biesialska, Ondřej Bojar, Marta R. Costa-Jussà, Christian Federmann, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Matthias Huck, Eric Joanis, Tom Kocmi, Philipp Koehn, Chi-kiu Lo, Nikola Ljubešić, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Santanu Pal, Matt Post, Marcos Zampieri (2020): Findings of the 2020 Conference on Machine Translation (WMT20). In: Fifth Conference on Machine Translation - Proceedings of the Conference, pp. 1-55, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-81-0 (pdf, local PDF, bibtex)
  18. Tom Kocmi, Ondřej Bojar (2020): Efficiently Reusing Old Models Across Languages via Transfer Learning. In: Proceedings of the 22nd Annual Conference of the European Association for Machine Translation (2020), pp. 1-10, European Association for Machine Translation, Lisboa, Portugal, ISBN 978-989-33-0589-8 (bibtex)
  19. Jindřich Libovický, Zdeněk Kasner, Jindřich Helcl, Ondřej Dušek (2020): Expand and Filter: CUNI and LMU Systems for the WNGT 2020 Duolingo Shared Task. In: Proceedings of the Fourth Workshop on Neural Generation and Translation, pp. 153-160, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-952148-17-0 (url, local PDF, bibtex)
  20. Vilém Zouhar, Ondřej Bojar (2020): Outbound Translation User Interface Ptakopet: A Pilot Study. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 6967-6975, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4 (url, local PDF, bibtex)
  21. Vilém Zouhar, Michal Novák (2020): Extending Ptakopět for Machine Translation User Interaction Experiments. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 115, pp. 129-142 (pdf, local PDF, bibtex)
  22. Loïc Barrault, Ondřej Bojar, Marta R. Costa-Jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri (2019): Findings of the 2019 Conference on Machine Translation (WMT19). In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 1-61, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (url, bibtex)
  23. Jindřich Helcl, Jindřich Libovický, Martin Popel (2019): CUNI System for the WMT19 Robustness Task. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 738-742, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (url, local PDF, bibtex)
  24. Tom Kocmi (2019): Exploring Benefits of Transfer Learning in Neural Machine Translation (PhD thesis). (url, bibtex)
  25. Tom Kocmi, Ondřej Bojar (2019): CUNI Submission for Low-Resource Languages in WMT News 2019. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 234-240, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (pdf, bibtex)
  26. Daniel Kondratyuk, Ronald Cardenas, Ondřej Bojar (2019): Replacing Linguists with Dummies: A Serious Need for Trivial Baselines in Multi-Task Neural Machine Translation. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 113, pp. 31-40 (pdf, bibtex)
  27. Tereza Vojtěchová, Michal Novák, Miloš Klouček, Ondřej Bojar (2019): SAO WMT19 Test Suite: Machine Translation of Audit Reports. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 680-692, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (url, bibtex)

Keywords:

  • Machine translation
  • Human computer interaction and interface, visualiz
  • automated translation
  • Natural language processing