Principal investigator (ÚFAL): 
Provider: Czech Science Foundation (GAČR)
Grant ID: 19-26934X
Duration: 2019-2023

NEUREM3

Neuronové reprezentace v multimodálním a mnohojazyčném modelování (Neural Representations in Multi-modal and Multi-lingual Modelling)

The NEUREM3 project encompasses basic research in speech processing (SP) and natural language processing (NLP), with an emphasis on multi-linguality and multi-modality (speech and text processing supported by visual information). Current deep machine learning methods are based on continuous vector representations that neural networks (NNs) build themselves during training. Although NNs often achieve excellent empirical results, our knowledge and understanding of these representations remain insufficient. NEUREM3 aims to fill this gap by studying neural representations of speech and text units at different scales (from phonemes and letters to whole spoken and written documents), as well as representations acquired both for isolated tasks and in multi-task setups. NEUREM3 will also improve NN architectures and training techniques so that networks can be trained on incomplete or incoherent data.

Project goals: a systematic study of neural structures for speech and text modeling in multi-modal and multi-lingual settings, addressing the hierarchy of neural representations, their interpretability to humans, and training under realistic conditions of non-ideal and incoherent data.

Publications

The following publications were (at least in part) supported by NEUREM3. The list is not exhaustive: it includes only those NEUREM3 publications to which members of the ÚFAL team contributed.

  1. Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Věra Kloudová, Surafel Melaku Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nadejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alex Waibel, Changhan Wang, Shinji Watanabe (2022): FINDINGS OF THE IWSLT 2022 EVALUATION CAMPAIGN. In: Proceedings of the 19th International Conference on Spoken Language Translation, pp. 98-157, Association for Computational Linguistics, Stroudsburg, USA, ISBN 978-1-955917-41-4 (url, bibtex)
  2. Sunit Bhattacharya, Ondřej Bojar, Rishu Kumar (2022): Team ÚFAL at CMCL 2022 Shared Task: Figuring out the correct recipe for predicting Eye-Tracking features using Pretrained Language Model. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: ACL 2022, pp. 130-135, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-21-6 (bibtex)
  3. Lukáš Burget, Ondřej Bojar (2022): Průběžná zpráva NEUREM3 [NEUREM3 Interim Report] (technical report). (pdf, bibtex)
  4. Jindřich Helcl, Barry Haddow, Alexandra Birch (2022): Non-Autoregressive Machine Translation: It's Not as Fast as it Seems. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1780-1790, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-71-1 (bibtex)
  5. Peter Polák, Muskaan Singh, Anna Nedoluzhko, Ondřej Bojar (2022): ALIGNMEET: A Comprehensive Tool for Meeting Annotation, Alignment, and Evaluation. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pp. 1771-1779, European Language Resources Association, Marseille, France, ISBN 979-10-95546-72-6 (pdf, bibtex)
  6. Philipp Rösch, Jindřich Libovický (2022): Probing the Role of Positional Information in Vision-Language Models. In: Findings of the Association for Computational Linguistics: NAACL 2022, pp. 1031-1041, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-76-6 (url, local PDF, bibtex)
  7. Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-Jussà, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher M. Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri (2021): Findings of the 2021 Conference on Machine Translation (WMT21). In: Proceedings of the Sixth Conference on Machine Translation, pp. 1-88, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (pdf, local PDF, obd, bibtex)
  8. Antonios Anastasopoulos, Ondřej Bojar, Jacob Bremerman, Roldano Cattoni, Maha Elbayad, Marcello Federico, Xutai Ma, Satoshi Nakamura, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Alex Waibel, Changhan Wang, Matthew Wiesner (2021): FINDINGS OF THE IWSLT 2021 EVALUATION CAMPAIGN. In: Proceedings of the 18th International Conference on Spoken Language Translation, pp. 1-29, Association for Computational Linguistics, Stroudsburg, USA, ISBN 978-1-954085-74-9 (url, local PDF, obd, bibtex)
  9. Ebrahim Ansari, Ondřej Bojar, Barry Haddow, Mohammad Mahmoudi (2021): SLTev: Comprehensive Evaluation of Spoken Language Translation. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pp. 71-79, Association for Computational Linguistics (ACL), Stroudsburg, PA, USA, ISBN 978-1-954085-05-3 (url, local PDF, obd, bibtex)
  10. Michal Auersperger, Pavel Pecina (2021): Solving SCAN Tasks with Data Augmentation and Input Embeddings. In: Proceedings of the Recent Advances in Natural Language Processing, pp. 86-91, INCOMA Ltd., Shoumen, Bulgaria, ISBN 978-954-452-072-4 (pdf, local PDF, obd, bibtex)
  11. Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, George Foster, Alon Lavie, Ondřej Bojar (2021): Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain. In: Proceedings of the Sixth Conference on Machine Translation, pp. 733-774, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, obd, bibtex)
  12. Petr Gebauer, Ondřej Bojar, Vojtěch Švandelík, Martin Popel (2021): CUNI Systems in WMT21: Revisiting Backtranslation Techniques for English-Czech NMT. In: Proceedings of the Sixth Conference on Machine Translation, pp. 123-129, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, obd, bibtex)
  13. Michael Hanna, Ondřej Bojar (2021): A Fine-Grained Analysis of BERTScore. In: Proceedings of the Sixth Conference on Machine Translation, pp. 507-517, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, obd, bibtex)
  14. Michael Hanna, David Mareček (2021): Analyzing BERT’s Knowledge of Hypernymy via Prompting. In: Proceedings of the 4th Workshop on Analyzing and Interpreting Neural Networks for NLP, pp. 275-282, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-06-3 (pdf, obd, bibtex)
  15. Josef Jon, João Paulo de Souza Aires, Dušan Variš, Ondřej Bojar (2021): End-to-End Lexically Constrained Machine Translation for Morphologically Rich Languages. In: Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pp. 4019-4033, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-954085-52-7 (url, local PDF, obd, bibtex)
  16. Josef Jon, Michal Novák, João Paulo de Souza Aires, Dušan Variš, Ondřej Bojar (2021): CUNI systems for WMT21: Terminology translation Shared Task. In: Proceedings of the Sixth Conference on Machine Translation, pp. 828-834, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, obd, bibtex)
  17. Josef Jon, Michal Novák, João Paulo de Souza Aires, Dušan Variš, Ondřej Bojar (2021): CUNI systems for WMT21: Multilingual Low-Resource Translation for Indo-European Languages Shared Task. In: Proceedings of the Sixth Conference on Machine Translation, pp. 354-361, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, obd, bibtex)
  18. Věra Kloudová, Ondřej Bojar, Martin Popel (2021): Detecting Post-edited References and Their Effect on Human Evaluation. In: Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval), pp. 114-119, Association for Computational Linguistics, Stroudsburg, USA, ISBN 978-1-954085-10-7 (pdf, local PDF, obd, bibtex)
  19. Matyáš Kopp, Vladislav Stankov, Jan Oldřich Krůza, Pavel Straňák, Ondřej Bojar (2021): ParCzech 3.0: A Large Czech Speech Corpus with Rich Metadata. In: 24th International Conference on Text, Speech and Dialogue, pp. 293-304, Springer, Cham, Switzerland, ISBN 978-3-030-83526-2 (pdf, local PDF, obd, bibtex)
  20. Ivana Kvapilíková, Ondřej Bojar (2021): Machine Translation of Covid-19 Information Resources via Multilingual Transfer. In: ITAT 2021 2nd Workshop on Automata, Formal and Natural Languages – WAFNL 2021, pp. 176-181, Faculty of Mathematics and Physics, Praha, Czechia (pdf, local PDF, obd, bibtex)
  21. Dominik Macháček, Matúš Žilinec, Ondřej Bojar (2021): Lost in Interpreting: Speech Translation from Source or Interpreter? In: Proceedings of INTERSPEECH 2021, pp. 2376-2380, ISCA, Baixas, France (pdf, local PDF, obd, bibtex)
  22. Jiří Mayer, Pavel Pecina (2021): Synthesizing Training Data for Handwritten Music Recognition. In: Document Analysis and Recognition – ICDAR 2021, Lecture Notes in Computer Science, ISSN 0302-9743, vol. 12823, pp. 626-641, Springer International Publishing, Cham, Switzerland, ISBN 978-3-030-86333-3 (pdf, obd, bibtex)
  23. Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Sadao Kurohashi (2021): Overview of the 8th Workshop on Asian Translation. In: Proceedings of the 8th Workshop on Asian Translation, pp. 1-45, Association for Computational Linguistics, Stroudsburg, USA (url, local PDF, obd, bibtex)
  24. Shantipriya Parida, Subhadarshi Panda, Ketan Kotwal, Amulya Ratna Dash, Satya Ranjan Dash, Yashvardhan Sharma, Petr Motlíček, Ondřej Bojar (2021): NLPHut’s Participation at WAT2021. In: Proceedings of the 8th Workshop on Asian Translation, pp. 146-154, Association for Computational Linguistics, Stroudsburg, USA (pdf, obd, bibtex)
  25. Peter Polák, Ondřej Bojar (2021): Coarse-To-Fine And Cross-Lingual ASR Transfer. In: ITAT 2021 2nd Workshop on Automata, Formal and Natural Languages – WAFNL 2021, pp. 154-160, Faculty of Mathematics and Physics, Praha, Czechia (pdf, local PDF, obd, bibtex)
  26. Peter Polák, Muskaan Singh, Ondřej Bojar (2021): Explainable Quality Estimation: CUNI Eval4NLP Submission. In: Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, pp. 250-255, Association for Computational Linguistics, Stroudsburg, PA, USA (pdf, local PDF, obd, bibtex)
  27. Dušan Variš, Ondřej Bojar (2021): Sequence Length is a Domain: Length-based Overfitting in Transformer Models. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 8246-8257, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-09-4 (pdf, local PDF, obd, bibtex)
  28. Vilém Zouhar (2021): Sampling and Filtering of Neural Machine Translation Distillation Data. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pp. 1-8, Association for Computational Linguistics, Stroudsburg, USA, ISBN 978-1-954085-50-3 (pdf, obd, bibtex)
  29. Vilém Zouhar, Michal Novák, Matúš Žilinec, Ondřej Bojar, Mateo Obregón, Robin L. Hill, Frédéric Blain, Marina Fomicheva, Lucia Specia, Lisa Yankovskaya (2021): Backtranslation Feedback Improves User Confidence in MT, Not Quality. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 151-161, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-954085-46-6 (url, local PDF, obd, bibtex)
  30. Vilém Zouhar, Aleš Tamchyna, Martin Popel, Ondřej Bojar (2021): Neural Machine Translation Quality and Post-Editing Performance. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 10204-10214, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-09-4 (pdf, local PDF, obd, bibtex)
  31. Erion Çano, Ondřej Bojar (2020): How Many Pages? Paper Length Prediction from the Metadata. In: 4th International Conference on Natural Language Processing and Information Retrieval, pp. 91-95, ACM, New York, USA, ISBN 978-1-4503-7760-7 (url, local PDF, obd, bibtex)
  32. Erion Çano, Ondřej Bojar (2020): Two Huge Title and Keyword Generation Corpora of Research Articles. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 6663-6671, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4 (url, local PDF, obd, bibtex)
  33. Hadi Abdi Khojasteh, Ebrahim Ansari, Mahdi Bohlouli (2020): LSCP: Enhanced Large Scale Colloquial Persian Language Understanding. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 6323-6327, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4 (url, local PDF, obd, bibtex)
  34. Ebrahim Ansari, Amittai Axelrod, Nguyen Bach, Ondřej Bojar, Roldano Cattoni, Fahim Dalvi, Nadir Durrani, Marcello Federico, Christian Federmann, Jiatao Gu, Fei Huang, Kevin Knight, Xutai Ma, Ajay Nagesh, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Xing Shi, Sebastian Stüker, Marco Turchi, Alex Waibel, Changhan Wang (2020): FINDINGS OF THE IWSLT 2020 EVALUATION CAMPAIGN. In: Proceedings of the 17th International Conference on Spoken Language Translation, pp. 1-34, Association for Computational Linguistics, Online, ISBN 978-1-952148-07-1 (pdf, local PDF, obd, bibtex)
  35. Petra Barančíková, Ondřej Bojar (2020): Costra 1.1: An Inquiry into Geometric Properties of Sentence Spaces. In: 23rd International Conference on Text, Speech and Dialogue, pp. 135-143, Springer, Cham, Switzerland, ISBN 978-3-030-58322-4 (local PDF, obd, bibtex)
  36. Petra Barančíková, Ondřej Bojar (2020): COSTRA 1.0: A Dataset of Complex Sentence Transformations. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 3535-3541, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4 (url, local PDF, obd, bibtex)
  37. Loïc Barrault, Magdalena Biesialska, Ondřej Bojar, Marta R. Costa-Jussà, Christian Federmann, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Matthias Huck, Eric Joanis, Tom Kocmi, Philipp Koehn, Chi-kiu Lo, Nikola Ljubešić, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Santanu Pal, Matt Post, Marcos Zampieri (2020): Findings of the 2020 Conference on Machine Translation (WMT20). In: Fifth Conference on Machine Translation - Proceedings of the Conference, pp. 1-55, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-81-0 (pdf, local PDF, obd, bibtex)
  38. Jonáš Kratochvíl, Peter Polák, Ondřej Bojar (2020): Large Corpus of Czech Parliament Plenary Hearings. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 6363-6367, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4 (url, local PDF, obd, bibtex)
  39. Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar (2020): Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp. 255-262, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-952148-03-3 (url, local PDF, obd, bibtex)
  40. Ivana Kvapilíková, Tom Kocmi, Ondřej Bojar (2020): CUNI Systems for the Unsupervised and Very Low Resource Translation Task in WMT20. In: Fifth Conference on Machine Translation - Proceedings of the Conference, pp. 1123-1128, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-81-0 (pdf, local PDF, obd, bibtex)
  41. Jindřich Libovický, Zdeněk Kasner, Jindřich Helcl, Ondřej Dušek (2020): Expand and Filter: CUNI and LMU Systems for the WNGT 2020 Duolingo Shared Task. In: Proceedings of the Fourth Workshop on Neural Generation and Translation, pp. 153-160, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-952148-17-0 (url, local PDF, obd, bibtex)
  42. Dominik Macháček, Jonáš Kratochvíl, Sangeet Sagar, Matúš Žilinec, Ondřej Bojar, Thai-Son Nguyen, Felix Schneider, Philip Williams, Yuekun Yao (2020): ELITR Non-Native Speech Translation at IWSLT 2020. In: Proceedings of the 17th International Conference on Spoken Language Translation, pp. 200-208, Association for Computational Linguistics, Online, ISBN 978-1-952148-07-1 (pdf, local PDF, obd, bibtex)
  43. Nitika Mathur, Johnny Tian-Zheng Wei, Markus Freitag, Qingsong Ma, Ondřej Bojar (2020): Results of the WMT20 Metrics Shared Task. In: Fifth Conference on Machine Translation - Proceedings of the Conference, pp. 688-725, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-81-0 (pdf, local PDF, obd, bibtex)
  44. Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi (2020): Overview of the 7th Workshop on Asian Translation. In: Proceedings of the 7th Workshop on Asian Translation (WAT2020), pp. 1-44, Association for Computational Linguistics, Stroudsburg, USA (url, local PDF, obd, bibtex)
  45. Shantipriya Parida, Petr Motlíček, Amulya Ratna Dash, Satya Ranjan Dash, Debasish Kumar Mallick, Satya Prakash Biswal, Priyanka Pattnaik, Biranchi Narayan Nayak, Ondřej Bojar (2020): ODIANLP’s Participation in WAT2020. In: Proceedings of the 7th Workshop on Asian Translation (WAT2020), pp. 103-108, Association for Computational Linguistics, Stroudsburg, USA (url, local PDF, obd, bibtex)
  46. Peter Polák, Sangeet Sagar, Dominik Macháček, Ondřej Bojar (2020): CUNI Neural ASR with Phoneme-Level Intermediate Step for Non-Native SLT at IWSLT 2020. In: Proceedings of the 17th International Conference on Spoken Language Translation, pp. 191-199, Association for Computational Linguistics, Online, ISBN 978-1-952148-07-1 (url, local PDF, obd, bibtex)
  47. Martin Popel, Marketa Tomkova, Jakub Tomek, Łukasz Kaiser, Jakob Uszkoreit, Ondřej Bojar, Zdeněk Žabokrtský (2020): Transforming machine translation: a deep learning system reaches news translation quality comparable to human professionals. In: Nature Communications, ISSN 2041-1723, vol. 11, no. 4381, pp. 1-15 (url, local PDF, obd, bibtex)
  48. Shadi Saleh, Pavel Pecina (2020): Document Translation vs. Query Translation for Cross-Lingual Information Retrieval in the Medical Domain. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6849-6860, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-952148-25-5 (pdf, local PDF, obd, bibtex)
  49. Lucia Specia, Loïc Barrault, Ozan Caglayan, Amanda Duarte, Desmond Elliott, Spandana Gella, Nils Holzenberger, Chiraag Lala, Sun Jae Lee, Jindřich Libovický, Pranava Madhyastha, Florian Metze, Karl Mulligan, Alissa Ostapenko, Shruti Palaskar, Ramon Sanabria, Josiah Wang, Raman Arora (2020): Grounded Sequence to Sequence Transduction. In: IEEE Journal on Selected Topics in Signal Processing, ISSN 1932-4553, vol. 14, no. 3, pp. 577-591 (url, local PDF, obd, bibtex)
  50. Vilém Zouhar, Ondřej Bojar (2020): Outbound Translation User Interface Ptakopet: A Pilot Study. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 6967-6975, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4 (url, local PDF, obd, bibtex)
  51. Vilém Zouhar, Michal Novák (2020): Extending Ptakopět for Machine Translation User Interaction Experiments. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 115, pp. 129-142 (pdf, local PDF, obd, bibtex)
  52. Vilém Zouhar, Tereza Vojtěchová, Ondřej Bojar (2020): WMT20 Document-Level Markable Error Exploration. In: Fifth Conference on Machine Translation - Proceedings of the Conference, pp. 371-380, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-81-0 (url, local PDF, obd, bibtex)
  53. Erion Çano, Ondřej Bojar (2019): Keyphrase Generation: A Text Summarization Struggle. In: The 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 666-672, NAACL-HLT 2019, Minneapolis, USA, ISBN 978-1-950737-13-0 (url, obd, bibtex)
  54. Erion Çano, Ondřej Bojar (2019): Efficiency Metrics for Data-Driven Models: A Text Summarization Case Study. In: Proceedings of the 12th International Conference on Natural Language Generation (INLG 2019), pp. 229-239, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-94-9 (url, obd, bibtex)
  55. Loïc Barrault, Ondřej Bojar, Marta R. Costa-Jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri (2019): Findings of the 2019 Conference on Machine Translation (WMT19). In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 1-61, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (url, obd, bibtex)
  56. Ondřej Bojar, Raffaella Bernardi, Bonnie L. Webber (2019): Representation of sentence meaning (A JNLE Special Issue). In: Natural Language Engineering, ISSN 1351-3249, vol. 25, no. 4, pp. 427-432 (pdf, local PDF, obd, bibtex)
  57. Jindřich Helcl, Jindřich Libovický, Martin Popel (2019): CUNI System for the WMT19 Robustness Task. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 738-742, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (url, local PDF, obd, bibtex)
  58. Daniel Kondratyuk, Ronald Cardenas, Ondřej Bojar (2019): Replacing Linguists with Dummies: A Serious Need for Trivial Baselines in Multi-Task Neural Machine Translation. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 113, pp. 31-40 (pdf, obd, bibtex)
  59. Dominik Macháček, Jonáš Kratochvíl, Tereza Vojtěchová, Ondřej Bojar (2019): A Speech Test Set of Practice Business Presentations with Additional Relevant Texts. In: Statistical Language and Speech Processing, pp. 151-161, Springer Nature Switzerland AG, Cham, Switzerland, ISBN 978-3-030-31371-5 (url, obd, bibtex)
  60. Qingsong Ma, Johnny Tian-Zheng Wei, Ondřej Bojar, Yvette Graham (2019): Results of the WMT19 Metrics Shared Task: Segment-Level and Strong MT Systems Pose Big Challenges. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 62-90, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (url, obd, bibtex)
  61. Toshiaki Nakazawa, Nobushige Doi, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi (2019): Overview of the 6th Workshop on Asian Translation. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 1-35, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-90-1 (pdf, obd, bibtex)
  62. Anna Nedoluzhko, Ondřej Bojar (2019): Towards Automatic Minuting of Meetings. In: Proceedings of the 19th Conference ITAT 2019: Slovenskočeský NLP workshop (SloNLP 2019), pp. 112-119, CreateSpace Independent Publishing Platform, Košice, Slovakia (url, local PDF, obd, bibtex)
  63. Shruti Palaskar, Jindřich Libovický, Spandana Gella, Florian Metze (2019): Multimodal Abstractive Summarization for How2 Videos. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 6587-6596, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-48-2 (url, local PDF, obd, bibtex)
  64. Shantipriya Parida, Ondřej Bojar, Satya Ranjan Dash (2019): Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation. In: Computación y Sistemas, ISSN 1405-5546, vol. 23, no. 4, pp. 1499-1505 (url, obd, bibtex)
  65. Shantipriya Parida, Petr Motlíček, Ondřej Bojar (2019): Idiap NMT System for WAT 2019 Multi-Modal Translation Task. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 175-180, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-90-1 (pdf, obd, bibtex)
  66. Thuong-Hai Pham, Dominik Macháček, Ondřej Bojar (2019): Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed. In: Computación y Sistemas, ISSN 1405-5546, vol. 23, no. 3, pp. 923-934 (url, obd, bibtex)
  67. Martin Popel, Dominik Macháček, Michal Auersperger, Ondřej Bojar, Pavel Pecina (2019): English-Czech Systems in WMT19: Document-Level Transformer. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 342-348, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (pdf, local PDF, obd, bibtex)
  68. Shadi Saleh, Pavel Pecina (2019): Term Selection for Query Expansion in Medical Cross-Lingual Information Retrieval. In: Advances in Information Retrieval; 41st European Conference on IR Research, ECIR 2019, Lecture Notes in Computer Science, ISSN 0302-9743, 1, pp. 507-522, Springer International Publishing, Berlin, Germany, ISBN 978-3-030-15719-7 (url, local PDF, obd, bibtex)
  69. Dušan Variš, Ondřej Bojar (2019): Unsupervised Pretraining for Neural Machine Translation Using Elastic Weight Consolidation. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp. 130-135, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-47-5 (pdf, local PDF, obd, bibtex)

This project is funded under the GAČR EXPRO programme.