The NEUREM3 project encompasses basic research in speech processing (SP) and natural language processing (NLP), with an emphasis on multilinguality and multimodality (speech and text processing supported by visual information). Current deep machine learning methods are built on continuous vector representations that neural networks (NNs) create themselves during training. Although NNs often achieve excellent empirical results, our knowledge and understanding of the learned representations remain insufficient. NEUREM3 aims to fill this gap by studying neural representations for speech and text units of different scopes (from phonemes and letters to whole spoken and written documents), as well as representations acquired both for isolated tasks and in multi-task setups. NEUREM3 will also improve NN architectures and training techniques so that they can be trained on incomplete or incoherent data.

Project goals: A systematic study of neural structures for speech and text modeling in multimodal and multilingual settings. Research into hierarchies of neural representations, their interpretability for human users, and training under realistic conditions of non-ideal and incoherent data.

Publications

The following publications were supported, at least in part, by NEUREM3. Not all NEUREM3 publications are listed, only those in which a member of the UFAL team participated.

Ivana Kvapilíková (2025): Unsupervised Machine Translation: How Machines Learn to Understand across Languages. In: , ISBN 978-80-246-6084-4 (bibtex)
Vilém Zouhar, Sunit Bhattacharya, Ondřej Bojar (2025): Multimodal Shannon Game with Images. In: ACAIN, pp. 32-45, Springer, Cham, Switzerland, ISBN 978-3-031-82486-9 (url, bibtex)
Ibrahim Sa'id Ahmad, Antonios Anastasopoulos, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, William Chen, Qianqian Dong, Marcello Federico, Barry Haddow, Dávid Javorský, Mateusz Krubiński, Tsz Kin Lam, Xutai Ma, Prashant Mathur, Evgeny Matusov, Chandresh Kumar Maurya, John McCrae, Kenton Murray, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, Atul Kr. Ojha, John Ortega, Sara Papi, Peter Polák, Pavel Pecina, Adam Pospíšil, Elizabeth Salesky, Nivedita Sethiya, Anoop Sarkar, Jiatong Shi, Claytone Sikasote, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Brian Thompson, Alex Waibel, Shinji Watanabe, Patrick Wilken, Petr Zemánek, Rodolfo Zevallos (2024): Findings of the IWSLT 2024 Evaluation Campaign. In: Proceedings of the 21st International Conference on Spoken Language Translation, pp. 1-11, Association for Computational Linguistics, Stroudsburg, USA, ISBN 979-8-89176-141-4 (url, bibtex)
Sunit Bhattacharya, Vilém Zouhar, Věra Kloudová, Ondřej Bojar (2024): Stroop Effect in Multi-Modal Sight Translation (Electronic). In: ArXiv.org Computing Research Repository, ISSN 2331-8422, pp. 1-5 (url, local PDF)
Dominika Ďurišková, Daniela Jurášová, Matúš Žilinec, Eduard Šubert, Ondřej Bojar (2024): Khan Academy Corpus: A multilingual corpus of Khan Academy lectures. In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 9743-9752, European Language Resources Association, Torino, Italy, ISBN 978-2-493814-10-4 (url, bibtex)
Josef Jon, Ondřej Bojar (2024): GAATME: A Genetic Algorithm for Adversarial Translation Metrics Evaluation. In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pp. 7562-7569, European Language Resources Association, Torino, Italy, ISBN 978-2-493814-10-4 (url, bibtex)
Mateusz Krubiński, Pavel Pecina (2024): Towards Unified Uni- and Multi-modal News Headline Generation. In: Findings of the Association for Computational Linguistics: EACL 2024, pp. 437-450, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 979-8-89176-093-6 (pdf, bibtex)
Dominik Macháček (2024): Multi-Source Simultaneous Speech Translation (PhD thesis). In: (url, bibtex)
Adam Osuský, Dávid Javorský, Ondřej Bojar (2024): InsBERT: Word importance from artificial insertions. In: Proceedings of the 24th Conference Information Technologies – Applications and Theory (ITAT 2024), pp. 96-106, CEUR-WS.org, Košice, Slovakia (pdf, bibtex)
Hening Wang, Leixin Zhang, Ondřej Bojar (2024): Human and Machine: Language Processing in Translation Tasks. In: Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP 2024), pp. 243-250, Association for Computational Linguistics, Online (url, bibtex)
Uladzislau Yorsh, Martin Holeňa, Ondřej Bojar, David Herel (2024): On Difficulties of Attention Factorization through Shared Memory. In: The Second Tiny Papers Track at ICLR 2024, pp. 1-8, OpenReview.net (bibtex)
Leixin Zhang, David Burian, Vojtěch John, Ondřej Bojar (2024): Unveiling Semantic Information in Sentence Embeddings. In: Proceedings of the Fifth International Workshop on Designing Meaning Representations (DMR 2024) @ LREC-COLING 2024, pp. 39-47, European Language Resources Association, ISBN 978-2-493814-39-5 (url, bibtex)
Vilém Zouhar, Ondřej Bojar (2024): Quality and Quantity of Machine Translation References for Automatic Metrics. In: Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024, pp. 1-11, ELRA, Paris, France, ISBN 978-2-493814-41-8 (url, bibtex)
Vilém Zouhar, Věra Kloudová, Martin Popel, Ondřej Bojar (2024): Evaluating Optimal Reference Translations. In: Natural Language Processing, ISSN 2977-0424, 2024, pp. 1-24 (url, bibtex)
Milind Agarwal, Sweta Agrawal, Antonios Anastasopoulos, Luisa Bentivogli, Ondřej Bojar, Claudia Borg, Marine Carpuat, Roldano Cattoni, Mauro Cettolo, Mingda Chen, William Chen, Khalid Choukri, Alexandra Chronopoulou, Thierry Declerck, Qianqian Dong, Kevin Duh, Yannick Estève, Marcello Federico, Souhir Gahbiche, Barry Haddow, Benjamin Hsu, Phu Mon Htut, Hirofumi Inaguma, Dávid Javorský, John Judge, Yasumasa Kano, Tom Ko, Rishu Kumar, Pengwei Li, Xutai Ma, Prashant Mathur, Evgeny Matusov, Paul McNamee, John McCrae, Kenton Murray, Maria Nadejde, Satoshi Nakamura, Matteo Negri, Ha Nguyen, Jan Niehues, Xing Niu, Atul Kr. Ojha, John Ortega, Proyag Pal, Juan Pino, Lonneke van der Plas, Peter Polák, Elijah Rippeth, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Yun Tang, Brian Thompson, Kevin Tran, Marco Turchi, Alex Waibel, Mingxuan Wang, Shinji Watanabe, Rodolfo Zevallos (2023): Findings of the IWSLT 2023 Evaluation Campaign. In: Proceedings of the 20th International Conference on Spoken Language Translation, pp. 1-61, Association for Computational Linguistics, Stroudsburg, USA, ISBN 978-1-959429-84-5 (url, bibtex)
Sunit Bhattacharya, Ondřej Bojar (2023): Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks (Electronic). In: ArXiv.org Computing Research Repository, ISSN 2331-8422, pp. 120-126 (pdf)
Tirthankar Ghosal, Ondřej Bojar, Marie Hledíková, Tom Kocmi, Anna Nedoluzhko (2023): Overview of the Second Shared Task on Automatic Minuting (AutoMin) at INLG 2023. In: Proceedings of the 16th International Natural Language Generation Conference, pp. 138-167, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 979-8-89176-001-1 (bibtex)
Dávid Javorský, Ondřej Bojar, François Yvon (2023): Assessing Word Importance Using Models Trained for Semantic Tasks. In: Findings of the Association for Computational Linguistics: ACL 2023, pp. 8846-8856, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-959429-62-3 (pdf, bibtex)
Josef Jon, Ondřej Bojar (2023): Character-level NMT and language similarity. In: Proceedings of Machine Translation Summit XIX vol. 1: Research Track, pp. 360-371, Asia-Pacific Association for Machine Translation (AAMT), Kyoto, Japan, ISBN 978-4-9913461-0-1 (pdf, bibtex)
Josef Jon, Ondřej Bojar (2023): Breeding Machine Translations: Evolutionary approach to survive and thrive in the world of automated evaluation. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2191-2212, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-959429-72-2 (url, bibtex)
Josef Jon, Martin Popel, Ondřej Bojar (2023): CUNI at WMT23 General Translation Task: MT and a Genetic Algorithm. In: Proceedings of the Eighth Conference on Machine Translation, pp. 119-127, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 979-8-89176-041-7 (pdf, bibtex)
Josef Jon, Dušan Variš, Michal Novák, Joao Paulo Aires, Ondřej Bojar (2023): Negative Lexical Constraints in Neural Machine Translation. In: Proceedings of Machine Translation Summit XIX vol. 1: Research Track, pp. 372-384, Asia-Pacific Association for Machine Translation (AAMT), Kyoto, Japan, ISBN 978-4-9913461-0-1 (pdf, bibtex)
Kristýna Klesnilová, Michelle Elizabeth (2023): Team Synapse @ AutoMin 2023: Leveraging BART-Based Models for Automatic Meeting Minuting. In: Proceedings of the 16th International Natural Language Generation Conference: Generation Challenges, pp. 108-113, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 979-8-89176-003-5 (url, bibtex)
Věra Kloudová, David Mraček, Ondřej Bojar, Martin Popel (2023): Možnosti a meze tvorby tzv. optimálních referenčních překladů: po stopách „překladatelštiny“ v profesionálních překladech zpravodajských textů. In: Slovo a slovesnost, ISSN 0037-7031, vol. 84, no. 2, pp. 122-156 (url, bibtex)
Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondřej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Philipp Koehn, Benjamin Marie, Christof Monz, Makoto Morishita, Kenton Murray, Makoto Nagata, Toshiaki Nakazawa, Martin Popel, Maja Popović, Mariya Shmatova (2023): Findings of the 2023 Conference on Machine Translation (WMT23): LLMs Are Here but Not Quite There Yet. In: Proceedings of the Eighth Conference on Machine Translation, pp. 1-42, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 979-8-89176-041-7 (url, bibtex)
Mateusz Krubiński, Pavel Pecina (2023): MLASK: Multimodal Summarization of Video-based News Articles. In: Findings of the Association for Computational Linguistics: EACL 2023, pp. 910-924, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-959429-47-0 (pdf, bibtex)
Ivana Kvapilíková, Ondřej Bojar (2023): Low-Resource Machine Translation Systems for Indic Languages. In: Proceedings of the Eighth Conference on Machine Translation, pp. 954-958, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 979-8-89176-041-7 (bibtex)
Ivana Kvapilíková, Ondřej Bojar (2023): Boosting Unsupervised Machine Translation with Pseudo-Parallel Data. In: Proceedings of Machine Translation Summit XIX vol. 1: Research Track, pp. 135-147, Asia-Pacific Association for Machine Translation (AAMT), Kyoto, Japan, ISBN 978-4-9913461-0-1 (bibtex)
Dominik Macháček, Ondřej Bojar, Raj Dabre (2023): MT Metrics Correlate with Human Ratings of Simultaneous Speech Translation. In: Proceedings of the 20th International Conference on Spoken Language Translation, pp. 169-179, Association for Computational Linguistics, Stroudsburg, USA, ISBN 978-1-959429-84-5 (pdf, local PDF, bibtex)
Dominik Macháček, Raj Dabre, Ondřej Bojar (2023): Turning Whisper into Real-Time Transcription System. In: Proceedings of the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 13th International Joint Conference on Natural Language Processing: System Demonstrations, pp. 17-24, Asian Federation of Natural Language Processing, Bali, Indonesia (pdf, bibtex)
Dominik Macháček, Peter Polák, Ondřej Bojar, Raj Dabre (2023): Robustness of Multi-Source MT to Transcription Errors. In: Findings of the Association for Computational Linguistics: ACL 2023, pp. 3707-3723, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-959429-62-3 (pdf, bibtex)
Toshiaki Nakazawa, Kazutaka Kinugawa, Hideya Mino, Isao Goto, Raj Dabre, Shohei Higashiyama, Shantipriya Parida, Makoto Morishita, Ondřej Bojar, Akiko Eriguchi, Yusuke Oda, Chenhui Chu, Sadao Kurohashi (2023): Overview of the 10th Workshop on Asian Translation. In: Proceedings of the 10th Workshop on Asian Translation, pp. 1-28, International Conference on Computational Linguistics, Macau, China (bibtex)
Kristýna Neumannová, Ondřej Bojar (2023): The Role of Compounds in Human vs. Machine Translation Quality. In: Proceedings of Machine Translation Summit XIX vol. 1: Research Track, pp. 248-260, Asia-Pacific Association for Machine Translation (AAMT), Kyoto, Japan, ISBN 978-4-9913461-0-1 (pdf, bibtex)
Shantipriya Parida, Ondřej Bojar (2023): HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language. In: Findings of the Association for Computational Linguistics: ACL 2023, pp. 10162-10183, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-959429-62-3 (bibtex)
Andrej Perković, Jernej Vičič, Dávid Javorský, Ondřej Bojar (2023): Shortening of the results of machine translation using paraphrasing dataset. In: Proceedings of the 23rd Conference Information Technologies – Applications and Theory (ITAT 2023), pp. 121-130, 23rd Conference on Information Technologies – Applications and Theory, Košice, Slovakia (pdf, bibtex)
Peter Polák (2023): Long-form Simultaneous Speech Translation: Thesis Proposal. In: Proceedings of the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 13th International Joint Conference on Natural Language Processing: Student Research Workshop, pp. 64-74, Association for Computational Linguistics, Stroudsburg, PA, USA (url, local PDF, bibtex)
Peter Polák, Danni Liu, Ngoc-Quan Pham, Jan Niehues, Alex Waibel, Ondřej Bojar (2023): Towards Efficient Simultaneous Speech Translation: CUNI-KIT System for Simultaneous Track at IWSLT 2023. In: Proceedings of the 20th International Conference on Spoken Language Translation, pp. 389-396, Association for Computational Linguistics, Stroudsburg, USA, ISBN 978-1-959429-84-5 (url, bibtex)
Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondřej Bojar (2023): Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff. In: Proceedings of the 24th Annual Conference of the International Speech Communication Association, pp. 3979-3983, International Speech Communication Association, Baixas, France (url, bibtex)
František Trebuňa, Kristína Szabová, Ondřej Bojar (2023): Searching for Reasons of Transformers’ Success: Memorization vs Generalization. In: 26th International Conference, TSD 2023, pp. 25-32, Springer, Cham, Switzerland, ISBN 978-3-031-40497-9 (url, bibtex)
Iryna Tryhubyshyn, Aleš Tamchyna, Ondřej Bojar (2023): Bad MT Systems are Good for Quality Estimation. In: Proceedings of Machine Translation Summit XIX vol. 1: Research Track, pp. 200-208, Asia-Pacific Association for Machine Translation (AAMT), Kyoto, Japan, ISBN 978-4-9913461-0-1 (url, bibtex)
Dušan Variš (2023): Learning capabilities in Transformer Neural Networks (PhD thesis). In: (url, bibtex)
Idris Abdulmumin, Satya Ranjan Dash, Musa Abdullahi Dawud, Shantipriya Parida, Shamsuddeen Hassan Muhammad, Ibrahim Sa'id Ahmad, Subhadarshi Panda, Ondřej Bojar, Bashir Shehu Galadanci, Bello Shehu Bello (2022): Hausa Visual Genome: A Dataset for Multi-Modal English to Hausa Machine Translation. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pp. 6471-6479, European Language Resources Association, Marseille, France, ISBN 979-10-95546-72-6 (url, local PDF, bibtex)
Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Věra Kloudová, Surafel Melaku Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nadejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Yun Tang, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alex Waibel, Changhan Wang, Shinji Watanabe (2022): Findings of the IWSLT 2022 Evaluation Campaign. In: Proceedings of the 19th International Conference on Spoken Language Translation, pp. 98-157, Association for Computational Linguistics, Stroudsburg, USA, ISBN 978-1-955917-41-4 (url, local PDF, bibtex)
Michal Auersperger, Pavel Pecina (2022): Defending Compositionality in Emergent Languages. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Student Research Workshop, pp. 285-291, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-73-5 (pdf, local PDF, bibtex)
Niyati Bafna, Martin Vastl, Ondřej Bojar (2022): Constrained Decoding for Technical Term Retention in English-Hindi MT. In: Proceedings of ICON 2021: 18th International Conference on Natural Language Processing, pp. 1-6, NLP Association India, Centre for Natural Language Processing, Department of Computer Science and Engineering, Silchar, India (local PDF, bibtex)
Rachel Bawden, Ondřej Bojar, Rajen Chatterjee, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Matthias Huck, Rebecca Knowles, Tom Kocmi, Philipp Koehn, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Michal Novák, Martin Popel, Maja Popović, Mariya Shmatova, Marco Turchi (2022): Findings of the 2022 Conference on Machine Translation (WMT22). In: Proceedings of the Seventh Conference on Machine Translation, pp. 1-34, Association for Computational Linguistics, Stroudsburg, PA, USA (pdf, local PDF, bibtex)
Sunit Bhattacharya, Rishu Kumar, Ondřej Bojar (2022): Team ÚFAL at CMCL 2022 Shared Task: Figuring out the correct recipe for predicting Eye-Tracking features using Pretrained Language Model. In: Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pp. 130-135, Association for Computational Linguistics, Stroudsburg, PA, USA (local PDF, bibtex)
Sunit Bhattacharya, Vilém Zouhar, Ondřej Bojar (2022): Sentence Ambiguity, Grammaticality and Complexity Probes. In: Proceedings of the 5th Workshop on Analyzing and Interpreting Neural Networks for NLP, pp. 1-11, Association for Computational Linguistics, Stroudsburg, PA, USA (pdf, local PDF, bibtex)
Lukáš Burget, Ondřej Bojar (2022): Průběžná zpráva NEUREM3 (technical report). In: (pdf, bibtex)
Satya Ranjan Dash, Shantipriya Parida, Esau Villatoro Tello, Biswaranjan Acharya, Ondřej Bojar (2022): Natural Language Processing In Healthcare, A Special Focus on Low Resource Languages. In: , ISBN 9780367685393 (bibtex)
Muskan Garg, Seema Wazarkar, Muskaan Singh, Ondřej Bojar (2022): Multimodality for NLP-Centered Applications: Resources, Advances and Frontiers. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pp. 6837-6847, European Language Resources Association, Marseille, France, ISBN 979-10-95546-72-6 (url, local PDF, bibtex)
Jindřich Helcl, Barry Haddow, Alexandra Birch (2022): Non-Autoregressive Machine Translation: It's Not as Fast as it Seems. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1780-1790, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-71-1 (local PDF, bibtex)
Christian Huber, Rishu Kumar, Ondřej Bojar, Alex Waibel (2022): Short-Term Word-Learning in a Dynamically Changing Environment (Electronic). In: ArXiv.org Computing Research Repository, ISSN 2331-8422, pp. 1-4 (url)
Dávid Javorský, Dominik Macháček, Ondřej Bojar (2022): Continuous Rating as Reliable Human Evaluation of Simultaneous Speech Translation. In: Proceedings of the Seventh Conference on Machine Translation, pp. 154-164, Association for Computational Linguistics, Stroudsburg, PA, USA (pdf, local PDF, bibtex)
Josef Jon, Martin Popel, Ondřej Bojar (2022): CUNI-Bergamot Submission at WMT22 General Task. In: Proceedings of the Seventh Conference on Machine Translation, pp. 280-289, Association for Computational Linguistics, Stroudsburg, PA, USA (pdf, local PDF, bibtex)
Mateusz Krubiński, Pavel Pecina (2022): From COMET to COMES – Can Summary Evaluation Benefit from Translation Evaluation? In: Proceedings of the 3rd Workshop on Evaluation and Comparison of NLP Systems, pp. 21-31, Association for Computational Linguistics, Stroudsburg, PA, USA (pdf, local PDF, bibtex)
Nalin Kumar, Ondřej Bojar (2022): Genre Transfer in NMT: Creating Synthetic Spoken Parallel Sentences using Written Parallel Data. In: 19th International Conference on Natural Language Processing, pp. 224-233, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-959429-38-8 (url, local PDF, bibtex)
Ivana Kvapilíková, Ondřej Bojar (2022): CUNI Submission to MT4All Shared Task. In: Proceedings of the LREC 2022 Workshop of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (SIGUL 2022), pp. 78-82, European Language Resources Association (ELRA), Paris, France, ISBN 979-10-95546-91-7 (bibtex)
Jiří Mayer, Pavel Pecina (2022): Obstacles with Synthesizing Training Data for OMR. In: Proceedings of the 4th International Workshop on Reading Music Systems, pp. 15-19, University of Alicante, Alicante, Spain (url, local PDF, bibtex)
Toshiaki Nakazawa, Hideya Mino, Isao Goto, Raj Dabre, Shohei Higashiyama, Shantipriya Parida, Anoop Kunchukuttan, Makoto Morishita, Ondřej Bojar, Chenhui Chu, Kaori Abe, Yusuke Oda, Sadao Kurohashi (2022): Overview of the 9th Workshop on Asian Translation. In: Proceedings of the 9th Workshop on Asian Translation, pp. 1-36, International Conference on Computational Linguistics, Gyeongju, Korea (url, bibtex)
Anna Nedoluzhko, Muskaan Singh, Marie Hledíková, Tirthankar Ghosal, Ondřej Bojar (2022): ELITR Minuting Corpus: A Novel Dataset for Automatic Minuting from Multi-Party Meetings in English and Czech. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pp. 3174-3182, European Language Resources Association, Marseille, France, ISBN 979-10-95546-72-6 (pdf, local PDF, bibtex)
Peter Polák, Muskaan Singh, Anna Nedoluzhko, Ondřej Bojar (2022): ALIGNMEET: A Comprehensive Tool for Meeting Annotation, Alignment, and Evaluation. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pp. 1771-1779, European Language Resources Association, Marseille, France, ISBN 979-10-95546-72-6 (pdf, local PDF, bibtex)
Borek Požár, Klára Tauchmanová, Kristýna Neumannová, Ivana Kvapilíková, Ondřej Bojar (2022): CUNI Submission to the BUCC 2022 Shared Task on Bilingual Term Alignment. In: Proceedings of the LREC 2022 15th Workshop on Building and Using Comparable Corpora, pp. 43-49, European Language Resources Association, Paris, France, ISBN 979-10-95546-94-8 (local PDF, bibtex)
Philipp Rösch, Jindřich Libovický (2022): Probing the Role of Positional Information in Vision-Language Models. In: Findings of the Association for Computational Linguistics: NAACL 2022, pp. 1031-1041, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-76-6 (url, local PDF, local PDF, bibtex)
Sukanta Sen, Ondřej Bojar, Barry Haddow (2022): Simultaneous Translation for Unsegmented Input: A Sliding Window Approach (Electronic). In: ArXiv.org Computing Research Repository, ISSN 2331-8422, pp. 1-8 (url)
Kartik Shinde, Tirthankar Ghosal, Ondřej Bojar (2022): Automatic minuting: A pipeline method for generating minutes from multi-party meeting proceedings. In: Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation, pp. 1-12, Association for Computational Linguistics, Stroudsburg, PA, USA (url, local PDF, bibtex)
Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-Jussà, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher M. Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri (2021): Findings of the 2021 Conference on Machine Translation (WMT21). In: Proceedings of the Sixth Conference on Machine Translation, pp. 1-88, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (pdf, local PDF, bibtex)
Antonios Anastasopoulos, Ondřej Bojar, Jacob Bremerman, Roldano Cattoni, Maha Elbayad, Marcello Federico, Xutai Ma, Satoshi Nakamura, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Alex Waibel, Changhan Wang, Matthew Wiesner (2021): Findings of the IWSLT 2021 Evaluation Campaign. In: Proceedings of the 18th International Conference on Spoken Language Translation, pp. 1-29, Association for Computational Linguistics, Stroudsburg, USA, ISBN 978-1-954085-74-9 (url, local PDF, bibtex)
Ebrahim Ansari, Ondřej Bojar, Barry Haddow, Mohammad Mahmoudi (2021): SLTev: Comprehensive Evaluation of Spoken Language Translation. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, pp. 71-79, Association for Computational Linguistics (ACL), Stroudsburg, PA, USA, ISBN 978-1-954085-05-3 (url, local PDF, local PDF, bibtex)
Michal Auersperger, Pavel Pecina (2021): Solving SCAN Tasks with Data Augmentation and Input Embeddings. In: Proceedings of the Recent Advances in Natural Language Processing, pp. 86-91, INCOMA Ltd., Shoumen, Bulgaria, ISBN 978-954-452-072-4 (pdf, local PDF, bibtex)
Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, George Foster, Alon Lavie, Ondřej Bojar (2021): Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain. In: Proceedings of the Sixth Conference on Machine Translation, pp. 733-774, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, bibtex)
Petr Gebauer, Ondřej Bojar, Vojtěch Švandelík, Martin Popel (2021): CUNI Systems in WMT21: Revisiting Backtranslation Techniques for English-Czech NMT. In: Proceedings of the Sixth Conference on Machine Translation, pp. 123-129, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, bibtex)
Michael Hanna, Ondřej Bojar (2021): A Fine-Grained Analysis of BERTScore. In: Proceedings of the Sixth Conference on Machine Translation, pp. 507-517, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, bibtex)
Michael Hanna, David Mareček (2021): Analyzing BERT’s Knowledge of Hypernymy via Prompting. In: Proceedings of the 4th Workshop on Analyzing and Interpreting Neural Networks for NLP, pp. 275-282, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-06-3 (pdf, bibtex)
Josef Jon, João Paulo de Souza Aires, Dušan Variš, Ondřej Bojar (2021): End-to-End Lexically Constrained Machine Translation for Morphologically Rich Languages. In: Proceedings of the Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pp. 4019-4033, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-954085-52-7 (url, local PDF, bibtex)
Josef Jon, Michal Novák, João Paulo de Souza Aires, Dušan Variš, Ondřej Bojar (2021): CUNI systems for WMT21: Terminology translation Shared Task. In: Proceedings of the Sixth Conference on Machine Translation, pp. 828-834, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, bibtex)
Josef Jon, Michal Novák, João Paulo de Souza Aires, Dušan Variš, Ondřej Bojar (2021): CUNI systems for WMT21: Multilingual Low-Resource Translation for Indo-European Languages Shared Task. In: Proceedings of the Sixth Conference on Machine Translation, pp. 354-361, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, bibtex)
Věra Kloudová, Ondřej Bojar, Martin Popel (2021): Detecting Post-edited References and Their Effect on Human Evaluation. In: Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval), pp. 114-119, Association for Computational Linguistics, Stroudsburg, USA, ISBN 978-1-954085-10-7 (pdf, local PDF, bibtex)
Matyáš Kopp, Vladislav Stankov, Jan Oldřich Krůza, Pavel Straňák, Ondřej Bojar (2021): ParCzech 3.0: A Large Czech Speech Corpus with Rich Metadata. In: 24th International Conference on Text, Speech and Dialogue, pp. 293-304, Springer, Cham, Switzerland, ISBN 978-3-030-83526-2 (pdf, local PDF, bibtex)
Ivana Kvapilíková, Ondřej Bojar (2021): Machine Translation of Covid-19 Information Resources via Multilingual Transfer. In: ITAT 2021 2nd Workshop on Automata, Formal and Natural Languages – WAFNL 2021, pp. 176-181, Faculty of Mathematics and Physics, Praha, Czechia (pdf, local PDF, bibtex)
Dominik Macháček, Matúš Žilinec, Ondřej Bojar (2021): Lost in Interpreting: Speech Translation from Source or Interpreter? In: Proceedings of INTERSPEECH 2021, pp. 2376-2380, ISCA, Baixas, France (pdf, local PDF, bibtex)
Jiří Mayer, Pavel Pecina (2021): Synthesizing Training Data for Handwritten Music Recognition. In: Document Analysis and Recognition -- ICDAR 2021, Lecture Notes in Computer Science, ISSN 0302-9743, 12823, pp. 626-641, Springer International Publishing, Cham, Switzerland, ISBN 978-3-030-86333-3 (pdf, bibtex)
Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Sadao Kurohashi (2021): Overview of the 8th Workshop on Asian Translation. In: Proceedings of the 8th Workshop on Asian Translation, pp. 1-45, Association for Computational Linguistics, Stroudsburg, USA (url, local PDF, bibtex)
Shantipriya Parida, Subhadarshi Panda, Ketan Kotwal, Amulya Ratna Dash, Satya Ranjan Dash, Yashvardhan Sharma, Petr Motlíček, Ondřej Bojar (2021): NLPHut’s Participation at WAT2021. In: Proceedings of the 8th Workshop on Asian Translation, pp. 146-154, Association for Computational Linguistics, Stroudsburg, USA (pdf, bibtex)
Peter Polák, Ondřej Bojar (2021): Coarse-To-Fine And Cross-Lingual ASR Transfer. In: ITAT 2021 2nd Workshop on Automata, Formal and Natural Languages – WAFNL 2021, pp. 154-160, Faculty of Mathematics and Physics, Praha, Czechia (pdf, local PDF, bibtex)
Peter Polák, Muskaan Singh, Ondřej Bojar (2021): Explainable Quality Estimation: CUNI Eval4NLP Submission. In: Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, pp. 250-255, Association for Computational Linguistics, Stroudsburg, PA, USA (pdf, local PDF, bibtex)
Arghyadeep Sen, Shantipriya Parida, Ketan Kotwal, Subhadarshi Panda, Ondřej Bojar, Satya Ranjan Dash (2021): Bengali Visual Genome: A Multimodal Dataset for Machine Translation and Image Captioning. In: 9th International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA 2021), pp. 63-70, Springer Nature Singapore, Singapore, ISBN 978-981-16-6624-7 (local PDF, bibtex)
Muskaan Singh, Tirthankar Ghosal, Ondřej Bojar (2021): An Empirical Performance Analysis of State-of-the-Art Summarization Models for Automatic Minuting. In: Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation, pp. 50-60, ACL, 209 N. Eighth Street, Stroudsburg PA 18360, USA (url, bibtex)
Dušan Variš, Ondřej Bojar (2021): Sequence Length is a Domain: Length-based Overfitting in Transformer Models. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 8246-8257, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-09-4 (pdf, local PDF, local PDF, local PDF, bibtex)
Vilém Zouhar (2021): Sampling and Filtering of Neural Machine Translation Distillation Data. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pp. 1-8, Association for Computational Linguistics, Stroudsburg, USA, ISBN 978-1-954085-50-3 (pdf, bibtex)
Vilém Zouhar, Michal Novák, Matúš Žilinec, Ondřej Bojar, Mateo Obregón, Robin L. Hill, Frédéric Blain, Marina Fomicheva, Lucia Specia, Lisa Yankovskaya (2021): Backtranslation Feedback Improves User Confidence in MT, Not Quality. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 151-161, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-954085-46-6 (url, local PDF, bibtex)
Vilém Zouhar, Aleš Tamchyna, Martin Popel, Ondřej Bojar (2021): Neural Machine Translation Quality and Post-Editing Performance. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 10204-10214, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-09-4 (pdf, local PDF, bibtex)
Hadi Abdi Khojasteh, Ebrahim Ansari, Mahdi Bohlouli (2020): LSCP: Enhanced Large Scale Colloquial Persian Language Understanding. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 6323-6327, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4 (url, local PDF, bibtex)
Ebrahim Ansari, Amittai Axelrod, Nguyen Bach, Ondřej Bojar, Roldano Cattoni, Fahim Dalvi, Nadir Durrani, Marcello Federico, Christian Federmann, Jiatao Gu, Fei Huang, Kevin Knight, Xutai Ma, Ajay Nagesh, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Xing Shi, Sebastian Stüker, Marco Turchi, Alex Waibel, Changhan Wang (2020): Findings of the IWSLT 2020 Evaluation Campaign. In: Proceedings of the 17th International Conference on Spoken Language Translation, pp. 1-34, Association for Computational Linguistics, Online, ISBN 978-1-952148-07-1 (pdf, local PDF, bibtex)
Petra Barančíková, Ondřej Bojar (2020): Costra 1.1: An Inquiry into Geometric Properties of Sentence Spaces. In: 23rd International Conference on Text, Speech and Dialogue, pp. 135-143, Springer, Cham, Switzerland, ISBN 978-3-030-58322-4 (local PDF, bibtex)
Petra Barančíková, Ondřej Bojar (2020): COSTRA 1.0: A Dataset of Complex Sentence Transformations. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 3535-3541, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4 (url, local PDF, bibtex)
Loïc Barrault, Magdalena Biesialska, Ondřej Bojar, Marta R. Costa-Jussà, Christian Federmann, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Matthias Huck, Eric Joanis, Tom Kocmi, Philipp Koehn, Chi-kiu Lo, Nikola Ljubešić, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Santanu Pal, Matt Post, Marcos Zampieri (2020): Findings of the 2020 Conference on Machine Translation (WMT20). In: Fifth Conference on Machine Translation - Proceedings of the Conference, pp. 1-55, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-81-0 (pdf, local PDF, bibtex)
Erion Çano, Ondřej Bojar (2020): How Many Pages? Paper Length Prediction from the Metadata. In: 4th International Conference on Natural Language Processing and Information Retrieval, pp. 91-95, ACM, New York, USA, ISBN 978-1-4503-7760-7 (url, local PDF, bibtex)
Erion Çano, Ondřej Bojar (2020): Two Huge Title and Keyword Generation Corpora of Research Articles. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 6663-6671, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4 (url, local PDF, bibtex)
Jonáš Kratochvíl, Peter Polák, Ondřej Bojar (2020): Large Corpus of Czech Parliament Plenary Hearings. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 6363-6367, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4 (url, local PDF, bibtex)
Ivana Kvapilíková, Mikel Artetxe, Gorka Labaka, Eneko Agirre, Ondřej Bojar (2020): Unsupervised Multilingual Sentence Embeddings for Parallel Corpus Mining. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp. 255-262, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-952148-03-3 (url, local PDF, bibtex)
Ivana Kvapilíková, Tom Kocmi, Ondřej Bojar (2020): CUNI Systems for the Unsupervised and Very Low Resource Translation Task in WMT20. In: Fifth Conference on Machine Translation - Proceedings of the Conference, pp. 1123-1128, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-81-0 (pdf, local PDF, bibtex)
Jindřich Libovický, Zdeněk Kasner, Jindřich Helcl, Ondřej Dušek (2020): Expand and Filter: CUNI and LMU Systems for the WNGT 2020 Duolingo Shared Task. In: Proceedings of the Fourth Workshop on Neural Generation and Translation, pp. 153-160, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-952148-17-0 (url, local PDF, bibtex)
Dominik Macháček, Jonáš Kratochvíl, Sangeet Sagar, Matúš Žilinec, Ondřej Bojar, Thai-Son Nguyen, Felix Schneider, Philip Williams, Yuekun Yao (2020): ELITR Non-Native Speech Translation at IWSLT 2020. In: Proceedings of the 17th International Conference on Spoken Language Translation, pp. 200-208, Association for Computational Linguistics, Online, ISBN 978-1-952148-07-1 (pdf, local PDF, bibtex)
Nitika Mathur, Johnny Tian-Zheng Wei, Markus Freitag, Qingsong Ma, Ondřej Bojar (2020): Results of the WMT20 Metrics Shared Task. In: Fifth Conference on Machine Translation - Proceedings of the Conference, pp. 688-725, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-81-0 (pdf, local PDF, bibtex)
Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi (2020): Overview of the 7th Workshop on Asian Translation. In: Proceedings of the 7th Workshop on Asian Translation (WAT2020), pp. 1-44, Association for Computational Linguistics, Stroudsburg, USA (url, local PDF, bibtex)
Shantipriya Parida, Petr Motlíček, Amulya Ratna Dash, Satya Ranjan Dash, Debasish Kumar Mallick, Satya Prakash Biswal, Priyanka Pattnaik, Biranchi Narayan Nayak, Ondřej Bojar (2020): ODIANLP’s Participation in WAT2020. In: Proceedings of the 7th Workshop on Asian Translation (WAT2020), pp. 103-108, Association for Computational Linguistics, Stroudsburg, USA (url, local PDF, bibtex)
Peter Polák, Sangeet Sagar, Dominik Macháček, Ondřej Bojar (2020): CUNI Neural ASR with Phoneme-Level Intermediate Step for Non-Native SLT at IWSLT 2020. In: Proceedings of the 17th International Conference on Spoken Language Translation, pp. 191-199, Association for Computational Linguistics, Online, ISBN 978-1-952148-07-1 (url, local PDF, bibtex)
Martin Popel, Marketa Tomkova, Jakub Tomek, Łukasz Kaiser, Jakob Uszkoreit, Ondřej Bojar, Zdeněk Žabokrtský (2020): Transforming machine translation: a deep learning system reaches news translation quality comparable to human professionals. In: Nature Communications, ISSN 2041-1723, vol. 11, no. 4381, pp. 1-15 (url, local PDF, bibtex)
Shadi Saleh, Pavel Pecina (2020): Document Translation vs. Query Translation for Cross-Lingual Information Retrieval in the Medical Domain. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 6849-6860, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-952148-25-5 (pdf, local PDF, bibtex)
Lucia Specia, Loïc Barrault, Ozan Caglayan, Amanda Duarte, Desmond Elliott, Spandana Gella, Nils Holzenberger, Chiraag Lala, Sun Jae Lee, Jindřich Libovický, Pranava Madhyastha, Florian Metze, Karl Mulligan, Alissa Ostapenko, Shruti Palaskar, Ramon Sanabria, Josiah Wang, Raman Arora (2020): Grounded Sequence to Sequence Transduction. In: IEEE Journal on Selected Topics in Signal Processing, ISSN 1932-4553, vol. 14, no. 3, pp. 577-591 (url, local PDF, bibtex)
Vilém Zouhar, Ondřej Bojar (2020): Outbound Translation User Interface Ptakopět: A Pilot Study. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 6967-6975, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4 (url, local PDF, bibtex)
Vilém Zouhar, Michal Novák (2020): Extending Ptakopět for Machine Translation User Interaction Experiments. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 115, pp. 129-142 (pdf, local PDF, bibtex)
Vilém Zouhar, Tereza Vojtěchová, Ondřej Bojar (2020): WMT20 Document-Level Markable Error Exploration. In: Fifth Conference on Machine Translation - Proceedings of the Conference, pp. 371-380, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-81-0 (url, local PDF, bibtex)
Loïc Barrault, Ondřej Bojar, Marta R. Costa-Jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri (2019): Findings of the 2019 Conference on Machine Translation (WMT19). In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 1-61, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (url, bibtex)
Ondřej Bojar, Raffaella Bernardi, Bonnie L. Webber (2019): Representation of sentence meaning (A JNLE Special Issue). In: Natural Language Engineering, ISSN 1351-3249, vol. 25, no. 4, pp. 427-432 (pdf, local PDF, bibtex)
Erion Çano, Ondřej Bojar (2019): Keyphrase Generation: A Text Summarization Struggle. In: The 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 666-672, NAACL-HLT 2019, Minneapolis, MN, USA, ISBN 978-1-950737-13-0 (url, bibtex)
Erion Çano, Ondřej Bojar (2019): Efficiency Metrics for Data-Driven Models: A Text Summarization Case Study. In: Proceedings of the 12th International Conference on Natural Language Generation (INLG 2019), pp. 229-239, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-94-9 (url, bibtex)
Jindřich Helcl, Jindřich Libovický, Martin Popel (2019): CUNI System for the WMT19 Robustness Task. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 738-742, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (url, local PDF, local PDF, bibtex)
Daniel Kondratyuk, Ronald Cardenas, Ondřej Bojar (2019): Replacing Linguists with Dummies: A Serious Need for Trivial Baselines in Multi-Task Neural Machine Translation. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 113, pp. 31-40 (pdf, bibtex)
Dominik Macháček, Jonáš Kratochvíl, Tereza Vojtěchová, Ondřej Bojar (2019): A Speech Test Set of Practice Business Presentations with Additional Relevant Texts. In: Statistical Language and Speech Processing, pp. 151-161, Springer Nature Switzerland AG, Cham, Switzerland, ISBN 978-3-030-31371-5 (url, bibtex)
Qingsong Ma, Johnny Tian-Zheng Wei, Ondřej Bojar, Yvette Graham (2019): Results of the WMT19 Metrics Shared Task: Segment-Level and Strong MT Systems Pose Big Challenges. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 62-90, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (url, bibtex)
Toshiaki Nakazawa, Nobushige Doi, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi (2019): Overview of the 6th Workshop on Asian Translation. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 1-35, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-90-1 (pdf, bibtex)
Anna Nedoluzhko, Ondřej Bojar (2019): Towards Automatic Minuting of Meetings. In: Proceedings of the 19th Conference ITAT 2019: Slovenskočeský NLP workshop (SloNLP 2019), pp. 112-119, CreateSpace Independent Publishing Platform, Košice, Slovakia (url, local PDF, bibtex)
Shruti Palaskar, Jindřich Libovický, Spandana Gella, Florian Metze (2019): Multimodal Abstractive Summarization for How2 Videos. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 6587-6596, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-48-2 (url, local PDF, local PDF, bibtex)
Shantipriya Parida, Ondřej Bojar, Satya Ranjan Dash (2019): Hindi Visual Genome: A Dataset for Multimodal English-to-Hindi Machine Translation. In: Computación y Sistemas, ISSN 1405-5546, vol. 23, no. 4, pp. 1499-1505 (url, bibtex)
Shantipriya Parida, Petr Motlíček, Ondřej Bojar (2019): Idiap NMT System for WAT 2019 Multi-Modal Translation Task. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 175-180, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-90-1 (pdf, bibtex)
Thuong-Hai Pham, Dominik Macháček, Ondřej Bojar (2019): Promoting the Knowledge of Source Syntax in Transformer NMT Is Not Needed. In: Computación y Sistemas, ISSN 1405-5546, vol. 23, no. 3, pp. 923-934 (url, bibtex)
Martin Popel, Dominik Macháček, Michal Auersperger, Ondřej Bojar, Pavel Pecina (2019): English-Czech Systems in WMT19: Document-Level Transformer. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 342-348, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (pdf, local PDF, bibtex)
Shadi Saleh, Pavel Pecina (2019): Term Selection for Query Expansion in Medical Cross-Lingual Information Retrieval. In: Advances in Information Retrieval; 41st European Conference on IR Research, ECIR 2019, Lecture Notes in Computer Science, ISSN 0302-9743, 1, pp. 507-522, Springer International Publishing, Berlin, Germany, ISBN 978-3-030-15719-7 (url, local PDF, bibtex)
Dušan Variš, Ondřej Bojar (2019): Unsupervised Pretraining for Neural Machine Translation Using Elastic Weight Consolidation. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp. 130-135, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-47-5 (pdf, local PDF, local PDF, bibtex)

This project falls under GACR EXPRO.

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics

NEUREM3

Neuronové reprezentace v multimodálním a mnohojazyčném modelování (Neural Representations in Multi-modal and Multi-lingual Modelling)

Publications