Martin Popel
Main Research Interests
MT (neural, syntax-based), Universal Dependencies, machine learning, deep learning, parsing, MT evaluation, treebanking
Projects
Grants: LUSyD, QT21, QTLeap, Manyla, Khresmoi, EuroMatrixPlus
Technical editor: The Prague Bulletin of Mathematical Linguistics and ÚFAL technical reports
Projects / Software / Data
- CUBBITT (neural machine translation, WMT2018–WMT2020 EN-CS winner, see my paper in Nature Communications)
- Udapi (NLP framework) [GitHub]
- Universal Dependencies
- MT-ComparEval, a tool for comparison and evaluation of MT outputs. See wmt.ufal.cz for a live demo.
Older
Curriculum Vitae
- 2019 visiting researcher at Microsoft Translator, Redmond, USA
- 2009–2018 Ph.D. degree in Mathematical Linguistics, ÚFAL MFF UK. Thesis: Machine Translation Using Syntactic Analysis (slides)
- 2007–2009 Master's degree in Mathematical Linguistics, with honours, MFF UK. Thesis: Ways to Improve the Quality of English-Czech Machine Translation
- 2003–2007 Bachelor's degree in Computer Science, MFF UK. Thesis: Animation of Algorithms from Automata Theory (in Czech)
Teaching
- NPFL095 Modern Methods in Computational Linguistics, see the course website with the schedule
- NPFL070 Language Data Resources (with Zdeněk Žabokrtský)
- NPFL118 Natural language processing on computational cluster
Selected Bibliography
Selected papers
- Martin Popel, Marketa Tomkova, Jakub Tomek, Łukasz Kaiser, Jakob Uszkoreit, Ondřej Bojar, Zdeněk Žabokrtský: Transforming machine translation: a deep learning system reaches news translation quality comparable to human professionals, Nature Communications, 11, 4381 (2020).
- Martin Popel, Ondřej Bojar: Training Tips for the Transformer Model, The Prague Bulletin of Mathematical Linguistics, No. 110, 2018, pp. 43–70.
- Martin Popel: CUNI Transformer Neural MT System for WMT18. In Proceedings of WMT 2018, Brussels, Belgium, October 2018, pp. 486–491 [pdf]
- Martin Popel, Zdeněk Žabokrtský, Martin Vojtek: Udapi: Universal API for Universal Dependencies. In Proceedings of UDW 2017, Göteborg, Sweden, May 2017, pp. 96–101 [pdf] [slides]
- Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre et al.: CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. In Proceedings of CoNLL 2017, Vancouver, Canada, August 2017, pp. 1–19 [pdf]
- Roman Sudarikov, Martin Popel, Ondřej Bojar, Aljoscha Burchardt and Ondřej Klejch: Using MT-ComparEval. In Proceedings of MT-Eval LREC 2016, Portorož, Slovenia, May 2016, pp. 76–82 [pdf] [slides]
- Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurelie Neveol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor and Marcos Zampieri: Findings of the 2016 Conference on Machine Translation. In Proceedings of WMT 2016, Berlin, Germany, August 2016, pp. 131–198 [pdf] [slides]
- Martin Popel, David Mareček, Jan Štěpánek, Daniel Zeman, Zdeněk Žabokrtský: Coordination Structures in Dependency Treebanks In Proceedings of ACL 2013, Sofia, Bulgaria, August 5–7, 2013, pp. 517–527. [pdf] [poster]
- Ondřej Bojar, Miloš Ercegovčević, Martin Popel and Omar Zaidan: A Grain of Salt for the WMT Manual Evaluation. In Proceedings of WMT 2011, EMNLP 6th Workshop on Statistical Machine Translation, Edinburgh, UK, July 30, 2011, pp. 1–11. [pdf] presentation [pdf]
- Martin Popel, Zdeněk Žabokrtský: TectoMT: Modular NLP Framework. In Proceedings of IceTAL, 7th International Conference on Natural Language Processing, Reykjavík, Iceland, August 17, 2010, pp. 293–304. [pdf] presentation [pdf]
- Zdeněk Žabokrtský, Martin Popel: Hidden Markov Tree Model in Dependency-based Machine Translation. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics, Singapore, 2009, pp. 145–148 [pdf] poster [ppt]
All papers
- Coreference meets Universal Dependencies – a pilot experiment on harmonizing coreference datasets for 11 languages (technical report). (pdf, local PDF, bibtex)
- Speed-optimized, Compact Student Models that Distill Knowledge from a Larger Teacher Model: the UEDIN-CUNI Submission to the WMT 2020 News Translation Task. In: Fifth Conference on Machine Translation - Proceedings of the Conference, pp. 191-196, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-81-0 (pdf, obd, bibtex)
- CUNI English-Czech and English-Polish Systems in WMT20: Robust Document-Level Training. In: Fifth Conference on Machine Translation - Proceedings of the Conference, pp. 269-273, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-81-0 (pdf, obd, bibtex)
- Transforming machine translation: a deep learning system reaches news translation quality comparable to human professionals. In: Nature Communications, ISSN 2041-1723, vol. 11, no. 4381, pp. 1-15 (url, local PDF, obd, bibtex)
- CUNI System for the WMT19 Robustness Task. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 738-742, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (url, local PDF, local PDF, obd, bibtex)
- Domain Adaptation of Document-Level NMT in IWSLT19. In: Proceedings of the 16th International Workshop on Spoken Language Translation, pp. 1-7, Karlsruhe Institute of Technology, Karlsruhe, Germany (url, obd, bibtex)
- English-Czech Systems in WMT19: Document-Level Transformer. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 342-348, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (pdf, local PDF, obd, bibtex)
- Hluboké učení v automatické analýze českého textu. In: Slovo a slovesnost, ISSN 0037-7031, vol. 80, no. 4, pp. 306-327 (obd, bibtex)
- Solving Three Czech NLP Tasks End-to-End with Neural Models. In: Proceedings of the 18th conference ITAT 2018: Slovenskočeský NLP workshop (SloNLP 2018), pp. 138-143, CreateSpace Independent Publishing Platform, Košice, Slovakia, ISBN 978-1727267198 (pdf, local PDF, local PDF, obd, bibtex)
- CUNI Transformer Neural MT System for WMT18. In: Proceedings of the Third Conference on Machine Translation, Volume 2: Shared Tasks, pp. 486-491, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-81-0 (pdf, obd, bibtex)
- Machine Translation Using Syntactic Analysis (PhD thesis). (pdf, bibtex)
- Training Tips for the Transformer Model. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 110, pp. 43-70 (pdf, obd, bibtex)
- CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. In: Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 1-21, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-82-7 (pdf, local PDF, obd, bibtex)
- Udapi: Universal API for Universal Dependencies. In: NoDaLiDa 2017 Workshop on Universal Dependencies, pp. 96-101, Göteborgs universitet, Göteborg, Sweden, ISBN 978-91-7685-501-0 (pdf, obd, bibtex)
- CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 1-19, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-945626-70-8 (pdf, local PDF, obd, bibtex)
- Tools and Guidelines for Principled Machine Translation Development. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pp. 1877-1882, European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1 (pdf, local PDF, obd, bibtex)
- CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered. In: Text, Speech, and Dialogue: 19th International Conference, TSD 2016, Lecture Notes in Computer Science, ISSN 0302-9743, 9924, pp. 231-238, Springer International Publishing, Cham / Heidelberg / New York / Dordrecht / London, ISBN 978-3-319-45509-9 (url, obd, bibtex)
- Findings of the 2016 Conference on Machine Translation (WMT16). In: Proceedings of the First Conference on Machine Translation (WMT). Volume 2: Shared Task Papers, pp. 131-198, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-945626-10-4 (pdf, obd, bibtex)
- SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task. In: Proceedings of the First Conference on Machine Translation (WMT). Volume 2: Shared Task Papers, pp. 435-441, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-945626-10-4 (pdf, obd, bibtex)
- QTLeap WSD/NED Corpora: Semantic Annotation of Parallel Corpora in Six Languages. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pp. 3023-3030, European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1 (obd, bibtex)
- TectoMT – a deep-linguistic core of the combined Chimera MT system. In: Baltic Journal of Modern Computing, ISSN 2255-8942, vol. 4, no. 2, pp. 377-377 (pdf, local PDF, local PDF, local PDF, obd, bibtex)
- Moses & Treex Hybrid MT Systems Bestiary. In: Proceedings of the 2nd Deep Machine Translation Workshop, pp. 1-10, ÚFAL MFF UK, Praha, Czechia, ISBN 978-80-88132-02-8 (url, local PDF, local PDF, obd, bibtex)
- Dictionary-based Domain Adaptation of MT Systems without Retraining. In: Proceedings of the First Conference on Machine Translation (WMT). Volume 2: Shared Task Papers, pp. 449-455, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-945626-10-4 (pdf, obd, bibtex)
- Using MT-ComparEval. In: Translation Evaluation: From Fragmented Tools and Data Sets to an Integrated Ecosystem, pp. 76-82, LREC, Portorož, Slovenia (pdf, obd, bibtex)
- Using Parallel Texts and Lexicons for Verbal Word Sense Disambiguation. In: Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015), pp. 82-90, Uppsala University, Uppsala, Sweden, ISBN 978-91-637-8965-6 (pdf, local PDF, obd, bibtex)
- New Language Pairs in TectoMT. In: Proceedings of the 10th Workshop on Machine Translation, pp. 98-104, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-941643-32-7 (pdf, local PDF, obd, bibtex)
- MT-ComparEval: Graphical evaluation interface for Machine Translation development. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 104, pp. 63-74 (pdf, obd, bibtex)
- Translation Model Interpolation for Domain Adaptation in TectoMT. In: Proceedings of the 1st Deep Machine Translation Workshop, pp. 89-96, ÚFAL MFF UK, Praha, Czechia, ISBN 978-80-904571-7-1 (url, local PDF, local PDF, obd, bibtex)
- Adaptation of machine translation for multilingual information retrieval in medical domain. In: Artificial Intelligence in Medicine, ISSN 0933-3657, vol. 61, no. 3, pp. 165-185 (url, obd, bibtex)
- HamleDT 2.0: Thirty Dependency Treebanks Stanfordized. In: Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), pp. 2334-2341, European Language Resources Association, Reykjavík, Iceland, ISBN 978-2-9517408-8-4 (pdf, local PDF, local PDF, obd, bibtex)
- CUNI in WMT14: Chimera Still Awaits Bellerophon. In: Proceedings of the Ninth Workshop on Statistical Machine Translation, pp. 195-200, Association for Computational Linguistics, Baltimore, MD, USA, ISBN 978-1-941643-17-4 (pdf, local PDF, local PDF, obd, bibtex)
- HamleDT: Harmonized Multi-Language Dependency Treebank. In: Language Resources and Evaluation, ISSN 1574-020X, vol. 48, no. 4, pp. 601-637 (url, local PDF, obd, bibtex)
- Khresmoi Professional: Multilingual Semantic Search for Medical Professionals. In: Proceedings of the ACM SIGIR Workshop on Health Search and Discovery: Helping Users and Advancing Medicine, pp. 31-34, Microsoft Research, Cambridge, UK (url, local PDF, obd, bibtex)
- PhraseFix: Statistical Post-Editing of TectoMT. In: Proceedings of the Eighth Workshop on Statistical Machine Translation, pp. 141-147, Association for Computational Linguistics, Sofija, Bulgaria, ISBN 978-1-937284-57-2 (obd, bibtex)
- Cross-language Study on Influence of Coordination Style on Dependency Parsing Performance (technical report). (pdf, local PDF, bibtex)
- Coordination Structures in Dependency Treebanks. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 517-527, Association for Computational Linguistics, Sofija, Bulgaria, ISBN 978-1-937284-50-3 (pdf, local PDF, local PDF, local PDF, obd, bibtex)
- The Joy of Parallelism with CzEng 1.0. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 3921-3928, European Language Resources Association, İstanbul, Turkey, ISBN 978-2-9517408-7-7 (url, local PDF, obd, bibtex)
- Formemes in English-Czech Deep Syntactic MT. In: Proceedings of the Seventh Workshop on Statistical Machine Translation, pp. 267-274, Association for Computational Linguistics, Montréal, Canada, ISBN 978-1-937284-20-6 (pdf, local PDF, obd, bibtex)
- Using Parallel Features in Parsing of Machine-Translated Sentences for Correction of Grammatical Errors. In: Proceedings of Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-6), ACL, pp. 39-48, Association for Computational Linguistics, Jeju, Korea, ISBN 978-1-937284-38-1 (pdf, local PDF, local PDF, obd, bibtex)
- HamleDT: To Parse or Not to Parse?. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 2735-2741, European Language Resources Association, İstanbul, Turkey, ISBN 978-2-9517408-7-7 (url, local PDF, local PDF, obd, bibtex)
- A Grain of Salt for the WMT Manual Evaluation. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 1-11, Association for Computational Linguistics, Edinburgh, UK, ISBN 978-1-937284-12-1 (pdf, local PDF, local PDF, obd, bibtex)
- Influence of Parser Choice on Dependency-Based MT. In: Proceedings of the Sixth Workshop on Statistical Machine Translation, pp. 433-439, Association for Computational Linguistics, Edinburgh, UK, ISBN 978-1-937284-12-1 (obd, bibtex)
- Maximum Entropy Translation Model in Dependency-Based MT Framework. In: Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, pp. 201-201, Association for Computational Linguistics, Uppsala, Sweden, ISBN 978-1-932432-71-8 (pdf, obd, bibtex)
- English-Czech Machine Translation Using TectoMT. In: WDS 2010 Proceedings of Contributed Papers, pp. 88-93, Matfyzpress, Charles University, Praha, Czechia, ISBN 978-80-7378-139-2 (pdf, local PDF, obd, bibtex)
- Perplexity of n-gram and Dependency Language Models. In: Text, Speech and Dialogue. 13th International Conference, TSD 2010, Brno, Czech Republic, September 6-10, 2010. Proceedings, Lecture Notes in Computer Science, ISSN 0302-9743, 6231, pp. 173-180, Springer, Berlin / Heidelberg, ISBN 978-3-642-15759-2 (local PDF, local PDF, obd, bibtex)
- TectoMT: Modular NLP Framework. In: Proceedings of the 7th International Conference on Advances in Natural Language Processing (IceTAL 2010), Lecture Notes in Computer Science, ISSN 0302-9743, 6233, pp. 293-304, Springer, Berlin / Heidelberg, ISBN 978-3-642-14769-2 (local PDF, local PDF, obd, bibtex)
- English-Czech MT in 2008. In: Proceedings of the Fourth Workshop on Statistical Machine Translation, pp. 125-129, Association for Computational Linguistics, Athina, Greece (pdf, local PDF, bibtex)
- Ways to Improve the Quality of English-Czech Machine Translation (master's thesis). (pdf, local PDF, bibtex)
- Improving English-Czech Tectogrammatical MT. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 92, pp. 1-20 (pdf, bibtex)
- Hidden Markov Tree Model in Dependency-based Machine Translation. In: Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pp. 145-148, Association for Computational Linguistics, Suntec, Singapore, ISBN 978-1-932432-61-9 (pdf, local PDF, obd, bibtex)
For citations see Google Scholar or ORCID.
Students
- 2010–2011 Master's student Amir Kamran (Hybrid MT Approaches for Low-Resource Languages)
- 2011–2012 Bachelor's student Michal Koutný (Word prediction using language models)
- 2011–2013 Bachelor's student Ondřej Klejch (Tool for comparison and evaluation of machine translation), see wmt.ufal.cz
- 2012–2013 Bachelor's student Michal Sedlák (Web Interface for the Treex Framework), see Treex::Web
- 2018–2019 Master's student Meisyarah Dwiastuti (Indonesian-English Neural Machine Translation)
Other talks
- Strojový překlad a umělá inteligence, Počítač ve škole, online conference, 2020-09-17 [pdf]
- Strojový překlad a umělá inteligence, Jeden den s informatikou, Praha, 2020-01-29 [pdf]
- Biases and perils of MT evaluation, Workshop on document level MT evaluation, Luxembourg, 2019-11-19 [pdf]
- LaTeX, Odborné vyjadřování a styl (NPOZ 009), Prague, 2019-03-25 [pdf] (in Czech)
- Strojový překlad a umělá inteligence, Jeden den s informatikou, Praha, 2018-02-07 [pdf]
- Strojový překlad a umělá inteligence, Gymnázium Vincence Makovského, Nové Město na Moravě, 2018-11-22 [pdf]
- Working with Universal Dependencies, ÚFAL Monday Seminar, 2016-03-13 [pdf]
- Universal Dependencies, UDPipe, Udapi, Prague, TextLink winter school, 2016-02-09 [pdf]
- WMT2016 IT-Task overview, Berlin, 2016-08-11 slides [pdf]
- My profile, Sedlec, 2015-09-14 3 slides [pdf]
- Machine Learning for Deep-syntactic MT, Seminar on the 35th Anniversary of the Cooperation between Charles University in Prague and Hamburg University, Prague, 2015-09-11, [pdf]
- Deep Syntactic MT and TectoMT (lecture) [pdf] and Treex (lab) [pdf], Machine Translation Marathon 2015, Prague, 2015-09-10
- Machine Translation and Discriminative Models, ÚFAL Monday Seminar, 2015-03-23 [pdf]
- AMR, Sedlec, 2014-09-16 slides [pdf]
- My profile, Sedlec, 2014-09-15 3 slides [pdf]
- Significance and Hypothesis testing, Language Data Resources (NPFL070), Prague, 2014-05-13 [pdf]
- Treex vs. NIF, NIF workshop, University of Economics, Prague, 2013-10-09 [pdf]
- Coordination Structures in Dependency Treebanks, Příchovice, 2013-09-19 [pdf]
- My profile, Příchovice, 2013-09-19 3 slides [pdf]
- Treex Tutorial, Machine Translation Marathon 2013, Prague, 2013-09-12
- TectoMT: Deep Syntactic Transfer, Machine Translation Marathon 2013, Prague, 2013-09-12 [pdf]
- Machine Translation Zoo, ÚFAL Monday Seminar, 2013-05-06 [pdf]
- NSF talk (Treex applications and Deep-syntactic Machine Translation), Prague, 2012-03-07 [pdf]
- Treex Tutorial: Introduction, CLARA Winter School on New Developments in Computational Linguistics, Prague, 2012-02-16 [pdf]
- TectoMT: Machine Translation System, FEAST, Universität des Saarlandes, Germany, 2011-11-16 [pdf]
- From the Jungle to a Park: Harmonizing Dependency Treebanks of 30 Languages – Coordination styles and transformations, ÚFAL Monday Seminar, 2011-10-31 [pdf]
- Treex: Modular NLP Framework, Malá Skála, 2011-09-17 [pdf]
- My profile and MT results, Malá Skála, 2011-09-15 5 slides [pdf]
- TectoMT Machine Translation System, Zdeněk Žabokrtský and Martin Popel, META-NET Course on Advanced MT Resources, Prague, 2010-12-17 [pdf]
- Prague Dependency Treebank Tutorial: Technology. Zdeněk Žabokrtský and Martin Popel: Introduction to TectoMT, CLARA Course on Treebank Annotation, Prague, 2010-12-15 [pdf]
- Treex: Modular NLP Framework, LATE-lunch talk at DFKI Language Technology Lab, Germany, 2010-11-04 [pdf]
- TectoMT: Machine Translation System, informal talk at Universität des Saarlandes, Germany, 2010-11-04 [pdf]
- Strojový překlad přes tektogramatickou rovinu v systému TectoMT, Pondělní seminář ÚFALu, 2010-03-22 (in Czech) [pdf]
- Deep Syntactic Machine Translation with Hidden Markov Tree Models, 5th PIRE meeting, 2009-12-12 [pdf]
- Kombinování překladových systémů, Seminar for Ph.D. Students, 2009-11-03 (in Czech) [pdf]