Michal Novák

office: N233
email: mnovak@ufal.mff.cuni.cz
phone: +420 951 552 954
address: IMPAKT – „N“, V Holešovičkách 747/2, 180 00 Praha 8, Czech Republic

Main Research Interests

  • coreference / anaphora resolution
  • machine translation
  • machine learning

Projects

Current

  • MASAPI - Multilingual assistant for searching, analysing and processing information and decision support
  • LINDAT-CLARIAH-CZ - Language Resources and Digital Arts and Humanities Research Infrastructure
  • CorefUD - Coreference in Universal Dependencies

Former

  • EuroMatrix+
  • GAUK 4226/2011 - Utilization of coreference in machine translation
  • Khresmoi - Medical information retrieval (working on Machine Translation)
  • QTLeap - Quality Translation by Deep Language Engineering Approaches
  • GAUK 3389/2015 - Cross-lingual approaches to coreference resolution
  • GAČR 16-05394S - Structure of coreferential chains in parallel language data
  • NAKI II DG16P02B016 - Automatic Evaluation of Text Coherence in Czech
  • Bergamot - Browser-based Multilingual Translation

Curriculum Vitae

  • 2018 Ph.D. (Doctoral degree) in Computational Linguistics, Faculty of Mathematics and Physics, Charles University in Prague.
    • Thesis: Coreference from the Cross-lingual Perspective
  • 2010 Mgr. (Master's degree) in Computational Linguistics, Faculty of Mathematics and Physics, Charles University in Prague.
    • Thesis: Machine Learning Approach to Anaphora Resolution
  • 2008 Bc. (Bachelor's degree) in Computer Science, Faculty of Mathematics and Physics, Charles University in Prague.
    • Thesis: Vizualizace PML souborů

Selected Bibliography

  1. Josef Jon, Dušan Variš, Michal Novák, Joao Paulo Aires, Ondřej Bojar (2023): Negative Lexical Constraints in Neural Machine Translation. In: Proceedings of Machine Translation Summit XIX vol. 1: Research Track, pp. 372-384, Asia-Pacific Association for Machine Translation (AAMT), Kyoto, Japan, ISBN 978-4-9913461-0-1 (pdf, bibtex)
  2. Juntao Yu, Michal Novák, Abdulrahman Aloraini, Nafise Sadat Moosavi, Silviu Paun, Sameer Pradhan, Massimo Poesio (2023): The Universal Anaphora Scorer 2.0. In: Proceedings of the 15th International Conference on Computational Semantics, pp. 183-194, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-959429-74-6 (url, bibtex)
  3. Zdeněk Žabokrtský, Miloslav Konopík, Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk, Martin Popel, Ondřej Pražák, Jakub Sido, Daniel Zeman (2023): Findings of the Second Shared Task on Multilingual Coreference Resolution. In: Proceedings of the CRAC 2023 Shared Task on Multilingual Coreference Resolution, pp. 1-18, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-02-5 (pdf, local PDF, bibtex)
  4. Rachel Bawden, Ondřej Bojar, Rajen Chatterjee, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Matthias Huck, Rebecca Knowles, Tom Kocmi, Philipp Koehn, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Matteo Negri, Michal Novák, Martin Popel, Maja Popović, Mariya Shmatova, Marco Turchi (2022): Findings of the 2022 Conference on Machine Translation (WMT22). In: Proceedings of the Seventh Conference on Machine Translation, pp. 1-34, Association for Computational Linguistics, Stroudsburg, PA, USA (pdf, local PDF, bibtex)
  5. Pavel Kasík, Jindřich Libovický, Jindřich Helcl, Michal Novák (2022): Český překladač se naučil ukrajinsky rychle. Jen někdy plete jména měst. In: Seznam Zprávy, pp. 1-2 (url, bibtex)
  6. Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Amir Zeldes, Daniel Zeman (2022): CorefUD 1.0: Coreference Meets Universal Dependencies. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pp. 4859-4872, European Language Resources Association, Marseille, France, ISBN 979-10-95546-72-6 (pdf, bibtex)
  7. Zdeněk Žabokrtský, Miloslav Konopík, Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk, Martin Popel, Ondřej Pražák, Jakub Sido, Daniel Zeman, Yilun Zhu (2022): Findings of the Shared Task on Multilingual Coreference Resolution. In: Proceedings of the CRAC 2022 Shared Task on Multilingual Coreference Resolution, pp. 1-17, Association for Computational Linguistics, Gyeongju, Korea (url, local PDF, bibtex)
  8. Josef Jon, Michal Novák, João Paulo de Souza Aires, Dušan Variš, Ondřej Bojar (2021): CUNI systems for WMT21: Multilingual Low-Resource Translation for Indo-European Languages Shared Task. In: Proceedings of the Sixth Conference on Machine Translation, pp. 354-361, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, bibtex)
  9. Josef Jon, Michal Novák, João Paulo de Souza Aires, Dušan Variš, Ondřej Bojar (2021): CUNI systems for WMT21: Terminology translation Shared Task. In: Proceedings of the Sixth Conference on Machine Translation, pp. 828-834, Association for Computational Linguistics, Online, ISBN 978-1-954085-94-7 (url, local PDF, bibtex)
  10. Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Daniel Zeman (2021): Is one head enough? Mention heads in coreference annotations compared with UD-style heads. In: Proceedings of the Sixth International Conference on Dependency Linguistics (Depling, SyntaxFest 2021), pp. 101-114, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-14-8 (pdf, local PDF, bibtex)
  11. Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Daniel Zeman (2021): Coreference meets Universal Dependencies – a pilot experiment on harmonizing coreference datasets for 11 languages (technical report). (pdf, local PDF, bibtex)
  12. Martin Popel, Zdeněk Žabokrtský, Anna Nedoluzhko, Michal Novák, Daniel Zeman (2021): Do UD Trees Match Mention Spans in Coreference Annotations? In: Findings of the Association for Computational Linguistics: EMNLP 2021, pp. 3570-3576, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-10-0 (url, local PDF, bibtex)
  13. Vilém Zouhar, Michal Novák, Matúš Žilinec, Ondřej Bojar, Mateo Obregón, Robin L. Hill, Frédéric Blain, Marina Fomicheva, Lucia Specia, Lisa Yankovskaya (2021): Backtranslation Feedback Improves User Confidence in MT, Not Quality. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 151-161, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-954085-46-6 (url, local PDF, bibtex)
  14. Vilém Zouhar, Michal Novák (2020): Extending Ptakopět for Machine Translation User Interaction Experiments. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 115, pp. 129-142 (pdf, local PDF, bibtex)
  15. Michal Novák, Jiří Mírovský, Kateřina Rysová, Magdaléna Rysová (2019): Exploiting Large Unlabeled Data in Automatic Evaluation of Coherence in Czech. In: Proceedings of the 22nd International Conference on Text, Speech and Dialogue - TSD 2019, Lecture Notes in Computer Science, ISSN 0302-9743, 11697, pp. 197-210, Springer International Publishing, Cham / Heidelberg / New York / Dordrecht / London, ISBN 978-3-030-27946-2 (url, bibtex)
  16. Kateřina Rysová, Magdaléna Rysová, Michal Novák, Jiří Mírovský, Eva Hajičová (2019): EVALD – a Pioneer Application for Automated Essay Scoring in Czech. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 113, pp. 9-30 (url, local PDF, bibtex)
  17. Magdaléna Rysová, Kateřina Rysová, Jiří Mírovský, Michal Novák (2019): Coherence Errors in Learners’ Essays and a Possibility of Their Improvement through EVALD (Automated Evaluator of Discourse). In: Proceedings of the 11th Annual International Conference on Education and New Learning Technologies (EDULEARN 2019), pp. 6761-6768, IATED Academy, Palma, Spain, ISBN 978-84-09-12031-4 (url, bibtex)
  18. Tereza Vojtěchová, Michal Novák, Miloš Klouček, Ondřej Bojar (2019): SAO WMT19 Test Suite: Machine Translation of Audit Reports. In: Fourth Conference on Machine Translation - Proceedings of the Conference, pp. 680-692, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-27-7 (url, bibtex)
  19. Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk (2018): Analysis of coreferential expressions in PAWS. In: Computational Linguistics and Intellectual Technologies, ISSN 2221-7932, vol. 2018, no. 17, 2018, pp. 512-521 (pdf, bibtex)
  20. Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk (2018): PAWS: A Multi-lingual Parallel Treebank with Anaphoric Relations. In: Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference, pp. 68-76, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-13-1 (url, bibtex)
  21. Michal Novák (2018): Coreference from the Cross-lingual Perspective. ISBN 978-80-88132-06-6 (bibtex)
  22. Michal Novák (2018): A Study on Bilingually Informed Coreference Resolution. In: Proceedings of the 18th conference ITAT 2018: Slovenskočeský NLP workshop (SloNLP 2018), pp. 130-137, CreateSpace Independent Publishing Platform, Košice, Slovakia, ISBN 978-1727267198 (pdf, bibtex)
  23. Michal Novák (2018): Coreference from the Cross-lingual Perspective (PhD thesis). (url, local PDF, bibtex)
  24. Michal Novák (2018): A Fine-grained Large-scale Analysis of Coreference Projection. In: Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference, pp. 77-86, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-13-1 (url, bibtex)
  25. Michal Novák, Jiří Mírovský, Kateřina Rysová, Magdaléna Rysová (2018): Topic–Focus Articulation: A Third Pillar of Automatic Evaluation of Text Coherence. In: Advances in Computational Intelligence (LNAI 11289): 17th Mexican International Conference on Artificial Intelligence, MICAI 2018, Proceedings, Part II, pp. 92-105, Springer, Switzerland, ISBN 978-3-030-04497-8 (url, bibtex)
  26. Magdaléna Rysová, Kateřina Rysová, Jiří Mírovský, Michal Novák (2018): Practicing Students’ Writing Skills through eLearning: Automated Evaluation of Text Coherence in Czech. In: EDULEARN18 Proceedings, pp. 1963-1970, IATED Academy, Valencia, Spain, ISBN 978-84-09-02709-5 (url, bibtex)
  27. Michal Novák (2017): Coreference Resolution System Not Only for Czech. In: Proceedings of the 17th conference ITAT 2017: Slovenskočeský NLP workshop (SloNLP 2017), pp. 193-200, CreateSpace Independent Publishing Platform, Praha, Czechia, ISBN 978-1974274741 (pdf, bibtex)
  28. Michal Novák, Anna Nedoluzhko, Zdeněk Žabokrtský (2017): Projection-based Coreference Resolution Using Deep Syntax. In: Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017), pp. 56-64, Association for Computational Linguistics (ACL), Stroudsburg, PA, USA, ISBN 978-1-945626-46-3 (pdf, bibtex)
  29. Michal Novák, Kateřina Rysová, Magdaléna Rysová, Jiří Mírovský (2017): Incorporating Coreference to Automatic Evaluation of Coherence in Essays. In: Statistical Language and Speech Processing, pp. 58-69, Springer International Publishing, Cham, Switzerland, ISBN 978-3-319-68455-0 (pdf, local PDF, bibtex)
  30. Kateřina Rysová, Magdaléna Rysová, Jiří Mírovský, Michal Novák (2017): Introducing EVALD – Software Applications for Automatic Evaluation of Discourse in Czech. In: Proceedings of the International Conference Recent Advances in Natural Language Processing, pp. 634-641, INCOMA Ltd., Šumen, Bulgaria, ISBN 978-954-452-048-9 (pdf, bibtex)
  31. Ondřej Bojar, Ondřej Dušek, Tom Kocmi, Jindřich Libovický, Michal Novák, Martin Popel, Roman Sudarikov, Dušan Variš (2016): CzEng 1.6: Enlarged Czech-English Parallel Corpus with Processing Tools Dockered. In: Text, Speech, and Dialogue: 19th International Conference, TSD 2016, Lecture Notes in Computer Science, ISSN 0302-9743, 9924, pp. 231-238, Springer International Publishing, Cham / Heidelberg / New York / Dordrecht / London, ISBN 978-3-319-45509-9 (url, bibtex)
  32. Anna Nedoluzhko, Michal Novák, Silvie Cinková, Marie Mikulová, Jiří Mírovský (2016): Coreference in Prague Czech-English Dependency Treebank. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pp. 169-176, European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1 (url, local PDF, bibtex)
  33. Anna Nedoluzhko, Anna Schwarz (Khoroshkina), Michal Novák (2016): Possessives in Parallel English-Czech-Russian Texts. In: Computational Linguistics and Intellectual Technologies, ISSN 2221-7932, 15, pp. 483-497 (pdf, local PDF, bibtex)
  34. Michal Novák (2016): Pronoun Prediction with Linguistic Features and Example Weighing. In: Proceedings of the First Conference on Machine Translation (WMT). Volume 2: Shared Task Papers, pp. 602-608, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-945626-10-4 (pdf, bibtex)
  35. Rudolf Rosa, Roman Sudarikov, Michal Novák, Martin Popel, Ondřej Bojar (2016): Dictionary-based Domain Adaptation of MT Systems without Retraining. In: Proceedings of the First Conference on Machine Translation (WMT). Volume 2: Shared Task Papers, pp. 449-455, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-945626-10-4 (pdf, bibtex)
  36. Ondřej Dušek, Luís Gomes, Michal Novák, Martin Popel, Rudolf Rosa (2015): New Language Pairs in TectoMT. In: Proceedings of the 10th Workshop on Machine Translation, pp. 98-104, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-941643-32-7 (pdf, local PDF, bibtex)
  37. Anna Nedoluzhko, Svetlana Toldova, Michal Novák (2015): Coreference chains in Czech, English and Russian: Preliminary findings. In: Computational Linguistics and Intellectual Technologies, ISSN 2221-7932, vol. 14, no. 21, pp. 474-486 (pdf, bibtex)
  38. Michal Novák, Anna Nedoluzhko (2015): Correspondences between Czech and English Coreferential Expressions. In: Discours: Revue de linguistique, psycholinguistique et informatique, ISSN 1963-1723, 16, pp. 1-41 (url, bibtex)
  39. Michal Novák, Dieke Oele, Gertjan van Noord (2015): Comparison of Coreference Resolvers for Deep Syntax Translation. In: Proceedings of the Second Workshop on Discourse in Machine Translation, pp. 17-23, Association for Computational Linguistics, Lisboa, Portugal, ISBN 978-1-941643-32-7 (url, bibtex)
  40. Rudolf Rosa, Ondřej Dušek, Michal Novák, Martin Popel (2015): Translation Model Interpolation for Domain Adaptation in TectoMT. In: Proceedings of the 1st Deep Machine Translation Workshop, pp. 89-96, ÚFAL MFF UK, Praha, Czechia, ISBN 978-80-904571-7-1 (url, local PDF, bibtex)
  41. Ondřej Dušek, Jan Hajič, Jaroslava Hlaváčová, Michal Novák, Pavel Pecina, Rudolf Rosa, Aleš Tamchyna, Zdeňka Urešová, Daniel Zeman (2014): Machine Translation of Medical Texts in the Khresmoi Project. In: Proceedings of the Ninth Workshop on Statistical Machine Translation, pp. 221-228, Association for Computational Linguistics, Baltimore, MD, USA, ISBN 978-1-941643-17-4 (pdf, local PDF, bibtex)
  42. Michal Novák, Zdeněk Žabokrtský (2014): Cross-lingual Coreference Resolution of Pronouns. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 14-24, Dublin City University and Association for Computational Linguistics, Dublin, Ireland, ISBN 978-1-941643-26-6 (pdf, bibtex)
  43. Pavel Pecina, Ondřej Dušek, Lorraine Goeuriot, Jan Hajič, Jaroslava Hlaváčová, Gareth J.F. Jones, Liadh Kelly, Johannes Leveling, David Mareček, Michal Novák, Martin Popel, Rudolf Rosa, Aleš Tamchyna, Zdeňka Urešová (2014): Adaptation of machine translation for multilingual information retrieval in medical domain. In: Artificial Intelligence in Medicine, ISSN 0933-3657, vol. 61, no. 3, pp. 165-185 (url, bibtex)
  44. Niraj Aswani, Thomas Beckers, Erich Birngruber, Célia Boyer, Andreas Burner, Jakub Bystroň, Khalid Choukri, Sarah Cruchet, Hamish Cunningham, Jan Dědek, Ljiljana Dolamic, René Donner, Ondřej Dušek, Sebastian Dungs, Ivan Eggel, Antonio Foncubierta, Norbert Fuhr, Adam Funk, Alba García Seco de Herrera, Arnaud Gaudinat, Georgi Georgiev, Julien Gobeill, Lorraine Goeuriot, Paz Gomez, Mark A. Greenwood, Manfred Gschwandtner, Allan Hanbury, Jan Hajič, Jaroslava Hlaváčová, Markus Holzer, Gareth J.F. Jones, Blanca Jordán, Matthias Jordan, Klemens Kaderk, Franz Kainberger, Liadh Kelly, Sascha Kriewel, Marlene Kritz, Georg Langs, Nolan Lawson, Johannes Leveling, David Mareček, Dimitrios Markonis, Iván Martínez, Vassil Momtchev, Alexandre Masselot, Hélène Mazo, Henning Müller, Michal Novák, Johann Petrak, João Palotti, Pavel Pecina, Konstantin Pentchev, Deyan Peychev, Natalia Pletneva, Martin Popel, Diana Pottecher, Angus Roberts, Rudolf Rosa, Patrick Ruch, Alexander Sachs, Matthias Samwald, Priscille Schneller, Veronika Stefanov, Aleš Tamchyna, Miguel Angel Tinte, Zdeňka Urešová, Alejandro Vargas, Dina Vishnyakova (2013): Khresmoi Professional: Multilingual Semantic Search for Medical Professionals. In: Proceedings of the ACM SIGIR Workshop on Health Search and Discovery: Helping Users and Advancing Medicine, pp. 31-34, Microsoft Research, Cambridge, UK (url, local PDF, bibtex)
  45. Anna Nedoluzhko, Jiří Mírovský, Michal Novák (2013): A Coreferentially annotated Corpus and Anaphora Resolution for Czech. In: Computational Linguistics and Intellectual Technologies, pp. 467-475, ABBYY, Moskva, Russia, ISBN 978-1-937284-58-9 (local PDF, bibtex)
  46. Michal Novák, Anna Nedoluzhko, Zdeněk Žabokrtský (2013): Translation of "It" in a Deep Syntax Framework. In: 51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Workshop on Discourse in Machine Translation, pp. 51-59, Omnipress, Inc., Sofija, Bulgaria, ISBN 978-1-937284-68-8 (pdf, bibtex)
  47. Michal Novák, Zdeněk Žabokrtský, Anna Nedoluzhko (2013): Two Case Studies on Translating Pronouns in a Deep Syntax Framework. In: Proceedings of the 6th International Joint Conference on Natural Language Processing, pp. 1037-1041, Asian Federation of Natural Language Processing, Nagoya, Japan, ISBN 978-4-9907348-0-0 (pdf, bibtex)
  48. Ondřej Bojar, Zdeněk Žabokrtský, Ondřej Dušek, Petra Galuščáková, Martin Majliš, David Mareček, Jiří Maršík, Michal Novák, Martin Popel, Aleš Tamchyna (2012): The Joy of Parallelism with CzEng 1.0. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 3921-3928, European Language Resources Association, İstanbul, Turkey, ISBN 978-2-9517408-7-7 (url, local PDF, bibtex)
  49. Ondřej Dušek, Zdeněk Žabokrtský, Martin Popel, Martin Majliš, Michal Novák, David Mareček (2012): Formemes in English-Czech Deep Syntactic MT. In: Proceedings of the Seventh Workshop on Statistical Machine Translation, pp. 267-274, Association for Computational Linguistics, Montréal, Canada, ISBN 978-1-937284-20-6 (pdf, local PDF, bibtex)
  50. Kateřina Veselovská, Giang Linh Nguy, Michal Novák (2012): Using Czech-English Parallel Corpora in Automatic Identification of It. In: The Fifth Workshop on Building and Using Comparable Corpora, pp. 112-120, European Language Resources Association, İstanbul, Turkey (local PDF, bibtex)
  51. Giang Linh Nguy, Michal Novák, Anna Nedoluzhko (2011): Coreference Resolution in the Prague Dependency Treebank (technical report), pp. 1-66 (pdf, bibtex)
  52. Michal Novák (2011): Utilization of Anaphora in Machine Translation. In: WDS'11 Proceedings of Contributed Papers, Part I, pp. 155-160, Matfyzpress, Praha, Czechia, ISBN 978-80-7378-184-2 (pdf, bibtex)
  53. Michal Novák, Zdeněk Žabokrtský (2011): Resolving Noun Phrase Coreference in Czech. In: Lecture Notes in Computer Science, ISSN 0302-9743, 7099, pp. 24-34 (url, bibtex)
  54. Michal Novák (2010): Machine Learning Approach to Anaphora Resolution (master's thesis). (pdf, bibtex)
  55. Hana Klempová, Michal Novák, Peter Fabian, Jan Ehrenberger, Ondřej Bojar (2009): Získávání paralelních textů z webu. In: Informačné Technológie – Aplikácie a Teória. Zborník príspevkov, ITAT 2009, pp. 47-54, PONT s.r.o., Seňa, Slovakia, ISBN 978-80-970179-1-0 (local PDF, bibtex)