Milan Straka

Office: Room 420
Email: straka@ufal.mff.cuni.cz
Phone: (+420) 95155 4361

Main Research Interests

  • Machine Learning
    • Artificial Neural Networks
    • Deep Learning
    • Structured Prediction
    • Bayesian Nonparametric Modelling and Unsupervised Learning
  • NLP Tools
    • POS Tagging
    • Dependency Parsing
    • Named Entity Recognition and Linking

Selected Bibliography

Papers

  1. Jiří Mayer, Milan Straka, Jan Hajič, jr., Pavel Pecina (2024): Practical End-to-End Optical Music Recognition for Pianoform Music. In: Document Analysis and Recognition - ICDAR 2024, pp. 55-73, Springer International Publishing, Cham, Switzerland, ISBN 978-3-030-86333-3 (url, local PDF, bibtex)
  2. Michal Novák, Barbora Dohnalová, Miloslav Konopík, Anna Nedoluzhko, Martin Popel, Ondřej Pražák, Jakub Sido, Milan Straka, Zdeněk Žabokrtský, Daniel Zeman (2024): Findings of the Third Shared Task on Multilingual Coreference Resolution. In: Proceedings of The Seventh Workshop on Computational Models of Reference, Anaphora and Coreference, pp. 78-96, Association for Computational Linguistics, Kerrville, TX, USA, ISBN 979-8-89176-171-1 (url, local PDF, bibtex)
  3. Milan Straka (2024): CorPipe at CRAC 2024: Predicting Zero Mentions from Raw Text. In: Proceedings of The Seventh Workshop on Computational Models of Reference, Anaphora and Coreference, pp. 97-106, Association for Computational Linguistics, Kerrville, TX, USA, ISBN 979-8-89176-171-1 (url, local PDF, bibtex)
  4. Milan Straka, Jana Straková (2024): Open-Source Web Service with Morphological Dictionary-Supplemented Deep Learning for Morphosyntactic Analysis of Czech. In: 27th International Conference on Text, Speech and Dialogue, pp. 279-290, Springer, Cham, Switzerland, ISBN 978-3-031-70563-2 (url, local PDF, bibtex)
  5. Milan Straka, Jana Straková, Federica Gamba (2024): ÚFAL LatinPipe at EvaLatin 2024: Morphosyntactic Analysis of Latin. In: Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024, pp. 207-214, ELRA and ICCL, Torino, Italia, ISBN 978-2-493814-46-3 (pdf, local PDF, bibtex)
  6. Vojtěch Vančura, Pavel Kordík, Milan Straka (2024): beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems. In: Proceedings of the 18th ACM Conference on Recommender Systems, pp. 1102-1107, Association for Computing Machinery, New York, NY, United States, ISBN 979-8-4007-0505-2 (url, local PDF, bibtex)
  7. Josef Vonášek, Milan Straka, Rostislav Krč, Lenka Lasoňová, Ekaterina Egorova, Jana Straková, Jakub Náplava (2024): CWRCzech: 100M Query-Document Czech Click Dataset and Its Application to Web Relevance Ranking. In: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1221-1231, Association for Computing Machinery, New York, NY, USA, ISBN 9798400704314 (url, local PDF, bibtex)
  8. Ian Roberts, Andres Garcia-Silva, Cristian Berrìo Aroca, José Manuel Gómez-Pérez, Miroslav Jánoší, Dimitris Galanis, Rémi Callizano, Andis Lagzdiņš, Milan Straka, Ulrich Germann (2023): Language Technology Tools and Services. In: European Language Grid: A Language Technology Platform for Multilingual Europe, pp. 131-150, Springer Nature Switzerland AG, Cham, Switzerland, ISBN 978-3-031-17257-1 (url, bibtex)
  9. Milan Straka (2023): ÚFAL CorPipe at CRAC 2023: Larger Context Improves Multilingual Coreference Resolution. In: Proceedings of the CRAC 2023 Shared Task on Multilingual Coreference Resolution, pp. 41-51, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-955917-02-5 (url, local PDF, bibtex)
  10. Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková, Jan Hajič (2022): Quality and Efficiency of Manual Annotation: Pre-annotation Bias. In: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pp. 2909-2918, European Language Resources Association, Marseille, France, ISBN 979-10-95546-72-6 (url, local PDF, bibtex)
  11. Jakub Náplava, Milan Straka, Jana Straková, Alexandr Rosen (2022): Czech Grammar Error Correction with a Large and Diverse Corpus. In: Transactions of the Association for Computational Linguistics, ISSN 2307-387X, 10, pp. 452-467 (url, local PDF, bibtex)
  12. Milan Straka, Jana Straková (2022): ÚFAL CorPipe at CRAC 2022: Effectivity of Multilingual Models for Coreference Resolution. In: Proceedings of the CRAC 2022 Shared Task on Multilingual Coreference Resolution, pp. 28-37, Association for Computational Linguistics, Gyeongju, Korea (url, local PDF, bibtex)
  13. Jakub Náplava, Martin Popel, Milan Straka, Jana Straková (2021): Understanding Model Robustness to User-generated Noisy Texts. In: Proceedings of the 7th Workshop on Noisy User-generated Text (W-NUT 2021), pp. 340-350, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-954085-90-9 (url, local PDF, bibtex)
  14. Jakub Náplava, Milan Straka, Jana Straková (2021): Diacritics Restoration using BERT with Analysis on Czech language. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 116, pp. 27-42 (pdf, local PDF, bibtex)
  15. David Samuel, Milan Straka (2021): ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5. In: Proceedings of the 7th Workshop on Noisy User-generated Text (W-NUT 2021), pp. 483-492, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-954085-90-9 (url, local PDF, bibtex)
  16. Milan Straka, Jakub Náplava, Jana Straková (2021): Character Transformations for Non-Autoregressive GEC Tagging. In: Proceedings of the 7th Workshop on Noisy User-generated Text (W-NUT 2021), pp. 417-422, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-954085-90-9 (url, local PDF, bibtex)
  17. Milan Straka, Jakub Náplava, Jana Straková, David Samuel (2021): RobeCzech: Czech RoBERTa, a Monolingual Contextualized Language Representation Model. In: 24th International Conference on Text, Speech and Dialogue, pp. 197-209, Springer, Cham, Switzerland, ISBN 978-3-030-83526-2 (url, local PDF, bibtex)
  18. Jan Hajič, Eduard Bejček, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková (2020): Prague Dependency Treebank - Consolidated 1.0. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 5208-5218, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4 (url, local PDF, bibtex)
  19. Kateřina Macková, Milan Straka (2020): Reading Comprehension in Czech via Machine Translation and Cross-lingual Transfer. In: 23rd International Conference on Text, Speech and Dialogue, pp. 171-179, Springer, Cham, Switzerland, ISBN 978-3-030-58322-4 (url, local PDF, bibtex)
  20. David Samuel, Milan Straka (2020): ÚFAL at MRP 2020: Permutation-invariant Semantic Parsing in PERIN. In: Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing, pp. 53-64, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-952148-64-4 (url, local PDF, bibtex)
  21. Milan Straka, Jana Straková (2020): UDPipe at EvaLatin 2020: Contextualized Embeddings and Treebank Embeddings. In: Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages, pp. 124-129, European Language Resources Association (ELRA), Marseille, France, ISBN 979-10-95546-53-5 (url, local PDF, bibtex)
  22. Daniel Kondratyuk, Milan Straka (2019): 75 Languages, 1 Model: Parsing Universal Dependencies Universally. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 2779-2795, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-90-1 (url, local PDF, bibtex)
  23. Jakub Náplava, Milan Straka (2019): Grammatical Error Correction in Low-Resource Scenarios. In: Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), pp. 346-356, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-84-0 (url, local PDF, bibtex)
  24. Jakub Náplava, Milan Straka (2019): CUNI System for the Building Educational Applications 2019 Shared Task: Grammatical Error Correction. In: Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pp. 183-190, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-34-5 (url, local PDF, bibtex)
  25. Stephan Oepen, Omri Abend, Jan Hajič, Daniel Hershcovich, Marco Kuhlmann, Nianwen Xue, Jayeol Chun, Milan Straka, Zdeňka Urešová, Tim O'Gorman (2019): MRP 2019: Cross-Framework Meaning Representation Parsing. In: Proceedings of the CoNLL 2019 Shared Task: Cross-Framework Meaning Representation Parsing, pp. 1-27, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-60-4 (url, local PDF, local PDF, bibtex)
  26. Milan Straka, Jana Straková (2019): ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task. In: Proceedings of the CoNLL 2019 Shared Task: Cross-Framework Meaning Representation Parsing, pp. 127-137, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-60-4 (url, local PDF, bibtex)
  27. Milan Straka, Jana Straková, Jan Hajič (2019): Czech Text Processing with Contextual Embeddings: POS Tagging, Lemmatization, Parsing and NER. In: Proceedings of the 22nd International Conference on Text, Speech and Dialogue - TSD 2019, Lecture Notes in Computer Science, ISSN 0302-9743, 11697, pp. 137-150, Springer International Publishing, Cham / Heidelberg / New York / Dordrecht / London, ISBN 978-3-030-27946-2 (url, local PDF, bibtex)
  28. Milan Straka, Jana Straková, Jan Hajič (2019): Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing (Electronic). In: ArXiv.org Computing Research Repository, ISSN 2331-8422, 1904.02099 (url, local PDF)
  29. Milan Straka, Jana Straková, Jan Hajič (2019): UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging. In: Proceedings of the 16th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pp. 95-103, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-36-9 (pdf, local PDF, bibtex)
  30. Jana Straková, Milan Straka, Jan Hajič (2019): Neural Architectures for Nested NER through Linearization. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 5326-5331, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-950737-48-2 (pdf, local PDF, bibtex)
  31. Jana Straková, Milan Straka, Jan Hajič, Martin Popel (2019): Hluboké učení v automatické analýze českého textu. In: Slovo a slovesnost, ISSN 0037-7031, vol. 80, no. 4, pp. 306-327 (bibtex)
  32. Petr Bělohlávek, Ondřej Plátek, Zdeněk Žabokrtský, Milan Straka (2018): Using Adversarial Examples in Natural Language Processing. In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), pp. 3693-3700, European Language Resources Association, Miyazaki, Japan, ISBN 979-10-95546-00-9 (url, local PDF, bibtex)
  33. Daniel Kondratyuk, Tomáš Gavenčiak, Milan Straka, Jan Hajič (2018): LemmaTag: Jointly Tagging and Lemmatizing for Morphologically Rich Languages with BRNNs. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing EMNLP 2018, pp. 4921-4928, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-84-1 (url, local PDF, bibtex)
  34. Jakub Náplava, Milan Straka, Pavel Straňák, Jan Hajič (2018): Diacritics Restoration Using Neural Networks. In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), pp. 1-10, European Language Resources Association, Miyazaki, Japan, ISBN 979-10-95546-00-9 (url, local PDF, bibtex)
  35. Milan Straka (2018): UDPipe 2.0 Prototype at CoNLL 2018 UD Shared Task. In: Proceedings of CoNLL 2018: The SIGNLL Conference on Computational Natural Language Learning, pp. 197-207, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-72-8 (pdf, local PDF, bibtex)
  36. Milan Straka, Nikita Mediankin, Tom Kocmi, Zdeněk Žabokrtský, Vojtěch Hudeček, Jan Hajič (2018): SumeCzech: Large Czech News-Based Summarization Dataset. In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018), pp. 3488-3495, European Language Resources Association, Miyazaki, Japan, ISBN 979-10-95546-00-9 (url, local PDF, bibtex)
  37. Daniel Zeman, Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, Slav Petrov (2018): CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. In: Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 1-21, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-82-7 (pdf, local PDF, bibtex)
  38. Natalia Klyueva, Antoine Doucet, Milan Straka (2017): Neural Networks for Multi-Word Expression Detection. In: Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), pp. 60-65, Association for Computational Linguistics (ACL), Stroudsburg, PA, USA, ISBN 978-1-945626-48-7 (pdf, local PDF, bibtex)
  39. Milan Straka, Jana Straková (2017): Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 88-99, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-945626-70-8 (pdf, local PDF, bibtex)
  40. Milan Straka, Jana Straková, Jan Hajič (2017): Prague at EPE 2017: The UDPipe System. In: Proceedings of the 2017 Shared Task on Extrinsic Parser Evaluation at the Fourth International Conference on Dependency Linguistics and the 15th International Conference on Parsing Technologies, pp. 65-74, Association for Computational Linguistics (ACL), Stroudsburg, PA, USA, ISBN 978-1-945626-74-6 (pdf, local PDF, bibtex)
  41. Jana Straková, Milan Straka, Magda Ševčíková, Zdeněk Žabokrtský (2017): Czech Named Entity Corpus. In: Handbook of Linguistic Annotation, pp. 855-873, Springer Netherlands, Netherlands, ISBN 978-94-024-0879-9 (bibtex)
  42. Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gökırmak, Anna Nedoluzhko, Silvie Cinková, Jan Hajič, jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria de Paiva, Kira Droganova, Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonça, Tatiana Lando, Rattima Nitisaroj, Josie Li (2017): CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 1-19, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-945626-70-8 (pdf, local PDF, bibtex)
  43. Milan Straka, Jan Hajič, Jana Straková (2016): UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pp. 4290-4297, European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1 (pdf, local PDF, bibtex)
  44. Jana Straková, Milan Straka, Jan Hajič (2016): Neural Networks for Featureless Named Entity Recognition in Czech. In: Text, Speech, and Dialogue: 19th International Conference, TSD 2016, Lecture Notes in Computer Science, ISSN 0302-9743, 9924, pp. 173-181, Springer International Publishing, Cham / Heidelberg / New York / Dordrecht / London, ISBN 978-3-319-45509-9 (url, local PDF, bibtex)
  45. Magda Ševčíková, Zdeněk Žabokrtský, Jonáš Vidra, Milan Straka (2016): Lexikální síť DeriNet: elektronický zdroj pro výzkum derivace v češtině. In: Časopis pro moderní filologii, ISSN 0008-7386, vol. 98, no. 1, pp. 62-76 (bibtex)
  46. Zdeněk Žabokrtský, Magda Ševčíková, Milan Straka, Jonáš Vidra, Adéla Limburská (2016): Merging Data Resources for Inflectional and Derivational Morphology in Czech. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), pp. 1307-1314, European Language Resources Association, Paris, France, ISBN 978-2-9517408-9-1 (pdf, local PDF, bibtex)
  47. Milan Straka, Jan Hajič, Jana Straková, Jan Hajič, jr. (2015): Parsing Universal Dependency Treebanks using Neural Networks and Search-Based Oracle. In: 14th International Workshop on Treebanks and Linguistic Theories (TLT 2015), pp. 208-220, IPIPAN, Warszawa, Poland, ISBN 978-83-63159-18-4 (pdf, local PDF, bibtex)
  48. Jana Straková, Milan Straka, Jan Hajič (2014): Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 13-18, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-941643-00-6 (pdf, local PDF, bibtex)
  49. David Mareček, Milan Straka (2013): Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 281-290, Association for Computational Linguistics, Sofia, Bulgaria, ISBN 978-1-937284-50-3 (pdf, local PDF, bibtex)
  50. Jana Straková, Milan Straka, Jan Hajič (2013): A New State-of-The-Art Czech Named Entity Recognizer. In: Text, Speech and Dialogue: 16th International Conference, TSD 2013. Proceedings, Lecture Notes in Computer Science, ISSN 0302-9743, 8082, pp. 68-75, Springer Verlag, Berlin / Heidelberg, ISBN 978-3-642-40584-6 (url, local PDF, bibtex)
  1. Milan Straka (2011): Adams’ Trees Revisited – Correct and Efficient Implementation. In: Proceedings of TFP 2011, Symposium on Trends in Functional Programming, Madrid, Spain, May 2011 (local PDF)
  2. Milan Straka (2010): The performance of the Haskell containers package. In: Proceedings of Haskell 2010, 3rd ACM Haskell symposium on Haskell, Baltimore, Maryland, September 2010 (local PDF)
  3. Milan Straka (2009): Optimal worst-case fully persistent arrays. In: TFP 2009, Symposium on Trends in Functional Programming, Komarno, Slovakia, June 2009 (local PDF)
  4. Martin Mareš and Milan Straka (2007): Linear-Time Ranking of Permutations. In: Proceedings of ESA 2007, 15th Annual European Symposium, Eilat, Israel, October 2007 (local PDF)

Theses