David Mareček

office
N 235
email
david.marecek@mff.cuni.cz
address
IMPAKT – „N“
V Holešovičkách 747/2
180 00 Praha 8
Czech Republic

Main Research Interests

interpretation of deep neural networks, machine learning, dependency syntax, unsupervised and semi-supervised parsing, machine translation

Projects

Curriculum Vitae

Curriculum vitae (PDF)

Teaching

NPFL097 - Unsupervised Machine Learning in NLP (lectures)

NTIN066 - Data Structures 1 (practicals)

Selected Bibliography

list of all publications

Selected Presentations

  • 2020-11-30  David Mareček, Jindřich Libovický, Rudolf Rosa, Tomáš Musil, Tomasz Limisiewicz: Hidden in the Layers: Interpretation of Neural Networks for Natural Language Processing. Monday seminar, ÚFAL MFF UK, Czechia. [slides]
  • 2019-09-12  David Mareček: Searching for Linguistic Structure in Neural Networks. Talk at the Research Seminar in Language Technology, University of Helsinki, Finland. [slides]
  • 2018-06-18  David Mareček: Exploring linguistic structure in self-attentions of Neural Machine Translation. Talk at the CoAStaL research group, University of Copenhagen, Denmark. [slides]
  • 2012-04-02  David Mareček: Unsupervised Dependency Parsing. Monday seminar, ÚFAL MFF UK, Czechia. [slides]
  • 2010-10-18  David Mareček: Dependency tree projection across parallel texts. FEAST meeting, Saarland University, Saarbrücken, Germany. [slides]

Books

  • David Mareček, Jindřich Libovický, Tomáš Musil, Rudolf Rosa, Tomasz Limisiewicz (2020): Hidden in the Layers: Interpretation of Neural Networks for Natural Language Processing. In: Studies in Computational and Theoretical Linguistics, ISBN 978-80-88132-10-3, Volume 20, Institute of Formal and Applied Linguistics, 2020 [pdf]

Theses

  • Doctoral thesis (2012): Unsupervised Dependency Parsing [pdf] [slides]
  • Master's thesis (2008): Automatic Alignment of Tectogrammatical Trees from Czech-English Parallel Corpus [pdf] [slides]
  • Bachelor's thesis (2006): Novelizátor zákonů (in Czech) [pdf]

Selected Papers

  • 2018: David Mareček, Rudolf Rosa: Extracting Syntactic Trees from Transformer Encoder Self-Attentions. In: Proceedings of the First Workshop on Analyzing and Interpreting Neural Networks for NLP, pp. 347-349, Association for Computational Linguistics, Brussels, Belgium, ISBN 978-1-948087-71-1
  • 2017: Bedřich Pišl, David Mareček: Communication with Robots using Multilayer Recurrent Networks. In: Proceedings of the First Workshop on Language Grounding for Robotics, pp. 44-48, Association for Computational Linguistics, Vancouver, Canada, ISBN 978-1-945626-64-7
  • 2016: David Mareček: Delexicalized and Minimally Supervised Parsing on Universal Dependencies. In: Statistical Language and Speech Processing, pp. 30-42, Springer International Publishing, Cham, Switzerland, ISBN 978-3-319-45924-0
  • 2015: David Mareček: Multilingual Unsupervised Dependency Parsing with Unsupervised POS tags. In: MICAI 2015: Advances in Artificial Intelligence and Soft Computing, Part I, pp. 72-82, Springer, Berlin / Heidelberg, ISBN 978-3-319-27059-3
  • 2014: Pavel Pecina, Ondřej Dušek, Lorraine Goeuriot, Jan Hajič, Jaroslava Hlaváčová, Gareth J.F. Jones, Liadh Kelly, Johannes Leveling, David Mareček, Michal Novák, Martin Popel, Rudolf Rosa, Aleš Tamchyna, Zdeňka Urešová: Adaptation of machine translation for multilingual information retrieval in medical domain. In: Artificial Intelligence in Medicine, ISSN 0933-3657, vol. 61, no. 3, pp. 165-185
  • 2014: Daniel Zeman, Ondřej Dušek, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský, Jan Hajič: HamleDT: Harmonized Multi-Language Dependency Treebank. In: Language Resources and Evaluation, ISSN 1574-020X, vol. 48, no. 4, pp. 601-637
  • 2013: David Mareček, Milan Straka: Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 281-290, Association for Computational Linguistics, Sofia, Bulgaria, ISBN 978-1-937284-50-3
  • 2013: Martin Popel, David Mareček, Jan Štěpánek, Daniel Zeman, Zdeněk Žabokrtský: Coordination Structures in Dependency Treebanks. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 517-527, Association for Computational Linguistics, Sofia, Bulgaria, ISBN 978-1-937284-50-3
  • 2012: David Mareček, Zdeněk Žabokrtský: Exploiting Reducibility in Unsupervised Dependency Parsing. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 297-307, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-937284-43-5

2015

  • David Mareček: Multilingual Unsupervised Dependency Parsing with Unsupervised POS tags. In: MICAI 2015: Advances in Artificial Intelligence and Soft Computing, Part I, pp. 72-82, Springer, Berlin/Heidelberg, ISBN 978-3-319-27059-3

2014

  • Daniel Zeman, Ondřej Dušek, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský, and Jan Hajič: HamleDT: Harmonized Multi-Language Dependency Treebank. In: Language Resources and Evaluation, ISSN 1574-020X, vol. 48, no. 4, pp. 601-637, 2014
  • Pavel Pecina, Ondřej Dušek, Lorraine Goeuriot, Jan Hajič, Jaroslava Hlaváčová, Gareth J.F. Jones, Liadh Kelly, Johannes Leveling, David Mareček, Michal Novák, Martin Popel, Rudolf Rosa, Aleš Tamchyna, Zdeňka Urešová: Adaptation of machine translation for multilingual information retrieval in medical domain. In: Artificial Intelligence in Medicine, ISSN 0933-3657, vol. 61, no. 3, pp. 165-185, 2014

2013

  • David Mareček and Milan Straka: Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing. In Annual Meeting of the Association for Computational Linguistics (ACL'13), Sofia, Bulgaria, August 2013 [pdf]
  • Martin Popel, David Mareček, Jan Štěpánek, Daniel Zeman, and Zdeněk Žabokrtský: Coordination Structures in Dependency Treebanks. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 517-527, Association for Computational Linguistics, Sofia, Bulgaria, August 2013
  • Rudolf Rosa, David Mareček, and Aleš Tamchyna: Deepfix: Statistical Post-editing of Statistical Machine Translation Using Deep Syntactic Analysis. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Student Research Workshop, pp. 172-179, Association for Computational Linguistics, Sofia, Bulgaria, August 2013

2012

  • Rudolf Rosa and David Mareček: Dependency Relations Labeller for Czech. In Text, Speech and Dialogue, Lecture Notes in Computer Science, Volume 7499, pp. 256-263, Springer-Verlag Berlin/Heidelberg, 2012 [pdf]
  • David Mareček and Zdeněk Žabokrtský: Exploiting Reducibility in Unsupervised Dependency Parsing. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 297-307, Jeju Island, Korea, July, 2012 [pdf] [bib] [slides]
  • Rudolf Rosa, Ondřej Dušek, David Mareček, and Martin Popel: Using Parallel Features in Parsing of Machine-Translated Sentences for Correction of Grammatical Errors. In Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 39-48, Jeju Island, Korea, 2012 [pdf] [bib]
  • Rudolf Rosa, David Mareček, and Ondřej Dušek: DEPFIX: A System for Automatic Correction of Czech MT Outputs. In Proceedings of the Seventh Workshop on Statistical Machine Translation, Association for Computational Linguistics, pages 362-368, Montreal, Canada, June 7-8, 2012 [pdf] [bib]
  • Ondřej Dušek, Zdeněk Žabokrtský, Martin Popel, Martin Majliš, Michal Novák, and David Mareček: Formemes in English-Czech Deep Syntactic MT. In Proceedings of the Seventh Workshop on Statistical Machine Translation, Association for Computational Linguistics, pages 267-274, Montreal, Canada, June 7-8, 2012 [pdf]
  • David Mareček, Zdeněk Žabokrtský: Unsupervised Dependency Parsing using Reducibility and Fertility features. In Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure, pages 84–89, Montréal, Canada, June 7, 2012 [pdf]
  • Daniel Zeman, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský and Jan Hajič: HamleDT: To Parse or Not to Parse? In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pp. 2735-2741, Istanbul, Turkey, 2012 [pdf]
  • Ondřej Bojar, Zdeněk Žabokrtský, Ondřej Dušek, Petra Galuščáková, Martin Majliš, David Mareček, Jiří Maršík, Michal Novák, Martin Popel and Aleš Tamchyna: The Joy of Parallelism with CzEng 1.0. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pp. 3921-3928, Istanbul, Turkey, 2012 [pdf]

2011

  • David Mareček and Zdeněk Žabokrtský: Gibbs Sampling with Treeness constraint in Unsupervised Dependency Parsing. In Proceedings of RANLP Workshop on Robust Unsupervised and Semisupervised Methods in Natural Language Processing, pp. 1–8, Hissar, Bulgaria, 2011 [pdf] [slides PPT]
  • Martin Popel, David Mareček, Nathan Green and Zdeněk Žabokrtský: Influence of Parser Choice on Dependency-Based MT. In Proceedings of WMT 2011, EMNLP 6th Workshop on Statistical Machine Translation, Edinburgh, UK, pp. 433–439, 2011 [pdf] [poster JPG]
  • David Mareček, Rudolf Rosa, Petra Galuščáková and Ondřej Bojar: Two-step translation with grammatical post-processing. In Proceedings of WMT 2011, EMNLP 6th Workshop on Statistical Machine Translation, Edinburgh, UK, pp. 426–432, 2011 [pdf] [poster PDF]
  • David Mareček: Combining Diverse Word-Alignment Symmetrizations Improves Dependency Tree Projection. In Computational Linguistics and Intelligent Text Processing, Lecture Notes in Computer Science Volume 6608/2011, pages 144-154, DOI: 10.1007/978-3-642-19400-9_12, Springer Berlin/Heidelberg, 2011 [pdf] [slides PPT]

2010

  • Natalia Klyueva, David Mareček: Towards Parallel Czech-Russian Dependency Treebank. In Proceedings of AEPC 2010: Workshop on Annotation and Exploitation of Parallel Corpora, Tartu, Estonia, 2010 [pdf]
  • Martin Popel, David Mareček: Perplexity of n-gram and Dependency Language Models. In Proceedings of TSD 2010, 13th International Conference on Text, Speech and Dialogue, Brno, Czech Republic, 2010 [pdf]
  • Ondřej Bojar, Kamil Kos, David Mareček: Tackling Sparse Data Issue in Machine Translation Evaluation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 2010 [pdf]
  • David Mareček, Martin Popel, Zdeněk Žabokrtský: Maximum Entropy Translation Model in Dependency-Based MT Framework. In Proceedings of the Fifth Workshop on Statistical Machine Translation, Uppsala, Sweden, 2010 [pdf]

2009

  • David Mareček, Natalia Kljueva: Converting Russian Treebank SynTagRus into Praguian PDT Style. In Proceedings of Multilingual resources, technologies and evaluation for Central and Eastern European languages, Borovets, Bulgaria, 2009 [pdf] [slides] [poster]
  • David Mareček: Improving Word Alignment Using Alignment of Deep Structures. In Proceedings of The 12th International Conference TSD 2009, Plzeň, Czech Republic, 2009 [pdf] [slides]
  • David Mareček: Using Tectogrammatical Alignment in Phrase-Based Machine Translation. WDS'09 Proceedings of Contributed Papers, MFF UK, Prague, 2009 [pdf] [slides]
  • Ondřej Bojar, David Mareček, Václav Novák, Martin Popel, Jan Ptáček, Jan Rouš, Zdeněk Žabokrtský: English-Czech MT in 2008. In Proceedings of the Fourth Workshop on Statistical Machine Translation, Athens, Greece. Association for Computational Linguistics, 2009 [pdf]

2008

  • David Mareček, Zdeněk Žabokrtský, Václav Novák: Automatic Alignment of Czech and English Deep Syntactic Dependency Trees. In Proceedings of EAMT08, Hamburg, Germany, 2008 [pdf] [poster]
     
  • Petr Pajas, David Mareček: MEd - an editor of interlinked multi-layered linearly-structured linguistic annotations, UK MFF UFAL, 2007

Students

Doctoral students:

  • Tomasz Limisiewicz - Exploring multilingual representations of language units in a shared vector space
  • Tomáš Musil - Exploring natural language principles with respect to algorithms of deep neural networks

Master students:

Bachelor students: