Daniel Zeman

office: 409
email: zeman@ufal.mff.cuni.cz
phone: +420 951 554 225
address: Malostranské náměstí 25
118 00 Praha 1
Czech Republic

Main Research Interests

My research in computational linguistics is mostly centered around morphological analysis and dependency syntax of natural languages. I am interested in computational models of morphology and syntax, including preparation of annotated data to train such models.

I am particularly interested in multilingual approaches that work for many typologically different languages, cross-lingual techniques that support disadvantaged languages with few resources, and harmonized data resources that support multilingual processing. Besides training models, I also look into how these datasets can be used in comparative linguistics and typology.

Beyond surface syntax, I am working on deep syntactic relations, semantic roles and coreference. In the past, I also worked on statistical machine translation for morphologically rich languages.

Selected Bibliography

Google Scholar
ORCID: 0000-0002-5791-6568
Scopus ID: 23092520600
Researcher ID: B-2844-2009

Marie-Catherine de Marneffe, Christopher Manning, Joakim Nivre, Daniel Zeman (2021): Universal Dependencies. In: Computational Linguistics, ISSN 1530-9312, vol. 47, no. 2, pp. 255-308 (url, local PDF, bibtex)

Daniel Zeman, Jan Hajič (2020): FGD at MRP 2020: Prague Tectogrammatical Graphs. In: Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing, pp. 33-39, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-952148-64-4 (url, local PDF, bibtex)

Zdeněk Žabokrtský, Daniel Zeman, Magda Ševčíková (2020): Sentence Meaning Representations across Languages: What Can We Learn from Existing Frameworks? In: Computational Linguistics, ISSN 1530-9312, vol. 46, no. 3, pp. 605-665 (url, local PDF, bibtex)

Daniel Zeman (2018): The World of Tokens, Tags and Trees. ISBN 978-80-88132-09-7 (url, bibtex)

Daniel Zeman, Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, Slav Petrov (2018): CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. In: Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 1-21, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-82-7 (pdf, local PDF, bibtex)

Daniel Zeman, Ondřej Dušek, David Mareček, Martin Popel, Loganathan Ramasamy, Jan Štěpánek, Zdeněk Žabokrtský, Jan Hajič (2014): HamleDT: Harmonized Multi-Language Dependency Treebank. In: Language Resources and Evaluation, ISSN 1574-020X, vol. 48, no. 4, pp. 601-637 (url, local PDF, bibtex)

Martin Popel, David Mareček, Jan Štěpánek, Daniel Zeman, Zdeněk Žabokrtský (2013): Coordination Structures in Dependency Treebanks. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 517-527, Association for Computational Linguistics, Sofia, Bulgaria, ISBN 978-1-937284-50-3 (pdf, local PDF, poster, slides, bibtex)

Daniel Zeman (2008): Reusable Tagset Conversion Using Tagset Drivers. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), pp. 213-218, European Language Resources Association, Marrakech, Morocco, ISBN 2-9517408-4-0 (url, poster, local PDF, bibtex)

Daniel Zeman, Philip Resnik (2008): Cross-Language Parser Adaptation between Related Languages. In: IJCNLP 2008 Workshop on NLP for Less Privileged Languages, pp. 35-42, International Institute of Information Technology, Hyderabad, India (url, local PDF, slides, bibtex)

See here for the complete list of publications.

Teaching

Projects

HiČKoK (2023 – 2026)
UniDive (2022 – 2026) – COST Action Vice-Chair
CorefUD (since 2021)
LUSyD (2020 – 2024)
Deep Universal Dependencies (since 2017) – PI
Universal Dependencies (since 2014) – core group member
LINDAT/CLARIN/CLARIAH (2010 – 2026)
Interset (since 2006) – PI

HimL (2015 – 2018)
Manyla (2015 – 2017) – PI (GAČR grant)
Deltacorpus (2016) – Co-PI
QTLEAP (2013 – 2016)
HamleDT (2012 – 2015) – Co-PI
KHRESMOI (2010 – 2014)
CzechMate (2011 – 2013) – PI (GAČR grant)
MUSSLAP (2004 – 2008) – Co-PI (project led by University of West Bohemia)
Czech Parsing (1994 – 2005)
CKL (2000 – 2004)
Older projects (in Czech)

Supervised Theses

Graduated Doctoral Students

Diego Alves (2023; co-supervised at Sveučilište u Zagrebu): A Computational Typological Analysis of Syntactic Structures in European Languages

Graduated Master Students

Ján Faryad (2024)
Christian Cayralat (كريستيان خيرالله) (2021; Univerzita Karlova & Universität des Saarlandes)
Akshay Aggarwal (अक्षय अग्रवाल) (2020; Univerzita Karlova & Euskal Herriko Unibertsitatea)
Ronald Cardenas (2020; Univerzita Karlova & L-Università ta' Malta)
Ọlájídé Ishola (2019; co-supervised at Universität Tübingen)
Adédayọ̀ Olúòkun (2018)
Vinit Ravishankar (विनीत रविशंकर) (2018; Univerzita Karlova & L-Università ta' Malta)
Joachim Daiber (2013; Univerzita Karlova & Rijksuniversiteit Groningen)
Sibel Ciddi (2013; Univerzita Karlova & Universität des Saarlandes)
Manh-Ke Tran (Trần Mạnh Kế) (2012; Univerzita Karlova & Rijksuniversiteit Groningen)
Pranava Swaroop Madhyastha (ಪ್ರನವ) (2012; Univerzita Karlova & L-Università ta' Malta)
Angelina Ivanova (Ангелина Иванова) (2011; Univerzita Karlova & Freie Universität Bozen)
Bushra Jawaid (بشری جاوید) (2010; Univerzita Karlova & L-Università ta' Malta)

Graduated Bachelor Students

Katarína Dančejová (2021)
Tereza Storzerová (2019)
Ondřej Hálek (2010; FJFI ČVUT)
Martin Žember (2007)
David Mareček (2006)

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics