Daniel Zeman
Main Research Interests
My research in computational linguistics centers mainly on morphological analysis and dependency syntax of natural languages. I am interested in computational models of morphology and syntax, including the preparation of annotated data to train such models.
I am particularly interested in multilingual approaches that work for many typologically different languages, cross-lingual techniques that support disadvantaged languages with few resources, and harmonized data resources that support multilingual processing. Besides training models, I also look into how these datasets can be used in comparative linguistics and typology.
Beyond surface syntax, I work on deep syntactic relations, semantic roles, and coreference. In the past, I also worked on statistical machine translation for morphologically rich languages.
Selected Bibliography
- Google Scholar
- ORCID: 0000-0002-5791-6568
- Scopus ID: 23092520600
- Researcher ID: B-2844-2009
- Universal Dependencies. In: Computational Linguistics, ISSN 1530-9312, vol. 47, no. 2, pp. 255-308 (url, local PDF, bibtex)
- FGD at MRP 2020: Prague Tectogrammatical Graphs. In: Proceedings of the CoNLL 2020 Shared Task: Cross-Framework Meaning Representation Parsing, pp. 33-39, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-952148-64-4 (url, local PDF, bibtex)
- Sentence Meaning Representations across Languages: What Can We Learn from Existing Frameworks? In: Computational Linguistics, ISSN 1530-9312, vol. 46, no. 3, pp. 605-665 (url, local PDF, bibtex)
- The World of Tokens, Tags and Trees. ISBN 978-80-88132-09-7 (url, bibtex)
- CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. In: Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pp. 1-21, Association for Computational Linguistics, Stroudsburg, PA, USA, ISBN 978-1-948087-82-7 (pdf, local PDF, bibtex)
- HamleDT: Harmonized Multi-Language Dependency Treebank. In: Language Resources and Evaluation, ISSN 1574-020X, vol. 48, no. 4, pp. 601-637 (url, local PDF, bibtex)
- Coordination Structures in Dependency Treebanks. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 517-527, Association for Computational Linguistics, Sofia, Bulgaria, ISBN 978-1-937284-50-3 (pdf, local PDF, poster, slides, bibtex)
- Reusable Tagset Conversion Using Tagset Drivers. In: Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), pp. 213-218, European Language Resources Association, Marrakech, Morocco, ISBN 2-9517408-4-0 (url, poster, local PDF, bibtex)
- Cross-Language Parser Adaptation between Related Languages. In: IJCNLP 2008 Workshop on NLP for Less Privileged Languages, pp. 35-42, International Institute of Information Technology, Hyderabad, India (url, local PDF, slides, bibtex)
See here for the complete list of publications.
Teaching
- NPFL094 Computational Morphology and Syntax
- NPFL075 Dependency Grammars and Treebanks
- NPFL120 Multilingual Natural Language Processing
- NPFL124 Natural Language Processing (bachelor-level introduction)
- NPRG045 Software project topics / Náměty na ročníkové projekty (in Czech)
Projects
- HiČKoK (2023 – 2026)
- UniDive (2022 – 2026) – COST Action Vice-Chair
- CorefUD (since 2021)
- LUSyD (2020 – 2024)
- Deep Universal Dependencies (since 2017) – PI
- Universal Dependencies (since 2014) – core group member
- LINDAT/CLARIN/CLARIAH (2010 – 2026)
- Interset (since 2006) – PI
- HimL (2015 – 2018)
- Manyla (2015 – 2017) – PI (GAČR grant)
- Deltacorpus (2016) – Co-PI
- QTLEAP (2013 – 2016)
- HamleDT (2012 – 2015) – Co-PI
- KHRESMOI (2010 – 2014)
- CzechMate (2011 – 2013) – PI (GAČR grant)
- MUSSLAP (2004 – 2008) – Co-PI (project led by University of West Bohemia)
- Czech Parsing (1994 – 2005)
- CKL (2000 – 2004)
- Older projects (in Czech)
Supervised Theses
Graduated Doctoral Students
- Diego Alves (2023; co-supervised at Sveučilište u Zagrebu): A Computational Typological Analysis of Syntactic Structures in European Languages
Graduated Master Students
- Ján Faryad (2024)
- Christian Cayralat (كريستيان خيرالله) (2021; Univerzita Karlova & Universität des Saarlandes)
- Akshay Aggarwal (अक्षय अग्रवाल) (2020; Univerzita Karlova & Euskal Herriko Unibertsitatea)
- Ronald Cardenas (2020; Univerzita Karlova & L-Università ta' Malta)
- Ọlájídé Ishola (2019; co-supervised at Universität Tübingen)
- Adédayọ̀ Olúòkun (2018)
- Vinit Ravishankar (विनीत रविशंकर) (2018; Univerzita Karlova & L-Università ta' Malta)
- Joachim Daiber (2013; Univerzita Karlova & Rijksuniversiteit Groningen)
- Sibel Ciddi (2013; Univerzita Karlova & Universität des Saarlandes)
- Manh-Ke Tran (Trần Mạnh Kế) (2012; Univerzita Karlova & Rijksuniversiteit Groningen)
- Pranava Swaroop Madhyastha (ಪ್ರನವ) (2012; Univerzita Karlova & L-Università ta' Malta)
- Angelina Ivanova (Ангелина Иванова) (2011; Univerzita Karlova & Freie Universität Bozen)
- Bushra Jawaid (بشری جاوید) (2010; Univerzita Karlova & L-Università ta' Malta)
Graduated Bachelor Students
- Katarína Dančejová (2021)
- Tereza Storzerová (2019)
- Ondřej Hálek (2010; FJFI ČVUT)
- Martin Žember (2007)
- David Mareček (2006)