HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style.

HamleDT currently integrates 30 treebanks. How you obtain the data depends on each treebank's license terms and on your preferences: some treebanks can be downloaded directly from us, while the rest must be obtained from their original sources and then transformed using our open-source software.

The initial version of HamleDT was described in Zeman et al. (2012): HamleDT: To Parse or Not to Parse?

Please direct any questions or comments to the following e-mail address: zeman@ufal.mff.cuni.cz.

Treebank samples: Arabic, Czech, Modern Greek, Finnish, Latin, Russian.



How to cite

If you make use of HamleDT, please cite the following paper:

@article{ biblio:ZeDuHamleDTHarmonized2014,
journal = {Language Resources and Evaluation},
title = {Hamle{DT}: Harmonized Multi-Language Dependency Treebank},
author = {Daniel Zeman and Ond{\v{r}}ej Du{\v{s}}ek and David Mare{\v{c}}ek and Martin Popel and Loganathan Ramasamy and Jan {\v{S}}t{\v{e}}p{\'{a}}nek and Zden{\v{e}}k {\v{Z}}abokrtsk{\'{y}} and Jan Haji{\v{c}}},
year = {2014},
address = {Dordrecht, Netherlands},
volume = {48},
number = {4},
pages = {601--637},
issn = {1574-020X},