Released 2015-08-18: HamleDT 3.0

HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style.

HamleDT currently integrates 42 treebanks. Depending on their license terms and your preferences, there are several ways to obtain the data: some treebanks can be downloaded directly from us, while the rest must be obtained from their original sources and then transformed using our open-source software.

The initial version of HamleDT was described in Zeman et al. (2012), "HamleDT: To Parse or Not to Parse?"

Please direct any questions or comments to the following e-mail address: zeman@ufal.mff.cuni.cz.

Treebank samples: Slovene, German, Spanish, Ancient Greek, Persian, Hindi

cs pl sk sl hr bg ru en de nl da sv ga pt es ca fr it ro la el grc fa hi bn ar he eu fi et hu tr ta te ja id


How to cite

If you make use of HamleDT, please cite the following paper:

@article{ biblio:ZeDuHamleDTHarmonized2014,
journal = {Language Resources and Evaluation},
title = {Hamle{DT}: Harmonized Multi-Language Dependency Treebank},
author = {Daniel Zeman and Ond{\v{r}}ej Du{\v{s}}ek and David Mare{\v{c}}ek and Martin Popel and Loganathan Ramasamy and Jan {\v{S}}t{\v{e}}p{\'{a}}nek and Zden{\v{e}}k {\v{Z}}abokrtsk{\'{y}} and Jan Haji{\v{c}}},
year = {2014},
address = {Dordrecht, Netherlands},
volume = {48},
number = {4},
pages = {601--637},
issn = {1574-020X},