HamleDT (HArmonized Multi-LanguagE Dependency Treebank) is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style.

HamleDT currently integrates 30 treebanks. Depending on their license terms and your preferences, there are several ways to obtain the data: some treebanks can be downloaded directly from us, while the rest must be obtained from their original sources and then transformed using our open-source software.

The initial version of HamleDT was described in Zeman et al. (2012): HamleDT: To Parse or Not to Parse?

Please direct any questions or comments to the following e-mail address: zeman@ufal.mff.cuni.cz.

Treebank samples: Arabic, Czech, Modern Greek, Finnish, Latin, Russian.



How to cite

If you make use of HamleDT, please cite the following paper:

  author = {Daniel Zeman and David Mareček and Martin Popel and Loganathan Ramasamy and Jan Štěpánek and Zdeněk Žabokrtský and Jan Hajič},
  title = {HamleDT: To Parse or Not to Parse?},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {May},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {English},
  url = {http://www.lrec-conf.org/proceedings/lrec2012/pdf/429_Paper.pdf},