Introduction

What is Treex?

Treex (formerly TectoMT) was a highly modular NLP software system implemented in the Perl programming language under Linux.

NOTE: Treex is no longer developed (since ca. 2017), and there are several better alternatives for most (though not all) of its original use cases:

  • Udapi - a similar general framework, but much faster, UD-compliant and with a Python API (in addition to the Perl one)
  • UDPipe - UD-style tokenization, tagging, lemmatization and parsing for many languages, available both as a tool and as an online service
  • MorphoDiTa - PDT-style morphological analysis
  • Parsito - PDT-style dependency parsing
  • Transformer - neural machine translation.

 

Treex is primarily aimed at machine translation, making use of the ideas and technology created during the Prague Dependency Treebank project. It is also intended to facilitate and accelerate the development of software for many other NLP tasks, especially through the reusability of its numerous integrated processing modules (called blocks), which share uniform object-oriented interfaces.
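The idea of reusable blocks behind a uniform interface can be illustrated with a minimal sketch. This is not Treex code and not its API: the classes Document, Block, WhitespaceTokenizer and Lowercaser, and the function run_scenario, are hypothetical names invented for illustration, written in Python rather than Perl. It only shows how modules that share one processing method can be chained in any order.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Toy stand-in for the data passed between blocks (hypothetical)."""
    text: str
    tokens: list = field(default_factory=list)


class Block:
    """Hypothetical uniform interface: every block edits a Document in place."""

    def process_document(self, doc: Document) -> None:
        raise NotImplementedError


class WhitespaceTokenizer(Block):
    """Splits the raw text on whitespace and stores the tokens."""

    def process_document(self, doc: Document) -> None:
        doc.tokens = doc.text.split()


class Lowercaser(Block):
    """Lowercases whatever tokens an earlier block produced."""

    def process_document(self, doc: Document) -> None:
        doc.tokens = [token.lower() for token in doc.tokens]


def run_scenario(blocks, doc):
    """Apply the blocks in order; each one sees the previous one's output."""
    for block in blocks:
        block.process_document(doc)
    return doc


doc = run_scenario([WhitespaceTokenizer(), Lowercaser()], Document("Treex Is Modular"))
print(doc.tokens)  # ['treex', 'is', 'modular']
```

Because every block exposes the same method, a new processing step can be added to the list without changing the code that runs the sequence.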

Online web interface

If you want to try Treex, we recommend starting with Treex::Web.

CPAN releases

  • Treex-Core
  • Treex-Unilang
  • Treex-EN
  • Treex-Doc

Treex on GitHub

https://github.com/ufal/treex

Treex from Docker

https://hub.docker.com/r/ufal/treex/

Applications

NLP Applications developed within the Treex NLP framework:

  • TectoMT – English-to-Czech machine translation based on tectogrammatics
  • Depfix – a system for rule-based correction of English-Czech statistical MT outputs
  • HamleDT – harmonized dependency treebanks of 28 languages

Tutorial

  1. Installation Guide
  2. First Steps
  3. Cheat Sheet

There is a special version of the tutorial for computer labs at Malá Strana and MT-Marathon 2013.  

Acknowledgement

Work on this framework was supported by the grants FP7-ICT-2013-10-610516 (QTLeap), FP7-ICT-2007-3-231720 (EuroMatrix Plus), MSM 0021620838 (Moderní metody, struktury a systémy informatiky), LC536 (Centrum komputační lingvistiky), and GAUK 116310.
