Querying Dependency Treebanks in PML-TQ (Jan Štěpánek)
The tutorial will introduce PML-TQ (Prague Markup Language Tree Query), a powerful open-source search tool for all kinds of linguistically annotated treebanks. The tool offers several client interfaces and two search backends.
We will go over the following topics:
- Which data formats are supported?
- Simple queries
- Creating reports
- Complex queries: negation, quantifiers
- What do I need to install the tool myself?
We will see many examples drawn from some of the following treebanks (not necessarily dependency-based):
- The Prague Dependency Treebank 2.0
- The Penn Treebank 3 (WSJ, Atis, Brown, and Switchboard data sets)
- The TIGER Treebank 1.0
- The Penn - CU Chinese Treebank 6.0
- The Penn Arabic Treebank 2 - version 2.0
- Hyderabad Treebank (Bengali, Hindi, and Telugu Train Sets)
- Sinica Treebank 3.0
- Prague Czech-English Dependency Treebank 2.0
- CoNLL 2009 ST (Catalan, Chinese, English, German, and Spanish Train Sets)
The presentation slides from the tutorial can be found here.
The tools and data on which this tutorial is based have been provided by LINDAT/Clarin, project LM2010013 of the Ministry of Education, Youth and Sports of the Czech Republic.
This tutorial is part of the training activities of the projects META-NET and Khresmoi, supported by the European Commission under the 7th Framework Programme (projects 249119 and 257528, respectively).