PML-TQ - Tool for Querying Treebanks

The PML-TQ is a powerful open-source search tool for all kinds of linguistically annotated treebanks. It offers several client interfaces and two search backends: one based on an SQL database and one based on Perl and the TrEd toolkit. The tool works natively with treebanks encoded in the PML data format (conversion scripts are available for many established treebank formats).


PML-TQ at LINDAT/Clarin web service (screenshot)
PML-TQ in TrEd (screenshot)

Getting Started

Search your local files:

Use the client-side PML-TQ search engine, which is part of the pmltq extension to the tree editor TrEd (see the Clients section below).

Search various treebanks using our server:

ÚFAL hosts a PML-TQ search service for PDT 2.0, PDT 2.5, PDT 3.0 and many other treebanks, including Penn Treebank 3, Penn Chinese Treebank, Penn Arabic Treebank, Tiger Corpus 1.0, Universal Dependencies treebanks, and HamleDT treebanks. The server is accessible from several clients, including modern web browsers and the tree editor TrEd (see the Clients section below).

Search any treebank on your own PML-TQ server:

Download and install the PML-TQ server (Linux, UNIX, Mac OS X) on your computer/server.


Documentation


Clients

Web Browser

Any web browser with good support for SVG rendering, CSS, and JavaScript can be used as a client to a PML-TQ server (Firefox, Google Chrome, Opera browser, Safari, IE >=11).

A PML-TQ server hosted at ÚFAL can be accessed via the LINDAT/Clarin web service. Many of the treebanks are freely accessible; others require a login name and password. Contact Matyáš Kopp for information on how to obtain access to those treebanks.

TrEd

A fully graphical client for the PML-TQ with client-side searching capability is part of the tree editor TrEd (GPL-licensed software, available separately) as an extension called pmltq. Several other extensions provide PML schemas and visualization stylesheets for various treebanks.

To install this extension, start TrEd, select Setup -> Manage Extensions -> Get New Extensions and select 'pmltq'. When done, press Shift+F3 to start the search. Select 'Treebank (server)' to search using a PML-TQ server, or 'Files (local)' to search local files using the client-side search engine built into the client (contact Matyáš Kopp to get access to the PML-TQ server hosted at ÚFAL).

Command-line

Under development!


Server

This distribution contains a fast and efficient implementation of the PML-TQ powered by an SQL database with a client-server architecture (client -> REST API -> PML-TQ server -> SQL database backend).
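
The client side of this chain can be sketched in a few lines of Python. This is only an illustration of the architecture, not the documented API: the host name, the URL layout (/treebanks/<name>/query), the JSON field names (query, limit) and the treebank identifier are all assumptions made for the example, so consult the server's own API documentation for the real ones. The query string is an illustrative PML-TQ query over an analytical layer.

```python
import json
import urllib.request


def build_query_request(base_url, treebank, query, limit=10):
    """Build (but do not send) an HTTP POST request carrying a PML-TQ query.

    The endpoint layout and the JSON field names are hypothetical.
    """
    url = f"{base_url.rstrip('/')}/treebanks/{treebank}/query"
    payload = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "https://pmltq.example.org/api",  # placeholder host, not a real service
    "pdt30",                          # placeholder treebank identifier
    'a-node [ afun = "Sb" ]',         # illustrative query: analytical nodes that are subjects
)
print(request.get_method(), request.full_url)
```

The request is only constructed here; passing it to urllib.request.urlopen would send it to the REST API, which in turn hands the query to the PML-TQ server and its SQL backend.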

The server is intended for searching large static data sets (complete treebanks). For individual files or small treebanks of up to roughly 10,000 trees (the exact limit varies), the client-side PML-TQ implementation in TrEd is usually sufficient.

The most important dependencies for running a PML-TQ server are:

  • the PML-TQ server distribution, see Download below,
  • an HTTP server,
  • PostgreSQL (>=8.4),
  • Perl (>=5.14),
  • the tree editor TrEd.

The treebank must be encoded in or converted to the PML format.
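
Before installing, it can be handy to check the two version requirements above. The following is a minimal sketch, not part of the PML-TQ distribution: the helper names are made up for this example, and it assumes that perl and psql (the PostgreSQL command-line client) are on the PATH. Note that psql reports the client version, which may differ from the version of the database server you will actually use.

```python
import re
import shutil
import subprocess

# Minimum versions taken from the dependency list above.
MINIMUMS = {"perl": (5, 14), "psql": (8, 4)}

# Commands that print a version string (no shell is involved, so $^V reaches Perl intact).
VERSION_COMMANDS = {
    "perl": ["perl", "-e", 'printf "%vd", $^V'],
    "psql": ["psql", "--version"],
}


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    pad = lambda v: tuple(v) + (0,) * (width - len(v))
    return pad(found) >= pad(minimum)


def installed_version(tool):
    """Return the installed version of tool, or None if it is not on the PATH."""
    command = VERSION_COMMANDS[tool]
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return parse_version(result.stdout)


if __name__ == "__main__":
    for tool, minimum in MINIMUMS.items():
        found = installed_version(tool)
        wanted = ".".join(map(str, minimum))
        if found is None:
            print(f"{tool}: not found (need >= {wanted})")
        else:
            status = "ok" if meets_minimum(found, minimum) else "too old"
            print(f"{tool}: {'.'.join(map(str, found))} ({status}, need >= {wanted})")
```

The script does not check for the HTTP server or TrEd, since neither has a single standard command to query.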

The server has been tested on Linux.

Download

Current version

The current version of the PML-TQ server can be downloaded from the Git repository: https://github.com/ufal/perl-pmltq-server.

Old versions

Previous versions of the PML-TQ server (up to version 0.7.10 (beta), released in 2013) were published as a single .tar.gz archive.
The last of these can still be downloaded: pmltq-0.7.10.tar.gz (PML-TQ distribution package).

Installation of the PML-TQ Server

To install the server, please contact its current developer, Matyáš Kopp.


Bibliography

Štěpánek Jan, Pajas Petr: Querying Diverse Treebanks in a Uniform Way, in Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Copyright © European Language Resources Association (ELRA), Valletta, Malta, pp. 1828-1835, 2010

Pajas Petr, Štěpánek Jan: System for Querying Syntactically Annotated Corpora, in Proceedings of the ACL-IJCNLP 2009 Software Demonstrations, Copyright © Association for Computational Linguistics, Suntec, Singapore, pp. 33-36, 2009

Pajas Petr, Štěpánek Jan: Recent Advances in a Feature-Rich Framework for Treebank Annotation, in The 22nd International Conference on Computational Linguistics - Proceedings of the Conference, Manchester, pp. 673-680, 2008


Authors

© 2008-2010 Petr Pajas and Jan Štěpánek
© 2011-2013 Jan Štěpánek
© 2013-2015 Michal Sedlák
© 2015-2017 Matyáš Kopp (kopp at ufal.mff.cuni.cz)

Acknowledgement

The development of the PML-TQ has been supported by the following projects:


License

This software is published under the GNU General Public License (GPL).


Additional Resources

More or less general presentations/tutorials about the PML-TQ

Introductions/tutorials focused on the PML-TQ used for various language phenomena