PML-TQ - Tool for Querying Treebanks

The PML-TQ is a powerful open-source search tool for all kinds of linguistically annotated treebanks. It offers several client interfaces and two search backends (one based on an SQL database and one based on Perl and the TrEd toolkit). The tool works natively with treebanks encoded in the PML data format; conversion scripts are available for many established treebank formats.


Screenshots: PML-TQ in TrEd; PML-TQ in Opera

Getting Started

Search your local files:

Use the client-side PML-TQ search engine, which is part of the pmltq extension to the tree editor TrEd (see the Clients section below).

Register to search various treebanks using our server:

We host a PML-TQ search service for PDT 2.0 and various other treebanks, including Penn Treebank 3, Penn Chinese Treebank, Penn Arabic Treebank, and Tiger Corpus 1.0. To register, send an email to Matyáš Kopp, the current developer of the PML-TQ. The server is accessible from several clients, including modern web browsers and TrEd (see Clients).

Search any treebank on your own PML-TQ server:

Download and install the PML-TQ server (Linux, UNIX, Mac OS X) on your computer/server.


Documentation


Clients

Web Browser

Any web browser with good support for SVG rendering, CSS, and JavaScript can be used as a client to a PML-TQ server (Firefox, Google Chrome, Opera browser, Safari, IE >=11).

TrEd

A fully graphical client for the PML-TQ with client-side searching capability is part of the tree editor TrEd (GPL-licensed software available separately) as an extension called pmltq. Several other extensions provide PML schemas and visualization stylesheets for various treebanks.

To install this extension, start TrEd, select Setup > Manage Extensions > Get New Extensions and select 'pmltq'. When done, press Shift+F3 to start the search. Select 'Treebank (server)' to search using a PML-TQ server, or 'Files (local)' to search local files using the client-side search engine built into the client.

Command-line

A simple text-based client called pmltq is included in the server package.


Server

This distribution contains a fast and efficient implementation of the PML-TQ powered by an SQL database with a client-server architecture (HTTP client -> custom HTTP server -> CGI -> SQL database backend).

The server is intended for searching large static data sets (complete treebanks). For individual files or small treebanks, up to say 10K trees (your mileage may vary), the client-side PML-TQ implementation in TrEd is usually sufficient.
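The rule of thumb above can be written down as a tiny helper. This is only an illustrative sketch in Python: the function name, the constant, and the returned labels are hypothetical and not part of PML-TQ, and the 10,000-tree cutoff is the rough figure quoted above, not a hard limit.

```python
# Hypothetical helper, not part of PML-TQ: encodes the "up to say 10K trees"
# guideline from the text. The cutoff is approximate; your mileage may vary.
LOCAL_SEARCH_TREE_LIMIT = 10_000


def suggest_backend(tree_count: int, limit: int = LOCAL_SEARCH_TREE_LIMIT) -> str:
    """Suggest a search backend for a treebank with the given number of trees."""
    if tree_count < 0:
        raise ValueError("tree_count must be non-negative")
    if tree_count <= limit:
        # Individual files or small treebanks: client-side search in TrEd.
        return "TrEd client-side search"
    # Large static data sets (complete treebanks): SQL-backed server.
    return "PML-TQ server (SQL)"


if __name__ == "__main__":
    for size in (2_500, 50_000):
        print(size, "->", suggest_backend(size))
```

The limit is a parameter so that it can be adjusted to the hardware and the complexity of the queries being run.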

Running a PML-TQ server requires either an Oracle or a PostgreSQL database, Perl >= 5.8.8, and several Perl modules installable from CPAN. The treebank must be encoded in or converted to the PML format.

The server has been tested on Linux with Oracle XE 10g and PostgreSQL (8.4beta).

Download

Current version

... to-do

Old versions

Previous versions (up to version 0.7.10 (beta), released in 2013) were published as a single .tar.gz archive.
The last of these can still be downloaded here: pmltq-0.7.10.tar.gz (PML-TQ distribution package).

Installation of PML-TQ Server

To run a PML-TQ server, you will first need to install an SQL database server. Oracle 10g or 11g and PostgreSQL 8.4.1 or later are fully supported.

Then carefully follow the instructions in the README file provided in the distribution and in the configuration scripts you will be asked to edit during the installation process. Since the individual steps of the server installation are still poorly documented, do not hesitate to ask the authors for guidance via e-mail.
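Before starting the installation, it may help to check the database version against the requirements stated above. The following Python sketch is not part of the PML-TQ distribution: the function names are hypothetical, and it assumes that PostgreSQL versions are given as plain dotted numbers (for example "8.4.1") and Oracle releases as "10g" or "11g".

```python
# Hypothetical pre-installation check, not shipped with PML-TQ.
# Encodes only the requirements stated in the text:
#   Oracle 10g or 11g, or PostgreSQL 8.4.1 and later.
MIN_POSTGRESQL = (8, 4, 1)
SUPPORTED_ORACLE = {"10g", "11g"}


def parse_version(text: str) -> tuple:
    """Turn a dotted numeric version such as '8.4.1' into (8, 4, 1).

    Assumes purely numeric components; a string like '8.4beta'
    raises ValueError.
    """
    return tuple(int(part) for part in text.strip().split("."))


def is_supported(database: str, version: str) -> bool:
    """Return True if the database/version pair is listed as fully supported."""
    name = database.strip().lower()
    if name == "postgresql":
        return parse_version(version) >= MIN_POSTGRESQL
    if name == "oracle":
        return version.strip().lower() in SUPPORTED_ORACLE
    return False


if __name__ == "__main__":
    for db, ver in (("PostgreSQL", "8.4.1"), ("PostgreSQL", "8.3.9"), ("Oracle", "11g")):
        print(db, ver, "supported" if is_supported(db, ver) else "not supported")
```

Tuple comparison orders versions component by component, so "8.4" (read as 8.4.0) is treated as older than the required 8.4.1.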


Bibliography

Štěpánek Jan, Pajas Petr: Querying Diverse Treebanks in a Uniform Way, in Proceedings of the Seventh conference on International Language Resources and Evaluation (LREC'10), Copyright © European Language Resources Association (ELRA), Valletta, Malta, pp. 1828-1835, 2010

Pajas Petr, Štěpánek Jan: System for Querying Syntactically Annotated Corpora, in Proceedings of the ACL-IJCNLP 2009 Software Demonstrations, Copyright © Association for Computational Linguistics, Suntec, Singapore, pp. 33-36, 2009

Pajas Petr, Štěpánek Jan: Recent Advances in a Feature-Rich Framework for Treebank Annotation, in The 22nd International Conference on Computational Linguistics - Proceedings of the Conference, Manchester, pp. 673-680, 2008


Authors

© 2008-2010 Petr Pajas and Jan Štěpánek
© 2011-2013 Jan Štěpánek
© 2013-2015 Michal Sedlák
© 2015-2017 Matyáš Kopp

Acknowledgement

The development of the PML-TQ has been supported by the following projects:


License

This software is published under the GNU General Public License (GPL).