After unzipping the downloaded archive, the data can be found in the directory data, where they are further divided into ten subdirectories (including etest). The annotation of each document is captured in four interlinked files, one for each layer of annotation: word layer (files *.w.gz), morphological layer (*.m.gz), analytical layer (*.a.gz), and tectogrammatical layer.
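The file layout described above can be explored with a short script. The following is a minimal Python sketch, not part of the PDiT distribution: it assumes that each file is named `<document>.<layer letter>.gz` (as the patterns *.w.gz, *.m.gz and *.a.gz suggest) and that the files sit somewhere below the data directory. The function name `group_by_document` is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming pattern: <document>.<layer letter>.gz, e.g. "doc.w.gz".
LAYER_FILE = re.compile(r"^(?P<doc>.+)\.(?P<layer>[a-z])\.gz$")


def group_by_document(data_dir):
    """Map each document name to a {layer letter: path} dictionary."""
    documents = defaultdict(dict)
    # Walk all subdirectories, since the data are split into several of them.
    for path in sorted(Path(data_dir).rglob("*.gz")):
        match = LAYER_FILE.match(path.name)
        if match:  # Skip files that do not follow the assumed pattern.
            documents[match["doc"]][match["layer"]] = path
    return dict(documents)
```

Calling `group_by_document("data")` from the directory that contains the unzipped archive would then return one entry per document, which makes it easy to spot documents for which a layer file is missing.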
The data are stored in the Prague Markup Language (PML; Pajas and Štěpánek 2008) format, an XML-based format for linguistic annotations (especially treebanks). For the sake of completeness, the PML schemata of the files can be found in the directory resources. (The schemata are XML files that describe the structure of the annotated files.) The valency lexicon PDT-Vallex 3.0 can be found in the same directory.
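Because PML is XML-based, the files can also be inspected without TrEd using a general-purpose XML parser. The sketch below uses only the Python standard library and assumes that the *.gz files are gzip-compressed XML, as their extension and the description above suggest. It makes no assumption about the element names, which are defined by the PML schemata; the function names `local_name` and `summarize_pml` are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip an XML namespace prefix of the form '{uri}name'."""
    return tag.rsplit("}", 1)[-1]


def summarize_pml(path):
    """Return the root element name and a count of element names in the file."""
    # gzip.open decompresses on the fly; ElementTree parses the resulting XML.
    with gzip.open(path, "rb") as handle:
        root = ET.parse(handle).getroot()
    counts = Counter(local_name(element.tag) for element in root.iter())
    return local_name(root.tag), counts
```

Such a summary only gives a rough picture of a file's contents. For actual linguistic work the schemata in the resources directory remain the authoritative description of the structure.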
The tree editor TrEd (Pajas and Štěpánek 2008) can be used to open and browse the data. The editor can be downloaded for various platforms from its home page. Please follow the installation instructions given on that page for your operating system.
Once the pdit20 extension is installed, TrEd is able to open the data of PDiT 2.0. To see the annotation of a document on the tectogrammatical layer, open the file of that document that belongs to the tectogrammatical layer.
In case of trouble with the installation of TrEd or with browsing the data, please contact the authors at (tred
PML Tree Query (PML-TQ; Pajas and Štěpánek 2009) is a powerful client-server system for querying treebanks, developed primarily for searching PDT data. TrEd can serve as a user-friendly graphical client by means of the extension "PML Tree Query Interface for TrEd (pmltq)" (follow the same installation steps as for the pdit20 extension described above). To get access to the public PML-TQ server provided by our institute, please contact the administrators at (tred at ufal.mff.cuni.cz). Please refer to the PML-TQ web page for further information.
The public PML-TQ server of our institute can also be accessed anonymously using a web browser as the client. Unlike the TrEd client, the web client does not require any registration, but it cannot be used to create a query graphically and is limited in the ways it can display the results. However, it is the quickest way to get to the data.
Pajas, P. and Štěpánek, J.: Recent Advances in a Feature-Rich Framework for Treebank Annotation. In: The 22nd International Conference on Computational Linguistics - Proceedings of the Conference, The Coling 2008 Organizing Committee, Manchester, UK, ISBN 978-1-905593-45-3, pp. 673-680, 2008.
Pajas, P. and Štěpánek, J.: System for Querying Syntactically Annotated Corpora. In: Proceedings of the ACL-IJCNLP 2009 Software Demonstrations, Association for Computational Linguistics, Suntec, Singapore, ISBN 1-932432-61-2, pp. 33-36, 2009.