
In this section we describe all the ways you can work with the PEDT 2.0 DVD. You can use it as is, copy files directly from it, or let the installers do the work. You can use it under Windows as well as Linux, and it should work on Mac too. But first, let's look at its contents:

  • Data – self-explanatory.
  • TrEd – Tree Editor, a software tool for working with the data
  • Documentation – all useful information about PEDT 2.0, mostly in HTML and PDF
  • Data Browser – lets you view the complete data set in your web browser immediately, out of the box
  • Schemata – PML schemata describing the data format
  • Engvallex – Valency lexicon of English referred to in the data

Requirements

There are no special requirements for working with PEDT 2.0; everything you need should already be on your computer. To open the documentation, you need a PDF reader and a web browser. On Linux, you also need Perl installed if you want to use TrEd.

If you are a Mac user, almost everything said about Linux applies to you as well. The only difficulty may be installing TrEd: it is possible to install TrEd on Mac OS X, but occasional problems may occur. If you run into them, please contact us.

Installers

There are two installers in the root directory of the PEDT 2.0 DVD. If you are a Windows user, run the setup_windows.exe file. It can copy the data files, the documentation and the web-based data browser to your computer, and it will also offer to install TrEd if it does not detect it on your computer.

If you have Linux on your machine, run setup_unix.sh. It provides basically the same set of choices as the Windows installer.

Data

The data are stored in the data directory and are divided into 25 sections, subdirectories 00 to 24. Each section contains up to 100 logical files. A logical file is a continuous piece of text (a news article) together with three layers of its annotation: phrasal (an unaltered copy of the corresponding constituency tree from the Penn Treebank), analytical and tectogrammatical. Each layer is stored in a separate physical file, so each section contains up to 300 physical files. All files are gzipped, but there is no need to uncompress them: TrEd opens gzipped files without trouble, and even faster than uncompressed plain-text files.
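To make the section/logical/physical layout concrete, here is a minimal Python sketch that groups the gzipped files of each section into logical files and reads one layer without unpacking anything on disk. The naming rule is a hypothetical assumption and is not taken from this documentation: the three layer files of one logical file are assumed to share the part of their name before the first dot (for example "wsj_0001.p.gz", "wsj_0001.a.gz", "wsj_0001.t.gz"). The function names are illustrative only.

```python
import gzip
import os
from collections import defaultdict


def logical_files(section_dir):
    """Group the gzipped physical files of one section by logical file.

    Hypothetical naming assumption: the layer files of one logical file
    share the part of their name before the first dot.
    """
    groups = defaultdict(list)
    for name in sorted(os.listdir(section_dir)):
        if name.endswith(".gz"):
            groups[name.split(".", 1)[0]].append(name)
    return dict(groups)


def summarize(data_dir):
    """Map each section subdirectory to (logical files, physical files)."""
    summary = {}
    for section in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, section)
        if os.path.isdir(path):
            groups = logical_files(path)
            physical = sum(len(files) for files in groups.values())
            summary[section] = (len(groups), physical)
    return summary


def read_layer(path):
    """Return the XML text of one gzipped physical file."""
    # gzip.open decompresses on the fly; nothing is unpacked on disk.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.read()
```

Called as `summarize("/media/cdrom/data")` (the mount point is a placeholder), it should report at most 100 logical and 300 physical files per section if the naming assumption holds.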

You can work with all the files directly from the DVD or copy them to your computer. Both the Windows and the Linux installers can be used to copy the files to a directory of your choice.
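If you prefer to copy the data by hand rather than through an installer, the following minimal Python sketch copies the data directory from the DVD as is, still gzipped. Both paths are placeholders, because the DVD mount point depends on your system, and the function name is hypothetical.

```python
import shutil
from pathlib import Path


def copy_data(dvd_root, target):
    """Copy the data directory from the DVD root into target/data.

    The gzipped files are copied unchanged; there is no need to unpack them.
    """
    source = Path(dvd_root) / "data"
    destination = Path(target) / "data"
    # dirs_exist_ok=True (Python 3.8+) allows the destination to exist already.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    return destination
```

For example, `copy_data("/media/cdrom", Path.home() / "pedt2")`, where both paths are illustrative.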

TrEd

If you plan to do serious work with the data, you will most likely use TrEd (Tree Editor). It is the tool we used to annotate the data; we have been using it for many years and are still developing it to suit our needs, so it should be the best-suited software available for this data. TrEd is written in Perl, so Perl must be installed to run it. On Windows, the TrEd installer installs Perl for you. On Linux, Perl is very likely installed already; if it is not, install it with the package manager of your distribution.
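Running `perl -v` in a terminal is enough to see whether Perl is present. The same check, written as a small Python sketch, looks for perl on the PATH and reports its version line. The function names are illustrative, not part of TrEd.

```python
import shutil
import subprocess


def find_perl():
    """Return the path of the perl executable found on PATH, or None."""
    return shutil.which("perl")


def perl_version(perl_path):
    """Return the first non-empty line printed by `perl -v`."""
    result = subprocess.run(
        [perl_path, "-v"], capture_output=True, text=True, check=True
    )
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    return lines[0] if lines else ""
```

If `find_perl()` returns None, install Perl with your distribution's package manager before installing TrEd.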

TrEd is highly modular software, and many extensions add to its functionality. Some of them are needed to work with the PEDT 2.0 data. The standard TrEd installer does not install these extensions, and without them you will not be able to open the data correctly. The PEDT 2.0 DVD provides a working solution for both Windows and Linux users.

For Windows, we provide a portable TrEd that runs out of the box, directly from the DVD: just run tred.bat from the root directory. All the required extensions are integrated into it. TrEd itself resides in the tred directory, which you can copy anywhere on your computer. In that case, use the file tred\tred_portable\tred.bat to run TrEd.

If you are on a Linux machine and use our installer to install TrEd, it will also install all the required extensions.

If you use the standard Perl (Strawberry Perl on Windows) and TrEd distributions instead, you have to install the extensions manually. You need these three: Prague Dependency Treebank 2.0 Annotation (pdt20), Prague English Treebank Annotation (pedt) and PDT-Vallex Editor (pdt_vallex). They are easy to install with TrEd's built-in Extension Manager (look for Setup → Manage Extensions... in the application menu).

Documentation

The documentation of PEDT 2.0 is integrated into the PEDT 2.0 web site (which you are reading right now). It refers to a number of PDF documents and other sites, and it serves as a crossroads connecting you to all the available information. A complete copy of this web site is stored on the PEDT 2.0 DVD; open the index.html file in the root directory to view it. You can copy the web site anywhere you want (the installers can do this for you), but keep in mind that besides the documentation it also contains the data browser, which takes up a lot of space.

Data Browser

We transformed all the PEDT 2.0 data into SVG files so that you can view them in your web browser. The trees look just as they do in TrEd. You can navigate through them easily in our data browser, which is written entirely with standard web technologies. All you need is a recent web browser (for Internet Explorer, version 9.0 or later). This is the best choice if you just want a quick look at the data.

You can also copy the data browser to your computer (it is in the doc directory in the root of the DVD), but keep in mind that the complete data browser consists of almost 150,000 files and needs more than 2 GiB of storage space.
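Before copying the data browser, you may want to confirm that the target drive has room for it. This minimal Python sketch compares the free space reported by `shutil.disk_usage` with the "more than 2 GiB" figure given above. Because that figure is a lower bound, the `required` parameter lets you add a safety margin. The function name is hypothetical.

```python
import shutil

# Lower bound taken from the text: the data browser needs more than 2 GiB.
REQUIRED_BYTES = 2 * 1024 ** 3


def has_room_for_browser(target_dir, required=REQUIRED_BYTES):
    """Return True if the file system holding target_dir has enough free space."""
    free = shutil.disk_usage(target_dir).free
    return free > required
```

For example, `has_room_for_browser("/home/me", required=3 * 1024 ** 3)` checks for 3 GiB, where the path and margin are only illustrative.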

Schemata

The PML schemata describing the data format are stored in the schemata directory. They are provided for reference only: you do not need to copy them anywhere, because they are already integrated into the TrEd extensions.
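To see which schema a given data file relies on, you can read the schema reference from the file itself. This minimal Python sketch assumes the usual PML layout, in which the root element contains a head element with a schema element carrying an href attribute. It matches element names without their namespace and handles gzipped and plain files alike. The function names are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace such as '{http://...}head' down to 'head'."""
    return tag.rsplit("}", 1)[-1]


def referenced_schema(path):
    """Return the href of the schema named in a PML file's head, or None.

    Assumes the layout <root><head><schema href="..."/></head>...</root>.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as f:
        root = ET.parse(f).getroot()
    for child in root:
        if local_name(child.tag) == "head":
            for item in child:
                if local_name(item.tag) == "schema":
                    return item.get("href")
    return None
```

The returned name can then be looked up in the schemata directory for reference.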

Engvallex

Engvallex is the valency lexicon used in PEDT 2.0. It is interlinked with PropBank and VerbNet. These resources are stored in the engvallex directory, and you can browse them in TrEd. There is no need to install or copy them anywhere; like the schemata, they are already incorporated into the TrEd extensions.

Troubles?

If you feel that you need more help or if something does not work, do not hesitate to contact us.