This section describes the ways you can work with the PCEDT 2.0 DVD. You can use it as it is, copy files from it manually, or let an installer do the work. It can be used under Windows as well as Linux, and it should work on Mac too. But first, let's have a look at its contents:
- Data – self-explanatory.
- TrEd – Tree Editor, a software tool for working with the data.
- Documentation – all useful information about PCEDT 2.0, mostly in HTML and PDF.
- Data Browser – lets you view the complete set of data in your web browser immediately, out of the box.
- Schemata – PML schemata describing the data format.
- Valency Lexicons – valency lexicons of Czech and English referred to in the data.
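The rest of this section refers to entries by their names in the DVD root. To confirm that the DVD is mounted where you think it is, you could use a Python sketch like the one below. The list of names is collected from this section, assuming they all sit directly in the DVD root. The mount point in the usage comment (/media/cdrom) is only an assumption, so substitute your own drive letter or mount point.

```python
from pathlib import Path

# Names mentioned in this section; assumed to sit directly in the DVD root.
EXPECTED_ENTRIES = [
    "data", "tred", "doc", "schemata", "valency_lexicons",
    "index.html", "tred.bat", "setup_windows.exe", "setup_unix.sh",
]


def missing_entries(dvd_root):
    """Return the expected entries that do not exist under dvd_root."""
    root = Path(dvd_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


# Usage (the mount point is an assumption; on Windows use e.g. "D:\\"):
# print(missing_entries("/media/cdrom") or "all entries found")
```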
Working with PCEDT 2.0 has no special requirements; everything you need is most likely already on your computer. To read the documentation, you need a PDF reader and a web browser. On Linux, you also need Perl if you want to use TrEd.
If you are a Mac user, almost everything said about Linux applies to you as well. The only possible trouble is installing TrEd: it can be installed on Mac OS X, but occasional problems may occur. If you run into any, please contact us.
There are two installers in the root directory of the PCEDT 2.0 DVD. If you are a Windows user, run the setup_windows.exe file. It can copy the data files, the documentation and the web-based data browser to your computer. It will also offer to install TrEd.
If you have Linux on your machine, run setup_unix.sh. It provides basically the same set of choices as the Windows installer.
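If you prefer to script the choice of installer, the following Python sketch picks setup_windows.exe on Windows and setup_unix.sh elsewhere. Two things here are our own assumptions rather than statements from the text above, which names setup_unix.sh only for Linux: running the shell script through sh, and using it on Mac OS X. dvd_root is a placeholder for wherever your DVD is mounted.

```python
import platform
import subprocess
from pathlib import Path


def installer_for(dvd_root, system=None):
    """Return the path of the installer that matches the operating system.

    system defaults to platform.system(), e.g. "Windows", "Linux" or "Darwin".
    Non-Windows systems get setup_unix.sh (an assumption for Mac OS X).
    """
    system = system or platform.system()
    name = "setup_windows.exe" if system == "Windows" else "setup_unix.sh"
    return Path(dvd_root) / name


def run_installer(dvd_root):
    """Start the matching installer, wait for it to finish, return its exit code."""
    path = installer_for(dvd_root)
    # Invoking the script via sh avoids relying on the execute bit on the DVD.
    cmd = [str(path)] if path.suffix == ".exe" else ["sh", str(path)]
    return subprocess.run(cmd).returncode
```

subprocess.run leaves standard input and output attached to your terminal, so any questions the installer asks still reach you.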
The data are stored in the data directory, divided into 25 sections (subdirectories 00 to 24). Each section contains up to 100 files. All files are gzipped, but there is no need to uncompress them: TrEd opens gzipped files without any problems, and even faster than uncompressed plain text files.
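Because the files are gzipped, any program (not only TrEd) can read them by decompressing on the fly. The Python sketch below walks the section subdirectories, counts the files in each and peeks at the start of one file. It assumes three things: two-digit section names 00 to 24, a .gz suffix on every data file, and UTF-8 encoded XML content. Adjust these if your copy differs.

```python
import gzip
from pathlib import Path


def section_dirs(data_dir):
    """Yield the existing section subdirectories, assumed to be named 00 to 24."""
    for n in range(25):
        sub = Path(data_dir) / f"{n:02d}"
        if sub.is_dir():
            yield sub


def count_files(data_dir):
    """Map each section name to the number of *.gz files it holds."""
    return {sub.name: len(list(sub.glob("*.gz"))) for sub in section_dirs(data_dir)}


def read_head(path, n_chars=200):
    """Decompress on the fly and return the first n_chars characters (UTF-8 assumed)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.read(n_chars)
```

For example, sum(count_files("data").values()) should not exceed 2,500, since there are 25 sections of at most 100 files each.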
You can work with all the files directly from the DVD or copy them to your computer. Both the Windows and the Linux installers can be used to copy the files to a directory of your choice.
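If you would rather copy the data by hand than through an installer, a minimal Python sketch using the standard library's shutil.copytree could look like this. The files stay gzipped, which TrEd handles directly. The function name copy_data and its arguments are illustrative only.

```python
import shutil
from pathlib import Path


def copy_data(dvd_root, target_dir):
    """Copy DVD_ROOT/data to TARGET_DIR/data and return the new path.

    Missing parent directories are created; copytree raises FileExistsError
    if TARGET_DIR/data already exists.
    """
    src = Path(dvd_root) / "data"
    dst = Path(target_dir) / "data"
    shutil.copytree(src, dst)
    return dst
```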
If you intend to do serious work with the data, you will most probably use TrEd (Tree Editor). It is the tool we used to annotate the data. We have been using it for many years and are still developing it to suit our needs, so it should be the best software available for the task. TrEd is written in Perl, so Perl must be installed to run it. On Windows, the TrEd installer installs Perl for you. On Linux, Perl is almost certainly installed already; if it is not, install it with your distribution's package manager.
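To check whether Perl is available before installing TrEd, you can look for the perl executable on your PATH and ask it for its version. This Python sketch uses shutil.which and Perl's $^V variable. It does not check for any particular minimum version, since this section does not name one.

```python
import shutil
import subprocess


def perl_version():
    """Return the version of the perl found on PATH (e.g. "v5.36.0"), or None."""
    perl = shutil.which("perl")
    if perl is None:
        return None
    # $^V is Perl's own version, printed as a v-string such as v5.36.0.
    result = subprocess.run(
        [perl, "-e", "print $^V"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

If perl_version() returns None on Linux, install Perl with your distribution's package manager.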
TrEd is highly modular software, and many extensions add to its functionality. Some of them are required to work with the PCEDT 2.0 data. The standard TrEd installer does not install these extensions, and without them you will not be able to open the data correctly. The PCEDT 2.0 DVD provides a working solution for both Windows and Linux users.
For Windows, there is a portable TrEd that runs out of the box, directly from the DVD: just run tred.bat from the root directory. All the required extensions are integrated into it. TrEd itself resides in the tred directory, which you can copy anywhere on your computer. In that case, run TrEd using the tred\tred_portable\tred.bat file.
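To make the relative launcher path concrete: if you copy the tred directory into some folder, the launcher sits three levels below that folder. The Python sketch below builds that path with PureWindowsPath, so it can be tried on any system. It then starts the batch file through cmd.exe, which works only on Windows. C:\tools is just an example location.

```python
import subprocess
from pathlib import PureWindowsPath


def portable_tred_bat(copy_location):
    r"""Return the launcher path for a tred directory copied into copy_location.

    For example, copying tred into C:\tools gives
    C:\tools\tred\tred_portable\tred.bat.
    """
    return PureWindowsPath(copy_location) / "tred" / "tred_portable" / "tred.bat"


def launch_portable_tred(copy_location):
    """Start the portable TrEd without waiting for it to exit (Windows only)."""
    bat = portable_tred_bat(copy_location)
    return subprocess.Popen(["cmd", "/c", str(bat)])
```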
If you are on a Linux machine and use our installer to install TrEd, it will also install all the required extensions.
If you have to use the standard Perl (Strawberry Perl on Windows) and TrEd distributions, you will have to install everything else manually. You will need the Treex::Core Perl module (distributed through CPAN; installing it will probably pull in quite a lot of dependencies) and these TrEd extensions: Prague Dependency Treebank 2.0 Annotation (pdt20), Prague English Treebank Annotation (pedt) and PDT-Vallex Editor (pdt_vallex). Installing TrEd extensions is easy, because TrEd has its own Extension Manager (look for Setup → Manage Extensions... in the application menu). Afterwards, run TrEd using the ttred script that comes with the Treex::Core Perl module. These instructions apply to both Windows and Linux.
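You can check from a script whether the manual setup succeeded. The Python sketch below asks perl to load a module (perl -MModule -e 1 exits with status 0 only if the module loads) and looks for ttred on your PATH. Installing the module itself is usually a single CPAN client command such as cpan Treex::Core. We are assuming that ttred, like tred, accepts a data file as its argument, and the example file name is hypothetical.

```python
import shutil
import subprocess


def has_perl_module(module):
    """True if `perl -M<module> -e 1` succeeds, i.e. perl can load the module."""
    perl = shutil.which("perl")
    if perl is None:
        return False
    result = subprocess.run(
        [perl, f"-M{module}", "-e", "1"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def ttred_command(data_file):
    """Command line that opens data_file in ttred, or None if ttred is not on PATH.

    Passing the file as an argument is an assumption about ttred.
    """
    ttred = shutil.which("ttred")
    return None if ttred is None else [ttred, str(data_file)]


# Usage (the file name is hypothetical):
# cmd = ttred_command("data/00/example.treex.gz")
# if has_perl_module("Treex::Core") and cmd:
#     subprocess.run(cmd)
```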
The PCEDT 2.0 documentation is integrated into the PCEDT 2.0 web site (which you are reading right now). It links to a number of PDF documents and other sites, serving as a crossroads that connects you to all the available information. A complete copy of this web site is stored on the PCEDT 2.0 DVD; open the index.html file in the root directory to view it. You can copy the web site anywhere you want (the installers can do this for you). Keep in mind, however, that besides the documentation it also contains the data browser, which takes up a lot of space.
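Opening the site is a matter of pointing a browser at index.html. The small Python sketch below does this with the standard webbrowser module. site_root is a placeholder for the DVD root or for wherever you (or an installer) copied the web site.

```python
import webbrowser
from pathlib import Path


def index_uri(site_root):
    """Return the file:// URI of index.html inside site_root."""
    # as_uri() needs an absolute path, hence resolve().
    return (Path(site_root).resolve() / "index.html").as_uri()


def open_site(site_root):
    """Open the web site in the default browser; False if none could be launched."""
    return webbrowser.open(index_uri(site_root))
```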
We converted all PCEDT 2.0 data into SVG files so that you can view them in your web browser; the trees look just the same as in TrEd. You can navigate through them easily in our data browser, which is built entirely with standard web technologies. All you need is a recent web browser (for Internet Explorer, at least version 9.0). This is the best choice if you just want a quick look at the data.
You can also copy the whole data browser to your computer (the doc directory in the root directory), but keep in mind that it consists of almost 50,000 files and needs more than 3 GiB of storage space.
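Before copying the doc directory, it may be worth checking that the target disk has room. This Python sketch compares the free space reported by shutil.disk_usage with the 3 GiB figure quoted in the text. Because the browser needs more than 3 GiB, the default is only a lower bound, and you may want to ask for extra headroom.

```python
import shutil

GIB = 1024 ** 3
# Lower bound taken from the text: the data browser needs more than 3 GiB.
BROWSER_MIN_BYTES = 3 * GIB


def has_room_for_browser(target_dir, needed=BROWSER_MIN_BYTES):
    """True if the file system holding target_dir has more than `needed` bytes free."""
    return shutil.disk_usage(target_dir).free > needed


# Usage: has_room_for_browser("/home/me", needed=4 * GIB) asks for 1 GiB of headroom.
```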
The PML schemata describing the data format are stored in the schemata directory. They are included just for your reference. You do not need to copy them anywhere, because they are already integrated into the TrEd extensions.
PCEDT 2.0 uses two valency lexicons: PDT-Vallex for the Czech data and EngVallex for the English data. EngVallex is interlinked with PropBank and VerbNet. These resources are stored in the valency_lexicons directory, and you can browse through them in TrEd. There is no need to install or copy them anywhere; like the schemata, they are already incorporated into the TrEd extensions.
If you feel that you need more help or if something does not work, do not hesitate to contact us.