Data

The data are stored in the data directory. They are organized into sections numbered from 001 to 245. Since this is not a full release, only a subset of 62 sections is included on the CD. The numbering follows the original PTB-WSJ naming convention, and the file names are the same as in PTB-WSJ apart from the extensions. Each PTB-WSJ section is divided into at most ten smaller PEDT sections; for example, PTB-WSJ section 01 yields the ten PEDT sections 010, 011, ..., 019. The three layers of annotation (phrasal, for the original PTB-WSJ data; analytical; and tectogrammatical) are stored in separate files with the extensions p.gz, a.gz and t.gz, respectively. Beware: the files are gzip-compressed, and unpacking them breaks the links between the layers, so TrEd will not be able to open the unpacked files.
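The naming scheme above can be sketched in a few lines of Python. This is a minimal illustration, not part of any PEDT or TrEd tooling: the helper names (pedt_sections, layer_files, read_layer) and the base name "wsj_0101" used below are hypothetical. It assumes that a PEDT section name is simply the two-digit PTB-WSJ section number followed by one more digit, as in the 01 to 010..019 example. The read_layer helper shows one way to inspect a layer file's contents through Python's standard gzip module, which decompresses in memory and leaves the .gz file on disk untouched.

```python
import gzip

# Hypothetical mapping from annotation layer to file extension, as listed above.
LAYER_EXTENSIONS = {
    "phrasal": "p.gz",
    "analytical": "a.gz",
    "tectogrammatical": "t.gz",
}


def pedt_sections(ptb_section: int, count: int = 10) -> list[str]:
    """Return the three-digit PEDT section names derived from a PTB-WSJ section.

    A PTB-WSJ section is split into at most ten PEDT sections, so `count`
    must lie between 1 and 10.
    """
    if not 1 <= count <= 10:
        raise ValueError("a PTB-WSJ section yields between 1 and 10 PEDT sections")
    # Two-digit PTB-WSJ section number followed by the sub-section digit.
    return [f"{ptb_section:02d}{k}" for k in range(count)]


def layer_files(basename: str) -> dict[str, str]:
    """Map each annotation layer to its file name for a given base name."""
    return {layer: f"{basename}.{ext}" for layer, ext in LAYER_EXTENSIONS.items()}


def read_layer(path: str) -> bytes:
    """Read a compressed layer file in memory without unpacking it on disk."""
    with gzip.open(path, "rb") as handle:
        return handle.read()
```

For example, pedt_sections(1) returns the names 010 through 019, and layer_files("wsj_0101")["analytical"] gives "wsj_0101.a.gz". Reading the files this way avoids writing unpacked copies, which is what breaks the layer links for TrEd.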