The original Penn Treebank annotation is displayed as an additional layer of the corpus. We call this layer the p-layer (phrase-structure layer). It is not aligned with any of the Czech parts (layers) of the corpus.


The original bracketing was converted into PML so that it can be viewed and processed with the annotation tool TrEd. Each node was assigned a unique ID. All original attribute values were preserved. Figure 1 presents an example of a Penn Treebank sentence as displayed by TrEd.

Terminal nodes that represent tokens are yellow, while traces are grey. Labels of non-terminal nodes are displayed as well. POS tags are attached directly to the terminal nodes instead of being treated as separate pre-terminals. Each node stores its ID, which is not displayed by default. The tokens are additionally lemmatized; placing the mouse cursor over a node displays the complete list of its attribute values, including its ID and lemma, in a separate window (see Figure 2).
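The node structure described above can be illustrated with a minimal Python sketch. It is not the conversion script actually used for the corpus and it does not produce PML; it only shows, under stated assumptions, how a Penn Treebank bracketing can be read so that each pre-terminal `(POS form)` becomes a single terminal node carrying its POS tag, traces (tagged `-NONE-`) are flagged, and every node receives a unique ID. The `Node` class, the function names, and the ID scheme `s1-n1`, `s1-n2`, ... are hypothetical, and the sketch assumes that the root bracket has a label.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Node:
    node_id: str                    # hypothetical ID scheme, unique within the tree
    label: str                      # phrase label (non-terminal) or POS tag (terminal)
    form: Optional[str] = None      # token form; None for non-terminals
    is_trace: bool = False          # True for empty elements tagged -NONE-
    children: List["Node"] = field(default_factory=list)


def parse(bracketing: str, sent_id: str = "s1") -> Node:
    """Read one labeled bracketing and return its root node."""
    tokens = re.findall(r"\(|\)|[^\s()]+", bracketing)
    pos = 0
    counter = 0

    def read() -> Node:
        nonlocal pos, counter
        assert tokens[pos] == "(", "expected an opening bracket"
        label = tokens[pos + 1]
        pos += 2
        counter += 1
        node = Node(node_id=f"{sent_id}-n{counter}", label=label)
        if tokens[pos] != "(":
            # Pre-terminal (POS form): keep the POS tag on the terminal itself.
            node.form = tokens[pos]
            node.is_trace = label == "-NONE-"
            pos += 1
        else:
            while tokens[pos] == "(":
                node.children.append(read())
        assert tokens[pos] == ")", "expected a closing bracket"
        pos += 1
        return node

    return read()


def iter_nodes(node: Node) -> Iterator[Node]:
    """Yield all nodes in pre-order."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def terminals(node: Node) -> List[Node]:
    """Return the terminal nodes (tokens and traces) in surface order."""
    return [n for n in iter_nodes(node) if n.form is not None]


EXAMPLE = (
    "(S (NP-SBJ (DT The) (NN dog)) "
    "(VP (VBD wanted) (S (NP-SBJ (-NONE- *-1)) (VP (TO to) (VP (VB sleep))))))"
)

if __name__ == "__main__":
    for t in terminals(parse(EXAMPLE)):
        kind = "trace" if t.is_trace else "token"
        print(t.node_id, t.label, t.form, kind)
```

Keeping the POS tag as an attribute of the terminal, rather than as a separate pre-terminal node, reduces the number of nodes per token to one, which matches the display described above.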