The Prague Dependency Treebank 1.0

Conversion between dependency and phrase structures

dep2phr.pl

Reads a file in CSTS format (the default format of PDT) and writes a file in FS format (the secondary file format, readable by the tree viewers). The output file contains phrase trees instead of dependency trees, so it includes nonterminal nodes that do not correspond to any single word in the sentence. The program operates on lemmas, not word forms.

Usage: the input is read from stdin or from a file whose name is supplied as a command-line argument. The output is written to stdout.

Platform: a perl script.

dep2tree

Reads a file in CSTS format (the default format of PDT) and writes a file with phrase structures in a self-explanatory bracketed format (example: (TOP (VP Přišel/>Vp (NP Pavel/>N1 ) ) ./Z )). The word immediately following a left bracket is a nonterminal (the phrase name). Words are presented as form/tag pairs, with tags shortened to two characters. Phrase heads are marked by prefixing their tag with the > character. The lemmas are lost during conversion.
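
The bracketed format described above can be read back with a few lines of code. The following is a minimal Python sketch, not part of the distribution; the function names parse_bracketed and leaves are hypothetical. It assumes that tokens are separated by whitespace, that a token starting with "(" and containing no "/" opens a phrase, and that the tag is whatever follows the last "/" in a word token.

```python
def parse_bracketed(line):
    """Parse one bracketed sentence into nested (label, children) tuples.

    Words become dicts with the keys "form", "tag" and "head".
    Assumption: tokens are whitespace-separated, as in the example above.
    """
    stack = [("ROOT", [])]          # artificial root that collects top-level phrases
    for token in line.split():
        if token == ")":
            if len(stack) < 2:
                raise ValueError("unbalanced right bracket")
            node = stack.pop()
            stack[-1][1].append(node)
        elif token.startswith("(") and "/" not in token:
            # Assumption: an opening bracket is fused with the phrase name.
            stack.append((token[1:], []))
        else:
            if "/" not in token:
                raise ValueError("word without a tag: " + token)
            # Split on the last slash so that forms containing "/" survive.
            form, _, tag = token.rpartition("/")
            is_head = tag.startswith(">")
            if is_head:
                tag = tag[1:]
            stack[-1][1].append({"form": form, "tag": tag, "head": is_head})
    if len(stack) != 1:
        raise ValueError("unbalanced left bracket")
    return stack[0][1]


def leaves(nodes):
    """Return the words of a parsed tree as (form, tag, is_head) triples, left to right."""
    result = []
    for node in nodes:
        if isinstance(node, tuple):
            result.extend(leaves(node[1]))
        else:
            result.append((node["form"], node["tag"], node["head"]))
    return result


tree = parse_bracketed("(TOP (VP Přišel/>Vp (NP Pavel/>N1 ) ) ./Z )")
words = leaves(tree)
```

For the example sentence, words holds one triple per word, with the head flag set for Přišel and Pavel and unset for the final period.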

Please be aware that phrase structure cannot capture the nonprojective constructions that occur in Czech. For such sentences the resulting structures may violate the original word order.
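
To find out in advance which sentences are affected, one can test a dependency tree for projectivity. The sketch below is an illustration only and is not part of the distribution. It assumes a hypothetical input representation in which heads[i] is the 1-based position of the parent of word i+1, with 0 standing for the root. An arc is projective if every word lying between the head and the dependent is dominated by the head.

```python
def is_projective(heads):
    """Return True if the dependency tree contains no nonprojective arc.

    heads[i] is the 1-based position of the parent of word i+1; 0 means root.
    (This encoding is an assumption made for the example, not the CSTS format.)
    """
    n = len(heads)

    def dominated_by(node, ancestor):
        # Walk up from node towards the root, looking for ancestor.
        steps = 0
        while node != ancestor:
            if node == 0 or steps > n:   # reached the root, or the input has a cycle
                return False
            node = heads[node - 1]
            steps += 1
        return True

    for dep in range(1, n + 1):
        head = heads[dep - 1]
        lo, hi = sorted((head, dep))
        # Every word strictly between head and dependent must hang under head.
        for between in range(lo + 1, hi):
            if not dominated_by(between, head):
                return False
    return True


projective = is_projective([2, 0, 2])         # words 1 and 3 both depend on word 2
crossing = is_projective([3, 4, 0, 3])        # word 2 depends on word 4 across the root
```

In the second call the arc from word 4 to word 2 spans word 3, which is the root and therefore not dominated by word 4, so the tree is reported as nonprojective.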

Usage: the input is read from stdin. The output is written to stdout.

Platform: dep2tree is a Unix shell script. It is only a front end that calls several Perl scripts (check the perl path on the first line of each of them!) and one binary program. The front end assumes that it is running under Linux and calls the corresponding binary. Binaries for Sun machines are also available; for other platforms, recompile the appended source code.

Acknowledgement: this code was written by Michael Collins for the JHU Workshop '98 project.

tree2dep.pl

Reads a file generated by dep2tree and converts it back to CSTS. Because dep2tree loses information (the lemmas are dropped and the tags are shortened), the resulting file will by no means be identical to the original!

Usage: the input is read from stdin or from a file whose name is supplied as a command-line argument. The output is written to stdout.

Platform: a perl script.