This web page is dedicated to ÚFAL activities related to the RST Discourse Treebank and the RST Signalling Corpus. For the moment, it gives information on:
- how to transform RST-DT data, RST-SC data and PDTB implicit relations into a PML format
- distributions of RST signals in matched PDTB implicit relations (in pdf below)
Below we give instructions on how to transform the data of the RST Discourse Treebank, the RST Signalling Corpus and the implicit relations from the Penn Discourse Treebank into a PML format. The PML format allows for browsing the data in the tree editor TrEd and querying the data with the PML-Tree Query. You need to have the tree editor TrEd installed, along with the extension for the RST (in TrEd, go to Setup->Manage Extensions->Get New Extensions, and search for rst).
The transformation steps are:
- transformation of the native RST-DT/RST-SC format into the PML format
  - The transformation scripts are part of the rst TrEd extension. In the directory tools, set the paths to the original RST trees (from the RST-SC!) and to the annotation of signals (from the RST-SC) in the Makefile, and run:
- adding the implicit relations from the PDTB COLUMN format into the RST files in the PML format
  - The transformation scripts are part of the rst TrEd extension. In the directory tools, set the path to the original PDTB-2.0 distribution in the Makefile, and run:
After this transformation, you can open the data in the tree editor TrEd (the transformed data are in the directory tools/data). To search in the data, you also need to install the TrEd extension for the PML-Tree Query (in TrEd, go to Setup->Manage Extensions->Get New Extensions, and search for pmltq).
Problems that occurred
- Files named file1 - file5 need to be renamed to their original PTB names (using information from the index.html file in the RST-DT distribution).
- One of the transformed implicit relations did not have the target node set.
- One of the matching RST relations does not have annotated signals (this is already the case in the original data: between text spans 164 and 163 in the file wsj_0681).
- To-Do: The transformation scripts do not fill in the attribute order.
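The renaming mentioned in the first item above can be done by hand; for repeated runs, a small helper may be convenient. The following is only a minimal sketch and not part of the rst TrEd extension. The function name `rename_to_ptb_names` and the `mapping` argument are hypothetical, and the mapping itself has to be filled in manually from the index.html file, since the actual PTB names are not listed here. The sketch also assumes that the affected files share the stem before the first dot (e.g. file1.dis, file1.edus).

```python
from pathlib import Path


def rename_to_ptb_names(directory, mapping, dry_run=True):
    """Rename files such as file1.* to their original PTB names.

    directory: folder containing the files to rename.
    mapping:   dict from the old stem to the PTB stem, e.g. {"file1": "..."};
               hypothetical, to be filled in by hand from index.html.
    dry_run:   if True, only report the planned renames without touching files.

    Returns a list of (old_path, new_path) pairs.
    """
    planned = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # Split at the FIRST dot so that "file1.dis" and "file1.edus"
        # both match the stem "file1", while "file10.dis" does not.
        stem, dot, rest = path.name.partition(".")
        if stem in mapping:
            target = path.with_name(mapping[stem] + dot + rest)
            if target.exists():
                raise FileExistsError(f"{target} already exists")
            planned.append((path, target))
    if not dry_run:
        for old, new in planned:
            old.rename(new)
    return planned
```

Calling the function first with `dry_run=True` and inspecting the returned pairs makes it possible to check the planned renames before any file is changed.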
RST signal distributions
The following table is an extension of Table 4 in the submitted D&D paper. It gives full distributions of RST signals in the 472 implicit PDTB relations we were able to match with original RST relations.