Documentation

The Czech RST Discourse Treebank 1.0 (CzRST-DT 1.0, Poláková et al. 2023) is a dataset of 54 Czech journalistic texts manually annotated in the framework of Rhetorical Structure Theory (RST; Mann and Thompson 1988). Each document in the treebank is represented as a single tree-like structure whose nodes (discourse units) are connected by hierarchical rhetorical relations.
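To make the tree-like structure more concrete, here is a minimal Python sketch of how a single RST analysis could be held in memory. It assumes only the general RST model: elementary discourse units (EDUs) are the leaves, and each internal node carries a relation label whose children are marked as nucleus ("N") or satellite ("S"). The class and function names (`EDU`, `RelationNode`, `leaves`, `relation_counts`), the English example segments and the relation labels are hypothetical illustrations; they do not describe the treebank's actual file format or its relation inventory.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Union


@dataclass
class EDU:
    """Leaf node: an elementary discourse unit (a text segment)."""
    text: str


@dataclass
class RelationNode:
    """Internal node: a rhetorical relation over two or more child spans.

    roles[i] is the nuclearity of children[i]: "N" (nucleus) or "S" (satellite).
    Mononuclear relations mix "N" and "S"; multinuclear ones use only "N".
    """
    relation: str
    children: list[Node]
    roles: list[str]

    def __post_init__(self) -> None:
        if len(self.children) != len(self.roles):
            raise ValueError("each child needs exactly one nuclearity role")
        if any(role not in ("N", "S") for role in self.roles):
            raise ValueError("roles must be 'N' or 'S'")
        if "N" not in self.roles:
            raise ValueError("a relation needs at least one nucleus")


Node = Union[EDU, RelationNode]


def leaves(node: Node) -> list[str]:
    """Return the EDU texts of a (sub)tree in document order."""
    if isinstance(node, EDU):
        return [node.text]
    return [text for child in node.children for text in leaves(child)]


def relation_counts(node: Node) -> Counter:
    """Count how often each relation label occurs in a (sub)tree."""
    if isinstance(node, EDU):
        return Counter()
    total = Counter({node.relation: 1})
    for child in node.children:
        total += relation_counts(child)
    return total


# Hypothetical three-EDU text: a multinuclear Joint acts as the nucleus
# of a mononuclear Reason relation whose satellite is the third EDU.
tree = RelationNode(
    relation="Reason",
    children=[
        RelationNode(
            relation="Joint",
            children=[EDU("The city closed the bridge"), EDU("and rerouted traffic")],
            roles=["N", "N"],
        ),
        EDU("because inspectors had found cracks."),
    ],
    roles=["N", "S"],
)

print(leaves(tree))
print(relation_counts(tree))
```

Keeping nuclearity as a per-child role (rather than as two fixed "nucleus"/"satellite" fields) lets the same node type represent both mononuclear and multinuclear relations, such as the technical Same-unit relation.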

The dataset also contains concurrent annotations of five double-annotated documents.

The original texts are drawn from the data annotated in the Prague Dependency Treebank (Hajič et al. 2020), although the two annotation projects are independent.

The annotation in the Czech RST Discourse Treebank is largely based on the version of RST used for the Potsdam Commentary Corpus, which is documented in two annotation guidelines, one in English (Stede et al. 2017) and one in German (Stede et al. 2016).

The annotation scheme for the Czech treebank is described in the Annotation Manual (in Czech, available upon request). Compared to the version of Stede et al. (2017), the Czech guidelines differ in the following main points:

  • Segmentation: segmentation of discontinuous units, relative clauses, attribution and reported content
  • Structure: changes resulting from the new segmentation principles, and some further constraints on the tree structure
  • Relation inventory: five new labels were introduced (mostly to handle reversed nuclearity), giving 36 rhetorical relations in total plus one technical relation (Same-unit).


References

Gessler, L., Liu, Y. and Zeldes, A. (2019). A Discourse Signal Annotation System for RST Trees. In: Proceedings of the Workshop on Discourse Relation Parsing and Treebanking (DISRPT 2019). Minneapolis, MN, pp. 56-61.

Hajič, J. et al. (2020). Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0). Data/software, LINDAT-CLARIAH. URL: http://hdl.handle.net/11234/1-3185.

Mann, W. C. and Thompson, S. A. (1988). Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3), 243-281.

Stede, M., Taboada, M. and Das, D. (2017). Annotation Guidelines for Rhetorical Structure. University of Potsdam and Simon Fraser University.

Stede, M. et al. (2016). Handbuch Textannotation: Potsdamer Kommentarkorpus 2.0. Potsdam Cognitive Science Series 8. Universitätsverlag Potsdam.

Taboada, M. and Mann, W. C. (2006). Rhetorical Structure Theory: Looking back and moving ahead. Discourse Studies, 8(3), 423-459.

Project webpage: https://ufal.mff.cuni.cz/grants/global-coherence
rstWeb User Guide: https://corpling.uis.georgetown.edu/rstweb/info
Rhetorical Structure Theory webpage: https://www.sfu.ca/rst


RST analysis of an example text: