Czech RST Discourse Treebank 1.0

The Czech RST Discourse Treebank 1.0 (CzRST-DT 1.0; Poláková et al., 2023) is a dataset of 54 Czech journalistic texts manually annotated according to Rhetorical Structure Theory (RST; Mann and Thompson, 1988). Each document in the treebank is represented as a single tree-like structure whose nodes (discourse units) are interconnected by hierarchical rhetorical relations.

The dataset also contains the concurrent annotations for five documents that were annotated twice.

The source texts are part of the data annotated in the Prague Dependency Treebank (Hajič et al., 2020), but the two annotation projects are independent of each other.
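
To make the tree representation described above more concrete, the following minimal Python sketch models one document as a tree of discourse units. It is only an illustration under stated assumptions: the class DiscourseNode, its fields, the relation labels ("elaboration", "joint") and the English toy sentences are all hypothetical and do not reflect the actual file format or label set of CzRST-DT 1.0.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiscourseNode:
    """One node of an RST-style tree (hypothetical; not the CzRST-DT file format)."""
    relation: Optional[str] = None   # relation label attached to this node (assumed labels)
    nuclearity: str = "nucleus"      # "nucleus" or "satellite"
    text: Optional[str] = None       # filled in only for leaves (elementary discourse units)
    children: List["DiscourseNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A node without children is an elementary discourse unit.
        return not self.children

    def leaves(self) -> List[str]:
        """Return the texts of all elementary discourse units, left to right."""
        if self.is_leaf():
            return [self.text or ""]
        result: List[str] = []
        for child in self.children:
            result.extend(child.leaves())
        return result

    def depth(self) -> int:
        """Number of levels in the subtree rooted at this node."""
        if self.is_leaf():
            return 1
        return 1 + max(child.depth() for child in self.children)


# Toy document: the root spans the whole text; the second child is a
# satellite that elaborates on the first and is itself split into two units.
tree = DiscourseNode(children=[
    DiscourseNode(nuclearity="nucleus", text="The treebank contains 54 texts."),
    DiscourseNode(relation="elaboration", nuclearity="satellite", children=[
        DiscourseNode(relation="joint", text="All of them are journalistic,"),
        DiscourseNode(relation="joint", text="and five were annotated twice."),
    ]),
])

print(tree.leaves())
print(tree.depth())
```

In this sketch the whole document hangs under a single root node, which mirrors the "one tree per document" property stated above; how relations and nuclearity are actually encoded in the released files should be checked against the dataset documentation.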

References

Jan Hajič, Eduard Bejček, Alevtina Bémová, Eva Buráňová, Eva Fučíková, Eva Hajičová, Jiří Havelka, Jaroslava Hlaváčová, Petr Homola, Pavel Ircing, Jiří Kárník, Václava Kettnerová, Natalia Klyueva, Veronika Kolářová, Lucie Kučová, Markéta Lopatková, David Mareček, Marie Mikulová, Jiří Mírovský, Anna Nedoluzhko, Michal Novák, Petr Pajas, Jarmila Panevová, Nino Peterek, Lucie Poláková, Martin Popel, Jan Popelka, Jan Romportl, Magdaléna Rysová, Jiří Semecký, Petr Sgall, Johanka Spoustová, Milan Straka, Pavel Straňák, Pavlína Synková, Magda Ševčíková, Jana Šindlerová, Jan Štěpánek, Barbora Štěpánková, Josef Toman, Zdeňka Urešová, Barbora Vidová Hladká, Daniel Zeman, Šárka Zikánová, Zdeněk Žabokrtský: Prague Dependency Treebank - Consolidated 1.0 (PDT-C 1.0). Data/software, LINDAT-CLARIAH, URL: http://hdl.handle.net/11234/1-3185, 2020.

William C. Mann and Sandra A. Thompson: Rhetorical Structure Theory: Toward a functional theory of text organization. Text - Interdisciplinary Journal for the Study of Discourse 8 (3), pp. 243-281, 1988.

Lucie Poláková, Šárka Zikánová, Jiří Mírovský, Eva Hajičová: Czech RST Discourse Treebank 1.0. Data/software, ÚFAL MFF UK, Prague, Czech Republic, LINDAT/CLARIAH-CZ: http://hdl.handle.net/11234/1-5174, June 2023.