Principal investigator (ÚFAL): 
Provider: 
Grant id: 20-09853S
Duration: 2020 - 2022

Global Coherence

Global Coherence of Czech Texts in the Corpus-Based Perspective

The project aims at a theoretical and corpus-based representation of global coherence in Czech written texts. Global coherence assumes a hierarchical representation of smaller text units (clauses, sentences) and larger ones (e.g. paragraphs), and the existence of coherence relations between these units at all levels of the hierarchy. A single interconnected representation of the entire document is also postulated. As a first step, up-to-date linguistic frameworks for global coherence analysis are critically evaluated. We benefit from our own long-term experience with describing various linguistic aspects of local coherence. Next, we will design a suitable scenario for representing global coherence with corpus methods and conduct a pilot annotation. The project combines and extends both the line of research on discourse and coherence at ÚFAL and recent advances in the international discourse-oriented community.

2020

In the first stage of the project, we concentrated on mutual configurations and hierarchical structures of local discourse relations (in the local analytic approach), as compared to the principles of a global approach such as Rhetorical Structure Theory (RST). Using qualitative and quantitative corpus methods and an advanced querying system, we have described the ways in which, and the extent to which, Czech data annotated for local coherence display features of higher text structure/global coherence. A first step of this research was published this year (Poláková and Mírovský, TSD 2020, see below).

On the basis of these findings, and with respect to the underlying theories and analytical methods in coherence processing, we have examined on real texts the adequacy of some principles of the local and global approaches to describing discourse coherence, such as the tree-like representation of documents (RST) or the minimality principle (Penn Discourse Treebank, PDTB). The findings for Czech data are quite similar to those published earlier for English data: very few configurations of pairs of local discourse relations in fact break the tree-ness constraint applied in RST. The most decisive factor here is the definition of a discourse unit (argument) in each theoretical framework, together with the annotators' biases in the local, incrementally proceeding analysis vs. the global perspective. A specific role is also played by the treatment of cues/signals of these relations, in our case specifically the treatment of secondary connectives (connective phrases).

We have further explored the role of long-distance (mostly anaphoric) relations and connectives, which, in a different (global) analytic perspective, can be regarded as relations between large discourse units, i.e. relations of higher structure (Poláková et al., LREC 2020). We have also studied specific connective roles of the most common focalizers, which play a role in thematic progressions of a text and also function as operators in discourse relations (Hajičová, Mírovský, Štěpánková, PBML 2020).

Publications in 2020:

Lucie Poláková, Jiří Mírovský (2020): Mining Local Discourse Annotation for Features of Global Discourse Structure. In: 23rd International Conference on Text, Speech and Dialogue, pp. 50-60, Springer, Cham, Switzerland, ISBN 978-3-030-58322-4, https://www.springer.com/gp/book/9783030583224#aboutBook

Eva Hajičová, Jiří Mírovský, Barbora Štěpánková (2020): Focalizers and Discourse Relations. In: The Prague Bulletin of Mathematical Linguistics, ISSN 0032-6585, 115, pp. 187-197, https://ufal.mff.cuni.cz/pbml/115/art-hajicova-mirovsky-stepankova.pdf

Lucie Poláková, Kateřina Rysová, Magdaléna Rysová, Jiří Mírovský (2020): GeCzLex: Lexicon of Czech and German Anaphoric Connectives. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), pp. 1082-1089, European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4, http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.137.pdf

Presentations in 2020:

Workshop: "Explicit and implicit coherence relations: Different, but how exactly?", Humboldt-Universität zu Berlin, Germany, January 17-18, 2020:

Lucie Poláková: Implicit relation questions surfacing in Prague discourse projects

Šárka Zikánová: Factors influencing implicit discourse relations in Czech

Annual Meeting of Societas Linguistica Europaea (SLE 2020), August 27:

Eva Hajičová: Focalizers and discourse relations