Discourse relations are marked by connectives and by non-connective signals such as specific syntactic constructions (e.g., inversion or clefts), sense relations between lexemes, or punctuation. However, unlike connectives, non-connective signals can co-occur with discourse relations without contributing to the signalling of the relation. In fact, they can do so even if they seem to present evidence against the actual discourse relation (Hoek et al. 2019, Zeldes and Liu 2020), as in the case of antonymy in an ELABORATION relation. The annotation of discourse signals in most previous work follows a principle of ‘signal relevance’, annotating only the signals deemed relevant for the interpretation of the relation: in the Penn Discourse Treebank (Webber et al. 2018), those signals comprise connectives as well as non-connective lexical and syntactic signals; in the RST Signalling Corpus (Das et al. 2015), the range of such signals is much wider (see Poláková et al. 2017 for a comparison). Crible (2022) criticises this strategy and advocates annotating all the signals present in a relation.
This discussion raises theoretical and practical issues, starting with the general question of what exactly makes a signal relevant for a relation. From a more practical point of view, one needs to investigate whether it is really necessary to annotate the full range of signals: if the ‘relevant’ signals alone can predict the eventual correlations between signals and relations, there would be no need to invest additional effort in annotating the full range. At the same time, it would be interesting to have an estimate of the effort needed to extend annotations of relevant signals to include all the signals. To shed more light on these issues, we conducted a study annotating the full range of signals in a subcorpus of 1,000 discourse relations of the RST Signalling Corpus. We found that although only one third of the signals had been annotated in the corpus, this had no strong impact on the distribution of signals across discourse relations.
Crible, L. (2020). Weak and strong discourse markers in speech, chat, and writing: Do signals compensate for ambiguity in explicit relations? Discourse Processes, 57, 793–807.
Das, D., Taboada, M., & McFetridge, P. (2015). RST Signalling Corpus LDC2015T10. Linguistic Data Consortium, Philadelphia.
Hoek, J., Zufferey, S., Evers-Vermeul, J., & Sanders, T. (2019). The linguistic marking of coherence relations: Interactions between connectives and segment-internal elements. Pragmatics & Cognition, 25, 275–309.
Poláková, L., Mírovský, J., & Synková, P. (2017). Signalling implicit relations: A PDTB-RST comparison. Dialogue & Discourse, 8, 225–248.
Webber, B., Prasad, R., Lee, A., & Joshi, A. (2018). The Penn Discourse Treebank 3.0 Annotation Manual.
Zeldes, A., & Liu, Y. (2020). A neural approach to discourse relation signal detection. Dialogue & Discourse, 11, 1–33.
Markus Egg is currently Professor of English Language at Humboldt-Universität zu Berlin. He received his doctoral degree from the University of Konstanz in 1993 and his Habilitation from the University of Saarbrücken in 2001. He was previously employed at the IBM Scientific Research Centre in Heidelberg, at the Department of Computational Linguistics at the Universität des Saarlandes, and as an Associate Professor at the Centre for Linguistics and Cognition Groningen at the University of Groningen. His research interests are semantics, syntax, and pragmatics and their interfaces, from both a theoretical and a computational point of view.
*** The talk will be delivered in person (MFF UK, Malostranské nám. 25, 4th floor, room S1) and will be streamed via Zoom. For details on how to join the Zoom meeting, please write to sevcikova et ufal.mff.cuni.cz ***