Monday, 15 May, 2023 - 14:00

Discourse Connective Lexicons: Best Practices and Lessons Learnt

Discourse connectives are words or phrases (e.g., because, as long as, either ... or) that explicitly signal a discourse relation. Although they are prominent in some frameworks (Penn Discourse TreeBank) and less salient in others (Rhetorical Structure Theory), focusing specifically on this group of words has proven beneficial for several downstream tasks (Machine Translation, Argument Mining, Summarisation). Apart from their syntactic heterogeneity (conjunctions, adverbials and, according to some definitions, prepositions as well), a particularly challenging aspect of connectives is that they can be both functionally and semantically ambiguous. We argue that a lexicon of discourse connectives for a particular language, listing (ideally exhaustively) all connectives present in that language and augmented with a set of attributes encoding syntactic and semantic information, is a useful resource for automated connective identification and disambiguation.

We maintain and further populate a platform hosting such connective lexicons for a number of languages. In this talk, I will cover strategies for populating such a lexical resource and augmenting it with information useful for downstream tasks. In addition, I will report on some first experiments in transforming it into a bilingual (and eventually multilingual) resource, from which observations on the cross-lingual distribution of discourse relation senses can be drawn.


Peter Bourgonje has been working on Natural Language Processing for 15 years, both in industry and in academia/applied research. He obtained his PhD on Discourse Processing at Potsdam University (Prof. Manfred Stede), and currently works as an NLP developer at an Antwerp-based company on resume parsing and legal text processing.


*** The talk will be delivered in person (MFF UK, Malostranské nám. 25, 4th floor, room S1) and will be streamed via Zoom. For details on how to join the Zoom meeting, please write to sevcikova et ***