Linguistic expressions form patterns in discourse. Passages of text can be analyzed in terms of the individuals, concepts, times and situations that they introduce to the discourse. In this work, we focus on situation entities, which are expressed at the clause level. In her work on "modes of discourse", Smith (2003) distinguishes, among others, the situation entity types of STATE ("John loves cake"), EVENT ("Mike won the race") and GENERIC SENTENCES ("Lions are carnivores").
In this talk, I will give an overview of our corpus annotation endeavour, which aims to provide a foundation for analyzing discourse at the level of situation entities and discourse modes. In addition to situation entity types, we annotate several underlying aspectual distinctions, which have been studied in part by the NLP community, but for which no large annotated corpora are available to date.
Specifically, knowing the lexical aspectual class of a verb in context is necessary to distinguish STATEs and EVENTs ("like" - stative, vs. "win" - dynamic). Recognizing whether the subject of a clause refers to a kind ("Lions are dangerous") or not ("Simba is cute") helps to identify GENERIC SENTENCES. Finally, habitual sentences express regularities (as in the GENERALIZING SENTENCE "Susie drives to work") rather than one-time events ("Yesterday, Susie cycled to work").
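How these three distinctions might combine into a situation entity type can be sketched as a small rule cascade. This is a hypothetical simplification for illustration only, not the computational models discussed in this talk: the names `ClauseFeatures` and `situation_entity_type`, the boolean encoding of the features, and the order in which the rules are checked are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClauseFeatures:
    """Hand-assigned aspectual features of one clause (illustrative only)."""
    stative: bool       # lexical aspectual class of the verb in context
    kind_subject: bool  # True if the subject refers to a kind
    habitual: bool      # True if the clause expresses a regularity


def situation_entity_type(f: ClauseFeatures) -> str:
    """Map the three features to a situation entity type.

    Assumed rule order: genericity first, then habituality,
    then the stative/dynamic distinction.
    """
    if f.kind_subject:
        return "GENERIC SENTENCE"
    if f.habitual:
        return "GENERALIZING SENTENCE"
    return "STATE" if f.stative else "EVENT"


# Example clauses from the abstract, with features assigned by hand.
examples = {
    "John loves cake": ClauseFeatures(True, False, False),
    "Mike won the race": ClauseFeatures(False, False, False),
    "Lions are carnivores": ClauseFeatures(True, True, False),
    "Susie drives to work": ClauseFeatures(False, False, True),
    "Yesterday, Susie cycled to work": ClauseFeatures(False, False, False),
}

for clause, features in examples.items():
    print(f"{clause!r}: {situation_entity_type(features)}")
```

In this sketch the feature values are given by hand; in practice each of them would itself have to be predicted from the clause in context.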
Our computational models for these three aspectual distinctions all reach above 80% accuracy and, interestingly, perform only slightly worse for verb types not seen in the training data. I will also present some intermediate results for automatically classifying situation entity types, which is ongoing research.
Smith, Carlota S. Modes of discourse: The local structure of texts. Vol. 103. Cambridge University Press, 2003.
For more references and some recent publications, please see our project web site: