Textual coreference is generally taken to mean the use of various linguistic means (pronouns, synonyms, generalising nouns etc.) that function as anaphoric (occasionally cataphoric) reference devices. This reference is realised not by grammatical means alone, but also via context. Textual coreference devices are vague by nature, and identifying a coreferred element purely from context is problematic; our approach is therefore to concentrate, for the time being, only on the most frequent textual coreference devices, i.e. pronouns. The following textual coreference devices are identified:
3rd person personal and possessive pronouns; 1st and 2nd persons are excepted. (In the tectogrammatical tree, personal and possessive pronouns have the single t-lemma #PersPron.)
the demonstrative pronouns ten, ta, to (=that).
newly established nodes with the t-lemma substitute #PersPron, which are added to the tectogrammatical tree in cases of textual ellipsis (textual coreference is not identified here when the added node represents a pronoun in the 1st or 2nd person).
!!! Coreference with newly established nodes is closely linked to the selection of the t-lemma substitute, which in fact depends on the type of coreference (grammatical coreference - textual coreference - the node does not corefer; see Section 12.2, "Ellipsis of the dependent element"). When a dependent valency modification of a noun, adjective or adverb is added to the structure, the working t-lemma #Gen is selected in order to simplify and speed up annotation; any coreference at these nodes is therefore left unrepresented for the time being (see Section 2.4.1, "General arguments and unspecified Actors").
Coreference is also left unrepresented for the time being with pronominal adverbs (tam (=there/thither), sem (=here/hither), tady (=here), tak (=thus) etc.) and with other pronominal expressions.
Cases in which pronouns whose coreference is normally represented (on (=he/it), jeho (=his/its), ten (=that)) do not corefer are described in Section 3.2, "No textual coreference".
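The scope described above can be summarised as a small filter. The following is only a minimal sketch: the TNode record, its person field and the assumption that the demonstratives ten, ta, to share the t-lemma "ten" are hypothetical simplifications introduced for illustration, not the actual node structure of the treebank.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TNode:
    """Simplified, hypothetical stand-in for a tectogrammatical node."""
    t_lemma: str
    person: Optional[int] = None  # 1, 2, 3, or None when not applicable


def is_textual_coref_candidate(node: TNode) -> bool:
    """Return True if textual coreference would be considered at this node."""
    if node.t_lemma == "#PersPron":
        # Covers expressed pronouns and nodes added for textual ellipsis;
        # 1st and 2nd person are excluded.
        return node.person == 3
    if node.t_lemma == "ten":
        # Demonstratives ten/ta/to (assumed here to share the t-lemma "ten").
        return True
    # #Gen, pronominal adverbs and other expressions: unrepresented for now.
    return False
```

A node that passes this filter is only a candidate: a pronoun may still turn out not to corefer, as covered in Section 3.2.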
Transitional type of textual coreference (#Unsp). A transitional type between non-coreference and textual coreference covers cases where the Actor of a verb is not specified and is represented by a newly established node with the t-lemma #Unsp. The coreferred element of the Actor, which is unexpressed at surface level, cannot be precisely determined: the Actor refers to the preceding text rather than to a specific item, and therefore a node with the explicitly anaphoric t-lemma #PersPron is not used. Although the referent of the newly established node is unclear, the group of people (or objects) to which the node refers can be at least partially identified from the context. Cf.:
U Nováků {#Unsp.ACT} dobře vaří. (=They cook well at the Nováks'.)
No explicit coreferred element of the Actor of the verb vařit (=to cook) occurs in the text; however, the context suggests that the Actor is probably the chefs at the Nováks' restaurant. Neither a node with the t-lemma #PersPron (which stands for an explicit coreferred element) nor a node with the t-lemma #Gen (the Actor is not generalised: it can be more closely specified) is therefore selected. Instead, a node with the t-lemma #Unsp is used; no coreference relation is marked in the tree, however.
On this type, see Section 2.4.1, "General arguments and unspecified Actors".
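The three-way choice of t-lemma substitute for an unexpressed Actor can be sketched as a simple lookup. This is an illustration under stated assumptions only: the ActorReference categories and the function name are hypothetical labels for the distinctions drawn in the text, not part of any annotation tool.

```python
from enum import Enum


class ActorReference(Enum):
    """Hypothetical labels for how an unexpressed Actor relates to the text."""
    EXPLICIT_ANTECEDENT = "explicit"  # a specific coreferred element exists
    CONTEXT_ONLY = "context"          # referent only partially identifiable
    GENERAL = "general"               # Actor is generalised


def select_t_lemma_substitute(reference: ActorReference) -> str:
    """Map the kind of reference to the t-lemma substitute described above."""
    mapping = {
        ActorReference.EXPLICIT_ANTECEDENT: "#PersPron",
        ActorReference.CONTEXT_ONLY: "#Unsp",
        ActorReference.GENERAL: "#Gen",
    }
    return mapping[reference]
```

For the example U Nováků dobře vaří, the Actor falls under the context-only case, so the sketch yields #Unsp, and no coreference relation is marked.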
Textual coreference is represented by the attributes coref_text.rf and coref_special (see Section 1, "Representing coreference in the tectogrammatical trees").
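As a rough illustration of how such links might be read, the sketch below follows coref_text.rf references from a pronoun back to the start of its coreference chain. It assumes that coref_text.rf holds a list of identifiers of coreferred nodes; the dictionary layout, the node identifiers and the lemma "Petr" are hypothetical, and coref_special is left out of the sketch.

```python
from typing import Dict, List

# Hypothetical node table: identifier -> attributes. Only the attribute name
# coref_text.rf comes from the text; everything else is assumed.
nodes: Dict[str, dict] = {
    "t1": {"t_lemma": "Petr", "coref_text.rf": []},
    "t2": {"t_lemma": "#PersPron", "coref_text.rf": ["t1"]},
    "t3": {"t_lemma": "#PersPron", "coref_text.rf": ["t2"]},
}


def coreference_chain(table: Dict[str, dict], start_id: str) -> List[str]:
    """Follow the first coref_text.rf link repeatedly, starting at start_id."""
    chain = [start_id]
    seen = {start_id}
    current = start_id
    while True:
        targets = table[current].get("coref_text.rf", [])
        # Stop at a node without links, at an unknown target, or on a cycle.
        if not targets or targets[0] not in table or targets[0] in seen:
            break
        current = targets[0]
        chain.append(current)
        seen.add(current)
    return chain
```

Calling coreference_chain(nodes, "t3") walks from the second pronoun through the first one to the node it ultimately refers to.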