Chapter 3. Rules of annotation

Table of Contents

List of analytical functions
A simple sentence (clause) containing a verb; parts of sentence in a dependency relation
Predicate Pred, Pnom, AuxV
Subject Sb
Attribute Atr, AtrAdv, AdvAtr, AtrAtr, AtrObj, ObjAtr
Object Obj, ObjAtr, AtrObj
Adverbials (and borderline cases)
Complement (verbal attribute) Atv, AtvV
Auxiliary sentence members AuxC, AuxP, AuxZ, AuxO, AuxT, AuxR and AuxY
Graphic symbols (punctuation); the root of the tree AuxS, AuxK, AuxX, AuxG
Ellipses - one-member sentences without a verb
Ellipsis ExD, ExD_Co
One-member sentences without a verb ExD, ExD_Co
Relations between sentences and sentence parts (other than dependency)
Coordination (sentential, of sentence parts) Coord, <afun>_Co
Apposition Apos, <afun>_Ap
Parenthesis
General rules
A 'frozen' parenthesis AuxY_Pa
An independent sentential form (containing a predicate) Pred_Pa
A syntactically incorporated sentence part with <afun>_Pa
A syntactically non-incorporated sentence part or sentential form; ellipsis; an independent sentence part; vocative; interjection ExD_Pa
Complex phenomena
Direct speech
Addresses and names of persons and institutions
Expression with numerals, figures in different functions
Referring words
The boundary line between free and 'bound' dative
Collocations, phraseologisms (phrasemes)
Accompaniment
Reflexive se, si
Phrases of comparison with conjunctions jako (as), než (than)
Foreign words in the text
Bibliographical references
Composed Czech proper names
Freely adjoined sentence parts

In the following text, 'assignment' or 'function' always means the value of the attribute afun. Formulations such as "to suspend node x under node y" or "node x depends on node y" mean that node y is the governing node of node x. In the graphic representation, a governing node always stands higher than its dependent node. (For the definition of the graph of sentence representation, see Analytical level.)

The rules for annotation at the analytical level deal with sentence structure only (i.e., with determining which word depends on which) and with identifying the types of dependency (in terms of analytical functions). The structure of the sentence is captured directly in the graph of the sentence; the analytical function is represented as the value of the attribute afun. The other attributes serve as information for the annotator only: their values either have already been fixed or are to be determined in another way. Besides the values of the attribute afun, annotators may insert only notes, and only into the attributes agreed upon (origap for the whole sentence, mstag for individual nodes). For technical reasons, the graphic representation also contains edges that do not correspond to the traditional concept of grammatical dependency (e.g., coordination, apposition, "distant" dependency if an ellipsis occurs in the sentence, compound (improper) prepositions and conjunctions, etc.). For the sake of simplicity, terms such as "depends", "is suspended", "hangs", etc. are used for these edges as well. Where confusion might arise, it is stated explicitly whether grammatical or "technical" dependency is meant.
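The terminology above can be made concrete with a minimal sketch of a node that carries a word form, an afun value and a link to its governing node. The class name `Node`, its fields and the method `suspend_under` are hypothetical names chosen for illustration; they are not the data format used by the programme GRAPH, and the English word forms are invented sample data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """One node of the sentence graph (hypothetical layout for illustration)."""
    form: str                     # the word form
    afun: str = "???"             # analytical function; "???" = not yet assigned
    parent: Optional["Node"] = field(default=None, repr=False)  # governing node
    children: List["Node"] = field(default_factory=list)        # dependent nodes

    def suspend_under(self, governor: "Node") -> None:
        """Make `governor` the governing node of this node."""
        if self.parent is not None:
            self.parent.children.remove(self)
        self.parent = governor
        governor.children.append(self)

    def depth(self) -> int:
        """Number of edges up to the root; a governing node is always shallower."""
        return 0 if self.parent is None else 1 + self.parent.depth()


# Invented sample sentence "Peter sleeps": the subject depends on the predicate.
root = Node("#", "AuxS")
pred = Node("sleeps", "Pred")
subj = Node("Peter", "Sb")
pred.suspend_under(root)
subj.suspend_under(pred)
```

In this sketch "node x depends on node y" corresponds to `x.parent is y`, and the statement that a governing node stands higher in the graph corresponds to `y.depth() < x.depth()`.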

For manual annotation, the input data are preprocessed by programmes that impose an initial structure on the sentence annotated at the morphological level and convert it to the format required by the programme for manual annotation (GRAPH). The initial structure is trivial: each word is regarded as dependent on the immediately preceding word. The only exception is punctuation marks at the end of the sentence, which depend on the root of the tree. During this initialisation, those functions whose automatic assignment is highly likely to be correct are also preliminarily assigned. For all other nodes the attribute afun receives the value ???, so that the still unprocessed nodes can be identified at first sight in the programme GRAPH.
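The trivial initial structure described above can be sketched as follows. This is an illustration under stated assumptions, not the actual preprocessing programme: the function name `initial_structure`, the dictionary layout, the set `TERMINAL_PUNCTUATION`, the choice to hang the first word on the root, and the decision to pre-assign only AuxS and AuxK are all assumptions made for the example.

```python
from typing import Dict, List

# Assumed set of sentence-final punctuation marks (not taken from the source).
TERMINAL_PUNCTUATION = {".", "?", "!"}


def initial_structure(forms: List[str]) -> List[Dict[str, object]]:
    """Build the trivial initial tree for a tokenised sentence.

    Index 0 is the added root '#'. Every word depends on the node
    immediately preceding it (the first word on the root, by assumption);
    sentence-final punctuation depends on the root.
    """
    nodes: List[Dict[str, object]] = [{"form": "#", "parent": None, "afun": "AuxS"}]

    # Find where the run of sentence-final punctuation begins.
    first_final = len(forms)
    while first_final > 0 and forms[first_final - 1] in TERMINAL_PUNCTUATION:
        first_final -= 1

    for i, form in enumerate(forms, start=1):
        if i > first_final:
            # Final punctuation: hangs on the root; AuxK is pre-assigned here
            # as an assumed example of a "high-probability" function.
            nodes.append({"form": form, "parent": 0, "afun": "AuxK"})
        else:
            # Ordinary word: hangs on its predecessor, function still open.
            nodes.append({"form": form, "parent": i - 1, "afun": "???"})
    return nodes


nodes = initial_structure(["Peter", "sleeps", "."])
```

Nodes whose afun is still `"???"` are exactly those the annotator has yet to process.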

The annotation proceeds in the following way:

  1. The root of the tree is formed by an added node marked with # (see The root of the tree AuxS).

  2. In a prototypically structured sentence, the predicate of the sentence depends on the root together with a node labelled in accordance with the final punctuation mark. If the sentence has no predicate, the "remaining" sentence parts may be suspended here side by side (see point 4); in a compound sentence the main node for coordination depends on the root (see point 5).

  3. We comply with the principles of dependency analysis:

  4. If there is an ellipsis and if an element lacks its governing node, we use a special marker ExD (see Ellipsis ExD, ExD_Co). Essentially, a one-member non-verbal sentence is also regarded as an ellipsis (see One-member sentences without a verb ExD, ExD_Co).

  5. The problems of coordination and apposition (Coord, <afun>_Co and Apos, <afun>_Ap) are solved technically by choosing the so-called main node (a conjunction, a comma etc.) that depends on a higher node; individual members of coordination or apposition constructions then 'depend' on this node (technically).

  6. Parenthetical parts of sentences are denoted by a special affix (see Parenthesis).
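Point 5 above can be illustrated with a small sketch of a coordination in which the conjunction is the main node and the coordinated members hang on it 'technically'. The sample sentence, the row layout `(form, afun, parent index)` and the helper names `members` and `grammatical_governor` are hypothetical and serve only as an illustration; the sketch does not handle modifiers shared by all members.

```python
from typing import List, Optional, Tuple

# (form, afun, index of the governing node); index 0 is the added root '#'.
Row = Tuple[str, str, Optional[int]]

# Invented sample sentence "Peter and Paul sleep."
SENTENCE: List[Row] = [
    ("#", "AuxS", None),
    ("Peter", "Sb_Co", 2),   # member of the coordination
    ("and", "Coord", 4),     # main node of the coordination
    ("Paul", "Sb_Co", 2),    # member of the coordination
    ("sleep", "Pred", 0),
    (".", "AuxK", 0),
]


def members(rows: List[Row], main_index: int) -> List[int]:
    """Indices of the nodes that are members of the construction at main_index."""
    return [
        i for i, (_, afun, parent) in enumerate(rows)
        if parent == main_index and afun.endswith(("_Co", "_Ap"))
    ]


def grammatical_governor(rows: List[Row], index: int) -> Optional[int]:
    """For a _Co/_Ap member, skip the 'technical' Coord/Apos nodes above it."""
    afun, parent = rows[index][1], rows[index][2]
    if not afun.endswith(("_Co", "_Ap")):
        return parent  # ordinary node: the technical parent is the governor
    while parent is not None and rows[parent][1].split("_")[0] in ("Coord", "Apos"):
        parent = rows[parent][2]
    return parent
```

Here both subjects depend technically on the conjunction, while their grammatical governor, found by skipping the main node, is the predicate.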

All rules and functions are described in detail in the corresponding special sections. Moreover, the section Complex phenomena describes solutions for some more complex phenomena that could not be included in those sections.

The rules have been designed and formulated so as to cover as many language phenomena as possible. Owing to the character of natural languages, however, it has been impossible to capture everything, so utterances that do not correspond to the structures described in our rules also occur in the corpus. In such exceptional cases the decision has been left to the individual annotator's linguistic intuition.

The rules for annotation at the analytical level (or the description of the sentence representation at the analytical level and its relation to the morphological level) are based above all on the principles of annotation (Chapter 2, Principles of Annotation), on the demand for consistency, on the need to capture explicitly the relations between all occurrences of word forms in the sentence, and on the assumed representation of the sentence at the tectogrammatical level. Wherever possible they follow the traditional conception of Czech grammar as described especially in V. Šmilauer's book 'Novočeská skladba' (Modern Czech Syntax). It should be pointed out that differences are relatively frequent, owing to the "non-computational" orientation of that handbook, in which a number of quite common phenomena are not described explicitly, systematically, or sometimes even consistently. We also differ from the handbook in understanding the subject as depending on the predicate (see the part Modifying predicate and the part Definition of subject).

To give an example of the complications: the predicate often consists of several words, and we have to capture each of them separately. Therefore, in addition to the function Pred for the main node of the predicate, we also use the symbols AuxV for the auxiliary verb to be (see Compound verb forms) and Pnom for the nominal part of the predicate (see the structure of verbal-nominal predicate).

If a predicate is connected with an infinitive, we label the latter as Obj, which otherwise denotes the classical object (see The structure of the compound verbal predicate). The afun Atr is used not only for the classical attribute, but also for certain elements of compound addresses and names (see Addresses and names of persons and institutions), for items in foreign-language texts (see Foreign words in the text) and for the elements of numerical expressions (which are difficult to represent in any case, since various further problems are involved; see Expression with numerals, figures in different functions). In addition, we also introduce the so-called combined functions (see Combined functions).

Compared with Šmilauer's conception, we also draw the boundaries between individual sentence parts in a slightly different way, particularly those of the object (see Definition of Obj), the adverbial (see Specification of adverbials) and the complement (see Definition of complement).

Some examples illustrating individual rules use graphs that represent parts of sentences roughly as they are rendered during annotation by the programme GRAPH. These graphs show the values of the attributes form (the upper text at a node) and afun (the lower text, below the value of the attribute form).

List of analytical functions

In the following chart all admissible values of the attribute afun are listed.

 

afun     Description
Pred     Predicate, a node not depending on another node; depends on #
Sb       Subject
Obj      Object
Adv      Adverbial
Atv      Complement (so-called determining) technically hung on a non-verbal element
AtvV     Complement (so-called determining) hung on a verb, no second governing node
Atr      Attribute
Pnom     Nominal predicate, or nominal part of a predicate with the copula be
AuxV     Auxiliary verb be
Coord    Coordination node
Apos     Apposition (main node)
AuxT     Reflexive tantum
AuxR     Reflexive, neither Obj nor AuxT; passive reflexive
AuxP     Primary preposition, parts of a secondary preposition
AuxC     Conjunction (subordinating)
AuxO     Redundant or emotional item, 'coreferential' pronoun
AuxZ     Emphasizing word
AuxX     Comma (not serving as a coordinating conjunction)
AuxG     Other graphic symbols, not terminal
AuxY     Adverbs, particles not classed elsewhere
AuxS     Root of the tree (#)
AuxK     Terminal punctuation of a sentence
ExD      A technical value for a deleted item; also for the main element of a sentence without predicate (Externally-Dependent)
AtrAtr   An attribute of any of several preceding (syntactic) nouns
AtrAdv   Structural ambiguity between adverbial and adnominal (hung on a name/noun) dependency without a semantic difference
AdvAtr   Ditto with reverse preference
AtrObj   Structural ambiguity between object and adnominal dependency without a semantic difference
ObjAtr   Ditto with reverse preference
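The chart lends itself to a simple consistency check on annotated data. The following sketch lists the values from the chart and accepts them either bare or with one of the affixes _Co, _Ap or _Pa mentioned in the table of contents. The names `BASE_AFUNS`, `SUFFIXES` and `is_valid_afun` are hypothetical, and allowing at most one affix per value is a simplifying assumption, not a rule stated in this chapter.

```python
# Values of the attribute afun as listed in the chart.
BASE_AFUNS = frozenset({
    "Pred", "Sb", "Obj", "Adv", "Atv", "AtvV", "Atr", "Pnom", "AuxV",
    "Coord", "Apos", "AuxT", "AuxR", "AuxP", "AuxC", "AuxO", "AuxZ",
    "AuxX", "AuxG", "AuxY", "AuxS", "AuxK", "ExD",
    "AtrAtr", "AtrAdv", "AdvAtr", "AtrObj", "ObjAtr",
})

# Affixes for coordination, apposition and parenthesis.
SUFFIXES = ("_Co", "_Ap", "_Pa")


def is_valid_afun(value: str) -> bool:
    """Check a value against the chart, allowing at most one affix (assumption)."""
    for suffix in SUFFIXES:
        if value.endswith(suffix):
            value = value[: -len(suffix)]  # strip the single affix
            break
    return value in BASE_AFUNS
```

The check is case-sensitive, so a misspelling such as `Exd_Pa` is rejected, and the placeholder `???` used for unprocessed nodes is reported as not yet valid.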