In the following text, 'assignment' and 'function' always refer to the value of the attribute afun. Formulations such as “to suspend node x under node y” or “node x depends on node y” mean that node y is the governing node of node x. In the graphic representation a governing node always stands higher than its dependent node. (For the definition of the graph of the sentence representation, see Analytical level.)
The rules for annotation at the analytical level deal with sentence structure only (i.e., with determining which word depends on which) and with identifying the types of dependency (in terms of analytical functions). The structure of the sentence is captured directly in the graph of the sentence, and the analytical function is represented as the value of the attribute afun. The other attributes serve only as information for the annotator: their values either have already been fixed or are to be determined in another way. Besides the values of the attribute afun, annotators may only insert notes into the values of the agreed attributes (origap for the whole sentence, mstag for individual nodes). For technical reasons, the graphic representation also contains edges that do not correspond to the traditional concept of grammatical dependency (e.g., coordination, apposition, “distant” dependency if an ellipsis occurs in the sentence, compound (improper) prepositions and conjunctions, etc.). For the sake of simplicity, terms such as “depends”, “is suspended”, “hangs”, etc. will be used for these edges as well. Where confusion might arise, it will be stated explicitly whether grammatical or “technical” dependency is meant.
For manual annotation, the input data are preprocessed by programmes that impose an initial structure on the sentence annotated at the morphological level and convert it to the format required by the programme for manual annotation (GRAPH). The initial structure is trivial: each word is regarded as dependent on the immediately preceding word. The only exception is punctuation marks at the end of the sentence, which depend on the root of the tree. During this initialisation, those functions whose automatic assignment is highly likely to be correct are also assigned preliminarily. For all other nodes the attribute afun receives the value ???, so that nodes still unprocessed can be identified at first sight in the programme GRAPH.
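The initialisation just described can be illustrated with a minimal Python sketch. It is not the actual preprocessing programme: the `Node` class, the function `initial_tree`, the set of sentence-final punctuation characters, the decision to hang the first word on the root, and the preliminary assignment of AuxK to final punctuation are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence

UNASSIGNED = "???"  # value of afun for nodes that are not yet processed


@dataclass
class Node:
    form: str    # word form (attribute form)
    afun: str    # analytical function (attribute afun)
    parent: int  # index of the governing node; -1 for the root


def initial_tree(tokens: Sequence[str],
                 final_punct: Sequence[str] = (".", "?", "!")) -> List[Node]:
    """Build the trivial initial structure for one sentence."""
    # Index 0 is the added root node, marked with '#'.
    nodes = [Node("#", "AuxS", -1)]
    # Every word depends on the immediately preceding word.
    # Assumption: the first word, which has no predecessor, hangs on the root.
    for i, tok in enumerate(tokens, start=1):
        nodes.append(Node(tok, UNASSIGNED, i - 1))
    # Exception: punctuation at the end of the sentence depends on the root.
    # Assumption: AuxK is one of the functions assigned preliminarily.
    i = len(nodes) - 1
    while i > 0 and nodes[i].form in final_punct:
        nodes[i].parent = 0
        nodes[i].afun = "AuxK"
        i -= 1
    return nodes


tree = initial_tree(["Peter", "sleeps", "."])
for index, node in enumerate(tree):
    print(index, node.form, node.afun, node.parent)
```

In this sketch the annotator's task corresponds to replacing every remaining `???` and correcting the `parent` indices.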
The annotation proceeds in the following way:
1. The root of the tree is formed by an added node marked with # (see The root of the tree AuxS).
2. In a prototypically structured sentence, the predicate of the sentence depends on the root, together with a node labelled in accordance with the final punctuation mark. If the sentence has no predicate, the “remaining” sentence parts may be suspended here side by side (see point 4); in a compound sentence, the main node of the coordination depends on the root (see point 5).
3. We look up the predicate of the sentence (which will be suspended according to point 2, see Predicate Pred, Pnom, AuxV); we find the subject of the sentence (in our conception it depends on the predicate, see Subject Sb); and we determine the modifying elements of verbal complements (Atr, Obj, Adv, Atv, AtvV). All other (remaining) nodes must also be made dependent, sometimes only in a purely technical way (see Auxiliary sentence members AuxC, AuxP, AuxZ, AuxO, AuxT, AuxR and AuxY, and Graphic symbols (punctuation); the root of the tree AuxS, AuxK, AuxX, AuxG).
4. If there is an ellipsis and an element lacks its governing node, we use the special marker ExD (see Ellipsis ExD, ExD_Co). Essentially, a one-member non-verbal sentence is also regarded as an ellipsis (see One-member sentences without a verb ExD, ExD_Co).
5. Coordination and apposition (Coord, <afun>_Co and Apos, <afun>_Ap) are handled technically by choosing a so-called main node (a conjunction, a comma, etc.) that depends on a higher node; the individual members of the coordination or apposition construction then 'depend' (technically) on this main node.
6. Parenthetical parts of sentences are denoted by a special affix (see Parenthesis).
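The technical treatment of coordination and apposition in the list above can be sketched in Python. The idea is that a member labelled <afun>_Co or <afun>_Ap hangs technically on the main node (Coord or Apos), while its grammatical governor is the node on which the main node itself depends. The `Node` tuple, the helper names, the English example sentence and the handling of nested constructions are assumptions made for illustration only.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    form: str
    afun: str
    parent: int  # index of the governing node; -1 for the root


def is_technical_edge(child_afun: str, parent_afun: str) -> bool:
    """True if the edge only links a member to its Coord/Apos main node."""
    co = child_afun.endswith("_Co") and parent_afun.startswith("Coord")
    ap = child_afun.endswith("_Ap") and parent_afun.startswith("Apos")
    return co or ap


def grammatical_governor(nodes: List[Node], i: int) -> int:
    """Follow technical edges upwards until a grammatical governor is found."""
    afun, p = nodes[i].afun, nodes[i].parent
    while p != -1 and is_technical_edge(afun, nodes[p].afun):
        # Assumption: a nested main node carries its own _Co/_Ap suffix,
        # so the same test applies one level higher.
        afun, p = nodes[p].afun, nodes[p].parent
    return p


# Hypothetical example: "Peter and Paul sleep."
sentence = [
    Node("#", "AuxS", -1),
    Node("Peter", "Sb_Co", 2),  # member of the coordination
    Node("and", "Coord", 4),    # main node, hangs on the predicate
    Node("Paul", "Sb_Co", 2),   # member of the coordination
    Node("sleep", "Pred", 0),
    Node(".", "AuxK", 0),
]

print(grammatical_governor(sentence, 1))
print(grammatical_governor(sentence, 3))
```

Both calls return the index of the predicate, although neither subject hangs on it directly in the tree.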
All rules and functions are described in detail in the corresponding sections. In addition, the section Complex phenomena describes solutions to some more complex phenomena that could not be included in the sections above.
The rules have been designed and formulated to cover as many language phenomena as possible. Owing to the character of natural languages, it has of course been impossible to capture everything, so the corpus also contains utterances that do not correspond to the structures described in our rules. In such exceptional cases the decision is left to the individual annotator's linguistic intuition.
The rules for annotation at the analytical level (or the description of the sentence representation at the analytical level and its relation to the morphological level) are based above all on the principles of annotation (Chapter 2., Principles of Annotation), on the demand for consistency, on the need to explicitly capture the relations between all occurrences of word forms in the sentence, and on the assumed representation of the sentence at the tectogrammatical level. Wherever possible they follow the traditional conception of Czech grammar, as described especially in V. Šmilauer's book 'Novočeská skladba' (Modern Czech Syntax). It should be pointed out that differences are relatively frequent, owing to the “non-computational” orientation of that handbook, in which a number of quite common phenomena are not described explicitly, systematically, or sometimes even consistently. We also differ from the handbook in treating the subject as depending on the predicate (see the part Modifying predicate and the part Definition of subject).
To give an example of the complications: the predicate often consists of several words, and we have to capture each of them separately. Therefore, in addition to the function Pred for the main node of the predicate, we also use the symbols AuxV for the auxiliary verb to be (see Compound verb forms) and Pnom for the nominal part of the predicate (see the structure of verbal-nominal predicate).
If a predicate is connected with an infinitive, we label the infinitive as Obj, which otherwise denotes the classical object (see The structure of the compound verbal predicate). The afun Atr is used not only for the classical attribute, but also for certain elements of compound addresses and names (see Addresses and names of persons and institutions), for items in foreign-language texts (see Foreign words in the text) and for the elements of numerical expressions (which are difficult to represent in any case, since various further problems are involved; see Expression with numerals, figures in different functions). In addition, we introduce the so-called combined functions (see Combined functions).
Compared with Šmilauer's conception, we also draw the borderlines between individual sentence parts slightly differently, particularly for the object (see Definition of Obj), the adverbial (see Specification of adverbials) and the complement (see Definition of complement).
Some of the examples illustrating individual rules use graphs that represent parts of sentences roughly as they are rendered by the annotation programme GRAPH. These graphs show the values of the attributes form (the upper text at a node) and afun (the lower text, below the value of the attribute form).
The following chart lists all admissible values of the attribute afun.
|Pred||Predicate, a node not depending on another node; depends on #|
|Atv||Complement (so-called determining) technically hung on a non-verbal element|
|AtvV||Complement (so-called determining) hung on a verb, no second governing node|
|Pnom||Nominal predicate, or nominal part of a predicate with the copula be|
|AuxV||Auxiliary verb be|
|Apos||Apposition (main node)|
|AuxR||Reflexive, neither Obj nor AuxT; passive reflexive|
|AuxP||Primary preposition, parts of a secondary preposition|
|AuxO||Redundant or emotional item, 'coreferential' pronoun|
|AuxX||Comma (not serving as a coordinating conj.)|
|AuxG||Other graphic symbols, not terminal|
|AuxY||Adverbs, particles not classed elsewhere|
|AuxS||Root of the tree (#)|
|AuxK||Terminal punctuation of a sentence|
|ExD||A technical value for a deleted item; also for the main element of a sentence without predicate (Externally-Dependent)|
|AtrAtr||An attribute of any of several preceding (syntactic) nouns|
|AtrAdv||Structural ambiguity between adverbial and adnominal (hung on a name/noun) dependency without a semantic difference|
|AdvAtr||Ditto, with the reverse preference|
|AtrObj||Structural ambiguity between object and adnominal dependency without a semantic difference|
|ObjAtr||Ditto, with the reverse preference|
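
A check of afun values against such an inventory might look like the following Python sketch. The set `BASE_AFUNS` is assembled from the values named in this section (the chart and the running text); the function name `is_known_afun` is hypothetical, and the affix for parenthesis is deliberately left out because this section does not spell it out.

```python
# Base values of afun collected from this section (an assumption of this
# sketch, not an authoritative list).
BASE_AFUNS = {
    "Pred", "Sb", "Obj", "Adv", "Atr", "Atv", "AtvV", "Pnom", "AuxV",
    "Coord", "Apos", "AuxT", "AuxR", "AuxP", "AuxC", "AuxO", "AuxZ",
    "AuxX", "AuxG", "AuxY", "AuxS", "AuxK", "ExD",
    # combined functions
    "AtrAtr", "AtrAdv", "AdvAtr", "AtrObj", "ObjAtr",
}

# Suffixes marking members of coordination and apposition.
SUFFIXES = ("_Co", "_Ap")


def is_known_afun(value: str) -> bool:
    """Return True if value is a base afun, optionally with _Co or _Ap."""
    for suffix in SUFFIXES:
        if value.endswith(suffix):
            value = value[: -len(suffix)]
            break
    return value in BASE_AFUNS


for candidate in ["Sb", "ExD_Co", "Atr_Ap", "Subj", "???"]:
    print(candidate, is_known_afun(candidate))
```

The placeholder ??? is rejected on purpose, which makes the same check usable for finding nodes that have not yet been processed.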