Graphic symbols (punctuation); the root of the tree AuxS, AuxK, AuxX, AuxG

By punctuation we understand all graphic symbols that do not form part of a word or of a number written in numerals. This concerns, above all, the comma, which is treated by special rules and can have different functions (it most usually obtains afun AuxX - see Comma AuxX), and further the fullstop, question mark, exclamation mark, semicolon, colon, parentheses and inverted commas of all kinds, asterisks, slashes etc. All these punctuation marks obtain their afuns according to their position in the sentence: either the punctuation stands at the end of the sentence, separating the given sentence from the subsequent one (see Terminal symbol of the sentence AuxK), or it stands within the sentence (obtaining, as a rule, afun AuxG - see Punctuation, other graphic symbols AuxG). A special case is the additional symbol #, which denotes the root of the tree (see The root of the tree AuxS).

For all punctuation marks (with the exception of the root AuxS, where this cannot happen) the rules concerning ExD hold. If, therefore, as a consequence of an ellipsis, the punctuation occurs in some other position in the tree than normally expected, the afuns AuxK, AuxX, AuxG (in the same way as the afuns of ordinary sentence members) are replaced by the afun ExD.
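The ExD rule for punctuation can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `resolve_punct_afun` and the boolean flag `displaced_by_ellipsis` are hypothetical and are not part of any annotation tool described here.

```python
# Afuns that punctuation normally receives and that ExD may replace.
PUNCT_AFUNS = {"AuxK", "AuxX", "AuxG"}


def resolve_punct_afun(default_afun: str, displaced_by_ellipsis: bool) -> str:
    """Return the afun of a punctuation node.

    default_afun: the afun the mark would normally obtain.
    displaced_by_ellipsis: hypothetical flag, True when an ellipsis forces the
        mark into a position other than the normally expected one.
    """
    if default_afun == "AuxS":
        # The root symbol # is never affected by the ExD rule.
        return "AuxS"
    if default_afun in PUNCT_AFUNS and displaced_by_ellipsis:
        return "ExD"
    return default_afun


print(resolve_punct_afun("AuxG", displaced_by_ellipsis=True))
print(resolve_punct_afun("AuxX", displaced_by_ellipsis=False))
```

Deciding whether a mark really is displaced remains a matter for the annotator; the sketch only records the consequence of that decision.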

The root of the tree AuxS

Every segment subjected to our analysis automatically obtains the additional symbol #. This symbol always stands as the root of its tree and is automatically marked with afun AuxS in advance.

Under ordinary circumstances the predicate of the main clause (with afun Pred) depends on #, as does the final punctuation mark (see Predicate Pred, Pnom, AuxV and Terminal symbol of the sentence AuxK). In incomplete sentences without a predicate, one or more nodes with afun ExD depend on the symbol #; as the case may be, they can be hung there by way of an intermediate AuxP or AuxC (see Ellipsis ExD, ExD_Co).

In some specific cases a subordinating conjunction with afun AuxC can depend on the symbol #, with the predicate (Pred) dependent on the conjunction (see Ellipsis of the governing clause).

All the members mentioned above can, in addition, be coordinated (a node with afun Coord will depend on the root, see Coordination (sentential, of sentence parts) Coord, <afun> _Co), or there can be a parenthesis (the member obtaining a suffix _Pa, see Parenthesis).

  1. image

    Odstoupil  
    He-resigned  
  2. image

    Já   o   voze,   ty   o   koze  
    I   about   carriage   you   about   goat  
    we are at cross purposes

Terminal symbol of the sentence AuxK

This section has only two parts. In the part Current sentences all current terminations of sentences are described, as well as the ways they are treated in our analysis. In the part Titles and ciphers an example of a newspaper title is shown with its position in the tree structure.

Current sentences

The terminal punctuation mark of the sentence obtains a special analytical function AuxK. If there is more than one punctuation mark at the end of the sentence, all of them are handled in this way. This function (as one of the few treated in this way) is assigned to the terminal punctuation already during the preparation of data for manual annotation. It is necessary, however, to check whether the punctuation does not have some other function (e.g., it could be a fullstop following an abbreviation, where the function AuxG should be used) and whether the tree should contain more than one AuxK; such additional AuxK are to be supplied manually according to the following conventions. A normal termination of a sentence by means of a fullstop is illustrated by an example:

  1. image

    Pracuje  
    He-is-working  
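As a rough sketch, the tree for this example can be written down as a small data structure. The `Node` class and the `render` helper are hypothetical names chosen for illustration. The fullstop node is an assumption: the example line prints only the word form, while the text describes a sentence terminated by a fullstop.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    form: str   # word form or graphic symbol
    afun: str   # analytical function
    children: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, one node per line."""
    lines = [f"{'  ' * depth}{node.form} [{node.afun}]"]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# The root # (AuxS) governs both the predicate and the terminal fullstop.
root = Node("#", "AuxS", [Node("Pracuje", "Pred"), Node(".", "AuxK")])
print("\n".join(render(root)))
```

Both the predicate and the terminal punctuation appear as sisters directly under the root.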

If (usually for technical reasons, above all owing to imperfections in the programmes detecting sentence boundaries) it is necessary to represent two sentences within one tree, the function AuxK is also used to mark the punctuation terminating the other sentence. No such tree ought to occur in the resulting corpus (it will be corrected manually). The parentheses that figure in the example depend, in conformity with the general rules for handling punctuation, on the governing member of the part of the sentence they introduce.

  1. image

    Přišel.   (   Ale   proč   !?   )  
    he-came   (   but   why   ?   )  

If at the end of a sentence there is only one punctuation symbol that stands for several symbols (e.g., if a declarative sentence ends with an abbreviation followed by a fullstop, but only one fullstop appears in the text, as is usual according to the rules of orthography), the terminal punctuation of the sentence has the lowest priority: the node in question is regarded as a part of the sentence rather than as the terminal punctuation.

image

Přišel   za   30   min   .  
he-arrived   in   30   min   .  
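The priority rule above can be sketched as a small decision function. The abbreviation list and the function name `final_fullstop` are hypothetical, introduced only for illustration; the outcome (afun AuxG, attachment to the abbreviation) follows what the guidelines state about a fullstop after an abbreviation.

```python
# Hypothetical abbreviation list; a real list would be far longer.
ABBREVIATIONS = {"min", "str", "tj", "např", "apod"}


def final_fullstop(tokens):
    """Decide how a sentence-final fullstop is treated.

    tokens: word forms of the sentence; the root # is not included and
        counts as index 0, so the first token has index 1.
    Returns (afun, head_index).
    """
    if not tokens or tokens[-1] != ".":
        raise ValueError("sentence does not end with a fullstop")
    previous = tokens[-2] if len(tokens) > 1 else None
    if previous is not None and previous.lower() in ABBREVIATIONS:
        # The fullstop belongs to the abbreviation; the terminal reading
        # has the lowest priority.
        return ("AuxG", len(tokens) - 1)
    # Ordinary terminal punctuation hangs on the root.
    return ("AuxK", 0)


print(final_fullstop(["Přišel", "za", "30", "min", "."]))
print(final_fullstop(["Pracuje", "."]))
```

In the first call the head index 4 points at the abbreviation min; in the second the fullstop hangs on the root.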

Typographical conventions require some punctuation symbols to change their natural sequences. Even in such cases we proceed by making the punctuation depend on the node to which it naturally belongs. A non-projective tree results in this way. Let us illustrate this by the following example. The quotation marks in accordance with the general rules of punctuation depend on the governing word of the subtree introduced by them. The terminal fullstop depends on the root of the tree, although in terms of "word-order" the concluding quotation marks may precede.

image

Byl   "dravcem".  
he-was   predator  
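Non-projectivity can be checked mechanically. The sketch below uses the usual definition (an edge is projective if every node lying between the head and the dependent is dominated by the head). The two head lists are assumptions made for illustration: the first takes the tokens in the order printed in the example line, the second assumes a typographic order with the fullstop before the closing quotation mark, and both assume that dravcem depends on Byl.

```python
def is_projective(heads):
    """heads[i] is the index of the parent of node i; heads[0] is None (root #)."""

    def dominated_by(node, ancestor):
        # Walk up from node until the ancestor or the top of the tree is reached.
        while node is not None:
            if node == ancestor:
                return True
            node = heads[node]
        return False

    for child, head in enumerate(heads):
        if head is None:
            continue
        low, high = sorted((child, head))
        for between in range(low + 1, high):
            if not dominated_by(between, head):
                return False
    return True


# Order as printed:  0 #   1 Byl   2 "   3 dravcem   4 "   5 .
natural = [None, 0, 3, 1, 3, 0]
# Assumed order:     0 #   1 Byl   2 "   3 dravcem   4 .   5 "
typographic = [None, 0, 3, 1, 0, 3]

print(is_projective(natural))
print(is_projective(typographic))
```

In the assumed typographic order the edge from dravcem to the closing quotation mark crosses the fullstop, which hangs on the root, so the check reports a non-projective tree.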

In most cases the terminal punctuation depends on the root of the tree (where, by the way, it has often already been placed by the programme for data preparation). However, a "terminal" punctuation mark is treated in a special way if it obviously belongs to some independent clause (e.g., to direct speech or a parenthesis - both are contained in the example); this also holds when the clause stands at the end of the sentence. (In that position the mark in fact fulfils two functions, of which we prefer the function of terminating the parenthesis or direct speech.) In this case the "terminal" punctuation depends on the predicate or some other technically governing node of the clause to which it belongs (it also obtains afun AuxK). If this governing node is missing, the "terminal" punctuation depends on the nearest higher node and obtains afun ExD.

image

Petr   tenkrát   řekl   (vzpomínáte?)   "   Potřebuji   počítač   "  
Petr   then   said   do-you-remember   "   I-need   computer   "  

Titles and ciphers

  1. image

    Praha   (jxk)   -   President   navštívil   metropoli  
    Prague   (jxk)   -   President   visited   metropolis  

Comma AuxX

The comma is used to stand between clauses and to separate the individual members of a coordination or apposition. Except where a comma represents the main node of a coordination or apposition, its function is always AuxX (or ExD). If it represents the main node of a coordination, it obtains afun Coord (see also Coordination (sentential, of sentence parts) Coord, <afun> _Co). If it represents the main node of an apposition, it obtains afun Apos (see also Apposition Apos, <afun>_Ap). The dependence of the comma (on its "governing" node) is technical, not grammatical, as with the rest of punctuation.

1. A comma separating a clause depends on the node (technically) governing the subordinate clause: either a conjunction with the function AuxC (in the case of a subordinate conjunctional clause), or some other governing node of the subordinate clause (usually the predicate, as in a dependent relative clause) with the pertinent function. If the predicate is elided or the conjunction is missing, the comma depends on the nearest superior node and obtains the function ExD (see Ellipsis ExD, ExD_Co). If the subordinate clause is inserted and both of its separating commas are present in the sentence, both depend on the same node.

  1. dům,   který   pláče  
    house   that   is-weeping  
  2. viděl,   že   spí  
    he-saw   that   s/he-was-sleeping  

2. If a comma separates a parenthesis, it depends on the (technically) governing node of the parenthesis. If the (grammatically) governing member of the parenthesis is a node with the value of the function ExD_Pa, then the comma (or both commas if the parenthesis is placed inside the sentence) depends on the nearest higher node and obtains the function ExD (mind: not ExD_Pa). The same holds for any other punctuation separating a parenthesis (on one side or on both sides).

  1. image

    voda   se,   abych   tak   řekl,   umoudřila  
    water   Refl   so-that   so   I-said   grew-wise  
    the water, so to speak, grew wise
  2. image

    před   smrtí,   neznámo   proč,   si   koupil   tramvajenku  
    before   death   unknown   why   Refl   he-bought   tram-pass  
    before dying, for reasons unknown, he bought a tram-pass

3. Commas separating sentences or coordinated sentence parts are hung on the relevant coordinating conjunction or some other coordinating node with the afun Coord (e.g. ale but, a and, neboť since, proto therefore etc.). It may happen that even if such a node of coordination is introduced, we have to make use of the comma: this comma will be the last one in the series and will thus be labelled as Coord. The other delimiting commas will get the afun AuxX and will hang on the last one. A more detailed instruction concerning coordination is given in Coordination (sentential, of sentence parts) Coord, <afun> _Co.

  1. důvěřuje   magii,   mystice   a   jiným   pověrám  
    he-trusts   in-magic,   mystic   and   other   superstitions  

4. If there is a combination of several phenomena, it might happen that some comma is missing in the sentence. More exactly, one comma displays several functions though it is represented by a single node. When deciding which of its functions should be assigned to this node, we hang such a comma "as high as possible". If a coordinating comma with afun Coord cooccurs with some other comma (with the afun AuxX or ExD), the coordination function has a priority. If a coordinating function cooccurs with the function AuxX and other commas, the coordinating comma will hang on the highest node in the tree, on which it may hang. If the cooccurrence of commas takes place on the same level of the tree (e.g. if the comma functions as end of an embedded clause and at the same time as an opening of another clause), the following priorities should be obeyed:

  1. coordinating comma with Coord

  2. comma opening an embedded clause

  3. coordinating comma with the afun AuxX

  4. comma closing an embedded clause
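The priority list above amounts to a simple ranking. A minimal sketch, assuming hypothetical labels for the four functions (the label strings and the function name `dominant_function` are not part of the guidelines):

```python
# Hypothetical labels, ordered from highest to lowest priority.
COMMA_PRIORITY = [
    "coord",          # 1. coordinating comma with Coord
    "opens_clause",   # 2. comma opening an embedded clause
    "coord_auxx",     # 3. coordinating comma with the afun AuxX
    "closes_clause",  # 4. comma closing an embedded clause
]


def dominant_function(candidates):
    """Pick the function that wins when one comma node carries several.

    candidates: non-empty collection of labels from COMMA_PRIORITY.
    Applies only to functions co-occurring on the same level of the tree.
    """
    return min(candidates, key=COMMA_PRIORITY.index)


# A comma that closes one embedded clause and opens another:
print(dominant_function({"closes_clause", "opens_clause"}))
```

For functions on different levels of the tree the rule of hanging the comma "as high as possible" applies instead; the sketch does not model that part.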

Punctuation, other graphic symbols AuxG

The conditions under which a punctuation mark is assigned afun AuxG and the specification of exceptions are described in section Specification of the afun AuxG. The following sections are devoted to individual types of punctuation marks from the point of view of their functions (meaning) and to the representation of these types. The following functions are distinguished: bracketing, closing an embedded part of the sentence (Bracketing AuxG), introductory, delimiting a part of a text (Introducing punctuation (colon)), the punctuation mark after an abbreviation (Punctuation marks after an abbreviation (fullstop)), the punctuation mark after a serial numeral (Punctuation mark after a serial numeral (fullstop)), superfluous punctuation marks (A superfluous punctuation mark (three dots)), the hyphen with -li (Punctuation mark with the conjunction -li (hyphen)), and the hyphen in compounds (Punctuation marks in compound proper names).

Specification of the afun AuxG

In principle, punctuation marks are assigned afun AuxG. This function is assigned (as one of the few) to all punctuation marks (with the exception of the comma and those punctuation marks that stand at the end of the sentence) during the preparation of the data for manual tagging. However, it is always necessary to check whether the punctuation mark also has some other function.

If a punctuation mark is the final punctuation mark of the sentence, it may get afun AuxK (see Terminal symbol of the sentence AuxK); commas are usually assigned afun AuxX (see Comma AuxX). Punctuation marks may also be assigned other afuns, namely Apos and Coord.
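The default assignment made during data preparation can be pictured roughly as follows. This is a sketch under assumptions, not the actual preparation programme: the helper names, the set of sentence-final characters, and the test for "graphic symbol" (a token containing no letter or digit) are all hypothetical simplifications. The fullstop in the first sample sentence is added for illustration.

```python
TERMINAL_CHARS = set(".!?")  # assumption: marks that can terminate a sentence


def is_graphic_symbol(token):
    """Rough stand-in for 'punctuation': no letter or digit in the token."""
    return bool(token) and not any(ch.isalnum() for ch in token)


def is_terminal_mark(token):
    return bool(token) and all(ch in TERMINAL_CHARS for ch in token)


def preassign(tokens):
    """Return default afuns; None means 'left to the annotator'."""
    afuns = [None] * len(tokens)
    end = len(tokens)
    # Every mark in the sentence-final run of terminal marks gets AuxK.
    while end > 0 and is_terminal_mark(tokens[end - 1]):
        end -= 1
        afuns[end] = "AuxK"
    for i in range(end):
        if tokens[i] == ",":
            afuns[i] = "AuxX"
        elif is_graphic_symbol(tokens[i]):
            afuns[i] = "AuxG"
    return afuns


print(preassign(["viděl", ",", "že", "spí", "."]))
print(preassign(['"', "mezní", '"', "případy"]))
```

These defaults are only a starting point: the annotator still has to check for the other functions (Apos, Coord, ExD, or AuxG for a sentence-final fullstop that belongs to an abbreviation).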

The afun Apos is assigned if the punctuation mark introduces the second part of an apposition and if there is no commonly used expression in this position that could be used instead (e.g. tj. i.e., např. e.g., apod. etc.). The bracket, colon or other introductory punctuation mark becomes the node for the apposition (Apos); in the case of a bracket, its second part hangs on this node with the afun AuxG. For more detail on apposition, see Sect. Apposition Apos, <afun>_Ap.

  1. image

    liberální   strana   (LSNS)  
    Liberal   party   (LSNS)  

The afun Coord is assigned in case of punctuation marks for coordination. It can either coordinate two sentences, or two parts of some whole, as is the case with sport results or temporal expressions of the type hh:mm. See also Coordination (sentential, of sentence parts) Coord, <afun> _Co.

  1. image

    Foto:   Přemysl   Toníček  
    Photo:   Přemysl   Toníček  
  2. image

    stav   je   5:1  
    score   is   5:1  
  3. image

      zase   hlasovali   ...,   odhlasovali   si   dovolenou!  
    once   again   they-voted   ...,   they-decided-for   Refl   vacations  
  4. image

    Homolka:   Nové   oddělení   prosperuje  
    Homolka:   New   department   prospers  

Bracketing AuxG

The node for a "bracketing" punctuation (all kinds of brackets and quotation marks, or commas, as the case may be, if they have only a "bracketing" function) depends on the governing node of the introduced (bracketed) part of the sentence:

image

"mezní"   případy  
"boundary"   cases  

If the governing member of the bracketed part is missing, the mechanism of ExD is used for the sentence members within the bracketed part as well as for the punctuation marks. The nodes for the punctuation marks thus hang on the next higher node, and get the afun ExD.

image

"Borůvky,"   odpověděla  
"Blueberries,"   she-answered  

If, however, the governing word of the bracketed part is present, the function of the punctuation mark remains unchanged, even if the governing word itself has afun ExD (this is the case when some grammatically superordinated member is deleted, but the punctuation mark would not be dependent on this node).

image

Přijde   a   hned:   "Kde   je   večeře?"  
he-comes   and   immediately:   "Where   is   dinner?"  

If a dependent clause (most often direct speech) is interrupted by an introductory main clause, all inverted commas depend (in many cases in a non-projective way) on the governing member of the direct speech they introduce:

image

"Karel,"   řekl   Tonda,   "nepřišel".  
"Karel,"   said   Tonda,   "didn't-come."  

It may happen that the bracketing punctuation delimits more than two members, without a deletion being present. "Brackets" are then hung on the governing nodes of the boundary members of the delimited part of the sentence:

image

prolínání   "velmi   vznešeného   s   pokleslým"  
intermixing   "of-very   noble   with   lower"  

For bracketing punctuation marks with direct speech, see Direct speech.

Introducing punctuation (colon)

Direct speech or listings connected by coordination are usually delimited from the part that introduces them. Most frequently, this delimiting function is carried by a colon. In what follows we call punctuation marks of this type introducing punctuation.

Introducing punctuation depends on the governing node of the introduced part (direct speech, listing). Also in this case, the rule for ExD can be applied: if the governing node is missing, the introducing punctuation mark is hung on the next higher node and it gets the afun ExD.

  1. image

    řekla:   "zítra   tady   končíme."  
    she-said:   "tomorrow   here   we-finish"  
  2. image

    zasadili:   brambory,   cibuli   a   česnek,  
    they-planted:   potatoes,   onion   and   garlic  
  3. image

    řekla:   "Tonda   Karlovi."  
    she-said:   "Tonda   to-Charles"  

Punctuation marks after an abbreviation (fullstop)

An abbreviation is usually terminated by a fullstop. This fullstop depends directly on the node for the abbreviation.

image

na   str.   4  
on   p.   4  

Punctuation mark after a serial numeral (fullstop)

Fullstop after a serial numeral depends directly on the serial numeral.

image

4. 12. 1997
12/4/1997

A superfluous punctuation mark (three dots)

Superfluous marks that occur in the sentence depend on the nearest suitable node. If none of the nodes is supposed to be suitable for that purpose, these marks are hung as high as possible, but in a way that does not violate the projectivity of the tree.

Superfluous marks occur most frequently in the positions of deleted words (three dots). Three dots, however, often occur also as a part of coordination or even as the governing node of a coordination.

  1. image

    jděte   všichni   do   ...  
    go   all   to   ...  
  2. image

    uzrály   a)   švestky,   b)   meruňky,  
    ripened   a)   plums,   b)   apricots  
  3. image

    včera   jsem   *   &   byl   v   té   @   *   naší   +   -   hospůdce  
    yesterday   I-was   *   &   was   in   that   @   *   our   +   -   pub  

Punctuation mark with the conjunction -li (hyphen)

The hyphen before li depends directly on the conjunction li, see representation of the conjunction -li.

obrátí-li   ho  
if-he-turns   him  

Punctuation marks in compound proper names

Punctuation marks (most frequently a hyphen) in compound proper names (also in those of foreign origin) depend on the modifying nodes of the main node from the 'inner' side. More detailed instructions for the representation of compounds are given in Composed Czech proper names, or, as the case may be, in Foreign words in the text.

image

Anna - Marie