Foreign words in the text

The representation of text segments written in a foreign language does not follow the syntactic structure of those segments, because the correct structure would often be difficult to recognize. This also holds for cases in which the structure is clear.

After the string of words of foreign origin has been identified (it may be a proper name, a collocation, or even a whole sentence), its "main" node (the "representative") is selected and assigned an afun according to the function of the string in the sentence. If the string is not part of any syntactic structure, it is assigned afun ExD and is suspended on the initial symbol #; it follows that a foreign word can never get afun Pred, even if the string contains a predicate (see ex. (6) below).

The choice of the representative depends on the judgement of the annotator. The first candidate is the last (rightmost) element of the string. However, especially with names and titles, this function may be taken over by some other word that, in contrast to the last word, is inflected. In a concrete example, this inflection need not be evident (e.g. the name may be in the nominative), but again, the judgement of the annotator is decisive. If more than one word is inflected, the last inflected word is regarded as the representative. The remaining words in the string are suspended on the representative as sisters of each other and are assigned afun Atr.
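The selection rule above can be sketched in Python. This is only an illustration, not part of the annotation scheme: the `Node` class, its field names, and the function `attach_foreign_string` are hypothetical, and the `inflected` flag stands in for the annotator's judgement, which the code cannot replace. Graphic symbols are left out of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One word of a foreign string (hypothetical structure for illustration)."""
    form: str
    inflected: bool = False        # the annotator's judgement, supplied as input
    afun: Optional[str] = None
    parent: Optional[int] = None   # index of the governing word inside the string;
                                   # None means the node attaches outside the string


def attach_foreign_string(nodes: List[Node], string_afun: str) -> int:
    """Pick the representative and hang the remaining words on it as Atr.

    string_afun is the afun of the whole string in the sentence
    (e.g. "ExD" when the string is not part of any syntactic structure).
    Returns the index of the representative.
    """
    if not nodes:
        raise ValueError("a foreign string needs at least one word")

    # Default candidate: the last (rightmost) word.
    # If any words are marked as inflected, the last inflected word wins.
    inflected = [i for i, node in enumerate(nodes) if node.inflected]
    rep = inflected[-1] if inflected else len(nodes) - 1

    for i, node in enumerate(nodes):
        if i == rep:
            node.afun = string_afun
            node.parent = None
        else:
            node.afun = "Atr"      # all other words are sisters under the representative
            node.parent = rep
    return rep


# The string of ex. (6), treated as standing outside any syntactic structure.
quo_vadis = [Node("Quo"), Node("vadis")]
rep_index = attach_foreign_string(quo_vadis, "ExD")
```

In this sketch the representative of "Quo vadis" is the last word, which receives ExD, while "Quo" is attached to it as Atr. Marking an earlier word as inflected moves the representative to the last word so marked.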

If graphic symbols are present in the string, they are suspended on the attributes and assigned afun AuxG; they are placed "towards" the "main" node, i.e. they are suspended on that of its neighbouring words that is further from the "main" word. It is not easy to find such a word, though. If some symbol occurs at the very beginning or at the very end of the string, it is suspended directly on the representative (cf. the question mark in ex. (7)).
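One possible reading of this rule for graphic symbols is sketched below in Python. The function `attach_symbols` and its parameters are hypothetical, and the handling of string-internal symbols is an interpretation of the wording above (the symbol hangs on whichever neighbouring word lies further from the representative), not a confirmed procedure. Every symbol placed this way would receive afun AuxG.

```python
from typing import Dict, List, Set


def attach_symbols(tokens: List[str], symbol_idx: Set[int], rep: int) -> Dict[int, int]:
    """Map each graphic symbol (by index) to the index of the word it hangs on.

    tokens     -- all tokens of the foreign string, words and symbols alike
    symbol_idx -- indices of the tokens that are graphic symbols
    rep        -- index of the representative (must be a word)
    """
    if rep in symbol_idx:
        raise ValueError("the representative must be a word, not a symbol")

    words = [i for i in range(len(tokens)) if i not in symbol_idx]
    parents: Dict[int, int] = {}
    for s in sorted(symbol_idx):
        left = [w for w in words if w < s]
        right = [w for w in words if w > s]
        if not left or not right:
            # Symbol at the very beginning or end of the string:
            # it is suspended directly on the representative.
            parents[s] = rep
        else:
            # String-internal symbol: choose the neighbouring word
            # that lies further from the representative.
            l, r = left[-1], right[0]
            parents[s] = l if abs(l - rep) > abs(r - rep) else r
    return parents


# Ex. (7): the question mark closes the string "Quo vadis ?",
# whose representative is taken here to be "vadis" (index 1).
edge_case = attach_symbols(["Quo", "vadis", "?"], {2}, rep=1)

# Abstract string with an internal comma; the representative is the last word.
inner_case = attach_symbols(["a", ",", "b", "c"], {1}, rep=3)
```

In the first call the question mark is attached to the representative, as described for ex. (7). In the second call the comma is attached to "a", the neighbour further from the representative "c".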

A foreign word may sometimes be superordinate to other nodes, which means that nodes other than those with afun Atr (and AuxG) may also be suspended on the representative. This is exemplified by the punctuation marks in (8).

  1. image

    zeptali   se   Ali-fattah   Bukatelam   Hasran   Hubejního  
    they-asked   Refl   Ali-fattah   Bukatelam   Hasran   Hubejni  
  2. image

    Kim Ir-Sen
  3. image

    baronka   von   Klos  
    baroness   von   Klos  
  4. image

    baronka von Klosová
    baroness von Klosová (with the feminine ending of proper names -ová)
  5. image

  6. image

    Quo vadis ?
  7. image

    nesla   knihu   Quo   vadis?  
    she-brought   book   Quo   vadis?  
  8. image

    řekl:   "   Čo   bolo,   to   bolo   ."  
    he-said   "   Čo   bolo,   to   bolo   ."  
    (Slovak for: what was that was)  

Analysis of (3) and (4): In both examples, the foreign string consists of two nodes: [von Klos] and [von Klosová], respectively. In (3), this string is an attribute of the word baronka, while in (4) the name is the representative and the word baronka is its attribute (as a title).

Analysis of (6) and (7): In (6) the foreign component consists of two nodes: [quo vadis]. The question mark indicates that the given string is an interrogative sentence; the question mark does not belong to the foreign string and it is therefore suspended on the symbol #. In (7), the question mark belongs to the foreign string (it is included in the title of the book), and as such is suspended on the representative.

Analysis of (8): The foreign string consists of five nodes: [Čo bolo, to bolo]. The node "bolo" is selected as the representative. The colon, the full stop and the inverted commas are considered to be component parts of the (Czech) text and are therefore suspended under the main node of the direct speech (according to the general rules for direct speech).