Chapter 2. Basic principles of sentence representation at the tectogrammatical level

Natural language is an extraordinarily complex system; therefore, it is useful to decompose its description into several layers. The highest level in the framework of the Functional Generative Description (FGD), which serves as the theoretical basis for PDT, is called the tectogrammatical level and is supposed to represent the semantic structure of the sentence. The tectogrammatical level in PDT is based on the ideas developed in FGD; in a number of details, though, it is modified or supplemented.

The tectogrammatical level in PDT is governed by the following principles:

Tectogrammatical trees have these basic properties:

The following terms are also used when talking about tectogrammatical trees (they are explained here only informally):

Technical root node of a tectogrammatical tree. The root node of a tectogrammatical tree is a node with no linguistic interpretation; it serves only technical purposes (e.g. it bears the sentence identifier). It always has exactly one daughter node. This node is called the technical root node of the tectogrammatical tree. Unless stated otherwise, the technical root node is not counted among the tectogrammatical tree nodes discussed in the rest of the text.

Fig. 2.1: The technical root node of the tectogrammatical tree is the highest node; its only daughter node is connected to it by a thin dotted line. The value of the nodetype attribute of the technical root node is root; the technical root node also has the id attribute, which identifies the sentence in the corpus.

Mother node. Node X is the mother of node Y if there is an edge between X and Y and X is closer to the technical root node of the tree (i.e. X is higher in the tree).

Fig. 2.1: The mother of the node representing the expression (starý) sultán is the node for a.

Immediate daughter node. Node X is an immediate daughter of node Y if Y is the mother of X.

Since tectogrammatical trees make use of linear ordering, there are right and left daughter nodes. A right (left) immediate daughter of node M is an immediate daughter that occurs to the right (left) of node M.

Fig. 2.1: The immediate daughter nodes of the node representing the verb vystřídat se are these three nodes: the node for the conjunction a, the newly established node for the Patient and the node for the prepositional phrase na trůnu. All immediate daughter nodes of vystřídat se are left daughters.
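The mother and daughter relations and the left/right distinction can be illustrated with a minimal Python sketch. The Node class, its attribute names and the order values are assumptions made for illustration only; they are not the PDT data format. The order values are chosen so that they agree with what is said about Fig. 2.1.

```python
class Node:
    """Minimal tree node; `order` is the node's position in the linear order."""

    def __init__(self, label, order, mother=None):
        self.label = label
        self.order = order
        self.mother = mother
        self.daughters = []
        if mother is not None:
            mother.daughters.append(self)

    def left_daughters(self):
        """Immediate daughters that precede this node in the linear order."""
        left = [d for d in self.daughters if d.order < self.order]
        return sorted(left, key=lambda d: d.order)

    def right_daughters(self):
        """Immediate daughters that follow this node in the linear order."""
        right = [d for d in self.daughters if d.order > self.order]
        return sorted(right, key=lambda d: d.order)


# Fragment of Fig. 2.1; the order values are assumed for illustration.
verb = Node("vystřídat se", 8)
conj = Node("a", 3, verb)
patient = Node("new Patient node", 6, verb)
throne = Node("na trůnu", 7, verb)

print([d.label for d in verb.left_daughters()])   # ['a', 'new Patient node', 'na trůnu']
print([d.label for d in verb.right_daughters()])  # []
```

In this sketch a daughter is classified only by comparing its position with the position of its mother, which is all the definition above requires.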

Governing/dependent node. If nodes X and Y (or: the expressions represented by them) are in a dependency relation in which Y depends on X, X is the governing node of Y and Y is the dependent node of X. The governing node does not have to be the mother node of the dependent node (a single node can even have more than one governing node), and the dependent node does not have to be an immediate daughter of its governing node (see also Section 1, "Dependency"). (In the technical documentation for PDT, the terms "effective mother node" and "effective daughter node" are used for this type of relation.)

Fig. 2.1: The governing node of the node for starý is the node for sultán (which is also its mother node). The governing node of the node for sultán is the node representing the verb vystřídat se (which is not its mother node).
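The difference between the mother node and the governing node can be sketched in Python for the simple case of coordination. This is a simplified illustration, not the procedure used in PDT: the Node class, the nodetype value "coap" for a coordination root and the is_member flag for coordination members are assumptions of the sketch, and other constructions are ignored. The function is meant for nodes that take part in dependency relations, not for the coordination root itself.

```python
class Node:
    """Minimal node; nodetype and is_member are assumed names in this sketch."""

    def __init__(self, label, mother=None, nodetype="complex", is_member=False):
        self.label = label
        self.mother = mother
        self.nodetype = nodetype      # "root", "coap" or "complex" here
        self.is_member = is_member    # True for a member of a coordination
        self.daughters = []
        if mother is not None:
            mother.daughters.append(self)


def expand(node):
    """Replace a coordination root by its members (recursively)."""
    if node.nodetype != "coap":
        return [node]
    found = []
    for d in node.daughters:
        if d.is_member:
            found.extend(expand(d))
    return found


def governing_nodes(node):
    """Return the governing nodes of `node` (an empty list if there are none)."""
    current = node
    # A coordination member depends on what the whole coordination depends on,
    # so climb to the root of the (possibly nested) coordination first.
    while current.is_member and current.mother is not None:
        current = current.mother
    mother = current.mother
    if mother is None or mother.nodetype == "root":
        return []
    # If the mother is itself a coordination root, its members govern the node.
    return expand(mother)


tech_root = Node("technical root", nodetype="root")
verb = Node("vystřídat se", tech_root)
conj = Node("a", verb, nodetype="coap")
sultan1 = Node("sultán", conj, is_member=True)
stary = Node("starý", sultan1)
sultan2 = Node("sultán", conj, is_member=True)
novy = Node("nový", sultan2)

print([g.label for g in governing_nodes(stary)])    # ['sultán']
print([g.label for g in governing_nodes(sultan1)])  # ['vystřídat se']
print(sultan1.mother.label)                         # a
```

The last two lines show the case described for Fig. 2.1: the mother of sultán is the node for a, while its governing node is the node for vystřídat se.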

Sister node. Node X is a sister node of node Y if they have the same mother.

Since tectogrammatical trees make use of linear ordering, there are right and left sisters. A right (left) sister node of node M is a sister that occurs to the right (left) of node M.

Fig. 2.1: The sister nodes of the node for a are the newly established node for the Patient of vystřídat se and the node representing the prepositional phrase na trůnu. All the sisters of the node representing the conjunction a are its right sisters.
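The sister relation and its left/right variants can be sketched in the same way. The Node class and the order values below are again assumptions made for illustration; only the fragment of Fig. 2.1 needed for the example is built.

```python
class Node:
    """Minimal tree node; `order` is the node's position in the linear order."""

    def __init__(self, label, order, mother=None):
        self.label = label
        self.order = order
        self.mother = mother
        self.daughters = []
        if mother is not None:
            mother.daughters.append(self)


def sisters(node):
    """All other nodes that have the same mother, in linear order."""
    if node.mother is None:
        return []
    others = [s for s in node.mother.daughters if s is not node]
    return sorted(others, key=lambda s: s.order)


def left_sisters(node):
    return [s for s in sisters(node) if s.order < node.order]


def right_sisters(node):
    return [s for s in sisters(node) if s.order > node.order]


# Fragment of Fig. 2.1; the order values are assumed for illustration.
verb = Node("vystřídat se", 8)
conj = Node("a", 3, verb)
patient = Node("new Patient node", 6, verb)
throne = Node("na trůnu", 7, verb)

print([s.label for s in left_sisters(conj)])   # []
print([s.label for s in right_sisters(conj)])  # ['new Patient node', 'na trůnu']
```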

Path from node M. For the purposes of topic-focus articulation annotation, we also define the terms right (left) path from node M and rightmost (leftmost) path from node M.

A right (left) path from node M is a path in the tree that starts at node M, goes downwards (towards the leaves) and ends in a node that has no right (left) immediate daughters. Node M is not part of the path.

The rightmost (leftmost) path from node M is the right (left) path in the tree on which no node has a right (left) sister.

Fig. 2.1: There is no right path leading from the node for vystřídat se. As for the leftmost path from the node representing vystřídat se, it consists of the nodes for a, sultán and starý.
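The leftmost and rightmost paths can be computed with a short Python sketch. It rests on one reading of the definitions, which is an assumption here: a left (right) path descends through left (right) immediate daughters only, so a node without left (right) daughters has no left (right) path. The Node class, the function name extreme_path and the order values are hypothetical.

```python
class Node:
    """Minimal tree node; `order` is the node's position in the linear order."""

    def __init__(self, label, order, mother=None):
        self.label = label
        self.order = order
        self.mother = mother
        self.daughters = []
        if mother is not None:
            mother.daughters.append(self)


def extreme_path(m, side):
    """Leftmost (side="left") or rightmost (side="right") path from node m.

    Node m itself is not part of the path; an empty list means no such path.
    """
    path = []
    current = m
    while True:
        if side == "left":
            candidates = [d for d in current.daughters if d.order < current.order]
        else:
            candidates = [d for d in current.daughters if d.order > current.order]
        if not candidates:
            return path
        # The outermost daughter on the given side has no sister on that side.
        if side == "left":
            current = min(candidates, key=lambda d: d.order)
        else:
            current = max(candidates, key=lambda d: d.order)
        path.append(current)


# Tree of Fig. 2.1 without the technical root; order values are assumed.
verb = Node("vystřídat se", 8)
conj = Node("a", 3, verb)
sultan1 = Node("sultán", 2, conj)
stary = Node("starý", 1, sultan1)
sultan2 = Node("sultán", 5, conj)
novy = Node("nový", 4, sultan2)
patient = Node("new Patient node", 6, verb)
throne = Node("na trůnu", 7, verb)

print([n.label for n in extreme_path(verb, "left")])   # ['a', 'sultán', 'starý']
print([n.label for n in extreme_path(verb, "right")])  # []
```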

Subtrees. A subtree of a tectogrammatical tree is a continuous subgraph of a tectogrammatical tree (a subset of its nodes and edges with a marked root node).

Root of a subtree. The root of a subtree is the node of the subtree whose mother node (if it exists) is not part of the subtree.

Expression. Linguistically relevant parts of a sentence are called expressions. (Whole sentences are also expressions.)

Root of an expression. "Root of an expression" is short for the root of the subtree representing a given expression.

The root of a sentence is the root of the subtree corresponding to a whole sentence; i.e. it is the (only) direct daughter of the technical root node of the tectogrammatical tree.

Effective root of an expression. The effective root of an expression is a node that either has no governing node in the given tectogrammatical tree or whose governing node is not part of the subtree representing the expression. The effective root of an expression can be identical to the root of the expression, but it need not be. For example, in paratactic structures the root node (there is always exactly one) is not identical to the effective root nodes (of which there are usually several).

Fig. 2.1: The root of the example sentence is the node for vystřídat se. This node is also the effective root of the sentence. The coordination starý sultán a nový sultán is represented by a subtree of the tectogrammatical tree; the root of the subtree (the root of the coordination) is the node representing the conjunction a, the effective root nodes are the two nodes representing the noun sultán.
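The contrast between the root and the effective roots of an expression can be sketched in Python. The sketch is a simplification that handles coordination only. The Node class, the nodetype value "coap" for a coordination root, the is_member flag and all function names are assumptions made for illustration. The coordination root is skipped when effective roots are collected, on the assumption that it does not itself take part in a dependency relation.

```python
class Node:
    """Minimal node; nodetype and is_member are assumed names in this sketch."""

    def __init__(self, label, mother=None, nodetype="complex", is_member=False):
        self.label = label
        self.mother = mother
        self.nodetype = nodetype      # "root", "coap" or "complex" here
        self.is_member = is_member    # True for a member of a coordination
        self.daughters = []
        if mother is not None:
            mother.daughters.append(self)


def expand(node):
    """Replace a coordination root by its members (recursively)."""
    if node.nodetype != "coap":
        return [node]
    return [m for d in node.daughters if d.is_member for m in expand(d)]


def governing_nodes(node):
    """Governing nodes of `node`; coordination is the only case handled."""
    current = node
    while current.is_member and current.mother is not None:
        current = current.mother
    mother = current.mother
    if mother is None or mother.nodetype == "root":
        return []
    return expand(mother)


def subtree_root(nodes):
    """The node of the subtree whose mother is not part of the subtree."""
    tops = [n for n in nodes if n.mother not in nodes]
    assert len(tops) == 1, "the nodes do not form a single subtree"
    return tops[0]


def effective_roots(nodes):
    """Nodes with no governing node inside the subtree (coordination roots skipped)."""
    return [n for n in nodes
            if n.nodetype != "coap"
            and not any(g in nodes for g in governing_nodes(n))]


tech_root = Node("technical root", nodetype="root")
verb = Node("vystřídat se", tech_root)
conj = Node("a", verb, nodetype="coap")
sultan1 = Node("sultán", conj, is_member=True)
stary = Node("starý", sultan1)
sultan2 = Node("sultán", conj, is_member=True)
novy = Node("nový", sultan2)
patient = Node("new Patient node", verb)
throne = Node("na trůnu", verb)

coordination = [conj, sultan1, stary, sultan2, novy]
sentence = [verb, conj, sultan1, stary, sultan2, novy, patient, throne]

print(subtree_root(coordination).label)                  # a
print([n.label for n in effective_roots(coordination)])  # ['sultán', 'sultán']
print(subtree_root(sentence).label)                      # vystřídat se
print([n.label for n in effective_roots(sentence)])      # ['vystřídat se']
```

For the whole sentence the root and the effective root coincide, whereas for the coordination the single root (the node for a) differs from the two effective roots.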

Figure 2.1. Tectogrammatical tree

Starý sultán a nový sultán se vystřídali na trůnu. (=lit. Old sultan and new sultan REFL changed on throne)