Grammatemes

This text presents the main principles of the tectogrammatical representation as applied to English and uses English examples. Features that are not specific to English apply to the Czech tectogrammatical representation as well.

The grammatemes are generated automatically in the current PCEDT 2.0 annotation.

Grammatemes are mostly semantically oriented counterparts of morphological categories such as number, degree of comparison, or tense. The system of grammatemes preserves the cognitive information represented by morphological categories, which would otherwise get lost at the higher level of abstraction (when representing words with their lemmas). Not all tokens have such semantically important morphological categories. Those that have them are marked by nodetype="complex".

Not all grammatemes are relevant for all parts of speech. The complex t-nodes were therefore divided into four groups according to which grammatemes are relevant for them. These groups are called semantic parts of speech and are the following: semantic nouns, semantic adjectives, semantic verbs and semantic adverbs. These groups are not identical with the 'traditional' parts of speech. They reflect basic onomasiological categories of substance, quality, event and circumstance. The semantic part of speech is reflected by the attribute sempos.

For English, the grammatemes have been assigned purely automatically, using POS tags, information about auxiliary words, a list of pronouns, etc. Only a subset of the grammatemes has been introduced so far. A list with explanations follows.

`gram/sempos`

This grammateme renders the semantic part of speech. The following values are recognized for English:

n.denot: associated with gram/number (sg, pl).
adj.denot: associated with gram/negation (adjectives like uncool get gram/negation="neg1") and gram/degcmp. Adjectives with the morphological tag JJR or modified by more get gram/degcmp="comp". Adjectives with the morphological tag JJS or modified by most get gram/degcmp="sup". Adjectives with the morphological tag JJ that are not modified by more or most get gram/degcmp="pos".
adv.denot.grad.neg: associated with gram/negation (adverbs like unfortunately get gram/negation="neg1") and gram/degcmp. Adverbs with the morphological tag RBR or modified by more get gram/degcmp="comp". Adverbs with the morphological tag RBS or modified by most get gram/degcmp="sup".
n.pron.def.pers: This label denotes definite personal pronouns. It is associated with gram/gender, gram/number and gram/person.
adv.pron.indef: These are indefinite pronominal adverbials, such as when, where, why, how.
n.pron.indef: These are pronouns like what, who, whose, but also those, these, both when acting as nouns. The pronouns those, these, both are associated with gram/number="pl", while all others have gram/number="sg". Whenever such a pronoun has a grammatical antecedent (e.g. the girl that I saw yesterday), it is associated with gram/indeftype="relat". The indeftype grammateme has many values in Czech (capturing the many ways in which pronouns can be indefinite), but it takes only this single value in the current English data.
Numerals are covered by n.quant.def (cardinal numbers) and adj.quant.def (ordinal numbers). Container numerals used in the singular (hundred, thousand, million, billion) are associated with gram/number="pl".
All morphological verbs get gram/sempos="v". The grammatemes gram/deontmod, gram/verbmod and gram/tense are relevant for verbs. They get a separate description below.
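The rules given above for n.pron.indef can be sketched in a few lines of Python. This is only an illustration, not the script used for PCEDT 2.0: the function name, the dictionary output and the has_antecedent flag are assumptions made for the example.

```python
# Pronouns that the text associates with gram/number="pl".
PLURAL_PRONOUNS = {"those", "these", "both"}


def indef_pronoun_grammatemes(lemma, has_antecedent=False):
    """Return a dict of grammatemes for an n.pron.indef node (hypothetical helper)."""
    grams = {
        "sempos": "n.pron.indef",
        "number": "pl" if lemma.lower() in PLURAL_PRONOUNS else "sg",
    }
    # A pronoun with a grammatical antecedent is marked as relative.
    if has_antecedent:
        grams["indeftype"] = "relat"
    return grams
```

For the relative clause in the girl that I saw yesterday, the call indef_pronoun_grammatemes("that", has_antecedent=True) yields number "sg" and indeftype "relat".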

`gram/deontmod`

This grammateme reflects verb modality. Verb forms with no modality get the value decl. Combinations of a lexical verb with a modal verb get the following values:

must, have to: deb
should, ought to: hrt
can, cannot, could: poss
may, might: perm

In the current data version there are stray instances of these values:

be able to: fac (12 times)
want: vol (once)

These values were filled in automatically where an annotator mistakenly hid the lexemes be able to and want in a/aux.rf, which annotators were not supposed to do. They therefore mark annotation errors.
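The intended modal-to-value mapping listed above amounts to a simple lookup table. The sketch below assumes that the modal expression has already been identified and is passed in as a string. The function name is hypothetical, and falling back to decl for anything not in the table is an assumption of this example.

```python
# Modal expression -> gram/deontmod value, as listed in the text.
DEONTMOD_BY_MODAL = {
    "must": "deb",
    "have to": "deb",
    "should": "hrt",
    "ought to": "hrt",
    "can": "poss",
    "cannot": "poss",
    "could": "poss",
    "may": "perm",
    "might": "perm",
}


def deontmod(modal=None):
    """Return gram/deontmod for a lexical verb and its optional modal."""
    if modal is None:
        return "decl"  # verb form with no modality
    # Assumption: expressions outside the table are treated as non-modal.
    return DEONTMOD_BY_MODAL.get(modal.lower(), "decl")
```

The erroneous values fac and vol are deliberately left out of the table, since the text describes them as annotation errors.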

`gram/verbmod`

This grammateme renders the verb mood. It has the following values:

ind: infinitives and indicative
cdn: conditional mood expressed by would, should, could, might
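As a rough illustration of the two values above, the following sketch assigns cdn whenever one of the four listed auxiliaries is present and ind otherwise. It assumes the auxiliaries of a verb are available as a list of word forms. Whether every occurrence of should, could or might really triggers cdn in the data is not stated in this text, so that is an assumption too.

```python
# Auxiliaries that the text names as expressing the conditional mood.
CONDITIONAL_AUXILIARIES = {"would", "should", "could", "might"}


def verbmod(aux_forms=()):
    """Return gram/verbmod given the auxiliary forms of a verb (hypothetical helper)."""
    if any(form.lower() in CONDITIONAL_AUXILIARIES for form in aux_forms):
        return "cdn"
    return "ind"  # infinitives and indicative forms
```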

There is a similar attribute called sentmod, which does not belong to the grammatemes. Its values are:

enunc: enunciative
inter: interrogative
desid: desiderative (probably only applicable to Czech)
excl: exclamative
imper: imperative

This attribute is assigned to main predicates in a sentence and is irrelevant for subordinate predicates. A main predicate that has gram/verbmod="ind" naturally also has sentmod="enunc", so in this case the description is somewhat redundant.

`gram/tense`

Czech has only three tense categories. Currently, the same three categories are used for English as well, although the English tense system is much more complex than the Czech one. Non-finite verb forms get the additional value nil:

will, shall, wo (won't), to be going to: post
have -ed and verbs tagged with VBN, VBD: ant
present tense and present progressive tense: sim
non-finite verb form: nil
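The four rules above can be read as a decision procedure over the tokens of a verb group. The sketch below is one possible reading, not the actual PCEDT script. It assumes Penn Treebank tags, a verb group given as a list of (form, tag) pairs, and a precedence in which future markers are checked first; none of these details are stated in the text.

```python
FUTURE_FORMS = {"will", "shall", "wo"}  # "wo" is the tokenized stem of "won't"


def tense(verb_group):
    """Return gram/tense for a verb group given as (form, tag) pairs."""
    forms = [form.lower() for form, _ in verb_group]
    tags = {tag for _, tag in verb_group}

    # Future: will, shall, wo(n't), or "(be) going to".
    going_to = any(forms[i:i + 2] == ["going", "to"] for i in range(len(forms) - 1))
    if FUTURE_FORMS.intersection(forms) or going_to:
        return "post"
    # Past: literal reading of "have -ed and verbs tagged with VBN, VBD".
    if tags & {"VBD", "VBN"}:
        return "ant"
    # Present and present progressive: a finite present-tense form is present.
    if tags & {"VBP", "VBZ"}:
        return "sim"
    # No finite form found. Groups headed by other modals (tag MD) are not
    # covered by the listed rules and also end up here in this sketch.
    return "nil"
```

For example, tense([("is", "VBZ"), ("working", "VBG")]) gives sim, while a bare infinitive such as [("to", "TO"), ("work", "VB")] gives nil.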

`gram/gender`

Gender is indicated in personal and possessive pronouns and is guessed by a separate script for proper nouns. It distinguishes masculine, feminine and neuter. Its values are:

nr: not recognized
fem: feminine
neut: neuter
inan: masculine. The label was adopted from the original grammateme set for Czech, which has two masculine genders: animate and inanimate. It is admittedly an illogical label for English, where masculine is only identified in animate pronouns such as he, his, him and himself.

`gram/negation`

Lexical negation is marked in nouns and adjectives. The prefixes un, in, im, non, dis, il and ir are identified as negation. Note that this does not yet apply to adverbs and verbs (e.g. unexpectedly and unwrap still have gram/negation="neg0"). Verbs negated with the particles not/n't systematically have gram/negation="neg0"; their negation has to be identified through the negation particle, which is attached as a child node.
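A prefix check of this kind might look as follows. The text does not say how words that merely begin with one of these letter sequences (interest, image) are filtered out, so the sketch assumes a lexicon lookup on the remainder of the word. The known_lemmas parameter and the function name are hypothetical.

```python
NEGATION_PREFIXES = ("un", "in", "im", "non", "dis", "il", "ir")


def negation(lemma, sempos, known_lemmas):
    """Return gram/negation for a lemma (hypothetical helper)."""
    # Only semantic nouns and adjectives are marked for lexical negation.
    if not sempos.startswith(("n.", "adj.")):
        return "neg0"
    word = lemma.lower()
    for prefix in NEGATION_PREFIXES:
        # Assumption: the prefix counts as negation only if the remainder
        # is itself a known lemma (e.g. "uncool" -> "cool").
        if word.startswith(prefix) and word[len(prefix):] in known_lemmas:
            return "neg1"
    return "neg0"
```

With known_lemmas = {"cool", "wrap"}, the adjective uncool gets neg1, whereas the verb unwrap and the noun interest both get neg0.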

`gram/number`

This grammateme has the values sg (singular), pl (plural) and nr (not recognized) and is applicable to nouns, pronouns and numerals acting as nouns. It relies not only on the morphological tag but also on grammatical agreement and other clues to identify semantic plural (e.g. in 5 billion euro, both billion and euro are identified as plural forms, although the morphological tag does not indicate it).

`gram/degcmp`

Adjectives with the morphological tag JJR or modified by more get gram/degcmp="comp". Adjectives with the morphological tag JJS or modified by most get gram/degcmp="sup". Other adjectives get gram/degcmp="pos".
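The three rules translate directly into a small function. The sketch assumes that the adjective's tag and the word forms of its modifiers are already known. The function name is an assumption, as is the precedence of comp over sup when both conditions hold.

```python
def degcmp(tag, modifiers=()):
    """Return gram/degcmp for an adjective (hypothetical helper)."""
    mods = {m.lower() for m in modifiers}
    if tag == "JJR" or "more" in mods:
        return "comp"
    if tag == "JJS" or "most" in mods:
        return "sup"
    return "pos"
```

For instance, degcmp("JJ", ["most"]) returns sup for an analytic superlative such as most important.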

`gram/indeftype`

In English, this grammateme only displays the value relat and is found with the relative pronouns that, what, whatever, whereby, which, who and whose.

`gram/person`

This grammateme is assigned to pronouns. The pronouns I, me, we, my, us, our, ours, mine get the value 1. Other values are, accordingly, 2 and 3. A few cases have nr (not recognized).
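A list-based assignment like the one described could be sketched as below. Only the first-person list comes from this text. The second- and third-person lists are filled in as assumptions for the sake of a runnable example, and the function name is hypothetical.

```python
# First-person forms exactly as listed in the text.
FIRST_PERSON = {"i", "me", "we", "my", "us", "our", "ours", "mine"}
# Assumed lists, not taken from the PCEDT documentation.
SECOND_PERSON = {"you", "your", "yours"}
THIRD_PERSON = {
    "he", "him", "his", "she", "her", "hers", "it", "its",
    "they", "them", "their", "theirs",
}


def person(pronoun):
    """Return gram/person for a pronoun form, or "nr" if it is not listed."""
    form = pronoun.lower()
    if form in FIRST_PERSON:
        return "1"
    if form in SECOND_PERSON:
        return "2"
    if form in THIRD_PERSON:
        return "3"
    return "nr"
```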

Grammatemes `dispmod`, `iterativeness` and `resultative`

These grammatemes are not adapted to English and their values do not contribute any information at the moment.
