The grammatemes are generated automatically in the current PCEDT 3.0 annotation.
Grammatemes are mostly semantically oriented counterparts of morphological categories such as number, degree of comparison, or tense. The system of grammatemes preserves the cognitive information represented by morphological categories, which would otherwise get lost at the higher level of abstraction (when representing words with their lemmas). Not all tokens have such semantically important morphological categories. Those that have them are marked by nodetype=complex.
Not all grammatemes are relevant for all parts of speech. The complex t-nodes were therefore divided into four groups according to which grammatemes are relevant for them. These groups are called semantic parts of speech and are the following: semantic nouns, semantic adjectives, semantic verbs and semantic adverbs. These groups are not identical with the 'traditional' parts of speech. They reflect basic onomasiological categories of substance, quality, event and circumstance. The semantic part of speech is reflected by the attribute sempos.
For English, the grammatemes have been inserted purely automatically, using POS tags, information about auxiliary words, a list of pronouns, etc. Only a subset of grammatemes has been introduced so far. A list and explanation follow.
The sempos grammateme renders the semantic part of speech. The following values are recognized for English:
The gram/deontmod grammateme reflects verb modality. Verb forms with no modality get the value decl. Combinations of a lexical verb with a modal verb get the following values:
The current data version contains stray instances of two further values: fac (12 times) and vol (once). They were filled in automatically wherever an annotator accidentally hid the lexemes be able to and want in a/aux.rf, which annotators were not supposed to do, so these instances actually mark annotation errors.
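Since the stray values mark annotation errors, it can be useful to list them before relying on gram/deontmod. The following is a minimal sketch only: it assumes the t-nodes have already been loaded into plain Python dictionaries with a hypothetical "deontmod" key, which is not the PCEDT file format, and the sample nodes are invented for illustration.

```python
from collections import Counter

# Values the text describes as stray; they mark annotation errors.
STRAY_DEONTMOD = {"fac", "vol"}


def stray_deontmod_counts(nodes):
    """Count nodes whose deontmod value is one of the stray values.

    `nodes` is assumed to be an iterable of dicts with a "deontmod" key.
    This flat layout is hypothetical, not the PCEDT file format.
    """
    return Counter(
        node["deontmod"]
        for node in nodes
        if node.get("deontmod") in STRAY_DEONTMOD
    )


# Invented sample nodes, for illustration only.
sample = [
    {"t_lemma": "go", "deontmod": "decl"},
    {"t_lemma": "swim", "deontmod": "fac"},
    {"t_lemma": "leave", "deontmod": "vol"},
    {"t_lemma": "run", "deontmod": "fac"},
]
counts = stray_deontmod_counts(sample)
```

Run over the real data, such a count would be expected to match the figures given above (12 and 1), provided the loader exposes the attribute under the assumed key.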
The gram/verbmod grammateme renders the verb mood. It has the following values:
Currently only three tense categories are indicated for English:
Gender is indicated for personal and possessive pronouns; for proper nouns it is guessed by a separate script. It distinguishes masculine, feminine and neuter. Its values are:
Lexical negation is marked on nouns and adjectives. The prefixes un, in, im, non, dis, il and ir are identified as negation. Note that this does not yet apply to adverbs: e.g. unexpectedly still has gram/negation=neg0. Verbs negated with the particles not/n't systematically have gram/negation=neg0 as well, so verbal negation has to be identified by the presence of the negation particle as a child node.
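Because verbal negation cannot be read off gram/negation, a query has to inspect the children of the verb node. The sketch below illustrates that check under stated assumptions: nodes are plain dictionaries with hypothetical "t_lemma" and "children" keys, and the particle is matched by the strings not and n't taken from the text above. The lemma actually used for the particle in the data may differ, so the set is a parameter.

```python
# Particle strings as named in the text; the lemma used in the data may differ.
NEGATION_PARTICLES = frozenset({"not", "n't"})


def has_negation_child(node, particles=NEGATION_PARTICLES):
    """Return True if any direct child of `node` is a negation particle.

    `node` is assumed to be a dict with an optional "children" list of
    dicts, each with a "t_lemma" string. This layout is hypothetical.
    """
    return any(
        child.get("t_lemma", "").lower() in particles
        for child in node.get("children", [])
    )


# Invented example: "does n't work" versus "works".
negated = {
    "t_lemma": "work",
    "negation": "neg0",  # verbs keep neg0 even when negated
    "children": [{"t_lemma": "n't"}],
}
plain = {"t_lemma": "work", "negation": "neg0", "children": []}

negated_result = has_negation_child(negated)
plain_result = has_negation_child(plain)
```

Only direct children are inspected here; whether deeper descendants ever need to be considered is not stated in the text.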
The gram/number grammateme has the values sg (singular), pl (plural) and nr (not recognized). It applies to nouns, pronouns, and numerals acting as nouns. It does not rely on the morphological tag alone, but also uses grammatical agreement and other clues to identify semantic plural (e.g. in 5 billion euro, both billion and euro are identified as plural, although the morphological tag does not indicate it).
Adjectives with the morphological tag JJR or those modified by more get gram/degcmp=comp. Adjectives with the morphological tag JJS or those modified by most get gram/degcmp=sup. All other adjectives get gram/degcmp=pos.
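The rule above can be sketched as a small function. This is an illustration of the stated rule, not the script actually used for PCEDT: the function name, the way modifiers are passed in (a list of word forms), and the order in which the two conditions are tested are assumptions.

```python
def degree_of_comparison(tag, modifiers=()):
    """Map an adjective's tag and its modifiers to a gram/degcmp value.

    `tag` is the morphological tag (e.g. "JJ", "JJR", "JJS").
    `modifiers` is an iterable of word forms modifying the adjective;
    how these are obtained from the tree is left open here.
    """
    mods = {m.lower() for m in modifiers}
    if tag == "JJR" or "more" in mods:
        return "comp"
    if tag == "JJS" or "most" in mods:
        return "sup"
    return "pos"


examples = {
    "bigger": degree_of_comparison("JJR"),
    "more important": degree_of_comparison("JJ", ["more"]),
    "biggest": degree_of_comparison("JJS"),
    "most important": degree_of_comparison("JJ", ["most"]),
    "big": degree_of_comparison("JJ"),
}
```

The comparative test runs first, so an adjective that somehow satisfied both conditions would be labelled comp; the text does not say how such a case is resolved.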
In English, the gram/indeftype grammateme only displays the value relat and is found with the relative pronouns that, what, whatever, whereby, which, who and whose.
The gram/person grammateme is assigned to pronouns. The pronouns I, me, we, my, us, our, ours and mine get the value 1. Other values are, accordingly, 2 and 3. A few cases have nr ("not recognized").
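As a minimal sketch, the first-person assignment can be written as a lookup against the pronoun list given above. Only that list comes from the text; the function name is hypothetical, and because the lists for the values 2 and 3 are not given here, the sketch deliberately returns None for every other form rather than guessing.

```python
# First-person pronoun forms as listed in the text (lowercased).
FIRST_PERSON = frozenset(
    {"i", "me", "we", "my", "us", "our", "ours", "mine"}
)


def first_person_value(form):
    """Return "1" for a listed first-person pronoun form, else None.

    None means "not decided by this sketch": the forms that receive
    the values 2, 3 or nr are not listed in the text.
    """
    return "1" if form.lower() in FIRST_PERSON else None


person_of_i = first_person_value("I")
person_of_ours = first_person_value("ours")
person_of_they = first_person_value("they")
```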
These grammatemes are not adapted to English and their values do not contribute any information at the moment.