The grammatemes are generated automatically in the current PEDT 2.0 annotation.
Grammatemes are mostly semantically oriented counterparts of morphological categories such as number, degree of comparison, or tense. The system of grammatemes preserves the cognitive information represented by morphological categories, which would otherwise get lost at the higher level of abstraction (when representing words with their lemmas). Not all tokens have such semantically important morphological categories. Those that have them are marked by nodetype="complex".
Not all grammatemes are relevant for all parts of speech. The complex t-nodes were therefore divided into four groups according to which grammatemes are relevant for them. These groups are called semantic parts of speech and are the following: semantic nouns, semantic adjectives, semantic verbs and semantic adverbs. These groups are not identical with the 'traditional' parts of speech. They reflect basic onomasiological categories of substance, quality, event and circumstance. The semantic part of speech is reflected by the attribute sempos.
For English, the grammatemes have been inserted purely automatically, using POS tags, information about auxiliary words, a list of pronouns, etc. Only a subset of grammatemes has been introduced so far. A list with explanations follows.
The grammateme gram/sempos renders the semantic part of speech. The following values are recognized for English:
- n.denot: (associated with gram/number: sg, pl)
- adj.denot: associated with gram/negation (adjectives like uncool get gram/negation="neg1") and gram/degcmp. Adjectives with the morphological tag JJR or modified by more get gram/degcmp="comp". Adjectives with the morphological tag JJS or modified by most get gram/degcmp="sup". Adjectives with the morphological tag JJ that are not modified by more or most get gram/degcmp="pos".
- adv.denot.grad.neg: associated with gram/negation (adverbs like unfortunately get gram/negation="neg1") and gram/degcmp. Adverbs with the morphological tag RBR or modified by more get gram/degcmp="comp". Adverbs with the morphological tag RBS or modified by most get gram/degcmp="sup".
- n.pron.def.pers: This label denotes definite personal pronouns. It is associated with gram/gender, gram/number and gram/person.
- adv.pron.indef: These are indefinite pronominal adverbials, such as when, where, why, how.
- n.pron.indef: These are pronouns like what, who, whose, but also those, these, both when acting as nouns. The pronouns those, these, both are also associated with gram/number="pl", while all others have gram/number="sg". Whenever such a pronoun has a grammatical antecedent (e.g. the girl that I saw yesterday), it is associated with gram/indeftype="relat".
- Numerals are covered by n.quant.def (cardinal numbers) and adj.quant.def (ordinal numbers). Container numerals used in singular (hundred, thousand, million, billion) are associated with gram/number="pl".
- All morphological verbs get gram/sempos="v". The grammatemes gram/deontmod, gram/verbmod and gram/tense are relevant for verbs. They get a separate description below.
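As a rough summary, the associations between sempos values and grammatemes named in the list above can be collected in a lookup table. This is only an illustrative sketch: the names GRAMMATEMES_BY_SEMPOS and relevant_grammatemes are hypothetical, the entry for n.quant.def is inferred from the remark on container numerals, and empty tuples mean only that the list above names no grammateme for that value.

```python
# Grammatemes that the list above names for each English sempos value.
# Hypothetical helper table; not part of the PEDT tools.
GRAMMATEMES_BY_SEMPOS = {
    "n.denot": ("number",),
    "adj.denot": ("negation", "degcmp"),
    "adv.denot.grad.neg": ("negation", "degcmp"),
    "n.pron.def.pers": ("gender", "number", "person"),
    "adv.pron.indef": (),  # no grammateme named in the list
    "n.pron.indef": ("number", "indeftype"),
    "n.quant.def": ("number",),  # assumption: inferred from container numerals
    "adj.quant.def": (),  # no grammateme named in the list
    "v": ("deontmod", "verbmod", "tense"),
}


def relevant_grammatemes(sempos):
    """Return the grammatemes the list associates with a sempos value."""
    return GRAMMATEMES_BY_SEMPOS.get(sempos, ())
```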
The grammateme gram/deontmod reflects verb modality. Verb forms with no modality get the value decl. Combinations of a lexical verb with a modal verb get the following values:
- must, have to: deb
- should, ought: hrt
- want: vol
- can, cannot, could: poss
- may, might: perm
The current data version also contains stray instances of the following values:
- be able to: fac (12 times)
- want: vol (once)
These values were filled in automatically when an annotator mistakenly hid the lexemes be able to and want in a/aux.rf, which annotators were not supposed to do, so these instances actually mark annotation errors.
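The regular mapping from modal verbs to modality values listed above can be sketched as a simple lookup. The names DEONTMOD_BY_MODAL and deontmod are hypothetical, and falling back to decl for a modal that is not in the list is an assumption of this sketch, not documented behaviour.

```python
# Modal-to-value table transcribed from the list above.
DEONTMOD_BY_MODAL = {
    "must": "deb",
    "have to": "deb",
    "should": "hrt",
    "ought": "hrt",
    "want": "vol",
    "can": "poss",
    "cannot": "poss",
    "could": "poss",
    "may": "perm",
    "might": "perm",
}


def deontmod(modal=None):
    """Return the modality value for a verb form.

    `modal` is the modal verb combined with the lexical verb,
    or None when the verb form carries no modality.
    """
    if modal is None:
        return "decl"
    # Assumption: modals missing from the table are treated as decl.
    return DEONTMOD_BY_MODAL.get(modal.lower(), "decl")
```

For example, deontmod("must") yields "deb", and deontmod() yields "decl".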
The grammateme gram/verbmod renders the verb mood. It has the following values:
- ind: infinitives and indicative
- cdn: conditional mood expressed by would, should, could, might
There is a similar attribute called sentmod, which does not belong to the grammatemes. Its values are:
- enunc: enunciative
- inter: interrogative
- excl: exclamative
- imper: imperative
This attribute is assigned to the main predicates of a sentence and is irrelevant for subordinate predicates. A main predicate that has gram/verbmod="ind" naturally also has sentmod="enunc", so in this case the description is somewhat redundant.
The grammateme gram/tense currently distinguishes only three tense categories for English:
- will, shall, wo (won't), to be going to: post
- have -ed and verbs tagged with VBN, VBD: ant
- present tense and present progressive tense: sim
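The three categories above can be illustrated with a small rule cascade. This is a minimal sketch under several assumptions: the function name tense and its signature are hypothetical, the tags VBP, VBZ and VBG are taken to stand for present and present progressive forms, the word going among the auxiliaries stands in for to be going to, and future markers are checked before the past tags. The source does not specify any of these details.

```python
FUTURE_AUXILIARIES = {"will", "shall", "wo"}  # "wo" is the token from "won't"
PAST_TAGS = {"VBD", "VBN"}
PRESENT_TAGS = {"VBP", "VBZ", "VBG"}  # assumption: present and progressive forms


def tense(tag, auxiliaries=()):
    """Return post, ant or sim for a verb, or None if no rule applies.

    `tag` is the POS tag of the lexical verb; `auxiliaries` holds the
    word forms of its auxiliary words.
    """
    aux = {a.lower() for a in auxiliaries}
    # Assumption: future markers take precedence over the verb's own tag.
    if aux & FUTURE_AUXILIARIES or "going" in aux:
        return "post"
    if tag in PAST_TAGS:
        return "ant"
    if tag in PRESENT_TAGS:
        return "sim"
    return None  # cases the description above does not cover
```

Under these assumptions, tense("VB", ["will"]) gives "post", tense("VBD") gives "ant", and tense("VBZ") gives "sim".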
The grammateme gram/gender is indicated for personal and possessive pronouns and is guessed by a separate script for proper nouns. It distinguishes masculine, feminine and neuter. Its values are:
- nr: not recognized
- fem: feminine
- neut: neuter
- inan: masculine (The label was adopted from the original grammateme set for Czech, which has two masculine genders: animate and inanimate. It is admittedly an illogical label for English, where masculine is only identified in animate pronouns such as he, his, him and himself).
The grammateme gram/negation marks lexical negation in nouns and adjectives. The prefixes un, in, im, non, dis, il and ir are identified as negation prefixes. Note that this does not yet apply to adverbs; e.g. unexpectedly still has gram/negation="neg0". Verbs negated with the particles not/n't systematically have gram/negation="neg0"; their negation has to be identified through the negation particle attached as a child node.
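A prefix-based check of this kind might look like the sketch below. The source does not say how words that merely begin with one of these letter sequences (e.g. interesting) are filtered out; the lexicon lookup used here is an assumption, and the names NEG_PREFIXES, lexical_negation and known_lemmas are hypothetical.

```python
# Negation prefixes from the description above.
NEG_PREFIXES = ("un", "in", "im", "non", "dis", "il", "ir")


def lexical_negation(lemma, known_lemmas):
    """Return neg1 if the lemma looks like prefix + known word, else neg0.

    `known_lemmas` is a set of lemmas used to confirm that the remainder
    after stripping the prefix is itself a word (an assumption of this
    sketch, not a documented step).
    """
    word = lemma.lower()
    for prefix in NEG_PREFIXES:
        if word.startswith(prefix) and word[len(prefix):] in known_lemmas:
            return "neg1"
    return "neg0"
```

With known_lemmas = {"cool", "possible"}, the calls lexical_negation("uncool", known_lemmas) and lexical_negation("impossible", known_lemmas) both give "neg1", while lexical_negation("interesting", known_lemmas) gives "neg0".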
The grammateme gram/number has the values sg (singular), pl (plural) and nr (not recognized) and applies to nouns, pronouns and numbers acting as nouns. It relies not only on the morphological tag but also on grammatical congruence and other clues to identify semantic plural (e.g. in 5 billion euro, both billion and euro are identified as plural forms, although the morphological tag does not indicate it).
The grammateme gram/degcmp marks the degree of comparison. Adjectives with the morphological tag JJR or modified by more get gram/degcmp="comp". Adjectives with the morphological tag JJS or modified by most get gram/degcmp="sup". Other adjectives get gram/degcmp="pos".
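The degree-of-comparison rules for adjectives can be written down as a short function. This is an illustrative sketch only: the name degcmp and its signature are hypothetical, and checking the comparative before the superlative is an arbitrary choice for the unspecified case where both would apply.

```python
def degcmp(tag, modifiers=()):
    """Return comp, sup or pos for an adjective.

    `tag` is the morphological tag of the adjective (JJ, JJR or JJS);
    `modifiers` holds the word forms of the words modifying it.
    """
    mods = {m.lower() for m in modifiers}
    if tag == "JJR" or "more" in mods:
        return "comp"
    if tag == "JJS" or "most" in mods:
        return "sup"
    return "pos"
```

For instance, degcmp("JJR") and degcmp("JJ", ["more"]) both give "comp", degcmp("JJ", ["most"]) gives "sup", and degcmp("JJ") gives "pos".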
In English, the grammateme gram/indeftype only takes the value relat and is found with the relative pronouns that, what, whatever, whereby, which, who and whose.
The grammateme gram/person is assigned to pronouns. The pronouns I, me, we, my, us, our, ours and mine get the value 1. The other values are, accordingly, 2 and 3. A few cases have nr ("not recognized").
Grammatemes dispmod, iterativeness and resultative
These grammatemes have not been adapted to English, and their values currently do not contribute any information.