The morphological tag of the current token (which can be found in the text part of <f> or <d>), manually disambiguated. The tagset is defined by the morphological dictionary used for preprocessing the data.
The Prague Dependency Treebank (PDT) currently uses the positional tagset described below. For more information, please refer to the PDT documentation.
Each tag is a 15-tuple of symbols (mostly uppercase letters and digits, although lowercase letters and special symbols also occur). Each single-character position holds the value of one morphological category. Only 13 of the 15 positions are actually used; positions 13 and 14 are reserved:
Position | Category name | Description |
---|---|---|
1 | POS | Part of Speech |
2 | SUBPOS | Detailed Part of Speech |
3 | GENDER | Grammatical Gender (for agreement) |
4 | NUMBER | Grammatical Number (for agreement) |
5 | CASE | Morphological Case |
6 | POSSGENDER | Gender of Possessor |
7 | POSSNUMBER | Number of Possessor |
8 | PERSON | Person |
9 | TENSE | Tense |
10 | GRADE | Degree of Comparison |
11 | NEGATION | Negation |
12 | VOICE | Voice |
13 | RESERVE1 | Reserved |
14 | RESERVE2 | Reserved |
15 | VAR | Variant, Style, Register |
For more information on the individual categories, especially the sets of possible values, please see the full tagset documentation (psfile, pdffile) or the quick tagset reference (htmlfile, pdffile).
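
Because every position has a fixed meaning, a tag can be decoded by position alone. The following is a minimal sketch of that idea, not part of the PDT tools: the names `CATEGORIES` and `split_tag` are hypothetical, and the sample tag string is used only for illustration. The meaning of the individual values is not interpreted here; see the tagset documentation for that.

```python
# Category names for the 15 tag positions, in the order given in the table.
CATEGORIES = [
    "POS", "SUBPOS", "GENDER", "NUMBER", "CASE",
    "POSSGENDER", "POSSNUMBER", "PERSON", "TENSE", "GRADE",
    "NEGATION", "VOICE", "RESERVE1", "RESERVE2", "VAR",
]


def split_tag(tag):
    """Map each character of a 15-character tag to its category name.

    Hypothetical helper: it only splits by position and does not
    validate the values against the tagset.
    """
    if len(tag) != len(CATEGORIES):
        raise ValueError(
            f"expected {len(CATEGORIES)} characters, got {len(tag)}"
        )
    return dict(zip(CATEGORIES, tag))


if __name__ == "__main__":
    # Sample tag string, assumed here purely for illustration.
    fields = split_tag("NNMS1-----A----")
    for name, value in fields.items():
        print(f"{name:<11}{value}")
```

Rejecting strings of any other length catches truncated or malformed tags early, before any category lookup is attempted.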
ATTRIBUTES
CONTENT DECLARATION