2.2.1. Positional tags

A positional tag is a string of 15 characters. Each position encodes one morphological category using a single character (mostly an upper-case letter or a digit).

Position Name Description
1 POS Part of speech
2 SubPOS Detailed part of speech
3 Gender Gender
4 Number Number
5 Case Case
6 PossGender Possessor's gender
7 PossNumber Possessor's number
8 Person Person
9 Tense Tense
10 Grade Degree of comparison
11 Negation Negation
12 Voice Voice
13 Reserve1 Reserve
14 Reserve2 Reserve
15 Var Variant, style
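
As a minimal sketch of how the table above can be used, the following Python snippet splits a tag into named fields. The function name parse_tag and the sample tag are illustrative assumptions, not part of the tagset definition.

```python
from typing import Dict

# Position names in order, copied from the table above (positions 1..15).
POSITION_NAMES = [
    "POS", "SubPOS", "Gender", "Number", "Case",
    "PossGender", "PossNumber", "Person", "Tense", "Grade",
    "Negation", "Voice", "Reserve1", "Reserve2", "Var",
]


def parse_tag(tag: str) -> Dict[str, str]:
    """Split a 15-character positional tag into a name -> value mapping."""
    if len(tag) != len(POSITION_NAMES):
        raise ValueError(f"expected 15 characters, got {len(tag)}: {tag!r}")
    return dict(zip(POSITION_NAMES, tag))


# Illustrative tag (an assumption for this example):
# noun, feminine, singular, nominative, affirmative.
fields = parse_tag("NNFS1-----A----")
```

Because every category has a fixed position, no separator or escaping is needed; a value is looked up by index alone.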

Some characters encode an aggregate of several atomic values. For example, 'X' means any value, and 'Y' means masculine animate ('M') or masculine inanimate ('I'). A dash ('-') means "not applicable" (e.g. tense for nouns).
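
The three kinds of values described above (atomic, aggregate, and not applicable) can be told apart with a small helper. This is only a sketch: the function name is hypothetical, and the aggregate table lists just the 'Y' value mentioned in this paragraph.

```python
# Only the aggregate named in the paragraph above is listed here;
# the per-position tables define further aggregate values.
GENDER_AGGREGATES = {"Y": {"M", "I"}}


def gender_value_covers(tag_value: str, atomic: str) -> bool:
    """Return True if the tag's gender character admits the atomic gender."""
    if tag_value == "-":
        return False  # category is not applicable to this word form
    if tag_value == "X":
        return True   # 'X' stands for any value
    # An aggregate covers its members; an atomic value covers only itself.
    return atomic in GENDER_AGGREGATES.get(tag_value, {tag_value})
```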

Not all combinations of tag values are possible. There are about 4,000 distinct tags.

See also: http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/docc0pos.pdf

Or for quick reference: http://ufal.mff.cuni.cz/pdt/Morphology_and_Tagging/Doc/hmptagqr.html

1 - Part of speech

In fact, part of speech is a lexical-syntactic property rather than a morphological one. It is practical to keep it in the tags, but it would be more accurate to keep it in the lemmas. In any case, no lemma is allowed to occur with two different parts of speech in the accompanying tags. If a word behaves syntactically as several parts of speech, a separate lemma has to be reserved for each of them.
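
The one-POS-per-lemma rule can be checked mechanically over a list of (lemma, tag) pairs. The sketch below is illustrative only: the function name and the sample pairs are assumptions made for this example, and the last pair is deliberately constructed to violate the rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def lemmas_with_mixed_pos(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Return lemmas that occur with more than one POS (position 1 of the tag)."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for lemma, tag in pairs:
        seen[lemma].add(tag[0])
    return {lemma: pos for lemma, pos in seen.items() if len(pos) > 1}


# Hypothetical (lemma, tag) pairs, for illustration only.
sample = [
    ("žena", "NNFS1-----A----"),
    ("žena", "NNFS4-----A----"),
    ("ráno", "NNNS1-----A----"),
    ("ráno", "Db-------------"),  # breaks the rule: needs its own lemma
]
violations = lemmas_with_mixed_pos(sample)
```

In this sample only the lemma ráno is reported, because it appears with both N and D in position 1.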

Value Description
A Adjective
C Numeral
D Adverb
I Interjection
J Conjunction
N Noun
P Pronoun
V Verb
R Preposition
T Particle
X Unknown, Not Determined, Unclassifiable
Z Punctuation (also used for the Sentence Boundary token)

2 - Detailed part of speech

SubPOS further subcategorizes POS. The SubPOS value uniquely determines the POS value.

Table 2.5. SUBPOS

Value Description POS
# Sentence boundary Z - punctuation
% Author's signature, e.g. haš-99_:B_;S N - noun
* Word krát (lit.: times) C - numeral
, Conjunction subordinate (incl. aby, kdyby in all forms) J - conjunction
} Numeral, written using Roman numerals (XIV) C - numeral
: Punctuation (except for the virtual sentence boundary word ###, which uses the SubPOS value #) Z - punctuation
= Number written using digits C - numeral
? Numeral kolik (lit. how many/how much) C - numeral
@ Unrecognized word form X - unknown
^ Conjunction (connecting main clauses, not subordinate) J - conjunction
4 Relative/interrogative pronoun with adjectival declension of both types (soft and hard) (jaký, který, čí, ..., lit. what, which, whose, ...) P - pronoun
5 The pronoun he in forms required after any preposition (with prefix n-: něj, něho, ..., lit. him in various cases) P - pronoun
6 Reflexive pronoun se in long forms (sebe, sobě, sebou, lit. myself / yourself / herself / himself in various cases; se is personless) P - pronoun
7 Reflexive pronouns se (Case = 4), si (Case = 3), plus the same two forms with contracted -s: ses, sis (distinguished by Person = 2; number is also singular only). This should be handled more consistently, since virtually any word can take this contracted -s (cos, polívkus, ...) P - pronoun
8 Possessive reflexive pronoun svůj (lit. my/your/her/his when the possessor is the subject of the sentence) P - pronoun
9 Relative pronoun jenž, již, ... after a preposition (n-: něhož, niž, ..., lit. who) P - pronoun
A Adjective, general A - adjective
B Verb, present or future form V - verb
C Adjective, nominal (short, participial) form rád, schopen, ... A - adjective
D Pronoun, demonstrative (ten, onen, ..., lit. this, that, that ... over there, ... ) P - pronoun
E Relative pronoun což (corresponding to English which in subordinate clauses referring to a part of the preceding text) P - pronoun
F Preposition, part of; never appears isolated, always in a phrase (nehledě (na), vzhledem (k), ..., lit. regardless, because of) R - preposition
G Adjective derived from present transgressive form of a verb A - adjective
H Personal pronoun, clitical (short) form (mě, mi, ti, mu, ...); these forms are used in the second position in a clause (lit. me, you, her, him), even though some of them (mě) might be regularly used anywhere as well P - pronoun
I Interjections I - interjection
J Relative pronoun jenž, již, ... not after a preposition (lit. who, whom) P - pronoun
K Relative/interrogative pronoun kdo (lit. who), incl. forms with affixes -ž and -s (the affixes are distinguished by the Var category (for -ž) and the Person category (for -s)) P - pronoun
L Pronoun, indefinite všechnen, sám (lit. all, alone) P - pronoun
M Adjective derived from verbal past transgressive form A - adjective
N Noun (general) N - noun
O Pronoun svůj, nesvůj, tentam alone (lit. own self, not-in-mood, gone) P - pronoun
P Personal pronoun já, ty, on (lit. I, you, he ) (incl. forms with the enclitic -s, e.g. tys, lit. you're); gender position is used for third person to distinguish on/ona/ono (lit. he/she/it), and number for all three persons P - pronoun
Q Pronoun relative/interrogative co, copak, cožpak (lit. what, isn't-it-true-that) P - pronoun
R Preposition (general, without vocalization) R - preposition
S Pronoun possessive můj, tvůj, jeho (lit. my, your, his); gender position used for third person to distinguish jeho, její, jeho (lit. his, her, its), and number for all three pronouns P - pronoun
T Particle T - particle
U Adjective possessive (with the masculine ending -ův as well as feminine -in) A - adjective
V Preposition (with vocalization -e or -u): (ve, pode, ku, ..., lit. in, under, to) R - preposition
W Pronoun negative (nic, nikdo, nijaký, žádný, ..., lit. nothing, nobody, not-worth-mentioning, no/none) P - pronoun
X (temporary) Word form recognized, but tag is missing in dictionary due to delays in (asynchronous) dictionary creation  
Y Pronoun relative/interrogative co as an enclitic (after a preposition) (oč, nač, zač, lit. about what, on/onto what, after/for what) P - pronoun
Z Pronoun indefinite (nějaký, některý, číkoli, cosi, ..., lit. some, some, anybody's, something) P - pronoun
a Numeral, indefinite (mnoho, málo, tolik, několik, kdovíkolik, ..., lit. much/many, little/few, that much/many, some (number of), who-knows-how-much/many) C - numeral
b Adverb (without a possibility to form negation and degrees of comparison, e.g. pozadu, naplocho, ..., lit. behind, flatly); i.e. both the Negation and the Grade attributes in the same tag are marked by - (Not applicable) D - adverb
c Conditional (of the verb být (lit. to be) only) (by, bych, bys, bychom, byste, lit. would) V - verb
d Numeral, generic with adjectival declension (dvojí, desaterý, ..., lit. two-kinds/..., ten-...) C - numeral
e Verb, transgressive present (endings -e/-ě, -íc, -íce) V - verb
f Verb, infinitive V - verb
g Adverb (forming negation (Negation set to A/N) and degrees of comparison (Grade set to 1/2/3, i.e. positive/comparative/superlative), e.g. velký, zajímavý, ..., lit. big, interesting) D - adverb
h Numeral, generic; only jedny and nejedny (lit. one-kind/sort-of, not-only-one-kind/sort-of) C - numeral
i Verb, imperative form V - verb
j Numeral, generic greater than or equal to 4 used as a syntactic noun (čtvero, desatero, ..., lit. four-kinds/sorts-of, ten-...) C - numeral
k Numeral, generic greater than or equal to 4 used as a syntactic adjective, short form (čtvery, ..., lit. four-kinds/sorts-of) C - numeral
l Numeral, cardinal jeden, dva, tři, čtyři, půl, ... (lit. one, two, three, four); also sto and tisíc (lit. hundred, thousand) if noun declension is not used C - numeral
m Verb, past transgressive; also archaic present transgressive of perfective verbs (ex.: udělav, lit. (he-)having-done; arch. also udělaje (Var = 4), lit. (he-)having-done) V - verb
n Numeral, cardinal greater than or equal to 5 C - numeral
o Numeral, multiplicative indefinite (-krát, lit. (times): mnohokrát, tolikrát, ..., lit. many times, that many times) C - numeral
p Verb, past participle, active (including forms with the enclitic -s, lit. 're (are)) V - verb
q Verb, past participle, active, with the enclitic -ť, lit. (perhaps) - could-you-imagine-that? or but-because- (both archaic) V - verb
r Numeral, ordinal (adjective declension without degrees of comparison) C - numeral
s Verb, past participle, passive (including forms with the enclitic -s, lit. 're (are)) V - verb
t Verb, present or future tense, with the enclitic -ť, lit. (perhaps) -could-you-imagine-that? or but-because- (both archaic) V - verb
u Numeral, interrogative kolikrát, lit. how many times? C - numeral
v Numeral, multiplicative, definite (-krát, lit. times: pětkrát, ..., lit. five times) C - numeral
w Numeral, indefinite, adjectival declension (nejeden, tolikátý, ..., lit. not-only-one, so-many-times-repeated) C - numeral
y Numeral, fraction ending at -ina; used as a noun (pětina, lit. one-fifth) C - numeral
z Numeral, interrogative kolikátý, lit. what (at-what-position- place-in-a-sequence) C - numeral
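
Since each SubPOS value belongs to exactly one POS, positions 1 and 2 of a tag can be cross-checked with a lookup table. The sketch below copies only an excerpt of Table 2.5; the names SUBPOS_TO_POS and pos_subpos_consistent are assumptions made for this example.

```python
# Excerpt of Table 2.5: SubPOS value -> POS value.
# Only some rows are copied; a full check would need the whole table.
SUBPOS_TO_POS = {
    "#": "Z", ":": "Z",
    "N": "N", "%": "N",
    "A": "A", "C": "A", "G": "A", "M": "A", "U": "A",
    "B": "V", "f": "V", "i": "V", "p": "V", "s": "V",
    "R": "R", "V": "R", "F": "R",
    "^": "J", ",": "J",
    "=": "C", "l": "C", "n": "C",
    "P": "P", "D": "P",
    "b": "D",
    "T": "T", "I": "I", "@": "X",
}


def pos_subpos_consistent(tag: str) -> bool:
    """Check that position 1 (POS) agrees with position 2 (SubPOS).

    Raises KeyError for SubPOS values missing from the excerpt above.
    """
    return SUBPOS_TO_POS[tag[1]] == tag[0]
```

Note that SubPOS is case-sensitive: for example, C is a nominal adjective while c is the conditional of být.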

Obsolete values:

Value Description
! Abbreviation used as an adverb
. Abbreviation used as an adjective
~ Abbreviation used as a verb
; Abbreviation used as a noun
3 Abbreviation used as a numeral
x Abbreviation, part of speech unknown/indeterminable

3 - Gender

In fact, gender is a truly morphological attribute only for adjectives, pronouns, numerals and verbs. For nouns, it is a lexical property. As a consequence, no noun lemma is allowed to occur with two different genders in the accompanying tags. If a word allows for more than one gender, a separate lemma has to be reserved for each of them.

Table 2.6. Gender

Value Description
F Feminine
H {F, N} - Feminine or Neuter
I Masculine inanimate
M Masculine animate
N Neuter
Q Feminine (with singular only) or Neuter (with plural only); used only with participles and nominal forms of adjectives
T Masculine inanimate or Feminine (plural only); used only with participles and nominal forms of adjectives
X Any
Y {M, I} - Masculine (either animate or inanimate)
Z {M, I, N} - Not feminine (i.e., Masculine animate/inanimate or Neuter); only for (some) pronoun forms and certain numerals

4 - Number

Table 2.7. Number

Value Description
D Dual, e.g. nohama
P Plural, e.g. nohami
S Singular, e.g. noha
W Singular for feminine gender, plural with neuter; can only appear in participle or nominal adjective form with gender value Q
X Any
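
The gender values Q and T are tied to particular numbers, so gender and number are easiest to expand together. The following sketch enumerates the atomic (gender, number) pairs that positions 3 and 4 may denote. It rests on two assumptions not stated outright in the tables: that "plural only" for T applies to both of its genders, and that Q may be filtered by an explicit S or P as well as combined with W.

```python
from typing import Set, Tuple

# Atomic values from Tables 2.6 and 2.7.
GENDERS = {"F", "I", "M", "N"}
NUMBERS = {"D", "P", "S"}

# Aggregate gender values that do not depend on number.
GENDER_SETS = {
    "H": {"F", "N"},
    "Y": {"M", "I"},
    "Z": {"M", "I", "N"},
    "X": GENDERS,
}


def expand_gender_number(gender: str, number: str) -> Set[Tuple[str, str]]:
    """Return the atomic (gender, number) pairs for tag positions 3 and 4."""
    if number not in NUMBERS | {"W", "X"}:
        raise ValueError(f"unknown number value: {number!r}")
    if gender == "Q":
        pairs = {("F", "S"), ("N", "P")}
    elif gender == "T":
        # Assumption: "plural only" applies to both genders of T.
        pairs = {("I", "P"), ("F", "P")}
    elif gender in GENDERS or gender in GENDER_SETS:
        genders = GENDER_SETS.get(gender, {gender})
        pairs = {(g, n) for g in genders for n in NUMBERS}
    else:
        raise ValueError(f"unknown gender value: {gender!r}")
    if number == "W":
        if gender != "Q":
            raise ValueError("number W can only appear with gender Q")
        return pairs
    if number == "X":
        return pairs
    return {(g, n) for (g, n) in pairs if n == number}
```

For instance, gender Q with number W expands to feminine singular or neuter plural, which is exactly the reading given in the two tables.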

5 - Case

Table 2.8. CASE

Value Description
1 Nominative, e.g. žena
2 Genitive, e.g. ženy
3 Dative, e.g. ženě
4 Accusative, e.g. ženu
5 Vocative, e.g. ženo
6 Locative, e.g. ženě
7 Instrumental, e.g. ženou
X Any

6 - Possessor's Gender

Table 2.9. Possessor's Gender

Value Description
F Feminine, e.g. matčin, její
M Masculine animate (adjectives only), e.g. otců
X Any
Z {M, I, N} - Not feminine, e.g. jeho

7 - Possessor's Number

Table 2.10. Possessor's Number

Value Description
P Plural, e.g. náš
S Singular, e.g. můj
X Any, e.g. your

8 - Person

Table 2.11. PERSON

Value Description
1 1st person, e.g. píšu, píšeme
2 2nd person, e.g. píšeš, píšete
3 3rd person, e.g. píše, píšou
X Any person

9 - Tense

Table 2.12. Tense

Value Description
F Future
H {R, P} - Past or Present
P Present
R Past
X Any

10 - Degree of Comparison

Table 2.13. GRADE

Value Description
1 Positive, e.g. velký
2 Comparative, e.g. větší
3 Superlative, e.g. největší

11 - Negation

Table 2.14. NEGATION

Value Description
A Affirmative (not negated), e.g. možný
N Negated, e.g. nemožný

12 - Voice

Table 2.15. Voice

Value Description
A Active, e.g. píšící
P Passive, e.g. psaný

15 - Variant

Table 2.16. VAR

Value Description
- Basic variant, standard contemporary style; also used for standard forms allowed for use in writing by the Czech Standard Orthography Rules despite being marked there as colloquial
1 Variant, second most used (less frequent), still standard
2 Variant, rarely used, bookish, or archaic
3 Very archaic, also archaic + colloquial
4 Very archaic or bookish, but standard at the time
5 Colloquial, but (almost) tolerated even in public
6 Colloquial (standard in spoken Czech)
7 Colloquial (standard in spoken Czech), less frequent variant
8 Abbreviations
9 Special uses, e.g. personal pronouns after prepositions etc.
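
The per-position value tables in this section can be combined into a simple character-level sanity check. The sketch below is illustrative and deliberately incomplete: it skips SubPOS, assumes the reserve positions always hold a dash, assumes a dash is accepted wherever a category may be not applicable, and does not test whether a combination of values is actually one of the possible tags. The names ALLOWED and invalid_positions are hypothetical.

```python
from typing import List, Optional, Set

# Allowed characters per position, copied from the tables in this section.
ALLOWED: List[Optional[Set[str]]] = [
    set("ACDIJNPVRTXZ"),  # 1 POS
    None,                 # 2 SubPOS (Table 2.5 is long; not checked here)
    set("FHIMNQTXYZ-"),   # 3 Gender
    set("DPSWX-"),        # 4 Number
    set("1234567X-"),     # 5 Case
    set("FMXZ-"),         # 6 PossGender
    set("PSX-"),          # 7 PossNumber
    set("123X-"),         # 8 Person
    set("FHPRX-"),        # 9 Tense
    set("123-"),          # 10 Grade
    set("AN-"),           # 11 Negation
    set("AP-"),           # 12 Voice
    set("-"),             # 13 Reserve1 (assumed always unused)
    set("-"),             # 14 Reserve2 (assumed always unused)
    set("-123456789"),    # 15 Var
]


def invalid_positions(tag: str) -> List[int]:
    """Return the 1-based positions whose character is not in the tables."""
    if len(tag) != len(ALLOWED):
        raise ValueError(f"expected 15 characters, got {len(tag)}")
    return [
        index + 1
        for index, (char, allowed) in enumerate(zip(tag, ALLOWED))
        if allowed is not None and char not in allowed
    ]
```

An empty result means only that every character is known for its position; it does not mean the tag is one of the roughly 4,000 valid tags.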