2.1.3. Category

The lemma category is indicated by "_:" followed by a letter. Most categories correspond to parts of speech. They are rarely used because the part of speech is also encoded in the morphological tags (see below; note, however, that some parts of speech are encoded by a different character in the lemma than in the morphological tag). Categories should be used when the same lemma behaves as two or more parts of speech, because no lemma is allowed to appear with morphological tags for two or more different parts of speech. For instance, vedle can be either an adverb or a preposition, so there should be two lemmas, vedle-1_:D and vedle-2_:P. Note, however, that in PDT 2.0 some lemmas, especially foreign words, occasionally appear with tags for different parts of speech. Moreover, where separate lemmas do exist for each part of speech, the distinction is often described verbally in the Comment part rather than formally in the Category field. In our example the lemmas would then be vedle-1_^(je_z_toho_vedle) and vedle-2_^(vedle_něčeho). This will be corrected in future versions.
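The convention above can be illustrated with a minimal Python sketch. It assumes that a category marker is always "_:" followed by a single uppercase letter and that the same character sequence does not occur elsewhere in the lemma (for example inside a comment). The function names are hypothetical and are not part of any PDT tool. The second function takes (lemma, part of speech) pairs, where the part of speech is whatever the caller has extracted from the morphological tag.

```python
import re
from collections import defaultdict

# Assumption: a category is "_:" followed by exactly one uppercase letter.
CATEGORY_RE = re.compile(r"_:([A-Z])")

def lemma_categories(lemma):
    """Return the category letters found in a lemma string, in order."""
    return CATEGORY_RE.findall(lemma)

def lemmas_with_mixed_pos(pairs):
    """Return lemmas that occur with more than one part of speech.

    `pairs` is an iterable of (lemma, pos) tuples; how `pos` is read
    from the morphological tag is left to the caller.
    """
    seen = defaultdict(set)
    for lemma, pos in pairs:
        seen[lemma].add(pos)
    return sorted(lemma for lemma, tags in seen.items() if len(tags) > 1)

print(lemma_categories("vedle-1_:D"))                  # ['D']
print(lemma_categories("vedle-2_:P"))                  # ['P']
print(lemma_categories("vedle-1_^(je_z_toho_vedle)"))  # []
```

The last call shows why the verbal description in the Comment part is harder to process: no category can be recovered from the lemma string alone.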

Three categories are used on a more systematic basis: _:T and _:W for verbal aspect, and _:B for abbreviations. Aspect currently has no representation in the morphological tags. It is treated as a lexical property: although it has some morphological implications, many irregularities would be expected if it were part of the verbal paradigm. The morphological analyzer covers aspect for some verbs while lacking the information for many others. If available, the aspect is indicated in the lemma. Note that there are biaspectual verbs, so analyzovat_:T_:W would be correct.
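As a rough sketch, the aspect of a verb could be read from its lemma as follows. The function name and the returned labels are hypothetical. The sketch assumes that categories are written as "_:" plus one uppercase letter, and it returns None when the lemma carries no aspect information, which, as noted above, is the case for many verbs.

```python
import re

def verbal_aspect(lemma):
    """Classify aspect from the _:T and _:W categories of a lemma."""
    categories = set(re.findall(r"_:([A-Z])", lemma))
    has_t = "T" in categories  # _:T, listed as "imperfect verb" in Table 2.2
    has_w = "W" in categories  # _:W, listed as "perfect verb" in Table 2.2
    if has_t and has_w:
        return "biaspectual"
    if has_t:
        return "T"
    if has_w:
        return "W"
    return None  # aspect not recorded in the lemma

print(verbal_aspect("analyzovat_:T_:W"))  # biaspectual
print(verbal_aspect("vedle-1_:D"))        # None
```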

Abbreviations are an exception to Rule 3 (which says that different AddInfo implies different lemma numbers). Two lemmas can have the same base form and number if the only difference between their AddInfos is that one contains "_:B" and the other does not. For more information on abbreviations see Chapter 4, Abbreviations.
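The exception can be expressed as a simple comparison. The sketch below assumes that the AddInfo is available as a plain string and that "_:B" occurs in it only as the abbreviation category. The function name is hypothetical, and the AddInfo values in the example calls are made up for illustration.

```python
def may_share_lemma_number(addinfo_a, addinfo_b):
    """Return True if two AddInfos may share a base form and number.

    Under Rule 3, different AddInfos require different lemma numbers,
    unless the only difference is the presence of "_:B" in one of them.
    """
    # Removing "_:B" from both sides makes them equal exactly when they
    # are identical or differ only in the abbreviation category.
    return addinfo_a.replace("_:B", "") == addinfo_b.replace("_:B", "")

print(may_share_lemma_number("_:N_:B", "_:N"))  # True
print(may_share_lemma_number("_:D", "_:P"))     # False
```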

Table 2.2. Lemma categories

Category Explanation
N noun
A, J adjective
Z pronoun
M numeral
V verb
T imperfective verb
W perfective verb
D adverb
P preposition
C conjunction
I particle
F interjection
B abbreviation
Q ???
X do not use