2.1.1. Base form and number

The Word in LemmaProper is the base form of the respective paradigm. For nouns this is the nominative singular; for adjectives it is the nominative singular masculine in the positive degree; pronouns and numerals are treated similarly. Verbs are represented by their infinitive forms.

The Number in LemmaProper helps to distinguish several senses of a homonymous base form. It should neither be zero nor start with zero. The numbers used need not form a continuous sequence. Sometimes a particular number is repeatedly used for a special kind of word (e.g. the lemmas numbered "-99" are almost invariably authors' signatures and their Category/Style part is "_:B_;S"). Conventions of this kind exist solely for the convenience of a human reader; they are not meant to signal anything to a processing program. No conclusions should ever be drawn from the value of the lemma number! There is no guarantee that an observed number "semantics" holds anywhere else. Other sources of information, such as the AddInfo text, should be used instead.
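The constraint on the Number can be illustrated with a minimal Python sketch. It assumes, based only on the examples in this section (žena-2, Špaček_;S), that the Number is attached to the Word by a hyphen and that the Category/Style part begins at the first underscore. The function names and the sample lemma abc-99_:B_;S are hypothetical and are not part of PDT.

```python
import re

# Assumption: LemmaProper is "Word" optionally followed by "-Number",
# as in the example "žena-2" from this section.
_LEMMA_PROPER = re.compile(r"^(?P<word>.+?)(?:-(?P<number>[0-9]+))?$")


def split_lemma(lemma):
    """Return (word, number_text); number_text is None if no Number is present."""
    # Assumption: everything from the first underscore on (not counting an
    # underscore in the very first position) is the Category/Style part.
    cut = lemma.find("_", 1)
    proper = lemma if cut == -1 else lemma[:cut]
    match = _LEMMA_PROPER.match(proper)
    if match is None:
        raise ValueError("empty lemma")
    return match.group("word"), match.group("number")


def number_is_well_formed(number_text):
    """A Number must not be zero and must not start with zero."""
    # "0" itself starts with "0", so a single check covers both conditions.
    return number_text is None or not number_text.startswith("0")


if __name__ == "__main__":
    for lemma in ["žena-2", "Špaček_;S", "abc-99_:B_;S", "abc-07"]:
        word, number = split_lemma(lemma)
        print(lemma, word, number, number_is_well_formed(number))
```

The sketch only checks the form of the Number. In line with the warning above, it deliberately attaches no meaning to the value itself.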

The following rules shall hold for each group of lemmas sharing the same base form.

Unfortunately, many lemmas are not covered by our automatic morphological analyzer. Such lemmas were created by the annotators, and the administrator of the lexicon should later make their numbers and/or suffixes consistent and conformant to the above rules. In many cases it was not feasible to complete this task for PDT 2.0.

The base form in a lemma is case-sensitive. Words that must always be capitalized in writing have their lemma capitalized as well. As a consequence, špaček (starling) and Špaček_;S need not be distinguished by numbers (or they can both use the same number). Although it is not required, unique numbering of such cases is recommended.
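Case sensitivity matters when lemmas are grouped by base form. The following Python sketch groups lemmas without any case folding, so špaček and Špaček end up in different groups. It is an illustration under assumptions drawn from the examples in this section only: a trailing hyphen plus digits is taken to be the Number, and the first underscore is taken to start the Category/Style part. The function name group_by_base_form is hypothetical.

```python
from collections import defaultdict


def group_by_base_form(lemmas):
    """Group lemma strings by their base form, comparing case-sensitively."""
    groups = defaultdict(list)
    for lemma in lemmas:
        # Assumption: the Category/Style part starts at the first underscore.
        proper = lemma.split("_", 1)[0] or lemma
        # Assumption: a trailing "-digits" is the Number (as in "žena-2").
        word, sep, tail = proper.rpartition("-")
        if sep and word and tail.isascii() and tail.isdigit():
            base = word
        else:
            base = proper
        # No lower-casing here: "špaček" and "Špaček" are distinct keys.
        groups[base].append(lemma)
    return dict(groups)


if __name__ == "__main__":
    for base, members in group_by_base_form(["špaček", "Špaček_;S", "žena-2"]).items():
        print(base, members)
```

Because the keys are compared exactly as written, the two lemmas from the example above never share a group, which is why they may carry the same number or no number at all.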

Sometimes the numbering of lemmas reflects that their base form is homonymous with another word, although the other meaning is not a base form. For instance, žena is a noun (meaning woman), but it can also be a transgressive form of the verb hnát. The morphological analyzer may assign different numbers to the two meanings of žena, although the latter is not a base form. As a consequence, there may be a lemma žena-2 even if there is no other lemma with the same base form. Such behavior is allowed but not required.