l

Lemma as defined by a morphological dictionary, manually disambiguated. The same format is used in the <MMl> and <MDl> text fields.

The remainder of this section describes the formal structure of the lemma as used in the Czech data of the Prague Dependency Treebank and, partially, in the Czech National Corpus (CNC).

The lemma string includes an optional sense ID: a decimal number separated from the lemma by a single dash, such as -3. The sense ID is optionally followed by syntactic, semantic and style tags, derivational information and a comment, in any combination. These are marked in a non-SGML way: each tag is exactly one letter long and is attached to the lemma by an underscore and a single markup symbol.

Syntactic tags

Syntactic tags were formerly used to mark an alternate part of speech for some words. Today they are used only for the verb aspect distinction of regular verbs (the T and W symbols). Part-of-speech symbols can always be found in the associated morphological tag (<t>, <MMt>, <MDt>), and the abbreviation information from the tag (8 in its VAR (last) column) takes precedence over the B designation here.

N Noun
J Adjective
A Adjective
Z Pronoun
T Imperfective verb
W Perfective verb
V Verb (aspect not specified)
M Numeral
C Conjunction
D Adverb
P Preposition
F Interjection
I Particle
B Abbreviation
Q Unused
X Unused

Semantic tags

G Geographical name
Y Person's first (given) name
S Person's family name
E Names of members of nations, cities, ethnic groups etc.
R Product name
K Organization name
m Other proper name
H Chemistry
U Medicine
L Natural Sciences
j Law, Legal
g General Technical term
c Electronics, Computers
y DIY, travel, free time
b Economy and Finances
u Culture, Education, Arts, other science
w Sports
p Politics, Government, Military
z Environment
o Colors

Style tags

s Bookish
a Archaic
n Dialect
h Colloquial (not tolerated in the standard)
e Expressive
l Slang
v Vulgar (extremely expressive)
t Foreign-language word
x Parallel spelling/form, do not use for morph. generation
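
The three tag tables above can be turned into a simple lookup. The following is a minimal sketch, not part of the DTD: the names SYNTACTIC, SEMANTIC, STYLE and describe_tag are hypothetical. It relies on the observation that no letter appears in more than one of the tables as listed here; in real data the markup symbol in front of the letter would normally identify the group.

```python
# Tag letters copied from the tables in this section.
SYNTACTIC = {
    "N": "Noun", "J": "Adjective", "A": "Adjective", "Z": "Pronoun",
    "T": "Imperfective verb", "W": "Perfective verb",
    "V": "Verb (aspect not specified)", "M": "Numeral",
    "C": "Conjunction", "D": "Adverb", "P": "Preposition",
    "F": "Interjection", "I": "Particle", "B": "Abbreviation",
    "Q": "Unused", "X": "Unused",
}
SEMANTIC = {
    "G": "Geographical name", "Y": "Person's first (given) name",
    "S": "Person's family name",
    "E": "Names of members of nations, cities, ethnic groups etc.",
    "R": "Product name", "K": "Organization name",
    "m": "Other proper name", "H": "Chemistry", "U": "Medicine",
    "L": "Natural Sciences", "j": "Law, Legal",
    "g": "General Technical term", "c": "Electronics, Computers",
    "y": "DIY, travel, free time", "b": "Economy and Finances",
    "u": "Culture, Education, Arts, other science", "w": "Sports",
    "p": "Politics, Government, Military", "z": "Environment",
    "o": "Colors",
}
STYLE = {
    "s": "Bookish", "a": "Archaic", "n": "Dialect",
    "h": "Colloquial (not tolerated in the standard)",
    "e": "Expressive", "l": "Slang",
    "v": "Vulgar (extremely expressive)", "t": "Foreign-language word",
    "x": "Parallel spelling/form, do not use for morph. generation",
}


def describe_tag(letter):
    """Return (group, description) for a one-letter tag, or None if unknown."""
    for group, table in (("syntactic", SYNTACTIC),
                         ("semantic", SEMANTIC),
                         ("style", STYLE)):
        if letter in table:
            return group, table[letter]
    return None
```

For example, describe_tag("W") yields the syntactic entry for a perfective verb, and an unlisted letter yields None.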

Derivation information, general comment

Derivation information and general comments (introduced by the caret symbol, ^) are always enclosed in a pair of parentheses.
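
The overall lemma layout described so far (lemma, optional sense ID, one-letter tags, parenthesized caret part) can be split mechanically. The following is a rough sketch under stated assumptions, not a reference parser: the function name split_lemma is hypothetical, the lemma proper is assumed to contain no underscore, parenthesized parts are assumed to contain no nested parentheses, and any single non-alphanumeric character other than the caret is accepted as a markup symbol. The symbols ; and , in the usage line are placeholders chosen for illustration.

```python
import re

# One tag: underscore, one markup symbol (anything but a word character,
# whitespace or the caret), one letter. This shape is an assumption.
_TAG = re.compile(r"_([^\w\s^])([A-Za-z])")
# Caret part: underscore, caret, then text enclosed in parentheses.
_PAREN = re.compile(r"_\^\(([^)]*)\)")
# Optional sense ID: a dash followed by a decimal number at the end.
_SENSE = re.compile(r"^(.+)-(\d+)$")


def split_lemma(lemma_string):
    """Split a lemma string into lemma, sense ID, tags and caret notes."""
    # Assumption: everything before the first underscore is the lemma
    # proper, including the sense ID if one is present.
    head, sep, tail = lemma_string.partition("_")
    tail = sep + tail

    match = _SENSE.match(head)
    if match:
        lemma, sense_id = match.group(1), int(match.group(2))
    else:
        lemma, sense_id = head, None

    # Take the parenthesized parts out first so that their text is
    # never mistaken for a tag.
    notes = _PAREN.findall(tail)
    tags = _TAG.findall(_PAREN.sub("", tail))

    return {"lemma": lemma, "sense_id": sense_id,
            "tags": tags, "notes": notes}
```

A call such as split_lemma("lemma-3_;G_,h_^(a comment)") separates the sense ID 3, the two one-letter tags with their markup symbols, and the parenthesized text.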

Within the parentheses, derivation information always starts with a star (*), which distinguishes it from a general comment. The star may be preceded by a derivation type consisting of the caret symbol (^) and a two-letter code. After the star comes a "rule" that describes how to obtain the underlying lemma from which the current lemma was derived. The rule has two parts: a deletion part followed by a to-be-appended part.

If a star (*) is used as the deletion part, the to-be-appended part that follows contains the complete original lemma. Otherwise, the number of symbols designated by the deletion part (counting the sense ID, if any) is stripped off before the to-be-appended part is attached to form the original lemma.
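
The rule can be applied mechanically. The following is a minimal sketch, assuming that symbols are stripped from the end of the lemma string, that the dash of the sense ID counts as a symbol, and that the two-letter derivation type consists of ASCII letters. None of these details is confirmed by the text above. The function name derive_source and the sample strings are hypothetical.

```python
import re

# Text inside the parentheses: optional ^XX type, a star, then either a
# second star or a decimal count, then the to-be-appended part.
_RULE = re.compile(r"^(?:\^([A-Za-z]{2}))?\*(\*|\d+)(.*)$")


def derive_source(lemma_with_sense, paren_text):
    """Return (derivation_type, original_lemma), or None for a plain comment.

    lemma_with_sense is the lemma including its sense ID (e.g. "abc-2");
    paren_text is the text found inside the parentheses.
    """
    match = _RULE.match(paren_text)
    if match is None:
        return None  # no leading star: treated as a general comment

    deriv_type, deletion, appended = match.groups()
    if deletion == "*":
        # A star as the deletion part: the rest is the whole original lemma.
        return deriv_type, appended

    # Otherwise strip the given number of symbols from the end
    # (assumption) and attach the to-be-appended part.
    keep = max(len(lemma_with_sense) - int(deletion), 0)
    return deriv_type, lemma_with_sense[:keep] + appended
```

With the made-up inputs "abcdef" and "*2xy", the last two symbols are dropped and "xy" is attached; with "^ZZ**uvw", the result is simply "uvw" together with the type "ZZ".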

Examples:

Eventually, all proper derivations should carry the ^XX derivation type; any remaining comments that start with a star (*) and contain the string transformation rule described above will be treated as synonyms, not derivations.


Content


No ATTRIBUTES
CONTENT DECLARATION

Tag Minimization
Open Tag: REQUIRED
Close Tag: OPTIONAL

Parent Elements




csts DTD