Chapter 3. Names

Table of Contents

3.1. Personal names
3.1.1. von, van, etc.
3.1.2. Chinese and Korean names
3.1.3. Foreignized Czech names
3.2. Geographical names
3.2.1. Countries, cities, rivers, mountains
3.2.2. Streets
3.3. Companies and institutions
3.3.1. Restaurants
3.3.2. Sport clubs
3.4. Horses, DJ's etc.
3.5. Products
3.6. Sporting and other events
3.7. Other
3.7.1. Buildings
3.7.2. Televisions
3.7.3. News and magazines
3.7.4. Song names
3.8. Adjectives derived from names

Unlike in version 1.0, named entity tagging is now preferably kept separate from morphology. Named entities, which often span multiple words, should be marked and categorized as special phrases on a layer other than the morphological one; this is a separate project that is not included in PDT 2.0. Lemmas of proper names still carry information on the name category. However, we respect the original idea that term suffixes describe the meaning of the lemma, not the context in which it appears. Thus, for instance, New in New York should be lemmatized as new_,t, not New_;G. Likewise, York should be lemmatized as York_;G even in New York Times, where it was previously York_;K. For details, see below.

Unfortunately, it was not feasible to enforce the desired lemmatization in PDT 2.0, so the annotation is still inconsistent in this respect. We plan to correct this in a future version.

Table 3.1. Name types

Type  Explanation, examples
Y     given name (formerly used as default): Petr, John
S     surname, family name: Dvořák, Zelený, Agassi, Bush
E     member of a particular nation, inhabitant of a particular territory: Čech, Kolumbijec, Newyorčan
G     geographical name: Praha, Tatry (the mountains)
K     company, organization, institution: Tatra (the company)
R     product: Tatra (the car)
m     other proper name: names of mines, stadiums, guerilla bases, etc.
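
The name types in Table 3.1 appear in lemmas as term suffixes, as in York_;G or Špaček_;S. The following Python sketch shows one way such suffixes might be read off a lemma. It is an illustration only: the names NAME_TYPES, name_types and bare_lemma are hypothetical, and the code assumes that a term suffix is always written as "_;" followed by a single character and that the lemma proper ends at the first underscore.

```python
import re

# Name-type codes from Table 3.1, mapped to short descriptions.
NAME_TYPES = {
    "Y": "given name",
    "S": "surname, family name",
    "E": "member of a nation or inhabitant of a territory",
    "G": "geographical name",
    "K": "company, organization, institution",
    "R": "product",
    "m": "other proper name",
}

# Assumption: a term suffix is "_;" followed by exactly one character.
_TERM_SUFFIX = re.compile(r"_;(.)")


def name_types(lemma):
    """Return the Table 3.1 codes found among the lemma's term suffixes."""
    return [code for code in _TERM_SUFFIX.findall(lemma) if code in NAME_TYPES]


def bare_lemma(lemma):
    """Return the part of the lemma before the first underscore.

    Assumption: everything from the first underscore on is a suffix.
    """
    return lemma.split("_", 1)[0]


if __name__ == "__main__":
    for lemma in ["York_;G", "Špaček_;S", "new_,t"]:
        print(lemma, bare_lemma(lemma), name_types(lemma))
```

Under these assumptions, new_,t yields no name type, because its suffix is introduced by "_," rather than "_;", whereas York_;G yields the code G.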

The lemma should start with an upper-case letter if the word is always capitalized when used as a name (Špaček_;S is always capitalized; the common noun špaček is not).