MorfFlex CZ (the latest version is MorfFlex CZ 2.0) is the Czech morphological dictionary originally developed by Jan Hajič as a spelling checker and lemmatization dictionary. MorfFlex is a flat list of lemma-tag-wordform triples. For each wordform, the full inflectional information is encoded in a positional tag. Wordforms are organized into entries (paradigm instances, or paradigms for short) according to their formal morphological behavior. Each paradigm (a set of wordforms) is identified by a unique lemma. Apart from traditional morphological categories, the description also contains some semantic, stylistic and derivational information. For more details, see the comprehensive specification of the Czech morphological annotation.
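
The flat lemma-tag-wordform structure described above can be illustrated with a minimal Python sketch. It is not part of the MorfFlex distribution. It assumes one triple per line with tab-separated columns in the order lemma, tag, wordform; the sample lines are illustrative rather than copied from the dictionary, and the helper names (`parse_triples`, `group_by_lemma`, `analyses_by_wordform`, `tag_position`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

# Illustrative lines only (not copied from the dictionary). The tab-separated
# lemma / tag / wordform column order is an assumption of this sketch.
SAMPLE = (
    "žena\tNNFS1-----A----\tžena\n"
    "žena\tNNFS2-----A----\tženy\n"
    "žena\tNNFP1-----A----\tženy\n"
)


class Triple(NamedTuple):
    lemma: str
    tag: str
    wordform: str


def parse_triples(lines: Iterable[str]) -> List[Triple]:
    """Parse one lemma-tag-wordform triple per non-empty line."""
    triples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        lemma, tag, wordform = line.split("\t")
        triples.append(Triple(lemma, tag, wordform))
    return triples


def group_by_lemma(triples: Iterable[Triple]) -> Dict[str, List[Triple]]:
    """Collect the triples of each paradigm under its lemma."""
    paradigms: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        paradigms[triple.lemma].append(triple)
    return dict(paradigms)


def analyses_by_wordform(
    triples: Iterable[Triple],
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each wordform to every (lemma, tag) analysis listed for it."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for triple in triples:
        index[triple.wordform].append((triple.lemma, triple.tag))
    return dict(index)


def tag_position(tag: str, position: int) -> str:
    """Return the character at a 1-based position of a positional tag."""
    return tag[position - 1]


if __name__ == "__main__":
    triples = parse_triples(SAMPLE.splitlines())
    for wordform, analyses in analyses_by_wordform(triples).items():
        print(wordform, analyses)
```

Grouping by lemma recovers the paradigms, while indexing by wordform shows why a single form can carry several analyses. The meaning of each tag position is defined in the annotation specification, not in this sketch.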

The MorfFlex CZ 2.0 dictionary contains 125,348,899 lemma-tag-wordform triples.

MorfFlex CZ 2.0 is an integral part of the PDT-C 1.0 release. The dictionary is fully consistent with all the data in that release.

Authors: Jan Hajič, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Barbora Štěpánková

The dictionary can be downloaded from the LINDAT-Clarin repository under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence.


How to cite MorfFlex CZ 2.0

If you use the dictionary in your research or need to cite it for any reason, please cite:

Jan Hajič, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Barbora Štěpánková: MorfFlex CZ 2.0. Data/software, LINDAT-CLARIAH, URL:, 2020.

Hajič Jan, Bejček Eduard, Hlaváčová Jaroslava, Mikulová Marie, Straka Milan, Štěpánek Jan, Štěpánková Barbora: Prague Dependency Treebank - Consolidated 1.0. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4, pp. 5208-5218, 2020.

Hajič Jan: Disambiguation of Rich Inflection (Computational Morphology of Czech), Karolinum, Prague, Czechia, 2004.

Hlaváčová Jaroslava, Mikulová Marie, Štěpánková Barbora, Hajič Jan: Modifications of the Czech morphological dictionary for consistent corpus annotation. Jazykovedný časopis / Journal of Linguistics, Vol. 70, No. 2, Slovakia, ISSN 0021-5597, pp. 380-389, 2019.

Mikulová Marie, Hajič Jan, Hana Jiří, Hanová Hana, Hlaváčová Jaroslava, Jeřábek Emil, Štěpánková Barbora, Vidová Hladká Barbora, Zeman Daniel: Manual for Morphological Annotation, Revision for the Prague Dependency Treebank - Consolidated 2020 release. Technical report no. TR-2020-64, Institute of Formal and Applied Linguistics, Charles University, Prague, Czechia, 2020.

Štěpánková Barbora, Mikulová Marie, Hajič Jan: The MorfFlex Dictionary of Czech as a Source of Linguistic Data. In: Proceedings of XIX EURALEX Congress: Lexicography for Inclusion, Democritus University of Thrace, Thrace, Greece, ISBN 978-618-85138-1-5, ISSN 2521-7100, pp. 387-392, 2020.