MorfFlex CZ (the latest version is MorfFlex CZ 2.0) is a Czech morphological dictionary originally developed by Jan Hajič as a spelling checker and lemmatization dictionary. MorfFlex is a flat list of lemma-tag-wordform triples. For each wordform, full inflectional information is encoded in a positional tag. Wordforms are organized into entries (paradigm instances, or paradigms for short) according to their formal morphological behavior. Each paradigm (a set of wordforms) is identified by a unique lemma. Apart from the traditional morphological categories, the description also contains some semantic, stylistic and derivational information. For more details, see the comprehensive specification of the Czech morphological annotation.
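Because MorfFlex is a flat list of lemma-tag-wordform triples in which each paradigm is identified by its lemma, grouping the triples by lemma recovers the paradigms, and reading the tag position by position recovers the categories. The Python sketch below illustrates this under stated assumptions: the tab-separated `lemma<TAB>tag<TAB>wordform` line layout, the helper names `parse_triples`, `group_paradigms` and `decode_tag`, and the sample tags are all hypothetical, and the names of the first five tag positions follow the traditional PDT positional tagset rather than a check against the MorfFlex CZ 2.0 distribution files.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple

# Assumption: names of the first five positions of a PDT-style positional tag.
# Only these are decoded; the remaining positions are ignored in this sketch.
TAG_POSITIONS = ("POS", "SubPOS", "Gender", "Number", "Case")


class Triple(NamedTuple):
    lemma: str
    tag: str
    form: str


def parse_triples(lines: Iterable[str]) -> List[Triple]:
    """Parse hypothetical 'lemma<TAB>tag<TAB>wordform' lines, skipping blank ones."""
    triples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        lemma, tag, form = line.split("\t")
        triples.append(Triple(lemma, tag, form))
    return triples


def group_paradigms(triples: Iterable[Triple]) -> Dict[str, List[Triple]]:
    """Group triples into paradigms keyed by their lemma, keeping input order."""
    paradigms: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        paradigms[triple.lemma].append(triple)
    return dict(paradigms)


def decode_tag(tag: str) -> Dict[str, str]:
    """Map the first tag positions to category names; '-' (not applicable) is dropped."""
    return {name: value for name, value in zip(TAG_POSITIONS, tag) if value != "-"}


if __name__ == "__main__":
    # Illustrative sample in the assumed format (not copied from MorfFlex).
    sample = [
        "hrad\tNNIS1-----A----\thrad\n",   # nominative singular
        "hrad\tNNIS2-----A----\thradu\n",  # genitive singular
        "hrad\tNNIP1-----A----\thrady\n",  # nominative plural
        "město\tNNNS1-----A----\tměsto\n",
    ]
    paradigms = group_paradigms(parse_triples(sample))
    for lemma, entries in paradigms.items():
        print(lemma, [t.form for t in entries])
    print(decode_tag(paradigms["hrad"][1].tag))
```

Keying the paradigms on the lemma mirrors the statement that each paradigm is identified by a unique lemma; for the real 125-million-triple file, a streaming reader that never holds every triple in memory at once would likely be preferable.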

The MorfFlex CZ 2.0 dictionary contains 125,348,899 lemma-tag-wordform triples.

MorfFlex CZ 2.0 is an integral part of the PDT-C 1.0 release. All the data in the release are fully consistent with the dictionary.

Authors: Jan Hajič, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Barbora Štěpánková

The dictionary can be downloaded from the LINDAT-Clarin repository under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence.


How to cite MorfFlex CZ 2.0

If you use the dictionary in your research or need to cite it for any reason, please cite:

For LREC papers (in the separate Language Resource References section):

 title={MorfFlex CZ 2.0},
 author={Haji\v{c}, Jan and Hlav\'{a}\v{c}ov\'{a}, Jaroslava and Mikulov\'{a}, Marie and Straka, Milan and {\v{S}}t\v{e}p\'{a}nkov\'{a}, Barbora},
 url = {},
 publisher={Institute of Formal and Applied Linguistics, LINDAT/CLARIN, Charles University}, 
 address={Prague, Czech Republic}, 
 year={2020} }

For general papers and citations:

 title={MorfFlex CZ 2.0},
 author={Haji\v{c}, Jan and Hlav\'{a}\v{c}ov\'{a}, Jaroslava and Mikulov\'{a}, Marie and Straka, Milan and {\v{S}}t\v{e}p\'{a}nkov\'{a}, Barbora},
 url = {},
 note = {{LINDAT}/{CLARIN} digital library at the Institute of Formal and Applied Linguistics ({{\'U}FAL}),
 Faculty of Mathematics and Physics, Charles University},
 copyright={Creative Commons - Attribution-{NonCommercial}-{ShareAlike} 4.0 International ({CC} {BY}-{NC}-{SA} 4.0)},
 year={2020} }

For "plaintext" reference:

(Hajič et al., 2020)

Jan Hajič, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Barbora Štěpánková: MorfFlex CZ 2.0. Data/software, LINDAT-CLARIAH, URL:, 2020.

For footnote references, the following is sufficient in LaTeX papers:




Hajič Jan, Bejček Eduard, Hlaváčová Jaroslava, Mikulová Marie, Straka Milan, Štěpánek Jan, Štěpánková Barbora: Prague Dependency Treebank - Consolidated 1.0. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4, pp. 5208-5218, 2020.

Hajič Jan: Disambiguation of Rich Inflection (Computational Morphology of Czech), Karolinum, Prague, Czechia, 2004.

Hlaváčová Jaroslava, Mikulová Marie, Štěpánková Barbora, Hajič Jan: Modifications of the Czech morphological dictionary for consistent corpus annotation. Jazykovedný časopis / Journal of Linguistics, Vol. 70, No. 2, Slovakia, ISSN 0021-5597, pp. 380-389, 2019.

Mikulová Marie, Hajič Jan, Hana Jiří, Hanová Hana, Hlaváčová Jaroslava, Jeřábek Emil, Štěpánková Barbora, Vidová Hladká Barbora, Zeman Daniel: Manual for Morphological Annotation, Revision for the Prague Dependency Treebank - Consolidated 2020 release. Technical report no. TR-2020-64, Institute of Formal and Applied Linguistics, Charles University, Prague, Czechia, 2020.

Štěpánková Barbora, Mikulová Marie, Hajič Jan: The MorfFlex Dictionary of Czech as a Source of Linguistic Data. In: Proceedings of XIX EURALEX Congress: Lexicography for Inclusion, Democritus University of Thrace, Thrace, Greece, ISBN 978-618-85138-1-5, ISSN 2521-7100, pp. 387-392, 2020.