MorfFlex CZ (the latest version is MorfFlex CZ 2.0) is a Czech morphological dictionary, originally developed by Jan Hajič as a spelling checker and lemmatization dictionary. MorfFlex is a flat list of lemma-tag-wordform triples. For each wordform, the full inflectional information is encoded in a positional tag. Wordforms are organized into entries (paradigm instances, or paradigms for short) according to their formal morphological behavior. Each paradigm (set of wordforms) is identified by a unique lemma. Apart from the traditional morphological categories, the description also contains some semantic, stylistic and derivational information. For more details, see the comprehensive specification of the Czech morphological annotation.
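To make the triple-and-paradigm structure concrete, here is a minimal Python sketch. It assumes a plain-text file with one tab-separated lemma, tag, wordform triple per line, and it names the 15 tag positions after the PDT positional tagset. The file layout, the position names, the helper functions (parse_triples, group_paradigms, decode_tag, raw_lemma) and the sample lines are all illustrative assumptions, not the official distribution format or tooling. It further assumes that technical suffixes on a lemma (a sense number such as -1, or comments introduced by _ or `) can be stripped to obtain the bare lemma, which is why two paradigms may share a spelling yet still have distinct, unique lemmas.

```python
from collections import defaultdict
from typing import NamedTuple

# ASSUMPTION: names of the 15 tag positions, following the PDT positional
# tagset; verify them against the morphological annotation manual.
TAG_POSITIONS = (
    "POS", "SubPOS", "Gender", "Number", "Case",
    "PossGender", "PossNumber", "Person", "Tense", "Grade",
    "Negation", "Voice", "Reserve1", "Reserve2", "Var",
)


class Triple(NamedTuple):
    lemma: str
    tag: str
    form: str


def parse_triples(lines):
    """Yield triples from lines ASSUMED to be 'lemma<TAB>tag<TAB>wordform'."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        lemma, tag, form = line.split("\t")
        yield Triple(lemma, tag, form)


def group_paradigms(triples):
    """Group (tag, wordform) pairs into paradigms keyed by their unique lemma."""
    paradigms = defaultdict(list)
    for t in triples:
        paradigms[t.lemma].append((t.tag, t.form))
    return dict(paradigms)


def decode_tag(tag):
    """Map each filled position of a 15-character tag to its category name.

    Positions holding '-' (category not applicable) are omitted.
    """
    if len(tag) != len(TAG_POSITIONS):
        raise ValueError(f"expected {len(TAG_POSITIONS)} positions, got {len(tag)}: {tag!r}")
    return {name: value for name, value in zip(TAG_POSITIONS, tag) if value != "-"}


def raw_lemma(lemma):
    """Strip ASSUMED technical suffixes: '-<digit>...' or anything from '_' or '`'."""
    for i in range(1, len(lemma)):  # start at 1 so lemmas like '-' survive
        c = lemma[i]
        if c in "_`" or (c == "-" and i + 1 < len(lemma) and lemma[i + 1].isdigit()):
            return lemma[:i]
    return lemma


# HYPOTHETICAL sample lines in the assumed format (not copied from MorfFlex).
SAMPLE = [
    "pes\tNNMS1-----A----\tpes\n",
    "pes\tNNMS2-----A----\tpsa\n",
    "pes\tNNMS3-----A----\tpsovi\n",
    "stát-1\tNNIS1-----A----\tstát\n",
    "stát-1\tNNIS2-----A----\tstátu\n",
]

paradigms = group_paradigms(parse_triples(SAMPLE))
for lemma, forms in paradigms.items():
    print(raw_lemma(lemma), [form for _, form in forms])
# pes ['pes', 'psa', 'psovi']
# stát ['stát', 'státu']
print(decode_tag("NNMS2-----A----"))
# {'POS': 'N', 'SubPOS': 'N', 'Gender': 'M', 'Number': 'S', 'Case': '2', 'Negation': 'A'}
```

Grouping all 125,348,899 triples into one in-memory dict would need a great deal of memory. If the file turns out to be sorted by lemma (again an assumption), streaming the triples through itertools.groupby would avoid holding every paradigm at once.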
The MorfFlex CZ 2.0 dictionary contains 125,348,899 lemma-tag-wordform triples.
MorfFlex CZ 2.0 is an integral part of the PDT-C 1.0 release, and the dictionary is fully consistent with all of the released data.
Authors: Jan Hajič, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Barbora Štěpánková.
The dictionary can be downloaded from the LINDAT-Clarin repository under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence.
How to cite MorfFlex CZ 2.0
If you use the dictionary in your research or need to cite it for any reason, please cite:
For LREC papers (separate language resources references):
@languageresource{lrMorfFlexCZ20,
  title={MorfFlex CZ 2.0},
  author={Haji\v{c}, Jan and Hlav\'{a}\v{c}ov\'{a}, Jaroslava and Mikulov\'{a}, Marie and Straka, Milan and {\v{S}}t\v{e}p\'{a}nkov\'{a}, Barbora},
  url={http://hdl.handle.net/11234/1-3186},
  publisher={Institute of Formal and Applied Linguistics, LINDAT/CLARIN, Charles University},
  address={Prague, Czech Republic},
  lindat={http://hdl.handle.net/11234/1-3186},
  year={2020}
}
For general papers and citations:
@misc{MorfFlexCZ20,
  title={MorfFlex CZ 2.0},
  author={Haji\v{c}, Jan and Hlav\'{a}\v{c}ov\'{a}, Jaroslava and Mikulov\'{a}, Marie and Straka, Milan and {\v{S}}t\v{e}p\'{a}nkov\'{a}, Barbora},
  url={http://hdl.handle.net/11234/1-3186},
  note={{LINDAT}/{CLARIN} digital library at the Institute of Formal and Applied Linguistics ({{\'U}FAL}), Faculty of Mathematics and Physics, Charles University},
  copyright={Creative Commons - Attribution-{NonCommercial}-{ShareAlike} 4.0 International ({CC} {BY}-{NC}-{SA} 4.0)},
  year={2020}
}
For a plain-text reference:
(Hajič et al., 2020)
Jan Hajič, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Barbora Štěpánková: MorfFlex CZ 2.0. Data/software, LINDAT-CLARIAH, URL: http://hdl.handle.net/11234/1-3186, 2020.
For footnote references, the following is sufficient in LaTeX papers:
\url{http://hdl.handle.net/11234/1-3186}
Publications
Hajič Jan, Bejček Eduard, Hlaváčová Jaroslava, Mikulová Marie, Straka Milan, Štěpánek Jan, Štěpánková Barbora: Prague Dependency Treebank - Consolidated 1.0. In: Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), European Language Resources Association, Marseille, France, ISBN 979-10-95546-34-4, pp. 5208-5218, 2020.
Hajič Jan: Disambiguation of Rich Inflection (Computational Morphology of Czech), Karolinum, Prague, Czechia, 2004.
Hlaváčová Jaroslava, Mikulová Marie, Štěpánková Barbora, Hajič Jan: Modifications of the Czech morphological dictionary for consistent corpus annotation. Jazykovedný časopis / Journal of Linguistics, Vol. 70, No. 2, Slovakia, ISSN 0021-5597, pp. 380-389, 2019.
Mikulová Marie, Hajič Jan, Hana Jiří, Hanová Hana, Hlaváčová Jaroslava, Jeřábek Emil, Štěpánková Barbora, Vidová Hladká Barbora, Zeman Daniel: Manual for Morphological Annotation, Revision for the Prague Dependency Treebank - Consolidated 2020 release. Technical report no. TR-2020-64, Institute of Formal and Applied Linguistics, Charles University, Prague, Czechia, 2020.
Štěpánková Barbora, Mikulová Marie, Hajič Jan: The MorfFlex Dictionary of Czech as a Source of Linguistic Data. In: Proceedings of XIX EURALEX Congress: Lexicography for Inclusion, Democritus University of Thrace, Thrace, Greece, ISBN 978-618-85138-1-5, ISSN 2521-7100, pp. 387-392, 2020.