UMC - ÚFAL Multilingual Corpora


UMC is a collection of multilingual corpora compiled at the Institute of Formal and Applied Linguistics (ÚFAL). Depending on our future needs and funding sources, UMC will grow to cover various languages and text types at different speeds.


UMC is (and always will be) available for research, educational and non-profit use free of charge. Contact us if you are interested in obtaining a different type of license.



The development of UMC tools and corpora takes place in our Subversion repository and the accompanying Trac system:

Related Projects

The following projects are closely related to UMC for various reasons:


The work on UMC was supported by the following grants:

FP6-IST-5-034291-STP (EuroMatrix)
2009, 2010
FP7-ICT-2007-3-231720 (EuroMatrix Plus)

Institute of Formal and Applied Linguistics (ÚFAL)
Ondřej Bojar, bojar <at>
$Id: index.html 377 2011-02-04 12:46:15Z zeman $