Morče - Czech Morphological Tagger

Introduction

Morče (an abbreviation of "MORfologie ČEŠtiny", meaning "Czech morphology") is a software tool for morphological disambiguation (tagging) of Czech text. For every word in the input text, Morče selects one morphological interpretation from all the possible ones. The algorithm is statistical and is based on the "Averaged Perceptron" published by Michael Collins in 2002.
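
To illustrate the general idea, here is a minimal Python sketch of an averaged perceptron that picks one tag per word from a list of candidates. It is not Morče's implementation: the feature templates, the toy tags ("P", "N", "V"), the class and function names, and the greedy left-to-right decoding are all assumptions made for brevity (Collins's formulation decodes whole sentences with Viterbi search, and Morče's real feature set is not described here).

```python
from collections import defaultdict


def features(words, i, prev_tag, tag):
    """Hypothetical feature templates, chosen only for illustration."""
    return [
        f"form={words[i]}|tag={tag}",
        f"prev={prev_tag}|tag={tag}",
        f"suffix={words[i][-2:]}|tag={tag}",
    ]


class AveragedPerceptron:
    def __init__(self):
        self.w = defaultdict(float)       # current weights
        self.totals = defaultdict(float)  # running sums used for averaging
        self.stamps = defaultdict(int)    # step at which each weight last changed
        self.step = 0                     # number of training decisions seen

    def score(self, feats):
        return sum(self.w.get(f, 0.0) for f in feats)

    def _bump(self, feat, delta):
        # Lazily account for the steps during which this weight stayed unchanged.
        self.totals[feat] += (self.step - self.stamps[feat]) * self.w[feat]
        self.stamps[feat] = self.step
        self.w[feat] += delta

    def update(self, gold_feats, guess_feats, mistaken):
        self.step += 1
        if mistaken:
            for f in gold_feats:
                self._bump(f, +1.0)   # reward features of the correct tag
            for f in guess_feats:
                self._bump(f, -1.0)   # penalise features of the wrong guess

    def average(self):
        # Replace each weight by its average over all training steps.
        for f, w in list(self.w.items()):
            total = self.totals[f] + (self.step - self.stamps[f]) * w
            self.w[f] = total / self.step if self.step else w


def best_tag(model, words, i, prev_tag, candidates):
    # Ties are resolved in favour of the first candidate listed.
    return max(candidates, key=lambda t: model.score(features(words, i, prev_tag, t)))


def train(model, data, epochs=5):
    """data: list of (words, candidate tags per word, gold tags)."""
    for _ in range(epochs):
        for words, candidates, gold in data:
            prev = "<s>"
            for i in range(len(words)):
                guess = best_tag(model, words, i, prev, candidates[i])
                model.update(features(words, i, prev, gold[i]),
                             features(words, i, prev, guess),
                             guess != gold[i])
                prev = gold[i]  # the gold history is used during training
    model.average()


def tag_sentence(model, words, candidates):
    prev, out = "<s>", []
    for i in range(len(words)):
        prev = best_tag(model, words, i, prev, candidates[i])
        out.append(prev)
    return out


# Toy usage: "žena" is ambiguous between a noun and a verb form.
toy_data = [(["ta", "žena"], [["P"], ["V", "N"]], ["P", "N"])]
toy_model = AveragedPerceptron()
train(toy_model, toy_data)
print(tag_sentence(toy_model, ["ta", "žena"], [["P"], ["V", "N"]]))
```

Averaging the weights over all training steps, rather than keeping only the final ones, is what distinguishes the averaged perceptron from the plain perceptron; it tends to make the model less sensitive to the last few training examples.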

So far, Morče is the best stand-alone morphological tagger for Czech. It has been trained and tuned on data from PDT 2.0. Hundreds of experiments have been run to find the best set of features. The accuracy of this version is 95.1 % on the evaluation test data (95.5 % on the development test data). A better set of features is available to registered users only (currently 96.0 % on the development test data).
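
For readers unfamiliar with the metric, tagging accuracy is usually the percentage of tokens that receive the correct tag. The short Python sketch below shows that computation; the function name and the toy tags are hypothetical, and it assumes per-token scoring, which the text above does not state explicitly.

```python
def tagging_accuracy(gold_tags, predicted_tags):
    """Percentage of tokens whose predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold_tags:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return 100.0 * correct / len(gold_tags)


# Toy example: three of the four tags match.
print(tagging_accuracy(["N", "V", "A", "N"], ["N", "V", "N", "N"]))
```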

Morče was developed at the Institute of Formal and Applied Linguistics by Jan Raab.

Morče takes morphologically analysed text in the CSTS format as input and outputs disambiguated text in the same format. It is a command-line application written in C, and it has been compiled for Linux and Windows.

Acknowledgement

Development of Morče is supported by grant 1ET101120503 of the Academy of Sciences of the Czech Republic and grant GD201/05/H014 of the Czech Science Foundation.