Morče - Czech Morphological Tagger

Documentation

Installation

Download and extract the source code archive. Change the working directory to the subdirectory "src" and run "make". The binaries will appear in the subdirectory "bin".
Alternatively, you can download precompiled binaries for Windows and Linux (x86).
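A minimal build sketch for Unix. It assumes the source archive has already been extracted into a directory named morce (the archive's file name is not given here). build_morce is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: build Morče from an extracted source tree.
# ASSUMPTION: the tree was extracted to ./morce (override with $1).
build_morce() {
    morce_dir=${1:-morce}
    # Run make inside src; the subshell keeps the caller's working directory.
    (cd "$morce_dir/src" && make) || return 1
    # The documentation says the binaries appear in bin; list them.
    ls "$morce_dir/bin"
}
```

Usage: build_morce morce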

Simple usage

Unix:
morce_run.sh "input_file" "output_file"

Windows:
morce_run.bat "input_file" "output_file"
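To tag several files with the simple interface on Unix, a loop around the run script is enough. This is a sketch. tag_dir and the variable MORCE_RUN are hypothetical names, and MORCE_RUN defaults to morce_run.sh found on the PATH. The input format of morce_run.sh is not specified here, so the loop simply passes every regular file in the input directory:

```shell
#!/bin/sh
# Hypothetical helper: tag every regular file in $1 and write each result
# under the same base name into $2. MORCE_RUN defaults to morce_run.sh.
tag_dir() {
    in_dir=$1
    out_dir=$2
    run=${MORCE_RUN:-morce_run.sh}
    mkdir -p "$out_dir" || return 1
    for f in "$in_dir"/*; do
        # Skip subdirectories and the literal pattern of an empty directory.
        [ -f "$f" ] || continue
        "$run" "$f" "$out_dir/$(basename "$f")" || return 1
    done
}
```

Usage: tag_dir texts tagged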

Advanced usage

You will find the following subdirectories in the main Morče directory:
morce/bin ... executable binaries (included in the archive with Linux/Windows binaries; otherwise, run make first)
morce/src ... source code (included in the source code archive)
morce/trained ... pre-trained coefficients for direct use (included in both the source code and binary archives)
morce/sample_data ... short sample data in CSTS format

make_ftrs creates a set of features and a dictionary from the training data.
Options (arguments in square brackets are optional):
	[-t "training_data" ... input file with training data in CSTS format; if not specified, stdin is used]
	-o "output" ... output file for the set of features
	[-e "dictionary" ... output file where the dictionary should be saved]
Typical example: make_ftrs -t train.csts -e t.dct -o train.ftrs
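Because make_ftrs reads the training data from stdin when -t is omitted, it can be fed from a pipe. This sketch assumes the training data is stored gzip-compressed and that make_ftrs is on the PATH. features_from_gz and train.csts.gz are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: build the feature set and the dictionary from
# gzip-compressed CSTS training data without a temporary file.
# Arguments: compressed input, dictionary output, features output.
# Note: the pipeline's exit status is that of make_ftrs.
features_from_gz() {
    gzip -dc "$1" | make_ftrs -e "$2" -o "$3"
}
```

Usage: features_from_gz train.csts.gz t.dct train.ftrs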

It is advisable to sort the set of features and remove duplicate lines before further use. This saves both time and disk space.
Typical example: sort -u train.ftrs > t.sort
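A variant that also reports how many duplicates were removed. It runs sort with LC_ALL=C, which compares raw bytes. This is usually faster than a locale-aware sort, and it never treats two different lines as equal. The sketch assumes that train does not depend on a particular ordering of the features, since the documentation describes sorting as optional. dedupe_features is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: deduplicate a feature file and report the saving.
# LC_ALL=C: byte-wise comparison, so distinct lines are never merged.
dedupe_features() {
    src=$1
    dst=$2
    LC_ALL=C sort -u "$src" > "$dst" || return 1
    # tr strips the padding some wc implementations print.
    before=$(wc -l < "$src" | tr -d ' ')
    after=$(wc -l < "$dst" | tr -d ' ')
    echo "$before -> $after features"
}
```

Usage: dedupe_features train.ftrs t.sort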

train trains the coefficients on the training data.
Options:
	[-t "training_data" ... input file with training data in CSTS format; if not specified, stdin is used]
	-d "dictionary" ... input file with the dictionary
	-f "features" ... input file with the set of features
	[-n "iterations" ... number of iterations (default: 10)]
	[-o "prefix" ... prefix of the output files with coefficients]
Typical example: train -t train.csts -d t.dct -f t.sort -o train_coef
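The typical example for test uses -a train_coef_10 after training with -o train_coef and the default of 10 iterations. This suggests that train names its coefficient files prefix_N. If train also writes one such file per iteration, the hypothetical helper below tags held-out data with each iteration's coefficients so the outputs can be compared. Files that do not exist are skipped. The output names and the variable MORCE_TEST (the path of the test binary) are assumptions:

```shell
#!/bin/sh
# Hypothetical helper: tag held-out CSTS data with the coefficients of
# each training iteration. ASSUMPTION: coefficient files are named
# <prefix>_<iteration>; missing files are skipped. MORCE_TEST holds the
# binary's path, since a bare "test" would run the shell built-in.
tag_each_iteration() {
    data=$1 dict=$2 feats=$3 prefix=$4 iters=${5:-10}
    bin=${MORCE_TEST:-morce/bin/test}
    i=1
    while [ "$i" -le "$iters" ]; do
        if [ -f "${prefix}_$i" ]; then
            "$bin" -t "$data" -d "$dict" -f "$feats" \
                -a "${prefix}_$i" -o "output_$i.csts" || return 1
        fi
        i=$((i + 1))
    done
}
```

Usage: tag_each_iteration dev.csts t.dct t.sort train_coef 10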

test uses the trained coefficients to tag the test data.
Options:
	[-t "test_data" ... input file with test data in CSTS format; if not specified, stdin is used]
	-d "dictionary" ... input file with the dictionary
	-f "features" ... input file with the set of features
	-a "coefficients" ... input file with coefficients
	[-o "output" ... output file in CSTS format]
Typical example: morce/bin/test -t test.csts -d t.dct -f t.sort -a train_coef_10 -o output.csts
On Unix, call the binary by its path as shown. Because "test" is also a shell built-in, a bare "test ..." runs the built-in instead of the tagger.
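The four steps above (make_ftrs, sort, train, test) can be chained in one script. This is a sketch under two assumptions. First, the binaries live in $MORCE_BIN, which defaults to morce/bin. Second, the last coefficient file is named train_coef_<iterations>, as the train_coef_10 example suggests. morce_pipeline, run and DRY_RUN are hypothetical names. The intermediate file names follow the typical examples:

```shell
#!/bin/sh
# Hypothetical end-to-end script: features -> dedupe -> train -> tag.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

morce_pipeline() {
    train_data=$1 test_data=$2 output=$3 iters=${4:-10}
    bin=${MORCE_BIN:-morce/bin}
    run "$bin/make_ftrs" -t "$train_data" -e t.dct -o train.ftrs || return 1
    # sort -o writes the result itself, so no shell redirection is needed.
    run env LC_ALL=C sort -u -o t.sort train.ftrs || return 1
    run "$bin/train" -t "$train_data" -d t.dct -f t.sort -n "$iters" -o train_coef || return 1
    # Full path: a bare "test" would run the shell built-in.
    run "$bin/test" -t "$test_data" -d t.dct -f t.sort -a "train_coef_$iters" -o "$output"
}
```

For example, DRY_RUN=1 morce_pipeline train.csts test.csts output.csts prints the four commands without running them. This makes it easy to check the file names before a long training run.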