Lab 10 - GIZA++ and Moses
The goal of the lab is to get GIZA++ and Moses running and to carry out an experiment comparing alignment error rate (AER) of several alignments with the final BLEU score.
In general, follow:
- Moses installation
- Baseline model construction (includes GIZA++ compilation); you do not need the last step, EMS.
More background info is available in the Moses Training Tutorial (see the left menu, section Training).
Detailed Steps
- Download and compile GIZA++: https://github.com/moses-smt/giza-pp
- Download and compile Moses:
https://github.com/moses-smt/mosesdecoder/
- Pick an English->Czech corpus from OPUS:
http://opus.lingfil.uu.se/?src=en&trg=cs
- Follow the Baseline Model Construction mentioned above to:
- Tokenize the corpus using: moses/scripts/tokenizer/tokenizer.pl
- Extract a language model from the target side of the corpus (lmplz)
- "Train" the model (moses/scripts/training/train-model.perl; includes GIZA++, phrase extraction and the final config)
- "Tune" the model (moses/scripts/training/mert-moses.pl: MERT, i.e. weight optimization)
- Translate the test set: run moses with the optimized config (moses/bin/moses -f mert-work/moses.ini -i testcorpus.src.txt > mt-output.txt)
- Score the translations (moses/bin/evaluator --sctype BLEU --candidate mt-output.txt --reference testcorpus.tgt.txt --bootstrap 1000)
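The actual scoring must be done with moses/bin/evaluator as shown above, but to illustrate what that BLEU number measures, here is a minimal corpus-level BLEU sketch in Python (single reference, no smoothing, no bootstrap resampling; this is a didactic simplification, not the Moses implementation):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of all n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Corpus-level BLEU for parallel lists of tokenized sentences (one reference each)."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        match = total = 0
        for cand, ref in zip(candidate, reference):
            c, r = ngrams(cand, n), ngrams(ref, n)
            match += sum(min(cnt, r[g]) for g, cnt in c.items())  # clipped n-gram counts
            total += sum(c.values())
        if match == 0:  # any zero precision makes the geometric mean zero
            return 0.0
        log_prec += math.log(match / total) / max_n
    c_len = sum(len(s) for s in candidate)
    r_len = sum(len(s) for s in reference)
    bp = 1.0 if c_len > r_len else math.exp(1 - r_len / c_len)  # brevity penalty
    return bp * math.exp(log_prec)
```

For example, a candidate identical to the reference scores 1.0, while a too-short candidate with perfect n-gram precisions is penalized only by the brevity penalty.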
HW04 Assignment: BLEU vs. AER
The previous lab 09 and this lab 10 lead directly to the solution of your homework 04.
For your homework:
- Run and evaluate your baseline MT system, record BLEU on the test set.
- Apply the same word alignment technique (i.e. GIZA++) to the concatenation of your training corpus and the test corpus. Extract the test set alignments, record AER on the test set.
- To get GIZA++ alignments for this combined corpus, use moses/scripts/training/train-model.perl --first-step=1 --last-step=3.
- To use a different symmetrization technique (e.g. union), use --first-step=3 --last-step=3 --alignment=union (assuming that step 2, GIZA++, was already performed).
- Repeat the previous two steps for two or more variations of the alignment or the symmetrization (e.g. intersection or union instead of the default gdfa, grow-diag-final-and).
- At least one of the setups has to use the IBM1 alignment script from lab 09; you may or may not experiment with token variation (stemming etc.) for this.
- Feel free to reduce the training data size if your implementation cannot handle the same amount of data as GIZA++. (Yes, this invalidates the comparison, but I don't want to torture you with running MERT etc. again to also have GIZA++ and MT results on this reduced training corpus.)
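As a quick reference for the metric and the symmetrization variants named above, here is a sketch of AER (Och & Ney: AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|), where S are sure and P possible gold links, S ⊆ P) together with intersection/union symmetrization of the two GIZA++ directions; grow-diag-final-and is more involved and omitted here:

```python
def aer(hypothesis, sure, possible):
    """Alignment error rate; `possible` should be a superset of `sure`."""
    a, s, p = set(hypothesis), set(sure), set(possible)
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

def symmetrize(src2tgt, tgt2src, method="intersection"):
    """Combine two directional alignments, given as (src, tgt) index pairs."""
    fwd = set(src2tgt)
    bwd = {(i, j) for (j, i) in tgt2src}  # flip target-to-source pairs
    return fwd & bwd if method == "intersection" else fwd | bwd
```

Note that a perfect alignment (A = S = P) gives AER 0, and intersection typically yields high-precision, sparse alignments while union yields high-recall, dense ones.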
Please submit:
- A brief report on your experiment (just notes on 1 page are sufficient).
Make sure to indicate:
- What training corpus you used (describe exactly all parts).
- A table listing the number of parallel sentences, source and target tokens for all sections: training corpus, development (=tuning) corpus and the final test corpus. Everyone should have the same test corpus.
- A table listing your results (3 or more setups of word alignment/symmetrization technique), e.g.:

| Setup | BLEU | AER |
|---|---|---|
| baseline (GIZA++ default config, gdfa) | ??? | ??? |
| my IBM1, intersection | ??? | ??? |
| my IBM1, only source-to-target | ??? | ??? |
| my IBM1, intersection, based on stems | ??? | ??? |
- The input file (tokenized etc., as fed to Moses) for a sanity check.
- The output file of the run with the best BLEU score, as emitted by Moses.
- The output file of the run with the best AER score, as emitted by Moses.
Send your solutions to Ondrej Bojar (bojar -at- ufal.mff.cuni.cz); mention FEL HW04 in the subject.
Deadline: 23:59 8th January 2017