Lab 10 - GIZA++ and Moses

The goal of this lab is to get GIZA++ and Moses running and to run an experiment that compares the alignment error rate (AER) of several word alignments with the BLEU score of the resulting MT systems.

In general, follow:

  1. Moses installation
  2. Baseline model construction (includes GIZA++ compilation; you don't need the last step, EMS)

More background information is available in the Moses Training Tutorial (see the left menu, section Training).

Detailed Steps

  1. Download and compile GIZA++: https://github.com/moses-smt/giza-pp
  2. Download and compile Moses: https://github.com/moses-smt/mosesdecoder/
  3. Pick an English->Czech corpus from OPUS: http://opus.lingfil.uu.se/?src=en&trg=cs
  4. Follow the Baseline Model Construction mentioned above to prepare the corpus, train the language model and the translation model, and tune the system.
  5. Translate the test set: run moses with the optimized config (moses/bin/moses -f mert-work/moses.ini -i testcorpus.src.txt > mt-output.txt)
  6. Score the translations (moses/bin/evaluator --sctype BLEU --candidate mt-output.txt --reference testcorpus.tgt.txt --bootstrap 1000)
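
The --bootstrap 1000 option in step 6 asks for BLEU to be estimated over resampled test sets. The Python sketch below illustrates the idea only; it is not the evaluator's implementation, and the function names, the whitespace tokenization, the single reference per sentence and the 95% percentile interval are all assumptions made for this example.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_stats(hyp: str, ref: str, max_n: int = 4) -> List[int]:
    """Return [hyp_len, ref_len, match_1, total_1, ..., match_N, total_N]."""
    h, r = hyp.split(), ref.split()  # assumption: whitespace-tokenized text
    stats = [len(h), len(r)]
    for n in range(1, max_n + 1):
        hc, rc = ngram_counts(h, n), ngram_counts(r, n)
        stats.append(sum((hc & rc).values()))  # clipped n-gram matches
        stats.append(max(len(h) - n + 1, 0))   # n-grams in the hypothesis
    return stats


def corpus_bleu(per_sentence: Sequence[Sequence[int]]) -> float:
    """Corpus BLEU from summed per-sentence statistics (no smoothing)."""
    if not per_sentence:
        raise ValueError("empty test set")
    totals = [sum(col) for col in zip(*per_sentence)]
    hyp_len, ref_len = totals[0], totals[1]
    matches, counts = totals[2::2], totals[3::2]
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_prec = sum(math.log(m / c) for m, c in zip(matches, counts)) / len(matches)
    log_bp = min(0.0, 1.0 - ref_len / hyp_len)  # brevity penalty in log space
    return math.exp(log_bp + log_prec)


def bootstrap_bleu(hyps: Sequence[str], refs: Sequence[str],
                   samples: int = 1000, seed: int = 0) -> Tuple[float, float, float]:
    """Return (BLEU on the full set, lower bound, upper bound of a 95% interval)."""
    per_sentence = [sentence_stats(h, r) for h, r in zip(hyps, refs)]
    rng = random.Random(seed)
    # Resample sentences with replacement and rescore each resampled test set.
    scores = sorted(
        corpus_bleu(rng.choices(per_sentence, k=len(per_sentence)))
        for _ in range(samples)
    )
    low = scores[int(0.025 * samples)]
    high = scores[int(0.975 * samples) - 1]
    return corpus_bleu(per_sentence), low, high
```

With the file names from the steps above, one would read mt-output.txt and testcorpus.tgt.txt line by line and pass the two lists to bootstrap_bleu. A wide interval suggests that small BLEU differences between two alignment variants may not be meaningful.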

HW04 Assignment: BLEU vs. AER

The previous lab 09 and this lab 10 lead directly to the solution of your homework 04. For your homework:

  1. Run and evaluate your baseline MT system and record BLEU on the test set.
  2. Apply the same word alignment technique (i.e. GIZA++) to the concatenation of your training corpus and the test corpus. Extract the test set alignments and record AER on the test set.
  3. Repeat the previous two steps for two or more variations of the alignment or the symmetrization (e.g. intersection or union instead of the default gdfa, i.e. grow-diag-final-and).
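
The alignment side of these steps can be sketched in Python as follows. This is a minimal illustration under several assumptions: alignments are given one sentence per line as "i-j" pairs (source index, target index), the test corpus was appended after the training corpus, both directional alignments are already expressed in the same source-target orientation, and the gold standard is available as separate sets of sure and possible links. All function names are hypothetical. AER is computed as 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), where A is the proposed alignment, S the sure links and P the possible links (with S counted as part of P).

```python
from typing import List, Sequence, Set, Tuple

Link = Tuple[int, int]  # (source index, target index)


def parse_links(line: str) -> Set[Link]:
    """Parse one line such as '0-0 1-2 2-1' into a set of (src, tgt) pairs."""
    return {(int(s), int(t)) for s, t in (tok.split("-") for tok in line.split())}


def last_n_lines(all_lines: List[str], n_test: int) -> List[str]:
    """Keep the last n_test lines, i.e. the test part of the concatenation."""
    return all_lines[-n_test:] if n_test > 0 else []


def symmetrize(src2tgt: Set[Link], tgt2src: Set[Link], method: str) -> Set[Link]:
    """Combine two directional alignments; only the two simplest methods."""
    if method == "intersection":
        return src2tgt & tgt2src
    if method == "union":
        return src2tgt | tgt2src
    raise ValueError("this sketch only covers intersection and union")


def aer(hyps: Sequence[Set[Link]], sures: Sequence[Set[Link]],
        possibles: Sequence[Set[Link]]) -> float:
    """Corpus-level AER = 1 - (|A & S| + |A & P|) / (|A| + |S|)."""
    a_and_s = a_and_p = size_a = size_s = 0
    for A, S, P in zip(hyps, sures, possibles):
        P = P | S  # every sure link also counts as possible
        a_and_s += len(A & S)
        a_and_p += len(A & P)
        size_a += len(A)
        size_s += len(S)
    if size_a + size_s == 0:
        raise ValueError("no links in hypothesis or gold standard")
    return 1.0 - (a_and_s + a_and_p) / (size_a + size_s)
```

For example, a hypothesis "0-0 1-1 2-2" scored against sure links {(0,0), (1,1)} and a possible link {(2,3)} gives AER = 1 - (2 + 2) / (3 + 2) = 0.2. Intersection typically yields fewer, more precise links and union more links with higher recall, which is what makes the comparison with BLEU interesting.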

Please submit:

Send your solutions to Ondrej Bojar (bojar -at- ufal.mff.cuni.cz; mention FEL HW04 in the subject).

Deadline: 23:59 8th January 2017