MT Marathon 2013 Projects

We collected and shared proposed projects for about two months before the Marathon. As usual, project topics were settled on the first day of the MT Marathon. Most of the projects made it to the final presentation, and some will continue well after the Marathon.

Summary of Projects after a Year and Later

These are MT Marathon 2013 projects that made it to the final presentation. Some of them evolved even further.


The project ran in stealth mode but evolved into an LREC paper:

  • Christian Buck, Kenneth Heafield, and Bas van Ooyen: N-gram Counts and Language Models from the Common Crawl [PDF] Proc. of LREC 2014, Reykjavík, Iceland.

The resulting data are available for download.



Forest MIRA (Forest rescoring in Joshua for MIRA training)


Inline Tag Handling

Completed after the MT Marathon week; it has seen some small adoption.

Internal tree structure for GHKM rules in Moses

Most of the work was done during the Marathon and completed afterwards (the 'score' program from the Moses training pipeline needed a significant rewrite). The implementation is now part of Moses' master branch on GitHub.

It has been used as the basis of some subsequent work, including the implementation of syntactic constraints in Edinburgh's WMT14 English-to-German system. Some further details are in Section 3 of the paper:

  • Philip Williams, Rico Sennrich, Maria Nadejde, Matthias Huck, Eva Hasler, and Philipp Koehn: Edinburgh's Syntax-Based Systems at WMT 2014 [PDF] Proceedings of the Ninth Workshop on Statistical Machine Translation. 2014.

In future work, new feature functions may be developed that make use of the internal tree structure. One possible application would be to employ it for STSG-style features in tree-to-string translation.

Language Model Interpolation

Under reasonably active development after the Marathon.

Jacana Word Aligner

Project finished during the Marathon.

Now hosted at

Later used in a term project for the MT class at UPenn.

New features, testing and refactoring Joshua

A Discriminative Lexicon for Translating to Morphologically Rich Languages

The integration of Vowpal Wabbit into Moses was later rewritten and is now part of the Moses trunk.



Multipass Decoding in Moses with CSLM


Extending KenLM Pruning

Completed; now in the Moses/kenlm master branch.



Sparse Features for Reordering

Under reasonably active development afterwards.

Social Media Machine Translation Toolkit (SMMTT)

Wrapped up during the Marathon. Toolkit available at:

What Are Those Projects, Anyway?

Projects are the cornerstone of MT Marathons. In practice, they consist of:

  • Crazy advertisement and reporting sessions
  • Leisure debates in small groups
  • Serious discussions
  • Frenetic coding
  • Swamped computers and networks
  • But also minutes of peaceful deliberation