MT Marathon 2013 Projects
We collected and shared proposed projects for about two months before the Marathon. As usual, project topics settled on the first day of the MT Marathon. Most of the projects actually made it to the final presentation and some will continue even (long) after the Marathon.
- List of proposed projects as updated during MT Marathon 2013.
- The same interactive document as of now.
- Boaster slides used on Monday to attract project participants (use username and password 'mtm').
- Project interim presentations (use username and password 'mtm').
- Project final presentations (use username and password 'mtm').
- Wiki (and SVN) for the projects (use username and password 'mtm'), where some may have more news.
Summary of Projects after a Year and Later
These are MT Marathon 2013 projects that made it to the final presentation. Some of them evolved even further.
CommonCrawl
The project ran in stealth mode but evolved into an LREC paper:
- Christian Buck, Kenneth Heafield, and Bas van Ooyen: N-gram Counts and Language Models from the Common Crawl [PDF] Proc. of LREC 2014, Reykjavík, Iceland.
The resulting data are available for download.
CorefMT
Dormant?
Forest MIRA (Forest rescoring in Joshua for MIRA training)
Dormant?
Inline Tag Handling
Completed after the MT Marathon week; it has seen some small adoption.
Internal tree structure for GHKM rules in Moses
Most of the work was done during the Marathon and completed afterwards (the 'score' program from the Moses training pipeline needed a significant rewrite). The implementation is now part of Moses' master branch on GitHub.
It has been used as the basis of some subsequent work, including the implementation of syntactic constraints in Edinburgh's WMT14 English-to-German system. Some further details are in Section 3 of the paper:
- Philip Williams, Rico Sennrich, Maria Nadejde, Matthias Huck, Eva Hasler, and Philipp Koehn: Edinburgh's Syntax-Based Systems at WMT 2014 [PDF] Proceedings of the Ninth Workshop on Statistical Machine Translation. 2014.
In future work, new feature functions may be developed that make use of the internal tree structure. One possible application would be to employ it for STSG-style features in tree-to-string translation.
Language Model Interpolation
Under reasonably active development after the Marathon.
Jacana Word Aligner
Project finished during the Marathon.
Now hosted at https://code.google.com/p/jacana-xy/
Later used in a term project for the MT class at UPenn.
New features, testing and refactoring Joshua
A Discriminative Lexicon for Translating to Morphologically Rich Languages
The integration of Vowpal Wabbit into Moses was later rewritten and is now part of the Moses trunk.
MTSpell
Dormant?
Multipass Decoding in Moses with CSLM
Dormant?
Extending KenLM Pruning
Concluded; the code is in the Moses/kenlm master branch.
QuEst@MTM
Dormant?
Sparse Features for Reordering
Under reasonably active development afterwards.
Social Media Machine Translation Toolkit (SMMTT)
Wrapped up during Marathon. Toolkit available at: https://github.com/wlin12/SMMTT
What Are Those Projects, Anyway?
Projects are the cornerstone of MT Marathons. In practice, they consist of:
- Crazy advertisement and reporting sessions
- Leisure debates in small groups
- Serious discussions
- Frenetic coding
- Swamped computers and networks
- But also minutes of peaceful deliberation