SIS code: 
Semester: summer
E-credits: 6
Examination: 2/2 Z+Zk
Guarantor: 

Statistical Machine Translation

The course covers the area of machine translation (MT) in its current breadth, delving deep enough in each approach to let you confuse every existing MT system. We put a balanced emphasis on several important types of state-of-the-art systems: phrase-based MT, surface-syntactic MT and (typically Praguian) deep-syntactic MT. We do not forget common prerequisites and surrounding fields: extracting translation equivalents from parallel texts (including word alignment techniques), MT evaluation, and methods of system combination.

We aim to provide a unifying view of machine translation as statistical search in a large search space, well supported with practical experience during your project work in a team or alone. Finally, we also attempt to give a gist of emerging approaches in MT, such as neural networks.

Lecture Materials

All lecture materials since 2008 are available in the course SVN:

https://svn.ms.mff.cuni.cz/projects/NPFL087
For read-only access use username: student and password: student

Additional Sources

Outline

  1. Metrics of MT Quality.

  2. Approaches to MT. Statistical MT. Phrase-Based MT. Moses.

  3. Parallel texts. Sentence and word alignment. hunalign, GIZA++.

  4. Morphology in MT. Factored phrase-based translation. Moses.

  5. Model optimization (MERT). Moses tools.

  6. Phrase-structure trees in MT. Parsing-based MT. Stat-XFER, Joshua.

  7. Dependency trees in MT.

  8. Tectogrammatical trees in MT. TectoMT.

  9. Advanced: Search. System combination. Neural networks in MT.

  10. Project presentations.

Grading

Key requirements:

  • Work on a project (alone or in a group of two to three).
  • Present project results (~30-minute talk).
  • Write a report (~4-page scientific paper).

Contributions to the grade:

  • 10% three MTtalks CodEx exercises,
  • 30% written exam,
  • 50% project report,
  • 10% project presentation.

Final Grade:
≥50% good, ≥70% very good, ≥90% excellent.
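
As a rough illustration of how the contributions and thresholds above could combine, here is a minimal sketch. It assumes that each component is scored as a percentage from 0 to 100 and that the components are combined as a plain weighted sum; the page does not state either, and the function and key names are hypothetical.

```python
# Weights in percent, taken from the "Contributions to the grade" list.
WEIGHTS = {
    "exercises": 10,
    "exam": 30,
    "report": 50,
    "presentation": 10,
}

# Thresholds from the "Final Grade" line, checked from highest to lowest.
THRESHOLDS = [(90, "excellent"), (70, "very good"), (50, "good")]


def final_grade(scores):
    """Return (weighted total in percent, grade label).

    `scores` maps each component name to a percentage in [0, 100].
    Assumption: components are combined as a simple weighted sum.
    """
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
    for minimum, label in THRESHOLDS:
        if total >= minimum:
            return total, label
    # Assumption: the page does not name the outcome below 50%.
    return total, "below the passing threshold"


example = {"exercises": 80, "exam": 60, "report": 90, "presentation": 100}
print(final_grade(example))
```

With these example scores the weighted total is 8 + 18 + 45 + 10 = 81%, which falls in the "very good" band.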