Principal investigator (ÚFAL): 
Provider: 
Grant id: 1356213
Duration: 2013-2015

DepRefSet

Utilizing a Multitude of References in Machine Translation

The project will explore the possibilities of using a very large number of reference translations (on the
order of thousands up to tens of thousands) in statistical machine translation, focusing on
phrase-based translation. We will address the problem of automatically constructing the
multitude of references and of their quick manual validation. We will design methods for
improving automatic evaluation of machine translation using this data source. We will also
develop approaches for integrating the multitude of reference translations into parameter
optimization of translation systems, with the aim of improving their stability and potentially also
translation quality. The developed methods will be thoroughly evaluated by experiments on the
English-Czech language pair. However, we expect that the techniques themselves will be
language-independent.

 

Publications

Bojar Ondřej, Macháček Matouš, Tamchyna Aleš, Zeman Daniel: Scratching the Surface of Possible Translations. In: LNCS, Vol. 8082, pp. 465-474, 2013. Presented at TSD 2013.

Releases

DepRefSet data set: Many Czech References for 50 Sentences Selected from WMT11 Data [LINDAT]