Principal investigator (ÚFAL): 
Grant id: 
ÚFAL budget: 551,040 EUR


Quality Translation by Deep Language Engineering Approaches

Main website:

Over the last decade, mainstream research on Machine Translation (MT) has advanced incrementally by adopting increasingly sophisticated statistical approaches and fine-grained linguistic features, layered on top of the surface-level alignment on which these approaches are ultimately anchored.

Some leading academic and industry circles have recently suggested that incremental progress along this path may be approaching a ceiling: ever more fine-grained distinctions are needed to improve translations, while each yields a smaller gain in quality.

The goal of this project is to contribute to a quantum leap in quality MT by pursuing a novel approach that opens the way to higher quality translations and a new cycle of technological advancement.

We build on the complementarity of the two pillars of language technology, symbolic and probabilistic, and seek a quantum leap in their hybridization. We explore combinations that amplify their strengths and mitigate their drawbacks, with a new design for intertwining statistical and rule-based MT.

The construction of deep treebanks has progressed to the point of delivering the first significant Parallel DeepBanks, in which pairs of synonymous sentences from different languages are annotated with their fully-fledged grammatical representations, up to the level of their semantic representation.

The construction of Linked Open Data and other semantic resources, in turn, has progressed far enough to support impactful applications of lexical semantic processing that handles and resolves referential and conceptual ambiguity.

These cutting-edge advances allow the cross-lingual alignment that supports translation to be established at the level of deeper linguistic representation. The deeper the level, the fewer language-specific differences remain between source and target sentences, and the better the chances of success for statistically based transduction.

1 FCUL Portugal Faculdade de Ciências da Universidade de Lisboa

2 DFKI Germany Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
3 CUNI Czechia Univerzita Karlova v Praze
4 IICT-BAS Bulgaria Institute of Information and Communication Technologies
5 UBER Germany Humboldt-Universität zu Berlin
6 UPV/EHU Spain Universidad del País Vasco EHU UPV
7 UG Netherlands Rijksuniversiteit Groningen
8 HF Portugal Higher Functions-Sistemas Informáticos Inteligentes, Lda