Chimera – Three Heads for English-to-Czech Translation
Ondřej Bojar, Rudolf Rosa, Aleš Tamchyna {bojar,rosa,tamchyna}@ufal.mff.cuni.cz
Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague

Input → TectoMT → Moses → Depfix

The source is available to all components to minimize error propagation.

TectoMT:
- hybrid (rule-based/statistical) MT system
- transfer at a deep syntactic layer (t-layer)
- our combination: get an extra phrase table for Moses from TectoMT output

Moses:
- phrase-based SMT
- large-scale data
- morphological tags as factors for better grammatical coherence

Depfix:
- rule-based error correction in MT output
- output parse corrected based on source

Parallel Data:

Corpus        Sents [M]   Tokens [M]
                          English    Czech
CzEng 1.0         14.83    235.67   205.17
Europarl           0.65     17.61    15.00
CommonCrawl        0.16      4.08     3.63

0.2 GWord Parallel, 3.6 GWord Czech

Good Sentence Structure and Unseen Forms
Source: Arizona was the first to introduce such a requirement.
Plain Moses: Arizona byla nejprve na zavedení takového požadavku.
    (Arizona was at first on introducing such a requirement.)
TectoMT: Arizona byla první, zavede takový požadavek.
    (Arizona was the first, it will introduce such a requirement.)
CU-Bojar: Arizona byla první, kdo zavedl takový požadavek.
    (Arizona was the first who introduced such a requirement.)
✔ TectoMT introduces a separate clause.
✔ CU-Bojar improves on that.
Source: The biggest risk was for Tereshkova.
CU-Bojar: Největší riziko je pro Těreškovovou.
    (The biggest risk is for Tereshkova.)
CU-Depfix: Největší riziko bylo pro Těreškovovou.
    (The biggest risk was for Tereshkova.)
✔ Depfix provides the correct tense.
Source: We're not FC Barcelona!
CU-Bojar: My jsme FC Barcelona!
    (We are FC Barcelona!)
CU-Depfix: Nejsme FC Barcelona!
    (We're not FC Barcelona!)
✔ Depfix restores lost negation.
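The lost-negation case above can be illustrated with a toy check-and-prefix heuristic. This is only a sketch under strong assumptions, not the Depfix rule itself: the function names, the English negator list, and the idea of prefixing "ne" to an already identified Czech verb are all hypothetical simplifications. The poster states that Depfix corrects the output parse based on the source; this sketch works on bare tokens instead.

```python
# Toy illustration only: a hypothetical, token-level stand-in for
# "lost negation recovery". It does not use parses or word alignment.

ENGLISH_NEGATORS = {"not", "no", "never", "n't"}


def source_is_negated(src_tokens):
    """Return True if any source token looks like an English negator."""
    for token in src_tokens:
        lowered = token.lower()
        if lowered in ENGLISH_NEGATORS or lowered.endswith("n't"):
            return True
    return False


def restore_negation(src_tokens, tgt_verb):
    """Prefix 'ne' to the Czech verb if the source is negated and the
    verb does not already start with 'ne'.

    Assumption: the caller has already picked the verb that corresponds
    to the negated source verb. Irregular forms (e.g. 'je' -> 'není')
    and verbs that merely begin with 'ne' (e.g. 'nechat') are not handled.
    """
    if source_is_negated(src_tokens) and not tgt_verb.lower().startswith("ne"):
        return "ne" + tgt_verb
    return tgt_verb


if __name__ == "__main__":
    source = ["We", "'re", "not", "FC", "Barcelona", "!"]
    print(restore_negation(source, "jsme"))
```

A real rule has to decide which target verb the negation belongs to and generate the correct negated form, which is why a parse-based approach is needed in practice.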
Source: The main anti-Soviet war leaders returned to power in 2001.
Plain Moses: Hlavní protisovětské války vůdci vrátili k moci v roce 2001.
    (The main anti-Soviet[fem] war leaders returned to power in year 2001.)
TectoMT: Hlavní protisovětští váleční vůdci se vrátili k moci v roce 2001.
    (The main anti-Soviet[masc] war leaders returned to power in year 2001.)
✔ TectoMT provides the correct word form, never seen in the training data.
[Figure: TectoMT analysis of the English source "The confidence he gives me also explains my performance."]

English t-layer (t-lemma, functor, formeme, surface form; grammatemes on the indented line):

confidence    ACT    n:subj   The confidence
    number=sg, sempos=n.denot
#PersPron     ACT    n:subj   he
    number=sg, person=3, sempos=n.pron.def.pers, gender=anim
give          RSTR   v:rc     gives
    resultative=res0, deontmod=decl, dispmod=disp0, tense=sim, verbmod=ind, iterativeness=it0, sempos=v, diathesis=act, negation=neg0
#PersPron     PAT    n:obj    me
    number=sg, person=1, sempos=n.pron.def.pers, gender=nr
also          RHEM            also
explain       PRED   v:fin    explains
    resultative=res0, deontmod=decl, dispmod=disp0, tense=sim, verbmod=ind, iterativeness=it0, sempos=v, diathesis=act, negation=neg0
#PersPron     APP    n:poss   my
    number=sg, person=1, sempos=n.pron.def.pers, gender=nr
performance   PAT    n:obj    performance
    number=sg, sempos=n.denot

English a-layer (form, analytical function, tag):

The           AuxA   DT
confidence    Sb     NN
he            Sb     PRP
gives         Atr    VBZ
me            Obj    PRP
also          Adv    RB
explains      Pred   VBZ
my            Atr    PRP$
performance   Obj    NN
[Figure: Czech a-layer after synthesis, "Důvěra, kterou mě dává, také vysvětluje můj výkon."]

Form         Full lemma     Full tag            Short tag
Důvěra       důvěra         NNFS1-----A-----    NN FS1
,            ,
kterou       který          P4FS4-----------    P4 FS4
mě           já             PH-S4--1--------    PH -S4
dává         dávat          VB-S---3P-AA---I    VB -S3PA
,            ,
také         také
vysvětluje   vysvětlovat    VB-S---3P-AA---I    VB -S3PA
můj          můj            PSIS4-S1--------    PS IS4
výkon        výkon          NNIS4-----A-----    NN IS4

Analytical function shown in the diagram: Důvěra = Sb.
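The relation between the full tags and the short labels in the diagram can be sketched for nouns and pronouns. In the Czech positional tagset, the first five positions encode part of speech, detailed part of speech, gender, number and case. The short labels for nominal words (for example NN + FS1) correspond to slices of those positions. The function names here are hypothetical. Verb labels such as -S3PA draw on later tag positions, and the sketch deliberately does not cover them.

```python
# A minimal sketch: decode the first five positions of a Czech positional
# tag and split a nominal tag into the two short labels seen in the diagram.
# Function names are illustrative, not part of TectoMT or Moses.

FIRST_POSITIONS = ("pos", "subpos", "gender", "number", "case")


def decode_prefix(tag):
    """Map the first five tag characters to their category names."""
    if len(tag) < len(FIRST_POSITIONS):
        raise ValueError(f"tag too short: {tag!r}")
    return dict(zip(FIRST_POSITIONS, tag[:5]))


def split_nominal(tag):
    """Return (POS + sub-POS, gender + number + case) for a noun or pronoun tag."""
    if len(tag) < 5:
        raise ValueError(f"tag too short: {tag!r}")
    return tag[:2], tag[2:5]


if __name__ == "__main__":
    for full_tag in ("NNFS1-----A-----", "P4FS4-----------", "PH-S4--1--------"):
        print(full_tag, decode_prefix(full_tag), split_nominal(full_tag))
```

A "-" in any position means the category does not apply or is unspecified, which is why the pronoun "mě" yields the short label -S4.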
Sample Errors
TectoMT suffers from errors in analysis and often mistranslates multiword expressions or idioms:
Source: ...turning a blind eye...
TectoMT: ...obrátí slepé oko.
✘ Literal translation.
Depfix sometimes applies rules in inappropriate situations:
Source: But we're waiting in the sidelines.
CU-Bojar: Ale čekáme v ústraní [sg].
CU-Depfix: Ale čekáme v ústraních [pl].
✘ Depfix too eager at recovering plural.

Credits & Opportunities
[Figure: reference tokens generated (✔) and missed. Generated tokens are broken down by where they are available in the 1-best output: all components, Plain Moses alone, TectoMT alone, Depfix alone, or other combinations. Missed tokens are broken down by where they could have been taken from: nowhere, Plain Moses, TectoMT, both, CU-Bojar.]
Our Chimera: Beat Google
Syntax-Based Translation improves Sentence Structure and reduces OOV on Both Sides
Main Search adds 0.2 GWord Parallel, 3.6 GWord Czech
+ test set translated by TectoMT and added as a separate phrase table
Rule-Based Error Correction restores lost negation: We're not FC Barcelona!
Output
0.003
0.07
0.06

Three Separate Language Models:

Token   Order   Sents [M]   Tokens [M]   ARPA.gz [GB]   Trie [GB]
stc         4      201.31      3430.92           28.2        11.8
stc         7       24.91       444.84           13.1         8.1
tag        10       14.83       205.17            7.2         3.0

The two stc models are over word forms; the tag model is over tags.
Depfix component                 Prc.   Imp.   Usl.
Subject-predicate agr.           68%    5.1%   57%
Pro-drop in subject              73%    3.4%   63%
Subject-past participle agr.     75%    6.3%   42%
Passive-aux 'be' agr.            77%    4.8%   69%
Possessive with 'of'             78%    1.5%   31%
Present continuous               78%    1.5%   31%
Missing reflexive verbs          80%    1.6%   64%
Subject categories projection    83%    3.7%   62%
Rehang children of aux verbs     83%    5.5%   62%
Lost negation recovery           90%    7.2%   38%

Examples of Depfix rules with high precision.

precision = # improved / (# impr. + # wors.)
impact    = # modified / # evaluated
useless   = # equal / # modified
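The three ratios defined above can be computed from per-rule counts as in the following sketch. The class and field names are hypothetical, the counts in the usage example are invented for illustration, and the sketch assumes that every modified sentence is judged as exactly one of improved, worsened or equal.

```python
from dataclasses import dataclass


def _ratio(numerator, denominator):
    """Safe division: return 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


@dataclass
class RuleCounts:
    """Manual-evaluation counts for one rule (hypothetical container)."""
    evaluated: int  # sentences inspected
    improved: int   # modified and judged better
    worsened: int   # modified and judged worse
    equal: int      # modified but judged neither better nor worse

    @property
    def modified(self):
        # Assumption: the three judgements partition the modified sentences.
        return self.improved + self.worsened + self.equal

    @property
    def precision(self):
        # precision = # improved / (# improved + # worsened)
        return _ratio(self.improved, self.improved + self.worsened)

    @property
    def impact(self):
        # impact = # modified / # evaluated
        return _ratio(self.modified, self.evaluated)

    @property
    def useless(self):
        # useless = # equal / # modified
        return _ratio(self.equal, self.modified)


if __name__ == "__main__":
    # Invented counts, not taken from the poster.
    counts = RuleCounts(evaluated=1000, improved=27, worsened=3, equal=20)
    print(f"precision={counts.precision:.0%} "
          f"impact={counts.impact:.1%} useless={counts.useless:.0%}")
```

Note that precision ignores the "equal" cases entirely, so a rule can combine high precision with a high useless rate, as several rows of the table do.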
This work was partially supported by the grants P406/11/1499 of the Grant Agency of the Czech Republic, FP7-ICT-2011-7-288487 (MosesCore) and FP7-ICT-2010-6-257528 (Khresmoi) of the European Union, and by SVV project number 267314.
[Figure: Czech t-layer after transfer. Only non-empty grammatemes are listed.]

Czech t-layer (t-lemma, functor, formeme, surface form; grammatemes on the indented line):

důvěra        ACT    n:1      Důvěra
    number=sg, gender=fem, sempos=n.denot
který                n:4      kterou
    indeftype=relat, sempos=n.pron.indef
#PersPron     ACT    n:1      (no surface form)
    number=sg, person=3, gender=anim, sempos=n.pron.def.pers
dávat         RSTR   v:rc     dává
    resultative=res0, diathesis=act, aspect=proc, deontmod=decl, dispmod=disp0, tense=sim, verbmod=ind, iterativeness=it0, sempos=v, negation=neg0
#PersPron     PAT    n:4      mě
    number=sg, person=1, gender=anim, sempos=n.pron.def.pers
také          RHEM            také
    (all grammatemes empty)
vysvětlovat   PRED   v:fin    vysvětluje
    resultative=res0, diathesis=act, aspect=proc, deontmod=decl, dispmod=disp0, tense=sim, verbmod=ind, iterativeness=it0, sempos=v, negation=neg0
#PersPron     APP    n:poss   můj
    number=sg, person=1, gender=anim, sempos=n.pron.def.pers
výkon         PAT    n:4      výkon
    number=sg, gender=inan, sempos=n.denot
Presented at WMT 2013, Sofia, Bulgaria.
TectoMT stages: Analysis, Transfer, Synthesis.
Moses makes its many usual errors.
System        BLEU    TER     WMT Ranking
                              Appraise   MTurk
CU-TectoMT    14.7    0.741   0.455      0.491
CU-Bojar      20.1    0.696   0.637      0.555
CU-Depfix     20.0    0.693   0.664      0.542
Plain Moses   19.5    0.713   –          –
Google Tr.    –       –       0.618      0.526
Language model labels: big, morphological, long
... and fixes other flaws:
Mission Accomplished