Interest in using statistical machine translation (SMT) for Grammatical Error Correction (GEC) for English-as-a-Second-Language (ESL) learners has been growing. Two of the three highest-scoring systems in the CoNLL-2014 Shared Task were SMT-based. The highest-scoring result published to date for the CoNLL-2014 test set was achieved by a combination of the five best CoNLL-2014 submissions, built with MEMT (a tool for MT system combination).
In this talk, we demonstrate how a single SMT-based system can match and even outperform that combined system. Furthermore, when the same training data is used, this system outperforms all other published single-system results (including our own CoNLL-2014 submission) by a margin of several percent in F-score.
These results are achieved by adapting current state-of-the-art methods for phrase-based SMT specifically to the problem of GEC. We report on the effects of:
- Parameter tuning for GEC
- Introducing GEC-specific dense and sparse features
- Using large-scale data
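
For readers unfamiliar with the metric behind the F-score figures above, the following is a minimal sketch of a precision-weighted F-score computed from edit counts. It assumes beta = 0.5, the weighting used as the official CoNLL-2014 metric. The function name `f_beta` and the count arguments are illustrative only; this is not the shared task's scorer and not the scoring code of the system described here.

```python
def f_beta(true_pos: int, false_pos: int, false_neg: int, beta: float = 0.5) -> float:
    """F-beta score from counts of correct, spurious, and missed edits.

    beta < 1 weights precision more heavily than recall.
    Illustrative helper only; returning 0.0 for undefined cases is a
    simplifying assumption of this sketch.
    """
    proposed = true_pos + false_pos   # edits the system proposed
    gold = true_pos + false_neg       # edits in the reference annotation
    precision = true_pos / proposed if proposed else 0.0
    recall = true_pos / gold if gold else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: 2 correct edits, 0 spurious edits, 2 missed edits.
    print(round(f_beta(2, 0, 2), 4))
```

Because beta is below 1, a system that proposes few but accurate corrections scores higher than one with the same number of correct edits and many spurious ones, which is one reason tuning an SMT system toward this metric can differ from tuning it toward a standard MT metric.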