NPFL116 - Compendium of Neural Machine Translation
This seminar should make the students familiar with the current research trends in machine translation using deep neural networks. Most importantly, the students should learn how to deal with the ever-growing body of literature on empirical research in machine translation and how to critically assess its content. The semester consists of a few lectures summarizing the state of the art, discussions of reading assignments, and student presentations of selected papers.
Students will form three teams of three students and one team of two students, and each team will present one of the following groups of papers to their fellow students. The students will prepare not only a presentation of the papers but also questions for the discussion that follows the presentation.
The other students should also familiarize themselves with the papers so that they can participate in the discussion.
It is recommended, though not required, to arrange a consultation with the teachers at least one day before the presentation.
Unsupervised machine translation is an active research topic where the goal is to create a machine translation system without the need for large parallel corpora to train the models.
So far, there have been two papers on this topic:
Two years ago, there were many papers attempting to use reinforcement learning for machine translation, optimizing the model directly towards the sentence-level BLEU score instead of cross-entropy, which appears to be a clearly sub-optimal objective. These methods have not been very successful, mainly because of the inherent limitations of the BLEU score.
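To make the idea concrete, here is a minimal plain-Python sketch, not taken from any of the presented papers, of the two ingredients: a sentence-level BLEU used as the reward and a REINFORCE-style loss for one sampled translation. The names `sentence_bleu` and `reinforce_loss`, the add-one smoothing of every n-gram order, and the scalar baseline are assumptions of this sketch; a real system would compute the loss on tensors in an autodiff framework so that its gradient reaches the model parameters.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the order-n n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp, ref, max_n=4):
    """Sentence-level BLEU against a single reference.

    Assumption: add-one smoothing on every n-gram precision, so that a
    missing higher-order match does not zero out the whole score.
    """
    if not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        matches = sum((hyp_counts & ngrams(ref, n)).values())  # clipped matches
        total = sum(hyp_counts.values())
        log_precision += math.log((matches + 1) / (total + 1)) / max_n
    # Brevity penalty: only hypotheses shorter than the reference are penalized.
    brevity_penalty = min(1.0, math.exp(1 - len(ref) / len(hyp)))
    return brevity_penalty * math.exp(log_precision)


def reinforce_loss(sample_log_probs, reward, baseline):
    """REINFORCE surrogate loss for one sampled translation.

    sample_log_probs: log-probabilities of the sampled tokens.
    Minimizing this value follows the policy gradient
    (reward - baseline) * grad(sum of log-probs).
    """
    return -(reward - baseline) * sum(sample_log_probs)


ref = "the cat sat on the mat".split()
hyp = "the cat sat on a mat".split()
reward = sentence_bleu(hyp, ref)
loss = reinforce_loss([-0.2, -0.1, -0.3, -0.4, -1.5, -0.2], reward, baseline=0.3)
```

Without smoothing, a hypothesis sharing no 4-gram with the reference would receive a reward of exactly zero regardless of its other n-gram matches, which hints at how sparse and noisy sentence-level BLEU is as a training signal.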
Generative Adversarial Networks with the generator-discriminator setup are a follow-up of this research. A trained discriminator plays the role of the optimization metric: its goal is to distinguish between generated and human translations. The generator, on the other hand, tries to fool the discriminator by generating translations as close to the human references as possible.
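The two competing objectives can be written down for a single pair of discriminator scores. The following plain-Python sketch assumes the discriminator outputs a logit (a real-valued score) for one human and one generated translation; the function names are hypothetical, and the generator loss uses the common non-saturating form rather than anything specific to the presented papers. Because translations are discrete token sequences, the discriminator's output cannot simply be back-propagated into the generator; adversarial translation models typically use it as a reward in a policy-gradient update instead, much like the BLEU reward in reinforcement learning.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def discriminator_loss(logit_human, logit_generated):
    """Binary cross-entropy with human translations labelled 1 and
    generated translations labelled 0; uses 1 - sigmoid(z) = sigmoid(-z)."""
    return -(log_sigmoid(logit_human) + log_sigmoid(-logit_generated))


def generator_loss(logit_generated):
    """Non-saturating generator loss: low when the discriminator scores the
    generated translation as human-like."""
    return -log_sigmoid(logit_generated)


# A confident, correct discriminator has a low loss ...
print(discriminator_loss(4.0, -4.0))
# ... while the generator's loss drops as it fools the discriminator.
print(generator_loss(-4.0), generator_loss(4.0))
```

The two losses pull in opposite directions on the same score `logit_generated`: the discriminator wants it low, the generator wants it high.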
The following papers will be presented:
Facebook recently came up with a sequence-to-sequence architecture that is based entirely on convolutional networks. This allows parallel processing of the input sentence. The autoregressive nature of the decoder does not allow parallel decoding at inference time; however, parallel processing in the decoder is still possible at training time, when the target sentence is known.
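The difference between training and inference can be illustrated with a single causal convolution over a sequence of scalars. This plain-Python sketch is a deliberate simplification, not the convolutional architecture from the papers (which stacks many layers with gated linear units and attention); the function names, the scalar "tokens" and the kernel values are assumptions for illustration. Causality comes from zero-padding on the left only, so position t never sees later positions.

```python
def causal_conv1d(xs, kernel):
    """Causal 1-D convolution: output t depends only on inputs t-k+1 .. t."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(xs)  # pad on the left only
    return [sum(w * padded[t + j] for j, w in enumerate(kernel))
            for t in range(len(xs))]


def training_pass(target, kernel, bos=1.0):
    """Teacher forcing: the target is known, so the decoder input is the
    target shifted right by one position. All output positions come from a
    single call and do not depend on each other's outputs, so they can be
    computed in parallel."""
    decoder_input = [bos] + list(target[:-1])
    return causal_conv1d(decoder_input, kernel)


def greedy_inference(length, kernel, bos=1.0):
    """Inference: each new decoder input is the model's own previous output,
    so the steps have to run one after another. (The whole prefix is
    recomputed at every step purely for simplicity.)"""
    inputs = [bos]
    outputs = []
    for _ in range(length):
        y = causal_conv1d(inputs, kernel)[-1]  # output at the newest position
        outputs.append(y)
        inputs.append(y)
    return outputs


kernel = [0.5, 1.0]
generated = greedy_inference(3, kernel)
# Feeding the generated sequence back as a known target reproduces it in one
# parallel pass, because no position looks at the future.
parallel = training_pass(generated, kernel)
```

The loop in `greedy_inference` is the sequential bottleneck the paragraph above refers to; `training_pass` has no such loop over output positions.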
The architecture was introduced in a series of two papers:
Non-autoregressive models can generate the whole output sequence in parallel and do not need to wait for the previous word to be generated before updating the hidden state.
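The contrast between the two decoding regimes can be sketched with toy probability tables in plain Python. Everything here is hypothetical: `toy_step` stands in for an autoregressive decoder, `toy_positions` stands in for a non-autoregressive decoder that (as an assumption of this sketch) already knows the target length, and the tiny vocabulary is invented.

```python
def autoregressive_decode(step, max_len, eos="</s>"):
    """Greedy decoding: each distribution is conditioned on the prefix
    generated so far, so tokens are produced strictly one at a time."""
    output = []
    for _ in range(max_len):
        dist = step(output)
        token = max(dist, key=dist.get)
        if token == eos:
            break
        output.append(token)
    return output


def non_autoregressive_decode(position_dists):
    """Each position's distribution depends only on the source (and the
    target length), so every position is decided independently and all of
    them could be computed at once."""
    return [max(dist, key=dist.get) for dist in position_dists]


def toy_step(prefix):
    """Hypothetical autoregressive model with two valid outputs:
    'many thanks' and 'thank you'."""
    if not prefix:
        return {"many": 0.6, "thank": 0.4}
    if prefix[-1] == "many":
        return {"thanks": 0.9, "you": 0.1}
    if prefix[-1] == "thank":
        return {"you": 0.9, "thanks": 0.1}
    return {"</s>": 1.0}


# Hypothetical per-position marginals mixing the same two translations.
toy_positions = [{"many": 0.6, "thank": 0.4}, {"you": 0.6, "thanks": 0.4}]

print(autoregressive_decode(toy_step, max_len=5))  # ['many', 'thanks']
print(non_autoregressive_decode(toy_positions))    # ['many', 'you']
```

The second result mixes two valid translations into an invalid one. This is a known weakness of choosing all positions independently, and it is one of the issues non-autoregressive models have to work around.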
The group that chooses this topic will select two papers from the following list: