This seminar introduces students to current research trends in machine translation using deep neural networks. Most importantly, students should learn how to navigate the ever-growing body of literature on empirical research in machine translation and critically assess its content. The semester consists of a few lectures summarizing the state of the art, discussions of reading assignments, and student presentations of selected papers.
SIS code: NPFL116
Semester: summer
E-credits: 3
Examination: 0/2 C
Instructors: Jindřich Libovický, Jindřich Helcl
The course is not taught this semester. We look forward to seeing you in 2020.
1. Introductory notes on machine translation and deep learning Logistics NN Intro Reading Questions
2. Neural architectures for NLP Slides Reading Questions
3. Attentive sequence-to-sequence learning using RNNs Slides Reading Questions
4. Sequence-to-sequence learning with self-attention, a.k.a. the Transformer Slides Reading: BPE Reading: Backtranslation Questions
5. Tricks for improving NMT performance Reading Questions
6. Unsupervised Neural Machine Translation
7. Generative Adversarial Networks
Introduction
Reading 1.5 hours
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature 521.7553 (2015): 436–444.
Questions
Feb 28 Slides
Covered topics: embeddings, RNNs, vanishing gradient, LSTM, 1D convolution
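To make the covered building blocks concrete, here is a minimal PyTorch sketch (not from the slides; all sizes are arbitrary illustrative choices) showing an embedding layer feeding both an LSTM and a 1D convolution:

```python
# Minimal sketch of the covered building blocks; sizes are illustrative only.
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden_dim = 1000, 64, 128
tokens = torch.randint(0, vocab_size, (1, 10))  # a batch with one 10-token sentence

embed = nn.Embedding(vocab_size, emb_dim)
x = embed(tokens)                               # (1, 10, 64)

# An LSTM mitigates the vanishing gradient of a plain RNN via its gated cell state.
lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
states, _ = lstm(x)                             # (1, 10, 128)

# 1D convolution over the time axis; Conv1d expects (batch, channels, time).
conv = nn.Conv1d(emb_dim, hidden_dim, kernel_size=3, padding=1)
conv_states = conv(x.transpose(1, 2)).transpose(1, 2)  # (1, 10, 128)
```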
Reading 2 hours
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
Questions
Mar 6 Slides
Covered topics: recurrent language model, RNN decoder, feedforward attention mechanism
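As an illustration of the feedforward (additive) attention from the Bahdanau et al. reading, here is a minimal PyTorch sketch of a single decoder step; the dimensions are arbitrary illustrative values:

```python
# Feedforward (additive) attention for one decoder step, following
# e_ij = v^T tanh(W_dec s_{i-1} + W_enc h_j); sizes are illustrative.
import torch
import torch.nn as nn

enc_dim, dec_dim, att_dim, src_len = 128, 128, 64, 7
enc_states = torch.randn(src_len, enc_dim)   # encoder states h_1 .. h_T
dec_state = torch.randn(dec_dim)             # previous decoder state s_{i-1}

W_enc = nn.Linear(enc_dim, att_dim, bias=False)
W_dec = nn.Linear(dec_dim, att_dim, bias=False)
v = nn.Linear(att_dim, 1, bias=False)

energies = v(torch.tanh(W_dec(dec_state) + W_enc(enc_states))).squeeze(-1)
alpha = torch.softmax(energies, dim=0)       # attention distribution over source
context = alpha @ enc_states                 # weighted sum of encoder states
```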
Reading 2 hours
Vaswani, Ashish, et al. Attention is all you need. Advances in Neural Information Processing Systems. 2017.
Questions
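The core operation of the Transformer reading above is scaled dot-product self-attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal sketch with illustrative sizes:

```python
# Scaled dot-product self-attention over one sequence; sizes are illustrative.
import math
import torch

seq_len, d_k = 5, 16
Q = torch.randn(seq_len, d_k)
K = torch.randn(seq_len, d_k)
V = torch.randn(seq_len, d_k)

scores = Q @ K.T / math.sqrt(d_k)          # pairwise compatibility of positions
weights = torch.softmax(scores, dim=-1)    # each position attends over all others
output = weights @ V                       # all positions computed in parallel
```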
Mar 27 Slides
Covered topics:
Reading: BPE Reading: Backtranslation (2 hours in total; a BPE sketch follows below)
Questions
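The BPE reading boils down to one greedy loop: start from characters and repeatedly merge the most frequent adjacent symbol pair. A minimal pure-Python sketch of learning the merges (illustrative only, not the paper's exact implementation, which also uses end-of-word markers):

```python
# Learn byte-pair-encoding merges by greedily merging the most frequent pair.
from collections import Counter

def learn_bpe(words, num_merges):
    vocab = Counter(tuple(w) for w in words)   # each word as a tuple of symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)       # most frequent adjacent pair
        merges.append(best)
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])  # apply the merge
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = freq
        vocab = merged
    return merges

print(learn_bpe(["lower", "lowest", "newer", "wider"], 5))
```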
Apr 3
Covered topics:
Reading 1 hour
Yun Chen, Kyunghyun Cho, Samuel R. Bowman, Victor O.K. Li: Stable and Effective Trainable Greedy Decoding for Sequence to Sequence Learning. Accepted to ICLR 2018.
Questions
Apr 10
Unsupervised machine translation is an active research topic whose goal is to build a machine translation system without the huge parallel corpora normally needed to train the models.
So far, there have been two papers on this topic.
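As a rough illustration of how training without parallel data can work, here is a schematic Python sketch of an iterative back-translation loop; the model objects and their methods are hypothetical placeholders, not the exact algorithm of the assigned papers:

```python
# Schematic iterative back-translation: two translation directions bootstrap
# each other from monolingual data. All names below are hypothetical stand-ins.
def unsupervised_training_epoch(model_src2tgt, model_tgt2src, mono_src, mono_tgt):
    for src_sent, tgt_sent in zip(mono_src, mono_tgt):
        # Translate monolingual data with the current (imperfect) models ...
        pseudo_tgt = model_src2tgt.translate(src_sent)
        pseudo_src = model_tgt2src.translate(tgt_sent)
        # ... and use the synthetic pairs as supervised training examples.
        model_tgt2src.train_step(source=pseudo_tgt, target=src_sent)
        model_src2tgt.train_step(source=pseudo_src, target=tgt_sent)
```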
Apr 17
Two years ago, there were many papers attempting to use reinforcement learning for machine translation and to optimize the model directly towards the sentence-level BLEU score instead of cross-entropy, which appears to be clearly sub-optimal. These methods have not been very successful, mainly because of the inherent limitations of the BLEU score.
Generative Adversarial Networks with their generator-discriminator setup are a follow-up to this research. A trained discriminator plays the role of the optimization metric: its goal is to distinguish between generated and human translations, while the generator tries to fool the discriminator and produce translations as close to the human reference as possible.
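A minimal PyTorch sketch of the adversarial objective described above. Real adversarial NMT additionally needs policy-gradient training because sampling discrete words is not differentiable, so this sketch works on continuous stand-in representations for illustration only:

```python
# Adversarial objective: D separates human from generated translations,
# G tries to fool D. Continuous vectors stand in for sentences here.
import torch
import torch.nn as nn

dim = 32
D = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))
G = nn.Linear(dim, dim)                  # stand-in for a full NMT generator

human = torch.randn(8, dim)              # representations of human translations
generated = G(torch.randn(8, dim))       # representations of generated ones
bce = nn.BCEWithLogitsLoss()

# Discriminator loss: push human -> 1, generated -> 0.
d_loss = bce(D(human), torch.ones(8, 1)) + \
         bce(D(generated.detach()), torch.zeros(8, 1))

# Generator loss: make the discriminator output 1 on generated translations.
g_loss = bce(D(generated), torch.ones(8, 1))
```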
The following papers will be presented:
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... & Bengio, Y. (2014). Generative Adversarial Nets. In Advances in neural information processing systems (pp. 2672-2680).
Wu, L., Xia, Y., Zhao, L., Tian, F., Qin, T., Lai, J., & Liu, T. Y. (2017). Adversarial neural machine translation. arXiv preprint arXiv:1704.06933.
Apr 24
Non-autoregressive models can generate the whole output sequence in parallel: they do not need to wait until the previous word is generated before updating the hidden state (see the decoding sketch after the papers below).
Gu, J., Bradbury, J., Xiong, C., Li, V. O., & Socher, R. (2017). Non-Autoregressive Neural Machine Translation. arXiv preprint arXiv:1711.02281.
Lee, J., Mansimov, E., & Cho, K. (2018) Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement. arXiv preprint arXiv:1802.06901.
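The decoding sketch referenced above: a toy PyTorch contrast between an autoregressive loop, where each step waits for the previous word, and a single parallel pass. The `step` function and the per-position encodings are hypothetical stand-ins for real model components:

```python
# Toy contrast between autoregressive and non-autoregressive decoding.
import torch

vocab, hidden, tgt_len = 100, 32, 6
embed = torch.nn.Embedding(vocab, hidden)
out_proj = torch.nn.Linear(hidden, vocab)

def step(prev_word, state):
    # Hypothetical recurrent step conditioned on the previously emitted word.
    return torch.tanh(state + embed(torch.tensor(prev_word)))

# Autoregressive: a sequential loop; each step needs the previous output word.
state, words = torch.zeros(hidden), [0]          # start from a BOS id (0)
for _ in range(tgt_len):
    state = step(words[-1], state)
    words.append(out_proj(state).argmax().item())

# Non-autoregressive: all target positions are predicted in one parallel pass.
positions = torch.randn(tgt_len, hidden)         # stand-in per-position encodings
parallel_words = out_proj(positions).argmax(dim=-1).tolist()
```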
May 15
Facebook recently came up with a sequence-to-sequence architecture based entirely on convolutional networks. This allows parallel processing of the input sentence. The autoregressive nature of the decoder does not allow parallel decoding at inference time; however, parallel decoding is still possible at training time, when the target sentence is known.
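A minimal PyTorch sketch of why this architecture parallelizes: the encoder convolution sees the whole source at once, and at training time the decoder can apply a causal (left-padded) convolution over the entire known target in one pass. Sizes and the padding trick are illustrative, not the papers' exact setup:

```python
# Convolutional encoder and a causal decoder convolution, both in one pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, src_len, tgt_len = 64, 9, 7
src = torch.randn(1, dim, src_len)     # (batch, channels, time)
tgt = torch.randn(1, dim, tgt_len)     # known target sentence at training time

enc_conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)
enc_states = enc_conv(src)             # all source positions in parallel

# Causal convolution: pad only on the left so position t never sees t+1, t+2, ...
dec_conv = nn.Conv1d(dim, dim, kernel_size=3)
dec_states = dec_conv(F.pad(tgt, (2, 0)))   # still one parallel pass over time
```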
The architecture was introduced in a series of two papers:
Gehring, Jonas, et al. (2017) A Convolutional Encoder Model for Neural Machine Translation. ACL 2017.
Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017, July). Convolutional Sequence to Sequence Learning. International Conference on Machine Learning (pp. 1243-1252).
There will be a reading assignment after every class. You will be given a few questions about the reading, which you should submit before the next lecture.
Students will form teams and present one of the selected groups of papers to their fellow students. They will prepare not only a presentation of the paper but also questions for the discussion that follows the presentation.
The other students should also familiarize themselves with the paper so that they can participate in the discussion.
Students are strongly encouraged to arrange a consultation with the course instructors at least one day before their presentation.
There will be a final written test that will not be graded.