Summer Semester 2017

Student Presentations


March 1

Course logistics: prerequisites ⚫ syllabus ⚫ how to get credits

Notes on deep learning: deep learning ⚫ network building blocks ⚫ network components as functional programming ⚫ deep learning alchemy ⚫ reading the learning curves

Recurrent Neural Networks: definition ⚫ RNN as a program ⚫ exercise with Euclid's algorithm
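To illustrate the "RNN as a program" view, here is a minimal sketch that assumes the exercise treats Euclid's algorithm as a state update applied repeatedly, the way an RNN cell is applied at every time step. The actual exercise used in class may differ, and the names step and run_recurrence are hypothetical.

```python
def step(state):
    """One recurrent step: map the state (a, b) to (b, a mod b).

    Once b reaches 0 the state is returned unchanged (a fixed point).
    """
    a, b = state
    if b == 0:
        return (a, b)
    return (b, a % b)


def run_recurrence(a, b, max_steps=100):
    """Apply the same step function repeatedly, like unrolling an RNN.

    Stops when the state no longer changes or after max_steps steps,
    and returns the first component of the final state.
    """
    state = (a, b)
    for _ in range(max_steps):
        new_state = step(state)
        if new_state == state:
            break
        state = new_state
    return state[0]
```

For example, run_recurrence(48, 18) passes through the states (18, 12), (12, 6), (6, 0) and returns 6.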

Reading: Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.

Question: What are the problems with the presented architecture? How do you think neural MT developed after this paper was published?


Project proposals for NPFL087 Statistical Machine Translation.

March 8

Recurrent Neural Networks: vanilla RNNs ⚫ vanishing gradient problem ⚫ understanding LSTMs ⚫ Gated Recurrent Units ⚫ neural language models ⚫ word embeddings ⚫ sampling from a language model

Reading: Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).
Question: What do you think is the main difference between Bahdanau's attention model and the concept of alignment in statistical MT?

March 15

Attentive sequence-to-sequence learning: RNN as a probabilistic model ⚫ encoder-decoder architecture ⚫ training vs. runtime decoding ⚫ Neural Turing machines as motivation for attention ⚫ attention model ⚫ attention vs. alignment

Implementation and performance: computational graph & backpropagation ⚫ memory consumption

Reading: Chung, Junyoung, Kyunghyun Cho, and Yoshua Bengio. "A character-level decoder without explicit segmentation for neural machine translation." arXiv preprint arXiv:1603.06147 (2016).
Question: Why do the authors not use a character-level encoder? How would you improve the architecture so that it allows character-level encoding?

March 23

Model Ensembling and Beam Search: beam search ⚫ ensembles ⚫ computing in log domain
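As a hedged illustration of "computing in log domain", the sketch below assumes an ensemble that takes the arithmetic mean of the models' probabilities for a token and a hypothesis score that is the sum of token log-probabilities. Other ensembling schemes exist, and the function names are hypothetical rather than taken from any toolkit.

```python
import math


def logsumexp(values):
    """Compute log(sum(exp(v))) without underflow by factoring out the maximum."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def ensemble_log_prob(log_probs):
    """Log of the arithmetic mean of the models' probabilities for one token."""
    return logsumexp(log_probs) - math.log(len(log_probs))


def sequence_log_prob(token_log_probs):
    """Score of a hypothesis: a sum of log-probabilities replaces a product of probabilities."""
    return sum(token_log_probs)
```

Working with log-probabilities keeps long hypotheses comparable: multiplying many small probabilities directly would underflow to zero, while their logarithms can be added safely.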

Big vocabulary problem: copy from source ⚫ subword units ⚫ character-level methods

Reading: Sennrich, Rico, et al. "Nematus: a Toolkit for Neural Machine Translation." arXiv preprint arXiv:1703.04357 (2017).
Question: Compare the Nematus models with the models from Bahdanau et al., 2014. How do they differ? Think of at least three differences.

March 29

Implementation in TensorFlow

Reading: Shen, Shiqi, et al. "Minimum Risk Training for Neural Machine Translation." Proceedings of ACL 2016 (2016).

April 5

Advanced Optimization: reinforcement learning ⚫ minimum risk training