# Summer Semester 2017

## Student Presentations

## Seminars

**March 1**

**Course logistics** ∍ *prerequisites* ⚫ *syllabus* ⚫ *how to get credits*

**Notes on deep learning** ∍ *deep learning* ⚫ *network building blocks* ⚫ *network components as functional programming* ⚫ *deep learning alchemy* ⚫ *reading the learning curves*

**Recurrent Neural Networks** ∍ *definition* ⚫ *RNN as a program* ⚫ *exercise with Euclid's algorithm*
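A minimal sketch of the "RNN as a program" view, assuming NumPy and illustrative weight names (`W_h`, `W_x`, `b` are not from the lecture): both a vanilla RNN and Euclid's algorithm apply one fixed update rule to a carried state.

```python
import numpy as np

def rnn_step(h, x, W_h, W_x, b):
    """One vanilla RNN step: the same update applied at every time
    step, threading a hidden state through the sequence."""
    return np.tanh(W_h @ h + W_x @ x + b)

def gcd(a, b):
    """Euclid's algorithm follows the same pattern: a fixed update
    rule applied repeatedly to a carried state (a, b)."""
    while b:
        a, b = b, a % b
    return a
```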

Reading: Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in Neural Information Processing Systems. 2014.

Question: What are the problems of the presented architecture? How do you think neural MT developed after this paper was published?

Project proposals for NPFL087 Statistical Machine Translation.

**March 8**

**Recurrent Neural Networks** ∍ *vanilla RNNs* ⚫ *vanishing gradient problem* ⚫ *understanding LSTMs* ⚫ *Gated Recurrent Units* ⚫ *neural language models* ⚫ *word embeddings* ⚫ *sampling from a language model*
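As a concrete reference for the gating discussion, here is one GRU step in NumPy; the weight names `W_*` (input) and `U_*` (state) are illustrative, not the lecture's notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, x, params):
    """One GRU step: gates decide how much of the previous state to
    keep versus overwrite, which mitigates the vanishing gradient."""
    z = sigmoid(params["W_z"] @ x + params["U_z"] @ h)      # update gate
    r = sigmoid(params["W_r"] @ x + params["U_r"] @ h)      # reset gate
    h_tilde = np.tanh(params["W_h"] @ x + params["U_h"] @ (r * h))
    return (1 - z) * h + z * h_tilde                        # interpolate
```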

Reading: Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).

Question: What do you think is the main difference between Bahdanau's attention model and the concept of alignment in statistical MT?

**March 15**

**Attentive sequence-to-sequence learning** ∍ *RNN as a probabilistic model* ⚫ *encoder-decoder architecture* ⚫ *training vs. runtime decoding* ⚫ *Neural Turing Machines as motivation for attention* ⚫ *attention model* ⚫ *attention vs. alignment*
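A minimal sketch of the additive attention model from the Bahdanau et al. reading; the parameter names `W_a`, `U_a`, `v_a` are illustrative. Each encoder state is scored against the previous decoder state, and the scores become a distribution over source positions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bahdanau_attention(s_prev, H, W_a, U_a, v_a):
    """Additive attention: score every encoder state H[i] against the
    previous decoder state s_prev, then return the weighted average of
    encoder states (the context vector) and the weights themselves."""
    scores = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_i) for h_i in H])
    alpha = softmax(scores)   # soft "alignment" over source positions
    context = alpha @ H       # context vector fed to the decoder
    return context, alpha
```

Unlike a hard alignment in statistical MT, `alpha` is a soft, differentiable distribution, so it can be trained end-to-end with the rest of the network.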

**Implementation and performance** ∍ *computational graph & backpropagation* ⚫ *memory consumption*

Reading: Chung, Junyoung, Kyunghyun Cho, and Yoshua Bengio. "A character-level decoder without explicit segmentation for neural machine translation." arXiv preprint arXiv:1603.06147 (2016).

Question: Why do the authors not use a character-level encoder? How would you modify the architecture so that it allows character-level encoding?

**March 23**

**Model Ensembling and Beam Search** ∍ *beam search* ⚫ *ensembles* ⚫ *computing in the log domain*
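One beam-search step can be sketched as follows; the data shapes are an assumption. Hypothesis scores are summed log-probabilities: working in the log domain turns products of probabilities into sums and avoids floating-point underflow.

```python
def beam_step(beams, log_probs, beam_size):
    """One step of beam search. `beams` is a list of (tokens, score)
    hypotheses; `log_probs[i]` is the model's log-distribution over the
    vocabulary given hypothesis i. Expand every hypothesis with every
    word and keep the `beam_size` best by total log-probability."""
    candidates = [
        (tokens + [w], score + log_probs[i][w])
        for i, (tokens, score) in enumerate(beams)
        for w in range(len(log_probs[i]))
    ]
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_size]
```

For an ensemble, `log_probs[i]` would be the (log of the) average of the member models' distributions rather than a single model's output.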

**Big vocabulary problem** ∍ *copying from source* ⚫ *subword units* ⚫ *character-level methods*
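The subword-unit idea (byte-pair encoding, as popularized for NMT by Sennrich et al.) can be sketched as repeatedly merging the most frequent adjacent symbol pair, so frequent words stay whole while rare words split into smaller units. The vocabulary representation below is an assumption for illustration.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs over a {symbol-tuple: frequency}
    vocabulary and return the most frequent pair."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(words, pair):
    """Apply one BPE merge: replace every occurrence of `pair` with
    the concatenated symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged
```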

Reading: Sennrich, Rico, et al. "Nematus: a Toolkit for Neural Machine Translation." arXiv preprint arXiv:1703.04357 (2017).

Question: Compare the Nematus models with the models from Bahdanau et al., 2014. How do they differ? Think of at least three differences.

**March 29**

**Implementation in TensorFlow**

Reading: Shen, Shiqi, et al. "Minimum Risk Training for Neural Machine Translation." Proceedings of ACL 2016 (2016).

Question: ???

**April 5**

**Advanced Optimization** ∍ *reinforcement learning* ⚫ *minimum risk training*
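A sketch of the minimum risk training objective from the March 29 reading (Shen et al., 2016): the expected cost of a set of candidate translations under the model's smoothed, renormalized distribution. The smoothing value used here is illustrative, not the paper's tuned setting.

```python
import numpy as np

def expected_risk(log_probs, costs, alpha=0.005):
    """Expected risk over a sampled candidate set: sharpen the model's
    log-probabilities by `alpha`, renormalize over the candidates, and
    take the weighted average of their costs (e.g. 1 - sentence BLEU).
    Minimizing this pushes probability mass toward low-cost outputs."""
    q = np.exp(alpha * np.asarray(log_probs))
    q /= q.sum()
    return float(q @ np.asarray(costs))
```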