Deep Learning – Summer 2017/18
In recent years, deep neural networks have been used to solve complex machine-learning problems. They have achieved significant state-of-the-art results in many areas.
The goal of the course is to introduce deep neural networks, from the basics to the latest advances. The course will focus on both theory and practical aspects (students will implement and train several deep neural networks capable of achieving state-of-the-art results, for example in named entity recognition, dependency parsing, machine translation, image labeling or in playing video games). No previous knowledge of artificial neural networks is required, but a basic understanding of machine learning is advisable.
About
SIS code: NPFL114
Semester: summer
E-credits: 7
Examination: 3/2 C+Ex
Guarantor: Milan Straka
Timespace Coordinates
- lecture: Czech lecture is held on Monday 10:40 in S9, English lecture on Monday 13:10 in S9
- practicals: there are four parallel practicals, on Monday 15:40 in SU1, Monday 17:20 in SU1, on Tuesday 9:00 in SU1 and on Tuesday 10:40 in SU1
Lectures
1. Introduction to Deep Learning Slides Video numpy_entropy mnist_layers_activations
2. Training Neural Networks Slides Video mnist_training gym_cartpole
3. Training Neural Networks II Slides Video mnist_dropout uppercase
4. Convolutional Networks Slides Video mnist_conv mnist_competition
5. Convolutional Networks II Slides Video mnist_batchnorm fashion_masks
6. Easter Monday 3d_recognition
7. Object Detection & Segmentation, Recurrent Neural Networks Slides Video nsketch_transfer sequence_classification sequence_prediction
8. Recurrent Neural Networks II, Word Embeddings Slides Video tagger_we tagger_cle tagger_cnne tagger_sota
9. Recurrent Neural Networks III, Machine Translation Slides Video lemmatizer_noattn lemmatizer_attn lemmatizer_sota
10. Deep Generative Models Slides Video vae gan dcgan nli
11. Sequence Prediction, Reinforcement Learning Slides Video tagger_crf phoneme_recognition monte_carlo
12. Sequence Prediction II, Reinforcement Learning II Slides Video q_learning q_network reinforce reinforce_with_baseline reinforce_with_pixels
13. Practical Methodology, TF Development, Advanced Architectures Slides Video Master Thesis Proposals hyperparams_gp hyperparams_rl eager_mnist estimator_mnist
Requirements
To pass the practicals, you need to obtain at least 80 points, which are awarded for home assignments. Note that up to 40 points above 80 will be transferred to the exam.
To pass the exam, you need to obtain at least 55, 70 and 85 out of 100 points for the written exam (plus up to 40 points from the practicals), to obtain grades 3, 2 and 1, respectively.
The lecture content, including references to study materials, is listed below. The main study material is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (referred to as DLB).
References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.
The student recordings of the lectures and the practicals are available here.
1. Introduction to Deep Learning
Feb 26 Slides Video numpy_entropy mnist_layers_activations
- Random variables, probability distributions, expectation, variance, Bernoulli distribution, Categorical distribution [Sections 3.2, 3.3, 3.8, 3.9.1 and 3.9.2 of DLB]
- Self-information, entropy, cross-entropy, KL-divergence [Section 3.13 of DLB]
- Gaussian distribution [Section 3.9.3 of DLB]
- Machine Learning Basics [Sections 5.1-5.1.3 of DLB]
- History of Deep Learning [Section 1.2 of DLB]
- Linear regression [Section 5.1.4 of DLB]
- Brief description of Logistic Regression, Maximum Entropy models and SVM [Sections 5.7.1 and 5.7.2 of DLB]
- Challenges Motivating Deep Learning [Section 5.11 of DLB]
- Neural network basics (this topic is treated in detail within the lecture NAIL002)
- Neural networks as graphs [Chapter 6 before Section 6.1 of DLB]
- Output activation functions [Section 6.2.2 of DLB, excluding Section 6.2.2.4]
- Hidden activation functions [Section 6.3 of DLB, excluding Section 6.3.3]
- Basic network architectures [Section 6.4 of DLB, excluding Section 6.4.2]
2. Training Neural Networks
Mar 05 Slides Video mnist_training gym_cartpole
- Capacity, overfitting, underfitting, regularization [Section 5.2 of DLB]
- Hyperparameters and validation sets [Section 5.3 of DLB]
- Maximum Likelihood Estimation [Section 5.5 of DLB]
- Neural network training (this topic is treated in detail within the lecture NAIL002)
- Gradient Descent and Stochastic Gradient Descent [Sections 4.3 and 5.9 of DLB]
- Backpropagation algorithm [Sections 6.5 to 6.5.3 of DLB, especially Algorithms 6.2 and 6.3; note that Algorithms 6.5 and 6.6 are used in practice]
- SGD algorithm [Section 8.3.1 and Algorithm 8.1 of DLB]
- SGD with Momentum algorithm [Section 8.3.2 and Algorithm 8.2 of DLB]
- SGD with Nesterov Momentum algorithm [Section 8.3.3 and Algorithm 8.3 of DLB]
- Optimization algorithms with adaptive gradients
- AdaGrad algorithm [Section 8.5.1 and Algorithm 8.4 of DLB]
- RMSProp algorithm [Section 8.5.2 and Algorithm 8.5 of DLB]
- Adam algorithm [Section 8.5.3 and Algorithm 8.7 of DLB]
3. Training Neural Networks II
Mar 12 Slides Video mnist_dropout uppercase
- Training a neural network with a single hidden layer
- Playing with TensorFlow Playground
- Softmax with NLL (negative log likelihood) as a loss function [Section 6.2.2.3 of DLB, notably equation (6.30); plus slides 10-12]
- Regularization [Chapter 7 until Section 7.1 of DLB]
- Early stopping [Section 7.8 of DLB, without the "How early stopping acts as a regularizer" part]
- L2 and L1 regularization [Sections 7.1 and 5.6.1 of DLB; plus slides 17-18]
- Dataset Augmentation [Section 7.4 of DLB]
- Ensembling [Section 7.11 of DLB]
- Dropout [Section 7.12 of DLB]
4. Convolutional Networks
Mar 19 Slides Video mnist_conv mnist_competition
- Saturating non-linearities [Section 6.3.2 and second half of Section 6.2.2.2 of DLB]
- Gradient clipping [Section 10.11.1 of DLB]
- Parameter initialization strategies [Section 8.4 of DLB]
- Introduction to convolutional networks [Chapter 9 and Sections 9.1-9.3 of DLB]
- Convolution as operation on 4D tensors [Section 9.5 of DLB, notably Equations (9.7) and (9.8)]
- Max pooling and average pooling [Section 9.3 of DLB]
- Stride and Padding schemes [Section 9.5 of DLB]
- AlexNet [Alex Krizhevsky et al.: ImageNet Classification with Deep Convolutional Neural Networks]
- Prior probabilities of convolutional network architecture [Deep Image Prior]
5. Convolutional Networks II
Mar 26 Slides Video mnist_batchnorm fashion_masks
- VGG [Karen Simonyan and Andrew Zisserman: Very Deep Convolutional Networks for Large-Scale Image Recognition]
- GoogLeNet (aka Inception) [Christian Szegedy et al.: Going Deeper with Convolutions]
- Batch normalization [Section 8.7.1 of DLB, optionally the paper Sergey Ioffe and Christian Szegedy: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift]
- Inception v2 and v3 [Rethinking the Inception Architecture for Computer Vision]
- ResNet [Kaiming He et al.: Deep Residual Learning for Image Recognition]
- WideNet [Wide Residual Networks]
- ResNeXt [Aggregated Residual Transformations for Deep Neural Networks]
- NasNet [Learning Transferable Architectures for Scalable Image Recognition]
6. Easter Monday
Apr 02 3d_recognition
Easter Monday
7. Object Detection & Segmentation, Recurrent Neural Networks
Apr 09 Slides Video nsketch_transfer sequence_classification sequence_prediction
- Object detection using Fast R-CNN [Ross Girshick: Fast R-CNN]
- Proposing RoIs using Faster R-CNN [Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]
- Image segmentation [Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN]
- Layer Normalization [Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton: Layer Normalization]
- Group Normalization [Yuxin Wu, Kaiming He: Group Normalization]
- Sequence modelling using Recurrent Neural Networks (RNN) [Chapter 10 until Section 10.2.1 (excluding) of DLB]
- The challenge of long-term dependencies [Section 10.7 of DLB]
8. Recurrent Neural Networks II, Word Embeddings
Apr 16 Slides Video tagger_we tagger_cle tagger_cnne tagger_sota
- Long Short-Term Memory (LSTM) [Section 10.10.1 of DLB, Sepp Hochreiter, Jürgen Schmidhuber (1997): Long short-term memory, Felix A. Gers, Jürgen Schmidhuber, Fred Cummins (2000): Learning to Forget: Continual Prediction with LSTM]
- Gated Recurrent Unit (GRU) [Section 10.10.2 of DLB, Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation]
- Word2vec word embeddings, notably the CBOW and Skip-gram architectures [Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean: Efficient Estimation of Word Representations in Vector Space]
- Hierarchical softmax and Negative sampling [Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean: Distributed Representations of Words and Phrases and their Compositionality]
- Bidirectional RNN [Section 10.3 of DLB]
- Character-level embeddings using Recurrent neural networks [C2W model from Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W. Black, Isabel Trancoso: Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation]
- Character-level embeddings using Convolutional neural networks [CharCNN from Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush: Character-Aware Neural Language Models]
- Character-level embeddings using character n-grams [Described simultaneously in several papers as Charagram (John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu: Charagram: Embedding Words and Sentences via Character n-grams), Subword Information (Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov: Enriching Word Vectors with Subword Information) or SubGram (Tom Kocmi, Ondřej Bojar: SubGram: Extending Skip-Gram Word Representation with Substrings)]
9. Recurrent Neural Networks III, Machine Translation
Apr 23 Slides Video lemmatizer_noattn lemmatizer_attn lemmatizer_sota
- Highway Networks [Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber: Training Very Deep Networks]
- Variational Dropout [Yarin Gal, Zoubin Ghahramani: A Theoretically Grounded Application of Dropout in Recurrent Neural Networks]
- Layer Normalization [Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton: Layer Normalization]
- Neural Machine Translation using Encoder-Decoder or Sequence-to-Sequence architecture [Ilya Sutskever, Oriol Vinyals, Quoc V. Le: Sequence to Sequence Learning with Neural Networks and Kyunghyun Cho et al.: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation]
- Using Attention mechanism in Neural Machine Translation [Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: Neural Machine Translation by Jointly Learning to Align and Translate]
- Translating Subword Units [Rico Sennrich, Barry Haddow, Alexandra Birch: Neural Machine Translation of Rare Words with Subword Units]
- Google NMT [Yonghui Wu et al.: Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation]
- Translating without RNNs with attention only [Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin: Attention Is All You Need]
10. Deep Generative Models
Apr 30 Slides Video vae gan dcgan nli
- Autoencoders (undercomplete, sparse, denoising) [Chapter 14, Sections 14-14.2.3 of DLB]
- Deep Generative Models using Differentiable Generator Nets [Section 20.10.2 of DLB]
- Variational Autoencoders [Section 20.10.3 plus Reparametrization trick from Section 20.9 (but not Section 20.9.1) of DLB, Diederik P Kingma, Max Welling: Auto-Encoding Variational Bayes]
- Generative Adversarial Networks [Section 20.10.4 of DLB, Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio: Generative Adversarial Networks, Alec Radford, Luke Metz, Soumith Chintala: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Martin Arjovsky, Soumith Chintala, Léon Bottou: Wasserstein GAN]
11. Sequence Prediction, Reinforcement Learning
May 07 Slides Video tagger_crf phoneme_recognition monte_carlo
Study material for Reinforcement Learning is the second edition of Reinforcement Learning: An Introduction by Richard S. Sutton, available only as a draft.
- Conditional Random Fields (CRF) loss [Sections 3.4.2 and A.7 of R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa: Natural Language Processing (Almost) from Scratch]
- Connectionist Temporal Classification (CTC) loss [A. Graves, S. Fernández, F. Gomez, J. Schmidhuber: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks]
- Multi-armed Bandits [Chapter 2, Sections 2.1-2.5 of Sutton's Book]
- General setting of Reinforcement Learning [Chapter 3, Sections 3.1-3.3 of Sutton's Book]
- Monte Carlo Reinforcement Learning Algorithm [Chapter 5, Sections 5.1-5.4 (especially the algorithm in 5.4) of Sutton's Book]
12. Sequence Prediction II, Reinforcement Learning II
May 14 Slides Video q_learning q_network reinforce reinforce_with_baseline reinforce_with_pixels
- Temporal Difference RL Methods [Section 6.1 of Sutton's Book]
- SARSA algorithm [Section 6.4 of Sutton's Book]
- Q-Learning algorithm [Section 6.5 of Sutton's Book]
- Deep Q-Network [Volodymyr Mnih et al.: Human-level control through deep reinforcement learning]
- Policy Gradient Methods [Section 13.1 of Sutton's Book]
- Policy Gradient Theorem [Section 13.2 of Sutton's Book]
- REINFORCE Algorithm [Section 13.3 of Sutton's Book; note that the gamma^t on the last line should not be there]
- REINFORCE with Baseline Algorithm [Section 13.4 of Sutton's Book; note that the gamma^t on the last two lines should not be there]
- Actor-Critic Reinforcement Learning Algorithm [Section 13.5 of Sutton's Book]
13. Practical Methodology, TF Development, Advanced Architectures
May 21 Slides Video Master Thesis Proposals hyperparams_gp hyperparams_rl eager_mnist estimator_mnist
- Hyperparameter selection using Reinforcement Learning [Learning Transferable Architectures for Scalable Image Recognition]
- Hyperparameter selection using Bayesian Optimization [Practical Bayesian Optimization of Machine Learning Algorithms, Google Vizier: A Service for Black-Box Optimization]
- TensorFlow Eager Mode
- TensorFlow Estimator API
- TensorFlow Data API
- WaveNet [WaveNet: A Generative Model for Raw Audio]
- Unsupervised Generation of a Word Dictionary [Word Translation Without Parallel Data]
- Memory Augmented Networks [One-shot learning with Memory-Augmented Neural Networks]
The tasks are evaluated automatically using the ReCodEx Code Examiner. The evaluation is performed using Python 3.4, TensorFlow 1.5.0, NumPy 1.14.0 and OpenAI Gym 0.9.5.
You can install TensorFlow 1.5.0 either to user packages using pip3 install --user tensorflow==1.5.0, or create a virtual environment using python3 -m venv VENV_DIR and then install TensorFlow inside it by running VENV_DIR/bin/pip3 install tensorflow==1.5.0.
Note that updates about the tasks (notably changes in the task descriptions) are announced on the UFAL NPFL114 mailing list. However, the mailing list will not contain anything not present on this website.
Teamwork
Working in teams of size 2 (or at most 3) is encouraged. All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. However, each such solution must explicitly list all members of the team to allow plagiarism detection using this template.
numpy_entropy
Deadline: Mar 12, 15:39 3 points
The goal of this exercise is to familiarize yourself with Python, NumPy and the ReCodEx submission system. Start with numpy_entropy.py.
Load a file numpy_entropy_data.txt, whose lines consist of data points of our dataset, and load numpy_entropy_model.txt, which describes a model probability distribution, with each line being a tab-separated pair of (data point, probability). Example files are in the labs/01 directory.
Then compute the following quantities using NumPy, and print each of them on a separate line rounded to two decimal places (or inf for positive infinity, which happens when an element of the data distribution has zero probability under the model distribution):
- entropy H(data distribution)
- cross-entropy H(data distribution, model distribution)
- KL-divergence D_KL(data distribution, model distribution)
Use natural logarithms to compute the entropies and the divergence. The evaluation on ReCodEx is performed on data structurally similar to numpy_entropy_eval_examples.zip.
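A minimal NumPy sketch of the three quantities follows. It assumes the data file holds one data point per line and the model file holds tab-separated (data point, probability) pairs, as described above; the helper names load_distributions and entropy_stats are hypothetical and not part of the numpy_entropy.py template.

```python
import numpy as np


def load_distributions(data_path, model_path):
    """Read data points and the model distribution (hypothetical helper)."""
    with open(data_path, encoding="utf-8") as data_file:
        data_points = [line.rstrip("\n") for line in data_file if line.strip()]
    model = {}
    with open(model_path, encoding="utf-8") as model_file:
        for line in model_file:
            if line.strip():
                point, probability = line.rstrip("\n").split("\t")
                model[point] = float(probability)
    return data_points, model


def entropy_stats(data_points, model):
    """Return (entropy, cross-entropy, KL-divergence) in nats."""
    values, counts = np.unique(np.array(data_points), return_counts=True)
    p = counts / counts.sum()  # empirical data distribution
    q = np.array([model.get(value, 0.0) for value in values])  # model probabilities
    entropy = -np.sum(p * np.log(p))
    with np.errstate(divide="ignore"):
        # log(0) = -inf, so a zero model probability yields +inf here
        cross_entropy = -np.sum(p * np.log(q))
    kl_divergence = cross_entropy - entropy  # D_KL(p || q) = H(p, q) - H(p)
    return entropy, cross_entropy, kl_divergence
```

Printing each value with "{:.2f}".format(value) already produces inf for an infinite cross-entropy or divergence, so no special case is needed for the output.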
mnist_layers_activations
Deadline: Mar 12, 15:39 5 points
The motivation of this exercise is to familiarize yourself a bit with TensorFlow and TensorBoard. Start by playing with mnist_example.py. Run it, and when it finishes, run TensorBoard using tensorboard --logdir logs. Then open http://localhost:6006 in a browser and explore the three active tabs: Scalars, Images and Graphs.
Your goal is to modify the mnist_layers_activations.py template and implement the following:
- A number of hidden layers (including zero) can be specified on the command line using the parameter layers.
- The activation function of these hidden layers can also be specified as a command line parameter activation, with supported values of none, relu, tanh and sigmoid.
- Print the final accuracy on the test set to standard output. Write the accuracy as a percentage rounded to two decimal places, e.g., 91.23.
In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard:
- 0 layers, activation none
- 1 layer, activation none, relu, tanh, sigmoid
- 3 layers, activation sigmoid, relu
- 5 layers, activation sigmoid
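The core of the task is a loop that stacks the requested number of hidden layers with the requested activation. The sketch below shows that logic as a NumPy forward pass rather than in TensorFlow, purely to illustrate the structure; in the TensorFlow template the loop body would presumably be a tf.layers.dense call. The names ACTIVATIONS, build_mlp, forward and accuracy_percent, the hidden size and the initialization scale are all hypothetical.

```python
import numpy as np

# Map the command-line values of `activation` to element-wise functions.
ACTIVATIONS = {
    "none": lambda x: x,
    "relu": lambda x: np.maximum(x, 0.0),
    "tanh": np.tanh,
    "sigmoid": lambda x: 1.0 / (1.0 + np.exp(-x)),
}


def build_mlp(input_dim, hidden_size, layers, num_classes, rng):
    """Create (weights, bias) pairs for `layers` hidden layers plus an output layer."""
    dims = [input_dim] + [hidden_size] * layers + [num_classes]
    return [(rng.normal(0.0, 0.1, size=(d_in, d_out)), np.zeros(d_out))
            for d_in, d_out in zip(dims[:-1], dims[1:])]


def forward(params, inputs, activation):
    """Apply the hidden layers with `activation`, then a linear output layer."""
    act = ACTIVATIONS[activation]
    hidden = inputs
    for weights, bias in params[:-1]:
        hidden = act(hidden @ weights + bias)
    weights, bias = params[-1]
    return hidden @ weights + bias  # logits; the softmax is left to the loss


def accuracy_percent(logits, labels):
    """Accuracy as a percentage string rounded to two decimal places."""
    return "{:.2f}".format(100.0 * np.mean(np.argmax(logits, axis=1) == labels))
```

With layers=0 the list contains only the output layer, which gives the plain softmax regression baseline from the first variation.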
mnist_training
Deadline: Mar 19, 15:39 4 points
This exercise should teach you to use different optimizers and learning rates (including exponential decay). Your goal is to modify the mnist_training.py template and implement the following:
- Using the specified optimizer (either SGD or Adam).
- Optionally using momentum for the SGD optimizer.
- Using the specified initial learning rate for the optimizer.
- Optionally use the given final learning rate. If the final learning rate is given, implement exponential learning rate decay (using tf.train.exponential_decay). Specifically, for the whole first epoch, train using the given initial learning rate. Then lower the learning rate between epochs by multiplying it each time by the same suitable constant, such that the whole last epoch is trained using the specified final learning rate.
- Print the final accuracy on the test set to standard output. Write the accuracy as a percentage rounded to two decimal places, e.g., 91.23.
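The constant for the decay follows from requiring that the last of E epochs uses the final rate: initial · c^(E-1) = final, so c = (final / initial)^(1/(E-1)). With tf.train.exponential_decay(learning_rate, global_step, decay_steps, decay_rate, staircase=True), setting decay_steps to the number of batches per epoch would keep the rate constant within each epoch. The plain-Python sketch below checks that arithmetic; the function names and the batch count in the usage are hypothetical.

```python
def decay_rate_for(initial_lr, final_lr, epochs):
    """Per-epoch multiplier c with initial_lr * c**(epochs - 1) == final_lr."""
    if epochs < 2:
        return 1.0  # a single epoch never decays
    return (final_lr / initial_lr) ** (1.0 / (epochs - 1))


def staircase_lr(initial_lr, decay_rate, global_step, decay_steps):
    """Staircase exponential decay: lr * rate**floor(step / decay_steps)."""
    return initial_lr * decay_rate ** (global_step // decay_steps)


def per_epoch_rates(initial_lr, final_lr, epochs, batches_per_epoch):
    """Learning rate used by the first batch of every epoch."""
    rate = decay_rate_for(initial_lr, final_lr, epochs)
    return [staircase_lr(initial_lr, rate, epoch * batches_per_epoch, batches_per_epoch)
            for epoch in range(epochs)]
```

For example, per_epoch_rates(0.01, 0.001, 5, 550) starts at 0.01 and ends at 0.001, with every batch of an epoch sharing that epoch's rate because of the integer division.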
In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard:
- SGD optimizer, learning_rate 0.01;
- SGD optimizer, learning_rate 0.01, momentum 0.9;
- SGD optimizer, learning_rate 0.1;
- Adam optimizer, learning_rate 0.001;
- Adam optimizer, learning_rate 0.01 and learning_rate_final 0.001.
gym_cartpole
Deadline: Mar 19, 15:39 5 points
Solve the CartPole-v1 environment from the OpenAI Gym, utilizing only the provided supervised training data. The data is available in the gym_cartpole-data.txt file, each line containing one observation (four space-separated floats) and a corresponding action (the last space-separated integer). Start with gym_cartpole.py.
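A small parsing sketch for this file format, assuming each non-empty line has four observation floats followed by the integer action; the function name parse_cartpole_lines is hypothetical.

```python
import numpy as np


def parse_cartpole_lines(lines):
    """Split lines of 'o1 o2 o3 o4 action' into observation and action arrays."""
    observations, actions = [], []
    for line in lines:
        columns = line.split()
        if not columns:
            continue  # skip blank lines
        observations.append([float(value) for value in columns[:4]])
        actions.append(int(columns[-1]))  # the action is the last column
    return np.array(observations, dtype=np.float32), np.array(actions, dtype=np.int32)
```

Because it accepts any iterable of lines, an open file can be passed directly, e.g. with open("gym_cartpole-data.txt") as f: observations, actions = parse_cartpole_lines(f).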
The solution to this task should be a model which passes evaluation on random inputs. This evaluation is performed by running the gym_cartpole_evaluate.py script, which loads a model and then evaluates it on 100 random episodes (optionally rendering if the --render option is provided). In order to pass, you must achieve an average reward of at least 475 on 100 episodes.
The size of the training data is very small and you should consider it when designing the model.
To submit your model in ReCodEx, use the supplied gym_cartpole_recodex.py script. When executed, the script embeds the saved model in the current directory into a script gym_cartpole_recodex_submission.py, which can be submitted in ReCodEx. Note that by default there are at most five submission attempts; write to me if you need more.
mnist_dropout
Deadline: Mar 26, 15:39 3 points
This exercise evaluates the effect of dropout. Your goal is to modify the mnist_dropout.py template and implement the following:
- Allow using dropout with the specified dropout rate on the hidden layer. The dropout must be active only during training and not during test set evaluation.
- Print the final accuracy on the test set to standard output. Write the accuracy as a percentage rounded to two decimal places, e.g., 91.23.
In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard (notably training, development and test set accuracy and loss):
- dropout rate 0, 0.3, 0.5, 0.6, 0.8, 0.9
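To make "active only during training" concrete, here is a NumPy sketch of inverted dropout, which scales the kept activations by 1/(1 - rate) during training so that no rescaling is needed at evaluation time. In TensorFlow 1.x, tf.layers.dropout(inputs, rate=..., training=...) provides this behavior controlled by a boolean training flag; the function below is only an illustration, not the template's code.

```python
import numpy as np


def dropout(inputs, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` during training only."""
    if not training or rate == 0.0:
        return inputs  # evaluation: the layer is the identity
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_mask = rng.random(inputs.shape) >= rate
    # Scale the kept units so the expected activation matches evaluation mode.
    return np.where(keep_mask, inputs / (1.0 - rate), 0.0)
```

The rescaling is why a high rate such as 0.9 still produces activations of the right magnitude on average, but with much higher variance, which is worth keeping in mind when comparing the training and development curves.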
uppercase
Deadline: Mar 26, 15:39 6-10 points
This assignment introduces the first textual task. Your goal is to implement a network which, given a Czech text, tries to uppercase the appropriate letters. Specifically, your goal is to uppercase the given test set as well as possible. The task data is available in the uppercase_data.zip archive. While the training and the development sets are in correct case, the test set is all in lowercase.
This is an open-data task, so you will submit only the uppercased test set (in addition to a training script, which will be used only to understand the approach you took).
The task is also a competition. Everyone who submits a solution which achieves at least 96.5% accuracy will get 6 points; the remaining 4 points will be distributed depending on the relative ordering of your solutions, i.e., the best solution will get a total of 10 points, and the worst solution (but with at least 96.5% accuracy) will get a total of 6 points. The accuracy is computed per character and will be evaluated by the uppercase_eval.py script.
If you want, you can start with the uppercase.py template, which loads the data, generates an alphabet of given size containing the most frequent characters, and can generate a sliding window view on the data.
To represent characters, you might find tf.one_hot useful.
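One way to combine the sliding window with a one-hot encoding is sketched below in NumPy (inside the TensorFlow graph, tf.one_hot would do the last step). It assumes index 0 is reserved for padding and index 1 for characters outside the alphabet; this id layout and the helper names are hypothetical and need not match what the uppercase.py template does. Each window, centred on one character, is the input for predicting whether that character is uppercase.

```python
import numpy as np

PAD, UNK = 0, 1  # hypothetical reserved ids


def encode(text, alphabet):
    """Map lowercased characters to ids (lower() keeps the length for Czech letters)."""
    index = {char: i + 2 for i, char in enumerate(alphabet)}
    return np.array([index.get(char, UNK) for char in text.lower()], dtype=np.int64)


def sliding_windows(ids, window):
    """Row i holds the ids from position i - window to i + window, padded at the ends."""
    padded = np.pad(ids, window, mode="constant", constant_values=PAD)
    return np.stack([padded[i:i + 2 * window + 1] for i in range(len(ids))])


def one_hot_windows(windows, depth):
    """One-hot encode every id and flatten each window into a single input vector."""
    return np.eye(depth, dtype=np.float32)[windows].reshape(len(windows), -1)


def uppercase_targets(text):
    """Gold labels: 1 where the original character is uppercase."""
    return np.array([char.isupper() for char in text], dtype=np.int64)
```

The flattened one-hot windows fit the restriction to densely connected layers: a window of w characters on each side over an alphabet of size A becomes a vector of length (2w + 1)(A + 2).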
To submit the uppercased test set in ReCodEx, use the supplied uppercase_recodex.py script. You need to provide at least two arguments: the first is the path to the uppercased test data and all other arguments are paths to the sources used to generate the test data. Running the script will create the uppercase_recodex_submission.py file, which can be submitted in ReCodEx.
Do not use RNNs or CNNs in this task, only densely connected layers (with various activation and output functions).
mnist_conv
Deadline: Apr 02, 15:39 3 points
In this assignment, you will be training convolutional networks. Start with the mnist_conv.py template and implement the following functionality using the tf.layers module.
The architecture of the network is described by the cnn parameter, which contains comma-separated specifications of sequential layers:
- C-filters-kernel_size-stride-padding: Add a convolutional layer with ReLU activation and specified number of filters, kernel size, stride and padding. Example: C-10-3-1-same
- M-kernel_size-stride: Add max pooling with specified size and stride. Example: M-3-2
- F: Flatten inputs.
- R-hidden_layer_size: Add a dense layer with ReLU activation and specified size. Example: R-100
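Parsing the cnn argument is independent of TensorFlow, so it can be sketched in plain Python. The parse_cnn function and its tuple format are hypothetical; each tuple would then be turned into the corresponding tf.layers call (for example tf.layers.conv2d, tf.layers.max_pooling2d, tf.layers.flatten or tf.layers.dense).

```python
def parse_cnn(spec):
    """Turn e.g. 'C-10-3-2-same,M-3-2,F,R-100' into a list of layer tuples."""
    layers = []
    for item in spec.split(","):
        kind, *args = item.split("-")
        if kind == "C":    # C-filters-kernel_size-stride-padding
            filters, kernel_size, stride = map(int, args[:3])
            layers.append(("conv", filters, kernel_size, stride, args[3]))
        elif kind == "M":  # M-kernel_size-stride
            pool_size, stride = map(int, args)
            layers.append(("maxpool", pool_size, stride))
        elif kind == "F":  # F
            layers.append(("flatten",))
        elif kind == "R":  # R-hidden_layer_size
            layers.append(("dense", int(args[0])))
        else:
            raise ValueError("Unknown layer specification: " + item)
    return layers
```

Keeping parsing separate from graph construction makes it easy to add further layer kinds later, such as the CB layer of mnist_batchnorm.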
For example, when using --cnn=C-10-3-2-same,M-3-2,F,R-100, the development accuracies after the first five epochs on my CPU TensorFlow version are 95.14, 97.00, 97.68, 97.66, and 97.98. However, some students also obtained slightly different results on their computers and still passed ReCodEx evaluation.
After implementing this task, you should continue with mnist_batchnorm.
mnist_competition
Deadline: Apr 02, 15:39 5-10 points
The goal of this assignment is to devise the best possible model for the MNIST data set. However, in order for the test set results not to be available, use the data from mnist-gan.zip. It was created using GANs (generative adversarial networks) from the original MNIST data and contains fake test labels (all labels are 255).
This is an open-data task, you will submit only test set labels (in addition to a training script, which will be used only to understand the approach you took).
The task is a competition and the points will be awarded depending on your test set accuracy. If your test set accuracy surpasses 99.4%, you will be awarded 5 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions.
The mnist_competition.py template loads data from the mnist-gan directory and in the end saves the test labels in the required format (each label on a separate line).
To submit the test set labels in ReCodEx, use the supplied mnist_competition_recodex.py script. You need to provide at least two arguments: the first is the path to the test set labels and all other arguments are paths to the sources used to generate the test data. Running the script will create the mnist_competition_recodex_submission.py file, which can be submitted in ReCodEx.
mnist_batchnorm
Deadline: Apr 08, 23:59 3 points
In this assignment, you will extend the mnist_conv assignment to support batch normalization. Start with the mnist_batchnorm.py template and, in addition to all functionality of mnist_conv, also implement the following layer:
- CB-filters-kernel_size-stride-padding: Add a convolutional layer with BatchNorm and ReLU activation and specified number of filters, kernel size, stride and padding. Example: CB-10-3-1-same
To correctly implement BatchNorm:
- The convolutional layer should not use any activation and no biases.
- The output of the convolutional layer is passed to the batch normalization layer tf.layers.batch_normalization, which should specify training=True during training and training=False during inference.
- The output of the batch normalization layer is passed through tf.nn.relu.
- You need to update the moving averages of mean and variance in the batch normalization layer during each training batch. Such update operations can be obtained using tf.get_collection(tf.GraphKeys.UPDATE_OPS) and utilized either directly in session.run, or (preferably) attached to self.train using tf.control_dependencies.
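The difference between training=True and training=False, and what the UPDATE_OPS update does, can be illustrated with a NumPy sketch of batch normalization over NHWC feature maps. It borrows the default momentum (0.99) and epsilon (0.001) of tf.layers.batch_normalization but is not that implementation, and the class name is hypothetical. In the TensorFlow template, the moving-average update would run because the training operation is created inside a with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)): block, whereas here it happens explicitly in the training branch.

```python
import numpy as np


class BatchNormSketch:
    """Batch normalization over the last (channel) axis with moving statistics."""

    def __init__(self, channels, momentum=0.99, epsilon=1e-3):
        self.gamma = np.ones(channels)   # learned scale
        self.beta = np.zeros(channels)   # learned offset (replaces the conv bias)
        self.moving_mean = np.zeros(channels)
        self.moving_var = np.ones(channels)
        self.momentum = momentum
        self.epsilon = epsilon

    def __call__(self, inputs, training):
        if training:
            axes = tuple(range(inputs.ndim - 1))  # batch, height, width
            mean = inputs.mean(axis=axes)
            var = inputs.var(axis=axes)
            # The update that must run after every training batch.
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            # Inference uses the accumulated statistics, not the batch ones.
            mean, var = self.moving_mean, self.moving_var
        normalized = (inputs - mean) / np.sqrt(var + self.epsilon)
        return self.gamma * normalized + self.beta
```

Applying np.maximum(layer(x, training), 0.0) afterwards corresponds to the tf.nn.relu step. If the update were skipped, inference would keep using the initial zero mean and unit variance, which is a common reason for poor development accuracy despite good training accuracy.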
For example, when using --cnn=CB-10-3-2-same,M-3-2,F,R-100, the development accuracies after the first five epochs on my CPU TensorFlow version are 95.92, 97.54, 97.84, 97.76, and 98.18. However, some students also obtained slightly different results on their computers and still passed ReCodEx evaluation.
You can now experiment with various architectures and try obtaining best accuracy on MNIST.
fashion_masks
Deadline: Apr 08, 23:59 6-12 points
This assignment is a simple image segmentation task. The data for this task is available from fashion-masks.zip. The inputs consist of 28×28 greyscale images of ten classes of clothing, while the outputs consist of the correct class and a pixel bit mask. Your goal is to generate such outputs for the test set (in addition to a training script, which will be used only to understand the approach you took).
Performance is evaluated using mean IoU, where IoU for a single example is defined as an intersection of the gold and system mask divided by their union (assuming the predicted label is correct; if not, IoU is 0). The evaluation (using for example development data) can be performed by fashion_masks_eval.py script.
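The metric can be written down directly; the NumPy sketch below computes IoU per example and averages it. Treating two empty masks as IoU 1 is an assumption made here to avoid dividing by zero, and the function names are hypothetical; fashion_masks_eval.py remains the authoritative evaluation.

```python
import numpy as np


def example_iou(gold_label, gold_mask, pred_label, pred_mask):
    """IoU of two binary masks, or 0 when the predicted label is wrong."""
    if gold_label != pred_label:
        return 0.0
    gold, pred = gold_mask.astype(bool), pred_mask.astype(bool)
    union = np.logical_or(gold, pred).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks agree perfectly
    return np.logical_and(gold, pred).sum() / union


def mean_iou(gold_labels, gold_masks, pred_labels, pred_masks):
    """Average the per-example IoU over the whole data set."""
    scores = [example_iou(*example)
              for example in zip(gold_labels, gold_masks, pred_labels, pred_masks)]
    return float(np.mean(scores))
```

Because a wrong label zeroes the whole example, improving classification accuracy can raise the score as much as sharpening the masks.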
The task is a competition and the points will be awarded depending on your test set score. If your test set score surpasses 75%, you will be awarded 6 points; the remaining 6 points will be distributed depending on the relative ordering of your solutions. Note that quite a straightforward model surpasses 80% on the development set after an hour of computation (and 90% after several hours), so reaching 75% is not that difficult.
You should start with the fashion_masks.py template, which loads the data, computes average IoU, and at the end produces test set annotations in the required format (one example per line containing space separated label and mask, the mask stored as zeros and ones, rows first).
To submit the test set annotations in ReCodEx, use the supplied fashion_masks_recodex.py script. You need to provide at least two arguments – the first is the path to the test set annotations and all other arguments are paths to the sources used to generate the test data.
3d_recognition
Deadline: Apr 15, 23:59 7-13 points
Your goal in this assignment is to perform 3D object recognition. The input is a voxelized representation of an object, stored as a 3D grid of either empty or occupied voxels, and your goal is to classify the object into one of 10 classes. The data is available in two resolutions, either as 20×20×20 data (visualization of objects of all classes) or 32×32×32 data (visualization of objects of all classes). As usual, this is an open data task; therefore, your goal is to generate labels for the unannotated test set. Note that the original dataset contains only train and test portions; you need to use part of the train portion as a development set.
The task is a competition and the points will be awarded depending on your test set accuracy. If your test set score surpasses 75%, you will be awarded 7 points; the remaining 6 points will be distributed depending on the relative ordering of your solutions. Note that even straightforward models can reach more than 90% on the test set; the current state of the art is more than 98%.
You should start with the 3d_recognition.py template, which loads the data, splits a development set from the training data, and at the end produces test set annotations in the required format.
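A random hold-out split is one simple way to obtain such a development set. The sketch below is hypothetical and need not match how the 3d_recognition.py template splits the data; the default fraction and seed are arbitrary choices.

```python
import numpy as np


def train_dev_split(voxels, labels, dev_fraction=0.1, seed=42):
    """Randomly hold out `dev_fraction` of the training examples as a development set."""
    rng = np.random.default_rng(seed)
    permutation = rng.permutation(len(voxels))
    dev_size = int(round(len(voxels) * dev_fraction))
    dev, train = permutation[:dev_size], permutation[dev_size:]
    return (voxels[train], labels[train]), (voxels[dev], labels[dev])
```

Fixing the seed keeps the development set identical across runs, so accuracies from different models remain comparable.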
To submit the test set annotations in ReCodEx, use the supplied 3d_recognition_recodex.py script. You need to provide at least two arguments – the first is the path to the test set annotations and all other arguments are paths to the sources used to generate the test data.
nsketch_transfer
Deadline: Apr 22, 23:59 6-12 points
This assignment demonstrates the usefulness of transfer learning. The goal is to train a classifier for hand-drawn sketches. The dataset of 224×224 grayscale sketches categorized into 250 classes is available from nsketch.zip. Again, this is an open data task, and your goal is to generate labels for the unannotated test set.
The task is a competition and the points will be awarded depending on your test set accuracy. If your test set accuracy surpasses 40%, you will be awarded 6 points; the remaining 6 points will be distributed depending on the relative ordering of your solutions.
To solve the task with transfer learning, start with a pre-trained ImageNet network (NASNet A Mobile is used in the template, but feel free to use any) and convert images to features. Then (probably in a separate script) train a classifier processing the precomputed features into required classes. This approach leads to at least 52% accuracy on development set. To improve the accuracy, you can then finetune the original network – compose the pre-trained ImageNet network together with the trained classifier and continue training the whole composition. Such finetuning should lead to at least 70% accuracy on development set (using ResNet).
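The second stage, training a classifier on features computed once by the frozen pre-trained network, can be as simple as multinomial logistic regression. The NumPy sketch below trains one with full-batch gradient descent; it assumes features is an array holding one precomputed feature vector per image, and the function names, learning rate and epoch count are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def train_softmax_classifier(features, labels, num_classes, epochs=200, learning_rate=0.5):
    """Fit W, b minimizing the mean cross-entropy of softmax(features @ W + b)."""
    n, dim = features.shape
    weights = np.zeros((dim, num_classes))
    bias = np.zeros(num_classes)
    targets = np.eye(num_classes)[labels]
    for _ in range(epochs):
        # Gradient of the mean cross-entropy with respect to the logits.
        grad_logits = (softmax(features @ weights + bias) - targets) / n
        weights -= learning_rate * features.T @ grad_logits
        bias -= learning_rate * grad_logits.sum(axis=0)
    return weights, bias


def predict(weights, bias, features):
    """Most probable class for every feature vector."""
    return np.argmax(features @ weights + bias, axis=1)
```

Since the expensive network runs only once per image, many classifier variants can be tried quickly before moving on to finetuning the whole composition.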
You should start with the nsketch_transfer.py template, which loads the data, creates the NASNet network and loads its weights, evaluates and predicts using batches, and at the end produces test set annotations in the required format. However, feel free to use multiple scripts for solving this assignment. The above template requires NASNet sources and pretrained weights, which you can download among others here. An independent example of using NASNet for classification is also available as nasnet_classify.py.
To submit the test set annotations in ReCodEx, use the supplied nsketch_transfer_recodex.py script. You need to provide at least two arguments – the first is the path to the test set annotations and all other arguments are paths to the sources used to generate the test data.
sequence_classification
Deadline: Apr 22, 23:59 3 points
This exercise demonstrates tf.nn.dynamic_rnn, shows convergence speed, and illustrates the exploding gradient issue and how to fix it with gradient clipping.
The network should process sequences of 50 small integers and compute the parity for each prefix of the sequence. The inputs are either 0/1, or vectors with a one-hot representation of a small integer.
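A minimal sketch of the per-step targets, assuming that the parity of a prefix means the parity of the sum of its elements (for 0/1 inputs, the parity of the number of ones). The helper names prefix_parities and one_hot are illustrative and not part of the template.

```python
def prefix_parities(sequence):
    """Return, for every position t, the parity of sequence[0] + ... + sequence[t]."""
    parities, running_sum = [], 0
    for value in sequence:
        running_sum += value
        parities.append(running_sum % 2)
    return parities


def one_hot(value, dim):
    """One-hot encoding of a small integer (the vector input variant)."""
    return [1.0 if i == value else 0.0 for i in range(dim)]


print(prefix_parities([1, 0, 1, 1]))  # [1, 1, 0, 1]
print(one_hot(2, 4))                  # [0.0, 0.0, 1.0, 0.0]
```

Every output depends on the whole prefix, which is why the task exposes how well (and how fast) each cell type carries information over long distances.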
Your goal is to modify the sequence_classification.py template and implement the following:
- Use the specified RNN cell type (RNN, GRU and LSTM) and dimensionality.
- Process the sequence using tf.nn.dynamic_rnn.
- Use an additional hidden layer on the RNN outputs if requested.
- Implement gradient clipping if requested.
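A common form of gradient clipping rescales all gradients jointly by their global norm; in TensorFlow 1 this is tf.clip_by_global_norm, whose clipped list is then passed to the optimizer's apply_gradients. The pure-Python sketch below only illustrates the arithmetic (scaling factor clip_norm / max(global_norm, clip_norm)); whether the template's --clip_gradient expects global-norm clipping or per-tensor clipping is an assumption you should check.

```python
import math


def clip_by_global_norm(gradients, clip_norm):
    """Jointly rescale gradient vectors so that their global L2 norm is at most clip_norm.

    Returns the clipped gradients and the original global norm, mirroring the
    (clipped_list, global_norm) pair returned by tf.clip_by_global_norm.
    """
    global_norm = math.sqrt(sum(g * g for gradient in gradients for g in gradient))
    scale = clip_norm / max(global_norm, clip_norm)  # 1.0 when the norm is small enough
    clipped = [[g * scale for g in gradient] for gradient in gradients]
    return clipped, global_norm


clipped, norm = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)
print(norm)     # 5.0
print(clipped)  # approximately [[0.6], [0.8]]
```

Because all gradients share one scale, the update direction is preserved and only its length is limited, which is what tames the occasional exploding step.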
In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard. Concentrate on how the RNNs converge, the convergence speed, exploding gradient issues and how gradient clipping helps:
- --rnn_cell=RNN --sequence_dim=1, --rnn_cell=GRU --sequence_dim=1, --rnn_cell=LSTM --sequence_dim=1
- the same as above but with --sequence_dim=2
- the same as above but with --sequence_dim=10
- --rnn_cell=LSTM --hidden_layer=50 --rnn_cell_dim=30 --sequence_dim=30 and the same with --clip_gradient=1
- the same as above but with --rnn_cell=RNN
- the same as above but with --rnn_cell=GRU --hidden_layer=70
sequence_prediction
Deadline: Apr 22, 23:59 3 points
The motivation of this exercise is to learn low-level handling of RNN cells. The network should learn to predict one specific sequence of monthly totals of international airline passengers from 1949 to 1960.
Your goal is to modify the sequence_prediction.py template and implement the following:
- Use the specified RNN cell type (RNN, GRU and LSTM) and dimensionality.
- For the training part of the sequence, the network should sequentially predict the elements, using the correct previous element value as input.
- For the testing part of the sequence, the network should sequentially predict the elements using its own previous prediction.
- After each epoch, print the tf.losses.mean_squared_error of the test part prediction using the "{:.2g}" format.
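The difference between the two parts of the sequence can be sketched independently of TensorFlow. Below, step is a hypothetical stand-in for one RNN cell call plus output layer, taking (state, previous_value) and returning (state, prediction); the exact split point between training and testing parts in the template is an assumption.

```python
def predict_sequence(step, state, sequence, train_length):
    """Predict sequence[1:], feeding the gold previous value inside the training
    part and the network's own previous prediction inside the testing part."""
    predictions, previous = [], sequence[0]
    for t in range(1, len(sequence)):
        state, prediction = step(state, previous)
        predictions.append(prediction)
        # Gold value while still in the training part, own prediction afterwards.
        previous = sequence[t] if t < train_length else prediction
    return predictions


def mean_squared_error(gold, predicted):
    return sum((g - p) ** 2 for g, p in zip(gold, predicted)) / len(gold)


# Toy "cell" that always predicts previous + 2, so its errors compound on the test part.
sequence, train_length = [1, 2, 3, 4, 5], 3
predictions = predict_sequence(lambda s, x: (s, x + 2), None, sequence, train_length)
test_mse = mean_squared_error(sequence[train_length:], predictions[train_length - 1:])
print("{:.2g}".format(test_mse))  # 2.5
```

The toy run shows why the test part is harder: once the model consumes its own outputs, an error made at one step is fed back and grows.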
In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard. Note that the network does not use any regularization and only uses one sequence, so it is quite brittle.
- try RNN, GRU and LSTM cells
- try dimensions of 5, 10 and 50
tagger_we
Deadline: Apr 29, 23:59 2 points
In this assignment you will create a simple part-of-speech tagger. For training and evaluation, use the czech-cac.zip data containing Czech tokenized sentences, with each word annotated by its gold lemma and part-of-speech tag. The dataset can be loaded using the morpho_dataset.py module.
Your goal is to modify the tagger_we.py template and implement the following:
- Use the specified RNN cell type (GRU and LSTM) and dimensionality.
- Create word embeddings for the training vocabulary.
- Process the sentences using a bidirectional RNN.
- Predict part-of-speech tags.
- You need to properly handle sentences of different lengths in one batch.
- Note how resettable metrics are handled by the template.
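Handling sentences of different lengths usually means padding them to a common length and masking the padded positions out of the loss and the accuracy. In TensorFlow 1 the lengths can be passed as sequence_length to tf.nn.bidirectional_dynamic_rnn and tf.sequence_mask builds the mask; the pure-Python sketch below (helper names are hypothetical, padding id 0 is an assumption) only shows the bookkeeping.

```python
def pad_batch(sentences, pad_id=0):
    """Pad word-id sequences to a common length and build the 0/1 mask."""
    lengths = [len(sentence) for sentence in sentences]
    max_length = max(lengths)
    padded = [s + [pad_id] * (max_length - len(s)) for s in sentences]
    mask = [[1] * len(s) + [0] * (max_length - len(s)) for s in sentences]
    return padded, mask, lengths


def masked_accuracy(gold, predicted, mask):
    """Accuracy counted only over real (non-padding) words."""
    correct = sum(m * (g == p)
                  for gs, ps, ms in zip(gold, predicted, mask)
                  for g, p, m in zip(gs, ps, ms))
    return correct / sum(sum(ms) for ms in mask)


padded, mask, lengths = pad_batch([[5, 6, 7], [8]])
print(padded)  # [[5, 6, 7], [8, 0, 0]]
print(mask)    # [[1, 1, 1], [1, 0, 0]]
```

Without the mask, predictions on padding would count towards the accuracy and the loss, rewarding the network for tagging words that do not exist.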
After submitting the task to ReCodEx, continue with the tagger_cle and/or tagger_cnne assignment.
You should also experiment with the effect that the RNN cell type and cell dimensionality have on the results.
tagger_cle
Deadline: Apr 29, 23:59 2 points
This task is a continuation of the tagger_we assignment.
Using the tagger_cle.py template, add the following features in addition to the tagger_we ones:
- Create character embeddings for training alphabet.
- Process unique words with a bidirectional character-level RNN.
- Create character-level word embeddings (CLEs) as a sum of the final forward and backward states.
- Properly distribute the CLEs of unique words into the batches of sentences.
- Generate overall embeddings by concatenating word-level embeddings and CLEs.
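Running the character-level RNN only on unique words and then copying the results back is plain index bookkeeping. A pure-Python sketch with hypothetical helper names follows; in TensorFlow 1 the copying step would typically be tf.nn.embedding_lookup (or tf.gather) of the unique-word CLEs by per-token indices, though how morpho_dataset.py exposes those indices is left to the template.

```python
def unique_words(sentences):
    """Map every word occurrence to an index into a list of unique words."""
    index = {}
    ids = [[index.setdefault(word, len(index)) for word in sentence]
           for sentence in sentences]
    unique = sorted(index, key=index.get)  # unique words ordered by their index
    return unique, ids


def cle(final_forward, final_backward):
    """CLE = sum of the final forward and backward states of the character RNN."""
    return [f + b for f, b in zip(final_forward, final_backward)]


def distribute(unique_embeddings, ids):
    """Copy each unique word's CLE to all of its occurrences in the batch."""
    return [[unique_embeddings[i] for i in sentence] for sentence in ids]


unique, ids = unique_words([["a", "b", "a"], ["b", "c"]])
print(unique, ids)  # ['a', 'b', 'c'] [[0, 1, 0], [1, 2]]
```

Because a repeated word reuses one CLE, its gradient during backpropagation is the sum of the gradients from all occurrences.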
Once submitted to ReCodEx, you should experiment with the effect of CLEs compared to plain tagger_we, and the influence of their dimensionality. Note that by default, tagger_we uses word embeddings twice the size of the word embeddings in tagger_cle.
tagger_cnne
Deadline: Apr 29, 23:59 2 points
This task is a continuation of the tagger_we assignment.
Using the tagger_cnne.py template, add the following features in addition to the tagger_we ones:
- Create character embeddings for training alphabet.
- Process unique words with one-dimensional convolutional filters with kernel sizes from 2 to some given maximum. To obtain a fixed-size representation, perform channel-wise max-pooling over the whole word.
- Generate convolutional embeddings (CNNE) as a concatenation of features corresponding to the ascending kernel sizes.
- Properly distribute the CNNEs of unique words into the batches of sentences.
- Generate overall embeddings by concatenating word-level embeddings and CNNEs.
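The CNNE computation can be sketched in pure Python for a single word. The sketch assumes "valid" convolutions without bias or activation, zero-padding of words shorter than a kernel, and a hypothetical {kernel_size: [kernel, ...]} layout for the filters; the template may differ in all of these details.

```python
def conv1d_valid(chars, kernel):
    """One filter, 'valid' 1D convolution over a word's character embeddings.

    chars: list of L vectors of size D; kernel: list of k vectors of size D.
    """
    k = len(kernel)
    return [sum(w * x for t in range(k) for w, x in zip(kernel[t], chars[i + t]))
            for i in range(len(chars) - k + 1)]


def cnne(chars, filters_by_size):
    """Concatenate channel-wise max-pooled features for ascending kernel sizes."""
    dim = len(chars[0])
    features = []
    for size in sorted(filters_by_size):
        # Zero-pad words shorter than the kernel (an assumption of this sketch).
        padded = chars + [[0.0] * dim] * max(0, size - len(chars))
        for kernel in filters_by_size[size]:
            features.append(max(conv1d_valid(padded, kernel)))  # max over positions
    return features


word = [[1.0], [2.0], [3.0]]  # three characters with 1-dimensional embeddings
filters = {2: [[[1.0], [1.0]]], 3: [[[1.0], [0.0], [-1.0]]]}
print(cnne(word, filters))  # [5.0, -2.0]
```

Max-pooling over positions is what makes the representation independent of the word length: each filter contributes exactly one number.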
Once submitted to ReCodEx, you should experiment with the effect of CNNEs compared to plain tagger_we, and the influence of the maximum kernel size and the number of filters. Note that by default, tagger_we uses word embeddings twice the size of the word embeddings in tagger_cnne.
tagger_sota
Deadline: Apr 29, 23:59 4-15 points
The goal of this task is to improve the state of the art in Czech part-of-speech tagging. The current state of the art is (to the best of my knowledge) from Spoustová et al., 2009, and is 95.67% in supervised and 95.89% in semi-supervised settings.
For training, use the czech-pdt.zip dataset, which can be loaded using the morpho_dataset.py module. Note that the dataset contains more than 1500 unique POS tags and that the POS tags have a fixed structure of 15 positions (so it is possible to generate the POS tag characters independently).
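Generating the tag characters independently amounts to giving each of the 15 positions its own small vocabulary (and its own classifier). The pure-Python sketch below, with hypothetical helper names and two illustrative PDT-style tags, shows the encoding and decoding.

```python
def split_tags(tags, positions=15):
    """Encode fixed-length positional tags with one character vocabulary per position."""
    vocabularies = [{} for _ in range(positions)]
    encoded = []
    for tag in tags:
        assert len(tag) == positions, "expected a fixed-length positional tag"
        encoded.append([vocabularies[p].setdefault(c, len(vocabularies[p]))
                        for p, c in enumerate(tag)])
    return encoded, vocabularies


def join_tag(ids, vocabularies):
    """Map per-position predictions back to a tag string."""
    inverse = [{i: c for c, i in vocabulary.items()} for vocabulary in vocabularies]
    return "".join(inverse[p][i] for p, i in enumerate(ids))


encoded, vocabularies = split_tags(["NNFS1-----A----", "VB-S---3P-AA---"])
print(join_tag(encoded[1], vocabularies))  # VB-S---3P-AA---
```

One trade-off of this design: positions predicted independently may combine into a tag never seen in training, which the analyzer outputs described below can help filter.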
Additionally, you can also use outputs of a morphological analyzer, czech-pdt-analysis.zip. For each word form in the train, dev and test PDT data, an analysis is present either in a file containing results from a manually generated morphological dictionary, or in a file with results from a trained morphological guesser. Both files have the same structure – each line describes one word form, which is stored at the beginning of the line, followed by tab-separated lemma-tag pairs from the analyzer.
This task is an open-data competition and the points will be awarded depending on your test set accuracy. If your test set accuracy surpasses 90%, you will be awarded 4 points; the remaining 6 points will be distributed depending on the relative ordering of your solutions. Any solution surpassing 95.89% will get an additional 5 points. The evaluation (using for example development data) can be performed by the morpho_eval.py script.
You can start with the tagger_sota.py template, which loads the PDT data, loads the morphological analysers data, and finally generates the predictions in the required format (which is exactly the same as the input format).
To submit the test set annotations in ReCodEx, use the supplied tagger_sota_recodex.py script. You need to provide at least two arguments – the first is the path to the test set annotations and all other arguments are paths to the sources used to generate the test data.
lemmatizer_noattn
Deadline: May 06, 23:59 4 points
In this assignment you will create a simple lemmatizer. For training and evaluation, use the czech-cac.zip data containing Czech tokenized sentences, with each word annotated by its gold lemma and part-of-speech tag. The dataset can be loaded using the morpho_dataset.py module.
Your goal is to modify the lemmatizer_noattn.py template and implement the following:
- Embed characters of source forms and run a forward GRU encoder.
- Embed characters of target lemmas.
- Implement a training time decoder which uses gold target characters as inputs.
- Implement an inference time decoder which uses previous predictions as inputs.
- The initial state of both decoders is the final state of the GRU encoder for the corresponding form.
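The two decoders differ only in what they receive as input at each step. In this pure-Python sketch, BOW and EOW are hypothetical begin-of-word and end-of-word ids (the dataset module may define its own), and step stands in for one decoder cell call that returns (state, logits).

```python
BOW, EOW = 1, 2  # hypothetical ids of the begin-of-word and end-of-word symbols


def teacher_forcing_pairs(lemma_ids):
    """Training-time decoder: the gold characters shifted right are the inputs."""
    inputs = [BOW] + lemma_ids
    targets = lemma_ids + [EOW]
    return inputs, targets


def greedy_decode(step, state, max_length):
    """Inference-time decoder: each prediction becomes the next input."""
    output, previous = [], BOW
    for _ in range(max_length):
        state, logits = step(state, previous)
        previous = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if previous == EOW:
            break
        output.append(previous)
    return output


print(teacher_forcing_pairs([5, 6]))  # ([1, 5, 6], [5, 6, 2])
```

The max_length cap guards against a decoder that never emits EOW, which can happen early in training.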
After submitting the task to ReCodEx, continue with the lemmatizer_attn assignment.
lemmatizer_attn
Deadline: May 06, 23:59 2 points
This task is a continuation of the lemmatizer_noattn assignment.
Using the lemmatizer_attn.py template, add the following features in addition to the lemmatizer_noattn ones:
- Run the encoder using bidirectional GRU.
- Implement attention in both decoders. Notably, project the encoder outputs and the current state into vectors of the same dimensionality, apply a non-linearity, and generate a weight for every encoder output. Finally, sum the encoder outputs using these weights and concatenate the resulting attention vector to the decoder inputs.
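The attention step described above matches the additive (Bahdanau-style) formulation: score_j = v · tanh(W h_j + U s), weights = softmax(scores), context = Σ_j weight_j h_j. Below is a pure-Python sketch for one decoder step; W, U and v are hypothetical trainable parameters, and using tanh with a softmax normalization is an assumption consistent with Bahdanau et al., not a statement about the template.

```python
import math


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def additive_attention(encoder_outputs, state, W, U, v):
    """Return the attention context vector and the weights of the encoder outputs."""
    projected_state = matvec(U, state)
    scores = []
    for h in encoder_outputs:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W, h), projected_state)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    # Softmax normalization (maximum subtracted for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the encoder outputs.
    dim = len(encoder_outputs[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_outputs)) for i in range(dim)]
    return context, weights


context, weights = additive_attention([[1.0, 0.0], [0.0, 1.0]], [0.5],
                                      W=[[0.0, 0.0]], U=[[0.0]], v=[1.0])
print(weights, context)  # [0.5, 0.5] [0.5, 0.5]
```

With all-zero parameters every encoder output gets the same weight; training moves the weights towards the characters relevant for the current output character.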
Once submitted to ReCodEx, you should experiment with the effect of using the attention, and the influence of RNN dimensionality on network performance.
lemmatizer_sota
Deadline: May 06, 23:59 4-13 points
The goal of this task is to improve the state of the art in Czech lemmatization. The current state of the art is (to the best of my knowledge) czech-morfflex-pdt-161115, a reimplementation of the Spoustová et al., 2009 tagger, which achieves 97.86% lemma accuracy.
As in the tagger_sota assignment, for training use the czech-pdt.zip dataset, which can be loaded employing the morpho_dataset.py module. Additionally, you can also use outputs of a morphological analyzer, czech-pdt-analysis.zip.
This task is an open-data competition and the points will be awarded depending on your test set accuracy. If your test set accuracy surpasses 90%, you will be awarded 4 points; the remaining 4 points will be distributed depending on the relative ordering of your solutions. Any solution surpassing 97.86% will get an additional 5 points. The evaluation (using for example development data) can be performed by the morpho_eval.py script.
You can start with the lemmatizer_sota.py template, which loads the PDT data, loads the morphological analysers data, and finally generates the predictions in the required format (which is exactly the same as the input format).
To submit the test set annotations in ReCodEx, use the supplied lemmatizer_sota_recodex.py script. You need to provide at least two arguments – the first is the path to the test set annotations and all other arguments are paths to the sources used to generate the test data.
vae
Deadline: May 13, 23:59 3 points
In this assignment you will implement a simple Variational Autoencoder for three datasets in the MNIST format.
Your goal is to modify the vae.py template and implement a functional VAE using the embedded TODO notes.
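The two VAE-specific pieces are the reparametrization trick and the KL term of the loss. The pure-Python sketch below assumes the encoder outputs a mean and a log-variance per latent dimension (the template may use a standard deviation instead); the function names are illustrative, and the number of latent dimensions corresponds to z_dim.

```python
import math
import random


def reparameterize(mean, log_variance, epsilon):
    """z = mu + sigma * eps with eps ~ N(0, I), so gradients flow to mu and sigma."""
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mean, log_variance, epsilon)]


def kl_to_standard_normal(mean, log_variance):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over the latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mean, log_variance))


def sample_epsilon(dim, rng=random):
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


print(reparameterize([1.0], [0.0], [2.0]))   # [3.0]
print(kl_to_standard_normal([1.0], [0.0]))   # 0.5
```

Sampling epsilon outside the network keeps z a deterministic, differentiable function of the encoder outputs, which is the whole point of the trick.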
After submitting the task to ReCodEx, you can experiment with the three available datasets (fashion, cifar-cars and mnist-data) and different latent variable dimensionalities (z_dim=2 and z_dim=100). The generated images are available in TensorBoard logs.
gan
Deadline: May 13, 23:59 3 points
In this assignment you will implement a simple Generative Adversarial Network for three datasets in the MNIST format.
Your goal is to modify the gan.py template and implement a functional GAN using the embedded TODO notes.
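The GAN objective consists of two cross-entropy losses. The pure-Python sketch below uses the common non-saturating generator loss and averages over the batch; the template's TODO notes may prescribe a slightly different formulation, and d_real and d_fake are assumed to be discriminator probabilities strictly between 0 and 1.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy with real images labelled 1 and generated ones labelled 0."""
    real = -sum(math.log(p) for p in d_real) / len(d_real)
    fake = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real + fake


def generator_loss(d_fake):
    """Non-saturating generator loss: maximize log D(G(z)) instead of
    minimizing log(1 - D(G(z))), which gives stronger early gradients."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


print(discriminator_loss([0.5], [0.5]))  # about 1.386 (2 * ln 2)
print(generator_loss([0.5]))             # about 0.693 (ln 2)
```

The two losses are minimized alternately, each with respect to its own network's variables only.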
After submitting the task to ReCodEx, you can experiment with the three available datasets (fashion, cifar-cars and mnist-data) and maybe try different latent variable dimensionality. The generated images are available in TensorBoard logs.
You can also continue with the dcgan task.
dcgan
Deadline: May 13, 23:59 1 point
This task is a continuation of the gan assignment, which you will modify to implement the Deep Convolutional GAN (DCGAN).
Your goal is to modify the dcgan.py template to implement a DCGAN using the embedded TODO notes. Note that most of the TODO notes are from the gan assignment.
After submitting the task to ReCodEx, you can experiment with the three available datasets (fashion, cifar-cars and mnist-data). However, note that you will need a lot of computational power (preferably a GPU) to generate the images.
nli
Deadline: May 13, 23:59 6-12 points
In this competition you will be solving the Native Language Identification task. In that task, you get an English essay written by a non-native individual and your goal is to identify their native language.
We will be using the NLI Shared Task 2013 data, which contains documents in 11 languages. For each language, the train, development and test sets contain 900, 100 and 100 documents, respectively. Particularly interesting is the fact that humans are very bad at this task, while machine learning models can achieve quite high accuracy. Notably, the winners of the 2013 shared task achieved 83.6% accuracy, while the current state of the art is at least 87.1% (Malmasi and Dras, 2017).
Because the data is not publicly available, you can download it only through ReCodEx. Please do not distribute it. To load the dataset, you can use nli_dataset.py script.
This task is an open-data competition and the points will be awarded depending on your test set accuracy. If your test set accuracy surpasses 50%, you will be awarded 6 points; the remaining 6 points will be distributed depending on the relative ordering of your solutions. An evaluation (using for example development data) can be performed by nli_eval.py.
You can start with the nli.py template, which loads the data and generates predictions in the required format (language of each essay on a line).
To submit the test set annotations in ReCodEx, use the supplied nli_recodex.py script. You need to provide at least two arguments – the first is the path to the test set annotations and all other arguments are paths to the sources used to generate the test data.
tagger_crf
Deadline: May 20, 23:59 1 point
This task is an extension of the tagger_we assignment.
Using the tagger_crf.py template, in addition to the tagger_we features, implement training and decoding with a CRF output layer, using the tf.contrib.crf module.
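Decoding with a linear-chain CRF is Viterbi search over unary (per-word) scores and tag transition scores; tf.contrib.crf.viterbi_decode performs this on NumPy arrays and tf.contrib.crf.crf_decode inside the graph. The pure-Python sketch below illustrates the dynamic program only; its input layout (emissions[t][tag], transitions[previous_tag][tag]) is an assumption of the sketch.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag sequence and its score.

    emissions[t][j]: score of tag j at position t;
    transitions[i][j]: score of tag i being followed by tag j.
    """
    num_tags = len(emissions[0])
    best = list(emissions[0])  # best score of a path ending in each tag
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = [], []
        for j in range(num_tags):
            i = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            pointers.append(i)
            new_best.append(best[i] + transitions[i][j] + scores[j])
        best = new_best
        backpointers.append(pointers)
    # Follow the backpointers from the best final tag.
    last = max(range(num_tags), key=best.__getitem__)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


emissions = [[1, 0], [0, 1], [1, 0]]
print(viterbi_decode(emissions, [[0, 0], [0, 0]]))    # ([0, 1, 0], 3)
print(viterbi_decode(emissions, [[0, -5], [-5, 0]]))  # ([0, 0, 0], 2)
```

The second call shows what the CRF adds over independent per-word softmaxes: transition scores can override locally best tags when they form an unlikely sequence.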
Once submitted to ReCodEx, you should experiment with the effect of CRF compared to plain tagger_we. Note, however, that the effect of CRF on tagging is minor – a more appropriate task is, for example, named entity recognition, which you can experiment with using the Czech Named Entity Corpus czech-cnec.zip.
phoneme_recognition
Deadline: May 20, 23:59 6-10 points
This assignment is a competition task in the speech recognition area. Specifically, your goal is to predict a sequence of phonemes given a spoken utterance.
We will be using the TIMIT corpus, with input sound waves passed through the usual preprocessing – computing 13 Mel-frequency cepstral coefficients (MFCCs) every 10 milliseconds and appending their derivatives, obtaining 26 floats for every 10 milliseconds. You can repeat exactly this preprocessing on a given wav file using the timit_mfcc26_preprocess.py script.
Because the data is not publicly available, you can download it only through ReCodEx. Please do not distribute it. To load the dataset, you can use timit_mfcc26_dataset.py module.
This task is an open-data competition and the points will be awarded depending on your test set performance. The generated phoneme sequences are evaluated using the edit distance to the gold phoneme sequence, normalized by the length of the gold phoneme sequence (i.e., exactly as tf.edit_distance). If your test set score surpasses 50%, you will be awarded 6 points; the remaining 6 points will be distributed depending on the relative ordering of your solutions. An evaluation (using for example development data) can be performed by timit_mfcc26_eval.py.
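The metric is the Levenshtein distance divided by the gold length, which is what tf.edit_distance computes with normalize=True. A pure-Python sketch of the same computation (function names are illustrative):

```python
def edit_distance(hypothesis, truth):
    """Levenshtein distance where insertions, deletions and substitutions cost 1."""
    previous = list(range(len(truth) + 1))
    for i, h in enumerate(hypothesis, start=1):
        current = [i]
        for j, t in enumerate(truth, start=1):
            current.append(min(previous[j] + 1,             # drop h from the hypothesis
                               current[j - 1] + 1,          # insert t
                               previous[j - 1] + (h != t)))  # substitute (or match)
        previous = current
    return previous[-1]


def normalized_edit_distance(hypothesis, truth):
    """Edit distance divided by the length of the gold sequence."""
    return edit_distance(hypothesis, truth) / len(truth)


print(edit_distance("kitten", "sitting"))                             # 3
print(normalized_edit_distance(["a", "b", "c"], ["a", "x", "c"]))     # 0.333...
```

Since the score is an error rate, lower values are better, and it can exceed 1 when the hypothesis is much longer than the gold sequence.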
You can start with the phoneme_recognition.py template. You will need to implement the following:
- The CTC loss and CTC decoder employ sparse tensors – therefore, start by studying them.
- Convert the input phoneme sequences into a sparse representation (tf.where and tf.gather_nd are useful).
- Use a bidirectional RNN and an output linear layer without activation.
- Utilize CTC loss (tf.nn.ctc_loss).
- Perform decoding by a CTC decoder (either greedily using tf.nn.ctc_greedy_decoder, or with beam search employing tf.nn.ctc_beam_search_decoder).
- Evaluate results using normalized edit distance (tf.edit_distance).
- Write the generated phoneme sequences.
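Two of the steps above can be illustrated without TensorFlow. The first function builds the (indices, values, dense_shape) triple of a sparse tensor from a padded batch, which is what tf.where on a length mask followed by tf.gather_nd yields. The second performs best-path decoding in the way tf.nn.ctc_greedy_decoder does: take the argmax per frame, merge repeated labels, then drop blanks (TensorFlow's CTC functions reserve the last class, num_classes - 1, for the blank). Function names are hypothetical.

```python
def dense_to_sparse(sequences, lengths):
    """Sparse triple of a padded batch, keeping only positions t < length."""
    indices = [(b, t) for b, length in enumerate(lengths) for t in range(length)]
    values = [sequences[b][t] for b, t in indices]
    dense_shape = (len(sequences), max(len(s) for s in sequences))
    return indices, values, dense_shape


def ctc_greedy_decode(frame_logits, blank):
    """Best-path decoding: argmax per frame, merge repeats, then remove blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_logits]
    decoded, previous = [], None
    for label in best:
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


print(dense_to_sparse([[3, 4, 0], [5, 0, 0]], [2, 1]))
# ([(0, 0), (0, 1), (1, 0)], [3, 4, 5], (2, 3))
```

Merging repeats before removing blanks is what lets CTC emit a doubled phoneme: the blank between the two copies keeps them from being merged.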
To submit the test set annotations in ReCodEx, use the supplied phoneme_recognition_recodex.py script. You need to provide at least two arguments – the first is the path to the test set annotations and all other arguments are paths to the sources used to generate the test data.
monte_carlo
Deadline: May 20, 23:59 3 points
Solve the CartPole-v1 environment from the OpenAI Gym using the Monte Carlo reinforcement learning algorithm. Note that this task does not require TensorFlow.
Use the supplied cart_pole_evaluator.py module (which depends on gym_evaluator.py) to interact with the discretized environment. The environment has the following methods and properties:
- states: number of states of the environment
- actions: number of actions of the environment
- reset(start_evaluate=False) → new_state: starts a new episode
- step(action) → new_state, reward, done, info: performs the chosen action in the environment, returning the new state, the obtained reward, a boolean flag indicating the end of an episode, and additional environment-specific information
- render(): renders the current environment state
Once you finish training (which you indicate by passing start_evaluate=True to reset), your goal is to reach an average reward of 475 during 100 evaluation episodes. Note that the environment prints your 100-episode average reward every 10 episodes even during training.
You can start with the monte_carlo.py template, which parses several useful parameters, creates the environment and illustrates the overall usage.
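A minimal sketch of every-visit Monte Carlo control with an ε-greedy policy, written against the environment interface listed above (states, actions, reset, step). The function names, the fixed epsilon and gamma=1 are assumptions for illustration; the template's own parameters may differ, and in practice ε is usually decayed during training.

```python
import random


def epsilon_greedy(q_values, epsilon):
    """Random action with probability epsilon, otherwise a greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def monte_carlo_control(env, episodes, epsilon=0.1, gamma=1.0):
    """Every-visit Monte Carlo control; returns the table of action values Q."""
    Q = [[0.0] * env.actions for _ in range(env.states)]
    counts = [[0] * env.actions for _ in range(env.states)]
    for _ in range(episodes):
        state, done, trajectory = env.reset(), False, []
        while not done:
            action = epsilon_greedy(Q[state], epsilon)
            next_state, reward, done, _ = env.step(action)
            trajectory.append((state, action, reward))
            state = next_state
        # Walk the episode backwards, accumulating the return G.
        G = 0.0
        for state, action, reward in reversed(trajectory):
            G = gamma * G + reward
            counts[state][action] += 1
            # Incremental mean of all returns observed for (state, action).
            Q[state][action] += (G - Q[state][action]) / counts[state][action]
    return Q
```

After training, evaluation would call reset(start_evaluate=True) and act greedily (epsilon=0) with respect to Q.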
During evaluation in ReCodEx, three different random seeds will be employed, and you will get a point for each setting where you reach the required reward. The time limit for each test is 5 minutes.
q_learning
Deadline: May 27, 23:59 3 points
Solve the MountainCar-v0 environment from the OpenAI Gym using the Q-learning reinforcement learning algorithm. Note that this task does not require TensorFlow.
Use the supplied mountain_car_evaluator.py module (which depends on gym_evaluator.py) to interact with the discretized environment. The environment methods and properties are described in the monte_carlo assignment.
Your goal is to reach an average reward of -140 during 100 evaluation episodes.
You can start with the q_learning.py template, which parses several useful parameters, creates the environment and illustrates the overall usage. Note that setting hyperparameters of Q-learning is a bit tricky – I usually start with a larger value of ε (like 0.2 or even 0.5) and then gradually decrease it to almost zero.
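The core of tabular Q-learning is a single update towards r + γ max_a' Q(s', a'), combined here with a linear ε schedule in the spirit of the advice above. Function names and default values are illustrative assumptions, not taken from the template.

```python
def q_learning_update(Q, state, action, reward, next_state, done, alpha, gamma):
    """One tabular Q-learning step towards r + gamma * max_a' Q(s', a')."""
    target = reward if done else reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])


def linear_epsilon(episode, total_episodes, start=0.5, end=0.01):
    """Linearly decay epsilon from `start` to `end` over `total_episodes`."""
    fraction = min(episode / total_episodes, 1.0)
    return start + fraction * (end - start)


Q = [[0.0, 0.0], [1.0, 2.0]]
q_learning_update(Q, 0, 1, -1.0, 1, False, alpha=0.5, gamma=1.0)
print(Q[0][1])  # 0.5
```

Unlike Monte Carlo, the update bootstraps from the current estimate of the next state, so learning can happen within an episode.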
During evaluation in ReCodEx, two different random seeds will be employed, and you will get a point for each setting where you reach the required reward. The time limit for each test is 5 minutes.
q_network
Deadline: May 27, 23:59 2 points
Solve the MountainCar-v0 environment from the OpenAI Gym using a Q-network (a neural network variant of the Q-learning algorithm).
Note that training DQN (Deep Q-Networks) is inherently tricky and unstable. Therefore, we will implement a direct analogue of tabular Q-learning, allowing the network to employ independent weights for every discretized environment state.
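Why a network with independent weights per discretized state is a direct analogue of tabular Q-learning: with a one-hot state input and a single linear layer, the Q-values are one row of the weight matrix, and an SGD step on 0.5·(target − q)² changes only that row's entry for the taken action. The pure-Python sketch below (hypothetical names, no bias) makes this explicit.

```python
def q_values(W, state):
    """Linear layer applied to one-hot(state): selects row `state` of W."""
    one_hot = [1.0 if i == state else 0.0 for i in range(len(W))]
    return [sum(one_hot[i] * W[i][a] for i in range(len(W))) for a in range(len(W[0]))]


def sgd_step(W, state, action, target, learning_rate):
    """SGD on 0.5 * (target - q)^2; the gradient w.r.t. W[i][action] is
    -(target - q) * one_hot[i], so only row `state` changes."""
    q = q_values(W, state)[action]
    for i in range(len(W)):
        one_hot_i = 1.0 if i == state else 0.0
        W[i][action] += learning_rate * (target - q) * one_hot_i


W = [[0.0, 0.0] for _ in range(3)]
sgd_step(W, state=1, action=0, target=2.0, learning_rate=0.5)
print(W)  # [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
```

The learning rate plays the role of α in tabular Q-learning, and the target would be r + γ max_a' Q(s', a') computed from the same network.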
Use the supplied mountain_car_evaluator.py module (which depends on gym_evaluator.py) to interact with the discretized environment. The environment methods and properties are described in the monte_carlo assignment.
Your goal is to reach an average reward of -200 during 100 evaluation episodes.
You can start with the q_network.py template. Note that setting hyperparameters of Q-network is even more tricky than for Q-learning – if you try to vary the architecture, it might not learn at all.
During evaluation in ReCodEx, two different random seeds will be employed, and you will get a point for each setting where you reach the required reward. The time limit for each test is 10 minutes.
reinforce
Deadline: May 27, 23:59 2 points
Solve the CartPole-v1 environment from the OpenAI Gym using the REINFORCE algorithm.
Use the supplied cart_pole_evaluator.py module (which depends on gym_evaluator.py) to interact with the continuous environment. The environment has the same properties and methods as the discrete environment described in the monte_carlo task, with the following additions:
- the continuous environment has to be created with the discrete=False option
- state_shape: the shape describing the floating point state tensor
- states: as the number of states is infinite, accessing it raises an exception
Once you finish training (which you indicate by passing start_evaluate=True to reset), your goal is to reach an average reward of 475 during 100 evaluation episodes. Note that the environment prints your 100-episode average reward every 10 episodes even during training. You should start with the reinforce.py template.
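The quantities REINFORCE needs per episode are the discounted returns G_t and, for a softmax policy over discrete actions, the gradient of the loss −G_t log π(a_t | s_t) with respect to the policy logits, which equals G_t (π − one_hot(a_t)). A pure-Python sketch with illustrative names; in the template the gradient would of course come from TensorFlow's automatic differentiation.

```python
import math


def discounted_returns(rewards, gamma):
    """G_t = r_t + gamma * G_{t+1}, computed backwards over one episode."""
    returns, G = [], 0.0
    for reward in reversed(rewards):
        G = reward + gamma * G
        returns.append(G)
    returns.reverse()
    return returns


def softmax(logits):
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_logit_gradient(logits, action, G):
    """Gradient of -G * log softmax(logits)[action] w.r.t. the logits."""
    probs = softmax(logits)
    return [G * (p - (1.0 if i == action else 0.0)) for i, p in enumerate(probs)]


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
print(reinforce_logit_gradient([0.0, 0.0], 0, 2.0))    # [-1.0, 1.0]
```

A gradient descent step with this gradient raises the logit of the taken action in proportion to the return that followed it.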
During evaluation in ReCodEx, two different random seeds will be employed, and you will get a point for each setting where you reach the required reward. The time limit for each test is 5 minutes.
After solving this task, you should continue with reinforce_with_baseline.
reinforce_with_baseline
Deadline: May 27, 23:59 2 points
This is a continuation of the reinforce assignment.
Using the reinforce_with_baseline.py template, modify the REINFORCE algorithm to use a baseline.
Using a baseline lowers the variance of the policy gradient estimator, which allows faster training and decreases sensitivity to hyperparameter values. To reflect this effect in ReCodEx, note that the evaluation phase will automatically start after 200 episodes. Using only 200 episodes for training in this setting is probably too few for the REINFORCE algorithm, but suffices for the variant with a baseline.
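With a baseline b(s_t), the log-probability gradient is weighted by G_t − b(s_t) instead of G_t; since the baseline does not depend on the action, the estimator stays unbiased while its variance drops. The sketch below uses a scalar baseline for brevity; a state-dependent baseline would typically be a separate value network trained with the same MSE loss. All names are illustrative.

```python
def advantages(returns, baselines):
    """Weights for grad log pi(a_t | s_t): G_t - b(s_t) instead of G_t."""
    return [G - b for G, b in zip(returns, baselines)]


def baseline_sgd_step(baseline, G, learning_rate):
    """Move the baseline towards the observed return by SGD on 0.5 * (G - b)^2."""
    return baseline + learning_rate * (G - baseline)


print(advantages([3.0, 1.0], [2.0, 2.0]))  # [1.0, -1.0]
```

Centering the returns means that actions are pushed up only when they did better than expected and pushed down otherwise, instead of all being pushed up by positive CartPole rewards.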
During evaluation in ReCodEx, two different random seeds will be employed, and you will get a point for each setting where you reach the required reward. The time limit for each test is 5 minutes.
reinforce_with_pixels
Deadline: May 27, 23:59 6 points
This is an experimental task which might require a lot of time to solve.
The goal of this assignment is to extend the reinforce_with_baseline assignment to make it work on pixel inputs.
The supplied cart_pole_pixels_evaluator.py module (which depends on gym_evaluator.py) generates a pixel representation of the CartPole environment as an 80×80 image with three channels, with each channel representing one time step (i.e., the current situation and the two previous ones).
Start with the reinforce_with_pixels.py template, which contains a rich collection of summaries that you can use to explore the behaviour of the model being trained.
Note that this assignment is not trivial – it takes some time and resources to make the training progress at all. To show any progress, your goal is to reach an average reward of 50 during 100 evaluation episodes. As before, the evaluation period begins only after you call reset with start_evaluate=True.
During evaluation in ReCodEx, two different random seeds will be employed, and you will get a point for each setting where you reach the required reward. The time limit for each test is 15 minutes.
Because the time limit is 15 minutes per test, you probably cannot train the model directly in ReCodEx. Instead, you need to save the trained model and embed it in your Python solution (see the gym_cartpole assignment for an example of saving the model and then embedding it in a Python source).
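One generic way to embed a saved model in a Python source (not necessarily the approach used in the gym_cartpole example) is to zip the model directory, base64-encode it into a string literal pasted into the solution, and unpack it at startup before restoring the model. A standard-library sketch with hypothetical function names:

```python
import base64
import io
import os
import zipfile


def pack_directory(path):
    """Zip all files under `path` and return the archive as a base64 string."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _, files in os.walk(path):
            for name in files:
                full_path = os.path.join(root, name)
                archive.write(full_path, os.path.relpath(full_path, path))
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def unpack_directory(encoded, path):
    """Inverse of pack_directory: extract the embedded archive into `path`."""
    with zipfile.ZipFile(io.BytesIO(base64.b64decode(encoded))) as archive:
        archive.extractall(path)
```

Usage: run pack_directory on the saved model directory once, paste the resulting string into the solution as a constant, and call unpack_directory on it before loading the checkpoint. Deflate compression keeps the string reasonably small, but large networks may still exceed submission size limits.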
hyperparams_gp
Deadline: Jun 03, 23:59 2 points
The goal of this assignment is to try performing automatic hyperparameter search. Your goal is to optimize the conv_net.py model with several hyperparameters, so that it achieves the highest validation accuracy on the Fashion MNIST dataset after two epochs of training.
The hyperparameters and their possible values and distributions are described in the ConvNet.hyperparameters method.
Implement the search using the skopt package (can be installed using pip3 install [--user] scikit-optimize), and print the best accuracy after 15 trials. Implement the following two strategies:
- random_search: use random search in the hyperparameter space
- gp_ei: use a Gaussian process approach (skopt.gp_minimize) with the expected improvement (EI) acquisition function
This task is evaluated manually. After you submit your solution to ReCodEx (which will not pass automatic evaluation), write me an email and I will perform the evaluation.
hyperparams_rl
Deadline: Jun 03, 23:59 3 points
The goal of this assignment is to try performing automatic hyperparameter search. Your goal is to optimize the conv_net.py model with several hyperparameters, so that it achieves the highest validation accuracy on the Fashion MNIST dataset after two epochs of training.
The hyperparameters and their possible values and distributions are described in the ConvNet.hyperparameters method.
Implement the search using reinforcement learning. Notably, generate the hyperparameters using a forward LSTM with dimensionality 16, generating individual hyperparameters on each time step.
This task is evaluated manually. After you submit your solution to ReCodEx (which will not pass automatic evaluation), write me an email and I will perform the evaluation.
eager_mnist
Deadline: Jun 03, 23:59 3 points
In this assignment, you will implement a simple MNIST CNN classification model using TensorFlow Eager.
Your goal is to start with the eager_mnist.py template and implement training and evaluation using Eager mode according to the template instructions.
estimator_mnist
Deadline: Jun 03, 23:59 3 points
In this assignment, you will implement a simple MNIST convolutional classification model using the tf.estimator API.
Your goal is to start with the estimator_mnist.py template and implement training and evaluation using Estimator according to the template instructions.
The exam is primarily written and consists of 5 questions, each worth 20 points. The required number of points (including the maximum of 40 surplus points from the practicals) to obtain grades 1, 2 and 3 is 85, 70 and 55, respectively. An example exam is available.
Generally, only the topics covered in the lectures are part of the exam (i.e., you should be able to tell me what I told you). The references are to the Deep Learning Book, unless stated otherwise.
- Computation model of neural networks
  - acyclic graph with nodes and edges
  - evaluation (forward propagation) [Algorithm 6.1]
  - activation functions [tanh and ReLUs, including equations]
  - output functions [σ and softmax, including equations (3.30 and 6.29); you should also know how softmax is implemented to avoid overflows]
- Backpropagation algorithm [Algorithm 6.2; Algorithms 6.5 and 6.6 are used in practice, i.e., during tf.train.Optimizer.compute_gradients, so you should understand the idea behind them, but you do not have to understand the notation of op.bprop etc. from Algorithms 6.5 and 6.6]
- Gradient descent and stochastic gradient descent algorithm [Section 5.9]
- Maximum likelihood estimation (MLE) principle [Section 5.5, excluding 5.5.2]
  - negative log likelihood as a loss derived by MLE
  - mean squared error loss derived by MLE assuming a Gaussian conditional distribution [Equations (5.64)-(5.66)]
- In addition to having theoretical knowledge of the above, you should be able to perform all of it on practical examples – i.e., if you get a network with one hidden layer, a loss and a learning rate, you should perform the forward propagation, compute the loss, perform backpropagation and update the weights using SGD. In order to do so, you should be able to differentiate softmax with NLL, sigmoid with NLL and linear output with MSE.
- Stochastic gradient descent algorithm improvements (you should be able to write the algorithms down and understand the motivations behind them):
  - learning-rate decay
  - SGD with momentum [Section 8.3.2 and Algorithm 8.2]
  - SGD with Nesterov momentum (and how it is different from normal momentum) [Section 8.3.3 and Algorithm 8.3]
  - AdaGrad (you should be able to explain why, in case of a stationary gradient distribution, AdaGrad effectively decays the learning rate) [Section 8.5.1 and Algorithm 8.4]
  - RMSProp (and why it is a generalization of AdaGrad) [Section 8.5.2 and Algorithm 8.5]
  - Adam (and why the bias-correction terms (1-β^t) are there) [Section 8.5.3 and Algorithm 8.7]
- Regularization methods:
  - Early stopping [Section 7.8, without the "How early stopping acts as a regularizer" part]
  - L2 regularization [First paragraph of 7.1.1 and Equation (7.5)]
  - L1 regularization [Section 7.1.2 up to Equation (7.20)]
  - Dropout [just the description of the algorithm]
  - Batch normalization [Section 8.7.1]
- Gradient clipping [Section 10.11.1]
- Convolutional networks:
  - Basic convolution and cross-correlation operations on 4D tensors [Equations (9.5) and (9.6)]
  - Differences compared to a fully connected layer [Section 9.2 and Figure 9.6]
  - Multiple channels in a convolution [Equation (9.7)]
  - Stride and padding schemes [Section 9.5 up to page 349, notably Equation (9.8)]
  - Max pooling and average pooling [Section 9.3]
  - AlexNet [general architecture without knowing specific constants, i.e., convolutional layers combined with pooling layers and two fully connected layers at the end; Alex Krizhevsky et al.: ImageNet Classification with Deep Convolutional Neural Networks]
  - ResNet [only the important ideas (mainly residual connections) and overall architecture of ResNet 152; Kaiming He et al.: Deep Residual Learning for Image Recognition]
  - Object detection using Fast R-CNN [overall architecture, RoI-pooling layer, parametrization of generated bounding boxes, used loss function; Ross Girshick: Fast R-CNN]
  - Proposing RoIs using Faster R-CNN [overall architecture, the differences and similarities of Fast R-CNN and the proposal network from Faster R-CNN; Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]
  - Image segmentation with Mask R-CNN [overall architecture, RoI-align layer; Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN]
- Recurrent networks:
  - Using RNNs to represent sequences [Figure 10.2 with h as output; Chapter 10 and Section 10.1]
  - Using RNNs to classify every sequence element [Figure 10.3; details in Section 10.2 excluding Sections 10.2.1-10.2.4]
  - Bidirectional RNNs [Section 10.3]
  - Encoder-decoder sequence-to-sequence RNNs [Section 10.4; note that you should know how the network is trained and also how it is later used to predict sequences]
  - Stacked (or multi-layer) LSTM [Figure 10.13a of Section 10.10.5; more details (not required for the exam) can be found in Alex Graves: Generating Sequences With Recurrent Neural Networks]
  - The problem of vanishing and exploding gradients [Section 10.7]
  - Long Short-Term Memory (LSTM) [Section 10.10.1]
  - Gated Recurrent Unit (GRU) [Section 10.10.2]
-
Word representations [in all cases, you should be able to describe the algorithm for computing the embedding, and how the backpropagation works (there is usually nothing special, but if I ask what happens if a word occurs multiple time in a sentence, you should be able to answer)]
- The
word2vec
word embeddings- CBOW and Skip-gram models [Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean: Efficient Estimation of Word Representations in Vector Space]
- Hierarchical softmax [Section 12.4.3.2, or Section 2.1 of the following paper]
- Negative sampling [Section 2.2 of Tomas Mikolov, Ilya Sutskever,
Kai Chen, Greg Corrado, Jeffrey Dean: Distributed Representations of
Words and Phrases and their
Compositionality]; note that
negative sampling is a simplification of Importance sampling described
in Section 12.4.3.3, with
w_i=1
; the proposal distribution inword2vec
being unigram distribution to the power of 3/4
- Character-level embeddings using RNNs [C2W model from Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W. Black, Isabel Trancoso: Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation]
- Character-level embeddings using CNNs [CharCNN from Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush: Character-Aware Neural Language Models]
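The negative sampling objective and its noise distribution from the word2vec items above can be sketched as follows. This assumes embeddings are plain Python lists and that the k negative words have already been drawn; `noise_distribution` and `negative_sampling_loss` are hypothetical helper names. The loss only touches the embeddings of the center word, the observed context word and the sampled negatives, which is why it is much cheaper than a full softmax over the vocabulary.

```python
import math


def noise_distribution(counts, power=0.75):
    """Unigram counts raised to `power` and renormalized (word2vec uses 3/4)."""
    weights = {word: count ** power for word, count in counts.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def negative_sampling_loss(center, context, negatives):
    """Skip-gram negative sampling loss for one (center, context) pair.

    center:    input embedding of the center word
    context:   output embedding of the observed context word
    negatives: output embeddings of k words drawn from the noise distribution
    """
    loss = -log_sigmoid(dot(context, center))  # pull the true pair together
    loss -= sum(log_sigmoid(-dot(neg, center)) for neg in negatives)  # push noise pairs apart
    return loss
```

For counts {"a": 16, "b": 1}, the 3/4 power gives weights 8 and 1, so "a" is sampled with probability 8/9 instead of 16/17; rarer words are sampled relatively more often than under the plain unigram distribution.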
- Highway Networks [Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber: Training Very Deep Networks]
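A highway layer computes y = H(x) * T(x) + x * (1 - T(x)), where T is a sigmoid transform gate and 1 - T acts as the carry gate. The sketch below assumes a ReLU for H, equal input and output sizes, and plain lists for vectors and matrices; `highway_layer` and its parameter names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def highway_layer(x, W_H, b_H, W_T, b_T):
    """y = H(x) * T(x) + x * (1 - T(x)) with H = ReLU(W_H x + b_H), T = sigmoid(W_T x + b_T)."""
    H = [max(0.0, v + b) for v, b in zip(matvec(W_H, x), b_H)]  # transformed input
    T = [sigmoid(v + b) for v, b in zip(matvec(W_T, x), b_T)]   # transform gate in (0, 1)
    # The carry gate is 1 - T: where T is near 0, the input is copied through.
    return [h * t + xi * (1.0 - t) for h, t, xi in zip(H, T, x)]
```

A strongly negative gate bias b_T drives T(x) toward 0, so the layer passes x through almost unchanged, while a strongly positive one makes it behave like a plain ReLU layer. This input-dependent carry path is the mechanism the paper relies on to train very deep stacks of layers.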
- Machine Translation
- Translation using encoder-decoder (also called sequence-to-sequence) architecture [Section 10.4 and Section 12.4.5]
- Attention mechanism in NMT [Section 12.4.5.1, but you should also know the equations for the attention, notably Equations (4), (5), (6) and (A.1.2) of Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: Neural Machine Translation by Jointly Learning to Align and Translate]
- Subword units [The BPE algorithm from Section 3.2 of Rico Sennrich, Barry Haddow, Alexandra Birch: Neural Machine Translation of Rare Words with Subword Units]
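The BPE merge-learning loop from the subword units item can be re-implemented in a few lines. This is a minimal sketch, not the paper's reference code: words are stored as tuples of symbols ending with an end-of-word marker `</w>`, and ties between equally frequent pairs are broken by first occurrence (an assumption; implementations differ). The function names are illustrative.

```python
from collections import Counter


def count_pairs(vocab):
    """Frequency of every adjacent symbol pair, weighted by word count."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replaces each occurrence of `pair` inside every word by one merged symbol."""
    merged = pair[0] + pair[1]
    new_vocab = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(merged)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        new_vocab[key] = new_vocab.get(key, 0) + freq
    return new_vocab


def learn_bpe(word_counts, num_merges):
    """Learns `num_merges` merge operations from a {word: count} dictionary."""
    # Start from single characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties: the first pair encountered wins
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges
```

On the toy vocabulary {low: 5, lower: 2, newest: 6, widest: 3}, the first four merges are (e, s), (es, t), (est, </w>) and (l, o). At test time, the learned merges are applied in the same order to segment new words.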
- Deep generative models using differentiable generator nets [Section 20.10.2]:
- Variational autoencoders [Section 20.10.3 up to page 698 (excluding), together with the reparametrization trick from Section 20.9 (excluding Section 20.9.1)]
- Regular autoencoders [undercomplete AE – Section 14.1, sparse AE – first two paragraphs of Section 14.2.1, denoising AE – Section 14.2.2]
- Generative Adversarial Networks [Section 20.10.4 up to page 702 (excluding)]
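The reparametrization trick and the closed-form KL term of a VAE can be sketched as below, assuming a diagonal Gaussian encoder that outputs `mu` and `log_var` (log variance) and a standard normal prior. The reconstruction term, which depends on the decoder, is left out; the negative ELBO would be that reconstruction loss plus `kl_to_standard_normal`. All function names are hypothetical.

```python
import math
import random


def reparametrize(mu, log_var, eps):
    """z = mu + sigma * eps with sigma = exp(log_var / 2).

    The noise eps ~ N(0, I) is an input, so z is a deterministic, differentiable
    function of mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * e for m, lv, e in zip(mu, log_var, eps)]


def sample_latent(mu, log_var, rng):
    """Draws eps from a standard normal and applies the reparametrization."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return reparametrize(mu, log_var, eps)


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))
```

Because `eps` is sampled independently of the parameters, backpropagation can pass through the sampling step to `mu` and `log_var`; sampling z directly from N(mu, sigma^2) would not allow that.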
- Structured Prediction
- Conditional Random Fields (CRF) loss [Sections 3.4.2 and A.7 of R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa: Natural Language Processing (Almost) from Scratch]
- Connectionist Temporal Classification (CTC) loss [A. Graves, S. Fernández, F. Gomez, J. Schmidhuber: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks]
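For the CRF loss, the key computation is the log-partition function over all tag sequences, obtained with the forward algorithm instead of enumerating all K^T paths. The sketch below assumes a sentence score made of per-position emission scores, a tag-to-tag transition matrix and a separate vector of start scores (similar in spirit to the initial transition scores in Collobert et al.); all names are illustrative.

```python
import itertools
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def path_score(emissions, transitions, start, tags):
    """Score of one tag sequence: start score + transition scores + emission scores."""
    score = start[tags[0]] + emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def log_partition(emissions, transitions, start):
    """Forward algorithm: logsumexp of the scores of all tag sequences in O(T * K^2)."""
    num_tags = len(start)
    alpha = [start[j] + emissions[0][j] for j in range(num_tags)]
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[i] + transitions[i][j] for i in range(num_tags)]) + emissions[t][j]
            for j in range(num_tags)
        ]
    return logsumexp(alpha)


def crf_loss(emissions, transitions, start, gold_tags):
    """Negative log-likelihood of the gold tag sequence."""
    return log_partition(emissions, transitions, start) - path_score(emissions, transitions, start, gold_tags)


def brute_force_log_partition(emissions, transitions, start):
    """Enumerates all K^T paths; only for checking on tiny inputs."""
    paths = itertools.product(range(len(start)), repeat=len(emissions))
    return logsumexp([path_score(emissions, transitions, start, p) for p in paths])
```

Comparing `log_partition` with `brute_force_log_partition` on a tiny input is a quick way to check such an implementation; replacing logsumexp by max in the forward recursion gives the Viterbi decoding score.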
- Reinforcement learning [note that proofs are not required for reinforcement learning; all references are to the March 2018 draft of the second edition of Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto]
- Multi-armed bandits [Chapter 2, Sections 2.1-2.5]
- General setting of reinforcement learning [agent-environment, action-state-reward, return; Chapter 3, Sections 3.1-3.3]
- Monte Carlo reinforcement learning algorithm [Sections 5.1-5.4, especially the algorithm in Section 5.4]
- Temporal Difference RL Methods [Section 6.1]
- SARSA algorithm [Section 6.4]
- Q-Learning [Section 6.5; you should also understand Eq. (6.1) and (6.2)]
- Policy gradient methods [representing the policy by the network, using softmax; Section 13.1]
- Policy gradient theorem [Section 13.2]
- REINFORCE algorithm [Section 13.3; note that the γ^t on the last line should not be there]
- REINFORCE with baseline algorithm [Section 13.4; note that the γ^t on the last two lines should not be there]
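To make the Q-learning update of Section 6.5 concrete, here is a tabular sketch on a hypothetical five-state chain environment (not from the book): action 0 moves left, action 1 moves right, and reaching the rightmost state yields reward 1 and ends the episode. Actions are chosen epsilon-greedily, while the update bootstraps from the maximum value of the next state, which is what makes Q-learning off-policy; SARSA would instead use the value of the action actually taken next. The function name and hyperparameter values are illustrative.

```python
import random


def q_learning_chain(num_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy deterministic chain (hypothetical environment):
    states 0..num_states-1, action 0 = left, 1 = right; reaching the last state
    gives reward 1 and ends the episode."""
    rng = random.Random(seed)
    goal = num_states - 1
    Q = [[0.0, 0.0] for _ in range(num_states)]

    def greedy(s):
        best = max(Q[s])
        return rng.choice([a for a in (0, 1) if Q[s][a] == best])  # random tie-breaking

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap, just a safety net
            a = rng.randrange(2) if rng.random() < epsilon else greedy(s)
            s_next = max(s - 1, 0) if a == 0 else s + 1
            reward = 1.0 if s_next == goal else 0.0
            # Q-learning target: reward plus discounted max over next actions;
            # the terminal state has value 0.
            target = reward if s_next == goal else reward + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if s == goal:
                break
    return Q
```

With gamma = 0.9, the learned values for moving right approach 0.9^3, 0.9^2, 0.9 and 1 in states 0 to 3, and the greedy policy moves right everywhere.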