Deep Learning – Summer 2019/20
In recent years, deep neural networks have been used to solve complex machine-learning problems. They have achieved significant state-of-the-art results in many areas.
The goal of the course is to introduce deep neural networks, from the basics to the latest advances. The course will focus on both theory and practical aspects (students will implement and train several deep neural networks capable of achieving state-of-the-art results, for example in named entity recognition, dependency parsing, machine translation, image labeling or in playing video games). No previous knowledge of artificial neural networks is required, but a basic understanding of machine learning is advisable.
About
SIS code: NPFL114
Semester: summer
E-credits: 7
Examination: 3/2 C+Ex
Guarantor: Milan Straka
Time-space Coordinates
 lectures: Czech lecture is held on Monday 9:50 in S5, English lecture on Tuesday 9:50 in S5; first lecture is on Feb 24
 practicals: there are three parallel practicals, a Czech one on Monday 12:20 in SU2, and two English ones on Tuesday 12:20 in SU2 and on Tuesday 14:00 in SW1; first practicals are on Feb 24/25
Lectures
1. Introduction to Deep Learning Slides PDF Slides 2018 Video numpy_entropy pca_first mnist_layers_activations
2. Training Neural Networks Slides PDF Slides 2018 Video Questions sgd_backpropagation sgd_manual mnist_training gym_cartpole
3. Training Neural Networks II Slides PDF Slides 2018 Video Questions explore_examples mnist_regularization mnist_ensemble uppercase
4. Convolutional Neural Networks Slides PDF Slides Video 2018 Video Questions mnist_cnn image_augmentation tf_dataset cifar_competition
5. Convolutional Neural Networks II Slides PDF Slides Video 2018 Video Questions mnist_web cags_classification cags_segmentation
6. Object Detection Slides PDF Slides Video 2018 Video I 2018 Video II Questions cnn_manual bboxes_utils svhn_competition
7. Recurrent Neural Networks Slides PDF Slides Video 2018 Video I 2018 Video II Questions 3d_recognition sequence_classification tagger_we
8. Word Embeddings, CRF, CTC Slides PDF Slides Video 2018 Video I 2018 Video II Questions mnist_multiple tagger_cle_rnn tagger_cle_cnn tagger_competition speech_recognition
9. Word2Vec, Seq2seq, NMT Slides PDF Slides Video 2018 Video I 2018 Video II Questions tensorboard_projector lemmatizer_noattn lemmatizer_attn lemmatizer_competition
10. Deep Generative Models Slides PDF Slides Video 2018 Video Questions vae gan dcgan
11. Introduction to Deep Reinforcement Learning Slides PDF Slides Video 2018 Video I 2018 Video II Questions omr_competition monte_carlo reinforce reinforce_baseline reinforce_pixels
12. Speech Synthesis, External Memory Networks Slides PDF Slides Video 2018 Video Questions
13. Transformer, BERT Slides PDF Slides Video Questions sentiment_analysis
License
Unless otherwise stated, teaching materials for this course are available under CC BY-SA 4.0.
The lecture content, including references to study materials, is listed below. The main study material is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (referred to as DLB).
References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.
1. Introduction to Deep Learning
Feb 24 Slides PDF Slides 2018 Video numpy_entropy pca_first mnist_layers_activations
 Random variables, probability distributions, expectation, variance, Bernoulli distribution, Categorical distribution [Sections 3.2, 3.3, 3.8, 3.9.1 and 3.9.2 of DLB]
 Self-information, entropy, cross-entropy, KL-divergence [Section 3.13 of DLB]
 Gaussian distribution [Section 3.9.3 of DLB]
 Machine Learning Basics [Sections 5.1-5.1.3 of DLB]
 History of Deep Learning [Section 1.2 of DLB]
 Linear regression [Section 5.1.4 of DLB]
 Brief description of Logistic Regression, Maximum Entropy models and SVM [Sections 5.7.1 and 5.7.2 of DLB]
 Challenges Motivating Deep Learning [Section 5.11 of DLB]
 Neural network basics
 Neural networks as graphs [Chapter 6 before Section 6.1 of DLB]
 Output activation functions [Section 6.2.2 of DLB, excluding Section 6.2.2.4]
 Hidden activation functions [Section 6.3 of DLB, excluding Section 6.3.3]
 Basic network architectures [Section 6.4 of DLB, excluding Section 6.4.2]
2. Training Neural Networks
Mar 2 Slides PDF Slides 2018 Video Questions sgd_backpropagation sgd_manual mnist_training gym_cartpole
 Capacity, overfitting, underfitting, regularization [Section 5.2 of DLB]
 Hyperparameters and validation sets [Section 5.3 of DLB]
 Maximum Likelihood Estimation [Section 5.5 of DLB]
 Neural network training
 Gradient Descent and Stochastic Gradient Descent [Sections 4.3 and 5.9 of DLB]
 Backpropagation algorithm [Section 6.5 to 6.5.3 of DLB, especially Algorithms 6.1 and 6.2; note that Algorithms 6.5 and 6.6 are used in practice]
 SGD algorithm [Section 8.3.1 and Algorithm 8.1 of DLB]
 SGD with Momentum algorithm [Section 8.3.2 and Algorithm 8.2 of DLB]
 SGD with Nesterov Momentum algorithm [Section 8.3.3 and Algorithm 8.3 of DLB]
 Optimization algorithms with adaptive gradients
 AdaGrad algorithm [Section 8.5.1 and Algorithm 8.4 of DLB]
 RMSProp algorithm [Section 8.5.2 and Algorithm 8.5 of DLB]
 Adam algorithm [Section 8.5.3 and Algorithm 8.7 of DLB]
3. Training Neural Networks II
Mar 9 Slides PDF Slides 2018 Video Questions explore_examples mnist_regularization mnist_ensemble uppercase
 Training neural network with a single hidden layer
 Softmax with NLL (negative log likelihood) as a loss function [Section 6.2.2.3 of DLB, notably equation (6.30); plus slides 10-12]
 Regularization [Chapter 7 until Section 7.1 of DLB]
 Early stopping [Section 7.8 of DLB, without the How early stopping acts as a regularizer part]
 L2 and L1 regularization [Sections 7.1 and 5.6.1 of DLB; plus slides 17-18]
 Dataset augmentation [Section 7.4 of DLB]
 Ensembling [Section 7.11 of DLB]
 Dropout [Section 7.12 of DLB]
 Label smoothing [Section 7.5.1 of DLB]
 Saturating nonlinearities [Section 6.3.2 and second half of Section 6.2.2.2 of DLB]
 Parameter initialization strategies [Section 8.4 of DLB]
 Gradient clipping [Section 10.11.1 of DLB]
4. Convolutional Neural Networks
Mar 23 Slides PDF Slides Video 2018 Video Questions mnist_cnn image_augmentation tf_dataset cifar_competition
 Introduction to convolutional networks [Chapter 9 and Sections 9.1-9.3 of DLB]
 Convolution as operation on 4D tensors [Section 9.5 of DLB, notably Equations (9.7) and (9.8)]
 Max pooling and average pooling [Section 9.3 of DLB]
 Stride and Padding schemes [Section 9.5 of DLB]
 AlexNet [ImageNet Classification with Deep Convolutional Neural Networks]
 VGG [Very Deep Convolutional Networks for Large-Scale Image Recognition]
 GoogLeNet (aka Inception) [Going Deeper with Convolutions]
 Batch normalization [Section 8.7.1 of DLB, optionally the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift]
 Inception v2 and v3 [Rethinking the Inception Architecture for Computer Vision]
 ResNet [Deep Residual Learning for Image Recognition]
5. Convolutional Neural Networks II
Mar 30 Slides PDF Slides Video 2018 Video Questions mnist_web cags_classification cags_segmentation
 Residual CNN Networks
 ResNet [Deep Residual Learning for Image Recognition]
 WideNet [Wide Residual Networks]
 DenseNet [Densely Connected Convolutional Networks]
 PyramidNet [Deep Pyramidal Residual Networks]
 ResNeXt [Aggregated Residual Transformations for Deep Neural Networks]
 Regularizing CNN Networks
 SENet [Squeeze-and-Excitation Networks]
 EfficientNet [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks]
6. Object Detection
Apr 06 Slides PDF Slides Video 2018 Video I 2018 Video II Questions cnn_manual bboxes_utils svhn_competition
 Fast R-CNN [Fast R-CNN]
 Proposing RoIs using Faster R-CNN [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]
 Mask R-CNN [Mask R-CNN]
 Feature Pyramid Networks [Feature Pyramid Networks for Object Detection]
 Focal Loss, RetinaNet [Focal Loss for Dense Object Detection]
 EfficientDet [EfficientDet: Scalable and Efficient Object Detection]
 Group Normalization [Group Normalization]
7. Recurrent Neural Networks
Apr 14 Slides PDF Slides Video 2018 Video I 2018 Video II Questions 3d_recognition sequence_classification tagger_we
 Sequence modelling using Recurrent Neural Networks (RNN) [Chapter 10 until Section 10.2.1 (excluding) of DLB]
 The challenge of long-term dependencies [Section 10.7 of DLB]
 Long Short-Term Memory (LSTM) [Section 10.10.1 of DLB, Sepp Hochreiter, Jürgen Schmidhuber (1997): Long short-term memory, Felix A. Gers, Jürgen Schmidhuber, Fred Cummins (2000): Learning to Forget: Continual Prediction with LSTM]
 Gated Recurrent Unit (GRU) [Section 10.10.2 of DLB, Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation]
 Highway Networks [Training Very Deep Networks]
 RNN Regularization
 Variational Dropout [A Theoretically Grounded Application of Dropout in Recurrent Neural Networks]
 Layer Normalization [Layer Normalization]
 Bidirectional RNN [Section 10.3 of DLB]
 Word Embeddings [Section 14.2.4 of DLB]
8. Word Embeddings, CRF, CTC
Apr 20 Slides PDF Slides Video 2018 Video I 2018 Video II Questions mnist_multiple tagger_cle_rnn tagger_cle_cnn tagger_competition speech_recognition
 Character-level embeddings using Recurrent neural networks [C2W model from Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation]
 Character-level embeddings using Convolutional neural networks [CharCNN from Character-Aware Neural Language Models]
 Conditional Random Fields (CRF) loss [Sections 3.4.2 and A.7 of Natural Language Processing (Almost) from Scratch]
 Connectionist Temporal Classification (CTC) loss [Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks]
9. Word2Vec, Seq2seq, NMT
Apr 27 Slides PDF Slides Video 2018 Video I 2018 Video II Questions tensorboard_projector lemmatizer_noattn lemmatizer_attn lemmatizer_competition
 Word2vec word embeddings, notably the CBOW and Skip-gram architectures [Efficient Estimation of Word Representations in Vector Space]
 Hierarchical softmax [Section 12.4.3.2 of DLB or Distributed Representations of Words and Phrases and their Compositionality]
 Negative sampling [Distributed Representations of Words and Phrases and their Compositionality]
 Character-level embeddings using character n-grams [Described simultaneously in several papers as Charagram (Charagram: Embedding Words and Sentences via Character n-grams), Subword Information (Enriching Word Vectors with Subword Information) or SubGram (SubGram: Extending Skip-Gram Word Representation with Substrings)]
 Neural Machine Translation using Encoder-Decoder or Sequence-to-Sequence architecture [Section 12.5.4 of DLB, Ilya Sutskever, Oriol Vinyals, Quoc V. Le: Sequence to Sequence Learning with Neural Networks and Kyunghyun Cho et al.: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation]
 Using Attention mechanism in Neural Machine Translation [Section 12.4.5.1 of DLB, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: Neural Machine Translation by Jointly Learning to Align and Translate]
 Translating Subword Units [Rico Sennrich, Barry Haddow, Alexandra Birch: Neural Machine Translation of Rare Words with Subword Units]
 Google NMT [Yonghui Wu et al.: Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation]
10. Deep Generative Models
May 04 Slides PDF Slides Video 2018 Video Questions vae gan dcgan
 Autoencoders (undercomplete, sparse, denoising) [Chapter 14, Sections 14-14.2.3 of DLB]
 Deep Generative Models using Differentiable Generator Nets [Section 20.10.2 of DLB]
 Variational Autoencoders [Section 20.10.3 plus Reparametrization trick from Section 20.9 (but not Section 20.9.1) of DLB, Auto-Encoding Variational Bayes]
 Generative Adversarial Networks
 GAN [Section 20.10.4 of DLB, Generative Adversarial Networks]
 CGAN [Conditional Generative Adversarial Nets]
 DCGAN [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks]
 WGAN [Wasserstein GAN]
 BigGAN [Large Scale GAN Training for High Fidelity Natural Image Synthesis]
11. Introduction to Deep Reinforcement Learning
May 11 Slides PDF Slides Video 2018 Video I 2018 Video II Questions omr_competition monte_carlo reinforce reinforce_baseline reinforce_pixels
Study material for Reinforcement Learning is Reinforcement Learning: An Introduction, second edition, by Richard S. Sutton and Andrew G. Barto (referred to as RLB), available online.
 Multi-armed bandits [Sections 2-2.4 of RLB]
 Markov Decision Process [Sections 3-3.3 of RLB]
 Policies and Value Functions [Section 3.5 of RLB]
 Monte Carlo Methods [Sections 5-5.4 of RLB]
 Policy Gradient Methods [Sections 13-13.1 of RLB]
 Policy Gradient Theorem [Section 13.2 of RLB]
 REINFORCE algorithm [Section 13.3 of RLB]
 REINFORCE with baseline algorithm [Section 13.4 of RLB]
12. Speech Synthesis, External Memory Networks
May 18 Slides PDF Slides Video 2018 Video Questions
 NasNet [Learning Transferable Architectures for Scalable Image Recognition]
 WaveNet [WaveNet: A Generative Model for Raw Audio]
 Parallel WaveNet [Parallel WaveNet: Fast High-Fidelity Speech Synthesis]
 Full speech synthesis pipeline Tacotron 2 [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions]
 Neural Turing Machine [Neural Turing Machines]
 Differentiable Neural Computer [Hybrid computing using a neural network with dynamic external memory]
 Memory Augmented Neural Networks [One-shot Learning with Memory-Augmented Neural Networks]
13. Transformer, BERT
May 25 Slides PDF Slides Video Questions sentiment_analysis
Requirements
To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that up to 40 points above 80 (including the bonus points) will be transferred to the exam.
Environment
The tasks are evaluated automatically using the ReCodEx Code Examiner. The evaluation is performed using Python 3.6, TensorFlow 2.1.0, TensorFlow Addons 0.8.1, TensorFlow Hub 0.7.0, TensorFlow Probability 0.9.0, OpenAI Gym 0.15.4 and NumPy 1.18.1.
Installing to Central User Packages Repository
You can install all required packages to the central user packages repository using
pip3 install --user --upgrade pip setuptools
followed by
pip3 install --user tensorflow==2.1.0 tensorflow-addons==0.8.1 tensorflow-hub==0.7.0 tensorflow-probability==0.9.0 gym==0.15.4
Installing to a Virtual Environment
Python supports virtual environments, which are directories containing independent sets of installed packages. You can create the virtual environment by running
python3 -m venv VENV_DIR
followed by
VENV_DIR/bin/pip3 install --upgrade pip setuptools
and
VENV_DIR/bin/pip3 install tensorflow==2.1.0 tensorflow-addons==0.8.1 tensorflow-hub==0.7.0 tensorflow-probability==0.9.0 gym==0.15.4
Problems With the Environment
Windows TensorFlow Fails with ImportError: DLL load failed
If your Windows TensorFlow fails with ImportError: DLL load failed, you are probably missing Visual C++ 2019 Redistributable.
Cannot Start tensorboard
If tensorboard cannot be found, make sure the directory with pip-installed packages is in your PATH (that directory is either in your virtual environment if you use a virtual environment, or it should be ~/.local/bin on Linux and %UserProfile%\AppData\Roaming\Python\Python3[5-7] and %UserProfile%\AppData\Roaming\Python\Python3[5-7]\Scripts on Windows).
On Windows, tensorboard Shows a Blank Page
Some programs (even VS and VS Code) erroneously change the Windows system-wide MIME type of JavaScript files to text/plain, which causes problems for tensorboard. If you encounter the issue, the easiest fix is to uninstall tensorboard (pip3 uninstall tensorboard) and then install a development version (pip3 install [--user] tb-nightly), which contains a fix. The development version is then started exactly like the stable one, using the tensorboard command.
Warning About Missing libnvinfer, libnvinfer_plugin and TensorRT
TensorFlow 2.1 eagerly checks for availability of TensorRT during the first import tensorflow. In case you do not have it, a three-line warning is printed. You can safely ignore the warning; both the CPU and the GPU backends work without TensorRT.
Tunnelling Tensorboard in Deepnote
To access Tensorboard in Deepnote, first make sure you have labs/deepnote_ngrok – if you do not, run the labs/deepnote_ngrok_get script to download it. Then start Tensorboard and finally run labs/deepnote_ngrok 6006 (or a different port on which you started tensorboard) to get a public URL you can open in another browser tab to access Tensorboard.
Teamwork
Solving assignments in teams of size 2 or 3 is encouraged, but everyone has to participate (it is forbidden not to work on an assignment and then submit a solution created by other team members). All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. Each such solution must explicitly list all members of the team to allow plagiarism detection using this template.
numpy_entropy
Deadline: Mar 8, 23:59 3 points
The goal of this exercise is to familiarize yourself with Python, NumPy and the ReCodEx submission system. Start with the numpy_entropy.py.
Load a file numpy_entropy_data.txt, whose lines consist of data points of our dataset, and load numpy_entropy_model.txt, which describes a model probability distribution, with each line being a tab-separated pair of (data point, probability).
Then compute the following quantities using NumPy, and print them each on a separate line rounded to two decimal places (or inf for positive infinity, which happens when an element of data distribution has zero probability under the model distribution):
 entropy H(data distribution)
 cross-entropy H(data distribution, model distribution)
 KL-divergence D_{KL}(data distribution, model distribution)
Use natural logarithms to compute the entropies and the divergence.
For data distribution file numpy_entropy_data.txt
A
BB
A
A
BB
A
CCC
and model distribution file numpy_entropy_model.txt
A 0.5
BB 0.3
CCC 0.1
D 0.1
the output should be
0.96
1.07
0.11
If we remove the CCC 0.1 line from the model distribution, the output should change to
0.96
inf
inf
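The three quantities above can be sketched in a few lines of NumPy. This is only an illustration, not the provided template: the function name `entropy_stats` is hypothetical, and the data and model are passed as an in-memory list and dictionary instead of being read from the two files.

```python
import numpy as np

def entropy_stats(data_points, model):
    """Return (entropy, cross-entropy, KL divergence), all in nats."""
    # Empirical data distribution: relative frequency of each distinct point.
    values, counts = np.unique(data_points, return_counts=True)
    data_probs = counts / counts.sum()
    # Model probability of every observed point (0 if the model omits it).
    model_probs = np.array([model.get(str(v), 0.0) for v in values])

    entropy = -np.sum(data_probs * np.log(data_probs))
    # np.log(0) is -inf, so an observed point missing from the model
    # yields an infinite cross-entropy (and therefore an infinite KL).
    with np.errstate(divide="ignore"):
        cross_entropy = -np.sum(data_probs * np.log(model_probs))
    return entropy, cross_entropy, cross_entropy - entropy

data = ["A", "BB", "A", "A", "BB", "A", "CCC"]
model = {"A": 0.5, "BB": 0.3, "CCC": 0.1, "D": 0.1}
for value in entropy_stats(data, model):
    print("{:.2f}".format(value))
```

The KL divergence is obtained as the difference between the cross-entropy and the entropy, which avoids a second pass over the data.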
pca_first
Deadline: Mar 8, 23:59 2 points
The goal of this exercise is to familiarize yourself with TensorFlow tf.Tensors, shapes and basic tensor manipulation methods. Start with the pca_first.py.
In this assignment, you will compute the covariance matrix of several examples from the MNIST dataset, compute the first principal component and quantify the variance it explains.
It is fine if you are not familiar with terms like covariance matrix or principal component – the template contains a detailed description of what you have to do.
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 pca_first.py --examples=1024 --iterations=64 --seed=7 --threads=1
51.52 9.94
python3 pca_first.py --examples=8192 --iterations=128 --seed=7 --threads=1
52.58 10.20
python3 pca_first.py --examples=55000 --iterations=1024 --seed=7 --threads=1
52.74 9.71
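For readers unfamiliar with the terms, the following NumPy sketch shows one common way to obtain the first principal component: power iteration on the covariance matrix. That the template uses power iteration is an assumption suggested by the iterations argument; the function name and the synthetic data are hypothetical, and the assignment itself asks for TensorFlow tensor operations rather than NumPy.

```python
import numpy as np

def first_principal_component(data, iterations):
    """Power iteration on the covariance matrix of `data` (rows are examples)."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / data.shape[0]
    total_variance = np.trace(cov)  # sum of all eigenvalues

    v = np.ones(cov.shape[0])
    eigenvalue = 0.0
    for _ in range(iterations):
        v = cov @ v
        eigenvalue = np.linalg.norm(v)  # converges to the largest eigenvalue
        v = v / eigenvalue
    # The explained variance is the share of the total variance
    # captured by the first principal component.
    return v, eigenvalue, eigenvalue / total_variance

rng = np.random.RandomState(0)
data = rng.normal(size=(500, 5)) * np.array([5.0, 1.0, 1.0, 1.0, 1.0])
component, eigenvalue, explained = first_principal_component(data, iterations=64)
print(component.shape, explained)
```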
mnist_layers_activations
Deadline: Mar 8, 23:59 2 points
Before solving the assignment, start by playing with example_keras_tensorboard.py, in order to familiarize yourself with TensorFlow and TensorBoard. Run it, and when it finishes, run TensorBoard using tensorboard --logdir logs. Then open http://localhost:6006 in a browser and explore the active tabs.
Your goal is to modify the mnist_layers_activations.py template and implement the following:
 A number of hidden layers (including zero) can be specified on the command line using the parameter layers.
 The activation function of these hidden layers can also be specified as a command line parameter activation, with supported values of none, relu, tanh and sigmoid.
 Print the final accuracy on the test set.
In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard:
 0 layers, activation none
 1 layer, activation none, relu, tanh, sigmoid
 10 layers, activation sigmoid, relu
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 mnist_layers_activations.py --recodex --seed=7 --threads=1 --epochs=1 --batch_size=50 --layers=0 --activation=none
91.22
python3 mnist_layers_activations.py --recodex --seed=7 --threads=1 --epochs=1 --batch_size=50 --layers=1 --activation=none
91.96
python3 mnist_layers_activations.py --recodex --seed=7 --threads=1 --epochs=1 --batch_size=50 --layers=1 --activation=relu
94.84
python3 mnist_layers_activations.py --recodex --seed=7 --threads=1 --epochs=1 --batch_size=50 --layers=1 --activation=tanh
94.19
python3 mnist_layers_activations.py --recodex --seed=7 --threads=1 --epochs=1 --batch_size=50 --layers=1 --activation=sigmoid
92.32
python3 mnist_layers_activations.py --recodex --seed=7 --threads=1 --epochs=1 --batch_size=50 --layers=3 --activation=relu
96.06
python3 mnist_layers_activations.py --recodex --seed=7 --threads=1 --epochs=1 --batch_size=50 --layers=5 --activation=tanh
94.67
sgd_backpropagation
Deadline: Mar 22 (originally Mar 15), 23:59 3 points
In this exercise you will learn how to compute gradients using so-called automatic differentiation, which is implemented by an automated backpropagation algorithm in TensorFlow. You will then perform training by running manually implemented minibatch stochastic gradient descent.
Starting with the sgd_backpropagation.py template, you should:
 implement a neural network with a single tanh hidden layer and categorical output layer;
 compute the cross-entropy loss;
 use tf.GradientTape to automatically compute the gradient of the loss with respect to all variables;
 perform the SGD update.
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 sgd_backpropagation.py --batch_size=64 --epochs=2 --hidden_layer=20 --learning_rate=0.1 --seed=7 --threads=1
92.38
python3 sgd_backpropagation.py --batch_size=100 --epochs=2 --hidden_layer=32 --learning_rate=0.2 --seed=7 --threads=1
93.77
sgd_manual
Deadline: Mar 22 (originally Mar 15), 23:59 2 points
The goal in this exercise is to extend your solution to the sgd_backpropagation assignment by manually computing the gradient.
Note that this assignment is the only one where we will compute the gradient manually; otherwise we will always use automatic differentiation. Therefore, the assignment is more of a mathematical exercise and it is definitely not required to pass the course. Furthermore, we will compute the derivative of the output functions later, in the Mar 9 lecture.
Start with the sgd_manual.py template, which is based on the sgd_backpropagation.py one. Be aware that each of these templates generates a different output file.
In order to check that you do not use automatic differentiation, ReCodEx checks that there is no GradientTape string in your source (except in the comments).
The outputs should be exactly the same as in the sgd_backpropagation assignment.
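As a rough illustration of what computing the gradient manually involves, here is a NumPy sketch for a network with a single tanh hidden layer and a softmax output trained with cross-entropy. It is not the template's code: the function names, parameter layout and toy data are hypothetical, and the assignment itself is written with TensorFlow tensors.

```python
import numpy as np

def forward(params, x):
    w1, b1, w2, b2 = params
    hidden = np.tanh(x @ w1 + b1)
    logits = hidden @ w2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return hidden, probs

def loss_fn(params, x, labels):
    _, probs = forward(params, x)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def manual_gradients(params, x, labels):
    w1, b1, w2, b2 = params
    hidden, probs = forward(params, x)
    batch = x.shape[0]
    # For softmax with cross-entropy, d loss / d logits = (probs - one_hot) / batch.
    d_logits = probs.copy()
    d_logits[np.arange(batch), labels] -= 1
    d_logits /= batch
    d_w2 = hidden.T @ d_logits
    d_b2 = d_logits.sum(axis=0)
    # Backpropagate through the hidden layer; tanh'(z) = 1 - tanh(z)^2.
    d_pre = (d_logits @ w2.T) * (1 - hidden ** 2)
    d_w1 = x.T @ d_pre
    d_b1 = d_pre.sum(axis=0)
    return [d_w1, d_b1, d_w2, d_b2]

def sgd_step(params, grads, learning_rate):
    return [p - learning_rate * g for p, g in zip(params, grads)]

rng = np.random.RandomState(42)
params = [rng.normal(size=(4, 3)), np.zeros(3), rng.normal(size=(3, 2)), np.zeros(2)]
x, labels = rng.normal(size=(8, 4)), rng.randint(0, 2, size=8)
params = sgd_step(params, manual_gradients(params, x, labels), learning_rate=0.1)
```

A practical way to validate such a derivation is to compare each gradient entry with a finite-difference estimate of the loss.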
mnist_training
Deadline: Mar 22 (originally Mar 15), 23:59 3 points
This exercise should teach you to use different optimizers, learning rates, and learning rate decays. Your goal is to modify the mnist_training.py template and implement the following:
 Using the specified optimizer (either SGD or Adam).
 Optionally using momentum for the SGD optimizer.
 Using the specified learning rate for the optimizer.
 Optionally using a given learning rate schedule. The schedule can be either exponential or polynomial (with degree 1, i.e., linear decay). Additionally, the final learning rate is given and the decay should gradually decrease the learning rate to reach the final learning rate just after the training.
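The requirement that the learning rate reaches the final value exactly when training ends determines the decay parameters. The following plain-Python sketch shows the arithmetic for both schedules; the function names are hypothetical, and the number of training steps in the usage example assumes 55000 training examples, which is not stated for this assignment. In TensorFlow, the same curves can be obtained from tf.keras.optimizers.schedules.ExponentialDecay and PolynomialDecay with suitably chosen arguments.

```python
def exponential_decay(initial, final, total_steps):
    """Learning rate that decays geometrically from `initial` to `final`."""
    # After `total_steps` updates the accumulated factor is exactly final / initial.
    rate = final / initial
    return lambda step: initial * rate ** (step / total_steps)

def linear_decay(initial, final, total_steps):
    """Polynomial decay of degree 1: a straight line from `initial` to `final`."""
    return lambda step: initial + (final - initial) * min(step, total_steps) / total_steps

# Assumed example: 2 epochs over 55000 examples with batch size 100.
total_steps = 2 * (55000 // 100)
schedule = exponential_decay(0.02, 0.0005, total_steps)
print(schedule(0), schedule(total_steps // 2), schedule(total_steps))
```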
In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard:
 SGD optimizer, learning_rate 0.01;
 SGD optimizer, learning_rate 0.01, momentum 0.9;
 SGD optimizer, learning_rate 0.1;
 Adam optimizer, learning_rate 0.001;
 Adam optimizer, learning_rate 0.01;
 Adam optimizer, exponential decay, learning_rate 0.01 and learning_rate_final 0.001;
 Adam optimizer, polynomial decay, learning_rate 0.01 and learning_rate_final 0.0001.
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 mnist_training.py --recodex --threads=1 --seed=7 --epochs=1 --batch_size=100 --hidden_layer=50 --optimizer SGD --learning_rate 0.03
90.10
python3 mnist_training.py --recodex --threads=1 --seed=7 --epochs=1 --batch_size=100 --hidden_layer=50 --optimizer SGD --learning_rate 0.2 --momentum 0.9
94.42
python3 mnist_training.py --recodex --threads=1 --seed=7 --epochs=1 --batch_size=100 --hidden_layer=50 --optimizer Adam --learning_rate 0.007
94.90
python3 mnist_training.py --recodex --threads=1 --seed=7 --epochs=2 --batch_size=100 --hidden_layer=50 --optimizer SGD --learning_rate 0.09 --decay polynomial --learning_rate_final 0.005
92.53
python3 mnist_training.py --recodex --threads=1 --seed=7 --epochs=2 --batch_size=100 --hidden_layer=50 --optimizer Adam --learning_rate 0.02 --decay exponential --learning_rate_final 0.0005
96.37
gym_cartpole
Deadline: Mar 22 (originally Mar 15), 23:59 3 points
Solve the CartPole-v1 environment from the OpenAI Gym, utilizing only provided supervised training data. The data is available in gym_cartpole-data.txt file, each line containing one observation (four space-separated floats) and a corresponding action (the last space-separated integer). Start with the gym_cartpole.py.
The solution to this task should be a model which passes evaluation on random inputs. This evaluation is performed by running the gym_cartpole_evaluate.py script, which loads a model and then evaluates it on 100 random episodes (optionally rendering if the --render option is provided; note that the script can also be imported as a module and evaluate any given tf.keras.Model).
In order to pass, you must achieve an average reward of at least 475 on 100 episodes. Your model should have either one or two outputs (i.e., using either sigmoid or softmax output function).
The size of the training data is very small and you should consider it when designing the model.
When submitting your model to ReCodEx, submit:
 one file with the model itself (with h5 suffix),
 the source code (or multiple sources) used to train the model (with py suffix), and possibly indicating teams.
explore_examples
Your goal in this zero-point assignment is to explore the prepared examples.
 The example_keras_models.py example demonstrates three different ways of constructing Keras models – sequential models, functional API and model subclassing.
 The example_keras_manual_batches.py shows how to train and evaluate Keras model when using custom batches.
 The example_manual.py illustrates how to implement a manual training loop without using Model.compile, with custom Optimizer, loss function and metric. However, this example is 2-3 times slower than the previous two.
 The example_manual_tf_function.py uses the tf.function annotation to speed up execution of the previous example back to the level of Model.fit. See the official tf.function documentation for details.
mnist_regularization
Deadline: Mar 22, 23:59 6 points
You will learn how to implement three regularization methods in this assignment. Start with the mnist_regularization.py template and implement the following:
 Allow using dropout with rate args.dropout. Add a dropout layer after the first Flatten and also after all Dense hidden layers (but not after the output layer).
 Allow using L2 regularization with weight args.l2. Use tf.keras.regularizers.L1L2 as a regularizer for all kernels (but not biases) of all Dense layers (including the last one).
 Allow using label smoothing with weight args.label_smoothing. Instead of SparseCategoricalCrossentropy, you will need to use CategoricalCrossentropy, which offers the label_smoothing argument.
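Switching from sparse labels to a full distribution over classes is what makes label smoothing possible: the one-hot target is mixed with a uniform distribution. The NumPy sketch below shows that computation for illustration only; the function name is hypothetical, and in the assignment the smoothing itself is carried out by the loss once it receives one-hot targets.

```python
import numpy as np

def smooth_labels(sparse_labels, num_classes, label_smoothing):
    """Turn integer class labels into smoothed target distributions."""
    one_hot = np.eye(num_classes)[sparse_labels]
    # Move `label_smoothing` of the probability mass to a uniform distribution.
    return one_hot * (1 - label_smoothing) + label_smoothing / num_classes

targets = smooth_labels([2, 0], num_classes=10, label_smoothing=0.2)
print(targets[0])  # the correct class gets 0.82, every other class 0.02
```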
In ReCodEx, there will be three tests (one for each regularization method) and you will get 2 points for passing each one.
In addition to submitting the task in ReCodEx, also run the following variations and observe the results in TensorBoard (notably training, development and test set accuracy and loss):
 dropout rate 0, 0.3, 0.5, 0.6, 0.8;
 l2 regularization 0, 0.001, 0.0001, 0.00001;
 label smoothing 0, 0.1, 0.3, 0.5.
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 mnist_regularization.py --recodex --seed=7 --threads=1 --epochs=10 --batch_size=50 --hidden_layers=20 --dropout 0.2
90.00
python3 mnist_regularization.py --recodex --seed=7 --threads=1 --epochs=10 --batch_size=50 --hidden_layers=20 --l2 0.01
89.05
python3 mnist_regularization.py --recodex --seed=7 --threads=1 --epochs=10 --batch_size=50 --hidden_layers=20 --label_smoothing 0.2
91.09
mnist_ensemble
Deadline: Mar 22, 23:59 2 points
Your goal in this assignment is to implement model ensembling.
The mnist_ensemble.py template trains args.models individual models, and your goal is to perform an ensemble of the first model, first two models, first three models, …, all models, and evaluate their accuracy on the development set.
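One common way to form such ensembles is to average the predicted class distributions of the participating models. That the template expects this particular scheme is an assumption; the sketch below uses NumPy arrays of precomputed predictions and a hypothetical function name.

```python
import numpy as np

def ensemble_accuracies(model_probs, labels):
    """Accuracy of the ensemble of the first k models, for k = 1..number of models.

    model_probs has shape [models, examples, classes] and holds the class
    distribution predicted by every model for every example.
    """
    model_probs = np.asarray(model_probs, dtype=float)
    labels = np.asarray(labels)
    accuracies = []
    running_sum = np.zeros_like(model_probs[0])
    for probs in model_probs:
        # The argmax of the sum equals the argmax of the mean,
        # so dividing by the number of models is unnecessary.
        running_sum += probs
        accuracies.append(float(np.mean(running_sum.argmax(axis=1) == labels)))
    return accuracies

predictions = [
    [[0.6, 0.4], [0.6, 0.4]],  # first model: second example is wrong
    [[0.7, 0.3], [0.1, 0.9]],  # second model: both examples are right
]
print(ensemble_accuracies(predictions, labels=[0, 1]))  # [0.5, 1.0]
```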
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 mnist_ensemble.py --recodex --seed=7 --threads=1 --epochs=2 --batch_size=50 --hidden_layers=20 --models=3
94.96 94.96 95.54 95.58 94.90 95.54
python3 mnist_ensemble.py --recodex --seed=7 --threads=1 --epochs=1 --batch_size=50 --hidden_layers=20 --models=5
94.08 94.08 94.36 94.34 93.94 94.20 94.02 94.20 93.94 94.16
uppercase
Deadline: Mar 22, 23:59 4 points+5 bonus
This assignment introduces the first NLP task. Your goal is to implement a model which is given Czech lowercased text and tries to uppercase appropriate letters. To load the dataset, use the uppercase_data.py module, which loads (and if required also downloads) the data. While the training and the development sets are in correct case, the test set is lowercased.
This is an open-data task, where you submit only the uppercased test set together with the training script (which will not be executed; it will only be used to understand the approach you took, and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.
The task is also a competition. Everyone who submits a solution which achieves at least 97.0% accuracy will get 4 basic points; the 5 bonus points will be distributed depending on the relative ordering of your solutions. The accuracy is computed per-character and can be evaluated by the uppercase_eval.py script.
You may want to start with the uppercase.py template, which uses the uppercase_data.py to load the data, generate an alphabet of given size containing most frequent characters, and generate sliding window view on the data. The template also comments on possibilities of character representation.
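To make the two preprocessing steps concrete, the following plain-Python sketch builds an alphabet of the most frequent characters and produces a sliding window around every character. It does not reproduce uppercase_data.py: the function names are hypothetical, and reserving id 0 for padding and id 1 for unknown characters is an assumption made for the example.

```python
from collections import Counter

def build_alphabet(text, size):
    """Map the `size - 2` most frequent characters to ids starting at 2.

    Ids 0 (padding) and 1 (unknown character) are reserved by assumption.
    """
    most_common = [ch for ch, _ in Counter(text).most_common(size - 2)]
    return {ch: index + 2 for index, ch in enumerate(most_common)}

def sliding_windows(text, alphabet, window):
    """For every character, return the ids of it and `window` neighbours per side."""
    padding = [0] * window
    ids = padding + [alphabet.get(ch, 1) for ch in text] + padding
    return [ids[i:i + 2 * window + 1] for i in range(len(text))]

cased = "Ahoj Praho"
lowered = "ahoj praho"
alphabet = build_alphabet(lowered, size=6)
inputs = sliding_windows(lowered, alphabet, window=2)
# One binary target per character: was it uppercase in the original text?
targets = [int(ch.isupper()) for ch in cased]
print(inputs[0], targets[0])
```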
Do not use RNNs, CNNs or Transformer in this task (if you have doubts, contact me).
mnist_cnn
Deadline: Apr 05, 23:59 5 points
To pass this assignment, you will learn to construct basic convolutional neural network layers. Start with the mnist_cnn.py template and assume the requested architecture is described by the cnn argument, which contains comma-separated specifications of the following layers:
 C-filters-kernel_size-stride-padding: Add a convolutional layer with ReLU activation and specified number of filters, kernel size, stride and padding. Example: C-10-3-1-same
 CB-filters-kernel_size-stride-padding: Same as C-filters-kernel_size-stride-padding, but use batch normalization. In detail, start with a convolutional layer without bias and activation, then add batch normalization layer, and finally ReLU activation. Example: CB-10-3-1-same
 M-kernel_size-stride: Add max pooling with specified size and stride. Example: M-3-2
 R-[layers]: Add a residual connection. The layers contain a specification of at least one convolutional layer (but not a recursive residual connection R). The input to the specified layers is then added to their output (after the ReLU nonlinearity of the last one). Example: R-[C-16-3-1-same,C-16-3-1-same]
 F: Flatten inputs. Must appear exactly once in the architecture.
 H-hidden_layer_size: Add a dense layer with ReLU activation and specified size. Example: H-100
 D-dropout_rate: Apply dropout with the given dropout rate. Example: D-0.5
An example architecture might be --cnn=CB-16-5-2-same,M-3-2,F,H-100,D-0.5.
You can assume the resulting network is valid; it is fine to crash if it is not.
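The layer specifications above can be turned into a structured description before any Keras layers are built. The following is only a minimal sketch, not the template's code: it assumes that fields are separated by hyphens (as in C-10-3-1-same) and that a residual block is written as R-[...] or R[...]; the helper names split_top_level, parse_layer and parse_cnn are hypothetical.

```python
def split_top_level(spec):
    """Split a specification on commas that are not inside square brackets."""
    parts, depth, current = [], 0, ""
    for ch in spec:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    if current:
        parts.append(current)
    return parts


def parse_layer(token):
    """Turn one layer token into a tuple (kind, parameters...)."""
    if token.startswith("R") and token.endswith("]"):
        # Residual block: parse the layers between the brackets recursively.
        inner = token[token.index("[") + 1:-1]
        return ("R", [parse_layer(t) for t in split_top_level(inner)])
    fields = token.split("-")
    kind, args = fields[0], fields[1:]
    if kind in ("C", "CB"):
        filters, kernel_size, stride, padding = args
        return (kind, int(filters), int(kernel_size), int(stride), padding)
    if kind == "M":
        return (kind, int(args[0]), int(args[1]))
    if kind == "F":
        return (kind,)
    if kind == "H":
        return (kind, int(args[0]))
    if kind == "D":
        return (kind, float(args[0]))
    raise ValueError("Unknown layer specification: " + token)


def parse_cnn(spec):
    """Parse the whole value of the cnn argument."""
    return [parse_layer(token) for token in split_top_level(spec)]


print(parse_cnn("CB-16-5-2-same,M-3-2,F,H-100,D-0.5"))
```

Each resulting tuple can then be mapped to the corresponding layer; for a residual block, the input tensor is kept aside and added to the output of the inner layers.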
After a successful ReCodEx submission, you can try obtaining the best accuracy on MNIST and then advance to cifar_competition.
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 mnist_cnn.py --seed=7 --recodex --threads=1 --epochs=1 --batch_size=50 --cnn=F,H-100
94.84
python3 mnist_cnn.py --seed=7 --recodex --threads=1 --epochs=1 --batch_size=50 --cnn=F,H-100,D-0.5
94.17
python3 mnist_cnn.py --seed=7 --recodex --threads=1 --epochs=1 --batch_size=50 --cnn=M-5-2,F,H-50
87.18
python3 mnist_cnn.py --seed=7 --recodex --threads=1 --epochs=1 --batch_size=50 --cnn=C-8-3-5-same,C-8-3-2-valid,F,H-50
86.18
python3 mnist_cnn.py --seed=7 --recodex --threads=1 --epochs=1 --batch_size=50 --cnn=CB-6-3-5-valid,F,H-32
90.23
python3 mnist_cnn.py --seed=7 --recodex --threads=1 --epochs=1 --batch_size=50 --cnn=C-8-3-5-valid,R-[C-8-3-1-same,C-8-3-1-same],F,H-50
91.15
image_augmentation
Deadline: Apr 05, 23:59 1 point
The template image_augmentation.py creates a simple convolutional network for classifying CIFAR-10. Your goal is to perform image data augmentation operations using ImageDataGenerator and to utilize the augmented data during training.
tf_dataset
Deadline: Apr 05, 23:59 2 points
In this assignment you will familiarize yourselves with tf.data, which is TensorFlow's high-level API for constructing input pipelines. If you want, you can read the official TensorFlow tf.data guide or the reference API manual.
The goal of this assignment is to implement image augmentation preprocessing similar to image_augmentation, but with tf.data. Start with the tf_dataset.py template and implement the input pipelines employing the tf.data.Dataset.
cifar_competition
Deadline: Apr 05, 23:59 5 points+5 bonus
The goal of this assignment is to devise the best possible model for CIFAR-10. You can load the data using the cifar10.py module. Note that the test set differs from that of the official CIFAR-10.
This is an open-data task, where you submit only the test set labels together with the training script (which will not be executed, it will be only used to understand the approach you took, and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.
The task is also a competition. Everyone who submits a solution which achieves at least 60% test set accuracy will get 5 points; the remaining 5 points will be distributed depending on relative ordering of your solutions. Note that my solutions usually need to achieve at least ~73% on the development set to score 60% on the test set.
You may want to start with the cifar_competition.py template which generates the test set annotation in the required format.
mnist_web
You can try a JavaScript-based demo of MNIST classification. This demo uses a neural network trained in TensorFlow using the mnist_web.py module, whose output was converted for TensorFlow.js with the tensorflowjs_converter --input_format=keras command and is then utilized by mnist_web.html.
cags_classification
Deadline: Apr 12, 23:59 6 points+5 bonus
The goal of this assignment is to use a pretrained EfficientNet-B0 model to achieve the best accuracy in CAGS classification.
The CAGS dataset consists of images of cats and dogs of size $224×224$, each classified in one of the 34 breeds and each containing a mask indicating the presence of the animal. To load the dataset, use the cags_dataset.py module. The dataset is stored in a TFRecord file and each element is encoded as a tf.train.Example. Therefore the dataset is loaded using the tf.data API and each entry can be decoded using a .map(CAGS.parse) call.
To load the EfficientNet-B0, use the provided efficient_net.py module. Its method pretrained_efficientnet_b0(include_top):
 downloads the pretrained weights if they are not found;
 returns a tf.keras.Model processing an image of shape $(224, 224, 3)$ with float values in range $[0, 1]$ and producing a list of results:
  the first value is the final network output:
   if include_top == True, the network will include the final classification layer and produce a distribution on 1000 classes (whose names are in imagenet_classes.py);
   if include_top == False, the network will return image features (the result of the last global average pooling);
  the rest of the outputs are the intermediate results of the network just before a convolution with $\textit{stride} > 1$ is performed (denoted $C_5, C_4, C_3, C_2, C_1$ in the Object Detection lecture).
An example performing classification of given images is available in image_classification.py.
A note on finetuning: each tf.keras.layers.Layer has a mutable trainable property indicating whether its variables should be updated – however, after changing it, you need to call .compile again (or otherwise make sure the list of trainable variables for the optimizer is updated). Furthermore, the training argument passed to the invocation call decides whether the layer is executed in training regime (neurons get dropped in dropout, batch normalization computes estimates on the batch) or in inference regime. There is one exception though – if trainable == False on a batch normalization layer, it runs in the inference regime even when training == True.
This is an open-data task, where you submit only the test set labels together with the training script (which will not be executed, it will be only used to understand the approach you took, and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.
The task is also a competition. Everyone who submits a solution which achieves at least 90% test set accuracy will get 6 points; the remaining 5 points will be distributed depending on relative ordering of your solutions.
You may want to start with the cags_classification.py template which generates the test set annotation in the required format.
cags_segmentation
Deadline: Apr 19, 23:59 6 points+5 bonus
The goal of this assignment is to use a pretrained EfficientNet-B0 model to achieve the best image segmentation IoU score on the CAGS dataset.
The dataset and the EfficientNet-B0 are described in the cags_classification assignment.
This is an open-data task, where you submit only the test set masks together with the training script (which will not be executed, it will be only used to understand the approach you took, and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.
A mask is evaluated using the intersection over union (IoU) metric, which is the intersection of the gold and predicted mask divided by their union, and the whole test set score is the average of its masks' IoU. A TensorFlow compatible metric is implemented by the class CAGSMaskIoU of the cags_segmentation_eval.py module, which can further be used to evaluate a file with predicted masks.
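To make the metric concrete, here is a minimal pure-Python sketch of the computation described above. It is an illustration only and not the CAGSMaskIoU implementation; the function names are hypothetical, masks are assumed to be flat sequences of 0/1 values, and treating two empty masks as a perfect match is an assumption.

```python
def mask_iou(gold, predicted):
    """IoU of two binary masks given as equally long flat sequences of 0/1."""
    intersection = sum(1 for g, p in zip(gold, predicted) if g and p)
    union = sum(1 for g, p in zip(gold, predicted) if g or p)
    # Assumption: when both masks are empty, count them as a perfect match.
    return intersection / union if union else 1.0


def dataset_iou(gold_masks, predicted_masks):
    """Average the per-mask IoU over the whole set."""
    scores = [mask_iou(g, p) for g, p in zip(gold_masks, predicted_masks)]
    return sum(scores) / len(scores)


print(dataset_iou([[0, 1, 1, 0], [1, 1, 0, 0]], [[0, 1, 0, 0], [1, 1, 0, 0]]))
```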
The task is also a competition. Everyone who submits a solution which achieves at least 85% test set IoU will get 6 points; the remaining 5 points will be distributed depending on relative ordering of your solutions.
You may want to start with the cags_segmentation.py template, which generates the test set annotation in the required format – each mask should be encoded on a single line as a space-separated sequence of integers indicating the lengths of alternating runs of zeros and ones.
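The run-length format can be sketched as follows. This is a hedged illustration rather than the template's code: it assumes the mask is flattened into a sequence of 0/1 integers and that the first number always counts zeros (so a mask starting with a one begins with a run of length 0); the function names are hypothetical.

```python
def encode_mask(mask):
    """Encode a flat 0/1 mask as lengths of alternating runs, zeros first."""
    runs, expected, length = [], 0, 0
    for value in mask:
        if value == expected:
            length += 1
        else:
            # The current run ended; start a run of the other value.
            runs.append(length)
            expected, length = value, 1
    runs.append(length)
    return " ".join(str(run) for run in runs)


def decode_mask(line):
    """Invert encode_mask: expand run lengths back to a flat 0/1 list."""
    mask, value = [], 0
    for run in line.split():
        mask.extend([value] * int(run))
        value = 1 - value
    return mask


print(encode_mask([0, 0, 1, 1, 1, 0]))
```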
cnn_manual
Deadline: Apr 19, 23:59 3 points
To pass this assignment, you need to manually implement the forward and backward pass through a 2D convolutional layer. Start with the cnn_manual.py template, which constructs a series of 2D convolutional layers with ReLU activation and valid padding, specified in the args.cnn option. The args.cnn contains comma-separated layer specifications in the format filters-kernel_size-stride.
Of course, you cannot use any TensorFlow convolutional operation (instead, implement the forward and backward pass using matrix multiplication and other operations) nor the GradientTape for gradient computation.
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 cnn_manual.py --seed=7 --recodex --threads=1 --learning_rate=0.01 --epochs=1 --batch_size=50 --cnn=5-1-1
89.62
python3 cnn_manual.py --seed=7 --recodex --threads=1 --learning_rate=0.01 --epochs=1 --batch_size=50 --cnn=5-3-1
92.83
python3 cnn_manual.py --seed=7 --recodex --threads=1 --learning_rate=0.01 --epochs=1 --batch_size=50 --cnn=5-3-2
90.62
python3 cnn_manual.py --seed=7 --recodex --threads=1 --learning_rate=0.01 --epochs=1 --batch_size=50 --cnn=5-3-2,10-3-2
92.58
bboxes_utils
Deadline: Apr 19, 23:59 2 points
This is a preparatory assignment for svhn_competition. The goal is to implement several bounding box manipulation routines in the bboxes_utils.py module. Notably, you need to implement the following methods:
 bbox_to_fast_rcnn: convert a bounding box to a Fast R-CNN-like representation relative to a given anchor;
 bbox_from_fast_rcnn: convert a Fast R-CNN-like representation relative to an anchor back to a bounding box;
 bboxes_training: given a list of anchors and gold objects, assign gold objects to anchors and generate suitable training data (the exact algorithm is described in the template).
The bboxes_utils.py module contains simple unit tests, which are evaluated when the module is executed and which you can use to check the validity of your implementation.
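For orientation, the usual Fast R-CNN parametrization expresses the box center as an offset relative to the anchor size and the box size as a logarithm of the size ratio. The sketch below is an assumption-laden illustration, not the reference implementation: it assumes boxes are given as (top, left, bottom, right) and that the output order is (y offset, x offset, log height ratio, log width ratio); the template defines the exact conventions.

```python
import math

TOP, LEFT, BOTTOM, RIGHT = range(4)


def bbox_to_fast_rcnn(anchor, bbox):
    """Represent bbox relative to anchor as (dy, dx, log dh, log dw)."""
    anchor_h = anchor[BOTTOM] - anchor[TOP]
    anchor_w = anchor[RIGHT] - anchor[LEFT]
    bbox_h = bbox[BOTTOM] - bbox[TOP]
    bbox_w = bbox[RIGHT] - bbox[LEFT]
    anchor_cy = (anchor[TOP] + anchor[BOTTOM]) / 2
    anchor_cx = (anchor[LEFT] + anchor[RIGHT]) / 2
    bbox_cy = (bbox[TOP] + bbox[BOTTOM]) / 2
    bbox_cx = (bbox[LEFT] + bbox[RIGHT]) / 2
    return [
        (bbox_cy - anchor_cy) / anchor_h,
        (bbox_cx - anchor_cx) / anchor_w,
        math.log(bbox_h / anchor_h),
        math.log(bbox_w / anchor_w),
    ]


def bbox_from_fast_rcnn(anchor, fast_rcnn):
    """Invert bbox_to_fast_rcnn, returning (top, left, bottom, right)."""
    anchor_h = anchor[BOTTOM] - anchor[TOP]
    anchor_w = anchor[RIGHT] - anchor[LEFT]
    cy = fast_rcnn[0] * anchor_h + (anchor[TOP] + anchor[BOTTOM]) / 2
    cx = fast_rcnn[1] * anchor_w + (anchor[LEFT] + anchor[RIGHT]) / 2
    h = math.exp(fast_rcnn[2]) * anchor_h
    w = math.exp(fast_rcnn[3]) * anchor_w
    return [cy - h / 2, cx - w / 2, cy + h / 2, cx + w / 2]


print(bbox_to_fast_rcnn([0, 0, 10, 10], [5, 0, 15, 10]))
```

A round trip through both functions should return the original box up to floating-point error, which makes a convenient sanity check.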
When submitting to ReCodEx, you must submit exactly one Python source with methods bbox_to_fast_rcnn, bbox_from_fast_rcnn and bboxes_training. These methods are then executed and compared to the reference implementation.
svhn_competition
Deadline: Apr 26, 23:59 May 03, 23:59
5 points+5 bonus
The goal of this assignment is to implement a system performing object recognition, optionally utilizing a pretrained EfficientNet-B0 backbone.
The Street View House Numbers (SVHN) dataset annotates for every photo all digits appearing on it, including their bounding boxes. The dataset can be loaded using the svhn_dataset.py module. Similarly to the CAGS dataset, it is stored in a TFRecord file with tf.train.Example elements, which can be decoded using a .map(SVHN.parse) call. Every element is a dictionary with the following keys:
 "image": a square 3-channel image,
 "classes": a 1D tensor with all digit labels appearing in the image,
 "bboxes": a [num_digits, 4] 2D tensor with bounding boxes of every digit in the image.
Given that the dataset elements are each of possibly different size and you want to preprocess them using a NumPy function bboxes_training, it might be more comfortable to convert the dataset to NumPy. Alternatively, you can call bboxes_training directly in tf.data.Dataset.map by using tf.numpy_function, see FAQ.
Similarly to cags_classification, you can load the EfficientNet-B0 using the provided efficient_net.py module. Its method pretrained_efficientnet_b0(include_top, dynamic_shape=False) has gotten a new argument dynamic_shape, and with dynamic_shape=True it constructs a model capable of processing an input image of any size.
This is an open-data task, where you submit only the test set annotations together with the training script (which will not be executed, it will be only used to understand the approach you took, and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.
Each test set image annotation consists of a sequence of space-separated five-tuples label top left bottom right, and the annotation is considered correct if exactly the gold digits are predicted, each with IoU at least 0.5. The whole test set score is then the prediction accuracy of individual images. An evaluation of a file with the predictions can be performed by the svhn_eval.py module.
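The correctness criterion can be illustrated by the following sketch. It is not the svhn_eval.py code: the function names are hypothetical, boxes are assumed to be (top, left, bottom, right), and the greedy one-to-one matching of predictions to gold digits is an assumption made for brevity.

```python
def bbox_iou(a, b):
    """IoU of two boxes given as (top, left, bottom, right)."""
    height = min(a[2], b[2]) - max(a[0], b[0])
    width = min(a[3], b[3]) - max(a[1], b[1])
    if height <= 0 or width <= 0:
        return 0.0
    intersection = height * width
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return intersection / (area_a + area_b - intersection)


def image_correct(gold, predicted, threshold=0.5):
    """Check one image; both arguments are lists of
    (label, top, left, bottom, right) tuples."""
    if len(gold) != len(predicted):
        return False
    unmatched = list(predicted)
    for label, *box in gold:
        # Greedily take the first unused prediction with the same label
        # and a sufficient overlap (an assumption of this sketch).
        match = next((p for p in unmatched
                      if p[0] == label and bbox_iou(box, p[1:]) >= threshold),
                     None)
        if match is None:
            return False
        unmatched.remove(match)
    return True


print(image_correct([(1, 0, 0, 10, 10)], [(1, 0, 1, 10, 10)]))
```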
The task is also a competition. Everyone submitting a solution with at least 20% test set accuracy will get 5 points; the remaining 5 points will be distributed depending on relative ordering of your solutions. Note that I usually need at least 35% development set accuracy to achieve the required test set performance.
You should start with the svhn_competition.py template, which generates the test set annotation in the required format.
A baseline solution can use a RetinaNet-like single-stage detector, using only a single level of convolutional features (no FPN) with single-scale and single-aspect anchors. Focal loss is available as tfa.losses.SigmoidFocalCrossEntropy (using the reduction=tf.losses.Reduction.SUM_OVER_BATCH_SIZE option is a good idea) and non-maximum suppression as tf.image.non_max_suppression or tf.image.combined_non_max_suppression.
3d_recognition
Deadline: Apr 26, 23:59 5 points+5 bonus
Your goal in this assignment is to perform 3D object recognition. The input is voxelized representation of an object, stored as a 3D grid of either empty or occupied voxels, and your goal is to classify the object into one of 10 classes. The data is available in two resolutions, either as 20×20×20 data or 32×32×32 data. To load the dataset, use the modelnet.py module.
The official dataset offers only train and test sets, with the test set having a different distribution of labels. Our dataset also contains a development set, which has nearly the same label distribution as the test set.
The assignment is again an open-data task, where you submit only the test set labels together with the training script (which will not be executed, it will be only used to understand the approach you took, and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.
The task is also a competition. Everyone submitting a solution with at least 85% test set accuracy will get 5 points; the remaining 5 points will be distributed depending on relative ordering of your solutions.
You can start with the 3d_recognition.py template, which among others generates test set annotations in the required format.
sequence_classification
Deadline: Apr 26, 23:59 6 points
The goal of this assignment is to introduce recurrent neural networks, manual TensorBoard log collection, and manual gradient clipping. Using recurrent neural networks, the assignment shows convergence speed and illustrates the exploding gradient issue. The network should process sequences of 50 small integers and compute parity for each prefix of the sequence. The inputs are either 0/1, or vectors with a one-hot representation of a small integer.
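The target the network has to predict can be written down directly. The following small sketch (the function name prefix_parity is hypothetical, and the template generates its data in its own way) computes, for every prefix, the parity of the sum of the integers seen so far.

```python
def prefix_parity(sequence):
    """Return the parity (0 or 1) of the sum of every prefix of the sequence."""
    parities, running = [], 0
    for value in sequence:
        running = (running + value) % 2
        parities.append(running)
    return parities


print(prefix_parity([1, 0, 1, 1]))
```

Because the parity at each step depends on all previous inputs, the task requires the RNN to carry a single bit of state across the whole sequence.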
Your goal is to modify the sequence_classification.py template and implement the following:
 Use specified RNN type (SimpleRNN, GRU and LSTM) and dimensionality.
 Process the sequence using the required RNN.
 Use additional hidden layer on the RNN outputs if requested.
 Implement gradient clipping if requested.
In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard. Concentrate on how the RNNs converge, convergence speed, exploding gradient issues and how gradient clipping helps:
 --rnn_cell=SimpleRNN --sequence_dim=1, --rnn_cell=GRU --sequence_dim=1, --rnn_cell=LSTM --sequence_dim=1
 the same as above but with --sequence_dim=2
 the same as above but with --sequence_dim=10
 --rnn_cell=LSTM --hidden_layer=70 --rnn_cell_dim=30 --sequence_dim=30 and the same with --clip_gradient=1
 the same as above but with --rnn_cell=SimpleRNN
 the same as above but with --rnn_cell=GRU --hidden_layer=90
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 sequence_classification.py --recodex --seed=7 --batch_size=16 --epochs=1 --threads=1 --train_sequences=1000 --test_sequences=100 --sequence_length=20 --sequence_dim=1 --rnn_cell=SimpleRNN --rnn_cell_dim=16 --hidden_layer=0 --clip_gradient=0
52.85
python3 sequence_classification.py --recodex --seed=7 --batch_size=16 --epochs=1 --threads=1 --train_sequences=1000 --test_sequences=100 --sequence_length=20 --sequence_dim=1 --rnn_cell=LSTM --rnn_cell_dim=10 --hidden_layer=0 --clip_gradient=0
54.80
python3 sequence_classification.py --recodex --seed=7 --batch_size=16 --epochs=1 --threads=1 --train_sequences=1000 --test_sequences=100 --sequence_length=20 --sequence_dim=1 --rnn_cell=GRU --rnn_cell_dim=12 --hidden_layer=0 --clip_gradient=0
47.95
python3 sequence_classification.py --recodex --seed=7 --batch_size=16 --epochs=1 --threads=1 --train_sequences=1000 --test_sequences=100 --sequence_length=20 --sequence_dim=1 --rnn_cell=LSTM --rnn_cell_dim=16 --hidden_layer=50 --clip_gradient=0
54.10
python3 sequence_classification.py --recodex --seed=7 --batch_size=16 --epochs=1 --threads=1 --train_sequences=1000 --test_sequences=100 --sequence_length=20 --sequence_dim=1 --rnn_cell=LSTM --rnn_cell_dim=16 --hidden_layer=50 --clip_gradient=0.01
53.85
tagger_we
Deadline: Apr 26, 23:59 3 points
In this assignment you will create a simple part-of-speech tagger. For training and evaluation, we will use a Czech dataset containing tokenized sentences, each word annotated by gold lemma and part-of-speech tag. The morpho_dataset.py module (down)loads the dataset and can generate batches.
Your goal is to modify the tagger_we.py template and implement the following:
 Use specified RNN cell type (GRU and LSTM) and dimensionality.
 Create word embeddings for training vocabulary.
 Process the sentences using bidirectional RNN.
 Predict part-of-speech tags. Note that you need to properly handle sentences of different lengths in one batch using masking.
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 tagger_we.py --recodex --seed=7 --batch_size=2 --epochs=1 --threads=1 --max_sentences=200 --rnn_cell=LSTM --rnn_cell_dim=16 --we_dim=64
29.34
python3 tagger_we.py --recodex --seed=7 --batch_size=2 --epochs=1 --threads=1 --max_sentences=200 --rnn_cell=GRU --rnn_cell_dim=20 --we_dim=64
46.29
mnist_multiple
Deadline: May 03, 23:59 2 points
In this assignment you will implement a model with multiple inputs. Start with the mnist_multiple.py template and:
 The goal is to create a model, which given two input MNIST images predicts, if the digit on the first one is larger than on the second one.
 The model has three outputs:
 label prediction for the first image,
 label prediction for the second image,
 direct prediction whether the first digit is larger than the second one.
 In addition to direct prediction, you can use the predicted labels for both images and compare them – an indirect prediction.
 You need to implement:
 the model, using multiple inputs, outputs, losses, and metrics;
 construction of two-image batches using regular MNIST batches,
 computation of direct and indirect prediction accuracy.
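The difference between the direct and the indirect prediction can be sketched without any neural network. In the illustration below, all names are hypothetical, the predicted labels are assumed to be already converted to digits, and the direct output is assumed to be a probability that is thresholded at 0.5.

```python
def direct_and_indirect_accuracy(gold_a, gold_b, pred_a, pred_b, pred_direct):
    """Return (direct accuracy, indirect accuracy) for a batch of image pairs.

    gold_a, gold_b: gold digits of the first and second image;
    pred_a, pred_b: predicted digits of the first and second image;
    pred_direct: predicted probability that the first digit is larger.
    """
    gold = [a > b for a, b in zip(gold_a, gold_b)]
    # Indirect prediction: compare the two predicted labels.
    indirect = [a > b for a, b in zip(pred_a, pred_b)]
    # Direct prediction: threshold the dedicated output (assumed probability).
    direct = [p > 0.5 for p in pred_direct]
    n = len(gold)
    return (sum(d == g for d, g in zip(direct, gold)) / n,
            sum(i == g for i, g in zip(indirect, gold)) / n)


print(direct_and_indirect_accuracy(
    [3, 1, 7, 2], [1, 4, 7, 0], [3, 1, 7, 8], [1, 4, 1, 0], [0.9, 0.2, 0.4, 0.8]))
```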
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 mnist_multiple.py --recodex --seed=7 --batch_size=50 --epochs=1 --threads=1
93.94 97.32
python3 mnist_multiple.py --recodex --seed=7 --batch_size=100 --epochs=1 --threads=1
91.86 96.28
tagger_cle_rnn
Deadline: May 03, 23:59 2 points
This assignment is a continuation of tagger_we. Using the tagger_cle_rnn.py template, implement character-level word embedding computation using a bidirectional character-level GRU.
Once submitted to ReCodEx, you should experiment with the effect of CLEs compared to a plain tagger_we, and the influence of their dimensionality. Note that tagger_we has by default word embeddings twice the size of word embeddings in tagger_cle_rnn.
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 tagger_cle_rnn.py --recodex --seed=7 --batch_size=3 --epochs=2 --threads=1 --max_sentences=90 --rnn_cell=LSTM --rnn_cell_dim=16 --we_dim=32 --cle_dim=16
25.85
python3 tagger_cle_rnn.py --recodex --seed=7 --batch_size=3 --epochs=2 --threads=1 --max_sentences=90 --rnn_cell=GRU --rnn_cell_dim=20 --we_dim=32 --cle_dim=16
33.90
tagger_cle_cnn
Deadline: May 03, 23:59 2 points
This task is a continuation of the tagger_cle_rnn assignment. Using the tagger_cle_cnn.py template, instead of using RNNs to generate character-level embeddings, process character sequences with 1D convolutional filters with varying kernel sizes and obtain fixed-size representations using global max-pooling. Compute the final embeddings by using a highway layer.
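For reference, a highway layer is commonly written as a gated mixture of a transformed input and the input itself. The notation below is an assumption for illustration (the template specifies the exact activations): $x$ is the concatenation of the max-pooled convolutional features, $H$ is a dense layer with a nonlinearity, and $T$ is a sigmoid gate of the same dimensionality.

$$y = T(x) \odot H(x) + \big(1 - T(x)\big) \odot x, \qquad T(x) = \sigma(W_T x + b_T), \qquad H(x) = \operatorname{ReLU}(W_H x + b_H)$$

Note that $x$, $H(x)$ and $T(x)$ must all have the same size for the element-wise products to be defined.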
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 tagger_cle_cnn.py --recodex --seed=7 --batch_size=3 --epochs=4 --threads=1 --max_sentences=90 --rnn_cell=LSTM --rnn_cell_dim=16 --we_dim=32 --cle_dim=16 --cnn_filters=16 --cnn_max_width=3
38.01
python3 tagger_cle_cnn.py --recodex --seed=7 --batch_size=3 --epochs=4 --threads=1 --max_sentences=90 --rnn_cell=GRU --rnn_cell_dim=20 --we_dim=32 --cle_dim=16 --cnn_filters=16 --cnn_max_width=3
53.85
tagger_competition
Deadline: May 03, 23:59 4 points+5 bonus
In this assignment, you should extend tagger_we/tagger_cle_rnn/tagger_cle_cnn into a real-world Czech part-of-speech tagger. We will use the Czech PDT dataset loadable using the morpho_dataset.py module. Note that the dataset contains more than 1500 unique POS tags and that the POS tags have a fixed structure of 15 positions (so it is possible to generate the POS tag characters independently).
You can use the following additional data in this assignment:
 You can use outputs of a morphological analyzer loadable with morpho_analyzer.py. If a word form in train, dev or test PDT data is known to the analyzer, all its (lemma, POS tag) pairs are returned.
 You can use any unannotated text data (Wikipedia, Czech National Corpus, …), and also any pretrained word embeddings (assuming they were trained on plain texts).
The assignment is again an open-data task, where you submit only the test set annotations together with the training script (which will not be executed, it will be only used to understand the approach you took, and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file. Note that all .zip files you submit will be extracted first.
The task is also a competition. Everyone submitting a solution with at least 92% label accuracy gets 4 points; the remaining 5 points will be distributed depending on relative ordering of your solutions. Lastly, 3 bonus points will be given to anyone surpassing the pre-neural-network state-of-the-art of 95.89% from Spoustová et al., 2009. You can evaluate a generated file using the morpho_evaluator.py module.
You can start with the tagger_competition.py template, which among others generates test set annotations in the required format.
speech_recognition
Deadline: May 03, 23:59 May 10, 23:59
6 points+5 bonus
This assignment is a competition task in the speech recognition area. Specifically, your goal is to predict a sequence of letters given a spoken utterance. We will be using the TIMIT corpus, with input sound waves passed through the usual preprocessing – computing Mel-frequency cepstral coefficients (MFCCs). You can repeat exactly this preprocessing on a given audio using the timit_mfcc_preprocess.py script.
Because the data is not publicly available, you can download it only through ReCodEx. Please do not distribute it. To load the dataset, use the timit_mfcc.py module.
This is an open-data task, where you submit only the test set annotations together with the training script (which will not be executed, it will be only used to understand the approach you took, and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.
The task is also a competition. The evaluation is performed by computing edit distance to the gold letter sequence, normalized by its length (i.e., exactly as tf.edit_distance). Everyone submitting a solution with at most 50% test set edit distance will get 6 points; the remaining 5 points will be distributed depending on relative ordering of your solutions. An evaluation (using for example development data) can be performed by speech_recognition_eval.py.
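The metric can be illustrated with a small pure-Python sketch of the Levenshtein distance divided by the length of the gold sequence. It is meant only to clarify the definition and is not the speech_recognition_eval.py code; the function names are hypothetical and a non-empty gold sequence is assumed.

```python
def edit_distance(predicted, gold):
    """Levenshtein distance: minimum number of insertions, deletions
    and substitutions turning predicted into gold."""
    previous = list(range(len(gold) + 1))
    for i, p in enumerate(predicted, start=1):
        current = [i]
        for j, g in enumerate(gold, start=1):
            current.append(min(
                previous[j] + 1,             # delete p
                current[j - 1] + 1,          # insert g
                previous[j - 1] + (p != g),  # substitute, or match for free
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(predicted, gold):
    """Edit distance normalized by the length of the gold sequence."""
    return edit_distance(predicted, gold) / len(gold)


print(normalized_edit_distance("abc", "abd"))
```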
You should start with the speech_recognition.py template.
 To perform speech recognition, you should use CTC loss for training and CTC beam search decoder for prediction. Both the CTC loss and CTC decoder employ sparse tensors – therefore, start by studying them.
 A basic architecture:
  converts target letters into sparse representation,
  uses a bidirectional RNN and an output linear layer without activation,
  computes CTC loss (tf.nn.ctc_loss),
  if required, performs decoding by a CTC decoder (tf.nn.ctc_beam_search_decoder) and possibly evaluates the results using normalized edit distance (tf.edit_distance).
tensorboard_projector
You can try exploring the TensorBoard Projector with pretrained embeddings for 20k most frequent lemmas in Czech and English – after extracting the archive, start tensorboard --logdir dir_where_the_archive_is_extracted.
In order to use the Projector tab yourself, you can take inspiration from the projector_export.py script, which was used to export the above pretrained embeddings.
lemmatizer_noattn
Deadline: May 10, 23:59 4 points
The goal of this assignment is to create a simple lemmatizer. For training and evaluation, we use the same dataset as in tagger_we, loadable by the morpho_dataset.py module.
Your goal is to modify the lemmatizer_noattn.py template and implement the following:
 Embed characters of source forms and run a bidirectional GRU encoder.
 Embed characters of target lemmas.
 Implement a training time decoder which uses gold target characters as inputs.
 Implement an inference time decoder which uses previous predictions as inputs.
 The initial state of both decoders is the output state of the corresponding GRU encoded form.
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 lemmatizer_noattn.py --recodex --seed=7 --batch_size=2 --epochs=3 --threads=1 --max_sentences=200 --rnn_dim=24 --cle_dim=64
20.47
lemmatizer_attn
Deadline: May 10, 23:59 3 points
This task is a continuation of the lemmatizer_noattn assignment. Using the lemmatizer_attn.py template, implement the following features in addition to lemmatizer_noattn:
 The bidirectional GRU encoder returns outputs for all input characters, not just the last.
 Implement attention in the decoders. Notably, project the encoder outputs and current state into same dimensionality vectors, apply nonlinearity, and generate weights for every encoder output. Finally sum the encoder outputs using these weights and concatenate the computed attention to the decoder inputs.
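The attention described in the last item can be summarized by the following formulas. The symbols are assumptions introduced for illustration: $h_i$ denotes the encoder output for the $i$-th input character, $s$ the current decoder state, $W_h$, $W_s$ and $v$ trainable parameters, and $\tanh$ stands in for the unspecified nonlinearity.

$$e_i = v^\top \tanh(W_h h_i + W_s s), \qquad \alpha = \operatorname{softmax}(e), \qquad a = \sum_i \alpha_i h_i$$

The attention vector $a$ is then concatenated to the decoder input in every decoding step.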
Once submitted to ReCodEx, you should experiment with the effect of using the attention, and the influence of RNN dimensionality on network performance.
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 lemmatizer_attn.py --recodex --seed=7 --batch_size=2 --epochs=3 --threads=1 --max_sentences=200 --rnn_dim=24 --cle_dim=64
22.14
lemmatizer_competition
Deadline: May 10, 23:59 May 17, 23:59
5 points+5 bonus
In this assignment, you should extend lemmatizer_noattn or lemmatizer_attn into a real-world Czech lemmatizer. As in tagger_competition, we will use the Czech PDT dataset loadable using the morpho_dataset.py module.
You can also use the same additional data as in the tagger_competition assignment.
The assignment is again an open-data task, where you submit only the test set annotations together with the training script (which will not be executed, it will be only used to understand the approach you took, and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file. Note that all .zip files you submit will be extracted first.
The task is also a competition. Everyone submitting a solution with at least 92% accuracy will get 5 points; the remaining 5 points will be distributed depending on relative ordering of your solutions. Lastly, 3 bonus points will be given to anyone surpassing the pre-neural-network state-of-the-art of 97.86%. You can evaluate a generated file using the morpho_evaluator.py module.
You can start with the lemmatizer_competition.py template, which among others generates test set annotations in the required format.
vae
Deadline: May 17, 23:59 3 points
In this assignment you will implement a simple Variational Autoencoder for three datasets in the MNIST format. Your goal is to modify the vae.py template and implement a VAE.
After submitting the assignment to ReCodEx, you can experiment with the three available datasets (mnist, mnistfashion, and mnistcifarcars) and different latent variable dimensionality (z_dim=2 and z_dim=100). The generated images are available in TensorBoard logs.
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 vae.py --recodex --seed=7 --batch_size=50 --dataset=mnistrecodex --decoder_layers=500,500 --encoder_layers=500,500 --epochs=2 --threads=1 --z_dim=2
2357.67
python3 vae.py --recodex --seed=7 --batch_size=50 --dataset=mnistrecodex --decoder_layers=500,500 --encoder_layers=500,500 --epochs=2 --threads=1 --z_dim=100
2174.10
gan
Deadline: May 17, 23:59 3 points
In this assignment you will implement a simple Generative Adversarial Network for three datasets in the MNIST format. Your goal is to modify the gan.py template and implement a GAN.
After submitting the assignment to ReCodEx, you can experiment with the three available datasets (mnist, mnistfashion, and mnistcifarcars) and maybe try different latent variable dimensionality. The generated images are available in TensorBoard logs.
You can also continue with the dcgan assignment.
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 gan.py --recodex --seed=7 --batch_size=50 --dataset=mnistrecodex --discriminator_layers=128 --generator_layers=128 --epochs=2 --threads=1 --z_dim=2
57.75
python3 gan.py --recodex --seed=7 --batch_size=50 --dataset=mnistrecodex --discriminator_layers=128 --generator_layers=128 --epochs=2 --threads=1 --z_dim=100
49.24
dcgan
Deadline: May 17, 23:59 1 point
This task is a continuation of the gan assignment, which you will modify to implement the Deep Convolutional GAN (DCGAN).
Start with the dcgan.py template and implement a DCGAN. Note that most of the TODO notes are from the gan assignment.
After submitting the assignment to ReCodEx, you can experiment with the three available datasets (mnist, mnistfashion, and mnistcifarcars). However, note that you will need a lot of computational power (preferably a GPU) to generate the images.
Note that the results might be slightly different, depending on your CPU type and whether you use GPU.
python3 dcgan.py --recodex --seed=7 --batch_size=50 --dataset=mnistrecodex --epochs=1 --threads=1 --z_dim=2
30.34
python3 dcgan.py --recodex --seed=7 --batch_size=50 --dataset=mnistrecodex --epochs=1 --threads=1 --z_dim=100
27.20
omr_competition
Deadline: May 24, 23:59 5 points+5 bonus
You should implement optical music recognition in your final competition assignment. The inputs are PNG images of monophonic scores starting with a clef, key signature, and a time signature, followed by several staves. The dataset is loadable using the omr_dataset.py module and is downloaded automatically if missing (note that it has 185MB). No other data or pretrained models are allowed for training.
The assignment is again an open-data task, where you submit only the annotated test set together with the training script (which will not be executed, it will be only used to understand the approach you took, and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file. Note that all .zip files you submit will be extracted first.
The task is also a competition. The evaluation is performed by computing edit distance to the gold letter sequence, normalized by its length. Everyone submitting a solution with at most 10% test set edit distance will get 5 points; the remaining 5 points will be distributed depending on relative ordering of your solutions. Furthermore, 3 bonus points will be given to anyone surpassing the current state-of-the-art of 0.80%. An evaluation (using for example development data) can be performed by omr_competition_eval.py.
You can start with the omr_competition.py template, which among others generates test set annotations in the required format.
monte_carlo
Deadline: May 24, 23:59 2 points
Solve the discretized CartPole-v1 environment from the OpenAI Gym using the Monte Carlo reinforcement learning algorithm.
Use the supplied cart_pole_evaluator.py module (depending on gym_evaluator.py) to interact with the discretized environment. The environment has the following methods and properties:
 states: number of states of the environment
 actions: number of actions of the environment
 episode: number of the current episode (zero-based)
 reset(start_evaluate=False) → new_state: starts a new episode
 step(action) → new_state, reward, done, info: perform the chosen action in the environment, returning the new state, obtained reward, a boolean flag indicating an end of episode, and additional environment-specific information
 render(): render current environment state
Once you finish training (which you indicate by passing start_evaluate=True to reset), your goal is to reach an average return of 475 during 100 evaluation episodes. Note that the environment prints your 100-episode average return every 10 episodes even during training.
You can start with the monte_carlo.py template, which parses several useful parameters, creates the environment and illustrates the overall usage.
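The overall training loop can be sketched as follows. This is a minimal illustration under stated assumptions and not the monte_carlo.py template: ChainEnvironment is a hypothetical toy stand-in that only mimics the states, actions, reset and step members described above, and the every-visit update, the random tie-breaking and all hyperparameter values are choices made for this sketch.

```python
import numpy as np


class ChainEnvironment:
    """Hypothetical toy environment with the interface described above.

    States 0..4 lie on a line; action 0 moves left, action 1 moves right.
    Reaching state 4 ends the episode with reward 1, all other rewards are 0.
    """
    states = 5
    actions = 2

    def reset(self, start_evaluate=False):
        self._state = 0
        return self._state

    def step(self, action):
        self._state = max(0, self._state - 1) if action == 0 else self._state + 1
        done = self._state == self.states - 1
        return self._state, float(done), done, {}


def train(env, episodes=500, epsilon=0.1, gamma=0.9, seed=42):
    rng = np.random.default_rng(seed)
    Q = np.zeros([env.states, env.actions])  # action-value estimates
    C = np.zeros([env.states, env.actions])  # visit counts

    for _ in range(episodes):
        # Generate one whole episode with an epsilon-greedy policy.
        state, done, trajectory = env.reset(), False, []
        while not done:
            if rng.uniform() < epsilon:
                action = int(rng.integers(env.actions))
            else:
                # Greedy action, with ties broken randomly.
                best = np.flatnonzero(Q[state] == Q[state].max())
                action = int(rng.choice(best))
            next_state, reward, done, _ = env.step(action)
            trajectory.append((state, action, reward))
            state = next_state

        # Every-visit Monte Carlo update, processing the episode backwards.
        G = 0.0
        for state, action, reward in reversed(trajectory):
            G = gamma * G + reward
            C[state, action] += 1
            Q[state, action] += (G - Q[state, action]) / C[state, action]
    return Q


Q = train(ChainEnvironment())
print(Q)
```

In the assignment itself, the same loop would presumably run against the environment created by cart_pole_evaluator.py, with the final evaluation started by reset(start_evaluate=True).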
During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 5 minutes.
Note that gym_evaluator.py and cart_pole_evaluator.py must not be submitted to ReCodEx.
reinforce
Deadline: May 24, 23:59 2 points
Solve the continuous CartPole-v1 environment from the OpenAI Gym using the REINFORCE algorithm.
The supplied cart_pole_evaluator.py module (depending on gym_evaluator.py) can create a continuous environment using environment(discrete=False). The continuous environment is very similar to the discrete environment, except that the states are vectors of real-valued observations with shape environment.state_shape.
Your goal is to reach an average return of 475 during 100 evaluation episodes. Start with the reinforce.py template.
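The quantity that REINFORCE minimizes can be sketched in NumPy as follows. This is only an illustration and not the reinforce.py template: the function names are assumptions, the policy is assumed to be a softmax over action logits, and a real solution would compute the same loss inside the deep learning framework so that gradients can flow to the network weights.

```python
import numpy as np


def discounted_returns(rewards, gamma=1.0):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step of one episode."""
    returns, G = [], 0.0
    for reward in reversed(rewards):
        G = reward + gamma * G
        returns.append(G)
    return np.array(returns[::-1])


def reinforce_loss(logits, actions, returns):
    """Mean over steps of -G_t * log pi(a_t | s_t) for a softmax policy."""
    logits = np.asarray(logits, dtype=np.float64)
    # Numerically stable log-softmax over the action dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Log-probability of the action actually taken in every step.
    chosen = log_probs[np.arange(len(actions)), actions]
    return float(np.mean(-np.asarray(returns) * chosen))


# A three-step episode with reward 1 per step and a uniform two-action policy.
returns = discounted_returns([1, 1, 1])
print(returns, reinforce_loss(np.zeros([3, 2]), [0, 1, 0], returns))
```

One common way to express the same loss in Keras is a sparse categorical cross-entropy with the returns passed as sample weights.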
During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 5 minutes.
Note that gym_evaluator.py and cart_pole_evaluator.py must not be submitted to ReCodEx.
reinforce_baseline
Deadline: May 24, 23:59 2 points
This is a continuation of the reinforce assignment.
Using the reinforce_baseline.py template, solve the CartPole-v1 environment using the REINFORCE with baseline algorithm.
Using a baseline lowers the variance of the value function gradient estimator, which allows faster training and decreases sensitivity to hyperparameter values. To reflect this effect in ReCodEx, note that the evaluation phase will automatically start after 200 episodes. Using only 200 episodes for training in this setting is probably too little for the REINFORCE algorithm, but suffices for the variant with a baseline.
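The variance reduction can be checked exactly on a tiny example. The sketch below is an illustration under simplifying assumptions, not part of the template: it considers a single decision with a softmax policy over a few actions, a deterministic return for each action, and only the gradient with respect to the first logit; the function name is hypothetical.

```python
import numpy as np


def estimator_stats(probs, returns, baseline=0.0):
    """Exact mean and variance of (G - b) * d log pi(a) / d theta_0 over actions a."""
    probs = np.asarray(probs, dtype=np.float64)
    returns = np.asarray(returns, dtype=np.float64)
    # For a softmax policy, d log pi(a) / d theta_0 = [a == 0] - pi(0).
    grad_log = (np.arange(len(probs)) == 0) - probs[0]
    samples = (returns - baseline) * grad_log
    mean = np.sum(probs * samples)
    variance = np.sum(probs * (samples - mean) ** 2)
    return float(mean), float(variance)


# Two equally likely actions with returns 10 and 12.
print(estimator_stats([0.5, 0.5], [10, 12]))               # without a baseline
print(estimator_stats([0.5, 0.5], [10, 12], baseline=11))  # baseline = expected return
```

Both calls give the same expected gradient, because the baseline term has zero expectation, while the variance drops from 30.25 to 0 in this toy case.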
Your goal is to reach an average return of 475 during 100 evaluation episodes.
During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 5 minutes.
Note that gym_evaluator.py and cart_pole_evaluator.py must not be submitted to ReCodEx.
reinforce_pixels
Deadline: May 24, 23:59 2 points
This is a continuation of the reinforce or reinforce_baseline assignments.
The supplied cart_pole_pixels_evaluator.py module (depending on gym_evaluator.py) generates a pixel representation of the CartPole environment as an $80 \times 80$ image with three channels, with each channel representing one time step (i.e., the current observation and the two previous ones).
To pass the assignment, you need to reach an average return of 250 during 100 evaluation episodes. During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 10 minutes.
You can start with the reinforce_pixels.py template using the correct environment.
Note that gym_evaluator.py and cart_pole_pixels_evaluator.py must not be submitted to ReCodEx.
sentiment_analysis
Deadline: Jun 7, 23:59 5 points
In this assignment you should try fine-tuning the mBERT model to perform sentiment analysis. We will use a Czech dataset of Facebook comments, which can be loaded by the text_classification_dataset.py module.
Use the BERT implementation from the 🤗 Transformers library, which you can install by pip3 install [--user] transformers. Start by looking at the bert_example.py example demonstrating loading, tokenizing and calling a BERT model, and you can also read the documentation, specifically for the tokenizer and for TFBertModel.call.
The assignment is an open-data task, where you submit only the test set annotations together with the training script (which will not be executed, it will be only used to understand the approach you took, and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file. You pass if your test set accuracy is at least 75%.
You can start with the sentiment_analysis.py template, which among other things generates test set annotations in the required format.
In the competitions, your goal is to train a model and then predict target values on the given unannotated test set.
Submitting to ReCodEx
When submitting a competition solution to ReCodEx, you can include any number of files of any kind, and either submit them individually or compress them in a .zip file. However, there should be exactly one text file with the test set annotation (.txt) and at least one Python source (.py) containing the model training and prediction. The Python sources are not executed, but must be included for inspection.
Evaluation in ReCodEx

For every submission, ReCodEx checks the above conditions (exactly one .txt, at least one .py), whether the given annotations can be evaluated without error, and whether the annotations surpass the required performance baseline. If all these checks pass, the assignment is marked as solved in ReCodEx and gets the regular points for the assignment.
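Before submitting, the file-count condition can be checked locally with a few lines of Python. This is a hypothetical helper written for illustration; it is not how ReCodEx itself performs the check, and it does not look inside .zip archives, which ReCodEx extracts first.

```python
from pathlib import PurePath


def check_submission(filenames):
    """Return True if there is exactly one .txt file and at least one .py file."""
    suffixes = [PurePath(name).suffix for name in filenames]
    return suffixes.count(".txt") == 1 and suffixes.count(".py") >= 1


# Hypothetical file names.
print(check_submission(["omr_competition.py", "omr_competition_test.txt"]))
print(check_submission(["omr_competition.py"]))
```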
Just after the deadline, the newest submission of every user passing ReCodEx evaluation participates in a course competition. Additional bonus points are then awarded according to the ordering of the participating submissions.

After the deadline, the exact performance becomes visible for all submissions.
What Is Allowed
 You can use only the given annotated data, either for training or evaluation.
 You can use any unannotated or manually created data for training or evaluation.
 The test set annotations must be the result of your system (so you cannot manually correct them; but your system can contain other parts than just neural networks, like handwritten rules).
 Do not use test set annotations in any way.
 Unless stated otherwise, you can use any algorithm to solve the competition task at hand. The implementation should be either created by you or you should understand it fully.
tf.data

How to inspect the contents of a tf.data.Dataset?
The tf.data.Dataset is not just an array, but a description of a pipeline, which can produce data if requested. A simple way to run the pipeline is to iterate it using Python iterators:
dataset = tf.data.Dataset.range(10)
for entry in dataset:
    print(entry)

How to use tf.data.Dataset with model.fit or model.evaluate?
To use a tf.data.Dataset in Keras, the dataset elements should be pairs (input_data, gold_labels), where input_data and gold_labels must be already batched. For example, given the CAGS dataset, you can preprocess training data for cags_classification as follows (for development data, you would remove the .shuffle):
train = cags.train.map(CAGS.parse)
train = train.map(lambda example: (example["image"], example["label"]))
train = train.shuffle(10000, seed=args.seed)
train = train.batch(args.batch_size)

Is every iteration through a tf.data.Dataset the same?
No. Because the dataset is only a pipeline generating data, it is called each time the dataset is iterated; therefore, every .shuffle is called in every iteration.
How to generate different random numbers each epoch during tf.data.Dataset.map?
When a global random seed is set, methods like tf.random.uniform generate the same sequence of numbers on each iteration.
The easiest method I found is to create a Generator object and use it to produce random numbers.
generator = tf.random.experimental.Generator.from_seed(42)
data = tf.data.Dataset.from_tensor_slices(tf.zeros(10, tf.int32))
data = data.map(lambda x: x + generator.uniform([], maxval=10, dtype=tf.int32))
for _ in range(3):
    print(*[element.numpy() for element in data])

How to call numpy methods or other non-TF functions in tf.data.Dataset.map?
You can use tf.numpy_function to call a numpy function even in a computational graph. However, the results have no static shape information and you need to set it manually, ideally using tf.ensure_shape, which both sets the static shape and verifies during execution that the real shape matches it.
For example, to use the bboxes_training method from bboxes_utils, you could do something like:
anchors = np.array(...)
def prepare_data(example):
    anchor_classes, anchor_bboxes = tf.numpy_function(
        bboxes_utils.bboxes_training, [anchors, example["classes"], example["bboxes"], 0.5], (tf.int32, tf.float32))
    anchor_classes = tf.ensure_shape(anchor_classes, [len(anchors)])
    anchor_bboxes = tf.ensure_shape(anchor_bboxes, [len(anchors), 4])
    ...

How to use ImageDataGenerator in tf.data.Dataset.map?
The ImageDataGenerator offers a .random_transform method, so we can use tf.numpy_function from the previous answer:
train_generator = tf.keras.preprocessing.image.ImageDataGenerator(...)
def augment(image, label):
    return tf.ensure_shape(
        tf.numpy_function(train_generator.random_transform, [image], tf.float32),
        image.shape
    ), label
# map returns a new dataset, so keep the result
dataset = dataset.map(augment)
Fine-tuning

How to make a part of the network frozen, so that its weights are not updated?
Each tf.keras.layers.Layer/tf.keras.Model has a mutable trainable property indicating whether its variables should be updated. However, after changing it, you need to call .compile again (or otherwise make sure the list of trainable variables for the optimizer is updated).
Note that once trainable == False, the insides of a layer are no longer considered, even if some of its sublayers have trainable == True. Therefore, if you want to freeze only some sublayers of a layer you use in your model, the layer itself must have trainable == True.
How to choose whether dropout/batch normalization is executed in training or inference regime?
When calling a tf.keras.layers.Layer/tf.keras.Model, a named option training can be specified, indicating whether training or inference regime should be used. For a model, this option is automatically passed to its layers which require it, and Keras automatically passes it during model.{fit,evaluate,predict}.
However, you can manually pass for example training=False to a layer when using Functional API, meaning that layer is executed in the inference regime even when the whole model is training.
How do trainable and training interact?
The only layer which is influenced by both these options is batch normalization, for which:
 if trainable == False, the layer is always executed in inference regime;
 if trainable == True, the training/inference regime is chosen according to the training option.
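The two rules for batch normalization amount to a small truth table, written out here as plain Python. The function name is hypothetical and the snippet only restates the rules above; it does not call Keras.

```python
def batch_norm_regime(trainable, training):
    """Regime in which batch normalization runs, following the two rules above."""
    # A frozen layer (trainable == False) always runs in the inference regime.
    return "training" if trainable and training else "inference"


for trainable in (False, True):
    for training in (False, True):
        print(trainable, training, batch_norm_regime(trainable, training))
```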
Masking

How can sequences of different length be processed by an RNN?
Keras employs masking to indicate which sequence elements are valid and which are just padding.
Usually, a mask is created using an Embedding or Masking layer and is then automatically propagated. If model.compile is used, it is also automatically utilized in losses and metrics.
However, in order for the mask propagation to work, you can use only tf.keras.layers to process the data, not raw TF operations like tf.concat or even the + operator (see tf.keras.layers.Concatenate/Add/...).
How to compute masked losses and masked metrics manually?
When you want to compute the losses and metrics manually, pass the mask as the third argument to their __call__ method (each individual component of loss/metric is then multiplied by the mask, zeroing out the ones for padding elements).
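The effect of multiplying by the mask can be illustrated in NumPy. This is only a sketch with a hypothetical function name: dividing by the number of valid elements is an assumption made here for illustration, and the Keras losses and metrics may reduce the masked values differently.

```python
import numpy as np


def masked_mean_loss(per_element_loss, mask):
    """Zero out the losses of padding elements and average over the valid ones."""
    per_element_loss = np.asarray(per_element_loss, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    masked = per_element_loss * mask  # padding positions contribute 0
    return float(masked.sum() / mask.sum())


# Two sequences padded to length 3; the valid lengths are 2 and 1.
losses = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
mask = [[1, 1, 0], [1, 0, 0]]
print(masked_mean_loss(losses, mask))
```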
How to print output masks of a tf.keras.Model?
When you call the model directly, like model(input_batch), the mask of each output is available in a private ._keras_mask property, so for single-output models you can print it with print(model(input_batch)._keras_mask).
TensorBoard

How to create TensorBoard logs manually?
Start by creating a SummaryWriter using for example:
writer = tf.summary.create_file_writer(args.logdir, flush_millis=10 * 1000)
and then you can generate logs inside a with writer.as_default() block.
You can either specify step manually in each call, or you can use tf.summary.experimental.set_step(step). Also, during training you usually want to log only some batches, so the logging block during training usually looks like:
if optimizer.iterations % 100 == 0:
    tf.summary.experimental.set_step(optimizer.iterations)
    with self._writer.as_default():
        # logging

What can be logged in TensorBoard?
 scalar values:
tf.summary.scalar(name like "train/loss", value, [step])
 tensor values displayed as histograms or distributions:
tf.summary.histogram(name like "train/output_layer", tensor value castable to `tf.float64`, [step])
 images as tensors with shape [num_images, h, w, channels], where channels can be 1 (grayscale), 2 (grayscale + alpha), 3 (RGB), 4 (RGBA):
tf.summary.image(name like "train/samples", images, [step], [max_outputs=at most this many images])
 possibly large amount of text (e.g., all hyperparameter values, sample translations in MT, …) in Markdown format:
tf.summary.text(name like "hyperparameters", markdown, [step])
 audio as tensors with shape [num_clips, samples, channels] and values in $[-1,1]$ range:
tf.summary.audio(name like "train/samples", clips, sample_rate, [step], [max_outputs=at most this many clips])
Requirements
To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that up to 40 points above 80 (including the bonus points) will be transferred to the exam.
To pass the exam, you need to obtain at least 60, 75 and 90 points out of the 100-point exam to obtain grades 3, 2 and 1, respectively. (PhD students with binary grades require 75 points.) The exam consists of five 20-point questions, which are randomly generated, but always cover the whole course. In addition, you can get at most 40 surplus points from the practicals and at most 10 points for community work (i.e., fixing slides or reporting issues), but only the points you already have at the time of the exam count.
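The point arithmetic can be summarized in a short function. This is a sketch of one reading of the rules above, not an official grading script: the function name is hypothetical, and it assumes that the surplus and community points are simply added to the exam score and that practical_points already includes the bonus points.

```python
def exam_grade(exam_points, practical_points=0, community_points=0, phd=False):
    """Return grade 1, 2 or 3 (or "pass" for PhD students), or None when failing."""
    # Up to 40 points above the 80 required for the practicals are transferred.
    surplus = min(40, max(0, practical_points - 80))
    total = exam_points + surplus + min(10, community_points)
    if phd:
        return "pass" if total >= 75 else None
    for grade, threshold in ((1, 90), (2, 75), (3, 60)):
        if total >= threshold:
            return grade
    return None


print(exam_grade(50, practical_points=95))  # 50 exam points + 15 surplus points
```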
Exam Questions
Lecture 2 Questions

Training Neural Network
Assume the artificial neural network on the right, with mean square error loss and gold output of 3. Compute the values of all weights $w_i$ after performing an SGD update with learning rate 0.1.
Different network architectures, activation functions (tanh, sigmoid, softmax) and losses (MSE, NLL) may appear in the exam.
Maximum Likelihood Estimation
Formulate maximum likelihood estimator for neural network parameters and derive the following two losses:
 NLL (negative log likelihood) loss for networks returning a probability distribution
 MSE (mean square error) loss for networks returning a real number with a normal distribution with a fixed variance

Backpropagation Algorithm, SGD with Momentum
Write down the backpropagation algorithm. Then, write down the SGD algorithm with momentum. Finally, formulate SGD with Nesterov momentum and explain the difference to SGD with regular momentum.
Adagrad and RMSProp
Write down the AdaGrad algorithm and show that it tends to internally decay learning rate by a factor of $1/\sqrt{t}$ in step $t$. Furthermore, write down RMSProp algorithm and compare it to Adagrad. 
Adam
Write down the Adam algorithm and explain the bias-correction terms $(1-\beta^t)$.
Lecture 3 Questions

Regularization
Define overfitting and sketch what a regularization is. Then describe basic regularization methods like early stopping, L2 and L1 regularization, dataset augmentation, ensembling and label smoothing. 
Dropout
Describe the dropout method and write down exactly how it is used during training and during inference. Then explain why it cannot be used on RNN state, describe the variational dropout variant, and also describe layer normalization.
Network Convergence
Describe factors influencing network convergence, namely:
 Parameter initialization strategies (explain also why batch normalization helps with the initialization range).
 Problems with saturating nonlinearities (and again, why batch normalization helps; also discuss why NLL (compared to MSE) helps with saturating nonlinearities on the output layer).
 Gradient clipping (and the difference between clipping individual gradient elements or the gradient as a whole).
Lecture 4 Questions

Convolution
Write down equations of how convolution of a given image is computed. Assume the input is an image $I$ of size $H \times W$ with $C$ channels, the kernel $K$ has size $N \times M$, the stride is $T \times S$, the operation performed is in fact cross-correlation (as usual in convolutional neural networks) and that $O$ output channels are computed. Explain both $\textit{SAME}$ and $\textit{VALID}$ padding schemes and write down output size of the operation for both these padding schemes.
Batch Normalization
Describe the batch normalization method and explain how it is used during training and during inference. Explicitly write down what is being normalized over in the case of fully connected layers and in the case of convolutional layers. Compare batch normalization to layer normalization.
Lecture 5 Questions

VGG and ResNet
Describe overall architecture of VGG and ResNet (you do not need to remember exact number of layers/filters, but you should know when a BatchNorm is executed, when ReLU, and how residual connections work when the number of channels increases). Then describe two ResNet extensions (WideNet, DenseNet, PyramidNet, ResNeXt). 
CNN Regularization, SE, MBConv
Describe CNN regularization methods (networks with stochastic depth, Cutout, DropBlock). Then show a Squeeze and excitation block for a ResNet and finally sketch mobile inverted bottleneck with separable convolutions. 
Transposed Convolution
Write down equations of how convolution of a given image is computed. Assume the input is an image $I$ of size $H \times W$ with $C$ channels, the kernel $K$ has size $N \times M$, the stride is $S$, the operation performed is in fact cross-correlation (as usual in convolutional neural networks) and that $O$ output channels are computed. Then write down the equation of transposed convolution (or equivalently backpropagation through a convolution to its inputs).
Lecture 6 Questions

Two-stage Object Detection
Define object detection task and describe Fast R-CNN and Faster R-CNN architectures. Notably, show what the overall architectures of the networks are, explain the RoI-pooling, show how the network parametrizes bounding boxes, what the losses look like, how RoIs are chosen during training, how the objects are predicted, and what the region proposal network does.
Image Segmentation
Define object detection and image segmentation tasks, and sketch the Faster R-CNN and Mask R-CNN architectures. Notably, show what the overall architecture of the networks is, explain the RoI-pooling and RoI-align layers, show how the network parametrizes bounding boxes, what the losses look like, how RoIs are chosen during training and how the objects are predicted.
Single-stage Object Detection
Define object detection task and describe single-stage detector architecture. Namely, show feature pyramid network, define focal loss and sketch RetinaNet – the overall architecture including the convolutional classification and bounding box prediction heads, overall loss, how the gold labels are generated, and how the objects are predicted.
Lecture 7 Questions

LSTM
Write down how the Long Short-Term Memory cell operates.
GRU and Highway Networks
Show a basic RNN cell (using just one hidden layer) and then write down how it is extended using gating into the Gated Recurrent Unit. Finally, describe highway networks and compare them to RNN.
Lecture 8 Questions

Character-level word embeddings
Describe why character-level word embeddings are useful. Then describe the two following methods:
 RNN: using bidirectional recurrent neural networks
 CNN: describe how convolutional networks (CNNs) can be used to compute character-level word embeddings. Write down the exact equation computing the embedding, assuming that the input word consists of characters $\{x_1, \ldots, x_N\}$ represented by embeddings $\{e_1, \ldots, e_N\}$ for $e_i \in \mathbb R^D$, and we use $F$ filters of widths $w_1, \ldots, w_F$. Also explicitly count the number of parameters.

Sequence classification and CRF
Describe how RNNs, bidirectional RNNs and multilayer RNNs can be used to classify every element of a given sequence (i.e., what the architecture of a tagger might be; include also residual connections and suitable places for dropout layers). Then, explain how a CRF layer works, define score computation for a given sequence of inputs and sequence of labels, describe the loss computation during training, and sketch the inference algorithm. 
CTC Loss
Describe CTC loss and the whole setting which can be solved utilizing CTC loss. Then show how CTC loss can be computed. Finally, describe greedy and beam search CTC decoding.
Lecture 9 Questions

Word2vec and Hierarchical and Negative Sampling
Explain how word embeddings can be precomputed using the CBOW and Skip-gram models. First start with the variants where full softmax is performed, and then describe how hierarchical softmax and negative sampling are used to speed up training of word embeddings.
Neural Machine Translation and Attention
Draw/write how an encoder-decoder architecture is used for machine translation, both during training and during inference. Then describe the architecture of an attention module.
Neural Machine Translation and Subwords
Draw/write how an encoder-decoder architecture is used for machine translation, both during training and during inference (without attention). Furthermore, elaborate on how subword units are used to reduce the out-of-vocabulary problem and sketch the BPE algorithm and the WordPieces algorithm for constructing a fixed number of subword units.
Lecture 10 Questions

Variational Autoencoders
Describe deep generative modelling using variational autoencoders – show VAE architecture, devise training algorithm, write training loss, and propose sampling procedure. 
Generative Adversarial Networks
Describe deep generative modelling using generative adversarial networks – show GAN architecture and describe training procedure and training loss. Mention also CGAN (conditional GAN) and sketch generator and discriminator architecture in a DCGAN.
Lecture 11 Questions

Reinforcement learning
Describe the general reinforcement learning settings and formulate the Monte Carlo algorithm. Then, formulate and prove the policy gradient theorem and write down the REINFORCE algorithm. 
REINFORCE with baseline
Describe the general reinforcement learning settings, formulate the policy gradient theorem and write down the REINFORCE algorithm. Then explain what the baseline is, show policy gradient theorem with the baseline (including the proof of why the baseline can be included), and write down the REINFORCE with baseline algorithm.
Lecture 12 Questions

Speech Synthesis
Describe the WaveNet network (what a dilated convolution and gated activations are, what the residual block looks like, what the overall architecture is, and how global and local conditioning work). Discuss parallelizability of training and inference, show how Parallel WaveNet can speed up inference, and sketch how it is trained.
Neural Turing Machines
Sketch an overall architecture of a Neural Turing Machine with an LSTM controller, assuming $R$ reading heads and one write head. Describe the addressing mechanism (content addressing and its combination with previous weights, shifts, and sharpening) and reading and writing operations. Finally, describe the inputs and the outputs of the controller.
Lecture 13 Questions

Transformer
Describe Transformer architecture, namely the self-attention layer, multi-head self-attention layer, masked self-attention and overall architecture of an encoder and a decoder. Describe positional embeddings, learning rate schedule during training and parallelizability of training and inference.
BERT
Describe the BERT model architecture (including multi-head self-attention layer) and its pretraining – format of input and output data, masked language model and next sentence prediction. Define GELU and describe how the BERT model can be fine-tuned to perform POS tagging, sentiment analysis and paraphrase detection (detect if two sentences have the same meaning).