Deep Learning – Summer 2020/21
In recent years, deep neural networks have been used to solve complex machine-learning problems. They have achieved significant state-of-the-art results in many areas.
The goal of the course is to introduce deep neural networks, from the basics to the latest advances. The course will focus both on theory and on practical aspects (students will implement and train several deep neural networks capable of achieving state-of-the-art results, for example in image classification, object detection, lemmatization, speech recognition or 3D object recognition). No previous knowledge of artificial neural networks is required, but a basic understanding of machine learning is advisable.
About
SIS code: NPFL114
Semester: summer
E-credits: 7
Examination: 3/2 C+Ex
Guarantor: Milan Straka
Timespace Coordinates
- lectures: the Czech lecture is held on Monday at 9:50 in S5, the English lecture on Monday at 13:10 in S5; the first lecture is on Mar 1
- practicals: there are two parallel practicals, a Czech one on Tuesday at 10:40 in S9, and an English one on Tuesday at 9:00 in S9; the first practicals are on Mar 2
- consultations: voluntary consultations regarding the assignments or other issues are held regularly on Tuesday at 14:00 in SU1
All lectures and practicals will be recorded and available on this website.
Given the pandemic situation, all lectures and practicals are currently held online.
Lectures
1. Introduction to Deep Learning Slides PDF Slides CZ Lecture EN Lecture Questions numpy_entropy pca_first mnist_layers_activations
2. Training Neural Networks Slides PDF Slides CZ Lecture EN Lecture Questions sgd_backpropagation sgd_manual mnist_training gym_cartpole
3. Training Neural Networks II Slides PDF Slides CZ Lecture EN Lecture Questions mnist_regularization mnist_ensemble uppercase
4. Convolutional Neural Networks Slides PDF Slides CZ Lecture EN Lecture Questions mnist_cnn image_augmentation tf_dataset mnist_multiple cifar_competition
5. Convolutional Neural Networks II Slides PDF Slides CZ Lecture EN Lecture Questions cnn_manual cags_classification
6. Easter Monday EN Consultations mnist_web cags_segmentation 3d_recognition
7. Object Detection Slides PDF Slides CZ Lecture EN Lecture Questions bboxes_utils svhn_competition
8. Recurrent Neural Networks Slides PDF Slides CZ Lecture EN Lecture EN Consultations Questions sequence_classification tagger_we tagger_cle tagger_competition
9. CRF, CTC, Word2Vec Slides PDF Slides CZ Lecture EN Lecture Questions tensorboard_projector tagger_crf speech_recognition
10. Seq2seq, NMT, Transformer Slides PDF Slides CZ Lecture EN Lecture Questions tagger_crf_manual lemmatizer_noattn lemmatizer_attn lemmatizer_competition
11. Transformer, BERT Slides PDF Slides CZ Lecture EN Lecture Questions tagger_transformer sentiment_analysis reading_comprehension
12. Deep Generative Models Slides PDF Slides CZ Lecture EN Lecture Questions vae gan dcgan
13. Introduction to Deep Reinforcement Learning Slides PDF Slides CZ Lecture EN Lecture Questions monte_carlo reinforce reinforce_baseline reinforce_pixels
14. NASNet, Speech Synthesis, External Memory Networks Slides PDF Slides CZ Lecture EN Lecture Questions learning_to_learn
The lecture content, including references to study materials, follows. The main study material is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (referred to as DLB).
References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.
1. Introduction to Deep Learning
Mar 01 Slides PDF Slides CZ Lecture EN Lecture Questions numpy_entropy pca_first mnist_layers_activations
- Random variables, probability distributions, expectation, variance, Bernoulli distribution, Categorical distribution [Sections 3.2, 3.3, 3.8, 3.9.1 and 3.9.2 of DLB]
- Self-information, entropy, cross-entropy, KL-divergence [Section 3.13 of DLB]
- Gaussian distribution [Section 3.9.3 of DLB]
- Machine Learning Basics [Sections 5.1-5.1.3 of DLB]
- History of Deep Learning [Section 1.2 of DLB]
- Linear regression [Section 5.1.4 of DLB]
- Challenges Motivating Deep Learning [Section 5.11 of DLB]
- Neural network basics
  - Neural networks as graphs [Chapter 6 before Section 6.1 of DLB]
  - Output activation functions [Section 6.2.2 of DLB, excluding Section 6.2.2.4]
  - Hidden activation functions [Section 6.3 of DLB, excluding Section 6.3.3]
  - Basic network architectures [Section 6.4 of DLB, excluding Section 6.4.2]
  - Universal approximation theorem
2. Training Neural Networks
Mar 08 Slides PDF Slides CZ Lecture EN Lecture Questions sgd_backpropagation sgd_manual mnist_training gym_cartpole
- Capacity, overfitting, underfitting, regularization [Section 5.2 of DLB]
- Hyperparameters and validation sets [Section 5.3 of DLB]
- Maximum Likelihood Estimation [Section 5.5 of DLB]
- Neural network training
  - Gradient Descent and Stochastic Gradient Descent [Sections 4.3 and 5.9 of DLB]
  - Backpropagation algorithm [Sections 6.5 to 6.5.3 of DLB, especially Algorithms 6.1 and 6.2; note that Algorithms 6.5 and 6.6 are used in practice]
  - SGD algorithm [Section 8.3.1 and Algorithm 8.1 of DLB]
  - SGD with Momentum algorithm [Section 8.3.2 and Algorithm 8.2 of DLB]
  - SGD with Nesterov Momentum algorithm [Section 8.3.3 and Algorithm 8.3 of DLB]
- Optimization algorithms with adaptive gradients
  - AdaGrad algorithm [Section 8.5.1 and Algorithm 8.4 of DLB]
  - RMSProp algorithm [Section 8.5.2 and Algorithm 8.5 of DLB]
  - Adam algorithm [Section 8.5.3 and Algorithm 8.7 of DLB]
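The update rules themselves are short; the following is a minimal NumPy sketch (my own illustration, not course code) of the SGD with momentum and Adam updates corresponding to Algorithms 8.2 and 8.7 of DLB, where grad is the minibatch gradient of the loss:

```python
import numpy as np

def sgd_momentum_step(theta, grad, v, lr=0.01, momentum=0.9):
    v = momentum * v - lr * grad     # accumulate an exponentially decaying velocity
    return theta + v, v

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad       # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias corrections (t counts steps from 1)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```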
3. Training Neural Networks II
Mar 15 Slides PDF Slides CZ Lecture EN Lecture Questions mnist_regularization mnist_ensemble uppercase
- Softmax with NLL (negative log likelihood) as a loss function [Section 6.2.2.3 of DLB, notably equation (6.30); plus slides 10-12]
- Regularization [Chapter 7 until Section 7.1 of DLB]
  - Early stopping [Section 7.8 of DLB, without the How early stopping acts as a regularizer part]
  - L2 and L1 regularization [Sections 7.1 and 5.6.1 of DLB; plus slides 17-18]
  - Dataset augmentation [Section 7.4 of DLB]
  - Ensembling [Section 7.11 of DLB]
  - Dropout [Section 7.12 of DLB]
  - Label smoothing [Section 7.5.1 of DLB]
- Saturating nonlinearities [Section 6.3.2 and second half of Section 6.2.2.2 of DLB]
- Parameter initialization strategies [Section 8.4 of DLB]
- Gradient clipping [Section 10.11.1 of DLB]
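As a small illustration of why softmax and NLL are computed jointly in practice (cf. equation (6.30) of DLB and the slides), the following hedged NumPy sketch evaluates the loss via a numerically stable log-softmax; subtracting the maximum logit leaves the result unchanged but avoids overflow:

```python
import numpy as np

def nll_from_logits(logits, gold):
    logits = logits - logits.max()                      # does not change the softmax
    log_softmax = logits - np.log(np.exp(logits).sum())
    return -log_softmax[gold]                           # negative log likelihood

print(nll_from_logits(np.array([1000.0, 1001.0]), gold=1))  # ~0.3133, no overflow
```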
4. Convolutional Neural Networks
Mar 22 Slides PDF Slides CZ Lecture EN Lecture Questions mnist_cnn image_augmentation tf_dataset mnist_multiple cifar_competition
- Introduction to convolutional networks [Chapter 9 and Sections 9.1-9.3 of DLB]
- Convolution as operation on 4D tensors [Section 9.5 of DLB, notably Equations (9.7) and (9.8)]
- Max pooling and average pooling [Section 9.3 of DLB]
- Stride and Padding schemes [Section 9.5 of DLB]
- AlexNet [ImageNet Classification with Deep Convolutional Neural Networks]
- VGG [Very Deep Convolutional Networks for Large-Scale Image Recognition]
- GoogLeNet (aka Inception) [Going Deeper with Convolutions]
- Batch normalization [Section 8.7.1 of DLB, optionally the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift]
- Inception v2 and v3 [Rethinking the Inception Architecture for Computer Vision]
5. Convolutional Neural Networks II
Mar 29 Slides PDF Slides CZ Lecture EN Lecture Questions cnn_manual cags_classification
- Residual CNN Networks
  - ResNet [Deep Residual Learning for Image Recognition]
  - WideNet [Wide Residual Networks]
  - DenseNet [Densely Connected Convolutional Networks]
  - PyramidNet [Deep Pyramidal Residual Networks]
  - ResNeXt [Aggregated Residual Transformations for Deep Neural Networks]
- Regularizing CNN Networks
  - SENet [Squeeze-and-Excitation Networks]
  - EfficientNet [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks]
- Transposed convolution
- U-Net [U-Net: Convolutional Networks for Biomedical Image Segmentation]
6. Easter Monday
Apr 05 EN Consultations mnist_web cags_segmentation 3d_recognition
7. Object Detection
Apr 12 Slides PDF Slides CZ Lecture EN Lecture Questions bboxes_utils svhn_competition
- Fast R-CNN [Fast R-CNN]
- Proposing RoIs using Faster R-CNN [Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]
- Mask R-CNN [Mask R-CNN]
- Feature Pyramid Networks [Feature Pyramid Networks for Object Detection]
- Focal Loss, RetinaNet [Focal Loss for Dense Object Detection]
- EfficientDet [EfficientDet: Scalable and Efficient Object Detection]
- Group Normalization [Group Normalization]
8. Recurrent Neural Networks
Apr 19 Slides PDF Slides CZ Lecture EN Lecture EN Consultations Questions sequence_classification tagger_we tagger_cle tagger_competition
- Sequence modelling using Recurrent Neural Networks (RNN) [Chapter 10 until Section 10.2.1 (excluding) of DLB]
- The challenge of long-term dependencies [Section 10.7 of DLB]
- Long Short-Term Memory (LSTM) [Section 10.10.1 of DLB, Sepp Hochreiter, Jürgen Schmidhuber (1997): Long short-term memory, Felix A. Gers, Jürgen Schmidhuber, Fred Cummins (2000): Learning to Forget: Continual Prediction with LSTM]
- Gated Recurrent Unit (GRU) [Section 10.10.2 of DLB, Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation]
- Highway Networks [Training Very Deep Networks]
- RNN Regularization
  - Variational Dropout [A Theoretically Grounded Application of Dropout in Recurrent Neural Networks]
  - Layer Normalization [Layer Normalization]
- Bidirectional RNN [Section 10.3 of DLB]
- Word Embeddings [Section 14.2.4 of DLB]
- Character-level embeddings using Recurrent neural networks [C2W model from Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation]
- Character-level embeddings using Convolutional neural networks [CharCNN from Character-Aware Neural Language Models]
9. CRF, CTC, Word2Vec
Apr 26 Slides PDF Slides CZ Lecture EN Lecture Questions tensorboard_projector tagger_crf speech_recognition
- Conditional Random Fields (CRF) loss [Sections 3.4.2 and A.7 of Natural Language Processing (Almost) from Scratch]
- Connectionist Temporal Classification (CTC) loss [Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks]
- Word2vec word embeddings, notably the CBOW and Skip-gram architectures [Efficient Estimation of Word Representations in Vector Space]
  - Hierarchical softmax [Section 12.4.3.2 of DLB or Distributed Representations of Words and Phrases and their Compositionality]
  - Negative sampling [Distributed Representations of Words and Phrases and their Compositionality]
- Character-level embeddings using character n-grams [Described simultaneously in several papers as Charagram (Charagram: Embedding Words and Sentences via Character n-grams), Subword Information (Enriching Word Vectors with Subword Information) or SubGram (SubGram: Extending Skip-Gram Word Representation with Substrings)]
10. Seq2seq, NMT, Transformer
May 03 Slides PDF Slides CZ Lecture EN Lecture Questions tagger_crf_manual lemmatizer_noattn lemmatizer_attn lemmatizer_competition
- Neural Machine Translation using Encoder-Decoder or Sequence-to-Sequence architecture [Section 12.5.4 of DLB, Ilya Sutskever, Oriol Vinyals, Quoc V. Le: Sequence to Sequence Learning with Neural Networks and Kyunghyun Cho et al.: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation]
- Using Attention mechanism in Neural Machine Translation [Section 12.4.5.1 of DLB, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: Neural Machine Translation by Jointly Learning to Align and Translate]
- Translating Subword Units [Rico Sennrich, Barry Haddow, Alexandra Birch: Neural Machine Translation of Rare Words with Subword Units]
- Google NMT [Yonghui Wu et al.: Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation]
- Transformer architecture [Attention Is All You Need]
11. Transformer, BERT
May 10 Slides PDF Slides CZ Lecture EN Lecture Questions tagger_transformer sentiment_analysis reading_comprehension
- Transformer architecture [Attention Is All You Need]
- BERT [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding]
- RoBERTa [RoBERTa: A Robustly Optimized BERT Pretraining Approach]
- ALBERT [ALBERT: A Lite BERT for Self-supervised Learning of Language Representations]
12. Deep Generative Models
May 17 Slides PDF Slides CZ Lecture EN Lecture Questions vae gan dcgan
- Autoencoders (undercomplete, sparse, denoising) [Chapter 14, Sections 14-14.2.3 of DLB]
- Deep Generative Models using Differentiable Generator Nets [Section 20.10.2 of DLB]
- Variational Autoencoders [Section 20.10.3 plus Reparametrization trick from Section 20.9 (but not Section 20.9.1) of DLB, Auto-Encoding Variational Bayes]
- Generative Adversarial Networks
  - GAN [Section 20.10.4 of DLB, Generative Adversarial Networks]
  - CGAN [Conditional Generative Adversarial Nets]
  - DCGAN [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks]
  - WGAN [Wasserstein GAN]
  - BigGAN [Large Scale GAN Training for High Fidelity Natural Image Synthesis]
13. Introduction to Deep Reinforcement Learning
May 24 Slides PDF Slides CZ Lecture EN Lecture Questions monte_carlo reinforce reinforce_baseline reinforce_pixels
The study material for Reinforcement Learning is Reinforcement Learning: An Introduction, second edition, by Richard S. Sutton and Andrew G. Barto (referred to as RLB), available online.
- Multi-armed bandits [Sections 2-2.4 of RLB]
- Markov Decision Process [Sections 3-3.3 of RLB]
- Policies and Value Functions [Section 3.5 of RLB]
- Monte Carlo Methods [Sections 5-5.4 of RLB]
- Policy Gradient Methods [Sections 13-13.1 of RLB]
- Policy Gradient Theorem [Section 13.2 of RLB]
- REINFORCE algorithm [Section 13.3 of RLB]
- REINFORCE with baseline algorithm [Section 13.4 of RLB]
14. NASNet, Speech Synthesis, External Memory Networks
May 31 Slides PDF Slides CZ Lecture EN Lecture Questions learning_to_learn
- NASNet [Learning Transferable Architectures for Scalable Image Recognition]
- EfficientNet [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks]
- WaveNet [WaveNet: A Generative Model for Raw Audio]
- Parallel WaveNet [Parallel WaveNet: Fast High-Fidelity Speech Synthesis]
- Full speech synthesis pipeline Tacotron 2 [Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions]
- Neural Turing Machine [Neural Turing Machines]
- Differentiable Neural Computer [Hybrid computing using a neural network with dynamic external memory]
- Memory Augmented Neural Networks [One-shot learning with Memory-Augmented Neural Networks]
Requirements
To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that all surplus points (both bonus and non-bonus) will be transferred to the exam. In total, assignments for at least 120 points (not including the bonus points) will be available, and if you solve all the assignments, you will obtain an additional 50 bonus points.
Environment
The tasks are evaluated automatically using the ReCodEx Code Examiner.
The evaluation is performed using Python 3.8, TensorFlow 2.4.1, TensorFlow Addons 0.12.1, TensorFlow Probability 0.12.1, TensorFlow Hub 0.11.0 and OpenAI Gym 0.18.0. You should install these exact package versions yourselves.
Teamwork
Solving assignments in teams (of size at most 3) is encouraged, but everyone has to participate (it is forbidden not to work on an assignment and then submit a solution created by other team members). All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. Each such solution must explicitly list all members of the team to allow plagiarism detection using this template.
No Cheating
Cheating is strictly prohibited and any student found cheating will be punished. The punishment can involve failing the whole course, or, in grave cases, being expelled from the faculty. While discussing assignments with any classmate is fine, each team must complete the assignments themselves, without using code they did not write (unless explicitly allowed). Of course, inside a team you are expected to share code and submit identical solutions.
numpy_entropy
Deadline: Mar 15, 23:59 3 points
The goal of this exercise is to familiarize yourself with Python, NumPy and the ReCodEx submission system. Start with the numpy_entropy.py template.
Load the file numpy_entropy_data.txt, whose lines consist of data points of our dataset, and load numpy_entropy_model.txt, which describes a model probability distribution, with each line being a tab-separated pair of (data point, probability).

Then compute the following quantities using NumPy, and print them each on a separate line rounded to two decimal places (or inf for positive infinity, which happens when an element of the data distribution has zero probability under the model distribution):
- entropy H(data distribution)
- cross-entropy H(data distribution, model distribution)
- KL-divergence D_KL(data distribution, model distribution)

Use natural logarithms to compute the entropies and the divergence.
For data distribution file numpy_entropy_data.txt
A
BB
A
A
BB
A
CCC
and model distribution file numpy_entropy_model.txt
A 0.5
BB 0.3
CCC 0.1
D 0.1
the output should be
Entropy: 0.96 nats
Crossentropy: 1.07 nats
KL divergence: 0.11 nats
If we remove the CCC 0.1 line from the model distribution, the output should change to
Entropy: 0.96 nats
Crossentropy: inf nats
KL divergence: inf nats
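A minimal sketch of one possible approach (my own, not the official solution; file names follow the assignment) computes the empirical distribution with np.unique and looks each value up in the model:

```python
import numpy as np

def entropies(data_path="numpy_entropy_data.txt", model_path="numpy_entropy_model.txt"):
    with open(data_path) as data_file:
        data = [line.rstrip("\n") for line in data_file]
    with open(model_path) as model_file:
        model = dict(line.rstrip("\n").split("\t") for line in model_file)

    values, counts = np.unique(data, return_counts=True)
    data_probs = counts / counts.sum()
    # Model probability of every observed data point; 0 if missing from the model.
    model_probs = np.array([float(model.get(value, 0)) for value in values])

    entropy = -np.sum(data_probs * np.log(data_probs))
    with np.errstate(divide="ignore"):       # log(0) correctly yields -inf here
        cross_entropy = -np.sum(data_probs * np.log(model_probs))
    return entropy, cross_entropy, cross_entropy - entropy  # KL = H(d,m) - H(d)
```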
pca_first
Deadline: Mar 15, 23:59 2 points
The goal of this exercise is to familiarize yourself with TensorFlow tf.Tensors, shapes and basic tensor manipulation methods. Start with the pca_first.py template.

In this assignment, you will compute the covariance matrix of several examples from the MNIST dataset, compute the first principal component, and quantify how much variance it explains.

It is fine if you are not familiar with terms like covariance matrix or principal component – the template contains a detailed description of what you have to do.
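The computation itself boils down to power iteration; a hedged sketch (assuming x is a float32 matrix of flattened examples and iterations is at least 1, and glossing over the template's exact structure) might look like this:

```python
import tensorflow as tf

def first_pc_explained_variance(x, iterations):
    mean = tf.reduce_mean(x, axis=0)
    centered = x - mean
    cov = tf.matmul(centered, centered, transpose_a=True) / tf.cast(tf.shape(x)[0], tf.float32)
    total_variance = tf.reduce_sum(tf.linalg.diag_part(cov))  # trace of the covariance

    # Power iteration: repeatedly multiplying a vector by the covariance matrix
    # and renormalizing converges to the first principal component; the norm
    # converges to the corresponding eigenvalue, i.e., the explained variance.
    v = tf.ones(cov.shape[0])
    for _ in range(iterations):
        v = tf.linalg.matvec(cov, v)
        explained_variance = tf.norm(v)
        v /= explained_variance
    return total_variance, explained_variance
```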
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 pca_first.py --examples=1024 --iterations=64
Total variance: 53.12
Explained variance: 9.64
python3 pca_first.py --examples=8192 --iterations=128
Total variance: 53.05
Explained variance: 9.89
python3 pca_first.py --examples=55000 --iterations=1024
Total variance: 52.74
Explained variance: 9.71
mnist_layers_activations
Deadline: Mar 15, 23:59 2 points
Before solving the assignment, start by playing with example_keras_tensorboard.py, in order to familiarize yourself with TensorFlow and TensorBoard. Run it, and when it finishes, run TensorBoard using tensorboard --logdir logs. Then open http://localhost:6006 in a browser and explore the active tabs.
Your goal is to modify the mnist_layers_activations.py template and implement the following:
- A number of hidden layers (including zero) can be specified on the command line using the parameter hidden_layers.
- The activation function of these hidden layers can also be specified as a command line parameter activation, with supported values of none, relu, tanh and sigmoid.
- Print the final accuracy on the test set.
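A possible shape of the model-building code (the hidden layer size of 100 is my assumption; the template fixes the actual value):

```python
import tensorflow as tf

def build_model(hidden_layers, activation):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))
    for _ in range(hidden_layers):
        model.add(tf.keras.layers.Dense(
            100, activation=None if activation == "none" else activation))
    model.add(tf.keras.layers.Dense(10, activation="softmax"))
    return model
```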
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 mnist_layers_activations.py --hidden_layers=0 --activation=none
Epoch 1/10 loss: 0.8272  accuracy: 0.7869  val_loss: 0.2755  val_accuracy: 0.9308
Epoch 2/10 loss: 0.3328  accuracy: 0.9089  val_loss: 0.2419  val_accuracy: 0.9342
Epoch 3/10 loss: 0.2995  accuracy: 0.9165  val_loss: 0.2269  val_accuracy: 0.9392
Epoch 4/10 loss: 0.2886  accuracy: 0.9197  val_loss: 0.2219  val_accuracy: 0.9414
Epoch 5/10 loss: 0.2778  accuracy: 0.9222  val_loss: 0.2202  val_accuracy: 0.9430
Epoch 6/10 loss: 0.2745  accuracy: 0.9234  val_loss: 0.2171  val_accuracy: 0.9416
Epoch 7/10 loss: 0.2669  accuracy: 0.9246  val_loss: 0.2152  val_accuracy: 0.9420
Epoch 8/10 loss: 0.2615  accuracy: 0.9263  val_loss: 0.2159  val_accuracy: 0.9424
Epoch 9/10 loss: 0.2561  accuracy: 0.9280  val_loss: 0.2156  val_accuracy: 0.9404
Epoch 10/10 loss: 0.2596  accuracy: 0.9270  val_loss: 0.2146  val_accuracy: 0.9434
loss: 0.2637  accuracy: 0.9259
python3 mnist_layers_activations.py --hidden_layers=1 --activation=none
Epoch 1/10 loss: 0.5384  accuracy: 0.8430  val_loss: 0.2438  val_accuracy: 0.9350
Epoch 2/10 loss: 0.2951  accuracy: 0.9166  val_loss: 0.2332  val_accuracy: 0.9350
Epoch 3/10 loss: 0.2816  accuracy: 0.9217  val_loss: 0.2359  val_accuracy: 0.9306
Epoch 4/10 loss: 0.2808  accuracy: 0.9225  val_loss: 0.2283  val_accuracy: 0.9384
Epoch 5/10 loss: 0.2705  accuracy: 0.9227  val_loss: 0.2341  val_accuracy: 0.9370
Epoch 6/10 loss: 0.2718  accuracy: 0.9234  val_loss: 0.2333  val_accuracy: 0.9388
Epoch 7/10 loss: 0.2669  accuracy: 0.9253  val_loss: 0.2223  val_accuracy: 0.9412
Epoch 8/10 loss: 0.2595  accuracy: 0.9281  val_loss: 0.2471  val_accuracy: 0.9342
Epoch 9/10 loss: 0.2573  accuracy: 0.9270  val_loss: 0.2293  val_accuracy: 0.9368
Epoch 10/10 loss: 0.2615  accuracy: 0.9264  val_loss: 0.2318  val_accuracy: 0.9400
loss: 0.2795  accuracy: 0.9241
python3 mnist_layers_activations.py --hidden_layers=1 --activation=relu
Epoch 1/10 loss: 0.5379  accuracy: 0.8500  val_loss: 0.1459  val_accuracy: 0.9612
Epoch 2/10 loss: 0.1563  accuracy: 0.9553  val_loss: 0.1128  val_accuracy: 0.9682
Epoch 3/10 loss: 0.1052  accuracy: 0.9697  val_loss: 0.0966  val_accuracy: 0.9714
Epoch 4/10 loss: 0.0792  accuracy: 0.9765  val_loss: 0.0864  val_accuracy: 0.9744
Epoch 5/10 loss: 0.0627  accuracy: 0.9814  val_loss: 0.0818  val_accuracy: 0.9768
Epoch 6/10 loss: 0.0500  accuracy: 0.9857  val_loss: 0.0829  val_accuracy: 0.9772
Epoch 7/10 loss: 0.0394  accuracy: 0.9881  val_loss: 0.0747  val_accuracy: 0.9792
Epoch 8/10 loss: 0.0328  accuracy: 0.9905  val_loss: 0.0746  val_accuracy: 0.9788
Epoch 9/10 loss: 0.0239  accuracy: 0.9934  val_loss: 0.0845  val_accuracy: 0.9762
Epoch 10/10 loss: 0.0231  accuracy: 0.9936  val_loss: 0.0806  val_accuracy: 0.9778
loss: 0.0829  accuracy: 0.9773
python3 mnist_layers_activations.py --hidden_layers=1 --activation=tanh
Epoch 1/10 loss: 0.5338  accuracy: 0.8483  val_loss: 0.1668  val_accuracy: 0.9570
Epoch 2/10 loss: 0.1855  accuracy: 0.9478  val_loss: 0.1262  val_accuracy: 0.9648
Epoch 3/10 loss: 0.1271  accuracy: 0.9640  val_loss: 0.1001  val_accuracy: 0.9724
Epoch 4/10 loss: 0.0966  accuracy: 0.9716  val_loss: 0.0918  val_accuracy: 0.9738
Epoch 5/10 loss: 0.0742  accuracy: 0.9784  val_loss: 0.0813  val_accuracy: 0.9774
Epoch 6/10 loss: 0.0605  accuracy: 0.9832  val_loss: 0.0811  val_accuracy: 0.9750
Epoch 7/10 loss: 0.0471  accuracy: 0.9872  val_loss: 0.0759  val_accuracy: 0.9774
Epoch 8/10 loss: 0.0385  accuracy: 0.9902  val_loss: 0.0761  val_accuracy: 0.9762
Epoch 9/10 loss: 0.0298  accuracy: 0.9929  val_loss: 0.0783  val_accuracy: 0.9766
Epoch 10/10 loss: 0.0257  accuracy: 0.9945  val_loss: 0.0788  val_accuracy: 0.9744
loss: 0.0822  accuracy: 0.9751
python3 mnist_layers_activations.py --hidden_layers=1 --activation=sigmoid
Epoch 1/10 loss: 0.8219  accuracy: 0.7952  val_loss: 0.2150  val_accuracy: 0.9400
Epoch 2/10 loss: 0.2485  accuracy: 0.9301  val_loss: 0.1632  val_accuracy: 0.9562
Epoch 3/10 loss: 0.1864  accuracy: 0.9477  val_loss: 0.1322  val_accuracy: 0.9636
Epoch 4/10 loss: 0.1513  accuracy: 0.9560  val_loss: 0.1163  val_accuracy: 0.9676
Epoch 5/10 loss: 0.1235  accuracy: 0.9646  val_loss: 0.1041  val_accuracy: 0.9718
Epoch 6/10 loss: 0.1069  accuracy: 0.9702  val_loss: 0.0957  val_accuracy: 0.9722
Epoch 7/10 loss: 0.0889  accuracy: 0.9746  val_loss: 0.0887  val_accuracy: 0.9746
Epoch 8/10 loss: 0.0774  accuracy: 0.9785  val_loss: 0.0869  val_accuracy: 0.9756
Epoch 9/10 loss: 0.0641  accuracy: 0.9832  val_loss: 0.0845  val_accuracy: 0.9760
Epoch 10/10 loss: 0.0594  accuracy: 0.9842  val_loss: 0.0805  val_accuracy: 0.9772
loss: 0.0862  accuracy: 0.9741
python3 mnist_layers_activations.py --hidden_layers=3 --activation=relu
Epoch 1/10 loss: 0.4989  accuracy: 0.8471  val_loss: 0.1121  val_accuracy: 0.9688
Epoch 2/10 loss: 0.1168  accuracy: 0.9645  val_loss: 0.1028  val_accuracy: 0.9692
Epoch 3/10 loss: 0.0784  accuracy: 0.9756  val_loss: 0.1176  val_accuracy: 0.9654
Epoch 4/10 loss: 0.0586  accuracy: 0.9810  val_loss: 0.0860  val_accuracy: 0.9732
Epoch 5/10 loss: 0.0451  accuracy: 0.9849  val_loss: 0.0867  val_accuracy: 0.9778
Epoch 6/10 loss: 0.0398  accuracy: 0.9869  val_loss: 0.0884  val_accuracy: 0.9782
Epoch 7/10 loss: 0.0303  accuracy: 0.9898  val_loss: 0.0797  val_accuracy: 0.9818
Epoch 8/10 loss: 0.0256  accuracy: 0.9917  val_loss: 0.0892  val_accuracy: 0.9796
Epoch 9/10 loss: 0.0218  accuracy: 0.9930  val_loss: 0.1074  val_accuracy: 0.9732
Epoch 10/10 loss: 0.0220  accuracy: 0.9927  val_loss: 0.0821  val_accuracy: 0.9796
loss: 0.0883  accuracy: 0.9779
python3 mnist_layers_activations.py --hidden_layers=10 --activation=relu
Epoch 1/10 loss: 0.6597  accuracy: 0.7806  val_loss: 0.1348  val_accuracy: 0.9622
Epoch 2/10 loss: 0.1533  accuracy: 0.9561  val_loss: 0.1172  val_accuracy: 0.9670
Epoch 3/10 loss: 0.1154  accuracy: 0.9680  val_loss: 0.0991  val_accuracy: 0.9708
Epoch 4/10 loss: 0.0912  accuracy: 0.9737  val_loss: 0.1112  val_accuracy: 0.9704
Epoch 5/10 loss: 0.0758  accuracy: 0.9795  val_loss: 0.1060  val_accuracy: 0.9732
Epoch 6/10 loss: 0.0729  accuracy: 0.9794  val_loss: 0.1077  val_accuracy: 0.9730
Epoch 7/10 loss: 0.0647  accuracy: 0.9825  val_loss: 0.0921  val_accuracy: 0.9734
Epoch 8/10 loss: 0.0554  accuracy: 0.9845  val_loss: 0.0994  val_accuracy: 0.9756
Epoch 9/10 loss: 0.0503  accuracy: 0.9871  val_loss: 0.1114  val_accuracy: 0.9720
Epoch 10/10 loss: 0.0470  accuracy: 0.9875  val_loss: 0.1084  val_accuracy: 0.9740
loss: 0.1119  accuracy: 0.9736
python3 mnist_layers_activations.py --hidden_layers=10 --activation=sigmoid
Epoch 1/10 loss: 2.3115  accuracy: 0.1026  val_loss: 1.8614  val_accuracy: 0.2174
Epoch 2/10 loss: 1.8910  accuracy: 0.1963  val_loss: 1.8708  val_accuracy: 0.2064
Epoch 3/10 loss: 1.8796  accuracy: 0.1998  val_loss: 1.8007  val_accuracy: 0.2030
Epoch 4/10 loss: 1.8249  accuracy: 0.2047  val_loss: 1.4527  val_accuracy: 0.3074
Epoch 5/10 loss: 1.2759  accuracy: 0.4293  val_loss: 0.8859  val_accuracy: 0.6154
Epoch 6/10 loss: 0.9357  accuracy: 0.5910  val_loss: 0.8584  val_accuracy: 0.6884
Epoch 7/10 loss: 0.8281  accuracy: 0.6777  val_loss: 0.6917  val_accuracy: 0.7296
Epoch 8/10 loss: 0.7334  accuracy: 0.7111  val_loss: 0.6801  val_accuracy: 0.7124
Epoch 9/10 loss: 0.7111  accuracy: 0.7132  val_loss: 0.7223  val_accuracy: 0.6916
Epoch 10/10 loss: 0.6875  accuracy: 0.7243  val_loss: 0.6183  val_accuracy: 0.7850
loss: 0.6737  accuracy: 0.7623
sgd_backpropagation
Deadline: Mar 22, 23:59 3 points
In this exercise you will learn how to compute gradients using so-called automatic differentiation, which is implemented by an automated backpropagation algorithm in TensorFlow. You will then perform training by running a manually implemented minibatch stochastic gradient descent.
Starting with the sgd_backpropagation.py template, you should:
- implement a neural network with a single tanh hidden layer and a categorical output layer;
- compute the cross-entropy loss;
- use tf.GradientTape to automatically compute the gradient of the loss with respect to all variables;
- perform the SGD update.
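The core of one training step then looks roughly like this (a sketch under assumed names; variables and loss_fn stand for whatever the template provides):

```python
import tensorflow as tf

def sgd_step(variables, loss_fn, batch, learning_rate):
    with tf.GradientTape() as tape:
        loss = loss_fn(batch)                    # forward pass recorded on the tape
    gradients = tape.gradient(loss, variables)   # backward pass
    for variable, gradient in zip(variables, gradients):
        variable.assign_sub(learning_rate * gradient)  # the SGD update
```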
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 sgd_backpropagation.py --batch_size=64 --hidden_layer=20 --learning_rate=0.1
Dev accuracy after epoch 1 is 92.84
Dev accuracy after epoch 2 is 93.86
Dev accuracy after epoch 3 is 94.64
Dev accuracy after epoch 4 is 95.24
Dev accuracy after epoch 5 is 95.26
Test accuracy after epoch 5 is 94.60
python3 sgd_backpropagation.py --batch_size=100 --hidden_layer=32 --learning_rate=0.2
Dev accuracy after epoch 1 is 93.66
Dev accuracy after epoch 2 is 95.00
Dev accuracy after epoch 3 is 95.72
Dev accuracy after epoch 4 is 95.80
Dev accuracy after epoch 5 is 96.34
Test accuracy after epoch 5 is 95.31
sgd_manual
Deadline: Mar 22, 23:59 2 points
The goal in this exercise is to extend your solution to the sgd_backpropagation assignment by manually computing the gradient.
While in this assignment we compute the gradient manually, we will nearly always use automatic differentiation. Therefore, the assignment is more of a mathematical exercise than a real-world application. Furthermore, we will compute the derivatives together on the Mar 16 practicals.
Start with the sgd_manual.py template, which is based on the sgd_backpropagation.py one. Be aware that each of these templates generates a different output file.
In order to check that you do not use automatic differentiation, ReCodEx checks that you do not use tf.GradientTape in your solution.
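For reference, a hedged sketch of the manual gradient for a single tanh hidden layer with softmax output (shapes and names are my assumptions: inputs (batch, 784), one-hot targets (batch, 10)):

```python
import tensorflow as tf

def manual_gradients(inputs, targets, W1, b1, W2, b2):
    hidden = tf.math.tanh(inputs @ W1 + b1)
    probs = tf.nn.softmax(hidden @ W2 + b2)

    batch_size = tf.cast(tf.shape(inputs)[0], tf.float32)
    d_logits = (probs - targets) / batch_size    # gradient of NLL w.r.t. the logits
    d_W2 = tf.transpose(hidden) @ d_logits
    d_b2 = tf.reduce_sum(d_logits, axis=0)
    d_hidden = (d_logits @ tf.transpose(W2)) * (1 - hidden ** 2)  # tanh derivative
    d_W1 = tf.transpose(inputs) @ d_hidden
    d_b1 = tf.reduce_sum(d_hidden, axis=0)
    return d_W1, d_b1, d_W2, d_b2
```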
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 sgd_manual.py --batch_size=64 --hidden_layer=20 --learning_rate=0.1
Dev accuracy after epoch 1 is 92.84
Dev accuracy after epoch 2 is 93.86
Dev accuracy after epoch 3 is 94.64
Dev accuracy after epoch 4 is 95.24
Dev accuracy after epoch 5 is 95.26
Test accuracy after epoch 5 is 94.60
python3 sgd_manual.py --batch_size=100 --hidden_layer=32 --learning_rate=0.2
Dev accuracy after epoch 1 is 93.66
Dev accuracy after epoch 2 is 95.00
Dev accuracy after epoch 3 is 95.72
Dev accuracy after epoch 4 is 95.80
Dev accuracy after epoch 5 is 96.34
Test accuracy after epoch 5 is 95.31
mnist_training
Deadline: Mar 22, 23:59 3 points
This exercise should teach you how to use different optimizers, learning rates, and learning rate decays. Your goal is to modify the mnist_training.py template and implement the following:
- Use the specified optimizer (either SGD or Adam).
- Optionally use momentum for the SGD optimizer.
- Use the specified learning rate for the optimizer.
- Optionally use a given learning rate schedule. The schedule can be either exponential or polynomial (with degree 1, so inverse time decay). Additionally, the final learning rate is given and the decay should gradually decrease the learning rate to reach the final learning rate just after the training.
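One way to set up such a schedule (a sketch only, assuming decay_steps is the total number of training steps, so that the final rate is reached just after training):

```python
import tensorflow as tf

def make_learning_rate(decay, learning_rate, learning_rate_final, decay_steps):
    if decay == "exponential":
        return tf.keras.optimizers.schedules.ExponentialDecay(
            learning_rate, decay_steps,
            decay_rate=learning_rate_final / learning_rate)
    if decay == "polynomial":  # degree 1
        return tf.keras.optimizers.schedules.PolynomialDecay(
            learning_rate, decay_steps,
            end_learning_rate=learning_rate_final, power=1.0)
    return learning_rate       # no decay: a plain constant
```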
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 mnist_training.py --optimizer=SGD --learning_rate=0.01
Epoch 1/10 loss: 1.2077  accuracy: 0.6998  val_loss: 0.3662  val_accuracy: 0.9146
Epoch 2/10 loss: 0.4205  accuracy: 0.8871  val_loss: 0.2848  val_accuracy: 0.9258
Epoch 3/10 loss: 0.3458  accuracy: 0.9038  val_loss: 0.2496  val_accuracy: 0.9350
Epoch 4/10 loss: 0.3115  accuracy: 0.9139  val_loss: 0.2292  val_accuracy: 0.9390
Epoch 5/10 loss: 0.2862  accuracy: 0.9202  val_loss: 0.2131  val_accuracy: 0.9426
Epoch 6/10 loss: 0.2698  accuracy: 0.9231  val_loss: 0.2003  val_accuracy: 0.9464
Epoch 7/10 loss: 0.2489  accuracy: 0.9296  val_loss: 0.1881  val_accuracy: 0.9500
Epoch 8/10 loss: 0.2344  accuracy: 0.9331  val_loss: 0.1821  val_accuracy: 0.9522
Epoch 9/10 loss: 0.2203  accuracy: 0.9385  val_loss: 0.1715  val_accuracy: 0.9560
Epoch 10/10 loss: 0.2130  accuracy: 0.9397  val_loss: 0.1650  val_accuracy: 0.9572
loss: 0.1977  accuracy: 0.9442
python3 mnist_training.py --optimizer=SGD --learning_rate=0.01 --momentum=0.9
Epoch 1/10 loss: 0.5876  accuracy: 0.8309  val_loss: 0.1684  val_accuracy: 0.9560
Epoch 2/10 loss: 0.1929  accuracy: 0.9458  val_loss: 0.1274  val_accuracy: 0.9644
Epoch 3/10 loss: 0.1370  accuracy: 0.9617  val_loss: 0.1051  val_accuracy: 0.9706
Epoch 4/10 loss: 0.1073  accuracy: 0.9696  val_loss: 0.0922  val_accuracy: 0.9746
Epoch 5/10 loss: 0.0870  accuracy: 0.9754  val_loss: 0.0844  val_accuracy: 0.9782
Epoch 6/10 loss: 0.0740  accuracy: 0.9798  val_loss: 0.0790  val_accuracy: 0.9782
Epoch 7/10 loss: 0.0616  accuracy: 0.9827  val_loss: 0.0738  val_accuracy: 0.9820
Epoch 8/10 loss: 0.0546  accuracy: 0.9853  val_loss: 0.0749  val_accuracy: 0.9796
Epoch 9/10 loss: 0.0450  accuracy: 0.9878  val_loss: 0.0762  val_accuracy: 0.9798
Epoch 10/10 loss: 0.0438  accuracy: 0.9885  val_loss: 0.0703  val_accuracy: 0.9806
loss: 0.0675  accuracy: 0.9794
python3 mnist_training.py --optimizer=SGD --learning_rate=0.1
Epoch 1/10 loss: 0.5462  accuracy: 0.8503  val_loss: 0.1677  val_accuracy: 0.9572
Epoch 2/10 loss: 0.1909  accuracy: 0.9459  val_loss: 0.1267  val_accuracy: 0.9648
Epoch 3/10 loss: 0.1361  accuracy: 0.9615  val_loss: 0.0994  val_accuracy: 0.9724
Epoch 4/10 loss: 0.1057  accuracy: 0.9699  val_loss: 0.0890  val_accuracy: 0.9762
Epoch 5/10 loss: 0.0851  accuracy: 0.9762  val_loss: 0.0844  val_accuracy: 0.9784
Epoch 6/10 loss: 0.0730  accuracy: 0.9796  val_loss: 0.0800  val_accuracy: 0.9784
Epoch 7/10 loss: 0.0604  accuracy: 0.9833  val_loss: 0.0725  val_accuracy: 0.9814
Epoch 8/10 loss: 0.0536  accuracy: 0.9859  val_loss: 0.0726  val_accuracy: 0.9796
Epoch 9/10 loss: 0.0444  accuracy: 0.9886  val_loss: 0.0744  val_accuracy: 0.9802
Epoch 10/10 loss: 0.0430  accuracy: 0.9883  val_loss: 0.0665  val_accuracy: 0.9822
loss: 0.0658  accuracy: 0.9800
python3 mnist_training.py --optimizer=Adam --learning_rate=0.001
Epoch 1/10 loss: 0.4529  accuracy: 0.8712  val_loss: 0.1166  val_accuracy: 0.9686
Epoch 2/10 loss: 0.1205  accuracy: 0.9648  val_loss: 0.0921  val_accuracy: 0.9748
Epoch 3/10 loss: 0.0763  accuracy: 0.9775  val_loss: 0.0831  val_accuracy: 0.9774
Epoch 4/10 loss: 0.0540  accuracy: 0.9844  val_loss: 0.0758  val_accuracy: 0.9780
Epoch 5/10 loss: 0.0408  accuracy: 0.9879  val_loss: 0.0733  val_accuracy: 0.9808
Epoch 6/10 loss: 0.0298  accuracy: 0.9919  val_loss: 0.0833  val_accuracy: 0.9810
Epoch 7/10 loss: 0.0238  accuracy: 0.9936  val_loss: 0.0761  val_accuracy: 0.9814
Epoch 8/10 loss: 0.0169  accuracy: 0.9950  val_loss: 0.0760  val_accuracy: 0.9796
Epoch 9/10 loss: 0.0132  accuracy: 0.9966  val_loss: 0.0810  val_accuracy: 0.9814
Epoch 10/10 loss: 0.0116  accuracy: 0.9968  val_loss: 0.0913  val_accuracy: 0.9782
loss: 0.0812  accuracy: 0.9784
python3 mnist_training.py --optimizer=Adam --learning_rate=0.01
Epoch 1/10 loss: 0.3453  accuracy: 0.8944  val_loss: 0.1442  val_accuracy: 0.9586
Epoch 2/10 loss: 0.1415  accuracy: 0.9585  val_loss: 0.1317  val_accuracy: 0.9638
Epoch 3/10 loss: 0.1126  accuracy: 0.9685  val_loss: 0.1323  val_accuracy: 0.9646
Epoch 4/10 loss: 0.0977  accuracy: 0.9720  val_loss: 0.1397  val_accuracy: 0.9684
Epoch 5/10 loss: 0.0938  accuracy: 0.9744  val_loss: 0.1374  val_accuracy: 0.9708
Epoch 6/10 loss: 0.0864  accuracy: 0.9755  val_loss: 0.2143  val_accuracy: 0.9618
Epoch 7/10 loss: 0.0863  accuracy: 0.9773  val_loss: 0.1833  val_accuracy: 0.9696
Epoch 8/10 loss: 0.0741  accuracy: 0.9801  val_loss: 0.1747  val_accuracy: 0.9716
Epoch 9/10 loss: 0.0734  accuracy: 0.9815  val_loss: 0.2182  val_accuracy: 0.9668
Epoch 10/10 loss: 0.0715  accuracy: 0.9828  val_loss: 0.2157  val_accuracy: 0.9698
loss: 0.2383  accuracy: 0.9687
python3 mnist_training.py --optimizer=Adam --learning_rate=0.01 --decay=exponential --learning_rate_final=0.001
Epoch 1/10 loss: 0.3396  accuracy: 0.8952  val_loss: 0.1255  val_accuracy: 0.9652
Epoch 2/10 loss: 0.1132  accuracy: 0.9654  val_loss: 0.1273  val_accuracy: 0.9666
Epoch 3/10 loss: 0.0714  accuracy: 0.9776  val_loss: 0.0896  val_accuracy: 0.9768
Epoch 4/10 loss: 0.0467  accuracy: 0.9854  val_loss: 0.0970  val_accuracy: 0.9756
Epoch 5/10 loss: 0.0315  accuracy: 0.9896  val_loss: 0.1041  val_accuracy: 0.9788
Epoch 6/10 loss: 0.0193  accuracy: 0.9934  val_loss: 0.1029  val_accuracy: 0.9790
Epoch 7/10 loss: 0.0121  accuracy: 0.9961  val_loss: 0.0926  val_accuracy: 0.9802
Epoch 8/10 loss: 0.0061  accuracy: 0.9983  val_loss: 0.1044  val_accuracy: 0.9802
Epoch 9/10 loss: 0.0035  accuracy: 0.9992  val_loss: 0.0992  val_accuracy: 0.9806
Epoch 10/10 loss: 0.0029  accuracy: 0.9994  val_loss: 0.1052  val_accuracy: 0.9816
loss: 0.0880  accuracy: 0.9797
Final learning rate: 0.001
python3 mnist_training.py --optimizer=Adam --learning_rate=0.01 --decay=polynomial --learning_rate_final=0.0001
Epoch 1/10 loss: 0.3428  accuracy: 0.8944  val_loss: 0.1176  val_accuracy: 0.9634
Epoch 2/10 loss: 0.1229  accuracy: 0.9632  val_loss: 0.1303  val_accuracy: 0.9642
Epoch 3/10 loss: 0.0920  accuracy: 0.9728  val_loss: 0.1064  val_accuracy: 0.9724
Epoch 4/10 loss: 0.0702  accuracy: 0.9784  val_loss: 0.1086  val_accuracy: 0.9726
Epoch 5/10 loss: 0.0472  accuracy: 0.9856  val_loss: 0.1197  val_accuracy: 0.9738
Epoch 6/10 loss: 0.0328  accuracy: 0.9896  val_loss: 0.1195  val_accuracy: 0.9758
Epoch 7/10 loss: 0.0208  accuracy: 0.9929  val_loss: 0.1094  val_accuracy: 0.9776
Epoch 8/10 loss: 0.0112  accuracy: 0.9962  val_loss: 0.1135  val_accuracy: 0.9794
Epoch 9/10 loss: 0.0051  accuracy: 0.9986  val_loss: 0.1074  val_accuracy: 0.9800
Epoch 10/10 loss: 0.0027  accuracy: 0.9995  val_loss: 0.1088  val_accuracy: 0.9794
loss: 0.0899  accuracy: 0.9816
Final learning rate: 0.0001
gym_cartpole
Deadline: Mar 22, 23:59 3 points
Solve the CartPole-v1 environment from the OpenAI Gym, utilizing only the provided supervised training data. The data is available in the gym_cartpole_data.txt file, each line containing one observation (four space-separated floats) and a corresponding action (the last space-separated integer). Start with the gym_cartpole.py template.
The solution to this task should be a model which passes evaluation on random inputs. This evaluation can be performed by running gym_cartpole.py with the --evaluate argument (optionally rendering if the --render option is provided), or by directly calling the evaluate_model method. In order to pass, you must achieve an average reward of at least 475 on 100 episodes. Your model should have either one or two outputs (i.e., using either sigmoid or softmax as the output function).
When designing the model, you should consider that the size of the training data is very small and the data is quite noisy.
When submitting to ReCodEx, do not forget to also submit the trained model.
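A small regularized classifier is a reasonable starting point given the tiny, noisy dataset; the following is only a sketch (layer sizes and dropout rate are my guesses, not a recommended solution):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(4,)),  # 4 observation floats
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(2, activation="softmax"),  # or a single sigmoid output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```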
mnist_regularization
Deadline: Mar 29, 23:59 3 points
You will learn how to implement three regularization methods in this assignment. Start with the mnist_regularization.py template and implement the following:
- Allow using dropout with rate args.dropout. Add a dropout layer after the first Flatten and also after all Dense hidden layers (but not after the output layer).
- Allow using L2 regularization with weight args.l2. Use tf.keras.regularizers.L1L2 as a regularizer for all kernels (but not biases) of all Dense layers (including the last one).
- Allow using label smoothing with weight args.label_smoothing. Instead of SparseCategoricalCrossentropy, you will need to use CategoricalCrossentropy, which offers a label_smoothing argument.
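All three methods fit into a few lines of Keras; a hedged sketch (the hidden layer size is an assumption, not the template's value):

```python
import tensorflow as tf

def build_model(dropout, l2, label_smoothing):
    regularizer = tf.keras.regularizers.L1L2(l2=l2) if l2 else None
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(500, activation="relu", kernel_regularizer=regularizer),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(10, activation="softmax", kernel_regularizer=regularizer),
    ])
    # Label smoothing requires dense (one-hot) targets, hence CategoricalCrossentropy.
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.CategoricalCrossentropy(label_smoothing=label_smoothing),
        metrics=["accuracy"])
    return model
```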
In ReCodEx, there will be six tests (two for each regularization method) and you will get half a point for passing each one.
In addition to submitting the task in ReCodEx, also run the following variations and observe the results in TensorBoard (notably training, development and test set accuracy and loss):
- dropout rate 0, 0.3, 0.5, 0.6, 0.8;
- L2 regularization 0, 0.001, 0.0001, 0.00001;
- label smoothing 0, 0.1, 0.3, 0.5.
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 mnist_regularization.py --dropout=0.3
Epoch 5/30 loss: 0.2319  accuracy: 0.9309  val_loss: 0.1919  val_accuracy: 0.9420
Epoch 10/30 loss: 0.1207  accuracy: 0.9608  val_loss: 0.1507  val_accuracy: 0.9560
Epoch 15/30 loss: 0.0785  accuracy: 0.9758  val_loss: 0.1300  val_accuracy: 0.9606
Epoch 20/30 loss: 0.0595  accuracy: 0.9833  val_loss: 0.1292  val_accuracy: 0.9628
Epoch 25/30 loss: 0.0517  accuracy: 0.9816  val_loss: 0.1311  val_accuracy: 0.9618
Epoch 30/30 loss: 0.0315  accuracy: 0.9919  val_loss: 0.1413  val_accuracy: 0.9618
loss: 0.1630  accuracy: 0.9541
python3 mnist_regularization.py --dropout=0.5
Epoch 5/30 loss: 0.3931  accuracy: 0.8815  val_loss: 0.2147  val_accuracy: 0.9366
Epoch 10/30 loss: 0.2626  accuracy: 0.9232  val_loss: 0.1665  val_accuracy: 0.9528
Epoch 15/30 loss: 0.2229  accuracy: 0.9261  val_loss: 0.1427  val_accuracy: 0.9582
Epoch 20/30 loss: 0.1765  accuracy: 0.9473  val_loss: 0.1379  val_accuracy: 0.9596
Epoch 25/30 loss: 0.1653  accuracy: 0.9477  val_loss: 0.1272  val_accuracy: 0.9628
Epoch 30/30 loss: 0.1335  accuracy: 0.9596  val_loss: 0.1251  val_accuracy: 0.9638
loss: 0.1510  accuracy: 0.9521
python3 mnist_regularization.py --l2=0.001
Epoch 5/30 loss: 0.3280  accuracy: 0.9699  val_loss: 0.3755  val_accuracy: 0.9426
Epoch 10/30 loss: 0.2259  accuracy: 0.9867  val_loss: 0.3511  val_accuracy: 0.9408
Epoch 15/30 loss: 0.2089  accuracy: 0.9866  val_loss: 0.3109  val_accuracy: 0.9516
Epoch 20/30 loss: 0.1966  accuracy: 0.9911  val_loss: 0.2973  val_accuracy: 0.9532
Epoch 25/30 loss: 0.1928  accuracy: 0.9947  val_loss: 0.3079  val_accuracy: 0.9510
Epoch 30/30 loss: 0.1916  accuracy: 0.9918  val_loss: 0.3002  val_accuracy: 0.9522
loss: 0.3313  accuracy: 0.9394
python3 mnist_regularization.py --l2=0.0001
Epoch 5/30 loss: 0.1387  accuracy: 0.9793  val_loss: 0.2231  val_accuracy: 0.9452
Epoch 10/30 loss: 0.0686  accuracy: 0.9982  val_loss: 0.2132  val_accuracy: 0.9508
Epoch 15/30 loss: 0.0530  accuracy: 1.0000  val_loss: 0.1938  val_accuracy: 0.9564
Epoch 20/30 loss: 0.0446  accuracy: 1.0000  val_loss: 0.1954  val_accuracy: 0.9538
Epoch 25/30 loss: 0.0431  accuracy: 1.0000  val_loss: 0.1909  val_accuracy: 0.9572
Epoch 30/30 loss: 0.0439  accuracy: 1.0000  val_loss: 0.1914  val_accuracy: 0.9608
loss: 0.2141  accuracy: 0.9512
python3 mnist_regularization.py --label_smoothing=0.1
Epoch 5/30 loss: 0.6077  accuracy: 0.9865  val_loss: 0.6626  val_accuracy: 0.9610
Epoch 10/30 loss: 0.5422  accuracy: 0.9994  val_loss: 0.6414  val_accuracy: 0.9642
Epoch 15/30 loss: 0.5225  accuracy: 1.0000  val_loss: 0.6324  val_accuracy: 0.9654
Epoch 20/30 loss: 0.5145  accuracy: 1.0000  val_loss: 0.6289  val_accuracy: 0.9674
Epoch 25/30 loss: 0.5101  accuracy: 1.0000  val_loss: 0.6281  val_accuracy: 0.9678
Epoch 30/30 loss: 0.5081  accuracy: 1.0000  val_loss: 0.6271  val_accuracy: 0.9682
loss: 0.6449  accuracy: 0.9592
python3 mnist_regularization.py --label_smoothing=0.3
Epoch 5/30 loss: 1.2506  accuracy: 0.9884  val_loss: 1.2963  val_accuracy: 0.9630
Epoch 10/30 loss: 1.2070  accuracy: 0.9992  val_loss: 1.2799  val_accuracy: 0.9652
Epoch 15/30 loss: 1.1937  accuracy: 1.0000  val_loss: 1.2773  val_accuracy: 0.9638
Epoch 20/30 loss: 1.1875  accuracy: 1.0000  val_loss: 1.2748  val_accuracy: 0.9662
Epoch 25/30 loss: 1.1847  accuracy: 1.0000  val_loss: 1.2753  val_accuracy: 0.9676
Epoch 30/30 loss: 1.1834  accuracy: 1.0000  val_loss: 1.2760  val_accuracy: 0.9660
loss: 1.2875  accuracy: 0.9587
mnist_ensemble
Deadline: Mar 29, 23:59 2 points
Your goal in this assignment is to implement model ensembling.
The mnist_ensemble.py template trains args.models individual models, and your goal is to perform an ensemble of the first model, the first two models, the first three models, …, all models, and evaluate their accuracy on the development set.
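The ensembling itself is just averaging the predicted distributions; a minimal sketch (names are mine, not the template's):

```python
import numpy as np

def ensemble_accuracies(models, dev_images, dev_labels):
    predictions = np.stack([model.predict(dev_images) for model in models])
    for k in range(1, len(models) + 1):
        averaged = predictions[:k].mean(axis=0)   # average the softmax outputs
        accuracy = (averaged.argmax(axis=-1) == dev_labels).mean()
        print(f"Ensemble of first {k} models: {100 * accuracy:.2f}")
```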
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 mnist_ensemble.py --models=3
Model 1, individual accuracy 97.78, ensemble accuracy 97.78
Model 2, individual accuracy 97.76, ensemble accuracy 98.02
Model 3, individual accuracy 97.88, ensemble accuracy 98.06
python3 mnist_ensemble.py --models=5
Model 1, individual accuracy 97.78, ensemble accuracy 97.78
Model 2, individual accuracy 97.76, ensemble accuracy 98.02
Model 3, individual accuracy 97.88, ensemble accuracy 98.06
Model 4, individual accuracy 97.78, ensemble accuracy 98.10
Model 5, individual accuracy 97.78, ensemble accuracy 98.10
uppercase
Deadline: Mar 29, 23:59 4 points+5 bonus
This assignment introduces the first NLP task. Your goal is to implement a model which is given Czech lowercased text and tries to uppercase the appropriate letters. To load the dataset, use the uppercase_data.py module, which loads (and if required also downloads) the data. While the training and the development sets are in the correct case, the test set is lowercased.
This is an open-data task, where you submit only the uppercased test set together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py/ipynb file.
The task is also a competition. Everyone who submits a solution which achieves at least 98.5% accuracy will get 4 basic points; the 5 bonus points will be distributed depending on the relative ordering of your solutions. The accuracy is computed per character and can be evaluated by running uppercase_data.py with the --evaluate argument, or by using its evaluate_file method.
You may want to start with the uppercase.py template, which uses the uppercase_data.py module to load the data, generates an alphabet of a given size containing the most frequent characters, and generates a sliding window view on the data (see the toy illustration below). The template also comments on the possibilities of character representation.
Do not use RNNs, CNNs or Transformer in this task (if you have doubts, contact me).
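To illustrate the sliding-window view mentioned above (the template builds it for you; this toy example only shows the idea):

```python
import numpy as np

text_ids = np.array([7, 2, 9, 4, 1, 8, 3])   # a toy sequence of alphabet ids
windows = np.lib.stride_tricks.sliding_window_view(text_ids, 5)
# windows[i] is the 5-character context around character i + 2; the model
# predicts whether that center character should be uppercased.
print(windows)
```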
mnist_cnn
Deadline: Apr 05, 23:59 4 points
To pass this assignment, you will learn to construct basic convolutional neural network layers. Start with the mnist_cnn.py template and assume the requested architecture is described by the cnn argument, which contains comma-separated specifications of the following layers:
- C-filters-kernel_size-stride-padding: Add a convolutional layer with ReLU activation and the specified number of filters, kernel size, stride and padding. Example: C-10-3-1-same
- CB-filters-kernel_size-stride-padding: Same as C-filters-kernel_size-stride-padding, but use batch normalization. In detail, start with a convolutional layer without bias and activation, then add a batch normalization layer, and finally ReLU activation. Example: CB-10-3-1-same
- M-pool_size-stride: Add max pooling with the specified size and stride, using the default "valid" padding. Example: M-3-2
- R-[layers]: Add a residual connection. The layers contain a specification of at least one convolutional layer (but not a recursive residual connection R). The input to the R layer should be processed sequentially by layers, and the produced output (after the ReLU nonlinearity of the last layer) should be added to the input (of this R layer). Example: R-[C-16-3-1-same,C-16-3-1-same]
- F: Flatten inputs. Must appear exactly once in the architecture.
- H-hidden_layer_size: Add a dense layer with ReLU activation and the specified size. Example: H-100
- D-dropout_rate: Apply dropout with the given dropout rate. Example: D-0.5
An example architecture might be --cnn=CB-16-5-2-same,M-3-2,F,H-100,D-0.5.
You can assume the resulting network is valid; it is fine to crash if it is not.
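A hedged sketch of how a single specification could be turned into layers using the functional API (the residual handling is simplified; splitting the whole --cnn string on commas must additionally respect the brackets of R-[...]):

```python
import re
import tensorflow as tf

def apply_layer(spec, hidden):
    parts = spec.split("-")
    if parts[0] == "C":
        return tf.keras.layers.Conv2D(int(parts[1]), int(parts[2]), int(parts[3]),
                                      parts[4], activation="relu")(hidden)
    if parts[0] == "CB":  # convolution without bias, then batch norm, then ReLU
        hidden = tf.keras.layers.Conv2D(int(parts[1]), int(parts[2]), int(parts[3]),
                                        parts[4], use_bias=False)(hidden)
        hidden = tf.keras.layers.BatchNormalization()(hidden)
        return tf.keras.layers.ReLU()(hidden)
    if parts[0] == "M":
        return tf.keras.layers.MaxPool2D(int(parts[1]), int(parts[2]))(hidden)
    if parts[0] == "F":
        return tf.keras.layers.Flatten()(hidden)
    if parts[0] == "H":
        return tf.keras.layers.Dense(int(parts[1]), activation="relu")(hidden)
    if parts[0] == "D":
        return tf.keras.layers.Dropout(float(parts[1]))(hidden)

def apply_residual(spec, hidden):  # spec like "R-[C-16-3-1-same,C-16-3-1-same]"
    residual = hidden
    for inner in re.match(r"R-\[(.*)\]", spec).group(1).split(","):
        residual = apply_layer(inner, residual)
    return tf.keras.layers.Add()([hidden, residual])
```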
After a successful ReCodEx submission, you can try obtaining the best accuracy on MNIST and then advance to cifar_competition.
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 mnist_cnn.py --cnn=F,H-100
Epoch 1/5 loss: 0.5379  accuracy: 0.8500  val_loss: 0.1459  val_accuracy: 0.9612
Epoch 2/5 loss: 0.1563  accuracy: 0.9553  val_loss: 0.1128  val_accuracy: 0.9682
Epoch 3/5 loss: 0.1052  accuracy: 0.9697  val_loss: 0.0966  val_accuracy: 0.9714
Epoch 4/5 loss: 0.0792  accuracy: 0.9765  val_loss: 0.0864  val_accuracy: 0.9744
Epoch 5/5 loss: 0.0627  accuracy: 0.9814  val_loss: 0.0818  val_accuracy: 0.9768
loss: 0.0844  accuracy: 0.9757
python3 mnist_cnn.py --cnn=F,H-100,D-0.5
Epoch 1/5 loss: 0.7447  accuracy: 0.7719  val_loss: 0.1617  val_accuracy: 0.9596
Epoch 2/5 loss: 0.2781  accuracy: 0.9167  val_loss: 0.1266  val_accuracy: 0.9668
Epoch 3/5 loss: 0.2293  accuracy: 0.9321  val_loss: 0.1097  val_accuracy: 0.9696
Epoch 4/5 loss: 0.2003  accuracy: 0.9399  val_loss: 0.1035  val_accuracy: 0.9716
Epoch 5/5 loss: 0.1858  accuracy: 0.9444  val_loss: 0.1019  val_accuracy: 0.9728
loss: 0.1131  accuracy: 0.9676
python3 mnist_cnn.py --cnn=M-5-2,F,H-50
Epoch 1/5 loss: 1.0752  accuracy: 0.6618  val_loss: 0.3934  val_accuracy: 0.8818
Epoch 2/5 loss: 0.4421  accuracy: 0.8598  val_loss: 0.3241  val_accuracy: 0.9000
Epoch 3/5 loss: 0.3651  accuracy: 0.8849  val_loss: 0.2996  val_accuracy: 0.9078
Epoch 4/5 loss: 0.3271  accuracy: 0.8951  val_loss: 0.2712  val_accuracy: 0.9174
Epoch 5/5 loss: 0.3014  accuracy: 0.9049  val_loss: 0.2632  val_accuracy: 0.9182
loss: 0.2967  accuracy: 0.9067
python3 mnist_cnn.py --cnn=C-8-3-5-same,C-8-3-2-valid,F,H-50
Epoch 1/5 loss: 1.1907  accuracy: 0.6001  val_loss: 0.3445  val_accuracy: 0.9004
Epoch 2/5 loss: 0.4124  accuracy: 0.8730  val_loss: 0.2818  val_accuracy: 0.9158
Epoch 3/5 loss: 0.3335  accuracy: 0.8970  val_loss: 0.2523  val_accuracy: 0.9254
Epoch 4/5 loss: 0.3036  accuracy: 0.9043  val_loss: 0.2292  val_accuracy: 0.9316
Epoch 5/5 loss: 0.2802  accuracy: 0.9143  val_loss: 0.2186  val_accuracy: 0.9340
loss: 0.2520  accuracy: 0.9243
python3 mnist_cnn.py --cnn=CB-6-3-5-valid,F,H-32
Epoch 1/5 loss: 0.9799  accuracy: 0.6768  val_loss: 0.2519  val_accuracy: 0.9230
Epoch 2/5 loss: 0.3122  accuracy: 0.9045  val_loss: 0.2116  val_accuracy: 0.9338
Epoch 3/5 loss: 0.2493  accuracy: 0.9230  val_loss: 0.1792  val_accuracy: 0.9496
Epoch 4/5 loss: 0.2147  accuracy: 0.9322  val_loss: 0.1637  val_accuracy: 0.9528
Epoch 5/5 loss: 0.1873  accuracy: 0.9415  val_loss: 0.1544  val_accuracy: 0.9566
loss: 0.1857  accuracy: 0.9424
python3 mnist_cnn.py --cnn=CB-8-3-5-valid,R-[CB-8-3-1-same,CB-8-3-1-same],F,H-50
Epoch 1/5 loss: 0.7976  accuracy: 0.7449  val_loss: 0.1791  val_accuracy: 0.9458
Epoch 2/5 loss: 0.2052  accuracy: 0.9360  val_loss: 0.1531  val_accuracy: 0.9506
Epoch 3/5 loss: 0.1497  accuracy: 0.9524  val_loss: 0.1340  val_accuracy: 0.9600
Epoch 4/5 loss: 0.1261  accuracy: 0.9593  val_loss: 0.1226  val_accuracy: 0.9624
Epoch 5/5 loss: 0.1113  accuracy: 0.9642  val_loss: 0.1094  val_accuracy: 0.9684
loss: 0.1212  accuracy: 0.9609
image_augmentation
Deadline: Apr 05, 23:59 1 point
The template image_augmentation.py creates a simple convolutional network for classifying CIFAR-10. Your goal is to perform image data augmentation using ImageDataGenerator and to utilize the augmented data during training.
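The kind of augmentation meant here is, for example (parameter values are illustrative only):

```python
import tensorflow as tf

generator = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15,        # random rotations up to 15 degrees
    width_shift_range=0.1,    # random horizontal shifts
    height_shift_range=0.1,   # random vertical shifts
    horizontal_flip=True)     # random left-right flips

# The augmenting iterator is then passed to model.fit, roughly as:
# model.fit(generator.flow(train_images, train_labels, batch_size=50),
#           epochs=5, validation_data=(dev_images, dev_labels))
```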
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 image_augmentation.py --batch_size=50
Epoch 1/5 loss: 2.2698  accuracy: 0.1253  val_loss: 1.9850  val_accuracy: 0.2590
Epoch 2/5 loss: 2.0054  accuracy: 0.2387  val_loss: 1.7783  val_accuracy: 0.3250
Epoch 3/5 loss: 1.8557  accuracy: 0.3121  val_loss: 1.7411  val_accuracy: 0.3620
Epoch 4/5 loss: 1.7431  accuracy: 0.3565  val_loss: 1.6151  val_accuracy: 0.4160
Epoch 5/5 loss: 1.6636  accuracy: 0.3849  val_loss: 1.6074  val_accuracy: 0.4230
python3 image_augmentation.py --batch_size=100
Epoch 1/5 loss: 2.2671  accuracy: 0.1350  val_loss: 1.9996  val_accuracy: 0.2680
Epoch 2/5 loss: 1.9756  accuracy: 0.2813  val_loss: 1.7990  val_accuracy: 0.3400
Epoch 3/5 loss: 1.8361  accuracy: 0.3266  val_loss: 1.6944  val_accuracy: 0.3550
Epoch 4/5 loss: 1.7677  accuracy: 0.3546  val_loss: 1.6714  val_accuracy: 0.3850
Epoch 5/5 loss: 1.6904  accuracy: 0.3673  val_loss: 1.6651  val_accuracy: 0.3870
tf_dataset
Deadline: Apr 05, 23:59 2 points
In this assignment you will familiarize yourselves with tf.data
, which is
TensorFlow highlevel API for constructing input pipelines. If you want,
you can read an official TensorFlow tf.data guide
or reference API manual.
The goal of this assignment is to implement image augmentation preprocessing
similar to image_augmentation
, but with tf.data
. Start with the
tf_dataset.py
template and implement the input pipelines employing the tf.data.Dataset
.
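A minimal sketch of such a pipeline (assuming train is a tf.data.Dataset of (image, label) pairs with float32 CIFAR-10 images; the particular augmentations are illustrative):

```python
import tensorflow as tf

def augment(image, label):
    image = tf.image.random_flip_left_right(image)
    image = tf.image.resize_with_crop_or_pad(image, 40, 40)  # pad ...
    image = tf.image.random_crop(image, [32, 32, 3])         # ... and crop back
    return image, label

def make_pipeline(train, batch_size):
    return (train
            .shuffle(5000, seed=42)
            .map(augment, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(batch_size)
            .prefetch(tf.data.AUTOTUNE))
```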
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 tf_dataset.py --batch_size=50
Epoch 1/5 loss: 2.2395  accuracy: 0.1408  val_loss: 1.9160  val_accuracy: 0.3000
Epoch 2/5 loss: 1.9410  accuracy: 0.2794  val_loss: 1.7881  val_accuracy: 0.3430
Epoch 3/5 loss: 1.8415  accuracy: 0.3287  val_loss: 1.6749  val_accuracy: 0.3740
Epoch 4/5 loss: 1.7689  accuracy: 0.3480  val_loss: 1.6263  val_accuracy: 0.3780
Epoch 5/5 loss: 1.7185  accuracy: 0.3634  val_loss: 1.5976  val_accuracy: 0.4260
python3 tf_dataset.py --batch_size=100
Epoch 1/5 loss: 2.2697  accuracy: 0.1305  val_loss: 2.0089  val_accuracy: 0.2700
Epoch 2/5 loss: 2.0114  accuracy: 0.2545  val_loss: 1.8020  val_accuracy: 0.3410
Epoch 3/5 loss: 1.8473  accuracy: 0.3278  val_loss: 1.7071  val_accuracy: 0.3630
Epoch 4/5 loss: 1.7961  accuracy: 0.3472  val_loss: 1.6509  val_accuracy: 0.3840
Epoch 5/5 loss: 1.7164  accuracy: 0.3681  val_loss: 1.6429  val_accuracy: 0.3910
mnist_multiple
Deadline: Apr 05, 23:59 3 points
In this assignment you will implement a model with multiple inputs and outputs. Start with the mnist_multiple.py template and:
- The goal is to create a model which, given two input MNIST images, predicts whether the digit on the first one is larger than on the second one.
- The model has four outputs:
  - direct prediction whether the first digit is larger than the second one,
  - digit classification for the first image,
  - digit classification for the second image,
  - indirect prediction comparing the digits predicted by the above two outputs.
- You need to implement:
  - the model, using multiple inputs, outputs, losses and metrics;
  - construction of two-image dataset examples using regular MNIST data via the tf.data API.
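A hedged sketch of the wiring with the functional API (the shared sub-network, sizes and output order are my assumptions; the indirect prediction is computed from the two digit classifications and gets no loss):

```python
import tensorflow as tf

images = [tf.keras.layers.Input(shape=(28, 28, 1)) for _ in range(2)]
subnetwork = tf.keras.Sequential([           # shared by both input images
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(200, activation="relu"),
])
features = [subnetwork(image) for image in images]

digit_head = tf.keras.layers.Dense(10, activation="softmax")
digits = [digit_head(feature) for feature in features]
direct = tf.keras.layers.Dense(1, activation="sigmoid")(
    tf.keras.layers.concatenate(features))
indirect = tf.cast(tf.argmax(digits[0], axis=-1) > tf.argmax(digits[1], axis=-1),
                   tf.float32)

model = tf.keras.Model(inputs=images, outputs=[direct, digits[0], digits[1], indirect])
model.compile(optimizer="adam",
              loss=[tf.keras.losses.BinaryCrossentropy(),
                    tf.keras.losses.SparseCategoricalCrossentropy(),
                    tf.keras.losses.SparseCategoricalCrossentropy(),
                    None])                   # no loss for the indirect prediction
```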
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 mnist_multiple.py --batch_size=50
Epoch 1/5 loss: 1.6499  digit_1_loss: 0.6142  digit_2_loss: 0.6227  direct_prediction_loss: 0.4130  direct_prediction_accuracy: 0.7896  indirect_prediction_accuracy: 0.8972  val_loss: 0.3579  val_digit_1_loss: 0.1265  val_digit_2_loss: 0.0724  val_direct_prediction_loss: 0.1590  val_direct_prediction_accuracy: 0.9428  val_indirect_prediction_accuracy: 0.9800
Epoch 2/5 loss: 0.3472  digit_1_loss: 0.0965  digit_2_loss: 0.0988  direct_prediction_loss: 0.1519  direct_prediction_accuracy: 0.9452  indirect_prediction_accuracy: 0.9788  val_loss: 0.2222  val_digit_1_loss: 0.0859  val_digit_2_loss: 0.0555  val_direct_prediction_loss: 0.0808  val_direct_prediction_accuracy: 0.9724  val_indirect_prediction_accuracy: 0.9872
Epoch 3/5 loss: 0.2184  digit_1_loss: 0.0597  digit_2_loss: 0.0624  direct_prediction_loss: 0.0964  direct_prediction_accuracy: 0.9643  indirect_prediction_accuracy: 0.9868  val_loss: 0.1976  val_digit_1_loss: 0.0776  val_digit_2_loss: 0.0610  val_direct_prediction_loss: 0.0590  val_direct_prediction_accuracy: 0.9824  val_indirect_prediction_accuracy: 0.9856
Epoch 4/5 loss: 0.1540  digit_1_loss: 0.0428  digit_2_loss: 0.0454  direct_prediction_loss: 0.0659  direct_prediction_accuracy: 0.9781  indirect_prediction_accuracy: 0.9889  val_loss: 0.1753  val_digit_1_loss: 0.0640  val_digit_2_loss: 0.0523  val_direct_prediction_loss: 0.0590  val_direct_prediction_accuracy: 0.9776  val_indirect_prediction_accuracy: 0.9876
Epoch 5/5 loss: 0.1253  digit_1_loss: 0.0333  digit_2_loss: 0.0337  direct_prediction_loss: 0.0583  direct_prediction_accuracy: 0.9806  indirect_prediction_accuracy: 0.9914  val_loss: 0.1596  val_digit_1_loss: 0.0648  val_digit_2_loss: 0.0525  val_direct_prediction_loss: 0.0423  val_direct_prediction_accuracy: 0.9880  val_indirect_prediction_accuracy: 0.9908
loss: 0.1471  digit_1_loss: 0.0429  digit_2_loss: 0.0484  direct_prediction_loss: 0.0558  direct_prediction_accuracy: 0.9822  indirect_prediction_accuracy: 0.9900
python3 mnist_multiple.py --batch_size=100
Epoch 1/5 loss: 2.1134  digit_1_loss: 0.8183  digit_2_loss: 0.8250  direct_prediction_loss: 0.4701  direct_prediction_accuracy: 0.7570  indirect_prediction_accuracy: 0.8735  val_loss: 0.4835  val_digit_1_loss: 0.1706  val_digit_2_loss: 0.0993  val_direct_prediction_loss: 0.2136  val_direct_prediction_accuracy: 0.9168  val_indirect_prediction_accuracy: 0.9700
Epoch 2/5 loss: 0.4881  digit_1_loss: 0.1379  digit_2_loss: 0.1396  direct_prediction_loss: 0.2107  direct_prediction_accuracy: 0.9159  indirect_prediction_accuracy: 0.9706  val_loss: 0.3022  val_digit_1_loss: 0.1047  val_digit_2_loss: 0.0659  val_direct_prediction_loss: 0.1316  val_direct_prediction_accuracy: 0.9500  val_indirect_prediction_accuracy: 0.9832
Epoch 3/5 loss: 0.2938  digit_1_loss: 0.0795  digit_2_loss: 0.0825  direct_prediction_loss: 0.1317  direct_prediction_accuracy: 0.9493  indirect_prediction_accuracy: 0.9825  val_loss: 0.2150  val_digit_1_loss: 0.0782  val_digit_2_loss: 0.0586  val_direct_prediction_loss: 0.0782  val_direct_prediction_accuracy: 0.9688  val_indirect_prediction_accuracy: 0.9888
Epoch 4/5 loss: 0.2026  digit_1_loss: 0.0547  digit_2_loss: 0.0607  direct_prediction_loss: 0.0872  direct_prediction_accuracy: 0.9693  indirect_prediction_accuracy: 0.9881  val_loss: 0.1970  val_digit_1_loss: 0.0750  val_digit_2_loss: 0.0543  val_direct_prediction_loss: 0.0676  val_direct_prediction_accuracy: 0.9748  val_indirect_prediction_accuracy: 0.9868
Epoch 5/5 loss: 0.1618  digit_1_loss: 0.0437  digit_2_loss: 0.0470  direct_prediction_loss: 0.0711  direct_prediction_accuracy: 0.9753  indirect_prediction_accuracy: 0.9893  val_loss: 0.1735  val_digit_1_loss: 0.0667  val_digit_2_loss: 0.0507  val_direct_prediction_loss: 0.0562  val_direct_prediction_accuracy: 0.9816  val_indirect_prediction_accuracy: 0.9896
loss: 0.1658  digit_1_loss: 0.0469  digit_2_loss: 0.0506  direct_prediction_loss: 0.0683  direct_prediction_accuracy: 0.9768  indirect_prediction_accuracy: 0.9884
cifar_competition
Deadline: Apr 05, 23:59 5 points+5 bonus
The goal of this assignment is to devise the best possible model for CIFAR-10. You can load the data using the cifar10.py module. Note that the test set is different from that of the official CIFAR-10.
The task is a competition. Everyone who submits a solution which achieves at least 60% test set accuracy will get 5 points; the rest 5 points will be distributed depending on relative ordering of your solutions. Note that my solutions usually need to achieve at least ~73% on the development set to score 60% on the test set.
You may want to start with the cifar_competition.py template which generates the test set annotation in the required format.
cnn_manual
Deadline: Apr 12, 23:59 3 points
To pass this assignment, you need to manually implement the forward and backward pass through a 2D convolutional layer. Start with the cnn_manual.py template, which constructs a series of 2D convolutional layers with ReLU activation and valid padding, specified in the args.cnn option. The args.cnn option contains comma-separated layer specifications in the format filters-kernel_size-stride. Of course, you cannot use any TensorFlow convolutional operation (instead, implement the forward and backward pass using matrix multiplication and other operations), nor the GradientTape for gradient computation.
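For illustration, one such convolution can be implemented purely with matrix multiplication roughly as follows (a NumPy sketch with illustrative names; the template itself works with TensorFlow tensors):

import numpy as np

def conv2d_forward(x, w, stride):
    # x: [batch, H, W, C_in], w: [K, K, C_in, C_out], valid padding
    K, _, C_in, C_out = w.shape
    B, H, W_, _ = x.shape
    out_h, out_w = (H - K) // stride + 1, (W_ - K) // stride + 1
    patches = np.empty((B, out_h, out_w, K * K * C_in))
    for i in range(out_h):
        for j in range(out_w):
            patches[:, i, j] = x[:, i*stride:i*stride+K, j*stride:j*stride+K].reshape(B, -1)
    return patches @ w.reshape(-1, C_out), patches  # the convolution as a matmul

def conv2d_backward(g, patches, w, x_shape, stride):
    # g: gradient w.r.t. the output, shape [batch, out_h, out_w, C_out]
    K, _, C_in, C_out = w.shape
    grad_w = (patches.reshape(-1, K * K * C_in).T @ g.reshape(-1, C_out)).reshape(w.shape)
    grad_patches = g @ w.reshape(-1, C_out).T
    grad_x = np.zeros(x_shape)
    for i in range(g.shape[1]):
        for j in range(g.shape[2]):
            grad_x[:, i*stride:i*stride+K, j*stride:j*stride+K] += \
                grad_patches[:, i, j].reshape(-1, K, K, C_in)
    return grad_x, grad_w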
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 cnn_manual.py --cnn=5-1-1
Dev accuracy after epoch 1 is 91.42
Dev accuracy after epoch 2 is 92.44
Dev accuracy after epoch 3 is 91.82
Dev accuracy after epoch 4 is 92.62
Dev accuracy after epoch 5 is 92.32
Test accuracy after epoch 5 is 90.73
python3 cnn_manual.py --cnn=5-3-1
Dev accuracy after epoch 1 is 95.62
Dev accuracy after epoch 2 is 96.06
Dev accuracy after epoch 3 is 96.22
Dev accuracy after epoch 4 is 96.46
Dev accuracy after epoch 5 is 96.12
Test accuracy after epoch 5 is 95.73
python3 cnn_manual.py --cnn=5-3-2
Dev accuracy after epoch 1 is 93.14
Dev accuracy after epoch 2 is 94.90
Dev accuracy after epoch 3 is 95.26
Dev accuracy after epoch 4 is 95.42
Dev accuracy after epoch 5 is 95.34
Test accuracy after epoch 5 is 95.01
python3 cnn_manual.py --cnn=5-3-2,10-3-2
Dev accuracy after epoch 1 is 95.00
Dev accuracy after epoch 2 is 96.40
Dev accuracy after epoch 3 is 96.42
Dev accuracy after epoch 4 is 96.84
Dev accuracy after epoch 5 is 97.16
Test accuracy after epoch 5 is 96.44
cags_classification
Deadline: Apr 12, 23:59 5 points+5 bonus
The goal of this assignment is to use a pretrained EfficientNet-B0 model to achieve the best accuracy in CAGS classification.
The CAGS dataset consists of images of cats and dogs of size $224×224$, each classified into one of 34 breeds and each containing a mask indicating the presence of the animal. To load the dataset, use the cags_dataset.py module. The dataset is stored in a TFRecord file and each element is encoded as a tf.train.Example, which is decoded using the CAGS.parse method.
To load the EfficientNet-B0, use the provided efficient_net.py module. Its method pretrained_efficientnet_b0(include_top, dynamic_input_shape=False):
- downloads the pretrained weights if they are not found;
- returns a tf.keras.Model processing images of shape $(224, 224, 3)$ with float values in range $[0, 1]$ and producing a list of results:
  - the first value is the final network output:
    - if include_top == True, the network will include the final classification layer and produce a distribution over 1000 classes (whose names are in imagenet_classes.py);
    - if include_top == False, the network will return image features (the result of the last global average pooling);
  - the rest of the outputs are the intermediate results of the network just before a convolution with $\textit{stride} > 1$ is performed (denoted $C_5, C_4, C_3, C_2, C_1$ in the Object Detection lecture).
An example performing classification of given images is available in image_classification.py.
A note on fine-tuning: each tf.keras.layers.Layer has a mutable trainable property indicating whether its variables should be updated – however, after changing it, you need to call .compile again (or otherwise make sure the list of trainable variables for the optimizer is updated). Furthermore, the training argument passed to the layer invocation decides whether the layer is executed in training regime (neurons get dropped in dropout, batch normalization computes estimates on the batch) or in inference regime. There is one exception though – if trainable == False on a batch normalization layer, it runs in the inference regime even when training == True.
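As an illustration, a minimal fine-tuning sketch might look as follows, assuming efficientnet is the backbone returned by pretrained_efficientnet_b0(include_top=False) and 34 is the number of CAGS classes:

import tensorflow as tf

inputs = tf.keras.layers.Input([224, 224, 3])
features = efficientnet(inputs)[0]           # the pooled image features (first output)
outputs = tf.keras.layers.Dense(34, activation=tf.nn.softmax)(features)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

efficientnet.trainable = False               # train only the new head first
model.compile(optimizer=tf.optimizers.Adam(),
              loss=tf.losses.SparseCategoricalCrossentropy(),
              metrics=[tf.metrics.SparseCategoricalAccuracy("accuracy")])
# ...after a few epochs, unfreeze the backbone and recompile with a lower learning rate:
efficientnet.trainable = True
model.compile(optimizer=tf.optimizers.Adam(1e-5),
              loss=tf.losses.SparseCategoricalCrossentropy(),
              metrics=[tf.metrics.SparseCategoricalAccuracy("accuracy")])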
The task is a competition. Everyone who submits a solution which achieves at least 90% test set accuracy will get 5 points; the rest 5 points will be distributed depending on relative ordering of your solutions.
You may want to start with the cags_classification.py template which generates the test set annotation in the required format.
mnist_web
You can try a JavaScript-based demo of MNIST classification. This demo uses a neural network trained in TensorFlow using the mnist_web.py module, whose output was converted for TensorFlow.js with the tensorflowjs_converter --input_format=keras command and is then utilized by mnist_web.html.
cags_segmentation
Deadline: Apr 19, 23:59 5 points+5 bonus
The goal of this assignment is to use a pretrained EfficientNet-B0 model to achieve the best image segmentation IoU score on the CAGS dataset. The dataset and the EfficientNet-B0 are described in the cags_classification assignment.
A mask is evaluated using the intersection over union (IoU) metric, which is the intersection of the gold and predicted masks divided by their union, and the whole test set score is the average of its masks' IoU. A TensorFlow compatible metric is implemented by the class MaskIoUMetric of the cags_dataset.py module, which can also evaluate your predictions (either by running with --task=segmentation --evaluate=path arguments, or using its evaluate_segmentation_file method).
The task is a competition. Everyone who submits a solution which achieves at least 87% test set IoU gets 5 points; the rest 5 points will be distributed depending on relative ordering of your solutions.
You may want to start with the cags_segmentation.py template, which generates the test set annotation in the required format – each mask should be encoded on a single line as a space-separated sequence of integers indicating the length of alternating runs of zeros and ones.
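For illustration, a small helper producing this run-length encoding for a binary mask might look as follows (a sketch only, assuming the first run counts zeros; the template already generates annotations in this format):

import numpy as np

def encode_mask(mask):
    # Flatten the binary mask and emit lengths of alternating 0/1 runs,
    # starting with zeros (a zero-length first run if the mask starts with 1).
    flat = np.asarray(mask, np.uint8).ravel()
    runs, value, length = [], 0, 0
    for pixel in flat:
        if pixel == value:
            length += 1
        else:
            runs.append(length)
            value, length = pixel, 1
    runs.append(length)
    return " ".join(map(str, runs))

print(encode_mask([0, 0, 1, 1, 1, 0]))  # prints "2 3 1"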
3d_recognition
Deadline: Apr 19, 23:59 5 points+5 bonus
Your goal in this assignment is to perform 3D object recognition. The input is a voxelized representation of an object, stored as a 3D grid of either empty or occupied voxels, and your goal is to classify the object into one of 10 classes. The data is available in two resolutions, either as 20×20×20 or 32×32×32 data. To load the dataset, use the modelnet.py module.
The official dataset offers only train and test sets, with the test set having a different distribution of labels. Our dataset also contains a development set, which has nearly the same label distribution as the test set.
The task is a competition. Everyone who submits a solution which achieves at least 87% test set accuracy gets 5 points; the rest 5 points will be distributed depending on relative ordering of your solutions.
You can start with the 3d_recognition.py template, which among others generates test set annotations in the required format.
bboxes_utils
Deadline: Apr 26, 23:59 2 points
This is a preparatory assignment for svhn_competition. The goal is to implement several bounding box manipulation routines in the bboxes_utils.py module. Notably, you need to implement the following methods:
- bboxes_to_fast_rcnn: convert given bounding boxes to a Fast R-CNN-like representation relative to the given anchors;
- bboxes_from_fast_rcnn: convert Fast R-CNN-like representations relative to given anchors back to bounding boxes;
- bboxes_training: given a list of anchors and gold objects, assign gold objects to anchors and generate suitable training data (the exact algorithm is described in the template).
The bboxes_utils.py module contains simple unit tests, which are evaluated when executing the module and which you can use to check the validity of your implementation.
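For reference, the usual Fast R-CNN parametrization relative to an anchor looks roughly as follows, shown here as a sketch for a single box in [top, left, bottom, right] format (the exact convention required is specified in the template):

import numpy as np

def to_fast_rcnn(anchor, bbox):
    # Express the box center offsets relative to the anchor size, and the
    # box size as a log-ratio to the anchor size.
    ay, ax = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    ah, aw = anchor[2] - anchor[0], anchor[3] - anchor[1]
    by, bx = (bbox[0] + bbox[2]) / 2, (bbox[1] + bbox[3]) / 2
    bh, bw = bbox[2] - bbox[0], bbox[3] - bbox[1]
    return np.array([(by - ay) / ah, (bx - ax) / aw,
                     np.log(bh / ah), np.log(bw / aw)])

def from_fast_rcnn(anchor, rcnn):
    # The exact inverse of to_fast_rcnn.
    ay, ax = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    ah, aw = anchor[2] - anchor[0], anchor[3] - anchor[1]
    by, bx = rcnn[0] * ah + ay, rcnn[1] * aw + ax
    bh, bw = np.exp(rcnn[2]) * ah, np.exp(rcnn[3]) * aw
    return np.array([by - bh / 2, bx - bw / 2, by + bh / 2, bx + bw / 2])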
When submitting to ReCodEx, the method main is executed, returning the implemented bboxes_to_fast_rcnn, bboxes_from_fast_rcnn and bboxes_training methods. These methods are then executed and compared to the reference implementation.
svhn_competition
Deadline: Apr 26, 23:59; non-competition part extended to May 03 5 points+5 bonus
The goal of this assignment is to implement a system performing object recognition, optionally utilizing a pretrained EfficientNet-B0 backbone.
The Street View House Numbers (SVHN) dataset annotates for every photo all digits appearing on it, including their bounding boxes. The dataset can be loaded using the svhn_dataset.py module. Similarly to the CAGS dataset, it is stored in a TFRecord file with tf.train.Example elements. Every element is a dictionary with the following keys:
- "image": a square 3-channel image,
- "classes": a 1D tensor with all digit labels appearing in the image,
- "bboxes": a [num_digits, 4] 2D tensor with bounding boxes of every digit in the image.
Given that the dataset elements are each of possibly different size and you want to preprocess them using bboxes_training, it might be more comfortable to convert the dataset to NumPy. Alternatively, you can implement bboxes_training using TensorFlow operations, or call the NumPy implementation of bboxes_training directly in tf.data.Dataset.map by using tf.numpy_function, see FAQ.
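A sketch of the tf.numpy_function approach might look as follows, with anchors, the bboxes_training argument order and the element structure being illustrative assumptions:

import tensorflow as tf

def prepare_example(example):
    # Call the NumPy implementation of bboxes_training inside the tf.data pipeline.
    anchor_classes, anchor_bboxes = tf.numpy_function(
        bboxes_training,  # the NumPy implementation from bboxes_utils
        [anchors, example["classes"], example["bboxes"], 0.5],
        (tf.int32, tf.float32))
    # Shapes are lost by tf.numpy_function, so set them back explicitly.
    anchor_classes.set_shape([len(anchors)])
    anchor_bboxes.set_shape([len(anchors), 4])
    return example["image"], (anchor_classes, anchor_bboxes)

train = dataset.map(prepare_example)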
Similarly to the cags_classification assignment, you can load the EfficientNet-B0 using the provided efficient_net.py module. Note that the dynamic_input_shape=True argument creates a model capable of processing an input image of any size.
Each test set image annotation consists of a sequence of space-separated five-tuples label top left bottom right, and the annotation is considered correct if exactly the gold digits are predicted, each with IoU at least 0.5. The whole test set score is then the prediction accuracy of individual images.
You can again evaluate your predictions using the svhn_dataset.py module, either by running with --evaluate=path arguments, or using its evaluate_file method.
The task is a competition. Everyone who submits a solution which achieves at least 20% test set IoU gets 5 points; the rest 5 points will be distributed depending on relative ordering of your solutions. Note that I usually need at least 35% development set accuracy to achieve the required test set performance.
You should start with the svhn_competition.py template, which generates the test set annotation in the required format.
A baseline solution can use a RetinaNet-like single-stage detector, using only a single level of convolutional features (no FPN) with single-scale and single-aspect anchors. Focal loss is available as tfa.losses.SigmoidFocalCrossEntropy (using the reduction=tf.losses.Reduction.SUM_OVER_BATCH_SIZE option is a good idea) and non-maximum suppression as tf.image.non_max_suppression or tf.image.combined_non_max_suppression.
sequence_classification
Deadline: May 03, 23:59 3 points
The goal of this assignment is to introduce recurrent neural networks. Considering a recurrent neural network, the assignment shows convergence speed and illustrates the exploding gradient issue. The network should process sequences of 50 small integers and compute parity for each prefix of the sequence. The inputs are either 0/1 values, or vectors with a one-hot representation of a small integer.
Your goal is to modify the sequence_classification.py template and implement the following:
- Use the specified RNN type (SimpleRNN, GRU, and LSTM) and dimensionality.
- Process the sequence using the required RNN.
- Use an additional hidden layer on the RNN outputs if requested.
- Implement gradient clipping if requested.
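For illustration, the parity targets can be computed from the generated sequences by a cumulative sum (a sketch for the simplest --sequence_dim=1 case; the template generates its own dataset):

import numpy as np

sequences = np.random.randint(0, 2, size=[1000, 50])  # 0/1 inputs of length 50
labels = np.cumsum(sequences, axis=1) % 2             # parity of every prefix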
In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard. Concentrate on the way the RNNs converge, their convergence speed, exploding gradient issues, and how gradient clipping helps:
- --rnn_cell=SimpleRNN --sequence_dim=1, --rnn_cell=GRU --sequence_dim=1, --rnn_cell=LSTM --sequence_dim=1
- the same as above, but with --sequence_dim=2
- the same as above, but with --sequence_dim=10
- --rnn_cell=LSTM --hidden_layer=70 --rnn_cell_dim=30 --sequence_dim=30, and the same with --clip_gradient=1
- the same as above, but with --rnn_cell=SimpleRNN
- the same as above, but with --rnn_cell=GRU --hidden_layer=90
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 sequence_classification.py --rnn_cell SimpleRNN --epochs=5
Epoch 1/5 loss: 0.7008  accuracy: 0.5037  val_loss: 0.6926  val_accuracy: 0.5176
Epoch 2/5 loss: 0.6924  accuracy: 0.5165  val_loss: 0.6921  val_accuracy: 0.5217
Epoch 3/5 loss: 0.6920  accuracy: 0.5166  val_loss: 0.6913  val_accuracy: 0.5114
Epoch 4/5 loss: 0.6908  accuracy: 0.5193  val_loss: 0.6881  val_accuracy: 0.5157
Epoch 5/5 loss: 0.6863  accuracy: 0.5217  val_loss: 0.6793  val_accuracy: 0.5231
python3 sequence_classification.py --rnn_cell GRU --epochs=5
Epoch 1/5 loss: 0.6930  accuracy: 0.5109  val_loss: 0.6917  val_accuracy: 0.5157
Epoch 2/5 loss: 0.6905  accuracy: 0.5170  val_loss: 0.6823  val_accuracy: 0.5143
Epoch 3/5 loss: 0.6342  accuracy: 0.5925  val_loss: 0.2222  val_accuracy: 0.9695
Epoch 4/5 loss: 0.1759  accuracy: 0.9760  val_loss: 0.0930  val_accuracy: 0.9882
Epoch 5/5 loss: 0.0754  accuracy: 0.9938  val_loss: 0.0381  val_accuracy: 0.9986
python3 sequence_classification.py --rnn_cell LSTM --epochs=5
Epoch 1/5 loss: 0.6931  accuracy: 0.5131  val_loss: 0.6927  val_accuracy: 0.5153
Epoch 2/5 loss: 0.6924  accuracy: 0.5158  val_loss: 0.6902  val_accuracy: 0.5156
Epoch 3/5 loss: 0.6874  accuracy: 0.5174  val_loss: 0.6748  val_accuracy: 0.5285
Epoch 4/5 loss: 0.5799  accuracy: 0.6247  val_loss: 0.0695  val_accuracy: 1.0000
Epoch 5/5 loss: 0.0482  accuracy: 1.0000  val_loss: 0.0183  val_accuracy: 1.0000
python3 sequence_classification.py --rnn_cell LSTM --epochs=5 --hidden_layer=50
Epoch 1/5 loss: 0.6884  accuracy: 0.5129  val_loss: 0.6614  val_accuracy: 0.5309
Epoch 2/5 loss: 0.6544  accuracy: 0.5362  val_loss: 0.6378  val_accuracy: 0.5301
Epoch 3/5 loss: 0.6319  accuracy: 0.5482  val_loss: 0.5836  val_accuracy: 0.6181
Epoch 4/5 loss: 0.2933  accuracy: 0.8366  val_loss: 0.0030  val_accuracy: 0.9998
Epoch 5/5 loss: 0.0023  accuracy: 0.9999  val_loss: 0.0010  val_accuracy: 0.9999
python3 sequence_classification.py --rnn_cell LSTM --epochs=5 --hidden_layer=50 --clip_gradient=0.1
Epoch 1/5 loss: 0.6884  accuracy: 0.5130  val_loss: 0.6615  val_accuracy: 0.5302
Epoch 2/5 loss: 0.6544  accuracy: 0.5364  val_loss: 0.6373  val_accuracy: 0.5293
Epoch 3/5 loss: 0.6304  accuracy: 0.5517  val_loss: 0.5875  val_accuracy: 0.6107
Epoch 4/5 loss: 0.3835  accuracy: 0.7753  val_loss: 6.5897e-04  val_accuracy: 1.0000
Epoch 5/5 loss: 0.0011  accuracy: 0.9999  val_loss: 1.6853e-04  val_accuracy: 1.0000
tagger_we
Deadline: May 03, 23:59 3 points
In this assignment you will create a simple part-of-speech tagger. For training and evaluation, we will use a Czech dataset containing tokenized sentences, each word annotated by a gold lemma and part-of-speech tag. The morpho_dataset.py module (down)loads the dataset and provides mappings between strings and integers.
Your goal is to modify the tagger_we.py template and implement the following:
- Use the specified RNN cell type (GRU and LSTM) and dimensionality.
- Create word embeddings for the training vocabulary.
- Process the sentences using a bidirectional RNN.
- Predict part-of-speech tags. Note that you need to properly handle sentences of different lengths in one batch using tf.RaggedTensors (see the sketch below).
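A minimal sketch of such a model, assuming num_words and num_tags are the vocabulary sizes provided by morpho_dataset.py:

import tensorflow as tf

# Ragged input: each sentence is a different-length sequence of word indices.
words = tf.keras.layers.Input(shape=[None], dtype=tf.int32, ragged=True)
embedded = tf.keras.layers.Embedding(num_words, 64)(words)
hidden = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(16, return_sequences=True), merge_mode="sum")(embedded)
predictions = tf.keras.layers.Dense(num_tags, activation=tf.nn.softmax)(hidden)
model = tf.keras.Model(inputs=words, outputs=predictions)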
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 tagger_we.py --max_sentences=5000 --rnn_cell=LSTM --rnn_cell_dim=16
Epoch 1/5 loss: 1.9780  accuracy: 0.4436  val_loss: 0.5346  val_accuracy: 0.8354
Epoch 2/5 loss: 0.2443  accuracy: 0.9513  val_loss: 0.3686  val_accuracy: 0.8563
Epoch 3/5 loss: 0.0557  accuracy: 0.9893  val_loss: 0.3289  val_accuracy: 0.8735
Epoch 4/5 loss: 0.0333  accuracy: 0.9916  val_loss: 0.3430  val_accuracy: 0.8671
Epoch 5/5 loss: 0.0258  accuracy: 0.9936  val_loss: 0.3343  val_accuracy: 0.8736
loss: 0.3486  accuracy: 0.8737
python3 tagger_we.py --max_sentences=5000 --rnn_cell=GRU --rnn_cell_dim=16
Epoch 1/5 loss: 1.6714  accuracy: 0.5524  val_loss: 0.3901  val_accuracy: 0.8744
Epoch 2/5 loss: 0.1312  accuracy: 0.9722  val_loss: 0.3210  val_accuracy: 0.8710
Epoch 3/5 loss: 0.0385  accuracy: 0.9898  val_loss: 0.3104  val_accuracy: 0.8817
Epoch 4/5 loss: 0.0261  accuracy: 0.9920  val_loss: 0.3056  val_accuracy: 0.8886
Epoch 5/5 loss: 0.0210  accuracy: 0.9933  val_loss: 0.3052  val_accuracy: 0.8925
loss: 0.3525  accuracy: 0.8788
tagger_cle
Deadline: May 03, 23:59 3 points
This assignment is a continuation of tagger_we. Using the tagger_cle.py template, implement character-level word embedding computation using a bidirectional character-level GRU. Once submitted to ReCodEx, you should experiment with the effect of CLEs compared to a plain tagger_we, and with the influence of their dimensionality. Note that tagger_cle has by default smaller word embeddings so that the size of the word representation (64 + 32 + 32) is the same as in the tagger_we assignment.
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 tagger_cle.py --max_sentences=5000 --rnn_cell=LSTM --rnn_cell_dim=16 --cle_dim=16
Epoch 1/5 loss: 1.8425  accuracy: 0.4607  val_loss: 0.4031  val_accuracy: 0.9008
Epoch 2/5 loss: 0.2080  accuracy: 0.9599  val_loss: 0.2516  val_accuracy: 0.9204
Epoch 3/5 loss: 0.0560  accuracy: 0.9882  val_loss: 0.2177  val_accuracy: 0.9286
Epoch 4/5 loss: 0.0335  accuracy: 0.9917  val_loss: 0.2155  val_accuracy: 0.9265
Epoch 5/5 loss: 0.0250  accuracy: 0.9935  val_loss: 0.1920  val_accuracy: 0.9363
loss: 0.2118  accuracy: 0.9289
python3 tagger_cle.py --max_sentences=5000 --rnn_cell=LSTM --rnn_cell_dim=16 --cle_dim=16 --word_masking=0.1
Epoch 1/5 loss: 1.8989  accuracy: 0.4426  val_loss: 0.4616  val_accuracy: 0.8798
Epoch 2/5 loss: 0.3442  accuracy: 0.9155  val_loss: 0.2408  val_accuracy: 0.9265
Epoch 3/5 loss: 0.1503  accuracy: 0.9605  val_loss: 0.1994  val_accuracy: 0.9364
Epoch 4/5 loss: 0.1040  accuracy: 0.9706  val_loss: 0.1847  val_accuracy: 0.9427
Epoch 5/5 loss: 0.0892  accuracy: 0.9728  val_loss: 0.1882  val_accuracy: 0.9401
loss: 0.2029  accuracy: 0.9361
tagger_competition
Deadline: May 03, 23:59 4 points+5 bonus
In this assignment, you should extend tagger_cle into a real-world Czech part-of-speech tagger. We will use the Czech PDT dataset loadable using the morpho_dataset.py module. Note that the dataset contains more than 1500 unique POS tags and that the POS tags have a fixed structure of 15 positions (so it is possible to generate the POS tag characters independently).
You can use the following additional data in this assignment:
 You can use outputs of a morphological analyzer loadable with morpho_analyzer.py. If a word form in train, dev or test PDT data is known to the analyzer, all its (lemma, POS tag) pairs are returned.
 You can use any unannotated text data (Wikipedia, Czech National Corpus, …), and also any pretrained word embeddings (assuming they were trained on plain texts).
The task is a competition. Everyone who submits a solution with at least 92% label accuracy gets 4 points; the rest 5 points will be distributed depending on relative ordering of your solutions. Lastly, 3 bonus points will be given to anyone surpassing the pre-neural-network state-of-the-art of 95.89% from Spoustová et al., 2009.
You can start with the tagger_competition.py template, which among others generates test set annotations in the required format. Note that you can evaluate the predictions as usual using the morpho_dataset.py module, either by running with --task=tagger --evaluate=path arguments, or using its evaluate_file method.
tensorboard_projector
You can try exploring the TensorBoard Projector with pretrained embeddings for the 20k most frequent lemmas in Czech and English – after extracting the archive, start tensorboard --logdir dir_where_the_archive_is_extracted.
In order to use the Projector tab yourself, you can take inspiration from the projector_export.py script, which was used to export the above pretrained embeddings from the Word2vec format.
tagger_crf
Deadline: May 10, 23:59 2 points
This assignment is an extension of the tagger_we task. Using the tagger_crf.py template, implement named entity recognition using CRF loss and CRF decoding from the tensorflow_addons package. The evaluation is performed using the provided metric computing the F1 score of the span prediction (i.e., a recognized, possibly multi-word named entity is a true positive if both the entity type and the span exactly match). In practice, character-level embeddings (and also pretrained word embeddings) would be used to obtain superior results.
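For reference, the tensorflow_addons primitives involved can be used roughly as follows (a sketch; logits are the per-tag unary scores and transitions is a trainable [num_tags, num_tags] matrix, both names being illustrative):

import tensorflow as tf
import tensorflow_addons as tfa

# CRF loss: negative mean log-likelihood of the gold tag sequences.
log_likelihood, _ = tfa.text.crf_log_likelihood(
    logits, gold_tags, sequence_lengths, transitions)
loss = -tf.reduce_mean(log_likelihood)

# CRF (Viterbi) decoding of the most probable tag sequences.
decoded_tags, _ = tfa.text.crf_decode(logits, transitions, sequence_lengths)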
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 tagger_crf.py --max_sentences=5000 --rnn_cell=LSTM --rnn_cell_dim=24
Epoch 1/5 loss: 18.5475  val_f1: 0.0248
Epoch 2/5 loss: 9.8655  val_f1: 0.2207
Epoch 3/5 loss: 6.0053  val_f1: 0.3370
Epoch 4/5 loss: 3.1784  val_f1: 0.4000
Epoch 5/5 loss: 1.6535  val_f1: 0.4363
python3 tagger_crf.py --max_sentences=5000 --rnn_cell=GRU --rnn_cell_dim=24
Epoch 1/5 loss: 17.7499  val_f1: 0.1624
Epoch 2/5 loss: 8.3992  val_f1: 0.4048
Epoch 3/5 loss: 3.7579  val_f1: 0.4444
Epoch 4/5 loss: 1.5298  val_f1: 0.4496
Epoch 5/5 loss: 0.7858  val_f1: 0.4769
speech_recognition
Deadline: May 10, 23:59 5 points+5 bonus
This assignment is a competition task in the speech recognition area. Specifically, your goal is to predict a sequence of letters given a spoken utterance. We will be using Czech recordings from the Common Voice, with input sound waves passed through the usual preprocessing – computing Mel-frequency cepstral coefficients (MFCCs). You can repeat this preprocessing on a given audio using the wav_decode and mfcc_extract methods from the common_voice_cs.py module. This module can also load the dataset, downloading it when necessary (note that it has 200MB, so it might take a while). Furthermore, you can listen to the development portion of the dataset.
This is an open-data task, where you submit only the test set annotations together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.
The task is a competition. The evaluation is performed by computing the edit distance to the gold letter sequence, normalized by its length (a corresponding Keras metric EditDistanceMetric is provided by the common_voice_cs.py). Everyone who submits a solution with at most 50% test set edit distance gets 5 points; the rest 5 points will be distributed depending on relative ordering of your solutions. Note that you can evaluate the predictions as usual using the common_voice_cs.py module, either by running with --evaluate=path arguments, or using its evaluate_file method.
Start with the speech_recognition.py template, which contains instructions for using the CTC loss and generates the test set annotation in the required format.
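A sketch of the CTC computation with plain TensorFlow operations, assuming dense logits of shape [batch, time, letters + 1] with the blank as the last class, and targets with the corresponding lengths prepared from the gold letter sequences:

import tensorflow as tf

# CTC loss over a batch of dense targets.
loss = tf.reduce_mean(tf.nn.ctc_loss(
    labels=targets, logits=logits,
    label_length=target_lengths, logit_length=logit_lengths,
    logits_time_major=False, blank_index=-1))

# Greedy CTC decoding; the decoder expects time-major logits.
decoded, _ = tf.nn.ctc_greedy_decoder(
    tf.transpose(logits, [1, 0, 2]), tf.cast(logit_lengths, tf.int32))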
tagger_crf_manual
Deadline: May 17, 23:59 1 point
This assignment is an extension of tagger_crf, where we will perform the CRF loss computation (but not CRF decoding) manually. The tagger_crf_manual.py template is nearly identical to tagger_crf, the only difference being the crf_loss method, where you should manually implement the CRF loss.
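For illustration, the CRF negative log-likelihood for a single sequence can be computed by the forward algorithm roughly as follows (a sketch; the template works with whole batches and its exact signature differs):

import tensorflow as tf

def crf_loss(logits, gold_tags, transitions):
    # logits: [time, num_tags], gold_tags: [time], transitions: [num_tags, num_tags]
    score = logits[0, gold_tags[0]]      # log-score of the gold path
    alphas = logits[0]                   # log-sum of all paths ending in each tag
    for t in range(1, logits.shape[0]):
        score += transitions[gold_tags[t - 1], gold_tags[t]] + logits[t, gold_tags[t]]
        alphas = logits[t] + tf.reduce_logsumexp(
            alphas[:, tf.newaxis] + transitions, axis=0)
    # Negative log-likelihood: log of the partition function minus the gold score.
    return tf.reduce_logsumexp(alphas) - score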
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 tagger_crf_manual.py --max_sentences=5000 --rnn_cell=LSTM --rnn_cell_dim=24
Epoch 1/5 loss: 18.5475  val_f1: 0.0248
Epoch 2/5 loss: 9.8655  val_f1: 0.2207
Epoch 3/5 loss: 6.0053  val_f1: 0.3370
Epoch 4/5 loss: 3.1784  val_f1: 0.4000
Epoch 5/5 loss: 1.6535  val_f1: 0.4363
python3 tagger_crf_manual.py --max_sentences=5000 --rnn_cell=GRU --rnn_cell_dim=24
Epoch 1/5 loss: 17.7499  val_f1: 0.1624
Epoch 2/5 loss: 8.3992  val_f1: 0.4048
Epoch 3/5 loss: 3.7579  val_f1: 0.4444
Epoch 4/5 loss: 1.5298  val_f1: 0.4496
Epoch 5/5 loss: 0.7858  val_f1: 0.4769
lemmatizer_noattn
Deadline: May 17, 23:59 3 points
The goal of this assignment is to create a simple lemmatizer. For training and evaluation, we use the same dataset as in tagger_we, loadable by the updated morpho_dataset.py module.
Your goal is to modify the lemmatizer_noattn.py template and implement the following:
- Embed characters of source forms and run a bidirectional GRU encoder.
- Embed characters of target lemmas.
- Implement a training-time decoder which uses gold target characters as inputs.
- Implement an inference-time decoder which uses previous predictions as inputs.
- The initial state of both decoders is the output state of the corresponding GRU-encoded form.
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 lemmatizer_noattn.py --max_sentences=1000 --batch_size=2 --cle_dim=24 --rnn_dim=24 --epochs=3
Epoch 1/3 loss: 2.5645  val_loss: 0.0000e+00  val_accuracy: 0.1372
Epoch 2/3 loss: 1.9879  val_loss: 0.0000e+00  val_accuracy: 0.2061
Epoch 3/3 loss: 1.4119  val_loss: 0.0000e+00  val_accuracy: 0.2874
loss: 0.0000e+00  accuracy: 0.2921
python3 lemmatizer_noattn.py --max_sentences=500 --batch_size=2 --cle_dim=32 --rnn_dim=32 --epochs=3
Epoch 1/3 loss: 2.5907  val_loss: 0.0000e+00  val_accuracy: 0.1206
Epoch 2/3 loss: 2.1792  val_loss: 0.0000e+00  val_accuracy: 0.2160
Epoch 3/3 loss: 1.5338  val_loss: 0.0000e+00  val_accuracy: 0.2590
loss: 0.0000e+00  accuracy: 0.2653
lemmatizer_attn
Deadline: May 17, 23:59 3 points
This task is a continuation of the lemmatizer_noattn assignment. Using the lemmatizer_attn.py template, implement the following features in addition to lemmatizer_noattn:
- The bidirectional GRU encoder returns outputs for all input characters, not just the last one.
- Implement attention in the decoders (see the sketch below). Notably, project the encoder outputs and the current decoder state into vectors of the same dimensionality, apply a non-linearity, and generate a weight for every encoder output. Finally, sum the encoder outputs using these weights and concatenate the computed attention to the decoder inputs.
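A sketch of the described attention for one decoder step, assuming W_e, W_s and v are tf.keras.layers.Dense layers created in the constructor (v with a single unit and no activation):

import tensorflow as tf

def attend(encoder_outputs, decoder_state, W_e, W_s, v):
    # encoder_outputs: [batch, source_len, dim], decoder_state: [batch, dim]
    scores = tf.squeeze(v(tf.tanh(
        W_e(encoder_outputs) + W_s(decoder_state)[:, tf.newaxis])), -1)
    weights = tf.nn.softmax(scores, axis=1)             # [batch, source_len]
    # Weighted sum of the encoder outputs, to be concatenated to decoder inputs.
    return tf.reduce_sum(encoder_outputs * weights[:, :, tf.newaxis], axis=1)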
Once submitted to ReCodEx, you should experiment with the effect of using attention, and with the influence of RNN dimensionality on network performance.
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 lemmatizer_attn.py --max_sentences=1000 --batch_size=2 --cle_dim=24 --rnn_dim=24 --epochs=3
Epoch 1/3 loss: 2.4224  val_loss: 0.0000e+00  val_accuracy: 0.1627
Epoch 2/3 loss: 1.8042  val_loss: 0.0000e+00  val_accuracy: 0.2574
Epoch 3/3 loss: 0.9277  val_loss: 0.0000e+00  val_accuracy: 0.2998
loss: 0.0000e+00  accuracy: 0.3083
python3 lemmatizer_attn.py --max_sentences=500 --batch_size=2 --cle_dim=32 --rnn_dim=32 --epochs=3
Epoch 1/3 loss: 2.6011  val_loss: 0.0000e+00  val_accuracy: 0.1232
Epoch 2/3 loss: 2.1855  val_loss: 0.0000e+00  val_accuracy: 0.2124
Epoch 3/3 loss: 1.4435  val_loss: 0.0000e+00  val_accuracy: 0.2649
loss: 0.0000e+00  accuracy: 0.2815
lemmatizer_competition
Deadline: May 17, 23:59 4 points+5 bonus
In this assignment, you should extend lemmatizer_noattn or lemmatizer_attn into a real-world Czech lemmatizer. As in tagger_competition, we will use the Czech PDT dataset loadable using the morpho_dataset.py module. You can also use the same additional data as in the tagger_competition assignment.
The task is a competition. Everyone who submits a solution with at least 96% label accuracy gets 4 points; the rest 5 points will be distributed depending on relative ordering of your solutions. Lastly, 3 bonus points will be given to anyone surpassing the pre-neural-network state-of-the-art of 97.86%.
You can start with the lemmatizer_competition.py template, which among others generates test set annotations in the required format. Note that you can evaluate the predictions as usual using the morpho_dataset.py module, either by running with --task=lemmatizer --evaluate=path arguments, or using its evaluate_file method.
tagger_transformer
Deadline: May 24, 23:59 3 points
This assignment is a continuation of tagger_we. Using the tagger_transformer.py template, implement a Transformer encoder.
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 tagger_transformer.py --max_sentences=5000 --transformer_layers=0
Epoch 1/5 loss: 1.9822  accuracy: 0.4003  val_loss: 0.8465  val_accuracy: 0.7235
Epoch 2/5 loss: 0.6168  accuracy: 0.8283  val_loss: 0.5454  val_accuracy: 0.8280
Epoch 3/5 loss: 0.2757  accuracy: 0.9528  val_loss: 0.4380  val_accuracy: 0.8416
Epoch 4/5 loss: 0.1424  accuracy: 0.9761  val_loss: 0.4046  val_accuracy: 0.8468
Epoch 5/5 loss: 0.0869  accuracy: 0.9843  val_loss: 0.3934  val_accuracy: 0.8480
loss: 0.4082  accuracy: 0.8472
python3 tagger_transformer.py --max_sentences=5000 --transformer_heads=1
Epoch 1/5 loss: 1.6145  accuracy: 0.4919  val_loss: 0.4468  val_accuracy: 0.8265
Epoch 2/5 loss: 0.1648  accuracy: 0.9494  val_loss: 0.5082  val_accuracy: 0.8356
Epoch 3/5 loss: 0.0470  accuracy: 0.9848  val_loss: 0.6596  val_accuracy: 0.8202
Epoch 4/5 loss: 0.0256  accuracy: 0.9909  val_loss: 0.5639  val_accuracy: 0.8291
Epoch 5/5 loss: 0.0187  accuracy: 0.9931  val_loss: 0.5991  val_accuracy: 0.8387
loss: 0.6571  accuracy: 0.8292
python3 tagger_transformer.py --max_sentences=5000 --transformer_heads=4
Epoch 1/5 loss: 1.6144  accuracy: 0.4935  val_loss: 0.4483  val_accuracy: 0.8250
Epoch 2/5 loss: 0.1598  accuracy: 0.9522  val_loss: 0.5113  val_accuracy: 0.8374
Epoch 3/5 loss: 0.0449  accuracy: 0.9853  val_loss: 0.7293  val_accuracy: 0.8174
Epoch 4/5 loss: 0.0267  accuracy: 0.9906  val_loss: 0.7311  val_accuracy: 0.8071
Epoch 5/5 loss: 0.0189  accuracy: 0.9931  val_loss: 0.6877  val_accuracy: 0.8417
loss: 0.8193  accuracy: 0.8206
python3 tagger_transformer.py --max_sentences=5000 --transformer_heads=4 --transformer_dropout=0.1
Epoch 1/5 loss: 1.7227  accuracy: 0.4576  val_loss: 0.4702  val_accuracy: 0.8175
Epoch 2/5 loss: 0.2176  accuracy: 0.9332  val_loss: 0.4847  val_accuracy: 0.8403
Epoch 3/5 loss: 0.0621  accuracy: 0.9813  val_loss: 0.6176  val_accuracy: 0.8063
Epoch 4/5 loss: 0.0385  accuracy: 0.9869  val_loss: 0.5598  val_accuracy: 0.8232
Epoch 5/5 loss: 0.0312  accuracy: 0.9893  val_loss: 0.6466  val_accuracy: 0.8203
loss: 0.7229  accuracy: 0.8065
sentiment_analysis
Deadline: May 31, 23:59 3 points
Perform sentiment analysis on Czech Facebook data using the provided pretrained Czech Electra small. The dataset consists of pairs of (document, label) and can be (down)loaded using the text_classification_dataset.py module. When loading the dataset, a tokenizer might be provided, and if it is, the document is also passed through the tokenizer and the resulting tokens are added to the dataset.
Even though this assignment is not a competition, your goal is to submit test set annotations with at least 77% accuracy. As usual, you can evaluate your predictions using the text_classification_dataset.py module, either by running with --evaluate=path arguments, or using its evaluate_file method.
Note that contrary to working with EfficientNet, you need to fine-tune the Electra model in order to achieve the required accuracy.
You can start with the sentiment_analysis.py template, which among others loads the Electra Czech model and generates test set annotations in the required format. Note that the bert_example.py module illustrates the usage of both the Electra tokenizer and the Electra model.
reading_comprehension
Deadline: May 31, 23:59; non-competition part extended to Jul 7 (originally Jun 30)
5 points+5 bonus
May 27 Update: The evaluation was changed and is now performed only on non-empty answers. In other words, you do not need to decide whether the answer is or is not in the context, but just to provide the best non-empty answer. However, the data was not modified, so you should ignore training data questions without answers during training (for the development and test sets, provide predictions on the whole set, and the evaluation script will consider only the ones where the gold answers exist).
Implement the best possible model for the reading comprehension task using a translated version of the SQuAD 2.0 dataset, utilizing the provided pretrained Czech Electra small.
The dataset can be loaded using the reading_comprehension_dataset.py module. The loaded dataset is the direct representation of the data and is not yet ready to be directly trained on. Each of the train, dev and test datasets is composed of a list of paragraphs, each consisting of:
- context: text with the information;
- qas: list of questions and answers, where each item consists of:
  - question: text of the question;
  - answers: a list of answers, where each answer is composed of:
    - text: string of the text, exactly as appearing in the context;
    - start: character offset of the answer text in the context.
Note that a question might not be answerable given the context, in which case the list of answers is empty. In the train and dev sets, each question has at most one answer, while in the test set there might be several answers.
We evaluate the reading comprehension task using accuracy, where an answer is considered correct if its text is exactly equal to some correct answer. You can evaluate your predictions as usual with the reading_comprehension_dataset.py module, either by running with --evaluate=path arguments, or using its evaluate_file method.
The task is a competition. Everyone who submits a solution with at least 49% answer accuracy gets 5 points; the rest 5 points will be distributed depending on relative ordering of your solutions. Note that usually achieving 47% on the dev set is enough to get 49% on the test set (because of multiple references in the test set).
Note that contrary to working with EfficientNet, you need to fine-tune the Electra model in order to achieve the required accuracy.
You can start with the reading_comprehension.py template, which among others (down)loads the data and Czech Electra small model, and describes the format of the required test set annotations.
vae
Deadline: Jul 7, 23:59 (extended from Jun 30)
3 points
In this assignment you will implement a simple Variational Autoencoder for three datasets in the MNIST format. Your goal is to modify the vae.py template and implement a VAE.
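A sketch of the training objective, assuming a diagonal Gaussian posterior with predicted mean and log_variance and a decoder producing per-pixel Bernoulli probabilities (the weighting matches the logs below, where loss = reconstruction_loss · pixels + latent_loss · z_dim; all names are illustrative):

import tensorflow as tf
import tensorflow_probability as tfp

# Encoder output defines the approximate posterior; the prior is N(0, I).
posterior = tfp.distributions.Normal(mean, tf.exp(log_variance / 2))
prior = tfp.distributions.Normal(tf.zeros_like(mean), tf.ones_like(mean))

z = posterior.sample()  # reparametrized sample of the latent variable
reconstruction_loss = tf.losses.BinaryCrossentropy()(images, decoder(z))
latent_loss = tf.reduce_mean(posterior.kl_divergence(prior))
loss = reconstruction_loss * 28 * 28 * 1 + latent_loss * z_dim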
After submitting the assignment to ReCodEx, you can experiment with the three available datasets (mnist, mnist-fashion, and mnist-cifarcars) and with different latent variable dimensionality (--z_dim=2 and --z_dim=100). The generated images are available in the TensorBoard logs.
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 vae.py --dataset=mnist --z_dim=2 --epochs=3
Epoch 1/3 reconstruction_loss: 0.2159  latent_loss: 2.4693  loss: 174.2038
Epoch 2/3 reconstruction_loss: 0.1928  latent_loss: 2.7937  loss: 156.7730
Epoch 3/3 reconstruction_loss: 0.1868  latent_loss: 2.9350  loss: 152.3162
python3 vae.py --dataset=mnist --z_dim=100 --epochs=3
Epoch 1/3 reconstruction_loss: 0.1837  latent_loss: 0.1378  loss: 157.7933
Epoch 2/3 reconstruction_loss: 0.1319  latent_loss: 0.1847  loss: 121.9125
Epoch 3/3 reconstruction_loss: 0.1209  latent_loss: 0.1903  loss: 113.7889
python3 vae.py --dataset=mnist-fashion --z_dim=2 --epochs=3
Epoch 1/3 reconstruction_loss: 0.3539  latent_loss: 2.9950  loss: 283.4177
Epoch 2/3 reconstruction_loss: 0.3324  latent_loss: 3.0159  loss: 266.6620
Epoch 3/3 reconstruction_loss: 0.3288  latent_loss: 3.0269  loss: 263.8320
python3 vae.py --dataset=mnist-fashion --z_dim=100 --epochs=3
Epoch 1/3 reconstruction_loss: 0.3400  latent_loss: 0.1183  loss: 278.3589
Epoch 2/3 reconstruction_loss: 0.3088  latent_loss: 0.1061  loss: 252.7133
Epoch 3/3 reconstruction_loss: 0.3029  latent_loss: 0.1086  loss: 248.3083
python3 vae.py --dataset=mnist-cifarcars --z_dim=2 --epochs=3
Epoch 1/3 reconstruction_loss: 0.6373  latent_loss: 1.9468  loss: 503.5290
Epoch 2/3 reconstruction_loss: 0.6307  latent_loss: 2.0624  loss: 498.5606
Epoch 3/3 reconstruction_loss: 0.6292  latent_loss: 2.1156  loss: 497.5026
python3 vae.py --dataset=mnist-cifarcars --z_dim=100 --epochs=3
Epoch 1/3 reconstruction_loss: 0.6359  latent_loss: 0.0577  loss: 504.3351
Epoch 2/3 reconstruction_loss: 0.6164  latent_loss: 0.0714  loss: 490.4035
Epoch 3/3 reconstruction_loss: 0.6097  latent_loss: 0.0860  loss: 486.5849
gan
Deadline: Jul 7, 23:59 (extended from Jun 30)
2 points
In this assignment you will implement a simple Generative Adversarial Network for three datasets in the MNIST format. Your goal is to modify the gan.py template and implement a GAN.
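A sketch of one training step with the standard non-saturating losses; generator, discriminator and the two optimizers are assumed to be created in the template:

import tensorflow as tf

bce = tf.losses.BinaryCrossentropy()

def train_step(images, z_dim):
    z = tf.random.normal([tf.shape(images)[0], z_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake = generator(z, training=True)
        fake_out = discriminator(fake, training=True)
        real_out = discriminator(images, training=True)
        # Generator tries to make fakes classified as real.
        generator_loss = bce(tf.ones_like(fake_out), fake_out)
        # Discriminator tries to separate real from fake.
        discriminator_loss = bce(tf.ones_like(real_out), real_out) \
                           + bce(tf.zeros_like(fake_out), fake_out)
    g_optimizer.apply_gradients(zip(
        g_tape.gradient(generator_loss, generator.trainable_variables),
        generator.trainable_variables))
    d_optimizer.apply_gradients(zip(
        d_tape.gradient(discriminator_loss, discriminator.trainable_variables),
        discriminator.trainable_variables))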
After submitting the assignment to ReCodEx, you can experiment with the three available datasets (mnist, mnist-fashion, and mnist-cifarcars) and maybe try different latent variable dimensionality. The generated images are available in the TensorBoard logs. You can also continue with the dcgan assignment.
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 gan.py --dataset=mnist --z_dim=2 --epochs=5
Epoch 1/5 discriminator_loss: 0.0811  generator_loss: 5.2954  loss: 1.7356  discriminator_accuracy: 0.9826
Epoch 2/5 discriminator_loss: 0.0776  generator_loss: 3.8221  loss: 1.3290  discriminator_accuracy: 0.9926
Epoch 3/5 discriminator_loss: 0.0686  generator_loss: 4.3589  loss: 1.3821  discriminator_accuracy: 0.9920
Epoch 4/5 discriminator_loss: 0.0694  generator_loss: 4.4692  loss: 1.4952  discriminator_accuracy: 0.9910
Epoch 5/5 discriminator_loss: 0.0668  generator_loss: 4.5452  loss: 1.5248  discriminator_accuracy: 0.9919
python3 gan.py --dataset=mnist --z_dim=100 --epochs=5
Epoch 1/5 discriminator_loss: 0.0526  generator_loss: 5.6836  loss: 1.5494  discriminator_accuracy: 0.9826
Epoch 2/5 discriminator_loss: 0.0333  generator_loss: 5.9819  loss: 1.9048  discriminator_accuracy: 0.9978
Epoch 3/5 discriminator_loss: 0.0660  generator_loss: 5.0259  loss: 1.7150  discriminator_accuracy: 0.9934
Epoch 4/5 discriminator_loss: 0.1227  generator_loss: 4.9251  loss: 1.8218  discriminator_accuracy: 0.9871
Epoch 5/5 discriminator_loss: 0.2496  generator_loss: 4.0308  loss: 1.4528  discriminator_accuracy: 0.9609
python3 gan.py --dataset=mnist-fashion --z_dim=2 --epochs=5
Epoch 1/5 discriminator_loss: 0.1560  generator_loss: 12.4313  loss: 1.6760  discriminator_accuracy: 0.9788
Epoch 2/5 discriminator_loss: 0.1748  generator_loss: 21.1818  loss: 10.1500  discriminator_accuracy: 0.9644
Epoch 3/5 discriminator_loss: 0.0691  generator_loss: 11.8005  loss: 5.7323  discriminator_accuracy: 0.9919
Epoch 4/5 discriminator_loss: 0.0429  generator_loss: 15.0839  loss: 5.9234  discriminator_accuracy: 0.9928
Epoch 5/5 discriminator_loss: 0.0687  generator_loss: 9.5255  loss: 2.9274  discriminator_accuracy: 0.9906
python3 gan.py --dataset=mnist-fashion --z_dim=100 --epochs=5
Epoch 1/5 discriminator_loss: 0.0710  generator_loss: 7.7963  loss: 1.8059  discriminator_accuracy: 0.9803
Epoch 2/5 discriminator_loss: 0.0728  generator_loss: 7.2306  loss: 2.4866  discriminator_accuracy: 0.9910
Epoch 3/5 discriminator_loss: 0.1112  generator_loss: 5.6444  loss: 1.8976  discriminator_accuracy: 0.9852
Epoch 4/5 discriminator_loss: 0.1899  generator_loss: 4.5056  loss: 1.6542  discriminator_accuracy: 0.9748
Epoch 5/5 discriminator_loss: 0.3114  generator_loss: 4.0829  loss: 1.5674  discriminator_accuracy: 0.9381
python3 gan.py --dataset=mnist-cifarcars --z_dim=2 --epochs=5
Epoch 1/5 discriminator_loss: 0.7178  generator_loss: 4.3867  loss: 0.9027  discriminator_accuracy: 0.8721
Epoch 2/5 discriminator_loss: 0.3499  generator_loss: 4.4815  loss: 2.1730  discriminator_accuracy: 0.9631
Epoch 3/5 discriminator_loss: 0.7672  generator_loss: 2.7376  loss: 1.2015  discriminator_accuracy: 0.8301
Epoch 4/5 discriminator_loss: 0.6904  generator_loss: 2.9754  loss: 1.2297  discriminator_accuracy: 0.8599
Epoch 5/5 discriminator_loss: 0.8773  generator_loss: 2.4737  loss: 1.1036  discriminator_accuracy: 0.7979
python3 gan.py --dataset=mnist-cifarcars --z_dim=100 --epochs=5
Epoch 1/5 discriminator_loss: 0.5299  generator_loss: 4.1585  loss: 1.2538  discriminator_accuracy: 0.8787
Epoch 2/5 discriminator_loss: 0.6910  generator_loss: 2.3183  loss: 0.9271  discriminator_accuracy: 0.8682
Epoch 3/5 discriminator_loss: 1.1221  generator_loss: 1.9830  loss: 1.1333  discriminator_accuracy: 0.7479
Epoch 4/5 discriminator_loss: 1.3696  generator_loss: 1.0735  loss: 0.8271  discriminator_accuracy: 0.6637
Epoch 5/5 discriminator_loss: 1.4549  generator_loss: 0.9048  loss: 0.7935  discriminator_accuracy: 0.5939
dcgan
Deadline: Jul 7, 23:59 (extended from Jun 30)
1 point
This task is a continuation of the gan assignment, which you will modify to implement the Deep Convolutional GAN (DCGAN). Start with the dcgan.py template and implement a DCGAN. Note that most of the TODO notes are from the gan assignment.
After submitting the assignment to ReCodEx, you can experiment with the three available datasets (mnist, mnist-fashion, and mnist-cifarcars). However, note that you will need a lot of computational power (preferably a GPU) to generate the images; the example outputs below were also generated on a GPU, which means the results are non-deterministic.
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 dcgan.py --dataset=mnist --z_dim=2 --epochs=3
Epoch 1/3 discriminator_loss: 0.2638  generator_loss: 3.3597  loss: 0.9523  discriminator_accuracy: 0.9061
Epoch 2/3 discriminator_loss: 0.0299  generator_loss: 5.7561  loss: 1.7968  discriminator_accuracy: 0.9972
Epoch 3/3 discriminator_loss: 0.0197  generator_loss: 5.9106  loss: 1.8184  discriminator_accuracy: 0.9981
python3 dcgan.py --dataset=mnist --z_dim=100 --epochs=3
Epoch 1/3 discriminator_loss: 0.2744  generator_loss: 3.3752  loss: 0.9341  discriminator_accuracy: 0.8809
Epoch 2/3 discriminator_loss: 0.0297  generator_loss: 5.6908  loss: 1.7981  discriminator_accuracy: 0.9954
Epoch 3/3 discriminator_loss: 0.0257  generator_loss: 6.2856  loss: 2.1166  discriminator_accuracy: 0.9974
python3 dcgan.py --dataset=mnist-fashion --z_dim=2 --epochs=3
Epoch 1/3 discriminator_loss: 0.3830  generator_loss: 2.5970  loss: 0.8996  discriminator_accuracy: 0.9198
Epoch 2/3 discriminator_loss: 0.2759  generator_loss: 3.3412  loss: 1.1519  discriminator_accuracy: 0.9545
Epoch 3/3 discriminator_loss: 0.2125  generator_loss: 3.9514  loss: 1.3584  discriminator_accuracy: 0.9681
python3 dcgan.py --dataset=mnist-fashion --z_dim=100 --epochs=3
Epoch 1/3 discriminator_loss: 0.4766  generator_loss: 2.4001  loss: 0.8588  discriminator_accuracy: 0.8763
Epoch 2/3 discriminator_loss: 0.4254  generator_loss: 2.8352  loss: 1.0735  discriminator_accuracy: 0.9250
Epoch 3/3 discriminator_loss: 0.3939  generator_loss: 3.0114  loss: 1.1252  discriminator_accuracy: 0.9285
python3 dcgan.py --dataset=mnist-cifarcars --z_dim=2 --epochs=3
Epoch 1/3 discriminator_loss: 0.8294  generator_loss: 1.4831  loss: 0.7460  discriminator_accuracy: 0.7689
Epoch 2/3 discriminator_loss: 0.4352  generator_loss: 2.4002  loss: 0.9303  discriminator_accuracy: 0.9297
Epoch 3/3 discriminator_loss: 0.3052  generator_loss: 3.0020  loss: 1.0943  discriminator_accuracy: 0.9627
python3 dcgan.py --dataset=mnist-cifarcars --z_dim=100 --epochs=3
Epoch 1/3 discriminator_loss: 1.1401  generator_loss: 1.0359  loss: 0.7335  discriminator_accuracy: 0.6756
Epoch 2/3 discriminator_loss: 0.8321  generator_loss: 1.5365  loss: 0.7724  discriminator_accuracy: 0.7945
Epoch 3/3 discriminator_loss: 0.5566  generator_loss: 2.2292  loss: 0.9219  discriminator_accuracy: 0.8965
monte_carlo
Deadline: Jul 7, 23:59 (extended from Jun 30)
2 points
Solve the discretized CartPole-v1 environment from the OpenAI Gym using the Monte Carlo reinforcement learning algorithm. The gym environments have the following methods and properties:
- observation_space: the description of environment observations
- action_space: the description of environment actions
- reset() → new_state: starts a new episode
- step(action) → new_state, reward, done, info: performs the chosen action in the environment, returning the new state, the obtained reward, a boolean flag indicating an end of episode, and additional environment-specific information
- render(): renders the current environment state
We additionally extend the gym environment by:
- episode: number of the current episode (zero-based)
- reset(start_evaluation=False) → new_state: if start_evaluation is True, an evaluation is started
Once you finish training (which you indicate by passing start_evaluation=True to reset), your goal is to reach an average return of 475 during 100 evaluation episodes. Note that the environment prints your 100-episode average return every 10 episodes even during training.
You can start with the monte_carlo.py template, which parses several useful parameters, creates the environment and illustrates the overall usage.
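For illustration, the core of (every-visit) Monte Carlo control with epsilon-greedy exploration might look as follows (a sketch; env, epsilon and training_episodes are assumed to come from the template's argument parsing):

import numpy as np

Q = np.zeros([env.observation_space.n, env.action_space.n])
counts = np.zeros_like(Q)

for _ in range(training_episodes):
    state, done, episode = env.reset(), False, []
    while not done:
        # Epsilon-greedy action selection.
        action = np.random.randint(env.action_space.n) \
            if np.random.uniform() < epsilon else np.argmax(Q[state])
        next_state, reward, done, _ = env.step(action)
        episode.append((state, action, reward))
        state = next_state
    G = 0
    for state, action, reward in reversed(episode):
        G += reward  # undiscounted return of the rest of the episode
        counts[state, action] += 1
        Q[state, action] += (G - Q[state, action]) / counts[state, action]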
During evaluation in ReCodEx, three different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 5 minutes.
reinforce
Deadline: Jul 7, 23:59 (extended from Jun 30)
2 points
Solve the continuous CartPole-v1 environment from the OpenAI Gym using the REINFORCE algorithm. The continuous environment is very similar to the discrete one, except that the states are vectors of real-valued observations with shape env.observation_space.shape.
Your goal is to reach an average return of 475 during 100 evaluation episodes. Start with the reinforce.py template.
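The REINFORCE update itself can be written compactly as a return-weighted cross-entropy (a sketch; model, optimizer and the states, actions and returns collected from a batch of episodes are assumptions):

import tensorflow as tf

with tf.GradientTape() as tape:
    # model maps states to a distribution over actions; weighting the
    # cross-entropy by the returns yields -return * log pi(action | state).
    probabilities = model(states, training=True)
    loss = tf.losses.SparseCategoricalCrossentropy()(
        actions, probabilities, sample_weight=returns)
optimizer.apply_gradients(zip(
    tape.gradient(loss, model.trainable_variables), model.trainable_variables))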
During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 5 minutes.
reinforce_baseline
Deadline: Jul 7, 23:59 (extended from Jun 30)
2 points
This is a continuation of the reinforce
assignment.
Using the reinforce_baseline.py template, solve the CartPole-v1 environment using the REINFORCE with baseline algorithm.
Using a baseline lowers the variance of the value function gradient estimator, which allows faster training and decreases sensitivity to hyperparameter values. To reflect this effect in ReCodEx, note that the evaluation phase will automatically start after 200 episodes. Using only 200 episodes for training in this setting is probably too little for the REINFORCE algorithm, but suffices for the variant with a baseline.
Your goal is to reach an average return of 475 during 100 evaluation episodes.
During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 5 minutes.
reinforce_pixels
Deadline: Jul 7, 23:59 (extended from Jun 30)
2 points
This is a continuation of the reinforce
or reinforce_baseline
assignments.
The supplied cart_pole_pixels_environment.py generates a pixel representation of the CartPole environment as an $80×80$ image with three channels, with each channel representing one time step (i.e., the current observation and the two previous ones).
To pass the assignment, you need to reach an average return of 400 in 100 evaluation episodes. During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 10 minutes.
You should probably train the model locally and submit the already pretrained model to ReCodEx.
You can start with the reinforce_pixels.py template using the correct environment.
learning_to_learn
Deadline: Jul 7, 23:59 (extended from Jun 30)
4 points
Implement a simple variant of a learning-to-learn architecture. Utilizing the Omniglot dataset loadable using the omniglot_dataset.py module, the goal is to learn to classify a sequence of images using a custom hierarchy by employing external memory.
The input image sequences consist of args.classes randomly chosen Omniglot classes, each class being assigned a randomly chosen label. For every chosen class, args.images_per_class images are randomly selected. Apart from the images, the input contains the random labels one step after the corresponding images (with the first label being 1). The gold outputs are also the labels, but without the one-step offset.
The input images should be passed through a CNN feature extraction module and then processed using a memory-augmented LSTM controller; the external memory contains enough memory cells, each with args.cell_size units. In each step, the controller emits:
- args.read_heads read keys, each used to perform a read from memory as a weighted combination of cells according to the softmax of cosine similarities of the read key and the memory cells (see the sketch below);
- a write value, which is prepended to the memory (dropping the last cell).
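A sketch of one such read, with memory of shape [batch, cells, cell_size] and a single read key (names are illustrative):

import tensorflow as tf

def read_memory(memory, read_key):
    # Cosine similarity of the read key to every memory cell.
    similarities = tf.reduce_sum(
        tf.math.l2_normalize(memory, axis=-1)
        * tf.math.l2_normalize(read_key, axis=-1)[:, tf.newaxis], axis=-1)
    weights = tf.nn.softmax(similarities, axis=1)       # [batch, cells]
    # Weighted combination of the memory cells.
    return tf.reduce_sum(memory * weights[:, :, tf.newaxis], axis=1)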
These tests are identical to the ones in ReCodEx, apart from a different random seed. Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 learning_to_learn.py --recodex --train_episodes=160 --test_episodes=160 --epochs=3 --classes=2
Epoch 1/3 loss: 0.8135  acc: 0.5100  acc1: 0.5254  acc2: 0.5250  acc5: 0.5102  acc10: 0.5086  val_loss: 0.6928  val_acc: 0.5000  val_acc1: 0.5000  val_acc2: 0.5000  val_acc5: 0.5000  val_acc10: 0.5000
Epoch 2/3 loss: 0.7014  acc: 0.4985  acc1: 0.4974  acc2: 0.4868  acc5: 0.4918  acc10: 0.5170  val_loss: 0.6914  val_acc: 0.5522  val_acc1: 0.7750  val_acc2: 0.6344  val_acc5: 0.5125  val_acc10: 0.4719
Epoch 3/3 loss: 0.6932  acc: 0.5045  acc1: 0.5233  acc2: 0.4772  acc5: 0.5386  acc10: 0.5403  val_loss: 0.6902  val_acc: 0.5416  val_acc1: 0.7500  val_acc2: 0.6125  val_acc5: 0.4844  val_acc10: 0.4781
python3 learning_to_learn.py --recodex --train_episodes=160 --test_episodes=160 --epochs=3 --classes=5
Epoch 1/3 loss: 1.6601  acc: 0.1993  acc1: 0.2227  acc2: 0.1895  acc5: 0.1909  acc10: 0.2063  val_loss: 1.6094  val_acc: 0.2077  val_acc1: 0.2163  val_acc2: 0.2313  val_acc5: 0.2013  val_acc10: 0.1900
Epoch 2/3 loss: 1.6168  acc: 0.2089  acc1: 0.2090  acc2: 0.2406  acc5: 0.2214  acc10: 0.2048  val_loss: 1.6079  val_acc: 0.2027  val_acc1: 0.2500  val_acc2: 0.2125  val_acc5: 0.1937  val_acc10: 0.1900
Epoch 3/3 loss: 1.6129  acc: 0.2111  acc1: 0.2369  acc2: 0.2266  acc5: 0.1976  acc10: 0.2131  val_loss: 1.6066  val_acc: 0.2184  val_acc1: 0.3237  val_acc2: 0.2237  val_acc5: 0.2025  val_acc10: 0.2000
Note that your results may be slightly different, depending on your CPU type and whether you use GPU.
python3 learning_to_learn.py --classes=2 --epochs=20
Epoch 1/20 loss: 0.6769  acc: 0.5682  acc1: 0.6769  acc2: 0.5943  acc5: 0.5546  acc10: 0.5331  val_loss: 0.4930  val_acc: 0.7337  val_acc1: 0.5415  val_acc2: 0.6910  val_acc5: 0.7525  val_acc10: 0.8065
Epoch 2/20 loss: 0.3461  acc: 0.8278  acc1: 0.6054  acc2: 0.7646  acc5: 0.8629  acc10: 0.8790  val_loss: 0.2857  val_acc: 0.8681  val_acc1: 0.6345  val_acc2: 0.8355  val_acc5: 0.9050  val_acc10: 0.9270
Epoch 3/20 loss: 0.2061  acc: 0.9045  acc1: 0.6381  acc2: 0.8721  acc5: 0.9407  acc10: 0.9458  val_loss: 0.2420  val_acc: 0.8895  val_acc1: 0.6160  val_acc2: 0.8435  val_acc5: 0.9295  val_acc10: 0.9505
Epoch 4/20 loss: 0.1619  acc: 0.9242  acc1: 0.6459  acc2: 0.9057  acc5: 0.9607  acc10: 0.9680  val_loss: 0.1938  val_acc: 0.9122  val_acc1: 0.6420  val_acc2: 0.8815  val_acc5: 0.9585  val_acc10: 0.9630
Epoch 5/20 loss: 0.1340  acc: 0.9363  acc1: 0.6693  acc2: 0.9237  acc5: 0.9692  acc10: 0.9768  val_loss: 0.2057  val_acc: 0.9099  val_acc1: 0.6735  val_acc2: 0.8870  val_acc5: 0.9405  val_acc10: 0.9540
Epoch 10/20 loss: 0.0998  acc: 0.9510  acc1: 0.6949  acc2: 0.9545  acc5: 0.9833  acc10: 0.9855  val_loss: 0.1590  val_acc: 0.9273  val_acc1: 0.6585  val_acc2: 0.9055  val_acc5: 0.9690  val_acc10: 0.9735
Epoch 20/20 loss: 0.0739  acc: 0.9604  acc1: 0.7074  acc2: 0.9712  acc5: 0.9913  acc10: 0.9937  val_loss: 0.1510  val_acc: 0.9356  val_acc1: 0.6815  val_acc2: 0.9270  val_acc5: 0.9665  val_acc10: 0.9785
python3 learning_to_learn.py --classes=5 --epochs=20
Epoch 1/20 loss: 1.6013  acc: 0.2300  acc1: 0.3162  acc2: 0.2454  acc5: 0.2198  acc10: 0.2094  val_loss: 1.3712  val_acc: 0.3809  val_acc1: 0.3884  val_acc2: 0.3504  val_acc5: 0.3692  val_acc10: 0.4240
Epoch 2/20 loss: 1.1060  acc: 0.5052  acc1: 0.3377  acc2: 0.4164  acc5: 0.5215  acc10: 0.5802  val_loss: 0.8220  val_acc: 0.6575  val_acc1: 0.2498  val_acc2: 0.5318  val_acc5: 0.7168  val_acc10: 0.7626
Epoch 3/20 loss: 0.6655  acc: 0.7209  acc1: 0.2486  acc2: 0.5665  acc5: 0.7999  acc10: 0.8255  val_loss: 0.8701  val_acc: 0.6682  val_acc1: 0.2568  val_acc2: 0.5396  val_acc5: 0.7256  val_acc10: 0.7730
Epoch 4/20 loss: 0.5154  acc: 0.7879  acc1: 0.2612  acc2: 0.6505  acc5: 0.8734  acc10: 0.8924  val_loss: 0.6253  val_acc: 0.7506  val_acc1: 0.2554  val_acc2: 0.6304  val_acc5: 0.8302  val_acc10: 0.8462
Epoch 5/20 loss: 0.4474  acc: 0.8171  acc1: 0.2783  acc2: 0.7003  acc5: 0.9011  acc10: 0.9188  val_loss: 0.5924  val_acc: 0.7648  val_acc1: 0.2682  val_acc2: 0.6552  val_acc5: 0.8434  val_acc10: 0.8568
Epoch 10/20 loss: 0.3356  acc: 0.8611  acc1: 0.3086  acc2: 0.7996  acc5: 0.9382  acc10: 0.9466  val_loss: 0.6684  val_acc: 0.7719  val_acc1: 0.3100  val_acc2: 0.6982  val_acc5: 0.8192  val_acc10: 0.8752
Epoch 20/20 loss: 0.2499  acc: 0.8953  acc1: 0.3398  acc2: 0.8851  acc5: 0.9635  acc10: 0.9741  val_loss: 0.5017  val_acc: 0.8230  val_acc1: 0.3202  val_acc2: 0.7908  val_acc5: 0.8802  val_acc10: 0.9178
In the competitions, your goal is to train a model and then predict target values on the given unannotated test set.
Submitting to ReCodEx
When submitting a competition solution to ReCodEx, you can include any number of files of any kind, and either submit them individually or compress them in a .zip file. However, there should be exactly one text file with the test set annotation (.txt) and at least one Python source (.py/.ipynb) containing the model training and prediction. The Python sources are not executed, but must be included for inspection.
Evaluation in ReCodEx
- For every submission, ReCodEx checks the above conditions (exactly one .txt, at least one .py/.ipynb) and whether the given annotations can be evaluated without error. If not, it will report a corresponding error in the logs.
- Before the deadline, ReCodEx prints the exact achieved score, but only if it is worse than the baseline. If you surpass the baseline, the assignment is marked as solved in ReCodEx and you immediately get regular points for the assignment. However, ReCodEx does not print the reached score.
- After the competition deadline, the latest submission of every user surpassing the required baseline participates in a competition. Additional bonus points are then awarded according to the ordering of the performance of the participating submissions.
- After the competition results announcement, ReCodEx starts to show the exact performance for all the already submitted solutions and also for the solutions submitted later.
What Is Allowed
- You can use the given annotated training data in any way.
- You can use the given annotated development data for evaluation or hyperparameter tuning, but not for the training itself.
- Additionally, you can use any unannotated or manually created data for training and evaluation.
- The test set annotations must be the result of your system (so you cannot manually correct them; but your system can contain other parts than just trained models, like handwritten rules).
- Do not use test set annotations in any way, if you somehow get access to them.
- Unless stated otherwise, you can use any algorithm to solve the competition task at hand. The implementation should be either created by you or based on some publicly available implementation, in which case you must reference it and you must understand it fully.
- If you utilize an already trained model, it must be trained only on the allowed training data, unless stated otherwise.
Install
Installing to central user packages repository
You can install all required packages to the central user packages repository using pip3 install --user --upgrade pip setuptools followed by pip3 install --user tensorflow==2.4.1 tensorflow-addons==0.12.1 tensorflow-probability==0.12.1 tensorflow-hub==0.11.0 gym==0.18.0.
Installing to a virtual environment
Python supports virtual environments, which are directories containing independent sets of installed packages. You can create a virtual environment by running python3 -m venv VENV_DIR and then install the required packages with VENV_DIR/bin/pip3 install --upgrade pip setuptools followed by VENV_DIR/bin/pip3 install tensorflow==2.4.1 tensorflow-addons==0.12.1 tensorflow-probability==0.12.1 tensorflow-hub==0.11.0 gym==0.18.0.
Installing to MetaCentrum
As of April 2021, the minimum CUDA version across MetaCentrum is 10.2, and the highest officially available CUDA+cuDNN is also 10.2. Therefore, I have built TensorFlow 2.4.1 for CUDA 10.2 and cuDNN 7.6 to use on MetaCentrum.
During installation, start by using the official Python 3.6 and CUDA+cuDNN packages via
module add python-3.6.2-gcc cuda/cuda-10.2.89-gcc-6.3.0-34gtciz cudnn/cudnn-7.6.5.32-10.2-linux-x64-gcc-6.3.0-xqx4s5f
Note that this command must always be executed before using the installed TensorFlow. Then create a virtual environment by
python3 -m venv VENV_DIR
and install the required packages with
VENV_DIR/bin/pip3 install --upgrade pip setuptools
followed by
VENV_DIR/bin/pip3 install https://ufal.mff.cuni.cz/~straka/packages/tf/2.4/metacentrum/tensorflow-2.4.1-cp36-cp36m-linux_x86_64.whl https://ufal.mff.cuni.cz/~straka/packages/tf/2.4/metacentrum/tensorflow_addons-0.12.1-cp36-cp36m-linux_x86_64.whl tensorflow-probability==0.12.1 tensorflow-hub==0.11.0 gym==0.18.0
Windows TensorFlow fails with ImportError: DLL load failed
If your Windows TensorFlow fails with
ImportError: DLL load failed
you are probably missing the Visual C++ 2019 Redistributable.
Cannot start TensorBoard after installation
If tensorboard cannot be found, make sure the directory with pip-installed packages is in your PATH (that directory is either in your virtual environment if you use one, or it should be ~/.local/bin on Linux, and %UserProfile%\AppData\Roaming\Python\Python3[5-7] and %UserProfile%\AppData\Roaming\Python\Python3[5-7]\Scripts on Windows).
Git

Is it possible to keep the solutions in a Git repository?
Definitely. Keeping the solutions in a branch of your repository, where you merge them with the course repository, is probably a good idea. However, please keep the cloned repository with your solutions private.

On GitHub, do not create a public fork with your solutions
If you keep your solutions in a GitHub repository, please do not create a clone of the repository by using the Fork button – this way, the cloned repository would be public.
Of course, if you just want to create a pull request, GitHub requires a public fork and that is fine – just do not store your solutions in it.

How to clone the course repository?
To clone the course repository, run
git clone https://github.com/ufal/npfl114
This creates the repository in the npfl114 subdirectory; if you want a different name, add it as the last parameter.
To update the repository, run
git pull
inside the repository directory.
How to keep the course repository as a branch in your repository?
If you want to store the course repository just in a local branch of your existing repository, you can run the following commands while in it:
git remote add upstream https://github.com/ufal/npfl114
git fetch upstream
git checkout -t upstream/master
This creates a branch master; if you want a different name, add -b BRANCH_NAME to the last command.
In both cases, you can update your checkout by running
git pull
while in it.
How to merge the course repository with your modifications?
If you want to store your solutions in a branch merged with the course repository, you should start by
git remote add upstream https://github.com/ufal/npfl114
git pull upstream master
which creates a branch master; if you want a different name, change the last argument to master:BRANCH_NAME.
You can then commit to this branch and push it to your repository.
To merge the current course repository with your branch, run
git merge upstream/master
while in your branch. Of course, it might be necessary to resolve conflicts if both you and I modified the same places in the templates.
ReCodEx

What are the tests used by ReCodEx
The tests used by ReCodEx correspond to the examples from the course website (unless stated otherwise), but they use a different random seed (so the results are not the same), and sometimes they use a smaller number of epochs/iterations to finish sooner.
Debugging

How to debug problems “inside” computation graphs with weird stack traces?
At the beginning of your program, run
tf.config.run_functions_eagerly(True)
The tf.functions (with the exception of the ones used in tf.data pipelines) are then not traced (i.e., no computation graphs are created) and pure Python code is executed instead.
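A minimal sketch of why this helps – with eager execution, print calls and debugger breakpoints inside a tf.function behave as in plain Python:

import tensorflow as tf

tf.config.run_functions_eagerly(True)

@tf.function
def square(x):
    print("x is", x)  # with eager execution, this prints a concrete value
    return x * x      # and pdb breakpoints placed here work as usual

print(square(tf.constant(3)).numpy())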
How to debug problems “inside” tf.data pipelines with weird stack traces?
Unfortunately, the solution above does not affect tracing in tf.data pipelines (for example in tf.data.Dataset.map). However, since TF 2.5, the command
tf.data.experimental.enable_debug_mode()
should disable any asynchrony, parallelism, or non-determinism, and forces Python execution (as opposed to trace-compiled graph execution) of user-defined functions passed into transformations such as tf.data.Dataset.map.
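A minimal sketch of the intended usage, assuming TF 2.5 or later:

import tensorflow as tf

tf.data.experimental.enable_debug_mode()  # available since TF 2.5

def add_one(x):
    print("processing", x)  # runs as plain Python in debug mode
    return x + 1

dataset = tf.data.Dataset.range(4).map(add_one)
for element in dataset:
    print(element.numpy())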
GPU

Requirements for using a GPU
To use an NVIDIA GPU with TensorFlow 2.4, you need to install CUDA 11.0 and cuDNN 8.0 – see the details about GPU support.

Errors when running with a GPU
If you encounter errors when running with a GPU:
 if you are using the GPU also for displaying, try using the following
environment variable:
export TF_FORCE_GPU_ALLOW_GROWTH=true
 you can rerun with the
export TF_CPP_MIN_LOG_LEVEL=0
environment variable, which increases the verbosity of the log messages.
tf.ragged

Bug when RaggedTensors are used in backward/bidirectional direction and whole sequence is returned
In TF 2.4, RaggedTensors processed by backward (and therefore also bidirectional) RNNs produce incorrect results when whole sequences are returned. (Producing only the last output, or processing in the forward direction, works fine.) The problem has been fixed in the master branch and also in the TF 2.5 branch.
A workaround is to use the manual to/from dense tensor conversion described in the next point.

Slow RNNs when using RaggedTensors on GPU
Unfortunately, the current LSTM/GRU implementation does not use cuDNN acceleration when processing RaggedTensors. However, you can get around it by manually converting the RaggedTensors to dense tensors before the layer and back after it (see the combined sketch below). Assuming inputs is a tf.RaggedTensor:
 if rnn is a tf.keras.layers.LSTM/GRU/RNN/Bidirectional layer producing a single output, you can use the following workaround:
outputs = rnn(inputs.to_tensor(), mask=tf.sequence_mask(inputs.row_lengths()))
 if rnn is a tf.keras.layers.LSTM/GRU/RNN/Bidirectional layer producing a whole sequence, in addition to the above line you also need to convert the dense result back to a RaggedTensor, for example via:
outputs = tf.RaggedTensor.from_tensor(outputs, inputs.row_lengths())
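A combined, self-contained sketch of the workaround (the toy inputs and the layer sizes are only illustrative):

import tensorflow as tf

# Toy ragged batch of embedded sequences with shape [2, None, 2].
inputs = tf.ragged.constant([[[1., 2.], [3., 4.], [5., 6.]], [[7., 8.]]], ragged_rank=1)

rnn = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(4, return_sequences=True))
outputs = rnn(inputs.to_tensor(), mask=tf.sequence_mask(inputs.row_lengths()))
# The layer returns whole sequences, so convert the dense result back:
outputs = tf.RaggedTensor.from_tensor(outputs, inputs.row_lengths())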
tf.data

How to look at what is in a tf.data.Dataset?
The tf.data.Dataset is not just an array, but a description of a pipeline, which can produce data when requested. A simple way to run the pipeline is to iterate it using Python iterators:
dataset = tf.data.Dataset.range(10)
for entry in dataset:
    print(entry)

How to use tf.data.Dataset with model.fit or model.evaluate?
To use a tf.data.Dataset in Keras, the dataset elements should be pairs (input_data, gold_labels), where both input_data and gold_labels must already be batched. For example, given the CAGS dataset, you can preprocess the training data for cags_classification as follows (for development data, you would remove the .shuffle):
train = cags.train.map(lambda example: (example["image"], example["label"]))
train = train.shuffle(10000, seed=args.seed)
train = train.batch(args.batch_size)
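Such a dataset can then be passed directly to the Keras methods, for example (assuming a compiled model and an analogously prepared dev pipeline):

model.fit(train, epochs=args.epochs)
model.evaluate(dev)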

Is every iteration through a tf.data.Dataset the same?
No. Because the dataset is only a pipeline generating data, the pipeline is executed each time the dataset is iterated – therefore, every .shuffle is called in every iteration.
How to generate different random numbers each epoch during tf.data.Dataset.map?
When a global random seed is set, methods like tf.random.uniform generate the same sequence of numbers on each iteration. Instead, create a Generator object and use it to produce random numbers:
generator = tf.random.Generator.from_seed(42)
data = tf.data.Dataset.from_tensor_slices(tf.zeros(10, tf.int32))
data = data.map(lambda x: x + generator.uniform([], maxval=10, dtype=tf.int32))
for _ in range(3):
    print(*[element.numpy() for element in data])

How to call numpy methods or other non-tf functions in tf.data.Dataset.map?
You can use tf.numpy_function to call a numpy function even in a computational graph. However, the results have no static shape information, and you need to set it manually – ideally using tf.ensure_shape, which both sets the static shape and verifies during execution that the real shape matches it.
For example, to use the bboxes_training method from bboxes_utils, you could proceed as follows:
anchors = np.array(...)

def prepare_data(example):
    anchor_classes, anchor_bboxes = tf.numpy_function(
        bboxes_utils.bboxes_training,
        [anchors, example["classes"], example["bboxes"], 0.5],
        (tf.int32, tf.float32))
    anchor_classes = tf.ensure_shape(anchor_classes, [len(anchors)])
    anchor_bboxes = tf.ensure_shape(anchor_bboxes, [len(anchors), 4])
    ...

How to use ImageDataGenerator in tf.data.Dataset.map?
The ImageDataGenerator offers a .random_transform method, so we can use tf.numpy_function from the previous answer:
train_generator = tf.keras.preprocessing.image.ImageDataGenerator(...)

def augment(image, label):
    return tf.ensure_shape(
        tf.numpy_function(train_generator.random_transform, [image], tf.float32),
        image.shape
    ), label

dataset.map(augment)
Finetuning

How to make a part of the network frozen, so that its weights are not updated?
Each tf.keras.layers.Layer/tf.keras.Model has a mutable trainable property indicating whether its variables should be updated – however, after changing it, you need to call .compile again (or otherwise make sure the list of trainable variables for the optimizer is updated).
Note that once trainable == False, the insides of a layer are no longer considered, even if some of its sublayers have trainable == True. Therefore, if you want to freeze only some sublayers of a layer you use in your model, the layer itself must have trainable == True.
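A minimal sketch of the procedure (EfficientNetB0 serves only as an example backbone):

import tensorflow as tf

backbone = tf.keras.applications.EfficientNetB0(include_top=False, pooling="avg")
backbone.trainable = False  # freeze the whole backbone

inputs = tf.keras.layers.Input([224, 224, 3])
outputs = tf.keras.layers.Dense(10, activation="softmax")(backbone(inputs))
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# ... train only the classification head using model.fit ...

backbone.trainable = True  # unfreeze for finetuning
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")  # recompile!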
How to choose whether dropout/batch normalization is executed in training or inference regime?
When calling a tf.keras.layers.Layer/tf.keras.Model, a named option training can be specified, indicating whether the training or inference regime should be used. For a model, this option is automatically passed to its layers which require it, and Keras passes it automatically during model.{fit,evaluate,predict}.
However, you can manually pass for example training=False to a layer when using the Functional API, meaning that the layer is executed in the inference regime even when the whole model is training.
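A sketch of this setup (the backbone is again only an example); note that training=False affects only layers like batch normalization and dropout, not whether gradients are computed:

import tensorflow as tf

backbone = tf.keras.applications.EfficientNetB0(include_top=False, pooling="avg")

inputs = tf.keras.layers.Input([224, 224, 3])
features = backbone(inputs, training=False)  # BatchNorms use moving statistics even during fit
outputs = tf.keras.layers.Dense(10, activation="softmax")(features)
model = tf.keras.Model(inputs=inputs, outputs=outputs)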
How do trainable and training interact?
The only layer influenced by both of these options is batch normalization, for which:
 if trainable == False, the layer is always executed in the inference regime;
 if trainable == True, the training/inference regime is chosen according to the training option.
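A small demonstration of the interaction on a BatchNormalization layer:

import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
inputs = tf.random.normal([8, 4])

bn.trainable = False
hidden = bn(inputs, training=True)  # inference regime anyway: moving statistics are used

bn.trainable = True
hidden = bn(inputs, training=True)  # training regime: batch statistics are used and updated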
TensorBoard

How to create TensorBoard logs manually?
Start by creating a SummaryWriter, using for example:
writer = tf.summary.create_file_writer(args.logdir, flush_millis=10 * 1000)
and then you can generate logs inside a with writer.as_default() block.
You can either specify step manually in each call, or you can set it as the first argument of as_default(). Also, during training you usually want to log only some batches, so the logging block during training usually looks like:
if optimizer.iterations % 100 == 0:
    with self._writer.as_default(step=optimizer.iterations):
        # logging
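A self-contained sketch of manual logging (the log directory and the logged quantity are placeholders):

import tensorflow as tf

writer = tf.summary.create_file_writer("logs/manual", flush_millis=10 * 1000)

for step in range(1000):
    loss = 1.0 / (step + 1)  # stands in for a real training loss
    if step % 100 == 0:
        with writer.as_default(step=step):
            tf.summary.scalar("train/loss", loss)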

What can be logged in TensorBoard?
 scalar values:
tf.summary.scalar(name like "train/loss", value, [step])
 tensor values displayed as histograms or distributions:
tf.summary.histogram(name like "train/output_layer", tensor value castable to tf.float64, [step])
 images as tensors with shape [num_images, h, w, channels], where channels can be 1 (grayscale), 2 (grayscale + alpha), 3 (RGB), or 4 (RGBA):
tf.summary.image(name like "train/samples", images, [step], [max_outputs=at most this many images])
 possibly large amounts of text (e.g., all hyperparameter values, sample translations in MT, …) in Markdown format:
tf.summary.text(name like "hyperparameters", markdown, [step])
 audio as tensors with shape [num_clips, samples, channels] and values in the $[-1,1]$ range:
tf.summary.audio(name like "train/samples", clips, sample_rate, [step], [max_outputs=at most this many clips])
Requirements
To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that all surplus points (both bonus and non-bonus) will be transferred to the exam. In total, assignments for at least 120 points (not including the bonus points) will be available, and if you solve all the assignments, you obtain 50 additional bonus points.
To pass the exam, you need to obtain at least 60, 75, or 90 points out of a 100-point exam to receive a grade 3, 2, or 1, respectively. (PhD students with binary grades require 75 points.) The exam consists of questions worth 100 points in total, drawn from the list below (the questions are randomly generated, but in such a way that there is at least one question from every lecture). In addition, you can get surplus points from the practicals and at most 10 points for community work (e.g., fixing slides or reporting issues) – but only the points you already have at the time of the exam count.
Exam Questions
Lecture 1 Questions

Considering a neural network with $D$ input neurons, a single hidden layer with $H$ neurons, $K$ output neurons, hidden activation $f$ and output activation $a$, list its parameters (including their size) and write down how the output is computed. [5]

List the definitions of frequently used MLP output layer activations (the ones producing parameters of a Bernoulli distribution and a categorical distribution). Then write down three commonly used hidden layer activations (sigmoid, tanh, ReLU). [5]

Formulate the Universal approximation theorem. [5]
Lecture 2 Questions

Describe maximum likelihood estimation, as minimizing NLL, cross-entropy and KL divergence. [10]

Define mean squared error and show how it can be derived using MLE. [5]

Describe gradient descent and compare it to stochastic (i.e., online) gradient descent and minibatch stochastic gradient descent. [5]

Formulate conditions on the sequence of learning rates used in SGD to converge to optimum almost surely. [5]

Write down the backpropagation algorithm. [5]

Write down the minibatch SGD algorithm with momentum. Then, formulate SGD with Nesterov momentum and show the difference between them. [5]

Write down the AdaGrad algorithm and show that it tends to internally decay learning rate by a factor of $1/\sqrt{t}$ in step $t$. Then write down the RMSProp algorithm and explain how it solves the problem with the involuntary learning rate decay. [10]

Write down the Adam algorithm. Then show why the bias-correction terms $(1-\beta^t)$ make the estimation of the first and second moments unbiased. [10]
Lecture 3 Questions

Considering a neural network with $D$ input neurons, a single ReLU hidden layer with $H$ units and softmax output layer with $K$ units, write down the formulas of the gradient of all the MLP parameters (two weight matrices and two bias vectors), assuming input $\boldsymbol x$, target $g$ and negative log likelihood loss. [10]

Assume a network with MSE loss generated a single output $o \in \mathbb{R}$, and the target output is $g$. What is the value of the loss function itself, and what is the gradient of the loss function with respect to $o$? [5]

Assume a network with cross-entropy loss generated a single output $z \in \mathbb{R}$, which is passed through the sigmoid output activation function, producing $o = \sigma(z)$. If the target output is $g$, what is the value of the loss function itself, and what is the gradient of the loss function with respect to $z$? [5]

Assume a network with cross-entropy loss generated a $K$-element output $\boldsymbol z \in \mathbb{R}^K$, which is passed through the softmax output activation function, producing $\boldsymbol o=\operatorname{softmax}(\boldsymbol z)$. If the target distribution is $\boldsymbol g$, what is the value of the loss function itself, and what is the gradient of the loss function with respect to $\boldsymbol z$? [5]

Define $L_2$ regularization and describe its effect both on the value of the loss function and on the value of the loss function gradient. [5]

Describe the dropout method and write down exactly how it is used during training and during inference. [5]

Describe how label smoothing works for cross-entropy loss, both for sigmoid and softmax activations. [5]

How are weights and biases initialized using the default Glorot initialization? [5]
Lecture 4 Questions

Write down the equation of how convolution of a given image is computed. Assume the input is an image $I$ of size $H \times W$ with $C$ channels, the kernel $K$ has size $N \times M$, the stride is $T \times S$, the operation performed is in fact cross-correlation (as usual in convolutional neural networks) and that $O$ output channels are computed. [5]

Explain both SAME and VALID padding schemes and write down the output size of a convolutional operation with an $N \times M$ kernel on an image of size $H \times W$ for both these padding schemes (stride is 1). [5]
Describe batch normalization and write down an algorithm how it is used during training and an algorithm how it is used during inference. Be sure to explicitly write over what is being normalized in case of fully connected layers, and in case of convolutional layers. [10]

Describe the overall architecture of VGG-19 (you do not need to remember the exact number of layers/filters, but you should describe which layers are used). [5]
Lecture 5 Questions

Describe the overall architecture of ResNet. You do not need to remember the exact number of layers/filters, but you should draw a bottleneck block (including the applications of BatchNorms and ReLUs) and state how residual connections work when the number of channels increases. [10]

Draw the original ResNet block (including the exact positions of BatchNorms and ReLUs) and also the improved variant with full preactivation. [5]

Compare the bottleneck block of ResNet and ResNeXt architectures (draw the latter using convolutions only, i.e., do not use grouped convolutions). [5]

Describe the CNN regularization method of networks with stochastic depth. [5]

Compare Cutout and DropBlock. [5]

Describe Squeeze and Excitation applied to a ResNet block. [5]

Draw the Mobile inverted bottleneck block (including an explanation of separable convolutions, the expansion factor, exact positions of BatchNorms and ReLUs, but without describing Squeeze and Excitation blocks). [5]

Assume an input image $I$ of size $H \times W$ with $C$ channels, and a convolutional kernel $K$ with size $N \times M$, stride $S$ and $O$ output channels. Write down (or derive) the equation of transposed convolution (or equivalently backpropagation through a convolution to its inputs). [5]
Lecture 7 Questions

Write down how $\mathit{AP}_{50}$ is computed. [5]

Considering a Fast R-CNN architecture, draw the overall network architecture, explain what a RoI pooling layer is, show how the network parametrizes bounding boxes and write down the loss. Finally, describe non-maximum suppression and how the Fast R-CNN prediction is performed. [10]

Considering a Faster R-CNN architecture, describe the region proposal network (what are anchors, architecture including both heads, how are the coordinates of proposals parametrized, what does the loss look like). [10]

Considering the Mask R-CNN architecture, describe the additions to the Faster R-CNN architecture (the RoIAlign layer, the new mask-producing head). [5]

Write down the focal loss with class weighting, including the commonly used hyperparameter values. [5]

Draw the overall architecture of RetinaNet (the FPN architecture including the block combining feature maps of different resolutions; the classification and bounding box generation heads, including their output size). [5]

Draw the BiFPN block architecture, including the positions of all convolutions (and what kind of convolution is used), BatchNorms and ReLUs. Finally, describe how downscaling and upscaling are performed. [5]
Lecture 8 Questions

Write down how the Long Short-Term Memory (LSTM) cell operates, including the explicit formulas. Also mention the forget gate bias. [10]

Write down how the Gated Recurrent Unit (GRU) operates, including the explicit formulas. [10]

Describe Highway network computation. [5]

Why cannot the usual dropout be used on the recurrent state? Describe how the problem can be alleviated with variational dropout. [5]

Describe layer normalization and write down an algorithm how it is used during training and an algorithm how it is used during inference. [5]

Sketch a tagger architecture utilizing word embeddings, recurrent character-level word embeddings and two sentence-level bidirectional RNNs with a residual connection. [10]
Lecture 9 Questions

Considering a linear-chain CRF, write down how a score of a label sequence $\boldsymbol y$ is defined, and how a log probability can be computed using the label sequence scores. [5]

Write down the dynamic programming algorithm for computing the log probability of a linear-chain CRF, including its asymptotic complexity. [10]

Write down the dynamic programming algorithm for linear-chain CRF decoding, i.e., an algorithm computing the most probable label sequence $\boldsymbol y$. [10]

In the context of CTC loss, describe regular and extended labelings and write down an algorithm for computing the log probability of a gold label sequence $\boldsymbol y$. [10]

Describe how CTC predictions are performed using a beam search. [5]

Draw the CBOW architecture from word2vec, including the sizes of the inputs and the sizes of the outputs and the used non-linearities. Also make sure to indicate where the embeddings are being trained. [5]
Draw the Skip-gram architecture from word2vec, including the sizes of the inputs and the sizes of the outputs and the used non-linearities. Also make sure to indicate where the embeddings are being trained. [5]
Describe the hierarchical softmax used in word2vec. [5]
Describe the negative sampling proposed in word2vec, including the choice of the distribution of negative samples. [5]
Lecture 10 Questions

Draw a sequence-to-sequence architecture for machine translation, both during training and during inference (without attention). [5]

Draw a sequence-to-sequence architecture for machine translation used during training, including the attention. Then write down how exactly the attention is computed. [10]

Explain how word embedding tying can be used in a sequence-to-sequence architecture. [5]

Write down why subword units are used in text processing, and describe the BPE algorithm for constructing a subword dictionary from a large corpus. [5]

Write down why subword units are used in text processing, and describe the WordPieces algorithm for constructing a subword dictionary from a large corpus. [5]

Pinpoint the differences between the BPE and WordPieces algorithms, both during dictionary construction and during inference. [5]
Lecture 11 Questions

Describe the Transformer encoder architecture, including the description of self-attention (but you do not need to describe multi-head attention), FFN and the positions of LNs and dropouts. [10]

Write down the formula of Transformer self-attention, and then describe multi-head self-attention in detail. [10]

Describe the Transformer decoder architecture, including the description of self-attention and masked self-attention (but you do not need to describe multi-head attention), FFN and the positions of LNs and dropouts. Also discuss the difference between the training and prediction regimes. [10]

Why are positional embeddings needed in the Transformer architecture? Write down the sinusoidal positional embeddings used in the Transformer. [5]

Compare RNN to Transformer – what are the strengths and weaknesses of these architectures? [5]

Explain how ELMo embeddings are trained and how they are used in downstream applications. [5]

Describe the BERT architecture (you do not need to describe the (multi-head) self-attention operation). Elaborate also on which positional embeddings are used and what the GELU activations are. [10]

Describe the GELU activations and explain why they are a combination of ReLUs and Dropout. [5]

Elaborate on the BERT training process (what two objectives are used and how exactly the corresponding losses are computed). [10]

What alternatives to Next Sentence Prediction are proposed in RoBERTa and in ALBERT? [5]
Lecture 12 Questions

Write down the variational lower bound (ELBO) in the form of a reconstruction error minus the KL divergence between the encoder and the prior. Then prove it is actually a lower bound on the log probability $\log P(\boldsymbol x)$ (you can use Jensen's inequality if you want). [10]

Draw an architecture of a variational autoencoder (VAE). Pay attention to the parametrization of the distribution from the encoder (including the used activation functions), and show how to perform latent variable sampling so that it is differentiable with respect to the encoder parameters (the reparametrization trick). [10]

Write down the min-max formulation of the generative adversarial network (GAN) objective. Then describe what loss is actually used for training the generator in order to avoid vanishing gradients at the beginning of the training. [5]

Write down the training algorithm of generative adversarial networks (GAN), including the losses minimized by the discriminator and the generator. Be sure to use the version of generator loss which avoids vanishing gradients at the beginning of the training. [10]

Explain how the class label is used when training a conditional generative adversarial network (CGAN). [5]

Illustrate that alternating SGD steps are not guaranteed to converge for a min-max problem. [5]
Lecture 13 Questions

Show how to incrementally update a running average (i.e., how to compute an average of $N$ numbers using the average of the first $N-1$ numbers). [5]

Describe the multi-armed bandits problem and write down the $\epsilon$-greedy algorithm for solving it. [5]

Define the Markov Decision Process, including the definition of the return. [5]

Define the value function, such that all expectations are over simple random variables (actions, states, rewards), not trajectories. [5]

Define the action-value function, such that all expectations are over simple random variables (actions, states, rewards), not trajectories. [5]

Express the value function using the action-value function, and express the action-value function using the value function. [5]

Define the optimal value function and the optimal action-value function. Then define the optimal policy in such a way that its existence is guaranteed. [5]

Write down the Monte Carlo on-policy every-visit $\epsilon$-soft algorithm. [10]

Formulate the policy gradient theorem. [5]

Prove the part of the policy gradient theorem showing the value of $\nabla_{\boldsymbol\theta} v_\pi(s)$. [10]

Assuming the policy gradient theorem, formulate the loss used by the REINFORCE algorithm and show how its gradient can be expressed as an expectation over states and actions. [5]

Write down the REINFORCE algorithm. [10]

Show that introducing a baseline does not influence the validity of the policy gradient theorem. [5]

Write down the REINFORCE with baseline algorithm. [10]
Lecture 14 Questions

Sketch the overall structure and training procedure of the Neural Architecture Search. You do not need to describe how exactly the block is produced by the controller. [5]

Draw the WaveNet architecture (show the overall architecture, explain dilated convolutions, write down the gated activations, describe global and local conditioning). [10]

Define the Mixture of Logistic distribution used in the Teacher model of Parallel WaveNet, including the explicit formula of computing the likelihood of the data. [5]

Describe the changes in the Student model of Parallel WaveNet which allow efficient sampling (what the latent prior looks like, how the output data distribution is modeled in a single iteration and then after multiple iterations). [5]

Describe the addressing mechanism used in Neural Turing Machines – show the overall structure including the required parameters, and explain content addressing, interpolation with location addressing, shifting and sharpening. [10]

Explain the overall architecture of a Neural Turing Machine with an LSTM controller, assuming $R$ reading heads and one write head. Describe the inputs and outputs of the LSTM controller itself, then how the memory is read from and written to, and how the final output is computed. You do not need to write down the implementation of the addressing mechanism (you can assume it is a function which gets parameters, memory and previous distribution, and computes a new distribution over memory cells). [10]