Deep Learning – Summer 2020/21

In recent years, deep neural networks have been used to solve complex machine-learning problems. They have achieved significant state-of-the-art results in many areas.

The goal of the course is to introduce deep neural networks, from the basics to the latest advances. The course focuses both on theory and on practical aspects (students will implement and train several deep neural networks capable of achieving state-of-the-art results, for example in image classification, object detection, lemmatization, speech recognition or 3D object recognition). No previous knowledge of artificial neural networks is required, but a basic understanding of machine learning is advisable.

About

SIS code: NPFL114
Semester: summer
E-credits: 7
Examination: 3/2 C+Ex
Guarantor: Milan Straka

Timespace Coordinates

  • lectures: Czech lecture is held on Monday 9:50 in S5, English lecture on Monday 13:10 in S5; first lecture is on Mar 1
  • practicals: there are two parallel practicals, a Czech one on Tuesday 10:40 in S9, and an English one on Tuesday 9:00 in S9; first practicals are on Mar 2
  • consultations: voluntary consultations regarding the assignments or other issues are held regularly on Tuesday 14:00 in SU1

All lectures and practicals will be recorded and available on this website.

Given the pandemic situation, all lectures and practicals are currently held online.


Lectures

1. Introduction to Deep Learning Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions numpy_entropy pca_first mnist_layers_activations

2. Training Neural Networks Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions sgd_backpropagation sgd_manual mnist_training gym_cartpole

3. Training Neural Networks II Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions mnist_regularization mnist_ensemble uppercase

4. Convolutional Neural Networks Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions mnist_cnn image_augmentation tf_dataset mnist_multiple cifar_competition

5. Convolutional Neural Networks II Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions cnn_manual cags_classification

6. Easter Monday CZ Practicals EN Practicals EN Consultations mnist_web cags_segmentation 3d_recognition

7. Object Detection Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions bboxes_utils svhn_competition

8. Recurrent Neural Networks Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals EN Consultations Questions sequence_classification tagger_we tagger_cle tagger_competition

9. CRF, CTC, Word2Vec Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions tensorboard_projector tagger_crf speech_recognition

10. Seq2seq, NMT, Transformer Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions tagger_crf_manual lemmatizer_noattn lemmatizer_attn lemmatizer_competition

11. Transformer, BERT Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions tagger_transformer sentiment_analysis reading_comprehension

12. Deep Generative Models Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions vae gan dcgan

13. Introduction to Deep Reinforcement Learning Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions monte_carlo reinforce reinforce_baseline reinforce_pixels

14. NASNet, Speech Synthesis, External Memory Networks Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions learning_to_learn

The lecture content, including references to study materials, is listed below. The main study material is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (referred to as DLB).

References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.

1. Introduction to Deep Learning

 Mar 01 Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions numpy_entropy pca_first mnist_layers_activations

  • Random variables, probability distributions, expectation, variance, Bernoulli distribution, Categorical distribution [Sections 3.2, 3.3, 3.8, 3.9.1 and 3.9.2 of DLB]
  • Self-information, entropy, cross-entropy, KL-divergence [Section 3.13 of DLB]
  • Gaussian distribution [Section 3.9.3 of DLB]
  • Machine Learning Basics [Sections 5.1-5.1.3 of DLB]
  • History of Deep Learning [Section 1.2 of DLB]
  • Linear regression [Section 5.1.4 of DLB]
  • Challenges Motivating Deep Learning [Section 5.11 of DLB]
  • Neural network basics
    • Neural networks as graphs [Chapter 6 before Section 6.1 of DLB]
    • Output activation functions [Section 6.2.2 of DLB, excluding Section 6.2.2.4]
    • Hidden activation functions [Section 6.3 of DLB, excluding Section 6.3.3]
    • Basic network architectures [Section 6.4 of DLB, excluding Section 6.4.2]
  • Universal approximation theorem

2. Training Neural Networks

 Mar 08 Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions sgd_backpropagation sgd_manual mnist_training gym_cartpole

  • Capacity, overfitting, underfitting, regularization [Section 5.2 of DLB]
  • Hyperparameters and validation sets [Section 5.3 of DLB]
  • Maximum Likelihood Estimation [Section 5.5 of DLB]
  • Neural network training
    • Gradient Descent and Stochastic Gradient Descent [Sections 4.3 and 5.9 of DLB]
    • Backpropagation algorithm [Sections 6.5 to 6.5.3 of DLB, especially Algorithms 6.1 and 6.2; note that Algorithms 6.5 and 6.6 are used in practice]
    • SGD algorithm [Section 8.3.1 and Algorithm 8.1 of DLB]
    • SGD with Momentum algorithm [Section 8.3.2 and Algorithm 8.2 of DLB]
    • SGD with Nesterov Momentum algorithm [Section 8.3.3 and Algorithm 8.3 of DLB]
    • Optimization algorithms with adaptive gradients
      • AdaGrad algorithm [Section 8.5.1 and Algorithm 8.4 of DLB]
      • RMSProp algorithm [Section 8.5.2 and Algorithm 8.5 of DLB]
      • Adam algorithm [Section 8.5.3 and Algorithm 8.7 of DLB]

3. Training Neural Networks II

 Mar 15 Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions mnist_regularization mnist_ensemble uppercase

  • Softmax with NLL (negative log likelihood) as a loss function [Section 6.2.2.3 of DLB, notably equation (6.30); plus slides 10-12]
  • Regularization [Chapter 7 until Section 7.1 of DLB]
    • Early stopping [Section 7.8 of DLB, without the How early stopping acts as a regularizer part]
    • L2 and L1 regularization [Sections 7.1 and 5.6.1 of DLB; plus slides 17-18]
    • Dataset augmentation [Section 7.4 of DLB]
    • Ensembling [Section 7.11 of DLB]
    • Dropout [Section 7.12 of DLB]
    • Label smoothing [Section 7.5.1 of DLB]
  • Saturating non-linearities [Section 6.3.2 and second half of Section 6.2.2.2 of DLB]
  • Parameter initialization strategies [Section 8.4 of DLB]
  • Gradient clipping [Section 10.11.1 of DLB]

4. Convolutional Neural Networks

 Mar 22 Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions mnist_cnn image_augmentation tf_dataset mnist_multiple cifar_competition

5. Convolutional Neural Networks II

 Mar 29 Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions cnn_manual cags_classification

6. Easter Monday

 Apr 05 CZ Practicals EN Practicals EN Consultations mnist_web cags_segmentation 3d_recognition

7. Object Detection

 Apr 12 Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions bboxes_utils svhn_competition

8. Recurrent Neural Networks

 Apr 19 Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals EN Consultations Questions sequence_classification tagger_we tagger_cle tagger_competition

9. CRF, CTC, Word2Vec

 Apr 26 Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions tensorboard_projector tagger_crf speech_recognition

10. Seq2seq, NMT, Transformer

 May 03 Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions tagger_crf_manual lemmatizer_noattn lemmatizer_attn lemmatizer_competition

11. Transformer, BERT

 May 10 Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions tagger_transformer sentiment_analysis reading_comprehension

12. Deep Generative Models

 May 17 Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions vae gan dcgan

13. Introduction to Deep Reinforcement Learning

 May 24 Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions monte_carlo reinforce reinforce_baseline reinforce_pixels

The study material for Reinforcement Learning is Reinforcement Learning: An Introduction, second edition, by Richard S. Sutton and Andrew G. Barto (referred to as RLB), available online.

  • Multi-armed bandits [Sections 2-2.4 of RLB]
  • Markov Decision Process [Sections 3-3.3 of RLB]
  • Policies and Value Functions [Section 3.5 of RLB]
  • Monte Carlo Methods [Sections 5-5.4 of RLB]
  • Policy Gradient Methods [Sections 13-13.1 of RLB]
  • Policy Gradient Theorem [Section 13.2 of RLB]
  • REINFORCE algorithm [Section 13.3 of RLB]
  • REINFORCE with baseline algorithm [Section 13.4 of RLB]

14. NASNet, Speech Synthesis, External Memory Networks

 May 31 Slides PDF Slides CZ Lecture EN Lecture CZ Practicals EN Practicals Questions learning_to_learn

Requirements

To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that all surplus points (both bonus and non-bonus) will be transferred to the exam. In total, assignments for at least 120 points (not including the bonus points) will be available, and if you solve all the assignments, you obtain an additional 50 bonus points.

Environment

The tasks are evaluated automatically using the ReCodEx Code Examiner.

The evaluation is performed using Python 3.8, TensorFlow 2.4.1, TensorFlow Addons 0.12.1, TensorFlow Probability 0.12.1, TensorFlow Hub 0.11.0 and OpenAI Gym 0.18.0. You should install the exact versions of these packages yourself.

Teamwork

Solving assignments in teams of size 2 or 3 is encouraged, but everyone has to participate (it is forbidden not to work on an assignment and then submit a solution created by other team members). All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. To allow plagiarism detection, each such solution must explicitly list all members of the team using this template.

No Cheating

Cheating is strictly prohibited and any student found cheating will be punished. The punishment can involve failing the whole course, or, in grave cases, being expelled from the faculty. While discussing assignments with any classmate is fine, each team must complete the assignments by itself, without using code it did not write (unless explicitly allowed). Of course, inside a team you are expected to share code and submit identical solutions.

numpy_entropy

 Deadline: Mar 15, 23:59  3 points

The goal of this exercise is to familiarize yourself with Python, NumPy and the ReCodEx submission system. Start with the numpy_entropy.py template.

Load a file numpy_entropy_data.txt, whose lines consist of data points of our dataset, and load numpy_entropy_model.txt, which describes a model probability distribution, with each line being a tab-separated pair of (data point, probability).

Then compute the following quantities using NumPy, and print each of them on a separate line, rounded to two decimal places (or inf for positive infinity, which happens when an element of the data distribution has zero probability under the model distribution):

  • entropy H(data distribution)
  • cross-entropy H(data distribution, model distribution)
  • KL-divergence D_KL(data distribution || model distribution)

Use natural logarithms to compute the entropies and the divergence.

For data distribution file numpy_entropy_data.txt

A
BB
A
A
BB
A
CCC

and model distribution file numpy_entropy_model.txt

A	0.5
BB	0.3
CCC	0.1
D	0.1

the output should be

Entropy: 0.96 nats
Crossentropy: 1.07 nats
KL divergence: 0.11 nats

If we remove the CCC 0.1 line from the model distribution, the output should change to

Entropy: 0.96 nats
Crossentropy: inf nats
KL divergence: inf nats
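
As an illustration of the three quantities, here is a minimal NumPy sketch that reproduces the example above. It inlines the data and the model instead of reading the two files, and the helper name entropies is an assumption made for this sketch, not something taken from the template.

```python
import numpy as np


def entropies(data_points, model):
    """Return (entropy, cross-entropy, KL divergence) in nats.

    data_points: list of observed values; model: dict value -> probability.
    """
    # Empirical data distribution: relative frequency of each distinct value.
    values, counts = np.unique(data_points, return_counts=True)
    data_probs = counts / counts.sum()

    # Model probability of each observed value; 0 when the model omits it.
    model_probs = np.array([model.get(str(v), 0.0) for v in values])

    entropy = -np.sum(data_probs * np.log(data_probs))
    # log(0) = -inf is the intended result here, so silence the warning.
    with np.errstate(divide="ignore"):
        cross_entropy = -np.sum(data_probs * np.log(model_probs))
    # KL divergence is the gap between cross-entropy and entropy.
    return entropy, cross_entropy, cross_entropy - entropy


data = ["A", "BB", "A", "A", "BB", "A", "CCC"]
model = {"A": 0.5, "BB": 0.3, "CCC": 0.1, "D": 0.1}
h, ce, kl = entropies(data, model)
print("Entropy: {:.2f} nats".format(h))
print("Crossentropy: {:.2f} nats".format(ce))
print("KL divergence: {:.2f} nats".format(kl))
```

Only values that actually occur in the data contribute to the sums, which is why the model entry for D has no effect on the result.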

pca_first

 Deadline: Mar 15, 23:59  2 points

The goal of this exercise is to familiarize yourself with TensorFlow tf.Tensors, shapes and basic tensor manipulation methods. Start with the pca_first.py template.

In this assignment, you will compute the covariance matrix of several examples from the MNIST dataset, compute the first principal component and quantify its explained variance.

It is fine if you are not familiar with terms like covariance matrix or principal component – the template contains a detailed description of what you have to do.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 pca_first.py --examples=1024 --iterations=64
Total variance: 53.12
Explained variance: 9.64
  • python3 pca_first.py --examples=8192 --iterations=128
Total variance: 53.05
Explained variance: 9.89
  • python3 pca_first.py --examples=55000 --iterations=1024
Total variance: 52.74
Explained variance: 9.71
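
For readers unfamiliar with the terms, the following is a rough NumPy sketch of the computation on synthetic data. It assumes that the --iterations argument refers to power iteration and that explained variance is the leading eigenvalue expressed as a percentage of the total variance; both are assumptions, and the template remains the authoritative description. The function name and the synthetic data are made up for illustration, and the assignment itself uses TensorFlow tensors rather than NumPy arrays.

```python
import numpy as np


def first_principal_component(data, iterations):
    """Estimate the leading eigenpair of the covariance of `data` (rows = examples)."""
    # Center the data and form the covariance matrix (features x features).
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / data.shape[0]

    # Total variance is the sum of per-feature variances, i.e. the trace.
    total_variance = np.trace(cov)

    # Power iteration: repeatedly multiply by cov and renormalize.
    # The norm before renormalizing converges to the largest eigenvalue.
    v = np.ones(cov.shape[0])
    for _ in range(iterations):
        v = cov @ v
        eigenvalue = np.linalg.norm(v)
        v = v / eigenvalue

    return total_variance, eigenvalue, v


rng = np.random.default_rng(42)
# Synthetic data with most of its variance along the first axis.
data = rng.normal(size=(1000, 5)) * np.array([3.0, 1.0, 1.0, 0.5, 0.5])
total, top, direction = first_principal_component(data, iterations=64)
print("Total variance: {:.2f}".format(total))
print("Explained variance: {:.2f}%".format(100 * top / total))
```

Power iteration needs only matrix-vector products, which makes it a natural fit for practicing basic tensor operations.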

mnist_layers_activations

 Deadline: Mar 15, 23:59  2 points

Before solving the assignment, start by playing with example_keras_tensorboard.py, in order to familiarize yourself with TensorFlow and TensorBoard. Run it, and when it finishes, run TensorBoard using tensorboard --logdir logs. Then open http://localhost:6006 in a browser and explore the active tabs.

Your goal is to modify the mnist_layers_activations.py template and implement the following:

  • A number of hidden layers (including zero) can be specified on the command line using parameter hidden_layers.
  • Activation function of these hidden layers can be also specified as a command line parameter activation, with supported values of none, relu, tanh and sigmoid.
  • Print the final accuracy on the test set.
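
The none activation deserves a short note: without a non-linearity, a hidden layer followed by the output layer still computes a single affine map, so it adds no representational power. The NumPy sketch below checks this on random weights. The hidden layer size of 100 and the batch size of 32 are assumptions chosen for illustration, not values taken from the template.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: flattened 28x28 image, one hidden layer, 10 classes.
inputs, hidden, classes = 784, 100, 10
W1, b1 = rng.normal(size=(inputs, hidden)), rng.normal(size=hidden)
W2, b2 = rng.normal(size=(hidden, classes)), rng.normal(size=classes)

x = rng.normal(size=(32, inputs))  # a batch of 32 random "images"

# Two stacked layers with no activation in between ...
two_layers = (x @ W1 + b1) @ W2 + b2

# ... equal one linear layer with merged weights and bias.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))
```

This is consistent with the sample runs in this section, where one hidden layer with the none activation reaches about the same test accuracy as no hidden layer at all.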

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 mnist_layers_activations.py --hidden_layers=0 --activation=none
Epoch  1/10 loss: 0.8272 - accuracy: 0.7869 - val_loss: 0.2755 - val_accuracy: 0.9308
Epoch  2/10 loss: 0.3328 - accuracy: 0.9089 - val_loss: 0.2419 - val_accuracy: 0.9342
Epoch  3/10 loss: 0.2995 - accuracy: 0.9165 - val_loss: 0.2269 - val_accuracy: 0.9392
Epoch  4/10 loss: 0.2886 - accuracy: 0.9197 - val_loss: 0.2219 - val_accuracy: 0.9414
Epoch  5/10 loss: 0.2778 - accuracy: 0.9222 - val_loss: 0.2202 - val_accuracy: 0.9430
Epoch  6/10 loss: 0.2745 - accuracy: 0.9234 - val_loss: 0.2171 - val_accuracy: 0.9416
Epoch  7/10 loss: 0.2669 - accuracy: 0.9246 - val_loss: 0.2152 - val_accuracy: 0.9420
Epoch  8/10 loss: 0.2615 - accuracy: 0.9263 - val_loss: 0.2159 - val_accuracy: 0.9424
Epoch  9/10 loss: 0.2561 - accuracy: 0.9280 - val_loss: 0.2156 - val_accuracy: 0.9404
Epoch 10/10 loss: 0.2596 - accuracy: 0.9270 - val_loss: 0.2146 - val_accuracy: 0.9434
loss: 0.2637 - accuracy: 0.9259
  • python3 mnist_layers_activations.py --hidden_layers=1 --activation=none
Epoch  1/10 loss: 0.5384 - accuracy: 0.8430 - val_loss: 0.2438 - val_accuracy: 0.9350
Epoch  2/10 loss: 0.2951 - accuracy: 0.9166 - val_loss: 0.2332 - val_accuracy: 0.9350
Epoch  3/10 loss: 0.2816 - accuracy: 0.9217 - val_loss: 0.2359 - val_accuracy: 0.9306
Epoch  4/10 loss: 0.2808 - accuracy: 0.9225 - val_loss: 0.2283 - val_accuracy: 0.9384
Epoch  5/10 loss: 0.2705 - accuracy: 0.9227 - val_loss: 0.2341 - val_accuracy: 0.9370
Epoch  6/10 loss: 0.2718 - accuracy: 0.9234 - val_loss: 0.2333 - val_accuracy: 0.9388
Epoch  7/10 loss: 0.2669 - accuracy: 0.9253 - val_loss: 0.2223 - val_accuracy: 0.9412
Epoch  8/10 loss: 0.2595 - accuracy: 0.9281 - val_loss: 0.2471 - val_accuracy: 0.9342
Epoch  9/10 loss: 0.2573 - accuracy: 0.9270 - val_loss: 0.2293 - val_accuracy: 0.9368
Epoch 10/10 loss: 0.2615 - accuracy: 0.9264 - val_loss: 0.2318 - val_accuracy: 0.9400
loss: 0.2795 - accuracy: 0.9241
  • python3 mnist_layers_activations.py --hidden_layers=1 --activation=relu
Epoch  1/10 loss: 0.5379 - accuracy: 0.8500 - val_loss: 0.1459 - val_accuracy: 0.9612
Epoch  2/10 loss: 0.1563 - accuracy: 0.9553 - val_loss: 0.1128 - val_accuracy: 0.9682
Epoch  3/10 loss: 0.1052 - accuracy: 0.9697 - val_loss: 0.0966 - val_accuracy: 0.9714
Epoch  4/10 loss: 0.0792 - accuracy: 0.9765 - val_loss: 0.0864 - val_accuracy: 0.9744
Epoch  5/10 loss: 0.0627 - accuracy: 0.9814 - val_loss: 0.0818 - val_accuracy: 0.9768
Epoch  6/10 loss: 0.0500 - accuracy: 0.9857 - val_loss: 0.0829 - val_accuracy: 0.9772
Epoch  7/10 loss: 0.0394 - accuracy: 0.9881 - val_loss: 0.0747 - val_accuracy: 0.9792
Epoch  8/10 loss: 0.0328 - accuracy: 0.9905 - val_loss: 0.0746 - val_accuracy: 0.9788
Epoch  9/10 loss: 0.0239 - accuracy: 0.9934 - val_loss: 0.0845 - val_accuracy: 0.9762
Epoch 10/10 loss: 0.0231 - accuracy: 0.9936 - val_loss: 0.0806 - val_accuracy: 0.9778
loss: 0.0829 - accuracy: 0.9773
  • python3 mnist_layers_activations.py --hidden_layers=1 --activation=tanh
Epoch  1/10 loss: 0.5338 - accuracy: 0.8483 - val_loss: 0.1668 - val_accuracy: 0.9570
Epoch  2/10 loss: 0.1855 - accuracy: 0.9478 - val_loss: 0.1262 - val_accuracy: 0.9648
Epoch  3/10 loss: 0.1271 - accuracy: 0.9640 - val_loss: 0.1001 - val_accuracy: 0.9724
Epoch  4/10 loss: 0.0966 - accuracy: 0.9716 - val_loss: 0.0918 - val_accuracy: 0.9738
Epoch  5/10 loss: 0.0742 - accuracy: 0.9784 - val_loss: 0.0813 - val_accuracy: 0.9774
Epoch  6/10 loss: 0.0605 - accuracy: 0.9832 - val_loss: 0.0811 - val_accuracy: 0.9750
Epoch  7/10 loss: 0.0471 - accuracy: 0.9872 - val_loss: 0.0759 - val_accuracy: 0.9774
Epoch  8/10 loss: 0.0385 - accuracy: 0.9902 - val_loss: 0.0761 - val_accuracy: 0.9762
Epoch  9/10 loss: 0.0298 - accuracy: 0.9929 - val_loss: 0.0783 - val_accuracy: 0.9766
Epoch 10/10 loss: 0.0257 - accuracy: 0.9945 - val_loss: 0.0788 - val_accuracy: 0.9744
loss: 0.0822 - accuracy: 0.9751
  • python3 mnist_layers_activations.py --hidden_layers=1 --activation=sigmoid
Epoch  1/10 loss: 0.8219 - accuracy: 0.7952 - val_loss: 0.2150 - val_accuracy: 0.9400
Epoch  2/10 loss: 0.2485 - accuracy: 0.9301 - val_loss: 0.1632 - val_accuracy: 0.9562
Epoch  3/10 loss: 0.1864 - accuracy: 0.9477 - val_loss: 0.1322 - val_accuracy: 0.9636
Epoch  4/10 loss: 0.1513 - accuracy: 0.9560 - val_loss: 0.1163 - val_accuracy: 0.9676
Epoch  5/10 loss: 0.1235 - accuracy: 0.9646 - val_loss: 0.1041 - val_accuracy: 0.9718
Epoch  6/10 loss: 0.1069 - accuracy: 0.9702 - val_loss: 0.0957 - val_accuracy: 0.9722
Epoch  7/10 loss: 0.0889 - accuracy: 0.9746 - val_loss: 0.0887 - val_accuracy: 0.9746
Epoch  8/10 loss: 0.0774 - accuracy: 0.9785 - val_loss: 0.0869 - val_accuracy: 0.9756
Epoch  9/10 loss: 0.0641 - accuracy: 0.9832 - val_loss: 0.0845 - val_accuracy: 0.9760
Epoch 10/10 loss: 0.0594 - accuracy: 0.9842 - val_loss: 0.0805 - val_accuracy: 0.9772
loss: 0.0862 - accuracy: 0.9741
  • python3 mnist_layers_activations.py --hidden_layers=3 --activation=relu
Epoch  1/10 loss: 0.4989 - accuracy: 0.8471 - val_loss: 0.1121 - val_accuracy: 0.9688
Epoch  2/10 loss: 0.1168 - accuracy: 0.9645 - val_loss: 0.1028 - val_accuracy: 0.9692
Epoch  3/10 loss: 0.0784 - accuracy: 0.9756 - val_loss: 0.1176 - val_accuracy: 0.9654
Epoch  4/10 loss: 0.0586 - accuracy: 0.9810 - val_loss: 0.0860 - val_accuracy: 0.9732
Epoch  5/10 loss: 0.0451 - accuracy: 0.9849 - val_loss: 0.0867 - val_accuracy: 0.9778
Epoch  6/10 loss: 0.0398 - accuracy: 0.9869 - val_loss: 0.0884 - val_accuracy: 0.9782
Epoch  7/10 loss: 0.0303 - accuracy: 0.9898 - val_loss: 0.0797 - val_accuracy: 0.9818
Epoch  8/10 loss: 0.0256 - accuracy: 0.9917 - val_loss: 0.0892 - val_accuracy: 0.9796
Epoch  9/10 loss: 0.0218 - accuracy: 0.9930 - val_loss: 0.1074 - val_accuracy: 0.9732
Epoch 10/10 loss: 0.0220 - accuracy: 0.9927 - val_loss: 0.0821 - val_accuracy: 0.9796
loss: 0.0883 - accuracy: 0.9779
  • python3 mnist_layers_activations.py --hidden_layers=10 --activation=relu
Epoch  1/10 loss: 0.6597 - accuracy: 0.7806 - val_loss: 0.1348 - val_accuracy: 0.9622
Epoch  2/10 loss: 0.1533 - accuracy: 0.9561 - val_loss: 0.1172 - val_accuracy: 0.9670
Epoch  3/10 loss: 0.1154 - accuracy: 0.9680 - val_loss: 0.0991 - val_accuracy: 0.9708
Epoch  4/10 loss: 0.0912 - accuracy: 0.9737 - val_loss: 0.1112 - val_accuracy: 0.9704
Epoch  5/10 loss: 0.0758 - accuracy: 0.9795 - val_loss: 0.1060 - val_accuracy: 0.9732
Epoch  6/10 loss: 0.0729 - accuracy: 0.9794 - val_loss: 0.1077 - val_accuracy: 0.9730
Epoch  7/10 loss: 0.0647 - accuracy: 0.9825 - val_loss: 0.0921 - val_accuracy: 0.9734
Epoch  8/10 loss: 0.0554 - accuracy: 0.9845 - val_loss: 0.0994 - val_accuracy: 0.9756
Epoch  9/10 loss: 0.0503 - accuracy: 0.9871 - val_loss: 0.1114 - val_accuracy: 0.9720
Epoch 10/10 loss: 0.0470 - accuracy: 0.9875 - val_loss: 0.1084 - val_accuracy: 0.9740
loss: 0.1119 - accuracy: 0.9736
  • python3 mnist_layers_activations.py --hidden_layers=10 --activation=sigmoid
Epoch  1/10 loss: 2.3115 - accuracy: 0.1026 - val_loss: 1.8614 - val_accuracy: 0.2174
Epoch  2/10 loss: 1.8910 - accuracy: 0.1963 - val_loss: 1.8708 - val_accuracy: 0.2064
Epoch  3/10 loss: 1.8796 - accuracy: 0.1998 - val_loss: 1.8007 - val_accuracy: 0.2030
Epoch  4/10 loss: 1.8249 - accuracy: 0.2047 - val_loss: 1.4527 - val_accuracy: 0.3074
Epoch  5/10 loss: 1.2759 - accuracy: 0.4293 - val_loss: 0.8859 - val_accuracy: 0.6154
Epoch  6/10 loss: 0.9357 - accuracy: 0.5910 - val_loss: 0.8584 - val_accuracy: 0.6884
Epoch  7/10 loss: 0.8281 - accuracy: 0.6777 - val_loss: 0.6917 - val_accuracy: 0.7296
Epoch  8/10 loss: 0.7334 - accuracy: 0.7111 - val_loss: 0.6801 - val_accuracy: 0.7124
Epoch  9/10 loss: 0.7111 - accuracy: 0.7132 - val_loss: 0.7223 - val_accuracy: 0.6916
Epoch 10/10 loss: 0.6875 - accuracy: 0.7243 - val_loss: 0.6183 - val_accuracy: 0.7850
loss: 0.6737 - accuracy: 0.7623

sgd_backpropagation

 Deadline: Mar 22, 23:59  3 points

In this exercise you will learn how to compute gradients using the so-called automatic differentiation, which is implemented by an automated backpropagation algorithm in TensorFlow. You will then perform training by running manually implemented minibatch stochastic gradient descent.

Starting with the sgd_backpropagation.py template, you should:

  • implement a neural network with a single tanh hidden layer and categorical output layer;
  • compute the crossentropy loss;
  • use tf.GradientTape to automatically compute the gradient of the loss with respect to all variables;
  • perform the SGD update.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 sgd_backpropagation.py --batch_size=64 --hidden_layer=20 --learning_rate=0.1
Dev accuracy after epoch 1 is 92.84
Dev accuracy after epoch 2 is 93.86
Dev accuracy after epoch 3 is 94.64
Dev accuracy after epoch 4 is 95.24
Dev accuracy after epoch 5 is 95.26
Test accuracy after epoch 5 is 94.60
  • python3 sgd_backpropagation.py --batch_size=100 --hidden_layer=32 --learning_rate=0.2
Dev accuracy after epoch 1 is 93.66
Dev accuracy after epoch 2 is 95.00
Dev accuracy after epoch 3 is 95.72
Dev accuracy after epoch 4 is 95.80
Dev accuracy after epoch 5 is 96.34
Test accuracy after epoch 5 is 95.31

sgd_manual

 Deadline: Mar 22, 23:59  2 points

The goal in this exercise is to extend your solution to the sgd_backpropagation assignment by manually computing the gradient.

While in this assignment we compute the gradient manually, we will nearly always use automatic differentiation. Therefore, the assignment is more of a mathematical exercise than a real-world application. Furthermore, we will compute the derivatives together on the Mar 16 practicals.

Start with the sgd_manual.py template, which is based on the sgd_backpropagation.py one. Be aware that the two templates each generate a different output file.

In order to check that you do not use automatic differentiation, ReCodEx checks that you do not use tf.GradientTape in your solution.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 sgd_manual.py --batch_size=64 --hidden_layer=20 --learning_rate=0.1
Dev accuracy after epoch 1 is 92.84
Dev accuracy after epoch 2 is 93.86
Dev accuracy after epoch 3 is 94.64
Dev accuracy after epoch 4 is 95.24
Dev accuracy after epoch 5 is 95.26
Test accuracy after epoch 5 is 94.60
  • python3 sgd_manual.py --batch_size=100 --hidden_layer=32 --learning_rate=0.2
Dev accuracy after epoch 1 is 93.66
Dev accuracy after epoch 2 is 95.00
Dev accuracy after epoch 3 is 95.72
Dev accuracy after epoch 4 is 95.80
Dev accuracy after epoch 5 is 96.34
Test accuracy after epoch 5 is 95.31
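
To make the mathematical exercise concrete, here is a hedged NumPy sketch of the manual gradient for a network with one tanh hidden layer and a softmax output trained with cross-entropy. The variable names, layer sizes and random data are assumptions made for illustration; the assignment itself works with TensorFlow tensors and the template's own variables. A finite-difference check at the end is one way to convince yourself that the derivation is right.

```python
import numpy as np


def forward(x, W1, b1, W2, b2):
    """tanh hidden layer followed by a softmax output layer."""
    hidden = np.tanh(x @ W1 + b1)
    logits = hidden @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return hidden, probs


def loss_and_gradients(x, y, W1, b1, W2, b2):
    """Mean cross-entropy over the batch and its gradients w.r.t. all parameters."""
    hidden, probs = forward(x, W1, b1, W2, b2)
    batch = x.shape[0]
    loss = -np.mean(np.log(probs[np.arange(batch), y]))

    # Gradient w.r.t. the logits: (softmax output - one-hot target) / batch size.
    d_logits = probs.copy()
    d_logits[np.arange(batch), y] -= 1
    d_logits /= batch

    dW2 = hidden.T @ d_logits
    db2 = d_logits.sum(axis=0)
    # Backpropagate through tanh, whose derivative is 1 - tanh^2.
    d_hidden_in = (d_logits @ W2.T) * (1 - hidden ** 2)
    dW1 = x.T @ d_hidden_in
    db1 = d_hidden_in.sum(axis=0)
    return loss, (dW1, db1, dW2, db2)


rng = np.random.default_rng(0)
x, y = rng.normal(size=(8, 5)), rng.integers(0, 3, size=8)
params = [rng.normal(size=s) * 0.1 for s in [(5, 4), (4,), (4, 3), (3,)]]

loss, grads = loss_and_gradients(x, y, *params)

# Finite-difference check of a single entry of W1.
eps = 1e-6
shifted = [p.copy() for p in params]
shifted[0][0, 0] += eps
numeric = (loss_and_gradients(x, y, *shifted)[0] - loss) / eps
print(abs(numeric - grads[0][0, 0]) < 1e-4)

# One SGD step with learning rate 0.1.
params = [p - 0.1 * g for p, g in zip(params, grads)]
```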

mnist_training

 Deadline: Mar 22, 23:59  3 points

This exercise should teach you to use different optimizers, learning rates, and learning rate decays. Your goal is to modify the mnist_training.py template and implement the following:

  • Using specified optimizer (either SGD or Adam).
  • Optionally using momentum for the SGD optimizer.
  • Using specified learning rate for the optimizer.
  • Optionally using a given learning rate schedule. The schedule can be either exponential or polynomial (with degree 1, so the decay is linear). Additionally, the final learning rate is given and the decay should gradually decrease the learning rate so that it reaches the final learning rate just after the training.
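
The requirement that the schedule ends exactly at the final learning rate can be sketched in plain Python as follows. The function names and the number of training steps are hypothetical. To my knowledge the two closed forms correspond to what tf.keras.optimizers.schedules.ExponentialDecay (with decay_rate set to final / initial) and PolynomialDecay (with power 1) compute when decay_steps equals the total number of training steps, but please check this against the TensorFlow documentation.

```python
def exponential_decay(initial, final, step, total_steps):
    """Multiply the rate by a constant factor per step, reaching `final` at `total_steps`."""
    return initial * (final / initial) ** (step / total_steps)


def polynomial_decay(initial, final, step, total_steps, power=1.0):
    """Polynomial decay; with power 1 it interpolates linearly from `initial` to `final`."""
    return (initial - final) * (1 - step / total_steps) ** power + final


# Hypothetical run: 10 epochs of 1100 batches each.
total_steps = 10 * 1100
for step in (0, total_steps // 2, total_steps):
    print(step,
          exponential_decay(0.01, 0.001, step, total_steps),
          polynomial_decay(0.01, 0.0001, step, total_steps))
```

At the halfway point the exponential schedule sits at the geometric mean of the initial and final rates, while the degree-1 polynomial schedule sits at their arithmetic mean.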

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 mnist_training.py --optimizer=SGD --learning_rate=0.01
Epoch  1/10 loss: 1.2077 - accuracy: 0.6998 - val_loss: 0.3662 - val_accuracy: 0.9146
Epoch  2/10 loss: 0.4205 - accuracy: 0.8871 - val_loss: 0.2848 - val_accuracy: 0.9258
Epoch  3/10 loss: 0.3458 - accuracy: 0.9038 - val_loss: 0.2496 - val_accuracy: 0.9350
Epoch  4/10 loss: 0.3115 - accuracy: 0.9139 - val_loss: 0.2292 - val_accuracy: 0.9390
Epoch  5/10 loss: 0.2862 - accuracy: 0.9202 - val_loss: 0.2131 - val_accuracy: 0.9426
Epoch  6/10 loss: 0.2698 - accuracy: 0.9231 - val_loss: 0.2003 - val_accuracy: 0.9464
Epoch  7/10 loss: 0.2489 - accuracy: 0.9296 - val_loss: 0.1881 - val_accuracy: 0.9500
Epoch  8/10 loss: 0.2344 - accuracy: 0.9331 - val_loss: 0.1821 - val_accuracy: 0.9522
Epoch  9/10 loss: 0.2203 - accuracy: 0.9385 - val_loss: 0.1715 - val_accuracy: 0.9560
Epoch 10/10 loss: 0.2130 - accuracy: 0.9397 - val_loss: 0.1650 - val_accuracy: 0.9572
loss: 0.1977 - accuracy: 0.9442
  • python3 mnist_training.py --optimizer=SGD --learning_rate=0.01 --momentum=0.9
Epoch  1/10 loss: 0.5876 - accuracy: 0.8309 - val_loss: 0.1684 - val_accuracy: 0.9560
Epoch  2/10 loss: 0.1929 - accuracy: 0.9458 - val_loss: 0.1274 - val_accuracy: 0.9644
Epoch  3/10 loss: 0.1370 - accuracy: 0.9617 - val_loss: 0.1051 - val_accuracy: 0.9706
Epoch  4/10 loss: 0.1073 - accuracy: 0.9696 - val_loss: 0.0922 - val_accuracy: 0.9746
Epoch  5/10 loss: 0.0870 - accuracy: 0.9754 - val_loss: 0.0844 - val_accuracy: 0.9782
Epoch  6/10 loss: 0.0740 - accuracy: 0.9798 - val_loss: 0.0790 - val_accuracy: 0.9782
Epoch  7/10 loss: 0.0616 - accuracy: 0.9827 - val_loss: 0.0738 - val_accuracy: 0.9820
Epoch  8/10 loss: 0.0546 - accuracy: 0.9853 - val_loss: 0.0749 - val_accuracy: 0.9796
Epoch  9/10 loss: 0.0450 - accuracy: 0.9878 - val_loss: 0.0762 - val_accuracy: 0.9798
Epoch 10/10 loss: 0.0438 - accuracy: 0.9885 - val_loss: 0.0703 - val_accuracy: 0.9806
loss: 0.0675 - accuracy: 0.9794
  • python3 mnist_training.py --optimizer=SGD --learning_rate=0.1
Epoch  1/10 loss: 0.5462 - accuracy: 0.8503 - val_loss: 0.1677 - val_accuracy: 0.9572
Epoch  2/10 loss: 0.1909 - accuracy: 0.9459 - val_loss: 0.1267 - val_accuracy: 0.9648
Epoch  3/10 loss: 0.1361 - accuracy: 0.9615 - val_loss: 0.0994 - val_accuracy: 0.9724
Epoch  4/10 loss: 0.1057 - accuracy: 0.9699 - val_loss: 0.0890 - val_accuracy: 0.9762
Epoch  5/10 loss: 0.0851 - accuracy: 0.9762 - val_loss: 0.0844 - val_accuracy: 0.9784
Epoch  6/10 loss: 0.0730 - accuracy: 0.9796 - val_loss: 0.0800 - val_accuracy: 0.9784
Epoch  7/10 loss: 0.0604 - accuracy: 0.9833 - val_loss: 0.0725 - val_accuracy: 0.9814
Epoch  8/10 loss: 0.0536 - accuracy: 0.9859 - val_loss: 0.0726 - val_accuracy: 0.9796
Epoch  9/10 loss: 0.0444 - accuracy: 0.9886 - val_loss: 0.0744 - val_accuracy: 0.9802
Epoch 10/10 loss: 0.0430 - accuracy: 0.9883 - val_loss: 0.0665 - val_accuracy: 0.9822
loss: 0.0658 - accuracy: 0.9800
  • python3 mnist_training.py --optimizer=Adam --learning_rate=0.001
Epoch  1/10 loss: 0.4529 - accuracy: 0.8712 - val_loss: 0.1166 - val_accuracy: 0.9686
Epoch  2/10 loss: 0.1205 - accuracy: 0.9648 - val_loss: 0.0921 - val_accuracy: 0.9748
Epoch  3/10 loss: 0.0763 - accuracy: 0.9775 - val_loss: 0.0831 - val_accuracy: 0.9774
Epoch  4/10 loss: 0.0540 - accuracy: 0.9844 - val_loss: 0.0758 - val_accuracy: 0.9780
Epoch  5/10 loss: 0.0408 - accuracy: 0.9879 - val_loss: 0.0733 - val_accuracy: 0.9808
Epoch  6/10 loss: 0.0298 - accuracy: 0.9919 - val_loss: 0.0833 - val_accuracy: 0.9810
Epoch  7/10 loss: 0.0238 - accuracy: 0.9936 - val_loss: 0.0761 - val_accuracy: 0.9814
Epoch  8/10 loss: 0.0169 - accuracy: 0.9950 - val_loss: 0.0760 - val_accuracy: 0.9796
Epoch  9/10 loss: 0.0132 - accuracy: 0.9966 - val_loss: 0.0810 - val_accuracy: 0.9814
Epoch 10/10 loss: 0.0116 - accuracy: 0.9968 - val_loss: 0.0913 - val_accuracy: 0.9782
loss: 0.0812 - accuracy: 0.9784
  • python3 mnist_training.py --optimizer=Adam --learning_rate=0.01
Epoch  1/10 loss: 0.3453 - accuracy: 0.8944 - val_loss: 0.1442 - val_accuracy: 0.9586
Epoch  2/10 loss: 0.1415 - accuracy: 0.9585 - val_loss: 0.1317 - val_accuracy: 0.9638
Epoch  3/10 loss: 0.1126 - accuracy: 0.9685 - val_loss: 0.1323 - val_accuracy: 0.9646
Epoch  4/10 loss: 0.0977 - accuracy: 0.9720 - val_loss: 0.1397 - val_accuracy: 0.9684
Epoch  5/10 loss: 0.0938 - accuracy: 0.9744 - val_loss: 0.1374 - val_accuracy: 0.9708
Epoch  6/10 loss: 0.0864 - accuracy: 0.9755 - val_loss: 0.2143 - val_accuracy: 0.9618
Epoch  7/10 loss: 0.0863 - accuracy: 0.9773 - val_loss: 0.1833 - val_accuracy: 0.9696
Epoch  8/10 loss: 0.0741 - accuracy: 0.9801 - val_loss: 0.1747 - val_accuracy: 0.9716
Epoch  9/10 loss: 0.0734 - accuracy: 0.9815 - val_loss: 0.2182 - val_accuracy: 0.9668
Epoch 10/10 loss: 0.0715 - accuracy: 0.9828 - val_loss: 0.2157 - val_accuracy: 0.9698
loss: 0.2383 - accuracy: 0.9687
  • python3 mnist_training.py --optimizer=Adam --learning_rate=0.01 --decay=exponential --learning_rate_final=0.001
Epoch  1/10 loss: 0.3396 - accuracy: 0.8952 - val_loss: 0.1255 - val_accuracy: 0.9652
Epoch  2/10 loss: 0.1132 - accuracy: 0.9654 - val_loss: 0.1273 - val_accuracy: 0.9666
Epoch  3/10 loss: 0.0714 - accuracy: 0.9776 - val_loss: 0.0896 - val_accuracy: 0.9768
Epoch  4/10 loss: 0.0467 - accuracy: 0.9854 - val_loss: 0.0970 - val_accuracy: 0.9756
Epoch  5/10 loss: 0.0315 - accuracy: 0.9896 - val_loss: 0.1041 - val_accuracy: 0.9788
Epoch  6/10 loss: 0.0193 - accuracy: 0.9934 - val_loss: 0.1029 - val_accuracy: 0.9790
Epoch  7/10 loss: 0.0121 - accuracy: 0.9961 - val_loss: 0.0926 - val_accuracy: 0.9802
Epoch  8/10 loss: 0.0061 - accuracy: 0.9983 - val_loss: 0.1044 - val_accuracy: 0.9802
Epoch  9/10 loss: 0.0035 - accuracy: 0.9992 - val_loss: 0.0992 - val_accuracy: 0.9806
Epoch 10/10 loss: 0.0029 - accuracy: 0.9994 - val_loss: 0.1052 - val_accuracy: 0.9816
loss: 0.0880 - accuracy: 0.9797
Final learning rate: 0.001
  • python3 mnist_training.py --optimizer=Adam --learning_rate=0.01 --decay=polynomial --learning_rate_final=0.0001
Epoch  1/10 loss: 0.3428 - accuracy: 0.8944 - val_loss: 0.1176 - val_accuracy: 0.9634
Epoch  2/10 loss: 0.1229 - accuracy: 0.9632 - val_loss: 0.1303 - val_accuracy: 0.9642
Epoch  3/10 loss: 0.0920 - accuracy: 0.9728 - val_loss: 0.1064 - val_accuracy: 0.9724
Epoch  4/10 loss: 0.0702 - accuracy: 0.9784 - val_loss: 0.1086 - val_accuracy: 0.9726
Epoch  5/10 loss: 0.0472 - accuracy: 0.9856 - val_loss: 0.1197 - val_accuracy: 0.9738
Epoch  6/10 loss: 0.0328 - accuracy: 0.9896 - val_loss: 0.1195 - val_accuracy: 0.9758
Epoch  7/10 loss: 0.0208 - accuracy: 0.9929 - val_loss: 0.1094 - val_accuracy: 0.9776
Epoch  8/10 loss: 0.0112 - accuracy: 0.9962 - val_loss: 0.1135 - val_accuracy: 0.9794
Epoch  9/10 loss: 0.0051 - accuracy: 0.9986 - val_loss: 0.1074 - val_accuracy: 0.9800
Epoch 10/10 loss: 0.0027 - accuracy: 0.9995 - val_loss: 0.1088 - val_accuracy: 0.9794
loss: 0.0899 - accuracy: 0.9816
Final learning rate: 0.0001

gym_cartpole

 Deadline: Mar 22, 23:59  3 points

Solve the CartPole-v1 environment from the OpenAI Gym, utilizing only provided supervised training data. The data is available in gym_cartpole_data.txt file, each line containing one observation (four space separated floats) and a corresponding action (the last space separated integer). Start with the gym_cartpole.py.
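
As a rough illustration of the data format described above, the following sketch parses lines consisting of four floats followed by an integer action. The function name `parse_cartpole_lines` and the sample values are made up for illustration; the gym_cartpole.py template may load the data differently.

```python
def parse_cartpole_lines(lines):
    """Split each non-empty line into a 4-float observation and an integer action."""
    observations, actions = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        observations.append([float(value) for value in fields[:4]])
        actions.append(int(fields[4]))
    return observations, actions


# Hypothetical sample in the described format (not real course data).
sample = [
    "0.01 -0.02 0.03 0.04 1",
    "-0.05 0.06 -0.07 0.08 0",
]
observations, actions = parse_cartpole_lines(sample)
print(len(observations), actions)
```

With a real file, the same function could be applied to the lines of an opened gym_cartpole_data.txt.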

The solution to this task should be a model which passes evaluation on random inputs. This evaluation can be performed by running the gym_cartpole.py with --evaluate argument (optionally rendering if --render option is provided), or directly calling the evaluate_model method. In order to pass, you must achieve an average reward of at least 475 on 100 episodes. Your model should have either one or two outputs (i.e., using either sigmoid or softmax output function).

When designing the model, you should consider that the size of the training data is very small and the data is quite noisy.

When submitting to ReCodEx, do not forget to also submit the trained model.

mnist_regularization

 Deadline: Mar 29, 23:59  3 points

You will learn how to implement three regularization methods in this assignment. Start with the mnist_regularization.py template and implement the following:

  • Allow using dropout with rate args.dropout. Add a dropout layer after the first Flatten and also after all Dense hidden layers (but not after the output layer).
  • Allow using L2 regularization with weight args.l2. Use tf.keras.regularizers.L1L2 as a regularizer for all kernels (but not biases) of all Dense layers (including the last one).
  • Allow using label smoothing with weight args.label_smoothing. Instead of SparseCategoricalCrossentropy, you will need to use CategoricalCrossentropy which offers label_smoothing argument.
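
To make the last item more concrete, here is a small sketch of what label smoothing does to a target distribution. It assumes the usual formulation (the one-hot target is scaled by 1 - label_smoothing and label_smoothing / num_classes is added to every class) and uses a hypothetical helper `smooth_label` written in plain Python. It also shows why the sparse labels must first be converted to one-hot distributions.

```python
def smooth_label(label, num_classes, label_smoothing):
    """Return the smoothed target distribution for a single sparse label."""
    one_hot = [1.0 if index == label else 0.0 for index in range(num_classes)]
    return [
        probability * (1.0 - label_smoothing) + label_smoothing / num_classes
        for probability in one_hot
    ]


# With 4 classes and smoothing 0.1, the gold class keeps most of the mass
# and every class receives an additional 0.1 / 4.
target = smooth_label(label=2, num_classes=4, label_smoothing=0.1)
print(target)
```

The smoothed distribution still sums to one, so it can be used directly as a target for a categorical cross-entropy loss.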

In ReCodEx, there will be six tests (two for each regularization method) and you will get half a point for passing each one.

In addition to submitting the task in ReCodEx, also run the following variations and observe the results in TensorBoard (notably training, development and test set accuracy and loss):

  • dropout rate 0, 0.3, 0.5, 0.6, 0.8;
  • l2 regularization 0, 0.001, 0.0001, 0.00001;
  • label smoothing 0, 0.1, 0.3, 0.5.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 mnist_regularization.py --dropout=0.3
Epoch  5/30 loss: 0.2319 - accuracy: 0.9309 - val_loss: 0.1919 - val_accuracy: 0.9420
Epoch 10/30 loss: 0.1207 - accuracy: 0.9608 - val_loss: 0.1507 - val_accuracy: 0.9560
Epoch 15/30 loss: 0.0785 - accuracy: 0.9758 - val_loss: 0.1300 - val_accuracy: 0.9606
Epoch 20/30 loss: 0.0595 - accuracy: 0.9833 - val_loss: 0.1292 - val_accuracy: 0.9628
Epoch 25/30 loss: 0.0517 - accuracy: 0.9816 - val_loss: 0.1311 - val_accuracy: 0.9618
Epoch 30/30 loss: 0.0315 - accuracy: 0.9919 - val_loss: 0.1413 - val_accuracy: 0.9618
loss: 0.1630 - accuracy: 0.9541
  • python3 mnist_regularization.py --dropout=0.5
Epoch  5/30 loss: 0.3931 - accuracy: 0.8815 - val_loss: 0.2147 - val_accuracy: 0.9366
Epoch 10/30 loss: 0.2626 - accuracy: 0.9232 - val_loss: 0.1665 - val_accuracy: 0.9528
Epoch 15/30 loss: 0.2229 - accuracy: 0.9261 - val_loss: 0.1427 - val_accuracy: 0.9582
Epoch 20/30 loss: 0.1765 - accuracy: 0.9473 - val_loss: 0.1379 - val_accuracy: 0.9596
Epoch 25/30 loss: 0.1653 - accuracy: 0.9477 - val_loss: 0.1272 - val_accuracy: 0.9628
Epoch 30/30 loss: 0.1335 - accuracy: 0.9596 - val_loss: 0.1251 - val_accuracy: 0.9638
loss: 0.1510 - accuracy: 0.9521
  • python3 mnist_regularization.py --l2=0.001
Epoch  5/30 loss: 0.3280 - accuracy: 0.9699 - val_loss: 0.3755 - val_accuracy: 0.9426
Epoch 10/30 loss: 0.2259 - accuracy: 0.9867 - val_loss: 0.3511 - val_accuracy: 0.9408
Epoch 15/30 loss: 0.2089 - accuracy: 0.9866 - val_loss: 0.3109 - val_accuracy: 0.9516
Epoch 20/30 loss: 0.1966 - accuracy: 0.9911 - val_loss: 0.2973 - val_accuracy: 0.9532
Epoch 25/30 loss: 0.1928 - accuracy: 0.9947 - val_loss: 0.3079 - val_accuracy: 0.9510
Epoch 30/30 loss: 0.1916 - accuracy: 0.9918 - val_loss: 0.3002 - val_accuracy: 0.9522
loss: 0.3313 - accuracy: 0.9394
  • python3 mnist_regularization.py --l2=0.0001
Epoch  5/30 loss: 0.1387 - accuracy: 0.9793 - val_loss: 0.2231 - val_accuracy: 0.9452
Epoch 10/30 loss: 0.0686 - accuracy: 0.9982 - val_loss: 0.2132 - val_accuracy: 0.9508
Epoch 15/30 loss: 0.0530 - accuracy: 1.0000 - val_loss: 0.1938 - val_accuracy: 0.9564
Epoch 20/30 loss: 0.0446 - accuracy: 1.0000 - val_loss: 0.1954 - val_accuracy: 0.9538
Epoch 25/30 loss: 0.0431 - accuracy: 1.0000 - val_loss: 0.1909 - val_accuracy: 0.9572
Epoch 30/30 loss: 0.0439 - accuracy: 1.0000 - val_loss: 0.1914 - val_accuracy: 0.9608
loss: 0.2141 - accuracy: 0.9512
  • python3 mnist_regularization.py --label_smoothing=0.1
Epoch  5/30 loss: 0.6077 - accuracy: 0.9865 - val_loss: 0.6626 - val_accuracy: 0.9610
Epoch 10/30 loss: 0.5422 - accuracy: 0.9994 - val_loss: 0.6414 - val_accuracy: 0.9642
Epoch 15/30 loss: 0.5225 - accuracy: 1.0000 - val_loss: 0.6324 - val_accuracy: 0.9654
Epoch 20/30 loss: 0.5145 - accuracy: 1.0000 - val_loss: 0.6289 - val_accuracy: 0.9674
Epoch 25/30 loss: 0.5101 - accuracy: 1.0000 - val_loss: 0.6281 - val_accuracy: 0.9678
Epoch 30/30 loss: 0.5081 - accuracy: 1.0000 - val_loss: 0.6271 - val_accuracy: 0.9682
loss: 0.6449 - accuracy: 0.9592
  • python3 mnist_regularization.py --label_smoothing=0.3
Epoch  5/30 loss: 1.2506 - accuracy: 0.9884 - val_loss: 1.2963 - val_accuracy: 0.9630
Epoch 10/30 loss: 1.2070 - accuracy: 0.9992 - val_loss: 1.2799 - val_accuracy: 0.9652
Epoch 15/30 loss: 1.1937 - accuracy: 1.0000 - val_loss: 1.2773 - val_accuracy: 0.9638
Epoch 20/30 loss: 1.1875 - accuracy: 1.0000 - val_loss: 1.2748 - val_accuracy: 0.9662
Epoch 25/30 loss: 1.1847 - accuracy: 1.0000 - val_loss: 1.2753 - val_accuracy: 0.9676
Epoch 30/30 loss: 1.1834 - accuracy: 1.0000 - val_loss: 1.2760 - val_accuracy: 0.9660
loss: 1.2875 - accuracy: 0.9587

mnist_ensemble

 Deadline: Mar 29, 23:59  2 points

Your goal in this assignment is to implement model ensembling. The mnist_ensemble.py template trains args.models individual models, and your goal is to perform an ensemble of the first model, first two models, first three models, …, all models, and evaluate their accuracy on the development set.
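
One common way to ensemble classifiers is to average their predicted distributions; the sketch below assumes this is what is meant here, which the template may or may not require. The function `ensemble_accuracies` and its inputs are hypothetical, with NumPy arrays standing in for the per-model predictions on the development set.

```python
import numpy as np


def ensemble_accuracies(model_probs, gold):
    """Accuracy of the ensemble of the first 1, 2, ..., all models.

    model_probs: list of [examples, classes] arrays with predicted distributions.
    gold: [examples] array of gold labels.
    """
    accuracies = []
    running_sum = np.zeros_like(model_probs[0], dtype=np.float64)
    for probs in model_probs:
        running_sum += probs
        # The argmax of the sum equals the argmax of the average.
        predictions = running_sum.argmax(axis=1)
        accuracies.append(float(np.mean(predictions == gold)))
    return accuracies


# Two toy "models" on two examples with two classes.
gold = np.array([0, 1])
model_1 = np.array([[0.6, 0.4], [0.7, 0.3]])
model_2 = np.array([[0.5, 0.5], [0.1, 0.9]])
print(ensemble_accuracies([model_1, model_2], gold))
```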

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 mnist_ensemble.py --models=3
Model 1, individual accuracy 97.78, ensemble accuracy 97.78
Model 2, individual accuracy 97.76, ensemble accuracy 98.02
Model 3, individual accuracy 97.88, ensemble accuracy 98.06
  • python3 mnist_ensemble.py --models=5
Model 1, individual accuracy 97.78, ensemble accuracy 97.78
Model 2, individual accuracy 97.76, ensemble accuracy 98.02
Model 3, individual accuracy 97.88, ensemble accuracy 98.06
Model 4, individual accuracy 97.78, ensemble accuracy 98.10
Model 5, individual accuracy 97.78, ensemble accuracy 98.10

uppercase

 Deadline: Mar 29, 23:59  4 points+5 bonus

This assignment introduces the first NLP task. Your goal is to implement a model which is given Czech lowercased text and tries to uppercase appropriate letters. To load the dataset, use the uppercase_data.py module, which loads (and if required also downloads) the data. While the training and the development sets are in correct case, the test set is lowercased.

This is an open-data task, where you submit only the uppercased test set together with the training script (which will not be executed, it will be only used to understand the approach you took, and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py/ipynb file.

The task is also a competition. Everyone who submits a solution which achieves at least 98.5% accuracy will get 4 basic points; the 5 bonus points will be distributed depending on relative ordering of your solutions. The accuracy is computed per-character and can be evaluated by running uppercase_data.py with --evaluate argument, or using its evaluate_file method.

You may want to start with the uppercase.py template, which uses the uppercase_data.py to load the data, generate an alphabet of given size containing most frequent characters, and generate sliding window view on the data. The template also comments on possibilities of character representation.
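
To illustrate what a sliding window view means, the following sketch builds, for every character, a window containing the character itself and a fixed number of neighbors on each side. The function `sliding_windows`, the padding id 0 and the symmetric window layout are assumptions made for illustration; the actual uppercase_data.py module may represent windows differently.

```python
def sliding_windows(char_ids, window, pad_id=0):
    """Return one window of 2 * window + 1 ids centered on each position."""
    padded = [pad_id] * window + list(char_ids) + [pad_id] * window
    return [padded[start:start + 2 * window + 1] for start in range(len(char_ids))]


# Three characters mapped to ids 1, 2, 3, with one neighbor on each side.
for row in sliding_windows([1, 2, 3], window=1):
    print(row)
```

Each window can then be classified independently, predicting whether its central character should be uppercased.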

Do not use RNNs, CNNs or Transformers in this task (if you have doubts, contact me).

mnist_cnn

 Deadline: Apr 05, 23:59  4 points

To pass this assignment, you will learn to construct basic convolutional neural network layers. Start with the mnist_cnn.py template and assume the requested architecture is described by the cnn argument, which contains comma-separated specifications of the following layers:

  • C-filters-kernel_size-stride-padding: Add a convolutional layer with ReLU activation and specified number of filters, kernel size, stride and padding. Example: C-10-3-1-same
  • CB-filters-kernel_size-stride-padding: Same as C-filters-kernel_size-stride-padding, but use batch normalization. In detail, start with a convolutional layer without bias and activation, then add batch normalization layer, and finally ReLU activation. Example: CB-10-3-1-same
  • M-pool_size-stride: Add max pooling with specified size and stride, using the default "valid" padding. Example: M-3-2
  • R-[layers]: Add a residual connection. The layers contain a specification of at least one convolutional layer (but not a recursive residual connection R). The input to the R layer should be processed sequentially by layers, and the produced output (after the ReLU nonlinearity of the last layer) should be added to the input (of this R layer). Example: R-[C-16-3-1-same,C-16-3-1-same]
  • F: Flatten inputs. Must appear exactly once in the architecture.
  • H-hidden_layer_size: Add a dense layer with ReLU activation and specified size. Example: H-100
  • D-dropout_rate: Apply dropout with the given dropout rate. Example: D-0.5

An example architecture might be --cnn=CB-16-5-2-same,M-3-2,F,H-100,D-0.5. You can assume the resulting network is valid; it is fine to crash if it is not.
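
Because the R-[...] specification itself contains commas, splitting the cnn argument with a plain `split(",")` is not enough. The following sketch shows one possible way to parse the specification into a nested structure; the helper names `split_top_level` and `parse_layer` are hypothetical and the template does not prescribe this approach.

```python
def split_top_level(spec):
    """Split a specification on commas that are not inside [...] brackets."""
    parts, depth, current = [], 0, ""
    for char in spec:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if char == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += char
    if current:
        parts.append(current)
    return parts


def parse_layer(token):
    """Return (kind, parameters); for R, the parameters are parsed inner layers."""
    if token.startswith("R-["):
        inner = token[3:-1]  # strip the leading "R-[" and the trailing "]"
        return ("R", [parse_layer(part) for part in split_top_level(inner)])
    kind, *params = token.split("-")
    return (kind, params)


spec = "CB-8-3-5-valid,R-[CB-8-3-1-same,CB-8-3-1-same],F,H-50"
for layer in map(parse_layer, split_top_level(spec)):
    print(layer)
```

The parameters are kept as strings here; converting them to integers or floats is left to the code which builds the actual layers.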

After a successful ReCodEx submission, you can try obtaining the best accuracy on MNIST and then advance to cifar_competition.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 mnist_cnn.py --cnn=F,H-100
Epoch 1/5 loss: 0.5379 - accuracy: 0.8500 - val_loss: 0.1459 - val_accuracy: 0.9612
Epoch 2/5 loss: 0.1563 - accuracy: 0.9553 - val_loss: 0.1128 - val_accuracy: 0.9682
Epoch 3/5 loss: 0.1052 - accuracy: 0.9697 - val_loss: 0.0966 - val_accuracy: 0.9714
Epoch 4/5 loss: 0.0792 - accuracy: 0.9765 - val_loss: 0.0864 - val_accuracy: 0.9744
Epoch 5/5 loss: 0.0627 - accuracy: 0.9814 - val_loss: 0.0818 - val_accuracy: 0.9768
loss: 0.0844 - accuracy: 0.9757
  • python3 mnist_cnn.py --cnn=F,H-100,D-0.5
Epoch 1/5 loss: 0.7447 - accuracy: 0.7719 - val_loss: 0.1617 - val_accuracy: 0.9596
Epoch 2/5 loss: 0.2781 - accuracy: 0.9167 - val_loss: 0.1266 - val_accuracy: 0.9668
Epoch 3/5 loss: 0.2293 - accuracy: 0.9321 - val_loss: 0.1097 - val_accuracy: 0.9696
Epoch 4/5 loss: 0.2003 - accuracy: 0.9399 - val_loss: 0.1035 - val_accuracy: 0.9716
Epoch 5/5 loss: 0.1858 - accuracy: 0.9444 - val_loss: 0.1019 - val_accuracy: 0.9728
loss: 0.1131 - accuracy: 0.9676
  • python3 mnist_cnn.py --cnn=M-5-2,F,H-50
Epoch 1/5 loss: 1.0752 - accuracy: 0.6618 - val_loss: 0.3934 - val_accuracy: 0.8818
Epoch 2/5 loss: 0.4421 - accuracy: 0.8598 - val_loss: 0.3241 - val_accuracy: 0.9000
Epoch 3/5 loss: 0.3651 - accuracy: 0.8849 - val_loss: 0.2996 - val_accuracy: 0.9078
Epoch 4/5 loss: 0.3271 - accuracy: 0.8951 - val_loss: 0.2712 - val_accuracy: 0.9174
Epoch 5/5 loss: 0.3014 - accuracy: 0.9049 - val_loss: 0.2632 - val_accuracy: 0.9182
loss: 0.2967 - accuracy: 0.9067
  • python3 mnist_cnn.py --cnn=C-8-3-5-same,C-8-3-2-valid,F,H-50
Epoch 1/5 loss: 1.1907 - accuracy: 0.6001 - val_loss: 0.3445 - val_accuracy: 0.9004
Epoch 2/5 loss: 0.4124 - accuracy: 0.8730 - val_loss: 0.2818 - val_accuracy: 0.9158
Epoch 3/5 loss: 0.3335 - accuracy: 0.8970 - val_loss: 0.2523 - val_accuracy: 0.9254
Epoch 4/5 loss: 0.3036 - accuracy: 0.9043 - val_loss: 0.2292 - val_accuracy: 0.9316
Epoch 5/5 loss: 0.2802 - accuracy: 0.9143 - val_loss: 0.2186 - val_accuracy: 0.9340
loss: 0.2520 - accuracy: 0.9243
  • python3 mnist_cnn.py --cnn=CB-6-3-5-valid,F,H-32
Epoch 1/5 loss: 0.9799 - accuracy: 0.6768 - val_loss: 0.2519 - val_accuracy: 0.9230
Epoch 2/5 loss: 0.3122 - accuracy: 0.9045 - val_loss: 0.2116 - val_accuracy: 0.9338
Epoch 3/5 loss: 0.2493 - accuracy: 0.9230 - val_loss: 0.1792 - val_accuracy: 0.9496
Epoch 4/5 loss: 0.2147 - accuracy: 0.9322 - val_loss: 0.1637 - val_accuracy: 0.9528
Epoch 5/5 loss: 0.1873 - accuracy: 0.9415 - val_loss: 0.1544 - val_accuracy: 0.9566
loss: 0.1857 - accuracy: 0.9424
  • python3 mnist_cnn.py --cnn=CB-8-3-5-valid,R-[CB-8-3-1-same,CB-8-3-1-same],F,H-50
Epoch 1/5 loss: 0.7976 - accuracy: 0.7449 - val_loss: 0.1791 - val_accuracy: 0.9458
Epoch 2/5 loss: 0.2052 - accuracy: 0.9360 - val_loss: 0.1531 - val_accuracy: 0.9506
Epoch 3/5 loss: 0.1497 - accuracy: 0.9524 - val_loss: 0.1340 - val_accuracy: 0.9600
Epoch 4/5 loss: 0.1261 - accuracy: 0.9593 - val_loss: 0.1226 - val_accuracy: 0.9624
Epoch 5/5 loss: 0.1113 - accuracy: 0.9642 - val_loss: 0.1094 - val_accuracy: 0.9684
loss: 0.1212 - accuracy: 0.9609

image_augmentation

 Deadline: Apr 05, 23:59  1 point

The template image_augmentation.py creates a simple convolutional network for classifying CIFAR-10. Your goal is to perform image data augmentation operations using ImageDataGenerator and to utilize these data during training.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 image_augmentation.py --batch_size=50
Epoch 1/5 loss: 2.2698 - accuracy: 0.1253 - val_loss: 1.9850 - val_accuracy: 0.2590
Epoch 2/5 loss: 2.0054 - accuracy: 0.2387 - val_loss: 1.7783 - val_accuracy: 0.3250
Epoch 3/5 loss: 1.8557 - accuracy: 0.3121 - val_loss: 1.7411 - val_accuracy: 0.3620
Epoch 4/5 loss: 1.7431 - accuracy: 0.3565 - val_loss: 1.6151 - val_accuracy: 0.4160
Epoch 5/5 loss: 1.6636 - accuracy: 0.3849 - val_loss: 1.6074 - val_accuracy: 0.4230
  • python3 image_augmentation.py --batch_size=100
Epoch 1/5 loss: 2.2671 - accuracy: 0.1350 - val_loss: 1.9996 - val_accuracy: 0.2680
Epoch 2/5 loss: 1.9756 - accuracy: 0.2813 - val_loss: 1.7990 - val_accuracy: 0.3400
Epoch 3/5 loss: 1.8361 - accuracy: 0.3266 - val_loss: 1.6944 - val_accuracy: 0.3550
Epoch 4/5 loss: 1.7677 - accuracy: 0.3546 - val_loss: 1.6714 - val_accuracy: 0.3850
Epoch 5/5 loss: 1.6904 - accuracy: 0.3673 - val_loss: 1.6651 - val_accuracy: 0.3870

tf_dataset

 Deadline: Apr 05, 23:59  2 points

In this assignment you will familiarize yourselves with tf.data, which is the TensorFlow high-level API for constructing input pipelines. If you want, you can read the official TensorFlow tf.data guide or the API reference manual.

The goal of this assignment is to implement image augmentation preprocessing similar to image_augmentation, but with tf.data. Start with the tf_dataset.py template and implement the input pipelines employing the tf.data.Dataset.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 tf_dataset.py --batch_size=50
Epoch 1/5 loss: 2.2395 - accuracy: 0.1408 - val_loss: 1.9160 - val_accuracy: 0.3000
Epoch 2/5 loss: 1.9410 - accuracy: 0.2794 - val_loss: 1.7881 - val_accuracy: 0.3430
Epoch 3/5 loss: 1.8415 - accuracy: 0.3287 - val_loss: 1.6749 - val_accuracy: 0.3740
Epoch 4/5 loss: 1.7689 - accuracy: 0.3480 - val_loss: 1.6263 - val_accuracy: 0.3780
Epoch 5/5 loss: 1.7185 - accuracy: 0.3634 - val_loss: 1.5976 - val_accuracy: 0.4260
  • python3 tf_dataset.py --batch_size=100
Epoch 1/5 loss: 2.2697 - accuracy: 0.1305 - val_loss: 2.0089 - val_accuracy: 0.2700
Epoch 2/5 loss: 2.0114 - accuracy: 0.2545 - val_loss: 1.8020 - val_accuracy: 0.3410
Epoch 3/5 loss: 1.8473 - accuracy: 0.3278 - val_loss: 1.7071 - val_accuracy: 0.3630
Epoch 4/5 loss: 1.7961 - accuracy: 0.3472 - val_loss: 1.6509 - val_accuracy: 0.3840
Epoch 5/5 loss: 1.7164 - accuracy: 0.3681 - val_loss: 1.6429 - val_accuracy: 0.3910

mnist_multiple

 Deadline: Apr 05, 23:59  3 points

In this assignment you will implement a model with multiple inputs and outputs. Start with the mnist_multiple.py template and:

  • The goal is to create a model which, given two input MNIST images, predicts whether the digit on the first one is larger than the digit on the second one.
  • The model has four outputs:
    • direct prediction whether the first digit is larger than the second one,
    • digit classification for the first image,
    • digit classification for the second image,
    • indirect prediction comparing the digits predicted by the above two outputs.
  • You need to implement:
    • the model, using multiple inputs, outputs, losses and metrics;
    • construction of two-image dataset examples using regular MNIST data via the tf.data API.
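
The indirect prediction can be pictured as follows: take the most probable digit from each of the two classification outputs and compare them. This is only a plain-Python sketch with hypothetical names (`argmax`, `indirect_prediction`) and toy three-class distributions; in the model itself the same comparison would be expressed with tensor operations.

```python
def argmax(values):
    """Index of the largest value in a sequence."""
    return max(range(len(values)), key=values.__getitem__)


def indirect_prediction(digit_1_probs, digit_2_probs):
    """For each pair, predict whether the first predicted digit is larger."""
    return [
        argmax(first) > argmax(second)
        for first, second in zip(digit_1_probs, digit_2_probs)
    ]


# Toy distributions over three classes for two image pairs.
digit_1 = [[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]]
digit_2 = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
print(indirect_prediction(digit_1, digit_2))
```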

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 mnist_multiple.py --batch_size=50
Epoch 1/5 loss: 1.6499 - digit_1_loss: 0.6142 - digit_2_loss: 0.6227 - direct_prediction_loss: 0.4130 - direct_prediction_accuracy: 0.7896 - indirect_prediction_accuracy: 0.8972 - val_loss: 0.3579 - val_digit_1_loss: 0.1265 - val_digit_2_loss: 0.0724 - val_direct_prediction_loss: 0.1590 - val_direct_prediction_accuracy: 0.9428 - val_indirect_prediction_accuracy: 0.9800
Epoch 2/5 loss: 0.3472 - digit_1_loss: 0.0965 - digit_2_loss: 0.0988 - direct_prediction_loss: 0.1519 - direct_prediction_accuracy: 0.9452 - indirect_prediction_accuracy: 0.9788 - val_loss: 0.2222 - val_digit_1_loss: 0.0859 - val_digit_2_loss: 0.0555 - val_direct_prediction_loss: 0.0808 - val_direct_prediction_accuracy: 0.9724 - val_indirect_prediction_accuracy: 0.9872
Epoch 3/5 loss: 0.2184 - digit_1_loss: 0.0597 - digit_2_loss: 0.0624 - direct_prediction_loss: 0.0964 - direct_prediction_accuracy: 0.9643 - indirect_prediction_accuracy: 0.9868 - val_loss: 0.1976 - val_digit_1_loss: 0.0776 - val_digit_2_loss: 0.0610 - val_direct_prediction_loss: 0.0590 - val_direct_prediction_accuracy: 0.9824 - val_indirect_prediction_accuracy: 0.9856
Epoch 4/5 loss: 0.1540 - digit_1_loss: 0.0428 - digit_2_loss: 0.0454 - direct_prediction_loss: 0.0659 - direct_prediction_accuracy: 0.9781 - indirect_prediction_accuracy: 0.9889 - val_loss: 0.1753 - val_digit_1_loss: 0.0640 - val_digit_2_loss: 0.0523 - val_direct_prediction_loss: 0.0590 - val_direct_prediction_accuracy: 0.9776 - val_indirect_prediction_accuracy: 0.9876
Epoch 5/5 loss: 0.1253 - digit_1_loss: 0.0333 - digit_2_loss: 0.0337 - direct_prediction_loss: 0.0583 - direct_prediction_accuracy: 0.9806 - indirect_prediction_accuracy: 0.9914 - val_loss: 0.1596 - val_digit_1_loss: 0.0648 - val_digit_2_loss: 0.0525 - val_direct_prediction_loss: 0.0423 - val_direct_prediction_accuracy: 0.9880 - val_indirect_prediction_accuracy: 0.9908
loss: 0.1471 - digit_1_loss: 0.0429 - digit_2_loss: 0.0484 - direct_prediction_loss: 0.0558 - direct_prediction_accuracy: 0.9822 - indirect_prediction_accuracy: 0.9900
  • python3 mnist_multiple.py --batch_size=100
Epoch 1/5 loss: 2.1134 - digit_1_loss: 0.8183 - digit_2_loss: 0.8250 - direct_prediction_loss: 0.4701 - direct_prediction_accuracy: 0.7570 - indirect_prediction_accuracy: 0.8735 - val_loss: 0.4835 - val_digit_1_loss: 0.1706 - val_digit_2_loss: 0.0993 - val_direct_prediction_loss: 0.2136 - val_direct_prediction_accuracy: 0.9168 - val_indirect_prediction_accuracy: 0.9700
Epoch 2/5 loss: 0.4881 - digit_1_loss: 0.1379 - digit_2_loss: 0.1396 - direct_prediction_loss: 0.2107 - direct_prediction_accuracy: 0.9159 - indirect_prediction_accuracy: 0.9706 - val_loss: 0.3022 - val_digit_1_loss: 0.1047 - val_digit_2_loss: 0.0659 - val_direct_prediction_loss: 0.1316 - val_direct_prediction_accuracy: 0.9500 - val_indirect_prediction_accuracy: 0.9832
Epoch 3/5 loss: 0.2938 - digit_1_loss: 0.0795 - digit_2_loss: 0.0825 - direct_prediction_loss: 0.1317 - direct_prediction_accuracy: 0.9493 - indirect_prediction_accuracy: 0.9825 - val_loss: 0.2150 - val_digit_1_loss: 0.0782 - val_digit_2_loss: 0.0586 - val_direct_prediction_loss: 0.0782 - val_direct_prediction_accuracy: 0.9688 - val_indirect_prediction_accuracy: 0.9888
Epoch 4/5 loss: 0.2026 - digit_1_loss: 0.0547 - digit_2_loss: 0.0607 - direct_prediction_loss: 0.0872 - direct_prediction_accuracy: 0.9693 - indirect_prediction_accuracy: 0.9881 - val_loss: 0.1970 - val_digit_1_loss: 0.0750 - val_digit_2_loss: 0.0543 - val_direct_prediction_loss: 0.0676 - val_direct_prediction_accuracy: 0.9748 - val_indirect_prediction_accuracy: 0.9868
Epoch 5/5 loss: 0.1618 - digit_1_loss: 0.0437 - digit_2_loss: 0.0470 - direct_prediction_loss: 0.0711 - direct_prediction_accuracy: 0.9753 - indirect_prediction_accuracy: 0.9893 - val_loss: 0.1735 - val_digit_1_loss: 0.0667 - val_digit_2_loss: 0.0507 - val_direct_prediction_loss: 0.0562 - val_direct_prediction_accuracy: 0.9816 - val_indirect_prediction_accuracy: 0.9896
loss: 0.1658 - digit_1_loss: 0.0469 - digit_2_loss: 0.0506 - direct_prediction_loss: 0.0683 - direct_prediction_accuracy: 0.9768 - indirect_prediction_accuracy: 0.9884

cifar_competition

 Deadline: Apr 05, 23:59  5 points+5 bonus

The goal of this assignment is to devise the best possible model for CIFAR-10. You can load the data using the cifar10.py module. Note that the test set is different from that of the official CIFAR-10.

The task is a competition. Everyone who submits a solution which achieves at least 60% test set accuracy will get 5 points; the remaining 5 points will be distributed depending on relative ordering of your solutions. Note that my solutions usually need to achieve at least ~73% on the development set to score 60% on the test set.

You may want to start with the cifar_competition.py template which generates the test set annotation in the required format.

cnn_manual

 Deadline: Apr 12, 23:59  3 points

To pass this assignment, you need to manually implement the forward and backward pass through a 2D convolutional layer. Start with the cnn_manual.py template, which constructs a series of 2D convolutional layers with ReLU activation and valid padding, specified in the args.cnn option. The args.cnn contains comma-separated layer specifications in the format filters-kernel_size-stride.

Of course, you cannot use any TensorFlow convolutional operation (instead, implement the forward and backward pass using matrix multiplication and other operations) nor the GradientTape for gradient computation.
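
To give an idea of how a convolution can be expressed with matrix multiplication, the following NumPy sketch computes only the forward pass with valid padding, a given stride and ReLU. It is an illustration under assumptions, not the expected solution: the function name `conv2d_forward` and the tensor layouts ([batch, height, width, channels] for inputs, [kernel_height, kernel_width, channels, filters] for the kernel) are chosen for this example, and the backward pass is not shown.

```python
import numpy as np


def conv2d_forward(inputs, kernel, bias, stride):
    """Valid-padding 2D convolution followed by ReLU, without any conv op."""
    batch, height, width, _ = inputs.shape
    kernel_h, kernel_w, _, filters = kernel.shape
    out_h = (height - kernel_h) // stride + 1
    out_w = (width - kernel_w) // stride + 1

    output = np.zeros((batch, out_h, out_w, filters), dtype=inputs.dtype)
    for i in range(kernel_h):
        for j in range(kernel_w):
            # All input positions which are multiplied by kernel[i, j].
            patch = inputs[:, i:i + out_h * stride:stride, j:j + out_w * stride:stride, :]
            # [batch, out_h, out_w, channels] @ [channels, filters]
            output += patch @ kernel[i, j]
    return np.maximum(output + bias, 0)


inputs = np.arange(16, dtype=np.float64).reshape(1, 4, 4, 1)
kernel = np.ones((2, 2, 1, 1))
print(conv2d_forward(inputs, kernel, np.zeros(1), stride=2)[0, :, :, 0])
```

Looping over the kernel offsets keeps the number of Python-level iterations small (kernel_size squared), while the work for each offset is a single batched matrix multiplication.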

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 cnn_manual.py --cnn=5-1-1
Dev accuracy after epoch 1 is 91.42
Dev accuracy after epoch 2 is 92.44
Dev accuracy after epoch 3 is 91.82
Dev accuracy after epoch 4 is 92.62
Dev accuracy after epoch 5 is 92.32
Test accuracy after epoch 5 is 90.73
  • python3 cnn_manual.py --cnn=5-3-1
Dev accuracy after epoch 1 is 95.62
Dev accuracy after epoch 2 is 96.06
Dev accuracy after epoch 3 is 96.22
Dev accuracy after epoch 4 is 96.46
Dev accuracy after epoch 5 is 96.12
Test accuracy after epoch 5 is 95.73
  • python3 cnn_manual.py --cnn=5-3-2
Dev accuracy after epoch 1 is 93.14
Dev accuracy after epoch 2 is 94.90
Dev accuracy after epoch 3 is 95.26
Dev accuracy after epoch 4 is 95.42
Dev accuracy after epoch 5 is 95.34
Test accuracy after epoch 5 is 95.01
  • python3 cnn_manual.py --cnn=5-3-2,10-3-2
Dev accuracy after epoch 1 is 95.00
Dev accuracy after epoch 2 is 96.40
Dev accuracy after epoch 3 is 96.42
Dev accuracy after epoch 4 is 96.84
Dev accuracy after epoch 5 is 97.16
Test accuracy after epoch 5 is 96.44

cags_classification

 Deadline: Apr 12, 23:59  5 points+5 bonus

The goal of this assignment is to use a pretrained EfficientNet-B0 model to achieve the best accuracy in CAGS classification.

The CAGS dataset consists of images of cats and dogs of size 224×224, each classified in one of the 34 breeds and each containing a mask indicating the presence of the animal. To load the dataset, use the cags_dataset.py module. The dataset is stored in a TFRecord file and each element is encoded as a tf.train.Example, which is decoded using the CAGS.parse method.

To load the EfficientNet-B0, use the provided efficient_net.py module. Its method pretrained_efficientnet_b0(include_top, dynamic_input_shape=False):

  • downloads the pretrained weights if they are not found;
  • it returns a tf.keras.Model processing image of shape (224, 224, 3) with float values in range [0, 1] and producing a list of results:
    • the first value is the final network output:
      • if include_top == True, the network will include the final classification layer and produce a distribution on 1000 classes (whose names are in imagenet_classes.py);
      • if include_top == False, the network will return image features (the result of the last global average pooling);
    • the rest of outputs are the intermediate results of the network just before a convolution with stride > 1 is performed (denoted C_5, C_4, C_3, C_2, C_1 in the Object Detection lecture).

An example performing classification of given images is available in image_classification.py.

A note on finetuning: each tf.keras.layers.Layer has a mutable trainable property indicating whether its variables should be updated – however, after changing it, you need to call .compile again (or otherwise make sure the list of trainable variables for the optimizer is updated). Furthermore, the training argument passed to the invocation call decides whether the layer is executed in training regime (neurons get dropped in dropout, batch normalization computes estimates on the batch) or in inference regime. There is one exception though – if trainable == False on a batch normalization layer, it runs in the inference regime even when training == True.

The task is a competition. Everyone who submits a solution which achieves at least 90% test set accuracy will get 5 points; the remaining 5 points will be distributed depending on relative ordering of your solutions.

You may want to start with the cags_classification.py template which generates the test set annotation in the required format.

mnist_web

You can try a JavaScript-based demo of MNIST classification. This demo uses a neural network trained in TensorFlow using the mnist_web.py module, whose output was converted for TensorFlow.js with the tensorflowjs_converter --input_format=keras command and is then utilized by mnist_web.html.

cags_segmentation

 Deadline: Apr 19, 23:59  5 points+5 bonus

The goal of this assignment is to use a pretrained EfficientNet-B0 model to achieve the best image segmentation IoU score on the CAGS dataset. The dataset and the EfficientNet-B0 are described in the cags_classification assignment.

A mask is evaluated using intersection over union (IoU) metric, which is the intersection of the gold and predicted mask divided by their union, and the whole test set score is the average of its masks' IoU. A TensorFlow compatible metric is implemented by the class MaskIoUMetric of the cags_dataset.py module, which can also evaluate your predictions (either by running with --task=segmentation --evaluate=path arguments, or using its evaluate_segmentation_file method).
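
The metric described above can be sketched in a few lines of plain Python. The functions `mask_iou` and `mean_iou` are hypothetical stand-ins for the MaskIoUMetric class, masks are assumed to be flat sequences of 0/1 values, and returning 1.0 when both masks are empty is an assumption of this sketch, not something stated by the assignment.

```python
def mask_iou(gold, predicted):
    """Intersection over union of two flat binary masks of equal length."""
    intersection = sum(1 for g, p in zip(gold, predicted) if g and p)
    union = sum(1 for g, p in zip(gold, predicted) if g or p)
    # Assumption: two empty masks are treated as a perfect match.
    return intersection / union if union else 1.0


def mean_iou(gold_masks, predicted_masks):
    """Average IoU over all masks of a dataset."""
    scores = [mask_iou(g, p) for g, p in zip(gold_masks, predicted_masks)]
    return sum(scores) / len(scores)


print(mask_iou([1, 1, 0, 0], [1, 0, 1, 0]))
```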

The task is a competition. Everyone who submits a solution which achieves at least 87% test set IoU gets 5 points; the remaining 5 points will be distributed depending on relative ordering of your solutions.

You may want to start with the cags_segmentation.py template, which generates the test set annotation in the required format – each mask should be encoded on a single line as a space separated sequence of integers indicating the length of alternating runs of zeros and ones.
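
The run-length format can be illustrated by the following sketch. It assumes that the first number always counts zeros (so a mask starting with a one begins with a run of length 0) and that the mask has already been flattened to a sequence of 0/1 integers; both are assumptions of this example, and the template remains the authoritative source of the exact format. The function name `encode_mask` is hypothetical.

```python
def encode_mask(flat_mask):
    """Encode a flat 0/1 mask as lengths of alternating runs of zeros and ones."""
    runs, current, length = [], 0, 0
    for value in flat_mask:
        if value == current:
            length += 1
        else:
            runs.append(length)  # close the finished run
            current, length = value, 1
    runs.append(length)  # close the last run
    return " ".join(map(str, runs))


print(encode_mask([0, 0, 1, 1, 1, 0]))
print(encode_mask([1, 1, 0]))
```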

3d_recognition

 Deadline: Apr 19, 23:59  5 points+5 bonus

Your goal in this assignment is to perform 3D object recognition. The input is a voxelized representation of an object, stored as a 3D grid of either empty or occupied voxels, and your goal is to classify the object into one of 10 classes. The data is available in two resolutions, either as 20×20×20 data or 32×32×32 data. To load the dataset, use the modelnet.py module.

The official dataset offers only train and test sets, with the test set having a different distribution of labels. Our dataset also contains a development set, which has nearly the same label distribution as the test set.

The task is a competition. Everyone who submits a solution which achieves at least 87% test set accuracy gets 5 points; the remaining 5 points will be distributed depending on relative ordering of your solutions.

You can start with the 3d_recognition.py template, which among others generates test set annotations in the required format.

bboxes_utils

 Deadline: Apr 26, 23:59  2 points

This is a preparatory assignment for svhn_competition. The goal is to implement several bounding box manipulation routines in the bboxes_utils.py module. Notably, you need to implement the following methods:

  • bboxes_to_fast_rcnn: convert given bounding boxes to a Fast R-CNN-like representation relative to the given anchors;
  • bboxes_from_fast_rcnn: convert Fast R-CNN-like representations relative to given anchors back to bounding boxes;
  • bboxes_training: given a list of anchors and gold objects, assign gold objects to anchors and generate suitable training data (the exact algorithm is described in the template).
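
For orientation, a Fast R-CNN-like representation usually encodes the offset of the box center relative to the anchor center, normalized by the anchor size, together with the logarithm of the size ratio. The NumPy sketch below follows this common formulation; the function names, the (top, left, bottom, right) box layout and the order of the four encoded values are assumptions of this example, and the template describes the exact conventions expected in the assignment.

```python
import numpy as np


def to_fast_rcnn(anchors, bboxes):
    """Encode boxes relative to anchors; both are [n, 4] arrays (top, left, bottom, right)."""
    a_h, a_w = anchors[:, 2] - anchors[:, 0], anchors[:, 3] - anchors[:, 1]
    b_h, b_w = bboxes[:, 2] - bboxes[:, 0], bboxes[:, 3] - bboxes[:, 1]
    a_cy, a_cx = anchors[:, 0] + a_h / 2, anchors[:, 1] + a_w / 2
    b_cy, b_cx = bboxes[:, 0] + b_h / 2, bboxes[:, 1] + b_w / 2
    return np.stack(
        [(b_cy - a_cy) / a_h, (b_cx - a_cx) / a_w, np.log(b_h / a_h), np.log(b_w / a_w)],
        axis=1,
    )


def from_fast_rcnn(anchors, encoded):
    """Invert to_fast_rcnn, returning boxes as (top, left, bottom, right)."""
    a_h, a_w = anchors[:, 2] - anchors[:, 0], anchors[:, 3] - anchors[:, 1]
    a_cy, a_cx = anchors[:, 0] + a_h / 2, anchors[:, 1] + a_w / 2
    b_cy, b_cx = encoded[:, 0] * a_h + a_cy, encoded[:, 1] * a_w + a_cx
    b_h, b_w = np.exp(encoded[:, 2]) * a_h, np.exp(encoded[:, 3]) * a_w
    return np.stack([b_cy - b_h / 2, b_cx - b_w / 2, b_cy + b_h / 2, b_cx + b_w / 2], axis=1)


anchors = np.array([[0.0, 0.0, 10.0, 10.0]])
bboxes = np.array([[2.0, 4.0, 8.0, 12.0]])
encoded = to_fast_rcnn(anchors, bboxes)
print(encoded, from_fast_rcnn(anchors, encoded))
```

A box identical to its anchor is encoded as four zeros, and decoding an encoded box returns the original coordinates up to floating-point error.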

The bboxes_utils.py contains simple unit tests, which are evaluated when executing the module, which you can use to check the validity of your implementation.

When submitting to ReCodEx, the method main is executed, returning the implemented bboxes_to_fast_rcnn, bboxes_from_fast_rcnn and bboxes_training methods. These methods are then executed and compared to the reference implementation.

svhn_competition

 Deadline: Apr 26, 23:59; non-competition part extended to May 03  5 points+5 bonus

The goal of this assignment is to implement a system performing object recognition, optionally utilizing a pretrained EfficientNet-B0 backbone.

The Street View House Numbers (SVHN) dataset annotates for every photo all digits appearing on it, including their bounding boxes. The dataset can be loaded using the svhn_dataset.py module. Similarly to the CAGS dataset, it is stored in a TFRecord file with tf.train.Example elements. Every element is a dictionary with the following keys:

  • "image": a square 3-channel image,
  • "classes": a 1D tensor with all digit labels appearing in the image,
  • "bboxes": a [num_digits, 4] 2D tensor with bounding boxes of every digit in the image.

Given that the dataset elements are each of possibly different size and you want to preprocess them using bboxes_training, it might be more comfortable to convert the dataset to NumPy. Alternatively, you can implement bboxes_training using TensorFlow operations or call the NumPy implementation of bboxes_training directly in tf.data.Dataset.map by using tf.numpy_function, see FAQ.

Similarly to the cags_classification, you can load the EfficientNet-B0 using the provided efficient_net.py module. Note that the dynamic_input_shape=True argument creates a model capable of processing an input image of any size.

Each test set image annotation consists of a sequence of space separated five-tuples label top left bottom right, and the annotation is considered correct, if exactly the gold digits are predicted, each with IoU at least 0.5. The whole test set score is then the prediction accuracy of individual images. You can again evaluate your predictions using the svhn_dataset.py module, either by running with --evaluate=path arguments, or using its evaluate_file method.
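
The IoU of two bounding boxes, which is the core of the criterion above, can be computed as in the following sketch. The function name `bbox_iou` is hypothetical and boxes are assumed to be (top, left, bottom, right) tuples, matching the order used in the annotation; the svhn_dataset.py module remains the reference for the actual evaluation.

```python
def bbox_iou(first, second):
    """Intersection over union of two boxes given as (top, left, bottom, right)."""
    top, left = max(first[0], second[0]), max(first[1], second[1])
    bottom, right = min(first[2], second[2]), min(first[3], second[3])
    intersection = max(0, bottom - top) * max(0, right - left)

    def area(box):
        return (box[2] - box[0]) * (box[3] - box[1])

    union = area(first) + area(second) - intersection
    return intersection / union if union > 0 else 0.0


# Two 2x2 boxes overlapping in a single unit square.
print(bbox_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

A predicted digit with the correct label would then count as matching a gold digit when this value is at least 0.5.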

The task is a competition. Everyone who submits a solution which achieves at least 20% test set accuracy gets 5 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. Note that I usually need at least 35% development set accuracy to achieve the required test set performance.

You should start with the svhn_competition.py template, which generates the test set annotation in the required format.

A baseline solution can use a RetinaNet-like single-stage detector, using only a single level of convolutional features (no FPN) with single-scale and single-aspect anchors. Focal loss is available as tfa.losses.SigmoidFocalCrossEntropy (using the reduction=tf.losses.Reduction.SUM_OVER_BATCH_SIZE option is a good idea) and non-maximum suppression as tf.image.non_max_suppression or tf.image.combined_non_max_suppression.
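
To make the role of the focal loss more concrete, here is a small sketch of the per-example formula for a single binary decision. It is an independent illustration, not the tfa.losses.SigmoidFocalCrossEntropy implementation; the function name and the default values alpha=0.25 and gamma=2.0 are assumptions.

```python
import math


def sigmoid_focal_loss(target, probability, alpha=0.25, gamma=2.0, eps=1e-7):
    """Focal loss for one binary prediction.

    target is 0 or 1, probability is the predicted probability of class 1.
    The factor (1 - p_t) ** gamma down-weights examples that are already
    classified well, so the many easy background anchors contribute little.
    """
    p_t = probability if target == 1 else 1.0 - probability
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # An easy positive (p=0.9) is penalized far less than a hard one (p=0.1).
    print(sigmoid_focal_loss(1, 0.9), sigmoid_focal_loss(1, 0.1))
```

With gamma=0 the expression reduces to an alpha-weighted binary cross-entropy.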

sequence_classification

 Deadline: May 03, 23:59  3 points

The goal of this assignment is to introduce recurrent neural networks, compare their convergence speed, and illustrate the exploding gradient issue. The network should process sequences of 50 small integers and compute parity for each prefix of the sequence. The inputs are either 0/1, or vectors with a one-hot representation of a small integer.
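
As an illustration of the target the network is asked to produce, the following sketch computes the parity of every prefix of a sequence. It assumes that "parity" means the parity of the sum of the prefix; the function name is hypothetical and the template may generate its data differently.

```python
def prefix_parities(sequence):
    """Return, for every prefix of the sequence, the parity (0 or 1) of its sum."""
    parities, parity = [], 0
    for value in sequence:
        parity = (parity + value) % 2
        parities.append(parity)
    return parities


if __name__ == "__main__":
    print(prefix_parities([1, 0, 1, 1]))  # [1, 1, 0, 1]
```

Every output depends on all preceding inputs, which is why the task is a natural probe of how well an RNN carries information across many time steps.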

Your goal is to modify the sequence_classification.py template and implement the following:

  • Use specified RNN type (SimpleRNN, GRU and LSTM) and dimensionality.
  • Process the sequence using the required RNN.
  • Use additional hidden layer on the RNN outputs if requested.
  • Implement gradient clipping if requested.
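
Gradient clipping can be sketched in plain Python as follows. The sketch assumes clipping by the global norm of all gradients taken together; whether the template expects this variant or per-tensor clipping is not stated here, and the function name is hypothetical.

```python
import math


def clip_by_global_norm(gradients, clip_norm):
    """Rescale a list of gradient vectors so that their joint L2 norm is at most clip_norm.

    Returns the (possibly rescaled) gradients and the original global norm.
    """
    global_norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
    if global_norm <= clip_norm:
        return [list(grad) for grad in gradients], global_norm
    scale = clip_norm / global_norm
    return [[g * scale for g in grad] for grad in gradients], global_norm


if __name__ == "__main__":
    clipped, norm = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)
    print(norm, clipped)
```

Because all gradients are scaled by the same factor, the update direction is preserved and only its length is limited, which is what tames the occasional exploding gradient.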

In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard. Concentrate on how the RNNs converge, on convergence speed, on exploding gradient issues, and on how gradient clipping helps:

  • --rnn_cell=SimpleRNN --sequence_dim=1, --rnn_cell=GRU --sequence_dim=1, --rnn_cell=LSTM --sequence_dim=1
  • the same as above but with --sequence_dim=2
  • the same as above but with --sequence_dim=10
  • --rnn_cell=LSTM --hidden_layer=70 --rnn_cell_dim=30 --sequence_dim=30 and the same with --clip_gradient=1
  • the same as above but with --rnn_cell=SimpleRNN
  • the same as above but with --rnn_cell=GRU --hidden_layer=90

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 sequence_classification.py --rnn_cell SimpleRNN --epochs=5
Epoch 1/5 loss: 0.7008 - accuracy: 0.5037 - val_loss: 0.6926 - val_accuracy: 0.5176
Epoch 2/5 loss: 0.6924 - accuracy: 0.5165 - val_loss: 0.6921 - val_accuracy: 0.5217
Epoch 3/5 loss: 0.6920 - accuracy: 0.5166 - val_loss: 0.6913 - val_accuracy: 0.5114
Epoch 4/5 loss: 0.6908 - accuracy: 0.5193 - val_loss: 0.6881 - val_accuracy: 0.5157
Epoch 5/5 loss: 0.6863 - accuracy: 0.5217 - val_loss: 0.6793 - val_accuracy: 0.5231
  • python3 sequence_classification.py --rnn_cell GRU --epochs=5
Epoch 1/5 loss: 0.6930 - accuracy: 0.5109 - val_loss: 0.6917 - val_accuracy: 0.5157
Epoch 2/5 loss: 0.6905 - accuracy: 0.5170 - val_loss: 0.6823 - val_accuracy: 0.5143
Epoch 3/5 loss: 0.6342 - accuracy: 0.5925 - val_loss: 0.2222 - val_accuracy: 0.9695
Epoch 4/5 loss: 0.1759 - accuracy: 0.9760 - val_loss: 0.0930 - val_accuracy: 0.9882
Epoch 5/5 loss: 0.0754 - accuracy: 0.9938 - val_loss: 0.0381 - val_accuracy: 0.9986
  • python3 sequence_classification.py --rnn_cell LSTM --epochs=5
Epoch 1/5 loss: 0.6931 - accuracy: 0.5131 - val_loss: 0.6927 - val_accuracy: 0.5153
Epoch 2/5 loss: 0.6924 - accuracy: 0.5158 - val_loss: 0.6902 - val_accuracy: 0.5156
Epoch 3/5 loss: 0.6874 - accuracy: 0.5174 - val_loss: 0.6748 - val_accuracy: 0.5285
Epoch 4/5 loss: 0.5799 - accuracy: 0.6247 - val_loss: 0.0695 - val_accuracy: 1.0000
Epoch 5/5 loss: 0.0482 - accuracy: 1.0000 - val_loss: 0.0183 - val_accuracy: 1.0000
  • python3 sequence_classification.py --rnn_cell LSTM --epochs=5 --hidden_layer=50
Epoch 1/5 loss: 0.6884 - accuracy: 0.5129 - val_loss: 0.6614 - val_accuracy: 0.5309
Epoch 2/5 loss: 0.6544 - accuracy: 0.5362 - val_loss: 0.6378 - val_accuracy: 0.5301
Epoch 3/5 loss: 0.6319 - accuracy: 0.5482 - val_loss: 0.5836 - val_accuracy: 0.6181
Epoch 4/5 loss: 0.2933 - accuracy: 0.8366 - val_loss: 0.0030 - val_accuracy: 0.9998
Epoch 5/5 loss: 0.0023 - accuracy: 0.9999 - val_loss: 0.0010 - val_accuracy: 0.9999
  • python3 sequence_classification.py --rnn_cell LSTM --epochs=5 --hidden_layer=50 --clip_gradient=0.1
Epoch 1/5 loss: 0.6884 - accuracy: 0.5130 - val_loss: 0.6615 - val_accuracy: 0.5302
Epoch 2/5 loss: 0.6544 - accuracy: 0.5364 - val_loss: 0.6373 - val_accuracy: 0.5293
Epoch 3/5 loss: 0.6304 - accuracy: 0.5517 - val_loss: 0.5875 - val_accuracy: 0.6107
Epoch 4/5 loss: 0.3835 - accuracy: 0.7753 - val_loss: 6.5897e-04 - val_accuracy: 1.0000
Epoch 5/5 loss: 0.0011 - accuracy: 0.9999 - val_loss: 1.6853e-04 - val_accuracy: 1.0000

tagger_we

 Deadline: May 03, 23:59  3 points

In this assignment you will create a simple part-of-speech tagger. For training and evaluation, we will use a Czech dataset containing tokenized sentences, with each word annotated with a gold lemma and a part-of-speech tag. The morpho_dataset.py module (down)loads the dataset and provides mappings between strings and integers.

Your goal is to modify the tagger_we.py template and implement the following:

  • Use specified RNN cell type (GRU and LSTM) and dimensionality.
  • Create word embeddings for training vocabulary.
  • Process the sentences using bidirectional RNN.
  • Predict part-of-speech tags. Note that you need to properly handle sentences of different lengths in one batch using tf.RaggedTensors.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 tagger_we.py --max_sentences=5000 --rnn_cell=LSTM --rnn_cell_dim=16
Epoch 1/5 loss: 1.9780 - accuracy: 0.4436 - val_loss: 0.5346 - val_accuracy: 0.8354
Epoch 2/5 loss: 0.2443 - accuracy: 0.9513 - val_loss: 0.3686 - val_accuracy: 0.8563
Epoch 3/5 loss: 0.0557 - accuracy: 0.9893 - val_loss: 0.3289 - val_accuracy: 0.8735
Epoch 4/5 loss: 0.0333 - accuracy: 0.9916 - val_loss: 0.3430 - val_accuracy: 0.8671
Epoch 5/5 loss: 0.0258 - accuracy: 0.9936 - val_loss: 0.3343 - val_accuracy: 0.8736
loss: 0.3486 - accuracy: 0.8737
  • python3 tagger_we.py --max_sentences=5000 --rnn_cell=GRU --rnn_cell_dim=16
Epoch 1/5 loss: 1.6714 - accuracy: 0.5524 - val_loss: 0.3901 - val_accuracy: 0.8744
Epoch 2/5 loss: 0.1312 - accuracy: 0.9722 - val_loss: 0.3210 - val_accuracy: 0.8710
Epoch 3/5 loss: 0.0385 - accuracy: 0.9898 - val_loss: 0.3104 - val_accuracy: 0.8817
Epoch 4/5 loss: 0.0261 - accuracy: 0.9920 - val_loss: 0.3056 - val_accuracy: 0.8886
Epoch 5/5 loss: 0.0210 - accuracy: 0.9933 - val_loss: 0.3052 - val_accuracy: 0.8925
loss: 0.3525 - accuracy: 0.8788

tagger_cle

 Deadline: May 03, 23:59  3 points

This assignment is a continuation of tagger_we. Using the tagger_cle.py template, implement character-level word embedding computation using a bidirectional character-level GRU.

Once submitted to ReCodEx, you should experiment with the effect of CLEs compared to a plain tagger_we, and the influence of their dimensionality. Note that tagger_cle has by default smaller word embeddings so that the size of word representation (64 + 32 + 32) is the same as in the tagger_we assignment.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 tagger_cle.py --max_sentences=5000 --rnn_cell=LSTM --rnn_cell_dim=16 --cle_dim=16
Epoch 1/5 loss: 1.8425 - accuracy: 0.4607 - val_loss: 0.4031 - val_accuracy: 0.9008
Epoch 2/5 loss: 0.2080 - accuracy: 0.9599 - val_loss: 0.2516 - val_accuracy: 0.9204
Epoch 3/5 loss: 0.0560 - accuracy: 0.9882 - val_loss: 0.2177 - val_accuracy: 0.9286
Epoch 4/5 loss: 0.0335 - accuracy: 0.9917 - val_loss: 0.2155 - val_accuracy: 0.9265
Epoch 5/5 loss: 0.0250 - accuracy: 0.9935 - val_loss: 0.1920 - val_accuracy: 0.9363
loss: 0.2118 - accuracy: 0.9289
  • python3 tagger_cle.py --max_sentences=5000 --rnn_cell=LSTM --rnn_cell_dim=16 --cle_dim=16 --word_masking=0.1
Epoch 1/5 loss: 1.8989 - accuracy: 0.4426 - val_loss: 0.4616 - val_accuracy: 0.8798
Epoch 2/5 loss: 0.3442 - accuracy: 0.9155 - val_loss: 0.2408 - val_accuracy: 0.9265
Epoch 3/5 loss: 0.1503 - accuracy: 0.9605 - val_loss: 0.1994 - val_accuracy: 0.9364
Epoch 4/5 loss: 0.1040 - accuracy: 0.9706 - val_loss: 0.1847 - val_accuracy: 0.9427
Epoch 5/5 loss: 0.0892 - accuracy: 0.9728 - val_loss: 0.1882 - val_accuracy: 0.9401
loss: 0.2029 - accuracy: 0.9361

tagger_competition

 Deadline: May 03, 23:59  4 points+5 bonus

In this assignment, you should extend tagger_cle into a real-world Czech part-of-speech tagger. We will use the Czech PDT dataset loadable using the morpho_dataset.py module. Note that the dataset contains more than 1500 unique POS tags and that the POS tags have a fixed structure of 15 positions (so it is possible to generate the POS tag characters independently).

You can use the following additional data in this assignment:

  • You can use outputs of a morphological analyzer loadable with morpho_analyzer.py. If a word form in train, dev or test PDT data is known to the analyzer, all its (lemma, POS tag) pairs are returned.
  • You can use any unannotated text data (Wikipedia, Czech National Corpus, …), and also any pre-trained word embeddings (assuming they were trained on plain texts).

The task is a competition. Everyone who submits a solution with at least 92% label accuracy gets 4 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. Lastly, 3 bonus points will be given to anyone surpassing the pre-neural-network state of the art of 95.89% from Spoustová et al., 2009.

You can start with the tagger_competition.py template, which among others generates test set annotations in the required format. Note that you can evaluate the predictions as usual using the morpho_dataset.py module, either by running with --task=tagger --evaluate=path arguments, or using its evaluate_file method.

tensorboard_projector

You can try exploring the TensorBoard Projector with pre-trained embeddings for 20k most frequent lemmas in Czech and English – after extracting the archive, start tensorboard --logdir dir_where_the_archive_is_extracted.

In order to use the Projector tab yourself, you can take inspiration from the projector_export.py script, which was used to export the above pre-trained embeddings from the Word2vec format.

tagger_crf

 Deadline: May 10, 23:59  2 points

This assignment is an extension of the tagger_we task. Using the tagger_crf.py template, implement named entity recognition using CRF loss and CRF decoding from the tensorflow_addons package.

The evaluation is performed using the provided metric computing F1 score of the span prediction (i.e., a recognized possibly-multiword named entity is true positive if both the entity type and the span exactly match).
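
The span-level F1 score described above can be sketched as follows. This is not the provided metric; the function name and the representation of an entity as a (type, start, end) triple are assumptions chosen to avoid depending on any particular tag encoding.

```python
def span_f1(gold_spans, predicted_spans):
    """F1 score over entity spans; a span is a hashable (type, start, end) triple.

    A predicted span is a true positive only if its type and both boundaries
    exactly match a gold span.
    """
    gold, predicted = set(gold_spans), set(predicted_spans)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [("PER", 0, 2), ("LOC", 5, 6)]
    predicted = [("PER", 0, 2), ("LOC", 5, 7)]  # the second span has a wrong boundary
    print(span_f1(gold, predicted))  # 0.5
```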

In practice, character-level embeddings (and also pre-trained word embeddings) would be used to obtain superior results.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 tagger_crf.py --max_sentences=5000 --rnn_cell=LSTM --rnn_cell_dim=24
Epoch 1/5 loss: 18.5475 - val_f1: 0.0248
Epoch 2/5 loss: 9.8655 - val_f1: 0.2207
Epoch 3/5 loss: 6.0053 - val_f1: 0.3370
Epoch 4/5 loss: 3.1784 - val_f1: 0.4000
Epoch 5/5 loss: 1.6535 - val_f1: 0.4363
  • python3 tagger_crf.py --max_sentences=5000 --rnn_cell=GRU --rnn_cell_dim=24
Epoch 1/5 loss: 17.7499 - val_f1: 0.1624
Epoch 2/5 loss: 8.3992 - val_f1: 0.4048
Epoch 3/5 loss: 3.7579 - val_f1: 0.4444
Epoch 4/5 loss: 1.5298 - val_f1: 0.4496
Epoch 5/5 loss: 0.7858 - val_f1: 0.4769

speech_recognition

 Deadline: May 10, 23:59  5 points+5 bonus

This assignment is a competition task in the speech recognition area. Specifically, your goal is to predict a sequence of letters given a spoken utterance. We will be using Czech recordings from Common Voice, with input sound waves passed through the usual preprocessing – computing Mel-frequency cepstral coefficients (MFCCs). You can repeat this preprocessing on a given audio using the wav_decode and mfcc_extract methods from the common_voice_cs.py module. This module can also load the dataset, downloading it when necessary (note that it is 200MB in size, so it might take a while). Furthermore, you can listen to the development portion of the dataset.

This is an open-data task, where you submit only the test set annotations together with the training script (which will not be executed; it will only be used to understand the approach you took, and to identify teams). Explicitly, submit exactly one .txt file and at least one .py file.

The task is a competition. The evaluation is performed by computing the edit distance to the gold letter sequence, normalized by its length (a corresponding Keras metric EditDistanceMetric is provided by common_voice_cs.py). Everyone who submits a solution with at most 50% test set edit distance gets 5 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. Note that you can evaluate the predictions as usual using the common_voice_cs.py module, either by running it with the --evaluate=path argument, or using its evaluate_file method.
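
For intuition, the normalized edit distance can be sketched in plain Python. This is an illustration only, not the EditDistanceMetric from common_voice_cs.py; the function names are hypothetical and a non-empty gold sequence is assumed.

```python
def edit_distance(predicted, gold):
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(gold) + 1))
    for i, p in enumerate(predicted, start=1):
        current = [i]
        for j, g in enumerate(gold, start=1):
            current.append(min(
                previous[j] + 1,             # delete p from the prediction
                current[j - 1] + 1,          # insert g into the prediction
                previous[j - 1] + (p != g),  # substitute, free when the letters match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(predicted, gold):
    """Edit distance divided by the length of the gold sequence (assumed non-empty)."""
    return edit_distance(predicted, gold) / len(gold)


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))        # 3
    print(normalized_edit_distance("ahoi", "ahoj"))  # 0.25
```

Note that the normalized value can exceed 100% when the prediction is much longer than the gold transcript.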

Start with the speech_recognition.py template which contains instructions for using the CTC loss and generates the test set annotation in the required format.

tagger_crf_manual

 Deadline: May 17, 23:59  1 point

This assignment is an extension of tagger_crf, where we will perform the CRF loss computation (but not CRF decoding) manually.

The tagger_crf_manual.py template is nearly identical to tagger_crf, the only difference is the crf_loss method, where you should manually implement the CRF loss.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 tagger_crf_manual.py --max_sentences=5000 --rnn_cell=LSTM --rnn_cell_dim=24
Epoch 1/5 loss: 18.5475 - val_f1: 0.0248
Epoch 2/5 loss: 9.8655 - val_f1: 0.2207
Epoch 3/5 loss: 6.0053 - val_f1: 0.3370
Epoch 4/5 loss: 3.1784 - val_f1: 0.4000
Epoch 5/5 loss: 1.6535 - val_f1: 0.4363
  • python3 tagger_crf_manual.py --max_sentences=5000 --rnn_cell=GRU --rnn_cell_dim=24
Epoch 1/5 loss: 17.7499 - val_f1: 0.1624
Epoch 2/5 loss: 8.3992 - val_f1: 0.4048
Epoch 3/5 loss: 3.7579 - val_f1: 0.4444
Epoch 4/5 loss: 1.5298 - val_f1: 0.4496
Epoch 5/5 loss: 0.7858 - val_f1: 0.4769

lemmatizer_noattn

 Deadline: May 17, 23:59  3 points

The goal of this assignment is to create a simple lemmatizer. For training and evaluation, we use the same dataset as in tagger_we loadable by the updated morpho_dataset.py module.

Your goal is to modify the lemmatizer_noattn.py template and implement the following:

  • Embed characters of source forms and run a bidirectional GRU encoder.
  • Embed characters of target lemmas.
  • Implement a training time decoder which uses gold target characters as inputs.
  • Implement an inference time decoder which uses previous predictions as inputs.
  • The initial state of both decoders is the output state of the corresponding GRU encoded form.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 lemmatizer_noattn.py --max_sentences=1000 --batch_size=2 --cle_dim=24 --rnn_dim=24 --epochs=3
Epoch 1/3 loss: 2.5645 - val_loss: 0.0000e+00 - val_accuracy: 0.1372
Epoch 2/3 loss: 1.9879 - val_loss: 0.0000e+00 - val_accuracy: 0.2061
Epoch 3/3 loss: 1.4119 - val_loss: 0.0000e+00 - val_accuracy: 0.2874
loss: 0.0000e+00 - accuracy: 0.2921
  • python3 lemmatizer_noattn.py --max_sentences=500 --batch_size=2 --cle_dim=32 --rnn_dim=32 --epochs=3
Epoch 1/3 loss: 2.5907 - val_loss: 0.0000e+00 - val_accuracy: 0.1206
Epoch 2/3 loss: 2.1792 - val_loss: 0.0000e+00 - val_accuracy: 0.2160
Epoch 3/3 loss: 1.5338 - val_loss: 0.0000e+00 - val_accuracy: 0.2590
loss: 0.0000e+00 - accuracy: 0.2653

lemmatizer_attn

 Deadline: May 17, 23:59  3 points

This task is a continuation of the lemmatizer_noattn assignment. Using the lemmatizer_attn.py template, implement the following features in addition to lemmatizer_noattn:

  • The bidirectional GRU encoder returns outputs for all input characters, not just the last.
  • Implement attention in the decoders. Notably, project the encoder outputs and current state into same dimensionality vectors, apply non-linearity, and generate weights for every encoder output. Finally sum the encoder outputs using these weights and concatenate the computed attention to the decoder inputs.
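
The attention step described in the last item can be sketched without any framework. The sketch below is an assumption-laden illustration, not the template's solution: it assumes the two projections are added, tanh is the non-linearity, and the weights are normalized with a softmax; the names additive_attention, w_enc, w_state and v are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def additive_attention(encoder_outputs, state, w_enc, w_state, v):
    """Return the attention context vector and the attention weights."""
    projected_state = matvec(w_state, state)
    scores = []
    for output in encoder_outputs:
        projected = matvec(w_enc, output)
        # Same-dimensionality projections are summed and passed through tanh.
        hidden = [math.tanh(a + b) for a, b in zip(projected, projected_state)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    # Softmax over encoder positions (shifted by the maximum for stability).
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the encoder outputs.
    dim = len(encoder_outputs[0])
    context = [sum(w * out[d] for w, out in zip(weights, encoder_outputs)) for d in range(dim)]
    return context, weights


if __name__ == "__main__":
    outputs = [[1.0, 0.0], [0.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    context, weights = additive_attention(outputs, [0.5], identity, [[0.0], [0.0]], [1.0, 0.0])
    decoder_input = [0.1, 0.2] + context  # concatenate the attention to the decoder input
    print(weights, decoder_input)
```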

Once submitted to ReCodEx, you should experiment with the effect of using the attention, and the influence of RNN dimensionality on network performance.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 lemmatizer_attn.py --max_sentences=1000 --batch_size=2 --cle_dim=24 --rnn_dim=24 --epochs=3
Epoch 1/3 loss: 2.4224 - val_loss: 0.0000e+00 - val_accuracy: 0.1627
Epoch 2/3 loss: 1.8042 - val_loss: 0.0000e+00 - val_accuracy: 0.2574
Epoch 3/3 loss: 0.9277 - val_loss: 0.0000e+00 - val_accuracy: 0.2998
loss: 0.0000e+00 - accuracy: 0.3083
  • python3 lemmatizer_attn.py --max_sentences=500 --batch_size=2 --cle_dim=32 --rnn_dim=32 --epochs=3
Epoch 1/3 loss: 2.6011 - val_loss: 0.0000e+00 - val_accuracy: 0.1232
Epoch 2/3 loss: 2.1855 - val_loss: 0.0000e+00 - val_accuracy: 0.2124
Epoch 3/3 loss: 1.4435 - val_loss: 0.0000e+00 - val_accuracy: 0.2649
loss: 0.0000e+00 - accuracy: 0.2815

lemmatizer_competition

 Deadline: May 17, 23:59  4 points+5 bonus

In this assignment, you should extend lemmatizer_noattn or lemmatizer_attn into a real-world Czech lemmatizer. As in tagger_competition, we will use the Czech PDT dataset loadable using the morpho_dataset.py module.

You can also use the same additional data as in the tagger_competition assignment.

The task is a competition. Everyone who submits a solution with at least 96% label accuracy gets 4 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. Lastly, 3 bonus points will be given to anyone surpassing the pre-neural-network state of the art of 97.86%.

You can start with the lemmatizer_competition.py template, which among others generates test set annotations in the required format. Note that you can evaluate the predictions as usual using the morpho_dataset.py module, either by running with --task=lemmatizer --evaluate=path arguments, or using its evaluate_file method.

tagger_transformer

 Deadline: May 24, 23:59  3 points

This assignment is a continuation of tagger_we. Using the tagger_transformer.py template, implement a Transformer encoder.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 tagger_transformer.py --max_sentences=5000 --transformer_layers=0
Epoch 1/5 loss: 1.9822 - accuracy: 0.4003 - val_loss: 0.8465 - val_accuracy: 0.7235
Epoch 2/5 loss: 0.6168 - accuracy: 0.8283 - val_loss: 0.5454 - val_accuracy: 0.8280
Epoch 3/5 loss: 0.2757 - accuracy: 0.9528 - val_loss: 0.4380 - val_accuracy: 0.8416
Epoch 4/5 loss: 0.1424 - accuracy: 0.9761 - val_loss: 0.4046 - val_accuracy: 0.8468
Epoch 5/5 loss: 0.0869 - accuracy: 0.9843 - val_loss: 0.3934 - val_accuracy: 0.8480
loss: 0.4082 - accuracy: 0.8472
  • python3 tagger_transformer.py --max_sentences=5000 --transformer_heads=1
Epoch 1/5 loss: 1.6145 - accuracy: 0.4919 - val_loss: 0.4468 - val_accuracy: 0.8265
Epoch 2/5 loss: 0.1648 - accuracy: 0.9494 - val_loss: 0.5082 - val_accuracy: 0.8356
Epoch 3/5 loss: 0.0470 - accuracy: 0.9848 - val_loss: 0.6596 - val_accuracy: 0.8202
Epoch 4/5 loss: 0.0256 - accuracy: 0.9909 - val_loss: 0.5639 - val_accuracy: 0.8291
Epoch 5/5 loss: 0.0187 - accuracy: 0.9931 - val_loss: 0.5991 - val_accuracy: 0.8387
loss: 0.6571 - accuracy: 0.8292
  • python3 tagger_transformer.py --max_sentences=5000 --transformer_heads=4
Epoch 1/5 loss: 1.6144 - accuracy: 0.4935 - val_loss: 0.4483 - val_accuracy: 0.8250
Epoch 2/5 loss: 0.1598 - accuracy: 0.9522 - val_loss: 0.5113 - val_accuracy: 0.8374
Epoch 3/5 loss: 0.0449 - accuracy: 0.9853 - val_loss: 0.7293 - val_accuracy: 0.8174
Epoch 4/5 loss: 0.0267 - accuracy: 0.9906 - val_loss: 0.7311 - val_accuracy: 0.8071
Epoch 5/5 loss: 0.0189 - accuracy: 0.9931 - val_loss: 0.6877 - val_accuracy: 0.8417
loss: 0.8193 - accuracy: 0.8206
  • python3 tagger_transformer.py --max_sentences=5000 --transformer_heads=4 --transformer_dropout=0.1
Epoch 1/5 loss: 1.7227 - accuracy: 0.4576 - val_loss: 0.4702 - val_accuracy: 0.8175
Epoch 2/5 loss: 0.2176 - accuracy: 0.9332 - val_loss: 0.4847 - val_accuracy: 0.8403
Epoch 3/5 loss: 0.0621 - accuracy: 0.9813 - val_loss: 0.6176 - val_accuracy: 0.8063
Epoch 4/5 loss: 0.0385 - accuracy: 0.9869 - val_loss: 0.5598 - val_accuracy: 0.8232
Epoch 5/5 loss: 0.0312 - accuracy: 0.9893 - val_loss: 0.6466 - val_accuracy: 0.8203
loss: 0.7229 - accuracy: 0.8065

sentiment_analysis

 Deadline: May 31, 23:59  3 points

Perform sentiment analysis on Czech Facebook data using the provided pre-trained Czech Electra small. The dataset consists of pairs of (document, label) and can be (down)loaded using the text_classification_dataset.py module. When loading the dataset, a tokenizer might be provided, and if it is, the document is also passed through the tokenizer and the resulting tokens are added to the dataset.

Even though this assignment is not a competition, your goal is to submit test set annotations with at least 77% accuracy. As usual, you can evaluate your predictions using the text_classification_dataset.py module, either by running with --evaluate=path arguments, or using its evaluate_file method.

Note that contrary to working with EfficientNet, you need to finetune the Electra model in order to achieve the required accuracy.

You can start with the sentiment_analysis.py template, which among others loads the Electra Czech model and generates test set annotations in the required format. Note that the bert_example.py module illustrates the usage of both the Electra tokenizer and the Electra model.

reading_comprehension

 Deadline: May 31, 23:59; non-competition part extended to Jun 30  5 points+5 bonus

May 27 Update: The evaluation was changed and is now performed only on non-empty answers. In other words, you do not need to decide if the answer is or is not in the context, but just to provide a best non-empty answer. However, the data was not modified, so you should ignore training data questions without answers during training (for development and test sets, provide predictions on the whole set, and the evaluation script will consider only the ones where the gold answers exist.)

Implement the best possible model for the reading comprehension task using a translated version of the SQuAD 2.0 dataset, utilizing the provided pre-trained Czech Electra small.

The dataset can be loaded using the reading_comprehension_dataset.py module. The loaded dataset is a direct representation of the data and is not yet ready to be trained on directly. Each of the train, dev and test datasets is composed of a list of paragraphs, each consisting of:

  • context: text with the information;
  • qas: list of questions and answers, where each item consists of:
    • question: text of the question;
    • answers: a list of answers, each answer is composed of:
      • text: string of the text, exactly as appearing in the context;
      • start: character offset of the answer text in the context.
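
The nested structure above can be flattened into training examples along the following lines. This sketch assumes that paragraphs, questions and answers are plain dictionaries keyed by the field names listed above, which may differ from the objects returned by reading_comprehension_dataset.py; the function name is hypothetical. Questions without answers are skipped, in line with the May 27 update.

```python
def answerable_examples(paragraphs):
    """Flatten paragraphs into (context, question, answer_start, answer_end) tuples."""
    examples = []
    for paragraph in paragraphs:
        context = paragraph["context"]
        for qa in paragraph["qas"]:
            # An empty answer list produces no examples, so unanswerable
            # questions are ignored automatically.
            for answer in qa["answers"]:
                start, text = answer["start"], answer["text"]
                end = start + len(text)
                if context[start:end] != text:
                    continue  # skip inconsistent offsets instead of training on them
                examples.append((context, qa["question"], start, end))
    return examples


if __name__ == "__main__":
    data = [{
        "context": "Prague is the capital.",
        "qas": [
            {"question": "What is the capital?", "answers": [{"text": "Prague", "start": 0}]},
            {"question": "How many people live there?", "answers": []},
        ],
    }]
    print(answerable_examples(data))
```

The character offsets still need to be mapped to token positions of the tokenizer before they can serve as training targets.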

Note that a question might not be answerable given the context, in which case the list of answers is empty. In the train and dev sets, each question has at most one answer, while in the test set there might be several answers. We evaluate the reading comprehension task using accuracy, where an answer is considered correct if its text is exactly equal to some correct answer. You can evaluate your predictions as usual with the reading_comprehension_dataset.py module, either by running with --evaluate=path arguments, or using its evaluate_file method.
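
The exact-match accuracy with several reference answers can be sketched as below. It is not the code of evaluate_file; the function name and the input format (one predicted string and one list of gold strings per question) are assumptions. Questions with no gold answer are excluded, following the May 27 update.

```python
def exact_match_accuracy(predictions, gold_answers):
    """Fraction of answerable questions whose prediction equals some gold answer."""
    evaluated = correct = 0
    for prediction, references in zip(predictions, gold_answers):
        if not references:
            continue  # questions without gold answers are not evaluated
        evaluated += 1
        correct += prediction in references
    return correct / evaluated if evaluated else 0.0


if __name__ == "__main__":
    predictions = ["Prague", "Brno", "anything"]
    gold = [["Prague", "in Prague"], ["Ostrava"], []]
    print(exact_match_accuracy(predictions, gold))  # 0.5
```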

The task is a competition. Everyone who submits a solution with at least 49% answer accuracy gets 5 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. Note that usually achieving 47% on the dev set is enough to get 49% on the test set (because of multiple references in the test set).

Note that contrary to working with EfficientNet, you need to finetune the Electra model in order to achieve the required accuracy.

You can start with the reading_comprehension.py template, which among others (down)loads the data and Czech Electra small model, and describes the format of the required test set annotations.

vae

 Deadline: Jun 30, 23:59  3 points

In this assignment you will implement a simple Variational Autoencoder for three datasets in the MNIST format. Your goal is to modify the vae.py template and implement a VAE.

After submitting the assignment to ReCodEx, you can experiment with the three available datasets (mnist, mnist-fashion, and mnist-cifarcars) and different latent variable dimensionality (z_dim=2 and z_dim=100). The generated images are available in TensorBoard logs.
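
The two ingredients specific to a VAE, the latent loss and the reparameterized sampling, can be sketched in plain Python. The sketch assumes a diagonal Gaussian encoder parameterized by a mean and a log-variance and a standard normal prior; the template may parameterize the encoder differently, and all function names are hypothetical. The example logs below appear consistent with a total loss of roughly 784 times the reconstruction loss plus z_dim times the latent loss (for instance 784 * 0.2159 + 2 * 2.4693 is about 174.20), but this weighting is inferred from the numbers, not taken from the template.

```python
import math
import random


def kl_to_standard_normal(mean, log_variance):
    """Average over dimensions of KL(N(mean, exp(log_variance)) || N(0, 1))."""
    terms = [0.5 * (m * m + math.exp(lv) - 1.0 - lv) for m, lv in zip(mean, log_variance)]
    return sum(terms) / len(terms)


def sample_latent(mean, log_variance, rng=random):
    """Reparameterization trick: z = mean + std * epsilon with epsilon ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, log_variance)]


def total_loss(reconstruction_loss, latent_loss, pixels=28 * 28, z_dim=2):
    """Combined loss under the weighting inferred from the example logs (an assumption)."""
    return pixels * reconstruction_loss + z_dim * latent_loss


if __name__ == "__main__":
    print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0: the posterior equals the prior
    print(sample_latent([0.0, 0.0], [0.0, 0.0], random.Random(42)))
    print(total_loss(0.2159, 2.4693))
```

Sampling through mean + std * epsilon keeps the sample differentiable with respect to the encoder outputs, which is what allows training by backpropagation.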

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 vae.py --dataset=mnist --z_dim=2 --epochs=3
Epoch 1/3 reconstruction_loss: 0.2159 - latent_loss: 2.4693 - loss: 174.2038
Epoch 2/3 reconstruction_loss: 0.1928 - latent_loss: 2.7937 - loss: 156.7730
Epoch 3/3 reconstruction_loss: 0.1868 - latent_loss: 2.9350 - loss: 152.3162
  • python3 vae.py --dataset=mnist --z_dim=100 --epochs=3
Epoch 1/3 reconstruction_loss: 0.1837 - latent_loss: 0.1378 - loss: 157.7933
Epoch 2/3 reconstruction_loss: 0.1319 - latent_loss: 0.1847 - loss: 121.9125
Epoch 3/3 reconstruction_loss: 0.1209 - latent_loss: 0.1903 - loss: 113.7889
  • python3 vae.py --dataset=mnist-fashion --z_dim=2 --epochs=3
Epoch 1/3 reconstruction_loss: 0.3539 - latent_loss: 2.9950 - loss: 283.4177
Epoch 2/3 reconstruction_loss: 0.3324 - latent_loss: 3.0159 - loss: 266.6620
Epoch 3/3 reconstruction_loss: 0.3288 - latent_loss: 3.0269 - loss: 263.8320
  • python3 vae.py --dataset=mnist-fashion --z_dim=100 --epochs=3
Epoch 1/3 reconstruction_loss: 0.3400 - latent_loss: 0.1183 - loss: 278.3589
Epoch 2/3 reconstruction_loss: 0.3088 - latent_loss: 0.1061 - loss: 252.7133
Epoch 3/3 reconstruction_loss: 0.3029 - latent_loss: 0.1086 - loss: 248.3083
  • python3 vae.py --dataset=mnist-cifarcars --z_dim=2 --epochs=3
Epoch 1/3 reconstruction_loss: 0.6373 - latent_loss: 1.9468 - loss: 503.5290
Epoch 2/3 reconstruction_loss: 0.6307 - latent_loss: 2.0624 - loss: 498.5606
Epoch 3/3 reconstruction_loss: 0.6292 - latent_loss: 2.1156 - loss: 497.5026
  • python3 vae.py --dataset=mnist-cifarcars --z_dim=100 --epochs=3
Epoch 1/3 reconstruction_loss: 0.6359 - latent_loss: 0.0577 - loss: 504.3351
Epoch 2/3 reconstruction_loss: 0.6164 - latent_loss: 0.0714 - loss: 490.4035
Epoch 3/3 reconstruction_loss: 0.6097 - latent_loss: 0.0860 - loss: 486.5849

gan

 Deadline: Jun 30, 23:59  2 points

In this assignment you will implement a simple Generative Adversarial Network for three datasets in the MNIST format. Your goal is to modify the gan.py template and implement a GAN.

After submitting the assignment to ReCodEx, you can experiment with the three available datasets (mnist, mnist-fashion, and mnist-cifarcars) and maybe try different latent variable dimensionality. The generated images are available in TensorBoard logs.
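
The two competing objectives of a GAN can be sketched as follows. The sketch assumes the discriminator outputs the probability that an image is real and that the generator uses the common non-saturating loss; neither choice is confirmed by the template, and the function names are hypothetical.

```python
import math


def binary_cross_entropy(target, probability, eps=1e-7):
    """Cross-entropy of one predicted probability against a 0/1 target."""
    p = min(max(probability, eps), 1.0 - eps)  # avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


def discriminator_loss(real_probabilities, fake_probabilities):
    """Real images should be classified as 1 and generated images as 0."""
    real = sum(binary_cross_entropy(1, p) for p in real_probabilities) / len(real_probabilities)
    fake = sum(binary_cross_entropy(0, p) for p in fake_probabilities) / len(fake_probabilities)
    return real + fake


def generator_loss(fake_probabilities):
    """Non-saturating loss: the generator wants its images classified as real."""
    return sum(binary_cross_entropy(1, p) for p in fake_probabilities) / len(fake_probabilities)


if __name__ == "__main__":
    # A discriminator that cannot tell real from fake outputs 0.5 everywhere.
    print(discriminator_loss([0.5], [0.5]), generator_loss([0.5]))
```

When the discriminator is fully confused, its loss is 2 ln 2 and the generator loss is ln 2, which gives a reference point when reading the training logs.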

You can also continue with the dcgan assignment.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 gan.py --dataset=mnist --z_dim=2 --epochs=5
Epoch 1/5 discriminator_loss: 0.0811 - generator_loss: 5.2954 - loss: 1.7356 - discriminator_accuracy: 0.9826
Epoch 2/5 discriminator_loss: 0.0776 - generator_loss: 3.8221 - loss: 1.3290 - discriminator_accuracy: 0.9926
Epoch 3/5 discriminator_loss: 0.0686 - generator_loss: 4.3589 - loss: 1.3821 - discriminator_accuracy: 0.9920
Epoch 4/5 discriminator_loss: 0.0694 - generator_loss: 4.4692 - loss: 1.4952 - discriminator_accuracy: 0.9910
Epoch 5/5 discriminator_loss: 0.0668 - generator_loss: 4.5452 - loss: 1.5248 - discriminator_accuracy: 0.9919
  • python3 gan.py --dataset=mnist --z_dim=100 --epochs=5
Epoch 1/5 discriminator_loss: 0.0526 - generator_loss: 5.6836 - loss: 1.5494 - discriminator_accuracy: 0.9826
Epoch 2/5 discriminator_loss: 0.0333 - generator_loss: 5.9819 - loss: 1.9048 - discriminator_accuracy: 0.9978
Epoch 3/5 discriminator_loss: 0.0660 - generator_loss: 5.0259 - loss: 1.7150 - discriminator_accuracy: 0.9934
Epoch 4/5 discriminator_loss: 0.1227 - generator_loss: 4.9251 - loss: 1.8218 - discriminator_accuracy: 0.9871
Epoch 5/5 discriminator_loss: 0.2496 - generator_loss: 4.0308 - loss: 1.4528 - discriminator_accuracy: 0.9609
  • python3 gan.py --dataset=mnist-fashion --z_dim=2 --epochs=5
Epoch 1/5 discriminator_loss: 0.1560 - generator_loss: 12.4313 - loss: 1.6760 - discriminator_accuracy: 0.9788
Epoch 2/5 discriminator_loss: 0.1748 - generator_loss: 21.1818 - loss: 10.1500 - discriminator_accuracy: 0.9644
Epoch 3/5 discriminator_loss: 0.0691 - generator_loss: 11.8005 - loss: 5.7323 - discriminator_accuracy: 0.9919
Epoch 4/5 discriminator_loss: 0.0429 - generator_loss: 15.0839 - loss: 5.9234 - discriminator_accuracy: 0.9928
Epoch 5/5 discriminator_loss: 0.0687 - generator_loss: 9.5255 - loss: 2.9274 - discriminator_accuracy: 0.9906
  • python3 gan.py --dataset=mnist-fashion --z_dim=100 --epochs=5
Epoch 1/5 discriminator_loss: 0.0710 - generator_loss: 7.7963 - loss: 1.8059 - discriminator_accuracy: 0.9803
Epoch 2/5 discriminator_loss: 0.0728 - generator_loss: 7.2306 - loss: 2.4866 - discriminator_accuracy: 0.9910
Epoch 3/5 discriminator_loss: 0.1112 - generator_loss: 5.6444 - loss: 1.8976 - discriminator_accuracy: 0.9852
Epoch 4/5 discriminator_loss: 0.1899 - generator_loss: 4.5056 - loss: 1.6542 - discriminator_accuracy: 0.9748
Epoch 5/5 discriminator_loss: 0.3114 - generator_loss: 4.0829 - loss: 1.5674 - discriminator_accuracy: 0.9381
  • python3 gan.py --dataset=mnist-cifarcars --z_dim=2 --epochs=5
Epoch 1/5 discriminator_loss: 0.7178 - generator_loss: 4.3867 - loss: 0.9027 - discriminator_accuracy: 0.8721
Epoch 2/5 discriminator_loss: 0.3499 - generator_loss: 4.4815 - loss: 2.1730 - discriminator_accuracy: 0.9631
Epoch 3/5 discriminator_loss: 0.7672 - generator_loss: 2.7376 - loss: 1.2015 - discriminator_accuracy: 0.8301
Epoch 4/5 discriminator_loss: 0.6904 - generator_loss: 2.9754 - loss: 1.2297 - discriminator_accuracy: 0.8599
Epoch 5/5 discriminator_loss: 0.8773 - generator_loss: 2.4737 - loss: 1.1036 - discriminator_accuracy: 0.7979
  • python3 gan.py --dataset=mnist-cifarcars --z_dim=100 --epochs=5
Epoch 1/5 discriminator_loss: 0.5299 - generator_loss: 4.1585 - loss: 1.2538 - discriminator_accuracy: 0.8787
Epoch 2/5 discriminator_loss: 0.6910 - generator_loss: 2.3183 - loss: 0.9271 - discriminator_accuracy: 0.8682
Epoch 3/5 discriminator_loss: 1.1221 - generator_loss: 1.9830 - loss: 1.1333 - discriminator_accuracy: 0.7479
Epoch 4/5 discriminator_loss: 1.3696 - generator_loss: 1.0735 - loss: 0.8271 - discriminator_accuracy: 0.6637
Epoch 5/5 discriminator_loss: 1.4549 - generator_loss: 0.9048 - loss: 0.7935 - discriminator_accuracy: 0.5939

dcgan

 Deadline: Jun 30, 23:59  1 point

This task is a continuation of the gan assignment, which you will modify to implement the Deep Convolutional GAN (DCGAN).

Start with the dcgan.py template and implement a DCGAN. Note that most of the TODO notes are from the gan assignment.

After submitting the assignment to ReCodEx, you can experiment with the three available datasets (mnist, mnist-fashion, and mnist-cifarcars). However, note that you will need a lot of computational power (preferably a GPU) to generate the images; the example outputs below were also generated on a GPU, which means the results are nondeterministic.

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 dcgan.py --dataset=mnist --z_dim=2 --epochs=3
Epoch 1/3 discriminator_loss: 0.2638 - generator_loss: 3.3597 - loss: 0.9523 - discriminator_accuracy: 0.9061
Epoch 2/3 discriminator_loss: 0.0299 - generator_loss: 5.7561 - loss: 1.7968 - discriminator_accuracy: 0.9972
Epoch 3/3 discriminator_loss: 0.0197 - generator_loss: 5.9106 - loss: 1.8184 - discriminator_accuracy: 0.9981
  • python3 dcgan.py --dataset=mnist --z_dim=100 --epochs=3
Epoch 1/3 discriminator_loss: 0.2744 - generator_loss: 3.3752 - loss: 0.9341 - discriminator_accuracy: 0.8809
Epoch 2/3 discriminator_loss: 0.0297 - generator_loss: 5.6908 - loss: 1.7981 - discriminator_accuracy: 0.9954
Epoch 3/3 discriminator_loss: 0.0257 - generator_loss: 6.2856 - loss: 2.1166 - discriminator_accuracy: 0.9974
  • python3 dcgan.py --dataset=mnist-fashion --z_dim=2 --epochs=3
Epoch 1/3 discriminator_loss: 0.3830 - generator_loss: 2.5970 - loss: 0.8996 - discriminator_accuracy: 0.9198
Epoch 2/3 discriminator_loss: 0.2759 - generator_loss: 3.3412 - loss: 1.1519 - discriminator_accuracy: 0.9545
Epoch 3/3 discriminator_loss: 0.2125 - generator_loss: 3.9514 - loss: 1.3584 - discriminator_accuracy: 0.9681
  • python3 dcgan.py --dataset=mnist-fashion --z_dim=100 --epochs=3
Epoch 1/3 discriminator_loss: 0.4766 - generator_loss: 2.4001 - loss: 0.8588 - discriminator_accuracy: 0.8763
Epoch 2/3 discriminator_loss: 0.4254 - generator_loss: 2.8352 - loss: 1.0735 - discriminator_accuracy: 0.9250
Epoch 3/3 discriminator_loss: 0.3939 - generator_loss: 3.0114 - loss: 1.1252 - discriminator_accuracy: 0.9285
  • python3 dcgan.py --dataset=mnist-cifarcars --z_dim=2 --epochs=3
Epoch 1/3 discriminator_loss: 0.8294 - generator_loss: 1.4831 - loss: 0.7460 - discriminator_accuracy: 0.7689
Epoch 2/3 discriminator_loss: 0.4352 - generator_loss: 2.4002 - loss: 0.9303 - discriminator_accuracy: 0.9297
Epoch 3/3 discriminator_loss: 0.3052 - generator_loss: 3.0020 - loss: 1.0943 - discriminator_accuracy: 0.9627
  • python3 dcgan.py --dataset=mnist-cifarcars --z_dim=100 --epochs=3
Epoch 1/3 discriminator_loss: 1.1401 - generator_loss: 1.0359 - loss: 0.7335 - discriminator_accuracy: 0.6756
Epoch 2/3 discriminator_loss: 0.8321 - generator_loss: 1.5365 - loss: 0.7724 - discriminator_accuracy: 0.7945
Epoch 3/3 discriminator_loss: 0.5566 - generator_loss: 2.2292 - loss: 0.9219 - discriminator_accuracy: 0.8965

monte_carlo

 Deadline: Jun 30, 23:59  2 points

Solve the discretized CartPole-v1 environment from the OpenAI Gym using the Monte Carlo reinforcement learning algorithm. The gym environments have the following methods and properties:

  • observation_space: the description of environment observations
  • action_space: the description of environment actions
  • reset() → new_state: starts a new episode
  • step(action) → new_state, reward, done, info: perform the chosen action in the environment, returning the new state, obtained reward, a boolean flag indicating an end of episode, and additional environment-specific information
  • render(): render current environment state

We additionally extend the gym environment by:

  • episode: number of the current episode (zero-based)
  • reset(start_evaluation=False) → new_state: if start_evaluation is True, an evaluation is started

Once you finish training (which you indicate by passing start_evaluation=True to reset), your goal is to reach an average return of 475 during 100 evaluation episodes. Note that the environment prints your 100-episode average return every 10 episodes even during training.

You can start with the monte_carlo.py template, which parses several useful parameters, creates the environment and illustrates the overall usage.
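
The interaction loop described above can be sketched as follows. This is only an illustration and not the monte_carlo.py template: ToyBalanceEnv is a hypothetical two-state stand-in that mimics the reset/step interface, and the every-visit Monte Carlo update with an epsilon-greedy policy is one common variant of the algorithm. All hyperparameter values are arbitrary assumptions.

```python
import random


class ToyBalanceEnv:
    """Hypothetical two-state stand-in for the discretized CartPole wrapper."""
    states, actions, max_steps = 2, 2, 10

    def reset(self, start_evaluation=False):
        # The real wrapper would start evaluation here; the toy one only restarts.
        self._state, self._steps = 0, 0
        return self._state

    def step(self, action):
        # The action equal to the current state keeps the episode alive (reward 1),
        # any other action ends the episode immediately (reward 0).
        correct = action == self._state
        self._steps += 1
        self._state = 1 - self._state
        done = (not correct) or self._steps >= self.max_steps
        return self._state, float(correct), done, {}


def greedy_action(Q, state):
    return max(range(len(Q[state])), key=lambda action: Q[state][action])


def train_monte_carlo(env, episodes=500, epsilon=0.2, gamma=1.0, seed=42):
    rng = random.Random(seed)
    Q = [[0.0] * env.actions for _ in range(env.states)]  # action-value estimates
    C = [[0] * env.actions for _ in range(env.states)]    # visit counts
    for _ in range(episodes):
        # Generate one episode with an epsilon-greedy policy.
        state, done, trajectory = env.reset(), False, []
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(env.actions)
            else:
                action = greedy_action(Q, state)
            next_state, reward, done, _ = env.step(action)
            trajectory.append((state, action, reward))
            state = next_state
        # Every-visit update: move Q towards the observed return from each step.
        G = 0.0
        for state, action, reward in reversed(trajectory):
            G = gamma * G + reward
            C[state][action] += 1
            Q[state][action] += (G - Q[state][action]) / C[state][action]
    return Q


def evaluate(env, Q):
    # A single greedy evaluation episode, started via start_evaluation=True.
    state, done, total = env.reset(start_evaluation=True), False, 0.0
    while not done:
        state, reward, done, _ = env.step(greedy_action(Q, state))
        total += reward
    return total
```

Calling evaluate(ToyBalanceEnv(), train_monte_carlo(ToyBalanceEnv())) runs one greedy episode after training; in the actual assignment, the evaluation would instead consist of 100 episodes in the provided environment.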

During evaluation in ReCodEx, three different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 5 minutes.

reinforce

 Deadline: Jun 30, 23:59  2 points

Solve the continuous CartPole-v1 environment from the OpenAI Gym using the REINFORCE algorithm. The continuous environment is very similar to the discrete one, except that the states are vectors of real-valued observations with shape env.observation_space.shape.

Your goal is to reach an average return of 475 during 100 evaluation episodes. Start with the reinforce.py template.
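
As a rough sketch of the quantities REINFORCE works with (not the code of the reinforce.py template), the snippet below computes the return following every step of an episode and the corresponding loss, i.e., the negative log-probability of each taken action weighted by its return. The function names and the averaging over steps are assumptions made for illustration.

```python
import math


def discounted_returns(rewards, gamma=1.0):
    # Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t.
    returns, G = [], 0.0
    for reward in reversed(rewards):
        G = reward + gamma * G
        returns.append(G)
    return returns[::-1]


def reinforce_loss(action_probabilities, actions, returns):
    # Average of -G_t * log pi(a_t | s_t); minimizing it follows the REINFORCE gradient.
    terms = [-G * math.log(probabilities[action])
             for probabilities, action, G in zip(action_probabilities, actions, returns)]
    return sum(terms) / len(terms)
```

With a Keras model, a comparable effect is often obtained by training with a sparse categorical cross-entropy loss and passing the returns as sample weights.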

During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 5 minutes.

reinforce_baseline

 Deadline: Jun 30, 23:59  2 points

This is a continuation of the reinforce assignment.

Using the reinforce_baseline.py template, solve the CartPole-v1 environment using the REINFORCE with baseline algorithm.

Using a baseline lowers the variance of the value function gradient estimator, which allows faster training and decreases sensitivity to hyperparameter values. To reflect this effect in ReCodEx, note that the evaluation phase will automatically start after 200 episodes. Using only 200 episodes for training in this setting is probably too little for the REINFORCE algorithm, but suffices for the variant with a baseline.
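
The variance reduction can be illustrated on a tiny one-step problem, where the mean and variance of the gradient estimator can be enumerated exactly. The sketch below assumes a softmax policy over two actions and looks only at the derivative with respect to the logit of action 0; the probabilities, rewards, and the baseline value are made-up numbers.

```python
def estimator_mean_and_variance(probs, rewards, baseline=0.0):
    # For a softmax policy, d log pi(a) / d logit_0 equals 1 - probs[0] for a == 0
    # and -probs[0] otherwise; every sample is (reward - baseline) times that value.
    samples = [(reward - baseline) * ((1.0 if action == 0 else 0.0) - probs[0])
               for action, reward in enumerate(rewards)]
    mean = sum(p * s for p, s in zip(probs, samples))
    variance = sum(p * (s - mean) ** 2 for p, s in zip(probs, samples))
    return mean, variance


# Uniform policy, rewards 10 and 12; the baseline 11 is the expected reward.
without_baseline = estimator_mean_and_variance([0.5, 0.5], [10.0, 12.0])
with_baseline = estimator_mean_and_variance([0.5, 0.5], [10.0, 12.0], baseline=11.0)
```

Both estimators have the same expected value, so the baseline does not bias the gradient, but the variance with the baseline is much smaller in this example.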

Your goal is to reach an average return of 475 during 100 evaluation episodes.

During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 5 minutes.

reinforce_pixels

 Deadline: Jun 30, 23:59  2 points

This is a continuation of the reinforce or reinforce_baseline assignments.

The supplied cart_pole_pixels_environment.py generates a pixel representation of the CartPole environment as an 80×80 image with three channels, with each channel representing one time step (i.e., the current observation and the two previous ones).
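
To make the observation format concrete, the following sketch shows one way such a three-channel observation could be assembled from consecutive grayscale frames. It is not the code of cart_pole_pixels_environment.py: the FrameStack class, the channel order (oldest frame first), and the zero-filled initial frames are all assumptions.

```python
import collections

import numpy as np


class FrameStack:
    """Hypothetical helper keeping the last `frames` grayscale frames as channels."""

    def __init__(self, frames=3, height=80, width=80):
        # Assumption: before any frame is pushed, the history consists of zeros.
        self._frames = collections.deque(
            [np.zeros([height, width], np.float32) for _ in range(frames)], maxlen=frames)

    def push(self, frame):
        # Append the current frame (dropping the oldest one) and return an array
        # of shape [height, width, frames], with the current frame in the last channel.
        self._frames.append(np.asarray(frame, np.float32))
        return np.stack(list(self._frames), axis=-1)
```

Because consecutive frames are stored in separate channels, a convolutional network can infer the velocities of the cart and the pole from a single observation.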

To pass the assignment, you need to reach an average return of 400 in 100 evaluation episodes. During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on all of them. Time limit for each test is 10 minutes.

You should probably train the model locally and submit the already pretrained model to ReCodEx.

You can start with the reinforce_pixels.py template using the correct environment.

learning_to_learn

 Deadline: Jun 30, 23:59  4 points

Implement a simple variant of a learning-to-learn architecture. Utilizing the Omniglot dataset loadable using the omniglot_dataset.py module, the goal is to learn to classify a sequence of images using a custom hierarchy by employing external memory.

The input image sequences consist of args.classes randomly chosen Omniglot classes, each class being assigned a randomly chosen label. For every chosen class, args.images_per_class images are randomly selected. Apart from the images, the input contains the random labels one step after the corresponding images (with the first label being -1). The gold outputs are also the labels, but without the one-step offset.
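
The one-step label offset can be illustrated on a short sequence. The helper below is a hypothetical illustration, not part of the template; it only shows which label accompanies which image on the input.

```python
def shift_labels(labels):
    # The input label at step t is the gold label of step t-1; the first one is -1.
    return [-1] + list(labels[:-1])


gold_labels = [3, 1, 3, 0]
input_labels = shift_labels(gold_labels)  # [-1, 3, 1, 3]
```

The model therefore never sees the label of the current image on the input, only the labels of the images it has already processed.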

The input images should be passed through a CNN feature extraction module and then processed using a memory-augmented LSTM controller; the external memory contains enough memory cells, each with args.cell_size units. In each step, the controller emits:

  • args.read_heads read keys, each used to perform a read from memory as a weighted combination of cells according to the softmax of cosine similarities of the read key and the memory cells;
  • a write value, which is prepended to the memory (dropping the last cell).
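
The two memory operations above can be sketched in NumPy for a single example without a batch dimension. The function names and the small constant guarding against division by zero are assumptions; an actual solution would implement the same operations on batched TensorFlow tensors inside the controller.

```python
import numpy as np


def read_memory(memory, read_keys):
    # memory: [cells, cell_size], read_keys: [read_heads, cell_size]
    eps = 1e-8  # assumption: avoids division by zero for all-zero cells or keys
    memory_normalized = memory / (np.linalg.norm(memory, axis=-1, keepdims=True) + eps)
    keys_normalized = read_keys / (np.linalg.norm(read_keys, axis=-1, keepdims=True) + eps)
    similarities = keys_normalized @ memory_normalized.T  # cosine similarities, [read_heads, cells]
    # Softmax over the memory cells, computed in a numerically stable way.
    weights = np.exp(similarities - similarities.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ memory  # weighted combination of cells, [read_heads, cell_size]


def write_memory(memory, write_value):
    # Prepend the write value as the first cell and drop the last cell.
    return np.concatenate([write_value[np.newaxis], memory[:-1]], axis=0)
```

Since the write only prepends a new cell and drops the last one, the memory behaves like a fixed-size queue of the most recent write values.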

These tests are identical to the ones in ReCodEx, apart from a different random seed. Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 learning_to_learn.py --recodex --train_episodes=160 --test_episodes=160 --epochs=3 --classes=2
Epoch 1/3 loss: 0.8135 - acc: 0.5100 - acc1: 0.5254 - acc2: 0.5250 - acc5: 0.5102 - acc10: 0.5086 - val_loss: 0.6928 - val_acc: 0.5000 - val_acc1: 0.5000 - val_acc2: 0.5000 - val_acc5: 0.5000 - val_acc10: 0.5000
Epoch 2/3 loss: 0.7014 - acc: 0.4985 - acc1: 0.4974 - acc2: 0.4868 - acc5: 0.4918 - acc10: 0.5170 - val_loss: 0.6914 - val_acc: 0.5522 - val_acc1: 0.7750 - val_acc2: 0.6344 - val_acc5: 0.5125 - val_acc10: 0.4719
Epoch 3/3 loss: 0.6932 - acc: 0.5045 - acc1: 0.5233 - acc2: 0.4772 - acc5: 0.5386 - acc10: 0.5403 - val_loss: 0.6902 - val_acc: 0.5416 - val_acc1: 0.7500 - val_acc2: 0.6125 - val_acc5: 0.4844 - val_acc10: 0.4781
  • python3 learning_to_learn.py --recodex --train_episodes=160 --test_episodes=160 --epochs=3 --classes=5
Epoch 1/3 loss: 1.6601 - acc: 0.1993 - acc1: 0.2227 - acc2: 0.1895 - acc5: 0.1909 - acc10: 0.2063 - val_loss: 1.6094 - val_acc: 0.2077 - val_acc1: 0.2163 - val_acc2: 0.2313 - val_acc5: 0.2013 - val_acc10: 0.1900
Epoch 2/3 loss: 1.6168 - acc: 0.2089 - acc1: 0.2090 - acc2: 0.2406 - acc5: 0.2214 - acc10: 0.2048 - val_loss: 1.6079 - val_acc: 0.2027 - val_acc1: 0.2500 - val_acc2: 0.2125 - val_acc5: 0.1937 - val_acc10: 0.1900
Epoch 3/3 loss: 1.6129 - acc: 0.2111 - acc1: 0.2369 - acc2: 0.2266 - acc5: 0.1976 - acc10: 0.2131 - val_loss: 1.6066 - val_acc: 0.2184 - val_acc1: 0.3237 - val_acc2: 0.2237 - val_acc5: 0.2025 - val_acc10: 0.2000

Note that your results may be slightly different, depending on your CPU type and whether you use GPU.

  • python3 learning_to_learn.py --classes=2 --epochs=20
Epoch 1/20 loss: 0.6769 - acc: 0.5682 - acc1: 0.6769 - acc2: 0.5943 - acc5: 0.5546 - acc10: 0.5331 - val_loss: 0.4930 - val_acc: 0.7337 - val_acc1: 0.5415 - val_acc2: 0.6910 - val_acc5: 0.7525 - val_acc10: 0.8065
Epoch 2/20 loss: 0.3461 - acc: 0.8278 - acc1: 0.6054 - acc2: 0.7646 - acc5: 0.8629 - acc10: 0.8790 - val_loss: 0.2857 - val_acc: 0.8681 - val_acc1: 0.6345 - val_acc2: 0.8355 - val_acc5: 0.9050 - val_acc10: 0.9270
Epoch 3/20 loss: 0.2061 - acc: 0.9045 - acc1: 0.6381 - acc2: 0.8721 - acc5: 0.9407 - acc10: 0.9458 - val_loss: 0.2420 - val_acc: 0.8895 - val_acc1: 0.6160 - val_acc2: 0.8435 - val_acc5: 0.9295 - val_acc10: 0.9505
Epoch 4/20 loss: 0.1619 - acc: 0.9242 - acc1: 0.6459 - acc2: 0.9057 - acc5: 0.9607 - acc10: 0.9680 - val_loss: 0.1938 - val_acc: 0.9122 - val_acc1: 0.6420 - val_acc2: 0.8815 - val_acc5: 0.9585 - val_acc10: 0.9630
Epoch 5/20 loss: 0.1340 - acc: 0.9363 - acc1: 0.6693 - acc2: 0.9237 - acc5: 0.9692 - acc10: 0.9768 - val_loss: 0.2057 - val_acc: 0.9099 - val_acc1: 0.6735 - val_acc2: 0.8870 - val_acc5: 0.9405 - val_acc10: 0.9540
Epoch 10/20 loss: 0.0998 - acc: 0.9510 - acc1: 0.6949 - acc2: 0.9545 - acc5: 0.9833 - acc10: 0.9855 - val_loss: 0.1590 - val_acc: 0.9273 - val_acc1: 0.6585 - val_acc2: 0.9055 - val_acc5: 0.9690 - val_acc10: 0.9735
Epoch 20/20 loss: 0.0739 - acc: 0.9604 - acc1: 0.7074 - acc2: 0.9712 - acc5: 0.9913 - acc10: 0.9937 - val_loss: 0.1510 - val_acc: 0.9356 - val_acc1: 0.6815 - val_acc2: 0.9270 - val_acc5: 0.9665 - val_acc10: 0.9785
  • python3 learning_to_learn.py --classes=5 --epochs=20
Epoch 1/20 loss: 1.6013 - acc: 0.2300 - acc1: 0.3162 - acc2: 0.2454 - acc5: 0.2198 - acc10: 0.2094 - val_loss: 1.3712 - val_acc: 0.3809 - val_acc1: 0.3884 - val_acc2: 0.3504 - val_acc5: 0.3692 - val_acc10: 0.4240
Epoch 2/20 loss: 1.1060 - acc: 0.5052 - acc1: 0.3377 - acc2: 0.4164 - acc5: 0.5215 - acc10: 0.5802 - val_loss: 0.8220 - val_acc: 0.6575 - val_acc1: 0.2498 - val_acc2: 0.5318 - val_acc5: 0.7168 - val_acc10: 0.7626
Epoch 3/20 loss: 0.6655 - acc: 0.7209 - acc1: 0.2486 - acc2: 0.5665 - acc5: 0.7999 - acc10: 0.8255 - val_loss: 0.8701 - val_acc: 0.6682 - val_acc1: 0.2568 - val_acc2: 0.5396 - val_acc5: 0.7256 - val_acc10: 0.7730
Epoch 4/20 loss: 0.5154 - acc: 0.7879 - acc1: 0.2612 - acc2: 0.6505 - acc5: 0.8734 - acc10: 0.8924 - val_loss: 0.6253 - val_acc: 0.7506 - val_acc1: 0.2554 - val_acc2: 0.6304 - val_acc5: 0.8302 - val_acc10: 0.8462
Epoch 5/20 loss: 0.4474 - acc: 0.8171 - acc1: 0.2783 - acc2: 0.7003 - acc5: 0.9011 - acc10: 0.9188 - val_loss: 0.5924 - val_acc: 0.7648 - val_acc1: 0.2682 - val_acc2: 0.6552 - val_acc5: 0.8434 - val_acc10: 0.8568
Epoch 10/20 loss: 0.3356 - acc: 0.8611 - acc1: 0.3086 - acc2: 0.7996 - acc5: 0.9382 - acc10: 0.9466 - val_loss: 0.6684 - val_acc: 0.7719 - val_acc1: 0.3100 - val_acc2: 0.6982 - val_acc5: 0.8192 - val_acc10: 0.8752
Epoch 20/20 loss: 0.2499 - acc: 0.8953 - acc1: 0.3398 - acc2: 0.8851 - acc5: 0.9635 - acc10: 0.9741 - val_loss: 0.5017 - val_acc: 0.8230 - val_acc1: 0.3202 - val_acc2: 0.7908 - val_acc5: 0.8802 - val_acc10: 0.9178

In the competitions, your goal is to train a model and then predict target values on the given unannotated test set.

Submitting to ReCodEx

When submitting a competition solution to ReCodEx, you can include any number of files of any kind, and either submit them individually or compress them in a .zip file. However, there should be exactly one text file with the test set annotation (.txt) and at least one Python source (.py/ipynb) containing the model training and prediction. The Python sources are not executed, but must be included for inspection.
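
If you want to verify these conditions locally before submitting, a check along the following lines might help. The check_submission helper is hypothetical and not part of ReCodEx; it assumes you pass it the names of the individual files (for a .zip archive, the names of the files inside it).

```python
import os


def check_submission(filenames):
    # Exactly one .txt annotation file and at least one .py or .ipynb source.
    extensions = [os.path.splitext(name)[1].lower() for name in filenames]
    annotations = extensions.count(".txt")
    sources = sum(extension in (".py", ".ipynb") for extension in extensions)
    return annotations == 1 and sources >= 1
```

For example, check_submission(["predictions.txt", "model.py"]) returns True, while a submission with two .txt files or without any Python source returns False.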

Evaluation in ReCodEx

  • For every submission, ReCodEx checks the above conditions (exactly one .txt, at least one .py/ipynb) and whether the given annotations can be evaluated without error. If not, it will report a corresponding error in the logs.

  • Before the deadline, ReCodEx prints the exact achieved score, but only if it is worse than the baseline.

    If you surpass the baseline, the assignment is marked as solved in ReCodEx and you immediately get regular points for the assignment. However, ReCodEx does not print the reached score.

  • After the competition deadline, the latest submission of every user surpassing the required baseline participates in a competition. Additional bonus points are then awarded according to the ordering of the performance of the participating submissions.

  • After the competition results announcement, ReCodEx starts to show the exact performance for all the already submitted solutions and also for the solutions submitted later.

What Is Allowed

  • You can use the given annotated training data in any way.
  • You can use the given annotated development data for evaluation or hyperparameter tuning, but not for the training itself.
  • Additionally, you can use any unannotated or manually created data for training and evaluation.
  • The test set annotations must be the result of your system (so you cannot manually correct them; but your system can contain other parts than just trained models, like hand-written rules).
  • Do not use test set annotations in any way, if you somehow get access to them.
  • Unless stated otherwise, you can use any algorithm to solve the competition task at hand. The implementation should be either created by you or it can be based on some publicly available implementation, in which case you must reference it and you must understand it fully.
  • If you utilize an already trained model, it must be trained only on the allowed training data, unless stated otherwise.

Install

  • Installing to central user packages repository

    You can install all required packages to central user packages repository using pip3 install --user --upgrade pip setuptools followed by pip3 install --user tensorflow==2.4.1 tensorflow-addons==0.12.1 tensorflow-probability==0.12.1 tensorflow-hub==0.11.0 gym==0.18.0.

  • Installing to a virtual environment

    Python supports virtual environments, which are directories containing independent sets of installed packages. You can create a virtual environment by running python3 -m venv VENV_DIR and then install the required packages with VENV_DIR/bin/pip3 install --upgrade pip setuptools followed by VENV_DIR/bin/pip3 install tensorflow==2.4.1 tensorflow-addons==0.12.1 tensorflow-probability==0.12.1 tensorflow-hub==0.11.0 gym==0.18.0.

  • Installing to MetaCentrum

    As of Apr 2021, the minimum CUDA version across MetaCentrum is 10.2, and the highest officially available CUDA+cuDNN is also 10.2. Therefore, I have built TensorFlow 2.4.1 for CUDA 10.2 and cuDNN 7.6 to use on MetaCentrum.

    During installation, start by using official Python 3.6 and CUDA+cuDNN packages via module add python-3.6.2-gcc cuda/cuda-10.2.89-gcc-6.3.0-34gtciz cudnn/cudnn-7.6.5.32-10.2-linux-x64-gcc-6.3.0-xqx4s5f. Note that this command must be always executed before using the installed TensorFlow.

    Then create a virtual environment by python3 -m venv VENV_DIR and install the required packages with VENV_DIR/bin/pip3 install --upgrade pip setuptools followed by VENV_DIR/bin/pip3 install https://ufal.mff.cuni.cz/~straka/packages/tf/2.4/metacentrum/tensorflow-2.4.1-cp36-cp36m-linux_x86_64.whl https://ufal.mff.cuni.cz/~straka/packages/tf/2.4/metacentrum/tensorflow_addons-0.12.1-cp36-cp36m-linux_x86_64.whl tensorflow-probability==0.12.1 tensorflow-hub==0.11.0 gym==0.18.0.

  • Windows TensorFlow fails with ImportError: DLL load failed

    If your Windows TensorFlow fails with ImportError: DLL load failed, you are probably missing Visual C++ 2019 Redistributable.

  • Cannot start TensorBoard after installation

    If tensorboard cannot be found, make sure the directory with pip installed packages is in your PATH (that directory is either in your virtual environment if you use a virtual environment, or it should be ~/.local/bin on Linux and %UserProfile%\AppData\Roaming\Python\Python3[5-7] and %UserProfile%\AppData\Roaming\Python\Python3[5-7]\Scripts on Windows).

Git

  • Is it possible to keep the solutions in a Git repository

    Definitely, keeping the solutions in a branch of your repository, where you merge it with the course repository, is probably a good idea. However, please keep the cloned repository with your solutions private.

  • Do not create a public fork of the repository on Github

    On Github, please do not create a clone of the repository by using the Fork button – this way, the cloned repository would be public.

  • How to clone the course repository

    To clone the course repository, run

    git clone https://github.com/ufal/npfl114
    

    This creates the repository in npfl114 subdirectory; if you want a different name, add it as a last parameter.

    If you want to store the repository just in a local branch of your existing repository, you can run the following command while in it:

    git remote add upstream https://github.com/ufal/npfl114
    git fetch upstream
    git checkout -t upstream/master
    

    This creates a branch master; if you want a different name, add -b BRANCH_NAME to the last command.

    In both cases, you can update your checkout by running git pull while in it.

  • How to merge the course repository with your modifications

    If you want to store your solutions in a branch merged with the course repository, you should start by

    git remote add upstream https://github.com/ufal/npfl114
    git pull upstream master
    

    which creates a branch master; if you want a different name, change the last argument to master:BRANCH_NAME.

    You can then commit to this branch and push it to some central repository.

    To merge the current course repository with your branch, run

    git fetch upstream
    git merge upstream/master
    

    while in your branch. Of course, it might be necessary to resolve conflicts if both you and I modified the same place in the templates.

ReCodEx

  • What are the tests used by ReCodEx

    The tests used by ReCodEx correspond to the examples from the course website (unless stated otherwise), but they use a different random seed (so the results are not the same), and sometimes they use a smaller number of epochs/iterations to finish sooner.

Debugging

  • How to debug problems “inside” computation graphs with weird stack traces?

    At the beginning of your program, run

    tf.config.run_functions_eagerly(True)
    

    The tf.functions (with the exception of the ones used in tf.data pipelines) are then not traced (i.e., no computation graphs are created) and the pure Python code is executed instead.

  • How to debug problems “inside” tf.data pipelines with weird stack traces?

    Unfortunately, the solution above does not affect tracing in tf.data pipelines (for example in tf.data.Dataset.map). However, since TF 2.5, the command

    tf.data.experimental.enable_debug_mode()
    

    should disable any asynchrony, parallelism, or non-determinism and force Python execution (as opposed to trace-compiled graph execution) of user-defined functions passed into transformations such as tf.data.Dataset.map.

GPU

  • Requirements for using a GPU

    To use an NVIDIA GPU with TensorFlow 2.4, you need to install CUDA 11.0 and cuDNN 8.0 – see the details about GPU support.

  • Errors when running with a GPU

    If you encounter errors when running with a GPU:

    • if you are using the GPU also for displaying, try using the following environment variable: export TF_FORCE_GPU_ALLOW_GROWTH=true
    • you can rerun with the export TF_CPP_MIN_LOG_LEVEL=0 environment variable, which increases verbosity of the log messages.

tf.ragged

  • Bug when RaggedTensors are used in backward/bidirectional direction and whole sequence is returned

    In TF 2.4, RaggedTensors processed by backward (and therefore also by bidirectional) RNNs produce bad results when whole sequences are returned. (Producing only the last output or processing in forward direction is fine.) The problem has been fixed in the master branch and also in the TF 2.5 branch.

    A workaround is to use the manual to/from dense tensor conversion described in the next point.

  • Slow RNNs when using RaggedTensors on GPU

    Unfortunately, the current LSTM/GRU implementation does not use cuDNN acceleration when processing RaggedTensors. However, you can get around it by manually converting the RaggedTensors to dense before/after the layer, so when inputs is a tf.RaggedTensor,

    • if rnn is a tf.keras.layers.LSTM/GRU/RNN/Bidirectional layer producing a single output, you can use the following workaround:
      outputs = rnn(inputs.to_tensor(), mask=tf.sequence_mask(inputs.row_lengths()))
      
    • if rnn is a tf.keras.layers.LSTM/GRU/RNN/Bidirectional layer producing a whole sequence, in addition to the above line you also need to convert the dense result back to a RaggedTensor via for example:
      outputs = tf.RaggedTensor.from_tensor(outputs, inputs.row_lengths())
      

tf.data

  • How to look at what is in a tf.data.Dataset?

    The tf.data.Dataset is not just an array, but a description of a pipeline, which can produce data if requested. A simple way to run the pipeline is to iterate it using Python iterators:

    dataset = tf.data.Dataset.range(10)
    for entry in dataset:
        print(entry)
    
  • How to use tf.data.Dataset with model.fit or model.evaluate?

    To use a tf.data.Dataset in Keras, the dataset elements should be pairs (input_data, gold_labels), where input_data and gold_labels must be already batched. For example, given CAGS dataset, you can preprocess training data for cags_classification as (for development data, you would remove the .shuffle):

    train = cags.train.map(lambda example: (example["image"], example["label"]))
    train = train.shuffle(10000, seed=args.seed)
    train = train.batch(args.batch_size)
    
  • Is every iteration through a tf.data.Dataset the same?

    No. Because the dataset is only a pipeline generating data, it is called each time the dataset is iterated – therefore, every .shuffle is called in every iteration.

  • How to generate different random numbers each epoch during tf.data.Dataset.map?

    When a global random seed is set, methods like tf.random.uniform generate the same sequence of numbers on each iteration.

    Instead, create a Generator object and use it to produce random numbers.

    generator = tf.random.Generator.from_seed(42)
    data = tf.data.Dataset.from_tensor_slices(tf.zeros(10, tf.int32))
    data = data.map(lambda x: x + generator.uniform([], maxval=10, dtype=tf.int32))
    for _ in range(3):
        print(*[element.numpy() for element in data])
    
  • How to call numpy methods or other non-tf functions in tf.data.Dataset.map?

    You can use tf.numpy_function to call a numpy function even in a computational graph. However, the results have no static shape information and you need to set it manually – ideally using tf.ensure_shape, which both sets the static shape and verifies during execution that the real shape matches it.

    For example, to use the bboxes_training method from bboxes_utils, you could proceed as follows:

    anchors = np.array(...)
    
    def prepare_data(example):
        anchor_classes, anchor_bboxes = tf.numpy_function(
            bboxes_utils.bboxes_training, [anchors, example["classes"], example["bboxes"], 0.5], (tf.int32, tf.float32))
        anchor_classes = tf.ensure_shape(anchor_classes, [len(anchors)])
        anchor_bboxes = tf.ensure_shape(anchor_bboxes, [len(anchors), 4])
        ...
    
  • How to use ImageDataGenerator in tf.data.Dataset.map?

    The ImageDataGenerator offers a .random_transform method, so we can use tf.numpy_function from the previous answer:

    train_generator = tf.keras.preprocessing.image.ImageDataGenerator(...)
    
    def augment(image, label):
        return tf.ensure_shape(
            tf.numpy_function(train_generator.random_transform, [image], tf.float32),
            image.shape
        ), label
    dataset = dataset.map(augment)
    

Finetuning

  • How to make a part of the network frozen, so that its weights are not updated?

    Each tf.keras.layers.Layer/tf.keras.Model has a mutable trainable property indicating whether its variables should be updated – however, after changing it, you need to call .compile again (or otherwise make sure the list of trainable variables for the optimizer is updated).

    Note that once trainable == False, the insides of a layer are no longer considered, even if some of its sub-layers have trainable == True. Therefore, if you want to freeze only some sub-layers of a layer you use in your model, the layer itself must have trainable == True.

  • How to choose whether dropout/batch normalization is executed in training or inference regime?

    When calling a tf.keras.layers.Layer/tf.keras.Model, a named option training can be specified, indicating whether training or inference regime should be used. For a model, this option is automatically passed to its layers which require it, and Keras automatically passes it during model.{fit,evaluate,predict}.

    However, you can manually pass for example training=False to a layer when using Functional API, meaning that layer is executed in the inference regime even when the whole model is training.

  • How do trainable and training interact?

    The only layer, which is influenced by both these options, is batch normalization, for which:

    • if trainable == False, the layer is always executed in inference regime;
    • if trainable == True, the training/inference regime is chosen according to the training option.

TensorBoard

  • How to create TensorBoard logs manually?

    Start by creating a SummaryWriter using for example:

    writer = tf.summary.create_file_writer(args.logdir, flush_millis=10 * 1000)
    

    and then you can generate logs inside a with writer.as_default() block.

    You can either specify step manually in each call, or you can set it as the first argument of as_default(). Also, during training you usually want to log only some batches, so the logging block during training usually looks like:

    if optimizer.iterations % 100 == 0:
        with writer.as_default(step=optimizer.iterations):
            ...  # logging calls, e.g., tf.summary.scalar
    
  • What can be logged in TensorBoard?

    • scalar values:
      tf.summary.scalar(name like "train/loss", value, [step])
      
    • tensor values displayed as histograms or distributions:
      tf.summary.histogram(name like "train/output_layer", tensor value castable to `tf.float64`, [step])
      
    • images as tensors with shape [num_images, h, w, channels], where channels can be 1 (grayscale), 2 (grayscale + alpha), 3 (RGB), 4 (RGBA):
      tf.summary.image(name like "train/samples", images, [step], [max_outputs=at most this many images])
      
    • possibly large amount of text (e.g., all hyperparameter values, sample translations in MT, …) in Markdown format:
      tf.summary.text(name like "hyperparameters", markdown, [step])
      
    • audio as tensors with shape [num_clips, samples, channels] and values in [-1, 1] range:
      tf.summary.audio(name like "train/samples", clips, sample_rate, [step], [max_outputs=at most this many clips])
      

Requirements

To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that all surplus points (both bonus and non-bonus) will be transferred to the exam. In total, assignments for at least 120 points (not including the bonus points) will be available, and if you solve all the assignments, you obtain an additional 50 bonus points.

To pass the exam, you need to obtain at least 60, 75, and 90 points out of the 100-point exam to obtain grades 3, 2, and 1, respectively. (PhD students with binary grades require 75 points.) The exam consists of 100-point-worth questions from the list below (the questions are randomly generated, but in such a way that there is at least one question from every lecture). In addition, you can get surplus points from the practicals and at most 10 points for community work (i.e., fixing slides or reporting issues) – but only the points you already have at the time of the exam count.
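
The grade thresholds can be summarized by a small function. This is only an illustration of the rule above; the exam_grade name is hypothetical, and total_points is assumed to already include any surplus points from the practicals and from community work.

```python
def exam_grade(total_points):
    # Thresholds 90, 75 and 60 correspond to grades 1, 2 and 3, respectively.
    for threshold, grade in [(90, 1), (75, 2), (60, 3)]:
        if total_points >= threshold:
            return grade
    return None  # fewer than 60 points: the exam is not passed
```

For instance, exam_grade(82) returns 2, and exam_grade(59) returns None.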

Exam Questions

Lecture 1 Questions

  • Considering a neural network with $D$ input neurons, a single hidden layer with $H$ neurons, $K$ output neurons, hidden activation $f$ and output activation $a$, list its parameters (including their size) and write down how the output is computed. [5]

  • List the definitions of frequently used MLP output layer activations (the ones producing parameters of a Bernoulli distribution and a categorical distribution). Then write down three commonly used hidden layer activations (sigmoid, tanh, ReLU). [5]

  • Formulate the Universal approximation theorem. [5]

Lecture 2 Questions

  • Describe maximum likelihood estimation, as minimizing NLL, cross-entropy and KL divergence. [10]

  • Define mean squared error and show how it can be derived using MLE. [5]

  • Describe gradient descent and compare it to stochastic (i.e., online) gradient descent and minibatch stochastic gradient descent. [5]

  • Formulate conditions on the sequence of learning rates used in SGD to converge to optimum almost surely. [5]

  • Write down the backpropagation algorithm. [5]

  • Write down the mini-batch SGD algorithm with momentum. Then, formulate SGD with Nesterov momentum and show the difference between them. [5]

  • Write down the AdaGrad algorithm and show that it tends to internally decay learning rate by a factor of $1/\sqrt{t}$ in step $t$. Then write down the RMSProp algorithm and explain how it solves the problem with the involuntary learning rate decay. [10]

  • Write down the Adam algorithm. Then show why the bias-correction terms $(1-\beta^t)$ make the estimation of the first and second moment unbiased. [10]

Lecture 3 Questions

  • Considering a neural network with DD input neurons, a single ReLU hidden layer with HH units and softmax output layer with KK units, write down the formulas of the gradient of all the MLP parameters (two weight matrices and two bias vectors), assuming input x\boldsymbol x, target tt and negative log likelihood loss. [10]

  • Assume a network with MSE loss generated a single output oRo \in \mathbb{R}, and the target output is gg. What is the value of the loss function itself, and what is the gradient of the loss function with respect to oo? [5]

  • Assume a network with cross-entropy loss generated a single output zRz \in \mathbb{R}, which is passed through the sigmoid output activation function, producing o=σ(z)o = \sigma(z). If the target output is gg, what is the value of the loss function itself, and what is the gradient of the loss function with respect to zz? [5]

  • Assume a network with cross-entropy loss generated a $K$-element output $\boldsymbol z \in \mathbb{R}^K$, which is passed through the softmax output activation function, producing $\boldsymbol o=\operatorname{softmax}(\boldsymbol z)$. If the target distribution is $\boldsymbol g$, what is the value of the loss function itself, and what is the gradient of the loss function with respect to $\boldsymbol z$? [5]

  • Define $L_2$ regularization and describe its effect both on the value of the loss function and on the value of the loss function gradient. [5]

  • Describe the dropout method and write down exactly how it is used during training and during inference. [5]

  • Describe how label smoothing works for cross-entropy loss, both for sigmoid and softmax activations. [5]

  • How are weights and biases initialized using the default Glorot initialization? [5]
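
As a hedged companion to the softmax cross-entropy question, the following sketch checks numerically that the gradient of the loss with respect to the logits equals $\boldsymbol o - \boldsymbol g$ whenever the target distribution $\boldsymbol g$ sums to one. All function names are hypothetical and the code uses plain Python rather than any framework from the practicals.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of logits."""
    shift = max(z)
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, g):
    """Cross-entropy between target distribution g and softmax(z)."""
    o = softmax(z)
    return -sum(g_i * math.log(o_i) for g_i, o_i in zip(g, o))


def analytic_gradient(z, g):
    """Closed-form gradient with respect to z: softmax(z) - g."""
    return [o_i - g_i for o_i, g_i in zip(softmax(z), g)]


def numeric_gradient(z, g, eps=1e-6):
    """Central finite-difference estimate of the same gradient."""
    gradient = []
    for i in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        difference = cross_entropy(z_plus, g) - cross_entropy(z_minus, g)
        gradient.append(difference / (2 * eps))
    return gradient
```

The same check works for a one-hot target such as `[0.0, 1.0, 0.0]` and for a label-smoothed target such as `[0.05, 0.9, 0.05]`, since both sum to one.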

Lecture 4 Questions

  • Write down the equation of how convolution of a given image is computed. Assume the input is an image $I$ of size $H \times W$ with $C$ channels, the kernel $K$ has size $N \times M$, the stride is $T \times S$, the operation performed is in fact cross-correlation (as usual in convolutional neural networks) and that $O$ output channels are computed. [5]

  • Explain both SAME and VALID padding schemes and write down the output size of a convolutional operation with an $N \times M$ kernel on an image of size $H \times W$ for both these padding schemes. [5]

  • Describe batch normalization and write down the algorithm used during training and the algorithm used during inference. Be sure to state explicitly what is being normalized over in the case of fully connected layers, and in the case of convolutional layers. [10]

  • Describe the overall architecture of VGG-19 (you do not need to remember the exact number of layers/filters, but you should describe which layers are used). [5]
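
As a hedged aid for the padding question, the sketch below computes the output length along one spatial dimension. The helper `conv_output_size` is hypothetical, and the `stride` parameter goes beyond the question, which states no stride, so it defaults to 1. The formulas follow the convention commonly used by TensorFlow, which is an assumption about the intended definition.

```python
def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Output length along one dimension of a convolution.

    size:    input length (H or W)
    kernel:  kernel length along the same dimension (N or M)
    stride:  step between kernel applications (assumed 1 if not given)
    padding: "valid" (no padding) or "same" (pad so that stride 1 keeps the size)
    """
    if padding == "same":
        numerator = size                  # ceil(size / stride)
    elif padding == "valid":
        numerator = size - kernel + 1     # ceil((size - kernel + 1) / stride)
    else:
        raise ValueError("padding must be 'same' or 'valid'")
    return -(-numerator // stride)        # integer ceiling division
```

For a stride of 1, this reduces to $(H - N + 1) \times (W - M + 1)$ for VALID and $H \times W$ for SAME; for instance, a 28-pixel dimension with a kernel of length 3 gives 26 with VALID and 28 with SAME.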

Lecture 5 Questions

  • Describe the overall architecture of ResNet. You do not need to remember the exact number of layers/filters, but you should draw a bottleneck block (including the applications of BatchNorms and ReLUs) and state how residual connections work when the number of channels increases. [10]

  • Draw the original ResNet block and also the improved variant with full pre-activation. [5]

  • Compare the bottleneck block of ResNet and ResNeXt architectures (draw the latter using convolutions only, i.e., do not use grouped convolutions). [5]

  • Describe the CNN regularization method of networks with stochastic depth. [5]

  • Compare Cutout and BlockDrop. [5]

  • Describe Squeeze and Excitation applied to a ResNet block. [5]

  • Draw the Mobile inverted bottleneck block (including explanation of separable convolutions, the expansion factor, exact positions of BatchNorms and ReLUs, but without describing Squeeze and Excitation blocks). [5]

  • Assume an input image $I$ of size $H \times W$ with $C$ channels, and a convolutional kernel $K$ with size $N \times M$, stride $S$ and $O$ output channels. Then write down (or derive) the equation of transposed convolution (or equivalently backpropagation through a convolution to its inputs). [5]

Lecture 7 Questions

  • Write down how $\mathit{AP}_{50}$ is computed. [5]

  • Considering a Fast-RCNN architecture, draw the overall network architecture, explain what a RoI-pooling layer is, show how the network parametrizes bounding boxes and write down the loss. Finally, describe non-maximum suppression and how the Fast-RCNN prediction is performed. [10]

  • Considering a Faster-RCNN architecture, describe the region proposal network (its architecture, what are anchors, what does the loss look like). [5]

  • Considering a Mask-RCNN architecture, describe the additions to a Faster-RCNN architecture (the RoI-Align layer, the new mask-producing head). [5]

  • Write down the focal loss with class weighting, including the commonly used hyperparameter values. [5]

  • Draw the overall RetinaNet architecture (the FPN architecture including the block combining feature maps of different resolutions; the classification and bounding box generation heads, including their output size). [5]

  • Draw the BiFPN block architecture, including the positions of all convolutions, BatchNorms and ReLUs. [5]

Lecture 8 Questions

  • Write down how the Long Short-Term Memory (LSTM) cell operates, including the explicit formulas. Also mention the forget gate bias. [10]

  • Write down how the Gated Recurrent Unit (GRU) operates, including the explicit formulas. [10]

  • Describe Highway network computation. [5]

  • Why can the usual dropout not be used on the recurrent state? Describe how the problem can be alleviated with variational dropout. [5]

  • Describe layer normalization and write down the algorithm used during training and the algorithm used during inference. [5]

  • Sketch a tagger architecture utilizing word embeddings, recurrent character-level word embeddings and two sentence-level RNNs with a residual connection. [10]

Lecture 9 Questions

  • Considering a linear-chain CRF, write down how a score of a label sequence $\boldsymbol y$ is defined, and how a log probability can be computed using the label sequence scores. [5]

  • Write down the dynamic programming algorithm for computing log probability of a linear-chain CRF, including its asymptotic complexity. [10]

  • Write down the dynamic programming algorithm for linear-chain CRF decoding, i.e., an algorithm computing the most probable label sequence $\boldsymbol y$. [10]

  • In the context of CTC loss, describe regular and extended labelings and write down an algorithm for computing the log probability of a gold label sequence $\boldsymbol y$. [10]

  • Describe how CTC predictions are performed using beam search. [5]

  • Draw the CBOW architecture from word2vec, including the sizes of the inputs and the sizes of the outputs and the non-linearities used. Also make sure to indicate where the embeddings are being trained. [5]

  • Draw the SkipGram architecture from word2vec, including the sizes of the inputs and the sizes of the outputs and the non-linearities used. Also make sure to indicate where the embeddings are being trained. [5]

  • Describe the hierarchical softmax used in word2vec. [5]

  • Describe the negative sampling proposed in word2vec. [5]

Lecture 10 Questions

  • Draw a sequence-to-sequence architecture for machine translation, both during training and during inference (without attention). [5]

  • Draw a sequence-to-sequence architecture for machine translation used during training, including the attention. Then write down how exactly the attention is computed. [10]

  • Explain how tying of word embeddings can be used in a sequence-to-sequence architecture. [5]

  • Write down why subword units are used in text processing, and describe the BPE algorithm for constructing a subword dictionary from a large corpus. [5]

  • Write down why subword units are used in text processing, and describe the WordPieces algorithm for constructing a subword dictionary from a large corpus. [5]

  • Pinpoint the differences between the BPE and WordPieces algorithms, both during dictionary construction and during inference. [5]

Lecture 11 Questions

  • Describe the Transformer encoder architecture, including the description of self-attention (but you do not need to describe multi-head attention). [10]

  • Write down the formula of Transformer self-attention, and then describe multi-head self-attention in detail. [10]

  • Describe the Transformer decoder architecture, including the description of self-attention and masked self-attention (but you do not need to describe multi-head attention). [10]

  • Why are positional embeddings needed in the Transformer architecture? Write down the sinusoidal positional embeddings used in the Transformer. [5]

  • Compare RNN to Transformer – what are the strengths and weaknesses of these architectures? [5]

  • Explain how ELMo embeddings are trained and how they are used in downstream applications. [5]

  • Describe the BERT architecture (you do not need to describe the (multi-head) self-attention operation). Elaborate also on what positional embeddings are used and what the GELU activations are. [10]

  • Describe the GELU activations and explain why they are a combination of ReLUs and Dropout. [5]

  • Elaborate on the BERT training process (what the two objectives used are and how exactly the corresponding losses are computed). [10]

  • What alternatives to Next Sentence Prediction are proposed in RoBERTa and in ALBERT? [5]

Lecture 12 Questions

  • Write down the variational lower bound (ELBO) in the form of a reconstruction error minus the KL divergence between the encoder and the prior. Then prove it is actually a lower bound on probability $\log P(\boldsymbol x)$ (you can use Jensen's inequality if you want). [10]

  • Draw an architecture of a variational autoencoder (VAE). Pay attention to the parametrization of the distribution from the encoder (including the used activation functions), and show how to perform latent variable sampling so that it is differentiable with respect to the encoder parameters (the reparametrization trick). [10]

  • Write down the min-max formulation of generative adversarial network (GAN) objective. Then describe what loss is actually used for training the generator in order to avoid vanishing gradients at the beginning of the training. [5]

  • Write down the training algorithm of generative adversarial networks (GAN), including the losses minimized by the discriminator and the generator. Be sure to use the version of generator loss which avoids vanishing gradients at the beginning of the training. [10]

  • Explain how the class label is used when training a conditional generative adversarial network (CGAN). [5]

  • Illustrate that alternating SGD steps are not guaranteed to converge for a min-max problem. [5]

Lecture 13 Questions

  • Show how to incrementally update a running average (how to compute an average of $N$ numbers using the average of the first $N-1$ numbers). [5]

  • Describe multi-arm bandits and write down the $\epsilon$-greedy algorithm for solving them. [5]

  • Define a Markov Decision Process, including the definition of a return. [5]

  • Define a value function, such that all expectations are over simple random variables (actions, states, rewards), not trajectories. [5]

  • Define an action-value function, such that all expectations are over simple random variables (actions, states, rewards), not trajectories. [5]

  • Express a value function using an action-value function, and express an action-value function using a value function. [5]

  • Define optimal value function and optimal action-value function. Then define optimal policy in such a way that its existence is guaranteed. [5]

  • Write down the Monte-Carlo on-policy every-visit $\epsilon$-soft algorithm. [10]

  • Formulate the policy gradient theorem. [5]

  • Prove the part of the policy gradient theorem showing the value of $\nabla_{\boldsymbol\theta} v_\pi(s)$. [10]

  • Assuming the policy gradient theorem, formulate the loss used by the REINFORCE algorithm and show how its gradient can be expressed as an expectation over states and actions. [5]

  • Write down the REINFORCE algorithm. [10]

  • Show that introducing a baseline does not influence the validity of the policy gradient theorem. [5]

  • Write down the REINFORCE with baseline algorithm. [10]
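
As a hedged illustration for the running-average question, the sketch below applies the update $a_N = a_{N-1} + \frac{1}{N}(x_N - a_{N-1})$, which follows from $N a_N = (N-1) a_{N-1} + x_N$. The function name `running_averages` is hypothetical.

```python
def running_averages(values):
    """Return the list of averages of the first 1, 2, ..., N values.

    Each average is computed from the previous one only, without
    storing or re-summing the earlier values.
    """
    average = 0.0
    averages = []
    for n, x in enumerate(values, start=1):
        average += (x - average) / n   # a_n = a_{n-1} + (x_n - a_{n-1}) / n
        averages.append(average)
    return averages
```

The same update form, with $1/N$ replaced by a constant step size, gives an exponentially weighted average that tracks non-stationary values.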

Lecture 14 Questions

  • Sketch the overall structure and training procedure of the Neural Architecture Search. You do not need to describe how exactly the block is produced by the controller. [5]

  • Draw the WaveNet architecture (show the overall architecture, explain dilated convolutions, write down the gated activations, describe global and local conditioning). [10]

  • Define the Mixture of Logistic distribution used in the Teacher model of Parallel WaveNet, including the explicit formula of computing the likelihood of the data. [5]

  • Describe the changes in the Student model of Parallel WaveNet, which allow efficient sampling (what the latent prior looks like, how the output data distribution is modeled in a single iteration and then after multiple iterations). [5]

  • Describe the addressing mechanism used in Neural Turing Machines – show the overall structure including the required parameters, and explain content addressing, interpolation with location addressing, shifting and sharpening. [10]

  • Explain the overall architecture of a Neural Turing Machine with an LSTM controller, assuming $R$ reading heads and one write head. Describe the inputs and outputs of the LSTM controller itself, then how the memory is read from and written to, and how the final output is computed. You do not need to write down the implementation of the addressing mechanism (you can assume it is a function which gets parameters, memory and previous distribution, and computes a new distribution over memory cells). [10]