In recent years, deep neural networks have been used to solve complex machine-learning problems. They have achieved significant state-of-the-art results in many areas.
The goal of the course is to introduce deep neural networks, from the basics to the latest advances. The course focuses on both theory and practical aspects (students will implement and train several deep neural networks capable of achieving state-of-the-art results, for example in image classification, object detection, lemmatization, speech recognition or 3D object recognition). No previous knowledge of artificial neural networks is required, but a basic understanding of machine learning is advisable.
SIS code: NPFL114
Semester: summer
E-credits: 7
Examination: 3/2 C+Ex
Guarantor: Milan Straka
All lectures and practicals will be recorded and available on this website.
1. Introduction to Deep Learning Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions numpy_entropy pca_first mnist_layers_activations
2. Training Neural Networks Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions sgd_backpropagation sgd_manual mnist_training gym_cartpole
3. Training Neural Networks II Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions mnist_regularization mnist_ensemble uppercase
4. Convolutional Neural Networks Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions mnist_cnn tf_dataset mnist_multiple cifar_competition
5. Convolutional Neural Networks II Slides PDF Slides CZ Lecture CZ Transposed Convolution CZ Transfer Learning CZ Practicals EN Lecture EN Transposed Convolution EN Practicals Questions cnn_manual cags_classification cags_segmentation
6. Object Detection Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions bboxes_utils svhn_competition
7. Recurrent Neural Networks Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions sequence_classification tagger_we tagger_cle tagger_competition
8. CRF, CTC, Word2Vec Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions tensorboard_projector tagger_crf tagger_crf_manual speech_recognition
9. Easter Monday EN Practicals 3d_recognition homr_competition
10. Seq2seq, NMT, Transformer Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions lemmatizer_noattn lemmatizer_attn lemmatizer_competition
11. Transformer, BERT Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions tagger_transformer sentiment_analysis reading_comprehension
12. Deep Reinforcement Learning, VAE Slides PDF Slides CZ Lecture EN Lecture EN Practicals Questions reinforce reinforce_baseline reinforce_pixels vae crac2023
13. Generative Adversarial Networks, Diffusion Models Slides PDF Slides CZ Lecture EN Lecture EN Practicals EN Large Transformers Questions gan dcgan
14. Speech Synthesis, External Memory Networks Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions learning_to_learn ddim ddim_attention ddim_conditional
Unless otherwise stated, teaching materials for this course are available under CC BY-SA 4.0.
This section lists the lecture content, including references to study materials. The main study material is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (referred to as DLB).
References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.
Feb 13: Introduction to Deep Learning
Feb 20: Training Neural Networks
Feb 27: Training Neural Networks II
Mar 06: Convolutional Neural Networks
Mar 13: Convolutional Neural Networks II
Mar 20: Object Detection
Mar 27: Recurrent Neural Networks
Apr 03: CRF, CTC, Word2Vec
Word2vec
word embeddings, notably the CBOW and Skip-gram architectures [Efficient Estimation of Word Representations in Vector Space]
Apr 10: Easter Monday
Apr 17: Seq2seq, NMT, Transformer
Apr 24: Transformer, BERT
May 02: Deep Reinforcement Learning, VAE
The study material for Reinforcement Learning is Reinforcement Learning: An Introduction, second edition, by Richard S. Sutton and Andrew G. Barto (referred to as RLB), available online.
May 09: Generative Adversarial Networks, Diffusion Models
May 15: Speech Synthesis, External Memory Networks
To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that all surplus points (both bonus and non-bonus) will be transferred to the exam. In total, assignments for at least 120 points (not including the bonus points) will be available, and if you solve all the assignments (any non-zero amount of points counts as solved), you automatically pass the exam with grade 1.
The tasks are evaluated automatically using the ReCodEx Code Examiner.
The evaluation is performed using Python 3.9, TensorFlow 2.11.0, TensorFlow Addons 0.19.0, TensorFlow Probability 0.19.0, TensorFlow Hub 0.12.0, HF Transformers 4.26.0, and Gymnasium 0.27.1. You should install the exact versions of these packages yourselves.
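To confirm that a local environment matches the versions listed above, a small check along the following lines may help. It is only a sketch: the PyPI distribution names in `EXPECTED` are assumptions (the text gives product names, not package names), and some installations report a version with a local suffix that would show up as a mismatch.

```python
from importlib import metadata

# Versions quoted in the text; the distribution names are assumed, not confirmed.
EXPECTED = {
    "tensorflow": "2.11.0",
    "tensorflow-addons": "0.19.0",
    "tensorflow-probability": "0.19.0",
    "tensorflow-hub": "0.12.0",
    "transformers": "4.26.0",
    "gymnasium": "0.27.1",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # The package is not installed at all.
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


if __name__ == "__main__":
    for name, (installed, wanted) in check_versions().items():
        print(f"{name}: installed {installed}, expected {wanted}")
```

An empty result means every listed package is installed in exactly the expected version.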
Solving assignments in teams (of size at most 3) is encouraged, but everyone has to participate (it is forbidden not to work on an assignment and then submit a solution created by other team members). All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. To allow plagiarism detection, each such solution must explicitly list all members of the team using this template.
Cheating is strictly prohibited and any student found cheating will be punished. The punishment can involve failing the whole course, or, in grave cases, being expelled from the faculty. While discussing assignments with any classmate is fine, each team must complete the assignments themselves, without using code they did not write (unless explicitly allowed). Of course, inside a team you are expected to share code and submit identical solutions.
Deadline: Feb 27, 7:59 a.m. 3 points
The goal of this exercise is to familiarize yourself with Python, NumPy and the ReCodEx submission system. Start with the numpy_entropy.py template.
Load a file specified in args.data_path, whose lines consist of data points of our dataset, and load a file specified in args.model_path, which describes a model probability distribution, with each line being a tab-separated pair of (data point, probability).
Then compute the following quantities using NumPy, and print each of them on a separate line, rounded to two decimal places (or inf for positive infinity, which happens when an element of the data distribution has zero probability under the model distribution):
- the entropy of the data distribution;
- the cross-entropy between the data distribution and the model distribution;
- the KL divergence between the data distribution and the model distribution.
Use natural logarithms to compute the entropies and the divergence.
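The three quantities can be sketched in NumPy as follows. This is an illustration under assumptions, not the template's code: the function name `entropy_metrics` and the in-memory inputs (a list of data points and a dict for the model) are hypothetical, and file loading is left out.

```python
import numpy as np


def entropy_metrics(data, model):
    """data: list of data points; model: dict mapping data point -> probability."""
    # Empirical data distribution p, estimated from the counts of each point.
    points, counts = np.unique(np.array(data), return_counts=True)
    p = counts / counts.sum()
    # Model probabilities q aligned with the data points; unseen points get 0.
    q = np.array([model.get(point, 0.0) for point in points])

    entropy = -np.sum(p * np.log(p))
    # np.log(0) is -inf, so a zero model probability makes the result inf.
    with np.errstate(divide="ignore"):
        crossentropy = -np.sum(p * np.log(q))
    # KL divergence is the cross-entropy minus the entropy.
    kl_divergence = crossentropy - entropy
    return entropy, crossentropy, kl_divergence


if __name__ == "__main__":
    h, ce, kl = entropy_metrics(["a", "a", "b", "b"], {"a": 0.25, "b": 0.75})
    print(f"Entropy: {h:.2f} nats")
    print(f"Crossentropy: {ce:.2f} nats")
    print(f"KL divergence: {kl:.2f} nats")
```

Only points that occur in the data enter the sums, so every p is positive and the entropy never involves log(0).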
python3 numpy_entropy.py --data_path numpy_entropy_data_1.txt --model_path numpy_entropy_model_1.txt
Entropy: 0.96 nats
Crossentropy: 1.07 nats
KL divergence: 0.11 nats
python3 numpy_entropy.py --data_path numpy_entropy_data_2.txt --model_path numpy_entropy_model_2.txt
Entropy: 0.96 nats
Crossentropy: inf nats
KL divergence: inf nats
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 numpy_entropy.py --data_path numpy_entropy_data_3.txt --model_path numpy_entropy_model_3.txt
Entropy: 4.15 nats
Crossentropy: 4.23 nats
KL divergence: 0.08 nats
python3 numpy_entropy.py --data_path numpy_entropy_data_4.txt --model_path numpy_entropy_model_4.txt
Entropy: 4.99 nats
Crossentropy: 5.03 nats
KL divergence: 0.04 nats
Deadline: Feb 27, 7:59 a.m. 2 points
The goal of this exercise is to familiarize yourself with TensorFlow tf.Tensors, shapes and basic tensor manipulation methods. Start with pca_first.py (you will also need the mnist.py module).
In this assignment, you will compute the covariance matrix of several examples from the MNIST dataset, compute the first principal component, and quantify the variance it explains.
It is fine if you are not familiar with terms like covariance matrix or principal component – the template contains a detailed description of what you have to do.
Finally, it is a good idea to read the TensorFlow guide about tf.Tensors.
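For readers unfamiliar with the terms, the computation can be sketched in NumPy as below. Several things here are assumptions rather than the template's prescription: the use of power iteration (suggested only by the --iterations option), division by the number of examples rather than by one less, the all-ones starting vector, and the function name `first_principal_component`. The assignment itself expects TensorFlow operations.

```python
import numpy as np


def first_principal_component(data, iterations):
    """data: array of shape [examples, features]; iterations must be >= 1."""
    # Center the data and build the [features, features] covariance matrix.
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / data.shape[0]
    # The total variance is the sum of the diagonal of the covariance matrix.
    total_variance = np.trace(cov)

    # Power iteration: multiply by the covariance matrix and renormalize.
    v = np.ones(cov.shape[0]) / np.sqrt(cov.shape[0])
    for _ in range(iterations):
        v = cov @ v
        s = np.linalg.norm(v)  # Approaches the largest eigenvalue.
        v = v / s

    # Fraction of the total variance explained by the first component.
    explained_variance = s / total_variance
    return total_variance, explained_variance


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    data = rng.normal(size=(1024, 8)) * np.arange(1, 9)
    total, explained = first_principal_component(data, 64)
    print(f"Total variance: {total:.2f}")
    print(f"Explained variance: {100 * explained:.2f}%")
```

Because v has unit length before each multiplication, the norm of the product converges to the largest eigenvalue of the covariance matrix, which is the variance along the first principal component.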
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 pca_first.py --examples=1024 --iterations=64
Total variance: 53.12
Explained variance: 9.64%
python3 pca_first.py --examples=8192 --iterations=128
Total variance: 53.05
Explained variance: 9.89%
python3 pca_first.py --examples=55000 --iterations=1024
Total variance: 52.74
Explained variance: 9.71%
Deadline: Feb 27, 7:59 a.m. 2 points
Before solving the assignment, start by playing with example_keras_tensorboard.py in order to familiarize yourself with TensorFlow and TensorBoard. Run it, and when it finishes, run TensorBoard using tensorboard --logdir logs.
Then open http://localhost:6006 in a browser and explore the active tabs.
Your goal is to modify the mnist_layers_activations.py template and implement the following:
- the number of hidden layers, specified by hidden_layers;
- the activation function of the hidden layers, specified by activation, with supported values of none, relu, tanh and sigmoid.
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 mnist_layers_activations.py --epochs=1 --hidden_layers=0 --activation=none
loss: 0.5390 - accuracy: 0.8607 - val_loss: 0.2745 - val_accuracy: 0.9288
python3 mnist_layers_activations.py --epochs=1 --hidden_layers=1 --activation=none
loss: 0.3809 - accuracy: 0.8915 - val_loss: 0.2403 - val_accuracy: 0.9344
python3 mnist_layers_activations.py --epochs=1 --hidden_layers=1 --activation=relu
loss: 0.3093 - accuracy: 0.9130 - val_loss: 0.1374 - val_accuracy: 0.9624
python3 mnist_layers_activations.py --epochs=1 --hidden_layers=1 --activation=tanh
loss: 0.3304 - accuracy: 0.9067 - val_loss: 0.1601 - val_accuracy: 0.9580
python3 mnist_layers_activations.py --epochs=1 --hidden_layers=1 --activation=sigmoid
loss: 0.4905 - accuracy: 0.8771 - val_loss: 0.2123 - val_accuracy: 0.9452
python3 mnist_layers_activations.py --epochs=1 --hidden_layers=3 --activation=relu
loss: 0.2727 - accuracy: 0.9185 - val_loss: 0.1180 - val_accuracy: 0.9644
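As a rough illustration of what the hidden_layers and activation options control in mnist_layers_activations, here is a NumPy forward pass of such a network. It is a sketch, not the Keras code the template expects: the helper names, the hidden layer size of 100 and the random initialization are all assumptions.

```python
import numpy as np

# Supported activation names mapped to elementwise functions.
ACTIVATIONS = {
    "none": lambda x: x,
    "relu": lambda x: np.maximum(x, 0),
    "tanh": np.tanh,
    "sigmoid": lambda x: 1 / (1 + np.exp(-x)),
}


def init_mlp(rng, input_size, hidden_layers, hidden_size, classes):
    """Create (weights, bias) pairs for the hidden layers and the output layer."""
    sizes = [input_size] + [hidden_size] * hidden_layers + [classes]
    return [(rng.normal(scale=0.1, size=(a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]


def forward(params, inputs, activation):
    """Return class probabilities of shape [batch, classes]."""
    hidden = inputs
    # Every layer except the last one is a hidden layer with the chosen activation.
    for weights, bias in params[:-1]:
        hidden = ACTIVATIONS[activation](hidden @ weights + bias)
    # The output layer always uses softmax (computed in a numerically stable way).
    logits = hidden @ params[-1][0] + params[-1][1]
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_mlp(rng, input_size=784, hidden_layers=3, hidden_size=100, classes=10)
    print(forward(params, rng.normal(size=(2, 784)), "relu").shape)
```

With hidden_layers=0 the list contains only the output layer, so the model reduces to a linear softmax classifier.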
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 mnist_layers_activations.py --hidden_layers=0 --activation=none
Epoch 1/10 loss: 0.5390 - accuracy: 0.8607 - val_loss: 0.2745 - val_accuracy: 0.9288
Epoch 5/10 loss: 0.2777 - accuracy: 0.9228 - val_loss: 0.2195 - val_accuracy: 0.9432
Epoch 10/10 loss: 0.2592 - accuracy: 0.9279 - val_loss: 0.2141 - val_accuracy: 0.9434
python3 mnist_layers_activations.py --hidden_layers=1 --activation=none
Epoch 1/10 loss: 0.3809 - accuracy: 0.8915 - val_loss: 0.2403 - val_accuracy: 0.9344
Epoch 5/10 loss: 0.2759 - accuracy: 0.9220 - val_loss: 0.2338 - val_accuracy: 0.9348
Epoch 10/10 loss: 0.2642 - accuracy: 0.9257 - val_loss: 0.2322 - val_accuracy: 0.9386
python3 mnist_layers_activations.py --hidden_layers=1 --activation=relu
Epoch 1/10 loss: 0.3093 - accuracy: 0.9130 - val_loss: 0.1374 - val_accuracy: 0.9624
Epoch 5/10 loss: 0.0613 - accuracy: 0.9809 - val_loss: 0.0733 - val_accuracy: 0.9798
Epoch 10/10 loss: 0.0226 - accuracy: 0.9934 - val_loss: 0.0751 - val_accuracy: 0.9784
python3 mnist_layers_activations.py --hidden_layers=1 --activation=tanh
Epoch 1/10 loss: 0.3304 - accuracy: 0.9067 - val_loss: 0.1601 - val_accuracy: 0.9580
Epoch 5/10 loss: 0.0745 - accuracy: 0.9785 - val_loss: 0.0804 - val_accuracy: 0.9758
Epoch 10/10 loss: 0.0272 - accuracy: 0.9930 - val_loss: 0.0719 - val_accuracy: 0.9782
python3 mnist_layers_activations.py --hidden_layers=1 --activation=sigmoid
Epoch 1/10 loss: 0.4905 - accuracy: 0.8771 - val_loss: 0.2123 - val_accuracy: 0.9452
Epoch 5/10 loss: 0.1228 - accuracy: 0.9647 - val_loss: 0.1037 - val_accuracy: 0.9708
Epoch 10/10 loss: 0.0604 - accuracy: 0.9834 - val_loss: 0.0790 - val_accuracy: 0.9754
python3 mnist_layers_activations.py --hidden_layers=3 --activation=relu
Epoch 1/10 loss: 0.2727 - accuracy: 0.9185 - val_loss: 0.1180 - val_accuracy: 0.9644
Epoch 5/10 loss: 0.0501 - accuracy: 0.9837 - val_loss: 0.0944 - val_accuracy: 0.9734
Epoch 10/10 loss: 0.0242 - accuracy: 0.9919 - val_loss: 0.0936 - val_accuracy: 0.9814
python3 mnist_layers_activations.py --hidden_layers=10 --activation=relu
Epoch 1/10 loss: 0.3648 - accuracy: 0.8872 - val_loss: 0.1340 - val_accuracy: 0.9642
Epoch 5/10 loss: 0.0820 - accuracy: 0.9774 - val_loss: 0.0925 - val_accuracy: 0.9750
Epoch 10/10 loss: 0.0510 - accuracy: 0.9857 - val_loss: 0.0914 - val_accuracy: 0.9796
python3 mnist_layers_activations.py --hidden_layers=10 --activation=sigmoid
Epoch 1/10 loss: 2.2465 - accuracy: 0.1236 - val_loss: 1.9748 - val_accuracy: 0.1996
Epoch 5/10 loss: 0.5975 - accuracy: 0.8113 - val_loss: 0.4746 - val_accuracy: 0.8552
Epoch 10/10 loss: 0.3410 - accuracy: 0.9216 - val_loss: 0.3415 - val_accuracy: 0.9198
Deadline: Mar 6, 7:59 a.m. 3 points
In this exercise you will learn how to compute gradients using so-called automatic differentiation, which makes it possible to run the backpropagation algorithm automatically for a given computation. You can read the guide on automatic differentiation in TensorFlow if interested. After computing the gradient, you should then perform training by running manually implemented minibatch stochastic gradient descent.
Starting with the sgd_backpropagation.py template, you should:
- use tf.GradientTape to automatically compute the gradient of the loss with respect to all variables;
- train the model using manually implemented minibatch stochastic gradient descent.
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 sgd_backpropagation.py --epochs=2 --batch_size=64 --hidden_layer=20 --learning_rate=0.1
Dev accuracy after epoch 1 is 92.84
Dev accuracy after epoch 2 is 93.86
Test accuracy after epoch 2 is 93.21
python3 sgd_backpropagation.py --epochs=2 --batch_size=100 --hidden_layer=32 --learning_rate=0.2
Dev accuracy after epoch 1 is 93.66
Dev accuracy after epoch 2 is 95.00
Test accuracy after epoch 2 is 93.93
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 sgd_backpropagation.py --batch_size=64 --hidden_layer=20 --learning_rate=0.1
Dev accuracy after epoch 1 is 92.84
Dev accuracy after epoch 2 is 93.86
Dev accuracy after epoch 3 is 94.64
Dev accuracy after epoch 4 is 95.24
Dev accuracy after epoch 5 is 95.26
Dev accuracy after epoch 6 is 95.66
Dev accuracy after epoch 7 is 95.58
Dev accuracy after epoch 8 is 95.86
Dev accuracy after epoch 9 is 96.18
Dev accuracy after epoch 10 is 96.08
Test accuracy after epoch 10 is 95.53
python3 sgd_backpropagation.py --batch_size=100 --hidden_layer=32 --learning_rate=0.2
Dev accuracy after epoch 1 is 93.66
Dev accuracy after epoch 2 is 95.00
Dev accuracy after epoch 3 is 95.72
Dev accuracy after epoch 4 is 95.80
Dev accuracy after epoch 5 is 96.34
Dev accuracy after epoch 6 is 96.16
Dev accuracy after epoch 7 is 96.42
Dev accuracy after epoch 8 is 96.36
Dev accuracy after epoch 9 is 96.60
Dev accuracy after epoch 10 is 96.58
Test accuracy after epoch 10 is 96.18
Deadline: Mar 6, 7:59 a.m. 2 points
The goal in this exercise is to extend your solution to the sgd_backpropagation assignment by manually computing the gradient.
While in this assignment we compute the gradient manually, we will nearly always use automatic differentiation. Therefore, the assignment is more of a mathematical exercise than a real-world application. Furthermore, we will compute the derivatives together in the Feb 27/28 practicals.
Start with the sgd_manual.py template, which is based on the sgd_backpropagation.py one. Be aware that the two templates each generate a different output file.
In order to check that you do not use automatic differentiation, ReCodEx checks that you do not use tf.GradientTape in your solution.
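To make the manual derivation concrete, the following NumPy sketch computes the loss and its gradients by hand and applies one SGD update. It rests on assumptions not stated in the text: a single tanh hidden layer, a softmax output and a mean cross-entropy loss. The function names are hypothetical, and an actual solution would use TensorFlow tensors.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # Subtract the max for stability.
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def loss_and_gradients(params, x, y):
    """Mean cross-entropy of the assumed network and hand-derived gradients."""
    w1, b1, w2, b2 = params
    batch = x.shape[0]

    # Forward pass.
    hidden = np.tanh(x @ w1 + b1)
    probs = softmax(hidden @ w2 + b2)
    loss = -np.mean(np.log(probs[np.arange(batch), y]))

    # Backward pass: the gradient w.r.t. the logits is (probs - one_hot) / batch.
    d_logits = probs.copy()
    d_logits[np.arange(batch), y] -= 1
    d_logits /= batch
    d_w2 = hidden.T @ d_logits
    d_b2 = d_logits.sum(axis=0)
    # The derivative of tanh is 1 - tanh^2.
    d_pre = (d_logits @ w2.T) * (1 - hidden ** 2)
    d_w1 = x.T @ d_pre
    d_b1 = d_pre.sum(axis=0)
    return loss, [d_w1, d_b1, d_w2, d_b2]


def sgd_step(params, gradients, learning_rate):
    """One SGD update: move every parameter against its gradient."""
    return [p - learning_rate * g for p, g in zip(params, gradients)]
```

A finite-difference comparison on a few parameter entries is a cheap way to check a hand-derived gradient before submitting.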
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 sgd_manual.py --epochs=2 --batch_size=64 --hidden_layer=20 --learning_rate=0.1
Dev accuracy after epoch 1 is 92.84
Dev accuracy after epoch 2 is 93.86
Test accuracy after epoch 2 is 93.21
python3 sgd_manual.py --epochs=2 --batch_size=100 --hidden_layer=32 --learning_rate=0.2
Dev accuracy after epoch 1 is 93.66
Dev accuracy after epoch 2 is 95.00
Test accuracy after epoch 2 is 93.93
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 sgd_manual.py --batch_size=64 --hidden_layer=20 --learning_rate=0.1
Dev accuracy after epoch 1 is 92.84
Dev accuracy after epoch 2 is 93.86
Dev accuracy after epoch 3 is 94.64
Dev accuracy after epoch 4 is 95.24
Dev accuracy after epoch 5 is 95.26
Dev accuracy after epoch 6 is 95.66
Dev accuracy after epoch 7 is 95.58
Dev accuracy after epoch 8 is 95.86
Dev accuracy after epoch 9 is 96.18
Dev accuracy after epoch 10 is 96.08
Test accuracy after epoch 10 is 95.53
python3 sgd_manual.py --batch_size=100 --hidden_layer=32 --learning_rate=0.2
Dev accuracy after epoch 1 is 93.66
Dev accuracy after epoch 2 is 95.00
Dev accuracy after epoch 3 is 95.72
Dev accuracy after epoch 4 is 95.80
Dev accuracy after epoch 5 is 96.34
Dev accuracy after epoch 6 is 96.16
Dev accuracy after epoch 7 is 96.42
Dev accuracy after epoch 8 is 96.36
Dev accuracy after epoch 9 is 96.60
Dev accuracy after epoch 10 is 96.58
Test accuracy after epoch 10 is 96.18
Deadline: Mar 6, 7:59 a.m. 2 points
This exercise teaches you to use different optimizers, learning rates, and learning rate decays. Your goal is to modify the mnist_training.py template and implement the following:
- the choice of optimizer (SGD or Adam);
- optional momentum for the SGD optimizer;
- an optional learning rate schedule, which can be linear, exponential, or cosine. If a schedule is specified, you also get a final learning rate, and the learning rate should be gradually decreased during training to reach the final learning rate just after the training (i.e., the first update after the training would use exactly the final learning rate).
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 mnist_training.py --epochs=1 --optimizer=SGD --learning_rate=0.01
loss: 0.8214 - accuracy: 0.7996 - val_loss: 0.3673 - val_accuracy: 0.9096
python3 mnist_training.py --epochs=1 --optimizer=SGD --learning_rate=0.01 --momentum=0.9
loss: 0.3626 - accuracy: 0.8976 - val_loss: 0.1689 - val_accuracy: 0.9556
python3 mnist_training.py --epochs=1 --optimizer=SGD --learning_rate=0.1
loss: 0.3515 - accuracy: 0.9008 - val_loss: 0.1660 - val_accuracy: 0.9564
python3 mnist_training.py --epochs=1 --optimizer=Adam --learning_rate=0.001
loss: 0.2732 - accuracy: 0.9221 - val_loss: 0.1186 - val_accuracy: 0.9674
python3 mnist_training.py --epochs=1 --optimizer=Adam --learning_rate=0.01
loss: 0.2312 - accuracy: 0.9309 - val_loss: 0.1286 - val_accuracy: 0.9648
python3 mnist_training.py --epochs=2 --optimizer=Adam --learning_rate=0.01 --decay=exponential --learning_rate_final=0.001
Epoch 1/2 loss: 0.1962 - accuracy: 0.9398 - val_loss: 0.1026 - val_accuracy: 0.9728
Epoch 2/2 loss: 0.0672 - accuracy: 0.9788 - val_loss: 0.0735 - val_accuracy: 0.9788
Next learning rate to be used: 0.001
python3 mnist_training.py --epochs=2 --optimizer=Adam --learning_rate=0.01 --decay=linear --learning_rate_final=0.0001
Epoch 1/2 loss: 0.2106 - accuracy: 0.9369 - val_loss: 0.1174 - val_accuracy: 0.9664
Epoch 2/2 loss: 0.0715 - accuracy: 0.9775 - val_loss: 0.0745 - val_accuracy: 0.9778
Next learning rate to be used: 0.0001
python3 mnist_training.py --epochs=2 --optimizer=Adam --learning_rate=0.01 --decay=cosine --learning_rate_final=0.0001
Epoch 1/2 loss: 0.2158 - accuracy: 0.9346 - val_loss: 0.1231 - val_accuracy: 0.9670
Epoch 2/2 loss: 0.0694 - accuracy: 0.9781 - val_loss: 0.0746 - val_accuracy: 0.9786
Next learning rate to be used: 0.0001
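The three decay options of mnist_training can be written as plain functions of the training progress, which makes the "Next learning rate to be used" lines easy to reason about. This is a sketch under assumptions: `decayed_learning_rate` and its parameters are hypothetical names, and the exact shape of each curve between the endpoints is an assumed common formulation. TensorFlow ships comparable schedules under tf.keras.optimizers.schedules.

```python
import math


def decayed_learning_rate(step, total_steps, initial, final, decay):
    """Learning rate for the update with 0-based index `step`."""
    progress = step / total_steps  # 0 at the first update, 1 just after training.
    if decay == "linear":
        # Straight line from the initial to the final learning rate.
        return initial + (final - initial) * progress
    if decay == "exponential":
        # Constant multiplicative factor per step.
        return initial * (final / initial) ** progress
    if decay == "cosine":
        # Half a cosine period from the initial down to the final learning rate.
        return final + (initial - final) * 0.5 * (1 + math.cos(math.pi * progress))
    raise ValueError(f"Unknown decay {decay!r}")


if __name__ == "__main__":
    total = 1000  # Hypothetical number of updates in the whole training.
    for decay in ["linear", "exponential", "cosine"]:
        print(decay, decayed_learning_rate(total, total, 0.01, 0.0001, decay))
```

Evaluating the schedule at `step == total_steps` gives the rate the first update after training would use, which should equal the final learning rate.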
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 mnist_training.py --optimizer=SGD --learning_rate=0.01
Epoch 1/10 loss: 0.8214 - accuracy: 0.7996 - val_loss: 0.3673 - val_accuracy: 0.9096
Epoch 2/10 loss: 0.3950 - accuracy: 0.8930 - val_loss: 0.2821 - val_accuracy: 0.9250
Epoch 3/10 loss: 0.3346 - accuracy: 0.9064 - val_loss: 0.2472 - val_accuracy: 0.9326
Epoch 4/10 loss: 0.3018 - accuracy: 0.9150 - val_loss: 0.2268 - val_accuracy: 0.9394
Epoch 5/10 loss: 0.2786 - accuracy: 0.9215 - val_loss: 0.2113 - val_accuracy: 0.9416
Epoch 6/10 loss: 0.2603 - accuracy: 0.9271 - val_loss: 0.1996 - val_accuracy: 0.9454
Epoch 7/10 loss: 0.2448 - accuracy: 0.9313 - val_loss: 0.1879 - val_accuracy: 0.9500
Epoch 8/10 loss: 0.2314 - accuracy: 0.9352 - val_loss: 0.1819 - val_accuracy: 0.9512
Epoch 9/10 loss: 0.2199 - accuracy: 0.9384 - val_loss: 0.1720 - val_accuracy: 0.9558
Epoch 10/10 loss: 0.2091 - accuracy: 0.9416 - val_loss: 0.1655 - val_accuracy: 0.9582
python3 mnist_training.py --optimizer=SGD --learning_rate=0.01 --momentum=0.9
Epoch 1/10 loss: 0.3626 - accuracy: 0.8976 - val_loss: 0.1689 - val_accuracy: 0.9556
Epoch 2/10 loss: 0.1813 - accuracy: 0.9486 - val_loss: 0.1278 - val_accuracy: 0.9668
Epoch 3/10 loss: 0.1324 - accuracy: 0.9622 - val_loss: 0.1060 - val_accuracy: 0.9706
Epoch 4/10 loss: 0.1053 - accuracy: 0.9704 - val_loss: 0.0916 - val_accuracy: 0.9742
Epoch 5/10 loss: 0.0866 - accuracy: 0.9753 - val_loss: 0.0862 - val_accuracy: 0.9766
Epoch 6/10 loss: 0.0732 - accuracy: 0.9793 - val_loss: 0.0806 - val_accuracy: 0.9774
Epoch 7/10 loss: 0.0637 - accuracy: 0.9825 - val_loss: 0.0756 - val_accuracy: 0.9806
Epoch 8/10 loss: 0.0547 - accuracy: 0.9851 - val_loss: 0.0740 - val_accuracy: 0.9794
Epoch 9/10 loss: 0.0486 - accuracy: 0.9867 - val_loss: 0.0781 - val_accuracy: 0.9768
Epoch 10/10 loss: 0.0430 - accuracy: 0.9886 - val_loss: 0.0731 - val_accuracy: 0.9790
python3 mnist_training.py --optimizer=SGD --learning_rate=0.1
Epoch 1/10 loss: 0.3515 - accuracy: 0.9008 - val_loss: 0.1660 - val_accuracy: 0.9564
Epoch 2/10 loss: 0.1788 - accuracy: 0.9488 - val_loss: 0.1267 - val_accuracy: 0.9668
Epoch 3/10 loss: 0.1307 - accuracy: 0.9624 - val_loss: 0.1006 - val_accuracy: 0.9734
Epoch 4/10 loss: 0.1039 - accuracy: 0.9711 - val_loss: 0.0902 - val_accuracy: 0.9736
Epoch 5/10 loss: 0.0856 - accuracy: 0.9755 - val_loss: 0.0845 - val_accuracy: 0.9776
Epoch 6/10 loss: 0.0729 - accuracy: 0.9789 - val_loss: 0.0841 - val_accuracy: 0.9762
Epoch 7/10 loss: 0.0628 - accuracy: 0.9827 - val_loss: 0.0742 - val_accuracy: 0.9812
Epoch 8/10 loss: 0.0546 - accuracy: 0.9850 - val_loss: 0.0746 - val_accuracy: 0.9790
Epoch 9/10 loss: 0.0488 - accuracy: 0.9866 - val_loss: 0.0756 - val_accuracy: 0.9780
Epoch 10/10 loss: 0.0426 - accuracy: 0.9887 - val_loss: 0.0711 - val_accuracy: 0.9784
python3 mnist_training.py --optimizer=Adam --learning_rate=0.001
Epoch 1/10 loss: 0.2732 - accuracy: 0.9221 - val_loss: 0.1186 - val_accuracy: 0.9674
Epoch 2/10 loss: 0.1156 - accuracy: 0.9662 - val_loss: 0.0921 - val_accuracy: 0.9716
Epoch 3/10 loss: 0.0776 - accuracy: 0.9772 - val_loss: 0.0785 - val_accuracy: 0.9764
Epoch 4/10 loss: 0.0569 - accuracy: 0.9831 - val_loss: 0.0795 - val_accuracy: 0.9756
Epoch 5/10 loss: 0.0428 - accuracy: 0.9866 - val_loss: 0.0736 - val_accuracy: 0.9788
Epoch 6/10 loss: 0.0324 - accuracy: 0.9900 - val_loss: 0.0749 - val_accuracy: 0.9806
Epoch 7/10 loss: 0.0265 - accuracy: 0.9921 - val_loss: 0.0781 - val_accuracy: 0.9782
Epoch 8/10 loss: 0.0203 - accuracy: 0.9942 - val_loss: 0.0886 - val_accuracy: 0.9776
Epoch 9/10 loss: 0.0157 - accuracy: 0.9953 - val_loss: 0.0830 - val_accuracy: 0.9786
Epoch 10/10 loss: 0.0136 - accuracy: 0.9958 - val_loss: 0.0878 - val_accuracy: 0.9778
python3 mnist_training.py --optimizer=Adam --learning_rate=0.01
Epoch 1/10 loss: 0.2312 - accuracy: 0.9309 - val_loss: 0.1286 - val_accuracy: 0.9648
Epoch 2/10 loss: 0.1384 - accuracy: 0.9607 - val_loss: 0.1295 - val_accuracy: 0.9638
Epoch 3/10 loss: 0.1251 - accuracy: 0.9655 - val_loss: 0.1784 - val_accuracy: 0.9594
Epoch 4/10 loss: 0.1079 - accuracy: 0.9701 - val_loss: 0.1693 - val_accuracy: 0.9608
Epoch 5/10 loss: 0.0988 - accuracy: 0.9729 - val_loss: 0.1524 - val_accuracy: 0.9676
Epoch 6/10 loss: 0.0950 - accuracy: 0.9747 - val_loss: 0.1813 - val_accuracy: 0.9698
Epoch 7/10 loss: 0.0938 - accuracy: 0.9764 - val_loss: 0.1850 - val_accuracy: 0.9654
Epoch 8/10 loss: 0.0849 - accuracy: 0.9781 - val_loss: 0.1919 - val_accuracy: 0.9670
Epoch 9/10 loss: 0.0833 - accuracy: 0.9792 - val_loss: 0.1739 - val_accuracy: 0.9712
Epoch 10/10 loss: 0.0744 - accuracy: 0.9815 - val_loss: 0.1840 - val_accuracy: 0.9690
python3 mnist_training.py --optimizer=Adam --learning_rate=0.01 --decay=exponential --learning_rate_final=0.001
Epoch 1/10 loss: 0.2204 - accuracy: 0.9335 - val_loss: 0.1276 - val_accuracy: 0.9656
Epoch 2/10 loss: 0.1126 - accuracy: 0.9672 - val_loss: 0.1180 - val_accuracy: 0.9660
Epoch 3/10 loss: 0.0745 - accuracy: 0.9767 - val_loss: 0.0989 - val_accuracy: 0.9750
Epoch 4/10 loss: 0.0495 - accuracy: 0.9843 - val_loss: 0.0898 - val_accuracy: 0.9780
Epoch 5/10 loss: 0.0326 - accuracy: 0.9899 - val_loss: 0.0970 - val_accuracy: 0.9788
Epoch 6/10 loss: 0.0197 - accuracy: 0.9936 - val_loss: 0.1005 - val_accuracy: 0.9808
Epoch 7/10 loss: 0.0133 - accuracy: 0.9955 - val_loss: 0.0857 - val_accuracy: 0.9812
Epoch 8/10 loss: 0.0067 - accuracy: 0.9982 - val_loss: 0.0976 - val_accuracy: 0.9804
Epoch 9/10 loss: 0.0042 - accuracy: 0.9991 - val_loss: 0.1056 - val_accuracy: 0.9804
Epoch 10/10 loss: 0.0023 - accuracy: 0.9997 - val_loss: 0.0931 - val_accuracy: 0.9822
Next learning rate to be used: 0.001
python3 mnist_training.py --optimizer=Adam --learning_rate=0.01 --decay=linear --learning_rate_final=0.0001
Epoch 1/10 loss: 0.2299 - accuracy: 0.9312 - val_loss: 0.1309 - val_accuracy: 0.9620
Epoch 2/10 loss: 0.1266 - accuracy: 0.9632 - val_loss: 0.1174 - val_accuracy: 0.9702
Epoch 3/10 loss: 0.0958 - accuracy: 0.9724 - val_loss: 0.1129 - val_accuracy: 0.9730
Epoch 4/10 loss: 0.0730 - accuracy: 0.9775 - val_loss: 0.1223 - val_accuracy: 0.9700
Epoch 5/10 loss: 0.0504 - accuracy: 0.9847 - val_loss: 0.1046 - val_accuracy: 0.9758
Epoch 6/10 loss: 0.0338 - accuracy: 0.9895 - val_loss: 0.1225 - val_accuracy: 0.9766
Epoch 7/10 loss: 0.0239 - accuracy: 0.9925 - val_loss: 0.1043 - val_accuracy: 0.9784
Epoch 8/10 loss: 0.0108 - accuracy: 0.9964 - val_loss: 0.1035 - val_accuracy: 0.9808
Epoch 9/10 loss: 0.0050 - accuracy: 0.9985 - val_loss: 0.0912 - val_accuracy: 0.9822
Epoch 10/10 loss: 0.0021 - accuracy: 0.9997 - val_loss: 0.0920 - val_accuracy: 0.9828
Next learning rate to be used: 0.0001
python3 mnist_training.py --optimizer=Adam --learning_rate=0.01 --decay=cosine --learning_rate_final=0.0001
Epoch 1/10 loss: 0.2307 - accuracy: 0.9302 - val_loss: 0.1340 - val_accuracy: 0.9620
Epoch 2/10 loss: 0.1377 - accuracy: 0.9608 - val_loss: 0.1398 - val_accuracy: 0.9640
Epoch 3/10 loss: 0.1089 - accuracy: 0.9676 - val_loss: 0.1089 - val_accuracy: 0.9738
Epoch 4/10 loss: 0.0774 - accuracy: 0.9775 - val_loss: 0.1198 - val_accuracy: 0.9710
Epoch 5/10 loss: 0.0517 - accuracy: 0.9844 - val_loss: 0.1100 - val_accuracy: 0.9758
Epoch 6/10 loss: 0.0333 - accuracy: 0.9890 - val_loss: 0.1036 - val_accuracy: 0.9786
Epoch 7/10 loss: 0.0181 - accuracy: 0.9941 - val_loss: 0.0949 - val_accuracy: 0.9814
Epoch 8/10 loss: 0.0091 - accuracy: 0.9973 - val_loss: 0.0930 - val_accuracy: 0.9812
Epoch 9/10 loss: 0.0050 - accuracy: 0.9987 - val_loss: 0.0971 - val_accuracy: 0.9826
Epoch 10/10 loss: 0.0036 - accuracy: 0.9992 - val_loss: 0.0965 - val_accuracy: 0.9824
Next learning rate to be used: 0.0001
Deadline: Mar 6, 7:59 a.m. 3 points
Solve the CartPole-v1 environment from the Gymnasium library, using only the provided supervised training data. The data is available in the gym_cartpole_data.txt file, each line containing one observation (four space-separated floats) and a corresponding action (the last space-separated integer). Start with the gym_cartpole.py template.
The solution to this task should be a model which passes evaluation on random inputs. This evaluation can be performed by running gym_cartpole.py with the --evaluate argument (optionally rendering if the --render option is provided), or by directly calling the evaluate_model method. In order to pass, you must achieve an average reward of at least 475 on 100 episodes. Your model should have either one or two outputs (i.e., using either sigmoid or softmax output function).
When designing the model, you should consider that the size of the training data is very small and the data is quite noisy.
When submitting to ReCodEx, do not forget to also submit the trained model.
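The data format described above (four space-separated floats followed by an integer action on each line) can be parsed with NumPy roughly as follows. The function name `load_cartpole_data` is hypothetical, and the template may already load the data in its own way.

```python
import io

import numpy as np


def load_cartpole_data(file):
    """Parse lines of four floats (observation) followed by an integer action."""
    # ndmin=2 keeps a 2D shape even when the file has a single line.
    data = np.loadtxt(file, ndmin=2)
    observations = data[:, :4].astype(np.float32)
    actions = data[:, 4].astype(np.int64)
    return observations, actions


if __name__ == "__main__":
    # A made-up two-line sample in the described format, not real training data.
    sample = io.StringIO("0.1 0.2 -0.3 0.4 1\n0.0 0.0 0.0 0.0 0\n")
    observations, actions = load_cartpole_data(sample)
    print(observations.shape, actions)
```

The function accepts either a file name such as gym_cartpole_data.txt or an open file-like object.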
Deadline: Mar 13, 7:59 a.m. 3 points
You will learn how to implement three regularization methods in this assignment. Start with the mnist_regularization.py template and implement the following:
- Dropout with rate args.dropout. Add a dropout layer after the first Flatten and also after all Dense hidden layers (but not after the output layer).
- Weight decay of strength args.weight_decay, making sure the weight decay is not applied on bias.
- Label smoothing with weight args.label_smoothing. Instead of SparseCategoricalCrossentropy, you will need to use CategoricalCrossentropy, which offers the label_smoothing argument.
In addition to submitting the task in ReCodEx, also run the following variations and observe the results in TensorBoard (or online here), notably the training, development and test set accuracy and loss:
- dropout rate 0, 0.3, 0.5, 0.6, 0.8;
- weight decay 0, 0.1, 0.3, 0.5, 0.1;
- label smoothing 0, 0.1, 0.3, 0.5.
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 mnist_regularization.py --epochs=1 --dropout=0.3
loss: 0.7988 - accuracy: 0.7646 - val_loss: 0.3164 - val_accuracy: 0.9116
python3 mnist_regularization.py --epochs=1 --dropout=0.5 --hidden_layers 300 300
loss: 1.4830 - accuracy: 0.4910 - val_loss: 0.4659 - val_accuracy: 0.8766
python3 mnist_regularization.py --epochs=1 --weight_decay=0.1
loss: 0.6040 - accuracy: 0.8386 - val_loss: 0.2718 - val_accuracy: 0.9236
python3 mnist_regularization.py --epochs=1 --weight_decay=0.3
loss: 0.6062 - accuracy: 0.8384 - val_loss: 0.2744 - val_accuracy: 0.9222
python3 mnist_regularization.py --epochs=1 --label_smoothing=0.1
loss: 0.9926 - accuracy: 0.8414 - val_loss: 0.7720 - val_accuracy: 0.9222
python3 mnist_regularization.py --epochs=1 --label_smoothing=0.3
loss: 1.5080 - accuracy: 0.8456 - val_loss: 1.3738 - val_accuracy: 0.9260
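The reason label smoothing in mnist_regularization needs full target distributions instead of sparse class indices can be seen from a small NumPy sketch of the smoothing itself. The function name `smooth_labels` is hypothetical; the formula mixes the one-hot target with a uniform distribution, which is the usual definition and, to the best of my knowledge, what the Keras label_smoothing argument applies.

```python
import numpy as np


def smooth_labels(labels, classes, label_smoothing):
    """Turn integer class labels into smoothed target distributions."""
    # One-hot encode the labels: shape [examples, classes].
    one_hot = np.eye(classes)[np.asarray(labels)]
    # Move `label_smoothing` of the probability mass to a uniform distribution.
    return one_hot * (1 - label_smoothing) + label_smoothing / classes


if __name__ == "__main__":
    print(smooth_labels([0, 2], classes=3, label_smoothing=0.3))
```

Every row still sums to one, but no class has a target of exactly zero. That is also why the loss values with label smoothing are not directly comparable to those without it.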
Deadline: Mar 13, 7:59 a.m. 2 points
Your goal in this assignment is to implement model ensembling.
The mnist_ensemble.py template trains args.models individual models, and your goal is to perform an ensemble of the first model, first two models, first three models, …, all models, and evaluate their accuracy on the test set.
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 mnist_ensemble.py --epochs=1 --models=5
Model 1, individual accuracy 96.24, ensemble accuracy 96.24
Model 2, individual accuracy 96.34, ensemble accuracy 96.44
Model 3, individual accuracy 96.24, ensemble accuracy 96.46
Model 4, individual accuracy 96.64, ensemble accuracy 96.60
Model 5, individual accuracy 96.60, ensemble accuracy 96.60
python3 mnist_ensemble.py --epochs=1 --models=5 --hidden_layers=200
Model 1, individual accuracy 96.74, ensemble accuracy 96.74
Model 2, individual accuracy 96.92, ensemble accuracy 97.06
Model 3, individual accuracy 96.82, ensemble accuracy 97.06
Model 4, individual accuracy 96.86, ensemble accuracy 96.96
Model 5, individual accuracy 96.46, ensemble accuracy 96.86
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 mnist_ensemble.py --models=5
Model 1, individual accuracy 97.84, ensemble accuracy 97.84
Model 2, individual accuracy 98.04, ensemble accuracy 98.22
Model 3, individual accuracy 97.90, ensemble accuracy 98.16
Model 4, individual accuracy 97.88, ensemble accuracy 98.12
Model 5, individual accuracy 97.94, ensemble accuracy 98.12
python3 mnist_ensemble.py --models=5 --hidden_layers=200
Model 1, individual accuracy 97.78, ensemble accuracy 97.78
Model 2, individual accuracy 98.18, ensemble accuracy 98.30
Model 3, individual accuracy 98.02, ensemble accuracy 98.28
Model 4, individual accuracy 98.10, ensemble accuracy 98.40
Model 5, individual accuracy 97.98, ensemble accuracy 98.44
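The incremental ensembling described in this assignment can be sketched in plain Python as follows. The sketch assumes that the ensemble averages the predicted class distributions of its members (other combination rules, such as voting, are possible), and all names and numbers are illustrative rather than taken from the template.

```python
def ensemble_accuracies(model_probs, gold):
    """Accuracy (in percent) of the ensemble of the first 1, 2, ..., all models.

    model_probs[m][i] is the class distribution predicted by model m for example i.
    """
    num_examples, num_classes = len(gold), len(model_probs[0][0])
    summed = [[0.0] * num_classes for _ in range(num_examples)]
    accuracies = []
    for probs in model_probs:
        # Add the distributions of the next model to the running sums.
        for i, distribution in enumerate(probs):
            for c, p in enumerate(distribution):
                summed[i][c] += p
        # The argmax of the sum equals the argmax of the average.
        predictions = [max(range(num_classes), key=row.__getitem__) for row in summed]
        correct = sum(p == g for p, g in zip(predictions, gold))
        accuracies.append(100.0 * correct / num_examples)
    return accuracies


gold = [0, 1]
model_probs = [
    [[0.6, 0.4], [0.7, 0.3]],    # model 1: second example wrong
    [[0.45, 0.55], [0.2, 0.8]],  # model 2: first example wrong
    [[0.9, 0.1], [0.1, 0.9]],    # model 3: both correct
]
print(ensemble_accuracies(model_probs, gold))
```

In this toy example the first two models are each wrong on a different example, yet their averaged distributions are correct on both, which is the effect the assignment asks you to measure.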
Deadline: Mar 13, 7:59 a.m. 4 points+5 bonus
This assignment introduces the first NLP task. Your goal is to implement a model which is given Czech lowercased text and tries to uppercase appropriate letters. To load the dataset, use the uppercase_data.py module, which loads (and if required also downloads) the data. While the training and the development sets are in correct case, the test set is lowercased.
This is an open-data task, where you submit only the uppercased test set together with the training script (which will not be executed; it will be used only to understand the approach you took and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py/ipynb file.
The task is also a competition. Everyone who submits a solution which achieves at least 98.5% accuracy will get 4 basic points; the 5 bonus points will be distributed depending on the relative ordering of your solutions. The accuracy is computed per-character and can be evaluated by running uppercase_data.py with the --evaluate argument, or using its evaluate_file method.
You may want to start with the uppercase.py template, which uses uppercase_data.py to load the data, generate an alphabet of a given size containing the most frequent characters, and generate a sliding window view on the data. The template also comments on possibilities of character representation.
Do not use RNNs, CNNs, or Transformer in this task (if you have doubts, contact me).
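To make the sliding window view concrete, here is a small plain-Python sketch. It is not the uppercase_data.py implementation: the padding and unknown-character ids, the window radius, and the function name are all assumptions chosen for illustration.

```python
PAD, UNK = 0, 1  # hypothetical ids for padding and out-of-alphabet characters


def make_windows(text, alphabet, window):
    """Return (window_ids, label) pairs, one per character of `text`.

    Each window holds the ids of the lowercased character and its `window`
    neighbours on both sides; the label is 1 if the original was uppercase.
    """
    char_to_id = {c: i + 2 for i, c in enumerate(alphabet)}
    lowered = [c.lower() for c in text]
    ids = [PAD] * window + [char_to_id.get(c, UNK) for c in lowered] + [PAD] * window
    return [(ids[i:i + 2 * window + 1], int(text[i] != lowered[i]))
            for i in range(len(text))]


examples = make_windows("Ahoj Eva", alphabet="aehjov ", window=1)
for window_ids, label in examples:
    print(window_ids, label)
```

Every character then becomes one fixed-size classification example, which is why the task can be solved with a plain feed-forward network.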
Deadline: Mar 20, 7:59 a.m. 3 points
To pass this assignment, you will learn to construct basic convolutional neural network layers. Start with the mnist_cnn.py template and assume the requested architecture is described by the cnn argument, which contains comma-separated specifications of the following layers:
C-filters-kernel_size-stride-padding: Add a convolutional layer with ReLU activation and specified number of filters, kernel size, stride and padding. Example: C-10-3-1-same
CB-filters-kernel_size-stride-padding: Same as C-filters-kernel_size-stride-padding, but use batch normalization. In detail, start with a convolutional layer without bias and activation, then add a batch normalization layer, and finally ReLU activation. Example: CB-10-3-1-same
M-pool_size-stride: Add max pooling with specified size and stride, using the default "valid" padding. Example: M-3-2
R-[layers]: Add a residual connection. The layers contain a specification of at least one convolutional layer (but not a recursive residual connection R). The input to the R layer should be processed sequentially by layers, and the produced output (after the ReLU nonlinearity of the last layer) should be added to the input (of this R layer). Example: R-[C-16-3-1-same,C-16-3-1-same]
F: Flatten inputs. Must appear exactly once in the architecture.
H-hidden_layer_size: Add a dense layer with ReLU activation and specified size. Example: H-100
D-dropout_rate: Apply dropout with the given dropout rate. Example: D-0.5
An example architecture might be --cnn=CB-16-5-2-same,M-3-2,F,H-100,D-0.5.
You can assume the resulting network is valid; it is fine to crash if it is not.
After a successful ReCodEx submission, you can try obtaining the best accuracy on MNIST and then advance to cifar_competition.
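One practical detail of the cnn argument is that the commas inside R-[...] must not split the specification. A possible way to parse the format into plain dictionaries is sketched below; the dictionary keys and function names are hypothetical, and building the actual Keras layers from the parsed result is left to the template.

```python
import re


def split_layers(cnn):
    """Split a specification on commas that are not inside [...]."""
    return re.split(r",(?![^\[]*\])", cnn)


def parse_layer(spec):
    """Parse one layer specification into a dictionary (keys are illustrative)."""
    kind, _, rest = spec.partition("-")
    if kind in ("C", "CB"):
        filters, kernel_size, stride, padding = rest.split("-")
        return {"type": kind, "filters": int(filters), "kernel_size": int(kernel_size),
                "stride": int(stride), "padding": padding}
    if kind == "M":
        pool_size, stride = rest.split("-")
        return {"type": "M", "pool_size": int(pool_size), "stride": int(stride)}
    if kind == "R":
        # Strip the surrounding brackets and parse the inner layers.
        return {"type": "R", "layers": [parse_layer(s) for s in split_layers(rest[1:-1])]}
    if kind == "F":
        return {"type": "F"}
    if kind == "H":
        return {"type": "H", "units": int(rest)}
    if kind == "D":
        return {"type": "D", "rate": float(rest)}
    raise ValueError(f"Unknown layer specification: {spec}")


spec = "CB-8-3-5-valid,R-[CB-8-3-1-same,CB-8-3-1-same],F,H-50"
for layer in map(parse_layer, split_layers(spec)):
    print(layer)
```

The negative lookahead relies on residual blocks not being nested, which the assignment guarantees.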
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 mnist_cnn.py --epochs=1 --cnn=F,H-100
loss: 0.3093 - accuracy: 0.9130 - val_loss: 0.1374 - val_accuracy: 0.9624
python3 mnist_cnn.py --epochs=1 --cnn=F,H-100,D-0.5
loss: 0.4770 - accuracy: 0.8594 - val_loss: 0.1624 - val_accuracy: 0.9552
python3 mnist_cnn.py --epochs=1 --cnn=M-5-2,F,H-50
loss: 0.7365 - accuracy: 0.7773 - val_loss: 0.3899 - val_accuracy: 0.8800
python3 mnist_cnn.py --epochs=1 --cnn=C-8-3-5-same,C-8-3-2-valid,F,H-50
loss: 0.8051 - accuracy: 0.7453 - val_loss: 0.3693 - val_accuracy: 0.8868
python3 mnist_cnn.py --epochs=1 --cnn=CB-6-3-5-valid,F,H-32
loss: 0.5878 - accuracy: 0.8189 - val_loss: 0.2638 - val_accuracy: 0.9246
python3 mnist_cnn.py --epochs=1 --cnn=CB-8-3-5-valid,R-[CB-8-3-1-same,CB-8-3-1-same],F,H-50
loss: 0.4186 - accuracy: 0.8674 - val_loss: 0.1729 - val_accuracy: 0.9456
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 mnist_cnn.py --cnn=F,H-100
Epoch 1/10 loss: 0.3093 - accuracy: 0.9130 - val_loss: 0.1374 - val_accuracy: 0.9624
Epoch 2/10 loss: 0.1439 - accuracy: 0.9583 - val_loss: 0.1089 - val_accuracy: 0.9674
Epoch 3/10 loss: 0.1019 - accuracy: 0.9696 - val_loss: 0.0942 - val_accuracy: 0.9720
Epoch 4/10 loss: 0.0775 - accuracy: 0.9770 - val_loss: 0.0844 - val_accuracy: 0.9750
Epoch 5/10 loss: 0.0613 - accuracy: 0.9809 - val_loss: 0.0733 - val_accuracy: 0.9798
Epoch 6/10 loss: 0.0489 - accuracy: 0.9852 - val_loss: 0.0785 - val_accuracy: 0.9760
Epoch 7/10 loss: 0.0413 - accuracy: 0.9876 - val_loss: 0.0750 - val_accuracy: 0.9790
Epoch 8/10 loss: 0.0336 - accuracy: 0.9900 - val_loss: 0.0781 - val_accuracy: 0.9790
Epoch 9/10 loss: 0.0272 - accuracy: 0.9920 - val_loss: 0.0837 - val_accuracy: 0.9778
Epoch 10/10 loss: 0.0226 - accuracy: 0.9934 - val_loss: 0.0751 - val_accuracy: 0.9784
python3 mnist_cnn.py --cnn=F,H-100,D-0.5
Epoch 1/10 loss: 0.4770 - accuracy: 0.8594 - val_loss: 0.1624 - val_accuracy: 0.9552
Epoch 2/10 loss: 0.2734 - accuracy: 0.9196 - val_loss: 0.1274 - val_accuracy: 0.9654
Epoch 3/10 loss: 0.2262 - accuracy: 0.9322 - val_loss: 0.1054 - val_accuracy: 0.9712
Epoch 4/10 loss: 0.2027 - accuracy: 0.9388 - val_loss: 0.0976 - val_accuracy: 0.9720
Epoch 5/10 loss: 0.1898 - accuracy: 0.9429 - val_loss: 0.0906 - val_accuracy: 0.9734
Epoch 6/10 loss: 0.1749 - accuracy: 0.9455 - val_loss: 0.0863 - val_accuracy: 0.9748
Epoch 7/10 loss: 0.1643 - accuracy: 0.9501 - val_loss: 0.0857 - val_accuracy: 0.9750
Epoch 8/10 loss: 0.1570 - accuracy: 0.9509 - val_loss: 0.0838 - val_accuracy: 0.9736
Epoch 9/10 loss: 0.1519 - accuracy: 0.9529 - val_loss: 0.0843 - val_accuracy: 0.9758
Epoch 10/10 loss: 0.1472 - accuracy: 0.9547 - val_loss: 0.0807 - val_accuracy: 0.9768
python3 mnist_cnn.py --cnn=F,H-200,D-0.5
Epoch 1/10 loss: 0.3804 - accuracy: 0.8867 - val_loss: 0.1319 - val_accuracy: 0.9668
Epoch 2/10 loss: 0.1960 - accuracy: 0.9410 - val_loss: 0.1027 - val_accuracy: 0.9696
Epoch 3/10 loss: 0.1551 - accuracy: 0.9541 - val_loss: 0.0805 - val_accuracy: 0.9764
Epoch 4/10 loss: 0.1332 - accuracy: 0.9603 - val_loss: 0.0781 - val_accuracy: 0.9784
Epoch 5/10 loss: 0.1182 - accuracy: 0.9640 - val_loss: 0.0756 - val_accuracy: 0.9788
Epoch 6/10 loss: 0.1046 - accuracy: 0.9681 - val_loss: 0.0730 - val_accuracy: 0.9792
Epoch 7/10 loss: 0.1036 - accuracy: 0.9676 - val_loss: 0.0715 - val_accuracy: 0.9810
Epoch 8/10 loss: 0.0920 - accuracy: 0.9708 - val_loss: 0.0748 - val_accuracy: 0.9808
Epoch 9/10 loss: 0.0865 - accuracy: 0.9725 - val_loss: 0.0727 - val_accuracy: 0.9792
Epoch 10/10 loss: 0.0831 - accuracy: 0.9739 - val_loss: 0.0667 - val_accuracy: 0.9812
python3 mnist_cnn.py --cnn=C-8-3-1-same,C-8-3-1-same,M-3-2,C-16-3-1-same,C-16-3-1-same,M-3-2,F,H-100
Epoch 1/10 loss: 0.1932 - accuracy: 0.9403 - val_loss: 0.0596 - val_accuracy: 0.9806
Epoch 2/10 loss: 0.0578 - accuracy: 0.9812 - val_loss: 0.0488 - val_accuracy: 0.9870
Epoch 3/10 loss: 0.0434 - accuracy: 0.9860 - val_loss: 0.0335 - val_accuracy: 0.9902
Epoch 4/10 loss: 0.0348 - accuracy: 0.9887 - val_loss: 0.0342 - val_accuracy: 0.9918
Epoch 5/10 loss: 0.0278 - accuracy: 0.9911 - val_loss: 0.0307 - val_accuracy: 0.9926
Epoch 6/10 loss: 0.0236 - accuracy: 0.9922 - val_loss: 0.0292 - val_accuracy: 0.9928
Epoch 7/10 loss: 0.0210 - accuracy: 0.9934 - val_loss: 0.0333 - val_accuracy: 0.9916
Epoch 8/10 loss: 0.0184 - accuracy: 0.9939 - val_loss: 0.0419 - val_accuracy: 0.9916
Epoch 9/10 loss: 0.0159 - accuracy: 0.9950 - val_loss: 0.0360 - val_accuracy: 0.9914
Epoch 10/10 loss: 0.0139 - accuracy: 0.9953 - val_loss: 0.0334 - val_accuracy: 0.9934
python3 mnist_cnn.py --cnn=CB-8-3-1-same,CB-8-3-1-same,M-3-2,CB-16-3-1-same,CB-16-3-1-same,M-3-2,F,H-100
Epoch 1/10 loss: 0.1604 - accuracy: 0.9512 - val_loss: 0.0419 - val_accuracy: 0.9876
Epoch 2/10 loss: 0.0520 - accuracy: 0.9833 - val_loss: 0.0778 - val_accuracy: 0.9770
Epoch 3/10 loss: 0.0424 - accuracy: 0.9858 - val_loss: 0.0460 - val_accuracy: 0.9864
Epoch 4/10 loss: 0.0345 - accuracy: 0.9888 - val_loss: 0.0392 - val_accuracy: 0.9904
Epoch 5/10 loss: 0.0268 - accuracy: 0.9916 - val_loss: 0.0390 - val_accuracy: 0.9904
Epoch 6/10 loss: 0.0248 - accuracy: 0.9919 - val_loss: 0.0360 - val_accuracy: 0.9916
Epoch 7/10 loss: 0.0204 - accuracy: 0.9930 - val_loss: 0.0263 - val_accuracy: 0.9934
Epoch 8/10 loss: 0.0189 - accuracy: 0.9937 - val_loss: 0.0388 - val_accuracy: 0.9884
Epoch 9/10 loss: 0.0178 - accuracy: 0.9940 - val_loss: 0.0447 - val_accuracy: 0.9888
Epoch 10/10 loss: 0.0140 - accuracy: 0.9953 - val_loss: 0.0269 - val_accuracy: 0.9930
Deadline: Mar 20, 7:59 a.m. 2 points
In this assignment you will familiarize yourselves with tf.data, which is the TensorFlow high-level API for constructing input pipelines. If you want, you can read the official TensorFlow tf.data guide or the reference API manual.
The goal of this assignment is to implement image augmentation preprocessing similar to image_augmentation, but with tf.data. Start with the tf_dataset.py template and implement the input pipelines employing the tf.data.Dataset.
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 tf_dataset.py --epochs=1 --batch_size=100
loss: 2.1809 - accuracy: 0.1772 - val_loss: 1.9773 - val_accuracy: 0.2630
python3 tf_dataset.py --epochs=1 --batch_size=50 --augment=tf_image
loss: 2.1008 - accuracy: 0.2052 - val_loss: 1.8225 - val_accuracy: 0.3070
python3 tf_dataset.py --epochs=1 --batch_size=50 --augment=layers
loss: 2.1820 - accuracy: 0.1664 - val_loss: 2.0104 - val_accuracy: 0.2330
Deadline: Mar 20, 7:59 a.m. 3 points
In this assignment you will implement a model with multiple inputs and outputs. Start with the mnist_multiple.py template and:
tf.data API.
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 mnist_multiple.py --epochs=1 --batch_size=50
loss: 0.8763 - digit_1_loss: 0.2936 - digit_2_loss: 0.2972 - direct_comparison_loss: 0.2855 - direct_comparison_accuracy: 0.8711 - indirect_comparison_accuracy: 0.9450 - val_loss: 0.3029 - val_digit_1_loss: 0.1076 - val_digit_2_loss: 0.0644 - val_direct_comparison_loss: 0.1309 - val_direct_comparison_accuracy: 0.9556 - val_indirect_comparison_accuracy: 0.9828
python3 mnist_multiple.py --epochs=1 --batch_size=100
loss: 1.1698 - digit_1_loss: 0.4132 - digit_2_loss: 0.4140 - direct_comparison_loss: 0.3426 - direct_comparison_accuracy: 0.8390 - indirect_comparison_accuracy: 0.9270 - val_loss: 0.4259 - val_digit_1_loss: 0.1502 - val_digit_2_loss: 0.0884 - val_direct_comparison_loss: 0.1873 - val_direct_comparison_accuracy: 0.9296 - val_indirect_comparison_accuracy: 0.9744
Deadline: Mar 20, 7:59 a.m. 4 points+5 bonus
The goal of this assignment is to devise the best possible model for CIFAR-10. You can load the data using the cifar10.py module. Note that the test set is different from that of the official CIFAR-10.
The task is a competition. Everyone who submits a solution which achieves at least 70% test set accuracy will get 4 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. Note that my solutions usually need to achieve around 85% on the development set to score 70% on the test set.
You may want to start with the cifar_competition.py template which generates the test set annotation in the required format.
Deadline: Mar 27, 7:59 a.m. 3 points Slides
To pass this assignment, you need to manually implement the forward and backward pass through a 2D convolutional layer. Start with the cnn_manual.py template, which constructs a series of 2D convolutional layers with ReLU activation and valid padding, specified in the args.cnn option. The args.cnn contains comma-separated layer specifications in the format filters-kernel_size-stride.
Of course, you cannot use any TensorFlow convolutional operation (instead, implement the forward and backward pass using matrix multiplication and other operations), nor the tf.GradientTape for gradient computation.
To make debugging easier, the template supports a --verify option, which allows comparing the forward pass and the three gradients you compute in the backward pass to correct values.
Finally, it is a good idea to read the TensorFlow guide about tensor slicing.
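When debugging the forward pass, it can help to have a slow but obviously correct reference. The sketch below is such a reference in plain Python for a single input and output channel with valid padding; it deliberately uses explicit loops, so it is not the matrix-multiplication implementation the assignment asks for, and it omits the bias, multiple channels, and ReLU. The 28×28 input size in the second part is an assumption for illustration.

```python
def conv_output_size(size, kernel_size, stride):
    """Spatial output size of a convolution with valid padding."""
    return (size - kernel_size) // stride + 1


def conv2d_valid(image, kernel, stride):
    """Naive single-channel cross-correlation; image and kernel are lists of rows."""
    k = len(kernel)
    out_h = conv_output_size(len(image), k, stride)
    out_w = conv_output_size(len(image[0]), k, stride)
    output = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            for i in range(k):
                for j in range(k):
                    output[y][x] += image[y * stride + i][x * stride + j] * kernel[i][j]
    return output


# A 4x4 image with values 0..15 and a 2x2 kernel of ones, stride 2.
image = [[4 * r + c for c in range(4)] for r in range(4)]
print(conv2d_valid(image, [[1, 1], [1, 1]], stride=2))

# Output shapes of the layers in --cnn=5-3-2,10-3-2, assuming a 28x28 input.
size = 28
for spec in "5-3-2,10-3-2".split(","):
    filters, kernel_size, stride = map(int, spec.split("-"))
    size = conv_output_size(size, kernel_size, stride)
    print(spec, "->", (size, size, filters))
```

The same index arithmetic, y * stride + i and x * stride + j, determines which input positions receive gradient from each output position in the backward pass.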
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 cnn_manual.py --epochs=1 --cnn=5-1-1
Dev accuracy after epoch 1 is 91.16
Test accuracy after epoch 1 is 89.63
python3 cnn_manual.py --epochs=1 --cnn=5-3-1
Dev accuracy after epoch 1 is 94.10
Test accuracy after epoch 1 is 92.86
python3 cnn_manual.py --epochs=1 --cnn=5-3-2
Dev accuracy after epoch 1 is 92.86
Test accuracy after epoch 1 is 91.00
python3 cnn_manual.py --epochs=1 --cnn=5-3-2,10-3-2
Dev accuracy after epoch 1 is 92.74
Test accuracy after epoch 1 is 90.91
Deadline: Mar 27, 7:59 a.m. 4 points+5 bonus
The goal of this assignment is to use a pretrained model, for example the EfficientNetV2-B0, to achieve the best accuracy in CAGS classification.
The CAGS dataset consists of images of cats and dogs of size $224×224$, each classified in one of the 34 breeds and each containing a mask indicating the presence of the animal. To load the dataset, use the cags_dataset.py module. The dataset is stored in a TFRecord file and each element is encoded as a tf.train.Example, which is decoded using the CAGS.parse method.
To load the EfficientNetV2-B0, use the tf.keras.applications.efficientnet_v2.EfficientNetV2B0 class, which constructs a Keras model, downloading the weights automatically. However, you can use any model from tf.keras.applications in this assignment.
An example performing classification of given images is available in image_classification.py.
A note on finetuning: each tf.keras.layers.Layer has a mutable trainable property indicating whether its variables should be updated – however, after changing it, you need to call .compile again (or otherwise make sure the list of trainable variables for the optimizer is updated). Furthermore, the training argument passed to the invocation call decides whether the layer is executed in the training regime (neurons get dropped in dropout, batch normalization computes estimates on the batch) or in the inference regime. There is one exception though – if trainable == False on a batch normalization layer, it runs in the inference regime even when training == True.
The task is a competition. Everyone who submits a solution which achieves at least 93% test set accuracy will get 4 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions.
You may want to start with the cags_classification.py template which generates the test set annotation in the required format.
Deadline: Mar 27, 7:59 a.m. 4 points+5 bonus
The goal of this assignment is to use a pretrained model, for example the EfficientNetV2-B0, to achieve the best image segmentation IoU score on the CAGS dataset. The dataset and the EfficientNetV2-B0 are described in the cags_classification assignment. Nevertheless, you can again use any model from tf.keras.applications in this assignment.
A mask is evaluated using the intersection over union (IoU) metric, which is the intersection of the gold and predicted mask divided by their union, and the whole test set score is the average of its masks' IoU. A TensorFlow compatible metric is implemented by the class MaskIoUMetric of the cags_dataset.py module, which can also evaluate your predictions (either by running with --task=segmentation --evaluate=path arguments, or using its evaluate_segmentation_file method).
The task is a competition. Everyone who submits a solution which achieves at least 87% test set IoU gets 4 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions.
You may want to start with the cags_segmentation.py template, which generates the test set annotation in the required format – each mask should be encoded on a single line as a space separated sequence of integers indicating the length of alternating runs of zeros and ones.
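Both the IoU of a single mask and the run-length output format can be illustrated on a flattened binary mask. The sketch below is plain Python and independent of MaskIoUMetric and of the template; it assumes that the first run always counts zeros (so a mask starting with a one begins with a run of length 0), and the function names are illustrative.

```python
def mask_iou(gold, predicted):
    """Intersection over union of two flattened binary masks of equal length."""
    intersection = sum(1 for g, p in zip(gold, predicted) if g and p)
    union = sum(1 for g, p in zip(gold, predicted) if g or p)
    return intersection / union


def encode_runs(mask):
    """Encode a flattened binary mask as lengths of alternating runs of 0s and 1s."""
    runs, current, length = [], 0, 0
    for value in mask:
        if value == current:
            length += 1
        else:
            runs.append(length)
            current, length = value, 1
    runs.append(length)
    return " ".join(map(str, runs))


print(mask_iou([0, 1, 1, 0], [0, 1, 0, 1]))
print(encode_runs([0, 0, 1, 1, 1, 0]))
print(encode_runs([1, 1, 0]))
```

Note that mask_iou as written divides by zero when both masks are empty; how that case should be scored is not stated in the assignment text.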
Deadline: Apr 3, 7:59 a.m. 2 points
This is a preparatory assignment for svhn_competition. The goal is to implement several bounding box manipulation routines in the bboxes_utils.py module. Notably, you need to implement the following methods:
bboxes_to_fast_rcnn: convert given bounding boxes to a Fast R-CNN-like representation relative to the given anchors;
bboxes_from_fast_rcnn: convert Fast R-CNN-like representations relative to given anchors back to bounding boxes;
bboxes_training: given a list of anchors and gold objects, assign gold objects to anchors and generate suitable training data (the exact algorithm is described in the template).
The bboxes_utils.py contains simple unit tests, which are evaluated when executing the module and which you can use to check the validity of your implementation. Note that the template does not contain type annotations because the Python typing system is not flexible enough to describe the tensor shape changes.
When submitting to ReCodEx, the method main is executed, returning the implemented bboxes_to_fast_rcnn, bboxes_from_fast_rcnn and bboxes_training methods. These methods are then executed and compared to the reference implementation.
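As an illustration of the Fast R-CNN-like representation, the following plain-Python sketch converts a single box to and from offsets relative to a single anchor. It assumes boxes are given as (top, left, bottom, right) and that the representation is (center y offset, center x offset, log height ratio, log width ratio), with the center offsets normalized by the anchor size. The template is the authoritative description of the exact ordering and of the batched tensor interface, and the singular function names here are hypothetical.

```python
import math


def bbox_to_fast_rcnn(anchor, bbox):
    """Encode `bbox` relative to `anchor`; both are (top, left, bottom, right)."""
    a_top, a_left, a_bottom, a_right = anchor
    top, left, bottom, right = bbox
    a_height, a_width = a_bottom - a_top, a_right - a_left
    return (
        ((top + bottom) / 2 - (a_top + a_bottom) / 2) / a_height,
        ((left + right) / 2 - (a_left + a_right) / 2) / a_width,
        math.log((bottom - top) / a_height),
        math.log((right - left) / a_width),
    )


def bbox_from_fast_rcnn(anchor, encoded):
    """Invert `bbox_to_fast_rcnn` for the same anchor."""
    a_top, a_left, a_bottom, a_right = anchor
    a_height, a_width = a_bottom - a_top, a_right - a_left
    dy, dx, dh, dw = encoded
    center_y = dy * a_height + (a_top + a_bottom) / 2
    center_x = dx * a_width + (a_left + a_right) / 2
    height, width = math.exp(dh) * a_height, math.exp(dw) * a_width
    return (center_y - height / 2, center_x - width / 2,
            center_y + height / 2, center_x + width / 2)


anchor, bbox = (0, 0, 10, 10), (5, 5, 15, 25)
encoded = bbox_to_fast_rcnn(anchor, bbox)
print(encoded)
print(bbox_from_fast_rcnn(anchor, encoded))
```

A box identical to its anchor encodes to four zeros, which is a convenient sanity check.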
Deadline: Apr 3, 7:59 a.m. 5 points+5 bonus
The goal of this assignment is to implement a system performing object recognition, optionally utilizing the pretrained EfficientNetV2-B0 backbone (or any other model from tf.keras.applications).
The Street View House Numbers (SVHN) dataset annotates for every photo all digits appearing on it, including their bounding boxes. The dataset can be loaded using the svhn_dataset.py module. Similarly to the CAGS dataset, it is stored in a TFRecord file with tf.train.Example elements. Every element is a dictionary with the following keys:
"image": a square 3-channel image stored using tf.uint8,
"classes": a 1D tensor with all digit labels appearing in the image,
"bboxes": a [num_digits, 4] 2D tensor with bounding boxes of every digit in the image.
Given that the dataset elements are each of possibly different size and you want to preprocess them using bboxes_training, it might be more comfortable to convert the dataset to NumPy. Alternatively, you can implement bboxes_training using TensorFlow operations or call the NumPy implementation of bboxes_training directly in tf.data.Dataset.map by using tf.numpy_function, see FAQ.
Each test set image annotation consists of a sequence of space separated five-tuples label top left bottom right, and the annotation is considered correct if exactly the gold digits are predicted, each with IoU at least 0.5. The whole test set score is then the prediction accuracy of individual images. You can again evaluate your predictions using the svhn_dataset.py module, either by running with --evaluate=path arguments, or using its evaluate_file method.
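The IoU threshold in the evaluation rule refers to the overlap of two bounding boxes. A minimal plain-Python sketch of that quantity, assuming boxes in the (top, left, bottom, right) order used by the annotation format, is given below; it is not the code used by svhn_dataset.py, and the function name is illustrative.

```python
def bbox_iou(a, b):
    """Intersection over union of two boxes given as (top, left, bottom, right)."""
    top, left = max(a[0], b[0]), max(a[1], b[1])
    bottom, right = min(a[2], b[2]), min(a[3], b[3])
    intersection = max(0, bottom - top) * max(0, right - left)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


gold = (0, 0, 10, 10)
print(bbox_iou(gold, (0, 2, 10, 12)))  # small horizontal shift
print(bbox_iou(gold, (0, 5, 10, 15)))  # shift by half of the width
```

Shifting a box by half of its width already drops the IoU to one third, below the 0.5 threshold, which shows how precise the predicted boxes need to be.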
The task is a competition. Everyone who submits a solution which achieves at least 20% test set accuracy gets 5 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. Note that I usually need at least 35% development set accuracy to achieve the required test set performance.
You should start with the svhn_competition.py template, which generates the test set annotation in the required format.
A baseline solution can use RetinaNet-like single stage detector, using only a single level of convolutional features (no FPN) with single-scale and single-aspect anchors. Focal loss is available as tf.losses.BinaryFocalCrossentropy and non-maximum suppression as tf.image.non_max_suppression or tf.image.combined_non_max_suppression.
Deadline: Apr 11, 7:59 a.m. 2 points
The goal of this assignment is to introduce recurrent neural networks, show their convergence speed, and illustrate the exploding gradient issue. The network should process sequences of 50 small integers and compute parity for each prefix of the sequence. The inputs are either 0/1, or vectors with a one-hot representation of a small integer.
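The target the network has to produce can be written down in one line. The sketch below assumes that "parity of a prefix" means the sum of the prefix modulo 2; the function name is illustrative and the template may generate its data differently.

```python
from itertools import accumulate


def prefix_parities(sequence):
    """Parity (sum modulo 2) of every prefix of a sequence of small integers."""
    return [total % 2 for total in accumulate(sequence)]


print(prefix_parities([1, 0, 1, 1]))
print(prefix_parities([3, 2, 5]))
```

Each output depends on every earlier input, so the network has to carry one bit of state across the whole sequence, which makes the task a useful probe of how well a recurrent cell propagates information and gradients over time.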
Your goal is to modify the sequence_classification.py template and implement the following:
Use the specified RNN type (SimpleRNN, GRU, and LSTM) and dimensionality.
In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard (or online here). Concentrate on how the RNNs converge, the convergence speed, exploding gradient issues, and how gradient clipping helps:
--rnn=SimpleRNN --sequence_dim=1, --rnn=GRU --sequence_dim=1, --rnn=LSTM --sequence_dim=1
--sequence_dim=3
--sequence_dim=10
--rnn=SimpleRNN --hidden_layer=85 --rnn_dim=30 --sequence_dim=30 and the same with --clip_gradient=1
--rnn=GRU with and without --clip_gradient=1
--rnn=LSTM with and without --clip_gradient=1
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=SimpleRNN --epochs=5
Epoch 1/5 loss: 0.6996 - accuracy: 0.4572 - val_loss: 0.6958 - val_accuracy: 0.4485
Epoch 2/5 loss: 0.6931 - accuracy: 0.5275 - val_loss: 0.6930 - val_accuracy: 0.5257
Epoch 3/5 loss: 0.6913 - accuracy: 0.5480 - val_loss: 0.6914 - val_accuracy: 0.5398
Epoch 4/5 loss: 0.6901 - accuracy: 0.5479 - val_loss: 0.6901 - val_accuracy: 0.5523
Epoch 5/5 loss: 0.6887 - accuracy: 0.5493 - val_loss: 0.6886 - val_accuracy: 0.5540
python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=GRU --epochs=5
Epoch 1/5 loss: 0.6942 - accuracy: 0.4766 - val_loss: 0.6934 - val_accuracy: 0.4635
Epoch 2/5 loss: 0.6930 - accuracy: 0.5046 - val_loss: 0.6927 - val_accuracy: 0.5278
Epoch 3/5 loss: 0.6924 - accuracy: 0.5338 - val_loss: 0.6922 - val_accuracy: 0.5331
Epoch 4/5 loss: 0.6921 - accuracy: 0.5307 - val_loss: 0.6918 - val_accuracy: 0.5343
Epoch 5/5 loss: 0.6917 - accuracy: 0.5310 - val_loss: 0.6914 - val_accuracy: 0.5217
python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=LSTM --epochs=5
Epoch 1/5 loss: 0.6935 - accuracy: 0.4816 - val_loss: 0.6934 - val_accuracy: 0.4615
Epoch 2/5 loss: 0.6931 - accuracy: 0.4979 - val_loss: 0.6931 - val_accuracy: 0.5250
Epoch 3/5 loss: 0.6929 - accuracy: 0.5264 - val_loss: 0.6929 - val_accuracy: 0.5275
Epoch 4/5 loss: 0.6928 - accuracy: 0.5321 - val_loss: 0.6927 - val_accuracy: 0.5340
Epoch 5/5 loss: 0.6925 - accuracy: 0.5420 - val_loss: 0.6925 - val_accuracy: 0.5357
python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=LSTM --epochs=5 --hidden_layer=50
Epoch 1/5 loss: 0.6917 - accuracy: 0.5486 - val_loss: 0.6905 - val_accuracy: 0.5315
Epoch 2/5 loss: 0.6889 - accuracy: 0.5382 - val_loss: 0.6869 - val_accuracy: 0.5274
Epoch 3/5 loss: 0.6832 - accuracy: 0.5451 - val_loss: 0.6792 - val_accuracy: 0.5354
Epoch 4/5 loss: 0.6731 - accuracy: 0.5522 - val_loss: 0.6677 - val_accuracy: 0.5601
Epoch 5/5 loss: 0.6610 - accuracy: 0.5538 - val_loss: 0.6567 - val_accuracy: 0.5613
python3 sequence_classification.py --train_sequences=1000 --sequence_length=20 --rnn=LSTM --epochs=5 --hidden_layer=50 --clip_gradient=0.01
Epoch 1/5 loss: 0.6917 - accuracy: 0.5477 - val_loss: 0.6904 - val_accuracy: 0.5368
Epoch 2/5 loss: 0.6886 - accuracy: 0.5447 - val_loss: 0.6865 - val_accuracy: 0.5347
Epoch 3/5 loss: 0.6828 - accuracy: 0.5476 - val_loss: 0.6786 - val_accuracy: 0.5512
Epoch 4/5 loss: 0.6730 - accuracy: 0.5559 - val_loss: 0.6680 - val_accuracy: 0.5779
Epoch 5/5 loss: 0.6618 - accuracy: 0.5541 - val_loss: 0.6584 - val_accuracy: 0.5344
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 sequence_classification.py --rnn=SimpleRNN --epochs=5
Epoch 1/5 loss: 0.6925 - accuracy: 0.5138 - val_loss: 0.6890 - val_accuracy: 0.5187
Epoch 2/5 loss: 0.6844 - accuracy: 0.5200 - val_loss: 0.6792 - val_accuracy: 0.5126
Epoch 3/5 loss: 0.6758 - accuracy: 0.5174 - val_loss: 0.6722 - val_accuracy: 0.5169
Epoch 4/5 loss: 0.6705 - accuracy: 0.5146 - val_loss: 0.6681 - val_accuracy: 0.5127
Epoch 5/5 loss: 0.6668 - accuracy: 0.5145 - val_loss: 0.6643 - val_accuracy: 0.5201
python3 sequence_classification.py --rnn=GRU --epochs=5
Epoch 1/5 loss: 0.6928 - accuracy: 0.5090 - val_loss: 0.6918 - val_accuracy: 0.5160
Epoch 2/5 loss: 0.6868 - accuracy: 0.5160 - val_loss: 0.6762 - val_accuracy: 0.5314
Epoch 3/5 loss: 0.3761 - accuracy: 0.8131 - val_loss: 0.0623 - val_accuracy: 1.0000
Epoch 4/5 loss: 0.0344 - accuracy: 0.9993 - val_loss: 0.0194 - val_accuracy: 0.9996
Epoch 5/5 loss: 0.0137 - accuracy: 0.9997 - val_loss: 0.0085 - val_accuracy: 1.0000
python3 sequence_classification.py --rnn=LSTM --epochs=5
Epoch 1/5 loss: 0.6931 - accuracy: 0.5063 - val_loss: 0.6929 - val_accuracy: 0.5135
Epoch 2/5 loss: 0.6921 - accuracy: 0.5148 - val_loss: 0.6900 - val_accuracy: 0.5137
Epoch 3/5 loss: 0.5484 - accuracy: 0.6868 - val_loss: 0.1687 - val_accuracy: 0.9983
Epoch 4/5 loss: 0.0766 - accuracy: 0.9998 - val_loss: 0.0338 - val_accuracy: 1.0000
Epoch 5/5 loss: 0.0215 - accuracy: 1.0000 - val_loss: 0.0137 - val_accuracy: 1.0000
python3 sequence_classification.py --rnn=LSTM --epochs=5 --hidden_layer=50
Epoch 1/5 loss: 0.6829 - accuracy: 0.5160 - val_loss: 0.6601 - val_accuracy: 0.5166
Epoch 2/5 loss: 0.6447 - accuracy: 0.5398 - val_loss: 0.6310 - val_accuracy: 0.5274
Epoch 3/5 loss: 0.6226 - accuracy: 0.5545 - val_loss: 0.6108 - val_accuracy: 0.5522
Epoch 4/5 loss: 0.5895 - accuracy: 0.5859 - val_loss: 0.5529 - val_accuracy: 0.6215
Epoch 5/5 loss: 0.4641 - accuracy: 0.7130 - val_loss: 0.3106 - val_accuracy: 0.8517
python3 sequence_classification.py --rnn=LSTM --epochs=5 --hidden_layer=50 --clip_gradient=1
Epoch 1/5 loss: 0.6829 - accuracy: 0.5160 - val_loss: 0.6601 - val_accuracy: 0.5166
Epoch 2/5 loss: 0.6447 - accuracy: 0.5398 - val_loss: 0.6310 - val_accuracy: 0.5274
Epoch 3/5 loss: 0.6226 - accuracy: 0.5545 - val_loss: 0.6108 - val_accuracy: 0.5522
Epoch 4/5 loss: 0.5892 - accuracy: 0.5859 - val_loss: 0.5516 - val_accuracy: 0.6238
Epoch 5/5 loss: 0.4073 - accuracy: 0.7596 - val_loss: 0.1919 - val_accuracy: 0.9277
Deadline: Apr 11, 7:59 a.m. 3 points
In this assignment you will create a simple part-of-speech tagger. For training and evaluation, we will use a Czech dataset containing tokenized sentences, each word annotated by gold lemma and part-of-speech tag. The morpho_dataset.py module (down)loads the dataset and provides mappings between strings and integers.
Your goal is to modify the tagger_we.py template and implement the following:
Use the specified RNN type (GRU and LSTM) and dimensionality.
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 tagger_we.py --epochs=1 --max_sentences=1000 --rnn=LSTM --rnn_dim=16
loss: 2.3505 - accuracy: 0.2911 - val_loss: 1.9399 - val_accuracy: 0.4305
python3 tagger_we.py --epochs=1 --max_sentences=1000 --rnn=GRU --rnn_dim=16
loss: 2.1355 - accuracy: 0.4300 - val_loss: 1.4387 - val_accuracy: 0.5663
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 tagger_we.py --epochs=3 --max_sentences=5000 --rnn=LSTM --rnn_dim=64
Epoch 1/3 loss: 0.9607 - accuracy: 0.7078 - val_loss: 0.3827 - val_accuracy: 0.8756
Epoch 2/3 loss: 0.1007 - accuracy: 0.9725 - val_loss: 0.2948 - val_accuracy: 0.8972
Epoch 3/3 loss: 0.0256 - accuracy: 0.9931 - val_loss: 0.2844 - val_accuracy: 0.9024
python3 tagger_we.py --epochs=3 --max_sentences=5000 --rnn=GRU --rnn_dim=64
Epoch 1/3 loss: 0.7540 - accuracy: 0.7717 - val_loss: 0.3682 - val_accuracy: 0.8712
Epoch 2/3 loss: 0.0726 - accuracy: 0.9797 - val_loss: 0.3989 - val_accuracy: 0.8639
Epoch 3/3 loss: 0.0236 - accuracy: 0.9926 - val_loss: 0.3725 - val_accuracy: 0.8772
Deadline: Apr 11, 7:59 a.m. 3 points
This assignment is a continuation of tagger_we. Using the tagger_cle.py template, implement character-level word embedding computation using a bidirectional character-level GRU.
Once submitted to ReCodEx, you should experiment with the effect of CLEs compared to a plain tagger_we, and the influence of their dimensionality. Note that tagger_cle has by default smaller word embeddings so that the size of word representation (64 + 32 + 32) is the same as in the tagger_we assignment.
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 tagger_cle.py --epochs=1 --max_sentences=1000 --rnn=LSTM --rnn_dim=16 --cle_dim=16
loss: 2.2389 - accuracy: 0.3011 - val_loss: 1.7624 - val_accuracy: 0.4600
python3 tagger_cle.py --epochs=1 --max_sentences=1000 --rnn=LSTM --rnn_dim=16 --cle_dim=16 --word_masking=0.1
loss: 2.2506 - accuracy: 0.2967 - val_loss: 1.7892 - val_accuracy: 0.4606
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 tagger_cle.py --epochs=3 --max_sentences=5000 --rnn=LSTM --rnn_dim=32 --cle_dim=32
Epoch 1/3 loss: 0.9835 - accuracy: 0.7045 - val_loss: 0.3117 - val_accuracy: 0.9046
Epoch 2/3 loss: 0.1116 - accuracy: 0.9740 - val_loss: 0.1840 - val_accuracy: 0.9358
Epoch 3/3 loss: 0.0369 - accuracy: 0.9906 - val_loss: 0.1672 - val_accuracy: 0.9394
python3 tagger_cle.py --epochs=3 --max_sentences=5000 --rnn=LSTM --rnn_dim=32 --cle_dim=32 --word_masking=0.1
Epoch 1/3 loss: 1.0664 - accuracy: 0.6762 - val_loss: 0.3462 - val_accuracy: 0.8997
Epoch 2/3 loss: 0.1977 - accuracy: 0.9475 - val_loss: 0.1834 - val_accuracy: 0.9461
Epoch 3/3 loss: 0.1009 - accuracy: 0.9711 - val_loss: 0.1619 - val_accuracy: 0.9504
Deadline: Apr 11, 7:59 a.m. 4 points+5 bonus
There is a bug in GPU implementation of masked LSTM and GRU when using cuDNN
8.1 or newer, causing silent corruption of results. The easiest way around it is
to (a) either use LSTM and GRU on non-masked inputs (plain rectangular tensors),
or (b) use non-zero recurrent_dropout
, which prevents using the cuDNN.
In this assignment, you should extend tagger_cle into a real-world Czech
part-of-speech tagger. We will use the Czech PDT dataset loadable using the
morpho_dataset.py module. Note that the dataset contains more than 1500 unique
POS tags and that the POS tags have a fixed structure of 15 positions (so it is
possible to generate the POS tag characters independently).
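The fixed 15-position structure mentioned above can be illustrated with a minimal sketch. The example tag string and the helper names `split_tag` and `join_tag` are assumptions made for illustration; they are not part of morpho_dataset.py.

```python
# Illustrative sketch: treat a positional POS tag as 15 independent characters.
# The example tag and the helper names are assumptions, not taken from the dataset module.
TAG_LENGTH = 15


def split_tag(tag):
    """Split a positional tag into one character per position."""
    if len(tag) != TAG_LENGTH:
        raise ValueError(f"expected {TAG_LENGTH} positions, got {len(tag)}")
    return list(tag)


def join_tag(position_characters):
    """Reassemble independently predicted characters into a full tag."""
    if len(position_characters) != TAG_LENGTH:
        raise ValueError(f"expected {TAG_LENGTH} positions, got {len(position_characters)}")
    return "".join(position_characters)


example_tag = "NNFS1-----A----"  # assumed example of a 15-position tag
positions = split_tag(example_tag)
reassembled = join_tag(positions)
```

With such a decomposition, a model could use one small classifier per position instead of a single classifier over all unique tags; whether that helps is something to verify experimentally.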
You can use the following additional data in this assignment:
The task is a competition. Everyone who submits a solution with at least 92.5% label accuracy gets 4 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. Lastly, 3 bonus points will be given to anyone surpassing the pre-neural-network state of the art of 96.35%.
You can start with the tagger_competition.py template, which among other things
generates test set annotations in the required format. Note that you can
evaluate the predictions as usual using the morpho_dataset.py module, either by
running with --task=tagger --evaluate=path arguments, or using its
evaluate_file method.
You can try exploring the TensorBoard Projector with pre-trained embeddings
for the 20k most frequent lemmas in Czech and English. After extracting the
archive, start
tensorboard --logdir dir_where_the_archive_is_extracted
In order to use the Projector tab yourself, you can take inspiration from the projector_export.py script, which was used to export the above pre-trained embeddings from the Word2vec format.
Deadline: Apr 17, 7:59 a.m. 2 points
This assignment is an extension of the tagger_we task. Using the tagger_crf.py
template, implement named entity recognition using CRF loss and CRF decoding
from the tensorflow_addons package.
The evaluation is performed using the provided metric, which computes the F1 score of the span prediction (i.e., a recognized, possibly multiword named entity counts as a true positive if both the entity type and the span exactly match).
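The exact-match criterion can be made concrete with a small sketch. It assumes that entities have already been decoded into (type, start, end) tuples for a single sentence; that representation and the function name `span_f1` are illustrative assumptions, and the provided metric may differ in details.

```python
# Sketch of span-level F1 under exact matching of entity type and span.
# The (type, start, end) tuple representation is an assumption for illustration.
def span_f1(gold_spans, predicted_spans):
    gold, predicted = set(gold_spans), set(predicted_spans)
    true_positives = len(gold & predicted)  # type and both boundaries must agree
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = [("PER", 0, 2), ("LOC", 5, 6)]
predicted = [("PER", 0, 2), ("LOC", 5, 7), ("ORG", 9, 10)]
score = span_f1(gold, predicted)  # one exact match out of 3 predictions and 2 gold spans
```

Note that the LOC prediction overlaps the gold span but gets no credit, because its end boundary differs.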
In practice, character-level embeddings (and also pre-trained word embeddings) would be used to obtain superior results.
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 tagger_crf.py --epochs=2 --max_sentences=1000 --rnn=LSTM --rnn_dim=24
Epoch 1/2 loss: 30.0429 - val_loss: 19.9425 - val_f1: 0.0000e+00
Epoch 2/2 loss: 16.9279 - val_loss: 17.7281 - val_f1: 0.0039
python3 tagger_crf.py --epochs=2 --max_sentences=1000 --rnn=GRU --rnn_dim=24
Epoch 1/2 loss: 29.0089 - val_loss: 19.2492 - val_f1: 0.0000e+00
Epoch 2/2 loss: 15.3984 - val_loss: 17.9794 - val_f1: 0.0811
python3 tagger_crf.py --epochs=5 --max_sentences=5000 --rnn=LSTM --rnn_dim=64
Epoch 1/5 loss: 17.2315 - val_loss: 13.4043 - val_f1: 0.1145
Epoch 2/5 loss: 8.7492 - val_loss: 10.6630 - val_f1: 0.3155
Epoch 3/5 loss: 4.5754 - val_loss: 9.7337 - val_f1: 0.4022
Epoch 4/5 loss: 2.1429 - val_loss: 10.1171 - val_f1: 0.4463
Epoch 5/5 loss: 1.1066 - val_loss: 10.5541 - val_f1: 0.4553
python3 tagger_crf.py --epochs=5 --max_sentences=5000 --rnn=GRU --rnn_dim=64
Epoch 1/5 loss: 16.4778 - val_loss: 12.8231 - val_f1: 0.2221
Epoch 2/5 loss: 7.2317 - val_loss: 9.5449 - val_f1: 0.4297
Epoch 3/5 loss: 2.8447 - val_loss: 10.2954 - val_f1: 0.4776
Epoch 4/5 loss: 1.1184 - val_loss: 11.5283 - val_f1: 0.4702
Epoch 5/5 loss: 0.5509 - val_loss: 11.1679 - val_f1: 0.4822
Deadline: Apr 17, 7:59 a.m. 2 points
This assignment is an extension of tagger_crf, where we will perform the CRF
loss computation (but not the CRF decoding) manually.
The tagger_crf_manual.py template is nearly identical to tagger_crf; the only
difference is the crf_loss method, where you should manually implement the CRF
loss.
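As a rough orientation, the following pure-Python sketch computes the negative log-likelihood of a linear-chain CRF for one sequence using the forward algorithm. It assumes per-step unary scores plus a tag-to-tag transition matrix, with no separate start or end transitions; the function names and this parametrization are assumptions, and the template's crf_loss operates on batched tensors instead.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    maximum = max(values)
    return maximum + math.log(sum(math.exp(v - maximum) for v in values))


def sequence_score(unary, transitions, tags):
    """Unnormalized score of one tag sequence."""
    score = unary[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + unary[t][tags[t]]
    return score


def log_partition(unary, transitions):
    """Log of the summed exponentiated scores of all tag sequences (forward algorithm)."""
    num_tags = len(unary[0])
    alpha = list(unary[0])
    for t in range(1, len(unary)):
        alpha = [
            log_sum_exp([alpha[i] + transitions[i][j] for i in range(num_tags)]) + unary[t][j]
            for j in range(num_tags)
        ]
    return log_sum_exp(alpha)


def crf_loss(unary, transitions, gold_tags):
    """Negative log-likelihood of the gold tag sequence."""
    return log_partition(unary, transitions) - sequence_score(unary, transitions, gold_tags)


unary = [[1.0, 0.5], [0.2, 1.5], [0.3, 0.1]]  # 3 time steps, 2 tags
transitions = [[0.5, -0.5], [-1.0, 1.0]]      # transitions[previous][current]
loss = crf_loss(unary, transitions, [0, 1, 1])
```

A useful sanity check is that exp(-loss) summed over all possible tag sequences equals 1.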
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 tagger_crf_manual.py --epochs=2 --max_sentences=1000 --rnn=LSTM --rnn_dim=24
Epoch 1/2 loss: 30.0429 - val_loss: 19.9425 - val_f1: 0.0000e+00
Epoch 2/2 loss: 16.9279 - val_loss: 17.7281 - val_f1: 0.0039
python3 tagger_crf_manual.py --epochs=2 --max_sentences=1000 --rnn=GRU --rnn_dim=24
Epoch 1/2 loss: 29.0089 - val_loss: 19.2492 - val_f1: 0.0000e+00
Epoch 2/2 loss: 15.3984 - val_loss: 17.9794 - val_f1: 0.0811
python3 tagger_crf_manual.py --epochs=5 --max_sentences=5000 --rnn=LSTM --rnn_dim=64
Epoch 1/5 loss: 17.2315 - val_loss: 13.4043 - val_f1: 0.1145
Epoch 2/5 loss: 8.7492 - val_loss: 10.6630 - val_f1: 0.3155
Epoch 3/5 loss: 4.5754 - val_loss: 9.7337 - val_f1: 0.4022
Epoch 4/5 loss: 2.1429 - val_loss: 10.1171 - val_f1: 0.4463
Epoch 5/5 loss: 1.1066 - val_loss: 10.5541 - val_f1: 0.4553
python3 tagger_crf_manual.py --epochs=5 --max_sentences=5000 --rnn=GRU --rnn_dim=64
Epoch 1/5 loss: 16.4778 - val_loss: 12.8231 - val_f1: 0.2221
Epoch 2/5 loss: 7.2317 - val_loss: 9.5449 - val_f1: 0.4297
Epoch 3/5 loss: 2.8447 - val_loss: 10.2954 - val_f1: 0.4776
Epoch 4/5 loss: 1.1184 - val_loss: 11.5283 - val_f1: 0.4702
Epoch 5/5 loss: 0.5509 - val_loss: 11.1679 - val_f1: 0.4822
Deadline: Apr 17, 7:59 a.m. 5 points+5 bonus
This assignment is a competition task in the area of speech recognition.
Specifically, your goal is to predict a sequence of letters given a spoken
utterance. We will be using Czech recordings from Common Voice,
with input sound waves passed through the usual preprocessing, namely computing
Mel-frequency cepstral coefficients (MFCCs).
You can repeat this preprocessing on a given audio using the wav_decode and
mfcc_extract methods from the common_voice_cs.py module.
This module can also load the dataset, downloading it when necessary (note that
its size is 200MB, so it might take a while). Furthermore, you can listen to the
development portion of the dataset.
Lastly, the whole dataset is available for download in MP3 format
(you are not expected to download it unless you would like to perform some
custom preprocessing).
The following additional data can be used in this assignment:
The task is a competition.
The evaluation is performed by computing the edit distance to the gold letter
sequence, normalized by its length (a corresponding Keras metric
EditDistanceMetric is provided by common_voice_cs.py).
Everyone who submits a solution with at most 50% test set edit distance
gets 5 points; the remaining 5 points will be distributed
depending on the relative ordering of your solutions. Note that
you can evaluate the predictions as usual using the common_voice_cs.py
module, either by running with the --evaluate=path argument, or using its
evaluate_file method.
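To make the metric concrete, here is a minimal sketch of the Levenshtein edit distance normalized by the gold sequence length, for a single utterance. How the provided EditDistanceMetric aggregates over utterances is not shown here; the function names are illustrative assumptions.

```python
def edit_distance(source, target):
    """Levenshtein distance using a rolling row of the dynamic-programming table."""
    previous = list(range(len(target) + 1))
    for i, source_item in enumerate(source, start=1):
        current = [i]
        for j, target_item in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,                                # deletion
                current[j - 1] + 1,                             # insertion
                previous[j - 1] + (source_item != target_item)  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(predicted, gold):
    """Edit distance divided by the gold length (assumes a non-empty gold sequence)."""
    if len(gold) == 0:
        raise ValueError("gold sequence must not be empty")
    return edit_distance(predicted, gold) / len(gold)


value = normalized_edit_distance("ahoj svete", "ahoj světe")  # one substitution in 10 letters
```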
Start with the speech_recognition.py template which contains instructions for using the CTC loss and generates the test set annotation in the required format.
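For intuition about how frame-level CTC outputs turn into a letter sequence, the sketch below shows greedy CTC decoding: take the most likely label per frame, merge consecutive repeats, and drop blanks. The blank index used here is an assumption; follow the template's instructions for the actual convention.

```python
def ctc_greedy_decode(frame_labels, blank):
    """Collapse repeated labels and remove blanks from per-frame argmax labels."""
    decoded = []
    previous = None
    for label in frame_labels:
        # A label is emitted only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Blank index 0 is an assumption made for this example.
decoded = ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0], blank=0)
```

The blank between the two runs of label 1 is what allows the same letter to be emitted twice in a row.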
Deadline: Apr 24, 7:59 a.m. 3 points+4 bonus
Your goal in this assignment is to perform 3D object recognition. The input is voxelized representation of an object, stored as a 3D grid of either empty or occupied voxels, and your goal is to classify the object into one of 10 classes. The data is available in two resolutions, either as 20×20×20 data or 32×32×32 data. To load the dataset, use the modelnet.py module.
The official dataset offers only train and test sets, with the test set having a different distribution of labels. Our dataset also contains a development set, which has nearly the same label distribution as the test set.
If you want, it is possible to use any model from tf.keras.applications in
this assignment; however, the only way I know how to utilize such a pre-trained
model is to render the objects to a sequence of 2D images and classify them
instead.
The task is a competition. Everyone who submits a solution which achieves at least 88% test set accuracy gets 3 points; the remaining 4 points will be distributed depending on the relative ordering of your solutions.
You can start with the 3d_recognition.py template, which among other things generates test set annotations in the required format.
Deadline: May 2, 7:59 a.m. 3 points+5 bonus
Tackle handwritten optical music recognition in this assignment. The inputs are grayscale images of monophonic scores starting with a clef, key signature, and a time signature, followed by several staves. The dataset is loadable using the homr_dataset.py module, and is downloaded automatically if missing (note that its size is ~500MB, so it might take a while). No other data or pretrained models are allowed for training.
The task is a competition.
The evaluation is performed using the same metric as in speech_recognition, by
computing edit distance to the gold sequence, normalized by its length (the
EditDistanceMetric is again provided by homr_dataset.py).
Everyone who submits a solution with at most
3% test set edit distance will get 3 points; the remaining 5 points will be
distributed depending on the relative ordering of your solutions.
You can evaluate the predictions as usual using the homr_dataset.py
module, either by running with the --evaluate=path argument, or using its
evaluate_file method.
You can start with the homr_competition.py template, which among other things generates test set annotations in the required format.
Deadline: May 2, 7:59 a.m. 3 points
The goal of this assignment is to create a simple lemmatizer. For training
and evaluation, we use the same dataset as in tagger_we, loadable by the
updated morpho_dataset.py module.
Your goal is to modify the lemmatizer_noattn.py template and implement the following:
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 lemmatizer_noattn.py --epochs=1 --max_sentences=500 --batch_size=2 --cle_dim=32 --rnn_dim=32
loss: 3.0551 - val_accuracy: 0.1196 - 16s/epoch - 64ms/step
python3 lemmatizer_noattn.py --epochs=1 --max_sentences=500 --batch_size=2 --cle_dim=32 --rnn_dim=32 --tie_embeddings
loss: 2.8971 - val_accuracy: 0.1409 - 15s/epoch - 61ms/step
python3 lemmatizer_noattn.py --epochs=3 --max_sentences=5000
Epoch 1/3 loss: 2.2070 - val_accuracy: 0.3395 - 38s/epoch - 77ms/step
Epoch 2/3 loss: 0.9121 - val_accuracy: 0.4963 - 30s/epoch - 59ms/step
Epoch 3/3 loss: 0.5123 - val_accuracy: 0.6151 - 30s/epoch - 61ms/step
python3 lemmatizer_noattn.py --epochs=3 --max_sentences=5000 --tie_embeddings
Epoch 1/3 loss: 1.8830 - val_accuracy: 0.3853 - 42s/epoch - 84ms/step
Epoch 2/3 loss: 0.7513 - val_accuracy: 0.5403 - 29s/epoch - 59ms/step
Epoch 3/3 loss: 0.4643 - val_accuracy: 0.6319 - 33s/epoch - 66ms/step
Deadline: May 2, 7:59 a.m. 3 points
This task is a continuation of the lemmatizer_noattn assignment. Using the
lemmatizer_attn.py template, implement the following features in addition to
lemmatizer_noattn:
Once submitted to ReCodEx, you should experiment with the effect of using the attention, and the influence of RNN dimensionality on network performance.
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 lemmatizer_attn.py --epochs=1 --max_sentences=500 --batch_size=2 --cle_dim=32 --rnn_dim=32
loss: 3.0485 - val_accuracy: 0.0794 - 22s/epoch - 89ms/step
python3 lemmatizer_attn.py --epochs=1 --max_sentences=500 --batch_size=2 --cle_dim=32 --rnn_dim=32 --tie_embeddings
loss: 2.8510 - val_accuracy: 0.1601 - 22s/epoch - 88ms/step
python3 lemmatizer_attn.py --epochs=3 --max_sentences=5000
Epoch 1/3 loss: 1.8517 - val_accuracy: 0.6009 - 62s/epoch - 125ms/step
Epoch 2/3 loss: 0.3652 - val_accuracy: 0.7027 - 48s/epoch - 95ms/step
Epoch 3/3 loss: 0.2269 - val_accuracy: 0.7944 - 48s/epoch - 96ms/step
python3 lemmatizer_attn.py --epochs=3 --max_sentences=5000 --tie_embeddings
Epoch 1/3 loss: 1.6228 - val_accuracy: 0.6352 - 61s/epoch - 122ms/step
Epoch 2/3 loss: 0.3158 - val_accuracy: 0.7529 - 48s/epoch - 96ms/step
Epoch 3/3 loss: 0.1949 - val_accuracy: 0.7913 - 48s/epoch - 96ms/step
Deadline: May 9, 7:59 a.m. 4 points+5 bonus
In this assignment, you should extend lemmatizer_noattn or lemmatizer_attn
into a real-world Czech lemmatizer. As in tagger_competition, we will use the
Czech PDT dataset loadable using the morpho_dataset.py module.
You can also use the same additional data as in the tagger_competition
assignment.
The task is a competition. Everyone who submits a solution with at least 96.5% label accuracy gets 4 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. Lastly, 3 bonus points will be given to anyone surpassing the pre-neural-network state of the art of 98.76%.
You can start with the lemmatizer_competition.py template, which among other
things generates test set annotations in the required format. Note that
you can evaluate the predictions as usual using the morpho_dataset.py
module, either by running with --task=lemmatizer --evaluate=path
arguments, or using its evaluate_file method.
Deadline: May 9, 7:59 a.m. 3 points
This assignment is a continuation of tagger_we. Using the
tagger_transformer.py template, implement a Pre-LN Transformer encoder.
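The "Pre-LN" part refers to where layer normalization sits relative to the residual connection. The sketch below contrasts the two orderings on a single vector; it omits the learned scale and bias of layer normalization, and `sublayer` is a placeholder for self-attention or the feed-forward network. All names are illustrative assumptions, not the template's API.

```python
import math


def layer_norm(vector, epsilon=1e-5):
    """Normalize to zero mean and unit variance (learned scale and bias omitted)."""
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(variance + epsilon) for v in vector]


def pre_ln_sublayer(vector, sublayer):
    """Pre-LN ordering: x + sublayer(layer_norm(x))."""
    transformed = sublayer(layer_norm(vector))
    return [x + y for x, y in zip(vector, transformed)]


def post_ln_sublayer(vector, sublayer):
    """Post-LN ordering, shown for comparison: layer_norm(x + sublayer(x))."""
    transformed = sublayer(vector)
    return layer_norm([x + y for x, y in zip(vector, transformed)])


def zero_sublayer(vector):
    """Placeholder sublayer that contributes nothing."""
    return [0.0] * len(vector)


inputs = [1.0, 2.0, 3.0]
pre_ln_output = pre_ln_sublayer(inputs, zero_sublayer)
post_ln_output = post_ln_sublayer(inputs, zero_sublayer)
```

In the Pre-LN ordering the residual path passes the input through untouched, which is visible here: with a sublayer that outputs zeros, the Pre-LN result equals the input, while the Post-LN result is normalized.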
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 tagger_transformer.py --epochs=1 --max_sentences=800 --transformer_layers=0
loss: 2.3873 - accuracy: 0.2446 - val_loss: 2.0423 - val_accuracy: 0.3588
python3 tagger_transformer.py --epochs=1 --max_sentences=800 --transformer_heads=1
loss: 2.0967 - accuracy: 0.3463 - val_loss: 1.8760 - val_accuracy: 0.4181
python3 tagger_transformer.py --epochs=1 --max_sentences=800 --transformer_heads=4
loss: 2.1210 - accuracy: 0.3376 - val_loss: 1.9558 - val_accuracy: 0.3937
python3 tagger_transformer.py --epochs=1 --max_sentences=800 --transformer_heads=4 --transformer_dropout=0.1
loss: 2.2215 - accuracy: 0.3050 - val_loss: 2.0125 - val_accuracy: 0.3264
python3 tagger_transformer.py --max_sentences=5000 --transformer_layers=0
Epoch 1/5 loss: 1.5015 - accuracy: 0.5407 - val_loss: 0.8692 - val_accuracy: 0.7163
Epoch 2/5 loss: 0.5383 - accuracy: 0.8481 - val_loss: 0.5703 - val_accuracy: 0.8240
Epoch 3/5 loss: 0.2542 - accuracy: 0.9589 - val_loss: 0.4611 - val_accuracy: 0.8308
Epoch 4/5 loss: 0.1315 - accuracy: 0.9795 - val_loss: 0.4289 - val_accuracy: 0.8330
Epoch 5/5 loss: 0.0791 - accuracy: 0.9852 - val_loss: 0.4171 - val_accuracy: 0.8348
python3 tagger_transformer.py --max_sentences=5000 --transformer_heads=1
Epoch 1/5 loss: 1.0808 - accuracy: 0.6464 - val_loss: 0.5847 - val_accuracy: 0.7878
Epoch 2/5 loss: 0.2452 - accuracy: 0.9226 - val_loss: 0.4712 - val_accuracy: 0.8389
Epoch 3/5 loss: 0.0752 - accuracy: 0.9779 - val_loss: 0.7052 - val_accuracy: 0.8136
Epoch 4/5 loss: 0.0432 - accuracy: 0.9860 - val_loss: 0.6045 - val_accuracy: 0.8314
Epoch 5/5 loss: 0.0324 - accuracy: 0.9888 - val_loss: 0.6385 - val_accuracy: 0.8323
python3 tagger_transformer.py --max_sentences=5000 --transformer_heads=4
Epoch 1/5 loss: 1.0461 - accuracy: 0.6636 - val_loss: 0.5026 - val_accuracy: 0.8155
Epoch 2/5 loss: 0.1966 - accuracy: 0.9391 - val_loss: 0.4557 - val_accuracy: 0.8386
Epoch 3/5 loss: 0.0712 - accuracy: 0.9777 - val_loss: 0.5322 - val_accuracy: 0.8262
Epoch 4/5 loss: 0.0424 - accuracy: 0.9858 - val_loss: 0.5099 - val_accuracy: 0.8474
Epoch 5/5 loss: 0.0309 - accuracy: 0.9891 - val_loss: 0.6569 - val_accuracy: 0.8404
python3 tagger_transformer.py --max_sentences=5000 --transformer_heads=4 --transformer_dropout=0.1
Epoch 1/5 loss: 1.1482 - accuracy: 0.6274 - val_loss: 0.5542 - val_accuracy: 0.7950
Epoch 2/5 loss: 0.2579 - accuracy: 0.9176 - val_loss: 0.4971 - val_accuracy: 0.8091
Epoch 3/5 loss: 0.0944 - accuracy: 0.9727 - val_loss: 0.5654 - val_accuracy: 0.8082
Epoch 4/5 loss: 0.0573 - accuracy: 0.9820 - val_loss: 0.5170 - val_accuracy: 0.8340
Epoch 5/5 loss: 0.0448 - accuracy: 0.9850 - val_loss: 0.5199 - val_accuracy: 0.8465
Deadline: May 9, 7:59 a.m. 2 points
Perform sentiment analysis on Czech Facebook data using a provided pre-trained
Czech Electra model eleczech-lc-small.
The dataset consists of pairs of (document, label) and can be (down)loaded using the
text_classification_dataset.py module. When loading the dataset, a tokenizer
might be provided, and if it is, the document is also passed through the
tokenizer and the resulting tokens are added to the dataset.
Even though this assignment is not a competition, your goal is to submit test
set annotations with at least 77% accuracy. As usual, you can evaluate your
predictions using the text_classification_dataset.py
module, either by running it with the --evaluate=path argument, or using its
evaluate_file method.
Note that contrary to working with EfficientNet, you need to finetune the Electra model in order to achieve the required accuracy.
You can start with the sentiment_analysis.py template, which among other things loads the Electra Czech model and generates test set annotations in the required format. Note that the bert_example.py module illustrates the usage of both the Electra tokenizer and the Electra model.
Deadline: May 15, 7:59 a.m. 4 points+5 bonus
Implement the best possible model for the reading comprehension task using
an automatically translated version of the SQuAD 1.1 dataset, utilizing a provided
Czech RoBERTa model ufal/robeczech-base.
The dataset can be loaded using the reading_comprehension_dataset.py
module. The loaded dataset is the direct representation of the data and not yet
ready to be directly trained on. Each of the train, dev and test datasets
is composed of a list of paragraphs, each consisting of:
- context: text with various information;
- qas: list of questions and answers, where each item consists of:
  - question: text of the question;
  - answers: a list of answers, each answer is composed of:
    - text: answer text as a string, exactly as appearing in the context;
    - start: character offset of the answer text in the context.
In the train and dev sets, each question has exactly one answer, while in
the test set there might be several answers. We evaluate the reading
comprehension task using accuracy, where an answer is considered correct if
its text is exactly equal to some correct answer. You can evaluate your
predictions as usual with the reading_comprehension_dataset.py
module, either by running with the --evaluate=path argument, or using its
evaluate_file method.
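The nested structure and the exact-match evaluation described above can be sketched as follows. The sketch assumes that paragraphs behave like dictionaries with the listed keys, and the paragraph content is made up for illustration; the real module's access pattern and its evaluate_file method may differ.

```python
# Made-up paragraph in the described structure; dict-style access is an assumption.
paragraphs = [
    {
        "context": "Prague is the capital of the Czech Republic.",
        "qas": [
            {
                "question": "What is the capital of the Czech Republic?",
                "answers": [{"text": "Prague", "start": 0}],
            },
        ],
    },
]


def offsets_are_consistent(paragraphs):
    """Check that each answer text appears in the context at its character offset."""
    for paragraph in paragraphs:
        context = paragraph["context"]
        for qa in paragraph["qas"]:
            for answer in qa["answers"]:
                start, text = answer["start"], answer["text"]
                if context[start:start + len(text)] != text:
                    return False
    return True


def exact_match_accuracy(paragraphs, predictions):
    """A prediction is correct if it equals any reference answer text exactly."""
    predictions = iter(predictions)
    correct = total = 0
    for paragraph in paragraphs:
        for qa in paragraph["qas"]:
            references = {answer["text"] for answer in qa["answers"]}
            correct += next(predictions) in references
            total += 1
    return correct / total


accuracy = exact_match_accuracy(paragraphs, ["Prague"])
```

Because start is a character offset, a tokenizer-based model needs a mapping between token positions and character positions to produce answers that match the context exactly.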
The task is a competition. Everyone who submits
a solution with at least 65% answer accuracy gets 4 points; the remaining 5 points
will be distributed depending on the relative ordering of your solutions. Note that
usually achieving 62% on the dev set is enough to get 65% on the test
set (because of multiple references in the test set).
Note that contrary to working with EfficientNet, you need to finetune the RobeCzech model in order to achieve the required accuracy.
You can start with the reading_comprehension.py template, which among others (down)loads the data and the RobeCzech model, and describes the format of the required test set annotations.
Deadline: Jun 30, 23:59 2 points
Solve the continuous CartPole-v1 environment from the Gymnasium library using
the REINFORCE algorithm. The gymnasium environments have the following methods
and properties:
- observation_space: the description of environment observations; for
  continuous spaces, observation_space.shape contains their shape
- action_space: the description of environment actions; for discrete
  actions, action_space.n is the number of actions
- reset() → new_state, info: starts a new episode, returning the new
  state and additional environment-specific information
- step(action) → new_state, reward, terminated, truncated, info: performs the
  chosen action in the environment, returning the new state, obtained reward,
  boolean flags indicating a terminal state and episode truncation, and
  additional environment-specific information
We additionally extend the gymnasium environment by:
- episode: number of the current episode (zero-based)
- reset(start_evaluation=False) → new_state, info: if start_evaluation is
  True, an evaluation is started
Once you finish training (which you indicate by passing start_evaluation=True
to reset), your goal is to reach an average return of 475 during 100
evaluation episodes. Note that the environment prints your 100-episode
average return every 10 episodes even during training.
Start with the reinforce.py template, which provides a simple network
implementation in TensorFlow. However, feel free to use PyTorch or JAX instead,
if you like.
You will also need the wrappers.py module, which wraps the standard gymnasium
API with the above-mentioned added features we use.
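The interaction loop and the return computation that REINFORCE relies on can be sketched without any framework. The StubEnv class below is a hypothetical stand-in that only mimics the reset and step signatures listed above; it is not the wrapped CartPole environment, and the discount factor gamma is an assumed parameter.

```python
class StubEnv:
    """Hypothetical environment mimicking the reset/step signatures described above."""

    def __init__(self, episode_length=3):
        self._episode_length = episode_length
        self._steps = 0

    def reset(self):
        self._steps = 0
        return [0.0], {}

    def step(self, action):
        self._steps += 1
        truncated = self._steps >= self._episode_length
        # new_state, reward, terminated, truncated, info
        return [float(self._steps)], 1.0, False, truncated, {}


def collect_episode(env, choose_action):
    """Run one episode and record states, actions, and rewards."""
    states, actions, rewards = [], [], []
    state, _ = env.reset()
    done = False
    while not done:
        action = choose_action(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        states.append(state)
        actions.append(action)
        rewards.append(reward)
        done = terminated or truncated
        state = next_state
    return states, actions, rewards


def discounted_returns(rewards, gamma=1.0):
    """Return following each step: G_t = r_t + gamma * G_{t+1}."""
    returns, running = [], 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


states, actions, rewards = collect_episode(StubEnv(), lambda state: 0)
returns = discounted_returns(rewards)
```

In REINFORCE, each recorded action's log-probability would then be weighted by the corresponding return when computing the loss.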
During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on both of them. The time limit for each test is 5 minutes.
Deadline: Jun 30, 23:59 2 points
This is a continuation of the reinforce assignment.
Using the reinforce_baseline.py template, solve the continuous CartPole-v1 environment using the REINFORCE with baseline algorithm.
Using a baseline lowers the variance of the value function gradient estimator, which allows faster training and decreases sensitivity to hyperparameter values. To reflect this effect in ReCodEx, note that the evaluation phase will automatically start after 200 episodes. Using only 200 episodes for training in this setting is probably too little for the REINFORCE algorithm, but suffices for the variant with a baseline. In this assignment, you must train your agent in ReCodEx using the provided environment only.
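The variance reduction can be checked exactly on a toy problem. The sketch below assumes a two-action bandit with deterministic rewards and a policy parametrized by a single logit theta; this setup and the function name are illustrative assumptions and much simpler than CartPole.

```python
import math


def gradient_estimator_stats(rewards, theta, baseline):
    """Exact mean and variance of (reward - baseline) * d/dtheta log pi(action).

    The policy picks action 0 with probability sigmoid(theta), action 1 otherwise.
    """
    p0 = 1.0 / (1.0 + math.exp(-theta))
    probabilities = [p0, 1.0 - p0]
    scores = [1.0 - p0, -p0]  # derivatives of log pi(0) and log pi(1) w.r.t. theta
    samples = [(rewards[a] - baseline) * scores[a] for a in range(2)]
    mean = sum(p * g for p, g in zip(probabilities, samples))
    variance = sum(p * (g - mean) ** 2 for p, g in zip(probabilities, samples))
    return mean, variance


rewards = [10.0, 11.0]
mean_plain, variance_plain = gradient_estimator_stats(rewards, theta=0.0, baseline=0.0)
mean_baseline, variance_baseline = gradient_estimator_stats(rewards, theta=0.0, baseline=10.5)
```

Both estimators have the same expectation, so the baseline does not bias the gradient, but the variance drops from 27.5625 to 0 in this toy case.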
Your goal is to reach an average return of 475 during 100 evaluation episodes.
During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on both of them. The time limit for each test is 5 minutes.
Deadline: Jun 30, 23:59 2 points
This is a continuation of the reinforce_baseline assignment.
The supplied cart_pole_pixels_environment.py generates a pixel representation
of the CartPole environment as an 80×80 np.uint8
image with three channels, with each channel representing one time step
(i.e., the current observation and the two previous ones).
To pass the assignment, you need to reach an average return of 400 in 100 evaluation episodes. During evaluation in ReCodEx, two different random seeds will be employed, and you need to reach the required return on both of them. The time limit for each test is 10 minutes.
You should probably train the model locally and submit the already pretrained model to ReCodEx.
Start with the reinforce_pixels.py template, which parses several parameters and creates the correct environment.
Deadline: Jun 30, 23:59 3 points
In this assignment you will implement a simple Variational Autoencoder for three datasets in the MNIST format. Your goal is to modify the vae.py template and implement a VAE.
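Two ingredients of a standard VAE with a diagonal Gaussian encoder are the reparametrized sampling of the latent variable and the closed-form KL divergence to the standard normal prior. The sketch below assumes the encoder outputs a mean and a log-variance per latent dimension; the template may parametrize the distribution differently, and the function names are assumptions.

```python
import math
import random


def sample_latent(mean, log_variance, rng):
    """Reparametrization: z = mean + sigma * epsilon, with epsilon ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_variance)
    ]


def kl_to_standard_normal(mean, log_variance):
    """KL(N(mean, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return sum(
        0.5 * (m * m + math.exp(lv) - 1.0 - lv)
        for m, lv in zip(mean, log_variance)
    )


rng = random.Random(42)
z = sample_latent([0.0, 1.0], [0.0, -1.0], rng)
kl_at_prior = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
kl_shifted = kl_to_standard_normal([1.0], [0.0])
```

The KL term is zero exactly when the encoder distribution equals the prior, and shifting the mean by 1 with unit variance costs 0.5 nats per dimension.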
After submitting the assignment to ReCodEx, you can experiment with the three
available datasets (mnist, mnist-fashion, and mnist-cifarcars) and
different latent variable dimensionality (z_dim=2 and z_dim=100).
The generated images are available in TensorBoard logs, and the images
generated by the reference solution can also be seen in the Examples.
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 vae.py --dataset=mnist --train_size=500 --epochs=3 --z_dim=2
Epoch 1/3 reconstruction_loss: 0.5109 - latent_loss: 13.7155 - loss: 312.1707
Epoch 2/3 reconstruction_loss: 0.2830 - latent_loss: 8.5660 - loss: 215.4080
Epoch 3/3 reconstruction_loss: 0.2599 - latent_loss: 3.5324 - loss: 210.2326
python3 vae.py --dataset=mnist --train_size=500 --epochs=3 --z_dim=100
Epoch 1/3 reconstruction_loss: 0.4600 - latent_loss: 0.0883 - loss: 243.2955
Epoch 2/3 reconstruction_loss: 0.2742 - latent_loss: 0.0130 - loss: 206.6684
Epoch 3/3 reconstruction_loss: 0.2625 - latent_loss: 0.0076 - loss: 200.9970
python3 vae.py --dataset=mnist --z_dim=2
python3 vae.py --dataset=mnist --z_dim=100
python3 vae.py --dataset=mnist-fashion --z_dim=2
python3 vae.py --dataset=mnist-fashion --z_dim=100
python3 vae.py --dataset=mnist-cifarcars --z_dim=2
python3 vae.py --dataset=mnist-cifarcars --z_dim=100
Deadline: Evaluation Jun 1-15
If you would like to try participating in a real shared task, the CRAC 2023 Shared Task on Multilingual Coreference Resolution is running right now, with the evaluation phase on Jun 1-15.
The goal is to perform coreference resolution on 17 datasets in 12 languages,
where coreference resolution is the task of clustering together multiple
mentions of the same entity appearing in a textual document (e.g., Joe Biden,
the U.S. President, and he).
Note that if you send a solution, you would ideally also send an accompanying “system description” paper – that requires some work, but you would have a published paper.
Deadline: Jun 30, 23:59 2 points
In this assignment you will implement a simple Generative Adversarial Network for three datasets in the MNIST format. Your goal is to modify the gan.py template and implement a GAN.
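As a reminder of the objective, the sketch below writes out the commonly used GAN losses in terms of discriminator output probabilities: the discriminator is trained to output 1 on real images and 0 on generated ones, and the generator uses the non-saturating loss. Averaging over the batch and summing the two discriminator terms are assumptions; the template's TODO notes define the exact form expected.

```python
import math


def binary_cross_entropy(label, probability):
    """Cross-entropy between a 0/1 label and a predicted probability."""
    return -(label * math.log(probability) + (1.0 - label) * math.log(1.0 - probability))


def mean(values):
    return sum(values) / len(values)


def discriminator_loss(real_probabilities, fake_probabilities):
    """Real images should be classified as 1, generated images as 0."""
    real_term = mean([binary_cross_entropy(1.0, p) for p in real_probabilities])
    fake_term = mean([binary_cross_entropy(0.0, p) for p in fake_probabilities])
    return real_term + fake_term


def generator_loss(fake_probabilities):
    """Non-saturating loss: the generator wants its images classified as 1."""
    return mean([binary_cross_entropy(1.0, p) for p in fake_probabilities])


undecided_discriminator = discriminator_loss([0.5], [0.5])
undecided_generator = generator_loss([0.5])
```

A discriminator that outputs 0.5 everywhere yields a discriminator loss of 2·ln 2 and a generator loss of ln 2, which is a handy reference point when reading training logs.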
After submitting the assignment to ReCodEx, you can experiment with the three
available datasets (mnist, mnist-fashion, and mnist-cifarcars) and
maybe try different latent variable dimensionality. The generated images are
available in TensorBoard logs, and the images generated by the reference
solution can also be seen in the Examples.
You can also continue with the dcgan assignment.
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 gan.py --dataset=mnist --train_size=490 --epochs=5 --z_dim=2
Epoch 1/5 discriminator_loss: 0.4331 - generator_loss: 3.3729 - loss: 1.2428 - discriminator_accuracy: 0.9092
Epoch 2/5 discriminator_loss: 0.0269 - generator_loss: 5.0672 - loss: 1.6841 - discriminator_accuracy: 1.0000
Epoch 3/5 discriminator_loss: 0.0102 - generator_loss: 5.5289 - loss: 1.8392 - discriminator_accuracy: 1.0000
Epoch 4/5 discriminator_loss: 0.0274 - generator_loss: 5.4984 - loss: 1.8360 - discriminator_accuracy: 0.9990
Epoch 5/5 discriminator_loss: 0.0144 - generator_loss: 5.8541 - loss: 1.9575 - discriminator_accuracy: 1.0000
python3 gan.py --dataset=mnist --train_size=490 --epochs=5 --z_dim=100
Epoch 1/5 discriminator_loss: 0.4518 - generator_loss: 3.1389 - loss: 1.1873 - discriminator_accuracy: 0.9010
Epoch 2/5 discriminator_loss: 0.0400 - generator_loss: 4.2957 - loss: 1.4290 - discriminator_accuracy: 1.0000
Epoch 3/5 discriminator_loss: 0.0314 - generator_loss: 4.6148 - loss: 1.5455 - discriminator_accuracy: 1.0000
Epoch 4/5 discriminator_loss: 0.0410 - generator_loss: 5.2547 - loss: 1.7673 - discriminator_accuracy: 0.9969
Epoch 5/5 discriminator_loss: 0.0717 - generator_loss: 5.2141 - loss: 1.7783 - discriminator_accuracy: 0.9898
python3 gan.py --dataset=mnist --z_dim=2
python3 gan.py --dataset=mnist --z_dim=100
python3 gan.py --dataset=mnist-fashion --z_dim=2
python3 gan.py --dataset=mnist-fashion --z_dim=100
python3 gan.py --dataset=mnist-cifarcars --z_dim=2
python3 gan.py --dataset=mnist-cifarcars --z_dim=100
Deadline: Jun 30, 23:59 1 point
This task is a continuation of the gan assignment, which you will modify to
implement the Deep Convolutional GAN (DCGAN).
Start with the dcgan.py template and implement a DCGAN. Note that most of the
TODO notes are from the gan assignment.
After submitting the assignment to ReCodEx, you can experiment with the three
available datasets (mnist, mnist-fashion, and mnist-cifarcars). However,
note that you will need a lot of computational power (preferably a GPU) to
generate the images. The generated images are available in TensorBoard logs, and
the images generated by the reference solution can also be seen in the Examples.
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 dcgan.py --dataset=mnist --train_size=490 --epochs=2 --z_dim=2
Epoch 1/2 discriminator_loss: 2.0334 - generator_loss: 0.9712 - loss: 1.0177 - discriminator_accuracy: 0.5459
Epoch 2/2 discriminator_loss: 1.1909 - generator_loss: 1.0277 - loss: 0.7505 - discriminator_accuracy: 0.7163
python3 dcgan.py --dataset=mnist --train_size=490 --epochs=2 --z_dim=100
Epoch 1/2 discriminator_loss: 1.9740 - generator_loss: 0.8505 - loss: 0.9366 - discriminator_accuracy: 0.5071
Epoch 2/2 discriminator_loss: 1.4274 - generator_loss: 1.0789 - loss: 0.8433 - discriminator_accuracy: 0.6143
python3 dcgan.py --dataset=mnist --z_dim=2
python3 dcgan.py --dataset=mnist --z_dim=100
python3 dcgan.py --dataset=mnist-fashion --z_dim=2
python3 dcgan.py --dataset=mnist-fashion --z_dim=100
python3 dcgan.py --dataset=mnist-cifarcars --z_dim=2
python3 dcgan.py --dataset=mnist-cifarcars --z_dim=100
Deadline: Jun 30, 23:59 4 points
Implement a simple variant of learning-to-learn architecture using the learning_to_learn.py template. Utilizing the Omniglot dataset loadable using the omniglot_dataset.py module, the goal is to learn to classify a sequence of images using a custom hierarchy by employing external memory.
The input image sequences consist of args.classes randomly chosen Omniglot
classes, each class being assigned a randomly chosen label. For every chosen
class, args.images_per_class images are randomly selected. Apart from the
images, the input contains the random labels one step after the corresponding
images (with the first label being -1). The gold outputs are also the labels,
but without the one-step offset.
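The one-step label offset can be illustrated on the label sequence alone. In the sketch below, shuffling the order of the images within an episode is an assumption, and the function name is hypothetical; the images themselves are omitted.

```python
import random


def build_label_sequences(classes, images_per_class, rng):
    """Return (input_labels, gold_labels) for one episode.

    gold_labels[t] is the label of the image shown at step t, while
    input_labels[t] is the label of the previous image (-1 at the first step).
    """
    gold_labels = [label for label in range(classes) for _ in range(images_per_class)]
    rng.shuffle(gold_labels)  # assumed: images of different classes are interleaved
    input_labels = [-1] + gold_labels[:-1]
    return input_labels, gold_labels


input_labels, gold_labels = build_label_sequences(classes=2, images_per_class=3, rng=random.Random(0))
```

Because the correct label of an image arrives only at the next step, the model cannot copy it and has to store the image-label association in memory.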
The input images should be passed through a CNN feature extraction module
and then processed using a memory-augmented LSTM controller; the external memory
contains enough memory cells, each with args.cell_size units. In each step,
the controller emits:
args.read_heads read keys, each used to perform a read from memory as
a weighted combination of cells according to the softmax of cosine
similarities of the read key and the memory cells;
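The memory read described above, a weighted combination of cells using the softmax of cosine similarities, can be sketched for a single read key as follows. The function names and the small epsilon guarding against zero-length vectors are assumptions; the template works with batched tensors.

```python
import math


def cosine_similarity(a, b, epsilon=1e-8):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norms + epsilon)  # epsilon avoids division by zero


def softmax(values):
    maximum = max(values)
    exponentials = [math.exp(v - maximum) for v in values]
    total = sum(exponentials)
    return [e / total for e in exponentials]


def read_memory(memory, read_key):
    """Weighted combination of memory cells; weights come from softmax of cosine similarities."""
    weights = softmax([cosine_similarity(read_key, cell) for cell in memory])
    cell_size = len(memory[0])
    return [sum(w * cell[i] for w, cell in zip(weights, memory)) for i in range(cell_size)]


memory = [[1.0, 0.0], [0.0, 1.0]]  # two cells with cell_size 2
read_value = read_memory(memory, read_key=[1.0, 0.0])
```

Since cosine similarities lie in [-1, 1], the softmax over them is fairly flat, so even a perfectly matching cell receives only part of the weight.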
python3 learning_to_learn.py --train_episodes=160 --test_episodes=160 --epochs=3 --classes=2
Epoch 1/3 loss: 0.7535 - acc: 0.4984 - acc1: 0.5250 - acc2: 0.4875 - acc5: 0.4938 - acc10: 0.5000 - val_loss: 0.6918 - val_acc: 0.5525 - val_acc1: 0.7375 - val_acc2: 0.6125 - val_acc5: 0.5531 - val_acc10: 0.4969
Epoch 2/3 loss: 0.6968 - acc: 0.4956 - acc1: 0.5531 - acc2: 0.4969 - acc5: 0.5031 - acc10: 0.4719 - val_loss: 0.6907 - val_acc: 0.5447 - val_acc1: 0.6969 - val_acc2: 0.6187 - val_acc5: 0.5344 - val_acc10: 0.4906
Epoch 3/3 loss: 0.6937 - acc: 0.5138 - acc1: 0.5781 - acc2: 0.5094 - acc5: 0.5125 - acc10: 0.4812 - val_loss: 0.6895 - val_acc: 0.5547 - val_acc1: 0.7688 - val_acc2: 0.5938 - val_acc5: 0.5063 - val_acc10: 0.4875
python3 learning_to_learn.py --train_episodes=160 --test_episodes=160 --epochs=3 --read_heads=2 --classes=5
Epoch 1/3 loss: 1.6529 - acc: 0.2004 - acc1: 0.2050 - acc2: 0.1838 - acc5: 0.2100 - acc10: 0.2075 - val_loss: 1.6091 - val_acc: 0.2136 - val_acc1: 0.2812 - val_acc2: 0.2100 - val_acc5: 0.2013 - val_acc10: 0.1925
Epoch 2/3 loss: 1.6139 - acc: 0.1996 - acc1: 0.2113 - acc2: 0.1675 - acc5: 0.2025 - acc10: 0.1925 - val_loss: 1.6078 - val_acc: 0.1984 - val_acc1: 0.2125 - val_acc2: 0.2075 - val_acc5: 0.2075 - val_acc10: 0.1850
Epoch 3/3 loss: 1.6102 - acc: 0.2066 - acc1: 0.2200 - acc2: 0.2150 - acc5: 0.2138 - acc10: 0.2013 - val_loss: 1.6068 - val_acc: 0.2237 - val_acc1: 0.3988 - val_acc2: 0.2188 - val_acc5: 0.2100 - val_acc10: 0.1688
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 learning_to_learn.py --epochs=50 --classes=2
Epoch 1/50 loss: 0.6592 - acc: 0.5888 - acc1: 0.7211 - acc2: 0.6151 - acc5: 0.5750 - acc10: 0.5482 - val_loss: 0.6233 - val_acc: 0.6270 - val_acc1: 0.7290 - val_acc2: 0.6310 - val_acc5: 0.5960 - val_acc10: 0.6510
Epoch 2/50 loss: 0.4043 - acc: 0.7890 - acc1: 0.6024 - acc2: 0.7499 - acc5: 0.8166 - acc10: 0.8218 - val_loss: 0.3930 - val_acc: 0.8067 - val_acc1: 0.6135 - val_acc2: 0.7830 - val_acc5: 0.8485 - val_acc10: 0.8485
Epoch 3/50 loss: 0.2931 - acc: 0.8566 - acc1: 0.6158 - acc2: 0.8373 - acc5: 0.8928 - acc10: 0.8890 - val_loss: 0.3038 - val_acc: 0.8551 - val_acc1: 0.6165 - val_acc2: 0.8275 - val_acc5: 0.8990 - val_acc10: 0.9060
Epoch 4/50 loss: 0.2133 - acc: 0.8988 - acc1: 0.6371 - acc2: 0.8763 - acc5: 0.9320 - acc10: 0.9391 - val_loss: 0.2513 - val_acc: 0.8849 - val_acc1: 0.6365 - val_acc2: 0.8440 - val_acc5: 0.9155 - val_acc10: 0.9360
Epoch 5/50 loss: 0.1714 - acc: 0.9194 - acc1: 0.6637 - acc2: 0.9099 - acc5: 0.9480 - acc10: 0.9561 - val_loss: 0.2459 - val_acc: 0.8897 - val_acc1: 0.6235 - val_acc2: 0.8750 - val_acc5: 0.9265 - val_acc10: 0.9400
Epoch 10/50 loss: 0.1125 - acc: 0.9449 - acc1: 0.6888 - acc2: 0.9461 - acc5: 0.9754 - acc10: 0.9801 - val_loss: 0.1714 - val_acc: 0.9201 - val_acc1: 0.6665 - val_acc2: 0.8975 - val_acc5: 0.9535 - val_acc10: 0.9665
Epoch 20/50 loss: 0.0784 - acc: 0.9588 - acc1: 0.7069 - acc2: 0.9671 - acc5: 0.9891 - acc10: 0.9908 - val_loss: 0.1525 - val_acc: 0.9320 - val_acc1: 0.6720 - val_acc2: 0.9255 - val_acc5: 0.9640 - val_acc10: 0.9755
Epoch 50/50 loss: 0.0585 - acc: 0.9659 - acc1: 0.7199 - acc2: 0.9819 - acc5: 0.9950 - acc10: 0.9958 - val_loss: 0.1255 - val_acc: 0.9430 - val_acc1: 0.6900 - val_acc2: 0.9280 - val_acc5: 0.9760 - val_acc10: 0.9860
python3 learning_to_learn.py --epochs=50 --read_heads=2 --classes=5
Epoch 1/50 loss: 1.5718 - acc: 0.2606 - acc1: 0.3784 - acc2: 0.2764 - acc5: 0.2469 - acc10: 0.2316 - val_loss: 1.3803 - val_acc: 0.3761 - val_acc1: 0.4078 - val_acc2: 0.3412 - val_acc5: 0.3638 - val_acc10: 0.4156
Epoch 2/50 loss: 0.9240 - acc: 0.5968 - acc1: 0.2956 - acc2: 0.4762 - acc5: 0.6400 - acc10: 0.6911 - val_loss: 0.8087 - val_acc: 0.6602 - val_acc1: 0.2484 - val_acc2: 0.5250 - val_acc5: 0.7178 - val_acc10: 0.7606
Epoch 3/50 loss: 0.5984 - acc: 0.7496 - acc1: 0.2499 - acc2: 0.6078 - acc5: 0.8306 - acc10: 0.8504 - val_loss: 0.7452 - val_acc: 0.6980 - val_acc1: 0.2322 - val_acc2: 0.5628 - val_acc5: 0.7754 - val_acc10: 0.8008
Epoch 4/50 loss: 0.5145 - acc: 0.7851 - acc1: 0.2579 - acc2: 0.6571 - acc5: 0.8713 - acc10: 0.8807 - val_loss: 0.7892 - val_acc: 0.7053 - val_acc1: 0.2684 - val_acc2: 0.5882 - val_acc5: 0.7708 - val_acc10: 0.8040
Epoch 5/50 loss: 0.4730 - acc: 0.8016 - acc1: 0.2680 - acc2: 0.6903 - acc5: 0.8865 - acc10: 0.8941 - val_loss: 0.6920 - val_acc: 0.7294 - val_acc1: 0.2722 - val_acc2: 0.6196 - val_acc5: 0.7964 - val_acc10: 0.8158
Epoch 10/50 loss: 0.3415 - acc: 0.8568 - acc1: 0.3011 - acc2: 0.7917 - acc5: 0.9351 - acc10: 0.9435 - val_loss: 0.6062 - val_acc: 0.7794 - val_acc1: 0.2922 - val_acc2: 0.6944 - val_acc5: 0.8428 - val_acc10: 0.8688
Epoch 20/50 loss: 0.2468 - acc: 0.8956 - acc1: 0.3413 - acc2: 0.8869 - acc5: 0.9649 - acc10: 0.9727 - val_loss: 0.5414 - val_acc: 0.8173 - val_acc1: 0.3004 - val_acc2: 0.7856 - val_acc5: 0.8782 - val_acc10: 0.9050
Epoch 50/50 loss: 0.1778 - acc: 0.9210 - acc1: 0.3793 - acc2: 0.9479 - acc5: 0.9850 - acc10: 0.9888 - val_loss: 0.5219 - val_acc: 0.8433 - val_acc1: 0.3728 - val_acc2: 0.8226 - val_acc5: 0.8990 - val_acc10: 0.9346
Deadline: Jun 30, 23:59 3 points
Jun 3: The templates and tests have been updated. ReCodEx accepts both the original and new templates; the original test outputs are still available.
Implement a Denoising Diffusion Implicit Model (DDIM) to unconditionally generate images with $64×64$ resolution.
The unlabeled image data can be loaded using the image64_dataset.py module, with the following datasets being available:
oxford_flowers102: 8k images of flowers, 67MB,
lsun_bedrooms: 15k images of bedrooms, 109MB,
ffhq: 70k images of Flickr faces, 529MB.
Start with the ddim.py template, which contains extensive comments indicating how the architecture should look and how the training and sampling should be performed. Note that the template generates images to TensorBoard (after the whole training and optionally also each --plot_each epoch), and the images generated by the reference solution can also be seen in the Examples.
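As a rough orientation for the sampling procedure, the following NumPy sketch shows deterministic DDIM sampling. It rests on assumptions that are not taken from the template: the bounded cosine schedule in rates (including its 0.02 and 0.95 limits) and the predict_noise callable, which is a hypothetical stand-in for the trained network. Follow the comments in ddim.py for the actual assignment.

```python
import numpy as np


def rates(t, min_signal_rate=0.02, max_signal_rate=0.95):
    """Assumed cosine schedule: (signal_rate, noise_rate) for time t in [0, 1].

    The two rates always satisfy signal_rate**2 + noise_rate**2 == 1.
    """
    start_angle = np.arccos(max_signal_rate)
    end_angle = np.arccos(min_signal_rate)
    angle = start_angle + t * (end_angle - start_angle)
    return np.cos(angle), np.sin(angle)


def ddim_sample(predict_noise, initial_noise, steps):
    """Deterministic DDIM sampling from t=1 (almost pure noise) towards t=0.

    predict_noise(x, noise_rate) is a hypothetical stand-in for the trained
    network; it returns the noise it believes is present in x.
    """
    x = initial_noise
    for i in range(steps):
        signal_rate, noise_rate = rates(1 - i / steps)
        predicted_noise = predict_noise(x, noise_rate)
        # Estimate the clean image from the current noisy one.
        predicted_image = (x - noise_rate * predicted_noise) / signal_rate
        # Mix the estimates again at the lower noise level of the next step.
        next_signal_rate, next_noise_rate = rates(1 - (i + 1) / steps)
        x = next_signal_rate * predicted_image + next_noise_rate * predicted_noise
    return predicted_image
```

With a perfect noise predictor, every step recovers the same clean image, so the number of sampling steps only matters because a real network predicts the noise imperfectly.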
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 ddim.py --epochs=1 --epoch_batches=16 --batch_size=8 --stages=2 --stage_blocks=2 --channels=8 --ema=0.9 --sampling_steps=8
loss: 0.7722 - sample_mean: 126.7602 - sample_std: 125.7844
python3 ddim.py --epochs=1 --epoch_batches=10 --batch_size=12 --stages=3 --stage_blocks=1 --channels=12 --ema=0.8 --sampling_steps=7
loss: 0.7749 - sample_mean: 125.7547 - sample_std: 125.7643
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 ddim.py --dataset=oxford_flowers102 --epochs=70 --plot_each=10
python3 ddim.py --dataset=lsun_bedrooms --epochs=100 --plot_each=10
python3 ddim.py --dataset=ffhq --epochs=100 --plot_each=10
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 ddim.py --epochs=1 --epoch_images=128 --batch_size=8 --stages=2 --stage_blocks=2 --channels=8 --ema=0.9 --sampling_steps=8
loss: 0.7697 - sample_mean: 126.8403 - sample_std: 126.0691
python3 ddim.py --epochs=1 --epoch_images=120 --batch_size=12 --stages=3 --stage_blocks=1 --channels=12 --ema=0.8 --sampling_steps=7
loss: 0.7732 - sample_mean: 125.9786 - sample_std: 126.0531
Deadline: Jun 30, 23:59 1 points
Jun 3: The templates and tests have been updated. ReCodEx accepts both the original and new templates; the original test outputs are still available.
This task is an extension of the ddim assignment. Your goal is to extend the original architecture with self-attention blocks, which are used only in some number of lower-resolution stages.
Start with the ddim_attention.py template, where most of the comments already come from the ddim assignment.
Again, the template generates images to TensorBoard, and the images generated by the reference solution can also be seen in the Examples.
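To illustrate what a self-attention block over a feature map computes, here is a minimal NumPy sketch. It is not the template's block: the projection matrices w_q, w_k, w_v and w_o are hypothetical stand-ins for trained weights, the residual connection is an assumed (though common) design, and any normalization is left out.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def spatial_self_attention(features, w_q, w_k, w_v, w_o, heads):
    """Multi-head self-attention over the H*W positions of one feature map.

    features: [H, W, C]; w_q, w_k, w_v, w_o: [C, C] projection matrices
    (hypothetical stand-ins for trained weights); C must be divisible by heads.
    """
    h, w, c = features.shape
    tokens = features.reshape(h * w, c)  # every spatial position is one token

    def split(x):  # [H*W, C] -> [heads, H*W, C // heads]
        return x.reshape(h * w, heads, c // heads).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    # Attention weights have shape [heads, H*W, H*W].
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(c // heads))
    attended = (weights @ v).transpose(1, 0, 2).reshape(h * w, c)
    # Assumed residual connection around the attention block.
    return features + (attended @ w_o).reshape(h, w, c)
```

The attention weights grow quadratically with the number of positions H*W, which is one reason to apply such blocks only in the lower-resolution stages.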
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 ddim_attention.py --epochs=1 --epoch_batches=16 --batch_size=8 --stages=2 --stage_blocks=1 --channels=6 --ema=0.9 --sampling_steps=8 --attention_stages=0 --attention_heads=4
loss: 0.7818 - sample_mean: 126.5896 - sample_std: 125.8099
python3 ddim_attention.py --epochs=1 --epoch_batches=10 --batch_size=12 --stages=3 --stage_blocks=1 --channels=4 --ema=0.8 --sampling_steps=7 --attention_stages=1 --attention_heads=2
loss: 0.7918 - sample_mean: 126.3455 - sample_std: 125.8369
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 ddim_attention.py --dataset=oxford_flowers102 --epochs=70 --plot_each=10
python3 ddim_attention.py --dataset=lsun_bedrooms --epochs=100 --plot_each=10
python3 ddim_attention.py --dataset=ffhq --epochs=100 --plot_each=10
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 ddim_attention.py --epochs=1 --epoch_images=128 --batch_size=8 --stages=2 --stage_blocks=1 --channels=6 --ema=0.9 --sampling_steps=8 --attention_stages=0 --attention_heads=4
loss: 0.7799 - sample_mean: 126.7503 - sample_std: 126.0936
python3 ddim_attention.py --epochs=1 --epoch_images=120 --batch_size=12 --stages=3 --stage_blocks=1 --channels=4 --ema=0.8 --sampling_steps=7 --attention_stages=1 --attention_heads=2
loss: 0.7912 - sample_mean: 126.6070 - sample_std: 126.1168
Deadline: Jun 30, 23:59 1 points
Jun 3: The templates and tests have been updated. ReCodEx accepts both the original and new templates; the original test outputs are still available.
This task is an extension of the ddim assignment. Your goal is to extend the original unconditional architecture to a conditional model, which also gets a low-resolution version of the image to generate.
Start with the ddim_conditional.py template, where most of the comments already come from the ddim assignment.
Again, the template generates images to TensorBoard, and the images generated by the reference solution can also be seen in the Examples.
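One possible way of passing a low-resolution image to the network is sketched below in NumPy. Everything in it is an assumption for illustration, not the template's design: the helper names, average pooling for downscaling, nearest-neighbor upscaling, and concatenation along the channel axis. The comments in ddim_conditional.py describe what the assignment actually requires.

```python
import numpy as np


def downscale(images, factor):
    """Average-pool [N, H, W, C] images by an integer factor."""
    n, h, w, c = images.shape
    blocks = images.reshape(n, h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(2, 4))


def upscale(images, factor):
    """Nearest-neighbor upscaling of [N, H, W, C] images."""
    return images.repeat(factor, axis=1).repeat(factor, axis=2)


def conditional_input(noisy_images, images, factor):
    """Concatenate noisy images with a low-resolution view of the originals."""
    conditioning = upscale(downscale(images, factor), factor)
    return np.concatenate([noisy_images, conditioning], axis=-1)
```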
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 ddim_conditional.py --epochs=1 --epoch_batches=16 --batch_size=8 --stages=2 --stage_blocks=2 --channels=8 --ema=0.9 --sampling_steps=8
loss: 0.7722 - sample_mean: 125.3491 - sample_std: 125.7724
python3 ddim_conditional.py --epochs=1 --epoch_batches=10 --batch_size=12 --stages=3 --stage_blocks=1 --channels=12 --ema=0.8 --sampling_steps=7
loss: 0.7766 - sample_mean: 126.1337 - sample_std: 125.7730
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 ddim_conditional.py --dataset=oxford_flowers102 --epochs=50 --plot_each=10
python3 ddim_conditional.py --dataset=lsun_bedrooms --epochs=50 --plot_each=10
python3 ddim_conditional.py --dataset=ffhq --epochs=100 --plot_each=10
Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.
python3 ddim_conditional.py --epochs=1 --epoch_images=128 --batch_size=8 --stages=2 --stage_blocks=2 --channels=8 --ema=0.9 --sampling_steps=8
loss: 0.7704 - sample_mean: 111.7210 - sample_std: 100.4695
python3 ddim_conditional.py --epochs=1 --epoch_images=120 --batch_size=12 --stages=3 --stage_blocks=1 --channels=12 --ema=0.8 --sampling_steps=7
loss: 0.7750 - sample_mean: 111.9353 - sample_std: 100.4928
In the competitions, your goal is to train a model, and then predict target values on the given unannotated test set.
When submitting a competition solution to ReCodEx, you can include any number of files of any kind, and either submit them individually or compress them in a .zip file. However, there should be exactly one text file with the test set annotation (.txt) and at least one Python source (.py/ipynb) containing the model training and prediction. The Python sources are not executed, but must be included for inspection.
For every submission, ReCodEx checks the above conditions (exactly one .txt, at least one .py/ipynb) and whether the given annotations can be evaluated without error. If not, it will report a corresponding error in the logs.
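If you want to check the file conditions locally before submitting, a small helper along the following lines may be useful. The function check_submission is hypothetical and only mirrors the two conditions stated above; the actual ReCodEx checks may differ in details such as letter case.

```python
from pathlib import PurePath


def check_submission(filenames):
    """Return a list of problems found in the given submission file names."""
    suffixes = [PurePath(name).suffix.lower() for name in filenames]
    problems = []
    annotations = suffixes.count(".txt")
    if annotations != 1:
        problems.append(f"expected exactly one .txt file, found {annotations}")
    if not any(suffix in (".py", ".ipynb") for suffix in suffixes):
        problems.append("expected at least one .py or .ipynb source")
    return problems
```

An empty list means that both conditions hold; whether the annotations can be evaluated is still decided by ReCodEx.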
Before the deadline, ReCodEx prints the exact achieved performance, but only if it is worse than the baseline.
If you surpass the baseline, the assignment is marked as solved in ReCodEx and you immediately get regular points for the assignment. However, ReCodEx does not print the reached performance.
After the competition deadline, the latest submission of every user surpassing the required baseline participates in a competition. Additional bonus points are then awarded according to the ordering of the performance of the participating submissions.
After the competition results announcement, ReCodEx starts to show the exact performance for all the already submitted solutions and also for the solutions submitted later.
Installing to central user packages repository
You can install all required packages to central user packages repository using
python3 -m pip install --user tensorflow~=2.11.0 tensorflow-addons~=0.19.0 tensorflow-probability~=0.19.0 tensorflow-hub~=0.12.0 scipy~=1.10.0 transformers~=4.26.0 gymnasium~=0.27.1 pygame~=2.1.3.dev8
Installing to a virtual environment
Python supports virtual environments, which are directories containing independent sets of installed packages. You can create a virtual environment by running python3 -m venv VENV_DIR followed by
VENV_DIR/bin/pip install tensorflow~=2.11.0 tensorflow-addons~=0.19.0 tensorflow-probability~=0.19.0 tensorflow-hub~=0.12.0 scipy~=1.10.0 transformers~=4.26.0 gymnasium~=0.27.1 pygame~=2.1.3.dev8
(or VENV_DIR/Scripts/pip on Windows).
Windows installation
On Windows, it can happen that python3 is not in PATH, while the py command is – in that case you can use py -m venv VENV_DIR, which uses the newest Python available, or for example py -3.9 -m venv VENV_DIR, which uses Python version 3.9.
If your Windows TensorFlow fails with ImportError: DLL load failed, you are probably missing Visual C++ 2019 Redistributable.
If you encounter a problem creating the logs in the args.logdir directory, a possible cause is that the path is longer than 260 characters, which is the default maximum length of a complete path on Windows. However, you can increase this limit on Windows 10, version 1607 or later, by following the instructions.
macOS installation
With an Intel processor, you do not need anything special.
If you have Apple Silicon, you need to replace the tensorflow~=2.11.0 package by tensorflow-macos~=2.11.0.
GPU support on Linux
TensorFlow 2.11 supports NVIDIA GPU out of the box, but you need to install CUDA 11.2 (or newer 11.x) and cuDNN 8.1 (or newer 8.x) libraries yourself.
GPU support on Windows
TensorFlow 2.11 dropped NVIDIA GPU support for Windows native builds, and supports GPUs on Windows only via WSL2 – see the detailed instructions how to install official TensorFlow packages with GPU support in WSL2.
GPU support on macOS
The AMD and Apple Silicon GPUs can be used by installing a plugin providing the GPU acceleration using the following command:
python3 -m pip install tensorflow-metal==0.7.0
Common errors when running on a GPU
When your program crashes when using a GPU:
Try setting export TF_FORCE_GPU_ALLOW_GROWTH=true.
If you get errors like
Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice.
...
Couldn't invoke ptxas --version
...
InternalError: libdevice not found at ./libdevice.10.bc [Op:__some_op]
then TensorFlow is trying to JIT compile a computation graph using XLA which is not properly configured. This happens by default only if you do not pass jit_compile=False to an optimizer – you can either pass it (all our templates explicitly constructing an optimizer do), or you can configure XLA properly:
set export XLA_FLAGS=--xla_gpu_cuda_data_dir=CUDA_DIR and make sure that the CUDA_DIR contains nvvm/libdevice/libdevice.10.bc;
make sure that ptxas is in the PATH.
You can also set the export TF_CPP_MIN_LOG_LEVEL=0 environment variable, which increases verbosity of the log messages.
How to install TensorFlow dependencies on MetaCentrum?
To install CUDA, cuDNN and Python 3.10 on MetaCentrum, it is enough to run in every session the following command:
module add python/python-3.10.4-intel-19.0.4-sc7snnf cuda/cuda-11.2.0-intel-19.0.4-tn4edsz cudnn/cudnn-8.1.0.77-11.2-linux-x64-intel-19.0.4-wx22b5t
How to install TensorFlow on MetaCentrum?
Once you have the required dependencies, you can create a virtual environment and install TensorFlow in it. However, note that by default the MetaCentrum jobs have little disk space, so read about how to ask for scratch storage when submitting a job, and about quotas.
TL;DR:
Run an interactive CPU job, asking for 16GB scratch space:
qsub -l select=1:ncpus=1:mem=8gb:scratch_local=16gb -I
In the job, use the allocated scratch space as a temporary directory:
export TMPDIR=$SCRATCHDIR
Finally, create the virtual environment and install TensorFlow in it:
module add python/python-3.10.4-intel-19.0.4-sc7snnf cuda/cuda-11.2.0-intel-19.0.4-tn4edsz cudnn/cudnn-8.1.0.77-11.2-linux-x64-intel-19.0.4-wx22b5t
python3 -m venv CHOSEN_VENV_DIR
CHOSEN_VENV_DIR/bin/pip install --no-cache-dir --upgrade pip setuptools
CHOSEN_VENV_DIR/bin/pip install --no-cache-dir tensorflow~=2.11.0 tensorflow-addons~=0.19.0 tensorflow-probability~=0.19.0 tensorflow-hub~=0.12.0 scipy~=1.10.0 transformers~=4.26.0 gymnasium~=0.27.1 pygame~=2.1.3.dev8
How to run a GPU computation on MetaCentrum?
First, read the official MetaCentrum documentation: Beginners guide, About scheduling system, GPU clusters.
TL;DR: To run an interactive GPU job with 1 CPU, 1 GPU, 16GB RAM, and 8GB scratch space, run:
qsub -q gpu -l select=1:ncpus=1:ngpus=1:mem=16gb:scratch_local=8gb -I
To run a script in a non-interactive way, replace the -I option with the script to be executed.
If you want to run a CPU-only computation, remove the -q gpu and ngpus=1: from the above commands.
How to install TensorFlow dependencies on AIC?
To enable CUDA 11.8 and cuDNN 8.9.2 on AIC, you can either use modules as described in the section “CUDA modules” at https://aic.ufal.mff.cuni.cz/index.php/Submitting_GPU_Jobs, or you can add the following to your .profile:
export PATH="/lnet/aic/opt/cuda/cuda-11.8/bin:$PATH"
export LD_LIBRARY_PATH="/lnet/aic/opt/cuda/cuda-11.8/lib64:/lnet/aic/opt/cuda/cuda-11.8/cudnn/8.9.2/lib:/lnet/aic/opt/cuda/cuda-11.8/extras/CUPTI/lib64:$LD_LIBRARY_PATH"
export XLA_FLAGS=--xla_gpu_cuda_data_dir=/lnet/aic/opt/cuda/cuda-11.8 # XLA configuration
How to run a GPU computation on AIC?
First, read the official AIC documentation: Submitting CPU Jobs, Submitting GPU Jobs.
TL;DR: To run an interactive GPU job with 1 CPU, 1 GPU, and 16GB RAM, run:
srun -p gpu -c1 --gpus=1 --mem=16G --pty bash
To run a shell script requiring a GPU in a non-interactive way, use
sbatch -p gpu -c1 --gpus=1 --mem=16G SCRIPT_PATH
If you want to run a CPU-only computation, remove the -p gpu and --gpus=1 from the above commands.
Is it possible to keep the solutions in a Git repository?
Definitely. Keeping the solutions in a branch of your repository, where you merge them with the course repository, is probably a good idea. However, please keep the cloned repository with your solutions private.
On GitHub, do not create a public fork with your solutions
If you keep your solutions in a GitHub repository, please do not create a clone of the repository by using the Fork button – this way, the cloned repository would be public.
Of course, if you just want to create a pull request, GitHub requires a public fork and that is fine – just do not store your solutions in it.
How to clone the course repository?
To clone the course repository, run
git clone https://github.com/ufal/npfl114
This creates the repository in the npfl114 subdirectory; if you want a different name, add it as a last parameter.
To update the repository, run git pull inside the repository directory.
How to keep the course repository as a branch in your repository?
If you want to store the course repository just in a local branch of your existing repository, you can run the following commands while in it:
git remote add upstream https://github.com/ufal/npfl114
git fetch upstream
git checkout -t upstream/master
This creates a branch master; if you want a different name, add -b BRANCH_NAME to the last command.
In both cases, you can update your checkout by running git pull while in it.
How to merge the course repository with your modifications?
If you want to store your solutions in a branch merged with the course repository, you should start by
git remote add upstream https://github.com/ufal/npfl114
git pull upstream master
which creates a branch master; if you want a different name, change the last argument to master:BRANCH_NAME.
You can then commit to this branch and push it to your repository.
To merge the current course repository with your branch, run
git pull upstream master
while in your branch. Of course, it might be necessary to resolve conflicts if both you and I modified the same place in the templates.
What files can be submitted to ReCodEx?
You can submit multiple files of any type to ReCodEx. There is a limit of 20 files per submission, with a total size of 20MB.
What file does ReCodEx execute and what arguments does it use?
Exactly one file with py suffix must contain a line starting with def main(.
Such a file is imported by ReCodEx and the main method is executed (during the import, __name__ == "__recodex__").
The file must also export an argument parser called parser. ReCodEx uses its arguments and default values, but it overwrites some of the arguments depending on the test being executed – the template should always indicate which arguments are set by ReCodEx and which are left intact.
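The requirements above can be met by a file shaped like the following minimal sketch. The concrete arguments and the body of main are placeholders chosen for illustration; the real templates define their own arguments and mark which of them ReCodEx overrides.

```python
import argparse

# The module-level parser is what ReCodEx reads arguments and defaults from.
parser = argparse.ArgumentParser()
# Hypothetical arguments, chosen only for this illustration.
parser.add_argument("--epochs", default=10, type=int, help="Number of epochs.")
parser.add_argument("--seed", default=42, type=int, help="Random seed.")


def main(args: argparse.Namespace) -> float:
    # Placeholder computation standing in for model training and evaluation.
    return float(args.epochs * 2)


if __name__ == "__main__":
    # Not reached during the ReCodEx import, where __name__ == "__recodex__".
    main(parser.parse_args())
```

Keeping all work inside main, and only argument parsing under the __name__ guard, means that importing the file has no side effects.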
What are the time and memory limits?
The memory limit during evaluation is 1.5GB. The time limit varies, but it should be at least 10 seconds and at least twice the running time of my solution.
How to work with the usual tf.Tensors?
Read the TensorFlow Tensor guide and also the TensorFlow tensor indexing guide.
How to work with the tf.RaggedTensors?
Read the TensorFlow RaggedTensor guide.
How to convert the tf.RaggedTensor to a tf.Tensor and back?
Often, you might want to convert a tf.RaggedTensor to a tf.Tensor and then back.
To obtain just the valid elements (so the rank of the resulting tf.Tensor is smaller by one):
tensor_with_valid_elements = ragged_tensor.values
...
new_ragged_tensor = ragged_tensor.with_values(new_tensor_with_valid_elements)
To obtain a tf.Tensor with the corresponding shape (so padding elements are added where needed):
tensor_with_padding = ragged_tensor.to_tensor()
...
new_ragged_tensor = tf.RaggedTensor.from_tensor(new_tensor_with_padding, ragged_tensor.row_lengths())
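To make the two layouts concrete, the following sketch reproduces them with plain NumPy instead of TensorFlow. It is only an analogy: the variable names are made up for the illustration, and the comments indicate which TensorFlow call each step corresponds to.

```python
import numpy as np

rows = [[1, 2, 3], [4], [5, 6]]  # a ragged batch: rows of different lengths
row_lengths = np.array([len(row) for row in rows])

# Analogue of ragged_tensor.values: only the valid elements, one rank lower.
values = np.concatenate(rows)

# Analogue of ragged_tensor.to_tensor(): a dense array padded with zeros.
padded = np.zeros((len(rows), row_lengths.max()), dtype=values.dtype)
mask = np.arange(row_lengths.max()) < row_lengths[:, None]
padded[mask] = values

# Analogue of with_values and from_tensor(..., row_lengths): either layout,
# together with the row lengths, is enough to rebuild the ragged rows.
from_values = np.split(values * 10, np.cumsum(row_lengths)[:-1])
from_padded = [row[:length] for row, length in zip(padded * 10, row_lengths)]
```

In both directions the row lengths carry the structure, which is why the second conversion above passes ragged_tensor.row_lengths() explicitly.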
How to see what is in a tf.data.Dataset?
The tf.data.Dataset is not just an array, but a description of a pipeline, which can produce data if requested. A simple way to run the pipeline is to iterate it using Python iterators:
dataset = tf.data.Dataset.range(10)
for entry in dataset:
    print(entry)
How to use tf.data.Dataset with model.fit or model.evaluate?
To use a tf.data.Dataset in Keras, the dataset elements should be pairs (input_data, gold_labels), where input_data and gold_labels must be already batched. For example, given the CAGS dataset, you can preprocess training data for cags_classification as (for development data, you would remove the .shuffle):
train = cags.train.map(lambda example: (example["image"], example["label"]))
train = train.shuffle(10_000, seed=args.seed)
train = train.batch(args.batch_size)
Is every iteration through a tf.data.Dataset the same?
No. Because the dataset is only a pipeline generating data, it is called each time the dataset is iterated – therefore, every .shuffle is called in every iteration.
How to generate different random numbers each epoch during tf.data.Dataset.map?
When a global random seed is set, methods like tf.random.uniform generate the same sequence of numbers on each iteration.
Instead, create a tf.random.Generator object and use it to produce random numbers.
generator = tf.random.Generator.from_seed(42)
data = tf.data.Dataset.from_tensor_slices(tf.zeros(10, tf.int32))
data = data.map(lambda x: x + generator.uniform([], maxval=10, dtype=tf.int32))
for _ in range(3):
    print(*[element.numpy() for element in data])
How to call numpy methods or other non-tf functions in tf.data.Dataset.map?
You can use tf.numpy_function to call a numpy function even in a computational graph. However, the results have no static shape information and you need to set it manually – ideally using tf.ensure_shape, which both sets the static shape and verifies during execution that the real shape matches it.
For example, to use the bboxes_training method from bboxes_utils, you could proceed as follows:
anchors = np.array(...)
def prepare_data(example):
    anchor_classes, anchor_bboxes = tf.numpy_function(
        bboxes_utils.bboxes_training, [anchors, example["classes"], example["bboxes"], 0.5], (tf.int32, tf.float32))
    anchor_classes = tf.ensure_shape(anchor_classes, [len(anchors)])
    anchor_bboxes = tf.ensure_shape(anchor_bboxes, [len(anchors), 4])
    ...
How to use ImageDataGenerator in tf.data.Dataset.map?
The ImageDataGenerator offers a .random_transform method, so we can use tf.numpy_function from the previous answer:
train_generator = tf.keras.preprocessing.image.ImageDataGenerator(...)
def augment(image, label):
    return tf.ensure_shape(
        tf.numpy_function(train_generator.random_transform, [image], tf.float32),
        image.shape
    ), label
dataset.map(augment)
How to make a part of the network frozen, so that its weights are not updated?
Each tf.keras.layers.Layer/tf.keras.Model has a mutable trainable property indicating whether its variables should be updated – however, after changing it, you need to call .compile again (or otherwise make sure the list of trainable variables for the optimizer is updated).
Note that once trainable == False, the insides of a layer are no longer considered, even if some of its sub-layers have trainable == True. Therefore, if you want to freeze only some sub-layers of a layer you use in your model, the layer itself must have trainable == True.
How to choose whether dropout/batch normalization is executed in training or inference regime?
When calling a tf.keras.layers.Layer/tf.keras.Model, a named option training can be specified, indicating whether training or inference regime should be used. For a model, this option is automatically passed to its layers which require it, and Keras automatically passes it during model.{fit,evaluate,predict}.
However, you can manually pass for example training=False to a layer when using Functional API, meaning that layer is executed in the inference regime even when the whole model is training.
How do trainable and training interact?
The only layer which is influenced by both these options is batch normalization, for which:
if trainable == False, the layer is always executed in inference regime;
if trainable == True, the training/inference regime is chosen according to the training option.
How to use linear warmup?
You can prepend a linear warmup to any following_schedule by using the following LinearWarmup schedule:
class LinearWarmup(tf.optimizers.schedules.LearningRateSchedule):
    def __init__(self, warmup_steps, following_schedule):
        self._warmup_steps = warmup_steps
        self._warmup = tf.optimizers.schedules.PolynomialDecay(0., warmup_steps, following_schedule(0))
        self._following = following_schedule
    def __call__(self, step):
        return tf.cond(step < self._warmup_steps,
                       lambda: self._warmup(step),
                       lambda: self._following(step - self._warmup_steps))
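The values such a schedule produces can be checked with a few lines of plain Python. The sketch below mirrors the same logic without TensorFlow; the halving schedule and its parameters are hypothetical and serve only as an example of a following schedule.

```python
def linear_warmup(step, warmup_steps, following_schedule):
    """Learning rate at step: a linear ramp from 0, then the following schedule."""
    if step < warmup_steps:
        # Linear interpolation between 0 and the first value of the schedule.
        return following_schedule(0) * step / warmup_steps
    # After the warmup, the following schedule starts from its own step 0.
    return following_schedule(step - warmup_steps)


def halving(step, initial=0.01, every=100):
    """Hypothetical following schedule: halve the rate every `every` steps."""
    return initial * 0.5 ** (step // every)


warmup_rates = [linear_warmup(step, 4, halving) for step in (0, 1, 2, 4, 104)]
```

Note that the following schedule receives step - warmup_steps, so its own decay is delayed by the length of the warmup rather than shortened by it.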
Cannot start TensorBoard after installation
If tensorboard executable cannot be found, make sure the directory with pip installed packages is in your PATH (that directory is either in your virtual environment if you use a virtual environment, or it should be ~/.local/bin on Linux and %UserProfile%\AppData\Roaming\Python\Python3[7-9] and %UserProfile%\AppData\Roaming\Python\Python3[7-9]\Scripts on Windows).
How to create TensorBoard logs manually?
Start by creating a SummaryWriter using for example:
writer = tf.summary.create_file_writer(args.logdir, flush_millis=10 * 1000)
and then you can generate logs inside a with writer.as_default() block.
You can either specify step manually in each call, or you can set it as the first argument of as_default(). Also, during training you usually want to log only some batches, so the logging block during training usually looks like:
if optimizer.iterations % 100 == 0:
    with self._writer.as_default(step=optimizer.iterations):
        # logging
What can be logged in TensorBoard?
scalar values:
tf.summary.scalar(name like "train/loss", value, [step])
tensor values displayed as histograms:
tf.summary.histogram(name like "train/output_layer", tensor value castable to `tf.float64`, [step])
images as tensors with shape [num_images, h, w, channels], where channels can be 1 (grayscale), 2 (grayscale + alpha), 3 (RGB), 4 (RGBA):
tf.summary.image(name like "train/samples", images, [step], [max_outputs=at most this many images])
text in Markdown format:
tf.summary.text(name like "hyperparameters", markdown, [step])
audio as tensors with shape [num_clips, samples, channels] and values in $[-1,1]$ range:
tf.summary.audio(name like "train/samples", clips, sample_rate, [step], [max_outputs=at most this many clips])
To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that all surplus points (both bonus and non-bonus) will be transferred to the exam. In total, assignments for at least 120 points (not including the bonus points) will be available, and if you solve all the assignments (any non-zero amount of points counts as solved), you automatically pass the exam with grade 1.
To pass the exam, you need to obtain at least 60, 75, or 90 points out of 100-point exam to receive a grade 3, 2, or 1, respectively. The exam consists of 100-point-worth questions from the list below (the questions are randomly generated, but in such a way that there is at least one question from every but the first lecture). In addition, you can get surplus points from the practicals and at most 10 points for community work (i.e., fixing slides or reporting issues) – but only the points you already have at the time of the exam count. You can take the exam without passing the practicals first.
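The exam arithmetic can be summarized by the short sketch below. The function exam_grade is hypothetical and reflects one reading of the rules above: surplus points from the practicals and community points (capped at 10) are simply added to the exam points before the thresholds are applied.

```python
def exam_grade(exam_points, surplus_points=0, community_points=0):
    """Return grade 1, 2, or 3, or None when the total is below 60 points."""
    # Community work contributes at most 10 points.
    total = exam_points + surplus_points + min(community_points, 10)
    for threshold, grade in ((90, 1), (75, 2), (60, 3)):
        if total >= threshold:
            return grade
    return None
```

For example, 70 exam points alone give grade 3, while 70 exam points together with 5 surplus points from the practicals reach the 75-point threshold for grade 2.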
Lecture 1 Questions
Considering a neural network with $D$ input neurons, a single hidden layer with $H$ neurons, $K$ output neurons, hidden activation $f$ and output activation $a$, list its parameters (including their size) and write down how the output is computed. [5]
List the definitions of frequently used MLP output layer activations (the ones producing parameters of a Bernoulli distribution and a categorical distribution). Then write down three commonly used hidden layer activations (sigmoid, tanh, ReLU). [5]
Formulate the Universal approximation theorem. [5]
Lecture 2 Questions
Describe maximum likelihood estimation, as minimizing NLL, cross-entropy and KL divergence. [10]
Define mean squared error and show how it can be derived using MLE. [5]
Describe gradient descent and compare it to stochastic (i.e., online) gradient descent and minibatch stochastic gradient descent. [5]
Formulate conditions on the sequence of learning rates used in SGD to converge to optimum almost surely. [5]
Write down the backpropagation algorithm. [5]
Write down the mini-batch SGD algorithm with momentum. Then, formulate SGD with Nesterov momentum and show the difference between them. [5]
Write down the AdaGrad algorithm and show that it tends to internally decay learning rate by a factor of $1/\sqrt{t}$ in step $t$. Then write down the RMSProp algorithm and explain how it solves the problem with the involuntary learning rate decay. [10]
Write down the Adam algorithm. Then show why the bias-correction terms $(1-\beta^t)$ make the estimation of the first and second moment unbiased. [10]
Lecture 3 Questions
Considering a neural network with $D$ input neurons, a single ReLU hidden layer with $H$ units and softmax output layer with $K$ units, write down the explicit formulas of the gradient of all the MLP parameters (two weight matrices and two bias vectors), assuming input $\boldsymbol x$, target $g$ and negative log likelihood loss. [10]
Assume a network with MSE loss generated a single output $o \in \mathbb{R}$, and the target output is $g$. What is the value of the loss function itself, and what is the explicit formula of the gradient of the loss function with respect to $o$? [5]
Assume a binary-classification network with cross-entropy loss generated a single output $z \in \mathbb{R}$, which is passed through the sigmoid output activation function, producing $o = \sigma(z)$. If the target output is $g$, what is the value of the loss function itself, and what is the explicit formula of the gradient of the loss function with respect to $z$? [5]
Assume a $K$-class-classification network with cross-entropy loss generated a $K$-element output $\boldsymbol z \in \mathbb{R}^K$, which is passed through the softmax output activation function, producing $\boldsymbol o=\operatorname{softmax}(\boldsymbol z)$. If the target distribution is $\boldsymbol g$, what is the value of the loss function itself, and what is the explicit formula of the gradient of the loss function with respect to $\boldsymbol z$? [5]
Define $L_2$ regularization and describe its effect both on the value of the loss function and on the value of the loss function gradient. [5]
Describe the dropout method and write down exactly how it is used during training and during inference. [5]
Describe how label smoothing works for cross-entropy loss, both for sigmoid and softmax activations. [5]
How are weights and biases initialized using the default Glorot initialization? [5]
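For the softmax cross-entropy question in this list, a candidate gradient formula can be checked numerically. The sketch below assumes the target distribution $\boldsymbol g$ sums to one, in which case the gradient with respect to $\boldsymbol z$ is $\boldsymbol o - \boldsymbol g$, and compares it with central finite differences. All function names and the example values of `z` and `g` are illustrative.

```python
import math


def softmax(z):
    shift = max(z)                                  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, g):
    """Cross-entropy between target distribution g and softmax(z)."""
    return -sum(gi * math.log(oi) for gi, oi in zip(g, softmax(z)))


def analytic_gradient(z, g):
    """Gradient of the loss with respect to z, valid when g sums to one."""
    return [oi - gi for oi, gi in zip(softmax(z), g)]


def numeric_gradient(z, g, eps=1e-6):
    """Central finite-difference approximation of the same gradient."""
    grad = []
    for i in range(len(z)):
        plus, minus = list(z), list(z)
        plus[i] += eps
        minus[i] -= eps
        grad.append((cross_entropy(plus, g) - cross_entropy(minus, g)) / (2 * eps))
    return grad


z = [1.0, -0.5, 2.0]
g = [0.1, 0.2, 0.7]
max_difference = max(
    abs(a - n) for a, n in zip(analytic_gradient(z, g), numeric_gradient(z, g))
)
```

The same finite-difference check applies to the sigmoid case, where the gradient with respect to $z$ is $o - g$ as well.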
Lecture 4 Questions
Write down the equation of how convolution of a given image is computed. Assume the input is an image $I$ of size $H \times W$ with $C$ channels, the kernel $K$ has size $N \times M$, the stride is $T \times S$, the operation performed is in fact cross-correlation (as usual in convolutional neural networks) and that $O$ output channels are computed. [5]
Explain both SAME and VALID padding schemes and write down the output size of a convolutional operation with an $N \times M$ kernel on an image of size $H \times W$ for both these padding schemes (stride is 1). [5]
Describe batch normalization including all its parameters, and write down the algorithm used during training and the algorithm used during inference. Be sure to state explicitly what is being normalized over in the case of fully connected layers and in the case of convolutional layers. [10]
Describe the overall architecture of VGG-19 (you do not need to remember the exact number of layers/filters, but you should describe which layers are used). [5]
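For the padding question in this list, the stride-1 output sizes can be sketched as a small helper. The function name `conv_output_size` is hypothetical. The sketch assumes the usual convention: VALID uses no padding, and SAME pads with zeros by a total of $N-1$ rows and $M-1$ columns so that the output keeps the input size.

```python
def conv_output_size(height, width, kernel_h, kernel_w, padding):
    """Spatial output size of a stride-1 convolution for the given padding scheme."""
    if padding == "VALID":
        # No padding: the kernel must fit entirely inside the image.
        return height - kernel_h + 1, width - kernel_w + 1
    if padding == "SAME":
        # Zero padding of kernel_h - 1 rows and kernel_w - 1 columns in total.
        return height, width
    raise ValueError(f"Unknown padding scheme: {padding!r}")


valid_size = conv_output_size(28, 28, 3, 3, "VALID")
same_size = conv_output_size(28, 28, 3, 3, "SAME")
```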
Lecture 5 Questions
Describe the overall architecture of ResNet. You do not need to remember the exact number of layers/filters, but you should draw a bottleneck block (including the applications of BatchNorms and ReLUs) and state how residual connections work when the number of channels increases. [10]
Draw the original ResNet block (including the exact positions of BatchNorms and ReLUs) and also the improved variant with full pre-activation. [5]
Compare the bottleneck block of ResNet and ResNeXt architectures (draw the latter using convolutions only, i.e., do not use grouped convolutions). [5]
Describe the CNN regularization method of networks with stochastic depth. [5]
Compare Cutout and DropBlock. [5]
Describe Squeeze and Excitation applied to a ResNet block. [5]
Draw the Mobile inverted bottleneck block (including an explanation of separable convolutions, the expansion factor, and the exact positions of BatchNorms and ReLUs, but without describing Squeeze and Excitation blocks). [5]
Assume an input image $I$ of size $H \times W$ with $C$ channels, and a convolutional kernel $K$ with size $N \times M$, stride $S$ and $O$ output channels. Write down (or derive) the equation of transposed convolution (or equivalently backpropagation through a convolution to its inputs). [5]
Lecture 6 Questions
Write down how $\mathit{AP}_{50}$ is computed. [5]
Considering a Fast-RCNN architecture, draw the overall network architecture, explain what a RoI-pooling layer is, show how the network parametrizes bounding boxes and write down the loss. Finally, describe non-maximum suppression and how the Fast-RCNN prediction is performed. [10]
Considering a Faster-RCNN architecture, describe the region proposal network (what are anchors, architecture including both heads, how are the coordinates of proposals parametrized, what does the loss look like). [10]
Considering the Mask-RCNN architecture, describe the additions to a Faster-RCNN architecture (the RoI-Align layer, the new mask-producing head). [5]
Write down the focal loss with class weighting, including the commonly used hyperparameter values. [5]
Draw the overall RetinaNet architecture (the FPN architecture including the block combining feature maps of different resolutions; the classification and bounding box generation heads, including their output size). [5]
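For the focal loss question in this list, a minimal sketch of the class-weighted binary focal loss $-\alpha_t (1-p_t)^\gamma \log p_t$ follows. The function name `focal_loss` is hypothetical, and the defaults $\gamma=2$, $\alpha=0.25$ are the values commonly reported for RetinaNet; the course materials may phrase the formula differently.

```python
import math


def focal_loss(p, target, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction.

    p      -- predicted probability of the positive class, in (0, 1)
    target -- 1 for a positive example, 0 for a negative one
    """
    p_t = p if target == 1 else 1.0 - p                  # probability of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha      # class weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy_positive = focal_loss(0.9, 1)    # confident and correct: strongly down-weighted
hard_positive = focal_loss(0.1, 1)    # confident and wrong: close to weighted cross-entropy
```

Setting `gamma=0` reduces the expression to class-weighted cross-entropy, which is a convenient sanity check.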
Lecture 7 Questions
Write down how the Long Short-Term Memory (LSTM) cell operates, including the explicit formulas. Also mention the forget gate bias. [10]
Write down how the Gated Recurrent Unit (GRU) operates, including the explicit formulas. [10]
Describe Highway network computation. [5]
Why can the usual dropout not be used on the recurrent state? Describe how the problem can be alleviated with variational dropout. [5]
Describe layer normalization including all its parameters, and write down how it is computed (be sure to state explicitly what is being normalized over in the case of fully connected layers and convolutional layers). [5]
Draw a tagger architecture utilizing word embeddings, recurrent character-level word embeddings (including how these are computed from individual characters), and two sentence-level bidirectional RNNs (explaining the bidirectionality) with a residual connection. Where would you put the dropout layers? [10]
Lecture 8 Questions
Considering a linear-chain CRF, write down how the score of a label sequence $\boldsymbol y$ is defined, and how its log probability can be computed using the label sequence scores. [5]
Write down the dynamic programming algorithm for computing log probability of a linear-chain CRF, including its asymptotic complexity. [10]
Write down the dynamic programming algorithm for linear-chain CRF decoding, i.e., the algorithm computing the most probable label sequence $\boldsymbol y$. [10]
In the context of CTC loss, describe regular and extended labelings and write down the algorithm for computing the log probability of a gold label sequence $\boldsymbol y$. [10]
Describe how CTC predictions are performed using beam search. [5]
Draw the CBOW architecture from word2vec, including the sizes of the inputs and the sizes of the outputs and used non-linearities. Also make sure to indicate where the embeddings are being trained. [5]
Draw the SkipGram architecture from word2vec, including the sizes of the inputs and the sizes of the outputs and used non-linearities. Also make sure to indicate where the embeddings are being trained. [5]
Describe the hierarchical softmax used in word2vec. [5]
Describe the negative sampling proposed in word2vec, including the choice of distribution of negative samples. [5]
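For the linear-chain CRF questions in this list, the log-probability computation can be sketched as follows. The sketch assumes the sequence score is the sum of per-position emission scores and transition scores between neighbouring labels, with no special start or end transitions; the names `emissions`, `transitions` and all function names are illustrative. The forward recursion runs in $\mathcal{O}(N \cdot Y^2)$ time for $N$ positions and $Y$ labels.

```python
import math


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def sequence_score(emissions, transitions, labels):
    """Score of one label sequence: emissions plus transitions between neighbours."""
    score = emissions[0][labels[0]]
    for t in range(1, len(labels)):
        score += transitions[labels[t - 1]][labels[t]] + emissions[t][labels[t]]
    return score


def log_partition(emissions, transitions):
    """Log of the sum of exp(score) over all label sequences (forward algorithm)."""
    num_labels = len(transitions)
    # alpha[k] = log-sum-exp of scores of all prefixes ending with label k
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            emissions[t][k]
            + logsumexp([alpha[j] + transitions[j][k] for j in range(num_labels)])
            for k in range(num_labels)
        ]
    return logsumexp(alpha)


def log_probability(emissions, transitions, labels):
    return sequence_score(emissions, transitions, labels) - log_partition(emissions, transitions)


# Toy example with 4 positions and 3 labels (values chosen arbitrarily).
emissions = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7], [0.0, 0.8, 0.1], [-0.4, 0.6, 1.1]]
transitions = [[0.2, -0.3, 0.5], [-1.0, 0.4, 0.1], [0.3, 0.0, -0.6]]
example_log_prob = log_probability(emissions, transitions, [0, 0, 1, 2])
```

Replacing `logsumexp` with `max` in the recursion (and remembering the maximizing predecessor) turns the same loop into Viterbi decoding.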
Lecture 10 Questions
Considering machine translation, draw a recurrent sequence-to-sequence architecture without attention, both during training and during inference (include embedding layers, recurrent cells, classification layers, argmax/softmax). [5]
Considering machine translation, draw a recurrent sequence-to-sequence architecture with attention, used during training (include embedding layers, recurrent cells, attention, classification layers). Then write down exactly how the attention is computed. [10]
Explain how word embedding tying is used in a sequence-to-sequence architecture, including the necessary scaling. [5]
Write down why subword units are used in text processing, and describe the BPE algorithm for constructing a subword dictionary from a large corpus. [5]
Write down why subword units are used in text processing, and describe the WordPieces algorithm for constructing a subword dictionary from a large corpus. [5]
Pinpoint the differences between the BPE and WordPieces algorithms, both during dictionary construction and during inference. [5]
Lecture 11 Questions
Describe the Transformer encoder architecture, including the description of self-attention (but you do not need to describe multi-head attention), FFN and positions of LNs and dropouts. [10]
Write down the formula of Transformer self-attention, and then describe multi-head self-attention in detail. [10]
Describe the Transformer decoder architecture, including the description of self-attention and masked self-attention (but you do not need to describe multi-head attention), FFN and positions of LNs and dropouts. Also discuss the difference between training and prediction regimes. [10]
Why are positional embeddings needed in the Transformer architecture? Write down the sinusoidal positional embeddings used in the Transformer. [5]
Compare RNN to Transformer – what are the strengths and weaknesses of these architectures? [5]
Explain how ELMo embeddings are trained and how they are used in downstream applications. [5]
Describe the BERT architecture (you do not need to describe the (multi-head) self-attention operation). Elaborate also on which positional embeddings are used and what the GELU activations are. [10]
Describe the GELU activations and explain why they are a combination of ReLUs and Dropout. [5]
Elaborate on the BERT training process (what the two objectives are and how exactly the corresponding losses are computed). [10]
Lecture 12 Questions
Define the Markov Decision Process, including the definition of the return. [5]
Define the value function, such that all expectations are over simple random variables (actions, states, rewards), not trajectories. [5]
Define the action-value function, such that all expectations are over simple random variables (actions, states, rewards), not trajectories. [5]
Express the value function using the action-value function, and express the action-value function using the value function. [5]
Formulate the policy gradient theorem. [5]
Prove the part of the policy gradient theorem showing the value of $\nabla_{\boldsymbol\theta} v_\pi(s)$. [10]
Assuming the policy gradient theorem, formulate the loss used by the REINFORCE algorithm and show how its gradient can be expressed as an expectation over states and actions. [5]
Write down the REINFORCE algorithm, including the loss formula. [10]
Show that introducing a baseline does not influence the validity of the policy gradient theorem. [5]
Write down the REINFORCE with baseline algorithm, including both loss formulas. [10]
Sketch the overall structure and training procedure of the Neural Architecture Search. You do not need to describe how exactly the block is produced by the controller. [5]
Write down the variational lower bound (ELBO) in the form of a reconstruction error minus the KL divergence between the encoder and the prior (i.e., in the form used for model training). Then prove it is actually a lower bound on the log-likelihood $\log P(\boldsymbol x)$. [10]
Draw an architecture of a variational autoencoder (VAE). Pay attention to the parametrization of the distribution from the encoder (including the used activation functions), and show how to perform latent variable sampling so that it is differentiable with respect to the encoder parameters (the reparametrization trick). [10]
Lecture 13 Questions
Write down the min-max formulation of generative adversarial network (GAN) objective. Then describe what loss is actually used for training the generator in order to avoid vanishing gradients at the beginning of the training. [5]
Write down the training algorithm of generative adversarial networks (GAN), including the losses minimized by the discriminator and the generator. Be sure to use the version of generator loss which avoids vanishing gradients at the beginning of the training. [10]
Explain how the class label is used when training a conditional generative adversarial network (CGAN). [5]
Illustrate that alternating SGD steps are not guaranteed to converge for a min-max problem. [5]
Assuming a data point $\boldsymbol x_0$ and a variance schedule $\beta_1, \ldots, \beta_T$, define the forward diffusion process $q$. [5]
Assuming a variance schedule $\beta_1, \ldots, \beta_T$, state and prove the form of the forward diffusion marginal $q(\boldsymbol x_t | \boldsymbol x_0)$. [10]
Write down the diffusion marginal $q(\boldsymbol x_t | \boldsymbol x_0)$ and the formulas of the cosine schedule of the signal rate and the noise rate. [5]
Write down the DDPM training algorithm, including the formula of the loss. [5]
Specify the inputs and outputs of the DDPM model, and describe its architecture – what the overall structure looks like (ResNet blocks, downsampling and upsampling, self-attention blocks), how the time is represented, and what the conditioning on an input image and an input text looks like. [10]
Define the forward DDIM process, and show what its forward diffusion marginal $q_0(\boldsymbol x_t | \boldsymbol x_0)$ looks like. [5]
Write down the DDIM sampling algorithm. [5]
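For the forward diffusion marginal questions in this list, the closed form can be sanity-checked numerically. The sketch assumes the standard DDPM forward step $q(\boldsymbol x_t | \boldsymbol x_{t-1}) = \mathcal{N}(\sqrt{1-\beta_t}\, \boldsymbol x_{t-1}, \beta_t \boldsymbol I)$ and compares the step-by-step composition of these Gaussians with the closed form $\mathcal{N}(\sqrt{\bar\alpha_t}\, \boldsymbol x_0, (1-\bar\alpha_t) \boldsymbol I)$, where $\bar\alpha_t = \prod_{i \le t} (1-\beta_i)$. The function names and the linear schedule are illustrative assumptions.

```python
import math


def marginal_by_recursion(betas):
    """Compose the forward steps one by one.

    Returns (c, v) such that x_t given x_0 is distributed as N(c * x_0, v * I).
    """
    mean_coef, variance = 1.0, 0.0
    for beta in betas:
        mean_coef *= math.sqrt(1.0 - beta)           # the mean is scaled in every step
        variance = (1.0 - beta) * variance + beta    # old noise is scaled, new noise is added
    return mean_coef, variance


def marginal_closed_form(betas):
    """Closed form using alpha_bar, the product of (1 - beta_i)."""
    alpha_bar = math.prod(1.0 - beta for beta in betas)
    return math.sqrt(alpha_bar), 1.0 - alpha_bar


# Linear schedule from 1e-4 to 0.02 over 1000 steps (an assumed, commonly used choice).
betas = [1e-4 + (0.02 - 1e-4) * i / 999 for i in range(1000)]
recursive = marginal_by_recursion(betas)
closed = marginal_closed_form(betas)
```

The variance recursion is the inductive step of the proof: if $v_{t-1} = 1 - \bar\alpha_{t-1}$, then $(1-\beta_t) v_{t-1} + \beta_t = 1 - \bar\alpha_t$.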
Lecture 14 Questions
Draw the WaveNet architecture (show the overall architecture, explain dilated convolutions, write down the gated activations, describe global and local conditioning). [10]
Define the Mixture of Logistic distribution used in Parallel WaveNet, including the explicit formula of computing the likelihood of the data. [5]
Describe the changes in the Student model of Parallel WaveNet which allow efficient sampling (what the latent prior looks like, and how the output data distribution is modeled in a single iteration and then after multiple iterations). [5]
Write down the loss used for training the Student model in Parallel WaveNet, then rewrite the cross-entropy part as a sum of per-time-step cross-entropies, and explain how the per-time-step cross-entropies are estimated. [10]
Describe the addressing mechanism used in Neural Turing Machines – show the overall structure including the required parameters, and explain content addressing, interpolation with location addressing, shifting and sharpening. [10]
Explain the overall architecture of a Neural Turing Machine with an LSTM controller, assuming $R$ reading heads and one write head. Describe the inputs and outputs of the LSTM controller itself, then how the memory is read from and written to, and how the final output is computed. You do not need to write down the implementation of the addressing mechanism (you can assume it is a function which gets parameters, memory and previous distribution, and computes a new distribution over memory cells). [10]