Deep Learning – Summer 2025/26

The objective of this course is to provide a comprehensive introduction to deep neural networks, which have consistently demonstrated superior performance across diverse domains, notably in processing and generating images, text, and speech.

The course covers both theory, spanning from the basics to the latest advances, and practical implementation in Python and PyTorch (students implement and train deep neural networks performing image classification, image segmentation, object detection, part-of-speech tagging, lemmatization, speech recognition, reading comprehension, and image generation). Basic Python skills are required, but no previous knowledge of artificial neural networks is needed; a basic understanding of machine learning is advantageous.

Students work either individually or in small teams on weekly assignments, including competition tasks, where the goal is to obtain the highest performance in the class.

Optionally, you can obtain a micro-credential after passing the course.

About

SIS code: NPFL138
Semester: summer
E-credits: 8
Examination: 3/4 C+Ex
Guarantor: Milan Straka

Timespace Coordinates

  • lectures: Czech lecture is held on Tuesday 13:10 in S5, English lecture on Tuesday 16:30 in S5; first lecture is on Feb 17
  • practicals: there are two parallel practicals, a Czech one on Thursday 9:00 in S5, and an English one on Thursday 10:40 in S5; first practicals are on Feb 19
  • consultations: entirely optional consultations take place on Wednesday 12:20 in S5; first consultations are on Feb 25

All lectures and practicals will be recorded and available on this website.

Lectures

1. Introduction to Deep Learning (slides, lecture and practicals recordings, questions; assignments: numpy_entropy, pca_first, mnist_layers_activations)

2. Training Neural Networks (slides, lecture and practicals recordings, questions; assignments: sgd_backpropagation, sgd_manual, mnist_training, gym_cartpole)

License

Unless otherwise stated, teaching materials for this course are available under CC BY-SA 4.0.

A micro-credential (aka micro-certificate) is a digital certificate attesting that you have gained knowledge and skills in a specific area. It should be internationally recognized and verifiable using an online EU-wide verification system.

A micro-credential can be obtained both by university students and by external participants.

External Participants

If you are not a university student, you can apply to the Deep Learning micro-credential course here and then attend the course alongside the university students. Upon successfully passing the course, a micro-credential is issued.

The price of the course is 5 000 Kč. If you require a tax receipt, please inform Magdaléna Kokešová within three business days after the payment.

The lectures run for 14 weeks from Feb 17 to May 22, with the examination period continuing until the end of September. Please note that the organization of the course and the setup instructions will be described at the first lecture; if you have already applied, you do not need to do anything else until that time.

University Students

If you have passed the course (in academic year 2024/25 or later) as a part of your study plan, you can obtain a micro-credential by paying only an administrative fee of 300 Kč; if you passed the course but it is not in your study plan, the administrative fee is 500 Kč. Detailed instructions on how to obtain the micro-credential will be sent to the course participants during the examination period.


Below is the lecture content, including references to study materials. The main study material is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (referred to as DLB).

References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.

1. Introduction to Deep Learning

 Feb 17 (slides, lecture and practicals recordings, questions; assignments: numpy_entropy, pca_first, mnist_layers_activations)

  • Random variables, probability distributions, expectation, variance, Bernoulli distribution, Categorical distribution [Sections 3.2, 3.3, 3.8, 3.9.1 and 3.9.2 of DLB]
  • Self-information, entropy, cross-entropy, KL-divergence [Section 3.13 of DLB]
  • Gaussian distribution [Section 3.9.3 of DLB]
  • Machine Learning Basics [Section 5.1-5.1.3 of DLB]
  • History of Deep Learning [Section 1.2 of DLB]
  • Linear regression [Section 5.1.4 of DLB]
  • Challenges Motivating Deep Learning [Section 5.11 of DLB]
  • Neural network basics
    • Neural networks as graphs [Chapter 6 before Section 6.1 of DLB]
    • Output activation functions [Section 6.2.2 of DLB, excluding Section 6.2.2.4]
    • Hidden activation functions [Section 6.3 of DLB, excluding Section 6.3.3]
    • Basic network architectures [Section 6.4 of DLB, excluding Section 6.4.2]
  • Universal approximation theorem

2. Training Neural Networks

 Feb 24 (slides, lecture and practicals recordings, questions; assignments: sgd_backpropagation, sgd_manual, mnist_training, gym_cartpole)

  • Capacity, overfitting, underfitting, regularization [Section 5.2 of DLB]
  • Hyperparameters and validation sets [Section 5.3 of DLB]
  • Maximum Likelihood Estimation [Section 5.5 of DLB]
  • Neural network training
    • Gradient Descent and Stochastic Gradient Descent [Sections 4.3 and 5.9 of DLB]
    • Backpropagation algorithm [Sections 6.5 to 6.5.3 of DLB, especially Algorithms 6.1 and 6.2; note that Algorithms 6.5 and 6.6 are used in practice]
    • SGD algorithm [Section 8.3.1 and Algorithm 8.1 of DLB]
    • SGD with Momentum algorithm [Section 8.3.2 and Algorithm 8.2 of DLB]
    • SGD with Nesterov Momentum algorithm [Section 8.3.3 and Algorithm 8.3 of DLB]
    • Optimization algorithms with adaptive gradients
      • AdaGrad algorithm [Section 8.5.1 and Algorithm 8.4 of DLB]
      • RMSProp algorithm [Section 8.5.2 and Algorithm 8.5 of DLB]
      • Adam algorithm [Section 8.5.3 and Algorithm 8.7 of DLB]
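As a concrete illustration of the adaptive-gradient updates above, here is a minimal NumPy sketch of the Adam update (following Algorithm 8.7 of DLB); the toy objective f(x) = x² and all hyperparameter values are only for demonstration:

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update following DLB Algorithm 8.7.
    m = beta1 * m + (1 - beta1) * grad        # update biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # update biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(x) = x^2 (gradient 2x) starting from x = 1.
x, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 501):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
print(x)  # x has moved from 1 toward the minimum at 0
```

Note how the very first update moves the parameter by approximately lr regardless of the gradient magnitude, because the bias-corrected moments satisfy m_hat/√v_hat ≈ sign(grad) at t = 1.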

Requirements

To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that all surplus points (both bonus and non-bonus) will be transferred to the exam. In total, assignments for at least 120 points (not including the bonus points) will be available, and if you solve all the assignments (any non-zero amount of points counts as solved), you automatically pass the exam with grade 1.

Environment

The tasks are evaluated automatically using the ReCodEx Code Examiner.

The evaluation is performed using Python 3.11, PyTorch, Python Image Models, HF Transformers, and Gymnasium. You should install the npfl138 package, which depends on the exact versions of the packages we will be using. The documentation of the npfl138 package is available here.

Teamwork

Solving assignments in teams (of size at most 3) is encouraged, but everyone has to participate (it is forbidden not to work on an assignment and then submit a solution created by other team members). All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. Each such solution must explicitly list all members of the team to allow plagiarism detection using this template.

No Cheating

Cheating is strictly prohibited and any student found cheating will be punished. The punishment can involve failing the whole course, or, in grave cases, being expelled from the faculty. While discussing assignments with any classmate is fine, each team must complete the assignments themselves, without using code they did not write (unless explicitly allowed). Of course, inside a team you are allowed to share code and submit identical solutions. Note that all students involved in cheating will be punished, so if you share your source code with a friend, both you and your friend will be punished. That also means that you should never publish your solutions.

AI Assistance when Solving Assignments

Relying blindly on AI during learning seems to have a negative¹ effect² on skill acquisition. Therefore, you are not allowed to directly copy the assignment descriptions to GenAI, and you are not allowed to directly use or copy-paste source code generated by GenAI. However, discussing your manually written code with GenAI is fine.

numpy_entropy

 Deadline: Mar 04, 22:00  2 points

The goal of this exercise is to familiarize yourself with Python, NumPy, and the ReCodEx submission system. Start with the numpy_entropy.py template.

Load a file specified in args.data_path, whose lines consist of data points of our dataset, and load a file specified in args.model_path, which describes a model probability distribution, with each line being a tab-separated pair of (data point, probability).

Then compute the following quantities using NumPy, and print each of them on a separate line rounded to two decimal places (or inf for positive infinity, which happens when an element of the data distribution has zero probability under the model distribution):

  • entropy H(data distribution)
  • cross-entropy H(data distribution, model distribution)
  • KL-divergence DKL(data distribution, model distribution)

Use natural logarithms to compute the entropies and the divergence.
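The computation itself is a few lines of NumPy; here is a sketch with hypothetical hardcoded distributions (the actual assignment estimates the data distribution from the loaded files):

```python
import numpy as np

# Hypothetical distributions over the same three outcomes.
data = np.array([0.5, 0.25, 0.25])    # empirical data distribution
model = np.array([0.25, 0.5, 0.25])   # model distribution

entropy = -np.sum(data * np.log(data))         # H(data)
cross_entropy = -np.sum(data * np.log(model))  # H(data, model)
kl_divergence = cross_entropy - entropy        # D_KL(data || model)

print(f"Entropy: {entropy:.2f} nats")          # Entropy: 1.04 nats
print(f"Crossentropy: {cross_entropy:.2f} nats")   # Crossentropy: 1.21 nats
print(f"KL divergence: {kl_divergence:.2f} nats")  # KL divergence: 0.17 nats
```

If some model probability is zero where the data probability is not, np.log produces -inf (with a runtime warning), so the cross-entropy and KL divergence correctly come out as inf.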

  1. python3 numpy_entropy.py --data_path numpy_entropy_data_1.txt --model_path numpy_entropy_model_1.txt
Entropy: 0.96 nats
Crossentropy: 0.99 nats
KL divergence: 0.03 nats
  2. python3 numpy_entropy.py --data_path numpy_entropy_data_2.txt --model_path numpy_entropy_model_2.txt
Entropy: 0.96 nats
Crossentropy: inf nats
KL divergence: inf nats
  3. This test uses data available only in ReCodEx. They are analogous to the numpy_entropy_data_3.txt and numpy_entropy_model_3.txt but are generated with a different random seed.

  4. This test uses data available only in ReCodEx. They are analogous to the numpy_entropy_data_4.txt and numpy_entropy_model_4.txt but are generated with a different random seed.

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

  • python3 numpy_entropy.py --data_path numpy_entropy_data_1.txt --model_path numpy_entropy_model_1.txt
Entropy: 0.96 nats
Crossentropy: 0.99 nats
KL divergence: 0.03 nats
  • python3 numpy_entropy.py --data_path numpy_entropy_data_2.txt --model_path numpy_entropy_model_2.txt
Entropy: 0.96 nats
Crossentropy: inf nats
KL divergence: inf nats
  • python3 numpy_entropy.py --data_path numpy_entropy_data_3.txt --model_path numpy_entropy_model_3.txt
Entropy: 4.15 nats
Crossentropy: 4.23 nats
KL divergence: 0.08 nats
  • python3 numpy_entropy.py --data_path numpy_entropy_data_4.txt --model_path numpy_entropy_model_4.txt
Entropy: 4.99 nats
Crossentropy: 5.03 nats
KL divergence: 0.04 nats

pca_first

 Deadline: Mar 04, 22:00  2 points

The goal of this exercise is to familiarize yourself with PyTorch torch.Tensors, shapes and basic tensor manipulation methods. Start with the pca_first.py template.

In this assignment, you should compute the covariance matrix of several examples from the MNIST dataset, then compute the first principal component, and quantify the variance it explains. It is fine if you are not familiar with terms like covariance matrix or principal component – the template contains a detailed description of what you have to do.

Finally, you might want to read the Introduction to PyTorch Tensors.
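As an illustration of the tensor manipulations involved, here is a sketch on random data instead of MNIST, using power iteration to find the first principal component (the template may prescribe a different approach; the data shape and iteration count are made up):

```python
import torch

torch.manual_seed(42)
# Toy data: 100 examples with 5 features; the first feature has much larger variance.
data = torch.randn(100, 5) * torch.tensor([3.0, 1.0, 1.0, 1.0, 1.0])

mean = data.mean(dim=0)
centered = data - mean
cov = centered.T @ centered / (data.shape[0] - 1)   # covariance matrix, shape (5, 5)

# Power iteration: repeatedly multiplying a vector by the covariance matrix and
# renormalizing converges to the eigenvector with the largest eigenvalue.
v = torch.ones(cov.shape[0])
for _ in range(64):
    v = cov @ v
    v = v / v.norm()

# Fraction of the total variance explained by the first principal component.
explained = (v @ cov @ v) / torch.trace(cov)
print(f"Explained variance: {100 * explained.item():.2f}%")
```

Because the first feature dominates the variance, the computed component is nearly aligned with it and explains most of the total variance.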

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

  1. python3 pca_first.py --examples=1024 --iterations=64
Total variance: 53.12
Explained variance: 9.64%
  2. python3 pca_first.py --examples=8192 --iterations=128
Total variance: 53.05
Explained variance: 9.89%
  3. python3 pca_first.py --examples=55000 --iterations=1024
Total variance: 52.74
Explained variance: 9.71%

mnist_layers_activations

 Deadline: Mar 04, 22:00  2 points

Before solving the assignment, start by playing with example_pytorch_tensorboard.py in order to familiarize yourself with PyTorch and TensorBoard. After running the example, start TensorBoard in the same directory using tensorboard --logdir logs, open http://localhost:6006 in a browser, and explore the generated logs.

Your goal is to modify the mnist_layers_activations.py template such that a user-specified neural network is constructed:

  • The number of hidden layers (including zero) can be specified on the command line using the hidden_layers parameter.
  • The activation function of these hidden layers can also be specified via the activation parameter, with supported values none, relu, tanh, and sigmoid.
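One possible way to build such a configurable network is with torch.nn.Sequential; in the sketch below, the hidden layer size of 100 and the helper name build_model are illustrative assumptions, not part of the template:

```python
import torch

def build_model(hidden_layers: int, activation: str) -> torch.nn.Sequential:
    # Map the command-line value to a torch activation module; "none" means identity.
    activations = {"none": torch.nn.Identity, "relu": torch.nn.ReLU,
                   "tanh": torch.nn.Tanh, "sigmoid": torch.nn.Sigmoid}
    layers: list[torch.nn.Module] = [torch.nn.Flatten()]
    inputs = 28 * 28                                 # MNIST images are 28x28
    for _ in range(hidden_layers):
        layers.append(torch.nn.Linear(inputs, 100))  # hidden size 100 is illustrative
        layers.append(activations[activation]())
        inputs = 100
    layers.append(torch.nn.Linear(inputs, 10))       # 10 MNIST classes (logits)
    return torch.nn.Sequential(*layers)

model = build_model(hidden_layers=3, activation="relu")
print(model(torch.zeros(2, 28, 28)).shape)  # torch.Size([2, 10])
```

With hidden_layers=0 this degenerates to plain multinomial logistic regression, which is why the no-hidden-layer runs below plateau at much lower accuracy.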

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

  1. python3 mnist_layers_activations.py --recodex --epochs=1 --hidden_layers=0 --activation=none
Epoch 1/1 1.1s loss=0.5340 accuracy=0.8632 dev:loss=0.2762 dev:accuracy=0.9278
  2. python3 mnist_layers_activations.py --recodex --epochs=1 --hidden_layers=1 --activation=none
Epoch 1/1 1.6s loss=0.3791 accuracy=0.8913 dev:loss=0.2372 dev:accuracy=0.9314
  3. python3 mnist_layers_activations.py --recodex --epochs=1 --hidden_layers=1 --activation=relu
Epoch 1/1 1.7s loss=0.3149 accuracy=0.9110 dev:loss=0.1458 dev:accuracy=0.9608
  4. python3 mnist_layers_activations.py --recodex --epochs=1 --hidden_layers=1 --activation=tanh
Epoch 1/1 1.7s loss=0.3333 accuracy=0.9049 dev:loss=0.1613 dev:accuracy=0.9582
  5. python3 mnist_layers_activations.py --recodex --epochs=1 --hidden_layers=1 --activation=sigmoid
Epoch 1/1 1.7s loss=0.4900 accuracy=0.8782 dev:loss=0.2185 dev:accuracy=0.9390
  6. python3 mnist_layers_activations.py --recodex --epochs=1 --hidden_layers=3 --activation=relu
Epoch 1/1 2.1s loss=0.2736 accuracy=0.9194 dev:loss=0.1089 dev:accuracy=0.9676

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

  • python3 mnist_layers_activations.py --hidden_layers=0 --activation=none
Epoch  1/10 1.1s loss=0.5374 accuracy=0.8614 dev:loss=0.2768 dev:accuracy=0.9270
Epoch  5/10 1.1s loss=0.2779 accuracy=0.9220 dev:loss=0.2201 dev:accuracy=0.9430
Epoch 10/10 1.1s loss=0.2591 accuracy=0.9278 dev:loss=0.2139 dev:accuracy=0.9432
  • python3 mnist_layers_activations.py --hidden_layers=1 --activation=none
Epoch  1/10 1.7s loss=0.3791 accuracy=0.8922 dev:loss=0.2400 dev:accuracy=0.9290
Epoch  5/10 1.7s loss=0.2775 accuracy=0.9225 dev:loss=0.2217 dev:accuracy=0.9396
Epoch 10/10 1.7s loss=0.2645 accuracy=0.9247 dev:loss=0.2264 dev:accuracy=0.9378
  • python3 mnist_layers_activations.py --hidden_layers=1 --activation=relu
Epoch  1/10 1.7s loss=0.3178 accuracy=0.9104 dev:loss=0.1482 dev:accuracy=0.9566
Epoch  5/10 1.9s loss=0.0627 accuracy=0.9811 dev:loss=0.0827 dev:accuracy=0.9786
Epoch 10/10 1.9s loss=0.0240 accuracy=0.9930 dev:loss=0.0782 dev:accuracy=0.9810
  • python3 mnist_layers_activations.py --hidden_layers=1 --activation=tanh
Epoch  1/10 1.7s loss=0.3318 accuracy=0.9061 dev:loss=0.1632 dev:accuracy=0.9530
Epoch  5/10 1.7s loss=0.0732 accuracy=0.9798 dev:loss=0.0837 dev:accuracy=0.9768
Epoch 10/10 1.8s loss=0.0254 accuracy=0.9943 dev:loss=0.0733 dev:accuracy=0.9790
  • python3 mnist_layers_activations.py --hidden_layers=1 --activation=sigmoid
Epoch  1/10 1.7s loss=0.4985 accuracy=0.8788 dev:loss=0.2156 dev:accuracy=0.9382
Epoch  5/10 1.8s loss=0.1249 accuracy=0.9641 dev:loss=0.1077 dev:accuracy=0.9698
Epoch 10/10 1.8s loss=0.0605 accuracy=0.9837 dev:loss=0.0781 dev:accuracy=0.9762
  • python3 mnist_layers_activations.py --hidden_layers=3 --activation=relu
Epoch  1/10 2.1s loss=0.2700 accuracy=0.9213 dev:loss=0.1188 dev:accuracy=0.9680
Epoch  5/10 2.2s loss=0.0477 accuracy=0.9849 dev:loss=0.0787 dev:accuracy=0.9794
Epoch 10/10 2.3s loss=0.0248 accuracy=0.9916 dev:loss=0.1015 dev:accuracy=0.9762
  • python3 mnist_layers_activations.py --hidden_layers=10 --activation=relu
Epoch  1/10 3.4s loss=0.3562 accuracy=0.8911 dev:loss=0.1556 dev:accuracy=0.9598
Epoch  5/10 3.9s loss=0.0864 accuracy=0.9764 dev:loss=0.1164 dev:accuracy=0.9686
Epoch 10/10 4.0s loss=0.0474 accuracy=0.9874 dev:loss=0.0877 dev:accuracy=0.9774
  • python3 mnist_layers_activations.py --hidden_layers=10 --activation=sigmoid
Epoch  1/10 3.1s loss=1.9711 accuracy=0.1803 dev:loss=1.8477 dev:accuracy=0.2148
Epoch  5/10 3.2s loss=0.9947 accuracy=0.5815 dev:loss=0.8246 dev:accuracy=0.6392
Epoch 10/10 3.2s loss=0.4406 accuracy=0.8924 dev:loss=0.4239 dev:accuracy=0.8992

sgd_backpropagation

 Deadline: Mar 11, 22:00  3 points

In this exercise you will learn how to compute gradients using so-called automatic differentiation, which automatically runs the backpropagation algorithm for a given computation. If interested, you can read the Automatic Differentiation with torch.autograd tutorial. After computing the gradient, you should perform training by running a manually implemented minibatch stochastic gradient descent.

Starting with the sgd_backpropagation.py template, you should:

  • implement a neural network with a single tanh hidden layer and categorical output layer;
  • compute the crossentropy loss;
  • use .backward() to automatically compute the gradient of the loss with respect to all variables;
  • perform the SGD update.

This assignment also demonstrates the most important parts of the npfl138.TrainableModule that we are using.
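The four steps above can be sketched as follows on toy data standing in for MNIST batches (all sizes and the learning rate are illustrative):

```python
import torch

torch.manual_seed(42)
# Toy data: 8 examples, 4 features, 3 classes.
inputs, targets = torch.randn(8, 4), torch.randint(3, (8,))

# Parameters of a network with a single tanh hidden layer, created with requires_grad.
W1 = torch.randn(4, 5, requires_grad=True)
b1 = torch.zeros(5, requires_grad=True)
W2 = torch.randn(5, 3, requires_grad=True)
b2 = torch.zeros(3, requires_grad=True)

learning_rate = 0.1
for _ in range(100):
    hidden = torch.tanh(inputs @ W1 + b1)           # hidden layer
    logits = hidden @ W2 + b2                       # categorical output layer (logits)
    loss = torch.nn.functional.cross_entropy(logits, targets)
    loss.backward()                                 # autograd fills .grad of all parameters
    with torch.no_grad():                           # the update itself must not be recorded
        for parameter in (W1, b1, W2, b2):
            parameter -= learning_rate * parameter.grad
            parameter.grad = None                   # reset the gradient for the next step
print(loss.item())  # final training loss, much lower than at initialization
```

The torch.no_grad() context is essential: without it, the parameter updates would themselves become part of the computation graph.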

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

  1. python3 sgd_backpropagation.py --epochs=2 --batch_size=64 --hidden_layer_size=20 --learning_rate=0.1
Dev accuracy after epoch 1 is 92.98
Dev accuracy after epoch 2 is 94.42
Test accuracy after epoch 2 is 92.72
  2. python3 sgd_backpropagation.py --epochs=2 --batch_size=100 --hidden_layer_size=32 --learning_rate=0.2
Dev accuracy after epoch 1 is 93.58
Dev accuracy after epoch 2 is 95.26
Test accuracy after epoch 2 is 93.75

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

  • python3 sgd_backpropagation.py --batch_size=64 --hidden_layer_size=20 --learning_rate=0.1
Dev accuracy after epoch 1 is 92.98
Dev accuracy after epoch 2 is 94.42
Dev accuracy after epoch 3 is 94.68
Dev accuracy after epoch 4 is 95.08
Dev accuracy after epoch 5 is 95.28
Dev accuracy after epoch 6 is 95.20
Dev accuracy after epoch 7 is 95.52
Dev accuracy after epoch 8 is 95.32
Dev accuracy after epoch 9 is 95.66
Dev accuracy after epoch 10 is 95.84
Test accuracy after epoch 10 is 95.02
  • python3 sgd_backpropagation.py --batch_size=100 --hidden_layer_size=32 --learning_rate=0.2
Dev accuracy after epoch 1 is 93.58
Dev accuracy after epoch 2 is 95.26
Dev accuracy after epoch 3 is 95.66
Dev accuracy after epoch 4 is 95.90
Dev accuracy after epoch 5 is 96.26
Dev accuracy after epoch 6 is 96.52
Dev accuracy after epoch 7 is 96.52
Dev accuracy after epoch 8 is 96.74
Dev accuracy after epoch 9 is 96.74
Dev accuracy after epoch 10 is 96.62
Test accuracy after epoch 10 is 95.84

sgd_manual  

 Deadline: Mar 11, 22:00  2 points

The goal in this exercise is to extend your solution to the sgd_backpropagation assignment by manually computing the gradient.

While in this assignment we compute the gradient manually, we will nearly always use automatic differentiation instead. The assignment is therefore more of a mathematical exercise than a real-world application. Furthermore, we will compute the derivatives together during the practicals on Mar 06.
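For a single tanh hidden layer with a softmax output and cross-entropy loss, the manual gradients have a well-known closed form; the following sketch derives them and checks the result against autograd (the shapes and data are illustrative, not those of the template):

```python
import torch

torch.manual_seed(42)
inputs, targets = torch.randn(8, 4), torch.randint(3, (8,))
W1 = torch.randn(4, 5, requires_grad=True)
b1 = torch.zeros(5, requires_grad=True)
W2 = torch.randn(5, 3, requires_grad=True)
b2 = torch.zeros(3, requires_grad=True)

hidden = torch.tanh(inputs @ W1 + b1)
logits = hidden @ W2 + b2
loss = torch.nn.functional.cross_entropy(logits, targets)
loss.backward()  # reference gradients via autograd

with torch.no_grad():
    batch = inputs.shape[0]
    # d loss / d logits for softmax + cross-entropy is (softmax - one_hot) / batch.
    d_logits = (torch.softmax(logits, dim=1)
                - torch.nn.functional.one_hot(targets, 3)) / batch
    grad_W2 = hidden.T @ d_logits
    grad_b2 = d_logits.sum(dim=0)
    # Backpropagate through the hidden layer; tanh'(x) = 1 - tanh(x)^2.
    d_hidden = (d_logits @ W2.T) * (1 - hidden ** 2)
    grad_W1 = inputs.T @ d_hidden
    grad_b1 = d_hidden.sum(dim=0)

print(torch.allclose(grad_W1, W1.grad, atol=1e-6))  # True
```

Comparing each manual gradient to the corresponding .grad field is also a handy debugging technique for the assignment itself.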

Start with the sgd_manual.py template, which is based on the sgd_backpropagation.py one.

Note that ReCodEx disables the PyTorch automatic differentiation during evaluation.

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

  1. python3 sgd_manual.py --epochs=2 --batch_size=64 --hidden_layer_size=20 --learning_rate=0.1
Dev accuracy after epoch 1 is 92.98
Dev accuracy after epoch 2 is 94.42
Test accuracy after epoch 2 is 92.72
  2. python3 sgd_manual.py --epochs=2 --batch_size=100 --hidden_layer_size=32 --learning_rate=0.2
Dev accuracy after epoch 1 is 93.58
Dev accuracy after epoch 2 is 95.26
Test accuracy after epoch 2 is 93.75

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

  • python3 sgd_manual.py --batch_size=64 --hidden_layer_size=20 --learning_rate=0.1
Dev accuracy after epoch 1 is 92.98
Dev accuracy after epoch 2 is 94.42
Dev accuracy after epoch 3 is 94.68
Dev accuracy after epoch 4 is 95.08
Dev accuracy after epoch 5 is 95.28
Dev accuracy after epoch 6 is 95.20
Dev accuracy after epoch 7 is 95.52
Dev accuracy after epoch 8 is 95.32
Dev accuracy after epoch 9 is 95.66
Dev accuracy after epoch 10 is 95.84
Test accuracy after epoch 10 is 95.02
  • python3 sgd_manual.py --batch_size=100 --hidden_layer_size=32 --learning_rate=0.2
Dev accuracy after epoch 1 is 93.58
Dev accuracy after epoch 2 is 95.26
Dev accuracy after epoch 3 is 95.66
Dev accuracy after epoch 4 is 95.90
Dev accuracy after epoch 5 is 96.26
Dev accuracy after epoch 6 is 96.52
Dev accuracy after epoch 7 is 96.52
Dev accuracy after epoch 8 is 96.74
Dev accuracy after epoch 9 is 96.74
Dev accuracy after epoch 10 is 96.62
Test accuracy after epoch 10 is 95.84

mnist_training

 Deadline: Mar 11, 22:00  2 points

This exercise should teach you to use different optimizers, learning rates, and learning rate schedules. Your goal is to modify the mnist_training.py template and implement the following:

  • Using specified optimizer (either SGD or Adam).
  • Optionally using momentum for the SGD optimizer.
  • Using specified learning rate for the optimizer.
  • Optionally use a given learning rate schedule, either linear, exponential, or cosine. If a schedule is specified, you are also given a final learning rate, and the learning rate should be gradually decreased during training so that it reaches the final learning rate just after training ends (i.e., the first update after the training would use exactly the final learning rate).
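The three schedules can be expressed as functions of the training progress t ∈ [0, 1], where progress 0 yields the initial learning rate and progress 1 the final one; a sketch (the function name and formulation are illustrative; the template may instead use torch.optim.lr_scheduler):

```python
import math

def scheduled_lr(decay: str, progress: float, lr: float, lr_final: float) -> float:
    """Learning rate at training progress in [0, 1]; progress 1 yields lr_final."""
    if decay == "linear":
        return lr + progress * (lr_final - lr)
    if decay == "exponential":
        return lr * (lr_final / lr) ** progress       # geometric interpolation
    if decay == "cosine":
        return lr_final + 0.5 * (1 + math.cos(math.pi * progress)) * (lr - lr_final)
    raise ValueError(f"Unknown decay {decay}")

for decay in ("linear", "exponential", "cosine"):
    halfway = scheduled_lr(decay, 0.5, lr=0.01, lr_final=0.0001)
    print(f"{decay}: {halfway}")
```

PyTorch's LinearLR, ExponentialLR, and CosineAnnealingLR schedulers implement closely related shapes, though their parametrization differs from this per-progress formulation.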

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

  1. python3 mnist_training.py --recodex --epochs=1 --optimizer=SGD --learning_rate=0.01
Epoch 1/1 1.4s loss=0.8234 accuracy=0.7983 dev:loss=0.3712 dev:accuracy=0.9104
  2. python3 mnist_training.py --recodex --epochs=1 --optimizer=SGD --learning_rate=0.01 --momentum=0.9
Epoch 1/1 1.5s loss=0.3665 accuracy=0.8955 dev:loss=0.1809 dev:accuracy=0.9524
  3. python3 mnist_training.py --recodex --epochs=1 --optimizer=SGD --learning_rate=0.1
Epoch 1/1 1.4s loss=0.3580 accuracy=0.8987 dev:loss=0.1707 dev:accuracy=0.9542
  4. python3 mnist_training.py --recodex --epochs=1 --optimizer=Adam --learning_rate=0.001
Epoch 1/1 1.9s loss=0.2982 accuracy=0.9153 dev:loss=0.1324 dev:accuracy=0.9640
  5. python3 mnist_training.py --recodex --epochs=1 --optimizer=Adam --learning_rate=0.01
Epoch 1/1 2.0s loss=0.2313 accuracy=0.9296 dev:loss=0.1416 dev:accuracy=0.9606
  6. python3 mnist_training.py --recodex --epochs=2 --optimizer=Adam --learning_rate=0.01 --decay=linear --learning_rate_final=0.0001
Epoch 1/2 2.1s lr=0.0050 loss=0.2106 accuracy=0.9354 dev:loss=0.1086 dev:accuracy=0.9702
Epoch 2/2 2.3s lr=1.00e-04 loss=0.0749 accuracy=0.9769 dev:loss=0.0732 dev:accuracy=0.9798
Next learning rate to be used: 0.0001
  7. python3 mnist_training.py --recodex --epochs=2 --optimizer=Adam --learning_rate=0.01 --decay=exponential --learning_rate_final=0.001
Epoch 1/2 2.0s lr=0.0032 loss=0.2013 accuracy=0.9392 dev:loss=0.1019 dev:accuracy=0.9694
Epoch 2/2 2.3s lr=0.0010 loss=0.0734 accuracy=0.9778 dev:loss=0.0737 dev:accuracy=0.9800
Next learning rate to be used: 0.001
  8. python3 mnist_training.py --recodex --epochs=2 --optimizer=Adam --learning_rate=0.01 --decay=cosine --learning_rate_final=0.0001
Epoch 1/2 2.1s lr=0.0050 loss=0.2172 accuracy=0.9337 dev:loss=0.1133 dev:accuracy=0.9680
Epoch 2/2 2.4s lr=1.00e-04 loss=0.0732 accuracy=0.9767 dev:loss=0.0776 dev:accuracy=0.9800
Next learning rate to be used: 0.0001

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

  • python3 mnist_training.py --optimizer=SGD --learning_rate=0.01
Epoch  1/10 1.4s loss=0.8300 accuracy=0.7960 dev:loss=0.3780 dev:accuracy=0.9060
Epoch  2/10 1.4s loss=0.4088 accuracy=0.8892 dev:loss=0.2940 dev:accuracy=0.9208
Epoch  3/10 1.4s loss=0.3473 accuracy=0.9030 dev:loss=0.2585 dev:accuracy=0.9286
Epoch  4/10 1.4s loss=0.3144 accuracy=0.9116 dev:loss=0.2383 dev:accuracy=0.9352
Epoch  5/10 1.4s loss=0.2911 accuracy=0.9184 dev:loss=0.2230 dev:accuracy=0.9404
Epoch  6/10 1.4s loss=0.2729 accuracy=0.9235 dev:loss=0.2093 dev:accuracy=0.9432
Epoch  7/10 1.4s loss=0.2577 accuracy=0.9281 dev:loss=0.1993 dev:accuracy=0.9480
Epoch  8/10 1.4s loss=0.2442 accuracy=0.9316 dev:loss=0.1903 dev:accuracy=0.9510
Epoch  9/10 1.4s loss=0.2326 accuracy=0.9350 dev:loss=0.1828 dev:accuracy=0.9546
Epoch 10/10 1.4s loss=0.2222 accuracy=0.9379 dev:loss=0.1744 dev:accuracy=0.9546
  • python3 mnist_training.py --optimizer=SGD --learning_rate=0.01 --momentum=0.9
Epoch  1/10 1.5s loss=0.3731 accuracy=0.8952 dev:loss=0.1912 dev:accuracy=0.9472
Epoch  2/10 1.7s loss=0.1942 accuracy=0.9437 dev:loss=0.1322 dev:accuracy=0.9662
Epoch  3/10 1.7s loss=0.1432 accuracy=0.9588 dev:loss=0.1137 dev:accuracy=0.9688
Epoch  4/10 1.7s loss=0.1148 accuracy=0.9674 dev:loss=0.0954 dev:accuracy=0.9744
Epoch  5/10 1.7s loss=0.0962 accuracy=0.9728 dev:loss=0.0914 dev:accuracy=0.9740
Epoch  6/10 1.7s loss=0.0824 accuracy=0.9767 dev:loss=0.0823 dev:accuracy=0.9772
Epoch  7/10 1.7s loss=0.0718 accuracy=0.9801 dev:loss=0.0806 dev:accuracy=0.9780
Epoch  8/10 1.8s loss=0.0640 accuracy=0.9817 dev:loss=0.0741 dev:accuracy=0.9800
Epoch  9/10 1.7s loss=0.0565 accuracy=0.9841 dev:loss=0.0775 dev:accuracy=0.9800
Epoch 10/10 1.8s loss=0.0509 accuracy=0.9861 dev:loss=0.0737 dev:accuracy=0.9788
  • python3 mnist_training.py --optimizer=SGD --learning_rate=0.1
Epoch  1/10 1.4s loss=0.3660 accuracy=0.8970 dev:loss=0.1945 dev:accuracy=0.9460
Epoch  2/10 1.4s loss=0.1940 accuracy=0.9438 dev:loss=0.1320 dev:accuracy=0.9652
Epoch  3/10 1.4s loss=0.1433 accuracy=0.9588 dev:loss=0.1101 dev:accuracy=0.9696
Epoch  4/10 1.4s loss=0.1146 accuracy=0.9673 dev:loss=0.0941 dev:accuracy=0.9748
Epoch  5/10 1.4s loss=0.0949 accuracy=0.9735 dev:loss=0.0915 dev:accuracy=0.9754
Epoch  6/10 1.4s loss=0.0816 accuracy=0.9766 dev:loss=0.0804 dev:accuracy=0.9782
Epoch  7/10 1.4s loss=0.0714 accuracy=0.9800 dev:loss=0.0783 dev:accuracy=0.9792
Epoch  8/10 1.4s loss=0.0627 accuracy=0.9819 dev:loss=0.0734 dev:accuracy=0.9804
Epoch  9/10 1.4s loss=0.0558 accuracy=0.9843 dev:loss=0.0759 dev:accuracy=0.9814
Epoch 10/10 1.4s loss=0.0502 accuracy=0.9860 dev:loss=0.0728 dev:accuracy=0.9806
  • python3 mnist_training.py --optimizer=Adam --learning_rate=0.001
Epoch  1/10 1.9s loss=0.3025 accuracy=0.9152 dev:loss=0.1487 dev:accuracy=0.9582
Epoch  2/10 2.0s loss=0.1349 accuracy=0.9601 dev:loss=0.1003 dev:accuracy=0.9724
Epoch  3/10 2.0s loss=0.0909 accuracy=0.9724 dev:loss=0.0893 dev:accuracy=0.9756
Epoch  4/10 2.0s loss=0.0686 accuracy=0.9797 dev:loss=0.0879 dev:accuracy=0.9742
Epoch  5/10 2.0s loss=0.0542 accuracy=0.9838 dev:loss=0.0755 dev:accuracy=0.9782
Epoch  6/10 2.0s loss=0.0434 accuracy=0.9873 dev:loss=0.0781 dev:accuracy=0.9786
Epoch  7/10 2.0s loss=0.0344 accuracy=0.9900 dev:loss=0.0735 dev:accuracy=0.9796
Epoch  8/10 2.1s loss=0.0280 accuracy=0.9913 dev:loss=0.0746 dev:accuracy=0.9800
Epoch  9/10 2.0s loss=0.0225 accuracy=0.9934 dev:loss=0.0768 dev:accuracy=0.9814
Epoch 10/10 2.1s loss=0.0189 accuracy=0.9947 dev:loss=0.0838 dev:accuracy=0.9780
  • python3 mnist_training.py --optimizer=Adam --learning_rate=0.01
Epoch  1/10 2.1s loss=0.2333 accuracy=0.9297 dev:loss=0.1618 dev:accuracy=0.9508
Epoch  2/10 2.4s loss=0.1456 accuracy=0.9569 dev:loss=0.1718 dev:accuracy=0.9600
Epoch  3/10 2.5s loss=0.1257 accuracy=0.9637 dev:loss=0.1653 dev:accuracy=0.9626
Epoch  4/10 2.5s loss=0.1128 accuracy=0.9679 dev:loss=0.1789 dev:accuracy=0.9604
Epoch  5/10 2.5s loss=0.1013 accuracy=0.9718 dev:loss=0.1316 dev:accuracy=0.9684
Epoch  6/10 2.6s loss=0.0992 accuracy=0.9729 dev:loss=0.1425 dev:accuracy=0.9642
Epoch  7/10 2.6s loss=0.0963 accuracy=0.9750 dev:loss=0.1814 dev:accuracy=0.9702
Epoch  8/10 2.8s loss=0.0969 accuracy=0.9759 dev:loss=0.1727 dev:accuracy=0.9712
Epoch  9/10 2.8s loss=0.0833 accuracy=0.9786 dev:loss=0.1854 dev:accuracy=0.9666
Epoch 10/10 2.9s loss=0.0808 accuracy=0.9796 dev:loss=0.1904 dev:accuracy=0.9710
  • python3 mnist_training.py --optimizer=Adam --learning_rate=0.01 --decay=linear --learning_rate_final=0.0001
Epoch  1/10 2.3s lr=0.0090 loss=0.2329 accuracy=0.9295 dev:loss=0.1592 dev:accuracy=0.9542
Epoch  2/10 2.6s lr=0.0080 loss=0.1313 accuracy=0.9611 dev:loss=0.1211 dev:accuracy=0.9674
Epoch  3/10 2.6s lr=0.0070 loss=0.0983 accuracy=0.9696 dev:loss=0.1034 dev:accuracy=0.9734
Epoch  4/10 2.5s lr=0.0060 loss=0.0713 accuracy=0.9784 dev:loss=0.1250 dev:accuracy=0.9690
Epoch  5/10 2.6s lr=0.0051 loss=0.0557 accuracy=0.9825 dev:loss=0.1086 dev:accuracy=0.9748
Epoch  6/10 2.5s lr=0.0041 loss=0.0414 accuracy=0.9867 dev:loss=0.0983 dev:accuracy=0.9776
Epoch  7/10 2.5s lr=0.0031 loss=0.0246 accuracy=0.9921 dev:loss=0.1009 dev:accuracy=0.9782
Epoch  8/10 2.5s lr=0.0021 loss=0.0144 accuracy=0.9955 dev:loss=0.0996 dev:accuracy=0.9798
Epoch  9/10 2.5s lr=0.0011 loss=0.0072 accuracy=0.9979 dev:loss=0.0999 dev:accuracy=0.9800
Epoch 10/10 2.5s lr=1.00e-04 loss=0.0039 accuracy=0.9993 dev:loss=0.0985 dev:accuracy=0.9812
Next learning rate to be used: 0.0001
  • python3 mnist_training.py --optimizer=Adam --learning_rate=0.01 --decay=exponential --learning_rate_final=0.001
Epoch  1/10 2.1s lr=0.0079 loss=0.2235 accuracy=0.9331 dev:loss=0.1471 dev:accuracy=0.9584
Epoch  2/10 2.4s lr=0.0063 loss=0.1151 accuracy=0.9654 dev:loss=0.1097 dev:accuracy=0.9706
Epoch  3/10 2.4s lr=0.0050 loss=0.0782 accuracy=0.9757 dev:loss=0.1059 dev:accuracy=0.9748
Epoch  4/10 2.4s lr=0.0040 loss=0.0521 accuracy=0.9839 dev:loss=0.0984 dev:accuracy=0.9720
Epoch  5/10 2.5s lr=0.0032 loss=0.0366 accuracy=0.9879 dev:loss=0.1046 dev:accuracy=0.9764
Epoch  6/10 2.5s lr=0.0025 loss=0.0235 accuracy=0.9921 dev:loss=0.0965 dev:accuracy=0.9798
Epoch  7/10 2.5s lr=0.0020 loss=0.0144 accuracy=0.9954 dev:loss=0.0914 dev:accuracy=0.9810
Epoch  8/10 2.4s lr=0.0016 loss=0.0101 accuracy=0.9970 dev:loss=0.0924 dev:accuracy=0.9808
Epoch  9/10 2.4s lr=0.0013 loss=0.0057 accuracy=0.9986 dev:loss=0.1007 dev:accuracy=0.9820
Epoch 10/10 2.5s lr=0.0010 loss=0.0038 accuracy=0.9992 dev:loss=0.0926 dev:accuracy=0.9832
Next learning rate to be used: 0.001
  • python3 mnist_training.py --optimizer=Adam --learning_rate=0.01 --decay=cosine --learning_rate_final=0.0001
Epoch  1/10 2.1s lr=0.0098 loss=0.2362 accuracy=0.9288 dev:loss=0.1563 dev:accuracy=0.9556
Epoch  2/10 2.5s lr=0.0091 loss=0.1340 accuracy=0.9605 dev:loss=0.1450 dev:accuracy=0.9652
Epoch  3/10 2.5s lr=0.0080 loss=0.1088 accuracy=0.9688 dev:loss=0.1465 dev:accuracy=0.9612
Epoch  4/10 2.5s lr=0.0066 loss=0.0774 accuracy=0.9767 dev:loss=0.1184 dev:accuracy=0.9706
Epoch  5/10 2.6s lr=0.0050 loss=0.0569 accuracy=0.9823 dev:loss=0.1140 dev:accuracy=0.9762
Epoch  6/10 2.5s lr=0.0035 loss=0.0381 accuracy=0.9876 dev:loss=0.1166 dev:accuracy=0.9770
Epoch  7/10 2.5s lr=0.0021 loss=0.0195 accuracy=0.9939 dev:loss=0.1022 dev:accuracy=0.9800
Epoch  8/10 2.6s lr=0.0010 loss=0.0097 accuracy=0.9972 dev:loss=0.1059 dev:accuracy=0.9808
Epoch  9/10 2.6s lr=0.0003 loss=0.0055 accuracy=0.9989 dev:loss=0.1073 dev:accuracy=0.9792
Epoch 10/10 2.6s lr=1.00e-04 loss=0.0040 accuracy=0.9993 dev:loss=0.1071 dev:accuracy=0.9792
Next learning rate to be used: 0.0001

gym_cartpole

 Deadline: Mar 11, 22:00  3 points

Solve the CartPole-v1 environment from the Gymnasium library, utilizing only the provided supervised training dataset of 100 examples. Start with the gym_cartpole.py template.

The solution to this task should be a model that passes evaluation on random inputs. This evaluation can be performed by running gym_cartpole.py with the --evaluate argument (optionally rendering when the --render option is provided), or by directly calling the evaluate_model method. To pass, you must achieve an average reward of at least 475 over 100 episodes. Your model should have two outputs (i.e., correspond to a categorical distribution with 2 output classes).

When designing the model, you should consider that the size of the training data is very small and the data is quite noisy.
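As an illustration only (the layer sizes and hyperparameters below are our guesses, not part of the assignment), a small PyTorch model with dropout and weight decay is one common way to handle such a small, noisy dataset; a CartPole observation has 4 features and there are 2 actions:

```python
import torch

# A small MLP with dropout; weight decay is added through the optimizer.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 32),   # 4 CartPole observation features
    torch.nn.ReLU(),
    torch.nn.Dropout(0.3),    # regularization for the small, noisy dataset
    torch.nn.Linear(32, 2),   # logits of the 2-class categorical distribution
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

# One hypothetical training step on a batch of observations and gold actions.
observations, actions = torch.randn(8, 4), torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = loss_fn(model(observations), actions)
loss.backward()
optimizer.step()
```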

When submitting to ReCodEx, do not forget to also submit the trained model.

In the competitions, your goal is to train a model, and then predict target values on the given unannotated test set.

Submitting to ReCodEx

When submitting a competition solution to ReCodEx, you can include any number of files of any kind, and either submit them individually or compress them into a .zip file. However, there should be exactly one text file with the test set annotation (.txt) and at least one Python source (.py/ipynb) containing the model training and prediction. The Python sources are not executed, but must be included for inspection.

Competition Evaluation

  • For every submission, ReCodEx checks the above conditions (exactly one .txt, at least one .py/ipynb) and whether the given annotations can be evaluated without error. If not, it will report the corresponding error in the logs.

  • Before the first deadline, ReCodEx prints the exact achieved performance, but only if it is worse than the baseline.

    If you surpass the baseline, the assignment is marked as solved in ReCodEx and you immediately get regular points for the assignment. However, ReCodEx does not print the reached performance.

  • After the first deadline, the latest submission of every user surpassing the required baseline participates in a competition. Additional bonus points are then awarded according to the ordering of the performance of the participating submissions.

  • After the competition results announcement, ReCodEx starts to show the exact performance for all the already submitted solutions and also for the solutions submitted later.

Repeated Participation in Competitions

  • If a participant received non-zero points for a competition task in a previous year, they are treated slightly differently. Namely, every team with one or more returning participants still gets competition points, but
    • the returning team results are not shown on the slides on the practicals;
    • the returning team results are shown in italics in ReCodEx;
    • the returning team results are not used to compute the thresholds for competition points.

What Is Allowed

  • You can use only the given annotated data for training and evaluation.
  • You can use the given annotated training data in any way.
  • You can use the given annotated development data for evaluation or hyperparameter tuning, but not for the training itself.
  • Additionally, you can use any unannotated or manually created data for training and evaluation.
  • The test set annotations must be the result of your system (so you cannot manually correct them; but your system can contain other parts than just trained models, like hand-written rules).
  • Do not use test set annotations in any way, if you somehow get access to them.
  • Unless stated otherwise, you can use any architecture to solve the competition task at hand, but the implementation must be created by you and you must understand it fully. You can of course take inspiration from any paper or existing implementation, but please reference it in that case.
    • You can of course use anything from the PyTorch package (but unless stated otherwise, do not use models from torchvision, timm, torchaudio, …).
    • You can use any data augmentation (even implementations not written by you).
    • You can use any optimizer and any hyperparameter optimization method (even implementations not written by you).
  • If you utilize an already trained model, it must be trained only on the allowed training data, unless stated otherwise.

Install

  • What Python version to use

    The recommended Python version is 3.11. This version is used by ReCodEx to evaluate your solutions. Supported Python versions are 3.11–3.14.

    You can find out the version of your Python installation using python3 --version.

  • Installing to central user packages repository

    You can install all required packages to central user packages repository using python3 -m pip install --user --no-cache-dir --extra-index-url=https://download.pytorch.org/whl/cu128 npfl138.

    On Linux and Windows, the above command installs the CUDA 12.8 PyTorch build (which you would also get without specifying the --extra-index-url option), but you can change cu128 to:

    • cpu to get CPU-only (smaller) version,
    • cu126 to get CUDA 12.6 build,
    • rocm7.1 to get AMD ROCm 7.1 build (Linux only).

    On macOS, the above --extra-index-url values have no practical effect; Metal support is installed in all cases.

    To update the npfl138 package later, use python3 -m pip install --user --upgrade npfl138.

  • Installing to a virtual environment

    Python supports virtual environments, which are directories containing independent sets of installed packages. You can create a virtual environment by running python3 -m venv VENV_DIR, followed by VENV_DIR/bin/pip install --no-cache-dir --extra-index-url=https://download.pytorch.org/whl/cu128 npfl138 (or VENV_DIR/Scripts/pip on Windows).

    Again, apart from the CUDA 12.8 build (which you would also get without specifying the --extra-index-url option), you can change cu128 on Linux and Windows to:

    • cpu to get CPU-only (smaller) version,
    • cu126 to get CUDA 12.6 build,
    • rocm7.1 to get AMD ROCm 7.1 build (Linux only).

    To update the npfl138 package later, use VENV_DIR/bin/pip install --upgrade npfl138.

  • Installing to a virtual environment with uv

    If you would like to use uv pip to install the required packages to a virtual environment and you use --extra-index-url (i.e., you want a different build than the default CUDA 12.8 on Linux or Windows), you need to add --index-strategy unsafe-best-match to the above command for uv to resolve torchmetrics correctly.

    If you prefer to use uv add instead and again want to use a non-default build, first manually add torch~=2.10.0, torchaudio~=2.10.0, and torchvision~=0.25.0 with a specified tool.uv.index according to https://docs.astral.sh/uv/guides/integration/pytorch/#using-a-pytorch-index. Once you have PyTorch installed, you can then run uv add npfl138.

  • Windows installation

    • On Windows, it can happen that python3 is not in PATH, while the py command is – in that case, you can use py -m venv VENV_DIR, which uses the newest Python available, or for example py -3.11 -m venv VENV_DIR, which uses Python version 3.11.

    • If you encounter a problem creating the logs in the args.logdir directory, a possible cause is that the path is longer than 260 characters, which is the default maximum length of a complete path on Windows. However, you can increase this limit on Windows 10, version 1607 or later, by following the instructions.

    • If you encounter an Import Error: DLL load failed, install the VS 2017 Redistributable as described in the official documentation.

  • macOS installation

  • GPU support on Linux and Windows

    PyTorch supports NVIDIA and AMD GPUs out of the box; you just need to select the appropriate --extra-index-url when installing the packages.

    If you encounter problems loading CUDA or cuDNN libraries, make sure your LD_LIBRARY_PATH does not contain paths to older CUDA/cuDNN libraries.

Git

  • Is it possible to keep the solutions in a Git repository?

    Definitely. Keeping the solutions in a branch of your repository, where you merge them with the course repository, is probably a good idea. However, please keep the cloned repository with your solutions private.

  • On GitHub, do not create a public fork containing your solutions.

    If you keep your solutions in a GitHub repository, please do not create a clone of the repository by using the Fork button; this way, the cloned repository would be public.

    • If you created a public fork and want to make it private, you need to start by pressing Leave fork network in the repository settings; only then you can change the visibility to private.

    Of course, if you want to create a pull request, GitHub requires a public fork and you need to create it, just do not store your solutions in it (so you might end up with two repositories, a public fork for pull requests and a private repo for your own solutions).

  • How to clone the course repository?

    To clone the course repository, run

    git clone https://github.com/ufal/npfl138
    

    This creates the repository in the npfl138 subdirectory; if you want a different name, add it as an additional parameter.

    To update the repository, run git pull inside the repository directory.

  • How to merge the course repository updates into a private repository with additional changes?

    It is possible to have a private repository that combines your solutions and the updates from the course repository. To do that, start by cloning your empty private repository, and then run the following commands in it:

    git remote add course_repo https://github.com/ufal/npfl138
    git fetch course_repo
    git checkout --no-track course_repo/master
    

    This creates a new remote course_repo and a clone of the master branch from it; however, git pull and git push in this branch will operate on the repository you cloned originally.

    To update your branch with the changes from the course repository, run

    git fetch course_repo
    git merge course_repo/master
    

    while in your branch (the command git pull --no-rebase course_repo master has the same effect). Of course, it might be necessary to resolve conflicts if both you and the course repository modified the same lines in the same files.

ReCodEx

  • What files can be submitted to ReCodEx?

    You can submit multiple files of any type to ReCodEx. There is a limit of 20 files per submission, with a total size of 20MB.

  • What file does ReCodEx execute and what arguments does it use?

    Exactly one file with a .py suffix must contain a line starting with def main(. Such a file is imported by ReCodEx and its main function is executed (during the import, __name__ == "__recodex__").

    The file must also export an argument parser called parser. ReCodEx uses its arguments and default values, but it overwrites some of the arguments depending on the test being executed; the template always indicates which arguments are set by ReCodEx and which are left intact.

  • What are the time and memory limits?

    The memory limit during evaluation is 1.5GB. The time limit varies, but it should be at least 10 seconds and at least twice the running time of my solution.
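The main/parser convention above can be sketched as a minimal submission file (the arguments here are hypothetical; each template states which arguments ReCodEx overwrites):

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical arguments -- ReCodEx overwrites some of them per test.
parser.add_argument("--epochs", default=10, type=int, help="Number of epochs.")
parser.add_argument("--seed", default=42, type=int, help="Random seed.")

def main(args: argparse.Namespace) -> None:
    # During ReCodEx evaluation, this module is imported
    # (with __name__ == "__recodex__") and main is called.
    print(f"Running with epochs={args.epochs}, seed={args.seed}")

if __name__ == "__main__":
    main(parser.parse_args())
```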

TensorBoard

  • Should TensorFlow be installed when using TensorBoard?

    When TensorBoard starts, it warns about a reduced feature set because of missing TensorFlow, notably

    TensorFlow installation not found - running with reduced feature set.
    

    Do not worry about the warning, there is no need to install TensorFlow.

  • Cannot start TensorBoard after installation

    If you cannot run the tensorboard command after installation, it is most likely not in your PATH. You can either:

    • start tensorboard using python3 -m tensorboard.main --logdir logs, or
    • add the directory with pip-installed packages to your PATH (that directory is either bin or Scripts inside your virtual environment if you use one; otherwise, it should be ~/.local/bin on Linux, and %UserProfile%\AppData\Roaming\Python\Python311 together with %UserProfile%\AppData\Roaming\Python\Python311\Scripts on Windows).
  • What can be logged in TensorBoard? See the documentation of the SummaryWriter. Common possibilities are:

    • scalar values:
      summary_writer.add_scalar(name like "train/loss", value, step)
      
    • tensor values displayed as histograms or distributions:
      summary_writer.add_histogram(name like "train/output_layer", tensor, step)
      
    • images as tensors with shape [num_images, h, w, channels], where channels can be 1 (grayscale), 2 (grayscale + alpha), 3 (RGB), 4 (RGBA):
      summary_writer.add_images(name like "train/samples", images, step, dataformats="NHWC")
      
      Other dataformats are "HWC" (shape [h, w, channels]), "HW", "NCHW", "CHW".
    • a possibly large amount of text (e.g., all hyperparameter values, sample translations in MT, …) in Markdown format:
      summary_writer.add_text(name like "hyperparameters", markdown, step)
      
    • audio as tensors with shape [1, samples] and values in the [-1, 1] range:
      summary_writer.add_audio(name like "train/samples", clip, step, [sample_rate])
      
    • traced modules using:
      summary_writer.add_graph(module, example_input_batch)
      

Requirements

To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that all surplus points (both bonus and non-bonus) will be transferred to the exam. In total, assignments for at least 120 points (not including the bonus points) will be available, and if you solve all the assignments (any non-zero amount of points counts as solved), you automatically pass the exam with grade 1.

To pass the exam, you need to obtain at least 60, 75, or 90 points out of the 100-point exam to receive a grade 3, 2, or 1, respectively. The exam consists of questions worth 100 points in total, drawn from the list below (the questions are randomly generated, but in such a way that there is at least one question from every lecture but the first). In addition, you can get surplus points from the practicals and at most 10 points for community work (i.e., fixing slides or reporting issues) – but only the points you already have at the time of the exam count. You can take the exam without passing the practicals first.

Exam Questions

Lecture 1 Questions

  • Considering a neural network with $D$ input neurons, a single hidden layer with $H$ neurons, $K$ output neurons, hidden activation $f$, and output activation $a$, list its parameters (including their size) and write down how the output is computed. [5]

  • List the definitions of frequently used MLP output layer activations (the ones producing parameters of a Bernoulli distribution and a categorical distribution). Then write down three commonly used hidden layer activations (sigmoid, tanh, ReLU). [5]

  • Formulate the Universal approximation theorem. [5]

Lecture 2 Questions

  • Define maximum likelihood estimation, and show that it is equal to minimizing NLL, minimizing cross-entropy, and minimizing KL divergence. [10]

  • Define mean squared error, show how it can be derived using MLE (define $p_{\textrm{model}}$, show how MLE looks using $p_{\textrm{model}}$, and prove that the maximum likelihood estimate is equal to minimizing MSE). [5]

  • Describe gradient descent and compare it to stochastic (i.e., online) gradient descent and minibatch stochastic gradient descent. [5]

  • Formulate conditions on the sequence of learning rates used in SGD to converge to optimum almost surely. [5]

  • Write down the backpropagation algorithm. [5]

  • Write down the mini-batch SGD algorithm with momentum. Then, formulate SGD with Nesterov momentum and show the difference between them. [5]

  • Write down the AdaGrad algorithm and show that it tends to internally decay the learning rate by a factor of $1/\sqrt{t}$ in step $t$. Then write down the RMSProp algorithm and explain how it solves the problem with the involuntary learning rate decay. [10]

  • Write down the Adam algorithm. Then show why the bias-correction terms $(1-\beta^t)$ make the estimation of the first and second moments unbiased. [10]