Deep Learning – Summer 2023/24

The objective of this course is to provide a comprehensive introduction to deep neural networks, which have consistently demonstrated superior performance across diverse domains, notably in processing and generating images, text, and speech.

The course covers both theory, spanning from the basics to the latest advances, and practical implementations in Python and PyTorch (students implement and train deep neural networks performing image classification, image segmentation, object detection, part-of-speech tagging, lemmatization, speech recognition, reading comprehension, and image generation). Basic Python skills are required, but no previous knowledge of artificial neural networks is needed; basic machine learning understanding is advantageous.

Students work either individually or in small teams on weekly assignments, including competition tasks, where the goal is to obtain the highest performance in the class.

About

SIS code: NPFL138
Semester: summer
E-credits: 8
Examination: 3/4 C+Ex
Guarantor: Milan Straka

Timespace Coordinates

  • lectures: Czech lecture is held on Monday 12:20 in S5, English lecture on Tuesday 12:20 in S4; first lecture is on Feb 19/20
  • practicals: there are two parallel practicals, a Czech one on Wednesday 9:00 in S9, and an English one on Wednesday 10:40 in S9; first practicals are on Feb 21
  • consultations: entirely optional consultations take place on Tuesday 15:40 in S4; first consultations are on Feb 27

All lectures and practicals will be recorded and available on this website.

Lectures

1. Introduction to Deep Learning Slides PDF Slides CZ Lecture CZ UniApprox CZ Practicals EN Lecture EN UniApprox EN Practicals Questions numpy_entropy pca_first mnist_layers_activations

License

Unless otherwise stated, teaching materials for this course are available under CC BY-SA 4.0.

The lecture content, including references to study materials, is listed below. The main study material is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (referred to as DLB).

References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.

1. Introduction to Deep Learning

 Feb 19

  • Random variables, probability distributions, expectation, variance, Bernoulli distribution, Categorical distribution [Sections 3.2, 3.3, 3.8, 3.9.1 and 3.9.2 of DLB]
  • Self-information, entropy, cross-entropy, KL-divergence [Section 3.13 of DLB]
  • Gaussian distribution [Section 3.9.3 of DLB]
  • Machine Learning Basics [Section 5.1-5.1.3 of DLB]
  • History of Deep Learning [Section 1.2 of DLB]
  • Linear regression [Section 5.1.4 of DLB]
  • Challenges Motivating Deep Learning [Section 5.11 of DLB]
  • Neural network basics
    • Neural networks as graphs [Chapter 6 before Section 6.1 of DLB]
    • Output activation functions [Section 6.2.2 of DLB, excluding Section 6.2.2.4]
    • Hidden activation functions [Section 6.3 of DLB, excluding Section 6.3.3]
    • Basic network architectures [Section 6.4 of DLB, excluding Section 6.4.2]
  • Universal approximation theorem

Requirements

To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that all surplus points (both bonus and non-bonus) will be transferred to the exam. In total, assignments for at least 120 points (not including the bonus points) will be available, and if you solve all the assignments (any non-zero amount of points counts as solved), you automatically pass the exam with grade 1.

Environment

The tasks are evaluated automatically using the ReCodEx Code Examiner.

The evaluation is performed using Python 3.11, Keras 3.0.5, PyTorch 2.2.0, HF Transformers 4.37.2, and Gymnasium 1.0.0a1. You should install the exact versions of these packages yourselves.

Teamwork

Solving assignments in teams (of size at most 3) is encouraged, but everyone has to participate (it is forbidden not to work on an assignment and then submit a solution created by other team members). All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. Each such solution must explicitly list all members of the team, using this template, to allow plagiarism detection.

No Cheating

Cheating is strictly prohibited and any student found cheating will be punished. The punishment can involve failing the whole course, or, in grave cases, being expelled from the faculty. While discussing assignments with any classmate is fine, each team must complete the assignments themselves, without using code they did not write (unless explicitly allowed). Of course, inside a team you are allowed to share code and submit identical solutions.

numpy_entropy

 Deadline: Mar 05, 22:00  3 points

The goal of this exercise is to familiarize yourself with Python, NumPy, and the ReCodEx submission system. Start with the numpy_entropy.py template.

Load a file specified in args.data_path, whose lines consist of data points of our dataset, and load a file specified in args.model_path, which describes a model probability distribution, with each line being a tab-separated pair of (data point, probability).

Then compute the following quantities using NumPy, and print each of them on a separate line, rounded to two decimal places (or inf for positive infinity, which happens when an element of the data distribution has zero probability under the model distribution):

  • entropy H(data distribution)
  • cross-entropy H(data distribution, model distribution)
  • KL-divergence D_KL(data distribution || model distribution)

Use natural logarithms to compute the entropies and the divergence.
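The three quantities can be illustrated on small in-memory distributions. The following is only a sketch, not the reference solution: the function name entropy_stats is hypothetical, and it assumes both distributions are already given as probability vectors aligned over the same ordered set of data points (loading and aligning the two files is left out).

```python
import numpy as np


def entropy_stats(data_dist, model_dist):
    """Return (entropy, cross-entropy, KL divergence) in nats.

    Assumption of this sketch: both arguments are probability vectors
    indexed by the same ordered set of data points.
    """
    p = np.asarray(data_dist, dtype=float)
    q = np.asarray(model_dist, dtype=float)

    # Terms with p(x) = 0 contribute nothing, so restrict to the support of p.
    support = p > 0
    entropy = -np.sum(p[support] * np.log(p[support]))

    # log(0) = -inf is intended here: it makes the cross-entropy infinite
    # when the model assigns zero probability to an observed data point.
    with np.errstate(divide="ignore"):
        cross_entropy = -np.sum(p[support] * np.log(q[support]))

    # KL divergence is the difference of cross-entropy and entropy.
    return entropy, cross_entropy, cross_entropy - entropy


h, ce, kl = entropy_stats([0.5, 0.5], [0.25, 0.75])
print(f"Entropy: {h:.2f} nats")
print(f"Crossentropy: {ce:.2f} nats")
print(f"KL divergence: {kl:.2f} nats")
```

Restricting the sums to the support of the data distribution avoids the undefined 0 · log 0 terms while still producing inf when the model gives zero probability to an observed point.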

  1. python3 numpy_entropy.py --data_path numpy_entropy_data_1.txt --model_path numpy_entropy_model_1.txt
Entropy: 0.96 nats
Crossentropy: 1.07 nats
KL divergence: 0.11 nats
  2. python3 numpy_entropy.py --data_path numpy_entropy_data_2.txt --model_path numpy_entropy_model_2.txt
Entropy: 0.96 nats
Crossentropy: inf nats
KL divergence: inf nats

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

Entropy: 0.96 nats
Crossentropy: 1.07 nats
KL divergence: 0.11 nats
Entropy: 0.96 nats
Crossentropy: inf nats
KL divergence: inf nats
Entropy: 4.15 nats
Crossentropy: 4.23 nats
KL divergence: 0.08 nats
Entropy: 4.99 nats
Crossentropy: 5.03 nats
KL divergence: 0.04 nats

pca_first

 Deadline: Mar 05, 22:00  2 points

The goal of this exercise is to familiarize yourself with PyTorch torch.Tensor objects, their shapes, and basic tensor manipulation methods. Start with the pca_first.py template (you will also need the mnist.py module).

Alternatively, you can use the pca_first.keras.py template, which uses backend-agnostic keras.ops operations instead of PyTorch operations; both templates can be used to solve the assignment.

In this assignment, you should compute the covariance matrix of several examples from the MNIST dataset, then compute the first principal component, and quantify the variance it explains. It is fine if you are not familiar with terms like covariance matrix or principal component – the template contains a detailed description of what you have to do.

Finally, you might want to read the Introduction to PyTorch Tensors.
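The computation described above can be sketched on synthetic data. This is a hedged illustration, not the template's code: the function name first_principal_component is hypothetical, the use of power iteration is an assumption suggested by the --iterations argument, and the covariance is normalized by the number of examples (the template may normalize differently).

```python
import torch


def first_principal_component(data, iterations=64):
    """Return (total_variance, explained_variance_ratio) for data of shape [N, D]."""
    # Center the data and form the [D, D] covariance matrix.
    mean = torch.mean(data, dim=0)
    centered = data - mean
    cov = centered.T @ centered / data.shape[0]

    # The total variance is the sum of per-feature variances, i.e., the trace.
    total_variance = torch.trace(cov)

    # Power iteration (assumed algorithm): repeatedly multiply by cov and
    # renormalize; the norm of cov @ v converges to the largest eigenvalue.
    v = torch.ones(cov.shape[0], dtype=cov.dtype)
    for _ in range(iterations):
        v = cov @ v
        eigenvalue = torch.linalg.norm(v)
        v = v / eigenvalue

    return total_variance.item(), (eigenvalue / total_variance).item()


torch.manual_seed(42)
# Synthetic stand-in for MNIST: the first feature has a much larger variance.
data = torch.randn(1000, 5) * torch.tensor([3.0, 1.0, 1.0, 1.0, 1.0])
total, explained = first_principal_component(data)
print(f"Total variance: {total:.2f}")
print(f"Explained variance: {100 * explained:.2f}%")
```

Power iteration needs only matrix-vector products, which is why a fixed number of iterations can stand in for a full eigendecomposition when only the first component is required.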

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

  1. python3 pca_first.py --examples=1024 --iterations=64
Total variance: 53.12
Explained variance: 9.64%
  2. python3 pca_first.py --examples=8192 --iterations=128
Total variance: 53.05
Explained variance: 9.89%
  3. python3 pca_first.py --examples=55000 --iterations=1024
Total variance: 52.74
Explained variance: 9.71%

mnist_layers_activations

 Deadline: Mar 05, 22:00  2 points

Before solving the assignment, start by playing with example_keras_tensorboard.py to familiarize yourself with TensorFlow and TensorBoard. Run it, and when it finishes, run TensorBoard using tensorboard --logdir logs. Then open http://localhost:6006 in a browser and explore the active tabs.

Your goal is to modify the mnist_layers_activations.py template such that a user-specified neural network is constructed:

  • A number of hidden layers (including zero) can be specified on the command line using parameter hidden_layers.
  • The activation function of these hidden layers can also be specified on the command line using the parameter activation, with supported values none, relu, tanh and sigmoid.
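The two parameters above could be mapped to a Keras model roughly as follows. This is only a sketch under stated assumptions: the function name build_model, the hidden layer width hidden_layer_size=100, the input shape [28, 28, 1], and the choice of the PyTorch backend are all illustrative and may differ from the template.

```python
import os
os.environ.setdefault("KERAS_BACKEND", "torch")  # assumption: PyTorch backend

import keras


def build_model(hidden_layers, activation, hidden_layer_size=100):
    """Build an MNIST classifier with a configurable number of hidden layers."""
    # The value "none" means a hidden layer without any nonlinearity.
    hidden_activation = None if activation == "none" else activation

    layers = [keras.Input(shape=(28, 28, 1)), keras.layers.Flatten()]
    for _ in range(hidden_layers):
        layers.append(keras.layers.Dense(hidden_layer_size, activation=hidden_activation))
    # Output layer producing a categorical distribution over the 10 digits.
    layers.append(keras.layers.Dense(10, activation="softmax"))
    return keras.Sequential(layers)


model = build_model(hidden_layers=3, activation="relu")
model.summary()
```

Passing activation=None to a Dense layer yields a purely linear layer, which is one way to realize the none option.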

Note that your results may be slightly different, depending on your CPU type and whether you use a GPU.

  1. python3 mnist_layers_activations.py --epochs=1 --hidden_layers=0 --activation=none
accuracy: 0.7801 - loss: 0.8405 - val_accuracy: 0.9300 - val_loss: 0.2716
  2. python3 mnist_layers_activations.py --epochs=1 --hidden_layers=1 --activation=none
accuracy: 0.8483 - loss: 0.5230 - val_accuracy: 0.9352 - val_loss: 0.2422
  3. python3 mnist_layers_activations.py --epochs=1 --hidden_layers=1 --activation=relu
accuracy: 0.8503 - loss: 0.5286 - val_accuracy: 0.9604 - val_loss: 0.1432
  4. python3 mnist_layers_activations.py --epochs=1 --hidden_layers=1 --activation=tanh
accuracy: 0.8529 - loss: 0.5183 - val_accuracy: 0.9564 - val_loss: 0.1632
  5. python3 mnist_layers_activations.py --epochs=1 --hidden_layers=1 --activation=sigmoid
accuracy: 0.7851 - loss: 0.8650 - val_accuracy: 0.9414 - val_loss: 0.2196
  6. python3 mnist_layers_activations.py --epochs=1 --hidden_layers=3 --activation=relu
accuracy: 0.8497 - loss: 0.5011 - val_accuracy: 0.9664 - val_loss: 0.1225

  • python3 mnist_layers_activations.py --hidden_layers=0 --activation=none
Epoch  1/10 accuracy: 0.7801 - loss: 0.8405 - val_accuracy: 0.9300 - val_loss: 0.2716
Epoch  5/10 accuracy: 0.9222 - loss: 0.2792 - val_accuracy: 0.9406 - val_loss: 0.2203
Epoch 10/10 accuracy: 0.9304 - loss: 0.2515 - val_accuracy: 0.9432 - val_loss: 0.2159
  • python3 mnist_layers_activations.py --hidden_layers=1 --activation=none
Epoch  1/10 accuracy: 0.8483 - loss: 0.5230 - val_accuracy: 0.9352 - val_loss: 0.2422
Epoch  5/10 accuracy: 0.9236 - loss: 0.2758 - val_accuracy: 0.9360 - val_loss: 0.2325
Epoch 10/10 accuracy: 0.9298 - loss: 0.2517 - val_accuracy: 0.9354 - val_loss: 0.2439
  • python3 mnist_layers_activations.py --hidden_layers=1 --activation=relu
Epoch  1/10 accuracy: 0.8503 - loss: 0.5286 - val_accuracy: 0.9604 - val_loss: 0.1432
Epoch  5/10 accuracy: 0.9824 - loss: 0.0613 - val_accuracy: 0.9808 - val_loss: 0.0740
Epoch 10/10 accuracy: 0.9948 - loss: 0.0202 - val_accuracy: 0.9788 - val_loss: 0.0821
  • python3 mnist_layers_activations.py --hidden_layers=1 --activation=tanh
Epoch  1/10 accuracy: 0.8529 - loss: 0.5183 - val_accuracy: 0.9564 - val_loss: 0.1632
Epoch  5/10 accuracy: 0.9800 - loss: 0.0728 - val_accuracy: 0.9740 - val_loss: 0.0853
Epoch 10/10 accuracy: 0.9948 - loss: 0.0244 - val_accuracy: 0.9782 - val_loss: 0.0772
  • python3 mnist_layers_activations.py --hidden_layers=1 --activation=sigmoid
Epoch  1/10 accuracy: 0.7851 - loss: 0.8650 - val_accuracy: 0.9414 - val_loss: 0.2196
Epoch  5/10 accuracy: 0.9647 - loss: 0.1270 - val_accuracy: 0.9704 - val_loss: 0.1079
Epoch 10/10 accuracy: 0.9852 - loss: 0.0583 - val_accuracy: 0.9756 - val_loss: 0.0837
  • python3 mnist_layers_activations.py --hidden_layers=3 --activation=relu
Epoch  1/10 accuracy: 0.8497 - loss: 0.5011 - val_accuracy: 0.9664 - val_loss: 0.1225
Epoch  5/10 accuracy: 0.9862 - loss: 0.0438 - val_accuracy: 0.9734 - val_loss: 0.1026
Epoch 10/10 accuracy: 0.9932 - loss: 0.0202 - val_accuracy: 0.9818 - val_loss: 0.0865
  • python3 mnist_layers_activations.py --hidden_layers=10 --activation=relu
Epoch  1/10 accuracy: 0.7710 - loss: 0.6793 - val_accuracy: 0.9570 - val_loss: 0.1479
Epoch  5/10 accuracy: 0.9780 - loss: 0.0783 - val_accuracy: 0.9786 - val_loss: 0.0808
Epoch 10/10 accuracy: 0.9869 - loss: 0.0481 - val_accuracy: 0.9724 - val_loss: 0.1163
  • python3 mnist_layers_activations.py --hidden_layers=10 --activation=sigmoid
Epoch  1/10 accuracy: 0.1072 - loss: 2.3068 - val_accuracy: 0.1784 - val_loss: 2.1247
Epoch  5/10 accuracy: 0.8825 - loss: 0.4776 - val_accuracy: 0.9164 - val_loss: 0.3686
Epoch 10/10 accuracy: 0.9294 - loss: 0.2994 - val_accuracy: 0.9386 - val_loss: 0.2671

Install

  • Installing to the central user packages repository

    You can install all required packages to the central user packages repository using

    python3 -m pip install --user keras~=3.0.5 --extra-index-url=https://download.pytorch.org/whl/cu118 torch~=2.2.0 torchaudio~=2.2.0 torchvision~=0.17.0 torchmetrics~=1.3.1 flashlight-text~=0.0.4 tensorboard~=2.16.2 transformers~=4.37.2 gymnasium~=1.0.0a1 pygame~=2.5.2

    The above command installs the CUDA 11.8 PyTorch build, but you can change cu118 to:

    • cpu to get CPU-only (smaller) version,
    • cu121 to get CUDA 12.1 build,
    • rocm5.7 to get AMD ROCm 5.7 build.
  • Installing to a virtual environment

    Python supports virtual environments, which are directories containing independent sets of installed packages. You can create a virtual environment by running python3 -m venv VENV_DIR and then install the packages into it using the following command (on Windows, use VENV_DIR/Scripts/pip instead of VENV_DIR/bin/pip):

    VENV_DIR/bin/pip install keras~=3.0.5 --extra-index-url=https://download.pytorch.org/whl/cu118 torch~=2.2.0 torchaudio~=2.2.0 torchvision~=0.17.0 torchmetrics~=1.3.1 flashlight-text~=0.0.4 tensorboard~=2.16.2 transformers~=4.37.2 gymnasium~=1.0.0a1 pygame~=2.5.2

    Again, instead of the CUDA 11.8 build, you can change cu118 to:

    • cpu to get CPU-only (smaller) version,
    • cu121 to get CUDA 12.1 build,
    • rocm5.7 to get AMD ROCm 5.7 build.
  • Windows installation

    • On Windows, it can happen that python3 is not in PATH, while py command is – in that case you can use py -m venv VENV_DIR, which uses the newest Python available, or for example py -3.11 -m venv VENV_DIR, which uses Python version 3.11.

    • If you encounter a problem creating the logs in the args.logdir directory, a possible cause is that the path is longer than 260 characters, which is the default maximum length of a complete path on Windows. However, you can increase this limit on Windows 10, version 1607 or later, by following the instructions.

  • GPU support on Linux and Windows

    PyTorch supports NVIDIA and AMD GPUs out of the box; you just need to select the appropriate --extra-index-url when installing the packages.

  • GPU support on macOS

    The support for Apple Silicon GPUs in PyTorch+Keras is currently not great. Apple is working on mlx backend for Keras, which might improve the situation in the future.

    One could in theory use the TensorFlow backend, but the latest release of tensorflow-metal==1.1.0 works with TensorFlow 2.14, which does not support Keras 3.

Git

  • Is it possible to keep the solutions in a Git repository?

    Definitely. Keeping the solutions in a branch of your repository, where you merge them with the course repository, is probably a good idea. However, please keep the cloned repository with your solutions private.

  • On GitHub, do not create a public fork with your solutions

    If you keep your solutions in a GitHub repository, please do not create a clone of the repository by using the Fork button – this way, the cloned repository would be public.

    Of course, if you just want to create a pull request, GitHub requires a public fork and that is fine – just do not store your solutions in it.

  • How to clone the course repository?

    To clone the course repository, run

    git clone https://github.com/ufal/npfl138
    

    This creates the repository in the npfl138 subdirectory; if you want a different name, add it as a last parameter.

    To update the repository, run git pull inside the repository directory.

  • How to keep the course repository as a branch in your repository?

    If you want to store the course repository just in a local branch of your existing repository, you can run the following commands while in it:

    git remote add upstream https://github.com/ufal/npfl138
    git fetch upstream
    git checkout -t upstream/master
    

    This creates a branch master; if you want a different name, add -b BRANCH_NAME to the last command.

    In both cases, you can update your checkout by running git pull while in it.

  • How to merge the course repository with your modifications?

    If you want to store your solutions in a branch merged with the course repository, you should start by

    git remote add upstream https://github.com/ufal/npfl138
    git pull upstream master
    

    which creates a branch master; if you want a different name, change the last argument to master:BRANCH_NAME.

    You can then commit to this branch and push it to your repository.

    To merge the current course repository with your branch, run

    git fetch upstream
    git merge upstream/master
    

    while in your branch. Of course, it might be necessary to resolve conflicts if both you and I modified the same place in the templates.

ReCodEx

  • What files can be submitted to ReCodEx?

    You can submit multiple files of any type to ReCodEx. There is a limit of 20 files per submission, with a total size of 20MB.

  • What file does ReCodEx execute and what arguments does it use?

    Exactly one file with the py suffix must contain a line starting with def main(. ReCodEx imports this file and executes its main function (during the import, __name__ == "__recodex__").

    The file must also export an argument parser called parser. ReCodEx uses its arguments and default values, but it overwrites some of the arguments depending on the test being executed – the template should always indicate which arguments are set by ReCodEx and which are left intact.
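A submission satisfying both requirements above might be laid out as follows. This is a minimal sketch: the arguments --seed and --epochs and the body of main are hypothetical placeholders, not the contents of any actual template.

```python
import argparse

# Module-level parser; per the description above, ReCodEx reads its arguments
# and defaults and may overwrite some of them for a given test.
parser = argparse.ArgumentParser()
parser.add_argument("--seed", default=42, type=int, help="Random seed (hypothetical).")
parser.add_argument("--epochs", default=10, type=int, help="Number of epochs (hypothetical).")


def main(args: argparse.Namespace) -> dict:
    # Placeholder body; a real solution would train and evaluate a model here.
    return {"seed": args.seed, "epochs": args.epochs}


if __name__ == "__main__":
    # Runs only when executed locally; during the ReCodEx import the module
    # name is "__recodex__", so this branch is skipped there.
    print(main(parser.parse_args()))
```

Keeping all work inside main and guarding the command-line entry point means that importing the file has no side effects beyond defining parser and main.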

  • What are the time and memory limits?

    The memory limit during evaluation is 1.5GB. The time limit varies, but it should be at least 10 seconds and at least twice the running time of my solution.

Finetuning

  • How to make a part of the network frozen, so that its weights are not updated?

    Each keras.layers.Layer/keras.Model has a mutable trainable property indicating whether its variables should be updated – however, after changing it, you need to call .compile again (or otherwise make sure the list of trainable variables for the optimizer is updated).

    Note that once trainable == False, the insides of a layer are no longer considered, even if some of its sub-layers have trainable == True. Therefore, if you want to freeze only some sub-layers of a layer you use in your model, the layer itself must have trainable == True.
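Freezing can be illustrated on a tiny model. The layer sizes and names below (backbone, head) are hypothetical, and the PyTorch backend is assumed; the sketch only inspects which weights remain trainable.

```python
import os
os.environ.setdefault("KERAS_BACKEND", "torch")  # assumption: PyTorch backend

import keras

# A hypothetical two-layer backbone followed by a classification head.
backbone = keras.Sequential([
    keras.Input(shape=(8,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
], name="backbone")
head = keras.layers.Dense(3, activation="softmax", name="head")
model = keras.Sequential([keras.Input(shape=(8,)), backbone, head])

# Freeze only the first backbone layer; the backbone itself stays trainable.
backbone.layers[0].trainable = False
# Recompile after changing `trainable`, as recommended above.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
partially_frozen = len(model.trainable_weights)

# Freeze the whole backbone; only the head's kernel and bias remain trainable.
backbone.trainable = False
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
fully_frozen = len(model.trainable_weights)

print(partially_frozen, fully_frozen)
```

Each Dense layer contributes a kernel and a bias, so counting model.trainable_weights is a quick way to check that the intended layers are frozen.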

  • How to choose whether dropout/batch normalization is executed in training or inference regime?

    When calling a keras.layers.Layer/keras.Model, a named option training can be specified, indicating whether training or inference regime should be used. For a model, this option is automatically passed to its layers which require it, and Keras automatically passes it during model.{fit,evaluate,predict}.

    However, you can manually pass for example training=False to a layer when using Functional API, meaning that layer is executed in the inference regime even when the whole model is training.

  • How do trainable and training interact?

    The only layer influenced by both of these options is batch normalization, for which:

    • if trainable == False, the layer is always executed in inference regime;
    • if trainable == True, the training/inference regime is chosen according to the training option.

TensorBoard

  • Cannot start TensorBoard after installation

    If tensorboard executable cannot be found, make sure the directory with pip installed packages is in your PATH (that directory is either in your virtual environment if you use a virtual environment, or it should be ~/.local/bin on Linux and %UserProfile%\AppData\Roaming\Python\Python311 and %UserProfile%\AppData\Roaming\Python\Python311\Scripts on Windows).

  • What can be logged in TensorBoard?

    See the documentation of the SummaryWriter. Common possibilities are:

    • scalar values:
      summary_writer.add_scalar(name like "train/loss", value, step)
      
    • tensor values displayed as histograms or distributions:
      summary_writer.add_histogram(name like "train/output_layer", tensor, step)
      
    • images as tensors with shape [num_images, h, w, channels], where channels can be 1 (grayscale), 2 (grayscale + alpha), 3 (RGB), 4 (RGBA):
      summary_writer.add_images(name like "train/samples", images, step, dataformats="NHWC")
      
      Other dataformats are "HWC" (shape [h, w, channels]), "HW", "NCHW", "CHW".
    • possibly large amount of text (e.g., all hyperparameter values, sample translations in MT, …) in Markdown format:
      summary_writer.add_text(name like "hyperparameters", markdown, step)
      
    • audio as tensors with shape [1, samples] and values in the [-1, 1] range:
      summary_writer.add_audio(name like "train/samples", clip, step, [sample_rate])
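The calls listed above can be combined into one runnable sketch. It assumes that the SummaryWriter in question is torch.utils.tensorboard.SummaryWriter; the log directory, tag names, and random data are placeholders.

```python
import os
import tempfile

import torch
from torch.utils.tensorboard import SummaryWriter

logdir = tempfile.mkdtemp()  # placeholder log directory
step = 0

with SummaryWriter(logdir) as summary_writer:
    # Scalar value.
    summary_writer.add_scalar("train/loss", 0.25, step)
    # Tensor displayed as a histogram or distribution.
    summary_writer.add_histogram("train/output_layer", torch.randn(100), step)
    # Four grayscale images with shape [num_images, h, w, channels].
    images = torch.rand(4, 28, 28, 1)
    summary_writer.add_images("train/samples", images, step, dataformats="NHWC")
    # Markdown text, e.g., a table of hyperparameters.
    summary_writer.add_text("hyperparameters", "| name | value |\n|---|---|\n| epochs | 10 |", step)
    # One second of audio with shape [1, samples] and values in [-1, 1].
    clip = torch.rand(1, 16000) * 2 - 1
    summary_writer.add_audio("train/audio", clip, step, sample_rate=16000)

print(os.listdir(logdir))
```

Using the writer as a context manager closes it at the end of the block, so all pending events are flushed to the event file.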
      

Requirements

To pass the exam, you need to obtain at least 60, 75, or 90 points out of a 100-point exam to receive grade 3, 2, or 1, respectively. The exam consists of questions worth 100 points in total, drawn from the list below (the questions are randomly generated, but in such a way that there is at least one question from every lecture but the first). In addition, you can get surplus points from the practicals and at most 10 points for community work (i.e., fixing slides or reporting issues) – but only the points you already have at the time of the exam count. You can take the exam without passing the practicals first.

Exam Questions

Lecture 1 Questions

  • Considering a neural network with D input neurons, a single hidden layer with H neurons, K output neurons, hidden activation f and output activation a, list its parameters (including their size) and write down how the output is computed. [5]

  • List the definitions of frequently used MLP output layer activations (the ones producing parameters of a Bernoulli distribution and a categorical distribution). Then write down three commonly used hidden layer activations (sigmoid, tanh, ReLU). [5]

  • Formulate the Universal approximation theorem. [5]
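As a study aid for the first two questions, the following NumPy sketch shows one conventional parameterization of a single-hidden-layer network together with the named activations. The variable names (W_h, b_h, W_o, b_o), the concrete sizes, and the random initialization are illustrative assumptions, not the official exam answer.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def relu(x):
    return np.maximum(x, 0)


def softmax(x):
    # Subtracting the maximum does not change the result and avoids overflow.
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)


D, H, K = 4, 5, 3  # illustrative sizes
rng = np.random.default_rng(0)

# Parameters: hidden weights [D, H], hidden bias [H],
# output weights [H, K], output bias [K].
W_h, b_h = rng.normal(size=(D, H)), np.zeros(H)
W_o, b_o = rng.normal(size=(H, K)), np.zeros(K)


def forward(x, f=np.tanh, a=softmax):
    """Compute a(f(x W_h + b_h) W_o + b_o) for a batch x of shape [N, D]."""
    hidden = f(x @ W_h + b_h)
    return a(hidden @ W_o + b_o)


outputs = forward(rng.normal(size=(2, D)))
print(outputs.shape, outputs.sum(axis=-1))
```

With a softmax output, each row of the result is a categorical distribution over K classes; replacing a with sigmoid and setting K = 1 would instead give the parameter of a Bernoulli distribution.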