Deep Learning – Summer 2018/19

In recent years, deep neural networks have been used to solve complex machine-learning problems. They have achieved significant state-of-the-art results in many areas.

The goal of the course is to introduce deep neural networks, from the basics to the latest advances. The course will focus both on theory as well as on practical aspects (students will implement and train several deep neural networks capable of achieving state-of-the-art results, for example in named entity recognition, dependency parsing, machine translation, image labeling or in playing video games). No previous knowledge of artificial neural networks is required, but basic understanding of machine learning is advisable.

About

SIS code: NPFL114
Semester: summer
E-credits: 7
Examination: 3/2 C+Ex
Guarantor: Milan Straka

Timespace Coordinates

  • lectures: Czech lecture is held on Monday 14:50 in S9, English lecture on Monday 12:20 in S9; first lecture is on Mar 04
  • practicals: there are three parallel practicals, on Monday 17:20 in S9, on Tuesday 9:00 in SU1, and on Tuesday 12:20 in SU1; first practicals are on Mar 04/05

Lectures

1. Introduction to Deep Learning Slides PDF Slides 2018 Video numpy_entropy mnist_layers_activations

2. Training Neural Networks Slides PDF Slides 2018 Video mnist_training gym_cartpole

3. Training Neural Networks II Slides PDF Slides 2018 Video mnist_regularization mnist_ensemble uppercase

4. Convolutional Neural Networks Slides PDF Slides 2018 Video mnist_cnn cifar_competition

5. Convolutional Neural Networks II Slides PDF Slides 2018 Video mnist_multiple fashion_masks

6. Convolutional Neural Networks III, Recurrent Neural Networks Slides PDF Slides 2018 Video I 2018 Video II caltech42_competition sequence_classification

7. Recurrent Neural Networks II Slides PDF Slides 2018 Video I 2018 Video II 2018 Video III 2018 Video IV tagger_we tagger_cle_rnn tagger_cle_cnn speech_recognition


Requirements

To pass the practicals, you need to obtain at least 80 points, which are awarded for home assignments. Note that up to 40 points above 80 will be transferred to the exam.

To pass the exam, you need to obtain at least 60, 75 or 90 out of 100 points for the written exam (plus up to 40 points transferred from the practicals) to obtain grades 3, 2 or 1, respectively.

The lecture content, including references to study materials, follows. The main study material is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (referred to as DLB).

References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.

1. Introduction to Deep Learning

 Mar 04 Slides PDF Slides 2018 Video numpy_entropy mnist_layers_activations

  • Random variables, probability distributions, expectation, variance, Bernoulli distribution, Categorical distribution [Sections 3.2, 3.3, 3.8, 3.9.1 and 3.9.2 of DLB]
  • Self-information, entropy, cross-entropy, KL-divergence [Section 3.13 of DLB]
  • Gaussian distribution [Section 3.9.3 of DLB]
  • Machine Learning Basics [Section 5.1-5.1.3 of DLB]
  • History of Deep Learning [Section 1.2 of DLB]
  • Linear regression [Section 5.1.4 of DLB]
  • Brief description of Logistic Regression, Maximum Entropy models and SVM [Sections 5.7.1 and 5.7.2 of DLB]
  • Challenges Motivating Deep Learning [Section 5.11 of DLB]
  • Neural network basics (this topic is treated in detail within the lecture NAIL002)
    • Neural networks as graphs [Chapter 6 before Section 6.1 of DLB]
    • Output activation functions [Section 6.2.2 of DLB, excluding Section 6.2.2.4]
    • Hidden activation functions [Section 6.3 of DLB, excluding Section 6.3.3]
    • Basic network architectures [Section 6.4 of DLB, excluding Section 6.4.2]

2. Training Neural Networks

 Mar 11 Slides PDF Slides 2018 Video mnist_training gym_cartpole

  • Capacity, overfitting, underfitting, regularization [Section 5.2 of DLB]
  • Hyperparameters and validation sets [Section 5.3 of DLB]
  • Maximum Likelihood Estimation [Section 5.5 of DLB]
  • Neural network training (this topic is treated in detail within the lecture NAIL002)
    • Gradient Descent and Stochastic Gradient Descent [Sections 4.3 and 5.9 of DLB]
    • Backpropagation algorithm [Section 6.5 to 6.5.3 of DLB, especially Algorithms 6.2 and 6.3; note that Algorithms 6.5 and 6.6 are used in practice]
    • SGD algorithm [Section 8.3.1 and Algorithm 8.1 of DLB]
    • SGD with Momentum algorithm [Section 8.3.2 and Algorithm 8.2 of DLB]
    • SGD with Nesterov Momentum algorithm [Section 8.3.3 and Algorithm 8.3 of DLB]
    • Optimization algorithms with adaptive gradients
      • AdaGrad algorithm [Section 8.5.1 and Algorithm 8.4 of DLB]
      • RMSProp algorithm [Section 8.5.2 and Algorithm 8.5 of DLB]
      • Adam algorithm [Section 8.5.3 and Algorithm 8.7 of DLB]

3. Training Neural Networks II

 Mar 18 Slides PDF Slides 2018 Video mnist_regularization mnist_ensemble uppercase

  • Training neural network with a single hidden layer
  • Softmax with NLL (negative log likelihood) as a loss function [Section 6.2.2.3 of DLB, notably equation (6.30); plus slides 10-12]
  • Regularization [Chapter 7 until Section 7.1 of DLB]
    • Early stopping [Section 7.8 of DLB, without the How early stopping acts as a regularizer part]
    • L2 and L1 regularization [Sections 7.1 and 5.6.1 of DLB; plus slides 17-18]
    • Dataset augmentation [Section 7.4 of DLB]
    • Ensembling [Section 7.11 of DLB]
    • Dropout [Section 7.12 of DLB]
    • Label smoothing [Section 7.5.1 of DLB]
  • Saturating non-linearities [Section 6.3.2 and second half of Section 6.2.2.2 of DLB]
  • Parameter initialization strategies [Section 8.4 of DLB]

4. Convolutional Neural Networks

 Mar 25 Slides PDF Slides 2018 Video mnist_cnn cifar_competition

5. Convolutional Neural Networks II

 Apr 01 Slides PDF Slides 2018 Video mnist_multiple fashion_masks

6. Convolutional Neural Networks III, Recurrent Neural Networks

 Apr 08 Slides PDF Slides 2018 Video I 2018 Video II caltech42_competition sequence_classification

7. Recurrent Neural Networks II

 Apr 15 Slides PDF Slides 2018 Video I 2018 Video II 2018 Video III 2018 Video IV tagger_we tagger_cle_rnn tagger_cle_cnn speech_recognition

The tasks are evaluated automatically using the ReCodEx Code Examiner. The evaluation is performed using Python 3.6, TensorFlow 2.0.0a0, NumPy 1.16.1 and OpenAI Gym 0.9.5.

You can install all required packages either to user packages using pip3 install --user tensorflow==2.0.0a0 gym==0.9.5, or create a virtual environment using python3 -m venv VENV_DIR and then installing the packages inside it by running VENV_DIR/bin/pip3 install tensorflow==2.0.0a0 gym==0.9.5. If you have a GPU, you can install GPU-enabled TensorFlow by using tensorflow-gpu instead of tensorflow.

Teamwork

Working in teams of size 2 (or at most 3) is encouraged. All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. However, each such solution must explicitly list all members of the team to allow plagiarism detection using this template.

numpy_entropy

 Deadline: Mar 17, 23:59  3 points

The goal of this exercise is to familiarize yourself with Python, NumPy and the ReCodEx submission system. Start with the numpy_entropy.py template.

Load a file numpy_entropy_data.txt, whose lines consist of data points of our dataset, and load numpy_entropy_model.txt, which describes a model probability distribution, with each line being a tab-separated pair of (data point, probability). Example files are in the labs/01.

Then compute the following quantities using NumPy, and print each of them on a separate line rounded to two decimal places (or inf for positive infinity, which happens when an element of the data distribution has zero probability under the model distribution):

  • entropy H(data distribution)
  • cross-entropy H(data distribution, model distribution)
  • KL-divergence DKL(data distribution, model distribution)

Use natural logarithms to compute the entropies and the divergence.
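
For illustration, a minimal NumPy sketch of the three quantities follows, assuming the two files have already been parsed into the dictionaries below (the example values are hypothetical and serve only to make the snippet self-contained):

import numpy as np

data_counts = {"A": 2, "B": 1, "C": 1}          # hypothetical dataset counts
model_probs = {"A": 0.5, "B": 0.25, "C": 0.25}  # hypothetical model distribution

points = sorted(data_counts)
data_dist = np.array([data_counts[p] for p in points], dtype=np.float64)
data_dist /= data_dist.sum()
# Data points unknown to the model get zero probability, yielding inf below.
model_dist = np.array([model_probs.get(p, 0.0) for p in points], dtype=np.float64)

entropy = -np.sum(data_dist * np.log(data_dist))
with np.errstate(divide="ignore"):
    cross_entropy = -np.sum(data_dist * np.log(model_dist))
kl_divergence = cross_entropy - entropy

print("{:.2f}".format(entropy))
print("{:.2f}".format(cross_entropy))
print("{:.2f}".format(kl_divergence))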

mnist_layers_activations

 Deadline: Mar 17, 23:59  3 points

The templates changed on Mar 11 because of the upgrade to TF 2.0.0a0, so be sure to use the updated ones when submitting!

In order to familiarize yourself with TensorFlow and TensorBoard, start by playing with example_keras_tensorboard.py. Run it, and when it finishes, run TensorBoard using tensorboard --logdir logs. Then open http://localhost:6006 in a browser and explore the active tabs.

Your goal is to modify the mnist_layers_activations.py template and implement the following:

  • The number of hidden layers (including zero) can be specified on the command line using the parameter layers.
  • The activation function of these hidden layers can also be specified via the command-line parameter activation, with supported values none, relu, tanh and sigmoid.
  • Print the final accuracy on the test set.
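
A possible construction of such a model is sketched below; it assumes the parsed arguments are available as args.layers and args.activation, and the hidden layer size of 100 and the 28×28×1 input shape are illustrative only:

import tensorflow as tf

def build_model(args):
    activation = None if args.activation == "none" else args.activation
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=[28, 28, 1]))
    for _ in range(args.layers):
        model.add(tf.keras.layers.Dense(100, activation=activation))
    model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=["accuracy"])
    return model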

In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard:

  • 0 layers, activation none
  • 1 layer, activation none, relu, tanh, sigmoid
  • 10 layers, activation sigmoid, relu

mnist_training

 Deadline: Mar 24, 23:59  4 points

This exercise should teach you to use different optimizers, learning rates, and learning rate decays. Your goal is to modify the mnist_training.py template and implement the following:

  • Use the specified optimizer (either SGD or Adam).
  • Optionally use momentum for the SGD optimizer.
  • Use the specified learning rate for the optimizer.
  • Optionally use a given learning rate schedule. The schedule can be either exponential or polynomial (with degree 1, so inverse time decay). Additionally, the final learning rate is given and the decay should gradually decrease the learning rate so that it reaches the final learning rate just after the end of training.
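
One possible setup of the optimizer and decay is sketched below; it assumes the arguments described above are available on args, that decay_steps is the total number of training batches, and that the schedule classes live under tf.keras.optimizers.schedules as in TF 2.0 (the template may prescribe a different mechanism):

import tensorflow as tf

def create_optimizer(args, decay_steps):
    if args.decay == "exponential":
        # Decay so that the learning rate reaches learning_rate_final after training.
        learning_rate = tf.keras.optimizers.schedules.ExponentialDecay(
            args.learning_rate, decay_steps,
            decay_rate=args.learning_rate_final / args.learning_rate)
    elif args.decay == "polynomial":
        # Degree-1 polynomial decay towards the given final learning rate.
        learning_rate = tf.keras.optimizers.schedules.PolynomialDecay(
            args.learning_rate, decay_steps,
            end_learning_rate=args.learning_rate_final, power=1.0)
    else:
        learning_rate = args.learning_rate

    if args.optimizer == "SGD":
        return tf.keras.optimizers.SGD(learning_rate=learning_rate,
                                       momentum=args.momentum or 0.0)
    return tf.keras.optimizers.Adam(learning_rate=learning_rate)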

In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard:

  • SGD optimizer, learning_rate 0.01;
  • SGD optimizer, learning_rate 0.01, momentum 0.9;
  • SGD optimizer, learning_rate 0.1;
  • Adam optimizer, learning_rate 0.001;
  • Adam optimizer, learning_rate 0.01;
  • Adam optimizer, exponential decay, learning_rate 0.01 and learning_rate_final 0.001;
  • Adam optimizer, polynomial decay, learning_rate 0.01 and learning_rate_final 0.0001.

gym_cartpole

 Deadline: Mar 24, 23:59  4 points

Solve the CartPole-v1 environment from the OpenAI Gym, utilizing only the provided supervised training data. The data is available in the gym_cartpole-data.txt file, with each line containing one observation (four space-separated floats) and a corresponding action (the last space-separated integer). Start with the gym_cartpole.py template.
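
A minimal sketch of loading the supervised data, assuming the column layout described above (four observation floats followed by an integer action):

import numpy as np

data = np.loadtxt("gym_cartpole-data.txt")
observations, labels = data[:, :4], data[:, 4].astype(np.int32)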

The solution to this task should be a model which passes evaluation on random inputs. This evaluation is performed by running the gym_cartpole_evaluate.py script, which loads a model and then evaluates it on 100 random episodes (optionally rendering if the --render option is provided). In order to pass, you must achieve an average reward of at least 475 on 100 episodes. Your model should have either one or two outputs (i.e., using either a sigmoid or a softmax output function).

The size of the training data is very small and you should consider it when designing the model.

When submitting your model to ReCodEx, submit:

  • one file with the model itself (with h5 suffix),
  • the source code (or multiple sources) used to train the model (with py suffix), and possibly indicating teams.

mnist_regularization

 Deadline: Mar 31, 23:59  6 points

You will learn how to implement three regularization methods in this assignment. Start with the mnist_regularization.py template and implement the following:

  • Allow using dropout with rate args.dropout. Add a dropout layer after the first Flatten and also after all Dense hidden layers (but not after the output layer).
  • Allow using L2 regularization with weight args.l2. Use tf.keras.regularizers.L1L2 as a regularizer for all kernels and biases of all Dense layers (including the last one).
  • Allow using label smoothing with weight args.label_smoothing. Instead of SparseCategoricalCrossentropy, you will need to use CategoricalCrossentropy, which offers a label_smoothing argument.
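
A minimal sketch of combining the three options is shown below; it assumes args carries dropout, l2 and label_smoothing as described above, and the hidden-layer sizes are illustrative only:

import tensorflow as tf

def build_model(args):
    regularizer = tf.keras.regularizers.L1L2(l2=args.l2) if args.l2 else None
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=[28, 28, 1]))
    if args.dropout:
        model.add(tf.keras.layers.Dropout(args.dropout))
    for hidden_size in [400, 400]:
        model.add(tf.keras.layers.Dense(hidden_size, activation=tf.nn.relu,
                                        kernel_regularizer=regularizer,
                                        bias_regularizer=regularizer))
        if args.dropout:
            model.add(tf.keras.layers.Dropout(args.dropout))
    model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax,
                                    kernel_regularizer=regularizer,
                                    bias_regularizer=regularizer))
    # Label smoothing requires one-hot targets, hence CategoricalCrossentropy.
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss=tf.keras.losses.CategoricalCrossentropy(
                      label_smoothing=args.label_smoothing),
                  metrics=["accuracy"])
    return model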

In ReCodEx, there will be three tests (one for each regularization method) and you will get 2 points for passing each one.

In addition to submitting the task in ReCodEx, also run the following variations and observe the results in TensorBoard (notably training, development and test set accuracy and loss):

  • dropout rate 0, 0.3, 0.5, 0.6, 0.8;
  • l2 regularization 0, 0.001, 0.0001, 0.00001;
  • label smoothing 0, 0.1, 0.3, 0.5.

mnist_ensemble

 Deadline: Mar 31, 23:59  2 points

Your goal in this assignment is to implement model ensembling. The mnist_ensemble.py template trains args.models individual models, and your goal is to perform an ensemble of the first model, first two models, first three models, …, all models, and evaluate their accuracy on the development set.
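
A minimal sketch of the ensembling itself, assuming models is the list of trained Keras models and dev_images, dev_labels hold the development data (all names are illustrative):

import numpy as np

predictions = [model.predict(dev_images) for model in models]
for n in range(1, len(models) + 1):
    # Average the predicted class probabilities of the first n models.
    averaged = np.mean(predictions[:n], axis=0)
    accuracy = np.mean(np.argmax(averaged, axis=1) == dev_labels)
    print("Ensemble of the first {} models: {:.2f}%".format(n, 100 * accuracy))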

In addition to submitting the task in ReCodEx, run the script with args.models=7 and look at the results in the mnist_ensemble.out file.

uppercase

 Deadline: Mar 31, 23:59  4-9 points

This assignment introduces the first NLP task. Your goal is to implement a model which is given Czech lowercased text and tries to uppercase the appropriate letters. To load the dataset, use the uppercase_data.py module, which loads (and if required also downloads) the data. While the training and development sets are in the correct case, the test set is lowercased.

This is an open-data task, where you submit only the uppercased test set together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.

The task is also a competition. Everyone who submits a solution which achieves at least 96.5% accuracy will get 4 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions, i.e., the best solution will get a total of 9 points and the worst solution (with at least 96.5% accuracy) will get a total of 4 points. The accuracy is computed per character and can be evaluated by the uppercase_eval.py script.

You may want to start with the uppercase.py template, which uses uppercase_data.py to load the data, generates an alphabet of a given size containing the most frequent characters, and generates a sliding-window view of the data. The template also comments on possibilities of character representation.

Do not use RNNs or CNNs in this task (if you have doubts, contact me).

mnist_cnn

 Deadline: Apr 07, 23:59  5 points

In this assignment, you will learn to construct basic convolutional neural network layers. Start with the mnist_cnn.py template and assume the requested architecture is described by the cnn argument, which contains comma-separated specifications of the following layers:

  • C-filters-kernel_size-stride-padding: Add a convolutional layer with ReLU activation and specified number of filters, kernel size, stride and padding. Example: C-10-3-1-same
  • CB-filters-kernel_size-stride-padding: Same as C-filters-kernel_size-stride-padding, but use batch normalization. In detail, start with a convolutional layer without bias and activation, then add batch normalization layer, and finally ReLU activation. Example: CB-10-3-1-same
  • M-kernel_size-stride: Add max pooling with specified size and stride. Example: M-3-2
  • R-[layers]: Add a residual connection. The layers contain a specification of at least one convolutional layer (but not a recursive residual connection R). The input to the specified layers is then added to their output. Example: R-[C-16-3-1-same,C-16-3-1-same]
  • F: Flatten inputs. Must appear exactly once in the architecture.
  • D-hidden_layer_size: Add a dense layer with ReLU activation and specified size. Example: D-100

An example architecture might be --cnn=CB-16-5-2-same,M-3-2,F,D-100.
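
A possible sketch of translating one layer specification into Keras layers follows (the residual R-[...] case is omitted here, because its nested commas require a slightly smarter split and the functional API):

import tensorflow as tf

def parse_layer(spec):
    parts = spec.split("-")
    if parts[0] == "C":
        filters, kernel, stride, padding = int(parts[1]), int(parts[2]), int(parts[3]), parts[4]
        return [tf.keras.layers.Conv2D(filters, kernel, stride, padding,
                                       activation=tf.nn.relu)]
    if parts[0] == "CB":
        filters, kernel, stride, padding = int(parts[1]), int(parts[2]), int(parts[3]), parts[4]
        return [tf.keras.layers.Conv2D(filters, kernel, stride, padding, use_bias=False),
                tf.keras.layers.BatchNormalization(),
                tf.keras.layers.Activation(tf.nn.relu)]
    if parts[0] == "M":
        return [tf.keras.layers.MaxPool2D(int(parts[1]), int(parts[2]))]
    if parts[0] == "F":
        return [tf.keras.layers.Flatten()]
    if parts[0] == "D":
        return [tf.keras.layers.Dense(int(parts[1]), activation=tf.nn.relu)]
    raise ValueError("Unknown layer specification '{}'".format(spec))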

After a successful ReCodEx submission, you can try obtaining the best accuracy on MNIST and then advance to cifar_competition.

cifar_competition

 Deadline: Apr 07, 23:59  5-10 points

The goal of this assignment is to devise the best possible model for CIFAR-10. You can load the data using the cifar10.py module. Note that the test set is different from that of the official CIFAR-10.

This is an open-data task, where you submit only the test set labels together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.

The task is also a competition. Everyone who submits a solution which achieves at least 60% test set accuracy will get 5 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. Note that my solutions usually need to achieve at least ~73% on the development set to score 60% on the test set.

You may want to start with the cifar_competition.py template.

mnist_multiple

 Deadline: Apr 14, 23:59  4 points

In this assignment you will implement a model with multiple inputs, multiple outputs, manual batch preparation, and manual evaluation. Start with the mnist_multiple.py template and:

  • The goal is to create a model which, given two input MNIST images, predicts whether the digit in the first image is larger than the digit in the second one.
  • The model has three outputs:
    • direct prediction of the required value,
    • label prediction for the first image,
    • label prediction for the second image.
  • In addition to direct prediction, you can predict labels for both images and compare them -- an indirect prediction.
  • You need to implement:
    • the model, using multiple inputs, outputs, losses, and metrics;
    • generation of two-image batches using regular MNIST batches,
    • computation of direct and indirect prediction accuracy.
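
One possible shape of such a model using the functional API is sketched below; the shared sub-network and its sizes are illustrative, and the template may prescribe a different structure:

import tensorflow as tf

images = [tf.keras.layers.Input(shape=[28, 28, 1]) for _ in range(2)]

# Shared sub-network producing a representation of a single digit image.
shared = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(200, activation=tf.nn.relu),
])
hidden = [shared(image) for image in images]

# Label predictions for the two images (the second and third outputs).
digits = [tf.keras.layers.Dense(10, activation=tf.nn.softmax)(h) for h in hidden]

# Direct prediction of "the first digit is larger" (the first output).
concatenated = tf.keras.layers.Concatenate()(hidden)
direct = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)(
    tf.keras.layers.Dense(200, activation=tf.nn.relu)(concatenated))

model = tf.keras.Model(inputs=images, outputs=[direct] + digits)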

fashion_masks

 Deadline: Apr 14, 23:59  5-11 points

This assignment is a simple image segmentation task. The data for this task is available through the fashion_masks_data.py module. The inputs consist of 28×28 greyscale images of ten classes of clothing, while the outputs consist of the correct class and a pixel bit mask.

This is an open-data task, where you submit only the test set annotations together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file. Note that all .zip files you submit will be extracted first.

Performance is evaluated using mean IoU, where the IoU of a single example is defined as the intersection of the gold and system masks divided by their union (assuming the predicted label is correct; if not, the IoU is 0). The evaluation (using for example the development data) can be performed by the fashion_masks_eval.py script.
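
For reference, a minimal sketch of the per-example IoU described above, assuming binary NumPy masks (the empty-union convention is an assumption of this snippet):

import numpy as np

def example_iou(gold_label, gold_mask, predicted_label, predicted_mask):
    if gold_label != predicted_label:
        return 0.0
    gold_mask, predicted_mask = gold_mask.astype(bool), predicted_mask.astype(bool)
    intersection = np.sum(gold_mask & predicted_mask)
    union = np.sum(gold_mask | predicted_mask)
    return intersection / union if union else 1.0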

The task is a competition and the points will be awarded depending on your test set score. If your test set score surpasses 75%, you will be awarded 5 points; the remaining 6 points will be distributed depending on the relative ordering of your solutions. Note that quite a straightforward model surpasses 80% on the development set after an hour of computation (and 90% after several hours), so reaching 75% is not that difficult.

You may want to start with the fashion_masks.py template, which loads the data and generates test set annotations in the required format (one example per line containing space separated label and mask, the mask stored as zeros and ones, rows first).

caltech42_competition

 Deadline: Apr 21, 23:59 Apr 22, 23:59  5-10 points

The goal of this assignment is to try a transfer learning approach to train image recognition on a small dataset with 42 classes. You can load the data using the caltech42.py module. In addition to the training data, you should use a MobileNet v2 pretrained network (details in caltech42_competition.py).
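
A possible transfer-learning sketch is below; it assumes the images are already resized to the resolution the pretrained network expects and that the backbone is obtained via tf.keras.applications (the template may supply MobileNet v2 differently):

import tensorflow as tf

backbone = tf.keras.applications.MobileNetV2(include_top=False, pooling="avg",
                                             weights="imagenet")
backbone.trainable = False  # Start by training only the new classification head.

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(42, activation=tf.nn.softmax),
])
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=["accuracy"])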

This is an open-data task, where you submit only the test set labels together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.

The task is also a competition. Everyone who submits a solution which achieves at least 94% test set accuracy will get 5 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions.

You may want to start with the caltech42_competition.py template.

sequence_classification

 Deadline: Apr 21, 23:59 Apr 22, 23:59  6 points

The goal of this assignment is to introduce recurrent neural networks, manual TensorBoard log collection, and manual gradient clipping. Considering recurrent neural networks, the assignment demonstrates their convergence speed and illustrates the exploding gradient issue. The network should process sequences of 50 small integers and compute the parity of each prefix of the sequence. The inputs are either 0/1 values, or vectors with one-hot representations of small integers.

Your goal is to modify the sequence_classification.py template and implement the following:

  • Use the specified RNN cell type (SimpleRNN, GRU or LSTM) and dimensionality.
  • Process the sequence using the required RNN.
  • Use an additional hidden layer on the RNN outputs if requested.
  • Implement gradient clipping if requested.
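
A minimal sketch of the model construction follows, assuming args provides rnn_cell, rnn_cell_dim and hidden_layer as in the parameters above; the gradient clipping itself would then be applied in a custom training loop, e.g. with tf.clip_by_global_norm on the computed gradients:

import tensorflow as tf

def build_model(args, sequence_length, sequence_dim):
    rnn_layer = getattr(tf.keras.layers, args.rnn_cell)  # SimpleRNN, GRU or LSTM
    inputs = tf.keras.layers.Input(shape=[sequence_length, sequence_dim])
    # return_sequences=True, because parity is predicted for every prefix.
    hidden = rnn_layer(args.rnn_cell_dim, return_sequences=True)(inputs)
    if args.hidden_layer:
        hidden = tf.keras.layers.Dense(args.hidden_layer, activation=tf.nn.relu)(hidden)
    outputs = tf.keras.layers.Dense(1, activation=tf.nn.sigmoid)(hidden)
    return tf.keras.Model(inputs=inputs, outputs=outputs)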

In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard. Concentrate on the way the RNNs converge, their convergence speed, exploding gradient issues, and how gradient clipping helps:

  • --rnn_cell=SimpleRNN --sequence_dim=1, --rnn_cell=GRU --sequence_dim=1, --rnn_cell=LSTM --sequence_dim=1
  • the same as above but with --sequence_dim=2
  • the same as above but with --sequence_dim=10
  • --rnn_cell=LSTM --hidden_layer=50 --rnn_cell_dim=30 --sequence_dim=30 and the same with --clip_gradient=1
  • the same as above but with --rnn_cell=SimpleRNN
  • the same as above but with --rnn_cell=GRU --hidden_layer=150

tagger_we

 Deadline: Apr 28, 23:59  3 points

In this assignment you will create a simple part-of-speech tagger. For training and evaluation, we will use a Czech dataset containing tokenized sentences, with each word annotated by its gold lemma and part-of-speech tag. The morpho_dataset.py module (down)loads the dataset and can generate batches.

Your goal is to modify the tagger_we.py template and implement the following:

  • Use the specified RNN cell type (GRU or LSTM) and dimensionality.
  • Create word embeddings for the training vocabulary.
  • Process the sentences using a bidirectional RNN.
  • Predict part-of-speech tags. Note that you need to properly handle sentences of different lengths in one batch using masking.
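
A possible sketch of the architecture is below, assuming the vocabulary and tag-set sizes and the embedding/RNN dimensions are known; padded positions are masked via the Embedding layer's mask_zero (names and the merge of directions are illustrative):

import tensorflow as tf

def build_model(num_words, num_tags, rnn_cell, rnn_dim, we_dim):
    word_ids = tf.keras.layers.Input(shape=[None], dtype=tf.int32)
    embedded = tf.keras.layers.Embedding(num_words, we_dim, mask_zero=True)(word_ids)
    rnn_layer = getattr(tf.keras.layers, rnn_cell)  # "GRU" or "LSTM"
    hidden = tf.keras.layers.Bidirectional(
        rnn_layer(rnn_dim, return_sequences=True))(embedded)
    tags = tf.keras.layers.Dense(num_tags, activation=tf.nn.softmax)(hidden)
    return tf.keras.Model(inputs=word_ids, outputs=tags)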

After submitting the task to ReCodEx, continue with the tagger_cle_rnn assignment.

tagger_cle_rnn

 Deadline: Apr 28, 23:59  3 points

This task is a continuation of tagger_we assignment. Using the tagger_cle_rnn.py template, implement the following features in addition to tagger_we:

  • Create character embeddings for the training alphabet.
  • Process unique words with a bidirectional character-level RNN, concatenating the results.
  • Properly distribute the CLEs of unique words into the batches of sentences.
  • Generate overall embeddings by concatenating word-level embeddings and CLEs.
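
A minimal sketch of the character-level part, assuming charseqs holds the character indices of the unique words with shape [unique words, max word length] (all names are illustrative):

import tensorflow as tf

def rnn_character_embeddings(charseqs, num_chars, cle_dim):
    embedded = tf.keras.layers.Embedding(num_chars, cle_dim, mask_zero=True)(charseqs)
    # Concatenating the final forward and backward states yields a 2*cle_dim CLE.
    return tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(cle_dim), merge_mode="concat")(embedded)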

Once submitted to ReCodEx, continue with the tagger_cle_cnn assignment. Additionally, you should experiment with the effect of CLEs compared to plain tagger_we, and with the influence of their dimensionality. Note that by default, tagger_we uses word embeddings twice the size of the word embeddings in tagger_cle_rnn.

tagger_cle_cnn

 Deadline: Apr 28, 23:59  2 points

This task is a continuation of tagger_cle_rnn assignment. Using the tagger_cle_cnn.py template, implement the following features compared to tagger_cle_rnn:

  • Instead of using RNNs to generate character-level embeddings, process the embedded unique words with 1D convolutional filters with kernel sizes ranging from 2 up to some given maximum. To obtain a fixed-size representation, perform global max-pooling over the whole word.
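
A minimal sketch of this CNN-based alternative, assuming embedded_chars has shape [unique words, max word length, cle_dim] and max_kernel_size is the given maximum (names are illustrative):

import tensorflow as tf

def cnn_character_embeddings(embedded_chars, filters, max_kernel_size):
    features = []
    for kernel_size in range(2, max_kernel_size + 1):
        convolved = tf.keras.layers.Conv1D(
            filters, kernel_size, strides=1, padding="valid",
            activation=tf.nn.relu)(embedded_chars)
        # Global max-pooling over character positions gives a fixed-size vector.
        features.append(tf.keras.layers.GlobalMaxPool1D()(convolved))
    return tf.keras.layers.Concatenate()(features)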

speech_recognition

 Deadline: Apr 28, 23:59  7-12 points

This assignment is a competition task in the area of speech recognition. Specifically, your goal is to predict a sequence of letters given a spoken utterance. We will be using the TIMIT corpus, with the input sound waves passed through the usual preprocessing – computing Mel-frequency cepstral coefficients (MFCCs). You can repeat exactly this preprocessing on given audio using the timit_mfcc_preprocess.py script.

Because the data is not publicly available, you can download it only through ReCodEx. Please do not distribute it. Load the dataset using the timit_mfcc.py module.

This is an open-data task, where you submit only the test set labels together with the training script (which will not be executed; it will only be used to understand the approach you took and to indicate teams). Explicitly, submit exactly one .txt file and at least one .py file.

The task is also a competition. The evaluation is performed by computing the edit distance to the gold letter sequence, normalized by its length (i.e., exactly as tf.edit_distance). Everyone who submits a solution which achieves at most 50% test set edit distance will get 7 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions. An evaluation (using for example the development data) can be performed by speech_recognition_eval.py.

You should start with the speech_recognition.py template.

  • To perform speech recognition, you should use the CTC loss for training and a CTC beam search decoder for prediction. Both the CTC loss and the CTC decoder employ sparse tensors – therefore, start by studying them.
  • The basic architecture:
    • converts the target letters into a sparse representation,
    • uses a bidirectional RNN and an output linear layer without activation,
    • computes the CTC loss (tf.nn.ctc_loss),
    • if required, performs decoding by a CTC decoder (tf.nn.ctc_beam_search_decoder) and possibly evaluates the results using normalized edit distance (tf.edit_distance).
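
For orientation, a possible sketch of the CTC parts is below; it assumes logits of shape [batch, max time, letters + 1], their lengths logit_lengths, gold letters as a tf.SparseTensor, and the TF 2.0 signatures of tf.nn.ctc_loss and tf.nn.ctc_beam_search_decoder (consult the template for the exact tensors it provides):

import tensorflow as tf

def ctc_loss(sparse_targets, logits, logit_lengths):
    # With SparseTensor targets, the label lengths are implied by the indices.
    losses = tf.nn.ctc_loss(sparse_targets, logits, None, logit_lengths,
                            logits_time_major=False, blank_index=-1)
    return tf.reduce_mean(losses)

def ctc_decode(logits, logit_lengths):
    # The beam search decoder expects time-major logits.
    decoded, _ = tf.nn.ctc_beam_search_decoder(
        tf.transpose(logits, [1, 0, 2]), logit_lengths, beam_width=16)
    return decoded[0]  # SparseTensor with the best decoded letter sequences.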