Be aware that this is an archived page from former years. You can visit the current version instead.

Deep Learning – Summer 2017/18

In recent years, deep neural networks have been used to solve complex machine-learning problems. They have achieved significant state-of-the-art results in many areas.

The goal of the course is to introduce deep neural networks, from the basics to the latest advances. The course focuses on both theory and practical aspects (students will implement and train several deep neural networks capable of achieving state-of-the-art results, for example in named entity recognition, dependency parsing, machine translation, image labeling or in playing video games). No previous knowledge of artificial neural networks is required, but a basic understanding of machine learning is advisable.

About

SIS code: NPFL114
Semester: summer
E-credits: 7
Examination: 3/2 C+Ex
Guarantor: Milan Straka

Timespace Coordinates

lecture: Czech lecture is held on Monday 10:40 in S9, English lecture on Monday 13:10 in S9
practicals: there are four parallel practicals, on Monday 15:40 in SU1, Monday 17:20 in SU1, on Tuesday 9:00 in SU1 and on Tuesday 10:40 in SU1

Lectures

1. Introduction to Deep Learning Slides Video numpy_entropy mnist_layers_activations

2. Training Neural Networks Slides Video mnist_training gym_cartpole

3. Training Neural Networks II Slides Video mnist_dropout uppercase

4. Convolutional Networks Slides Video mnist_conv mnist_competition

5. Convolutional Networks II Slides Video mnist_batchnorm fashion_masks

6. Easter Monday 3d_recognition

7. Object Detection & Segmentation, Recurrent Neural Networks Slides Video nsketch_transfer sequence_classification sequence_prediction

8. Recurrent Neural Networks II, Word Embeddings Slides Video tagger_we tagger_cle tagger_cnne tagger_sota

9. Recurrent Neural Networks III, Machine Translation Slides Video lemmatizer_noattn lemmatizer_attn lemmatizer_sota

10. Deep Generative Models Slides Video vae gan dcgan nli

11. Sequence Prediction, Reinforcement Learning Slides Video tagger_crf phoneme_recognition monte_carlo

12. Sequence Prediction II, Reinforcement Learning II Slides Video q_learning q_network reinforce reinforce_with_baseline reinforce_with_pixels

13. Practical Methodology, TF Development, Advanced Architectures Slides Video Master Thesis Proposals hyperparams_gp hyperparams_rl eager_mnist estimator_mnist

Requirements

To pass the practicals, you need to obtain at least 80 points, which are awarded for home assignments. Note that up to 40 points above 80 will be transferred to the exam.

The written exam is scored out of 100 points, to which up to 40 points from the practicals are added. You need at least 55, 70 and 85 points to obtain grades 3, 2 and 1, respectively.
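
The point arithmetic can be sketched as follows. This is only an illustration under one reading of the rules: the function names are hypothetical, and it assumes the transferred bonus is the number of practicals points above 80, capped at 40, added to the written exam score.

```python
def exam_total(written_points, practicals_points):
    """Written exam points plus the bonus carried over from the practicals."""
    # Assumption: the bonus is the number of points above 80, capped at 40.
    bonus = min(40, max(0, practicals_points - 80))
    return written_points + bonus


def grade(total_points):
    """Return grade 1, 2 or 3, or None when the exam is not passed."""
    for threshold, result in ((85, 1), (70, 2), (55, 3)):
        if total_points >= threshold:
            return result
    return None


# 50 written points and 95 practicals points give 50 + 15 = 65 points.
print(grade(exam_total(50, 95)))
```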

The lecture content, including references to study materials, is listed below. The main study material is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (referred to as DLB).

References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.

The student recordings of the lectures and the practicals are available here.

1. Introduction to Deep Learning

Feb 26 Slides Video numpy_entropy mnist_layers_activations

Random variables, probability distributions, expectation, variance, Bernoulli distribution, Categorical distribution [Sections 3.2, 3.3, 3.8, 3.9.1 and 3.9.2 of DLB]
Self-information, entropy, cross-entropy, KL-divergence [Section 3.13 of DLB]
Gaussian distribution [Section 3.9.3 of DLB]
Machine Learning Basics [Section 5.1-5.1.3 of DLB]
History of Deep Learning [Section 1.2 of DLB]
Linear regression [Section 5.1.4 of DLB]
Brief description of Logistic Regression, Maximum Entropy models and SVM [Sections 5.7.1 and 5.7.2 of DLB]
Challenges Motivating Deep Learning [Section 5.11 of DLB]
Neural network basics (this topic is treated in detail within the lecture NAIL002)
- Neural networks as graphs [Chapter 6 before Section 6.1 of DLB]
- Output activation functions [Section 6.2.2 of DLB, excluding Section 6.2.2.4]
- Hidden activation functions [Section 6.3 of DLB, excluding Section 6.3.3]
- Basic network architectures [Section 6.4 of DLB, excluding Section 6.4.2]

2. Training Neural Networks

Mar 05 Slides Video mnist_training gym_cartpole

Capacity, overfitting, underfitting, regularization [Section 5.2 of DLB]
Hyperparameters and validation sets [Section 5.3 of DLB]
Maximum Likelihood Estimation [Section 5.5 of DLB]
Neural network training (this topic is treated in detail within the lecture NAIL002)
- Gradient Descent and Stochastic Gradient Descent [Sections 4.3 and 5.9 of DLB]
- Backpropagation algorithm [Section 6.5 to 6.5.3 of DLB, especially Algorithms 6.2 and 6.3; note that Algorithms 6.5 and 6.6 are used in practice]
- SGD algorithm [Section 8.3.1 and Algorithm 8.1 of DLB]
- SGD with Momentum algorithm [Section 8.3.2 and Algorithm 8.2 of DLB]
- SGD with Nesterov Momentum algorithm [Section 8.3.3 and Algorithm 8.3 of DLB]
- Optimization algorithms with adaptive gradients
  - AdaGrad algorithm [Section 8.5.1 and Algorithm 8.4 of DLB]
  - RMSProp algorithm [Section 8.5.2 and Algorithm 8.5 of DLB]
  - Adam algorithm [Section 8.5.3 and Algorithm 8.7 of DLB]

3. Training Neural Networks II

Mar 12 Slides Video mnist_dropout uppercase

Training neural network with a single hidden layer
Playing with TensorFlow Playground
Softmax with NLL (negative log likelihood) as a loss function [Section 6.2.2.3 of DLB, notably equation (6.30); plus slides 10-12]
Regularization [Chapter 7 until Section 7.1 of DLB]
Early stopping [Section 7.8 of DLB, without the How early stopping acts as a regularizer part]
L2 and L1 regularization [Sections 7.1 and 5.6.1 of DLB; plus slides 17-18]
Dataset Augmentation [Section 7.4 of DLB]
Ensembling [Section 7.11 of DLB]
Dropout [Section 7.12 of DLB]

4. Convolutional Networks

Mar 19 Slides Video mnist_conv mnist_competition

Saturating non-linearities [Section 6.3.2 and second half of Section 6.2.2.2 of DLB]
Gradient clipping [Section 10.11.1 of DLB]
Parameter initialization strategies [Section 8.4 of DLB]
Introduction to convolutional networks [Chapter 9 and Sections 9.1-9.3 of DLB]
Convolution as operation on 4D tensors [Section 9.5 of DLB, notably Equations (9.7) and (9.8)]
Max pooling and average pooling [Section 9.3 of DLB]
Stride and Padding schemes [Section 9.5 of DLB]
AlexNet [Alex Krizhevsky et al.: ImageNet Classification with Deep Convolutional Neural Networks]
Prior probabilities of convolutional network architecture [Deep Image Prior]

5. Convolutional Networks II

Mar 26 Slides Video mnist_batchnorm fashion_masks

VGG [Karen Simonyan and Andrew Zisserman: Very Deep Convolutional Networks for Large-Scale Image Recognition]
GoogLeNet (aka Inception) [Christian Szegedy et al.: Going Deeper with Convolutions]
Batch normalization [Section 8.7.1 of DLB, optionally the paper Sergey Ioffe and Christian Szegedy: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift]
Inception v2 and v3 [Rethinking the Inception Architecture for Computer Vision]
ResNet [Kaiming He et al.: Deep Residual Learning for Image Recognition]
WideNet [Wide Residual Network]
ResNeXt [Aggregated Residual Transformations for Deep Neural Networks]
NasNet [Learning Transferable Architectures for Scalable Image Recognition]

6. Easter Monday

Apr 02 3d_recognition

Easter Monday

11. Sequence Prediction, Reinforcement Learning

Conditional Random Fields (CRF) loss [Sections 3.4.2 and A.7 of R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa: Natural Language Processing (Almost) from Scratch]
Connectionist Temporal Classification (CTC) loss [A. Graves, S. Fernández, F. Gomez, J. Schmidhuber: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks]
Multi-arm Bandits [Chapter 2, Sections 2.1-2.5 of Sutton's Book]
General setting of Reinforcement Learning [Chapter 3, Sections 3.1-3.3 of Sutton's Book]
Monte Carlo Reinforcement Learning Algorithm [Chapter 5, Sections 5.1-5.4 (especially the algorithm in 5.4) of Sutton's Book]

12. Sequence Prediction II, Reinforcement Learning II

May 14 Slides Video q_learning q_network reinforce reinforce_with_baseline reinforce_with_pixels

Temporal Difference RL Methods [Section 6.1 of Sutton's Book]
SARSA algorithm [Section 6.4 of Sutton's Book]
Q-Learning algorithm [Section 6.5 of Sutton's Book]
Deep Q-Network [Volodymyr Mnih et al.: Human-level control through deep reinforcement learning]
Policy Gradient Methods [Section 13.1 of Sutton's Book]
Policy Gradient Theorem [Section 13.2 of Sutton's Book]
REINFORCE Algorithm [Section 13.3 of Sutton's Book; note that the gamma^t on the last line should not be there]
REINFORCE with Baseline Algorithm [Section 13.4 of Sutton's Book; note that the gamma^t on the last two lines should not be there]
Actor-Critic Reinforcement Learning Algorithm [Section 13.5 of Sutton's Book]

13. Practical Methodology, TF Development, Advanced Architectures

May 21 Slides Video Master Thesis Proposals hyperparams_gp hyperparams_rl eager_mnist estimator_mnist

Hyperparameter selection using Reinforcement Learning [Learning Transferable Architectures for Scalable Image Recognition]
Hyperparameter selection using Bayesian Optimization [Practical Bayesian Optimization of Machine Learning Algorithms, Google Vizier: A Service for Black-Box Optimization]
TensorFlow Eager Mode
TensorFlow Estimator API
TensorFlow Data API
WaveNet [WaveNet: A Generative Model for Raw Audio]
Unsupervised Generation of a Word Dictionary [Word Translation Without Parallel Data]
Memory Augmented Networks [One-shot learning with Memory-Augmented Neural Networks]

The tasks are evaluated automatically using the ReCodEx Code Examiner. The evaluation is performed using Python 3.4, TensorFlow 1.5.0, NumPy 1.14.0 and OpenAI Gym 0.9.5.

You can either install TensorFlow 1.5.0 to user packages using pip3 install --user tensorflow==1.5.0, or create a virtual environment using python3 -m venv VENV_DIR and then install TensorFlow inside it by running VENV_DIR/bin/pip3 install tensorflow==1.5.0.

Note that updates about the tasks (notably changes in the task descriptions) are announced on the UFAL NPFL114 mailing list. However, the mailing list will not contain anything not present on this website.

Teamwork

Working in teams of size 2 (or at most 3) is encouraged. All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. However, each such solution must explicitly list all members of the team using this template, to allow plagiarism detection.

numpy_entropy

Deadline: Mar 12, 15:39 3 points

The goal of this exercise is to familiarize yourself with Python, NumPy and the ReCodEx submission system. Start with numpy_entropy.py.

Load the file numpy_entropy_data.txt, whose lines consist of data points of our dataset, and load numpy_entropy_model.txt, which describes a model probability distribution, with each line being a tab-separated pair of (data point, probability). Example files are in labs/01.

Then compute the following quantities using NumPy, and print each of them on a separate line rounded to two decimal places (or inf for positive infinity, which happens when an element of the data distribution has zero probability under the model distribution):

entropy H(data distribution)
cross-entropy H(data distribution, model distribution)
KL-divergence D_KL(data distribution, model distribution)

Use natural logarithms to compute the entropies and the divergence. The evaluation on ReCodEx is performed on data structurally similar to numpy_entropy_eval_examples.zip.
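
The three quantities can be illustrated on a toy dataset. The sketch below deliberately uses only the Python standard library to show the arithmetic (the assignment itself asks for NumPy); the function names and the in-memory example data are assumptions, not part of the template.

```python
import math
from collections import Counter


def distributions(data_points, model_lines):
    """Build the empirical data distribution and parse the model distribution."""
    counts = Counter(data_points)
    total = sum(counts.values())
    data = {point: count / total for point, count in counts.items()}
    model = {}
    for line in model_lines:
        point, probability = line.rstrip("\n").split("\t")
        model[point] = float(probability)
    return data, model


def entropy(data):
    return -sum(p * math.log(p) for p in data.values())


def cross_entropy(data, model):
    total = 0.0
    for point, p in data.items():
        q = model.get(point, 0.0)
        if q == 0.0:
            return math.inf  # a data point has zero probability under the model
        total -= p * math.log(q)
    return total


def kl_divergence(data, model):
    # D_KL(data || model) = H(data, model) - H(data)
    return cross_entropy(data, model) - entropy(data)


# Toy data: "A" occurs three times and "B" once; the model is uniform.
data, model = distributions(["A", "B", "A", "A"], ["A\t0.5\n", "B\t0.5\n"])
for value in (entropy(data), cross_entropy(data, model), kl_divergence(data, model)):
    print("{:.2f}".format(value))  # prints 0.56, 0.69 and 0.13
```

Formatting math.inf with "{:.2f}" yields inf, which matches the required output for the infinite case.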

mnist_layers_activations

Deadline: Mar 12, 15:39 5 points

The motivation of this exercise is to familiarize yourself a bit with TensorFlow and TensorBoard. Start by playing with mnist_example.py. Run it, and when it finishes, run TensorBoard using tensorboard --logdir logs. Then open http://localhost:6006 in a browser and explore the three active tabs – Scalars, Images and Graphs.

Your goal is to modify the mnist_layers_activations.py template and implement the following:

The number of hidden layers (including zero) can be specified on the command line using the layers parameter.
The activation function of these hidden layers can also be specified on the command line using the activation parameter, with supported values of none, relu, tanh and sigmoid.
Print the final accuracy on the test set to standard output. Write the accuracy as a percentage rounded to two decimal places, e.g., 91.23.

In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard:

0 layers, activation none
1 layer, activation none, relu, tanh, sigmoid
3 layers, activation sigmoid, relu
5 layers, activation sigmoid

mnist_training

Deadline: Mar 19, 15:39 4 points

This exercise should teach you to use different optimizers and learning rates (including exponential decay). Your goal is to modify the mnist_training.py template and implement the following:

Use the specified optimizer (either SGD or Adam).
Optionally use momentum for the SGD optimizer.
Use the specified initial learning rate for the optimizer.
Optionally use the given final learning rate. If the final learning rate is given, implement exponential learning rate decay (using tf.train.exponential_decay). Specifically, for the whole first epoch, train using the given initial learning rate. Then lower the learning rate between epochs by multiplying it each time by the same suitable constant, such that the whole last epoch is trained using the specified final learning rate.
Print the final accuracy on the test set to standard output. Write the accuracy as a percentage rounded to two decimal places, e.g., 91.23.
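
The "suitable constant" for the decay follows from requiring that epochs - 1 multiplications take the initial rate to the final one. The sketch below works this out in plain Python; the function names are hypothetical, and the staircase formula is written by hand here rather than calling TensorFlow.

```python
def decay_rate(initial_lr, final_lr, epochs):
    """Per-epoch multiplier that moves initial_lr to final_lr in epochs - 1 steps."""
    if epochs < 2:
        raise ValueError("at least two epochs are needed for a decay")
    return (final_lr / initial_lr) ** (1.0 / (epochs - 1))


def staircase_lr(initial_lr, rate, global_step, batches_per_epoch):
    """Learning rate that stays constant within an epoch."""
    # initial_lr * rate ** floor(global_step / batches_per_epoch), i.e. the
    # staircase variant of exponential decay with one decay step per epoch.
    return initial_lr * rate ** (global_step // batches_per_epoch)


# Three epochs of two batches each, decaying from 0.01 to 0.001.
rate = decay_rate(0.01, 0.001, epochs=3)
schedule = [staircase_lr(0.01, rate, step, batches_per_epoch=2) for step in range(6)]
print(schedule)
```

With a decay step of one epoch and the staircase behaviour, the first epoch runs entirely at the initial rate and the last epoch entirely at the final rate.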

In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard:

SGD optimizer, learning_rate 0.01;
SGD optimizer, learning_rate 0.01, momentum 0.9;
SGD optimizer, learning_rate 0.1;
Adam optimizer, learning_rate 0.001;
Adam optimizer, learning_rate 0.01 and learning_rate_final 0.001.

gym_cartpole

Deadline: Mar 19, 15:39 5 points

Solve the CartPole-v1 environment from the OpenAI Gym, using only the provided supervised training data. The data is available in the gym_cartpole-data.txt file, each line containing one observation (four space-separated floats) and a corresponding action (the last space-separated integer). Start with gym_cartpole.py.

The solution to this task should be a model which passes evaluation on random inputs. This evaluation is performed by running the gym_cartpole_evaluate.py, which loads a model and then evaluates it on 100 random episodes (optionally rendering if --render option is provided). In order to pass, you must achieve an average reward of at least 475 on 100 episodes.

The size of the training data is very small and you should consider it when designing the model.

To submit your model in ReCodEx, use the supplied gym_cartpole_recodex.py script. When executed, the script embeds the saved model in the current directory into a script gym_cartpole_recodex_submission.py, which can be submitted in ReCodEx. Note that by default there are at most five submission attempts; write to me if you need more.

mnist_dropout

Deadline: Mar 26, 15:39 3 points

This exercise evaluates the effect of dropout. Your goal is to modify the mnist_dropout.py template and implement the following:

Allow using dropout with specified dropout rate on the hidden layer. The dropout must be active only during training and not during test set evaluation.
Print the final accuracy on the test set to standard output. Write the accuracy as a percentage rounded to two decimal places, e.g., 91.23.
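
To make the training/inference distinction concrete, here is a minimal pure-Python sketch of inverted dropout, where kept activations are scaled up during training so that nothing needs to change at test time. The function name and the list-based representation are assumptions for illustration only.

```python
import random


def dropout(values, rate, training, rng=random):
    """Apply inverted dropout to a list of activations."""
    if not training or rate == 0.0:
        return list(values)  # inference: activations pass through unchanged
    keep = 1.0 - rate
    # Kept units are scaled by 1 / keep so the expected activation is unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


hidden = [1.0] * 10
print(dropout(hidden, rate=0.5, training=True, rng=random.Random(42)))
print(dropout(hidden, rate=0.5, training=False))
```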

In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard (notably training, development and test set accuracy and loss):

dropout rate 0, 0.3, 0.5, 0.6, 0.8, 0.9

uppercase

Deadline: Mar 26, 15:39 6-10 points

This assignment introduces the first textual task. Your goal is to implement a network which is given Czech text and tries to uppercase the appropriate letters. Specifically, your goal is to uppercase the given test set as well as possible. The task data is available in the uppercase_data.zip archive. While the training and the development sets are in correct case, the test set is all in lowercase.

This is an open-data task, so you will submit only the uppercased test set (in addition to a training script, which will be used only to understand the approach you took).

The task is also a competition. Everyone who submits a solution which achieves at least 96.5% accuracy will get 6 points; the remaining 4 points will be distributed depending on the relative ordering of your solutions, i.e., the best solution will get 10 points in total and the worst solution (but with at least 96.5% accuracy) will get 6 points in total. The accuracy is computed per character and will be evaluated by the uppercase_eval.py script.

If you want, you can start with the uppercase.py template, which loads the data, generates an alphabet of a given size containing the most frequent characters, and can generate a sliding window view of the data. To represent characters, you might find tf.one_hot useful.
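
The alphabet, sliding window and one-hot ideas can be sketched without TensorFlow as follows. This is not the template's code: the function names, the reserved id 0 for unknown characters and padding, and the window radius are all assumptions, and the sketch assumes lowercasing does not change the text length.

```python
from collections import Counter


def build_alphabet(text, size):
    """Map the most frequent characters to ids; id 0 is reserved for all others."""
    most_common = [char for char, _ in Counter(text).most_common(size - 1)]
    return {char: index + 1 for index, char in enumerate(most_common)}


def windows(text, alphabet, radius):
    """Yield (window of character ids, uppercase label) for every position."""
    ids = [alphabet.get(char, 0) for char in text.lower()]
    padded = [0] * radius + ids + [0] * radius  # id 0 also pads the borders
    for position, char in enumerate(text):
        yield padded[position:position + 2 * radius + 1], int(char.isupper())


def one_hot(index, depth):
    """Plain-Python counterpart of a one-hot encoding of a single id."""
    return [1.0 if i == index else 0.0 for i in range(depth)]


text = "Ahoj Praho"
alphabet = build_alphabet(text.lower(), size=5)
examples = list(windows(text, alphabet, radius=2))
print(examples[0])
```

Each window can then be one-hot encoded and flattened into the input of a densely connected network, with the label saying whether the central character should be uppercased.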

To submit the uppercased test set in ReCodEx, use the supplied uppercase_recodex.py script. You need to provide at least two arguments – the first is the path to the uppercased test data and all other arguments are paths to the sources used to generate the test data. Running the script will create uppercase_recodex_submission.py file, which can be submitted in ReCodEx.

Do not use RNNs or CNNs in this task, only densely connected layers (with various activation and output functions).

mnist_conv

Deadline: Apr 02, 15:39 3 points

In this assignment, you will be training convolutional networks. Start with the mnist_conv.py template and implement the following functionality using the tf.layers module. The architecture of the network is described by the cnn parameter, which contains comma-separated specifications of sequential layers:

C-filters-kernel_size-stride-padding: Add a convolutional layer with ReLU activation and specified number of filters, kernel size, stride and padding. Example: C-10-3-1-same
M-kernel_size-stride: Add max pooling with specified size and stride. Example: M-3-2
F: Flatten inputs.
R-hidden_layer_size: Add a dense layer with ReLU activation and specified size. Example: R-100

For example, when using --cnn=C-10-3-2-same,M-3-2,F,R-100, the development accuracies after first five epochs on my CPU TensorFlow version are 95.14, 97.00, 97.68, 97.66, and 97.98. However, some students also obtained slightly different results on their computers and still passed ReCodEx evaluation.
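
Parsing the layer specification is independent of TensorFlow and can be sketched as below. The function name and the returned dictionary keys are hypothetical; the keys are merely chosen to resemble common layer arguments, and building the actual layers is left out.

```python
def parse_cnn(spec):
    """Turn a --cnn specification into a list of (layer kind, parameters)."""
    layers = []
    for item in spec.split(","):
        kind, *args = item.split("-")
        if kind == "C" and len(args) == 4:
            filters, kernel_size, stride, padding = args
            layers.append(("conv", {"filters": int(filters),
                                    "kernel_size": int(kernel_size),
                                    "strides": int(stride),
                                    "padding": padding}))
        elif kind == "M" and len(args) == 2:
            layers.append(("max_pool", {"pool_size": int(args[0]),
                                        "strides": int(args[1])}))
        elif kind == "F" and not args:
            layers.append(("flatten", {}))
        elif kind == "R" and len(args) == 1:
            layers.append(("dense", {"units": int(args[0])}))
        else:
            raise ValueError("Unknown layer specification: " + item)
    return layers


for layer in parse_cnn("C-10-3-2-same,M-3-2,F,R-100"):
    print(layer)
```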

After implementing this task, you should continue with mnist_batchnorm.

mnist_competition

Deadline: Apr 02, 15:39 5-10 points

The goal of this assignment is to devise the best possible model for the MNIST data set. However, in order for the test set results not to be available, use the data from mnist-gan.zip. It was created using GANs (generative adversarial networks) from the original MNIST data and contains fake test labels (all labels are 255).

This is an open-data task, so you will submit only the test set labels (in addition to a training script, which will be used only to understand the approach you took).

The task is a competition and the points will be awarded depending on your test set accuracy. If your test set accuracy surpasses 99.4%, you will be awarded 5 points; the remaining 5 points will be distributed depending on the relative ordering of your solutions.

The mnist_competition.py template loads data from mnist-gan directory and in the end saves the test labels in the required format (each label on a separate line).

To submit the test set labels in ReCodEx, use the supplied mnist_competition_recodex.py script. You need to provide at least two arguments – the first is the path to the test set labels and all other arguments are paths to the sources used to generate the test data. Running the script will create mnist_competition_recodex_submission.py file, which can be submitted in ReCodEx.

mnist_batchnorm

Deadline: Apr 08, 23:59 3 points

In this assignment, you will extend the mnist_conv assignment to support batch normalization. Start with the mnist_batchnorm.py template and, in addition to all functionality of mnist_conv, implement the following layer:

CB-filters-kernel_size-stride-padding: Add a convolutional layer with BatchNorm and ReLU activation and specified number of filters, kernel size, stride and padding. Example: CB-10-3-1-same

To correctly implement BatchNorm:

The convolutional layer should use neither an activation nor biases.
The output of the convolutional layer is passed to batch normalization layer tf.layers.batch_normalization, which should specify training=True during training and training=False during inference.
The output of the batch normalization layer is passed through tf.nn.relu.
You need to update the moving averages of mean and variance in the batch normalization layer during each training batch. Such update operations can be obtained using tf.get_collection(tf.GraphKeys.UPDATE_OPS) and utilized either directly in session.run, or (preferably) attached to self.train using tf.control_dependencies.
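
Why the moving averages have to be updated during training can be seen in a small stand-alone sketch. The class below normalizes a single feature in plain Python, without the learned scale and shift; the class name and the default momentum and epsilon values are assumptions made for illustration, not the template's code.

```python
class BatchNorm1D:
    """Batch normalization of a single feature, without learned scale and shift."""

    def __init__(self, momentum=0.99, epsilon=1e-3):
        self.momentum = momentum
        self.epsilon = epsilon
        self.moving_mean = 0.0
        self.moving_variance = 1.0

    def __call__(self, batch, training):
        if training:
            mean = sum(batch) / len(batch)
            variance = sum((x - mean) ** 2 for x in batch) / len(batch)
            # The moving averages must be updated on every training batch.
            self.moving_mean = (self.momentum * self.moving_mean
                                + (1 - self.momentum) * mean)
            self.moving_variance = (self.momentum * self.moving_variance
                                    + (1 - self.momentum) * variance)
        else:
            # Inference uses the statistics accumulated during training.
            mean, variance = self.moving_mean, self.moving_variance
        return [(x - mean) / (variance + self.epsilon) ** 0.5 for x in batch]


bn = BatchNorm1D(momentum=0.5)
print(bn([1.0, 3.0], training=True))
print(bn([1.0], training=False))
```

If the update step were skipped, inference would keep normalizing with the initial mean 0 and variance 1, which is the failure mode the last item of the list guards against.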

For example, when using --cnn=CB-10-3-2-same,M-3-2,F,R-100, the development accuracies after first five epochs on my CPU TensorFlow version are 95.92, 97.54, 97.84, 97.76, and 98.18. However, some students also obtained slightly different results on their computers and still passed ReCodEx evaluation.

You can now experiment with various architectures and try obtaining best accuracy on MNIST.

fashion_masks

Deadline: Apr 08, 23:59 6-12 points

This assignment is a simple image segmentation task. The data for this task is available from fashion-masks.zip. The inputs consist of 28×28 greyscale images of ten classes of clothing, while the outputs consist of the correct class and a pixel bit mask. Your goal is to generate such outputs for the test set (in addition to a training script, which will be used only to understand the approach you took).

Performance is evaluated using mean IoU, where IoU for a single example is defined as an intersection of the gold and system mask divided by their union (assuming the predicted label is correct; if not, IoU is 0). The evaluation (using for example development data) can be performed by fashion_masks_eval.py script.
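
The metric can be written down in a few lines. The sketch below is not fashion_masks_eval.py: the function names and the flat 0/1 list representation of masks are assumptions, as is the choice to score two empty masks as a perfect match.

```python
def example_iou(gold_label, gold_mask, predicted_label, predicted_mask):
    """IoU of two flat 0/1 masks; 0 when the predicted label is wrong."""
    if predicted_label != gold_label:
        return 0.0
    intersection = sum(1 for g, p in zip(gold_mask, predicted_mask) if g and p)
    union = sum(1 for g, p in zip(gold_mask, predicted_mask) if g or p)
    # Assumption: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


def mean_iou(gold, predicted):
    """Average IoU over paired (label, mask) examples."""
    scores = [example_iou(gl, gm, pl, pm)
              for (gl, gm), (pl, pm) in zip(gold, predicted)]
    return sum(scores) / len(scores)


gold = [(3, [1, 1, 0, 0]), (5, [1, 1, 1, 1])]
predicted = [(3, [1, 0, 1, 0]), (4, [1, 1, 1, 1])]
print(mean_iou(gold, predicted))  # (1/3 + 0) / 2
```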

The task is a competition and the points will be awarded depending on your test set score. If your test set score surpasses 75%, you will be awarded 6 points; the remaining 6 points will be distributed depending on the relative ordering of your solutions. Note that quite a straightforward model surpasses 80% on the development set after an hour of computation (and 90% after several hours), so reaching 75% is not that difficult.

You should start with the fashion_masks.py template, which loads the data, computes average IoU and at the end produces test set annotations in the required format (one example per line containing space-separated label and mask, the mask stored as zeros and ones, rows first).

To submit the test set annotations in ReCodEx, use the supplied fashion_masks_recodex.py script. You need to provide at least two arguments – the first is the path to the test set annotations and all other arguments are paths to the sources used to generate the test data.

3d_recognition

Deadline: Apr 15, 23:59 7-13 points

Your goal in this assignment is to perform 3D object recognition. The input is a voxelized representation of an object, stored as a 3D grid of either empty or occupied voxels, and your goal is to classify the object into one of 10 classes. The data is available in two resolutions, either as 20×20×20 data (visualization of objects of all classes) or 32×32×32 data (visualization of objects of all classes). As usual, this is an open data task; therefore, your goal is to generate labels for the unannotated test set. Note that the original dataset contains only train and test portions – you need to use part of the train portion as a development set.

The task is a competition and the points will be awarded depending on your test set accuracy. If your test set score surpasses 75%, you will be awarded 7 points; the remaining 6 points will be distributed depending on the relative ordering of your solutions. Note that even straightforward models can reach more than 90% on the test set; the current state-of-the-art is more than 98%.

You should start with the 3d_recognition.py template, which loads the data, splits a development set from the training data, and at the end produces test set annotations in the required format.

To submit the test set annotations in ReCodEx, use the supplied 3d_recognition_recodex.py script. You need to provide at least two arguments – the first is the path to the test set annotations and all other arguments are paths to the sources used to generate the test data.

nsketch_transfer

Deadline: Apr 22, 23:59 6-12 points

This assignment demonstrates the usefulness of transfer learning. The goal is to train a classifier for hand-drawn sketches. The dataset of 224×224 grayscale sketches categorized in 250 classes is available from nsketch.zip. Again, this is an open data task, and your goal is to generate labels for the unannotated test set.

The task is a competition and the points will be awarded depending on your test set accuracy. If your test set accuracy surpasses 40%, you will be awarded 6 points; the remaining 6 points will be distributed depending on the relative ordering of your solutions.

To solve the task with transfer learning, start with a pre-trained ImageNet network (NASNet A Mobile is used in the template, but feel free to use any) and convert images to features. Then (probably in a separate script) train a classifier processing the precomputed features into required classes. This approach leads to at least 52% accuracy on development set. To improve the accuracy, you can then finetune the original network – compose the pre-trained ImageNet network together with the trained classifier and continue training the whole composition. Such finetuning should lead to at least 70% accuracy on development set (using ResNet).

You should start with the nsketch_transfer.py template, which loads the data, creates the NASNet network and loads its weights, evaluates and predicts using batches, and at the end produces test set annotations in the required format. However, feel free to use multiple scripts for solving this assignment. The above template requires NASNet sources and pretrained weights, which you can download among others here. An independent example of using NASNet for classification is also available as nasnet_classify.py.

To submit the test set annotations in ReCodEx, use the supplied nsketch_transfer_recodex.py script. You need to provide at least two arguments – the first is the path to the test set annotations and all other arguments are paths to the sources used to generate the test data.

sequence_classification

Deadline: Apr 22, 23:59 3 points

This exercise demonstrates tf.nn.dynamic_rnn, shows convergence speed, and illustrates the exploding gradient issue and how to fix it with gradient clipping. The network should process sequences of 50 small integers and compute parity for each prefix of the sequence. The inputs are either 0/1, or vectors with a one-hot representation of a small integer.
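
The target the network has to learn can be generated in a few lines. The sketch below is an assumption about how such data could look, not the template's generator: the function names are hypothetical, parity is taken as the running sum modulo 2, and integers are drawn from 0..sequence_dim-1 when one-hot inputs are used.

```python
import random


def prefix_parity(sequence):
    """Return the parity (sum modulo 2) of every prefix of the sequence."""
    parities, running = [], 0
    for value in sequence:
        running = (running + value) % 2
        parities.append(running)
    return parities


def random_example(length=50, sequence_dim=1, rng=random):
    """Sample integers and their prefix parities; one-hot encode if sequence_dim > 1."""
    high = 1 if sequence_dim == 1 else sequence_dim - 1
    values = [rng.randint(0, high) for _ in range(length)]
    if sequence_dim == 1:
        inputs = [[float(v)] for v in values]
    else:
        inputs = [[1.0 if i == v else 0.0 for i in range(sequence_dim)]
                  for v in values]
    return inputs, prefix_parity(values)


print(prefix_parity([1, 0, 1, 1]))  # [1, 1, 0, 1]
```

Because each target depends on the whole prefix, the network has to carry one bit of state across all 50 steps, which is what makes the task a useful probe of RNN training behaviour.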

Your goal is to modify the sequence_classification.py template and implement the following:

Use specified RNN cell type (RNN, GRU and LSTM) and dimensionality.
Process the sequence using tf.nn.dynamic_rnn.
Use additional hidden layer on the RNN outputs if requested.
Implement gradient clipping if requested.
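
One common form of gradient clipping rescales all gradients jointly when their global L2 norm exceeds a threshold. The pure-Python sketch below illustrates that variant; whether the template expects exactly this variant is an assumption, and the function name and nested-list representation are chosen only for illustration.

```python
import math


def clip_by_global_norm(gradients, clip_norm):
    """Rescale all gradients when their joint L2 norm exceeds clip_norm."""
    global_norm = math.sqrt(sum(g * g for gradient in gradients for g in gradient))
    # The scale is 1 when the norm is within the limit, below 1 otherwise.
    scale = clip_norm / max(global_norm, clip_norm)
    return [[g * scale for g in gradient] for gradient in gradients], global_norm


clipped, norm = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)
print(clipped, norm)
```

Rescaling all gradients by the same factor preserves the update direction, and logging the returned norm is a convenient way to observe exploding gradients in TensorBoard.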

In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard. Concentrate on how the RNNs converge, the convergence speed, exploding gradient issues and how gradient clipping helps:

--rnn_cell=RNN --sequence_dim=1, --rnn_cell=GRU --sequence_dim=1, --rnn_cell=LSTM --sequence_dim=1
the same as above but with --sequence_dim=2
the same as above but with --sequence_dim=10
--rnn_cell=LSTM --hidden_layer=50 --rnn_cell_dim=30 --sequence_dim=30 and the same with --clip_gradient=1
the same as above but with --rnn_cell=RNN
the same as above but with --rnn_cell=GRU --hidden_layer=70

sequence_prediction

Deadline: Apr 22, 23:59 3 points

The motivation of this exercise is to learn low-level handling of RNN cells. The network should learn to predict one specific sequence of monthly totals of international airline passengers from 1949-1960.

Your goal is to modify the sequence_prediction.py template and implement the following:

Use specified RNN cell type (RNN, GRU and LSTM) and dimensionality.
For the training part of the sequence, the network should sequentially predict the elements, using the correct previous element value as inputs.
For the testing part of the sequence, the network should sequentially predict the elements using its own previous prediction.
After each epoch, print the tf.losses.mean_squared_error of the test part prediction using the "{:.2g}" format.
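
The difference between the training and the testing part lies only in what is fed back as the next input. The sketch below shows that control flow with a hypothetical step function standing in for one RNN cell application plus an output layer; all names are assumptions and nothing here is the template's code.

```python
def predict_sequence(step, state, train_part, test_length):
    """Run a recurrent step function over the training and testing parts.

    step(previous_value, state) is a hypothetical stand-in for one RNN cell
    application followed by an output layer; it returns (prediction, new_state).
    """
    train_predictions, test_predictions = [], []
    previous = train_part[0]
    for gold in train_part[1:]:
        prediction, state = step(previous, state)
        train_predictions.append(prediction)
        previous = gold  # training part: feed the correct previous element
    for _ in range(test_length):
        prediction, state = step(previous, state)
        test_predictions.append(prediction)
        previous = prediction  # testing part: feed the model's own prediction
    return train_predictions, test_predictions


def mean_squared_error(gold, predictions):
    return sum((g - p) ** 2 for g, p in zip(gold, predictions)) / len(gold)


# Toy "model" that always predicts the previous value plus one.
train_out, test_out = predict_sequence(lambda prev, s: (prev + 1, s), None, [1, 2, 3], 3)
print("{:.2g}".format(mean_squared_error([4, 5, 7], test_out)))  # 0.33
```

The "{:.2g}" format prints two significant digits, so an error of 1/3 is shown as 0.33.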

In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard. Note that the network is not regularized and uses only one sequence, so it is quite brittle.

try RNN, GRU and LSTM cells
try dimensions of 5, 10 and 50

tagger_we

Deadline: Apr 29, 23:59 2 points

In this assignment you will create a simple part-of-speech tagger. For training and evaluation, use czech-cac.zip data containing Czech tokenized sentences, each word annotated by gold lemma and part-of-speech tag. The dataset can be loaded using the morpho_dataset.py module.

Your goal is to modify the tagger_we.py template and implement the following:

Use specified RNN cell type (GRU and LSTM) and dimensionality.
Create word embeddings for training vocabulary.
Process the sentences using bidirectional RNN.
Predict part-of-speech tags.
You need to properly handle sentences of different lengths in one batch.
Note how resettable metrics are handled by the template.
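
Handling sentences of different lengths usually comes down to padding plus remembering the true lengths, so that padded positions influence neither the RNN nor the metrics. The sketch below shows the idea in plain Python; the function names and the pad id 0 are assumptions, and in TensorFlow the lengths would typically be passed as the sequence_length argument of the RNN and turned into loss weights with tf.sequence_mask.

```python
def pad_batch(sentences, pad_id=0):
    """Pad lists of word ids to a common length and return the true lengths."""
    lengths = [len(sentence) for sentence in sentences]
    max_length = max(lengths)
    padded = [sentence + [pad_id] * (max_length - len(sentence))
              for sentence in sentences]
    return padded, lengths


def masked_accuracy(gold, predicted, lengths):
    """Tag accuracy that ignores padded positions."""
    correct = total = 0
    for gold_tags, predicted_tags, length in zip(gold, predicted, lengths):
        correct += sum(g == p for g, p in zip(gold_tags[:length],
                                               predicted_tags[:length]))
        total += length
    return correct / total


padded, lengths = pad_batch([[1, 2, 3], [4]])
print(padded, lengths)
print(masked_accuracy([[1, 2, 3], [4, 0, 0]], [[1, 2, 9], [4, 7, 7]], lengths))
```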

After submitting the task to ReCodEx, continue with tagger_cle and/or tagger_cnne assignment.

You should also experiment with the effect of the RNN cell type and cell dimensionality on the results.

tagger_cle

Deadline: Apr 29, 23:59 2 points

This task is a continuation of tagger_we assignment.

Using the tagger_cle.py template, add the following features in addition to tagger_we ones:

Create character embeddings for training alphabet.
Process unique words with a bidirectional character-level RNN.
Create character-level word embeddings (CLEs) as a sum of the final forward and backward state.
Properly distribute the CLEs of unique words into the batches of sentences.
Generate overall embeddings by concatenating word-level embeddings and CLEs.
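
Computing the character-level embedding once per unique word and then distributing it back into the sentences is essentially an index lookup. The sketch below illustrates this bookkeeping in plain Python; the function names are hypothetical, and in TensorFlow the distribution step would typically be an embedding lookup or gather.

```python
def unique_words(batch):
    """Return the unique words of a batch and, per sentence, indices into them."""
    index_of, words, indices = {}, [], []
    for sentence in batch:
        row = []
        for word in sentence:
            if word not in index_of:
                index_of[word] = len(words)
                words.append(word)
            row.append(index_of[word])
        indices.append(row)
    return words, indices


def distribute(embeddings, indices):
    """Look up one embedding per token, i.e. a gather along the first axis."""
    return [[embeddings[i] for i in row] for row in indices]


batch = [["the", "cat", "saw", "the", "dog"], ["the", "dog"]]
words, indices = unique_words(batch)
# Stand-in CLEs: one vector per unique word (here just the word length).
cles = [[float(len(word))] for word in words]
print(words, indices)
print(distribute(cles, indices))
```

The character-level RNN then runs over four unique words instead of seven tokens, and the saving grows with batch size.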

Once submitted to ReCodEx, you should experiment with the effect of CLEs compared to plain tagger_we, and the influence of their dimensionality. Note that tagger_we has by default word embeddings twice the size of word embeddings in tagger_cle.

tagger_cnne

Deadline: Apr 29, 23:59 2 points

This task is a continuation of tagger_we assignment.

Using the tagger_cnne.py template, add the following features in addition to tagger_we ones:

Create character embeddings for training alphabet.
Process unique words with one-dimensional convolutional filters with kernel sizes from 2 up to some given maximum. To obtain a fixed-size representation, perform channel-wise max-pooling over the whole word.
Generate convolutional embeddings (CNNE) as a concatenation of features corresponding to the ascending kernel sizes.
Properly distribute the CNNEs of unique words into the batches of sentences.
Generate overall embeddings by concatenating word-level embeddings and CNNEs.
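
The convolutional embedding described above can be illustrated without any framework. The sketch below applies each filter to all windows of a word, max-pools the responses over the word, and concatenates the pooled features in order of ascending kernel size. The zero-padding of words shorter than the kernel and all names are assumptions made for this example, not the behaviour of the template.

```python
def conv1d_max_pool(char_embeddings, kernel):
    """Apply one filter over all windows of a word and max-pool over the word.

    char_embeddings: list of per-character vectors.
    kernel: list of per-offset weight vectors; the kernel size is len(kernel).
    """
    width, dim = len(kernel), len(char_embeddings[0])
    # Assumption: words shorter than the kernel are zero-padded on the right.
    padded = char_embeddings + [[0.0] * dim] * max(0, width - len(char_embeddings))
    responses = []
    for start in range(len(padded) - width + 1):
        window = padded[start:start + width]
        responses.append(sum(
            weight * value
            for kernel_row, char_row in zip(kernel, window)
            for weight, value in zip(kernel_row, char_row)))
    return max(responses)


def cnne(char_embeddings, filters_by_size):
    """Concatenate the pooled features of all filters, by ascending kernel size."""
    features = []
    for size in sorted(filters_by_size):
        for kernel in filters_by_size[size]:
            features.append(conv1d_max_pool(char_embeddings, kernel))
    return features


# A three-character word with one-dimensional character embeddings.
word = [[1.0], [2.0], [3.0]]
filters = {
    2: [[[1.0], [1.0]]],           # one filter with kernel size 2
    3: [[[1.0], [0.0], [-1.0]]],   # one filter with kernel size 3
}
embedding = cnne(word, filters)
```

The length of the resulting embedding is the number of filters summed over all kernel sizes, independent of the word length.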

Once submitted to ReCodEx, you should experiment with the effect of CNNEs compared to plain tagger_we, and the influence of the maximum kernel size and number of filters. Note that tagger_we has by default word embeddings twice the size of word embeddings in tagger_cnne.

tagger_sota

Deadline: Apr 29, 23:59 4-15 points

The goal of this task is to improve the state-of-the-art in Czech part-of-speech tagging. The current state-of-the-art is (to the best of my knowledge) from Spoustová et al., 2009 and is 95.67% in supervised and 95.89% in semi-supervised settings.

For training use the czech-pdt.zip dataset, which can be loaded using the morpho_dataset.py module. Note that the dataset contains more than 1500 unique POS tags and that the POS tags have a fixed structure of 15 positions (so it is possible to generate the POS tag characters independently).

Additionally, you can also use outputs of a morphological analyzer czech-pdt-analysis.zip. For each word form in train, dev and test PDT data, an analysis is present either in a file containing results from a manually generated morphological dictionary, or in a file with results from a trained morphological guesser. Both files have the same structure – each line describes one word form which is stored on the beginning of the line, followed by tab-separated lemma-tag pairs from the analyzer.
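
A line of the analyzer output could be parsed along the following lines. This is a sketch under an assumption about the exact layout: the word form comes first and is followed by alternating tab-separated lemma and tag columns. The function name and the tag strings in the example are made up for illustration.

```python
def parse_analysis_line(line):
    """Split one analyzer line into the word form and a list of (lemma, tag) pairs.

    Assumed layout: form<TAB>lemma1<TAB>tag1<TAB>lemma2<TAB>tag2...
    """
    columns = line.rstrip("\n").split("\t")
    form, rest = columns[0], columns[1:]
    if len(rest) % 2 != 0:
        raise ValueError("Expected lemma-tag pairs after the word form")
    return form, list(zip(rest[0::2], rest[1::2]))


# Hypothetical line; TAG1 and TAG2 stand in for real positional tags.
form, analyses = parse_analysis_line("stát\tstát\tTAG1\tstát\tTAG2\n")
```

A word form without any analysis yields an empty list of pairs.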

This task is an open-data competition and the points will be awarded depending on your test set accuracy. If your test set accuracy surpasses 90%, you will be awarded 4 points; the remaining 6 points will be distributed depending on relative ordering of your solutions. Any solution surpassing 95.89% will get an additional 5 points. The evaluation (using for example development data) can be performed by the morpho_eval.py script.

You can start with the tagger_sota.py template, which loads the PDT data, loads the morphological analysers data, and finally generates the predictions in the required format (which is exactly the same as the input format).

To submit the test set annotations in ReCodEx, use the supplied tagger_sota_recodex.py script. You need to provide at least two arguments – the first is the path to the test set annotations and all other arguments are paths to the sources used to generate the test data.

lemmatizer_noattn

Deadline: May 06, 23:59 4 points

In this assignment you will create a simple lemmatizer. For training and evaluation, use czech-cac.zip data containing Czech tokenized sentences, each word annotated by gold lemma and part-of-speech tag. The dataset can be loaded using the morpho_dataset.py module.

Your goal is to modify the lemmatizer_noattn.py template and implement the following:

Embed characters of source forms and run a forward GRU encoder.
Embed characters of target lemmas.
Implement a training time decoder which uses gold target characters as inputs.
Implement an inference time decoder which uses previous predictions as inputs.
The initial state of both decoders is the output state of the corresponding GRU encoded form.

After submitting the task to ReCodEx, continue with lemmatizer_attn assignment.

lemmatizer_attn

Deadline: May 06, 23:59 2 points

This task is a continuation of lemmatizer_noattn assignment.

Using the lemmatizer_attn.py template, add the following features in addition to lemmatizer_noattn ones:

Run the encoder using bidirectional GRU.
Implement attention in both decoders. Notably, project the encoder outputs and current state into same dimensionality vectors, apply non-linearity, and generate weights for every encoder output. Finally sum the encoder outputs using these weights and concatenate the computed attention to the decoder inputs.
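
The attention computation described above can be sketched in plain Python as follows. The sketch follows the additive form: both inputs are projected to the same dimensionality, summed, passed through tanh, and scored by a vector; a softmax over the scores gives the weights. The parameter names w_enc, w_state and v are hypothetical, and the template may organize the computation differently.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Softmax with the maximum subtracted to avoid overflow."""
    top = max(scores)
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(encoder_outputs, state, w_enc, w_state, v):
    """Return attention weights and the weighted sum of the encoder outputs."""
    projected_state = matvec(w_state, state)
    scores = []
    for output in encoder_outputs:
        projected = [a + b for a, b in zip(matvec(w_enc, output), projected_state)]
        hidden = [math.tanh(x) for x in projected]          # non-linearity
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)                               # one weight per encoder output
    dim = len(encoder_outputs[0])
    context = [sum(w * out[d] for w, out in zip(weights, encoder_outputs))
               for d in range(dim)]
    return weights, context


encoder_outputs = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attention(
    encoder_outputs, state=[0.0],
    w_enc=[[1.0, 0.0], [0.0, 0.0]], w_state=[[0.0], [0.0]], v=[1.0, 1.0])
```

The resulting context vector is what gets concatenated to the decoder input in each decoding step.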

Once submitted to ReCodEx, you should experiment with the effect of using the attention, and the influence of RNN dimensionality on network performance.

lemmatizer_sota

Deadline: May 06, 23:59 4-13 points

The goal of this task is to improve the state-of-the-art in Czech lemmatization. The current state-of-the-art is (to the best of my knowledge) the czech-morfflex-pdt-161115 reimplementation of the Spoustová et al., 2009 tagger, which achieves 97.86% lemma accuracy.

As in tagger_sota assignment, for training use the czech-pdt.zip dataset, which can be loaded employing the morpho_dataset.py module. Additionally, you can also use outputs of a morphological analyzer czech-pdt-analysis.zip.

This task is an open-data competition and the points will be awarded depending on your test set accuracy. If your test set accuracy surpasses 90%, you will be awarded 4 points; the remaining 4 points will be distributed depending on relative ordering of your solutions. Any solution surpassing 97.86% will get an additional 5 points. The evaluation (using for example development data) can be performed by the morpho_eval.py script.

You can start with the lemmatizer_sota.py template, which loads the PDT data, loads the morphological analysers data, and finally generates the predictions in the required format (which is exactly the same as the input format).

To submit the test set annotations in ReCodEx, use the supplied lemmatizer_sota_recodex.py script. You need to provide at least two arguments – the first is the path to the test set annotations and all other arguments are paths to the sources used to generate the test data.

vae

Deadline: May 13, 23:59 3 points

In this assignment you will implement a simple Variational Autoencoder for three datasets in the MNIST format.

Your goal is to modify the vae.py template and implement a functional VAE using the embedded TODO notes.

After submitting the task to ReCodEx, you can experiment with the three available datasets (fashion, cifar-cars and mnist-data) and different latent variable dimensionality (z_dim=2 and z_dim=100). The generated images are available in TensorBoard logs.
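
Two ingredients that a VAE typically needs are the reparametrization trick for sampling the latent variable and the KL divergence between the encoder distribution and the standard normal prior. The sketch below shows both for a diagonal Gaussian. It assumes the encoder outputs a mean and a log-variance, which may differ from the parametrization used in the vae.py template; the function names are hypothetical.

```python
import math
import random


def sample_latent(mean, log_variance, rng):
    """Reparametrization trick: z = mean + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_variance)]


def kl_to_standard_normal(mean, log_variance):
    """KL(N(mean, diag(sigma^2)) || N(0, I)) = 0.5 * sum(mu^2 + sigma^2 - log sigma^2 - 1)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mean, log_variance))


rng = random.Random(42)
# A latent variable with z_dim=2.
z = sample_latent([0.0, 1.0], [0.0, -1.0], rng)
kl = kl_to_standard_normal([0.0, 1.0], [0.0, -1.0])
```

Because the randomness enters only through eps, gradients can flow to the mean and the log-variance, which is the point of the reparametrization.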

gan

Deadline: May 13, 23:59 3 points

In this assignment you will implement a simple Generative Adversarial Network for three datasets in the MNIST format.

Your goal is to modify the gan.py template and implement a functional GAN using the embedded TODO notes.

After submitting the task to ReCodEx, you can experiment with the three available datasets (fashion, cifar-cars and mnist-data) and maybe try different latent variable dimensionality. The generated images are available in TensorBoard logs.

You can also continue with dcgan task.

dcgan

Deadline: May 13, 23:59 1 point

This task is a continuation of gan assignment, which you will modify to implement the Deep Convolutional GAN (DCGAN).

Your goal is to modify the dcgan.py template to implement a DCGAN using the embedded TODO notes. Note that most of the TODO notes are from gan assignment.

After submitting the task to ReCodEx, you can experiment with the three available datasets (fashion, cifar-cars and mnist-data). However, note that you will need a lot of computational power (preferably a GPU) to generate the images.

nli

Deadline: May 13, 23:59 6-12 points

In this competition you will be solving the Native Language Identification task. In that task, you get an English essay written by a non-native individual and your goal is to identify their native language.

We will be using NLI Shared Task 2013 data, which contains documents in 11 languages. For each language, the train, development and test sets contain 900, 100 and 100 documents, respectively. Particularly interesting is the fact that humans are very bad at this task, while machine learning models can achieve quite high accuracy. Notably, the 2013 shared task winners achieved 83.6% accuracy, while current state-of-the-art is at least 87.1% (Malmasi and Dras, 2017).

Because the data is not publicly available, you can download it only through ReCodEx. Please do not distribute it. To load the dataset, you can use nli_dataset.py script.

This task is an open-data competition and the points will be awarded depending on your test set accuracy. If your test set accuracy surpasses 50%, you will be awarded 6 points; the remaining 6 points will be distributed depending on relative ordering of your solutions. An evaluation (using for example development data) can be performed by nli_eval.py.

You can start with the nli.py template, which loads the data and generates predictions in the required format (language of each essay on a line).

To submit the test set annotations in ReCodEx, use the supplied nli_recodex.py script. You need to provide at least two arguments – the first is the path to the test set annotations and all other arguments are paths to the sources used to generate the test data.

tagger_crf

Deadline: May 20, 23:59 1 point

This task is an extension of tagger_we assignment.

Using the tagger_crf.py template, in addition to tagger_we features, implement training and decoding with a CRF output layer, using the tf.contrib.crf module.

Once submitted to ReCodEx, you should experiment with the effect of CRF compared to plain tagger_we. Note however that the effect of CRF on tagging is minor – a more appropriate task is for example named entity recognition, which you can experiment with using the Czech Named Entity Corpus czech-cnec.zip.

phoneme_recognition

Deadline: May 20, 23:59 6-10 points

This assignment is a competition task in the speech recognition area. Specifically, your goal is to predict a sequence of phonemes given a spoken utterance. We will be using the TIMIT corpus, with input sound waves passed through the usual preprocessing – computing 13 Mel-frequency cepstral coefficients (MFCCs) every 10 milliseconds and appending their derivatives, obtaining 26 floats for every 10 milliseconds. You can repeat exactly this preprocessing on a given wav file using the timit_mfcc26_preprocess.py script.

Because the data is not publicly available, you can download it only through ReCodEx. Please do not distribute it. To load the dataset, you can use timit_mfcc26_dataset.py module.

This task is an open-data competition and the points will be awarded depending on your test set performance. The generated phoneme sequences are evaluated using edit distance to the gold phoneme sequence, normalized by the length of the phoneme sequence (i.e., exactly as tf.edit_distance). If your test set score surpasses 50%, you will be awarded 6 points; the remaining points will be distributed depending on relative ordering of your solutions. An evaluation (using for example development data) can be performed by timit_mfcc26_eval.py.
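
For intuition, the evaluation metric can be reproduced in a few lines of plain Python. The sketch below computes the Levenshtein distance between a predicted and a gold sequence and divides it by the length of the gold sequence, which is how a normalized edit distance is commonly defined. It is an illustration only and not the code of timit_mfcc26_eval.py; the function names are hypothetical.

```python
def edit_distance(predicted, gold):
    """Levenshtein distance (insertions, deletions and substitutions cost 1)."""
    previous = list(range(len(gold) + 1))
    for i, predicted_item in enumerate(predicted, start=1):
        current = [i]
        for j, gold_item in enumerate(gold, start=1):
            current.append(min(
                previous[j] + 1,                                # deletion
                current[j - 1] + 1,                             # insertion
                previous[j - 1] + (predicted_item != gold_item) # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_edit_distance(predicted, gold):
    """Edit distance divided by the length of the gold sequence."""
    return edit_distance(predicted, gold) / len(gold)


# One superfluous phoneme in a gold sequence of length 2.
score = normalized_edit_distance(["a", "b", "c"], ["a", "c"])
```

Lower values are better, and a value of 0 means the predicted sequence equals the gold one.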

You can start with the phoneme_recognition.py template. You will need to implement the following:

The CTC loss and CTC decoder employ sparse tensors – therefore, start by studying them.
Convert the input phoneme sequences into sparse representation (tf.where and tf.gather_nd are useful).
Use a bidirectional RNN and an output linear layer without activation.
Utilize CTC loss (tf.nn.ctc_loss).
Perform decoding by a CTC decoder (either greedily using tf.nn.ctc_greedy_decoder, or with beam search employing tf.nn.ctc_beam_search_decoder).
Evaluate results using normalized edit distance (tf.edit_distance).
Write the generated phoneme sequences.
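
Two of the steps above can be illustrated without TensorFlow: converting padded label sequences to a sparse representation (indices, values and shape), and greedy CTC decoding, which takes the most probable label in every frame, merges repeated labels and removes blanks. The sketch assumes the blank label is passed explicitly; all function names are hypothetical.

```python
def dense_to_sparse(padded, lengths):
    """Convert padded label sequences to (indices, values, dense_shape)."""
    indices, values = [], []
    for batch_index, length in enumerate(lengths):
        for time_index in range(length):
            indices.append((batch_index, time_index))
            values.append(padded[batch_index][time_index])
    return indices, values, (len(padded), len(padded[0]))


def ctc_greedy_decode(logits, blank):
    """Greedy CTC decoding of one utterance given per-frame label scores."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    decoded, previous = [], None
    for label in best:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


# Seven frames over the labels {0, 1} plus the blank label 2.
frames = [
    [0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8],
]
phonemes = ctc_greedy_decode(frames, blank=2)
```

Note that the blank between the two runs of label 0 is what allows the same label to be emitted twice in a row.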

To submit the test set annotations in ReCodEx, use the supplied phoneme_recognition_recodex.py script. You need to provide at least two arguments – the first is the path to the test set annotations and all other arguments are paths to the sources used to generate the test data.

monte_carlo

Deadline: May 20, 23:59 3 points

Solve the CartPole-v1 environment from the OpenAI Gym using the Monte Carlo reinforcement learning algorithm. Note that this task does not require TensorFlow.

Use the supplied cart_pole_evaluator.py module (depending on gym_evaluator.py) to interact with the discretized environment. The environment has the following methods and properties:

states: number of states of the environment
actions: number of actions of the environment
reset(start_evaluate=False) → new_state: starts a new episode
step(action) → new_state, reward, done, info: perform the chosen action in the environment, returning the new state, obtained reward, a boolean flag indicating an end of episode, and additional environment-specific information
render(): render current environment state

Once you finish training (which you indicate by passing start_evaluate=True to reset), your goal is to reach an average reward of 475 during 100 evaluation episodes. Note that the environment prints your 100-episode average reward every 10 episodes even during training.
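
The overall training loop can be sketched on a toy problem. The ChainEnv class below is a hypothetical stand-in that only mimics the interface described above (states, actions, reset, step); it is not the supplied evaluator. The algorithm is an every-visit Monte Carlo control with an ε-greedy policy and sample-average updates, which is one possible variant and a simplification chosen for this example.

```python
import random


class ChainEnv:
    """Toy environment: move left or right on a chain; state 2 is terminal."""
    states = 3
    actions = 2  # 0 = left, 1 = right

    def reset(self, start_evaluate=False):
        self._state = 0
        return self._state

    def step(self, action):
        if action == 1:
            self._state = min(self._state + 1, 2)
        else:
            self._state = max(self._state - 1, 0)
        # Every step costs -1, so shorter episodes have higher return.
        return self._state, -1.0, self._state == 2, {}


def monte_carlo_control(env, episodes, epsilon, gamma, rng):
    """Every-visit Monte Carlo control with an epsilon-greedy policy."""
    q = [[0.0] * env.actions for _ in range(env.states)]
    counts = [[0] * env.actions for _ in range(env.states)]
    for _ in range(episodes):
        state, done, trajectory = env.reset(), False, []
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(env.actions)
            else:
                action = max(range(env.actions), key=q[state].__getitem__)
            next_state, reward, done, _ = env.step(action)
            trajectory.append((state, action, reward))
            state = next_state
        # Walk the episode backwards, accumulating the return G.
        g = 0.0
        for state, action, reward in reversed(trajectory):
            g = reward + gamma * g
            counts[state][action] += 1
            q[state][action] += (g - q[state][action]) / counts[state][action]
    return q


q = monte_carlo_control(ChainEnv(), episodes=200, epsilon=0.3, gamma=1.0,
                        rng=random.Random(0))
```

In this toy problem, moving right from state 1 always ends the episode with return -1, so the corresponding action value is estimated exactly.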

You can start with the monte_carlo.py template, which parses several useful parameters, creates the environment and illustrates the overall usage.

During evaluation in ReCodEx, three different random seeds will be employed, and you will get a point for each setting where you reach the required reward. The time limit for each test is 5 minutes.

q_learning

Deadline: May 27, 23:59 3 points

Solve the MountainCar-v0 environment from the OpenAI Gym using the Q-learning reinforcement learning algorithm. Note that this task does not require TensorFlow.

Use the supplied mountain_car_evaluator.py module (depending on gym_evaluator.py) to interact with the discretized environment. The environment methods and properties are described in the monte_carlo assignment. Your goal is to reach an average reward of -140 during 100 evaluation episodes.

You can start with the q_learning.py template, which parses several useful parameters, creates the environment and illustrates the overall usage. Note that setting hyperparameters of Q-learning is a bit tricky – I usually start with a larger value of ε (like 0.2 or even 0.5) and then gradually decrease it to almost zero.
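
The update rule and a decaying ε schedule can be sketched on a toy problem. The ChainEnv class is a hypothetical stand-in mimicking the environment interface, and the linear decay from 0.5 to 0.01 as well as the values of alpha and gamma are assumptions for illustration; MountainCar will likely need different settings.

```python
import random


class ChainEnv:
    """Toy environment: move left or right on a chain; state 2 is terminal."""
    states = 3
    actions = 2  # 0 = left, 1 = right

    def reset(self, start_evaluate=False):
        self._state = 0
        return self._state

    def step(self, action):
        if action == 1:
            self._state = min(self._state + 1, 2)
        else:
            self._state = max(self._state - 1, 0)
        return self._state, -1.0, self._state == 2, {}


def epsilon_at(episode, episodes, start=0.5, end=0.01):
    """Linearly decay epsilon from start to end over the training episodes."""
    fraction = episode / max(episodes - 1, 1)
    return start + (end - start) * fraction


def q_learning(env, episodes, alpha, gamma, rng):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    q = [[0.0] * env.actions for _ in range(env.states)]
    for episode in range(episodes):
        epsilon = epsilon_at(episode, episodes)
        state, done = env.reset(), False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(env.actions)
            else:
                action = max(range(env.actions), key=q[state].__getitem__)
            next_state, reward, done, _ = env.step(action)
            # Bootstrap from the best next action unless the episode ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning(ChainEnv(), episodes=300, alpha=0.5, gamma=1.0, rng=random.Random(0))
```

Unlike the Monte Carlo variant, the table is updated after every step, so the estimates improve already during an episode.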

During evaluation in ReCodEx, two different random seeds will be employed, and you will get a point for each setting where you reach the required reward. The time limit for each test is 5 minutes.

q_network

Deadline: May 27, 23:59 2 points

Solve the MountainCar-v0 environment from the OpenAI Gym using a Q-network (a neural network variant of the Q-learning algorithm).

Note that training DQN (Deep Q-Networks) is inherently tricky and unstable. Therefore, we will implement a direct analogue of tabular Q-learning, allowing the network to employ independent weights for every discretized environment state.

You can start with the q_network.py template. Note that setting hyperparameters of Q-network is even more tricky than for Q-learning – if you try to vary the architecture, it might not learn at all.

During evaluation in ReCodEx, two different random seeds will be employed, and you will get a point for each setting where you reach the required reward. The time limit for each test is 10 minutes.

reinforce

Deadline: May 27, 23:59 2 points

Solve the CartPole-v1 environment from the OpenAI Gym using the REINFORCE algorithm.

Use the supplied cart_pole_evaluator.py module (depending on gym_evaluator.py) to interact with the continuous environment. The environment has the same properties and methods as the discrete environment described in the monte_carlo task, with the following additions:

the continuous environment has to be created with discrete=False option
state_shape: the shape describing the floating point state tensor
states: as the number of states is infinite, raises an exception

During evaluation in ReCodEx, two different random seeds will be employed, and you will get a point for each setting where you reach the required reward. The time limit for each test is 5 minutes.

After solving this task, you should continue with reinforce_with_baseline.

reinforce_with_baseline

Deadline: May 27, 23:59 2 points

This is a continuation of reinforce assignment.

Using the reinforce_with_baseline.py template, modify the REINFORCE algorithm to use a baseline.

Using a baseline lowers the variance of the value function gradient estimator, which allows faster training and decreases sensitivity to hyperparameter values. To reflect this effect in ReCodEx, note that the evaluation phase will automatically start after 200 episodes. Using only 200 episodes for training in this setting is probably too little for the REINFORCE algorithm, but suffices for the variant with a baseline.
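
The quantities involved can be sketched in plain Python: the return from every time step, and the return with the baseline subtracted, which is what weights the log-probability of the chosen action in the policy loss. The sketch assumes the baseline is a per-state value estimate produced by the network; the function names are hypothetical.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every time step t."""
    returns, g = [], 0.0
    for reward in reversed(rewards):
        g = reward + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def advantages(returns, baselines):
    """Subtract the baseline prediction from the return at every time step."""
    return [g - b for g, b in zip(returns, baselines)]


# A three-step episode with reward 1 per step, as in CartPole.
returns = discounted_returns([1.0, 1.0, 1.0], gamma=1.0)
# Hypothetical baseline predictions for the three visited states.
weights = advantages(returns, [2.0, 2.0, 2.0])
```

The baseline itself is usually trained by regressing it towards the observed returns, for example with a mean squared error loss.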

During evaluation in ReCodEx, two different random seeds will be employed, and you will get a point for each setting where you reach the required reward. The time limit for each test is 5 minutes.

reinforce_with_pixels

Deadline: May 27, 23:59 6 points

This is an experimental task which might require a lot of time to solve.

The goal of this assignment is to extend the reinforce_with_baseline assignment to make it work on pixel inputs.

The supplied cart_pole_pixels_evaluator.py module (depending on gym_evaluator.py) generates a pixel representation of the CartPole environment as an 80×80 image with three channels, with each channel representing one time step (i.e., the current situation and the two previous ones).

Start with the reinforce_with_pixels.py template, which contains a rich collection of summaries that you can use to explore the behaviour of the model being trained.

Note that this assignment is not trivial – it takes some time and resources to make the training progress at all. To show any progress, your goal is to reach an average reward of 50 during 100 evaluation episodes. As before, the evaluation period begins only after you call reset with start_evaluate.

During evaluation in ReCodEx, two different random seeds will be employed, and you will get a point for each setting where you reach the required reward. The time limit for each test is 15 minutes.

Because the time limit is 15 minutes per test, you probably cannot train the model directly in ReCodEx. Instead, you need to save the trained model and embed it in your Python solution (see the gym_cartpole assignment for an example of saving the model and then embedding it in a Python source).

hyperparams_gp

Deadline: Jun 03, 23:59 2 points

The goal of this assignment is to try performing automatic hyperparameter search. Your goal is to optimize conv_net.py model with several hyperparameters, so that it achieves highest validation accuracy on Fashion MNIST dataset after two epochs of training. The hyperparameters and their possible values and distributions are described in the ConvNet.hyperparameters method.

Implement the search using the skopt package (can be installed using pip3 install [--user] scikit-optimize), and print the best accuracy after 15 trials. Implement the following two strategies:

random_search: use random search in the hyperparameter space
gp_ei: use the Gaussian process approach (skopt.gp_minimize) with the expected improvement (EI) acquisition function
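
The random_search strategy can be sketched without any library. In the sketch below, the search space maps hyperparameter names to sampling functions and the objective returns a validation accuracy; the hyperparameter names, ranges and the toy objective are hypothetical and do not come from ConvNet.hyperparameters.

```python
import math
import random


def random_search(objective, space, trials, rng):
    """Sample hyperparameters independently and keep the best-scoring trial."""
    best_score, best_params = float("-inf"), None
    for _ in range(trials):
        params = {name: sample(rng) for name, sample in space.items()}
        score = objective(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


# Hypothetical search space: a log-uniform learning rate and a dropout rate.
space = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-4.0, -2.0),
    "dropout": lambda rng: rng.uniform(0.0, 0.8),
}


def toy_objective(params):
    """Stand-in for training the model and returning validation accuracy."""
    return (1.0
            - abs(math.log10(params["learning_rate"]) + 3.0) / 10.0
            - abs(params["dropout"] - 0.5) / 10.0)


best_accuracy, best_params = random_search(toy_objective, space, trials=15,
                                            rng=random.Random(1))
```

When switching to skopt.gp_minimize, keep in mind that it minimizes its objective, so an accuracy would typically be negated before being returned.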

This task is evaluated manually. After you submit your solution to ReCodEx (which will not pass automatic evaluation), write me an email and I will perform the evaluation.

hyperparams_rl

Deadline: Jun 03, 23:59 3 points

The goal of this assignment is to try performing automatic hyperparameter search. Your goal is to optimize conv_net.py model with several hyperparameters, so that it achieves highest validation accuracy on Fashion MNIST dataset after two epochs of training. The hyperparameters and their possible values and distributions are described in the ConvNet.hyperparameters method.

Implement the search using reinforcement learning. Notably, generate the hyperparameters using a forward LSTM with dimensionality 16, generating individual hyperparameters on each time step.

This task is evaluated manually. After you submit your solution to ReCodEx (which will not pass automatic evaluation), write me an email and I will perform the evaluation.

eager_mnist

Deadline: Jun 03, 23:59 3 points

In this assignment, you will implement a simple MNIST CNN classification model using TensorFlow Eager.

Your goal is to start with the eager_mnist.py template and implement training and evaluation using Eager mode according to the template instructions.

estimator_mnist

Deadline: Jun 03, 23:59 3 points

In this assignment, you will implement a simple MNIST convolutional classification model using tf.estimator API.

Your goal is to start with the estimator_mnist.py template and implement training and evaluation using Estimator according to the template instructions.

The exam is primarily written and consists of 5 questions, each worth 20 points. The required number of points (including the maximum of 40 surplus points from the practicals) to obtain grades 1, 2, 3 are 85, 70 and 55, respectively. An example exam is available.

Generally, only the topics covered in the lectures are part of the exam (i.e., you should be able to tell me what I told you). The references are to the Deep Learning Book, unless stated otherwise.

Computation model of neural networks
- acyclic graph with nodes and edges
- evaluation (forward propagation) [Algorithm 6.1]
- activation functions [tanh and ReLUs, including equations]
- output functions [σ and softmax, including equations (3.30 and 6.29); you should also know how softmax is implemented to avoid overflows]
Backpropagation algorithm [Algorithm 6.2; Algorithms 6.5 and 6.6 are used in practice, i.e., during tf.train.Optimizer.compute_gradients, so you should understand the idea behind them, but you do not have to understand the notation of op.bprop etc. from Algorithms 6.5 and 6.6]
Gradient descent and stochastic gradient descent algorithm [Section 5.9]
Maximum likelihood estimation (MLE) principle [Section 5.5, excluding 5.5.2]
- negative log likelihood as a loss derived by MLE
- mean square error loss derived by MLE from Gaussian prior [Equations (5.64)-(5.66)]
In addition to having theoretical knowledge of the above, you should be able to apply all of it to practical examples – i.e., if you get a network with one hidden layer, a loss and a learning rate, you should perform the forward propagation, compute the loss, perform backpropagation and update weights using SGD. In order to do so, you should be able to differentiate softmax with NLL, sigmoid with NLL and linear output with MSE.
Stochastic gradient descent algorithm improvements (you should be able to write the algorithms down and understand motivations behind them):
- learning-rate decay
- SGD with momentum [Section 8.3.2 and Algorithm 8.2]
- SGD with Nesterov Momentum (and how it is different from normal momentum) [Section 8.3.3 and Algorithm 8.3]
- AdaGrad (you should be able to explain why, in case of stationary gradient distribution, AdaGrad effectively decays learning rate) [Section 8.5.1 and Algorithm 8.4]
- RMSProp (and why it is a generalization of AdaGrad) [Section 8.5.2 and Algorithm 8.5]
- Adam (and why the bias-correction terms (1-β^t) are there) [Section 8.5.3 and Algorithm 8.7]
Regularization methods:
- Early stopping [Section 7.8, without the How early stopping acts as a regularizer part]
- L2 regularization [First paragraph of 7.1.1 and Equation (7.5)]
- L1 regularization [Section 7.1.2 up to Equation (7.20)]
- Dropout [just the description of the algorithm]
- Batch normalization [Section 8.7.1]
Gradient clipping [Section 10.11.1]
Convolutional networks:
- Basic convolution and cross-correlation operation on 4D tensors [Equations (9.5) and (9.6)]
- Differences compared to a fully connected layer [Section 9.2 and Figure 9.6]
- Multiple channels in a convolution [Equation (9.7)]
- Stride and padding schemes [Section 9.5 up to page 349, notably Equation (9.8)]
- Max pooling and average pooling [Section 9.3]
- AlexNet [general architecture without knowing specific constants, i.e., convolutional layers combined with pooling layers and two fully connected layers at the end; Alex Krizhevsky et al.: ImageNet Classification with Deep Convolutional Neural Networks]
- ResNet [only the important ideas (mainly residual connections) and overall architecture of ResNet 152; Kaiming He et al.: Deep Residual Learning for Image Recognition]
- Object detection using Fast R-CNN [overall architecture, RoI-pooling layer, parametrization of generated bounding boxes, used loss function; Ross Girshick: Fast R-CNN]
- Proposing RoIs using Faster R-CNN [overall architecture, the differences and similarities of Fast R-CNN and the proposal network from Faster R-CNN; Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks]
- Image segmentation with Mask R-CNN [overall architecture, RoI-align layer; Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: Mask R-CNN]
Recurrent networks:
- Using RNNs to represent sequences [Figure 10.2 with h as output; Chapter 10 and Section 10.1]
- Using RNNs to classify every sequence element [Figure 10.3; details in Section 10.2 excluding Sections 10.2.1-10.2.4]
- Bidirectional RNNs [Section 10.3]
- Encoder-decoder sequence-to-sequence RNNs [Section 10.4; note that you should know how the network is trained and also how it is later used to predict sequences]
- Stacked (or multi-layer) LSTM [Figure 10.13a of Section 10.10.5; more details (not required for the exam) can be found in Alex Graves: Generating Sequences With Recurrent Neural Networks]
- The problem of vanishing and exploding gradient [Section 10.7]
- Long Short-Term Memory (LSTM) [Section 10.10.1]
- Gated Recurrent Unit (GRU) [Section 10.10.2]
Word representations [in all cases, you should be able to describe the algorithm for computing the embedding, and how the backpropagation works (there is usually nothing special, but if I ask what happens if a word occurs multiple times in a sentence, you should be able to answer)]
- The word2vec word embeddings
  - CBOW and Skip-gram models [Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean: Efficient Estimation of Word Representations in Vector Space]
  - Hierarchical softmax [Section 12.4.3.2, or Section 2.1 of the following paper]
  - Negative sampling [Section 2.2 of Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean: Distributed Representations of Words and Phrases and their Compositionality]; note that negative sampling is a simplification of Importance sampling described in Section 12.4.3.3, with w_i=1; the proposal distribution in word2vec being unigram distribution to the power of 3/4
- Character-level embeddings using RNNs [C2W model from Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W. Black, Isabel Trancoso: Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation]
- Character-level embeddings using CNNs [CharCNN from Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush: Character-Aware Neural Language Models]
Highway Networks [Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber: Training Very Deep Networks]
Machine Translation
- Translation using encoder-decoder (also called sequence-to-sequence) architecture [Section 10.4 and Section 12.4.5]
- Attention mechanism in NMT [Section 12.4.5.1, but you should also know the equations for the attention, notably Equations (4), (5), (6) and (A.1.2) of Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: Neural Machine Translation by Jointly Learning to Align and Translate]
- Subword units [The BPE algorithm from Section 3.2 of Rico Sennrich, Barry Haddow, Alexandra Birch: Neural Machine Translation of Rare Words with Subword Units]
Deep generative models using differentiable generator nets [Section 20.10.2]:
- Variational autoencoders [Section 20.10.3 up to page 698 (excluding), together with Reparametrization trick from Section 20.9 (excluding Section 20.9.1)]
  - Regular autoencoders [undercomplete AE – Section 14.1, sparse AE – first two paragraphs of Section 14.2.1, denoising AE – Section 14.2.2]
- Generative Adversarial Networks [Section 20.10.4 up to page 702 (excluding)]
Structured Prediction
- Conditional Random Fields (CRF) loss [Sections 3.4.2 and A.7 of R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa: Natural Language Processing (Almost) from Scratch]
- Connectionist Temporal Classification (CTC) loss [A. Graves, S. Fernández, F. Gomez, J. Schmidhuber: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks]
Reinforcement learning [note that proofs are not required for reinforcement learning; all references are to the Mar 2018 draft of the second edition of Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto]
- Multi-arm bandits [Chapter 2, Sections 2.1-2.5]
- General setting of reinforcement learning [agent-environment, action-state-reward, return; Chapter 3, Sections 3.1-3.3]
- Monte Carlo reinforcement learning algorithm [Sections 5.1-5.4, especially the algorithm in Section 5.4]
- Temporal Difference RL Methods [Section 6.1]
- SARSA algorithm [Section 6.4]
- Q-Learning [Section 6.5; you should also understand Eq. (6.1) and (6.2)]
- Policy gradient methods [representing policy by the network, using softmax, Section 13.1]
  - Policy gradient theorem [Section 13.2]
  - REINFORCE algorithm [Section 13.3; note that the γ^t on the last line should not be there]
  - REINFORCE with baseline algorithm [Section 13.4; note that the γ^t on the last two lines should not be there]
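
Two of the computational topics listed above, the overflow-safe implementation of softmax and the derivative of softmax with NLL, can be checked numerically. The following is a small self-study sketch in plain Python, not exam material: it subtracts the maximum logit before exponentiation and uses the well-known result that the gradient of the NLL with respect to the logits is the softmax output minus the one-hot gold distribution. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Softmax with the maximum subtracted, so that exp never overflows."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logits, gold):
    """Negative log likelihood of the gold class under softmax(logits)."""
    return -math.log(softmax(logits)[gold])


def nll_gradient(logits, gold):
    """Gradient of the NLL with respect to the logits: softmax minus one-hot."""
    probabilities = softmax(logits)
    return [p - (1.0 if i == gold else 0.0) for i, p in enumerate(probabilities)]


def numerical_gradient(logits, gold, eps=1e-6):
    """Central finite differences, used only to verify the analytic gradient."""
    gradient = []
    for i in range(len(logits)):
        plus = list(logits)
        minus = list(logits)
        plus[i] += eps
        minus[i] -= eps
        gradient.append((nll(plus, gold) - nll(minus, gold)) / (2 * eps))
    return gradient


logits, gold = [1.0, 2.0, 0.5], 1
analytic = nll_gradient(logits, gold)
numeric = numerical_gradient(logits, gold)
```

A naive softmax would overflow for logits around 1000, while the version above handles them without problems.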

Related Courses

Deep Learning Seminar

The goal of the seminar is to follow the newest advancements in the deep learning field. The course takes the form of a reading group – in each lecture, a paper is presented by one of the students. The paper is announced in advance, so all participants can read it beforehand and take part in the discussion of the paper.

Deep Reinforcement Learning

Course introducing reinforcement learning, from basic tabular methods to involvement of deep neural networks, focusing both on theory as well as on practical aspects.
