Be aware that this is an archived page from former years. You can visit the current version instead.

Deep Learning – Winter 2016/17

In recent years, deep neural networks have been used to solve complex machine-learning problems. They have achieved significant state-of-the-art results in many areas.

The goal of the course is to introduce deep neural networks, from the basics to the latest advances. The course focuses on both theory and practical aspects (students will implement and train several deep neural networks capable of achieving state-of-the-art results, for example in named entity recognition, dependency parsing, machine translation, image labeling or in playing video games). No previous knowledge of artificial neural networks is required, but a basic understanding of their core concepts and of machine learning is advisable.

Timespace Coordinates

lecture: Czech lecture is held on Monday 15:40 in S9, English lecture on Monday 14:00 in S4
practicals: there are two parallel practicals, on Monday 17:20 in SU1 and on Tuesday 12:20 in SU1

Pass Conditions

To complete the course, you need to pass the exam and obtain at least 30 points in the practicals.

The list of exam topics is available here; an example exam from 17th January is available here.
Points in the practicals are awarded for:
- home assignments (recommended way of getting all the points)
- talk (contact me if you are interested)
- optional project (depending on complexity, up to 30 points can be awarded)

Lecture Outlines

The lecture outlines below include references to study materials. The main study material is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (referred to as DLB).

References to study materials cover all theory required at the exam, and sometimes even more -- the references in italics cover topics not required for the exam.

Date Content

Oct 10

History of Deep Learning [Section 1.2 of DLB]
Machine Learning Basics [Section 5.1 of DLB]
Brief description of Logistic Regression, Maximum Entropy models and SVM [Sections 5.7.1 and 5.7.2 of DLB]
Challenges Motivating Deep Learning [Section 5.11 of DLB]
Maximum Likelihood Estimation [Section 5.5 of DLB, excluding equations (5.59)-(5.61)]

Oct 17

Capacity, overfitting and underfitting [Section 5.2 of DLB, excluding Section 5.2.1]
Hyperparameters and validation sets [Section 5.3 of DLB]
Neural network basics (this topic is treated in detail within the lecture NAIL002)
- Neural networks as graphs [Chapter 6 before Section 6.1 of DLB]
- Output activation functions [Section 6.2 of DLB, excluding Section 6.2.1.2 and 6.2.2.4]
- Hidden activation functions [Section 6.3 of DLB, excluding Section 6.3.3]
- Basic network architectures [Section 6.4 of DLB]
- Gradient Descent and Stochastic Gradient Descent [Sections 4.3 and 5.9 of DLB]
- Backpropagation algorithm [Section 6.5 to 6.5.3 of DLB, especially Algorithms 6.2 and 6.3; note that Algorithms 6.5 and 6.6 are used in practice]
Common Datasets

Name	Description	Instances
MNIST	Images (28x28, grayscale) of handwritten digits.	60k
CIFAR-10	Images (32x32, color) of 10 classes of objects.	50k
CIFAR-100	Images (32x32, color) of 100 classes of objects (with 20 defined superclasses).	50k
ImageNet	Labeled object image database (labeled objects, some with bounding boxes).	14.2M
ImageNet-ILSVRC	Subset of ImageNet for Large Scale Visual Recognition Challenge, annotated with 1000 object classes and their bounding boxes.	1.2M
MS COCO	(Microsoft Common Objects in Context) Complex everyday scenes with descriptions (5) and highlighting of objects (91 types).	2.5M
IAM-OnDB	(IAM Online Handwriting Database) Pen tip movements of handwritten English collected from 221 writers.	86k words
TIMIT	Recordings of 630 speakers (10 sentences each) of 8 major dialects of American English.	6.3k sentences
PTB	(Penn Treebank) 2500 stories from Wall Street Journal, annotated with POS tags and parsed into trees.	1M words
PDT	(Prague Dependency Treebank) Czech sentences annotated on 4 layers (word, morphological, analytical, tectogrammatical).	1.9M words
UD	(Universal Dependencies) Treebanks of 40+ languages with consistent annotation of lemmas, POS tags, morphological features and dependency trees.	55 treebanks

Oct 24

Softmax with NLL (negative log likelihood) as a loss function [Section 6.2.2.3 of DLB, notably equation (6.30); you should also be able to compute the derivative of softmax + NLL with respect to the inputs of the softmax]
Gradient optimization algorithms (this topic is treated in detail within the lecture NAIL002)
- SGD algorithm [Section 8.3.1 and Algorithm 8.1 of DLB]
- Learning rate decay [tf.train.exponential_decay]
- SGD with Momentum algorithm [Section 8.3.2 and Algorithm 8.2 of DLB]
- SGD with Nesterov Momentum algorithm [Section 8.3.3 and Algorithm 8.3 of DLB]
Optimization algorithms with adaptive gradients
- AdaGrad algorithm [Section 8.5.1 and Algorithm 8.4 of DLB]
- RMSProp algorithm [Section 8.5.2 and Algorithm 8.5 of DLB]
- Adam algorithm [Section 8.5.3 and Algorithm 8.7 of DLB]
Parameter initialization strategies [Section 8.4 of DLB]
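
The derivative of softmax + NLL mentioned in this lecture has a compact form: for logits z and gold class g, the gradient of the loss with respect to z is softmax(z) minus the one-hot vector of g. The following is a minimal Python sketch that checks this identity with finite differences; the function names are illustrative and not taken from the course materials.

```python
import math


def softmax(z):
    """Numerically stable softmax of a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def nll(z, gold):
    """Negative log likelihood of class `gold` under softmax(z)."""
    return -math.log(softmax(z)[gold])


def nll_grad(z, gold):
    """Analytic gradient of nll(z, gold) w.r.t. z: softmax(z) - one_hot(gold)."""
    p = softmax(z)
    return [p_i - (1.0 if i == gold else 0.0) for i, p_i in enumerate(p)]


def numeric_grad(z, gold, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grad = []
    for i in range(len(z)):
        plus = list(z)
        plus[i] += eps
        minus = list(z)
        minus[i] -= eps
        grad.append((nll(plus, gold) - nll(minus, gold)) / (2 * eps))
    return grad


logits = [2.0, -1.0, 0.5]  # arbitrary example values
analytic = nll_grad(logits, gold=0)
numeric = numeric_grad(logits, gold=0)
print(analytic)
print(numeric)
```

The two printed vectors should agree to several decimal places, and the analytic gradient sums to zero because the softmax outputs sum to one.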

Oct 31

Gradient clipping [Section 10.11.1 of DLB]
Regularization [Chapter 7 until Section 7.1 of DLB]
Early stopping [Section 7.8 of DLB, without the "How early stopping acts as a regularizer" part]
L1 and L2 regularization [Section 7.1 of DLB]
Ensembling [Section 7.11 of DLB]
Dropout [Section 7.12 of DLB]
Introduction to convolutional networks [Chapter 9 and Sections 9.1-9.3 of DLB]
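
As an illustration of dropout, here is a minimal pure-Python sketch. It assumes the inverted-dropout convention, in which each activation is kept with probability `keep_prob` and kept activations are scaled by 1 / `keep_prob`, so the expected value of a unit is unchanged and no rescaling is needed at test time. The function name and the explicit random generator argument are choices made for this example only.

```python
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1 / keep_prob; zero the rest."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(42)          # fixed seed so the sketch is reproducible
activations = [1.0] * 10000      # toy activations, all equal to one
dropped = dropout(activations, keep_prob=0.8, rng=rng)

# The mean stays close to the original mean of 1.0 thanks to the rescaling.
print(sum(dropped) / len(dropped))
```

With `keep_prob=1` the function returns its input unchanged, which corresponds to disabling dropout during evaluation.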

Nov 07

Convolution as operation on 4D tensors [Section 9.5 of DLB, notably Equations (9.7) and (9.8)]
Max pooling and average pooling [Section 9.3 of DLB]
Stride and Padding schemes [Section 9.5 of DLB]
AlexNet [Alex Krizhevsky et al.: ImageNet Classification with Deep Convolutional Neural Networks]
VGG [Karen Simonyan and Andrew Zisserman: Very Deep Convolutional Networks for Large-Scale Image Recognition]
GoogLeNet [Christian Szegedy et al.: Going Deeper with Convolutions]
Batch normalization [Section 8.7.1 of DLB, optionally the paper Sergey Ioffe and Christian Szegedy: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift]
ResNet [Kaiming He et al.: Deep Residual Learning for Image Recognition]
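
To make the effect of stride and padding concrete, the sketch below computes the spatial size of a feature map after a sequence of convolutions and poolings. It assumes the two padding schemes are the ones TensorFlow calls `SAME` and `VALID`; the layer list is only an example (3x3 convolutions and 3x3 max pooling with stride 2 applied to a 28x28 image) and the helper names are hypothetical.

```python
import math


def output_size(input_size, window, stride, padding):
    """Output size along one spatial dimension of a convolution or pooling."""
    if padding == "SAME":
        # The input is padded so that every stride position yields an output.
        return math.ceil(input_size / stride)
    if padding == "VALID":
        # No padding: the window must fit entirely inside the input.
        return math.ceil((input_size - window + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def trace(input_size, layers, padding):
    """Return the sizes after each (name, window, stride) layer, starting with the input."""
    sizes = [input_size]
    for _, window, stride in layers:
        sizes.append(output_size(sizes[-1], window, stride, padding))
    return sizes


# Example layer stack: two convolutions, a pooling, two convolutions, a pooling.
layers = [("conv", 3, 1), ("conv", 3, 1), ("pool", 3, 2),
          ("conv", 3, 1), ("conv", 3, 1), ("pool", 3, 2)]

print(trace(28, layers, "SAME"))
print(trace(28, layers, "VALID"))
```

With `SAME` padding only the stride changes the size, while with `VALID` padding every 3x3 window also trims the borders, which matters when choosing the number of inputs of the final fully connected layer.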

Nov 14

Residual connections in ResNet [Kaiming He et al.: Deep Residual Learning for Image Recognition]
Sequence modelling using Recurrent Neural Networks (RNN) [Chapter 10 until Section 10.2.1 (excluding) of DLB]
The challenge of long-term dependencies [Section 10.7 of DLB]
Long Short-Term Memory (LSTM) [Section 10.10.1 of DLB]
Gated Recurrent Unit (GRU) [Section 10.10.2 of DLB]

Nov 21

Bidirectional RNN [Section 10.3 of DLB]
Stacked (or multi-layer) LSTM [Section 10.5 of DLB, or you can find more details in Alex Graves: Generating Sequences With Recurrent Neural Networks]
Stacked LSTM with residual connections [Yonghui Wu et al.: Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation]
Grid LSTM [Nal Kalchbrenner, Ivo Danihelka, Alex Graves: Grid Long Short-Term Memory]
Distributed representation [Sections 5.11.1, 5.11.2 and 15.4 of DLB]
Word2vec word embeddings, notably the CBOW and Skip-gram architectures [Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean: Efficient Estimation of Word Representations in Vector Space]
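
The difference between the Skip-gram and CBOW architectures can be seen from the training examples they consume. The sketch below is a simplified illustration, assuming a fixed symmetric window (the original word2vec implementation also samples the window size and subsamples frequent words, which is omitted here); the function names are made up for this example.

```python
def skipgram_pairs(tokens, window):
    """(center, context) pairs: Skip-gram predicts each neighbour from the center word."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window):
    """(context_words, center) examples: CBOW predicts the center word from its neighbours."""
    examples = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window), min(len(tokens), i + window + 1))
                   if j != i]
        if context:
            examples.append((context, center))
    return examples


sentence = "the quick brown fox".split()
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```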

Nov 28

Hierarchical softmax and Negative sampling [Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean: Distributed Representations of Words and Phrases and their Compositionality]
Character-level embeddings using Recurrent neural networks [C2W model from Wang Ling, Tiago Luís, Luís Marujo, Ramón Fernandez Astudillo, Silvio Amir, Chris Dyer, Alan W. Black, Isabel Trancoso: Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation]
Character-level embeddings using Convolutional neural networks [CharCNN from Yoon Kim, Yacine Jernite, David Sontag, Alexander M. Rush: Character-Aware Neural Language Models]
Character-level embeddings using character n-grams [Described simultaneously in several papers as Charagram (John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu: Charagram: Embedding Words and Sentences via Character n-grams), Subword Information (Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov: Enriching Word Vectors with Subword Information) or SubGram (Tom Kocmi, Ondřej Bojar: SubGram: Extending Skip-Gram Word Representation with Substrings)]
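
A minimal sketch of the character n-gram approach follows: a word is wrapped in beginning-of-word and end-of-word markers, split into n-grams, and represented by the average of the n-gram embeddings. The marker characters `<` and `>`, the choice of n in (2, 3, 4), and the tiny hand-written embedding table are assumptions made for illustration; in a real model the table would be trained.

```python
def char_ngrams(word, ns=(2, 3, 4), bow="<", eow=">"):
    """All character n-grams of the word wrapped in boundary markers."""
    marked = bow + word + eow
    return [marked[i:i + n] for n in ns for i in range(len(marked) - n + 1)]


def average_embedding(ngrams, table, dim):
    """Average the vectors of n-grams found in `table`; unknown n-grams are skipped."""
    known = [table[g] for g in ngrams if g in table]
    if not known:
        return [0.0] * dim
    return [sum(vec[d] for vec in known) / len(known) for d in range(dim)]


# Hypothetical two-dimensional embedding table with only two known n-grams.
table = {"<c": [1.0, 0.0], "t>": [0.0, 1.0]}

ngrams = char_ngrams("cat")
print(ngrams)
print(average_embedding(ngrams, table, dim=2))
```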

Dec 06

Neural Machine Translation using Encoder-Decoder or Sequence-to-Sequence architecture [Ilya Sutskever, Oriol Vinyals, Quoc V. Le: Sequence to Sequence Learning with Neural Networks and Kyunghyun Cho et al.: Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation]
Using Attention mechanism in Neural Machine Translation [Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: Neural Machine Translation by Jointly Learning to Align and Translate]
Translating Subword Units [Rico Sennrich, Barry Haddow, Alexandra Birch: Neural Machine Translation of Rare Words with Subword Units]
Character-level NMT [Jason Lee, Kyunghyun Cho, Thomas Hofmann: Fully Character-Level Neural Machine Translation without Explicit Segmentation]
Google NMT [Yonghui Wu et al.: Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation]
Multi-lingual NMT [Melvin Johnson et al.: Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation or Thanh-Le Ha, Jan Niehues, Alexander Waibel: Toward Multilingual Neural Machine Translation with Universal Encoder and Decoder]

Dec 13

Study material for Reinforcement Learning is the second edition of Reinforcement Learning: An Introduction by Richard S. Sutton, available only as a draft.

Multi-arm Bandits [Chapter 2, Sections 2.1-2.3 of Sutton's Book]
General setting of Reinforcement Learning [Chapter 3, Sections 3.1-3.3 of Sutton's Book]
Monte Carlo Reinforcement Learning Algorithm [Chapter 5, Sections 5.1-5.4 (especially the algorithm in 5.4) of Sutton's Book]
Q-Learning [Chapter 6, Sections 6.1, 6.2 and 6.5 (especially the algorithm in 6.5) of Sutton's Book]
Deep Q-Network [Volodymyr Mnih et al.: Human-level control through deep reinforcement learning]
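
The tabular Q-learning update can be sketched in a few lines. The environment below is a hypothetical five-state corridor invented for this example (it is not one of the Gym environments used in the practicals): the agent starts in state 0, moves left or right, and receives reward 1 on reaching the last state. The hyperparameter values are arbitrary.

```python
import random

N_STATES = 5       # states 0..4; state 4 is terminal
ACTIONS = (0, 1)   # 0 = left, 1 = right


def step(state, action):
    """Deterministic corridor dynamics: reward 1.0 only when the last state is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.2, max_steps=1000, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Explore with probability epsilon; also break ties randomly.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best action in the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [0 if q[s][0] > q[s][1] else 1 for s in range(N_STATES - 1)]


q = q_learning()
print(greedy_policy(q))
```

After training, the greedy policy should choose the "right" action in every non-terminal state, since that is the shortest path to the reward.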

Dec 20

Policy Gradient Methods [Chapter 13, Sections 13.1-13.5 of Sutton's Book]
Policy-gradient (aka REINFORCE) Reinforcement Learning Algorithm [Algorithm in Section 13.3 of Sutton's Book; note that the gamma^t on the last line should not be there]
REINFORCE with Baseline Reinforcement Learning Algorithm [Algorithm in Section 13.4 of Sutton's Book; note that the gamma^t on the last line should not be there]
Actor-Critic Reinforcement Learning Algorithm [Algorithm in Section 13.5 of Sutton's Book; note that the gamma on the last but one line should not be there]
Asynchronous Advantage Actor-Critic (aka A3C) Reinforcement Learning Algorithm [Volodymyr Mnih et al.: Asynchronous Methods for Deep Reinforcement Learning]
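
Both REINFORCE variants weight the gradient of the log-probability of each chosen action by the return that followed it, minus a baseline in the second variant. The sketch below computes these quantities for one episode; the reward values and baseline values are made-up numbers, and the function names are illustrative.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for every step t."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def advantages(returns, baselines):
    """Returns with a per-step baseline (for example a value estimate) subtracted."""
    return [g - b for g, b in zip(returns, baselines)]


rewards = [1.0, 1.0, 1.0]        # hypothetical episode with three steps
returns = discounted_returns(rewards, gamma=0.5)
print(returns)
print(advantages(returns, baselines=[1.0, 1.0, 1.0]))
```

Subtracting a baseline does not change the expected gradient but can reduce its variance, which is the motivation for the second variant.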

Jan 09

Autoencoders (undercomplete, sparse, denoising) [Chapter 14, Sections 14-14.2.3 of DLB]
Deep Generative Models using Differentiable Generator Nets [Section 20.10.2 of DLB]
Variational Autoencoders [Section 20.10.3 plus Reparametrization trick from Section 20.9 (but not Section 20.9.1) of DLB]
Generative Adversarial Networks [Section 20.10.4 of DLB]
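
The reparametrization trick used by Variational Autoencoders can be sketched without any deep learning library. Assuming a diagonal Gaussian encoder parametrized by a mean and a log-variance (a common choice, not necessarily the one used in the course skeleton), a sample is written as a deterministic function of these parameters and independent noise, and the KL divergence to the standard normal prior has a closed form.

```python
import math
import random


def reparametrize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), where sigma = exp(log_var / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


rng = random.Random(0)                    # fixed seed for reproducibility
mu, log_var = [2.0], [math.log(0.25)]     # hypothetical encoder output, sigma = 0.5
samples = [reparametrize(mu, log_var, rng)[0] for _ in range(20000)]

print(sum(samples) / len(samples))        # close to the mean 2.0
print(kl_to_standard_normal(mu, log_var))
```

Because the noise is sampled independently of `mu` and `log_var`, gradients of a loss computed from `z` can flow back to both parameters.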

Tasks

Please send me the solved tasks via email (straka@...).

You can send small files (sources) as attachments, but if you need to send large files, please send me links only!

Task	Points	Due To	Task Description
`mnist_layers_activations`	3	Oct 31 15:39	Modify one of the MNIST examples from `labs03` so that it uses the following hyperparameters: `layers`, the number of hidden layers (1-3), and `activation`, the activation function, either `tf.tanh` or `tf.nn.relu`. Then implement hyperparameter search – find the values of hyperparameters resulting in the best accuracy on the development set (`mnist.validation`) and using these hyperparameters compute the accuracy on the test set (`mnist.test`).
`mnist_training`	2	Nov 07 15:39	Using the MNIST example `labs03/1-mnist.py`, try the following optimizers: standard SGD (`tf.train.GradientDescentOptimizer`), with batch sizes (10,50) and learning rates (0.01,0.001,0.0001); SGD with exponential learning rate decay (use `tf.train.exponential_decay`), with batch sizes (10,50) and the following (starting learning rate, final learning rate) pairs: (0.01,0.001), (0.01,0.0001), (0.001, 0.0001); SGD with momentum (`tf.train.MomentumOptimizer`), with batch sizes (10,50), learning rates (0.01,0.001,0.0001) and momentum 0.9; Adam optimizer (`tf.train.AdamOptimizer`), with batch sizes (10,50) and learning rates (0.002,0.001,0.0005). Report the development set accuracy for all the listed possibilities.
`mnist_dropout`	2	Nov 14 15:39	Using the MNIST example from `labs03/1-mnist.py`, implement dropout (using `tf.nn.dropout`). During training, allow specifying dropout probability for the input layer and for the hidden layer separately. Then perform hyperparameter search using input layer dropout keep probability (0.8,0.9,1) and hidden layer dropout keep probability (0.8,0.9,1), and report both development set accuracy for all hyperparameters and test set accuracy for the best hyperparameters.
`gym_cartpole_supervised`	3	Nov 14 15:39	Solve the CartPole-v1 environment from the OpenAI Gym using supervised learning. Very small amount of training data is available in the `labs04/gym-cartpole-data.txt` file, each line containing one observation (four space separated floats) and a corresponding action (the last space separated integer). The solution to this task should be a model which passes evaluation on random inputs. This evaluation is performed by running the `labs04/gym-cartpole-evaluate.py model_file` command. (You can also pass `--render` argument to render the evaluations interactively.) In order to pass, you should achieve an average reward of at least 475 on 100 episodes. In order to save the model, look at the `labs04/gym-cartpole-save.py`, which saves a model performing random guesses.
`mnist_conv`	3-5	Nov 21 15:39	Try achieving as high accuracy on the MNIST test set as possible (you can start from `labs03/1-mnist.py`, but you can modify it freely). Nevertheless, remember that you should not perform hyperparameter search on the test set (when you design network architecture, you should perform hyperparameter search on the development set, and measure the test set accuracy only with the best hyperparameters; and optionally repeat with modified architecture). You will be awarded points according to the accuracy achieved: 99.1% test set accuracy: 3 points; 99.25% test set accuracy: 4 points; 99.4% test set accuracy: 5 points. You should use convolution (see `tf.contrib.layers.convolution2d`) optionally with batch normalization (pass `tf.contrib.layers.batch_norm` as `normalizer_fn` argument of `convolution2d`). If you are unsure how, you can start with the following architecture (it is by no means the best solution, it is just a small network inspired by larger ImageNet processing networks): 3x3 convolution with ReLU and 8 filters, 3x3 convolution with ReLU and 8 filters, 3x3 maxpool with stride 2, 3x3 convolution with ReLU and 15 filters, 3x3 convolution with ReLU and 15 filters, 3x3 maxpool with stride 2, flatten (or possibly more convolutions and one maxpool), fully connected layer with 10 outputs and softmax (no more ReLU). To solve this task, send me a source code I can execute (using `python source.py`) which trains a neural network and prints the test set accuracy on standard output (in less than a day :-).
`resnet_subcaltech`	5	Nov 28 15:39	[This task is intended mostly for people who are interested in image processing; you can pass the practicals easily without working on this task.] Implement a network which performs image classification on the Sub-Caltech50 dataset (this dataset was created for this task as a subset of Caltech101). The dataset contains images classified in 50 classes and has explicit train/test partitioning (it does not have an explicit development partition, use some amount of training data if you need one). In order to implement the image classification, use a pre-trained ResNet50 network to extract image features (we do not use ResNet101 nor ResNet152 as they are more computationally demanding). To see how ResNet50 can be used to classify an image on the ImageNet classes, see the `labs05/resnet50.py`. When using the ResNet50 to extract features, pass `num_classes=None` when creating the network, and the network will return 2048 image features instead of logits of 1000 classes. The goal of this task is to train an image classifier using the image features precomputed by ResNet50, and report the testing accuracy. The best course of action is probably to precompute the image features once (for both training and testing set) and save them to disk, and then train the classifier using the precomputed features. As for the classifier model, it is probably enough to create a fully connected layer to 50 neurons with softmax (without ReLU). Bonus: if you are interested, you can finetune the classifier including the ResNet50 and get additional points for it. After you train the classifier as described above, put both the ResNet50 and the pretrained classifier in one Graph, and continue training including the ResNet50 (you need to pass `is_training=True` during ResNet construction).
`sequence_generation`	4	Nov 28 15:39	Implement network which performs sequence generation via LSTM/GRU. Note that for training purposes, we will be using very low-level approach. The goal is to predict the `labs06/international-airline-passengers.tsv` sequence. Start with the `labs06/sequence-generation-skeleton.py` file, which loads the data and supports producing image summaries with the predicted sequence. For training, construct an unrolled series of LSTM/GRU cells, using training portion of gold data as input, predicting the next value in the training sequence (the LSTM/GRU output contains several numbers, so use additional linear layer with one output, and MSE loss). In every epoch, train the same sequence several times (500 is the default in the script). For prediction, use the last output state from the training portion of the network, and construct another unrolled series of LSTM/GRU cells, this time using the prediction from previous step as input. Report results of both LSTM and GRU, each with 8, 10 and 12 cells (by sending the logs of the 6 runs).
`uppercase_letters`	4	Dec 05 15:39	Implement network, which is given an English sentence in lowercase letters and tries to uppercase appropriate letters. Use the `labs06/en-ud-train.txt` as training data, `labs06/en-ud-dev.txt` as development data and `labs06/en-ud-test.txt` as testing data. Start with the `labs06/uppercase-letters-skeleton.py` file, which loads the data, remaps characters to integers, generates random batches and saves summaries. Represent letters either as one-hot vectors (`tf.one_hot`) or using trainable embeddings (`tf.nn.embedding_lookup`), and use bidirectional LSTM/GRU (using `tf.nn.bidirectional_dynamic_rnn`) combined with a linear classification layer with softmax. Report test set accuracy. For your information, straightforward approach with small hyperparameter search on development data has test accuracy of 97.63%.
`tagger`	1-7	Dec 12 15:39	Implement network performing part-of-speech tagging for Czech and English. The data (and word embeddings precomputed using `word2vec`) are available here. The files are stored in vertical format – each word is on a separate line, with empty line denoting end of sentence. Each word line contains three tab-separated values: word form, lemma and tag (you can ignore the lemmas in this task). However, note that only word forms are available in the test data. You can load the dataset using the `labs08/morpho_dataset.py` module. You should start with the `labs08/tagger-skeleton.py` file. This task has several subtasks, you can solve only some of them if you want. The network in each subtask is a bidirectional GRU (with dimension 100), only the word embeddings (always with dimension 100) differ: `learned_we` (1 point): use randomly initialized word embeddings, which you update during training; `updated_pretrained_we` (1 point): use pretrained word embeddings, which you further update during training (the pretrained embeddings are in the original data and can be loaded using the `labs08/word_embeddings.py` module);
`only_pretrained_we` (1 point): use pretrained word embeddings, which you do not update during training; `char_rnn` (1 point): use character-level embeddings computed using bidirectional GRU on the word letters (beginning-of-word and end-of-word characters are not needed; pass `including_charseqs=True` to `MorphoDataset.next_batch` to get character-level information); `char_conv` (1 point): compute word embeddings as convolution of filters followed by a max-pooling layer (beginning-of-word and end-of-word characters are needed), using 25 filters of width 2, 25 filters of width 3, 25 filters of width 4 and 25 filters of width 5 (pass `including_charseqs=True` to `MorphoDataset.next_batch` to get character-level information); `charagram` (2 points): compute word embeddings as average of embeddings of character n-grams present in the word (beginning-of-word and end-of-word characters are needed), for n in (2,3,4). English competition (1-3): using any deep learning approach which uses only the data in the provided archive, try achieving highest accuracy on English testing data. The solution to this subtask is both a source code of your network and annotated testing data, which will be evaluated using the `labs08/morpho_evaluate.py` script. The points will be awarded according to the accuracy reached – three best submissions get 3 points, next three best submissions get 2 points and next three submissions get 1 point. Results: Ondřej Hübsch (95.65) [3 points], Martin Hora (94.48) [3 points], Dušan Variš (93.83) [2 points], Peter Krčah (92.28) [3 points], Zafod (90.27) [2 points], Kuba (89.79) [2 points]. Czech competition (1-3): using any deep learning approach which uses only the data in the provided archive, try achieving highest accuracy on Czech testing data. The solution to this subtask is both a source code of your network and annotated testing data, which will be evaluated using the `labs08/morpho_evaluate.py` script.
The points will be awarded according to the accuracy reached – three best submissions get 3 points, next three best submissions get 2 points and next three submissions get 1 point. Results: Ondřej Hübsch (96.08) [3 points], Peter Krčah (95.56) [3 points], Martin Hora (95.30) [3 points], Dušan Variš (95.16) [2 points], Kuba (86.92) [2 points].
`lemmatizer`	2-6	Dec 19 15:39	Implement network performing lemmatization for Czech and English. Use the data from the previous task. Note that the lemmas are all in lowercase. You should start with the `labs09/lemmatizer-skeleton.py` file. This task has several subtasks, you can solve only some of them if you want. In every subtask, represent a form using concatenation of final states of bidirectional GRU run on the form's characters. `individual_decoder` (2 points): generate every lemma independently, using GRU as a decoder, producing one lemma letter at a time (use `labs09/contrib_seq2seq.py` as a dynamic rnn decoder, see `labs09/rnn_example_decoder.py` for a simple usage); `individual_attention_decoder` (2 points): as in `individual_decoder`, but use attention; `combined_attention_decoder` (2 points): use the same approach as in the `individual_attention_decoder`, but use additional sentence-level bidirectional GRU (i.e., the form representations are processed by a bidirectional GRU and the results are used for the lemma generation). English competition (1-3): using any deep learning approach which uses only the data in the provided archive, try achieving highest accuracy on English testing data. The solution to this subtask is both a source code of your network and annotated testing data, which will be evaluated using the `labs08/morpho_evaluate.py` script. The points will be awarded according to the accuracy reached – three best submissions get 3 points, next three best submissions get 2 points and next three submissions get 1 point. Results: Krteček (95.68) [3 points], Peter Krčah (65.23) [3 points], Dušan Variš (62.95) [2 points]. Czech competition (1-3): using any deep learning approach which uses only the data in the provided archive, try achieving highest accuracy on Czech testing data. The solution to this subtask is both a source code of your network and annotated testing data, which will be evaluated using the `labs08/morpho_evaluate.py` script.
The points will be awarded according to the accuracy reached – three best submissions get 3 points, next three best submissions get 2 points and next three submissions get 1 point. Results: Dušan Variš (97.45) [2 points], Krteček (83.49) [3 points], Peter Krčah (20.57) [3 points].
`nli`	3-15	Jan 09 15:39	Try solving the Native Language Identification task with highest accuracy possible, ideally beating current state-of-the-art. The dataset is available under a restrictive license, so the details about how to obtain it have been sent by email to the course participants. If you have not received it, please write me an email and I will send you the instructions directly. Your goal is to achieve highest accuracy on the test data. The dataset you have does not contain test annotations, so you cannot measure test accuracy directly. Instead, you should measure development accuracy and finally submit test annotations for the model with best development accuracy. You can load the dataset using the `labs09/nli_dataset.py` module. You can start with the `labs09/nli-skeleton.py` file, which uses the `labs09/nli_dataset.py` module to load the data, passes the data to the network and finally produces test annotations using the model achieving highest development accuracy. In order to solve the task, send me the test set annotations and also the source code. I will evaluate the test set annotations using the `labs09/nli_evaluate.py` script. Every working solution will get 3 points, and you will get additional points according to your test set accuracy – the best solution will get a total of 15 points, the next one 14, and so on. Also everyone beating state-of-the-art will get a total of 15 points. Results: Peter Krčah (80.73), MET + kokrous (71.18), Tom Kocmi (71.18), Miroslav Olšák (70.18), Jan Hrach (51.36).
`monte_carlo`	2	Jan 02 15:39	Implement Monte Carlo reinforcement learning algorithm, computing exact average for every state-action pair. Start with the `labs10/monte_carlo-skeleton.py` module. You should be able to reach average reward of 475 on `CartPole-v1` environment (using 500 steps).
`q_learning`	2	Jan 02 15:39	Implement Q-learning algorithm. Start with the `labs10/q_learning-skeleton.py` module. You should be able to reach average reward of 9.7 on `Taxi-v1` environment and -150 on `MountainCar-v0` environment.
`q_network`	2	Jan 02 15:39	Implement Q-learning algorithm, approximating Q-value using a simple linear network. Start with the `labs10/q_network-skeleton.py` module. You should be able to reach average reward of 9.7 on `Taxi-v1` environment.
`reinforce`	2	Jan 09 15:39	Implement REINFORCE algorithm, representing a policy using a neural network with a hidden layer. Start with the `labs10/reinforce-skeleton.py` module. You should be able to reach average reward of 475 on `CartPole-v1` environment (using 500 steps) and -100 on `Acrobot-v1` environment.
`reinforce_with_baseline`	2	Jan 09 15:39	Implement REINFORCE algorithm with value function as a baseline, representing both a policy and a value function using (independent) neural networks with a hidden layer. Start with the `labs11/reinforce_with_baseline-skeleton.py` module. You should be able to reach average reward of 490 on `CartPole-v1` environment (using 500 steps) and -90 on `Acrobot-v1` environment. To observe the effect of the baseline, try comparing your solution to basic `reinforce` using batch of size 1.
`reinforce_with_baseline_pixels`	3	Jan 09 15:39	Note that this task is experimental and may not be easily solvable! Modify the solution of `reinforce_with_baseline` to use pixel inputs. Start with the `labs11/reinforce_with_baseline_pixels-skeleton.py` module. You will get the points if you can show any improvement at all, reaching for example average reward of 50 on `CartPole-v1`. Note that according to papers, it could take hours for the network to converge. Also note that you probably have to use some kind of epsilon-greedy policy (otherwise the policy network usually converges too fast to a wrong solution; in some papers [for example in Asynchronous Methods for Deep Reinforcement Learning] entropy regularization term is used instead). Mean 1000-episode rewards of submitted solutions: Matěj Kocián: 78.8, Bedřich Pišl: 46
`a3c`	3	Jan 09 15:39	Note that this task is experimental and may not be easily solvable! Try implementing Asynchronous Advantage Actor Critic algorithm from Asynchronous Methods for Deep Reinforcement Learning paper. You can start with the `labs11/a3c-skeleton.py` module. You will get the points if you can show minor improvement, reaching average reward of at least 100 on `CartPole-v1`. Do not hesitate to send the solution even if it is unstable. Note that the network frequently diverges – in addition to gradient clipping (present in the skeleton), you could use exponential learning rate decay, or some entropy regularization term (see the paper). Mean 1000-episode rewards of submitted solutions: Matěj Kocián: 500
`vae`	3	Feb 19 23:59	Implement simple Variational Autoencoder which generates MNIST digits. Start with `labs12/vae-skeleton.py` and proceed according to the instructions. Note that the skeleton automatically generates several random images each 1000 training batches and stores them in the log dir (i.e., it is not accessible in the TensorBoard). The generated images are random in the upper part and interpolating from left to right (and if dim(z) is 2, also from top to bottom) in the lower part of the generated summary. Bonus: If you would like to experiment with more complicated dataset, you can use CIFAR-10 Cars, which are images of cars from the CIFAR-10 dataset, cropped and desaturated, and stored in MNIST format – therefore, in order to use it, after unpacking just pass `--dataset cifar-cars` to the `labs12/vae-skeleton.py`. Note that you will probably need more complicated encoder (probably using convolutions) and decoder (larger hidden layers, maybe even more). If you are able to generate car images which look better than with plain `labs12/vae-skeleton.py`, you will get 2 additional points.
`gan`	3	Feb 19 23:59	Implement simple Generative Adversarial Network which generates MNIST digits. Start with `labs12/gan-skeleton.py` and proceed according to the instructions. Note that the skeleton automatically generates several random images each 1000 training batches and stores them in the log dir (i.e., it is not accessible in the TensorBoard). The generated images are random in the upper part and interpolating from left to right (and if dim(z) is 2, also from top to bottom) in the lower part of the generated summary. If you would like to experiment with more complicated dataset, you can use CIFAR-10 Cars, which are images of cars from the CIFAR-10 dataset, cropped and desaturated, and stored in MNIST format – therefore, in order to use it, after unpacking just pass `--dataset cifar-cars` to the `labs12/gan-skeleton.py`.
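
For the `mnist_training` task, the (starting learning rate, final learning rate) pairs have to be converted to a decay rate. A minimal sketch follows, assuming the non-staircase schedule `learning_rate * decay_rate ** (global_step / decay_steps)` documented for `tf.train.exponential_decay`. The total number of training steps used here is a made-up value and the helper names are hypothetical; the real step count depends on the batch size and the number of epochs.

```python
def decay_rate(start_lr, final_lr, total_steps, decay_steps=1):
    """Rate r such that start_lr * r ** (total_steps / decay_steps) equals final_lr."""
    if start_lr <= 0 or final_lr <= 0 or total_steps <= 0 or decay_steps <= 0:
        raise ValueError("learning rates and step counts must be positive")
    return (final_lr / start_lr) ** (decay_steps / total_steps)


def decayed_lr(start_lr, rate, step, decay_steps=1):
    """Learning rate at a given step under the non-staircase exponential schedule."""
    return start_lr * rate ** (step / decay_steps)


TOTAL_STEPS = 1000  # hypothetical number of training batches

for start_lr, final_lr in [(0.01, 0.001), (0.01, 0.0001), (0.001, 0.0001)]:
    rate = decay_rate(start_lr, final_lr, TOTAL_STEPS)
    # The last printed value should match final_lr up to rounding.
    print(start_lr, final_lr, rate, decayed_lr(start_lr, rate, TOTAL_STEPS))
```

The computed rate would then be passed as the `decay_rate` argument together with the matching `decay_steps`.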

Institute of Formal and Applied Linguistics

Charles University, Czech Republic
Faculty of Mathematics and Physics
