Deep Learning – Winter 2016/17
In recent years, deep neural networks have been used to solve complex machine-learning problems, achieving state-of-the-art results in many areas.
The goal of the course is to introduce deep neural networks, from the basics to the latest advances. The course will focus both on theory and on practical aspects (students will implement and train several deep neural networks capable of achieving state-of-the-art results, for example in named entity recognition, dependency parsing, machine translation, image labeling or in playing video games). No previous knowledge of artificial neural networks is required, but a basic understanding of their core concepts and of machine learning is advisable.
Timespace Coordinates
- lecture: the Czech lecture is held on Monday at 15:40 in S9, the English lecture on Monday at 14:00 in S4
- practicals: there are two parallel practicals, on Monday at 17:20 in SU1 and on Tuesday at 12:20 in SU1
Pass Conditions
To complete the course, you need to pass the exam and obtain at least 30 points in the practicals.
- The list of exam topics is available here, and an example exam from January 17 is available here.
Points in the practicals are awarded for:
- home assignments (recommended way of getting all the points)
- talk (contact me if you are interested)
- optional project (depending on complexity, up to 30 points can be awarded)
Lecture Outlines
The lecture outlines include references to study materials. The main study material is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (referred to as DLB).
References to study materials cover all theory required for the exam, and sometimes more – the references in italics cover topics not required for the exam.
Date | Content
---|---
Oct 10 |
Oct 17 |
Oct 24 |
Oct 31 |
Nov 07 |
Nov 14 |
Nov 21 |
Nov 28 |
Dec 06 |
Dec 13 | Study material for Reinforcement Learning is the second edition of Reinforcement Learning: An Introduction by Richard S. Sutton, available only as a draft.
Dec 20 |
Jan 09 |
Tasks
Please send me the solved tasks via email (straka@...).
mnist_layers_activations (3 points, due Oct 31 15:39)
Modify one of the MNIST examples. Then implement hyperparameter search – find the values of hyperparameters resulting in the best accuracy on the development set.
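As a rough illustration of such a search, the following sketch loops over a small grid of hyperparameter values; the grids and the train_and_evaluate helper are purely illustrative, standing in for the training loop of the actual MNIST example.

```python
import itertools
import random

def train_and_evaluate(hidden_layers, activation, learning_rate):
    # Hypothetical stand-in for the MNIST example's training loop; replace
    # the random score with the real development-set accuracy.
    return random.random()

# Exhaustive grid search over illustrative hyperparameter values.
best = None
for layers, activation, lr in itertools.product(
        [1, 2, 3], ["relu", "tanh", "sigmoid"], [0.1, 0.01, 0.001]):
    accuracy = train_and_evaluate(layers, activation, lr)
    if best is None or accuracy > best[0]:
        best = (accuracy, layers, activation, lr)

print("Best dev accuracy {:.2%} with {} layer(s), {}, lr={}".format(*best))
```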
mnist_training (2 points, due Nov 07 15:39)
Using the MNIST example, report the development set accuracy for all the listed training possibilities.
mnist_dropout (2 points, due Nov 14 15:39)
Using the MNIST example, implement dropout, and report both the development set accuracy for all hyperparameters and the test set accuracy for the best hyperparameters.
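For intuition, here is what (inverted) dropout does to a layer's activations, sketched in plain NumPy; the actual assignment should of course use TensorFlow's dropout operation rather than this toy version.

```python
import numpy as np

def dropout(activations, keep_prob, training):
    # Inverted dropout: zero each activation with probability 1 - keep_prob
    # and rescale the survivors, so no change is needed at inference time.
    if not training or keep_prob >= 1.0:
        return activations
    mask = np.random.binomial(1, keep_prob, size=activations.shape)
    return activations * mask / keep_prob

# Roughly half of the activations get zeroed, the rest are doubled.
print(dropout(np.ones((2, 4)), keep_prob=0.5, training=True))
```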
gym_cartpole_supervised (3 points, due Nov 14 15:39)
Solve the CartPole-v1 environment from the OpenAI Gym using supervised learning. A very small amount of training data is available in the accompanying file. The solution to this task should be a model which passes evaluation on random inputs; this evaluation is performed by running the supplied evaluation script. In order to save the model, look at the referenced example.
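The evaluation loop itself is just repeated episodes of the Gym environment. A minimal sketch follows, using the classic Gym interface; predict is a hypothetical stand-in for the trained model, here replaced by a trivial hand-written policy.

```python
import gym
import numpy as np

def predict(observation):
    # Stand-in for the trained model: push the cart in the direction the
    # pole is leaning. A real solution computes the action from the model.
    return 0 if observation[2] < 0 else 1

env = gym.make("CartPole-v1")
rewards = []
for episode in range(100):
    observation, total, done = env.reset(), 0, False
    while not done:
        observation, reward, done, _ = env.step(predict(observation))
        total += reward
    rewards.append(total)
print("Average reward over 100 episodes:", np.mean(rewards))
```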
mnist_conv (3-5 points, due Nov 21 15:39)
Try achieving as high accuracy on the MNIST test set as possible (you can start from one of the previous MNIST examples). You should use convolution. To solve this task, send me source code I can execute.
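If the convolution operation itself is unclear, this NumPy sketch shows what a single 2D convolution (without padding or strides) computes; in the actual network you would use TensorFlow's convolution instead.

```python
import numpy as np

def conv2d(image, kernel):
    # "Valid" 2D convolution as used in deep learning (technically
    # cross-correlation): slide the kernel over the image and take dot
    # products, producing a smaller output feature map.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    output = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            output[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return output

edges = conv2d(np.random.rand(28, 28), np.array([[1.0, -1.0]]))
print(edges.shape)  # (28, 27): a horizontal-gradient feature map
```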
resnet_subcaltech (5 points, due Nov 28 15:39)
[This task is intended mostly for people who are interested in image processing; you can pass the practicals easily without working on this task.]
Implement a network which performs image classification on the Sub-Caltech50 dataset (this dataset was created for this task as a subset of Caltech101). The dataset contains images classified into 50 classes and has an explicit train/test partitioning (it does not have an explicit development partition, so use some amount of the training data if you need one).
In order to implement the image classification, use a pre-trained ResNet50 network to extract image features (we do not use ResNet101 or ResNet152, as they are more computationally demanding). To see how ResNet50 can be used to classify an image into the ImageNet classes, see the provided example.
The goal of this task is to train an image classifier using the image features precomputed by ResNet50, and to report the testing accuracy. The best course of action is probably to precompute the image features once (for both the training and the testing set) and save them to disk, and then train the classifier using the precomputed features. As for the classifier model, it is probably enough to use a fully connected layer with 50 neurons and softmax (without ReLU).
Bonus: if you are interested, you can finetune the classifier including the ResNet50 and get additional points for it. After you train the classifier as described above, put both the ResNet50 and the pretrained classifier in one Graph, and continue training including the ResNet50.
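A minimal sketch of the suggested classifier, assuming the 2048-dimensional ResNet50 features have already been precomputed and stored (the .npy file names are made up for this example): a single softmax layer trained by gradient descent.

```python
import numpy as np

# Hypothetical files holding the precomputed ResNet50 features and labels.
train_x = np.load("train_features.npy")  # shape (N, 2048)
train_y = np.load("train_labels.npy")    # shape (N,), values 0..49

num_classes, lr = 50, 0.1
W = np.zeros((train_x.shape[1], num_classes))
b = np.zeros(num_classes)

for epoch in range(50):
    logits = train_x @ W + b
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    # Gradient of the mean cross-entropy with respect to the logits.
    probs[np.arange(len(train_y)), train_y] -= 1
    grad = probs / len(train_y)
    W -= lr * (train_x.T @ grad)
    b -= lr * grad.sum(axis=0)
```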
sequence_generation (4 points, due Nov 28 15:39)
Implement a network which performs sequence generation via an LSTM/GRU. Note that for training purposes, we will be using a very low-level approach. The goal is to predict the values of a given training sequence.
For training, construct an unrolled series of LSTM/GRU cells, using the training portion of the gold data as input and predicting the next value in the training sequence (the LSTM/GRU output contains several numbers, so use an additional linear layer with one output, and MSE loss). In every epoch, train on the same sequence several times (500 times is the default in the script).
For prediction, use the last output state from the training portion of the network, and construct another unrolled series of LSTM/GRU cells, this time using the prediction from the previous step as input.
Report results of both LSTM and GRU, each with 8, 10 and 12 cells (by sending the logs of the 6 runs).
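The two unrolled phases can be summarized in a few lines of Python; cell and readout are hypothetical stand-ins for a single LSTM/GRU step and the final linear layer.

```python
def generate(cell, readout, state, gold_prefix, steps):
    # Training-style unroll: feed the gold sequence one value at a time,
    # carrying the recurrent state along (gold_prefix must be non-empty).
    output = None
    for value in gold_prefix:
        state, output = cell(state, value)
    # Prediction unroll: feed each prediction back in as the next input.
    predictions = []
    for _ in range(steps):
        value = readout(output)
        predictions.append(value)
        state, output = cell(state, value)
    return predictions
```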
uppercase_letters (4 points, due Dec 05 15:39)
Implement a network which is given an English sentence in lowercase letters and tries to uppercase the appropriate letters. Use the supplied data and start with the provided skeleton.
Represent letters either as one-hot vectors or using trainable embeddings.
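For the one-hot variant, the representation is simply one indicator row per letter; a NumPy sketch (the alphabet here is illustrative, the real one comes from the data):

```python
import numpy as np

alphabet = "abcdefghijklmnopqrstuvwxyz .,"
char_index = {c: i for i, c in enumerate(alphabet)}

def one_hot_encode(sentence):
    # One row per character, with a single 1 at the character's index.
    encoded = np.zeros((len(sentence), len(alphabet)), dtype=np.float32)
    for i, char in enumerate(sentence):
        encoded[i, char_index[char]] = 1.0
    return encoded

print(one_hot_encode("hello world").shape)  # (11, 29)
```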
tagger (1-7 points, due Dec 12 15:39)
Implement a network performing part-of-speech tagging for Czech and English. The data (including precomputed word embeddings) is provided.
This task has several subtasks; you can solve only some of them if you want. The network in each subtask is a bidirectional GRU (with dimension 100), only the word embeddings (always with dimension 100) differ; a sketch of the shared architecture follows below.
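The following TensorFlow sketch shows the shared architecture. It is written against the TF 1.x API, which may differ from the TensorFlow version used in the course, and vocab_size and num_tags are illustrative values.

```python
import tensorflow as tf

vocab_size, num_tags = 20000, 19        # illustrative sizes

word_ids = tf.placeholder(tf.int32, [None, None])   # sentences x max_len
tag_ids = tf.placeholder(tf.int32, [None, None])
lengths = tf.placeholder(tf.int32, [None])

# Word embeddings with dimension 100 (trainable here; some subtasks use
# precomputed embeddings instead).
embeddings = tf.get_variable("embeddings", [vocab_size, 100])
inputs = tf.nn.embedding_lookup(embeddings, word_ids)

# Bidirectional GRU with dimension 100, followed by a per-token softmax.
outputs, _ = tf.nn.bidirectional_dynamic_rnn(
    tf.nn.rnn_cell.GRUCell(100), tf.nn.rnn_cell.GRUCell(100),
    inputs, sequence_length=lengths, dtype=tf.float32)
hidden = tf.concat(outputs, axis=2)
logits = tf.layers.dense(hidden, num_tags)
loss = tf.losses.sparse_softmax_cross_entropy(
    tag_ids, logits, weights=tf.sequence_mask(lengths, dtype=tf.float32))
training = tf.train.AdamOptimizer().minimize(loss)
```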
lemmatizer (2-6 points, due Dec 19 15:39)
Implement a network performing lemmatization for Czech and English. Use the data from the previous task. Note that the lemmas are all in lowercase. You should start with the provided skeleton.
This task has several subtasks; you can solve only some of them if you want. In every subtask, represent a form using a concatenation of the final states of a bidirectional GRU run over the form's characters, as sketched below.
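The form representation described above could look as follows, again in TF 1.x-style code (all names and dimensions other than the bidirectional GRU itself are illustrative):

```python
import tensorflow as tf

num_chars, char_dim, rnn_dim = 256, 64, 64   # illustrative sizes

char_ids = tf.placeholder(tf.int32, [None, None])   # forms x max_form_len
form_lengths = tf.placeholder(tf.int32, [None])

char_embeddings = tf.get_variable("char_embeddings", [num_chars, char_dim])
inputs = tf.nn.embedding_lookup(char_embeddings, char_ids)

# Run a bidirectional GRU over the characters of each form and concatenate
# the final forward and backward states into one form representation.
_, (state_fw, state_bw) = tf.nn.bidirectional_dynamic_rnn(
    tf.nn.rnn_cell.GRUCell(rnn_dim), tf.nn.rnn_cell.GRUCell(rnn_dim),
    inputs, sequence_length=form_lengths, dtype=tf.float32)
form_representation = tf.concat([state_fw, state_bw], axis=1)
```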
nli (3-15 points, due Jan 09 15:39)
Try solving the Native Language Identification task with the highest accuracy possible, ideally beating the current state of the art. The dataset is available under a restrictive license, so the details about how to obtain it have been sent by email to the course participants. If you have not received it, please write me an email and I will send you the instructions directly.
Your goal is to achieve the highest accuracy on the test data. The dataset you have does not contain test annotations, so you cannot measure test accuracy directly. Instead, you should measure development accuracy and finally submit test annotations for the model with the best development accuracy. You can load the dataset using the provided loader.
In order to solve the task, send me the test set annotations and also the source code. I will evaluate the test set annotations against the gold annotations.
monte_carlo (2 points, due Jan 02 15:39)
Implement the Monte Carlo reinforcement learning algorithm, computing an exact average for every state-action pair. Start with the provided skeleton. You should be able to reach an average reward of 475 on CartPole-v1.
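The core of the algorithm in NumPy, assuming the skeleton exposes a discretized environment env with the classic Gym interface and num_states discrete states (all constants are illustrative):

```python
import numpy as np

num_states, num_actions = 500, 2    # illustrative sizes
epsilon, gamma, episodes = 0.1, 1.0, 10000

returns_sum = np.zeros((num_states, num_actions))
returns_count = np.zeros((num_states, num_actions))

for _ in range(episodes):
    # Sample one episode with an epsilon-greedy policy derived from the
    # current exact averages.
    state, done, trajectory = env.reset(), False, []
    while not done:
        if np.random.rand() < epsilon:
            action = np.random.randint(num_actions)
        else:
            averages = returns_sum[state] / np.maximum(returns_count[state], 1)
            action = int(np.argmax(averages))
        next_state, reward, done, _ = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
    # Update the exact average return of every visited state-action pair.
    g = 0.0
    for state, action, reward in reversed(trajectory):
        g = reward + gamma * g
        returns_sum[state, action] += g
        returns_count[state, action] += 1
```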
q_learning (2 points, due Jan 02 15:39)
Implement the Q-learning algorithm. Start with the provided skeleton. You should be able to reach an average reward of 9.7 on the environment used by the skeleton.
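Compared to Monte Carlo, Q-learning bootstraps from the next state instead of waiting for the full episode return; a sketch under the same assumptions as above:

```python
import numpy as np

num_states, num_actions = 500, 2        # illustrative sizes
alpha, epsilon, gamma, episodes = 0.1, 0.1, 1.0, 10000

Q = np.zeros((num_states, num_actions))
for _ in range(episodes):
    state, done = env.reset(), False
    while not done:
        if np.random.rand() < epsilon:
            action = np.random.randint(num_actions)
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done, _ = env.step(action)
        # Off-policy TD target: bootstrap from the greedy next action.
        target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
```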
q_network (2 points, due Jan 02 15:39)
Implement the Q-learning algorithm, approximating the Q-value using a simple linear network. Start with the provided skeleton. You should be able to reach an average reward of 9.7 on the same environment as in q_learning.
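With a linear approximator, the table is replaced by one weight vector per action, and the semi-gradient of the squared TD error with respect to those weights is just the observation vector; a sketch (env again assumed, now with continuous observations):

```python
import numpy as np

obs_dim, num_actions = 4, 2             # illustrative sizes
alpha, epsilon, gamma = 0.01, 0.1, 1.0

W = np.zeros((num_actions, obs_dim))    # Q(s, a) = W[a] @ s + b[a]
b = np.zeros(num_actions)

for _ in range(1000):
    state, done = env.reset(), False
    while not done:
        q = W @ state + b
        if np.random.rand() < epsilon:
            action = np.random.randint(num_actions)
        else:
            action = int(np.argmax(q))
        next_state, reward, done, _ = env.step(action)
        target = reward + (0.0 if done else gamma * np.max(W @ next_state + b))
        td_error = target - q[action]
        W[action] += alpha * td_error * state    # semi-gradient update
        b[action] += alpha * td_error
        state = next_state
```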
reinforce (2 points, due Jan 09 15:39)
Implement the REINFORCE algorithm, representing the policy using a neural network with a hidden layer. Start with the provided skeleton. You should be able to reach an average reward of 475 on CartPole-v1.
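A NumPy sketch of the whole algorithm with a single tanh hidden layer (CartPole-sized constants, env assumed as before):

```python
import numpy as np

obs_dim, hidden_dim, num_actions = 4, 32, 2
alpha, gamma, episodes = 0.01, 1.0, 1000
rng = np.random.RandomState(42)
W1 = rng.randn(hidden_dim, obs_dim) * 0.1
b1 = np.zeros(hidden_dim)
W2 = rng.randn(num_actions, hidden_dim) * 0.1
b2 = np.zeros(num_actions)

def policy(state):
    # Softmax policy over actions, computed from one tanh hidden layer.
    hidden = np.tanh(W1 @ state + b1)
    logits = W2 @ hidden + b2
    probs = np.exp(logits - logits.max())
    return probs / probs.sum(), hidden

for _ in range(episodes):
    state, done, steps = env.reset(), False, []
    while not done:
        probs, hidden = policy(state)
        action = rng.choice(num_actions, p=probs)
        next_state, reward, done, _ = env.step(action)
        steps.append((state, hidden, probs, action, reward))
        state = next_state
    # Compute Monte Carlo returns and take a gradient step that increases
    # log pi(action | state) in proportion to the return.
    g = 0.0
    for state, hidden, probs, action, reward in reversed(steps):
        g = reward + gamma * g
        dlogits = probs.copy()
        dlogits[action] -= 1.0                   # -d log pi / d logits
        dhidden = (W2.T @ dlogits) * (1 - hidden ** 2)
        W2 -= alpha * g * np.outer(dlogits, hidden)
        b2 -= alpha * g * dlogits
        W1 -= alpha * g * np.outer(dhidden, state)
        b1 -= alpha * g * dhidden
```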
reinforce_with_baseline (2 points, due Jan 09 15:39)
Implement the REINFORCE algorithm with a value function as a baseline, representing both the policy and the value function using (independent) neural networks with a hidden layer. Start with the provided skeleton. You should be able to reach an average reward of 490 on CartPole-v1.
To observe the effect of the baseline, try comparing your solution to the basic reinforce solution.
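The baseline itself can be sketched independently of the policy: a second network (here just linear, for brevity) trained with MSE towards the Monte Carlo return, whose prediction is subtracted from the return in the policy update above.

```python
import numpy as np

obs_dim, beta = 4, 0.01                 # illustrative sizes
v_w, v_b = np.zeros(obs_dim), 0.0

def value(state):
    return v_w @ state + v_b

def update_value(state, g):
    # One MSE gradient step pulling value(state) towards the return g.
    global v_w, v_b
    error = value(state) - g
    v_w = v_w - beta * error * state
    v_b = v_b - beta * error

# In the REINFORCE update, use the advantage instead of the raw return:
#   advantage = g - value(state)
# and call update_value(state, g) for every visited state.
```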
reinforce_with_baseline_pixels (3 points, due Jan 09 15:39)
Note that this task is experimental and may not be easily solvable! Modify the solution of the reinforce_with_baseline task to work from pixel inputs. You will get the points if you can show any improvement at all, reaching for example an average reward of 50.
Note that according to papers, it could take hours for the network to converge. Also note that you probably have to use some kind of epsilon-greedy policy (otherwise the policy network usually converges too fast to a wrong solution; in some papers [for example in Asynchronous Methods for Deep Reinforcement Learning] an entropy regularization term is used instead).
Mean 1000-episode rewards of submitted solutions:
a3c (3 points, due Jan 09 15:39)
Note that this task is experimental and may not be easily solvable! Try implementing the Asynchronous Advantage Actor-Critic algorithm from the Asynchronous Methods for Deep Reinforcement Learning paper. You can start with the provided skeleton. You will get the points if you can show a minor improvement, reaching an average reward of at least 100.
Note that the network frequently diverges – in addition to gradient clipping (present in the skeleton), you could use exponential learning rate decay, or some entropy regularization term (see the paper).
Mean 1000-episode rewards of submitted solutions:
vae (3 points, due Feb 19 23:59)
Implement a simple Variational Autoencoder which generates MNIST digits. Start with the provided skeleton.
Note that the skeleton automatically generates several random images every 1000 training batches and stores them in the log dir (i.e., they are not accessible in TensorBoard). The generated images are random in the upper part and interpolate from left to right (and if dim(z) is 2, also from top to bottom) in the lower part of the generated summary.
Bonus: If you would like to experiment with a more complicated dataset, you can use CIFAR-10 Cars, which consists of the car images from the CIFAR-10 dataset, cropped and desaturated, and stored in MNIST format – therefore, in order to use it, just unpack it and pass it to the script.
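For reference, here are the two ingredients the VAE loss consists of, written out in NumPy for clarity (the encoder and decoder networks themselves come from the skeleton):

```python
import numpy as np

def sample_z(mu, log_var):
    # Reparametrization trick: sample z = mu + sigma * noise, so sampling
    # stays differentiable with respect to mu and log_var.
    return mu + np.exp(0.5 * log_var) * np.random.randn(*mu.shape)

def vae_loss(x, reconstruction, mu, log_var, eps=1e-8):
    # Bernoulli reconstruction term: per-pixel binary cross-entropy.
    rec = -np.sum(x * np.log(reconstruction + eps)
                  + (1 - x) * np.log(1 - reconstruction + eps))
    # Closed-form KL divergence of the diagonal Gaussian q(z|x) from N(0, I).
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1 - log_var)
    return rec + kl
```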
gan (3 points, due Feb 19 23:59)
Implement a simple Generative Adversarial Network which generates MNIST digits. Start with the provided skeleton.
Note that the skeleton automatically generates several random images every 1000 training batches and stores them in the log dir (i.e., they are not accessible in TensorBoard). The generated images are random in the upper part and interpolate from left to right (and if dim(z) is 2, also from top to bottom) in the lower part of the generated summary.
If you would like to experiment with a more complicated dataset, you can use CIFAR-10 Cars, which consists of the car images from the CIFAR-10 dataset, cropped and desaturated, and stored in MNIST format – therefore, in order to use it, just unpack it and pass it to the script.
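Again for reference, the two adversarial objectives in NumPy; d_real and d_fake stand for the discriminator's output probabilities on real and generated images:

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-8):
    # The discriminator maximizes log D(x) + log(1 - D(G(z))): it should
    # output 1 on real images and 0 on generated ones.
    return -np.mean(np.log(d_real + eps) + np.log(1 - d_fake + eps))

def generator_loss(d_fake, eps=1e-8):
    # Non-saturating generator objective: maximize log D(G(z)) instead of
    # minimizing log(1 - D(G(z))), which gives better gradients early on.
    return -np.mean(np.log(d_fake + eps))
```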