Faculty of Mathematics and Physics

In recent years, **deep neural networks** have been used to solve complex
machine-learning problems. They have achieved significant **state-of-the-art**
results in many areas.

The goal of the course is to introduce deep neural networks, from the basics to
the latest advances. The course will focus both on **theory** as well as on
**practical aspects** (students will implement and train several deep neural
networks capable of achieving state-of-the-art results, for example in named
entity recognition, dependency parsing, machine translation, image labeling or
in playing video games). **No previous knowledge** of artificial neural networks is
required, but basic understanding of machine learning is advisable.

SIS code: NPFL114

Semester: summer

E-credits: 7

Examination: 3/2 C+Ex

Guarantor: Milan Straka

**lectures**: the Czech lecture is held on Monday 14:50 in S9, the English lecture on Monday 12:20 in S9; the first lecture is on **Mar 04**

**practicals**: there are three parallel practicals, on Monday 17:20 in S9, on Tuesday 9:00 in SU1, and on Tuesday 12:20 in SU1; the first practicals are on **Mar 04/05**

1. Introduction to Deep Learning Slides PDF Slides 2018 Video numpy_entropy mnist_layers_activations

2. Training Neural Networks Slides PDF Slides 2018 Video mnist_training gym_cartpole

3. Training Neural Networks II Slides PDF Slides 2018 Video mnist_regularization mnist_ensemble uppercase

4. Convolutional Neural Networks Slides PDF Slides 2018 Video mnist_cnn cifar_competition

5. Convolutional Neural Networks II Slides PDF Slides 2018 Video mnist_multiple fashion_masks

6. Convolutional Neural Networks III, Recurrent Neural Networks Slides PDF Slides 2018 Video I 2018 Video II caltech42_competition sequence_classification

7. Recurrent Neural Networks II Slides PDF Slides 2018 Video I 2018 Video II 2018 Video III 2018 Video IV tagger_we tagger_cle_rnn tagger_cle_cnn speech_recognition

To pass the practicals, you need to obtain at least **80** points, which are
awarded for home assignments. Note that up to **40** points above 80 will be
transferred to the exam.

To pass the exam, you need to obtain at least 60, 75 or 90 out of 100 points for the written exam (plus up to 40 points from the practicals) to obtain grades 3, 2 or 1, respectively.

The lecture content follows, including references to study materials. The main study material is the Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (referred to as DLB).

References to study materials cover **all theory required** at the exam,
and sometimes even more – the references in *italics* cover topics
**not required** for the exam.

Mar 04 Slides PDF Slides 2018 Video numpy_entropy mnist_layers_activations

- Random variables, probability distributions, expectation, variance, Bernoulli distribution, Categorical distribution [Sections 3.2, 3.3, 3.8, 3.9.1 and 3.9.2 of DLB]
- Self-information, entropy, cross-entropy, KL-divergence [Section 3.13 of DLB]
- Gaussian distribution [Section 3.9.3 of DLB]
- *Machine Learning Basics [Sections 5.1-5.1.3 of DLB]*
- *History of Deep Learning [Section 1.2 of DLB]*
- *Linear regression [Section 5.1.4 of DLB]*
- *Brief description of Logistic Regression, Maximum Entropy models and SVM [Sections 5.7.1 and 5.7.2 of DLB]*
- *Challenges Motivating Deep Learning [Section 5.11 of DLB]*
- Neural network basics (this topic is treated in detail within the lecture NAIL002)
- Neural networks as graphs [Chapter 6 before Section 6.1 of DLB]
- Output activation functions [Section 6.2.2 of DLB, excluding Section 6.2.2.4]
- Hidden activation functions [Section 6.3 of DLB, excluding Section 6.3.3]
- Basic network architectures [Section 6.4 of DLB, excluding Section 6.4.2]

Mar 11 Slides PDF Slides 2018 Video mnist_training gym_cartpole

- Capacity, overfitting, underfitting, regularization [Section 5.2 of DLB]
- Hyperparameters and validation sets [Section 5.3 of DLB]
- Maximum Likelihood Estimation [Section 5.5 of DLB]
- Neural network training (this topic is treated in detail within the lecture NAIL002)
- Gradient Descent and Stochastic Gradient Descent [Sections 4.3 and 5.9 of DLB]
- Backpropagation algorithm [Sections 6.5 to 6.5.3 of DLB, especially Algorithms 6.2 and 6.3; *note that Algorithms 6.5 and 6.6 are used in practice*]
- SGD algorithm [Section 8.3.1 and Algorithm 8.1 of DLB]
- SGD with Momentum algorithm [Section 8.3.2 and Algorithm 8.2 of DLB]
- SGD with Nesterov Momentum algorithm [Section 8.3.3 and Algorithm 8.3 of DLB]
- Optimization algorithms with adaptive gradients
- AdaGrad algorithm [Section 8.5.1 and Algorithm 8.4 of DLB]
- RMSProp algorithm [Section 8.5.2 and Algorithm 8.5 of DLB]
- Adam algorithm [Section 8.5.3 and Algorithm 8.7 of DLB]

Mar 18 Slides PDF Slides 2018 Video mnist_regularization mnist_ensemble uppercase

- *Training a neural network with a single hidden layer*
- Softmax with NLL (negative log likelihood) as a loss function [Section 6.2.2.3 of DLB, notably equation (6.30); plus slides 10-12]
- Regularization [Chapter 7 until Section 7.1 of DLB]
- Early stopping [Section 7.8 of DLB, without the *How early stopping acts as a regularizer* part]
- L2 and L1 regularization [Sections 7.1 and 5.6.1 of DLB; plus slides 17-18]
- Dataset augmentation [Section 7.4 of DLB]
- Ensembling [Section 7.11 of DLB]
- Dropout [Section 7.12 of DLB]
- Label smoothing [Section 7.5.1 of DLB]

- Saturating non-linearities [Section 6.3.2 and second half of Section 6.2.2.2 of DLB]
- Parameter initialization strategies [Section 8.4 of DLB]

Mar 25 Slides PDF Slides 2018 Video mnist_cnn cifar_competition

- Gradient clipping [Section 10.11.1 of DLB]
- Introduction to convolutional networks [Chapter 9 and Sections 9.1-9.3 of DLB]
- Convolution as operation on 4D tensors [Section 9.5 of DLB, notably Equations (9.7) and (9.8)]
- Max pooling and average pooling [Section 9.3 of DLB]
- Stride and Padding schemes [Section 9.5 of DLB]
- AlexNet [Alex Krizhevsky et al.: ImageNet Classification with Deep Convolutional Neural Networks]
- VGG [Karen Simonyan and Andrew Zisserman: Very Deep Convolutional Networks for Large-Scale Image Recognition]
- GoogLeNet (aka Inception) [Christian Szegedy et al.: Going Deeper with Convolutions]
- Batch normalization [Section 8.7.1 of DLB, *optionally the paper Sergey Ioffe and Christian Szegedy: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift*]
- *Inception v2 and v3 [Rethinking the Inception Architecture for Computer Vision]*
- ResNet [Kaiming He et al.: Deep Residual Learning for Image Recognition]

Apr 01 Slides PDF Slides 2018 Video mnist_multiple fashion_masks

- ResNet [Kaiming He et al.: Deep Residual Learning for Image Recognition]
- *WideNet [Wide Residual Network]*
- *ResNeXt [Aggregated Residual Transformations for Deep Neural Networks]*
- Object detection using Fast R-CNN [Ross Girshick: **Fast R-CNN**]
- Proposing RoIs using Faster R-CNN [Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun: **Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks**]
- Image segmentation [Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick: **Mask R-CNN**]
- Layer Normalization [Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton: **Layer Normalization**]
- Group Normalization [Yuxin Wu, Kaiming He: **Group Normalization**]

Apr 08 Slides PDF Slides 2018 Video I 2018 Video II caltech42_competition sequence_classification

Apr 15 Slides PDF Slides 2018 Video I 2018 Video II 2018 Video III 2018 Video IV tagger_we tagger_cle_rnn tagger_cle_cnn speech_recognition

The tasks are evaluated automatically using the ReCodEx Code Examiner. The evaluation is performed using Python 3.6, TensorFlow 2.0.0a0, NumPy 1.16.1 and OpenAI Gym 0.9.5.

You can install all required packages either to user packages using
`pip3 install --user tensorflow==2.0.0a0 gym==0.9.5`, or create a virtual
environment using `python3 -m venv VENV_DIR` and then install the packages
inside it by running `VENV_DIR/bin/pip3 install tensorflow==2.0.0a0 gym==0.9.5`.
If you have a GPU, you can install GPU-enabled TensorFlow by using
`tensorflow-gpu` instead of `tensorflow`.

Working in teams of size 2 (or at most 3) is encouraged. All members of the team
must submit in ReCodEx individually, but can have exactly the same
sources/models/results. **However, each such solution must explicitly list all
members of the team to allow plagiarism detection using
this template.**

Deadline: Mar 17, 23:59 3 points

The goal of this exercise is to familiarize yourself with Python, NumPy and the ReCodEx submission system. Start with the numpy_entropy.py template.

Load a file `numpy_entropy_data.txt`, whose lines consist of data points of our
dataset, and load `numpy_entropy_model.txt`, which describes a model probability
distribution, with each line being a tab-separated pair of *(data point,
probability)*. Example files are in labs/01.

Then compute the following quantities using NumPy, and print them each on
a separate line rounded to two decimal places (or `inf` for positive infinity,
which happens when an element of the data distribution has zero probability
under the model distribution):

- entropy *H(data distribution)*
- cross-entropy *H(data distribution, model distribution)*
- KL-divergence *D*_{KL}(data distribution, model distribution)

Use natural logarithms to compute the entropies and the divergence.

Deadline: Mar 17, 23:59 3 points

**The templates changed on Mar 11 because of the upgrade to TF 2.0.0a0,
be sure to use the updated ones when submitting!**

In order to familiarize with TensorFlow and TensorBoard, start by playing with
example_keras_tensorboard.py.
Run it, and when it finishes, run TensorBoard using `tensorboard --logdir logs`.
Then open http://localhost:6006 in a browser and explore the active tabs.

Your goal is to modify the mnist_layers_activations.py template and implement the following:

- A number of hidden layers (including zero) can be specified on the command line
  using the parameter `layers`.
- The activation function of these hidden layers can also be specified as a command
  line parameter `activation`, with supported values of `none`, `relu`, `tanh`
  and `sigmoid`.
- Print the final accuracy on the test set.
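The computation such a model performs can be sketched in plain NumPy (a hypothetical forward pass for illustration, not the template's actual TensorFlow code):

```python
import numpy as np

# Hidden activations matching the supported `activation` values.
ACTIVATIONS = {
    "none": lambda x: x,
    "relu": lambda x: np.maximum(x, 0),
    "tanh": np.tanh,
    "sigmoid": lambda x: 1 / (1 + np.exp(-x)),
}

def forward(inputs, layers, activation="relu"):
    """Forward pass of an MLP given as a list of (W, b) pairs; the chosen
    activation is applied after every hidden layer, while the final layer
    uses softmax to produce class probabilities."""
    act = ACTIVATIONS[activation]
    hidden = inputs
    for W, b in layers[:-1]:
        hidden = act(hidden @ W + b)
    W, b = layers[-1]
    logits = hidden @ W + b
    # Numerically stable softmax.
    exps = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exps / exps.sum(axis=-1, keepdims=True)
```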

In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard:

- `0` layers, activation `none`
- `1` layer, activation `none`, `relu`, `tanh`, `sigmoid`
- `10` layers, activation `sigmoid`, `relu`

Deadline: Mar 24, 23:59 4 points

This exercise should teach you using different optimizers, learning rates, and learning rate decays. Your goal is to modify the mnist_training.py template and implement the following:

- Use the specified optimizer (either `SGD` or `Adam`).
- Optionally use momentum for the `SGD` optimizer.
- Use the specified learning rate for the optimizer.
- Optionally use a given learning rate schedule. The schedule can be either
  `exponential` or `polynomial` (with degree 1, so inverse time decay).
  Additionally, the final learning rate is given and the decay should gradually
  decrease the learning rate to reach the final learning rate just after the
  training.
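The two decays can be sketched as pure functions of training progress (a hypothetical helper; it only assumes, as stated above, that the final learning rate is reached exactly at the end of training):

```python
def decayed_learning_rate(schedule, initial, final, step, total_steps):
    """Learning rate after `step` out of `total_steps` training steps,
    decaying from `initial` so that `final` is reached at the very end."""
    t = step / total_steps  # training progress in [0, 1]
    if schedule == "exponential":
        # Geometric interpolation: initial * rate**t with rate = final/initial.
        return initial * (final / initial) ** t
    if schedule == "polynomial":
        # Inverse time decay, with the rate chosen to hit `final` at t == 1.
        return initial / (1 + (initial / final - 1) * t)
    raise ValueError(f"Unknown schedule {schedule}")
```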

In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard:

- `SGD` optimizer, `learning_rate` 0.01;
- `SGD` optimizer, `learning_rate` 0.01, `momentum` 0.9;
- `SGD` optimizer, `learning_rate` 0.1;
- `Adam` optimizer, `learning_rate` 0.001;
- `Adam` optimizer, `learning_rate` 0.01;
- `Adam` optimizer, `exponential` decay, `learning_rate` 0.01 and `learning_rate_final` 0.001;
- `Adam` optimizer, `polynomial` decay, `learning_rate` 0.01 and `learning_rate_final` 0.0001.

Deadline: Mar 24, 23:59 4 points

Solve the CartPole-v1 environment from the OpenAI Gym, utilizing only the provided supervised training data. The data is available in the gym_cartpole-data.txt file, each line containing one observation (four space-separated floats) and a corresponding action (the last space-separated integer). Start with the gym_cartpole.py template.

The solution to this task should be a *model* which passes evaluation on random
inputs. This evaluation is performed by running the
gym_cartpole_evaluate.py script,
which loads a model and then evaluates it on 100 random episodes (optionally
rendering if the `--render` option is provided). In order to pass, you must achieve
an average reward of at least 475 on 100 episodes. Your model should have either
one or two outputs (i.e., using either a sigmoid or softmax output function).

*The size of the training data is very small and you should consider
it when designing the model.*
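Parsing the data file can be sketched as follows (a hypothetical helper, only assuming the line format described above):

```python
import io
import numpy as np

def load_cartpole_data(source):
    """Parse gym_cartpole-data.txt: each line holds four space-separated
    floats (the observation) followed by one integer (the action)."""
    data = np.loadtxt(source)
    return data[:, :4].astype(np.float32), data[:, 4].astype(np.int32)

# Usage with an in-memory example of the format:
observations, actions = load_cartpole_data(io.StringIO(
    "0.01 -0.02 0.03 0.04 1\n-0.05 0.06 -0.07 0.08 0\n"))
```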

When submitting your model to ReCodEx, submit:

- one file with the model itself (with the `h5` suffix),
- the source code (or multiple sources) used to train the model (with the `py`
  suffix), possibly indicating teams.

Deadline: Mar 31, 23:59 6 points

You will learn how to implement three regularization methods in this assignment. Start with the mnist_regularization.py template and implement the following:

- Allow using dropout with rate `args.dropout`. Add a dropout layer after the
  first `Flatten` and also after all `Dense` hidden layers (but not after the
  output layer).
- Allow using L2 regularization with weight `args.l2`. Use
  `tf.keras.regularizers.L1L2` as a regularizer for all kernels and biases of
  all `Dense` layers (including the last one).
- Allow using label smoothing with weight `args.label_smoothing`. Instead of
  `SparseCategoricalCrossentropy`, you will need to use
  `CategoricalCrossentropy`, which offers a `label_smoothing` argument.
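To make the label smoothing option concrete, here is the transformation it applies to the training targets (a NumPy illustration of the standard formulation; the helper name is ours):

```python
import numpy as np

def smooth_labels(labels, num_classes, label_smoothing):
    """One-hot targets with label smoothing s over K classes: the gold
    class keeps probability 1 - s + s/K, every class receives s/K."""
    one_hot = np.eye(num_classes)[labels]
    return one_hot * (1 - label_smoothing) + label_smoothing / num_classes

print(smooth_labels([2], 4, 0.1))  # → [[0.025 0.025 0.925 0.025]]
```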

In ReCodEx, there will be three tests (one for each regularization method) and you will get 2 points for passing each one.

In addition to submitting the task in ReCodEx, also run the following variations and observe the results in TensorBoard (notably training, development and test set accuracy and loss):

- dropout rate `0`, `0.3`, `0.5`, `0.6`, `0.8`;
- L2 regularization `0`, `0.001`, `0.0001`, `0.00001`;
- label smoothing `0`, `0.1`, `0.3`, `0.5`.

Deadline: Mar 31, 23:59 2 points

Your goal in this assignment is to implement model ensembling.
The mnist_ensemble.py
template trains `args.models` individual models, and your goal is to perform
an ensemble of the first model, first two models, first three models, …, all
models, and evaluate their accuracy on the **development set**.

In addition to submitting the task in ReCodEx, run the script with
`args.models=7` and look at the results in the `mnist_ensemble.out` file.
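The ensembling itself amounts to averaging the predicted distributions; a NumPy sketch (the array shapes and function name are this illustration's assumptions):

```python
import numpy as np

def ensemble_accuracies(model_probs, gold):
    """model_probs has shape [models, examples, classes]; return the accuracy
    of ensembling the first 1, 2, ..., all models by averaging predictions."""
    accuracies = []
    for k in range(1, len(model_probs) + 1):
        averaged = np.mean(model_probs[:k], axis=0)
        accuracies.append(float(np.mean(np.argmax(averaged, axis=-1) == gold)))
    return accuracies
```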

Deadline: Mar 31, 23:59 4-9 points

This assignment introduces the first NLP task. Your goal is to implement a model which is given Czech lowercased text and tries to uppercase the appropriate letters. To load the dataset, use the uppercase_data.py module, which loads (and if required also downloads) the data. While the training and development sets are in the correct case, the test set is lowercased.

This is an *open-data task*, where you submit only the uppercased test set
together with the training script (which will not be executed, it will be
only used to understand the approach you took, and to indicate teams).
Explicitly, submit **exactly one .txt file** and **at least one .py file**.

The task is also a *competition*. Everyone who submits a solution which achieves
at least *96.5%* accuracy will get 4 points; the remaining 5 points will be distributed
depending on the relative ordering of your solutions, i.e., the best solution will
get 9 points in total, the worst solution (but still with at least 96.5% accuracy) will
get 4 points in total. The accuracy is computed per-character and can be evaluated
by the uppercase_eval.py script.

You may want to start with the uppercase.py template, which uses uppercase_data.py to load the data, generates an alphabet of a given size containing the most frequent characters, and generates a sliding window view of the data. The template also comments on possibilities of character representation.

**Do not use RNNs or CNNs in this task (if you have doubts, contact me).**
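The sliding-window representation mentioned above can be sketched in NumPy (the window size and the use of index 0 for padding are assumptions of this illustration, not the template's exact conventions):

```python
import numpy as np

def sliding_windows(char_indices, window):
    """For each position return 2*window + 1 alphabet indices: `window`
    characters of left context, the character itself, and `window`
    characters of right context, padding the borders with index 0."""
    padded = np.pad(np.asarray(char_indices), window)
    return np.stack([padded[i:i + 2 * window + 1]
                     for i in range(len(char_indices))])

print(sliding_windows([5, 6, 7], 1))
```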

Deadline: Apr 07, 23:59 5 points

In this assignment, you will learn to construct basic convolutional
neural network layers. Start with the
mnist_cnn.py
template and assume the requested architecture is described by the `cnn`
argument, which contains comma-separated specifications of the following layers:

- `C-filters-kernel_size-stride-padding`: Add a convolutional layer with ReLU
  activation and the specified number of filters, kernel size, stride and padding.
  Example: `C-10-3-1-same`
- `CB-filters-kernel_size-stride-padding`: Same as
  `C-filters-kernel_size-stride-padding`, but use batch normalization.
  In detail, start with a convolutional layer **without bias and activation**,
  then add a batch normalization layer, and finally a ReLU activation.
  Example: `CB-10-3-1-same`
- `M-kernel_size-stride`: Add max pooling with the specified size and stride.
  Example: `M-3-2`
- `R-[layers]`: Add a residual connection. The `layers` contain a specification
  of at least one convolutional layer (but not a recursive residual connection
  `R`). The input to the specified layers is then added to their output.
  Example: `R-[C-16-3-1-same,C-16-3-1-same]`
- `F`: Flatten inputs. Must appear exactly once in the architecture.
- `D-hidden_layer_size`: Add a dense layer with ReLU activation and the
  specified size. Example: `D-100`

An example architecture might be `--cnn=CB-16-5-2-same,M-3-2,F,D-100`.
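Note that a plain `split(",")` would break the `R-[...]` block apart; one way to split the specification only on top-level commas (a sketch, not the template's required approach):

```python
def split_layers(cnn):
    """Split a --cnn specification on top-level commas only, so that the
    bracketed layers of an R-[...] residual block stay together."""
    layers, depth, start = [], 0, 0
    for i, ch in enumerate(cnn):
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "," and depth == 0:
            layers.append(cnn[start:i])
            start = i + 1
    layers.append(cnn[start:])
    return layers

print(split_layers("CB-16-5-2-same,M-3-2,R-[C-16-3-1-same,C-16-3-1-same],F,D-100"))
```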

After a successful ReCodEx submission, you can try obtaining the best accuracy
on MNIST and then advance to `cifar_competition`.

Deadline: Apr 07, 23:59 5-10 points

The goal of this assignment is to devise the best possible model for CIFAR-10. You can load the data using the cifar10.py module. Note that the test set is different from that of the official CIFAR-10.

This is an *open-data task*, where you submit only the test set labels
together with the training script (which will not be executed, it will be
only used to understand the approach you took, and to indicate teams).
Explicitly, submit **exactly one .txt file** and **at least one .py file**.

The task is also a *competition*. Everyone who submits a solution which achieves
at least *60%* test set accuracy will get 5 points; the remaining 5 points will be distributed
depending on the relative ordering of your solutions. Note that my solutions usually
need to achieve at least ~73% on the development set to score 60% on the test set.

You may want to start with the cifar_competition.py template.

Deadline: Apr 14, 23:59 4 points

In this assignment you will implement a model with multiple inputs, multiple outputs, manual batch preparation, and manual evaluation. Start with the mnist_multiple.py template and:

- The goal is to create a model which, given two input MNIST images, predicts whether the digit on the first one is larger than on the second one.
- The model has three outputs:
  - direct prediction of the required value,
  - label prediction for the first image,
  - label prediction for the second image.
- In addition to the direct prediction, you can predict labels for both images
  and compare them -- an *indirect prediction*.
- You need to implement:
  - the model, using multiple inputs, outputs, losses, and metrics;
  - generation of two-image batches using regular MNIST batches;
  - computation of direct and indirect prediction accuracy.
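Generating the two-image batches from regular MNIST batches can be sketched as follows (the names and the halving strategy are this illustration's assumptions):

```python
import numpy as np

def make_pairs(images, labels):
    """Split a regular batch into halves to form two-image examples; the
    direct target says whether the first digit is larger than the second."""
    half = len(images) // 2
    first_images, second_images = images[:half], images[half:2 * half]
    first_labels, second_labels = labels[:half], labels[half:2 * half]
    direct_target = (first_labels > second_labels).astype(np.int32)
    return (first_images, second_images), (direct_target, first_labels, second_labels)
```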

Deadline: Apr 14, 23:59 5-11 points

This assignment is a simple image segmentation task. The data for this task is
available through the fashion_masks_data.py module.
The inputs consist of 28×28 greyscale images of ten classes of clothing,
while the outputs consist of the correct class *and* a pixel bit mask.

This is an *open-data task*, where you submit only the test set annotations
together with the training script (which will not be executed, it will be
only used to understand the approach you took, and to indicate teams).
Explicitly, submit **exactly one .txt file** and **at least one .py file**;
any `.zip` files you submit will be extracted first.

Performance is evaluated using mean IoU, where the IoU for a single example is defined as the intersection of the gold and system masks divided by their union (assuming the predicted label is correct; if not, the IoU is 0). The evaluation (using for example the development data) can be performed by the fashion_masks_eval.py script.
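The per-example metric can be sketched in NumPy (the empty-union convention below is this sketch's choice, not necessarily the evaluator's):

```python
import numpy as np

def example_iou(gold_label, gold_mask, pred_label, pred_mask):
    """IoU of two binary masks; by the task definition it is 0 whenever
    the predicted label is wrong."""
    if gold_label != pred_label:
        return 0.0
    gold_mask = np.asarray(gold_mask, dtype=bool)
    pred_mask = np.asarray(pred_mask, dtype=bool)
    union = np.sum(gold_mask | pred_mask)
    if union == 0:
        return 1.0  # convention chosen for this sketch: two empty masks match
    return float(np.sum(gold_mask & pred_mask) / union)
```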

The task is a *competition* and the points will be awarded depending on your
test set score. If your test set score surpasses 75%, you will be
awarded 5 points; the remaining 6 points will be distributed depending on the relative
ordering of your solutions. *Note that quite a straightforward model surpasses
80% on the development set after an hour of computation (and 90% after several
hours), so reaching 75% is not that difficult.*

You may want to start with the fashion_masks.py template, which loads the data and generates test set annotations in the required format (one example per line containing space separated label and mask, the mask stored as zeros and ones, rows first).

Deadline: ~~Apr 21, 23:59~~ Apr 22, 23:59
5-10 points

The goal of this assignment is to try a transfer learning approach to train image recognition on a small dataset with 42 classes. You can load the data using the caltech42.py module. In addition to the training data, you should use a MobileNet v2 pretrained network (details in caltech42_competition.py).

This is an *open-data task*, where you submit only the test set labels
together with the training script (which will not be executed, it will be
only used to understand the approach you took, and to indicate teams).
Explicitly, submit **exactly one .txt file** and **at least one .py file**.

The task is also a *competition*. Everyone who submits a solution which achieves
at least *94%* test set accuracy will get 5 points; the remaining 5 points will be distributed
depending on the relative ordering of your solutions.

You may want to start with the caltech42_competition.py template.

Deadline: ~~Apr 21, 23:59~~ Apr 22, 23:59
6 points

The goal of this assignment is to introduce recurrent neural networks, manual TensorBoard log collection, and manual gradient clipping. For recurrent neural networks, the assignment demonstrates convergence speed and illustrates the exploding gradient issue. The network should process sequences of 50 small integers and compute parity for each prefix of the sequence. The inputs are either 0/1, or vectors with one-hot representations of small integers.

Your goal is to modify the sequence_classification.py template and implement the following:

- Use the specified RNN cell type (`SimpleRNN`, `GRU` and `LSTM`) and dimensionality.
- Process the sequence using the required RNN.
- Use an additional hidden layer on the RNN outputs if requested.
- Implement gradient clipping if requested.
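Gradient clipping by global norm (the behaviour of TensorFlow's `tf.clip_by_global_norm`) can be sketched in NumPy:

```python
import numpy as np

def clip_by_global_norm(gradients, clip_norm):
    """If the joint L2 norm of all gradient tensors exceeds clip_norm,
    rescale every tensor by the same factor so the joint norm equals it."""
    global_norm = np.sqrt(sum(np.sum(np.square(g)) for g in gradients))
    if global_norm <= clip_norm:
        return gradients, global_norm
    scale = clip_norm / global_norm
    return [g * scale for g in gradients], global_norm
```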

In addition to submitting the task in ReCodEx, please also run the following variations and observe the results in TensorBoard. Concentrate on the way the RNNs converge, their convergence speed, exploding gradient issues and how gradient clipping helps:

- `--rnn_cell=SimpleRNN --sequence_dim=1`, `--rnn_cell=GRU --sequence_dim=1`, `--rnn_cell=LSTM --sequence_dim=1`
- the same as above but with `--sequence_dim=2`
- the same as above but with `--sequence_dim=10`
- `--rnn_cell=LSTM --hidden_layer=50 --rnn_cell_dim=30 --sequence_dim=30` and the same with `--clip_gradient=1`
- the same as above but with `--rnn_cell=SimpleRNN`
- the same as above but with `--rnn_cell=GRU --hidden_layer=150`

Deadline: Apr 28, 23:59 3 points

In this assignment you will create a simple part-of-speech tagger. For training and evaluation, we will use a Czech dataset containing tokenized sentences, each word annotated by a gold lemma and part-of-speech tag. The morpho_dataset.py module (down)loads the dataset and can generate batches.

Your goal is to modify the tagger_we.py template and implement the following:

- Use the specified RNN cell type (`GRU` and `LSTM`) and dimensionality.
- Create word embeddings for the training vocabulary.
- Process the sentences using a bidirectional RNN.
- Predict part-of-speech tags. Note that you need to properly handle sentences of different lengths in one batch using masking.
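The masking needed for variable-length sentences can be sketched for the accuracy computation (the array shapes are this illustration's assumptions):

```python
import numpy as np

def masked_accuracy(gold_tags, predicted_tags, sentence_lens):
    """Tag accuracy over a padded batch of shape [sentences, max_len]:
    only the first sentence_lens[i] positions of row i are real words."""
    max_len = gold_tags.shape[1]
    mask = np.arange(max_len)[None, :] < np.asarray(sentence_lens)[:, None]
    correct = (gold_tags == predicted_tags) & mask
    return float(np.sum(correct) / np.sum(mask))
```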

After submitting the task to ReCodEx, continue with the `tagger_cle_rnn` assignment.

Deadline: Apr 28, 23:59 3 points

This task is a continuation of the `tagger_we` assignment. Using the
tagger_cle_rnn.py
template, implement the following features in addition to `tagger_we`:

- Create character embeddings for the training alphabet.
- Process unique words with a bidirectional character-level RNN, concatenating the results.
- Properly distribute the CLEs of unique words into the batches of sentences.
- Generate overall embeddings by concatenating word-level embeddings and CLEs.

Once submitted to ReCodEx, continue with the `tagger_cle_cnn`
assignment. Additionally, you should experiment with the effect of CLEs compared
to plain `tagger_we`, and the influence of their dimensionality.
Note that `tagger_we` has by default word embeddings twice the
size of the word embeddings in `tagger_cle_rnn`.

Deadline: Apr 28, 23:59 2 points

This task is a continuation of the `tagger_cle_rnn` assignment. Using the
tagger_cle_cnn.py
template, implement the following features compared to `tagger_cle_rnn`:

- Instead of using RNNs to generate character-level embeddings, process embedded unique words with 1D convolutional filters with kernel sizes of 2 to some given maximum. To obtain a fixed-size representation, perform global max-pooling over the whole word.
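A NumPy sketch of this character-level CNN embedding (the shapes and the filter representation are assumptions of the illustration):

```python
import numpy as np

def char_cnn_embedding(char_embeddings, filters):
    """Character-level word embedding: apply 1D convolutional filters of
    several kernel sizes to the embedded characters of one word, then
    global-max-pool each filter over all positions and concatenate.

    char_embeddings: [word_length, embedding_dim]
    filters: list of arrays of shape [kernel_size, embedding_dim, channels]
    """
    outputs = []
    for f in filters:
        k = f.shape[0]
        # Valid 1D convolution: one output vector per window position.
        positions = [
            np.tensordot(char_embeddings[i:i + k], f, axes=([0, 1], [0, 1]))
            for i in range(char_embeddings.shape[0] - k + 1)
        ]
        outputs.append(np.max(positions, axis=0))  # global max-pooling
    return np.concatenate(outputs)
```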

Deadline: Apr 28, 23:59 7-12 points

This assignment is a competition task in the speech recognition area. Specifically, your goal is to predict a sequence of letters given a spoken utterance. We will be using the TIMIT corpus, with input sound waves passed through the usual preprocessing – computing Mel-frequency cepstral coefficients (MFCCs). You can repeat exactly this preprocessing on a given audio using the timit_mfcc_preprocess.py script.

Because the data is not publicly available, you can download it only through ReCodEx. Please do not distribute it. To load the dataset, use the timit_mfcc.py module.

This is an *open-data task*, where you submit only the test set labels
together with the training script (which will not be executed, it will be
only used to understand the approach you took, and to indicate teams).
Explicitly, submit **exactly one .txt file** and **at least one .py file**.

The task is also a *competition*. The evaluation is performed by computing the edit
distance to the gold letter sequence, normalized by its length (i.e., exactly as
`tf.edit_distance`). Everyone who submits a solution which achieves
at most *50%* test set edit distance will get 7 points; the remaining 5 points will be distributed
depending on the relative ordering of your solutions. An evaluation (using for example the development data)
can be performed by the
speech_recognition_eval.py script.

You should start with the speech_recognition.py template.

- To perform speech recognition, you should use CTC loss for training and a CTC beam search decoder for prediction. Both the CTC loss and the CTC decoder employ sparse tensors -- therefore, start by studying them.
- The basic architecture:
  - converts target letters into a sparse representation,
  - uses a bidirectional RNN and an output linear layer without activation,
  - computes CTC loss (`tf.nn.ctc_loss`),
  - if required, performs decoding by a CTC decoder (`tf.nn.ctc_beam_search_decoder`) and possibly evaluates the results using normalized edit distance (`tf.edit_distance`).
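The evaluation metric itself can be sketched in pure Python (a standard Levenshtein dynamic program; with its default `normalize=True`, `tf.edit_distance` divides by the length of the truth sequence):

```python
def edit_distance(gold, system):
    """Levenshtein distance via the classic single-row dynamic program."""
    d = list(range(len(system) + 1))  # d[j] = distance(gold[:i], system[:j])
    for i in range(1, len(gold) + 1):
        prev_diag, d[0] = d[0], i
        for j in range(1, len(system) + 1):
            prev_diag, d[j] = d[j], min(
                d[j] + 1,                                    # delete from gold
                d[j - 1] + 1,                                # insert into gold
                prev_diag + (gold[i - 1] != system[j - 1]),  # substitute/match
            )
    return d[-1]

def normalized_edit_distance(gold, system):
    """Edit distance normalized by the gold sequence length."""
    return edit_distance(gold, system) / len(gold)
```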