SIS code: 

NPFL099 – Dialogue Systems

This is the new course for the '21/22 Fall semester. You can find slides from last year on the archived old page.


This course presents advanced problems and the current state of the art in the field of dialogue systems, voice assistants, and conversational systems (chatbots). After a brief introduction to the topic, the course will focus mainly on the application of machine learning – especially deep learning/neural networks – in the individual components of the traditional dialogue system architecture as well as in end-to-end approaches (joining multiple components together).

This course is a follow-up to the course NPFL123 Dialogue Systems, but can be taken independently – important basics will be repeated. All required deep learning concepts will be explained, but only briefly, so some machine learning background is recommended.



The course will be taught in English, but we're happy to explain in Czech, too.

Time & Place

In-person lectures and labs take place in room S10 (Malá Strana, 1st floor).

  • Lectures: Mon 15:40
  • Labs: Mon 17:20 (every other week, starts on 11 October)

In addition, we plan to stream both lectures and lab instruction over Zoom and make the recordings available on YouTube (under a private link, on request). We'll do our best to provide a useful experience, but note that audio quality might not be ideal.

  • Zoom meeting ID: 953 7826 3918
  • Password is the SIS code of this course (capitalized)

If you can't access Zoom, email us or text us on Slack.

There's also a Slack workspace you can use to discuss assignments and get news about the course. Please contact us by email if you want to join and haven't got an invite yet.

Passing the course

To pass this course, you will need to take an exam and do lab homeworks, which will amount to training an end-to-end neural dialogue system and writing a report on it. See more details here.

Topics covered

  • Brief introduction to dialogue systems
    • dialogue systems applications
    • basic components of dialogue systems
    • knowledge representation in dialogue systems
    • data and evaluation
  • Language understanding (NLU)
    • semantic representation of utterances
    • statistical methods for NLU
  • Dialogue management
    • dialogue representation as a (Partially Observable) Markov Decision Process
    • dialogue state tracking
    • action selection
    • reinforcement learning
    • user simulation
    • deep reinforcement learning (using neural networks)
  • Response generation (NLG)
    • introduction to NLG, basic methods (templates)
    • generation using neural networks
  • End-to-end dialogue systems (one network to handle everything)
    • sequence-to-sequence systems
    • memory/attention-based systems
    • pretrained language models
  • Open-domain systems (chatbots)
    • generative systems (sequence-to-sequence, hierarchical models)
    • information retrieval
    • ensemble systems
  • Multimodal systems
    • component-based and end-to-end systems
    • image classification
    • visual dialogue


PDFs with lecture slides will appear here shortly before each lecture (more details on each lecture are on a separate tab). You can also check out last year's lecture slides.

1. Introduction Slides

2. Data & Evaluation Slides Dataset Exploration

3. Neural Nets Basics Slides

4. Training Neural Nets Slides DailyDialogue Loader


A list of recommended literature is on a separate tab.


1. Introduction

 4 October Slides

  • What are dialogue systems
  • Common usage areas
  • Task-oriented vs. non-task oriented systems
  • Closed domain, multi-domain, open domain
  • System vs. user initiative in dialogue
  • Standard dialogue systems components
  • Research forefront
  • TTS audio examples: formant, concatenative, HMMs, neural

2. Data & Evaluation

 11 October Slides Dataset Exploration

  • Types of dialogue datasets
  • Dataset splits
  • Intrinsic vs. extrinsic evaluation
  • Objective vs. subjective evaluation
  • Evaluation metrics for dialogue components

3. Neural Nets Basics

 18 October Slides

  • machine learning as function approximation
  • machine learning problems (classification, regression, structured prediction)
  • input features (embeddings)
  • network shapes -- feed forward, CNNs, RNNs, attention, Transformer

4. Training Neural Nets

 25 October Slides DailyDialogue Loader

  • supervised training: gradient descent, backpropagation, cost
  • learning rate, schedules & optimizers
  • self-supervised: autoencoding, language modelling
  • unsupervised: GANs, clustering
  • reinforcement learning (short intro)

Homework Assignments

There will be 7 homework assignments, typically for a maximum of 10 points (the last one will be for 20 points). Please see details on grading and deadlines on a separate tab.

Assignments should be submitted via Git – see instructions on a separate tab.

All deadlines are 23:59:59 CET/CEST.


1. Dataset Exploration

2. DailyDialogue Loader

1. Dataset Exploration

 Presented: 11 October, Deadline: 27 October


Your task is to select one dialogue dataset, download and explore it.

  1. Find out (and mention in your report):
  • What kind of data it is (domain, modality)
  • How it was collected
  • What kind of dialogue system or dialogue system component it's designed for
  • What kind of annotation is present (if any at all) and how it was obtained (human/automatic)
  • What format it is stored in
  • What the license is

Here you can use the dataset description/paper that came out with the data. The papers are linked from the dataset webpages or from here. If you can't find a paper, ask us and we'll try to help.

  2. Measure (and enter into your report):
  • Total data length (dialogues, turns, sentences, words)
  • Mean/std dev dialogue lengths (dialogues, turns, sentences, words)
  • Vocabulary size
  • User/system entropy (or just overall entropy, if no user/system distinction can be made)

Here you should use your own programming skills.
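
As a starting point, the measurements above could be sketched roughly as follows. This is only an illustration under several assumptions that are not part of the assignment: dialogues are already loaded as lists of turn strings, words are obtained by naive whitespace splitting, and "entropy" is read as the Shannon entropy of the unigram word distribution. The function name `corpus_stats` and the toy data are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def corpus_stats(dialogues):
    """Compute basic statistics for a list of dialogues.

    Each dialogue is a list of turns; each turn is a plain string.
    Words are split on whitespace and lowercased (an assumption of this sketch).
    """
    turns_per_dialogue = [len(d) for d in dialogues]
    words_per_dialogue = [sum(len(t.split()) for t in d) for d in dialogues]
    counts = Counter(w.lower() for d in dialogues for t in d for w in t.split())
    total_words = sum(counts.values())
    # Shannon entropy (in bits) of the unigram word distribution
    entropy = -sum(
        (c / total_words) * math.log2(c / total_words) for c in counts.values()
    )
    return {
        "dialogues": len(dialogues),
        "turns": sum(turns_per_dialogue),
        "words": total_words,
        "mean_turns": mean(turns_per_dialogue),
        "std_turns": pstdev(turns_per_dialogue),
        "mean_words": mean(words_per_dialogue),
        "std_words": pstdev(words_per_dialogue),
        "vocab_size": len(counts),
        "entropy_bits": entropy,
    }


# Toy data standing in for a real dataset
toy = [["hello there", "hi"], ["hello", "hello", "bye now"]]
stats = corpus_stats(toy)
print(stats)
```

For separate user and system entropy, the same function could be called on the user turns and the system turns individually, provided the dataset marks the speaker.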

  3. Have a closer look at the data and try to form an impression -- does the data look natural? How difficult do you think this dataset will be to learn from? How usable will it be in an actual system? Do you think there's some kind of problem or limitation with the data? Write a short paragraph about this in your report.

Things to submit:

  • A short summary detailing all of your findings (basic info, measurement, impressions) in Markdown as hw01/
  • Your code for analyzing the data as hw01/ or hw01/analysis.ipynb.

See the submission instructions here (create an MFF GitLab repo and a new merge request).

Datasets to select from

Primarily one of these:

Further links

Dataset surveys (broader, but shallower than what we're aiming at):

2. DailyDialogue Loader

 Presented: 25 October, Deadline: 10 November


In this assignment, you will work with the DailyDialog dataset. Your task is to create a component that will load the dataset and process the data so it is prepared for model training. This will consist of 2 Python classes -- one to hold the data, and one to prepare training batches.

In later assignments, you will train the GPT-2 model using data provided by this component. Note that this means that other assignments depend on this one.

Data background

DailyDialog is a chit-chat dialogue dataset labeled with intents and emotions. You can find more details in the paper describing the dataset.

Each DailyDialog entry consists of:

  • dialog: a list of string features.
  • act: a list of classification labels, e.g., question, commissive, ...
  • emotion: a list of classification labels, e.g., anger, happiness, ...

The lists are of the same length and the order matters (it's the order of the turns in the dialogue, i.e. the 5th entry in the act list corresponds to the 5th entry in the dialog list).

The data contains train, validation and test splits.

Dataset class

Implement a Python class for the dataset (feel free to use PyTorch Dataset, Huggingface datasets, or similar concepts in TensorFlow) that has the following properties:

  • It is able to load the data and process it into individual training examples (context + response + emotion + intent).

  • Each example should be a dictionary of the following structure:

        'context': list[str],    # list of utterances preceding the current utterance
        'utterance': str,        # the string with the current response
        'emotion': int,          # emotion index
        'intent': int            # intent index
    • Note that we will work with a model that takes dialogue context as an input.
    • Therefore, each dialogue of n turns will yield n examples, each with progressively longer context (starting from an empty context, up to n-1 turns of context).
  • It distinguishes between data splits, i.e. it can be parameterized by split type (train, val, test).

  • It can truncate long contexts to k last utterances, where k is a parameter of the class.
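
The example-building logic described above could look roughly like the sketch below. It assumes one dialogue is already available as three parallel lists (`dialog`, `acts`, `emotions`); the function name `build_examples` and the parameter name `max_context` (standing in for k) are hypothetical, and a full solution would wrap this in a dataset class with split handling.

```python
def build_examples(dialog, acts, emotions, max_context=None):
    """Turn one dialogue of n turns into n training examples.

    `max_context` plays the role of k: only the last k context utterances
    are kept (None keeps the whole context).
    """
    examples = []
    for i, utterance in enumerate(dialog):
        context = dialog[:i]  # empty for the first turn, n-1 turns for the last
        if max_context is not None:
            # Slice from an explicit start index so that k = 0 gives an empty list
            context = context[max(0, len(context) - max_context):]
        examples.append({
            "context": list(context),
            "utterance": utterance,
            "emotion": emotions[i],
            "intent": acts[i],
        })
    return examples


# Toy dialogue with made-up label indices
demo = build_examples(["Hi!", "Hello.", "How are you?"], [1, 1, 2], [0, 0, 4], max_context=1)
for example in demo:
    print(example)
```

The explicit start index avoids a common pitfall: `context[-k:]` returns the whole list when k is 0.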

Data Loader

Implement a data loader Python class (feel free to use PyTorch DataLoader or similar concepts in TensorFlow) that has the following properties:

  • It is able to yield a batch of examples (a simple list with examples of your Dataset) of a batch size given in the constructor.
  • It will always yield conversations with similar lengths (numbers of tokens) inside the same batch.
  • It will not use the original data order, but will shuffle the examples randomly.
  • Yielding a batch repeatedly will never include the same example twice before all the examples have been processed.
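
One possible way to satisfy these properties together is length bucketing: shuffle all examples, cut the shuffled order into larger chunks, sort each chunk by length, slice it into batches, and shuffle the batches. The sketch below is only one such approach, not a reference solution. It measures length in whitespace tokens rather than subword tokens, and the names `example_length`, `bucketed_batches`, and `bucket_factor` are hypothetical.

```python
import random


def example_length(example):
    """Approximate length: whitespace tokens in the context plus the utterance."""
    context_len = sum(len(u.split()) for u in example["context"])
    return context_len + len(example["utterance"].split())


def bucketed_batches(examples, batch_size, bucket_factor=10, seed=0):
    """Yield shuffled batches whose examples have similar lengths.

    Every example is yielded exactly once per call (one pass over the data).
    """
    rng = random.Random(seed)
    order = list(range(len(examples)))
    rng.shuffle(order)  # do not keep the original data order
    chunk = batch_size * bucket_factor
    batches = []
    for start in range(0, len(order), chunk):
        # Sort only within a chunk, so batches stay random but length-homogeneous
        bucket = sorted(order[start:start + chunk],
                        key=lambda i: example_length(examples[i]))
        for b in range(0, len(bucket), batch_size):
            batches.append([examples[i] for i in bucket[b:b + batch_size]])
    rng.shuffle(batches)  # avoid always going from short to long within a chunk
    yield from batches


# Toy examples with varying utterance lengths
toy_examples = [
    {"context": [], "utterance": f"u{i}" + " x" * (i % 7), "emotion": 0, "intent": 0}
    for i in range(23)
]
toy_batches = list(bucketed_batches(toy_examples, batch_size=4, bucket_factor=2, seed=1))
print([len(b) for b in toy_batches])
```

A larger `bucket_factor` gives more uniform lengths inside each batch at the cost of less randomness between epochs; the last batch of each chunk may be smaller than `batch_size`.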

Data loader batch encoding

Machine learning models usually work with numbers and matrices. That is why we also need to convert strings in our batches to integer ids (e.g., tokenize).

Therefore, inside your data loader class, implement a collate function that has the following properties:

  • It is able to work with batches of your Data Loader (lists of examples).
  • It uses GPT2Tokenizer for the tokenization itself.
  • It converts the batches to a single dictionary (output) of the following structure:
    output = {
      'context': list[list[int]], # tokenized context (list of subword ids from all preceding dialogue turns, separated by the GPT-2 special `<|endoftext|>` token) for all batch examples
      'utterance': list[list[int]], # tokenized utterances (list of subword ids from the current dialogue turn) for all batch examples
      'emotion': list[int], # emotion ids for all batch examples
      'intent': list[int]   # intent ids for all batch examples
    }
    where {k : output[k][i] for k in output} should correspond to the i-th example of the original input batch
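
A collate function with this output structure might be sketched as follows. To keep the sketch runnable without downloading a model, the tokenizer is passed in as a plain `encode` callable together with the separator id; `toy_encode` is a made-up stand-in, and the comment at the end shows how the real `GPT2Tokenizer` could be plugged in. Appending the separator after every context turn (including the last one) is a choice made here, not something the assignment prescribes.

```python
def collate(batch, encode, eos_id):
    """Convert a list of examples into one dictionary of lists.

    `encode` maps a string to a list of token ids; `eos_id` is the id of
    the separator token placed after each context turn.
    """
    output = {"context": [], "utterance": [], "emotion": [], "intent": []}
    for example in batch:
        context_ids = []
        for turn in example["context"]:
            context_ids.extend(encode(turn))
            context_ids.append(eos_id)  # separator after each context turn
        output["context"].append(context_ids)
        output["utterance"].append(encode(example["utterance"]))
        output["emotion"].append(example["emotion"])
        output["intent"].append(example["intent"])
    return output


def toy_encode(text):
    """Stand-in tokenizer for demonstration: one fake id (word length) per word."""
    return [len(word) for word in text.split()]


toy_batch = [
    {"context": ["hi there", "yes"], "utterance": "ok", "emotion": 1, "intent": 2},
    {"context": [], "utterance": "good morning", "emotion": 0, "intent": 1},
]
encoded = collate(toy_batch, toy_encode, eos_id=-1)
print(encoded)

# With the transformers library installed, the real tokenizer could be used as:
#   from transformers import GPT2Tokenizer
#   tok = GPT2Tokenizer.from_pretrained("gpt2")
#   encoded = collate(toy_batch, tok.encode, tok.eos_token_id)
```

Because each list is filled in batch order, the i-th entry of every list belongs to the i-th example of the input batch.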

General implementation guidelines

You're free to use any library code that you find helpful, just make sure it installs with pip and add the appropriate requirements.txt file.

We will not restrict you to a certain machine learning framework for this course. However, we strongly recommend that you use Huggingface and PyTorch so you can access the pretrained models easily.

It is also OK to use TensorFlow, but we consider PyTorch the preferred framework. This means that some future examples might contain PyTorch-specific notes, and the reference implementations will be in PyTorch as well. Also, if you run into problems with TensorFlow, we might not be able to help you quickly.

Things to submit:

  • Your dataset and loader class implementations, both inside data/
  • A testing script -- either or hw02.ipynb (your choice), which will use your two classes, will load 3 batches from the training set, each of size 5, and print out both their string and token id representations. Make sure you fix your random seed at the start, so the results are repeatable!
  • A requirements.txt file listing all the required libraries.

Homework Submission Instructions

All homework assignments will be submitted using a Git repository on MFF GitLab.

Warning: This is not yet final, the instructions might change!

We provide an easy recipe to set up your repository below:

Creating the repository

  1. Log into your MFF GitLab account. Your username and password should be the same as in the CAS, see this.

  2. Create a new project (e.g. called NPFL099). Choose the Private visibility level.

     New project -> Create blank project
  3. Invite us (@duseo7af, @hudecekv) to your project so we can see it. Please give us "Reporter" access level.

     Members -> Invite Member
  4. Clone the newly created repository.

  5. Change into the cloned directory and run

git remote show origin

You should see these two lines:

* remote origin
  Fetch URL:
  Push  URL:

  6. You're all set!

Submitting the homework assignment

  1. Make sure you're on your master branch:
git checkout master
  2. Check out a new branch:
git checkout -b hw-XX
  3. Solve the assignment :)

  4. Add new files (if applicable) and commit your changes:

git add hwXX/
git commit -am "commit message"
  5. Push to your origin remote repository:
git push origin hw-XX
  6. Create a Merge request in the web interface. Make sure you create the merge request into the master branch in your own forked repository (not into the upstream).

     Merge requests -> New merge request
  7. Wait a bit till we check your solution, then enjoy your points :)!
  8. Once approved, merge your changes into your master branch – you might need them for further homeworks (but feel free to branch out from the previous homework in your next one if we're too slow with checking).

Exam Question Pool

The exam will have 10 questions (a pool of potential questions will be released by the end of the semester). Each question counts for 10 points.

See the Grading tab for details on grading.

Course Grading

To pass this course, you will need to:

  1. Take an exam (a written test covering important lecture content).
  2. Do lab homeworks (implementing an end-to-end dialogue system + other tasks).

Exam test

  • There will be a written exam test at the end of the semester.
  • There will be 10 questions, each worth a maximum of 10 points; we expect 2-3 sentences as an answer.
  • To pass the course, you need to get at least 50% of the total points from the test.
  • We plan to publish a list of possible questions beforehand.

In case the pandemic gets worse by the exam period, there will be a remote alternative for the exam (an essay with a discussion).

Homework assignments

  • There will be 7 homework assignments, introduced every other week.
  • You will submit the homework assignments into a private GitLab repository (to which you will give us access).
  • For each assignment, you will get a maximum of 10 points (the last one is for double points!).
  • All assignments will have a fixed deadline (typically 2-3 weeks).
  • If you submit the assignment after the deadline, you will get:
    • up to 50% of the maximum points if it is less than 2 weeks after the deadline;
    • 0 points if it is more than 2 weeks after the deadline.
  • Note that most assignments depend on each other! That means that if you miss a deadline, you still might need to do an assignment without points in order to score on later assignments.
  • Once we check the submitted assignments, you will see the points you got and the comments from us in:
  • To be allowed to take the exam (which is required to pass the course), you need to get at least 50% of the total points from the assignments.
  • If needed, there will be exam dates in the summer.


The final grade for the course will be a combination of your exam score and your homework assignment score, weighted 3:1 (i.e. the exam accounts for 75% of the grade, the assignments for 25%).


  • Grade 1: >=87% of the weighted combination
  • Grade 2: >=74% of the weighted combination
  • Grade 3: >=60% of the weighted combination
  • An overall score of less than 60% means you did not pass.

In any case, you need at least 50% of the points from the test and at least 50% of the points from the homeworks to pass. If you get less than 50% from either, you will not pass, even if you get more than 60% overall.
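
The grading rules can be illustrated with a small calculation. This sketch makes several assumptions: the exam and homework scores are first converted to fractions and then combined 3:1, the exam maximum is 100 points (10 questions of 10 points), the homework maximum is 80 points (six assignments of 10 points plus one of 20), and the minimum for each part is read as "at least 50%", as stated in the exam and homework sections. The function name `final_grade` is hypothetical, and `None` stands for "did not pass".

```python
def final_grade(exam_points, homework_points, exam_max=100, homework_max=80):
    """Return the grade (1, 2 or 3), or None if the course is not passed."""
    exam = exam_points / exam_max
    homework = homework_points / homework_max
    # Both parts must reach 50% on their own, regardless of the overall score
    if exam < 0.5 or homework < 0.5:
        return None
    combined = 0.75 * exam + 0.25 * homework  # exam : homework weighted 3 : 1
    if combined >= 0.87:
        return 1
    if combined >= 0.74:
        return 2
    if combined >= 0.60:
        return 3
    return None


print(final_grade(80, 60))   # combined = 0.75 * 0.80 + 0.25 * 0.75 = 0.7875
print(final_grade(100, 39))  # homework below 50% of 80 points
```

For instance, 80 exam points and 60 homework points give a combined score of 78.75%, which falls in the Grade 2 band, while a perfect exam with only 39 homework points does not pass because the homework part is below 50%.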

No cheating

  • Cheating is strictly prohibited and any student found cheating will be punished. The punishment can involve failing the whole course, or, in grave cases, being expelled from the faculty.
  • Discussing homework assignments with your classmates is OK. Sharing code is not OK (unless explicitly allowed); by default, you must complete the assignments yourself.
  • All students involved in cheating will be punished. E.g. if you share your assignment with a friend, both you and your friend will be punished.

Recommended Reading

You should be able to pass the course just by following the lectures, but here are some hints on further reading. There's nothing ideal on the topic as this is a very active research area, but some of these should give you a broader overview.

Recommended, though slightly outdated:

Recommended, but might be a bit too brief:

Further reading:

  • McTear et al.: The Conversational Interface: Talking to Smart Devices. Springer 2016.
    • good, detailed, but slightly outdated
  • Jokinen & McTear: Spoken dialogue systems. Morgan & Claypool 2010.
    • good but outdated, some systems very specific to particular research projects
  • Rieser & Lemon: Reinforcement learning for adaptive dialogue systems. Springer 2011.
    • advanced, slightly outdated, project-specific
  • Lemon & Pietquin: Data-Driven Methods for Adaptive Spoken Dialogue Systems. Springer 2012.
    • ditto
  • Skantze: Error Handling in Spoken Dialogue Systems. PhD Thesis 2007, Chap. 2.
    • good introduction to dialogue systems in general, albeit dated
  • current papers from the field (see links on lecture slides)