Deep Reinforcement Learning – Summer 2025/26

The objective of this course is to provide a comprehensive introduction to deep reinforcement learning, a powerful paradigm that combines reinforcement learning with deep neural networks. This approach has demonstrated super-human capabilities in diverse domains: playing complex games like Go and chess, optimizing real-world systems like datacenter cooling, improving chip design, automatically discovering superior algorithms and neural network architectures, and advancing robotics and large language models.

The course focuses both on the theory, spanning from fundamental concepts to recent advancements, and on practical implementations in Python and PyTorch (students implement and train agents controlling robots, mastering video games, and planning in complex board games). Basic programming and deep learning skills are expected (for example from the Deep Learning course).

Students work either individually or in small teams on weekly assignments, including competition tasks, where the goal is to obtain the highest performance in the class.

Optionally, you can obtain a micro-credential after passing the course.

About

SIS code: NPFL139
Semester: summer
E-credits: 8
Examination: 3/4 C+Ex
Guarantor: Milan Straka

Timespace Coordinates

These coordinates are still preliminary.

  • lecture: the lecture is held on Tuesday 9:00 in S5; first lecture is on Feb 17
  • practicals: the practicals take place on Thursday 14:00 in S5; first practicals are on Feb 19
  • consultations: entirely optional consultations take place on Wednesday 14:00 in S5; first consultations are on Feb 25

All lectures and practicals will be recorded and available on this website.

Lectures

1. Introduction to Reinforcement Learning Slides PDF Slides

License

Unless otherwise stated, teaching materials for this course are available under CC BY-SA 4.0.

A micro-credential (aka micro-certificate) is a digital certificate attesting that you have gained knowledge and skills in a specific area. It should be internationally recognized and verifiable using an online EU-wide verification system.

A micro-credential can be obtained by both university students and external participants.

External Participants

If you are not a university student, you can apply to the Reinforcement Learning micro-credential course here and then attend the course alongside the university students. Upon successfully passing the course, a micro-credential is issued.

The price of the course is 5 000 Kč. The lectures take place for 14 weeks from Feb 17 to May 22; the examination period runs until the end of September.

If you have applied, note that the organization of the course and the setup instructions will be described at the first lecture; you do not need to do anything else until that time.

University Students

If you have passed the course (in academic year 2025/26 or later) as a part of your study plan, you can obtain a micro-credential by paying only an administrative fee of 300 Kč; if you passed the course but it is not in your study plan, the administrative fee is 500 Kč. Detailed instructions on how to obtain the micro-credential will be sent to the course participants during the examination period.


The lecture content, including references to study materials.

The main study material is Reinforcement Learning: An Introduction, second edition, by Richard S. Sutton and Andrew G. Barto (referred to as RLB). It is available online and also in hard copy.

References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.

1. Introduction to Reinforcement Learning

 Feb 17 Slides PDF Slides

Introduction to Reinforcement Learning

Requirements

To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that all surplus points (both bonus and non-bonus) will be transferred to the exam. In total, assignments for at least 120 points (not including the bonus points) will be available, and if you solve all the assignments (any non-zero amount of points counts as solved), you automatically pass the exam with grade 1.

Environment

The tasks are evaluated automatically using the ReCodEx Code Examiner.

The evaluation is performed using Python 3.11, Gymnasium, and PyTorch. Instructions on how to install the exact versions of these packages will be added later.

Teamwork

Solving assignments in teams (of size at most 3) is encouraged, but everyone has to participate (it is forbidden not to work on an assignment and then submit a solution created by other team members). All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. To allow plagiarism detection, each such solution must explicitly list all members of the team using this template.

No Cheating

Cheating is strictly prohibited and any student found cheating will be punished. The punishment can involve failing the whole course, or, in grave cases, being expelled from the faculty. While discussing assignments with any classmate is fine, each team must complete the assignments themselves, without using code they did not write (unless explicitly allowed). Of course, inside a team you are allowed to share code and submit identical solutions. Note that all students involved in cheating will be punished, so if you share your source code with a friend, both you and your friend will be punished. That also means that you should never publish your solutions.

Submitting to ReCodEx

When submitting a competition solution to ReCodEx, you should submit a trained agent and a Python source capable of running it.

Furthermore, please also include the Python source and hyperparameters you used to train the submitted model. Note, however, that there must still be exactly one Python source containing a line starting with def main(.
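
The "exactly one source with def main(" rule can be checked locally before submitting. The following is only a minimal sketch: the helper name sources_with_main is hypothetical, and it assumes that all submitted sources sit in a single directory and that only lines beginning exactly with def main( (no leading whitespace) count. How ReCodEx itself locates the entry point is not described here.

```python
from pathlib import Path


def sources_with_main(directory: str) -> list[str]:
    """Return names of .py files in `directory` that contain a line
    starting with `def main(` (hypothetical helper, not part of ReCodEx)."""
    matches = []
    # Assumption: sources are not nested in subdirectories.
    for path in sorted(Path(directory).glob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        if any(line.startswith("def main(") for line in lines):
            matches.append(path.name)
    return matches


def check_submission(directory: str) -> bool:
    """True if exactly one Python source defines the entry point."""
    return len(sources_with_main(directory)) == 1
```

Calling check_submission(".") in the submission directory returns False if the training script and the agent script both start a line with def main(. In that case, one of the functions has to be renamed.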

Do not forget about the maximum allowed model size and time and memory limits.

Competition Evaluation

  • Before the deadline, ReCodEx prints the exact performance of your agent, but only if it is worse than the baseline.

    If you surpass the baseline, the assignment is marked as solved in ReCodEx and you immediately get regular points for the assignment. However, ReCodEx does not print the reached performance.

  • After the first deadline, the latest submission of every user surpassing the required baseline participates in a competition. Additional bonus points are then awarded according to the ordering of the performance of the participating submissions.

  • After the competition results announcement, ReCodEx starts to show the exact performance for all the already submitted solutions and also for the solutions submitted later.

Repeated Participation in Competitions

  • If a participant has already received non-zero points for a competition task in previous years, they are treated slightly differently. Namely, every team with one or more returning participants still gets competition points, but
    • the returning team results are not shown on the slides on the practicals;
    • the returning team results are shown in italics in ReCodEx;
    • the returning team results are not used to compute the thresholds for competition points.

What Is Allowed

  • Unless stated otherwise, you can use any algorithm to solve the competition task at hand, but the implementation must be created by you and you must understand it fully. You can of course take inspiration from any paper or existing implementation, but please reference it in that case.
  • PyTorch and JAX are available in ReCodEx (but there are no GPUs).

Requirements

To pass the exam, you need to obtain at least 60, 75, or 90 points out of a 100-point exam to receive grade 3, 2, or 1, respectively. The exam consists of questions worth 100 points in total, drawn from the list below (the questions are randomly generated, but in such a way that there is at least one question from every lecture except the last). In addition, you can get surplus points from the practicals and at most 10 points for community work (i.e., fixing slides or reporting issues) – but only the points you already have at the time of the exam count. You can take the exam without passing the practicals first.
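
The point arithmetic can be illustrated with a short sketch. The function names are hypothetical, and the code rests on two assumptions not stated explicitly in the rules: that "surplus" means all practicals points (bonus and non-bonus) above the 80-point threshold, and that no surplus exists until the practicals are passed.

```python
PRACTICALS_THRESHOLD = 80   # non-bonus points needed to pass the practicals
MAX_COMMUNITY_POINTS = 10   # cap on points for community work


def practicals_surplus(non_bonus: int, bonus: int) -> int:
    """Points carried over to the exam (assumed reading of 'surplus')."""
    if non_bonus < PRACTICALS_THRESHOLD:
        return 0  # assumption: nothing to transfer before passing
    return non_bonus + bonus - PRACTICALS_THRESHOLD


def exam_grade(exam_points: int, surplus: int = 0, community: int = 0):
    """Return grade 1, 2, or 3, or None if the 60-point minimum is not met."""
    total = exam_points + surplus + min(community, MAX_COMMUNITY_POINTS)
    if total >= 90:
        return 1
    if total >= 75:
        return 2
    if total >= 60:
        return 3
    return None
```

For instance, under these assumptions a student with 95 non-bonus and 5 bonus points carries 20 surplus points, so 70 exam points would be enough for grade 1.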

Exam Questions