Faculty of Mathematics and Physics

The seminar focuses on a deeper understanding of selected machine learning methods for students who already have basic knowledge of machine learning and probability models. The first half of the semester is devoted to methods of unsupervised learning using Bayesian inference (Dirichlet process, Expectation Maximization, Gibbs sampling) and to implementing these methods on selected tasks. Two further lectures will be devoted to inspecting deep neural networks. Further topics are selected according to the students' interests.

SIS code: NPFL097

Semester: winter

E-credits: 3

Examination: 0/2 C

Guarantor: David Mareček

The seminar is held on Thursday, 9:00 - 10:30 in S1 (fourth floor)

Students are expected to be familiar with basic probabilistic and ML concepts, roughly to the extent of:

In the second half of the course, you should be familiar with the basics of deep-learning methods. I recommend attending:

- NPFL114 - Deep Learning

- There are three programming assignments during the term. For each one, you can obtain 10 points. When submitted after the deadline, you can obtain at most half of the points.
- You can obtain 10 points for an individual 30-minute presentation on a selected machine learning method or task.
- You pass the course if you obtain at least 20 points.
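
The point rules above can be written as a short check. This is only a sketch: the names `total_points` and `passed` are hypothetical, and reading "at most half of the points" as a cap of 5 points on a late assignment is an assumption.

```python
PASS_THRESHOLD = 20   # points needed to pass (from the rules above)
MAX_PER_ITEM = 10     # each assignment and the presentation is worth 10 points


def total_points(assignments, presentation=0):
    """Sum course points.

    assignments: list of (points, late) pairs, one per submitted assignment.
    Assumption: a late submission is capped at half of the maximum (5 points).
    """
    total = min(presentation, MAX_PER_ITEM)
    for points, late in assignments:
        cap = MAX_PER_ITEM / 2 if late else MAX_PER_ITEM
        total += min(points, cap)
    return total


def passed(assignments, presentation=0):
    """True if the collected points reach the pass threshold."""
    return total_points(assignments, presentation) >= PASS_THRESHOLD
```

For example, `passed([(10, False), (8, True), (6, False)])` counts 10 + 5 + 6 = 21 points under this reading.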

1. Introduction (Slides, Warm-Up test)

2. Beta-Bernoulli probabilistic model (Beta-Bernoulli, Beta distribution)

3. Dirichlet-Categorical probabilistic model (Dirichlet-Categorical)

4. Modeling document collections, Categorical Mixture models, Expectation-Maximization (Document collections, Categorical Mixture Models)

5. Gibbs Sampling, Latent Dirichlet allocation (Gibbs Sampling, Latent Dirichlet allocation, Latent Dirichlet Allocation)

6. Text segmentation (Bayesian Inference with Tears, Unsupervised text segmentation)

7. Finding motifs (Finding Motifs in DNA, Finding motifs in DNA)

Oct 4

- Course overview Slides
- revision of the basics of probability and machine learning theory Warm-Up test

Oct 11

- answering questions from the warm-up test
- slides for Beta-Bernoulli models by Carl Edward Rasmussen from University of Cambridge
- For how to compute the expected value of the Beta distribution, see: Beta distribution
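
As a small illustration of the Beta-Bernoulli update and the expected value of the Beta distribution, the sketch below uses the standard results that a Beta(a, b) prior combined with Bernoulli observations gives a Beta(a + heads, b + tails) posterior, and that the mean of Beta(a, b) is a / (a + b). The function names and the toy data are hypothetical, and integer pseudo-counts are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def beta_mean(a, b):
    """Expected value of Beta(a, b): a / (a + b). Assumes integer a, b."""
    return Fraction(a, a + b)


def posterior_params(a, b, observations):
    """Update a Beta(a, b) prior with a list of 0/1 Bernoulli observations."""
    heads = sum(observations)
    tails = len(observations) - heads
    return a + heads, b + tails


# Uniform prior Beta(1, 1) and four toy coin flips (three heads, one tail).
a_post, b_post = posterior_params(1, 1, [1, 1, 0, 1])
print(a_post, b_post, beta_mean(a_post, b_post))  # 4 2 2/3
```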

Oct 18

- slides for Dirichlet-Categorical by Carl Edward Rasmussen from University of Cambridge
- A web application showing the Beta-Bernoulli distribution and many others can be found at RandomServices.com.
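
For the Dirichlet-Categorical model, the posterior predictive probability of category k after N observations is (alpha_k + n_k) / (sum of all alpha + N), where n_k is the number of times k was observed. The sketch below illustrates this; the function name, the prior values, and the toy data are hypothetical, and every observed category is assumed to have an entry in `alpha`.

```python
from collections import Counter


def posterior_predictive(alpha, observations):
    """P(next = k | data) = (alpha_k + n_k) / (sum(alpha) + N)."""
    counts = Counter(observations)
    total = sum(alpha.values()) + len(observations)
    return {k: (a + counts[k]) / total for k, a in alpha.items()}


# Symmetric Dirichlet prior over three categories (hypothetical values).
alpha = {"a": 1.0, "b": 1.0, "c": 1.0}
probs = posterior_predictive(alpha, ["a", "a", "b", "a"])
# Counts are a: 3, b: 1, c: 0, so the probabilities are 4/7, 2/7 and 1/7.
```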

Oct 25

- slides for Document collections and Categorical Mixture Models by Carl Edward Rasmussen from University of Cambridge
- Expectation Maximization is also very well described in Chapter 9 of Bishop's book Pattern Recognition and Machine Learning
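
To make the E-step and M-step concrete, here is a minimal pure-Python sketch of Expectation-Maximization for a mixture of categorical distributions over documents. It is not taken from the slides: the function name, the representation of documents as lists of word ids, the random initialisation, and the small additive smoothing (used to keep all probabilities non-zero) are assumptions.

```python
import math
import random


def em_categorical_mixture(docs, vocab_size, n_components,
                           n_iter=50, seed=0, smooth=1e-2):
    """Fit mixing weights pi and word distributions theta with EM."""
    rng = random.Random(seed)
    pi = [1.0 / n_components] * n_components
    theta = []
    for _ in range(n_components):
        row = [rng.random() + 0.1 for _ in range(vocab_size)]
        s = sum(row)
        theta.append([x / s for x in row])

    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of component k for document d is
        # proportional to pi_k * prod_w theta_k[w], computed in log space.
        resp = []
        for doc in docs:
            log_p = [math.log(pi[k]) + sum(math.log(theta[k][w]) for w in doc)
                     for k in range(n_components)]
            m = max(log_p)
            p = [math.exp(x - m) for x in log_p]
            z = sum(p)
            resp.append([x / z for x in p])

        # M-step: re-estimate pi and theta from the soft counts
        # (with additive smoothing, an assumption of this sketch).
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            pi[k] = (nk + smooth) / (len(docs) + n_components * smooth)
            counts = [smooth] * vocab_size
            for r, doc in zip(resp, docs):
                for w in doc:
                    counts[w] += r[k]
            s = sum(counts)
            theta[k] = [c / s for c in counts]
    return pi, theta, resp


# Toy corpus: word ids 0-1 dominate two documents, ids 2-3 the other two.
toy_docs = [[0, 0, 1], [0, 1, 0], [2, 3, 3], [3, 2, 2]]
pi, theta, resp = em_categorical_mixture(toy_docs, vocab_size=4, n_components=2)
```

Working in log space in the E-step avoids numerical underflow when documents are long.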

Nov 8

- slides for Gibbs Sampling and Latent Dirichlet allocation by Carl Edward Rasmussen from University of Cambridge
- see Chapter 11 of Bishop's book Pattern Recognition and Machine Learning
- Assignment 1: Latent Dirichlet Allocation
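
The following is a minimal sketch of a collapsed Gibbs sampler for Latent Dirichlet Allocation, in which each token's topic is resampled from its conditional distribution given all other assignments. It is an illustration only and not the assignment's reference solution: the function name, the hyperparameter values, and the representation of documents as lists of word ids are assumptions.

```python
import random


def lda_gibbs(docs, vocab_size, n_topics,
              alpha=0.1, beta=0.1, n_iter=100, seed=0):
    """Collapsed Gibbs sampling for LDA; returns assignments and counts."""
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = t | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return z, doc_topic, topic_word


toy_docs = [[0, 1, 0, 1], [2, 3, 2], [0, 1, 2, 3]]
z, doc_topic, topic_word = lda_gibbs(toy_docs, vocab_size=4, n_topics=2, n_iter=20)
```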

Nov 29

- Unsupervised segmentation of texts in languages that do not use spaces between words
- tutorial Bayesian Inference with Tears by Kevin Knight (2009)
- Assignment 2: Unsupervised text segmentation

Dec 6 Finding Motifs in DNA

- Assignment 3: Finding motifs in DNA

Dec 13

- Deep neural networks in NLP as a BlackBox
- What is being learned in their hidden states?
- How does the attention mechanism work?
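
As one possible illustration of the attention question above, the sketch below implements scaled dot-product attention, a common variant; the lecture may cover a different formulation. Each query is compared with every key, the scores are turned into weights with a softmax, and the output is the weighted average of the values. The function names and the toy vectors are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    Returns the output vectors and the attention weights per query.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of the query with each key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# One query that matches the first key more closely than the second.
outputs, weights = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[1.0, 2.0], [3.0, 4.0]],
)
```

In this toy example the first key receives the larger weight, so the output lies closer to the first value vector.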

Dec 20

- Sentence structures in NLP
- Utilization of sentence structures in downstream NLP tasks
- Are some kinds of sentence structures learned latently inside deep networks?

Deadline: Nov 14 23:59, 10 points, Duration: 2h

- Instructions and questions: lda-assignment.pdf
- Data: lda-data.zip

Deadline: Dec 5 23:59, 10 points, Duration: 2h

Deadline: Dec 19 23:59, 10 points, Duration: 2h