Faculty of Mathematics and Physics

The seminar focuses on a deeper understanding of selected unsupervised machine learning methods and is intended for students who already have a basic knowledge of machine learning and probability models. The first half of the semester is devoted to methods of unsupervised learning using Bayesian inference (Dirichlet-Categorical models, Mixture of Categoricals, Mixture of Gaussians, Expectation-Maximization, Gibbs sampling) and to implementing these methods on selected tasks. The remaining lectures are devoted to clustering methods, component analysis, and inspecting deep neural networks.

SIS code: NPFL097

Semester: winter

E-credits: 3

Examination: 1/1 C

Guarantor: David Mareček

The lectures in Czech are given on Tuesdays, 12:20 - 13:50 in S1 (fourth floor)

The lectures in English are given on Wednesdays 12:20 - 13:50 (write me an e-mail if interested)

Students are expected to be familiar with basic probabilistic and ML concepts, roughly to the extent of:

For the second half of the course, you should be familiar with the basics of deep-learning methods. I recommend attending

- NPFL114 - Deep Learning

- There are three programming assignments during the term, each worth 10 points. If you submit after the deadline, you can obtain at most half of the points.
- You can obtain 10 points for an individual 30-minute presentation on a selected machine learning method or task.
- You pass the course if you obtain at least 20 points.

1. Introduction Slides Warm-Up test

2. Beta-Bernoulli probabilistic model Beta-Bernoulli Beta distribution

3. Dirichlet-Categorical probabilistic model Dirichlet-Categorical Document collections

4. Modeling document collections, Categorical Mixture models, Expectation-Maximization Categorical Mixture Models Gibbs Sampling Gibbs Sampling for Bayesian mixture Expectation Maximization Gibbs Sampling

5. Gibbs Sampling, Latent Dirichlet allocation Latent Dirichlet allocation Algorithms for LDA and Mixture of Categoricals Latent Dirichlet Allocation

6. Working on and discussing assignment 1

7. Text segmentation Chinese Restaurant Process Bayesian Inference with Tears Unsupervised text segmentation

8. Working on and discussing assignment 2

9. Mixture of Gaussians and other clustering methods K-Means and Gaussian Mixture Models

10. Working on and discussing assignment 3

11. Inspecting Neural Networks

12. Latent learning of POS tags, word alignment, and dependency structures

Feb 25

- Course overview Slides
- Revision of the basics of probability and machine learning theory Warm-Up test

Mar 3

- answering questions from the warm-up test
- slides for Beta-Bernoulli models by Carl Edward Rasmussen from University of Cambridge
- How to compute the expected value of the Beta distribution can be found here: Beta distribution
- A web application showing the Beta-Bernoulli distribution and many other distributions can be found at RandomServices.com.
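The conjugate update behind the Beta-Bernoulli model can be summarized in a few lines. The following is an illustrative sketch (function names and numbers are made up, not course code): a Beta(α, β) prior updated with h heads and t tails gives a Beta(α + h, β + t) posterior, whose mean is α/(α + β).

```python
# Beta-Bernoulli conjugate update: an illustrative sketch, not course code.
# Prior Beta(alpha, beta); after observing `heads` successes and `tails`
# failures, the posterior is Beta(alpha + heads, beta + tails).

def beta_bernoulli_posterior(alpha, beta, heads, tails):
    """Return the parameters of the posterior Beta distribution."""
    return alpha + heads, beta + tails

def beta_mean(alpha, beta):
    """Expected value of Beta(alpha, beta): alpha / (alpha + beta)."""
    return alpha / (alpha + beta)

# Uniform prior Beta(1, 1); observe 7 heads and 3 tails.
a, b = beta_bernoulli_posterior(1.0, 1.0, 7, 3)
print(a, b)             # 8.0 4.0
print(beta_mean(a, b))  # posterior mean 8/12 ≈ 0.667
```

The posterior mean interpolates between the prior mean and the empirical frequency, which is exactly the smoothing effect discussed in the slides.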

Mar 10

- slides for Dirichlet-Categorical and Document collections by Carl Edward Rasmussen from University of Cambridge
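The Dirichlet-Categorical model generalizes the Beta-Bernoulli case to K categories. A minimal sketch of its posterior predictive distribution (names and numbers are illustrative assumptions, not from the slides): with a Dirichlet(α) prior and observed counts n_k, the probability of category k for the next observation is (α_k + n_k) / (Σ_j α_j + N).

```python
# Dirichlet-Categorical posterior predictive: an illustrative sketch.
# With a Dirichlet(alphas) prior and per-category counts, the probability
# of the next observation being category k is
#   (alphas[k] + counts[k]) / (sum(alphas) + sum(counts)).

def posterior_predictive(alphas, counts):
    """Posterior predictive probabilities over K categories."""
    total = sum(alphas) + sum(counts)
    return [(a + n) / total for a, n in zip(alphas, counts)]

# Symmetric Dirichlet(1, 1, 1) prior, observed counts 3, 1, 0.
probs = posterior_predictive([1.0, 1.0, 1.0], [3, 1, 0])
print(probs)  # [4/7, 2/7, 1/7]
```

Note that the unseen category still gets nonzero probability 1/7, the usual additive-smoothing behavior of the Dirichlet prior.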

Mar 17

- slides for Categorical Mixture Models and Gibbs Sampling and Gibbs Sampling for Bayesian mixture and Expectation Maximization by Carl Edward Rasmussen from University of Cambridge
- Gibbs sampling from the bivariate normal distribution: Gibbs Sampling
- Expectation Maximization is also very well described in Chapter 9 of Bishop's book: Pattern Recognition and Machine Learning
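The bivariate-normal example linked above can be sketched in a few lines. This is a minimal, hypothetical implementation (all names are illustrative): for a standard bivariate normal with correlation ρ, each full conditional is x1 | x2 ~ N(ρ·x2, 1 − ρ²), and symmetrically for x2 | x1, so the Gibbs sampler just alternates the two draws.

```python
import random

# Gibbs sampling from a standard bivariate normal with correlation rho.
# A minimal sketch: alternate draws from the two full conditionals,
# x1 | x2 ~ N(rho * x2, 1 - rho^2) and x2 | x1 ~ N(rho * x1, 1 - rho^2).

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    rng = random.Random(seed)
    x1, x2 = 0.0, 0.0
    sd = (1.0 - rho * rho) ** 0.5  # std. dev. of each conditional
    samples = []
    for i in range(burn_in + n_samples):
        x1 = rng.gauss(rho * x2, sd)  # resample x1 given current x2
        x2 = rng.gauss(rho * x1, sd)  # resample x2 given the new x1
        if i >= burn_in:              # discard the burn-in phase
            samples.append((x1, x2))
    return samples

samples = gibbs_bivariate_normal(rho=0.8, n_samples=5000)
mean1 = sum(x1 for x1, _ in samples) / len(samples)
print(round(mean1, 2))  # should be close to the true mean 0
```

Because consecutive Gibbs samples are correlated, the effective sample size is smaller than 5000; the burn-in and the slow mixing for high ρ are exactly the issues discussed in the lecture.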

Mar 24

- slides for Latent Dirichlet allocation by Carl Edward Rasmussen from University of Cambridge
- slides for Algorithms for LDA and Mixture of Categoricals
- see also Chapter 11 of Bishop's book: Pattern Recognition and Machine Learning
- Assignment 1: Latent Dirichlet Allocation
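A collapsed Gibbs sampler for LDA can be written compactly. The following is a toy sketch under simplifying assumptions (tiny hand-made corpus, illustrative names; it is not the assignment solution): each word token carries a topic assignment z, and z is resampled from p(z = k) ∝ (n_dk + α) · (n_kw + β) / (n_k + V·β).

```python
import random
from collections import defaultdict

# Collapsed Gibbs sampling for LDA: a toy sketch, not the assignment
# solution. Each token's topic z is resampled from
#   p(z=k) ∝ (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).

def lda_gibbs(docs, K, alpha=0.1, beta=0.1, iters=200, seed=0):
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})      # vocabulary size
    n_dk = [[0] * K for _ in docs]                 # topic counts per doc
    n_kw = [defaultdict(int) for _ in range(K)]    # word counts per topic
    n_k = [0] * K                                  # total tokens per topic
    z = []                                         # topic of every token
    for d, doc in enumerate(docs):                 # random initialization
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                        # remove current assignment
                n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta)
                           / (n_k[j] + V * beta) for j in range(K)]
                k = rng.choices(range(K), weights)[0]
                z[d][i] = k                        # store the new topic
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    return n_dk, n_kw

docs = [["apple", "banana", "apple"], ["goal", "match", "goal"],
        ["banana", "apple", "fruit"], ["match", "goal", "score"]]
n_dk, n_kw = lda_gibbs(docs, K=2)
```

After sampling, `n_dk` gives per-document topic proportions and `n_kw` the per-topic word distributions (up to the Dirichlet smoothing terms).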

Mar 31

Apr 7

- Unsupervised segmentation of texts in languages that do not use spaces between words Chinese Restaurant Process
- tutorial Bayesian Inference with Tears by Kevin Knight (2009)
- Assignment 2: Unsupervised text segmentation
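The Chinese Restaurant Process itself takes only a few lines to simulate. A small illustrative sketch (names and parameters are made up): customer n + 1 sits at an existing table with probability proportional to the number of customers already there, or opens a new table with probability proportional to the concentration parameter α.

```python
import random

# Chinese Restaurant Process: an illustrative simulation sketch.
# Each new customer joins an existing table with probability proportional
# to its current size, or opens a new table with probability
# proportional to the concentration parameter alpha.

def crp(n_customers, alpha, seed=0):
    rng = random.Random(seed)
    tables = []  # number of customers seated at each table
    for _ in range(n_customers):
        weights = tables + [alpha]  # last entry = open a new table
        i = rng.choices(range(len(weights)), weights)[0]
        if i == len(tables):
            tables.append(1)        # new table with one customer
        else:
            tables[i] += 1          # join an existing table
    return tables

tables = crp(100, alpha=1.0)
print(sum(tables))  # 100: every customer is seated somewhere
```

The rich-get-richer dynamics visible here (a few large tables, many small ones) are what make the CRP a useful prior over word types in the segmentation task.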

Apr 14

Apr 21

- slides K-Means and Gaussian Mixture Models by David Rosenberg from New York University
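Plain K-Means (Lloyd's algorithm) alternates an assignment step and an update step. A minimal sketch on 1-D toy data (data and names are illustrative, not from the slides):

```python
import random

# K-Means (Lloyd's algorithm) on 1-D data: a minimal illustrative sketch.
# Alternates assigning each point to its nearest center and moving each
# center to the mean of its assigned points.

def kmeans(points, k, iters=50, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)           # random initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                      # assignment step
            j = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[j].append(p)
        for j in range(k):                    # update step: cluster mean
            if clusters[j]:
                centers[j] = sum(clusters[j]) / len(clusters[j])
    return sorted(centers)

points = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9]
print(kmeans(points, k=2))  # two centers, roughly 0.15 and 5.03
```

Replacing the hard assignment with posterior responsibilities and the means with weighted means turns this into the EM algorithm for a Gaussian Mixture Model, which is the connection the slides develop.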

Apr 28

May 5

- Deep neural networks in NLP as a black box
- What is learned in their hidden states?
- How does the attention mechanism work?

May 12

- Word embeddings vs. POS tags
- Word alignment vs. attention mechanism
- Dependency parsing vs. self-attention mechanism

May 19

Deadline: Dec 5, 23:59 (10 points)

- Instructions and questions: lda-assignment.pdf,
- Data: lda-data.zip

Deadline: Dec 20, 23:59 (10 points)

- You will get English texts from which the spaces between words have been removed. The task is to use Bayesian inference to bring the spaces back in a completely unsupervised way. The task is relevant e.g. for Chinese, Japanese, Thai, and other languages that do not separate words. English was chosen so that everyone can see how good their results are. In case you have not attended the lecture, you can find all the necessary information in Kevin Knight's tutorial Bayesian Inference with Tears. Try several hyperparameter combinations to obtain results that are as good as possible. Also try the simulated annealing method to keep the Gibbs sampler from converging prematurely. slides
- Data: eng-input.txt
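The annealing idea mentioned above can be sketched in isolation. This is a hypothetical illustration (function names, weights, and the schedule are assumptions, not part of the assignment): raising each candidate probability to the power 1/T flattens the distribution for T > 1, so an initially hot sampler explores more, and cooling T toward 1 gradually restores the true distribution.

```python
import random

# Tempered sampling step for an annealed Gibbs sampler: a hypothetical
# sketch. Raising weights to the power 1/T flattens the distribution for
# T > 1 (more exploration early on) and recovers it exactly at T = 1.

def annealed_choice(rng, weights, temperature):
    """Sample an index from `weights` tempered by 1/temperature."""
    tempered = [w ** (1.0 / temperature) for w in weights]
    return rng.choices(range(len(weights)), tempered)[0]

rng = random.Random(0)
# Example schedule: start hot (T = 10) and cool linearly toward T = 1.
schedule = [10.0 - 9.0 * t / 99 for t in range(100)]
counts = [0, 0]
for T in schedule:
    counts[annealed_choice(rng, [0.9, 0.1], T)] += 1
print(counts)  # the 0.9-weight option dominates more as T cools
```

In the segmentation sampler, the same tempering would be applied to the weights of the candidate boundary configurations at each Gibbs step.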