Faculty of Mathematics and Physics

The seminar focuses on a deeper understanding of selected unsupervised machine-learning methods and is intended for students who already have basic knowledge of machine learning and probabilistic models. The first half of the semester is devoted to unsupervised learning using Bayesian inference (Dirichlet-Categorical models, Mixture of Categoricals, Mixture of Gaussians, Expectation-Maximization, Gibbs sampling) and to implementing these methods on selected tasks. The remaining lectures are devoted to clustering methods, component analysis, and unsupervised methods for inspecting deep neural networks.

SIS code: NPFL097

Semester: winter

E-credits: 3

Examination: 1/1 C

Guarantor: David Mareček

The course will be taught online over Zoom, given the current pandemic. All lectures will be recorded so you can catch up later.

- The lectures in Czech are given on **Mondays 10:40 - 12:10**; the first lecture is on **Oct 5**.
- The lectures in English are given on **Fridays 10:40 - 12:10**; the first lecture is on **Oct 9**.

All enrolled students will get a Zoom link via email. If you want to take part but have not officially enrolled, email me.

Students are expected to be familiar with basic probabilistic concepts, roughly to the extent covered by:

- NPFL067 - Statistical methods in NLP I

In the second half of the course, it will be an advantage if you know the basics of deep-learning methods. I recommend attending

- NPFL114 - Deep Learning

- There are three programming assignments during the term, each worth 10 points. Submissions after the deadline can receive at most half of the points.
- You can obtain 10 points for an individual 30-minute presentation on a selected machine-learning method or task, or on a novel approach in the field.
- You pass the course if you obtain at least 20 points.

1. Introduction (Slides, Warm-Up test)

2. Beta-Bernoulli probabilistic model (Beta-Bernoulli, Beta distribution)

3. Dirichlet-Categorical probabilistic model, Modeling document collections (Dirichlet-Categorical, Document collections, Categorical Mixture Models, Expectation-Maximization)

4. Bayesian Mixture Models, Gibbs Sampling, Latent Dirichlet Allocation (Gibbs Sampling, Gibbs Sampling for Bayesian mixture, Latent Dirichlet allocation, Algorithms for LDA and Mixture of Categoricals, Gibbs Sampling, Latent Dirichlet Allocation)

5. Chinese Restaurant Process (Chinese Restaurant Process, Bayesian Inference with Tears, Chinese Segmentation)

6. Unsupervised POS tagging, Word-Alignment, and Dependency Parsing (Tagging, Alignment, Parsing)

Oct 05 (in Czech), Oct 09 (in English)

- Course overview (Slides)
- Revision of the basics of probability and machine-learning theory (Warm-Up test)

Oct 19 (in Czech), Oct 16 (in English)

- Slides for Beta-Bernoulli models by Carl Edward Rasmussen (University of Cambridge)
- How to compute the expected value of the Beta distribution: Beta distribution
- A web application demonstrating the Beta-Bernoulli model on a coin experiment: Beta Coin Experiment
- Beta distribution online demo: https://demonstrations.wolfram.com/BetaDistribution/
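The conjugate update behind these materials fits in a few lines. A minimal sketch (the prior parameters and coin-flip counts below are illustrative, not from the course):

```python
# Beta-Bernoulli in a nutshell: a Beta(alpha, beta) prior over the coin's
# bias, updated with observed Bernoulli outcomes.

def beta_mean(alpha, beta):
    """Expected value of Beta(alpha, beta): alpha / (alpha + beta)."""
    return alpha / (alpha + beta)

def posterior(alpha, beta, heads, tails):
    """Conjugacy: the posterior is again a Beta, with counts added."""
    return alpha + heads, beta + tails

a, b = posterior(1.0, 1.0, heads=7, tails=3)  # uniform prior, 10 flips
print(beta_mean(a, b))  # 8/12, the smoothed estimate of p(heads)
```

Note how the posterior mean 8/12 sits between the prior mean 1/2 and the raw frequency 7/10.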

Oct 26 (in Czech), Oct 30 (in English)

- Slides for Dirichlet-Categorical, Document collections, and Categorical Mixture Models by Carl Edward Rasmussen (University of Cambridge)
- Slides: Expectation-Maximization
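The Dirichlet-Categorical model generalizes the Beta-Bernoulli update to K categories. A minimal sketch of its posterior predictive distribution, with illustrative counts and a symmetric prior:

```python
# Dirichlet-Categorical posterior predictive. With a symmetric
# Dirichlet(alpha) prior over K categories and observed counts n_k,
# the predictive probability of category k is
# (n_k + alpha) / (N + K * alpha).

def predictive(counts, alpha=1.0):
    """Smoothed category probabilities under a symmetric Dirichlet prior."""
    total = sum(counts) + alpha * len(counts)
    return [(n + alpha) / total for n in counts]

# Even the unseen third category keeps nonzero probability mass.
print(predictive([5, 3, 0], alpha=1.0))  # [6/11, 4/11, 1/11]
```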

Nov 02 (in Czech), Nov 06 (in English)

- Slides for Gibbs Sampling, Gibbs Sampling for Bayesian mixture, and Latent Dirichlet allocation by Carl Edward Rasmussen (University of Cambridge)
- Slides for Algorithms for LDA and Mixture of Categoricals
- See also Chapter 11 of Bishop's book Pattern Recognition and Machine Learning
- Gibbs sampling from the bivariate normal distribution: Gibbs Sampling
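A minimal sketch of a bivariate-normal Gibbs sampler, assuming zero means, unit variances, and correlation rho (so each full conditional is a univariate normal, x | y ~ N(rho*y, 1 - rho^2)):

```python
# Gibbs sampling from a bivariate normal: alternately draw each
# coordinate from its full conditional given the other.
import random

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    rng = random.Random(seed)
    sd = (1 - rho ** 2) ** 0.5   # conditional standard deviation
    x, y = 0.0, 0.0              # arbitrary starting point
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)   # sample x from p(x | y)
        y = rng.gauss(rho * x, sd)   # sample y from p(y | x)
        samples.append((x, y))
    return samples

draws = gibbs_bivariate_normal(rho=0.8, n_samples=5000)
mean_x = sum(x for x, _ in draws) / len(draws)  # empirical mean, near 0
```

With enough draws, the empirical correlation of the samples approaches rho; in practice one would also discard an initial burn-in, omitted here for brevity.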

Nov 09 (in Czech), Nov 13 (in English)

- Assignment 1: Latent Dirichlet Allocation

Nov 16 (in Czech), Nov 20 (in English)

- Slides for the Chinese Restaurant Process and unsupervised segmentation of texts: Chinese Restaurant Process
- Tutorial Bayesian Inference with Tears by Kevin Knight (2009)
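The sequential sampling scheme of the Chinese Restaurant Process can be sketched in a few lines; the concentration parameter and number of customers below are illustrative. Customer n joins an existing table k with probability proportional to its occupancy n_k, and opens a new table with probability proportional to alpha:

```python
# Chinese Restaurant Process: sample one seating arrangement.
import random

def crp(n_customers, alpha, seed=0):
    rng = random.Random(seed)
    tables = []                      # tables[k] = occupancy of table k
    for n in range(n_customers):
        # weights: occupancy for each existing table, alpha for a new one
        weights = tables + [alpha]
        r = rng.random() * (n + alpha)   # total weight is n + alpha
        k, acc = 0, 0.0
        for k, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        if k == len(tables):
            tables.append(1)         # open a new table
        else:
            tables[k] += 1           # join table k
    return tables

print(crp(100, alpha=1.0))  # occupancies: few large tables, many small
```

The rich-get-richer dynamic is visible in the output: early tables accumulate most customers, which is what makes the CRP useful as a prior over partitions with an unbounded number of clusters.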

Nov 23 (in Czech), Nov 27 (in English)

- Assignment 2: Chinese Segmentation

Nov 30 (in Czech), Dec 04 (in English)

- Slides: Tagging, Alignment, Parsing

Deadline: Nov 30, 23:59 (Dec 04, 23:59 for English students). 10 points.

- Instructions: assignment1.pdf
- Preprocessing procedures: lda-data.py

Deadline: Dec 14, 23:59 (Dec 18, 23:59 for English students). 10 points.

- Instructions: assignment2.pdf
- Input data: data_small.txt
- Evaluation script: eval.pl
- Evaluation data: data_small_gold.txt