Selected Problems in Machine Learning

The seminar focuses on a deeper understanding of selected machine learning methods for students who already have basic knowledge of machine learning and probability models. The first half of the semester is devoted to methods of unsupervised learning using Bayesian inference (Dirichlet-Categorical models, Mixture of Categoricals, Mixture of Gaussians, Expectation Maximization, Gibbs sampling) and to implementing these methods on selected tasks. Two further lectures are devoted to inspecting deep neural networks. The remaining topics are selected according to students' interests.

About

SIS code: NPFL097
Semester: winter
E-credits: 3
Examination: 0/2 C
Guarantor: David Mareček

Timespace Coordinates

The seminar is held on Thursday, 9:00 - 10:30 in S1 (fourth floor)

Course prerequisites

Students are expected to be familiar with basic probabilistic and ML concepts, roughly to the extent of:

  • NPFL067 - Statistical methods in NLP I,
  • NPFL054 - Introduction to Machine Learning (in NLP).

In the second half of the course, you should also be familiar with the basics of deep-learning methods; attending an introductory deep-learning course beforehand is recommended.

Course passing requirements

  • There are three programming assignments during the term. For each one, you can obtain 10 points. If submitted after the deadline, you can obtain at most half of the points.
  • You can obtain 10 points for an individual 30-minute presentation on a selected machine learning method or task.
  • You pass the course if you obtain at least 20 points.

Lectures

1. Introduction: Slides, Warm-Up test

2. Beta-Bernoulli probabilistic model: Beta-Bernoulli, Beta distribution

3. Dirichlet-Categorical probabilistic model: Dirichlet-Categorical, Document collections

4. Modeling document collections, Categorical Mixture models, Expectation-Maximization: Categorical Mixture Models, Gibbs Sampling, Gibbs Sampling for Bayesian mixture, Expectation Maximization, Gibbs Sampling

5. Gibbs Sampling, Latent Dirichlet allocation: Latent Dirichlet allocation, Algorithms for LDA and Mixture of Categoricals, Latent Dirichlet Allocation

6. Working on and discussing Assignment 1

7. Text segmentation: Chinese Restaurant Process, Bayesian Inference with Tears, Unsupervised text segmentation

8. Working on and discussing Assignment 2

9. Mixture of Gaussians and other clustering methods

10. Inspecting Neural Networks

11. Sentence Structures

1. Introduction

 Oct 4

  • Course overview: Slides
  • Revision of the basics of probability and machine learning theory: Warm-Up test

2. Beta-Bernoulli probabilistic model

 Oct 11

  • Answering questions from the warm-up test
  • Slides for Beta-Bernoulli models by Carl Edward Rasmussen (University of Cambridge)
  • How to compute the expected value of the Beta distribution: Beta distribution
  • A web application showing the Beta-Bernoulli distribution (and many others) can be found at RandomServices.com.
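
As a quick illustration (not part of the official course materials), the following minimal Python sketch shows the Beta-Bernoulli conjugate update and the posterior mean E[theta | data] = (alpha + heads) / (alpha + beta + n); the prior parameters and the simulated coin data are placeholders.

```python
import numpy as np

def beta_bernoulli_posterior(flips, alpha=1.0, beta=1.0):
    """Conjugate update: Beta(alpha, beta) prior + Bernoulli observations."""
    heads = int(sum(flips))          # number of 1s
    tails = len(flips) - heads       # number of 0s
    post_alpha = alpha + heads
    post_beta = beta + tails
    # Expected value of the Beta posterior: alpha' / (alpha' + beta')
    posterior_mean = post_alpha / (post_alpha + post_beta)
    return post_alpha, post_beta, posterior_mean

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.binomial(1, 0.7, size=50)   # simulated coin with true theta = 0.7
    print(beta_bernoulli_posterior(data, alpha=2.0, beta=2.0))
```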

3. Dirichlet-Categorical probabilistic model

 Oct 18
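
A minimal sketch (my own illustration, not the lecture slides) of the Dirichlet-Categorical posterior predictive distribution: observed counts plus the Dirichlet pseudo-counts, renormalized. The hyperparameter values are placeholders.

```python
import numpy as np

def dirichlet_categorical_predictive(counts, alpha):
    """Posterior predictive of a Dirichlet-Categorical model:
    P(next = k | data) = (counts[k] + alpha[k]) / (sum(counts) + sum(alpha)).
    """
    counts = np.asarray(counts, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha.sum())

if __name__ == "__main__":
    # Three categories observed 5, 0 and 2 times, symmetric Dirichlet(0.5) prior
    print(dirichlet_categorical_predictive([5, 0, 2], [0.5, 0.5, 0.5]))
```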

4. Modeling document collections, Categorical Mixture models, Expectation-Maximization

 Oct 25
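
The following is a rough, self-contained sketch (not the course reference implementation) of EM for a mixture of categorical (bag-of-words) distributions over documents; the toy data, the smoothing constant, and the fixed number of iterations are assumptions made only for illustration.

```python
import numpy as np

def em_categorical_mixture(X, K, n_iter=50, seed=0):
    """Minimal EM for a mixture of categorical (bag-of-words) distributions.

    X : (D, V) matrix of word counts per document
    K : number of mixture components
    Returns mixing weights pi (K,) and word distributions phi (K, V).
    """
    rng = np.random.default_rng(seed)
    D, V = X.shape
    pi = np.full(K, 1.0 / K)
    phi = rng.dirichlet(np.ones(V), size=K)          # random init of component word dists

    for _ in range(n_iter):
        # E-step: responsibilities r[d, k] proportional to pi[k] * prod_v phi[k, v]**X[d, v]
        log_r = np.log(pi) + X @ np.log(phi).T       # (D, K) log joint up to a constant
        log_r -= log_r.max(axis=1, keepdims=True)    # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: re-estimate mixing weights and word distributions
        pi = r.mean(axis=0)
        phi = r.T @ X + 1e-8                         # small smoothing avoids zeros
        phi /= phi.sum(axis=1, keepdims=True)

    return pi, phi

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.poisson(2.0, size=(20, 10))              # toy "documents"
    pi, phi = em_categorical_mixture(X, K=3)
    print(pi)
```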

5. Gibbs Sampling, Latent Dirichlet allocation

 Nov 8
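
Below is a compact, illustrative collapsed Gibbs sampler for LDA on toy word-id documents; it is only a sketch under assumed symmetric hyperparameters (alpha, beta), not the assignment's reference solution.

```python
import numpy as np

def lda_gibbs(docs, K, V, alpha=0.1, beta=0.1, n_iter=200, seed=0):
    """Collapsed Gibbs sampler for LDA on a list of word-id lists.

    Samples a topic for every token from
    P(z = k | rest) proportional to (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
    Returns per-document topic counts n_dk and per-topic word counts n_kw.
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dk = np.zeros((D, K))
    n_kw = np.zeros((K, V))
    n_k = np.zeros(K)
    z = []  # current topic assignment of every token

    # random initialization
    for d, doc in enumerate(docs):
        z_d = rng.integers(K, size=len(doc))
        z.append(z_d)
        for w, k in zip(doc, z_d):
            n_dk[d, k] += 1
            n_kw[k, w] += 1
            n_k[k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # remove the token's current assignment from the counts
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # full conditional over topics
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    return n_dk, n_kw

if __name__ == "__main__":
    docs = [[0, 1, 1, 2], [2, 3, 3, 4], [0, 1, 4, 4]]   # toy word-id documents
    print(lda_gibbs(docs, K=2, V=5, n_iter=50))
```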

6. Working on and discussing Assignment 1

 Nov 15

7. Text segmentation

 Nov 29
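
As a small illustration of the Chinese Restaurant Process mentioned in the materials, here is a hedged Python sketch that draws one seating arrangement; the concentration parameter and the number of customers are arbitrary example values.

```python
import numpy as np

def chinese_restaurant_process(n_customers, alpha=1.0, seed=0):
    """Sample a seating arrangement from a CRP with concentration alpha.

    Each customer joins an existing table with probability proportional to its
    size, or opens a new table with probability proportional to alpha.
    Returns the table index of every customer.
    """
    rng = np.random.default_rng(seed)
    tables = []           # number of customers at each table
    seating = []
    for _ in range(n_customers):
        probs = np.array(tables + [alpha], dtype=float)
        choice = int(rng.choice(len(probs), p=probs / probs.sum()))
        if choice == len(tables):
            tables.append(1)      # open a new table
        else:
            tables[choice] += 1
        seating.append(choice)
    return seating

if __name__ == "__main__":
    print(chinese_restaurant_process(20, alpha=2.0))
```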

8. Working on and discussing Assignment 2

 Dec 6

9. Mixture of Gaussians and other clustering methods

 Dec 13
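
A minimal sketch of fitting a Mixture of Gaussians with EM via scikit-learn's GaussianMixture on synthetic 2-D data; the cluster locations and the choice of three components are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy data: three 2-D Gaussian clusters
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(scale=0.5, size=(100, 2)) + [0, 0],
    rng.normal(scale=0.5, size=(100, 2)) + [4, 4],
    rng.normal(scale=0.5, size=(100, 2)) + [0, 4],
])

# Fit a Gaussian mixture with EM and read off cluster assignments
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
labels = gmm.predict(X)
print(gmm.means_)        # estimated cluster centres
print(labels[:10])       # hard assignments of the first few points
```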

10. Inspecting Neural Networks

 Dec 20

  • Deep neural networks in NLP as a black box
  • What is being learned in their hidden states?
  • How does the attention mechanism work? (see the sketch below)
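
To make the attention question concrete, here is a small, self-contained sketch of scaled dot-product attention in NumPy; it illustrates the general mechanism, not any specific model discussed in the lecture, and the matrix sizes are arbitrary.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Q, K, V are (sequence_length, d) matrices; the returned weights show
    how much each query position attends to each key position.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Q = rng.normal(size=(4, 8))   # 4 query positions, dimension 8
    K = rng.normal(size=(4, 8))
    V = rng.normal(size=(4, 8))
    output, weights = scaled_dot_product_attention(Q, K, V)
    print(weights.round(2))       # attention matrix: rows sum to 1
```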

11. Sentence Structures

 Jan 3

  • Sentence structures in NLP
  • Utilization of sentence structures in downstream NLP tasks
  • Are some kinds of sentence structures learned latently inside deep networks?

Latent Dirichlet Allocation

 Deadline: Dec 5 23:59  10 points

Unsupervised text segmentation

 Deadline: Dec 19 23:59  10 points

  • You will get English texts in which the spaces between words have been removed. The task is to use Bayesian inference to bring the spaces back in a completely unsupervised way. The task is relevant e.g. for Chinese, Japanese, Thai, and other languages that do not separate words by spaces. English was chosen so that everyone can see how good their results are. In case you have not attended the lecture, you can find all the necessary information in Kevin Knight's tutorial Bayesian Inference with Tears. Try several hyperparameter combinations to get as good results as possible. Also try the simulated annealing method to slow down the convergence of the Gibbs sampler so that it does not settle on a poor solution too early (a rough, illustrative sampler sketch follows below). slides
  • Data: eng-input.txt
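
The sketch below is one possible (illustrative, unoptimized) Gibbs sampler for this assignment: a unigram word model with a uniform character-level base distribution, boundary variables resampled one at a time, and a naive annealing schedule. The hyperparameters ALPHA and P_CHAR, the annealing schedule, and the toy input are all placeholder assumptions, not a reference solution.

```python
import random

ALPHA = 1.0            # concentration of the CRP/Dirichlet word model (placeholder)
P_CHAR = 1.0 / 27      # base distribution: uniform over 26 letters + end-of-word symbol

def base_prob(word):
    """P0(word): generate each character independently, then stop."""
    return P_CHAR ** (len(word) + 1)

def pred_prob(word, counts, total):
    """CRP-style predictive probability of one word given the current counts."""
    return (counts.get(word, 0) + ALPHA * base_prob(word)) / (total + ALPHA)

def segment(text, n_iter=500, start_temp=2.0, seed=0):
    rng = random.Random(seed)
    # boundaries[i] == True means there is a word boundary after text[i]
    boundaries = [rng.random() < 0.5 for _ in range(len(text) - 1)]

    def current_words():
        out, start = [], 0
        for i, b in enumerate(boundaries):
            if b:
                out.append(text[start:i + 1])
                start = i + 1
        out.append(text[start:])
        return out

    words = current_words()
    counts, total = {}, len(words)
    for w in words:
        counts[w] = counts.get(w, 0) + 1

    def add(word, delta):
        nonlocal total
        counts[word] = counts.get(word, 0) + delta
        total += delta

    for it in range(n_iter):
        temp = max(1.0, start_temp * (1.0 - it / n_iter))   # naive annealing schedule
        for i in range(len(boundaries)):
            # the words to the left and right of position i (ignoring boundaries[i] itself)
            left = i
            while left > 0 and not boundaries[left - 1]:
                left -= 1
            right = i
            while right < len(boundaries) - 1 and not boundaries[right + 1]:
                right += 1
            w_left, w_right = text[left:i + 1], text[i + 1:right + 2]
            merged = w_left + w_right

            # remove the affected word(s) from the counts
            if boundaries[i]:
                add(w_left, -1)
                add(w_right, -1)
            else:
                add(merged, -1)

            # annealed probabilities of "split here" vs. "merge"
            p_split = pred_prob(w_left, counts, total)
            add(w_left, 1)
            p_split *= pred_prob(w_right, counts, total)
            add(w_left, -1)
            p_merge = pred_prob(merged, counts, total)
            p_split, p_merge = p_split ** (1.0 / temp), p_merge ** (1.0 / temp)

            boundaries[i] = rng.random() < p_split / (p_split + p_merge)

            # add the chosen word(s) back
            if boundaries[i]:
                add(w_left, 1)
                add(w_right, 1)
            else:
                add(merged, 1)

    return " ".join(current_words())

if __name__ == "__main__":
    print(segment("thecatsatonthemat" * 5, n_iter=200))
```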
Recommended Literature

  • Christopher Bishop: Pattern Recognition and Machine Learning, Springer-Verlag New York, 2006 (read here)
  • Kevin P. Murphy: Machine Learning: A Probabilistic Perspective, The MIT Press, Cambridge, Massachusetts, 2012 (read here)