Unsupervised Machine Learning in NLP

The seminar focuses on a deeper understanding of selected unsupervised machine learning methods and is intended for students who already have basic knowledge of machine learning and probability models. The first half of the semester is devoted to unsupervised learning with Bayesian inference (Dirichlet-Categorical models, Mixture of Categoricals, Mixture of Gaussians, Expectation-Maximization, Gibbs sampling) and to implementing these methods on selected tasks. The remaining lectures are devoted to clustering methods, component analysis, and unsupervised methods for inspecting deep neural networks.

About

SIS code: NPFL097
Semester: winter
E-credits: 3
Examination: 1/1 C
Guarantor: David Mareček

Timespace Coordinates

  • The lectures are given on Thursdays, 9:00 - 10:30, in room S11. The first lecture is on Sep 30.
  • Due to the low number of students enrolled so far, I have decided to hold only one section of the course, taught in English.

Course prerequisites

Students are expected to be familiar with basic probabilistic concepts, roughly to the extent covered by:

  • NPFL067 - Statistical methods in NLP I

In the second half of the course, it will be an advantage if you know the basics of deep-learning methods. I recommend attending a deep-learning course beforehand.

Course passing requirements

  • There are three programming assignments during the term, each worth 10 points. Assignments submitted after the deadline can receive at most half of the points.
  • At the end of the course, there will be a test worth an additional 15 points: you will get 5 questions from the list below, each worth 3 points.
  • You pass the course if you obtain at least 30 points.

Lectures

1. Introduction. Materials: Slides

2. Beta-Bernoulli probabilistic model. Materials: Beta-Bernoulli (by C.E.Rasmussen), Beta distribution

3. Dirichlet-Categorical probabilistic model, Modeling document collections. Materials: Dirichlet-Categorical (by C.E.Rasmussen), Posteriors and Predictions, Document collections (by C.E.Rasmussen), Categorical Mixture Models (by C.E.Rasmussen), Modeling Document Collections

4. Expectation-Maximization, Bayesian Mixture Models, Latent Dirichlet Allocation. Materials: Modeling Document Collections, Latent Dirichlet allocation (by C.E.Rasmussen)

5. Gibbs Sampling in Latent Dirichlet Allocation, Entropy, Assignment 1. Materials: Gibbs Sampling (by C.E.Rasmussen), Gibbs Sampling, Latent Dirichlet allocation (by C.E.Rasmussen), Algorithms for LDA and Mixture of Categoricals, Latent Dirichlet Allocation

6. Chinese Restaurant Process. Materials: Chinese Restaurant Process, CRP Demo, Bayesian Inference with Tears

7. Unsupervised Text Segmentation. Materials: Chinese Restaurant Process, Chinese Segmentation

8. Unsupervised Word-Alignment, POS tagging, and Dependency Parsing. Materials: Tagging, Alignment, Parsing, EM algorithm for Word Alignment (by P.Koehn)

9. K-Means clustering, Mixture of Gaussians. Materials: K-Means and Gaussian Mixture Models (by D.Rosenberg)

10. Agglomerative Clustering, Evaluation methods. Materials: Agglomerative Clustering (by A.Chouldechova), Clustering Methods and Evaluation

11. Dimensionality Reduction. Materials: Dimensionality Reduction, t-SNE and PCA demo, Clustering and Component Analysis on Word Vectors

12. Final test, Interpretation of Neural Networks. Materials: Interpretation of Neural Networks, Hidden in the Layers

1. Introduction

 Sep 30

  • Course overview. Materials: Slides
  • Revision of the basics of probability and machine learning theory

2. Beta-Bernoulli probabilistic model

 Oct 07

3. Dirichlet-Categorical probabilistic model, Modeling document collections

 Oct 14

4. Expectation-Maximization, Bayesian Mixture Models, Latent Dirichlet Allocation

 Oct 21

5. Gibbs Sampling in Latent Dirichlet Allocation, Entropy, Assignment 1

 Nov 04

6. Chinese Restaurant Process

 Nov 11

7. Unsupervised Text Segmentation

 Nov 18

8. Unsupervised Word-Alignment, POS tagging, and Dependency Parsing

 Nov 25

9. K-Means clustering, Mixture of Gaussians

 Dec 02

10. Agglomerative Clustering, Evaluation methods

 Dec 09

11. Dimensionality Reduction

 Dec 16

12. Final test, Interpretation of Neural Networks

 Jan 06

Latent Dirichlet Allocation

 Deadline: Nov 25, 23:59  10 points

Chinese Segmentation

 Deadline: Dec 09, 23:59  10 points

Clustering and Component Analysis on Word Vectors

 Deadline: Jan 20, 23:59  10 points

List of questions for the final test

  1. Define the Beta distribution and describe its parameters. Plot (roughly) the following distributions: Beta(1,1), Beta(0.1,0.1), Beta(10, 10).

  2. Derive the posterior distribution from the prior (Beta distribution) and likelihood (Binomial distribution). Derive the predictive distribution for the Beta-Bernoulli posterior. (A worked sketch of this derivation, together with the Dirichlet-Categorical case from question 4, is given after this list.)

  3. Explain the Dirichlet distribution and describe its parameters. Plot (roughly) the following distributions: Dir(1,1,1), Dir(0.1,0.1,0.1), Dir(10, 10, 10).

  4. Derive the posterior distribution from the prior (Dirichlet distribution) and likelihood (Multinomial distribution). Derive the predictive distribution for the Dirichlet-Categorical posterior.

  5. Explain the "Mixture of Categoricals" model (a topic is assigned to each document) for modeling document collections. Describe all its parameters and hyperparameters. From what distributions are they drawn? Describe the Expectation-Maximization algorithm for training such a model.

  6. Explain the Latent Dirichlet Allocation model (a topic is assigned to each word in each document). Describe all its parameters and hyperparameters. From what distributions are they drawn? What are the latent variables? Describe the learning algorithm.

  7. Explain Collapsed Gibbs sampling. Choose one unsupervised task from the lectures (word alignment, tagging, segmentation) and describe the basic algorithm. What is annealing? (A collapsed Gibbs sampler for LDA is sketched after this list.)

  8. Explain the Chinese Restaurant Process. What distributions does it generate? What is exchangeability? Explain its generalization to the Pitman-Yor process. (A short simulation sketch is given after this list.)

  9. Explain the K-means and Gaussian Mixture models for clustering. What are the advantages of the Gaussian Mixture model? Provide an example of clusters in 2D where K-means fails and where the Gaussian Mixture model works well.

  10. Explain Hierarchical Agglomerative clustering methods. What are their advantages over K-means? What linkage criteria do you know? Provide examples of clusters in 2D where these criteria fail.

  11. What is t-SNE? How does it work? What is it used for?

  12. What is Principal Component Analysis? How does it work? What is it used for? Explain it on a 2D example. (A small numpy sketch is given after this list.)
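
For questions 2 and 4, here is a compact sketch of the conjugate-posterior and predictive derivations; k denotes the number of successes among n Bernoulli trials and c_w the count of outcome w among n categorical draws (the notation is mine, not necessarily the lectures'):

    p(\theta \mid x_{1:n})
      \;\propto\; \underbrace{\theta^{k}(1-\theta)^{n-k}}_{\text{likelihood}}
      \;\underbrace{\theta^{\alpha-1}(1-\theta)^{\beta-1}}_{\text{prior}}
      \;=\; \theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1}
      \;\;\Rightarrow\;\; \theta \mid x_{1:n} \sim \mathrm{Beta}(\alpha+k,\; \beta+n-k)

    p(x_{n+1}=1 \mid x_{1:n})
      \;=\; \int_0^1 \theta\, p(\theta \mid x_{1:n})\, \mathrm{d}\theta
      \;=\; \frac{\alpha+k}{\alpha+\beta+n}

    \text{Analogously: }\;
      \boldsymbol{\theta} \mid x_{1:n} \sim \mathrm{Dir}(\alpha_1+c_1,\,\dots,\,\alpha_V+c_V),
      \qquad
      p(x_{n+1}=w \mid x_{1:n}) \;=\; \frac{\alpha_w+c_w}{\sum_{v=1}^{V}\alpha_v + n}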
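
For questions 6 and 7 (and Assignment 1), a minimal collapsed Gibbs sampler for LDA with symmetric priors, written as an illustrative Python sketch; the function name, interface, and hyperparameter values are assumptions, not the assignment's required solution:

    import numpy as np

    def lda_collapsed_gibbs(docs, K, V, alpha=0.1, beta=0.1, iters=100, seed=0):
        """Illustrative collapsed Gibbs sampler for LDA (not the reference solution).
        theta and phi are integrated out; only the topic assignments z are sampled."""
        rng = np.random.default_rng(seed)
        n_dk = np.zeros((len(docs), K))                       # topic counts per document
        n_kw = np.zeros((K, V))                               # word counts per topic
        n_k = np.zeros(K)                                     # total words per topic
        z = [rng.integers(K, size=len(doc)) for doc in docs]  # random initialization
        for d, doc in enumerate(docs):                        # initial count tables
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][i]                               # remove the current assignment
                    n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                    # full conditional p(z_i = k | rest), up to normalization
                    p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                    k = rng.choice(K, p=p / p.sum())          # resample the topic
                    z[d][i] = k
                    n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
        return z, n_dk, n_kw

    # Toy usage: 4 documents over a vocabulary of 6 word ids, 2 topics.
    docs = [[0, 1, 0, 2], [1, 0, 0], [3, 4, 5, 4], [5, 3, 4]]
    z, n_dk, n_kw = lda_collapsed_gibbs(docs, K=2, V=6, iters=200)

Resampling one assignment at a time from its full conditional, with the counts of the current token removed, is exactly the collapsed scheme asked about in question 7.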
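
For question 8, a short simulation of the Chinese Restaurant Process (the function name and interface are illustrative): each new customer joins an existing table with probability proportional to its occupancy and opens a new table with probability proportional to alpha.

    import numpy as np

    def crp(n_customers, alpha, seed=0):
        """Illustrative sketch: sample a random partition of n_customers from CRP(alpha)."""
        rng = np.random.default_rng(seed)
        table_sizes = []                                  # occupancy of each table
        seating = []                                      # table index of each customer
        for _ in range(n_customers):
            weights = np.array(table_sizes + [alpha], dtype=float)
            t = rng.choice(len(weights), p=weights / weights.sum())
            if t == len(table_sizes):
                table_sizes.append(1)                     # open a new table
            else:
                table_sizes[t] += 1                       # join an existing table
            seating.append(t)
        return seating, table_sizes

    # With alpha = 1.0, 100 customers typically end up at a few large tables plus
    # several singletons; the expected number of tables grows roughly as alpha * log n.
    seating, sizes = crp(100, alpha=1.0)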
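
For question 12, a minimal 2D PCA sketch using the eigendecomposition of the sample covariance matrix (a small numpy illustration with made-up data, not the lecture code):

    import numpy as np

    def pca(X):
        """Illustrative sketch: center the data, eigendecompose its covariance, and
        return the principal directions (columns) sorted by decreasing variance."""
        Xc = X - X.mean(axis=0)
        cov = Xc.T @ Xc / (len(Xc) - 1)            # sample covariance matrix
        eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
        order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
        return eigvals[order], eigvecs[:, order], Xc @ eigvecs[:, order]

    # Strongly correlated 2D data: the first principal component captures the
    # direction of largest variance, the second the residual orthogonal direction.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2)) @ np.array([[3.0, 1.5], [0.0, 0.5]])
    variances, directions, projected = pca(X)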

Recommended literature

  • Christopher Bishop: Pattern Recognition and Machine Learning. Springer-Verlag New York, 2006 (read here)

  • Kevin P. Murphy: Machine Learning: A Probabilistic Perspective. The MIT Press, Cambridge, Massachusetts, 2012 (read here)

  • David Mareček, Jindřich Libovický, Tomáš Musil, Rudolf Rosa, Tomasz Limisiewicz: Hidden in the Layers: Interpretation of Neural Networks for Natural Language Processing. Institute of Formal and Applied Linguistics, 2020 (read here)