SIS code: 
Semester: winter, credits: 6
Hours per week: 2/2 (lecture/lab), completion: credit + exam



Úvod do strojového učení

Introduction to machine learning

Time and place


  • Czech  Wednesday 10:40 - 12:10, S5 
  • English  Thursday 14:00 - 15:30, S5

Lab session

  • Czech  Wednesday 12:20 - 13:50, SU2
  • Czech  Thursday 10:40 - 12:10, SU2
  • English  Friday 14:00 - 15:30, SU2

Math and programming requirements

Probability and statistics

  • The most important requirements from probability and statistics are listed here: Preliminaries.Probability-Statistics
  • Make sure that you are familiar at least with the very basics: Prob-Stat.zaklady.2014
  • MFF students are expected to know the material covered in the obligatory course "Pravděpodobnost a statistika" (Probability and Statistics, NMAI059).
  • Gentle entry test in probability and statistics – a brief evaluation.
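
As a rough self-check of the very basics mentioned above, the following R sketch computes a sample mean, a sample variance, and two binomial probabilities. The data vector and the coin-flip parameters are invented for illustration and are not taken from the course materials or the entry test.

```r
# Illustrative data only (hypothetical sample, not from the course materials)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

sample_mean <- mean(x)   # arithmetic mean
sample_var  <- var(x)    # unbiased sample variance (divides by n - 1)
sample_sd   <- sd(x)     # square root of the sample variance

# P(X = 3) for X ~ Binomial(n = 10, p = 0.5), e.g. 3 heads in 10 fair coin flips
p_three_heads <- dbinom(3, size = 10, prob = 0.5)

# P(X <= 3) for the same distribution (cumulative distribution function)
p_at_most_three <- pbinom(3, size = 10, prob = 0.5)
```

If you can predict what each of these variables holds before running the code, you probably have the minimum background assumed here.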

R programming

  • You can start with a simple tutorial: Tutorial-on-R.2013
  • If you are not familiar with elementary R functions, use the resources listed below.
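
For orientation, here is a minimal sketch of the kind of elementary R the course seems to assume: building a data frame, subsetting it, tabulating a column, and writing a small function. The toy data frame and the helper name `entropy_bits` are made up for illustration and do not come from the course tutorials.

```r
# Hypothetical toy data frame (invented for illustration)
df <- data.frame(
  word  = c("bank", "bank", "line", "line", "line", "hard"),
  label = c("A", "B", "A", "A", "B", "A"),
  stringsAsFactors = FALSE
)

# Subset rows by a logical condition
line_rows <- df[df$word == "line", ]

# Frequency table and relative frequencies of a column
label_counts <- table(df$label)
label_probs  <- prop.table(label_counts)

# A small user-defined function: Shannon entropy in bits.
# Hypothetical helper; zero probabilities are dropped so that 0 * log2(0) counts as 0.
entropy_bits <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

h <- entropy_bits(as.numeric(label_probs))

# Apply a function to each group: number of rows per word
rows_per_word <- sapply(split(df$label, df$word), length)
```

Functions such as `table()`, `prop.table()`, `split()`, and `sapply()` are part of base R, so the sketch needs no extra packages.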

Calendar 2018/19

  Lecture date Lecture Lab
1. 3/10

Introduction to Machine Learning
    – What is Machine Learning
    – Basic formal concepts
    – Overview of the course  
    – Requirements for getting credits

Annotation experiment and data analysis
    – Practical experience with manual annotation
    – Annotation data analysis
    – Inter-annotator agreement
    – Confusion matrices and error analysis

Tutorial on annotation data analysis in R

2. 10/10
Data Analysis and Clustering

Native Language Identification task
R code

   – Feature frequency with the MOV data
   – Clustering with the USArrests data
   – Clustering with the NLI data
3. 17/10

Entropy, Decision Trees, and classifier evaluation
    – Entropy
    – Basic principles of Decision Trees
    – Fundamentals of classifier evaluation


Tutorial on probability distributions and entropy in R
    – Data: xy.100.csv

Hints on computing entropy in R

Tutorial on Decision Trees

Hints on building Decision Trees using rpart() in R

Useful demo codes
    – load-wsd-data.R  
    – cp-and-pruning.Forbes.R

Note on homework: all homework assignments so far are strongly recommended, although you do NOT have to submit them.

4. 24/10
Linear regression, logistic regression

!!! Lab session 25/10 cancelled

– HW1 assignment

R code
    – illustration of Gradient Descent Algorithm
    – lm() with the Auto data
    – glm() with the data on students
    – odds, parameter interpretation

5. 31/10

Evaluation and statistical tests [UPDATE  1/11]

Written Test #1

Exercises on t-test

6. 7/11
Instance-based learning,
Naive Bayes Classifier,
Bayesian Networks,
Maximum Likelihood Estimation

– HW1 early due date

R code
– scaling may affect performance
   – Caravan unbalanced data set
   – Precision vs. Recall
   – kNN, NB, 10-CV 

7. 14/11
Ensemble learning [TENTATIVE VERSION]

Test #1 – remarks on evaluation

Protein Ligandability Recognition
    – description of the task 

Tutorial on Random Forests

Obligatory homeworks
    – HW1 late due date
    – HW2 assignment

8. 21/11
Support Vector Machines, ROC

R code
    – NLI task
    – SVM

R code
    – Caravan data set
    – DT, LogR, SVM
    – ROC, AUC

R code
    – PLR task
    – kNN
    – generalization error estimation
       using CV and bootstrapping

9. 28/11

Feature analysis, importance, and selection
    – Why we need feature selection
    – Feature selection heuristics
    – Bayes error
    – Chi-square tests

Recommended materials and illustrations
    – Curse of dimensionality – illustration
    – FSelector package

Tutorials on using chi-squared tests
    – tutorial
    – another tutorial

– HW2 early due date

10. 5/12



Written Test #2

Obligatory homeworks
    – HW2 late due date
    – HW3 assignment will be posted by Friday, Dec 7

R code
    – College data set
    – regularization using glmnet()

11. 12/12
Perceptron and fundamentals of Neural Networks
    – The boom of Neural Networks
    – Perceptron learning
    – Single- and Multi- layer Perceptron
    – The success of deep architectures

Discussion of Homework #3

12. 19/12

    – PCA
    – Course overview
    – HW #2 solution

    – Regularization

R code
    – PCA with the Auto data set
R code
    – PLR task
    – SVM
    – PCA
R code
    – College data set
    – regularization using glmnet()

13. 2/1

    – PCA
    – Course overview

Test #3 preparation
14. 9/1
Obligatory final written Test #3 at lecture time!

– HW #3 hard due date: January 8, 2019

Lab sessions
    Wed + Fri: HW #3  
        – bioinformatics' point of view
        – discussion of the correct solution
    Thu 10:40
        Deep Neural Networks
        – a bonus lecture by Dr. Milan Straka
        – see his presentation

    Exam dates
    – Jan 24 (S7), Jan 30 (S7), Jan 31 (S7)
    – Feb 6 (S1)
    – Feb 8 (S11)
    – Feb 14 (S10)
    – register in the SIS system  


Recommended readings

  • James, Gareth and Witten, Daniela and Hastie, Trevor and Tibshirani, Robert. An Introduction to Statistical Learning. Springer New York, 2013. (link)
  • Lantz, Brett. Machine learning with R. Packt Publishing Ltd. 2013. [available in the MFF library]
  • Barbora Hladká, Martin Holub, Vilém Zouhar: A Collection of Machine Learning Exercises

Introductory readings

  • Alpaydin, Ethem. Introduction to Machine Learning. The MIT Press. 2004, 2010. (link)
  • Domingos, Pedro. A few useful things to know about machine learning. Communications of the ACM, vol. 55, issue 10, October 2012, pp. 78–87, ACM, New York, USA. (link)
  • Gonick, Larry and Woollcott Smith. The Cartoon Guide to Statistics. Harper Resource. 2005.
  • Hladká Barbora, Holub Martin: A Gentle Introduction to Machine Learning for Natural Language Processing: How to start in 16 practical steps. In: Language and Linguistics Compass, vol. 9, No. 2, pp. 55-76, 2015.
  • Hladká Barbora, Holub Martin: Machine Learning in Natural Language Processing using R. Course at ESSLLI2013, 2013.
  • Kononenko, Igor and Matjaz Kukar. Machine Learning and Data Mining: Introduction to Principles and Algorithms. Horwood Publishing, 2007. (link; a light survey of the whole field)

Advanced readings

  • Baayen, R. Harald. Analyzing Linguistic Data: A Practical Introduction to Statistics using R. Cambridge University Press, 2008.
  • Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.
  • Burges, Christopher J. C. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998. (link)
  • Cristianini, Nello and John Shawe-Taylor. An Introduction to Support Vector Machines and other Kernel-based Learning Methods. Cambridge University Press, 2000.
  • Duda, Richard O., Peter R. Hart and David G. Stork. Pattern Classification. Second Edition. Wiley, 2001.
  • Guyon, Isabelle and Gunn, Steve and Nikravesh, Masoud and Zadeh, Lotfi A. Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing). Springer-Verlag New York, Inc. 2006.
  • Hastie, Trevor, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009. (link)
  • Hsu, Chih-Wei, Chih-Chung Chang and Chih-Jen Lin. A Practical Guide to Support Vector Classification. 2010. (link)

About the R system

  • Everitt, B. S. and Hothorn, Torsten. A Handbook of Statistical Analyses using R. CRC Press. 2010.
  • Dalgaard, Peter. Introductory Statistics with R. Springer, 2008.
  • Kerns, G. Jay. Introduction to Probability and Statistics Using R. 2011. (link)
  • Paradis, Emmanuel. R for Beginners. 2005. (link)
  • Rodríguez, Germán. Introducing R -- Getting started. (link)
  • Venables, W. N., D. M. Smith and the R Core Team. An Introduction to R. (link)
  • Venables, W. N. and B. D. Ripley. Modern Applied Statistics with S. Springer, 2002. (link)

Sample student projects from the past

This course was originally focused on machine learning in natural language processing. To get credits for lab sessions, students needed to do experimental projects.

Year | Default Task | Default Task Description | Nice Student Reports
2014/15 | Native Language Identification | npfl054-term-project-2014-15.pdf | CUNI report
 | Reuters-21578 Text Categorization | Default task: ; Sentiment analysis task: |
2012/13 | Word Sense Disambiguation | PFL054.project.2012-13.pdf |
2011/12 | Semantic Pattern Classification | PFL054.project.2011-12.specification.pdf |
2010/11 | Semantic Collocation Recognition | PFL054.project.2010-11.pdf |
2009/10 | Verb Sense Disambiguation | PFL054_2009_10_project.pdf | ML_report_Fabian.pdf
2008/09 | Coreference Resolution | |
2007/08 | Named-entity Type Classification | PFL054_2007_08_project.pdf | Jana.Kravalova-FinalReport.pdf


Other machine learning courses organized by UFAL

  • NPFL097 Selected problems in machine learning
  • NPFL104 Machine learning exercises 
  • NPFL114 Deep learning

MFF UK Internal Regulations