SIS code: 
winter s.:6
2/2 C+Ex



Úvod do strojového učení

Introduction to machine learning

Time and place


  • Czech  Wednesday 10:40 - 12:10, S1
  • English  Wednesday 12:20 - 13:50, S9

Lab session

  • Czech  Friday 12:20 - 13:50, SU2
  • English  Friday 14:00 - 15:30, SU2

Math and programming requirements

Probability and statistics

R programming

  • You can start with the short tutorial Tutorial-on-R.2013.
  • If you are not familiar with elementary R functions, use the resources listed below.

For MFF students, we expect knowledge of the material covered in the compulsory course "Pravděpodobnost a statistika" (Probability and Statistics, NMAI059).

Calendar 2017/18

  Date Lecture (Wednesdays) Lab (Fridays)
1. 4/10
Introduction to Machine Learning
     What is Machine Learning
     Basic formal concepts
     Overview of the course  
Annotation experiment and data analysis
     Practical experience with manual annotation
    Elementary data processing in R
    Annotation data analysis
    – Inter-annotator agreement
    Confusion matrices and error analysis

Tutorial on annotation data analysis in R
2. 11/10
Entropy and Decision Trees
Note: The part about decision trees will be updated.
This is only a tentative version of the whole presentation;
I am going to simplify it a bit.
Exercises on entropy and remarks on evaluation

Note on homework:
All homework assignments so far are strongly
recommended, but you do NOT have to submit them.

Tutorial on probability distributions and entropy in R
 Data: xy.100.csv

Demo codes
3. 18/10

Data analysis and Clustering
     Methods for basic data exploration
     K-Means, Agglomerative hierarchical clustering  

Demo code for data analysis and clustering

 HW #1 assignment
4. 25/10
Evaluation and statistical tests

The t-test will be practiced together at the lab session
on Nov 10. Chi-square tests were postponed and will be
explained in one of the upcoming lectures.
Practical exercises in R -- Decision Trees
5. 1/11

Linear regression, Logistic regression

Demo code for linear and logistic regression

Written test #1 (30 minutes)
HW #1 early due date 3/11

6. 8/11
Dean's Day (no lecture)

Practical exercises in R  --  Statistical tests

Test #1 -- Review

    Data: test1.anonymous.csv

(!) HW #1 late due date 10/11

7. 15/11
Instance-based learning, Naive Bayes Classifiers, Bayesian networks
– presentation of the HW #1 solution
 HW #2 assignment
State Holiday
(no lab session)
8. 22/11
Ensemble learning

Demo code for kNN, Naive Bayes

9. 29/11
Support Vector Machines, ROC curve
Native Language Identification Task and SVM
– ml-lab.SVM.NLI.2017-12.01.R

Demo code for ROC curve and AUC

HW #2 early due date 1/12
10. 6/12

Chi-square tests
Curse of dimensionality and feature selection
   – Bayes classifier and Bayes error

Recommended materials
    Exercises on t-test
    Examples on chi-square tests
    Another exercise on chi-square tests
   – Curse of dimensionality – illustration 
   – FSelector – R package


HW #2 late due date 8/12
Written Test #2 (30 minutes)
HW #3 assignment

11. 13/12
Regularization, Principal Component Analysis

Demo code for 

12. 20/12

Foundations of Neural Networks
    ... and brief remarks on Deep Learning

Learning ensemble models

13. 3/1

Machine Learning for Automatic Machine Translation
– a bonus lecture by guest teacher Ondřej Bojar

Maximum Likelihood Estimation
Summarizing remarks
14. 10/1
Obligatory session: Final Written Test #3  (80 minutes)
– on Jan 10 (Wednesday) in the regular lecture time

Concluding lab lesson
– example solution of HW #3 
– correct solution of Test #3


Recommended readings

  • James, Gareth and Witten, Daniela and Hastie, Trevor and Tibshirani, Robert. An Introduction to Statistical Learning. Springer, New York, 2013. (link)
  • Lantz, Brett. Machine learning with R. Packt Publishing Ltd. 2013. [available in the MFF library]

Introductory readings

  • Alpaydin, Ethem. Introduction to Machine Learning. The MIT Press. 2004, 2010. (link)
  • Domingos, Pedro. A few useful things to know about Machine learning. Communications of the ACM, vol. 55, Issue 10, October 2012, pp. 78--87, ACM, New York, USA. (link)
  • Gonick, Larry and Woollcott Smith. The Cartoon Guide to Statistics. Harper Resource. 2005.
  • Hladká, Barbora and Holub, Martin. A Gentle Introduction to Machine Learning for Natural Language Processing: How to start in 16 practical steps. In: Language and Linguistics Compass, vol. 9, No. 2, pp. 55--76, 2015.
  • Hladká, Barbora and Holub, Martin. Machine Learning in Natural Language Processing using R. Course at ESSLLI 2013, 2013.
  • Kononenko, Igor and Matjaz Kukar. Machine Learning and Data Mining: Introduction to Principles and Algorithms. Horwood Publishing, 2007. (link; a light survey of the whole field)

Advanced readings

  • Baayen, R. Harald. Analyzing Linguistic Data: A Practical Introduction to Statistics using R. Cambridge University Press, 2008.
  • Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.
  • Burges, Christopher J. C. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998. (link)
  • Cristianini, Nello and John Shawe-Taylor. An Introduction to Support Vector Machines and other Kernel-based Learning Methods. Cambridge University Press, 2000.
  • Duda, Richard O., Peter R. Hart and David G. Stork. Pattern Classification. Second Edition. Wiley, 2001.
  • Guyon, Isabelle and Gunn, Steve and Nikravesh, Masoud and Zadeh, Lotfi A. Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing). Springer-Verlag New York, Inc. 2006.
  • Hastie, Trevor, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009. (link)
  • Hsu, Chih-Wei, Chih-Chung Chang and Chih-Jen Lin. A Practical Guide to Support Vector Classification. 2010. (link)

About the R system

  • Everitt, B. S. and Hothorn, Torsten. A Handbook of Statistical Analyses using R. CRC Press. 2010.
  • Dalgaard, Peter. Introductory Statistics with R. Springer, 2008.
  • Kerns, G. Jay. Introduction to Probability and Statistics Using R. 2011. (link)
  • Paradis, Emmanuel. R for Beginners. 2005. (link)
  • Rodríguez, Germán. Introducing R -- Getting started. (link)
  • Venables, W. N., D. M. Smith and the R Core Team. An Introduction to R. (link)
  • Venables, W. N. and B. D. Ripley. Modern Applied Statistics with S. Springer, 2002. (link)

Sample student projects from the past

This course was originally focused on machine learning in natural language processing. To get credit for the lab sessions, students had to complete an experimental project.

  Default Task Default Task Description Nice Student Reports
2014/15 Native Language Identification npfl054-term-project-2014-15.pdf
CUNI report

Reuters-21578 Text Categorization

Default task:
Sentiment analysis task:
2012/13 Word Sense Disambiguation PFL054.project.2012-13.pdf
2011/12 Semantic Pattern Classification PFL054.project.2011-12.specification.pdf
2010/11 Semantic Collocation Recognition PFL054.project.2010-11.pdf,
2009/10 Verb Sense Disambiguation PFL054_2009_10_project.pdf ML_report_Fabian.pdf,
2008/09 Coreference Resolution


2007/08 Named-entity Type Classification PFL054_2007_08_project.pdf Jana.Kravalova-FinalReport.pdf,


Other machine learning courses organized by UFAL

  • NPFL097 Selected problems in machine learning
  • NPFL104 Machine learning exercises 
  • NPFL114 Deep learning

MFF UK Internal Regulations