SIS code: 
2/2 C+Ex

Úvod do strojového učení v systému R

Introduction to machine learning in the R system

  • Czech lecture: Tuesday 9:00-10:30
  • Czech lab: Friday 9:00-10:30
  • English lecture: Tuesday 12:20-13:50
  • English lab: Friday 10:40-12:10

Classes will take place remotely on Zoom. Enrolled students will receive the link by email.

Math and programming requirements

Probability and statistics

R programming

  • You can start with a simple tutorial: Tutorial-on-R.2013
  • If you are not familiar with elementary R functions, use the resources listed below.


  Lecture date Lecture Lab
1. Mar 2 Introductory lesson
(will mostly be explained only next time)

Annotation experiment – Demo

Working with R – Tutorial on annotation data analysis
(only Parts I and II today)

2. Mar 9 Data Analysis, Clustering Programming questions, ml-lab.2021-03.12.R
3. Mar 16 Formal introduction to ML (see the presentation above)
Decision Trees – basic structure

Inter-annotator agreement – Cohen's kappa

Tutorial on Decision Trees – simple exercise in R
(only Parts I and II today)

4. Mar 23 Clustering (see the presentation on March 9)
Linear regression
Programming questions, ml-lab.2021-03-26.R
5. Mar 30 Learning Decision Trees & Random Forests
    – Entropy
    – Sample error, generalization error, and overfitting
    – Learning algorithms
No lab session on Apr 2
6.  Apr 6 Logistic regression & Evaluation of binary classifiers Programming questions, ml-lab.2021-09-04.R  
7. Apr 13

Building Decision Trees and Random Forests
(completion of the lecture, see the presentation above)

More on evaluation

Decision Trees, Random Forests, and evaluation
    Example code + illustrations

HW #1 – assignment

8. Apr 20 SVM and ROC curve

Apr 23: Obligatory on-line written Test #1 (30 min)

Homework solution – Random Forests + cross-validation
    Example code  +  illustration

Programming questions, ml-lab2021-04.23.R

9. Apr 27

Ensemble learning – bagging and boosting


Apr 30: HW #1 submission deadline

Boosting Trees
    Example code  +  illustration

HW #2 assignment

10. May 4 Regularization  –  linear and logistic regression  ml-lab.2021-05.07.R
11. May 11

Statistical tests  –  applications in ML
    – general principles
    – t-test and its use
    Example code

Exercise on t-test
    – paired t-test: see the example attached to the lecture

Pearson's chi-squared tests 
    – Example on the goodness-of-fit test
    – Tutorial on the independence test (data: xy.100)

12. May 18 Principal Component Analysis, Naive Bayes
On details of classifier precision and recall and ROC
    Exercises with solutions
    + example code

Prepare for Test #2
13.  May 25

Remarks on Bayes classifier and Bayes error

Fundamentals of Neural Networks
    – single perceptron learning
    – multi-layer perceptron
    – historical remarks

14. Jun 1 Maximum Likelihood Estimation, Course Overview
Jun 2: HW #2 submission deadline (hard)
  Jun 9 In-person examination at Malostranské náměstí 25. Sign up for the exam in SIS.
  Jun 21 In-person examination at Malostranské náměstí 25. Sign up for the exam in SIS.


Recommended readings

  • James, Gareth and Witten, Daniela and Hastie, Trevor and Tibshirani, Robert. An Introduction to Statistical Learning. Springer New York, 2013. (link)
  • Lantz, Brett. Machine learning with R. Packt Publishing Ltd. 2013. [available  in the MFF library]
  • Barbora Hladká, Martin Holub, Vilém Zouhar: A Collection of Machine Learning Exercises

Introductory readings

  • Alpaydin, Ethem. Introduction to Machine Learning. The MIT Press. 2004, 2010. (link)
  • Domingos, Pedro. A Few Useful Things to Know about Machine Learning. Communications of the ACM, vol. 55, issue 10, October 2012, pp. 78-87, ACM, New York, USA. (link)
  • Gonick, Larry and Woollcott Smith. The Cartoon Guide to Statistics. Harper Resource. 2005.
  • Hladká Barbora, Holub Martin: A Gentle Introduction to Machine Learning for Natural Language Processing: How to start in 16 practical steps. In: Language and Linguistics Compass, vol. 9, No. 2, pp. 55-76, 2015.
  • Hladká Barbora, Holub Martin: Machine Learning in Natural Language Processing using R. Course at ESSLLI2013, 2013.
  • Kononenko, Igor and Matjaz Kukar. Machine Learning and Data Mining: Introduction to Principles and Algorithms. Horwood Publishing, 2007. (link; a light survey of the whole field)

Advanced readings

  • Baayen, R. Harald. Analyzing Linguistic Data: A Practical Introduction to Statistics using R. Cambridge University Press, 2008.
  • Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer, 2006.
  • Burges, Christopher J. C. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998. (link)
  • Cristianini, Nello and John Shawe-Taylor. An Introduction to Support Vector Machines and other Kernel-based Learning Methods. Cambridge University Press, 2000.
  • Duda, Richard O., Peter R. Hart and David G. Stork. Pattern Classification. Second Edition. Wiley, 2001.
  • Guyon, Isabelle and Gunn, Steve and Nikravesh, Masoud and Zadeh, Lotfi A. Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing). Springer-Verlag New York, Inc. 2006.
  • Hastie, Trevor, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2009. (link)
  • Hsu, Chih-Wei, Chih-Chung Chang and Chih-Jen Lin. A Practical Guide to Support Vector Classification. 2010. (link)

About the R system

  • Everitt, B. S. and Hothorn, Torsten. A Handbook of Statistical Analyses using R. CRC Press. 2010.
  • Dalgaard, Peter. Introductory Statistics with R. Springer, 2008.
  • Kerns, G. Jay. Introduction to Probability and Statistics Using R. 2011. (link)
  • Paradis, Emmanuel. R for Beginners. 2005. (link)
  • Rodríguez, Germán. Introducing R: Getting Started. (link)
  • Venables, W. N., D. M. Smith and the R Core Team. An Introduction to R. (link)
  • Venables, W. N. and B. D. Ripley. Modern Applied Statistics with S. Springer, 2002. (link)

Sample student projects from the past

This course originally focused on machine learning in natural language processing. To get credit for the lab sessions, students had to complete experimental projects.

  Default Task Default Task Description Nice Student Reports
2014/15 Native Language Identification npfl054-term-project-2014-15.pdf
CUNI report

Reuters-21578 Text Categorization

Default task:
Sentiment analysis task:
2012/13 Word Sense Disambiguation PFL054.project.2012-13.pdf
2011/12 Semantic Pattern Classification PFL054.project.2011-12.specification.pdf
2010/11 Semantic Collocation Recognition PFL054.project.2010-11.pdf
2009/10 Verb Sense Disambiguation PFL054_2009_10_project.pdf ML_report_Fabian.pdf
2008/09 Coreference Resolution
2007/08 Named-entity Type Classification PFL054_2007_08_project.pdf Jana.Kravalova-FinalReport.pdf


Other machine learning courses organized by UFAL

  • NPFL097 Unsupervised Machine Learning in NLP (advanced)
  • NPFL114 Deep learning (introductory)
  • NPFL122 Deep Reinforcement Learning (advanced)
  • NPFL129 Introduction to Machine Learning with Python (introductory)

MFF UK Internal Regulations