Be aware that this is an archived page from a previous year. You can visit the current version instead.

Machine Learning for Greenhorns – Winter 2019/20

Machine learning is achieving notable success in solving complex tasks in many fields. This course serves as an introduction to basic machine learning concepts and techniques, focusing both on the theoretical foundations and on the implementation and utilization of machine learning algorithms in the Python programming language. Considerable attention is paid to applying machine learning techniques to practical tasks, in which the students try to devise a solution with the highest possible performance.

Python programming skills are required, together with basic knowledge of probability theory.

About

SIS code: NPFL129
Semester: winter
E-credits: 5
Examination: 2/2 C+Ex
Guarantor: Milan Straka

Timespace Coordinates

  • lecture: Monday 14:00 in S3; the first lecture is on Oct 07
  • practicals: there are two parallel practicals, on Tuesday 9:00 in SU1 and on Tuesday 12:20 in SU2; the first practicals are on Oct 08

Lectures

1. Introduction to Machine Learning Slides PDF Slides linear_regression_manual linear_regression_l2 linear_regression_competition

2. Linear Regression II, SGD, Perceptron Slides PDF Slides feature_engineering linear_regression_sgd perceptron binary_classification_competition

3. Perceptron and Logistic Regression Slides PDF Slides grid_search logistic_regression_sgd mnist_competition

4. Multiclass Logistic Regression, Multilayer Perceptron Slides PDF Slides softmax_classification_sgd mlp_classification_sgd diacritization

5. Derivation of Softmax, Support Vector Machines Slides PDF Slides diacritization_dictionary kernelized_linear_regression

6. Soft-margin SVM, SMO Algorithm, Decision Trees Slides PDF Slides decision_tree_classification

7. SMO Algorithm Slides PDF Slides smo_algorithm svm_multiclass isnt_it_ironic isnt_it_ironic_open

8. Model Combinations, Gradient Boosted Trees, Naive Bayes Slides PDF Slides random_forest gradient_boosting human_activity_recognition

9. Naive Bayes, K-Means, Gaussian Mixture Slides PDF Slides naive_bayes kmeans gaussian_mixture

License

Unless otherwise stated, teaching materials for this course are available under CC BY-SA 4.0.

The lecture content, including references to study materials, is listed below. The main study material is Pattern Recognition and Machine Learning by Christopher Bishop, referred to as PRML.

References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.

1. Introduction to Machine Learning

 Oct 07 Slides PDF Slides linear_regression_manual linear_regression_l2 linear_regression_competition

2. Linear Regression II, SGD, Perceptron

 Oct 14 Slides PDF Slides feature_engineering linear_regression_sgd perceptron binary_classification_competition

3. Perceptron and Logistic Regression

 Oct 21 Slides PDF Slides grid_search logistic_regression_sgd mnist_competition

4. Multiclass Logistic Regression, Multilayer Perceptron

 Nov 11 Slides PDF Slides softmax_classification_sgd mlp_classification_sgd diacritization

5. Derivation of Softmax, Support Vector Machines

 Nov 18 Slides PDF Slides diacritization_dictionary kernelized_linear_regression

6. Soft-margin SVM, SMO Algorithm, Decision Trees

 Nov 25 Slides PDF Slides decision_tree_classification

7. SMO Algorithm

 Dec 02 Slides PDF Slides smo_algorithm svm_multiclass isnt_it_ironic isnt_it_ironic_open

8. Model Combinations, Gradient Boosted Trees, Naive Bayes

 Dec 09 Slides PDF Slides random_forest gradient_boosting human_activity_recognition

9. Naive Bayes, K-Means, Gaussian Mixture

 Dec 16 Slides PDF Slides naive_bayes kmeans gaussian_mixture

Requirements

To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that up to 40 points above 80 will be transferred to the exam.

Environment

The tasks are evaluated automatically using the ReCodEx Code Examiner. The evaluation is performed using Python 3.6, scikit-learn 0.21.3, pandas 0.25.1 and NumPy 1.17.2.

You can install all required packages either to user packages using pip3 install --user scikit-learn==0.21.3 pandas==0.25.1, or create a virtual environment using python3 -m venv VENV_DIR and then installing the packages inside it by running VENV_DIR/bin/pip3 install scikit-learn==0.21.3 pandas==0.25.1.

Teamwork

Working in teams of size 2 (or at most 3) is encouraged. All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. However, each such solution must explicitly list all members of the team to allow plagiarism detection using this template.

linear_regression_manual

 Deadline: Oct 20, 23:59  3 points

Starting with the linear_regression_manual.py template, solve a linear regression problem using the algorithm from the lecture which explicitly computes the matrix inverse. Then compute the root mean square error on the test set.
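For illustration, a minimal sketch of the explicit solution on synthetic data (the dataset, its split, and the output format here are placeholders only; the template prescribes its own):

import numpy as np
import sklearn.datasets
import sklearn.model_selection

# Synthetic data for illustration only; the template loads its own dataset.
X, y = sklearn.datasets.make_regression(n_samples=100, n_features=5, noise=1.0, random_state=42)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.5, random_state=42)

# Append a constant feature so that the bias is part of the weight vector.
X_train = np.concatenate([X_train, np.ones((len(X_train), 1))], axis=1)
X_test = np.concatenate([X_test, np.ones((len(X_test), 1))], axis=1)

# Explicit least-squares solution: w = (X^T X)^{-1} X^T y.
weights = np.linalg.inv(X_train.T @ X_train) @ X_train.T @ y_train

# Root mean square error on the test set.
rmse = np.sqrt(np.mean((X_test @ weights - y_test) ** 2))
print("{:.2f}".format(rmse))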

linear_regression_l2

 Deadline: Oct 20, 23:59  3 points

Starting with the linear_regression_l2.py template, use scikit-learn to train regularized linear regression models and print the results of the best of them.
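A minimal sketch of the intended approach using sklearn.linear_model.Ridge; the dataset and the range of regularization strengths below are illustrative assumptions, the template fixes the actual ones:

import numpy as np
import sklearn.datasets
import sklearn.linear_model
import sklearn.metrics
import sklearn.model_selection

# Illustrative dataset only; the template prescribes its own data and lambda range.
X, y = sklearn.datasets.load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.5, random_state=42)

best_rmse, best_lambda = None, None
for l2 in np.geomspace(0.01, 100, num=50):
    # Ridge implements linear regression with L2 regularization strength alpha.
    model = sklearn.linear_model.Ridge(alpha=l2).fit(X_train, y_train)
    rmse = np.sqrt(sklearn.metrics.mean_squared_error(y_test, model.predict(X_test)))
    if best_rmse is None or rmse < best_rmse:
        best_rmse, best_lambda = rmse, l2

print("Best lambda {:.2f} with test RMSE {:.2f}".format(best_lambda, best_rmse))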

linear_regression_competition

 Deadline: Oct 20, 23:59  3 points+5 bonus

This assignment is a competition task. Your goal is to perform linear regression on data from a rental shop. The train set contains 1000 instances, each consisting of 12 features, both integral and real.

The linear_regression_competition.py template shows how to load the linear_regression_competition.train.npz dataset available in the repository. Furthermore, it shows how to save a trained estimator, how to load it, and the recodex_predict method which is called during ReCodEx evaluation.

The performance of your system is measured using root mean squared error and your goal is to achieve RMSE less than 130. Note that you can use any sklearn algorithm to solve this exercise.

feature_engineering

 Deadline: Nov 03, 23:59  4 points

Starting with the feature_engineering.py template, learn how to perform basic feature engineering using scikit-learn.
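A minimal sketch of the kind of preprocessing pipeline the assignment targets, assuming the first column is categorical and the remaining ones are real-valued (the template specifies the actual column types and transformations):

import numpy as np
import sklearn.compose
import sklearn.pipeline
import sklearn.preprocessing

# Toy data: the first column is categorical, the remaining two are real-valued.
# Which columns are which is an assumption here; the template defines it.
X = np.array([[0, 1.0, 2.5],
              [1, 0.5, 1.0],
              [2, 3.0, 0.5],
              [0, 2.0, 4.0]])

preprocessing = sklearn.pipeline.Pipeline([
    ("transform", sklearn.compose.ColumnTransformer([
        # One-hot encode the categorical column.
        ("onehot", sklearn.preprocessing.OneHotEncoder(categories="auto", sparse=False), [0]),
        # Standardize the real-valued columns to zero mean and unit variance.
        ("scale", sklearn.preprocessing.StandardScaler(), [1, 2]),
    ])),
    # Finally, generate polynomial features of degree 2 from the result.
    ("poly", sklearn.preprocessing.PolynomialFeatures(degree=2, include_bias=False)),
])

print(preprocessing.fit_transform(X))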

linear_regression_sgd

 Deadline: Nov 03, 23:59  5 points

Starting with the linear_regression_sgd.py, implement minibatch SGD for linear regression. Evaluate it using cross-validation and compare the results to an explicit linear regression solver.
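A minimal sketch of minibatch SGD for linear regression on synthetic data (the data, batch size, learning rate, and number of epochs are assumptions; the template prescribes its own):

import numpy as np
import sklearn.datasets
import sklearn.model_selection

# Synthetic data for illustration; the template uses its own dataset.
X, y = sklearn.datasets.make_regression(n_samples=200, n_features=10, noise=1.0, random_state=42)
X = np.concatenate([X, np.ones((len(X), 1))], axis=1)  # bias feature
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.5, random_state=42)

generator = np.random.RandomState(42)
weights = generator.uniform(size=X_train.shape[1])
batch_size, learning_rate = 10, 0.01

for epoch in range(100):
    # Process the training data in randomly permuted minibatches.
    permutation = generator.permutation(len(X_train))
    for batch in permutation.reshape(-1, batch_size):
        X_batch, y_batch = X_train[batch], y_train[batch]
        # Gradient of the MSE loss with respect to the weights.
        gradient = X_batch.T @ (X_batch @ weights - y_batch) / len(batch)
        weights -= learning_rate * gradient

rmse = np.sqrt(np.mean((X_test @ weights - y_test) ** 2))
print("Test RMSE: {:.2f}".format(rmse))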

perceptron

 Deadline: Nov 03, 23:59  3 points

Starting with the perceptron.py template, implement the perceptron algorithm.
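A minimal sketch of the perceptron update rule on synthetic, (nearly) linearly separable data; the template uses its own data generation and stopping condition:

import numpy as np
import sklearn.datasets

# Synthetic two-class data for illustration only.
X, y = sklearn.datasets.make_classification(
    n_samples=100, n_features=2, n_informative=2, n_redundant=0,
    class_sep=2.0, flip_y=0, random_state=42)
X = np.concatenate([X, np.ones((len(X), 1))], axis=1)  # bias feature
y = 2 * y - 1  # map labels {0, 1} to {-1, +1}

weights = np.zeros(X.shape[1])
for epoch in range(100):
    updated = False
    for x_i, y_i in zip(X, y):
        # Update the weights whenever an example is classified incorrectly.
        if y_i * (x_i @ weights) <= 0:
            weights += y_i * x_i
            updated = True
    if not updated:
        break  # all training examples are classified correctly

print("Training accuracy: {:.1f}%".format(100 * np.mean(np.sign(X @ weights) == y)))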

binary_classification_competition

 Deadline: Nov 03, 23:59  4 points+5 bonus

This assignment is a competition task. Your goal is to perform binary classification on contract approval data. The train set contains 20k instances, each consisting of 14 features, both integral and categorical. Evaluation is performed on 15k examples.

The binary_classification_competition.py template shows how to load the binary_classification_competition.train.csv.xz training data, downloading it if needed. Note that the input is in CSV format and we load it using pandas as a pandas.DataFrame, which is a two-dimensional heterogeneous array with labeled columns. Furthermore, the template shows how to save a compressed trained estimator and how to load it in the recodex_predict method.

The performance of your system is measured using accuracy and your goal is to achieve at least 83%. Note that you can use any sklearn algorithm to solve this exercise, except for AdaBoost and gradient boosting.

grid_search

 Deadline: Nov 10, 23:59  3 points

Starting with the grid_search.py template, perform a hyperparameter grid search, evaluating hyperparameter performance using stratified k-fold cross-validation, and finally evaluate a model trained with the best hyperparameters on all training data. The easiest way is to utilize sklearn.model_selection.GridSearchCV.
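A sketch of the suggested GridSearchCV usage; the dataset, pipeline, and hyperparameter grid below are illustrative assumptions, the template prescribes the actual ones:

import sklearn.datasets
import sklearn.linear_model
import sklearn.model_selection
import sklearn.pipeline
import sklearn.preprocessing

X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.5, random_state=42)

pipeline = sklearn.pipeline.Pipeline([
    ("scaler", sklearn.preprocessing.MinMaxScaler()),
    ("poly", sklearn.preprocessing.PolynomialFeatures()),
    ("lr", sklearn.linear_model.LogisticRegression(
        solver="lbfgs", multi_class="auto", max_iter=1000, random_state=42)),
])

# Stratified k-fold cross-validation over the hyperparameter grid.
grid = sklearn.model_selection.GridSearchCV(
    pipeline,
    param_grid={"poly__degree": [1, 2], "lr__C": [0.01, 1, 100]},
    cv=sklearn.model_selection.StratifiedKFold(n_splits=5))
grid.fit(X_train, y_train)

# GridSearchCV refits the model with the best hyperparameters on all training data.
print("Best parameters:", grid.best_params_)
print("Test accuracy: {:.1f}%".format(100 * grid.score(X_test, y_test)))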

logistic_regression_sgd

 Deadline: Nov 10, 23:59  5 points

Starting with the logistic_regression_sgd.py, implement minibatch SGD for logistic regression.
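A minimal sketch of binary logistic regression trained by minibatch SGD; the data generation and hyperparameters are assumptions, the template prescribes its own:

import numpy as np
import sklearn.datasets
import sklearn.model_selection

# Synthetic binary classification data for illustration.
X, y = sklearn.datasets.make_classification(n_samples=200, n_features=4, random_state=42)
X = np.concatenate([X, np.ones((len(X), 1))], axis=1)  # bias feature
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.5, random_state=42)

generator = np.random.RandomState(42)
weights = generator.uniform(size=X_train.shape[1])
batch_size, learning_rate = 10, 0.5

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for epoch in range(50):
    permutation = generator.permutation(len(X_train))
    for batch in permutation.reshape(-1, batch_size):
        X_batch, y_batch = X_train[batch], y_train[batch]
        # Gradient of the NLL (cross-entropy) loss of the sigmoid model.
        gradient = X_batch.T @ (sigmoid(X_batch @ weights) - y_batch) / len(batch)
        weights -= learning_rate * gradient

predictions = sigmoid(X_test @ weights) >= 0.5
print("Test accuracy: {:.1f}%".format(100 * np.mean(predictions == y_test)))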

mnist_competition

 Deadline: Nov 10, 23:59  6 points+6 bonus

This assignment is a competition task. Your goal is to perform 10-class classification on the well-known MNIST dataset. The train set contains 60k images, each consisting of 28×28 pixels with values in {0, 1, …, 255}. Evaluation is performed on 10k test images.

The mnist_competition.py template shows how to load the mnist.train.npz training data, downloading it if needed. Furthermore, the template shows how to save a compressed trained estimator and how to load it in the recodex_predict method.

The performance of your system is measured using accuracy and your goal is to achieve at least 94%. Note that you can use any sklearn algorithm to solve this exercise.

softmax_classification_sgd

 Deadline: Nov 24, 23:59  4 points

Starting with the softmax_classification_sgd.py, implement minibatch SGD for multinomial logistic regression.

mlp_classification_sgd

 Deadline: Nov 24, 23:59  6 points

Starting with the mlp_classification_sgd.py, implement minibatch SGD for multilayer perceptron classification.

diacritization

 Deadline: Nov 24, 23:59  5 points+6 bonus

The goal of the diacritization competition is to learn to add diacritics to given Czech text. We will use a small collection of fiction books, which is available under the CC BY-NC-SA license. Note that these texts are the only allowed training data; you must not use any other Czech texts to train or evaluate your model. At test time, you will be given a text without diacritics and you should return it including diacritical marks – to be explicit, we only consider diacritized letters áčďéěíňóřšťúůýž and their uppercase variants.

The diacritization.py template shows how to load the training data (downloading it if needed), how to save a compressed trained estimator, and how to load it in the recodex_predict method.

Each sentence in the data is stored on a single line, with exactly one space character separating input words. The performance of your system is measured using word accuracy (the percentage of words you diacritized correctly, as computed by the diacritization_eval.py script) and your goal is to achieve at least 86.5%. You can use any sklearn algorithm which does not use decision trees to solve this assignment (so no random forests, extra trees, gradient boosting, AdaBoost, …).

diacritization_dictionary

 Deadline: Dec 01, 23:59  4 points+5 bonus

The diacritization_dictionary is an extension of the diacritization competition. In addition to the original training data, in this task you can also use a dictionary providing all known diacritized variants of all word forms present in the training and testing data, available again under CC BY-NC-SA license.

The rules of the competition are the same as those of the diacritization competition, except that

  • you can utilize the dictionary, both during training and inference;
  • in order to pass, you need to achieve at least 95% word accuracy.

The diacritization_dictionary.py module provides a Dictionary class, which loads the dictionary (downloading it if necessary), exposing it in the Dictionary.variants field as a mapping from an undiacritized word form to a list of known diacritized variants.

Note that the fiction-dictionary.txt is available during ReCodEx evaluation, so you must not submit it to ReCodEx.

kernelized_linear_regression

 Deadline: Dec 01, 23:59  5 points

Starting with the kernelized_linear_regression.py, implement kernelized linear regression training using SGD on the dual formulation. You should support polynomial and Gaussian kernels and also L2 regularization. More details are present in the template.
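For illustration, the kernels themselves might look as follows (a sketch only; the exact parametrization of the polynomial kernel and the meaning of the command-line options are given in the template):

import numpy as np

def polynomial_kernel(x, z, gamma=1.0, degree=3):
    # One common parametrization: k(x, z) = (gamma * x^T z + 1)^degree.
    return (gamma * np.dot(x, z) + 1) ** degree

def gaussian_kernel(x, z, gamma=1.0):
    # k(x, z) = exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

# In the dual formulation, a prediction is a kernel-weighted combination of
# the training examples: y(x) = sum_i beta_i * k(x, x_i) + bias.
train_data = np.linspace(-1, 1, 5)
betas, bias = np.array([0.5, -0.2, 0.1, 0.0, 0.3]), 0.1
x = 0.3
prediction = sum(beta * gaussian_kernel(np.array([x]), np.array([x_i]))
                 for beta, x_i in zip(betas, train_data)) + bias
print("{:.4f}".format(prediction))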

  • --examples=50 --kernel=rbf --kernel_gamma=1 --iterations=5 --l2=0 --learning_rate=0.05 --seed 42
Iteration 1, train RMSE 0.80, test RMSE 0.68
Iteration 2, train RMSE 0.78, test RMSE 0.66
Iteration 3, train RMSE 0.76, test RMSE 0.63
Iteration 4, train RMSE 0.74, test RMSE 0.61
Iteration 5, train RMSE 0.72, test RMSE 0.59
  • --examples=50 --kernel=rbf --kernel_gamma=2 --iterations=5 --l2=0 --learning_rate=0.05 --seed 42
Iteration 1, train RMSE 0.75, test RMSE 0.63
Iteration 2, train RMSE 0.68, test RMSE 0.56
Iteration 3, train RMSE 0.62, test RMSE 0.49
Iteration 4, train RMSE 0.57, test RMSE 0.44
Iteration 5, train RMSE 0.53, test RMSE 0.40
  • --examples=50 --kernel=rbf --kernel_gamma=1 --iterations=5 --l2=2.0 --learning_rate=0.05 --seed 42
Iteration 1, train RMSE 0.80, test RMSE 0.68
Iteration 2, train RMSE 0.78, test RMSE 0.66
Iteration 3, train RMSE 0.76, test RMSE 0.64
Iteration 4, train RMSE 0.75, test RMSE 0.62
Iteration 5, train RMSE 0.74, test RMSE 0.61
  • --examples=50 --kernel=poly --kernel_gamma=1 --iterations=5 --l2=0 --learning_rate=0.01 --seed 42
Iteration 1, train RMSE 0.75, test RMSE 0.82
Iteration 2, train RMSE 0.70, test RMSE 0.59
Iteration 3, train RMSE 0.66, test RMSE 0.74
Iteration 4, train RMSE 0.63, test RMSE 0.64
Iteration 5, train RMSE 0.61, test RMSE 0.74

decision_tree_classification

 Deadline: Dec 08, 23:59  8 points

Starting with the decision_tree_classification.py, implement construction of a classification decision tree, supporting both gini and entropy criteria, and max_depth, min_to_split and max_leaves constraints.

In this assignment, you will get partial points during ReCodEx evaluation, depending on which argument values your solution supports.
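For illustration, the two splitting criteria can be implemented as follows (a sketch only; the exact definitions and normalization used for grading are those in the template and the lecture slides):

import numpy as np

def gini_criterion(targets):
    # c_Gini = |node| * sum_k p_k * (1 - p_k), where p_k are class frequencies.
    probabilities = np.bincount(targets) / len(targets)
    return len(targets) * np.sum(probabilities * (1 - probabilities))

def entropy_criterion(targets):
    # c_entropy = -|node| * sum_k p_k * log(p_k), ignoring classes with p_k = 0.
    probabilities = np.bincount(targets) / len(targets)
    probabilities = probabilities[probabilities > 0]
    return -len(targets) * np.sum(probabilities * np.log(probabilities))

# A candidate split is scored by the criterion decrease
# c(node) - c(left child) - c(right child); the best split maximizes it.
targets = np.array([0, 0, 0, 1, 1, 2])
left, right = targets[:3], targets[3:]
print(gini_criterion(targets) - gini_criterion(left) - gini_criterion(right))
print(entropy_criterion(targets) - entropy_criterion(left) - entropy_criterion(right))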

  • --criterion=gini --max_depth=2 --seed=42
Train acc: 94.1%
Test acc: 85.7%

Decision tree visualization

  • --criterion=gini --min_to_split=49 --seed=42
Train acc: 97.8%
Test acc: 95.2%

Decision tree visualization

  • --criterion=gini --max_leaves=4 --seed=42
Train acc: 97.1%
Test acc: 95.2%

Decision tree visualization

  • --criterion=entropy --max_leaves=5 --seed=42
Train acc: 97.8%
Test acc: 88.1%

Decision tree visualization

smo_algorithm

 Deadline: Dec 15, 23:59  7 points

The comments in the template were changed on Dec 08.

Using the smo_algorithm.py template, implement the SMO algorithm for binary classification using dual formulation of soft-margin SVM.

The example outputs were changed on Dec 07.

  • --C 1 --kernel linear --tolerance 0.0001 --examples 160 --num_passes 5 --seed 007 --test_ratio 0.625
Train acc 75.0%, test acc 67.0%
Train acc 70.0%, test acc 62.0%
Train acc 53.3%, test acc 55.0%
Train acc 56.7%, test acc 55.0%
Train acc 85.0%, test acc 77.0%
...
Train acc 85.0%, test acc 74.0%

SMO algorithm visualization

  • --C 1 --kernel rbf --kernel_gamma=1 --tolerance 0.0001 --examples 160 --num_passes 5 --seed 007 --test_ratio 0.625
Train acc 70.0%, test acc 56.0%
Train acc 91.7%, test acc 80.0%
Train acc 91.7%, test acc 82.0%
Train acc 91.7%, test acc 78.0%
Train acc 95.0%, test acc 74.0%
...
Train acc 93.3%, test acc 75.0%

SMO algorithm visualization

  • --C 1 --kernel rbf --kernel_gamma=0.1 --tolerance 0.0001 --examples 160 --num_passes 5 --seed 007 --test_ratio 0.625
Train acc 86.7%, test acc 76.0%
Train acc 80.0%, test acc 73.0%
Train acc 83.3%, test acc 75.0%
Train acc 81.7%, test acc 79.0%
Train acc 85.0%, test acc 73.0%
...
Train acc 85.0%, test acc 75.0%

SMO algorithm visualization

  • --C 1 --kernel poly --kernel_degree=3 --kernel_gamma=1 --tolerance 0.0001 --examples 160 --num_passes 5 --seed 007 --test_ratio 0.625
Train acc 73.3%, test acc 62.0%
Train acc 51.7%, test acc 51.0%
Train acc 63.3%, test acc 60.0%
Train acc 66.7%, test acc 63.0%
Train acc 85.0%, test acc 73.0%
...
Train acc 88.3%, test acc 73.0%

SMO algorithm visualization

  • --C 5 --kernel poly --kernel_degree=3 --kernel_gamma=1 --tolerance 0.0001 --examples 160 --num_passes 5 --seed 007 --test_ratio 0.625
Train acc 73.3%, test acc 62.0%
Train acc 51.7%, test acc 51.0%
Train acc 63.3%, test acc 60.0%
Train acc 66.7%, test acc 63.0%
Train acc 65.0%, test acc 62.0%
...
Train acc 91.7%, test acc 74.0%

SMO algorithm visualization

  • --C 1 --kernel poly --kernel_degree=3 --kernel_gamma=1 --tolerance 0.01 --examples 160 --num_passes 5 --seed 007 --test_ratio 0.625
Train acc 86.7%, test acc 73.0%
Train acc 56.7%, test acc 66.0%
Train acc 66.7%, test acc 63.0%
Train acc 70.0%, test acc 56.0%
Train acc 68.3%, test acc 73.0%
...
Train acc 78.3%, test acc 68.0%

SMO algorithm visualization

svm_multiclass

 Deadline: Dec 22, 23:59  3 points

Extend a solution to the smo_algorithm assignment to handle multiclass classification, using the svm_multiclass.py template.

  • --classes=4 --C=1 --kernel=linear --test_size=640 --num_passes=5 --seed=42 --tolerance=0.0001
96.72
  • --classes=3 --C=1 --kernel=rbf --kernel_gamma=1 --test_size=339 --num_passes=5 --seed=42 --tolerance=0.0001
99.12
  • --classes=5 --C=1 --kernel=poly --kernel_degree=3 --kernel_gamma=1 --test_size=701 --num_passes=5 --seed=42 --tolerance=0.0001
98.72
  • --classes=10 --C=2 --kernel=rbf --kernel_gamma=1 --test_size=1547 --num_passes=5 --seed=42 --tolerance=0.0001
91.73

isnt_it_ironic

 Deadline: Dec 15, 23:59  7 points+7 bonus

The goal of the isnt_it_ironic competition is to learn to classify given texts as ironic or not.

The isnt_it_ironic.py template shows how to load the training data (downloading it if needed), how to save a compressed trained estimator, and how to load it in the recodex_predict method.

The data are a list of strings, each representing one text. The texts have already been tokenized, and tokens are separated by exactly one space. The performance of your solution will be evaluated using the F1 score with sklearn.metrics.f1_score, and if you achieve at least 55%, you will obtain 7 points. Note that you can use any sklearn algorithm to solve this exercise (or anything you implement by yourself).
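For illustration, a simple baseline could combine TF-IDF features with a linear classifier and evaluate itself with sklearn.metrics.f1_score; the toy texts and the particular pipeline below are assumptions, not the reference solution:

import sklearn.feature_extraction.text
import sklearn.linear_model
import sklearn.metrics
import sklearn.pipeline

# Toy stand-ins for the texts and labels loaded by the template.
train_texts = [
    "what a lovely monday morning",
    "great , my train is late again",
    "i enjoyed the concert yesterday",
    "oh fantastic , another meeting",
]
train_labels = [0, 1, 0, 1]

model = sklearn.pipeline.Pipeline([
    # The texts are already tokenized, so splitting on whitespace suffices.
    ("tfidf", sklearn.feature_extraction.text.TfidfVectorizer(lowercase=False, token_pattern=r"\S+")),
    ("classifier", sklearn.linear_model.LogisticRegression(solver="lbfgs")),
])
model.fit(train_texts, train_labels)

predictions = model.predict(train_texts)
print("Train F1: {:.1f}%".format(100 * sklearn.metrics.f1_score(train_labels, predictions)))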

isnt_it_ironic_open

 Deadline: Jan 05, 23:59  up to 10 bonus points

This is an extension of the isnt_it_ironic competition into a so-called open track. The differences against the original competition are:

  • the general idea is to achieve the highest score, ideally state-of-the-art
  • you can use any framework and any algorithm, including code you did not write yourself (but of course do not share code outside teams)
  • you can use any unannotated data
  • use only the manual annotations (i.e., labels of irony) from the training data, do not search for or create more

The test set tweets without annotations are available here. They can be loaded for example using the Dataset class from the isnt_it_ironic.py template.

Your submission in ReCodEx should consist of:

  • exactly one .txt file consisting of the test set annotations, containing on every line either a number 0 or a number 1;
  • at least one .py file with the implementation you utilized to generate the test set annotations (this file is not executed, it is only for me to understand what approach you took)

Note that the baseline is now 65%, which is a bit more than the best solution of isnt_it_ironic.

random_forest

 Deadline: Jan 05, 23:59  3 points

Using the random_forest.py template, train a random forest, which is a collection of decision trees trained with dataset bagging and random feature subsampling.
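The two sources of randomness might be implemented roughly as follows (a sketch; the template prescribes the exact random-number calls so that outputs match the ReCodEx reference):

import numpy as np

generator = np.random.RandomState(42)

def bootstrap_dataset(train_data, train_targets):
    # Dataset bagging: sample len(train_data) examples with replacement.
    indices = generator.randint(len(train_data), size=len(train_data))
    return train_data[indices], train_targets[indices]

def subsample_features(number_of_features, feature_subsampling):
    # Random feature subsampling: when searching for the best split of a node,
    # consider each feature independently with the given probability.
    mask = generator.uniform(size=number_of_features) <= feature_subsampling
    return np.where(mask)[0]

train_data = np.arange(20).reshape(10, 2)
train_targets = np.arange(10) % 2
print(bootstrap_dataset(train_data, train_targets))
print(subsample_features(number_of_features=8, feature_subsampling=0.5))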

  • --n_estimators=3 --max_depth=2 --seed=42
Train acc: 94.1%
Test acc: 85.7%

Random forest visualization

  • --bootstrapping --n_estimators=3 --max_depth=2 --seed=42
Train acc: 89.0%
Test acc: 88.1%

Random forest visualization

  • --feature_subsampling 0.5 --n_estimators=3 --max_depth=2 --seed=42
Train acc: 94.1%
Test acc: 90.5%

Random forest visualization

  • --bootstrapping --feature_subsampling 0.5 --n_estimators=3 --max_depth=2 --seed=42
Train acc: 90.4%
Test acc: 92.9%

Random forest visualization

gradient_boosting

 Deadline: Jan 05, 23:59  5 points

Using the gradient_boosting.py template, train a gradient boosted decision tree forest for classification.

  • --max_depth=1 --learning_rate=0.2 --n_estimators=1
Train acc: 93.4%
Test acc: 92.9%

Gradient boosted trees visualization

  • --max_depth=1 --learning_rate=0.2 --n_estimators=2
Train acc: 96.3%
Test acc: 95.2%

Gradient boosted trees visualization

  • --max_depth=1 --learning_rate=0.2 --n_estimators=3
Train acc: 99.3%
Test acc: 97.6%

Gradient boosted trees visualization

  • --max_depth=2 --learning_rate=0.5 --l2=0.1 --n_estimators=3
Train acc: 100.0%
Test acc: 100.0%

Gradient boosted trees visualization

human_activity_recognition

 Deadline: Jan 05, 23:59  4 points+4 bonus

This assignment is a competition task. Your goal is to perform human activity recognition, namely to recognize one of five actions (walking, standing, sitting, standing up, sitting down) using data from four accelerometers.

The human_activity_recognition.py template loads the training data, downloading it if needed, and shows how to save a compressed trained estimator and how to load it in the recodex_predict method.

Your model will be evaluated using accuracy and your goal is to achieve at least 99%. Note that you can use any sklearn algorithm to solve this exercise.

naive_bayes

 Deadline: Jan 12, 23:59  4 points

Using the naive_bayes.py template, train a naive Bayes classifier. Implement Gaussian NB, multinomial NB, and Bernoulli NB classifiers.

  • --naive_bayes_type=gaussian --gaussian_var_epsilon=1e-6 --seed=42
Test accuracy 55.35%
  • --naive_bayes_type=multinomial --alpha=5 --seed=42
Test accuracy 64.53%
  • --naive_bayes_type=bernoulli --alpha=0.5 --seed=42
Test accuracy 63.59%

kmeans

 Deadline: Jan 12, 23:59  3 points

Using the kmeans.py template, implement the k-means algorithm. After a given number of iterations, print out the predicted cluster for every data point.
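A minimal sketch of the k-means iterations on synthetic data; the template prescribes its own data generation, center initialization, and output format:

import numpy as np
import sklearn.datasets

# Synthetic clustered data for illustration only.
generator = np.random.RandomState(44)
data, _ = sklearn.datasets.make_blobs(n_samples=90, centers=3, random_state=44)

clusters = 3
# Initialize the cluster centers to randomly chosen input examples.
centers = data[generator.choice(len(data), size=clusters, replace=False)]

for iteration in range(3):
    # Assignment step: assign every point to its nearest center.
    distances = np.linalg.norm(data[:, np.newaxis] - centers[np.newaxis, :], axis=2)
    assignments = np.argmin(distances, axis=1)
    # Update step: move every center to the mean of the points assigned to it.
    for c in range(clusters):
        if np.any(assignments == c):
            centers[c] = data[assignments == c].mean(axis=0)

# Print the predicted cluster for every data point.
for assignment in assignments:
    print(assignment)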

  • --clusters=3 --examples 90 --iterations 3 --seed 44
1
1
2
1
2
2
2
2
1
2
0
0
1
2
1
...

KMeans visualization

  • --clusters=5 --examples 140 --iterations 6 --seed 44
2
3
2
3
2
0
2
2
1
0
3
4
0
3
1
...

KMeans visualization

gaussian_mixture

 Deadline: Jan 12, 23:59  4 points

Cluster the given input by fitting a Gaussian mixture using the gaussian_mixture.py template. After every iteration, print out the current negative log likelihood of the model.

  • --clusters=3 --examples 90 --iterations 8 --seed 44
Loss after iteration 1: 387.6
Loss after iteration 2: 384.5
Loss after iteration 3: 382.0
Loss after iteration 4: 379.5
Loss after iteration 5: 376.8
Loss after iteration 6: 373.6
Loss after iteration 7: 367.3
Loss after iteration 8: 358.7

Gaussian Mixture visualization

  • --clusters=5 --examples 135 --iterations 10 --seed 44
Loss after iteration 1: 620.5
Loss after iteration 2: 612.4
Loss after iteration 3: 608.8
Loss after iteration 4: 606.7
Loss after iteration 5: 605.1
Loss after iteration 6: 603.6
Loss after iteration 7: 602.1
Loss after iteration 8: 600.5
Loss after iteration 9: 599.0
Loss after iteration 10: 597.2

Gaussian Mixture visualization

In the competitions, your goal is to train a model and then predict target values on the test set available only in ReCodEx.

Submitting to ReCodEx

When submitting a competition solution to ReCodEx, you can include any number of files of any kind. However, they must include exactly one Python source (.py file) containing a top-level recodex_predict method. This method is called with the test input data (either a NumPy array or a pandas DataFrame) and should return the predictions, again as a NumPy array.

If your submission contains one or more trained models, you should also submit the Python sources you used to train them.
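A hypothetical submission skeleton following the protocol above (the model file name and the use of lzma-compressed pickle are assumptions; the competition templates show the exact way the estimator is saved and loaded):

import lzma
import pickle

import numpy as np

def recodex_predict(data):
    # `data` is the test input, either a NumPy array or a pandas DataFrame
    # depending on the competition; predictions are returned as a NumPy array.
    with lzma.open("model.pickle.xz", "rb") as model_file:
        model = pickle.load(model_file)
    return np.asarray(model.predict(data))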

Evaluation in ReCodEx

ReCodEx starts the evaluation by importing all Python sources and checking whether they export a recodex_predict method. Then it calls the method, evaluates the predictions, and returns one of the following results:

  • Failed, 0%: Either there was not exactly one Python source with recodex_predict, or it crashed during prediction, or it generated an output with incorrect size.
  • OK, 1-99%: Output of correct size was returned by recodex_predict, but it did not achieve required performance. The percentage returned is either achieved/required or required/achieved (depending on whether the goal is to get over/under the requirement). No points are awarded.
  • OK, 100%: The required level of performance was reached; however, the exact performance is unknown.

After the deadline, the exact performance becomes visible for all submissions.

Note that in any case, the exit code of your solution is reported as 0.

Points for Competition Submission

Everyone surpassing the required performance immediately gets the regular points for the assignment.

Furthermore, after the deadline, the latest submission of every user passing the required baseline is collected, and bonus points are awarded depending on relative ordering of performance of the selected submissions.

What Is Allowed

  • You can use only the given annotated data for training. However, you can generally use any unannotated or manually created data.
  • Do not use test set annotations in any way.
  • Unless stated otherwise, you can use any algorithm present in numpy or scipy, anything you implement yourself, and any pre/post-processing in sklearn. Do not use deep network frameworks like TensorFlow or PyTorch.

Requirements

To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that up to 40 points above 80 will be transferred to the exam.

To pass the exam, you need to obtain at least 60, 75, or 90 points out of the 100-point exam to obtain grades 3, 2, or 1, respectively. The exam consists of five 20-point questions; in addition, you can get at most 40 surplus points from the practicals and at most 10 points for community work (i.e., fixing slides or reporting issues).

Exam Questions

  • Linear Regression Define the linear regression model, error function used, and derive an algorithm which can analytically find the best fit. Also describe an extension with L2 regularization (error function and an algorithm).

  • Gradient Descent Describe gradient descent, its variants (GD, SGD, minibatch SGD), and derive minibatch SGD algorithm for fitting L2 regularized linear regression.

  • Perceptron Formulate binary classification, the perceptron algorithm, and its SGD formulation (i.e., which loss gives rise to the algorithm).

  • Information Theory, MLE Define self-information, entropy, cross-entropy and Kullback-Leibler divergence. Prove the Gibbs inequality, i.e., that cross-entropy is larger than or equal to entropy. Finally, describe maximum likelihood estimation as minimizing NLL, cross-entropy and KL divergence.

  • Maximum Likelihood Estimation, MSE Describe maximum likelihood estimation as minimizing NLL, cross-entropy and KL divergence. Define mean squared error and show how it can be derived using MLE.

  • Logistic Regression, SGD Formulate binary classification and logistic regression model including the sigmoid function. Describe how we can interpret the model outputs as logits. Finally, write down a training algorithm based on SGD and explicitly compute the gradient of the loss function.

  • Multiclass Logistic Regression, SGD Formulate multiclass classification and multiclass logistic regression model including the softmax function. Describe how we can interpret the model outputs as logits. Finally, write down a training algorithm based on SGD and explicitly compute the gradient of the loss function.

  • Multiclass Logistic Regression, Decision Regions Formulate multiclass classification and multiclass logistic regression model including the softmax function. Describe what a logit is. Write down a training algorithm based on SGD. Finally, show that decision regions of the multiclass logistic regression are convex.

  • Multilayer Perceptron, SGD Formulate multiclass classification and multilayer perceptron model, including hidden activation functions (σ, tanh, ReLU) and output activation functions (linear, σ, softmax). Write down a training algorithm based on SGD.

  • Multilayer Perceptron, Universal Approximation Theorem Formulate multiclass classification and multilayer perceptron model, including hidden activation functions (σ, tanh, ReLU) and output activation functions (linear, σ, softmax). Formulate the Universal approximation theorem.

  • Kernelized Linear Regression Formulate what a kernel is, and describe polynomial kernel and Gaussian kernel. Formulate linear regression using kernels and derive a dual formulation of the training algorithm (either as full GD or SGD) and write down how predictions are made.

  • Hard-Margin Support Vector Machines Formulate what a kernel is, and describe polynomial kernel and Gaussian kernel. Formulate kernelized hard-margin support vector machines and write down both the primal formulation (the loss to minimize and the constraints to fulfil) and the dual formulation (again the loss to minimize and the constraints to fulfil). Describe what a support vector is and how a prediction can be made using the support vectors only. Finally, describe the one-versus-rest and one-versus-one schemes for extending the binary SVM classification to a multiclass one.

  • Soft-Margin Support Vector Machines Describe what a kernel is, and describe polynomial kernel and Gaussian kernel. Formulate kernelized soft-margin support vector machines and write down both the primal formulation (the loss to minimize and the constraints to fulfil) and the dual formulation (again the loss to minimize and the constraints to fulfil). Finally, describe what a support vector is, which input examples are support vectors, and how a prediction can be made using the support vectors only.

  • Sequential Minimal Optimization Algorithm Formulate kernelized soft-margin support vector machines and write down the dual formulation of the problem (the loss to minimize and constraints to fulfil). Describe the KKT conditions of the solution and formulate the SMO algorithm (to the level of the Algorithm Sketch section of the slides; the Update Rules section of the slides is not required).

  • Decision Trees Describe a decision tree, and formulate an algorithm to fit a decision tree using the Gini criterion and the entropy criterion. Show that the Gini criterion corresponds to MSE loss and that the entropy criterion corresponds to NLL. Finally, describe the maximum tree depth and maximum number of leaf nodes constraints and how they can be implemented.

  • Random Forests Describe a decision tree, and formulate an algorithm to fit a decision tree using a Gini criterion and entropy criterion. Then, define random forests model, describe bagging and random subset of features procedure, and show a training procedure for random forests.

  • Gradient Boosting Formulate gradient boosting decision trees, show how prediction is made, derive the loss using second-order Taylor expansion, show optimal weight for a node and the resulting splitting criterion. Finally, formulate the training algorithm.

  • Naive Bayes Derive the generic naive Bayes classifier – formulation and prediction. Then, describe Gaussian NB, multinomial NB and Bernoulli NB variants.

  • K-Means Describe clustering and derive the K-Means algorithm – formulate the loss it minimizes and show that the algorithm converges to its local minimum.

  • Gaussian Mixture Describe clustering, formulate the multivariate Gaussian distribution, and write down the Gaussian mixture model and the training algorithm.