Machine Learning for Greenhorns – Winter 2019/20
Machine learning is reaching notable success when solving complex tasks in many fields. This course serves as in introduction to basic machine learning concepts and techniques, focusing both on the theoretical foundation, and on implementation and utilization of machine learning algorithms in Python programming language. High attention is paid to the ability of application of the machine learning techniques on practical tasks, in which the students try to devise a solution with highest performance.
Python programming skills are required, together with basic probability theory knowledge.
About
SIS code: NPFL129
Semester: winter
Ecredits: 5
Examination: 2/2 C+Ex
Guarantor: Milan Straka
Timespace Coordinates
 lecture: the lecture is held on Monday 14:00 in S3; first lecture is on Oct 07
 practicals: there are two parallel practicals, on Tuesday 9:00 in SU1 and on Tuesday 12:20 in SU2; first practicals are on Oct 08
Lectures
1. Introduction to Machine Learning Slides PDF Slides linear_regression_manual linear_regression_l2 linear_regression_competition
2. Linear Regression II, SGD, Perceptron Slides PDF Slides feature_engineering linear_regression_sgd perceptron binary_classification_competition
3. Perceptron and Logistic Regression Slides PDF Slides grid_search logistic_regression_sgd mnist_competition
4. Multiclass Logistic Regression, Multilayer Perceptron Slides PDF Slides softmax_classification_sgd mlp_classification_sgd diacritization
5. Derivation of Softmax, Support Vector Machines Slides PDF Slides diacritization_dictionary kernelized_linear_regression
6. Softmargin SVM, SMO Algorithm, Decision Trees Slides PDF Slides decision_tree_classification
7. SMO Algorithm Slides PDF Slides smo_algorithm svm_multiclass isnt_it_ironic isnt_it_ironic_open
8. Model Combinations, Gradient Boosted Trees, Naive Bayess Slides PDF Slides random_forest gradient_boosting human_activity_recognition
9. Naive Bayes, KMeans, Gaussian Mixture Slides PDF Slides naive_bayes kmeans gaussian_mixture
License
Unless otherwise stated, teaching materials for this course are available under CC BYSA 4.0.
The lecture content, including references to study materials. The main study material is the Pattern Recognition and Machine Learning by Christopher Bishop, referred to as PRML.
References to study materials cover all theory required at the exam, and sometimes even more – the references in italics cover topics not required for the exam.
1. Introduction to Machine Learning
Oct 07 Slides PDF Slides linear_regression_manual linear_regression_l2 linear_regression_competition
2. Linear Regression II, SGD, Perceptron
Oct 14 Slides PDF Slides feature_engineering linear_regression_sgd perceptron binary_classification_competition
3. Perceptron and Logistic Regression
Oct 21 Slides PDF Slides grid_search logistic_regression_sgd mnist_competition
4. Multiclass Logistic Regression, Multilayer Perceptron
Nov 11 Slides PDF Slides softmax_classification_sgd mlp_classification_sgd diacritization
5. Derivation of Softmax, Support Vector Machines
Nov 18 Slides PDF Slides diacritization_dictionary kernelized_linear_regression
6. Softmargin SVM, SMO Algorithm, Decision Trees
Nov 25 Slides PDF Slides decision_tree_classification
7. SMO Algorithm
Dec 02 Slides PDF Slides smo_algorithm svm_multiclass isnt_it_ironic isnt_it_ironic_open
8. Model Combinations, Gradient Boosted Trees, Naive Bayess
Dec 09 Slides PDF Slides random_forest gradient_boosting human_activity_recognition
9. Naive Bayes, KMeans, Gaussian Mixture
Dec 16 Slides PDF Slides naive_bayes kmeans gaussian_mixture
Requirements
To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that up to 40 points above 80 will be transfered to the exam.
Environment
The tasks are evaluated automatically using the ReCodEx Code Examiner. The evaluation is performed using Python 3.6, scikitlearn 0.21.3, pandas 0.25.1 and NumPy 1.17.2.
You can install all required packages either to user packages using
pip3 install user scikitlearn==0.21.3 pandas==0.25.1
,
or create a virtual environment using python3 m venv VENV_DIR
and then installing the packages inside it by running
VENV_DIR/bin/pip3 install scikitlearn==0.21.3 pandas==0.25.1
.
Teamwork
Working in teams of size 2 (or at most 3) is encouraged. All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. However, each such solution must explicitly list all members of the team to allow plagiarism detection using this template.
linear_regression_manual
Deadline: Oct 20, 23:59 3 points
Starting with the linear_regression_manual.py template, solve a linear regression problem using the algoritm from the lecture which explicitly computes the matrix inversion. Then compute root mean square error on the test set.
linear_regression_l2
Deadline: Oct 20, 23:59 3 points
Starting with the linear_regression_l2.py
template, use scikitlearn
to train regularized linear regression models
and print the results of the best of them.
linear_regression_competition
Deadline: Oct 20, 23:59 3 points+5 bonus
This assignment is a competition task. Your goal is to perform linear regression on the data from a rental shop. The train set contains 1000 instances, each instance consists of 12 features, both integral and real.
The linear_regression_competition.py
template shows how to load the
linear_regression_competition.train.npz
available in the repository. Furthermore, it shows how to save a trained
estimator, how to load it, and it shows recodex_predict
method which
is called during ReCodEx evaluation.
The performance of your system is measured using root mean squared error and your goal is to achieve RMSE less than 130. Note that you can use any sklearn algorithm to solve this exercise.
feature_engineering
Deadline: Nov 03, 23:59 4 points
Starting with the feature_engineering.py template, learn how to perform basic feature engineering using scikitlearn.
linear_regression_sgd
Deadline: Nov 03, 23:59 5 points
Starting with the linear_regression_sgd.py, implement minibatch SGD for linear regression. Evaluate it using crossvalidation and compare the results to an explicit linear regression solver.
perceptron
Deadline: Nov 03, 23:59 3 points
Starting with the perceptron.py template, implement the perceptron algorithm.
binary_classification_competition
Deadline: Nov 03, 23:59 4 points+5 bonus
This assignment is a competition task. Your goal is to perform binary classification on the data from contract approval. The train set contains 20k instances, each instance consists of 14 features, both integral and categorical. Evaluation is performed on 15k examples.
The binary_classification_competition.py
template shows how to load the binary_classification_competition.train.csv.xz
training data, downloading it if needed. Note that the input is in
CSV format and we load
it using pandas as a
pandas.DataFrame,
which is a twodimensional heterogeneous array with labeled columns.
Furthermore, the template shows how to save a compressed trained
estimator and how to load it in recodex_predict
method.
The performance of your system is measured using accuracy and your goal is to achieve at least 83%. Note that you can use any sklearn algorithm to solve this exercise, except for AdaBoost and gradient boosting.
grid_search
Deadline: Nov 10, 23:59 3 points
Starting with grid_search.py
template, perform a hyperparameter grid search, evaluating hyperparameter performance
using a stratified kfold crossvalidation, and finally evaluate a model
trained with best hyparparameters on all training data. The easiest way is
to utilize sklearn.model_selection.GridSearchCV
.
logistic_regression_sgd
Deadline: Nov 10, 23:59 5 points
Starting with the logistic_regression_sgd.py, implement minibatch SGD for logistic regression.
mnist_competition
Deadline: Nov 10, 23:59 6 points+6 bonus
This assignment is a competition task. Your goal is to perform 10class classification on the wellknown MNIST dataset. The train set contains 60k images, each consisting of $28×28$ pixels with values in $\{0, 1, …, 255\}$. Evaluation is performed on 10k test images.
The mnist_competition.py
template shows how to load the mnist.train.npz
training data, downloading it
if needed. Furthermore, the template shows how to save a compressed trained
estimator and how to load it in recodex_predict
method.
The performance of your system is measured using accuracy and your goal is to achieve at least 94%. Note that you can use any sklearn algorithm to solve this exercise.
softmax_classification_sgd
Deadline: Nov 24, 23:59 4 points
Starting with the softmax_classification_sgd.py, implement minibatch SGD for multinomial logistic regression.
mlp_classification_sgd
Deadline: Nov 24, 23:59 6 points
Starting with the mlp_classification_sgd.py, implement minibatch SGD for multinomial logistic regression.
diacritization
Deadline: Nov 24, 23:59 5 points+6 bonus
The goal of the diacritization
competition is to learn to add diacritics to
the given Czech text. We will use a small collection of
fiction books,
which is available under CC BYNCSA license.
Note that these texts are the only allowed training data, you cannot use any
other Czech texts to train or evaluate your model. At test time, you will be
given a text without diacritics and you should return it including diacritical
marks – to be explicit, we only consider diacritized letters áčďéěíňóřšťúůýž
and their uppercase variants.
The diacritization.py
template shows how to load the training data, downloading it if needed,
it shows how to save a compressed trained estimator and how to load it in
recodex_predict
method.
Each sentence in the data is stored on a single line, with exactly one space character separating input words. The performance of your system is measured using word accuracy (the percentage of words you diacritized correctly, as computed by the diacritization_eval.py script) and your goal is to achieve at least 86.5%. You can use any sklearn algorithm which does not use decision trees to solve this assignment (so no random forests, extra trees, gradient boosting, AdaBoost, …).
diacritization_dictionary
Deadline: Dec 01, 23:59 4 points+5 bonus
The diacritization_dictionary
is an extension of the diacritization
competition.
In addition to the original training data,
in this task you can also use a dictionary providing all known diacritized
variants
of all word forms present in the training and testing data, available again under
CC BYNCSA license.
The rules of the competition is the same as of the diacritization
competition,
except that
 you can utilize the dictionary, both during training and inference;
 in order to pass, you need to achieve at least 95% word accuracy.
The diacritization_dictionary.py
module provides a Dictionary
class, which loads the dictionary
(downloading it if necessary), exposing it in Dictionary.variants
field
as a mapping from undiacritized word form to a list of known diacritized
variants.
Note that the fictiondictionary.txt
is available during ReCodEx evaluation,
so you must not submit it to ReCodEx.
kernelized_linear_regression
Deadline: Dec 01, 23:59 5 points
Starting with the kernelized_linear_regression.py, implement kernelized linear regression training using SGD on the dual formulation. You should support polynomial and Gaussian kernels and also L2 regularization. More details are present in the template.
examples=50 kernel=rbf kernel_gamma=1 iterations=5 l2=0 learning_rate=0.05 seed 42
Iteration 1, train RMSE 0.80, test RMSE 0.68
Iteration 2, train RMSE 0.78, test RMSE 0.66
Iteration 3, train RMSE 0.76, test RMSE 0.63
Iteration 4, train RMSE 0.74, test RMSE 0.61
Iteration 5, train RMSE 0.72, test RMSE 0.59
examples=50 kernel=rbf kernel_gamma=2 iterations=5 l2=0 learning_rate=0.05 seed 42
Iteration 1, train RMSE 0.75, test RMSE 0.63
Iteration 2, train RMSE 0.68, test RMSE 0.56
Iteration 3, train RMSE 0.62, test RMSE 0.49
Iteration 4, train RMSE 0.57, test RMSE 0.44
Iteration 5, train RMSE 0.53, test RMSE 0.40
examples=50 kernel=rbf kernel_gamma=1 iterations=5 l2=2.0 learning_rate=0.05 seed 42
Iteration 1, train RMSE 0.80, test RMSE 0.68
Iteration 2, train RMSE 0.78, test RMSE 0.66
Iteration 3, train RMSE 0.76, test RMSE 0.64
Iteration 4, train RMSE 0.75, test RMSE 0.62
Iteration 5, train RMSE 0.74, test RMSE 0.61
examples=50 kernel=poly kernel_gamma=1 iterations=5 l2=0 learning_rate=0.01 seed 42
Iteration 1, train RMSE 0.75, test RMSE 0.82
Iteration 2, train RMSE 0.70, test RMSE 0.59
Iteration 3, train RMSE 0.66, test RMSE 0.74
Iteration 4, train RMSE 0.63, test RMSE 0.64
Iteration 5, train RMSE 0.61, test RMSE 0.74
decision_tree_classification
Deadline: Dec 08, 23:59 8 points
Starting with the decision_tree_classification.py,
implement construction of a classification decision tree, supporting both
gini
and entropy
criteria, and max_depth
, min_to_split
and max_leaves
constraints.
In this assignment, you will get partial points during ReCodEx evaluation, depending on which argument values your solution support.
criterion=gini max_depth=2 seed=42
Train acc: 94.1%
Test acc: 85.7%
criterion=gini min_to_split=49 seed=42
Train acc: 97.8%
Test acc: 95.2%
criterion=gini max_leaves=4 seed=42
Train acc: 97.1%
Test acc: 95.2%
criterion=entropy max_leaves=5 seed=42
Train acc: 97.8%
Test acc: 88.1%
smo_algorithm
Deadline: Dec 15, 23:59 7 points
The comments in the template were changed on Dec 08.
Using the smo_algorithm.py template, implement the SMO algorithm for binary classification using dual formulation of softmargin SVM.
The example outputs were changed on Dec 07.
C 1 kernel linear tolerance 0.0001 examples 160 num_passes 5 seed 007 test_ratio 0.625
Train acc 75.0%, test acc 67.0%
Train acc 70.0%, test acc 62.0%
Train acc 53.3%, test acc 55.0%
Train acc 56.7%, test acc 55.0%
Train acc 85.0%, test acc 77.0%
...
Train acc 85.0%, test acc 74.0%
C 1 kernel rbf kernel_gamma=1 tolerance 0.0001 examples 160 num_passes 5 seed 007 test_ratio 0.625
Train acc 70.0%, test acc 56.0%
Train acc 91.7%, test acc 80.0%
Train acc 91.7%, test acc 82.0%
Train acc 91.7%, test acc 78.0%
Train acc 95.0%, test acc 74.0%
...
Train acc 93.3%, test acc 75.0%
C 1 kernel rbf kernel_gamma=0.1 tolerance 0.0001 examples 160 num_passes 5 seed 007 test_ratio 0.625
Train acc 86.7%, test acc 76.0%
Train acc 80.0%, test acc 73.0%
Train acc 83.3%, test acc 75.0%
Train acc 81.7%, test acc 79.0%
Train acc 85.0%, test acc 73.0%
...
Train acc 85.0%, test acc 75.0%
C 1 kernel poly kernel_degree=3 kernel_gamma=1 tolerance 0.0001 examples 160 num_passes 5 seed 007 test_ratio 0.625
Train acc 73.3%, test acc 62.0%
Train acc 51.7%, test acc 51.0%
Train acc 63.3%, test acc 60.0%
Train acc 66.7%, test acc 63.0%
Train acc 85.0%, test acc 73.0%
...
Train acc 88.3%, test acc 73.0%
C 5 kernel poly kernel_degree=3 kernel_gamma=1 tolerance 0.0001 examples 160 num_passes 5 seed 007 test_ratio 0.625
Train acc 73.3%, test acc 62.0%
Train acc 51.7%, test acc 51.0%
Train acc 63.3%, test acc 60.0%
Train acc 66.7%, test acc 63.0%
Train acc 65.0%, test acc 62.0%
...
Train acc 91.7%, test acc 74.0%
C 1 kernel poly kernel_degree=3 kernel_gamma=1 tolerance 0.01 examples 160 num_passes 5 seed 007 test_ratio 0.625
Train acc 86.7%, test acc 73.0%
Train acc 56.7%, test acc 66.0%
Train acc 66.7%, test acc 63.0%
Train acc 70.0%, test acc 56.0%
Train acc 68.3%, test acc 73.0%
...
Train acc 78.3%, test acc 68.0%
svm_multiclass
Deadline: Dec 22, 23:59 3 points
Extend a solution to the smo_algorithm
assignment to handle multiclass
classification, using the svm_multiclass.py
template.
classes=4 C=1 kernel=linear test_size=640 num_passes=5 seed=42 tolerance=0.0001
96.72
classes=3 C=1 kernel=rbf kernel_gamma=1 test_size=339 num_passes=5 seed=42 tolerance=0.0001
99.12
classes=5 C=1 kernel=poly kernel_degree=3 kernel_gamma=1 test_size=701 num_passes=5 seed=42 tolerance=0.0001
98.72
classes=10 C=2 kernel=rbf kernel_gamma=1 test_size=1547 num_passes=5 seed=42 tolerance=0.0001
91.73
isnt_it_ironic
Deadline: Dec 15, 23:59 7 points+7 bonus
The goal of the isnt_it_ironic
competition is to learn to classify given
texts as ironic or not.
The isnt_it_ironic.py
template shows how to load the training data, downloading it if needed,
it shows how to save a compressed trained estimator and how to load it in
recodex_predict
method.
The data are a list of strings, each representing one text. The texts has
already been tokenized and tokens are separated by exactly one space.
The performance of your solution will be evaluated using
F1 score with
sklearn.metrics.f1_score
and if you surpass at least 55%, you will obtain 7 points.
Note that you can use any sklearn algorithm to solve this exercise
(or anything you implement by yourself).
isnt_it_ironic_open
Deadline: Jan 05, 23:59 up to 10 bonus points
This is an extended version of isnt_it_ironic
competition to a socalled
open track. The differences against the original competition are:
 the general idea is to achieve the highest score, ideally stateoftheart
 you can use any framework and any algorithm, including code you did not write yourself (but of course do not share code outside teams)
 you can use any unannotated data
 use only the manual annotations (i.e., labels of irony) from the training data, do not search for or create more
The test set tweets without annotations
are available here.
They can be loaded for example using the Dataset
class from the
isnt_it_ironic.py
template.
Your submission in ReCodEx should consist of:
 exactly one
.txt
file consisting of the test set annotations, containing on every line either a number0
or a number1
;  at least one
.py
file with the implementation you utilized to generate the test set annotations (this file is not executed, it is only for me to understand what approach you took)
Note that the baseline is now 65%, which is a bit more than the best solution
of isnt_it_ironic
.
random_forest
Deadline: Jan 05, 23:59 3 points
Using the random_forest.py template, train a random forest, which is a collection of decision trees trained with dataset bagging and random feature subsampling.
n_estimators=3 max_depth=2 seed=42
Train acc: 94.1%
Test acc: 85.7%
bootstrapping n_estimators=3 max_depth=2 seed=42
Train acc: 89.0%
Test acc: 88.1%
feature_subsampling 0.5 n_estimators=3 max_depth=2 seed=42
Train acc: 94.1%
Test acc: 90.5%
bootstrapping feature_subsampling 0.5 n_estimators=3 max_depth=2 seed=42
Train acc: 90.4%
Test acc: 92.9%
gradient_boosting
Deadline: Jan 05, 23:59 5 points
Using the gradient_boosting.py template, train gradient boosted decision tree forest for classification.
max_depth=1 learning_rate=0.2 n_estimators=1
Train acc: 93.4%
Test acc: 92.9%
max_depth=1 learning_rate=0.2 n_estimators=2
Train acc: 96.3%
Test acc: 95.2%
max_depth=1 learning_rate=0.2 n_estimators=3
Train acc: 99.3%
Test acc: 97.6%
max_depth=2 learning_rate=0.5 l2=0.1 n_estimators=3
Train acc: 100.0%
Test acc: 100.0%
human_activity_recognition
Deadline: Jan 05, 23:59 4 points+4 bonus
This assignment is a competition task. Your goal is to perform human activity recognition, namely to recognize one of five actions (walking, standing, sitting, standing up, sitting down) using data from four accelerometers.
The human_activity_recognition.py
template loads the training data, downloading it if needed, and
shows how to save a compressed trained estimator and how to load it in
recodex_predict
method.
Your model will be evaluated using accuracy and your goal is to achieve at least 99%. Note that you can use any sklearn algorithm to solve this exercise.
naive_bayes
Deadline: Jan 12, 23:59 4 points
Using the naive_bayes.py template, train a naive Bayes classifier. Implement a Gaussian NB, multinomial NB and Bernoulli NB classifiers.
naive_bayes_type=gaussian gaussian_var_epsilon=1e6 seed=42
Test accuracy 55.35%
naive_bayes_type=multinomial alpha=5 seed=42
Test accuracy 64.53%
naive_bayes_type=bernoulli alpha=0.5 seed=42
Test accuracy 63.59%
kmeans
Deadline: Jan 12, 23:59 3 points
Using the kmeans.py template, implement the kmeans algorithm. After a given number of iterations, print out the predicted cluster for every data point.
clusters=3 examples 90 iterations 3 seed 44
1
1
2
1
2
2
2
2
1
2
0
0
1
2
1
...
clusters=5 examples 140 iterations 6 seed 44
2
3
2
3
2
0
2
2
1
0
3
4
0
3
1
...
gaussian_mixture
Deadline: Jan 12, 23:59 4 points
Cluster given input by fitting a Gaussian mixture using the gaussian_mixture.py template. After every iteration, print out the current negative log likelihood of the model.
clusters=3 examples 90 iterations 8 seed 44
Loss after iteration 1: 387.6
Loss after iteration 2: 384.5
Loss after iteration 3: 382.0
Loss after iteration 4: 379.5
Loss after iteration 5: 376.8
Loss after iteration 6: 373.6
Loss after iteration 7: 367.3
Loss after iteration 8: 358.7
clusters=5 examples 135 iterations 10 seed 44
Loss after iteration 1: 620.5
Loss after iteration 2: 612.4
Loss after iteration 3: 608.8
Loss after iteration 4: 606.7
Loss after iteration 5: 605.1
Loss after iteration 6: 603.6
Loss after iteration 7: 602.1
Loss after iteration 8: 600.5
Loss after iteration 9: 599.0
Loss after iteration 10: 597.2
In the competitions, your goal is to train a model and then predict target values on the test set available only in ReCodEx.
Submitting to ReCodEx
When submitting a competition solution to ReCodEx, you can include any
number of files of any kind. However, these should be exactly one
Python source (.py
) containing a toplevel method recodex_predict
.
This method is called with the test input data (either a Numpy array
or pandas DataFrame) and should return the predictions again as a Numpy array.
If your submission contains a trained model(s), you should also submit the Python source you used to train it.
Evaluation in ReCodEx
ReCodEx starts the evaluation by importing all Python sources and checking
if they export recodex_predict
method. Then it executes it, evaluates the
prediction, and returns one of the following results:
 Failed, 0%: Either there was
not exactly one Python source with
recodex_predict
, or it crashed during prediction, or it generated an output with incorrect size.  OK, 199%: Output of correct size
was returned by
recodex_predict
, but it did not achieve required performance. The percentage returned is eitherachieved/required
orrequired/achieved
(depending on whether the goal is to get over/under the requirement). No points are awarded.  OK, 100%: The required level of performance was reached; however, the exact performance is unknown.
After the deadline, the exact performance becomes visible for all submissions.
Note that in any case, the exit code of your solution is reported as 0.
Points for Competition Submission
Everyone surpassing the required performance immediately gets the regular points for the assignment.
Furthermore, after the deadline, the latest submission of every user passing the required baseline is collected, and bonus points are awarded depending on relative ordering of performance of the selected submissions.
What Is Allowed
 You can use only the given annotated data for training. However, you can generally use any unannotated or manually created data.
 Do not use test set annotations in any way.
 Unless stated otherwise, you can use any algorithm present in
numpy
orscipy
, anything you implement yourself, and any pre/postprocessing insklearn
. Do not use deep network frameworks like TensorFlow or PyTorch.
Requirements
To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that up to 40 points above 80 will be transfered to the exam.
To pass the exam, you need to obtain at least 60, 75 and 90 out of 100point exam, to obtain grades 3, 2 and 1, respectively. The exam consists of five 20point questions and in addition, you can get at most 40 surplus points from the practicals and at most 10 points for community work (i.e., fixing slides or reporting issues).
Exam Questions

Linear Regression Define the linear regression model, error function used, and derive an algorithm which can analytically find the best fit. Also describe an extension with L2 regularization (error function and an algorithm).

Gradient Descent Describe gradient descent, its variants (GD, SGD, minibatch SGD), and derive minibatch SGD algorithm for fitting L2 regularized linear regression.

Perceptron Formulate binary classification, perceptron algorithm, its SGD formulation (i.e., which loss gives rise to the algorithm)

Information Theory, MLE Define selfinformation, entropy, crossentropy and KullbackLeibler divergence. Prove Gibbs inequality, i.e., that crossentropy is larger or equal to entropy. Finally describe maximum likelihood estimation, as minimizing NLL, crossentropy and KL divergence.

Maximum Likelihood Estimation, MSE Describe maximum likelihood estimation, as minimizing NLL, crossentropy and KL divergence. Define mean squared error and show how it can be derived using MLE.

Logistic Regression, SGD Formulate binary classification and logistic regression model including the sigmoid function. Describe how we can interpret the model outputs as logits. Finally, write down a training algorithm based on SGD and explicitly compute the gradient of the loss function.

Multiclass Logistic Regression, SGD Formulate multiclass classification and multiclass logistic regression model including the softmax function. Describe how we can interpret the model outputs as logits. Finally, write down a training algorithm based on SGD and explicitly compute the gradient of the loss function.

Multiclass Logistic Regression, Decision Regions Formulate multiclass classification and multiclass logistic regression model including the softmax function. Describe what a logit is. Write down a training algorithm based on SGD. Finally, show that decision regions of the multiclass logistic regression are convex.

Multilayer Perceptron, SGD Formulate multiclass classification and multilayer perceptron model, including hidden activation functions ($\sigma$, tanh, ReLU) and output activation functions (linear, $\sigma$, softmax). Write down a training algorithm based on SGD.

Multilayer Perceptron, Universal Approximation Theorem Formulate multiclass classification and multilayer perceptron model, including hidden activation functions ($\sigma$, tanh, ReLU) and output activation functions (linear, $\sigma$, softmax). Formulate the Universal approximation theorem.

Kernelized Linear Regression Formulate what a kernel is, and describe polynomial kernel and Gaussian kernel. Formulate linear regression using kernels and derive a dual formulation of the training algorithm (either as full GD or SGD) and write down how predictions are made.

HardMargin Support Vector Machines Formulate what a kernel is, and describe polynomial kernel and Gaussian kernel. Formulate kernelized hardmargin support vector machines and write down both the primal formulation (the loss to minimize and the constraints to fulfil) and the dual formulation (again the loss to minimize and the constraints to fulfil). Describe what a support vector is and how a prediction can be made using the support vectors only. Finally, describe the oneversusrest and oneversusone schemes for extending the binary SVM classification to a multiclass one.

SoftMargin Support Vector Machines Describe what a kernel is, and describe polynomial kernel and Gaussian kernel. Formulate kernelized softmargin support vector machines and write down the both the primal formulation (the loss to minimize and the constraints to fulfil) and the dual formulation (again the loss to minimize and the constraints to fulfil). Finally, describe what a support vector is, which input examples are support vectors, and how a prediction can be made using the support vectors only.

Sequential Minimization Optimization Algorithm Formulate kernelized softmargin support vector machines and write down the dual formulation of the problem (the loss to minimize and constraints to fulfil). Describe the KKT conditions of the solution and formulate the SMO algorithm (to the level of $\textit{AlgorithmSketch}$ section of the slides; the $\textit{UpdateRules}$ section of the slides is not required)

Decision Trees Describe a decision tree, and formulate an algorithm to fit a decision tree using a Gini criterion and entropy criterion. Show that the Gini criterion corresponds to MSE loss and that entropy criterion corresponds to NLL. Finally describe the maximum tree depth and maximum number of leaf nodes constraints and how can they be implemented.

Random Forests Describe a decision tree, and formulate an algorithm to fit a decision tree using a Gini criterion and entropy criterion. Then, define random forests model, describe bagging and random subset of features procedure, and show a training procedure for random forests.

Gradient Boosting Formulate gradient boosting decision trees, show how prediction is made, derive the loss using secondorder Taylor expansion, show optimal weight for a node and the resulting splitting criterion. Finally, formulate the training algorithm.

Naive Bayes Derive the generic naive Bayes classifier – formulation and prediction. Then, describe Gaussian NB, multinomial NB and Bernoulli NB variants.

KMeans Describe clustering and derive the KMeans algorithm – formulate the loss it minimizes and show that the algorithm converges to its local minimum.

Gaussian Mixture Describe clustering, formulate multinomial Gaussian distribution and write down the Gaussian mixture model and the training algorithm.