Machine Learning for Greenhorns – Winter 2020/21
Machine learning has achieved notable success in solving complex tasks in many fields. This course serves as an introduction to basic machine learning concepts and techniques, focusing both on the theoretical foundation and on the implementation and utilization of machine learning algorithms in the Python programming language. Considerable attention is paid to applying machine learning techniques to practical tasks, in which the students try to devise a solution with the highest performance.
Python programming skills are required, together with basic probability theory knowledge.
About
Official name: Introduction to Machine Learning with Python
SIS code: NPFL129
Semester: winter
E-credits: 5
Examination: 2/2 C+Ex
Guarantor: Milan Straka
Timespace Coordinates
 lecture: the lecture is held on Monday 12:20 (Czech) and 14:00 (English); first lecture is on Oct 05
 practicals: there are two parallel practicals, on Wednesday 10:40 (Czech) and 12:00 (English); first practicals are on Oct 07
Lectures
1. Introduction to Machine Learning Slides PDF Slides CZ Lecture EN Lecture Questions linear_regression_manual linear_regression_features
2. Linear Regression II, SGD Slides PDF Slides CZ Lecture EN Lecture Questions linear_regression_l2 linear_regression_sgd feature_engineering rental_competition
3. Perceptron and Logistic Regression Slides PDF Slides CZ Lecture EN Lecture Questions perceptron logistic_regression_sgd grid_search thyroid_competition
4. Multiclass Logistic Regression, Multilayer Perceptron Slides PDF Slides CZ Lecture EN Lecture Questions softmax_classification_sgd mlp_classification_sgd mnist_competition
5. Derivation of Softmax, kNN Slides PDF Slides CZ Lecture EN Lecture Questions k_nearest_neighbors diacritization
6. Kernel Methods, SVM Slides PDF Slides CZ Lecture EN Lecture Questions kernel_linear_regression diacritization_dictionary
7. Soft-margin SVM, SMO Slides PDF Slides CZ Lecture EN Lecture Questions smo_algorithm svm_multiclass
8. SVR, Kernel Approximation, Naive Bayes Slides PDF Slides CZ Lecture EN Lecture Questions naive_bayes isnt_it_ironic kernel_approximation
9. Model Combination, Decision Trees, Random Forests Slides PDF Slides CZ Lecture EN Lecture Questions decision_tree random_forest human_activity_recognition
10. Gradient Boosting Decision Trees Slides PDF Slides CZ Lecture EN Lecture Questions gradient_boosting nli_competition
11. PCA, K-Means, Gaussian Mixture Slides PDF Slides CZ Lecture EN Lecture Questions pca kmeans
12. Gaussian Mixture, EM Algorithm, Bias-Variance Trade-off Slides PDF Slides Questions CZ Lecture EN Lecture
13. Statistical Hypothesis Testing, Model Comparison Slides PDF Slides CZ Lecture EN Lecture Questions gaussian_mixture bootstrap_resampling
License
Unless otherwise stated, teaching materials for this course are available under CC BY-SA 4.0.
The lecture content, including references to some additional study materials, is listed below. The main study material is Pattern Recognition and Machine Learning by Christopher Bishop, referred to as PRML.
Note that the topics in italics are not required for the exam.
1. Introduction to Machine Learning
Oct 05 Slides PDF Slides CZ Lecture EN Lecture Questions linear_regression_manual linear_regression_features
 Introduction to machine learning
 Basic definitions [Sections 1 and 1.1 of PRML]
 Linear regression model [Section 3.1 of PRML]
2. Linear Regression II, SGD
Oct 12 Slides PDF Slides CZ Lecture EN Lecture Questions linear_regression_l2 linear_regression_sgd feature_engineering rental_competition
 L2 regularization in linear regression [Section 1.1, 3.1.4 of PRML]
 Solving linear regression with SVD
 Random variables and probability distributions [Section 1.2, 1.2.1 of PRML]
 Expectation and variance [Section 1.2.2 of PRML]
 Gradient descent [Section 5.2.4 of PRML]
 Stochastic gradient descent solution of linear regression [slides]
3. Perceptron and Logistic Regression
Oct 19 Slides PDF Slides CZ Lecture EN Lecture Questions perceptron logistic_regression_sgd grid_search thyroid_competition
 Cross-validation [Section 1.3 of PRML]
 Linear models for classification [Section 4.1.1 of PRML]
 Perceptron algorithm [Section 4.1.7 of PRML]
 Probability distributions [Bernoulli Section 2.1, Categorical Section 2.2, Gaussian Section 2.3 of PRML]
 Information theory [Section 1.6 of PRML]
 Maximum likelihood estimation [Section 1.2.5 of PRML]
 Logistic regression [Section 4.3.2 of PRML]
4. Multiclass Logistic Regression, Multilayer Perceptron
Oct 26 Slides PDF Slides CZ Lecture EN Lecture Questions softmax_classification_sgd mlp_classification_sgd mnist_competition
 Generalized linear models
 MSE as MLE [Section 3.1.1 of PRML]
 Multiclass logistic regression [Section 4.3.4 of PRML]
 Poisson regression
 Multilayer perceptron (neural network) [Sections 5–5.3 of PRML]
 Universal approximation theorem
5. Derivation of Softmax, kNN
Nov 02 Slides PDF Slides CZ Lecture EN Lecture Questions k_nearest_neighbors diacritization
 Lagrange multipliers [Appendix E of PRML]
 Calculus of variations [Appendix D of PRML]
 Normal distribution via the maximum entropy principle [2 pages before Section 1.6.1 of PRML]
 Derivation of softmax via the maximum entropy principle [The equivalence of logistic regression and maximum entropy models writeup]
 $F_1$ score and $F_\beta$ score
 k-nearest neighbors [Section 2.5.2 of PRML]
6. Kernel Methods, SVM
Nov 09 Slides PDF Slides CZ Lecture EN Lecture Questions kernel_linear_regression diacritization_dictionary
 Kernels [Sections 4.3.1–4.3.3 of Introduction to Machine Learning]
 Kernel linear regression [Sections 1.2, 1.3 of CS229 Lecture notes, part V]
 Karush-Kuhn-Tucker Conditions [Appendix E of PRML, Section 6 of CS229 Lecture notes, part V]
 Hard-margin SVM [Section 7.1 of PRML, Section 7 of CS229 Lecture notes, part V]
7. Soft-margin SVM, SMO
Nov 16 Slides PDF Slides CZ Lecture EN Lecture Questions smo_algorithm svm_multiclass
 Soft-margin SVM [Section 7.1.1 of PRML, Section 8 of CS229 Lecture notes, part V]
 Sequential minimal optimization algorithm [Section 9 of CS229 Lecture notes, part V, CS229 Simplified SMO Algorithm]
 One-versus-one and one-versus-rest schemes [Section 4.1.2 of PRML]
8. SVR, Kernel Approximation, Naive Bayes
Nov 23 Slides PDF Slides CZ Lecture EN Lecture Questions naive_bayes isnt_it_ironic kernel_approximation
 Support Vector Machine for regression [Section 7.1.4 of PRML]
 Random Fourier Features [Paper Random Features for Large-Scale Kernel Machines]
 Nyström Approximation [Paper Nyström Method vs Random Fourier Features]
 TF-IDF
 Naive Bayes classifier [Basic idea in Section 8.2.2 of PRML]
9. Model Combination, Decision Trees, Random Forests
Nov 30 Slides PDF Slides CZ Lecture EN Lecture Questions decision_tree random_forest human_activity_recognition
 Covariance and correlation
 Model ensembling [Section 14.2 of PRML]
 Decision trees [Section 14.4 of PRML]
 Random forests
10. Gradient Boosting Decision Trees
Dec 07 Slides PDF Slides CZ Lecture EN Lecture Questions gradient_boosting nli_competition
 Gradient boosting decision trees [Paper XGBoost: A Scalable Tree Boosting System]
11. PCA, K-Means, Gaussian Mixture
Dec 14 Slides PDF Slides CZ Lecture EN Lecture Questions pca kmeans
 Principal component analysis [Sections 12.1 and 12.4.2 of PRML]
 Power iteration algorithm
 K-Means clustering [Section 9.1 of PRML]
 Multivariate Gaussian [Section 2.3 of PRML]
12. Gaussian Mixture, EM Algorithm, Bias-Variance Trade-off
Dec 21 Slides PDF Slides Questions CZ Lecture EN Lecture
 Gaussian mixture clustering [Section 9.2 of PRML]
 EM algorithm [Sections 9.3 and 9.4 of PRML]
 Bias-variance trade-off [Section 3.2 of PRML]
 Double descent [Paper Reconciling modern machine learning practice and the bias-variance trade-off]
 Deep double descent [Paper Deep Double Descent: Where Bigger Models and More Data Hurt]
13. Statistical Hypothesis Testing, Model Comparison
Jan 04 Slides PDF Slides CZ Lecture EN Lecture Questions gaussian_mixture bootstrap_resampling
 Statistical hypothesis testing
 Bootstrap resampling
 Model comparison
Requirements
To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that up to 40 points above 80 (including the bonus points) will be transferred to the exam. In total, assignments worth at least 120 points (not including the bonus points) will be available.
Environment
The tasks are evaluated automatically using the ReCodEx Code Examiner.
The evaluation is performed using Python 3.8, scikit-learn 0.23.2, numpy 1.18.5, scipy 1.5.2, pandas 1.1.2 and matplotlib 3.3.2. You should install the exact versions of these packages yourselves.
Teamwork
Solving assignments in teams of size 2 or 3 is encouraged, but everyone has to participate (it is forbidden to skip working on an assignment and then submit a solution created by other team members). All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. Each such solution must explicitly list all members of the team to allow plagiarism detection using this template.
linear_regression_manual
Deadline: Oct 20, 23:59 3 points
Starting with the linear_regression_manual.py template, solve a linear regression problem using the algorithm from the lecture, which explicitly computes the matrix inversion. Then compute the root mean square error on the test set.
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 linear_regression_manual.py --test_size=0.1
3.87
python3 linear_regression_manual.py --test_size=0.9
5.29
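For illustration, the explicit solution $\boldsymbol w = (\boldsymbol X^T \boldsymbol X)^{-1} \boldsymbol X^T \boldsymbol t$ can be sketched in NumPy as below. This is a minimal sketch on hypothetical synthetic data, assuming the bias is handled by appending a column of ones; the helper names `add_bias`, `fit_linear_regression`, and `rmse` are illustrative and not part of the course template.

```python
import numpy as np

def add_bias(X):
    # Append a column of ones so that the last weight acts as the bias
    return np.pad(X, [(0, 0), (0, 1)], constant_values=1)

def fit_linear_regression(X, t):
    # Explicit normal-equation solution w = (X^T X)^{-1} X^T t
    X = add_bias(X)
    return np.linalg.inv(X.T @ X) @ X.T @ t

def rmse(predictions, targets):
    return np.sqrt(np.mean((predictions - targets) ** 2))

# Hypothetical synthetic data: three features, known weights, small noise
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
t = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=100)
X_train, X_test, t_train, t_test = X[:80], X[80:], t[:80], t[80:]

w = fit_linear_regression(X_train, t_train)
test_rmse = rmse(add_bias(X_test) @ w, t_test)
print("Weights:", w.round(2))
print("Test RMSE: {:.2f}".format(test_rmse))
```

For larger or ill-conditioned problems, `np.linalg.solve` or `np.linalg.lstsq` is numerically preferable to forming the inverse; the explicit inverse is used here only because the assignment asks for it.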
linear_regression_features
Deadline: Oct 20, 23:59 3 points
Starting with the linear_regression_features.py template, use scikit-learn to train a model of a 1D curve.
Try using features $x^1, x^2, …, x^D$ for $D$ from 1 to a given range, and report the RMSE of every such configuration.
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 linear_regression_features.py --data_size=10 --test_size=5 --range=5
Maximum feature order 1: 0.74 RMSE
Maximum feature order 2: 1.87 RMSE
Maximum feature order 3: 0.53 RMSE
Maximum feature order 4: 4.52 RMSE
Maximum feature order 5: 1.70 RMSE
python3 linear_regression_features.py --data_size=50 --test_size=40 --range=9
Maximum feature order 1: 0.63 RMSE
Maximum feature order 2: 0.73 RMSE
Maximum feature order 3: 0.31 RMSE
Maximum feature order 4: 0.26 RMSE
Maximum feature order 5: 1.22 RMSE
Maximum feature order 6: 0.69 RMSE
Maximum feature order 7: 2.39 RMSE
Maximum feature order 8: 7.28 RMSE
Maximum feature order 9: 201.70 RMSE
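The loop over feature orders can be sketched with plain NumPy least squares (the assignment itself asks for scikit-learn). The sine-shaped data, the 20/30 split, and the helper names are hypothetical; as the outputs above suggest, high orders fitted on few points can extrapolate badly on the test set.

```python
import numpy as np

def polynomial_features(x, order):
    # Columns x^1, x^2, ..., x^order for a 1D input array x
    return np.stack([x ** d for d in range(1, order + 1)], axis=1)

def add_bias(X):
    # Append a column of ones so that the last weight acts as the intercept
    return np.pad(X, [(0, 0), (0, 1)], constant_values=1)

def evaluate_order(x_train, t_train, x_test, t_test, order):
    # Least-squares fit on polynomial features; returns the test RMSE
    w, *_ = np.linalg.lstsq(add_bias(polynomial_features(x_train, order)), t_train, rcond=None)
    predictions = add_bias(polynomial_features(x_test, order)) @ w
    return np.sqrt(np.mean((predictions - t_test) ** 2))

# Hypothetical noisy 1D curve, split into 20 training and 30 test points
rng = np.random.default_rng(0)
x = np.linspace(0, 7, num=50)
t = np.sin(x) + rng.normal(scale=0.2, size=x.shape)
permutation = rng.permutation(len(x))
train, test = permutation[:20], permutation[20:]

rmses = {}
for order in range(1, 6):
    rmses[order] = evaluate_order(x[train], t[train], x[test], t[test], order)
    print("Maximum feature order {}: {:.2f} RMSE".format(order, rmses[order]))
```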
linear_regression_l2
Deadline: Oct 27, 23:59 2 points
Starting with the linear_regression_l2.py template, use scikit-learn to train L2-regularized linear regression models and print the results of the best of them.
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 linear_regression_l2.py --test_size=0.15
2.19 3.63
python3 linear_regression_l2.py --test_size=0.9
3.00 5.29
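L2-regularized linear regression also has a closed-form solution, $\boldsymbol w = (\boldsymbol X^T \boldsymbol X + \lambda \boldsymbol I)^{-1} \boldsymbol X^T \boldsymbol t$. The NumPy sketch below sweeps $\lambda$ over a geometric grid and keeps the value with the lowest test RMSE. The synthetic data (generated without an intercept, so no bias column is used), the grid, and the name `ridge_fit` are assumptions for illustration.

```python
import numpy as np

def ridge_fit(X, t, lambda_):
    # Closed-form L2-regularized least squares: solve (X^T X + lambda I) w = X^T t
    return np.linalg.solve(X.T @ X + lambda_ * np.eye(X.shape[1]), X.T @ t)

# Hypothetical synthetic data with no intercept term
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))
t = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=60)
X_train, X_test, t_train, t_test = X[:30], X[30:], t[:30], t[30:]

best_lambda, best_rmse = None, np.inf
for lambda_ in np.geomspace(0.01, 100, num=50):
    w = ridge_fit(X_train, t_train, lambda_)
    rmse = np.sqrt(np.mean((X_test @ w - t_test) ** 2))
    if rmse < best_rmse:
        best_lambda, best_rmse = lambda_, rmse
print("{:.2f} {:.2f}".format(best_lambda, best_rmse))
```

Choosing $\lambda$ by test-set RMSE matches the spirit of this exercise, but in general a hyperparameter should be selected on held-out validation data, for example with the cross-validation used in grid_search.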
linear_regression_sgd
Deadline: Oct 27, 23:59 5 points
Starting with the linear_regression_sgd.py template, implement minibatch SGD for linear regression and compare the results to an explicit linear regression solver.
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 linear_regression_sgd.py --batch_size=10 --epochs=50 --learning_rate=0.01
Test RMSE: SGD 88.98, explicit 91.51
python3 linear_regression_sgd.py --batch_size=10 --epochs=50 --learning_rate=0.1
Test RMSE: SGD 88.80, explicit 91.51
python3 linear_regression_sgd.py --batch_size=10 --epochs=50 --learning_rate=0.001
Test RMSE: SGD 106.51, explicit 91.51
python3 linear_regression_sgd.py --batch_size=1 --epochs=50 --learning_rate=0.01
Test RMSE: SGD 88.80, explicit 91.51
python3 linear_regression_sgd.py --batch_size=50 --epochs=50 --learning_rate=0.01
Test RMSE: SGD 97.63, explicit 91.51
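A minimal NumPy sketch of minibatch SGD for the squared-error loss, compared against an explicit least-squares solution. It uses no regularization term; the synthetic data, the uniform initialization, and the seeds are arbitrary assumptions for illustration.

```python
import numpy as np

def sgd_linear_regression(X, t, batch_size, epochs, learning_rate, seed=42):
    rng = np.random.default_rng(seed)
    X = np.pad(X, [(0, 0), (0, 1)], constant_values=1)  # bias column
    w = rng.uniform(size=X.shape[1])
    for _ in range(epochs):
        # Visit the training data in a fresh random order every epoch
        permutation = rng.permutation(X.shape[0])
        for start in range(0, X.shape[0], batch_size):
            batch = permutation[start:start + batch_size]
            # Gradient of the minibatch mean of (x^T w - t)^2 / 2
            gradient = X[batch].T @ (X[batch] @ w - t[batch]) / len(batch)
            w -= learning_rate * gradient
    return w

# Hypothetical synthetic data with known weights and bias
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
t = X @ np.array([1.0, -2.0, 0.5]) + 1.0 + rng.normal(scale=0.1, size=200)

w_sgd = sgd_linear_regression(X, t, batch_size=10, epochs=50, learning_rate=0.1)
w_explicit, *_ = np.linalg.lstsq(np.pad(X, [(0, 0), (0, 1)], constant_values=1), t, rcond=None)
print("SGD:", w_sgd.round(3), "explicit:", w_explicit.round(3))
```

With a fixed learning rate, SGD keeps fluctuating slightly around the least-squares optimum; too small a rate (or too few updates, as with large batches) leaves it short of the optimum, which is one reading of the differences in the outputs above.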
feature_engineering
Deadline: Oct 27, 23:59 4 points
Starting with the feature_engineering.py template, learn how to perform basic feature engineering using scikit-learn.
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 feature_engineering.py --dataset=boston
1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
python3 feature_engineering.py --dataset=iris
0.5427 0.01714 0.348 0.298 0.2945 0.009302 0.1889 0.1617 0.0002938 0.005965 0.005108 0.1211 0.1037 0.08882
0.4264 1.046 0.8796 1.375 0.1818 0.4458 0.3751 0.5864 1.093 0.9196 1.438 0.7736 1.21 1.891
0.4216 1.783 0.05276 0.02873 0.1777 0.7515 0.02224 0.01211 3.177 0.09404 0.0512 0.002783 0.001516 0.0008252
0.5427 2.331 1.306 1.183 0.2945 1.265 0.7085 0.6421 5.434 3.043 2.758 1.705 1.545 1.4
1.027 1.783 0.3606 0.3752 1.055 1.831 0.3705 0.3855 3.177 0.6429 0.6689 0.1301 0.1353 0.1408
0.3053 0.4971 0.4662 0.1059 0.09319 0.1517 0.1423 0.03234 0.2471 0.2317 0.05265 0.2173 0.04938 0.01122
0.1793 2.074 1.306 1.318 0.03214 0.3718 0.2341 0.2363 4.301 2.708 2.733 1.705 1.72 1.737
2.244 1.011 1.765 1.375 5.033 2.269 3.961 3.085 1.023 1.785 1.391 3.117 2.428 1.891
0.1841 0.24 0.348 0.298 0.03391 0.04418 0.06409 0.05488 0.05758 0.08352 0.07151 0.1211 0.1037 0.08882
1.153 0.4971 0.5252 0.1634 1.33 0.5732 0.6057 0.1884 0.2471 0.2611 0.08121 0.2759 0.08581 0.02669
python3 feature_engineering.py --dataset=linnerud
1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
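Basic feature engineering often combines one-hot encoding of categorical columns, standardization of real-valued columns using training-set statistics, and polynomial features of degree 2. Below is a NumPy sketch assuming exactly those three steps; the pipeline expected by the template is not spelled out here, and the function names and toy data are hypothetical. In scikit-learn, the corresponding tools are `OneHotEncoder`, `StandardScaler`, and `PolynomialFeatures`.

```python
import numpy as np

def one_hot(column, categories):
    # One indicator column per category seen during training; unseen values give all zeros
    return (column[:, None] == categories[None, :]).astype(float)

def degree2_features(X):
    # Original features followed by all products x_i * x_j with i <= j
    i, j = np.triu_indices(X.shape[1])
    return np.concatenate([X, X[:, i] * X[:, j]], axis=1)

def preprocess(X_train, X_test, categorical):
    # categorical: boolean mask over columns; statistics come from the training data only
    train_parts, test_parts = [], []
    for c in np.flatnonzero(categorical):
        categories = np.unique(X_train[:, c])
        train_parts.append(one_hot(X_train[:, c], categories))
        test_parts.append(one_hot(X_test[:, c], categories))
    real = ~categorical
    mean, std = X_train[:, real].mean(axis=0), X_train[:, real].std(axis=0)
    train_parts.append((X_train[:, real] - mean) / std)
    test_parts.append((X_test[:, real] - mean) / std)
    return (degree2_features(np.concatenate(train_parts, axis=1)),
            degree2_features(np.concatenate(test_parts, axis=1)))

# Hypothetical toy data: column 0 is categorical, column 1 is real-valued
X_train = np.array([[0, 1.0], [1, 3.0], [2, 5.0]])
X_test = np.array([[1, 2.0], [3, 4.0]])
train, test = preprocess(X_train, X_test, categorical=np.array([True, False]))
print(train.shape, test.shape)
```

Fitting the categories and the mean and standard deviation on the training data only keeps test-set information out of the preprocessing. A real-valued column with zero training variance would need special handling, since the sketch divides by its standard deviation.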
rental_competition
Deadline: Oct 27, 23:59 3 points+5 bonus
This assignment is a competition task. Your goal is to perform linear regression on the data from a rental shop. The train set contains 1000 instances, each instance consists of 12 features, both integral and real.
The rental_competition.py template shows how to load the training data, downloading it if needed. Furthermore, it shows how to save a trained estimator and how to load it during prediction.
The performance of your system is measured using root mean squared error and your goal is to achieve RMSE less than 110. Note that you can use any number of generalized linear models with any regularization to solve this assignment (but no decision trees, MLPs, …).
perceptron
Deadline: Nov 03, 23:59 2 points
Starting with the perceptron.py template, implement the perceptron algorithm.
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 perceptron.py --data_size=100 --seed=17
Learned weights 4.10 2.94 1.00
python3 perceptron.py --data_size=50 --seed=320
Learned weights 2.30 1.96 2.00
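The perceptron update rule (on every misclassified example, add $t_i \boldsymbol x_i$ to the weights, with $t_i \in \{-1, +1\}$) can be sketched as follows. This assumes linearly separable data, a bias column of ones appended to the inputs, and zero initial weights; the epoch limit, the margin filter, and the synthetic data are hypothetical.

```python
import numpy as np

def add_bias(X):
    return np.pad(X, [(0, 0), (0, 1)], constant_values=1)

def perceptron(X, t, max_epochs=1000, seed=42):
    # X already contains a bias column; targets t are -1 or +1
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        converged = True
        for i in rng.permutation(X.shape[0]):
            # Update only when the example is misclassified (or on the boundary)
            if t[i] * (X[i] @ w) <= 0:
                w += t[i] * X[i]
                converged = False
        if converged:  # a full pass without mistakes: every example is separated
            break
    return w

# Hypothetical separable data: keep only points with a margin around a known boundary
rng = np.random.default_rng(17)
X = rng.normal(size=(300, 2))
scores = X @ np.array([1.0, -1.0]) + 0.5
mask = np.abs(scores) > 0.3
X, t = add_bias(X[mask]), np.where(scores[mask] > 0, 1, -1)

w = perceptron(X, t)
print("Learned weights", w.round(2))
```

On separable data the perceptron convergence theorem bounds the number of updates, so the loop stops after a mistake-free pass; on non-separable data it would run until `max_epochs`.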
logistic_regression_sgd
Deadline: Nov 03, 23:59 5 points
Starting with the logistic_regression_sgd.py template, implement minibatch SGD for logistic regression.
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 logistic_regression_sgd.py --data_size=100 --batch_size=10 --iterations=9 --learning_rate=0.5
After iteration 1: train loss 0.3464 acc 90.0%, test loss 0.3369 acc 84.0%
After iteration 2: train loss 0.2324 acc 96.0%, test loss 0.2397 acc 92.0%
After iteration 3: train loss 0.1840 acc 94.0%, test loss 0.1952 acc 94.0%
After iteration 4: train loss 0.1566 acc 98.0%, test loss 0.1686 acc 94.0%
After iteration 5: train loss 0.1392 acc 98.0%, test loss 0.1513 acc 96.0%
After iteration 6: train loss 0.1271 acc 98.0%, test loss 0.1394 acc 96.0%
After iteration 7: train loss 0.1178 acc 98.0%, test loss 0.1303 acc 96.0%
After iteration 8: train loss 0.1103 acc 98.0%, test loss 0.1229 acc 96.0%
After iteration 9: train loss 0.1041 acc 98.0%, test loss 0.1169 acc 96.0%
Learned weights 2.81 0.59 0.19
python3 logistic_regression_sgd.py --data_size=90 --batch_size=5 --iterations=9 --learning_rate=0.2
After iteration 1: train loss 0.4005 acc 86.7%, test loss 0.4404 acc 80.0%
After iteration 2: train loss 0.3736 acc 88.9%, test loss 0.4110 acc 80.0%
After iteration 3: train loss 0.3604 acc 86.7%, test loss 0.3967 acc 80.0%
After iteration 4: train loss 0.3520 acc 86.7%, test loss 0.3880 acc 80.0%
After iteration 5: train loss 0.3475 acc 86.7%, test loss 0.3836 acc 80.0%
After iteration 6: train loss 0.3446 acc 86.7%, test loss 0.3809 acc 82.2%
After iteration 7: train loss 0.3424 acc 86.7%, test loss 0.3791 acc 84.4%
After iteration 8: train loss 0.3409 acc 86.7%, test loss 0.3780 acc 84.4%
After iteration 9: train loss 0.3400 acc 86.7%, test loss 0.3773 acc 84.4%
Learned weights 0.09 1.74 0.03
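A minimal NumPy sketch of minibatch SGD for logistic regression, minimizing the mean binary cross-entropy, whose gradient for a minibatch of size $B$ is $\boldsymbol X^T (\sigma(\boldsymbol X \boldsymbol w) - \boldsymbol t) / B$. The two-blob synthetic data, the initialization, and the helper names are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logistic_regression_sgd(X, t, batch_size, epochs, learning_rate, seed=42):
    # X already contains a bias column; targets t are 0 or 1
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.1, 0.1, size=X.shape[1])
    for _ in range(epochs):
        permutation = rng.permutation(X.shape[0])
        for start in range(0, X.shape[0], batch_size):
            batch = permutation[start:start + batch_size]
            # Gradient of the mean binary cross-entropy over the minibatch
            gradient = X[batch].T @ (sigmoid(X[batch] @ w) - t[batch]) / len(batch)
            w -= learning_rate * gradient
    return w

def loss_and_accuracy(X, t, w):
    # Clip probabilities so that log(0) cannot occur
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    loss = -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))
    return loss, np.mean((p >= 0.5) == t)

# Hypothetical data: two Gaussian blobs, with a bias column appended
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(loc=-1, size=(50, 2)), rng.normal(loc=1, size=(50, 2))])
t = np.concatenate([np.zeros(50), np.ones(50)])
X = np.pad(X, [(0, 0), (0, 1)], constant_values=1)

w = logistic_regression_sgd(X, t, batch_size=10, epochs=9, learning_rate=0.5)
loss, accuracy = loss_and_accuracy(X, t, w)
print("train loss {:.4f} acc {:.1f}%".format(loss, 100 * accuracy))
```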
grid_search
Deadline: Nov 03, 23:59 3 points
Starting with the grid_search.py template, perform a hyperparameter grid search, evaluating hyperparameter performance using stratified k-fold cross-validation, and finally evaluate a model trained with the best hyperparameters on all training data. The easiest way is to utilize sklearn.model_selection.GridSearchCV.
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 grid_search.py --test_size=0.5
Rank: 11 Crossval: 86.7% lr__C: 0.01 lr__solver: lbfgs polynomial__degree: 1
Rank: 5 Crossval: 92.7% lr__C: 0.01 lr__solver: lbfgs polynomial__degree: 2
Rank: 11 Crossval: 86.7% lr__C: 0.01 lr__solver: sag polynomial__degree: 1
Rank: 5 Crossval: 92.7% lr__C: 0.01 lr__solver: sag polynomial__degree: 2
Rank: 7 Crossval: 90.8% lr__C: 1 lr__solver: lbfgs polynomial__degree: 1
Rank: 3 Crossval: 96.8% lr__C: 1 lr__solver: lbfgs polynomial__degree: 2
Rank: 7 Crossval: 90.8% lr__C: 1 lr__solver: sag polynomial__degree: 1
Rank: 4 Crossval: 96.8% lr__C: 1 lr__solver: sag polynomial__degree: 2
Rank: 10 Crossval: 90.1% lr__C: 100 lr__solver: lbfgs polynomial__degree: 1
Rank: 1 Crossval: 97.2% lr__C: 100 lr__solver: lbfgs polynomial__degree: 2
Rank: 9 Crossval: 90.5% lr__C: 100 lr__solver: sag polynomial__degree: 1
Rank: 2 Crossval: 97.0% lr__C: 100 lr__solver: sag polynomial__degree: 2
Test accuracy: 98.33
python3 grid_search.py --test_size=0.7
Rank: 11 Crossval: 87.9% lr__C: 0.01 lr__solver: lbfgs polynomial__degree: 1
Rank: 5 Crossval: 91.8% lr__C: 0.01 lr__solver: lbfgs polynomial__degree: 2
Rank: 11 Crossval: 87.9% lr__C: 0.01 lr__solver: sag polynomial__degree: 1
Rank: 5 Crossval: 91.8% lr__C: 0.01 lr__solver: sag polynomial__degree: 2
Rank: 7 Crossval: 91.3% lr__C: 1 lr__solver: lbfgs polynomial__degree: 1
Rank: 3 Crossval: 95.9% lr__C: 1 lr__solver: lbfgs polynomial__degree: 2
Rank: 7 Crossval: 91.3% lr__C: 1 lr__solver: sag polynomial__degree: 1
Rank: 4 Crossval: 95.7% lr__C: 1 lr__solver: sag polynomial__degree: 2
Rank: 10 Crossval: 89.2% lr__C: 100 lr__solver: lbfgs polynomial__degree: 1
Rank: 1 Crossval: 96.5% lr__C: 100 lr__solver: lbfgs polynomial__degree: 2
Rank: 9 Crossval: 89.2% lr__C: 100 lr__solver: sag polynomial__degree: 1
Rank: 2 Crossval: 96.1% lr__C: 100 lr__solver: sag polynomial__degree: 2
Test accuracy: 96.98
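GridSearchCV performs the splitting for you. To show what stratified k-fold cross-validation does underneath, here is a NumPy sketch that deals each class round-robin into folds and uses the folds to select the L2 strength of a least-squares classifier with $\pm 1$ targets. The model, the grid, and the data are hypothetical stand-ins for the pipeline of polynomial features and logistic regression suggested by the parameter names above, and the fold assignment is a simplification of scikit-learn's `StratifiedKFold`.

```python
import numpy as np

def stratified_kfold(t, n_splits, seed=42):
    # Yield (train_indices, test_indices) so that every class is spread evenly across folds
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(t), dtype=int)
    for label in np.unique(t):
        indices = rng.permutation(np.flatnonzero(t == label))
        fold_of[indices] = np.arange(len(indices)) % n_splits
    for fold in range(n_splits):
        yield np.flatnonzero(fold_of != fold), np.flatnonzero(fold_of == fold)

def ridge_classifier_fit(X, t, lambda_):
    # L2-regularized least squares on targets in {-1, +1}; predictions use the sign
    return np.linalg.solve(X.T @ X + lambda_ * np.eye(X.shape[1]), X.T @ t)

def grid_search(X, t, lambdas, n_splits=5):
    results = []
    for lambda_ in lambdas:
        accuracies = [
            np.mean(np.sign(X[test] @ ridge_classifier_fit(X[train], t[train], lambda_)) == t[test])
            for train, test in stratified_kfold(t, n_splits)]
        results.append((np.mean(accuracies), lambda_))
    return max(results)  # (best mean cross-validation accuracy, corresponding lambda)

# Hypothetical imbalanced data: 10 examples of class -1, 40 of class +1, plus a bias column
rng = np.random.default_rng(0)
t = np.concatenate([-np.ones(10), np.ones(40)])
X = np.pad(rng.normal(size=(50, 2)) + 1.5 * t[:, None], [(0, 0), (0, 1)], constant_values=1)

best_accuracy, best_lambda = grid_search(X, t, lambdas=[0.01, 1, 100])
print("Crossval: {:.1f}% lambda: {}".format(100 * best_accuracy, best_lambda))
```

Stratification keeps the class ratio (here 1:4) the same in every fold, so a rare class cannot end up missing from a validation fold.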
thyroid_competition
Deadline: Nov 03, 23:59 4 points+5 bonus
This assignment is a competition task. Your goal is to perform binary classification: given medical data with 15 binary and 6 real-valued attributes, predict whether the thyroid is functioning normally or not. The train set and the test set each consist of ~3.5k instances.
The thyroid_competition.py template shows how to load training data, downloading it if needed. Furthermore, it shows how to save a trained estimator and how to load it during prediction.
The performance of your system is measured using accuracy of correctly predicted examples and your goal is to achieve at least 96% accuracy. Note that you can use any number of generalized linear models with any regularization to solve this assignment (but no decision trees, MLPs, …).
softmax_classification_sgd
Deadline: Nov 10, 23:59 3 points
Starting with the softmax_classification_sgd.py template, implement minibatch SGD for multinomial logistic regression.
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 softmax_classification_sgd.py --batch_size=10 --iterations=10 --learning_rate=0.005
After iteration 1: train loss 0.2428 acc 93.4%, test loss 0.2996 acc 90.8%
After iteration 2: train loss 0.1962 acc 94.6%, test loss 0.2649 acc 92.5%
After iteration 3: train loss 0.1746 acc 95.2%, test loss 0.2595 acc 91.5%
After iteration 4: train loss 0.1268 acc 96.8%, test loss 0.2074 acc 92.1%
After iteration 5: train loss 0.1013 acc 97.4%, test loss 0.1861 acc 93.9%
After iteration 6: train loss 0.0950 acc 98.4%, test loss 0.1754 acc 93.7%
After iteration 7: train loss 0.0810 acc 98.1%, test loss 0.1587 acc 94.9%
After iteration 8: train loss 0.0761 acc 98.2%, test loss 0.1564 acc 94.9%
After iteration 9: train loss 0.0764 acc 98.3%, test loss 0.1654 acc 94.9%
After iteration 10: train loss 0.0746 acc 98.2%, test loss 0.1694 acc 95.2%
Learned weights:
0.03 0.10 0.01 0.07 0.03 0.03 0.07 0.05 0.07 0.10 ...
0.09 0.08 0.12 0.07 0.20 0.10 0.02 0.06 0.02 0.07 ...
0.05 0.07 0.01 0.01 0.03 0.02 0.01 0.10 0.03 0.10 ...
0.02 0.05 0.03 0.09 0.17 0.14 0.02 0.05 0.09 0.04 ...
0.07 0.07 0.11 0.06 0.09 0.09 0.10 0.03 0.04 0.01 ...
0.07 0.04 0.18 0.02 0.04 0.14 0.10 0.03 0.03 0.02 ...
0.09 0.04 0.12 0.08 0.08 0.12 0.08 0.05 0.05 0.05 ...
0.07 0.02 0.04 0.02 0.10 0.01 0.16 0.04 0.03 0.01 ...
0.02 0.02 0.02 0.05 0.02 0.03 0.10 0.03 0.08 0.07 ...
0.04 0.06 0.07 0.10 0.04 0.05 0.06 0.08 0.01 0.01 ...
python3 softmax_classification_sgd.py --batch_size=1 --iterations=10 --learning_rate=0.005 --test_size=1597
After iteration 1: train loss 0.9709 acc 83.0%, test loss 1.5315 acc 76.8%
After iteration 2: train loss 0.5639 acc 90.0%, test loss 1.2635 acc 84.2%
After iteration 3: train loss 0.9004 acc 83.5%, test loss 1.4205 acc 80.3%
After iteration 4: train loss 0.0646 acc 97.5%, test loss 0.8086 acc 88.7%
After iteration 5: train loss 0.0475 acc 98.5%, test loss 0.7024 acc 90.7%
After iteration 6: train loss 0.0711 acc 97.0%, test loss 0.8800 acc 88.5%
After iteration 7: train loss 0.3303 acc 92.5%, test loss 1.1663 acc 85.3%
After iteration 8: train loss 0.0862 acc 97.5%, test loss 0.9596 acc 87.5%
After iteration 9: train loss 0.0050 acc 100.0%, test loss 0.7205 acc 90.5%
After iteration 10: train loss 0.2158 acc 95.0%, test loss 1.1674 acc 87.0%
Learned weights:
0.03 0.10 0.04 0.15 0.01 0.05 0.07 0.05 0.07 0.12 ...
0.09 0.06 0.37 0.24 0.43 0.17 0.06 0.06 0.02 0.17 ...
0.05 0.12 0.20 0.09 0.03 0.15 0.01 0.10 0.03 0.25 ...
0.02 0.01 0.10 0.37 0.60 0.13 0.16 0.06 0.09 0.30 ...
0.07 0.06 0.22 0.20 0.26 0.28 0.10 0.04 0.04 0.03 ...
0.07 0.07 0.49 0.25 0.33 0.41 0.40 0.05 0.03 0.05 ...
0.09 0.07 0.33 0.28 0.07 0.30 0.10 0.05 0.05 0.14 ...
0.07 0.00 0.15 0.02 0.16 0.18 0.19 0.09 0.03 0.09 ...
0.02 0.01 0.08 0.11 0.08 0.10 0.29 0.03 0.08 0.12 ...
0.04 0.05 0.12 0.00 0.46 0.26 0.01 0.08 0.01 0.08 ...
python3 softmax_classification_sgd.py --batch_size=100 --iterations=10 --learning_rate=0.05
After iteration 1: train loss 4.6130 acc 66.4%, test loss 4.7159 acc 66.5%
After iteration 2: train loss 0.4662 acc 91.0%, test loss 0.5955 acc 87.1%
After iteration 3: train loss 0.4140 acc 90.2%, test loss 0.6039 acc 87.5%
After iteration 4: train loss 0.2332 acc 93.3%, test loss 0.4390 acc 90.0%
After iteration 5: train loss 0.1398 acc 96.2%, test loss 0.2685 acc 91.8%
After iteration 6: train loss 0.0965 acc 96.9%, test loss 0.2300 acc 93.7%
After iteration 7: train loss 0.1034 acc 96.7%, test loss 0.2712 acc 92.5%
After iteration 8: train loss 0.2048 acc 93.8%, test loss 0.4191 acc 90.5%
After iteration 9: train loss 0.8357 acc 84.6%, test loss 0.9188 acc 84.1%
After iteration 10: train loss 0.0825 acc 97.7%, test loss 0.2471 acc 94.0%
Learned weights:
0.03 0.11 0.02 0.08 0.08 0.04 0.10 0.05 0.07 0.12 ...
0.09 0.07 0.15 0.14 0.27 0.15 0.05 0.07 0.02 0.11 ...
0.05 0.10 0.10 0.03 0.04 0.05 0.03 0.10 0.03 0.20 ...
0.02 0.04 0.00 0.12 0.26 0.24 0.02 0.05 0.09 0.08 ...
0.07 0.07 0.15 0.14 0.08 0.14 0.13 0.03 0.04 0.00 ...
0.07 0.03 0.29 0.07 0.10 0.27 0.20 0.03 0.03 0.01 ...
0.09 0.05 0.27 0.12 0.21 0.36 0.15 0.05 0.04 0.12 ...
0.07 0.01 0.05 0.00 0.14 0.09 0.23 0.04 0.03 0.01 ...
0.02 0.03 0.02 0.04 0.02 0.06 0.16 0.03 0.08 0.06 ...
0.04 0.06 0.05 0.15 0.03 0.06 0.07 0.07 0.01 0.01 ...
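For multinomial logistic regression, the gradient of the mean cross-entropy with respect to the weight matrix is $\boldsymbol X^T (\operatorname{softmax}(\boldsymbol X \boldsymbol W) - \boldsymbol T) / B$, where $\boldsymbol T$ holds the one-hot targets of a minibatch of size $B$. A NumPy sketch, assuming the inputs already include a bias column; the three-blob synthetic data and the helper names are hypothetical.

```python
import numpy as np

def softmax(z):
    # Subtracting the row maximum avoids overflow without changing the result
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def softmax_sgd(X, t, classes, batch_size, epochs, learning_rate, seed=42):
    # X already contains a bias column; t holds integer class indices
    rng = np.random.default_rng(seed)
    W = rng.uniform(-0.1, 0.1, size=(X.shape[1], classes))
    for _ in range(epochs):
        permutation = rng.permutation(X.shape[0])
        for start in range(0, X.shape[0], batch_size):
            batch = permutation[start:start + batch_size]
            # softmax(XW) - onehot(t), computed without building the one-hot matrix
            error = softmax(X[batch] @ W)
            error[np.arange(len(batch)), t[batch]] -= 1
            W -= learning_rate * X[batch].T @ error / len(batch)
    return W

# Hypothetical data: three Gaussian blobs, with a bias column appended
rng = np.random.default_rng(0)
centers = np.array([[0.0, 3.0], [3.0, 0.0], [-3.0, -3.0]])
t = rng.integers(0, 3, size=300)
X = np.pad(centers[t] + rng.normal(size=(300, 2)), [(0, 0), (0, 1)], constant_values=1)

W = softmax_sgd(X, t, classes=3, batch_size=10, epochs=20, learning_rate=0.1)
accuracy = np.mean(np.argmax(X @ W, axis=1) == t)
print("train acc {:.1f}%".format(100 * accuracy))
```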
mlp_classification_sgd
Deadline: Nov 10, 23:59 6 points
Starting with the mlp_classification_sgd.py template, implement minibatch SGD for multilayer perceptron classification.
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 mlp_classification_sgd.py --iterations=10 --batch_size=10 --hidden_layer=20
After iteration 1: train acc 78.1%, test acc 76.3%
After iteration 2: train acc 91.7%, test acc 88.2%
After iteration 3: train acc 94.8%, test acc 91.8%
After iteration 4: train acc 94.7%, test acc 91.6%
After iteration 5: train acc 96.4%, test acc 94.1%
After iteration 6: train acc 96.3%, test acc 92.0%
After iteration 7: train acc 98.2%, test acc 95.0%
After iteration 8: train acc 97.8%, test acc 94.9%
After iteration 9: train acc 98.3%, test acc 95.7%
After iteration 10: train acc 97.6%, test acc 94.4%
Learned parameters:
0.03 0.09 0.05 0.02 0.07 0.07 0.09 0.07 0.02 0.04 0.10 0.09 0.07 0.06 0.06 0.06 0.04 0.00 0.01 0.04 ...
0.10 0.05 0.32 0.02 0.29 0.02 0.04 0.17 0.02 0.09 0.14 0.22 0.09 0.11 0.03 0.13 0.17 0.09 0.07 0.02 ...
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ...
0.01 0.02 0.01 0.01 0.02 0.00 0.00 0.01 0.01 0.01 ...
python3 mlp_classification_sgd.py --iterations=10 --batch_size=10 --hidden_layer=50
After iteration 1: train acc 89.9%, test acc 87.6%
After iteration 2: train acc 94.2%, test acc 92.1%
After iteration 3: train acc 97.2%, test acc 93.1%
After iteration 4: train acc 97.0%, test acc 92.8%
After iteration 5: train acc 98.3%, test acc 96.4%
After iteration 6: train acc 98.4%, test acc 95.5%
After iteration 7: train acc 98.9%, test acc 96.4%
After iteration 8: train acc 98.1%, test acc 93.5%
After iteration 9: train acc 99.2%, test acc 96.7%
After iteration 10: train acc 99.5%, test acc 95.9%
Learned parameters:
0.03 0.09 0.05 0.02 0.07 0.07 0.09 0.07 0.02 0.04 0.10 0.09 0.07 0.06 0.06 0.06 0.04 0.00 0.01 0.04 ...
0.03 0.02 0.16 0.04 0.16 0.04 0.15 0.03 0.08 0.07 0.09 0.04 0.17 0.16 0.18 0.03 0.14 0.17 0.09 0.24 ...
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ...
0.01 0.01 0.00 0.01 0.00 0.01 0.01 0.00 0.00 0.00 ...
python3 mlp_classification_sgd.py --iterations=10 --batch_size=10 --hidden_layer=200
After iteration 1: train acc 96.0%, test acc 92.8%
After iteration 2: train acc 97.7%, test acc 95.1%
After iteration 3: train acc 98.2%, test acc 94.4%
After iteration 4: train acc 99.3%, test acc 97.0%
After iteration 5: train acc 99.3%, test acc 96.9%
After iteration 6: train acc 99.7%, test acc 96.1%
After iteration 7: train acc 99.9%, test acc 97.4%
After iteration 8: train acc 99.7%, test acc 97.0%
After iteration 9: train acc 100.0%, test acc 97.1%
After iteration 10: train acc 100.0%, test acc 97.6%
Learned parameters:
0.03 0.09 0.05 0.02 0.07 0.07 0.09 0.07 0.02 0.04 0.10 0.09 0.07 0.06 0.06 0.06 0.04 0.00 0.01 0.04 ...
0.01 0.09 0.04 0.09 0.06 0.06 0.05 0.04 0.00 0.02 0.04 0.02 0.04 0.09 0.10 0.08 0.06 0.09 0.01 0.01 ...
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ...
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ...
python3 mlp_classification_sgd.py --iterations=10 --batch_size=1 --hidden_layer=200 --test_size=1597
After iteration 1: train acc 88.0%, test acc 78.3%
After iteration 2: train acc 83.0%, test acc 75.6%
After iteration 3: train acc 91.5%, test acc 82.7%
After iteration 4: train acc 90.5%, test acc 85.3%
After iteration 5: train acc 74.5%, test acc 69.0%
After iteration 6: train acc 85.0%, test acc 78.8%
After iteration 7: train acc 98.0%, test acc 89.4%
After iteration 8: train acc 92.0%, test acc 84.7%
After iteration 9: train acc 99.5%, test acc 90.7%
After iteration 10: train acc 97.5%, test acc 91.7%
Learned parameters:
0.03 0.09 0.05 0.02 0.07 0.07 0.09 0.07 0.02 0.04 0.10 0.09 0.07 0.06 0.06 0.06 0.04 0.00 0.01 0.04 ...
0.01 0.09 0.04 0.09 0.06 0.06 0.05 0.04 0.00 0.02 0.05 0.02 0.06 0.13 0.11 0.10 0.05 0.08 0.10 0.04 ...
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 ...
0.03 0.06 0.06 0.00 0.05 0.04 0.06 0.03 0.20 0.02 ...
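A minimal NumPy sketch of minibatch SGD for a multilayer perceptron with one hidden layer and a softmax output, with gradients obtained by backpropagation. The ReLU activation, the Glorot-style uniform initialization, and the XOR-like synthetic data (which no linear model can classify) are assumptions for illustration, not necessarily the choices of the template.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(X, params):
    W1, b1, W2, b2 = params
    hidden = np.maximum(X @ W1 + b1, 0)  # ReLU hidden layer
    return hidden, softmax(hidden @ W2 + b2)

def mlp_sgd(X, t, classes, hidden_layer, batch_size, epochs, learning_rate, seed=42):
    rng = np.random.default_rng(seed)

    def glorot(fan_in, fan_out):
        limit = np.sqrt(6 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=(fan_in, fan_out))

    params = [glorot(X.shape[1], hidden_layer), np.zeros(hidden_layer),
              glorot(hidden_layer, classes), np.zeros(classes)]
    for _ in range(epochs):
        permutation = rng.permutation(X.shape[0])
        for start in range(0, X.shape[0], batch_size):
            batch = permutation[start:start + batch_size]
            hidden, probs = forward(X[batch], params)
            # Gradient of the mean cross-entropy w.r.t. the output logits
            d_output = probs.copy()
            d_output[np.arange(len(batch)), t[batch]] -= 1
            d_output /= len(batch)
            # Backpropagate through W2 and the ReLU derivative
            d_hidden = (d_output @ params[2].T) * (hidden > 0)
            gradients = [X[batch].T @ d_hidden, d_hidden.sum(axis=0),
                         hidden.T @ d_output, d_output.sum(axis=0)]
            for parameter, gradient in zip(params, gradients):
                parameter -= learning_rate * gradient  # in-place update
    return params

# Hypothetical XOR-like data: the label is whether both coordinates share a sign
rng = np.random.default_rng(0)
X = rng.choice([-2.0, 2.0], size=(400, 2)) + rng.normal(scale=0.5, size=(400, 2))
t = (X[:, 0] * X[:, 1] > 0).astype(int)

params = mlp_sgd(X, t, classes=2, hidden_layer=20, batch_size=10, epochs=50, learning_rate=0.1)
_, probs = forward(X, params)
accuracy = np.mean(np.argmax(probs, axis=1) == t)
print("train acc {:.1f}%".format(100 * accuracy))
```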
mnist_competition
Deadline: Nov 10, 23:59 6 points+6 bonus
This assignment is a competition task. Your goal is to perform 10-class classification on the well-known MNIST dataset. The train set contains 60k images, each consisting of $28×28$ pixels with values in $\{0, 1, …, 255\}$. Evaluation is performed on 10k test images. You can find a simple online demo of a trained classifier here.
The mnist_competition.py template shows how to load training data, downloading it if needed. Furthermore, it shows how to save a trained estimator and how to load it during prediction.
The performance of your system is measured using accuracy of correctly predicted examples and your goal is to achieve at least 94% accuracy. Note that you can use any sklearn algorithm to solve this exercise.
k_nearest_neighbors
Deadline: Nov 17, 23:59 4 points
Starting with the k_nearest_neighbors.py template, implement the k-nearest neighbors algorithm for classifying MNIST.
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 k_nearest_neighbors.py --k=1 --p=2 --weights=uniform --test_size=500 --train_size=100
Knn accuracy for 1 nearest neighbors, L_2 metric, uniform weights: 67.60%
python3 k_nearest_neighbors.py --k=3 --p=2 --weights=uniform --test_size=500 --train_size=100
Knn accuracy for 3 nearest neighbors, L_2 metric, uniform weights: 61.40%
python3 k_nearest_neighbors.py k=1 p=2 weights=uniform test_size=500 train_size=1000
Knn accuracy for 1 nearest neighbors, L_2 metric, uniform weights: 87.40%
python3 k_nearest_neighbors.py k=5 p=2 weights=uniform test_size=500 train_size=1000
Knn accuracy for 5 nearest neighbors, L_2 metric, uniform weights: 88.80%
python3 k_nearest_neighbors.py k=5 p=1 weights=uniform test_size=500 train_size=1000
Knn accuracy for 5 nearest neighbors, L_1 metric, uniform weights: 86.60%
python3 k_nearest_neighbors.py k=5 p=3 weights=uniform test_size=500 train_size=1000
Knn accuracy for 5 nearest neighbors, L_3 metric, uniform weights: 89.20%
python3 k_nearest_neighbors.py k=1 p=2 weights=uniform test_size=500 train_size=5000
Knn accuracy for 1 nearest neighbors, L_2 metric, uniform weights: 94.00%
python3 k_nearest_neighbors.py k=10 p=2 weights=uniform test_size=500 train_size=5000
Knn accuracy for 10 nearest neighbors, L_2 metric, uniform weights: 94.00%
python3 k_nearest_neighbors.py k=10 p=2 weights=inverse test_size=500 train_size=5000
Knn accuracy for 10 nearest neighbors, L_2 metric, inverse weights: 94.20%
python3 k_nearest_neighbors.py k=10 p=2 weights=softmax test_size=500 train_size=5000
Knn accuracy for 10 nearest neighbors, L_2 metric, softmax weights: 94.80%
diacritization
Deadline: Nov 17, 23:59 6 points+6 bonus
The goal of the diacritization competition is to learn to add diacritics to the given Czech text. We will use a small collection of fiction books, which is available under the CC BY-NC-SA license. Note that these texts are the only allowed training data; you cannot use any other Czech texts to train or evaluate your model. At test time, you will be given a text without diacritics and you should return it including diacritical marks. To be explicit, we only consider the diacritized letters áčďéěíňóřšťúůýž and their uppercase variants.
The diacritization.py template shows how to load the training data, downloading it if needed.
Each sentence in the data is stored on a single line, with exactly one space character separating input words. The performance of your system is measured using word accuracy (the percentage of words you diacritized correctly, as computed by the diacritization_eval.py script) and your goal is to achieve at least 86.5%. You can use any sklearn algorithm with the exception of decision trees to solve this assignment (so no random forests, extra trees, gradient boosting, AdaBoost with decision trees, …).
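The helpers below can build undiacritized inputs from the training sentences and estimate word accuracy locally. Both helper names are hypothetical. `word_accuracy` only approximates what diacritization_eval.py computes, and that script remains the authoritative metric.

```python
LETTERS_DIA = "áčďéěíňóřšťúůýž"
LETTERS_NODIA = "acdeeinorstuuyz"
# Translation table covering the listed letters and their uppercase variants.
DIA_TO_NODIA = str.maketrans(LETTERS_DIA + LETTERS_DIA.upper(),
                             LETTERS_NODIA + LETTERS_NODIA.upper())

def strip_diacritics(text):
    """Map every considered diacritized letter to its plain counterpart."""
    return text.translate(DIA_TO_NODIA)

def word_accuracy(gold, predicted):
    """Percentage of whitespace-separated words that match exactly."""
    gold_words, predicted_words = gold.split(), predicted.split()
    assert len(gold_words) == len(predicted_words), "word counts must agree"
    matches = sum(g == p for g, p in zip(gold_words, predicted_words))
    return 100 * matches / len(gold_words)
```

Applying `strip_diacritics` to the gold sentences yields aligned (input, target) pairs. Diacritization never changes word or character counts, so any per-character model can be trained on these pairs.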
kernel_linear_regression
Deadline: Nov 24, 23:59 5 points
Starting with the kernel_linear_regression.py, implement kernel linear regression training using SGD on the dual formulation. You should support polynomial and Gaussian kernels and also L2 regularization.
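Below is a hedged sketch of SGD on the dual coefficients. It makes several assumptions:

- the polynomial kernel is $(\gamma x^T z + 1)^d$ and the Gaussian kernel is $\exp(-\gamma \|x - z\|^2)$;
- a bias is learned alongside the dual coefficients;
- L2 regularization is the penalty $\frac{\lambda}{2}\|w\|^2$ on $w = \sum_i \beta_i \varphi(x_i)$.

The template may define the kernels, the bias and the update order differently, so this sketch is not expected to reproduce the example runs.

```python
import numpy as np

def kernel(x, z, kind="rbf", degree=3, gamma=1.0):
    """Kernel matrix between the rows of x [N, D] and the rows of z [M, D]."""
    if kind == "poly":
        return (gamma * x @ z.T + 1) ** degree
    if kind == "rbf":
        sq_dist = np.sum((x[:, np.newaxis, :] - z[np.newaxis, :, :]) ** 2, axis=-1)
        return np.exp(-gamma * sq_dist)
    raise ValueError(kind)

def fit_dual_sgd(train_x, train_t, iterations=200, batch_size=1, learning_rate=0.1,
                 l2=0.0, seed=42, **kernel_args):
    """SGD on betas of w = sum_i betas_i phi(x_i), plus a bias (hypothetical API)."""
    generator = np.random.default_rng(seed)
    gram = kernel(train_x, train_x, **kernel_args)
    betas, bias = np.zeros(len(train_x)), 0.0
    for _ in range(iterations):
        permutation = generator.permutation(len(train_x))
        for start in range(0, len(permutation), batch_size):
            batch = permutation[start:start + batch_size]
            # Residuals y(x_i) - t_i of the current model on the batch.
            residuals = gram[batch] @ betas + bias - train_t[batch]
            # The gradient l2 * w of (l2 / 2) ||w||^2 shrinks every beta,
            betas *= 1 - learning_rate * l2
            # while the squared-error gradient only touches batch coefficients.
            betas[batch] -= learning_rate / len(batch) * residuals
            bias -= learning_rate * np.mean(residuals)
    return betas, bias

def predict(train_x, betas, bias, test_x, **kernel_args):
    return kernel(test_x, train_x, **kernel_args) @ betas + bias
```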
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 kernel_linear_regression.py --batch_size=1 --kernel=poly --kernel_degree=3 --learning_rate=0.1
Iteration 10, train RMSE 0.59, test RMSE 1.10
Iteration 20, train RMSE 0.48, test RMSE 0.98
Iteration 30, train RMSE 0.51, test RMSE 1.15
Iteration 40, train RMSE 0.49, test RMSE 1.13
Iteration 50, train RMSE 0.47, test RMSE 1.10
Iteration 60, train RMSE 0.48, test RMSE 1.23
Iteration 70, train RMSE 0.49, test RMSE 1.29
Iteration 80, train RMSE 0.48, test RMSE 1.24
Iteration 90, train RMSE 0.47, test RMSE 1.12
Iteration 100, train RMSE 0.49, test RMSE 1.02
Iteration 110, train RMSE 0.52, test RMSE 1.22
Iteration 120, train RMSE 0.53, test RMSE 1.37
Iteration 130, train RMSE 0.50, test RMSE 1.28
Iteration 140, train RMSE 0.49, test RMSE 1.25
Iteration 150, train RMSE 0.53, test RMSE 1.19
Iteration 160, train RMSE 0.49, test RMSE 1.02
Iteration 170, train RMSE 0.51, test RMSE 1.12
Iteration 180, train RMSE 0.48, test RMSE 1.24
Iteration 190, train RMSE 0.47, test RMSE 1.13
Iteration 200, train RMSE 0.48, test RMSE 1.12
python3 kernel_linear_regression.py --batch_size=1 --kernel=poly --kernel_degree=5 --learning_rate=0.05
Iteration 10, train RMSE 0.61, test RMSE 1.59
Iteration 20, train RMSE 0.52, test RMSE 0.92
Iteration 30, train RMSE 0.48, test RMSE 1.08
Iteration 40, train RMSE 0.45, test RMSE 1.01
Iteration 50, train RMSE 0.47, test RMSE 0.71
Iteration 60, train RMSE 0.43, test RMSE 0.89
Iteration 70, train RMSE 0.45, test RMSE 1.01
Iteration 80, train RMSE 0.41, test RMSE 0.86
Iteration 90, train RMSE 0.43, test RMSE 0.63
Iteration 100, train RMSE 0.50, test RMSE 0.38
Iteration 110, train RMSE 0.38, test RMSE 0.60
Iteration 120, train RMSE 0.43, test RMSE 0.79
Iteration 130, train RMSE 0.36, test RMSE 0.56
Iteration 140, train RMSE 0.36, test RMSE 0.53
Iteration 150, train RMSE 0.39, test RMSE 0.50
Iteration 160, train RMSE 0.36, test RMSE 0.49
Iteration 170, train RMSE 0.35, test RMSE 0.29
Iteration 180, train RMSE 0.31, test RMSE 0.29
Iteration 190, train RMSE 0.31, test RMSE 0.24
Iteration 200, train RMSE 0.37, test RMSE 0.34
python3 kernel_linear_regression.py --batch_size=5 --kernel=poly --kernel_degree=5 --learning_rate=0.1 --iterations=400
Iteration 10, train RMSE 0.52, test RMSE 1.20
Iteration 20, train RMSE 0.48, test RMSE 1.01
Iteration 30, train RMSE 0.49, test RMSE 1.09
Iteration 40, train RMSE 0.47, test RMSE 1.05
Iteration 50, train RMSE 0.47, test RMSE 0.96
Iteration 60, train RMSE 0.47, test RMSE 1.16
Iteration 70, train RMSE 0.45, test RMSE 1.12
Iteration 80, train RMSE 0.44, test RMSE 1.02
Iteration 90, train RMSE 0.45, test RMSE 0.87
Iteration 100, train RMSE 0.43, test RMSE 0.86
Iteration 110, train RMSE 0.44, test RMSE 0.95
Iteration 120, train RMSE 0.44, test RMSE 1.04
Iteration 130, train RMSE 0.43, test RMSE 0.96
Iteration 140, train RMSE 0.42, test RMSE 0.91
Iteration 150, train RMSE 0.44, test RMSE 0.79
Iteration 160, train RMSE 0.43, test RMSE 0.64
Iteration 170, train RMSE 0.42, test RMSE 0.72
Iteration 180, train RMSE 0.41, test RMSE 0.86
Iteration 190, train RMSE 0.39, test RMSE 0.69
Iteration 200, train RMSE 0.39, test RMSE 0.70
Iteration 210, train RMSE 0.39, test RMSE 0.69
Iteration 220, train RMSE 0.39, test RMSE 0.68
Iteration 230, train RMSE 0.38, test RMSE 0.67
Iteration 240, train RMSE 0.39, test RMSE 0.72
Iteration 250, train RMSE 0.37, test RMSE 0.60
Iteration 260, train RMSE 0.38, test RMSE 0.68
Iteration 270, train RMSE 0.37, test RMSE 0.63
Iteration 280, train RMSE 0.36, test RMSE 0.59
Iteration 290, train RMSE 0.36, test RMSE 0.48
Iteration 300, train RMSE 0.36, test RMSE 0.43
Iteration 310, train RMSE 0.35, test RMSE 0.43
Iteration 320, train RMSE 0.36, test RMSE 0.41
Iteration 330, train RMSE 0.35, test RMSE 0.42
Iteration 340, train RMSE 0.34, test RMSE 0.42
Iteration 350, train RMSE 0.36, test RMSE 0.33
Iteration 360, train RMSE 0.37, test RMSE 0.28
Iteration 370, train RMSE 0.36, test RMSE 0.42
Iteration 380, train RMSE 0.35, test RMSE 0.46
Iteration 390, train RMSE 0.35, test RMSE 0.41
Iteration 400, train RMSE 0.33, test RMSE 0.28
python3 kernel_linear_regression.py --batch_size=1 --kernel=rbf
Iteration 10, train RMSE 0.78, test RMSE 0.66
Iteration 20, train RMSE 0.74, test RMSE 0.61
Iteration 30, train RMSE 0.71, test RMSE 0.58
Iteration 40, train RMSE 0.67, test RMSE 0.54
Iteration 50, train RMSE 0.64, test RMSE 0.52
Iteration 60, train RMSE 0.62, test RMSE 0.50
Iteration 70, train RMSE 0.59, test RMSE 0.48
Iteration 80, train RMSE 0.57, test RMSE 0.47
Iteration 90, train RMSE 0.55, test RMSE 0.46
Iteration 100, train RMSE 0.53, test RMSE 0.45
Iteration 110, train RMSE 0.51, test RMSE 0.45
Iteration 120, train RMSE 0.49, test RMSE 0.45
Iteration 130, train RMSE 0.48, test RMSE 0.45
Iteration 140, train RMSE 0.46, test RMSE 0.46
Iteration 150, train RMSE 0.45, test RMSE 0.46
Iteration 160, train RMSE 0.44, test RMSE 0.47
Iteration 170, train RMSE 0.43, test RMSE 0.48
Iteration 180, train RMSE 0.42, test RMSE 0.49
Iteration 190, train RMSE 0.41, test RMSE 0.49
Iteration 200, train RMSE 0.40, test RMSE 0.50
python3 kernel_linear_regression.py --batch_size=1 --kernel=rbf --kernel_gamma=0.5
Iteration 10, train RMSE 0.80, test RMSE 0.69
Iteration 20, train RMSE 0.79, test RMSE 0.67
Iteration 30, train RMSE 0.79, test RMSE 0.66
Iteration 40, train RMSE 0.78, test RMSE 0.65
Iteration 50, train RMSE 0.77, test RMSE 0.64
Iteration 60, train RMSE 0.76, test RMSE 0.64
Iteration 70, train RMSE 0.76, test RMSE 0.63
Iteration 80, train RMSE 0.75, test RMSE 0.62
Iteration 90, train RMSE 0.74, test RMSE 0.61
Iteration 100, train RMSE 0.74, test RMSE 0.61
Iteration 110, train RMSE 0.73, test RMSE 0.60
Iteration 120, train RMSE 0.72, test RMSE 0.59
Iteration 130, train RMSE 0.72, test RMSE 0.59
Iteration 140, train RMSE 0.71, test RMSE 0.58
Iteration 150, train RMSE 0.71, test RMSE 0.58
Iteration 160, train RMSE 0.70, test RMSE 0.57
Iteration 170, train RMSE 0.69, test RMSE 0.57
Iteration 180, train RMSE 0.69, test RMSE 0.56
Iteration 190, train RMSE 0.68, test RMSE 0.56
Iteration 200, train RMSE 0.68, test RMSE 0.56
python3 kernel_linear_regression.py --batch_size=1 --kernel=rbf --kernel_gamma=5
Iteration 10, train RMSE 0.52, test RMSE 0.40
Iteration 20, train RMSE 0.36, test RMSE 0.22
Iteration 30, train RMSE 0.27, test RMSE 0.14
Iteration 40, train RMSE 0.24, test RMSE 0.13
Iteration 50, train RMSE 0.22, test RMSE 0.14
Iteration 60, train RMSE 0.22, test RMSE 0.16
Iteration 70, train RMSE 0.21, test RMSE 0.16
Iteration 80, train RMSE 0.21, test RMSE 0.17
Iteration 90, train RMSE 0.21, test RMSE 0.17
Iteration 100, train RMSE 0.21, test RMSE 0.17
Iteration 110, train RMSE 0.21, test RMSE 0.18
Iteration 120, train RMSE 0.21, test RMSE 0.18
Iteration 130, train RMSE 0.21, test RMSE 0.18
Iteration 140, train RMSE 0.21, test RMSE 0.18
Iteration 150, train RMSE 0.21, test RMSE 0.18
Iteration 160, train RMSE 0.21, test RMSE 0.18
Iteration 170, train RMSE 0.21, test RMSE 0.17
Iteration 180, train RMSE 0.21, test RMSE 0.17
Iteration 190, train RMSE 0.21, test RMSE 0.17
Iteration 200, train RMSE 0.21, test RMSE 0.17
python3 kernel_linear_regression.py --batch_size=1 --kernel=rbf --kernel_gamma=50
Iteration 10, train RMSE 0.52, test RMSE 0.44
Iteration 20, train RMSE 0.36, test RMSE 0.28
Iteration 30, train RMSE 0.27, test RMSE 0.21
Iteration 40, train RMSE 0.23, test RMSE 0.17
Iteration 50, train RMSE 0.21, test RMSE 0.16
Iteration 60, train RMSE 0.21, test RMSE 0.16
Iteration 70, train RMSE 0.20, test RMSE 0.16
Iteration 80, train RMSE 0.20, test RMSE 0.16
Iteration 90, train RMSE 0.20, test RMSE 0.16
Iteration 100, train RMSE 0.20, test RMSE 0.15
Iteration 110, train RMSE 0.19, test RMSE 0.15
Iteration 120, train RMSE 0.19, test RMSE 0.15
Iteration 130, train RMSE 0.19, test RMSE 0.15
Iteration 140, train RMSE 0.19, test RMSE 0.15
Iteration 150, train RMSE 0.19, test RMSE 0.15
Iteration 160, train RMSE 0.19, test RMSE 0.15
Iteration 170, train RMSE 0.19, test RMSE 0.15
Iteration 180, train RMSE 0.19, test RMSE 0.15
Iteration 190, train RMSE 0.19, test RMSE 0.15
Iteration 200, train RMSE 0.19, test RMSE 0.15
python3 kernel_linear_regression.py --batch_size=1 --kernel=rbf --kernel_gamma=50 --l2=0.02
Iteration 10, train RMSE 0.54, test RMSE 0.45
Iteration 20, train RMSE 0.39, test RMSE 0.31
Iteration 30, train RMSE 0.32, test RMSE 0.25
Iteration 40, train RMSE 0.28, test RMSE 0.21
Iteration 50, train RMSE 0.26, test RMSE 0.20
Iteration 60, train RMSE 0.25, test RMSE 0.19
Iteration 70, train RMSE 0.25, test RMSE 0.18
Iteration 80, train RMSE 0.24, test RMSE 0.18
Iteration 90, train RMSE 0.24, test RMSE 0.18
Iteration 100, train RMSE 0.24, test RMSE 0.17
Iteration 110, train RMSE 0.24, test RMSE 0.17
Iteration 120, train RMSE 0.24, test RMSE 0.17
Iteration 130, train RMSE 0.24, test RMSE 0.17
Iteration 140, train RMSE 0.24, test RMSE 0.17
Iteration 150, train RMSE 0.24, test RMSE 0.17
Iteration 160, train RMSE 0.24, test RMSE 0.17
Iteration 170, train RMSE 0.24, test RMSE 0.17
Iteration 180, train RMSE 0.24, test RMSE 0.17
Iteration 190, train RMSE 0.24, test RMSE 0.17
Iteration 200, train RMSE 0.24, test RMSE 0.17
diacritization_dictionary
Deadline: Nov 24, 23:59 4 points+5 bonus
The diacritization_dictionary is an extension of the diacritization competition. In addition to the original training data, in this task you can also use a dictionary providing all known diacritized variants of word forms present in the training and testing data, available again under the CC BY-NC-SA license. The dictionary is not guaranteed to contain all words from the training and testing data, but if it contains a word, you can rely on all valid diacritization variants being present.
The rules of the competition are the same as for the diacritization competition, except that
 you can utilize the dictionary, both during training and inference;
 in order to pass, you need to achieve at least 95% word accuracy.
The diacritization_dictionary.py module provides a Dictionary class, which loads the dictionary (downloading it if necessary) and exposes it in the Dictionary.variants field as a mapping from an undiacritized word form to a list of known diacritized variants.
Note that the fiction-dictionary.txt file is available during ReCodEx evaluation.
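One possible way to combine a model with the dictionary is to snap each predicted word to the closest known variant. This approach is hypothetical, not prescribed by the assignment. The sketch assumes that `variants` has the shape described for `Dictionary.variants` and that its keys match the input word forms exactly. Diacritization preserves word length, so a position-wise comparison suffices.

```python
def choose_variant(word, predicted, variants):
    """Return the dictionary variant closest to the model's prediction.

    word: undiacritized input word; predicted: the model's diacritized guess;
    variants: mapping undiacritized form -> list of diacritized variants.
    """
    candidates = variants.get(word)
    if not candidates:
        return predicted  # unknown word: keep the model output
    # Fewest differing characters wins; min() keeps the first on ties.
    return min(candidates, key=lambda v: sum(a != b for a, b in zip(v, predicted)))
```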
smo_algorithm
Deadline: Dec 1, 23:59 7 points
Using the smo_algorithm.py template, implement the SMO algorithm for binary classification using the dual formulation of the soft-margin SVM. The template contains more detailed instructions.
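For orientation, here is a compact sketch of the widely used *simplified* SMO variant, which picks the second index $j$ uniformly at random. Treat it as an assumption about the overall shape only; the template's detailed instructions on index selection, tolerances and stopping take precedence. Targets are assumed to be in $\{-1, 1\}$ and the kernel matrix is precomputed.

```python
import numpy as np

def smo(kernel_matrix, t, C=1.0, tolerance=1e-7, max_passes=10, max_iterations=1000, seed=42):
    """Simplified SMO for the soft-margin SVM dual; t holds -1/+1 targets.

    Returns (a, b); x is classified by sign(sum_i a_i t_i K(x_i, x) + b).
    """
    generator = np.random.default_rng(seed)
    n = len(t)
    a, b, passes = np.zeros(n), 0.0, 0
    for _ in range(max_iterations):
        changed = 0
        for i in range(n):
            # E_i = y(x_i) - t_i for the current model.
            e_i = kernel_matrix[i] @ (a * t) + b - t[i]
            # Skip examples that already satisfy the KKT conditions.
            violates = (a[i] < C - tolerance and t[i] * e_i < -tolerance) or \
                       (a[i] > tolerance and t[i] * e_i > tolerance)
            if not violates:
                continue
            j = generator.integers(n - 1)
            j += j >= i  # uniformly random j != i
            e_j = kernel_matrix[j] @ (a * t) + b - t[j]
            eta = 2 * kernel_matrix[i, j] - kernel_matrix[i, i] - kernel_matrix[j, j]
            if eta > -tolerance:
                continue
            # Feasible interval for a_j keeping 0 <= a <= C and sum_k a_k t_k fixed.
            if t[i] == t[j]:
                low, high = max(0.0, a[i] + a[j] - C), min(C, a[i] + a[j])
            else:
                low, high = max(0.0, a[j] - a[i]), min(C, C + a[j] - a[i])
            new_a_j = float(np.clip(a[j] - t[j] * (e_i - e_j) / eta, low, high))
            if abs(new_a_j - a[j]) < tolerance:
                continue
            new_a_i = a[i] - t[i] * t[j] * (new_a_j - a[j])
            d_i, d_j = t[i] * (new_a_i - a[i]), t[j] * (new_a_j - a[j])
            # Bias values making y(x_i) = t_i and y(x_j) = t_j, respectively.
            b_i = b - e_i - d_i * kernel_matrix[i, i] - d_j * kernel_matrix[i, j]
            b_j = b - e_j - d_i * kernel_matrix[i, j] - d_j * kernel_matrix[j, j]
            a[i], a[j] = new_a_i, new_a_j
            if tolerance < a[i] < C - tolerance:
                b = b_i
            elif tolerance < a[j] < C - tolerance:
                b = b_j
            else:
                b = (b_i + b_j) / 2
            changed += 1
        # Stop after max_passes consecutive passes without any update.
        passes = 0 if changed else passes + 1
        if passes >= max_passes:
            break
    return a, b
```

Each pair update preserves $\sum_i a_i t_i = 0$, so the equality constraint of the dual holds throughout training.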
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 smo_algorithm.py --kernel=poly --kernel_degree=1
Iteration 100, train acc 88.0%, test acc 83.0%
Training finished after iteration 140, train acc 88.0%, test acc 83.0%
python3 smo_algorithm.py --kernel=poly --kernel_degree=3
Iteration 100, train acc 91.0%, test acc 89.0%
Iteration 200, train acc 91.0%, test acc 86.0%
Iteration 300, train acc 91.0%, test acc 86.0%
Iteration 400, train acc 91.0%, test acc 84.0%
Iteration 500, train acc 88.0%, test acc 87.0%
Iteration 600, train acc 91.0%, test acc 86.0%
Iteration 700, train acc 91.0%, test acc 86.0%
Iteration 800, train acc 90.0%, test acc 86.0%
Iteration 900, train acc 91.0%, test acc 86.0%
Training finished after iteration 1000, train acc 91.0%, test acc 86.0%
python3 smo_algorithm.py --kernel=poly --kernel_degree=3 --C=5
Iteration 100, train acc 85.0%, test acc 82.0%
Iteration 200, train acc 83.0%, test acc 82.0%
Iteration 300, train acc 84.0%, test acc 84.0%
Iteration 400, train acc 64.0%, test acc 66.0%
Iteration 500, train acc 89.0%, test acc 89.0%
Iteration 600, train acc 91.0%, test acc 89.0%
Iteration 700, train acc 89.0%, test acc 90.0%
Iteration 800, train acc 89.0%, test acc 89.0%
Iteration 900, train acc 55.0%, test acc 60.0%
Training finished after iteration 1000, train acc 91.0%, test acc 88.0%
python3 smo_algorithm.py --kernel=rbf --kernel_gamma=1
Iteration 100, train acc 92.0%, test acc 84.0%
Iteration 200, train acc 92.0%, test acc 84.0%
Training finished after iteration 207, train acc 92.0%, test acc 84.0%
python3 smo_algorithm.py --kernel=rbf --kernel_gamma=0.1
Training finished after iteration 87, train acc 88.0%, test acc 85.0%
svm_multiclass
Deadline: Dec 01, 23:59 3 points
Extend your solution to the smo_algorithm assignment to handle multiclass classification, using the svm_multiclass.py template.
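A common way to combine the pairwise binary classifiers is one-vs-one majority voting. In the sketch below, the `decision_functions` dictionary and its sign convention are hypothetical: a positive value means the first class of the pair is preferred. The helper names are also hypothetical.

```python
import itertools
import numpy as np

def one_vs_one_predict(decision_functions, x, classes):
    """Majority vote over binary classifiers trained for every class pair (i, j), i < j."""
    votes = np.zeros((len(x), classes), dtype=int)
    for i, j in itertools.combinations(range(classes), 2):
        prefers_i = decision_functions[(i, j)](x) > 0
        votes[prefers_i, i] += 1
        votes[~prefers_i, j] += 1
    # np.argmax resolves ties in favour of the smallest class index.
    return np.argmax(votes, axis=1)
```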
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 svm_multiclass.py --classes=5 --kernel=poly --kernel_degree=2 --test_size=0.8
Training classes 0 and 1
Training finished after iteration 71, train acc 100.0%, test acc 100.0%
Training classes 0 and 2
Training finished after iteration 88, train acc 100.0%, test acc 99.7%
Training classes 0 and 3
Training finished after iteration 57, train acc 100.0%, test acc 100.0%
Training classes 0 and 4
Iteration 100, train acc 100.0%, test acc 100.0%
Training finished after iteration 122, train acc 100.0%, test acc 100.0%
Training classes 1 and 2
Iteration 100, train acc 100.0%, test acc 98.2%
Training finished after iteration 108, train acc 100.0%, test acc 98.2%
Training classes 1 and 3
Training finished after iteration 67, train acc 100.0%, test acc 99.7%
Training classes 1 and 4
Iteration 100, train acc 100.0%, test acc 98.9%
Training finished after iteration 135, train acc 100.0%, test acc 98.9%
Training classes 2 and 3
Training finished after iteration 84, train acc 100.0%, test acc 98.0%
Training classes 2 and 4
Training finished after iteration 71, train acc 100.0%, test acc 98.3%
Training classes 3 and 4
Training finished after iteration 75, train acc 100.0%, test acc 98.6%
Test set accuracy: 97.92%
python3 svm_multiclass.py --classes=5 --kernel=poly --kernel_degree=3 --test_size=0.8
Training classes 0 and 1
Training finished after iteration 40, train acc 100.0%, test acc 100.0%
Training classes 0 and 2
Training finished after iteration 29, train acc 100.0%, test acc 99.3%
Training classes 0 and 3
Training finished after iteration 18, train acc 100.0%, test acc 100.0%
Training classes 0 and 4
Training finished after iteration 31, train acc 100.0%, test acc 100.0%
Training classes 1 and 2
Training finished after iteration 36, train acc 100.0%, test acc 98.2%
Training classes 1 and 3
Training finished after iteration 18, train acc 100.0%, test acc 99.3%
Training classes 1 and 4
Training finished after iteration 41, train acc 100.0%, test acc 98.9%
Training classes 2 and 3
Training finished after iteration 44, train acc 100.0%, test acc 97.6%
Training classes 2 and 4
Training finished after iteration 28, train acc 100.0%, test acc 98.3%
Training classes 3 and 4
Training finished after iteration 19, train acc 100.0%, test acc 99.0%
Test set accuracy: 97.64%
python3 svm_multiclass.py --classes=5 --kernel=poly --kernel_degree=3 --kernel_gamma=0.5 --test_size=0.8
Training classes 0 and 1
Training finished after iteration 69, train acc 100.0%, test acc 100.0%
Training classes 0 and 2
Training finished after iteration 41, train acc 100.0%, test acc 99.3%
Training classes 0 and 3
Training finished after iteration 57, train acc 100.0%, test acc 100.0%
Training classes 0 and 4
Training finished after iteration 80, train acc 100.0%, test acc 100.0%
Training classes 1 and 2
Training finished after iteration 96, train acc 100.0%, test acc 98.2%
Training classes 1 and 3
Training finished after iteration 62, train acc 100.0%, test acc 99.7%
Training classes 1 and 4
Training finished after iteration 76, train acc 100.0%, test acc 98.9%
Training classes 2 and 3
Training finished after iteration 98, train acc 100.0%, test acc 98.0%
Training classes 2 and 4
Training finished after iteration 47, train acc 100.0%, test acc 98.3%
Training classes 3 and 4
Training finished after iteration 51, train acc 100.0%, test acc 98.6%
Test set accuracy: 98.06%
python3 svm_multiclass.py --classes=5 --kernel=rbf --kernel_gamma=1 --test_size=0.8
Training classes 0 and 1
Training finished after iteration 51, train acc 100.0%, test acc 97.9%
Training classes 0 and 2
Training finished after iteration 50, train acc 100.0%, test acc 99.3%
Training classes 0 and 3
Training finished after iteration 43, train acc 100.0%, test acc 99.3%
Training classes 0 and 4
Training finished after iteration 51, train acc 100.0%, test acc 95.5%
Training classes 1 and 2
Training finished after iteration 55, train acc 100.0%, test acc 92.6%
Training classes 1 and 3
Training finished after iteration 48, train acc 100.0%, test acc 97.2%
Training classes 1 and 4
Training finished after iteration 45, train acc 100.0%, test acc 99.6%
Training classes 2 and 3
Training finished after iteration 40, train acc 100.0%, test acc 99.3%
Training classes 2 and 4
Training finished after iteration 42, train acc 100.0%, test acc 90.9%
Training classes 3 and 4
Training finished after iteration 41, train acc 100.0%, test acc 94.5%
Test set accuracy: 92.23%
python3 svm_multiclass.py --classes=5 --kernel=rbf --kernel_gamma=0.1 --C=3 --test_size=0.8
Training classes 0 and 1
Iteration 100, train acc 100.0%, test acc 100.0%
Training finished after iteration 185, train acc 100.0%, test acc 100.0%
Training classes 0 and 2
Iteration 100, train acc 100.0%, test acc 99.7%
Training finished after iteration 128, train acc 100.0%, test acc 99.7%
Training classes 0 and 3
Iteration 100, train acc 100.0%, test acc 100.0%
Training finished after iteration 189, train acc 100.0%, test acc 100.0%
Training classes 0 and 4
Iteration 100, train acc 100.0%, test acc 100.0%
Training finished after iteration 140, train acc 100.0%, test acc 100.0%
Training classes 1 and 2
Iteration 100, train acc 100.0%, test acc 97.9%
Training finished after iteration 168, train acc 100.0%, test acc 97.9%
Training classes 1 and 3
Iteration 100, train acc 100.0%, test acc 99.7%
Training finished after iteration 114, train acc 100.0%, test acc 99.7%
Training classes 1 and 4
Iteration 100, train acc 100.0%, test acc 98.9%
Training finished after iteration 141, train acc 100.0%, test acc 98.9%
Training classes 2 and 3
Iteration 100, train acc 100.0%, test acc 98.3%
Training finished after iteration 129, train acc 100.0%, test acc 98.3%
Training classes 2 and 4
Iteration 100, train acc 100.0%, test acc 99.0%
Training finished after iteration 106, train acc 100.0%, test acc 99.0%
Training classes 3 and 4
Iteration 100, train acc 100.0%, test acc 99.3%
Training finished after iteration 175, train acc 100.0%, test acc 99.3%
Test set accuracy: 97.92%
naive_bayes
Deadline: Dec 08, 23:59 4 points
Using the naive_bayes.py template, implement a naive Bayes classifier (without using the sklearn one). Support all of Gaussian NB, multinomial NB and Bernoulli NB.
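The three variants differ only in the per-feature likelihood. The sketch below makes some assumptions the template may not share. `alpha` is used as variance smoothing for Gaussian NB and as additive (Laplace) smoothing for the other two. Bernoulli NB binarizes features as `x > 0`. The class name is hypothetical.

```python
import numpy as np

class NaiveBayes:
    """Sketch of Gaussian, multinomial and Bernoulli naive Bayes."""

    def __init__(self, kind="gaussian", alpha=1.0):
        self.kind, self.alpha = kind, alpha

    def fit(self, x, t):
        self.classes = np.unique(t)
        self.log_priors = np.log(np.array([np.mean(t == c) for c in self.classes]))
        rows = [x[t == c] for c in self.classes]
        if self.kind == "gaussian":
            self.means = np.array([r.mean(axis=0) for r in rows])
            # alpha is added to the variances to avoid division by zero.
            self.vars = np.array([r.var(axis=0) for r in rows]) + self.alpha
        elif self.kind == "multinomial":
            counts = np.array([r.sum(axis=0) for r in rows]) + self.alpha
            self.log_probs = np.log(counts / counts.sum(axis=1, keepdims=True))
        elif self.kind == "bernoulli":
            # Probability that a binarized feature is 1, with Laplace smoothing.
            ones = np.array([(r > 0).sum(axis=0) for r in rows])
            sizes = np.array([len(r) for r in rows])[:, np.newaxis]
            self.probs = (ones + self.alpha) / (sizes + 2 * self.alpha)
        else:
            raise ValueError(self.kind)
        return self

    def log_joint(self, x):
        """log p(C_k) + sum_d log p(x_d | C_k), shape [examples, classes]."""
        if self.kind == "gaussian":
            sq = (x[:, np.newaxis, :] - self.means[np.newaxis]) ** 2
            log_lik = -0.5 * np.sum(np.log(2 * np.pi * self.vars)[np.newaxis]
                                    + sq / self.vars[np.newaxis], axis=2)
        elif self.kind == "multinomial":
            log_lik = x @ self.log_probs.T
        else:
            b = (x > 0).astype(float)
            log_lik = b @ np.log(self.probs).T + (1 - b) @ np.log(1 - self.probs).T
        return log_lik + self.log_priors

    def predict(self, x):
        return self.classes[np.argmax(self.log_joint(x), axis=1)]
```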
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 naive_bayes.py --classes=3 --naive_bayes_type=bernoulli
Test accuracy 93.31%
python3 naive_bayes.py --classes=3 --naive_bayes_type=multinomial
Test accuracy 94.05%
python3 naive_bayes.py --classes=3 --naive_bayes_type=gaussian
Test accuracy 97.03%
python3 naive_bayes.py --classes=10 --naive_bayes_type=bernoulli
Test accuracy 84.32%
python3 naive_bayes.py --classes=10 --naive_bayes_type=multinomial --alpha=10
Test accuracy 89.66%
python3 naive_bayes.py --classes=10 --naive_bayes_type=gaussian --alpha=10 --seed=41
Test accuracy 91.55%
isnt_it_ironic
Deadline: Dec 08, 23:59 5 points+6 bonus
The goal of the isnt_it_ironic competition task is to learn to classify a given text as ironic or not.
The isnt_it_ironic.py template shows how to load the training data, downloading it if needed. Please note that the data are provided only for the purpose of this class and you cannot use them in any other way.
Each instance is a string of an English tweet. The texts have already been tokenized and tokens are separated by exactly one space. The performance of your solution will be evaluated using the F1-score computed by sklearn.metrics.f1_score, and if you achieve at least 57%, you will obtain 5 points. Note that you can use any sklearn algorithm to solve this exercise (or anything you implement yourselves).
You might find TfidfTransformer or TfidfVectorizer useful.
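As a starting point, TF-IDF features can feed a linear classifier in an sklearn pipeline. The choice of word uni- and bigrams, `sublinear_tf` and `LogisticRegression` is an assumption for illustration, not a recommended configuration. `token_pattern=r"\S+"` splits on the single spaces of the pre-tokenized tweets, and `build_model` is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

def build_model():
    # Tokens are already space-separated, so every non-space run is a token.
    return make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), token_pattern=r"\S+", sublinear_tf=True),
        LogisticRegression(max_iter=1000),
    )

# Usage sketch: fit on training tweets, score with the competition metric.
# model = build_model().fit(train_texts, train_targets)
# print(f1_score(dev_targets, model.predict(dev_texts)))
```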
kernel_approximation
Deadline: Dec 15, 23:59 (originally Dec 08) 3 points
Using the kernel_approximation.py template, implement the RFF and Nyström approximations of an RBF kernel.
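The sketch below implements both approximations, assuming the RBF kernel has the form $\exp(-\gamma \|x - z\|^2)$.

- Random Fourier features sample $w \sim \mathcal{N}(0, 2\gamma I)$ and $b \sim U(0, 2\pi)$.
- Nyström maps $x$ to $K(x, L)\,K(L, L)^{-1/2}$ for a set of landmarks $L$.

How the template picks landmarks and seeds is not assumed here. sklearn's `RBFSampler` and `Nystroem` in `sklearn.kernel_approximation` can serve as a cross-check.

```python
import numpy as np

def rbf_kernel(x, z, gamma):
    """exp(-gamma ||x - z||^2) for all row pairs of x and z."""
    sq = np.sum(x ** 2, 1)[:, None] + np.sum(z ** 2, 1)[None, :] - 2 * x @ z.T
    return np.exp(-gamma * np.maximum(sq, 0))

def fit_rff(dimension, components, gamma, seed=42):
    """Return a feature map whose dot products approximate rbf_kernel."""
    generator = np.random.default_rng(seed)
    # The spectral density of exp(-gamma ||delta||^2) is N(0, 2 gamma I).
    w = generator.normal(scale=np.sqrt(2 * gamma), size=(dimension, components))
    b = generator.uniform(0, 2 * np.pi, size=components)
    # The same w and b must be reused for training and test data.
    return lambda x: np.sqrt(2 / components) * np.cos(x @ w + b)

def nystroem_features(x, landmarks, gamma):
    """Map x to K(x, L) K(L, L)^{-1/2}."""
    eigenvalues, eigenvectors = np.linalg.eigh(rbf_kernel(landmarks, landmarks, gamma))
    eigenvalues = np.maximum(eigenvalues, 1e-12)  # guard tiny or negative values
    inverse_sqrt = eigenvectors @ np.diag(eigenvalues ** -0.5) @ eigenvectors.T
    return rbf_kernel(x, landmarks, gamma) @ inverse_sqrt
```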
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 kernel_approximation.py --original
Test set accuracy: 89.64%
python3 kernel_approximation.py --original --svm
Test set accuracy: 94.68%
python3 kernel_approximation.py --rff=300
Test set accuracy: 84.36%
python3 kernel_approximation.py --rff=800
Test set accuracy: 91.64%
python3 kernel_approximation.py --nystroem=100
Test set accuracy: 90.80%
python3 kernel_approximation.py --nystroem=300
Test set accuracy: 93.28%
decision_tree
Deadline: Dec 15, 23:59 4 points
Starting with the decision_tree.py template, manually implement the construction of a classification decision tree, supporting both the gini and entropy criteria, and the max_depth, min_to_split and max_leaves constraints.
In this assignment, you will get partial points during ReCodEx evaluation, depending on which argument values your solution supports.
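The core step of tree construction is choosing the split that most decreases the criterion. The sketch below makes two assumptions:

- the criteria are weighted by node size, $|T| \sum_k p_k(1 - p_k)$ for Gini and $-|T| \sum_k p_k \log p_k$ for entropy;
- candidate thresholds lie halfway between consecutive unique feature values.

The recursion and the max_depth, min_to_split and max_leaves logic are left out, and the function names are hypothetical.

```python
import numpy as np

def criterion(targets, kind):
    """Size-weighted impurity of a node with the given integer targets."""
    counts = np.bincount(targets)
    p = counts[counts > 0] / len(targets)
    if kind == "gini":
        return len(targets) * np.sum(p * (1 - p))
    if kind == "entropy":
        return -len(targets) * np.sum(p * np.log(p))
    raise ValueError(kind)

def best_split(data, targets, kind):
    """Return (criterion decrease, feature, threshold) of the best split, or None."""
    parent = criterion(targets, kind)
    best = None
    for feature in range(data.shape[1]):
        values = np.unique(data[:, feature])
        # Midpoints between consecutive unique values; both sides are non-empty.
        for threshold in (values[:-1] + values[1:]) / 2:
            left = data[:, feature] <= threshold
            decrease = parent - criterion(targets[left], kind) - criterion(targets[~left], kind)
            if best is None or decrease > best[0]:
                best = (decrease, feature, threshold)
    return best
```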
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 decision_tree.py --criterion=gini --min_to_split=50 --seed=91
Train accuracy: 92.6%
Test accuracy: 88.1%
python3 decision_tree.py --criterion=gini --max_depth=2 --seed=91
Train accuracy: 93.4%
Test accuracy: 88.1%
python3 decision_tree.py --criterion=gini --max_leaves=4 --seed=91
Train accuracy: 97.8%
Test accuracy: 92.9%
python3 decision_tree.py --criterion=gini --min_to_split=40 --max_leaves=4 --seed=92
Train accuracy: 94.1%
Test accuracy: 81.0%
python3 decision_tree.py --criterion=entropy --min_to_split=55 --seed=97
Train accuracy: 94.1%
Test accuracy: 78.6%
python3 decision_tree.py --criterion=entropy --max_depth=2 --seed=97
Train accuracy: 94.9%
Test accuracy: 81.0%
python3 decision_tree.py --criterion=entropy --max_leaves=4 --seed=97
Train accuracy: 98.5%
Test accuracy: 88.1%
python3 decision_tree.py --criterion=entropy --min_to_split=45 --max_depth=2 --seed=100
Train accuracy: 94.9%
Test accuracy: 78.6%
random_forest
Deadline: Dec 15, 23:59 3 points
Using the random_forest.py template, train a random forest, which is a collection of decision trees trained with dataset bagging and random feature subsampling.
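The sketch below shows dataset bagging and majority voting. It uses sklearn's `DecisionTreeClassifier` purely as a stand-in for your own tree from the decision_tree assignment. Its `max_features` draws a fresh feature subset at every split, which may differ from how the template defines feature subsampling. The function names are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_forest(data, targets, trees=3, bootstrapping=True, feature_subsampling=0.5,
                 max_depth=2, seed=42):
    generator = np.random.default_rng(seed)
    forest = []
    for _ in range(trees):
        indices = np.arange(len(data))
        if bootstrapping:
            # Dataset bagging: len(data) examples sampled with replacement.
            indices = generator.choice(len(data), size=len(data), replace=True)
        tree = DecisionTreeClassifier(max_depth=max_depth, max_features=feature_subsampling,
                                      random_state=int(generator.integers(2**31 - 1)))
        forest.append(tree.fit(data[indices], targets[indices]))
    return forest

def predict_forest(forest, data):
    predictions = np.stack([tree.predict(data) for tree in forest], axis=1).astype(int)
    # Majority vote per example; ties go to the smallest class index.
    return np.array([np.argmax(np.bincount(row)) for row in predictions])
```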
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 random_forest.py --trees=3 --max_depth=2 --seed=46
Train accuracy: 93.4%
Test accuracy: 83.3%
python3 random_forest.py --trees=3 --bootstrapping --max_depth=2 --seed=46
Train accuracy: 97.1%
Test accuracy: 88.1%
python3 random_forest.py --trees=3 --feature_subsampling=0.5 --max_depth=2 --seed=46
Train accuracy: 97.1%
Test accuracy: 85.7%
python3 random_forest.py --trees=3 --bootstrapping --feature_subsampling=0.5 --max_depth=2 --seed=46
Train accuracy: 98.5%
Test accuracy: 90.5%
human_activity_recognition
Deadline: Dec 15, 23:59 4 points+4 bonus
This assignment is a competition task. Your goal is to perform human activity recognition, namely to recognize one of five actions (walking, standing, sitting, standing up, sitting down) using data from four accelerometers. The train set consists of 50k examples, the test set of approximately 115k.
The human_activity_recognition.py template shows how to load the training data, downloading it if needed.
Your model will be evaluated using accuracy and your goal is to achieve at least 99%. Note that you can use any sklearn algorithm to solve this exercise.
gradient_boosting
Deadline: Jan 05, 23:59 6 points
Using the gradient_boosting.py template, train a gradient boosted decision tree forest for classification.
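The helpers below assume the usual second-order formulation for softmax cross-entropy; the template's exact definitions take precedence. Under that formulation, the tree for class $k$ uses gradients $g_i = p_{ik} - [t_i = k]$ and Hessians $h_i = p_{ik}(1 - p_{ik})$. A leaf then predicts $-\sum g_i / (\lambda + \sum h_i)$, and the split criterion is $-\frac{1}{2} (\sum g_i)^2 / (\lambda + \sum h_i)$. Tree growing and the learning-rate update of the logits are omitted.

```python
import numpy as np

def softmax(logits):
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def gradients_hessians(logits, targets, k):
    """First and second derivatives of softmax cross-entropy w.r.t. logit k."""
    probs = softmax(logits)[:, k]
    return probs - (targets == k), probs * (1 - probs)

def leaf_value(g, h, l2):
    # Minimizer of sum_i (g_i v + h_i v^2 / 2) + l2 v^2 / 2.
    return -np.sum(g) / (l2 + np.sum(h))

def split_criterion(g, h, l2):
    # Approximate loss reached with the optimal leaf value; lower is better.
    return -0.5 * np.sum(g) ** 2 / (l2 + np.sum(h))
```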
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 gradient_boosting.py --dataset=wine --trees=3 --max_depth=1 --learning_rate=0.3
Using 1 trees, train accuracy: 91.2%, test accuracy: 73.8%
Using 2 trees, train accuracy: 96.3%, test accuracy: 90.5%
Using 3 trees, train accuracy: 97.1%, test accuracy: 95.2%
python3 gradient_boosting.py --dataset=wine --trees=3 --max_depth=2 --learning_rate=0.3
Using 1 trees, train accuracy: 97.1%, test accuracy: 83.3%
Using 2 trees, train accuracy: 97.1%, test accuracy: 90.5%
Using 3 trees, train accuracy: 98.5%, test accuracy: 97.6%
python3 gradient_boosting.py --dataset=wine --trees=3 --max_depth=2 --l2=0.5 --learning_rate=0.3
Using 1 trees, train accuracy: 96.3%, test accuracy: 83.3%
Using 2 trees, train accuracy: 98.5%, test accuracy: 97.6%
Using 3 trees, train accuracy: 98.5%, test accuracy: 100.0%
python3 gradient_boosting.py --dataset=digits --trees=3 --max_depth=2 --learning_rate=0.5
Using 1 trees, train accuracy: 79.4%, test accuracy: 76.2%
Using 2 trees, train accuracy: 86.8%, test accuracy: 81.0%
Using 3 trees, train accuracy: 90.3%, test accuracy: 83.3%
python3 gradient_boosting.py --dataset=breast_cancer --trees=3 --max_depth=2 --learning_rate=0.5
Using 1 trees, train accuracy: 94.3%, test accuracy: 97.6%
Using 2 trees, train accuracy: 96.0%, test accuracy: 97.6%
Using 3 trees, train accuracy: 96.6%, test accuracy: 100.0%
nli_competition
Deadline: Jan 05, 23:59 5 points+5 bonus
In this competition task you will be solving Native Language Identification. In that task, you get an English essay written by a non-native individual and your goal is to identify their native language.
We will be using the NLI Shared Task 2013 data, which contains documents in 11 languages. For each language, the train, development and test sets contain 900, 100 and 100 documents, respectively. Particularly interesting is the fact that humans are quite bad at this task (in a simplified setting, human professionals achieve 40-50% accuracy), while machine learning models can achieve high performance.
Because the data is not publicly available, you can download it only through ReCodEx. Please do not distribute it. To load the dataset, use the nli_competition.py template.
The performance of your system is measured using accuracy of correctly predicted documents and your goal is to achieve at least 77% accuracy. Note that you can use any sklearn algorithm to solve this exercise.
pca
Deadline: Feb 28, 23:59 3 points
Using the pca.py template, implement the PCA computation with both
 the power iteration algorithm,
 the SVD decomposition.
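Both methods should agree up to the sign of each component. The minimal sketch below makes two assumptions: power iteration starts from a random vector, and the covariance matrix is deflated between components. The function names are hypothetical.

```python
import numpy as np

def pca_power_iteration(data, components, iterations=20, seed=42):
    generator = np.random.default_rng(seed)
    centered = data - data.mean(axis=0)
    covariance = centered.T @ centered / len(data)
    vectors = []
    for _ in range(components):
        v = generator.normal(size=data.shape[1])
        for _ in range(iterations):
            v = covariance @ v
            v /= np.linalg.norm(v)
        eigenvalue = v @ covariance @ v
        vectors.append(v)
        # Deflation: remove the found direction before searching the next one.
        covariance = covariance - eigenvalue * np.outer(v, v)
    return np.stack(vectors, axis=1)  # [features, components]

def pca_svd(data, components):
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:components].T  # right singular vectors = principal directions
```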
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 pca.py --max_iter=20
Test set accuracy: 90.76%
python3 pca.py --max_iter=20 --pca=1
Test set accuracy: 30.64%
python3 pca.py --max_iter=20 --pca=5
Test set accuracy: 68.96%
python3 pca.py --max_iter=20 --pca=10
Test set accuracy: 80.44%
python3 pca.py --max_iter=20 --pca=20
Test set accuracy: 87.76%
python3 pca.py --max_iter=20 --pca=50
Test set accuracy: 89.92%
python3 pca.py --max_iter=20 --pca=100
Test set accuracy: 90.68%
python3 pca.py --max_iter=20 --pca=200
Test set accuracy: 90.88%
kmeans
Deadline: Feb 28, 23:59 3 points
Using the kmeans.py template, implement the K-Means algorithm with both
 random initialization,
 k-means++ initialization.
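The sketch below implements both initializations, followed by Lloyd iterations. Its random number usage differs from the template's, so it will not reproduce the printed cluster assignments. The function names, and leaving empty clusters in place, are assumptions. `data` is expected to be a float array.

```python
import numpy as np

def assign(data, centers):
    """Index of the nearest center (squared L2 distance) for every example."""
    return np.argmin(np.sum((data[:, None, :] - centers[None, :, :]) ** 2, axis=2), axis=1)

def kmeans(data, clusters, iterations, init="random", seed=42):
    generator = np.random.default_rng(seed)
    if init == "random":
        # Distinct examples chosen uniformly at random become the centers.
        centers = data[generator.choice(len(data), size=clusters, replace=False)]
    elif init == "kmeans++":
        centers = data[[generator.integers(len(data))]]
        while len(centers) < clusters:
            # Next center sampled proportionally to the squared distance
            # to the nearest already-chosen center.
            sq_dist = np.min(np.sum((data[:, None, :] - centers[None, :, :]) ** 2, axis=2), axis=1)
            new = generator.choice(len(data), p=sq_dist / sq_dist.sum())
            centers = np.concatenate([centers, data[[new]]])
    else:
        raise ValueError(init)
    centers = centers.astype(float)
    for _ in range(iterations):
        assignments = assign(data, centers)
        for k in range(clusters):
            if np.any(assignments == k):  # keep empty clusters in place
                centers[k] = data[assignments == k].mean(axis=0)
    return assign(data, centers), centers
```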
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 kmeans.py --clusters=5 --examples 150 --iterations 5 --seed 51 --init=random
Cluster assignments:
[2 3 3 4 1 2 1 1 2 3 2 1 1 3 3 4 0 4 4 1 3 1 1 1 1 0 1 3 3 2 3 0 1 0 3 3 0
0 1 0 1 2 1 1 3 2 1 2 2 1 3 2 2 2 3 2 1 2 1 4 3 3 4 4 2 1 1 1 1 3 1 3 1 4
1 3 2 1 0 0 1 2 2 0 2 2 3 1 1 1 2 2 4 2 2 1 1 1 2 2 2 3 1 3 1 3 2 1 0 2 2
3 1 1 1 3 3 0 1 3 4 1 1 4 1 3 1 4 4 3 1 4 1 4 1 1 1 3 1 1 4 2 0 3 1 4 1 2
2 1]
python3 kmeans.py --clusters=5 --examples 150 --iterations 5 --seed 51 --init=kmeans++
Cluster assignments:
[1 3 3 4 0 1 0 2 1 3 1 2 2 3 3 4 4 4 4 2 3 2 2 2 2 4 2 3 3 0 3 4 0 4 3 3 4
4 2 4 2 1 0 0 3 1 0 1 1 0 3 1 0 0 3 1 0 1 2 4 3 3 4 4 1 0 2 0 0 3 0 3 0 4
2 3 1 2 4 4 2 1 1 4 1 1 3 0 2 2 1 1 4 1 1 2 0 2 1 1 1 3 0 3 2 3 1 0 4 1 1
3 0 2 0 3 0 4 0 3 4 0 2 4 2 3 2 4 4 3 2 4 2 4 2 0 0 3 0 0 4 1 4 3 0 4 2 1
1 2]
python3 kmeans.py --clusters=7 --examples 200 --iterations 11 --seed 67 --init=random
Cluster assignments:
[2 1 0 4 5 1 4 1 1 2 0 3 6 6 1 6 1 1 0 2 3 2 4 0 6 5 5 4 5 4 4 6 6 1 0 6 4
4 1 6 4 5 4 4 1 0 2 1 2 2 4 3 2 1 5 2 6 0 5 6 4 2 6 3 1 1 4 5 1 2 4 5 4 5
1 1 4 2 5 4 4 5 4 2 2 4 4 1 5 0 4 4 4 1 3 0 3 5 4 1 0 4 4 4 4 4 5 4 1 4 4
2 5 2 6 5 2 2 4 5 4 4 3 3 2 6 1 4 1 6 1 2 3 0 5 6 4 6 4 5 5 2 0 1 6 0 1 4
4 6 5 1 2 4 0 0 4 0 4 3 5 4 3 4 6 3 6 5 5 6 0 2 6 5 4 5 4 3 2 4 1 2 4 2 4
2 6 4 4 6 2 4 4 6 6 5 0 2 4 1]
python3 kmeans.py --clusters=7 --examples 200 --iterations 5 --seed 67 --init=kmeans++
Cluster assignments:
[3 1 4 5 0 1 6 1 3 3 4 4 2 2 1 2 1 1 4 3 4 3 5 4 2 0 0 6 0 6 5 2 2 1 4 2 5
5 1 2 5 0 6 6 1 4 3 1 3 3 5 4 3 1 0 3 2 4 0 2 5 3 2 4 1 1 6 0 1 3 5 0 5 0
1 1 6 3 0 6 5 0 5 3 3 5 6 1 0 4 5 6 5 1 4 4 2 0 6 1 4 6 5 5 6 5 0 5 1 6 6
3 0 3 2 0 3 3 5 0 6 6 4 4 3 2 1 6 1 2 1 3 4 4 0 2 6 2 6 0 0 3 4 1 2 4 1 5
6 2 0 1 3 5 4 4 6 4 6 4 0 5 2 5 2 4 2 0 0 2 4 3 2 0 6 0 5 2 3 5 1 3 6 3 5
3 2 6 5 2 0 6 6 2 2 0 4 3 6 1]
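The two initializations and the alternating assignment and update steps can be sketched in NumPy roughly as follows. This is a minimal illustration, not the kmeans.py template: the function name kmeans, its signature, and the synthetic blob data are hypothetical, and because it consumes random numbers in its own order it will not reproduce the cluster assignments printed above.

```python
import numpy as np


def kmeans(data, clusters, iterations, init, generator):
    """Hypothetical K-Means helper returning the final cluster assignments."""
    if init == "random":
        # Use `clusters` distinct training examples as the initial centers.
        centers = data[generator.choice(len(data), size=clusters, replace=False)]
    elif init == "kmeans++":
        # The first center is chosen uniformly; every further center is sampled
        # with probability proportional to the squared distance of an example
        # to its nearest already-chosen center.
        centers = data[[generator.randint(len(data))]]
        while len(centers) < clusters:
            squared = np.min(((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
            new = generator.choice(len(data), p=squared / squared.sum())
            centers = np.concatenate([centers, data[[new]]])
    else:
        raise ValueError(f"Unknown initialization {init}")
    centers = centers.astype(float)

    def assign(centers):
        # Index of the nearest center (squared Euclidean distance) for every example.
        return np.argmin(((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)

    for _ in range(iterations):
        assignments = assign(centers)
        for k in range(clusters):
            # Move each center to the mean of its examples; keep empty clusters in place.
            if np.any(assignments == k):
                centers[k] = data[assignments == k].mean(axis=0)
    return assign(centers)


generator = np.random.RandomState(51)
data = np.concatenate([generator.normal(loc, 0.3, size=(50, 2)) for loc in ([0, 0], [3, 3], [0, 4])])
labels = kmeans(data, clusters=3, iterations=5, init="kmeans++", generator=generator)
print(labels[:10])
```

The final assignments are recomputed after the last update, so they always match the returned centers; that is a design choice of this sketch and may differ from what the template prints.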
gaussian_mixture
Deadline: Feb 28, 23:59 4 points
Cluster the given input by fitting a Gaussian mixture using the gaussian_mixture.py template. Use full covariances and compute the negative log-likelihood of the model after every iteration of the EM algorithm.
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 gaussian_mixture.py --examples=112 --clusters=4 --iterations=5 --init=random
Loss after iteration 1: 546.2
Loss after iteration 2: 524.1
Loss after iteration 3: 502.9
Loss after iteration 4: 471.1
Loss after iteration 5: 463.5
python3 gaussian_mixture.py --examples=112 --clusters=4 --iterations=3 --init=kmeans++
Loss after iteration 1: 458.5
Loss after iteration 2: 458.5
Loss after iteration 3: 458.5
python3 gaussian_mixture.py --examples=120 --clusters=5 --iterations=11 --init=random
Loss after iteration 1: 526.2
Loss after iteration 2: 520.9
Loss after iteration 3: 517.5
Loss after iteration 4: 517.2
Loss after iteration 5: 517.1
Loss after iteration 6: 517.0
Loss after iteration 7: 517.0
Loss after iteration 8: 517.0
Loss after iteration 9: 516.9
Loss after iteration 10: 516.9
Loss after iteration 11: 516.9
python3 gaussian_mixture.py --examples=120 --clusters=5 --iterations=5 --init=kmeans++
Loss after iteration 1: 516.5
Loss after iteration 2: 513.7
Loss after iteration 3: 508.8
Loss after iteration 4: 505.4
Loss after iteration 5: 504.5
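One EM iteration consists of an E step (computing responsibilities) followed by an M step (re-estimating priors, means and full covariances), after which the negative log-likelihood of the updated model is recorded. The sketch below illustrates this in plain NumPy under stated assumptions: the initial means are passed in by the caller (for example random examples or kmeans++ centers), the initial covariances are identity matrices and the initial priors uniform. These choices, the helper names and the synthetic data are hypothetical and not taken from the gaussian_mixture.py template, so the printed losses will differ from the ones above.

```python
import numpy as np


def log_gaussian(data, mean, cov):
    """Log density of N(mean, cov) at every row of `data` (full covariance)."""
    diff = data - mean
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis = np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)
    return -0.5 * (data.shape[1] * np.log(2 * np.pi) + logdet + mahalanobis)


def log_joint(data, priors, means, covs):
    """Matrix of log(prior_k * N(x_i | mean_k, cov_k)), shape [examples, clusters]."""
    return np.stack([np.log(p) + log_gaussian(data, m, c) for p, m, c in zip(priors, means, covs)], axis=1)


def gaussian_mixture(data, initial_means, iterations):
    """EM for a full-covariance Gaussian mixture; returns the NLL after every iteration."""
    clusters, dim = len(initial_means), data.shape[1]
    means = np.array(initial_means, dtype=float)
    covs = np.stack([np.eye(dim)] * clusters)  # assumption: identity initial covariances
    priors = np.full(clusters, 1 / clusters)  # assumption: uniform initial priors
    losses = []
    for _ in range(iterations):
        # E step: responsibilities r_ik = p(cluster k | x_i), normalized in log space.
        joint = log_joint(data, priors, means, covs)
        resp = np.exp(joint - np.logaddexp.reduce(joint, axis=1, keepdims=True))
        # M step: re-estimate priors, means and full covariances from the responsibilities.
        counts = resp.sum(axis=0)
        priors = counts / len(data)
        means = resp.T @ data / counts[:, None]
        for k in range(clusters):
            diff = data - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / counts[k]
        # Negative log-likelihood of the updated model.
        losses.append(-np.logaddexp.reduce(log_joint(data, priors, means, covs), axis=1).sum())
    return losses


generator = np.random.RandomState(42)
data = np.concatenate([generator.normal(loc, 0.5, size=(40, 2)) for loc in ([0, 0], [4, 0], [0, 4])])
initial_means = data[generator.choice(len(data), size=3, replace=False)]
for i, loss in enumerate(gaussian_mixture(data, initial_means, iterations=5), start=1):
    print(f"Loss after iteration {i}: {loss:.1f}")
```

Since EM never decreases the likelihood, the printed loss should be non-increasing, which is a cheap sanity check for an implementation. The sketch adds no regularization to the covariances, so a cluster collapsing onto very few examples would make its covariance singular.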
bootstrap_resampling
Deadline: Feb 28, 23:59 3 points
Given two trained models, compute their 95% confidence intervals using bootstrap resampling. Then, perform a paired bootstrap test that the second one is better than the first one.
Start with the bootstrap_resampling.py template. Note that you usually need to perform a large number of bootstrap resamplings, so you should make sure your implementation is fast enough.
Note that your results may sometimes be slightly different (for example because of varying floating point arithmetic on your CPU).
python3 bootstrap_resampling.py --seed=49 --test_size=0.9 --bootstrap_samples=1000
Confidence intervals of the two models:
 [90.23% .. 93.02%]
 [90.98% .. 93.63%]
The p-value of the test: 1.40%
python3 bootstrap_resampling.py --seed=49 --test_size=0.9 --bootstrap_samples=10000
Confidence intervals of the two models:
 [90.30% .. 93.02%]
 [91.10% .. 93.70%]
The p-value of the test: 1.71%
python3 bootstrap_resampling.py --seed=49 --test_size=0.9 --bootstrap_samples=100000
Confidence intervals of the two models:
 [90.30% .. 92.95%]
 [91.10% .. 93.70%]
The p-value of the test: 1.62%
python3 bootstrap_resampling.py --seed=85 --test_size=0.95 --bootstrap_samples=50000
Confidence intervals of the two models:
 [86.83% .. 89.87%]
 [87.18% .. 90.16%]
The p-value of the test: 15.63%
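The speed requirement is usually met by drawing all resamples at once as an index matrix and scoring them with vectorized NumPy operations. The sketch below shows percentile 95% confidence intervals of accuracy and a paired test in which both models are scored on the same resamples. The p-value convention used here (the fraction of resamples in which the second model is not better than the first), the function name and the synthetic predictions are assumptions for illustration; they are not taken from the bootstrap_resampling.py template and will not reproduce the numbers above.

```python
import numpy as np


def bootstrap_evaluation(gold, predictions1, predictions2, samples, generator):
    """Percentile 95% confidence intervals of accuracy and a paired bootstrap p-value."""
    correct1 = (predictions1 == gold).astype(float)
    correct2 = (predictions2 == gold).astype(float)
    # Every row holds the indices of one resample of the test set (with replacement).
    indices = generator.randint(len(gold), size=(samples, len(gold)))
    scores1, scores2 = correct1[indices].mean(axis=1), correct2[indices].mean(axis=1)
    intervals = [np.percentile(scores, [2.5, 97.5]) for scores in (scores1, scores2)]
    # Paired test: both models are scored on the same resamples; the p-value is the
    # fraction of resamples in which the second model is not better than the first.
    p_value = np.mean(scores2 <= scores1)
    return intervals, p_value


generator = np.random.RandomState(49)
gold = generator.randint(2, size=500)
predictions1 = np.where(generator.uniform(size=500) < 0.90, gold, 1 - gold)
predictions2 = np.where(generator.uniform(size=500) < 0.93, gold, 1 - gold)
(ci1, ci2), p_value = bootstrap_evaluation(gold, predictions1, predictions2, 1000, generator)
for low, high in (ci1, ci2):
    print(f" [{100 * low:.2f}% .. {100 * high:.2f}%]")
print(f"The p-value of the test: {100 * p_value:.2f}%")
```

The index matrix has samples × N entries, so for 100000 resamples of a large test set it may be preferable to process the resamples in chunks to keep memory usage bounded.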
In the competitions, your goal is to train a model and then predict target values on the test set available only in ReCodEx.
Submitting to ReCodEx
When submitting a competition solution to ReCodEx, you should submit a trained model and a Python source capable of running it.
Furthermore, please also include the Python source and hyperparameters you used to train the submitted model. Be careful, though, that there must still be exactly one Python source with a line starting with def main(.
Competition Evaluation

Before the deadline, ReCodEx prints the exact achieved score, but only if it is worse than the baseline.
If you surpass the baseline, the assignment is marked as solved in ReCodEx and you immediately get regular points for the assignment. However, ReCodEx does not print the reached score.

After the deadline, ReCodEx starts to print the exact performance in all cases, and all submissions are reevaluated.
The latest submission of every user surpassing the required baseline participates in a competition. Additional bonus points are then awarded according to the ordering of the performance of the participating submissions.
What Is Allowed
 You can use only the given annotated data, either for training or evaluation.
 You can use any unannotated or manually created data for training or evaluation.
 The test set annotations must be the result of your system (so you cannot manually correct them; but your system can contain other parts than just trained models, like handwritten rules).
 Do not use test set annotations in any way.
 Unless stated otherwise, you can use any algorithm present in numpy or scipy, anything you implement yourself, and any pre/post-processing or ensembling methods in sklearn. Do not use deep network frameworks like TensorFlow or PyTorch.
Install

Installing to central user packages repository
You can install all required packages to the central user packages repository using pip3 install --user scikit-learn==0.23.2 pandas==1.1.2 numpy==1.18.5 scipy==1.5.2 matplotlib==3.3.2.
Installing to a virtual environment
Python supports virtual environments, which are directories containing independent sets of installed packages. You can create a virtual environment by running python3 -m venv VENV_DIR followed by VENV_DIR/bin/pip3 install scikit-learn==0.23.2 pandas==1.1.2 numpy==1.18.5 scipy==1.5.2 matplotlib==3.3.2.
ReCodEx

What files can be submitted to ReCodEx?
You can submit multiple files of any type to ReCodEx. There is a limit of 20 files per submission, with a total size of 20MB.

What file does ReCodEx execute and what arguments does it use?
Exactly one file with a .py suffix must contain a line starting with def main(. Such a file is imported by ReCodEx and the main method is executed (during the import, __name__ == "__recodex__"). The file must also export an argument parser called parser. ReCodEx uses its arguments and default values, but it overwrites some of the arguments depending on the test being executed; the template should always indicate which arguments are set by ReCodEx and which are left intact.
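A minimal file with this structure might look as follows: a module-level argparse parser named parser, a single def main( function, and a guard so the file can also be run locally. The --seed and --examples arguments and the body of main are hypothetical placeholders; the real templates define their own arguments and state which of them ReCodEx overrides.

```python
import argparse

import numpy as np

# Hypothetical arguments for illustration; the real templates define their own
# parser, and ReCodEx may override some of these values during evaluation.
parser = argparse.ArgumentParser()
parser.add_argument("--seed", default=42, type=int, help="Random seed")
parser.add_argument("--examples", default=100, type=int, help="Number of examples")


def main(args: argparse.Namespace) -> float:
    # ReCodEx imports this file (with __name__ == "__recodex__") and calls main itself.
    generator = np.random.RandomState(args.seed)
    data = generator.uniform(size=args.examples)
    return float(data.mean())


if __name__ == "__main__":
    # Run locally: parse the command line (or use the defaults inside a notebook).
    args = parser.parse_args([] if "__file__" not in globals() else None)
    print(main(args))
```

Because ReCodEx sets __name__ to "__recodex__", the guarded block is skipped during evaluation and only main is called.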
What are the time and memory limits?
The memory limit during evaluation is 1.5GB. The time limit varies, but should be at least 10 seconds and at least twice the running time of my solution. For competition assignments, the time limit is 5 minutes.
Requirements
To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that up to 40 points above 80 (including the bonus points) will be transferred to the exam. In total, assignments for at least 120 points (not including the bonus points) will be available.
To pass the exam, you need to obtain at least 60, 75 and 90 points out of the 100-point exam to obtain grades 3, 2 and 1, respectively. (PhD students with binary grades require 75 points.) The exam consists of 100-point-worth questions from the list below (the questions are randomly generated, but in such a way that there is at least one question from every lecture except the first one). In addition, you can get at most 40 surplus points from the practicals and at most 10 points for community work (i.e., fixing slides or reporting issues); however, only the points you already have at the time of the exam count.
Exam Questions
Lecture 1 Questions

Define the prediction function of a linear regression model and write down the $L_2$-regularized mean squared error loss. [5]

Starting from unregularized sum of squares error of a linear regression model, show how the explicit solution can be obtained, assuming $\boldsymbol X^T \boldsymbol X$ is regular. [10]
Lecture 2 Questions

Define expectation $\mathbb{E}[f(x)]$ and variance $\operatorname{Var}(f(x))$ of a discrete random variable. Then define the bias of an estimator and show that estimating an expectation using a single sample is unbiased. [5]

Describe gradient descent and compare it to stochastic (i.e., online) gradient descent and minibatch stochastic gradient descent. [5]

Formulate conditions on the sequence of learning rates used in SGD for it to converge to the optimum almost surely. [5]

Write an $L_2$-regularized minibatch SGD algorithm for training a linear regression model, including the explicit formulas of the loss function and its gradient. [5]
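As an illustration of the last question, the sketch below minimizes the minibatch loss $\frac{1}{2|B|}\sum_{i \in B}(\boldsymbol x_i^T \boldsymbol w + b - t_i)^2 + \frac{\lambda}{2}\|\boldsymbol w\|^2$, whose gradient with respect to $\boldsymbol w$ is $\frac{1}{|B|}\sum_{i \in B}(\boldsymbol x_i^T \boldsymbol w + b - t_i)\boldsymbol x_i + \lambda \boldsymbol w$. Leaving the bias unregularized, the 1/2 scaling, the function name and the synthetic data are assumptions of this sketch rather than the lecture's exact formulation.

```python
import numpy as np


def linear_regression_sgd(X, t, l2, learning_rate, epochs, batch_size, generator):
    """Minibatch SGD on 1/(2|B|) * sum_B (x^T w + b - t)^2 + l2/2 * ||w||^2."""
    weights, bias = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        # Visit the training data in a new random order every epoch.
        permutation = generator.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = permutation[start:start + batch_size]
            errors = X[batch] @ weights + bias - t[batch]
            # Gradient of the batch loss; the bias is not regularized.
            weights -= learning_rate * (X[batch].T @ errors / len(batch) + l2 * weights)
            bias -= learning_rate * errors.mean()
    return weights, bias


generator = np.random.RandomState(42)
X = generator.normal(size=(200, 2))
t = X @ np.array([2.0, -1.0]) + 0.5 + generator.normal(scale=0.1, size=200)
weights, bias = linear_regression_sgd(X, t, l2=0.0, learning_rate=0.1, epochs=100, batch_size=10, generator=generator)
print(weights, bias)
```

With l2=0 the estimates should land close to the generating values (2, -1) and 0.5; a positive l2 shrinks the weights towards zero.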
Lecture 3 Questions

Define binary classification, write down the perceptron algorithm and show how a prediction is made for a given example. [5]

Show that the perceptron algorithm is an instance of stochastic gradient descent. Why are the learning rates not needed (i.e., why does the result of the training not depend on the learning rate)? [5]

Define entropy, cross-entropy, Kullback-Leibler divergence, and prove the Gibbs inequality (i.e., that KL divergence is non-negative). [5]

Describe maximum likelihood estimation as minimizing NLL, cross-entropy and KL divergence. [10]

Considering a binary logistic regression model, write down its parameters (including their size) and explain how prediction is performed (including the formula for the sigmoid function). Describe how we can interpret the outputs of the linear part of the model as logits. [5]

Write down an $L_2$-regularized minibatch SGD algorithm for training a binary logistic regression model, including the explicit formulas of the loss function and its gradient. [10]
Lecture 4 Questions

Define mean squared error and show how it can be derived using MLE. [5]

Considering a $K$-class logistic regression model, write down its parameters (including their size) and explain how prediction is performed (including the formula for the softmax function). Describe how we can interpret the outputs of the linear part of the model as logits. [5]

Write down an $L_2$-regularized minibatch SGD algorithm for training a $K$-class logistic regression model, including the explicit formulas of the loss function and its gradient. [10]

Considering a single-layer MLP with $D$ input neurons, $H$ hidden neurons, $K$ output neurons, hidden activation $f$ and output activation $a$, list its parameters (including their size) and write down how the output is computed. [5]

List the definitions of frequently used MLP output layer activations (the ones producing parameters of a Bernoulli distribution and a categorical distribution). Then write down three commonly used hidden layer activations (sigmoid, tanh, ReLU). [5]

Considering a single-layer MLP with $D$ input neurons, a ReLU hidden layer with $H$ units and a softmax output layer with $K$ units, write down the formulas of the gradient of all the MLP parameters (two weight matrices and two bias vectors), assuming input $\boldsymbol x$, target $t$ and negative log-likelihood loss. [10]

Formulate the Universal approximation theorem. [5]
Lecture 5 Questions

Consider the derivation of softmax using the maximum entropy principle, assuming we have a dataset of $N$ examples $(x_i, t_i), x_i \in \mathbb{R}^D, t_i \in \{1, 2, \ldots, K\}$. Formulate the three conditions we impose on the sought $\pi: \mathbb{R}^D \rightarrow \mathbb{R}^K$, and write down the Lagrangian to be maximized. [10]

Define precision (including true positives and others), recall, $F_1$ score and $F_\beta$ score (we stated several formulations for $F_1$ and $F_\beta$ scores; any of them will do). [5]

Explain the difference between micro-averaged and macro-averaged $F_1$ score. [5]

Describe k-nearest neighbors prediction, both for regression and classification. Define the $L_p$ norm and describe uniform, inverse and softmax weighting. [5]
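The precision, recall, $F_\beta$ and micro/macro-averaging questions above can be checked on small examples with a sketch like the following. It uses the formulation $F_\beta = \frac{(1+\beta^2) \cdot P \cdot R}{\beta^2 P + R}$ and the equivalent $F_1 = \frac{2\,TP}{2\,TP + FP + FN}$; the function names and the toy labels are hypothetical, and classes that never occur in either the gold data or the predictions would cause a division by zero here.

```python
import numpy as np


def f_beta(gold, pred, beta=1.0):
    """Precision, recall and F_beta for binary labels, where 1 is the positive class."""
    tp = np.sum((pred == 1) & (gold == 1))  # true positives
    fp = np.sum((pred == 1) & (gold == 0))  # false positives
    fn = np.sum((pred == 0) & (gold == 1))  # false negatives
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return precision, recall, (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


def micro_macro_f1(gold, pred, classes):
    """Micro-averaged F1 pools TP/FP/FN over classes; macro-averaged F1 averages per-class F1."""
    counts = np.array([[np.sum((pred == c) & (gold == c)),
                        np.sum((pred == c) & (gold != c)),
                        np.sum((pred != c) & (gold == c))] for c in range(classes)])
    tp, fp, fn = counts.T
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())
    macro = np.mean(2 * tp / (2 * tp + fp + fn))
    return micro, macro


print(f_beta(np.array([1, 1, 1, 0, 0]), np.array([1, 1, 0, 1, 0])))
print(micro_macro_f1(np.array([0, 0, 1, 2, 2, 2]), np.array([0, 1, 1, 2, 2, 0]), classes=3))
```

In single-label multiclass classification every error is simultaneously one false positive and one false negative, so the micro-averaged $F_1$ equals accuracy (here 4/6), while the macro average weights every class equally regardless of its size.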
Lecture 6 Questions

Define a kernel based on a feature map $\varphi: \mathbb{R}^D \rightarrow \mathbb{R}^F$, and write down the formulas for (1) a polynomial kernel of degree $d$, (2) a polynomial kernel of degree at most $d$, (3) an RBF kernel. [5]

Define a kernel and write down the minibatch SGD training algorithm of the dual formulation of kernel linear regression. Then, describe how predictions for unseen data are made. [10]

Derive the primary formulation of hard-margin SVM (the value to minimize, the constraints to fulfil) as a maximum-margin classifier. [5]

Starting from the primary hard-margin SVM formulation, derive the dual formulation (the Lagrangian L, the required conditions, the KKT conditions). [10]

Considering hard-margin SVM, define what a support vector is, and how predictions are performed for unseen data. [5]
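The kernel definitions from the first Lecture 6 question can be written down directly, and a small check confirms that a kernel equals a dot product of feature maps without constructing them. The sketch uses the parametrization $(\gamma \boldsymbol x^T \boldsymbol z + c)^d$ and $\exp(-\gamma \|\boldsymbol x - \boldsymbol z\|^2)$, where the parameter names gamma and coef0 follow scikit-learn's naming; the slides may scale these kernels differently. The explicit feature map is only an illustration for 2-dimensional inputs and degree at most 2.

```python
import numpy as np


def polynomial_kernel(x, z, degree, gamma=1.0, coef0=0.0):
    """(gamma * x^T z + coef0)^degree; coef0=0 gives degree exactly d, coef0=1 at most d."""
    return (gamma * (x @ z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))


def feature_map_at_most_2(x):
    """Explicit feature map for (x^T z + 1)^2 with 2-dimensional inputs."""
    x1, x2 = x
    s = np.sqrt(2)
    return np.array([1.0, s * x1, s * x2, x1**2, s * x1 * x2, x2**2])


x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
# The kernel equals the dot product of the feature maps without computing them.
print(polynomial_kernel(x, z, degree=2, coef0=1.0), feature_map_at_most_2(x) @ feature_map_at_most_2(z))
```

Here $\boldsymbol x^T \boldsymbol z = 1$, so both printed values are (up to rounding) $(1 + 1)^2 = 4$.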
Lecture 7 Questions

Write down the primary formulation of soft-margin SVM using the slack variables (the value to minimize, the constraints to fulfil). [5]

Starting from the primary soft-margin SVM formulation, derive the dual formulation (the Lagrangian L, the required conditions, the KKT conditions). [10]

Write down the primary formulation of soft-margin SVM using the hinge loss. [5]

Describe the high-level overview of the SMO algorithm (the test whether the KKT conditions hold, how we select the $a_i$ and $a_j$ to update, what the goal of updating the $a_i$ and $a_j$ is, how we detect convergence; but without the update of $a_i$, $a_j$, $b$ themselves). [5]

Describe the part of the SMO algorithm which updates $a_i$ and $a_j$ to maximize the Lagrangian. If you explain how the update is derived (so that if I followed the instructions, I would come up with the update rules), you do not need to write explicit formulas. [10]

Describe the part of the SMO algorithm which updates $b$ to maximize the Lagrangian. If you explain how the update is derived (so that if I followed the instructions, I would come up with two $b$ candidates and a rule for how to utilize them), you do not need to write explicit formulas. [10]

Describe the one-versus-one and one-versus-rest schemes of constructing a $K$-class classifier by combining multiple binary classifiers. [5]
Lecture 8 Questions

Write down how a Nyström approximation of an RBF kernel is constructed. [10]

Describe how a trained Nyström approximation of an RBF kernel is applied to input data. [5]

Explain how the TF-IDF weight of a given term is computed. [5]

Write down how $p(C_k | \boldsymbol x)$ is approximated in a Naive Bayes classifier, and explicitly state the Naive Bayes assumption. [5]

Considering a Gaussian naive Bayes, describe how $p(x_i | C_k)$ is modeled (what distribution and which parameters it has) and how we estimate it during fitting. [5]

Considering a Multinomial naive Bayes, describe how $p(\boldsymbol x | C_k)$ is modeled (what distribution and which parameters it has) and how we estimate it during fitting. [5]

Considering a Bernoulli naive Bayes, describe how $p(x_i | C_k)$ is modeled (what distribution and which parameters it has) and how we estimate it during fitting. [5]
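For the TF-IDF question, the sketch below uses one common variant: TF is the term count divided by the document length and IDF is $\log\frac{N}{\mathrm{df}(t)}$, where $\mathrm{df}(t)$ is the number of documents containing the term. Other variants (raw counts, a +1 in the IDF denominator, L2-normalized vectors) are also widely used, so treat this formulation, the function name and the toy documents as assumptions rather than the lecture's exact definition.

```python
import math
from collections import Counter


def tf_idf(documents):
    """One {term: weight} dict per tokenized document.

    TF = term count / document length, IDF = log(N / number of documents with the term).
    """
    document_frequency = Counter(term for document in documents for term in set(document))
    weights = []
    for document in documents:
        counts = Counter(document)
        weights.append({term: count / len(document) * math.log(len(documents) / document_frequency[term])
                        for term, count in counts.items()})
    return weights


print(tf_idf([["a", "b", "a"], ["a", "c"]]))
```

A term occurring in every document (here "a") gets IDF $\log 1 = 0$, so it receives zero weight no matter how frequent it is.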
Lecture 9 Questions

Prove that independent discrete random variables are uncorrelated. [5]

Write down the definition of covariance and Pearson correlation coefficient $\rho$, including its range. [5]

Explain how the Spearman's rank correlation coefficient and the Kendall rank correlation coefficient are computed (no need to describe the Pearson correlation coefficient). [5]

Considering an averaging ensemble of $M$ models, prove the relation between the average mean squared error of the ensemble and the average error of the individual models, assuming the model errors have zero mean and are uncorrelated. [10]

In a regression decision tree, state what values are kept in internal nodes, define the squared error criterion and describe how a leaf is split during training (without discussing splitting constraints). [5]

In a $K$-class classification decision tree, state what values are kept in internal nodes, define the Gini index and describe how a node is split during training (without discussing splitting constraints). [5]

In a $K$-class classification decision tree, state what values are kept in internal nodes, define the entropy criterion and describe how a node is split during training (without discussing splitting constraints). [5]

For binary classification, derive the Gini index from a squared error loss. [10]

For $K$-class classification, derive the entropy criterion from a non-averaged NLL loss. [10]

Describe how a random forest is trained (including bagging and a random subset of features) and how prediction is performed for regression and classification. [10]
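For the rank correlation question, a short sketch: Spearman's $\rho$ is the Pearson correlation of the ranks, and Kendall's $\tau$ (the tau-a variant shown here) is the number of concordant pairs minus the number of discordant pairs, divided by the number of all pairs. The sketch assumes there are no ties; handling ties (averaged ranks, the tau-b variant) is omitted, and the function names are hypothetical.

```python
import numpy as np


def ranks(values):
    """Ranks 1..n of the values; assumes no ties (ties would need averaged ranks)."""
    ranked = np.empty(len(values))
    ranked[np.argsort(values)] = np.arange(1, len(values) + 1)
    return ranked


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return np.corrcoef(ranks(x), ranks(y))[0, 1]


def kendall(x, y):
    """Kendall's tau (tau-a, no ties): (concordant - discordant pairs) / all pairs."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    total = sum(np.sign(x[i] - x[j]) * np.sign(y[i] - y[j]) for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


print(spearman([1, 2, 3], [1, 3, 2]), kendall([1, 2, 3], [1, 3, 2]))
```

For $x = (1, 2, 3)$ and $y = (1, 3, 2)$, two of the three pairs are concordant and one is discordant, so $\tau = 1/3$, while $\rho = 1 - \frac{6 \cdot 2}{3 (3^2 - 1)} = 0.5$. Both coefficients equal 1 for any strictly increasing relationship, even a non-linear one.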
Lecture 10 Questions

Write down the loss function which we optimize in gradient boosting decision tree during the construction of $t^\mathrm{th}$ tree. Then define $g_i$ and $h_i$ and show the value $w_\mathcal{T}$ of optimal prediction in node $\mathcal{T}$. [10]

Write down the loss function which we optimize in gradient boosting decision tree during the construction of $t^\mathrm{th}$ tree. Then define $g_i$ and $h_i$ and the criterion used during node splitting. [10]

How is the learning rate used during training and prediction of a gradient boosting decision tree? [5]

For a $K$-class classification, describe how to perform prediction with a gradient boosting decision tree trained for $T$ time steps (how the individual trees perform prediction and how the $K \cdot T$ trees are combined to produce the predicted categorical distribution). [5]

Considering a $K$-class classification, describe which individual trees (and in which order) are created during gradient boosted decision tree training, and what per-example loss is used for training every one of them (expressed using predictions of the already trained trees). You do not need to describe the training process of the individual trees themselves. [10]
Lecture 11 Questions

When deriving the first principal component, write the value of the variance we aim to maximize, both without and with the covariance matrix (and define the covariance matrix). [5]

When deriving the first $M$ principal components, write the value of the reconstruction loss we aim to minimize using all but the first $M$ principal components, both without and with the covariance matrix (and define the covariance matrix). [10]

Write down the formula for whitening (sphering) the data matrix $\boldsymbol X$, and state what mean and covariance the result has. [5]

Explain how to compute the first $M$ principal components using the SVD decomposition of the centered data matrix $\boldsymbol X$. [5]

Write down the algorithm of computing the first $M$ principal components of the data matrix $\boldsymbol X$ using the power iteration algorithm. [10]

Describe the K-means algorithm, including the kmeans++ initialization. [10]

Define the multivariate Gaussian distribution of dimension $D$. [5]

Show how to sample from a multivariate Gaussian distribution $\mathcal{N}(\boldsymbol \mu, \boldsymbol \Sigma)$ with a full covariance matrix, by using random samples from $\mathcal{N}(0, \boldsymbol I)$ distribution. [5]

Describe the constant surfaces of a multivariate Gaussian distribution with (1) $\sigma^2 \boldsymbol I$ covariance, (2) a diagonal covariance matrix, (3) a full covariance matrix. [5]
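The two ways of computing principal components from the Lecture 11 questions can be compared directly: power iteration repeatedly multiplies a vector by the covariance matrix and normalizes it, then deflates the covariance by the found component, while the SVD of the centered data matrix yields all components at once (the right singular vectors, with variances $s_i^2 / N$). The sketch below is a hedged illustration: the covariance is normalized by $N$, the iteration count is fixed instead of using a convergence test, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def pca_power_iteration(X, components, iterations, generator):
    """First principal components (as rows) and their variances, via power iteration."""
    centered = X - X.mean(axis=0)
    cov = centered.T @ centered / len(X)
    vectors, variances = [], []
    for _ in range(components):
        v = generator.normal(size=X.shape[1])
        for _ in range(iterations):
            v = cov @ v
            v /= np.linalg.norm(v)
        variance = v @ cov @ v  # Rayleigh quotient, i.e., the eigenvalue for a converged v
        vectors.append(v)
        variances.append(variance)
        # Deflation: remove the found component so the next run finds the next one.
        cov = cov - variance * np.outer(v, v)
    return np.array(vectors), np.array(variances)


def pca_svd(X, components):
    """The same components from the SVD of the centered data matrix."""
    centered = X - X.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:components], singular_values[:components] ** 2 / len(X)


generator = np.random.RandomState(42)
X = generator.normal(size=(500, 3)) * np.array([3.0, 2.0, 0.5])
power_vectors, power_variances = pca_power_iteration(X, 2, 100, generator)
svd_vectors, svd_variances = pca_svd(X, 2)
print(power_variances, svd_variances)
```

Both methods should agree up to the sign of each component; power iteration converges quickly only when consecutive eigenvalues are well separated.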
Lecture 12 Questions

Considering a Gaussian mixture with $K$ clusters, explain how we represent the individual clusters and write down the likelihood of an example $\boldsymbol x$ for a given Gaussian mixture. [5]

Write down the log-likelihood of an $N$-element dataset for a given Gaussian mixture model with $K$ components. [5]

Considering the algorithm for Gaussian mixture clustering, write down the E step (how to compute the responsibilities) and the M step (how to update the means, covariances and priors of the individual clusters). [10]

Write down the MSE loss of a regression problem, and formulate the bias-variance tradeoff, i.e., the decomposition of expected MSE loss (with respect to a randomly sampled test set) into bias, variance and irreducible error terms. [10]
Lecture 13 Questions

Considering statistical hypothesis testing, define type I errors and type II errors (in terms of the null hypothesis). Finally, define what a significance level is. [5]

Explain what a test statistic and a p-value are. [5]

Write down the steps of a statistical hypothesis test. [10]

Explain the differences between a one-sample test, a two-sample test and a paired test. [5]

When considering the multiple comparison problem, define the family-wise error rate, and formulate the Bonferroni correction, which allows limiting the family-wise error rate by a given $\alpha$. [5]

For a trained model and a given test set with $N$ examples and metric $E$, write how to estimate 95% confidence intervals using bootstrap resampling. [5]

For two trained models and a given test set with $N$ examples and metric $E$, explain how to perform a paired bootstrap test that the first model is better than the other with a significance level $\alpha$. [10]
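The Bonferroni correction mentioned in the multiple-comparison question amounts to testing each of the $m$ hypotheses at level $\alpha / m$. By the union bound, the probability of at least one false rejection is then at most $m \cdot \alpha / m = \alpha$. A minimal sketch with hypothetical p-values:

```python
def bonferroni(p_values, alpha=0.05):
    """Reject the i-th null hypothesis iff p_i <= alpha / m.

    By the union bound, P(any false rejection) <= m * alpha / m = alpha.
    """
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


print(bonferroni([0.01, 0.02, 0.04]))
```

With three hypotheses and $\alpha = 0.05$ the per-test threshold is about 0.0167, so only the first hypothesis is rejected, although each of the three p-values alone would be significant at 0.05.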