Be aware that this is an archived page from former years. You can visit the current version instead.

Machine Learning for Greenhorns – Winter 2022/23

Machine learning is reaching notable success when solving complex tasks in many fields. This course serves as an introduction to basic machine learning concepts and techniques, focusing both on the theoretical foundation and on the implementation and use of machine learning algorithms in the Python programming language. Considerable attention is paid to applying machine learning techniques to practical tasks, in which students try to devise the best-performing solution.

Python programming skills are required, together with basic probability theory knowledge.

About

Official name: Introduction to Machine Learning with Python
SIS code: NPFL129
Semester: winter
E-credits: 5
Examination: 2/2 C+Ex
Guarantor: Milan Straka

Timespace Coordinates

lecture: English lecture is held on Monday 12:20 in S9, Czech lecture on Tuesday 9:00 in N1; first lecture is on Oct 03
practicals: there are two parallel practicals, an English one Monday 14:00 in S9 and a Czech one on Wednesday 9:00 in S5; first practicals are on Oct 03

All lectures and practicals will be recorded and available on this website.

Lectures

1. Introduction to Machine Learning Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions linear_regression_manual linear_regression_features

2. Linear Regression II, SGD Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions linear_regression_l2 linear_regression_sgd feature_engineering rental_competition

3. Perceptron and Logistic Regression Slides PDF Slides CZ Lecture CZ Logistic Regression CZ Practicals EN Lecture EN Logit EN Practicals Questions perceptron logistic_regression_sgd grid_search thyroid_competition

4. Multiclass Logistic Regression, Multilayer Perceptron Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions softmax_classification_sgd mlp_classification_sgd mnist_competition

5. Derivation of Softmax, F1, k-NN Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN kNN EN Practicals Questions multilabel_classification_sgd k_nearest_neighbors diacritization

6. Kernel Methods, SVM Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions kernel_linear_regression diacritization_dictionary

7. Soft-margin SVM, SMO Slides PDF Slides CZ Lecture CZ SMO Computations CZ Practicals EN Lecture EN SMO Computations EN Practicals Questions smo_algorithm svm_multiclass

8. TF-IDF, Naive Bayes Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions tf_idf naive_bayes isnt_it_ironic

9. Correlation, Model Combination, Decision Trees, Random Forests Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions metric_correlation decision_tree random_forest

10. Gradient Boosting Decision Trees Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions gradient_boosting human_activity_recognition

11. PCA, K-Means Slides PDF Slides CZ Lecture CZ K-means CZ Practicals EN Lecture EN K-means EN Practicals Questions pca kmeans nli_competition

12. Gaussian Mixture, EM Algorithm, Bias-Variance Trade-off Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions gaussian_mixture

13. Statistical Hypothesis Testing, Model Comparison Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions bootstrap_resampling permutation_test

License

Unless otherwise stated, teaching materials for this course are available under CC BY-SA 4.0.

The lecture content is listed below, including references to some additional study materials. The main study material is Pattern Recognition and Machine Learning by Christopher Bishop, referred to as PRML.

Note that the topics in italics are not required for the exam.

1. Introduction to Machine Learning

Oct 03 Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions linear_regression_manual linear_regression_features

Introduction to machine learning
Basic definitions [Sections 1 and 1.1 of PRML]
Linear regression model [Section 3.1 of PRML]

2. Linear Regression II, SGD

Oct 10 Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions linear_regression_l2 linear_regression_sgd feature_engineering rental_competition

L2 regularization in linear regression [Section 1.1, 3.1.4 of PRML]
Solving linear regression with SVD
Random variables and probability distributions [Section 1.2, 1.2.1 of PRML]
Expectation and variance [Section 1.2.2 of PRML]
Gradient descent [Section 5.2.4 of PRML]
- Stochastic gradient descent solution of linear regression [slides]

3. Perceptron and Logistic Regression

Oct 17 Slides PDF Slides CZ Lecture CZ Logistic Regression CZ Practicals EN Lecture EN Logit EN Practicals Questions perceptron logistic_regression_sgd grid_search thyroid_competition

Cross-validation [Section 1.3 of PRML]
Linear models for classification [Section 4.1.1 of PRML]
Perceptron algorithm [Section 4.1.7 of PRML]
Probability distributions [Bernoulli Section 2.1, Categorical Section 2.2, Gaussian Section 2.3 of PRML]
Information theory [Section 1.6 of PRML]
Maximum likelihood estimation [Section 1.2.5 of PRML]
Logistic regression [Section 4.3.2 of PRML]

4. Multiclass Logistic Regression, Multilayer Perceptron

Oct 24 Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions softmax_classification_sgd mlp_classification_sgd mnist_competition

Generalized linear models
MSE as MLE [Section 3.1.1 of PRML]
Multiclass logistic regression [Section 4.3.4 of PRML]
Poisson regression
Multilayer perceptron (neural network) [Sections 5-5.3 of PRML]
Universal approximation theorem

5. Derivation of Softmax, F1, k-NN

Oct 31 Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN kNN EN Practicals Questions multilabel_classification_sgd k_nearest_neighbors diacritization

Lagrange multipliers [Appendix E of PRML]
Calculus of variations [Appendix D of PRML]
Normal distribution via the maximum entropy principle [2 pages before Section 1.6.1 of PRML]
Derivation of softmax via the maximum entropy principle [The equivalence of logistic regression and maximum entropy models writeup]
$F_1$ score and $F_β$ score
K-nearest neighbors [Section 2.5.2 of PRML]

6. Kernel Methods, SVM

Nov 07 Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions kernel_linear_regression diacritization_dictionary

Kernels [Sections 4.3.1-4.3.3 of Introduction to Machine Learning]
Kernel linear regression [Sections 1.2, 1.3 of CS229 Lecture notes, part V]
Karush-Kuhn-Tucker Conditions [Appendix E of PRML, Section 6 of CS229 Lecture notes, part V]
Hard-margin SVM [Section 7.1 of PRML, Section 7 of CS229 Lecture notes, part V]

7. Soft-margin SVM, SMO

Nov 14 Slides PDF Slides CZ Lecture CZ SMO Computations CZ Practicals EN Lecture EN SMO Computations EN Practicals Questions smo_algorithm svm_multiclass

Soft-margin SVM [Section 7.1.1 of PRML, Section 8 of CS229 Lecture notes, part V]
Sequential minimal optimization algorithm [Section 9 of CS229 Lecture notes, part V, CS229 Simplified SMO Algorithm]
One-versus-one and one-versus-rest schemes [Section 4.1.2 of PRML]
Support Vector Machine for regression [Section 7.1.4 of PRML]

8. TF-IDF, Naive Bayes

Nov 21 Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions tf_idf naive_bayes isnt_it_ironic

TF-IDF
Naive Bayes classifier [Basic idea in Section 8.2.2 of PRML]
- On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes

9. Correlation, Model Combination, Decision Trees, Random Forests

Nov 28 Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions metric_correlation decision_tree random_forest

Covariance and correlation
Model ensembling [Section 14.2 of PRML]
Decision trees [Section 14.4 of PRML]
Random forests

10. Gradient Boosting Decision Trees

Dec 05 Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions gradient_boosting human_activity_recognition

Gradient boosting decision trees [Paper XGBoost: A Scalable Tree Boosting System]

11. PCA, K-Means

Dec 12 Slides PDF Slides CZ Lecture CZ K-means CZ Practicals EN Lecture EN K-means EN Practicals Questions pca kmeans nli_competition

Principal component analysis [Sections 12.1 and 12.4.2 of PRML]
Power iteration algorithm
K-Means clustering [Section 9.1 of PRML]

12. Gaussian Mixture, EM Algorithm, Bias-Variance Trade-off

Dec 19 Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions gaussian_mixture

Multivariate Gaussian [Section 2.3 of PRML]
Gaussian mixture clustering [Section 9.2 of PRML]
EM algorithm [Sections 9.3 and 9.4 of PRML]
Bias-variance tradeoff [Section 3.2 of PRML]
Double descent [Paper Reconciling modern machine learning practice and the bias-variance trade-off]
Deep double descent [Paper Deep Double Descent: Where Bigger Models and More Data Hurt]

13. Statistical Hypothesis Testing, Model Comparison

Jan 02 Slides PDF Slides CZ Lecture CZ Practicals EN Lecture EN Practicals Questions bootstrap_resampling permutation_test

Statistical hypothesis testing
Bootstrap resampling
Paired bootstrap test
Random permutation test

Requirements

To pass the practicals, you need to obtain at least 80 points, excluding the bonus points. Note that up to 40 points above 80 (both bonus and non-bonus) will be transferred to the exam. In total, assignments for at least 120 points (not including the bonus points) will be available.

Environment

The tasks are evaluated automatically using the ReCodEx Code Examiner.

The evaluation is performed using Python 3.9, scikit-learn 1.1.2, numpy 1.23.3, scipy 1.9.1, pandas 1.5.0, and matplotlib 3.6.0. You should install the exact versions of these packages yourselves.

Teamwork

Solving assignments in teams (of size at most 3) is encouraged, but everyone has to participate (it is forbidden not to work on an assignment and then submit a solution created by other team members). All members of the team must submit in ReCodEx individually, but can have exactly the same sources/models/results. Each such solution must explicitly list all members of the team to allow plagiarism detection using this template.

No Cheating

Cheating is strictly prohibited and any student found cheating will be punished. The punishment can involve failing the whole course, or, in grave cases, being expelled from the faculty. While discussing assignments with any classmate is fine, each team must complete the assignments themselves, without using code they did not write (unless explicitly allowed). Of course, inside a team you are expected to share code and submit identical solutions.

linear_regression_manual

Deadline: Oct 17, 7:59 a.m. 3 points

Starting with the linear_regression_manual.py template, solve a linear regression problem using the algorithm from the lecture which explicitly computes the matrix inversion. Then compute root mean square error on the test set.

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 linear_regression_manual.py --test_size=0.1

52.38

python3 linear_regression_manual.py --test_size=0.5

54.58

python3 linear_regression_manual.py --test_size=0.9

59.46
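
The approach described above can be sketched as follows. This is only a minimal illustration on synthetic data, not the solution of the template: the data, the helper names `fit_linear_regression` and `rmse`, and the convention of appending the bias as the last weight are all assumptions made for this example.

```python
import numpy as np

def fit_linear_regression(X, t):
    """Solve linear regression explicitly via the normal equations.

    A column of ones is appended so that the last weight acts as the bias
    (an assumption of this sketch).
    """
    X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)
    # w = (X^T X)^{-1} X^T t, computing the matrix inverse explicitly.
    return np.linalg.inv(X.T @ X) @ X.T @ t

def rmse(X, t, w):
    """Root mean square error of the predictions X @ w against targets t."""
    X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)
    return np.sqrt(np.mean((X @ w - t) ** 2))

# Synthetic data (hypothetical): a noisy linear function of three features.
rng = np.random.RandomState(42)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
t = X @ true_w + 3.0 + rng.normal(scale=0.1, size=200)

# Simple train/test split.
train_X, test_X = X[:150], X[150:]
train_t, test_t = t[:150], t[150:]

w = fit_linear_regression(train_X, train_t)
test_rmse = rmse(test_X, test_t, w)
print("Test RMSE: {:.2f}".format(test_rmse))
```

In practice, `np.linalg.solve` or `np.linalg.lstsq` is usually numerically preferable to an explicit inverse; the inverse is used here only because the assignment asks for it.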

linear_regression_features

Deadline: Oct 17, 7:59 a.m. 3 points

Starting with the linear_regression_features.py template, use scikit-learn to train a model of a 1D curve.

Try using a concatenation of features $x^1, x^2, …, x^D$ for $D$ from 1 to a given range, and report RMSE of every such configuration.

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 linear_regression_features.py --data_size=10 --test_size=5 --range=6

Maximum feature order 1: 0.74 RMSE
Maximum feature order 2: 1.87 RMSE
Maximum feature order 3: 0.53 RMSE
Maximum feature order 4: 4.52 RMSE
Maximum feature order 5: 1.70 RMSE
Maximum feature order 6: 2.82 RMSE

python3 linear_regression_features.py --data_size=30 --test_size=20 --range=9

Maximum feature order 1: 0.56 RMSE
Maximum feature order 2: 1.53 RMSE
Maximum feature order 3: 1.10 RMSE
Maximum feature order 4: 0.28 RMSE
Maximum feature order 5: 1.60 RMSE
Maximum feature order 6: 3.09 RMSE
Maximum feature order 7: 3.92 RMSE
Maximum feature order 8: 65.11 RMSE
Maximum feature order 9: 3886.97 RMSE

python3 linear_regression_features.py --data_size=50 --test_size=40 --range=9

Maximum feature order 1: 0.63 RMSE
Maximum feature order 2: 0.73 RMSE
Maximum feature order 3: 0.31 RMSE
Maximum feature order 4: 0.26 RMSE
Maximum feature order 5: 1.22 RMSE
Maximum feature order 6: 0.69 RMSE
Maximum feature order 7: 2.39 RMSE
Maximum feature order 8: 7.28 RMSE
Maximum feature order 9: 201.70 RMSE
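
A possible sketch of the feature concatenation follows. It assumes a synthetic noisy sine curve and a random train/test split, neither of which is taken from the template, so the printed values will not match the outputs above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic 1D curve (hypothetical data): a noisy sine.
rng = np.random.RandomState(0)
xs = np.linspace(0, 7, num=40)
ys = np.sin(xs) + rng.normal(scale=0.2, size=xs.shape)

# Random split into 30 training and 10 test points.
indices = rng.permutation(len(xs))
train, test = indices[:30], indices[30:]

rmses = []
for order in range(1, 6):
    # Concatenate the features x^1, x^2, ..., x^order as columns.
    features = np.stack([xs ** d for d in range(1, order + 1)], axis=1)

    model = LinearRegression().fit(features[train], ys[train])
    predictions = model.predict(features[test])

    rmse = np.sqrt(mean_squared_error(ys[test], predictions))
    rmses.append(rmse)
    print("Maximum feature order {}: {:.2f} RMSE".format(order, rmse))
```

As in the outputs above, the test RMSE need not decrease monotonically with the feature order; high orders on small data sets tend to overfit.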

linear_regression_l2

Deadline: Oct 24, 7:59 a.m. 2 points

Starting with the linear_regression_l2.py template, use scikit-learn to train L2-regularized linear regression models and print the results of the best of them.

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 linear_regression_l2.py --test_size=0.15

0.49 52.11

python3 linear_regression_l2.py --test_size=0.80

0.10 53.53
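
One way to select the regularization strength is sketched below. It assumes that the two printed numbers are the best regularization strength and the corresponding test RMSE; the synthetic data and the grid of strengths are hypothetical and not taken from the template.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (hypothetical): many features, only a few informative,
# so that L2 regularization can help.
rng = np.random.RandomState(13)
X = rng.normal(size=(60, 40))
t = X[:, :5] @ np.array([3.0, -2.0, 1.0, 0.5, -1.5]) + rng.normal(scale=1.0, size=60)

train_X, test_X, train_t, test_t = train_test_split(
    X, t, test_size=0.5, random_state=13)

# Hypothetical grid of regularization strengths, geometrically spaced.
lambdas = np.geomspace(0.01, 10, num=31)

rmses = []
for l2 in lambdas:
    # Ridge is the L2-regularized linear regression in scikit-learn.
    model = Ridge(alpha=l2).fit(train_X, train_t)
    predictions = model.predict(test_X)
    rmses.append(np.sqrt(mean_squared_error(test_t, predictions)))

best = int(np.argmin(rmses))
best_lambda, best_rmse = lambdas[best], rmses[best]
print("{:.2f} {:.2f}".format(best_lambda, best_rmse))
```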

linear_regression_sgd

Deadline: Oct 24, 7:59 a.m. 4 points

Starting with the linear_regression_sgd.py template, implement minibatch SGD for linear regression and compare the results to an explicit linear regression solver.

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 linear_regression_sgd.py --batch_size=10 --epochs=50 --learning_rate=0.01

Test RMSE: SGD 90.96, explicit 91.38
Learned weights: 3.94 7.52 0.08 30.82 -1.72 -1.13 -1.98 6.29 1.98 -10.60 -13.84 -4.31 ...

python3 linear_regression_sgd.py --batch_size=10 --epochs=50 --learning_rate=0.1

Test RMSE: SGD 90.73, explicit 91.38
Learned weights: 1.94 8.31 1.22 33.18 -3.74 -3.64 -2.46 5.19 1.72 -12.40 -14.08 -2.28 ...

python3 linear_regression_sgd.py --batch_size=10 --epochs=50 --learning_rate=0.001

Test RMSE: SGD 108.66, explicit 91.38
Learned weights: 2.79 2.19 -0.06 14.16 -1.07 0.97 0.78 4.62 0.79 -4.62 -7.37 -3.07 ...

python3 linear_regression_sgd.py --batch_size=1 --epochs=50 --learning_rate=0.01

Test RMSE: SGD 90.73, explicit 91.38
Learned weights: 1.94 8.31 1.22 33.18 -3.74 -3.64 -2.46 5.19 1.72 -12.40 -14.08 -2.28 ...

python3 linear_regression_sgd.py --batch_size=50 --epochs=50 --learning_rate=0.01

Test RMSE: SGD 99.74, explicit 91.38
Learned weights: 3.99 3.67 -0.20 20.79 -1.29 1.37 0.47 6.54 1.54 -6.95 -10.65 -4.50 ...

python3 linear_regression_sgd.py --batch_size=50 --epochs=500 --learning_rate=0.01

Test RMSE: SGD 90.67, explicit 91.38
Learned weights: 3.20 8.00 0.57 32.21 -2.49 -2.45 -2.28 5.57 1.69 -11.47 -14.00 -3.44 ...

python3 linear_regression_sgd.py --batch_size=50 --epochs=500 --learning_rate=0.01 --l2=0.1

Test RMSE: SGD 90.71, explicit 91.38
Learned weights: 3.40 7.36 0.32 30.21 -2.14 -1.68 -1.88 5.79 1.72 -10.84 -13.45 -3.73 ...
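
The minibatch SGD update for linear regression might look roughly like the following sketch. The function name, the weight initialization, the synthetic data, and the choice not to regularize the bias are assumptions of this example and may differ from what the template prescribes.

```python
import numpy as np

def sgd_linear_regression(X, t, batch_size, epochs, learning_rate, l2=0.0, seed=42):
    """Minibatch SGD for linear regression with optional L2 regularization."""
    rng = np.random.RandomState(seed)
    # Append a column of ones; the last weight then acts as the bias.
    X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)
    weights = rng.uniform(-0.1, 0.1, size=X.shape[1])

    for _ in range(epochs):
        # Process the training data in a fresh random order every epoch.
        permutation = rng.permutation(X.shape[0])
        for start in range(0, len(permutation), batch_size):
            batch = permutation[start:start + batch_size]
            # Gradient of 1/2 * mean squared error over the minibatch.
            errors = X[batch] @ weights - t[batch]
            gradient = X[batch].T @ errors / len(batch)
            # L2 penalty on all weights except the bias (an assumption).
            regularization = l2 * weights
            regularization[-1] = 0.0
            weights -= learning_rate * (gradient + regularization)
    return weights

# Synthetic data (hypothetical).
rng = np.random.RandomState(1)
X = rng.normal(size=(100, 4))
t = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.7 + rng.normal(scale=0.1, size=100)

w_sgd = sgd_linear_regression(X, t, batch_size=10, epochs=200, learning_rate=0.05)

# Explicit least-squares solution for comparison.
X_bias = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)
w_explicit = np.linalg.lstsq(X_bias, t, rcond=None)[0]

print("SGD weights:     ", " ".join("{:.2f}".format(w) for w in w_sgd))
print("Explicit weights:", " ".join("{:.2f}".format(w) for w in w_explicit))
```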

feature_engineering

Deadline: Oct 24, 7:59 a.m. 3 points

Starting with the feature_engineering.py template, learn how to perform basic feature engineering using scikit-learn.

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 feature_engineering.py --dataset=diabetes

-0.5745 -0.9514 1.797 -0.4984 0.4751 0.9487 -0.6961 0.7574 0.06019 1.625 0.33 0.5465 -1.033 0.2863 -0.2729 -0.545 0.3999 -0.4351 -0.03458 -0.9334 0.9052 -1.71 0.4742 -0.452 -0.9026 0.6623 -0.7206 -0.05727 -1.546 3.23 -0.8959 0.8539 1.705 -1.251 1.361 0.1082 2.92 0.2484 -0.2368 -0.4729 0.347 -0.3775 -0.03 -0.8099 0.2257 0.4507 -0.3307 0.3598 0.0286 0.7719 0.9 -0.6604 0.7185 0.0571 1.541 0.4845 -0.5272 -0.0419 -1.131 0.5736 0.04559 1.231 0.003623 0.0978 2.64
0.2776 -0.9514 0.08366 -1.148 -1.592 -1.397 -0.4687 -0.7816 -0.3766 -1.973 0.07706 -0.2641 0.02322 -0.3186 -0.442 -0.3878 -0.1301 -0.217 -0.1045 -0.5477 0.9052 -0.0796 1.092 1.515 1.329 0.4459 0.7436 0.3583 1.877 0.007 -0.09602 -0.1332 -0.1169 -0.03921 -0.06539 -0.03151 -0.1651 1.317 1.827 1.603 0.5379 0.8971 0.4322 2.264 2.535 2.224 0.7462 1.245 0.5996 3.141 1.952 0.6548 1.092 0.5261 2.757 0.2197 0.3663 0.1765 0.9247 0.6109 0.2944 1.542 0.1418 0.7431 3.893
0.8198 1.051 -0.683 -0.8108 -0.6896 -0.4871 -0.2413 -0.03186 -0.2682 0.04527 0.6721 0.8617 -0.5599 -0.6647 -0.5653 -0.3993 -0.1978 -0.02612 -0.2199 0.03711 1.105 -0.7179 -0.8522 -0.7248 -0.512 -0.2536 -0.03348 -0.2819 0.04758 0.4665 0.5538 0.471 0.3327 0.1648 0.02176 0.1832 -0.03092 0.6574 0.5591 0.3949 0.1956 0.02583 0.2175 -0.0367 0.4755 0.3359 0.1664 0.02197 0.185 -0.03121 0.2373 0.1175 0.01552 0.1306 -0.02205 0.05822 0.007686 0.06472 -0.01092 0.001015 0.008544 -0.001442 0.07194 -0.01214 0.002049
0.9747 1.051 1.211 0.6803 0.6207 -0.9859 -1.151 1.547 2.783 2.853 0.9501 1.025 1.18 0.6631 0.605 -0.961 -1.122 1.508 2.712 2.781 1.105 1.273 0.715 0.6524 -1.036 -1.21 1.626 2.925 2.999 1.467 0.8239 0.7517 -1.194 -1.394 1.873 3.37 3.456 0.4628 0.4222 -0.6707 -0.7829 1.052 1.893 1.941 0.3852 -0.6119 -0.7143 0.96 1.727 1.771 0.972 1.135 -1.525 -2.743 -2.813 1.325 -1.78 -3.203 -3.284 2.392 4.304 4.413 7.743 7.94 8.142
-0.1872 -0.9514 0.1739 -1.171 -0.5149 -0.8915 0.5925 -0.8211 0.3554 -0.1302 0.03503 0.1781 -0.03254 0.2193 0.09637 0.1669 -0.1109 0.1537 -0.06651 0.02438 0.9052 -0.1654 1.115 0.4899 0.8482 -0.5637 0.7812 -0.3381 0.1239 0.03023 -0.2037 -0.08952 -0.155 0.103 -0.1428 0.06178 -0.02264 1.372 0.6032 1.044 -0.6941 0.9619 -0.4163 0.1526 0.2651 0.459 -0.3051 0.4228 -0.183 0.06706 0.7948 -0.5282 0.732 -0.3168 0.1161 0.3511 -0.4865 0.2106 -0.07717 0.6742 -0.2918 0.1069 0.1263 -0.04628 0.01696
0.9747 -0.9514 -0.1869 -0.3058 2.659 2.728 0.3651 0.7574 0.676 -0.1302 0.9501 -0.9274 -0.1822 -0.2981 2.592 2.659 0.3559 0.7382 0.6589 -0.1269 0.9052 0.1778 0.291 -2.53 -2.596 -0.3474 -0.7206 -0.6431 0.1239 0.03494 0.05717 -0.497 -0.51 -0.06824 -0.1416 -0.1264 0.02434 0.09354 -0.8132 -0.8344 -0.1117 -0.2316 -0.2067 0.03983 7.07 7.254 0.9708 2.014 1.797 -0.3463 7.443 0.9961 2.066 1.844 -0.3553 0.1333 0.2765 0.2468 -0.04755 0.5736 0.512 -0.09864 0.457 -0.08804 0.01696
1.982 -0.9514 0.715 0.4877 -0.5149 -0.3253 -0.01389 -0.8211 -0.4683 -0.4813 3.927 -1.885 1.417 0.9664 -1.02 -0.6447 -0.02753 -1.627 -0.928 -0.9537 0.9052 -0.6803 -0.464 0.4899 0.3095 0.01322 0.7812 0.4455 0.4579 0.5113 0.3487 -0.3682 -0.2326 -0.009932 -0.5871 -0.3348 -0.3441 0.2378 -0.2511 -0.1586 -0.006774 -0.4004 -0.2284 -0.2347 0.2651 0.1675 0.007152 0.4228 0.2411 0.2478 0.1058 0.004519 0.2671 0.1523 0.1566 0.000193 0.01141 0.006505 0.006685 0.6742 0.3845 0.3952 0.2193 0.2254 0.2316
1.362 1.051 -0.1418 -0.2337 2.193 1.084 1.123 -0.03186 1.76 -0.3935 1.855 1.432 -0.1932 -0.3183 2.987 1.476 1.53 -0.04339 2.397 -0.536 1.105 -0.1491 -0.2456 2.305 1.139 1.18 -0.03348 1.85 -0.4136 0.02011 0.03314 -0.311 -0.1537 -0.1593 0.004518 -0.2496 0.05581 0.05462 -0.5125 -0.2532 -0.2625 0.007445 -0.4113 0.09196 4.809 2.376 2.463 -0.06986 3.86 -0.8629 1.174 1.217 -0.03452 1.907 -0.4264 1.261 -0.03578 1.977 -0.4419 0.001015 -0.05607 0.01253 3.098 -0.6926 0.1548
2.059 -0.9514 1.031 1.69 1.174 0.8206 -1.606 3.046 2.055 1.274 4.24 -1.959 2.122 3.48 2.417 1.69 -3.306 6.273 4.231 2.623 0.9052 -0.9806 -1.608 -1.117 -0.7807 1.528 -2.898 -1.955 -1.212 1.062 1.742 1.21 0.8458 -1.655 3.14 2.118 1.313 2.857 1.984 1.387 -2.714 5.149 3.473 2.153 1.378 0.9633 -1.885 3.576 2.412 1.495 0.6734 -1.318 2.5 1.686 1.045 2.578 -4.891 -3.3 -2.045 9.279 6.26 3.88 4.223 2.618 1.623
0.2776 1.051 -0.48 -0.0173 0.8245 1.178 -0.1655 0.7574 -0.1065 -0.218 0.07706 0.2918 -0.1333 -0.004801 0.2289 0.327 -0.04594 0.2102 -0.02956 -0.06051 1.105 -0.5046 -0.01818 0.8666 1.238 -0.1739 0.7961 -0.1119 -0.2291 0.2304 0.008303 -0.3958 -0.5654 0.07944 -0.3636 0.05111 0.1046 0.0002992 -0.01426 -0.02037 0.002862 -0.0131 0.001842 0.00377 0.6798 0.9711 -0.1364 0.6245 -0.08779 -0.1797 1.387 -0.1949 0.8921 -0.1254 -0.2568 0.02739 -0.1253 0.01762 0.03608 0.5736 -0.08064 -0.1651 0.01134 0.02321 0.04752

python3 feature_engineering.py --dataset=linnerud

1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...

python3 feature_engineering.py --dataset=wine

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.2976 1.31 0.1177 0.9155 0.7783 0.5271 0.4232 0.6668 -1.122 1.048 0.7913 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -1.282 -1.41 0.8239 -0.4697 -0.1538 0.2001 -1.152 1.377 -0.9146 -0.6609 0.7232 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.442 1.419 0.5492 -1.8 1.03 0.9984 -1.302 0.8977 -0.02794 -0.2336 1.336 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.4121 -0.9989 -1.844 -0.3312 -1.263 -0.6176 -0.6272 -0.3989 -1.174 0.4073 0.301 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.9155 -0.7526 -0.2354 0.8601 -0.7751 -0.3001 0.4232 -0.02594 -1.174 1.646 -0.3936 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.778 0.6071 0.7454 -1.245 0.586 0.9888 -1.527 0.1517 -0.02794 0.06549 1.105 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.43 1.465 0.2746 -0.2204 0.8079 0.6233 -0.5521 -0.5766 0.03261 -0.3191 1.064 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.03446 0.3424 1.295 0.3614 -1.13 -1.445 1.173 -1.465 -0.2442 -0.7464 -0.3255 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0.8809 -0.8529 1.295 0.777 1.03 1.2 -0.6272 1.431 0.2316 1.048 0.2193 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.6752 -1.154 -1.765 -0.02646 -0.2869 -0.001945 -0.7772 -0.9496 -0.2096 0.7491 1.268 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
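
The assignment text does not spell out the required transformations. Judging only from the outputs above, a plausible pipeline one-hot encodes integer-valued columns, standardizes the remaining ones, and then adds degree-2 polynomial features. The following sketch implements that guess on a tiny hypothetical array; the step names and the rule for detecting integer columns are assumptions.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, StandardScaler

# Toy data (hypothetical): column 0 holds integers, columns 1 and 2 are real-valued.
data = np.array([
    [0, 1.5, 0.25],
    [1, 2.5, 0.75],
    [2, 0.5, 1.25],
    [1, 3.5, 1.75],
])

# Treat a column as categorical if all of its values are integers (an assumption).
integer_columns = np.all(data.astype(int) == data, axis=0)

pipeline = Pipeline([
    ("preprocess", ColumnTransformer(
        [
            ("onehot", OneHotEncoder(handle_unknown="ignore"), integer_columns),
            ("scale", StandardScaler(), ~integer_columns),
        ],
        sparse_threshold=0,  # always return a dense matrix
    )),
    # Add all products of at most two features, without the constant term.
    ("polynomial", PolynomialFeatures(2, include_bias=False)),
])

transformed = pipeline.fit_transform(data)
print(transformed.shape)
for row in transformed:
    print(" ".join("{:.4g}".format(value) for value in row))
```

Here the three one-hot columns and two standardized columns give 5 features, which the polynomial step expands to 5 + 15 = 20.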

rental_competition

Deadline: Oct 24, 7:59 a.m. 3 points+4 bonus

This assignment is a competition task. Your goal is to perform regression on the data from a bike rental shop. The train set contains 1000 instances; each instance consists of 12 features, both integral and real.

The rental_competition.py template shows how to load the training data, downloading it if needed. Furthermore, it shows how to save a trained estimator and how to load it during prediction.

The performance of your system is measured using root mean squared error and your goal is to achieve RMSE less than 100. Note that you can use any number of generalized linear models from sklearn to solve this assignment (but no decision trees, MLPs, …).

perceptron

Deadline: Oct 31, 7:59 a.m. 2 points

Starting with the perceptron.py template, implement the perceptron algorithm.

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 perceptron.py --data_size=100 --seed=17

Learned weights 4.10 2.94 -1.00

python3 perceptron.py --data_size=50 --seed=320

Learned weights -2.30 -1.96 -2.00

python3 perceptron.py --data_size=200 --seed=92

Learned weights 4.43 1.54 -2.00
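
For reference, a minimal sketch of the perceptron algorithm follows. It assumes targets in {-1, +1}, a bias stored as the last weight, and linearly separable synthetic data; the function name and the `max_epochs` safeguard are hypothetical and not part of the template.

```python
import numpy as np

def perceptron(X, t, seed=42, max_epochs=1000):
    """Train a perceptron; targets t must be -1 or +1."""
    rng = np.random.RandomState(seed)
    # Append a column of ones; the last weight then acts as the bias.
    X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)
    weights = np.zeros(X.shape[1])

    for _ in range(max_epochs):
        mistakes = 0
        for i in rng.permutation(X.shape[0]):
            # An example is misclassified if the prediction and target signs differ.
            if t[i] * (X[i] @ weights) <= 0:
                weights += t[i] * X[i]
                mistakes += 1
        # Stop once a whole pass makes no update.
        if mistakes == 0:
            break
    return weights

# Two well-separated clusters (hypothetical data).
rng = np.random.RandomState(17)
positive = rng.normal(loc=2.0, scale=0.5, size=(20, 2))
negative = rng.normal(loc=-2.0, scale=0.5, size=(20, 2))
X = np.concatenate([positive, negative])
t = np.concatenate([np.ones(20), -np.ones(20)])

weights = perceptron(X, t)
predictions = np.sign(np.concatenate([X, np.ones((40, 1))], axis=1) @ weights)
print("Learned weights", " ".join("{:.2f}".format(w) for w in weights))
```

The loop is guaranteed to terminate with zero mistakes only when the data are linearly separable, which is why the sketch caps the number of epochs.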

logistic_regression_sgd

Deadline: Oct 31, 7:59 a.m. 5 points

Starting with the logistic_regression_sgd.py template, implement minibatch SGD for logistic regression.

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 logistic_regression_sgd.py --data_size=100 --batch_size=10 --epochs=9 --learning_rate=0.5

After epoch 1: train loss 0.3259 acc 94.0%, test loss 0.3301 acc 96.0%
After epoch 2: train loss 0.2321 acc 96.0%, test loss 0.2385 acc 98.0%
After epoch 3: train loss 0.1877 acc 98.0%, test loss 0.1949 acc 98.0%
After epoch 4: train loss 0.1612 acc 98.0%, test loss 0.1689 acc 98.0%
After epoch 5: train loss 0.1435 acc 98.0%, test loss 0.1517 acc 98.0%
After epoch 6: train loss 0.1307 acc 98.0%, test loss 0.1396 acc 98.0%
After epoch 7: train loss 0.1208 acc 98.0%, test loss 0.1304 acc 96.0%
After epoch 8: train loss 0.1129 acc 98.0%, test loss 0.1230 acc 96.0%
After epoch 9: train loss 0.1065 acc 98.0%, test loss 0.1170 acc 96.0%
Learned weights 2.77 -0.60 0.12

python3 logistic_regression_sgd.py --data_size=95 --test_size=45 --batch_size=5 --epochs=9 --learning_rate=0.5

After epoch 1: train loss 0.2429 acc 96.0%, test loss 0.3187 acc 93.3%
After epoch 2: train loss 0.1853 acc 96.0%, test loss 0.2724 acc 93.3%
After epoch 3: train loss 0.1590 acc 96.0%, test loss 0.2525 acc 93.3%
After epoch 4: train loss 0.1428 acc 96.0%, test loss 0.2411 acc 93.3%
After epoch 5: train loss 0.1313 acc 98.0%, test loss 0.2335 acc 93.3%
After epoch 6: train loss 0.1225 acc 96.0%, test loss 0.2258 acc 93.3%
After epoch 7: train loss 0.1159 acc 96.0%, test loss 0.2220 acc 93.3%
After epoch 8: train loss 0.1105 acc 96.0%, test loss 0.2187 acc 93.3%
After epoch 9: train loss 0.1061 acc 96.0%, test loss 0.2163 acc 93.3%
Learned weights -0.61 3.61 0.12

python3 logistic_regression_sgd.py --data_size=95 --test_size=45 --batch_size=1 --epochs=9 --learning_rate=0.7

After epoch 1: train loss 0.1141 acc 96.0%, test loss 0.2268 acc 93.3%
After epoch 2: train loss 0.0867 acc 96.0%, test loss 0.2150 acc 91.1%
After epoch 3: train loss 0.0797 acc 98.0%, test loss 0.2320 acc 88.9%
After epoch 4: train loss 0.0753 acc 96.0%, test loss 0.2224 acc 88.9%
After epoch 5: train loss 0.0692 acc 96.0%, test loss 0.2154 acc 88.9%
After epoch 6: train loss 0.0749 acc 98.0%, test loss 0.2458 acc 88.9%
After epoch 7: train loss 0.0638 acc 96.0%, test loss 0.2190 acc 88.9%
After epoch 8: train loss 0.0644 acc 98.0%, test loss 0.2341 acc 88.9%
After epoch 9: train loss 0.0663 acc 98.0%, test loss 0.2490 acc 88.9%
Learned weights -1.07 7.33 -0.40
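
A minibatch SGD loop for logistic regression could be sketched as below. It assumes targets in {0, 1} and a bias stored as the last weight; the function names, the initialization, and the synthetic data are hypothetical, so the numbers will differ from the outputs above.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def logistic_regression_sgd(X, t, batch_size, epochs, learning_rate, seed=42):
    """Minibatch SGD for binary logistic regression; targets t are 0 or 1."""
    rng = np.random.RandomState(seed)
    # Append a column of ones; the last weight then acts as the bias.
    X = np.concatenate([X, np.ones((X.shape[0], 1))], axis=1)
    weights = rng.uniform(-0.1, 0.1, size=X.shape[1])

    for _ in range(epochs):
        permutation = rng.permutation(X.shape[0])
        for start in range(0, len(permutation), batch_size):
            batch = permutation[start:start + batch_size]
            probabilities = sigmoid(X[batch] @ weights)
            # Gradient of the mean negative log-likelihood over the minibatch.
            gradient = X[batch].T @ (probabilities - t[batch]) / len(batch)
            weights -= learning_rate * gradient
    return weights

# Two clusters (hypothetical data).
rng = np.random.RandomState(3)
X = np.concatenate([rng.normal(loc=2.0, scale=0.7, size=(50, 2)),
                    rng.normal(loc=-2.0, scale=0.7, size=(50, 2))])
t = np.concatenate([np.ones(50), np.zeros(50)])

weights = logistic_regression_sgd(X, t, batch_size=10, epochs=20, learning_rate=0.5)

# Evaluate the average negative log-likelihood and the accuracy.
probabilities = sigmoid(np.concatenate([X, np.ones((100, 1))], axis=1) @ weights)
clipped = np.clip(probabilities, 1e-12, 1 - 1e-12)  # avoid log(0)
loss = -np.mean(t * np.log(clipped) + (1 - t) * np.log(1 - clipped))
accuracy = np.mean((probabilities >= 0.5) == t)
print("loss {:.4f} acc {:.1f}%".format(loss, 100 * accuracy))
```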

grid_search

Deadline: Oct 31, 7:59 a.m. 2 points

Starting with the grid_search.py template, perform a hyperparameter grid search, evaluating hyperparameter performance using a stratified k-fold cross-validation, and finally evaluate a model trained with the best hyperparameters on all training data. The easiest way is to utilize sklearn.model_selection.GridSearchCV.

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 grid_search.py --test_size=0.5

Rank: 11 Cross-val: 86.7% lr__C: 0.01  lr__solver: lbfgs polynomial__degree: 1    
Rank:  5 Cross-val: 92.7% lr__C: 0.01  lr__solver: lbfgs polynomial__degree: 2    
Rank: 11 Cross-val: 86.7% lr__C: 0.01  lr__solver: sag   polynomial__degree: 1    
Rank:  5 Cross-val: 92.7% lr__C: 0.01  lr__solver: sag   polynomial__degree: 2    
Rank:  7 Cross-val: 90.8% lr__C: 1     lr__solver: lbfgs polynomial__degree: 1    
Rank:  3 Cross-val: 96.8% lr__C: 1     lr__solver: lbfgs polynomial__degree: 2    
Rank:  7 Cross-val: 90.8% lr__C: 1     lr__solver: sag   polynomial__degree: 1    
Rank:  4 Cross-val: 96.8% lr__C: 1     lr__solver: sag   polynomial__degree: 2    
Rank: 10 Cross-val: 90.1% lr__C: 100   lr__solver: lbfgs polynomial__degree: 1    
Rank:  1 Cross-val: 97.2% lr__C: 100   lr__solver: lbfgs polynomial__degree: 2    
Rank:  9 Cross-val: 90.5% lr__C: 100   lr__solver: sag   polynomial__degree: 1    
Rank:  2 Cross-val: 97.0% lr__C: 100   lr__solver: sag   polynomial__degree: 2    
Test accuracy: 98.33

python3 grid_search.py --test_size=0.7

Rank: 11 Cross-val: 87.9% lr__C: 0.01  lr__solver: lbfgs polynomial__degree: 1    
Rank:  5 Cross-val: 91.8% lr__C: 0.01  lr__solver: lbfgs polynomial__degree: 2    
Rank: 11 Cross-val: 87.9% lr__C: 0.01  lr__solver: sag   polynomial__degree: 1    
Rank:  5 Cross-val: 91.8% lr__C: 0.01  lr__solver: sag   polynomial__degree: 2    
Rank:  7 Cross-val: 91.3% lr__C: 1     lr__solver: lbfgs polynomial__degree: 1    
Rank:  3 Cross-val: 95.9% lr__C: 1     lr__solver: lbfgs polynomial__degree: 2    
Rank:  7 Cross-val: 91.3% lr__C: 1     lr__solver: sag   polynomial__degree: 1    
Rank:  4 Cross-val: 95.7% lr__C: 1     lr__solver: sag   polynomial__degree: 2    
Rank: 10 Cross-val: 89.2% lr__C: 100   lr__solver: lbfgs polynomial__degree: 1    
Rank:  1 Cross-val: 96.5% lr__C: 100   lr__solver: lbfgs polynomial__degree: 2    
Rank:  9 Cross-val: 89.2% lr__C: 100   lr__solver: sag   polynomial__degree: 1    
Rank:  2 Cross-val: 96.1% lr__C: 100   lr__solver: sag   polynomial__degree: 2    
Test accuracy: 96.98

thyroid_competition

Deadline: Oct 31, 7:59 a.m. 3 points+4 bonus

This assignment is a competition task. Your goal is to perform binary classification: given medical data with 15 binary and 6 real-valued attributes, predict whether the thyroid is functioning normally or not. The train set and test set consist of ~3.5k instances.

The thyroid_competition.py template shows how to load training data, downloading it if needed. Furthermore, it shows how to save a trained estimator and how to load it during prediction.

The performance of your system is measured using accuracy (the fraction of correctly predicted examples), and your goal is to achieve at least 96% accuracy. Note that you can use any number of generalized linear models from sklearn to solve this assignment (but no decision trees, MLPs, …).

softmax_classification_sgd

Deadline: Nov 7, 7:59 a.m. 3 points

Starting with the softmax_classification_sgd.py template, implement minibatch SGD for multinomial logistic regression.
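As a rough illustration of minibatch SGD for multinomial logistic regression, here is a sketch on synthetic data. The function names, the uniform weight initialization, the toy blobs, and the absence of a bias term are assumptions made for this example and are not taken from the template.

```python
import numpy as np


def softmax(logits):
    # Subtracting the row maximum keeps the exponentials numerically stable.
    exponentials = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exponentials / exponentials.sum(axis=1, keepdims=True)


def train_softmax_sgd(data, target, classes, epochs=10, batch_size=10,
                      learning_rate=0.01, seed=42):
    generator = np.random.RandomState(seed)
    # Assumption: small uniform initialization; one weight column per class.
    weights = generator.uniform(-0.1, 0.1, size=(data.shape[1], classes))
    for _ in range(epochs):
        permutation = generator.permutation(data.shape[0])
        for start in range(0, len(permutation), batch_size):
            batch = permutation[start:start + batch_size]
            probabilities = softmax(data[batch] @ weights)
            one_hot = np.eye(classes)[target[batch]]
            # Average gradient of the cross-entropy loss over the minibatch.
            gradient = data[batch].T @ (probabilities - one_hot) / len(batch)
            weights -= learning_rate * gradient
    return weights


# Toy data: three Gaussian blobs placed around the origin (illustrative only).
generator = np.random.RandomState(0)
centers = np.array([[0.0, -4.0], [4.0, 4.0], [-4.0, 4.0]])
target = np.repeat(np.arange(3), 100)
data = centers[target] + generator.normal(size=(300, 2))

weights = train_softmax_sgd(data, target, classes=3)
accuracy = np.mean(np.argmax(data @ weights, axis=1) == target)
print("Train accuracy: {:.1f}%".format(100 * accuracy))
```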

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 softmax_classification_sgd.py --batch_size=10 --epochs=10 --learning_rate=0.005

After epoch 1: train loss 0.3130 acc 90.8%, test loss 0.3529 acc 88.7%
After epoch 2: train loss 0.2134 acc 93.9%, test loss 0.2450 acc 92.5%
After epoch 3: train loss 0.1366 acc 96.8%, test loss 0.1735 acc 94.6%
After epoch 4: train loss 0.1374 acc 96.2%, test loss 0.1705 acc 94.0%
After epoch 5: train loss 0.1169 acc 97.2%, test loss 0.1667 acc 95.1%
After epoch 6: train loss 0.0978 acc 97.5%, test loss 0.1340 acc 96.1%
After epoch 7: train loss 0.0878 acc 98.0%, test loss 0.1366 acc 95.9%
After epoch 8: train loss 0.0889 acc 97.5%, test loss 0.1515 acc 95.1%
After epoch 9: train loss 0.0819 acc 98.0%, test loss 0.1336 acc 96.5%
After epoch 10: train loss 0.0801 acc 97.9%, test loss 0.1342 acc 96.4%
Learned weights:
  -0.03 -0.10 0.01 0.08 -0.05 0.01 -0.06 0.05 0.07 -0.10 ...
  0.09 0.07 -0.15 -0.02 -0.21 0.13 -0.01 -0.06 0.02 -0.07 ...
  0.05 0.08 0.01 -0.03 -0.05 0.06 0.04 -0.10 -0.03 0.09 ...
  0.02 -0.03 -0.02 0.11 0.16 0.09 -0.06 0.06 -0.09 0.05 ...
  -0.07 -0.07 -0.10 -0.07 -0.10 -0.13 -0.09 0.03 -0.04 0.02 ...
  -0.07 -0.04 0.20 0.05 -0.02 0.12 0.06 0.04 -0.04 0.01 ...
  -0.09 -0.04 -0.14 -0.09 -0.02 -0.08 -0.09 0.05 0.05 -0.03 ...
  0.07 0.01 0.05 -0.01 0.06 -0.01 0.13 -0.04 0.03 -0.02 ...
  0.02 -0.02 0.01 -0.08 0.03 0.01 -0.10 -0.03 0.08 -0.05 ...
  0.04 -0.05 -0.07 0.09 -0.00 -0.05 0.10 -0.09 -0.01 0.01 ...

python3 softmax_classification_sgd.py --batch_size=1 --epochs=10 --learning_rate=0.005 --test_size=1597

After epoch 1: train loss 1.7683 acc 73.5%, test loss 2.0028 acc 72.2%
After epoch 2: train loss 0.7731 acc 88.5%, test loss 1.5349 acc 77.8%
After epoch 3: train loss 1.2189 acc 82.5%, test loss 2.0718 acc 73.7%
After epoch 4: train loss 1.1752 acc 89.5%, test loss 2.3474 acc 79.0%
After epoch 5: train loss 0.2969 acc 95.5%, test loss 1.0299 acc 86.0%
After epoch 6: train loss 0.2176 acc 96.0%, test loss 0.9374 acc 86.7%
After epoch 7: train loss 0.1214 acc 97.5%, test loss 0.8018 acc 87.7%
After epoch 8: train loss 0.0178 acc 99.0%, test loss 0.5969 acc 90.4%
After epoch 9: train loss 0.2188 acc 94.0%, test loss 1.2211 acc 83.0%
After epoch 10: train loss 0.0054 acc 100.0%, test loss 0.6710 acc 89.8%
Learned weights:
  -0.03 -0.10 0.05 0.12 0.09 0.00 -0.07 0.05 0.07 -0.15 ...
  0.09 0.10 -0.31 -0.21 -0.55 0.21 -0.08 -0.06 0.02 -0.11 ...
  0.05 0.07 0.14 -0.01 -0.15 0.02 0.03 -0.10 -0.04 0.28 ...
  0.02 -0.02 0.11 0.28 0.13 0.11 -0.12 0.06 -0.08 0.19 ...
  -0.07 -0.09 -0.10 -0.32 -0.27 -0.50 -0.21 0.04 -0.04 0.06 ...
  -0.07 -0.07 0.42 0.18 0.11 0.51 0.13 0.04 -0.03 0.12 ...
  -0.09 -0.05 -0.31 -0.16 0.15 -0.02 -0.12 0.05 0.05 -0.10 ...
  0.07 0.02 0.05 0.09 0.16 0.05 0.20 -0.08 0.03 -0.10 ...
  0.02 -0.02 -0.12 -0.06 0.07 -0.09 -0.06 -0.03 0.08 -0.18 ...
  0.04 -0.04 -0.14 0.08 0.08 -0.15 0.24 -0.08 -0.01 -0.11 ...

python3 softmax_classification_sgd.py --batch_size=100 --epochs=10 --learning_rate=0.05

After epoch 1: train loss 4.1126 acc 77.8%, test loss 4.2883 acc 75.5%
After epoch 2: train loss 0.4290 acc 90.5%, test loss 0.5414 acc 89.8%
After epoch 3: train loss 0.6189 acc 88.0%, test loss 0.5752 acc 89.2%
After epoch 4: train loss 0.3084 acc 91.9%, test loss 0.3482 acc 91.3%
After epoch 5: train loss 0.2757 acc 93.2%, test loss 0.3792 acc 91.3%
After epoch 6: train loss 0.2559 acc 92.7%, test loss 0.3718 acc 91.8%
After epoch 7: train loss 0.1164 acc 96.8%, test loss 0.1761 acc 95.1%
After epoch 8: train loss 0.2891 acc 91.5%, test loss 0.4110 acc 90.2%
After epoch 9: train loss 0.1256 acc 96.4%, test loss 0.1977 acc 94.9%
After epoch 10: train loss 0.1239 acc 96.3%, test loss 0.1847 acc 95.0%
Learned weights:
  -0.03 -0.10 -0.05 0.07 -0.08 -0.04 -0.06 0.05 0.07 -0.12 ...
  0.09 0.05 -0.24 -0.03 -0.25 0.16 -0.01 -0.06 0.02 -0.13 ...
  0.05 0.10 0.05 -0.02 -0.06 0.04 0.03 -0.10 -0.03 0.16 ...
  0.02 -0.03 0.03 0.15 0.25 0.13 -0.09 0.06 -0.09 0.11 ...
  -0.07 -0.08 -0.13 -0.10 -0.10 -0.18 -0.11 0.03 -0.04 0.00 ...
  -0.07 -0.02 0.32 0.06 0.03 0.23 0.10 0.04 -0.03 0.03 ...
  -0.09 -0.04 -0.18 -0.12 -0.01 -0.12 -0.10 0.05 0.04 -0.06 ...
  0.07 0.01 0.10 0.00 0.05 0.05 0.20 -0.02 0.03 -0.02 ...
  0.02 -0.03 -0.04 -0.12 0.02 -0.02 -0.15 -0.04 0.08 -0.08 ...
  0.04 -0.06 -0.06 0.12 -0.04 -0.10 0.12 -0.09 -0.01 0.01 ...

python3 softmax_classification_sgd.py --batch_size=10 --epochs=2 --learning_rate=0.005

After epoch 1: train loss 0.3130 acc 90.8%, test loss 0.3529 acc 88.7%
After epoch 2: train loss 0.2134 acc 93.9%, test loss 0.2450 acc 92.5%
Learned weights:
  -0.03 -0.10 0.01 0.06 -0.07 0.04 -0.05 0.05 0.07 -0.10 ...
  0.09 0.08 -0.12 -0.08 -0.10 0.09 -0.03 -0.06 0.02 -0.01 ...
  0.05 0.07 0.01 -0.03 -0.05 0.06 0.04 -0.10 -0.03 0.08 ...
  0.02 -0.05 -0.01 0.10 0.11 0.09 -0.05 0.06 -0.09 0.04 ...
  -0.07 -0.07 -0.10 -0.01 -0.06 -0.07 -0.08 0.04 -0.04 0.01 ...
  -0.07 -0.05 0.14 0.06 0.02 0.14 0.05 0.04 -0.04 0.03 ...
  -0.09 -0.04 -0.11 -0.06 -0.04 -0.10 -0.09 0.05 0.05 -0.01 ...
  0.07 0.01 0.02 -0.04 0.04 -0.01 0.11 -0.06 0.03 -0.03 ...
  0.02 -0.02 0.01 -0.03 0.00 -0.03 -0.09 -0.03 0.08 -0.07 ...
  0.04 -0.04 -0.05 0.05 -0.04 -0.05 0.09 -0.08 -0.01 -0.04 ...

python3 softmax_classification_sgd.py --batch_size=1 --epochs=1 --learning_rate=0.005 --test_size=1597

After epoch 1: train loss 1.7683 acc 73.5%, test loss 2.0028 acc 72.2%
Learned weights:
  -0.03 -0.10 0.03 0.08 0.03 0.03 -0.07 0.05 0.07 -0.15 ...
  0.09 0.08 -0.25 -0.15 -0.17 0.11 -0.00 -0.06 0.02 -0.05 ...
  0.05 0.06 0.07 0.04 -0.12 0.11 0.07 -0.10 -0.03 0.16 ...
  0.02 -0.03 0.03 0.14 0.03 0.08 -0.09 0.06 -0.09 0.09 ...
  -0.07 -0.08 -0.22 -0.07 -0.11 -0.27 -0.13 0.04 -0.04 -0.00 ...
  -0.07 -0.08 0.17 0.16 0.17 0.39 0.07 0.04 -0.03 0.03 ...
  -0.09 -0.04 -0.13 -0.10 -0.03 -0.16 -0.09 0.05 0.05 -0.03 ...
  0.07 0.02 0.10 0.03 0.09 -0.05 0.13 -0.08 0.03 -0.05 ...
  0.02 0.00 0.02 -0.17 -0.01 -0.04 -0.12 -0.03 0.08 -0.09 ...
  0.04 -0.04 -0.02 0.06 -0.07 -0.05 0.16 -0.08 -0.01 -0.01 ...

python3 softmax_classification_sgd.py --batch_size=100 --epochs=3 --learning_rate=0.05

After epoch 1: train loss 4.1126 acc 77.8%, test loss 4.2883 acc 75.5%
After epoch 2: train loss 0.4290 acc 90.5%, test loss 0.5414 acc 89.8%
After epoch 3: train loss 0.6189 acc 88.0%, test loss 0.5752 acc 89.2%
Learned weights:
  -0.03 -0.10 -0.04 0.08 -0.07 -0.02 -0.05 0.05 0.07 -0.12 ...
  0.09 0.06 -0.23 -0.08 -0.12 0.11 -0.04 -0.06 0.02 -0.09 ...
  0.05 0.09 0.07 -0.01 -0.08 0.01 0.02 -0.10 -0.03 0.16 ...
  0.02 -0.04 0.03 0.15 0.18 0.09 -0.09 0.06 -0.09 0.10 ...
  -0.07 -0.07 -0.14 -0.07 -0.07 -0.13 -0.11 0.03 -0.04 -0.01 ...
  -0.07 -0.03 0.28 0.07 0.06 0.29 0.11 0.04 -0.03 0.08 ...
  -0.09 -0.04 -0.16 -0.09 -0.04 -0.13 -0.09 0.05 0.04 -0.05 ...
  0.07 0.01 0.06 -0.01 0.05 0.06 0.18 -0.04 0.03 -0.03 ...
  0.02 -0.03 -0.03 -0.08 -0.00 -0.03 -0.12 -0.04 0.08 -0.12 ...
  0.04 -0.05 -0.04 0.06 -0.09 -0.09 0.12 -0.09 -0.01 -0.03 ...

mlp_classification_sgd

Deadline: Nov 7, 7:59 a.m. 6 points

Starting with the mlp_classification_sgd.py template, implement minibatch SGD for multilayer perceptron classification.
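One possible shape of such a training loop is sketched below for a network with a single hidden layer. The ReLU hidden activation, the softmax output, the initialization, and all function names are assumptions of this sketch. The task text does not specify them.

```python
import numpy as np


def softmax(logits):
    exponentials = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exponentials / exponentials.sum(axis=1, keepdims=True)


def forward(inputs, weights, biases):
    # Assumption: ReLU hidden layer followed by a softmax output layer.
    hidden = np.maximum(inputs @ weights[0] + biases[0], 0)
    return hidden, softmax(hidden @ weights[1] + biases[1])


def train_mlp_sgd(data, target, classes, hidden_layer=20, epochs=30,
                  batch_size=10, learning_rate=0.1, seed=42):
    generator = np.random.RandomState(seed)
    weights = [generator.uniform(-0.1, 0.1, size=(data.shape[1], hidden_layer)),
               generator.uniform(-0.1, 0.1, size=(hidden_layer, classes))]
    biases = [np.zeros(hidden_layer), np.zeros(classes)]
    for _ in range(epochs):
        permutation = generator.permutation(data.shape[0])
        for start in range(0, len(permutation), batch_size):
            batch = permutation[start:start + batch_size]
            inputs = data[batch]
            hidden, probabilities = forward(inputs, weights, biases)
            # Gradient of the mean cross-entropy with respect to the output logits.
            d_output = (probabilities - np.eye(classes)[target[batch]]) / len(batch)
            # Backpropagate through the output weights and the ReLU.
            d_hidden = (d_output @ weights[1].T) * (hidden > 0)
            weights[1] -= learning_rate * hidden.T @ d_output
            biases[1] -= learning_rate * d_output.sum(axis=0)
            weights[0] -= learning_rate * inputs.T @ d_hidden
            biases[0] -= learning_rate * d_hidden.sum(axis=0)
    return weights, biases


# Toy data: three Gaussian blobs (illustrative only, not the assignment data).
generator = np.random.RandomState(0)
centers = np.array([[0.0, -2.0], [2.0, 2.0], [-2.0, 2.0]])
target = np.repeat(np.arange(3), 100)
data = centers[target] + 0.5 * generator.normal(size=(300, 2))

weights, biases = train_mlp_sgd(data, target, classes=3)
accuracy = np.mean(np.argmax(forward(data, weights, biases)[1], axis=1) == target)
print("Train accuracy: {:.1f}%".format(100 * accuracy))
```

Note that d_hidden is computed before weights[1] is updated, so that the backward pass uses the same parameters as the forward pass.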

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 mlp_classification_sgd.py --epochs=10 --batch_size=10 --hidden_layer=20

After epoch 1: train acc 79.7%, test acc 80.2%
After epoch 2: train acc 91.9%, test acc 88.3%
After epoch 3: train acc 92.4%, test acc 90.0%
After epoch 4: train acc 96.1%, test acc 93.1%
After epoch 5: train acc 95.3%, test acc 93.1%
After epoch 6: train acc 96.6%, test acc 93.9%
After epoch 7: train acc 97.3%, test acc 94.2%
After epoch 8: train acc 98.2%, test acc 94.9%
After epoch 9: train acc 98.1%, test acc 95.7%
After epoch 10: train acc 97.4%, test acc 95.1%
Learned parameters:
  -0.03 0.09 0.05 0.02 -0.07 -0.07 -0.09 0.07 0.02 0.04 -0.10 0.09 ...
  -0.07 0.12 0.33 -0.21 -0.16 -0.13 0.02 -0.14 0.01 -0.12 -0.02 -0.04 ...
  -0.00 -0.00 0.00 -0.00 0.00 0.00 -0.00 0.00 0.00 -0.00 0.00 0.00 ...
  0.02 -0.01 0.01 -0.03 0.02 -0.01 0.00 0.01 0.01 -0.01 ...

python3 mlp_classification_sgd.py --epochs=10 --batch_size=10 --hidden_layer=50

After epoch 1: train acc 91.1%, test acc 89.2%
After epoch 2: train acc 95.9%, test acc 93.5%
After epoch 3: train acc 96.5%, test acc 95.2%
After epoch 4: train acc 96.1%, test acc 94.5%
After epoch 5: train acc 96.3%, test acc 93.5%
After epoch 6: train acc 98.3%, test acc 96.2%
After epoch 7: train acc 98.4%, test acc 96.4%
After epoch 8: train acc 98.3%, test acc 95.7%
After epoch 9: train acc 99.1%, test acc 97.4%
After epoch 10: train acc 98.8%, test acc 97.4%
Learned parameters:
  -0.03 0.09 0.05 0.02 -0.07 -0.07 -0.09 0.07 0.02 0.04 -0.10 0.09 ...
  0.01 0.10 -0.16 0.02 0.13 0.04 0.14 -0.01 0.05 -0.07 -0.08 0.02 ...
  0.00 0.00 -0.00 0.00 0.00 -0.00 -0.00 0.00 0.00 -0.00 -0.00 -0.00 ...
  0.01 -0.00 -0.00 0.00 0.00 0.00 -0.01 0.00 -0.00 -0.00 ...

python3 mlp_classification_sgd.py --epochs=10 --batch_size=10 --hidden_layer=200

After epoch 1: train acc 95.4%, test acc 93.0%
After epoch 2: train acc 97.9%, test acc 96.6%
After epoch 3: train acc 98.8%, test acc 96.9%
After epoch 4: train acc 98.0%, test acc 95.4%
After epoch 5: train acc 99.6%, test acc 97.7%
After epoch 6: train acc 99.7%, test acc 98.0%
After epoch 7: train acc 97.4%, test acc 95.4%
After epoch 8: train acc 99.7%, test acc 97.5%
After epoch 9: train acc 99.8%, test acc 97.9%
After epoch 10: train acc 99.9%, test acc 97.9%
Learned parameters:
  -0.03 0.09 0.05 0.02 -0.07 -0.07 -0.09 0.07 0.02 0.04 -0.10 0.09 ...
  0.01 -0.09 0.04 -0.09 0.06 0.06 -0.05 -0.04 -0.00 0.02 -0.04 0.02 ...
  0.00 0.00 0.00 -0.00 0.00 -0.00 -0.00 -0.00 0.00 0.00 0.00 -0.00 ...
  -0.00 -0.01 -0.00 -0.00 0.00 -0.00 -0.00 0.00 0.01 0.00 ...

python3 mlp_classification_sgd.py --epochs=3 --batch_size=10 --hidden_layer=20

After epoch 1: train acc 79.7%, test acc 80.2%
After epoch 2: train acc 91.9%, test acc 88.3%
After epoch 3: train acc 92.4%, test acc 90.0%
Learned parameters:
  -0.03 0.09 0.05 0.02 -0.07 -0.07 -0.09 0.07 0.02 0.04 -0.10 0.09 ...
  -0.09 0.07 0.21 -0.16 -0.15 -0.07 0.01 -0.09 0.05 -0.11 -0.02 -0.04 ...
  -0.00 -0.00 0.00 0.00 0.00 0.00 -0.00 0.00 0.00 -0.00 0.00 -0.00 ...
  0.01 -0.01 0.01 -0.02 0.01 -0.01 0.00 0.01 0.01 -0.01 ...

python3 mlp_classification_sgd.py --epochs=3 --batch_size=10 --hidden_layer=50

After epoch 1: train acc 91.1%, test acc 89.2%
After epoch 2: train acc 95.9%, test acc 93.5%
After epoch 3: train acc 96.5%, test acc 95.2%
Learned parameters:
  -0.03 0.09 0.05 0.02 -0.07 -0.07 -0.09 0.07 0.02 0.04 -0.10 0.09 ...
  0.01 0.06 -0.13 0.04 0.11 0.04 0.13 0.01 0.05 -0.05 -0.07 0.02 ...
  0.00 0.00 -0.00 0.00 -0.00 0.00 -0.00 0.00 0.00 -0.00 -0.00 -0.00 ...
  0.01 0.00 -0.00 0.00 -0.00 0.00 -0.01 -0.00 -0.00 0.00 ...

python3 mlp_classification_sgd.py --epochs=3 --batch_size=10 --hidden_layer=200

After epoch 1: train acc 95.4%, test acc 93.0%
After epoch 2: train acc 97.9%, test acc 96.6%
After epoch 3: train acc 98.8%, test acc 96.9%
Learned parameters:
  -0.03 0.09 0.05 0.02 -0.07 -0.07 -0.09 0.07 0.02 0.04 -0.10 0.09 ...
  0.01 -0.09 0.04 -0.09 0.06 0.06 -0.05 -0.04 -0.00 0.02 -0.04 0.02 ...
  0.00 0.00 -0.00 -0.00 0.00 -0.00 0.00 -0.00 0.00 0.00 -0.00 -0.00 ...
  -0.00 -0.00 -0.00 -0.00 0.00 -0.00 -0.00 0.00 0.00 0.00 ...

python3 mlp_classification_sgd.py --epochs=1 --batch_size=1 --hidden_layer=200 --test_size=1597

After epoch 1: train acc 74.0%, test acc 68.7%
Learned parameters:
  -0.03 0.09 0.05 0.02 -0.07 -0.07 -0.09 0.07 0.02 0.04 -0.10 0.09 ...
  0.01 -0.09 0.04 -0.09 0.06 0.06 -0.05 -0.04 -0.00 0.02 -0.04 0.02 ...
  0.00 0.00 -0.00 -0.00 -0.00 -0.00 -0.00 -0.00 0.00 -0.00 -0.00 -0.00 ...
  -0.02 0.01 -0.00 -0.02 0.02 -0.00 0.00 -0.02 0.02 0.01 ...

mnist_competition

Deadline: Nov 7, 7:59 a.m. 4 points+5 bonus

This assignment is a competition task. Your goal is to perform 10-class classification on the well-known MNIST dataset. The train set contains 60k images, each consisting of $28 \times 28$ pixels with values in $\{0, 1, \ldots, 255\}$. Evaluation is performed on 10k test images. You can find a simple online demo of a trained classifier here.

The mnist_competition.py template shows how to load training data, downloading it if needed. Furthermore, it shows how to save a trained estimator and how to load it during prediction.

The performance of your system is measured using accuracy (the fraction of correctly predicted examples), and your goal is to achieve at least 97% accuracy. Note that you can use any sklearn algorithm to solve this exercise (and of course anything you implement yourself).

multilabel_classification_sgd

Deadline: Nov 14, 7:59 a.m. 3 points

Starting with the multilabel_classification_sgd.py template, implement minibatch SGD for multi-label classification and manually compute micro-averaged and macro-averaged $F_1$-score.
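To make the difference between the two averages concrete, the following sketch computes both from binary indicator matrices. The function name and the convention of assigning an $F_1$ of 0 to a class with no true or predicted positives are assumptions of this example.

```python
import numpy as np


def f1_micro_macro(targets, predictions):
    # Both inputs are [examples, classes] binary indicator matrices.
    targets = np.asarray(targets, dtype=bool)
    predictions = np.asarray(predictions, dtype=bool)

    # Per-class counts of true positives, false positives and false negatives.
    tp = (targets & predictions).sum(axis=0)
    fp = (~targets & predictions).sum(axis=0)
    fn = (targets & ~predictions).sum(axis=0)

    # Micro average: pool the counts over all classes, then compute one F1.
    micro_denominator = 2 * tp.sum() + fp.sum() + fn.sum()
    micro = 2 * tp.sum() / micro_denominator if micro_denominator > 0 else 0.0

    # Macro average: compute F1 for every class, then take the unweighted mean.
    denominators = 2 * tp + fp + fn
    per_class = np.divide(2 * tp, denominators,
                          out=np.zeros(len(tp)), where=denominators > 0)
    return micro, per_class.mean()


targets = [[1, 0], [1, 1], [0, 1], [1, 0]]
predictions = [[1, 0], [0, 1], [0, 0], [1, 1]]
micro, macro = f1_micro_macro(targets, predictions)
print("F1 micro {:.2f}% macro {:.2f}%".format(100 * micro, 100 * macro))
# Class 0 has F1 = 4/5 and class 1 has F1 = 2/4, so macro = 0.65,
# while the pooled counts (tp=3, fp=1, fn=2) give micro = 6/9.
```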

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 multilabel_classification_sgd.py --batch_size=10 --epochs=9 --classes=5

After epoch 1: train F1 micro 56.45% macro 46.71%, test F1 micro 58.25% macro 43.9%
After epoch 2: train F1 micro 71.46% macro 59.47%, test F1 micro 73.77% macro 60.3%
After epoch 3: train F1 micro 73.06% macro 61.02%, test F1 micro 71.71% macro 56.8%
After epoch 4: train F1 micro 77.30% macro 66.48%, test F1 micro 76.19% macro 64.1%
After epoch 5: train F1 micro 76.05% macro 67.34%, test F1 micro 74.46% macro 61.4%
After epoch 6: train F1 micro 78.22% macro 73.24%, test F1 micro 77.40% macro 66.1%
After epoch 7: train F1 micro 78.13% macro 73.33%, test F1 micro 74.41% macro 61.7%
After epoch 8: train F1 micro 78.92% macro 74.73%, test F1 micro 76.78% macro 66.9%
After epoch 9: train F1 micro 80.76% macro 76.31%, test F1 micro 78.18% macro 68.3%
Learned weights:
  -0.09 -0.17 -0.16 -0.01 0.09 0.01 0.04 -0.09 0.04 0.07 ...
  -0.08 0.09 0.02 -0.07 -0.08 -0.13 -0.07 0.09 0.06 0.01 ...
  0.20 0.25 0.09 0.00 0.02 -0.18 -0.18 -0.15 0.06 0.07 ...
  0.06 -0.04 -0.07 -0.01 0.10 0.13 0.10 0.17 0.20 -0.01 ...
  0.06 -0.11 -0.12 -0.05 -0.20 0.04 -0.01 -0.03 -0.16 -0.11 ...

python3 multilabel_classification_sgd.py --batch_size=10 --epochs=9 --classes=10

After epoch 1: train F1 micro 20.14% macro 9.95%, test F1 micro 21.57% macro 10.4%
After epoch 2: train F1 micro 11.29% macro 7.35%, test F1 micro 14.45% macro 8.8%
After epoch 3: train F1 micro 41.53% macro 26.29%, test F1 micro 33.54% macro 20.4%
After epoch 4: train F1 micro 44.23% macro 30.24%, test F1 micro 37.85% macro 24.4%
After epoch 5: train F1 micro 43.23% macro 29.85%, test F1 micro 42.37% macro 28.3%
After epoch 6: train F1 micro 49.53% macro 35.63%, test F1 micro 46.53% macro 32.2%
After epoch 7: train F1 micro 55.69% macro 40.36%, test F1 micro 48.21% macro 33.8%
After epoch 8: train F1 micro 52.47% macro 37.65%, test F1 micro 46.53% macro 31.9%
After epoch 9: train F1 micro 59.89% macro 43.27%, test F1 micro 53.44% macro 37.5%
Learned weights:
  -0.02 -0.04 -0.02 -0.08 -0.04 -0.10 0.12 0.04 -0.06 -0.15 ...
  0.18 0.04 -0.10 -0.06 0.15 -0.06 -0.08 0.05 0.05 0.05 ...
  0.13 -0.02 -0.20 -0.20 -0.01 0.13 -0.06 -0.15 0.09 -0.08 ...
  -0.05 -0.08 0.11 0.12 0.13 -0.07 0.05 -0.22 -0.02 -0.02 ...
  -0.09 -0.14 -0.00 -0.02 -0.10 -0.05 -0.09 -0.08 -0.06 0.07 ...
  -0.10 -0.01 0.11 0.03 0.03 0.04 0.05 -0.11 -0.04 -0.10 ...
  -0.16 -0.09 -0.13 -0.11 -0.10 -0.20 -0.04 -0.00 0.04 -0.08 ...
  -0.03 0.05 -0.21 -0.09 -0.12 0.03 -0.13 -0.09 -0.02 0.13 ...
  0.05 0.07 0.08 0.04 -0.18 -0.11 -0.09 0.18 -0.09 -0.07 ...
  0.04 -0.10 0.00 -0.07 -0.15 0.17 -0.03 -0.12 -0.12 -0.16 ...

python3 multilabel_classification_sgd.py --batch_size=5 --epochs=9 --classes=5 --learning_rate=0.02

After epoch 1: train F1 micro 60.66% macro 47.96%, test F1 micro 60.82% macro 46.6%
After epoch 2: train F1 micro 79.28% macro 77.99%, test F1 micro 77.65% macro 71.1%
After epoch 3: train F1 micro 80.27% macro 74.86%, test F1 micro 79.57% macro 69.6%
After epoch 4: train F1 micro 81.22% macro 79.85%, test F1 micro 77.41% macro 70.1%
After epoch 5: train F1 micro 80.50% macro 78.76%, test F1 micro 72.54% macro 65.1%
After epoch 6: train F1 micro 82.86% macro 81.46%, test F1 micro 75.62% macro 69.2%
After epoch 7: train F1 micro 81.19% macro 79.54%, test F1 micro 72.51% macro 65.3%
After epoch 8: train F1 micro 81.37% macro 79.59%, test F1 micro 75.06% macro 68.9%
After epoch 9: train F1 micro 83.83% macro 82.38%, test F1 micro 79.74% macro 74.3%
Learned weights:
  -0.18 -0.31 -0.23 0.05 0.12 -0.02 0.09 -0.25 0.21 0.16 ...
  -0.21 0.18 -0.12 -0.08 -0.13 -0.17 -0.12 0.15 0.10 0.04 ...
  0.47 0.32 0.13 0.01 0.09 -0.36 -0.29 -0.26 0.27 0.14 ...
  0.12 -0.07 -0.11 0.04 0.28 0.21 0.11 0.28 0.39 0.04 ...
  0.22 -0.24 -0.26 -0.03 -0.48 0.06 -0.10 0.01 -0.28 -0.14 ...

python3 multilabel_classification_sgd.py --batch_size=10 --epochs=2 --classes=5

After epoch 1: train F1 micro 56.45% macro 46.71%, test F1 micro 58.25% macro 43.9%
After epoch 2: train F1 micro 71.46% macro 59.47%, test F1 micro 73.77% macro 60.3%
Learned weights:
  -0.05 -0.11 -0.12 -0.05 0.04 0.04 0.02 0.01 -0.05 0.03 ...
  0.05 -0.01 0.09 -0.05 -0.06 -0.08 -0.05 0.02 0.03 0.00 ...
  0.10 0.16 0.08 0.01 -0.02 -0.05 -0.11 -0.09 -0.04 0.05 ...
  0.03 0.00 -0.06 -0.01 0.01 0.06 0.10 0.08 0.12 0.01 ...
  -0.03 -0.02 -0.08 -0.05 -0.07 -0.05 0.06 -0.03 -0.09 -0.09 ...

python3 multilabel_classification_sgd.py --batch_size=10 --epochs=2 --classes=10

After epoch 1: train F1 micro 20.14% macro 9.95%, test F1 micro 21.57% macro 10.4%
After epoch 2: train F1 micro 11.29% macro 7.35%, test F1 micro 14.45% macro 8.8%
Learned weights:
  -0.04 -0.09 -0.01 -0.01 -0.09 0.02 0.01 0.04 0.02 -0.11 ...
  0.12 0.07 -0.09 -0.07 0.04 0.02 -0.06 -0.03 0.03 0.05 ...
  0.05 0.03 -0.11 -0.13 -0.09 0.08 0.02 -0.14 -0.01 -0.00 ...
  -0.03 -0.07 0.00 0.09 0.08 0.01 -0.01 -0.04 -0.08 -0.02 ...
  -0.11 -0.11 -0.04 0.04 -0.11 -0.03 -0.08 -0.03 -0.07 0.03 ...
  -0.11 -0.07 0.04 0.04 -0.00 0.04 0.00 -0.03 -0.06 -0.05 ...
  -0.14 -0.08 -0.12 -0.09 -0.11 -0.15 -0.09 -0.01 0.01 -0.05 ...
  0.04 0.00 -0.08 -0.10 -0.06 -0.04 -0.01 -0.10 -0.00 0.02 ...
  0.03 0.01 0.04 0.03 -0.06 -0.10 -0.09 0.04 0.02 -0.10 ...
  0.04 -0.06 -0.07 -0.03 -0.09 0.04 0.05 -0.09 -0.04 -0.10 ...

python3 multilabel_classification_sgd.py --batch_size=5 --epochs=2 --classes=5 --learning_rate=0.02

After epoch 1: train F1 micro 60.66% macro 47.96%, test F1 micro 60.82% macro 46.6%
After epoch 2: train F1 micro 79.28% macro 77.99%, test F1 micro 77.65% macro 71.1%
Learned weights:
  -0.08 -0.15 -0.14 -0.01 0.09 0.03 0.04 -0.08 0.03 0.08 ...
  -0.06 0.09 0.04 -0.06 -0.08 -0.13 -0.06 0.11 0.07 0.01 ...
  0.21 0.28 0.12 0.03 0.02 -0.16 -0.16 -0.14 0.06 0.13 ...
  0.07 -0.00 -0.04 0.00 0.12 0.13 0.11 0.19 0.21 0.03 ...
  0.07 -0.10 -0.10 -0.04 -0.19 0.05 0.01 -0.03 -0.15 -0.10 ...

k_nearest_neighbors

Deadline: ~~Nov 14~~ Nov 21, 7:59 a.m. 3 points

Starting with the k_nearest_neighbors.py template, implement the k-nearest neighbors algorithm for classifying MNIST, without using the sklearn.neighbors module or the scipy.spatial module in any way.
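A NumPy-only sketch of k-NN classification with an $L_p$ metric and the three weighting schemes named by the --weights option could look as follows. The exact definitions used here (inverse weights as 1/distance, softmax weights as a softmax of negative distances) and the function name are assumptions of this sketch, and the inverse variant assumes that no test example coincides exactly with a training example.

```python
import numpy as np


def knn_predict(train_data, train_target, test_data, k, p=2, weights="uniform"):
    classes = int(train_target.max()) + 1
    predictions = np.zeros(len(test_data), dtype=int)
    # Looping over test examples keeps memory use at one [train, features] array.
    for i, example in enumerate(test_data):
        # L_p distance from the test example to every training example.
        distances = np.sum(np.abs(train_data - example) ** p, axis=1) ** (1 / p)
        nearest = np.argsort(distances)[:k]
        nearest_distances = distances[nearest]

        if weights == "uniform":
            votes = np.ones(len(nearest))
        elif weights == "inverse":
            # Assumption: distances are nonzero, otherwise this divides by zero.
            votes = 1 / nearest_distances
        elif weights == "softmax":
            # Softmax of negative distances, shifted for numerical stability.
            exponentials = np.exp(-(nearest_distances - nearest_distances.min()))
            votes = exponentials / exponentials.sum()
        else:
            raise ValueError("Unknown weights: {}".format(weights))

        # Sum the votes per class and pick the class with the largest total.
        scores = np.bincount(train_target[nearest], weights=votes, minlength=classes)
        predictions[i] = np.argmax(scores)
    return predictions


train_data = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
train_target = np.array([0, 0, 1, 1])
test_data = np.array([[0.2, 0.1], [5.1, 5.4]])
for scheme in ["uniform", "inverse", "softmax"]:
    print(scheme, knn_predict(train_data, train_target, test_data, k=3, weights=scheme))
```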

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 k_nearest_neighbors.py --k=1 --p=2 --weights=uniform --test_size=500 --train_size=100

K-nn accuracy for 1 nearest neighbors, L_2 metric, uniform weights: 73.60%

python3 k_nearest_neighbors.py --k=3 --p=2 --weights=uniform --test_size=500 --train_size=100

K-nn accuracy for 3 nearest neighbors, L_2 metric, uniform weights: 66.80%

python3 k_nearest_neighbors.py --k=1 --p=2 --weights=uniform --test_size=500 --train_size=1000

K-nn accuracy for 1 nearest neighbors, L_2 metric, uniform weights: 90.40%

python3 k_nearest_neighbors.py --k=5 --p=2 --weights=uniform --test_size=500 --train_size=1000

K-nn accuracy for 5 nearest neighbors, L_2 metric, uniform weights: 88.40%

python3 k_nearest_neighbors.py --k=5 --p=1 --weights=uniform --test_size=500 --train_size=1000

K-nn accuracy for 5 nearest neighbors, L_1 metric, uniform weights: 87.00%

python3 k_nearest_neighbors.py --k=5 --p=3 --weights=uniform --test_size=500 --train_size=1000

K-nn accuracy for 5 nearest neighbors, L_3 metric, uniform weights: 89.40%

python3 k_nearest_neighbors.py --k=1 --p=2 --weights=uniform --test_size=500 --train_size=5000

K-nn accuracy for 1 nearest neighbors, L_2 metric, uniform weights: 94.40%

python3 k_nearest_neighbors.py --k=9 --p=2 --weights=uniform --test_size=500 --train_size=5000

K-nn accuracy for 9 nearest neighbors, L_2 metric, uniform weights: 92.80%

python3 k_nearest_neighbors.py --k=9 --p=2 --weights=inverse --test_size=500 --train_size=5000

K-nn accuracy for 9 nearest neighbors, L_2 metric, inverse weights: 93.00%

python3 k_nearest_neighbors.py --k=9 --p=2 --weights=softmax --test_size=500 --train_size=5000

K-nn accuracy for 9 nearest neighbors, L_2 metric, softmax weights: 94.00%

diacritization

Deadline: Nov 14, 7:59 a.m. 5 points+5 bonus

The goal of the diacritization competition task is to learn to add diacritics to the given Czech text. We will use a small collection of fiction books, which is available under the CC BY-NC-SA license. Note that these texts are the only allowed training data; you cannot use any other Czech texts (even manually annotated) to train or evaluate your model. At test time, you will be given a text without diacritics and you should return it including diacritical marks. To be explicit, we only consider diacritized letters áčďéěíňóřšťúůýž and their uppercase variants.

The diacritization.py template shows how to load the training data, downloading it if needed.

Each sentence in the data is stored on a single line, with exactly one space character separating input words. The performance of your system is measured using word accuracy (the percentage of words you diacritized correctly, as computed by the diacritization_eval.py script) and your goal is to achieve at least 86.5%. You can use any sklearn algorithm with the exception of decision trees to solve this assignment (so no random forests, extra trees, gradient boosting, AdaBoost with decision trees, …).
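For intuition about the word accuracy metric, a simplified sketch is given here. It is not the diacritization_eval.py script, whose exact behavior may differ. The function name word_accuracy and the whitespace-based tokenization are assumptions.

```python
def word_accuracy(gold, system):
    # Split on whitespace; sentences are on separate lines and words are
    # separated by single spaces, so both texts must yield equally many words.
    gold_words = gold.split()
    system_words = system.split()
    if len(gold_words) != len(system_words):
        raise ValueError("The gold and system texts have different numbers of words")
    if not gold_words:
        raise ValueError("The texts contain no words")
    # A word counts as correct only if it matches the gold word exactly.
    correct = sum(g == s for g, s in zip(gold_words, system_words))
    return correct / len(gold_words)


gold = "Příliš žluťoučký kůň"
system = "Příliš zlutoucky kůň"
print("Word accuracy: {:.2f}%".format(100 * word_accuracy(gold, system)))
```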

kernel_linear_regression

Deadline: Nov 21, 7:59 a.m. 5 points

Starting with the kernel_linear_regression.py template, implement kernel linear regression training using SGD on the dual formulation. You should support polynomial and Gaussian kernels as well as L2 regularization.
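A sketch of SGD in the dual formulation follows, under several assumptions that are not stated in the task text: the prediction is $y(x) = \sum_i \beta_i K(x, x_i) + b$, the polynomial kernel is $(\gamma x^T z + 1)^d$, the Gaussian kernel is $\exp(-\gamma \|x - z\|^2)$, the bias is also updated by SGD, and all function names are hypothetical. For a minibatch $B$, only the $\beta_i$ with $i \in B$ receive the error term, while L2 regularization shrinks every $\beta_i$.

```python
import numpy as np


def kernel_matrix(x, z, kernel="rbf", degree=3, gamma=1.0):
    # Returns the matrix of kernel values K(x_i, z_j).
    if kernel == "poly":
        return (gamma * x @ z.T + 1) ** degree
    if kernel == "rbf":
        squared = np.sum((x[:, np.newaxis, :] - z[np.newaxis, :, :]) ** 2, axis=2)
        return np.exp(-gamma * squared)
    raise ValueError("Unknown kernel: {}".format(kernel))


def train_kernel_regression(data, target, kernel="rbf", degree=3, gamma=1.0,
                            epochs=100, batch_size=5, learning_rate=0.1,
                            l2=0.0, seed=42):
    generator = np.random.RandomState(seed)
    betas, bias = np.zeros(len(data)), 0.0
    kernels = kernel_matrix(data, data, kernel, degree, gamma)
    for _ in range(epochs):
        permutation = generator.permutation(len(data))
        for start in range(0, len(permutation), batch_size):
            batch = permutation[start:start + batch_size]
            # Prediction errors of the current model on the minibatch.
            errors = kernels[batch] @ betas + bias - target[batch]
            # L2 regularization acts on all betas, the error term only on the batch.
            betas *= 1 - learning_rate * l2
            betas[batch] -= learning_rate / len(batch) * errors
            bias -= learning_rate * errors.mean()
    return betas, bias


def predict(train_data, betas, bias, inputs, kernel="rbf", degree=3, gamma=1.0):
    return kernel_matrix(inputs, train_data, kernel, degree, gamma) @ betas + bias


# Toy data: one period of a noiseless sine (illustrative only).
data = np.linspace(0, 2 * np.pi, 40).reshape(-1, 1)
target = np.sin(data[:, 0])
betas, bias = train_kernel_regression(data, target, kernel="rbf")
rmse = np.sqrt(np.mean((predict(data, betas, bias, data) - target) ** 2))
print("Train RMSE {:.2f}".format(rmse))
```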

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 kernel_linear_regression.py --batch_size=5 --kernel=poly --kernel_degree=3 --learning_rate=0.1

After epoch 10: train RMSE 0.69, test RMSE 0.61
After epoch 20: train RMSE 0.61, test RMSE 0.65
After epoch 30: train RMSE 0.56, test RMSE 0.72
After epoch 40: train RMSE 0.53, test RMSE 0.77
After epoch 50: train RMSE 0.51, test RMSE 0.84
After epoch 60: train RMSE 0.49, test RMSE 0.91
After epoch 70: train RMSE 0.48, test RMSE 0.98
After epoch 80: train RMSE 0.48, test RMSE 1.01
After epoch 90: train RMSE 0.47, test RMSE 1.04
After epoch 100: train RMSE 0.47, test RMSE 1.05
After epoch 110: train RMSE 0.48, test RMSE 1.09
After epoch 120: train RMSE 0.47, test RMSE 1.12
After epoch 130: train RMSE 0.47, test RMSE 1.12
After epoch 140: train RMSE 0.47, test RMSE 1.13
After epoch 150: train RMSE 0.47, test RMSE 1.13
After epoch 160: train RMSE 0.47, test RMSE 1.10
After epoch 170: train RMSE 0.47, test RMSE 1.13
After epoch 180: train RMSE 0.47, test RMSE 1.16
After epoch 190: train RMSE 0.47, test RMSE 1.14
After epoch 200: train RMSE 0.47, test RMSE 1.14
Learned betas -2.28 -1.44 0.55 2.41 1.17 1.48 3.39 2.52 0.96 1.62 0.11 -0.37 -0.20 -2.91 -3.15 ...
Learned bias 0.44076460113546156

python3 kernel_linear_regression.py --batch_size=1 --kernel=poly --kernel_degree=5 --learning_rate=0.05

After epoch 10: train RMSE 0.63, test RMSE 1.61
After epoch 20: train RMSE 0.54, test RMSE 0.90
After epoch 30: train RMSE 0.49, test RMSE 1.11
After epoch 40: train RMSE 0.46, test RMSE 1.01
After epoch 50: train RMSE 0.47, test RMSE 0.72
After epoch 60: train RMSE 0.44, test RMSE 0.89
After epoch 70: train RMSE 0.46, test RMSE 1.03
After epoch 80: train RMSE 0.41, test RMSE 0.86
After epoch 90: train RMSE 0.44, test RMSE 0.65
After epoch 100: train RMSE 0.51, test RMSE 0.39
After epoch 110: train RMSE 0.39, test RMSE 0.61
After epoch 120: train RMSE 0.43, test RMSE 0.78
After epoch 130: train RMSE 0.36, test RMSE 0.54
After epoch 140: train RMSE 0.36, test RMSE 0.52
After epoch 150: train RMSE 0.40, test RMSE 0.51
After epoch 160: train RMSE 0.36, test RMSE 0.51
After epoch 170: train RMSE 0.34, test RMSE 0.29
After epoch 180: train RMSE 0.31, test RMSE 0.28
After epoch 190: train RMSE 0.31, test RMSE 0.25
After epoch 200: train RMSE 0.38, test RMSE 0.34
Learned betas -5.90 -4.93 0.06 4.67 1.48 2.01 7.45 5.76 2.26 4.14 0.74 0.20 1.17 -5.26 -5.86 ...
Learned bias 0.48895858087675187

python3 kernel_linear_regression.py --batch_size=1 --kernel=poly --kernel_degree=5 --learning_rate=0.05 --kernel_gamma=0.15

After epoch 10: train RMSE 0.80, test RMSE 0.66
After epoch 20: train RMSE 0.77, test RMSE 0.65
After epoch 30: train RMSE 0.76, test RMSE 0.63
After epoch 40: train RMSE 0.77, test RMSE 0.66
After epoch 50: train RMSE 0.75, test RMSE 0.63
After epoch 60: train RMSE 0.74, test RMSE 0.62
After epoch 70: train RMSE 0.72, test RMSE 0.61
After epoch 80: train RMSE 0.71, test RMSE 0.60
After epoch 90: train RMSE 0.72, test RMSE 0.63
After epoch 100: train RMSE 0.69, test RMSE 0.60
After epoch 110: train RMSE 0.73, test RMSE 0.67
After epoch 120: train RMSE 0.72, test RMSE 0.65
After epoch 130: train RMSE 0.70, test RMSE 0.62
After epoch 140: train RMSE 0.66, test RMSE 0.60
After epoch 150: train RMSE 0.67, test RMSE 0.62
After epoch 160: train RMSE 0.66, test RMSE 0.61
After epoch 170: train RMSE 0.64, test RMSE 0.62
After epoch 180: train RMSE 0.64, test RMSE 0.61
After epoch 190: train RMSE 0.63, test RMSE 0.62
After epoch 200: train RMSE 0.64, test RMSE 0.65
Learned betas 3.77 3.44 6.39 9.15 4.53 4.08 7.47 4.30 -0.32 0.70 -3.85 -5.11 -4.98 -11.87 -12.67 ...
Learned bias 0.3756022427815734

python3 kernel_linear_regression.py --batch_size=1 --kernel=poly --kernel_degree=5 --learning_rate=0.05 --l2=0.02

After epoch 10: train RMSE 0.63, test RMSE 1.52
After epoch 20: train RMSE 0.56, test RMSE 0.88
After epoch 30: train RMSE 0.51, test RMSE 1.11
After epoch 40: train RMSE 0.50, test RMSE 1.05
After epoch 50: train RMSE 0.50, test RMSE 0.85
After epoch 60: train RMSE 0.48, test RMSE 1.03
After epoch 70: train RMSE 0.52, test RMSE 1.28
After epoch 80: train RMSE 0.49, test RMSE 1.17
After epoch 90: train RMSE 0.50, test RMSE 0.95
After epoch 100: train RMSE 0.59, test RMSE 0.66
After epoch 110: train RMSE 0.53, test RMSE 1.07
After epoch 120: train RMSE 0.56, test RMSE 1.30
After epoch 130: train RMSE 0.50, test RMSE 1.13
After epoch 140: train RMSE 0.49, test RMSE 1.08
After epoch 150: train RMSE 0.57, test RMSE 1.10
After epoch 160: train RMSE 0.46, test RMSE 0.96
After epoch 170: train RMSE 0.50, test RMSE 0.96
After epoch 180: train RMSE 0.47, test RMSE 1.03
After epoch 190: train RMSE 0.47, test RMSE 1.04
After epoch 200: train RMSE 0.56, test RMSE 0.96
Learned betas -0.65 -0.47 -0.10 0.55 0.39 0.34 0.87 0.60 0.26 0.44 0.05 -0.00 0.05 -0.68 -0.76 ...
Learned bias 0.9258410067869733

python3 kernel_linear_regression.py --batch_size=1 --kernel=rbf

After epoch 10: train RMSE 0.78, test RMSE 0.66
After epoch 20: train RMSE 0.74, test RMSE 0.61
After epoch 30: train RMSE 0.71, test RMSE 0.58
After epoch 40: train RMSE 0.67, test RMSE 0.54
After epoch 50: train RMSE 0.64, test RMSE 0.52
After epoch 60: train RMSE 0.62, test RMSE 0.50
After epoch 70: train RMSE 0.59, test RMSE 0.48
After epoch 80: train RMSE 0.57, test RMSE 0.47
After epoch 90: train RMSE 0.55, test RMSE 0.46
After epoch 100: train RMSE 0.53, test RMSE 0.45
After epoch 110: train RMSE 0.51, test RMSE 0.45
After epoch 120: train RMSE 0.49, test RMSE 0.45
After epoch 130: train RMSE 0.48, test RMSE 0.45
After epoch 140: train RMSE 0.46, test RMSE 0.46
After epoch 150: train RMSE 0.45, test RMSE 0.46
After epoch 160: train RMSE 0.44, test RMSE 0.47
After epoch 170: train RMSE 0.43, test RMSE 0.48
After epoch 180: train RMSE 0.42, test RMSE 0.49
After epoch 190: train RMSE 0.41, test RMSE 0.50
After epoch 200: train RMSE 0.40, test RMSE 0.50
Learned betas 0.65 0.59 1.17 1.72 0.86 0.82 1.61 1.04 0.21 0.47 -0.31 -0.56 -0.46 -1.77 -1.88 ...
Learned bias 0.6512539914766637

python3 kernel_linear_regression.py --batch_size=1 --kernel=rbf --kernel_gamma=0.5

After epoch 10: train RMSE 0.81, test RMSE 0.69
After epoch 20: train RMSE 0.80, test RMSE 0.67
After epoch 30: train RMSE 0.79, test RMSE 0.66
After epoch 40: train RMSE 0.78, test RMSE 0.65
After epoch 50: train RMSE 0.77, test RMSE 0.64
After epoch 60: train RMSE 0.77, test RMSE 0.64
After epoch 70: train RMSE 0.76, test RMSE 0.63
After epoch 80: train RMSE 0.75, test RMSE 0.62
After epoch 90: train RMSE 0.74, test RMSE 0.61
After epoch 100: train RMSE 0.74, test RMSE 0.61
After epoch 110: train RMSE 0.73, test RMSE 0.60
After epoch 120: train RMSE 0.72, test RMSE 0.60
After epoch 130: train RMSE 0.72, test RMSE 0.59
After epoch 140: train RMSE 0.71, test RMSE 0.58
After epoch 150: train RMSE 0.70, test RMSE 0.58
After epoch 160: train RMSE 0.70, test RMSE 0.57
After epoch 170: train RMSE 0.69, test RMSE 0.57
After epoch 180: train RMSE 0.69, test RMSE 0.56
After epoch 190: train RMSE 0.68, test RMSE 0.56
After epoch 200: train RMSE 0.68, test RMSE 0.56
Learned betas 1.45 1.28 1.74 2.17 1.18 1.01 1.67 0.98 0.03 0.19 -0.69 -1.02 -0.99 -2.36 -2.50 ...
Learned bias 0.6326715226190537

python3 kernel_linear_regression.py --batch_size=2 --kernel=rbf --kernel_gamma=5

After epoch 10: train RMSE 0.65, test RMSE 0.55
After epoch 20: train RMSE 0.52, test RMSE 0.40
After epoch 30: train RMSE 0.43, test RMSE 0.30
After epoch 40: train RMSE 0.36, test RMSE 0.22
After epoch 50: train RMSE 0.31, test RMSE 0.17
After epoch 60: train RMSE 0.27, test RMSE 0.15
After epoch 70: train RMSE 0.25, test RMSE 0.13
After epoch 80: train RMSE 0.24, test RMSE 0.13
After epoch 90: train RMSE 0.23, test RMSE 0.14
After epoch 100: train RMSE 0.22, test RMSE 0.15
After epoch 110: train RMSE 0.22, test RMSE 0.15
After epoch 120: train RMSE 0.22, test RMSE 0.16
After epoch 130: train RMSE 0.21, test RMSE 0.16
After epoch 140: train RMSE 0.21, test RMSE 0.17
After epoch 150: train RMSE 0.21, test RMSE 0.17
After epoch 160: train RMSE 0.21, test RMSE 0.17
After epoch 170: train RMSE 0.21, test RMSE 0.17
After epoch 180: train RMSE 0.21, test RMSE 0.17
After epoch 190: train RMSE 0.21, test RMSE 0.18
After epoch 200: train RMSE 0.21, test RMSE 0.18
Learned betas 0.21 0.08 0.29 0.51 0.06 0.05 0.49 0.27 -0.06 0.18 -0.09 -0.10 0.06 -0.49 -0.45 ...
Learned bias 0.7290386122306763

python3 kernel_linear_regression.py --batch_size=1 --kernel=rbf --kernel_gamma=50

After epoch 10: train RMSE 0.52, test RMSE 0.44
After epoch 20: train RMSE 0.36, test RMSE 0.29
After epoch 30: train RMSE 0.27, test RMSE 0.21
After epoch 40: train RMSE 0.23, test RMSE 0.18
After epoch 50: train RMSE 0.21, test RMSE 0.17
After epoch 60: train RMSE 0.20, test RMSE 0.17
After epoch 70: train RMSE 0.20, test RMSE 0.16
After epoch 80: train RMSE 0.20, test RMSE 0.16
After epoch 90: train RMSE 0.20, test RMSE 0.16
After epoch 100: train RMSE 0.19, test RMSE 0.16
After epoch 110: train RMSE 0.19, test RMSE 0.16
After epoch 120: train RMSE 0.19, test RMSE 0.16
After epoch 130: train RMSE 0.19, test RMSE 0.16
After epoch 140: train RMSE 0.19, test RMSE 0.16
After epoch 150: train RMSE 0.19, test RMSE 0.16
After epoch 160: train RMSE 0.19, test RMSE 0.16
After epoch 170: train RMSE 0.19, test RMSE 0.16
After epoch 180: train RMSE 0.19, test RMSE 0.16
After epoch 190: train RMSE 0.19, test RMSE 0.16
After epoch 200: train RMSE 0.19, test RMSE 0.16
Learned betas 0.61 0.03 0.28 0.67 -0.21 -0.21 0.69 0.28 -0.32 0.25 -0.17 -0.06 0.41 -0.59 -0.48 ...
Learned bias 0.8351544798239042

python3 kernel_linear_regression.py --batch_size=1 --kernel=rbf --kernel_gamma=50 --l2=0.02

After epoch 10: train RMSE 0.54, test RMSE 0.45
After epoch 20: train RMSE 0.39, test RMSE 0.32
After epoch 30: train RMSE 0.32, test RMSE 0.25
After epoch 40: train RMSE 0.28, test RMSE 0.22
After epoch 50: train RMSE 0.26, test RMSE 0.20
After epoch 60: train RMSE 0.25, test RMSE 0.19
After epoch 70: train RMSE 0.25, test RMSE 0.18
After epoch 80: train RMSE 0.24, test RMSE 0.18
After epoch 90: train RMSE 0.24, test RMSE 0.18
After epoch 100: train RMSE 0.24, test RMSE 0.18
After epoch 110: train RMSE 0.24, test RMSE 0.18
After epoch 120: train RMSE 0.24, test RMSE 0.18
After epoch 130: train RMSE 0.24, test RMSE 0.17
After epoch 140: train RMSE 0.24, test RMSE 0.17
After epoch 150: train RMSE 0.24, test RMSE 0.17
After epoch 160: train RMSE 0.24, test RMSE 0.17
After epoch 170: train RMSE 0.24, test RMSE 0.17
After epoch 180: train RMSE 0.24, test RMSE 0.17
After epoch 190: train RMSE 0.24, test RMSE 0.17
After epoch 200: train RMSE 0.24, test RMSE 0.17
Learned betas 0.35 0.11 0.22 0.38 -0.02 -0.03 0.35 0.16 -0.11 0.12 -0.09 -0.06 0.12 -0.34 -0.30 ...
Learned bias 0.9187854392321663

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 kernel_linear_regression.py --epochs=20 --batch_size=5 --kernel=poly --kernel_degree=3 --learning_rate=0.1

After epoch 10: train RMSE 0.69, test RMSE 0.61
After epoch 20: train RMSE 0.61, test RMSE 0.65
Learned betas 0.11 0.11 0.25 0.37 0.17 0.16 0.29 0.17 -0.02 0.02 -0.14 -0.20 -0.18 -0.45 -0.49 ...
Learned bias 0.4388019915399849

python3 kernel_linear_regression.py --epochs=20 --batch_size=1 --kernel=poly --kernel_degree=5 --learning_rate=0.05

After epoch 10: train RMSE 0.63, test RMSE 1.61
After epoch 20: train RMSE 0.54, test RMSE 0.90
Learned betas -0.81 -0.58 0.20 0.62 0.23 0.44 0.78 0.71 0.27 0.40 -0.00 0.06 -0.03 -0.63 -0.73 ...
Learned bias 0.44409714525206545

python3 kernel_linear_regression.py --epochs=20 --batch_size=1 --kernel=poly --kernel_degree=5 --learning_rate=0.05 --kernel_gamma=0.15

After epoch 10: train RMSE 0.80, test RMSE 0.66
After epoch 20: train RMSE 0.77, test RMSE 0.65
Learned betas 0.71 0.61 0.87 1.13 0.57 0.45 0.71 0.40 -0.09 -0.04 -0.50 -0.62 -0.64 -1.23 -1.38 ...
Learned bias 0.39859934926020046

python3 kernel_linear_regression.py --epochs=20 --batch_size=1 --kernel=poly --kernel_degree=5 --learning_rate=0.05 --l2=0.02

After epoch 10: train RMSE 0.63, test RMSE 1.52
After epoch 20: train RMSE 0.56, test RMSE 0.88
Learned betas -0.48 -0.32 0.13 0.38 0.13 0.28 0.49 0.44 0.16 0.24 -0.02 0.03 -0.04 -0.42 -0.47 ...
Learned bias 0.6096489059733282

python3 kernel_linear_regression.py --epochs=20 --batch_size=1 --kernel=rbf

After epoch 10: train RMSE 0.78, test RMSE 0.66
After epoch 20: train RMSE 0.74, test RMSE 0.61
Learned betas 0.21 0.19 0.24 0.27 0.17 0.14 0.20 0.13 0.03 0.04 -0.05 -0.08 -0.08 -0.22 -0.24 ...
Learned bias 0.6111050342939267

python3 kernel_linear_regression.py --epochs=20 --batch_size=1 --kernel=rbf --kernel_gamma=0.5

After epoch 10: train RMSE 0.81, test RMSE 0.69
After epoch 20: train RMSE 0.80, test RMSE 0.67
Learned betas 0.22 0.20 0.24 0.27 0.17 0.14 0.20 0.13 0.02 0.03 -0.06 -0.10 -0.09 -0.23 -0.25 ...
Learned bias 0.5619981553157737

python3 kernel_linear_regression.py --epochs=20 --batch_size=2 --kernel=rbf --kernel_gamma=5

After epoch 10: train RMSE 0.65, test RMSE 0.55
After epoch 20: train RMSE 0.52, test RMSE 0.40
Learned betas 0.11 0.10 0.12 0.13 0.08 0.07 0.11 0.07 0.03 0.03 -0.01 -0.02 -0.02 -0.08 -0.09 ...
Learned bias 0.7126228629963139

python3 kernel_linear_regression.py --epochs=20 --batch_size=1 --kernel=rbf --kernel_gamma=50

After epoch 10: train RMSE 0.52, test RMSE 0.44
After epoch 20: train RMSE 0.36, test RMSE 0.29
Learned betas 0.19 0.15 0.18 0.21 0.11 0.09 0.16 0.10 0.02 0.05 -0.02 -0.04 -0.02 -0.14 -0.15 ...
Learned bias 0.843378438496468

python3 kernel_linear_regression.py --epochs=20 --batch_size=1 --kernel=rbf --kernel_gamma=50 --l2=0.02

After epoch 10: train RMSE 0.54, test RMSE 0.45
After epoch 20: train RMSE 0.39, test RMSE 0.32
Learned betas 0.17 0.14 0.16 0.19 0.10 0.08 0.15 0.10 0.02 0.05 -0.02 -0.04 -0.02 -0.13 -0.14 ...
Learned bias 0.8566622017610405

diacritization_dictionary

Deadline: Nov 21, 7:59 a.m. 4 points+4 bonus

The diacritization_dictionary task extends the diacritization competition. In addition to the original training data, you can also use a dictionary providing all known diacritized variants of word forms present in the training and testing data, again available under the CC BY-NC-SA license. The dictionary is not guaranteed to contain all words from the training and testing data, but if it contains a word, you can rely on all valid Czech diacritization variants being present.

The rules of the competition are the same as those of the diacritization competition, except that

you can utilize the dictionary, both during training and inference;
in order to pass, you need to achieve at least 95% word accuracy.

The diacritization_dictionary.py module provides a Dictionary class, which loads the dictionary (downloading it if necessary) and exposes it in the Dictionary.variants field as a mapping from an undiacritized word form to a list of its known diacritized variants.

Note that the fiction-dictionary.txt file is available in ReCodEx during evaluation.
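
One possible way to use such a mapping is to snap a model prediction to a variant the dictionary allows. The following is only a minimal sketch: the `variants` dict is a hypothetical stand-in for Dictionary.variants, and `restrict_prediction` is an illustrative helper, not part of the provided module.

```python
# Hypothetical stand-in for Dictionary.variants: a plain dict mapping an
# undiacritized word form to the list of its known diacritized variants.
variants = {
    "ze": ["ze", "že"],
    "zivot": ["život"],
}


def restrict_prediction(word, predicted, variants):
    """Snap a model prediction for `word` to a variant allowed by the dictionary."""
    if word not in variants:
        # Unknown word: the dictionary gives no guidance, keep the model output.
        return predicted
    options = variants[word]
    if predicted in options:
        return predicted
    # Otherwise choose the variant agreeing with the prediction in most positions.
    return max(options, key=lambda v: sum(a == b for a, b in zip(v, predicted)))


# The only allowed variant of "zivot" replaces the invalid prediction.
print(restrict_prediction("zivot", "zívot", variants))
```

Whether the dictionary keys are case-sensitive is not stated here, so any case handling would need to be checked against the actual data.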

smo_algorithm

Deadline: Nov 28, 7:59 a.m. 7 points

Using the smo_algorithm.py template, implement the SMO algorithm for binary classification using the dual formulation of the soft-margin SVM. The template contains more detailed instructions.
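
As a rough illustration of the dual formulation, the sketch below evaluates the decision function $y(x) = \sum_i a_i t_i K(x_i, x) + b$. The kernel forms (polynomial $(\gamma x^T z + 1)^d$ and RBF $\exp(-\gamma \|x - z\|^2)$) and all function names are assumptions made for this example; the template defines the authoritative versions, and the SMO updates of $a$ and $b$ themselves are not shown.

```python
import numpy as np


def kernel(x, z, kind="rbf", degree=1, gamma=1.0):
    """Assumed kernel definitions; check the template for the exact forms."""
    if kind == "poly":
        return (gamma * (x @ z) + 1) ** degree
    if kind == "rbf":
        return np.exp(-gamma * np.sum((x - z) ** 2))
    raise ValueError(f"Unknown kernel {kind}")


def decision_value(x, train_data, train_target, a, b, **kernel_args):
    """Dual-form SVM output: sum_i a_i * t_i * K(x_i, x) + b."""
    value = b
    for x_i, t_i, a_i in zip(train_data, train_target, a):
        if a_i > 0:  # only support vectors contribute
            value += a_i * t_i * kernel(x_i, x, **kernel_args)
    return value
```

The predicted class is then the sign of the returned value, with targets assumed to be encoded as -1 and +1.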

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 smo_algorithm.py --kernel=poly --kernel_degree=1

Iteration 100, train acc 88.0%, test acc 83.0%
Done, iteration 140, support vectors 41, train acc 88.0%, test acc 83.0%

python3 smo_algorithm.py --kernel=poly --kernel_degree=3

Iteration 100, train acc 89.0%, test acc 88.0%
Iteration 200, train acc 91.0%, test acc 86.0%
Iteration 300, train acc 86.0%, test acc 77.0%
Iteration 400, train acc 91.0%, test acc 84.0%
Iteration 500, train acc 88.0%, test acc 86.0%
Iteration 600, train acc 91.0%, test acc 86.0%
Iteration 700, train acc 91.0%, test acc 86.0%
Iteration 800, train acc 90.0%, test acc 86.0%
Iteration 900, train acc 91.0%, test acc 86.0%
Done, iteration 1000, support vectors 39, train acc 91.0%, test acc 86.0%

python3 smo_algorithm.py --kernel=poly --kernel_degree=3 --C=5 --max_iterations=1500

Iteration 100, train acc 85.0%, test acc 82.0%
Iteration 200, train acc 83.0%, test acc 83.0%
Iteration 300, train acc 84.0%, test acc 85.0%
Iteration 400, train acc 63.0%, test acc 66.0%
Iteration 500, train acc 89.0%, test acc 89.0%
Iteration 600, train acc 91.0%, test acc 89.0%
Iteration 700, train acc 89.0%, test acc 90.0%
Iteration 800, train acc 89.0%, test acc 89.0%
Iteration 900, train acc 55.0%, test acc 60.0%
Iteration 1000, train acc 91.0%, test acc 88.0%
Iteration 1100, train acc 91.0%, test acc 89.0%
Iteration 1200, train acc 90.0%, test acc 90.0%
Iteration 1300, train acc 91.0%, test acc 89.0%
Iteration 1400, train acc 89.0%, test acc 88.0%
Done, iteration 1500, support vectors 40, train acc 89.0%, test acc 90.0%

python3 smo_algorithm.py --kernel=poly --kernel_degree=4 --kernel_gamma=0.6

Iteration 100, train acc 65.0%, test acc 67.0%
Iteration 200, train acc 80.0%, test acc 80.0%
Iteration 300, train acc 92.0%, test acc 84.0%
Iteration 400, train acc 93.0%, test acc 85.0%
Iteration 500, train acc 92.0%, test acc 83.0%
Iteration 600, train acc 92.0%, test acc 86.0%
Iteration 700, train acc 92.0%, test acc 85.0%
Iteration 800, train acc 92.0%, test acc 85.0%
Iteration 900, train acc 92.0%, test acc 84.0%
Done, iteration 1000, support vectors 35, train acc 93.0%, test acc 86.0%

python3 smo_algorithm.py --kernel=rbf --kernel_gamma=1

Iteration 100, train acc 92.0%, test acc 84.0%
Iteration 200, train acc 92.0%, test acc 84.0%
Iteration 300, train acc 92.0%, test acc 84.0%
Iteration 400, train acc 92.0%, test acc 84.0%
Done, iteration 483, support vectors 53, train acc 92.0%, test acc 84.0%

python3 smo_algorithm.py --kernel=rbf --kernel_gamma=0.1

Done, iteration 87, support vectors 51, train acc 88.0%, test acc 85.0%

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 smo_algorithm.py --max_iterations=20 --kernel=poly --kernel_degree=1

Done, iteration 20, support vectors 48, train acc 84.0%, test acc 86.0%

python3 smo_algorithm.py --max_iterations=20 --kernel=poly --kernel_degree=3

Done, iteration 20, support vectors 70, train acc 85.0%, test acc 84.0%

python3 smo_algorithm.py --max_iterations=20 --kernel=poly --kernel_degree=3 --C=5

Done, iteration 20, support vectors 78, train acc 87.0%, test acc 86.0%

python3 smo_algorithm.py --max_iterations=20 --kernel=poly --kernel_degree=4 --kernel_gamma=0.6

Done, iteration 20, support vectors 55, train acc 84.0%, test acc 86.0%

python3 smo_algorithm.py --max_iterations=20 --kernel=rbf --kernel_gamma=1

Done, iteration 20, support vectors 67, train acc 92.0%, test acc 84.0%

python3 smo_algorithm.py --max_iterations=20 --kernel=rbf --kernel_gamma=0.1

Done, iteration 20, support vectors 53, train acc 85.0%, test acc 84.0%

svm_multiclass

Deadline: Nov 28, 7:59 a.m. 3 points

Extend your solution to the smo_algorithm assignment to handle multiclass classification, using the svm_multiclass.py template.
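
The example outputs train one binary classifier per pair of classes, which suggests a one-versus-one scheme. A minimal sketch of the voting step is shown here; `binary_predict` is a hypothetical callable standing in for the trained pairwise SVMs, and the tie-breaking rule (lowest class index wins) is an assumption that the template may specify differently.

```python
import itertools

import numpy as np


def one_vs_one_predict(binary_predict, classes, data):
    """Combine pairwise classifiers by majority voting.

    `binary_predict(i, j, data)` is assumed to return one score per example,
    positive when the example looks like class i and non-positive for class j.
    """
    votes = np.zeros((len(data), classes), dtype=int)
    for i, j in itertools.combinations(range(classes), 2):
        scores = binary_predict(i, j, data)
        votes[:, i] += scores > 0
        votes[:, j] += scores <= 0
    # np.argmax returns the lowest index among tied classes.
    return np.argmax(votes, axis=1)
```

With K classes this trains and evaluates K(K-1)/2 classifiers, which matches the ten pairs listed for --classes=5.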

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 svm_multiclass.py --max_iterations=20 --classes=5 --kernel=poly --kernel_degree=2 --test_size=0.8

Training classes 0 and 1
Done, iteration 20, support vectors 14, train acc 100.0%, test acc 100.0%
Training classes 0 and 2
Done, iteration 20, support vectors 12, train acc 100.0%, test acc 99.7%
Training classes 0 and 3
Done, iteration 20, support vectors 13, train acc 100.0%, test acc 100.0%
Training classes 0 and 4
Done, iteration 20, support vectors 17, train acc 100.0%, test acc 100.0%
Training classes 1 and 2
Done, iteration 20, support vectors 22, train acc 100.0%, test acc 98.6%
Training classes 1 and 3
Done, iteration 20, support vectors 20, train acc 100.0%, test acc 99.7%
Training classes 1 and 4
Done, iteration 20, support vectors 21, train acc 100.0%, test acc 98.9%
Training classes 2 and 3
Done, iteration 20, support vectors 21, train acc 100.0%, test acc 98.3%
Training classes 2 and 4
Done, iteration 20, support vectors 18, train acc 100.0%, test acc 98.6%
Training classes 3 and 4
Done, iteration 20, support vectors 18, train acc 100.0%, test acc 99.0%
Test set accuracy: 98.06%

python3 svm_multiclass.py --max_iterations=20 --classes=5 --kernel=poly --kernel_degree=3 --test_size=0.8

Training classes 0 and 1
Done, iteration 20, support vectors 16, train acc 100.0%, test acc 100.0%
Training classes 0 and 2
Done, iteration 20, support vectors 14, train acc 100.0%, test acc 99.7%
Training classes 0 and 3
Done, iteration 20, support vectors 17, train acc 100.0%, test acc 100.0%
Training classes 0 and 4
Done, iteration 20, support vectors 20, train acc 100.0%, test acc 100.0%
Training classes 1 and 2
Done, iteration 20, support vectors 21, train acc 100.0%, test acc 98.2%
Training classes 1 and 3
Done, iteration 20, support vectors 24, train acc 100.0%, test acc 99.7%
Training classes 1 and 4
Done, iteration 20, support vectors 25, train acc 100.0%, test acc 98.9%
Training classes 2 and 3
Done, iteration 20, support vectors 18, train acc 100.0%, test acc 98.3%
Training classes 2 and 4
Done, iteration 20, support vectors 18, train acc 100.0%, test acc 98.3%
Training classes 3 and 4
Done, iteration 20, support vectors 21, train acc 100.0%, test acc 99.0%
Test set accuracy: 98.20%

python3 svm_multiclass.py --max_iterations=20 --classes=5 --kernel=poly --kernel_degree=3 --kernel_gamma=0.02 --test_size=0.8

Training classes 0 and 1
Done, iteration 20, support vectors 23, train acc 100.0%, test acc 100.0%
Training classes 0 and 2
Done, iteration 20, support vectors 17, train acc 100.0%, test acc 99.0%
Training classes 0 and 3
Done, iteration 20, support vectors 20, train acc 100.0%, test acc 100.0%
Training classes 0 and 4
Done, iteration 20, support vectors 24, train acc 98.6%, test acc 100.0%
Training classes 1 and 2
Done, iteration 20, support vectors 32, train acc 100.0%, test acc 97.5%
Training classes 1 and 3
Done, iteration 20, support vectors 29, train acc 100.0%, test acc 99.0%
Training classes 1 and 4
Done, iteration 20, support vectors 36, train acc 100.0%, test acc 98.9%
Training classes 2 and 3
Done, iteration 20, support vectors 32, train acc 100.0%, test acc 97.3%
Training classes 2 and 4
Done, iteration 20, support vectors 18, train acc 100.0%, test acc 99.0%
Training classes 3 and 4
Done, iteration 20, support vectors 21, train acc 100.0%, test acc 99.3%
Test set accuracy: 96.95%

python3 svm_multiclass.py --max_iterations=20 --classes=5 --kernel=rbf --kernel_gamma=1 --test_size=0.8

Training classes 0 and 1
Done, iteration 20, support vectors 69, train acc 100.0%, test acc 97.9%
Training classes 0 and 2
Done, iteration 20, support vectors 60, train acc 100.0%, test acc 99.3%
Training classes 0 and 3
Done, iteration 20, support vectors 60, train acc 100.0%, test acc 99.3%
Training classes 0 and 4
Done, iteration 20, support vectors 66, train acc 100.0%, test acc 95.5%
Training classes 1 and 2
Done, iteration 20, support vectors 74, train acc 100.0%, test acc 92.6%
Training classes 1 and 3
Done, iteration 20, support vectors 73, train acc 100.0%, test acc 97.2%
Training classes 1 and 4
Done, iteration 20, support vectors 80, train acc 100.0%, test acc 99.6%
Training classes 2 and 3
Done, iteration 20, support vectors 64, train acc 100.0%, test acc 99.3%
Training classes 2 and 4
Done, iteration 20, support vectors 71, train acc 100.0%, test acc 90.9%
Training classes 3 and 4
Done, iteration 20, support vectors 71, train acc 100.0%, test acc 94.5%
Test set accuracy: 92.23%

python3 svm_multiclass.py --max_iterations=20 --classes=5 --kernel=rbf --kernel_gamma=0.05 --C=3 --test_size=0.8

Training classes 0 and 1
Done, iteration 20, support vectors 17, train acc 100.0%, test acc 100.0%
Training classes 0 and 2
Done, iteration 20, support vectors 15, train acc 100.0%, test acc 99.7%
Training classes 0 and 3
Done, iteration 20, support vectors 15, train acc 100.0%, test acc 100.0%
Training classes 0 and 4
Done, iteration 20, support vectors 19, train acc 100.0%, test acc 100.0%
Training classes 1 and 2
Done, iteration 20, support vectors 24, train acc 100.0%, test acc 98.6%
Training classes 1 and 3
Done, iteration 20, support vectors 22, train acc 100.0%, test acc 99.3%
Training classes 1 and 4
Done, iteration 20, support vectors 28, train acc 100.0%, test acc 98.9%
Training classes 2 and 3
Done, iteration 20, support vectors 24, train acc 100.0%, test acc 98.6%
Training classes 2 and 4
Done, iteration 20, support vectors 17, train acc 100.0%, test acc 99.0%
Training classes 3 and 4
Done, iteration 20, support vectors 17, train acc 100.0%, test acc 99.0%
Test set accuracy: 98.20%

tf_idf

Deadline: Dec 5, 7:59 a.m. 3 points

Using the tf_idf.py template, perform classification of text documents from the 20 Newsgroups dataset. To represent the documents, use TF and/or IDF weights, which you implement manually (without using the sklearn.feature_extraction module in any way). Classify test set documents using the majority class of the $k$ most similar training documents (evaluated using cosine similarity) and report the macro F1-score; using the k-nearest neighbors classifier from sklearn is fine.
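
The following sketch shows one way the TF and IDF weights and the cosine similarity could be computed by hand. It is written under assumptions: documents are already tokenized into lists of terms, the vocabulary contains every training term (the example outputs suggest the template keeps only terms with at least two occurrences), and the IDF variant log(N / (df + 1)) is a guess that should be checked against the template.

```python
import numpy as np


def build_vocabulary(train_docs):
    """Map every term seen in the training documents to a column index."""
    terms = sorted({term for doc in train_docs for term in doc})
    return {term: index for index, term in enumerate(terms)}


def idf_weights(train_docs, vocabulary):
    """Assumed IDF variant: log(N / (document frequency + 1))."""
    document_frequency = np.zeros(len(vocabulary))
    for doc in train_docs:
        for term in set(doc):
            document_frequency[vocabulary[term]] += 1
    return np.log(len(train_docs) / (document_frequency + 1))


def featurize(docs, vocabulary, idf, use_tf=True, use_idf=True):
    """Term counts, turned into relative frequencies (TF) or binary indicators."""
    features = np.zeros((len(docs), len(vocabulary)))
    for row, doc in enumerate(docs):
        for term in doc:
            if term in vocabulary:
                features[row, vocabulary[term]] += 1
    if use_tf:
        features = features / np.maximum(features.sum(axis=1, keepdims=True), 1)
    else:
        features = (features > 0).astype(float)
    if use_idf:
        features = features * idf
    return features


def cosine_similarity(a, b):
    """Pairwise cosine similarity; all-zero rows are left as zeros."""
    def normalize(m):
        norms = np.linalg.norm(m, axis=1, keepdims=True)
        return m / np.where(norms == 0, 1, norms)
    return normalize(a) @ normalize(b).T


train_docs = [["apple", "banana", "banana"], ["car", "engine"], ["apple", "car"]]
vocabulary = build_vocabulary(train_docs)
idf = idf_weights(train_docs, vocabulary)
train_features = featurize(train_docs, vocabulary, idf)
test_features = featurize([["banana"]], vocabulary, idf)
similarities = cosine_similarity(test_features, train_features)
print(np.argmax(similarities, axis=1))  # index of the most similar training document
```

For real data, a sparse representation would be needed, because a dense matrix with tens of thousands of columns grows quickly.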

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 tf_idf.py --train_size=1000 --test_size=500 --k=1

Number of unique terms with at least two occurrences: 13120
F-1 score for TF=False, IDF=False, k=1: 25.7%

python3 tf_idf.py --train_size=1000 --test_size=500 --k=1 --tf

Number of unique terms with at least two occurrences: 13120
F-1 score for TF=True, IDF=False, k=1: 16.1%

python3 tf_idf.py --train_size=1000 --test_size=500 --k=1 --tf --idf

Number of unique terms with at least two occurrences: 13120
F-1 score for TF=True, IDF=True, k=1: 48.3%

python3 tf_idf.py --train_size=1000 --test_size=500 --k=1 --idf

Number of unique terms with at least two occurrences: 13120
F-1 score for TF=False, IDF=True, k=1: 46.5%

python3 tf_idf.py --train_size=1000 --test_size=500 --k=5 --idf

Number of unique terms with at least two occurrences: 13120
F-1 score for TF=False, IDF=True, k=5: 48.3%

python3 tf_idf.py --train_size=1000 --test_size=500 --k=15 --idf

Number of unique terms with at least two occurrences: 13120
F-1 score for TF=False, IDF=True, k=15: 49.8%

python3 tf_idf.py --train_size=2000 --test_size=500 --k=15 --idf

Number of unique terms with at least two occurrences: 20414
F-1 score for TF=False, IDF=True, k=15: 57.2%

python3 tf_idf.py --train_size=2000 --test_size=500 --k=15 --tf --idf

Number of unique terms with at least two occurrences: 20414
F-1 score for TF=True, IDF=True, k=15: 59.5%

naive_bayes

Deadline: Dec 5, 7:59 a.m. 3 points

Using the naive_bayes.py template, implement a naive Bayes classifier (without using the sklearn.naive_bayes module in any way). Support all of Gaussian NB, multinomial NB and Bernoulli NB.
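
To make the idea concrete, here is a minimal sketch of the Bernoulli variant only. It assumes the features are already binarized to 0/1 and that `alpha` acts as additive (Laplace) smoothing of the per-class feature probabilities; how the template binarizes inputs and uses `alpha` for the Gaussian variant is not shown here.

```python
import numpy as np


def fit_bernoulli_nb(data, target, classes, alpha=1.0):
    """Estimate log priors and smoothed feature probabilities per class."""
    log_prior = np.zeros(classes)
    prob = np.zeros((classes, data.shape[1]))
    for k in range(classes):
        rows = data[target == k]
        log_prior[k] = np.log(len(rows) / len(data))
        # Smoothed estimate of P(feature = 1 | class k).
        prob[k] = (rows.sum(axis=0) + alpha) / (len(rows) + 2 * alpha)
    return log_prior, prob


def predict_bernoulli_nb(data, log_prior, prob):
    """Pick the class maximizing log prior plus the Bernoulli log-likelihood."""
    log_likelihood = data @ np.log(prob).T + (1 - data) @ np.log(1 - prob).T
    return np.argmax(log_likelihood + log_prior, axis=1)
```

Working in log space avoids underflow when many feature probabilities are multiplied together.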

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 naive_bayes.py --classes=3 --naive_bayes_type=bernoulli --seed=72

Test accuracy 95.17%

python3 naive_bayes.py --classes=3 --naive_bayes_type=multinomial --seed=72

Test accuracy 93.68%

python3 naive_bayes.py --classes=3 --naive_bayes_type=gaussian --seed=72

Test accuracy 95.54%

python3 naive_bayes.py --classes=10 --naive_bayes_type=bernoulli --seed=72

Test accuracy 89.21%

python3 naive_bayes.py --classes=10 --naive_bayes_type=bernoulli --alpha=10 --seed=72

Test accuracy 88.54%

python3 naive_bayes.py --classes=10 --naive_bayes_type=multinomial --alpha=10 --seed=53

Test accuracy 90.77%

python3 naive_bayes.py --classes=10 --naive_bayes_type=gaussian --alpha=10 --seed=72

Test accuracy 92.10%

isnt_it_ironic

Deadline: Dec 5, 7:59 a.m. 4 points+4 bonus

The goal of the isnt_it_ironic competition task is to learn to classify a given text as ironic or not.

The isnt_it_ironic.py template shows how to load the training data, downloading it if needed. Please note that the data are provided only for the purpose of this class and you cannot use them in any other way.

Each instance is a string containing an English tweet. The texts have already been tokenized, and tokens are separated by exactly one space. The performance of your solution will be evaluated using the F1-score computed by sklearn.metrics.f1_score, and if you achieve at least 58.5%, you will obtain 4 points. Note that you can use any sklearn algorithm to solve this exercise (or anything you implement yourselves).

You might find TfidfTransformer or TfidfVectorizer useful.

metric_correlation

Deadline: Dec 12, 7:59 a.m. 3 points

Using the metric_correlation.py template, find a $\beta$ for which $F_\beta$ score correlates best with human ratings.

We use an artificial dataset, which for every sentence contains:

the number of edits that must be performed,
the number of edits proposed by a model,
the number of correct edits proposed by a model,
the human rating of the sentence.

Using bootstrap resampling, compute the mean human rating and the $F_\beta$ score for each sampled dataset. Then manually compute the Pearson correlation for betas between 0 and 2, and return the beta with the highest correlation.
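
The procedure above can be sketched as follows. This is an illustration under assumptions: the $F_\beta$ score of a sampled dataset is computed from the summed edit counts, using $F_\beta = (1 + \beta^2) \cdot \textit{correct} / (\beta^2 \cdot \textit{gold} + \textit{proposed})$, and the function names, the seed handling, and the grid of betas are hypothetical rather than taken from the template.

```python
import numpy as np


def f_beta(gold, proposed, correct, beta):
    """F_beta from edit counts; equals (1+b^2)PR / (b^2 P + R)."""
    return (1 + beta ** 2) * correct / (beta ** 2 * gold + proposed)


def pearson(x, y):
    """Pearson correlation computed manually from centered vectors."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))


def best_beta(gold, proposed, correct, rating, betas, samples, seed=0):
    rng = np.random.RandomState(seed)
    n = len(rating)
    # Each row holds the indices of one bootstrap sample (drawn with replacement).
    idx = rng.randint(n, size=(samples, n))
    mean_rating = rating[idx].mean(axis=1)
    g = gold[idx].sum(axis=1)
    p = proposed[idx].sum(axis=1)
    c = correct[idx].sum(axis=1)
    correlations = [pearson(f_beta(g, p, c, beta), mean_rating) for beta in betas]
    best = int(np.argmax(correlations))
    return betas[best], correlations[best]
```

Reusing the same bootstrap samples for every beta keeps the comparison between betas consistent.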

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 metric_correlation.py --bootstrap_samples=100 --data_size=1000

Best correlation of 0.711 was found for beta 0.79

python3 metric_correlation.py --bootstrap_samples=100 --data_size=2000

Best correlation of 0.726 was found for beta 0.63

python3 metric_correlation.py --bootstrap_samples=200 --data_size=2000

Best correlation of 0.676 was found for beta 0.61

decision_tree

Deadline: Dec 12, 7:59 a.m. 4 points

Starting with the decision_tree.py template, manually implement the construction of a classification decision tree, supporting both the gini and entropy criteria, and the max_depth, min_to_split and max_leaves constraints.
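
As an illustration of the two criteria, the sketch below evaluates them for a node and searches one feature for the best threshold. Scaling each criterion by the number of examples in the node and placing thresholds halfway between neighboring distinct values are assumptions of this example; the constraints (max_depth, min_to_split, max_leaves) are not handled here.

```python
import numpy as np


def gini(targets):
    """Gini index of a node, scaled by the node size (assumed convention)."""
    p = np.bincount(targets) / len(targets)
    return len(targets) * np.sum(p * (1 - p))


def entropy(targets):
    """Entropy of a node in nats, scaled by the node size (assumed convention)."""
    p = np.bincount(targets) / len(targets)
    p = p[p > 0]  # skip empty classes to avoid log(0)
    return -len(targets) * np.sum(p * np.log(p))


def best_split(feature, targets, criterion):
    """Return (threshold, criterion sum of both children) for one feature."""
    order = np.argsort(feature)
    feature, targets = feature[order], targets[order]
    best = (None, np.inf)
    for i in range(1, len(feature)):
        if feature[i] == feature[i - 1]:
            continue  # no threshold fits between equal values
        value = criterion(targets[:i]) + criterion(targets[i:])
        if value < best[1]:
            best = ((feature[i] + feature[i - 1]) / 2, value)
    return best
```

A full tree builder would call such a search for every feature and pick the split with the largest decrease of the criterion.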

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 decision_tree.py --dataset=digits --criterion=gini --min_to_split=250

Train accuracy: 60.7%
Test accuracy: 59.6%

python3 decision_tree.py --dataset=digits --criterion=gini --max_depth=3

Train accuracy: 41.1%
Test accuracy: 38.0%

python3 decision_tree.py --dataset=digits --criterion=gini --max_leaves=8

Train accuracy: 60.1%
Test accuracy: 57.1%

python3 decision_tree.py --dataset=digits --criterion=gini --min_to_split=220 --max_leaves=8

Train accuracy: 60.7%
Test accuracy: 59.6%

python3 decision_tree.py --dataset=digits --criterion=entropy --min_to_split=420

Train accuracy: 42.4%
Test accuracy: 40.2%

python3 decision_tree.py --dataset=breast_cancer --criterion=entropy --max_depth=3 --seed=44

Train accuracy: 94.8%
Test accuracy: 93.7%

python3 decision_tree.py --dataset=digits --criterion=entropy --max_leaves=7

Train accuracy: 53.2%
Test accuracy: 51.6%

python3 decision_tree.py --dataset=breast_cancer --criterion=entropy --min_to_split=55 --max_depth=3 --seed=44

Train accuracy: 94.4%
Test accuracy: 93.7%

random_forest

Deadline: Dec 12, 7:59 a.m. 3 points

Using the random_forest.py template, train a random forest, which is a collection of decision trees trained with dataset bagging and random feature subsampling.
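
The two sources of randomness and the final vote could look roughly like this. The sketch assumes that bagging draws as many examples as the dataset has, with replacement, and that feature subsampling keeps each feature independently with the given probability; the template prescribes the exact random calls, so identical outputs are not to be expected from this version.

```python
import numpy as np


def bootstrap_indices(rng, n_examples):
    """Dataset bagging: sample example indices with replacement."""
    return rng.choice(n_examples, size=n_examples, replace=True)


def feature_mask(rng, n_features, ratio):
    """Feature subsampling: keep each feature with probability `ratio`."""
    return rng.uniform(size=n_features) <= ratio


def majority_vote(tree_predictions):
    """Combine per-tree class predictions of shape (trees, examples)."""
    return np.array([np.bincount(column).argmax() for column in tree_predictions.T])
```

Each tree would be trained on `data[bootstrap_indices(rng, len(data))]`, consulting only the features selected by the mask when searching for a split.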

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 random_forest.py --dataset=wine --trees=3 --max_depth=3

Train accuracy: 99.2%
Test accuracy: 88.9%

python3 random_forest.py --dataset=wine --trees=3 --bagging --max_depth=3

Train accuracy: 97.7%
Test accuracy: 95.6%

python3 random_forest.py --dataset=wine --trees=3 --feature_subsampling=0.5 --max_depth=3

Train accuracy: 97.7%
Test accuracy: 88.9%

python3 random_forest.py --dataset=wine --trees=3 --bagging --feature_subsampling=0.5 --max_depth=3

Train accuracy: 99.2%
Test accuracy: 95.6%

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 random_forest.py --dataset=digits --trees=10 --max_depth=3

Train accuracy: 54.4%
Test accuracy: 50.4%

python3 random_forest.py --dataset=digits --trees=10 --bagging --max_depth=3

Train accuracy: 72.8%
Test accuracy: 72.2%

python3 random_forest.py --dataset=digits --trees=10 --feature_subsampling=0.5 --max_depth=3

Train accuracy: 64.3%
Test accuracy: 62.7%

python3 random_forest.py --dataset=digits --trees=10 --bagging --feature_subsampling=0.5 --max_depth=3

Train accuracy: 73.5%
Test accuracy: 75.6%

python3 random_forest.py --dataset=wine --trees=10 --max_depth=3

Train accuracy: 99.2%
Test accuracy: 88.9%

python3 random_forest.py --dataset=wine --trees=10 --bagging --max_depth=3

Train accuracy: 100.0%
Test accuracy: 97.8%

python3 random_forest.py --dataset=breast_cancer --trees=10 --feature_subsampling=0.5 --max_depth=3

Train accuracy: 97.9%
Test accuracy: 95.1%

python3 random_forest.py --dataset=breast_cancer --trees=10 --bagging --feature_subsampling=0.5 --max_depth=3

Train accuracy: 98.6%
Test accuracy: 95.1%

gradient_boosting

Deadline: Dec 19, 7:59 a.m. 6 points

Using the gradient_boosting.py template, train a gradient-boosted decision tree forest for classification.
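
The sketch below illustrates a single boosting step in a deliberately simplified setting, where every tree consists of one leaf. It assumes a second-order formulation with softmax outputs: first derivatives $g = p - t$, second derivatives $h = p(1 - p)$, and leaf weight $-\sum g / (\lambda + \sum h)$, with $\lambda$ corresponding to the --l2 option. Whether the template uses exactly this formulation is an assumption, and all names are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max subtraction for stability."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def leaf_weight(g, h, l2):
    """Assumed optimal leaf value for the examples falling into one leaf."""
    return -g.sum() / (l2 + h.sum())


def boosting_step_single_leaf(logits, targets, l2, learning_rate):
    """One boosting round where each class gets a tree with a single leaf."""
    classes = logits.shape[1]
    p = softmax(logits)
    g = p - np.eye(classes)[targets]  # first derivatives of the loss
    h = p * (1 - p)                   # second derivatives of the loss
    weights = np.array([leaf_weight(g[:, c], h[:, c], l2) for c in range(classes)])
    return logits + learning_rate * weights
```

In a real implementation, each class would get a proper tree per round, and the same formula would be applied separately to the examples in each leaf.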

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 gradient_boosting.py --dataset=wine --trees=3 --max_depth=1 --learning_rate=0.3

Using 1 trees, train accuracy: 95.5%, test accuracy: 91.1%
Using 2 trees, train accuracy: 95.5%, test accuracy: 86.7%
Using 3 trees, train accuracy: 97.7%, test accuracy: 91.1%

python3 gradient_boosting.py --dataset=wine --trees=3 --max_depth=2 --learning_rate=0.3 --seed=599

Using 1 trees, train accuracy: 99.2%, test accuracy: 91.1%
Using 2 trees, train accuracy: 99.2%, test accuracy: 91.1%
Using 3 trees, train accuracy: 99.2%, test accuracy: 95.6%

python3 gradient_boosting.py --dataset=wine --trees=3 --max_depth=2 --l2=0.5 --learning_rate=0.3 --seed=488

Using 1 trees, train accuracy: 97.0%, test accuracy: 95.6%
Using 2 trees, train accuracy: 98.5%, test accuracy: 97.8%
Using 3 trees, train accuracy: 99.2%, test accuracy: 97.8%

python3 gradient_boosting.py --dataset=digits --trees=3 --max_depth=2 --learning_rate=0.5

Using 1 trees, train accuracy: 79.1%, test accuracy: 76.9%
Using 2 trees, train accuracy: 85.7%, test accuracy: 84.4%
Using 3 trees, train accuracy: 91.3%, test accuracy: 87.8%

python3 gradient_boosting.py --dataset=breast_cancer --trees=3 --max_depth=2 --learning_rate=0.5 --seed=45

Using 1 trees, train accuracy: 94.6%, test accuracy: 90.2%
Using 2 trees, train accuracy: 96.9%, test accuracy: 95.1%
Using 3 trees, train accuracy: 96.9%, test accuracy: 93.7%

human_activity_recognition

Deadline: Dec 19, 7:59 a.m. 3 points+4 bonus

The goal of this competition task is to perform human activity recognition, namely to recognize one of five actions (walking, standing, sitting, standing up, sitting down) using data from four accelerometers. The train set consists of 50k examples, the test set of approximately 115k.

The human_activity_recognition.py template shows how to load the training data, downloading it if needed.

Your model will be evaluated using accuracy and your goal is to achieve at least 99%. Note that you can use any sklearn algorithm to solve this assignment.

pca

Deadline: Feb 12, 23:59 3 points

Using the pca.py template, implement the PCA computation using both

the power iteration algorithm,
SVD.
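
Both approaches can be sketched as below. This is a minimal illustration, not the template's procedure: the starting vector of ones, the fixed number of iterations, and the deflation step used to obtain further components are choices made for this example.

```python
import numpy as np


def pca_power_iteration(data, components, iterations=100):
    """Principal directions found as dominant eigenvectors of the covariance."""
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / len(data)
    vectors = []
    for _ in range(components):
        v = np.ones(cov.shape[0])
        for _ in range(iterations):
            v = cov @ v
            v /= np.linalg.norm(v)
        eigenvalue = v @ cov @ v
        # Deflation: remove the found direction before the next search.
        cov = cov - eigenvalue * np.outer(v, v)
        vectors.append(v)
    return np.stack(vectors, axis=1)


def pca_svd(data, components):
    """Principal directions are the top right singular vectors of centered data."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:components].T
```

The two functions return the same directions only up to sign, and projecting data onto them is done with `(data - data.mean(axis=0)) @ vectors`.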

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 pca.py --max_iter=1000 --solver=lbfgs

Test set accuracy: 89.76%

python3 pca.py --max_iter=1000 --pca=1 --solver=lbfgs

Test set accuracy: 30.88%

python3 pca.py --max_iter=1000 --pca=5 --solver=lbfgs

Test set accuracy: 68.96%

python3 pca.py --max_iter=1000 --pca=10 --solver=lbfgs

Test set accuracy: 80.48%

python3 pca.py --max_iter=1000 --pca=20 --solver=lbfgs

Test set accuracy: 87.80%

python3 pca.py --max_iter=1000 --pca=50 --solver=lbfgs

Test set accuracy: 90.08%

python3 pca.py --max_iter=1000 --pca=100 --solver=lbfgs

Test set accuracy: 90.16%

python3 pca.py --max_iter=1000 --pca=200 --solver=lbfgs

Test set accuracy: 89.88%

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 pca.py --max_iter=5

Test set accuracy: 90.48%

python3 pca.py --max_iter=5 --pca=1

Test set accuracy: 30.28%

python3 pca.py --max_iter=5 --pca=5

Test set accuracy: 68.88%

python3 pca.py --max_iter=5 --pca=10

Test set accuracy: 80.00%

python3 pca.py --max_iter=5 --pca=20

Test set accuracy: 87.68%

python3 pca.py --max_iter=5 --pca=50

Test set accuracy: 90.28%

python3 pca.py --max_iter=5 --pca=100

Test set accuracy: 90.76%

python3 pca.py --max_iter=5 --pca=200

Test set accuracy: 90.68%

kmeans

Deadline: Feb 12, 23:59 3 points

Using the kmeans.py template, implement the K-Means algorithm with both

random initialization,
kmeans++ initialization.
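
A minimal sketch of both initializations and of the main loop follows. It assumes squared Euclidean distances and the standard kmeans++ rule of sampling each new center with probability proportional to the squared distance to the nearest already chosen center; the exact random calls in the template differ, so the cluster assignments will not match the outputs listed here.

```python
import numpy as np


def init_random(rng, data, clusters):
    """Pick distinct random examples as the initial centers."""
    return data[rng.choice(len(data), size=clusters, replace=False)]


def init_kmeans_pp(rng, data, clusters):
    """kmeans++: sample new centers proportionally to squared distance."""
    centers = [data[rng.randint(len(data))]]
    for _ in range(clusters - 1):
        d2 = np.min([np.sum((data - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(data[rng.choice(len(data), p=d2 / d2.sum())])
    return np.array(centers)


def assign(data, centers):
    """Index of the nearest center for every example."""
    distances = ((data[:, None] - centers[None]) ** 2).sum(axis=2)
    return distances.argmin(axis=1)


def kmeans(data, centers, iterations):
    """Alternate assignment and center update for a fixed number of rounds."""
    for _ in range(iterations):
        assignments = assign(data, centers)
        centers = np.array([
            data[assignments == k].mean(axis=0) if np.any(assignments == k) else centers[k]
            for k in range(len(centers))
        ])
    return assign(data, centers), centers
```

An empty cluster keeps its previous center in this sketch, which is one of several possible conventions.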

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 kmeans.py --clusters=5 --examples=150 --iterations=5 --seed=51 --init=random

Cluster assignments:
[2 3 3 4 1 2 1 1 2 3 2 1 1 3 3 4 0 4 4 1 3 1 1 1 1 0 1 3 3 2 3 0 1 0 3 3 0
 0 1 0 1 2 1 1 3 2 1 2 2 1 3 2 2 2 3 2 1 2 1 4 3 3 4 4 2 1 1 1 1 3 1 3 1 4
 1 3 2 1 0 0 1 2 2 0 2 2 3 1 1 1 2 2 4 2 2 1 1 1 2 2 2 3 1 3 1 3 2 1 0 2 2
 3 1 1 1 3 3 0 1 3 4 1 1 4 1 3 1 4 4 3 1 4 1 4 1 1 1 3 1 1 4 2 0 3 1 4 1 2
 2 1]

python3 kmeans.py --clusters=5 --examples=150 --iterations=5 --seed=51 --init=kmeans++

Cluster assignments:
[1 3 3 4 0 1 0 2 1 3 1 2 2 3 3 4 4 4 4 2 3 2 2 2 2 4 2 3 3 0 3 4 0 4 3 3 4
 4 2 4 2 1 0 0 3 1 0 1 1 0 3 1 0 0 3 1 0 1 2 4 3 3 4 4 1 0 2 0 0 3 0 3 0 4
 2 3 1 2 4 4 2 1 1 4 1 1 3 0 2 2 1 1 4 1 1 2 0 2 1 1 1 3 0 3 2 3 1 0 4 1 1
 3 0 2 0 3 0 4 0 3 4 0 2 4 2 3 2 4 4 3 2 4 2 4 2 0 0 3 0 0 4 1 4 3 0 4 2 1
 1 2]

python3 kmeans.py --clusters=7 --examples=200 --iterations=11 --seed=67 --init=random

Cluster assignments:
[2 1 0 4 5 1 4 1 1 2 0 3 6 6 1 6 1 1 0 2 3 2 4 0 6 5 5 4 5 4 4 6 6 1 0 6 4
 4 1 6 4 5 4 4 1 0 2 1 2 2 4 3 2 1 5 2 6 0 5 6 4 2 6 3 1 1 4 5 1 2 4 5 4 5
 1 1 4 2 5 4 4 5 4 2 2 4 4 1 5 0 4 4 4 1 3 0 3 5 4 1 0 4 4 4 4 4 5 4 1 4 4
 2 5 2 6 5 2 2 4 5 4 4 3 3 2 6 1 4 1 6 1 2 3 0 5 6 4 6 4 5 5 2 0 1 6 0 1 4
 4 6 5 1 2 4 0 0 4 0 4 3 5 4 3 4 6 3 6 5 5 6 0 2 6 5 4 5 4 3 2 4 1 2 4 2 4
 2 6 4 4 6 2 4 4 6 6 5 0 2 4 1]

python3 kmeans.py --clusters=7 --examples=200 --iterations=5 --seed=67 --init=kmeans++

Cluster assignments:
[3 1 4 5 0 1 6 1 3 3 4 4 2 2 1 2 1 1 4 3 4 3 5 4 2 0 0 6 0 6 5 2 2 1 4 2 5
 5 1 2 5 0 6 6 1 4 3 1 3 3 5 4 3 1 0 3 2 4 0 2 5 3 2 4 1 1 6 0 1 3 5 0 5 0
 1 1 6 3 0 6 5 0 5 3 3 5 6 1 0 4 5 6 5 1 4 4 2 0 6 1 4 6 5 5 6 5 0 5 1 6 6
 3 0 3 2 0 3 3 5 0 6 6 4 4 3 2 1 6 1 2 1 3 4 4 0 2 6 2 6 0 0 3 4 1 2 4 1 5
 6 2 0 1 3 5 4 4 6 4 6 4 0 5 2 5 2 4 2 0 0 2 4 3 2 0 6 0 5 2 3 5 1 3 6 3 5
 3 2 6 5 2 0 6 6 2 2 0 4 3 6 1]

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 kmeans.py --clusters=5 --examples=150 --iterations=3 --init=random

Cluster assignments:
[4 3 4 4 3 2 3 3 4 3 4 4 4 3 1 4 3 4 1 2 3 4 3 3 1 3 3 3 3 4 0 3 3 3 4 0 3
 3 4 3 4 3 3 4 4 4 3 0 4 4 4 4 2 3 3 2 4 0 0 3 3 4 4 0 3 2 4 1 3 4 1 4 4 0
 4 3 4 1 3 0 3 4 4 3 4 1 3 4 3 3 4 4 4 3 4 4 1 4 4 3 1 3 1 4 3 3 3 4 4 4 4
 3 4 4 4 4 4 4 2 4 3 4 4 2 3 3 3 4 2 4 4 3 3 2 3 3 2 3 2 0 4 3 3 3 3 3 3 3
 3 4]

python3 kmeans.py --clusters=5 --examples=150 --iterations=3 --init=kmeans++

Cluster assignments:
[4 1 4 2 3 0 1 1 4 3 4 2 2 3 0 2 3 2 0 0 1 4 3 3 0 3 3 1 3 2 0 1 3 3 2 0 3
 3 4 1 2 3 1 2 4 2 1 0 2 4 2 2 0 1 1 0 2 0 0 1 3 4 2 0 3 0 4 0 3 2 0 4 2 0
 2 1 2 0 3 0 1 2 4 1 4 0 1 2 3 3 2 4 2 1 4 2 0 4 4 1 0 3 0 2 1 3 1 4 2 2 4
 3 2 2 4 4 2 4 0 2 1 4 4 0 3 1 3 4 0 2 4 1 1 0 3 1 0 3 0 0 4 1 1 3 3 1 1 3
 1 2]

python3 kmeans.py --clusters=7 --examples=200 --iterations=3 --init=random

Cluster assignments:
[6 0 0 3 3 1 1 1 5 4 3 6 5 2 4 4 3 3 4 3 6 3 1 3 0 6 2 0 2 6 2 1 0 3 6 3 1
 3 5 3 3 0 4 3 5 6 3 0 3 3 6 6 3 3 3 5 0 5 3 6 2 2 0 2 2 4 3 3 6 6 6 5 0 2
 1 2 2 6 2 5 0 0 1 2 5 2 2 4 5 3 6 4 6 3 3 5 1 3 3 0 5 3 0 6 2 0 2 3 5 0 3
 1 3 5 6 5 4 2 0 3 2 3 3 6 1 2 2 2 4 3 0 5 5 2 3 5 1 2 6 6 0 5 5 3 3 3 2 6
 0 0 2 0 6 3 3 2 6 0 3 5 4 1 3 5 0 3 0 5 6 3 1 5 0 0 3 0 1 3 5 4 3 3 3 2 3
 3 2 3 6 1 4 3 0 5 5 3 3 5 6 6]

python3 kmeans.py --clusters=7 --examples=200 --iterations=3 --init=kmeans++

Cluster assignments:
[4 0 0 5 5 2 2 2 2 2 5 4 2 3 2 2 6 6 2 6 4 5 2 1 0 4 3 0 3 4 3 2 0 1 4 6 2
 5 2 5 5 0 2 1 2 4 5 0 1 5 4 4 5 5 5 2 0 2 6 4 3 3 0 3 3 2 5 1 4 4 4 2 0 3
 2 3 3 4 3 2 0 0 2 3 2 3 3 2 2 5 4 2 4 5 1 2 2 5 1 0 2 1 0 4 3 0 3 1 2 0 6
 2 1 2 4 2 2 3 0 5 3 1 1 4 2 3 3 3 2 6 0 2 2 3 1 2 2 3 4 4 0 2 2 1 5 1 3 4
 0 0 3 0 4 1 5 3 4 0 1 2 2 2 6 2 0 1 0 2 4 1 2 2 0 0 5 0 2 1 2 2 1 5 1 3 1
 5 3 5 4 2 2 6 0 2 2 1 6 2 4 4]
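
The two initialization strategies and the main loop of the kmeans task can be sketched as follows. This is only a minimal NumPy illustration, not the kmeans.py template: the function names and the way randomness is drawn are assumptions, so it will not reproduce the cluster assignments listed above.

```python
import numpy as np


def init_random(data, clusters, rng):
    # Random initialization: pick `clusters` distinct examples as the centers.
    return data[rng.choice(len(data), size=clusters, replace=False)]


def init_kmeans_pp(data, clusters, rng):
    # kmeans++: the first center is chosen uniformly; every further center is
    # sampled with probability proportional to the squared distance to the
    # nearest center chosen so far.
    centers = [data[rng.integers(len(data))]]
    for _ in range(1, clusters):
        diffs = data[:, None, :] - np.array(centers)[None, :, :]
        sq_dist = (diffs ** 2).sum(axis=-1).min(axis=1)
        centers.append(data[rng.choice(len(data), p=sq_dist / sq_dist.sum())])
    return np.array(centers)


def kmeans(data, centers, iterations):
    centers = centers.copy()
    for _ in range(iterations):
        # Assignment step: every example goes to its nearest center.
        sq_dist = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assignments = sq_dist.argmin(axis=1)
        # Update step: every non-empty cluster moves to the mean of its examples.
        for k in range(len(centers)):
            if np.any(assignments == k):
                centers[k] = data[assignments == k].mean(axis=0)
    # The assignments are the ones computed in the last assignment step.
    return centers, assignments
```

The sketch assumes floating-point data without exact duplicates of all points (otherwise the kmeans++ sampling probabilities would be undefined).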

nli_competition

Deadline: Jan 02, 7:59 5 points+5 bonus

In this competition task you will be solving Native Language Identification: given an English essay written by a non-native speaker, your goal is to identify the author's native language.

We will be using NLI Shared Task 2013 data, which contains documents written by native speakers of 11 languages. For each language, the train and test sets contain 1000 and 100 documents, respectively. Particularly interesting is the fact that humans are quite bad at this task (in a simplified setting, human professionals achieve 40-50% accuracy), while machine learning models can achieve high performance.

Because the data is not publicly available, you can download it only through ReCodEx. Please do not distribute it. The template nli_competition.py can then be used to load the dataset as usual.

The performance of your system is measured using accuracy (the fraction of correctly classified documents) and your goal is to achieve at least 78% accuracy. Note that you can use any sklearn algorithm to solve this exercise.
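
As a possible starting point, a text classifier built only from sklearn components might look like the sketch below. The choice of character n-gram TF-IDF features and a linear SVM is an assumption made for illustration, not the reference solution, and the toy essays and language codes are invented; the real data is loaded through the nli_competition.py template.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_model():
    # Character n-grams can capture spelling and morphology habits; the n-gram
    # range and the classifier are illustrative choices, not tuned values.
    return make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3), sublinear_tf=True),
        LinearSVC(),
    )


# Invented toy essays and language codes, only to show the calling convention.
train_texts = [
    "I am agree with this opinion about the travel.",
    "He explained me the problem very detailed.",
    "I have seen him yesterday in the city.",
    "We must to make a decision until tomorrow.",
]
train_labels = ["CS", "CS", "DE", "DE"]

model = build_model().fit(train_texts, train_labels)
predictions = model.predict(["I am agree that travel is good."])
```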

gaussian_mixture

Deadline: Feb 12, 23:59 4 points

Cluster given input by fitting a Gaussian mixture using the gaussian_mixture.py template. Use full covariances and compute the negative log-likelihood of the model after every iteration of the EM algorithm.

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 gaussian_mixture.py --examples=112 --clusters=4 --iterations=5 --init=random

Loss after iteration 1: 546.2
Loss after iteration 2: 524.1
Loss after iteration 3: 502.9
Loss after iteration 4: 471.1
Loss after iteration 5: 463.5

python3 gaussian_mixture.py --examples=112 --clusters=4 --iterations=3 --init=kmeans++

Loss after iteration 1: 458.5
Loss after iteration 2: 458.5
Loss after iteration 3: 458.5

python3 gaussian_mixture.py --examples=120 --clusters=5 --iterations=11 --init=random

Loss after iteration 1: 526.2
Loss after iteration 2: 520.9
Loss after iteration 3: 517.5
Loss after iteration 4: 517.2
Loss after iteration 5: 517.1
Loss after iteration 6: 517.0
Loss after iteration 7: 517.0
Loss after iteration 8: 517.0
Loss after iteration 9: 516.9
Loss after iteration 10: 516.9
Loss after iteration 11: 516.9

python3 gaussian_mixture.py --examples=120 --clusters=5 --iterations=5 --init=kmeans++

Loss after iteration 1: 516.5
Loss after iteration 2: 513.7
Loss after iteration 3: 508.8
Loss after iteration 4: 505.4
Loss after iteration 5: 504.5
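
One iteration of the EM algorithm with full covariances, together with the negative log-likelihood reported as the loss, can be sketched as follows. The function names and the way the parameters are passed are assumptions, not the interface of the gaussian_mixture.py template, and the initialization is left to the caller.

```python
import numpy as np


def log_joint(data, priors, means, covs):
    # log(pi_k * N(x_i | mu_k, Sigma_k)) for every example i and cluster k.
    dim = data.shape[1]
    columns = []
    for prior, mean, cov in zip(priors, means, covs):
        diff = data - mean
        _, logdet = np.linalg.slogdet(cov)
        mahalanobis = np.sum(diff @ np.linalg.inv(cov) * diff, axis=1)
        columns.append(np.log(prior) - 0.5 * (dim * np.log(2 * np.pi) + logdet + mahalanobis))
    return np.stack(columns, axis=1)


def negative_log_likelihood(data, priors, means, covs):
    # Sum over examples of -log sum_k pi_k N(x_i | mu_k, Sigma_k).
    return -np.sum(np.logaddexp.reduce(log_joint(data, priors, means, covs), axis=1))


def em_iteration(data, priors, means, covs):
    # E step: responsibilities are the normalized joint probabilities.
    joint = log_joint(data, priors, means, covs)
    resp = np.exp(joint - np.logaddexp.reduce(joint, axis=1, keepdims=True))
    # M step: responsibility-weighted means, covariances, and priors.
    weights = resp.sum(axis=0)
    means = resp.T @ data / weights[:, None]
    covs = np.stack([
        (resp[:, k, None] * (data - means[k])).T @ (data - means[k]) / weights[k]
        for k in range(len(weights))
    ])
    priors = weights / len(data)
    return priors, means, covs
```

Calling `negative_log_likelihood` after every `em_iteration` gives a sequence of losses that should never increase, which matches the behavior of the example outputs above.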

bootstrap_resampling

Deadline: Feb 12, 23:59 3 points

Given two trained models, compute their 95% confidence intervals using bootstrap resampling. Then, estimate the probability that the second one is better than the first one using a paired bootstrap test.

Start with the bootstrap_resampling.py template. Note that you usually need to perform a lot of the bootstrap resamplings, so you should make sure your implementation is fast enough.

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 bootstrap_resampling.py --seed=49 --test_size=0.9 --bootstrap_samples=1000

Confidence intervals of the two models:
- [90.23% .. 93.02%]
- [90.98% .. 93.63%]
The estimated probability that the null hypothesis holds: 1.40%

python3 bootstrap_resampling.py --seed=49 --test_size=0.9 --bootstrap_samples=10000

Confidence intervals of the two models:
- [90.30% .. 93.02%]
- [91.10% .. 93.70%]
The estimated probability that the null hypothesis holds: 1.71%

python3 bootstrap_resampling.py --seed=49 --test_size=0.9 --bootstrap_samples=100000

Confidence intervals of the two models:
- [90.30% .. 92.95%]
- [91.10% .. 93.70%]
The estimated probability that the null hypothesis holds: 1.62%

python3 bootstrap_resampling.py --seed=33 --test_size=0.95 --bootstrap_samples=50000

Confidence intervals of the two models:
- [87.18% .. 90.16%]
- [87.65% .. 90.63%]
The estimated probability that the null hypothesis holds: 8.94%
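
A vectorized version of the resampling can be sketched as follows, assuming accuracy as the metric and a 0/1 matrix `correct` whose column `m` marks the test examples that model `m` classified correctly. The names and the exact definition of the reported probability (the fraction of resamples in which the second model is not better) are assumptions, not taken from the bootstrap_resampling.py template.

```python
import numpy as np


def bootstrap_scores(correct, bootstrap_samples, rng):
    # Sample test-set indices with replacement. The same indices are used for
    # all models, so the resampled scores are paired.
    size = (bootstrap_samples, len(correct))
    indices = rng.integers(len(correct), size=size)
    return correct[indices].mean(axis=1)  # shape: (bootstrap_samples, models)


def confidence_intervals(scores, level=95):
    # Percentile intervals, one row [lower, upper] per model.
    tail = (100 - level) / 2
    return np.percentile(scores, [tail, 100 - tail], axis=0).T


def paired_bootstrap_p_value(scores):
    # Fraction of resamples in which the second model is not better than the first.
    return np.mean(scores[:, 1] - scores[:, 0] <= 0)
```

Indexing with the whole index matrix at once avoids a Python loop over the resamples, at the cost of materializing an array of size bootstrap_samples × N × models; for very large settings the resampling can be done in chunks.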

permutation_test

Deadline: Feb 12, 23:59 1 point

Given two trained models, perform a random permutation test that the second one is better than the first one.

Start with the permutation_test.py template. Note that you usually need to perform a lot of resamplings, so you should make sure your implementation is fast enough.

Note that your results may be slightly different (because of varying floating point arithmetic on your CPU).

python3 permutation_test.py --seed=49 --test_size=0.9 --random_samples=1000

The estimated p-value of the random permutation test: 2.40%

python3 permutation_test.py --seed=49 --test_size=0.9 --random_samples=10000

The estimated p-value of the random permutation test: 2.15%

python3 permutation_test.py --seed=49 --test_size=0.9 --random_samples=100000

The estimated p-value of the random permutation test: 2.16%

python3 permutation_test.py --seed=33 --test_size=0.95 --random_samples=50000

The estimated p-value of the random permutation test: 10.72%
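
One common variant of the random permutation test can be sketched as follows: under the null hypothesis the two models are interchangeable, so each random sample takes every prediction from a randomly chosen model, and the p-value is the fraction of such mixtures scoring at least as well as the second model. The function name, the use of accuracy, and this exact formulation are assumptions, not taken from the permutation_test.py template.

```python
import numpy as np


def permutation_test(correct_first, correct_second, random_samples, rng):
    # correct_*: 0/1 vectors marking correctly classified test examples.
    observed = correct_second.mean()
    # For every random sample and every example, choose which model to use.
    shape = (random_samples, len(correct_first))
    take_second = rng.integers(2, size=shape).astype(bool)
    mixed = np.where(take_second, correct_second, correct_first)
    # p-value: how often a random mixture is at least as good as the second model.
    return np.mean(mixed.mean(axis=1) >= observed)
```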

In the competitions, your goal is to train a model and then predict target values on the test set available only in ReCodEx.

Submitting to ReCodEx

When submitting a competition solution to ReCodEx, you should submit a trained model and a Python source capable of running it.

Furthermore, please also include the Python source and hyperparameters you used to train the submitted model. But be careful that there still must be exactly one Python source with a line starting with def main(.

Do not forget about the maximum allowed model size and time and memory limits.
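
A trained model can be stored in a compressed file and loaded again by the submitted source, for example as in the sketch below. The use of `pickle` together with `lzma` and the helper names are assumptions for illustration; compression merely helps to stay below the size limit.

```python
import lzma
import pickle


def save_model(model, path):
    # Serialize the trained model into an LZMA-compressed file.
    with lzma.open(path, "wb") as model_file:
        pickle.dump(model, model_file)


def load_model(path):
    # Load the model back; only unpickle files you created yourself.
    with lzma.open(path, "rb") as model_file:
        return pickle.load(model_file)
```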

Competition Evaluation

Before the deadline, ReCodEx prints the exact achieved score, but only if it is worse than the baseline.

If you surpass the baseline, the assignment is marked as solved in ReCodEx and you immediately get regular points for the assignment. However, ReCodEx does not print the reached score.
After the competition deadline, the latest submission of every user surpassing the required baseline participates in a competition. Additional bonus points are then awarded according to the ordering of the performance of the participating submissions.
After the competition results announcement, ReCodEx starts to show the exact performance for all the already submitted solutions and also for the solutions submitted later.

What Is Allowed

You can use only the given annotated data, both for training and evaluation.
Additionally, you can use any unannotated or manually created data for training and evaluation.
The test set annotations must be the result of your system (so you cannot manually correct them; but your system can contain other parts than just trained models, like hand-written rules).
Do not use test set annotations in any way, if you somehow get access to them.
Unless stated otherwise, you can use any algorithm present in numpy or scipy, anything you implement yourself, and any pre/post-processing or ensembling methods in sklearn. Apart from the allowed algorithms, the implementation must be created by you and you must understand it fully. Do not use deep network frameworks like TensorFlow or PyTorch.

Install

Installing to central user packages repository

You can install all required packages to central user packages repository using pip3 install --user scikit-learn==1.1.2 numpy==1.23.3 scipy==1.9.1 pandas==1.5.0 matplotlib==3.6.0.
Installing to a virtual environment

Python supports virtual environments, which are directories containing independent sets of installed packages. You can create a virtual environment by running python3 -m venv VENV_DIR followed by VENV_DIR/bin/pip3 (or VENV_DIR/Scripts/pip3 on Windows) install scikit-learn==1.1.2 numpy==1.23.3 scipy==1.9.1 pandas==1.5.0 matplotlib==3.6.0
Windows installation
- On Windows, it can happen that python3 is not in PATH, while py command is – in that case you can use py -m venv VENV_DIR, which uses the newest Python available, or for example py -3.9 -m venv VENV_DIR, which uses Python version 3.9.

Git

Is it possible to keep the solutions in a Git repository?

Definitely. Keeping the solutions in a branch of your repository, where you merge them with the course repository, is probably a good idea. However, please keep the cloned repository with your solutions private.
On GitHub, do not create a public fork with your solutions

If you keep your solutions in a GitHub repository, please do not create a clone of the repository by using the Fork button – this way, the cloned repository would be public.

Of course, if you just want to create a pull request, GitHub requires a public fork and that is fine – just do not store your solutions in it.
How to clone the course repository?

To clone the course repository, run
```
git clone https://github.com/ufal/npfl129
```
This creates the repository in the npfl129 subdirectory; if you want a different name, add it as a last parameter.

To update the repository, run git pull inside the repository directory.
How to keep the course repository as a branch in your repository?

If you want to store the course repository just in a local branch of your existing repository, you can run the following commands while in it:
```
git remote add upstream https://github.com/ufal/npfl129
git fetch upstream
git checkout -t upstream/master
```
This creates a branch master; if you want a different name, add -b BRANCH_NAME to the last command.

In both cases, you can update your checkout by running git pull while in it.
How to merge the course repository with your modifications?

If you want to store your solutions in a branch merged with the course repository, you should start by
```
git remote add upstream https://github.com/ufal/npfl129
git pull upstream master
```
which creates a branch master; if you want a different name, change the last argument to master:BRANCH_NAME.

You can then commit to this branch and push it to your repository.

To merge the current course repository with your branch, run
```
git fetch upstream
git merge upstream/master
```
while in your branch. Of course, it might be necessary to resolve conflicts if both you and I modified the same place in the templates.

ReCodEx

What files can be submitted to ReCodEx?

You can submit multiple files of any type to ReCodEx. There is a limit of 20 files per submission, with a total size of 20MB.
What file does ReCodEx execute and what arguments does it use?

Exactly one file with the .py suffix must contain a line starting with def main(. Such a file is imported by ReCodEx and the main function is executed (during the import, __name__ == "__recodex__").

The file must also export an argument parser called parser. ReCodEx uses its arguments and default values, but it overwrites some of the arguments depending on the test being executed – the template should always indicate which arguments are set by ReCodEx and which are left intact.
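
The layout described above can be sketched as follows. The argument names and the body of `main` are only illustrative assumptions; the actual arguments are given by the template of each assignment.

```python
import argparse

# The parser is a module-level variable, so it is available after an import.
parser = argparse.ArgumentParser()
# Illustrative arguments; the real templates define their own.
parser.add_argument("--seed", default=42, type=int, help="Random seed")
parser.add_argument("--test_size", default=0.5, type=float, help="Test set size")


def main(args: argparse.Namespace):
    # The solution goes here; this placeholder only reports the seed it received.
    return "seed={}".format(args.seed)


if __name__ == "__main__":
    # Executed only when the file is run directly, not when it is imported.
    args = parser.parse_args([] if "__file__" not in globals() else None)
    print(main(args))
```

Because the file is imported rather than run as a script, the code guarded by `__name__ == "__main__"` is skipped and only `parser` and `main` are used.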
What are the time and memory limits?

The memory limit during evaluation is 1.5GB. The time limit varies, but it should be at least 10 seconds and at least twice the running time of my solution. For competition assignments, the time limit is 5 minutes.

Requirements

To pass the exam, you need to obtain at least 60, 75, or 90 points out of a 100-point exam to receive a grade 3, 2, or 1, respectively. The exam consists of questions worth 100 points in total, taken from the list below (the questions are randomly generated, but in such a way that there is at least one question from every lecture except the first). In addition, you can get at most 40 surplus points from the practicals and at most 10 points for community work (i.e., fixing slides or reporting issues) – but only the points you already have at the time of the exam count. You can take the exam without passing the practicals first.
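
The grading thresholds can be expressed as a short function. It is only an illustration under the assumption that the capped surplus and community points are simply added to the exam points; the function name and the `None` result for a failed exam are hypothetical.

```python
def exam_grade(exam_points, practicals_surplus=0, community_points=0):
    # Surplus points are capped at 40 and community points at 10.
    total = exam_points + min(practicals_surplus, 40) + min(community_points, 10)
    if total >= 90:
        return 1
    if total >= 75:
        return 2
    if total >= 60:
        return 3
    return None  # not enough points to pass
```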

Exam Questions

Lecture 1 Questions

Define the prediction function of a linear regression model and write down the $L^2$-regularized mean squared error loss. [5]
Starting from unregularized sum of squares error of a linear regression model, show how the explicit solution can be obtained, assuming $\boldsymbol X^T \boldsymbol X$ is regular. [10]

Lecture 2 Questions

Define expectation $\mathbb{E}[f(x)]$ and variance $\operatorname{Var}(f(x))$ of a discrete random variable. Then define the bias of an estimator and show that estimating an expectation using a single sample is unbiased. [5]
Describe standard gradient descent and compare it to stochastic (i.e., online) gradient descent and minibatch stochastic gradient descent. [5]
Formulate conditions on the sequence of learning rates used in SGD to converge to optimum almost surely. [5]
Write an $L^2$-regularized minibatch SGD algorithm for training a linear regression model, including the explicit formulas of the loss function and its gradient. [5]

Lecture 3 Questions

Define binary classification, write down the perceptron algorithm and show how a prediction is made for a given example. [5]
Show that the perceptron algorithm is an instance of stochastic gradient descent. Why are the learning rates not needed (i.e., why are the predictions of a trained model the same for all positive learning rates)? [5]
For discrete random variables, define entropy, cross-entropy, Kullback-Leibler divergence, and prove the Gibbs inequality (i.e., that KL divergence is non-negative). [5]
Define data-generating distribution, empirical data distribution, and likelihood. [5]
Describe maximum likelihood estimation, as minimizing NLL, cross-entropy, and KL divergence. [10]
Considering binary logistic regression model, write down its parameters (including their size) and explain how prediction is performed (including the formula for the sigmoid function). Describe how we can interpret the outputs of the linear part of the model as logits. [5]
Write down an $L^2$-regularized minibatch SGD algorithm for training a binary logistic regression model, including the explicit formulas of the loss function and its gradient. [10]

Lecture 4 Questions

Define mean squared error and show how it can be derived using MLE. [5]
Considering a $K$-class logistic regression model, write down its parameters (including their size) and explain how prediction is performed (including the formula for the softmax function). Describe how we can interpret the outputs of the linear part of the model as logits. [5]
Write down an $L^2$-regularized minibatch SGD algorithm for training a $K$-class logistic regression model, including the explicit formulas of the loss function and its gradient. [10]
Prove that the decision regions of a multiclass logistic regression are convex. [5]
Considering a single-layer MLP with $D$ input neurons, $H$ hidden neurons, $K$ output neurons, hidden activation $f$, and output activation $a$, list its parameters (including their size) and write down how the output is computed. [5]
List the definitions of frequently used MLP output layer activations (the ones producing parameters of a Bernoulli distribution and a categorical distribution). Then write down three commonly used hidden layer activations (sigmoid, tanh, ReLU). [5]
Considering a single-layer MLP with $D$ input neurons, a ReLU hidden layer with $H$ units and a softmax output layer with $K$ units, write down the explicit formulas of the gradient of all the MLP parameters (two weight matrices and two bias vectors), assuming input $\boldsymbol x$, target $t$, and negative log likelihood loss. [10]
Formulate the Universal approximation theorem. [5]

Lecture 5 Questions

How do we search for a minimum of a function $f(\boldsymbol x): \mathbb{R}^D \rightarrow \mathbb{R}$ subject to equality constraints $g_1(\boldsymbol x)=0, \ldots, g_m(\boldsymbol x)=0$? [5]
Prove which categorical distribution with $N$ classes has maximum entropy. [5]
Consider derivation of softmax using maximum entropy principle, assuming we have a dataset of $N$ examples $(x_i, t_i), x_i \in \mathbb{R}^D, t_i \in \{1, 2, \ldots, K\}$. Formulate the three conditions we impose on the searched $\pi: \mathbb{R}^D \rightarrow \mathbb{R}^K$, and write down the Lagrangian to be minimized. [10]
Define precision (including true positives and others), recall, $F_1$ score, and $F_\beta$ score (we stated several formulations for $F_1$ and $F_\beta$ scores; any one of them will do). [5]
Explain the difference between micro-averaged and macro-averaged $F_1$ scores. [5]
Describe k-nearest neighbors prediction, both for regression and classification. Define $L_p$ norm and describe uniform, inverse, and softmax weighting. [5]

Lecture 6 Questions

Define a kernel based on a feature map $\varphi: \mathbb{R}^D \rightarrow \mathbb{R}^F$, and write down the formulas for (1) a polynomial kernel of degree $d$, (2) a polynomial kernel of degree at most $d$, (3) an RBF kernel. [5]
Define a kernel and write down the mini-batch SGD training algorithm of dual formulation of kernel linear regression (including the update for the bias). Then describe how predictions for unseen data are made. [10]
Derive the primary formulation of hard-margin SVM (the value to minimize, the constraints to fulfill) as a maximum-margin classifier (i.e., start by margin maximization). [5]
How do we search for a minimum of a function $f(\boldsymbol x): \mathbb{R}^D \rightarrow \mathbb{R}$ subject to an inequality constraint $g(\boldsymbol x) \ge 0$? Formulate both the variant with KKT conditions and the variant with the $\lambda$ maximization, and prove that they are equivalent. [10]
Starting from primary hard-margin SVM formulation, derive the dual formulation (the Lagrangian $\mathcal{L}$ in the form used for training, the required conditions, the KKT conditions of the solution, and how the prediction is performed). [10]
Considering hard-margin SVM, define what a support vector is, and how predictions are performed for unseen data. [5]

Lecture 7 Questions

Write down the primary formulation of soft-margin SVM using the slack variables (the value to minimize, the constraints to fulfill). [5]
Starting from primary soft-margin SVM formulation, derive the dual formulation (the Lagrangian $\mathcal{L}$ in the form used for training, the required conditions, the KKT conditions of the solution, and how prediction is performed). [10]
Write down the primary formulation of soft-margin SVM using the hinge loss. [5]
Describe the high-level overview of the SMO algorithm (the test whether the KKT conditions hold, how we select the $a_i$ and $a_j$ to update, what the goal of updating the $a_i$ and $a_j$ is, how we detect convergence; but without the update of $a_i$, $a_j$, $b$ themselves). [5]
Describe the part of the SMO algorithm which updates $a_i$ and $a_j$ to maximize the Lagrangian. If you explain how the update is derived (so that if I followed the instructions, I would come up with the update rules), you do not need to write explicit formulas. [10]
Describe the part of the SMO algorithm which updates $b$ to maximize the Lagrangian. If you explain how the update is derived (so that if I followed the instructions, I would come up with two $b$ candidates and a rule to utilize them), you do not need to write explicit formulas. [10]
Describe the one-versus-one and one-versus-rest schemes of constructing a $K$-class classifier by combining multiple binary classifiers. Which of them cannot be used for SVM and why? [5]

Lecture 8 Questions

Explain how the TF-IDF weight of a given document-term pair is computed. [5]
Define conditional entropy, mutual information, write down the relation between them, and finally prove that mutual information is zero if and only if the two random variables are independent (you do not need to prove statements about $D_\textrm{KL}$). [5]
Show that TF-IDF terms can be considered portions of suitable mutual information. [5]
Show that $L^2$-regularization can be obtained from a suitable prior by Bayesian inference (from the MAP estimate). [5]
Write down how $p(C_k | \boldsymbol x)$ is approximated in a Naive Bayes classifier, explicitly state the Naive Bayes assumption, and show how the prediction is performed. [5]
Considering a Gaussian naive Bayes, describe how $p(x_d | C_k)$ is modeled (what distribution and which parameters it has) and how we estimate it during fitting. [5]
Considering a Multinomial naive Bayes, describe how $p(\boldsymbol x | C_k)$ is modeled (what distribution and which parameters it has) and how we estimate it during fitting. [5]
Considering a Bernoulli naive Bayes, describe how $p(x_d | C_k)$ is modeled (what distribution and which parameters it has) and how we estimate it during fitting. [5]
Describe the difference between a generative and a discriminative model, the strengths of these models, and explain why logistic regression and multinomial/Bernoulli naive Bayes are called a generative-discriminative pair. [5]

Lecture 9 Questions

Prove that independent discrete random variables are uncorrelated. [5]
Write down the definition of covariance and Pearson correlation coefficient $\rho$, including its range. [5]
Explain how the Spearman's rank correlation coefficient and the Kendall rank correlation coefficient are computed (no need to describe the Pearson correlation coefficient). [5]
Considering an averaging ensemble of $M$ models, prove the relation between the average mean squared error of the ensemble and the average error of the individual models, assuming the model errors have zero means and are uncorrelated. [10]
In a regression decision tree, state what values are kept in internal nodes, define the squared error criterion and describe how a leaf is split during training (without discussing splitting constraints). [5]
In a $K$-class classification decision tree, state what values are kept in internal nodes, define the Gini index and describe how a node is split during training (without discussing splitting constraints). [5]
In a $K$-class classification decision tree, state what values are kept in internal nodes, define the entropy criterion and describe how a node is split during training (without discussing splitting constraints). [5]
For binary classification, derive the Gini index from a squared error loss. [10]
For $K$-class classification, derive the entropy criterion from a non-averaged NLL loss. [10]
Describe how a random forest is trained (including bagging and a random subset of features) and how prediction is performed for regression and classification. [5]

Lecture 10 Questions

Write down the loss function which we optimize in a gradient boosting decision tree during the construction of the $t^\mathrm{th}$ tree. Then define $g_i$ and $h_i$ and show the value $w_\mathcal{T}$ of optimal prediction in node $\mathcal{T}$. [10]
Write down the loss function which we optimize in a gradient boosting decision tree during the construction of the $t^\mathrm{th}$ tree. Then define $g_i$ and $h_i$ and the criterion used during node splitting. [10]
How is the learning rate used during training and prediction of a gradient boosting decision tree? [5]
For a $K$-class classification, describe how to perform prediction with a gradient boosting decision tree trained for $T$ time steps (how the individual trees perform prediction and how the $K \cdot T$ trees are combined to produce the predicted categorical distribution). [5]
Considering a $K$-class classification, describe which trees are created during gradient boosted decision tree training and in which order, what the per-example loss looks like (expressed in detail using predictions of the already trained trees), and how we can compute the per-example gradient of every tree. You do not need to describe the training process of the individual trees themselves. [10]

Lecture 11 Questions

When deriving the first principal component, write the value of the variance we aim to maximize, both without and with the covariance matrix (and define the covariance matrix). [5]
When deriving the first $M$ principal components, write the value of the reconstruction loss we aim to minimize using all but the first $M$ principal components, both without and with the covariance matrix (and define the covariance matrix). [10]
Write down the formula for whitening (sphering) the data matrix $\boldsymbol X$, and state what mean and covariance the result has. [5]
Explain how to compute the PCA of dimension $M$ using the SVD decomposition of a data matrix $\boldsymbol X$, and why it works. [5]
Given a data matrix $\boldsymbol X$, write down the algorithm for computing the PCA of dimension $M$ using the power iteration algorithm. [10]
Describe the K-means algorithm, including the kmeans++ initialization. [10]

Lecture 12 Questions

Define the multivariate Gaussian distribution of dimension $D$. [5]
Show how to sample from a multivariate Gaussian distribution $\mathcal{N}(\boldsymbol \mu, \boldsymbol \Sigma)$ with a full covariance matrix, by using random samples from $\mathcal{N}(0, \boldsymbol I)$ distribution. [5]
Describe the constant surfaces of a multivariate Gaussian distribution with (1) $\sigma^2 \boldsymbol I$ covariance, (2) a diagonal covariance matrix, (3) a full covariance matrix. [5]
Considering a Gaussian mixture with $K$ clusters, explain how we represent the individual clusters and write down the likelihood of an example $\boldsymbol x$ for a given Gaussian mixture. [5]
Write down the log-likelihood of an $N$-element dataset for a given Gaussian mixture model with $K$ components. [5]
Considering the algorithm for Gaussian mixture clustering, write down the E step (how to compute the responsibilities) and the M step (how to update the means, covariances, and priors of the individual clusters). [10]
Write down the MSE loss of a regression problem, and formulate the bias-variance trade-off, i.e., the decomposition of expected MSE loss (with respect to a randomly sampled test set) into bias, variance, and irreducible error terms. [10]

Lecture 13 Questions

Considering statistical hypothesis testing, define type I errors and type II errors (in terms of the null hypothesis). Finally, define what a significance level is. [5]
Explain what a test statistic and a p-value are. [5]
Write down the steps of a statistical hypothesis test, including a definition of a p-value. [5]
Explain the differences between a one-sample test, two-sample test, and a paired test. [5]
When considering the multiple comparison problem, define the family-wise error rate, and prove the Bonferroni correction, which allows limiting the family-wise error rate by a given $\alpha$. [5]
For a trained model and a given test set with $N$ examples and metric $E$, write how to estimate 95% confidence intervals using bootstrap resampling. [5]
For two trained models and a given test set with $N$ examples and metric $E$, explain how to perform a paired bootstrap test that the first model is better than the other. [5]
For two trained models and a given test set with $N$ examples and metric $E$, explain how to perform a random permutation test that the first model is better than the other with a significance level $\alpha$. [5]
