This course is a comprehensive introduction to machine learning methods. The boundary between statistics and machine learning can appear blurred, because some methods are common to both disciplines. The basic distinction is that statistics is primarily oriented towards estimating parameters in order to interpret them, whereas machine learning is primarily oriented towards prediction. The course presents a number of popular prediction algorithms. On the one hand, we will study how to analyze the performance of these algorithms mathematically. On the other hand, R-based practice sessions will show how to use these methods in practice.
The final grade is composed of the continuous assessment grade (33%) and the final exam (67%). The continuous assessment grade is the mid-term grade, obtained in a computer-based test.
Difference between estimation (statistics) and prediction (ML); definitions of loss functions, risk, and empirical risk. Transition to a paradigm in which the basic object is the algorithm (ML) rather than the model (statistics).
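As an illustration of the empirical risk mentioned above, here is a minimal sketch written in Python rather than the R used in the practice sessions. The function names, the toy sample, and the threshold rule are all assumptions made for this example and are not taken from the course material. The empirical risk is simply the average loss of a predictor over an observed sample.

```python
def zero_one_loss(y_true, y_pred):
    """0-1 loss for classification: 1 for an error, 0 otherwise."""
    return 0.0 if y_true == y_pred else 1.0


def squared_loss(y_true, y_pred):
    """Squared loss, the usual choice for regression."""
    return (y_true - y_pred) ** 2


def empirical_risk(predictor, xs, ys, loss):
    """Average loss of `predictor` over the sample (xs, ys)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    return sum(loss(y, predictor(x)) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy sample and a hypothetical threshold classifier.
xs = [0.5, 1.5, 2.5, 3.5]
ys = [0, 0, 1, 0]


def threshold_rule(x):
    """Predict class 1 when x exceeds 2.0, class 0 otherwise."""
    return 1 if x > 2.0 else 0


risk = empirical_risk(threshold_rule, xs, ys, zero_one_loss)
print(risk)  # one error out of four observations: 0.25
```

The risk itself is the expectation of the same loss under the unknown data distribution, so the quantity computed here is only its sample-based estimate.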
2 Classification algorithms.
Methods from statistics: linear discriminant analysis. Nearest-neighbor method and other universally consistent methods. Decision trees and random forests.
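The nearest-neighbor method can be sketched in a few lines of Python. This is an illustrative version under simple assumptions (Euclidean distance, majority vote, a made-up two-cluster sample); the function name `knn_predict` and the data are hypothetical and do not come from the course.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort the training pairs by Euclidean distance to the query point.
    neighbors = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Vote among the labels of the k closest points.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical sample: two well-separated clusters labeled "a" and "b".
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (0.5, 0.5)))  # a
print(knn_predict(points, labels, (5.5, 5.5)))  # b
```

The number of neighbors k is a tuning parameter: small values follow the training sample closely, while large values smooth the decision rule.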
3 Regression algorithms.
Least squares method. Penalization methods: ridge estimator, LASSO estimator.
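To make the effect of penalization concrete, the following Python sketch treats the simplest possible case: a single feature and no intercept, where all three estimators have closed forms. The objective scalings are assumptions of this example (sum of squared residuals plus lam * b**2 for ridge, plus lam * |b| for the LASSO); other texts scale the penalty differently, and the data are made up.

```python
def least_squares_slope(xs, ys):
    """Minimize sum((y - b*x)**2) over b."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


def ridge_slope(xs, ys, lam):
    """Minimize sum((y - b*x)**2) + lam * b**2 over b."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def soft_threshold(z, t):
    """Shrink z towards zero by t, returning 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_slope(xs, ys, lam):
    """Minimize sum((y - b*x)**2) + lam * abs(b) over b."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return soft_threshold(sxy, lam / 2.0) / sxx


# Hypothetical noiseless data with true slope 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(least_squares_slope(xs, ys))  # 2.0
print(ridge_slope(xs, ys, 14.0))    # 1.0 (shrunk, but never exactly zero)
print(lasso_slope(xs, ys, 28.0))    # 1.0
print(lasso_slope(xs, ys, 56.0))    # 0.0 (the coefficient is set exactly to zero)
```

The last two lines show the qualitative difference between the penalties: ridge shrinks the coefficient proportionally, while the LASSO can set it exactly to zero once the penalty is large enough.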
4 Selection of estimators.
Discussion of empirical risk minimization methods. Training and test data. Cross-validation.
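Cross-validation can be sketched as follows in Python. This is a simplified version under stated assumptions: folds are contiguous blocks of indices without shuffling (in practice the data are usually shuffled first), and the helper names, the two candidate fitting procedures, and the data are hypothetical. Each fold is held out in turn, the procedure is fitted on the remaining data, and the held-out losses are averaged.

```python
def k_fold_indices(n, n_folds):
    """Split indices 0..n-1 into n_folds contiguous folds of near-equal size."""
    if not 1 < n_folds <= n:
        raise ValueError("n_folds must satisfy 1 < n_folds <= n")
    base, extra = divmod(n, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_risk(fit, xs, ys, loss, n_folds=3):
    """Average held-out loss of the procedure `fit` over the folds."""
    fold_risks = []
    for test_idx in k_fold_indices(len(xs), n_folds):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predictor = fit(train_x, train_y)  # fitted on training data only
        fold_loss = sum(loss(ys[i], predictor(xs[i])) for i in test_idx)
        fold_risks.append(fold_loss / len(test_idx))
    return sum(fold_risks) / len(fold_risks)


def fit_mean(train_x, train_y):
    """Constant predictor: always returns the training mean of y."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y


def fit_line(train_x, train_y):
    """Least-squares line through the origin."""
    slope = sum(x * y for x, y in zip(train_x, train_y)) / sum(x * x for x in train_x)
    return lambda x: slope * x


def squared_loss(y_true, y_pred):
    return (y_true - y_pred) ** 2


# Hypothetical noiseless data with y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]

cv_mean = cross_validated_risk(fit_mean, xs, ys, squared_loss)
cv_line = cross_validated_risk(fit_line, xs, ys, squared_loss)
print(cv_mean > cv_line)  # True: the linear fit is selected on this sample
```

Selecting the estimator with the smallest cross-validated risk avoids judging each candidate on the same data it was fitted to, which is the weakness of comparing raw empirical risks.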