Introduction to Applied Statistical Learning – CI/MS


Objective

This course is a comprehensive introduction to machine learning methods. It introduces the typical problems of data description and modeling, with the goal of better predicting the response of a new individual. We will describe the algorithms and quantify their performance; in parallel, R-based work sessions will show how to use these methods in practice.

By the end of this course, students should be able to:

  • Set up classification or regression methods
  • Understand the theory behind the methods presented
  • Read and interpret the numerical outputs of these methods
     

Course outline

Introduction.

  • Difference between estimation (statistics) and prediction (machine learning); definitions of loss functions, risk and empirical risk.
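
As a reminder of the vocabulary used above, here is one standard way to write these quantities. The notation is an assumption and not taken from the course material: $(X, Y)$ is a random pair, $f$ a prediction rule, $\ell$ a loss function and $(X_1, Y_1), \dots, (X_n, Y_n)$ an i.i.d. sample.

```latex
\begin{align*}
  \text{Risk:} \quad
    & R(f) = \mathbb{E}\bigl[\ell\bigl(Y, f(X)\bigr)\bigr] \\
  \text{Empirical risk:} \quad
    & \widehat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(Y_i, f(X_i)\bigr) \\
  \text{0-1 loss (classification):} \quad
    & \ell(y, y') = \mathbf{1}\{y \neq y'\} \\
  \text{Squared loss (regression):} \quad
    & \ell(y, y') = (y - y')^2
\end{align*}
```

The risk depends on the unknown distribution of $(X, Y)$, whereas the empirical risk can be computed from the data.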

Classification algorithms.

  • Methods from statistics, linear discrimination. The nearest neighbor method and other universally consistent methods. Decision trees and random forests.
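
To make the nearest neighbor method concrete, here is a minimal sketch of a k-nearest-neighbor classifier. It is written in Python rather than the R used in the work sessions, and the function name `knn_predict` and the toy data are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    # Count the labels of the k closest points.
    votes = Counter(train_y[i] for i in order[:k])
    # Return the most frequent label among the k neighbors.
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated groups in the plane.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_x, train_y, (0.15, 0.15)))
print(knn_predict(train_x, train_y, (0.95, 1.00)))
```

The choice of k controls the trade-off between a very local rule (small k) and a smoother one (large k).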

Regression algorithms.

  • Least squares method. Penalization methods: ridge, LASSO and elastic net estimators.
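
For reference, these estimators can be written as minimizers of penalized least squares criteria. The notation is an assumption: $y \in \mathbb{R}^n$ is the response vector, $X$ the $n \times p$ design matrix, and $\lambda, \lambda_1, \lambda_2 \ge 0$ are tuning parameters. Other parametrizations of the elastic net penalty are also in common use.

```latex
\begin{align*}
  \hat\beta^{\mathrm{LS}}
    &\in \arg\min_{\beta \in \mathbb{R}^p} \; \|y - X\beta\|_2^2 \\
  \hat\beta^{\mathrm{ridge}}_{\lambda}
    &= \arg\min_{\beta \in \mathbb{R}^p} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2 \\
  \hat\beta^{\mathrm{LASSO}}_{\lambda}
    &\in \arg\min_{\beta \in \mathbb{R}^p} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_1 \\
  \hat\beta^{\mathrm{EN}}_{\lambda_1, \lambda_2}
    &\in \arg\min_{\beta \in \mathbb{R}^p} \; \|y - X\beta\|_2^2
       + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2
\end{align*}
```

For $\lambda > 0$ the ridge criterion is strictly convex, so its minimizer is unique; the LASSO and least squares minimizers need not be.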

Selection of estimators.

  • Empirical risk minimization methods. Training and test data. Cross-validation.
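
As an illustration of how cross-validation can be used to compare estimators, here is a minimal sketch in Python (the work sessions themselves use R). All names (`k_fold_risk`, `fit_mean`, `fit_line`) and the simulated data are hypothetical and chosen only for this example.

```python
import random


def k_fold_risk(xs, ys, fit, loss, k=5, seed=0):
    """Estimate the risk of the rule produced by `fit` with k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]  # k disjoint folds covering all indices
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training part only, then evaluate on the held-out fold.
        predictor = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum(loss(ys[i], predictor(xs[i])) for i in held_out)
    return total / len(xs)


def fit_mean(xs, ys):
    """Constant predictor: always returns the training mean of the response."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least squares fit of a line with a single covariate."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def squared_loss(y, prediction):
    return (y - prediction) ** 2


# Simulated data (assumption): a noisy linear relationship.
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.3) for x in xs]

risk_mean = k_fold_risk(xs, ys, fit_mean, squared_loss)
risk_line = k_fold_risk(xs, ys, fit_line, squared_loss)
print("constant predictor:", risk_mean)
print("least squares line:", risk_line)
```

The estimator with the smaller cross-validated risk would be the one selected; here that should be the least squares line, since the data were simulated from a linear model.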

References

Devroye, Györfi, Lugosi – A Probabilistic Theory of Pattern Recognition – (1996) Springer-Verlag
Hastie, Tibshirani, Friedman – The Elements of Statistical Learning – (2008) Springer Series in Statistics