ENSAE Paris - Engineering school for economics, data science, finance and actuarial science

Introduction to Applied Statistical Learning - CI/MS

Teacher

BUTUCEA Cristina

Department: Statistics

Objective

This course is a comprehensive introduction to machine learning methods. It introduces the typical problems of data description and modeling, with the goal of better predicting the response of a new individual. We will describe the algorithms and quantify their theoretical performance; in parallel, R-based work sessions will show how to use these methods in practice.

At the end of this course, students should be able to:

  • Set up classification or regression methods
  • Understand the theory behind the methods presented
  • Read and interpret the numerical outputs of these methods

Outline

Introduction.

  • Difference between estimation (statistics) and prediction (machine learning); definitions of loss function, risk, and empirical risk.
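
For orientation, the three notions named above can be written as follows. This is a sketch in standard notation, not necessarily the notation used in the lectures: the symbols \ell, f, R and \widehat{R}_n and the sample (X_1, Y_1), ..., (X_n, Y_n) are assumptions introduced here.

```latex
% Loss: cost of predicting f(x) when the true response is y, e.g.
%   squared loss (regression):   \ell(y, f(x)) = (y - f(x))^2
%   0-1 loss (classification):   \ell(y, f(x)) = \mathbf{1}\{ y \neq f(x) \}
\[
  R(f) = \mathbb{E}\big[ \ell(Y, f(X)) \big],
  \qquad
  \widehat{R}_n(f) = \frac{1}{n} \sum_{i=1}^{n} \ell\big( Y_i, f(X_i) \big)
\]
```

The risk R(f) is an expectation over a new, unseen pair (X, Y); the empirical risk is its average over the observed sample.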

Classification algorithms.

  • Methods from statistics, linear discrimination. Nearest neighbor method and other universally consistent methods. Decision trees and random forests.
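
As an illustration of the nearest neighbor method, here is a minimal sketch of a k-nearest-neighbor classifier using only the Python standard library. The course sessions use R, so this is not the course's own code; the function name `knn_predict`, the toy data, and the choice of Euclidean distance with k = 3 are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Rank training indices by Euclidean distance to the query point x.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    # Count the labels of the k closest points and return the most frequent one.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration: two well-separated clusters in the plane.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (0.15, 0.15)))
print(knn_predict(train_X, train_y, (0.95, 1.0)))
```

With two classes, an odd k avoids tied votes; in general a tie-breaking rule would have to be chosen.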

Regression algorithms.

  • Least squares method. Penalization methods: ridge estimator, LASSO estimator and elastic net.
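
The estimators listed above are commonly defined as minimizers of a least squares criterion with an added penalty. The display below is a sketch in one common normalization, which may differ from the one used in the lectures; the symbols Y (response vector), X (design matrix), \beta (coefficients) and the tuning parameters \lambda, \lambda_1, \lambda_2 \ge 0 are assumptions introduced here.

```latex
\begin{align*}
  \hat\beta^{\mathrm{LS}}    &\in \arg\min_{\beta} \; \| Y - X\beta \|_2^2 \\
  \hat\beta^{\mathrm{ridge}} &\in \arg\min_{\beta} \; \| Y - X\beta \|_2^2 + \lambda \|\beta\|_2^2 \\
  \hat\beta^{\mathrm{LASSO}} &\in \arg\min_{\beta} \; \| Y - X\beta \|_2^2 + \lambda \|\beta\|_1 \\
  \hat\beta^{\mathrm{EN}}    &\in \arg\min_{\beta} \; \| Y - X\beta \|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2
\end{align*}
```

Setting \lambda_1 = 0 or \lambda_2 = 0 in the elastic net criterion recovers ridge or LASSO respectively.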

Selection of estimators.

  • Empirical risk minimization methods. Training and test data. Cross-validation.
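
To make cross-validation concrete, here is a minimal sketch that selects a ridge penalty by 5-fold cross-validation, using only the Python standard library. The course sessions use R, so this is not the course's own code. The one-feature, no-intercept model, the simulated data, the candidate penalties, and the helper names `ridge_1d` and `cv_risk` are all assumptions made for the example.

```python
import random


def ridge_1d(xs, ys, lam):
    """Ridge estimate of beta in the one-feature, no-intercept model y = beta * x + noise."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_risk(xs, ys, lam, n_folds=5):
    """Mean squared prediction error over held-out folds."""
    n = len(xs)
    total = 0.0
    for fold in range(n_folds):
        # Fold number `fold` holds out every n_folds-th observation.
        test_idx = set(range(fold, n, n_folds))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        # Fit on the training part only, then evaluate on the held-out part.
        beta = ridge_1d(train_x, train_y, lam)
        total += sum((ys[i] - beta * xs[i]) ** 2 for i in test_idx)
    return total / n


# Simulated data, made up for illustration: true slope 2 with small Gaussian noise.
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(50)]
ys = [2.0 * x + random.gauss(0, 0.1) for x in xs]

candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
risks = {lam: cv_risk(xs, ys, lam) for lam in candidates}
best = min(risks, key=risks.get)
print(best, risks[best])
```

Each observation is held out exactly once, so the returned value is an empirical risk computed only on data that was not used to fit the corresponding estimator.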

References

Devroye, Györfi, Lugosi - A Probabilistic Theory of Pattern Recognition - (1996) Springer-Verlag
Hastie, Tibshirani, Friedman - The Elements of Statistical Learning - (2008) Springer Series in Statistics