ENSAE Paris - Engineering school for economics, data science, finance and actuarial science

High-dimensional statistics

Objective

Statistical science has undergone a profound transformation over the last decade with the development of large-scale statistical inference methods. This development has been driven by the need to deal with new kinds of data, in which a large number of variables is observed for each individual, sometimes more than the number of individuals in the sample. Of course, not all variables are relevant, and usually only a few are. The notion of parsimony (sparsity) is therefore fundamental to the statistical interpretation of large-scale data. The aim of this course is to present some of the founding principles that emerge in this context. These principles are common to many problems that have appeared recently, such as high-dimensional linear regression, estimation of large low-rank matrices, and network models such as the stochastic block model. Emphasis will be placed on the construction of methods that are optimal in terms of rate of convergence, and on their oracle properties.

Planning

  1. Gaussian sequence model. Sparsity and thresholding procedures (see the sketch after this list).
  2. High-dimensional linear regression. BIC, Lasso, Dantzig selector, and square-root Lasso methods.
  3. Oracle properties and variable selection. 
  4. Estimation of large low-rank matrices. Sparse PCA.
  5. Network inference. Stochastic block model. 
     
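As a minimal illustration of item 1 (not part of the official course material), the sketch below shows soft thresholding in the Gaussian sequence model, assuming NumPy. The universal threshold sigma * sqrt(2 log n) used here is one standard choice; the simulated sparse signal is purely hypothetical.

```python
import numpy as np

def soft_threshold(y, lam):
    """Soft-thresholding estimator: shrink each coordinate of y toward zero by lam."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# Hypothetical example: sparse mean vector observed with Gaussian noise, y_i = theta_i + sigma * xi_i
rng = np.random.default_rng(0)
n = 1000
theta = np.zeros(n)
theta[:10] = 5.0                      # only 10 nonzero coordinates (sparsity)
sigma = 1.0
y = theta + sigma * rng.standard_normal(n)

# Universal threshold sigma * sqrt(2 log n)
lam = sigma * np.sqrt(2 * np.log(n))
theta_hat = soft_threshold(y, lam)
print(np.count_nonzero(theta_hat))    # number of coordinates kept by the estimator
```

Most noise coordinates fall below the threshold and are set exactly to zero, which is the sense in which thresholding exploits sparsity.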

References

Alexandre Tsybakov. High-dimensional Statistics. Lecture Notes. (Detailed Lecture Notes are available.)