Machine Learning for Econometrics


This course covers recent applications of high-dimensional statistics and machine learning to econometrics, including variable selection, inference with high-dimensional nuisance parameters in different settings, heterogeneity, networks and text data. The focus will be on policy evaluation problems. Recent advances in the econometrics of policy evaluation, such as the synthetic control method and directed acyclic graphs (DAGs), will be reviewed. If time allows, the course will also review optimal policy estimation and learning.

The goal of the course is to give insight into these new methods, their benefits and their limitations. It will mostly benefit students who are highly curious about recent advances in econometrics, whether they want to study the theory or use these methods in applied work. Students are expected to be familiar with Econometrics 2 (2A) and Statistical Learning (3A).

A written exam will take place at the end of the semester.


  1. Introduction
  2. High Dimension, Model Selection and Post-Selection Inference
  3. Methodology: Using Machine Learning Tools in Econometrics
  4. High-Dimension and Endogeneity
  5. The Synthetic Control Method
  6. Machine Learning Methods for Heterogeneous Treatment Effects
  7. Network Data and Peer Effects
  8. Analysis of Text Data


Lecture notes are available online at

There are no required textbooks but general references are:

Angrist, J.D. and Pischke, J.S. (2008) “Mostly Harmless Econometrics”, Princeton University Press.

Imbens, G. and Rubin, D. (2015) “Causal Inference for Statistics, Social and Biomedical Sciences”, Cambridge University Press.

Mullainathan, S. and Spiess, J. (2017) “Machine Learning: An Applied Econometric Approach”, Journal of Economic Perspectives, Vol. 31, No. 2.

Wooldridge, J.M. (2010) “Econometric Analysis of Cross Section and Panel Data”, second edition, MIT Press.