ENSAE Paris - Engineering school for economics, data science, finance and actuarial science

Information Theory for Machine Learning

Objective

The purpose of this module is to introduce students to the main concepts of information theory and its applications to statistics and machine learning.

Planning

The following material will be covered during this course. 

Entropy, Divergence, Chain Rule, Mutual Information (the core definitions are sketched just after this list)
Sufficient Statistics and the Data-Processing Inequality, Fano's Inequality
f-Divergences: Definition, Total Variation, Hellinger, Inequalities between f-Divergences, Examples (KL, χ²), Variational Representation
Asymptotic Equipartition Property (AEP) and Typicality
Information Projections and Large Deviations: Log-MGF and Rate Function, Sanov's Theorem
Metric Entropy: Covering and Packing, Finite-Dimensional Spaces and the Volume Bound, Sudakov Minoration, Slepian's Lemma, the Hilbert Ball, Infinite-Dimensional Spaces, Small-Ball Probability
Le Cam's Two-Point Method, Assouad's Lemma
Entropic Bounds: Yang-Barron, Le Cam-Birgé
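
For orientation, here is a minimal sketch, in LaTeX notation, of the standard quantities the first sessions build on, for a discrete random variable X with distribution p and distributions P, Q on a common alphabet:

  H(X) = -\sum_x p(x) \log p(x)                                % Shannon entropy
  H(X \mid Y) = -\sum_{x,y} p(x,y) \log p(x \mid y)            % conditional entropy
  D(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}               % Kullback-Leibler divergence
  I(X;Y) = D(P_{XY} \| P_X \otimes P_Y) = H(X) - H(X \mid Y)   % mutual information

Each f-divergence in the outline (KL, total variation, Hellinger, χ²) is an instance of D_f(P \| Q) = \sum_x Q(x) f(P(x)/Q(x)) for a convex function f with f(1) = 0.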


If time permits, we may also cover some elements of coding theory.

References

  1. Elements of Information Theory, Thomas M. Cover and Joy A. Thomas, 2nd ed., 2006.
  2. Information Theory: Coding Theorems for Discrete Memoryless Systems, Imre Csiszár and János Körner, 2nd ed., 2011.
  3. Information Theory: From Coding to Learning, Yury Polyanskiy and Yihong Wu, 2023.
  4. Ergodic Theory, Karl E. Petersen, 1983 (only the section on entropy is used).
  5. Entropy and Information Theory, Robert M. Gray, 2023.