Information Theory for Machine Learning
Instructor
KHALEGHI Azadeh
Department: Statistics
ECTS credits: 4
Lecture hours: 18
Tutorial hours: 6
Language: English
Examination: written
Objective
The purpose of this module is to introduce students to the main concepts of information theory and their applications to statistics and machine learning. The course covers entropy, divergence measures, the chain rule, mutual information, sufficient statistics, and the data-processing inequality. Students will study f-divergences (including total variation, Hellinger, KL, and chi-square divergences), the asymptotic equipartition property (AEP) and typicality, information projections and large deviations (log moment generating functions, rate functions, Sanov’s theorem), metric entropy in finite- and infinite-dimensional spaces, as well as minimax lower bound techniques such as Le Cam’s two-point method and Assouad’s lemma. Entropic bounds (Yang–Barron, Le Cam–Birgé) will also be discussed, and if time permits, a brief introduction to coding theory will be offered.
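A minimal sketch of how the first quantities in this list relate to one another, assuming a small hypothetical joint distribution on two binary variables (the numbers and function names are illustrative only, not course material). It computes the mutual information I(X;Y) in two ways: as H(X) + H(Y) - H(X,Y), and as the KL divergence between the joint distribution and the product of its marginals.

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a probability vector p."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

# Hypothetical joint distribution of (X, Y) on {0, 1} x {0, 1};
# rows index X, columns index Y.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

# Marginals of X (row sums) and Y (column sums).
p_x = [sum(row) for row in joint]
p_y = [sum(col) for col in zip(*joint)]

# Flatten the joint and build the product of marginals in the same order.
flat_joint = [v for row in joint for v in row]
product = [a * b for a in p_x for b in p_y]

h_x = entropy(p_x)
h_y = entropy(p_y)
h_xy = entropy(flat_joint)

# Mutual information computed two equivalent ways.
mi_from_entropies = h_x + h_y - h_xy
mi_from_kl = kl_divergence(flat_joint, product)

print(f"H(X) = {h_x:.4f}, H(Y) = {h_y:.4f}, H(X,Y) = {h_xy:.4f}")
print(f"I(X;Y) via entropies = {mi_from_entropies:.4f}")
print(f"I(X;Y) via KL        = {mi_from_kl:.4f}")
```

The two values of I(X;Y) agree up to floating-point error, and both are positive here because X and Y are dependent under the chosen joint distribution.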
Outline
The following material will be covered during this course.
Entropy, Divergence, Chain Rule, Mutual Information
Sufficient Statistics and Data-Processing Inequality, Fano's Inequality
f-divergences: Definition, Examples (Total Variation, Hellinger, KL, Chi-Square), Inequalities between f-divergences, Variational Representation
Asymptotic Equipartition Property (AEP) and typicality
Information Projections and Large Deviations: log MGF and rate function, Sanov's theorem
Metric Entropy: Covering and Packing, Finite-Dimensional Space and Volume Bound, Sudakov Minoration, Slepian's Lemma, Hilbert Ball, Infinite-Dimensional Spaces, Small-Ball Probability
Le Cam's Two-Point Method, Assouad's Lemma
Entropic Bounds: Yang–Barron, Le Cam–Birgé
If time permits, we may also discuss a bit of coding theory. 
References
- Elements of Information Theory, Thomas M. Cover and Joy A. Thomas, 2005
- Information Theory: Coding Theorems for Discrete Memoryless Systems, Imre Csiszár and János Körner, 2012
- Information Theory, Yury Polyanskiy and Yihong Wu, 2023
- Ergodic Theory, Karl E. Petersen, 1983 (only the section on Entropy is consulted)
- Entropy and Information Theory, Robert M. Gray, 2023