ENSAE Paris - Engineering school for economics, data science, finance and actuarial science

Information Theory for Machine Learning

Objective

The purpose of this module is to introduce students to the main concepts of information theory and its applications to statistics and machine learning.

Planning

The following material will be covered during this course. 

Entropy, Divergence, Chain Rule, Mutual Information (the core definitions are sketched just after this list)
Sufficient Statistics and the Data-Processing Inequality, Fano's Inequality
f-Divergences: Definition, Total Variation, Hellinger, Inequalities between f-Divergences, Examples (KL, χ²), Variational Representation
Asymptotic Equipartition Property (AEP) and Typicality
Information Projections and Large Deviations: Log-MGF and Rate Function, Sanov's Theorem
Metric Entropy: Covering and Packing, Finite-Dimensional Spaces and the Volume Bound, Sudakov Minoration, Slepian's Lemma, the Hilbert Ball, Infinite-Dimensional Spaces, Small-Ball Probability
Le Cam's Two-Point Method, Assouad's Lemma
Entropic Bounds: Yang-Barron, Le Cam-Birgé
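
For orientation, here is a minimal sketch, in LaTeX notation, of the standard quantities the first sessions build on, for a discrete random variable X with distribution p and distributions P, Q on a common alphabet:

  H(X) = -\sum_x p(x) \log p(x)                                % Shannon entropy
  H(X \mid Y) = -\sum_{x,y} p(x,y) \log p(x \mid y)            % conditional entropy
  D(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}               % Kullback-Leibler divergence
  I(X;Y) = D(P_{XY} \| P_X \otimes P_Y) = H(X) - H(X \mid Y)   % mutual information

Each f-divergence in the outline (KL, total variation, Hellinger, χ²) is an instance of D_f(P \| Q) = \sum_x Q(x) f(P(x)/Q(x)) for a convex function f with f(1) = 0.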


If time permits, we may also cover some elements of coding theory.

References

  1. Elements of Information Theory, Thomas M. Cover and Joy A. Thomas, 2nd ed., 2006.
  2. Information Theory: Coding Theorems for Discrete Memoryless Systems, Imre Csiszár and János Körner, 2nd ed., 2011.
  3. Information Theory: From Coding to Learning, Yury Polyanskiy and Yihong Wu, 2023.
  4. Ergodic Theory, Karl E. Petersen, 1983 (only the section on entropy is used).
  5. Entropy and Information Theory, Robert M. Gray, 2023.