Information Theory for Machine Learning
Teacher:
ECTS: 4
Course Hours: 18
Tutorial Hours: 6
Language: Remedial English
Examination Modality: written report (mém.)
Objective
The purpose of this module is to introduce students to the main concepts of information theory and its applications to statistics and machine learning.
Planning
The following material will be covered during this course.
Entropy, Divergence, Chain Rule, Mutual Information (a short numerical sketch is given after this list)
Sufficient Statistics and Data-Processing Inequality, Fano's inequality
f-divergences: Definition, Total Variation, Hellinger, Inequalities between f-divergences, Examples (KL, chi-squared), Variational Representation (the standard definitions are recalled after this list)
Asymptotic Equipartition Property (AEP) and typicality
Information Projections and Large Deviations: log MGF and rate function, Sanov's theorem
Metric Entropy: Covering and Packing, Finite-Dimensional Space and Volume bound, Sudakov Minoration, Slepian’s lemma, Hilbert ball, Infinite-Dimensional Spaces, Small-ball Probability
Le Cam's 2-point method, Assouad's Lemma
Entropic Bounds: Yang-Barron, Le Cam-Birgé
If time permits, we may also discuss a bit of coding theory.
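
As a quick illustration of the first item in the outline above (entropy, divergence, mutual information), here is a minimal numerical sketch in Python. It is not part of the course materials; the function names and the example joint distribution are purely illustrative.

import numpy as np

def entropy(p):
    # Shannon entropy H(p) in bits; zero-probability entries are ignored.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl_divergence(p, q):
    # KL divergence D(p || q) in bits; assumes q > 0 wherever p > 0.
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def mutual_information(joint):
    # I(X;Y) = D(P_XY || P_X x P_Y) for a discrete joint probability table.
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    return kl_divergence(joint.ravel(), (px * py).ravel())

# Illustrative joint distribution of two correlated binary variables.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(entropy(joint.sum(axis=1)))   # H(X) = 1 bit
print(mutual_information(joint))    # I(X;Y) ~ 0.28 bits

The last call computes I(X;Y) as the KL divergence between the joint law and the product of its marginals.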
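
For the f-divergence item above, the following is a reminder of the usual conventions (not course-specific notation): for a convex function f with f(1) = 0,

\[
  D_f(P \,\|\, Q) \;=\; \mathbb{E}_{Q}\!\left[ f\!\left( \frac{\mathrm{d}P}{\mathrm{d}Q} \right) \right],
\]

and the examples listed correspond to the canonical choices

\[
  f(x) = x \log x \;\Rightarrow\; \mathrm{KL}(P\,\|\,Q), \qquad
  f(x) = (x-1)^2 \;\Rightarrow\; \chi^2(P\,\|\,Q), \qquad
  f(x) = \tfrac{1}{2}\,|x-1| \;\Rightarrow\; \mathrm{TV}(P,Q), \qquad
  f(x) = \bigl(1-\sqrt{x}\bigr)^2 \;\Rightarrow\; H^2(P,Q).
\]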
References
- Elements of Information Theory, Thomas M. Cover and Joy A. Thomas, 2005
- Information Theory: Coding Theorems for Discrete Memoryless Systems, Imre Csiszár and János Körner, 2012
- Information Theory, Yury Polyanskiy and Yihong Wu, 2023
- Ergodic Theory, Karl E. Petersen, 1983 (only the section on entropy is used)
- Entropy and Information Theory, Robert M. Gray, 2023