This course provides an introduction to online learning. The aim of online
learning is to develop efficient recursive prediction algorithms for data
that arrive sequentially, as a stream, rather than as an array given once
and for all. The course covers several fundamental problems and methods,
such as online gradient descent, prediction with expert advice, and the
bandit problem. One of the key ideas is, given several candidate
predictors, to make them vote at each step of the algorithm rather than to
select one of them. A closely related problem is that of aggregation, which
consists in constructing statistical estimators that predict almost as well
as the best estimator in a given set. It will be shown that these online
voting techniques, based specifically on sequential exponential weighting,
solve the problem of aggregation in an optimal way under very general
conditions.
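The voting idea can be illustrated by the exponentially weighted average forecaster: experts whose past predictions incurred small loss receive large weights, and the algorithm predicts with the weighted vote rather than by selecting a single expert. A minimal sketch, assuming square loss and NumPy; the function name and the fixed learning rate `eta` are illustrative choices:

```python
import numpy as np

def exp_weighted_forecaster(expert_preds, outcomes, eta=0.5):
    """Exponentially weighted average forecaster (illustrative sketch).

    expert_preds: (T, K) array, expert_preds[t, k] is expert k's
    prediction at round t; outcomes: length-T array of observations.
    The square loss here is one possible choice.
    """
    T, K = expert_preds.shape
    log_w = np.zeros(K)                    # log-weights, start uniform
    preds = np.empty(T)
    for t in range(T):
        w = np.exp(log_w - log_w.max())    # stable exponentiation
        w /= w.sum()                       # normalized weights
        preds[t] = w @ expert_preds[t]     # weighted vote, not selection
        losses = (expert_preds[t] - outcomes[t]) ** 2
        log_w -= eta * losses              # multiplicative weight update
    return preds
```

With two experts, one always correct and one always wrong, the forecaster's weight concentrates on the good expert and its cumulative loss stays within a small additive term of the best expert's, which is the regret guarantee the course makes precise.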
- Basics: online regret, realizable case, halving.
- Online gradient descent for convex and strongly convex losses. Online-to-batch conversion. Perceptron.
- Exponential weighting. Prediction with expert advice.
- Exponential weighting in the bandit problem.
- Aggregation of estimators.
- Gradient-free online learning. Continuous bandit problem.
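For the online gradient descent topic above, a minimal sketch of the projected variant on a Euclidean ball; the interface (`loss_grad(t, x)` returning a subgradient of the round-t loss) and the step size eta_t = 1/sqrt(t+1), which targets O(sqrt(T)) regret for convex Lipschitz losses, are illustrative assumptions:

```python
import numpy as np

def ogd(loss_grad, x0, T, radius=1.0):
    """Projected online gradient descent on the ball of given radius.

    loss_grad(t, x) returns a (sub)gradient of the round-t loss at x.
    Returns the sequence of iterates (interface is illustrative).
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for t in range(T):
        g = loss_grad(t, x)
        x = x - g / np.sqrt(t + 1)         # gradient step, eta_t = 1/sqrt(t+1)
        norm = np.linalg.norm(x)
        if norm > radius:                  # Euclidean projection onto the ball
            x *= radius / norm
        iterates.append(x.copy())
    return np.array(iterates)
```

On a fixed quadratic loss the iterates settle near the minimizer, and averaging the iterates is the online-to-batch conversion mentioned above.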
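For the bandit topic, exponential weighting carries over via importance-weighted loss estimates, since only the pulled arm's reward is observed. A minimal EXP3-style sketch; the interface, the fixed `eta` (a tuned value would scale like sqrt(log K / (T K))), and rewards in [0, 1] are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def exp3(reward, K, T, eta=0.1):
    """EXP3-style bandit algorithm (illustrative sketch).

    reward(t, arm) in [0, 1] is observed only for the pulled arm.
    Returns the pull counts per arm.
    """
    log_w = np.zeros(K)
    pulls = np.zeros(K, dtype=int)
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        p = w / w.sum()                    # sampling distribution
        arm = rng.choice(K, p=p)
        pulls[arm] += 1
        r = reward(t, arm)
        # importance-weighted loss estimate for the pulled arm
        # (unpulled arms implicitly get estimate 0)
        loss_hat = (1.0 - r) / p[arm]
        log_w[arm] -= eta * loss_hat
    return pulls
```

With one arm always rewarding and the other never, the sampling probability concentrates on the good arm despite the partial feedback.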
Cesa-Bianchi, N., Lugosi, G. (2006) Prediction, Learning, and Games. Cambridge Univ. Press.
Shalev-Shwartz, S. (2011) Online Learning and Online Convex Optimization. Foundations and Trends in Machine Learning, vol. 4, pages 107-194.