Dynamic optimization and reinforcement learning


Objective

Dynamic optimization studies dynamic systems that evolve deterministically or under uncertainty and that can be steered by a control in order to optimize a given criterion (optimal control). Its origins and applications are very diverse: engineering (controlling a rocket's trajectory), mechanics (steering and accelerating a car), management, economics and finance, machine learning, video games, robotics, etc.
The objective of this course is to present the basic mathematical tools and approaches of optimal control theory, in particular dynamic programming, and to illustrate them with concrete applications, especially in economics and finance. The first part covers the deterministic framework; the second part covers the stochastic framework, with an introduction to the theoretical and algorithmic aspects of reinforcement learning.
 

Outline

Part 1 – Deterministic Optimization

  1. Introduction: discrete-time model
  2. Dynamic programming approach in continuous time
  3. Pontryagin's maximum principle in continuous time

Part 2 – Introduction to discrete-time stochastic optimization and reinforcement learning 

  1. Introduction
  2. Markov decision processes
  3. Bellman's principle of optimality
  4. Reinforcement learning algorithms

References

Part 1

  1. Carlier G. (2007), Programmation dynamique, lecture notes, ENSAE.
  2. Fleming W.H. and Rishel R.W. (1975), Deterministic and Stochastic Optimal Control, Springer-Verlag.
  3. Kamien M. and Schwartz N. (1991), Dynamic Optimization, 2nd edition, North Holland.
  4. Trélat E. (2008), Contrôle optimal : théorie et applications, 2nd edition, Vuibert.

Part 2

  1. Bäuerle N. and Rieder U. (2011), Markov Decision Processes with Applications to Finance, Springer.
  2. Sutton and Barto (1998), Reinforcement Learning: An Introduction.
  3. Szepesvári (2009), Algorithms for Reinforcement Learning.
  4. Groupe PDMIA (2008), Processus décisionnels de Markov en intelligence artificielle.