ENSAE Paris - Engineering school for economics, data science, finance and actuarial science

Parallel Programming for Machine Learning


This course is taught by Xavier Dupré and Matthieu Durut.

This course covers CPU programming (on the central processing unit) and GPU programming (on graphics cards). It is geared towards writing efficient programs that take advantage of the hardware architecture.

The first part of the course is dedicated to computer architecture, and in particular everything that enables programs to run in parallel and communicate.

The sessions that follow put these notions into practice: first, C++ programming on CPU, with several examples of efficient algorithm implementations, then GPU programming.


  1. Architecture: hardware, shared memory, order of magnitude of CPU speed, communication

  2. Parallel execution: algorithms, multithreading, race conditions, locks

  3. CPU parallelisation: development tools, examples of parallel programs, application to machine learning algorithms (Random Forest, etc.)

  4. GPU programming:

    1. CUDA, threads, memory management

    2. Pointers, GPU/CPU interaction, using __inline__ and __global__

    3. PyTorch: extension implementation
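The GPU items above can be sketched in a short CUDA example (an illustration under standard CUDA conventions, not code from the course): a `__global__` kernel adds two vectors, while the host handles memory management (`cudaMalloc`, `cudaMemcpy`) and the CPU/GPU interaction of launching the kernel and copying the result back.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __global__ marks a kernel: a function launched from the CPU (host)
// and executed in parallel by many GPU threads.
__global__ void add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h_a = new float[n], *h_b = new float[n], *h_c = new float[n];
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // GPU memory management: allocate device buffers and copy inputs over.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch: blocks of 256 threads, enough blocks to cover all n elements.
    add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);

    // CPU/GPU interaction: copy the result back to host memory.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);  // expect 1.0 + 2.0 = 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
    return 0;
}
```

The same vector addition is roughly what a PyTorch C++/CUDA extension would wrap, which is the subject of the final session.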