ENSAE Paris - École d'ingénieurs pour l'économie, la data science, la finance et l'actuariat

Optimal Transport: From Theory to Tweaks, Computations and Applications in Machine Learning



Département : Statistics



This course will present recent methodological advances in machine learning that all have in common that they rely on geometric principles, and particularly on the idea that data analysis can be carried out using pairwise comparisons between data points. We will cover in particular the cases where such pairwise comparisons are distances or kernel similarities. The course will answer the following questions:

  1. Visualization of metric data: how can we represent and visualize data that is available under the form of a matrix of pairwise distances or similarities?
  2. Learning metrics: Given a task at hand (notably classification), how can we choose a "good" metric or kernel to improve the performance on that task?
  3. Metrics and kernels for exotic data-types (e.g. text, sequences, time-series, shapes, histograms): how can we choose a metrics or a kernel that performs well in supervised tasks? How can we ensure that they can be seamlessly used in a learning problem (e.g. auto-differentiated with modern frameworks such as tensorflow or pytorch)?


  1. (3 lectures + 1 programming session) Introduction, motivating examples (k-NN / SVM). Reminders on kernels and distances, Hilbert/Metric spaces, Positive / negative definiteness.
  2. (2 lectures + 1 programming session). Dimensionality reduction and visualization techniques for geometric data: (kernel)-PCA, (metric) multidimensional scaling, isomap, LLE, t-SNE, embeddings, extensions to (variational) autoencoders
  3.  (2 lectures + 1 programming session). Learning metrics. LMNN, localized FDA and other metric learning algorithms. Learning kernels: multiple kernel learning.
  4.  (3 lectures + 1 programming session). Metrics and kernels for structured data: Fisher kernels, kernels for strings/sequences/texts,  DTW/edit-distances for time-series, Distances/kernels on the simplex, Wasserstein and Gromov-Wasserstein metrics, autodifferentiation

Note: Lectures will be taught in english if non-french speakers register to the course, french otherwise.


Schölkopf, Bernhard, Alexander Smola, and Klaus-Robert Müller. "Nonlinear component analysis as a kernel eigenvalue problem." Neural computation10.5 (1998): 1299-1319.

Tenenbaum, Joshua B., Vin De Silva, and John C. Langford. "A global geometric framework for nonlinear dimensionality reduction." science290.5500 (2000): 2319-2323.

Roweis, Sam T., and Lawrence K. Saul. "Nonlinear dimensionality reduction by locally linear embedding." Science 290.5500 (2000): 2323-2326.

Maaten, Laurens van der, and Geoffrey Hinton. "Visualizing data using t-SNE." Journal of Machine Learning Research 9.Nov (2008): 2579-2605.

Davis, Jason V., et al. "Information-theoretic metric learning." Proceedings of the 24th international conference on Machine learning. ACM, 2007.

Weinberger, Kilian Q., and Lawrence K. Saul. "Distance metric learning for large margin nearest neighbor classification." Journal of Machine Learning Research 10.Feb (2009): 207-244.

Aurélien Bellet, Amaury Habrard, Marc Sebban: Metric Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers 2015

Kulis, Brian. "Metric learning: A survey." Foundations and Trends in Machine Learning 5.4 (2012): 287-364.

Cuturi, Marco, et al. "A kernel for time series based on global alignments."2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP'07. Vol. 2. IEEE, 2007.

Villani, Cédric. Optimal transport: old and new. Vol. 338. Springer Science & Business Media, 2008.

Mémoli, Facundo. "Gromov–Wasserstein distances and the metric approach to object matching." Foundations of computational mathematics 11.4 (2011): 417-487.