ENSAE Paris - École d'ingénieurs pour l'économie, la data science, la finance et l'actuariat

Deployment of Data-Science Projects

Objective

This course is taught by Lino Galiana and Romain Avouac.

This course covers the most important aspects of deploying data-science projects. Throughout the course, we will follow the example of an API used to serve a machine learning model.

The evaluation of the course is twofold:

- first, in groups of 2 or 3, students will choose a personal project, bring it in line with best practices, choose a format, and publish it on a production infrastructure

- second, each student will act as a data scientist reviewing the quality of contributions to a project. Working individually, they will evaluate another group's project (peer review) and discuss the technical choices and practices used

Planning

Prerequisite

This course follows the Infrastructure and Software System course of the first semester. It is highly recommended to have followed that course or to know the following:

It is also assumed that Python for data-science (second year) has been taken. If not, you can browse its content on the course website.

Part 1: Best development practices

  • Code quality standards

  • Project architecture

  • Working collaboratively with Git and GitHub

Part 2: Towards deployment

  • Maximize reproducibility and portability of projects

  • Virtual environments: venv and conda

  • Containers: Docker

  • Deploy and showcase a data-science project

  • Production environment

  • Continuous integration (CI) and continuous deployment (CD)

  • Principles of Kubernetes 

  • Formats for showcasing a project

  • Introduction to MLOps