Deployment of Data-Science Projects

Teacher

AVOUAC Romain

ECTS:
3

Course Hours:
18

Tutorials Hours:
0

Language:
French

Examination Modality:
Report (mémoire)

Objective

This course is taught by Lino Galiana and Romain Avouac.

This course covers the key aspects of deploying data-science projects. Throughout the course, we will follow a running example: an API that serves a machine learning model.
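To give a concrete idea of what "an API serving a model" can look like, here is a minimal sketch using only the Python standard library. It is not the course's actual example: the weights, the `/predict` route and the JSON payload shape are all hypothetical, and a real project would typically load a trained model and use a dedicated web framework.

```python
import json

# Hypothetical "model": fixed linear weights standing in for a trained estimator.
WEIGHTS = [0.5, -0.25]
BIAS = 1.0


def predict(features):
    """Return the linear score for one observation."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def app(environ, start_response):
    """WSGI application exposing POST /predict with a JSON body."""
    is_predict = (
        environ.get("REQUEST_METHOD") == "POST"
        and environ.get("PATH_INFO") == "/predict"
    )
    if not is_predict:
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [json.dumps({"error": "not found"}).encode()]
    try:
        # Read and parse the request body, then score it with the model.
        size = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(size))
        result = {"prediction": predict(payload["features"])}
        status = "200 OK"
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing key, or wrong feature type/count.
        result = {"error": str(exc)}
        status = "400 Bad Request"
    start_response(status, [("Content-Type", "application/json")])
    return [json.dumps(result).encode()]
```

For local experimentation, `app` can be served with `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()`. Keeping `predict` separate from the HTTP layer means the model logic can be tested without starting a server.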

The evaluation of the course is twofold:

- first, in groups of 2 or 3, students will choose a personal project, bring it in line with best practices, choose a publication format and deploy it on a production infrastructure

- second, each student will take the role of a data scientist reviewing contributions to a project. Working individually, they will evaluate the project of another group (peer review) and discuss its technical choices and practices

Planning

Prerequisite

This course follows the Infrastructure and Software System course of the first semester. It is highly recommended to have taken that course or to be familiar with the following:

Git version control

Linux Terminal

Students are also assumed to have taken Python for data-science (second year). If not, you can browse the content of that course on the course website.

Part 1: Best development practices

Code quality standards
Project architecture

Working collaboratively with Git and GitHub

Part 2: Towards deployment

Maximize reproducibility and portability of projects

Virtual environments: venv and conda
Containers: Docker

Deploy and showcase a data-science project

Production environment
Continuous integration (CI) and continuous deployment (CD)
Principles of Kubernetes
Formats for showcasing a project

Introduction to MLOps

References

Course website: https://ensae-reproductibilite.github.io/website/