ENSAE Paris - Engineering school for economics, data science, finance and actuarial science

Python for Data Science


Python has recently become a compelling alternative for scientists. Because it is a general-purpose language, the entire data workflow, from processing the data source to visualization, can be handled without switching languages. This course introduces tools that make the data "speak" so that results can be obtained quickly.


Part 1: Handling Data

* Introduction:

                Back to the basics of Python

                Presentation of the Python Ecosystem for Data Science

                Introduction to good practices

                Presentation of the principles of data science

* Handling structured data:

                Basic principles with numpy

                Manipulating databases with pandas and SQL

                Introduction to spatial data (geopandas)

* Handling less traditional data:

                Retrieving data through web scraping and APIs

                Manipulating text data

Part 2: Visualization

* Presentation of the basic packages for graphics:

                matplotlib, seaborn

* Cartography:

                static maps

                dynamic maps (HTML)

Part 3: Modeling

* General models:

                Machine Learning with sklearn

* Natural Language Processing

* Going further with Machine Learning models


Course website: https://pythonds.linogaliana.fr/

All source code is available on GitHub: https://github.com/linogaliana/python-datascientist

All chapters of the course are available on the website and as notebooks in various environments (SSP Cloud, Google Colab, Binder, Visual studio dev...).

Information about the evaluation is in the dedicated section.

A set of references is available in the dedicated section.

The presentation given in the lecture hall is available here.