Data exploration and preparation for Machine Learning
09:00-16:30 January 26

Full day
Beginner level

Details

“It is important to understand what you CAN DO before you learn to measure how WELL you seem to have DONE it.”

John W. Tukey


Data preparation and exploration are the most time-consuming parts of machine learning modeling, and they strongly influence how well the models perform. Some data cleaning and exploration steps are must-dos, while others depend heavily on the domain.


In this workshop we will go through various techniques involved in preparing data for machine learning modeling. Rather than focusing on the modeling itself, we will choose a linear model and show how its performance can be improved by better understanding the data and improving the data quality.
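As a rough illustration of that idea (a sketch on made-up data, not the workshop's actual material), the snippet below fits the same linear model twice: once on raw data in which some target values are a placeholder for "missing", and once after those rows are filtered out. The synthetic dataset, the -999 sentinel, and the helper names fit_line and rmse are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y = 3x + 2 plus small noise.
x = rng.uniform(0, 10, size=200)
y = 3.0 * x + 2.0 + rng.normal(0, 0.5, size=200)

# Simulate a common data-quality problem: a sentinel value standing in
# for missing targets in the first 10 rows.
SENTINEL = -999.0
y_raw = y.copy()
y_raw[:10] = SENTINEL


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    design = np.column_stack([x, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coeffs[0]), float(coeffs[1])


def rmse(slope, intercept, x, y):
    """Root mean squared error of the fitted line on (x, y)."""
    return float(np.sqrt(np.mean((slope * x + intercept - y) ** 2)))


# A clean hold-out set, used to evaluate both fits on equal terms.
x_test = rng.uniform(0, 10, size=100)
y_test = 3.0 * x_test + 2.0 + rng.normal(0, 0.5, size=100)

# Same model, fitted before and after removing the sentinel rows.
slope_raw, intercept_raw = fit_line(x, y_raw)
keep = y_raw != SENTINEL
slope_clean, intercept_clean = fit_line(x[keep], y_raw[keep])

rmse_raw = rmse(slope_raw, intercept_raw, x_test, y_test)
rmse_clean = rmse(slope_clean, intercept_clean, x_test, y_test)

print(f"RMSE on raw data:     {rmse_raw:.2f}")
print(f"RMSE on cleaned data: {rmse_clean:.2f}")
```

The model is identical in both runs; only the data changes. In real datasets the problem is rarely this obvious, which is why exploring the data comes before cleaning it.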


The workshop will be split into two parts: the first session focuses on basic data exploration and the second on advanced exploration. Participants can attend both sessions or just one. Beginners should attend the first session, however, as it lays the groundwork for the more advanced one.


Outcome

At the end of the workshop, participants will at best be familiar with the various steps of data exploration and data cleaning needed when preparing structured data for machine learning modeling.


At worst, participants will have a bird's-eye view of various techniques for data exploration.


Prerequisites

  • Basic Python and statistics knowledge
  • A working Python installation on your own laptop, with Jupyter Notebook installed

Organizers

Tereza Iofciu

Data Scientist, mytaxi

Twitter  ·  Website

Alisa Dammer

Data Scientist, mytaxi

Website

Honza Bílek

Data Scientist, mytaxi

Philipp Kähler

Data Engineer, mytaxi

Caio Miyashiro

Data Scientist, mytaxi