Using PySpark and interactive Jupyter notebook on Amazon Clusters
13:30-16:30 January 26

½ day
Intermediate level

Details

Working with Big Data sometimes requires access to remote distributed systems such as Amazon or Google Cloud services. In this workshop, I will show how to set up PySpark on Amazon Elastic MapReduce (EMR) and do interactive data processing and machine learning on EMR from a Jupyter notebook on your local computer.


Outcome

At the end of the workshop, participants will be able to use PySpark for data processing and machine learning on Amazon EMR. They will also know how to set up an interactive Jupyter notebook that connects to Amazon EMR clusters.


Prerequisites

  • Know how to use PySpark (or have already participated in the PySpark: Big Data Processing and Machine Learning with Python workshop).
  • Create a free AWS account, which includes a free tier that can be used in the workshop. Participants who have already used up their free tier will be charged for their AWS usage during the workshop (the costs are expected to be less than 10 CHF).

Organizers

Hamed Razavi

Scientist, EPFL