Advances in ML: Theory meets practice
13:30-16:30 January 27

½ day
Intermediate level

@ 1BC


Efficient algorithms are indispensable in large-scale ML applications. In recent years, the ML community has not only been a major consumer of what the optimization literature has to offer, but has also acted as a driving force in the development of new algorithmic tools. The challenges of massive data and efficient implementation have led to many cutting-edge advances in optimization.

The goal of this workshop is to bring practitioners and theoreticians together and to stimulate exchange between experts from industry and academia. For practitioners, the workshop should offer an overview of exciting new developments that they can *use* in their work. For theorists, it should provide a forum to assess how practical the assumptions behind recent work are, and to identify potentially interesting open questions.


Welcome Remarks

13:30-13:35 January 27

Theory of neural network training: challenges and recent results

13:35-14:15 January 27 · with Lénaïc Chizat

Theory Vs. Practice – It’s a Data Problem

14:15-14:55 January 27 · with Claudiu Musat

Coffee Break

14:55-15:10 January 27

Bayesian Hyperparameter Optimization for Automated Machine Learning

15:10-15:50 January 27 · with Aaron Klein

Multilingual word alignment

15:50-16:30 January 27 · with Armand Joulin


Lénaïc Chizat: Theory of neural network training: challenges and recent results

CNRS researcher in Orsay (France)

The current successes of neural networks are mostly driven by experimental exploration of various architectures, pipelines, and hyperparameters, guided by intuition rather than precise theory. Focusing on the optimization (training) aspect, we will see in this talk why pushing theory forward is challenging, but also why it matters and what key insights it may lead to. Along the way, we will present some recent results on the role of over-parameterization, on the phenomenon of "lazy training", and on training neural networks with a single hidden layer.

Claudiu Musat: Theory Vs. Practice – It’s a Data Problem

Director of Research for Data, Analytics & AI, Swisscom

Focusing on the creation of dialogue systems, the Swisscom ML research team has often been asked to improve systems that are already in use. To an ML person, this is a dream case, as the starting expectation is that any ML system should beat hand-crafted rules. Moreover, once the problem is framed correctly, we find that it is actually well studied and that solutions abound. The task is easy. In theory.

We find, however, that the data needed at the start of a project is often missing, in the wrong format, or simply insufficient. In this talk I will discuss possible solutions to low data availability and showcase several problems where traditional approaches fail. In practice.

Aaron Klein: Bayesian Hyperparameter Optimization for Automated Machine Learning

PhD student, Machine Learning Lab (Frank Hutter), University of Freiburg

Machine learning has recently achieved great successes in a wide range of practical applications, but the performance of the most prominent methods depends more strongly than ever on the correct setting of many internal hyperparameters. The best-performing models for many modern applications are getting ever larger and thus more computationally expensive to train, yet both researchers and practitioners want to set as many hyperparameters as possible automatically. Automated machine learning (AutoML) is a new research area that targets the progressive automation of machine learning. One of its success stories is Bayesian hyperparameter optimization, which aims to find the best hyperparameter setting for a given machine learning algorithm.

In this talk, I will show how Bayesian optimization can be used efficiently for hyperparameter optimization. I will also present recent advances that speed up the optimization process by exploiting cheap approximations of the objective function, such as the performance obtained on a subset of the data or the learning curves of iterative machine learning algorithms.

Armand Joulin: Multilingual word alignment

Research scientist, Facebook Artificial Intelligence Research

We consider the problem of aligning continuous word representations, learned in multiple languages, to a common space. It was recently shown that, in the case of two languages, it is possible to learn such a mapping. This talk will present several recent approaches to bilingual alignment that work with and without supervision, as well as an extension to the problem of jointly aligning multiple languages to a common space.

In the presence of supervision, we show that this problem can be cast as a retrieval problem with a convex formulation, leading to significant improvements over the state of the art. In the absence of supervision, we propose an approach based on optimal transport, with theoretical guarantees and competitive empirical performance.


The workshop will not discuss high-level aspects of ML or data processing, but will instead focus on core components that are essential for actual implementations. Hence, participants should ideally be familiar with the main optimization algorithms used in ML and the main challenges that arise when implementing them.


Sebastian Stich

Scientist, EPFL


Aymeric Dieuleveut

PostDoc, EPFL



Lénaïc Chizat

Researcher, CNRS


Claudiu Musat

Director of Research for Data, Analytics & AI, Swisscom


Aaron Klein

PhD Student, University of Freiburg


Armand Joulin

Research scientist, Facebook Artificial Intelligence Research