Title: DoG is SGD’s best friend: toward tuning-free stochastic optimization
Speaker: Yair Carmon – Assistant Professor, Tel Aviv University
Date: May 29
Time: 4:00 PM
Location: Packard 202
While stochastic optimization methods drive continual improvements in machine learning, choosing the optimization parameters, and particularly the learning rate (LR), remains difficult. In this talk, I will describe our work on removing LR tuning from stochastic gradient descent (SGD), culminating in a tuning-free dynamic SGD step size formula, which we call Distance over Gradients (DoG). We show that DoG removes the need to tune the learning rate both theoretically (obtaining strong parameter-free convergence guarantees) and empirically (performing nearly as well as expensively tuned SGD on neural network training tasks). Our developments rely on a novel certificate for the SGD step size choice, and on strong time-uniform empirical-Bernstein-type concentration bounds.
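To make the "distance over gradients" idea concrete, here is a minimal NumPy sketch of the step size rule as I understand it from the DoG paper: at step t, the learning rate is the maximum distance traveled from the initial point divided by the square root of the running sum of squared gradient norms, seeded with a small initial value `r_eps` so the first step is nonzero. This is an illustrative sketch, not the authors' reference implementation; the function and parameter names are my own.

```python
import numpy as np

def dog_sgd(grad_fn, x0, steps=100, r_eps=1e-4):
    """Illustrative sketch of SGD with the DoG step size rule.

    At step t the learning rate is (as I understand the paper's formula)
        eta_t = max_{i<=t} ||x_i - x0|| / sqrt(sum_{i<=t} ||g_i||^2),
    where r_eps seeds the distance term so the first step is nonzero.
    """
    x = np.asarray(x0, dtype=float).copy()
    x0 = x.copy()
    max_dist = r_eps      # \bar r_t: max distance traveled from x0 so far
    grad_sq_sum = 0.0     # running sum of squared gradient norms
    for _ in range(steps):
        g = grad_fn(x)
        grad_sq_sum += float(np.dot(g, g))
        eta = max_dist / np.sqrt(grad_sq_sum)  # distance over gradients
        x = x - eta * g
        max_dist = max(max_dist, float(np.linalg.norm(x - x0)))
    return x

# Toy usage: minimize f(x) = ||x - 1||^2, whose gradient is 2(x - 1).
x_star = dog_sgd(lambda x: 2.0 * (x - 1.0), x0=np.zeros(3), steps=500)
```

Note how no learning rate is supplied: the step size adapts from the observed trajectory, which is the "tuning-free" property the talk addresses.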
Joint work with Maor Ivgi and Oliver Hinder.
Yair Carmon is an assistant professor of computer science at Tel Aviv University. He works on the foundations of optimization and machine learning, focusing on questions about fundamental limits and robustness. Yair received his PhD from Stanford University, advised by John Duchi and Aaron Sidford, and his M.Sc. and B.Sc. degrees from the Technion.