Beyond stochastic gradient descent for large-scale machine learning

15 septembre 2016, 11h00–12h15

Toulouse

Salle MS 003

MAD-Stat. Seminar

Résumé

Many machine learning and statistics problems are traditionally cast as convex optimization problems. A common difficulty in solving these problems is the size of the data, where there are many observations ("large n") and each of these is large ("large p"). In this setting, online algorithms such as stochastic gradient descent which pass over the data only once, are usually preferred over batch algorithms, which require multiple passes over the data. Given n observations/iterations, the optimal convergence rates of these algorithms are O(1/n^(1/2)) for general convex functions and reaches O(1/n) for strongly-convex functions. In this talk, I will show how the smoothness of loss functions may be used to design novel simple algorithms with improved behavior, both in theory and practice: in the ideal infinite-data setting, an efficient novel Newton-based stochastic approximation algorithm leads to a convergence rate of O(1/n) without strong convexity assumptions. (joint work with Alexandre Defossez, Aymeric Dieuleveut, Nicolas Flammarion, and Eric Moulines)

Beyond stochastic gradient descent for large-scale machine learning

Résumé

Partager