Non-Convex Learning through the Lens of Algorithmic Stability
The past few decades have witnessed tremendous progress in the field of Machine Learning (ML) and Artificial Intelligence (AI). Perhaps surprisingly, our theoretical understanding, and thus the ability of developing principled improvements, has lagged behind. Key to this gap between practice and theory is the emerge of complicated machine learning models, whose underlying optimization problems admit a non-convex structure. Therefore, a key challenge is to understand under what conditions non-convex learning tasks are efficiently learnable.
In this talk we will identify algorithmic stability as a prominent algorithmic property enabling an efficient reduction approach for both stochastic and online non-convex learning. We discuss the following two results:
a) Focusing on a wide class of "nice" non-convex learning tasks, called strict saddle problems, we show that every algorithm that minimizes the empirical risk is on-average stable. As a result, we derive fast (sample complexity) rates resembling rates attainable in the strongly convex case.
b) We suggest an efficient offline-to-online reduction for the non-convex setting. Namely, assuming an access to an offline optimization oracle, we develop an efficient online learning algorithm for any sequence of Lipschitz-bounded (but not necessarily convex) loss functions. Apparently, efficient reduction in this setting amounts to stabilizing the offline oracle. Notably, we bypass a recent hardness result of Hazan and Koren, 2016 by moderately refining the oracle model.
*Based on joint works with Shai Shalev-Shwartz (2017) and Elad Hazan (2018).