A Curvature-based Approach for Statistical Learning Theory
Both the statistical and the computational costs associated with minimizing the risk of a learning algorithm are determined by the curvature of the risk function. We present one computational and one statistical result:
1. We develop a novel preconditioning method for ridge regression, based on recent linear sketching methods. By equipping Stochastic Variance Reduced Gradient (SVRG) with this preconditioning process, we obtain a significant speed-up relative to state-of-the-art methods for this task.
2. We derive bounds on the sample complexity of empirical risk minimization (ERM) in
the context of minimizing non-convex risks that admit the strict saddle property. Recent
progress in non-convex optimization has yielded efficient algorithms for minimizing such
functions. Our results imply that these efficient algorithms are statistically stable and
also generalize well.
*This is a joint work with Shai Shalev-Shwartz (HUJI) and Francesco Orabona (Yahoo Research, New York)