Overview
A course on the theory of ML and on optimization in ML. The broad topic:
> Out-of-sample (a.k.a. generalization) performance of optimization algorithms in ML
- How to define out-of-sample performance (a formal sketch follows this list)
- Algorithmic toolbox for designing optimization methods that “generalize”
- Fundamental limits and unexpected pitfalls
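As a reference point for the first bullet, here is one standard formalization; the notation ($\mathcal{D}$, $f$, $\hat{w}$, $\mathcal{W}$) is assumed here, not fixed by the course:

```latex
% Data z_1, ..., z_n drawn i.i.d. from an unknown distribution D;
% a loss f(w; z); an optimization algorithm outputs \hat{w} from the sample.
\hat{F}(w) = \frac{1}{n} \sum_{i=1}^{n} f(w; z_i)
  \qquad \text{(in-sample / empirical risk)}

F(w) = \mathbb{E}_{z \sim \mathcal{D}}\big[ f(w; z) \big]
  \qquad \text{(out-of-sample / population risk)}

% Out-of-sample performance of the algorithm is then measured by the excess risk:
F(\hat{w}) - \min_{w \in \mathcal{W}} F(w)
```

The point of the distinction: the algorithm only ever sees $\hat{F}$, but is judged on $F$.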
General plan:
- Review different models of optimization in ML and the relations between them
- Statistical estimation: concentration of measure, information-theoretic lower bounds
- Online optimization: online convex optimization, design & analysis of algorithms, mirror descent, adaptive optimization (e.g. AdaGrad), prediction with expert advice, limited-feedback models, … (regret and online-to-batch are formalized after this list)
- Different approaches to out-of-sample performance: uniform convergence, covering numbers, Rademacher complexity, regularization, algorithmic stability, online-to-batch conversion (e.g. SGD), …
- Some (surprising?) phenomena compared to traditional settings, e.g. the PAC model of classical learning theory
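To fix ideas for the online-optimization and online-to-batch items above, a minimal sketch of the standard definitions (the symbols $f_t$, $w_t$, $\mathcal{W}$, $T$ are our assumed notation):

```latex
% Online convex optimization: at round t the learner plays w_t in W,
% then observes a convex loss f_t; performance is measured by regret.
\mathrm{Regret}_T = \sum_{t=1}^{T} f_t(w_t) \;-\; \min_{w \in \mathcal{W}} \sum_{t=1}^{T} f_t(w)

% Online-to-batch: with i.i.d. losses f_t(\cdot) = f(\cdot; z_t), f convex in w,
% and F as above, the averaged iterate \bar{w} = (1/T) \sum_{t=1}^{T} w_t
% satisfies (by Jensen's inequality and taking expectations of the regret):
\mathbb{E}\big[ F(\bar{w}) \big] - \min_{w \in \mathcal{W}} F(w)
  \;\le\; \frac{\mathbb{E}\big[ \mathrm{Regret}_T \big]}{T}
```

So any online algorithm with sublinear regret yields vanishing excess risk via averaging, which is one route from online optimization to out-of-sample guarantees.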
This lecture: