Overview
A course on the theory of ML and on optimization in ML. The broad topic:
> Out-of-sample (a.k.a. generalization) performance of optimization algorithms in ML
- How to define out-of-sample performance (a formal sketch follows this list)
- Algorithmic toolbox for designing optimization methods that “generalize”
- Fundamental limits and unexpected pitfalls
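As a reference point for the first bullet, here is one standard formalization; the notation ($\mathcal{D}$, $f$, $\hat{w}$, $\mathcal{W}$) is assumed here, not fixed by the course:

```latex
% Data z_1, ..., z_n drawn i.i.d. from an unknown distribution D;
% a loss f(w; z); an optimization algorithm outputs \hat{w} from the sample.
\hat{F}(w) = \frac{1}{n} \sum_{i=1}^{n} f(w; z_i)
  \qquad \text{(in-sample / empirical risk)}

F(w) = \mathbb{E}_{z \sim \mathcal{D}}\big[ f(w; z) \big]
  \qquad \text{(out-of-sample / population risk)}

% Out-of-sample performance of the algorithm is then measured by the excess risk:
F(\hat{w}) - \min_{w \in \mathcal{W}} F(w)
```

The point of the distinction: the algorithm only ever sees $\hat{F}$, but is judged on $F$.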
General plan:
- Review different models of optimization in ML and the relations between them
- Statistical estimation: concentration of measure, information-theoretic lower bounds
- Online optimization: online convex optimization, design & analysis of algorithms, mirror descent, adaptive optimization (e.g. AdaGrad), prediction with expert advice, limited-feedback models, … (regret and online-to-batch are formalized after this list)
- Different approaches to out-of-sample performance: uniform convergence, covering numbers, Rademacher complexity, regularization, algorithmic stability, online-to-batch conversion (e.g. SGD), …
- Some (surprising?) phenomena compared to traditional settings, e.g. the PAC model of classical learning theory
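To fix ideas for the online-optimization and online-to-batch items above, a minimal sketch of the standard definitions (the symbols $f_t$, $w_t$, $\mathcal{W}$, $T$ are our assumed notation):

```latex
% Online convex optimization: at round t the learner plays w_t in W,
% then observes a convex loss f_t; performance is measured by regret.
\mathrm{Regret}_T = \sum_{t=1}^{T} f_t(w_t) \;-\; \min_{w \in \mathcal{W}} \sum_{t=1}^{T} f_t(w)

% Online-to-batch: with i.i.d. losses f_t(\cdot) = f(\cdot; z_t), f convex in w,
% and F as above, the averaged iterate \bar{w} = (1/T) \sum_{t=1}^{T} w_t
% satisfies (by Jensen's inequality and taking expectations of the regret):
\mathbb{E}\big[ F(\bar{w}) \big] - \min_{w \in \mathcal{W}} F(w)
  \;\le\; \frac{\mathbb{E}\big[ \mathrm{Regret}_T \big]}{T}
```

So any online algorithm with sublinear regret yields vanishing excess risk via averaging, which is one route from online optimization to out-of-sample guarantees.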
This lecture: