Scope and context
In previous lectures:
- we saw that by direct (covering numbers) arguments, the uniform convergence rate of stochastic optimization is roughly $O(\sqrt{d/n})$ in dimension $d$ and with sample size $n$
- in fact, this holds regardless of convexity and only requires a Lipschitz condition and a bounded optimization domain
- even with convexity, this dimension-dependent rate does not improve, and it is inferior to the dimension-independent rate we obtained from online-to-batch conversion of regret bounds (see the numerical sketch after this list)
- we also saw that for GLMs the uniform convergence rate becomes dimension independent
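To make the comparison concrete, here is a tiny numerical sketch of the two rates. The bound forms $LB\sqrt{d/n}$ versus $LB/\sqrt{n}$ and the constants (Lipschitz constant $L$ and domain radius $B$, both set to $1$) are illustrative assumptions, not the exact constants from the lecture.

```python
# Illustrative comparison of the two bounds recapped above (constants and
# bound forms are placeholders, not the exact constants from the lecture):
#   - uniform convergence:  L * B * sqrt(d / n)   (dimension dependent)
#   - online-to-batch:      L * B / sqrt(n)       (dimension independent)
import math

L, B = 1.0, 1.0  # assumed Lipschitz constant and domain radius
for d, n in [(10, 1_000), (1_000, 1_000), (100_000, 1_000)]:
    uc_bound = L * B * math.sqrt(d / n)
    o2b_bound = L * B / math.sqrt(n)
    print(f"d={d:>7}, n={n}: uniform convergence ~ {uc_bound:.3f}, "
          f"online-to-batch ~ {o2b_bound:.3f}")
```

As the dimension grows with the sample size fixed, the uniform convergence bound becomes vacuous while the online-to-batch bound is unchanged, which is exactly the gap the questions below address.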
This lecture deals with some of the remaining questions:
- Do the uniform convergence rates in SCO really depend on the dimension? (Or is this just a weakness of our upper-bound technique?)
- Can we go beyond uniform convergence rates and analyze specific algorithms more tightly?
Recap & Setup for this lecture
We will consider the general convex case of stochastic optimization:
<aside>
🚧 Stochastic Convex Optimization (SCO)
Given:
- convex optimization domain $W \subseteq \mathbb{R}^d$, arbitrary sample space $Z$
- loss function $f : W \times Z \to \mathbb{R}$, convex in $w$
- sample $S = \{ z_1, \ldots, z_n \}$ drawn i.i.d. from an unknown population distribution $\mathcal{D}$ over $Z$
Goal: minimize:
$$
F(w) = \mathbb{E}_{z \sim \mathcal{D}}[f(w, z)]
$$
</aside>
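To fix ideas, here is a minimal sketch of one concrete SCO instance. The specific choices (unit-ball domain, absolute-value loss $f(w, z) = |\langle w, x \rangle - y|$ with $z = (x, y)$, and a synthetic Gaussian distribution standing in for $\mathcal{D}$) are illustrative assumptions, not part of the general setup above.

```python
# A minimal concrete SCO instance (illustrative choices, not from the lecture):
# domain W = unit Euclidean ball in R^d, loss f(w, z) = |<w, x> - y| with
# z = (x, y), and D a simple synthetic distribution over Z.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 1_000

def sample_z(size):
    """Draw iid samples z = (x, y) from the (here synthetic) distribution D."""
    x = rng.normal(size=(size, d)) / np.sqrt(d)
    y = x @ np.ones(d) + 0.1 * rng.normal(size=size)
    return x, y

def f(w, x, y):
    """Instance loss f(w, z) = |<w, x> - y|, convex in w."""
    return np.abs(x @ w - y)

S = sample_z(n)                           # sample S = {z_1, ..., z_n} ~ D^n
w = np.ones(d) / np.sqrt(d)               # some point in the unit ball W

F_S = f(w, *S).mean()                     # empirical risk of w on S
F_pop = f(w, *sample_z(200_000)).mean()   # Monte Carlo estimate of F(w)
print(f"empirical risk {F_S:.4f}  vs  population risk (approx.) {F_pop:.4f}")
```

The gap between the printed empirical and population risks, taken simultaneously over all $w \in W$, is precisely the quantity that uniform convergence bounds control.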
Notation/terminology: