Scope and context
In previous lectures:
- we saw that by direct (covering numbers) arguments, the uniform convergence rate of stochastic optimization is roughly $O(\sqrt{d/n})$ in dimension $d$ and with sample size $n$
- in fact, this holds regardless of convexity and only requires a Lipschitz condition and a bounded optimization domain
- even with convexity, this dimension-dependent rate does not improve, and it is inferior to the dimension-independent rate we obtained from online-to-batch conversion of regret bounds (see the numerical sketch after this list)
- we also saw that for GLMs the uniform convergence rate becomes dimension independent
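To make the comparison concrete, here is a tiny numerical sketch of the two rates. The bound forms $LB\sqrt{d/n}$ versus $LB/\sqrt{n}$ and the constants (Lipschitz constant $L$ and domain radius $B$, both set to $1$) are illustrative assumptions, not the exact constants from the lecture.

```python
# Illustrative comparison of the two bounds recapped above (constants and
# bound forms are placeholders, not the exact constants from the lecture):
#   - uniform convergence:  L * B * sqrt(d / n)   (dimension dependent)
#   - online-to-batch:      L * B / sqrt(n)       (dimension independent)
import math

L, B = 1.0, 1.0  # assumed Lipschitz constant and domain radius
for d, n in [(10, 1_000), (1_000, 1_000), (100_000, 1_000)]:
    uc_bound = L * B * math.sqrt(d / n)
    o2b_bound = L * B / math.sqrt(n)
    print(f"d={d:>7}, n={n}: uniform convergence ~ {uc_bound:.3f}, "
          f"online-to-batch ~ {o2b_bound:.3f}")
```

As the dimension grows with the sample size fixed, the uniform convergence bound becomes vacuous while the online-to-batch bound is unchanged, which is exactly the gap the questions below address.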
This lecture deals with some of the remaining questions:
- Do the uniform convergence rates in SCO really depend on the dimension? (Or is this just a weakness of our upper-bound technique?)
- Can we go beyond uniform convergence rates and analyze specific algorithms more tightly?
Recap & Setup for this lecture
We will consider the general convex case of stochastic optimization:
<aside>
🚧 Stochastic Convex Optimization (SCO)
Given:
- convex optimization domain $W \subseteq \mathbb{R}^d$, arbitrary sample space $Z$
- loss function $f : W \times Z \to \mathbb{R}$, convex in $w$
- sample $S = \{ z_1, \ldots, z_n \}$ drawn i.i.d. from an unknown population distribution $\mathcal{D}$ over $Z$
Goal: minimize:
$$
F(w) = \mathbb{E}_{z \sim \mathcal{D}}[f(w, z)]
$$
</aside>
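To fix ideas, here is a minimal sketch of one concrete SCO instance. The specific choices (unit-ball domain, absolute-value loss $f(w, z) = |\langle w, x \rangle - y|$ with $z = (x, y)$, and a synthetic Gaussian distribution standing in for $\mathcal{D}$) are illustrative assumptions, not part of the general setup above.

```python
# A minimal concrete SCO instance (illustrative choices, not from the lecture):
# domain W = unit Euclidean ball in R^d, loss f(w, z) = |<w, x> - y| with
# z = (x, y), and D a simple synthetic distribution over Z.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 1_000

def sample_z(size):
    """Draw iid samples z = (x, y) from the (here synthetic) distribution D."""
    x = rng.normal(size=(size, d)) / np.sqrt(d)
    y = x @ np.ones(d) + 0.1 * rng.normal(size=size)
    return x, y

def f(w, x, y):
    """Instance loss f(w, z) = |<w, x> - y|, convex in w."""
    return np.abs(x @ w - y)

S = sample_z(n)                           # sample S = {z_1, ..., z_n} ~ D^n
w = np.ones(d) / np.sqrt(d)               # some point in the unit ball W

F_S = f(w, *S).mean()                     # empirical risk of w on S
F_pop = f(w, *sample_z(200_000)).mean()   # Monte Carlo estimate of F(w)
print(f"empirical risk {F_S:.4f}  vs  population risk (approx.) {F_pop:.4f}")
```

The gap between the printed empirical and population risks, taken simultaneously over all $w \in W$, is precisely the quantity that uniform convergence bounds control.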
Notation/terminology: