We revisit Stochastic Optimization:
<aside> 🚧 Stochastic Optimization (SO)
Goal:
$$ \newcommand{\E}{\mathbb E}
\begin{aligned} \min_{w \in W} \; F_D(w) = \E_{z \sim D}[f(w,z)] \end{aligned} $$
given a sample $S$ of $n$ examples $z_1, \ldots, z_n \overset{iid}{\sim} D$
</aside>
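To make the setup concrete, here is a minimal Python sketch of the SO setting. Everything in it (the squared loss $f$, the Gaussian stand-in for $D$, the scalar decision variable) is an illustrative assumption, not part of the lecture: the point is only that the population objective $F_D$ is unknown, and the sample $S$ gives access to a sample-average surrogate.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(w, z):
    """Per-example loss f(w, z); squared distance is just a placeholder."""
    return (w - z) ** 2

# The population objective F_D(w) = E_{z ~ D}[f(w, z)] is unknown; we only
# see an iid sample S = (z_1, ..., z_n) drawn from D.
n = 100
S = rng.normal(loc=1.0, scale=2.0, size=n)   # toy stand-in for D

def empirical_objective(w, S):
    """Sample-average surrogate (1/n) * sum_i f(w, z_i) for F_D(w)."""
    return np.mean([f(w, z) for z in S])

print(empirical_objective(0.0, S))  # estimate of F_D(0) from the sample
```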
Previously:
This lecture: explore the statistical learning view of SO
The canonical setting of statistical learning is essentially a slight abstraction of SO, where we allow for a generic “hypothesis class” (which is not necessarily represented as a subset of $\mathbb{R}^d$):
<aside> 🚧
Statistical learning:
Setup:
population distribution $\cal D$ over instance/sample space $\cal Z$
(think of $\cal Z = X \times Y$ in the usual prediction setup)
function/hypothesis class $\cal H$
loss function $\ell : {\cal H} \times {\cal Z} \to \mathbb{R}$
Goal:
given an iid sample $S$ of $n$ instances $z_1,\ldots,z_n \overset{iid}{\sim} \cal D$, solve
$$ \newcommand{\E}{\mathbb E}
\begin{aligned} \min_{h \in {\cal H}} \; L(h) = \E_{z \sim {\cal D}}[\ell(h, z)] \end{aligned} $$
</aside>
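As a concrete illustration of the abstraction, here is a hedged sketch of empirical risk minimization over a finite hypothesis class. The specific choices (threshold classifiers on ${\cal X} = \mathbb{R}$, labels ${\cal Y} = \{0,1\}$, the 0-1 loss, and the toy distribution) are illustrative assumptions; the point is that ${\cal H}$ is just a set of functions and need not be identified with a subset of $\mathbb{R}^d$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothesis class H: threshold classifiers h_t(x) = 1{x >= t}, one per threshold.
thresholds = np.linspace(-2.0, 2.0, 41)
H = [lambda x, t=t: float(x >= t) for t in thresholds]

def ell(h, z):
    """0-1 loss: ell(h, (x, y)) = 1{h(x) != y}."""
    x, y = z
    return float(h(x) != y)

# iid sample S from a toy distribution D: x ~ N(0, 1), y = 1{x >= 0.5} with 10% label noise.
n = 200
xs = rng.normal(size=n)
ys = (xs >= 0.5).astype(float)
flip = rng.random(n) < 0.1
ys[flip] = 1.0 - ys[flip]
S = list(zip(xs, ys))

def empirical_risk(h, S):
    """Average loss of h on the sample, the empirical counterpart of L(h)."""
    return np.mean([ell(h, z) for z in S])

h_hat = min(H, key=lambda h: empirical_risk(h, S))  # ERM over the finite class H
print(empirical_risk(h_hat, S))
```

Note that ERM here is just a finite minimization over ${\cal H}$; the sketch says nothing about how well $h_{\hat{}}$ does on the population risk $L(h)$, which is exactly the question the statistical learning view addresses.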