We revisit Stochastic Optimization:
<aside> 🚧 Stochastic Optimization (SO)
Goal:
$$ \newcommand{\E}{\mathbb E}
\begin{aligned} \min_{w \in W} \; F_D(w) = \E_{z \sim D}[f(w,z)] \end{aligned} $$
given a sample $S$ of $n$ examples $z_1, \ldots, z_n \overset{iid}{\sim} D$
</aside>
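To make the setup concrete, here is a minimal Python sketch of the SO setting. Everything in it (the squared loss $f$, the Gaussian stand-in for $D$, the scalar decision variable) is an illustrative assumption, not part of the lecture: the point is only that the population objective $F_D$ is unknown, and the sample $S$ gives access to a sample-average surrogate.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(w, z):
    """Per-example loss f(w, z); squared distance is just a placeholder."""
    return (w - z) ** 2

# The population objective F_D(w) = E_{z ~ D}[f(w, z)] is unknown; we only
# see an iid sample S = (z_1, ..., z_n) drawn from D.
n = 100
S = rng.normal(loc=1.0, scale=2.0, size=n)   # toy stand-in for D

def empirical_objective(w, S):
    """Sample-average surrogate (1/n) * sum_i f(w, z_i) for F_D(w)."""
    return np.mean([f(w, z) for z in S])

print(empirical_objective(0.0, S))  # estimate of F_D(0) from the sample
```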
Previously:
This lecture: explore the statistical learning view of SO
The canonical setting of statistical learning is essentially a slight abstraction of SO, where we allow for a generic “hypothesis class” (which is not necessarily represented as a subset of $\mathbb{R}^d$):
<aside> 🚧
Statistical learning:
Setup:
population distribution $\cal D$ over instance/sample space $\cal Z$
(think of $\cal Z = X \times Y$ in the usual prediction setup)
function/hypothesis class $\cal H$
loss function $\ell : {\cal H} \times {\cal Z} \to \mathbb{R}$
Goal:
given an iid sample $S$ of $n$ instances $z_1,\ldots,z_n \overset{iid}{\sim} \cal D$, solve
$$ \newcommand{\E}{\mathbb E}
\begin{aligned} \min_{h \in {\cal H}} \; L(h) = \E_{z \sim {\cal D}}[\ell(h, z)] \end{aligned} $$
</aside>
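As a concrete illustration of the abstraction, here is a hedged sketch of empirical risk minimization over a finite hypothesis class. The specific choices (threshold classifiers on ${\cal X} = \mathbb{R}$, labels ${\cal Y} = \{0,1\}$, the 0-1 loss, and the toy distribution) are illustrative assumptions; the point is that ${\cal H}$ is just a set of functions and need not be identified with a subset of $\mathbb{R}^d$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothesis class H: threshold classifiers h_t(x) = 1{x >= t}, one per threshold.
thresholds = np.linspace(-2.0, 2.0, 41)
H = [lambda x, t=t: float(x >= t) for t in thresholds]

def ell(h, z):
    """0-1 loss: ell(h, (x, y)) = 1{h(x) != y}."""
    x, y = z
    return float(h(x) != y)

# iid sample S from a toy distribution D: x ~ N(0, 1), y = 1{x >= 0.5} with 10% label noise.
n = 200
xs = rng.normal(size=n)
ys = (xs >= 0.5).astype(float)
flip = rng.random(n) < 0.1
ys[flip] = 1.0 - ys[flip]
S = list(zip(xs, ys))

def empirical_risk(h, S):
    """Average loss of h on the sample, the empirical counterpart of L(h)."""
    return np.mean([ell(h, z) for z in S])

h_hat = min(H, key=lambda h: empirical_risk(h, S))  # ERM over the finite class H
print(empirical_risk(h_hat, S))
```

Note that ERM here is just a finite minimization over ${\cal H}$; the sketch says nothing about how well $h_{\hat{}}$ does on the population risk $L(h)$, which is exactly the question the statistical learning view addresses.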