Context and scope
In the previous lectures:
- we saw that the regret bounds from online learning imply an $O(1/\sqrt{n})$ convergence rate in stochastic optimization ($n$ is the sample size), through online-to-batch conversions
- equivalently, the resulting stochastic optimization algorithms reach error $\varepsilon$ with sample complexity $O(1/\varepsilon^2)$ (see the short computation after this list)
- we discussed matching lower bounds through information-theoretic arguments
- we covered basics of information theory: the KL divergence, the chain rule, …
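To make the equivalence between the first two points explicit, here is the one-line computation (the constant $C$ hidden in the $O(\cdot)$, the objective $f$, and the algorithm's output $\hat{x}_n$ after $n$ samples are notation introduced only for this remark):

$$\mathbb{E}\big[f(\hat{x}_n)\big] - \min_{x} f(x) \;\le\; \frac{C}{\sqrt{n}}, \qquad\text{and}\qquad \frac{C}{\sqrt{n}} \le \varepsilon \;\Longleftrightarrow\; n \ge \frac{C^2}{\varepsilon^2},$$

so an $O(1/\sqrt{n})$ rate is the same statement as an $O(1/\varepsilon^2)$ sample complexity.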
In this lecture:
- discuss the classical matching upper bounds obtained through statistical estimation
- see how to derive bounds that hold with high probability (not just in expectation)
- see how they relate to what we obtain from online algorithms
On the way, we’ll see several important concepts and techniques:
- sub-Gaussian random variables
- concentration of measure and Hoeffding bounds
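As a quick preview of these two items (all notation here, $X$, $\sigma$, $X_1,\dots,X_n$, $[a,b]$, $\mu$, $t$, is introduced only for this preview): a random variable $X$ is $\sigma$-sub-Gaussian if

$$\mathbb{E}\big[e^{\lambda (X - \mathbb{E}[X])}\big] \;\le\; e^{\lambda^2 \sigma^2 / 2} \qquad \text{for all } \lambda \in \mathbb{R},$$

and Hoeffding's inequality states that for independent $X_1,\dots,X_n$ with values in $[a,b]$ and mean $\mu$,

$$\mathbb{P}\!\left( \left| \frac{1}{n}\sum_{i=1}^{n} X_i - \mu \right| \ge t \right) \;\le\; 2\exp\!\left( - \frac{2 n t^2}{(b-a)^2} \right), \qquad t > 0$$

(a bounded variable in $[a,b]$ is $\tfrac{b-a}{2}$-sub-Gaussian, which is what drives this bound). Setting the right-hand side to $\delta$ gives deviations of order $\sqrt{\log(2/\delta)/n}$, i.e. exactly the kind of high-probability $O(1/\sqrt{n})$ bound announced above.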
Statistical estimation upper bounds