Context and scope
In previous lectures:
- we established that $O(\sqrt{T})$ regret is achievable in OCO in considerable generality
- we saw that, via online-to-batch conversion, this implies an $O(1/\sqrt{n})$ convergence rate in stochastic optimization, where $n$ is the sample size
- equivalently, the resulting stochastic optimization algorithms reach error $\varepsilon$ with sample complexity $O(1/\varepsilon^2)$ (see the sketch after this list)
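As a reminder, here is a minimal sketch of how the last two statements fit together; the notation below ($F$ for the population objective, $\bar{x}_n$ for the averaged iterate, $\mathrm{Regret}_n$ for the regret after $n$ rounds) is assumed to match the previous lectures rather than defined here. Running the online algorithm on $n$ i.i.d. samples and averaging its iterates gives
$$
\mathbb{E}\bigl[F(\bar{x}_n)\bigr] - \min_{x} F(x) \;\le\; \frac{\mathbb{E}[\mathrm{Regret}_n]}{n} \;=\; O\!\left(\frac{1}{\sqrt{n}}\right),
$$
and inverting the rate, reaching error $\varepsilon$ requires $c/\sqrt{n} \le \varepsilon$ for some constant $c$, i.e. $n \ge c^2/\varepsilon^2 = O(1/\varepsilon^2)$ samples.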
In this lecture:
- how good are these stochastic optimization / statistical learning algorithms?
- discuss basic upper and lower bounds in statistical estimation / learning
- see how to derive bounds that hold with high probability (not just in expectation)
- see how they relate to what we obtain from online algorithms
Along the way, we’ll encounter several important concepts and techniques:
- sub-Gaussian random variables
- concentration of measure and Hoeffding bounds
- basics of information theory and the KL divergence
- information-theoretic lower bounds