Context and scope
In previous lectures:
- we established that $O(\sqrt{T})$ regret is achievable in OCO in considerable generality
- we saw that, via online-to-batch conversion, this implies an $O(1/\sqrt{n})$ convergence rate in stochastic optimization, where $n$ is the sample size
- equivalently, the resulting stochastic optimization algorithms reach error $\varepsilon$ with sample complexity $O(1/\varepsilon^2)$ (see the sketch after this list)
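As a reminder, here is a minimal sketch of how the last two statements fit together; the notation below ($F$ for the population objective, $\bar{x}_n$ for the averaged iterate, $\mathrm{Regret}_n$ for the regret after $n$ rounds) is assumed to match the previous lectures rather than defined here. Running the online algorithm on $n$ i.i.d. samples and averaging its iterates gives
$$
\mathbb{E}\bigl[F(\bar{x}_n)\bigr] - \min_{x} F(x) \;\le\; \frac{\mathbb{E}[\mathrm{Regret}_n]}{n} \;=\; O\!\left(\frac{1}{\sqrt{n}}\right),
$$
and inverting the rate, reaching error $\varepsilon$ requires $c/\sqrt{n} \le \varepsilon$ for some constant $c$, i.e. $n \ge c^2/\varepsilon^2 = O(1/\varepsilon^2)$ samples.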
In this lecture:
- how good are these stochastic optimization / statistical learning algorithms?
- discuss basic upper and lower bounds in statistical estimation / learning
- see how to derive bounds that hold with high probability (not just in expectation)
- see how they relate to what we obtain from online algorithms
Along the way, we’ll encounter several important concepts and techniques:
- sub-Gaussian random variables
- concentration of measure and Hoeffding bounds
- basics of information theory and the KL divergence
- information-theoretic lower bounds