Context and scope
In the previous lectures:
- we saw that the regret bounds from online learning imply an $O(1/\sqrt{n})$ convergence rate in stochastic optimization ($n$ is the sample size), through online-to-batch conversions
- equivalently, the resulting stochastic optimization algorithms reach error $\varepsilon$ with sample complexity $O(1/\varepsilon^2)$ (see the short computation after this list)
- we discussed matching lower bounds through information-theoretic arguments
- we covered basics of information theory: the KL divergence, the chain rule, …
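To make the equivalence between the first two points explicit, here is the one-line computation (the constant $C$ hidden in the $O(\cdot)$, the objective $f$, and the algorithm's output $\hat{x}_n$ after $n$ samples are notation introduced only for this remark):

$$\mathbb{E}\big[f(\hat{x}_n)\big] - \min_{x} f(x) \;\le\; \frac{C}{\sqrt{n}}, \qquad\text{and}\qquad \frac{C}{\sqrt{n}} \le \varepsilon \;\Longleftrightarrow\; n \ge \frac{C^2}{\varepsilon^2},$$

so an $O(1/\sqrt{n})$ rate is the same statement as an $O(1/\varepsilon^2)$ sample complexity.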
In this lecture:
- discuss the classical matching upper bounds obtained through statistical estimation
- see how to derive bounds that hold with high probability (not just in expectation)
- see how they relate to what we obtain from online algorithms
On the way, we’ll see several important concepts and techniques:
- sub-Gaussian random variables
- concentration of measure and Hoeffding bounds
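As a quick preview of these two items (all notation here, $X$, $\sigma$, $X_1,\dots,X_n$, $[a,b]$, $\mu$, $t$, is introduced only for this preview): a random variable $X$ is $\sigma$-sub-Gaussian if

$$\mathbb{E}\big[e^{\lambda (X - \mathbb{E}[X])}\big] \;\le\; e^{\lambda^2 \sigma^2 / 2} \qquad \text{for all } \lambda \in \mathbb{R},$$

and Hoeffding's inequality states that for independent $X_1,\dots,X_n$ with values in $[a,b]$ and mean $\mu$,

$$\mathbb{P}\!\left( \left| \frac{1}{n}\sum_{i=1}^{n} X_i - \mu \right| \ge t \right) \;\le\; 2\exp\!\left( - \frac{2 n t^2}{(b-a)^2} \right), \qquad t > 0$$

(a bounded variable in $[a,b]$ is $\tfrac{b-a}{2}$-sub-Gaussian, which is what drives this bound). Setting the right-hand side to $\delta$ gives deviations of order $\sqrt{\log(2/\delta)/n}$, i.e. exactly the kind of high-probability $O(1/\sqrt{n})$ bound announced above.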
Statistical estimation upper bounds