Recap and context
- So far, we have focused on deterministic gradient algorithms (mostly just gradient descent).
- Focus of today: whether and how randomization can help.
- First, we will consider randomization in the optimization algorithm; later we will study cases where the optimization problem itself is randomized.
Finite sum optimization
Let us see a simple and important case where randomized algorithms can be useful. Consider an optimization problem with an objective given as an average of functions,
$$
f(x) = \frac1n \sum_{i=1}^n f_i(x),
$$
where $f_1,\ldots,f_n : S \to \R$ are convex. ($S \subseteq \R^d$ is a convex domain.)
- We will assume we can compute (sub-)gradients of each individual $f_i$; that is, we are given (sub-)gradient oracle access to every component in the sum (see the sketch after this list).
- Equivalently, we could define $f$ as a sum rather than an average: this would only scale everything (including the convergence rates we obtain) by a factor of $n$.
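To make the oracle assumption concrete, here is a minimal sketch in Python, assuming least-squares components $f_i(x) = \tfrac12 (a_i^\top x - b_i)^2$; the data `A`, `b` and the helpers `grad_i`, `full_grad` are illustrative names, not part of the notes.

```python
import numpy as np

# Minimal sketch (illustrative, not from the notes): a finite-sum objective
# with least-squares components f_i(x) = 0.5 * (a_i @ x - b_i)**2.
rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.normal(size=(n, d))   # row a_i defines component f_i
b = rng.normal(size=n)

def grad_i(x, i):
    """(Sub-)gradient oracle for the single component f_i."""
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    """Gradient of f(x) = (1/n) * sum_i f_i(x)."""
    return A.T @ (A @ x - b) / n

# The average of the component gradients equals the full gradient, so an
# index i drawn uniformly at random gives an unbiased estimate of grad f(x);
# this is the hook for randomized algorithms on finite sums.
x = rng.normal(size=d)
avg = np.mean([grad_i(x, i) for i in range(n)], axis=0)
print(np.allclose(avg, full_grad(x)))   # True (up to floating point)
```

Note that a single call to `grad_i` touches only one component, while `full_grad` touches all $n$ of them.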
Examples
Finite sum problems are extremely common in machine learning (where they are called “empirical risk minimization” problems).
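For instance (a standard illustration, not a specific example from these notes): in least-squares regression on data points $(a_1, b_1), \ldots, (a_n, b_n) \in \R^d \times \R$, each component is the squared loss on one training example,
$$
f_i(x) = \tfrac12 \left(a_i^\top x - b_i\right)^2,
\qquad
f(x) = \frac1n \sum_{i=1}^n f_i(x),
$$
so minimizing $f$ is exactly empirical risk minimization over the dataset, and each $f_i$ is convex.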