Recap and context
- So far, we have focused on deterministic gradient algorithms (mostly just gradient descent).
- Focus of today: whether and how randomization can help.
- First, we will consider randomization in the optimization algorithm; later we will study cases where the optimization problem itself is randomized.
Finite sum optimization
Let us see a simple and important case where randomized algorithms can be useful. Consider an optimization problem with an objective given as an average of functions,
$$
f(x) = \frac1n \sum_{i=1}^n f_i(x),
$$
where $f_1,\ldots,f_n : S \to \R$ are convex. ($S \subseteq \R^d$ is a convex domain.)
- We will assume we can compute (sub-)gradients of each individual $f_i$; that is, we are given (sub-)gradient oracle access to every component in the sum (see the sketch after this list).
- Equivalently, we could define $f$ as a sum rather than an average: this would only scale everything (including the convergence rates we obtain) by a factor of $n$.
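To make the oracle assumption concrete, here is a minimal sketch in Python, assuming least-squares components $f_i(x) = \tfrac12 (a_i^\top x - b_i)^2$; the data `A`, `b` and the helpers `grad_i`, `full_grad` are illustrative names, not part of the notes.

```python
import numpy as np

# Minimal sketch (illustrative, not from the notes): a finite-sum objective
# with least-squares components f_i(x) = 0.5 * (a_i @ x - b_i)**2.
rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.normal(size=(n, d))   # row a_i defines component f_i
b = rng.normal(size=n)

def grad_i(x, i):
    """(Sub-)gradient oracle for the single component f_i."""
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    """Gradient of f(x) = (1/n) * sum_i f_i(x)."""
    return A.T @ (A @ x - b) / n

# The average of the component gradients equals the full gradient, so an
# index i drawn uniformly at random gives an unbiased estimate of grad f(x);
# this is the hook for randomized algorithms on finite sums.
x = rng.normal(size=d)
avg = np.mean([grad_i(x, i) for i in range(n)], axis=0)
print(np.allclose(avg, full_grad(x)))   # True (up to floating point)
```

Note that a single call to `grad_i` touches only one component, while `full_grad` touches all $n$ of them.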
Examples
Finite sum problems are extremely common in machine learning (where they are called “empirical risk minimization” problems).
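For instance (a standard illustration, not a specific example from these notes): in least-squares regression on data points $(a_1, b_1), \ldots, (a_n, b_n) \in \R^d \times \R$, each component is the squared loss on one training example,
$$
f_i(x) = \tfrac12 \left(a_i^\top x - b_i\right)^2,
\qquad
f(x) = \frac1n \sum_{i=1}^n f_i(x),
$$
so minimizing $f$ is exactly empirical risk minimization over the dataset, and each $f_i$ is convex.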