Context & scope
Last lecture:
- introduced Online Convex Optimization (OCO)
- saw general purpose algorithm, Online Gradient Descent (OGD), with $O(\sqrt{T})$ regret
This lecture will cover:
- one of the most important instances of OCO: the fundamental “prediction with expert advice” problem
- the Multiplicative Weights (MW) aka “Hedge” algorithm and its relation to OCO
- regularization in OCO and Online Mirror Descent
Learning from Expert Advice
One of the most fundamental problems in ML (and arguably more broadly in CS):
<aside>
🚧 The Experts problem:
For $t=1,2,\ldots,T$:
- learner chooses one of $N$ experts, $i_t \in [N]$
- experts' losses are revealed: $\ell_t \in [0,1]^N$
- learner incurs the loss of the chosen expert, $\ell_t(i_t)$
The learner's goal is to minimize regret relative to the best expert in hindsight:
$$
R_T = \sum_{t=1}^T \ell_t(i_t) - \min_{i^* \in [N]} \sum_{t=1}^T \ell_t(i^*)
$$
</aside>
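To make the protocol concrete, here is a minimal simulation of the experts problem, using the Multiplicative Weights / Hedge update (covered later in this lecture) as the learner. The loss matrix, learning rate, and the choice to make expert 0 slightly better are illustrative assumptions, not part of the problem definition:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 1000, 5
eta = np.sqrt(np.log(N) / T)  # a typical learning-rate choice for Hedge

# Oblivious adversary: the whole loss sequence is fixed in advance, in [0, 1].
losses = rng.uniform(0, 1, size=(T, N))
losses[:, 0] *= 0.4  # (assumption) make expert 0 noticeably better

weights = np.ones(N)
learner_loss = 0.0
for t in range(T):
    p = weights / weights.sum()          # distribution over experts
    i_t = rng.choice(N, p=p)             # learner samples an expert i_t
    learner_loss += losses[t, i_t]       # incurs the chosen expert's loss
    weights *= np.exp(-eta * losses[t])  # multiplicative weights update

best_expert_loss = losses.sum(axis=0).min()
regret = learner_loss - best_expert_loss
print(f"regret = {regret:.1f}, O(sqrt(T log N)) scale ~ {np.sqrt(T * np.log(N)):.1f}")
```

The realized regret fluctuates with the random draws, but it stays on the $O(\sqrt{T \log N})$ scale rather than growing linearly in $T$.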
- In general, we would like to avoid placing assumptions on the experts' losses $\ell_t$ (besides being bounded in $[0,1]$); i.e., we can think of them as chosen by an adversary
- For simplicity, we often consider the weaker class of oblivious adversaries, which fix the entire loss sequence in advance and thus do not adapt/react to the learner's decisions, but are otherwise arbitrary
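The distinction matters: against an adaptive adversary, any deterministic learner is forced to suffer linear regret, which is why randomization is essential. A small sketch of the standard argument for $N=2$ (the follow-the-leader rule below is just one example; any deterministic rule can be punished the same way):

```python
T, N = 100, 2

def deterministic_learner(history):
    # Follow-the-leader: pick the expert with the smallest cumulative loss
    # so far, ties toward expert 0. Any deterministic rule works here, since
    # the adversary can simulate it.
    totals = [sum(l[i] for l in history) for i in range(N)]
    return 0 if totals[0] <= totals[1] else 1

history = []          # loss vectors revealed so far
learner_loss = 0.0
for t in range(T):
    i_t = deterministic_learner(history)  # adversary predicts this exactly
    loss = [0.0, 0.0]
    loss[i_t] = 1.0                       # assign loss 1 to the chosen expert
    learner_loss += loss[i_t]
    history.append(loss)

best_expert_loss = min(sum(l[i] for l in history) for i in range(N))
print(learner_loss, best_expert_loss)  # learner: T; best expert: at most T/2
```

The learner incurs loss $T$, while the two experts' losses sum to $T$ per the construction, so the better one has loss at most $T/2$: regret is at least $T/2$, i.e., linear in $T$.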