Context & scope
Last lecture:
- introduced Online Convex Optimization (OCO)
- saw general purpose algorithm, Online Gradient Descent (OGD), with $O(\sqrt{T})$ regret
This lecture will cover:
- one of the most important instances of OCO: the fundamental “prediction with expert advice” problem
- the Multiplicative Weights (MW) aka “Hedge” algorithm and its relation to OCO
- regularization in OCO and Online Mirror Descent
Learning from Expert Advice
One of the most fundamental problems in ML (and arguably more broadly in CS):
<aside>
🚧 The Experts problem:
For $t=1,2,\ldots,T$:
- learner chooses one of $N$ experts, $i_t \in [N]$
- experts' losses are revealed: $\ell_t \in [0,1]^N$
- learner incurs the loss of the chosen expert, $\ell_t(i_t)$
The learner's goal is to minimize regret relative to the best expert in hindsight:
$$
R_T = \sum_{t=1}^T \ell_t(i_t) - \min_{i^* \in [N]} \sum_{t=1}^T \ell_t(i^*)
$$
</aside>
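To make the protocol concrete, here is a minimal simulation of the experts problem, using the Multiplicative Weights / Hedge update (covered later in this lecture) as the learner. The loss matrix, learning rate, and the choice to make expert 0 slightly better are illustrative assumptions, not part of the problem definition:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 1000, 5
eta = np.sqrt(np.log(N) / T)  # a typical learning-rate choice for Hedge

# Oblivious adversary: the whole loss sequence is fixed in advance, in [0, 1].
losses = rng.uniform(0, 1, size=(T, N))
losses[:, 0] *= 0.4  # (assumption) make expert 0 noticeably better

weights = np.ones(N)
learner_loss = 0.0
for t in range(T):
    p = weights / weights.sum()          # distribution over experts
    i_t = rng.choice(N, p=p)             # learner samples an expert i_t
    learner_loss += losses[t, i_t]       # incurs the chosen expert's loss
    weights *= np.exp(-eta * losses[t])  # multiplicative weights update

best_expert_loss = losses.sum(axis=0).min()
regret = learner_loss - best_expert_loss
print(f"regret = {regret:.1f}, O(sqrt(T log N)) scale ~ {np.sqrt(T * np.log(N)):.1f}")
```

The realized regret fluctuates with the random draws, but it stays on the $O(\sqrt{T \log N})$ scale rather than growing linearly in $T$.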
- In general, we would like to avoid placing assumptions on the experts' losses $\ell_t$ (besides being bounded in $[0,1]$); i.e., we can think of them as chosen by an adversary
- For simplicity, we often consider the weaker class of oblivious adversaries, which fix the entire loss sequence in advance and thus do not adapt/react to the learner's decisions, but are otherwise arbitrary
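The distinction matters: against an adaptive adversary, any deterministic learner is forced to suffer linear regret, which is why randomization is essential. A small sketch of the standard argument for $N=2$ (the follow-the-leader rule below is just one example; any deterministic rule can be punished the same way):

```python
T, N = 100, 2

def deterministic_learner(history):
    # Follow-the-leader: pick the expert with the smallest cumulative loss
    # so far, ties toward expert 0. Any deterministic rule works here, since
    # the adversary can simulate it.
    totals = [sum(l[i] for l in history) for i in range(N)]
    return 0 if totals[0] <= totals[1] else 1

history = []          # loss vectors revealed so far
learner_loss = 0.0
for t in range(T):
    i_t = deterministic_learner(history)  # adversary predicts this exactly
    loss = [0.0, 0.0]
    loss[i_t] = 1.0                       # assign loss 1 to the chosen expert
    learner_loss += loss[i_t]
    history.append(loss)

best_expert_loss = min(sum(l[i] for l in history) for i in range(N))
print(learner_loss, best_expert_loss)  # learner: T; best expert: at most T/2
```

The learner incurs loss $T$, while the two experts' losses sum to $T$ per the construction, so the better one has loss at most $T/2$: regret is at least $T/2$, i.e., linear in $T$.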