Recap & context
- Last lecture we established the importance of convexity in classical optimization and saw a simple subgradient algorithm for general convex optimization
- Here we continue developing algorithms for convex optimization, but in the more general model of online learning / online optimization
- We will cover the online convex optimization framework, regularization, strong convexity, and online to batch conversion
Online Convex Optimization
- Online optimization model where loss functions $f_1,\ldots,f_T$ are convex functions over a convex decision set $W \subseteq \R^d$
- Recall: modeled as a game between learner/player and environment/adversary
<aside>
🚧 Online convex optimization (OCO)
For $t=1,2,\ldots,T$:
- player/learner chooses decision $w_t \in W$
- adversary chooses convex loss function $f_t : W \to \R$
- player incurs loss $f_t(w_t)$ and observes $f_t$ as feedback
Player’s goal is to minimize regret:
$$
R_T = \sum_{t=1}^T f_t(w_t) - \min_{w^* \in W} \sum_{t=1}^T f_t(w^*)
$$
</aside>
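The protocol above can be sketched in code. This is a minimal, hypothetical instantiation (not part of the lecture): the decision set $W$ is the Euclidean unit ball in $\R^d$, the adversary plays quadratic losses $f_t(w) = \frac{1}{2}\|w - z_t\|^2$, and the learner runs projected online gradient descent with step size $\eta_t = 1/\sqrt{t}$ (the online analogue of the subgradient method from last lecture).

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 3, 2000

def project(w):
    """Euclidean projection onto the unit ball W = {w : ||w|| <= 1}."""
    n = np.linalg.norm(w)
    return w if n <= 1.0 else w / n

w = np.zeros(d)                # w_1: initial decision
plays, targets = [], []
for t in range(1, T + 1):
    z = project(rng.normal(size=d))    # adversary's choice z_t
    plays.append(w.copy())
    targets.append(z)
    grad = w - z                       # gradient of f_t(w) = ||w - z_t||^2 / 2 at w_t
    w = project(w - grad / np.sqrt(t)) # projected OGD step, eta_t = 1/sqrt(t)

# Best fixed decision in hindsight: sum_t ||w - z_t||^2 is minimized over W
# by the projection of the mean of the z_t's onto the ball.
w_star = project(np.mean(targets, axis=0))
loss = lambda w, z: 0.5 * np.dot(w - z, w - z)
R_T = sum(loss(wt, zt) for wt, zt in zip(plays, targets)) \
    - sum(loss(w_star, zt) for zt in targets)
print(R_T, R_T / T)
```

Note how the structure mirrors the game: the learner commits to $w_t$ before seeing $z_t$, then observes the full loss as feedback and updates.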
- equivalently, we can denote the adversary’s choice by $z_t \in Z$ and write the loss functions as $f_t(w) = f(w, z_t)$ for a fixed function $f : W \times Z \to \R$ that is known in advance (e.g., in online linear regression, $z_t = (x_t, y_t)$ and $f(w, z_t) = (\langle w, x_t \rangle - y_t)^2$)
- regret is a comparative benchmark: the goal is to compete with the single best fixed decision $w^*$ in hindsight
- if $R_T = o(T)$ (equivalently, $R_T/T \to 0$ as $T \to \infty$), then the player’s average loss approaches that of the best fixed decision, so performance improves with time
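To see the average regret $R_T/T$ vanish numerically, here is a small illustrative experiment (a sketch under assumed choices, not from the lecture): online subgradient descent on one-dimensional absolute losses $f_t(w) = |w - z_t|$ over $W = [-1, 1]$, with step size $\eta_t = 1/\sqrt{t}$, evaluated at several horizons $T$.

```python
import numpy as np

def average_regret(T, seed=0):
    """Run online subgradient descent on f_t(w) = |w - z_t| and return R_T / T."""
    rng = np.random.default_rng(seed)
    z = rng.uniform(-1, 1, size=T)            # adversary's choices z_1..z_T
    w, cum_loss = 0.0, 0.0
    for t in range(1, T + 1):
        zt = z[t - 1]
        cum_loss += abs(w - zt)               # player's loss f_t(w_t)
        g = np.sign(w - zt)                   # a subgradient of f_t at w_t
        w = np.clip(w - g / np.sqrt(t), -1, 1)  # projected step onto [-1, 1]
    # Best fixed decision in hindsight: a median of z_1..z_T minimizes
    # the cumulative absolute loss.
    best = np.sum(np.abs(np.median(z) - z))
    return (cum_loss - best) / T

rates = [average_regret(T) for T in (100, 1000, 10000)]
print(rates)   # average regret shrinks as the horizon T grows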