Recap and context
In the previous lecture,
- we introduced basic gradient methods (for convex and Lipschitz objectives) and covered (sub)gradient descent and its projected variant;
- we saw that despite the name “gradient descent”, these methods do not necessarily descend in function values — namely, they are not monotone.
Intuitively, the reason why moving from a point $x_t$ along the negative gradient $-\nabla f(x_t)$ does not necessarily decrease the function value is that the linear approximation to $f$ at $x_t$, defined by the gradient at $x_t$, can be quite a bad approximation even if we move only slightly away from $x_t$.
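As a minimal added illustration (a toy example of my own, not from the previous lecture): take $f(x) = x^2$, which has gradient $\nabla f(x) = 2x$, and use step size $\eta = 2$. Starting from $x_t = 1$, the gradient step gives
$$
x_{t+1} = x_t - \eta \nabla f(x_t) = 1 - 2 \cdot 2 = -3,
\qquad
f(x_{t+1}) = 9 > 1 = f(x_t),
$$
so the function value increases: the linear approximation $f(x_t) + \nabla f(x_t)(x - x_t)$ keeps decreasing along the direction $-\nabla f(x_t)$, but $f$ itself stops decreasing long before the step ends.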
Smoothness
The next notion we introduce is aimed precisely at quantifying how quickly the quality of the linear approximation degrades as we move away locally.
<aside>
💡 Definition: Smooth function
A differentiable function $f$ is $\boldsymbol\beta$-smooth over $S \subseteq \mathrm{dom}\, f$ if for all $x,y \in S$:
$$
\begin{align*}
-\frac{\beta}{2} \|y-x\|^2
\leq
f(y) - f(x) - \nabla f(x) \cdot (y-x)
\leq
\frac{\beta}{2} \|y-x\|^2
.
\end{align*}
$$
</aside>

- In words: $f$ is $\beta$-smooth iff the difference between $f$ and its linear approximation at $x$ is bounded, in absolute value, by the quadratic $\frac{\beta}{2}\|y-x\|^2$. That is, the smaller $\beta$ is, the better the linear approximation defined by the gradient $\nabla f(x)$ approximates $f$ around $x$. (A small numerical sanity check of this inequality appears after this list.)
- Note that if $f$ is also convex, the lower bound in the definition is redundant. (Why?)
- Smoothness is unrelated to convexity: there are smooth non-convex functions, just as there are convex non-smooth functions. (Examples?)
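As a sanity check of the definition, here is a minimal numerical sketch (an added illustration; the choice $f(x) = \tfrac12 x^\top A x$ with $A$ positive semidefinite and $\beta = \lambda_{\max}(A)$ is an assumption made for this example, not something stated above). It samples random pairs $(x, y)$ and verifies the two-sided bound.

```python
import numpy as np

rng = np.random.default_rng(0)

# A symmetric positive semidefinite matrix; f(x) = 0.5 * x^T A x is convex,
# differentiable, and beta-smooth with beta = largest eigenvalue of A.
d = 5
M = rng.standard_normal((d, d))
A = M @ M.T
beta = np.linalg.eigvalsh(A).max()

def f(x):
    return 0.5 * x @ A @ x

def grad_f(x):
    return A @ x

# Check the smoothness inequality on random pairs (x, y):
# |f(y) - f(x) - <grad f(x), y - x>| <= (beta / 2) * ||y - x||^2
for _ in range(1000):
    x, y = rng.standard_normal(d), rng.standard_normal(d)
    gap = f(y) - f(x) - grad_f(x) @ (y - x)
    bound = 0.5 * beta * np.linalg.norm(y - x) ** 2
    assert -bound - 1e-9 <= gap <= bound + 1e-9

print("smoothness inequality holds on all sampled pairs; beta =", beta)
```

Since this particular $f$ is convex, the quantity `gap` is in fact nonnegative, which also illustrates why the lower bound in the definition is redundant for convex functions.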
Examples and basic properties