Recap and context

In the previous lecture,

Intuitively, moving from a point $x_t$ along the negative gradient $-\nabla f(x_t)$ does not necessarily decrease $f$, because the linear approximation to $f$ at $x_t$, defined by the gradient at $x_t$, can become a poor approximation even a short distance away from $x_t$.
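To make this concrete, here is a minimal sketch of a single gradient step that fails to decrease the objective. The function $f(x) = x^2$, the starting point, and the two step sizes are illustrative assumptions chosen for this example, not taken from the lecture.

```python
# Illustrative assumption: f(x) = x^2, whose gradient is 2x.
def f(x):
    return x ** 2


def grad_f(x):
    return 2.0 * x


def gradient_step(x, eta):
    """Move from x along the negative gradient with step size eta."""
    return x - eta * grad_f(x)


x_t = 1.0

# A large step overshoots the minimizer at 0: the linear approximation
# at x_t is no longer accurate that far away, so f increases.
x_big = gradient_step(x_t, eta=1.5)     # 1.0 - 1.5 * 2.0 = -2.0

# A small step stays where the linear approximation is still good.
x_small = gradient_step(x_t, eta=0.25)  # 1.0 - 0.25 * 2.0 = 0.5

print(f(x_t), f(x_big), f(x_small))  # 1.0 4.0 0.25
```

Both steps point in the same direction; only the distance travelled differs, which is what separates an increase from a decrease here.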

Smoothness

The next notion we introduce is aimed precisely at quantifying how quickly the quality of the linear approximation degrades as we move away from the point of linearization.

<aside> 💡 Definition: Smooth function

A differentiable function $f$ is $\boldsymbol\beta$-smooth over $S \subseteq \mathrm{dom} f$ if for all $x,y \in S$:

$$ \begin{align*} -\frac{\beta}{2} \|y-x\|^2 \leq f(y) - f(x) - \nabla f(x) \cdot (y-x) \leq \frac{\beta}{2} \|y-x\|^2 . \end{align*} $$

</aside>
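As a sanity check on the definition, the following sketch tests the two-sided inequality numerically on a grid of points. It assumes the one-dimensional function $f(x) = \sin x$, which is $1$-smooth because $|f''(x)| \leq 1$ everywhere; the helper names `linearization_gap` and `is_smooth_on` are hypothetical and introduced only for this illustration. A passing check on finitely many points is evidence, not a proof, of smoothness.

```python
import math


def linearization_gap(f, grad, x, y):
    """f(y) - f(x) - grad(x) * (y - x): the error of the linear approximation at x."""
    return f(y) - f(x) - grad(x) * (y - x)


def is_smooth_on(f, grad, beta, points, tol=1e-12):
    """Check |gap| <= (beta / 2) * (y - x)^2 for every pair of sample points.

    tol absorbs floating-point rounding; it is not part of the definition.
    """
    for x in points:
        for y in points:
            bound = 0.5 * beta * (y - x) ** 2
            if abs(linearization_gap(f, grad, x, y)) > bound + tol:
                return False
    return True


# Sample points in [-5, 5] with spacing 0.1 (an arbitrary choice for the sketch).
grid = [i * 0.1 for i in range(-50, 51)]

print(is_smooth_on(math.sin, math.cos, beta=1.0, points=grid))  # True
print(is_smooth_on(math.sin, math.cos, beta=0.1, points=grid))  # False
```

The second call fails because $\beta = 0.1$ is too small: near $x = -\pi/2$ the sine curve bends away from its tangent line faster than $\frac{0.1}{2}(y-x)^2$ allows.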

Examples and basic properties