In this lecture we move on to efficient algorithms for computing (approximate) solutions to convex optimization problems. We will assume throughout the lecture that the optimization objective is specified by a black-box gradient (first-order) oracle.

Gradient descent

First, let us consider unconstrained minimization of $f: \R^d \to \R$.

Recall that the negative gradient $-\nabla f(x)$ gives the direction of steepest descent of $f$ at the point $x$. Thus, a very natural “greedy” approach for minimizing $f$ is via an iterative process:

<aside> ⚙ Algorithm: Gradient Descent (GD)

Starting from $x_1 \in \R^d$ with step size $\eta>0$, do:

$$ x_{t+1} = x_t - \eta \nabla f(x_t), \qquad\qquad t=1,2,\ldots. $$

</aside>
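To make the update rule concrete, here is a minimal Python/NumPy sketch of the iteration above. The names `grad_f` (standing in for the first-order oracle), `x1`, `eta`, and `T` are placeholders chosen for illustration, not part of the lecture's notation.

```python
import numpy as np

def gradient_descent(grad_f, x1, eta, T):
    """Run T iterates of gradient descent from x1 with constant step size eta.

    grad_f : callable returning the gradient of f at a given point
             (the black-box first-order oracle).
    Returns the list of iterates x_1, ..., x_T.
    """
    xs = [np.asarray(x1, dtype=float)]
    for _ in range(T - 1):
        x = xs[-1]
        xs.append(x - eta * grad_f(x))  # x_{t+1} = x_t - eta * grad f(x_t)
    return xs
```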

Gradient descent dates back to Cauchy (circa the 1840s), but the analysis we present here was only developed as late as the 1940s. Here we will prove:

<aside> 💡 Theorem:
Assume that $f$ is convex with $\|\nabla f(x)\| \leq G$ for all $x$, that $\|x_1-x^*\| \leq D$ for a minimizer $x^*$ of $f$, and set $\eta = D/(G\sqrt{T})$. Then for the average iterate of gradient descent, we have:

$$ f(\bar x_T) - f(x^*) \leq \frac{GD}{\sqrt{T}}, \qquad\text{where}\qquad \bar x_T = \frac1T \sum_{t=1}^T x_t . $$

</aside>
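As a quick numerical illustration (not part of the proof), the following sketch runs gradient descent on the simple convex, $1$-Lipschitz function $f(x) = \|x\|_2$ with $D = 1$, using the step size from the theorem, and compares the suboptimality of the average iterate to the $GD/\sqrt{T}$ bound. The concrete choices of $d$, $T$, and the starting point are assumptions made only for this demo.

```python
import numpy as np

# f(x) = ||x||_2 is convex and 1-Lipschitz (G = 1), minimized at x* = 0.
def f(x):
    return np.linalg.norm(x)

def grad_f(x):
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.zeros_like(x)  # (sub)gradient of the norm

d, T = 10, 1000
x1 = np.ones(d) / np.sqrt(d)       # ||x1 - x*|| = 1 =: D
G, D = 1.0, 1.0
eta = D / (G * np.sqrt(T))         # step size from the theorem

xs = [x1]
for _ in range(T - 1):
    xs.append(xs[-1] - eta * grad_f(xs[-1]))

x_bar = np.mean(xs, axis=0)        # average iterate
# Observed gap f(x_bar) - f(x*) versus the theoretical G*D/sqrt(T) bound:
print(f(x_bar) - f(np.zeros(d)), G * D / np.sqrt(T))
```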