
Chapter 2: Gradient Descent

Gradient Descent: Main Principle

Luca Grillotti

Consider a function L that we intend to minimise. In the following sections, the functions we intend to minimise are called loss functions.

One basic first-order optimisation algorithm for minimising L is called gradient descent.

We denote by \theta the parameters of the loss function L, and by \theta^* the value of \theta that minimises L. The purpose of gradient descent is to iteratively update \theta so that it converges to \theta^*.

This update of \theta takes the following form: \theta \leftarrow \theta - \lambda \dfrac{\partial L}{\partial \theta}

where:

  • \lambda represents the learning rate of the gradient descent.
  • \dfrac{\partial L}{\partial \theta} represents the gradient of L with respect to its parameters \theta.

In the following sections, we will illustrate this update rule on a 1-dimensional problem.
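
As a preview, here is a minimal Python sketch of the update rule applied to a simple 1-dimensional loss, L(\theta) = (\theta - 3)^2, whose gradient is 2(\theta - 3). The choice of loss, learning rate, starting point, and number of iterations are illustrative assumptions, not part of the course material.

```python
# Minimal gradient descent sketch for L(theta) = (theta - 3)^2.
# The loss, its gradient, the learning rate and the iteration count
# are illustrative choices, not taken from the course.


def loss(theta: float) -> float:
    """Loss function L(theta) = (theta - 3)^2, minimised at theta* = 3."""
    return (theta - 3.0) ** 2


def grad_loss(theta: float) -> float:
    """Gradient dL/dtheta = 2 * (theta - 3)."""
    return 2.0 * (theta - 3.0)


def gradient_descent(theta: float, learning_rate: float = 0.1, n_steps: int = 100) -> float:
    """Repeatedly apply the update theta <- theta - lambda * dL/dtheta."""
    for _ in range(n_steps):
        theta = theta - learning_rate * grad_loss(theta)
    return theta


if __name__ == "__main__":
    theta_final = gradient_descent(theta=0.0)
    print(f"theta after gradient descent: {theta_final:.6f}")  # close to theta* = 3
    print(f"loss at final theta: {loss(theta_final):.6f}")     # close to 0
```

With a small enough learning rate \lambda, each update moves \theta in the direction that decreases L, so the iterates approach \theta^* = 3.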