
Chapter 2: Gradient Descent

Gradient Descent: Main Principle

Luca Grillotti

Consider a function L that we intend to minimise. In the following sections, the functions we intend to minimise are called loss functions.

One basic first-order optimisation algorithm for minimising L is called gradient descent.

We denote by \theta the parameters of the loss function L, and by \theta^* the value of \theta that minimises L. The purpose of gradient descent is to iteratively update \theta so that it converges to \theta^*.

This update of \theta takes the following form: \theta \leftarrow \theta - \lambda \dfrac{\partial L}{\partial \theta}

where:

  • \lambda represents the learning rate of the gradient descent.
  • \dfrac{\partial L}{\partial \theta} represents the gradient of L with respect to its parameters \theta.

In the following sections, we will illustrate this update rule on a 1-dimensional problem.
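
As a preview, here is a minimal Python sketch of the update rule applied to a simple 1-dimensional loss, L(\theta) = (\theta - 3)^2, whose gradient is 2(\theta - 3). The choice of loss, learning rate, starting point, and number of iterations are illustrative assumptions, not part of the course material.

```python
# Minimal gradient descent sketch for L(theta) = (theta - 3)^2.
# The loss, its gradient, the learning rate and the iteration count
# are illustrative choices, not taken from the course.


def loss(theta: float) -> float:
    """Loss function L(theta) = (theta - 3)^2, minimised at theta* = 3."""
    return (theta - 3.0) ** 2


def grad_loss(theta: float) -> float:
    """Gradient dL/dtheta = 2 * (theta - 3)."""
    return 2.0 * (theta - 3.0)


def gradient_descent(theta: float, learning_rate: float = 0.1, n_steps: int = 100) -> float:
    """Repeatedly apply the update theta <- theta - lambda * dL/dtheta."""
    for _ in range(n_steps):
        theta = theta - learning_rate * grad_loss(theta)
    return theta


if __name__ == "__main__":
    theta_final = gradient_descent(theta=0.0)
    print(f"theta after gradient descent: {theta_final:.6f}")  # close to theta* = 3
    print(f"loss at final theta: {loss(theta_final):.6f}")     # close to 0
```

With a small enough learning rate \lambda, each update moves \theta in the direction that decreases L, so the iterates approach \theta^* = 3.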