Introduction to Deep Learning with PyTorch
Chapter 2: Gradient Descent
Gradient Descent: Main Principle
Consider a function L that we intend to minimise. We will call such a function a loss function.
One basic first-order optimisation algorithm for minimising L is called gradient descent.
We call \theta the parameters of the loss function L.
We also call \theta^* the value of \theta that minimises L.
The purpose of gradient descent is to iteratively update \theta to make it converge to \theta^*.
This update of \theta takes the following form:
\theta \leftarrow \theta - \lambda \dfrac{\partial L}{\partial \theta}
where:
- \lambda represents the learning rate of the gradient descent.
- \dfrac{\partial L}{\partial \theta} represents the gradient of L with respect to its parameters \theta.
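As a minimal sketch, the update rule can be written directly with PyTorch's autograd. The quadratic loss L(\theta) = (\theta - 3)^2, the starting value \theta = 0, and the learning rate \lambda = 0.1 below are illustrative assumptions, not part of the chapter.

```python
import torch

# Illustrative 1-dimensional loss: L(theta) = (theta - 3)^2, minimised at theta* = 3.
def loss(theta):
    return (theta - 3.0) ** 2

theta = torch.tensor(0.0, requires_grad=True)  # initial parameter value (assumed)
lr = 0.1                                       # learning rate lambda (assumed)

for step in range(50):
    L = loss(theta)
    L.backward()                       # compute dL/dtheta via autograd
    with torch.no_grad():
        theta -= lr * theta.grad       # theta <- theta - lambda * dL/dtheta
    theta.grad.zero_()                 # reset the stored gradient for the next step

print(theta.item())  # approaches theta* = 3 as the updates converge
```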
On the next page, we will illustrate this update rule on a 1-dimensional problem.