Chapter 4: PyTorch for Automatic Gradient Descent

Automatic Gradient Descent

Luca Grillotti

Recap

So far, we have seen how to use the backward() method to compute gradients automatically. For instance, the code below evaluates the gradient of the loss function L(\theta) = \theta^2 at the value \theta_0 = 1.

import torch

# Initialise the parameter theta at theta_0 = 1
tensor_0 = torch.Tensor([1])
theta = torch.nn.Parameter(tensor_0)

# Define the loss L(theta) = theta^2
loss = theta * theta

print(theta.grad)  # no gradient has been computed yet, so this prints None
loss.backward()    # compute dL/dtheta and store it in theta.grad
print(theta.grad)  # now holds 2 * theta_0 = 2
None
tensor([2.])

So, now we have the gradient \dfrac{\partial L}{\partial \theta} = 2\theta automatically computed: at \theta_0 = 1 it evaluates to 2, which matches the printed tensor([2.]).

But…

But we still have to perform gradient descent manually!

\theta \leftarrow \theta - \lambda \dfrac{\partial L}{\partial \theta}

If we print \theta after running the code above, we can see that its value has not changed since its initialisation:

print(theta)
Parameter containing:
tensor([1.], requires_grad=True)

So \theta has not yet been updated: calling backward() only computes the gradient, it does not apply the update step.
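
Of course, nothing stops us from applying the update by hand. Here is a minimal sketch, assuming the recap code above has already been run (so theta.grad holds the gradient) and taking an arbitrary learning rate of 0.1 for \lambda:

learning_rate = 0.1  # arbitrary value standing in for lambda

with torch.no_grad():                    # make sure autograd does not track this update
    theta -= learning_rate * theta.grad  # theta <- theta - lambda * dL/dtheta

theta.grad.zero_()  # reset the stored gradient before the next backward() call

This works, but writing such an update by hand for every parameter of a model quickly becomes tedious and error-prone.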

What we would like to have: automatic gradient descent!

Given that \dfrac{\partial L}{\partial \theta} is automatically computed, can we also have \theta automatically updated to minimise the loss using gradient descent?

\theta \leftarrow \theta - \lambda \dfrac{\partial L}{\partial \theta}

We will see how to do this on the next page!
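
As a small preview of what is coming, here is a minimal sketch using PyTorch's built-in SGD optimiser (the learning rate of 0.1 is only an illustrative choice; the details of torch.optim are explained on the next page):

import torch

theta = torch.nn.Parameter(torch.Tensor([1]))
optimiser = torch.optim.SGD([theta], lr=0.1)  # lr plays the role of lambda

loss = theta * theta   # L(theta) = theta^2
optimiser.zero_grad()  # reset any previously stored gradients
loss.backward()        # compute dL/dtheta
optimiser.step()       # apply theta <- theta - lr * dL/dtheta

print(theta)  # theta is now 1 - 0.1 * 2 = 0.8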