Chapter 4: PyTorch for Automatic Gradient Descent

Automatic Gradient Calculation with PyTorch

Luca Grillotti

Let’s consider (once again ^^) the square function L(\theta) = \theta^2. We would like to calculate the gradient \dfrac{\partial L}{\partial \theta}(\theta_0), where \theta_0 = 1.
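Before asking PyTorch to do the work, we can compute this gradient by hand as a sanity check: \dfrac{\partial L}{\partial \theta}(\theta) = 2\theta, so \dfrac{\partial L}{\partial \theta}(\theta_0) = 2 \times 1 = 2. This is the value we expect PyTorch to give us at the end of this section.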

import torch

tensor_0 = torch.Tensor([1])
theta_0 = torch.nn.Parameter(tensor_0)  # a Parameter: torch will track gradients for it

loss = theta_0 * theta_0  # builds the computation graph for L(theta_0) = theta_0^2
print(loss)
tensor([1.], grad_fn=<MulBackward0>)

(Illustration: multiplication operator graph)
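As an aside, wrapping the tensor in torch.nn.Parameter is what enables gradient tracking: a Parameter is essentially a tensor with requires_grad=True. Here is a minimal equivalent sketch that uses a plain tensor instead of a Parameter:

import torch

# equivalent construction: a plain tensor with gradient tracking enabled
theta_0 = torch.tensor([1.0], requires_grad=True)
loss = theta_0 * theta_0
print(loss)
tensor([1.], grad_fn=<MulBackward0>)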

The grad attribute

Each parameter has a grad attribute. Let’s have a look at its value:

print(theta_0.grad)
None

Absolutely no worries! It is completely normal that theta_0.grad equals None right now: we have not yet told torch which gradient we want it to compute, since no backward pass has been run. ☺️

Calculating the gradient of the loss

As said before, we would like to calculate the gradient of the loss L(\cdot) with respect to \theta, evaluated at \theta_0 = 1: \dfrac{\partial L}{\partial \theta}(\theta_0).

To do so, we simply need to add the following line:

loss.backward()

The backward() method propagates the gradient of the loss backwards through its computation graph, storing the result in the grad attribute of every parameter involved in the computation.

(Illustration: loss.backward())

And now, if we print theta_0.grad again:

print(theta_0.grad)
tensor([2.])

which corresponds to the value of \dfrac{\partial L}{\partial \theta}(\theta_0) = 2\theta_0 = 2 (since \theta_0 = 1).
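A side note before we conclude: backward() accumulates gradients into the grad attribute rather than overwriting it. If we built the loss and called backward() a second time here, theta_0.grad would become tensor([4.]). This is why, in training loops, gradients are usually reset between steps. A minimal sketch of what that looks like:

loss = theta_0 * theta_0    # build the computation graph again
loss.backward()             # gradients accumulate: theta_0.grad is now tensor([4.])
theta_0.grad = None         # reset before the next gradient computation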

In other words, now we can compute gradients automatically!
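To illustrate, here is a short sketch applying exactly the same steps to a slightly more involved function, L(\theta) = 3\theta^2 + 2\theta, whose gradient is 6\theta + 2, i.e. 8 at \theta_0 = 1 (the function and variable names below are just illustrative):

import torch

theta_0 = torch.nn.Parameter(torch.Tensor([1]))

loss = 3 * theta_0 * theta_0 + 2 * theta_0   # L(theta_0) = 3 theta_0^2 + 2 theta_0
loss.backward()                              # propagate gradients through the graph

print(theta_0.grad)
tensor([8.])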