
Chapter 4: PyTorch for Automatic Gradient Descent

The Secrets of torch Parameters

Luca Grillotti

torch provides very convenient tools for computing the value of a gradient automatically. A torch Parameter keeps track of the operations applied to it (such as multiplication, addition, exp, …).

import torch

tensor = torch.Tensor([[1, 2],
                       [3, 4]])
parameters = torch.nn.Parameter(tensor)
print(parameters)

The code above produces the following output:

Parameter containing:
tensor([[1., 2.],
        [3., 4.]], requires_grad=True)

The requires_grad=True indicates that the parameters will keep track of all the operations applied to them, so that the gradients can be computed automatically when needed.
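To make this concrete, here is a small sketch (not part of the original snippet) comparing a plain tensor with a Parameter:

import torch

tensor = torch.Tensor([[1, 2],
                       [3, 4]])
parameters = torch.nn.Parameter(tensor)

# a plain tensor does not track the operations applied to it
print(tensor.requires_grad)      # False

# a Parameter does, so gradients with respect to it can be computed later
print(parameters.requires_grad)  # True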

Dynamically Building the Graph of Operations

Every time we perform an operation on a torch Parameter, that operation is recorded in the resulting tensor. Operations keep being recorded in this way in every computed tensor, all the way down to the computation of the loss function.

For example, let’s see what happens after an addition:

import torch

tensor = torch.Tensor([[1, 2],
                       [3, 4]])
parameters = torch.nn.Parameter(tensor)

temp = parameters + 5
print(temp)

The code above produces the following output:

tensor([[6., 7.],
        [8., 9.]], grad_fn=<AddBackward0>)

Illustration: graph of operations after the addition.

The grad_fn=<AddBackward0> indicates that the resulting tensor remembers the last operation that produced it (in this case: an addition with 5).
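Only the most recent operation appears in grad_fn, but every intermediate tensor keeps a reference to the operation that produced it, so chaining operations chains the recorded graph. As a small sketch extending the snippet above (the extra multiplication is purely illustrative):

import torch

tensor = torch.Tensor([[1, 2],
                       [3, 4]])
parameters = torch.nn.Parameter(tensor)

temp = parameters + 5  # grad_fn=<AddBackward0>
temp_2 = temp * 3      # grad_fn=<MulBackward0>
print(temp_2)

which should produce:

tensor([[18., 21.],
        [24., 27.]], grad_fn=<MulBackward0>)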

The same phenomenon happens with torch mathematical functions, such as torch.exp:

import torch

tensor = torch.Tensor([[1, 2],
                       [3, 4]])
parameters = torch.nn.Parameter(tensor)

temp = torch.exp(parameters)
print(temp)

The code above produces the following output:

tensor([[ 2.7183,  7.3891],
        [20.0855, 54.5982]], grad_fn=<ExpBackward0>)

Illustration: graph of operations after the exponential.
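Once the operations have been recorded, calling .backward() on a scalar built from the parameters makes torch walk back through the recorded graph and store the gradients in parameters.grad. Here is a minimal sketch (the sum-based loss is an arbitrary choice for illustration):

import torch

tensor = torch.Tensor([[1, 2],
                       [3, 4]])
parameters = torch.nn.Parameter(tensor)

# build a scalar from the recorded operations
loss = torch.exp(parameters).sum()

# compute the gradients automatically
loss.backward()

# the gradient of sum(exp(x)) with respect to x is exp(x)
print(parameters.grad)

The printed gradient should match the values of torch.exp(parameters) shown above.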