
Chapter 4: PyTorch for Automatic Gradient Descent

Exercise: Automatic Gradient Descent

Luca Grillotti

A few chapters ago, you implemented the function gradient_descent_torch_autograd, which performs gradient descent on torch tensors with automatic computation of gradients.

However, in that implementation the parameter updates themselves were still written by hand!
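
As a reminder, here is a minimal sketch of what such a function might look like for the square loss (for comparison only; the exact code from the earlier chapter may differ):

import torch

def gradient_descent_torch_autograd(initial_theta, learning_rate, number_steps):
    theta = initial_theta.clone().requires_grad_(True)
    for _ in range(number_steps):
        loss = theta * theta                      # square loss L(theta) = theta^2
        loss.backward()                           # autograd fills theta.grad
        with torch.no_grad():
            theta -= learning_rate * theta.grad   # manual update, written by hand
        theta.grad.zero_()                        # reset the gradient for the next step
    return theta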

Exercise 1

Try implementing a function gradient_descent_torch_optimiser(...) that does the same thing as gradient_descent_torch_autograd, but using the automatic update performed by the SGD class of torch.optim.

def gradient_descent_torch_optimiser(initial_theta, learning_rate, number_steps):
    """
    Args:
        initial_theta (torch.Tensor): Initial value of theta
        learning_rate (float)
        number_steps (int): number of gradient-descent steps to perform.

    Returns:
        final_theta (torch.Tensor): Final value of theta after number_steps gradient-descent steps performed with the SGD torch optimiser.
    """

import torch

def gradient_descent_torch_optimiser(initial_theta, learning_rate, number_steps):
    """
    Args:
        initial_theta (torch.Tensor): Initial value of theta
        learning_rate (float)
        number_steps (int): number of gradient-descent steps to perform.

    Returns:
        final_theta (torch.Tensor): Final value of theta after number_steps gradient-descent steps performed with the SGD torch optimiser.
    """
    tensor = torch.nn.Parameter(initial_theta, requires_grad=True)
    optimizer = torch.optim.SGD(params=[tensor], lr=learning_rate)
    for _ in range(number_steps):
        optimizer.zero_grad()      # reset gradients accumulated at the previous step
        loss = tensor * tensor     # square loss L(theta) = theta^2
        loss.backward()            # autograd computes d(loss)/d(theta)
        optimizer.step()           # SGD update: theta <- theta - learning_rate * gradient
        print(tensor)
    return tensor
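
For example, with a scalar starting point (values chosen arbitrarily for illustration):

initial_tensor = torch.tensor([3.0])  # for example
gradient_descent_torch_optimiser(initial_tensor,
                                 learning_rate=0.1,
                                 number_steps=20)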

Exercise 2

So far, we have only considered the square function as the loss. Try implementing a function gradient_descent_norm_torch_optimiser(...) that performs automatic gradient descent on the loss:

L(\theta) = \| \theta \|_2^2, where \theta is a 2-dimensional vector.

In other words, L(\theta)=\theta_1^2+\theta_2^2.
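
As a quick check with concrete numbers, the sum of squared components indeed matches the squared Euclidean norm computed by PyTorch (up to floating-point rounding):

import torch

theta = torch.tensor([1.0, -1.0])
print(torch.sum(theta * theta))       # 1^2 + (-1)^2 = 2
print(torch.linalg.norm(theta) ** 2)  # ||theta||_2^2, also 2 (up to rounding)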

import torch

def gradient_descent_norm_torch_optimiser(initial_theta, learning_rate, number_steps):
    """
    Args:
        initial_theta (torch.Tensor): Initial value of theta
        learning_rate (float)
        number_steps (int): number of gradient-descent steps to perform.

    Returns:
        final_theta (torch.Tensor): Final value of theta after number_steps gradient-descent steps performed with the SGD torch optimiser.
    """
    tensor = torch.nn.Parameter(initial_theta, requires_grad=True)
    optimizer = torch.optim.SGD(params=[tensor], lr=learning_rate)
    for _ in range(number_steps):
        optimizer.zero_grad()                 # reset gradients accumulated at the previous step
        loss = torch.sum(tensor * tensor)     # L(theta) = sum of squared components = ||theta||_2^2
        loss.backward()                       # autograd computes the gradient 2 * theta
        optimizer.step()                      # SGD update: theta <- theta - learning_rate * gradient
        print(tensor)
    return tensor

initial_tensor = torch.tensor([1.0, -1.0])  # for example
gradient_descent_norm_torch_optimiser(initial_tensor,
                                      number_steps=20,
                                      learning_rate=0.2)
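
One way to see what to expect from this run: the gradient of the loss is \nabla L(\theta) = 2\theta, so each SGD step computes \theta \leftarrow \theta - 0.2 \cdot 2\theta = 0.6\theta. The printed tensor should therefore shrink geometrically from [1, -1] towards [0, 0].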