Chapter 4: PyTorch for Automatic Gradient Descent

Exercise: Automatic Gradient Descent

Luca Grillotti

Earlier, you implemented the function gradient_descent_torch_autograd, which performs gradient descent on torch tensors while computing the gradients automatically with autograd.

However, even there, the parameter update itself was still performed by hand!
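For reference, here is a minimal sketch of what such a manual update loop looks like (an illustration only, assuming a scalar theta and the squared loss used so far; your gradient_descent_torch_autograd may differ): autograd computes the gradient, but the update theta <- theta - learning_rate * grad is written out explicitly.

import torch

theta = torch.tensor(3.0, requires_grad=True)  # example starting value (illustrative)
learning_rate = 0.1
for _ in range(20):
    loss = theta * theta          # the squared loss used so far
    loss.backward()               # autograd fills in theta.grad automatically...
    with torch.no_grad():         # ...but the update itself is written by hand
        theta -= learning_rate * theta.grad
    theta.grad.zero_()            # reset the gradient before the next step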

Exercise 1: Automatic Gradient Descent

Try implementing a function gradient_descent_torch_optimiser(...) that does the same thing as gradient_descent_torch_autograd, but uses the automatic parameter update provided by the SGD class of torch.optim.

def gradient_descent_torch_optimiser(initial_theta, learning_rate, number_steps):
    """
    Args:
        initial_theta (torch.Tensor): Initial value of theta
        learning_rate (float)
        number_steps (int): number of gradient descent steps to perform.

    Returns:
        final_theta (torch.Tensor): Final value of theta after number_steps gradient descent steps performed with the torch SGD optimiser.
    """
    # Your implementation here

Solution:

import torch

def gradient_descent_torch_optimiser(initial_theta, learning_rate, number_steps):
    """
    Args:
        initial_theta (torch.Tensor): Initial value of theta
        learning_rate (float)
        number_steps (int): number of gradient descent steps to perform.

    Returns:
        final_theta (torch.Tensor): Final value of theta after number_steps gradient descent steps performed with the torch SGD optimiser.
    """
    # Wrap theta in a Parameter so that the optimiser can update it in place.
    tensor = torch.nn.Parameter(initial_theta, requires_grad=True)
    optimizer = torch.optim.SGD(params=[tensor], lr=learning_rate)
    for _ in range(number_steps):
        optimizer.zero_grad()   # reset the gradients accumulated at the previous step
        loss = tensor * tensor  # squared loss L(theta) = theta^2
        loss.backward()         # autograd computes d(loss)/d(theta)
        optimizer.step()        # SGD update: theta <- theta - learning_rate * grad
        print(tensor)
    return tensor
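
For instance (the starting value and hyperparameters below are just an illustration, not part of the original exercise), you could try:

initial_theta = torch.tensor(3.0)  # a scalar starting point, for example
gradient_descent_torch_optimiser(initial_theta,
                                 learning_rate=0.1,
                                 number_steps=20)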

Exercise 2: A different loss function

So far, we have only considered the square function as the loss function. Try implementing a function gradient_descent_norm_torch_optimiser(...) that performs automatic gradient descent on the loss:

L(\theta) = \| \theta \|_2^2, where \theta is a 2-dimensional vector.

In other words, L(\theta)=\theta_1^2+\theta_2^2.

Hint: you only need to change the definition of the loss so that it works for a 2-dimensional vector.

Solution:

import torch

def gradient_descent_norm_torch_optimiser(initial_theta, learning_rate, number_steps):
    """
    Args:
        initial_theta (torch.Tensor): Initial value of theta
        learning_rate (float)
        number_steps (int): number of gradient descent steps to perform.

    Returns:
        final_theta (torch.Tensor): Final value of theta after number_steps gradient descent steps performed with the torch SGD optimiser.
    """
    # Wrap theta in a Parameter so that the optimiser can update it in place.
    tensor = torch.nn.Parameter(initial_theta, requires_grad=True)
    optimizer = torch.optim.SGD(params=[tensor], lr=learning_rate)
    for _ in range(number_steps):
        optimizer.zero_grad()              # reset the gradients accumulated at the previous step
        loss = torch.sum(tensor * tensor)  # L(theta) = theta_1^2 + theta_2^2, the sum of squared components
        loss.backward()                    # autograd computes the gradient with respect to theta
        optimizer.step()                   # SGD update: theta <- theta - learning_rate * grad
        print(tensor)
    return tensor

initial_tensor = torch.tensor([1.0, -1.0])  # for example
gradient_descent_norm_torch_optimiser(initial_tensor,
                                      learning_rate=0.2,
                                      number_steps=20)
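
As a quick sanity check (a suggested addition, not part of the original exercise), the returned value should be very close to the zero vector: with a learning rate of 0.2, each SGD step scales theta by (1 - 2 * 0.2) = 0.6, so after 20 steps theta is roughly 0.6^20 ≈ 3.7e-5 times its initial value.

final_theta = gradient_descent_norm_torch_optimiser(initial_tensor,
                                                    learning_rate=0.2,
                                                    number_steps=20)
print(torch.allclose(final_theta, torch.zeros(2), atol=1e-4))  # expected: True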