Introduction to Deep Learning with PyTorch
Chapter 4: PyTorch for Automatic Gradient Descent
Exercise: Automatic Gradient Descent
Earlier, you implemented the function gradient_descent_torch_autograd, which performs gradient descent on torch tensors with automatic computation of the gradients.
However, you were still applying the gradient-descent update step by hand!
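As a reminder, here is a minimal sketch of what such a function might look like (the exact implementation from the earlier exercise may differ; the square loss and the scalar theta are assumptions for illustration):

import torch

def gradient_descent_torch_autograd(initial_theta, learning_rate, number_steps):
    # Gradients are computed automatically by autograd,
    # but the update theta <- theta - lr * grad is still applied by hand.
    theta = initial_theta.clone().requires_grad_(True)
    for _ in range(number_steps):
        loss = theta * theta                      # square loss, as in the earlier chapters
        loss.backward()                           # autograd fills in theta.grad
        with torch.no_grad():
            theta -= learning_rate * theta.grad   # manual update step
        theta.grad.zero_()                        # reset the gradient for the next step
    return theta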
Exercise 1: Automatic Gradient Descent
Try implementing a function gradient_descent_torch_optimiser(...) that does the same thing as gradient_descent_torch_autograd, but using the automatic update provided by the SGD class of torch.optim.
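In general, a torch.optim optimiser performs one update with a three-call pattern: clear the stored gradients, backpropagate the loss, then let the optimiser apply the update. A minimal generic sketch (the parameter value and the loss below are placeholders, not part of the exercise):

param = torch.nn.Parameter(torch.tensor(1.0))          # placeholder parameter
optimizer = torch.optim.SGD(params=[param], lr=0.1)    # plain SGD acting on that parameter

optimizer.zero_grad()   # clear any previously accumulated gradients
loss = param * param    # placeholder loss
loss.backward()         # compute gradients with autograd
optimizer.step()        # apply the update: param <- param - lr * param.grad

Applying this pattern inside the requested function gives the solution below.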
def gradient_descent_torch_optimiser(initial_theta, learning_rate, number_steps):
    """
    Args:
        initial_theta (torch.Tensor): Initial value of theta.
        learning_rate (float): Learning rate of the gradient descent.
        number_steps (int): Number of gradient-descent steps to perform.
    Returns:
        final_theta (torch.Tensor): Final value of theta after number_steps gradient-descent
            steps performed with the torch SGD optimiser.
    """
    tensor = torch.nn.Parameter(initial_theta, requires_grad=True)
    optimizer = torch.optim.SGD(params=[tensor], lr=learning_rate)
    for _ in range(number_steps):
        optimizer.zero_grad()        # clear the gradient from the previous step
        loss = tensor * tensor       # square loss, as before
        loss.backward()              # compute the gradient with autograd
        optimizer.step()             # let SGD apply the update
        print(tensor)                # track the value of theta at each step
    return tensor
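For example, starting from theta = 1.0 (the starting value is just an illustration, not prescribed by the exercise), twenty steps with learning rate 0.2 drive theta towards the minimiser 0:

initial_tensor = torch.tensor(1.0)  # for example
gradient_descent_torch_optimiser(initial_tensor,
                                 number_steps=20,
                                 learning_rate=0.2)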
Exercise 2: A different loss function
So far, we have only considered the square function as the loss function. Try implementing a function gradient_descent_norm_torch_optimiser(...) that performs automatic gradient descent on the loss
$L(\theta) = \|\theta\|_2^2$, where $\theta$ is a 2-dimensional vector.
In other words, $L(\theta) = \theta_1^2 + \theta_2^2$.
Hint: You only need to change the definition of the loss so that it works for this 2-dimensional vector.
def gradient_descent_norm_torch_optimiser(initial_theta, learning_rate, number_steps):
    """
    Args:
        initial_theta (torch.Tensor): Initial value of theta.
        learning_rate (float): Learning rate of the gradient descent.
        number_steps (int): Number of gradient-descent steps to perform.
    Returns:
        final_theta (torch.Tensor): Final value of theta after number_steps gradient-descent
            steps performed with the torch SGD optimiser.
    """
    tensor = torch.nn.Parameter(initial_theta, requires_grad=True)
    optimizer = torch.optim.SGD(params=[tensor], lr=learning_rate)
    for _ in range(number_steps):
        optimizer.zero_grad()              # clear the gradient from the previous step
        loss = torch.sum(tensor * tensor)  # sum of the squares of the components of the tensor
        loss.backward()                    # compute the gradient with autograd
        optimizer.step()                   # let SGD apply the update
        print(tensor)                      # track the value of theta at each step
    return tensor
initial_tensor = torch.tensor([1.0, -1.0])  # for example
gradient_descent_norm_torch_optimiser(initial_tensor,
                                      number_steps=20,
                                      learning_rate=0.2)
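As a quick sanity check (not part of the original exercise), one can verify that the gradient computed by autograd for this loss matches the analytic gradient $\nabla L(\theta) = 2\theta$:

theta = torch.tensor([1.0, -1.0], requires_grad=True)
loss = torch.sum(theta * theta)   # L(theta) = theta_1^2 + theta_2^2
loss.backward()
print(theta.grad)                 # tensor([ 2., -2.]), i.e. 2 * theta

With learning rate 0.2, each step therefore multiplies every component of theta by 1 - 0.2 * 2 = 0.6, so both components shrink towards the minimiser at the origin.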