Introduction to Deep Learning with PyTorch
Chapter 2: Gradient Descent
Exercise: Let's Implement Gradient Descent!
Gradient Descent
Remember the single-step Gradient Descent formula we just saw?
\theta \leftarrow \theta - \lambda \dfrac{\partial L}{\partial \theta}
where:
- \theta represents the parameters of the loss function L.
- \lambda represents the learning rate of the gradient descent.
- \dfrac{\partial L}{\partial \theta} represents the gradient of L with respect to its parameters \theta.
Exercise
Single-step gradient descent
Let’s try to implement that formula!
As a loss, we consider the square function L(\theta) = \theta^2, where \theta is a 1-dimensional variable; its gradient is \dfrac{\partial L}{\partial \theta} = 2\theta.
Try to write a function that implements the update rule given above:
def single_step_gradient_descent(theta, learning_rate):
    """
    Args:
        theta (float)
        learning_rate (float)

    Returns:
        new_theta (float): updated value of theta after 1 step of gradient descent.
    """
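If you want to compare against a reference, here is one possible sketch (not the only valid solution). It hard-codes the analytic gradient \dfrac{\partial L}{\partial \theta} = 2\theta of the square loss:

def single_step_gradient_descent(theta, learning_rate):
    """Perform one step of gradient descent on L(theta) = theta ** 2."""
    # For L(theta) = theta ** 2, the gradient is dL/dtheta = 2 * theta.
    gradient = 2 * theta
    # Apply the update rule: theta <- theta - lambda * dL/dtheta.
    new_theta = theta - learning_rate * gradient
    return new_theta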
Gradient Descent
Once you’ve finished implementing single_step_gradient_descent, you can implement a full Gradient Descent:
def gradient_descent(initial_theta, learning_rate, number_steps):
    """
    Args:
        initial_theta (float): Initial value of theta
        learning_rate (float)
        number_steps (int): number of 1-step gradient descents to perform.

    Returns:
        final_theta (float): Final value of theta after multiple 1-step gradient descents
    """
gradient_descent simply performs several 1-step gradient descents, one after the other.
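Here is one possible sketch, assuming the single_step_gradient_descent sketch above; the print inside the loop is optional and only there to follow the iterates:

def gradient_descent(initial_theta, learning_rate, number_steps):
    """Perform number_steps successive 1-step gradient descents."""
    theta = initial_theta
    for step in range(number_steps):
        # Reuse the single-step update defined earlier.
        theta = single_step_gradient_descent(theta, learning_rate)
        # Optional: print theta to follow its trajectory.
        print(f"Step {step + 1}: theta = {theta}")
    return theta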
You can test your function gradient_descent with:
- a learning rate of 0.2
- 20 steps
- an initial theta of 1.
Do not hesitate to print the value of the updated \theta after each iteration ;)
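For example, assuming the sketch above, a test run with those settings could look like this:

# Test run with the suggested settings.
final_theta = gradient_descent(initial_theta=1.0, learning_rate=0.2, number_steps=20)
print(final_theta)
# With L(theta) = theta ** 2 and learning_rate = 0.2, each step multiplies theta
# by (1 - 2 * 0.2) = 0.6, so after 20 steps theta equals 0.6 ** 20, roughly 3.7e-5.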
Once you’ve finished, try running your gradient_descent function with very low (~0.001) and very high (~10) learning rates.
What do you notice?
A low learning rate slows down convergence, whereas a high learning rate increases the risk of divergence.
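To see why, plug the gradient of the square loss into the update rule: for L(\theta) = \theta^2,

\theta \leftarrow \theta - \lambda \cdot 2\theta = (1 - 2\lambda)\theta

so each step multiplies \theta by the factor (1 - 2\lambda). With \lambda = 0.001 that factor is 0.998 and \theta shrinks very slowly; with \lambda = 10 it is -19 and |\theta| grows at every step. For this particular loss, convergence requires |1 - 2\lambda| < 1, i.e. 0 < \lambda < 1.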