Chapter 5: Training a Linear Model with PyTorch

Vectorising your computations

Luca Grillotti

When manipulating tensors, for loops are usually very time-consuming.

Instead, we prefer vectorising our operations as much as possible.

PyTorch and other Machine Learning libraries are optimised to make vectorised operations much faster than the equivalent loops. In particular, vectorised operations can benefit from the hardware acceleration provided by GPUs and TPUs.
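
To make the difference concrete, here is a minimal sketch that applies the same arithmetic once with a Python loop and once with a single vectorised expression. The data and the variable names (`x`, `loop_result`, `vectorised_result`) are made up for this illustration, and the measured times will vary from machine to machine.

```python
import time

import torch

# Hypothetical data, used only for this comparison.
x = torch.arange(10_000, dtype=torch.float32)

# Loop version: one small tensor operation per element.
start = time.perf_counter()
loop_result = torch.zeros_like(x)
for i in range(x.shape[0]):
    loop_result[i] = 2 * x[i] + 1
loop_time = time.perf_counter() - start

# Vectorised version: a single operation on the whole tensor.
start = time.perf_counter()
vectorised_result = 2 * x + 1
vectorised_time = time.perf_counter() - start

# Both versions compute the same values; only the running time differs.
print(f"loop: {loop_time:.4f}s, vectorised: {vectorised_time:.6f}s")
```

On most machines the vectorised line should run considerably faster, although the exact ratio depends on the hardware and on the tensor size.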

To start vectorising our operations, let’s first gather all our data into two tensors, one for the inputs and one for the targets:

import torch

tensor_number_tasks = torch.Tensor([[1.],
                                    [2.],
                                    [4.],
                                    [4.],
                                    [5.],
                                    [6.],
                                    [6.],
                                    [6.],
                                    [8.],
                                    [8.],
                                    [9.],
                                    [10.]])

tensor_number_questions = torch.Tensor([[5.],
                                        [11.],
                                        [21.],
                                        [22.],
                                        [26.],
                                        [31.],
                                        [32.],
                                        [31.],
                                        [41.],
                                        [42.],
                                        [48.],
                                        [52.]])

Here is a simpler way to initialise the tensors above:

tensor_number_tasks = torch.Tensor([1, 2, 4, 4, 5, 6, 6, 6, 8, 8, 9, 10]).view(-1, 1)
tensor_number_questions = torch.Tensor([5, 11, 21, 22, 26, 31, 32, 31, 41, 42, 48, 52]).view(-1, 1)
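
As a quick check of what `.view(-1, 1)` does, the sketch below builds the same column tensor in both ways on a shortened, made-up dataset. The names `explicit_column` and `reshaped_column` are introduced only for this illustration.

```python
import torch

# Column tensor written out row by row.
explicit_column = torch.Tensor([[1.], [2.], [4.]])

# Same data written as a flat list, then reshaped into one column.
# The -1 asks PyTorch to infer that dimension from the number of elements.
reshaped_column = torch.Tensor([1, 2, 4]).view(-1, 1)

print(reshaped_column.shape)  # torch.Size([3, 1])
print(torch.equal(explicit_column, reshaped_column))  # True
```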

You can then compute the estimated number of questions, estimator_number_questions, for every entry of tensor_number_tasks at the same time!

estimator_number_questions = net(tensor_number_tasks)
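
The sketch below illustrates why a single call on the whole tensor is enough. Since the definition of `net` is not shown in this section, a plain `torch.nn.Linear(1, 1)` layer is used as a hypothetical stand-in (`stand_in_net`); the batched call returns the same values as calling the model once per sample.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for `net`: a single linear layer with one input and one output.
stand_in_net = torch.nn.Linear(1, 1)

# Shortened dataset of shape (5, 1), for illustration only.
tensor_number_tasks = torch.Tensor([1, 2, 4, 4, 5]).view(-1, 1)

# One call processes all rows at once and returns a tensor of shape (5, 1).
batched_estimators = stand_in_net(tensor_number_tasks)

# Equivalent, but slower: one call per sample, stacked back into shape (5, 1).
looped_estimators = torch.stack([stand_in_net(sample) for sample in tensor_number_tasks])

print(batched_estimators.shape)
print(torch.allclose(batched_estimators, looped_estimators))
```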

Then, following the loss formula given earlier (L(\theta) = \dfrac{1}{12} \sum_{i=0}^{11} (\widehat{n_Q^{(i)}} - n_Q^{(i)})^2), we compute the mean of the squared differences:

difference_tensor = estimator_number_questions - tensor_number_questions
squared_difference_tensor = difference_tensor * difference_tensor
loss_tensor = squared_difference_tensor.mean()
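
As a sanity check, the manual computation can be compared with PyTorch's built-in `torch.nn.functional.mse_loss`, which averages the squared differences by default. The estimator and target values below are made up for this sketch and are not the output of any trained model.

```python
import torch
import torch.nn.functional as F

# Hypothetical estimators and targets, each of shape (3, 1).
estimator = torch.Tensor([4., 12., 20.]).view(-1, 1)
target = torch.Tensor([5., 11., 21.]).view(-1, 1)

# Manual mean squared error, following the same steps as in the text.
difference = estimator - target
manual_loss = (difference * difference).mean()

# Built-in mean squared error (default reduction is the mean).
builtin_loss = F.mse_loss(estimator, target)

# The differences are -1, 1 and -1, so both losses equal 1.0.
print(manual_loss.item(), builtin_loss.item())
```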

Exercise

Try to vectorise the code you were using for training ModelNumberQuestions.

The signature of your train_parameters_linear_regression function should become:

def train_parameters_linear_regression(tensor_number_tasks, tensor_number_questions, learning_rate=0.02, number_training_steps=200):
    """
    Instantiate a ModelNumberQuestions model and optimise its parameters, given the dataset
    of tensor_number_tasks and tensor_number_questions.

    Args:
        tensor_number_tasks (torch.Tensor): of size (n, 1), where n is the number of data samples
        tensor_number_questions (torch.Tensor): of size (n, 1), where n is the number of data samples
        learning_rate (float): step size used by the optimiser
        number_training_steps (int): number of gradient descent steps to perform

    Returns:
        trained network (ModelNumberQuestions)
    """

Our function compute_loss can be vectorised!

def compute_loss(tensor_number_tasks, tensor_number_questions, model_number_questions):
    # computing estimator number questions for all data samples present in tensor_number_tasks
    estimator_number_questions = model_number_questions(tensor_number_tasks)

    # computing squared error for all data samples
    error = estimator_number_questions - tensor_number_questions
    squared_error = error * error

    # computing mean squared error.
    mse_loss = torch.mean(squared_error)

    return mse_loss

Similarly, we also vectorise train_parameters_linear_regression:

def train_parameters_linear_regression(tensor_number_tasks, tensor_number_questions, learning_rate=0.02, number_training_steps=200):
    """
    Instantiate a ModelNumberQuestions model and optimise its parameters, given the dataset
    of tensor_number_tasks and tensor_number_questions.

    Args:
        tensor_number_tasks (torch.Tensor): of size (n, 1), where n is the number of data samples
        tensor_number_questions (torch.Tensor): of size (n, 1), where n is the number of data samples
        learning_rate (float): step size used by the optimiser
        number_training_steps (int): number of gradient descent steps to perform

    Returns:
        trained network (ModelNumberQuestions)
    """

    net = ModelNumberQuestions()

    optimiser = torch.optim.SGD(net.parameters(), lr=learning_rate)

    for _ in range(number_training_steps):
        optimiser.zero_grad()

        mse_loss = compute_loss(tensor_number_tasks, tensor_number_questions, model_number_questions=net)

        mse_loss.backward()  # Compute gradients
        optimiser.step()  # Perform one gradient descent step.

        print("loss:", mse_loss.item())

    print("Final Parameters:\n", list(net.named_parameters()))

    return net
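
To see the vectorised training loop run end to end, here is a self-contained sketch. The class `ModelNumberQuestions` is defined earlier in the course and is not shown in this section, so the single-linear-layer definition below is an assumption made for illustration. The helper `train_and_record_losses` is also hypothetical: it is a condensed variant of the function above that records the losses in a list instead of printing them.

```python
import torch


class ModelNumberQuestions(torch.nn.Module):
    """Hypothetical definition: one linear layer mapping n_tasks to n_questions."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, tensor_number_tasks):
        return self.linear(tensor_number_tasks)


def compute_loss(tensor_number_tasks, tensor_number_questions, model_number_questions):
    # Mean squared error over all data samples at once.
    error = model_number_questions(tensor_number_tasks) - tensor_number_questions
    return torch.mean(error * error)


def train_and_record_losses(tensor_number_tasks, tensor_number_questions,
                            learning_rate=0.02, number_training_steps=200):
    # Hypothetical helper: same loop as above, but the losses are stored, not printed.
    net = ModelNumberQuestions()
    optimiser = torch.optim.SGD(net.parameters(), lr=learning_rate)
    losses = []
    for _ in range(number_training_steps):
        optimiser.zero_grad()
        mse_loss = compute_loss(tensor_number_tasks, tensor_number_questions, net)
        mse_loss.backward()
        optimiser.step()
        losses.append(mse_loss.item())
    return net, losses


torch.manual_seed(0)  # fixed seed so that the random initialisation is reproducible

tensor_number_tasks = torch.Tensor([1, 2, 4, 4, 5, 6, 6, 6, 8, 8, 9, 10]).view(-1, 1)
tensor_number_questions = torch.Tensor([5, 11, 21, 22, 26, 31, 32, 31, 41, 42, 48, 52]).view(-1, 1)

net, losses = train_and_record_losses(tensor_number_tasks, tensor_number_questions)
print("first loss:", losses[0], "last loss:", losses[-1])

# Estimate the number of questions for a new input of 7 tasks, given as a (1, 1) tensor.
with torch.no_grad():
    prediction = net(torch.Tensor([[7.]]))
print("estimated number of questions for 7 tasks:", prediction.item())
```

Returning the list of losses rather than printing them makes it easier to plot the training curve or to check that the loss went down.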