
Chapter 7: Building and Training a Simple Classification Model

Training our Classifier

Luca Grillotti

Remember the code we had for training our linear model?

def train_parameters_linear_regression(tensor_number_tasks,
                                       tensor_number_questions,
                                       learning_rate=0.02,
                                       number_training_steps=200):
    """
    Instantiates the ModelNumberQuestions model and the MSE loss, and optimises the parameters of the model,
    given the dataset of tensor_number_tasks and tensor_number_questions.

    Args:
        tensor_number_tasks (torch.Tensor): of size (n, 1) where n is the number of tasks (it is also the number of questions)
        tensor_number_questions (torch.Tensor): of size (n, 1) where n is the number of questions (it is also the number of tasks)
        learning_rate (float):
        number_training_steps (int):

    Returns:
        trained network (ModelNumberQuestions)
    """
    net = ModelNumberQuestions()  # model
    loss = torch.nn.MSELoss()  # loss module

    optimiser = torch.optim.SGD(net.parameters(), lr=learning_rate)

    for _ in range(number_training_steps):
        optimiser.zero_grad()

        # Compute Loss
        estimator_number_questions = net.forward(tensor_number_tasks)
        mse_loss = loss.forward(input=estimator_number_questions,
                                target=tensor_number_questions)

        mse_loss.backward()
        optimiser.step()
        print("loss:", mse_loss.item())

    print("Final Parameters:\n", list(net.named_parameters()))
    return net

Well, the code we have for training our classifier is super similar! But we will make some slight adjustments:

Loss

Previously, we said that we would use a cross-entropy loss.

cross_entropy_loss = torch.nn.CrossEntropyLoss()
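As a reminder of how this loss is used, here is a minimal, self-contained sketch with made-up tensors (not part of the course model): torch.nn.CrossEntropyLoss expects the raw, unnormalised scores (logits) produced by the model as input, and integer class labels as target.

import torch

cross_entropy_loss = torch.nn.CrossEntropyLoss()

# Hypothetical example: a batch of 3 images and 10 possible digit classes.
fake_logits = torch.randn(3, 10)       # raw scores from a classifier, shape (batch_size, number_classes)
fake_labels = torch.tensor([2, 7, 0])  # integer class labels, shape (batch_size,)

value_loss = cross_entropy_loss(input=fake_logits, target=fake_labels)
print(value_loss.item())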

Which Optimiser should we use?

So far, we have mostly relied on the Stochastic Gradient Descent (SGD) optimiser.

Here, we use the Adam optimiser, which is much more commonly used in practice:

optimiser = torch.optim.Adam(net.parameters())
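Note that we did not specify a learning rate here: torch.optim.Adam uses a default learning rate of 1e-3 and adapts the effective step size for each parameter during training. If you want to, you can still set it explicitly (a minimal sketch, with the value chosen for illustration):

optimiser = torch.optim.Adam(net.parameters(), lr=1e-3)  # lr=1e-3 is the default value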

Training Loop

In the end, our training code looks like this (we have refactored it slightly):

def train_classifier(mnist_classifier, loss, optimiser, dataset_images, dataset_labels, number_training_steps):
    for _ in range(number_training_steps):
        optimiser.zero_grad()

        # Compute Loss
        predicted_logits = mnist_classifier(dataset_images)
        value_loss = loss.forward(input=predicted_logits,
                                  target=dataset_labels)

        value_loss.backward()
        optimiser.step()
        print("loss:", value_loss.item())

And we declare our model, loss, and optimiser outside the function:

mnist_classifier = MNISTClassifier()
loss = torch.nn.CrossEntropyLoss()
optimiser = torch.optim.Adam(mnist_classifier.parameters())

Then, we just need a simple call to train_classifier and the work is done! :D

train_classifier(mnist_classifier,
                 loss,
                 optimiser,
                 dataset_training_images,
                 dataset_training_labels,
                 number_training_steps=500
                 )

However, passing the entire dataset through the model at every training step may be:

  • computationally expensive, as each step performs a forward and backward pass over the whole dataset to compute a single gradient update.
  • memory inefficient, as we need to load the whole dataset into memory at once.

That is why it is more common to divide our dataset into batches.
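For example, here is a minimal sketch of how the training loop above could be adapted to iterate over mini-batches using torch.utils.data.TensorDataset and DataLoader (the function name, the batch_size value and the use of epochs are illustrative choices, not the course's final implementation):

import torch
from torch.utils.data import TensorDataset, DataLoader


def train_classifier_with_batches(mnist_classifier, loss, optimiser,
                                  dataset_images, dataset_labels,
                                  number_epochs, batch_size=64):
    # Wrap the tensors in a dataset and let the DataLoader split it into shuffled batches.
    dataset = TensorDataset(dataset_images, dataset_labels)
    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    for _ in range(number_epochs):
        for batch_images, batch_labels in data_loader:
            optimiser.zero_grad()

            # Compute the loss on the current batch only
            predicted_logits = mnist_classifier(batch_images)
            value_loss = loss.forward(input=predicted_logits,
                                      target=batch_labels)

            value_loss.backward()
            optimiser.step()

        print("loss:", value_loss.item())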