Chapter 7: Building and Training a Simple Classification Model

Implementing a Simple Classifier

Luca Grillotti

Before implementing anything…

There are a few questions we need to address before starting to implement our MNISTClassifier module.

We have already answered most of these questions in the last chapter, but let’s recap quickly:

What kind of model should we use?

For the moment, we simply consider that our model applies a single linear transformation to the image, and nothing else (no activation function, no additional linear transformations…)

No worries, we will add more layers very soon 😉

What is the input to our model?

Our model should take as input a batch of images of size: torch.Size([N_batch, 1, 28, 28]), where N_batch represents the number of images in the batch.

What is the output of our model?

Output Characteristics

For each image, we expect our model to output a score s_i of belonging to each class \mathcal{C}_i, where:

  • \mathcal{C}_i is the class of the digit i
  • 0 \leq i \leq 9.

In other words, for each image, our model should output a vector of scores: [s_0, s_1, \cdots, s_9].

And if we feed the model with an image of a 5, we expect our model to output a vector of 10 scores: [s_0, s_1, \cdots, s_9] where s_5 is significantly higher than the other scores.

Output Shape

As explained above, for each image, we expect our model to output a 10-dimensional vector.

So, when we give as input to the model N_batch images, we expect our model to return N_batch 10-dimensional vectors.

Thus, if the input is of shape torch.Size([N_batch, 1, 28, 28]), the output should be of shape torch.Size([N_batch, 10]).

Which loss function should we consider?

Converting scores into probabilities:

Given an image I, our model should output a 10-dimensional vector of scores [s_0, s_1, \cdots, s_9].

Those scores s_i can take any real value, but we would like to convert them into a vector of probabilities: [p(I=0), p(I=1), \cdots, p(I=9)], where:

  • p(I=0) is the predicted probability that the label of I is 0.
  • p(I=1) is the predicted probability that the label of I is 1.
  • and so on…

One common approach consists of using the softmax function to estimate those probabilities:

p(I=i) = \dfrac{\exp(s_i)}{\sum_{j=0}^{9} \exp(s_j)}.
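
As a quick illustration, here is a minimal sketch applying torch.softmax to an arbitrary vector of 10 scores (the score values below are made up):

import torch

# Arbitrary scores for one image (any real values are allowed).
scores = torch.tensor([1.2, -0.3, 0.5, 2.1, -1.0, 4.0, 0.0, 0.7, -0.2, 1.5])

# Softmax: exponentiate each score, then normalise by the sum.
probabilities = torch.softmax(scores, dim=0)

print(probabilities)        # 10 values between 0 and 1
print(probabilities.sum())  # they sum to 1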

Comparing predicted probabilities with ground-truth labels

Ideally, if our model were perfect, then given an image of a 0, it should predict the vector of probabilities: [1, 0, 0, \cdots, 0].

So, given an image of a 0, we intend to minimise the distance between the predicted probabilities [p(I=0), p(I=1), \cdots, p(I=9)] and the one-hot encoding of the ground-truth label: [1, 0, 0, \cdots, 0].
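
To make this notion of distance concrete, here is a minimal sketch computing the cross-entropy -\sum_i y_i \log p(I=i) between an arbitrary predicted probability vector and the one-hot ground truth (the probability values are made up):

import torch

# Arbitrary predicted probabilities for an image of a 0 (they sum to 1).
predicted = torch.tensor([0.60, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.04, 0.03, 0.03])

# One-hot ground truth for the class "0".
target = torch.zeros(10)
target[0] = 1.0

# Cross-entropy: -sum_i y_i * log(p_i), which here reduces to -log(predicted[0]).
cross_entropy = -(target * predicted.log()).sum()
print(cross_entropy)  # tensor(0.5108), i.e. -log(0.6)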

Cross-Entropy Loss Function

In PyTorch, the loss torch.nn.CrossEntropyLoss performs the two operations we just mentioned:

  1. Converting the scores returned by the model [s_0, s_1, \cdots, s_9] into a vector of probabilities [p(I=0), p(I=1), \cdots, p(I=9)].
  2. Calculating the cross-entropy between the estimated probabilities [p(I=0), p(I=1), \cdots, p(I=9)] and the ground-truth label.

In the end, by minimising the cross-entropy loss, we improve the predictions of our model.
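
Here is a minimal sketch of how torch.nn.CrossEntropyLoss is typically used; the scores are random and the label values arbitrary:

import torch

criterion = torch.nn.CrossEntropyLoss()

# A batch of 3 score vectors, one 10-dimensional vector per image.
scores = torch.randn(3, 10)

# Ground-truth labels: one class index per image (not one-hot vectors).
labels = torch.tensor([5, 0, 3])

# CrossEntropyLoss takes the raw scores directly: it applies the
# softmax (and the logarithm) internally, then averages over the batch.
loss = criterion(scores, labels)
print(loss)  # a single scalar tensor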

And in PyTorch?

Remember the structure of our good old PyTorch Module?

import torch

class MNISTClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        ...

    def forward(self, tensor_images):
        """
        Args:
            tensor_images: tensor of shape (N_batch, 1, 28, 28)
        """
        ...

We now have all the tools to fill in this module and adapt it to our problem! :D

The __init__ method

As we said previously, for the moment, our model consists of a single linear layer.

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(in_features=<in_features>, out_features=<out_features>)

What should we put for <in_features> and <out_features>?

<out_features> is easy to determine: we said we wanted our model to output a 10-dimensional vector for each image, so we take <out_features>=10.

<in_features> corresponds to the total number of dimensions of each image. As each image is made of 1 channel of 28 * 28 pixels: <in_features>=1 * 28 * 28 = 784.
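
As a quick sanity check of these dimensions, here is a minimal sketch of the linear layer on its own (the batch size of 4 is arbitrary):

import torch

linear = torch.nn.Linear(in_features=1 * 28 * 28, out_features=10)

flattened_images = torch.randn(4, 1 * 28 * 28)  # 4 flattened images
print(linear(flattened_images).size())  # torch.Size([4, 10])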

The forward method

    def forward(self, tensor_images):
        """
        Args:
            tensor_images: tensor of shape (N_batch, 1, 28, 28)
        """
        # Flatten each image: (N_batch, 1, 28, 28) -> (N_batch, 1 * 28 * 28)
        tensor_images = tensor_images.view(-1, 1 * 28 * 28)
        # Apply the linear layer: (N_batch, 784) -> (N_batch, 10)
        outcome_scores = self.linear(tensor_images)
        return outcome_scores

In Summary:

import torch

class MNISTClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Single linear layer mapping a flattened image (784 values) to 10 class scores.
        self.linear = torch.nn.Linear(in_features=1 * 28 * 28, out_features=10)

    def forward(self, tensor_images):
        """
        Args:
            tensor_images: tensor of shape (N_batch, 1, 28, 28)
        """
        # Flatten each image: (N_batch, 1, 28, 28) -> (N_batch, 1 * 28 * 28)
        tensor_images = tensor_images.view(-1, 1 * 28 * 28)
        # Apply the linear layer: (N_batch, 784) -> (N_batch, 10)
        outcome_scores = self.linear(tensor_images)
        return outcome_scores

Let’s test!

It is always important to test our model by feeding it random tensors as input. This way, we ensure that we did not make any mistakes, and that the output shape matches our expectations.

import torch
mnist_classifier = MNISTClassifier()
n_batch = 42
random_tensor = torch.randn(size=(n_batch, 1, 28, 28))
print(mnist_classifier(random_tensor).size()) 

and we get:

torch.Size([42, 10]) 

So the output is of shape (n_batch, 10), as expected! \o/
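
We can go one step further and check that the model’s outputs plug directly into the loss we chose; a minimal sketch, reusing mnist_classifier, n_batch and random_tensor from above (the random labels are arbitrary):

criterion = torch.nn.CrossEntropyLoss()
random_labels = torch.randint(low=0, high=10, size=(n_batch,))
loss = criterion(mnist_classifier(random_tensor), random_labels)
print(loss)  # a scalar tensor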