Introduction to Deep Learning with PyTorch
Chapter 7: Building and Training a simple Classification Model
Implementing a Simple Classifier
Before Implementing anything…
There are a few questions we need to address before we start implementing our MNISTClassifier module.
We already answered most of those questions in the last chapter, but let’s recap quickly:
What kind of model should we use?
For the moment, we assume our model applies a single linear transformation to the image and nothing else (no activation function, no additional linear transformations…).
No worries, we will add more layers very soon ;)
What is the input to our model?
Our model should take as input a batch of images of size torch.Size([N_batch, 1, 28, 28]), where N_batch represents the number of images in the batch.
What is the output of our model?
Output Characteristics
For each image, we expect our model to output a score s_i for each class \mathcal{C}_i, where:
- \mathcal{C}_i is the class of the digit i
- 0 \leq i \leq 9.
In other words, for each image, our model should output a vector of scores: [s_0, s_1, \cdots, s_9].
For example, if we feed the model an image of a 5, we expect it to output a vector of 10 scores [s_0, s_1, \cdots, s_9] where s_5 is significantly higher than the other scores.
Output Shape
As explained above, for each image, we expect our model to output a 10-dimensional vector.
So, when we give N_batch images as input to the model, we expect it to return N_batch 10-dimensional vectors.
Thus, if the input is of shape torch.Size([N_batch, 1, 28, 28]), the output should be of shape torch.Size([N_batch, 10]).
Which loss function should we consider?
Converting scores into probabilities:
Given an image I, our model should output a 10-dimensional vector of scores [s_0, s_1, \cdots, s_9].
These scores s_i can be any real numbers, but we would like to convert them into a vector of probabilities: [p(I=0), p(I=1), \cdots, p(I=9)], where:
- p(I=0) is the predicted probability that the label of I is 0.
- p(I=1) is the predicted probability that the label of I is 1.
- and so on…
One common approach consists of using the softmax function to estimate those probabilities:
p(I=i) = \dfrac{\exp(s_i)}{\sum_{j=0}^{9} \exp(s_j)}.
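To make the formula concrete, here is a minimal sketch that applies it to one made-up vector of scores. The score values are hypothetical and chosen only for illustration. It compares the formula written out by hand with torch.softmax.

```python
import torch

# Hypothetical scores for a single image (illustrative values only).
scores = torch.tensor([[1.0, 2.0, 0.5, 0.0, -1.0, 4.0, 0.3, 0.1, -0.5, 1.5]])

# Softmax written out by hand, following the formula above.
exp_scores = torch.exp(scores)
manual_probs = exp_scores / exp_scores.sum(dim=1, keepdim=True)

# Built-in equivalent; dim=1 normalises over the 10 classes of each image.
probs = torch.softmax(scores, dim=1)

print(torch.allclose(manual_probs, probs))  # both computations agree
print(probs.sum(dim=1))                     # each row sums to 1 (up to rounding)
print(probs.argmax(dim=1))                  # class with the highest score, here 5
```

In practice the built-in version is preferable, since exponentiating large scores by hand can overflow.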
Comparing predicted probabilities with ground-truth labels
Ideally, if our model were perfect, given an image of a 0, it would predict the vector of probabilities: [1, 0, 0, ...].
So, given an image of a 0, we intend to minimise the distance between the predicted probabilities [p(I=0), p(I=1), \cdots, p(I=9)] and the ground-truth label of the image, [1, 0, 0, ...].
Cross-Entropy Loss Function
In PyTorch, the loss torch.nn.CrossEntropyLoss performs the two operations we just mentioned:
- Converting the scores returned by the model [s_0, s_1, \cdots, s_9] into a vector of probabilities [p(I=0), p(I=1), \cdots, p(I=9)].
- Calculating the distance between the estimated probabilities [p(I=0), p(I=1), \cdots, p(I=9)] and the ground-truth label of the image.
In the end, by minimising the cross-entropy loss, we improve the predictions of our model.
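The two steps can be checked on a small example. The sketch below uses made-up scores and labels (both hypothetical) and compares torch.nn.CrossEntropyLoss with the same computation done by hand: a log-softmax, followed by the negative log-probability of the true class, averaged over the batch.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only to make the sketch reproducible

scores = torch.randn(4, 10)          # hypothetical scores for a batch of 4 images
labels = torch.tensor([5, 0, 3, 9])  # hypothetical ground-truth digits (class indices)

# The loss takes raw scores, not probabilities: it applies the softmax itself.
loss_fn = torch.nn.CrossEntropyLoss()  # default reduction is the batch mean
loss = loss_fn(scores, labels)

# Same computation by hand.
log_probs = torch.log_softmax(scores, dim=1)            # step 1: scores -> log-probabilities
picked = log_probs[torch.arange(len(labels)), labels]   # log-probability of each true class
manual_loss = -picked.mean()                            # step 2: average negative log-likelihood

print(loss.item(), manual_loss.item())  # the two values agree
```

Note that the labels are passed as class indices here rather than as one-hot vectors, and that the model should not apply a softmax before this loss.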
And in PyTorch?
Remember the structure of our good old PyTorch Module?
import torch

class MNISTClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        ...

    def forward(self, tensor_images):
        """
        Args:
            tensor_images: tensor of shape (N_batch, 1, 28, 28)
        """
        ...
We now have all the tools to fill this module, and adapt it to our problem! :D
The __init__ method
As we said previously, for the moment, our model is made of a single linear layer.
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(in_features=<in_features>, out_features=<out_features>)
What should we put for <in_features> and <out_features>?
<out_features> is easy to determine: we said we wanted our model to output a 10-dimensional vector for each image, so we take <out_features>=10.
<in_features> corresponds to the total number of values in each image. As each image is made of 1 channel with 28 * 28 pixels, we take <in_features>=1 * 28 * 28 (that is, 784).
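As a quick sanity check of these two values, the sketch below builds the layer on its own and inspects its parameters. The variable names are illustrative only.

```python
import torch

linear = torch.nn.Linear(in_features=1 * 28 * 28, out_features=10)

# The weight matrix has one row per class and one column per input value.
print(linear.weight.shape)  # torch.Size([10, 784])
# There is one bias per class.
print(linear.bias.shape)    # torch.Size([10])

# Total number of learnable parameters: 784 * 10 weights + 10 biases = 7850.
n_params = sum(p.numel() for p in linear.parameters())
print(n_params)
```

So each of the 10 scores is a weighted sum of the 784 pixel values plus a bias.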
The forward method
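    def forward(self, tensor_images):
        """
        Args:
            tensor_images: tensor of shape (N_batch, 1, 28, 28)
        """
        # Flatten each image into a vector of 784 values; -1 infers N_batch.
        tensor_images = tensor_images.view(-1, 1 * 28 * 28)
        # Apply the linear layer: (N_batch, 784) -> (N_batch, 10).
        outcome_scores = self.linear(tensor_images)
        return outcome_scores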
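The flattening step is worth looking at in isolation. The sketch below uses a hypothetical batch of 3 random tensors and shows what view(-1, 1 * 28 * 28) produces. It also compares the result with torch.flatten, which is an alternative way to write the same step and not the one used in this chapter.

```python
import torch

images = torch.randn(3, 1, 28, 28)  # hypothetical batch of 3 random "images"

# -1 asks PyTorch to infer the batch dimension from the total number of elements.
flat_view = images.view(-1, 1 * 28 * 28)

# Alternative: keep dimension 0 (the batch) and merge all the others.
flat_flatten = torch.flatten(images, start_dim=1)

print(flat_view.shape)                       # torch.Size([3, 784])
print(torch.equal(flat_view, flat_flatten))  # True: same values, same order
```

One caveat: view only works when the tensor's memory layout allows it (for instance, a contiguous tensor). If it raises an error on a transposed or sliced tensor, reshape is the usual fallback.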
In Summary:
class MNISTClassifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(in_features=1 * 28 * 28, out_features=10)

    def forward(self, tensor_images):
        """
        Args:
            tensor_images: tensor of shape (N_batch, 1, 28, 28)
        """
        tensor_images = tensor_images.view(-1, 1 * 28 * 28)
        outcome_scores = self.linear(tensor_images)
        return outcome_scores
Let’s test!
It is always worth testing a model by feeding it random tensors. This way, we can check that we did not make any mistakes with the shapes.
import torch
mnist_classifier = MNISTClassifier()
n_batch = 42
random_tensor = torch.randn(size=(n_batch, 1, 28, 28))
print(mnist_classifier(random_tensor).size())
and we get:
torch.Size([42, 10])
So the result is of size (n_batch, 10), as expected! \o/
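As a possible further check, the same random batch can be pushed through the loss function. The sketch below repeats the model definition so that it runs on its own, and uses made-up random labels (hypothetical, since random tensors have no real digit). It only verifies that the predictions, the loss and the gradients have the expected shapes. It is not a training loop.

```python
import torch


class MNISTClassifier(torch.nn.Module):
    # Same single-layer model as in the summary above.
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(in_features=1 * 28 * 28, out_features=10)

    def forward(self, tensor_images):
        tensor_images = tensor_images.view(-1, 1 * 28 * 28)
        return self.linear(tensor_images)


mnist_classifier = MNISTClassifier()
n_batch = 42
random_tensor = torch.randn(size=(n_batch, 1, 28, 28))
# Made-up labels: one random digit between 0 and 9 per image.
random_labels = torch.randint(low=0, high=10, size=(n_batch,))

scores = mnist_classifier(random_tensor)
predicted_digits = scores.argmax(dim=1)  # digit with the highest score for each image

loss = torch.nn.CrossEntropyLoss()(scores, random_labels)
loss.backward()  # fills in .grad for the layer's weight and bias

print(predicted_digits.shape)                     # one predicted digit per image
print(loss.item())                                # a single scalar for the whole batch
print(mnist_classifier.linear.weight.grad.shape)  # same shape as the weight matrix
```

Since the model is untrained and the labels are random, the loss value itself carries no meaning here. Only the shapes do.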