Chapter 8: Building and Training an AutoEncoder

Implementing a Simple Auto-Encoder

Luca Grillotti

Before Implementing anything…

There are a few questions we need to address before we start implementing our MNISTAutoEncoder module.

What is the input to our model?

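Our model should take as input a batch of images of size torch.Size([N_batch, 1, 28, 28]), where:

  • N_batch represents the number of images in the batch;
  • 1 is the number of channels (MNIST images are greyscale);
  • 28 and 28 are the height and width of each image, in pixels.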
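
To make this shape concrete, here is a minimal sketch of how such a batch can be assembled. The images below are random stand-ins (an assumption for illustration, not real MNIST digits); only the shapes matter here.

```python
import torch

# Random stand-ins for MNIST images: each one is a tensor of shape (1, 28, 28),
# i.e. 1 channel (greyscale) and 28 x 28 pixels.
images = [torch.rand(1, 28, 28) for _ in range(5)]

# Stacking along a new first dimension builds the batch.
batch = torch.stack(images, dim=0)
print(batch.size())  # torch.Size([5, 1, 28, 28])

# A single image needs an explicit batch dimension before it is given to the model.
single_image_batch = images[0].unsqueeze(0)
print(single_image_batch.size())  # torch.Size([1, 1, 28, 28])
```

Here N_batch is 5 for the stacked batch and 1 for the single image.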

What is the output of our model?

Output Shape

As explained before, for each image, we expect our model to output an accurate reconstruction of that image.

So, when we give as input to the model N_batch images, we expect our model to return N_batch images.

Thus, if the input is of shape torch.Size([N_batch, 1, 28, 28]), the output should be of shape torch.Size([N_batch, 1, 28, 28]).

Which loss function should we consider?

Ideally, if our model were perfect, it would reconstruct exactly the image it is given as input.

So, given an image I, we intend to minimise the distance between I and its reconstruction I' from the auto-encoder.

In PyTorch, the loss torch.nn.MSELoss calculates the mean squared error between two tensors (in this case: between two batches of images). This is the loss we will use to compare original images with their reconstructions.

In the end, by minimising the mean squared error loss, we intend to provide more accurate reconstructions.
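
As a quick illustration of what this loss computes, the sketch below compares torch.nn.MSELoss with a manual computation. The two batches are random stand-ins (an assumption for illustration); with its default settings, the loss averages the squared differences over all elements of the batch.

```python
import torch

torch.manual_seed(0)

# Random stand-ins for a batch of original images and their reconstructions.
images = torch.rand(4, 1, 28, 28)
reconstructions = torch.rand(4, 1, 28, 28)

loss_function = torch.nn.MSELoss()
loss = loss_function(reconstructions, images)

# Same quantity computed by hand: mean of the squared differences over all elements.
manual_loss = ((reconstructions - images) ** 2).mean()
print(loss.item(), manual_loss.item())

# A perfect reconstruction gives a loss of exactly zero.
print(loss_function(images, images).item())  # 0.0
```

The result is a single scalar for the whole batch, which is what we will later ask the optimiser to minimise.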

And in PyTorch?

We will use the same structure as before with a slight difference: we would like to have a flexible number of dimensions for our encoding/feature space.

So we add a parameter feature_space_dimensionality to our __init__.

Also, in this example, we may be interested in the values of the encodings, so we add a method get_encoding():

import torch

class MNISTAutoEncoder(torch.nn.Module):
    def __init__(self, feature_space_dimensionality):
        super().__init__()
        ...

    def forward(self, tensor_images):
        """
        Args:
            tensor_images: tensor of shape (N_batch, 1, 28, 28)
        """
        ...

    def get_encoding(self, tensor_images):
        """
        Args:
            tensor_images: tensor of shape (N_batch, 1, 28, 28)
        """
        ...

We now have all the tools to fill in this module and adapt it to our problem!

The __init__ method

As before, here we define all our layers.

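Both the encoder and the decoder consist of 3 Linear layers, with hidden layers of size 64. The encoder maps each flattened image (1 * 28 * 28 = 784 values) to feature_space_dimensionality values, and the decoder maps those values back to 784.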

    def __init__(self, feature_space_dimensionality):
        super().__init__()
        self.linear_encoder_1 = torch.nn.Linear(in_features=1 * 28 * 28, out_features=64)
        self.linear_encoder_2 = torch.nn.Linear(in_features=64, out_features=64)
        self.linear_encoder_final = torch.nn.Linear(in_features=64, out_features=feature_space_dimensionality)

        self.linear_decoder_1 = torch.nn.Linear(in_features=feature_space_dimensionality, out_features=64)
        self.linear_decoder_2 = torch.nn.Linear(in_features=64, out_features=64)
        self.linear_decoder_final = torch.nn.Linear(in_features=64, out_features=1 * 28 * 28)
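
To get a feel for the size of this model, the sketch below counts the learnable parameters of each layer. The value 16 for feature_space_dimensionality and the helper count_parameters are assumptions made for illustration; each Linear layer holds in_features * out_features weights plus out_features biases.

```python
import torch

feature_space_dimensionality = 16  # arbitrary value, chosen only for this illustration

# Same layer sizes as in __init__, created outside of a module for inspection.
layers = {
    "linear_encoder_1": torch.nn.Linear(1 * 28 * 28, 64),
    "linear_encoder_2": torch.nn.Linear(64, 64),
    "linear_encoder_final": torch.nn.Linear(64, feature_space_dimensionality),
    "linear_decoder_1": torch.nn.Linear(feature_space_dimensionality, 64),
    "linear_decoder_2": torch.nn.Linear(64, 64),
    "linear_decoder_final": torch.nn.Linear(64, 1 * 28 * 28),
}


def count_parameters(layer):
    """Hypothetical helper: number of learnable values (weights and biases) in a layer."""
    return sum(parameter.numel() for parameter in layer.parameters())


for name, layer in layers.items():
    print(name, count_parameters(layer))

total = sum(count_parameters(layer) for layer in layers.values())
print("total", total)  # total 111648
```

Most of the parameters sit in the first encoder layer and the last decoder layer, because these are the two layers connected to the 784 pixels.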

The forward method

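There is one subtlety here. We said that our model should output reconstructions of the original images, so the output should be of shape (N_batch, 1, 28, 28). However, Linear layers operate on flat vectors. We therefore flatten each image into a vector of 1 * 28 * 28 = 784 values before the encoder, and we put the output of the decoder back into the shape (N_batch, 1, 28, 28) at the end.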

    def forward(self, tensor_images):
        """
        Args:
            tensor_images: tensor of shape (N_batch, 1, 28, 28)
        """

        # Flattening images
        x = tensor_images.view(-1, 1 * 28 * 28)

        # Encoder --------------------

        # Encoder - Layer 1
        x = self.linear_encoder_1(x)
        x = torch.relu(x)

        # Encoder - Layer 2
        x = self.linear_encoder_2(x)
        x = torch.relu(x)

        # Encoder - Final Layer
        encoding = self.linear_encoder_final(x)
        # Note that there is no activation function here

        # Decoder --------------------

        # Decoder - Layer 1
        x = self.linear_decoder_1(encoding)
        x = torch.relu(x)

        # Decoder - Layer 2
        x = self.linear_decoder_2(x)
        x = torch.relu(x)

        # Decoder - Final Layer
        reconstruction = self.linear_decoder_final(x)

        # Putting reconstruction on the right shape (as original images)
        reconstruction = reconstruction.view(-1, 1, 28, 28)

        return reconstruction
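
The two calls to view in forward only change the shape of the tensor, not its values. The following sketch checks this round trip on a small stand-in batch (the tensor contents are arbitrary and chosen only so that every value is distinct).

```python
import torch

# Stand-in batch of 3 "images" whose values are all different.
batch = torch.arange(3 * 1 * 28 * 28, dtype=torch.float32).view(3, 1, 28, 28)

# Flattening: -1 lets PyTorch infer the batch size from the number of elements.
flat = batch.view(-1, 1 * 28 * 28)
print(flat.size())  # torch.Size([3, 784])

# Putting the data back into the shape of the original images.
restored = flat.view(-1, 1, 28, 28)
print(restored.size())  # torch.Size([3, 1, 28, 28])

# No value has been modified or moved to another image.
print(torch.equal(restored, batch))  # True
```

Note that view requires a compatible memory layout; if it raises an error on a tensor that is not contiguous, torch.reshape can be used in its place.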

get_encoding

get_encoding simply corresponds to the encoder operations of the forward method.

    def get_encoding(self, tensor_images):

        # Flattening images
        x = tensor_images.view(-1, 1 * 28 * 28)

        # Encoder --------------------

        # Encoder - Layer 1
        x = self.linear_encoder_1(x)
        x = torch.relu(x)

        # Encoder - Layer 2
        x = self.linear_encoder_2(x)
        x = torch.relu(x)

        # Encoder - Final Layer
        encoding = self.linear_encoder_final(x)
        # Note that there is no activation function here

        return encoding
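
Since get_encoding repeats the encoder operations of forward, one possible alternative is to group the layers with torch.nn.Sequential and let forward call get_encoding. The class below is only a sketch of that design choice, not the module used in the rest of this chapter: the name CompactMNISTAutoEncoder and the value 16 are assumptions made for illustration, and the layer sizes are the same as above.

```python
import torch


class CompactMNISTAutoEncoder(torch.nn.Module):
    """Hypothetical variant of MNISTAutoEncoder with the same layer sizes."""

    def __init__(self, feature_space_dimensionality):
        super().__init__()
        self.encoder = torch.nn.Sequential(
            torch.nn.Flatten(),  # (N_batch, 1, 28, 28) -> (N_batch, 784)
            torch.nn.Linear(1 * 28 * 28, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, feature_space_dimensionality),  # no activation here
        )
        self.decoder = torch.nn.Sequential(
            torch.nn.Linear(feature_space_dimensionality, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, 1 * 28 * 28),
            torch.nn.Unflatten(1, (1, 28, 28)),  # (N_batch, 784) -> (N_batch, 1, 28, 28)
        )

    def get_encoding(self, tensor_images):
        return self.encoder(tensor_images)

    def forward(self, tensor_images):
        # The encoder operations are written once and reused here.
        return self.decoder(self.get_encoding(tensor_images))


model = CompactMNISTAutoEncoder(feature_space_dimensionality=16)
images = torch.randn(8, 1, 28, 28)
encoding = model.get_encoding(images)
reconstruction = model(images)
print(encoding.size())        # torch.Size([8, 16])
print(reconstruction.size())  # torch.Size([8, 1, 28, 28])
```

The encoding has one row per image and feature_space_dimensionality columns, while the reconstruction has the shape of the input batch.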

In Summary:

class MNISTAutoEncoder(torch.nn.Module):
    def __init__(self, feature_space_dimensionality):
        super().__init__()

        self.linear_encoder_1 = torch.nn.Linear(in_features=1 * 28 * 28, out_features=64)
        self.linear_encoder_2 = torch.nn.Linear(in_features=64, out_features=64)
        self.linear_encoder_final = torch.nn.Linear(in_features=64, out_features=feature_space_dimensionality)

        self.linear_decoder_1 = torch.nn.Linear(in_features=feature_space_dimensionality, out_features=64)
        self.linear_decoder_2 = torch.nn.Linear(in_features=64, out_features=64)
        self.linear_decoder_final = torch.nn.Linear(in_features=64, out_features=1 * 28 * 28)      


    def forward(self, tensor_images):

        # Flattening images
        x = tensor_images.view(-1, 1 * 28 * 28)

        # Encoder --------------------

        # Encoder - Layer 1
        x = self.linear_encoder_1(x)
        x = torch.relu(x)

        # Encoder - Layer 2
        x = self.linear_encoder_2(x)
        x = torch.relu(x)

        # Encoder - Final Layer
        encoding = self.linear_encoder_final(x)
        # Note that there is no activation function here

        # Decoder --------------------

        # Decoder - Layer 1
        x = self.linear_decoder_1(encoding)
        x = torch.relu(x)

        # Decoder - Layer 2
        x = self.linear_decoder_2(x)
        x = torch.relu(x)

        # Decoder - Final Layer
        reconstruction = self.linear_decoder_final(x)

        # Putting reconstruction on the right shape (as original images)
        reconstruction = reconstruction.view(-1, 1, 28, 28)

        return reconstruction

    def get_encoding(self, tensor_images):

        # Flattening images
        x = tensor_images.view(-1, 1 * 28 * 28)

        # Encoder --------------------

        # Encoder - Layer 1
        x = self.linear_encoder_1(x)
        x = torch.relu(x)

        # Encoder - Layer 2
        x = self.linear_encoder_2(x)
        x = torch.relu(x)

        # Encoder - Final Layer
        encoding = self.linear_encoder_final(x)
        # Note that there is no activation function here

        return encoding

Let’s test!

It is always important to test our model by providing random tensors as input. This way, we can check that we did not make any mistake with the shapes.

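import torch

# feature_space_dimensionality is a required argument; 16 is an arbitrary value chosen for this test
mnist_auto_encoder = MNISTAutoEncoder(feature_space_dimensionality=16)
n_batch = 42
random_tensor = torch.randn(size=(n_batch, 1, 28, 28))
print(mnist_auto_encoder(random_tensor).size())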

and we get:

torch.Size([42, 1, 28, 28])

so, the result is of size (n_batch, 1, 28, 28), as expected! \o/