Introduction to Deep Learning with PyTorch
Chapter 7: Building and Training a Simple Classification Model
Training our Classifier
Remember the code we had for training our linear model?
def train_parameters_linear_regression(tensor_number_tasks,
                                       tensor_number_questions,
                                       learning_rate=0.02,
                                       number_training_steps=200):
    """
    Instantiate the ModelNumberQuestions model and the loss, and optimise the
    parameters of the model, given the dataset of tensor_number_tasks and
    tensor_number_questions.

    Args:
        tensor_number_tasks (torch.Tensor): of size (n, 1), where n is the number of data points
        tensor_number_questions (torch.Tensor): of size (n, 1), where n is the number of data points
        learning_rate (float): step size used for each gradient update
        number_training_steps (int): number of gradient updates to perform

    Returns:
        the trained network (ModelNumberQuestions)
    """
    net = ModelNumberQuestions()  # model
    loss = torch.nn.MSELoss()  # loss module
    optimiser = torch.optim.SGD(net.parameters(), lr=learning_rate)
    for _ in range(number_training_steps):
        optimiser.zero_grad()
        # Compute the loss
        estimator_number_questions = net(tensor_number_tasks)
        mse_loss = loss(input=estimator_number_questions,
                        target=tensor_number_questions)
        mse_loss.backward()
        optimiser.step()
        print("loss:", mse_loss.item())
    print("Final Parameters:\n", list(net.named_parameters()))
    return net
Well, the code we have for training our classifier is super similar! But we will make some slight adjustments:
Loss
Previously, we said that we would use a cross-entropy loss.
cross_entropy_loss = torch.nn.CrossEntropyLoss()
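One detail worth knowing: torch.nn.CrossEntropyLoss applies a (log-)softmax internally, so the model should output raw, unnormalised scores (logits), and the targets should be integer class indices rather than one-hot vectors. A minimal sketch with made-up tensors:

import torch

cross_entropy_loss = torch.nn.CrossEntropyLoss()

logits = torch.randn(4, 10)          # a batch of 4 images, 10 classes: raw scores, no softmax
labels = torch.tensor([3, 7, 0, 1])  # integer class indices, not one-hot vectors
print(cross_entropy_loss(logits, labels))  # a scalar tensor, averaged over the batch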
Which Optimiser should we use?
So far, we have mostly relied on the Stochastic Gradient Descent (SGD) optimiser. Here, we use the much more common Adam optimiser:
optimiser = torch.optim.Adam(net.parameters())
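Unlike plain SGD, Adam keeps running estimates of the first and second moments of each parameter's gradient and adapts the step size per parameter, so its default learning rate of 1e-3 is often a reasonable starting point. If you want to set it explicitly, it is just a keyword argument:

optimiser = torch.optim.Adam(net.parameters(), lr=1e-3)  # 1e-3 is also the default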
Training Loop
In the end, our training code looks like this (we have refactored it slightly):
def train_classifier(mnist_classifier, loss, optimiser,
                     dataset_images, dataset_labels, number_training_steps):
    for _ in range(number_training_steps):
        optimiser.zero_grad()
        # Compute the loss
        estimator_labels = mnist_classifier(dataset_images)
        value_loss = loss(input=estimator_labels,
                          target=dataset_labels)
        value_loss.backward()
        optimiser.step()
        print("loss:", value_loss.item())
And we declare our loss, model, and optimiser outside the function:
mnist_classifier = MNISTClassifier()
loss = torch.nn.CrossEntropyLoss()
optimiser = torch.optim.Adam(mnist_classifier.parameters())
Then, we just need a simple call to train_classifier and the work is done! :D
train_classifier(mnist_classifier,
                 loss,
                 optimiser,
                 dataset_training_images,
                 dataset_training_labels,
                 number_training_steps=500)
However, passing the entire dataset through the model at every step may be:
- computationally expensive, as every gradient update requires a forward and backward pass over the whole dataset.
- memory inefficient, as we need to load the entire dataset into memory at once.
That is why it is more common to divide our dataset into batches, as sketched below.
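As a preview, here is a minimal sketch of mini-batch training using PyTorch's torch.utils.data utilities, reusing the model, loss, and optimiser declared above (the batch size of 64 is an arbitrary choice, and one pass over the loader is one epoch):

from torch.utils.data import TensorDataset, DataLoader

# Wrap the image and label tensors into a Dataset, and let the DataLoader
# cut it into shuffled mini-batches for us.
dataset_training = TensorDataset(dataset_training_images, dataset_training_labels)
loader_training = DataLoader(dataset_training, batch_size=64, shuffle=True)

for batch_images, batch_labels in loader_training:
    optimiser.zero_grad()
    value_loss = loss(mnist_classifier(batch_images), batch_labels)
    value_loss.backward()
    optimiser.step()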