This is an archived version of the course and is no longer updated. Please find the latest version of the course on the main webpage.

Practical examples

I think that is more than enough NumPy for you to digest for now!

I hope you have not fallen asleep yet. Can’t blame you! I’m falling asleep writing these!

Let’s try to make things more interesting with some practical examples of how you might end up using all these fancy NumPy features in Machine Learning.

Computing accuracy for evaluating classification

Let us look at one example. You have seen (or will see) me discuss Machine Learning evaluation briefly in your Introduction to Machine Learning lecture this week.

For a classification task, we might have to compute an evaluation metric called \(accuracy\).

\[accuracy = \frac{| correct |}{| instances |}\]

Let’s say we have built a classifier for three classes: apple, orange, pear. And our classifier has predicted the output for a set of test instances. Say we have compared the output of each test instance with the correct label, and have put them in a confusion matrix below.

         apple  orange  pear
apple       18       3     9
orange       1      30     4
pear         7       5    23

Let’s assume the rows represent the correct label and the columns represent the predicted classes.

So according to the confusion matrix, 18 instances of apples are correctly predicted as apples by your classifier. 3 instances of apples are misclassified as oranges, and 9 instances of apples are misclassified as pears.

Similarly, 1 orange was misclassified as an apple, 4 were misclassified as pears, and the remaining 30 were correctly classified.

The confusion matrix can be represented as a NumPy array as below:

import numpy as np

x = np.array([[18, 3, 9], [1, 30, 4], [7, 5, 23]])

We can compute the number of instances for each class from the matrix.

class_distribution = np.sum(x, axis=1)  ## [30 35 35]

We can also compute the total number of test instances.

total_instances = np.sum(x)  ## 100

We can also compute the proportion of instances per class (as a fraction of the total).

class_percentage = class_distribution / total_instances  ## [0.3  0.35 0.35]

# or...
class_percentage = np.sum(x, axis=1) / np.sum(x)

# or with a more OOP notation...
class_percentage = x.sum(axis=1) / x.sum()

Now, to compute the accuracy, we need to know how many instances are correctly predicted overall. We can take advantage of the fact that the correct predictions are in the diagonal of the confusion matrix, i.e. 18, 30 and 23. So we extract the diagonal from the matrix.

correct_perclass = np.diagonal(x)  ## [18 30 23]

We can then sum up the number of correct predictions per class to get the total number of correct predictions.

total_correct = np.sum(correct_perclass)  ## 71

The two steps above can be combined into a single line if you so desire.

total_correct = np.sum(np.diagonal(x))  ## 71

## or if you prefer an object-oriented approach...
total_correct = x.diagonal().sum()

Now, you can compute the accuracy!

accuracy = total_correct / total_instances

Of course, all of this could have been done in a single line if you like.

accuracy = np.sum(np.diagonal(x)) / np.sum(x)
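Putting all the steps above together, here is a minimal self-contained sketch using the confusion matrix from this example:

```python
import numpy as np

# Confusion matrix: rows = correct labels, columns = predicted labels
# (order: apple, orange, pear)
x = np.array([[18, 3, 9],
              [1, 30, 4],
              [7, 5, 23]])

total_correct = x.diagonal().sum()   # correct predictions are on the diagonal: 71
total_instances = x.sum()            # all test instances: 100
accuracy = total_correct / total_instances

print(accuracy)  # 0.71
```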

Computing Mean Squared Error for evaluating regression

Another good example would be to compute the mean squared error for regression, as briefly shown in the Introduction to Machine Learning lectures. Fortunately, the official NumPy tutorial already has this example covered, so there is no point in my reproducing it just for the sake of it (lazy!). So please just look at the example on the official tutorial to see how you would implement this.
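For a flavour of what that looks like, here is a minimal sketch of computing the mean squared error with NumPy. The arrays `y_true` and `y_pred` and their values are made up purely for illustration; the official tutorial's version may be structured differently.

```python
import numpy as np

# Hypothetical example values: correct targets and model predictions
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

# MSE = mean of the squared differences between predictions and targets
mse = np.mean((y_true - y_pred) ** 2)

print(mse)  # 0.875
```

The whole computation is one vectorised expression: the subtraction, squaring, and averaging all operate element-wise over the arrays, just like the accuracy computation above.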