Chapter 6: NumPy functions

Exercise - Computing class distribution

Josiah Wang

Let’s try to make things more interesting by applying all these fancy NumPy functions to a Machine Learning problem.

Let us look at one example. You have seen (or will see) me briefly discuss Machine Learning evaluation in your first Introduction to Machine Learning lecture.

Let’s say we have built a classifier for three classes: apple, orange, and pear. Our classifier has predicted a class for each instance in a test set. Say we have compared each prediction with the correct label and recorded the counts in the confusion matrix below.

          apple   orange   pear
apple        18        3      9
orange        1       30      4
pear          7        5     23

Let’s assume the rows represent the correct label and the columns represent the predicted classes.

So, according to the confusion matrix, 18 apples are correctly predicted as apples by our classifier, 3 apples are misclassified as oranges, and 9 apples are misclassified as pears.

Similarly, 1 orange is misclassified as an apple, 4 oranges are misclassified as pears, and the remaining 30 oranges are correctly classified.

The confusion matrix can be represented as a NumPy array as below:

x = np.array([[18, 3, 9], [1, 30, 4], [7, 5, 23]])
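To connect the array back to the table, here is a small sketch of how the counts described above can be read off by indexing. It assumes the index order 0 = apple, 1 = orange, 2 = pear (matching the table), and the variable names are only illustrative.

```python
import numpy as np

# Confusion matrix from the text: rows = correct label, columns = predicted class.
x = np.array([[18, 3, 9], [1, 30, 4], [7, 5, 23]])

# Assumed index order, matching the table: 0 = apple, 1 = orange, 2 = pear.
apples_as_apples = x[0, 0]    # correct label apple, predicted apple
apples_as_oranges = x[0, 1]   # correct label apple, predicted orange
oranges_as_apples = x[1, 0]   # correct label orange, predicted apple

# The diagonal holds the correctly classified count for each class.
correct_per_class = np.diag(x)

print(apples_as_apples, apples_as_oranges, oranges_as_apples)  # 18 3 1
print(correct_per_class)                                       # [18 30 23]
```

Note that x[0, 1] and x[1, 0] are different quantities: the first index is always the correct label and the second is the prediction.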

Your task

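Use NumPy to compute the class distribution of the dataset. That is, compute what proportion of the test instances are actually apples, oranges, and pears respectively, based on their correct labels. Express each proportion as a fraction between 0 and 1 (so 30% is 0.3).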

import numpy as np

x = np.array([[18, 3, 9], [1, 30, 4], [7, 5, 23]])
distribution = ????
# np.allclose compares floats within a small tolerance, which is safer than ==
assert np.allclose(distribution, np.array([0.3, 0.35, 0.35]))
 

Possible solutions:

distribution = np.sum(x, axis=1) / np.sum(x)
distribution = x.sum(axis=1) / x.sum()
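Why axis=1? Since the rows represent the correct labels, summing across each row gives the number of instances that truly belong to each class, and dividing by the grand total gives the distribution. The sketch below spells out the intermediate values. The column-sum comparison and the conversion to percentages are additions for illustration, not part of the exercise.

```python
import numpy as np

x = np.array([[18, 3, 9], [1, 30, 4], [7, 5, 23]])

# Sum across each row (axis=1): number of instances per correct label.
true_counts = x.sum(axis=1)          # [30, 35, 35]
total = x.sum()                      # 100
distribution = true_counts / total   # approximately [0.3, 0.35, 0.35]

# For contrast, summing down each column (axis=0) counts how often
# each class was *predicted*, which is a different distribution.
predicted_counts = x.sum(axis=0)     # [26, 38, 36]
predicted_distribution = predicted_counts / total

# Multiply by 100 if percentages are wanted rather than fractions.
percentages = 100 * distribution

print(true_counts, total)
print(predicted_counts)
```

Using axis=0 by mistake would still produce three numbers that sum to 1, so it is worth checking the result against the row totals of the table.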