Introduction to NumPy and Matplotlib
Chapter 6: NumPy functions
Exercise - Computing class distribution
Let’s try to make things more interesting by applying all these fancy NumPy functions to a Machine Learning problem.
Let us look at one example. You have seen (or will see) me briefly discuss Machine Learning evaluation in your first Introduction to Machine Learning lecture.
Let’s say we have built a classifier for three classes: apple, orange, and pear. Our classifier has predicted the output for a set of test instances. Say we have compared the predicted output for each test instance with its correct label, and have put the counts in the confusion matrix below.
| | apple | orange | pear |
|---|---|---|---|
| apple | 18 | 3 | 9 |
| orange | 1 | 30 | 4 |
| pear | 7 | 5 | 23 |
Let’s assume the rows represent the correct label and the columns represent the predicted classes.
So according to the confusion matrix, 18 apples are correctly predicted as apples by our classifier, 3 apples are misclassified as oranges, and 9 apples are misclassified as pears.
Similarly, 1 orange is misclassified as an apple, 4 are misclassified as pears, and the remaining 30 are correctly classified.
The confusion matrix can be represented as a NumPy array:
x = np.array([[18, 3, 9], [1, 30, 4], [7, 5, 23]])
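To check the row/column convention, individual entries can be read back by index. This is a minimal sketch, assuming the convention stated above (rows are correct labels, columns are predicted classes); the variable names are introduced here purely for illustration.

```python
import numpy as np

x = np.array([[18, 3, 9], [1, 30, 4], [7, 5, 23]])

# x[row, column]: row = correct label, column = predicted class.
# Index 0 is apple, 1 is orange, 2 is pear.
apples_as_apples = x[0, 0]    # 18 apples correctly predicted as apples
apples_as_oranges = x[0, 1]   # 3 apples misclassified as oranges
apples_as_pears = x[0, 2]     # 9 apples misclassified as pears
oranges_as_apples = x[1, 0]   # 1 orange misclassified as an apple

print(apples_as_apples, apples_as_oranges, apples_as_pears, oranges_as_apples)
```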
Your task
Use NumPy to compute the class distribution of the dataset. That is, compute what proportion of the test instances are apples, oranges, and pears respectively, expressed as fractions that sum to 1 (so 30% is written as 0.3).
import numpy as np
x = np.array([[18, 3, 9], [1, 30, 4], [7, 5, 23]])
distribution = ????
# np.allclose avoids relying on exact floating-point equality
assert np.allclose(distribution, np.array([0.3, 0.35, 0.35]))
Possible solutions:
distribution = np.sum(x, axis=1) / np.sum(x)
distribution = x.sum(axis=1) / x.sum()
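Both solutions do the same two things: sum each row to count the instances per correct label, then divide by the total number of instances. The step-by-step sketch below spells this out; the intermediate names `per_class` and `total` are illustrative and not part of the exercise.

```python
import numpy as np

x = np.array([[18, 3, 9], [1, 30, 4], [7, 5, 23]])

# Summing along axis=1 collapses the columns, leaving one total per row,
# i.e. the number of test instances whose correct label is each class.
per_class = x.sum(axis=1)          # array([30, 35, 35])

# Summing with no axis adds up every entry: the size of the test set.
total = x.sum()                    # 100

# Dividing an array by a scalar is applied element-wise.
distribution = per_class / total   # array([0.3, 0.35, 0.35])

assert np.allclose(distribution, [0.3, 0.35, 0.35])
```

Note that the choice of `axis=1` depends on rows holding the correct labels. Summing along `axis=0` would instead count how often each class was predicted.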