This is an archived version of the course. Please find the latest version of the course on the main webpage.

Chapter 3: Understanding your data

How many categories?

Josiah Wang

Question 3: How many (and what) categories does the dataset have?

The next thing to find out is how many categories (or classes) this dataset has, and what they are. Note that “classes” here has nothing to do with OOP classes. It simply means categories such as “cat” and “dog”.

Luckily, scikit-learn also has that covered with the target_names attribute!

>>> categories = dataset.target_names
>>> print(categories)
['setosa' 'versicolor' 'virginica']
>>> print(len(categories))
3

If done correctly, you should see that the Iris dataset comprises three categories: “setosa”, “versicolor”, and “virginica”.
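To see how these names relate to the labels attached to each sample, here is a minimal sketch. It assumes (as is the case for scikit-learn's bundled datasets) that each sample's label is an integer index into target_names. The helper summarise_categories and the toy label list are hypothetical, made up purely for illustration; they are not part of scikit-learn or of the Iris data.

```python
from collections import Counter


def summarise_categories(target_names, targets):
    """Return {category name: number of samples} given integer labels.

    Hypothetical helper: each label is assumed to be an index into target_names.
    """
    counts = Counter(int(label) for label in targets)
    return {str(target_names[label]): counts[label] for label in sorted(counts)}


# Category names as printed above; the labels are a made-up toy sample,
# NOT the real Iris labels.
names = ["setosa", "versicolor", "virginica"]
toy_labels = [0, 0, 1, 2, 2, 2]

summary = summarise_categories(names, toy_labels)
print(summary)       # {'setosa': 2, 'versicolor': 1, 'virginica': 3}
print(len(summary))  # 3
```

With the Iris dataset already loaded, the same helper could be called as summarise_categories(dataset.target_names, dataset.target) to count how many samples fall into each category.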

Figure: Iris setosa. CC BY-SA 3.0.
Figure: Iris versicolor. By D. Gordon E. Robertson, own work, CC BY-SA 3.0.
Figure: Iris virginica. By Eric Hunt, own work, CC BY-SA 4.0.