This is an archived version of the course. Please find the latest version of the course on the main webpage.

Chapter 2: Classification pipeline

The Iris dataset

Josiah Wang

We will use the Iris dataset in our discussion. This classic dataset, published by Ronald Fisher in 1936, is often used for teaching machine learning techniques.

Conveniently, scikit-learn provides a function to access this dataset without having to download it separately.

So let us just load the dataset directly from scikit-learn!

Please follow along, either in an interactive Python session in the terminal or in a Jupyter Notebook, Google Colab, or an equivalent environment.

>>> from sklearn.datasets import load_iris
>>> dataset = load_iris()
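
To check that the dataset loaded correctly, it can help to look at what came back. The following is a minimal sketch, assuming a reasonably recent scikit-learn installation; the variable names X and y are only illustrative and are not part of the course material.

```python
from sklearn.datasets import load_iris

dataset = load_iris()

# load_iris() returns a dictionary-like object whose entries can also be
# read as attributes.
X = dataset.data      # feature matrix: one row per flower, one column per measurement
y = dataset.target    # integer class label for each flower

print(X.shape)                # number of samples and number of features
print(y.shape)                # one label per sample
print(dataset.feature_names)  # names of the four measurements
print(dataset.target_names)   # names of the three iris species
```

If everything is set up correctly, the feature matrix should have 150 rows and 4 columns, with one label per row.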