Introduction to Scikit-learn
Chapter 4: Understanding your features
The Iris dataset ships with pre-processed, numeric features rather than raw data. Therefore, there is no need for an explicit feature encoding step. We can use the features directly.
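As a quick check that the features really are ready to use, here is a minimal sketch that loads the dataset and inspects the feature array. It assumes the data was loaded with load_iris from sklearn.datasets and stored in a variable named dataset, as in the snippets in this chapter; the name features is only illustrative.

```python
from sklearn.datasets import load_iris

# Load the bundled Iris dataset; `dataset` mirrors the name used in this chapter.
dataset = load_iris()

# The features arrive as a numeric array: one row per flower, one column per feature.
features = dataset.data  # `features` is an illustrative name, not part of the API
print(features.shape)    # (150, 4)
print(features.dtype)    # float64
```

Because every column is already a floating-point measurement, the array can be passed to an estimator without any encoding step.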
Now that we have examined the categories, let us look more closely at the features (or attributes) themselves.
So far, we have figured out that there are four features. But…
Question 1: What does each feature represent?
Scikit-learn gives you that information through an attribute aptly called .feature_names.
>>> feature_names = dataset.feature_names
>>> print(feature_names)
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
You should get four features:
- sepal length (cm)
- sepal width (cm)
- petal length (cm)
- petal width (cm)
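To see how these names line up with the actual data, the following sketch pairs each feature name with the corresponding value from the first flower in the dataset. It assumes dataset comes from load_iris; the name first_sample is purely illustrative.

```python
from sklearn.datasets import load_iris

dataset = load_iris()

# Each column of dataset.data corresponds, in order, to one entry of feature_names.
# Pair the names with the measurements of the first flower (row 0).
first_sample = {
    name: float(value)
    for name, value in zip(dataset.feature_names, dataset.data[0])
}

for name, value in first_sample.items():
    print(f"{name}: {value}")
```

The order of the names matches the order of the columns, so the same pairing works for any row.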
If you are botanically challenged like me, then here is a diagram of what sepals and petals are: