Introduction to Scikit-learn
Chapter 4: Understanding your features
Understanding your feature type
Next question…
Question 2: What is the type of the features?
The next thing you should try to figure out is what the data type of each feature is. Are they integers? Floats? Categorical? Strings?
You can, of course, easily check the internal NumPy datatype of x.
>>> print(x.dtype)
float64
You should, however, also check whether any of the features are not really floats but have merely been cast to floats for convenience. For example, some features may actually be integers represented as floats. If so, what to do with them is your design decision (it’s usually fine to keep them as floats). For the Iris dataset, all features are genuinely floats, so there is nothing to worry about.
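One way to run this check is to test, column by column, whether every value has a zero fractional part. The snippet below is only a minimal sketch: the helper name integer_valued_columns and the small toy array are hypothetical and not part of the original example, and the check assumes the features are stored in a 2-D NumPy array with one column per feature.

```python
import numpy as np

def integer_valued_columns(x):
    """Return a boolean array that is True for each column whose
    values are all whole numbers (hypothetical helper, for illustration)."""
    x = np.asarray(x, dtype=float)
    # A value is a whole number when its remainder modulo 1 is zero.
    # NaN entries compare unequal to zero, so they mark a column as False.
    return np.all(np.mod(x, 1) == 0, axis=0)

# Made-up data: column 0 holds counts stored as floats,
# column 1 holds genuinely real-valued measurements.
toy = np.array([[1.0, 5.1],
                [3.0, 4.9],
                [2.0, 4.7]])

mask = integer_valued_columns(toy)
print(mask)
```

Columns flagged True are candidates for being integers in disguise. Whether you then convert them or keep them as floats remains the design decision described above.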