Introduction to Scikit-learn
Chapter 9: Summary
To summarise, performing classification in scikit-learn involves these steps:
- Arrange your data into `x` and `y`
- Split your data with `train_test_split()` into `x_train`, `y_train`, `x_test`, and `y_test` (if not pre-split)
- Choose a model for your `classifier`
- Initialise your model with some hyperparameters
- Fit your classifier with the training examples: `classifier.fit(x_train, y_train)`
- Predict labels for unseen test examples: `classifier.predict(x_test)`
- Evaluate the model performance with a metric from the `sklearn.metrics` package
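
The steps above can be sketched end to end as follows. This is only an illustrative sketch: the Iris dataset, the k-nearest-neighbours classifier, the `n_neighbors=5` setting, the 25% test split, and accuracy as the metric are all assumptions chosen for the example, not choices prescribed by this chapter.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Arrange the data into features x and labels y
# (Iris is an assumed example dataset).
x, y = load_iris(return_X_y=True)

# Split into training and test sets. Note the order of the returned values:
# x_train, x_test, y_train, y_test.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0, stratify=y
)

# Choose a model and initialise it with some hyperparameters
# (k-nearest neighbours with k=5 is an assumed choice).
classifier = KNeighborsClassifier(n_neighbors=5)

# Fit the classifier with the training examples.
classifier.fit(x_train, y_train)

# Predict labels for the unseen test examples.
y_pred = classifier.predict(x_test)

# Evaluate the model with a metric from sklearn.metrics.
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Any other scikit-learn classifier can be swapped in at the initialisation step, since they all share the same `fit()` and `predict()` interface.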
You can also create `Pipeline`s, run cross-validation with `cross_validate()`, and tune your hyperparameters with `GridSearchCV`.
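
Here is a minimal sketch of how these three tools might fit together. The dataset, the `StandardScaler` plus `SVC` pipeline, the step names `"scaler"` and `"svc"`, and the parameter grid are assumptions made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_validate, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed example data, split as in the basic workflow.
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0, stratify=y
)

# A Pipeline chains preprocessing and the classifier into one estimator.
# The step names "scaler" and "svc" are arbitrary labels chosen here.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC()),
])

# cross_validate() fits and scores the pipeline on 5 folds of the training set.
cv_results = cross_validate(pipeline, x_train, y_train, cv=5, scoring="accuracy")
print("Mean CV accuracy:", cv_results["test_score"].mean())

# GridSearchCV tries every combination in the grid using cross-validation.
# Pipeline parameters are addressed as "<step name>__<parameter name>".
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(x_train, y_train)

print("Best hyperparameters:", search.best_params_)

# By default the best pipeline is refitted on the whole training set,
# so it can be scored directly on the held-out test set.
test_accuracy = search.score(x_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Putting the scaler inside the pipeline means it is refitted on the training portion of each fold, so no information from the validation fold leaks into preprocessing.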
And that’s all we will cover with scikit-learn. I urge you to check out the official documentation and explore all the hidden treasures offered by scikit-learn for yourself. The official user guide is also very well written and thorough, and it covers topics beyond classification, such as regression and unsupervised learning.
Hopefully all the training you had on the core Python fundamentals like functions and OOP will also help you easily understand the documentation!
Here is another set of optional exercises for you to play around with scikit-learn a bit more! These are served on Google Colab, so make your own copy and work on your copy!
Otherwise, have fun exploring scikit-learn on your own!