This is an archived version of the course. Please find the latest version of the course on the main webpage.

Chapter 9: Summary

Summary

Josiah Wang, Joe Stacey

In summary, to perform classification in scikit-learn:

  • Arrange your data into x and y
  • Split your data with train_test_split() into x_train, y_train, x_test, and y_test (if not pre-split)
  • Choose a model for your classifier
  • Initialise your model with some hyperparameters
  • Fit your classifier with the training examples: classifier.fit(x_train, y_train)
  • Predict labels for unseen test examples: classifier.predict(x_test)
  • Evaluate the model's performance with a metric from the sklearn.metrics module
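The steps above can be sketched end to end as follows. This is a minimal illustration rather than a recommended setup: the Iris dataset, the choice of KNeighborsClassifier, the n_neighbors value, and the 80/20 split are all assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Arrange the data into features x and labels y (Iris is an assumed example dataset)
x, y = load_iris(return_X_y=True)

# Split into training and test sets.
# Note the return order: x_train, x_test, y_train, y_test.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42, stratify=y
)

# Choose a model and initialise it with some hyperparameters (assumed values)
classifier = KNeighborsClassifier(n_neighbors=5)

# Fit the classifier on the training examples
classifier.fit(x_train, y_train)

# Predict labels for the unseen test examples
predictions = classifier.predict(x_test)

# Evaluate with a metric from sklearn.metrics
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.3f}")
```

Any other scikit-learn classifier can be swapped in for KNeighborsClassifier here, since they all share the same fit() and predict() interface.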

You can also chain preprocessing and model steps into a Pipeline, run cross-validation with cross_validate(), and tune your hyperparameters with GridSearchCV.
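As a rough sketch of how these three tools fit together, the example below scales the features and trains a support vector classifier inside a single Pipeline, scores it with 5-fold cross-validation, then searches over a small hyperparameter grid. The dataset, the step names "scaler" and "svc", and the grid values are assumptions chosen purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

x, y = load_iris(return_X_y=True)

# Chain preprocessing and the classifier so both are fitted on each training fold only
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("svc", SVC()),
])

# 5-fold cross-validation of the pipeline with its default hyperparameters
cv_results = cross_validate(pipeline, x, y, cv=5, scoring="accuracy")
mean_cv_accuracy = cv_results["test_score"].mean()
print(f"Mean CV accuracy: {mean_cv_accuracy:.3f}")

# Hyperparameters of a pipeline step are addressed as <step name>__<parameter>
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
}

# Exhaustively try every combination in the grid, using 5-fold CV for each
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(x, y)

print("Best hyperparameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```

Wrapping the scaler in the pipeline means it is refitted inside every cross-validation fold, which avoids leaking information from the validation data into the preprocessing step.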

And that’s all we will cover with scikit-learn. I urge you to check out the official documentation and explore all the hidden treasures offered by scikit-learn for yourself. The official user guide is also very well written and thorough, and it covers topics beyond classification, such as regression and unsupervised learning.

Hopefully, all the training you have had in core Python fundamentals such as functions and OOP will help you understand the documentation easily!

Here is another set of optional exercises for you to play around with scikit-learn a bit more! These are served on Google Colab, so make your own copy and work on your copy!

Otherwise, have fun exploring scikit-learn on your own!