This is an archived version of the course. Please find the latest version of the course on the main webpage.

Chapter 1: Introduction

Scikit-learn

Josiah Wang

As mentioned, scikit-learn is a machine learning library, with many different models/algorithms already implemented for you.

Scikit-learn focusses on modelling data. It leaves loading, handling, manipulating and visualising data to the other libraries.

Here are some examples of what scikit-learn provides:

  • Datasets
  • Supervised models
  • Feature selection
  • Clustering of unlabelled data
  • Dataset transformation (preprocessing, feature extraction, normalization, dimensionality reduction, etc.)
  • Model selection and evaluation (cross validation, hyperparameter tuning, classification metrics, etc.)
  • Ensemble methods (Boosting, Bagging, Random Forest, etc.)
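
To make the list above a bit more concrete, here is a minimal sketch that touches several of these pieces in one go: a built-in dataset, a train/test split, a preprocessing step, a supervised model and an evaluation metric. The choice of the Iris dataset, the k-nearest neighbours classifier and the parameter values (`test_size=0.2`, `n_neighbors=5`, `random_state=0`) are assumptions made purely for illustration, not something this course prescribes.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Datasets: a small built-in dataset (150 samples, 4 features).
X, y = load_iris(return_X_y=True)

# Model selection: hold out 20% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Dataset transformation: standardise the features,
# using statistics computed from the training set only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Supervised model: a k-nearest neighbours classifier (illustrative choice).
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train_scaled, y_train)

# Evaluation: fraction of test samples classified correctly.
accuracy = accuracy_score(y_test, model.predict(X_test_scaled))
print(f"Test accuracy: {accuracy:.2f}")
```

Each step comes from a different scikit-learn submodule (`datasets`, `model_selection`, `preprocessing`, `neighbors`, `metrics`), which roughly mirrors the categories listed above.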

You can use all the provided ML models as a black box, without really having to understand what goes on inside (remember the word abstraction?). Having said that, it is always good to at least have a high-level intuition of what each model is or what an algorithm does!

Scikit-learn is quite easy to use. Once you understand the basic use and syntax of one model, you can easily switch to another model since they all have a similar interface.
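
As a rough illustration of this shared interface, the sketch below trains three different classifiers with exactly the same code. Only the line that constructs the model changes. The three models, the Iris dataset and the parameter values are arbitrary choices for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Three unrelated algorithms, all exposing fit(), predict() and score().
models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
    KNeighborsClassifier(),
]

scores = {}
for model in models:
    model.fit(X_train, y_train)                  # same call for every model
    predictions = model.predict(X_test)          # same call for every model
    scores[type(model).__name__] = model.score(X_test, y_test)  # mean accuracy

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Swapping in another classifier should only require adding it to the `models` list, as long as it follows the same `fit`/`predict` convention.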

Scikit-learn is also pretty much object-oriented, so hope you have all your OOP knowledge ready and raring to go!
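
Here is a small sketch of what "object-oriented" looks like in practice, using `StandardScaler` as an example: you create an instance, configure it through constructor arguments, and call methods on it. What the object learns from the data is stored as attributes whose names end with an underscore. The tiny two-row dataset is made up for the example.

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.preprocessing import StandardScaler

# A made-up dataset: 2 samples, 2 features.
X = np.array([[1.0, 10.0],
              [3.0, 30.0]])

# Instantiate the class; settings are passed to the constructor.
scaler = StandardScaler(with_mean=True, with_std=True)

# Scikit-learn estimators inherit from a common base class.
print(isinstance(scaler, BaseEstimator))

# The settings can be inspected on the object.
params = scaler.get_params()
print(params["with_mean"])

# fit() learns from the data and returns the object itself.
fitted = scaler.fit(X)
print(fitted is scaler)

# Learned values are stored as attributes ending in "_".
print(scaler.mean_)  # per-feature means of X
```

Models follow the same pattern: hyperparameters go into the constructor, and `fit()` stores whatever has been learned on the instance.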