This is an archived version of the course and is no longer updated. Please find the latest version of the course on the main webpage.

Introduction

Machine Learning (ML) will be central throughout your AI degree, whether you like it or not!

In this module, we will look at a machine learning library for Python called scikit-learn.

This module is not meant to teach you machine learning itself (there is the separate Introduction to Machine Learning module for that!). It is an introduction to using a Python library to perform machine learning.

Scikit-learn is built on NumPy and SciPy, and it works well alongside Matplotlib and Pandas (this is why I introduced NumPy, Matplotlib and Pandas first).

The name comes from SciPy Toolkit (SciKit), because scikit-learn started out as a third-party extension to SciPy. While we are not covering SciPy in this course, it is essentially a library on top of NumPy that provides convenient classes and functions for scientific computing, such as linear algebra, optimisation, and statistics.

Scikit-learn focuses on modelling data, and leaves the loading, handling, manipulation and visualisation of data to the other libraries.

Here are some examples of what scikit-learn provides:

  • Datasets
  • Supervised models
  • Feature selection
  • Clustering of unlabeled data
  • Dataset transformation (preprocessing, feature extraction, normalization, dimensionality reduction, etc.)
  • Model selection and evaluation (cross validation, hyperparameter tuning, classification metrics, etc.)
  • Ensemble methods (Boosting, Bagging, Random Forest, etc.)
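To make a few of these categories concrete, here is a minimal sketch that touches a built-in dataset, a train/test split, a preprocessing step, a supervised model and an evaluation metric. The choice of the Iris dataset, logistic regression, the 80/20 split and the `random_state` value are my own assumptions for illustration; any other dataset or classifier would do just as well.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Datasets: load a small built-in dataset as NumPy arrays.
X, y = load_iris(return_X_y=True)

# Model selection: hold out 20% of the samples for evaluation
# (the split ratio and random_state are arbitrary illustrative choices).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Dataset transformation: learn the scaling from the training data only,
# then apply the same scaling to the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Supervised model: fit a classifier on the scaled training data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)

# Evaluation: compare predictions against the held-out labels.
y_pred = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Notice that the data arrives as plain NumPy arrays, which is why being comfortable with NumPy helps before picking up scikit-learn.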

You can use all the provided ML models as a black box, without really having to understand what goes on inside (remember the word abstraction?). Having said that, it is always good to have at least a high-level intuition of what each model is and what each algorithm does!

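Scikit-learn is quite easy to use because its API is consistent across models. Once you understand the basic use and syntax of one model, you can easily switch to another, since they all share a similar interface: you typically create the model, fit it to your training data, and then use it to predict.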
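As a rough illustration of this shared interface, the sketch below trains three different classifiers with exactly the same `fit` and `score` calls. The particular models, the Iris dataset and the parameter values are assumptions picked for the example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Three unrelated algorithms, all exposing the same methods.
models = {
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "support vector machine": SVC(),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                  # same call for every model
    scores[name] = model.score(X_test, y_test)   # mean accuracy on the test set
    print(f"{name}: {scores[name]:.2f}")
```

Swapping one model for another only changes the line that constructs it; the rest of the code stays the same.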