This is an archived version of the course. Please find the latest version of the course on the main webpage.

Chapter 2: Classification pipeline

Pipeline in scikit-learn

Josiah Wang

In scikit-learn, the classification pipeline is exactly the same:

  • Arrange data into \mathbf{X} and \mathbf{y}
  • Choose your model
  • Initialise your model with some hyperparameters
  • Fit your model to \mathbf{X} and \mathbf{y}
  • Predict labels \hat{\mathbf{y}}^{test} for \mathbf{X}^{test}
  • Evaluate the model performance by comparing \hat{\mathbf{y}}^{test} against \mathbf{y}^{test}
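
As a rough preview, the six steps above might look like the following in code. This is only a minimal sketch under some assumptions of mine: the Iris dataset, the k-nearest neighbours classifier, the 80/20 train/test split, and the choice of accuracy as the metric are illustrative picks, not necessarily the ones used in the rest of this chapter.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Step 1: arrange the data into X (features) and y (labels).
# The Iris dataset is an assumption made for illustration.
X, y = load_iris(return_X_y=True)

# Hold out 20% of the examples as a test set (an illustrative split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Steps 2 and 3: choose a model and initialise it with some hyperparameters.
# k-NN with k=5 is an arbitrary choice for this sketch.
model = KNeighborsClassifier(n_neighbors=5)

# Step 4: fit the model to the training X and y.
model.fit(X_train, y_train)

# Step 5: predict labels for the test inputs.
y_pred = model.predict(X_test)

# Step 6: evaluate by comparing the predictions against the true test labels.
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```

Swapping in a different classifier should only require changing the import and the line that constructs `model`, since scikit-learn estimators share the same `fit` and `predict` interface.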

I will take you through the whole pipeline, showing step by step how to apply scikit-learn to a classification problem.