Classifiers in Scikit-learn

With the dataset ready, it is time for the main event! Let’s train a classifier.

Before we do that, we first have to choose our model.

Scikit-learn offers implementations of many different classes of classifiers:

  • K Nearest Neighbours:
    • from sklearn.neighbors import KNeighborsClassifier
  • Decision Trees:
    • from sklearn.tree import DecisionTreeClassifier
  • Logistic Regression:
    • from sklearn.linear_model import LogisticRegression
  • Random Forests:
    • from sklearn.ensemble import RandomForestClassifier
  • Naive Bayes:
    • from sklearn.naive_bayes import GaussianNB
  • Support Vector Machines:
    • from sklearn.svm import SVC
  • Multilayer Perceptron (a.k.a. Neural Networks):
    • from sklearn.neural_network import MLPClassifier

Let’s say we choose K Nearest Neighbours, the first classifier you will cover in the Introduction to Machine Learning course.

You just need to import the KNeighborsClassifier class and create a new instance with your chosen hyperparameters, for example the number of neighbours $K$ and the distance metric.

from sklearn.neighbors import KNeighborsClassifier

knn_classifier = KNeighborsClassifier(n_neighbors=5, metric="euclidean")

Then all you need to do is .fit() the classifier to your training dataset. Congratulations, you have trained your model!

knn_classifier.fit(x_train, y_train)
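
In case you are not following along with the dataset from the earlier section, here is a minimal sketch of how x_train and y_train could have been prepared. It assumes the Iris dataset purely for illustration; substitute your own data.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Illustrative only: load Iris and hold out 20% of it as a test set.
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)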

After fitting, you have access to various attributes of the classifier. These attribute names end with a trailing underscore (_). For example, you can check the classes recognised by your classifier with knn_classifier.classes_. The available attributes vary depending on your classification model; see the official documentation for a list of attributes for KNeighborsClassifier.
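
For instance, classes_, effective_metric_, and n_samples_fit_ are all documented attributes of a fitted KNeighborsClassifier (the commented values below assume the illustrative Iris setup above):

print(knn_classifier.classes_)           # e.g. [0 1 2]
print(knn_classifier.effective_metric_)  # e.g. euclidean
print(knn_classifier.n_samples_fit_)     # number of stored training samples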

Here is another example; this time we will train a Decision Tree classifier.

from sklearn.tree import DecisionTreeClassifier

dt_classifier = DecisionTreeClassifier(criterion="entropy", max_depth=5, min_samples_split=5)
dt_classifier.fit(x_train, y_train)

After fitting the model, you will again have access to many different attributes, including a .tree_ attribute that allows you to analyse your resulting decision tree. Again, see the official documentation for DecisionTreeClassifier for more details.
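
As a quick illustration, you can inspect the fitted tree through .tree_, or print its splits with scikit-learn's export_text helper:

from sklearn.tree import export_text

# Plain-text summary of the learned splits.
print(export_text(dt_classifier))

print(dt_classifier.tree_.node_count)  # total number of nodes in the tree
print(dt_classifier.tree_.max_depth)   # actual depth reached after fitting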

Making predictions

Once your model has been fitted, you can perform predictions on new data in just one line.

predictions = knn_classifier.predict(x_test)

print(predictions)
## [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0]

You can also obtain the predicted probability for each class with .predict_proba(). The columns follow the order of knn_classifier.classes_.

probs = knn_classifier.predict_proba(x_test)

print(probs)
## [[0.  1.  0. ]
##  [1.  0.  0. ]
##  [0.  0.  1. ]
##  [0.  1.  0. ]
##  [0.  1.  0. ]
##  [1.  0.  0. ]
##  [0.  1.  0. ]
##  [0.  0.  1. ]
##  [0.  0.6 0.4]
##  [0.  1.  0. ]
##  [0.  0.2 0.8]
##  ...
##  [0.  0.  1. ]
##  [1.  0.  0. ]
##  [1.  0.  0. ]]
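
As a sanity check (a small sketch assuming NumPy is available), the class with the highest probability in each row is exactly the label returned by .predict():

import numpy as np

# Each row's argmax indexes into the classifier's classes_ array,
# recovering the same hard labels that .predict() returns.
recovered = knn_classifier.classes_[np.argmax(probs, axis=1)]
print((recovered == predictions).all())  # True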

Scikit-learn’s implementation of classifiers is also a very good example of OOP polymorphism in action. Your code does not need to care which classifier you use, as long as the classifier implements a .fit() and a .predict() method. This gives you a lot of flexibility: you can swap classifiers without having to change much code, as the sketch below shows.
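
Here is a small sketch of that flexibility, reusing the x_train, x_test, and y_train variables from above. The training and prediction code is identical for every model; only the instances change.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

classifiers = [
    KNeighborsClassifier(n_neighbors=5),
    DecisionTreeClassifier(max_depth=5),
    LogisticRegression(max_iter=1000),
]

# The same two calls work for every classifier: polymorphism in action.
for clf in classifiers:
    clf.fit(x_train, y_train)
    predictions = clf.predict(x_test)
    print(type(clf).__name__, predictions[:5])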