A44: Support Vector Machines (SVMs) vs Logistic Regression — Practice & Comparisons [complete project with code]

A step-by-step tutorial on classifying digits (MNIST), iris flowers and non-linearly separable data, with grid search and cross-validation

Junaid Qazi, PhD
Mar 24, 2022

This article is part of “Data Science from Scratch — Can I to I Can”, a lecture notes book series (click here to get your copy today!).

Click here for the previous article/lecture on “A43: Support Vector Machines (SVMs) — Hands-on [complete project with code]”

💐Click here to FOLLOW ME for new content💐

⚠️ In this lecture, we will work with three different datasets to compare the performance of SVMs and logistic regression: MNIST, iris and circles.

✅ A suggestion: open a new Jupyter notebook and type the code as you read this article. Doing is learning! And yes, “PLEASE read the comments, they are very useful!”

🧘🏻‍♂️ 👉🎯 >> Stay calm and focused! >> 🧘🏻‍♂️ 👉🎯

Welcome back guys!

Practice is the key to mastering any skill. Let’s recall logistic regression and compare its performance with support vector machines. We will also explore the kernel options available in SVMs, comparing them with each other and with plain logistic regression. We are going to work with three different datasets.

  1. MNIST Handwritten digits dataset
    >>1.1: Visualizations
    >>1.2: Machine Learning
    >>1.3: Cross-validating — logistic regression and SVM
    >>1.4: SVM — Hyperparameter tuning and the best model
  2. The iris dataset
    >>2.1: Model training and comparisons
    >>2.2: SVM visualizing kernel effects
  3. The circles data — non-linearly separable data
    >>3.1: Model training and comparisons
    >>3.2: Visualizing kernel effects for circles data

First things first, let’s do the required imports.

# Required imports
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn import svm, linear_model, datasets
from sklearn.model_selection import cross_val_score

# Jupyter magics: inline plots, and retina display for better quality images
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

sns.set(font_scale=1.3)  # setting font size for the whole notebook
sns.set_style("white")   # if you want to set the style

# Ignore warnings (e.g. convergence warnings) to keep the output clean
warnings.filterwarnings('ignore')

It is a good idea to check the versions of the libraries in use, because you might be using different ones!
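The version check itself is not shown here. One simple way is to read each library's `__version__` attribute. Here is a minimal sketch covering the three libraries the modelling code relies on most. matplotlib and seaborn can be added in the same way.

```python
import numpy as np
import pandas as pd
import sklearn

# Collect the installed version of each core library used in this lecture
versions = {lib.__name__: lib.__version__ for lib in (np, pd, sklearn)}
for name, version in versions.items():
    print(f"{name}: {version}")
```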

1. MNIST — Handwritten digits dataset

The MNIST (Modified National Institute of Standards and Technology) database of handwritten digits (28x28 pixel grayscale images) consists of 60,000 training and 10,000 test examples. It is a subset of a larger set available from NIST Special Database 3 (digits written by employees of the United States Census Bureau) and Special Database 1 (digits written by high school students), which contain monochrome images of handwritten digits. The digits have been size-normalized and centered in a fixed-size image.

The MNIST dataset is widely used and well understood. Convolutional Neural Networks (CNNs) are the top-performing deep learning models on it. They achieve classification accuracy above 99%, with error rates between 0.2% and 0.4% on the held-out MNIST test set.

For practice, scikit-learn also ships a small, low-resolution (8x8 pixel) MNIST-style digits dataset. Strictly speaking, it is a copy of the test set of the UCI ML hand-written digits dataset, not a subset of MNIST itself.

Let’s load this sample dataset from scikit-learn datasets and see if we can classify the handwritten digits!
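The loading code is not shown here. A minimal sketch uses `datasets.load_digits()`, which returns a dictionary-like `Bunch` whose keys we can list:

```python
from sklearn import datasets

# Load scikit-learn's bundled 8x8 digits dataset (a dictionary-like Bunch)
digits = datasets.load_digits()

# Available keys include 'data', 'target' and 'images', among others
print(list(digits.keys()))
print(digits.data.shape)    # (n_samples, 64): flattened pixel values
print(digits.images.shape)  # (n_samples, 8, 8): the same pixels as 8x8 matrices
```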

We need the data and target keys to grab the data for our classification project. Feel free to explore the other fields yourself. Notice the 'images' key, which holds the 8x8 matrix for every digit.

# In case you are interested in the data description:
# print(digits.DESCR)

(OPTIONAL) We can create a dataframe using data, target and feature_names, though it is not required. Well, why not!

Let’s get the data into a dataframe, df!
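The dataframe code is not shown here. A minimal sketch follows. It uses `digits.feature_names` for the 64 pixel columns and adds the label as a `target` column, which is the column name the rest of the lecture uses.

```python
import pandas as pd
from sklearn import datasets

digits = datasets.load_digits()

# One column per pixel (names like 'pixel_0_0'), plus the digit label as 'target'
df = pd.DataFrame(digits.data, columns=digits.feature_names)
df["target"] = digits.target

print(df.shape)
print(df.head())
```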

We can see that there are 64 features, one for each pixel value in the 8x8 matrix.

To view an image, we can drop the target column from df, grab a row with iloc, reshape it into an 8x8 matrix and display it with imshow() from matplotlib. Super easy!
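Here is a minimal sketch of the steps just described. `show_digit` is a hypothetical helper name used for illustration. It rebuilds `df` so that the block runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

digits = datasets.load_digits()
df = pd.DataFrame(digits.data, columns=digits.feature_names)
df["target"] = digits.target

def show_digit(frame, index):
    """Hypothetical helper: reshape one row of pixel values to 8x8 and plot it."""
    pixels = frame.drop("target", axis=1).iloc[index].to_numpy()  # 64 values
    image = pixels.reshape(8, 8)                                   # back to 8x8
    fig, ax = plt.subplots(figsize=(3, 3))
    ax.imshow(image)
    ax.set_title(f"Label: {frame['target'].iloc[index]}")
    ax.axis("off")
    return image

image = show_digit(df, 1)
plt.show()
```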

👉 Want to revise your matplotlib skills? Explore the previous lectures: A14 on matplotlib essentials and A14 on advanced matplotlib!

1.1: Visualizations

Let’s visualize a few observations at arbitrary indices, e.g. 1, 5 or -2, to see what an 8x8 image looks like in the MNIST-style digits dataset!

Let’s view these images in grayscale. Just pass cmap!
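The plotting code is not shown here. A minimal sketch follows. It reads the ready-made 8x8 matrices from `digits.images` rather than reshaping rows of `df`, and passes `cmap="gray"` to render them in grayscale.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# Plot the digits at a few indices side by side, in grayscale
indices = [1, 5, -2]
fig, axes = plt.subplots(1, len(indices), figsize=(9, 3))
for ax, idx in zip(axes, indices):
    ax.imshow(digits.images[idx], cmap="gray")
    ax.set_title(f"index {idx}, label {digits.target[idx]}")
    ax.axis("off")
plt.tight_layout()
plt.show()
```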

We don’t really need much EDA here, because these are just digits from 0 to 9. The data is already size-normalized and centered in a fixed-size image, there is no missing data (try .info() to confirm), and the target classes are well balanced.
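Besides `.info()`, you can check both claims directly. Here is a minimal sketch that counts missing values and shows how many samples each digit class has:

```python
import pandas as pd
from sklearn import datasets

digits = datasets.load_digits()
df = pd.DataFrame(digits.data, columns=digits.feature_names)
df["target"] = digits.target

# Total number of missing values across all columns
missing = int(df.isnull().sum().sum())
print("Missing values:", missing)

# Number of samples per digit class, to check the class balance
counts = df["target"].value_counts().sort_index()
print(counts)
```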

1.2: Machine Learning

Let’s move on and separate the features from the target. We can then train logistic regression and SVM classifiers and compare their performance.

The code below separates the features (the 8x8=64 columns, one for each pixel) into digits_X and the target (the digit) into digits_y.

digits_X, digits_y = df.drop('target', axis=1), df.target
# digits_X, digits_y = mnist.data, mnist.target

1.3: Cross-validating logistic regression and SVM on the data

We can start with the default settings!

Want a refresher on cross-validation? Click here!
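The cross-validation code is not shown here. Here is a minimal sketch with default parameters for both models. `cv=5` is an assumption that matches the 5-fold search used later. With default settings, `LogisticRegression` may raise a ConvergenceWarning, which the imports at the top silence.

```python
from sklearn import datasets, linear_model, svm
from sklearn.model_selection import cross_val_score

digits = datasets.load_digits()
digits_X, digits_y = digits.data, digits.target

# 5-fold cross-validated accuracy with default settings for both models
lr_scores = cross_val_score(linear_model.LogisticRegression(), digits_X, digits_y, cv=5)
svm_scores = cross_val_score(svm.SVC(), digits_X, digits_y, cv=5)

print(f"Logistic regression: {lr_scores.mean():.3f} +/- {lr_scores.std():.3f}")
print(f"SVM (default rbf):   {svm_scores.mean():.3f} +/- {svm_scores.std():.3f}")
```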

We ran logistic regression with default parameters for comparison only, and it is not bad. However, the SVM clearly performed better. What do you think?

Can we improve the model further by tuning it and finding the best set of parameters with grid search? (Want a refresher on grid search?)

Let’s try grid search for SVMs!

1.4: SVM — Hyperparameter tuning and the best model

Finding the best set of parameters with grid search: a quick review from the previous lectures!

👉 gamma

Intuitively, the gamma parameter defines how far the influence of a single training example reaches, with low values meaning “far” and high values meaning “close”.

The higher the value of gamma, the more closely the model tries to fit the training data, which can cause over-fitting.

  • Small gamma: The model is constrained and can under-fit! It has high bias and low variance.
  • Large gamma: The model tries to capture the shape too well: it can over-fit! It has low bias and high variance.
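The rbf kernel is K(x, x') = exp(-gamma * ||x - x'||^2). A larger gamma therefore makes each training point's influence fade over a shorter distance. Here is a minimal sketch of the resulting under-fit/over-fit behaviour, using scikit-learn's `validation_curve`. The gamma values and C=1 are illustrative choices, not the lecture's settings.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import validation_curve

digits = datasets.load_digits()
X, y = digits.data, digits.target

# Sweep gamma for an rbf SVM and record training vs cross-validated accuracy
gammas = np.array([1e-5, 1e-4, 1e-3, 1e-2, 1e-1])
train_scores, cv_scores = validation_curve(
    svm.SVC(kernel="rbf", C=1.0), X, y,
    param_name="gamma", param_range=gammas, cv=5,
)
train_mean = train_scores.mean(axis=1)
cv_mean = cv_scores.mean(axis=1)

for g, tr, cv in zip(gammas, train_mean, cv_mean):
    print(f"gamma={g:g}: train={tr:.3f}  cv={cv:.3f}")
```

A large gap between training and cross-validated accuracy at the largest gamma is the over-fitting the bullets describe.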

👉 C

C is the penalty parameter of the error term. It controls the trade-off between a smooth decision boundary and classifying the training points correctly. C can be thought of as the parameter of the soft-margin cost function, which controls the influence of each individual support vector.

  • Small C: makes the decision surface smooth and simple. A softer margin can under-fit! Gives high bias and low variance.
  • Large C: tries to classify every training point correctly, giving a harder margin. It can over-fit! Gives low bias and high variance.
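A similar sweep shows the effect of C. In this minimal sketch, gamma is left at scikit-learn's default (`"scale"`), and the C values are illustrative choices, not the lecture's.

```python
import numpy as np
from sklearn import datasets, svm
from sklearn.model_selection import validation_curve

digits = datasets.load_digits()
X, y = digits.data, digits.target

# Sweep C for an rbf SVM (default gamma) and compare train vs CV accuracy
Cs = np.array([0.001, 0.01, 0.1, 1, 10, 100])
train_scores, cv_scores = validation_curve(
    svm.SVC(kernel="rbf"), X, y, param_name="C", param_range=Cs, cv=5,
)
train_mean = train_scores.mean(axis=1)
cv_mean = cv_scores.mean(axis=1)

for c, tr, cv in zip(Cs, train_mean, cv_mean):
    print(f"C={c:g}: train={tr:.3f}  cv={cv:.3f}")
```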

Along with gamma and C, we can also search for the best kernel in the grid search, using 5-fold cross-validation (cv=5)!

The full grid search took ~421.8 sec (~7 mins) on my machine. Depending on your computer, expect a different run time.

Let’s see the best values of the parameters (C, gamma and kernel) among the ones we provided!

The next step is optional, but it is something I usually do: saving the grid search results as a CSV file, so we can use them later as needed. I prefer to put the results in a dataframe and save that as a CSV file.
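The lecture's own grid values are not shown here. The sketch below uses a hypothetical, deliberately small grid so it runs quickly. It passes a list of dictionaries, so the linear kernel is not fitted repeatedly with gamma values it ignores.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()
X, y = digits.data, digits.target

# Hypothetical small grid: rbf with C and gamma, linear with C only
param_grid = [
    {"kernel": ["rbf"], "C": [1, 10, 100], "gamma": [0.0001, 0.001, 0.01]},
    {"kernel": ["linear"], "C": [0.01, 0.1, 1]},
]
grid = GridSearchCV(svm.SVC(), param_grid, cv=5)
grid.fit(X, y)

print("Best parameters:", grid.best_params_)
print(f"Best mean CV accuracy: {grid.best_score_:.3f}")
```

For larger grids, passing `n_jobs=-1` to `GridSearchCV` spreads the fits across all CPU cores.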
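Here is a minimal sketch of saving the results. `GridSearchCV.cv_results_` is a dictionary of arrays that converts directly into a dataframe, with one row per parameter combination. The file name and the tiny grid are hypothetical.

```python
import pandas as pd
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV

digits = datasets.load_digits()
grid = GridSearchCV(svm.SVC(), {"C": [1, 10], "gamma": [0.0005, 0.001]}, cv=5)
grid.fit(digits.data, digits.target)

# One row per parameter combination: params, mean/std test scores, fit times, rank
results = pd.DataFrame(grid.cv_results_).sort_values("rank_test_score")
results.to_csv("svm_gridsearch_results.csv", index=False)  # hypothetical file name

print(results[["params", "mean_test_score", "std_test_score", "rank_test_score"]])
```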

By default (refit=True), GridSearchCV refits the best parameter combination on the complete dataset, so this best model is our final model. Keep in mind that we did not split the data into train and test sets.

*******************************************************************

Let’s move on and work with the iris dataset now!

2. The iris dataset

2.1: Model training and comparisons

We will load the iris data from scikit-learn and scale the features before we cross-validate both logistic regression and SVM classifiers.
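The iris code is not shown here. Here is a minimal sketch, with one deliberate change: instead of scaling the whole dataset up front, it puts `StandardScaler` inside a scikit-learn `Pipeline`. That way the scaler is re-fitted on the training folds only. The `cv=5` setting is an assumption.

```python
from sklearn import datasets, linear_model, svm
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Scaling happens inside each pipeline, so every CV fold is scaled independently
models = {
    "Logistic regression": make_pipeline(StandardScaler(), linear_model.LogisticRegression()),
    "SVM (rbf)": make_pipeline(StandardScaler(), svm.SVC()),
}
scores = {name: cross_val_score(model, X, y, cv=5) for name, model in models.items()}
for name, s in scores.items():
    print(f"{name}: {s.mean():.3f} +/- {s.std():.3f}")
```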

We used default parameters for both classifiers. SVMs also offer several kernel options, so let’s see how three of them (Gaussian/rbf, linear and polynomial) work with this data.

The rbf kernel gives us the better score. It is a good idea to look at the decision boundaries that the three models set with their respective kernels!
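Here is a minimal sketch of the kernel comparison. It reuses the scaled pipeline from the iris setup and keeps C and gamma at their defaults, changing only the kernel.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Same scaled pipeline and default C/gamma; only the kernel changes
kernel_scores = {}
for kernel in ["rbf", "linear", "poly"]:
    model = make_pipeline(StandardScaler(), svm.SVC(kernel=kernel))
    kernel_scores[kernel] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{kernel:>6}: {kernel_scores[kernel]:.3f}")
```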

2.2: SVM visualizing kernel effects

Code reference from scikit-learn
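The plotting code is not shown here. The sketch below follows the general approach of scikit-learn's iris SVM plotting example, but it is not a copy of it. To allow a 2-D plot, it uses only the first two features (sepal length and sepal width), so the boundaries differ from those of a model trained on all four features.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, svm

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target  # sepal length and sepal width only

kernels = ["linear", "rbf", "poly"]
models = [svm.SVC(kernel=k).fit(X, y) for k in kernels]

# Fine grid covering the data range, on which each model's prediction is drawn
xx, yy = np.meshgrid(
    np.arange(X[:, 0].min() - 1, X[:, 0].max() + 1, 0.02),
    np.arange(X[:, 1].min() - 1, X[:, 1].max() + 1, 0.02),
)
grid_points = np.c_[xx.ravel(), yy.ravel()]

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, kernel, model in zip(axes, kernels, models):
    Z = model.predict(grid_points).reshape(xx.shape)
    ax.contourf(xx, yy, Z, alpha=0.3, cmap="coolwarm")  # coloured class regions
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k", s=20)
    ax.set_title(f"SVC with {kernel} kernel")
    ax.set_xlabel(iris.feature_names[0])
    ax.set_ylabel(iris.feature_names[1])
plt.tight_layout()
plt.show()
```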

Different colours represent the decision regions for the individual classes.

*******************************************************************

Moving forward, let’s explore the real power of support vector machines: their ability to classify non-linearly separable data!

3. The circles data — non-linearly separable data

3.1: Model training and comparisons

Let’s create a dataset with one circle inside another using scikit-learn’s make_circles, then compare the cross-validated scores of logistic regression and the different SVM kernels.
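The lecture's code and its make_circles settings are not shown here. The sketch below uses hypothetical values: 500 points, noise=0.05, an inner circle at half the outer radius (factor=0.5), and a fixed random_state. All models keep their default parameters.

```python
from sklearn import datasets, linear_model, svm
from sklearn.model_selection import cross_val_score

# Hypothetical settings for the two concentric circles
X, y = datasets.make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=42)

models = {
    "Logistic regression": linear_model.LogisticRegression(),
    "SVM (linear)": svm.SVC(kernel="linear"),
    "SVM (rbf)": svm.SVC(kernel="rbf"),
    "SVM (poly)": svm.SVC(kernel="poly"),
}
circle_scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, score in circle_scores.items():
    print(f"{name:>20}: {score:.3f}")
```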

These scores clearly show that logistic regression is not an option for non-linearly separable data. The rbf kernel, however, does the magic for SVM, separating the classes with 100% accuracy. Let’s visualize what the different SVM kernels are doing and how they handle this complex data.

The radial basis function (rbf) kernel implicitly maps the data into a higher-dimensional space, where the circles become separable.
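To build intuition, here is a minimal sketch of the same idea with an explicit, hand-picked mapping. This is not what the rbf kernel computes, since its implicit feature space is infinite-dimensional. We add a third feature, x1² + x2², so the two circles sit at different heights and a plain logistic regression can separate them. The dataset settings are hypothetical.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.model_selection import cross_val_score

X, y = datasets.make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=42)

# Explicit lift to 3-D: add the squared distance from the origin as a feature
r2 = (X ** 2).sum(axis=1, keepdims=True)
X_lifted = np.hstack([X, r2])

# A linear model fails in 2-D but a flat plane separates the lifted data
plain = cross_val_score(linear_model.LogisticRegression(), X, y, cv=5).mean()
lifted = cross_val_score(linear_model.LogisticRegression(), X_lifted, y, cv=5).mean()
print(f"2-D features: {plain:.3f}   with x1^2 + x2^2 added: {lifted:.3f}")
```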

3.2: Visualizing kernel effects for circles data

We used three SVM kernels (linear, rbf and poly) above. Let’s plot their decision boundaries and see how each one classifies this complex dataset. Take some time to understand the plotting code; it should be pretty simple at this stage of the course!

The support vector machine with the rbf kernel, shown in the middle, is the one doing the magic!

SVM is a really powerful classifier, especially for non-linearly separable data.

>>Remember, we did not do any hyperparameter tuning for the iris and circles datasets; we used the default parameters.<<

Think of a complex classification dataset and compare the results of all the models we have learned so far. Practice is the key to understanding. The UCI Machine Learning Repository is a good place to find datasets.
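Here is a minimal sketch of the decision-boundary plot. The dataset settings are hypothetical. The panels are ordered linear, rbf, poly, so the rbf kernel appears in the middle, and each title reports the model's training accuracy.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, svm

X, y = datasets.make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=42)

kernels = ["linear", "rbf", "poly"]  # rbf ends up in the middle panel
xx, yy = np.meshgrid(np.linspace(-1.5, 1.5, 300), np.linspace(-1.5, 1.5, 300))
grid_points = np.c_[xx.ravel(), yy.ravel()]

fig, axes = plt.subplots(1, 3, figsize=(15, 4.5))
for ax, kernel in zip(axes, kernels):
    model = svm.SVC(kernel=kernel).fit(X, y)
    Z = model.predict(grid_points).reshape(xx.shape)  # predicted class on the grid
    ax.contourf(xx, yy, Z, alpha=0.3, cmap="coolwarm")
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", edgecolors="k", s=15)
    ax.set_title(f"SVC, {kernel} kernel (train acc {model.score(X, y):.2f})")
plt.tight_layout()
plt.show()
```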

*******************************************************************


🌹Keep practicing to brush-up & add new skills🌹

✅🌹💐💐💐🌹✅ Please clap and share >> you can help us reach someone who is struggling to learn these concepts.✅🌹💐💐💐🌹✅

Good luck!

See you in the next lecture on “A45: Clustering — Unsupervised Machine Learning”.

Note: This complete course, including video lectures and jupyter notebooks, is available on the following links:

**************************************************************************************************************************************

About Dr. Junaid Qazi:

Dr. Junaid Qazi is a subject matter specialist, data science & machine learning consultant, and a team builder. He is a professional development coach, mentor, author, technical writer, and invited speaker. Dr. Qazi can be reached for consulting projects, technical writing and/or professional development trainings via LinkedIn.

**************************************************************************************************************************************
