
A32: Multi-class Classification Using Logistic Regression

Junaid Qazi, PhD
Jan 17, 2022

🧘🏻‍♂️Topics to be covered:

1. The dataset, EDA, and preprocessing
2. One-vs-rest
3. Multinomial
4. Predicted probabilities
5. Readings
6. Code example

As always, we start with our standard imports:

import pandas as pd; import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')  # just optional!
%matplotlib inline
# Setting display format to retina in matplotlib to see better quality images.
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina')
# Lines below are just to ignore warnings
import warnings; warnings.filterwarnings('ignore')

1. The dataset, EDA, and preprocessing

iris_df = pd.read_csv("https://raw.githubusercontent.com/junaidqazi/DataSets_Practice_ScienceAcademy/master/Iris.csv")
This is what the data looks like:
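A quick way to get this preview, assuming the standard Kaggle-style Iris.csv layout (the column names here are assumptions, not reproduced from the original output):

iris_df.head()  # expected columns: Id, SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm, Species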

Separating features and the target

Target is a category-code column that we created from Species, which contains the class names (see above).
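The cell that builds these columns isn't reproduced here, so the following is a minimal sketch; the column names Id and Species, and the name Target for the new category-code column, are assumptions based on the description above.

# Encode the species names as integer category codes 0, 1, 2 (assumed column names)
iris_df['Target'] = iris_df['Species'].astype('category').cat.codes
# Features are the measurement columns; the target is the new code column
X = iris_df.drop(['Id', 'Species', 'Target'], axis=1)
y = iris_df['Target']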

Feature scaling

from sklearn.preprocessing import MinMaxScaler, StandardScaler
#scaler = StandardScaler()  # Let's try MinMaxScaler here!
scaler = MinMaxScaler()
X = scaler.fit_transform(X)
# Try both MinMaxScaler and StandardScaler, and compare the performance of your models!

Machine Learning

2. One-vs-rest

Well, 13 points are misclassified for label "1" and 3 for label "2", while class label "0" is classified perfectly. Look at the precision, recall, and f1-scores as well: what do you learn from them?
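The training cell isn't reproduced above; a minimal sketch of a one-vs-rest fit and its evaluation, assuming the scaled X and target y from the preprocessing step (the split parameters here are illustrative assumptions):

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report
# hold out a test set (test_size and random_state are assumed values)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
# one-vs-rest: fit one binary classifier per class
ovr_clf = LogisticRegression(multi_class='ovr', solver='lbfgs')
ovr_clf.fit(X_train, y_train)
y_pred_ovr = ovr_clf.predict(X_test)
print(confusion_matrix(y_test, y_pred_ovr))
print(classification_report(y_test, y_pred_ovr))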

3. Multinomial

We can see improved performance with multinomial regression: fewer misclassified data points compared to one-vs-rest!
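Continuing the sketch from the previous section, the only change is the multi_class argument, which switches to a single softmax model fit jointly over all three classes:

# multinomial (softmax) logistic regression, one joint model for all classes
mn_clf = LogisticRegression(multi_class='multinomial', solver='lbfgs')
mn_clf.fit(X_train, y_train)
y_pred_mn = mn_clf.predict(X_test)
print(confusion_matrix(y_test, y_pred_mn))
print(classification_report(y_test, y_pred_mn))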

4. Predicted probabilities

Predicted probabilities using one-vs-rest
Predicted probabilities using multinomial
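Both probability tables come from predict_proba; continuing the sketch with the two models defined above:

# per-class probabilities for each test sample (each row sums to 1)
proba_ovr = ovr_clf.predict_proba(X_test)  # one-vs-rest
proba_mn = mn_clf.predict_proba(X_test)    # multinomial
print(proba_ovr[:5].round(3))  # first five test samples
print(proba_mn[:5].round(3))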

5. Readings

6. Code example

# Tune regularization for multinomial logistic regression
import time
import numpy as np
from sklearn.datasets import make_classification, make_blobs
from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
import matplotlib.pyplot as plt

start_time = time.time()

# get the dataset
def the_dataset():
    """
    Creates a dataset with 10000 samples, 130 features and 5 classes
    using make_classification().
    """
    #X, y = make_blobs(n_samples=100000, centers=5, n_features=5,
    #                  random_state=101, cluster_std=3)
    X, y = make_classification(n_samples=10000, n_features=130, n_informative=60,
                               n_redundant=40, random_state=101, n_classes=5, class_sep=1.5)
    # change n_samples=100000 in make_classification for a larger experiment
    print("Data (X, y) is created.....")
    return X, y

# get a dictionary of models to evaluate
def the_models():
    C = [0.0, 0.0001, 0.001, 0.01, 0.1, 1.0]
    print("List of C (inverse of regularization strength) values: ", C)
    print("Training multinomial logistic regression models......!")
    print()
    models = dict()
    for p in C:
        # create a name for the model
        key = '%.4f' % p
        # turn off the penalty for C = 0.0
        if p == 0.0:
            # no penalty in this case
            # lbfgs is the default algorithm for parameter optimization; it is a
            # limited-memory version of the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm
            # (note: scikit-learn >= 1.2 expects penalty=None instead of the string 'none')
            models[key] = LogisticRegression(multi_class='multinomial',
                                             solver='lbfgs', penalty='none')
        else:
            models[key] = LogisticRegression(multi_class='multinomial',
                                             solver='lbfgs',
                                             penalty='l2', C=p)
    return models

# evaluate a given model using cross-validation
def model_eval(model, X, y):
    """
    Evaluates a model with repeated stratified 10-fold cross-validation.
    """
    # define the evaluation procedure
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=101)  # creating instance
    # Stratification is the process of dividing members of the population into
    # homogeneous subgroups before sampling.
    # evaluate the model
    scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1)
    return scores

# define dataset
X, y = the_dataset()
# get the models to evaluate
models = the_models()
# evaluate the models and store the results
results, names = list(), list()
for name, model in models.items():
    # evaluate the model and collect the scores
    scores = model_eval(model, X, y)
    # store the results
    results.append(scores)
    names.append(name)
    # summarize progress along the way
    print("C = {}; mean_score = {}; std = {}".format(
        name, round(np.mean(scores), 4), round(np.std(scores), 4)))
print("\nTotal compute time (sec) = {}".format(time.time() - start_time))
# plot model performance for comparison
plt.figure(figsize=(18, 6))
plt.xlabel("C - The regularization strength", fontsize=18); plt.ylabel("Mean score", fontsize=18)
plt.boxplot(results, labels=names, showmeans=True);
This is the output of the long code above. You can change the parameters while creating the data, or use your own data, and see how multi-class classification works for your dataset!
