
K-Nearest Neighbor Classification and vehicle evaluation


Hello there!

Are you purchasing a new car, or do you want to know whether your current car was a good purchase or not?

You are in luck.

In this blog, we are going to predict a car's acceptability based on its price (purchase price, maintenance price) and its specifications (number of doors, how many persons it can seat, luggage boot size, safety).

Data source: http://archive.ics.uci.edu/ml/datasets/Car+Evaluation

Get the car.data file from the above link.
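
If you would rather skip the manual download, pandas can usually read the file straight from the UCI archive. A minimal sketch; the raw-data URL below is an assumption based on the archive's usual layout, so fall back to the manual download if it has moved:

import pandas as pd

# Assumed raw-data URL, following the usual UCI archive layout
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'
cols = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'CAR']
df = pd.read_csv(url, header=None, names=cols)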

Import the mandatory Python libraries:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Then run the below lines of Python code to read the data into a pandas dataframe. Note that the code supplies the feature names.

df = pd.read_csv(r'C:\Sanrusha-Canon Laptop\Udemy\Machine Learning\SampleDataSet\car.data',
                 header=None, delimiter=',',
                 names=['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'CAR'])

Let's review the first few rows of dataframe df.

df.head()

You should get the below output:

[Output: the first five rows of df, all values categorical]

Oh no, it contains categorical variables. These categorical features will give us trouble when fitting the sklearn KNN classifier, which expects numeric input. Let's label-encode them.

Run the below lines of Python code to encode these features. There is no need to encode the target variable CAR.

from sklearn.preprocessing import LabelEncoder

lbc = LabelEncoder()
df["buying"] = lbc.fit_transform(df["buying"])
df["maint"] = lbc.fit_transform(df["maint"])
df["lug_boot"] = lbc.fit_transform(df["lug_boot"])
df["safety"] = lbc.fit_transform(df["safety"])
df["doors"] = lbc.fit_transform(df["doors"])
df["persons"] = lbc.fit_transform(df["persons"])
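
One caveat worth knowing: LabelEncoder assigns integer codes alphabetically, so natural orderings like low < med < high < vhigh are not preserved, and KNN's distance calculations will treat the codes as if they were meaningful. If you want the codes to respect the natural order, an explicit mapping is a simple alternative. A minimal sketch, run instead of (not after) the LabelEncoder block, with category spellings assumed from the UCI dataset description:

# Minimal sketch: ordinal codes that respect the natural value order.
# Apply INSTEAD of the LabelEncoder calls above, while the values are still strings.
price_order = {'low': 0, 'med': 1, 'high': 2, 'vhigh': 3}
doors_order = {'2': 0, '3': 1, '4': 2, '5more': 3}
persons_order = {'2': 0, '4': 1, 'more': 2}
size_order = {'small': 0, 'med': 1, 'big': 2}
safety_order = {'low': 0, 'med': 1, 'high': 2}

df['buying'] = df['buying'].map(price_order)
df['maint'] = df['maint'].map(price_order)
df['doors'] = df['doors'].map(doors_order)
df['persons'] = df['persons'].map(persons_order)
df['lug_boot'] = df['lug_boot'].map(size_order)
df['safety'] = df['safety'].map(safety_order)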

Let's review the first few records now.

df.head()

[Output: the first five rows of df, features now numeric]

What are the unique values in the target variable CAR? Let's find out.

df['CAR'].unique()

[Output: the four class labels unacc, acc, good, and vgood]

Cars are categorized as Unacceptable (unacc), Acceptable (acc), Good (good), or Very Good (vgood). (Values like low, med, and high belong to the feature columns, not to the target.)

Now, we know how the cars are going to be evaluated.

Before defining the X and y vectors, let's make sure all the feature values are coded properly as numeric values.

df.applymap(np.isreal).head()

[Output: True for every feature column; CAR, still a string, shows False]

Very good! The features are encoded into numeric values now. It's the right time to define X and y.

X = df.drop(['CAR'], axis=1).values
y = df['CAR'].values

Split the data into training and test sets:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
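
The classes in this dataset are heavily imbalanced (the large majority of rows are unacc), so it can help to preserve the class proportions in both splits. A small sketch using train_test_split's stratify parameter:

# Keep the class ratios identical in the training and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)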

Before calling KNeighborsClassifier, we need to find the optimal K value. Run the below lines of Python code to get it.

from sklearn.neighbors import KNeighborsClassifier
from math import sqrt
from sklearn.metrics import mean_squared_error

# Encode the target labels so RMSE can be computed on them
y1 = lbc.fit_transform(df["CAR"])
rmse = []
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y1)
    rmse.append(sqrt(mean_squared_error(y1, knn.predict(X))))
    print('K value', k, 'rmse', rmse[-1])

[Output: RMSE for K = 1 through 20]

K = 7 looks like the optimum value.
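
Note that the loop above scores each K on the same data it was trained on, which tends to favor small K. A common alternative, sketched below, is to pick K by cross-validated accuracy on the training split (the 5-fold setting is an assumption, not something the original used):

from sklearn.model_selection import cross_val_score

# Score each candidate K by 5-fold cross-validated accuracy on the training data
cv_scores = {}
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    cv_scores[k] = cross_val_score(knn, X_train, y_train, cv=5).mean()

best_k = max(cv_scores, key=cv_scores.get)  # K with the highest mean CV accuracy
print('Best K by cross-validation:', best_k)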

Run the below lines of code to train the model and get predicted values through KNN:

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

Very good! Let's check the accuracy of our model.

from sklearn import metrics

# Note: this scores the model on the full dataset, including the rows it was trained on
print("Accuracy ", metrics.accuracy_score(y, knn.predict(X)))

[Output: accuracy of about 0.93]

93% is pretty awesome accuracy. We built a good model!
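
Since the score above includes the training rows, the held-out test set gives a fairer picture. A short sketch using the y_pred computed earlier:

from sklearn.metrics import accuracy_score, classification_report

# Evaluate only on the 20% of rows the model never saw during training
print('Test accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))  # per-class precision, recall, F1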

Let's draw a scatter plot of actual CAR acceptability vs. predicted values.

# Plot each feature value against the actual class (blue circles)
for xe, ye in zip(X, y):
    plt.scatter(xe, [ye] * len(xe), color="blue", marker="o", s=100)

# Overlay the predicted class for the same rows (yellow stars)
for xe, ye in zip(X, knn.predict(X)):
    plt.scatter(xe, [ye] * len(xe), color="yellow", marker="*", s=10)

plt.show()

[Scatter plot: actual classes in blue, predicted classes in yellow]

Blue dots indicate actual values; yellow stars indicate predicted values. Pretty much every blue dot has a yellow star on it.

Barring a few points, all acceptability classes were predicted well.
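
For a more readable view than the scatter plot, a confusion matrix shows exactly which classes get mixed up. A minimal sketch that finally puts the seaborn import from the top of the post to use, evaluated on the held-out test set:

from sklearn.metrics import confusion_matrix

labels = sorted(df['CAR'].unique())  # fix one label order for both axes
cm = confusion_matrix(y_test, y_pred, labels=labels)

# Heatmap: rows are actual classes, columns are predicted classes
sns.heatmap(cm, annot=True, fmt='d', xticklabels=labels, yticklabels=labels)
plt.xlabel('Predicted class')
plt.ylabel('Actual class')
plt.show()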

Pretty good!!


