
K-Nearest Neighbor Classification and vehicle evaluation


Hello there!

Are you purchasing a new car, or do you want to know whether your current car was a good purchase or not?

You are in luck.

In this blog, we are going to predict a car's acceptability based on its price (purchase price, maintenance price) and its specifications (number of doors, how many persons it can seat, luggage boot size, safety).

Data source: http://archive.ics.uci.edu/ml/datasets/Car+Evaluation

Get the car.data file from the above link.
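
If you would rather skip the manual download, pandas can usually read the file straight from the UCI archive. A minimal sketch; the raw-data URL below is an assumption based on the archive's usual layout, so fall back to the manual download if it has moved:

import pandas as pd

# Assumed raw-data URL, following the usual UCI archive layout
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data'
cols = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'CAR']
df = pd.read_csv(url, header=None, names=cols)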

Import the mandatory Python libraries:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

Then run the below lines of Python code to read the data into a pandas dataframe. Note that the code supplies the feature names.

df = pd.read_csv(r'C:\Sanrusha-Canon Laptop\Udemy\Machine Learning\SampleDataSet\car.data',
                 header=None, delimiter=',',
                 names=['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety', 'CAR'])

Let's review the first few rows of dataframe df.

df.head()

You should get the below output:

[Output: the first five rows of df, all values categorical]

Oh no, it contains categorical variables. These categorical features will give us trouble when fitting the sklearn KNN classifier, which expects numeric input. Let's label-encode them.

Run the below lines of Python code to encode these features. There is no need to encode the target variable CAR.

from sklearn.preprocessing import LabelEncoder

lbc = LabelEncoder()
df["buying"] = lbc.fit_transform(df["buying"])
df["maint"] = lbc.fit_transform(df["maint"])
df["lug_boot"] = lbc.fit_transform(df["lug_boot"])
df["safety"] = lbc.fit_transform(df["safety"])
df["doors"] = lbc.fit_transform(df["doors"])
df["persons"] = lbc.fit_transform(df["persons"])
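
One caveat worth knowing: LabelEncoder assigns integer codes alphabetically, so natural orderings like low < med < high < vhigh are not preserved, and KNN's distance calculations will treat the codes as if they were meaningful. If you want the codes to respect the natural order, an explicit mapping is a simple alternative. A minimal sketch, run instead of (not after) the LabelEncoder block, with category spellings assumed from the UCI dataset description:

# Minimal sketch: ordinal codes that respect the natural value order.
# Apply INSTEAD of the LabelEncoder calls above, while the values are still strings.
price_order = {'low': 0, 'med': 1, 'high': 2, 'vhigh': 3}
doors_order = {'2': 0, '3': 1, '4': 2, '5more': 3}
persons_order = {'2': 0, '4': 1, 'more': 2}
size_order = {'small': 0, 'med': 1, 'big': 2}
safety_order = {'low': 0, 'med': 1, 'high': 2}

df['buying'] = df['buying'].map(price_order)
df['maint'] = df['maint'].map(price_order)
df['doors'] = df['doors'].map(doors_order)
df['persons'] = df['persons'].map(persons_order)
df['lug_boot'] = df['lug_boot'].map(size_order)
df['safety'] = df['safety'].map(safety_order)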

Let's review the first few records now.

df.head()

[Output: the first five rows of df, features now numeric]

What are the unique values in the target variable CAR? Let's find out.

df['CAR'].unique()

[Output: the four class labels unacc, acc, good, and vgood]

Cars are categorized as Unacceptable (unacc), Acceptable (acc), Good (good), or Very Good (vgood). (Values like low, med, and high belong to the feature columns, not to the target.)

Now, we know how the cars are going to be evaluated.

Before defining the X and y vectors, let's make sure all the feature values are coded properly as numeric values.

df.applymap(np.isreal).head()

[Output: True for every feature column; CAR, still a string, shows False]

Very good! The features are encoded into numeric values now. It's the right time to define X and y.

X = df.drop(['CAR'], axis=1).values
y = df['CAR'].values

Split the data into training and test sets:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
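
The classes in this dataset are heavily imbalanced (the large majority of rows are unacc), so it can help to preserve the class proportions in both splits. A small sketch using train_test_split's stratify parameter:

# Keep the class ratios identical in the training and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)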

Before calling KNeighborsClassifier, we need to find the optimal K value. Run the below lines of Python code to get it.

from sklearn.neighbors import KNeighborsClassifier
from math import sqrt
from sklearn.metrics import mean_squared_error

# Encode the target labels so RMSE can be computed on them
y1 = lbc.fit_transform(df["CAR"])
rmse = []
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y1)
    rmse.append(sqrt(mean_squared_error(y1, knn.predict(X))))
    print('K value', k, 'rmse', rmse[-1])

[Output: RMSE for K = 1 through 20]

K = 7 looks like the optimum value.
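
Note that the loop above scores each K on the same data it was trained on, which tends to favor small K. A common alternative, sketched below, is to pick K by cross-validated accuracy on the training split (the 5-fold setting is an assumption, not something the original used):

from sklearn.model_selection import cross_val_score

# Score each candidate K by 5-fold cross-validated accuracy on the training data
cv_scores = {}
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    cv_scores[k] = cross_val_score(knn, X_train, y_train, cv=5).mean()

best_k = max(cv_scores, key=cv_scores.get)  # K with the highest mean CV accuracy
print('Best K by cross-validation:', best_k)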

Run the below lines of code to train the model and get predicted values through KNN:

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

Very good! Let's check the accuracy of our model.

from sklearn import metrics

# Note: this scores the model on the full dataset, including the rows it was trained on
print("Accuracy ", metrics.accuracy_score(y, knn.predict(X)))

[Output: accuracy of about 0.93]

93% is pretty awesome accuracy. We built a good model!
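
Since the score above includes the training rows, the held-out test set gives a fairer picture. A short sketch using the y_pred computed earlier:

from sklearn.metrics import accuracy_score, classification_report

# Evaluate only on the 20% of rows the model never saw during training
print('Test accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))  # per-class precision, recall, F1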

Let's draw a scatter plot of actual CAR acceptability vs. predicted values.

# Plot each feature value against the actual class (blue circles)
for xe, ye in zip(X, y):
    plt.scatter(xe, [ye] * len(xe), color="blue", marker="o", s=100)

# Overlay the predicted class for the same rows (yellow stars)
for xe, ye in zip(X, knn.predict(X)):
    plt.scatter(xe, [ye] * len(xe), color="yellow", marker="*", s=10)

plt.show()

[Scatter plot: actual classes in blue, predicted classes in yellow]

Blue dots indicate actual values; yellow stars indicate predicted values. Pretty much every blue dot has a yellow star on it.

Barring a few points, all acceptability classes were predicted well.
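
For a more readable view than the scatter plot, a confusion matrix shows exactly which classes get mixed up. A minimal sketch that finally puts the seaborn import from the top of the post to use, evaluated on the held-out test set:

from sklearn.metrics import confusion_matrix

labels = sorted(df['CAR'].unique())  # fix one label order for both axes
cm = confusion_matrix(y_test, y_pred, labels=labels)

# Heatmap: rows are actual classes, columns are predicted classes
sns.heatmap(cm, annot=True, fmt='d', xticklabels=labels, yticklabels=labels)
plt.xlabel('Predicted class')
plt.ylabel('Actual class')
plt.show()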

Pretty good!!


