
PyTorch [Tabular] — Binary Classification

source link: https://towardsdatascience.com/pytorch-tabular-binary-classification-a0368da5bb89?gi=fac381a780e7

We will use the lower back pain symptoms dataset available on Kaggle. This dataset has 13 columns, where the first 12 are the features and the last column is the target. The dataset has 310 rows.


Binary Classification meme [Image [1]]

Import Libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

from sklearn.preprocessing import StandardScaler    
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report

Read Data

df = pd.read_csv("data/tabular/classification/spine_dataset.csv")
df.head()


EDA and Preprocessing

Class Distribution

There is a class imbalance here. While there’s a lot that can be done to combat class imbalance, it is outside the scope of this blog post.

sns.countplot(x = 'Class_att', data=df)


Class imbalance bar plot [Image [2]]
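As a brief aside to the imbalance noted above (not used in the rest of this post), one common option is to weight the positive class inside the loss via the pos_weight argument of nn.BCEWithLogitsLoss. A minimal sketch, with illustrative class counts rather than values computed from the real CSV:

num_neg, num_pos = 100, 210                        # hypothetical counts, for illustration only
pos_weight = torch.tensor([num_neg / num_pos])     # weight applied to the positive (1) class
criterion_weighted = nn.BCEWithLogitsLoss(pos_weight=pos_weight)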

Encode Output Class

PyTorch expects class labels starting from 0, that is, in the range [0, n-1]. We need to remap our labels so that they start from 0.

df['Class_att'] = df['Class_att'].astype('category')

encode_map = {
    'Abnormal': 1,
    'Normal': 0
}

df['Class_att'].replace(encode_map, inplace=True)

Create Input and Output Data

The last column is our output. The input is all the columns but the last one. Here we use the .iloc method from the Pandas library to select our input and output columns.

X = df.iloc[:, 0:-1]
y = df.iloc[:, -1]

Train Test Split

We now split our data into train and test sets. We’ve selected 33% of our data to be in the test set.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=69)

Standardize Input

For neural networks to train properly, we need to standardize the input values. We standardize features by removing the mean and scaling to unit variance. The standard score of a sample x where the mean is u and the standard deviation is s is calculated as:

z = (x - u) / s

You can find more about standardization/normalization in neural nets here.

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)   # transform only: reuse the mean/std fitted on the train set
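If you want to convince yourself that StandardScaler really implements z = (x - u) / s column-wise, here is a tiny self-contained check on a made-up column (not part of the pipeline above):

import numpy as np
from sklearn.preprocessing import StandardScaler

toy = np.array([[1.0], [2.0], [3.0]])                      # one made-up feature column
scaler_demo = StandardScaler().fit(toy)
z_manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)      # z = (x - u) / s
print(np.allclose(scaler_demo.transform(toy), z_manual))   # True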

Model Parameters

To train our models, we need to set some hyper-parameters. Note that this is a very simple neural network, as a result, we do not tune a lot of hyper-parameters. The goal is to get to know how PyTorch works.

EPOCHS = 50
BATCH_SIZE = 64
LEARNING_RATE = 0.001

Define Custom Dataloaders

Here we define custom Dataset classes and the Dataloaders built on top of them. If this is new to you, I suggest you read the following blog post on Dataloaders and come back.

## train data
class trainData(Dataset):
    
    def __init__(self, X_data, y_data):
        self.X_data = X_data
        self.y_data = y_data
        
    def __getitem__(self, index):
        return self.X_data[index], self.y_data[index]
        
    def __len__(self):
        return len(self.X_data)


# y_train is a pandas Series, so convert it to a plain NumPy array before building the tensor
train_data = trainData(torch.FloatTensor(X_train),
                       torch.FloatTensor(y_train.to_numpy()))
## test data
class testData(Dataset):
    
    def __init__(self, X_data):
        self.X_data = X_data
        
    def __getitem__(self, index):
        return self.X_data[index]
        
    def __len__(self):
        return len(self.X_data)
    

test_data = testData(torch.FloatTensor(X_test))

Let’s initialize our dataloaders. We’ll use a batch_size = 1 for our test dataloader.

train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
test_loader = DataLoader(dataset=test_data, batch_size=1)

Define Neural Net Architecture

Here, we define a simple feed-forward network with two hidden layers, BatchNorm, and Dropout.


Binary Classification using Feedforward network example [Image [3] credits ]

In our __init__() function, we define the layers we want to use, while in the forward() function we call the defined layers.

Since the number of input features in our dataset is 12, the input to our first nn.Linear layer is 12. The output could be any number you want. The only thing you need to ensure is that the number of output features of one layer equals the number of input features of the next layer. Read more about nn.Linear in the docs.
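As a quick illustration of that shape rule (the demo_ tensors and layers below are only for this check, not part of the model):

demo_layer_1 = nn.Linear(12, 64)                  # 12 input features -> 64 outputs
demo_layer_2 = nn.Linear(64, 1)                   # must accept 64 inputs to match the layer above
demo_x = torch.randn(8, 12)                       # a made-up batch of 8 samples, 12 features each
print(demo_layer_2(demo_layer_1(demo_x)).shape)   # torch.Size([8, 1])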

Similarly, we define ReLU, Dropout, and BatchNorm layers.

Once we’ve defined all these layers, it’s time to use them. In the forward() function, we take the input tensor inputs and pass it through the different layers we initialized.

The first line of the forward() function takes the input, passes it through our first linear layer, and then applies the ReLU activation on it. Then we apply BatchNorm on the output. Look at the following code to understand it better.

Note that we did not use Sigmoid activation during training. That’s because we use the nn.BCEWithLogitsLoss() loss function, which applies the Sigmoid activation internally. We do, however, need to apply Sigmoid manually during inference.
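A small sanity check of that claim, with made-up logits and targets (not from our data):

demo_logits = torch.tensor([[0.7], [-1.2]])       # raw, un-activated model outputs
demo_targets = torch.tensor([[1.0], [0.0]])
loss_with_logits = nn.BCEWithLogitsLoss()(demo_logits, demo_targets)
loss_manual = nn.BCELoss()(torch.sigmoid(demo_logits), demo_targets)
print(loss_with_logits.item(), loss_manual.item())  # the two values match (up to float precision)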

class binaryClassification(nn.Module):
    def __init__(self):
        super(binaryClassification, self).__init__()
        # Number of input features is 12.
        self.layer_1 = nn.Linear(12, 64) 
        self.layer_2 = nn.Linear(64, 64)
        self.layer_out = nn.Linear(64, 1) 
        
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=0.1)
        self.batchnorm1 = nn.BatchNorm1d(64)
        self.batchnorm2 = nn.BatchNorm1d(64)
        
    def forward(self, inputs):
        x = self.relu(self.layer_1(inputs))
        x = self.batchnorm1(x)
        x = self.relu(self.layer_2(x))
        x = self.batchnorm2(x)
        x = self.dropout(x)
        x = self.layer_out(x)
        
        return x

Once we’ve defined our architecture, we check whether a GPU is available. The amazing thing about PyTorch is that it’s super easy to use the GPU.

The variable device will say cuda:0 if we have a GPU; if not, it’ll say cpu. You can follow along with this tutorial without any change in code even if you do not have a GPU.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)

###################### OUTPUT ######################
cuda:0

Next, we need to initialize our model. After initializing it, we move it to device . Now, this device is a GPU if you have one or it’s CPU if you don’t. The network we’ve used is fairly small. So, it will not take a lot of time to train.

After this, we initialize our optimizer and decide on which loss function to use.

model = binaryClassification()
model.to(device)
print(model)

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

###################### OUTPUT ######################
binaryClassification(
  (layer_1): Linear(in_features=12, out_features=64, bias=True)
  (layer_2): Linear(in_features=64, out_features=64, bias=True)
  (layer_out): Linear(in_features=64, out_features=1, bias=True)
  (relu): ReLU()
  (dropout): Dropout(p=0.1, inplace=False)
  (batchnorm1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (batchnorm2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)


Binary Classification meme [Image [4]]

Train the model

Before we start the actual training, let’s define a function to calculate accuracy during training.

In the function below, we take the predicted and actual outputs as inputs. The predicted value (a raw logit) is passed through a sigmoid and rounded off to convert it into either 0 or 1.

Once that is done, we simply compare the number of 1/0 we predicted to the number of 1/0 actually present and calculate the accuracy.

Note that the inputs y_pred and y_test are for a batch. Our batch_size was 64. So, this accuracy is being calculated for 64 predictions.

def binary_acc(y_pred, y_test):
    y_pred_tag = torch.round(torch.sigmoid(y_pred))

    correct_results_sum = (y_pred_tag == y_test).sum().float()
    acc = correct_results_sum/y_test.shape[0]
    acc = torch.round(acc * 100)
    
    return acc
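For example, with made-up logits and labels for a batch of 4, where three of the four rounded predictions match the labels:

demo_preds = torch.tensor([[2.3], [-1.1], [0.4], [-3.0]])   # raw logits from a hypothetical batch
demo_labels = torch.tensor([[1.0], [0.0], [0.0], [0.0]])
print(binary_acc(demo_preds, demo_labels))                  # tensor(75.)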

The moment we’ve been waiting for has arrived. Let’s train our model.

You can see we’ve put model.train() before the loop. model.train() tells PyTorch that you’re in training mode.

Well, why do we need to do that? If you’re using layers such as Dropout or BatchNorm which behave differently during training and evaluation, you need to tell PyTorch to act accordingly. The default mode in PyTorch is train mode, so you don’t strictly have to write it, but it’s good practice.

Similarly, we’ll call model.eval() when we test our model. We’ll see that below.
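To see what that mode switch actually does, here is a tiny illustration with a standalone Dropout layer (unrelated to our model):

demo_dropout = nn.Dropout(p=0.5)
demo_x = torch.ones(1, 6)
demo_dropout.train()
print(demo_dropout(demo_x))   # roughly half the entries zeroed, the rest scaled up to 2.0
demo_dropout.eval()
print(demo_dropout(demo_x))   # identity in eval mode: tensor([[1., 1., 1., 1., 1., 1.]])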

Back to training; we start a for loop. At the top of this for loop, we initialize our loss and accuracy per epoch to 0. After every epoch, we’ll print out the loss/accuracy and reset it back to 0.

Then we have another for loop. This inner loop is used to get our data in batches from the train_loader.

We call optimizer.zero_grad() before we make any predictions. Since the backward() function accumulates gradients, we need to reset them to 0 manually for every mini-batch.
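Here is a minimal illustration of that accumulation on a single made-up parameter:

demo_w = torch.ones(1, requires_grad=True)
(demo_w * 2).backward()
print(demo_w.grad)            # tensor([2.])
(demo_w * 2).backward()
print(demo_w.grad)            # tensor([4.])  gradients added up, hence optimizer.zero_grad()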

From our defined model, we then obtain a prediction, compute the loss (and accuracy) for that mini-batch, perform backpropagation using loss.backward(), and update the weights with optimizer.step(). Finally, we add up all the mini-batch losses (and accuracies) and divide by the number of batches to obtain the average loss (and accuracy) for the epoch.

This loss and accuracy are printed out in the outer for loop.

model.train()
for e in range(1, EPOCHS+1):
    epoch_loss = 0
    epoch_acc = 0
    for X_batch, y_batch in train_loader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)
        optimizer.zero_grad()
        
        y_pred = model(X_batch)
        
        loss = criterion(y_pred, y_batch.unsqueeze(1))
        acc = binary_acc(y_pred, y_batch.unsqueeze(1))
        
        loss.backward()
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        

    print(f'Epoch {e+0:03}: | Loss: {epoch_loss/len(train_loader):.5f} | Acc: {epoch_acc/len(train_loader):.3f}')

###################### OUTPUT ######################
Epoch 001: | Loss: 0.04027 | Acc: 98.250
Epoch 002: | Loss: 0.12023 | Acc: 96.750
Epoch 003: | Loss: 0.02067 | Acc: 99.500
Epoch 004: | Loss: 0.07329 | Acc: 96.250
Epoch 005: | Loss: 0.04676 | Acc: 99.250
Epoch 006: | Loss: 0.03005 | Acc: 99.500
Epoch 007: | Loss: 0.05777 | Acc: 98.250
Epoch 008: | Loss: 0.03446 | Acc: 99.500
Epoch 009: | Loss: 0.03443 | Acc: 100.000
Epoch 010: | Loss: 0.03368 | Acc: 100.000
Epoch 011: | Loss: 0.02395 | Acc: 100.000
Epoch 012: | Loss: 0.05094 | Acc: 98.250
Epoch 013: | Loss: 0.03618 | Acc: 98.250
Epoch 014: | Loss: 0.02143 | Acc: 100.000
Epoch 015: | Loss: 0.02730 | Acc: 99.500
Epoch 016: | Loss: 0.02323 | Acc: 100.000
Epoch 017: | Loss: 0.03395 | Acc: 98.250
Epoch 018: | Loss: 0.08600 | Acc: 96.750
Epoch 019: | Loss: 0.02394 | Acc: 100.000
Epoch 020: | Loss: 0.02363 | Acc: 100.000
Epoch 021: | Loss: 0.01660 | Acc: 100.000
Epoch 022: | Loss: 0.05766 | Acc: 96.750
Epoch 023: | Loss: 0.02115 | Acc: 100.000
Epoch 024: | Loss: 0.01331 | Acc: 100.000
Epoch 025: | Loss: 0.01504 | Acc: 100.000
Epoch 026: | Loss: 0.01727 | Acc: 100.000
Epoch 027: | Loss: 0.02128 | Acc: 100.000
Epoch 028: | Loss: 0.01106 | Acc: 100.000
Epoch 029: | Loss: 0.05802 | Acc: 98.250
Epoch 030: | Loss: 0.01275 | Acc: 100.000
Epoch 031: | Loss: 0.01272 | Acc: 100.000
Epoch 032: | Loss: 0.01949 | Acc: 100.000
Epoch 033: | Loss: 0.02848 | Acc: 100.000
Epoch 034: | Loss: 0.01514 | Acc: 100.000
Epoch 035: | Loss: 0.02949 | Acc: 100.000
Epoch 036: | Loss: 0.00895 | Acc: 100.000
Epoch 037: | Loss: 0.01692 | Acc: 100.000
Epoch 038: | Loss: 0.01678 | Acc: 100.000
Epoch 039: | Loss: 0.02755 | Acc: 100.000
Epoch 040: | Loss: 0.02021 | Acc: 100.000
Epoch 041: | Loss: 0.07972 | Acc: 98.250
Epoch 042: | Loss: 0.01421 | Acc: 100.000
Epoch 043: | Loss: 0.01558 | Acc: 100.000
Epoch 044: | Loss: 0.01185 | Acc: 100.000
Epoch 045: | Loss: 0.01830 | Acc: 100.000
Epoch 046: | Loss: 0.01367 | Acc: 100.000
Epoch 047: | Loss: 0.00880 | Acc: 100.000
Epoch 048: | Loss: 0.01046 | Acc: 100.000
Epoch 049: | Loss: 0.00933 | Acc: 100.000
Epoch 050: | Loss: 0.11034 | Acc: 98.250

Test the model

After training is done, we need to test how our model fared. Note that we’ve used model.eval() before we run our testing code. Since we do not need gradients during inference, we wrap the test loop in torch.no_grad(), which reduces memory usage and speeds up computation.

We start by defining a list that will hold our predictions. Then we loop through our batches using the test_loader. For each batch,

  • We make the predictions using our trained model.
  • Round off the probabilities to 1 or 0.
  • Move the batch to the CPU from the GPU.
  • Convert the tensor to a numpy object and append it to our list.
  • Flatten out the list so that we can use it as an input to confusion_matrix and classification_report .
y_pred_list = []

model.eval()
with torch.no_grad():
    for X_batch in test_loader:
        X_batch = X_batch.to(device)
        y_test_pred = model(X_batch)
        y_test_pred = torch.sigmoid(y_test_pred)
        y_pred_tag = torch.round(y_test_pred)
        y_pred_list.append(y_pred_tag.cpu().numpy())
y_pred_list = [a.squeeze().tolist() for a in y_pred_list]

Confusion Matrix

Once we have all our predictions, we use the confusion_matrix() function from scikit-learn to calculate the confusion matrix.

confusion_matrix(y_test, y_pred_list)
###################### OUTPUT ######################
array([[23,  8],
       [12, 60]])

Classification Report

To obtain the classification report which has precision, recall, and F1 score, we use the function classification_report .

print(classification_report(y_test, y_pred_list))
###################### OUTPUT ######################
              precision    recall  f1-score   support

           0       0.66      0.74      0.70        31
           1       0.88      0.83      0.86        72

    accuracy                           0.81       103
   macro avg       0.77      0.79      0.78       103
weighted avg       0.81      0.81      0.81       103
