
Algorithms From Scratch: Linear Regression

source link: https://towardsdatascience.com/algorithms-from-scratch-linear-regression-c654353d1e7c?gi=1a72f053d5a3

Detailing and Building a Linear Regression model from scratch




Photo by Isaac Smith on Unsplash

Linear Regression is a popular linear Machine Learning algorithm for regression-based problems. It is usually one of the first algorithms people learn when starting out in Machine Learning, due to its simplicity and the way it builds into other algorithms such as Logistic Regression and Neural Networks. In this story we are going to implement it from scratch so that we can build our intuition about what is going on within a Linear Regression model.

Link to Github Repo…

Note: There are many frameworks with highly optimized code such as Scikit-Learn, TensorFlow and PyTorch, so it is usually unnecessary to build your own algorithm from scratch. However, building it from scratch serves our intuition of what is happening within the model, and that helps when we attempt to improve our model's performance.

A linear model is an algorithm that makes a prediction by simply computing a weighted sum of the input features plus a bias term (also referred to as the intercept term). Putting that into perspective, when we use a linear regression model we hope to explain the relationship between a dependent variable (e.g. house price) and one or more independent variables (e.g. location, bedrooms, area, etc.).

Figure 1: Multivariate Linear Regression
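In case the figure does not render here, a written-out form of the multivariate linear regression hypothesis (using the W and b notation that appears in the code later on) is:

\hat{y} = b + w_1 x_1 + w_2 x_2 + \dots + w_n x_n = W^\top x + b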

When we train a model we are attempting to set the parameters to get a line that best fits the training data. Therefore, when we train a Linear Regression model we are trying to find the value of theta that minimizes the cost function. The most common cost function for regression models is the RMSE; however, it is much easier to minimize the MSE, as doing so leads to the same result.
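For m training instances, the MSE we minimize (and the RMSE we report) can be written as:

\mathrm{MSE}(W, b) = \frac{1}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}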

Creating the Model

If you have never written a Machine Learning algorithm from scratch, I greatly encourage you to do so. John Sullivan wrote a very useful story titled 6 Steps To Write Any Machine Learning Algorithm From Scratch: Perceptron Case Study, which is the best advice I have managed to find on the internet about writing algorithms from scratch.

Chunking the Algorithm

  1. Randomly initialize parameters for the hypothesis function
  2. Calculate the partial derivatives (Read more about this here ) — the corresponding update equations are sketched just after this list
  3. Update the parameters
  4. Repeat steps 2–3 for n iterations (or until the cost function is minimized)
  5. Inference
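For concreteness, steps 2 and 3 correspond to the following gradient descent updates of the MSE cost (a sketch in the W/b notation used in the code below, with learning rate α):

\frac{\partial \mathrm{MSE}}{\partial W} = \frac{2}{m} X^\top (\hat{y} - y), \qquad \frac{\partial \mathrm{MSE}}{\partial b} = \frac{2}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)

W \leftarrow W - \alpha \frac{\partial \mathrm{MSE}}{\partial W}, \qquad b \leftarrow b - \alpha \frac{\partial \mathrm{MSE}}{\partial b}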

Implementation

For this section, I will be leveraging three Python packages: NumPy for linear algebra, Scikit-Learn (a popular Machine Learning framework), and Matplotlib to visualize our data.

import numpy as np 
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

First, we need a dataset. For this I will use sklearn.datasets.make_regression, which allows you to generate a random regression problem — see the Documentation. Next, I will split my data into training and test sets with sklearn.model_selection.train_test_split — see the Documentation.

# creating the data set
X, y = make_regression(n_samples=100, n_features=1, n_targets=1, noise=20, random_state=24)

# splitting training and test
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.75, random_state=24)

Let's use matplotlib.pyplot to see how our data looks — see the Documentation.

# visualize 
plt.scatter(X, y)
plt.show()


Figure 2: Our generated regression problem

Now we can begin our implementation of Linear Regression. The first step of our chunked algorithm is to initialize the parameters for our hypothesis function (in the code below we simply set them to zero).

def param_init(X):
    """
    Initialize parameters for the linear regression model
    __________________
    Input(s)
    X: Training data
    __________________
    Output(s)
    params: Dictionary containing coefficients
    """
    params = {}  # initialize dictionary
    _, n_features = X.shape  # shape of training data

    # initializing coefficients to 0
    params["W"] = np.zeros(n_features)
    params["b"] = 0
    return params

Excellent. Next, we want to calculate the partial derivatives and update our parameters. We do this using a very important Machine Learning algorithm called Gradient Descent. Ergo, we can implement steps 2–4 with Gradient Descent.

def gradient_descent(X, y, params, alpha, n_iter):
    """
    Gradient descent to minimize the cost function
    __________________
    Input(s)
    X: Training data
    y: Labels
    params: Dictionary containing initial coefficients
    alpha: Model learning rate
    n_iter: Number of iterations
    __________________
    Output(s)
    params: Dictionary containing optimized coefficients
    """
    W = params["W"]
    b = params["b"]
    m = X.shape[0]  # number of training instances

    for _ in range(n_iter):
        # prediction with current weights
        y_pred = np.dot(X, W) + b
        # taking the partial derivatives of the coefficients
        dW = (2/m) * np.dot(X.T, (y_pred - y))
        db = (2/m) * np.sum(y_pred - y)
        # updates to coefficients
        W -= alpha * dW
        b -= alpha * db

    params["W"] = W
    params["b"] = b
    return params

Using these functions we can train our Linear Regression model on our training data to obtain model parameters that we will require for inference.

def train(X, y, alpha=0.01, n_iter=1000):
    """
    Train the Linear Regression model with Gradient Descent
    __________________
    Input(s)
    X: Training data
    y: Labels
    alpha: Model learning rate
    n_iter: Number of iterations
    __________________
    Output(s)
    params: Dictionary containing optimized coefficients
    """
    init_params = param_init(X)
    params = gradient_descent(X, y, init_params, alpha, n_iter)
    return params

Now, when we run this function we obtain the optimized weights from our training data, which we will use for inference on our test data. Next, we need to create a predict function for our inference that uses the stored weights.

def predict(X_test, params):
    """
    Make predictions with the trained Linear Regression parameters
    __________________
    Input(s)
    X_test: Unseen data
    params: Dictionary containing optimized weights from training
    __________________
    Output(s)
    y_preds: Predictions of the model
    """
    y_preds = np.dot(X_test, params["W"]) + params["b"]
    return y_preds

Great! Let’s run these functions and plot them to see what happens…

params = train(X_train, y_train) # train model
y_preds = predict(X_test, params) # inference
plt.scatter(X_test, y_test)
plt.plot(X_test, y_preds, color="red")
plt.title("Predictions Dummy Regression Data")
plt.xlabel("X axis")
plt.ylabel("Y axis")

plt.show()


Figure 3: Predictions from custom linear regression model.

Our line of best fit looks pretty decent. To be entirely sure of our implementation, we are fortunate enough to have many Machine Learning libraries, with optimized code, to compare against. To compare, I am simply going to check the RMSE of our implementation against Scikit-Learn's.

Note: It may also be worthwhile to plot their line of best fit on the same plot; if it covers yours, then you are on the right track.
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)
sklearn_y_preds = lin_reg.predict(X_test)
print(f"My implementation: {np.sqrt(mean_squared_error(y_test, y_preds))}\nSklearn implementation: {np.sqrt(mean_squared_error(y_test, sklearn_y_preds))}")

>>>> My implementation: 20.986105292320207
Sklearn implementation: 20.986105292320207

It’s a match!

For this implementation we used procedural programming, which can become quite complex and cumbersome to maintain, and which does not maximize the potential of Python as an Object-Oriented Programming (OOP) language. On that note, here is an OOP implementation of our Linear Regression model.

class LinReg():
    """
    Custom-made Linear Regression class
    """
    def __init__(self, alpha=0.01, n_iter=1000):
        self.alpha = alpha
        self.n_iter = n_iter
        self.params = {}

    def param_init(self, X_train):
        """
        Initialize parameters for the linear regression model
        __________________
        Input(s)
        X_train: Training data
        """
        _, n_features = X_train.shape  # shape of training data

        # initializing coefficients to 0
        self.params["W"] = np.zeros(n_features)
        self.params["b"] = 0
        return self


    def gradient_descent(self, X_train, y_train):
        """
        Gradient descent to minimize the cost function
        __________________
        Input(s)
        X_train: Training data
        y_train: Labels
        __________________
        Output(s)
        self: Object with optimized coefficients stored in self.params
        """
        W = self.params["W"]
        b = self.params["b"]
        m = X_train.shape[0]  # number of training instances

        for _ in range(self.n_iter):
            # prediction with current weights
            y_pred = np.dot(X_train, W) + b
            # taking the partial derivatives of the coefficients
            dW = (2/m) * np.dot(X_train.T, (y_pred - y_train))
            db = (2/m) * np.sum(y_pred - y_train)
            # updates to coefficients
            W -= self.alpha * dW
            b -= self.alpha * db

        self.params["W"] = W
        self.params["b"] = b
        return self

    def train(self, X_train, y_train):
        """
        Train the Linear Regression model with Gradient Descent
        __________________
        Input(s)
        X_train: Training data
        y_train: Labels
        __________________
        Output(s)
        self: Object with optimized coefficients stored in self.params
        """
        self.param_init(X_train)
        self.gradient_descent(X_train, y_train)
        return self

    def predict(self, X_test):
        """
        Make predictions with the trained Linear Regression parameters
        __________________
        Input(s)
        X_test: Unseen data
        __________________
        Output(s)
        y_preds: Predictions of the model
        """
        y_preds = np.dot(X_test, self.params["W"]) + self.params["b"]
        return y_preds

Let's instantiate this class and use it to make predictions on our test data…

linreg = LinReg()
linreg.train(X_train, y_train)
linreg.predict(X_test)

>>>>
array([   4.73888182,  -90.06369632,   80.39799712,   66.76983607,
        -49.97207144,   93.77905208,   34.30778991,  -38.2209702 ,
         78.03331698,   53.81416352,  102.96993005,  151.71946744,
         95.52801857,  104.82707085,   98.0492089 ,   45.05150211,
         -7.29917923,  -78.41675446,  -27.14118529,  -98.52923336,
        170.75840972, -106.22126739,   24.86194847,  -21.39127805,
         50.24074837])

As a sanity check, we will test whether the predictions are the same as those of our procedural implementation (which we already know matches the scikit-learn implementation).

linreg.predict(X_test) == y_preds

>>>>
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True])

And there you have it!

Note: The problem we used as an example is a simple linear regression problem (univariate linear regression), y = Wx + b, where W is the weight and b is the bias.

Assumptions

Before deciding that we want to use Linear Regression, it is important to understand the assumptions the model makes about our data, so that we can perform the necessary feature engineering to align with them.

For more on feature engineering, see my previous story on this topic…

Linear Relationship — Linear regression only captures linear relationships, and therefore assumes that there is a linear relationship between the features and the target we want to predict. We can check the validity of this by plotting scatter graphs of the features against the target variable.
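As a rough sketch of this check (assuming the features live in a 2D NumPy array X and the target in a vector y, as in the generated dataset above), we could plot each feature against the target:

# scatter each feature against the target to eyeball linearity
for i in range(X.shape[1]):
    plt.scatter(X[:, i], y)
    plt.xlabel(f"feature {i}")
    plt.ylabel("target")
    plt.show()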

Multicollinearity — Refers to a situation in which two or more explanatory variables in a multiple regression model are highly related (Source: Wikipedia). When we implement Linear Regression we are assuming that there is little to no multicollinearity since when present it weakens the statistical power of the regression model. To identify whether multicollinearity is present in our data we can use a correlation matrix to see which features are highly correlated and remove one of the highly correlated features.
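A minimal sketch of this check with NumPy (assuming X is a feature matrix with several columns; the 0.9 threshold is an arbitrary choice for illustration):

# correlation matrix between features (rowvar=False treats columns as variables)
corr = np.corrcoef(X, rowvar=False)
print(corr)

# flag pairs of distinct features whose absolute correlation exceeds the threshold
high = np.argwhere(np.abs(corr) > 0.9)
print([(i, j) for i, j in high if i < j])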

Homoscedasticity — In statistics, a sequence (or a vector) of random variables is homoscedastic if all its random variables have the same finite variance (Source: Wikipedia). Linear Regression assumes that the noise (also referred to as the error term or random disturbance) is the same across all observations and does not depend on the values of the independent variables. We may use a scatter plot of the residual values against the predicted values to check for homoscedasticity. We should expect no patterns in the distribution, because if there is a pattern then the data is heteroscedastic.
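Using the variables from the example above (y_test and y_preds), a minimal sketch of this residuals-versus-predictions check could be:

# residuals should scatter evenly around zero, with no funnel or curve pattern
residuals = y_test - y_preds
plt.scatter(y_preds, residuals)
plt.axhline(0, color="red")
plt.xlabel("predicted values")
plt.ylabel("residuals")
plt.show()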

Multivariate Normality — We assume the variables we use for Linear Regression follow the Normal Distribution. We can check this using a histogram or a QQ-plot, and use a non-linear transformation (e.g. a log-transformation) to change the distribution of the data to fit our assumption.
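A quick sketch of the histogram check, together with an illustrative log-transformation (a log only makes sense for strictly positive values, so the shift below is purely for demonstration on our generated data):

# histogram of the target to eyeball its distribution
plt.hist(y, bins=30)
plt.show()

# illustrative log-transform; our generated y can be negative, so shift it first
y_shifted = y - y.min() + 1
plt.hist(np.log(y_shifted), bins=30)
plt.show()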

Autocorrelation — The correlation of a signal with a delayed copy of itself as a function of delay; informally, it is the similarity between observations as a function of the time lag between them (Source: Wikipedia). To put it simply, autocorrelation occurs when the residuals are not independent from each other. We may use a scatter plot to visually check for autocorrelation, or the Durbin-Watson test, which tests the null hypothesis that the residuals are not linearly autocorrelated.
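A short sketch of the Durbin-Watson check, assuming the statsmodels package is installed (it is not used elsewhere in this story); values near 2 suggest little autocorrelation, while values toward 0 or 4 suggest positive or negative autocorrelation:

from statsmodels.stats.stattools import durbin_watson

# Durbin-Watson statistic of the residuals from our model
residuals = y_test - y_preds
print(durbin_watson(residuals))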

Pros

  • Easy to implement
  • Easy to interpret the output coefficients
  • Less complex

Cons

  • Sensitive to Outliers
  • Assumes linear relationship between independent and dependent variables
  • Assumes independence between features

For more about Linear Regression you can read the Wikipedia page.

Wrap Up

Linear Regression is one of the simplest Machine Learning models and most likely the first you will (and should) learn. It is a very useful tool for analyzing the relationship between variables, but due to the many assumptions the model makes and its simplification of real-world problems, it is not recommended for most practical applications.

Thank you for taking the time to go through this story. If there is anything that I have missed or you’d like me to clarify, leave a response in the comments. Additionally, If you’d like to get in contact with me, I am most accessible on LinkedIn.

