
Lasso, Ridge, Elastic Net Regression

Regression models that regularize coefficients

Photo by NeONBRAND on Unsplash

The linear regression model is the first and simplest form of regression most data scientists learn for predicting continuous outcomes. Linear regression makes the following assumptions:

a. Predictors and the target variable have a linear relationship

b. The residuals (errors) follow a normal distribution

c. Predictors are not highly correlated with each other (no multicollinearity)

The linear regression model works by deriving coefficient values that minimize a loss function such as Root Mean Square Error (RMSE). However, when coefficient values become very large, the model tends to overfit: it makes very accurate predictions on the training dataset but not-so-accurate predictions on the test and real-world data. Regularization addresses this predicament by penalizing large coefficients, which also reduces the impact of multicollinearity.

Lasso, Ridge, and Elastic Net are advanced regression techniques that use regularization to produce better predictions where simple linear regression does not work well.

In this blog, I will take you through these advanced regression techniques.

Data Source

We are going to use the house price dataset available on Kaggle. Download the source files from the Kaggle competition page.

This dataset contains 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa. The challenge is to predict the final price of each home. The data is available in .csv format; store these files on your local disk.

Data Exploration and Preprocessing

Let’s review the data and understand what we have. The code below loads the train and test datasets and prints the header rows of the train dataset.
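Here is a minimal sketch of this step, assuming the Kaggle files are saved locally as train.csv and test.csv:

import pandas as pd

# Assumed local paths to the Kaggle files
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Preview the first rows of the training data
print(train.head())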

Output:

[Header rows of the train dataset]

The following are key findings in the data:

a. It contains 80 feature variables, so there is a high chance of multicollinearity.

b. The Id feature is not useful.

c. SalePrice is the label (dependent variable).

As this blog is about exploring advanced regression techniques, I am not going to spend much time explaining the data pre-processing steps. The code below takes care of the required pre-processing (barring multicollinearity).
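Here is a minimal sketch of typical pre-processing for this dataset; the specific steps shown (log-transforming the target and skewed numeric features, one-hot encoding categorical columns, and mean-imputing missing values) and the skewness threshold are assumptions:

import numpy as np
from scipy.stats import skew

# Combine train and test features so both get identical encoding
all_data = pd.concat((train.drop(["Id", "SalePrice"], axis=1),
                      test.drop(["Id"], axis=1)))

# Log-transform the target to reduce skew
y = np.log1p(train["SalePrice"])

# Log-transform highly skewed numeric features (the 0.75 threshold is an assumption)
numeric_feats = all_data.dtypes[all_data.dtypes != "object"].index
skewed_feats = train[numeric_feats].apply(lambda x: skew(x.dropna()))
skewed_feats = skewed_feats[skewed_feats > 0.75].index
all_data[skewed_feats] = np.log1p(all_data[skewed_feats])

# One-hot encode categoricals and fill remaining missing values with column means
all_data = pd.get_dummies(all_data)
all_data = all_data.fillna(all_data.mean())

# Split back into the train and test feature matrices
X_train = all_data[:train.shape[0]]
X_test = all_data[train.shape[0]:]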

After data pre-processing, X_train and X_test contain 288 features.

X_train.shape

Output:

(1460, 288)

It’s not easy to handle 288 feature variables using simple linear regression approaches. We will use Multiple Linear Regression and compare its performance with advanced regression techniques like Lasso, Ridge, and Elastic Net, which can reduce the loss function, prioritize features, and get rid of multicollinearity. We will use the R2 score and Root Mean Square Error (RMSE) metrics to compare the performance of these regression techniques.

RMSE is a loss function. It measures the difference between the predicted value and the actual value.

RMSE = √( (1/n) · Σᵢ (yᵢ − ŷᵢ)² )

RMSE Equation

The R2 score measures the proportion of variance in the target that is explained by the model, i.e. how close the data are to the fitted regression line.
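For reference, the R2 score can be written as:

R² = 1 − Σᵢ (yᵢ − ŷᵢ)² / Σᵢ (yᵢ − ȳ)²

where ȳ is the mean of the actual values.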

Multiple Linear Regression

First, let’s use classical Multiple Linear Regression.
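A minimal sketch of this step, assuming scikit-learn's LinearRegression scored with 10-fold cross-validation; the variable name lr_model is an assumption (it is reused in the comparison code near the end of this post):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Plain multiple linear regression, scored with 10-fold cross-validation
lr_model = LinearRegression().fit(X_train, y)
print('R2 Score {:.3f}'.format(cross_val_score(lr_model, X_train, y, scoring="r2", cv=10).mean()))
print('RMSE Mean {:.3f}'.format(np.sqrt(-cross_val_score(lr_model, X_train, y, scoring="neg_mean_squared_error", cv=10)).mean()))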

Output:

R2 Score 0.722
RMSE Mean 0.171

LASSO Regression

LASSO stands for Least Absolute Shrinkage and Selection Operator. Lasso encourages a simple model by shrinking the coefficient estimates towards a central point, such as zero. It achieves this shrinkage by regularizing the coefficient values.

Lasso regression uses L1 regularization: it adds a penalty equal to the absolute value of the magnitude of the coefficients. Let me explain.

Below is the equation for multiple linear regression:

ŷ = β₀ + β₁x₁ + β₂x₂ + … + βₚxₚ

If we substitute this into the RMSE equation, it looks like this:

RMSE = √( (1/n) · Σᵢ (yᵢ − (β₀ + β₁x₁ᵢ + β₂x₂ᵢ + … + βₚxₚᵢ))² )

Regular linear regression techniques (i.e. simple and multiple linear regression) try to minimize RMSE to find the optimum regression line.

Lasso extends this loss function. It adds lambda (λ) multiplied by the sum of the absolute values of the coefficients. The loss function for Lasso is:

Loss = RMSE + λ · Σⱼ |βⱼ|

This is called L1 regularization. Minimizing this loss function means keeping the following term small:

λ · Σⱼ |βⱼ|

This term can be made small either by a low lambda value or by shrinking the coefficients, in other words by penalizing the coefficients.

Lasso reviews the features and penalizes those of low importance by shrinking their coefficients, in some cases all the way to zero. In other words, it also takes care of multicollinearity.
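A minimal sketch of this step, assuming scikit-learn's LassoCV with the same alpha grid used in the comparison code near the end of this post; the variable name model_lasso is reused below:

from sklearn.linear_model import LassoCV

# LassoCV selects the best alpha from the supplied grid via internal cross-validation
model_lasso = LassoCV(alphas=[1, 0.1, 0.001, 0.0005], cv=10).fit(X_train, y)
print('Lasso CV R2 Score {:.3f}'.format(cross_val_score(model_lasso, X_train, y, scoring="r2", cv=10).mean()))
print('Lasso CV RMSE Mean {:.3f}'.format(np.sqrt(-cross_val_score(model_lasso, X_train, y, scoring="neg_mean_squared_error", cv=10)).mean()))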

Output:

Lasso CV R2 Score 0.903
Lasso CV RMSE Mean 0.121

R2 score as well as RMSE has improved. Let’s understand how LASSO has achieved these improvements.

A. Feature selection: Did LASSO perform feature selection, reducing multicollinearity by penalizing correlated and less important features?

Let’s find out.
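A minimal sketch of this check, mirroring the Ridge version shown later in this post:

# Count how many coefficients Lasso shrank exactly to zero
coef = pd.Series(model_lasso.coef_, index=X_train.columns)
print("Lasso picked " + str(sum(coef != 0)) + " variables and eliminated the other " + str(sum(coef == 0)) + " variables")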

Output:

Lasso picked 110 variables and eliminated the other 178 variables

Very good! LASSO got rid of 178 of the 288 variables and considered only 110. Even among these 110 variables, it has penalized a few with low coefficient values. Let's find out the top 10 and bottom 10 feature variables.

import matplotlib
import matplotlib.pyplot as plt

# Plot the 10 most negative and 10 most positive Lasso coefficients
imp_coef = pd.concat([coef.sort_values().head(10),
                      coef.sort_values().tail(10)])
matplotlib.rcParams['figure.figsize'] = (8.0, 10.0)
imp_coef.plot(kind="barh")
plt.title("Coefficients")
plt.show()
[Bar chart of the 10 largest and 10 smallest Lasso coefficients]

Here you go!

Now you know the top 10 and bottom 10 feature variables LASSO considered while reducing the loss function value.

How about the distribution of residuals? Let’s visualize them.

matplotlib.rcParams['figure.figsize'] = (6.0, 6.0)
preds = pd.DataFrame({"preds": model_lasso.predict(X_train), "true": y})
preds["residuals"] = preds["true"] - preds["preds"]
preds.plot(x="preds", y="residuals", kind="scatter")
plt.show()
[Scatter plot of residuals vs. predicted values]

The spread of residual values looks good.

Ridge Regression

Ridge regression performs L2 regularization. It adds lambda (λ) multiplied by the sum of the squared coefficients. Below is the loss function for ridge regression:

Loss = RMSE + λ · Σⱼ βⱼ²

Let’s implement the Ridge technique on this dataset.

# Ridge Regression
from sklearn.linear_model import Ridge

model_ridge = Ridge(alpha=10).fit(X_train, y)
print('Ridge R2 Score {:.3f}'.format(cross_val_score(model_ridge, X_train, y, scoring="r2", cv=10).mean()))
print('Ridge RMSE Mean {:.3f}'.format(np.sqrt(-cross_val_score(model_ridge, X_train, y, scoring="neg_mean_squared_error", cv=10)).mean()))

Output:

Ridge R2 Score 0.897
Ridge RMSE Mean 0.125

There is an improvement compared to Multiple Linear Regression, but not compared to LASSO. Let's check the features it considered.

coef = pd.Series(model_ridge.coef_, index = X_train.columns)
print("Ridge picked " + str(sum(coef != 0)) + " variables and eliminated the other " + str(sum(coef == 0)) + " variables")

Output:

Ridge picked 288 variables and eliminated the other 0 variables

Unlike LASSO, ridge regression does not perform feature selection. It considered all 288 features. Maybe that's why it could not improve on LASSO in this case.

Even while considering all the feature variables, Ridge produced better results than Multiple Linear Regression because it regularized the loss function.

We used k-fold cross-validation with cv=10, which means the dataset was divided into 10 equal folds. Let's visualize the loss values calculated on these folds.

lr_loss = np.sqrt(-cross_val_score(lr_model, X_train, y, scoring="neg_mean_squared_error", cv=10))
lasso_loss = np.sqrt(-cross_val_score(LassoCV(alphas=[1, 0.1, 0.001, 0.0005], cv=10), X_train, y, scoring="neg_mean_squared_error", cv=10))
ridge_loss = np.sqrt(-cross_val_score(Ridge(alpha=10), X_train, y, scoring="neg_mean_squared_error", cv=10))

fig, ax = plt.subplots()
X_val = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ax.plot(X_val, lr_loss, color="blue", label='Linear Reg')
ax.plot(X_val, lasso_loss, color="red", label='LASSO')
ax.plot(X_val, ridge_loss, color="green", label='Ridge')
plt.xlabel('Fold')
plt.ylabel('RMSE')
ax.legend()
ax.grid(True)
plt.show()

Output:

[Line plot of per-fold RMSE for Linear Regression, LASSO, and Ridge]

It clearly shows that across all 10 folds the loss values for LASSO and Ridge stayed lower and more stable than for multiple linear regression.

This is the power of regularization!

Hope you found this blog useful! I look forward to your questions and comments.

References:

Machine Learning Hands-on Course

Simple Linear Regression: https://medium.com/analytics-vidhya/simple-linear-regression-and-fun-behind-it-df509c2a057

Multiple Linear Regression: https://medium.com/analytics-vidhya/multiple-linear-regression-7727a012ff93

Polynomial Linear Regression: https://medium.com/sanrusha-consultancy/polynomial-linear-regression-9d691a605aa0

KNN Regression: https://medium.com/sanrusha-consultancy/k-nearest-neighbor-knn-regression-and-fun-behind-it-7055cf50ae56

