Lasso, Ridge, Elastic Net Regression
source link: https://medium.com/analytics-vidhya/advanced-regression-techniques-lasso-ridge-elastic-net-df93699101d1
Regression models that regularize coefficients
As data scientists, the linear regression model is the first and simplest form of regression we learn for predicting continuous outcomes. Linear regression makes the following assumptions:
a. Predictors and target variables have a linear relationship
b. Residuals (errors) follow a normal distribution
c. Predictors are not correlated with each other
The linear regression model works by deriving coefficient values that minimize a loss function such as Root Mean Square Error (RMSE). However, when coefficient values grow very large, the model tends to overfit. Overfitting produces very accurate predictions on the training dataset but much weaker predictions on the test and real-world data. This predicament is solved by regularization, which penalizes large coefficients and reduces the impact of multicollinearity.
Lasso, Ridge, and Elastic Net are advanced regression techniques that use regularization to produce better predictions where simple linear regression doesn't work well.
In this blog, I will take you through these advanced regression techniques.
Data Source
We are going to use the house price dataset available on Kaggle (the House Prices: Advanced Regression Techniques competition). Get the source files from the competition page.
This data set contains 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa. The challenge is to predict the final price of each home. Data is available in .csv format. Store these files on your local disk.
Data Exploration and Preprocessing
Let's review the data and understand what we got. The below code will load the train and test data sets and print the header rows of the train data set.
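This step was shown as an image in the original post; a minimal sketch, assuming the Kaggle files are saved as train.csv and test.csv (hypothetical paths) on your local disk:

```python
import pandas as pd

def load_data(train_path="train.csv", test_path="test.csv"):
    """Read the Kaggle house-price train/test CSV files from disk."""
    train = pd.read_csv(train_path)
    test = pd.read_csv(test_path)
    return train, test

# train, test = load_data()
# print(train.head())
```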
Output:
The following are key findings in the data:
a. It contains 80 feature variables, so there is a high chance of multicollinearity.
b. The Id feature is not useful.
c. SalePrice is the label (dependent variable).
As this blog is about exploring advanced regression techniques, I am not spending much time explaining the data pre-processing steps. The below code takes care of the required data pre-processing (barring multicollinearity).
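One possible version of this pre-processing, following the approach commonly used for this dataset: log-transform the label and the skewed numeric features, one-hot encode the categoricals, and mean-impute missing values. The function name and the 0.75 skewness threshold are illustrative choices:

```python
import numpy as np
import pandas as pd
from scipy.stats import skew

def preprocess(train, test):
    """Log-transform the label and skewed numeric features, one-hot
    encode categoricals, and mean-impute missing values."""
    all_data = pd.concat((train.drop(columns=["Id", "SalePrice"]),
                          test.drop(columns=["Id"])), ignore_index=True)
    y = np.log1p(train["SalePrice"])                 # log-transform the label

    numeric = all_data.dtypes[all_data.dtypes != "object"].index
    skewed = all_data[numeric].apply(lambda s: skew(s.dropna()))
    skewed = skewed[skewed > 0.75].index
    all_data[skewed] = np.log1p(all_data[skewed])    # un-skew numeric features

    all_data = pd.get_dummies(all_data)              # one-hot encode categoricals
    all_data = all_data.fillna(all_data.mean())      # mean-impute missing values

    X_train = all_data.iloc[: len(train)]
    X_test = all_data.iloc[len(train):]
    return X_train, X_test, y

# X_train, X_test, y = preprocess(train, test)
```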
After data pre-processing, X_train and X_test contain 288 features.
X_train.shape
Output:
(1460, 288)
It’s not easy to handle 288 feature variables using Simple Linear Regression approaches. We will use Multiple Linear Regression and compare its performance with advanced regression techniques like Lasso, Ridge, and Elastic Net that can reduce the loss function, prioritize the features, and get rid of multicollinearity. We will use R2 Score and Root Mean Square Error (RMSE) metrics to compare the performance of these regression techniques.
RMSE is a loss function: it measures the average difference between predicted values and actual values.
The R2 score measures how much of the variance in the target is explained by the model.
Multiple Linear Regression
First, let’s use classical Multiple Linear Regression.
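A sketch of this step, assuming X_train and y come from the pre-processing above; `evaluate` is a helper name introduced here for reuse with the other models:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def evaluate(model, X, y, cv=10):
    """Return the mean cross-validated R2 and RMSE for an estimator."""
    r2 = cross_val_score(model, X, y, scoring="r2", cv=cv).mean()
    rmse = np.sqrt(-cross_val_score(model, X, y,
                                    scoring="neg_mean_squared_error", cv=cv)).mean()
    return r2, rmse

# lr_model = LinearRegression().fit(X_train, y)
# r2, rmse = evaluate(lr_model, X_train, y)
# print('R2 Score {:.3f}'.format(r2))
# print('RMSE Mean {:.3f}'.format(rmse))
```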
Output:
R2 Score 0.722
RMSE Mean 0.171
LASSO Regression
LASSO stands for Least Absolute Shrinkage and Selection Operator. Lasso encourages a simple model by shrinking coefficient estimates towards a central point, such as zero. It achieves this shrinkage by regularizing the coefficient values.
Lasso regression focuses on L1 regularization. It adds a penalty equal to the absolute value of the magnitude of coefficients. Let me explain.
Below is the equation for multiple linear regression
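Reconstructed in standard notation, where ŷ is the prediction, the x_j are the predictors, and the β_j their coefficients:

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p
```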
If we put this in the RMSE equation, it will look like below.
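Substituting the prediction into RMSE over n observations gives:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^2}
```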
Regular linear regression techniques (i.e., simple and multiple) try to reduce RMSE to get an optimum regression trend line.
Lasso extends this loss function by adding lambda (λ) times the sum of the absolute values of the coefficients. The loss function equation for Lasso is below.
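In its standard form, written with the residual sum of squares plus the penalty term:

```latex
\mathrm{Loss} = \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^2 + \lambda \sum_{j=1}^{p}\lvert\beta_j\rvert
```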
This is called L1 regularization. Minimizing this loss function means also minimizing the value below.
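That is, the L1 penalty term:

```latex
\lambda \sum_{j=1}^{p}\lvert\beta_j\rvert
```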
This part of the equation can be minimized either by a low lambda value or by reducing the coefficients, in other words by penalizing them.
Lasso reviews features and penalizes features of low importance by reducing their coefficient value. In other words, it also takes care of multicollinearity.
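A sketch of the fit, using scikit-learn's LassoCV with the same alpha grid that appears later in the cross-validation comparison; `lasso_scores` is a helper name introduced here:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

def lasso_scores(X, y, alphas=(1, 0.1, 0.001, 0.0005), cv=10):
    """Fit LassoCV over a small alpha grid and report the mean
    cross-validated R2 and RMSE of the selected model."""
    model = LassoCV(alphas=list(alphas), cv=cv).fit(X, y)
    r2 = cross_val_score(model, X, y, scoring="r2", cv=cv).mean()
    rmse = np.sqrt(-cross_val_score(model, X, y,
                                    scoring="neg_mean_squared_error", cv=cv)).mean()
    return model, r2, rmse

# model_lasso, r2, rmse = lasso_scores(X_train, y)
# print('Lasso CV R2 Score {:.3f}'.format(r2))
# print('Lasso CV RMSE Mean {:.3f}'.format(rmse))
```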
Output:
Lasso CV R2 Score 0.903
Lasso CV RMSE Mean 0.121
Both the R2 score and the RMSE have improved. Let's understand how LASSO achieved these improvements.
A. Feature selection: Did LASSO improve results through feature selection, by reducing multicollinearity and penalizing correlated and less important features?
Let’s find out.
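A sketch of the check, counting which coefficients the fitted Lasso model shrank to exactly zero; `count_selected` is a helper name introduced here:

```python
import pandas as pd

def count_selected(model, feature_names):
    """Report how many coefficients the fitted Lasso model kept
    versus shrank to exactly zero."""
    coef = pd.Series(model.coef_, index=feature_names)
    picked = int((coef != 0).sum())
    dropped = int((coef == 0).sum())
    print("Lasso picked " + str(picked) +
          " variables and eliminated the other " + str(dropped) + " variables")
    return coef

# coef = count_selected(model_lasso, X_train.columns)
```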
Output:
Lasso picked 110 variables and eliminated the other 178 variables
Very good! LASSO got rid of 178 variables out of 288 and considered only 110. Even among these 110 variables, it has penalized a few with low coefficient values. Let's find out the top 10 and bottom 10 feature variables.
import matplotlib
import matplotlib.pyplot as plt

imp_coef = pd.concat([coef.sort_values().head(10),
                      coef.sort_values().tail(10)])
matplotlib.rcParams['figure.figsize'] = (8.0, 10.0)
imp_coef.plot(kind="barh")
plt.title("Coefficients")
plt.show()
Here you go!
Now you know the top 10 and bottom 10 feature variables LASSO considered while reducing the loss function value.
How about the distribution of residuals? Let’s visualize them.
matplotlib.rcParams['figure.figsize'] = (6.0, 6.0)
preds = pd.DataFrame({"preds": model_lasso.predict(X_train), "true": y})
preds["residuals"] = preds["true"] - preds["preds"]
preds.plot(x="preds", y="residuals", kind="scatter")
plt.show()
The spread of residual values looks good.
Ridge Regression
Ridge regression performs L2 regularization. Below is the loss function formula for ridge regression
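Written in the same form as the Lasso loss, with the squared (L2) penalty in place of the absolute values:

```latex
\mathrm{Loss} = \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Bigr)^2 + \lambda \sum_{j=1}^{p}\beta_j^2
```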
Let’s implement the Ridge technique on this dataset.
#Ridge Regression
from sklearn.linear_model import Ridge

model_ridge = Ridge(alpha=10).fit(X_train, y)
print('Ridge R2 Score {:.3f}'.format(cross_val_score(model_ridge, X_train, y, scoring="r2", cv=10).mean()))
print('Ridge RMSE Mean {:.3f}'.format(np.sqrt(-cross_val_score(model_ridge, X_train, y, scoring="neg_mean_squared_error", cv=10)).mean()))
Output:
Ridge R2 Score 0.897
Ridge RMSE Mean 0.125
There is an improvement compared to Multiple Linear Regression, but no improvement compared to LASSO. Let's check which features it considered.
coef = pd.Series(model_ridge.coef_, index = X_train.columns)
print("Ridge picked " + str(sum(coef != 0)) + " variables and eliminated the other " + str(sum(coef == 0)) + " variables")
Output:
Ridge picked 288 variables and eliminated the other 0 variables
Unlike LASSO, ridge regression doesn't perform feature selection. It considered all 288 features. Maybe that's why it could not improve on LASSO in this case.
Even while considering all the feature variables, Ridge brought better results compared to Multiple Linear Regression, because it regularized the loss function.
We used K-fold cross-validation with cv=10, meaning the dataset was divided into 10 equal folds. Let's visualize the range of the loss function calculated on these folds.
lr_loss = np.sqrt(-cross_val_score(lr_model, X_train, y, scoring="neg_mean_squared_error", cv=10))
lasso_loss = np.sqrt(-cross_val_score(LassoCV(alphas=[1, 0.1, 0.001, 0.0005], cv=10), X_train, y, scoring="neg_mean_squared_error", cv=10))
ridge_loss = np.sqrt(-cross_val_score(Ridge(alpha=10), X_train, y, scoring="neg_mean_squared_error", cv=10))

fig, ax = plt.subplots()
X_val = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ax.plot(X_val, lr_loss, color="blue", label='Linear Reg')
ax.plot(X_val, lasso_loss, color="red", label='LASSO')
ax.plot(X_val, ridge_loss, color="green", label='Ridge')
plt.xlabel('Fold')
plt.ylabel('RMSE')
ax.legend()
ax.grid(True)
plt.show()
Output:
It clearly shows that across all 10 folds the loss values for LASSO and Ridge remained lower and more stable than for multiple linear regression.
This is the power of regularization!
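The title also mentions Elastic Net, which blends the L1 and L2 penalties. A minimal sketch with scikit-learn's ElasticNetCV, assuming the same X_train and y; the l1_ratio and alpha grids here are illustrative choices, not tuned values:

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import cross_val_score

def elastic_net_scores(X, y, cv=10):
    """Fit ElasticNetCV, which mixes the L1 and L2 penalties via
    l1_ratio, and report mean cross-validated R2 and RMSE."""
    model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0],
                         alphas=[1, 0.1, 0.001, 0.0005], cv=cv).fit(X, y)
    r2 = cross_val_score(model, X, y, scoring="r2", cv=cv).mean()
    rmse = np.sqrt(-cross_val_score(model, X, y,
                                    scoring="neg_mean_squared_error", cv=cv)).mean()
    return model, r2, rmse

# model_enet, r2, rmse = elastic_net_scores(X_train, y)
```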
I hope you found this blog useful! I look forward to your questions and comments.