An “Equation-to-Code” Machine Learning Project Walk-Through — Part 4: Regularization
Detailed explanation to implement regularization from scratch in Python
Hi, everyone! This is “Equation-to-Code” walk-through part 4, the final one in this series.
In the previous articles, we talked about the linearly separable problem in part 1, the non-linearly separable problem in part 2, and stochastic gradient descent (SGD) in part 3. Just like the other parts, part 4 is self-contained, so you can skip the previous articles.
In part 4, we will talk about how to implement regularization for a regression problem, which can make our model more robust.
Here are the complete code files: regression_without_regularization.py and regression_with_regularization.py.
The content is structured as follows.
- Regularization
- Fake some data samples
- Preprocessing
- Implementation without regularization
- Implementation with regularization
- Summary
1 Regularization
If our model is too complicated, it will fit the training data very well but fail on new data. We call this kind of problem overfitting.
In order not to fit the training data too well (the middle one in the figure above), we usually use some techniques to avoid overfitting, such as cross-validation, dropout, batch normalization, and so on.
This time, we will talk about the L2 regularization term, which is widely used in most machine learning models.
2 Fake some data samples
We use the polynomial function below to fake some data samples: g(x) = 0.1(x + x² + x³).
In order to make the data more realistic, we add some noise to it. You can see this in the code.
import numpy as np
import matplotlib.pyplot as plt

# random seed to make the results reproducible
np.random.seed(0)

# the real model line
def g(x):
    return 0.1 * (x + x**2 + x**3)

# add noise to the model to fake data
train_x = np.linspace(-2, 2, 8)
train_y = g(train_x) + np.random.randn(len(train_x)) * 0.05

# plot
x = np.linspace(-2, 2, 100)
plt.plot(train_x, train_y, 'o')
plt.plot(x, g(x), linestyle='dashed')
plt.ylim(-1, 2)
plt.show()
The dashed line is the real curve we want to model.
3 Preprocessing
In step 1, we said that regularization is needed when a model is too complicated. For example, the real curve above is a polynomial function of degree 3.
But if we choose a polynomial function of degree 10, the model is likely to be too complicated: f(x) = θ0 + θ1·x + θ2·x² + … + θ10·x¹⁰.
Because we have 10 degree terms and one bias term, there are 11 parameters in total.
We implement this degree-10 model to simulate the complicated situation.
import numpy as np
import matplotlib.pyplot as plt

# random seed to make the results reproducible
np.random.seed(0)

# the real model line
def g(x):
    return 0.1 * (x + x**2 + x**3)

# add noise to the model to fake data
train_x = np.linspace(-2, 2, 8)
train_y = g(train_x) + np.random.randn(len(train_x)) * 0.05
# standardization
mu = train_x.mean()
std = train_x.std()
def standardizer(x):
    return (x - mu) / std
std_x = standardizer(train_x)
# get matrix
def to_matrix(x):
    return np.vstack([
        np.ones(x.size),
        x,
        x ** 2,
        x ** 3,
        x ** 4,
        x ** 5,
        x ** 6,
        x ** 7,
        x ** 8,
        x ** 9,
        x ** 10,
    ]).T
mat_x = to_matrix(std_x)
# initialize parameter
theta = np.random.randn(mat_x.shape[1])
# predict function
def f(x):
    return np.dot(x, theta)
- standardization: first we standardize the data
- get matrix: we put the data into matrix form so we can use matrix operations, which simulates the polynomial function of degree 10 (a quick shape check follows this list)
- initialize parameter: we initialize the parameters according to the number of columns of the input matrix
- predict function: this is our prediction function, just as the equation above.
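As a quick sanity check (this snippet is my addition, not part of the original scripts, and assumes the variables defined above), we can confirm the shapes: 8 samples, an 8 × 11 design matrix, and 11 parameters.

print(train_x.shape)   # (8,)  -> 8 training samples
print(mat_x.shape)     # (8, 11) -> bias column plus degrees 1..10
print(theta.shape)     # (11,) -> one parameter per column
print(f(mat_x).shape)  # (8,)  -> one prediction per sample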
4 Implementation without regularization
We use the mean squared error (MSE) as the cost function; in the code it is implemented as half of the sum of squared errors, E = 0.5 * Σ (y - f(x))².

# cost function
def E(x, y):
    return 0.5 * np.sum((y - f(x))**2)

# initialize error
error = E(mat_x, train_y)
We use gradient descent to update the parameters.
A NumPy array-like version might be easier to understand. Here I list only three parameters so the equation stays easy to read.
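Since the original equation images are not reproduced here, the update rule that the code below implements can be written as (reconstructed from the code, with η the learning rate and n the number of training samples):

$$\theta_j := \theta_j - \eta \sum_{i=1}^{n} \left( f_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$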
The code
# learning rate
ETA = 1e-4

# number of update iterations (any fixed count works for this illustration)
epoch = 1000

# update parameter
for _ in range(epoch):
    theta = theta - ETA * np.dot(f(mat_x) - train_y, mat_x)
Now we put the code together, stopping when the error barely changes between two epochs.
# learning rate
ETA = 1e-4

# initialize difference between two epochs
diff = 1

######## training without regularization ########
while diff > 1e-6:
    # mat_x: (8, 11)
    # f(mat_x) - train_y: (8,)
    theta = theta - ETA * np.dot(f(mat_x) - train_y, mat_x)

    current_error = E(mat_x, train_y)
    diff = error - current_error
    error = current_error

# save parameters
theta1 = theta

########## plot line ##########
plt.ylim(-1, 2)
plt.plot(std_x, train_y, 'o')
z = standardizer(np.linspace(-2, 2, 100))

# plot the line without regularization
theta = theta1
plt.plot(z, f(to_matrix(z)), linestyle='dashed')
plt.show()
We can see what the model has learned: without regularization, the degree-10 line bends to fit the noisy training points very closely.
Here is the complete code, regression_without_regularization.py
5 Implementation with regularization
The L2 regularization term looks like this.
And we combine the cost function and the regularization term together.
Because we add the regularization term, we also need to change the update equation accordingly.
Notice that we don't use lambda to update the bias parameter theta 0. The equations are written out below.
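Since the original equation images are not reproduced here, these are reconstructions consistent with the code that follows (λ is the regularization parameter, η the learning rate, n the number of samples, and m = 10 the polynomial degree):

$$R(\theta) = \frac{\lambda}{2} \sum_{j=1}^{m} \theta_j^2$$

$$E_{reg}(\theta) = \frac{1}{2} \sum_{i=1}^{n} \left( y^{(i)} - f_\theta(x^{(i)}) \right)^2 + R(\theta)$$

$$\theta_0 := \theta_0 - \eta \sum_{i=1}^{n} \left( f_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$$

$$\theta_j := \theta_j - \eta \left( \sum_{i=1}^{n} \left( f_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \lambda \theta_j \right) \quad (j \ge 1)$$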
The code
# regularization parameter
LAMBDA = 1

# initialize difference between two epochs
diff = 1

# initialize error
error = E(mat_x, train_y)

######## training with regularization ########
while diff > 1e-6:
    # notice we don't use regularization for theta 0
    reg_term = LAMBDA * np.hstack([0, theta[1:]])

    # update parameter
    theta = theta - ETA * (np.dot(mat_x.T, f(mat_x) - train_y) + reg_term)

    current_error = E(mat_x, train_y)
    diff = error - current_error
    error = current_error

# save parameters
theta2 = theta

########## plot the line with regularization ##########
plt.ylim(-1, 2)
plt.plot(std_x, train_y, 'o')
z = standardizer(np.linspace(-2, 2, 100))

theta = theta2
plt.plot(z, f(to_matrix(z)))
plt.show()
The model looks like this.
We can see that after adding regularization, the model line becomes smoother and looks more like the original degree-3 line.
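One simple way to see the effect numerically (this snippet is my addition, not in the original scripts, and reuses the theta1 and theta2 saved above) is to compare the size of the learned parameters and to plot both lines in one figure; L2 regularization shrinks the weights:

# compare the learned parameters: L2 regularization shrinks the weights
print(np.linalg.norm(theta1))  # without regularization, usually a larger norm
print(np.linalg.norm(theta2))  # with regularization, a smaller norm

# plot both lines on the same figure
plt.ylim(-1, 2)
plt.plot(std_x, train_y, 'o')
z = standardizer(np.linspace(-2, 2, 100))
theta = theta1
plt.plot(z, f(to_matrix(z)), linestyle='dashed', label='without regularization')
theta = theta2
plt.plot(z, f(to_matrix(z)), label='with regularization')
plt.legend()
plt.show()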
Here is the complete code, regression_with_regularization.py
6 Summary
This is the final article of the “Equation-to-Code” walk-through project. I hope these articles are helpful to you. Leave a comment to let me know whether they are easy to understand. Thanks for reading.