
An “Equation-to-Code” Machine Learning Project Walk-Through — Part 4 Regularization


A detailed explanation of how to implement regularization from scratch in Python


Hi, everyone! This is “Equation-to-Code” walk-through part 4, the final one in this series.

In the previous articles, we talked about the linearly separable problem in part 1, the non-linearly separable problem in part 2, and stochastic gradient descent (SGD) in part 3. Just like the other parts, part 4 is self-contained, so you can read it without the previous articles.

In part 4, we will talk about how to implement regularization for a regression problem, which can make our model more robust.

Here is the complete code: regression_without_regularization.py and regression_with_regularization.py.

The content is structured as follows.

  1. Regularization
  2. Fake some data samples
  3. Preprocessing
  4. Implementation without regularization
  5. Implementation with regularization
  6. Summary

1 Regularization

If our model is too complicated, it will fit the training data very well but fail on new data. We call this kind of problem overfitting.

[Figure: models ranging from too simple to too complicated, showing underfitting, a good fit, and overfitting (from ISCG8025)]

In order to get a model that does not fit the training data too closely (the middle one in the figure above), we usually use techniques to avoid overfitting, such as cross-validation, dropout, batch normalization, and so on.

This time, we will talk about the L2 regularization term, which is widely used in machine learning models.

2 Fake some data samples

We use the polynomial function below to fake some data samples.

y = 0.1(x^3 + x^2 + x)

To make the data more realistic, we add some noise to it, as you can see in the code.

import numpy as np
import matplotlib.pyplot as plt

# random seed for reproducibility
np.random.seed(0)

# the true function we want to model
def g(x):
    return 0.1 * (x + x**2 + x**3)

# fake training data: the true function plus some noise
train_x = np.linspace(-2, 2, 8)
train_y = g(train_x) + np.random.randn(len(train_x)) * 0.05

# plot the noisy samples and the true function
x = np.linspace(-2, 2, 100)
plt.plot(train_x, train_y, 'o')
plt.plot(x, g(x), linestyle='dashed')
plt.ylim(-1, 2)
plt.show()
[Figure: the noisy training samples (dots) and the true function (dashed line)]

The dashed line is the true function we want to model.

3 Preprocessing

In step 1, we said that regularization is needed when models are too complicated. For example, the true function above is a polynomial function of degree 3.

y = 0.1(x^3 + x^2 + x)
a polynomial function of degree 3

But if we choose a polynomial function of degree 10, the model is likely to be too complicated.

f_θ(x) = θ_0 + θ_1 x + θ_2 x^2 + … + θ_10 x^10
a polynomial function of degree 10

Because we have 10 degree terms and one bias term, we have 11 parameters in total.

θ = (θ_0, θ_1, …, θ_10),   x = (1, x, x^2, …, x^10),   f_θ(x) = θ · x

We implement this to simulate the more complicated model.

import numpy as np
import matplotlib.pyplot as plt

# random seed for reproducibility
np.random.seed(0)

# the true function we want to model
def g(x):
    return 0.1 * (x + x**2 + x**3)

# fake training data: the true function plus some noise
train_x = np.linspace(-2, 2, 8)
train_y = g(train_x) + np.random.randn(len(train_x)) * 0.05

# standardization
mu = train_x.mean()
std = train_x.std()
def standardizer(x):
    return (x - mu) / std
std_x = standardizer(train_x)

# build the design matrix: one column per power of x, from x^0 (bias) to x^10
def to_matrix(x):
    return np.vstack([
        np.ones(x.size),
        x,
        x ** 2,
        x ** 3,
        x ** 4,
        x ** 5,
        x ** 6,
        x ** 7,
        x ** 8,
        x ** 9,
        x ** 10,
    ]).T
mat_x = to_matrix(std_x)

# initialize parameters, one per column of the design matrix
theta = np.random.randn(mat_x.shape[1])

# prediction function
def f(x):
    return np.dot(x, theta)
  • standardization: first we standardize our data
  • get matrix: we put the data in matrix form for matrix operations, which simulates the polynomial function of degree 10 (a quick shape check follows below)
  • initialize parameter: initialize the parameters according to the number of columns of the input matrix
  • predict function: this is our prediction function, just as in the equation above.
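
As a quick sanity check (not part of the original scripts), we can print the shapes produced by the preprocessing step; the variable names follow the snippet above.

print(mat_x.shape)     # (8, 11): 8 samples, 10 powers of x plus the bias column
print(theta.shape)     # (11,): one parameter per column of mat_x
print(f(mat_x).shape)  # (8,): one prediction per training sample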

4 Implementation without regularization

We use a squared-error cost function (half the sum of squared errors; minimizing it is equivalent to minimizing the mean squared error, MSE).

E(θ) = (1/2) Σ_i ( y_i − f_θ(x_i) )^2
# cost function
def E(x, y):
    return 0.5 * np.sum((y - f(x))**2)

# initialize error
error = E(mat_x, train_y)

We use gradient descent to update parameters.

θ_j := θ_j − η Σ_i ( f_θ(x_i) − y_i ) x_i^(j)

where η is the learning rate and x_i^(j) is the j-th element of the i-th input vector (the j-th power term).

A NumPy array-like version might be easier to understand. Here I write out only the first three parameters so the equations stay readable.

θ_0 := θ_0 − η Σ_i ( f_θ(x_i) − y_i ) · 1
θ_1 := θ_1 − η Σ_i ( f_θ(x_i) − y_i ) · x_i
θ_2 := θ_2 − η Σ_i ( f_θ(x_i) − y_i ) · x_i^2

The code

# learning rate
ETA = 1e-4

# update parameters, where epoch is the number of update iterations
for _ in range(epoch):
    theta = theta - ETA * np.dot(f(mat_x) - train_y, mat_x)
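
To connect the vectorized update with the per-parameter equations above, here is a minimal sketch (not from the original article) that computes the same gradient with an explicit loop over parameters; it assumes mat_x, train_y, theta, f, and ETA from the snippets above.

# per-parameter updates, equivalent to the vectorized np.dot version
residual = f(mat_x) - train_y            # shape (8,)
new_theta = theta.copy()
for j in range(mat_x.shape[1]):
    # gradient for parameter j: sum over samples of residual times the j-th feature
    grad_j = np.sum(residual * mat_x[:, j])
    new_theta[j] = theta[j] - ETA * grad_j
# new_theta matches theta - ETA * np.dot(residual, mat_x)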

Putting the code together:

# learning rate
ETA = 1e-4

# initialize the difference in error between two epochs
diff = 1

######## training without regularization ########
while diff > 1e-6:
    # mat_x has shape (8, 11); f(mat_x) - train_y has shape (8,)
    theta = theta - ETA * np.dot(f(mat_x) - train_y, mat_x)
    current_error = E(mat_x, train_y)
    diff = error - current_error
    error = current_error

# save parameters
theta1 = theta

########## plot line ##########
plt.ylim(-1, 2)
plt.plot(std_x, train_y, 'o')
z = standardizer(np.linspace(-2, 2, 100))

# plot the line learned without regularization
theta = theta1
plt.plot(z, f(to_matrix(z)), linestyle='dashed')
plt.show()

We can see what we have learned.

[Figure: the model line learned without regularization (dashed), plotted against the training samples]

Here is the complete code, regression_without_regularization.py

5 Implementation with regularization

The L2 regularization term looks like this:

R(θ) = (λ/2) Σ_{j=1}^{m} θ_j^2

where m is the number of non-bias parameters (here m = 10) and λ controls the strength of the regularization.

And we combine the cost function and the regularization term together.

E(θ) = (1/2) Σ_i ( y_i − f_θ(x_i) )^2 + R(θ)
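
If you want to track this combined objective in code, a minimal sketch could look like the function below; it assumes E from earlier and the regularization parameter LAMBDA defined in the training code further down (the original scripts keep E unchanged for the convergence check).

# a sketch of the regularized cost; the bias parameter theta[0] is excluded from the penalty
def E_reg(x, y):
    return E(x, y) + 0.5 * LAMBDA * np.sum(theta[1:] ** 2)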

Because we added a regularization term, we also need to change the update equation accordingly.

θ_0 := θ_0 − η Σ_i ( f_θ(x_i) − y_i ) x_i^(0)
θ_j := θ_j − η ( Σ_i ( f_θ(x_i) − y_i ) x_i^(j) + λ θ_j )   for j ≥ 1

Notice that we don’t use lambda to update the bias parameter theta 0.
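
In the code below, this is done with np.hstack([0, theta[1:]]), which zeroes out the penalty for the bias term. A tiny illustration (the array values here are made up):

theta_demo = np.array([5.0, 2.0, -3.0])
print(np.hstack([0, theta_demo[1:]]))   # [ 0.  2. -3.] -- no penalty on the first parameter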

The code

# regularization parameter
LAMBDA = 1

# initialize difference between two epochs
diff = 1

# initialize error
error = E(mat_x, train_y)

######## training with regularization ########
while diff > 1e-6:
    # notice we don't regularize the bias parameter theta[0]
    reg_term = LAMBDA * np.hstack([0, theta[1:]])
    # update parameters
    theta = theta - ETA * (np.dot(mat_x.T, f(mat_x) - train_y) + reg_term)
    current_error = E(mat_x, train_y)
    diff = error - current_error
    error = current_error

# save parameters
theta2 = theta

########## plot the line with regularization ##########
plt.ylim(-1, 2)
plt.plot(std_x, train_y, 'o')
z = standardizer(np.linspace(-2, 2, 100))
theta = theta2
plt.plot(z, f(to_matrix(z)))
plt.show()

The model looks like this.

[Figure: the model line learned with regularization, plotted against the training samples]

We can see that after adding regularization, the model line becomes smoother and looks more like the original degree-3 line.
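
As a quick numerical check (not in the original article), you can also compare the magnitudes of the two learned parameter vectors; assuming theta1 and theta2 were saved as in the snippets above, L2 regularization should shrink the non-bias weights.

# compare parameter magnitudes with and without regularization
print(np.linalg.norm(theta1[1:]))   # without regularization
print(np.linalg.norm(theta2[1:]))   # with regularization: expected to be smaller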

Here is the complete code, regression_with_regularization.py

6 Summary

This is the final article of the “Equation-to-Code” walk-through project. I hope the series has been helpful. Leave a comment to let me know whether the articles are easy to understand. Thanks for reading.

