
An “Equation-to-Code” Machine Learning Project Walk-Through — Part 4 Regularization


A detailed explanation of how to implement regularization from scratch in Python


Hi, everyone! This is “Equation-to-Code” walk-through part 4, the final one in this series.

In the previous articles, we talked about the linearly separable problem in part 1, the non-linearly separable problem in part 2, and stochastic gradient descent (SGD) in part 3. Just like the other parts, part 4 is self-contained, so you can read it without the previous articles.

In part 4, we will talk about how to implement regularization for a regression problem, which can make our model more robust.

Here is the complete code: regression_without_regularization.py and regression_with_regularization.py.

The content is structured as follows.

  1. Regularization
  2. Fake some data samples
  3. Preprocessing
  4. Implementation without regularization
  5. Implementation with regularization
  6. Summary

1 Regularization

If our model is too complicated, it will fit the training data very well but fail on new data. We call this kind of problem overfitting.

[Figure: models ranging from too simple to too complicated, showing underfitting, a good fit, and overfitting (from ISCG8025)]

In order to get a model that does not fit the training data too closely (the middle one in the figure above), we usually use techniques to avoid overfitting, such as cross-validation, dropout, batch normalization, and so on.

This time, we will talk about the L2 regularization term, which is widely used in machine learning models.

2 Fake some data samples

We use the polynomial function below to fake some data samples.

y = 0.1(x^3 + x^2 + x)

To make the data more realistic, we add some noise to it, as you can see in the code.

import numpy as np
import matplotlib.pyplot as plt

# random seed for reproducibility
np.random.seed(0)

# the true function we want to model
def g(x):
    return 0.1 * (x + x**2 + x**3)

# fake training data: the true function plus some noise
train_x = np.linspace(-2, 2, 8)
train_y = g(train_x) + np.random.randn(len(train_x)) * 0.05

# plot the noisy samples and the true function
x = np.linspace(-2, 2, 100)
plt.plot(train_x, train_y, 'o')
plt.plot(x, g(x), linestyle='dashed')
plt.ylim(-1, 2)
plt.show()
[Figure: the noisy training samples (dots) and the true function (dashed line)]

The dashed line is the true function we want to model.

3 Preprocessing

In step 1, we said that regularization is needed when models are too complicated. For example, the true function above is a polynomial function of degree 3.

y = 0.1(x^3 + x^2 + x)
a polynomial function of degree 3

But if we choose a polynomial function of degree 10, the model is likely to be too complicated.

f_θ(x) = θ_0 + θ_1 x + θ_2 x^2 + … + θ_10 x^10
a polynomial function of degree 10

Because we have 10 degree terms and one bias term, we have 11 parameters in total.

θ = (θ_0, θ_1, …, θ_10),   x = (1, x, x^2, …, x^10),   f_θ(x) = θ · x

We implement this to simulate the more complicated model.

import numpy as np
import matplotlib.pyplot as plt

# random seed for reproducibility
np.random.seed(0)

# the true function we want to model
def g(x):
    return 0.1 * (x + x**2 + x**3)

# fake training data: the true function plus some noise
train_x = np.linspace(-2, 2, 8)
train_y = g(train_x) + np.random.randn(len(train_x)) * 0.05

# standardization
mu = train_x.mean()
std = train_x.std()
def standardizer(x):
    return (x - mu) / std
std_x = standardizer(train_x)

# build the design matrix: one column per power of x, from x^0 (bias) to x^10
def to_matrix(x):
    return np.vstack([
        np.ones(x.size),
        x,
        x ** 2,
        x ** 3,
        x ** 4,
        x ** 5,
        x ** 6,
        x ** 7,
        x ** 8,
        x ** 9,
        x ** 10,
    ]).T
mat_x = to_matrix(std_x)

# initialize parameters, one per column of the design matrix
theta = np.random.randn(mat_x.shape[1])

# prediction function
def f(x):
    return np.dot(x, theta)
  • standardization: first we standardize our data
  • get matrix: we put the data in matrix form for matrix operations, which simulates the polynomial function of degree 10 (a quick shape check follows below)
  • initialize parameter: initialize the parameters according to the number of columns of the input matrix
  • predict function: this is our prediction function, just as in the equation above.
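
As a quick sanity check (not part of the original scripts), we can print the shapes produced by the preprocessing step; the variable names follow the snippet above.

print(mat_x.shape)     # (8, 11): 8 samples, 10 powers of x plus the bias column
print(theta.shape)     # (11,): one parameter per column of mat_x
print(f(mat_x).shape)  # (8,): one prediction per training sample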

4 Implementation without regularization

We use a squared-error cost function (half the sum of squared errors; minimizing it is equivalent to minimizing the mean squared error, MSE).

E(θ) = (1/2) Σ_i ( y_i − f_θ(x_i) )^2
# cost function
def E(x, y):
    return 0.5 * np.sum((y - f(x))**2)

# initialize error
error = E(mat_x, train_y)

We use gradient descent to update parameters.

θ_j := θ_j − η Σ_i ( f_θ(x_i) − y_i ) x_i^(j)

where η is the learning rate and x_i^(j) is the j-th element of the i-th input vector (the j-th power term).

A NumPy array-like version might be easier to understand. Here I write out only the first three parameters so the equations stay readable.

θ_0 := θ_0 − η Σ_i ( f_θ(x_i) − y_i ) · 1
θ_1 := θ_1 − η Σ_i ( f_θ(x_i) − y_i ) · x_i
θ_2 := θ_2 − η Σ_i ( f_θ(x_i) − y_i ) · x_i^2

The code

# learning rate
ETA = 1e-4

# update parameters, where epoch is the number of update iterations
for _ in range(epoch):
    theta = theta - ETA * np.dot(f(mat_x) - train_y, mat_x)
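
To connect the vectorized update with the per-parameter equations above, here is a minimal sketch (not from the original article) that computes the same gradient with an explicit loop over parameters; it assumes mat_x, train_y, theta, f, and ETA from the snippets above.

# per-parameter updates, equivalent to the vectorized np.dot version
residual = f(mat_x) - train_y            # shape (8,)
new_theta = theta.copy()
for j in range(mat_x.shape[1]):
    # gradient for parameter j: sum over samples of residual times the j-th feature
    grad_j = np.sum(residual * mat_x[:, j])
    new_theta[j] = theta[j] - ETA * grad_j
# new_theta matches theta - ETA * np.dot(residual, mat_x)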

Putting the code together:

# learning rate
ETA = 1e-4

# initialize the difference in error between two epochs
diff = 1

######## training without regularization ########
while diff > 1e-6:
    # mat_x has shape (8, 11); f(mat_x) - train_y has shape (8,)
    theta = theta - ETA * np.dot(f(mat_x) - train_y, mat_x)
    current_error = E(mat_x, train_y)
    diff = error - current_error
    error = current_error

# save parameters
theta1 = theta

########## plot line ##########
plt.ylim(-1, 2)
plt.plot(std_x, train_y, 'o')
z = standardizer(np.linspace(-2, 2, 100))

# plot the line learned without regularization
theta = theta1
plt.plot(z, f(to_matrix(z)), linestyle='dashed')
plt.show()

We can see what we have learned.

[Figure: the model line learned without regularization (dashed), plotted against the training samples]

Here is the complete code, regression_without_regularization.py

5 Implementation with regularization

The L2 regularization term looks like this:

R(θ) = (λ/2) Σ_{j=1}^{m} θ_j^2

where m is the number of non-bias parameters (here m = 10) and λ controls the strength of the regularization.

And we combine the cost function and the regularization term together.

E(θ) = (1/2) Σ_i ( y_i − f_θ(x_i) )^2 + R(θ)
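
If you want to track this combined objective in code, a minimal sketch could look like the function below; it assumes E from earlier and the regularization parameter LAMBDA defined in the training code further down (the original scripts keep E unchanged for the convergence check).

# a sketch of the regularized cost; the bias parameter theta[0] is excluded from the penalty
def E_reg(x, y):
    return E(x, y) + 0.5 * LAMBDA * np.sum(theta[1:] ** 2)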

Because we added a regularization term, we also need to change the update equation accordingly.

θ_0 := θ_0 − η Σ_i ( f_θ(x_i) − y_i ) x_i^(0)
θ_j := θ_j − η ( Σ_i ( f_θ(x_i) − y_i ) x_i^(j) + λ θ_j )   for j ≥ 1

Notice that we don’t use lambda to update the bias parameter theta 0.
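
In the code below, this is done with np.hstack([0, theta[1:]]), which zeroes out the penalty for the bias term. A tiny illustration (the array values here are made up):

theta_demo = np.array([5.0, 2.0, -3.0])
print(np.hstack([0, theta_demo[1:]]))   # [ 0.  2. -3.] -- no penalty on the first parameter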

The code

# regularization parameter
LAMBDA = 1

# initialize difference between two epochs
diff = 1

# initialize error
error = E(mat_x, train_y)

######## training with regularization ########
while diff > 1e-6:
    # notice we don't regularize the bias parameter theta[0]
    reg_term = LAMBDA * np.hstack([0, theta[1:]])
    # update parameters
    theta = theta - ETA * (np.dot(mat_x.T, f(mat_x) - train_y) + reg_term)
    current_error = E(mat_x, train_y)
    diff = error - current_error
    error = current_error

# save parameters
theta2 = theta

########## plot the line with regularization ##########
plt.ylim(-1, 2)
plt.plot(std_x, train_y, 'o')
z = standardizer(np.linspace(-2, 2, 100))
theta = theta2
plt.plot(z, f(to_matrix(z)))
plt.show()

The model looks like this.

[Figure: the model line learned with regularization, plotted against the training samples]

We can see that after adding regularization, the model line becomes smoother and looks more like the original degree-3 line.
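
As a quick numerical check (not in the original article), you can also compare the magnitudes of the two learned parameter vectors; assuming theta1 and theta2 were saved as in the snippets above, L2 regularization should shrink the non-bias weights.

# compare parameter magnitudes with and without regularization
print(np.linalg.norm(theta1[1:]))   # without regularization
print(np.linalg.norm(theta2[1:]))   # with regularization: expected to be smaller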

Here is the complete code, regression_with_regularization.py

6 Summary

This is the final article of the “Equation-to-Code” walk-through project. I hope the series has been helpful. Leave a comment to let me know whether the articles are easy to understand. Thanks for reading.

