
Gradient Descent With RMSProp from Scratch

source link: https://www.geeksforgeeks.org/gradient-descent-with-rmsprop-from-scratch/

Gradient descent is an optimization algorithm that finds the set of parameters (coefficients) of a function which minimizes a cost function. It works by computing the partial derivatives of the cost function with respect to each coefficient and then repeatedly taking small steps in the direction of the negative gradient, reducing the cost until it reaches a local (or global) minimum. Optimizers are methods that build on this idea: they adjust a model's parameters and weights to minimize the loss function, improving both model accuracy and training speed. One such optimization technique is RMSProp.
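
To keep the baseline in mind, here is a minimal sketch of plain gradient descent on a one-variable function. This snippet is an added illustration and is not part of the original article; the function f(x) = x^2 and the hyperparameter values are arbitrary choices.

  • Python3
# Minimal sketch of plain gradient descent (illustrative example, not from the original article)
def f(x):
    # Simple convex objective: f(x) = x^2
    return x ** 2

def grad_f(x):
    # Derivative of f: f'(x) = 2x
    return 2 * x

x = 4.0               # assumed starting point
learning_rate = 0.1   # assumed (fixed) step size

for _ in range(50):
    # Move a small step in the direction of the negative gradient
    x = x - learning_rate * grad_f(x)

print('x after 50 steps:', x)  # x ends up close to 0, the minimizer of f

Note that the step size here is a single fixed constant for every parameter and every iteration; RMSProp, discussed next, adapts it per parameter.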

RMSProp (Root Mean Squared Propagation) is an adaptive learning rate optimization algorithm. It addresses a weakness of the earlier Adaptive Gradient Algorithm (AdaGrad), whose accumulation of all past squared gradients causes the learning rate to shrink monotonically and eventually stall training. Instead of accumulating every squared gradient, RMSProp keeps an exponentially decaying average of them and divides the learning rate for each parameter by the square root of this average. Parameters with consistently large gradients therefore take smaller steps, while parameters with small gradients take relatively larger ones. In this way, RMSProp smoothly adapts the learning rate for each parameter in the network, typically converging faster and more reliably than plain gradient descent with a single fixed learning rate.

The RMSprop algorithm uses an exponentially weighted moving average of squared gradients to scale the parameter updates. The update procedure is as follows:

  1. Initialize parameters:
    • Learning rate: α
    • Exponential decay rate for averaging: γ
    • Small constant for numerical stability: ε
    • Initial parameter values: θ
  2. Initialize accumulated gradients (Exponentially weighted average):
    • Accumulated squared gradient for each parameter: E_0 = 0
  3. Repeat until convergence or maximum iterations:
    • Compute the gradient of the objective function with respect to the parameters: g_t = \nabla_\theta J(\theta_t)
    • Update the exponentially weighted average of the squared gradients: E_t = \gamma E_{t-1} + (1-\gamma) g_t^2
    • Update the parameters: \theta_{t+1} = \theta_t - \alpha \frac{g_t}{\sqrt{E_t + \epsilon}}

where,

  • g_t is the gradient of the loss function with respect to the parameters at time t
  • γ is the decay factor of the moving average
  • E_t is the exponentially weighted average of the squared gradients
  • α is the learning rate
  • ϵ is a small constant to prevent division by zero

This process is repeated for each parameter in the optimization problem, and it helps adjust the learning rate for each parameter based on the historical gradients. The exponential moving average allows the algorithm to give more importance to recent gradients and dampen the effect of older gradients, providing stability during optimization.
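
To make the update concrete, consider the first step of the two-variable example used later in this article, with α = 0.1, γ = 0.9, ε = 1e-8 and a starting point of x1 = -4 (this numeric walk-through is an added illustration, not part of the original article). The gradient with respect to x1 is g_1 = 10 × (-4) = -40, so

E_1 = 0.9 × 0 + 0.1 × (-40)^2 = 160

x1 ← -4 - 0.1 × (-40) / √(160 + ε) ≈ -4 + 0.316 = -3.684

Because E_1 is dominated by g_1^2 itself, the very first step has size roughly α / √(1 - γ) ≈ 0.316 regardless of how large the raw gradient is; this normalization is what makes RMSProp insensitive to poorly scaled gradients.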

Implementation

Now, we will look into the implementation of the RMSprop. We will first import all the necessary libraries as follows.

  • Python3
# Importing libraries
import numpy as np
import matplotlib.pyplot as plt
from numpy import arange, meshgrid

Now, we will define our objective function and its derivatives. For this article we are considering the objective function to be

5 x_1^2 + 7 x_2^2

where x1 and x2 are variables.

  • Python3
# Defining the objective function
def objective(x1, x2):
    # Replace with your objective function
    return 5 * x1**2.0 + 7 * x2**2.0

# Defining the derivative of the objective function w.r.t x1
def derivative_x1(x1, x2):
    # Replace with the derivative of your objective function w.r.t x1
    return 10.0 * x1

# Defining the derivative of the objective function w.r.t x2
def derivative_x2(x1, x2):
    # Replace with the derivative of your objective function w.r.t x2
    return 14.0 * x2
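
As a quick sanity check (not part of the original article), you can verify the hand-written derivatives against a central finite-difference approximation; the helper name check_gradients and the test point are illustrative choices.

  • Python3
# Sanity check: compare analytical derivatives with finite differences
# (illustrative addition, not from the original article)
def check_gradients(x1, x2, h=1e-5):
    # Central finite-difference approximations of the partial derivatives
    fd_x1 = (objective(x1 + h, x2) - objective(x1 - h, x2)) / (2 * h)
    fd_x2 = (objective(x1, x2 + h) - objective(x1, x2 - h)) / (2 * h)
    print('d/dx1: analytical =', derivative_x1(x1, x2), ', finite difference =', fd_x1)
    print('d/dx2: analytical =', derivative_x2(x1, x2), ', finite difference =', fd_x2)

check_gradients(-4.0, 3.0)
# Both pairs should agree closely (-40 and 42 at this point)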

Now, let us visualize this function. We will plot its 3D surface over x1 and x2, along with its 2D representation (contour plot).

  • Python3
# Plotting the objective function in 3D and 2D
# Defining the range of x1 and x2
x1 = arange(-5.0, 5.0, 0.1)
x2 = arange(-5.0, 5.0, 0.1)
# Creating a meshgrid of x1 and x2
x1, x2 = meshgrid(x1, x2)
# Calculating the objective function for each combination of x1 and x2
y = objective(x1, x2)
# Plotting the objective function in 3D and 2D
fig = plt.figure(figsize=(12, 4))
# Plot 1 - 3D plot
ax = fig.add_subplot(1, 2, 1, projection='3d')
ax.plot_surface(x1, x2, y, cmap='viridis')
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('y')
ax.set_title('3D plot of the objective function')
# Plot 2 - Contour plot (2D plot)
ax = fig.add_subplot(1, 2, 2)
ax.contour(x1, x2, y, cmap='viridis', levels=20)
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_title('Contour plot of the objective function')
# Displaying the plots
plt.show()

Output:

[Figure: 3D surface plot and contour plot of the objective function]

Now, let us define our RMSprop optimizer.

  • Python3
# Defining the RMSprop optimizer
def rmsprop(x1, x2, derivative_x1, derivative_x2, learning_rate, gamma, epsilon, max_epochs):
    # Creating empty lists to store the trajectories of x1, x2, and y
    x1_trajectory = []
    x2_trajectory = []
    y_trajectory = []

    # Setting the initial values of x1, x2, and y
    x1_trajectory.append(x1)
    x2_trajectory.append(x2)
    y_trajectory.append(objective(x1, x2))

    # Defining the initial values of the exponentially weighted averages e1 and e2
    e1 = 0
    e2 = 0

    # Running the gradient descent loop
    for _ in range(max_epochs):
        # Calculating the derivatives of the objective function w.r.t x1 and x2
        gt_x1 = derivative_x1(x1, x2)
        gt_x2 = derivative_x2(x1, x2)

        # Updating the exponentially weighted averages of the squared derivatives
        e1 = gamma * e1 + (1 - gamma) * gt_x1**2.0
        e2 = gamma * e2 + (1 - gamma) * gt_x2**2.0

        # Updating the values of x1 and x2
        x1 = x1 - learning_rate * gt_x1 / (np.sqrt(e1 + epsilon))
        x2 = x2 - learning_rate * gt_x2 / (np.sqrt(e2 + epsilon))

        # Appending the values of x1, x2, and y to their respective lists
        x1_trajectory.append(x1)
        x2_trajectory.append(x2)
        y_trajectory.append(objective(x1, x2))

    return x1_trajectory, x2_trajectory, y_trajectory

Now, let us optimize our objective function using the RMSprop function.

  • Python3
# Defining the initial values of x1, x2, and other hyperparameters
x1_initial = -4.0
x2_initial = 3.0
learning_rate = 0.1
gamma = 0.9
epsilon = 1e-8
max_epochs = 50

# Running the RMSprop algorithm
x1_trajectory, x2_trajectory, y_trajectory = rmsprop(
    x1_initial,
    x2_initial,
    derivative_x1,
    derivative_x2,
    learning_rate,
    gamma,
    epsilon,
    max_epochs
)

# Printing the optimal values of x1, x2, and y
print('The optimal value of x1 is:', x1_trajectory[-1])
print('The optimal value of x2 is:', x2_trajectory[-1])
print('The optimal value of y is:', y_trajectory[-1])

Output:

The optimal value of x1 is: -0.10352260359924752
The optimal value of x2 is: 0.0025296212056016548
The optimal value of y is: 0.05362944016394148

Now, let us visualize the optimization path on the contour plot of the objective function.

  • Python3
# Displaying the path taken in each iteration on the contour plot
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(1, 1, 1)

# Plotting the contour plot
ax.contour(x1, x2, y, cmap='viridis', levels=20)

# Plotting the trajectory of (x1, x2) in each iteration
ax.plot(x1_trajectory, x2_trajectory, '*',
        markersize=7, color='dodgerblue')
# Setting the labels and title of the plot
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_title('RMSprop Optimization path for ' + str(max_epochs) + ' iterations')
# Displaying the plot
plt.show()

Output:

[Figure: RMSprop optimization path plotted on the contour of the objective function]
