
Learn AI Today

Learn AI Today: 01 — Getting started with Pytorch

Defining and training a Pytorch model and visualizing the results dynamically

Jul 19 · 11 min read


Photo by Jukan Tateisi on Unsplash.

This is the first story in the Learn AI Today series I’m creating! These stories, or at least the first few, are based on a series of Jupyter notebooks I’ve created while studying/learning PyTorch and Deep Learning. I hope you find them as useful as I did!

What you will learn in this story:

  • How to Create a PyTorch Model
  • How to Train Your Model
  • Visualize the Training Progress Dynamically
  • How the Learning Rate Affects the Training

1. Linear Regression in PyTorch

Linear regression is a problem that you are probably familiar with. In its most basic form, it is no more than fitting a line to a set of points.

1.1 Introducing the Concepts

Consider the mathematical expression of a line:

y = wx + b

w and b are the two parameters or weights of this linear model. In machine learning, it is common to use w to refer to the weights and b to refer to the bias parameter.

In machine learning, when we are training a model we are basically finding the optimal parameters w and b for a given set of input/target (x, y) pairs. After the model is trained we can compute the model estimates. The expression will now look like

ye = wx + b

where I change the name of y to ye (y estimate) because the solution will not be exact.

The Mean Squared Error (MSE) is simply mean((ye-y)²) — the mean of the squared deviations between targets and estimates. For a regression problem, you can indeed minimize the MSE in order to find the best w and b.
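As a quick illustration, the MSE can be computed directly with PyTorch tensor operations:

    import torch

    y  = torch.tensor([1.0, 2.0, 3.0])   # targets
    ye = torch.tensor([1.1, 1.9, 3.2])   # estimates
    mse = ((ye - y) ** 2).mean()         # tensor(0.0200)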

The idea of linear regression can be generalized using matrix algebra notation to allow for multiple inputs and targets. If you want to learn more about the exact mathematical solution to the regression problem, you can look up the Normal Equation.

1.2 Defining the Model

The PyTorch nn.Linear class is all that you need to define a linear model with any number of inputs and outputs. For our basic example of fitting a line to a set of points, consider the following model:
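A minimal sketch of such a model, written here with plain nn.Module (the note below covers the fastai variant used in the original notebook), could look like this:

    import torch.nn as nn

    class LinearRegression(nn.Module):
        def __init__(self, n_inputs, n_outputs):
            super().__init__()  # needed with plain nn.Module (see note below)
            self.linear = nn.Linear(n_inputs, n_outputs)  # a single linear layer

        def forward(self, x):
            return self.linear(x)  # runs when the model is called on an input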

Note: I’m using Module from the fastai library as it makes the code cleaner. If you want to use pure PyTorch you should use nn.Module instead, and you need to add super().__init__() in the __init__ method. fastai’s Module does that for you.

If you are familiar with Python classes, the code is self-explanatory. If not, consider doing some study before diving into PyTorch. There are many online tutorials and lessons covering the topic.

Back to the code. In the __init__ method, you define the layers of the model. In this case, it is just one linear layer. Then, the forward method is the one that is called when you call the model, similar to the __call__ method in normal Python classes.

Now you can define an instance of your LinearRegression model as model = LinearRegression(1, 1) indicating the number of inputs and outputs.

Maybe you are now asking why I don’t simply do model = nn.Linear(1, 1), and you are absolutely right. The only reason I go to the trouble of defining a LinearRegression class is to have a template for future improvements, as you will see later.

1.3 How to Train Your Model

The training process is based on a sequence of 4 steps that repeat iteratively:

  • Forward pass: The input data is given to the model and the model outputs are obtained — outputs = model(inputs)
  • The loss function is computed: For the purpose of the linear regression problem, the loss function we are using is the mean squared error (MSE). We often refer to this function as the criterion — loss = criterion(outputs, targets)
  • Backward pass: The gradients of the loss function with respect to each learnable parameter are computed. Remember that we want to reduce the loss function to make the outputs close to the targets. The gradients tell how the loss changes if you increase or decrease each parameter — loss.backward()
  • Update parameters: Update the value of the parameters by a small amount in the direction that reduces the loss. The update can be as simple as subtracting from each parameter its gradient multiplied by a small number. This number is referred to as the learning rate, and the optimizer I just described is Stochastic Gradient Descent (SGD) — optimizer.step()

I haven’t defined the criterion and optimizer exactly yet, but I will in a minute. This is just to give you a general overview and understanding of the steps in a training iteration, or, as it is usually called, a training epoch.

Let’s define our fit function that will do all the required steps.
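A minimal sketch of such a fit function, assuming it receives the number of epochs, the model, the criterion, the optimizer, and the training tensors, could look like this:

    def fit(epochs, model, criterion, optimizer, x, y):
        losses = []
        for epoch in range(epochs):
            optimizer.zero_grad()          # reset the accumulated gradients (see below)
            outputs = model(x)             # forward pass
            loss = criterion(outputs, y)   # compute the loss
            loss.backward()                # backward pass: compute the gradients
            optimizer.step()               # update the parameters
            losses.append(loss.item())     # save the loss value for this epoch
        return losses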

Notice that there’s an extra step I didn’t mention before — optimizer.zero_grad(). This is because, by default, PyTorch accumulates (adds up) the gradients every time you call loss.backward(). If you don’t set them to zero at each epoch they will keep adding up, and that’s not desirable — unless you are doing gradient accumulation, but that’s a more advanced topic. Besides that, as you can see in the code above, I’m saving the value of the loss at each epoch. We should expect it to drop steadily, meaning that the model is getting better at predicting the targets.

As I mentioned above, for linear regression the criterion usually used is the MSE . As for the optimizer, nowadays I always use Adam as my first choice. It’s fast and it should work well for most problems. I won’t go into details about how Adam works for now but the idea is always to find the best solution in the least amount of time.

Let’s now move on to creating an instance of our LinearRegression model, defining our criterion and our optimizer:
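Something along these lines (the learning rate value is just an assumed placeholder):

    import torch.nn as nn
    from torch import optim

    model = LinearRegression(1, 1)                       # 1 input, 1 output
    criterion = nn.MSELoss()                             # mean squared error
    optimizer = optim.Adam(model.parameters(), lr=0.1)   # lr is an assumed value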

model.parameters() is the way to give the optimizer the list of trainable parameters and lr is the learning rate.

Now let’s create some data and train the model!
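A sketch consistent with the description below (the noise scale and the tensor names are assumptions):

    import torch

    x = torch.rand(10000)                         # 10000 random inputs in [0, 1)
    noise = 0.3 * x * torch.randn(10000)          # noise that grows with x (0.3 is assumed)
    x_train = x.unsqueeze(-1)                     # shape [10000] -> [10000, 1]
    y_train = (2 * x + 1 + noise).unsqueeze(-1)   # true model y = 2x + 1 plus noise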

The data is simply a set of points following the model y = 2x + 1 + noise. To make it a little more interesting, I make the noise larger for larger values of x. The unsqueeze(-1) calls just add an extra dimension at the end of the tensors (from [10000] to [10000, 1]). The data is the same, but the tensor needs to have this shape, meaning that we have 10000 samples and 1 feature per sample.

Plotting the data, the result is the image below, where you can see the true model and the input data + noise.


Input data for the linear regression model. Image by the author.
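A plot like the one above could be produced with a few lines of matplotlib, for example:

    import matplotlib.pyplot as plt

    plt.scatter(x_train, y_train, s=1, alpha=0.3, label='input data + noise')
    plt.plot(x_train, 2 * x_train + 1, color='red', label='true model')
    plt.xlabel('x'); plt.ylabel('y'); plt.legend()
    plt.show()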

And now to train the model we just run our fit function!
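With the fit function sketched earlier, that could be as simple as:

    losses = fit(100, model, criterion, optimizer, x_train, y_train)  # train for 100 epochs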

After training, we can plot the evolution of the loss during the 100 epochs. As you can see in the image below, the loss starts at about 2.0 and then drops steeply to nearly zero. This is to be expected: when we start, the model parameters are randomly initialized, and as the training progresses they converge to the solution.


Evolution of the loss (MSE) for the 100 epochs of training. Image by the author.

Note: Try playing with the learning rate value to see how it affects the training!

To check the parameters of the trained model, you can run list(model.parameters()) after training the model. You will see that they are very close to 2.0 and 1.0 for this example since the true model is y = 2x + 1 .
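For example (the printed values depend on the random initialization and the noise):

    print(list(model.parameters()))  # weight should be close to 2.0, bias close to 1.0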

You can now compute the model estimates — ye = model(x_train) . (Notice that before computing the estimates you should always run model.eval() to set the model to evaluation mode. It won’t make a difference for this simple model but later it will, when we start using Batch Normalization and Dropout.)
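A sketch of that step, reusing the x_train tensor from before:

    model.eval()               # set the model to evaluation mode
    with torch.no_grad():      # gradients are not needed for inference
        ye = model(x_train)    # model estimates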

Plotting the predictions, you can see that they match the true data almost perfectly, despite the fact that the model could only see the noisy data.


Visualizing the model estimates. Image by the author.
