Bayesian LSTM on PyTorch — with BLiTZ, a PyTorch Bayesian Deep Learning library

It's time for you to draw a confidence interval around your time-series predictions, and now that is as easy as it can be.

LSTM cell illustration. Source accessed on 2020-04-14.

This is a post on how to use BLiTZ, a PyTorch Bayesian Deep Learning library, to create, train, and perform variational inference on sequence data using its implementation of Bayesian LSTMs.

You can check the notebook with the example part of this post here, and the repository for BLiTZ, the Bayesian Deep Learning library for PyTorch, here.

To accomplish that, we will explain how Bayesian Long Short-Term Memory works and then go through an example of stock confidence-interval forecasting using this dataset from Kaggle.

If you are new to the theme of Bayesian Deep Learning, you may want to read one of the many posts on Medium about it, or just the Bayesian DL section of the documentation in our library repo. You may also want to check this tutorial post on BLiTZ usage.

Bayesian LSTM Layers

As we know, the LSTM architecture was designed to address the problem of vanishing information that occurs when standard Recurrent Neural Networks are used to process long sequences.

Mathematically, we translate the LSTM architecture as:

f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
h_t = o_t \odot \tanh(c_t)

LSTM equations. Source: https://en.wikipedia.org/wiki/Long_short-term_memory, re-written in LaTeX by me. Accessed on 2020-04-14.

We also know that the core idea of Bayesian Neural Networks is that, rather than having deterministic weights, we sample them from a probability distribution and then optimize those distribution parameters.

Using that, it is possible to measure confidence and uncertainty over predictions, which, along with the prediction itself, are very useful data for insights.

Mathematically, we just have to add a few extra steps to the equations above: the weight and bias sampling, which happens before the feed-forward operation.

W_n^{(i)} = \mu_n + \log(1 + e^{\rho_n}) \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)

Weight sampled at the i-th time on the position n of the layer/model.

b_n^{(i)} = \mu_n + \log(1 + e^{\rho_n}) \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, 1)

Bias sampled at the i-th time on the position n of the layer/model.

And, of course, our trainable parameters are the ρ and μ that parametrize each of the weight distributions. BLiTZ has a built-in BayesianLSTM layer that does all this hard work for you, so you just have to worry about your network architecture and training/testing loops.

Let’s go to our example.

First of all, our imports

Besides our common imports, we will be importing BayesianLSTM from blitz.modules and variational_estimator, a decorator from blitz.utils that helps us with variational training and complexity-cost gathering.

We also import collections.deque to use in the time-series data preprocessing.
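Here is a minimal set of imports consistent with the description above; the sklearn StandardScaler and matplotlib imports are assumptions of this sketch, used below for normalization and plotting:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim

# BLiTZ: the Bayesian LSTM layer and the variational training decorator
from blitz.modules import BayesianLSTM
from blitz.utils import variational_estimator

# assumption of this sketch: sklearn handles the price normalization
from sklearn.preprocessing import StandardScaler
```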

Data preprocessing

We will now create and preprocess our dataset to feed it to the network. We will import the Amazon stock pricing from the dataset we got from Kaggle, get its "Close" price column and normalize it.

Our dataset will consist of timestamps of normalized stock prices and will have shape (batch_size, sequence_length, observation_length).

Let's import and preprocess the data:
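A sketch of that step, assuming a hypothetical CSV file name and a "Close" column in the Kaggle dataset, and using a StandardScaler for the normalization:

```python
# hypothetical path to the Amazon stock CSV downloaded from Kaggle
amazon_df = pd.read_csv("data/AMZN_stock.csv")

# keep only the closing price and normalize it
close_prices = amazon_df["Close"].values.reshape(-1, 1)

scaler = StandardScaler()
close_prices = scaler.fit_transform(close_prices)
```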

We also must create a function to transform our stock price history into timestamps. To do that, we will use a deque with max length equal to the timestamp size we are using. We add each data point to the deque, and then append a copy of it to a main timestamp list, as in the sketch below:
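A sketch of such a function; the window length of 21 is a placeholder, and the returned tensors follow the (batch_size, sequence_length, observation_length) shape mentioned above:

```python
def create_timestamps(data, timestamp_size=21):
    """Slide a window of timestamp_size points over the series and return
    (windows, next-point labels) as float tensors."""
    window = deque(maxlen=timestamp_size)
    samples, labels = [], []

    for i, point in enumerate(data):
        window.append(point)
        # once the deque is full, a copy of it is one timestamp and the
        # following point is the label we want to predict
        if len(window) == timestamp_size and i + 1 < len(data):
            samples.append(list(window))
            labels.append(data[i + 1])

    samples = torch.tensor(np.array(samples)).float()   # (batch, seq_len, 1)
    labels = torch.tensor(np.array(labels)).float()     # (batch, 1)
    return samples, labels

X, y = create_timestamps(close_prices, timestamp_size=21)
```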

Creating our Network class

Our network class receives the variational_estimator decorator, which eases sampling the loss of Bayesian Neural Networks. It will have a Bayesian LSTM layer with in_features=1 and out_features=10 followed by a nn.Linear(10, 1), which outputs the normalized price for the stock.

As you can see, this network works like a pretty normal one; the only uncommon things here are the BayesianLSTM layer being instanced and the variational_estimator decorator, but otherwise it behaves like a normal Torch module.
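A minimal sketch of that class, following the layer sizes given above; keeping only the last element of the LSTM output sequence before the linear layer is an assumption of this sketch:

```python
@variational_estimator
class NN(nn.Module):
    def __init__(self):
        super().__init__()
        # Bayesian LSTM: 1 observed feature in, 10 latent features out
        self.lstm = BayesianLSTM(1, 10)
        self.linear = nn.Linear(10, 1)

    def forward(self, x):
        x_, _ = self.lstm(x)
        # keep only the last element of the sequence for the regression head
        x_ = x_[:, -1, :]
        return self.linear(x_)
```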

With that done, we can create our Neural Network object, then split the dataset and go forward to the training loop:

Creating objects

We can now create our loss object, neural network, the optimizer and the dataloader. Note that we are not randomly splitting the dataset, as we will use the last batch of timestamps to evaluate the model. As our dataset is very small in terms of size, we will not make a dataloader for the train set.

We will use a normal Mean Squared Error loss and an Adam optimizer with a learning rate of 0.001.
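A sketch of that setup; the hold-out size is a placeholder, and since the text notes the train set is small and gets no dataloader, this sketch simply keeps the training data as plain tensors:

```python
net = NN()
criterion = nn.MSELoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

# no random split: the last windows are held out for evaluation
test_size = 28                      # placeholder hold-out length (four 7-day windows)
X_train, y_train = X[:-test_size], y[:-test_size]
X_test, y_test = X[-test_size:], y[-test_size:]
```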

Train loop

For our train loop, we will be using the sample_elbo method that the variational_estimator added to our Neural Network. It averages the loss over X samples, and helps us to Monte Carlo estimate our loss with ease.

For this method to work, the output of the network's forward method must have the same shape as the labels that will be fed to the loss object/criterion.
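A sketch of such a loop; the number of epochs and of ELBO samples are placeholders, and the whole training tensor is fed at once since the dataset is small:

```python
epochs = 10                         # placeholder; tune for your data
for epoch in range(epochs):
    optimizer.zero_grad()
    # sample_elbo runs the network sample_nbr times with freshly sampled
    # weights, averages the fit loss and adds the KL complexity cost
    loss = net.sample_elbo(inputs=X_train,
                           labels=y_train,
                           criterion=criterion,
                           sample_nbr=3)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch} - ELBO loss: {loss.item():.4f}")
```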

Evaluating the model and gathering confidence intervals

We will first create a dataframe with the true data to be plotted:
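A sketch of that dataframe; the "Date" column name is an assumption about the Kaggle CSV, and the prices are brought back to the original scale with the scaler fitted earlier:

```python
# bring the normalized series back to the original price scale
true_prices = scaler.inverse_transform(close_prices).reshape(-1)

# "Date" is an assumed column name in the Kaggle CSV
df_true = pd.DataFrame({"real": true_prices},
                       index=pd.to_datetime(amazon_df["Date"]))
```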

To predict a confidence interval, we must create a function to predict X times on the same data and then gather its mean and standard deviation. At the same time, we must set the size of the window we will try to predict before consulting true data.

Let’s see the code for the prediction function:
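A sketch of what that function could look like; the function name and the way it slides its own predictions back into the input window are assumptions of this sketch:

```python
def pred_future_window(net, X_test, window_size=7, samples=10):
    """For each block of window_size days, run the Bayesian network several
    times, feeding its own predictions back in before consulting the true
    data again. Returns an array of shape (samples, n_predictions)."""
    preds_per_sample = []
    with torch.no_grad():
        for _ in range(samples):
            preds = []
            for start in range(0, len(X_test), window_size):
                seq = X_test[start].unsqueeze(0)              # (1, seq_len, 1)
                for _ in range(min(window_size, len(X_test) - start)):
                    pred = net(seq)                            # (1, 1)
                    preds.append(pred.item())
                    # slide the window: drop the oldest point, append the prediction
                    seq = torch.cat([seq[:, 1:, :], pred.unsqueeze(-1)], dim=1)
            preds_per_sample.append(preds)
    return np.array(preds_per_sample)
```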

And for the confidence interval gathering. Note that we can decide how many standard deviations away from the mean we will set our confidence interval:
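A sketch of that step, with the standard-deviation multiplier exposed as a parameter (the function name is hypothetical):

```python
def get_confidence_intervals(preds_per_sample, std_multiplier=2):
    """Mean prediction with upper and lower bounds std_multiplier
    standard deviations away from the mean."""
    mean = preds_per_sample.mean(axis=0)
    std = preds_per_sample.std(axis=0)
    return mean, mean + std_multiplier * std, mean - std_multiplier * std
```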

As we used a very small number of samples, we compensated for it with a high standard-deviation multiplier. Our network will try to predict 7 days ahead and then consult the data:

We can check the confidence interval by seeing whether the real value is lower than the upper bound and higher than the lower bound. With these parameters, you should get a coverage of around 95%, as we did:
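A sketch of that check, done in the normalized price space (the sample count and multiplier values are placeholders):

```python
preds_per_sample = pred_future_window(net, X_test, window_size=7, samples=10)
mean, upper, lower = get_confidence_intervals(preds_per_sample, std_multiplier=2)

# true test labels, still in normalized space like the predictions
y_true = y_test.numpy().reshape(-1)
inside_ci = ((y_true >= lower) & (y_true <= upper)).mean()
print(f"{inside_ci:.0%} of the true prices fall inside the confidence interval")
```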

Check the prediction graphs

We now just plot the prediction graphs to visually check whether our training went well. We will plot the real data and the test predictions with their confidence interval:
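A plotting sketch, bringing predictions and bounds back to the original price scale and assuming the test predictions line up with the last points of the series:

```python
# back to the original price scale
mean_p = scaler.inverse_transform(mean.reshape(-1, 1)).reshape(-1)
upper_p = scaler.inverse_transform(upper.reshape(-1, 1)).reshape(-1)
lower_p = scaler.inverse_transform(lower.reshape(-1, 1)).reshape(-1)

# assume the test predictions correspond to the last points of the series
pred_idx = df_true.index[-len(mean_p):]

plt.figure(figsize=(12, 6))
plt.plot(df_true.index, df_true["real"], label="real price")
plt.plot(pred_idx, mean_p, color="red", label="prediction")
plt.fill_between(pred_idx, lower_p, upper_p, color="red", alpha=0.2,
                 label="confidence interval")
plt.legend()
plt.show()
```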

Predictions for the stock prices with confidence intervals.

And to end our evaluation, we will zoom into the prediction zone:

Plot of the network predictions on the test data with the confidence intervals.

Conclusion

We saw that BLiTZ's Bayesian LSTM implementation makes it very easy to implement and iterate over time-series models with all the power of Bayesian Deep Learning. We also saw that the Bayesian LSTM is well integrated with Torch and easy to use and introduce in any work or research.

We also could predict a confidence interval for the Amazon stock price with very high accuracy, which may be a far more useful insight than a point estimate.
