Bidirectional LSTM in NLP
source link: https://www.geeksforgeeks.org/bidirectional-lstm-in-nlp/
In this article, we will first discuss bidirectional LSTMs and their architecture. We will then look into the implementation of a review system using Bidirectional LSTM. Finally, we will conclude this article while discussing the applications of bidirectional LSTM.
Bidirectional LSTM (BiLSTM)
Bidirectional LSTM or BiLSTM is a term used for a sequence model which contains two LSTM layers, one for processing input in the forward direction and the other for processing in the backward direction. It is usually used in NLP-related tasks. The intuition behind this approach is that by processing data in both directions, the model is able to better understand the relationship between sequences (e.g. knowing the following and preceding words in a sentence).
To better understand this, let us look at an example. The first statement is “Server can you bring me this dish” and the second statement is “He crashed the server”. The word "server" has a different meaning in each statement, and that meaning depends on the words that precede and follow it. A bidirectional LSTM helps the machine capture this relationship better than a unidirectional LSTM, because it reads the context on both sides of every word. This ability makes BiLSTM a suitable architecture for tasks like sentiment analysis, text classification, and machine translation.
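To make the role of preceding and following words concrete, here is a minimal sketch in plain Python. The helper `split_context` is a hypothetical name introduced only for this illustration; it simply splits a sentence around the target word.

```python
def split_context(sentence, target):
    """Return the words before and after the first occurrence of `target`."""
    tokens = sentence.lower().split()
    position = tokens.index(target)
    return tokens[:position], tokens[position + 1:]


for sentence in ["Server can you bring me this dish", "He crashed the server"]:
    preceding, following = split_context(sentence, "server")
    print(f"{sentence!r}: preceding={preceding}, following={following}")
```

In the first statement every disambiguating word comes after "server", so only a backward pass has seen them by the time it reaches that word. In the second statement the useful words come before it, so only a forward pass has seen them. A bidirectional model has both views at every position.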
Architecture
A bidirectional LSTM consists of two unidirectional LSTMs that process the sequence in the forward and backward directions. It can be interpreted as two separate LSTM networks: one receives the sequence of tokens as it is, while the other receives it in reverse order. Each network returns a probability vector as output, and the final output is the combination of the two. Several combinations are possible: the Keras Bidirectional wrapper, for example, concatenates the two outputs by default, which is why the first bidirectional layer in the model summary later in this article outputs 128 features from a 64-unit LSTM. With summation as the combining operation, the output can be represented as:
$$p_t = p_t^f + p_t^b$$
where,
- $p_t$: Final probability vector of the network.
- $p_t^f$: Probability vector from the forward LSTM network.
- $p_t^b$: Probability vector from the backward LSTM network.
Bidirectional LSTM layer Architecture
Figure 1 describes the architecture of the BiLSTM layer, where $X_i$ is the input token, $Y_i$ is the output token, and the remaining nodes are the forward and backward LSTM nodes. The final output $Y_i$ is the combination of the outputs of the forward and backward LSTM nodes.
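The wiring described above can be sketched in plain Python. This is an illustrative toy under stated assumptions, not how TensorFlow implements the layer: the weights are random instead of trained, the helper names (`make_lstm_params`, `lstm_step`, `run_lstm`, `bidirectional`) are hypothetical, and the per-step outputs are the LSTM hidden states that the text above calls probability vectors. The `merge` argument switches between summation, as in the formula, and concatenation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm_params(input_dim, units, rng):
    """Random weights for one LSTM: 4 gates, each reading [x_t, h_prev, 1]."""
    rows = 4 * units
    cols = input_dim + units + 1  # the final column acts as the bias
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def lstm_step(params, units, x_t, h_prev, c_prev):
    """One LSTM time step; returns the new hidden state and cell state."""
    v = list(x_t) + list(h_prev) + [1.0]
    z = [sum(w * a for w, a in zip(row, v)) for row in params]
    i = [sigmoid(a) for a in z[0:units]]                 # input gate
    f = [sigmoid(a) for a in z[units:2 * units]]         # forget gate
    g = [math.tanh(a) for a in z[2 * units:3 * units]]   # candidate values
    o = [sigmoid(a) for a in z[3 * units:4 * units]]     # output gate
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def run_lstm(params, units, sequence):
    """Run an LSTM over a sequence and return the hidden state at every step."""
    h, c = [0.0] * units, [0.0] * units
    outputs = []
    for x_t in sequence:
        h, c = lstm_step(params, units, x_t, h, c)
        outputs.append(h)
    return outputs


def bidirectional(fwd_params, bwd_params, units, sequence, merge="sum"):
    """Combine a forward pass and a backward pass, aligned per time step."""
    forward = run_lstm(fwd_params, units, sequence)
    # The backward LSTM reads the reversed sequence; its outputs are reversed
    # again so that position t in both lists refers to the same token.
    backward = run_lstm(bwd_params, units, sequence[::-1])[::-1]
    if merge == "sum":
        return [[a + b for a, b in zip(hf, hb)] for hf, hb in zip(forward, backward)]
    return [hf + hb for hf, hb in zip(forward, backward)]  # concatenation


rng = random.Random(0)
input_dim, units = 3, 2
sequence = [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(5)]
fwd = make_lstm_params(input_dim, units, rng)
bwd = make_lstm_params(input_dim, units, rng)

summed = bidirectional(fwd, bwd, units, sequence, merge="sum")
concatenated = bidirectional(fwd, bwd, units, sequence, merge="concat")
print(len(summed), len(summed[0]))              # time steps, units
print(len(concatenated), len(concatenated[0]))  # time steps, 2 * units
```

Changing only the last token of `sequence` leaves the forward output at the first position unchanged but alters the bidirectional output there, which is the extra context the backward pass contributes.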
Now, let us look at an implementation of a review system using BiLSTM layers in Python with the TensorFlow library. We will perform sentiment analysis on the IMDB movie review dataset: we will define the network from scratch and train it to identify whether a review is positive or negative.
Importing Libraries and Dataset
Python libraries make it very easy for us to handle the data and perform typical and complex tasks with a single line of code.
- NumPy: NumPy arrays are very fast and can perform large computations in a very short time.
- Matplotlib: This library is used to draw visualizations.
- TensorFlow: This is an open-source library for machine learning and artificial intelligence that provides a range of functions to achieve complex functionality with single lines of code.
- Python3
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np
import matplotlib.pyplot as plt
The IMDB movie review dataset is a dataset for binary sentiment classification containing 25,000 highly polar movie reviews for training and 25,000 for testing. The dataset can be downloaded directly, or we can load it with the tensorflow_datasets library.
- Python3
# Obtain the IMDB review dataset from TensorFlow Datasets
dataset = tfds.load('imdb_reviews', as_supervised=True)

# Separate the train and test datasets
train_dataset, test_dataset = dataset['train'], dataset['test']

# Split the train and test data into batches of 32
# and shuffle the training set
batch_size = 32
train_dataset = train_dataset.shuffle(10000)
train_dataset = train_dataset.batch(batch_size)
test_dataset = test_dataset.batch(batch_size)
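As a quick sanity check on the batching, the number of batches per epoch can be worked out by hand. The sketch below assumes the 25,000 training reviews stated above and that the final, smaller batch is kept (the default behaviour of `batch`); the result matches the 782 steps shown in the training log later in the article.

```python
import math

num_train_reviews = 25000
batch_size = 32

# The last batch is smaller because 25,000 is not a multiple of 32.
steps_per_epoch = math.ceil(num_train_reviews / batch_size)
last_batch_size = num_train_reviews - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch, last_batch_size)
```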
Printing a sample review and its label from the training set.
- Python3
example, label = next(iter(train_dataset))
print('Text:\n', example.numpy()[0])
print('\nLabel: ', label.numpy()[0])
Output:
Text:
b'Stumbling upon this HBO special late one night, I was absolutely taken by this
attractive British "executive transvestite." I have never laughed so hard over
European History or any of the other completely worthwhile point Eddie Izzard made.
I laughed so much that I woke up my mother sleeping at the other end of the house...'
Label: 1
Model Architecture
In this section, we will define the model we will use for sentiment analysis. The initial layer of this architecture is the text vectorization layer, which encodes the input text into a sequence of token indices. These tokens are then fed into the embedding layer, where each word is assigned a trainable vector. After enough training, these vectors tend to adjust themselves so that words with similar meanings have similar vectors. The embedded sequence is then passed to the bidirectional LSTM layers, which process it and finally reduce it to a single logit as the classification output.
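The embedding step is essentially a table lookup. The following sketch assumes a tiny vocabulary of 6 tokens and 4-dimensional vectors, with random numbers standing in for trained values; `embed` is a hypothetical helper. It also builds the padding mask that corresponds to the `mask_zero = True` argument used in the model definition, where token index 0 is treated as padding.

```python
import random

rng = random.Random(0)
vocab_size, embedding_dim = 6, 4

# One vector per token index; random values stand in for learned ones.
embedding_table = [[rng.uniform(-1, 1) for _ in range(embedding_dim)]
                   for _ in range(vocab_size)]


def embed(token_ids):
    """Look up one vector per token and flag padding positions (index 0)."""
    vectors = [embedding_table[i] for i in token_ids]
    mask = [i != 0 for i in token_ids]
    return vectors, mask


token_ids = [3, 1, 5, 2, 0, 0]  # a short review padded with zeros
vectors, mask = embed(token_ids)
print(len(vectors), len(vectors[0]))
print(mask)
```

Later layers can use the mask to skip the padded positions, so that padding does not influence the result.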
We will first perform text vectorization and let the encoder map all the words in the training dataset to a token. We can also see in the example below how we can encode and decode the sample review into a vector of integers.
- Python3
# Using the TextVectorization layer to normalize, split, and map strings
# to integers.
encoder = tf.keras.layers.TextVectorization(max_tokens=10000)
encoder.adapt(train_dataset.map(lambda text, _: text))

# Extracting the vocabulary from the TextVectorization layer.
vocabulary = np.array(encoder.get_vocabulary())

# Encoding a test example and decoding it back.
original_text = example.numpy()[0]
encoded_text = encoder(original_text).numpy()
decoded_text = ' '.join(vocabulary[encoded_text])

print('original: ', original_text)
print('encoded: ', encoded_text)
print('decoded: ', decoded_text)
Output:
original:
b'Stumbling upon this HBO special late one night, I was absolutely taken by this
attractive British "executive transvestite." I have never laughed so hard over
European History or any of the other completely worthwhile point Eddie Izzard made.
I laughed so much that I woke up my mother sleeping at the other end of the house...'
encoded:
[9085 720 11 4335 309 534 29 311 10 14 412 602 33 11
1523 683 3505 1 10 26 110 1434 38 264 126 1835 489 42
99 5 2 81 325 2601 215 1781 9352 91 10 1434 38 73
12 10 9259 58 56 462 2703 31 2 81 129 5 2 313]
decoded:
stumbling upon this hbo special late one night i was absolutely taken by this
attractive british executive [UNK] i have never laughed so hard over european history
or any of the other completely worthwhile point eddie izzard made i laughed so much
that i woke up my mother sleeping at the other end of the house
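To show roughly what the encoder does, here is a simplified plain-Python sketch of the same idea: lowercase the text, strip punctuation, split on whitespace, and map each word to an index, with index 0 reserved for padding and index 1 for unknown words (which is why a rare word appears as [UNK] above). The function names and the toy corpus are assumptions for illustration, and details such as tie-breaking between equally frequent words may differ from the real layer.

```python
import re
import string
from collections import Counter


def standardize(text):
    """Lowercase the text and strip punctuation."""
    return re.sub(f"[{re.escape(string.punctuation)}]", "", text.lower())


def build_vocabulary(texts, max_tokens):
    """Keep the most frequent words; slots 0 and 1 are reserved."""
    counts = Counter(word for text in texts for word in standardize(text).split())
    most_common = [word for word, _ in counts.most_common(max_tokens - 2)]
    return ["", "[UNK]"] + most_common


def encode(text, vocabulary):
    index = {word: i for i, word in enumerate(vocabulary)}
    return [index.get(word, 1) for word in standardize(text).split()]


def decode(token_ids, vocabulary):
    return " ".join(vocabulary[i] for i in token_ids)


corpus = ["The movie was good.", "The movie was bad!", "Good acting, good plot."]
vocabulary = build_vocabulary(corpus, max_tokens=6)
encoded = encode("The plot was good", vocabulary)

print(vocabulary)
print(encoded)
print(decode(encoded, vocabulary))
```

Because the vocabulary is capped at `max_tokens`, the less frequent word "plot" falls outside it and is decoded as [UNK].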
Now, we will use this trained encoder along with Bidirectional LSTM layers to define a model as discussed earlier.
We will implement a Sequential model which will contain the following parts:
- The encoder converts the input text into token indices, and the embedding layer then creates an embedding for each token.
- Two bidirectional LSTM layers follow, which learn dependencies across the sequence in both directions.
- Finally, there are two fully connected layers. The last one outputs a single logit, where a value greater than 0 indicates a positive review.
- Python3
# Creating the model
model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Embedding(
        len(encoder.get_vocabulary()), 64, mask_zero=True),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])

# Summary of the model
model.summary()

# Compile the model
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=['accuracy']
)
Output:
Model: "sequential"
_________________________________________________________________
 Layer (type)                 Output Shape              Param #
=================================================================
 text_vectorization (TextVec  (None, None)              0
 torization)
 embedding (Embedding)        (None, None, 64)          640000
 bidirectional (Bidirectiona  (None, None, 128)         66048
 l)
 bidirectional_1 (Bidirectio  (None, 64)                41216
 nal)
 dense (Dense)                (None, 64)                4160
 dense_1 (Dense)              (None, 1)                 65
=================================================================
Total params: 751,489
Trainable params: 751,489
Non-trainable params: 0
_________________________________________________________________
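The parameter counts in the summary can be reproduced by hand. The sketch below uses the standard formulas: an LSTM has four gates, each with input weights, recurrent weights, and a bias, and a bidirectional layer holds two such LSTMs. The helper function names are hypothetical; the layer sizes are taken from the model definition above.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim


def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias.
    return 4 * (input_dim + units + 1) * units


def dense_params(input_dim, units):
    return input_dim * units + units


counts = {
    "embedding": embedding_params(10000, 64),
    # Forward and backward LSTM, each with 64 units on 64-dimensional input.
    "bidirectional": 2 * lstm_params(64, 64),
    # Input is the 128-dimensional concatenated output of the previous layer.
    "bidirectional_1": 2 * lstm_params(128, 32),
    "dense": dense_params(64, 64),
    "dense_1": dense_params(64, 1),
}
total = sum(counts.values())

for name, value in counts.items():
    print(f"{name}: {value}")
print(f"total: {total}")
```

The second bidirectional layer sees 128 input features rather than 64 because the first one concatenates its forward and backward outputs.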
Model Training
Now, we will train the model we defined in the previous step for five epochs.
- Python3
# Training the model and validating it on test set
history = model.fit(
    train_dataset,
    epochs=5,
    validation_data=test_dataset,
)
Output:
Epoch 1/5
782/782 [==============================] - 1209s 2s/step - loss: 0.3657 -
accuracy: 0.8266 - val_loss: 0.3110 - val_accuracy: 0.8441
Epoch 2/5
782/782 [==============================] - 1269s 2s/step - loss: 0.2147 -
accuracy: 0.9126 - val_loss: 0.3566 - val_accuracy: 0.8590
Epoch 3/5
782/782 [==============================] - 1146s 1s/step - loss: 0.1616 -
accuracy: 0.9380 - val_loss: 0.3764 - val_accuracy: 0.8670
Epoch 4/5
782/782 [==============================] - 1963s 3s/step - loss: 0.0962 -
accuracy: 0.9647 - val_loss: 0.4271 - val_accuracy: 0.8564
Epoch 5/5
782/782 [==============================] - 1121s 1s/step - loss: 0.0573 -
accuracy: 0.9796 - val_loss: 0.5516 - val_accuracy: 0.8575
Next, we plot the training and validation accuracy and loss.
- Python3
# Plotting the accuracy and loss over time

# Training history
history_dict = history.history

# Separating validation and training accuracy
acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']

# Separating validation and training loss
loss = history_dict['loss']
val_loss = history_dict['val_loss']

# Plotting
plt.figure(figsize=(8, 4))
plt.subplot(1, 2, 1)
plt.plot(acc)
plt.plot(val_acc)
plt.title('Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend(['Accuracy', 'Validation Accuracy'])

plt.subplot(1, 2, 2)
plt.plot(loss)
plt.plot(val_loss)
plt.title('Training and Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(['Loss', 'Validation Loss'])

plt.show()
Output:
The plot of training and validation accuracy and loss
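The same trend can be read directly from the numbers in the training log above. The short sketch below copies those logged values by hand and locates the epoch with the lowest validation loss. It is only an illustration of how one might inspect the history, not part of the original pipeline.

```python
# Metrics copied from the training log above (epochs 1 to 5).
train_loss = [0.3657, 0.2147, 0.1616, 0.0962, 0.0573]
val_loss = [0.3110, 0.3566, 0.3764, 0.4271, 0.5516]
val_accuracy = [0.8441, 0.8590, 0.8670, 0.8564, 0.8575]

# Epoch numbers are 1-based, list indices are 0-based.
best_loss_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_acc_epoch = max(range(len(val_accuracy)), key=val_accuracy.__getitem__) + 1

# Gap between validation and training loss at each epoch.
gaps = [round(v - t, 4) for t, v in zip(train_loss, val_loss)]

print("lowest validation loss at epoch", best_loss_epoch)
print("highest validation accuracy at epoch", best_acc_epoch)
print("validation minus training loss:", gaps)
```

The training loss keeps falling while the validation loss rises after the first epoch, so the gap widens every epoch. This pattern suggests the model starts to overfit the training set early on.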
Model Evaluation
Now, we will test the trained model with a sample review and check its output. Since the model outputs a logit, a value greater than 0 indicates a positive review.
- Python3
# Making predictions
sample_text = (
    '''The movie by GeeksforGeeks was so good and the animation are so dope.
    I would recommend my friends to watch it.'''
)
predictions = model.predict(np.array([sample_text]))
print(*predictions[0])

# Print the label based on the prediction
if predictions[0] > 0:
    print('The review is positive')
else:
    print('The review is negative')
Output:
1/1 [==============================] - 0s 33ms/step
5.414222
The review is positive
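The printed value is a logit, not a probability, because the model was compiled with `from_logits = True`. As a small sketch, the logit can be converted into a probability with the sigmoid function; the `sigmoid` helper below is written out by hand for illustration.

```python
import math


def sigmoid(logit):
    """Map a logit to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-logit))


logit = 5.414222  # value printed by the model above
probability = sigmoid(logit)

print(f"probability of a positive review: {probability:.4f}")
print("positive" if probability > 0.5 else "negative")
```

A logit of 0 corresponds to a probability of 0.5, so checking whether the logit is greater than 0 is equivalent to checking whether the probability is above 0.5.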
Applications of BiDirectional LSTM
Some of the popular applications that use BiLSTM are sentiment analysis, text classification, text generation, and machine translation.