In this tutorial we’ll cover the second part of this series on encoder-decoder sequence-to-sequence RNNs: how to build, train, and test our seq2seq model for text summarization using Keras.
Let’s continue!
In order to follow along with this article, you will need experience with Python code and a beginner’s understanding of deep learning. We will operate under the assumption that all readers have access to sufficiently powerful machines, so they can run the code provided.
If you do not have access to a GPU, we suggest accessing one through the cloud.
For instructions on getting started with Python code, we recommend trying this beginner’s guide to set up your system and prepare to run beginner tutorials.
First, import all the necessary libraries.
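If you’ve been following along from Part 1, your imports will look something like the sketch below (a minimal set; your setup may include additional libraries for data cleaning):

```python
# Data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Keras building blocks used throughout this tutorial
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.layers import Input, LSTM, Embedding, Dense, TimeDistributed
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import EarlyStopping
```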
Next, define the Encoder and Decoder networks.
The input length that the encoder accepts is equal to the maximum text length, which you’ve already estimated in Step 3. This is then given to an Embedding layer of dimension (total number of words captured in the text vocabulary) x (number of nodes in the embedding layer); the vocabulary size was calculated in Step 5 and is stored in the x_voc variable. This is followed by three stacked LSTM layers, where each layer returns its output sequence as well as its hidden and cell states.
In the decoder, an embedding layer is defined, followed by an LSTM network. The initial state of this LSTM is set to the last hidden and cell states taken from the encoder. The output of the LSTM is given to a Dense layer wrapped in a TimeDistributed layer, with a softmax activation function attached.
Altogether, the model accepts the encoder input (text) and the decoder input (summary), and it outputs the predicted summary. The model predicts each upcoming word of the summary from the word that precedes it (see the figure below).
Consider the summary line to be “I want every age to laugh”. The model has to accept two inputs: the actual text and the summary. During the training phase, the decoder accepts the input summary given to the model and learns every word that has to follow a given word. During the test phase, it then generates predictions using an inference model.
Add the following code to define your network architecture.
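A sketch of the architecture described above follows. The names max_text_len, x_voc, and y_voc are assumed to carry over from the earlier preprocessing steps, and latent_dim and embedding_dim are illustrative values you can tune:

```python
latent_dim = 300     # assumed LSTM size; tune as needed
embedding_dim = 100  # assumed embedding size; tune as needed

# --- Encoder ---
encoder_inputs = Input(shape=(max_text_len,))
enc_emb = Embedding(x_voc, embedding_dim, trainable=True)(encoder_inputs)

# Three stacked LSTMs; each returns its full output sequence plus its states
encoder_lstm1 = LSTM(latent_dim, return_sequences=True, return_state=True)
encoder_output1, state_h1, state_c1 = encoder_lstm1(enc_emb)

encoder_lstm2 = LSTM(latent_dim, return_sequences=True, return_state=True)
encoder_output2, state_h2, state_c2 = encoder_lstm2(encoder_output1)

encoder_lstm3 = LSTM(latent_dim, return_sequences=True, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm3(encoder_output2)

# --- Decoder ---
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(y_voc, embedding_dim, trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)

# The decoder LSTM starts from the encoder's final hidden and cell states
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb, initial_state=[state_h, state_c])

# A TimeDistributed Dense + softmax yields a word distribution at every step
decoder_dense = TimeDistributed(Dense(y_voc, activation='softmax'))
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
```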
In this step, compile the model and define EarlyStopping to stop training the model once the validation loss metric has stopped decreasing.
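A minimal sketch, assuming sparse categorical cross-entropy (the summary targets are integer word indices) and the RMSprop optimizer:

```python
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')

# Halt training when validation loss stops improving
es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=2)
```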
Next, use the model.fit() method to fit the training data, with a batch size of 128. Send the text and summary (excluding the last word in the summary) as the input, and a reshaped summary tensor comprising every word starting from the second one as the output (this one-word offset is what teaches the model to predict a word given the previous word). Also, to enable validation during the training phase, send the validation data as well.
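A sketch of the call, assuming x_tr, y_tr, x_val, and y_val are the padded training and validation arrays from the earlier split:

```python
history = model.fit(
    [x_tr, y_tr[:, :-1]],                                  # text + summary minus its last word
    y_tr.reshape(y_tr.shape[0], y_tr.shape[1], 1)[:, 1:],  # summary shifted left by one word
    epochs=50,
    batch_size=128,
    callbacks=[es],
    validation_data=(
        [x_val, y_val[:, :-1]],
        y_val.reshape(y_val.shape[0], y_val.shape[1], 1)[:, 1:],
    ),
)
```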
Next, plot the training and validation loss metrics observed during the training phase.
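The loss curves are available in the history object returned by model.fit():

```python
# Plot training and validation loss per epoch
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='validation')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
```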
Figure: Training and Validation Loss (Loss vs. Epoch)
Now that we’ve trained the model, to generate summaries from given pieces of text, first reverse-map the indices to words (these mappings were previously generated using texts_to_sequences in Step 5). Also, map the words to indices from the summaries’ tokenizer; this is used to detect the start and end of sequences.
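Assuming your text and summary tokenizers from Step 5 are named x_tokenizer and y_tokenizer, the mappings look like this:

```python
# Index -> word mappings for decoding model output back into words
reverse_target_word_index = y_tokenizer.index_word
reverse_source_word_index = x_tokenizer.index_word

# Word -> index mapping, used to locate the sostok/eostok markers
target_word_index = y_tokenizer.word_index
```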
Now define the encoder and decoder inference models to start making predictions. Use the tensorflow.keras.Model() object to create your inference models.
The encoder inference model accepts the text and returns the output generated from the three LSTMs, along with the hidden and cell states. The decoder inference model accepts the start-of-sequence identifier (sostok) and predicts the upcoming word, eventually leading to predicting the whole summary.
Add the following code to define the inference models’ architecture.
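A sketch reusing the layers defined during training (dec_emb_layer, decoder_lstm, decoder_dense). Without attention, the decoder only needs the previous word and states; the encoder’s output sequence is still returned for inspection (an attention layer would consume it):

```python
# Encoder inference: text in; final LSTM output sequence and states out
encoder_model = Model(inputs=encoder_inputs,
                      outputs=[encoder_outputs, state_h, state_c])

# Decoder inference: previous word + previous states in; next-word distribution out
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))

# Reuse the trained embedding, LSTM, and Dense layers from the training model
dec_emb2 = dec_emb_layer(decoder_inputs)
decoder_outputs2, state_h_dec, state_c_dec = decoder_lstm(
    dec_emb2, initial_state=[decoder_state_input_h, decoder_state_input_c])
decoder_outputs2 = decoder_dense(decoder_outputs2)

decoder_model = Model(
    [decoder_inputs, decoder_state_input_h, decoder_state_input_c],
    [decoder_outputs2, state_h_dec, state_c_dec])
```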
Now define a function decode_sequence() which accepts the input text and outputs the predicted summary. Start with sostok and continue generating words until eostok is encountered or the maximum length of the summary is reached. Predict the upcoming word from a given word by choosing the word with the highest attached probability, and update the internal state of the decoder accordingly.
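A sketch of this greedy decoding loop, assuming max_summary_len from Step 3 and the mappings and inference models defined above:

```python
def decode_sequence(input_seq):
    # Encode the input text; keep the final hidden and cell states
    e_out, e_h, e_c = encoder_model.predict(input_seq, verbose=0)

    # Seed the decoder with the start-of-sequence token
    target_seq = np.zeros((1, 1))
    target_seq[0, 0] = target_word_index['sostok']

    decoded_sentence = ''
    while True:
        output_tokens, h, c = decoder_model.predict(
            [target_seq, e_h, e_c], verbose=0)

        # Greedily pick the most probable next word
        sampled_token_index = int(np.argmax(output_tokens[0, -1, :]))
        if sampled_token_index == 0:
            break  # padding index has no word mapping; stop decoding
        sampled_token = reverse_target_word_index[sampled_token_index]

        if sampled_token != 'eostok':
            decoded_sentence += ' ' + sampled_token

        # Stop at the end token or the maximum summary length
        if (sampled_token == 'eostok'
                or len(decoded_sentence.split()) >= (max_summary_len - 1)):
            break

        # Feed the sampled word and updated states back into the decoder
        target_seq = np.zeros((1, 1))
        target_seq[0, 0] = sampled_token_index
        e_h, e_c = h, c

    return decoded_sentence
```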
Define two functions, seq2summary() and seq2text(), which convert the numeric representation of the summary and the text, respectively, back into strings.
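A minimal sketch of both helpers, skipping the padding index (0) and the start/end markers:

```python
def seq2summary(input_seq):
    # Integer summary sequence -> words, dropping padding and sostok/eostok
    return ' '.join(
        reverse_target_word_index[i] for i in input_seq
        if i != 0 and reverse_target_word_index[i] not in ('sostok', 'eostok'))

def seq2text(input_seq):
    # Integer text sequence -> words, dropping padding
    return ' '.join(reverse_source_word_index[i] for i in input_seq if i != 0)
```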
Finally, generate the predictions by sending in the text.
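For example, to compare the model’s predictions against the reference summaries for the first few training samples:

```python
for i in range(0, 19):
    print('Review:', seq2text(x_tr[i]))
    print('Original summary:', seq2summary(y_tr[i]))
    print('Predicted summary:',
          decode_sequence(x_tr[i].reshape(1, max_text_len)))
    print()
```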
Here are a few notable summaries generated by the RNN model.
The Encoder-Decoder Sequence-to-Sequence Model (LSTM) we built generated acceptable summaries from what it learned in the training texts. Although after 50 epochs the predicted summaries are not exactly on par with the expected summaries (our model hasn’t yet reached human-level intelligence!), the intelligence our model has gained definitely counts for something.
To attain more accurate results from this model, you can increase the size of the dataset, play around with the hyperparameters of the network, try making it larger, and increase the number of epochs.
In this tutorial, you’ve trained an encoder-decoder sequence-to-sequence model to perform text summarization. In my next article you can learn all about attention mechanisms. Until then, happy learning!