
Spam Predictor Using Convolutional Neural Networks and Flask

Integrating machine learning into a web interface

Oct 27 · 8 min read

By David Lorenz , Kristi Dunks , and Serena Patel

Looking to make an easy-to-use internal prediction tool for your company, develop a prototype to pitch a machine learning product to potential investors, or show off your machine learning model to friends?

Thanks to Python’s Flask, it is simple to integrate machine learning models with a user-friendly HTML interface. This framework can be applied to any example where a user provides data and receives a prediction from a machine learning model. For example, X-ray technicians could upload a patient’s X-ray and immediately receive an automated diagnosis via image recognition. Flask is a solution to serve a model on the back-end of your application.

Our Use Case Example

Let’s imagine there are hundreds of thousands of cell phone users who are terrible at distinguishing spam text messages from the ones they receive from friends. As an innovator, you want to build a prototype website where users can enter a text message and receive an automated spam or ham verdict. You’ve come to the right place! You’ll learn (a) how to build a convolutional neural network to classify texts as ham or spam and (b) how to integrate this deep learning model with a front-end application using Flask.

Yes, chances are you’re fairly adept at determining whether a text is spam and aren’t in need of machine learning. We selected this use case because of the simplicity of the data and the problem: you can spend less time understanding the issue and more time understanding the tools! In addition, you will be able to run through this entire example easily on your own, and you won’t even need GPUs!

Designing a Deep Learning Model

Convolutional Neural Networks (CNNs) have numerous applications beyond image recognition. For example, CNNs have predictive power for time series forecasting and natural language processing (NLP). The input to a CNN is a matrix. In image recognition, each image’s pixels are coded as numerical values representing the intensity of color for each pixel.

We’ll focus on the NLP application of CNNs and train a Word CNN. A Word CNN’s input matrix includes rows representing words in a sentence and columns representing word embeddings of n dimensions. We will come back to word embeddings, but for now, consider the sentence “Yo we are watching a movie on netflix.” The matrix representation of this sentence with 10 dimensions is below.


Matrix representation of sentence with 10 dimensions (note: padding not pictured)

You might be wondering what these numbers above represent. Let’s unpack this a bit. Word embeddings are a generalized vector representation of a word. Words with similar contexts share similar vectors. As shown below, “I” and “we” have similar vector representations, but “netflix” is different.

Vector representations of “I”, “we”, and “netflix”

Why generalize words in this manner? Generalizing words with word embeddings helps prevent overfitting. You might be wondering what each dimension in the vector represents. The technical explanation is that they are the weights from a hidden layer in a neural network that predicts a given word in the context of surrounding words. Practically, it is helpful to think of these dimensions as attributes of a word. For example, one dimension could represent how happy or sad a word is.

There are two common ways to generate word embeddings: (1) use pre-trained word embeddings like Word2vec or GloVe, or (2) generate word embeddings from the samples you train on. Pre-trained word embeddings are built from a massive corpus of text and generalize well. Word embeddings generated from your own training data result in corpus-specific embeddings.

For more information on word embeddings, this article is helpful.

The Dataset

We use the SMS Spam Collection, downloadable here.

The data set consists of text messages that are classified as either ham (good) or spam (bad). An excerpt from the data set is shown below.


SMS Spam data excerpt


That’s a lot of ham!
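
If you want to follow along, a minimal sketch for loading and inspecting the data might look like the following. It assumes the tab-separated SMSSpamCollection file from the UCI repository, with a label column followed by the message text and no header row:

```python
# Minimal sketch of loading and inspecting the data (assumes the
# tab-separated "SMSSpamCollection" file: label <tab> message, no header).
import pandas as pd

df = pd.read_csv("SMSSpamCollection", sep="\t", header=None,
                 names=["label", "text"])

print(df.head())                   # peek at a few messages
print(df["label"].value_counts())  # ham far outnumbers spam
```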

Word CNNs with Keras

Keras makes it easy to create a Word CNN in just a few lines of code. For this model, we generate embeddings within our corpus using the Keras “embedding” layer. Note that the output from the embedding layer is a matrix, which is the necessary input to the convolutional layer.
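
The original listing is not reproduced here, but a Word CNN along these lines can be built in a few lines of Keras. The hyperparameters below (vocabulary size, sequence length, embedding dimensions, filter counts) are illustrative assumptions rather than the authors’ exact values:

```python
# Sketch of a Word CNN with a trainable Embedding layer; builds on the
# dataframe `df` loaded above. Hyperparameters are illustrative assumptions.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Conv1D, GlobalMaxPooling1D,
                                     Dense, Dropout)
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_WORDS = 5000   # vocabulary size (assumption)
MAX_LEN = 100      # padded sequence length (assumption)
EMBED_DIM = 50     # embedding dimensions (assumption)

texts = df["text"].astype(str).tolist()
labels = np.asarray((df["label"] == "spam").astype(int))  # 1 = spam, 0 = ham

tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(texts)
X = pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=MAX_LEN)

model = Sequential([
    Embedding(MAX_WORDS, EMBED_DIM, input_length=MAX_LEN),  # corpus-specific embeddings
    Conv1D(filters=64, kernel_size=3, activation="relu"),   # convolve over word windows
    GlobalMaxPooling1D(),
    Dense(64, activation="relu"),
    Dropout(0.5),
    Dense(1, activation="sigmoid"),                          # P(spam)
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X, labels, epochs=5, validation_split=0.2)
```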

In less than an hour of working with these data, we were able to achieve an accuracy of 98% on our test set. The power of CNNs!

Now that we have a trained model, we save the pre-trained weights, structure, and tokenizer. Our Flask application utilizes these files such that we don’t need to re-run the model training process each time the application is launched. This saves on time and computation. Our code to do this is below.
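
A sketch of that saving step might look like the following; the file names match those referenced later in the article (model.json, model.h5, tokenizer.pickle):

```python
# Persist the architecture, the trained weights, and the fitted tokenizer
# so the Flask app can load them without retraining.
import pickle

with open("model.json", "w") as json_file:
    json_file.write(model.to_json())      # model structure

model.save_weights("model.h5")            # trained weights

with open("tokenizer.pickle", "wb") as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)
```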

The source code and SMS data for the model can be found on GitHub here.

Serving the Model with Flask

We now create an interface to enable users to interact with our Word CNN. To do this, we create a REST (REpresentational State Transfer) API (Application Programming Interface) using Python Flask.

A RESTful API creates a link between the user and the server where the pre-trained model is hosted. You can think of it as the model always “listening”: waiting for the user to enter data, generating a prediction, and returning a response to that user. We use Python Flask to handle what the user enters and receives on the webpage. You can think of Python Flask as a bridge between the pre-trained model and the HTML page.

First, we load the pre-trained model and tokenizer for pre-processing.
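
A sketch of this loading step, using the files saved earlier:

```python
# Load the saved architecture, weights, and tokenizer inside the Flask app.
import pickle
from tensorflow.keras.models import model_from_json

with open("model.json", "r") as json_file:
    loaded_model = model_from_json(json_file.read())
loaded_model.load_weights("model.h5")

with open("tokenizer.pickle", "rb") as handle:
    tokenizer = pickle.load(handle)
```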

Next, we create a helper function that applies pre-processing. Why do we need to load the tokenizer? Without it, the identifiers assigned to the words the user enters would not match the identifiers assigned during training, and the Word CNN would interpret the input as an entirely different sentence from what the user typed. Loading the pickled tokenizer ensures consistency with the model training.
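
A sketch of such a helper, assuming the same padded sequence length that was used during training:

```python
# Map raw text to the padded array of word identifiers the Word CNN expects.
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN = 100  # must match the length used when training the model

def preprocess(texts):
    """Convert a list of raw strings into a padded array of word ids."""
    return pad_sequences(tokenizer.texts_to_sequences(texts), maxlen=MAX_LEN)
```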

Next, we compile our model and confirm it works with examples:
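
For example, something along these lines (the example sentences are ours, not the authors’):

```python
# Compile the loaded model and sanity-check it on a couple of examples.
loaded_model.compile(loss="binary_crossentropy", optimizer="adam",
                     metrics=["accuracy"])

examples = [
    "Yo we are watching a movie on netflix",                  # expected ham
    "WINNER!! You have been selected for a free cash prize",  # expected spam
]
print(loaded_model.predict(preprocess(examples)))  # probabilities of spam
```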

Now that we have the pre-processing and the pre-trained model loaded, we connect Flask to our HTML pages. We build two HTML pages: (1) search_page.html and (2) prediction.html. When the user first visits the webpage, the if condition below fails and search_page.html loads. This page contains an HTML id called “text_entered” that is part of a form.

When the user enters text and clicks the form’s submit button, “request.method” becomes “POST.” As a result, the text_entered value from the HTML form is read into textData, which is then converted to an array named Features containing a numeric identifier for each word. This array is fed through the pre-trained Word CNN to generate a prediction representing the probability that the message is spam.

Then, render_template sends the prediction.html page to the user. On this page, the prediction from the Word CNN is inserted into “{{ prediction }}” in the HTML code.
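
The routing described above might look roughly like this. The template and field names (search_page.html, prediction.html, text_entered) follow the article, while the route itself and the variable handling are assumptions:

```python
# Flask routing: GET renders the search page, POST runs the prediction.
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def predict():
    if request.method == "POST":
        textData = request.form["text_entered"]                    # text from the HTML form
        Features = preprocess([textData])                          # numeric word identifiers
        prediction = float(loaded_model.predict(Features)[0][0])   # probability of spam
        return render_template("prediction.html", prediction=prediction)
    return render_template("search_page.html")
```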

Our last step is to define where we will run this locally:
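
A sketch of that final step, matching the address used in the testing steps below:

```python
# Run the Flask development server locally on port 5000.
if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```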

For testing, after you execute the lines of code above, visit “0.0.0.0:5000” in a web browser to load search_page.html. Before doing so, ensure that (1) the HTML files are saved in a folder named “templates” and (2) model.h5, model.json, and tokenizer.pickle are saved in the same directory from which you are running the Python script or Jupyter notebook. We can now test the application locally.

  1. Execute all lines of code in the Flask application (wait until you see the “* Running on…” message)
  2. Open a browser like Google Chrome and visit “0.0.0.0:5000”
  3. Enter an example like the one we have below


HTML interface to enter message text

4. Click “Get spam prediction!”, which returns the page below.


HTML output of prediction

5. Observe the output in the Jupyter notebook, which prints each step in the predict function

Jupyter notebook with application running locally and user entering example

Success! We entered a ham example and the application returned a 2% chance of spam. Our app works locally.
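
If you prefer a quick programmatic check alongside the browser test, something like the following also works (the requests library is an extra dependency and not part of the article’s setup):

```python
# Post a sample message to the locally running app and check the response.
import requests

resp = requests.post("http://localhost:5000/",
                     data={"text_entered": "Are we still on for dinner tonight?"})
print(resp.status_code)   # 200 if prediction.html rendered
print(resp.text[:200])    # start of the rendered prediction page
```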

Deploying to the Cloud

With your local application in place and functional, you now may want to launch it to the cloud so that users anywhere can interact with it. However, debugging in the cloud is much more difficult than debugging locally. Thus, we recommend testing locally before moving to the cloud.

Launching to the cloud is easy!

If you’re using a Jupyter notebook, you’ll need to create a main.py file.

You will need to upload the following to Google Cloud:

  1. main.py
  2. the templates folder
  3. a yaml file
  4. the h5 file with weights
  5. the json file with model framework
  6. the pickled tokenizer
  7. a requirements.txt file

Note that you need to specify the gunicorn version in requirements.txt so that Google Cloud can connect to the Python web server and install the libraries listed in that file. Pickle is not required in requirements.txt (it comes standard with Python), and Google Cloud will return an error if you include it.
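
As a rough illustration, a minimal requirements.txt for this app might contain something like the lines below (the exact package set and versions depend on your environment; note that pickle is deliberately absent):

```
Flask
gunicorn
tensorflow
pandas
```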

Next, we type the following commands in the cloud to get this running:

  1. cd [name of folder]
  2. gcloud app deploy

For additional detail on launching to Google Cloud, see the step-by-step guide on GitHub here. In addition, this video provides a good reference for the Google Cloud deployment steps.

A Spam Predictor is Born

Now you have created a usable application that allows you to determine whether a message is spam or ham.

The source code for the Flask application can be found on GitHub here.

Interested in seeing another Flask + Word CNN use case? Watch our video on adverse reactions to drugs here.

Note that Flask is great for prototypes and applications that have a limited number of users (e.g., an internal tool for a small user base at a company). Flask is not intended for production-grade model serving with thousands of users.

Flask and Image Recognition

You may be thinking: this is great, but I want to use Flask with deep learning for image recognition. You’re in luck! While we focused on Word CNNs in this example, the approach is similar for images. The GitHub repositories linked below feed user-uploaded images to a CNN for image recognition.

A Convolutional Neural Network to diagnose pneumonia in children using labeled images from UCSD can be found here.

A Python Flask application that accepts an image uploaded by an X-ray technician and returns a diagnosis (healthy or pneumonia?) can be found here.

