Moderating text with the Natural Language API

Photo by Da Nina on Unsplash

The Natural Language API lets you extract information from unstructured text using Google machine learning and addresses the following tasks:

Sentiment analysis
Entity analysis
Entity sentiment analysis
Syntax analysis
Content classification
Text moderation (preview)
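
Each of these features corresponds to a method of the client library's `LanguageServiceClient`. The sketch below is a hypothetical dispatch table: `FEATURE_METHODS` and `run_feature` are illustrative names, not part of the library. It picks the client method by feature name, so any object exposing those methods (a real client or a test fake) works.

```python
from typing import Any

# Hypothetical mapping from the features listed above to the
# LanguageServiceClient method that implements each one.
FEATURE_METHODS: dict[str, str] = {
    "Sentiment analysis": "analyze_sentiment",
    "Entity analysis": "analyze_entities",
    "Entity sentiment analysis": "analyze_entity_sentiment",
    "Syntax analysis": "analyze_syntax",
    "Content classification": "classify_text",
    "Text moderation": "moderate_text",
}


def run_feature(client: Any, feature: str, document: Any) -> Any:
    """Call the client method that implements `feature` on `document`."""
    method_name = FEATURE_METHODS[feature]  # raises KeyError for unknown features
    method = getattr(client, method_name)
    return method(document=document)
```

With a real client, `run_feature(language.LanguageServiceClient(), "Text moderation", document)` is equivalent to calling `client.moderate_text(document=document)` directly.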

🔍 Moderation categories

Text moderation (now available in preview) lets you detect sensitive or harmful content. The first moderation category that comes to mind is "toxicity", but many other topics can be of interest. A PaLM 2-based model powers the predictions and returns a confidence score for each of 16 categories:

Toxic	Insult	Public Safety	War & Conflict
Derogatory	Profanity	Health	Finance
Violent	Death, Harm & Tragedy	Religion & Belief	Politics
Sexual	Firearms & Weapons	Illicit Drugs	Legal
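
If your code depends on specific categories, it can help to keep the documented list as a constant and check responses against it, for example to notice a category added while the feature is in preview. A minimal sketch; `MODERATION_CATEGORIES` and `unknown_categories` are hypothetical names, and the spellings are assumed to match the `name` field returned by the API (they match the sample output in this post).

```python
# The 16 moderation categories listed above, spelled as in the API's `name` field.
MODERATION_CATEGORIES: frozenset[str] = frozenset({
    "Toxic", "Insult", "Public Safety", "War & Conflict",
    "Derogatory", "Profanity", "Health", "Finance",
    "Violent", "Death, Harm & Tragedy", "Religion & Belief", "Politics",
    "Sexual", "Firearms & Weapons", "Illicit Drugs", "Legal",
})


def unknown_categories(names: list[str]) -> set[str]:
    """Return the names that are not among the 16 documented categories."""
    return set(names) - MODERATION_CATEGORIES
```

For a response, call `unknown_categories([c.name for c in response.moderation_categories])`; an empty set means nothing unexpected came back.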

⚡ Moderating text

As always, you can call the API through the REST/RPC interfaces or with idiomatic client libraries.
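
With the REST interface, a request is a JSON document sent to the `documents:moderateText` method. The sketch below only builds the request body and parses a response-shaped string, so it runs offline. The v1 endpoint URL and the camelCase response field `moderationCategories` are assumptions to verify against the API reference for the version you target; `build_request_body` and `parse_scores` are hypothetical helpers.

```python
import json

# Assumed v1 REST endpoint for text moderation.
MODERATE_URL = "https://language.googleapis.com/v1/documents:moderateText"


def build_request_body(text: str) -> str:
    """Serialize a moderateText request for a plain-text document."""
    body = {"document": {"type": "PLAIN_TEXT", "content": text}}
    return json.dumps(body)


def parse_scores(response_json: str) -> dict[str, float]:
    """Map category name -> confidence from a moderateText JSON response."""
    response = json.loads(response_json)
    return {
        category["name"]: category["confidence"]
        for category in response.get("moderationCategories", [])
    }
```

To send the request, POST the body to `MODERATE_URL` with an `Authorization: Bearer <access token>` header and `Content-Type: application/json`, then pass the response text to `parse_scores`.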

Here is an example using the Python client library (google-cloud-language) and the moderate_text method:

from google.cloud import language

def moderate_text(text: str) -> language.ModerateTextResponse:
    client = language.LanguageServiceClient()
    document = language.Document(
        content=text,
        type_=language.Document.Type.PLAIN_TEXT,
    )
    return client.moderate_text(document=document)

text = (
    "I have to read Ulysses by James Joyce.\n"
    "I'm a little over halfway through and I hate it.\n"
    "What a pile of garbage!"
)
response = moderate_text(text)

🚀 It's fast! The model latency is low enough for real-time analysis.
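
To check latency on your own traffic, you can time each call. A minimal sketch using `time.perf_counter`; `timed` is a hypothetical helper, and what it measures is the full wall-clock time of the call, including the network round trip, not just model inference.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def timed(fn: Callable[..., T], *args, **kwargs) -> tuple[T, float]:
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

For example, `response, seconds = timed(moderate_text, text)` reuses the function defined above.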

The response contains a confidence score for each moderation category. Let's sort them by decreasing confidence:

import pandas as pd

def confidence(category: language.ClassificationCategory) -> float:
    return category.confidence

columns = ["category", "confidence"]
categories = sorted(
    response.moderation_categories,
    key=confidence,
    reverse=True,
)
data = ((category.name, category.confidence) for category in categories)
df = pd.DataFrame(columns=columns, data=data)

print(f"Text analyzed:\n{text}\n")
print(f"Moderation categories:\n{df}")
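
If you only need the ranking and would rather not depend on pandas, the standard library is enough: `operator.attrgetter("confidence")` does the job of the `confidence()` helper. In this sketch, `rank` is a hypothetical helper, and `SimpleNamespace` objects stand in for real API results (three scores copied from the sample output in this post) so it runs offline.

```python
from operator import attrgetter
from types import SimpleNamespace
from typing import Any, Iterable


def rank(categories: Iterable[Any]) -> list[tuple[str, float]]:
    """Return (name, confidence) pairs sorted by decreasing confidence."""
    ordered = sorted(categories, key=attrgetter("confidence"), reverse=True)
    return [(category.name, category.confidence) for category in ordered]


# Stand-ins for response.moderation_categories; real items are
# ClassificationCategory objects exposing the same two fields.
sample = [
    SimpleNamespace(name="Insult", confidence=0.609475),
    SimpleNamespace(name="Sexual", confidence=0.028222),
    SimpleNamespace(name="Toxic", confidence=0.680873),
]

for name, confidence in rank(sample):
    print(f"{name:>8}  {confidence:.6f}")
```

With a real response, call `rank(response.moderation_categories)`.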

You can typically ignore scores below 50% and calibrate your solution by defining thresholds (or buckets) for the confidence scores. In this example, depending on your thresholds, you may flag the text as disrespectful (toxic) and insulting:

Text analyzed:
I have to read Ulysses by James Joyce.
I'm a little over halfway through and I hate it.
What a pile of garbage!

Moderation categories:
                 category  confidence
0                   Toxic    0.680873
1                  Insult    0.609475
2               Profanity    0.482516
3                 Violent    0.333333
4                Politics    0.237705
5   Death, Harm & Tragedy    0.189759
6                 Finance    0.176955
7       Religion & Belief    0.151079
8                   Legal    0.100946
9                  Health    0.096305
10          Illicit Drugs    0.083333
11     Firearms & Weapons    0.076923
12             Derogatory    0.073953
13         War & Conflict    0.052632
14          Public Safety    0.051813
15                 Sexual    0.028222
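
The thresholding and bucketing described above can be sketched as follows. The 0.5 and 0.8 cut-offs and the "medium"/"high" labels are hypothetical; calibrate them on your own data. The input is a plain `{category: confidence}` dictionary, which you can build with `{c.name: c.confidence for c in response.moderation_categories}`.

```python
from typing import Optional

# Hypothetical buckets, checked from the highest threshold down.
BUCKETS: list[tuple[float, str]] = [
    (0.8, "high"),
    (0.5, "medium"),
]


def bucket(confidence: float) -> Optional[str]:
    """Return the bucket label for a score, or None if it is below 50%."""
    for threshold, label in BUCKETS:
        if confidence >= threshold:
            return label
    return None


def flag(scores: dict[str, float]) -> dict[str, str]:
    """Return {category: bucket} for the categories worth flagging."""
    flagged = {}
    for name, confidence in scores.items():
        label = bucket(confidence)
        if label is not None:
            flagged[name] = label
    return flagged
```

With the scores above, Toxic (0.68) and Insult (0.61) fall in the "medium" bucket and Profanity (0.48) is ignored, which matches the flags suggested for this example.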

🖖 More

Run this Colab notebook: Using the Natural Language API
See the supported languages
Read more about text moderation
Follow me on Twitter or LinkedIn for more cloud explorations
