How Banned Words Lists May Be Hurting Your Engagement (And What You Can Do About It)

Overcoming the Limitations of Banned Words Lists

Photo: Raphael Schaller via Unsplash

In the ever-evolving landscape of social media, where words wield power and engagement reigns supreme, platforms have introduced a seemingly potent tool: banned words lists. The concept appears simple and well-intentioned: create a sanitized online environment by filtering out objectionable language. However, as we delve deeper into the dynamics of fan engagement and content moderation, a concerning reality emerges:

What initially seems like a safeguard may, in fact, be a double-edged sword, potentially doing more harm than good. In this blog post, we unravel the intricacies of banned words lists and their unexpected consequences on fan engagement, shedding light on how these well-intentioned services may be undermining the very essence of your organization’s social media goals.

The Ineffectiveness of Banned Words Lists

Banned words lists have proven ineffective in combating online abuse due to their limited scope and inability to keep up with evolving abusive behaviours. They’re a great start to your moderation journey, but their limitations can outweigh their strengths. To be effective, moderation needs a holistic approach that considers both language and behaviour to create a safer online environment.

Erasing authentic, emotional connections: over-blocking and under-blocking

Over-Blocking 

Over-blocking occurs when innocuous content gets mistakenly flagged and removed. Many people use their banned words lists to automatically remove profanity from their communities. Seems harmless, right? Well, it’s not.

Take sports for example.

In sports, positive profanity is used all the time to express joy (or sorrow) about one’s team or favourite athlete. And while our English teachers may not appreciate it, the use of profanity in everyday language is rife in sport, and your biggest fans may be the ones using profanity the most. Phrases like “f**k ya!” or, if you’re Canadian, “f**kin’ eh!” are used prolifically both on the sidelines and online in sports communities.

Consider a sports team's social media page where fans passionately express their love for the team using spirited language. When positive profanity, such as playful team-related banter or celebratory expletives, is swiftly removed, it not only dampens the fan's enthusiasm but also erases the authentic, emotional connection fans have with the brand. Similarly, in the world of entertainment, a TV show's official social media account might share fans' excited reactions, including humorous or slightly edgy comments, which contribute to the show's vibrant online community. 

By over-moderating and scrubbing such interactions, the brand can inadvertently alienate its most dedicated supporters and dilute the engagement that makes social media special. Striking the right balance between maintaining a respectful atmosphere and allowing for genuine fan expression is crucial for building strong, enduring relationships on social media. Banned words lists don’t give you that balance.
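To make the over-blocking failure concrete, here’s a minimal sketch of how a naive word-list filter behaves. The word list and comments are invented for illustration; real filters are longer, but the failure mode is the same: the filter has no way to tell celebratory profanity from hostile profanity.

```python
import re

# Hypothetical banned-words list, invented for illustration.
BANNED_WORDS = {"damn", "hell", "crap"}

def is_blocked(comment: str) -> bool:
    """Naive word-list filter: block any comment containing a banned word."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return any(token in BANNED_WORDS for token in tokens)

# A celebratory fan comment is removed right alongside genuinely hostile ones.
print(is_blocked("Damn, what a goal!"))       # True  -> over-blocked fan joy
print(is_blocked("Great win tonight, boys!")) # False
```

The filter sees only the word, never the sentiment, so the fan’s enthusiasm and an insult containing the same word are treated identically.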

Under-Blocking 

Your social media team might strive to encourage healthy discussions but may inadvertently allow derogatory comments to fester in the comment section. Under-blocking can occur for various reasons, including false negatives, where harmful content is simply not recognized as such.

When it comes to banned words lists, under-blocking leads to false negatives, which often leads to negative consequences such as harassment, hate speech, the spread of misinformation and the creation of toxic online environments. In short, under-blocking happens when abusive content manages to slip through the blocked words filter, allowing it to spread and cause harm.

Here’s how it works.

There are plenty of phrases in English (and in all the languages Areto moderates in) where the combination of words within a phrase creates a meaning very different from the individual words in that phrase. Take the phrase “go back to the kitchen”. (We see that phrase pop up a lot in women’s sports and media feeds.)

The words “go”, “back” and “kitchen” are innocuous alone, but together create a sexist insult. Unless a brand is willing to over-block and remove every comment that includes the word “kitchen”, a blocked words list will under-block harmful, brand-damaging content, letting heaps of false negatives slip through the cracks.
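The trade-off above can be sketched in a few lines. Again, the word list is invented for illustration; the point is that no single-word list can catch the phrase without also catching innocent comments:

```python
import re

# Hypothetical word list -- none of these words appears in the sexist phrase.
BANNED_WORDS = {"idiot", "loser"}

def is_blocked(comment: str) -> bool:
    """Check each word of the comment against the list, one word at a time."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return any(token in BANNED_WORDS for token in tokens)

# The insult lives in the phrase, not in any single word, so it sails through.
print(is_blocked("go back to the kitchen"))  # False -> under-blocked abuse

# The blunt fix -- banning "kitchen" outright -- over-blocks innocent comments.
BANNED_WORDS.add("kitchen")
print(is_blocked("love the new stadium kitchen menu"))  # True -> false positive
```

Either the sexist phrase gets through, or every mention of “kitchen” gets removed; a word-by-word filter offers no third option.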

And let’s face it, taking into consideration all the creative ways you can lob insults without using profanity or slurs, you can see why banned words lists come up short.

Because abusive content that slips past the filter spreads, causes harm, and alienates your fans, relying solely on banned words lists often falls short in effectively moderating online discussions.

Are you over- or under-blocking abuse & inadvertently hurting your engagement and reach? Get in touch to find out!


Other Ways of Evading Banned Words Lists

Evasion and Creativity

Speaking of creativity, online abusers are adept at finding ways to circumvent word filters. They intentionally misspell words, use alternate characters, or make substitutions to evade detection. These tactics render banned words lists futile, as abusers continuously adapt their abusive language. For example, if a platform blocks the word "hate," abusers might use variations like "h8" or "h@te" to get their abusive messages through.
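Here’s a small sketch of why those substitutions work, and why patching them is a losing game. The word list and the substitution table are invented for illustration; a real filter would need a far larger table, and abusers would still move past it:

```python
import re

BANNED_WORDS = {"hate"}

def is_blocked(comment: str) -> bool:
    """Exact-match word filter with no normalization."""
    return any(t in BANNED_WORDS for t in re.findall(r"[a-z]+", comment.lower()))

# Trivial character substitutions defeat the exact-match list.
print(is_blocked("i h8 you"))    # False
print(is_blocked("i h@te you"))  # False

# A normalization pass (one of many a filter would need) closes some gaps...
LEET = str.maketrans({"@": "a", "8": "ate", "0": "o", "3": "e", "$": "s"})

def is_blocked_normalized(comment: str) -> bool:
    """Undo common substitutions before matching."""
    normalized = comment.lower().translate(LEET)
    return any(t in BANNED_WORDS for t in re.findall(r"[a-z]+", normalized))

print(is_blocked_normalized("i h8 you"))  # True
# ...but abusers simply move on: "haaate", Unicode lookalikes, new slang.
```

Every substitution table you add is answered by a new trick, which is why the cat-and-mouse game favours the abuser.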

Contextual Ambiguity

Blocked word filters often lack the ability to analyze context, leading to both false positives and false negatives. Innocent discussions about sensitive topics or legitimate uses of certain words may get flagged and blocked, while abusive language used in a subtle manner may go undetected. Relying solely on individual words or phrases fails to capture the complex nature of online abuse.

Language Evolution

Languages are dynamic and constantly evolving, giving rise to new terms, slang, and coded language. Maintaining an up-to-date blocked words list becomes a Sisyphean task as abusers swiftly adopt new terms. Online platforms struggle to keep pace with the ever-changing landscape of online abuse.

Multilingual Challenges

Global social media platforms cater to diverse user bases speaking various languages. Creating comprehensive blocked word lists that cover all languages is a herculean task. Abusers can effortlessly switch to another language or use transliterations, circumventing language-specific filters.

Focusing on Words, Not Behaviour

Banned words lists target specific words or phrases without considering the overall behaviour and intent of the user. Abusers can modify their language or resort to other forms of abuse, such as targeted harassment or the use of memes and images to spread hate. Word filters alone cannot effectively address these abusive behaviours.


We put Areto’s software head-to-head against a banned words list to see which one better detected context. Here’s what we found.


Protect your engagement and your fan community today!

Augmenting your banned words lists with AI-powered software will help you protect your community while building your fanbase and safeguarding your engagement metrics.

To protect your engagement, or to find out how your community is performing compared to your competitors, get in touch today.


How to fight abuse while protecting engagement

Solutions like Areto offer organizations a way around the limitations of banned words lists by employing natural language processing (a type of AI) to better detect nuance, including positive profanity and subtle forms of abuse.

Advanced Machine Learning Algorithms
Areto leverages advanced machine learning algorithms to analyze patterns of abusive behaviour rather than relying solely on individual words or phrases. This allows it to detect abusive content in context and adapt to evolving abusive tactics.

Behavioural Analysis
Unlike traditional filters, Areto considers the overall behaviour and intent of users. By examining patterns of abusive behaviour, it can identify and flag content that may otherwise slip through the cracks. This holistic approach offers a more nuanced understanding of online abuse.

Continuous Learning and Adaptation
Areto constantly learns from new data and user feedback, staying ahead of emerging abusive language and tactics. Its adaptive nature ensures that it remains effective in combating evolving forms of online abuse, providing a safer online environment.

Multilingual Capability
Areto is designed to handle multilingual challenges, taking into account the nuances and expressions of different languages. It can detect abusive content across multiple languages, reducing the chances of abuse going undetected due to language barriers.

User Customization and Feedback
Areto empowers users by providing customizable moderation settings. Users can define their own thresholds and preferences, tailoring the tool to their specific needs. Additionally, the feedback loop allows users to report false positives and contribute to the continuous improvement of the moderation system.
