Beginner’s Guide to LLMs: Build a Content Moderation Filter and Learn Advanced Prompting with Free Groq API

Devesh Surve
8 min read · Jun 5, 2024


Introduction

Hey everyone!

So one thing I’ve learnt over the years is that the best way to build a useful application with an LLM (Large Language Model) usually starts with identifying a practical problem. While sentiment analysis (checking whether a piece of text is positive, negative, or neutral) is a common use case that many people have explored, it only scratches the surface of what LLMs can do. (And honestly, it’s a bit boring.)

So recently, I was searching for a free API to experiment with different LLM prompting techniques and discovered the Groq API. It gives you API access to one of the best open-source LLMs (Llama 3), and it’s completely free!

But again, creating a simple sentiment analyzer felt too basic. To push the boundaries further, I decided to build a content moderation filter, integrating techniques like chain-of-thought (CoT) and few-shot prompting to enhance the model’s reasoning. Don’t worry about these terms yet; this blog will guide you through building such a moderation filter, showcasing an industry-relevant use case step by step.

But First, why a Content Moderation Filter?

Content moderation is crucial for maintaining safe and respectful online communities. With user-generated content flooding the internet, automated moderation tools are essential to classify and filter harmful or inappropriate content efficiently. By leveraging LLMs, we can create customizable and scalable moderation systems.

Real-World Examples of Content Moderation

Imagine you’re running a social media platform or a forum. Users post content all the time, and some of it might be harmful or inappropriate. Content moderation tools help automatically identify and filter out this content to keep the community safe.

Importance of Automated Moderation Tools

With the vast amount of user-generated content online, manual moderation isn’t feasible. Automated tools, powered by LLMs, can efficiently classify and filter content based on predefined rules.

Basic Approach

The basic approach involves defining the moderation rules and categories directly in the prompt, making it easy to customize and experiment with.

What is a Prompt?

A prompt is a set of instructions you give to the language model to get the desired output. For content moderation, the prompt includes guidelines for what content should be allowed or blocked.

Example Prompt Structure

Here’s a simple structure for a content moderation prompt:

You are a content moderation expert tasked with categorizing user-generated 
text based on the following guidelines:

BLOCK CATEGORY:
- [Description or examples of content that should be blocked]
ALLOW CATEGORY:
- [Description or examples of content that is allowed]

Here is the user-generated text to categorize:

<user_text>{{USER_TEXT}}</user_text>

You replace {{USER_TEXT}} with the actual user-generated text to be classified, and then send the prompt to the language model using the appropriate API. The model's response should be either "ALLOW" or "BLOCK," indicating how the text should be handled based on your provided guidelines.

Getting Your Groq API Key

To use the Groq API, you first need to obtain an API key.

What is an API Key?

An API key is a code that a program passes along with its requests to identify itself to a service. The Groq API uses this key to authenticate your requests.

Step-by-Step Guide

  1. Sign Up: Go to the Groq API website — https://console.groq.com/login — and sign up for an account if you don’t already have one.
  2. Generate API Key: Once logged in, navigate to the API keys section of your dashboard — https://console.groq.com/keys
  3. Create a New Key: Click on “Create API Key” and give it a name that you can easily identify.
  4. Copy the Key: Once the key is generated, copy it and keep it in a secure place.

You will use this key in your code to authenticate your requests to the Groq API.

Here’s the full Colab reference notebook — https://colab.research.google.com/drive/1pUipKwlhCmbSZoZSlkCBlmmJBO2C9UcB?usp=sharing

Example Usage with Groq API

Let’s dive into an example that demonstrates how to use this approach with Llama 3 via the Groq API.

Simple Example

Step 1: Configure the Groq API

First, set up the Groq API by initializing the client with your API key.

import os
from groq import Groq

# Configure Groq API
groq_api_key = 'your GROQ API KEY'
os.environ["GROQ_API_KEY"] = groq_api_key

client = Groq(api_key=os.environ.get("GROQ_API_KEY"))
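
If this is your first time using the client, make sure the groq Python package is installed (pip install groq). Before wiring up the moderation function, you can also fire off a tiny throwaway request to confirm your key works. Here’s a quick sanity check, just a sketch using the same llama3-8b-8192 model we’ll use for moderation below:

# Optional sanity check: a tiny request to confirm the API key works.
# Uses the same llama3-8b-8192 model as the moderation examples below.
test_completion = client.chat.completions.create(
    messages=[{"role": "user", "content": "Reply with the single word OK."}],
    model="llama3-8b-8192",
)
print(test_completion.choices[0].message.content)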

Step 2: Define the Moderation Function

Next, define a function moderate_text that takes the user-generated text and the guidelines as inputs.

def moderate_text(user_text, guidelines):
    prompt_template = """
You are a content moderation expert tasked with categorizing user-generated text based on the following guidelines:

{guidelines}

Here is the user-generated text to categorize:
<user_text>{user_text}</user_text>

Based on the guidelines above, classify this text as either ALLOW or BLOCK. Return nothing else.
"""

Step 3: Format the Prompt

Format the prompt by inserting the user-generated text and the guidelines into the template.

    # Format the prompt with the user text and guidelines
    prompt = prompt_template.format(user_text=user_text, guidelines=guidelines)

Step 4: Send the Prompt to Groq API

Send the formatted prompt to the Groq API and get the response.

    # Send the prompt to the Groq API and get the response
    chat_completion = client.chat.completions.create(
        messages=[
            {"role": "user", "content": prompt}
        ],
        model="llama3-8b-8192"
    )

Step 5: Return the Response

Return the response, which will be either “ALLOW” or “BLOCK.”

    return chat_completion.choices[0].message.content.strip()

Full Example

Here’s how you can use the moderate_text function to moderate an array of user comments:

# Example guidelines
example_guidelines = '''BLOCK CATEGORY:
- Promoting violence, illegal activities, or hate speech
- Explicit sexual content
- Harmful misinformation or conspiracy theories

ALLOW CATEGORY:
- Most other content is allowed, as long as it is not explicitly disallowed
'''

user_comments = [
    "This movie was great, I really enjoyed it. The main actor really killed it!",
    "Delete this post now or you better hide. I am coming after you and your family.",
    "Stay away from the 5G cellphones!! They are using 5G to control you.",
    "Thanks for the helpful information!",
]

for comment in user_comments:
    classification = moderate_text(comment, example_guidelines)
    print(f"Comment: {comment}\nClassification: {classification}\n")

Here’s how the output would look (the model’s exact responses can vary from run to run, but with these guidelines you’d expect something like this):
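
Comment: This movie was great, I really enjoyed it. The main actor really killed it!
Classification: ALLOW

Comment: Delete this post now or you better hide. I am coming after you and your family.
Classification: BLOCK

Comment: Stay away from the 5G cellphones!! They are using 5G to control you.
Classification: BLOCK

Comment: Thanks for the helpful information!
Classification: ALLOW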

Yay! You just created your first LLM API application!

Improving the Prompt

Now, you can easily customize the moderation rules by modifying the descriptions or examples provided in the prompt for the “BLOCK” and “ALLOW” categories.

How to Modify Guidelines

  1. Identify the Categories: Decide what types of content should be blocked or allowed.
  2. Write Descriptions: Write clear and detailed descriptions or examples for each category.
  3. Update the Prompt: Modify the prompt template with your new guidelines.

For example, if you wanted to moderate a rollercoaster enthusiast forum and ensure posts stay on topic, you could update the guidelines accordingly:

rollercoaster_guidelines = '''BLOCK CATEGORY:
- Content that is not related to rollercoasters, theme parks, or the amusement industry
- Explicit violence, hate speech, or illegal activities
- Spam, advertisements, or self-promotion

ALLOW CATEGORY:
- Discussions about rollercoaster designs, ride experiences, and park reviews
- Sharing news, rumors, or updates about new rollercoaster projects
- Respectful debates about the best rollercoasters, parks, or ride manufacturers
- Some mild profanity or crude language, as long as it is not directed at individuals
'''

post_titles = [
    "Top 10 Wildest Inversions on Steel Coasters",
    "My Review of the New RMC Raptor Coaster at Cedar Point",
    "Best Places to Buy Cheap Hiking Gear",
    "Rumor: Is Six Flags Planning a Giga Coaster for 2025?",
    "My Thoughts on the Latest Marvel Movie",
]

for title in post_titles:
    classification = moderate_text(title, rollercoaster_guidelines)
    print(f"Title: {title}\nClassification: {classification}\n")

Further Improving Performance with Chain of Thought (CoT)

To enhance content moderation capabilities, “chain-of-thought” (CoT) prompting encourages the language model to break down its reasoning process into a step-by-step chain of thoughts rather than just providing the final output.

What is Chain of Thought (CoT)?

Chain of Thought prompting helps the model think through the problem step-by-step, making its reasoning process more transparent and reliable.

So we take a TASK and break it down into TASK 1 -> TASK 2 -> and so on.

For this case, we can take this single task:

Based on the guidelines above, classify this text as either ALLOW or BLOCK. 
Return nothing else.

And convert it into a series of actions like:

First, inside of <thinking> tags, identify any potentially concerning aspects
of the post based on the guidelines above and consider whether those
aspects are serious enough to block the post or not.

Finally, classify this text as either ALLOW or BLOCK inside <output> tags.
Return nothing else.

Given those instructions, here is the post to categorize:

Example of CoT Prompt

Here’s how you can modify your prompt to use CoT:

cot_prompt = '''You are a content moderation expert tasked with categorizing 
user-generated text based on the following guidelines:

BLOCK CATEGORY:
- Content that is not related to rollercoasters, theme parks, or the amusement industry
- Explicit violence, hate speech, or illegal activities
- Spam, advertisements, or self-promotion

ALLOW CATEGORY:
- Discussions about rollercoaster designs, ride experiences, and park reviews
- Sharing news, rumors, or updates about new rollercoaster projects
- Respectful debates about the best rollercoasters, parks, or ride manufacturers
- Some mild profanity or crude language, as long as it is not directed at individuals

First, inside of <thinking> tags, identify any potentially concerning aspects
of the post based on the guidelines above and consider whether those
aspects are serious enough to block the post or not.

Finally, classify this text as either ALLOW or BLOCK inside <output> tags.
Return nothing else.

Given those instructions, here is the post to categorize:

<user_post>{user_post}</user_post>'''

post_titles = [
    "Top 10 Wildest Inversions on Steel Coasters",
    "My Review of the New RMC Raptor Coaster at Cedar Point",
    "Best Places to Buy Cheap Hiking Gear",
    "Rumor: Is Six Flags Planning a Giga Coaster for 2025?",
    "My Thoughts on the Latest Marvel Movie",
]

for title in post_titles:
    # Build the prompt from cot_prompt this time so the model reasons step by step
    prompt = cot_prompt.format(user_post=title)
    chat_completion = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model="llama3-8b-8192"
    )
    print(f"Title: {title}\nResponse: {chat_completion.choices[0].message.content.strip()}\n")

Conclusion

By using the Groq API and leveraging few-shot prompting and chain-of-thought reasoning, you can build a sophisticated and customizable content moderation filter.

Key Takeaways

  • Understand LLMs: Learn what LLMs are and their capabilities.
  • Practical Applications: See real-world examples of content moderation.
  • Hands-On Practice: Follow step-by-step instructions to build and customize your own content moderation filter.

Next Steps

I would suggest exploring LLMs and advanced prompting techniques further. Experiment with different use cases like chatbots or recommendation systems. Engage with online forums and communities to learn from others, share your projects, or reach out to me for ideas!

Additional resources:

Thanks for reading! If you found this article useful, please leave a comment or a clap.

Follow me to stay updated on my latest articles!


Devesh Surve

Grad student by day, lifelong ML/AI explorer by night. I dive deep, then share easy-to-understand, step-by-step guides to demystify the complex.