With the increasing adoption of Large Language Models (LLMs) in production for chat and RAG, it is increasingly important to ensure safe and controlled interactions. Today, we’ll dive deep into LLM guardrails: what they are, how they work, and how to implement them using AWS Bedrock. We’ll keep things practical with concrete examples, so let’s get started!
What Are LLM Guardrails
LLM guardrails are boundaries and controls we put in place to ensure our LLMs behave appropriately and safely. Think of them as safety barriers that:
- Filter out inappropriate content.
- Prevent the model from discussing forbidden topics.
- Ensure responses align with business policies.
- Maintain consistent persona and tone.
- Protect sensitive information.
Unlike simple prompt engineering, guardrails are systematic controls that work at different levels of the interaction pipeline. They can be implemented through:
- Input filtering: Prevent certain types of prompts from ever reaching the LLM, for example prompts containing passwords or touching on forbidden topics.
- Output filtering: Sanitize LLM responses by masking or removing content and catching accidental data leaks.
- Context injection: Add specific instructions, such as guidelines, to the prompt.
- Response validation: Check whether outputs meet defined criteria, for example whether a response correctly uses its retrieved context (from a RAG perspective). A minimal sketch of the first two levels follows this list.
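To make the first two levels concrete, here is a minimal, framework-agnostic sketch in Python. The patterns and helper names (`is_input_allowed`, `redact_output`) are purely illustrative and not part of any Bedrock API:

```python
import re

# Illustrative patterns only; a real deployment would use far more robust detection.
BLOCKED_INPUT_PATTERNS = [
    re.compile(r"password\s*[:=]", re.IGNORECASE),  # crude credential detection
    re.compile(r"\b\d{13,16}\b"),                   # crude card-number-like check
]

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def is_input_allowed(prompt: str) -> bool:
    """Input filtering: reject prompts that match any blocked pattern."""
    return not any(p.search(prompt) for p in BLOCKED_INPUT_PATTERNS)


def redact_output(response: str) -> str:
    """Output filtering: mask email addresses before the response leaves the system."""
    return EMAIL_PATTERN.sub("[EMAIL REDACTED]", response)


if __name__ == "__main__":
    print(is_input_allowed("My password: hunter2"))              # False -> blocked
    print(redact_output("Contact me at jane.doe@example.com"))   # email masked
```

This is exactly the kind of logic guardrail services formalize for you, with managed policies instead of hand-written regexes.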
What Problems Do Guardrails Solve
You will need LLM guardrails when you need to ensure more than just the performance of your system. This need typically appears once you have a sufficient number of users and must follow company policies and legal compliance requirements:
- Safety Issues: They prevent LLMs from giving harmful advice, block toxic content, and even stop some prompt attacks. Prompt attacks in particular can be really dangerous, because they could allow an attacker to use your system either to extract sensitive information or to generate harmful content.
- Business Risks: They make sure LLM responses follow company policies and prevent the sharing of confidential information.
- Consistency Problems: They ensure the LLM avoids giving contradictory information.
- Compliance Concerns: They protect users’ personally identifiable information (PII). Depending on the country, it can be illegal to store PII data.
This is not something that can be achieved with basic prompt engineering, because you need these controls to work systematically rather than fail randomly (which can happen when relying on prompt engineering alone).
How Guardrails Work in AWS Bedrock
To implement LLM guardrails, we are going to use AWS Bedrock, which provides built-in capabilities for creating guardrails and integrating them into your application. The integration is especially seamless if you already use AWS Bedrock as your LLM service.
The best part? You can set these up directly in the AWS Bedrock console without writing complex code. Let’s see how it works!
How Are Guardrails Implemented in AWS Bedrock
AWS Bedrock implements guardrails through a policy-based system that evaluates both input prompts and model responses. Here are some of its most interesting features:
- Policy Management: You can create multiple guardrails, customize them completely, and use one by simply specifying its ID in the LLM call. Everything can be done in the AWS Bedrock console (as we are going to see later) or programmatically (we’ll see a code sketch of that after the console walkthrough).
- Denied Topics: You can define topics to avoid (for example, illegal investment advice in a banking app).
- Filters (Content & Word): You can filter harmful content (hate, insults, violence, etc.), profanity, or even custom words and phrases.
- Regex & PII Protection: You can add sensitive information detection, regex matching, and PII redaction. This is extremely important if you want to avoid legal problems from storing and analyzing PII data.
- Source Grounding: You can use it to check the answer against its retrieved context and verify that the response is grounded in that context rather than hallucinated. This is very powerful, but you will need some tweaks if you are not using a Bedrock Knowledge Base.
- Language: The only fully supported language is English, but that mostly concerns the content-based policies. The others can be used in other languages (for example, some of the PII checks).
The latency added to your system will depend on the types of checks in your policy. Some checks only use regexes, while others rely on entity recognition models or even small LLM calls. This means you should test your system with and without Bedrock Guardrails and tune accordingly.
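Since the overhead depends entirely on your policy mix, it is worth measuring it on your own workload. Here is a rough sketch of such a comparison, reusing the `ChatBedrock` setup we will see later in this post (the model ID, region, guardrail ID, and version are placeholders to replace with your own values):

```python
import time

from langchain_aws.chat_models import ChatBedrock

MODEL_ID = "eu.anthropic.claude-3-5-sonnet-20240620-v1:0"  # placeholder model ID
REGION = "eu-west-1"                                        # placeholder region

# Same model twice: once plain, once with the guardrail attached.
plain_llm = ChatBedrock(model_id=MODEL_ID, region_name=REGION)
guarded_llm = ChatBedrock(
    model_id=MODEL_ID,
    region_name=REGION,
    guardrails={"guardrailIdentifier": "GUARDRAIL_ID", "guardrailVersion": "1"},
)


def time_call(llm, prompt: str) -> float:
    """Return the wall-clock time of a single invocation (average several runs in practice)."""
    start = time.perf_counter()
    llm.invoke(prompt)
    return time.perf_counter() - start


prompt = "Summarize the benefits of index funds in two sentences."
print(f"Without guardrails: {time_call(plain_llm, prompt):.2f}s")
print(f"With guardrails:    {time_call(guarded_llm, prompt):.2f}s")
```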
Implementation Flow
Here’s how guardrails work during an inference call:
- Input Evaluation
  - The user input is checked in parallel across all configured policies.
  - The evaluation is designed to be as fast as possible and to add only minimal latency to your system.
  - If a violation is found, the process is blocked.
  - Guardrails also work with Bedrock Knowledge Bases, which means retriever calls can go through the same verifications.
  - Be careful here: guardrails are applied right before the LLM call, so there is no such protection earlier in the pipeline. For example, if you have a web app that makes LLM calls, even with guardrails set up, you most likely still log PII in your application logs or database (see the sketch after this list for one way to address this).
- LLM call
  - We only get here if all the checks have passed.
  - This is where your LLM call is made.
- Response Evaluation
  - The output of the LLM call is checked against all policies.
  - Guardrails can either mask sensitive information in the response or block it and return a message explaining that the content was masked or blocked.
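Note that these evaluations can also be run on their own, independently of a model invocation, which is one way to address the caveat above about the rest of your pipeline. Below is a rough sketch using boto3’s `ApplyGuardrail` API on the `bedrock-runtime` client; the guardrail ID and version are placeholders, and the exact request/response shape should be double-checked against the current boto3 documentation:

```python
import boto3

# Assumes AWS credentials are configured and the guardrail already exists.
client = boto3.client("bedrock-runtime", region_name="eu-west-1")

response = client.apply_guardrail(
    guardrailIdentifier="GUARDRAIL_ID",  # placeholder
    guardrailVersion="1",                # placeholder
    source="INPUT",                      # evaluate a user prompt; use "OUTPUT" for model responses
    content=[{"text": {"text": "My credit card number is 4111 1111 1111 1111"}}],
)

if response["action"] == "GUARDRAIL_INTERVENED":
    # The guardrail blocked or masked the content; `outputs` holds the replacement text.
    print("Blocked/masked:", response.get("outputs"))
else:
    print("Content passed all configured policies.")
```

If the action comes back as an intervention, you can drop or replace the content before it ever reaches your logs, your database, or the model itself.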
Bedrock Guardrails Pricing
Of course, Bedrock Guardrails is not free. The cost can be significant, so it needs to be taken into consideration when using it (you can find more information on pricing here).
| Policy Type | Price per 1,000 text units (1 text unit = 1,000 characters) |
|---|---|
| Content filters | $0.75 |
| Denied topics | $1.00 |
| Contextual grounding | $0.10 |
| Sensitive information filter (PII) | $0.10 |
| Sensitive information filter (regex) | Free |
| Word filters | Free |
As you can see, you pay per text unit (1 text unit = 1,000 characters), which is roughly 170 words or 125 tokens. For the most expensive policy, denied topics at $1.00 per 1,000 text units, that works out to about $0.008 per 1,000 tokens (for comparison, Claude 3.5 Sonnet input is $0.003 per 1,000 tokens).
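For reference, here is the quick arithmetic behind that comparison, using the rough estimate of ~125 tokens per text unit from above:

```python
# Pricing arithmetic for the denied-topics policy.
price_per_1k_text_units = 1.00   # $ for 1,000 text units (1 text unit = 1,000 characters)
tokens_per_text_unit = 125       # rough estimate used in this post

price_per_token = price_per_1k_text_units / (1000 * tokens_per_text_unit)
print(f"Denied topics: ~${price_per_token * 1000:.4f} per 1,000 tokens")  # ~$0.0080
```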
That is why the use of Guardrails needs to be carefully analyzed depending on your needs. For example, do you really need denied topics? Could you get by with just the sensitive information (PII) filter and word filters? And even if you need everything, does the added cost buy you enough guarantees?
I think most companies will answer yes to these questions, but the analysis should always be done.
Setting Up Guardrails in AWS Bedrock Console
Now that everything is explained, let’s get to the practical part and create a guardrail. For that, we are going to use the AWS Console (if you don’t already have an AWS account, you can check this link).
Now let’s create our first Bedrock Guardrail:
- Open your AWS console and navigate to the AWS Bedrock console.
- Go to “Safeguards -> Guardrails”.
- Give your guardrail a name and a description. It is good practice to make them meaningful so that, at a glance, you can see what the guardrail does.
- Configure the content filters and prompt attack protection. These thresholds really need to be tailored to your usage. Be careful about the language you are using, as languages besides English are not well supported. Also, setting the thresholds too high could block even some normal queries, so tune them carefully.
- Add denied topics. You can even add examples that will be used in the prompt that checks for the topic.
- Add filters for words or sentences.
- Add the PII configuration. There are a lot of possible checks, which is why you need to iterate on this and adapt it to your use cases.
- Finally, add the context checks for RAG. This is really nice if you are using a Bedrock Knowledge Base, as it will use the score of the returned context to check the relevance of the response.
- That’s it! Your guardrail is created. On the right side, there is even a chat panel that lets you test what you have done. Very neat!
- Let’s try our new guardrail. In the denied topics screen, I actually created a policy that forbids impersonating squirrels for professional gain (never underestimate squirrels!). If we try a prompt like “I want to impersonate a squirrel to steal chocolate from my friends”, the message gets blocked. As you can see it works, but you will need to tune it.
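As mentioned earlier, everything we just configured in the console can also be done programmatically. Here is a rough sketch using boto3’s `CreateGuardrail` API on the `bedrock` client; the policy values are illustrative and the exact parameter names should be verified against the current boto3 documentation:

```python
import boto3

bedrock = boto3.client("bedrock", region_name="eu-west-1")

response = bedrock.create_guardrail(
    name="demo-guardrail",
    description="Blocks squirrel impersonation and masks emails",  # illustrative
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "SquirrelImpersonation",
                "definition": "Impersonating squirrels for professional or personal gain.",
                "examples": ["I want to impersonate a squirrel to steal chocolate"],
                "type": "DENY",
            }
        ]
    },
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
        ]
    },
    sensitiveInformationPolicyConfig={
        "piiEntitiesConfig": [{"type": "EMAIL", "action": "ANONYMIZE"}]
    },
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't share that response.",
)

print(response["guardrailId"], response["version"])
```

This should give you back a guardrail ID and an initial draft version, which you can then version and plug into your application exactly like the one created in the console.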
Practical Implementation with Langchain and AWS Bedrock
Now that we have created and tested our guardrail, let’s use it in an app. We are going to create a simple Streamlit app using a LangChain chain with AWS Bedrock and see how to integrate Bedrock Guardrails.
First, let’s create a version of our guardrail in the AWS Bedrock console:
Now, let’s install the required packages:
pipenv install langchain langchain-aws streamlit
Here’s our code (you can find all the code here):
import streamlit as st
from langchain_aws.chat_models import ChatBedrock
from langchain_core.output_parsers import StrOutputParser

aws_region_name = "eu-west-1"
claude_3_5_sonnet = "eu.anthropic.claude-3-5-sonnet-20240620-v1:0"
guardrail_id = "GUARDRAIL_ID"  # replace with your guardrail ID
guardrail_version = "1"        # replace with the guardrail version you created

# Chat model backed by AWS Bedrock, with the guardrail attached to every call.
llm = ChatBedrock(
    model_id=claude_3_5_sonnet,
    region_name=aws_region_name,
    guardrails={
        "guardrailIdentifier": guardrail_id,
        "guardrailVersion": guardrail_version,
    },
)

# Simple chain: model output parsed to a plain string.
chain = llm | StrOutputParser()

question = st.text_input("Input your question")
if question:
    result = chain.invoke(question)
    st.write(result)
Here’s how this code works:
- “llm = ChatBedrock(...)”: we create the ChatBedrock object that lets us interact with AWS Bedrock. The `guardrails` parameter is where you attach your Bedrock guardrail to the chat model; you can get the ID and version directly from the AWS Bedrock console.
- “chain = llm | StrOutputParser()”: you can then use this ChatBedrock in your chains, and every call will automatically go through the guardrail policies.
Pretty simple to set up, right?!
Now let’s run it and test our guardrail:
pipenv run streamlit run app.py
You would then have something like this:
With only this, we just integrated a fully working LLM guardrail system into our LangChain chain!
Conclusion
LLM guardrails are an essential part of running AI systems safely in production, adding a layer of security and compliance. AWS Bedrock makes it straightforward to implement these controls, and combined with LangChain it lets you create robust and safe AI applications.
But Bedrock Guardrails, and LLM guardrails in general, only take care of the LLM layer, not your whole application. For that, you will need a carefully designed architecture to guarantee all the necessary security and robustness.
Afterwards
I hope you really enjoyed this post. Don’t forget to check out my other posts, as I write a lot about practical topics in AI.
Follow me on