Monitor your Langchain app cost using Bedrock with Langfuse

Do you have (or want to create) a Langchain app with AWS Bedrock and are concerned about monitoring its costs? Say no more! You are in the right place.

Representation of cost monitoring for AWS Bedrock with Langfuse

Introduction

In this post, you will learn:

  • Why monitoring is so important
  • What are the current limits of monitoring AWS Bedrock in AWS
  • What Langfuse is and how it helps
  • How to use Langfuse to have precise cost and latency monitoring of your Langchain application

As usual, we will be as straightforward and simple as possible, so let’s get to work!

Pre-requisites

  • A functional AWS account where you have admin permission (check this link)
  • Have access to at least one LLM model on AWS Bedrock
  • An AWS user set up on your laptop (check this link)
  • A functional Python environment
  • Having read the post on using AWS Bedrock with Langchain (link)
  • Having read the post on monitoring Langchain apps with Langfuse (link)
  • You will be able to find all the code of this post here.

The importance of monitoring for LLMs

As explained in this post (link), monitoring, and more precisely cost monitoring, is incredibly important for LLM apps.
Let’s take the example of a public chat app that you have deployed. To lower the cost while still keeping responses of sufficient quality, you decide to use Claude Haiku on Bedrock.
Here are some interesting details for this model (checkout this and this for the sources):

  • Price per 1000 input tokens: $0.00025
  • Price per 1000 output tokens: $0.00125
  • Quota usage: 1,000 queries / minute

Let’s say a query equals 1,000 input tokens and 1,000 output tokens. This means one query will cost $0.0015. Now let’s say your app goes viral and hits maximum usage, which means 1,000 queries per minute. Here’s what all this is going to cost you:

  • Per minute: 0.0015 * 1000 = $1.50
  • Per hour: 0.0015 * 1000 * 60 = $90
  • Per day: 0.0015 * 1000 * 60 * 24 = $2160

You are reading this right: in the worst case scenario, you would pay over $2,000 a day… Now you get why cost monitoring, at the very least, is so important.
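If you want to play with the numbers yourself, here is a quick back-of-the-envelope sketch of this calculation in Python (the prices and quota are the ones listed above; adjust them to your own model and region):

# Back-of-the-envelope cost estimate for Claude Haiku on Bedrock
input_price_per_1k = 0.00025   # $ per 1,000 input tokens
output_price_per_1k = 0.00125  # $ per 1,000 output tokens
queries_per_minute = 1000      # quota listed above

# Assume each query uses 1,000 input tokens and 1,000 output tokens
cost_per_query = input_price_per_1k + output_price_per_1k  # $0.0015

print(f"Per minute: ${cost_per_query * queries_per_minute:.2f}")             # $1.50
print(f"Per hour:   ${cost_per_query * queries_per_minute * 60:.2f}")        # $90.00
print(f"Per day:    ${cost_per_query * queries_per_minute * 60 * 24:.2f}")   # $2160.00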

Cost monitoring for LLMs in AWS

The go-to service for cost monitoring in AWS is the Cost Explorer. Here’s what it looks like.

Cost Explorer in AWS

This is actually very helpful and allows you to find the cost associated with each service, but not in real time (you will only see costs up to the previous day). You can break down the cost of each service by date, region, usage type and, more importantly, by tags.

In AWS you can add tags to the resources you create so that you can track their cost. Imagine you have two applications deployed in the same AWS account. If you tag all the resources of each application with a different tag, you will know exactly what each one costs. This is very helpful in general in AWS to follow what you are paying for.

However, this does not apply to AWS Bedrock calls. You cannot attach tags to your calls, so there is no way to see what is happening at a fine-grained level. You can only get the total cost of each LLM model, not the cost per application. So we need an alternative, and that alternative is Langfuse.

The alternative: Langfuse

So first, what is Langfuse? Here’s the official description:
Langfuse is an open-source LLM engineering platform that helps teams collaboratively debug, analyze, and iterate on their LLM applications.
You can either deploy it yourself (free) or use the cloud version directly (with a monthly fixed price).
It has a lot of interesting features, but here’s what will interest us here:

  • Langchain integration: Native integration with Langchain that is done with one line.
  • Custom model integration: Possibility to add custom model pricing, in this case, for Bedrock.
  • Nice UX: A nice interface where you can easily see the cost for different parameters
  • Security: User management with different user profiles and projects

We will connect Langfuse to our Bedrock Langchain app and finally see how much all this costs us!

Initialize the work environment

We will use the code from this post on AWS Bedrock LLM models for our test.

Create a folder and prepare the virtual env

First, create a folder called LLM-monitoring-for-AWS-Bedrock:

mkdir LLM-monitoring-for-AWS-Bedrock

Now copy into it all the files from the previous post on Bedrock:

cp -R langchain-bedrock-multi-models/* LLM-monitoring-for-AWS-Bedrock

OK, you should now have the Pipfile, the lock file and the notebook. We just need to add langfuse and langchain-aws as dependencies. The first is the official Python package to interact with Langfuse, and the second is the implementation of AWS services for Langchain (if you do not use this package, you will get the old AWS implementations, which most likely will not work well).

cd LLM-monitoring-for-AWS-Bedrock
pipenv install langfuse langchain-aws

Now you can launch Jupyter to test that the virtualenv works:

pipenv run jupyter lab

OK, now that your environment is set up, we can move on to the next step of the post: setting up Langfuse.

Langfuse setup

For the sake of simplicity, we are going to use the Langfuse cloud. So first, follow the link and create an account. You will see the following screen, in which you will need to create a project (with whatever name you like):

Now let’s create some keys (which are credentials to connect your application to Langfuse). You can do that by clicking on the Create new API keys button in the API keys section.

Now keep these three values (public key, secret key and host) somewhere; we will need them to integrate Langfuse into our code.
OK, now that Langfuse is set up, we can really begin the coding part.

Quick review of the code

So first, let’s do a quick review of the current code.

from langchain_aws.chat_models import ChatBedrock
from langchain_core.runnables import ConfigurableField

aws_region_name = "us-east-1"  # Or whatever region you want to use
credentials_profile_name = "default"
claude_3_sonnet = "anthropic.claude-3-sonnet-20240229-v1:0"
mistral_large = "mistral.mistral-large-2402-v1:0"

mistral_large_bedrock_chat = ChatBedrock(
    model_id=mistral_large,
    credentials_profile_name=credentials_profile_name,
    region_name=aws_region_name,
)

_model_alternatives = {
    "mistral_large": mistral_large_bedrock_chat
}

claude_3_sonnet_bedrock_chat = ChatBedrock(
    model_id=claude_3_sonnet,
    credentials_profile_name=credentials_profile_name,
    region_name=aws_region_name,
)

bedrock_llm = claude_3_sonnet_bedrock_chat.configurable_alternatives(
    which=ConfigurableField(
        id="model", name="Model", description="The model that will be used"
    ),
    default_key="claude_3_sonnet",
    **_model_alternatives,
)

Let’s see what is happening here:

  • We set up the configuration: the AWS profile you want to use, the region, and the ID of each LLM in Bedrock
  • We create two ChatBedrock objects that act as the interface to the Bedrock service for their respective models. This ChatBedrock class is implemented in the langchain-aws package, which is why we added it to the project.
  • We create a configurable alternative between the two ChatBedrock objects so that we can switch between models with a simple configuration (you can check this post for more information).
from langchain_core.prompts import PromptTemplate

_MISTRAL_PROMPT = PromptTemplate.from_template(
    """
<s>[INST] You are a conversational AI designed to answer in a friendly way to a question.
You should always answer in rhymes.

Human:
<human_reply>
{input}
</human_reply>

Generate the AI's response.[/INST]</s>
"""
)

_CLAUDE_PROMPT = PromptTemplate.from_template(
    """
You are a conversational AI designed to answer in a friendly way to a question.
You should always answer in jokes.

Human:
<human_reply>
{input}
</human_reply>

Assistant:
"""

_CHAT_PROMPT_ALTERNATIVES = {"mistral_large": _MISTRAL_PROMPT}

CONFIGURABLE_CHAT_PROMPT = _CLAUDE_PROMPT.configurable_alternatives(
    which=ConfigurableField(
        id="model",
        name="Model",
        description="The model that will be used",
    ),
    default_key="claude_3_sonnet",
    **_CHAT_PROMPT_ALTERNATIVES
)

Now let’s see what is happening here:

  • We create a prompt for each model (as each model has a different format for its input)
  • We create an alternative for the prompt, in the same way as with the LLM and using the same key, so that when choosing a specific model, the correct prompt will also be used.
from langchain.schema.output_parser import StrOutputParser

chain = CONFIGURABLE_CHAT_PROMPT | bedrock_llm | StrOutputParser()

chain.invoke(input="What is a large language model ?")

chain \
    .with_config(configurable={"model": "claude3_sonnet"}) \
    .invoke("What is a large language model ?")

Now we can finally:

  • Create the chain that pipes the configurable prompt into the configurable LLM and then into a StrOutputParser to output the answer of the LLM as a string
  • Invoke the chain, either without any configuration, in which case the default model (Claude 3 Sonnet) is used, or with a configuration that specifies the model
  • This is a classic Langchain chain using LCEL.
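Note that since each prompt template expects a single variable (here, input), you can also pass a dictionary explicitly instead of a bare string; this is slightly more robust if you later add more variables to the prompt:

chain.invoke({"input": "What is a large language model ?"})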

Langfuse integration

Now let’s integrate Langfuse in the code.

from langfuse.callback import CallbackHandler

langfuse_handler = CallbackHandler(
  secret_key="sk-****",
  public_key="pk-****",
  host="https://cloud.langfuse.com"
)

Here’s what happens:

  • We just created the Langfuse callback handler. Callbacks are objects that Langchain executes at each step of a chain.
  • This callback will send all the interesting parameters to Langfuse. This is where you put the credentials that you created in Langfuse earlier.
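As a side note, hardcoding keys in a notebook is fine for a quick test but not ideal in general. The Langfuse SDK can also pick up its credentials from environment variables, so a sketch like the following (assuming the LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY and LANGFUSE_HOST variables are set) should also work:

import os

# Better: set these outside the notebook (shell profile, .env file, secrets manager)
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-****"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-****"
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

# With the environment variables set, the handler needs no arguments
langfuse_handler = CallbackHandler()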
from langchain.schema.runnable import RunnableConfig

chain = (
    CONFIGURABLE_CHAT_PROMPT | bedrock_llm | StrOutputParser()
).with_config(RunnableConfig(callbacks=[langfuse_handler]))

Here, we simply:

  • Set up the Langfuse callback inside the chain.
  • Each time the chain is run, all metrics will be sent to Langfuse.
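As an aside, if you prefer not to bake the callback into the chain definition, Langchain also lets you pass callbacks at invocation time. The rest of this post keeps the with_config version above, but a sketch of the alternative would look like this:

# Alternative (not used in the rest of this post): pass the callback per invocation
(CONFIGURABLE_CHAT_PROMPT | bedrock_llm | StrOutputParser()).invoke(
    "What is a large language model ?",
    config={"callbacks": [langfuse_handler]},
)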
chain \
    .with_config(configurable={"model": "claude_3_sonnet"}) \
    .invoke("What is a large language model ?")

chain \
    .with_config(configurable={"model": "mistral_large"}) \
    .invoke("What is a large language model ?")

Here we launch the chain twice, once with Claude 3 Sonnet and once with Mistral Large.
Let’s see what is happening on Langfuse!
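One small note before switching to the UI: the Langfuse handler sends events asynchronously in the background. In a long-running notebook kernel this is not a problem, but in a short-lived script it is safer to flush the handler before the process exits so no trace is lost:

# Make sure all pending events are sent to Langfuse before the process exits
langfuse_handler.flush()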

Cost monitoring in Action

Now let’s see what this looks like:

Screenshot of JupyterLab with the monitoring code for the LLM

As you can see, there are two traces here, and what is interesting is that we have the input and output tokens in the Usage column, along with the latency and the metadata of the call.

And if we go inside one call, here’s what is happening:

Here you can see:

  • The token usage
  • The latency
  • All your chain steps
  • The content of the input and output

And if you click on the Generation menu on the left side, you will arrive at a similar screen but with more details like the id of the model (here anthropic.claude-3-sonnet-20240229-v1:0).

This is great, but there is one missing piece. Why is the Total Cost column still at $0? This is because we need to add the pricing of the Bedrock models in Langfuse for the cost computation to work.

Add Bedrock pricing in Langfuse

There is a section called Models in Langfuse which holds the per-token pricing of each model:

You can find most of the common models there, like the ones from OpenAI or directly from Anthropic or Mistral, but the Bedrock ones are missing, so let’s add them.

We just need to add them by clicking on Add model definition:

You will need to give a regex that matches the ID of the model, but it is straightforward; you can just use the same kind that I use here:

(?i)^([a-z]+)\.(claude-3-sonnet-20240229)-v(\d+:\d+)$
(?i)^([a-z]+)\.(mistral-large-2402)-v(\d+:\d+)$
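If you want to double-check that these patterns match the model IDs used in the code, a quick sanity check in Python looks like this:

import re

patterns = [
    r"(?i)^([a-z]+)\.(claude-3-sonnet-20240229)-v(\d+:\d+)$",
    r"(?i)^([a-z]+)\.(mistral-large-2402)-v(\d+:\d+)$",
]
model_ids = [
    "anthropic.claude-3-sonnet-20240229-v1:0",
    "mistral.mistral-large-2402-v1:0",
]

for pattern, model_id in zip(patterns, model_ids):
    print(model_id, "->", bool(re.match(pattern, model_id)))  # both should print True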

For the pricing, you can find it on the official AWS Bedrock pricing page: link.

So now that we have added our two models, let’s see what we have if we launch new LLM calls:

TADAAA! We finally have the cost of each call. Isn’t that beautiful? Now you can be more at ease when you deploy this kind of app in the future.

Conclusion

In this post, you learned how to monitor costs when using AWS Bedrock in your Langchain app with Langfuse. It is a very well done tool, open-source and with a free tier, so we really recommend you test it.
But whatever tool you use, don’t forget that cost monitoring is a critical piece of every LLM application that will be shared with people, and it is something that will help you sleep at night!

Afterward

I hope this tutorial helped you and taught you many things. I will update this post with more nuggets from time to time. Don’t forget to check my other posts, as I write a lot of cool posts on practical stuff in AI.

Cheers !
