
Monitor your Langchain app cost using Bedrock with Langfuse

Do you have (or want to create) a Langchain app that uses AWS Bedrock, and are you concerned about monitoring its costs? Say no more! You are in the right place.

Introduction

In this post, you will learn:

As usual, we will keep things as straightforward and simple as possible, so let’s get to work!

Pre-requisites

The importance of monitoring for LLMs

As explained in this post (link), monitoring, and more precisely cost monitoring, is incredibly important for LLM apps.
Let’s take the example of a public chat app you have deployed. To lower the cost while still keeping good response quality, you decide to use Claude Haiku on Bedrock.
Here are some interesting details about this model (check out this and this for the sources):

Let’s say a query equals 1000 input tokens and 1000 output tokens. This means one query will cost $0.0015. Now let’s say your app goes viral and usage hits its maximum, meaning up to 1000 queries per minute. Here’s what all this is going to cost you:

You are reading this right: in the worst-case scenario, you would pay over $2,000 a day… Now you understand why cost monitoring, at the very least, is so important.
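To make the arithmetic explicit, here is a quick back-of-the-envelope computation in Python (a sketch based on the per-query cost stated above):

cost_per_query = 0.0015        # USD, 1000 input + 1000 output tokens on Claude Haiku
queries_per_minute = 1000      # the peak usage assumed above

cost_per_day = cost_per_query * queries_per_minute * 60 * 24
print(f"Worst-case daily cost: ${cost_per_day:,.2f}")  # about $2,160 per day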

Cost monitoring for LLMs in AWS

The go-to service for cost monitoring in AWS is the Cost Explorer. Here’s what it looks like.

This is actually very helpful and allows you to find the cost associated with each service, but not in real time (you will only see the costs up to the previous day). You can break down the cost of each service by date, region, usage type and, more importantly, by tags.

In AWS you can add tags to the resources you create so that you can track their cost. Imagine you have two applications deployed in one AWS account. If you tag all the resources of each application with a different tag, you will know exactly what each one costs. This is very helpful in general in AWS to follow what you are paying.
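For illustration, here is a minimal sketch of how you could tag existing resources with an application tag using boto3 (the ARNs and tag values below are placeholders, and you still need to activate the tag as a cost allocation tag in the Billing console for it to show up in Cost Explorer):

import boto3

# Hypothetical example: tag the resources of one application so Cost Explorer
# can break down the bill per application (the ARNs below are placeholders).
tagging = boto3.client("resourcegroupstaggingapi", region_name="us-east-1")

tagging.tag_resources(
    ResourceARNList=[
        "arn:aws:lambda:us-east-1:123456789012:function:app-one-backend",
        "arn:aws:s3:::app-one-assets",
    ],
    Tags={"Application": "app-one"},
)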

However, this does not apply to AWS Bedrock calls. You cannot put tags on your calls, so there is no way to see what is happening at a fine-grained level. You can only get the total cost of each LLM model, not the cost per application. So we need an alternative, which is Langfuse.

The alternative: Langfuse

So first, what is Langfuse? Here’s the official description:
Langfuse is an open-source LLM engineering platform that helps teams collaboratively debug, analyze, and iterate on their LLM applications.
It is an open-source solution that you can deploy yourself (for free), and there is also a cloud version that you can use directly (for a fixed monthly price).
It has a lot of interesting features, but here’s what interests us in this post:

We will connect Langfuse to our Bedrock Langchain app and finally see how much all this costs us!

Initialize the work environment

We will use the code from this post on AWS Bedrock LLM models for our test.

Create a folder and prepare the virtual env

First, create a folder called LLM-monitoring-for-AWS-Bedrock:

mkdir LLM-monitoring-for-AWS-Bedrock

Now copy inside it all the files of the previous post on Bedrock:

cp -R langchain-bedrock-multi-models/* LLM-monitoring-for-AWS-Bedrock

OK, you should now have the Pipfile, the lock file and the notebook. We just need to add langfuse and langchain-aws as dependencies. The first is the official Python package to interact with Langfuse, and the second is the implementation of AWS services for Langchain (if you do not use this package, you will get the old AWS implementations, which most likely will not work well).

cd LLM-monitoring-for-AWS-Bedrock
pipenv install langfuse langchain-aws

Now you can just launch Jupyter to test that the virtual environment works:

pipenv run jupyter lab

OK, now that your environment is set up, we can move on to the next step of the post: setting up Langfuse.

Langfuse setup

For the sake of simplicity, we are going to use the Langfuse cloud. So first, go to the link and create an account. You will see the following screen, in which you will need to create a project (with whatever name you want):

Now let’s create some keys (which are credentials to connect your application to Langfuse). You can do that by clicking on the Create new API keys button in the API keys section.

Keep these 3 values somewhere; we will need them to integrate Langfuse into our code.
OK, now that Langfuse is set up, we can really begin the coding part.

Quick review of the code

So first, let’s do a quick review of the current code.

from langchain_aws.chat_models import ChatBedrock
from langchain_core.runnables import ConfigurableField

aws_region_name = "us-east-1" # Or whatever region you want to use
credentials_profile_name = "default"
claude_3_sonnet = "anthropic.claude-3-sonnet-20240229-v1:0"
mistral_large = "mistral.mistral-large-2402-v1:0"

# Chat model pointing at Mistral Large on Bedrock
mistral_large_bedrock_chat = ChatBedrock(
    model_id=mistral_large,
    credentials_profile_name=credentials_profile_name,
    region_name=aws_region_name,
)

_model_alternatives = {
    "mistral_large": mistral_large_bedrock_chat
}

# Chat model pointing at Claude 3 Sonnet on Bedrock
claude_3_sonnet_bedrock_chat = ChatBedrock(
    model_id=claude_3_sonnet,
    credentials_profile_name=credentials_profile_name,
    region_name=aws_region_name,
)

bedrock_llm = claude_3_sonnet_bedrock_chat.configurable_alternatives(
    which=ConfigurableField(
        id="model", name="Model", description="The model that will be used"
    ),
    default_key="claude_3_sonnet",
    **_model_alternatives,
)

Let’s see what is happening here:

from langchain_core.prompts import PromptTemplate

_MISTRAL_PROMPT = PromptTemplate.from_template(
    """
<s>[INST] You are a conversational AI designed to answer in a friendly way to a question.
You should always answer in rhymes.

Human:
<human_reply>
{input}
</human_reply>

Generate the AI's response.[/INST]</s>
"""
)

_CLAUDE_PROMPT = PromptTemplate.from_template(
    """
You are a conversational AI designed to answer in a friendly way to a question.
You should always answer in jokes.

Human:
<human_reply>
{input}
</human_reply>

Assistant:
"""

_CHAT_PROMPT_ALTERNATIVES = {"mistral_large": _MISTRAL_PROMPT}

CONFIGURABLE_CHAT_PROMPT = _CLAUDE_PROMPT.configurable_alternatives(
    which=ConfigurableField(
        id="model",
        name="Model",
        description="The model that will be used",
    ),
    default_key="claude_3_sonnet",
    **_CHAT_PROMPT_ALTERNATIVES
)

Now let’s see what is happening here:

from langchain_core.output_parsers import StrOutputParser

chain = CONFIGURABLE_CHAT_PROMPT | bedrock_llm | StrOutputParser()

chain.invoke(input="What is a large language model ?")

chain \
    .with_config(configurable={"model": "claude_3_sonnet"}) \
    .invoke("What is a large language model ?")

Now we can finally:

Langfuse integration

Now let’s integrate Langfuse in the code.

from langfuse.callback import CallbackHandler

langfuse_handler = CallbackHandler(
  secret_key="sk-****",
  public_key="pk-****",
  host="https://cloud.langfuse.com"
)

Here’s what happens:

from langchain_core.runnables import RunnableConfig

chain = (
    CONFIGURABLE_CHAT_PROMPT | bedrock_llm | StrOutputParser()
).with_config(RunnableConfig(callbacks=[langfuse_handler]))

Here, we simply:

chain \
    .with_config(configurable={"model": "claude_3_sonnet"}) \
    .invoke("What is a large language model ?")

chain \
    .with_config(configurable={"model": "mistral_large"}) \
    .invoke("What is a large language model ?")

Here we invoke the chain twice, once with Claude 3 Sonnet and once with Mistral Large.
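By the way, if you prefer not to hardcode the Langfuse keys in the notebook, the handler can also pick them up from environment variables, and it can be passed per call instead of being bound to the chain. Here is a small sketch (assuming the handler falls back to the standard LANGFUSE_* environment variables; plain_chain is just an illustrative name):

import os
from langfuse.callback import CallbackHandler

# Assumption: the handler reads these standard environment variables
# when no keys are passed explicitly.
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-****"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-****"
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

langfuse_handler = CallbackHandler()

# For a chain that does not already have the handler bound via with_config,
# you can also pass it at call time:
plain_chain = CONFIGURABLE_CHAT_PROMPT | bedrock_llm | StrOutputParser()
plain_chain.invoke(
    "What is a large language model ?",
    config={"callbacks": [langfuse_handler]},
)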
Let’s see what is happening on Langfuse!

Cost monitoring in Action

Now let’s see what this looks like:

As you can see, there are two traces here and, interestingly, we have the input and output tokens in the Usage column, along with the latency and the metadata of the call.

And if we go inside one call, here’s what is happening:

Here you can see:

And if you click on the Generation menu on the left side, you will arrive at a similar screen but with more details like the id of the model (here anthropic.claude-3-sonnet-20240229-v1:0).

This is great, but there is one missing piece. Why is the Total Cost column still at $0? This is because we need to add the Bedrock LLM model pricing in Langfuse for the cost computation to work.

Add Bedrock pricing in Langfuse

There is a section called Models in Langfuse that stores the per-token pricing of each model:

You can find most models there, like the ones from OpenAI or directly from Anthropic or Mistral. But the ones from Bedrock are missing, so let’s add them.

We just need to add them by clicking on Add model definition:

You will need to give a regex that matches the id of the model, but it is straightforward; you can just use the same kind that I use here:

(?i)^([a-z]+)\.(claude-3-sonnet-20240229)-v(\d+:\d+)$
(?i)^([a-z]+)\.(mistral-large-2402)-v(\d+:\d+)$
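If you want to make sure the regexes match, you can quickly check them in Python (a small sanity-check sketch using the two model ids from this post):

import re

# Sanity check: both regexes above should match the Bedrock model ids used in this post.
patterns = [
    r"(?i)^([a-z]+)\.(claude-3-sonnet-20240229)-v(\d+:\d+)$",
    r"(?i)^([a-z]+)\.(mistral-large-2402)-v(\d+:\d+)$",
]
model_ids = [
    "anthropic.claude-3-sonnet-20240229-v1:0",
    "mistral.mistral-large-2402-v1:0",
]

for model_id in model_ids:
    print(model_id, "->", any(re.match(pattern, model_id) for pattern in patterns))
# Both lines should print True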

For the pricing, you can find them on the official AWS Bedrock page: link.

So now that we have added our two models, let’s see what we have if we launch new LLM calls:

TADAAA! We finally have the cost of each call. Isn’t that beautiful? Now you can be more at ease when you deploy this kind of app in the future.

Conclusion

In this post, you learned how to monitor the cost of your AWS Bedrock Langchain calls using Langfuse. It is a very well-made tool, open-source and with a free tier, so we really recommend you test it.
But whatever tool you use, don’t forget that cost monitoring is a critical piece of every LLM application shared with the public, and it is something that will help you sleep at night!

Afterward

I hope this tutorial helped you and taught you many things. I will update this post with more nuggets from time to time. Don’t forget to check my other posts, as I write a lot of cool posts on practical AI topics.

Cheers !
