
Add monitoring easily to your Langchain chains with Langfuse

You have created a nice Langchain application using the latest best practices (that you read on this blog, right? Of course!) and now you are wondering how to add monitoring to it?
You have come to the right place.

Introduction

In this blog post, you will learn:

- what monitoring is and why it matters even more for LLM applications
- how to set up a Langfuse project and connect it to your application
- how to integrate Langfuse into a Langchain chain built with LCEL

As usual, we will be as straightforward and simple as possible so let’s get to work!

Pre-requisites

To follow along, you will need:

- the code from the previous tutorial, create-complex-webapp-with-langchain-best-practices
- Python and Pipenv installed to manage the virtual environment
- an OpenAI API key stored in the project's .env file

Monitoring definition

Here’s a definition of monitoring:

Monitoring in software development refers to the process of continuously observing and tracking the performance, behavior, and health of software applications and systems. The goal is to detect and diagnose issues, ensure optimal performance, and maintain the reliability and availability of the software.

So the goal is to see at all times what is happening inside your application and to detect any problem. It ranges from simple log printing in the code to a monitoring dashboard where you can see all the metrics of your application, like CPU/memory usage or network load.
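
To make the "simple log printing" end of the spectrum concrete, here is a minimal sketch using Python's standard logging module (the function and the messages are purely illustrative):

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("my_app")

def answer_question(question: str) -> str:
    # Log what enters and leaves the function so you can trace its behavior
    logger.info("Received question: %s", question)
    answer = "42"  # placeholder for the real chain call
    logger.info("Returning answer: %s", answer)
    return answer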

You can argue that monitoring is not useful when you are just beginning or when you are building an application just for yourself. But the moment you want to share it with people, it becomes a necessity, even more so for LLM applications.

Monitoring for LLM applications

Monitoring for LLM applications that are open to the public is an absolute must because:

- every call to an LLM costs money, so you need to keep an eye on token usage and cost
- LLM outputs are non-deterministic, so you need to see the actual inputs and outputs to understand how the application behaves
- latency varies from call to call and directly impacts your users

That is why, even more than with traditional systems, you need some serious monitoring for your LLM applications.

Multiple types of monitoring

When monitoring an LLM application, you will need to add two types of monitoring:

- classical monitoring: the usual application metrics such as CPU/memory usage, network load, and logs
- LLM monitoring: traces of your chains and LLM calls, with their prompts, inputs, outputs, token usage, and cost

You always need to combine the two, as LLM monitoring only produces data when the Langchain chain or the LLM call actually runs. The only way to know why your chain doesn't work at all is classical monitoring.

In our case, we will focus on the LLM monitoring part, as classical monitoring heavily depends on the infrastructure you use to host your application.

LLM monitoring with Langfuse

For the LLM monitoring, we are going to use Langfuse. Here’s the official description:
Langfuse is an open-source LLM engineering platform that helps teams collaboratively debug, analyze, and iterate on their LLM applications.
It is an open-source solution that you can deploy yourself (free), or you can use the cloud version directly (with a fixed monthly price).
It has the following features:

- tracing of your chains and LLM calls, with full inputs and outputs
- token usage and cost tracking
- prompt management, to version your prompts and use them inside your code
- scores to evaluate the quality of your traces
- an out-of-the-box integration with Langchain

So this is actually a great tool that you can use in any LLM application to monitor it. You get the token count and usage, you can manage prompts and use them inside your code, and you can easily integrate the tool with Langchain. But it also has some drawbacks.
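
As a quick taste of the prompt management feature mentioned above, here is a minimal sketch using the Langfuse Python SDK. It assumes your LANGFUSE_* credentials are set and that a prompt named qa-prompt (a hypothetical name) with a question variable exists in your project:

from langfuse import Langfuse

# The client reads the LANGFUSE_* environment variables
langfuse = Langfuse()

# Fetch the latest version of the prompt and fill in its variables
prompt = langfuse.get_prompt("qa-prompt")  # hypothetical prompt name
compiled = prompt.compile(question="What is Langfuse?")
print(compiled)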

So now that we have presented everything, let's implement this monitoring and see what it can do.

Initialize the work environment

We will use the same setup as in the previous posts, meaning Streamlit, Langchain, a FAISS vector store, and Pipenv for managing the virtual environment. For better readability, we will create a new folder called LLM-monitoring-with-complexe-chain and copy into it all the files needed from the tutorial create-complex-webapp-with-langchain-best-practices.

mkdir LLM-monitoring-with-complexe-chain
cp -R create-complex-webapp-with-langchain-best-practices/* LLM-monitoring-with-complexe-chain/
cp create-complex-webapp-with-langchain-best-practices/.env LLM-monitoring-with-complexe-chain/

This creates the folder for this tutorial with the code from the previous post. We also made sure to copy the .env file that contains the OpenAI credentials.
Now let’s install the project:

cd LLM-monitoring-with-complexe-chain
pipenv install

And make sure it works:

pipenv run streamlit run app.py

Setup of Langfuse

For the sake of simplicity, we are going to use Langfuse Cloud. So first, go to the Langfuse website and create an account. You will see the following screen, in which you will need to create a project (with whatever name you like):

Then, you will arrive here:

Now let’s create some keys (which are credentials to connect your application to Langfuse). You can do that by clicking on the Create new API keys button in the API keys section.

Then use these values to create the following environment variables in your .env file. Of course, the values shown in the screenshots have already been invalidated.

LANGFUSE_PUBLIC_KEY=pk-xxx
LANGFUSE_SECRET_KEY=sk-xxx
LANGFUSE_HOST=https://xxx
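
If you want to double-check that these new variables are picked up, here is a small sanity check, assuming the project loads its .env with python-dotenv like in the previous tutorial:

import os
from dotenv import load_dotenv

# Load the .env file, then verify the Langfuse credentials are present
load_dotenv()
assert os.getenv("LANGFUSE_PUBLIC_KEY"), "LANGFUSE_PUBLIC_KEY is missing"
assert os.getenv("LANGFUSE_SECRET_KEY"), "LANGFUSE_SECRET_KEY is missing"
print("Langfuse host:", os.getenv("LANGFUSE_HOST"))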

We also need to add Langfuse to the project's dependencies:

pipenv install langfuse

Ok, now you are all set up, and we can integrate Langfuse directly inside Langchain.

Langfuse integration

The code we will use is the code from the tutorial on creating a complex Langchain chain. That means it is a chain using LCEL, so you will need a specific method to integrate Langfuse. This method is called callbacks.
Langchain provides a callbacks system that allows you to hook into the various stages of your LLM application. This is useful for logging, monitoring, streaming, and other tasks.
What happens is that, whenever your chain or a step of your chain finishes, Langchain calls the callback with all the information about the chain.
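
To make the mechanism concrete before plugging in Langfuse, here is a toy hand-rolled callback (purely illustrative, you won't need it for the rest of the tutorial):

from langchain_core.callbacks import BaseCallbackHandler

class PrintingHandler(BaseCallbackHandler):
    # Langchain calls these hooks automatically as the chain runs
    def on_chain_start(self, serialized, inputs, **kwargs):
        print("Chain started with inputs:", inputs)

    def on_chain_end(self, outputs, **kwargs):
        print("Chain finished with outputs:", outputs)
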
In this case, we will use the Langfuse callback, which saves this chain data into Langfuse.
Let’s see how to do that:

from langfuse.callback import CallbackHandler

# Reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY and LANGFUSE_HOST
# from the environment variables we defined earlier in the .env file
langfuse_handler = CallbackHandler()

Here’s what’s happening:

- we import the CallbackHandler class from the Langfuse package
- we instantiate it without arguments, so it picks up the credentials from the LANGFUSE_* environment variables we set earlier
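
If you want to verify the credentials before going any further, the Langfuse handler exposes an auth_check method that pings the server; a quick optional check:

# Verifies the connection and the credentials against the Langfuse server
assert langfuse_handler.auth_check()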

Now we just need to set this callback in the chain:

from langchain_core.runnables import RunnableConfig

chain = (
    inputs | context | configurable_prompt | model | StrOutputParser()
).with_config(RunnableConfig(callbacks=[langfuse_handler]))

Here’s what’s happening:

- we attach the Langfuse handler to the whole chain with with_config, wrapped in a RunnableConfig
- from now on, every run of the chain, and of each step inside it, is reported to Langfuse through the callback
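
Note that baking the handler into the chain with with_config is one design choice. Standard Langchain also lets you pass callbacks for a single call instead; here is a sketch (the "question" input key is an assumption about the chain from the previous tutorial, adjust it to your chain's actual inputs):

# Alternative: pass the handler only for this specific invocation
result = chain.invoke(
    {"question": "What is this document about?"},  # hypothetical input key
    config={"callbacks": [langfuse_handler]},
)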

Ok, it is done! Very simple, right? Now let's test it.

Test the monitoring

Now it is time to test the monitoring. Let’s upload the vector store files and ask a question.

And here’s what you have in Langfuse:

As you can see, you have one trace that shows the query you just made. You also get the cost of the query, the latency, the score if you use a compatible vector store, and even the full input and output. This is really great, right?

Conclusion

As you have seen, monitoring is actually a very important part of an LLM application: it allows you to be confident in how it works and to avoid any pitfalls that would slow down your application's growth. It is a necessary piece of any application that will be open to users. It is also simple to integrate, so there is no excuse not to add it to your application. This is really the part that will let you sleep at night!

Afterward

I hope this tutorial helped you and taught you many things. I will update this post with more nuggets from time to time. Don't forget to check out my other posts, as I write a lot of cool posts on practical stuff in AI.

Cheers !
