Easily create production-ready APIs over your LangChain chains using LangServe

So you’ve got some nice LangChain chains and you want to expose them to the world as an API? Or you need to create some internal LLM-based APIs for your projects? Are you looking for something robust, scalable, production-ready, that supports cool features like streaming? Say no more, because we are going to show you how to use LangServe.

Introduction

In this post, you will learn:

  • What is an API and why is it so important?
  • What is LangServe?
  • How to use LangServe to create a production-ready API over your chains

As usual, we will be as straightforward and simple as possible so let’s get to work!

Pre-requisites

To fully understand this blog post, we advise reading:

  • Our explanation on LangChain LCEL (here’s the link)
  • Our explanation on LangChain configurable (here’s the link)
  • You can find all the code of this blog post here

What is an API?

Here’s a definition of an API:

An API, or Application Programming Interface, is a set of rules and protocols for building and interacting with software applications. It defines the methods and data formats that developers can use to communicate with the software component, such as a web server or a database, effectively allowing different software systems to interact with each other.

So in summary, it is just defining how applications can communicate with each other. Here’s a list of what needs to be defined:

  • Routes: specific paths defined in the API that are linked to a specific function. For example, you can have /users that gives back the list of users, and /user that takes an ID and returns the corresponding user’s information.
  • Methods: HTTP verbs that tell the server what action to perform. For example, GET (retrieve data), POST (create data), PUT (update data), DELETE (remove data), and PATCH (modify data).
  • Protocol: the set of rules that defines how the data travels across the network, such as HTTP/HTTPS.
  • Request and Response: the request is what the client sends to the API, and the response is the API’s answer to that request.
  • Parameters: the parameters of each request. For example, for /user, you would want to give at least the ID of the user.
  • Authentication and Authorization: how the API handles who has the right to call it and what they are allowed to do. One example is API keys, where you need to provide a specific key that allows specific actions. All of these pieces appear together in the small example below.
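
To make all of this concrete, here is a small, purely illustrative Python snippet that calls a hypothetical /user route (the URL, the parameter and the header name are made up for the example):

import requests

# Hypothetical API for illustration: GET /user takes an "id" parameter
# and is protected by an API key passed in a header.
response = requests.get(
    "https://api.example.com/user",          # protocol (HTTPS) + route
    params={"id": 42},                       # parameters of the request
    headers={"X-API-Key": "my-secret-key"},  # authentication
)
print(response.status_code)  # the method was GET; 200 means it succeeded
print(response.json())       # the response, e.g. {"id": 42, "name": "Alice"}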

So yeah, there are a lot of things to define to create an API, and it is not something that can be done overnight, because this will literally be the manual on how you want people to use your system.

Why is it so important?

When you want to create a service open to the public, you have basically three ways to do it.

  • Create an interface a human can use, whether it is a chat application or an audio system, so that your users can interact with your system directly.
  • Create an API that other interfaces and systems communicate with.
  • Combine both approaches. This is actually the best way: you have an interface that handles only the human interaction part, and an API where all the heavy work is done. You can also open this API so that other systems can interface with it.

What is different with LLM-based applications is that you are even more dependent on APIs because:

  • LLM agents can “reason” a big task into a list of smaller tasks
  • LLMs have the capability to call other APIs by themselves depending on the need
  • That means your service / app could be able to call these APIs or even be called by them

In conclusion, APIs are complicated to create but are a fundamental piece of a service / application. Now let’s see how to create one for your LangChain application.

LangServe to the rescue

As you’ve seen, there are many things to define when creating APIs, and frameworks have been created to simplify the process. There are a lot of them, but we are going to look at one tailored specifically for LangChain: LangServe:

LangServe helps developers deploy LangChain runnables and chains as a REST API.
This library is integrated with FastAPI and uses pydantic for data validation.

Here are the features of LangServe:

  • Easily create API routes for LangChain chains.
  • Based on FastAPI (one of the most popular Python API frameworks) and other solid foundations like asyncio and Pydantic, it offers excellent performance, robustness, and flexibility.
  • Input and output typing inferred from your chains, so that if a chain accepts only a string as input, anything else will raise an error. This adds a lot of innate robustness to any API made with it.
  • An API doc page, generated automatically with Swagger at /docs.
  • Efficient implementations of /invoke, /stream and /batch, so that with LangChain LCEL the same chain can be invoked once, streamed to your application as the output is generated, or called with multiple inputs at the same time (see the snippet after this list).
  • A /playground added to all your routes so that you can test them directly in the browser.
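
To make the /invoke, /stream and /batch distinction concrete, here is how the same LCEL chain is called in each mode (a sketch, assuming chain is a runnable that takes a {"question": ...} input, like the one we build later in this post):

# /invoke: one input, one output
answer = chain.invoke({"question": "What is an API?"})

# /batch: several inputs processed at the same time
answers = chain.batch([{"question": "What is REST?"}, {"question": "What is HTTP?"}])

# /stream: the output arrives chunk by chunk as the LLM generates it
for chunk in chain.stream({"question": "What is an API?"}):
    print(chunk, end="", flush=True)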

As you can see, LangServe makes creating APIs over LangChain far easier while maintaining a standard of robustness, flexibility and performance, and it even gives you some very interesting nuggets like the playground.

But these are just words. Let’s see some concrete work!

Initialize the LangServe project

To create our project we will use the langchain-cli, which will create the skeleton of the project for us. I would recommend using it, as there are some specific requirements to make a LangServe API work, and it will save you some time:

pip install -U langchain-cli

Then we will use this to create the project skeleton:

langchain app new langserve-openai-api

This will create a folder called langserve-openai-api with the following structure:

.
├── Dockerfile
├── README.md
├── app
│   ├── __init__.py
│   └── server.py
├── packages
│   └── README.md
└── pyproject.toml

Here’s what each part of the project is for:

  • Dockerfile: this file lets you deploy your API as a Docker image wherever you want, which is an industry standard. We will not cover this part here, but there are multiple ways to deploy the API, for example with cloud providers.
  • app: this is the default name of the folder that contains all the code of your API.
  • app/server.py: this is the central piece of code that will be executed by the framework. This is where you will define all the routes.
  • packages: this is the folder where you can put code that is not packaged (and so cannot be added with a poetry add).
  • pyproject.toml: this is the file used by Poetry to define the different dependencies.
  • poetry.lock: this file is not created with the skeleton but is created / updated whenever you add dependencies or install the project. It contains the versions and hashes of all the dependencies that you use in the project. This makes your code fully reproducible, meaning that if you deploy it somewhere or give it to another person, it will work the same way.

Now we are going to add some dependencies: langchain-openai, which contains all the OpenAI LangChain code, and python-dotenv, which will let us load environment variables from a .env file:

poetry add langchain-openai==0.1.14 python-dotenv==1.0.1
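
After this command, the dependencies section of your pyproject.toml should contain something like this (the exact Python version constraint depends on the skeleton version, so treat this as illustrative):

[tool.poetry.dependencies]
python = "^3.11"
langchain-openai = "0.1.14"
python-dotenv = "1.0.1"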

As you just saw, we used Poetry rather than pip to add dependencies to the project. Poetry is a very powerful package manager and is mandatory when creating a LangServe project.

Now we need to create our .env file that will contain the credentials to use OpenAI:

OPENAI_KEY=sk-XXXX

Okay, now you are all set with the project and we can finally look at the code and create our API.

LangServe API code presentation

As we said before, app/server.py is the heart of the API and contains the route definitions. Let’s go through it a little:

from fastapi import FastAPI
from fastapi.responses import RedirectResponse
from langserve import add_routes

app = FastAPI()


@app.get("/")
async def redirect_root_to_docs():
    return RedirectResponse("/docs")


# Edit this to add the chain you want to add
add_routes(app, NotImplemented)

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)

  • from fastapi import …: these are the imports of FastAPI, the web framework used by LangServe, and of the add_routes function from langserve. This function automatically creates routes from a LangChain chain.
  • app = FastAPI(): this is where the code representation of the API is created. It is through app that all the routes will be added.
  • @app.get(“/”): this is a decorator used to specify that the next function is the one handling all the GET requests to “/”.
  • async def redirect_root_to_docs …: this is a simple function that redirects all requests from “/” to “/docs”, which contains the Swagger documentation of the API.
  • add_routes(app, NotImplemented): this is where (once implemented) a route will be created for your chain. We will modify this part very soon.
  • if __name__ == …: this whole part is only used when you launch the code locally; it uses uvicorn (a very fast ASGI web server) to run the API.

First look at the API

Now let’s launch the API and have our first look. But first we need to comment out the add_routes line:

# Edit this to add the chain you want to add
# add_routes(app, NotImplemented)

This is because add_routes expects a LangChain runnable and we are giving it NotImplemented, which would make the app fail at startup.

Now let’s run the api:

poetry run langchain serve

Open a web browser and go to http://127.0.0.1:8000/ and you will see this:

This is the Swagger page of the API, which is generated automatically from the definition of the routes. Currently you only have one route, which redirects “/” to “/docs”, so there is not much to see.
Now let’s implement our own route with a LangChain chain.

Implement our LangChain chain route

Now let’s implement our route into our API.

...
import os

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()

Here we added imports for:

  • the LangChain prompt template to create our prompt
  • the LangChain StrOutputParser that will transform the output of the chain into a string
  • the chat component from langchain_openai that will allow us to interact with a model from OpenAI
  • the os module and the load_dotenv function, which together let us read environment variables from the .env file (note that load_dotenv() is called right after the imports)

app = FastAPI(
    title="Metadocs Langserve Tutorial",
    version="0.1",
    description="Spin up a LangServe API for learning purpose",
)

Here we add a title, a version and a description to our API. These will show up in the Swagger page and make it clearer.

template = """"Answer the question by first detailing the steps or considerations you think are important for solving it. After explaining each part, provide the final answer.
If there is any aspect you are unsure about, describe what you know and what remains unclear
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI(
    temperature=0,
    model_name="gpt-4o-mini",
    openai_api_key=os.environ["OPENAI_KEY"],
)
chain = prompt | model | StrOutputParser()

Here we created our chain by:

  • Creating a template where we ask the LLM to answer our question by explaining its process and making explicit what it knows and what it does not know (this is actually the “chain of thought” prompting technique)
  • Creating a chat object that links to the gpt-4o-mini model.
  • Creating the simplest possible chain with the prompt, the LLM and the output parser.

...
add_routes(app, chain, path="/chat")

Now we just used our add_routes function to create a route called “/chat” for our chain.
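
For reference, here is the complete app/server.py once all the pieces above are assembled:

from fastapi import FastAPI
from fastapi.responses import RedirectResponse
from langserve import add_routes
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
import os

load_dotenv()

app = FastAPI(
    title="Metadocs Langserve Tutorial",
    version="0.1",
    description="Spin up a LangServe API for learning purpose",
)


@app.get("/")
async def redirect_root_to_docs():
    return RedirectResponse("/docs")


template = """Answer the question by first detailing the steps or considerations you think are important for solving it. After explaining each part, provide the final answer.
If there is any aspect you are unsure about, describe what you know and what remains unclear.
Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
model = ChatOpenAI(
    temperature=0,
    model_name="gpt-4o-mini",
    openai_api_key=os.environ["OPENAI_KEY"],
)
chain = prompt | model | StrOutputParser()

add_routes(app, chain, path="/chat")

if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)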

Here’s what it looks like in the Swagger now:

As you can see, add_routes did not just create one route but a whole set of them.
This is where LangServe is so useful: it automatically created routes for classic usage (“/invoke”), streaming usage (“/stream”) and batch usage (“/batch”).
Other routes are created as well, but we can ignore them for now as they are not needed in most cases.
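
You can already call these routes from any HTTP client, or with the RemoteRunnable client that ships with LangServe and behaves like a local chain. Here is a quick sketch, assuming the server is running locally on port 8000:

import requests
from langserve import RemoteRunnable

# Plain HTTP call to the /invoke route: the chain input goes under "input"
response = requests.post(
    "http://127.0.0.1:8000/chat/invoke",
    json={"input": {"question": "What is an API?"}},
)
print(response.json()["output"])

# Same call through the LangServe client
remote_chain = RemoteRunnable("http://127.0.0.1:8000/chat/")
print(remote_chain.invoke({"question": "What is an API?"}))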

Now let’s test our new route.

Testing our route with LangServe playground

One of the most useful features when developing is the playground, which allows developers to test a route directly in the browser. You only need to add “/playground” to your route. Let’s try it:

Go to http://127.0.0.1:8000/chat/playground/ and you will see this:

This is the playground, and it will allow you to test your routes. Let’s try it with a simple question and see what happens:

TADAAAA! You got your first ever API based on LangChain chains, and you can even test it easily.
You can pat yourself on the back, because this is not simple but you still made it. Great work!

What’s left to do

Now that you have a working API, you are not done yet. I cannot simply leave you without offering some advice:

  • While you have a working API, it is not yet ready for deployment and sharing. It lacks crucial elements such as security, robustness, and additional features.
  • Security: currently your API only runs locally, but if you deploy it in this state, anyone could access and use it, which is very dangerous. You need to implement security and authentication to ensure that only authorized users can access it. For example, a basic measure could be API keys (similar to how the OpenAI API works) or even OAuth (see link); a minimal API key sketch is shown after this list.
  • Robustness: you need to add observability to see how your API is used, and a system to check and analyse usage and cost (you can check this link for more information). You also need to take into consideration the timeouts and rate limits of the different APIs that you are going to use.
  • Features: here we just added a very simple chat, but you can use any LangChain chain to create a route. This means you could add routes backed by RAG chains or chains that use specific tools. The possibilities are limited only by your imagination!
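
As an illustration of the security point, here is a minimal API key check using FastAPI dependencies. This is only a sketch, not a production-grade setup: the X-API-Key header name and the MY_API_KEY environment variable are assumptions made for the example.

import os

from fastapi import Depends, FastAPI, HTTPException
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key")


def verify_api_key(api_key: str = Depends(api_key_header)) -> None:
    # Reject any request whose X-API-Key header does not match the expected key
    if api_key != os.environ["MY_API_KEY"]:
        raise HTTPException(status_code=403, detail="Invalid API key")


# Declaring the dependency at the app level protects every route of the API
app = FastAPI(dependencies=[Depends(verify_api_key)])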

Conclusion

You learned how to create, in a simple manner, an API over your LangChain chains using LangServe. LangServe is a very powerful tool that allows the creation of complex routes taking full advantage of the capabilities of LangChain chains (streaming, batching, …), while adding robustness (typing) and documentation (Swagger) over your chains.
However, it cannot do everything, and many aspects still need development to achieve a truly production-ready API, including security and robustness. Fortunately, since it is based on the mature and powerful FastAPI framework, all the necessary tools to implement these features are already available.

Afterword

I hope this tutorial helped you and taught you many things. I will update this post with more nuggets from time to time. Don’t forget to check out my other posts, as I write a lot about practical stuff in AI.

Cheers !
