Implement your own low cost and scalable vector store using LanceDB, LangChain and AWS

Do you have a great idea for an app and need a powerful but affordable vector store? Or do you already have one, but the cost of your current vector store is too high?
Then you are in the right place, as we will discuss how to implement a low-cost and scalable vector store using LanceDB, LangChain and AWS.

Introduction

In this post, you will learn:

  • What is LanceDB, and how does it differ from other vector stores?
  • How to create a scalable and cost-efficient vector store?
  • How to create a simple RAG app using this architecture?

As usual, we will be as straightforward and simple as possible so let’s get to work!

Prerequisites

  • A functional AWS account where you have admin permissions (check this link)
  • Access to at least one LLM model on AWS Bedrock
  • An AWS user set up on your laptop (check this link)
  • An S3 bucket and permission to write data to it (check this link)
  • A functional Python environment
  • Having read the post on using AWS Bedrock with LangChain (link) and the post on how to create a RAG application (link).
  • You can find all the relevant code here.

What is LanceDB and how is it different?

LanceDB is an open-source vector database designed to store, manage, query and retrieve embeddings. It is made to handle large-scale multimodal data (documents, images, audio, …).
It is based on the Lance format, an open-source columnar data format designed for performant ML and AI workloads.

Specifically, this means that:

  • The embeddings are saved directly as Lance files. There is no database server that needs to be up all the time.
  • All operations on the Lance files are implemented by the LanceDB library, which means it is your compute (server, local computer, …) that runs them. This is why continuous uptime is not required.
  • LanceDB is very scalable because of this file-only format. If you need a new vector store table or new data, you simply create a new file or append to an existing one.
  • LanceDB is also very cost-efficient because you only pay for the storage of the Lance files and for the compute to create, update and retrieve them (which you should already have if you have an application).
  • You can store the Lance files not only on a local drive but also on cloud storage like AWS S3. This gives you nearly unlimited storage and therefore great scaling.
  • LanceDB is also very fast thanks to this simple architecture and the efficient storage format.
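
To make the file-only idea concrete, here is a minimal sketch of using the LanceDB Python library directly, assuming the lancedb package is installed (the folder, table name and toy vectors are just illustrative):

import lancedb

# Connecting just points to a directory: no server process is started.
db = lancedb.connect("./lance-data")

# Creating a table writes Lance files into that directory.
table = db.create_table(
    "demo",
    data=[
        {"vector": [1.0, 0.0], "text": "hello"},
        {"vector": [0.0, 1.0], "text": "world"},
    ],
    mode="overwrite",
)

# Querying reads those files with your own compute; nothing else is running.
results = table.search([1.0, 0.2]).limit(1).to_list()
print(results[0]["text"])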

Why is it important to have a cost-efficient vector store?

A vector store is a central component of any RAG architecture. The more the application is used, the more the vector store is used, and the bill can go up very fast depending on the tool you choose.
There are some cases where you need the absolute best tool whatever the price, and other cases where you need something efficient (even if not at the level of the best tools), good enough and cheaper.
In those cases, LanceDB makes sense. I'm not saying it is not powerful: LanceDB can actually be on par with the best vector stores, but that requires a lot of development and infrastructure work. What it makes easy is what is called a serverless vector store, where you only pay when you use it, and even then you only pay for the compute and for access to the data (on AWS S3).

How to create a scalable and cost-efficient vector store?

So now that we have seen all this, how do we actually create a scalable and cost-efficient vector store using LanceDB? Here are the main points:

  • Storage: Save all the Lance files on AWS S3. S3 will serve as the main storage, as it is already cost-efficient and incredibly scalable.
  • Processing: Depending on the use case, you could use AWS Lambda, for example, and create your own API. You can also integrate the vector store code directly into the backend of your webapp. You only need a Python or JavaScript environment with the LanceDB library.
  • LLM: You can use whatever you want, from Claude models through AWS Bedrock to GPT models from OpenAI. Don't forget that for the actual RAG, you will need a model capable enough to answer the question (Claude Haiku or GPT-mini may not be enough for this task, for example).
  • Embedding: This is the model that will transform your data into embedding vectors. As before, you can go through Bedrock to get Cohere embeddings or use the embeddings from OpenAI directly.

Using this architecture, you will have a fully serverless vector store where you only pay when you use it. The only constant cost will be the S3 storage (which should be pretty cheap unless you really have a lot of data). Pretty neat, right?
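
As a rough sketch of the storage part, pointing LanceDB at S3 only requires an s3:// URI; the bucket path below is a placeholder, and the credentials are picked up from your usual AWS setup (profile, role or environment variables):

import lancedb

# Same API as a local folder, but the Lance files now live in S3.
db = lancedb.connect("s3://your-bucket-name/lancedb/")
print(db.table_names())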

Now let’s code all this!

Initialize the work environment

First, let’s create a folder called cost-efficient-vector-store-with-lancedb:

mkdir cost-efficient-vector-store-with-lancedb
cd cost-efficient-vector-store-with-lancedb

Now let's initialize a pipenv environment with the correct dependencies:

pipenv install langchain langchain-aws langchain-community lancedb streamlit

We just added langchain, langchain-community and langchain-aws (the LangChain libraries), lancedb (the library to handle LanceDB) and streamlit as dependencies.

OK, now that your environment is set up, we can move on to the next step of the post: creating the vector store.

Create the vector store

Now let's write the code to create our vector store. First, we are going to create a basic Streamlit app to upload documents, add them to the vector store and use it for our RAG. We take inspiration from this post, so don't hesitate to check it out.

import streamlit as st

st.write("Hello, Metadocs readers!")

Now that we have the Streamlit app, let's add the code to set up the LLM, the embeddings and the S3 path. We will use the Embed model from Cohere as the embedding model (it supports multiple languages if you ever need them).

...
from langchain_aws.chat_models import ChatBedrock
from langchain_aws.embeddings import BedrockEmbeddings


aws_region_name = "us-east-1"
claude_3_sonnet = "anthropic.claude-3-sonnet-20240229-v1:0"
cohere_embed = "cohere.embed-multilingual-v3"
s3_bucket = "s3://vector-store-lancedb-tuto"

model = ChatBedrock(
    model_id=claude_3_sonnet,
    region_name=aws_region_name,
)

cohere_embedding = BedrockEmbeddings(
    model_id=cohere_embed,
    region_name=aws_region_name,
)

Now let’s add the prompt:

...
from langchain_core.prompts import ChatPromptTemplate
...

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

And a Streamlit component to upload a file to add to the vector store:

...
import streamlit as st
...

uploaded_file = st.file_uploader("Choose a text file", type="txt")

if uploaded_file is not None:
    string_data = uploaded_file.getvalue().decode("utf-8")
    split_data = string_data.split("\n\n")

Until now, this was very classic. Now let’s implement the vector store creation using LanceDB.

...
from langchain_community.vectorstores import LanceDB
...

vector_store = LanceDB(
    uri=f"{s3_bucket}/lancedb/",
    table_name="tuto",
    embedding=cohere_embedding,
    mode="overwrite",
)
retriever = vector_store.as_retriever()

...

if uploaded_file is not None:
    string_data = uploaded_file.getvalue().decode("utf-8")
    split_data = string_data.split("\n\n")

    vector_store.add_texts(split_data)

Here’s what’s happening:

  • First, we import LanceDB from LangChain. We could also use the LanceDB library directly, but the LangChain integration is simpler for this use case. Don't hesitate to check the documentation of the library, as it is far richer than what is exposed in LangChain (see the sketch after this list).
  • Then we create the vector store with vector_store = LanceDB(. We need to give the uri and table_name parameters, which together define the path where the vector store and the table will be created. We also give the embedding function and the mode "overwrite", as we do not want to keep previous data.
  • Then we create a retriever from the vector store.
  • Finally, in the uploading part, we just add the split text to our vector store. Here the preprocessing of the data is very simple, but in a real use case this will be a very big part of your process.
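
If you want to go beyond the LangChain wrapper, here is a small sketch of querying the same table with the LanceDB library directly, reusing the s3_bucket and cohere_embedding variables from above (the "text" column name matches the wrapper's default and is an assumption worth checking on your own table):

import lancedb

# Open the table that the LangChain wrapper created on S3.
db = lancedb.connect(f"{s3_bucket}/lancedb/")
table = db.open_table("tuto")

# Embed the question ourselves, then search the raw table.
query_vector = cohere_embedding.embed_query("What is this document about?")
hits = table.search(query_vector).limit(4).to_list()
for hit in hits:
    print(hit["text"])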

Now that the vector store is created, we just need a chain to retrieve data from it and answer a question:

...
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
...

retriever = vector_store.as_retriever()

chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

...

question = st.text_input("Input your question for the uploaded document")

if question:
    result = chain.invoke(question)
    st.write(result)

OK, perfect: we now have all the pieces of our cost-efficient vector store. Let's try it.

Launch the app

Now that we have our app ready, let’s launch it:

AWS_PROFILE=default pipenv run streamlit run app.py

Here we use pipenv to run our Streamlit app. The AWS_PROFILE part at the beginning passes the AWS profile to the whole application as an environment variable so that each AWS call can be authenticated. If you do not use named profiles, you can just remove this part:

pipenv run streamlit run app.py
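
If the app fails with an authentication error, a quick way to check that your profile can reach Bedrock is a small script like this one (the profile name and region are assumptions, use your own):

import boto3

# Uses the same profile as AWS_PROFILE and lists the Bedrock models you can see.
session = boto3.Session(profile_name="default", region_name="us-east-1")
bedrock = session.client("bedrock")
models = bedrock.list_foundation_models()["modelSummaries"]
print([m["modelId"] for m in models[:5]])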

Here’s what you should have:

Now let’s upload our classic state_of_the_union.txt file and ask a question:

Congratulations! You just implemented a very cost-efficient but powerful vector store by yourself that you can use in real applications.
Still, there are some things to improve, so let's list some of them.

Improvements

We just tested our vector store, but there are still a lot of improvements to be made if you want something more efficient and usable. Let's list some of them:

  • Processing time: if you have tested the app, you may have found it pretty slow. This is caused by several things:
    • First, it is faster to use the LanceDB library directly than the LangChain integration.
    • Second, the embedding is done sequentially, and you could gain a lot of time by doing it in parallel (or asynchronously); see the sketch after this list.
    • Third, this was launched on a laptop in Europe while the S3 bucket is in the US, which means a lot of latency due to the distance.
  • Compute: The code currently runs on a laptop, but to use it in an app you will need compute to run it. You could do that directly in the backend of your webapp or behind an API. For the latter, you could use AWS Lambda to get a truly serverless system (AWS Lambda is an AWS service that lets you run functions with a short time limit, 15 minutes max, very cheaply).
  • File processing: The processing of the input file is really simple; in a real use case, you would need far more complex file processing code (for example to handle multiple file types).
  • Webapp: This was done with a Streamlit app, but it is clearly not meant to be deployed and shared with people; you would need to build a real webapp with robust technologies.
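
For the parallel embedding point, here is a minimal sketch using a thread pool; the batch size and worker count are illustrative assumptions, not tuned values:

from concurrent.futures import ThreadPoolExecutor

def embed_in_parallel(texts, embeddings, batch_size=32, max_workers=4):
    # Split the texts into batches and embed each batch in its own thread.
    batches = [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() keeps the batches in order, so the vectors line up with the texts.
        results = executor.map(embeddings.embed_documents, batches)
    return [vector for batch in results for vector in batch]

# The resulting vectors could then be written to the Lance table,
# for example with the LanceDB library directly.
vectors = embed_in_parallel(split_data, cohere_embedding)

Whether this actually speeds things up depends on the Bedrock quotas of your account, so treat it as a starting point rather than a final implementation.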

Conclusion

In this tutorial, we implemented a powerful, scalable and cost-efficient vector store using LanceDB. It is a very simple solution for the cases where you do not need the absolute best performance and want to reduce cost. There are still improvements to make, but this is definitely a solution that can be used for very big applications; it just needs some development time on your side.

Afterword

I hope this tutorial helped you and taught you many things. I will update this post with more nuggets from time to time. Don't forget to check my other posts, as I write a lot of cool content on practical AI.

Cheers !
