
Implement your own low cost and scalable vector store using LanceDB, LangChain and AWS

Do you have a great idea for an app and need a powerful but affordable vector store? Or do you already have something, but the cost of your current vector store is too high?
Then you are in the right place, as we will discuss how to implement a low-cost and scalable vector store using LanceDB, LangChain and AWS.

Introduction

In this post, you will learn:

- What LanceDB is and how it differs from other vector stores
- Why having a cost-efficient vector store matters
- How to architect a scalable, serverless vector store with LanceDB, LangChain and AWS S3
- How to wire it all together in a small Streamlit RAG app

As usual, we will keep things as straightforward and simple as possible, so let's get to work!

Pre-requisites

To follow this tutorial, you will need:

- An AWS account with Amazon Bedrock access (the Claude 3 Sonnet and Cohere Embed models enabled, here in us-east-1)
- An S3 bucket to store the Lance data
- Python with pipenv installed

What is LanceDB and how is it different?

LanceDB is an open-source vector database designed to store, manage, query and retrieve embeddings. It is made to handle large-scale multimodal data (documents, images, audio, etc.).
It is based on the Lance format, an open-source columnar data format designed for performant ML and AI workloads.

Specifically, this means that:

- LanceDB is embedded: it runs inside your application process, so there is no database server to deploy or operate
- The data lives in plain Lance files, which can sit on local disk or on object storage like AWS S3
- The Lance format is optimized for fast random access and scans, which is exactly what vector search workloads need
- Tables are versioned at the format level, so data can evolve without full rewrites
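
To make the embedded model concrete, here is a minimal sketch using the lancedb package directly (the local path, table name and toy vectors are made up for illustration; an s3:// URI works the same way):

import lancedb

# Connect: there is no server process, the "database" is just Lance files
db = lancedb.connect("./my-lancedb")  # or "s3://my-bucket/lancedb"

# Create a table from plain dicts; the "vector" column holds the embeddings
table = db.create_table(
    "demo",
    data=[
        {"vector": [0.1, 0.2], "text": "hello"},
        {"vector": [0.3, 0.4], "text": "world"},
    ],
    mode="overwrite",
)

# Nearest-neighbour search against the stored vectors
results = table.search([0.1, 0.2]).limit(1).to_list()
print(results[0]["text"])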

Why is it important to have a cost efficient vector store?

A vector store is a central component in any RAG architecture. The more the application is used, the more the vector store is queried, so the bill can go up very fast depending on the tool you use.
There are cases where you need the absolute best tool whatever the price, and there are other cases where you need something that is efficient, good enough and cheaper.
In those cases, LanceDB makes sense. I am not saying it is not powerful: LanceDB can actually be on par with the best vector stores, but that requires a lot of development and infrastructure work. What it makes simple is building what is called a serverless vector store, where you only pay when you use it, and even then you only pay for the compute and the access to the data (for AWS S3).

How to create a scalable and cost efficient vector store?

So now that we have seen all this, how do we actually create a scalable and cost-efficient vector store with LanceDB? Here are the main points:

- Store the Lance data on AWS S3, so the only constant cost is cheap object storage
- Run LanceDB embedded in the application itself, so there is no dedicated database server to pay for
- Only pay for compute when the application actually queries the store

Using this architecture, you will have a fully serverless vector store where you only pay when you use it. The only constant cost will be the S3 storage (which should be pretty cheap unless you really have a lot of data). Pretty neat, right?
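
To make the serverless idea concrete, here is a sketch of what a query function could look like behind AWS Lambda (the bucket name and event shape are hypothetical; the point is that the store is opened straight from S3, on demand):

import lancedb

def handler(event, context):
    # Open the Lance table straight from S3: no database server involved,
    # you pay for this compute only while the function runs
    db = lancedb.connect("s3://my-bucket/lancedb")  # hypothetical bucket
    table = db.open_table("tuto")
    results = table.search(event["query_vector"]).limit(3).to_list()
    return {"matches": [r["text"] for r in results]}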

Now let’s code all this!

Initialize the work environment

First, let’s create a folder called cost-efficient-vector-store-with-lancedb:

mkdir cost-efficient-vector-store-with-lancedb
cd cost-efficient-vector-store-with-lancedb

Now let’s initialize a pipenv environment with the correct dependencies:

pipenv install langchain langchain-aws langchain-community lancedb streamlit

We just added langchain, langchain-community and langchain-aws for the LangChain side, lancedb for the vector store itself, and streamlit for the small web UI.

OK, now that your environment is set up, we can move on to the next step of the post: creating the vector store.

Create the vector store

Now let’s write the code to create our vector store. First we are going to build a basic Streamlit app to upload documents and use them for our RAG. We take inspiration from this post, so don’t hesitate to check it out.

# app.py
import streamlit as st

st.write("Hello, Metadocs readers!")

Now that we have the Streamlit app, let’s add the code to set up the LLM, the embeddings and the S3 path. We will use the Cohere Embed model for the embeddings (which actually supports multilingual input if you ever need multiple languages).

...
from langchain_aws import ChatBedrock
from langchain_aws.embeddings import BedrockEmbeddings


aws_region_name = "us-east-1"
claude_3_sonnet = "anthropic.claude-3-sonnet-20240229-v1:0"
cohere_embed = "cohere.embed-multilingual-v3"
s3_bucket = "s3://vector-store-lancedb-tuto"

# Chat model used to answer the questions
model = ChatBedrock(
    model_id=claude_3_sonnet,
    region_name=aws_region_name,
)

# Embedding model used to vectorize the document chunks
cohere_embedding = BedrockEmbeddings(
    model_id=cohere_embed,
    region_name=aws_region_name,
)
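
Before going further, a quick smoke test can confirm that Bedrock access works (not part of the original app, just a sketch assuming both models are enabled in your account):

# Ask the chat model for a short answer and check the embedding dimension
print(model.invoke("Say hello in one word").content)
print(len(cohere_embedding.embed_query("hello")))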

Now let’s add the prompt:

...
from langchain_core.prompts import ChatPromptTemplate
...

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
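
If you are curious, you can render the template with made-up values to see what the model will receive (purely illustrative):

# format_messages fills the template and returns the chat messages
messages = prompt.format_messages(
    context="LanceDB stores vectors as Lance files on S3.",
    question="Where are the vectors stored?",
)
print(messages[0].content)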

And a Streamlit component to upload a file to add to the vector store:

...
import streamlit as st
...

uploaded_file = st.file_uploader("Choose a text file", type="txt")

if uploaded_file is not None:
    # Decode the uploaded file and naively split it into chunks on blank lines
    string_data = uploaded_file.getvalue().decode("utf-8")
    split_data = string_data.split("\n\n")

Until now, this was very classic. Now let’s implement the vector store creation using LanceDB.

...
from langchain_community.vectorstores import LanceDB
...

# The whole vector store is just Lance files under the S3 prefix:
# there is no database server to run or pay for
vector_store = LanceDB(
    uri=f"{s3_bucket}/lancedb/",
    table_name="tuto",
    embedding=cohere_embedding,
    mode="overwrite",
)
retriever = vector_store.as_retriever()

...

if uploaded_file is not None:
    string_data = uploaded_file.getvalue().decode("utf-8")
    split_data = string_data.split("\n\n")

    # Embed each chunk with Cohere and write it to the Lance table on S3
    vector_store.add_texts(split_data)

Here’s what’s happening:

- LanceDB points at a uri on S3: the whole store lives as Lance files under that prefix
- table_name="tuto" is the Lance table holding the vectors, and mode="overwrite" recreates it on each run
- embedding=cohere_embedding is the model used to embed the texts
- When a file is uploaded, add_texts embeds each chunk and writes it to the table on S3
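
Before wiring the full chain, you can sanity-check the store with a direct similarity search (an optional snippet, not part of the app; the query is made up):

# Query the vector store directly to confirm the chunks landed on S3
docs = vector_store.similarity_search("What is this document about?", k=2)
for doc in docs:
    print(doc.page_content[:100])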

Now that the vector store is created, we just need a chain to retrieve data from it and answer a question:

...
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
...

retriever = vector_store.as_retriever()

# Classic LCEL RAG chain: retrieve the context, fill the prompt,
# call the model and parse the answer to a plain string
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

...

question = st.text_input("Input your question for the uploaded document")

if question:
    result = chain.invoke(question)
    st.write(result)

OK perfect, now we have all the pieces of our cost-efficient vector store. Let’s try it.

Launch the app

Now that we have our app ready, let’s launch it:

AWS_PROFILE=default pipenv run streamlit run app.py

Here we use pipenv to run our Streamlit app. The AWS_PROFILE part at the beginning passes the AWS profile as an environment variable to the whole application so that each AWS call can be authenticated. If you do not use profiles, you can just remove this part:

pipenv run streamlit run app.py

Here’s what you should have:

Now let’s upload our classic state_of_the_union.txt file and ask a question:

Congratulations! You just implemented a very cost-efficient but powerful vector store by yourself that you can use in real applications.
Still, there are some things to improve, so let’s list some of them.

Improvements

We just tested our vector store, but there are still a lot of improvements to be made if you want something more efficient and usable. Let’s list some of them:

- Replace the naive "\n\n" split with a proper text splitter (see the sketch after this list)
- Stop recreating the table on every run (mode="overwrite") so the store persists across sessions
- Add a vector index on the Lance table once you have a lot of data, to speed up search
- Handle more file types than plain text (PDF, HTML, ...) with document loaders
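
For instance, here is a sketch of the splitter improvement, replacing the naive split with LangChain’s RecursiveCharacterTextSplitter (chunk sizes are illustrative; the package ships as a dependency of langchain):

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split on paragraph and sentence boundaries with some overlap between chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
split_data = splitter.split_text(string_data)
vector_store.add_texts(split_data)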

Conclusion

In this tutorial, we implemented a powerful, scalable and cost-efficient vector store using LanceDB. It makes for a very simple solution when you do not need the absolute best performance and want to reduce costs. There are still improvements to make, but this is definitely a solution that can be used for very big applications. It just needs some development time on your side.

Afterward

I hope this tutorial helped you and taught you many things. I will update this post with more nuggets from time to time. Don’t forget to check out my other posts, as I write a lot about practical AI topics.

Cheers !
