Introducing Vectorize: The easy path to accurate RAG applications

Chris Latimer

If you’ve ever built an LLM-powered application using retrieval-augmented generation (RAG), you’ve probably encountered many of these common challenges:

  • You launch your application only to face user complaints that the LLM is hallucinating.
  • The LLM struggles to answer basic questions it should have no problems answering.
  • You spend days writing scripts to compare inference results across different chunking strategies and embedding models in your vector database.
  • You struggle to quantify exactly which vectorization strategy works best on your data.
  • You try using the highest-performing embedding models on the Hugging Face leaderboard, but they just don’t seem to work that well on your data.

If you’ve faced these problems, you’re in good company. The founders of Vectorize spent the better part of 2023 working closely with developers to build and launch dozens of RAG applications at companies large and small. Every single team ran into some combination of these problems. Vectorize is the platform we kept wishing we had: one that helps teams get their LLM-powered applications into production with confidence that the LLM will always have the most accurate, relevant context to perform at its best.

Today, we are excited to announce that Vectorize is publicly available for anyone to try for free! If you are building LLM-powered RAG applications or think you will need to in the future, we believe that once you try Vectorize, you’ll never want to build another RAG application without it.

Key Features Available Now

Experiments

This release focuses on a feature we call Experiments. It allows you to experiment with different embedding models, chunking strategies, and retrieval configurations to find the combination that works best with your data. With Experiments, you can determine the best methods for creating highly relevant search indexes for your RAG applications. This approach eliminates guesswork and replaces it with precise, data-driven evaluations, helping you build the most effective generative AI solutions.

An example experiment in Vectorize, using Wikipedia articles about World War II as a demonstration.
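
To make the idea concrete, here is a minimal sketch of the kind of configuration grid an experiment sweeps over. The model names and chunking parameters below are illustrative assumptions, not Vectorize's actual defaults:

```python
from itertools import product

# Hypothetical candidates -- illustrative values only.
embedding_models = ["text-embedding-3-small", "all-MiniLM-L6-v2"]
chunking_strategies = [
    {"chunk_size": 256, "chunk_overlap": 0},
    {"chunk_size": 512, "chunk_overlap": 64},
]

# An experiment evaluates every combination against the same document sample.
experiment_configs = [
    {"embedding_model": model, **chunking}
    for model, chunking in product(embedding_models, chunking_strategies)
]

for config in experiment_configs:
    print(config)
```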

RAG Sandbox

While it’s great to have concrete data to help you understand which vectorization strategy works best for your data, it’s still wise to compare those results against your own personal assessment. To facilitate this comparison, we have introduced the RAG Sandbox.

Leveraging the vector search indexes generated in your experiments, the RAG Sandbox gives you the ability to test end-to-end RAG scenarios. Here you can submit a prompt to any of the supported LLMs, including Groq-powered models such as Llama 3, Gemma, and Mixtral, as well as GPT-3.5 from OpenAI. (See the RAG Sandbox in action; no registration required.)

The Vectorize RAG Sandbox is an interactive tool to see in real time how various vectorization strategies perform.
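
Under the hood, a sandbox run resembles an ordinary retrieve-then-generate call. Here is a minimal sketch using Groq's Python SDK; the model ID and the way context is stitched into the prompt are illustrative assumptions, not Vectorize's exact implementation:

```python
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def answer_with_rag(question: str, retrieved_chunks: list[str]) -> str:
    """Stuff retrieved context into the prompt, then ask the LLM to answer."""
    context = "\n\n".join(retrieved_chunks)
    completion = client.chat.completions.create(
        model="llama3-70b-8192",  # assumed ID for a Groq-hosted Llama 3 model
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content

# The chunks would come from a vector search index built by one of your experiments.
print(answer_with_rag("What is Vectorize?", ["Vectorize is a RAG experimentation platform."]))
```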

The RAG Sandbox gives you complete visibility into the retrieval-augmented generation process. You can inspect the context that comes back from the vector database and see relevancy scores, normalized discounted cumulative gain (NDCG) scores, and cosine similarity. You can then adjust your prompt, LLM, and LLM settings to see how these tweaks impact the overall effectiveness of your RAG setup.
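
Cosine similarity, one of the scores shown for each retrieved chunk, is simple to reproduce yourself. A minimal NumPy sketch with made-up embedding vectors:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embeddings; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.1, 0.8, 0.3])  # made-up query embedding
chunk_vec = np.array([0.2, 0.7, 0.1])  # made-up chunk embedding
print(cosine_similarity(query_vec, chunk_vec))  # ≈ 0.96
```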

How it works

Experiments allow you to start with a representative sample of the data you want to vectorize. This could be PDF or Word documents from your file system, exports from knowledge bases like Notion, Confluence, or other platforms, or relevant entries from SaaS platforms like Salesforce or Zendesk. You don’t need to integrate Vectorize with any external system to run your experiments; just upload your exported files, and we will handle the rest.

Using the Vectorize experiment feature requires only a free account and a representative sample of your data.

Vectorize will automatically vectorize your data and build vector search indexes using the database engine you select, either Pinecone Serverless or DataStax Astra.
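
For a sense of what that index creation involves, here is a rough sketch using Pinecone's official Python client. The index name, dimension, region, and placeholder vector are assumptions for illustration:

```python
from pinecone import Pinecone, ServerlessSpec

# Placeholder credentials and index settings -- assumptions for illustration.
pc = Pinecone(api_key="YOUR_API_KEY")

pc.create_index(
    name="experiment-index",
    dimension=1536,  # must match the embedding model's output size
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

index = pc.Index("experiment-index")
index.upsert(vectors=[
    # Each record pairs a chunk's embedding with metadata for later inspection.
    {"id": "chunk-1", "values": [0.0] * 1536, "metadata": {"text": "First chunk..."}},
])
```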

Simulating User Questions

Based on the documents you provide, Vectorize begins the experiment by extracting their contents. It then uses a fine-tuned model to generate a set of user questions that your documents can effectively answer. For example, the following questions were generated in an experiment that used books by the 19th-century philosopher Friedrich Nietzsche as input:

Vectorize experiments generate simulated questions your users may have about the documents you provide in your experiment.
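
Vectorize uses its own fine-tuned model for this step, but conceptually it amounts to prompting a language model over each extracted passage. A generic sketch using the OpenAI client as a stand-in (the model choice and prompt are illustrative assumptions):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_questions(passage: str, n: int = 3) -> str:
    """Ask an LLM for questions this passage can answer -- a stand-in for
    Vectorize's fine-tuned question-generation model."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": f"Write {n} questions that the following passage can answer,"
                       f" one per line:\n\n{passage}",
        }],
    )
    return response.choices[0].message.content

print(generate_questions("That which does not kill us makes us stronger."))
```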

Multiple Vectorization Strategies

While questions are being generated, Vectorize initiates 4 different parallel RAG pipelines to process your uploaded files. Each of these pipelines handles a specific combination of embedding model, chunking strategy, and retrieval settings before finally persisting your vectors into a search index in your selected vector database.

You can watch this process as it progresses in real time on the Vectorize UI:

Vectorize builds search indexes of your documents using 4 different embedding models and chunking strategies.
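
In rough pseudocode terms, the process looks something like the sketch below; the chunking and embedding functions are trivial stand-ins for the real pipeline stages:

```python
from concurrent.futures import ThreadPoolExecutor

# Trivial stand-ins for the real pipeline stages -- illustration only.
def chunk(text: str, size: int, overlap: int) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(chunks: list[str], model: str) -> list[list[float]]:
    return [[float(len(c))] for c in chunks]  # dummy one-dimensional "embeddings"

def run_pipeline(config: dict, text: str) -> int:
    """Chunk, embed, and (here, pretend to) persist vectors for one configuration."""
    chunks = chunk(text, config["chunk_size"], config["chunk_overlap"])
    vectors = embed(chunks, config["embedding_model"])
    return len(vectors)  # stand-in for writing to a vector search index

configs = [
    {"embedding_model": "model-a", "chunk_size": 256, "chunk_overlap": 0},
    {"embedding_model": "model-a", "chunk_size": 512, "chunk_overlap": 64},
    {"embedding_model": "model-b", "chunk_size": 256, "chunk_overlap": 0},
    {"embedding_model": "model-b", "chunk_size": 512, "chunk_overlap": 64},
]

sample_text = "lorem ipsum " * 1000
with ThreadPoolExecutor(max_workers=4) as pool:  # the four pipelines run in parallel
    counts = list(pool.map(lambda cfg: run_pipeline(cfg, sample_text), configs))
print(counts)
```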

Identifying the top vectorization strategy

To score each strategy, Vectorize takes the simulated question set and performs a retrieval using a semantic similarity query for each question. It then assesses the relevancy of the responses it gets back and computes classical information retrieval metrics such as NDCG. Finally, it highlights the option with the best score as the “winner” of the experiment, indicated by a green ring and a trophy icon.
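
NDCG rewards result lists that put the most relevant chunks first. A minimal sketch of the metric, with made-up relevance judgments for one question's top four results:

```python
import math

def dcg(relevances: list[float]) -> float:
    """Discounted cumulative gain: each result's relevance, discounted by log2 of rank."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(relevances: list[float]) -> float:
    """DCG normalized by the DCG of the ideal (best possible) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Made-up relevance judgments for one simulated question's top-4 retrieved chunks.
print(ndcg([3, 1, 0, 2]))  # ≈ 0.94: below 1.0 because a relevant chunk was ranked low
```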

Iterate and Improve

Vectorize allows you to refine your vectorization strategy by performing multiple experiments with different configurations. If you discover that one embedding model works particularly well, you can vary the chunking strategy to further improve the context your LLM will receive. You can also vary settings like chunk size, chunk overlap, and the top-K value that will be used when retrieving the nearest neighbors from the vector search index.

You can select up to 4 different configurations per experiment.
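
Varying top-K, for example, is a query-time change rather than a re-indexing one. A rough sketch with the Pinecone client, reusing the hypothetical index from earlier:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("experiment-index")  # hypothetical index built by an earlier experiment

# Compare how much context each top-K value pulls back for the same query vector.
for top_k in (3, 5, 10):
    results = index.query(vector=[0.0] * 1536, top_k=top_k, include_metadata=True)
    print(top_k, [match.id for match in results.matches])
```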

Start Building Better RAG Applications Today

We would like to invite you to sign up and start experimenting with Vectorize today! To get started, head over to https://platform.vectorize.io and sign up for a free account. From there, our quickstart documentation is a good resource to help you run your first experiment and see how the platform works.

If you run into any bugs, have questions, or just want to chat, please join our community on Discord!