RAG as Your Anchor: Grounding Chatbots in Trusted Data

Retrieval Augmented Generation (RAG) combines information retrieval with generative AI, allowing systems to provide more accurate, up-to-date responses.

This approach is especially valuable because large language models (LLMs) are limited by the static data they were trained on. This is where RAG steps in: by pulling updated, trusted data from your organisation’s knowledge bases, RAG grounds your chatbot in your own data so it can provide relevant, up-to-date answers.

Why is RAG important to an LLM?

LLMs face several limitations. They can confidently generate outdated or incorrect information when they lack the necessary context. Without access to recent and relevant data, these models won’t give responses specific to your organisation. While RAG is not a flawless solution, anchoring responses in curated data significantly reduces hallucinations and makes responses more reliable.

If you’ve ever been given false information by an AI tool, you’ll know it can erode your trust fast. These oops moments, often referred to as AI hallucinations, are a common bugbear in the AI world.

At its core, RAG queries a designated knowledge base before the LLM generates a response.

For example, suppose a healthcare organisation uploads its guidelines, approved treatments, and care protocols into a RAG database. When a nurse or patient asks a question like, “What are the post-surgery care steps for a hip replacement?”, the RAG-powered chatbot retrieves the exact, up-to-date steps from the database. In this way, AI chatbots generate more reliable responses from verified sources of information.

How RAG-powered Chatbots Work

RAG chatbots use two powerful AI techniques, retrieval and generation, to provide accurate, up-to-date responses. Let’s break down a small part of this complicated process:

  • Understanding the Query: Step one for the chatbot is figuring out what the human is talking about. Once the chatbot identifies key details and context, it can understand the intent behind the query.
  • Retrieving Data: Next, the chatbot retrieves external data, so it’s providing answers based on current information. External data is new data not included in the LLM’s original training data. It could consist of document libraries, databases or APIs, and contain files in a range of formats.
  • Processing Data: Then the system processes and contextualises the retrieved data to answer the user’s question.
  • Generating the Response: Now the chatbot combines the processed data with its language model to create a response.
  • Continuous Improvement: Over time your external data can become outdated, so you should regularly update the documents.
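The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a real implementation: `retrieve` uses simple keyword overlap as a stand-in for a real retriever, and `generate` is a placeholder for an LLM call.

```python
def retrieve(query, documents, top_k=2):
    """Stand-in retriever: rank documents by word overlap with the query."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

def generate(query, context):
    """Placeholder for an LLM call: a real system would prompt the model
    with the retrieved context and the user's question."""
    return f"Answering '{query}' using: {'; '.join(context)}"

documents = [
    "Post-surgery care steps for hip replacement: rest, ice, physiotherapy.",
    "Opening hours for the clinic are 9am to 5pm.",
]
context = retrieve("hip replacement care steps", documents)
print(generate("hip replacement care steps", context))
```

Notice that the irrelevant opening-hours document is filtered out before the model ever sees it, which is the core of what makes RAG answers more grounded.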

Why Are RAG Chatbots Better Than Conventional Bots?

Traditional chatbots can’t answer anything outside their scope, because they rely on a limited list of possible questions. RAG-powered chatbots, on the other hand, can access external data, allowing them to give more relevant and accurate responses.

Our RAG Approach

Traditional RAG systems use embeddings and vector databases to retrieve information.

Embeddings are a way of turning words or pieces of text into numbers, so machines can process and compare them. A vector database stores these numbers. It can help a chatbot find the closest matches when it looks for relevant information.
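To make this concrete, here is a toy Python example of how embedding vectors are compared. The three-dimensional vectors are made up for illustration; real embedding models produce hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two embedding vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for a query and two stored sections.
query_vec = [0.9, 0.1, 0.2]       # "terminating a supplier agreement"
supplier_doc = [0.8, 0.2, 0.1]    # section about supplier agreements
employment_doc = [0.1, 0.9, 0.3]  # section about employment contracts

# The vector database returns whichever stored vector is closest to the query.
print(cosine_similarity(query_vec, supplier_doc))    # high: close match
print(cosine_similarity(query_vec, employment_doc))  # low: poor match
```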

While these methods are useful, they can sometimes fall short, especially when using open-source embedding models, which might not always capture subtle meanings or specific contexts. We’ve updated traditional RAG to better handle the retrieval of relevant information, instead of relying solely on embeddings and vector searches.

Our approach goes a step further. Instead of relying solely on embedding similarity, we use an LLM to assess section summaries and rank relevance. This ensures that responses are not just the “closest match”, but genuinely the most appropriate and useful.
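As a rough sketch of this idea, the loop below ranks sections by scoring each summary against the query. The `score_relevance` function here is a keyword-overlap stand-in; in a real system it would be an LLM call returning a relevance judgement.

```python
def score_relevance(query, summary):
    """Stand-in for an LLM relevance judgement: in practice you would prompt
    a model with the query and summary and ask for a relevance score."""
    query_words = set(query.lower().split())
    return len(query_words & set(summary.lower().split()))

def rank_sections(query, summaries, top_k=1):
    """Rank section summaries by relevance to the query, best first."""
    ranked = sorted(summaries, key=lambda s: score_relevance(query, s), reverse=True)
    return ranked[:top_k]

summaries = [
    "Covers termination conditions for supplier agreements.",
    "Covers termination conditions for employment contracts.",
]
print(rank_sections("conditions for terminating a supplier agreement", summaries))
```

The key design point is that the ranking step reasons over human-readable summaries rather than raw vectors, so it can distinguish supplier agreements from employment contracts even when their embeddings sit close together.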

This approach means our RAG powered chatbots retrieve the most accurate and context appropriate data, even for the more complex or specialised topics, and results in a far lower rate of incorrect or hallucinated responses.

Why This Matters

By addressing the limitations of traditional RAG, our system provides a more reliable foundation for chatbots. Whether you’re in healthcare, education, or another industry, this enhanced retrieval method ensures users get trustworthy, relevant answers, reducing frustration and improving outcomes.

Imagine a legal firm uses a chatbot to assist with contract queries. A user asks:

“What are the conditions for terminating a supplier agreement?”

In a traditional RAG system, the embeddings and vector database might look for terms like “terminating” and “supplier agreement”. If the exact phrase or something closely related isn’t in the database, it might retrieve something less relevant, such as:

“Conditions for terminating employment contracts.”

This happens because embeddings focus on numbers and patterns, which don’t always capture subtle differences in meaning like the difference between supplier agreements and employment contracts.

Japeto’s RAG approach works differently. It creates summaries for each section in the database to explain what they’re about. When someone asks a question, the system uses an LLM to check these summaries, figuring out which sections are most relevant. Instead of just matching keywords, the system asks:

“Does this section specifically discuss supplier agreements and termination conditions?”

By doing this, the system ignores irrelevant sections and retrieves the most contextually accurate content.

Where to Start with RAG

If you’re considering RAG-powered chatbots for your business, here are some key steps to help you get started:

1: Get your Data Together

The foundation of any RAG system is its data. Start by identifying which data sources are most valuable for your chatbot. These can be internal documents like product manuals, company guidelines or legal documents, or external sources such as public knowledge bases or APIs.

Once you have the data, clean, structure and update it so you get the best results. Ensure that your knowledge base is up-to-date and structured so that it can be easily understood. If a human would struggle to read your documentation, chances are a RAG system will struggle too!

Documents come in a range of formats, so they should usually be transformed into a format more easily understood by machines. Markdown is a good example: it preserves important context such as section titles, while keeping the content in plain text with irrelevant information removed.

RAG systems often consider information in chunks of text where full documents are too large to search. You should consider how you chunk text. If your data has a strong hierarchy, you might split text by section, or otherwise into individual paragraphs or fixed-length blocks of text. It’s important to experiment with different strategies, and see which approach is the best fit for your data.
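As an illustration of two common strategies, the sketch below splits text by paragraph and into fixed-length word blocks. The chunk sizes are arbitrary examples; the right choice depends on your data.

```python
def chunk_by_paragraph(text):
    """Split on blank lines, keeping each paragraph as one chunk."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def chunk_fixed_length(text, words_per_chunk=50):
    """Split into fixed-length blocks of words, ignoring document structure."""
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

document = "First paragraph about supplier agreements.\n\nSecond paragraph about termination."
print(chunk_by_paragraph(document))
print(chunk_fixed_length(document, words_per_chunk=4))
```

Paragraph chunks keep related sentences together, while fixed-length chunks guarantee a predictable size for embedding; as the text above says, it pays to try both against your own documents.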

2: Choose or Build your Chatbot

You’ll need a chatbot platform capable of handling retrieval-augmented generation. There are a couple of ways to go about this:

  1. Off-the-shelf solutions: Look for chatbots that already integrate RAG functionality.
  2. Custom build: If you have a development team, you can add RAG to an existing chatbot. This way, it can work with your data sources.

3: Test and Train the Chatbot

Before going live, test your RAG-powered chatbot to ensure it retrieves and generates accurate responses. Does the chatbot understand the query and pull the correct information? Is the information provided up-to-date and contextually appropriate?

4: Monitor and Improve

Once your RAG-powered chatbot is up and running, the work doesn’t stop there. To keep it performing at its best, it’s important to monitor its responses and gather feedback from users.

This helps identify any areas where the chatbot may need improvement. Regularly update your data sources to ensure they are relevant. Looking at version control for your documents is an efficient way of managing changes.

Integrating performance metrics, such as response accuracy and user satisfaction, can make data-driven decisions easier. By constantly refining your chatbot, you’ll keep it delivering accurate, insightful responses and providing even greater value over time.


Technical Insights

The most common form of RAG uses embedding models and vector databases, and you can build these systems easily using open-source solutions. For example, you can create embeddings from your documents using the BGE Large 1.5 model and use a PostgreSQL database with the pgvector extension to store and search for these documents. Since these are open source, you could build this system with no licensing costs.
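As a rough sketch of what that search looks like, the snippet below builds the SQL a pgvector-backed retriever might run; the table and column names (`documents`, `embedding`) are made-up examples. pgvector’s `<=>` operator computes cosine distance, so ordering by it ascending returns the closest matches first.

```python
def nearest_neighbour_sql(table="documents", column="embedding", limit=5):
    """Build a pgvector nearest-neighbour query. The %s placeholder is
    filled with the query embedding by your database driver (e.g. psycopg)."""
    return (
        f"SELECT id, content FROM {table} "
        f"ORDER BY {column} <=> %s "  # <=> is pgvector's cosine distance
        f"LIMIT {limit};"
    )

print(nearest_neighbour_sql())
```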

However, the quality of your data is vitally important to having an accurate RAG system. Without quality data, the pieces of text being retrieved may be irrelevant, and your chatbot will perform poorly.

The key factor here is to take a data-driven approach. It’s key to have a solid metric on how your chatbot is performing, and how relevant the information is. This can be done, for example, by putting your chatbot in front of users and having them score responses. Once you have a reliable baseline, you can experiment with changing parts of your RAG system to see the effects.
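A baseline metric can be as simple as averaging user scores per system configuration. The ratings below are hypothetical, purely to show the comparison:

```python
def average_score(ratings):
    """Mean user rating (e.g. 1-5 stars) for one chatbot configuration."""
    return sum(ratings) / len(ratings)

# Hypothetical user ratings for two RAG configurations.
baseline = average_score([4, 3, 5, 4, 3])    # current system
experiment = average_score([5, 4, 5, 4, 4])  # e.g. a new chunking strategy

print(f"baseline={baseline:.2f}, experiment={experiment:.2f}")
```

Holding everything else constant while changing one part of the pipeline, then comparing scores against the baseline, is what makes the approach data-driven rather than guesswork.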

Some experiments you could run include:

  • Consider whether your knowledge base has gaps, and write updated documentation
  • Change how your RAG system chunks text: you can change the chunk size, or how your system decides to split up text
  • If users sometimes ask their queries in confusing ways, consider rewriting their queries with an LLM for better understanding
  • In longer conversations, rewrite the user’s query (“How do I find out more about it?”) to include the context from previous messages (“How do I find out more about RAG?”)
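The last two experiments amount to a rewriting step that runs before retrieval. The sketch below uses simple pronoun substitution as a stand-in; in practice you would prompt an LLM with the conversation history and ask it to produce a self-contained query.

```python
def rewrite_query(query, conversation_topic):
    """Stand-in query rewriter: replace a vague trailing pronoun with the
    current topic. A real system would ask an LLM to rewrite the query
    using the full chat history."""
    for pronoun in ("it", "this", "that"):
        query = query.replace(f" {pronoun}?", f" {conversation_topic}?")
    return query

history_topic = "RAG"  # topic extracted from earlier messages (hypothetical)
print(rewrite_query("How do I find out more about it?", history_topic))
# The retriever then searches with the rewritten, self-contained query.
```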

RAG and the Future of Chatbots

Currently, RAG chatbots provide reliable, context-driven answers to user queries. In the future, RAG chatbots may not only answer questions but also recommend actions. For instance, a school chatbot helping a student with a disability might not only tell them about academic support, but also suggest schools they’re suited to and generate an application based on their grades, needs and the support provided by the schools.

Advances in RAG technology could enable chatbots to process larger volumes of data faster. We could see the development of RAG data products targeted at industry specific use cases. Other possibilities we might see include:

  • Combining RAG with multimodal systems (e.g., text and images).
  • Adding real-time sources of data to get the most reliable information, such as APIs to your organisation’s data, or live news and social media feeds
  • Automatically training RAG systems based on past responses and user feedback

 

Interested in leveraging RAG chatbots for your business? Get in touch to learn how Japeto is using retrieval augmented generation technology.



Emily Coombes

Hi! I'm Emily, a content writer at Japeto and an environmental science student.
