Understanding Impact of Advanced Retrievers on RAG Behavior through Visualization – Towards Data Science


LLMs, including smaller models such as Gemma 2B and TinyLlama 1.1B, have become adept at text generation and question answering. Even so, such pre-trained models may perform poorly when queried about documents they did not see during training. In that scenario, supplementing the question with relevant context from those documents is an effective approach. This approach, termed Retrieval-Augmented Generation (RAG), has gained significant popularity due to its simplicity and effectiveness.
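
To make the idea of "supplementing your question with relevant context" concrete, here is a minimal sketch of how a RAG prompt might be assembled. The function name `build_rag_prompt` and the prompt wording are illustrative assumptions, not the template used later in this article.

```python
def build_rag_prompt(question: str, context_chunks: list[str]) -> str:
    """Prepend retrieved chunks to the question so the model can ground its answer.

    Hypothetical helper for illustration only; the prompt wording is an assumption.
    """
    # Number each chunk so the model (and the reader) can tell them apart.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    "Who wrote it?",
    ["A wrote it.", "B edited it."],
)
print(prompt)
```

The resulting string would then be passed to the LLM in place of the bare question.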

The retriever is a key component of a RAG system; it obtains relevant document chunks from a back-end vector store. In a recent survey paper on the evolution of RAG systems, the authors classify such systems into three categories: Naive, Advanced, and Modular [1]. Within the Advanced category, post-retrieval optimization techniques, such as summarizing and re-ranking the retrieved documents, are identified as key improvements over the naive approach.
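
The difference between naive retrieval and a post-retrieval re-ranking step can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions: the three-dimensional vectors are made up, `naive_retrieve` and `rerank` are hypothetical helpers, and the word-overlap score merely stands in for a real re-ranking model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def naive_retrieve(query_vec, store, k=3):
    """Naive retrieval: return the k chunk texts most similar to the query vector.

    `store` is a list of (text, vector) pairs standing in for a vector store.
    """
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def rerank(query: str, candidates: list[str], top_n: int = 2) -> list[str]:
    """Post-retrieval step: reorder candidates with a second scoring function.

    Word overlap is a toy stand-in (an assumption) for a learned re-ranker.
    """
    query_words = set(query.lower().split())

    def overlap(text: str) -> int:
        return len(query_words & set(text.lower().split()))

    return sorted(candidates, key=overlap, reverse=True)[:top_n]


# Made-up chunks and embeddings, for illustration only.
store = [
    ("cats sleep most of the day", [0.9, 0.1, 0.0]),
    ("dogs enjoy long walks", [0.8, 0.2, 0.1]),
    ("stock markets fell today", [0.0, 0.1, 0.9]),
]

candidates = naive_retrieve([1.0, 0.0, 0.0], store, k=2)
best = rerank("how long do dogs walk", candidates, top_n=1)
print(candidates)
print(best)
```

In this sketch the naive retriever ranks purely by vector similarity, while the re-ranker gets a second look at the shortlisted chunks and can promote one that the first pass placed lower.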

In this article, we will look at how a naive retriever and two advanced retrievers influence RAG behavior. To characterize their influence, we will visualize the document vector space, along with the related documents, in 2-D using the visualization library renumics-spotlight. This library offers powerful features for visualizing the intricacies of document embeddings, yet it is easy to use. For our LLM, we will use TinyLlama 1.1B Chat, a compact model whose accuracy does not drop in proportion to its size [2]. This makes it ideal for rapid experimentation.

Disclaimer: I don't have any affiliation with Renumics or its creators. This article provides an unbiased view of the library's usage based on my personal experience, with the intention of making that knowledge widely available.

Table of Contents
1.0 Environment and Key Components
2.0 Design and Implementation
    2.1 Module LoadVectorize
    2.2 The main Module
3.0 Knobs on Spotlight UI
4.0 Comparison of Retrievers
5.0 Closing Remarks
