• Information retrieval system: responsible for locating information within large datasets, acting as a specialized librarian that selects the data most relevant to a user’s query.
  • Text generation model: an LLM that leverages its natural-language understanding and generative capabilities. It receives as input not only the user’s query but also the information retrieved by the information retrieval system.
  • Embedding models: encode texts into numerical vectors that capture both the semantics of and the relationships between words and phrases. They make it possible to compute the similarity between different text fragments and form the basis of the information retrieval system.
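The similarity computation that embedding models enable is typically cosine similarity between vectors. The sketch below uses tiny hand-made 4-dimensional vectors purely for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: close to 1.0 for vectors pointing in the same
    # direction (similar meaning), near 0.0 for unrelated ones.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings (illustrative values, not from a real model)
query = np.array([0.9, 0.1, 0.0, 0.2])
doc_a = np.array([0.8, 0.2, 0.1, 0.3])  # semantically close to the query
doc_b = np.array([0.0, 0.9, 0.8, 0.1])  # unrelated content

# The query is ranked closer to doc_a than to doc_b
print(cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b))  # True
```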
  1. A user poses a question using everyday language. Initially, this inquiry isn’t sent straight to the large language model (LLM) but is instead processed through a search within a dedicated knowledge base that the system can access. This knowledge base is designed to store and provide the necessary information for query resolution.
  2. To facilitate the search, the system calculates the embedding for the user’s question. The knowledge base content has been pre-processed, with each document converted into an embedding for efficient indexing. By comparing the question embedding with those of the knowledge base documents, the system can identify and retrieve the documents most relevant to fulfilling the user’s informational needs.
  3. Following retrieval, the question alongside the selected text fragment(s) (relevant documents) from the knowledge base is forwarded to the LLM. This combined input allows the LLM to generate an informed response, leveraging both the original user query and the contextual information gleaned from the knowledge base documents.
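The three steps above can be sketched end to end. To keep the example self-contained, the `embed` function below is a stand-in bag-of-words vectorizer (a real system would call a trained embedding model), and the final prompt is returned rather than sent to an actual LLM; the knowledge-base texts are invented for illustration.

```python
import re
import numpy as np

knowledge_base = [
    "The warranty covers manufacturing defects for two years.",
    "Returns are accepted within 30 days with the original receipt.",
    "Our support line is open Monday to Friday, 9am to 5pm.",
]

# Vocabulary built from the knowledge base; used by the stand-in embedder
VOCAB = sorted({w for doc in knowledge_base for w in re.findall(r"[a-z]+", doc.lower())})

def embed(text):
    # Stand-in embedder: normalized word-count vector over VOCAB.
    # A real system would use a trained embedding model instead.
    words = re.findall(r"[a-z]+", text.lower())
    vec = np.array([float(words.count(w)) for w in VOCAB])
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Step 2 (pre-processing): each document is converted to an embedding once
doc_vectors = [embed(doc) for doc in knowledge_base]

def retrieve(question, k=1):
    # Step 2 (query time): rank documents by similarity to the question embedding
    q = embed(question)
    scores = [float(np.dot(q, d)) for d in doc_vectors]
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    return [knowledge_base[i] for i in top]

def build_prompt(question):
    # Step 3: combine the question with the retrieved fragments into one prompt
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How long is the warranty?"))
```

In a real deployment, `build_prompt`’s output would be sent to the LLM, which generates the final answer grounded in the retrieved context.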
  • It reduces hallucinations: By grounding responses in factual data, RAG reduces the chances of generating incorrect or fabricated information.
  • Updated LLM knowledge: Without the need for fine-tuning, updating the information contained in the knowledge base is sufficient for the model to access new and verified information.
  • Ideal for applications where there is a multitude of available but unstructured or unlabeled information.
  • Improves LLM precision when working with documents it has never seen before and in highly specific domains. This includes private or internal company documentation, for example.
  • Effectiveness of the information retrieval engine: RAG’s effectiveness heavily relies on the quality of the information retrieval engine. This often involves preprocessing documents, appropriate chunking, selecting the right embedding model, and choosing suitable text similarity metrics.
  • Context length: LLMs can only process a limited number of tokens at once, which constrains the system when relevant information is spread across large text fragments in the knowledge source.
  • Dependency on knowledge base data: As with any process, it’s crucial that data is available and of quality. The relevance of responses will largely depend on this point.
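Chunking, mentioned above as part of preparing the retrieval engine, also mitigates the context-length limitation: smaller fragments let the system pass only the relevant pieces to the LLM. A common simple strategy is fixed-size windows with overlap, so a sentence cut at a boundary still appears whole in a neighbouring chunk. The sizes below are illustrative; production systems tune them (and often split on sentence or token boundaries instead of characters).

```python
def chunk_text(text, chunk_size=200, overlap=50):
    # Split a document into overlapping character windows.
    # The overlap keeps boundary sentences intact in at least one chunk.
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc = "word " * 100  # 500-character stand-in document
pieces = chunk_text(doc, chunk_size=200, overlap=50)
print(len(pieces))  # 4 chunks; each overlaps the previous one by 50 characters
```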