RAG

📚 Resources: https://python.langchain.com/docs/modules/data_connection/
Large language models (LLMs) know a lot, but just nothing about you (unless you're famous, in which case please subscribe, except if you're fbi.gov/wanted kind of famous). That's because they are trained on public datasets, and your private information is (hopefully) not public. We are talking about sensitive data here: bank statements, personal journals, browser history, etc. Call me paranoid, but I don't want to send any of these over the internet to hosted LLMs in the cloud, even if I paid a hefty amount to keep them good confidants. I want to run LLMs locally.

To make an LLM relevant to you, your intuition might be to fine-tune it with your data, but:

- Training an LLM is expensive.
- Because training is so costly, it's hard to keep an LLM updated with the latest information.
- Observability is lacking: when you ask an LLM a question, it's not obvious how it arrived at its answer.

There's a different approach: Retrieval-Augmented Generation (RAG). Instead of asking the LLM to generate an answer immediately, frameworks like LlamaIndex:

- retrieve information from your data sources first,
- add it to your question as context, and
- ask the LLM to answer based on the enriched prompt.

(A minimal sketch of this loop appears at the end of this section.) RAG overcomes all three weaknesses of the fine-tuning approach:

- There's no training involved, so it's cheap.
- Data is fetched only when you ask for it, so it's always up to date.
- The framework can show you the retrieved documents, so it's more trustworthy.

RAG imposes few restrictions on how you use LLMs. You can still use them for auto-complete, chatbots, semi-autonomous agents, and more. RAG only makes LLMs more relevant to you.

Now that you are able to build and run RAG applications, do you really think they are ready for production? How safe is your RAG application? Is it suffering from hallucinations, and how do you know and quantify whether the pipeline is hallucinating? (One way to measure this is sketched below.) How do you protect your RAG applications from hackers and toxic users? Are there ways you can improve the quality of your RAG pipelines?

Function calling deserves an honorable mention here, though it is way out of scope for an article on RAG. In RAG, the act of retrieval must be executed somehow; thus, it is a function. Functions do not have to be pure (in the mathematical sense); that is, they can have side effects (and, in the programming world, they often do). Therefore, functions are just tools that an LLM could wield in its hands. Metaphorically, we call those LLMs with tool-using abilities "agents".
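To make the retrieve-augment-generate loop concrete, here is a minimal sketch using LlamaIndex with everything running locally: a local LLM served by Ollama and a local embedding model. It assumes the llama-index, llama-index-llms-ollama, and llama-index-embeddings-huggingface packages are installed, an Ollama server is running on your machine, and your private files sit in a `./data` directory; the model names, the directory, and the question are illustrative.

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# Keep everything local: a local LLM for generation
# and a local embedding model for retrieval.
Settings.llm = Ollama(model="llama3", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# 1. Load your private documents (bank statements, journals, ...).
documents = SimpleDirectoryReader("./data").load_data()

# 2. Index them so relevant chunks can be retrieved per question.
index = VectorStoreIndex.from_documents(documents)

# 3. Retrieve, enrich the prompt, and generate, all in one call.
query_engine = index.as_query_engine()
response = query_engine.query("How much did I spend on rent last month?")
print(response)

# Observability: inspect exactly which chunks were retrieved.
for node in response.source_nodes:
    print(node.score, node.node.get_content()[:100])
```

Because the retrieved chunks come back attached to the response, you can audit exactly what the model saw before answering, which is the trustworthiness point from the list above.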
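As for hallucinations, one common way to quantify them is to have a judge LLM check whether each answer is actually supported by the retrieved context. Here is a sketch using LlamaIndex's FaithfulnessEvaluator, reusing the `query_engine` and `Settings` from the previous snippet; keep in mind the verdict is only as reliable as the judge model.

```python
from llama_index.core import Settings
from llama_index.core.evaluation import FaithfulnessEvaluator

# Use the (local) LLM as a judge of groundedness.
evaluator = FaithfulnessEvaluator(llm=Settings.llm)

# `query_engine` is the one built in the previous snippet.
question = "How much did I spend on rent last month?"
response = query_engine.query(question)

# Passes only if the answer is supported by the retrieved context.
result = evaluator.evaluate_response(query=question, response=response)
print("passing:", result.passing)
print("feedback:", result.feedback)
```

Run this over a batch of representative questions, and the passing rate becomes a rough hallucination metric for the whole pipeline.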
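Finally, to ground the function-calling remark: a tool is just a function handed to the LLM, and an agent is an LLM that decides when to call it. Below is a sketch using LlamaIndex's FunctionTool and the classic ReActAgent interface (newer LlamaIndex releases may expose agents differently); the `add_expense` function is a made-up toy with a pretend side effect.

```python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import FunctionTool
from llama_index.llms.ollama import Ollama

def add_expense(amount: float, category: str) -> str:
    """Record an expense (a toy, side-effecting function)."""
    # A real implementation would write to a database here.
    return f"Recorded ${amount:.2f} under '{category}'."

# Wrap the plain Python function as a tool the LLM can wield.
expense_tool = FunctionTool.from_defaults(fn=add_expense)

# An agent is an LLM that decides when (and how) to call its tools.
llm = Ollama(model="llama3", request_timeout=120.0)
agent = ReActAgent.from_tools([expense_tool], llm=llm, verbose=True)
print(agent.chat("Log a $42 grocery expense for me."))
```

Retrieval itself can be wrapped the same way (LlamaIndex ships a QueryEngineTool for exactly this), which is the sense in which retrieval "is a function" above.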