Retrieval-Augmented Generation (RAG)
Also: RAG
A technique that feeds an LLM relevant external documents at query time so its answers are grounded in your data, not just its training data.
Retrieval-Augmented Generation (RAG) pairs a language model with a retrieval step. When a question comes in, the system first fetches the most relevant chunks of your knowledge base, usually from a vector database, and passes them to the model as context. The answer is then grounded in current, specific data rather than only in what the model learned during training.
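The retrieve-then-generate flow can be illustrated with a minimal Python sketch. It is an assumption-laden toy, not a production design: the `embed` function uses bag-of-words counts as a stand-in for a real embedding model, an in-memory list stands in for a vector database, and the names `embed`, `cosine`, `retrieve`, `build_prompt`, and `CHUNKS` are all hypothetical. The final call to a language model is left out; the sketch stops at the prompt that would be sent.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (stand-in for a vector store lookup)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, context_chunks):
    """Assemble the retrieved chunks and the question into one prompt for the model."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Hypothetical knowledge chunks, invented for illustration only.
CHUNKS = [
    "Refunds are accepted within 30 days of purchase.",
    "Our office is closed on public holidays.",
    "Shipping takes 3 to 5 business days.",
]

if __name__ == "__main__":
    question = "Are refunds accepted after purchase?"
    top = retrieve(question, CHUNKS, k=1)
    print(build_prompt(question, top))
```

In a real system, `embed` would call an embedding model and `retrieve` would query a vector store, but the shape stays the same: embed the question, rank stored chunks by similarity, and place the top results in the prompt.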
RAG is the workhorse behind most "chat with your docs" products and many agent memory systems. It's also why embeddings, vector stores, and chunking strategies became core infrastructure for AI builders, and why so many tools exist in those categories.