RAG (Retrieval-Augmented Generation)

Retrieval-Augmented Generation (RAG) is a technique that improves LLM accuracy and reduces hallucinations by retrieving relevant, verified information from external sources (documents, databases, knowledge bases) and supplying it to the LLM before it generates a response.

How RAG Works

Retrieval: The user's query is converted to an embedding and matched against a knowledge base to find the most relevant documents.
Augmentation: The retrieved documents are injected into the LLM prompt as context.
Generation: The LLM generates a response grounded in the retrieved facts rather than relying only on what it absorbed during training.
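
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the bag-of-words `embed` function stands in for a learned embedding model, `KNOWLEDGE_BASE` is a made-up in-memory list rather than a vector store, and `generate` is a placeholder where a real LLM call would go.

```python
import math
import re
from collections import Counter

# Hypothetical in-memory knowledge base; a real system would use a vector store.
KNOWLEDGE_BASE = [
    "Clinic hours are Monday to Friday, 8am to 5pm.",
    "Appointments can be rescheduled up to 24 hours in advance.",
    "Payment plans are available for balances over 200 dollars.",
]

def embed(text):
    """Toy embedding: word counts (an assumption standing in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=2):
    """Retrieval: rank documents by similarity to the query and keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, context_docs):
    """Augmentation: inject the retrieved documents into the prompt as context."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

def generate(prompt):
    """Generation: placeholder for an LLM call; echoes the prompt so this runs offline."""
    return prompt

question = "What are the clinic hours?"
prompt = build_prompt(question, retrieve(question, KNOWLEDGE_BASE, k=1))
answer = generate(prompt)
```

Instructing the model to answer only from the supplied context, and to say so when the context lacks the answer, is one common way to keep the generation step tied to what was retrieved.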

RAG vs. Fine-Tuning

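Fine-tuning: Retrains the LLM on domain-specific data; expensive and slow; knowledge is frozen at training time until the model is retrained.
RAG: Retrieves current knowledge at query time; updating the knowledge base is typically faster and cheaper than retraining, and answers are as up to date as the knowledge base itself.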

RAG in Voice Agents

A voice agent for a healthcare provider uses RAG to:

Retrieve the patient's appointment details from the EHR before answering "When is my appointment?"
Fetch clinical guidelines before answering medication questions.
Look up the patient's balance in the billing system before discussing payment options.

The AI bases its responses on data retrieved from these systems rather than on guesses.
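
One way to picture this is a small routing layer that decides which system to query before the prompt is built. The sketch below is hypothetical throughout: the dictionaries stand in for an EHR and a billing system, the patient ID and values are invented sample data, and keyword matching stands in for whatever intent detection a real voice agent would use.

```python
# Hypothetical stand-ins for the EHR and billing system; real integrations
# would call those systems' APIs with authentication and access controls.
EHR_APPOINTMENTS = {"patient-001": "Tuesday at 10:30 with Dr. Lee"}
BILLING_BALANCES = {"patient-001": "45.00 USD"}

def fetch_appointment(patient_id):
    return EHR_APPOINTMENTS.get(patient_id)

def fetch_balance(patient_id):
    return BILLING_BALANCES.get(patient_id)

# Keyword routing is an assumption made for brevity, not a recommended design.
ROUTES = [
    (("appointment",), "appointment", fetch_appointment),
    (("balance", "payment", "bill"), "balance", fetch_balance),
]

def gather_facts(utterance, patient_id):
    """Look up only the records the caller's question appears to need."""
    text = utterance.lower()
    facts = {}
    for keywords, label, fetch in ROUTES:
        if any(k in text for k in keywords):
            value = fetch(patient_id)
            if value is not None:
                facts[label] = value
    return facts

def build_grounded_prompt(utterance, facts):
    """Return a prompt containing the retrieved facts, or None if nothing was found."""
    if not facts:
        return None  # caller can ask a clarifying question or hand off instead of guessing
    lines = "\n".join(f"{label}: {value}" for label, value in facts.items())
    return f"Use only these records to answer.\n{lines}\n\nCaller said: {utterance}"

facts = gather_facts("When is my appointment?", "patient-001")
prompt = build_grounded_prompt("When is my appointment?", facts)
```

Returning `None` when no record is found gives the agent an explicit signal to escalate or ask a follow-up question rather than produce an unsupported answer.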

Knowledge Base Design

RAG quality depends on knowledge base quality. Best practices:

Keep documents current and accurate; remove outdated information.
Use consistent formatting and metadata (doc type, source, version date).
Index by topic for fast retrieval; poor indexing causes relevant docs to be missed.
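
These practices can be made concrete with a small document record. The following is a sketch under stated assumptions: the `Doc` fields mirror the metadata listed above (doc type, source, version date), while the `topic` field, the helper names, and the age threshold are illustrative choices rather than a standard schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Doc:
    doc_id: str
    topic: str
    doc_type: str
    source: str
    version_date: date
    text: str

def drop_stale(docs, today, max_age_days):
    """Keep only documents whose version date is within the allowed age."""
    return [d for d in docs if (today - d.version_date).days <= max_age_days]

def index_by_topic(docs):
    """Group documents by topic, newest version first within each topic."""
    index = defaultdict(list)
    for d in docs:
        index[d.topic].append(d)
    for topic_docs in index.values():
        topic_docs.sort(key=lambda d: d.version_date, reverse=True)
    return dict(index)

# Made-up sample records for illustration only.
docs = [
    Doc("d1", "billing", "policy", "finance-team", date(2024, 1, 10), "Old payment policy."),
    Doc("d2", "billing", "policy", "finance-team", date(2025, 1, 10), "Current payment policy."),
    Doc("d3", "scheduling", "faq", "front-desk", date(2025, 2, 1), "How to reschedule."),
]
fresh = drop_stale(docs, today=date(2025, 3, 1), max_age_days=365)
index = index_by_topic(fresh)
```

Filtering by version date before indexing keeps outdated documents out of retrieval entirely, so the model never sees them as context.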
