RAG (Retrieval-Augmented Generation)
A technique that grounds LLM responses in verified external knowledge by retrieving relevant documents before generating a reply.
Retrieval-Augmented Generation (RAG) is a technique that improves LLM accuracy and reduces hallucinations by retrieving relevant, verified information from external sources (documents, databases, knowledge bases) before the LLM generates a response.
How RAG Works
- Retrieval: The user's query is converted into an embedding and matched against a knowledge base to find the most relevant documents.
- Augmentation: The retrieved documents are injected into the LLM prompt as context.
- Generation: The LLM generates a response grounded in the retrieved facts rather than relying only on what it memorized during training.
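The three steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, the documents are made-up examples, and `generate` is a placeholder rather than a call to any real LLM API.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical knowledge base content, for illustration only.
DOCS = [
    "Clinic hours are 8am to 5pm, Monday through Friday.",
    "Appointments can be rescheduled up to 24 hours in advance.",
    "Payment plans are available for balances over 200 dollars.",
]

def embed(text: str) -> Counter:
    """Stand-in embedding: a bag-of-words term-count vector.
    A real system would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Retrieval: rank documents by similarity to the query embedding, keep the top k."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def augment(query: str, context: List[str]) -> str:
    """Augmentation: inject the retrieved documents into the prompt as context."""
    ctx = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{ctx}\n\nQuestion: {query}"
    )

def generate(prompt: str) -> str:
    """Generation: placeholder for an LLM call (hypothetical; no real API is used)."""
    return f"[LLM response to a {len(prompt)}-character prompt]"

query = "How far in advance can appointments be rescheduled?"
context = retrieve(query, DOCS, k=1)
prompt = augment(query, context)
print(prompt)
print(generate(prompt))
```

The instruction to answer only from the context is what ties generation to the retrieved facts; without it, the model may still fall back on its training data.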
RAG vs. Fine-Tuning
- Fine-tuning: Further trains the LLM on domain-specific data; costly and slow, and the model's knowledge stays fixed until it is retrained.
- RAG: Retrieves the latest knowledge at query time; cheaper and faster to update, and as current as the knowledge base it draws on.
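The practical difference is where an update lands. With RAG, changing what the system knows means changing a document in the store, and the next query sees it. The sketch below assumes a hypothetical in-memory `KnowledgeStore` with simple keyword-overlap search in place of embeddings; the copay figures are invented for illustration.

```python
import re
from typing import Dict, Optional, Set

def tokens(text: str) -> Set[str]:
    """Lowercased word set used for simple keyword-overlap matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

class KnowledgeStore:
    """Hypothetical in-memory document store keyed by document ID."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def upsert(self, doc_id: str, text: str) -> None:
        # Overwriting an ID replaces the old version; no model is retrained.
        self.docs[doc_id] = text

    def search(self, query: str) -> Optional[str]:
        # Return the document sharing the most words with the query (None if the store is empty).
        q = tokens(query)
        return max(self.docs.values(), key=lambda d: len(q & tokens(d)), default=None)

store = KnowledgeStore()
store.upsert("clinic-hours", "Clinic hours are 8am to 5pm.")
store.upsert("copay-policy", "The specialist copay is 40 dollars.")
before = store.search("What is the specialist copay?")

# The policy changes: update the knowledge base, not the model.
store.upsert("copay-policy", "Starting this plan year, the specialist copay is 50 dollars.")
after = store.search("What is the specialist copay?")
print(before)
print(after)
```

A fine-tuned model would keep answering with the old figure until it was retrained on the new policy.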
RAG in Voice Agents
For example, a voice agent for a healthcare provider can use RAG to:
- Retrieve the patient's appointment details from the EHR before answering "When is my appointment?"
- Fetch clinical guidelines before answering medication questions.
- Look up the patient's balance in the billing system before discussing payment options.
Each answer is grounded in verified data retrieved for that call, which greatly reduces the risk of the agent inventing dates, instructions, or amounts.
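In a voice agent, retrieval often means routing the caller's question to the right system and fetching a specific record. The sketch below is hypothetical throughout: plain dictionaries stand in for the EHR, guideline library, and billing system, the keyword routing stands in for a real intent classifier, and the patient data is invented.

```python
from typing import Optional

# Hypothetical stand-ins for the EHR, clinical guideline library, and billing system.
# A real agent would query those systems through their own APIs.
EHR_APPOINTMENTS = {"patient-123": "Tuesday at 10:30 AM with Dr. Patel"}
CLINICAL_GUIDELINES = {"ibuprofen": "<approved guideline excerpt for ibuprofen>"}
BILLING_BALANCES = {"patient-123": "$180.00 outstanding"}

def retrieve_facts(patient_id: str, question: str) -> Optional[str]:
    """Route the caller's question to one data source and fetch the matching record."""
    q = question.lower()
    if "appointment" in q:
        return EHR_APPOINTMENTS.get(patient_id)
    if "balance" in q or "payment" in q:
        return BILLING_BALANCES.get(patient_id)
    for drug, guideline in CLINICAL_GUIDELINES.items():
        if drug in q:
            return guideline
    return None

def build_prompt(patient_id: str, question: str) -> str:
    """Inject the retrieved record into the prompt, or tell the model not to guess."""
    facts = retrieve_facts(patient_id, question)
    if facts is None:
        facts = "No verified record found. Say you cannot confirm this; do not guess."
    return (
        f"Verified data: {facts}\n"
        f"Caller asked: {question}\n"
        "Answer only from the verified data."
    )

print(build_prompt("patient-123", "When is my appointment?"))
```

The fallback branch matters as much as the lookup: when no record exists, the prompt tells the model to say so instead of improvising an answer.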
Knowledge Base Design
RAG quality depends on knowledge base quality. Best practices:
- Keep documents current and accurate; remove outdated information.
- Use consistent formatting and metadata (doc type, source, version date).
- Index documents by topic so relevant content is easy to retrieve; poor indexing causes relevant documents to be missed.
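These practices can be applied when the index is built. The sketch below assumes each document carries the metadata listed above (doc type, source, version date) plus two hypothetical fields, `doc_id` and `topic`. It drops superseded versions so outdated text never reaches the prompt, then groups what remains by topic. The sample documents are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List

@dataclass(frozen=True)
class Doc:
    doc_id: str          # hypothetical: stable ID shared by every version of a document
    doc_type: str        # metadata: e.g. "policy" or "faq"
    source: str          # metadata: owning system or team
    version_date: date   # metadata: when this version was published
    topic: str           # hypothetical: topic label used for indexing
    text: str

def build_topic_index(docs: Iterable[Doc]) -> Dict[str, List[Doc]]:
    """Drop superseded versions, then group the current documents by topic."""
    latest: Dict[str, Doc] = {}
    for d in docs:
        current = latest.get(d.doc_id)
        if current is None or d.version_date > current.version_date:
            latest[d.doc_id] = d
    index: Dict[str, List[Doc]] = defaultdict(list)
    for d in latest.values():
        index[d.topic].append(d)
    return dict(index)

docs = [
    Doc("refund-policy", "policy", "billing-team", date(2023, 1, 10), "billing", "Old refund terms."),
    Doc("refund-policy", "policy", "billing-team", date(2024, 6, 1), "billing", "Current refund terms."),
    Doc("parking-faq", "faq", "front-desk", date(2024, 2, 5), "visits", "Parking is in Lot B."),
]
index = build_topic_index(docs)
print({topic: [d.text for d in ds] for topic, ds in index.items()})
```

Keeping the version date as metadata is what makes this filtering possible; without it, old and new versions of a policy look equally valid to the retriever.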