Glossary
Latency
The delay between when a caller finishes speaking and when the AI voice agent begins its response.
In voice agent conversations, latency is critical to perceived naturalness: long delays create awkward pauses that signal to the caller that they are speaking with a machine.
Latency Benchmarks
- Excellent: Under 500ms — feels instantaneous, similar to human response.
- Good: 500-800ms — noticeable but acceptable pause, still feels natural.
- Poor: Over 800ms — clearly artificial; caller often interrupts or repeats.
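As a rough illustration, the tiers above can be expressed as a small helper for labeling measured response times. The function name and the lowercase labels are hypothetical, and treating exactly 500ms and 800ms as "good" is an assumption about how the boundaries are meant to be read.

```python
def classify_latency(latency_ms: float) -> str:
    """Map a measured response latency in milliseconds to a benchmark tier."""
    if latency_ms < 0:
        raise ValueError("latency cannot be negative")
    if latency_ms < 500:
        return "excellent"  # under 500ms
    if latency_ms <= 800:
        return "good"       # 500-800ms (boundary handling is an assumption)
    return "poor"           # over 800ms


for sample_ms in (320, 650, 1200):
    print(sample_ms, classify_latency(sample_ms))
```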
Where Latency Comes From
- Network latency: Audio transmission to cloud servers (typically 50-200ms).
- ASR processing: Speech-to-Text model inference (100-300ms).
- NLU and LLM: Intent detection and response generation (300-800ms).
- TTS synthesis: Text-to-Speech audio generation (100-200ms).
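To see how these stages add up, the sketch below sums the ranges listed above under the simplifying assumption that each stage runs strictly after the previous one finishes. The dictionary keys and function name are illustrative only. Under that assumption even the best case lands above 500ms, which is why overlapping the stages matters.

```python
# Per-stage (min_ms, max_ms) ranges taken from the list above.
STAGES = {
    "network": (50, 200),
    "asr": (100, 300),
    "nlu_llm": (300, 800),
    "tts": (100, 200),
}


def sequential_budget(stages):
    """Total latency range if every stage waits for the previous one to finish."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low, high = sequential_budget(STAGES)
print(f"sequential total: {low}-{high} ms")
```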
Optimizing Latency
- Deploy models closer to callers (regional cloud servers, edge computing).
- Use faster, smaller models (distilled LLMs) instead of large flagship models.
- Parallelize: start TTS while the LLM is still generating text.
- Use streaming APIs instead of request-response round trips.
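The parallelization and streaming points above can be sketched with a toy pipeline. Everything here is simulated: `fake_llm` and `fake_tts` are hypothetical stand-ins with made-up delays, not real model or vendor APIs. The point is only the shape of the flow: each finished sentence goes to TTS as soon as it is complete, so the first audio can be synthesized before the LLM has produced the full reply.

```python
import asyncio


async def fake_llm(queue, events):
    """Hypothetical streaming LLM: emits one token at a time."""
    for token in ["Sure,", "I", "can", "help.", "What", "time", "works?"]:
        await asyncio.sleep(0.01)  # simulated per-token generation delay
        await queue.put(token)
    events.append("llm_done")
    await queue.put(None)  # sentinel: no more tokens


async def fake_tts(sentence, events):
    """Hypothetical TTS call: records when a sentence has been synthesized."""
    await asyncio.sleep(0.01)  # simulated synthesis delay
    events.append(f"tts:{sentence}")


async def speak_as_generated(queue, events):
    """Send each finished sentence to TTS without waiting for the full reply."""
    buffer = []
    while (token := await queue.get()) is not None:
        buffer.append(token)
        if token.endswith((".", "?", "!")):
            await fake_tts(" ".join(buffer), events)
            buffer.clear()
    if buffer:  # flush trailing text that lacked end punctuation
        await fake_tts(" ".join(buffer), events)


async def main():
    queue, events = asyncio.Queue(), []
    # Producer and consumer run concurrently on the same event loop.
    await asyncio.gather(fake_llm(queue, events), speak_as_generated(queue, events))
    return events


events = asyncio.run(main())
print(events)
```

In this simulation the first `tts:` event is recorded before `llm_done`, meaning synthesis of the opening sentence overlaps with generation of the rest. A request-response design would instead wait for the whole reply before starting TTS.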
Workforce Wave Latency
Workforce Wave targets sub-800ms end-to-end latency through regional inference, streaming APIs, and optimized model serving. Most calls feel indistinguishable from calls with human agents.