Latency

Latency in voice agent conversations is the time delay between when a caller finishes speaking and when the AI agent begins its response. It is critical to how natural the conversation feels: long delays create awkward pauses that signal to the caller that they are speaking with a machine.

Latency Benchmarks

Excellent: Under 500ms. Feels instantaneous, comparable to human response time.
Good: 500-800ms. A noticeable but acceptable pause that still feels natural.
Poor: Over 800ms. Clearly artificial; the caller often interrupts or repeats themselves.
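The tiers above can be expressed as a small classification function. This is a minimal sketch. The tier names and thresholds come from the list, but the boundary handling is an assumption: the list only says "Under 500ms" and "Over 800ms", so exactly 500ms and exactly 800ms are treated here as "Good". The function name `classify_latency` is hypothetical.

```python
def classify_latency(latency_ms: float) -> str:
    """Map an end-to-end response latency in milliseconds to a benchmark tier.

    Assumption: exactly 500ms and exactly 800ms both count as "Good".
    """
    if latency_ms < 0:
        raise ValueError("latency cannot be negative")
    if latency_ms < 500:
        return "Excellent"
    if latency_ms <= 800:
        return "Good"
    return "Poor"


for sample in (320, 650, 800, 1100):
    print(sample, classify_latency(sample))
```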

Where Latency Comes From

Network latency: Audio transmission to cloud servers (typically 50-200ms).
ASR processing: Speech-to-Text model inference (100-300ms).
NLU and LLM: Intent detection and response generation (300-800ms).
TTS synthesis: Text-to-Speech audio generation (100-200ms).
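When the four stages run strictly one after another, their latencies add up. The sketch below sums the ranges listed above, counting each range once as given, to get a best case of 550ms and a worst case of 1500ms. Even the best case lands in the "Good" tier rather than "Excellent", and the worst case is well into "Poor". This is why the optimizations in the next section focus on overlapping stages and shortening each one. The no-overlap assumption is deliberate for illustration, and the dictionary keys are hypothetical labels.

```python
# Per-stage latency ranges in milliseconds, taken from the list above.
STAGES_MS: dict[str, tuple[int, int]] = {
    "network": (50, 200),
    "asr": (100, 300),
    "nlu_llm": (300, 800),
    "tts": (100, 200),
}


def sequential_budget(stages: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Best- and worst-case total if each stage waits for the previous one to finish."""
    best = sum(low for low, _ in stages.values())
    worst = sum(high for _, high in stages.values())
    return best, worst


best, worst = sequential_budget(STAGES_MS)
print(f"sequential total: {best}-{worst}ms")  # sequential total: 550-1500ms
```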

Optimizing Latency

Deploy models closer to callers (regional cloud servers, edge computing).
Use faster, smaller models (distilled LLMs) instead of large flagship models.
Parallelize: start TTS while the LLM is still generating text.
Use streaming APIs instead of request-response round trips.
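A simple timing model shows why overlapping TTS with generation helps. The sketch compares time to first audio in two cases. It is built on stated assumptions and does not measure any real system. The LLM is assumed to emit its first token after a fixed delay and then produce tokens at a constant rate. TTS is assumed to cost a fixed per-request overhead plus a per-token amount. In the streaming case, each sentence is assumed to go to TTS as soon as the LLM finishes it. The class `PipelineModel`, its fields, and all numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PipelineModel:
    """Hypothetical timing model for one agent reply; every field is an assumption."""

    llm_first_token_ms: float  # delay before the LLM emits its first token
    llm_per_token_ms: float    # time to generate each additional token
    tts_overhead_ms: float     # fixed cost per TTS request
    tts_per_token_ms: float    # synthesis cost, assumed proportional to text length


def _llm_done_ms(m: PipelineModel, tokens: int) -> float:
    """Time at which the LLM has produced `tokens` tokens."""
    if tokens < 1:
        raise ValueError("need at least one token")
    return m.llm_first_token_ms + (tokens - 1) * m.llm_per_token_ms


def first_audio_sequential(m: PipelineModel, sentence_tokens: list[int]) -> float:
    """Wait for the whole reply, then synthesize it in a single TTS request."""
    total = sum(sentence_tokens)
    return _llm_done_ms(m, total) + m.tts_overhead_ms + total * m.tts_per_token_ms


def first_audio_streaming(m: PipelineModel, sentence_tokens: list[int]) -> float:
    """Send each finished sentence to TTS; audio starts once the first one is synthesized."""
    if not sentence_tokens:
        raise ValueError("reply has no sentences")
    first = sentence_tokens[0]
    return _llm_done_ms(m, first) + m.tts_overhead_ms + first * m.tts_per_token_ms


model = PipelineModel(llm_first_token_ms=250, llm_per_token_ms=20,
                      tts_overhead_ms=80, tts_per_token_ms=3)
reply = [12, 18, 15]  # tokens per sentence in a hypothetical three-sentence reply
print("sequential:", first_audio_sequential(model, reply), "ms")
print("streaming: ", first_audio_streaming(model, reply), "ms")
```

With these illustrative numbers, the sequential pipeline keeps the caller waiting 1345ms before any audio plays. Sentence-level streaming starts audio after 586ms, because only the first sentence has to be generated and synthesized. TTS engines that accept partial text could start even earlier, and this model does not capture that.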

Workforce Wave Latency

Workforce Wave targets sub-800ms end-to-end latency through regional inference, streaming APIs, and optimized model serving. Most calls feel indistinguishable from conversations with human agents.
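One way to check a deployment against a sub-800ms target is to record two timestamps for each conversational turn: when the caller stopped speaking and when the agent's first audio started. The differences between them can then be summarized. The sketch below assumes both timestamps are in milliseconds on a shared clock. The turn data, the function names, and the choice of median plus share-under-target as summary statistics are illustrative assumptions, not Workforce Wave's monitoring method. Timestamps taken on the server side do not include network transit between the server and the caller, so they can understate the delay the caller actually hears.

```python
import statistics

TARGET_MS = 800  # the sub-800ms end-to-end target stated above


def turn_latencies_ms(turns: list[tuple[int, int]]) -> list[int]:
    """Latency per turn: agent's first audio minus the caller's end of speech."""
    latencies = []
    for caller_end_ms, agent_audio_ms in turns:
        if agent_audio_ms < caller_end_ms:
            raise ValueError("agent audio started before the caller finished speaking")
        latencies.append(agent_audio_ms - caller_end_ms)
    return latencies


def summarize(latencies: list[int], target_ms: int = TARGET_MS) -> dict[str, float]:
    """Median latency and the fraction of turns that came in under the target."""
    if not latencies:
        raise ValueError("no turns to summarize")
    under = sum(1 for lat in latencies if lat < target_ms)
    return {
        "median_ms": statistics.median(latencies),
        "share_under_target": under / len(latencies),
    }


# Hypothetical (caller_end_ms, agent_audio_ms) pairs from one call.
turns = [(1000, 1620), (5400, 6050), (9100, 10010), (14200, 14680), (20000, 20730)]
print(summarize(turn_latencies_ms(turns)))
```

A median alone can hide a slow tail. Tracking a high percentile as well, which `statistics.quantiles` can compute once enough turns are logged, shows how often callers hit the long pauses described in the "Poor" tier.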
