Interaction Mode 5 — Dual-Mode
One number. Humans and AI both call it.
The same phone number serves human callers with TTS conversation and AI orchestrators with structured JSON — detected in under 500ms, no second infrastructure layer required.
Detection via SIP headers + first-utterance analysis + pre-negotiated tokens.
Human Caller
AI Caller
{
  "status": "ok",
  "slot": "2026-04-18T10:00:00",
  "confirmed": true,
  "agent_id": "agt_7rx9..."
}
<500ms detection: caller type identified before the first response
What Breaks Without It
The naive solution is two of everything.
Serving both humans and machines without dual-mode means maintaining two separate systems — forever.
Problem
Two phone numbers, two agents, double the cost
The naive solution: one number for humans, one for AI callers. Now you manage two phone lines, two agent configs, two billing accounts. When either changes, both need updating.
Problem
Human callers get machine-mode responses
If your agent is in machine mode 100% of the time, a human calling at 3pm for a booking gets terse JSON-style responses with no warmth, no conversation — and hangs up.
Problem
AI callers get human-mode responses
If your agent stays in human mode, a LangChain agent calling for structured appointment data gets verbose conversational text it can't parse. The integration fails silently.
Detection Mechanism
Three layers. Under 500ms total.
Dual-mode detection is not a guess — it's a layered analysis that resolves before the agent delivers its first response.
SIP Header Inspection
~0ms
The SIP User-Agent header on the incoming call identifies most machine callers immediately. AI systems calling via API typically declare their user agent. This layer resolves ~80% of cases before the call even connects.
First-Utterance Pattern Analysis
~200ms
If SIP inspection is inconclusive, WFW analyzes the first utterance. Machine callers have characteristic speech patterns: precise syntax, immediate task declaration, absence of social greeting. The model classifies in real time.
Pre-Negotiated Token Matching
<100ms
API callers can include a pre-negotiated auth token in their first utterance, which skips first-utterance classification. This is the fast path for registered AI systems that need guaranteed <100ms machine-mode activation.
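The layers above can be pictured as a short cascade that stops at the first confident answer. The sketch below is illustrative only and is not WFW's implementation: the User-Agent markers, the token value, the task-verb list, and the name `detect_caller` are all assumptions, and the word-list check merely stands in for the real-time classifier.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration; none of these come from WFW.
MACHINE_UA_MARKERS = ("langchain", "n8n", "python-requests")
REGISTERED_TOKENS = {"tok_demo_123"}
TASK_VERBS = {"book", "confirm", "cancel", "query", "verify"}


@dataclass
class Detection:
    caller_type: str  # "human" or "machine"
    layer: str        # which layer made the decision


def detect_caller(user_agent: Optional[str], first_utterance: str) -> Detection:
    # Layer 1: SIP User-Agent header, available before any audio arrives.
    ua = (user_agent or "").lower()
    if any(marker in ua for marker in MACHINE_UA_MARKERS):
        return Detection("machine", "sip_header")

    # Fast path: a pre-negotiated token spoken in the first utterance.
    words = first_utterance.split()
    if any(word in REGISTERED_TOKENS for word in words):
        return Detection("machine", "token")

    # Fallback: toy stand-in for first-utterance classification.
    # An immediate task verb with no greeting is treated as a machine caller.
    first_word = words[0].strip(",.!?").lower() if words else ""
    if first_word in TASK_VERBS:
        return Detection("machine", "first_utterance")

    # Assumed default: when nothing matches, treat the caller as human.
    return Detection("human", "first_utterance")
```

Defaulting to human mode when no layer matches is a design assumption in this sketch: a person who receives JSON hangs up, whereas a machine caller can retry with a token.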
Response Formats
Same agent. Different response. Same call.
One phone number serves both — each caller type gets exactly what they need.
Human Mode
Caller is a person
- ✓ Conversational, natural TTS speech
- ✓ Empathy and warmth in tone
- ✓ Clarifying questions when ambiguous
- ✓ Hold music and human transfer logic
- ✓ DTMF fallback if voice quality drops
- ✓ Full appointment booking conversation
Machine Mode
Caller is an AI system
- ✓ Structured JSON: typed, parseable
- ✓ No conversational filler or pleasantries
- ✓ Machine-readable error codes with retry hints
- ✓ Task result in single response object
- ✓ Sub-500ms end-to-end session completion
- ✓ Webhook delivery of extractions on call.end
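To make the contrast between the two modes concrete, here is a minimal sketch of one booking result rendered both ways. The machine-mode fields mirror the sample response at the top of this page; the function name `render_response`, the spoken wording, and the `agt_example` identifier are hypothetical.

```python
import json


def render_response(caller_type: str, slot: str, confirmed: bool, agent_id: str) -> str:
    """Render the same booking result for a machine or a human caller."""
    if caller_type == "machine":
        # Machine mode: a single parseable object, no filler.
        return json.dumps({
            "status": "ok",
            "slot": slot,
            "confirmed": confirmed,
            "agent_id": agent_id,
        })
    # Human mode: text intended for TTS, with a follow-up question.
    state = "confirmed" if confirmed else "not confirmed yet"
    return (
        f"Thanks for calling. Your appointment for {slot} is {state}. "
        "Is there anything else I can help you with?"
    )


machine_reply = render_response("machine", "2026-04-18T10:00:00", True, "agt_example")
human_reply = render_response("human", "2026-04-18T10:00:00", True, "agt_example")
```

An orchestrator can pass `machine_reply` straight to `json.loads`, while `human_reply` would go to the speech synthesizer.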
Real Scenarios
When one number needs to serve both.
These are the architectures where dual-mode is not a nice-to-have — it's the only clean solution.
Your platform has human customers calling to change bookings AND n8n workflows calling to confirm appointments. One WFW agent, one phone number. Human gets a warm conversation; the workflow gets JSON. Same agent config — you maintain nothing twice.
Patients call the practice line. The EHR integration (an AI agent) also queries the same line to verify appointment data. Dual-mode means HIPAA-compliant handling for the human call and structured data for the EHR query — from one phone number.
Your enterprise deployed both a customer-facing IVR replacement AND an internal orchestration layer that queries call data. Single number, single agent. The internal AI gets JSON; the external customer gets conversation. Zero routing configuration.
API Surface
Configure dual-mode via the API.
Dual-mode is enabled per agent via the dual_mode flag in POST /v2/agents. Pre-negotiated tokens for fast-path machine detection are registered via the token management endpoints.
The caller_type field is included in every call.completed webhook payload — so your analytics always know whether a given call came from a human or an AI system.
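As a rough illustration of the analytics use described above, the sketch below tallies `caller_type` across a batch of webhook bodies. It assumes each body is plain JSON text; the function name `tally_caller_types`, the `"unknown"` bucket, and the sample call IDs are hypothetical, and signature verification and HTTP handling are left out.

```python
import json
from collections import Counter
from typing import Iterable


def tally_caller_types(raw_payloads: Iterable[str]) -> Counter:
    """Count webhook payloads by their caller_type field."""
    counts: Counter = Counter()
    for raw in raw_payloads:
        event = json.loads(raw)
        # Assumed fallback bucket for payloads that lack the field.
        counts[event.get("caller_type", "unknown")] += 1
    return counts


# Sample bodies shaped like the webhook payload; the IDs are made up.
sample = [
    '{"call_id": "call_a", "caller_type": "human", "duration_seconds": 142, "disposition": "booked"}',
    '{"call_id": "call_b", "caller_type": "machine", "duration_seconds": 2, "disposition": "booked"}',
    '{"call_id": "call_c", "caller_type": "human", "duration_seconds": 95, "disposition": "booked"}',
]
counts = tally_caller_types(sample)
```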
Enable Dual-Mode on Agent Creation
POST /v2/agents
{
  "businessUrl": "https://yourpractice.com",
  "verticalType": "dental",
  "phone_number": "+18435551234",
  "dual_mode": {
    "enabled": true,
    "machine_response_format": "json",
    "detection_timeout_ms": 500
  }
}
// call.completed webhook includes:
{
  "call_id": "call_4mn1...",
  "caller_type": "human", // or "machine"
  "duration_seconds": 142,
  "disposition": "booked"
}
Detection Benchmark
<500ms
Caller type identified before the agent delivers its first response.
In practice, the SIP header inspection layer resolves approximately 80% of calls before the audio stream begins — meaning most machine callers see near-zero latency on mode selection. The 500ms bound is a hard worst-case for the full three-layer analysis including first-utterance classification.
Get Started
Stop maintaining two phone systems.
One WFW agent, one phone number, one API surface. Humans get the conversation they expect. AI callers get the JSON they need. Zero routing configuration required.
Dual-mode enabled via a single flag on any WFW agent.