Designing an API for AI Consumers: What's Different
Most API design literature is written for human developers as the consumer. The API is explored in Postman, called from curl, read about in Swagger UI. The developer reads error messages, understands context, makes judgment calls.
That assumption breaks when your primary API consumer is an AI agent.
An AI consumer doesn't browse docs. It reads the OpenAPI spec or gets tool schemas through MCP. It retries on network failure without always knowing whether the first request succeeded. It branches on error codes without being able to infer meaning from error prose. It processes results sequentially and doesn't naturally think in page numbers.
Designing an API for this consumer requires different choices than designing one for a developer using Postman. This post covers the six principles we built WFW's v2 API around, and what each one looks like in practice.
Principle 1: Everything Is Async by Default
Traditional API design favors synchronous responses where possible — the caller sends a request, the server does the work, the caller gets the result. Simple, familiar, debuggable.
The problem: AI agents in orchestration pipelines don't benefit from blocking calls. If agent provisioning takes 90 seconds, a blocking POST /v2/agents would hold the calling agent's execution thread for 90 seconds. That's fine for a human in a terminal. It's a serious constraint for an orchestrator managing dozens of parallel workflows.
Every long-running operation in the WFW API returns a 202 Accepted with an operation_id immediately. The caller polls GET /v2/operations/{id} or subscribes to a webhook. The execution thread is free to do other work.
// POST /v2/agents — returns in ~100ms
{
  "operation_id": "op_abc123",
  "status": "pending",
  "estimated_seconds": 90
}
// GET /v2/operations/op_abc123 — poll until active
{
  "operation_id": "op_abc123",
  "status": "active",
  "agent_id": "agt_xyz789",
  "provisioning_time_seconds": 84
}
For bot callers that would rather not poll, the agent.activated webhook delivers the same information as a push. For batch operations (provisioning 50 agents at once), the async pattern is what makes it tractable — fire 50 POSTs in parallel, check results in one batch after 2 minutes.
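On the caller side, this pattern reduces to a small polling loop. Here is a minimal TypeScript sketch, with the HTTP layer injected as a `fetchJson` function so the loop itself stays testable; the interval and attempt budget are illustrative values, not WFW-prescribed defaults:

```typescript
type Operation = {
  operation_id: string;
  status: "pending" | "active" | "failed";
  agent_id?: string;
};

// Poll GET /v2/operations/{id} until the operation leaves "pending".
// `fetchJson` is injected so the loop can run against a mock in tests.
async function pollOperation(
  fetchJson: (path: string) => Promise<Operation>,
  operationId: string,
  opts: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<Operation> {
  const { intervalMs = 2000, maxAttempts = 60 } = opts;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const op = await fetchJson(`/v2/operations/${operationId}`);
    if (op.status !== "pending") return op; // active or failed: done either way
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  throw new Error(`operation ${operationId} still pending after ${maxAttempts} polls`);
}
```

In production the interval would sensibly start from the `estimated_seconds` hint in the 202 response rather than a fixed constant.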
Principle 2: Idempotency Is Not Optional
A human developer who double-submits a form notices the duplicate and deletes one. An AI agent that retries on network failure has no way to know whether the first request succeeded.
Without idempotency, a retry after a timed-out request that actually succeeded creates a duplicate agent: two agents with the same configuration, each live on a different phone number. This is a real failure mode, not a theoretical edge case.
Every mutating endpoint accepts an Idempotency-Key header. Send the same key twice within 24 hours and you get the same response both times. The server deduplicates silently.
# First call (or retry after failure) — same result either way
curl -X POST https://api.workforcewave.com/v2/agents \
  -H "Idempotency-Key: provision-ridgeline-2026-06-29" \
  ...
The key is caller-generated. A good key encodes enough context to be naturally unique per logical operation: client ID + operation type + date. Don't use random UUIDs — you can't reconstruct them after a crash.
The deduplication window is 24 hours. After that, the same key is treated as a new request. This covers realistic retry scenarios (network blips, deployment restarts) without permanently blocking re-use of semantically natural keys.
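A deterministic key builder makes this concrete. The sketch below is our illustration of the client ID + operation type + date recipe; the exact format is the caller's choice, not something the API mandates:

```typescript
// Derive a stable idempotency key from the facts that identify the logical
// operation, so a process restarted after a crash produces the same key.
function idempotencyKey(operation: string, clientId: string, date: string): string {
  // Normalize inputs so cosmetic differences (case, whitespace) can't
  // accidentally produce two distinct keys for one logical operation.
  const clean = (s: string) => s.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-");
  return [operation, clientId, date].map(clean).join("-");
}
```

The same inputs after a crash yield the same key, so the server's deduplication does the rest.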
Principle 3: Machine-Readable Errors
Error messages written for human developers say things like "The business URL you provided returned a 404. Make sure the URL is publicly accessible and try again."
That's a helpful message for a human reading a terminal. For an AI consumer, the actionable information is error.code: CRAWL_TARGET_NOT_FOUND. The message is for logging. The code is for branching.
Every WFW error response follows the same envelope:
{
  "error": {
    "code": "CRAWL_TARGET_NOT_FOUND",
    "message": "The business URL returned a 404. Ensure the URL is publicly accessible.",
    "retryable": false,
    "recovery_actions": ["verify_url", "use_fallback_template"],
    "http_status": 422,
    "request_id": "req_7c3a2f"
  }
}
code is always a machine-readable string constant — documented in the API reference, never changed without a major version bump. An AI consumer can switch on error.code directly.
retryable is a boolean. If true, the same request is safe to retry (typically a transient failure). If false, retrying won't help — the caller needs to change something.
recovery_actions is an array of suggested next steps. These are documented action codes, not prose. An orchestrator that encounters CRAWL_TARGET_NOT_FOUND can look up use_fallback_template in its action table and respond by retrying with a template_id but no business_url.
request_id goes in your logs. If you need to open a support ticket, it's the first thing we'll ask for.
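A consumer can act on this envelope with a plain dispatch over the structured fields. A sketch, assuming the error shape shown above; the set of actions this particular orchestrator knows how to perform is hypothetical:

```typescript
type WfwError = {
  code: string;
  message: string;
  retryable: boolean;
  recovery_actions: string[];
  http_status: number;
  request_id: string;
};

// Decide the next step from structured fields alone; `message` is only logged.
function planRecovery(err: WfwError): string {
  if (err.retryable) return "retry";
  // Non-retryable: pick the first recovery action this caller can perform.
  const known = new Set(["verify_url", "use_fallback_template"]);
  const action = err.recovery_actions.find((a) => known.has(a));
  return action ?? "escalate"; // no known action: surface request_id to a human
}
```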
Principle 4: x-bot-guidance in the OpenAPI Spec
OpenAPI specs are written for human developers who read docs. When an MCP tool schema or OpenAPI spec is ingested by an LLM, that LLM is looking for the same kind of context — but it processes it differently. Prose descriptions buried in description fields may or may not carry the right weight.
WFW's OpenAPI 3.1 spec includes x-bot-guidance extensions at the operation and schema levels. These are short, imperative instructions written specifically for LLM consumers:
paths:
  /v2/agents:
    post:
      summary: Create a new voice agent
      x-bot-guidance: |
        Always include Idempotency-Key. This operation is async — you will
        receive operation_id, not agent_id. Poll GET /v2/operations/{id}
        until status=active before using the agent. Do not retry without
        changing Idempotency-Key if you are uncertain whether the first
        attempt succeeded.
      parameters:
        - name: Idempotency-Key
          in: header
          required: false
          x-bot-guidance: "Treat as required. Omitting risks duplicate agents on retry."
These instructions are the system prompt for the API consumer — the context an AI needs to use the endpoint correctly, surfaced where the AI will see it rather than in a separate documentation page it may never access.
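One way a consumer-side toolchain might harvest these extensions is to walk the parsed spec and collect every x-bot-guidance string along with its location. A sketch, assuming the spec has already been parsed from YAML into a plain object:

```typescript
// Collect x-bot-guidance strings from a parsed OpenAPI document so they
// can be injected into an agent's context alongside the tool schema.
function collectBotGuidance(
  node: unknown,
  path: string[] = [],
  out: [string, string][] = [],
): [string, string][] {
  if (node && typeof node === "object") {
    for (const [key, value] of Object.entries(node as Record<string, unknown>)) {
      if (key === "x-bot-guidance" && typeof value === "string") {
        out.push([path.join("."), value.trim()]); // [spec path, guidance text]
      } else {
        collectBotGuidance(value, [...path, key], out);
      }
    }
  }
  return out;
}
```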
Principle 5: The Envelope Is Always the Same
One of the most common sources of brittleness in bot integrations is surprise response shapes. The caller expects { "data": [...] } and gets { "results": [...] } on certain endpoints. Or a 204 with no body on some DELETEs and a 200 with a body on others. Every inconsistency is a special case the calling AI has to learn.
WFW's v2 API uses a consistent response envelope everywhere:
{
  "data": { ... },          // the actual payload, always at data
  "meta": {                 // present on list responses
    "total": 47,
    "cursor": "cur_abc",
    "has_more": true
  },
  "request_id": "req_...",  // always present
  "timestamp": "2026-06-29T14:22:08Z"
}
Errors always use the error envelope from Principle 3. Async operations always return operation_id and status. List responses always include meta with cursor pagination fields. No endpoint returns a bare array, a bare string, or a 204 with no body (we return { "data": { "deleted": true } } instead).
An AI consumer that learns the envelope once can parse every response correctly without special-casing individual endpoints.
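Learning the envelope once can be as literal as a single parse function. A minimal TypeScript sketch of that idea; the validation checks only the invariants stated above:

```typescript
type Envelope<T> = {
  data: T;
  meta?: { total: number; cursor: string | null; has_more: boolean };
  request_id: string;
  timestamp: string;
};

// One parser for every endpoint: reject anything that is not the envelope
// (bare arrays, bare strings, bodies without request_id).
function parseEnvelope<T>(body: unknown): Envelope<T> {
  const b = body as Record<string, unknown> | null;
  if (!b || typeof b !== "object" || !("data" in b) || typeof b.request_id !== "string") {
    throw new Error("response does not match the WFW envelope");
  }
  return b as unknown as Envelope<T>;
}
```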
Principle 6: Cursor Pagination Over Offset
Offset pagination (?page=3&per_page=20) is intuitive for humans scrolling through a UI. It maps to "give me the third page of 20 results."
For AI consumers processing results sequentially — reading every call transcript for a given agent, extracting data from each one — offset pagination has two problems:
- Instability under insertions — if new records are inserted during pagination, offset-based pages shift. Records get skipped or duplicated.
- No natural "next batch" concept — an AI processing a large result set wants to say "continue from where I left off," not "give me page 7."
WFW uses cursor-based pagination on all list endpoints:
{
  "data": [ ...20 items... ],
  "meta": {
    "total": 183,
    "cursor": "cur_eyJsYXN0X2lkIjoiY2FsbF9hYmMxMjMifQ",
    "has_more": true
  }
}
The calling AI stores the cursor value and passes it as ?cursor=cur_... in the next request. The cursor encodes the position stably — new records inserted during pagination don't shift it. The AI can process all 183 results with 10 sequential requests, each building on the previous cursor, with no risk of duplication or gaps.
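That "continue from where I left off" loop is only a few lines of code. A sketch with the page fetcher injected, so the draining logic is independent of any particular HTTP client:

```typescript
type Page<T> = {
  data: T[];
  meta: { total: number; cursor: string | null; has_more: boolean };
};

// Drain a cursor-paginated list endpoint: thread each response's cursor
// into the next request until has_more is false.
async function fetchAll<T>(
  fetchPage: (cursor: string | null) => Promise<Page<T>>,
): Promise<T[]> {
  const items: T[] = [];
  let cursor: string | null = null;
  do {
    const page = await fetchPage(cursor);
    items.push(...page.data);
    cursor = page.meta.has_more ? page.meta.cursor : null;
  } while (cursor !== null);
  return items;
}
```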
The Bot-Facing Signals in Practice
These six principles show up in a concrete way when something goes wrong. Here's what a full error + retry cycle looks like for a bot consumer:
// First attempt — network timeout; bot doesn't know if it succeeded
// Bot retries with same Idempotency-Key
// Second attempt — server responds:
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many requests. Retry after 30 seconds.",
    "retryable": true,
    "retry_after_seconds": 30,
    "recovery_actions": ["wait_and_retry"],
    "http_status": 429,
    "request_id": "req_9f2b1c"
  }
}
The bot reads retryable: true, reads retry_after_seconds: 30, waits, and retries. It doesn't have to parse the message. It doesn't have to guess whether a 429 is retryable. It doesn't have to infer the wait time from the prose.
If the same situation returned retryable: false, the bot would know not to retry and would surface the request_id to whatever observability system it reports to.
This is what "designed for machine consumption" looks like at the protocol level: every signal the bot needs to act correctly is present in a structured field, not embedded in human-readable prose.
Next in this series: Dual-Mode Voice: How One Phone Number Serves Both Humans and AI Agents — the implementation details behind WFW's caller detection stack.