Voice Agents overview
Real-time voice conversations powered by the Speechify API
Voice Agents let you put a talking, listening AI in your product in under five minutes. An agent is a reusable definition — prompt, voice, tools, evaluation criteria — that your users can hold a conversation with over the web or (coming soon) a phone line.
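To make the "reusable definition" idea concrete, here is a minimal sketch of the four parts of an agent as a Python data structure. The class name, field names, and example values are hypothetical and chosen only for illustration; they are not the API schema, which lives in the API reference.

```python
from dataclasses import dataclass, field


@dataclass
class AgentDefinition:
    """Hypothetical shape of an agent; field names are illustrative, not the API schema."""

    prompt: str  # instructions that steer the agent's behavior
    voice_id: str  # which catalog or cloned voice the agent speaks with
    tools: list[str] = field(default_factory=list)  # tool names the agent may call
    evaluation_criteria: list[str] = field(default_factory=list)  # graded after the call


# One definition can back many conversations.
support_agent = AgentDefinition(
    prompt="You are a friendly support agent for Acme.",
    voice_id="example-voice",  # placeholder, not a real voice ID
    tools=["end_call"],
    evaluation_criteria=["Caller's issue was resolved"],
)
```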
What you get
- Speechify voices — every voice from our catalog is available, including cloned voices.
- Low-latency realtime pipeline — sub-2s perceived per-turn latency across the full conversation loop (speech in → agent response → speech out).
- Tools — let the agent call your backend (webhook tools), run code on the caller’s device (client tools), or invoke built-ins like end_call and transfer_to_number.
- Full transcripts — every turn persisted with timestamps and tool traces.
- Post-call evaluation — LLM-graded criteria and structured data extraction run automatically after hang-up.
How it fits together
Your server calls POST /v1/agents/{id}/conversations. We provision a realtime voice session, dispatch the agent, and return a short-lived token and URL. Your browser or SDK connects to the session using that token. Audio, transcripts, and tool calls all flow over the session; our server receives the lifecycle events and persists the transcript and evaluation.
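The server-side step could look roughly like the following sketch, which uses only the Python standard library. The endpoint path comes from this page. The base URL, the bearer-token header, and the empty JSON body are assumptions made for illustration, and the function names are hypothetical; check the API reference for the real values.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder base URL, not confirmed by this page.
BASE_URL = "https://api.example.com"


def build_conversation_request(
    agent_id: str, api_key: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/agents/{id}/conversations request."""
    # Percent-encode the agent ID so it is safe to place in the URL path.
    safe_id = urllib.parse.quote(agent_id, safe="")
    url = f"{base_url}/v1/agents/{safe_id}/conversations"
    return urllib.request.Request(
        url,
        data=b"{}",  # assumption: an empty JSON body is accepted
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: bearer-token auth
            "Content-Type": "application/json",
        },
    )


def start_conversation(agent_id: str, api_key: str) -> dict:
    """Send the request and return the parsed JSON response as a dict."""
    request = build_conversation_request(agent_id, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Keep the API key on your server. Only the short-lived token and URL from the response should be handed to the browser or SDK, which then connects to the session.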
What to read next
Create an agent and place your first test call.
Give your agent access to your backend, the caller’s device, or built-in actions like end_call.
Receive conversation.started, conversation.ended, and message.created webhooks.
Full schemas for /v1/agents, /v1/tools, and /v1/conversations.