Voice Agents overview
Real-time voice conversations powered by the Speechify API
Voice Agents let you put a talking, listening AI in your product in under five minutes. An agent is a reusable definition — prompt, voice, tools, evaluation criteria — that your users can hold a conversation with over the web or a phone line.
What you get
- Speechify voices — a curated catalog of natural voices, available via GET /v1/agents/voices. (Cloned/personal voices stay TTS-only and aren’t usable by agents.)
- Low-latency realtime pipeline — sub-2s perceived per-turn latency across the full conversation loop (speech in → agent response → speech out).
- Tools — let the agent call your backend (webhook tools), run code on the caller’s device (client tools), connect to a remote MCP server (MCP tools), or invoke built-ins like end_call and transfer_to_number.
- Full transcripts — every turn persisted with timestamps and tool traces.
- Post-call evaluation — LLM-graded criteria and structured data extraction run automatically after hang-up.
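As a rough illustration of the voice catalog endpoint mentioned in the list above, the sketch below builds a GET /v1/agents/voices request with Python's standard library. Only the path comes from this page. The base URL, the Bearer authorization header, and the helper names are assumptions made for the example, so check the API reference for the real values.

```python
import json
import urllib.request

# Assumption: placeholder base URL; substitute the real API host.
BASE_URL = "https://api.example.com"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/agents/voices request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/agents/voices",
        method="GET",
        # Assumption: Bearer-token auth; the overview does not specify the scheme.
        headers={"Authorization": f"Bearer {api_key}"},
    )


def list_agent_voices(api_key: str):
    """Send the request and return the parsed JSON body, whatever its shape."""
    with urllib.request.urlopen(build_voices_request(api_key)) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers without touching the network.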
How it fits together
Your server calls POST /v1/agents/{id}/conversations — we provision a realtime voice session, dispatch the agent, and return a short-lived token + URL. Your browser or SDK connects to the session using that token. Audio, transcripts, and tool calls all flow over the session; our server receives the lifecycle events and persists the transcript and evaluation.
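The server-side step of this flow might look like the following minimal sketch, which builds and sends POST /v1/agents/{id}/conversations using only the Python standard library. The path is taken from the description above. The base URL, the Bearer authorization header, the empty JSON body, and the function names are assumptions for illustration. The response field names are not documented on this page, so the sketch returns the parsed JSON without reading specific keys.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder base URL; substitute the real API host.
BASE_URL = "https://api.example.com"


def build_conversation_request(agent_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) POST /v1/agents/{id}/conversations."""
    # Percent-encode the id so it is always a single, safe path segment.
    safe_id = urllib.parse.quote(agent_id, safe="")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/agents/{safe_id}/conversations",
        # Assumption: an empty JSON object; the real body schema may differ.
        data=json.dumps({}).encode("utf-8"),
        method="POST",
        headers={
            # Assumption: Bearer-token auth; not specified in this overview.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def start_conversation(agent_id: str, api_key: str) -> dict:
    """Send the request and return the parsed JSON response.

    Per the overview, the response carries a short-lived token and a URL.
    Hand those to the browser or SDK; keep the API key on the server.
    """
    request = build_conversation_request(agent_id, api_key)
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Running this call on your server rather than in the browser means the long-lived API key never reaches the client; only the short-lived session token does.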
When to reach for a voice agent
- Inbound support and triage. Answer routine questions before a human has to pick up.
- Outbound follow-ups. Confirm appointments, check in on customers, collect structured information at scale.
- IVR replacement. Replace tone-tree menus with a conversation that routes the caller correctly the first time.
Build without code in the console
Everything here is also a no-code workflow in the console: write the prompt, pick a voice, attach knowledge, connect a phone number, and preview the conversation in your browser. Start with the Quickstart or take the dashboard tour.
What to read next
Create an agent and place your first test call.
Give your agent access to your backend, the caller’s device, or built-in actions like end_call.
Receive a signed POST after each conversation completes.
Full schemas for /v1/agents, /v1/agents/tools, /v1/agents/conversations.