Voice Agents overview
Real-time voice conversations powered by the Speechify API
Voice Agents let you put a talking, listening AI in your product in under five minutes. An agent is a reusable definition — prompt, voice, tools, evaluation criteria — that your users can hold a conversation with over the web or a phone line.
What you get
- Speechify voices: a curated catalog of natural voices, available via GET /v1/agents/voices. (Cloned/personal voices stay TTS-only and aren’t usable by agents.)
- Low-latency realtime pipeline: sub-2s perceived per-turn latency across the full conversation loop (speech in → agent response → speech out).
- Tools: let the agent call your backend (webhook tools), run code on the caller’s device (client tools), or invoke built-ins like end_call and transfer_to_number.
- Full transcripts: every turn persisted with timestamps and tool traces.
- Post-call evaluation: LLM-graded criteria and structured data extraction run automatically after hang-up.
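As a rough illustration of the voice catalog endpoint mentioned above, the following sketch builds a GET /v1/agents/voices request with Python's standard library. Only the path comes from this page. The base URL, the bearer-token header, and the helper names build_list_voices_request and list_voices are assumptions made for the example, so check the API reference for the real values.

```python
import json
import urllib.request

# Assumption: placeholder base URL; substitute the real API host.
API_BASE = "https://api.example.com"


def build_list_voices_request(api_key: str, base_url: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) a GET request for the agent voice catalog."""
    return urllib.request.Request(
        url=f"{base_url}/v1/agents/voices",
        headers={
            # Assumption: bearer-token auth; this page does not specify the scheme.
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        method="GET",
    )


def list_voices(api_key: str) -> object:
    """Send the request and return the decoded JSON body as-is.

    The response shape is not documented on this page, so nothing is parsed further.
    """
    with urllib.request.urlopen(build_list_voices_request(api_key)) as resp:
        return json.load(resp)
```

Keeping request construction separate from the network call makes it easy to inspect the URL and headers before sending anything.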
How it fits together
Your server calls POST /v1/agents/{id}/conversations. Speechify provisions a realtime voice session, dispatches the agent, and returns a short-lived token and URL. Your browser or SDK then connects to the session using that token. Audio, transcripts, and tool calls all flow over the session, while Speechify's servers receive the lifecycle events and persist the transcript and evaluation.
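The server-side step of that flow might look something like the sketch below, which builds the POST /v1/agents/{id}/conversations request using only Python's standard library. The path is taken from this page. The base URL, the bearer-token header, the empty JSON body, and the helper names are assumptions for illustration. The response is returned undecoded beyond JSON, because the field names for the token and URL are not given here.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: placeholder base URL; substitute the real API host.
API_BASE = "https://api.example.com"


def build_create_conversation_request(
    agent_id: str, api_key: str, base_url: str = API_BASE
) -> urllib.request.Request:
    """Build (but do not send) the request that starts a conversation with an agent."""
    # Percent-encode the agent ID so it is always a single path segment.
    path = f"/v1/agents/{quote(agent_id, safe='')}/conversations"
    return urllib.request.Request(
        url=f"{base_url}{path}",
        # Assumption: an empty JSON object is an acceptable request body.
        data=json.dumps({}).encode("utf-8"),
        headers={
            # Assumption: bearer-token auth; this page does not specify the scheme.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_conversation(agent_id: str, api_key: str) -> object:
    """Send the request from your server and return the decoded JSON body.

    Per the description above, the body should carry a short-lived token and a
    session URL to hand to the browser or SDK; the field names are not assumed here.
    """
    with urllib.request.urlopen(build_create_conversation_request(agent_id, api_key)) as resp:
        return json.load(resp)
```

Making this call from your server rather than the browser keeps the API key out of client code. Only the short-lived token and URL need to reach the client.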
When to reach for a voice agent
- Inbound support and triage. Answer routine questions before a human has to pick up.
- Outbound follow-ups. Confirm appointments, check in on customers, collect structured information at scale.
- IVR replacement. Replace tone-tree menus with a conversation that routes the caller correctly the first time.
Build without code in the console
Everything here is also a no-code workflow in the console: write the prompt, pick a voice, attach knowledge, connect a phone number, and preview the conversation in your browser. Start with the Quickstart or take the dashboard tour.