Voice Agents overview

Real-time voice conversations powered by the Speechify API

Voice Agents let you put a talking, listening AI in your product in under five minutes. An agent is a reusable definition (prompt, voice, tools, and evaluation criteria) that your users can talk to over the web or a phone line.

What you get

  • Speechify voices — a curated catalog of natural voices, available via GET /v1/agents/voices. (Cloned/personal voices stay TTS-only and aren’t usable by agents.)
  • Low-latency realtime pipeline — sub-2s perceived per-turn latency across the full conversation loop (speech in → agent response → speech out).
  • Tools — let the agent call your backend (webhook tools), run code on the caller’s device (client tools), or invoke built-ins like end_call and transfer_to_number.
  • Full transcripts — every turn persisted with timestamps and tool traces.
  • Post-call evaluation — LLM-graded criteria and structured data extraction run automatically after hang-up.
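
As a rough illustration of the voice catalog call mentioned above, the sketch below builds the GET /v1/agents/voices request without sending it. Only the path comes from this page. The base URL, the bearer-token Authorization header, and the helper name build_list_voices_request are assumptions made for the example, so check the API reference for the real values.

```python
import urllib.request

# Assumption: placeholder host; substitute the base URL from the API reference.
BASE_URL = "https://api.example.com"


def build_list_voices_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the agent voice catalog.

    Assumption: the API accepts a bearer token in the Authorization header.
    """
    return urllib.request.Request(
        f"{BASE_URL}/v1/agents/voices",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
        method="GET",
    )
```

Passing the returned object to urllib.request.urlopen would send the request. The response format is not described on this page, so the sketch stops short of parsing it.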

How it fits together

Your server calls POST /v1/agents/{id}/conversations. In response, we provision a realtime voice session, dispatch the agent, and return a short-lived token and a URL. Your browser client or SDK connects to the session using that token. Audio, transcripts, and tool calls all flow over the session; our server receives the lifecycle events and persists the transcript and evaluation.
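
The server-side step of that flow could be sketched as follows. This is a minimal sketch under stated assumptions, not the official client: only the POST /v1/agents/{id}/conversations path comes from this page. The base URL, the bearer-token header, the empty JSON body, and the helper names build_conversation_request and start_conversation are hypothetical. The response is returned as a plain dict because its field names are not given here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder host; substitute the base URL from the API reference.
BASE_URL = "https://api.example.com"


def build_conversation_request(agent_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST that starts a conversation with an agent."""
    # Percent-encode the agent ID so it is safe to place in the URL path.
    safe_id = urllib.parse.quote(agent_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/v1/agents/{safe_id}/conversations",
        data=json.dumps({}).encode("utf-8"),  # assumption: an empty JSON body is accepted
        headers={
            "Authorization": f"Bearer {api_key}",  # assumption: bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_conversation(agent_id: str, api_key: str) -> dict:
    """Send the request and return the decoded JSON response.

    Per the flow above, the response carries a short-lived token and a URL;
    the exact field names are not specified on this page.
    """
    req = build_conversation_request(agent_id, api_key)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Keeping this call on your server means the API key never reaches the browser; only the short-lived token and URL from the response need to be handed to the client that joins the session.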

When to reach for a voice agent

  • Inbound support and triage. Answer routine questions before a human has to pick up.
  • Outbound follow-ups. Confirm appointments, check in on customers, collect structured information at scale.
  • IVR replacement. Replace tone-tree menus with a conversation that routes the caller correctly the first time.

Build without code in the console

Everything here can also be done without code in the console: write the prompt, pick a voice, attach knowledge, connect a phone number, and preview the conversation in your browser. Start with the Quickstart or take the dashboard tour.