Documentation
Explore the Speechify API: text-to-speech, voice agents, and everything in between.
Speechify gives you two complementary products behind a single API key and SDK: Text to Speech for turning text into lifelike audio, and Voice Agents for real-time conversational voice. Start with either; most customers end up using both.
How Speechify works
Three concepts are shared across every product. Learn them once and the rest of the docs fall into place.
Pick from our built-in catalog (george, henry, carly, sophia, …) or clone any voice from a 10-30 second sample. The same voice ID works in TTS and Voice Agents.
Use simba-english for the highest-quality English, or simba-multilingual for 50+ languages with mixed-language input. A default applies if you omit the model.
A single Authorization: Bearer key works for every endpoint. The Python and TypeScript SDKs read SPEECHIFY_API_KEY from the environment automatically.
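If you call the API without an SDK, you build the header yourself. The sketch below is a minimal illustration of that, not SDK code: the helper name `auth_headers` is hypothetical, and only the `SPEECHIFY_API_KEY` variable and the `Authorization: Bearer` format come from the text above.

```python
import os


def auth_headers(env=None):
    """Build the Authorization header from SPEECHIFY_API_KEY.

    `auth_headers` is a hypothetical helper for illustration; the official
    SDKs read the same environment variable on their own.
    """
    env = os.environ if env is None else env
    key = env.get("SPEECHIFY_API_KEY", "").strip()
    if not key:
        # Fail early rather than sending an unauthenticated request.
        raise RuntimeError("SPEECHIFY_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}
```

Pass the returned dictionary as the headers of whichever HTTP client you use. Keeping the key in the environment keeps it out of source control.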
Choose your path
Convert text into lifelike audio with a single API call. Up to 2,000 characters per request, 20,000 when streaming.
Best for audiobooks, article readers, learning apps, video voice-over, notifications, and any app that needs to speak.
Quickstart · Streaming · SSML
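Text longer than the 2,000-character request limit has to be split before it is sent. The following is one possible sketch of that step, assuming the limit counts characters the way Python's `len()` does (an assumption, not something the docs state). The function name `split_text` and the sentence-boundary heuristic are illustrative and are not part of the Speechify SDK.

```python
import re

MAX_CHARS = 2000  # per-request limit quoted above; 20,000 applies when streaming


def split_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters.

    Chunks break on sentence boundaries where possible. A single sentence
    longer than `limit` is cut at the limit as a last resort.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks = []
    current = ""
    # Split after sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        # Hard-split any sentence that cannot fit in one request.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio concatenated in order. For long inputs, streaming with its higher limit may be the simpler choice.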
Put a talking, listening AI in your product in under five minutes. Sub-2s turn latency, tools, RAG, memory, and phone numbers.
Best for website assistants, support bots, inbound/outbound phone agents, and voice-driven product flows.
Quickstart · Embed widget · Phone Numbers
Meet the models
Our flagship English model. Zero-shot and fine-tuned voice cloning, full SSML and emotion control, and the lowest-latency streaming option.
English · Voice cloning · SSML · Emotion · Speech marks
Multi-language and mixed-language input across 50+ languages. Same voices work everywhere, no separate cloning per language.
50+ languages · Voice cloning · Cross-language voices · Mixed-language input
Browse by capability
One call, lifelike audio back.
Start playback before the full audio is generated.
Clone any voice from a 10-30 second sample.
Cheerful, sad, angry, and 10 more emotions.
Fine-grained control over pitch, rate, pauses, emphasis.
6 fully supported, 17 in beta, 26 coming soon.
Word-level timestamps for highlighting and sync.
Real-time voice conversations with your users.
Drop-in <speechify-agent> web component.
Let agents call your backend or run on the caller’s device.
RAG over URLs, sitemaps, and uploaded documents.
Long-term per-caller memory with GDPR delete.
Inbound and outbound telephony for your agents.
Listen for conversation.started, ended, and more.
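To make the webhook idea concrete, here is a hedged sketch of routing incoming events to handlers by name. The payload shape is an assumption: the `type` and `conversation_id` fields and the full `conversation.ended` event name are hypothetical and should be checked against the webhook reference before use.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]
HANDLERS: Dict[str, Handler] = {}


def on(event_type: str):
    """Register a handler for one event type."""
    def register(fn: Handler) -> Handler:
        HANDLERS[event_type] = fn
        return fn
    return register


@on("conversation.started")
def handle_started(event: dict) -> str:
    # `conversation_id` is an assumed field name, used for illustration.
    return f"started:{event.get('conversation_id', 'unknown')}"


@on("conversation.ended")  # assumed full name of the "ended" event
def handle_ended(event: dict) -> str:
    return f"ended:{event.get('conversation_id', 'unknown')}"


def dispatch(event: dict) -> str:
    """Route a decoded webhook payload to its handler, ignoring unknown types."""
    handler = HANDLERS.get(event.get("type", ""))
    if handler is None:
        return "ignored"
    return handler(event)
```

Ignoring unknown event types, rather than raising, keeps the endpoint working when new events are introduced.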
Platform essentials
API keys, header format, and security best practices.
Python (speechify-api) and TypeScript (@speechify/api).
Rate limits, concurrency caps, and request size caps per product.