Documentation

Explore the Speechify API: text-to-speech, voice agents, and everything in between.

Speechify gives you two complementary products behind a single API key and SDK: Text to Speech for turning text into lifelike audio, and Voice Agents for real-time conversational voice. Start with either; most customers end up using both.

How Speechify works

Three concepts are shared across every product. Learn them once and the rest of the docs fall into place.

Pick from our built-in catalog (george, henry, carly, sophia, …) or clone any voice from a 10-30 second sample. The same voice ID works in TTS and Voice Agents.

Use simba-english for the highest-quality English, or simba-multilingual for 50+ languages with mixed-language input. A default applies if you omit the field.

A single Authorization: Bearer key works for every endpoint. The Python and TypeScript SDKs read SPEECHIFY_API_KEY from the environment automatically.
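If you call the API without an SDK, the header has to be built by hand. The sketch below is a minimal illustration in plain Python: it reads SPEECHIFY_API_KEY from the environment and returns the Authorization: Bearer header described above. The helper name `build_auth_headers` is hypothetical and is not part of either SDK.

```python
import os
from typing import Mapping


def build_auth_headers(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the Authorization header from SPEECHIFY_API_KEY.

    `build_auth_headers` is a hypothetical helper, not an SDK function.
    """
    key = env.get("SPEECHIFY_API_KEY", "").strip()
    if not key:
        # Fail early rather than sending an unauthenticated request.
        raise RuntimeError("SPEECHIFY_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}
```

Pass the returned dictionary as the headers of whichever HTTP client you use. Keeping the key in the environment keeps it out of source control.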

Choose your path

Text to Speech

Convert text into lifelike audio with a single API call. Up to 2,000 characters per request, or 20,000 when streaming.

Best for audiobooks, article readers, learning apps, video voice-over, notifications, and any app that needs to speak.

Quickstart · Streaming · SSML
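Text longer than the per-request limit has to be split before it is sent. The following is a minimal sketch of one way to do that client-side, assuming you want chunks to end on sentence boundaries where possible. The function `split_text` and both constants are hypothetical names; only the 2,000 and 20,000 character figures come from the limits stated above.

```python
import re

MAX_CHARS_PER_REQUEST = 2_000  # non-streaming limit stated above
MAX_CHARS_STREAMING = 20_000   # streaming limit stated above


def split_text(text: str, limit: int = MAX_CHARS_PER_REQUEST) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers sentence boundaries, falls back to the last space, and
    hard-cuts only when a single word is longer than the limit.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: list[str] = []
    remaining = text.strip()
    while len(remaining) > limit:
        window = remaining[:limit]
        # Sentence-ending punctuation followed by whitespace inside the window.
        matches = list(re.finditer(r"[.!?]\s", window))
        if matches:
            cut = matches[-1].start() + 1  # keep the punctuation in this chunk
        else:
            space = window.rfind(" ")
            cut = space if space > 0 else limit
        chunks.append(remaining[:cut].strip())
        remaining = remaining[cut:].strip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each chunk can then be sent as its own request and the resulting audio concatenated in order. Passing `limit=MAX_CHARS_STREAMING` applies the same logic to the streaming limit.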

Voice Agents

Put a talking, listening AI in your product in under five minutes. Sub-2s turn latency, plus tools, RAG, memory, and phone numbers.

Best for website assistants, support bots, inbound/outbound phone agents, and voice-driven product flows.

Quickstart · Embed widget · Phone Numbers

Meet the models

Simba English

Our flagship English model. Zero-shot and fine-tuned voice cloning, full SSML and emotion control, and the lowest-latency streaming option.

English · Voice cloning · SSML · Emotion · Speech marks

Simba Multilingual

Multi-language and mixed-language input across 50+ languages. Same voices work everywhere, no separate cloning per language.

50+ languages · Voice cloning · Cross-language voices · Mixed-language input

Browse by capability

One call, lifelike audio back.

Start playback before the full audio is generated.

Clone any voice from a 10-30 second sample.

Emotion Control

Cheerful, sad, angry, and 10 more emotions.

Fine-grained control over pitch, rate, pauses, and emphasis.

Language Support

6 fully supported, 17 in beta, 26 coming soon.

Word-level timestamps for highlighting and sync.

Real-time voice conversations with your users.

Drop-in <speechify-agent> web component.

Let agents call your backend or run on the caller’s device.

RAG over URLs, sitemaps, and uploaded documents.

Long-term per-caller memory with GDPR delete.

Inbound and outbound telephony for your agents.

Listen for conversation.started, ended, and more.
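Webhook events are usually handled by routing each payload to a handler keyed on the event name. The sketch below shows that pattern in plain Python under loud assumptions: the payload field names `event` and `conversation_id` are hypothetical, and `conversation.ended` is inferred from the shorthand above rather than confirmed. Check the webhook reference for the real payload shape.

```python
from typing import Any, Callable

Handler = Callable[[dict[str, Any]], str]


def on_started(payload: dict[str, Any]) -> str:
    # "conversation_id" is an assumed field name, used for illustration only.
    return f"started {payload.get('conversation_id', 'unknown')}"


def on_ended(payload: dict[str, Any]) -> str:
    return f"ended {payload.get('conversation_id', 'unknown')}"


# Event names: the first appears in the docs; the second is an assumption.
HANDLERS: dict[str, Handler] = {
    "conversation.started": on_started,
    "conversation.ended": on_ended,
}


def dispatch(payload: dict[str, Any]) -> str:
    """Route a decoded webhook payload to its handler.

    Assumes the event name lives under a top-level "event" key.
    """
    event = payload.get("event")
    handler = HANDLERS.get(event)
    if handler is None:
        # Unknown events are ignored so new event types do not break the endpoint.
        return f"ignored {event}"
    return handler(payload)
```

Ignoring unknown event names, rather than raising, lets the endpoint keep working when new event types are introduced.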

Platform essentials

API keys, header format, and security best practices.

Python (speechify-api) and TypeScript (@speechify/api).

Rate limits, concurrency caps, and request size caps per product.

Resources

Full endpoint schemas for both products.

End-to-end demo projects on GitHub.

Manage API keys, agents, voices, and billing.