Text to Speech API
Lifelike speech in 50+ languages from a single API call. Stream long-form audio, clone any voice from a 10-30 second sample, and control delivery with SSML.
Your first request
Set SPEECHIFY_API_KEY in your environment, then walk through the Quickstart for the full five-minute tour.
Set up
Install the SDK with pip install speechify-api (Python) or npm install @speechify/api (TypeScript). Both SDKs read SPEECHIFY_API_KEY from the environment automatically.
A single API key, sent in an Authorization: Bearer header, works for every endpoint. Manage and rotate keys in the console.
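Every endpoint takes the same Authorization: Bearer header, and the SDKs pick up SPEECHIFY_API_KEY on their own, so a raw-HTTP client only has to build that one header. Below is a minimal sketch in plain Python. auth_headers is a hypothetical helper and is not part of the speechify-api package:

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build the Authorization header that every endpoint accepts.

    Hypothetical helper: falls back to SPEECHIFY_API_KEY, the same
    variable the official SDKs read automatically.
    """
    key = api_key or os.environ.get("SPEECHIFY_API_KEY")
    if not key:
        raise RuntimeError("Set SPEECHIFY_API_KEY or pass api_key explicitly")
    return {"Authorization": f"Bearer {key}"}
```

An explicit key takes precedence over the environment variable. This makes it easy to test key rotation without touching the process environment.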
Build with TTS
Start playback before the full audio is generated. Up to 20,000 characters per request.
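When text exceeds the 20,000-character limit, one option is to split it client-side and send one request per chunk. The sketch below assumes sentence ends are good split points. split_for_requests and MAX_CHARS are illustrative names rather than SDK functions, and the API may recommend its own approach for long-form input:

```python
import re
from typing import List

MAX_CHARS = 20_000  # per-request limit stated above


def split_for_requests(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters (hypothetical helper).

    Prefers sentence boundaries; a single sentence longer than `limit`
    is hard-split so that no chunk ever exceeds the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Fallback: hard-split any sentence that alone exceeds the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Greedily pack whole sentences into the current chunk.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence ends keeps each chunk's delivery natural. The hard split is only a fallback for input such as one very long unpunctuated line.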
Clone any voice from a 10-30 second sample. Cloned voices work across every supported language.
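A quick local check can catch samples outside the 10-30 second window before you upload them. This sketch assumes the sample is an uncompressed PCM WAV file and uses only the standard-library wave module. Accepted upload formats aren't listed here, so treat the WAV assumption as something to confirm in the voice cloning reference:

```python
import io
import wave


def sample_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of a PCM WAV clip in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_clone_sample(
    wav_bytes: bytes, min_s: float = 10.0, max_s: float = 30.0
) -> bool:
    """Hypothetical pre-upload check against the 10-30 second window."""
    return min_s <= sample_duration_seconds(wav_bytes) <= max_s
```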
Fine-grained control over pitch, rate, pauses, emphasis, and 13 emotion presets.
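Pitch, rate, pauses, and emphasis correspond to standard W3C SSML elements (prosody, break, emphasis). The sketch below builds such a document and escapes the input text. build_ssml is hypothetical. Which attribute values the API accepts, and how the 13 emotion presets are written, are assumptions to verify on the SSML page, so the presets are not shown:

```python
from typing import Optional
from xml.sax.saxutils import escape


def build_ssml(
    text: str,
    pitch: str = "medium",
    rate: str = "medium",
    pause_ms: int = 0,
    emphasis: Optional[str] = None,
) -> str:
    """Wrap plain text in standard SSML prosody, break, and emphasis tags."""
    body = escape(text)  # escape &, <, > so user text cannot break the markup
    if emphasis:
        # Emphasize the first occurrence of the phrase, escaped the same way.
        phrase = escape(emphasis)
        body = body.replace(phrase, f'<emphasis level="strong">{phrase}</emphasis>', 1)
    if pause_ms > 0:
        # Trailing pause after the utterance.
        body += f'<break time="{pause_ms}ms"/>'
    return f'<speak><prosody pitch="{pitch}" rate="{rate}">{body}</prosody></speak>'
```

Escaping before inserting tags matters. Otherwise text such as "Tom & Jerry" would produce malformed SSML.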
Word-level timestamps for highlighting, captions, and audio-text sync.
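Word-level timestamps turn highlighting into a lookup problem: given the current playback position, find the word whose time span contains it. The sketch assumes each mark carries the word text plus start and end times in milliseconds. The WordMark shape and field names are hypothetical and may not match the actual response:

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WordMark:
    # Hypothetical shape: real response field names may differ.
    value: str
    start_ms: int
    end_ms: int


def word_at(marks: List[WordMark], position_ms: int) -> Optional[WordMark]:
    """Return the word spoken at `position_ms`, or None between words.

    `marks` must be sorted by start_ms.
    """
    starts = [m.start_ms for m in marks]  # rebuilt per call for clarity
    i = bisect_right(starts, position_ms) - 1  # last word starting at or before position
    if i >= 0 and position_ms < marks[i].end_ms:
        return marks[i]
    return None
```

A render loop that calls word_at many times per second should compute the starts list once instead of rebuilding it on every call.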
Models and languages
Two models cover every use case. simba-english is the flagship English model, with the highest quality, the lowest streaming latency, and full SSML + emotion control. simba-multilingual handles 50+ languages, including mixed-language input. The same voice IDs work across every language, so no separate cloning is required.
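The model choice reduces to a simple rule. English-only content goes to simba-english for its latency and SSML + emotion support. Anything else, including mixed-language text, goes to simba-multilingual. The sketch below encodes that rule. pick_model is a hypothetical helper, and parsing a BCP 47-style tag such as "en-US" is an assumption about how your application labels content:

```python
def pick_model(language: str, mixed_language: bool = False) -> str:
    """Pick a model ID from a language tag (hypothetical helper)."""
    # Keep only the primary subtag: "en-US" -> "en", "pt_BR" -> "pt".
    primary = language.replace("_", "-").split("-")[0].lower()
    if primary == "en" and not mixed_language:
        return "simba-english"
    return "simba-multilingual"
```

Because voice IDs are shared across models, switching models this way should not require a different voice.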
See Models and Language Support for the full matrix.