Text to Speech API

Lifelike speech in 50+ languages from a single API call. Stream long-form audio, clone any voice from a 10- to 30-second sample, and control delivery with SSML.
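SSML input is a `<speak>` document. The sketch below shows one way to build such a document from plain sentences. It assumes the API accepts standard W3C SSML elements such as `<break>` and `<prosody>` in the `input` field. `build_ssml` is a hypothetical helper, not part of the SDK, so check the exact supported tag set against the SSML reference.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_ssml(sentences: list[str], pause_ms: int = 400, rate: Optional[str] = None) -> str:
    """Hypothetical helper: wrap plain sentences in an SSML <speak> document.

    Inserts a <break> between consecutive sentences and, if `rate` is given,
    wraps the whole body in a <prosody> element.
    """
    if pause_ms < 0:
        raise ValueError("pause_ms must be non-negative")
    parts = []
    for i, sentence in enumerate(sentences):
        if i > 0:
            # Standard SSML pause between sentences (assumed to be supported).
            parts.append(f'<break time="{pause_ms}ms"/>')
        # Escape &, < and > so user text cannot break the markup.
        parts.append(escape(sentence))
    body = "".join(parts)
    if rate is not None:
        # quoteattr returns the value already escaped and wrapped in quotes.
        body = f"<prosody rate={quoteattr(rate)}>{body}</prosody>"
    return f"<speak>{body}</speak>"


ssml = build_ssml(["Welcome to Speechify.", "Tom & Jerry say hi."], pause_ms=500)
print(ssml)
# <speak>Welcome to Speechify.<break time="500ms"/>Tom &amp; Jerry say hi.</speak>
```

The resulting string would be passed as `input=` in place of plain text. Escaping the sentences matters: an unescaped `&` or `<` in user-supplied text would make the whole document invalid SSML.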

Your first request

```python
import base64

from speechify import Speechify

client = Speechify()  # reads SPEECHIFY_API_KEY from the environment
response = client.tts.audio.speech(
    input="Welcome to Speechify.",
    voice_id="george",
    audio_format="mp3",
)

# audio_data is Base64-encoded; decode it before writing the MP3 bytes.
with open("welcome.mp3", "wb") as f:
    f.write(base64.b64decode(response.audio_data))
```
Grab a key at console.speechify.ai/api-keys and set SPEECHIFY_API_KEY in your environment. Then walk through the Quickstart for the full five-minute tour.
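`Speechify()` picks up the key from `SPEECHIFY_API_KEY`. You can check for the variable up front so that a missing key fails with a message pointing at the console, rather than relying on whatever error the SDK happens to raise. The helper below, `require_api_key`, is a hypothetical sketch and not part of the SDK.

```python
import os


def require_api_key(var: str = "SPEECHIFY_API_KEY") -> str:
    """Hypothetical helper: return the API key or fail with a clear message.

    Call it before Speechify() so a missing key is reported up front.
    """
    key = os.environ.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"{var} is not set. Create a key at console.speechify.ai/api-keys "
            f"and export it, e.g. `export {var}=<your key>`."
        )
    return key
```

A script would call `require_api_key()` once at startup and then construct the client as usual.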

Models and languages

Two models cover every use case. simba-english is the flagship English model, with the highest quality, the lowest streaming latency, and full SSML and emotion control. simba-multilingual handles 50+ languages, including mixed-language input. The same voice IDs work in every language, so no separate cloning is required.
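Choosing a model per request can be as simple as branching on the input language. The sketch below uses a hypothetical helper, `pick_model`. It assumes BCP 47 language tags such as "en-US" and sends anything non-English, or any text that mixes languages, to simba-multilingual. How the chosen ID reaches the request (for example, a `model` argument on the speech call) is an assumption to confirm in the API reference.

```python
def pick_model(language: str, mixed: bool = False) -> str:
    """Hypothetical helper: choose a model ID from a language tag like "en-US".

    English-only text goes to simba-english; any other language, or text that
    mixes languages, goes to simba-multilingual.
    """
    # Normalize "EN_us" / "en-US" / "en" down to the primary subtag "en".
    primary = language.strip().lower().replace("_", "-").split("-")[0]
    if primary == "en" and not mixed:
        return "simba-english"
    return "simba-multilingual"


print(pick_model("en-US"))              # simba-english
print(pick_model("de-DE"))              # simba-multilingual
print(pick_model("en-US", mixed=True))  # simba-multilingual
```

Voice IDs are shared across languages, so `voice_id="george"` can stay the same whichever model is picked. Only the model ID changes.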

See Models and Language Support for the full matrix.

Resources