Speechify TTS Models — Simba English, Multilingual, and 3.0

Available models

Model	ID	Languages	Voice Cloning	Best for
Simba English	`simba-english`	English only	Zero-shot + fine-tuning	Production English TTS with highest quality
Simba Multilingual	`simba-multilingual`	30+ languages	Zero-shot + fine-tuning	Multi-language or mixed-language content
Simba 3.0	`simba-3.0`	English (multilingual coming soon)	Zero-shot + fine-tuning	Streaming-native; lowest TTFB and richer expressivity

Pass the model ID as the `model` parameter in your API calls. If `model` is omitted, the API defaults to `simba-english`.

POST /v1/audio/speech

curl -X POST https://api.speechify.ai/v1/audio/speech \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
  "input": "Hello! This is the Speechify text-to-speech API, demonstrating how to synthesize speech from text.",
  "voice_id": "george",
  "audio_format": "mp3",
  "model": "simba-english"
}'
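
The same request can be sketched in Python using only the standard library. The helper names `build_speech_payload` and `build_request` are illustrative, not part of any Speechify SDK, and the token is a placeholder. This page does not describe the response format, so the sketch stops at building the request.

```python
import json
import urllib.request

API_URL = "https://api.speechify.ai/v1/audio/speech"  # endpoint from the curl example


def build_speech_payload(text, voice_id, audio_format="mp3", model=None):
    """Build the JSON body; leave out "model" to accept the API default."""
    payload = {"input": text, "voice_id": voice_id, "audio_format": audio_format}
    if model is not None:
        payload["model"] = model
    return payload


def build_request(token, payload):
    """Wrap the payload in a POST request with the same headers as the curl call."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = build_speech_payload("Hello!", "george", model="simba-english")
request = build_request("<token>", payload)
# To send it: urllib.request.urlopen(request)  (needs a valid token and network access)
```

Omitting the `model` argument leaves the key out of the body entirely, which relies on the documented default of `simba-english`.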

Simba English

Optimized for English text-to-speech with the highest quality output.

Clear, natural-sounding speech
Consistent quality across outputs
Full support for SSML and emotion control
Zero-shot voice cloning from short audio samples
Fine-tuned voice cloning from hours of speaker audio (contact sales)

Simba Multilingual

This model is currently experimental and may be subject to changes.

Supports multiple languages, including mixing languages within a single sentence.

35 locales covering 30 distinct languages live today
Automatic language detection when the language parameter is omitted
Zero-shot voice cloning works across all supported languages
Fine-tuned voice cloning available (contact sales)

See Language Support for the full list.
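
A minimal sketch of the two ways to call the multilingual model: leaving `language` out so the API detects it, or pinning it explicitly. The helper name, the voice ID placeholder, and the locale string `fr-FR` are assumptions for illustration; Language Support lists the values the API actually accepts.

```python
def multilingual_payload(text, voice_id, language=None):
    """Build a request body for simba-multilingual.

    When `language` is None the key is omitted, so the API falls back to
    automatic language detection.
    """
    payload = {"input": text, "voice_id": voice_id, "model": "simba-multilingual"}
    if language is not None:
        payload["language"] = language  # assumed locale format, e.g. "fr-FR"
    return payload


# Mixed-language sentence: let the API detect the language.
detected = multilingual_payload("Bonjour, how are you today?", "your-voice-id")

# Single-language text: pin the language explicitly.
pinned = multilingual_payload("Bonjour tout le monde.", "your-voice-id", language="fr-FR")
```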

Simba 3.0

Streaming-native model with lower TTFB (time to first byte) and richer expressivity than Simba English.

Optimized for real-time streaming with the lowest startup latency
Richer expressive range than Simba English
Currently English only; requests that use non-English voices return an HTTP 400 error until multilingual support ships (coming soon)
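
Because Simba 3.0 rejects non-English voices with a 400, a client may want to retry the same request on Simba Multilingual. The sketch below assumes a caller-supplied `send` function (hypothetical) that performs the request and raises `urllib.error.HTTPError` on failure. A 400 can also signal other request problems, so real code should inspect the error body before retrying.

```python
import urllib.error


def synthesize_with_fallback(send, payload, fallback_model="simba-multilingual"):
    """Try the requested model; on HTTP 400, retry once with the fallback model.

    `send` is a hypothetical callable that takes a payload dict, performs the
    request, and raises urllib.error.HTTPError on a non-2xx response.
    """
    try:
        return send(payload)
    except urllib.error.HTTPError as err:
        # Re-raise anything that is not a 400, or a 400 from the fallback itself.
        if err.code != 400 or payload.get("model") == fallback_model:
            raise
        retry_payload = dict(payload, model=fallback_model)
        return send(retry_payload)
```

The original payload is copied rather than mutated, so the caller can still see which model was requested first.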

Voice cloning

All models support two tiers of voice cloning:

Tier	Input	Quality	Availability
Zero-shot	10-30 second audio sample	Good	Self-serve via API or Console
Fine-tuned	Hours of speaker audio	Best	Contact sales

See Voice Cloning for implementation details.
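
Before uploading a zero-shot sample, it can help to check its length locally. This sketch assumes the sample is a WAV file and treats the 10-30 second range from the table as inclusive; the accepted audio formats and exact limits are not stated here, so treat both as assumptions.

```python
import io
import wave

MIN_SECONDS, MAX_SECONDS = 10, 30  # zero-shot range from the table above


def wav_duration_seconds(wav_file):
    """Return the duration of a WAV file (path or binary file-like object)."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_zero_shot_sample(wav_file):
    """True when the clip length falls inside the assumed inclusive range."""
    return MIN_SECONDS <= wav_duration_seconds(wav_file) <= MAX_SECONDS


def silent_wav(seconds, rate=8000):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    buffer.seek(0)
    return buffer


ok = is_valid_zero_shot_sample(silent_wav(12))        # inside the range
too_short = is_valid_zero_shot_sample(silent_wav(3))  # below the minimum
```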

FAQ

Which model should I use?

Use Simba 3.0 for real-time streaming use cases where the lowest startup latency matters (English-only for now). Use Simba English when steady-state English quality is the priority. Use Simba Multilingual for non-English languages or mixed-language content.
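
The guidance above can be sketched as a small selection helper. The function name, its arguments, and the use of language codes starting with `en` to mean English are illustrative assumptions, not part of the API.

```python
def choose_model(language="en", mixed_language=False, realtime_streaming=False):
    """Map the FAQ guidance to a model ID.

    Non-English or mixed-language content goes to Simba Multilingual, because
    Simba 3.0 and Simba English are English-only for now.
    """
    is_english = language.lower().startswith("en")
    if mixed_language or not is_english:
        return "simba-multilingual"
    if realtime_streaming:
        return "simba-3.0"
    return "simba-english"
```

For example, `choose_model(realtime_streaming=True)` selects `simba-3.0`, while `choose_model("fr", realtime_streaming=True)` still selects `simba-multilingual`.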

Can I switch models without changing my code?

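Yes. Change the `model` parameter; all other parameters (voice, format, SSML) work the same across models. Built-in voice availability can differ between models, so confirm that the voice you use exists on the target model.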

Do all models support the same voices?

Built-in system voices may differ between models. Cloned voices work with all models.