Speechify TTS Models — Simba English & Multilingual

Available models

Model	ID	Languages	Voice Cloning	Best for
Simba English	`simba-english`	English only	Zero-shot + fine-tuning	Production English TTS with highest quality
Simba Multilingual	`simba-multilingual`	50+ languages	Zero-shot + fine-tuning	Multi-language or mixed-language content

Pass the model ID as the `model` parameter in your API calls. If omitted, the API defaults to `simba-english`.

POST /v1/audio/speech

curl -X POST https://api.speechify.ai/v1/audio/speech \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
  "input": "Hello! This is the Speechify text-to-speech API.",
  "voice_id": "george",
  "audio_format": "mp3",
  "model": "simba-english"
}'
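The same request can be sketched in Python using only the standard library. The helper name `build_speech_request` is hypothetical; the endpoint, headers, and field names are taken from the curl example above. The request is built but not sent, since the response format is not described here.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.speechify.ai/v1/audio/speech"


def build_speech_request(text, voice_id, token, model=None, audio_format="mp3"):
    """Build (but do not send) a POST request for the speech endpoint.

    Hypothetical helper. When `model` is None the field is left out of the
    body, so the API falls back to its documented default, simba-english.
    """
    payload = {"input": text, "voice_id": voice_id, "audio_format": audio_format}
    if model is not None:
        payload["model"] = model
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Omits "model", relying on the default described above.
request = build_speech_request(
    "Hello! This is the Speechify text-to-speech API.", "george", token="<token>"
)
# To send it: urllib.request.urlopen(request)
```

Passing `model="simba-multilingual"` adds the field explicitly; everything else in the request stays the same.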


Simba English

Optimized for English text-to-speech with the highest quality output.

Clear, natural-sounding speech
Consistent quality across outputs
Full support for SSML and emotion control
Zero-shot voice cloning from short audio samples
Fine-tuned voice cloning from hours of speaker audio (contact sales)

Simba Multilingual

This model is currently experimental and subject to change.

Supports multiple languages, including mixing languages within a single sentence.

6 fully supported languages, 17 in beta, 25 coming soon
Automatic language detection when the `language` parameter is omitted
Zero-shot voice cloning works across all supported languages
Fine-tuned voice cloning available (contact sales)

See Language Support for the full list.
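As a rough illustration of the `language` parameter, the sketch below builds two request bodies for Simba Multilingual: one that omits `language` so the API detects it automatically, and one that pins it. The helper name is hypothetical, the `fr-FR` code format is an assumption (check Language Support for the accepted values), and `george` is reused from the English example even though it may not be available on this model.

```python
import json


def multilingual_payload(text, voice_id, language=None):
    """Build a request body for simba-multilingual (hypothetical helper).

    When `language` is None the field is omitted, so the API is expected to
    detect the language automatically.
    """
    payload = {
        "input": text,
        "voice_id": voice_id,
        "audio_format": "mp3",
        "model": "simba-multilingual",
    }
    if language is not None:
        # The code format ("fr-FR") is an assumption, not confirmed by this page.
        payload["language"] = language
    return payload


# Mixed-language sentence: leave detection to the API.
auto = multilingual_payload("Bonjour, how are you today?", "george")

# Single-language content: pin the language explicitly.
pinned = multilingual_payload("Bonjour tout le monde.", "george", language="fr-FR")

print(json.dumps(pinned, ensure_ascii=False, indent=2))
```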

Voice cloning

Both models support two tiers of voice cloning:

Tier	Input	Quality	Availability
Zero-shot	10-30 second audio sample	Good	Self-serve via API or Console
Fine-tuned	Hours of speaker audio	Best	Contact sales

See Voice Cloning for implementation details.
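Before submitting a zero-shot sample, it can help to check locally that the recording falls in the 10-30 second range from the table above. This is a minimal sketch that assumes the sample is an uncompressed WAV file and treats both bounds as inclusive; neither the accepted formats nor how the API enforces the limits is stated on this page, and the function names are hypothetical.

```python
import wave

# Range taken from the zero-shot row in the table above.
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_valid_zero_shot_sample(path):
    """True if the sample length is within the 10-30 second range.

    Inclusive bounds are an assumption made for this sketch.
    """
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS


# Usage (with a real file on disk):
#   if not is_valid_zero_shot_sample("speaker.wav"):
#       raise ValueError("Sample should be 10-30 seconds long")
```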

FAQ

Which model should I use?

Use Simba English if your content is English-only; it produces the highest-quality output. Use Simba Multilingual if you need non-English languages or mixed-language content.
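The rule above can be sketched as a small selection function. The name `choose_model` and the language-code format (`en`, `en-US`, `es-MX`) are assumptions for illustration. Falling back to Simba Multilingual when no languages are known is a design choice of this sketch, made because that model can detect the language automatically.

```python
def choose_model(language_codes):
    """Pick a model ID from the languages a piece of content uses.

    Returns "simba-english" only when every code is an English variant;
    anything else, including an empty list, maps to "simba-multilingual".
    """
    codes = [code.lower() for code in language_codes]
    if codes and all(code == "en" or code.startswith("en-") for code in codes):
        return "simba-english"
    return "simba-multilingual"


print(choose_model(["en-US"]))           # simba-english
print(choose_model(["en-US", "es-MX"]))  # simba-multilingual
```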

Can I switch models without changing my code?

Yes. Change only the `model` parameter. All other parameters (`voice_id`, `audio_format`, SSML input) work the same across models.
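To make this concrete, the sketch below derives two request bodies from one shared base and confirms that only the `model` field differs. The helper `with_model` and the voice ID `my-cloned-voice` are hypothetical; a cloned voice is assumed because built-in voices may differ between models.

```python
# Shared fields; "my-cloned-voice" is a placeholder voice ID.
base = {
    "input": "Hello! This is the Speechify text-to-speech API.",
    "voice_id": "my-cloned-voice",
    "audio_format": "mp3",
}


def with_model(payload, model):
    """Return a copy of the payload with only the model field set."""
    return {**payload, "model": model}


english = with_model(base, "simba-english")
multilingual = with_model(base, "simba-multilingual")

# Collect the keys whose values differ between the two bodies.
changed = {key for key in english if english[key] != multilingual[key]}
print(changed)  # {'model'}
```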

Do both models support the same voices?

Built-in system voices may differ between models. Cloned voices work with both models.