Models
Choose the right text-to-speech model for your use case
Available models
Pass the model ID as the model parameter in your API calls. If omitted, the API defaults to simba-english.
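As a minimal sketch of how the model parameter might be set, the helper below builds a request body and leaves model out when the caller does not specify one, so the documented default applies. The field names input and voice_id and the helper names are assumptions for illustration; only the model parameter and the simba-english default come from this page.

```python
import json

# Default stated on this page; applied by the API when "model" is omitted.
DEFAULT_MODEL = "simba-english"


def build_tts_request(text, voice_id, model=None):
    """Build a request body (hypothetical field names "input" and "voice_id")."""
    body = {"input": text, "voice_id": voice_id}
    if model is not None:
        # Only send "model" when the caller picked one explicitly.
        body["model"] = model
    return body


def effective_model(body):
    """Return the model the API is expected to use for this body."""
    return body.get("model", DEFAULT_MODEL)


explicit = build_tts_request("Hello, world.", "example-voice", model="simba-english")
implicit = build_tts_request("Hello, world.", "example-voice")
print(json.dumps(explicit))
print(effective_model(implicit))
```

Setting the model explicitly, even to the default, keeps behavior stable if the default ever changes.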
Simba English
Optimized for English text-to-speech with the highest quality output.
- Clear, natural-sounding speech
- Consistent quality across outputs
- Full support for SSML and emotion control
- Zero-shot voice cloning from short audio samples
- Fine-tuned voice cloning from hours of speaker audio (contact sales)
Simba Multilingual
Supports multiple languages, including mixing languages within a single sentence.
- 35 locales covering 30 distinct languages live today
- Automatic language detection when the language parameter is omitted
- Zero-shot voice cloning works across all supported languages
- Fine-tuned voice cloning available (contact sales)
See Language Support for the full list.
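The optional language parameter can be illustrated with a small sketch: the body includes language only when the caller knows it, and otherwise relies on automatic detection. The model ID "simba-multilingual", the field names input and voice_id, and the locale value "fr-FR" are assumptions, not values confirmed by this page.

```python
# Assumed model ID; this page only confirms "simba-english".
SIMBA_MULTILINGUAL = "simba-multilingual"


def build_multilingual_request(text, voice_id, language=None):
    """Build a request body, omitting "language" to trigger auto-detection."""
    body = {"input": text, "voice_id": voice_id, "model": SIMBA_MULTILINGUAL}
    if language:
        # Explicit locale: skips detection (hypothetical value format).
        body["language"] = language
    return body


# Known language: pass it explicitly.
print(build_multilingual_request("Bonjour tout le monde.", "example-voice", "fr-FR"))
# Mixed-language sentence: omit the parameter and let the API detect it.
print(build_multilingual_request("Let's meet at the Bahnhof.", "example-voice"))
```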
Simba 3.0
Streaming-native model with lower TTFB (time to first byte) and richer expressivity than Simba English.
- Optimized for real-time streaming with the lowest startup latency
- Richer expressive range than Simba English
- Currently English only; non-English voices return 400 until multilingual support ships
- Multilingual support coming soon
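One way a client might cope with the 400 response for non-English voices is to retry the same request on the multilingual model. The sketch below is illustrative only: the send callable stands in for whatever HTTP client you use, the model IDs "simba-3.0" and "simba-multilingual" are placeholders, and a 400 can have other causes, so production code should inspect the error body before retrying.

```python
# Placeholder model IDs; neither is confirmed by this page.
SIMBA_3 = "simba-3.0"
SIMBA_MULTILINGUAL = "simba-multilingual"


def synthesize_with_fallback(send, body, fallback_model=SIMBA_MULTILINGUAL):
    """Send a request; on a 400 from Simba 3.0, retry once on the fallback model.

    `send` is a hypothetical callable taking a request body (dict) and
    returning a (status_code, payload) tuple.
    """
    status, payload = send(body)
    if status == 400 and body.get("model") == SIMBA_3:
        # Copy the body so the caller's dict is left untouched.
        retry_body = dict(body, model=fallback_model)
        return send(retry_body)
    return status, payload


def fake_send(body):
    """Stand-in transport: rejects every Simba 3.0 request with 400."""
    if body["model"] == SIMBA_3:
        return 400, b""
    return 200, b"audio-bytes"


request = {"input": "Guten Morgen.", "voice_id": "example-voice", "model": SIMBA_3}
print(synthesize_with_fallback(fake_send, request)[0])
```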
Voice cloning
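All models support two tiers of voice cloning:
- Zero-shot voice cloning from short audio samples
- Fine-tuned voice cloning from hours of speaker audio (contact sales)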
See Voice Cloning for implementation details.
FAQ
Which model should I use?
Use Simba 3.0 for real-time streaming use cases where the lowest startup latency matters (English-only for now). Use Simba English when steady-state English quality is the priority. Use Simba Multilingual for non-English languages or mixed-language content.
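The guidance above can be written as a small decision helper. This is a sketch under stated assumptions: the function name, its parameters, and the model IDs other than "simba-english" are hypothetical, and the English check is a simple prefix test on the language code.

```python
SIMBA_ENGLISH = "simba-english"            # ID given on this page
SIMBA_MULTILINGUAL = "simba-multilingual"  # assumed ID
SIMBA_3 = "simba-3.0"                      # placeholder ID


def choose_model(language="en", realtime_streaming=False, mixed_language=False):
    """Pick a model ID following the FAQ guidance.

    - Non-English or mixed-language content: Simba Multilingual.
    - English with latency-sensitive streaming: Simba 3.0.
    - Otherwise: Simba English.
    """
    is_english = language.lower().startswith("en")
    if mixed_language or not is_english:
        return SIMBA_MULTILINGUAL
    if realtime_streaming:
        return SIMBA_3
    return SIMBA_ENGLISH


print(choose_model("en-US", realtime_streaming=True))
print(choose_model("es-MX", realtime_streaming=True))
print(choose_model("en"))
```

Language takes priority over streaming here because Simba 3.0 is English-only for now.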
Can I switch models without changing my code?
Yes. Just change the model parameter. All other parameters (voice, format, SSML) work the same across models.
Do all models support the same voices?
Built-in system voices may differ between models. Cloned voices work with all models.