Models

Choose the right text-to-speech model for your use case

Available models

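Model              | ID                 | Languages                          | Voice cloning           | Best for
-------------------|--------------------|------------------------------------|-------------------------|------------------------------------------------------
Simba English      | simba-english      | English only                       | Zero-shot + fine-tuning | Production English TTS with highest quality
Simba Multilingual | simba-multilingual | 30+ languages                      | Zero-shot + fine-tuning | Multi-language or mixed-language content
Simba 3.0          | simba-3.0          | English (multilingual coming soon) | Zero-shot + fine-tuning | Streaming-native; lowest TTFB and richer expressivity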

Pass the model ID as the model parameter in your API calls. If omitted, the API defaults to simba-english.

POST /v1/audio/speech

curl -X POST https://api.speechify.ai/v1/audio/speech \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello! This is the Speechify text-to-speech API, demonstrating how to synthesize speech from text.",
    "voice_id": "george",
    "audio_format": "mp3",
    "model": "simba-english"
  }'
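The same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper name build_speech_request is hypothetical, and the function only builds the request so the body can be inspected before anything is sent. The endpoint, headers, and field names are taken from the curl example above.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.speechify.ai/v1/audio/speech"


def build_speech_request(text, voice_id, token, model=None, audio_format="mp3"):
    """Build (but do not send) a POST request for /v1/audio/speech.

    Hypothetical helper for illustration only.
    """
    payload = {"input": text, "voice_id": voice_id, "audio_format": audio_format}
    # Leave "model" out entirely when unset so the API applies its
    # documented default (simba-english).
    if model is not None:
        payload["model"] = model
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request("Hello!", "george", token="<token>", model="simba-english")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the result to urllib.request.urlopen would send it; the response format is not described in this section, so the sketch stops at building the request.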

Simba English

Optimized for English text-to-speech with the highest quality output.

  • Clear, natural-sounding speech
  • Consistent quality across outputs
  • Full support for SSML and emotion control
  • Zero-shot voice cloning from short audio samples
  • Fine-tuned voice cloning from hours of speaker audio (contact sales)

Simba Multilingual

Note: This model is experimental and may change.

Supports multiple languages, including mixing languages within a single sentence.

  • 35 locales covering 30 distinct languages live today
  • Automatic language detection when the language parameter is omitted
  • Zero-shot voice cloning works across all supported languages
  • Fine-tuned voice cloning available (contact sales)

See Language Support for the full list.
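As a rough sketch of the two ways to call Simba Multilingual, the helper below builds a request body with or without the language parameter. It assumes the parameter is sent as a top-level "language" field alongside the others and that values are locale codes such as "fr-FR"; both are assumptions, so check Language Support for the accepted values. The helper name multilingual_payload is hypothetical.

```python
def multilingual_payload(text, voice_id, language=None):
    """Build a request body for simba-multilingual (illustrative helper).

    Assumption: the language parameter is a top-level "language" field.
    """
    payload = {
        "input": text,
        "voice_id": voice_id,
        "model": "simba-multilingual",
    }
    # Omitting the field leaves language detection to the API.
    if language is not None:
        payload["language"] = language
    return payload


if __name__ == "__main__":
    # Explicit language ("fr-FR" is an assumed locale format).
    print(multilingual_payload("Bonjour tout le monde", "george", language="fr-FR"))
    # Mixed-language sentence: omit language and rely on automatic detection.
    print(multilingual_payload("Let's meet at the Bahnhof at noon", "george"))
```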

Simba 3.0

Streaming-native model with lower TTFB (time to first byte) and richer expressivity than Simba English.

  • Optimized for real-time streaming with the lowest startup latency
  • Richer expressive range than Simba English
  • Currently English only; requests with non-English voices return HTTP 400 until multilingual support ships
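One way a client might cope with the 400 response is to retry once on another model. The sketch below is an assumption-laden illustration, not a documented pattern: send is a hypothetical callable that takes a request body, returns audio bytes, and raises urllib.error.HTTPError on failure. A 400 can also have other causes, and a voice may not exist on the fallback model, so treat this as a starting point.

```python
import urllib.error


def synthesize_with_fallback(send, payload, fallback_model="simba-multilingual"):
    """Try the request as given; on HTTP 400 from simba-3.0, retry once.

    `send` is a hypothetical callable (payload dict -> audio bytes) that
    raises urllib.error.HTTPError on a non-2xx response.
    """
    try:
        return send(payload)
    except urllib.error.HTTPError as err:
        # Only handle a 400 from simba-3.0; re-raise everything else.
        if err.code != 400 or payload.get("model") != "simba-3.0":
            raise
        # Copy the body and swap only the model for the single retry.
        retry_payload = dict(payload, model=fallback_model)
        return send(retry_payload)
```

Injecting send keeps the retry logic separate from the HTTP layer, so it can be exercised with a stub before being wired to a real client.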

Voice cloning

All models support two tiers of voice cloning:

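Tier       | Input                      | Quality | Availability
-----------|----------------------------|---------|------------------------------
Zero-shot  | 10-30 second audio sample  | Good    | Self-serve via API or Console
Fine-tuned | Hours of speaker audio     | Best    | Contact sales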

See Voice Cloning for implementation details.

FAQ

Which model should I use?

Use Simba 3.0 for real-time streaming use cases where the lowest startup latency matters (English only for now). Use Simba English when steady-state English quality is the priority. Use Simba Multilingual for non-English languages or mixed-language content.
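The guidance above can be written as a small decision function. This is only a sketch of the stated rules: the function name choose_model and its arguments are hypothetical, and it assumes language codes where English starts with "en" (for example "en" or "en-US").

```python
def choose_model(language="en", realtime=False, mixed_language=False):
    """Return a model ID following the selection guidance (illustrative)."""
    # Assumption: English is identified by a code starting with "en".
    is_english = language.lower().startswith("en")
    # Non-English or mixed-language content needs the multilingual model.
    if mixed_language or not is_english:
        return "simba-multilingual"
    # English, latency-sensitive streaming: Simba 3.0.
    if realtime:
        return "simba-3.0"
    # English where steady-state quality is the priority.
    return "simba-english"


if __name__ == "__main__":
    print(choose_model("en-US", realtime=True))
    print(choose_model("en"))
    print(choose_model("de-DE", realtime=True))
```

Checking the language first reflects that Simba 3.0 is English only for now, so a real-time requirement cannot override a non-English input.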

Can I switch models without changing the rest of my request?

Yes. Change only the model parameter; all other parameters (voice, format, SSML) work the same across models.

Do voices work across all models?

Built-in system voices may differ between models. Cloned voices work with all models.