SpeechifyAI Build TTS Models: Simba 3.2, 3.0, Multilingual, and English

Available models

Model	ID	Languages	Voice Cloning	Best for
Simba 3.2	`simba-3.2`	English (multilingual coming soon)	Zero-shot (manual approval)	Recommended for new integrations. Streaming-native; lowest TTFB and richest expressivity
Simba 3.0	`simba-3.0`	English (multilingual coming soon)	Curated voices only	Earlier streaming-native model; still available
Simba Multilingual	`simba-multilingual`	30+ languages	Zero-shot + fine-tuning	Multi-language or mixed-language content
Simba English	`simba-english`	English only	Zero-shot + fine-tuning	Current API default; the only English model that supports cloned/personal voices self-serve, without manual approval

Pass the model ID as the `model` parameter in your API calls. If it is omitted, the API currently defaults to `simba-english`; we recommend explicitly setting `"model": "simba-3.2"` on new integrations for the lowest TTFB (time to first byte) and richest expressivity.

POST /v1/audio/speech

curl -X POST https://api.speechify.ai/v1/audio/speech \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
  "input": "Hello! This is the Speechify text-to-speech API.",
  "voice_id": "geffen_32",
  "audio_format": "mp3",
  "model": "simba-3.2"
}'
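
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: the helper names `build_speech_request` and `send` are illustrative, and because this section does not describe the response format, the body is returned as raw bytes.

```python
import json
import urllib.request

API_URL = "https://api.speechify.ai/v1/audio/speech"


def build_speech_request(token, text, voice_id, model="simba-3.2", audio_format="mp3"):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "input": text,
        "voice_id": voice_id,
        "audio_format": audio_format,
        # Set explicitly: omitting "model" falls back to simba-english.
        "model": model,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the raw response body as bytes.

    Assumption: the response format is not covered here, so nothing is decoded.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.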


Listing models via the API

Fetch the current set of selectable models at runtime instead of hardcoding the table above. `GET /v1/audio/models` returns every model ID you can pass as the `model` parameter, marks the default (the model used when `model` is omitted) and the recommended model, and describes each one. Drive a model picker from this response so it stays current as models are added or the recommendation changes.

GET /v1/audio/models

curl https://api.speechify.ai/v1/audio/models \
     -H "Authorization: Bearer <token>"


Response

{
  "models": [
    {
      "id": "simba-english",
      "name": "Simba English",
      "default": true,
      "recommended": false,
      "description": "English-only synthesis; the model used when a request omits `model`.",
      "languages": [
        "en"
      ]
    },
    {
      "id": "simba-3.2",
      "name": "Simba 3.2",
      "default": false,
      "recommended": true,
      "description": "Streaming-native model with the lowest time-to-first-byte and richest expressivity, English only today.",
      "languages": [
        "en"
      ]
    }
  ]
}

Each entry carries the model `id`, a human-readable `name` and `description`, a `default` flag, a `recommended` flag, and the `languages` it can synthesize (BCP-47 locale strings matching the `language` parameter). The recommended model is the one we suggest for new integrations and is distinct from the default, which stays stable for backwards compatibility. These values reflect current support and can change over time (a model may gain languages, for example), so read them at runtime rather than caching them. The response shape is stable, so as long as your client ignores unknown fields, both changed values and future new fields are picked up without breaking existing integrations.
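
One way to drive a model picker from this response is sketched below. The function names are illustrative, and the selection policy (prefer the recommended model, then the default) is an assumption about what a client might want rather than documented behavior. Language matching is done by exact string comparison, which is also an assumption; a real client may need to normalize locale strings such as `en-US`.

```python
import json
import urllib.request

MODELS_URL = "https://api.speechify.ai/v1/audio/models"


def fetch_models(token, timeout=30):
    """Fetch and decode the JSON body of GET /v1/audio/models."""
    request = urllib.request.Request(
        MODELS_URL, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def pick_model(models_response, language=None):
    """Return a model ID from the response, or None if nothing matches.

    Policy (an assumption, not API behavior): among models that list
    `language` (all models if language is None), prefer the one flagged
    "recommended", then the one flagged "default", then the first entry.
    Fields this code does not know about are simply ignored.
    """
    models = models_response.get("models", [])
    if language is not None:
        candidates = [m for m in models if language in m.get("languages", [])]
    else:
        candidates = models
    for flag in ("recommended", "default"):
        for model in candidates:
            if model.get(flag):
                return model["id"]
    return candidates[0]["id"] if candidates else None
```

Because `pick_model` reads fields with `.get()` and never enumerates keys, new fields added to the response later do not affect it.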

Voice cloning is not advertised per model here because whether cloned voices are available also depends on your plan; see Voice Cloning.

Simba 3.2

Streaming-native flagship model with the lowest TTFB (time to first byte) and richest expressivity. Recommended for new English integrations.

Optimized for real-time streaming with the lowest startup latency
Richer expressive range than earlier Simba generations
Full support for SSML and emotion control
Serves a curated voice set: beatrice_32, dominic_32, edmund_32, geffen_32, harper_32, hugh_32, imogen_32, wyatt_32
Currently English only; non-English voices return 400 until multilingual support ships (coming soon)
Voice cloning is supported; each cloned voice currently requires manual Speechify review and approval of the voice key before it can be used on this model — see Voice Cloning

Simba 3.0

Earlier streaming-native model, still available for backwards compatibility.

Prefer simba-3.2 for the latest quality
Serves the same curated voice set as simba-3.2, but does not accept cloned/personal voices
Full support for SSML and emotion control
Currently English only; non-English voices return 400

Simba Multilingual

This model is currently experimental and subject to change.

Supports multiple languages, including mixing languages within a single sentence.

35 locales covering 30 distinct languages live today
Automatic language detection when the language parameter is omitted
Zero-shot voice cloning works across all supported languages
Fine-tuned voice cloning available (contact sales)

See Language Support for the full list.
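
A request body for this model might be built as in the sketch below. It assumes that `language` is a top-level field in the same JSON body as `input` and `voice_id`; this section names the parameter but does not show where it goes. The helper name, the voice ID, and the `fr-FR` locale in the usage note are placeholders.

```python
def multilingual_payload(text, voice_id, language=None, audio_format="mp3"):
    """Build a request body for simba-multilingual.

    Assumption: "language" sits at the top level of the JSON body. When it
    is None the key is left out entirely, so the service falls back to
    automatic language detection as described above.
    """
    payload = {
        "input": text,
        "voice_id": voice_id,
        "audio_format": audio_format,
        "model": "simba-multilingual",
    }
    if language is not None:
        payload["language"] = language
    return payload
```

For example, `multilingual_payload("Bonjour", "your_voice_id")` relies on detection, while `multilingual_payload("Bonjour", "your_voice_id", language="fr-FR")` pins the locale. Check Language Support for the locales that are actually available.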

Simba English

The current API default when `model` is omitted, and the self-serve model to use when your `voice_id` is a cloned/personal voice. Cloned voices are not on the Simba 3 curated voice allow-list, and on simba-3.2 they work only after manual approval.

Full support for SSML and emotion control
Zero-shot voice cloning from short audio samples
Fine-tuned voice cloning from hours of speaker audio (contact sales)
Prefer simba-3.2 for new integrations using the built-in voice catalog

Voice cloning

Simba English and Simba Multilingual support two tiers of voice cloning, summarized in the table below. Simba 3.2 supports zero-shot voice cloning, gated on manual Speechify approval of each voice key. Simba 3.0 serves a curated voice set only and does not accept cloned/personal voices.

Tier	Input	Quality	Availability
Zero-shot	10-30 second audio sample	Good	Self-serve via API or Console
Fine-tuned	Hours of speaker audio	Best	Contact sales
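
A pre-flight check on a zero-shot sample can be sketched as follows. It treats the 10-30 second figure from the table as a client-side guideline, which is an assumption: this section does not say whether the API enforces it. The helper names and the choice of WAV input are illustrative.

```python
import wave

# Range taken from the "Zero-shot" row of the table above.
ZERO_SHOT_MIN_SECONDS = 10.0
ZERO_SHOT_MAX_SECONDS = 30.0


def wav_duration_seconds(source):
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_zero_shot_range(seconds):
    """True if a sample length falls inside the 10-30 second guideline."""
    return ZERO_SHOT_MIN_SECONDS <= seconds <= ZERO_SHOT_MAX_SECONDS
```

Checking locally before upload avoids a round trip for samples that are clearly too short or too long; other audio formats would need a different duration reader.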

Simba 3.2 voice cloning currently requires manual Speechify review and approval of the voice key while we evaluate stronger safeguards, given the model’s quality. Cloned voices work self-serve on simba-english and simba-multilingual; contact Speechify to have a clone approved for use on simba-3.2.

See Voice Cloning for implementation details.

FAQ

Which model should I use?

Use Simba 3.2 for most English use cases — it has the lowest startup latency and richest expressivity, and is the recommended Simba 3 model. Cloned voices are supported on Simba 3.2 but each one currently requires manual Speechify approval of the voice key; Simba English is the self-serve option for cloned voices without approval. Use Simba Multilingual for non-English languages or mixed-language content. Note: simba-english is still the API default when model is omitted, so set model: "simba-3.2" explicitly to opt in.
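
The guidance above can be written as a small decision function. This is a sketch of the stated rules only; the function name and its parameters are hypothetical, and the function does not check that the chosen model accepts a given voice.

```python
def choose_model(language="en", mixed_language=False,
                 cloned_voice=False, approved_for_simba_32=False):
    """Pick a model ID following the FAQ guidance.

    - Non-English or mixed-language content: simba-multilingual.
    - English with a cloned voice not yet approved for simba-3.2:
      simba-english (self-serve cloned voices).
    - Everything else in English: simba-3.2.
    """
    # Compare only the primary language subtag, so "en-US" counts as English.
    is_english = language.split("-")[0].lower() == "en"
    if mixed_language or not is_english:
        return "simba-multilingual"
    if cloned_voice and not approved_for_simba_32:
        return "simba-english"
    return "simba-3.2"
```

Whatever the function returns should be passed explicitly as `model`, since omitting the parameter selects simba-english.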

Can I switch models without changing my code?

Yes, as long as the voice is accepted by the target model. Just change the `model` parameter; all other parameters (`voice_id`, `audio_format`, SSML input) work the same across models. Note that Simba 3.0 only accepts voices from its curated allow-list, and Simba 3.2 accepts its curated allow-list plus cloned voices that have been manually approved by Speechify.

Do all models support the same voices?

No. Simba 3.0 serves a curated set of voices only (registered against the model’s VMS slug). Simba 3.2 serves the same curated set plus cloned voices that have been manually approved by Speechify. Simba English and Simba Multilingual serve the full built-in voice catalog and accept any cloned/personal voice without pre-approval.
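
These rules can be mirrored in a client-side check, sketched below. The curated set is copied from the Simba 3.2 voice list on this page and may change, so treat it as a snapshot. The function name and the `is_clone` and `approved_for_simba_32` flags are hypothetical; the API remains the source of truth, and language restrictions are not modeled here.

```python
# Snapshot of the curated Simba 3 voice set listed on this page; may change.
CURATED_SIMBA_3_VOICES = frozenset({
    "beatrice_32", "dominic_32", "edmund_32", "geffen_32",
    "harper_32", "hugh_32", "imogen_32", "wyatt_32",
})


def model_accepts_voice(model, voice_id, is_clone=False, approved_for_simba_32=False):
    """Best-effort guess at whether `model` will accept `voice_id`."""
    if model == "simba-3.0":
        # Curated voices only; cloned voices are never accepted.
        return not is_clone and voice_id in CURATED_SIMBA_3_VOICES
    if model == "simba-3.2":
        # Curated voices, plus cloned voices that Speechify has approved.
        if is_clone:
            return approved_for_simba_32
        return voice_id in CURATED_SIMBA_3_VOICES
    if model in ("simba-english", "simba-multilingual"):
        # Full built-in catalog and any cloned voice; this assumes voice_id
        # exists, which cannot be verified client-side.
        return True
    raise ValueError(f"unknown model: {model}")
```

A check like this is useful for giving early feedback in a UI, but a request can still be rejected by the API, so error handling is needed either way.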