Stream Speech

Synthesize speech and stream the audio back as it is generated, for low-latency playback. The Accept header selects the audio container. For short text where receiving the whole file at once is fine, use POST /v1/audio/speech.

Authentication

Authorization (Bearer)

Enter your API key with the Bearer prefix, e.g. ‘Bearer sk_…’.

Headers

Accept (enum, required)
Allowed values:

Request

This endpoint expects an object.
input (string, required)

Plain text or SSML to synthesize. See https://docs.speechify.ai/docs/api-limits for the input size limits. Emotion, pitch, and speed rate are configured in the SSML input; see the SSML documentation for details: https://docs.speechify.ai/docs/ssml#prosody

voice_id (string, required)

ID of the voice to use for synthesis. See the /v1/voices endpoint for available voices.

language (string, optional)

Language of the input, given as an ISO 639-1 language code and an ISO 3166-1 region code separated by a hyphen, e.g. en-US. See the list of supported languages and the recommendations for this parameter: https://docs.speechify.ai/docs/language-support.
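As a rough illustration of the format described above, a client could check the shape of a language tag before sending it. This is a minimal sketch: the helper name is hypothetical, it assumes the two-letter (alpha-2) form of the ISO 3166-1 region code, and it checks only the shape of the tag, not whether the language is supported.

```python
import re

# Two lowercase letters (ISO 639-1), a hyphen, two uppercase letters
# (assumed: ISO 3166-1 alpha-2), e.g. "en-US".
_LANGUAGE_TAG = re.compile(r"[a-z]{2}-[A-Z]{2}")


def is_language_tag(value):
    """Return True if value looks like a tag such as "en-US" (hypothetical helper)."""
    return _LANGUAGE_TAG.fullmatch(value) is not None
```

A tag that passes this check can still be rejected by the API if the language is not on the supported list.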

model (enum, optional, defaults to simba-english)

Model used for audio synthesis. simba-english is optimized for English; simba-multilingual is for non-English or mixed-language input. simba-3.0 is the streaming-native model with lower time to first byte (TTFB) and richer expressivity. simba-3.0 is currently English only, with multilingual support coming soon; until then, non-English voices return 400.
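The guidance above could be turned into a small client-side default, sketched below. The helper name is hypothetical, and the rule is a simplification: it looks only at the language tag, so mixed-language input still needs an explicit choice of simba-multilingual, and simba-3.0 is left as an opt-in.

```python
def pick_model(language=None):
    """Pick a model name from a language tag such as "en-US" (hypothetical helper).

    With no language given, this returns the documented default.
    """
    if language is None:
        return "simba-english"
    # Compare only the language part of the tag, before the hyphen.
    if language.split("-")[0].lower() == "en":
        return "simba-english"
    return "simba-multilingual"
```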

Allowed values:

options (object, optional)
GetStreamOptionsRequest is the wrapper for request parameters to the client
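To show how the header and body fields above fit together, here is a minimal sketch that builds the request without sending it, using only the Python standard library. Several details are assumptions that this section does not state: the URL is a placeholder, the Accept value audio/mpeg is illustrative because the allowed values are not listed here, and the body is assumed to be JSON.

```python
import json
import urllib.request

# Assumption: placeholder URL. This section does not give the host or the
# path of the streaming endpoint.
STREAM_URL = "https://api.example.com/v1/audio/stream"


def build_stream_request(api_key, text, voice_id, accept="audio/mpeg",
                         language=None, model=None, url=STREAM_URL):
    """Build (but do not send) a POST request for streamed speech."""
    body = {"input": text, "voice_id": voice_id}
    # Optional fields are left out so the server applies its defaults.
    if language is not None:
        body["language"] = language
    if model is not None:
        body["model"] = model
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": accept,  # assumed value; selects the audio container
            "Content-Type": "application/json",  # assumed body encoding
        },
    )
```

The returned object can be passed to urllib.request.urlopen once the real URL and an allowed Accept value are filled in.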

Response

Chunked audio stream for the requested input.
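Because the audio arrives in chunks, a client can start handing bytes to a player or file before synthesis finishes. The sketch below copies any file-like response to a sink one chunk at a time; the function name and chunk size are illustrative choices, and an in-memory buffer stands in for the HTTP response so the example runs offline.

```python
import io


def copy_audio_stream(response, sink, chunk_size=4096):
    """Copy audio from a file-like response to sink; return bytes written."""
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read means the stream has ended
            break
        sink.write(chunk)
        total += len(chunk)
    return total


# Offline stand-in for a real response body.
fake_response = io.BytesIO(b"\x00" * 10000)
output = io.BytesIO()
bytes_written = copy_audio_stream(fake_response, output)
```

The object returned by urllib.request.urlopen is file-like and decodes chunked transfer encoding itself, so it can be passed as response unchanged.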

Errors

400 Bad Request Error
401 Unauthorized Error
402 Payment Required Error
403 Forbidden Error
500 Internal Server Error
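One way a client might surface these errors is to map the status code to its documented name and raise an exception. This is a sketch only: the exception class and function are hypothetical, and treating every 2xx status as success is an assumption, since this section does not state the success code.

```python
# Status codes and names as documented in the error list above.
DOCUMENTED_ERRORS = {
    400: "Bad Request Error",
    401: "Unauthorized Error",
    402: "Payment Required Error",
    403: "Forbidden Error",
    500: "Internal Server Error",
}


class StreamSpeechError(Exception):
    """Hypothetical exception carrying the HTTP status and documented name."""

    def __init__(self, status, name):
        super().__init__(f"{status} {name}")
        self.status = status
        self.name = name


def raise_for_status(status):
    """Raise StreamSpeechError unless status is 2xx (assumed to mean success)."""
    if 200 <= status < 300:
        return
    name = DOCUMENTED_ERRORS.get(status, "Undocumented Error")
    raise StreamSpeechError(status, name)
```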