Stream Speech
Synthesize speech and stream the audio back as it is generated, for low-latency playback. The Accept header selects the audio container. For short text where receiving the whole file at once is fine, use POST /v1/audio/speech.
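A minimal Python sketch of the streaming flow, using only the standard library. The base URL and the /v1/audio/stream path are assumptions and are not stated on this page. Only the Bearer authorization and the role of the Accept header come from the text. The payload is passed through unchanged, so its field names should follow the request parameters documented below. build_stream_request builds the request without sending it. copy_stream writes audio chunks as they arrive, so playback does not have to wait for the whole file.

```python
import json
import urllib.request

# Assumptions, not confirmed by this page: the base URL and the streaming path.
BASE_URL = "https://api.sws.speechify.com"  # hypothetical base URL
STREAM_PATH = "/v1/audio/stream"             # hypothetical streaming path


def build_stream_request(api_key: str, payload: dict, accept: str = "audio/mpeg") -> urllib.request.Request:
    """Build (but do not send) a POST request; the Accept header picks the audio container."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + STREAM_PATH,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": accept,  # selects the audio container returned by the stream
        },
    )


def copy_stream(src, dst, chunk_size: int = 4096) -> int:
    """Copy audio chunks from src to dst as they arrive; return the number of bytes written."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read means the stream has ended
            break
        dst.write(chunk)  # a real player could start decoding at this point
        total += len(chunk)
    return total
```

To send the request, pass it to urllib.request.urlopen. Then give the response to copy_stream together with a file opened in binary mode.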
Authentication
Enter your API key with the Bearer prefix, e.g. ‘Bearer sk_…’.
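A small hypothetical helper, not part of any Speechify SDK, that builds the header value. It assumes the key is sent in the standard HTTP Authorization header, which is the usual convention for the Bearer scheme. It also strips an existing prefix so that "Bearer" is never doubled.

```python
def bearer_auth_header(api_key: str) -> dict:
    """Return an Authorization header dict for an API key, with exactly one 'Bearer ' prefix."""
    key = api_key.strip()
    if key.startswith("Bearer "):
        key = key[len("Bearer "):].strip()  # tolerate keys pasted with the prefix already on
    if not key:
        raise ValueError("API key is empty")
    return {"Authorization": f"Bearer {key}"}
```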
Headers
Request
Plain text or SSML to synthesize. See https://docs.speechify.ai/docs/api-limits for input size limits. Emotion, pitch, and speech rate are set through SSML markup in the input; see the SSML documentation for details: https://docs.speechify.ai/docs/ssml#prosody
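A sketch of wrapping plain text in SSML before sending it as input. It uses the W3C SSML prosody element with its rate and pitch attributes. Which values Speechify accepts, and how emotion is expressed, are not assumed here; check the linked SSML documentation. The helper name build_ssml is hypothetical. The text is XML-escaped so that characters such as & or < cannot break the markup.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, rate: Optional[str] = None, pitch: Optional[str] = None) -> str:
    """Wrap text in <speak>, adding a <prosody> element only when rate or pitch is given."""
    body = escape(text)  # escapes &, <, > in the spoken text
    attrs = ""
    if rate is not None:
        attrs += f" rate={quoteattr(rate)}"  # quoteattr adds the surrounding quotes
    if pitch is not None:
        attrs += f" pitch={quoteattr(pitch)}"
    if attrs:
        body = f"<prosody{attrs}>{body}</prosody>"
    return f"<speak>{body}</speak>"
```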
ID of the voice to use for synthesis. See the /v1/voices endpoint for the available voices.
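A minimal sketch of building a request for the voice list. Only the /v1/voices path comes from this page. The base URL is a hypothetical placeholder, and the use of GET is an assumption. The response format is not described here, so read the voice ID field according to the /v1/voices reference.

```python
import urllib.request

# Hypothetical base URL; only the /v1/voices path is taken from this page.
BASE_URL = "https://api.sws.speechify.com"


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the available voices."""
    return urllib.request.Request(
        BASE_URL + "/v1/voices",
        method="GET",  # assumed HTTP method for listing voices
        headers={
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
    )
```

Send the request with urllib.request.urlopen, then parse the body with json.load.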
Language of the input, written as an ISO 639-1 language code and an ISO 3166-1 region code separated by a hyphen, e.g. en-US. See the list of supported languages and the recommendations for this parameter: https://docs.speechify.ai/docs/language-support
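A strict client-side format check for the language value. It assumes the region part is an ISO 3166-1 alpha-2 code and uses the conventional casing shown in en-US; the API itself may be more lenient. The check covers format only. It does not confirm that a language is supported, so the linked language-support list remains the authority.

```python
import re

# Two lowercase letters (ISO 639-1), a hyphen, then two uppercase letters
# (ISO 3166-1 alpha-2 region), e.g. en-US.
_LANG_TAG = re.compile(r"[a-z]{2}-[A-Z]{2}")


def is_valid_language_tag(tag: str) -> bool:
    """Return True if tag has the language-REGION shape; support is not checked."""
    return _LANG_TAG.fullmatch(tag) is not None
```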
Model used for audio synthesis. simba-english is optimized for English input, and simba-multilingual is for non-English or mixed-language input. simba-3.0 is the streaming-native model, with lower TTFB (time to first byte) and richer expressivity. It currently supports English only, and multilingual support is planned. Until that support ships, requests that use simba-3.0 with a non-English voice return HTTP 400.
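A hypothetical helper that applies the model rules above. It assumes the chosen voice matches the input language. The 400 error is tied to the voice, so a non-English voice with simba-3.0 would still fail even when the text is English. Preferring simba-3.0 for English is a design choice for low-latency streaming. Pass prefer_streaming_native=False to use simba-english instead.

```python
def choose_model(language: str, mixed_languages: bool = False,
                 prefer_streaming_native: bool = True) -> str:
    """Pick a model ID from the input language, e.g. 'en-US'.

    simba-3.0 is English-only for now, so any non-English or mixed-language
    input falls back to simba-multilingual.
    """
    is_english = language.split("-")[0].lower() == "en"
    if mixed_languages or not is_english:
        return "simba-multilingual"
    return "simba-3.0" if prefer_streaming_native else "simba-english"
```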