Stream Speech | Speechify API

Synthesize speech and stream the audio back as it is generated, for low-latency playback. The Accept header selects the audio container; the response is raw audio bytes (HTTP chunked). For Base64-encoded audio with speech-mark metadata in a single JSON response, use POST /v1/audio/speech.

Authentication

Authorization Bearer

Enter your API key with the Bearer prefix, e.g. ‘Bearer sk_…’.
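As a minimal sketch of the header format described above, the helper below builds the Authorization and Accept headers for a request. The function name build_headers is hypothetical and not part of the Speechify SDK, and the key value in the usage line is a placeholder.

```python
def build_headers(api_key: str, accept: str = "audio/mpeg") -> dict:
    """Return request headers with a Bearer-prefixed API key.

    Hypothetical helper for illustration; not part of the Speechify SDK.
    """
    key = api_key.strip()
    # Tolerate callers who already included the "Bearer " prefix.
    if key.lower().startswith("bearer "):
        key = key[len("bearer "):].strip()
    if not key:
        raise ValueError("API key must not be empty")
    return {
        "Authorization": f"Bearer {key}",
        # The Accept header selects the audio container.
        "Accept": accept,
    }


headers = build_headers("sk_placeholder")
```

Stripping an existing prefix before adding one avoids sending "Bearer Bearer sk_…" when the key is copied together with its prefix.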

Request

This endpoint expects an object.

input string Required

Plain text or SSML to be synthesized to speech. See https://docs.speechify.ai/docs/api-limits for the input size limits. Emotion, pitch, and speed rate are configured through SSML in the input; see the SSML documentation for details: https://docs.speechify.ai/docs/ssml#prosody

voice_id string Required

ID of the voice used to synthesize speech. See the /v1/voices endpoint for available voices.

language string Optional

Language of the input, given as an ISO 639-1 language code and an ISO 3166-1 region code separated by a hyphen, e.g. en-US. See the list of supported languages and the recommendations for this parameter: https://docs.speechify.ai/docs/language-support.
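The expected shape of the language value can be checked client-side before sending a request. The sketch below only checks the format (two lowercase letters, a hyphen, two uppercase letters); it does not know which languages the API actually supports, and the name is_language_tag is hypothetical.

```python
import re

# Two-letter language code, hyphen, two-letter region code, e.g. "en-US".
# Assumes the conventional casing; this is a shape check, not a support check.
_LANGUAGE_TAG = re.compile(r"[a-z]{2}-[A-Z]{2}")


def is_language_tag(value: str) -> bool:
    """Return True if value looks like '<language>-<REGION>'."""
    return _LANGUAGE_TAG.fullmatch(value) is not None
```

A tag that passes this check can still be rejected by the server if the language is not on the supported list.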

model enum Optional Defaults to simba-english

Model used for audio synthesis. simba-english is optimized for English; simba-multilingual is for non-English or mixed-language input. simba-3.0 is the streaming-native model with lower TTFB (time to first byte) and richer expressivity. It is currently English only, with multilingual support coming soon; until that ships, requests with non-English voices return 400.
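One way to apply the model guidance above is a small selection helper. This is a sketch under stated assumptions: choose_model and its prefer_low_latency flag are hypothetical, a missing language is treated as English, and mixed-language input cannot be detected from the language tag alone, so callers with mixed text should pick simba-multilingual themselves.

```python
from typing import Optional


def choose_model(language: Optional[str], prefer_low_latency: bool = False) -> str:
    """Pick a model name from the language tag (illustrative heuristic only)."""
    # Assumption: no language given means English input.
    primary = (language or "en").split("-")[0].lower()
    if primary == "en":
        # simba-3.0 is documented as English only at the moment.
        return "simba-3.0" if prefer_low_latency else "simba-english"
    # Non-English input goes to the multilingual model.
    return "simba-multilingual"
```

Routing non-English input away from simba-3.0 reflects the note that non-English voices currently return 400 with that model.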

options object Optional

GetStreamOptionsRequest is the wrapper for request parameters to the client

Response headers

X-Request-ID string

Unique identifier for this request, present on every response (2xx and non-2xx alike). If the caller sends an X-Request-ID request header the server echoes it back (sanitized and length-capped) so one logical request can be traced end-to-end; otherwise the server generates a fresh value. Log it on every response and quote it in support requests - it is the stable handle that ties your observation to Speechify’s server-side logs, and it matches the request_id field in the error envelope.
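The tracing advice above can be sketched as follows: generate an ID to send in the X-Request-ID request header, then log whatever the server returns. The helper names and the logger name are hypothetical. Because the server may sanitize or truncate the value, the returned ID is treated as authoritative.

```python
import logging
import uuid

logger = logging.getLogger("speechify.client")  # hypothetical logger name


def new_request_id() -> str:
    """Generate a value to send in the X-Request-ID request header."""
    return uuid.uuid4().hex


def record_request_id(sent_id: str, response_headers: dict) -> str:
    """Log and return the request ID reported by the server."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    # Defensive fallback to the sent ID if the header is unexpectedly absent.
    received = lowered.get("x-request-id") or sent_id
    if received != sent_id:
        logger.warning("request id changed: sent=%s received=%s", sent_id, received)
    logger.info("request_id=%s", received)
    return received
```

The returned value is the one to quote in support requests.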

Response

Streamed audio. The Content-Type matches the Accept header except for audio/pcm, which returns audio/L16 with rate and channels parameters (see the Accept header description).
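Since an audio/pcm request comes back as audio/L16 with rate and channels parameters, a client may want to read those parameters from the response Content-Type before playback. The sketch below parses a header value with plain string handling; the function name is hypothetical, and the sample values in the usage line (24000 Hz, 1 channel) are illustrative rather than taken from this page.

```python
def parse_audio_content_type(header_value: str):
    """Split a Content-Type value into (media_type, params)."""
    parts = [part.strip() for part in header_value.split(";")]
    media_type = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            name, value = part.split("=", 1)
            # Parameter names are case-insensitive; values may be quoted.
            params[name.strip().lower()] = value.strip().strip('"')
    return media_type, params


# Illustrative header value only; actual rate and channels come from the server.
media_type, params = parse_audio_content_type("audio/L16; rate=24000; channels=1")
```

For containerized formats such as audio/mpeg, the params dictionary is simply empty.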

Errors

400

Bad Request Error

401

Unauthorized Error

402

Payment Required Error

403

Forbidden Error

404

Not Found Error

429

Too Many Requests Error

500

Internal Server Error

502

Bad Gateway Error

503

Service Unavailable Error
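A client typically handles these status codes in two groups: errors worth retrying and errors that need a changed request. The split below is an assumption, not something this page specifies: it treats 429, 500, 502, and 503 as transient and everything else as final. The names and the backoff parameters are hypothetical.

```python
# Assumption: these statuses are transient; the page does not say which are retryable.
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503})


def should_retry(status: int) -> bool:
    """Return True for statuses this sketch treats as transient."""
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff in seconds for a zero-based attempt number."""
    return min(cap, base * (2 ** attempt))
```

Errors such as 400, 401, 402, 403, and 404 generally indicate a problem with the request, credentials, or account, so repeating the same call is unlikely to help.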

from speechify import Speechify

# Create a client with your API key.
client = Speechify(
    token="YOUR_TOKEN_HERE",
)

# Request streamed audio; accept selects the audio container.
client.audio.stream(
    accept="audio/mpeg",
    input="Streaming long-form audio with the Speechify API.",
    voice_id="george",
    model="simba-english",
)
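To consume the streamed bytes, one option is to write each chunk to a file or buffer as it arrives. The sketch below assumes the streaming call yields an iterable of bytes chunks, which this page does not confirm. The helper write_chunks is hypothetical, and the usage lines feed it sample bytes instead of a real response.

```python
import io
from typing import BinaryIO, Iterable


def write_chunks(chunks: Iterable[bytes], sink: BinaryIO) -> int:
    """Write audio chunks to sink as they arrive; return total bytes written."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            sink.write(chunk)
            total += len(chunk)
    return total


# Stand-in data; in real use, pass the stream returned by the SDK call.
buffer = io.BytesIO()
written = write_chunks([b"ID3", b"", b"\x00\x01"], buffer)
```

Passing a file opened with open("speech.mp3", "wb") instead of the in-memory buffer would save the audio to disk while it is still being generated.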