SpeechifyAI Build API
REST endpoints for text-to-speech, streaming, and voice cloning
The SpeechifyAI Build API is a REST API at https://api.speechify.ai. Use it to generate speech from text, stream long-form audio, and clone voices from a short reference sample.
A minimal call. The request and response are generated from the API spec, so they stay in sync with the live endpoint.
Explore
Synthesize speech with POST /v1/audio/speech or stream long-form audio with POST /v1/audio/stream. Synthesis accepts up to 2,000 characters per request; streaming accepts up to 20,000.
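The two character limits suggest a simple client-side routing rule. The sketch below is one possible approach, not part of the API: the helper name pick_endpoint is hypothetical, and it assumes the limits are counted the way Python's len() counts characters, which the live endpoint may not do.

```python
BASE_URL = "https://api.speechify.ai"
SYNTH_PATH = "/v1/audio/speech"    # up to 2,000 characters
STREAM_PATH = "/v1/audio/stream"   # up to 20,000 characters
SYNTH_MAX_CHARS = 2_000
STREAM_MAX_CHARS = 20_000


def pick_endpoint(text: str) -> str:
    """Return the URL of the endpoint whose documented limit fits `text`.

    Assumption: length is measured with len(), i.e. Unicode code points.
    """
    n = len(text)
    if n <= SYNTH_MAX_CHARS:
        return BASE_URL + SYNTH_PATH
    if n <= STREAM_MAX_CHARS:
        return BASE_URL + STREAM_PATH
    raise ValueError(
        f"text has {n} characters; the streaming limit is {STREAM_MAX_CHARS}"
    )
```

Preferring synthesis for short inputs is a design choice of this sketch; nothing in the limits prevents streaming a short text. Inputs over 20,000 characters have to be split by the caller before sending.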
List, create, delete, and preview voices, including clones created from a 10-30 second sample. Cloned voices work across every supported language.
Response format
Non-streaming endpoints return JSON. Speech synthesis returns base64-encoded audio in audio_data. The streaming endpoint returns raw audio chunks via HTTP chunked transfer encoding.
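A minimal sketch of handling both response shapes on the client, using only the Python standard library. It assumes audio_data is a top-level key of the synthesis response and that the streaming chunks arrive as an iterable of bytes from whatever HTTP client is in use; the function names are hypothetical.

```python
import base64
import json
from typing import BinaryIO, Iterable


def decode_speech_response(body: bytes) -> bytes:
    """Extract raw audio bytes from a synthesis JSON response body.

    Assumption: `audio_data` sits at the top level of the JSON object.
    """
    payload = json.loads(body)
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    return base64.b64decode(payload["audio_data"], validate=True)


def write_stream(chunks: Iterable[bytes], out: BinaryIO) -> int:
    """Write streamed audio chunks to `out` as they arrive.

    Returns the total number of bytes written.
    """
    total = 0
    for chunk in chunks:
        out.write(chunk)
        total += len(chunk)
    return total
```

Writing each chunk as it arrives keeps memory use flat for long-form audio, which is the main reason to use the streaming endpoint rather than buffering the whole response.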
Errors
Every non-2xx response uses the same JSON envelope.
Check error.code in your SDK exception handler; it is a stable, machine-readable identifier you can branch on. error.message is human-friendly and may change between releases. error.fields carries per-field validation errors when relevant. request_id echoes the X-Request-ID response header; quote it when filing support tickets.
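The sketch below shows one way to turn the envelope into an exception that code can branch on. It is built under assumptions: request_id is taken to be a top-level key next to error, error.fields is taken to be a mapping from field name to message, and the ApiError class, the parse_error helper, and the sample code value "validation_error" are all hypothetical rather than taken from the API spec.

```python
import json
from typing import Dict, Optional


class ApiError(Exception):
    """Client-side view of the error envelope (hypothetical class)."""

    def __init__(self, status: int, code: str, message: str,
                 fields: Dict[str, str], request_id: Optional[str]):
        super().__init__(f"{status} {code}: {message} (request_id={request_id})")
        self.status = status
        self.code = code              # stable identifier: branch on this
        self.message = message        # human-friendly: log it, do not parse it
        self.fields = fields          # per-field validation errors, if any
        self.request_id = request_id  # quote in support tickets


def parse_error(status: int, body: bytes) -> ApiError:
    """Build an ApiError from a non-2xx response body."""
    envelope = json.loads(body)
    err = envelope.get("error") or {}
    return ApiError(
        status=status,
        code=err.get("code", "unknown"),
        message=err.get("message", ""),
        fields=err.get("fields") or {},
        request_id=envelope.get("request_id"),
    )


# Illustrative envelope; the values are made up for this example.
sample = json.dumps({
    "error": {
        "code": "validation_error",
        "message": "Text is too long.",
        "fields": {"text": "exceeds 2,000 characters"},
    },
    "request_id": "req_123",
}).encode("utf-8")

exc = parse_error(422, sample)
if exc.code == "validation_error":
    for name, problem in exc.fields.items():
        print(f"{name}: {problem}")
```

Branching on exc.code rather than matching text in exc.message keeps the handler working if the message wording changes between releases.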
See Get started for authentication and limits, and Idempotency for retry-safe writes.