SpeechifyAI Build API

REST endpoints for text-to-speech, streaming, and voice cloning

The SpeechifyAI Build API is a REST API at https://api.speechify.ai. Use it to generate speech from text, stream long-form audio, and clone voices from a short reference sample.

A minimal call looks like this. The request and response are generated from the API spec, so they stay in sync with the live endpoint.

POST /v1/audio/speech

curl -X POST https://api.speechify.ai/v1/audio/speech \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello! This is the Speechify text-to-speech API.",
    "voice_id": "george",
    "audio_format": "mp3",
    "model": "simba-english"
  }'

Response
{
  "audio_data": "example",
  "audio_format": "wav",
  "billable_characters_count": 10,
  "speech_marks": {
    "chunks": [
      {}
    ],
    "end": 1,
    "end_time": 1,
    "start": 1,
    "start_time": 1,
    "type": "example",
    "value": "example"
  }
}
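
For callers who prefer Python to curl, the same request might be assembled as follows. This is a minimal sketch using only the standard library, not an official SDK call: the helper name build_speech_request is hypothetical, and the URL, headers, and body fields are copied from the curl example above. The sketch builds the request without sending it.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
API_URL = "https://api.speechify.ai/v1/audio/speech"


def build_speech_request(
    token: str,
    text: str,
    voice_id: str = "george",
    audio_format: str = "mp3",
    model: str = "simba-english",
) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example.

    Hypothetical helper; the defaults mirror the values in the example.
    """
    payload = {
        "input": text,
        "voice_id": voice_id,
        "audio_format": audio_format,
        "model": model,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_speech_request(
    "<token>", "Hello! This is the Speechify text-to-speech API."
)
# Passing `request` to urllib.request.urlopen would perform the call.
```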

Response format

Non-streaming endpoints return JSON. Speech synthesis returns base64-encoded audio in audio_data. The streaming endpoint returns raw audio chunks via HTTP chunked transfer encoding.
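
The two response shapes can be handled along these lines. This is a sketch under stated assumptions: decode_audio and collect_stream are hypothetical helper names, the sample body is stand-in data rather than real API output, and the streaming chunks are modeled as any iterable of bytes (for example, successive reads from an HTTP response).

```python
import base64
from typing import Iterable


def decode_audio(response_body: dict) -> bytes:
    """Return raw audio bytes from a non-streaming synthesis response.

    Assumes `audio_data` holds standard base64, as described above.
    """
    return base64.b64decode(response_body["audio_data"])


def collect_stream(chunks: Iterable[bytes]) -> bytes:
    """Concatenate raw audio chunks from a streaming response.

    Empty chunks are skipped; the chunks are already raw audio,
    so no base64 decoding is needed.
    """
    return b"".join(chunk for chunk in chunks if chunk)


# Stand-in data so the sketch runs without a network call.
sample_body = {
    "audio_data": base64.b64encode(b"fake-audio-bytes").decode("ascii"),
    "audio_format": "mp3",
}
audio = decode_audio(sample_body)
streamed = collect_stream([b"fake-", b"", b"audio"])
```

For long-form audio, writing each chunk to a file or player as it arrives avoids holding the whole stream in memory; collect_stream buffers everything only to keep the example short.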

Errors

Every non-2xx response uses the same JSON envelope:

{
  "error": {
    "code": "voice_not_found",
    "message": "Voice 'voice_demo0001' does not exist."
  },
  "request_id": "7f3a2c1b4d5e6f7a"
}

Check error.code in your SDK exception handler: it is a stable, machine-readable identifier you can branch on. error.message is human-friendly and may change between releases, so do not match on its text. error.fields, when present, carries per-field validation errors. request_id echoes the X-Request-ID response header; quote it when filing support tickets.
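
A handler that follows this advice might look like the sketch below. It is not the SDK's actual exception type: ApiError, parse_error, and describe are hypothetical names, and the shape of error.fields (a mapping from field name to message) is an assumption, since the envelope above does not show it.

```python
from typing import Optional


class ApiError(Exception):
    """Hypothetical exception wrapping the JSON error envelope."""

    def __init__(
        self,
        code: str,
        message: str,
        request_id: Optional[str] = None,
        fields: Optional[dict] = None,
    ) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.request_id = request_id
        # Assumed shape: {"field_name": "what is wrong with it"}.
        self.fields = fields or {}


def parse_error(body: dict) -> ApiError:
    """Turn a decoded non-2xx response body into an ApiError."""
    error = body.get("error", {})
    return ApiError(
        code=error.get("code", "unknown"),
        message=error.get("message", ""),
        request_id=body.get("request_id"),
        fields=error.get("fields"),
    )


def describe(err: ApiError) -> str:
    """Branch on the stable error.code, never on the message text."""
    if err.code == "voice_not_found":
        return f"Choose a different voice_id (request {err.request_id})"
    return f"Unhandled API error {err.code} (request {err.request_id})"


# The envelope from the example above.
sample = {
    "error": {
        "code": "voice_not_found",
        "message": "Voice 'voice_demo0001' does not exist.",
    },
    "request_id": "7f3a2c1b4d5e6f7a",
}
err = parse_error(sample)
summary = describe(err)
```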

See Get started for authentication and limits, and Idempotency for retry-safe writes.