Quickstart

Make your first text-to-speech API call in 5 minutes

1

Get your API key

  1. Sign up at console.speechify.ai
  2. Go to API Keys
  3. Copy your default API key (or create a new one)

Set it as an environment variable so the SDKs pick it up automatically:

$ export SPEECHIFY_API_KEY="your-api-key-here"

API keys are sensitive. Never expose them in client-side code or public repositories. See the Authentication guide for security best practices.
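If you read the key in your own code rather than relying on the SDK, a minimal sketch like the one below keeps it out of source files and fails early when the variable is missing. The helper name `load_api_key` is hypothetical and not part of the Speechify SDK.

```python
import os


def load_api_key(env=os.environ):
    """Return the key stored in SPEECHIFY_API_KEY, or raise if it is unset.

    load_api_key is an illustrative helper, not an SDK function.
    """
    key = env.get("SPEECHIFY_API_KEY", "").strip()
    if not key:
        raise RuntimeError("SPEECHIFY_API_KEY is not set; export it first.")
    return key
```

Passing a mapping as `env` makes the helper easy to exercise in tests without touching the real environment.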

2

Install the SDK

$ pip install speechify-api

Prefer raw HTTP? No install is needed; use the cURL examples below.

3

Generate speech

Send text to POST /v1/audio/speech. The examples are generated from our Fern SDKs and the API spec, so they stay in sync with the live endpoint:

POST /v1/audio/speech

curl -X POST https://api.speechify.ai/v1/audio/speech \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello! This is the Speechify text-to-speech API.",
    "voice_id": "george",
    "audio_format": "mp3",
    "model": "simba-english"
  }'
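For readers who want raw HTTP from Python without installing the SDK, the cURL call above could be sketched with the standard library as follows. The endpoint, headers, and body fields are taken from the cURL example; the function names `build_speech_request` and `synthesize` are hypothetical, and this is not the SDK's own interface.

```python
import json
import urllib.request

API_URL = "https://api.speechify.ai/v1/audio/speech"


def build_speech_request(text, api_key, voice_id="george"):
    """Build (but do not send) the POST request from the cURL example."""
    payload = {
        "input": text,
        "voice_id": voice_id,
        "audio_format": "mp3",
        "model": "simba-english",
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, api_key):
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(build_speech_request(text, api_key)) as resp:
        return json.load(resp)
```

Calling `synthesize("Hello!", api_key)` performs the network request; `urllib` raises `urllib.error.HTTPError` on a non-2xx status, so wrap the call if you need to handle errors.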

A successful call returns the audio payload:

Response

{
  "audio_data": "string",
  "audio_format": "wav",
  "billable_characters_count": 1,
  "speech_marks": {
    "chunks": [
      {
        "end": 1,
        "end_time": 1.1,
        "start": 1,
        "start_time": 1.1,
        "type": "string",
        "value": "string"
      }
    ],
    "end": 1,
    "end_time": 1.1,
    "start": 1,
    "start_time": 1.1,
    "type": "string",
    "value": "string"
  }
}

The Python and TypeScript SDKs return decoded audio bytes. The raw HTTP response base64-encodes the audio in the audio_data field, so decode it before saving.
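When working with the raw HTTP response, the decoding step might look like the sketch below. It assumes the JSON body has already been parsed into a dict with an `audio_data` field, as in the response shown above; `decode_audio` is a hypothetical helper name.

```python
import base64


def decode_audio(response_json):
    """Return the raw audio bytes from a parsed JSON response.

    Assumes response_json["audio_data"] is a base64-encoded string.
    """
    return base64.b64decode(response_json["audio_data"])
```

The returned bytes can then be written to a file opened in binary mode, for example `open("output.mp3", "wb").write(decode_audio(body))`.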
4

Save and play

In Python, assign the result of the SDK call to response, then write the decoded audio bytes to output.mp3:

with open("output.mp3", "wb") as f:
    f.write(response.audio_data)

Then play it from the terminal (afplay ships with macOS; on other systems, use any audio player):

$ afplay output.mp3

Choose a voice

List the built-in voices to find one that fits, then pass its id as the voice_id:

GET /v1/voices

curl https://api.speechify.ai/v1/voices \
  -H "Authorization: Bearer <token>"
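The same lookup could be sketched in Python with the standard library. The response schema is not shown here, so the sketch assumes the endpoint returns a JSON list of voice objects that each carry an `id` field; check the API reference before relying on that shape. The names `build_voices_request` and `voice_ids` are hypothetical.

```python
import urllib.request

VOICES_URL = "https://api.speechify.ai/v1/voices"


def build_voices_request(api_key):
    """Build (but do not send) the GET request from the cURL example."""
    return urllib.request.Request(
        VOICES_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def voice_ids(voices):
    """Collect the id of each voice.

    Assumption: voices is a list of dicts, each with an "id" key.
    Entries without an id are skipped.
    """
    return [v["id"] for v in voices if "id" in v]
```

Any id collected this way can be passed as `voice_id` in the speech request.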

Popular built-in voices: george, henry, carly, sabrina. You can also clone any voice from a short audio sample.

Add emotion

Use SSML to control how the voice sounds. Pass it as the input parameter and the API detects it automatically:

<speak>
  <speechify:style emotion="cheerful">
    Great news! Your order has been shipped!
  </speechify:style>
</speak>

SSML also controls pitch, rate, pauses, and emphasis. See SSML and Emotion Control for the full reference.
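If the text comes from user input, it may contain characters such as & or < that would break the markup. A minimal sketch of building the SSML string safely in Python follows; the element and attribute names come from the example above, while the helper name `with_emotion` is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def with_emotion(text, emotion):
    """Wrap text in the speechify:style element, escaping XML special characters."""
    body = escape(text)  # converts &, <, and > to entities
    return (
        "<speak>"
        f"<speechify:style emotion={quoteattr(emotion)}>{body}</speechify:style>"
        "</speak>"
    )
```

The resulting string is what you would send as the input parameter, for example `with_emotion("Great news! Your order has been shipped!", "cheerful")`.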

Next steps