Create Chat Completion | Speechify API

curl -X POST https://api.speechify.ai/v1/chat/completions \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
  "model": "waymark-moa",
  "messages": [
    {
      "role": "user",
      "content": "Explain the mixture-of-agents approach in two sentences."
    }
  ],
  "stream": false
}'

{
  "id": "chatcmpl-abc123",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "A mixture-of-agents routes a prompt across several models and fuses their answers."
      }
    }
  ],
  "object": "chat.completion",
  "created": 1700000000,
  "model": "waymark-moa",
  "usage": {
    "prompt_tokens": 4500,
    "completion_tokens": 1600,
    "total_tokens": 6100
  },
  "waymark": {
    "escalated": true,
    "models": [
      {
        "model": "zai-org/GLM-5.2",
        "input_tokens": 3000,
        "output_tokens": 1200
      },
      {
        "model": "gpt-oss-120b",
        "input_tokens": 1500,
        "output_tokens": 400
      }
    ],
    "route": "waymark-moa"
  }
}


OpenAI-compatible chat completions. The gateway runs a mixture of frontier models and returns a single answer, so the request and response follow the OpenAI chat-completions shape: point the OpenAI SDK at this base URL and set `model` to one of the `waymark-*` routes. Any standard OpenAI parameter (`temperature`, `max_tokens`, `tools`, …) is forwarded. Set `stream: true` to receive the answer as a `text/event-stream` of server-sent events. The response adds a `waymark` object reporting which upstream models ran and their per-model token counts, and the `Speechify-Route` response header names the route that served the request.
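As a rough illustration of the request described above, the following standard-library sketch mirrors the curl example. The base URL, headers, and body fields come from that example; the helper names `build_chat_request` and `create_chat_completion` are hypothetical and not part of any SDK, and the extra keyword parameters are assumed to be forwarded the way the text describes.

```python
import json
import urllib.request

BASE_URL = "https://api.speechify.ai/v1"  # taken from the curl example


def build_chat_request(api_key: str, prompt: str, model: str = "waymark-moa",
                       **params) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        **params,  # e.g. temperature=0.2, max_tokens=256 (standard OpenAI parameters)
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_chat_completion(api_key: str, prompt: str, **params) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    req = build_chat_request(api_key, prompt, **params)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Splitting the builder from the sender keeps the request shape inspectable without touching the network; `create_chat_completion` is only needed when a real API key is available.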

Authentication

`Authorization` (Bearer)

Enter your API key with the Bearer prefix, e.g. ‘Bearer sk_…’.

Request

This endpoint expects an object.

`model` (enum, required)

The route to run. waymark-fast favors latency, waymark-moa balances quality and cost, and waymark-max runs the widest panel for the highest quality. Access to the higher routes depends on your plan.

`messages` (list of objects, required)

The conversation so far, in OpenAI chat-message format.

`stream` (boolean, optional)

When true, the answer is streamed back as a text/event-stream of server-sent events instead of a single JSON response. Defaults to false.

Response headers

`Speechify-Route` (string)

The route that served a chat completion (e.g. waymark-moa), after any in-gateway escalation. Mirrors the waymark.route field in the body.

`RateLimit-Limit` (integer)

Request-rate budget: the maximum number of requests in the current window (the bucket capacity). This is the un-prefixed name from the IETF draft; the legacy alias X-RateLimit-Limit carries the same value. Sent on every response.

`RateLimit-Remaining` (integer)

Request-rate budget: requests left in the current window. Legacy alias: X-RateLimit-Remaining.

`RateLimit-Reset` (integer)

Request-rate budget: integer delta-seconds until the window fully refills (same unit as Retry-After). Legacy alias: X-RateLimit-Reset.
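One way a client might use these headers is sketched below. The header names and their legacy `X-` aliases come from the descriptions above; the policy itself (pause for the full reset interval once the remaining budget reaches zero) is an assumption chosen for illustration, and the function names are hypothetical.

```python
from typing import Mapping, Optional


def _int_header(headers: Mapping[str, str], name: str) -> Optional[int]:
    """Read an integer header, falling back to its legacy X- alias."""
    lowered = {k.lower(): v for k, v in headers.items()}
    for candidate in (name, f"X-{name}"):
        value = lowered.get(candidate.lower())
        if value is not None:
            try:
                return int(value)
            except ValueError:
                return None  # malformed value: treat as absent
    return None


def seconds_to_wait(headers: Mapping[str, str]) -> int:
    """Return how long to pause before the next request (0 if budget remains)."""
    remaining = _int_header(headers, "RateLimit-Remaining")
    if remaining is None or remaining > 0:
        return 0
    # Assumed policy: wait for the full refill, which is conservative.
    reset = _int_header(headers, "RateLimit-Reset")
    return reset if reset is not None else 0
```

Because `RateLimit-Reset` reports the time until the window fully refills, waiting that long is likely longer than strictly necessary; a client that needs lower latency could retry sooner and rely on the 429 response instead.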

Response

The chat completion. A single JSON object by default. When the request sets stream: true, the response is instead a text/event-stream of server-sent events whose final data chunk before [DONE] carries the waymark usage object.
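The streamed form could be consumed roughly as follows. This sketch assumes each event is a `data:` line holding a JSON chunk in the OpenAI streaming shape (`choices[].delta.content`) and that the `waymark` object sits at the top level of the final chunk; neither detail is spelled out on this page, so treat both as assumptions. The function name `read_stream` is hypothetical.

```python
import json
from typing import Iterable, Optional, Tuple


def read_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Collect streamed text and the waymark object from SSE lines."""
    text_parts = []
    waymark = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            text_parts.append(delta.get("content") or "")
        if "waymark" in chunk:  # assumed location of the usage object
            waymark = chunk["waymark"]
    return "".join(text_parts), waymark
```

The function accepts any iterable of decoded lines, so it can be fed from an HTTP response body or from a recorded transcript when testing.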

`id` (string)

Unique identifier for the chat completion.

`choices` (list of maps from strings to any)

The list of completion choices, in OpenAI format.

`object` (string)

The object type, always chat.completion.

`created` (long)

Unix timestamp (seconds) of when the completion was created.

`model` (string)

The route that served the request.

`usage` (map from strings to any)

Standard OpenAI token-usage totals for the request.

`waymark` (object)

Per-request routing and token breakdown. Reports the route taken, whether it escalated, and the input/output token counts for each upstream model that ran. Token counts only — no pricing or cost.
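A small sketch of reading the `waymark` object, using the values from the sample response on this page. The field names (`route`, `escalated`, `models`, `input_tokens`, `output_tokens`) are taken from that sample; the helper `summarize_waymark` is hypothetical.

```python
def summarize_waymark(response: dict) -> dict:
    """Total the per-model token counts reported in the waymark object."""
    waymark = response.get("waymark") or {}
    models = waymark.get("models") or []
    return {
        "route": waymark.get("route"),
        "escalated": bool(waymark.get("escalated")),
        "input_tokens": sum(m.get("input_tokens", 0) for m in models),
        "output_tokens": sum(m.get("output_tokens", 0) for m in models),
        "models": [m.get("model") for m in models],
    }


# Values copied from the sample response.
sample = {
    "usage": {"prompt_tokens": 4500, "completion_tokens": 1600, "total_tokens": 6100},
    "waymark": {
        "escalated": True,
        "models": [
            {"model": "zai-org/GLM-5.2", "input_tokens": 3000, "output_tokens": 1200},
            {"model": "gpt-oss-120b", "input_tokens": 1500, "output_tokens": 400},
        ],
        "route": "waymark-moa",
    },
}

summary = summarize_waymark(sample)
print(summary["input_tokens"], summary["output_tokens"])  # 4500 1600
```

In the sample, the per-model sums equal `usage.prompt_tokens` and `usage.completion_tokens`. The page does not state that this holds for every response, so it is safer not to rely on it.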

Errors

400 Bad Request Error

401 Unauthorized Error

402 Payment Required Error

403 Forbidden Error

429 Too Many Requests Error

500 Internal Server Error

502 Bad Gateway Error

503 Service Unavailable Error
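A client might map these status codes to a retry decision along the following lines. The code-to-name table is taken from the list above; which codes count as retryable is an assumption made for this sketch, not something this page specifies, and `describe_error` is a hypothetical helper.

```python
from typing import Tuple

ERROR_NAMES = {
    400: "Bad Request Error",
    401: "Unauthorized Error",
    402: "Payment Required Error",
    403: "Forbidden Error",
    429: "Too Many Requests Error",
    500: "Internal Server Error",
    502: "Bad Gateway Error",
    503: "Service Unavailable Error",
}

# Assumption: rate-limit and server-side failures may succeed on a later
# attempt, while the remaining 4xx codes need a change to the request,
# the credentials, or the plan.
RETRYABLE = {429, 500, 502, 503}


def describe_error(status: int) -> Tuple[str, bool]:
    """Return (error name, whether a retry is plausibly worthwhile)."""
    return ERROR_NAMES.get(status, "Unknown Error"), status in RETRYABLE
```

For example, `describe_error(429)` returns `("Too Many Requests Error", True)`, at which point the rate-limit headers indicate how long to wait before retrying.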
