Create Message | Speechify API

curl -X POST https://api.speechify.ai/v1/messages \
     -H "Authorization: Bearer <token>" \
     -H "Content-Type: application/json" \
     -d '{
  "model": "waymark-moa",
  "max_tokens": 1024,
  "messages": [
    {
      "role": "user",
      "content": "Explain the mixture-of-agents approach in two sentences."
    }
  ],
  "stream": false
}'

{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "A mixture-of-agents routes a prompt across several models and fuses their answers."
    }
  ],
  "model": "waymark-moa",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 4500,
    "output_tokens": 1600
  },
  "waymark": {
    "escalated": true,
    "models": [
      {
        "model": "zai-org/GLM-5.2",
        "input_tokens": 3000,
        "output_tokens": 1200
      },
      {
        "model": "gpt-oss-120b",
        "input_tokens": 1500,
        "output_tokens": 400
      }
    ],
    "route": "waymark-moa"
  }
}

Anthropic-compatible Messages endpoint. The gateway runs a mixture of frontier models and returns a single answer in Anthropic’s Messages shape. To use it, point the Anthropic SDK (or Claude Code, via ANTHROPIC_BASE_URL) at the base URL https://api.speechify.ai and set model to one of the waymark-* routes. Any standard Anthropic parameter (system, temperature, top_p, stop_sequences, tools, …) is forwarded to the upstream models.

Set stream: true to receive the answer as a text/event-stream of Anthropic server-sent events. The response adds a waymark object reporting which upstream models ran and their per-model token counts, and the Speechify-Route response header names the route that served the request.
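For clients that do not use the Anthropic SDK, here is a minimal sketch of the curl request above using only Python's standard library. It builds the request but does not send it. The helper name build_message_request and the SPEECHIFY_API_KEY environment variable are hypothetical. Extra keyword arguments are copied into the body unchanged, on the assumption that the gateway forwards them as described.

```python
import json
import os
import urllib.request

# Base URL inferred from the curl example; the Anthropic-style path is /v1/messages.
BASE_URL = "https://api.speechify.ai"


def build_message_request(api_key, prompt, model="waymark-moa",
                          max_tokens=1024, stream=False, **extra):
    """Build, but do not send, a POST /v1/messages request.

    Extra keyword arguments (system, temperature, top_p, stop_sequences, ...)
    are copied into the JSON body unchanged, assuming the gateway forwards
    them like any standard Anthropic parameter.
    """
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
        **extra,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # SPEECHIFY_API_KEY is a hypothetical environment variable name.
    req = build_message_request(
        os.environ.get("SPEECHIFY_API_KEY", "sk_placeholder"),
        "Explain the mixture-of-agents approach in two sentences.",
        temperature=0.2,
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it.
```

Keeping construction separate from sending makes the request easy to inspect in tests before any network call is made.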


Authentication

Authorization (Bearer)

Send your API key in the Authorization header with the Bearer prefix, e.g. ‘Bearer sk_…’.

Headers

Speechify-Version (string, optional)

Request

The request body is a JSON object with the following fields.

model (enum, required)

The route to run. waymark-fast favors latency, waymark-moa balances quality and cost, and waymark-max runs the widest panel for the highest quality. Access to the higher routes depends on your plan.

max_tokens (long, required)

The maximum number of tokens to generate before stopping. Required by the Anthropic Messages API.

messages (list of objects, required)

The conversation so far, in Anthropic message format: each object has a role (user or assistant) and content.

system (string or list of maps from strings to any, optional)

A system prompt giving the model context and instructions: a plain string, or an array of Anthropic text blocks.
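For illustration, a request body that combines a multi-turn conversation with a system prompt. In the Anthropic format the system prompt is a top-level field, not a message role, and message content may be a plain string or a list of content blocks. The helper with_system is hypothetical.

```python
import json

# A multi-turn conversation in Anthropic message format. Here roles alternate
# between "user" and "assistant", and content is either a string or a list
# of content blocks.
messages = [
    {"role": "user", "content": "What is a mixture-of-agents?"},
    {"role": "assistant", "content": "Several models answer and the answers are fused."},
    {"role": "user", "content": [{"type": "text", "text": "Give one trade-off."}]},
]


def with_system(body, system):
    """Return a copy of body with a top-level system prompt attached.

    system may be a plain string or a list of Anthropic text blocks.
    """
    if not isinstance(system, (str, list)):
        raise TypeError("system must be a string or a list of text blocks")
    return {**body, "system": system}


base = {"model": "waymark-moa", "max_tokens": 512, "messages": messages}
as_string = with_system(base, "Answer in one sentence.")
as_blocks = with_system(base, [{"type": "text", "text": "Answer in one sentence."}])
print(json.dumps(as_blocks, indent=2))
```

Both forms carry the same instruction. The block form matters when the system prompt is assembled from several pieces.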

stream (boolean, optional)

When true, the answer is streamed back as a text/event-stream of Anthropic server-sent events instead of a single JSON response. Defaults to false.

temperature (double, optional)

Amount of randomness injected into the response (0 to 1).

top_p (double, optional)

Use nucleus sampling: sample only from the smallest set of most likely tokens whose cumulative probability reaches top_p.
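The gateway forwards top_p to the upstream models, which apply it inside their own decoders. To illustrate what the parameter means, here is the standard nucleus-sampling filter applied to a toy next-token distribution. The function name and the probabilities are hypothetical. This is not how any particular upstream model implements it.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches top_p, then renormalize.

    probs maps token -> probability (assumed to sum to 1).
    """
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    kept = {}
    cumulative = 0.0
    # Walk tokens from most to least likely until the kept mass reaches top_p.
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


probs = {"the": 0.5, "a": 0.3, "an": 0.15, "this": 0.05}
print(nucleus_filter(probs, 0.7))  # keeps "the" and "a"
```

A lower top_p trims the long tail of unlikely tokens. A top_p of 1 leaves the distribution unchanged.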

stop_sequences (list of strings, optional)

Custom text sequences that will cause the model to stop generating.

Response headers

Speechify-Route (string)

The route that served the request (e.g. waymark-moa), after any in-gateway escalation. Mirrors the waymark.route field in the body.

RateLimit-Limit (integer)

Request-rate budget: the maximum number of requests allowed in the current window (the bucket capacity). This is the un-prefixed name from the IETF draft; the legacy alias X-RateLimit-Limit carries the same value. Sent on every response.

RateLimit-Remaining (integer)

Request-rate budget: requests left in the current window. Legacy alias: X-RateLimit-Remaining.

RateLimit-Reset (integer)

Request-rate budget: integer delta-seconds until the window fully refills (same unit as Retry-After). Legacy alias: X-RateLimit-Reset.
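Here is a minimal sketch for reading the rate-limit budget. It prefers the un-prefixed names and falls back to the legacy X-RateLimit-* aliases. It takes a plain dict of response headers and matches names case-insensitively, as HTTP requires. The read_rate_budget function and the RateBudget type are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateBudget:
    limit: Optional[int]
    remaining: Optional[int]
    reset_seconds: Optional[int]  # delta-seconds until the window refills


def _header_int(headers: Mapping[str, str], name: str) -> Optional[int]:
    """Look up name, then its legacy X- alias, case-insensitively."""
    lowered = {k.lower(): v for k, v in headers.items()}
    for key in (name.lower(), "x-" + name.lower()):
        if key in lowered:
            return int(lowered[key])
    return None


def read_rate_budget(headers: Mapping[str, str]) -> RateBudget:
    return RateBudget(
        limit=_header_int(headers, "RateLimit-Limit"),
        remaining=_header_int(headers, "RateLimit-Remaining"),
        reset_seconds=_header_int(headers, "RateLimit-Reset"),
    )


budget = read_rate_budget({
    "RateLimit-Limit": "60",
    "RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "12",  # only the legacy alias is present here
})
if budget.remaining == 0 and budget.reset_seconds is not None:
    print(f"budget exhausted; window refills in {budget.reset_seconds}s")
```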

Response

The message. A single JSON object by default; when the request set stream: true, a text/event-stream of Anthropic server-sent events whose message_delta frame carries the waymark usage object.
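Here is a minimal sketch of consuming the streamed form. It parses server-sent event lines, concatenates the text_delta fragments, and captures stop_reason and the waymark object from the message_delta frame. The event names follow Anthropic's streaming format. The page does not say where waymark sits inside message_delta, so this code *assumes* it is a top-level key of that frame's data payload. The function names and the sample stream are hypothetical.

```python
import json


def _field_value(line, prefix):
    # Per the SSE format, one space after the colon is dropped.
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines):
    """Yield (event, data) pairs from SSE text lines; a blank line ends an event."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if data:
                yield event, json.loads("\n".join(data))
            event, data = None, []
        elif line.startswith("event:"):
            event = _field_value(line, "event:")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data:"))
    if data:  # stream ended without a trailing blank line
        yield event, json.loads("\n".join(data))


def collect_stream(lines):
    """Return (text, stop_reason, waymark) from an Anthropic-style event stream.

    Assumption: waymark is a top-level key of the message_delta payload.
    """
    text, waymark, stop_reason = [], None, None
    for event, payload in parse_sse(lines):
        if event == "content_block_delta" and payload["delta"].get("type") == "text_delta":
            text.append(payload["delta"]["text"])
        elif event == "message_delta":
            stop_reason = payload.get("delta", {}).get("stop_reason")
            waymark = payload.get("waymark", waymark)
    return "".join(text), stop_reason, waymark


sample = """event: message_start
data: {"type": "message_start", "message": {"id": "msg_abc123"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hello"}}

event: content_block_delta
data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": " world"}}

event: message_delta
data: {"type": "message_delta", "delta": {"stop_reason": "end_turn"}, "waymark": {"route": "waymark-moa", "escalated": false, "models": []}}

event: message_stop
data: {"type": "message_stop"}
"""
print(collect_stream(sample.splitlines()))
```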

id (string)

Unique identifier for the message.

type (string)

The object type, always message.

role (string)

The conversational role of the generated message, always assistant.

content (list of maps from strings to any)

The generated content blocks, in Anthropic format.
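Because content is a list of blocks rather than a single string, clients usually join the text blocks themselves. This is a minimal sketch. message_text is a hypothetical helper, and the tool_use block in the sample is an assumed Anthropic-style block of the kind that can appear when tools are passed.

```python
def message_text(message):
    """Join the text of all "text" content blocks in a Messages response.

    Non-text blocks (for example tool_use blocks) are skipped.
    """
    return "".join(
        block.get("text", "")
        for block in message.get("content", [])
        if block.get("type") == "text"
    )


response = {
    "type": "message",
    "role": "assistant",
    "content": [
        {"type": "text", "text": "A mixture-of-agents routes a prompt across several models "},
        {"type": "tool_use", "id": "toolu_1", "name": "lookup", "input": {}},
        {"type": "text", "text": "and fuses their answers."},
    ],
}
print(message_text(response))
```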

model (string)

The route that served the request.

stop_reason (string or null)

The reason generation stopped (e.g. end_turn, max_tokens, stop_sequence); null while a streamed message is still in flight.

stop_sequence (string or null)

The custom stop sequence that was generated, if any; otherwise null.
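Here is a hypothetical client-side helper (check_stop is not part of the API) that turns stop_reason and stop_sequence into a short diagnosis. It covers the example values listed above and the null case for a streamed message still in flight. Other stop_reason values fall through to the generic branch.

```python
def check_stop(message):
    """Classify why generation ended, from stop_reason and stop_sequence."""
    reason = message.get("stop_reason")
    if reason is None:
        return "in progress"  # a streamed message still in flight
    if reason == "max_tokens":
        return "truncated at max_tokens"
    if reason == "stop_sequence":
        # stop_sequence names which custom sequence was generated
        return f"stopped at custom sequence {message.get('stop_sequence')!r}"
    return f"finished ({reason})"


print(check_stop({"stop_reason": "end_turn", "stop_sequence": None}))
```

Checking for max_tokens is the most useful case in practice, because a truncated answer otherwise looks like a complete one.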

usage (object)

Anthropic token-usage totals for the request.

waymark (object)

Per-request routing and token breakdown: the route taken (route), whether the request escalated (escalated), and the input and output token counts for each upstream model that ran (models). It reports token counts only, with no pricing or cost.
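In the example response above, usage equals the sum of the per-model counts (4500 = 3000 + 1500 input, 1600 = 1200 + 400 output). The page does not state that this always holds, so the sketch below reports whether the totals match instead of assuming it. It also checks that the Speechify-Route header agrees with waymark.route. summarize_waymark is a hypothetical helper.

```python
def summarize_waymark(body, headers=None):
    """Summarize the waymark block of a Messages response.

    Sums per-model tokens, flags whether they equal the top-level usage,
    and (if headers are given) whether Speechify-Route matches waymark.route.
    """
    waymark = body["waymark"]
    per_model_in = sum(m["input_tokens"] for m in waymark["models"])
    per_model_out = sum(m["output_tokens"] for m in waymark["models"])
    usage = body.get("usage", {})
    summary = {
        "route": waymark["route"],
        "escalated": waymark["escalated"],
        "models": [m["model"] for m in waymark["models"]],
        "input_tokens": per_model_in,
        "output_tokens": per_model_out,
        "matches_usage": (per_model_in, per_model_out)
        == (usage.get("input_tokens"), usage.get("output_tokens")),
    }
    if headers is not None:
        lowered = {k.lower(): v for k, v in headers.items()}
        summary["header_matches"] = lowered.get("speechify-route") == waymark["route"]
    return summary


# Numbers copied from the example response at the top of this page.
example = {
    "usage": {"input_tokens": 4500, "output_tokens": 1600},
    "waymark": {
        "escalated": True,
        "models": [
            {"model": "zai-org/GLM-5.2", "input_tokens": 3000, "output_tokens": 1200},
            {"model": "gpt-oss-120b", "input_tokens": 1500, "output_tokens": 400},
        ],
        "route": "waymark-moa",
    },
}
print(summarize_waymark(example, {"Speechify-Route": "waymark-moa"}))
```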

Errors

400 Bad Request Error

401 Unauthorized Error

402 Payment Required Error

403 Forbidden Error

429 Too Many Requests Error

500 Internal Server Error

502 Bad Gateway Error

503 Service Unavailable Error
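The list above gives status codes but no retry guidance. The sketch below *assumes* that 429 and the 5xx gateway errors are transient and worth retrying, while 400, 401, 402 and 403 point to a problem with the request, key, or plan and are returned immediately. The transport is a caller-supplied send function, so nothing here depends on a particular HTTP library. send_with_retry and its parameters are hypothetical.

```python
import random
import time

# Assumption: these statuses are transient; everything else is returned as-is.
RETRYABLE = {429, 500, 502, 503}


def send_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send returns (status, headers, body). On 429 the wait honors Retry-After
    or RateLimit-Reset (assumed to be delta-seconds); otherwise it uses
    exponential backoff with jitter.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status not in RETRYABLE or attempt == max_attempts:
            return status, headers, body
        lowered = {k.lower(): v for k, v in headers.items()}
        hint = lowered.get("retry-after") or lowered.get("ratelimit-reset")
        if status == 429 and hint is not None:
            delay = float(hint)
        else:
            # Backoff doubles each attempt, scaled by a jitter factor in [0.5, 1).
            delay = base_delay * 2 ** (attempt - 1) * (0.5 + random.random() / 2)
        sleep(delay)
```

Passing sleep as a parameter lets tests record the waits instead of actually pausing.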