Create Message

Beta
Anthropic-compatible Messages endpoint. The gateway runs a mixture of frontier models and returns a single answer in Anthropic's Messages shape: point the Anthropic SDK (or Claude Code via `ANTHROPIC_BASE_URL`) at this base URL and set `model` to one of the `waymark-*` routes. Any standard Anthropic parameter (`system`, `temperature`, `top_p`, `stop_sequences`, `tools`, …) is forwarded. Set `stream: true` to receive the answer as a `text/event-stream` of Anthropic server-sent events. The response adds a `waymark` object reporting which upstream models ran and their per-model token counts, and the `Speechify-Route` response header names the route that served the request.

Authentication

AuthorizationBearer

Enter your API key with the Bearer prefix, e.g. ‘Bearer sk_…’.

Headers

Speechify-VersionstringOptional

Request

This endpoint expects an object.
modelenumRequired

The route to run. waymark-fast favors latency, waymark-moa balances quality and cost, and waymark-max runs the widest panel for the highest quality. Access to the higher routes depends on your plan.

max_tokenslongRequired
The maximum number of tokens to generate before stopping. Required by the Anthropic Messages API.
messageslist of objectsRequired
The conversation so far, in Anthropic message format.
systemstring or list of maps from strings to anyOptional

A system prompt giving the model context and instructions: a plain string, or an array of Anthropic text blocks.

streambooleanOptional

When true, the answer is streamed back as a text/event-stream of Anthropic server-sent events instead of a single JSON response. Defaults to false.

temperaturedoubleOptional

Amount of randomness injected into the response (0 to 1).

top_pdoubleOptional
Use nucleus sampling over the given cumulative probability.
stop_sequenceslist of stringsOptional
Custom text sequences that will cause the model to stop generating.

Response headers

Speechify-Routestring

The route that served a chat completion (e.g. waymark-moa), after any in-gateway escalation. Mirrors the waymark.route field in the body.

RateLimit-Limitinteger

Request-rate budget: the maximum number of requests in the current window (the bucket capacity). The IETF-draft un-prefixed name; the legacy alias X-RateLimit-Limit carries the same value. Rides every response.

RateLimit-Remaininginteger

Request-rate budget: requests left in the current window. Legacy alias: X-RateLimit-Remaining.

RateLimit-Resetinteger

Request-rate budget: integer delta-seconds until the window fully refills (same unit as Retry-After). Legacy alias: X-RateLimit-Reset.

Response

The message. A single JSON object by default; when the request set stream: true, a text/event-stream of Anthropic server-sent events whose message_delta frame carries the waymark usage object.

idstring
Unique identifier for the message.
typestring

The object type, always message.

rolestring

The conversational role of the generated message, always assistant.

contentlist of maps from strings to any
The generated content blocks, in Anthropic format.
modelstring
The route that served the request.
stop_reasonstring or null

The reason generation stopped (e.g. end_turn, max_tokens, stop_sequence); null while a streamed message is still in flight.

stop_sequencestring or null

The custom stop sequence that was generated, if any; otherwise null.

usageobject

Anthropic token-usage totals for the request.

waymarkobject

Per-request routing and token breakdown. Reports the route taken, whether it escalated, and the input/output token counts for each upstream model that ran. Token counts only — no pricing or cost.

Errors

400
Bad Request Error
401
Unauthorized Error
402
Payment Required Error
403
Forbidden Error
429
Too Many Requests Error
500
Internal Server Error
502
Bad Gateway Error
503
Service Unavailable Error