Create Chat Completion

Beta
OpenAI-compatible chat completions. The gateway runs a mixture of frontier models and returns a single answer, so the request and response follow the OpenAI chat-completions shape: point the OpenAI SDK at this base URL and set `model` to one of the `waymark-*` routes. Any standard OpenAI parameter (`temperature`, `max_tokens`, `tools`, …) is forwarded. Set `stream: true` to receive the answer as a `text/event-stream` of server-sent events. The response adds a `waymark` object reporting which upstream models ran and their per-model token counts, and the `Speechify-Route` response header names the route that served the request.

Authentication

AuthorizationBearer

Enter your API key with the Bearer prefix, e.g. ‘Bearer sk_…’.

Headers

Speechify-VersionstringOptional

Request

This endpoint expects an object.
modelenumRequired

The route to run. waymark-fast favors latency, waymark-moa balances quality and cost, and waymark-max runs the widest panel for the highest quality. Access to the higher routes depends on your plan.

messageslist of objectsRequired

The conversation so far, in OpenAI chat-message format.

streambooleanOptional

When true, the answer is streamed back as a text/event-stream of server-sent events instead of a single JSON response. Defaults to false.

Response headers

Speechify-Routestring

The route that served a chat completion (e.g. waymark-moa), after any in-gateway escalation. Mirrors the waymark.route field in the body.

RateLimit-Limitinteger

Request-rate budget: the maximum number of requests in the current window (the bucket capacity). The IETF-draft un-prefixed name; the legacy alias X-RateLimit-Limit carries the same value. Rides every response.

RateLimit-Remaininginteger

Request-rate budget: requests left in the current window. Legacy alias: X-RateLimit-Remaining.

RateLimit-Resetinteger

Request-rate budget: integer delta-seconds until the window fully refills (same unit as Retry-After). Legacy alias: X-RateLimit-Reset.

Response

The chat completion. A single JSON object by default; when the request set stream: true, a text/event-stream of server-sent events whose final data chunk before [DONE] carries the waymark usage object.

idstring
Unique identifier for the chat completion.
choiceslist of maps from strings to any
The list of completion choices, in OpenAI format.
objectstring

The object type, always chat.completion.

createdlong

Unix timestamp (seconds) of when the completion was created.

modelstring
The route that served the request.
usagemap from strings to any

Standard OpenAI token-usage totals for the request.

waymarkobject

Per-request routing and token breakdown. Reports the route taken, whether it escalated, and the input/output token counts for each upstream model that ran. Token counts only — no pricing or cost.

Errors

400
Bad Request Error
401
Unauthorized Error
402
Payment Required Error
403
Forbidden Error
429
Too Many Requests Error
500
Internal Server Error
502
Bad Gateway Error
503
Service Unavailable Error