OpenAI-compatible integration

An OpenAI-compatible endpoint means existing SDKs, agent frameworks, and tools work after swapping the base URL and the key; nothing else changes. These pages cover the contract itself: the chat completions shape, SSE streaming, tool calls, and structured output.

An OpenAI-compatible endpoint means existing SDKs, agent frameworks, and tools work after swapping the base URL and the key; nothing else changes. The value is the absence of migration: the request shape, the streaming framing, and the tool-call envelope are the ones the ecosystem already speaks.

The pages below cover the contract itself, from the chat completions shape and Server-Sent Events streaming through tool calls and structured output, plus the reference material that supports building against it. The same contract is what this service exposes, so the documentation doubles as the integration guide.

This pillar collects every page on the topic. Each one below opens with the answer; follow a link for the full treatment, or use the rail to cross into a neighbouring pillar.

The OpenAI-compatible chat completions API

How the shape became the standard, and what the contract looks like. Read more.

Reference: glossary and bibliography

The vocabulary primers and paper citations that support building against the contract. Read more.

Structured output

Server-enforced response shape matching a caller-supplied JSON Schema. Read more.

Tool calls (function calling)

Model returns structured function invocations instead of free text. Read more.

OpenAI-compatible API

The drop-in /v1/chat/completions contract for swapping inference providers. Read more.

Context window

The shared token budget a request's prompt, history, tools, and answer all draw from. Read more.

Migrating from OpenAI to an OpenAI-compatible API

Swap base_url and the key, keep the SDK; audit model names, token counts, defaults, and the cold path. Read more.

Retries, timeouts, and backoff for LLM API calls

Three timeouts, a failure taxonomy decided before retrying, and jittered backoff sized to token budgets. Read more.

Streaming LLM responses over SSE

The wire format, the parsing rules that survive production, and the middleboxes that buffer streams back into batch. Read more.

Structured output and tool calling in practice

Declare the shape and let the server enforce it; the failures that remain are schema design and truncation. Read more.

Every page in this pillar describes the system running behind one endpoint. Point an OpenAI SDK at spotinference when ready: swap the base URL and the key, and the first request answers from the same fleet these pages measure.