An OpenAI-compatible API is an HTTP interface that reproduces OpenAI's request and response contract, principally POST /v1/chat/completions, so SDKs, agent frameworks, and tools built against OpenAI work unchanged once the base URL and key are swapped. It has become the de-facto interoperability contract of the open-weight inference ecosystem.

OpenAI-compatible API

OpenAI's HTTP API became the lingua franca of LLM serving the same way S3's did for object storage: enough clients hardcoded its shapes that every alternative server found it cheaper to speak the dialect than to teach the world a new one.

The contract surface

Compatibility means more than the URL path. The full surface includes the chat-completions request schema, server-sent-event streaming with delta chunks and a terminal [DONE] frame, the tools and tool_calls function-calling shapes, structured-output response formats, usage accounting fields, and the error envelope. Client SDKs assume all of it silently, so a server is only as compatible as its least faithful corner.

What a gateway may add

A proxy in front of the engine can add authentication, routing across tiers, wake-on-demand, and per-request usage logging, but it must not add perceptible latency. spotinference's gateway budgets under 5 milliseconds of added P95 latency over a direct vLLM call; the practical effect is that pointing an existing OpenAI SDK at the endpoint changes the bill, not the behaviour.

For the concrete contract this site serves, see API: chat completions and streaming.

Cost and reliability implications

A compatibility layer earns its place by staying out of the request path: spotinference's gateway budgets under 5 milliseconds of added P95 latency over a direct vLLM call, so repointing an existing SDK costs effectively nothing per request. The reliability surface is the contract itself; error shapes, streaming framing, and tool-call syntax must match what client libraries silently assume, because every divergence surfaces as a customer-visible parsing failure.

Part of OpenAI-compatible integration on the learn hub.

See also

References

OpenAI API reference: Chat Completions. The de-facto contract: request fields, streaming chunk framing, and tool_calls response shape that compatible servers reproduce.
vLLM: OpenAI-Compatible Server documentation. The reference open-source implementation of the contract, including the documented points where compatibility is partial.

The techniques in these pages run in production behind spotinference's OpenAI-compatible endpoint. Get a key and try it: swap the base URL and the key in an existing SDK, and the first request streams back tokens.