OpenAI-compatible API
OpenAI's HTTP API became the lingua franca of LLM serving the same way S3's did for object storage: enough clients hardcoded its shapes that every alternative server found it cheaper to speak the dialect than to teach the world a new one.
The contract surface
Compatibility means more than the URL path. The full surface includes the chat-completions request schema, server-sent-event streaming with delta chunks and a terminal [DONE] frame, the tools and tool_calls function-calling shapes, structured-output response formats, usage accounting fields, and the error envelope. Client SDKs assume all of it silently, so a server is only as compatible as its least faithful corner.
What a gateway may add
A proxy in front of the engine can add authentication, routing across tiers, wake-on-demand, and per-request usage logging, but it must not add perceptible latency. spotinference's gateway budgets under 5 milliseconds of added P95 latency over a direct vLLM call; the practical effect is that pointing an existing OpenAI SDK at the endpoint changes the bill, not the behaviour.
For the concrete contract this site serves, see API: chat completions and streaming.