What we pay, what you'd pay, and why we publish both.

Most LLM API pricing pages give you a per-token rate and skip the rest. That rate is only one input into your bill. The others (duty cycle, cold starts, cache hits, hidden reasoning tokens, output-to-input ratio) sit behind the API. You see the receipt, not the formula.

spotinference runs the inverse. We rent GPUs from Hyperstack by the minute, snapshot the billing API into a local table once per minute, and log prompt and completion tokens for every request. Every dollar we charge is within rounding of a line item on a real Hyperstack invoice. Every token we count came out of a real vLLM response.

Our realized cost per million tokens

$1.93 per million tokens.

Trailing seven days, blended across our fleet. Updated daily.

That number includes idle time. It includes cold starts. It includes the persistent disk that holds our model weights so we don't re-download seventy gigabytes every wake. It is not a best-case-at-full-utilization number. It is what we paid divided by what we served.
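As a rough illustration of "what we paid divided by what we served", here is a minimal Python sketch. It is not the SQL published at /economics/method. The `DayTotals` fields and the sample figures are hypothetical, and it assumes "served" means prompt plus completion tokens.

```python
from dataclasses import dataclass


@dataclass
class DayTotals:
    """One day's totals. Field names are illustrative, not the real schema."""
    gpu_spend_usd: float    # everything invoiced that day: busy, idle, cold starts, disk
    prompt_tokens: int
    completion_tokens: int


def realized_cost_per_million(days: list[DayTotals]) -> float:
    """Total spend over the window divided by total tokens served, per million."""
    spend = sum(d.gpu_spend_usd for d in days)
    tokens = sum(d.prompt_tokens + d.completion_tokens for d in days)
    if tokens == 0:
        raise ValueError("no tokens served in this window")
    return spend / tokens * 1_000_000


# Made-up figures for a two-day window, purely to show the arithmetic.
window = [
    DayTotals(gpu_spend_usd=10.0, prompt_tokens=4_000_000, completion_tokens=1_000_000),
    DayTotals(gpu_spend_usd=5.0, prompt_tokens=2_000_000, completion_tokens=500_000),
]
print(round(realized_cost_per_million(window), 2))
```

Because idle minutes add to the spend but not to the token count, they raise the result, which is why this kind of figure sits above a full-utilization estimate.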

How the hosted comparison works

For a comparable workload (a 70B-class open model with a similar input and output mix), published list prices from major hosted APIs run roughly $5 to $30 per million tokens once you blend input, output, and any reasoning tokens. Open-model serverless endpoints sit closer to $1. Our number is what we measured, not what we'd quote in a deck.
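To show what "blend" means here, a small Python sketch under stated assumptions: list prices are quoted per million tokens, hidden reasoning tokens are billed at the output rate, and the blended figure is expressed per million visible (input plus output) tokens. The prices and token counts below are hypothetical, not any provider's actual rates.

```python
def blended_price_per_million(
    input_price: float,       # USD per million input tokens (hypothetical list price)
    output_price: float,      # USD per million output tokens (hypothetical list price)
    input_tokens: int,
    output_tokens: int,
    reasoning_tokens: int = 0,  # assumed billed at the output rate, not shown to the caller
) -> float:
    """Blended USD per million visible tokens for one request mix."""
    visible = input_tokens + output_tokens
    if visible == 0:
        raise ValueError("request mix has no visible tokens")
    # Both sides scale by 1e6, so the ratio is already per million tokens.
    billed = input_tokens * input_price + (output_tokens + reasoning_tokens) * output_price
    return billed / visible


# Hypothetical mix: 3,000 input tokens and 1,000 output tokens per request.
no_reasoning = blended_price_per_million(3.0, 15.0, 3000, 1000)
with_reasoning = blended_price_per_million(3.0, 15.0, 3000, 1000, reasoning_tokens=1000)
print(no_reasoning, with_reasoning)
```

The same list prices give a noticeably higher blended rate once reasoning tokens are billed, which is one reason a single per-token rate understates the bill.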

What this doesn't claim

The number assumes our duty cycle. If your traffic is burstier or sparser than ours, your realized cost will differ.
It excludes our own time. A solo operator's hours don't show up on the Hyperstack invoice.
Hosted APIs are not lying. Their per-token rates have been falling roughly tenfold per year at fixed quality. If a frontier lab cuts its premium tier in half tomorrow, our advantage shrinks. Watch this page; we'll update.
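The duty-cycle caveat can be made concrete with a deliberately simplified Python sketch. It assumes a GPU that is billed for every hour whether busy or not and a fixed throughput while busy, and it ignores cold starts and disk. The hourly rate and throughput are hypothetical, not our fleet's figures.

```python
def cost_per_million_at_duty_cycle(
    gpu_usd_per_hour: float,         # hypothetical rental rate
    busy_tokens_per_second: float,   # hypothetical throughput while serving
    duty_cycle: float,               # fraction of billed time spent serving, in (0, 1]
) -> float:
    """USD per million tokens when the GPU is only busy part of the billed time."""
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    tokens_per_billed_hour = busy_tokens_per_second * 3600 * duty_cycle
    return gpu_usd_per_hour / tokens_per_billed_hour * 1_000_000


# Same hardware and rate, two traffic shapes.
half_busy = cost_per_million_at_duty_cycle(3.6, 1000, 0.5)
quarter_busy = cost_per_million_at_duty_cycle(3.6, 1000, 0.25)
print(round(half_busy, 2), round(quarter_busy, 2))
```

Under these assumptions, halving the duty cycle doubles the cost per token, so a workload shaped differently from ours should expect a different number.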

The honest pitch

We don't think hosted APIs are about to "rug pull" anyone. We do think a small operator running vLLM on Hyperstack with invoice-truth billing can publish a number that no hosted API will publish, and that some customers (high-volume, cost-sensitive, already built against OpenAI-shaped APIs) find that useful.

If you want to watch the number move over time, bookmark this page. The SQL behind the figure lives at /economics/method, including both queries verbatim.