spotinference

Sign in with GitHub
Pricing

Signup is not open yet. The numbers on this page are illustrative pre-launch bands, published early so the cost structure behind them is visible before the first key is minted.

Most API pricing pages publish a rate and keep the margin private. This page publishes both sides: what the tokens cost us to serve, and the bands we expect to charge. A price you can audit is a price you can plan around.

What the tokens cost us

Our cost of goods runs roughly $0.30 to $0.70 per million output tokens on an H100-class tier at good utilisation, serving 4-bit quantised weights. The band comes from the same invoice-truth accounting that drives the economics page: provider billing snapshotted once per minute, tokens counted per request.

The realised figure, blended across the fleet over the trailing seven days, is published live on the economics page, and the exact SQL behind it is at /economics/method. When utilisation is poor the realised number rises above the band; the page shows it either way.

Illustrative price bands

Two tiers are planned. Both are illustrative until signup opens; neither is a quote.

TierInput, per MtokOutput, per MtokWhat it trades
Standard$1.50 to $2.00$3.00 to $4.00Holds the bounded wake budget on the reliability page
Spotaround $1.00around $2.00Trades the latency SLA for price; requests wait longer when spot capacity is reclaimed

The spot tier is cheaper because it runs on GPU capacity the provider can reclaim; the mechanics of that trade are explained at spot vs on-demand GPU. The standard tier keeps the wake-latency budget described on the reliability page: eight minutes measured, ten minutes hard cap.

When signup opens

Sign in with GitHub will mint an API key, and the endpoint speaks the standard OpenAI shape, so existing SDKs point at it unchanged. There is no payment form on this site today, and no card is collected before launch pricing is final. Watch this page; the bands become real rates when signup opens.