spotinference

Research

Original telemetry articles built on the fleet's own measurements, followed by a short, opinionated bibliography on LLM inference serving, GPU memory management, and the production-engineering style that shapes systems in this space. Each link leads to a dedicated, more detailed page.

Measured telemetry

Articles that state their methodology, report the fleet's measured numbers, and mark every pending measurement as pending instead of estimating it.

Serving systems and the KV cache
Post-training quantisation
Decoding strategies
Engineering practice
Hardware datasheets