vLLM (UC Berkeley Sky Computing Lab, 2023)
vLLM is the open-source serving engine released alongside the PagedAttention paper (SOSP 2023). It productionised Orca-style continuous batching in the same scheduler and, as of 2026, is both the default benchmark target and the most widely deployed open-source server exposing an OpenAI-compatible chat-completions API.
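The core idea behind PagedAttention is that each request's KV cache lives in fixed-size blocks that need not be contiguous, with a per-request block table mapping logical positions to physical blocks. The toy sketch below illustrates only that bookkeeping. The names `BlockAllocator`, `Sequence`, and `BLOCK_SIZE` are hypothetical and chosen for illustration; this is not vLLM's actual code or API.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV block; illustrative value, not a vLLM default claim


@dataclass
class BlockAllocator:
    """Hands out fixed-size physical KV-cache blocks from a free list."""
    num_blocks: int
    free: list = field(init=False)

    def __post_init__(self) -> None:
        self.free = list(range(self.num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("no free KV blocks")
        return self.free.pop()

    def release(self, blocks: list) -> None:
        # Finished requests return their blocks for reuse by other requests.
        self.free.extend(blocks)


@dataclass
class Sequence:
    """Tracks one request's logical-to-physical block mapping."""
    num_tokens: int = 0
    block_table: list = field(default_factory=list)

    def append_tokens(self, n: int, allocator: BlockAllocator) -> None:
        self.num_tokens += n
        needed = -(-self.num_tokens // BLOCK_SIZE)  # ceiling division
        # Allocate a new block only when the last one is full.
        while len(self.block_table) < needed:
            self.block_table.append(allocator.allocate())


allocator = BlockAllocator(num_blocks=8)
a, b = Sequence(), Sequence()
a.append_tokens(20, allocator)  # 20-token prompt needs 2 blocks
b.append_tokens(5, allocator)   # 5-token prompt needs 1 block
a.append_tokens(1, allocator)   # one decode step; 21 tokens still fit in 2 blocks
allocator.release(a.block_table)  # request a finishes, its blocks are reusable
```

Because memory is granted one block at a time, at most one partially filled block per request is wasted, which is what lets a continuous-batching scheduler admit new requests as soon as earlier ones release their blocks.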
Chunked prefill, multi-backend coverage (CUDA, ROCm, TPU), and a rich quantisation matrix (FP8, AWQ, GPTQ, compressed-tensors) make vLLM the reference implementation against which new engines are measured.
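Chunked prefill can be pictured as a per-step token budget: sequences already decoding each consume one token, and whatever budget remains is filled with slices of pending prompts, so a long prompt is prefilled over several steps instead of stalling decodes. The sketch below is a simplified model under that assumption. The function `plan_step` and its parameters are hypothetical names, and the decode-first policy shown is an illustrative choice, not a description of vLLM's scheduler internals.

```python
def plan_step(prefill_remaining: dict, num_decoding: int, token_budget: int) -> dict:
    """Return how many prompt tokens each waiting request prefills this step.

    prefill_remaining maps a request id to its unprocessed prompt tokens.
    Decoding sequences are served first, one token each (assumed policy).
    """
    budget = token_budget - num_decoding
    plan = {}
    for request_id, remaining in prefill_remaining.items():
        if budget <= 0:
            break
        take = min(remaining, budget)  # a long prompt only gets a chunk
        plan[request_id] = take
        budget -= take
    return plan


# A 1000-token prompt is cut to fit the 500 tokens left after 12 decodes.
long_prompt_plan = plan_step({"a": 1000, "b": 50}, num_decoding=12, token_budget=512)

# Short prompts fit whole and share the same step.
short_prompt_plan = plan_step({"a": 100, "b": 50}, num_decoding=12, token_budget=512)
```

In the first call request `a` receives a 500-token chunk and `b` waits for a later step; in the second, both prompts are prefilled in full within a single step.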
For the longer treatment in narrative context, see How engines work: vLLM as the reference implementation.