Orca: continuous batching (Yu et al., OSDI 2022)
This OSDI 2022 paper named and validated iteration-level scheduling and selective batching for transformer inference, reporting up to 36.9x higher throughput than the FasterTransformer baseline at the same latency on a GPT-3 175B model.
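Most production serving engines in 2026 schedule at the iteration boundary, admitting and retiring requests between forward passes, and handle attention separately from the shape-uniform operators (linear layers, layer normalization), which are batched across requests; both ideas trace back to this paper.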
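To make the scheduling idea concrete, here is a toy simulation (a sketch, not Orca's actual scheduler) that contrasts request-level batching with iteration-level scheduling. All names (`Request`, `request_level`, `iteration_level`, `max_batch`) are hypothetical. It assumes one iteration produces one token for every running request, that each request needs at least one token, and it ignores prefill cost, memory limits, and selective batching.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    rid: str
    tokens_left: int  # decode iterations still needed (stand-in for reaching EOS)


def request_level(requests, max_batch):
    """Static batching: a batch is admitted together and its results are
    returned only when the longest request in the batch has finished."""
    queue = deque(Request(r.rid, r.tokens_left) for r in requests)
    clock, finish = 0, {}
    while queue:
        batch = [queue.popleft() for _ in range(min(max_batch, len(queue)))]
        clock += max(r.tokens_left for r in batch)
        for r in batch:
            finish[r.rid] = clock  # short requests wait for the longest one
    return clock, finish


def iteration_level(requests, max_batch):
    """Iteration-level scheduling: the batch is re-formed at every
    iteration boundary, so finished requests leave and queued ones join."""
    queue = deque(Request(r.rid, r.tokens_left) for r in requests)
    running, clock, finish = [], 0, {}
    while queue or running:
        # Fill any free slots before the next iteration starts.
        while queue and len(running) < max_batch:
            running.append(queue.popleft())
        # One model iteration: every running request emits one token.
        clock += 1
        still_running = []
        for r in running:
            r.tokens_left -= 1
            if r.tokens_left == 0:
                finish[r.rid] = clock  # returned immediately
            else:
                still_running.append(r)
        running = still_running
    return clock, finish


workload = [Request("a", 2), Request("b", 8), Request("c", 3), Request("d", 1)]
print(request_level(workload, max_batch=2))
# (11, {'a': 8, 'b': 8, 'c': 11, 'd': 11})
print(iteration_level(workload, max_batch=2))
# (8, {'a': 2, 'b': 8, 'c': 5, 'd': 6})
```

On this made-up workload, the slot freed by request `a` after two iterations is reused by `c` and then `d` while `b` is still decoding, so the total drops from 11 iterations to 8 and the short requests return much earlier.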
For the longer treatment in narrative context, see How engines work: continuous batching.