Continuous batching
Continuous batching is an iteration-level scheduling discipline for autoregressive generation: a new request joins the in-flight batch as soon as its prefill completes, and a finished request frees its slot in the same step in which it emits its end-of-sequence token.
The technique removes the static-batch retirement barrier, under which no slot is released until the longest request in the batch finishes, so every request's completion time was tied to that of its longest peer. This is why production engines report multi-fold throughput gains over the lockstep baseline.
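As a rough illustration of the difference, the Python sketch below counts decode steps under both disciplines. It is a toy model, not any engine's scheduler: the function names, the fixed `slots` count, and the assumptions that prefill is instantaneous and that each request's output length is known in advance and at least 1 are all hypothetical simplifications.

```python
from collections import deque


def static_batch_steps(lengths, slots):
    """Lockstep baseline: each batch runs until its longest request finishes."""
    total = 0
    for i in range(0, len(lengths), slots):
        # Every request in the batch holds its slot for max(...) steps.
        total += max(lengths[i:i + slots])
    return total


def continuous_batch_steps(lengths, slots):
    """Iteration-level scheduling: free slots are refilled at every decode step."""
    waiting = deque(lengths)   # output lengths of requests not yet admitted
    running = []               # tokens still to emit, one entry per occupied slot
    steps = 0
    while waiting or running:
        # Admit waiting requests into free slots (prefill assumed instantaneous).
        while waiting and len(running) < slots:
            running.append(waiting.popleft())
        steps += 1
        # Each running request emits one token; a request that reaches zero
        # has emitted its end-of-sequence token and gives up its slot now.
        running = [r - 1 for r in running if r - 1 > 0]
    return steps
```

With output lengths `[2, 8, 3, 7, 1, 4]` and two slots, the static version takes 19 steps (8 + 7 + 4) and the continuous version takes 13, which in this toy case equals the lower bound of 25 total tokens spread over two slots. Real gains depend on the spread of output lengths and on prefill cost, neither of which this sketch models.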
For the longer treatment, see How engines work: continuous batching.