KV cache
The KV cache is the per-sequence store of Key and Value attention tensors retained between decode steps so each new token costs one forward pass rather than a recomputation over the entire prior context.
Cache memory, not arithmetic, sets the ceiling on how many sequences fit on a GPU concurrently, which makes KV occupancy the binding constraint on serving throughput.
For the longer treatment, see How engines work: the KV cache.