Pipeline parallelism (PP)
Pipeline parallelism splits a deep network along the layer dimension into sequential stages, places each stage on a different GPU, and streams activations forward so that several micro-batches occupy the pipeline at once.
The cross-stage hop is a point-to-point send rather than a collective, which is why PP tolerates slow interconnects that tensor parallelism cannot. The cost is a fill-and-drain bubble: stages sit idle while the pipeline fills and empties, and that idle time only amortises when the number of in-flight micro-batches is large relative to the number of stages.
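To put a rough number on the bubble, the sketch below assumes a simple synchronous fill-and-drain schedule in which every stage takes the same time per micro-batch and communication is free. Under those assumptions, with p stages and m micro-batches, each stage is busy for m of m + p - 1 time slots, so the idle fraction is (p - 1) / (m + p - 1). The function name `bubble_fraction` is illustrative and does not come from any particular engine; real schedules (interleaved or 1F1B variants) can behave differently.

```python
def bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Idle fraction of a simple fill-and-drain pipeline schedule.

    Assumes equal per-stage time for every micro-batch and ignores
    communication cost; this is an idealised model, not a measurement.
    """
    if num_stages < 1 or num_microbatches < 1:
        raise ValueError("num_stages and num_microbatches must be >= 1")
    # Each stage does useful work in m slots out of m + (p - 1) total;
    # the remaining (p - 1) slots are spent waiting during fill and drain.
    total_slots = num_microbatches + num_stages - 1
    return (num_stages - 1) / total_slots


if __name__ == "__main__":
    # Hypothetical 4-stage pipeline with increasing micro-batch counts.
    for m in (1, 4, 16, 64):
        print(f"stages=4 microbatches={m:>2} bubble={bubble_fraction(4, m):.3f}")
```

In this model a 4-stage pipeline fed a single micro-batch is idle 75% of the time, while 16 micro-batches bring the idle share down to about 16%.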
For the longer treatment, see How engines work: pipeline parallelism.