Cold start
Cold start is the elapsed wall-clock time between a fresh inference process beginning to initialise and that same process serving its first request, dominated by the disk-to-VRAM transfer of multi-gigabyte model weights.
The interval upper-bounds the latency contract for any scale-to-zero deployment, which is why an LLM cold start two or three orders of magnitude larger than a CPU function cold start reshapes the operational regime around hibernation, warm pools, and persistent weight caches.
For the longer treatment, see Reliability: cold-start anatomy.