Alessandro Sangiorgi
Alessandro Sangiorgi is a Software Engineer in the Emerging Technologies Group within the Office of the CTO at Red Hat. He has extensive experience across Cloud, Distributed Systems, AI, and Networking products and technologies.
Software Engineer
Company or affiliation: Red Hat
Session
Model startup latency is a persistent bottleneck for modern inference workloads, particularly those that rely on custom Triton kernels, which are just-in-time (JIT) compiled. In this talk, we’ll present a novel approach to reducing model boot times by packaging Triton kernel caches in OCI container images.
We’ll demo a working prototype that packages Triton-generated LLVM kernels into reusable, portable container layers. These "hot start" containers can be deployed directly to Kubernetes, bypassing costly JIT compilation and significantly reducing model startup time.
Whether you're building ML infrastructure, working with OSS compilers, or deploying models at scale, this talk offers practical techniques to optimise cold starts for models that use Triton-lang.