DevConf.US 2025

From Cold Start to Warp Speed: Triton Kernel Caching with OCI Container images
2025-09-19, Ladd Room (Capacity 170)

Model startup latency is a persistent bottleneck for modern inference workloads, particularly those that use custom kernels written in Triton, which are just-in-time (JIT) compiled. In this talk, we’ll present a novel approach to speeding up model boot times by wrapping Triton kernel caches in OCI container images.
We’ll demo a working prototype that packages Triton-generated LLVM kernels into reusable, portable container layers. These "hot start" containers can be deployed directly to Kubernetes, bypassing costly JIT compilation and significantly reducing model startup time.
Whether you're building ML infrastructure, working with OSS compilers, or deploying models at scale, this talk offers practical techniques to optimise cold starts for models using Triton-lang.
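
As a rough illustration of the packaging idea described above, the sketch below archives a Triton cache directory into an uncompressed tar file and computes the sha256 digest that an OCI layer blob would be addressed by. It is not the prototype shown in the talk, whose format the abstract does not specify. It assumes the cache lives where Triton places it by default (`TRITON_CACHE_DIR` if set, otherwise `~/.triton/cache`); the function names and the `triton-cache` archive prefix are hypothetical.

```python
import hashlib
import os
import tarfile
from pathlib import Path


def resolve_cache_dir() -> Path:
    """Return TRITON_CACHE_DIR if set, otherwise Triton's default ~/.triton/cache."""
    return Path(os.environ.get("TRITON_CACHE_DIR", Path.home() / ".triton" / "cache"))


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Strip timestamps and ownership so identical caches yield identical archives."""
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def pack_cache_layer(cache_dir: Path, out_tar: Path, arcname: str = "triton-cache") -> str:
    """Write cache_dir to an uncompressed tar and return its 'sha256:<hex>' digest.

    Hypothetical helper: 'arcname' is an illustrative path prefix inside the
    archive, not a location that Triton or any OCI tool requires.
    """
    with tarfile.open(out_tar, "w") as tar:
        # tarfile adds directory entries recursively in sorted order.
        tar.add(cache_dir, arcname=arcname, filter=_normalize)
    return "sha256:" + hashlib.sha256(out_tar.read_bytes()).hexdigest()


if __name__ == "__main__":
    cache = resolve_cache_dir()
    if cache.is_dir():
        print(pack_cache_layer(cache, Path("triton-cache-layer.tar")))
    else:
        print(f"no Triton cache found at {cache}")
```

Normalizing the metadata means a rebuilt archive of an unchanged cache keeps the same digest, so a registry or node that already holds the layer would not need to fetch it again. A consuming container would then unpack or mount the layer and point `TRITON_CACHE_DIR` at it so that the kernels can be loaded from the cache rather than recompiled.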


What level of experience should the audience have to best understand your session?

Beginner - no experience needed

Maryam is a Principal Engineer on the Emerging Tech team in the Office of the CTO at Red Hat. She is currently focused on integrating Triton-Lang into the Kubernetes ecosystem to enable more efficient and scalable machine learning deployments. More recently, she contributed to the Kepler project, advancing GPU metrics collection for sustainability-aware scheduling. A long-time open source contributor and leader, Maryam has deep roots in high-performance networking—she led the effort to adapt AF_XDP for cloud-native use cases, is a maintainer for both DPDK and CNDP, integrated DPDK into Open vSwitch, and led two projects within OPNFV (VSPERF and Barometer). She has also contributed to 5G Core and Fixed-Mobile Convergence (FMC) initiatives.

Alessandro Sangiorgi is a Software Engineer in Red Hat’s Emerging Technologies (Office of the CTO), building GPU-kernel tooling and AI performance infrastructure, including Model Cache Manager and related utilities.
In his free time, he also leads Sangiorgi SRL, a small software company based in Italy whose products, led by WiFi WPS WPA Tester, have surpassed 160M downloads, placing the company among Italy’s top publishers.
He holds M.S. degrees in Computer Science (USA) and Computer Engineering (Italy), with publications on securing 802.11 networks using eBPF/XDP.