DevConf.US 2025

Alessandro Sangiorgi

Alessandro Sangiorgi is a Software Engineer in Red Hat’s Emerging Technologies (Office of the CTO), building GPU-kernel tooling and AI performance infrastructure, including Model Cache Manager and related utilities.
In his free time, he also leads Sangiorgi SRL, a small software company based in Italy whose products, led by WiFi WPS WPA Tester, have surpassed 160M downloads, placing it among Italy’s top publishers.
He holds M.S. degrees in Computer Science (USA) and Computer Engineering (Italy), and has published on securing 802.11 networks using eBPF/XDP.


Job title

Software Engineer

Company or affiliation

Red Hat


Session

09-19
13:40
35min
From Cold Start to Warp Speed: Triton Kernel Caching with OCI Container Images
Maryam Tahhan, Alessandro Sangiorgi

Model startup latency is a persistent bottleneck for modern inference workloads, particularly those that use custom kernels written in Triton, which are just-in-time (JIT) compiled. In this talk, we’ll present a novel approach to speeding up model boot times by wrapping Triton kernel caches in OCI container images.
We’ll demo a working prototype that packages Triton-generated LLVM kernels into reusable, portable container layers. These "hot start" containers can be deployed directly to Kubernetes, bypassing costly JIT compilation and significantly reducing model startup time.
Whether you're building ML infrastructure, working with OSS compilers, or deploying models at scale, this talk offers practical techniques to optimise cold starts for models using Triton-lang.

Artificial Intelligence and Data Science
Ladd Room (Capacity 170)