DevConf.US 2025

Maryam Tahhan

Maryam is a Principal Engineer on the Emerging Tech team in the Office of the CTO at Red Hat. She is currently focused on integrating Triton-Lang into the Kubernetes ecosystem to enable more efficient and scalable machine learning deployments. More recently, she contributed to the Kepler project, advancing GPU metrics collection for sustainability-aware scheduling. A long-time open source contributor and leader, Maryam has deep roots in high-performance networking—she led the effort to adapt AF_XDP for cloud-native use cases, is a maintainer for both DPDK and CNDP, integrated DPDK into Open vSwitch, and led two projects within OPNFV (VSPERF and Barometer). She has also contributed to 5G Core and Fixed-Mobile Convergence (FMC) initiatives.


Job title

Principal Software Engineer

Company or affiliation

Red Hat


Session

09-19
13:40
35min
From Cold Start to Warp Speed: Triton Kernel Caching with OCI Container Images
Maryam Tahhan, Alessandro Sangiorgi

Model startup latency is a persistent bottleneck for modern inference workloads, particularly when using custom kernels written in Triton that are just-in-time (JIT) compiled. In this talk, we’ll present a novel approach to speeding up model boot times by wrapping Triton kernel caches in OCI container images.
We’ll demo a working prototype that packages Triton-generated LLVM kernels into reusable, portable container layers. These "hot start" containers can be deployed directly to Kubernetes, bypassing costly JIT compilation and significantly reducing model startup time.
Whether you're building ML infrastructure, working with OSS compilers, or deploying models at scale, this talk offers practical techniques to optimise cold starts for models using Triton-lang.
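The idea in the abstract can be sketched as a minimal two-stage container build that pre-warms Triton's JIT cache at build time and ships the compiled artifacts as an image layer. This is an illustrative assumption of how such a pipeline could look, not the speakers' actual prototype; the base image, the `warm_kernels.py` script, and the cache path are hypothetical, though `TRITON_CACHE_DIR` is the real environment variable Triton uses to locate its kernel cache:

```dockerfile
# Illustrative sketch only -- not the speakers' prototype.
# Stage 1: warm the Triton JIT cache by invoking the kernels once at build time.
FROM nvcr.io/nvidia/pytorch:24.05-py3 AS warmup
WORKDIR /app
COPY model/ .
# Point Triton's cache at a known directory (default is ~/.triton/cache).
ENV TRITON_CACHE_DIR=/opt/triton-cache
# warm_kernels.py is a hypothetical script that calls each Triton kernel once,
# triggering JIT compilation and populating TRITON_CACHE_DIR.
RUN python warm_kernels.py

# Stage 2: ship the precompiled kernels as a reusable "hot start" layer.
FROM nvcr.io/nvidia/pytorch:24.05-py3
COPY --from=warmup /opt/triton-cache /opt/triton-cache
ENV TRITON_CACHE_DIR=/opt/triton-cache
COPY model/ /app
WORKDIR /app
CMD ["python", "serve.py"]
```

At runtime, Triton finds matching entries in the mounted cache and skips JIT compilation, assuming the cache keys still match (same GPU architecture, Triton version, and kernel configurations as the build machine).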

Artificial Intelligence and Data Science
Ladd Room (Capacity 96)