Ayush Satyam
Associate Machine Learning Engineer at Red Hat. Active contributor to PyTorch and vLLM, and a maintainer of projects such as DiceDB.
Session
Mixture of Experts (MoE) architectures trade dense computation for conditional sparsity, activating only a subset of experts per input token. But this sparsity doesn't come for free: dynamic routing decisions, irregular memory access, and excessive kernel launches can quietly undermine performance. This talk covers optimization strategies for MoE inference using PyTorch 2.x's compilation stack alongside vLLM's serving framework. We will walk through profiler traces illustrating four key optimization areas: kernel fusion, FX graph optimizations, memory layout optimization, and dynamic shape specialization for variable batch sizes. We will then discuss how to extract insights from profiler data: mapping kernel timelines to specific fusion passes, distinguishing memory-bound from compute-bound expert execution, and validating that compiled MoE forward passes retain batch-size flexibility without guard-induced recompilation.
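To make the workflow concrete, here is a minimal, hypothetical sketch (not the talk's actual code): a toy top-2 MoE layer compiled with `torch.compile`, with the batch dimension marked dynamic via `torch._dynamo.mark_dynamic` so varying batch sizes do not trigger guard-induced recompilation, and a `torch.profiler` trace to inspect the resulting kernel timeline. The `ToyMoE` module and its dimensions are illustrative assumptions; `torch.compile`, `torch._dynamo.mark_dynamic`, and `torch.profiler` are standard PyTorch 2.x APIs.

```python
# Hypothetical sketch: toy MoE layer + torch.compile + dynamic batch + profiling.
import torch
import torch.nn as nn


class ToyMoE(nn.Module):
    """Toy top-2 MoE block: a router selects 2 of `num_experts` FFN experts per token."""

    def __init__(self, dim=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: [tokens, dim]
        logits = self.router(x)
        weights, idx = torch.topk(logits.softmax(dim=-1), self.top_k, dim=-1)
        out = torch.zeros_like(x)
        # Naive per-expert dispatch loop: exactly the kind of irregular,
        # launch-heavy code that profiling and fusion passes aim to improve.
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot, None] * expert(x[token_ids])
        return out


moe = ToyMoE()
compiled = torch.compile(moe)

x = torch.randn(64, 256)
torch._dynamo.mark_dynamic(x, 0)  # treat the batch dim as dynamic: no recompile per size
compiled(x)  # warm-up call triggers compilation

with torch.profiler.profile(activities=[torch.profiler.ProfilerActivity.CPU]) as prof:
    for bs in (16, 48, 128):  # varying batch sizes should reuse the compiled artifact
        compiled(torch.randn(bs, 256))
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```

The profiler table printed at the end is the starting point for the analysis described above: checking which kernels were fused, whether expert execution is memory-bound or compute-bound, and confirming that the three batch sizes did not each force a recompilation.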