DevConf.CZ 2026

Parshant Sharma

Associate Machine Learning Engineer, PyTorch Engineering Team, Red Hat

Parshant is an Associate ML Engineer at Red Hat and a Gold Medalist of his Master's in Computer Science and Engineering with an AI/ML specialization. He has authored four Scopus-indexed research papers in AI/ML. At Red Hat, he contributes to upstream open-source projects such as PyTorch. He also has hands-on experience with AI compilers and works with open-source compiler frameworks like LLVM and MLIR, bridging ML workloads with systems-level optimization.


Company or affiliation:

Red Hat

Job title:

Associate Machine Learning Engineer


Session

06-19
15:30
35min
Inside MoE Optimization: A Profiler-Guided Tour of torch.compile and vLLM
Parshant Sharma, Ayush Satyam

Mixture of Experts (MoE) architectures trade dense computation for conditional sparsity, activating only a subset of experts per input token. But this sparsity doesn't come for free: dynamic routing decisions, irregular memory access, and excessive kernel launches can quietly undermine performance. This talk covers optimization strategies for MoE inference using PyTorch 2.x's compilation stack alongside vLLM's serving framework. We will walk through PyTorch Profiler traces to illustrate four key areas of optimization: kernel fusion, FX graph optimizations, memory layout optimization, and dynamic shape specialization for variable batch sizes. We will then discuss how to extract insights from profiler data: mapping kernel timelines to specific fusion passes, identifying memory-bound vs. compute-bound expert execution, and validating that compiled MoE forward passes maintain batch-size flexibility without guard-induced recompilation.
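The conditional sparsity the abstract opens with comes from top-k expert routing: a router scores every expert per token, but only the k highest-scoring experts actually run. A minimal, stdlib-only sketch of that routing step (all names and the example logits are hypothetical, for illustration only; real MoE layers do this with batched tensor ops):

```python
import math

def softmax(xs):
    # Numerically stable softmax over one token's router logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(router_logits, k=2):
    """For each token, keep only the top-k experts by router probability
    and renormalize their gate weights. Only these k experts execute for
    that token -- the rest of the expert FFNs are skipped entirely."""
    routed = []
    for logits in router_logits:
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
        norm = sum(probs[i] for i in top)
        routed.append([(i, probs[i] / norm) for i in top])
    return routed

# Hypothetical router output: 3 tokens, 4 experts.
logits = [
    [2.0, 0.1, 1.5, -1.0],
    [0.0, 3.0, 0.5, 0.2],
    [1.0, 1.0, 1.0, 5.0],
]
for token, experts in enumerate(route_top_k(logits, k=2)):
    print(f"token {token}: experts {experts}")
```

Because each token may pick a different expert set, the per-expert batch sizes vary from step to step; that data-dependent dispatch is exactly what produces the irregular memory access and kernel-launch overhead the talk's profiler traces examine.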

Artificial Intelligence and Data Science
D105 (capacity 300)