Parshant Sharma
Associate Machine Learning Engineer, PyTorch Engineering Team, Red Hat
Parshant is an Associate Machine Learning Engineer at Red Hat and a Gold Medalist in his Master's in Computer Science and Engineering with an AI/ML specialization. He has authored four SCOPUS-indexed research papers in AI/ML. At Red Hat, he contributes to upstream open-source projects such as PyTorch. He also has hands-on experience with AI compilers, working with open-source compiler frameworks such as LLVM and MLIR to bridge ML workloads with systems-level optimization.
Session
Mixture of Experts (MoE) architectures trade dense computation for conditional sparsity, activating only a subset of experts per input token. But this sparsity doesn't come for free: dynamic routing decisions, irregular memory access, and excessive kernel launches can quietly undermine performance. This talk covers optimization strategies for MoE inference using PyTorch 2.x's compilation stack alongside vLLM's serving framework. We will walk through PyTorch Profiler traces to illustrate four key areas for optimization: kernel fusions, FX graph optimizations, memory layout optimization, and dynamic shape specialization for variable batch sizes. We will then discuss how to extract insights from profiler data: mapping kernel timelines to specific fusion passes, identifying memory-bound vs. compute-bound expert execution, and validating that compiled MoE forward passes maintain batch-size flexibility without guard-induced recompilation.
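A minimal sketch of the kind of workflow the abstract describes: compiling a toy MoE layer with dynamic shapes and profiling it across two batch sizes. The `ToyMoE` module, its dimensions, and the soft-routing simplification are illustrative assumptions, not code from the talk; real sparse top-k dispatch is exactly where the graph breaks and recompilations discussed above tend to appear.

```python
import torch
import torch.nn as nn


class ToyMoE(nn.Module):
    """Hypothetical toy MoE layer used only to illustrate profiling a compiled model."""

    def __init__(self, dim=256, num_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_experts)])

    def forward(self, x):
        # Soft routing keeps the traced graph free of data-dependent control flow;
        # sparse top-k dispatch would introduce the irregularities the talk covers.
        weights = self.router(x).softmax(dim=-1)                  # (batch, num_experts)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)  # (batch, dim, num_experts)
        return (outs * weights.unsqueeze(1)).sum(dim=-1)


model = ToyMoE()
compiled = torch.compile(model, dynamic=True)  # request batch-size-agnostic compilation

# Profile two different batch sizes; with dynamic=True the second call should reuse
# the compiled graph rather than trigger a guard-induced recompilation.
with torch.profiler.profile(activities=[torch.profiler.ProfilerActivity.CPU]) as prof:
    for bs in (8, 32):
        compiled(torch.randn(bs, 256))

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```

The printed kernel summary is the starting point for the kind of analysis described above: attributing time to fused vs. unfused kernels and checking whether changing the batch size produced new compilation work.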