BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.devconf.info//devconf-cz-2026//talk//CSGDXU
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-devconf-cz-2026-CSGDXU@pretalx.devconf.info
DTSTART;TZID=CET:20260619T153000
DTEND;TZID=CET:20260619T160500
DESCRIPTION:Mixture of Experts (MoE) architectures trade dense computation
  for conditional sparsity\, activating only a subset of experts per input
  token. But this sparsity doesn't come for free: dynamic routing
  decisions\, irregular memory access\, and excessive kernel launches can
  quietly undermine performance. This talk covers optimization strategies
  for MoE inference using PyTorch 2.x's compilation stack alongside vLLM's
  serving framework. We will show PyTorch Profiler traces to illustrate
  four key areas for optimization: kernel fusions\, FX graph
  optimizations\, memory layout optimization\, and dynamic shape
  specialization for variable batch sizes. After that\, we will discuss how
  to extract insights from profiler data: mapping kernel timelines to
  specific fusion passes\, identifying memory-bound vs. compute-bound
  expert execution\, and validating that compiled MoE forward passes
  maintain batch size flexibility without guard-induced recompilation.
DTSTAMP:20260430T124913Z
LOCATION:D105 (capacity 300)
SUMMARY:Inside MoE Optimization: A Profiler-Guided Tour of torch.compile an
 d vLLM - Parshant Sharma\, Ayush Satyam
URL:https://pretalx.devconf.info/devconf-cz-2026/talk/CSGDXU/
END:VEVENT
END:VCALENDAR
