BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.devconf.info//devconf-us-2025//talk//7XAFY8
BEGIN:VTIMEZONE
TZID:America/New_York
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10;UNTIL=20061029T070000Z
TZNAME:EST
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
END:STANDARD
BEGIN:STANDARD
DTSTART:20071104T030000
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:EST
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000402T030000
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=4;UNTIL=20060402T080000Z
TZNAME:EDT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20070311T030000
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:EDT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-devconf-us-2025-7XAFY8@pretalx.devconf.info
DTSTART;TZID=America/New_York:20250920T145000
DTEND;TZID=America/New_York:20250920T152500
DESCRIPTION:Modern LLM applications demand reliable\, reproducible performa
 nce numbers that reflect real-world serving conditions. This tutorial-styl
 e presentation walks attendees through every step required to collect mean
 ingful inference benchmarks on consumer or datacenter NVIDIA GPUs using an
  entirely open-source stack on Fedora. Beginning with enabling RPM Fusion 
 and installing the akmod-nvidia driver\, we show how to validate hardware 
 visibility with nvidia-smi\, then layer Podman 5.x and the NVIDIA Containe
 r Toolkit’s Container Device Interface to obtain rootless GPU access. We
  next demonstrate pulling the lightweight vLLM inference image\, mounting 
 a locally cached TinyLlama model downloaded via the Hugging Face CLI\, and
  exposing an OpenAI-compatible HTTP endpoint. Finally\, we introduce Guide
 LLM\, an automated load-generation tool that sweeps request rates\, captur
 es latency buckets\, throughput ceilings\, and token-per-second statistics
 \, and writes structured JSON for downstream analysis. Live demos illustra
 te common pitfalls and give attendees troubleshooting checklists that tran
 sfer directly to any Red Hat-derived distribution. Participants will leave
  with a turnkey recipe they can adapt to larger models and multi-GPU no
 des\, and a clear understanding of how configuration choices cascade int
 o benchmark accuracy. No prior container\, CUDA\, or benchmarking experi
 ence is assumed. Attendees also receive sample scripts and links for imm
 ediate hands-on replication.
DTSTAMP:20260315T082848Z
LOCATION:Ladd Room (Capacity 170)
SUMMARY:Learn How to Run an LLM Inference Performance Benchmark on NVIDIA G
 PUs - from soup to nuts. - Ashish Kamra\, David Gray\, Samuel Monson
URL:https://pretalx.devconf.info/devconf-us-2025/talk/7XAFY8/
END:VEVENT
END:VCALENDAR
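
The abstract walks through validating hardware visibility with nvidia-smi
and then obtaining rootless GPU access through Podman and the NVIDIA
Container Toolkit's Container Device Interface. As a companion, here is a
minimal sketch of those two checks, assuming akmod-nvidia is already
installed and a CDI spec has been generated (for example with nvidia-ctk
cdi generate); the CUDA base image named below is illustrative, not the
speakers' exact command.

# check_gpu_visibility.py: sanity-check GPU visibility on the host and
# inside a rootless Podman container via the Container Device Interface.
# Assumptions: nvidia-smi on PATH, Podman 5.x with a generated CDI spec,
# and an illustrative CUDA base image; adjust names for your setup.
import subprocess

def run(cmd):
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Host-side check: the driver should enumerate every visible GPU.
run(["nvidia-smi", "--query-gpu=name,driver_version,memory.total",
     "--format=csv"])

# Container-side check: nvidia.com/gpu=all is the CDI device name the
# toolkit generates; it grants the rootless container GPU access.
run(["podman", "run", "--rm", "--device", "nvidia.com/gpu=all",
     "docker.io/nvidia/cuda:12.4.1-base-ubi9", "nvidia-smi"])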
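
The serving step pairs a locally cached TinyLlama snapshot with vLLM's
OpenAI-compatible HTTP endpoint. A minimal Python sketch of that flow,
assuming the huggingface_hub and requests packages and a vLLM server
already listening on localhost:8000 (the abstract names the Hugging Face
CLI; snapshot_download is the equivalent library call, and the port is an
assumption):

# fetch_and_query.py: cache TinyLlama locally, then exercise the
# OpenAI-compatible /v1/completions route that vLLM exposes.
# Assumptions: pip install huggingface_hub requests, and a vLLM server
# already running on localhost:8000 serving this model.
import requests
from huggingface_hub import snapshot_download

MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Download (or reuse) the model snapshot in the local HF cache; the
# resulting directory is what you would bind-mount into the container.
local_path = snapshot_download(repo_id=MODEL_ID)
print("model cached at:", local_path)

# One completion request against the OpenAI-compatible endpoint.
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={"model": MODEL_ID,
          "prompt": "Explain GPU inference benchmarking in one sentence.",
          "max_tokens": 64},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])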
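
GuideLLM itself automates the final step; the core idea it implements
(issue requests at a series of rates, record per-request latency, and
write structured JSON for downstream analysis) can be shown with a
deliberately tiny sequential stand-in. This is not GuideLLM, and the
rates, sample counts, and endpoint below are illustrative only.

# toy_rate_sweep.py: a deliberately tiny stand-in for what GuideLLM
# automates: issue requests at several fixed rates, record latencies,
# and write the results as JSON for downstream analysis.
# Assumes the same local vLLM endpoint as above; NOT GuideLLM itself.
import json
import time

import requests

URL = "http://localhost:8000/v1/completions"
PAYLOAD = {"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
           "prompt": "Hello", "max_tokens": 32}

results = {}
for rate in (0.5, 1.0, 2.0):          # requests per second to attempt
    interval = 1.0 / rate
    latencies = []
    for _ in range(10):                # small fixed sample per rate
        start = time.perf_counter()
        requests.post(URL, json=PAYLOAD, timeout=120).raise_for_status()
        latencies.append(time.perf_counter() - start)
        # Pace the next request to approximate the target rate.
        time.sleep(max(0.0, interval - latencies[-1]))
    latencies.sort()
    results[f"{rate}rps"] = {
        "p50_s": latencies[len(latencies) // 2],
        "p95_s": latencies[int(len(latencies) * 0.95)],
        "max_s": latencies[-1],
    }

with open("sweep_results.json", "w") as f:
    json.dump(results, f, indent=2)
print(json.dumps(results, indent=2))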
