DevConf.US 2025

Samuel Monson

Samuel Monson is a Software Engineer working at Red Hat, specializing in performance testing for Large Language Models and AI platforms. His areas of interest include Linux, Kubernetes, parallel computing, and differential mathematics. He obtained bachelor's degrees in Computer Science and Mathematics, as well as a master's degree in Computer Science, all from Seattle University.


Job title

Software Performance Engineer

Company or affiliation

Red Hat


Session

09-20
14:50
35min
Learn How to Run an LLM Inference Performance Benchmark on NVIDIA GPUs - from soup to nuts.
Ashish Kamra, David Gray, Samuel Monson

Modern LLM applications demand reliable, reproducible performance numbers that reflect real-world serving conditions. This tutorial-style presentation walks attendees through every step required to collect meaningful inference benchmarks on consumer or datacenter NVIDIA GPUs using an entirely open-source stack on Fedora. Beginning with enabling RPM Fusion and installing the akmod-nvidia driver, we show how to validate hardware visibility with nvidia-smi, then layer Podman 5.x and the NVIDIA Container Toolkit’s Container Device Interface to obtain rootless GPU access. We next demonstrate pulling the lightweight vLLM inference image, mounting a locally cached TinyLlama model downloaded via the Hugging Face CLI, and exposing an OpenAI-compatible HTTP endpoint. Finally, we introduce GuideLLM, an automated load-generation tool that sweeps request rates, captures latency buckets, throughput ceilings, and tokens-per-second statistics, and writes structured JSON for downstream analysis. Live demos illustrate common pitfalls and give attendees troubleshooting checklists that transfer directly to any Red Hat-derived distribution. Participants will leave with a turnkey recipe they can adapt to larger models and multi-GPU nodes, along with a clear understanding of how configuration choices cascade into benchmark accuracy. No prior container, CUDA, or benchmarking experience is assumed. Attendees also receive sample scripts and links for immediate hands-on replication.
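
As a rough illustration of the kind of statistics the abstract mentions (latency, throughput, and tokens per second, written out as structured JSON), the following is a minimal Python sketch. It is not GuideLLM's code or its output format: the record fields `latency_s` and `output_tokens`, the helpers `percentile` and `summarize`, and the sample numbers are all hypothetical and chosen only for this example.

```python
import json
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct between 0 and 100)."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(requests, wall_clock_s):
    """Summarize hypothetical per-request records collected at one load level."""
    latencies = sorted(r["latency_s"] for r in requests)
    total_tokens = sum(r["output_tokens"] for r in requests)
    return {
        "requests": len(requests),
        "requests_per_s": len(requests) / wall_clock_s,
        "output_tokens_per_s": total_tokens / wall_clock_s,
        "latency_p50_s": percentile(latencies, 50),
        "latency_p99_s": percentile(latencies, 99),
    }


# Made-up measurements: four requests completed in a 4-second window.
sample = [
    {"latency_s": 0.8, "output_tokens": 120},
    {"latency_s": 1.0, "output_tokens": 128},
    {"latency_s": 1.2, "output_tokens": 130},
    {"latency_s": 2.5, "output_tokens": 122},
]

summary = summarize(sample, wall_clock_s=4.0)
print(json.dumps(summary, indent=2))
```

Running one such summary per request rate and comparing the results is one simple way to see where throughput stops scaling and tail latency starts to climb.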

Artificial Intelligence and Data Science
Ladd Room (Capacity 96)