Rehan Samaratunga
At Red Hat, I was a software engineering intern on the Performance and Scale for AI Platforms (PSAP) team.
Student
Company or affiliation – Boston University
Session
09-19
11:20
10 min
Auto-tuning vLLM
Rehan Samaratunga
My auto-tuning project aims to find the best settings for serving large language models with vLLM. The goal is to maximize throughput (output tokens per second) while controlling latency: specifically, a configuration is only accepted if its p95 latency beats the baseline measured with the default parameters. This involves testing different parameter configurations for supported models such as Qwen3-32B-FP8 and Qwen3-30B-A3B-FP8.
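The selection rule described in the abstract (maximize throughput subject to a p95 latency constraint) can be sketched in a few lines of Python. All configuration names and measurements below are invented for illustration; the real project would obtain them from benchmark runs against a vLLM server.

```python
def best_config(results, baseline_p95_ms):
    """Pick the highest-throughput configuration whose p95 latency
    beats the default-parameter baseline.

    results: list of dicts with keys 'name', 'throughput_tps', 'p95_ms'.
    Returns the winning dict, or None if nothing beats the baseline.
    """
    # Keep only configurations faster (at p95) than the baseline.
    feasible = [r for r in results if r["p95_ms"] < baseline_p95_ms]
    if not feasible:
        return None
    # Among feasible configurations, maximize tokens per second.
    return max(feasible, key=lambda r: r["throughput_tps"])

# Hypothetical benchmark results for three configurations.
trials = [
    {"name": "default", "throughput_tps": 1200, "p95_ms": 850},
    {"name": "cfg-a",   "throughput_tps": 1500, "p95_ms": 900},  # violates latency
    {"name": "cfg-b",   "throughput_tps": 1400, "p95_ms": 800},  # feasible winner
]
print(best_config(trials, baseline_p95_ms=850)["name"])  # cfg-b
```

The key design choice is treating latency as a hard constraint rather than folding it into a single score, so a high-throughput configuration that degrades tail latency is never selected.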
Intern Showcase
Hewitt Boardroom (Capacity 35)