DevConf.US 2025

Cloud-Native Model Serving: vLLM's Lifecycle in Kubernetes
2025-09-20, Room 101 (Capacity 48)

Effectively deploying Large Language Models (LLMs) on Kubernetes is critical for modern AI workloads, and vLLM has emerged as a leading open-source project for LLM inference serving. This session will explore the features that set vLLM apart: maximizing throughput while minimizing resource usage. We'll then walk through the lifecycle of deploying AI/LLM workloads on Kubernetes, from seamless containerization to efficient scaling with Kubernetes-native tools and robust monitoring for reliable operations, as sketched below.
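
As a small taste of the scaling and monitoring steps, here is a minimal sketch using the official kubernetes Python client; the Deployment name vllm-server and the default namespace are hypothetical placeholders for a containerized vLLM service.

```python
from kubernetes import client, config

# Load credentials from ~/.kube/config; use config.load_incluster_config()
# instead when running inside the cluster.
config.load_kube_config()
apps = client.AppsV1Api()

# Scale the (hypothetical) vllm-server Deployment to three replicas.
apps.patch_namespaced_deployment_scale(
    name="vllm-server",
    namespace="default",
    body={"spec": {"replicas": 3}},
)

# A basic readiness check: compare ready replicas to the desired count.
dep = apps.read_namespaced_deployment(name="vllm-server", namespace="default")
print(f"ready replicas: {dep.status.ready_replicas}/{dep.spec.replicas}")
```

In practice a HorizontalPodAutoscaler or an operator would drive this loop, but the same API objects are involved.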

By leveraging features like continuous batching and distributed serving, vLLM simplifies complex workloads and optimizes performance, driving scalable, efficient LLM deployment and making advanced inference accessible for diverse and demanding use cases (see the sketch below). Join us to learn why vLLM is shaping the future of LLM serving and how it integrates with Kubernetes to deliver reliable, cost-effective, and high-performance AI systems.
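
To make the batching and parallelism concrete, this is a minimal sketch of vLLM's offline batched-inference API; the model name is just an example, and tensor_parallel_size is the setting that shards the model across GPUs for distributed serving.

```python
from vllm import LLM, SamplingParams

# vLLM batches these prompts together and continuously schedules
# sequences in and out of the batch as they finish (continuous batching).
prompts = [
    "Explain Kubernetes in one sentence.",
    "Why does LLM serving benefit from batching?",
]
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)

# tensor_parallel_size > 1 shards the model's weights across multiple GPUs.
llm = LLM(model="facebook/opt-125m", tensor_parallel_size=1)

outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"{output.prompt!r} -> {output.outputs[0].text!r}")
```

The same engine backs vLLM's OpenAI-compatible server (vllm serve <model>), which is the form typically containerized for Kubernetes.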


What level of experience should the audience have to best understand your session?

Intermediate - attendees should be familiar with the subject

Cedric Clyburn (@cedricclyburn), Senior Developer Advocate at Red Hat, is an enthusiastic software technologist with a background in Kubernetes, DevOps, and container tools. He has experience speaking at and organizing conferences including DevNexus, WeAreDevelopers, The Linux Foundation, KCD NYC, and more. Cedric loves all things open source and works to make developers' lives easier! He is based in New York.

Roberto is a Principal AI Architect in the AI Business Unit, specializing in Container Orchestration Platforms (OpenShift & Kubernetes), AI/ML, DevSecOps, and CI/CD. He has over 10 years of experience in system administration, cloud infrastructure, and AI/ML, and holds two MSc degrees, in Telco Engineering and in AI/ML.