DevConf.US 2025

Roberto Carratalá

Roberto is a Principal AI Architect in the AI Business Unit, specializing in container orchestration platforms (OpenShift and Kubernetes), AI/ML, DevSecOps, and CI/CD. He has over 10 years of experience in system administration, cloud infrastructure, and AI/ML, and holds two MSc degrees, one in Telco Engineering and one in AI/ML.


Job title

Principal AI Platform Architect

Company or affiliation

Red Hat


Session

09-20
11:00
35min
Cloud-Native Model Serving: vLLM's Lifecycle in Kubernetes
Cedric Clyburn, Roberto Carratalá

Deploying Large Language Models (LLMs) effectively in Kubernetes is critical for modern AI workloads, and vLLM has emerged as a leading open-source project for LLM inference serving. This session covers the features that set vLLM apart: maximizing throughput while minimizing resource usage. We’ll walk through the lifecycle of deploying AI/LLM workloads on Kubernetes, focusing on containerization, efficient scaling with Kubernetes-native tools, and robust monitoring to ensure reliable operations.

With features like dynamic batching and distributed serving, vLLM simplifies complex workloads and optimizes performance, making advanced inference accessible for diverse and demanding use cases. Join us to learn why vLLM is shaping the future of LLM serving and how it integrates into Kubernetes to deliver reliable, cost-effective, and high-performance AI systems.

Cloud, Hybrid Cloud, and Hyperscale Infrastructure
101 (Capacity 48)