DevConf.IN 2026

Scaling Generative AI Inference with llm-d
2026-02-13, VYAS - 1 - Room #VY124

Generative AI models are rapidly changing the landscape of application development, but deploying and serving these large models in production at scale presents significant challenges. llm-d is an open-source, Kubernetes-native distributed inference serving stack designed to address these complexities. This session will introduce developers to llm-d, demonstrating how it provides "well-lit paths" to serve large generative AI models with the fastest time-to-value and competitive performance across diverse hardware accelerators. Attendees will learn about llm-d's architecture, key features, and how to leverage its tested and benchmarked recipes for production deployments, focusing on practical applications and best practices.
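For a flavor of what "serving" means in practice: llm-d builds on vLLM, which exposes an OpenAI-compatible HTTP API, so a deployed model can be queried with any standard HTTP client. The sketch below is illustrative only; the gateway address and model name are assumptions, and the real endpoint depends on how the Kubernetes Service/Gateway is configured in a given deployment.

    import requests

    # Hypothetical address of an llm-d deployment's inference gateway;
    # the actual URL depends on your cluster's Service/Gateway setup.
    BASE_URL = "http://llm-d-gateway.example.com/v1"

    # vLLM (which llm-d builds on) serves the OpenAI-compatible
    # /v1/chat/completions endpoint.
    response = requests.post(
        f"{BASE_URL}/chat/completions",
        json={
            "model": "meta-llama/Llama-3.1-8B-Instruct",  # illustrative model name
            "messages": [
                {"role": "user", "content": "Summarize llm-d in one sentence."}
            ],
            "max_tokens": 128,
        },
        timeout=60,
    )
    response.raise_for_status()
    print(response.json()["choices"][0]["message"]["content"])

The same request shape works regardless of how llm-d schedules and distributes the model behind the gateway, which is part of what makes the OpenAI-compatible API a convenient, stable surface for applications.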


What level of experience should the audience have to best understand your session?: Intermediate - attendees should be familiar with the subject

A Senior Principal Technical Support Engineer with 17 years of experience in Java Enterprise middleware and Red Hat AI (RHAIIS, RHELAI, RHOAI) products.

Proven expertise in the design, implementation, and troubleshooting of distributed enterprise systems on Kubernetes (Red Hat OpenShift), Linux containers (Podman, Docker), and standalone Linux servers (RHEL 7/8/9/10).

Possesses core proficiency in Java and Python programming, with hands-on experience in a wide array of Java middleware and web server technologies.

Recently developed strong expertise in Python for AI, focusing on Generative AI, RAG, AI agents, LLM fine-tuning, and inference using cutting-edge open-source frameworks (vLLM, llm-d, LLM Compressor) and Red Hat AI (RHAIIS, RHELAI, RHOAI) platforms.