DevConf.US 2025

llm-d: Kubernetes Native Distributed Inferencing
2025-09-19, Ladd Room (Capacity 170)

llm-d is a well-lit path for anyone to serve LLMs at scale, supporting any model on a diverse set of hardware accelerators. Come learn how llm-d enables distributed inference at scale!


Audience experience level: Beginner - no experience needed

Robert is a director of engineering at Red Hat. Before joining Red Hat, he was senior director of engineering at Neural Magic. He is a core committer to vLLM and a maintainer of llm-d.