Rishika Kedia
IBM India Private Limited
Senior Technical Staff Member, Chief Product Owner OpenShift on IBM Z
Session
GPU costs are spiraling, yet clusters waste 30–40% of capacity due to static allocation. A GPU assigned to a pod sits idle between inference calls, during model loading, and at startup, and nobody else can use it. It gets worse when VM-based and containerized workloads run on separate clusters: the pool is siloed. No sharing, no reclaim, just waste.
This talk fixes that at the scheduling layer using two upstream Kubernetes projects: KubeVirt, which brings VM workloads under native Kubernetes scheduling, and Dynamic Resource Allocation (DRA), which replaces the rigid device plugin model with a flexible, claim-based API. Together they enable GPU sharing across VMs and containers on a single cluster.
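To make the claim-based model concrete, here is a minimal sketch of a DRA ResourceClaim and a pod that consumes it. The device class name `gpu.example.com` is a placeholder for whatever class a vendor driver publishes; the API group and version reflect the DRA beta API and may differ depending on your Kubernetes release.

```yaml
# Sketch only: a ResourceClaim requesting one GPU from a
# hypothetical device class, plus a pod that references the claim.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: shared-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com   # placeholder class name
---
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker
spec:
  resourceClaims:
  - name: gpu
    resourceClaimName: shared-gpu
  containers:
  - name: worker
    image: inference-server:latest      # placeholder image
    resources:
      claims:
      - name: gpu
```

Unlike the device plugin model, the scheduler resolves the claim against driver-published device state, which is what allows the same pool to serve both containers and KubeVirt-managed VMs.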
We'll walk through real scheduling data, the DRA resource claim model, and how the KubeVirt VM lifecycle integrates with DRA's structured parameters API. No theory-heavy slides: just the problem, the architecture, and what works.