Shamsher Ansari
I build products that power AI at scale.
As a Group Product Manager, I lead strategy for GPU cloud platforms designed for LLM inference, training, and fine-tuning in real production environments.
My focus is on:
- Turning GPUs into reliable AI platforms
- Scaling inference efficiently and responsibly
- Lowering cost and complexity of AI infrastructure
I care deeply about making advanced AI usable beyond big labs, so startups and enterprises can build, deploy, and scale AI without friction.
Session
GPUs are the backbone of modern AI and cloud workloads. But in reality, many GPUs sit idle most of the time. Even in well-run data centers, a large part of GPU capacity goes unused, which increases costs and slows teams down.
In this talk, we’ll break down why GPU utilization is so low and what you can do about it.
We’ll start with the basics: how GPUs are used today and where things go wrong. You’ll learn about common problems like uneven workloads, inefficient scheduling, limited visibility into GPU usage, and mismatches between hardware and software.
Next, we’ll walk through practical solutions. This includes GPU sharing, right-sizing workloads, better scheduling, and using the right monitoring tools. The focus will be on approaches you can actually apply in real systems.
We’ll also share real-world lessons from building a GPU-as-a-Service (GPUaaS) platform, covering features like model checkpointing, job preemption and resume, and queue-based scheduling with open-source tools such as Kueue to improve GPU efficiency.
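As one illustration of the queue-based scheduling approach, a minimal Kueue setup pairs a ClusterQueue holding a GPU quota with a LocalQueue that teams submit jobs to. The sketch below uses Kueue's v1beta1 API; the flavor name, quota, and queue names are hypothetical placeholders, not values from the platform described in the talk:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: a100            # hypothetical flavor representing one GPU type
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: gpu-queue
spec:
  namespaceSelector: {}  # admit workloads from any namespace
  resourceGroups:
  - coveredResources: ["nvidia.com/gpu"]
    flavors:
    - name: a100
      resources:
      - name: "nvidia.com/gpu"
        nominalQuota: 8  # hypothetical cluster-wide GPU quota
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-queue
  namespace: default
spec:
  clusterQueue: gpu-queue  # jobs submitted here draw from the shared quota
```

With a setup along these lines, jobs labeled with the LocalQueue wait in queue until GPU quota is available rather than pinning idle capacity, which is the efficiency gain the session discusses.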
By the end of the session, you’ll have a clear understanding of how to use GPUs more efficiently in AI, ML, and cloud environments, without unnecessary complexity.