2025-06-14 –, E104 (capacity 72)
We transitioned from managing individual Ubuntu servers to an on-prem Kubernetes platform built entirely on open-source technologies to support machine learning workloads. This shift replaced spreadsheet for GPU allocation with team-based managed scheduling, provided a unified development environment, and improved system observability with long-term metrics storage. Real-time insights and visualization dashboards now help users make informed decisions at a glance. In this session, we’ll share key lessons, challenges, and our future plans to enhance performance and UX.
Intermediate - attendees should be familiar with the subject
FOOBAR