DevConf.CZ 2025

From spreadsheet scheduling to Kubernetes: building an on-premise ML platform
2025-06-14, E104 (capacity 72)

We moved from managing individual Ubuntu servers to an on-prem Kubernetes platform, built entirely on open-source technologies, to support machine learning workloads. The shift replaced spreadsheet-based GPU allocation with managed, team-based scheduling, gave users a unified development environment, and improved observability through long-term metrics storage. Real-time dashboards now help users make informed decisions at a glance. In this session, we’ll share key lessons and challenges, along with our plans to improve performance and user experience.


What level of experience should the audience have to best understand your session?

Intermediate - attendees should be familiar with the subject