Mitali Bhalla DevConf.IN 2025

Mitali Bhalla

Mitali Bhalla, Site Reliability Engineer II at Red Hat, Inc.

With 3+ years of experience managing and optimising complex Kubernetes and OpenShift clusters, I specialise in ensuring the reliability, scalability, and performance of infrastructure in dynamic environments. Passionate about site reliability engineering, I focus on automating processes, improving system uptime, and troubleshooting complex issues across large-scale cloud-native platforms.

Projects: github.com/MitaliBhalla

Company or affiliation –

Red Hat, Inc

Job title –

SRE - II

Session

03-01

13:40

15min

Streamlining Multi K8s Cluster Operations with Open Cluster Management (OCM)

Pratik Panda, Mitali Bhalla

Kubernetes adoption is surging, with 96% of organizations using it in some capacity, according to the CNCF. Companies like Spotify, Airbnb, and Shopify operate dozens, if not hundreds, of Kubernetes clusters to support their global applications. But managing multiple clusters isn’t just a technical feat; it’s a logistical challenge. Consider this: a large enterprise managing 100 clusters could have tens of thousands of nodes and millions of pods. Each cluster generates a flood of metrics, logs, and alerts that must be coordinated to ensure high availability and performance.

Traditional tools like Terraform and Ansible weren’t designed to handle this complexity. While they are effective for provisioning infrastructure, they fall short in addressing day-2 operations such as policy enforcement, cluster upgrades, and unified monitoring across multiple environments. Similarly, GitOps pipelines streamline application deployment but provide limited visibility into the overall health and governance of multiple clusters. Teams are often left without a single-pane-of-glass solution for managing configuration drift, enforcing security policies, or gaining visibility into workloads across clusters.

Why does this problem persist? Because multi-cluster Kubernetes, while powerful, introduces inherent complexities. Networking between clusters can suffer from latency, causing out-of-sync application instances. Kubernetes’ built-in security tools only apply within single clusters, requiring manual replication to ensure uniform enforcement. Monitoring tools must be deployed individually in each cluster, often resulting in fragmented observability and disjointed data correlation.

While solutions like Cluster API, ArgoCD, and KCP offer partial relief, they lack the holistic approach needed for full multi-cluster lifecycle management. This is where Open Cluster Management (OCM) shines. OCM provides a unified framework for managing multiple Kubernetes clusters efficiently. The talk will feature a live demo showcasing how OCM Hub can seamlessly manage two Kubernetes clusters. We’ll demonstrate how OCM automates lifecycle tasks, such as policy enforcement, while providing a centralized platform for monitoring, governance, and workload distribution. By intelligently correlating data from multiple clusters, OCM simplifies troubleshooting, minimizes latency issues, and ensures consistency across environments.

The live demo shows OCM managing two Kubernetes clusters: you’ll see how it automates critical tasks such as upgrades and policy enforcement, keeping operations smooth even across dozens of clusters. OCM’s centralized monitoring provides correlated insights that reduce downtime and troubleshooting complexity.
Whether you operate in hybrid, multi-cloud, or edge environments, this session offers practical insights into leveraging OCM to reduce operational complexity, enhance resilience, and streamline Kubernetes operations at scale.

Cloud, Edge, and Platform Technologies

Swami Vivekananda Auditorium (capacity 700)
