DevConf.CZ 2025

Marc Tuduri

Marc Tuduri is a Prometheus contributor, an OpenTelemetry member, and a Software Engineer at Grafana.


Company or affiliation

Grafana

Job title

Staff Software Engineer


Sessions

06-14
10:15
35min
From PIDs to Pods: the life cycle of an eBPF-autoinstrumented Kubernetes application
Marc Tuduri

eBPF lets you safely attach small programs to the Linux kernel and inspect the memory of the kernel and of userspace programs at runtime. This opens a wide range of possibilities for observability applications. However, the low-level nature of eBPF often makes it difficult to match the inspected data with high-level concepts, such as the entities in your Kubernetes cluster.
This talk describes our journey to make Kubernetes a first-class citizen in Grafana Beyla, our eBPF-based instrumentation tool. From a hacker's perspective, we will describe how we matched the low-level abstractions of eBPF with high-level Kubernetes information to provide a unified experience that blurs the barriers between application, platform, and infrastructure. We designed Beyla to be ubiquitous, so it can run as a simple operating-system-level process that internally reasons about processes. But we wanted our Kubernetes users to keep talking about Pods, Deployments, and Services, so they don’t have to change their mindset or dig into the lower-level constructs of the operating system to keep using Beyla.

DevOps and Automation
E104 (capacity 72)
06-14
14:45
35min
Auto-instrumentation for GPU performance using eBPF
Marc Tuduri, Dominik Süß

Modern AI workloads rely on large GPU fleets, and because these fleets are expensive, using them efficiently is crucial. However, gathering telemetry from these workloads to optimise performance is challenging because it requires manual instrumentation and adds performance overhead. Furthermore, manual instrumentation does not produce telemetry in a standardised format for commonly used monitoring tools like Prometheus.

This talk explores the potential of using eBPF to capture CUDA calls made to GPUs, including kernel launches and memory allocations. Data from these probes can be exported as Prometheus metrics, enabling detailed analysis of kernel launch patterns and the associated memory usage. This approach offers significant benefits, as eBPF imposes minimal overhead and requires no intrusive instrumentation. Our implementation is open source and available on GitHub.

Cloud, Hybrid Cloud, and Hyperscale Infrastructure
D0206 (capacity 154)