DevConf.CZ 2025

Dominik Süß

Dominik started his journey in technology as an SRE, working on projects ranging from warehouse logistics and photobook designers to satellite imagery analysis. During this time, he discovered his passion for developer tooling and for making sure developers can focus on what they do best: building great software!

Now he is working as a Developer Experience Engineer at Grafana Labs, building tools to see clearly in the ever-changing world of software.


Company or affiliation

Grafana Labs

Job title

Developer Experience Engineer


Sessions

06-13
15:30
35min
License to Observe - Why observability solutions need agents
Dominik Süß

Even though most observability solutions stay out of your way after deploying them, you will most likely still need to get your data to them somehow. Many projects recommend a so-called “Agent” or “Collector”. But what exactly is an agent, and why do I need one?

This talk covers the design decisions behind observability agents, the open protocols used by prominent examples, and the different ways they work together. On top of that, these agents also need to be deployed and configured. Container orchestration solutions simplify this a lot but present their own challenges.

After this talk, you will be able to reason about different deployment architectures and know which methods of gathering observability data to choose for specific situations.

Application and Services Development
D0206 (capacity 154)
06-14
14:45
35min
Auto-instrumentation for GPU performance using eBPF
Marc Tuduri, Dominik Süß

Modern AI workloads rely on large GPU fleets whose efficient utilisation is crucial due to high costs. However, gathering telemetry from these workloads to optimise performance is challenging because it usually requires manual instrumentation, which adds performance overhead. Further, such instrumentation does not produce telemetry in a standardised format for commonly used monitoring tools such as Prometheus.

This talk explores the potential of leveraging eBPF to capture CUDA calls made to GPUs, including kernel launches and memory allocations. Data from these probes can be used to export Prometheus metrics, facilitating detailed analysis of kernel launch patterns and associated memory usage. This approach offers significant benefits as eBPF imposes minimal overhead and requires no intrusive instrumentation. Our implementation is also open-source and available on GitHub.

Cloud, Hybrid Cloud, and Hyperscale Infrastructure
D0206 (capacity 154)