DevConf.CZ 2026

How to Analyze Terabytes of Data from GitHub Archive at High Speed
2026-06-18, E105 (capacity 70)

GitHub provides a public API for obtaining detailed information about various events performed by users across public repositories: git pushes, pull requests and reviews, GitHub issues and comments, GitHub stars, etc. The information about these events is available at https://gharchive.org in the form of per-hour compressed files with JSON lines representing all the events. The number of events recorded per year is ~1.5 billion, with a total size of ~7 terabytes per year. This sounds like big data. The talk shows how to explore this data at high speed and minimal cost, and how to obtain interesting insights from it.
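To make the data format concrete, here is a minimal sketch of processing one of those per-hour files: each file is a gzip-compressed stream of JSON lines, one event per line, with a `type` field identifying the event kind (PushEvent, WatchEvent, etc.). The URL pattern in the comment and the exact field names are taken from the format described at gharchive.org; the sample payload below is synthetic, standing in for a real hourly file.

```python
import gzip
import io
import json
from collections import Counter


def count_event_types(gz_bytes: bytes) -> Counter:
    """Count events per `type` field in a gzip-compressed JSON-lines payload,
    which is the format of GH Archive hourly files."""
    counts = Counter()
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                counts[json.loads(line)["type"]] += 1
    return counts


# Synthetic stand-in for a real hourly file such as
# https://data.gharchive.org/2026-01-01-0.json.gz
sample = "\n".join(
    json.dumps({"type": t}) for t in ["PushEvent", "WatchEvent", "PushEvent"]
)
gz = gzip.compress(sample.encode("utf-8"))
print(count_event_types(gz))  # Counter({'PushEvent': 2, 'WatchEvent': 1})
```

Scaling this naive per-line parsing to ~7 TB per year is exactly where the talk's high-speed techniques come in; a plain Python loop like this is only meant to illustrate the file layout.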


Experience level: Intermediate - attendees should be familiar with the subject

Aliaksandr is a co-founder and the principal architect of VictoriaMetrics. He is also a well-known author of popular performance-oriented libraries: fasthttp, fastcache and quicktemplate. Before VictoriaMetrics, Aliaksandr held CTO and Architect roles at adtech companies serving high volumes of traffic. He holds a Master's degree in Computer Software Engineering. He decided to found VictoriaMetrics after experiencing the shortcomings of the available time series databases and monitoring solutions.