2026-06-18, E104 (capacity 72)
strlen("Dvořák") returns 8, not 6, when the string is encoded as UTF-8. If that surprises you, this talk is for you.
We have all seen it: Dvořák turns into DvoÅák, names become question marks. You try things until something works. But what is actually going on?
I will start with how we got here: ASCII, code pages, Unicode, and what it left unsolved. That part will be quick. Where I really want to dig in is what the C library does next: how iconv converts between encodings, how the gconv pipeline inside glibc works, and why things like //IGNORE behave inconsistently.
As a glibc contributor, I learned most of this the long way. I will share those experiences and the surprises along the way.
You will walk away with a working mental model of character encoding in general and especially in C, from history to implementation.
Software engineer at Red Hat, Google Summer of Code mentor at The FOSSology Project, and occasional GNU C Library contributor. If you use VLC for Android and have ever looked at its user docs, that was me.
I like understanding how things work and then explaining them to people. Outside of work I tinker with microcontrollers, self-host everything I can, and watch too much anime.