DevConf.CZ 2026

Lost in Transliteration: Why strlen("Dvořák") Returns 8
2026-06-18, E104 (capacity 72)

strlen("Dvořák") returns 8, not 6: in UTF-8, ř and á take two bytes each, and strlen counts bytes, not characters. If that surprises you, this talk is for you.
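A minimal sketch of that count, assuming the string is encoded as UTF-8. The names DVORAK_UTF8 and utf8_code_points are illustrative and not part of any library. The name is written as explicit byte escapes so the result does not depend on how the source file itself is encoded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Dvořák" as explicit UTF-8 bytes: ř is C5 99, á is C3 A1. */
static const char DVORAK_UTF8[] = "Dvo\xC5\x99\xC3\xA1k";

/* Hypothetical helper: counts Unicode code points in a valid UTF-8 string
 * by skipping continuation bytes, which always look like 10xxxxxx. */
size_t utf8_code_points(const char *s)
{
    assert(s != NULL);

    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;  /* lead byte or plain ASCII: starts a new code point */
    }
    return count;
}
```

With these definitions, strlen(DVORAK_UTF8) evaluates to 8 and utf8_code_points(DVORAK_UTF8) to 6. The helper assumes well-formed input and does no validation.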

We have all seen it: Dvořák turns into DvoÅák, names become question marks. You try things until something works. But what is actually going on?

I will start with how we got here: ASCII, code pages, Unicode, and what it left unsolved. That part will be quick. Where I really want to dig in is what the C library does next: how iconv converts between encodings, how the gconv pipeline inside glibc works, and why things like //IGNORE behave inconsistently.
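For orientation, here is a rough sketch of what calling iconv from C can look like. It uses only the POSIX iconv_open, iconv, and iconv_close calls. The wrapper name convert_string is hypothetical, and the sketch deliberately uses no suffixes such as //IGNORE, so it says nothing about how those behave.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: converts the NUL-terminated string `in` from
 * encoding `from` to encoding `to`, writing into `out`.
 * Returns the number of bytes written, or (size_t)-1 with errno set. */
size_t convert_string(const char *to, const char *from,
                      const char *in, char *out, size_t out_size)
{
    assert(to && from && in && out);

    iconv_t cd = iconv_open(to, from);  /* target encoding comes first */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in_ptr = (char *)in;          /* iconv() takes char **, input bytes are not modified */
    size_t in_left = strlen(in);
    char *out_ptr = out;
    size_t out_left = out_size;

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out_ptr, &out_left);  /* flush any shift state */

    int saved_errno = errno;            /* keep the iconv error across iconv_close */
    iconv_close(cd);
    errno = saved_errno;

    return rc == (size_t)-1 ? (size_t)-1 : out_size - out_left;
}
```

Called as convert_string("UTF-16LE", "UTF-8", "Dvo\xC5\x99\xC3\xA1k", buf, sizeof buf), the 8 input bytes should come back as 12 output bytes. Where the iconv implementation supports ISO-8859-2, that target should give 6 bytes, with ř as F8 and á as E1. Which encoding names are available depends on the C library and on which conversion modules are installed.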

As a glibc contributor, I learned most of this the long way. I will share those experiences and the surprises along the way.

You will walk away with a working mental model of character encoding in general and especially in C, from history to implementation.


Experience level: Beginner - no experience needed

Software engineer at Red Hat, Google Summer of Code mentor at The FOSSology Project, and occasional GNU C Library contributor. If you use VLC for Android and have ever looked at its user docs, that was me.

I like understanding how things work and then explaining them to people. Outside of work I tinker with microcontrollers, self-host everything I can, and watch too much anime.