Erik Erlandson
Erik is the AI and Data Science lead at Red Hat's Emerging Technologies group, where he leads a team of data scientists and software engineers who evaluate new technologies at the intersection of data science, AI and cloud native development.
AI and Data Science Lead
Company or affiliation – Red Hat
Session
Large language models learn to predict human and machine text as sequences of “tokens.” But what are these tokens, and how are they used to represent text? The answers matter: tokenization forms the foundation of how every LLM generates its output, and it determines how output correctness trades off against compute performance.
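As a concrete illustration (not drawn from the talk itself), the short Python sketch below inspects a pretrained GPT-2 tokenizer via the Hugging Face transformers library; the library and model choice are assumptions made purely for demonstration, showing how a sentence becomes subword tokens and the integer IDs a model actually consumes.

```python
# Illustrative only, not code from the talk: inspecting how a pretrained
# GPT-2 tokenizer splits text into subword tokens, using the Hugging Face
# "transformers" library. Assumes the library is installed and the
# tokenizer files can be downloaded.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Tokenization underlies every large language model."
tokens = tokenizer.tokenize(text)  # subword strings, e.g. ['Token', 'ization', ...]
ids = tokenizer.encode(text)       # integer IDs the model actually consumes

print(tokens)
print(ids)
print(tokenizer.decode(ids))       # round-trips back to the original text
```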
In this talk, Erik Erlandson will explore a variety of algorithms used to tokenize text before it is processed by these models, focusing on their trade-offs and their impact on model performance. He’ll compare word-based, subword-based, and character-level tokenization, including widespread approaches such as Byte Pair Encoding and WordPiece.
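For a taste of one of those approaches, here is a minimal sketch of the core Byte Pair Encoding training loop on a toy corpus; the corpus, helper names, and merge count are illustrative assumptions, and production tokenizers add byte-level handling, pre-tokenization, and special tokens on top of this idea.

```python
# A minimal, illustrative sketch of Byte Pair Encoding (BPE) training on a
# toy corpus. Not code from the talk; the corpus and merge count are
# hypothetical, chosen only to show the merge loop.
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged


# Toy corpus: word -> frequency, with each word split into characters plus an
# end-of-word marker so merges cannot cross word boundaries.
corpus = {"lower": 2, "lowest": 1, "newer": 3, "wider": 1}
vocab = {tuple(word) + ("</w>",): freq for word, freq in corpus.items()}

num_merges = 8
for _ in range(num_merges):
    pairs = get_pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    print("merged:", best)
```

Each merge adds one new symbol to the vocabulary, which is how BPE trades a larger vocabulary for shorter token sequences.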
Attendees will gain an understanding of how LLMs depend on tokenization and how the choice of tokenization strategy shapes model performance trade-offs.