BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.devconf.info//devconf-us-2025//talk//SBWBJS
BEGIN:VTIMEZONE
TZID:EST
BEGIN:STANDARD
DTSTART:20001029T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10;UNTIL=20061029T060000Z
TZNAME:EST
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
END:STANDARD
BEGIN:STANDARD
DTSTART:20071104T020000
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=11
TZNAME:EST
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000402T020000
RRULE:FREQ=YEARLY;BYDAY=1SU;BYMONTH=4;UNTIL=20060402T070000Z
TZNAME:EDT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
END:DAYLIGHT
BEGIN:DAYLIGHT
DTSTART:20070311T020000
RRULE:FREQ=YEARLY;BYDAY=2SU;BYMONTH=3
TZNAME:EDT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-devconf-us-2025-SBWBJS@pretalx.devconf.info
DTSTART;TZID=EST:20250919T100000
DTEND;TZID=EST:20250919T103500
DESCRIPTION:Large language models learn to predict human and machine
 text as sequences of “tokens.” But what are these tokens\, and how
 are they used to represent text? The answers to these questions
 matter: they form the foundation of how every LLM generates its
 output\, and how its output correctness trades off against compute
 performance.\n\nIn this talk\, Erik Erlandson will explore a variety
 of algorithms used to tokenize text before it’s processed by these
 models\, focusing on their trade-offs and impact on model
 performance. He’ll compare algorithms for word-based\, subword-based\,
 and character-level tokenization\, including widespread approaches
 such as Byte Pair Encoding and WordPiece.\n\nAttendees will gain an
 understanding of how LLMs depend on tokenization and how
 tokenization choices affect model performance trade-offs.
DTSTAMP:20260315T081755Z
LOCATION:Ladd Room (Capacity 170)
SUMMARY:Which Is To Be Master? Understanding LLM Tokenization - Erik Erland
 son
URL:https://pretalx.devconf.info/devconf-us-2025/talk/SBWBJS/
END:VEVENT
END:VCALENDAR
