BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.devconf.info//devconf-cz-2026//talk//BRCGAR
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T040000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-devconf-cz-2026-BRCGAR@pretalx.devconf.info
DTSTART;TZID=CET:20260618T131500
DTEND;TZID=CET:20260618T135000
DESCRIPTION:Extracting structured information from PDFs is a challenging ta
 sk\; the format was designed for visual consistency\, not machine readabil
 ity. Rule-based tools handle basic text extraction well but struggle with 
 tables\, semantic role identification\, and specialized content like math 
 formulas. Modern ML-based tools are more versatile but can hallucinate. Hy
 brid tools attempt to get the best of both worlds.\n\nDocling is one such 
 hybrid tool. It combines programmatic PDF parsing with additional ML model
 s\, producing a rich\, structured document representation.\n\nWe integrate
 d Docling into sec-certs\, an open-source tool for automated analysis of C
 ommon Criteria and FIPS 140 certification documents\, aiming to improve re
 liability and enable more sophisticated analysis.\n\nThis talk shares how 
 structured output changes what's possible in automated analysis\, how the 
 pipeline improved\, what worked (and what didn’t)\, and lessons learned 
 when processing large collections of security certification PDFs.
DTSTAMP:20260430T125107Z
LOCATION:D0207 (capacity 90)
SUMMARY:Structured PDF Parsing with Docling: Lessons from Analyzing Securit
 y Certification Documents - Jakub Borsky
URL:https://pretalx.devconf.info/devconf-cz-2026/talk/BRCGAR/
END:VEVENT
END:VCALENDAR
