DevConf.US 2025

Mastering Multi-Format Document Processing for AI with Docling
2025-09-20, 107 (Capacity 20)

Most real-world data remains trapped in complex documents: PDFs with intricate layouts, PowerPoints with embedded diagrams, Word documents with nested tables. Traditional extraction tools fail to extract the valuable information in these documents that could improve your AI workflows. Tables become jumbled text, figures disappear, and document structure is lost. This workshop introduces Docling, an open-source toolkit that uses deep learning to understand documents the way humans do.

Through three hands-on labs, you'll build a complete document processing pipeline:
Lab 1: Convert complex documents (PDF, DOCX, PPTX, HTML) into structured data while preserving tables, figures, and layouts. See how Docling maintains relationships that other tools destroy.
Lab 2: Implement intelligent chunking strategies that respect document structure, which is critical for accurate retrieval in AI applications.
Lab 3: Build a multimodal RAG system with visual grounding, a unique Docling feature that shows users exactly where information originates in source documents.
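
As a rough illustration of the idea behind Labs 2 and 3, the sketch below groups document elements under their nearest heading, keeps each table whole, and carries page numbers along so a retrieved chunk can be traced back to its source. It is plain Python, not Docling's API and not the workshop's code: `Item`, `Chunk`, `chunk_by_structure`, and the `max_chars` limit are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One element of a converted document (hypothetical schema)."""
    kind: str  # "heading", "paragraph", or "table"
    text: str
    page: int  # source page, kept so results can be traced back


@dataclass
class Chunk:
    heading: str      # nearest preceding heading, "" if none
    text: str
    pages: list[int]  # pages this chunk was drawn from


def chunk_by_structure(items: list[Item], max_chars: int = 500) -> list[Chunk]:
    """Split at headings and size limits, but never split a table."""
    chunks: list[Chunk] = []
    heading = ""
    buf: list[Item] = []

    def flush() -> None:
        # Emit the buffered items as one chunk under the current heading.
        nonlocal buf
        if buf:
            chunks.append(Chunk(
                heading=heading,
                text="\n".join(i.text for i in buf),
                pages=sorted({i.page for i in buf}),
            ))
        buf = []

    for item in items:
        if item.kind == "heading":
            flush()  # close the previous section before switching heading
            heading = item.text
            continue
        size = sum(len(i.text) for i in buf)
        if item.kind == "table" or (buf and size + len(item.text) > max_chars):
            flush()
        buf.append(item)
        if item.kind == "table":
            flush()  # a table always becomes its own chunk
    flush()
    return chunks


items = [
    Item("heading", "Results", 3),
    Item("paragraph", "Accuracy improved.", 3),
    Item("table", "| model | score |\n| a | 0.9 |", 4),
    Item("paragraph", "See discussion.", 4),
    Item("heading", "Discussion", 5),
    Item("paragraph", "Limits remain.", 5),
]
chunks = chunk_by_structure(items)
for c in chunks:
    print(c.heading, c.pages, repr(c.text))
```

A fixed-size splitter would be free to cut the table in half; here the table stays intact and every chunk records its heading and pages, which is the kind of metadata a grounding step needs in order to point back at the source.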


You'll leave with working code for document processing pipelines and the skills to integrate Docling into your AI workflows. All processing runs locally on standard hardware.

Prerequisites: Python 3.10+, basic Python knowledge 
Target Audience: Developers and data scientists working with document-heavy workflows


What level of experience should the audience have to best understand your session?

Intermediate - attendees should be familiar with the subject

Ming Zhao is an open source developer and developer advocate at IBM Research, where he helps IBM leverage open technologies while building impactful tools and growing vibrant open-source communities. He’s passionate about making open tech accessible to all and ensuring developers have the tools they need to succeed in the rapidly developing AI space. Ming now leads community efforts around Docling, IBM’s fastest-growing open source project, which was recently welcomed into the LF AI & Data Foundation.