DevConf.US 2025

Rafael Vasquez


Job title

Open Source Software Developer

Company or affiliation

IBM


Session

09-20
12:40
80min
Mastering Multi-Format Document Processing for AI with Docling
Mingxuan Zhao, Rafael Vasquez

Most real-world data remains trapped in complex documents: PDFs with intricate layouts, PowerPoints with embedded diagrams, Word documents with nested tables. Traditional extraction tools fail to extract the valuable information in these documents that could improve your AI workflows. Tables become jumbled text, figures disappear, and document structure is lost. This workshop introduces Docling, an open-source toolkit that uses deep learning to understand documents the way humans do.

Through three hands-on labs, you'll build a complete document processing pipeline:
Lab 1: Convert complex documents (PDF, DOCX, PPTX, HTML) into structured data while preserving tables, figures, and layouts. See how Docling maintains relationships that other tools destroy.
Lab 2: Implement intelligent chunking strategies that respect document structure, which is critical for accurate retrieval in AI applications.
Lab 3: Build a multimodal RAG system with visual grounding, a unique Docling feature that shows users exactly where information originates in source documents.
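
To give a feel for what "chunking that respects document structure" in Lab 2 can mean, here is a minimal sketch in plain Python. It does not use Docling's API; the names `Chunk`, `chunk_by_structure`, and `max_chars`, as well as the flat `(kind, text)` item format, are hypothetical and chosen only for illustration. The idea shown is that a chunk never crosses a heading, never splits a table or paragraph, and keeps its section heading as context.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    heading: str  # nearest heading above the chunk's content
    parts: list[str] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(self.parts)


def chunk_by_structure(items, max_chars=200):
    """Group (kind, text) items into chunks that follow document structure.

    `items` is a hypothetical flat view of a converted document, where kind is
    "heading", "paragraph", or "table". Individual items are never split, so a
    single large table may exceed `max_chars` on its own.
    """
    chunks = []
    current = None
    heading = ""
    for kind, text in items:
        if kind == "heading":
            # A heading closes the open chunk and sets context for the next one.
            if current is not None and current.parts:
                chunks.append(current)
            heading = text
            current = None
            continue
        if current is None:
            current = Chunk(heading=heading)
        # Start a new chunk when adding this item would exceed the size budget.
        if current.parts and len(current.text) + 1 + len(text) > max_chars:
            chunks.append(current)
            current = Chunk(heading=heading)
        current.parts.append(text)
    if current is not None and current.parts:
        chunks.append(current)
    return chunks
```

Keeping the heading on every chunk is one simple way to preserve context at retrieval time: a table that ends up in its own chunk still records which section it came from.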


You'll leave with working code for document processing pipelines and the skills to integrate Docling into your AI workflows. All processing runs locally on standard hardware.

Prerequisites: Python 3.10+, basic Python knowledge 
Target Audience: Developers and data scientists working with document-heavy workflows

Artificial Intelligence and Data Science
107 (Capacity 20)