LLMs on the Edge: The Future of On-Device Intelligence
In many secure or industrial environments — like factories, labs, or embedded automotive systems — machines run in air-gapped or low-connectivity conditions. When systems fail, engineers often rely on scattered manuals or vendor documentation, which slows recovery. What if you could drop in a self-contained AI assistant that works offline — right at the edge?
This lightning talk shows how to run a multimodal, agentic pipeline (Vision LM → RAG → LLM) entirely on-device, using Podman containers on RHEL Edge with GPU access via the Container Device Interface (CDI) on an NVIDIA Jetson Orin Nano. We'll contrast cloud versus edge constraints (RAM, power) and share a container-native architecture that delivers low latency, privacy, and reproducibility. A short, pre-recorded demo illustrates a camera-to-answer workflow with real device metrics (tokens/sec, first-token latency). Attendees leave with a practical blueprint and ops tips for shipping rootless, reproducible, air-gapped AI stacks, using RamaLama for local LLM serving.
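To make the blueprint concrete, the sketch below shows what the camera-to-answer flow can look like in code, assuming a locally served model that exposes an OpenAI-compatible chat endpoint (as RamaLama's serve mode can provide). The endpoint URL, port, model name, file paths, and the toy retrieve() helper are illustrative assumptions, not the exact stack demoed in the talk.

    # Minimal on-device camera-to-answer sketch against a local,
    # OpenAI-compatible endpoint. All names below are illustrative.
    import base64
    import json
    import urllib.request

    ENDPOINT = "http://localhost:8080/v1/chat/completions"  # hypothetical local port

    def ask(messages: list, model: str = "local-model") -> str:
        """POST a chat request to the local endpoint and return the reply text."""
        payload = json.dumps({"model": model, "messages": messages}).encode()
        req = urllib.request.Request(
            ENDPOINT, data=payload, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["choices"][0]["message"]["content"]

    def retrieve(query: str, docs_path: str = "manuals.txt", k: int = 3) -> list:
        """Toy keyword retriever over a local plain-text manual; a real RAG step
        would use a vector store, but this keeps the sketch self-contained."""
        with open(docs_path, encoding="utf-8") as f:
            paragraphs = [p.strip() for p in f.read().split("\n\n") if p.strip()]
        terms = set(query.lower().split())
        ranked = sorted(paragraphs, key=lambda p: -len(terms & set(p.lower().split())))
        return ranked[:k]

    # 1. Describe the camera frame with a vision-capable model (image sent as base64).
    with open("frame.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    description = ask([{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the fault indicators visible on this panel."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }])

    # 2. Retrieve relevant manual passages for the observation (local docs only).
    context = "\n".join(retrieve(description))

    # 3. Ask the LLM for a recommendation grounded in the retrieved context.
    answer = ask([{
        "role": "user",
        "content": (f"Context:\n{context}\n\nObservation: {description}\n"
                    "What is the likely fault and the recommended next step?"),
    }])
    print(answer)

Keeping each step behind a localhost HTTP call is what lets every component run in its own rootless Podman container, and it keeps the entire path inside the air gap.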