LLMs on the Edge: Building an Offline Troubleshooting Assistant with Podman, Ramalama, and Jetson Orin
In many secure or industrial environments, such as factories, labs, or embedded automotive systems, machines run in air-gapped or low-connectivity conditions. When systems fail, engineers often rely on scattered manuals or vendor documentation, which slows recovery. What if you could drop in a self-contained AI assistant that works offline, right at the edge?
In this session, we’ll show how to build a local GenAI troubleshooting assistant using Jetson Orin, Podman, and Ramalama, running entirely on-device. The LLM is containerized and served with Ramalama, optimized for edge inference with quantized models like Mistral 7B (GGUF). It’s paired with a local vector database and a retrieval-augmented generation (RAG) pipeline to search logs, knowledge bases, and internal docs, all without internet access.
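As a taste of the workflow, here is roughly what serving looks like. The model name, container name, and port are illustrative; Ramalama runs the inference server inside a Podman container and selects a CUDA-enabled image on Jetson-class hardware:

    # Pull the quantized model while the device still has connectivity,
    # then serve it with no network access required at runtime.
    ramalama pull ollama://mistral:7b
    ramalama serve --name assistant --port 8080 ollama://mistral:7b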
We’ll deep-dive into:
Running LLMs efficiently on Jetson Orin with Ramalama + Podman
Using Ramalama to orchestrate a multi-container RAG assistant (the retrieval-and-generation glue is sketched after this list)
Indexing structured and unstructured ops data with FAISS (see the indexing example below)
Designing an edge AI assistant for Fedora IoT or RHEL for Edge (a quadlet sketch follows)
Benchmarking performance and memory usage on-device (sample commands below)
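To make the FAISS piece concrete, here is a minimal indexing sketch. The sample documents, the 384-dimension embedding size, and the stand-in embed() function are all placeholders; on the device, embeddings would come from a local embedding model rather than the random vectors used here:

    import numpy as np
    import faiss

    DIM = 384  # embedding size; assumes a small local embedding model

    def embed(texts):
        # Placeholder: stand-in random vectors. Swap in a local,
        # offline embedding model on the device.
        rng = np.random.default_rng(0)
        return rng.random((len(texts), DIM), dtype=np.float32)

    docs = [
        "ERROR 0x41: conveyor motor controller lost CAN heartbeat",
        "KB-102: resetting the PLC after a watchdog timeout",
        "Orin module thermal throttling above 85 C, check fan duty cycle",
    ]

    vectors = embed(docs)
    faiss.normalize_L2(vectors)     # cosine similarity via inner product
    index = faiss.IndexFlatIP(DIM)  # exact search; fine for edge-sized corpora
    index.add(vectors)

    query = embed(["motor controller not responding"])
    faiss.normalize_L2(query)
    scores, ids = index.search(query, 2)
    for score, i in zip(scores[0], ids[0]):
        print(f"{score:.3f}  {docs[i]}")

    faiss.write_index(index, "ops_docs.faiss")  # persist for the assistant container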
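The retrieval-and-generation glue referenced above can stay equally small. This sketch assumes the model is already being served on localhost:8080 with an OpenAI-compatible chat endpoint (Ramalama’s default llama.cpp runtime exposes /v1/chat/completions); the question and context strings are illustrative:

    import json
    import urllib.request

    def ask_local_llm(question: str, context: list[str]) -> str:
        # Stuff the retrieved snippets into the prompt, then query the
        # locally served model over its OpenAI-compatible endpoint.
        prompt = (
            "Use only the context below to troubleshoot.\n\n"
            + "\n".join(f"- {c}" for c in context)
            + f"\n\nQuestion: {question}"
        )
        body = json.dumps({
            "model": "mistral",  # informational for llama.cpp's server
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.2,
        }).encode()
        req = urllib.request.Request(
            "http://localhost:8080/v1/chat/completions",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        return data["choices"][0]["message"]["content"]

    # The context would come from the FAISS search in the previous sketch.
    print(ask_local_llm(
        "The conveyor motor controller stopped responding. What should I check?",
        ["ERROR 0x41: conveyor motor controller lost CAN heartbeat"],
    ))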
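For Fedora IoT or RHEL for Edge, the natural deployment unit is a Podman quadlet, so the assistant comes up with the OS. The unit below is a sketch: the image, model path, and GPU device entry are assumptions, and Ramalama can also generate a starting point like this (ramalama serve --generate quadlet):

    # /etc/containers/systemd/assistant.container
    [Unit]
    Description=Offline troubleshooting assistant (LLM server)

    [Container]
    # Image, model path, and entrypoint args are illustrative; the CDI
    # device assumes NVIDIA's container toolkit has generated CDI specs.
    Image=quay.io/ramalama/cuda:latest
    AddDevice=nvidia.com/gpu=all
    Volume=/var/lib/models:/models:Z
    PublishPort=8080:8080
    Exec=llama-server --model /models/mistral-7b.Q4_K_M.gguf --host 0.0.0.0 --port 8080

    [Install]
    WantedBy=multi-user.target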
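Finally, for the benchmarking segment, most of the numbers come from stock tooling rather than anything exotic (the container name and model path below are illustrative):

    # SoC-wide memory, GPU load, and power telemetry on Jetson:
    sudo tegrastats --interval 1000
    # Per-container CPU and memory from Podman:
    podman stats --no-stream
    # Tokens/sec via llama.cpp's bench tool (assuming the image ships it):
    podman exec assistant llama-bench -m /models/mistral-7b.Q4_K_M.gguf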