Challenges in Productionizing Multi-Agent Systems
In this talk, we explore the practical challenges and engineering trade-offs involved in taking multi-agent AI systems from prototype to production. After a brief overview of what constitutes an AI agent, we’ll trace the evolution of agentic workflows—from the explosive interest in AutoGPT in 2023, through the rise of vision-capable agents in 2024, to today’s surge of agentic AI across industries. We’ll then examine where and why businesses should deploy multi-agent architectures, highlighting key use cases in SaaS automation and domain-specific “Vertical AI.”
On the engineering side, we dive into three core pain points:
Inference Cost vs. Performance
• Balancing API or hardware expenses against tool-calling accuracy and latency
• Strategies for using smaller, specialized models—enhanced via prompt engineering and reinforcement learning—to match or exceed larger, general-purpose alternatives
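The cost side of this trade-off is easy to quantify with back-of-the-envelope arithmetic. The sketch below compares monthly API spend for a large general-purpose model versus a smaller specialized model; all prices, traffic numbers, and the 85% figure that falls out of them are hypothetical placeholders, not benchmarks from the talk.

```python
def monthly_cost(requests_per_day: int, tokens_per_request: int,
                 price_per_1k_tokens: float) -> float:
    """Estimated monthly API spend for a given traffic profile."""
    return requests_per_day * 30 * tokens_per_request / 1000 * price_per_1k_tokens

# Hypothetical prices: $0.010 vs $0.0015 per 1k tokens.
large = monthly_cost(10_000, 2_000, 0.010)   # large general-purpose model
small = monthly_cost(10_000, 2_000, 0.0015)  # smaller specialized model

print(f"large model: ${large:,.0f}/month")   # $6,000/month
print(f"small model: ${small:,.0f}/month")   # $900/month
print(f"savings:     {1 - small / large:.0%}")  # 85%
```

Whether the cheaper model is viable then hinges on the accuracy side: the claim above is that prompt engineering and reinforcement learning can close the tool-calling accuracy gap, making the savings real rather than a quality trade-away.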
Memory Management
• Physical layer: vector stores, knowledge graphs, and prompt-caching frameworks (e.g., Llama-Stack, AutoGen)
• Logical layer: shared context protocols that minimize redundant inference and shrink effective context windows
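The logical-layer idea above can be sketched as a shared cache that deduplicates inference across agents: if two agents in the same context ask an identical question, only one model call is made. This is a minimal illustration using a hash-keyed dictionary and a stand-in `fake_llm` function; real frameworks add eviction, semantic matching, and persistence.

```python
import hashlib

class PromptCache:
    """Logical-layer cache: agents sharing a context reuse completions
    for identical prompts instead of re-running inference."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def _key(self, prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()

    def get_or_compute(self, prompt, infer):
        key = self._key(prompt)
        if key in self._store:
            self.hits += 1          # redundant inference avoided
        else:
            self.misses += 1
            self._store[key] = infer(prompt)
        return self._store[key]

cache = PromptCache()
fake_llm = lambda p: f"summary of: {p}"  # stand-in for a model call

# Two agents asking the same question trigger only one inference call.
cache.get_or_compute("Summarize ticket #123", fake_llm)
cache.get_or_compute("Summarize ticket #123", fake_llm)
print(cache.hits, cache.misses)  # 1 1
```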
Agent-to-Agent Communication
• Physical/transport layer: HTTP-based JSON-RPC 2.0 with async-first patterns (polling, SSE, webhooks)
• Logical layer: pipelined and host-agent orchestration workflows
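At the transport layer, the JSON-RPC 2.0 envelope is small enough to sketch directly. The helpers below build a request and a matching response; the `"tasks/send"` method name and the params are illustrative assumptions, not a normative part of any particular A2A implementation.

```python
import json
import uuid

def a2a_request(method: str, params: dict) -> str:
    """Build a JSON-RPC 2.0 request envelope for an agent-to-agent call."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),  # lets the caller correlate the async reply
        "method": method,
        "params": params,
    })

def a2a_response(request_id: str, result) -> str:
    """Build the JSON-RPC 2.0 response carrying the same id."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})

# Example exchange between a host agent and a worker agent.
req = json.loads(a2a_request("tasks/send", {"task": "summarize_report"}))
resp = json.loads(a2a_response(req["id"], {"status": "accepted"}))
print(req["method"], resp["result"]["status"])  # tasks/send accepted
```

Because the `id` field correlates requests with replies, the same envelope works unchanged over the async-first patterns mentioned above: the response can arrive via polling, an SSE stream, or a webhook callback.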
We’ll illustrate these concepts with benchmark data and a pre-recorded demo showcasing a full agentic pipeline—MCP function calling, A2A orchestration, and memory-driven optimization. Finally, we’ll close with a look at future directions for agentic systems and open the floor for questions.