2026-06-18, Room D0207 (capacity 90)
You're paying GPT-4 prices for queries a smaller model could handle. Model routing promises 40-70% savings.
But there's a catch: academic approaches assume humans label every response "good" or "bad." In production, those labels rarely exist.
This talk shows how to build a Bayesian router using Thompson Sampling that learns without labels. The key: a Composite Reward Function that scores responses automatically — Did the output parse? How fast was it? Did the agent retry? Three signals, zero human effort, one score to update routing.
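The idea above can be sketched in a few lines. This is an illustrative outline only, not the talk's actual implementation: the signal weights, the 10-second latency normalizer, and the class names are assumptions chosen for the example, and the composite reward is treated as a fractional success when updating a per-model Beta posterior.

```python
import random

# Hypothetical composite reward: three automatic signals, one score in [0, 1].
# Weights (0.5 / 0.3 / 0.2) and the 10 s latency cap are illustrative.
def composite_reward(parsed_ok: bool, latency_s: float, retried: bool) -> float:
    score = 0.5 if parsed_ok else 0.0                 # did the output parse?
    score += 0.3 * max(0.0, 1.0 - latency_s / 10.0)   # faster is better
    score += 0.2 if not retried else 0.0              # no retry needed
    return score

class ThompsonRouter:
    """Beta-Bernoulli Thompson Sampling over candidate models."""
    def __init__(self, models):
        # One Beta(alpha, beta) posterior per model, uniform prior.
        self.posteriors = {m: [1.0, 1.0] for m in models}

    def choose(self) -> str:
        # Sample a plausible success rate from each posterior; route to the best.
        samples = {m: random.betavariate(a, b)
                   for m, (a, b) in self.posteriors.items()}
        return max(samples, key=samples.get)

    def update(self, model: str, reward: float):
        # Fractional reward splits between the success and failure counts.
        self.posteriors[model][0] += reward
        self.posteriors[model][1] += 1.0 - reward
```

In use: `m = router.choose()`, run the query on model `m`, then `router.update(m, composite_reward(parsed_ok, latency_s, retried))`. No human label appears anywhere in the loop.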
We also address production realities often skipped in research:
- Model rot: Adapting when a provider degrades using decaying memory
- Cold-start: Converging in 20 queries instead of 100 using expert priors
- Safety: Continuous shadow evaluation to catch accuracy regressions early
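The first two items above can be combined in one small change to the posterior update. The sketch below is a hypothetical illustration, assuming a Beta-posterior arm as in standard Thompson Sampling; the decay constant and the example prior are made-up values, not the talk's tuned numbers.

```python
class DecayingBetaArm:
    """Beta posterior with exponential forgetting and an informative prior."""
    def __init__(self, prior_alpha: float = 1.0, prior_beta: float = 1.0,
                 decay: float = 0.99):
        # Expert prior, e.g. Beta(8, 2) encodes "we expect ~80% success",
        # so routing can stabilize after tens of queries instead of hundreds.
        self.alpha = prior_alpha
        self.beta = prior_beta
        self.decay = decay

    def update(self, reward: float):
        # Decay old evidence before adding new: recent outcomes dominate,
        # so a sudden provider degradation ("model rot") moves the posterior
        # within roughly 1/(1 - decay) observations instead of being
        # averaged away by months of stale history.
        self.alpha = self.alpha * self.decay + reward
        self.beta = self.beta * self.decay + (1.0 - reward)

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)
```

With `decay=1.0` this reduces to the ordinary Beta update; lowering it trades long-run estimate precision for responsiveness to drift.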
Result: 40-50% cost cut, <1% accuracy drop.
Prerequisites: Python, LLM API familiarity.
GitHub: https://github.com/shrinidhi-mahishi/model_routing
PyPI: pip install bayesian-router
Shrinidhi Mahishi is a Principal Data Scientist at Red Hat with 12+ years of experience building scalable AI and machine learning systems. His work focuses on Generative AI, Agentic AI, LLM applications, and Retrieval-Augmented Generation (RAG) for enterprise platforms. He is passionate about open-source innovation and building practical AI systems that empower developers. Shrinidhi also holds two patents in anomaly detection and automated correlation analysis.