DevConf.US 2025

Smarter RAG, Smaller Bill: Optimize for Performance and Price
2025-09-20, Ladd Room (Capacity 96)

RAG apps can save up to 60% of the cost compared to standard LLMs. In this talk, I will show you a way to save even more on top of that, because 2025 will be all about optimising the cost of building LLMs and their apps. RAGCache tackles the latency and compute bottlenecks of RAG with three techniques:
- Dynamic Knowledge Caching: Stores intermediate states in a structured knowledge tree, balancing GPU and host memory usage.
- Efficient Replacement Policy: Tailored for LLM inference and RAG retrieval patterns.
- Seamless Overlap: Combines retrieval and inference to minimize latency.
Integrating RAGCache with tools like vLLM and Faiss delivers:
- 4x Faster Time to First Token (TTFT).
- 2.1x Throughput Boost, optimizing latency and computational efficiency.
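To make the knowledge tree idea concrete, here is a minimal Python sketch of a prefix tree keyed by the ordered IDs of retrieved documents. It is an illustration under stated assumptions, not RAGCache's actual code: the names `KnowledgeTree`, `Node`, and `gpu_slots` are hypothetical, cached states are plain placeholder strings, and a simple least-recently-used rule stands in for the tailored replacement policy.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    """Cached state for one document, valid only after its parent prefix."""
    doc_id: str
    depth: int = 0
    state: object = None      # placeholder for the real intermediate state
    tier: str = "gpu"         # "gpu" or "host" (hypothetical labels)
    last_used: int = 0
    children: dict = field(default_factory=dict)


class KnowledgeTree:
    """Hypothetical sketch: prefix tree over ordered document IDs."""

    def __init__(self, gpu_slots: int):
        self.root = Node(doc_id="<root>")
        self.gpu_slots = gpu_slots    # assumed budget: max nodes kept on GPU
        self._clock = count(1)        # one tick per lookup or insert

    def longest_prefix(self, doc_ids):
        """Return cached nodes matching the longest prefix of doc_ids."""
        tick, node, hits = next(self._clock), self.root, []
        for doc_id in doc_ids:
            node = node.children.get(doc_id)
            if node is None:
                break
            node.last_used = tick
            hits.append(node)
        return hits

    def insert(self, doc_ids, states):
        """Cache one state per document along the path, then rebalance tiers."""
        tick, node = next(self._clock), self.root
        for doc_id, state in zip(doc_ids, states):
            child = node.children.get(doc_id)
            if child is None:
                child = Node(doc_id=doc_id, depth=node.depth + 1, state=state)
                node.children[doc_id] = child
            child.last_used = tick
            node = child
        self._rebalance()

    def _all_nodes(self):
        stack = list(self.root.children.values())
        while stack:
            node = stack.pop()
            yield node
            stack.extend(node.children.values())

    def _rebalance(self):
        """Demote least recently used GPU nodes to host memory, deepest first.

        Plain LRU is a stand-in here, not the policy from the talk.
        """
        gpu_nodes = sorted(
            (n for n in self._all_nodes() if n.tier == "gpu"),
            key=lambda n: (n.last_used, -n.depth),
        )
        excess = max(len(gpu_nodes) - self.gpu_slots, 0)
        for node in gpu_nodes[:excess]:
            node.tier = "host"


tree = KnowledgeTree(gpu_slots=2)
tree.insert(["doc_a", "doc_b"], ["state_a", "state_ab"])
tree.insert(["doc_a", "doc_c"], ["state_a", "state_ac"])
hits = tree.longest_prefix(["doc_a", "doc_c", "doc_d"])
print([n.doc_id for n in hits])  # ['doc_a', 'doc_c']
```

In this sketch, a request that retrieves `doc_a`, `doc_c`, `doc_d` can reuse the cached states for the first two documents and only needs fresh computation for `doc_d`. The older `doc_b` branch is demoted to the host tier once the GPU budget of two nodes is exceeded.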
The talk goes through:
1. Current challenges of RAG
2. A solution that reduces cost and improves user experience
3. How does it work?
4. How well does it perform?
5. What are the key benefits?
6. Lastly, a few real-world applications


What level of experience should the audience have to best understand your session?

Intermediate - attendees should be familiar with the subject

Keerthi is an AI aficionado by passion and a Data Scientist by profession, with wide-ranging experience in building LLM applications with different models and researching new fine-tuning methods. I consider myself a think tank with industry-ready, domain-specific skills to apply and innovate for the betterment of society. I have also spoken at various meetups, podcasts, and technical events about LLMs and Data Science. My previous talk, on LLM Security, was at the Techtonic 2.0 event in Bangalore, India. Connect with me to collaborate on AI projects and participate in hackathons to build for society.
