Smarter RAG, Smaller Bill: Optimize for Performance and Price DevConf.US 2025

2025-09-20 09:20–09:35, Ladd Room (Capacity 170)

RAG apps can save up to 60% of the cost of standard LLMs. In this talk, I will show you a way to save even more on top of that, because 2025 is all about optimising the cost of building LLMs and their apps. RAG still has bottlenecks: injecting retrieved documents into the prompt makes sequences long, which drives up computation and memory costs. RAGCache tackles these bottlenecks with cutting-edge techniques:
- Dynamic Knowledge Caching: Caches the intermediate states (key-value tensors) of retrieved documents in a structured knowledge tree, balancing GPU and host memory usage.
- Efficient Replacement Policy: Decides which cached states to keep or evict, tailored to LLM inference characteristics and RAG retrieval patterns.
- Seamless Overlap: Overlaps the retrieval and inference steps to minimize end-to-end latency.
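To make the caching and replacement bullets more concrete, here is a minimal Python sketch under stated assumptions; it is not RAGCache's code or API. `KnowledgeTree`, `insert`, `lookup`, and the token capacities are hypothetical names. Each node stands for one retrieved document's cached state, and a node's path from the root is the ordered list of documents before it, because a document's attention state depends on the tokens that precede it. Nodes sit in a GPU tier or a host tier. When the GPU tier overflows, the lowest-priority node with no GPU-resident children is demoted to host memory. When host memory overflows, the lowest-priority host leaf is dropped. The priority is a Greedy-Dual-Size-Frequency (GDSF) style score with uniform recomputation cost, a simplified stand-in for RAGCache's own policy. Real KV tensors and the retrieval/inference overlap are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One cached document (hypothetical). Its path from the root is the
    ordered list of documents that preceded it in the prompt."""
    doc_id: str
    size: int                      # tokens of KV state this node would hold
    parent: "Node | None" = None
    children: dict = field(default_factory=dict)
    tier: str = "gpu"              # "gpu" or "host"
    freq: int = 0
    priority: float = 0.0


class KnowledgeTree:
    """Hypothetical two-tier prefix cache with a GDSF-style priority.

    Invariant: a GPU-resident node's parent is also on the GPU (or is the root),
    so every host node's subtree lives entirely in host memory.
    """

    def __init__(self, gpu_tokens: int, host_tokens: int):
        self.root = Node("<root>", 0)
        self.capacity = {"gpu": gpu_tokens, "host": host_tokens}
        self.used = {"gpu": 0, "host": 0}
        self.clock = 0.0  # GDSF aging value: priority of the most recent victim

    def _touch(self, node: Node) -> None:
        # GDSF with uniform cost: small, frequently reused entries rank higher.
        node.freq += 1
        node.priority = self.clock + node.freq / node.size

    def _nodes(self):
        # Iterate over every cached node (the root itself is excluded).
        stack = list(self.root.children.values())
        while stack:
            node = stack.pop()
            yield node
            stack.extend(node.children.values())

    def lookup(self, doc_ids: list[str]) -> list[str]:
        """Return the longest cached prefix of doc_ids (document order matters)."""
        node, hit = self.root, []
        for d in doc_ids:
            node = node.children.get(d)
            if node is None:
                break
            hit.append(d)
        return hit

    def insert(self, doc_ids: list[str], sizes: dict[str, int]) -> None:
        """Cache states for doc_ids in order, reusing and promoting any cached prefix."""
        node = self.root
        for d in doc_ids:
            child = node.children.get(d)
            if child is None:
                child = Node(d, sizes[d], parent=node)
                node.children[d] = child
                self.used["gpu"] += child.size
            elif child.tier == "host":
                # Swap the reused state back to the GPU, keeping the invariant.
                child.tier = "gpu"
                self.used["host"] -= child.size
                self.used["gpu"] += child.size
            self._touch(child)
            node = child
        self._evict()

    def _evict(self) -> None:
        # GPU overflow: demote the lowest-priority GPU node with no GPU children.
        while self.used["gpu"] > self.capacity["gpu"]:
            victim = min(
                (n for n in self._nodes()
                 if n.tier == "gpu" and all(c.tier == "host" for c in n.children.values())),
                key=lambda n: n.priority,
            )
            victim.tier = "host"
            self.used["gpu"] -= victim.size
            self.used["host"] += victim.size
            self.clock = victim.priority
        # Host overflow: drop the lowest-priority host leaf entirely.
        while self.used["host"] > self.capacity["host"]:
            victim = min(
                (n for n in self._nodes() if n.tier == "host" and not n.children),
                key=lambda n: n.priority,
            )
            del victim.parent.children[victim.doc_id]
            self.used["host"] -= victim.size
            self.clock = victim.priority
```

In this sketch a request would call `lookup` to find the reusable prefix, run prefill only for the remaining documents, then call `insert` with the full ordered list. Because `insert` swaps any host-resident node on the path back to the GPU, every GPU node's parent is also on the GPU, which guarantees the host tier always has an evictable leaf when it overflows.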
Integrating RAGCache with vLLM and Faiss delivers, compared with vLLM and Faiss alone:
- Up to 4x faster Time to First Token (TTFT).
- Up to 2.1x higher throughput.
The talk covers:
1. Current challenges of RAG
2. A solution that reduces cost and improves user experience
3. How it works
4. How well it performs
5. Its key benefits
6. Lastly, a few real-world applications

Audience experience level: Intermediate (attendees should be familiar with the subject)


KEERTHI UDAYAKUMAR

I am an AI aficionado by passion and a data scientist by profession, with wide-ranging experience building LLM applications on different models and researching new fine-tuning methods. I consider myself a think tank, with industry-ready, domain-specific skills to apply and innovate for the betterment of society. I have also spoken at various meetups, podcasts, and technical events about LLMs and data science. My previous talk, on LLM security, was at the Techtonic 2.0 event in Bangalore, India. Connect with me to collaborate on AI projects and to join hackathons that build for society.

This speaker also appears in:

ZenZone: AI-Powered Peace of Mind
