KEERTHI UDAYAKUMAR
I am an AI aficionado by passion and a Data Scientist by profession, with broad experience building LLM applications across different models and researching new fine-tuning methods. I bring industry-ready, domain-specific skills to apply and innovate for the betterment of society. I have spoken at various meetups, podcasts, and technical events on LLMs and Data Science; my most recent talk, at the Techtonic 2.0 event in Bangalore, India, covered LLM security. Connect with me to collaborate on AI projects and to team up for hackathons that build for society.
ASSOCIATE DATA SCIENTIST
Company or affiliation: Red Hat
Sessions
This project assesses the feasibility and effectiveness of an AI-enabled chatbot for mental health screening, built on Large Language Models (LLMs), Natural Language Processing (NLP), and Deep Learning models. The web application integrates social attributes to aid users with mental health concerns, offering self-assistance through personalized assessments. The core strategy centers on fostering an "Optimistic Presence": an AI-driven virtual assistant capable of empathic conversation, active listening, and emotional state analysis. The methodology emulates human mental health professionals, assessing a user's condition through various cues and offering tailored therapeutic interventions for stressed individuals. Integration with health records via Azure Database for PostgreSQL enables collaboration with human providers for comprehensive care. This solution seeks to extend always-available virtual AI therapy, bringing technology-driven, personalized mental health support to students, working professionals, and the many hidden victims of poor mental health.
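To make the "emotional state analysis" step concrete, here is a minimal sketch of one chatbot turn, assuming a Hugging Face emotion classifier and a generic LLM backend. The model name and the llm.generate() call are illustrative placeholders, not the project's actual stack.

```python
# One chatbot turn: classify the user's emotional state, then build an
# empathy-conditioned prompt for the LLM.
from transformers import pipeline

# Any off-the-shelf emotion classifier can fill this role; this checkpoint
# is one publicly available option (an assumption, not the talk's choice).
emotion_classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
)

def build_prompt(user_message: str) -> str:
    """Detect the dominant emotion, then condition the assistant on it."""
    emotion = emotion_classifier(user_message)[0]["label"]
    return (
        "You are an empathic mental-health support assistant. "
        f"The user currently appears to feel '{emotion}'. "
        "Acknowledge the feeling, listen actively, and suggest one gentle, "
        "concrete next step. Do not diagnose or prescribe.\n\n"
        f"User: {user_message}\nAssistant:"
    )

# Example turn (hypothetical LLM handle):
# reply = llm.generate(build_prompt("I can't sleep and my exams start next week."))
```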
RAG apps can save up to 60% of the cost of standard LLM usage. In this talk, I will show you a way to save even more on top of that, because 2025 will be all about optimising the cost of building LLMs and their apps. RAGCache tackles RAG's key bottlenecks with cutting-edge techniques:
- Dynamic Knowledge Caching: stores intermediate states in a structured knowledge tree, balancing GPU and host memory usage (see the sketch after this list).
- Efficient Replacement Policy: tailored to LLM inference and RAG retrieval patterns.
- Seamless Overlap: combines retrieval and inference to minimize latency.
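Here is a toy sketch of the knowledge-tree idea: cache KV states keyed by the *sequence* of retrieved document IDs, so requests that share a document prefix reuse earlier work. The real system stores tensors across GPU and host memory; a dict-backed trie with LRU eviction stands in for both here.

```python
from collections import OrderedDict

class KnowledgeTreeCache:
    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self._store = OrderedDict()  # prefix tuple -> cached KV state

    def get(self, doc_ids: list[str]):
        """Return the KV state of the longest cached prefix, plus its length."""
        for end in range(len(doc_ids), 0, -1):
            key = tuple(doc_ids[:end])
            if key in self._store:
                self._store.move_to_end(key)  # mark as recently used
                return self._store[key], end
        return None, 0

    def put(self, doc_ids: list[str], kv_state) -> None:
        """Cache the KV state for this document prefix, evicting stale entries."""
        key = tuple(doc_ids)
        self._store[key] = kv_state
        self._store.move_to_end(key)
        while len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used prefix
```

Note that plain LRU keeps the sketch short; RAGCache's actual replacement policy also weighs recomputation cost and entry size, not just recency.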
Integrating RAGCache with tools like vLLM and Faiss delivers (see the pipeline sketch after this list):
- 4x faster Time to First Token (TTFT).
- 2.1x throughput boost, optimizing latency and computational efficiency.
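A minimal sketch of the kind of pipeline these numbers apply to, using real Faiss and vLLM APIs. The embedding dimension, model checkpoint, and three-document corpus are placeholders; in a real app the vectors come from an embedding model, not np.random. vLLM's enable_prefix_caching flag reuses KV states for shared prompt prefixes, which RAGCache-style caching extends further.

```python
import faiss
import numpy as np
from vllm import LLM, SamplingParams

dim = 384                                 # assumed embedding dimensionality
doc_texts = ["doc one ...", "doc two ...", "doc three ..."]  # placeholder corpus
doc_vecs = np.random.rand(len(doc_texts), dim).astype("float32")  # stand-in embeddings
index = faiss.IndexFlatIP(dim)            # exact inner-product search
index.add(doc_vecs)

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_prefix_caching=True)

def answer(question: str, query_vec: np.ndarray) -> str:
    # Retrieve the top-3 documents; keeping their order stable across
    # requests maximises shared prompt prefixes, and therefore cache hits.
    _, ids = index.search(query_vec.reshape(1, -1).astype("float32"), 3)
    context = "\n".join(doc_texts[i] for i in ids[0])
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    outputs = llm.generate([prompt], SamplingParams(max_tokens=256))
    return outputs[0].outputs[0].text
```

Ordering retrieved documents consistently is the cheap version of the insight RAGCache formalises: identical document prefixes mean reusable KV states.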
The talk goes through:
1. Current challenges of RAG
2. A solution that reduces cost and improves user experience
3. How does it work?
4. How well does it perform?
5. What are the key benefits?
6. Lastly, a few real-world applications