2025-03-01 – Raigad Room (Chanakya Building / School of Business)
LLMs have proven very useful, and they hold high potential in enterprises. However, evaluating these models remains a complex challenge and is one of the reasons LLMs are not adopted directly.
Responsible and ethical AI will be key for enterprises to adopt LLMs for their business needs.
Traditional metrics like perplexity or BLEU score often fail to capture the nuanced capabilities of LLMs in real-world applications.
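To make that limitation concrete, here is a minimal sketch of how these two traditional metrics are typically computed. It assumes the sacrebleu and transformers packages are installed; the example sentences and the gpt2 checkpoint are illustrative assumptions, not talk material.

```python
# Minimal sketch: the two "traditional" metrics mentioned above.
import math

import sacrebleu
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# BLEU: n-gram overlap between a candidate and reference texts.
candidate = ["The model summarizes the report accurately."]
references = [["The model produces an accurate summary of the report."]]
bleu = sacrebleu.corpus_bleu(candidate, references)
print(f"BLEU: {bleu.score:.1f}")  # measures surface overlap, not usefulness

# Perplexity: exponentiated average negative log-likelihood under a causal LM.
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The model summarizes the report accurately.", return_tensors="pt")
with torch.no_grad():
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"Perplexity: {math.exp(loss.item()):.1f}")  # measures fluency, not correctness
```

Both numbers are cheap to compute, which is exactly why they get over-used: neither says anything about factuality, safety, or task success.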
This talk covers current best practices in benchmarking LLMs, the limitations of existing approaches, and emerging evaluation techniques.
We'll explore a range of qualitative and quantitative metrics, from task-specific benchmarks (e.g., code generation, summarization) to user-centric evaluations (e.g., coherence, creativity, bias detection), and discuss the importance of specialized benchmarks that test LLMs on ethical and explainability grounds.
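As one concrete illustration of a task-specific benchmark, the following is a minimal pass@1 harness for code generation. The task format, the passes_tests and pass_at_1 helpers, and the stubbed generate callable are all hypothetical, standing in for a real benchmark such as HumanEval.

```python
# Sketch of a task-specific benchmark: pass@1 for code generation.
from typing import Callable

def passes_tests(code: str, test: str) -> bool:
    """Run the generated code plus its unit test in a fresh namespace."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # run the candidate solution (unsafe outside a sandbox;
        exec(test, namespace)  # real harnesses isolate execution)
        return True
    except Exception:
        return False

def pass_at_1(tasks: list[dict], generate: Callable[[str], str]) -> float:
    """Fraction of tasks whose first completion passes its unit test."""
    passed = sum(passes_tests(generate(t["prompt"]), t["test"]) for t in tasks)
    return passed / len(tasks)

# Tiny illustrative task; a real benchmark has hundreds of these.
tasks = [{
    "prompt": "Write a function add(a, b) that returns a + b.",
    "test": "assert add(2, 3) == 5",
}]
print(pass_at_1(tasks, lambda prompt: "def add(a, b):\n    return a + b"))
```

Unlike BLEU or perplexity, a harness like this scores actual task success, which is the kind of signal enterprises need before adopting a model.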
Outcome: The audience will understand how to choose LLMs that strike the right balance of accuracy, efficiency, and fairness, and will learn what has improved in Granite 3.0 to make it a better LLM.
Intermediate – attendees should be familiar with the subject
I am an AI evangelist working at Red Hat. I am really positive about what AI has to offer the world, and I love to talk about and discuss real-life applications of AI. Red Hat offers RHEL AI, which allows the community and customers to make the best use of LLMs in their enterprises.