Conf42 Machine Learning 2025 - Online

- premiere 5PM GMT

Beyond BLEU and ROUGE — Modern Approaches to Evaluating LLMs and AI Systems

Abstract

Traditional metrics like BLEU and ROUGE fall short in capturing advanced LLM capabilities. In this talk, discover modern methods—from benchmarks like MMLU and TruthfulQA to real-world evaluations and human-in-the-loop insights—that better assess AI performance. Join us to rethink AI evaluation—now!
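As a rough illustration of why word-overlap metrics can mislead, the sketch below computes clipped unigram precision, the overlap count at the core of BLEU-1 (without the brevity penalty). It is not taken from the talk: the helper name unigram_precision and the three example sentences are assumptions made up for this example.

```python
from collections import Counter


def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: fraction of candidate tokens found in the reference."""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand_tokens)
    # Each candidate token is credited at most as often as it occurs in the reference.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return overlap / len(cand_tokens)


# Hypothetical example sentences, chosen only to show the failure mode.
reference = "the capital of france is paris"
paraphrase = "paris serves as the french capital"  # correct, different wording
wrong = "the capital of france is lyon"            # incorrect, near-identical wording

paraphrase_score = unigram_precision(paraphrase, reference)
wrong_score = unigram_precision(wrong, reference)

print(f"correct paraphrase: {paraphrase_score:.2f}")
print(f"wrong answer:       {wrong_score:.2f}")
```

Under these assumptions the factually wrong sentence scores higher than the correct paraphrase, because the metric rewards shared surface tokens rather than meaning. That is the kind of gap that benchmark-based and human-in-the-loop evaluation aim to close.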

...

Alok Ranjan

Engineering Manager @ Dropbox



