LLM accuracy is a challenging topic to address and is far more multidimensional than a single accuracy score. In this talk we’ll dive deeper into how to measure LLM-related metrics, going through examples, case studies and techniques that go beyond one aggregate score. We’ll discuss how to create, track and revise micro LLM metrics that give granular direction for improving LLMs.
Interview:
What is the focus of your work?
I'm mainly focused on applied research and helping teams build in the LLM and conversational AI space. The goal is to look at industry challenges and create accessible, practical research and guides that help us build better conversational experiences.
What’s the motivation for your talk?
Every problem in the AI space, like any use case, has unique challenges. There has been a lot of focus on catch-all metrics, but once you've been serving production traffic you'll find edge cases and scenarios you want to measure. This is where micro metrics can help: they define the specific outputs and behaviors that you want to track for your use case.
Who is your talk for?
Intermediate to senior developers and product leads. The concepts are pretty standard from a product, ML and software perspective; the learning comes from thinking through the provided case studies and how they can be applied to your own use cases.
What do you think is the next big disruption in software?
Figuring out how to prompt and steer multimodal models. Speech-to-speech is very exciting, but businesses need the ability to check for hallucinations and integrate with other services before responding to users.
Speaker
Denys Linkov
Head of ML @Voiceflow, LinkedIn Learning Instructor, ML Advisor and Instructor, Previously @LinkedIn
Denys leads Enterprise AI at Voiceflow and is an ML startup advisor and LinkedIn Learning course instructor. He's worked with 50+ enterprises on their conversational AI journey, and his Gen AI courses have helped 150,000+ learners build key skills. He's worked across the AI product stack: building key ML systems hands-on, managing product delivery teams, and working directly with customers on best practices.