The generative AI landscape is changing rapidly, with new models appearing on the horizon every few days. However, these models share many common patterns and execution phases in their hardware and software characteristics.
In this talk, we will use Llama2 as a base model to highlight these basic characteristics. We will present a detailed analysis of Llama2 workload performance on a platform powered by the AMD EPYC processor, with all analysis completed on the latest multi-core CPU servers. This includes a scalability analysis as well as a detailed phase-by-phase analysis to detect software and hardware bottlenecks at each stage.
Based on this analysis, we will share our recommendations for tuning, optimization, and deployment best practices for the software stack, taking into account the hardware on which it is deployed. We will then extend the analysis to Llama3 using architecture-relevant software optimizations and share deployment best practices for the most common AI inference use cases.
Speaker
Rema Hariharan
Principal Engineer @AMD, Seasoned Performance Engineer With a Base in Quantitative Sciences and a Penchant for Root-Causing
Dr. Rema Hariharan is an engineer known for her quantitative approach to solving complex engineering challenges. With a foundation in engineering and advanced expertise in operations research, her career spans diverse optimization problems, from inventory control and credit management to in-depth performance analysis of network systems and computer hardware.
Beginning her career at AT&T Bell Labs, Dr. Hariharan has contributed her expertise to leading technology companies, including Sun Microsystems, eBay, and AMD. In her current role at AMD, she focuses on optimizing the performance of AI models on AMD hardware, driving efficiency and innovation in the field.