Unleashing Llama's Potential: CPU-Based Fine-Tuning

The generative AI landscape is changing rapidly, with new models appearing on the horizon every few days. However, the hardware and software characteristics of these models share many patterns and execution phases.

In this talk, we will use Llama2 as the base model to highlight basic workload characterization. We will present a detailed analysis of Llama2 workload performance on a platform powered by the AMD EPYC processor. All our analysis was completed using the latest multi-core CPU servers. This includes scalability analysis as well as detailed phase-by-phase analysis to detect software and hardware bottlenecks at various stages.

Based on this, we will share our recommendations for tuning, optimization, and deployment best practices for the software stack, taking into account the hardware on which it is deployed. We extend our analysis to Llama3 using architecture-relevant software optimizations and share the best deployment practices relevant to most AI inference deployment use cases.


Speaker

Rema Hariharan

Principal Engineer @AMD, Seasoned Performance Engineer With a Base in Quantitative Sciences and a Penchant for Root-Causing

Dr. Rema Hariharan is an engineer known for her quantitative approach to solving complex engineering challenges. With a foundation in Engineering and advanced expertise in Operations Research, her career spans diverse optimizations, from inventory control and credit management to in-depth performance analysis of network systems and computer hardware.

Beginning her career at AT&T Bell Labs, Dr. Hariharan has contributed her expertise to leading technology companies, including Sun Microsystems, eBay, and AMD. In her current role at AMD, she focuses on optimizing the performance of AI models on AMD hardware, driving efficiency and innovation in the field.

From the same track

Session Hybrid cloud

Evaluating and Deploying State-of-the-Art Hardware to Meet the Challenges of Modern Workloads

Wednesday Nov 20 / 01:35PM PST

At GEICO, we are on a journey to entirely modernize our infrastructure. We are building an open-source, cloud-agnostic hybrid stack to run across public and on-prem private cloud infrastructure without having to expose vendor-specific stacks to our application developers.

Rebecca Weekly

VP of Infrastructure @GEICO

Session AI HW/SW optimization

Maximizing Deep Learning Performance on CPUs using Modern Architectures

Wednesday Nov 20 / 11:45AM PST

As deep learning continues to drive advancements across various industries, efficiently navigating the landscape of specialized AI hardware has a huge impact on cost and speed of operation.

Bibek Bhattarai

AI Technical Lead @Intel, Computer Scientist Invested in Hardware-Software Optimization, Building Scalable Data Analytics, Mining, and Learning Systems

Session

High-Resolution Platform Observability

Wednesday Nov 20 / 03:55PM PST

Many observability tools fail to provide us with the relevant insights for understanding hardware health and utilization.

Brian Martin

Co-founder and Software Engineer @IOP Systems, Focused on High-Performance Software and Systems, Previously @Twitter

Session RISC-V

Optimizing Custom Workloads with RISC-V

Wednesday Nov 20 / 02:45PM PST

This talk will explore how the RISC-V architecture can accelerate custom workloads, focusing on AI/ML applications. We’ll start by examining the RISC-V ecosystem and its increasing relevance in the software development landscape.

Ludovic Henry

Member of Technical Staff @Rivos, Performance-Minded Engineer, Hardware & Software, Previously @Xamarin, @Microsoft, @Datadog