Maximizing Deep Learning Performance on CPUs using Modern Architectures

As deep learning continues to drive advancements across industries, navigating the landscape of specialized AI hardware efficiently has a major impact on the cost and speed of operation. Moreover, unleashing the full potential of this hardware through the appropriate software stack can be daunting.

This talk explores the advancements in modern CPU processors for enhanced AI capabilities and acceleration of underlying computation elements, specifically General Matrix Multiply (GEMM) operations. It will dive deep into the Intel Advanced Matrix Extensions (AMX) built into modern data-center CPUs and how to use them to perform efficient low-precision matrix operations. Additionally, we will explore software tools and frameworks that unlock the full performance of these accelerators, offering actionable insights for kernel developers, framework engineers, and data scientists.
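To make the low-precision GEMM pattern concrete, here is a minimal NumPy sketch of the computation AMX accelerates: int8 inputs multiplied tile by tile and accumulated into int32. This is an illustration of the arithmetic only, not actual AMX intrinsics (real AMX operates on hardware tile registers via TMUL instructions such as TDPBSSD); the function name and tile size are chosen for the example.

```python
import numpy as np

def int8_gemm(A: np.ndarray, B: np.ndarray, tile: int = 16) -> np.ndarray:
    """Tiled int8 x int8 -> int32 GEMM, mimicking the AMX accumulation pattern.

    Illustration only: real AMX performs this on 1 KB tile registers in
    hardware; here we just show the low-precision math with wide accumulation.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    # Accumulate in int32 so int8 products cannot overflow.
    C = np.zeros((M, N), dtype=np.int32)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                # Each tile-by-tile product is added into the int32 accumulator,
                # analogous to one AMX tile dot-product instruction.
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, k:k+tile].astype(np.int32)
                    @ B[k:k+tile, j:j+tile].astype(np.int32)
                )
    return C

rng = np.random.default_rng(0)
A = rng.integers(-128, 127, size=(64, 64), dtype=np.int8)
B = rng.integers(-128, 127, size=(64, 64), dtype=np.int8)
C = int8_gemm(A, B)
# The tiled result matches a plain wide-precision matrix multiply.
assert np.array_equal(C, A.astype(np.int32) @ B.astype(np.int32))
```

In practice, frameworks reach AMX through libraries such as oneDNN rather than hand-written kernels; the point of the sketch is the narrow-input, wide-accumulator structure that makes these instructions both fast and numerically safe.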


Speaker

Bibek Bhattarai

AI Technical Lead @Intel, Computer Scientist Invested in Hardware-Software Optimization, Building Scalable Data Analytics, Mining, and Learning Systems

Bibek is an AI Technical Lead at Intel, where he collaborates with customers to optimize the performance of their AI workloads across various deployment platforms, including cloud, on-premises, and hybrid environments. These workloads involve the pretraining, fine-tuning, and deployment of state-of-the-art deep learning models on cutting-edge AI-specialized hardware: CPUs, GPUs, and AI accelerators.

Bibek holds a Doctorate in Computer Science and Engineering from George Washington University, where his research focused on large-scale graph computing, mining, and learning technologies. He is keenly interested in HW/SW optimization of various workloads, including graph computing, deep learning, and parallel computing.


From the same track

Session Hybrid cloud

Evaluating and Deploying State-of-the-Art Hardware to Meet the Challenges of Modern Workloads

Wednesday Nov 20 / 01:35PM PST

At GEICO we are on a journey to entirely modernize our infrastructure. We are building an open-source, cloud-agnostic hybrid stack to run across public and on-premises private cloud infrastructure without exposing vendor-specific stacks to our application developers.


Rebecca Weekly

VP of Infrastructure @GEICO

Session

High-Resolution Platform Observability

Wednesday Nov 20 / 03:55PM PST

Many observability tools fail to provide us with the relevant insights for understanding hardware health and utilization.


Brian Martin

Co-founder and Software Engineer @IOP Systems, Focused on High-Performance Software and Systems, Previously @Twitter

Session RISC-V

Optimizing Custom Workloads with RISC-V

Wednesday Nov 20 / 02:45PM PST

This talk will explore how RISC-V architecture can accelerate custom workloads, focusing on AI/ML applications. We’ll start by examining the RISC-V ecosystem and its increasing relevance in the software development landscape.


Ludovic Henry

Member of Technical Staff @Rivos, Performance-Minded Engineer, Hardware & Software, Previously @Xamarin, @Microsoft, @Datadog

Session AI/ML

Unleashing Llama's Potential: CPU-Based Fine-Tuning

Wednesday Nov 20 / 10:35AM PST

The generative AI landscape is changing rapidly, with new models appearing on the horizon every few days. However, the hardware and software characteristics of these models share many similar patterns and execution phases.


Anil Rajput

AMD Fellow, Software System Design Eng. Java Committee Chair @SPEC, Architected Industry Standard Benchmarks and Authored Best Practices Guides for Platform Engineering and Cloud


Rema Hariharan

Principal Engineer @AMD, Seasoned Performance Engineer With a Base in Quantitative Sciences and a Penchant for Root-Causing