As deep learning continues to drive advancements across industries, navigating the landscape of specialized AI hardware efficiently has a major impact on the cost and speed of operations. Moreover, unlocking the full potential of this hardware through the appropriate software stack can be daunting.
This talk explores the advancements in modern CPU processors for enhanced AI capabilities and the acceleration of core compute kernels, specifically General Matrix Multiply (GEMM) operations. It dives deep into the Intel Advanced Matrix Extensions (AMX) built into modern data-center CPUs and shows how to use them to perform efficient low-precision matrix operations. Additionally, we will explore software tools and frameworks that unlock the full performance of these accelerators, offering actionable insights for kernel developers, framework engineers, and data scientists.
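As a taste of what the talk covers: AMX is typically reached through standard framework kernels rather than hand-written intrinsics. Below is a minimal sketch, assuming a recent PyTorch build with oneDNN running on an AMX-capable Xeon (4th Gen or newer), of the kind of low-precision GEMM these extensions accelerate.

```python
# Minimal sketch of a bf16 GEMM, assuming a recent PyTorch build with oneDNN
# on an AMX-capable Xeon. PyTorch dispatches bf16 matmuls to oneDNN, which
# emits AMX tile instructions when the CPU supports them; no intrinsics are
# needed in user code.
import torch

a = torch.randn(1024, 1024, dtype=torch.bfloat16)
b = torch.randn(1024, 1024, dtype=torch.bfloat16)

with torch.no_grad():
    c = torch.matmul(a, b)  # bf16 GEMM, AMX-accelerated where supported
print(c.dtype, c.shape)     # torch.bfloat16 torch.Size([1024, 1024])
```

The same pattern applies to int8 inference paths: quantized framework kernels route through oneDNN and pick up AMX automatically where the hardware and build support it.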
Speaker
Bibek Bhattarai
AI Technical Lead @Intel, Computer Scientist Focused on Hardware-Software Optimization and Building Scalable Data Analytics, Mining, and Learning Systems
Bibek is an AI Technical Lead at Intel, where he collaborates with customers to optimize the performance of their AI workloads across various deployment platforms, including cloud, on-premises, and hybrid environments. These workloads involve pretraining, fine-tuning, and deployment of state-of-the-art deep learning models on cutting-edge AI-specialized hardware, including CPUs, GPUs, and AI accelerators.
Bibek holds a doctorate in Computer Science and Engineering from George Washington University, where his research focused on large-scale graph computing, mining, and learning technologies. He is keenly interested in hardware-software optimization of workloads spanning graph computing, deep learning, and parallel computing.