As AI technologies continue to evolve, demand for processing both structured and unstructured data is growing rapidly across diverse industries. However, scaling AI batch processing to thousands of GPUs poses significant scalability, reliability, and observability challenges. These challenges are amplified further in high-throughput batch data processing with large language models (LLMs), given their computational demands and complexity.
In this presentation, we will demonstrate how we built a scalable and efficient batch inference stack with Ray at Anyscale. We begin by introducing Ray as a robust, scalable AI compute engine, followed by an in-depth look at Ray Data, a versatile, high-performance library for distributed deep learning data processing. Next, we will introduce vLLM, the leading open-source framework for LLM inference, and illustrate how combining Ray Data with vLLM offers an ideal solution for scalable batch inference.
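To make the pattern concrete, here is a minimal sketch of Ray Data driving vLLM for batch inference: each actor replica loads one vLLM engine, and Ray Data streams batches of prompts through the replica pool. The model name, toy dataset, replica count, and output path are illustrative assumptions, not the production configuration presented in the talk.

```python
import numpy as np
import ray
from vllm import LLM, SamplingParams

class VLLMPredictor:
    """Runs one vLLM engine per Ray actor replica."""

    def __init__(self):
        # Model name is an illustrative placeholder.
        self.llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
        self.sampling_params = SamplingParams(temperature=0.0, max_tokens=128)

    def __call__(self, batch: dict) -> dict:
        # Ray Data passes each batch as a dict of column name -> numpy array.
        outputs = self.llm.generate(list(batch["prompt"]), self.sampling_params)
        batch["response"] = np.array([o.outputs[0].text for o in outputs])
        return batch

# Toy dataset; in practice prompts would be read from Parquet, JSON, etc.
ds = ray.data.from_items(
    [{"prompt": f"Summarize record {i}."} for i in range(1_000)]
)

results = ds.map_batches(
    VLLMPredictor,
    concurrency=4,   # four actor replicas, each with its own vLLM engine
    num_gpus=1,      # one GPU per replica
    batch_size=64,
)
results.write_parquet("/tmp/batch_inference_outputs")
```

Loading the model once in the actor's constructor amortizes engine startup across every batch the replica processes, while `map_batches` handles scheduling and backpressure across the GPU pool.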
Speaker
Cody Yu
Staff Software Engineer and Tech Lead @Anyscale, Ex-Amazonian, vLLM Committer, Apache TVM PMC
Cody Yu is a staff software engineer and tech lead at Anyscale, working on LLM inference performance optimization. He is a community member of several popular open-source projects, including vLLM, SGLang, and Apache TVM. Before Anyscale, Cody was a founding engineer at BosonAI and a senior applied scientist at AWS AI. His recent research focuses on hardware acceleration and performance optimization for LLM systems.