Speaker: Charlotte Qi

Senior Staff Engineer @Meta

Ye (Charlotte) Qi is a production engineer on the AI inference team at Meta. She is one of the inference technical leads behind the initial Meta AI product launch and Llama 3 development.

With over six years of experience at Meta, she has run large-scale online inference systems for both recommendation (RecSys) and large language models across various organizations. Charlotte enjoys working at the multidisciplinary intersection of infrastructure, machine learning, product development, and DevOps, advancing end-to-end development from research to production. Her background spans the entire software stack, including hardware productionization, inference runtime optimization, distributed system reliability, experiment management, and service operations.

Prior to joining Meta, Charlotte earned her Master's degree from Carnegie Mellon University, specializing in large-scale machine learning systems and neural machine translation.

Session

Scaling Large Language Model Serving Infrastructure at Meta

Running LLMs requires significant computational power, which scales with model size and context length. We will discuss strategies for fitting models to various hardware configurations and share techniques for optimizing inference latency and throughput at Meta.
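The claim that compute and memory scale with model size and context length can be made concrete with back-of-the-envelope arithmetic. The sketch below estimates the KV-cache footprint of a decoder-only transformer; the shapes are illustrative assumptions (Llama-3-8B-like, with grouped-query attention), not figures from the talk:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   context_len, batch_size, bytes_per_elem=2):
    """Rough KV-cache size: 2 tensors (K and V) per layer, each
    num_kv_heads * head_dim elements per token, at bytes_per_elem
    (2 for fp16/bf16). Grows linearly with context length and batch size."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * context_len * batch_size

# Illustrative Llama-3-8B-like shapes: 32 layers, 8 KV heads (GQA), head_dim 128.
cache = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                       context_len=8192, batch_size=1)
print(f"{cache / 2**30:.2f} GiB")  # 1.00 GiB at an 8K context
```

Under these assumptions the cache costs 128 KiB per token, so an 8K context adds about 1 GiB per sequence on top of the model weights, which is why longer contexts and larger batches quickly dominate serving memory budgets.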


Date

Tuesday, Nov 19 / 10:35 AM PST (50 minutes)

Location

Ballroom BC
