Abstract
At Netflix, certain use cases demand the rapid transfer of massive datasets (for example, 50 TB) from offline to online systems. Doing this efficiently, without disrupting the applications that interact with our online systems, is a significant challenge. Traditional data transfer methods, such as running batch processing jobs that load data into online systems through PUT APIs, posed scalability and cost hurdles and often led to performance bottlenecks. To overcome these limitations, an innovative architectural solution was developed. The approach pre-processes offline data into an optimized format, the RocksDB SST file format, stages these files in the cloud, and enables direct, on-demand ingestion into the serving system.
This process necessitated navigating complex internal discussions and aligning diverse stakeholders on new technical strategies. It also required rapidly adapting initial prototypes to address urgent customer needs: the first priority was onboarding customers to the prototype quickly, after which efforts shifted towards building a robust, scalable production system. Crucially, cross-functional collaboration proved essential. Teams from various domains worked closely to define requirements, overcome challenges, and ensure seamless implementation.
Ultimately, this collaborative effort led to the successful deployment of a system that provides enhanced performance, reducing data deployment time by 99% (from days to just 30 minutes) and cutting costs by 70%. This presentation will delve into the journey of transforming data pipelines at scale, highlighting the key technical strategies, strategic decisions, and crucial team efforts that made this significant improvement possible.
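The pre-process, stage, and ingest flow described in this abstract can be illustrated with a toy sketch. It is not the Netflix implementation and does not use any RocksDB API: sorted JSON-lines files stand in for SST files, a local directory stands in for cloud staging, and every name (`shard_for`, `build_sorted_files`, `ServingShard`) is hypothetical. The point is only the shape of the idea: do the sorting and partitioning offline, then let each serving shard load a finished file wholesale instead of receiving one PUT per record.

```python
import json
import tempfile
import zlib
from pathlib import Path


def shard_for(key, num_shards):
    """Deterministically map a key to a shard (CRC32 is an assumption, not the real scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def build_sorted_files(records, staging_dir, num_shards):
    """Offline step: partition records by shard, sort each shard by key, write one file per shard."""
    shards = [[] for _ in range(num_shards)]
    for key, value in records:
        shards[shard_for(key, num_shards)].append((key, value))

    paths = []
    for i, shard in enumerate(shards):
        shard.sort(key=lambda kv: kv[0])  # sorted, immutable files, like SST files
        path = Path(staging_dir) / f"shard-{i:04d}.jsonl"
        with path.open("w", encoding="utf-8") as f:
            for key, value in shard:
                f.write(json.dumps([key, value]) + "\n")
        paths.append(path)
    return paths


class ServingShard:
    """Online step: a serving shard that bulk-loads a staged file on demand."""

    def __init__(self):
        self._data = {}

    def ingest_file(self, path):
        """Load a whole staged file, then swap it in with a single assignment."""
        loaded = {}
        with Path(path).open(encoding="utf-8") as f:
            for line in f:
                key, value = json.loads(line)
                loaded[key] = value
        self._data = loaded  # readers see the old dataset until this assignment
        return len(loaded)

    def get(self, key):
        return self._data.get(key)


if __name__ == "__main__":
    records = [("title:3", "C"), ("title:1", "A"), ("title:2", "B")]
    with tempfile.TemporaryDirectory() as staging_dir:
        files = build_sorted_files(records, staging_dir, num_shards=2)
        serving = [ServingShard() for _ in files]
        for shard, path in zip(serving, files):
            shard.ingest_file(path)
        print(serving[shard_for("title:1", 2)].get("title:1"))
```

In this sketch the expensive work (partitioning and sorting) happens entirely in the offline step, so the serving side only reads a prepared file. That is the property the abstract attributes to staging SST files for direct ingestion.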
Key Takeaways:
- Discover the challenges and solutions for large-scale data movement from batch storage to online serving systems.
- Understand the innovative architectural approach to improve data deployment efficiency.
- Learn how strategic decision-making and problem-solving play out in a high-pressure environment.
- See why cross-functional team collaboration matters in solving complex engineering problems.
Speaker

Rajasekhar Ummadisetty
Software Engineer @Netflix - Driving Scalable Data Abstractions, Leader in Distributed Systems and Data Management, Previously @Amazon and @Facebook
Raj Ummadisetty is a leading professional with over a decade of experience in solving distributed systems problems at scale. He currently leads the development of data abstractions at Netflix, focusing on scalable, high-performance solutions. Previously, Raj contributed significantly at Amazon and Facebook, where he honed his expertise in building systems at scale. He holds advanced degrees from Carnegie Mellon University and IIT Roorkee, providing a solid academic foundation. Known for his passion for continuous learning and staying abreast of industry trends, Raj consistently drives innovation and efficiency, making him a key player in distributed systems and data management.