Confidently Automating Changes Across a Diverse Fleet

Summary

Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcomed. Please reach out to info@qconsf.com with any comments or concerns.

The presentation, "Confidently Automating Changes Across a Diverse Fleet," is delivered by Casey Bleifer, a Senior Software Engineer at Netflix. It covers the challenges of automating code changes across a diverse software fleet and the strategies used to address them.

Key points of the presentation include:

  • Challenges in Software Migrations: Casey begins with a story illustrating the difficulty of achieving full adoption of a new software version, often leading to a long tail of migrations that can take months or years, leaving systems vulnerable.
  • Goals for Automation: The team aims to automate fleet-wide code changes in a week or less, and critical vulnerability fixes in two days, with minimal effort required from platform and software owners.
  • Automation Platform Development: A fleetwide automation platform was developed, encompassing campaigns (migrations) and targets (software requiring migration), with steps defined for each target.
  • Confidence Metrics: A confidence metric is used to ensure safety and reliability of automation, allowing automatic merging of pull requests if the confidence level is high.
  • Results and Ongoing Improvements: Initial exercises revealed a need for manual interventions in many cases. Improvements have reduced time for completing migrations and the percentage requiring manual intervention, though challenges remain.
  • Partnership and Collaboration: Casey emphasizes teamwork and cross-functional partnerships as crucial to the success of large-scale automation efforts.

Casey concludes with the idea that automating changes across diverse systems is a continuing journey, with significant progress already made but with more improvements to be achieved in the future.
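
The campaign, target, and confidence concepts in the summary can be illustrated with a small sketch. This is not Netflix's implementation: the class names, the signal names, the way confidence is computed (the fraction of passing signals), and the threshold are all assumptions made for illustration, since the summary does not say what feeds the confidence metric.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    AUTO_MERGE = "auto_merge"
    MANUAL_REVIEW = "manual_review"


@dataclass
class Target:
    """One piece of software that a campaign needs to migrate."""
    name: str
    # Hypothetical pass/fail signals; the real inputs are not described in the summary.
    signals: dict[str, bool] = field(default_factory=dict)

    def confidence(self) -> float:
        """Fraction of signals that passed; 0.0 when no signals exist."""
        if not self.signals:
            return 0.0
        return sum(self.signals.values()) / len(self.signals)


@dataclass
class Campaign:
    """A fleet-wide migration applied to many targets."""
    name: str
    targets: list[Target]
    # Assumed policy: only merge automatically when every signal passed.
    auto_merge_threshold: float = 1.0

    def plan(self) -> dict[str, Action]:
        """Decide, per target, whether its pull request can merge automatically."""
        return {
            t.name: (
                Action.AUTO_MERGE
                if t.confidence() >= self.auto_merge_threshold
                else Action.MANUAL_REVIEW
            )
            for t in self.targets
        }


campaign = Campaign(
    name="upgrade-logging-library",  # hypothetical campaign name
    targets=[
        Target("service-a", {"build_passed": True, "tests_passed": True}),
        Target("service-b", {"build_passed": True, "tests_passed": False}),
        Target("service-c"),  # no signals yet, so it falls back to manual review
    ],
)
plan = campaign.plan()
for name, action in plan.items():
    print(name, action.value)
```

Defaulting to manual review when signals are missing or failing keeps the sketch conservative: a target is merged automatically only when there is positive evidence that the change is safe.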

This is the end of the AI-generated content.


Abstract

Maintaining up-to-date and secure software across a polyglot fleet is a challenge for any engineering organization. Manual migrations and urgent updates disrupt productivity and require coordination across many teams. Often, these migrations take months or longer to accomplish, leaving the fleet vulnerable or forcing platform teams to maintain multiple software versions throughout the course of the migration. In this talk, I’ll share our ongoing journey to automate fleetwide changes in one week or less. This presentation will focus on:

  • How to think about safely orchestrating changes at scale
  • Designing automation for a diverse software ecosystem
  • Challenges we face as we work to reach our goal

Speaker

Casey Bleifer

Senior Software Engineer @Netflix

Casey Bleifer is a Senior Software Engineer on the Change Automation team at Netflix, where she focuses on automating code changes across the fleet. Prior to that, she contributed to Spinnaker during her time in delivery engineering at Netflix. Before Netflix, she was a frontend engineer at Uber working on the Uber Freight products. Outside of work Casey enjoys traveling, going to concerts, watching NBA/WNBA games, and being a theater nerd.

Date

Monday Nov 17 / 11:45AM PST (50 minutes)

Location

Ballroom BC

Topics

Fleet Management, Platform Engineering, CI/CD, Orchestration

Slides

Slides are not available

From the same track

Session CI/CD

Keeping the Mainline Green Across Diverse Language Monorepos

Monday Nov 17 / 02:45PM PST

At Uber’s scale, ensuring an always-green mainline while processing hundreds of changes per hour is a massive challenge, especially when those changes span multiple language monorepos supporting dozens of business-critical apps.

Dhruva Juloori

Senior Software Engineer @Uber, Core Contributor to SubmitQueue (Uber's CI System at Scale), Expert in Machine Learning, Distributed Systems, and Developer Productivity

Session Rust

Rust at the Core - Accelerating Polyglot SDK Development

Monday Nov 17 / 03:55PM PST

Developing SDKs for your users in multiple languages can come at a high cost - especially if you need to implement complex logic client side, but traditionally options for sharing logic across those languages have been quite limited.

Spencer Judge

Engineering Manager @Temporal Technologies, previously Senior Software Engineer @Transparent Systems, Senior Software Engineer @ Tableau Software

Session AI/ML

Secure Software Supply Chain: Risk Prediction at the Speed of Development

Monday Nov 17 / 01:35PM PST

The Platform That Sees Risk Before Code Does

Bishwajeet Paul

Architect, Platform Engineering @JPMorgan Chase - Specializing in Solving Complex Challenges for the Developer Community

Session AI

Designing AI Platforms for Reliability: Tools for Certainty, Agents for Discovery

Monday Nov 17 / 10:35AM PST

Modern AI platforms don’t have to choose between deterministic precision and probabilistic exploration—they need both.

Aaron Erickson

Senior Manager and Founder of the DGX Cloud Applied AI Lab @NVIDIA, Previously Engineer @ThoughtWorks, VP of Engineering @New Relic, CEO and Co-Founder @Orgspace

Session Vibe Coding

Directing a Swarm of Agents for Fun and Profit

Monday Nov 17 / 05:05PM PST

Coding agents are a new tool, which many of us are trying to figure out how to use effectively.

Adrian Cockcroft

Technology Advisor and Consultant @OrionX.net, Previously VP Open Source and Sustainability @Amazon, Cloud Architect @Netflix, Distinguished Engineer @eBay