This track, "The stories behind the incidents," takes you behind the curtain and into the heart of system meltdowns at some of the world's leading software companies. Learn directly from SREs about real-world, high-impact production failures at scale, including the immediate challenges of triage, diagnosis, and mitigation in complex distributed systems. From these stories, you’ll gain insights into the nature of real incidents and how skilled SREs recover from them.
You’ll learn how ambiguous, confusing, and uncertain incidents feel when you’re in the middle of them, and hear how engineers improvised solutions to restore service. You’ll also learn how fundamentally unpredictable incidents are and, consequently, why it is important to prepare to be surprised.
From this track
The Incident that Shaped Our Engineering Culture
Wednesday Nov 19 / 10:35AM PST
Details coming soon.
War Stories from the Front Lines of Production
Wednesday Nov 19 / 11:45AM PST
Details coming soon.

Vanessa Huerta Granda
Resiliency Manager @Enova, Co-Author of the Howie Guide on Post Incident Analysis
Rebuilding A System After a Security Breach
Wednesday Nov 19 / 01:35PM PST
Details coming soon.
The Bug That Never Should've Been: A Tale of Code Review, Testing, and Human Error
Wednesday Nov 19 / 02:45PM PST
Details coming soon.
Postmortem of a Downtime: What Was Learned from A Big Mistake
Wednesday Nov 19 / 03:55PM PST
Details coming soon.