SRE
Building Resilient Platforms: Insights from 20+ Years in Mission-Critical Infrastructure
Monday Nov 17 / 02:45PM PST
In this talk, Matthew will describe lessons learned from 20+ years of building scalable, secure, and stable infrastructure platforms for software in financial services (electronic trading, credit card processing, etc.). The talk is relevant to anyone building platforms for mission-critic
Matthew Liste
Head of Infrastructure @American Express, Previously @JPMorgan Chase and @Goldman Sachs
When Incidents Refuse to End
Wednesday Nov 19 / 11:45AM PST
As engineers, we’re used to managing failure, but long-running outages hit differently. They stretch teams, systems, and assumptions about how incidents “should” play out.
Vanessa Huerta Granda
Resiliency Manager @Enova, Co-Author of the Howie Guide on Post Incident Analysis
Week-Long Outage: Lifelong Lessons
Wednesday Nov 19 / 02:45PM PST
Routine database upgrades should be straightforward, especially with familiar, well-established technology. We were confident heading into our Elasticsearch upgrade, equipped with a solid plan and excited to see performance gains like we had seen from past upgrades.
Molly Struve
Staff Site Reliability Engineer @Netflix