In this track you’ll hear from engineers and architects who are living with distributed systems every day—shipping features, firefighting incidents, and pushing the limits of scale. Talks will focus on:
- Latency in the wild – How teams measure, understand, and reduce user-perceived latency, deal with long tails, and design APIs, retries, and backpressure that behave well under load and partial failure.
- Consistency trade-offs – Concrete stories of choosing (and sometimes regretting) consistency models, handling stale reads and write conflicts, and designing systems that remain understandable as they grow.
- Failure as a first-class concern – Postmortems, incident narratives, chaos experiments, and the operational practices that make failures survivable instead of catastrophic.
- Scaling beyond “it works on my cluster” – Techniques and patterns for evolving architectures under growth: sharding, multi-region deployment, multi-tenancy, and cost-aware scaling strategies.
- Tooling and observability – The metrics, tracing, logging, and testing approaches that make complex distributed behaviors visible and debuggable.
Expect candid war stories, design explanations grounded in trade-offs, and patterns you can take back to your own systems. The emphasis is not on idealized architectures, but on the pragmatic decisions and hard-won lessons that separate distributed systems that merely run from those you can trust in production.