Write-Ahead Intent Log: A Foundation for Efficient CDC at Scale

Abstract

As companies grow, so does the complexity of keeping distributed systems in sync. At DoorDash, we tackled this challenge while building a high-throughput, domain-oriented data platform for capturing changes across hundreds of services.

Instead of relying on traditional Change Data Capture (CDC) mechanisms, we designed a Write-Ahead Intent Log—a lightweight, domain-scoped event stream that records write intents before state is finalized. This intent-first design acts as a durable buffer between writers and downstream consumers, enabling scalable, resilient CDC without tight coupling to database internals or the need for full mutation history.
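The core idea above — durably recording a write intent before the state change is applied, so consumers tail the intent log rather than the database — can be sketched roughly as follows. This is a minimal illustration, not the talk's actual implementation; the names (`IntentLog`, `write_with_intent`) and the in-memory structures are assumptions.

```python
# Hypothetical sketch of intent-first logging: the intent is appended to a
# durable log *before* the state change is finalized. In a real system the
# log would be a replicated, domain-scoped stream, not a Python list.
import json
import time
import uuid


class IntentLog:
    """Minimal in-memory stand-in for a durable, append-only intent log."""

    def __init__(self):
        self.entries = []

    def append(self, intent):
        self.entries.append(intent)
        return intent["intent_id"]


def write_with_intent(log, store, key, value):
    # 1. Record the write intent durably first.
    intent = {
        "intent_id": str(uuid.uuid4()),
        "key": key,
        "payload": json.dumps(value),
        "ts": time.time(),
    }
    log.append(intent)
    # 2. Apply the write; downstream consumers read the log, not the store.
    store[key] = value
    return intent["intent_id"]


log, store = IntentLog(), {}
write_with_intent(log, store, "order:42", {"status": "placed"})
```

Because the intent is recorded first, a consumer that sees an intent with no corresponding final state knows the write may still be in flight or may have failed — which is what lets the log act as a buffer between writers and readers.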

In this talk, we’ll explore:

  • Efficiency: How publishing write intents instead of raw state changes shrinks payload size, reduces coordination overhead, and simplifies downstream processing.
  • Performance: How techniques such as per-key concurrency control, progressive consistency reads, and partition-aware retries let us hold tail latencies under one second at up to 1M writes per second per table.
  • Maintainability: A Protobuf-based key-value schema abstraction that’s easily consumed by polyglot teams, with built-in support for dead-letter queues, bounded retries, and future-facing features like schema evolution via Protobuf and a schema registry.
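The per-key concurrency control mentioned above can be illustrated with a small sketch: writes to the same key are serialized, while writes to different keys proceed in parallel, so a hot key cannot stall unrelated traffic. The class and method names here are illustrative assumptions, not the system's actual API.

```python
# Illustrative per-key concurrency control: one lock per key, so only
# writers touching the same key contend with each other. This is a sketch,
# not the talk's implementation.
import threading
from collections import defaultdict


class PerKeyExecutor:
    def __init__(self):
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the lock map itself

    def _lock_for(self, key):
        with self._guard:
            return self._locks[key]

    def submit(self, key, fn):
        # Serialize work per key; keys other than `key` are unaffected.
        with self._lock_for(key):
            return fn()


counts = defaultdict(int)
executor = PerKeyExecutor()


def bump(key):
    counts[key] += 1


threads = []
for i in range(8):
    key = f"k{i % 4}"
    threads.append(
        threading.Thread(target=executor.submit, args=(key, lambda k=key: bump(k)))
    )
for t in threads:
    t.start()
for t in threads:
    t.join()
```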

We’ll also share how this approach helped us avoid pitfalls like head-of-line blocking and schema drift—without relying on heavyweight infrastructure.
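The bounded retries and dead-letter queues mentioned above are one way to avoid head-of-line blocking on the consumer side: a persistently failing intent is moved aside after a fixed number of attempts instead of stalling the partition behind it. The sketch below is a hedged illustration; the function name and queue shape are assumptions.

```python
# Hypothetical sketch: bounded retries with a dead-letter queue. After
# max_attempts failures the intent is dead-lettered rather than retried
# forever, so later intents in the partition can make progress.

def consume_with_retries(intent, handler, dead_letters, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(intent)
        except Exception as err:
            if attempt == max_attempts:
                dead_letters.append((intent, str(err)))
                return None


dlq = []
calls = {"n": 0}


def flaky_handler(intent):
    # Simulates a downstream dependency that is persistently unavailable.
    calls["n"] += 1
    raise RuntimeError("downstream unavailable")


consume_with_retries({"intent_id": "abc"}, flaky_handler, dlq)
```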

Key Takeaways:

  1. Intent-First Logging Enables Loose Coupling: By separating write intent from final state, you can decouple services cleanly and unlock asynchronous integrations without overloading databases.
  2. Throughput and Latency Can Coexist: With the right concurrency controls and retry strategies, it's possible to achieve sub-second latencies even at millions of writes per second per table.
  3. Simplicity Scales: A domain-scoped, schema-defined log format is easier to evolve and operate than opaque change logs tied to database internals.

Speaker

Vinay Chella

Engineering Leader @DoorDash - Specializing in Distributed Systems, Streaming & Storage Platforms, Apache Cassandra Committer, Previously Engineering Leader @Netflix

Vinay Chella is an Engineering Leader at DoorDash, where he leads the Storage and Streaming Infrastructure organization that powers mission-critical systems across the marketplace. He focuses on building high-leverage, large-scale data platforms, having previously led Netflix’s Online Data Stores and co-invented the Data Gateway platform that supported the majority of Netflix’s online datastore traffic. A long-time Apache Cassandra committer, he has been actively engaged in the community while advancing distributed data systems at scale. At DoorDash, he is shaping the next generation of storage and streaming abstractions through initiatives like Taulu and EventBus, unified data and streaming access layers that enable developers to scale with confidence. He thrives in ambiguous problem spaces, driving rapid innovation and pioneering resilient platforms that balance cost, scale, and velocity.


Speaker

Akshat Goel

Staff Software Engineer, Core Infra at @DoorDash, Previously Senior Software Engineer @Amazon

Akshat Goel is a Staff Software Engineer at DoorDash, where he builds the Storage Access Platform, a unified abstraction layer powering all online data stores. He currently co-leads initiatives like Taulu, a Table-as-a-Service system that provides developers with scalable and reliable storage primitives. Previously, Akshat worked at Amazon on AWS Timestream, focusing on ingestion systems, developer SDKs, rate limiting, and encryption solutions. His passion lies in designing developer-friendly platforms that abstract away the complexity of large-scale distributed systems.


From the same track

Session

How Netflix Shapes our Fleet for Efficiency and Reliability

Wednesday Nov 19 / 11:45AM PST

Netflix runs on a complex multi-layer cloud architecture made up of thousands of services, caches, and databases. As hardware options, workload patterns, cost dynamics and the Netflix products evolve, the cost-optimal hardware and configuration for running our services is constantly changing.

Joseph Lynch

Principal Software Engineer @Netflix Building Highly-Reliable and High-Leverage Infrastructure Across Stateless and Stateful Services

Argha C

Staff Software Engineer @Netflix - Leading Netflix's Cloud Scalability Efforts for Live

Session

Realtime and Batch Processing of GPU Workloads

Wednesday Nov 19 / 01:35PM PST

SS&C Technologies runs 47 trillion dollars of assets on our global private cloud. We provide infrastructure primitives as well as platform-as-a-service offerings such as Kubernetes, Kafka, NiFi, and databases.

Joseph Stein

Principal Architect of Research & Development @SS&C Technologies, Previous Apache Kafka Committer and PMC Member

Session

From ms to µs: OSS Valkey Architecture Patterns for Modern AI

Wednesday Nov 19 / 02:45PM PST

As AI applications demand faster and more intelligent data access, traditional caching strategies are hitting performance and reliability limits. 

Dumanshu Goyal

Uber Technical Lead @Airbnb Powering $11B Transactions, Formerly @Google and @AWS

Session

One Platform to Serve Them All: Autoscaling Multi-Model LLM Serving

Wednesday Nov 19 / 10:35AM PST

AI teams are moving to self-hosted inference away from hosted LLMs as fine-tuning drives model performance. The catch is scale: hundreds of variants create long-tail traffic, cold starts, and duplicated stacks.

Meryem Arik

Co-Founder and CEO @Doubleword (Previously TitanML), Recognized as a Technology Leader in Forbes 30 Under 30, Recovering Physicist