From ms to µs: OSS Valkey Architecture Patterns for Modern AI

Abstract

As AI applications demand faster and more intelligent data access, traditional caching strategies are hitting performance and reliability limits.

This talk presents architecture patterns that cut data-access latency from milliseconds to microseconds using Valkey Cluster, an open-source, Redis-compatible, in-memory datastore. Learn when to use proxy-based versus direct-access caching, how to avoid hidden reliability pitfalls in sharded systems, and how to optimize price-performance at scale. Backed by the Linux Foundation, Valkey offers rich data structures and community-driven innovation.

Whether you’re building GenAI services or scaling existing platforms, this session delivers actionable patterns to improve speed, resilience, and efficiency.

Interview:

What is your session about, and why is it important for senior software developers?

Imagine your in-memory system is a Formula 1 engine. We've made the engine incredibly fast, but now the racetrack itself is the bottleneck. My talk is about fixing that track. We will look at how extra latency hops are hidden speed bumps, and how expensive abstractions are like adding a heavy new spoiler that just adds drag and cost. We'll dive into the real trade-offs of building for microsecond latency, using patterns you can apply with the open-source, Redis-compatible Valkey. (Only milliseconds were harmed.)

Why is it critical for software leaders to focus on this topic right now, as we head into 2026?

Because speed is a feature, and inefficiency is a tax. When your Redis-style operations take microseconds, your real slowdown comes from network hops, proxies, and client behavior. This is what bloats your latency tail and your cloud bill. Focusing on this now is also a key strategic move. The open governance of Valkey after the 2024 Redis license change makes it a safe, vendor-neutral base to build on, so your efficiency gains aren't locked to a single vendor.
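The claim above, that the server-side operation is no longer the slow part, is easy to check empirically from the client. A minimal sketch of round-trip latency percentiles, using a plain in-process dict as a stand-in for a real Valkey client call (the client call, key, and sample count are assumptions for illustration, not from the talk):

```python
import time


def measure_latency_us(op, n=10_000):
    """Time n calls of op() and return (p50, p99) in microseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter_ns()
        op()
        samples.append((time.perf_counter_ns() - start) / 1_000)  # ns -> µs
    samples.sort()
    return samples[n // 2], samples[int(n * 0.99)]


# Stand-in for a real call such as client.get("user:42") against a Valkey
# node; a dict lookup here so the sketch runs anywhere without a server.
cache = {"user:42": b"profile-blob"}
p50, p99 = measure_latency_us(lambda: cache.get("user:42"))
print(f"p50={p50:.2f}us p99={p99:.2f}us")
```

Swapping the lambda for a real client call against your cluster, with and without a proxy in the path, makes the network and proxy share of the latency tail visible directly.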

What are the common challenges developers and architects face in this area?

The biggest challenge is that a system on a whiteboard and a system under pressure tell two different stories. On paper, your design looks clean. But in reality, a single busy component can create a traffic jam that backs everything up. We will talk about how to spot these expensive problems before they become a 3 a.m. emergency.

What's one thing you hope attendees will implement immediately after your talk?

Go back and play detective with your system. Draw a map of the journey your code takes to get data. Every stop on that map, every latency hop, and every "simple" abstraction has a price tag in both time and money. Start asking what each step costs. When you see the real price, you will find amazing ways to make things faster and cheaper.
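One way to play detective is to write that map down as a literal per-hop latency budget and total it per path. The hop names and microsecond figures below are illustrative placeholders to show the exercise, not measurements from the talk:

```python
# Hypothetical per-hop budget, in microseconds, for one cached read.
PROXIED_PATH = {
    "client -> proxy (network)": 80,
    "proxy processing": 40,
    "proxy -> shard (network)": 80,
    "server-side GET": 10,
    "return trip": 160,
}
DIRECT_PATH = {
    "client -> shard (network)": 80,
    "server-side GET": 10,
    "return trip": 80,
}


def total_us(path):
    """Sum the cost of every hop on a path."""
    return sum(path.values())


for name, path in (("proxied", PROXIED_PATH), ("direct", DIRECT_PATH)):
    print(f"{name:8s} total: {total_us(path)} us")
    for hop, cost in path.items():
        print(f"  {hop}: {cost} us")
```

Even with made-up numbers, the structure of the exercise is the point: the server-side GET is a rounding error next to the hops around it, which is exactly where the "price tag" conversation should start.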

What makes QCon stand out as a conference for senior software professionals?

QCon is where you learn from people who have the operational scars to prove their advice works. It's built on practitioner-led content, so you get unfiltered, real-world lessons without the sales pitch. You leave with insights you can actually apply on Monday morning.


Speaker

Dumanshu Goyal

Uber Technical Lead @Airbnb Powering $11B Transactions, Formerly @Google and @AWS

Dumanshu Goyal leads Online Data Priorities at Airbnb, powering its $11B transaction platform and building the next generation of the company’s data systems. Previously, he led in-memory caching for Google Cloud Databases, delivering 10x improvements in scale and price-performance for Google Cloud Memorystore, one of the rare times “10x” was more than a slide promise. Before that, he spent 10 years at AWS as the founding engineer of AWS Timestream, a serverless time-series database, and architected durability and availability features for one of the internet’s foundational services, AWS DynamoDB, ensuring data stayed available even when Wi-Fi did not.


An expert in building and operationalizing large-scale distributed systems, Dumanshu holds 20 patents and brings deep experience in architecting the resilient infrastructure that underpins today’s digital world.


Date

Wednesday Nov 19 / 02:45PM PST ( 50 minutes )

Location

Ballroom A

Topics

Architecture, Open Source, Microsecond-Scale In-memory Caching, Sharded System Reliability, Staff Plus Engineering, Platform Engineering


From the same track

Session Capacity Planning

How Netflix Shapes our Fleet for Efficiency and Reliability

Wednesday Nov 19 / 11:45AM PST

Netflix runs on a complex multi-layer cloud architecture made up of thousands of services, caches, and databases. As hardware options, workload patterns, cost dynamics and the Netflix products evolve, the cost-optimal hardware and configuration for running our services is constantly changing.


Joseph Lynch

Principal Software Engineer @Netflix Building Highly-Reliable and High-Leverage Infrastructure Across Stateless and Stateful Services


Argha C

Staff Software Engineer @Netflix - Leading Netflix's Cloud Scalability Efforts for Live

Session AI Architecture

Realtime and Batch Processing of GPU Workloads

Wednesday Nov 19 / 01:35PM PST

SS&C Technologies runs 47 trillion dollars of assets on our global private cloud. We offer infrastructure primitives as well as platform-as-a-service building blocks such as Kubernetes, Kafka, NiFi, and databases.


Joseph Stein

Principal Architect of Research & Development @SS&C Technologies, Previous Apache Kafka Committer and PMC Member

Session AI/ML

Producing the World's Cheapest Tokens: A How-to Guide

Wednesday Nov 19 / 10:35AM PST

AI inference is expensive, but it doesn’t have to be. In this talk, we’ll break down how to systematically drive down the cost per token across different types of AI workloads.


Meryem Arik

Co-Founder and CEO @Doubleword (Previously TitanML), Recognized as a Technology Leader in Forbes 30 Under 30, Recovering Physicist

Session Platform Engineering

Write-Ahead Intent Log: A Foundation for Efficient CDC at Scale

Wednesday Nov 19 / 03:55PM PST

As companies grow, so does the complexity of keeping distributed systems in sync. At DoorDash, we tackled this challenge while building a high-throughput, domain-oriented data platform for capturing changes across hundreds of services.


Vinay Chella

Engineering Leader @DoorDash - Specializing in Distributed Systems, Streaming & Storage Platforms, Apache Cassandra Committer, Previously Engineering Leader @Netflix


Akshat Goel

Staff Software Engineer, Core Infra at @DoorDash, Previously Senior Software Engineer @Amazon