Abstract
How do you deliver relevant and personalized recommendations to nearly a billion professionals—instantly, reliably, and at scale? At LinkedIn, the answer has been a multi-year journey of architectural reinvention. What started as an offline, batch-oriented system for connection suggestions has evolved into a real-time, cloud-hosted platform powering mission-critical recommendations across multiple surfaces—including "People You May Know", "Follows", Jobs, and Video.
This talk will unpack that architectural migration across four key phases: from offline scoring (with massive precomputation and high storage waste), to nearline scoring (for reactive freshness), to online scoring (with real-time inference and candidate flexibility), and finally to remote scoring (with GPU-accelerated inference in the cloud). Each phase brought its own trade-offs in latency, freshness, cost, and scale—and each pushed us closer to a system capable of delivering intent-aware, personalized recommendations on demand.
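To make the offline-versus-online distinction concrete, here is a minimal, hypothetical sketch of the two ends of that spectrum (class names, stores, and APIs are illustrative, not LinkedIn's actual code): a batch scorer that can only look up precomputed scores, and an online scorer that runs inference at request time.

    # Hypothetical sketch; names and APIs are illustrative, not LinkedIn's.
    from abc import ABC, abstractmethod

    class Scorer(ABC):
        @abstractmethod
        def score(self, viewer_id: str, candidate_ids: list[str]) -> dict[str, float]:
            ...

    class OfflineScorer(Scorer):
        """Batch phase: every (viewer, candidate) score is precomputed and
        pushed to a store. Reads are cheap, but most rows are never
        requested (storage waste) and scores go stale between runs."""
        def __init__(self, precomputed: dict):
            self.precomputed = precomputed

        def score(self, viewer_id, candidate_ids):
            return {c: self.precomputed.get((viewer_id, c), 0.0)
                    for c in candidate_ids}

    class OnlineScorer(Scorer):
        """Online phase: features are fetched and the model runs at request
        time, so any candidate set can be scored with fresh signals, at
        the cost of a strict per-request latency budget."""
        def __init__(self, model, feature_store):  # hypothetical dependencies
            self.model = model
            self.feature_store = feature_store

        def score(self, viewer_id, candidate_ids):
            feats = self.feature_store.fetch(viewer_id, candidate_ids)
            return dict(zip(candidate_ids, self.model.predict(feats)))

The other two phases fit the same interface: nearline reactively refreshes the precomputed store from event streams, and remote moves the online model call to GPU-backed cloud endpoints.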
We’ll dive deep into the key innovations that made this evolution possible: the decoupling of candidate generation from scoring pipelines, the adoption of Embedding-Based Retrieval (EBR) for addressing the cold-start problem, and the integration of LLM-powered ranking models for nuanced personalization. These changes not only enabled 90%+ reductions in offline compute/storage costs but also unlocked major gains in member engagement and system adaptability.
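As a rough illustration of the EBR idea, here is a brute-force sketch; a production system would use an approximate nearest-neighbor index, and all names below are ours, not LinkedIn's.

    # Minimal embedding-based retrieval (EBR) sketch using brute-force
    # cosine similarity over an in-memory embedding matrix.
    import numpy as np

    def ebr_top_k(query_vec: np.ndarray, item_matrix: np.ndarray, k: int = 10):
        """Return indices of the k items most similar to the query.
        Because similarity is computed in embedding space, brand-new
        members or items are retrievable as soon as they have an
        embedding, which is how EBR helps with cold start."""
        q = query_vec / np.linalg.norm(query_vec)
        items = item_matrix / np.linalg.norm(item_matrix, axis=1, keepdims=True)
        sims = items @ q
        top = np.argpartition(-sims, k)[:k]       # unordered top-k
        return top[np.argsort(-sims[top])]        # sorted by similarity

    # Example: 1,000 items in a 64-dimensional embedding space.
    rng = np.random.default_rng(0)
    items = rng.normal(size=(1000, 64))
    query = rng.normal(size=64)
    print(ebr_top_k(query, items, k=5))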
Beyond PYMK and Follows, we’ll also show how these architectural patterns are being applied to next-generation systems like LinkedIn Jobs (with career-intent sensitivity) and Video (where content freshness and rapid feedback loops dominate). By comparing the stacks, we’ll highlight how a unified architectural foundation enables tailored optimization across product surfaces.
Whether you’re modernizing legacy batch systems or architecting for real-time recommendations from day one, this session will equip you with practical lessons in relevance-first design, infrastructure trade-offs, and system evolution—at the scale of billions.
Key Takeaways:
- Design for Cold-Start Like It's Day One
- Architectural Shifts Create Non-Linear Gains
- Retrieval Isn’t a Model Problem—It’s a Portfolio Strategy
- Decouple to Win: Split Generation From Scoring
- Staleness Kills—Freshness Converts
Interview:
What is your session about, and why is it important for senior software developers?
My session explores LinkedIn’s journey of modernizing its recommendation infrastructure — specifically, the migration from offline, batch-driven stacks to online, real-time systems that power People You May Know (PYMK) and Follows. This is important because senior developers increasingly design systems that must balance scale, personalization, and cost efficiency. The lessons here highlight how to architect relevance platforms that not only serve billions but also evolve with user intent in real time.
Why is it critical for software leaders to focus on this topic right now, as we head into 2026?
The industry is entering a phase where AI-driven personalization, freshness of data, and infra efficiency define business competitiveness. By 2026, companies that fail to modernize from offline pipelines to real-time inference risk latency bottlenecks, escalating infra costs, and disengaged users. Software leaders need to act now to future-proof systems — enabling their organizations to unlock multi-context recommendations, rapid experimentation, and efficient use of compute resources.
What are the common challenges developers and architects face in this area?
- Balancing latency and recall: online retrieval must respond in <100ms, yet still capture meaningful candidates (see the retrieval sketch after this list).
- Infra intensity: moving to inference-driven stacks demands careful GPU/CPU optimization and caching strategies.
- Explainability: dynamic embeddings and multi-pass models can obscure traceability.
- Migration trade-offs: shifting from offline to online often creates short-term metric regressions that must be navigated with leadership alignment.
- Cost and fragility: offline stacks consume massive compute/storage, while online stacks require robust observability and guardrails to avoid regressions at scale.
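To illustrate the latency/recall challenge above, here is a hedged sketch (hypothetical names; Python 3.9+) of fanning out to several candidate sources under a hard time budget and keeping whatever returns in time:

    import concurrent.futures as cf
    import time

    def retrieve_within_budget(sources, viewer_id, budget_ms=100):
        """Fan out to all candidate sources in parallel; keep only the
        results that arrive within the budget, trading a little recall
        for a predictable tail latency."""
        pool = cf.ThreadPoolExecutor(max_workers=len(sources))
        futures = [pool.submit(src, viewer_id) for src in sources]
        done, _ = cf.wait(futures, timeout=budget_ms / 1000.0)
        candidates = []
        for f in done:
            if f.exception() is None:
                candidates.extend(f.result())
        pool.shutdown(wait=False, cancel_futures=True)  # drop stragglers
        return candidates

    # Example: a fast source and a slow one that misses the 100ms budget.
    fast = lambda v: [f"{v}:graph:{i}" for i in range(3)]
    slow = lambda v: time.sleep(1) or [f"{v}:slow:{i}" for i in range(3)]
    print(retrieve_within_budget([fast, slow], "member42"))  # fast results only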
What's one thing you hope attendees will implement immediately after your talk?
I hope attendees leave with the conviction to challenge batch-driven assumptions in their own systems. Even small shifts — such as integrating live contextual signals into candidate generation or piloting online inference for a subset of use cases — can yield measurable engagement and infra wins. The key is to start building incremental pathways toward online-first recommendation systems rather than waiting for a wholesale re-architecture.
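One low-risk way to start such a pilot is a deterministic traffic ramp. The sketch below (all names hypothetical) routes a fixed 1% slice of members to an online-inference path while the rest stay on the existing batch path:

    import hashlib

    def in_online_pilot(member_id: str, ramp_pct: float = 1.0) -> bool:
        """Deterministically bucket members into 10,000 slots so the same
        member always takes the same path, keeping A/B metrics clean."""
        bucket = int(hashlib.sha256(member_id.encode()).hexdigest(), 16) % 10_000
        return bucket < ramp_pct * 100  # ramp_pct percent of members

    def get_recommendations(member_id, offline_store, online_scorer):
        # offline_store and online_scorer are hypothetical stand-ins for an
        # existing batch path and a new online-inference path.
        if in_online_pilot(member_id, ramp_pct=1.0):  # pilot on 1% of members
            return online_scorer.score_fresh(member_id)
        return offline_store.lookup(member_id)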
What makes QCon stand out as a conference for senior software professionals?
QCon is unique because it curates deeply technical yet practitioner-led insights. Unlike conferences that focus on aspirational roadmaps, QCon emphasizes battle-tested learnings, real-world trade-offs, and architectural evolution. The focus on cross-industry patterns — from consumer platforms to enterprise systems — ensures that senior engineers and architects walk away with applicable strategies, not just inspiration.
What was one interesting thing that you learned from a previous QCon?
At a past QCon, I learned how cross-functional observability frameworks were being applied to unify model serving, logging, and experimentation. What stood out was not just the technical innovation but the cultural shift it required across engineering teams. It reinforced for me that modernization is as much about organizational alignment as it is about technology.
Speaker

Nishant Lakshmikanth
Engineering Manager @LinkedIn, Leading Infrastructure for "People You May Know" and "People Follows", Previously @AWS and @Cisco
Nishant is an Engineering Manager at LinkedIn, where he leads the infrastructure for People You May Know (PYMK) and People Follows (PF)—two foundational recommendation systems that drive a multi-billion-dollar annual revenue stream and an engaging, meaningful ecosystem at scale. Over the past several years, he has been instrumental in scaling the underlying relevance infrastructure: migrating multiple pipelines from offline and nearline to online, inventing powerful candidate-sourcing mechanisms, and reimagining the retrieval landscape to drive sustained business impact and unlock new ways of solving cold-start problems.
At LinkedIn, Nishant also built the control plane for all data systems from the ground up, enabling seamless orchestration, quota automation, and real-time capacity management across LinkedIn’s critical services. He led the development of the Multi-Tenancy-As-A-Service (MAAS) framework, which now underpins key infrastructure shared across teams. His recent efforts focus on integrating large language models (LLMs) into production pipelines, enhancing entity-based retrieval, and optimizing model serving for low-latency, high-throughput use cases.
Before LinkedIn, Nishant was at Amazon Web Services, where he worked on Elastic Block Storage (EBS) and authored several patents in distributed storage systems. He began his career at Cisco, contributing to high-performance video backend systems.
Nishant is passionate about driving innovation, mentoring engineering talent, and shaping the future of AI-powered systems.