Presentation: Samza in LinkedIn: How LinkedIn Processes Billions of Events Everyday in Real-time
Location:
- Bayview A/B
We are enjoying something of a renaissance in data infrastructure. The old workhorses like MySQL and Oracle still exist, but they are complemented by new specialized distributed data systems like Cassandra, Redis, Druid, and Hadoop. At the same time, what we consider data has changed too: user activity, monitoring, logging, and other event data are becoming first-class citizens for data-driven companies. Taking full advantage of all these systems and the relevant data creates a massive data integration problem. This problem is important to solve, as these specialized systems are not very useful in the absence of a complete and reliable data flow.
One of the most powerful ways of solving this data integration problem is by restructuring your digital business logic around a centralized firehose of immutable events.
Once your data is captured in real time and available as real-time subscriptions, you can start to compute new data sets in real time off these feeds. This style of stream processing is seen as something of a niche today, but the model is extremely powerful and general. Much of what people compute offline in systems like Hadoop can also be done in real time, as data arrives, using a stream-processing model. On top of these real-time data feeds, we can run continual processing and transformations to derive new data feeds (which are themselves logs) and publish these in the same way. We have open sourced our stream processing layer, Apache Samza (http://samza.incubator.apache.org/), which does this.
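As a rough illustration of deriving a new feed from an existing one, here is a minimal sketch in plain Python. It does not use the Kafka or Samza APIs: the `Log` class, the `count_views` task, and the event fields (`member`, `page`, `views`) are all hypothetical stand-ins chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterator, List


class Log:
    """Append-only in-memory feed; a hypothetical stand-in for a real log/topic."""

    def __init__(self) -> None:
        self._events: List[dict] = []

    def append(self, event: dict) -> int:
        self._events.append(event)
        return len(self._events) - 1  # offset of the newly appended event

    def read(self, offset: int = 0) -> Iterator[dict]:
        # Subscribers read forward from an offset; stored events are never modified.
        yield from self._events[offset:]


def count_views(source: Log, sink: Log) -> None:
    """Consume page-view events and publish running per-member counts as a new feed."""
    counts: Dict[str, int] = defaultdict(int)
    for event in source.read():
        counts[event["member"]] += 1
        sink.append({"member": event["member"], "views": counts[event["member"]]})


# Source feed: raw activity events (hypothetical data).
page_views = Log()
for member in ["alice", "bob", "alice"]:
    page_views.append({"member": member, "page": "/home"})

# Derived feed: itself a log that other consumers could subscribe to.
view_counts = Log()
count_views(page_views, view_counts)
derived = list(view_counts.read())
```

The point of the sketch is only that the output of a processing step is published as another append-only feed, so downstream consumers can treat it exactly like the source.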
In this talk, I will share our experience of successfully building LinkedIn’s data pipeline infrastructure around Kafka and Samza. These lessons are hugely relevant to anyone building a data-driven company.
Speaker: Neha Narkhede