blob: 4c5e262995a1eed7fadf4099a0abc6670fc66155 [file] [log] [blame] [view]
---
layout: blog
title: On chasing the Stream Processing Utopia
icon: analytics
authors:
- name:
website:
image:
excerpt_separator: <!--more-->
exclude_from_loop: true
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
Talk By Kartik Paramasivam (Director of Engineering, Linkedin)
@Strange Loop, St. Louis, MO
<!--more-->
Over the last 15 years batch processing frameworks have thrived and ruled over big data processing. But now in the age of social computing, it is no longer acceptable to wait for data to land into a data-lake before it gets processed.
We want our applications to react to new data as soon as it gets generated upstream. For a web site, members expect their feed to be updated as soon as some relevant activity, news, jobs etc. happens.
We are talking seconds (or minutes). We also want to detect degraded site experience, fraud, security breaches, spam etc. instantaneously. Even business metrics (written in traditionally batch oriented languages like HIVE/PIG) are now expected to run in realtime. The current status-quo of real-time data processing (stream processing) is still very far from Utopia.
Kartik Paramasivam, The Director of Engineering presented Chasing the Stream Utopia at Strange Loop '18. The talk was inspired
by the extensive growth in Streaming Data at Linkedin, which has experienced a growth of as high as 5 Trillion Messages per day in 2018.
Linkedin supports close to 3000 applications in production using Kafka and Samza. He shed further light on Samza's claim
as State of the art Stream Processing framework in the streaming world, supporting use cases at LinkedIn, Slack, Uber, Intuit etc
His talk described LinkedIn's path on Chasing Utopia in Streaming world running apps at any complexity, any scale,
any source, any language, and any environment! He shed light on all of the above with actual use cases from LinkedIn using Samza and Kafka in production. He touched Samza's battle tested Stateful and Stateless processing, and also on the
newer available features like event time based processing using Beam Runner for Samza and Samza SQL. He further briefly explained running
and managing Kafka at Scale. Covering an array of topics from Kafka Cluster Management Woes to Dynamic Load Balancing
using Kafka Cruise Control.
He further added the tooling ecosystem that supports these apps and streaming challanges that are faced at LinkedIn. He
concluded with the upcoming releases and features of Samza (Apache Samza 1.0) and Kafka (Apache Kafka 2.0). Please find more [here] (https://youtu.be/2y8QImf-RpI)
<br>