---
layout: post
status: PUBLISHED
published: true
title: Announcing the release of Apache Samza 0.13.0
id: 991b7333-03ef-4c12-bcc3-0d079ac2aae0
date: '2017-06-09 18:09:04 -0400'
categories: samza
tags:
- samza
permalink: samza/entry/announcing-the-release-of-apache1
---
<p class="c0"><span>We are very excited to announce the release of </span><span class="c7"><a class="c5" href="http://samza.apache.org">Apache Samza</a></span><span class="c6">&nbsp;0.13.0.</span></p>
<p class="c0 c2"><span class="c6"></span></p>
<p class="c0"><span>Samza has been powering real-time applications in production across several large companies (including LinkedIn, Netflix, Uber) </span><span>for years now</span><span>. Samza provides leading support for large-scale </span><span>stateful</span><span class="c6">&nbsp;stream processing with:</span></p>
<p class="c0"><span>&nbsp;&bull; &nbsp;First class support for local state (with RocksDB store). This allows a stateful application to scale up to </span><span class="c7"><a class="c5" href="https://engineering.linkedin.com/performance/benchmarking-apache-samza-12-million-messages-second-single-node">1.1 Million events/sec</a></span><span class="c6">&nbsp;on a single machine with SSD.</span></p>
<p class="c0"><span class="c6">&nbsp;&bull; &nbsp;Support for incremental checkpointing of state instead of full snapshots. This enables Samza to scale to applications with very large state.</span></p>
<p class="c0"><span>&nbsp;&bull; &nbsp;A fully pluggable model for input sources (e.g. Kafka, Kinesis, DynamoDB streams etc.) and </span><span>output</span><span>&nbsp;systems (HDFS, Kafka, </span><span>ElastiCache</span><span class="c6">&nbsp;etc.).</span></p>
<p class="c0"><span class="c6">&nbsp;&bull; &nbsp;A fully asynchronous programming model that makes parallelizing remote calls efficient and effortless.</span></p>
<p class="c0"><span>&nbsp;&bull; &nbsp;Features like </span><span>canaries</span><span class="c6">, upgrades and rollbacks that support extremely large deployments with minimal downtime.</span></p>
<p class="c0 c2"><span class="c6"></span></p>
<h2 class="c1" id="h.gtbpefm8f2eu"><span class="c8">New features</span></h2>
<p class="c0"><span class="c6">The 0.13.0 release contains previews for the following highly anticipated features:</span></p>
<p class="c0 c2"><span class="c6"></span></p>
<h3 class="c9" id="h.l67x5yxz7obm"><span class="c4">High Level API</span></h3>
<p class="c0"><span>With the new high level API you can express complex stream processing pipelines concisely in a few lines of code and accomplish what previously required multiple jobs. The API supports common operations such as re-partitioning, windowing, and joining streams. Check out some examples to see the high level API in action </span><span class="c7"><a class="c5" href="http://samza.apache.org/startup/preview/#high-level-api">here</a></span><span class="c6">.</span></p>
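<p class="c0"><span class="c6">To give a feel for what a windowing operation computes, here is a minimal plain-Java sketch. It deliberately does not use the Samza API (see the linked examples for that); the class TumblingWindowCount, the Event type, and the 10-second window size are hypothetical names and values chosen for illustration only.</span></p>

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: this is NOT the Samza high level API.
final class TumblingWindowCount {

    // A minimal event: a key (for example a member id) and an event time in milliseconds.
    static final class Event {
        final String key;
        final long timestampMs;

        Event(String key, long timestampMs) {
            this.key = key;
            this.timestampMs = timestampMs;
        }
    }

    // Counts events per (window start, key) for fixed, non-overlapping windows.
    static Map<Long, Map<String, Integer>> countPerWindow(List<Event> events, long windowMs) {
        Map<Long, Map<String, Integer>> counts = new TreeMap<>();
        for (Event e : events) {
            // Round the timestamp down to the start of its window.
            long windowStart = Math.floorDiv(e.timestampMs, windowMs) * windowMs;
            counts.computeIfAbsent(windowStart, w -> new TreeMap<>())
                  .merge(e.key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event("alice", 1_000L),
            new Event("bob", 4_000L),
            new Event("alice", 9_999L),
            new Event("alice", 10_000L));
        // prints {0={alice=2, bob=1}, 10000={alice=1}}
        System.out.println(countPerWindow(events, 10_000L));
    }
}
```

<p class="c0"><span class="c6">Grouping by key before counting is the in-memory analogue of re-partitioning a stream by key; in a real pipeline the grouping would happen across tasks rather than inside one map.</span></p>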
<p class="c0 c2"><span class="c6"></span></p>
<h3 class="c9" id="h.6ty1kjt0c071"><span class="c4">Flexible Deployment Model</span></h3>
<p class="c0"><span class="c6">Samza now provides flexibility for running your application in any hosting environment and with cluster managers other than YARN. Samza can now also be run as a lightweight stream processing library embedded inside your application. Your processes can coordinate task distribution amongst themselves using ZooKeeper or static partition assignments out of the box.</span></p>
<p class="c0 c2"><span class="c6"></span></p>
<p class="c0"><span>See more details and code examples </span><span class="c7"><a class="c5" href="http://samza.apache.org/startup/preview/#flexible-deployment-model">here</a></span><span class="c6">.</span></p>
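<p class="c0"><span class="c6">As a rough illustration of the idea behind a static partition assignment, the sketch below splits input partitions across a fixed set of processes using a round-robin rule. This is an assumption made for illustration, not Samza's actual implementation or configuration; the class StaticPartitionAssignment and the method partitionsFor are hypothetical names.</span></p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not Samza's implementation or API.
final class StaticPartitionAssignment {

    // Returns the partitions owned by one process under a fixed round-robin scheme:
    // partition p belongs to the process whose index equals p % processCount.
    static List<Integer> partitionsFor(int processIndex, int processCount, int partitionCount) {
        if (processCount <= 0 || processIndex < 0 || processIndex >= processCount) {
            throw new IllegalArgumentException("invalid process index or count");
        }
        List<Integer> owned = new ArrayList<>();
        for (int p = processIndex; p < partitionCount; p += processCount) {
            owned.add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        // Three embedded processes sharing an 8-partition input stream.
        for (int i = 0; i < 3; i++) {
            System.out.println("process " + i + " -> " + partitionsFor(i, 3, 8));
        }
    }
}
```

<p class="c0"><span class="c6">Because every process can compute its own share from its index alone, no coordination service is needed. The trade-off is that the assignment does not adapt when a process joins or leaves, which is where coordination through ZooKeeper helps.</span></p>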
<p class="c0 c2"><span class="c6"></span></p>
<h2 class="c1" id="h.tuafqpc3dyqn"><span class="c8">Enhancements, Upgrades and Bug Fixes</span></h2>
<p class="c0"><span class="c6">This release also includes the following enhancements to existing features:</span></p>
<ul class="c13 lst-kix_2fmf2d4a98wh-0 start">
<li class="c0 c3"><span class="c7"><a class="c5" href="https://issues.apache.org/jira/browse/SAMZA-871">SAMZA-871</a></span><span>&nbsp;adds a heart-beat mechanism between JobCoordinator and all running containers to </span><span>prevent orphaned containers</span><span class="c6">.</span></li>
<li class="c0 c3"><span class="c7"><a class="c5" href="https://issues.apache.org/jira/browse/SAMZA-1140">SAMZA-1140</a></span><span class="c6">&nbsp;enables non-blocking commit in the AsyncRunloop.</span></li>
<li class="c0 c3"><span class="c7"><a class="c5" href="https://issues.apache.org/jira/browse/SAMZA-1143">SAMZA-1143</a></span><span class="c6">&nbsp;adds configurations for localizing general resources in YARN.</span></li>
<li class="c0 c3"><span class="c7"><a class="c5" href="https://issues.apache.org/jira/browse/SAMZA-1145">SAMZA-1145</a></span><span class="c6">&nbsp;provides the ability to configure the default number of changelog replicas.</span></li>
<li class="c0 c3"><span class="c7"><a class="c5" href="https://issues.apache.org/jira/browse/SAMZA-1154">SAMZA-1154</a></span><span class="c6">&nbsp;adds a tasks endpoint to samza-rest to get information about all tasks in a job.</span></li>
<li class="c0 c3"><span class="c7"><a class="c5" href="https://issues.apache.org/jira/browse/SAMZA-1158">SAMZA-1158</a></span><span class="c6">&nbsp;adds a samza-rest monitor to clean up stale local stores from completed containers.</span></li>
</ul>
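<p class="c0"><span class="c6">The heart-beat idea can be sketched in a few lines of plain Java: a container records when it last heard from its coordinator and treats itself as orphaned once a timeout elapses. This is only a sketch under assumed names and values (HeartbeatMonitor, a 30-second timeout); the actual SAMZA-871 design may differ.</span></p>

```java
// Hypothetical illustration only: not the SAMZA-871 implementation.
final class HeartbeatMonitor {
    private final long timeoutMs;
    private long lastHeartbeatMs;

    // The clock is passed in explicitly so the logic is easy to test.
    HeartbeatMonitor(long timeoutMs, long nowMs) {
        this.timeoutMs = timeoutMs;
        this.lastHeartbeatMs = nowMs;
    }

    // Called whenever the coordinator acknowledges a heartbeat.
    void recordHeartbeat(long nowMs) {
        lastHeartbeatMs = nowMs;
    }

    // A container that has not heard from its coordinator within the timeout
    // considers itself orphaned and should shut down.
    boolean isOrphaned(long nowMs) {
        return nowMs - lastHeartbeatMs > timeoutMs;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(30_000L, 0L);
        monitor.recordHeartbeat(10_000L);
        System.out.println(monitor.isOrphaned(35_000L)); // prints false (25 s since last heartbeat)
        System.out.println(monitor.isOrphaned(45_000L)); // prints true (35 s since last heartbeat)
    }
}
```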
<p class="c0 c2"><span class="c6"></span></p>
<p class="c0"><span class="c6">This release also includes several bug fixes and improvements for operational stability. Some notable ones are:</span></p>
<ul class="c13 lst-kix_rm229gd4rqeu-0 start">
<li class="c0 c3"><span class="c7"><a class="c5" href="https://issues.apache.org/jira/browse/SAMZA-1083">SAMZA-1083</a></span><span class="c6">&nbsp;prevents loading task stores that are older than delete tombstones during container startup.</span></li>
<li class="c0 c3"><span class="c7"><a class="c5" href="https://issues.apache.org/jira/browse/SAMZA-1100">SAMZA-1100</a></span><span class="c6">&nbsp;fixes an exception when using an empty stream as both bootstrap and broadcast.</span></li>
<li class="c0 c3"><span class="c7"><a class="c5" href="https://issues.apache.org/jira/browse/SAMZA-1112">SAMZA-1112</a></span><span class="c6">&nbsp;fixes BrokerProxy to log fatal errors.</span></li>
<li class="c0 c3"><span class="c7"><a class="c5" href="https://issues.apache.org/jira/browse/SAMZA-1121">SAMZA-1121</a></span><span class="c6">&nbsp;fixes StreamAppender so that it doesn&#39;t propagate exceptions to the caller.</span></li>
<li class="c0 c3"><span class="c7"><a class="c5" href="https://issues.apache.org/jira/browse/SAMZA-1157">SAMZA-1157</a></span><span class="c6">&nbsp;fixes logging for serialization/deserialization errors.</span></li>
</ul>
<p class="c0 c2"><span class="c6"></span></p>
<p class="c0"><span class="c6">We&#39;ve also upgraded the following dependency versions:</span></p>
<ul class="c13 lst-kix_2hdzdhp18mq6-0 start">
<li class="c0 c3"><span class="c6">Samza now supports Scala 2.12.</span></li>
<li class="c0 c3"><span class="c6">Kafka has been upgraded to version 0.10.1.1.</span></li>
<li class="c0 c3"><span class="c6">Elasticsearch has been upgraded to version 2.2.0.</span></li>
</ul>
<p class="c0"><span class="c6">&nbsp;</span></p>
<h2 class="c1" id="h.hesrr9ipm1db"><span class="c8">Community Developments</span></h2>
<p class="c0"><span>We&#39;ve made great community progress since the previous release. We showcased how Samza is powering stream processing at LinkedIn in Kafka Summit 2017 and O&rsquo;Reilly Strata 2017. We also presented Samza use cases and case studies from several large companies in ApacheCon Big Data, 2017. In addition, the Samza talk in LinkedIn&#39;s </span><span class="c10">Stream Processing Meetup</span><span>&nbsp;in Sunnyvale was well-received with over 200 attendees. Here are links to some of these events:</span></p>
<ol class="c13 lst-kix_uhstr71w8gkg-0 start" start="1">
<li class="c0 c3"><span>March 15, 2017 - </span><span class="c7"><a class="c5" href="https://conferences.oreilly.com/strata/strata-ca/public/schedule/detail/56054">Processing millions of events per second without breaking the bank - Kartik Paramasivam</a></span><span>&nbsp;(</span><span class="c7"><a class="c5" href="https://www.safaribooksonline.com/library/view/strata-hadoop/9781491976166/video301066.html">Video</a></span><span class="c6">)</span></li>
<li class="c0 c3"><span>May 8, 2017 - </span><span class="c7"><a class="c5" href="https://kafka-summit.org/sessions/data-processing-linkedin-apache-kafka/">Data Processing at LinkedIn with Apache Kafka and Apache Samza (Kafka Summit NYC 2017)</a></span><span class="c6">&nbsp;(Slides)</span></li>
<li class="c0 c3"><span>May 16, 2017 - </span><span class="c7"><a class="c5" href="https://apachebigdata2017.sched.com/event/9zv1/what-it-takes-to-process-a-trillion-events-a-day-case-studies-in-scaling-stream-processing-at-linkedin-jagadish-venkatraman-linkedin">What it takes to process a trillion events a day? Case studies in scaling stream processing at LinkedIn - </a></span><span class="c7"><a class="c5" href="https://apachebigdata2017.sched.com/event/9zv1/what-it-takes-to-process-a-trillion-events-a-day-case-studies-in-scaling-stream-processing-at-linkedin-jagadish-venkatraman-linkedin">Jagadish Venkatraman</a></span><span>&nbsp;(</span><span class="c7"><a class="c5" href="https://apachebigdata2017.sched.com/">ApacheCon Big Data &#39;17</a></span><span>) (</span><span class="c7"><a class="c5" href="https://cwiki.apache.org/confluence/download/attachments/51812876/ApacheCon-Talk-Jagadish-1.pdf?version%3D1%26modificationDate%3D1496363193591%26api%3Dv2">Slides</a></span><span class="c6">) </span></li>
<li class="c0 c3"><span>May 16, 2017 - </span><span class="c7"><a class="c5" href="https://apachebigdata2017.sched.com/event/9zvj/the-continuing-story-of-batching-to-streaming-analytics-at-optimizely-michael-borsuk-optimizely">The continuing story of Batching to Streaming analytics at Optimizely</a></span><span>, Michael Borsuk (ApacheCon Big Data&rsquo;17) (Slides)</span></li>
<li class="c0 c3"><span>May 24, 2017 - </span><span class="c7"><a class="c5" href="https://www.meetup.com/Stream-Processing-Meetup-LinkedIn/events/238303422/">Managed or stand alone, streaming or batch; Unified processing with the Samza Fluent API - Yi Pan (LinkedIn Stream Processing Meetup)</a></span><span>&nbsp;(</span><span class="c7"><a class="c5" href="https://www.slideshare.net/YiPan7/samza-013-meetup-slide-v10pptx">Slides</a></span><span class="c6">)</span></li>
<li class="c0 c3"><span>May 25, 2017 - </span><span class="c7"><a class="c5" href="https://feathercast.apache.org/2017/05/25/jagadish-venkatraman-apachecon-north-america-and-how-companies-are-using-apache-samza/">How companies are using Apache Samza - Jagadish Venkatraman (ApacheCon podcast)</a></span></li>
</ol>
<p class="c0 c2"><span class="c6"></span></p>
<h2 class="c1" id="h.hesrr9ipm1db"><span class="c8">Future</span></h2>
<p class="c0"><span>We&#39;ll continue improving the new High Level API and </span><span>flexible deployment </span><span class="c6">features with your feedback.</span></p>
<p class="c0 c2"><span class="c6"></span></p>
<p class="c0"><span>It&rsquo;s a great time to get involved. You can start by reviewing the </span><span class="c7"><a class="c5" href="http://samza.apache.org/startup/preview/#try-it-out">tutorials</a></span><span>, signing up for the </span><span class="c7"><a class="c5" href="http://samza.apache.org/community/mailing-lists.html">mailing list</a></span><span>, and grabbing some </span><span class="c7"><a class="c5" href="https://issues.apache.org/jira/browse/SAMZA-1232?jql=project%20%3D%20SAMZA%20AND%20labels%20%3D%20newbie%20AND%20status%20%3D%20Open">newbie JIRAs</a></span><span class="c6">. I&#39;d like to close by thanking everyone who&#39;s been involved in the project. It&#39;s been a great experience to be involved in this community, and I look forward to its continued growth.</p>