<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Flink Blog Feed</title>
<description>Flink Blog</description>
<link>https://flink.apache.org/blog</link>
<atom:link href="https://flink.apache.org/blog/feed.xml" rel="self" type="application/rss+xml" />

<item>
<title>PyFlink: Introducing Python Support for UDFs in Flink&#39;s Table API</title>
<description>&lt;p&gt;Flink 1.9 introduced the Python Table API, allowing developers and data engineers to write Python Table API jobs for Table transformations and analysis, such as Python ETL or aggregate jobs. However, Python users faced some limitations when it came to support for Python UDFs in Flink 1.9, preventing them from extending the system’s built-in functionality.&lt;/p&gt;

&lt;p&gt;In Flink 1.10, the community further extended Python support by adding Python UDFs to PyFlink. The Python UDF environment and dependency management are now supported as well, so users can import third-party libraries in their UDFs and draw on Python’s rich library ecosystem.&lt;/p&gt;

&lt;h1 id=&quot;python-support-for-udfs-in-flink-110&quot;&gt;Python Support for UDFs in Flink 1.10&lt;/h1&gt;

&lt;p&gt;Before showing how to define and use Python UDFs, we explain how UDFs work in PyFlink and give some context on the implementation. Below is a brief introduction to the PyFlink architecture, from job submission all the way to executing the Python UDF.&lt;/p&gt;

&lt;p&gt;The PyFlink architecture has two parts, local and cluster, as shown in the figure below. The local part compiles the job, and the cluster part executes it.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-04-09-pyflink-udfs/pyflink-udf-architecture.png&quot; width=&quot;600px&quot; alt=&quot;PyFlink UDF Architecture&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;For the local part, the Python API is a mapping of the Java API: each time Python executes a method shown in the figure above, it synchronously calls the corresponding Java method through Py4J. The result is a Java JobGraph, which is then submitted to the cluster.&lt;/p&gt;
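
&lt;p&gt;The delegation pattern described above can be sketched in plain Python. This is a simplified illustration, not PyFlink or Py4J code: &lt;code&gt;FakeJavaTable&lt;/code&gt; and &lt;code&gt;PythonTable&lt;/code&gt; are hypothetical names, and the stand-in object lives in the same process instead of a JVM.&lt;/p&gt;

```python
class FakeJavaTable(object):
    """Hypothetical stand-in for a JVM-side object; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def select(self, fields):
        self.calls.append(("select", fields))
        return self


class PythonTable(object):
    """Thin Python wrapper: every method delegates to the wrapped Java-side object."""

    def __init__(self, j_table):
        self._j_table = j_table

    def select(self, fields):
        # Synchronous delegation; in PyFlink this call would cross into the JVM via Py4J.
        return PythonTable(self._j_table.select(fields))


j_table = FakeJavaTable()
PythonTable(j_table).select("a, b").select("a")
print(j_table.calls)
```

&lt;p&gt;Each Python call shows up as exactly one call on the wrapped object, which is the sense in which the Python API mirrors the Java API.&lt;/p&gt;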

&lt;p&gt;For the cluster part, the JobMaster schedules tasks to TaskManagers, just as for ordinary Java jobs. Tasks that include a Python UDF run both Java and Python operators in the TaskManager. The Python UDF operator uses several gRPC services for communication between the Java VM and the Python VM, such as DataService for data transmission, StateService for state access, and the Logging and Metrics services. These services are built on Beam’s Fn API. Currently only Process mode is supported for Python workers; Docker mode and External service mode are being considered for future Flink releases.&lt;/p&gt;

&lt;h1 id=&quot;how-to-use-pyflink-with-udfs-in-flink-110&quot;&gt;How to use PyFlink with UDFs in Flink 1.10&lt;/h1&gt;

&lt;p&gt;This section walks through Python user-defined function (UDF) examples: how to install PyFlink, how to define, register and invoke UDFs in PyFlink, and how to execute the job.&lt;/p&gt;

&lt;h2 id=&quot;install-pyflink&quot;&gt;Install PyFlink&lt;/h2&gt;
&lt;p&gt;Using Python in Apache Flink requires installing PyFlink. PyFlink is available through PyPI and can be easily installed using pip:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;python -m pip install apache-flink&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
Python 3.5 or higher is required to install and run PyFlink.&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;define-a-python-udf&quot;&gt;Define a Python UDF&lt;/h2&gt;

&lt;p&gt;Besides extending the base class &lt;code&gt;ScalarFunction&lt;/code&gt;, there are several other ways to define a Python scalar function. The following example shows the different ways of defining a scalar function that takes two &lt;code&gt;BIGINT&lt;/code&gt; columns as input parameters and returns their sum.&lt;/p&gt;

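&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;functools&lt;/span&gt;

&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.table&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.table.udf&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ScalarFunction&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# option 1: extending the base class `ScalarFunction`&lt;/span&gt;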
&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ScalarFunction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;eval&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# option 2: Python function&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# option 3: lambda function&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# option 4: callable function&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;CallableAdd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;object&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__call__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CallableAdd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# option 5: partial function (k is fixed to 1, so the result is i + j + 1)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;partial_add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;functools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partial&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partial_add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt;
          &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
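
&lt;p&gt;As a quick sanity check, the plain callables behind the options above can be compared without a Flink installation. The sketch below is illustrative only: it leaves out the &lt;code&gt;udf&lt;/code&gt; wrapper, and the &lt;code&gt;candidates&lt;/code&gt; and &lt;code&gt;results&lt;/code&gt; names are hypothetical.&lt;/p&gt;

```python
import functools


class CallableAdd(object):
    def __call__(self, i, j):
        return i + j


def partial_add(i, j, k):
    return i + j + k


# The same kinds of callables that the options above pass to udf(), without the wrapper.
candidates = {
    "function": lambda i, j: i + j,
    "callable": CallableAdd(),
    "partial": functools.partial(partial_add, k=1),
}

# Call each one with the same two inputs, the way a UDF is called once per row.
results = {name: fn(1, 2) for name, fn in candidates.items()}
print(results)
```

&lt;p&gt;For the inputs 1 and 2, the function and callable variants return 3, while the partial variant returns 4 because &lt;code&gt;k=1&lt;/code&gt; is added to every result.&lt;/p&gt;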

&lt;h2 id=&quot;register-a-python-udf&quot;&gt;Register a Python UDF&lt;/h2&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;c&quot;&gt;# register the Python function&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;table_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;register_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;add&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;invoke-a-python-udf&quot;&gt;Invoke a Python UDF&lt;/h2&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;c&quot;&gt;# use the function in Python Table API&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;my_table&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;add(a, b)&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Below is a complete example that uses a Python UDF.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.datastream&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.table&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.table.descriptors&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OldCsv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FileSystem&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.table.udf&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_execution_environment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_parallelism&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;register_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;add&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FileSystem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;/tmp/input&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; \
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;with_format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OldCsv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
                 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;a&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
                 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;b&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt; \
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;with_schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
                 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;a&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
                 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;b&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt; \
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_temporary_table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;mySource&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FileSystem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;/tmp/output&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; \
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;with_format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OldCsv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
                 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;sum&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt; \
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;with_schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
                 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;sum&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt; \
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_temporary_table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;mySink&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;from_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;mySource&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;\
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;add(a, b)&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; \
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;insert_into&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;mySink&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;tutorial_job&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;submit-the-job&quot;&gt;Submit the job&lt;/h2&gt;

&lt;p&gt;First, prepare the input data in the &lt;code&gt;/tmp/input&lt;/code&gt; file. For example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$ echo &quot;1,2&quot; &amp;gt; /tmp/input&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Next, save the complete example above as &lt;code&gt;python_udf_sum.py&lt;/code&gt; and run it from the command line:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$ python python_udf_sum.py&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The command builds and runs the Python Table API program in a local mini-cluster. You can also submit the program to a remote cluster with other commands (see &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/cli.html#job-submission-examples&quot;&gt;the job submission examples&lt;/a&gt; for details).&lt;/p&gt;

&lt;p&gt;Finally, you can see the execution result on the command line:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;$ cat /tmp/output
 3&lt;/code&gt;&lt;/p&gt;
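
&lt;p&gt;To double-check the expected output independently of Flink, the same row-wise sum can be reproduced in plain Python. This is only a sketch of what the job computes, not of how Flink executes it; it writes to a scratch directory (the &lt;code&gt;workdir&lt;/code&gt; name is hypothetical) so the real input and output files are left alone.&lt;/p&gt;

```python
import csv
import os
import tempfile

# Use a scratch directory so the real /tmp/input and /tmp/output are untouched.
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, "input")

# Same content as the echo command above: one row with the columns a and b.
with open(input_path, "w") as f:
    f.write("1,2\n")

# Mirror select("add(a, b)"): parse both columns as integers and add them per row.
with open(input_path, newline="") as f:
    sums = [int(a) + int(b) for a, b in csv.reader(f)]

print(sums)
```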

&lt;h2 id=&quot;python-udf-dependency-management&quot;&gt;Python UDF dependency management&lt;/h2&gt;

&lt;p&gt;In many cases, you would like to import third-party dependencies in the Python UDF. The example below provides detailed guidance on how to manage such dependencies.&lt;/p&gt;

&lt;p&gt;Suppose you want to use &lt;code&gt;mpmath&lt;/code&gt; to compute the sum in the example above. The Python UDF may look like this:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;nd&quot;&gt;@udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;mpmath&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fadd&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;# add third-party dependency&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fadd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;To make the dependency available on worker nodes that do not have it installed, specify it with the following commands and API call:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;cd&lt;/span&gt; /tmp
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;mpmath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;==&lt;/span&gt;1.1.0 &amp;gt; requirements.txt
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;pip download -d cached_dir -r requirements.txt --no-binary :all:&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_python_requirements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;/tmp/requirements.txt&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;/tmp/cached_dir&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;requirements.txt&lt;/code&gt; file defines the third-party dependencies. If the dependencies cannot be downloaded from within the cluster, you can pass a directory containing their installation packages as the second argument, &lt;code&gt;requirements_cache_dir&lt;/code&gt;, as in the example above. The dependencies are then uploaded to the cluster and installed offline.&lt;/p&gt;

&lt;h1 id=&quot;conclusion--upcoming-work&quot;&gt;Conclusion &amp;amp; Upcoming work&lt;/h1&gt;

&lt;p&gt;In this blog post, we introduced the architecture of Python UDFs in PyFlink and showed how to define, register and invoke them. Flink 1.10 takes Python support in the framework to a new level, letting Python users do more in their preferred language. The community is actively working on the functionality and performance of PyFlink. Upcoming releases are planned to add Pandas UDF support for scalar and aggregate functions, Python UDF support in the SQL client to broaden where Python UDFs can be used, a Python ML Pipeline API, and further performance improvements. The picture below provides more details on the roadmap for succeeding releases.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-04-09-pyflink-udfs/roadmap-of-pyflink.png&quot; width=&quot;600px&quot; alt=&quot;Roadmap of PyFlink&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
</description>
<pubDate>Thu, 09 Apr 2020 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/2020/04/09/pyflink-udf-support-flink.html</link>
<guid isPermaLink="true">/2020/04/09/pyflink-udf-support-flink.html</guid>
</item>

<item>
<title>Stateful Functions 2.0 - An Event-driven Database on Apache Flink</title>
<description>&lt;p&gt;Today, we are announcing the release of Stateful Functions (StateFun) 2.0 — the first release of Stateful Functions as part of the Apache Flink project.
This release marks a big milestone: Stateful Functions 2.0 is not only an API update, but the &lt;strong&gt;first version of an event-driven database&lt;/strong&gt; that is built on Apache Flink.&lt;/p&gt;

&lt;p&gt;Stateful Functions 2.0 makes it possible to combine StateFun’s powerful approach to state and composition with the elasticity, rapid scaling/scale-to-zero and rolling upgrade capabilities of FaaS implementations like AWS Lambda and modern resource orchestration frameworks like Kubernetes.&lt;/p&gt;

&lt;p&gt;With these features, Stateful Functions 2.0 addresses &lt;a href=&quot;https://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-3.pdf&quot;&gt;two of the most cited shortcomings&lt;/a&gt; of many FaaS setups today: consistent state and efficient messaging between functions.&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#an-event-driven-database&quot; id=&quot;markdown-toc-an-event-driven-database&quot;&gt;An Event-driven Database&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#event-driven-database-vs-requestresponse-database&quot; id=&quot;markdown-toc-event-driven-database-vs-requestresponse-database&quot;&gt;“Event-driven Database” vs. “Request/Response Database”&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#state-and-consistency&quot; id=&quot;markdown-toc-state-and-consistency&quot;&gt;State and Consistency&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#remote-co-located-or-embedded-functions&quot; id=&quot;markdown-toc-remote-co-located-or-embedded-functions&quot;&gt;Remote, Co-located or Embedded Functions&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#remote-functions&quot; id=&quot;markdown-toc-remote-functions&quot;&gt;Remote Functions&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#co-located-functions&quot; id=&quot;markdown-toc-co-located-functions&quot;&gt;Co-located Functions&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#embedded-functions&quot; id=&quot;markdown-toc-embedded-functions&quot;&gt;Embedded Functions&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#loading-data-into-the-database&quot; id=&quot;markdown-toc-loading-data-into-the-database&quot;&gt;Loading Data into the Database&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#try-it-out-and-get-involved&quot; id=&quot;markdown-toc-try-it-out-and-get-involved&quot;&gt;Try it out and get involved!&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#thank-you&quot; id=&quot;markdown-toc-thank-you&quot;&gt;Thank you!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h2 id=&quot;an-event-driven-database&quot;&gt;An Event-driven Database&lt;/h2&gt;

&lt;p&gt;When Stateful Functions joined Apache Flink at the beginning of this year, the project was a library on top of Flink for building general-purpose event-driven applications. Users would implement &lt;em&gt;functions&lt;/em&gt; that receive and send messages, and maintain state in persistent variables. Flink provided the runtime with efficient exactly-once state and messaging. Stateful Functions 1.0 was a FaaS-inspired mix of stream processing and actor programming, on steroids.&lt;/p&gt;

&lt;div style=&quot;line-height:60%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;center&gt;
	&lt;figure&gt;
	&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image2.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 1&quot; /&gt;
	&lt;br /&gt;&lt;br /&gt;
	&lt;figcaption&gt;&lt;i&gt;&lt;b&gt;Fig.1:&lt;/b&gt; A ride-sharing app as a Stateful Functions example.&lt;/i&gt;&lt;/figcaption&gt;
	&lt;/figure&gt;
&lt;/center&gt;

&lt;div style=&quot;line-height:150%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;p&gt;In version 2.0, Stateful Functions physically decouples the functions from Flink and the JVM and invokes them through simple services. That makes it possible to execute functions on a FaaS platform, in a Kubernetes deployment or behind a (micro)service.&lt;/p&gt;

&lt;p&gt;Flink invokes the functions through a service endpoint via HTTP or gRPC based on incoming events, and supplies state access. The system makes sure that only one invocation per entity (&lt;code&gt;type&lt;/code&gt;+&lt;code&gt;ID&lt;/code&gt;) is ongoing at any point in time, thus guaranteeing consistency through isolation.
By supplying state access as part of the function invocation, the functions themselves behave like stateless applications and can be managed with the same simplicity and benefits: rapid scalability, scale-to-zero, rolling/zero-downtime upgrades and so on.&lt;/p&gt;
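&lt;p&gt;The dispatching rule described above can be illustrated with a minimal, self-contained Python sketch. It is not the StateFun runtime or SDK: the names &lt;code&gt;Dispatcher&lt;/code&gt;, &lt;code&gt;send&lt;/code&gt; and &lt;code&gt;counter&lt;/code&gt; are hypothetical, and plain in-memory dictionaries stand in for Flink’s managed state and for the remote HTTP/gRPC call. The sketch only shows the idea that the function receives its state with each invocation and that invocations for one address (&lt;code&gt;type&lt;/code&gt;+&lt;code&gt;ID&lt;/code&gt;) are processed one at a time.&lt;/p&gt;

```python
from collections import defaultdict, deque


def counter(state, message):
    """A stateless function: all state arrives with the invocation."""
    seen = (state or 0) + 1
    return seen, "{} seen {} time(s)".format(message, seen)


class Dispatcher:
    """Hypothetical stand-in for the state/messaging tier (not a Flink API)."""

    def __init__(self):
        self.functions = {}                   # function type: callable
        self.state = {}                       # (type, id): stored state
        self.mailboxes = defaultdict(deque)   # (type, id): pending messages
        self.in_flight = set()                # addresses currently being invoked
        self.results = []

    def register(self, fn_type, fn):
        self.functions[fn_type] = fn

    def send(self, fn_type, entity_id, message):
        address = (fn_type, entity_id)
        self.mailboxes[address].append(message)
        self._drain(address)

    def _drain(self, address):
        # Only one invocation per address may be ongoing at any time;
        # further messages wait in the mailbox of that address.
        if address in self.in_flight:
            return
        self.in_flight.add(address)
        try:
            mailbox = self.mailboxes[address]
            while mailbox:
                message = mailbox.popleft()
                fn = self.functions[address[0]]
                # State is handed to the function and written back afterwards.
                new_state, result = fn(self.state.get(address), message)
                self.state[address] = new_state
                self.results.append(result)
        finally:
            self.in_flight.discard(address)


dispatcher = Dispatcher()
dispatcher.register("example/counter", counter)
dispatcher.send("example/counter", "user-1", "login")
dispatcher.send("example/counter", "user-1", "login")
dispatcher.send("example/counter", "user-2", "login")
```

&lt;p&gt;Because &lt;code&gt;counter&lt;/code&gt; holds no state of its own, any number of copies of it could serve invocations; in this sketch, the per-address state lives only in the dispatcher.&lt;/p&gt;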

&lt;center&gt;
	&lt;figure&gt;
	&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image5.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 2&quot; /&gt;
	&lt;br /&gt;&lt;br /&gt;
	&lt;figcaption&gt;&lt;i&gt;&lt;b&gt;Fig.2:&lt;/b&gt; In Stateful Functions 2.0, functions are stateless and state access is part of the function invocation.&lt;/i&gt;&lt;/figcaption&gt;
	&lt;/figure&gt;
&lt;/center&gt;

&lt;div style=&quot;line-height:150%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;p&gt;The functions can be implemented in any programming language that can handle HTTP requests or bring up a gRPC server. The &lt;a href=&quot;https://github.com/apache/flink-statefun&quot;&gt;StateFun project&lt;/a&gt; includes a very slim SDK for Python, taking requests and dispatching them to annotated functions. We aim to provide similar SDKs for other languages, such as Go, JavaScript or Rust. Users do not need to write any Flink code (or JVM code) at all; data ingresses/egresses and function endpoints can be defined in a compact YAML spec.&lt;/p&gt;

&lt;div style=&quot;line-height:60%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;div class=&quot;row&quot;&gt;
  &lt;div class=&quot;col-lg-6&quot;&gt;
    &lt;div class=&quot;text-center&quot;&gt;
      &lt;figure&gt;
		&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image3.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 3&quot; /&gt;
		&lt;br /&gt;&lt;br /&gt;
		&lt;figcaption&gt;&lt;i&gt;&lt;b&gt;Fig.3:&lt;/b&gt; A module declaring a remote endpoint and a function type.&lt;/i&gt;&lt;/figcaption&gt;
	  &lt;/figure&gt;
    &lt;/div&gt;
  &lt;/div&gt;
  &lt;div class=&quot;col-lg-6&quot;&gt;
    &lt;div class=&quot;text-center&quot;&gt;
      &lt;figure&gt;
      	&lt;div style=&quot;line-height:540%;&quot;&gt;
    		&lt;br /&gt;
		&lt;/div&gt;
		&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image10.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 4&quot; /&gt;
		&lt;br /&gt;&lt;br /&gt;
		&lt;figcaption&gt;&lt;i&gt;&lt;b&gt;Fig.4:&lt;/b&gt; A Python implementation of a simple classifier function.&lt;/i&gt;&lt;/figcaption&gt;
	  &lt;/figure&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;div style=&quot;line-height:150%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;p&gt;The Flink processes (and the JVM) do not execute any user code at all, although this is possible for performance reasons (see &lt;a href=&quot;#embedded-functions&quot;&gt;Embedded Functions&lt;/a&gt;). Rather than running application-specific dataflows, Flink here stores the state of the functions and provides the dynamic messaging plane through which functions message each other, carefully dispatching messages/invocations to the event-driven functions/services to maintain consistency guarantees.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;Effectively, Flink takes the role of the database, but tailored towards event-driven functions and services. 
It integrates state storage with the messaging between (and the invocations of) functions and services. 
Because of this, Stateful Functions 2.0 can be thought of as an “Event-driven Database” on Apache Flink.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;event-driven-database-vs-requestresponse-database&quot;&gt;“Event-driven Database” vs. “Request/Response Database”&lt;/h2&gt;

&lt;p&gt;In the case of a traditional database or key/value store (let’s call them request/response databases), the application issues queries to the database (e.g. SQL via JDBC, GET/PUT via HTTP). In contrast, an event-driven database like StateFun &lt;strong&gt;&lt;em&gt;inverts&lt;/em&gt;&lt;/strong&gt; that relationship between database and application: the database invokes the functions/services based on arriving messages. This fits very naturally with FaaS and many event-driven application architectures.&lt;/p&gt;

&lt;div style=&quot;line-height:60%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;center&gt;
	&lt;figure&gt;
	&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image7.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 5&quot; /&gt;
	&lt;br /&gt;&lt;br /&gt;
	&lt;figcaption&gt;&lt;i&gt;&lt;b&gt;Fig.5:&lt;/b&gt; Stateful Functions 2.0 inverts the relationship between database and application.&lt;/i&gt;&lt;/figcaption&gt;
	&lt;/figure&gt;
&lt;/center&gt;

&lt;div style=&quot;line-height:150%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;p&gt;In the case of applications built on request/response databases, the database is responsible only for the state. Communication between different functions/services is a separate concern handled within the application layer. In contrast to that, an event-driven database takes care of both state storage and message transport, in a tightly integrated manner.&lt;/p&gt;

&lt;p&gt;Similar to &lt;a href=&quot;https://www.brianstorti.com/the-actor-model/&quot;&gt;Actor Programming&lt;/a&gt;, Stateful Functions uses the idea of &lt;em&gt;addressable entities&lt;/em&gt;: here, the entity is a function &lt;code&gt;type&lt;/code&gt; with an invocation scoped to an &lt;code&gt;ID&lt;/code&gt;. These addressable entities own the state and are the targets of messages. Unlike in actor systems, the application logic is external and the addressable entities are not physical objects in memory (i.e. actors), but rows in Flink’s managed state, together with the entities’ mailboxes.&lt;/p&gt;

&lt;h3 id=&quot;state-and-consistency&quot;&gt;State and Consistency&lt;/h3&gt;

&lt;p&gt;Besides matching the needs of serverless applications and FaaS well, the event-driven database approach also helps with simplifying consistent state management.&lt;/p&gt;

&lt;p&gt;Consider the example below, with two entities of an application, such as two microservices (&lt;em&gt;Service 1&lt;/em&gt;, &lt;em&gt;Service 2&lt;/em&gt;). &lt;em&gt;Service 1&lt;/em&gt; is invoked, updates the state in the database, and sends a request to &lt;em&gt;Service 2&lt;/em&gt;. Assume that this request fails. There is, in general, no way for &lt;em&gt;Service 1&lt;/em&gt; to know whether &lt;em&gt;Service 2&lt;/em&gt; processed the request and updated its state or not (cf. &lt;a href=&quot;https://en.wikipedia.org/wiki/Two_Generals%27_Problem&quot;&gt;Two Generals Problem&lt;/a&gt;). Many techniques exist to work around that, such as making requests idempotent and retrying, commit/rollback protocols, or external transaction coordinators. Solving this in the application layer is complex enough, and including the database in these approaches only adds more complexity.&lt;/p&gt;

&lt;p&gt;In the scenario where the event-driven database takes care of state and messaging, we have a much easier problem to solve. Assume one shard of the database receives the initial message, updates its state, invokes &lt;em&gt;Service 1&lt;/em&gt;, and routes the message produced by the function to another shard, to be delivered to &lt;em&gt;Service 2&lt;/em&gt;. Now assume the message transport reported an error: the delivery may or may not have succeeded, and we cannot know for certain. Because the database is in charge of both state and messaging, it can offer a generic solution to make sure that either both go through or neither does, for example through transactions or &lt;a href=&quot;https://dl.acm.org/doi/abs/10.14778/3137765.3137777&quot;&gt;consistent snapshots&lt;/a&gt;. The application functions are stateless and their invocations have no side effects, which means they can be re-invoked without implications for consistency.&lt;/p&gt;
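&lt;p&gt;The “both or neither” guarantee can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not how Flink implements it: the &lt;code&gt;Shard&lt;/code&gt; class, the &lt;code&gt;fail_before_commit&lt;/code&gt; flag and the &lt;code&gt;service_1&lt;/code&gt; function are hypothetical, and staged in-memory copies stand in for transactions or consistent snapshots. The point is that the state update and the outgoing message become visible together, so a failed attempt leaves nothing behind and the function can simply be invoked again.&lt;/p&gt;

```python
import copy


class TransportError(Exception):
    """Raised when a delivery could not be confirmed."""


class Shard:
    """Hypothetical shard that owns both state and outgoing messages."""

    def __init__(self):
        self.state = {}
        self.outbox = []

    def process(self, key, message, fn, fail_before_commit=False):
        # Work on copies so nothing is visible until the commit below.
        staged_state = copy.deepcopy(self.state)
        staged_outbox = list(self.outbox)

        new_value, outgoing = fn(staged_state.get(key), message)
        staged_state[key] = new_value
        staged_outbox.extend(outgoing)

        if fail_before_commit:
            # Simulated failure: the staged changes are discarded.
            raise TransportError("delivery could not be confirmed")

        # Commit state and messages together, or not at all.
        self.state, self.outbox = staged_state, staged_outbox


def service_1(balance, amount):
    """Stateless and free of side effects, so re-invocation is safe."""
    new_balance = (balance or 0) + amount
    return new_balance, [("service-2", "balance is {}".format(new_balance))]


shard = Shard()
try:
    shard.process("account-7", 10, service_1, fail_before_commit=True)
except TransportError:
    pass
state_after_failure = dict(shard.state)
outbox_after_failure = list(shard.outbox)

# Re-invoking the function after the failure applies the update exactly once.
shard.process("account-7", 10, service_1)
```

&lt;p&gt;After the failed attempt, both the state and the outbox are still empty; after the retry, the balance has been updated once and exactly one message for &lt;em&gt;Service 2&lt;/em&gt; is queued.&lt;/p&gt;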

&lt;div style=&quot;line-height:60%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;center&gt;
	&lt;figure&gt;
	&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image8.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 6&quot; /&gt;
	&lt;br /&gt;&lt;br /&gt;
	&lt;figcaption&gt;&lt;i&gt;&lt;b&gt;Fig.6:&lt;/b&gt; The event-driven database integrates state access and messaging, guaranteeing consistency.&lt;/i&gt;&lt;/figcaption&gt;
	&lt;/figure&gt;
&lt;/center&gt;

&lt;div style=&quot;line-height:150%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;p&gt;That is the big lesson we learned from working on stream processing technology in the past years: &lt;strong&gt;state access/updates and messaging need to be integrated&lt;/strong&gt;. This gives you consistency, scalable behavior and backpressure that responds to both state access and compute bottlenecks.&lt;/p&gt;

&lt;p&gt;Despite state and computation being physically separated here, the scheduling/dispatching of function invocations is still integrated and physically co-located with state access, preserving the consistency guarantees given by physical state/compute co-location.&lt;/p&gt;

&lt;h2 id=&quot;remote-co-located-or-embedded-functions&quot;&gt;Remote, Co-located or Embedded Functions&lt;/h2&gt;

&lt;p&gt;Functions can be deployed in various ways that trade off loose coupling and independent scaling against performance overhead. Each module of functions can be of a different kind, so some functions can run remotely while others run embedded.&lt;/p&gt;

&lt;h3 id=&quot;remote-functions&quot;&gt;Remote Functions&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Remote Functions&lt;/em&gt; are the mechanism described so far, where functions are deployed separately from the Flink StateFun cluster. The state/messaging tier (i.e. the Flink processes) and the function tier can be deployed and scaled independently. All function invocations are remote and have to go through the endpoint service.&lt;/p&gt;

&lt;div style=&quot;line-height:60%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image6.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 7&quot; /&gt;
&lt;/center&gt;

&lt;div style=&quot;line-height:150%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;p&gt;Just as databases are accessed via a standardized protocol (e.g. ODBC/JDBC for relational databases, REST for many key/value stores), StateFun 2.0 invokes functions and services through a standardized protocol: HTTP or gRPC with data in a well-defined ProtoBuf schema.&lt;/p&gt;

&lt;h3 id=&quot;co-located-functions&quot;&gt;Co-located Functions&lt;/h3&gt;

&lt;p&gt;An alternative way of deploying functions is &lt;em&gt;co-location&lt;/em&gt; with the Flink JVM processes. In such a setup, each Flink TaskManager would talk to one function process sitting “next to it”. A common way to do this is to use a system like Kubernetes and deploy pods consisting of a Flink container and the function container that communicate via the pod-local network.&lt;/p&gt;

&lt;p&gt;This mode supports different languages without routing invocations through a Service/Gateway/LoadBalancer, but it cannot scale the state and compute parts independently.&lt;/p&gt;

&lt;div style=&quot;line-height:60%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image9.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 8&quot; /&gt;
&lt;/center&gt;

&lt;div style=&quot;line-height:150%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;p&gt;This style of deployment is similar to how &lt;a href=&quot;https://beam.apache.org/roadmap/portability/&quot;&gt;Apache Beam’s portability layer&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/tutorials/python_table_api.html&quot;&gt;Flink’s Python API&lt;/a&gt; deploy their non-JVM language SDKs.&lt;/p&gt;

&lt;h3 id=&quot;embedded-functions&quot;&gt;Embedded Functions&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;Embedded Functions&lt;/em&gt; are the mode of Stateful Functions 1.0 and Flink’s Java/Scala stream processing APIs. Functions are deployed into the JVM and are directly invoked with the messages and state access. This is the most performant way, though at the cost of only supporting JVM languages.&lt;/p&gt;

&lt;div style=&quot;line-height:60%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image11.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 9&quot; /&gt;
&lt;/center&gt;

&lt;div style=&quot;line-height:150%;&quot;&gt;
    &lt;br /&gt;
&lt;/div&gt;

&lt;p&gt;Following the database analogy, embedded functions are a bit like &lt;em&gt;stored procedures&lt;/em&gt;, but in a principled way: the functions here are normal Java/Scala/Kotlin functions implementing standard interfaces and can be developed or tested in any IDE.&lt;/p&gt;

&lt;h2 id=&quot;loading-data-into-the-database&quot;&gt;Loading Data into the Database&lt;/h2&gt;

&lt;p&gt;When building a new stateful application, you usually don’t start from a completely blank slate. Often, the application has initial state, such as “bootstrap” state or state from previous versions of the application. When using a database, one could simply bulk load the data to prepare the application.&lt;/p&gt;

&lt;p&gt;The equivalent step for Flink would be to write a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/savepoints.html&quot;&gt;savepoint&lt;/a&gt; that contains the initial state. Savepoints are snapshots of the state of the distributed stream processing application and can be passed to Flink to start processing from that state. Think of them as a database dump, but of a distributed streaming database. In the case of StateFun, the savepoint would contain the state of the functions.&lt;/p&gt;

&lt;p&gt;To create a savepoint for a Stateful Functions program, check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/deployment-and-operations/state-bootstrap.html&quot;&gt;State Bootstrapping API&lt;/a&gt; that is part of StateFun 2.0. The State Bootstrapping API uses Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/batch/&quot;&gt;DataSet API&lt;/a&gt;, but we plan to expand this to use SQL in the next versions.&lt;/p&gt;

&lt;h2 id=&quot;try-it-out-and-get-involved&quot;&gt;Try it out and get involved!&lt;/h2&gt;

&lt;p&gt;We hope we have conveyed some of the excitement we feel about Stateful Functions. If we managed to pique your curiosity, try it out, for example by starting with &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/getting-started/python_walkthrough.html&quot;&gt;this walkthrough&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The project is still in a comparatively early stage, so if you want to get involved, there is lots to work on: SDKs for other languages (e.g. Go, JavaScript, Rust), ingresses/egresses and tools for testing, among others.&lt;/p&gt;

&lt;p&gt;To follow the project and learn more, please check out these resources:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Code: &lt;a href=&quot;https://github.com/apache/flink-statefun&quot;&gt;https://github.com/apache/flink-statefun&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Docs: &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/&quot;&gt;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Apache Flink project site: &lt;a href=&quot;https://flink.apache.org/&quot;&gt;https://flink.apache.org/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Apache Flink on Twitter: &lt;a href=&quot;https://twitter.com/apacheflink&quot;&gt;@ApacheFlink&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Stateful Functions Webpage: &lt;a href=&quot;https://statefun.io&quot;&gt;https://statefun.io&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;Stateful Functions on Twitter: &lt;a href=&quot;https://twitter.com/statefun_io&quot;&gt;@StateFun_IO&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;thank-you&quot;&gt;Thank you!&lt;/h2&gt;

&lt;p&gt;The Apache Flink community would like to thank all contributors who have made this release possible:&lt;/p&gt;

&lt;p&gt;David Anderson, Dian Fu, Igal Shilman, Seth Wiesman, Stephan Ewen, Tzu-Li (Gordon) Tai, hequn8128&lt;/p&gt;

</description>
<pubDate>Tue, 07 Apr 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/04/07/release-statefun-2.0.0.html</link>
<guid isPermaLink="true">/news/2020/04/07/release-statefun-2.0.0.html</guid>
</item>

<item>
<title>Flink Community Update - April&#39;20</title>
<description>&lt;p&gt;While things slow down around us, the Apache Flink community is privileged to remain as active as ever. This blogpost combs through the past few months to give you an update on the state of things in Flink — from core releases to Stateful Functions; from some good old community stats to a new development blog.&lt;/p&gt;

&lt;p&gt;And since now it’s more important than ever to keep up the spirits, we’d like to invite you to join the &lt;a href=&quot;https://www.flink-forward.org/sf-2020&quot;&gt;Flink Forward Virtual Conference&lt;/a&gt;, on April 22-24 (see &lt;a href=&quot;#upcoming-events&quot;&gt;Upcoming Events&lt;/a&gt;). Hope to see you there!&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#the-year-so-far-in-flink&quot; id=&quot;markdown-toc-the-year-so-far-in-flink&quot;&gt;The Year (so far) in Flink&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#flink-110-release&quot; id=&quot;markdown-toc-flink-110-release&quot;&gt;Flink 1.10 Release&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#stateful-functions-contribution-and-20-release&quot; id=&quot;markdown-toc-stateful-functions-contribution-and-20-release&quot;&gt;Stateful Functions Contribution and 2.0 Release&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#building-up-to-flink-111&quot; id=&quot;markdown-toc-building-up-to-flink-111&quot;&gt;Building up to Flink 1.11&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#new-committers-and-pmc-members&quot; id=&quot;markdown-toc-new-committers-and-pmc-members&quot;&gt;New Committers and PMC Members&lt;/a&gt;        &lt;ul&gt;
          &lt;li&gt;&lt;a href=&quot;#new-pmc-members&quot; id=&quot;markdown-toc-new-pmc-members&quot;&gt;New PMC Members&lt;/a&gt;&lt;/li&gt;
          &lt;li&gt;&lt;a href=&quot;#new-committers&quot; id=&quot;markdown-toc-new-committers&quot;&gt;New Committers&lt;/a&gt;&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#the-bigger-picture&quot; id=&quot;markdown-toc-the-bigger-picture&quot;&gt;The Bigger Picture&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#a-look-into-the-flink-repository&quot; id=&quot;markdown-toc-a-look-into-the-flink-repository&quot;&gt;A Look into the Flink Repository&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#flink-community-packages&quot; id=&quot;markdown-toc-flink-community-packages&quot;&gt;Flink Community Packages&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#flink-engine-room&quot; id=&quot;markdown-toc-flink-engine-room&quot;&gt;Flink “Engine Room”&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#upcoming-events&quot; id=&quot;markdown-toc-upcoming-events&quot;&gt;Upcoming Events&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#flink-forward-virtual-conference&quot; id=&quot;markdown-toc-flink-forward-virtual-conference&quot;&gt;Flink Forward Virtual Conference&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#others&quot; id=&quot;markdown-toc-others&quot;&gt;Others&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h1 id=&quot;the-year-so-far-in-flink&quot;&gt;The Year (so far) in Flink&lt;/h1&gt;

&lt;h2 id=&quot;flink-110-release&quot;&gt;Flink 1.10 Release&lt;/h2&gt;

&lt;p&gt;To kick off the new year, the Flink community &lt;a href=&quot;https://flink.apache.org/news/2020/02/11/release-1.10.0.html&quot;&gt;released Flink 1.10&lt;/a&gt; with the record contribution of over 200 engineers. This release introduced significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and advances in Python support (PyFlink). Flink 1.10 also marked the completion of the &lt;a href=&quot;https://flink.apache.org/news/2019/08/22/release-1.9.0.html#preview-of-the-new-blink-sql-query-processor&quot;&gt;Blink integration&lt;/a&gt;, hardening streaming SQL and bringing mature batch processing to Flink with production-ready Hive integration and TPC-DS coverage.&lt;/p&gt;

&lt;p&gt;The community is now discussing the &lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-10-1-td38689.html#a38690&quot;&gt;release of Flink 1.10.1&lt;/a&gt;, covering some outstanding bugs from Flink 1.10.&lt;/p&gt;

&lt;h2 id=&quot;stateful-functions-contribution-and-20-release&quot;&gt;Stateful Functions Contribution and 2.0 Release&lt;/h2&gt;

&lt;p&gt;Last January, the first version of Stateful Functions (&lt;a href=&quot;https://statefun.io/&quot;&gt;statefun.io&lt;/a&gt;) code was pushed to the &lt;a href=&quot;https://github.com/apache/flink-statefun&quot;&gt;Flink repository&lt;/a&gt;. Stateful Functions started out as an API to build general purpose event-driven applications on Flink, taking advantage of its advanced state management mechanism to cut the “middleman” that usually handles state coordination in such applications (e.g. a database).&lt;/p&gt;

&lt;p&gt;In a &lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Update-on-Flink-Stateful-Functions-what-are-the-next-steps-tp38646.html&quot;&gt;recent update&lt;/a&gt;, some new features were announced, like multi-language support (including a Python SDK), function unit testing and Stateful Functions’ own flavor of the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html&quot;&gt;State Processor API&lt;/a&gt;. The release cycle will be independent from core Flink releases and the Release Candidate (RC) has been created — so, &lt;strong&gt;you can expect Stateful Functions 2.0 to be released very soon!&lt;/strong&gt;&lt;/p&gt;

&lt;h2 id=&quot;building-up-to-flink-111&quot;&gt;Building up to Flink 1.11&lt;/h2&gt;

&lt;p&gt;Amidst the usual outpouring of discussion threads, JIRA tickets and FLIPs, the community is working full steam on bringing Flink 1.11 to life in the next few months. The feature freeze is currently scheduled for late April, so the release is expected around mid-May.
The upcoming release will focus on new features and integrations that broaden the scope of Flink use cases, as well as core runtime enhancements to streamline the operations of complex deployments.&lt;/p&gt;

&lt;p&gt;Some of the plans on the use case side include support for changelog streams in the Table API/SQL (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL&quot;&gt;FLIP-105&lt;/a&gt;), easy streaming data ingestion into Apache Hive (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-115%3A+Filesystem+connector+in+Table&quot;&gt;FLIP-115&lt;/a&gt;) and support for Pandas DataFrames in PyFlink. On the operational side, the much-anticipated new Source API (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface&quot;&gt;FLIP-27&lt;/a&gt;) will unify batch and streaming sources and improve out-of-the-box event-time behavior, while unaligned checkpoints (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints&quot;&gt;FLIP-76&lt;/a&gt;) and some changes to network memory management will speed up checkpointing under backpressure.&lt;/p&gt;

&lt;p&gt;Throw in improvements around type systems, the WebUI, metrics reporting and supported formats, and this release is bound to keep the community busy. For a complete overview of the ongoing development, check &lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Features-of-Apache-Flink-1-11-td38724.html#a38793&quot;&gt;this discussion&lt;/a&gt; and follow the weekly updates on the Flink &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;@community mailing list&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;new-committers-and-pmc-members&quot;&gt;New Committers and PMC Members&lt;/h2&gt;

&lt;p&gt;The Apache Flink community has welcomed &lt;strong&gt;1 PMC (Project Management Committee) Member&lt;/strong&gt; and &lt;strong&gt;5 new Committers&lt;/strong&gt; since the last update (September 2019):&lt;/p&gt;

&lt;h3 id=&quot;new-pmc-members&quot;&gt;New PMC Members&lt;/h3&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;Jark Wu
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id=&quot;new-committers&quot;&gt;New Committers&lt;/h3&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;Zili Chen, Jingsong Lee, Yu Li, Dian Fu, Zhu Zhu
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Congratulations to all and thank you for your hard work and commitment to Flink!&lt;/p&gt;

&lt;h1 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture&lt;/h1&gt;

&lt;h2 id=&quot;a-look-into-the-flink-repository&quot;&gt;A Look into the Flink Repository&lt;/h2&gt;

&lt;p&gt;In the &lt;a href=&quot;https://flink.apache.org/news/2019/09/10/community-update.html&quot;&gt;last update&lt;/a&gt;, we shared some numbers around Flink releases and mailing list activity. This time, we’re looking into the activity in the Flink repository and how it’s evolving.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-03-30-flink-community-update/2020-03-30-flink-community-update_1.png&quot; width=&quot;725px&quot; alt=&quot;GitHub 1&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;There is a clear upward trend in the number of contributions to the repository, based on the number of commits. This reflects the &lt;strong&gt;fast pace of development&lt;/strong&gt; the project is experiencing and also the &lt;strong&gt;successful integration of the China-based Flink contributors&lt;/strong&gt;, which started early last year. To complement these observations, the repository registered a &lt;strong&gt;1.5x increase in the number of individual contributors in 2019&lt;/strong&gt;, compared to the previous year.&lt;/p&gt;

&lt;p&gt;But did this increase in capacity produce any other measurable benefits?&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-03-30-flink-community-update/2020-03-30-flink-community-update_2.png&quot; width=&quot;725px&quot; alt=&quot;GitHub 2&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;If we look at the average time of Pull Request (PR) “resolution”, it seems like it did: &lt;strong&gt;the average time it takes to close a PR has been steadily decreasing&lt;/strong&gt; since last year, sitting between 5 and 6 days for the past few months.&lt;/p&gt;

&lt;p&gt;These are great indicators of the health of Flink as an open source project!&lt;/p&gt;

&lt;h2 id=&quot;flink-community-packages&quot;&gt;Flink Community Packages&lt;/h2&gt;

&lt;p&gt;If you missed the launch of &lt;a href=&quot;http://flink-packages.org/&quot;&gt;flink-packages.org&lt;/a&gt;, here’s a reminder! Ververica has &lt;a href=&quot;https://www.ververica.com/blog/announcing-flink-community-packages&quot;&gt;created (and open sourced)&lt;/a&gt; a website that showcases the work of the community to push forward the ecosystem surrounding Flink. There, you can explore existing packages (like the Pravega and Pulsar Flink connectors, or the Flink Kubernetes operators developed by Google and Lyft) and also submit your own contributions to the ecosystem.&lt;/p&gt;

&lt;h2 id=&quot;flink-engine-room&quot;&gt;Flink “Engine Room”&lt;/h2&gt;

&lt;p&gt;The community has recently launched the &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewrecentblogposts.action?key=FLINK&quot;&gt;“Engine Room”&lt;/a&gt;, a dedicated space in Flink’s Wiki for knowledge sharing between contributors. The goal of this initiative is to make ongoing development on Flink internals more transparent across different work streams, and also to help new contributors get on board with best practices. The first blogpost is already up and sheds light on the &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines&quot;&gt;migration of Flink’s CI infrastructure from Travis to Azure Pipelines&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;upcoming-events&quot;&gt;Upcoming Events&lt;/h1&gt;

&lt;h2 id=&quot;flink-forward-virtual-conference&quot;&gt;Flink Forward Virtual Conference&lt;/h2&gt;

&lt;p&gt;The organizers of Flink Forward had to make the hard decision to cancel this year’s event in San Francisco. But all is not lost! &lt;strong&gt;Flink Forward SF will be held online on April 22-24 and you can register (for free)&lt;/strong&gt; &lt;a href=&quot;https://www.flink-forward.org/sf-2020&quot;&gt;here&lt;/a&gt;. Join the community for interactive talks and Q&amp;amp;A sessions with core Flink contributors and companies like Splunk, Lyft, Netflix and Google.&lt;/p&gt;

&lt;h2 id=&quot;others&quot;&gt;Others&lt;/h2&gt;

&lt;p&gt;Events across the globe have come to a halt due to the growing concerns around COVID-19, so this time we’ll leave you with some interesting content to read instead. In addition to this written content, you can also recap last year’s sessions from &lt;a href=&quot;https://www.youtube.com/playlist?list=PLDX4T_cnKjD207Aa8b5CsZjc7Z_KRezGz&quot;&gt;Flink Forward Berlin&lt;/a&gt; and &lt;a href=&quot;https://www.youtube.com/playlist?list=PLDX4T_cnKjD3ANoNinSx3Au-poZTHvbF5&quot;&gt;Flink Forward China&lt;/a&gt;!&lt;/p&gt;

&lt;table class=&quot;table table-bordered&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Type&lt;/th&gt;
      &lt;th&gt;Links&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;span class=&quot;glyphicon glyphicon-bookmark&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Blogposts&lt;/td&gt;
      &lt;td&gt;&lt;ul&gt;
		  &lt;li&gt;&lt;a href=&quot;https://medium.com/bird-engineering/replayable-process-functions-in-flink-time-ordering-and-timers-28007a0210e1&quot;&gt;Replayable Process Functions: Time, Ordering, and Timers @Bird&lt;/a&gt;&lt;/li&gt;
		  &lt;li&gt;&lt;a href=&quot;https://engineering.salesforce.com/application-log-intelligence-performance-insights-at-salesforce-using-flink-92955f30573f&quot;&gt;Application Log Intelligence &amp;amp; Performance Insights at Salesforce Using Flink @Salesforce&lt;/a&gt;&lt;/li&gt;
		  &lt;/ul&gt;
		  &lt;ul&gt;
		  &lt;li&gt;&lt;a href=&quot;https://flink.apache.org/news/2020/01/29/state-unlocked-interacting-with-state-in-apache-flink.html&quot;&gt;State Unlocked: Interacting with State in Apache Flink&lt;/a&gt;&lt;/li&gt;
		  &lt;li&gt;&lt;a href=&quot;https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html&quot;&gt;Advanced Flink Application Patterns Vol.1: Case Study of a Fraud Detection System&lt;/a&gt;&lt;/li&gt;
		  &lt;li&gt;&lt;a href=&quot;https://flink.apache.org/news/2020/03/24/demo-fraud-detection-2.html&quot;&gt;Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic&lt;/a&gt;&lt;/li&gt;
		  &lt;li&gt;&lt;a href=&quot;https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html&quot;&gt;Apache Beam: How Beam Runs on Top of Flink&lt;/a&gt;&lt;/li&gt;
		  &lt;li&gt;&lt;a href=&quot;https://flink.apache.org/features/2020/03/27/flink-for-data-warehouse.html&quot;&gt;Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration&lt;/a&gt;&lt;/li&gt;
		&lt;/ul&gt;
	  &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;span class=&quot;glyphicon glyphicon-console&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Tutorials&lt;/td&gt;
      &lt;td&gt;&lt;ul&gt;
      	  &lt;li&gt;&lt;a href=&quot;https://medium.com/@zjffdu/flink-on-zeppelin-part-3-streaming-5fca1e16754&quot;&gt;Flink on Zeppelin — (Part 3). Streaming&lt;/a&gt;&lt;/li&gt;
		  &lt;li&gt;&lt;a href=&quot;https://aws.amazon.com/blogs/big-data/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics&quot;&gt;Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics&lt;/a&gt;&lt;/li&gt;
		  &lt;li&gt;&lt;a href=&quot;https://flink.apache.org/news/2020/02/20/ddl.html&quot;&gt;No Java Required: Configuring Sources and Sinks in SQL&lt;/a&gt;&lt;/li&gt;
		  &lt;li&gt;&lt;a href=&quot;https://flink.apache.org/news/2020/02/07/a-guide-for-unit-testing-in-apache-flink.html&quot;&gt;A Guide for Unit Testing in Apache Flink&lt;/a&gt;&lt;/li&gt;
		  &lt;/ul&gt;
	  &lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;If you’d like to keep a closer eye on what’s happening in the community, subscribe to the Flink &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;@community mailing list&lt;/a&gt; to get fine-grained weekly updates, upcoming event announcements and more.&lt;/p&gt;
</description>
<pubDate>Wed, 01 Apr 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/04/01/community-update.html</link>
<guid isPermaLink="true">/news/2020/04/01/community-update.html</guid>
</item>

<item>
<title>Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</title>
<description>&lt;p&gt;In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse.&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#introduction&quot; id=&quot;markdown-toc-introduction&quot;&gt;Introduction&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#flink-and-its-integration-with-hive-comes-into-the-scene&quot; id=&quot;markdown-toc-flink-and-its-integration-with-hive-comes-into-the-scene&quot;&gt;Flink and Its Integration With Hive Comes into the Scene&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#unified-metadata-management&quot; id=&quot;markdown-toc-unified-metadata-management&quot;&gt;Unified Metadata Management&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#stream-processing&quot; id=&quot;markdown-toc-stream-processing&quot;&gt;Stream Processing&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#compatible-with-more-hive-versions&quot; id=&quot;markdown-toc-compatible-with-more-hive-versions&quot;&gt;Compatible with More Hive Versions&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#reuse-hive-user-defined-functions-udfs&quot; id=&quot;markdown-toc-reuse-hive-user-defined-functions-udfs&quot;&gt;Reuse Hive User Defined Functions (UDFs)&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#enhanced-read-and-write-on-hive-data&quot; id=&quot;markdown-toc-enhanced-read-and-write-on-hive-data&quot;&gt;Enhanced Read and Write on Hive Data&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#formats&quot; id=&quot;markdown-toc-formats&quot;&gt;Formats&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#more-data-types&quot; id=&quot;markdown-toc-more-data-types&quot;&gt;More Data Types&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#roadmap&quot; id=&quot;markdown-toc-roadmap&quot;&gt;Roadmap&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#summary&quot; id=&quot;markdown-toc-summary&quot;&gt;Summary&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;

&lt;p&gt;What are some of the latest requirements for your data warehouse and data infrastructure in 2020?&lt;/p&gt;

&lt;p&gt;We’ve come up with some for you.&lt;/p&gt;

&lt;p&gt;Firstly, today’s business is shifting to a more real-time fashion, and thus demands the ability to process online streaming data with low latency for near-real-time or even real-time analytics. People are becoming less and less tolerant of delays between when data is generated and when it arrives in their hands, ready to use. Hours or even days of delay are no longer acceptable. Users expect minutes, or even seconds, of end-to-end latency for data in their warehouse, to get quicker-than-ever insights.&lt;/p&gt;

&lt;p&gt;Secondly, the infrastructure should be able to handle both offline batch data for offline analytics and exploration, and online streaming data for more timely analytics. Both are indispensable, as each has very valid use cases. Apart from the real-time processing mentioned above, batch processing still has its place: it is well suited to ad hoc queries, exploration, and full-size calculations. Your modern infrastructure should not force users to choose between one or the other; it should offer both options for a world-class data infrastructure.&lt;/p&gt;

&lt;p&gt;Thirdly, data practitioners, including data engineers, data scientists, analysts, and operations teams, are calling for a more unified infrastructure than ever before, for easier ramp-up and higher working efficiency. The big data landscape has been fragmented for years: companies may have one set of infrastructure for real-time processing, one set for batch, one set for OLAP, etc. That is often a legacy of the lambda architecture, which was popular in the era when stream processors were not as mature as today and users had to periodically run batch processing as a way to correct streaming pipelines. Well, it’s a different era now! As stream processing becomes mainstream and dominant, end users no longer want to learn scattered sets of skills and maintain many moving parts with all kinds of tools and pipelines. Instead, what they really need is a unified analytics platform that is easy to master and reduces operational complexity.&lt;/p&gt;

&lt;p&gt;If any of these resonate with you, you have found the right post to read: by bringing Flink’s integration with Hive to production grade, we have never been this close to that vision.&lt;/p&gt;

&lt;h2 id=&quot;flink-and-its-integration-with-hive-comes-into-the-scene&quot;&gt;Flink and Its Integration With Hive Comes into the Scene&lt;/h2&gt;

&lt;p&gt;Apache Flink is a proven, scalable system that handles extremely high volumes of streaming data at very low latency in many large tech companies.&lt;/p&gt;

&lt;p&gt;Despite its huge success in the real-time processing domain, at its core Flink has faithfully followed its founding philosophy of being &lt;a href=&quot;https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;a unified data processing engine for both batch and streaming&lt;/a&gt;, taking a streaming-first approach in its architecture to do batch processing. By treating batch as a special case of streaming, Flink applies its cutting-edge streaming capabilities to batch scenarios to gain the best offline performance. Flink’s batch performance was already strong in the early days and has become even more impressive since the community started merging Blink, Alibaba’s fork of Flink, back into Flink in 1.9 and finished in 1.10.&lt;/p&gt;

&lt;p&gt;On the other hand, Apache Hive has established itself as a focal point of the data warehousing ecosystem. It serves not only as a SQL engine for big data analytics and ETL, but also as a data management platform, where data is discovered and defined. As businesses evolve, they place new requirements on the data warehouse.&lt;/p&gt;

&lt;p&gt;Thus we started integrating Flink and Hive as a beta version in Flink 1.9. Over the past few months, we have been listening to users’ requests and feedback, extensively enhancing our product, and running rigorous benchmarks (which will be published soon separately). I’m glad to announce that the integration between Flink and Hive is at production grade in &lt;a href=&quot;https://flink.apache.org/news/2020/02/11/release-1.10.0.html&quot;&gt;Flink 1.10&lt;/a&gt; and we can’t wait to walk you through the details.&lt;/p&gt;

&lt;h3 id=&quot;unified-metadata-management&quot;&gt;Unified Metadata Management&lt;/h3&gt;

&lt;p&gt;Hive Metastore has evolved into the de facto metadata hub over the years in the Hadoop, or even the cloud, ecosystem. Many companies have a single Hive Metastore service instance in production to manage all of their schemas, both Hive and non-Hive metadata, as the single source of truth.&lt;/p&gt;

&lt;p&gt;In 1.9 we introduced Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_catalog.html&quot;&gt;HiveCatalog&lt;/a&gt;, connecting Flink to users’ rich metadata pool. The meaning of &lt;code&gt;HiveCatalog&lt;/code&gt; is two-fold here. First, it allows Apache Flink users to utilize Hive Metastore to store and manage Flink’s metadata, including tables, UDFs, and statistics of data. Second, it enables Flink to access Hive’s existing metadata, so that Flink itself can read and write Hive tables.&lt;/p&gt;

&lt;p&gt;In Flink 1.10, users can store Flink’s own tables, views, UDFs, and statistics in Hive Metastore on all of the compatible Hive versions listed below. &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_catalog.html#example&quot;&gt;Here’s an end-to-end example&lt;/a&gt; of how to store a Flink Kafka source table in Hive Metastore and later query the table in Flink SQL.&lt;/p&gt;

&lt;h3 id=&quot;stream-processing&quot;&gt;Stream Processing&lt;/h3&gt;

&lt;p&gt;The Hive integration feature in Flink 1.10 empowers users to re-imagine what they can accomplish with their Hive data and unlock stream processing use cases:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;join real-time streaming data in Flink with offline Hive data for more complex data processing&lt;/li&gt;
  &lt;li&gt;backfill Hive data with Flink directly in a unified fashion&lt;/li&gt;
  &lt;li&gt;leverage Flink to move real-time data into Hive more quickly, greatly shortening the end-to-end latency between when data is generated and when it arrives at your data warehouse for analytics, from hours — or even days — to minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;compatible-with-more-hive-versions&quot;&gt;Compatible with More Hive Versions&lt;/h3&gt;

&lt;p&gt;In Flink 1.10, we brought full coverage to most Hive versions including 1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 2.3, and 3.1. Take a look &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/#supported-hive-versions&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;reuse-hive-user-defined-functions-udfs&quot;&gt;Reuse Hive User Defined Functions (UDFs)&lt;/h3&gt;

&lt;p&gt;Users can &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_functions.html#hive-user-defined-functions&quot;&gt;reuse all kinds of Hive UDFs in Flink&lt;/a&gt; since Flink 1.9.&lt;/p&gt;

&lt;p&gt;This is a great win for Flink users with a history in the Hive ecosystem, as they may have developed custom business logic in their Hive UDFs. Being able to run these functions without any rewrite saves users a lot of time and gives them a much smoother experience when they migrate to Flink.&lt;/p&gt;

&lt;p&gt;To take it a step further, Flink 1.10 introduces &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_functions.html#use-hive-built-in-functions-via-hivemodule&quot;&gt;compatibility of Hive built-in functions via HiveModule&lt;/a&gt;. Over the years, the Hive community has developed a few hundred built-in functions that are very handy for users. For those built-in functions that don’t exist in Flink yet, users can now leverage the existing Hive built-in functions that they are familiar with and complete their jobs seamlessly.&lt;/p&gt;

&lt;h3 id=&quot;enhanced-read-and-write-on-hive-data&quot;&gt;Enhanced Read and Write on Hive Data&lt;/h3&gt;

&lt;p&gt;Flink 1.10 extends its read and write capabilities on Hive data to all the common use cases with better performance.&lt;/p&gt;

&lt;p&gt;On the reading side, Flink can now read Hive regular tables, partitioned tables, and views. Several optimization techniques have been developed around reading, including partition pruning and projection pushdown to transport less data from file storage, limit pushdown for faster experimentation and exploration, and a vectorized reader for ORC files.&lt;/p&gt;

&lt;p&gt;On the writing side, Flink 1.10 introduces “INSERT INTO” and “INSERT OVERWRITE” to its syntax, and can write to not only Hive’s regular tables, but also partitioned tables with either static or dynamic partitions.&lt;/p&gt;

&lt;h3 id=&quot;formats&quot;&gt;Formats&lt;/h3&gt;

&lt;p&gt;Your engine should be able to handle all common file formats, giving you the freedom to choose the one that best fits your business needs. Flink is no exception. We have tested the following table storage formats: text, csv, SequenceFile, ORC, and Parquet.&lt;/p&gt;

&lt;h3 id=&quot;more-data-types&quot;&gt;More Data Types&lt;/h3&gt;

&lt;p&gt;In Flink 1.10, we added support for a few more frequently-used Hive data types that were not covered by Flink 1.9. Flink users should now have a full, smooth experience when querying and manipulating Hive data from Flink.&lt;/p&gt;

&lt;h3 id=&quot;roadmap&quot;&gt;Roadmap&lt;/h3&gt;

&lt;p&gt;Integration between any two systems is a never-ending story.&lt;/p&gt;

&lt;p&gt;We are constantly improving both Flink itself and the Flink-Hive integration by collecting user feedback and working with folks in this vibrant community.&lt;/p&gt;

&lt;p&gt;After careful consideration of the feedback we received, we have prioritized many of the requests below for the next Flink release, 1.11.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Hive streaming sink so that Flink can stream data into Hive tables, bringing a real streaming experience to Hive&lt;/li&gt;
  &lt;li&gt;Native Parquet reader for better performance&lt;/li&gt;
  &lt;li&gt;Additional interoperability - support creating Hive tables, views, functions in Flink&lt;/li&gt;
  &lt;li&gt;Better out-of-box experience with built-in dependencies, including documentation&lt;/li&gt;
  &lt;li&gt;JDBC driver so that users can reuse their existing tooling to run SQL jobs on Flink&lt;/li&gt;
  &lt;li&gt;A compatibility mode for Hive syntax and semantics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you have more feature requests or discover bugs, please reach out to the community through the mailing list and JIRA.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;Data warehousing is shifting to a more real-time fashion, and Apache Flink can make a difference for your organization in this space.&lt;/p&gt;

&lt;p&gt;Flink 1.10 brings production-ready Hive integration and empowers users to achieve more in both metadata management and unified/batch data processing.&lt;/p&gt;

&lt;p&gt;We encourage all our users to get their hands on Flink 1.10. You are very welcome to join the community in development, discussions, and all other kinds of collaboration on this topic.&lt;/p&gt;

</description>
<pubDate>Fri, 27 Mar 2020 03:30:00 +0100</pubDate>
<link>https://flink.apache.org/features/2020/03/27/flink-for-data-warehouse.html</link>
<guid isPermaLink="true">/features/2020/03/27/flink-for-data-warehouse.html</guid>
</item>

<item>
<title>Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</title>
<description>&lt;p&gt;In the &lt;a href=&quot;https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html&quot;&gt;first article&lt;/a&gt; of the series, we gave a high-level description of the objectives and required functionality of a Fraud Detection engine. We also described how to make data partitioning in Apache Flink customizable based on modifiable rules instead of using a hardcoded &lt;code&gt;KeysExtractor&lt;/code&gt; implementation.&lt;/p&gt;

&lt;p&gt;We intentionally omitted details of how the applied rules are initialized and what possibilities exist for updating them at runtime. In this post, we will address exactly these details. You will learn how the approach to data partitioning described in &lt;a href=&quot;https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html&quot;&gt;Part 1&lt;/a&gt; can be applied in combination with a dynamic configuration. These two patterns, when used together, can eliminate the need to recompile the code and redeploy your Flink job for a wide range of modifications of the business logic.&lt;/p&gt;

&lt;h2 id=&quot;rules-broadcasting&quot;&gt;Rules Broadcasting&lt;/h2&gt;

&lt;p&gt;Let’s first have a look at the &lt;a href=&quot;https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html#dynamic-data-partitioning&quot;&gt;previously-defined&lt;/a&gt; data-processing pipeline:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Alert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alerts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DynamicKeyFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DynamicAlertFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;DynamicKeyFunction&lt;/code&gt; provides dynamic data partitioning while &lt;code&gt;DynamicAlertFunction&lt;/code&gt; is responsible for executing the main logic of processing transactions and sending alert messages according to defined rules.&lt;/p&gt;

&lt;p&gt;Vol.1 of this series simplified the use case and assumed that the applied set of rules is pre-initialized and accessible via the &lt;code&gt;List&amp;lt;Rule&amp;gt;&lt;/code&gt; within &lt;code&gt;DynamicKeyFunction&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DynamicKeyFunction&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

  &lt;span class=&quot;cm&quot;&gt;/* Simplified */&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rules&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;cm&quot;&gt;/* Rules that are initialized somehow.*/&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Adding rules to this list is obviously possible directly inside the code of the Flink Job at the stage of its initialization (create a &lt;code&gt;List&lt;/code&gt; object and use its &lt;code&gt;add&lt;/code&gt; method). A major drawback of doing so is that it requires recompiling the job with each rule modification. In a real Fraud Detection system, rules are expected to change frequently, making this approach unacceptable from the point of view of business and operational requirements. A different approach is needed.&lt;/p&gt;

&lt;p&gt;Next, let’s take a look at a sample rule definition that we introduced in the previous post of the series:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-2/rule-dsl.png&quot; width=&quot;800px&quot; alt=&quot;Figure 1: Rule definition&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 1: Rule definition&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;The previous post covered use of &lt;code&gt;groupingKeyNames&lt;/code&gt; by &lt;code&gt;DynamicKeyFunction&lt;/code&gt; to extract message keys. Parameters from the second part of this rule are used by &lt;code&gt;DynamicAlertFunction&lt;/code&gt;: they define the actual logic of the performed operations and their parameters (such as the alert-triggering limit). This means that the same rule must be present in both &lt;code&gt;DynamicKeyFunction&lt;/code&gt; and &lt;code&gt;DynamicAlertFunction&lt;/code&gt;. To achieve this result, we will use the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/state/broadcast_state.html&quot;&gt;broadcast data distribution mechanism&lt;/a&gt; of Apache Flink.&lt;/p&gt;

&lt;p&gt;Figure 2 presents the final job graph of the system that we are building:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-2/job-graph.png&quot; width=&quot;800px&quot; alt=&quot;Figure 2: Job Graph of the Fraud Detection Flink Job&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 2: Job Graph of the Fraud Detection Flink Job&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;The main blocks of the Transactions processing pipeline are:&lt;br /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Transaction Source&lt;/strong&gt; that consumes transaction messages from Kafka partitions in parallel. &lt;br /&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Dynamic Key Function&lt;/strong&gt; that performs data enrichment with a dynamic key. The subsequent &lt;code&gt;keyBy&lt;/code&gt; hashes this dynamic key and partitions the data accordingly among all parallel instances of the following operator.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Dynamic Alert Function&lt;/strong&gt; that accumulates a data window and creates Alerts based on it.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;data-exchange-inside-apache-flink&quot;&gt;Data Exchange inside Apache Flink&lt;/h2&gt;

&lt;p&gt;The job graph above also indicates various data exchange patterns between the operators. In order to understand how the broadcast pattern works, let’s take a short detour and discuss what methods of message propagation exist in Apache Flink’s distributed runtime.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The &lt;strong&gt;FORWARD&lt;/strong&gt; connection after the Transaction Source means that all data consumed by one of the parallel instances of the Transaction Source operator is transferred to exactly one instance of the subsequent &lt;code&gt;DynamicKeyFunction&lt;/code&gt; operator. It also indicates the same level of parallelism of the two connected operators (12 in the above case). This communication pattern is illustrated in Figure 3. Orange circles represent transactions, and dotted rectangles depict parallel instances of the conjoined operators.&lt;/li&gt;
&lt;/ul&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-2/forward.png&quot; width=&quot;800px&quot; alt=&quot;Figure 3: FORWARD message passing across operator instances&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 3: FORWARD message passing across operator instances&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The &lt;strong&gt;HASH&lt;/strong&gt; connection between &lt;code&gt;DynamicKeyFunction&lt;/code&gt; and &lt;code&gt;DynamicAlertFunction&lt;/code&gt; means that for each message a hash code of its key is calculated and the message is routed to one of the available parallel instances of the next operator based on that hash, so that messages with the same key always reach the same instance. Such a connection needs to be explicitly “requested” from Flink by using &lt;code&gt;keyBy&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-2/hash.png&quot; width=&quot;800px&quot; alt=&quot;Figure 4: HASHED message passing across operator instances (via keyBy)&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 4: HASHED message passing across operator instances (via &lt;code&gt;keyBy&lt;/code&gt;)&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
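
&lt;p&gt;The idea behind hash-based routing can be sketched in plain Java, without any Flink dependency. This is a simplified illustration rather than Flink’s actual implementation: the class &lt;code&gt;HashPartitionSketch&lt;/code&gt; and its method &lt;code&gt;targetInstance&lt;/code&gt; are hypothetical names, and Flink’s runtime assigns keys to parallel instances through its own key-group mechanism instead of the plain modulo used here.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;public class HashPartitionSketch {

    // Hypothetical helper: picks the parallel instance that receives a given key.
    // Math.floorMod keeps the result non-negative even for negative hash codes.
    public static int targetInstance(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 12; // same parallelism as in the job graph of Figure 2
        String[] keys = {&quot;payer-1&quot;, &quot;payer-2&quot;, &quot;payer-1&quot;};
        for (String key : keys) {
            // The same key always maps to the same instance.
            System.out.println(key + &quot; -&amp;gt; instance &quot; + targetInstance(key, parallelism));
        }
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Because the target depends only on the key, all transactions that share a key end up in the same parallel instance, which is what allows that instance to accumulate state for the key.&lt;/p&gt;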

&lt;ul&gt;
  &lt;li&gt;A &lt;strong&gt;REBALANCE&lt;/strong&gt; distribution is either caused by an explicit call to &lt;code&gt;rebalance()&lt;/code&gt; or by a change of parallelism (12 -&amp;gt; 1 in the case of the job graph from Figure 2). Calling &lt;code&gt;rebalance()&lt;/code&gt; causes data to be repartitioned in a round-robin fashion and can help to mitigate data skew in certain scenarios.&lt;/li&gt;
&lt;/ul&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-2/rebalance.png&quot; width=&quot;800px&quot; alt=&quot;Figure 5: REBALANCE message passing across operator instances&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 5: REBALANCE message passing across operator instances&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
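
&lt;p&gt;Round-robin distribution can be sketched in a similar way. The class &lt;code&gt;RebalanceSketch&lt;/code&gt; below is a hypothetical, Flink-independent illustration; starting at instance 0 is an assumption made for readability and says nothing about where Flink’s own partitioner starts.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;public class RebalanceSketch {

    private final int parallelism;
    private int next = 0; // index of the instance that receives the next message

    public RebalanceSketch(int parallelism) {
        this.parallelism = parallelism;
    }

    // Returns the target instance and advances the round-robin pointer.
    public int nextInstance() {
        int target = next;
        next = (next + 1) % parallelism;
        return target;
    }

    public static void main(String[] args) {
        RebalanceSketch channel = new RebalanceSketch(3);
        for (int i = 0; i &amp;lt; 6; i++) {
            // Messages are spread over the instances regardless of their content.
            System.out.println(&quot;message &quot; + i + &quot; -&amp;gt; instance &quot; + channel.nextInstance());
        }
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Unlike hash-based routing, the target here does not depend on the message at all, which is why round-robin distribution can even out load when some keys are much more frequent than others.&lt;/p&gt;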

&lt;p&gt;The Fraud Detection job graph in Figure 2 contains an additional data source: &lt;em&gt;Rules Source&lt;/em&gt;. It also consumes from Kafka. Rules are “mixed into” the main processing data flow through the &lt;strong&gt;BROADCAST&lt;/strong&gt; channel. Unlike other methods of transmitting data between operators, such as &lt;code&gt;forward&lt;/code&gt;, &lt;code&gt;hash&lt;/code&gt; or &lt;code&gt;rebalance&lt;/code&gt; that make each message available for processing in only one of the parallel instances of the receiving operator, &lt;code&gt;broadcast&lt;/code&gt; makes each message available at the input of all of the parallel instances of the operator to which the &lt;em&gt;broadcast stream&lt;/em&gt; is connected. This makes &lt;code&gt;broadcast&lt;/code&gt; applicable to a wide range of tasks that need to affect the processing of all messages, regardless of their key or source partition.&lt;/p&gt;

&lt;center&gt;
 &lt;img src=&quot;/img/blog/patterns-blog-2/broadcast.png&quot; width=&quot;800px&quot; alt=&quot;Figure 6: BROADCAST message passing across operator instances&quot; /&gt;
 &lt;br /&gt;
 &lt;i&gt;&lt;small&gt;Figure 6: BROADCAST message passing across operator instances&lt;/small&gt;&lt;/i&gt;
 &lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;alert alert-info&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
There are actually a few more specialized data partitioning schemes in Flink which we did not mention here. If you want to find out more, please refer to Flink’s documentation on &lt;strong&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/#physical-partitioning&quot;&gt;stream partitioning&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;/div&gt;
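
&lt;p&gt;To make the difference between the four exchange patterns concrete, the following plain-Java sketch computes which receiving instances get a copy of a single message. It is only a toy model: the class &lt;code&gt;ChannelSketch&lt;/code&gt; and its parameters are hypothetical names introduced for illustration, and the real partitioners in Flink differ in detail (for example, hash partitioning maps keys to key groups first).&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.ArrayList;
import java.util.List;

/** Toy model of the four data exchange patterns; not the partitioner code used by Flink. */
final class ChannelSketch {

  enum Channel { FORWARD, HASH, REBALANCE, BROADCAST }

  /** Returns the indices of the receiving instances that get a copy of one message. */
  static List&amp;lt;Integer&amp;gt; targets(Channel channel, int senderIndex, long sequenceNumber,
                               Object key, int receiverParallelism) {
    List&amp;lt;Integer&amp;gt; result = new ArrayList&amp;lt;&amp;gt;();
    switch (channel) {
      case FORWARD:
        // Stays with the co-located instance (assumes equal parallelism on both sides).
        result.add(senderIndex);
        break;
      case HASH:
        // The same key always lands on the same instance.
        result.add(Math.floorMod(key.hashCode(), receiverParallelism));
        break;
      case REBALANCE:
        // Round-robin over all receivers, independent of the key.
        result.add((int) Math.floorMod(sequenceNumber, (long) receiverParallelism));
        break;
      case BROADCAST:
        // Every receiving instance gets its own copy.
        for (int i = 0; i &amp;lt; receiverParallelism; i++) {
          result.add(i);
        }
        break;
    }
    return result;
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With a receiver parallelism of 3, a broadcast message is delivered to instances 0, 1 and 2, whereas the rebalanced message with sequence number 4 goes only to instance 1.&lt;/p&gt;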

&lt;h2 id=&quot;broadcast-state-pattern&quot;&gt;Broadcast State Pattern&lt;/h2&gt;

&lt;p&gt;In order to make use of the Rules Source, we need to “connect” it to the main data stream:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// Streams setup&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;[...]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rulesUpdateStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;[...]&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;BroadcastStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rulesStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rulesUpdateStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;broadcast&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RULES_STATE_DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// Processing pipeline setup&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Alert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alerts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;
         &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rulesStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
         &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DynamicKeyFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
         &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
         &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rulesStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
         &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DynamicAlertFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;As you can see, the broadcast stream can be created from any regular stream by calling the &lt;code&gt;broadcast&lt;/code&gt; method and specifying a state descriptor. Flink assumes that broadcast data needs to be stored and retrieved while processing events of the main data flow and therefore always creates a corresponding &lt;em&gt;broadcast state&lt;/em&gt; from this state descriptor automatically. This differs from other Apache Flink state types, which you need to initialize in the &lt;code&gt;open()&lt;/code&gt; method of the processing function. Also note that broadcast state always has a key-value format (&lt;code&gt;MapState&lt;/code&gt;).&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RULES_STATE_DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;rules&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Connecting to &lt;code&gt;rulesStream&lt;/code&gt; causes some changes in the signature of the processing functions. The previous article presented it in a slightly simplified way as a &lt;code&gt;ProcessFunction&lt;/code&gt;. However, &lt;code&gt;DynamicKeyFunction&lt;/code&gt; is actually a &lt;code&gt;BroadcastProcessFunction&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;abstract&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;BroadcastProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IN1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IN2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OUT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;abstract&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IN1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
                                        &lt;span class=&quot;n&quot;&gt;ReadOnlyContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
                                        &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OUT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;abstract&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processBroadcastElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IN2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
                                                 &lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
                                                 &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OUT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The difference is the addition of the &lt;code&gt;processBroadcastElement&lt;/code&gt; method through which messages of the rules stream will arrive. The following new version of &lt;code&gt;DynamicKeyFunction&lt;/code&gt; allows modifying the list of data-distribution keys at runtime through this stream:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DynamicKeyFunction&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BroadcastProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;


  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processBroadcastElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
                                     &lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
                                     &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;BroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;broadcastState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RULES_STATE_DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;broadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;put&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRuleId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
                           &lt;span class=&quot;n&quot;&gt;ReadOnlyContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
                           &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;){&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ReadOnlyBroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rulesState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
                                  &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RULES_STATE_DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;Entry&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;entry&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rulesState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;immutableEntries&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;entry&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
          &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeysExtractor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getGroupingKeyNames&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRuleId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()));&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In the above code, &lt;code&gt;processElement()&lt;/code&gt; receives Transactions, and &lt;code&gt;processBroadcastElement()&lt;/code&gt; receives Rule updates. When a new rule is created, it is distributed as depicted in Figure 6 and saved in all parallel instances of the operator by &lt;code&gt;processBroadcastElement()&lt;/code&gt;, which writes it to the broadcast state. We use a Rule’s ID as the key to store and reference individual rules. Instead of iterating over a hardcoded &lt;code&gt;List&amp;lt;Rule&amp;gt;&lt;/code&gt;, we iterate over entries in the dynamically-updated broadcast state.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;DynamicAlertFunction&lt;/code&gt; follows the same logic with respect to storing the rules in the broadcast &lt;code&gt;MapState&lt;/code&gt;. As described in &lt;a href=&quot;https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html&quot;&gt;Part 1&lt;/a&gt;, each message in the &lt;code&gt;processElement&lt;/code&gt; input is intended to be processed by one specific rule and comes “pre-marked” with a corresponding ID by &lt;code&gt;DynamicKeyFunction&lt;/code&gt;. All we need to do is retrieve the definition of the corresponding rule from &lt;code&gt;BroadcastState&lt;/code&gt; by using the provided ID and process the message according to the logic required by that rule. At this stage, we will also add messages to the internal function state in order to perform calculations on the required time window of data. We will consider how this is done in the final blog post of the series about Fraud Detection.&lt;/p&gt;

&lt;h1 id=&quot;summary&quot;&gt;Summary&lt;/h1&gt;

&lt;p&gt;In this blog post, we continued our investigation of the use case of a Fraud Detection System built with Apache Flink. We looked into different ways in which data can be distributed between parallel operator instances and, most importantly, examined broadcast state. We demonstrated how dynamic partitioning — a pattern described in the &lt;a href=&quot;https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html&quot;&gt;first part&lt;/a&gt; of the series — can be combined and enhanced by the functionality provided by the broadcast state pattern. The ability to send dynamic updates at runtime is a powerful feature of Apache Flink that is applicable in a variety of other use cases, such as controlling state (cleanup/insert/fix), running A/B experiments or executing updates of ML model coefficients.&lt;/p&gt;
</description>
<pubDate>Tue, 24 Mar 2020 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2020/03/24/demo-fraud-detection-2.html</link>
<guid isPermaLink="true">/news/2020/03/24/demo-fraud-detection-2.html</guid>
</item>

<item>
<title>Apache Beam: How Beam Runs on Top of Flink</title>
<description>&lt;p&gt;Note: This blog post is based on the talk &lt;a href=&quot;https://www.youtube.com/watch?v=hxHGLrshnCY&quot;&gt;“Beam on Flink: How Does It Actually Work?”&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://flink.apache.org/&quot;&gt;Apache Flink&lt;/a&gt; and &lt;a href=&quot;https://beam.apache.org/&quot;&gt;Apache Beam&lt;/a&gt; are open-source frameworks for parallel, distributed data processing at scale. Unlike Flink, Beam does not come with a full-blown execution engine of its own but plugs into other execution engines, such as Apache Flink, Apache Spark, or Google Cloud Dataflow. In this blog post we discuss the reasons to use Flink together with Beam for your batch and stream processing needs. We also take a closer look at how Beam works with Flink to provide an idea of the technical aspects of running Beam pipelines with Flink. We hope you find some useful information on how and why the two frameworks can be utilized in combination. For more information, you can refer to the corresponding &lt;a href=&quot;https://beam.apache.org/documentation/runners/flink/&quot;&gt;documentation&lt;/a&gt; on the Beam website or contact the community through the &lt;a href=&quot;https://beam.apache.org/community/contact-us/&quot;&gt;Beam mailing list&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;what-is-apache-beam&quot;&gt;What is Apache Beam&lt;/h1&gt;

&lt;p&gt;&lt;a href=&quot;https://beam.apache.org/&quot;&gt;Apache Beam&lt;/a&gt; is an open-source, unified model for defining batch and streaming data-parallel processing pipelines. It is unified in the sense that you use a single API rather than separate APIs for batch and streaming, as is the case in Flink. Beam was originally developed by Google, which released it in 2014 as the Cloud Dataflow SDK. In 2016, it was donated to &lt;a href=&quot;https://www.apache.org/&quot;&gt;the Apache Software Foundation&lt;/a&gt; under the name Beam. It has been developed by the open-source community ever since. With Apache Beam, developers can write data processing jobs, also known as pipelines, in multiple languages, e.g. Java, Python, Go, SQL. A pipeline is then executed by one of Beam’s Runners. A Runner is responsible for translating Beam pipelines such that they can run on an execution engine. Every supported execution engine has a Runner. The following Runners are available: Apache Flink, Apache Spark, Apache Samza, Hazelcast Jet, Google Cloud Dataflow, and others.&lt;/p&gt;

&lt;p&gt;The execution model, as well as the API of Apache Beam, are similar to Flink’s. Both frameworks are inspired by the &lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf&quot;&gt;MapReduce&lt;/a&gt;, &lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41378.pdf&quot;&gt;MillWheel&lt;/a&gt;, and &lt;a href=&quot;https://research.google/pubs/pub43864/&quot;&gt;Dataflow&lt;/a&gt; papers. Like Flink, Beam is designed for parallel, distributed data processing. Both have similar transformations, support for windowing, event/processing time, watermarks, timers, triggers, and much more. However, since Beam is not a full runtime, it focuses on providing the framework for building portable, multi-language batch and stream processing pipelines such that they can be run across several execution engines. The idea is that you write your pipeline once and feed it with either batch or streaming data. When you run it, you just pick one of the supported backends to execute. A large integration test suite in Beam called “ValidatesRunner” ensures that the results will be the same, regardless of which backend you choose for the execution.&lt;/p&gt;

&lt;p&gt;One of the most exciting developments in the Beam technology is the framework’s support for multiple programming languages including Java, Python, Go, Scala and SQL. Essentially, developers can write their applications in a programming language of their choice. Beam, with the help of the Runners, translates the program to one of the execution engines, as shown in the diagram below.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-02-22-beam-on-flink/flink-runner-beam-beam-vision.png&quot; width=&quot;600px&quot; alt=&quot;The vision of Apache Beam&quot; /&gt;
&lt;/center&gt;

&lt;h1 id=&quot;reasons-to-use-beam-with-flink&quot;&gt;Reasons to use Beam with Flink&lt;/h1&gt;

&lt;p&gt;Why would you want to use Beam with Flink instead of directly using Flink? Ultimately, Beam and Flink complement each other and provide additional value to the user. The main reasons for using Beam with Flink are the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Beam provides a unified API for both batch and streaming scenarios.&lt;/li&gt;
  &lt;li&gt;Beam comes with native support for different programming languages, such as Python or Go, together with their libraries, such as NumPy, Pandas, TensorFlow, or TFX.&lt;/li&gt;
  &lt;li&gt;You get the power of Apache Flink, such as its exactly-once semantics, strong memory management and robustness.&lt;/li&gt;
  &lt;li&gt;Beam programs run on your existing Flink infrastructure or infrastructure for other supported Runners, like Spark or Google Cloud Dataflow.&lt;/li&gt;
  &lt;li&gt;You get additional features like side inputs and cross-language pipelines that are not supported natively in Flink but only supported when using Beam with Flink.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;the-flink-runner-in-beam&quot;&gt;The Flink Runner in Beam&lt;/h1&gt;

&lt;p&gt;The Flink Runner in Beam translates Beam pipelines into Flink jobs. The translation can be parameterized using Beam’s pipeline options which are parameters for settings like configuring the job name, parallelism, checkpointing, or metrics reporting.&lt;/p&gt;

&lt;p&gt;If you are familiar with a DataSet or a DataStream, you will have no problems understanding what a PCollection is. PCollection stands for parallel collection in Beam and is the counterpart of a DataSet/DataStream in Flink. Because of Beam’s unified API, transforms produce only one type of result: a PCollection.&lt;/p&gt;

&lt;p&gt;Beam pipelines are composed of transforms. Transforms are like operators in Flink and come in two flavors: primitive and composite transforms. The beauty of all this is that Beam only comes with a small set of primitive transforms which are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code&gt;Source&lt;/code&gt; (for loading data)&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;ParDo&lt;/code&gt; (think of a flat map operator on steroids)&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;GroupByKey&lt;/code&gt; (think of keyBy() in Flink)&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;AssignWindows&lt;/code&gt; (windows can be assigned at any point in time in Beam)&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;Flatten&lt;/code&gt; (like a union() operation in Flink)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Composite transforms are built by combining the above primitive transforms. For example, &lt;code&gt;Combine = GroupByKey + ParDo&lt;/code&gt;.&lt;/p&gt;
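
&lt;p&gt;The decomposition &lt;code&gt;Combine = GroupByKey + ParDo&lt;/code&gt; can be illustrated without the Beam SDK. The following plain-Java sketch first groups values by key and then processes each group independently. The class &lt;code&gt;CombineSketch&lt;/code&gt; and its methods are hypothetical names used only for this illustration and are not part of Beam.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/** Plain-Java illustration of a combine built from a grouping step and a per-group step. */
final class CombineSketch {

  /** Step 1 (the GroupByKey part): collect all values that share a key. */
  static &amp;lt;K, V&amp;gt; Map&amp;lt;K, List&amp;lt;V&amp;gt;&amp;gt; groupByKey(List&amp;lt;Map.Entry&amp;lt;K, V&amp;gt;&amp;gt; input) {
    Map&amp;lt;K, List&amp;lt;V&amp;gt;&amp;gt; grouped = new LinkedHashMap&amp;lt;&amp;gt;();
    for (Map.Entry&amp;lt;K, V&amp;gt; kv : input) {
      grouped.computeIfAbsent(kv.getKey(), k -&amp;gt; new ArrayList&amp;lt;&amp;gt;()).add(kv.getValue());
    }
    return grouped;
  }

  /** Step 2 (the ParDo part): process each group on its own, here by folding its values. */
  static &amp;lt;K, V&amp;gt; Map&amp;lt;K, V&amp;gt; combinePerKey(List&amp;lt;Map.Entry&amp;lt;K, V&amp;gt;&amp;gt; input,
                                        BinaryOperator&amp;lt;V&amp;gt; combiner) {
    Map&amp;lt;K, V&amp;gt; combined = new LinkedHashMap&amp;lt;&amp;gt;();
    for (Map.Entry&amp;lt;K, List&amp;lt;V&amp;gt;&amp;gt; group : groupByKey(input).entrySet()) {
      // Every group holds at least one value, so the reduction is never empty.
      combined.put(group.getKey(), group.getValue().stream().reduce(combiner).get());
    }
    return combined;
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For the input (a, 1), (b, 5), (a, 2) and &lt;code&gt;Integer::sum&lt;/code&gt; as the combiner, the result maps a to 3 and b to 5.&lt;/p&gt;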

&lt;h1 id=&quot;flink-runner-internals&quot;&gt;Flink Runner Internals&lt;/h1&gt;

&lt;p&gt;Although you do not need to understand the internals of the Flink Runner in order to use it, we provide more details of how it works to share knowledge of how the two frameworks integrate and work together to provide state-of-the-art streaming data pipelines.&lt;/p&gt;

&lt;p&gt;The Flink Runner has two translation paths. Depending on whether we execute in batch or streaming mode, the Runner either translates into Flink’s DataSet or into Flink’s DataStream API. Since multi-language support has been added to Beam, another two translation paths have been added. To summarize the four modes:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;The Classic Flink Runner for batch jobs:&lt;/strong&gt; Executes batch Java pipelines&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;The Classic Flink Runner for streaming jobs:&lt;/strong&gt; Executes streaming Java pipelines&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;The Portable Flink Runner for batch jobs:&lt;/strong&gt; Executes Java as well as Python, Go and other supported SDK pipelines for batch scenarios&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;The Portable Flink Runner for streaming jobs:&lt;/strong&gt; Executes Java as well as Python, Go and other supported SDK pipelines for streaming scenarios&lt;/li&gt;
&lt;/ol&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-02-22-beam-on-flink/flink-runner-beam-runner-translation-paths.png&quot; width=&quot;300px&quot; alt=&quot;The four translation paths in Beam&#39;s Flink Runner&quot; /&gt;
&lt;/center&gt;

&lt;h2 id=&quot;the-classic-flink-runner-in-beam&quot;&gt;The “Classic” Flink Runner in Beam&lt;/h2&gt;

&lt;p&gt;The classic Flink Runner was the initial version of the Runner, hence the “classic” name. Beam pipelines are represented as a graph in Java which is composed of the aforementioned composite and primitive transforms. Beam provides translators which traverse the graph in topological order. Topological order means that we start from all the sources first as we iterate through the graph. Presented with a transform from the graph, the Flink Runner generates the API calls as you would normally when writing a Flink job.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-02-22-beam-on-flink/classic-flink-runner-beam.png&quot; width=&quot;600px&quot; alt=&quot;The Classic Flink Runner in Beam&quot; /&gt;
&lt;/center&gt;
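
&lt;p&gt;The topological traversal described above can be sketched with Kahn’s algorithm on a small graph of transform names. This is only an illustration of the ordering and not the traversal code of the Flink Runner. The class &lt;code&gt;TopologicalOrderSketch&lt;/code&gt; and the representation of the graph as a map are assumptions made for this example.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Visits a graph of transforms so that every transform comes after all of its inputs. */
final class TopologicalOrderSketch {

  /** The map assigns to each transform name the transforms that consume its output. */
  static List&amp;lt;String&amp;gt; order(Map&amp;lt;String, List&amp;lt;String&amp;gt;&amp;gt; edges) {
    // Count the incoming edges of every transform; sources end up with a count of 0.
    Map&amp;lt;String, Integer&amp;gt; inDegree = new LinkedHashMap&amp;lt;&amp;gt;();
    for (Map.Entry&amp;lt;String, List&amp;lt;String&amp;gt;&amp;gt; e : edges.entrySet()) {
      inDegree.putIfAbsent(e.getKey(), 0);
      for (String consumer : e.getValue()) {
        inDegree.merge(consumer, 1, Integer::sum);
      }
    }
    // Start from all the sources.
    Deque&amp;lt;String&amp;gt; ready = new ArrayDeque&amp;lt;&amp;gt;();
    for (Map.Entry&amp;lt;String, Integer&amp;gt; e : inDegree.entrySet()) {
      if (e.getValue() == 0) {
        ready.add(e.getKey());
      }
    }
    // A consumer becomes ready once its last remaining input has been visited.
    List&amp;lt;String&amp;gt; visited = new ArrayList&amp;lt;&amp;gt;();
    while (!ready.isEmpty()) {
      String current = ready.remove();
      visited.add(current);
      for (String consumer : edges.getOrDefault(current, new ArrayList&amp;lt;&amp;gt;())) {
        if (inDegree.merge(consumer, -1, Integer::sum) == 0) {
          ready.add(consumer);
        }
      }
    }
    return visited;
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For the chain Read, ParDo, GroupByKey, the method returns the transforms in exactly that order, and a transform with two inputs is visited only after both of them.&lt;/p&gt;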

&lt;p&gt;While Beam and Flink share very similar concepts, there are enough differences between the two frameworks to make it impossible to translate Beam pipelines 1:1 into a Flink program. The following sections present the key differences:&lt;/p&gt;

&lt;h3 id=&quot;serializers-vs-coders&quot;&gt;Serializers vs Coders&lt;/h3&gt;

&lt;p&gt;When data is transferred over the wire in Flink, it has to be turned into bytes. This is done with the help of serializers. Flink has a type system to instantiate the correct serializer for a given type, e.g. &lt;code&gt;StringTypeSerializer&lt;/code&gt; for a String. Apache Beam also has its own type system which is similar to Flink’s but uses slightly different interfaces. Serializers are called Coders in Beam. In order to make a Beam Coder run in Flink, we have to make the two serializer types compatible. This is done by creating a special Flink type information that looks like the one in Flink but calls the appropriate Beam coder. That way, we can use Beam’s coders although we are executing the Beam job with Flink. Flink operators expect a TypeInformation, e.g. &lt;code&gt;StringTypeInformation&lt;/code&gt;, for which we use a &lt;code&gt;CoderTypeInformation&lt;/code&gt; in Beam. When asked for its serializer, this type information returns a &lt;code&gt;CoderTypeSerializer&lt;/code&gt;, which calls the underlying Beam Coder.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-02-22-beam-on-flink/flink-runner-beam-serializers-coders.png&quot; width=&quot;300px&quot; alt=&quot;Serializers vs Coders&quot; /&gt;
&lt;/center&gt;
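
&lt;p&gt;The wrapping described above is essentially an adapter. The following Python sketch shows the idea only; the class names &lt;code&gt;Utf8Coder&lt;/code&gt;, &lt;code&gt;CoderSerializer&lt;/code&gt;, and &lt;code&gt;CoderTypeInfo&lt;/code&gt; and their methods are hypothetical stand-ins and do not mirror the real Java interfaces in Beam or Flink.&lt;/p&gt;

```python
class Utf8Coder:
    """Stand-in for a Beam Coder: turns a value into bytes and back."""

    def encode(self, value):
        return value.encode("utf-8")

    def decode(self, data):
        return data.decode("utf-8")


class CoderSerializer:
    """Stand-in for CoderTypeSerializer: a serializer that delegates to a coder."""

    def __init__(self, coder):
        self._coder = coder

    def serialize(self, value):
        return self._coder.encode(value)

    def deserialize(self, data):
        return self._coder.decode(data)


class CoderTypeInfo:
    """Stand-in for CoderTypeInformation: hands out the serializer for its coder."""

    def __init__(self, coder):
        self._coder = coder

    def create_serializer(self):
        return CoderSerializer(self._coder)


# The engine only sees the type information and the serializer it returns;
# the actual byte layout is decided by the wrapped coder.
type_info = CoderTypeInfo(Utf8Coder())
serializer = type_info.create_serializer()
payload = serializer.serialize("flink")
```

&lt;p&gt;The engine-facing side keeps its usual shape, while the encoding itself stays under the control of the wrapped coder.&lt;/p&gt;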

&lt;h3 id=&quot;read&quot;&gt;Read&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;Read&lt;/code&gt; transform provides a way to read data into your pipeline in Beam. The Read transform is supported by two wrappers in Beam, the &lt;code&gt;SourceInputFormat&lt;/code&gt; for batch processing and the &lt;code&gt;UnboundedSourceWrapper&lt;/code&gt; for stream processing.&lt;/p&gt;

&lt;h3 id=&quot;pardo&quot;&gt;ParDo&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;ParDo&lt;/code&gt; is the Swiss Army knife of Beam and can be compared to a &lt;code&gt;RichFlatMapFunction&lt;/code&gt; in Flink with additional features such as &lt;code&gt;SideInputs&lt;/code&gt;, &lt;code&gt;SideOutputs&lt;/code&gt;, State and Timers. For batch processing, the Flink runner translates &lt;code&gt;ParDo&lt;/code&gt; using the &lt;code&gt;FlinkDoFnFunction&lt;/code&gt; or the &lt;code&gt;FlinkStatefulDoFnFunction&lt;/code&gt;. For streaming scenarios, the translation uses the &lt;code&gt;DoFnOperator&lt;/code&gt;, which takes care of checkpointing and buffering of data during checkpoints, watermark emissions, and maintenance of state and timers. In both cases, execution goes through Beam’s &lt;code&gt;DoFnRunner&lt;/code&gt; interface, which encapsulates Beam-specific execution logic, like retrieving state, executing state and timers, or reporting metrics.&lt;/p&gt;

&lt;h3 id=&quot;side-inputs&quot;&gt;Side Inputs&lt;/h3&gt;

&lt;p&gt;In addition to the main input, ParDo transforms can have a number of side inputs. A side input can be a static set of data that you want to have available at all parallel instances. However, it is more flexible than that. You can have keyed and even windowed side input which updates based on the window size. This is a very powerful concept which does not exist in Flink but is added on top of Flink using Beam.&lt;/p&gt;

&lt;h3 id=&quot;assignwindows&quot;&gt;AssignWindows&lt;/h3&gt;

&lt;p&gt;In Flink, windows are assigned by the &lt;code&gt;WindowOperator&lt;/code&gt; when you call &lt;code&gt;window()&lt;/code&gt; in the API. In Beam, windows can be assigned at any point in time. Any element is implicitly part of a window. If no window is assigned explicitly, the element is part of the &lt;code&gt;GlobalWindow&lt;/code&gt;. Window information is stored for each element in a wrapper called &lt;code&gt;WindowedValue&lt;/code&gt;. The window information is only used once we issue a &lt;code&gt;GroupByKey&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;groupbykey&quot;&gt;GroupByKey&lt;/h3&gt;

&lt;p&gt;Most of the time it is useful to partition the data by a key. In Flink, this is done via the &lt;code&gt;keyBy()&lt;/code&gt; API call. In Beam, the &lt;code&gt;GroupByKey&lt;/code&gt; transform can only be applied if the input is of the form &lt;code&gt;KV&amp;lt;Key, Value&amp;gt;&lt;/code&gt;. Unlike Flink, where the key can even be nested inside the data, Beam requires the key to always be explicit. The &lt;code&gt;GroupByKey&lt;/code&gt; transform then groups the data by key and by window, which is similar to what &lt;code&gt;keyBy(..).window(..)&lt;/code&gt; would give us in Flink. Beam has its own set of libraries to do that because Beam has its own set of window functions and triggers. Essentially, GroupByKey is very similar to what the WindowOperator does in Flink.&lt;/p&gt;
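
&lt;p&gt;To make the interplay of window assignment and grouping concrete, here is a minimal Python sketch. It assumes a simplified &lt;code&gt;WindowedValue&lt;/code&gt; wrapper, fixed windows measured in seconds, and no triggers or late data; the helper names &lt;code&gt;assign_fixed_windows&lt;/code&gt; and &lt;code&gt;group_by_key&lt;/code&gt; are hypothetical and are not the Beam SDK functions.&lt;/p&gt;

```python
from collections import defaultdict
from dataclasses import dataclass

GLOBAL_WINDOW = "GlobalWindow"


@dataclass(frozen=True)
class WindowedValue:
    value: tuple                    # an explicit (key, value) pair
    timestamp: int                  # event time in seconds
    window: object = GLOBAL_WINDOW  # every element belongs to some window


def assign_fixed_windows(elements, size):
    """Re-wrap each element with the window [start, start + size) of its timestamp."""
    result = []
    for element in elements:
        start = element.timestamp - element.timestamp % size
        result.append(WindowedValue(element.value, element.timestamp, (start, start + size)))
    return result


def group_by_key(elements):
    """Group values by (key, window); this is where window information is used."""
    groups = defaultdict(list)
    for element in elements:
        key, value = element.value
        groups[(key, element.window)].append(value)
    return dict(groups)


events = [
    WindowedValue(("apple", 1), timestamp=3),
    WindowedValue(("apple", 2), timestamp=7),
    WindowedValue(("pear", 5), timestamp=12),
    WindowedValue(("apple", 4), timestamp=14),
]

ungrouped_windows = group_by_key(events)
grouped = group_by_key(assign_fixed_windows(events, size=10))
```

&lt;p&gt;Without an explicit assignment, all values for a key end up in the single global window. After assigning ten-second windows, the same key is split into one group per window.&lt;/p&gt;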

&lt;h3 id=&quot;flatten&quot;&gt;Flatten&lt;/h3&gt;

&lt;p&gt;The Flatten operator takes multiple DataSet/DataStreams, called P[arallel]Collections in Beam, and combines them into one collection. This is equivalent to Flink’s &lt;code&gt;union()&lt;/code&gt; operation.&lt;/p&gt;

&lt;h2 id=&quot;the-portable-flink-runner-in-beam&quot;&gt;The “Portable” Flink Runner in Beam&lt;/h2&gt;

&lt;p&gt;The portable Flink Runner in Beam is the evolution of the classic Runner. Classic Runners are tied to the JVM ecosystem, but the Beam community wanted to move past this and also execute Python, Go and other languages. This adds another dimension to Beam in terms of portability because, as previously mentioned, Beam already had portability across execution engines. It was necessary to change the translation logic of the Runner to be able to support language portability.&lt;/p&gt;

&lt;p&gt;There are two important building blocks for portable Runners:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;A common pipeline format across all the languages: The Runner API&lt;/li&gt;
  &lt;li&gt;A common interface during execution for the communication between the Runner and the code written in any language: The Fn API&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The Runner API provides a universal representation of the pipeline as Protobuf which contains the transforms, types, and user code. Protobuf was chosen as the format because every language has libraries available for it. Similarly, for the execution part, Beam introduced the Fn API interface to handle the communication between the Runner/execution engine and the user code that may be written in a different language and executes in a different process. Fn API is pronounced “fun API”; you may guess why.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-02-22-beam-on-flink/flink-runner-beam-language-portability.png&quot; width=&quot;600px&quot; alt=&quot;Language Portability in Apache Beam&quot; /&gt;
&lt;/center&gt;

&lt;h2 id=&quot;how-are-beam-programs-translated-in-language-portability&quot;&gt;How Are Beam Programs Translated In Language Portability?&lt;/h2&gt;

&lt;p&gt;Users write their Beam pipelines in one language, but they may get executed in an environment based on a completely different language. How does that work? To explain that, let’s follow the lifecycle of a pipeline. Let’s suppose we use the Python SDK to write the pipeline. Before submitting the pipeline via the Job API to Beam’s JobServer, Beam converts it to the Runner API, the language-agnostic format we described before. The JobServer is also a Beam component that handles the staging of the required dependencies during execution. The JobServer will then kick off the translation, which is similar to that of the classic Runner. However, an important change is the so-called &lt;code&gt;ExecutableStage&lt;/code&gt; transform. It is essentially the ParDo transform that we already know, but designed for holding language-dependent code. Beam tries to combine as many of these transforms as possible into one “executable stage”. The result again is a Flink program which is then sent to the Flink cluster and executed there. The major difference compared to the classic Runner is that during execution we will start &lt;em&gt;environments&lt;/em&gt; to execute the aforementioned &lt;em&gt;ExecutableStages&lt;/em&gt;. The following environments are available:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Docker-based (the default)&lt;/li&gt;
  &lt;li&gt;Process-based (a simple process is started)&lt;/li&gt;
  &lt;li&gt;Externally-provided (K8s or other schedulers)&lt;/li&gt;
  &lt;li&gt;Embedded (intended for testing and only works with Java)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Environments hold the &lt;em&gt;SDK Harness&lt;/em&gt; which is the code that handles the execution and the communication with the Runner over the Fn API. For example, when Flink executes Python code, it sends the data to the Python environment containing the Python SDK Harness. Sending data to an external process involves a minor overhead which we have measured to be 5-10% slower than the classic Java pipelines. However, Beam uses a fusion of transforms to execute as many transforms as possible in the same environment which share the same input or output. That’s why in real-world scenarios the overhead could be much lower.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-02-22-beam-on-flink/flink-runner-beam-language-portability-architecture.png&quot; width=&quot;600px&quot; alt=&quot;Language Portability Architecture in Beam&quot; /&gt;
&lt;/center&gt;
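
&lt;p&gt;The effect of fusion can be illustrated with a deliberately simplified Python sketch. It assumes a linear chain of transforms and merges neighbours that run in the same environment. Beam’s actual fusion works on the full pipeline graph and takes shared inputs and outputs into account, so the function &lt;code&gt;fuse&lt;/code&gt; and the transform names below are hypothetical.&lt;/p&gt;

```python
def fuse(chain):
    """Greedily merge neighbouring transforms that run in the same environment."""
    stages = []
    for name, environment in chain:
        if stages and stages[-1]["environment"] == environment:
            # Same environment as the previous transform: no process hop needed.
            stages[-1]["transforms"].append(name)
        else:
            stages.append({"environment": environment, "transforms": [name]})
    return stages


# Hypothetical linear pipeline: (transform name, SDK environment it runs in).
chain = [
    ("parse", "python"),
    ("filter", "python"),
    ("enrich", "java"),
    ("score", "python"),
]

stages = fuse(chain)
```

&lt;p&gt;Here four transforms collapse into three stages, so data crosses a process boundary three times instead of four. The longer the runs of same-language transforms, the less the cross-process overhead matters.&lt;/p&gt;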

&lt;p&gt;Environments can be present for many languages. This opens up an entirely new type of pipelines: cross-language pipelines. In cross-language pipelines we can combine transforms of two or more languages, e.g. a machine learning pipeline with the feature generation written in Java and the learning written in Python. All this can be run on top of Flink.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Using Apache Beam with Apache Flink combines (a.) the power of Flink with (b.) the flexibility of Beam. All it takes to run Beam is a Flink cluster, which you may already have. Apache Beam’s fully-fledged Python API is probably the most compelling argument for using Beam with Flink, but the unified API, which allows you to “write once” and “execute anywhere”, is also very appealing to Beam users. On top of this, features like side inputs and a rich connector ecosystem are also reasons why people like Beam.&lt;/p&gt;

&lt;p&gt;With the introduction of schemas, a new format for handling type information, Beam is heading in a similar direction as Flink with its type system which is essential for the Table API or SQL. Speaking of, the next Flink release will include a Python version of the Table API which is based on the language portability of Beam. Looking ahead, the Beam community plans to extend the support for interactive programs like notebooks. TFX, which is built with Beam, is a very powerful way to solve many problems around training and validating machine learning models.&lt;/p&gt;

&lt;p&gt;For many years, Beam and Flink have inspired and learned from each other. With the Python support in Flink being based on Beam, the two projects only seem to be coming closer to each other. That’s all the better for the community, and it gives users more options and functionality to choose from.&lt;/p&gt;
</description>
<pubDate>Sat, 22 Feb 2020 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html</link>
<guid isPermaLink="true">/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html</guid>
</item>

<item>
<title>No Java Required: Configuring Sources and Sinks in SQL</title>
<description>&lt;h1 id=&quot;introduction&quot;&gt;Introduction&lt;/h1&gt;

&lt;p&gt;The recent &lt;a href=&quot;https://flink.apache.org/news/2020/02/11/release-1.10.0.html&quot;&gt;Apache Flink 1.10 release&lt;/a&gt; includes many exciting features.
In particular, it marks the end of the community’s year-long effort to merge in the &lt;a href=&quot;https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;Blink SQL contribution&lt;/a&gt; from Alibaba.
The reason the community chose to spend so much time on the contribution is that SQL works.
It allows Flink to offer a truly unified interface over batch and streaming and makes stream processing accessible to a broad audience of developers and analysts.
Best of all, Flink SQL is ANSI-SQL compliant, which means if you’ve ever used a database in the past, you already know it&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;!&lt;/p&gt;

&lt;p&gt;A lot of work focused on improving runtime performance and progressively extending its coverage of the SQL standard.
Flink now supports the full TPC-DS query set for batch queries, reflecting the readiness of its SQL engine to address the needs of modern data warehouse-like workloads.
Its streaming SQL supports an almost equal set of features - those that are well defined on a streaming runtime - including &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/streaming/joins.html&quot;&gt;complex joins&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/match_recognize.html&quot;&gt;MATCH_RECOGNIZE&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As important as this work is, the community also strives to make these features generally accessible to the broadest audience possible.
That is why the Flink community is excited in 1.10 to offer production-ready DDL syntax (e.g., &lt;code&gt;CREATE TABLE&lt;/code&gt;, &lt;code&gt;DROP TABLE&lt;/code&gt;) and a refactored catalog interface.&lt;/p&gt;

&lt;h1 id=&quot;accessing-your-data-where-it-lives&quot;&gt;Accessing Your Data Where It Lives&lt;/h1&gt;

&lt;p&gt;Flink does not store data at rest; it is a compute engine and requires other systems to consume input from and write its output to.
Those that have used Flink’s &lt;code&gt;DataStream&lt;/code&gt; API in the past will be familiar with connectors that allow for interacting with external systems. 
Flink has a vast connector ecosystem that includes all major message queues, filesystems, and databases.&lt;/p&gt;

&lt;div class=&quot;alert alert-info&quot;&gt;
If your favorite system does not have a connector maintained in the central Apache Flink repository, check out the &lt;a href=&quot;https://flink-packages.org&quot;&gt;flink packages website&lt;/a&gt;, which has a growing number of community-maintained components.
&lt;/div&gt;

&lt;p&gt;While these connectors are battle-tested and production-ready, they are written in Java and configured in code, which means they are not amenable to pure SQL or Table applications.
For a holistic SQL experience, not only queries need to be written in SQL, but also table definitions.&lt;/p&gt;

&lt;h1 id=&quot;create-table-statements&quot;&gt;CREATE TABLE Statements&lt;/h1&gt;

&lt;p&gt;While Flink SQL has long provided table abstractions atop some of Flink’s most popular connectors, configurations were not always so straightforward.
Beginning in 1.10, Flink supports defining tables through &lt;code&gt;CREATE TABLE&lt;/code&gt; statements.
With this feature, users can now create logical tables, backed by various external systems, in pure SQL.&lt;/p&gt;

&lt;p&gt;By defining tables in SQL, developers can write queries against logical schemas that are abstracted away from the underlying physical data store. Coupled with Flink SQL’s unified approach to batch and stream processing, Flink provides a straight line from discovery to production.&lt;/p&gt;

&lt;p&gt;Users can define tables over static data sets, anything from a local CSV file to a full-fledged data lake or even Hive.
Leveraging Flink’s efficient batch processing capabilities, they can perform ad-hoc queries searching for exciting insights.
Once something interesting is identified, businesses can gain real-time and continuous insights by merely altering the table so that it is powered by a message queue such as Kafka.
Because Flink guarantees SQL queries have unified semantics over batch and streaming, users can be confident that redeploying this query as a continuous streaming application over a message queue will output identical results.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;c1&quot;&gt;-- Define a table called orders that is backed by a Kafka topic&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- The definition includes all relevant Kafka properties,&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- the underlying format (JSON) and even defines a&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- watermarking algorithm based on one of the fields&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- so that this table can be used with event time.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;user_id&lt;/span&gt;    &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;product&lt;/span&gt;    &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;order_time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TIMESTAMP&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;WATERMARK&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;order_time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;order_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;5&amp;#39;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SECOND&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
	&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.type&amp;#39;&lt;/span&gt;    	 &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;kafka&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.version&amp;#39;&lt;/span&gt; 	 &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;universal&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.topic&amp;#39;&lt;/span&gt;   	 &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;orders&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.startup-mode&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;earliest-offset&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.properties.bootstrap.servers&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;localhost:9092&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;s1&quot;&gt;&amp;#39;format.type&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;json&amp;#39;&lt;/span&gt; 
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- Define a table called product_analysis&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- on top of ElasticSearch 7 where we &lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- can write the results of our query. &lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;product_analysis&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;product&lt;/span&gt; 	&lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;tracking_time&lt;/span&gt; 	&lt;span class=&quot;k&quot;&gt;TIMESTAMP&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;units_sold&lt;/span&gt; 	&lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
	&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.type&amp;#39;&lt;/span&gt;    &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;elasticsearch&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.version&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;7&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.hosts&amp;#39;&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;localhost:9200&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.index&amp;#39;&lt;/span&gt;   &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;ProductAnalysis&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.document-type&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;analysis&amp;#39;&lt;/span&gt; 
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;-- A simple query that analyzes order data&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- from Kafka and writes results into &lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- ElasticSearch. &lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;product_analysis&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;product&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;TUMBLE_START&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;order_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tracking_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;units_sold&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;product&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;TUMBLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;order_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h1 id=&quot;catalogs&quot;&gt;Catalogs&lt;/h1&gt;

&lt;p&gt;While being able to create tables is important, it often isn’t enough.
A business analyst, for example, shouldn’t have to know what properties to set for Kafka, or even have to know what the underlying data source is, to be able to write a query.&lt;/p&gt;

&lt;p&gt;To solve this problem, Flink 1.10 also ships with a revamped catalog system for managing metadata about tables and user-defined functions.
With catalogs, users can create tables once and reuse them across Jobs and Sessions.
Now, the team managing a data set can create a table and immediately make it accessible to other groups within their organization.&lt;/p&gt;

&lt;p&gt;The most notable catalog that Flink integrates with today is Hive Metastore.
The Hive catalog allows Flink to fully interoperate with Hive and serve as a more efficient query engine.
Flink supports reading and writing Hive tables, using Hive UDFs, and even leveraging Hive’s metastore catalog to persist Flink-specific metadata.&lt;/p&gt;

&lt;h1 id=&quot;looking-ahead&quot;&gt;Looking Ahead&lt;/h1&gt;

&lt;p&gt;Flink SQL has made enormous strides to democratize stream processing, and 1.10 marks a significant milestone in that development.
However, we are not ones to rest on our laurels, and the community is committed to raising the bar on standards while lowering the barriers to entry.
The community is looking to add more catalogs, such as JDBC and Apache Pulsar.
We encourage you to sign up for the &lt;a href=&quot;https://flink.apache.org/community.html&quot;&gt;mailing list&lt;/a&gt; and stay on top of the announcements and new features in upcoming releases.&lt;/p&gt;

&lt;hr /&gt;

&lt;div class=&quot;footnotes&quot;&gt;
  &lt;ol&gt;
    &lt;li id=&quot;fn:1&quot;&gt;
      &lt;p&gt;My colleague Timo, who has worked on Flink SQL from the beginning, has the entire SQL standard printed on his desk and references it before any changes are merged. It’s enormous. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
    &lt;/li&gt;
  &lt;/ol&gt;
&lt;/div&gt;
</description>
<pubDate>Thu, 20 Feb 2020 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2020/02/20/ddl.html</link>
<guid isPermaLink="true">/news/2020/02/20/ddl.html</guid>
</item>

<item>
<title>Apache Flink 1.10.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is excited to hit the double digits and announce the release of Flink 1.10.0! As a result of the biggest community effort to date, with over 1.2k issues implemented and more than 200 contributors, this release introduces significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and great advances in Python support (PyFlink).&lt;/p&gt;

&lt;p&gt;Flink 1.10 also marks the completion of the &lt;a href=&quot;https://flink.apache.org/news/2019/08/22/release-1.9.0.html#preview-of-the-new-blink-sql-query-processor&quot;&gt;Blink integration&lt;/a&gt;, hardening streaming SQL and bringing mature batch processing to Flink with production-ready Hive integration and TPC-DS coverage. This blog post describes all major new features and improvements, important changes to be aware of and what to expect moving forward.&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#improved-memory-management-and-configuration&quot; id=&quot;markdown-toc-improved-memory-management-and-configuration&quot;&gt;Improved Memory Management and Configuration&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#unified-logic-for-job-submission&quot; id=&quot;markdown-toc-unified-logic-for-job-submission&quot;&gt;Unified Logic for Job Submission&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#native-kubernetes-integration-beta&quot; id=&quot;markdown-toc-native-kubernetes-integration-beta&quot;&gt;Native Kubernetes Integration (Beta)&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#table-apisql-production-ready-hive-integration&quot; id=&quot;markdown-toc-table-apisql-production-ready-hive-integration&quot;&gt;Table API/SQL: Production-ready Hive Integration&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#other-improvements-to-the-table-apisql&quot; id=&quot;markdown-toc-other-improvements-to-the-table-apisql&quot;&gt;Other Improvements to the Table API/SQL&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#pyflink-support-for-native-user-defined-functions-udfs&quot; id=&quot;markdown-toc-pyflink-support-for-native-user-defined-functions-udfs&quot;&gt;PyFlink: Support for Native User Defined Functions (UDFs)&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#important-changes&quot; id=&quot;markdown-toc-important-changes&quot;&gt;Important Changes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#release-notes&quot; id=&quot;markdown-toc-release-notes&quot;&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;p&gt;The binary distribution and source artifacts are now available on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt; of the Flink website. For more details, check the complete &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12345845&quot;&gt;release changelog&lt;/a&gt; and the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/&quot;&gt;updated documentation&lt;/a&gt;. We encourage you to download the release and share your feedback with the community through the &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink mailing lists&lt;/a&gt; or &lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;JIRA&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;

&lt;h3 id=&quot;improved-memory-management-and-configuration&quot;&gt;Improved Memory Management and Configuration&lt;/h3&gt;

&lt;p&gt;The current &lt;code&gt;TaskExecutor&lt;/code&gt; memory configuration in Flink has some shortcomings that make it hard to reason about or optimize resource utilization, such as:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Different configuration models for memory footprint in Streaming and Batch execution;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Complex and user-dependent configuration of off-heap state backends (i.e. RocksDB) in Streaming execution.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To make memory options more explicit and intuitive to users, Flink 1.10 introduces significant changes to the &lt;code&gt;TaskExecutor&lt;/code&gt; memory model and configuration logic (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors&quot;&gt;FLIP-49&lt;/a&gt;). These changes make Flink more adaptable to all kinds of deployment environments (e.g. Kubernetes, Yarn, Mesos), giving users strict control over its memory consumption.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Managed Memory Extension&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Managed memory was extended to also account for memory usage of &lt;code&gt;RocksDBStateBackend&lt;/code&gt;. While batch jobs can use either on-heap or off-heap memory, streaming jobs with &lt;code&gt;RocksDBStateBackend&lt;/code&gt; can use off-heap memory only. Therefore, to allow users to switch between Streaming and Batch execution without having to modify cluster configurations, managed memory is now always off-heap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Simplified RocksDB Configuration&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Configuring an off-heap state backend like RocksDB used to involve a good deal of manual tuning, like decreasing the JVM heap size or setting Flink to use off-heap memory. This can now be achieved through Flink’s out-of-box configuration, and adjusting the memory budget for &lt;code&gt;RocksDBStateBackend&lt;/code&gt; is as simple as resizing the managed memory size.&lt;/p&gt;

&lt;p&gt;Another important improvement was to allow Flink to bound RocksDB native memory usage (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7289&quot;&gt;FLINK-7289&lt;/a&gt;), preventing it from exceeding its total memory budget, which is especially relevant in containerized environments like Kubernetes. For details on how to enable and tune this feature, refer to &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/large_state_tuning.html#tuning-rocksdb&quot;&gt;Tuning RocksDB&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;label label-danger&quot;&gt;Note&lt;/span&gt; FLIP-49 changes the process of cluster resource configuration, which may require tuning your clusters for upgrades from previous Flink versions. For a comprehensive overview of the changes introduced and tuning guidance, consult &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_setup.html&quot;&gt;the memory setup guide&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;unified-logic-for-job-submission&quot;&gt;Unified Logic for Job Submission&lt;/h3&gt;

&lt;p&gt;Prior to this release, job submission was part of the duties of the Execution Environments and closely tied to the different deployment targets (e.g. Yarn, Kubernetes, Mesos). This led to a poor separation of concerns and, over time, to a growing number of customized environments that users needed to configure and manage separately.&lt;/p&gt;

&lt;p&gt;In Flink 1.10, job submission logic is abstracted into the generic &lt;code&gt;Executor&lt;/code&gt; interface (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-73%3A+Introducing+Executors+for+job+submission&quot;&gt;FLIP-73&lt;/a&gt;). The addition of the &lt;code&gt;ExecutorCLI&lt;/code&gt; (&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=133631524&quot;&gt;FLIP-81&lt;/a&gt;) introduces a unified way to specify configuration parameters for &lt;strong&gt;any&lt;/strong&gt; &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/cli.html#deployment-targets&quot;&gt;execution target&lt;/a&gt;. To round out this effort, the process of result retrieval was also decoupled from job submission with the introduction of a &lt;code&gt;JobClient&lt;/code&gt; (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API&quot;&gt;FLIP-74&lt;/a&gt;), responsible for fetching the &lt;code&gt;JobExecutionResult&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;span&gt;
	&lt;center&gt;
	&lt;img vspace=&quot;8&quot; style=&quot;width:100%&quot; src=&quot;/img/blog/2020-02-11-release-1.10.0/flink_1.10_zeppelin.png&quot; /&gt;
	&lt;/center&gt;
&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;In particular, these changes make it much easier to programmatically use Flink in downstream frameworks — for example, Apache Beam or Zeppelin interactive notebooks — by providing users with a unified entry point to Flink. For users working with Flink across multiple target environments, the transition to a configuration-based execution process also significantly reduces boilerplate code and maintainability overhead.&lt;/p&gt;

&lt;h3 id=&quot;native-kubernetes-integration-beta&quot;&gt;Native Kubernetes Integration (Beta)&lt;/h3&gt;

&lt;p&gt;For users looking to get started with Flink on a containerized environment, deploying and managing a standalone cluster on top of Kubernetes requires some upfront knowledge about containers, operators and environment-specific tools like &lt;code&gt;kubectl&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In Flink 1.10, we rolled out the first phase of &lt;strong&gt;Active Kubernetes Integration&lt;/strong&gt; (&lt;a href=&quot;https://jira.apache.org/jira/browse/FLINK-9953&quot;&gt;FLINK-9953&lt;/a&gt;) with support for session clusters (with per-job planned). In this context, “active” means that Flink’s ResourceManager (&lt;code&gt;K8sResMngr&lt;/code&gt;) natively communicates with Kubernetes to allocate new pods on-demand, similar to Flink’s Yarn and Mesos integration. Users can also leverage namespaces to launch Flink clusters for multi-tenant environments with limited aggregate resource consumption. RBAC roles and service accounts with enough permission should be configured beforehand.&lt;/p&gt;

&lt;p&gt;&lt;span&gt;
	&lt;center&gt;
	&lt;img vspace=&quot;8&quot; style=&quot;width:75%&quot; src=&quot;/img/blog/2020-02-11-release-1.10.0/flink_1.10_nativek8s.png&quot; /&gt;
	&lt;/center&gt;
&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;As introduced in &lt;a href=&quot;#unified-logic-for-job-submission&quot;&gt;Unified Logic For Job Submission&lt;/a&gt;, all command-line options in Flink 1.10 are mapped to a unified configuration. For this reason, users can simply refer to the Kubernetes config options and submit a job to an existing Flink session on Kubernetes in the CLI using:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;./bin/flink run -d -e kubernetes-session -Dkubernetes.cluster-id&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&amp;lt;ClusterId&amp;gt; examples/streaming/WindowJoin.jar&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If you want to try out this preview feature, we encourage you to walk through the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/native_kubernetes.html&quot;&gt;Native Kubernetes setup&lt;/a&gt;, play around with it and share feedback with the community.&lt;/p&gt;

&lt;h3 id=&quot;table-apisql-production-ready-hive-integration&quot;&gt;Table API/SQL: Production-ready Hive Integration&lt;/h3&gt;

&lt;p&gt;Hive integration was announced as a preview feature in Flink 1.9. This preview allowed users to persist Flink-specific metadata (e.g. Kafka tables) in Hive Metastore using SQL DDL, call UDFs defined in Hive and use Flink for reading and writing Hive tables. Flink 1.10 rounds out this effort with further developments that bring production-ready Hive integration to Flink, with full compatibility with &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/#supported-hive-versions&quot;&gt;most Hive versions&lt;/a&gt;.&lt;/p&gt;

&lt;h4 id=&quot;native-partition-support-for-batch-sql&quot;&gt;Native Partition Support for Batch SQL&lt;/h4&gt;

&lt;p&gt;So far, only writes to non-partitioned Hive tables were supported. In Flink 1.10, the Flink SQL syntax has been extended with &lt;code&gt;INSERT OVERWRITE&lt;/code&gt; and &lt;code&gt;PARTITION&lt;/code&gt; (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support&quot;&gt;FLIP-63&lt;/a&gt;), enabling users to write into both static and dynamic partitions in Hive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Static Partition Writing&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OVERWRITE&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tablename1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PARTITION&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partcol1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;partcol2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val2&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...)]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;select_statement1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_statement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Dynamic Partition Writing&lt;/strong&gt;&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OVERWRITE&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tablename1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;select_statement1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_statement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Fully supporting partitioned tables allows users to take advantage of partition pruning on read, which significantly increases the performance of these operations by reducing the amount of data that needs to be scanned.&lt;/p&gt;
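
&lt;p&gt;As a rough illustration of the two templates above, the following sketch writes into a hypothetical Hive table &lt;code&gt;page_views&lt;/code&gt; partitioned by &lt;code&gt;dt&lt;/code&gt; and &lt;code&gt;country&lt;/code&gt;. The table and column names (including the &lt;code&gt;staging_views&lt;/code&gt; source) are assumptions made for this example, and the statements are written without the &lt;code&gt;TABLE&lt;/code&gt; keyword, as in the examples of the Flink SQL &lt;code&gt;INSERT&lt;/code&gt; documentation.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Hypothetical tables: page_views (partitioned by dt, country) and staging_views.

-- Static partition: every partition value is fixed in the PARTITION clause.
INSERT OVERWRITE page_views PARTITION (dt='2020-02-11', country='US')
SELECT user_id, page_url FROM staging_views;

-- Dynamic partition: dt and country are taken from the trailing columns
-- of each selected row.
INSERT INTO page_views
SELECT user_id, page_url, dt, country FROM staging_views;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In the static case the partition columns are left out of the &lt;code&gt;SELECT&lt;/code&gt; list, whereas in the dynamic case they are expected as the last columns of the query result.&lt;/p&gt;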

&lt;h4 id=&quot;further-optimizations&quot;&gt;Further Optimizations&lt;/h4&gt;

&lt;p&gt;Besides partition pruning, Flink 1.10 introduces more &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/read_write_hive.html#optimizations&quot;&gt;read optimizations&lt;/a&gt; to Hive integration, such as:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Projection pushdown:&lt;/strong&gt; Flink leverages projection pushdown to minimize data transfer between Flink and Hive tables by omitting unnecessary fields from table scans. This is especially beneficial for tables with a large number of columns.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;LIMIT pushdown:&lt;/strong&gt; for queries with the &lt;code&gt;LIMIT&lt;/code&gt; clause, Flink will limit the number of output records wherever possible to minimize the amount of data transferred across the network.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;ORC Vectorization on Read:&lt;/strong&gt; to boost read performance for ORC files, Flink now uses the native ORC Vectorized Reader by default for Hive versions above 2.0.0 and columns with non-complex data types.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;pluggable-modules-as-flink-system-objects-beta&quot;&gt;Pluggable Modules as Flink System Objects (Beta)&lt;/h4&gt;

&lt;p&gt;Flink 1.10 introduces a generic mechanism for pluggable modules in the Flink table core, with a first focus on system functions (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Pluggable+Modules&quot;&gt;FLIP-68&lt;/a&gt;). With modules, users can extend Flink’s system objects — for example use Hive built-in functions that behave like Flink system functions. This release ships with a pre-implemented &lt;code&gt;HiveModule&lt;/code&gt;, supporting multiple Hive versions, but users are also given the possibility to &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/modules.html&quot;&gt;write their own pluggable modules&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;other-improvements-to-the-table-apisql&quot;&gt;Other Improvements to the Table API/SQL&lt;/h3&gt;

&lt;h4 id=&quot;watermarks-and-computed-columns-in-sql-ddl&quot;&gt;Watermarks and Computed Columns in SQL DDL&lt;/h4&gt;

&lt;p&gt;Flink 1.10 supports stream-specific syntax extensions to define time attributes and watermark generation in Flink SQL DDL (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-66%3A+Support+Time+Attribute+in+SQL+DDL&quot;&gt;FLIP-66&lt;/a&gt;). This allows time-based operations, like windowing, and the definition of &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/create.html#create-table&quot;&gt;watermark strategies&lt;/a&gt; on tables created using DDL statements.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;table_name&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;WATERMARK&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;columnName&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;watermark_strategy_expression&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;

&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This release also introduces support for virtual computed columns (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-70%3A+Flink+SQL+Computed+Column+Design&quot;&gt;FLIP-70&lt;/a&gt;) that can be derived based on other columns in the same table or deterministic expressions (i.e. literal values, UDFs and built-in functions). In Flink, computed columns are useful to define time attributes &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/create.html#create-table&quot;&gt;upon table creation&lt;/a&gt;.&lt;/p&gt;
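
&lt;p&gt;To make this more concrete, here is a minimal sketch of a table definition that combines computed columns with a watermark declaration. The table &lt;code&gt;orders&lt;/code&gt; and its columns are hypothetical, and the connector properties in the &lt;code&gt;WITH&lt;/code&gt; clause are left elided.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Hypothetical table; all column names are assumptions for illustration.
CREATE TABLE orders (
  order_id STRING,
  price DOUBLE,
  quantity INT,
  order_time TIMESTAMP(3),
  -- Computed column derived from other columns of the same table.
  cost AS price * quantity,
  -- Computed column that defines a processing-time attribute.
  proc_time AS PROCTIME(),
  -- Event-time attribute: watermarks trail order_time by 5 seconds.
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
  ...
)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With such a definition, &lt;code&gt;order_time&lt;/code&gt; can be used as the event-time attribute in time-based operations like windowing, tolerating events that arrive up to five seconds out of order.&lt;/p&gt;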

&lt;h4 id=&quot;additional-extensions-to-sql-ddl&quot;&gt;Additional Extensions to SQL DDL&lt;/h4&gt;

&lt;p&gt;There is now a clear distinction between temporary/persistent and system/catalog functions (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog&quot;&gt;FLIP-57&lt;/a&gt;). This not only eliminates ambiguity in function reference, but also allows for deterministic function resolution order (i.e. in case of naming collision, system functions will precede catalog functions, with temporary functions taking precedence over persistent functions for both dimensions).&lt;/p&gt;

&lt;p&gt;Following the groundwork in FLIP-57, we extended the SQL DDL syntax to support the creation of catalog functions, temporary functions and temporary system functions (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-79+Flink+Function+DDL+Support&quot;&gt;FLIP-79&lt;/a&gt;):&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;TEMPORARY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;TEMPORARY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SYSTEM&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt; 

  &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IF&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXISTS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;catalog_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.][&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;db_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.]&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;function_name&lt;/span&gt; 

&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;identifier&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;LANGUAGE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JAVA&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SCALA&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
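
&lt;p&gt;As an illustration of this syntax, the statements below register two functions. The function names and the class names &lt;code&gt;com.example.udf.ToUpper&lt;/code&gt; and &lt;code&gt;com.example.udf.ParseTimestamp&lt;/code&gt; are hypothetical; the sketch assumes the identifier is given as the fully qualified class name of a UDF available on the classpath.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Temporary catalog function (hypothetical name and class).
CREATE TEMPORARY FUNCTION IF NOT EXISTS to_upper
  AS 'com.example.udf.ToUpper' LANGUAGE JAVA;

-- Temporary system function: referenced by name only,
-- without a catalog or database prefix.
CREATE TEMPORARY SYSTEM FUNCTION IF NOT EXISTS parse_ts
  AS 'com.example.udf.ParseTimestamp' LANGUAGE SCALA;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;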

&lt;p&gt;For a complete overview of the current state of DDL support in Flink SQL, check the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/&quot;&gt;updated documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;label label-danger&quot;&gt;Note&lt;/span&gt; In order to correctly handle and guarantee a consistent behavior across meta-objects (tables, views, functions) in the future, some object declaration methods in the Table API have been deprecated in favor of methods that are closer to standard SQL DDL (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module&quot;&gt;FLIP-64&lt;/a&gt;).&lt;/p&gt;

&lt;h4 id=&quot;full-tpc-ds-coverage-for-batch&quot;&gt;Full TPC-DS Coverage for Batch&lt;/h4&gt;

&lt;p&gt;TPC-DS is a widely used industry-standard decision support benchmark to evaluate and measure the performance of SQL-based data processing engines. In Flink 1.10, all TPC-DS queries are supported end-to-end (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11491&quot;&gt;FLINK-11491&lt;/a&gt;), reflecting the readiness of its SQL engine to address the needs of modern data warehouse-like workloads.&lt;/p&gt;

&lt;h3 id=&quot;pyflink-support-for-native-user-defined-functions-udfs&quot;&gt;PyFlink: Support for Native User Defined Functions (UDFs)&lt;/h3&gt;

&lt;p&gt;A preview of PyFlink was introduced in the previous release, making headway towards the goal of full Python support in Flink. For this release, the focus was to enable users to register and use Python User-Defined Functions (UDF, with UDTF/UDAF planned) in the Table API/SQL (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table&quot;&gt;FLIP-58&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;span&gt;
	&lt;center&gt;
	&lt;img vspace=&quot;8&quot; hspace=&quot;100&quot; style=&quot;width:75%&quot; src=&quot;/img/blog/2020-02-11-release-1.10.0/flink_1.10_pyflink.gif&quot; /&gt;
	&lt;/center&gt;
&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;If you are interested in the underlying implementation, which leverages Apache Beam’s &lt;a href=&quot;https://beam.apache.org/roadmap/portability/&quot;&gt;Portability Framework&lt;/a&gt;, refer to the “Architecture” section of FLIP-58 and also to &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management&quot;&gt;FLIP-78&lt;/a&gt;. This groundwork lays the required foundation for Pandas support and for PyFlink to eventually reach the DataStream API.&lt;/p&gt;

&lt;p&gt;From Flink 1.10, users can also easily install PyFlink through &lt;code&gt;pip&lt;/code&gt; using:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;pip install apache-flink&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For a preview of other improvements planned for PyFlink, check &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14500&quot;&gt;FLINK-14500&lt;/a&gt; and get involved in the &lt;a href=&quot;http://apache-flink.147419.n8.nabble.com/Re-DISCUSS-What-parts-of-the-Python-API-should-we-focus-on-next-td1285.html&quot;&gt;discussion&lt;/a&gt; for requested user features.&lt;/p&gt;

&lt;h2 id=&quot;important-changes&quot;&gt;Important Changes&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10725&quot;&gt;FLINK-10725&lt;/a&gt;] Flink can now be compiled and run on Java 11.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;[&lt;a href=&quot;https://jira.apache.org/jira/browse/FLINK-15495&quot;&gt;FLINK-15495&lt;/a&gt;] The Blink planner is now the default in the SQL Client, so that users can benefit from all the latest features and improvements. The switch from the old planner in the Table API is also planned for the next release, so we recommend that users start getting familiar with the Blink planner.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13025&quot;&gt;FLINK-13025&lt;/a&gt;] There is a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/connectors/elasticsearch.html#elasticsearch-connector&quot;&gt;new Elasticsearch sink connector&lt;/a&gt;, fully supporting Elasticsearch 7.x versions.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15115&quot;&gt;FLINK-15115&lt;/a&gt;] The connectors for Kafka 0.8 and 0.9 have been marked as deprecated and will no longer be actively supported. If you are still using these versions or have any other related concerns, please reach out to the @dev mailing list.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14516&quot;&gt;FLINK-14516&lt;/a&gt;] The non-credit-based network flow control code was removed, along with the configuration option &lt;code&gt;taskmanager.network.credit.model&lt;/code&gt;. Moving forward, Flink will always use credit-based flow control.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12122&quot;&gt;FLINK-12122&lt;/a&gt;] &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot;&gt;FLIP-6&lt;/a&gt; was rolled out with Flink 1.5.0 and introduced a code regression related to the way slots are allocated from &lt;code&gt;TaskManagers&lt;/code&gt;. To use a scheduling strategy that is closer to the pre-FLIP-6 behavior, where Flink tries to spread out the workload across all currently available &lt;code&gt;TaskManagers&lt;/code&gt;, users can set &lt;code&gt;cluster.evenly-spread-out-slots: true&lt;/code&gt; in the &lt;code&gt;flink-conf.yaml&lt;/code&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11956&quot;&gt;FLINK-11956&lt;/a&gt;] &lt;code&gt;s3-hadoop&lt;/code&gt; and &lt;code&gt;s3-presto&lt;/code&gt; filesystems no longer use class relocations and should be loaded through &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/filesystems/#pluggable-file-systems&quot;&gt;plugins&lt;/a&gt;; in return, they now integrate seamlessly with all credential providers. We strongly recommend using other filesystems only as plugins as well, as we will continue to remove relocations.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Flink 1.9 shipped with a refactored Web UI, with the legacy one being kept around as backup in case something wasn’t working as expected. No issues have been reported so far, so &lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-old-WebUI-td35218.html&quot;&gt;the community voted&lt;/a&gt; to drop the legacy Web UI in Flink 1.10.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-notes&quot;&gt;Release Notes&lt;/h2&gt;

&lt;p&gt;Please review the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/release-notes/flink-1.10.html&quot;&gt;release notes&lt;/a&gt; carefully for a detailed list of changes and new features if you plan to upgrade your setup to Flink 1.10. This version is API-compatible with previous 1.x releases for APIs annotated with the @Public annotation.&lt;/p&gt;

&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;

&lt;p&gt;The Apache Flink community would like to thank all contributors that have made this release possible:&lt;/p&gt;

&lt;p&gt;Achyuth Samudrala, Aitozi, Alberto Romero, Alec.Ch, Aleksey Pak, Alexander Fedulov, Alice Yan, Aljoscha Krettek, Aloys, Andrey Zagrebin, Arvid Heise, Benchao Li, Benoit Hanotte, Benoît Paris, Bhagavan Das, Biao Liu, Chesnay Schepler, Congxian Qiu, Cyrille Chépélov, César Soto Valero, David Anderson, David Hrbacek, David Moravek, Dawid Wysakowicz, Dezhi Cai, Dian Fu, Dyana Rose, Eamon Taaffe, Fabian Hueske, Fawad Halim, Fokko Driesprong, Frey Gao, Gabor Gevay, Gao Yun, Gary Yao, GatsbyNewton, GitHub, Grebennikov Roman, GuoWei Ma, Gyula Fora, Haibo Sun, Hao Dang, Henvealf, Hongtao Zhang, HuangXingBo, Hwanju Kim, Igal Shilman, Jacob Sevart, Jark Wu, Jeff Martin, Jeff Yang, Jeff Zhang, Jiangjie (Becket) Qin, Jiayi, Jiayi Liao, Jincheng Sun, Jing Zhang, Jingsong Lee, JingsongLi, Joao Boto, John Lonergan, Kaibo Zhou, Konstantin Knauf, Kostas Kloudas, Kurt Young, Leonard Xu, Ling Wang, Lining Jing, Liupengcheng, LouisXu, Mads Chr. Olesen, Marco Zühlke, Marcos Klein, Matyas Orhidi, Maximilian Bode, Maximilian Michels, Nick Pavlakis, Nico Kruber, Nicolas Deslandes, Pablo Valtuille, Paul Lam, Paul Lin, PengFei Li, Piotr Nowojski, Piotr Przybylski, Piyush Narang, Ricco Chen, Richard Deurwaarder, Robert Metzger, Roman, Roman Grebennikov, Roman Khachatryan, Rong Rong, Rui Li, Ryan Tao, Scott Kidder, Seth Wiesman, Shannon Carey, Shaobin.Ou, Shuo Cheng, Stefan Richter, Stephan Ewen, Steve OU, Steven Wu, Terry Wang, Thesharing, Thomas Weise, Till Rohrmann, Timo Walther, Tony Wei, TsReaper, Tzu-Li (Gordon) Tai, Victor Wong, WangHengwei, Wei Zhong, WeiZhong94, Wind (Jiayi Liao), Xintong Song, XuQianJin-Stars, Xuefu Zhang, Xupingyong, Yadong Xie, Yang Wang, Yangze Guo, Yikun Jiang, Ying, YngwieWang, Yu Li, Yuan Mei, Yun Gao, Yun Tang, Zhanchun Zhang, Zhenghua Gao, Zhijiang, Zhu Zhu, a-suiniaev, azagrebin, beyond1920, biao.liub, blueszheng, bowen.li, caoyingjie, catkint, chendonglin, chenqi, chunpinghe, cyq89051127, danrtsey.wy, dengziming, dianfu, eskabetxe, fanrui, 
forideal, gentlewang, godfrey he, godfreyhe, haodang, hehuiyuan, hequn8128, hpeter, huangxingbo, huzheng, ifndef-SleePy, jiemotongxue, joe, jrthe42, kevin.cyj, klion26, lamber-ken, libenchao, liketic, lincoln-lil, lining, liuyongvs, liyafan82, lz, mans2singh, mojo, openinx, ouyangwulin, shining-huang, shuai-xu, shuo.cs, stayhsfLee, sunhaibotb, sunjincheng121, tianboxiu, tianchen, tianchen92, tison, tszkitlo40, unknown, vinoyang, vthinkxie, wangpeibin, wangxiaowei, wangxiyuan, wangxlong, wangyang0918, whlwanghailong, xuchao0903, xuyang1706, yanghua, yangjf2019, yongqiang chai, yuzhao.cyz, zentol, zhangzhanchum, zhengcanbin, zhijiang, zhongyong jin, zhuzhu.zz, zjuwangg, zoudaokoulife, 砚田, 谢磊, 张志豪, 曹建华&lt;/p&gt;
</description>
<pubDate>Tue, 11 Feb 2020 03:30:00 +0100</pubDate>
<link>https://flink.apache.org/news/2020/02/11/release-1.10.0.html</link>
<guid isPermaLink="true">/news/2020/02/11/release-1.10.0.html</guid>
</item>

<item>
<title>A Guide for Unit Testing in Apache Flink</title>
<description>&lt;p&gt;Writing unit tests is one of the essential tasks of designing a production-grade application. Without tests, a single change in code can result in cascades of failure in production. Thus unit tests should be written for all types of applications, be it a simple job cleaning data and training a model or a complex multi-tenant, real-time data processing system. In the following sections, we provide a guide for unit testing of Apache Flink applications. 
Apache Flink provides a robust unit testing framework to help you verify during development that your applications will behave as expected in production. To use the framework, include the following dependencies.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-test-utils_${scala.binary.version}&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;${flink.version}&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;scope&amp;gt;&lt;/span&gt;test&lt;span class=&quot;nt&quot;&gt;&amp;lt;/scope&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt; 
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-runtime_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.0&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;scope&amp;gt;&lt;/span&gt;test&lt;span class=&quot;nt&quot;&gt;&amp;lt;/scope&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;classifier&amp;gt;&lt;/span&gt;tests&lt;span class=&quot;nt&quot;&gt;&amp;lt;/classifier&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.0&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;scope&amp;gt;&lt;/span&gt;test&lt;span class=&quot;nt&quot;&gt;&amp;lt;/scope&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;classifier&amp;gt;&lt;/span&gt;tests&lt;span class=&quot;nt&quot;&gt;&amp;lt;/classifier&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The strategy of writing unit tests differs for various operators. You can break down the strategy into the following three buckets:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Stateless Operators&lt;/li&gt;
  &lt;li&gt;Stateful Operators&lt;/li&gt;
  &lt;li&gt;Timed Process Operators&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;stateless-operators&quot;&gt;Stateless Operators&lt;/h1&gt;

&lt;p&gt;Writing unit tests for a stateless operator is a breeze. You need to follow the basic norm of writing a test case, i.e., create an instance of the function class and test the appropriate methods. Let’s take an example of a simple &lt;code&gt;Map&lt;/code&gt; operator.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyStatelessMap&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;hello &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The test case for the above operator should look like this:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;nd&quot;&gt;@Test&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;testMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;MyStatelessMap&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;statelessMap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;MyStatelessMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;statelessMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;hello world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Pretty simple, right? Let’s take a look at one for the &lt;code&gt;FlatMap&lt;/code&gt; operator.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyStatelessFlatMap&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlatMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;hello &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;FlatMap&lt;/code&gt; operators require a &lt;code&gt;Collector&lt;/code&gt; object along with the input. For the test case, we have two options:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Mock the &lt;code&gt;Collector&lt;/code&gt; object using Mockito&lt;/li&gt;
  &lt;li&gt;Use the &lt;code&gt;ListCollector&lt;/code&gt; provided by Flink&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I prefer the second method as it requires fewer lines of code and is suitable for most cases.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;nd&quot;&gt;@Test&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;testFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;MyStatelessFlatMap&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;statelessFlatMap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;MyStatelessFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;ListCollector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;listCollector&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ListCollector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;statelessFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;listCollector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Lists&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;hello world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
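
&lt;p&gt;For comparison, the first option does not strictly require a mocking library: a hand-written stub that records what it receives can play the same role. The following is a minimal sketch in plain Java, outside Flink. The &lt;code&gt;RecordCollector&lt;/code&gt; interface and the &lt;code&gt;emitted&lt;/code&gt; helper are hypothetical names introduced only for illustration; in a real test the stub would implement Flink’s &lt;code&gt;Collector&lt;/code&gt; interface instead.&lt;/p&gt;

```java
import java.util.StringJoiner;

final class HandWrittenCollectorSketch {
    /** Hypothetical stand-in for Flink's Collector, reduced to String records. */
    interface RecordCollector {
        void collect(String record);
    }

    /** Same logic as MyStatelessFlatMap, written against the stand-in interface. */
    static void flatMap(String in, RecordCollector collector) {
        collector.collect("hello " + in);
    }

    /** Runs flatMap over the inputs and returns the emitted records, comma-separated. */
    static String emitted(String... inputs) {
        StringJoiner recorded = new StringJoiner(",");
        RecordCollector recorder = recorded::add; // stub: remember every record it is given
        for (String in : inputs) {
            flatMap(in, recorder);
        }
        return recorded.toString();
    }

    public static void main(String[] args) {
        System.out.println(emitted("world", "flink")); // prints "hello world,hello flink"
    }
}
```

&lt;p&gt;The stub does the same job as the &lt;code&gt;ListCollector&lt;/code&gt; in the test above: it captures emitted records so that the test can assert on them afterwards.&lt;/p&gt;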

&lt;h1 id=&quot;stateful-operators&quot;&gt;Stateful Operators&lt;/h1&gt;

&lt;p&gt;Writing test cases for stateful operators requires more effort. In addition to checking the output of the operator, you need to check whether the operator state is updated correctly and cleaned up properly.&lt;/p&gt;

&lt;p&gt;Let’s take an example of a stateful &lt;code&gt;FlatMap&lt;/code&gt; function:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StatefulFlatMap&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RichFlatMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;ValueState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Configuration&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getRuntimeContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;previousInput&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;hello &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot; &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The intricate part of writing tests for the above class is mocking the configuration as well as the runtime context of the application. Flink provides test harness classes so that users don’t have to create the mock objects themselves. Using the &lt;code&gt;KeyedOneInputStreamOperatorTestHarness&lt;/code&gt;, the test looks like this:&lt;/p&gt;

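&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.api.common.state.ValueState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.api.common.state.ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.api.common.typeinfo.Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.streaming.api.operators.StreamFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.streaming.runtime.streamrecord.StreamRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;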

&lt;span class=&quot;nd&quot;&gt;@Test&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;testFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;StatefulFlatMap&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;statefulFlatMap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StatefulFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;// OneInputStreamOperatorTestHarness takes the input and output types as type parameters&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;OneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 
    &lt;span class=&quot;c1&quot;&gt;// KeyedOneInputStreamOperatorTestHarness takes three arguments:&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;//   Flink operator object, key selector and key type&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedOneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;statefulFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;1&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;// test first record&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;ValueState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 
    &lt;span class=&quot;n&quot;&gt;statefulFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRuntimeContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;previousInput&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stateValue&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Lists&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;hello world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)),&lt;/span&gt; 
    &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;extractOutputStreamRecords&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stateValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;// test second record&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;parallel&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Lists&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;hello world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; 
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;hello parallel world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)),&lt;/span&gt; 
    &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;extractOutputStreamRecords&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;parallel&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The test harness provides many helper methods, three of which are used here:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;code&gt;open&lt;/code&gt;: calls the &lt;code&gt;open&lt;/code&gt; method of the &lt;code&gt;FlatMap&lt;/code&gt; function with the relevant parameters. It also initializes the context.&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;processElement&lt;/code&gt;: allows users to pass an input element as well as the timestamp associated with the element.&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;extractOutputStreamRecords&lt;/code&gt;: gets the output records along with their timestamps from the &lt;code&gt;Collector&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The test harness greatly simplifies unit testing of stateful functions.&lt;/p&gt;

&lt;p&gt;You might also need to check whether the state value is being set correctly. You can get the state value directly from the operator using a mechanism similar to the one used while creating the state. This is also demonstrated in the previous example.&lt;/p&gt;
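
&lt;p&gt;To make the expected outputs and state values in that test easier to follow, here is a minimal sketch of the same logic in plain Java, outside Flink. It assumes a single key, so a plain field stands in for the keyed &lt;code&gt;ValueState&lt;/code&gt;. The class name &lt;code&gt;PreviousInputSketch&lt;/code&gt; and its &lt;code&gt;state&lt;/code&gt; accessor are hypothetical and exist only for illustration.&lt;/p&gt;

```java
final class PreviousInputSketch {
    // Plain field standing in for ValueState under a single key; null means "no state yet".
    private String previousInput;

    /** Mirrors StatefulFlatMap.flatMap, but returns the record instead of collecting it. */
    String flatMap(String in) {
        String out = "hello " + in;
        if (previousInput != null) {
            out = out + " " + previousInput; // append the previously seen input
        }
        previousInput = in; // remember the current input for the next call
        return out;
    }

    /** Exposes the stored value so that a test can assert on it. */
    String state() {
        return previousInput;
    }

    public static void main(String[] args) {
        PreviousInputSketch operator = new PreviousInputSketch();
        System.out.println(operator.flatMap("world"));    // prints "hello world"
        System.out.println(operator.flatMap("parallel")); // prints "hello parallel world"
    }
}
```

&lt;p&gt;The second record produces &lt;code&gt;hello parallel world&lt;/code&gt; because the state still holds the first input when it arrives, and the state holds &lt;code&gt;parallel&lt;/code&gt; afterwards.&lt;/p&gt;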

&lt;h1 id=&quot;timed-process-operators&quot;&gt;Timed Process Operators&lt;/h1&gt;

&lt;p&gt;Writing tests for process functions that work with time is quite similar to writing tests for stateful functions, because you can also use a test harness.
However, you need to take care of another aspect: providing timestamps for events and controlling the current time of the application. By setting the current (processing or event) time, you can trigger registered timers, which will call the &lt;code&gt;onTimer&lt;/code&gt; method of the function.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyProcessFunction&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;timerService&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerProcessingTimeTimer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;hello &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;onTimer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OnTimerContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Timer triggered at timestamp %d&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We need to test both methods of the &lt;code&gt;KeyedProcessFunction&lt;/code&gt;, i.e., &lt;code&gt;processElement&lt;/code&gt; as well as &lt;code&gt;onTimer&lt;/code&gt;. Using a test harness, we can control the current time of the function. Thus, we can trigger the timer at will rather than waiting for a specific time.&lt;/p&gt;
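
&lt;p&gt;The idea of a manually controlled clock can be sketched in plain Java, outside Flink. This is only an illustration, not how the test harness is implemented: the class &lt;code&gt;ManualClockSketch&lt;/code&gt; and its methods are hypothetical, and the sketch assumes that a timer fires as soon as the clock reaches or passes its timestamp.&lt;/p&gt;

```java
import java.util.stream.LongStream;

final class ManualClockSketch {
    private long currentTime = 0L;               // the clock starts at 0
    private long[] pendingTimers = new long[0];  // timestamps of timers that have not fired yet

    /** Registers a timer; registering the same timestamp twice keeps a single timer. */
    void registerTimer(long timestamp) {
        pendingTimers = LongStream.concat(LongStream.of(pendingTimers), LongStream.of(timestamp))
            .distinct()
            .sorted()
            .toArray();
    }

    /** Sets the clock and returns one message per timer that became due, oldest first. */
    String[] advanceTo(long newTime) {
        currentTime = newTime;
        String[] fired = LongStream.of(pendingTimers)
            .filter(t -> newTime >= t)
            .mapToObj(t -> String.format("Timer triggered at timestamp %d", t))
            .toArray(String[]::new);
        // Keep only the timers that are still in the future.
        pendingTimers = LongStream.of(pendingTimers).filter(t -> t > newTime).toArray();
        return fired;
    }

    long currentTime() {
        return currentTime;
    }

    public static void main(String[] args) {
        ManualClockSketch clock = new ManualClockSketch();
        clock.registerTimer(50);
        System.out.println(clock.advanceTo(10).length); // prints 0: the timer is not due yet
        System.out.println(clock.advanceTo(50)[0]);     // prints "Timer triggered at timestamp 50"
    }
}
```

&lt;p&gt;Because the test sets the time itself, it can step to just before and exactly onto the timer’s timestamp and assert on both situations without any waiting.&lt;/p&gt;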

&lt;p&gt;Let’s take a look at the test case for &lt;code&gt;MyProcessFunction&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;nd&quot;&gt;@Test&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;testProcessElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;MyProcessFunction&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;myProcessFunction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;MyProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;OneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedOneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedProcessOperator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;1&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;// Function time is initialized to 0&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Lists&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;hello world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)),&lt;/span&gt; 
    &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;extractOutputStreamRecords&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;nd&quot;&gt;@Test&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;testOnTimer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;MyProcessFunction&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;myProcessFunction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;MyProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;OneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedOneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedProcessOperator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;1&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;numProcessingTimeTimers&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
      
  &lt;span class=&quot;c1&quot;&gt;// Function time is set to 50&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProcessingTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Lists&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;hello world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; 
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Timer triggered at timestamp 50&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)),&lt;/span&gt; 
    &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;extractOutputStreamRecords&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The approach described in this article also applies to multi-input stream operators such as CoProcess functions. For these operators, use the TwoInput variant of the harness, such as &lt;code&gt;TwoInputStreamOperatorTestHarness&lt;/code&gt;.&lt;/p&gt;

&lt;h1 id=&quot;summary&quot;&gt;Summary&lt;/h1&gt;

&lt;p&gt;In the previous sections, we showed how unit testing in Apache Flink works for stateless, stateful, and time-aware operators. We hope you found the steps easy to follow and apply while developing your Flink applications. If you have any questions or feedback, you can reach out to me &lt;a href=&quot;https://www.kharekartik.dev/about/&quot;&gt;here&lt;/a&gt; or contact the community on the &lt;a href=&quot;https://flink.apache.org/community.html&quot;&gt;Apache Flink user mailing list&lt;/a&gt;.&lt;/p&gt;
</description>
<pubDate>Fri, 07 Feb 2020 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2020/02/07/a-guide-for-unit-testing-in-apache-flink.html</link>
<guid isPermaLink="true">/news/2020/02/07/a-guide-for-unit-testing-in-apache-flink.html</guid>
</item>

<item>
<title>Apache Flink 1.9.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released the second bugfix version of the Apache Flink 1.9 series.&lt;/p&gt;

&lt;p&gt;This release includes 117 fixes and minor improvements for Flink 1.9.1. The list below details all of them.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.9.2.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;Sub-task&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12122&quot;&gt;FLINK-12122&lt;/a&gt;] -         Spread out tasks evenly across all available registered TaskManagers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13360&quot;&gt;FLINK-13360&lt;/a&gt;] -         Add documentation for HBase connector for Table API &amp;amp; SQL
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13361&quot;&gt;FLINK-13361&lt;/a&gt;] -         Add documentation for JDBC connector for Table API &amp;amp; SQL
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13723&quot;&gt;FLINK-13723&lt;/a&gt;] -         Use liquid-c for faster doc generation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13724&quot;&gt;FLINK-13724&lt;/a&gt;] -         Remove unnecessary whitespace from the docs&amp;#39; sidenav
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13725&quot;&gt;FLINK-13725&lt;/a&gt;] -         Use sassc for faster doc generation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13726&quot;&gt;FLINK-13726&lt;/a&gt;] -         Build docs with jekyll 4.0.0.pre.beta1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13791&quot;&gt;FLINK-13791&lt;/a&gt;] -         Speed up sidenav by using group_by
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13817&quot;&gt;FLINK-13817&lt;/a&gt;] -         Expose whether web submissions are enabled
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13818&quot;&gt;FLINK-13818&lt;/a&gt;] -         Check whether web submission are enabled
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14535&quot;&gt;FLINK-14535&lt;/a&gt;] -         Cast exception is thrown when count distinct on decimal fields
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14735&quot;&gt;FLINK-14735&lt;/a&gt;] -         Improve batch schedule check input consumable performance
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Bug&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10377&quot;&gt;FLINK-10377&lt;/a&gt;] -         Remove precondition in TwoPhaseCommitSinkFunction.notifyCheckpointComplete
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10435&quot;&gt;FLINK-10435&lt;/a&gt;] -         Client sporadically hangs after Ctrl + C
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11120&quot;&gt;FLINK-11120&lt;/a&gt;] -         TIMESTAMPADD function handles TIME incorrectly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11835&quot;&gt;FLINK-11835&lt;/a&gt;] -         ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange failed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12342&quot;&gt;FLINK-12342&lt;/a&gt;] -         Yarn Resource Manager Acquires Too Many Containers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12399&quot;&gt;FLINK-12399&lt;/a&gt;] -         FilterableTableSource does not use filters on job run
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13184&quot;&gt;FLINK-13184&lt;/a&gt;] -         Starting a TaskExecutor blocks the YarnResourceManager&amp;#39;s main thread
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13589&quot;&gt;FLINK-13589&lt;/a&gt;] -         DelimitedInputFormat index error on multi-byte delimiters with whole file input splits
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13702&quot;&gt;FLINK-13702&lt;/a&gt;] -         BaseMapSerializerTest.testDuplicate fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13708&quot;&gt;FLINK-13708&lt;/a&gt;] -         Transformations should be cleared because a table environment could execute multiple job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13740&quot;&gt;FLINK-13740&lt;/a&gt;] -         TableAggregateITCase.testNonkeyedFlatAggregate failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13749&quot;&gt;FLINK-13749&lt;/a&gt;] -         Make Flink client respect classloading policy
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13758&quot;&gt;FLINK-13758&lt;/a&gt;] -         Failed to submit JobGraph when registered hdfs file in DistributedCache 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13799&quot;&gt;FLINK-13799&lt;/a&gt;] -         Web Job Submit Page displays stream of error message when web submit is disables in the config
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13827&quot;&gt;FLINK-13827&lt;/a&gt;] -         Shell variable should be escaped in start-scala-shell.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13862&quot;&gt;FLINK-13862&lt;/a&gt;] -         Update Execution Plan docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13945&quot;&gt;FLINK-13945&lt;/a&gt;] -         Instructions for building flink-shaded against vendor repository don&amp;#39;t work
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13969&quot;&gt;FLINK-13969&lt;/a&gt;] -         Resuming Externalized Checkpoint (rocks, incremental, scale down) end-to-end test fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13995&quot;&gt;FLINK-13995&lt;/a&gt;] -         Fix shading of the licence information of netty
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13999&quot;&gt;FLINK-13999&lt;/a&gt;] -         Correct the documentation of MATCH_RECOGNIZE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14066&quot;&gt;FLINK-14066&lt;/a&gt;] -         Pyflink building failure in master and 1.9.0 version
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14074&quot;&gt;FLINK-14074&lt;/a&gt;] -         MesosResourceManager can&amp;#39;t create new taskmanagers in Session Cluster Mode.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14175&quot;&gt;FLINK-14175&lt;/a&gt;] -         Upgrade KPL version in flink-connector-kinesis to fix application OOM
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14200&quot;&gt;FLINK-14200&lt;/a&gt;] -         Temporal Table Function Joins do not work on Tables (only TableSources) on the query side
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14235&quot;&gt;FLINK-14235&lt;/a&gt;] -         Kafka010ProducerITCase&amp;gt;KafkaProducerTestBase.testOneToOneAtLeastOnceCustomOperator fails on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14315&quot;&gt;FLINK-14315&lt;/a&gt;] -         NPE with JobMaster.disconnectTaskManager
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14337&quot;&gt;FLINK-14337&lt;/a&gt;] -         HistoryServer does not handle NPE on corruped archives properly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14347&quot;&gt;FLINK-14347&lt;/a&gt;] -         YARNSessionFIFOITCase.checkForProhibitedLogContents found a log with prohibited string
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14355&quot;&gt;FLINK-14355&lt;/a&gt;] -         Example code in state processor API docs doesn&amp;#39;t compile
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14370&quot;&gt;FLINK-14370&lt;/a&gt;] -         KafkaProducerAtLeastOnceITCase&amp;gt;KafkaProducerTestBase.testOneToOneAtLeastOnceRegularSink fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14382&quot;&gt;FLINK-14382&lt;/a&gt;] -         Incorrect handling of FLINK_PLUGINS_DIR on Yarn
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14398&quot;&gt;FLINK-14398&lt;/a&gt;] -         Further split input unboxing code into separate methods
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14413&quot;&gt;FLINK-14413&lt;/a&gt;] -         Shade-plugin ApacheNoticeResourceTransformer uses platform-dependent encoding
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14434&quot;&gt;FLINK-14434&lt;/a&gt;] -         Dispatcher#createJobManagerRunner should not start JobManagerRunner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14445&quot;&gt;FLINK-14445&lt;/a&gt;] -         Python module build failed when making sdist
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14447&quot;&gt;FLINK-14447&lt;/a&gt;] -         Network metrics doc table render confusion
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14459&quot;&gt;FLINK-14459&lt;/a&gt;] -         Python module build hangs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14524&quot;&gt;FLINK-14524&lt;/a&gt;] -         PostgreSQL JDBC sink generates invalid SQL in upsert mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14547&quot;&gt;FLINK-14547&lt;/a&gt;] -         UDF cannot be in the join condition in blink planner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14561&quot;&gt;FLINK-14561&lt;/a&gt;] -         Don&amp;#39;t write FLINK_PLUGINS_DIR ENV variable to Flink configuration
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14562&quot;&gt;FLINK-14562&lt;/a&gt;] -         RMQSource leaves idle consumer after closing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14574&quot;&gt;FLINK-14574&lt;/a&gt;] -          flink-s3-fs-hadoop doesn&amp;#39;t work with plugins mechanism
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14589&quot;&gt;FLINK-14589&lt;/a&gt;] -         Redundant slot requests with the same AllocationID leads to inconsistent slot table
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14641&quot;&gt;FLINK-14641&lt;/a&gt;] -         Fix description of metric `fullRestarts`
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14673&quot;&gt;FLINK-14673&lt;/a&gt;] -         Shouldn&amp;#39;t expect HMS client to throw NoSuchObjectException for non-existing function
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14683&quot;&gt;FLINK-14683&lt;/a&gt;] -         RemoteStreamEnvironment&amp;#39;s construction function has a wrong method
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14701&quot;&gt;FLINK-14701&lt;/a&gt;] -         Slot leaks if SharedSlotOversubscribedException happens
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14784&quot;&gt;FLINK-14784&lt;/a&gt;] -         CsvTableSink miss delimiter when row start with null member
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14817&quot;&gt;FLINK-14817&lt;/a&gt;] -         &amp;quot;Streaming Aggregation&amp;quot; document contains misleading code examples
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14846&quot;&gt;FLINK-14846&lt;/a&gt;] -         Correct the default writerbuffer size documentation of RocksDB
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14910&quot;&gt;FLINK-14910&lt;/a&gt;] -         DisableAutoGeneratedUIDs fails on keyBy
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14930&quot;&gt;FLINK-14930&lt;/a&gt;] -         OSS Filesystem Uses Wrong Shading Prefix
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14949&quot;&gt;FLINK-14949&lt;/a&gt;] -         Task cancellation can be stuck against out-of-thread error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14951&quot;&gt;FLINK-14951&lt;/a&gt;] -         State TTL backend end-to-end test fail when taskManager has multiple slot
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14953&quot;&gt;FLINK-14953&lt;/a&gt;] -         Parquet table source should use schema type to build FilterPredicate
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14960&quot;&gt;FLINK-14960&lt;/a&gt;] -         Dependency shading of table modules test fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14976&quot;&gt;FLINK-14976&lt;/a&gt;] -         Cassandra Connector leaks Semaphore on Throwable; hangs on close
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15001&quot;&gt;FLINK-15001&lt;/a&gt;] -         The digest of sub-plan reuse should contain retraction traits for stream physical nodes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15013&quot;&gt;FLINK-15013&lt;/a&gt;] -         Flink (on YARN) sometimes needs too many slots
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15030&quot;&gt;FLINK-15030&lt;/a&gt;] -         Potential deadlock for bounded blocking ResultPartition.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15036&quot;&gt;FLINK-15036&lt;/a&gt;] -         Container startup error will be handled out side of the YarnResourceManager&amp;#39;s main thread
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15063&quot;&gt;FLINK-15063&lt;/a&gt;] -         Input group and output group of the task metric are reversed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15065&quot;&gt;FLINK-15065&lt;/a&gt;] -         RocksDB configurable options doc description error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15076&quot;&gt;FLINK-15076&lt;/a&gt;] -         Source thread should be interrupted during the Task cancellation 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15234&quot;&gt;FLINK-15234&lt;/a&gt;] -         Hive table created from flink catalog table shouldn&amp;#39;t have null properties in parameters
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15240&quot;&gt;FLINK-15240&lt;/a&gt;] -         is_generic key is missing for Flink table stored in HiveCatalog
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15259&quot;&gt;FLINK-15259&lt;/a&gt;] -         HiveInspector.toInspectors() should convert Flink constant to Hive constant 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15266&quot;&gt;FLINK-15266&lt;/a&gt;] -         NPE in blink planner code gen
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15361&quot;&gt;FLINK-15361&lt;/a&gt;] -         ParquetTableSource should pass predicate in projectFields
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15412&quot;&gt;FLINK-15412&lt;/a&gt;] -         LocalExecutorITCase#testParameterizedTypes failed in travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15413&quot;&gt;FLINK-15413&lt;/a&gt;] -         ScalarOperatorsTest failed in travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15418&quot;&gt;FLINK-15418&lt;/a&gt;] -         StreamExecMatchRule not set FlinkRelDistribution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15421&quot;&gt;FLINK-15421&lt;/a&gt;] -         GroupAggsHandler throws java.time.LocalDateTime cannot be cast to java.sql.Timestamp
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15435&quot;&gt;FLINK-15435&lt;/a&gt;] -         ExecutionConfigTests.test_equals_and_hash in pyFlink fails when cpu core numbers is 6
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15443&quot;&gt;FLINK-15443&lt;/a&gt;] -         Use JDBC connector write FLOAT value occur ClassCastException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15478&quot;&gt;FLINK-15478&lt;/a&gt;] -         FROM_BASE64 code gen type wrong
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15489&quot;&gt;FLINK-15489&lt;/a&gt;] -         WebUI log refresh not working
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15522&quot;&gt;FLINK-15522&lt;/a&gt;] -         Misleading root cause exception when cancelling the job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15523&quot;&gt;FLINK-15523&lt;/a&gt;] -         ConfigConstants generally excluded from japicmp
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15543&quot;&gt;FLINK-15543&lt;/a&gt;] -         Apache Camel not bundled but listed in flink-dist NOTICE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15549&quot;&gt;FLINK-15549&lt;/a&gt;] -         Integer overflow in SpillingResettableMutableObjectIterator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15577&quot;&gt;FLINK-15577&lt;/a&gt;] -         WindowAggregate RelNodes missing Window specs in digest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15615&quot;&gt;FLINK-15615&lt;/a&gt;] -         Docs: wrong guarantees stated for the file sink
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Improvement&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11135&quot;&gt;FLINK-11135&lt;/a&gt;] -         Reorder Hadoop config loading in HadoopUtils
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12848&quot;&gt;FLINK-12848&lt;/a&gt;] -         Method equals() in RowTypeInfo should consider fieldsNames
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13729&quot;&gt;FLINK-13729&lt;/a&gt;] -         Update website generation dependencies
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14008&quot;&gt;FLINK-14008&lt;/a&gt;] -         Auto-generate binary licensing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14104&quot;&gt;FLINK-14104&lt;/a&gt;] -         Bump Jackson to 2.10.1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14123&quot;&gt;FLINK-14123&lt;/a&gt;] -         Lower the default value of taskmanager.memory.fraction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14206&quot;&gt;FLINK-14206&lt;/a&gt;] -         Let fullRestart metric count fine grained restarts as well
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14215&quot;&gt;FLINK-14215&lt;/a&gt;] -         Add Docs for TM and JM Environment Variable Setting
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14251&quot;&gt;FLINK-14251&lt;/a&gt;] -         Add FutureUtils#forward utility
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14334&quot;&gt;FLINK-14334&lt;/a&gt;] -         ElasticSearch docs refer to non-existent ExceptionUtils.containsThrowable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14335&quot;&gt;FLINK-14335&lt;/a&gt;] -         ExampleIntegrationTest in testing docs is incorrect
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14408&quot;&gt;FLINK-14408&lt;/a&gt;] -         In OldPlanner, UDF open method can not be invoke when SQL is optimized
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14557&quot;&gt;FLINK-14557&lt;/a&gt;] -         Clean up the package of py4j
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14639&quot;&gt;FLINK-14639&lt;/a&gt;] -         Metrics User Scope docs refer to wrong class
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14646&quot;&gt;FLINK-14646&lt;/a&gt;] -         Check non-null for key in KeyGroupStreamPartitioner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14825&quot;&gt;FLINK-14825&lt;/a&gt;] -         Rework state processor api documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14995&quot;&gt;FLINK-14995&lt;/a&gt;] -         Kinesis NOTICE is incorrect
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15113&quot;&gt;FLINK-15113&lt;/a&gt;] -         fs.azure.account.key not hidden from global configuration
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15554&quot;&gt;FLINK-15554&lt;/a&gt;] -         Bump jetty-util-ajax to 9.3.24
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15657&quot;&gt;FLINK-15657&lt;/a&gt;] -         Fix the python table api doc link in Python API tutorial
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15700&quot;&gt;FLINK-15700&lt;/a&gt;] -         Improve Python API Tutorial doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15726&quot;&gt;FLINK-15726&lt;/a&gt;] -         Fixing error message in StreamExecTableSourceScan
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 30 Jan 2020 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2020/01/30/release-1.9.2.html</link>
<guid isPermaLink="true">/news/2020/01/30/release-1.9.2.html</guid>
</item>

<item>
<title>State Unlocked: Interacting with State in Apache Flink</title>
<description>&lt;h1 id=&quot;introduction&quot;&gt;Introduction&lt;/h1&gt;

&lt;p&gt;With stateful stream-processing becoming the norm for complex event-driven applications and real-time analytics, &lt;a href=&quot;https://flink.apache.org/&quot;&gt;Apache Flink&lt;/a&gt; is often the backbone for running business logic and managing an organization’s most valuable asset — its data — as application state in Flink.&lt;/p&gt;

&lt;p&gt;In order to provide a state-of-the-art experience to Flink developers, the Apache Flink community makes significant efforts to provide the safety and future-proofing guarantees organizations need while managing state in Flink. In particular, Flink developers should have sufficient means to access and modify their state, and bootstrapping state with existing data from external systems should be a piece of cake. These efforts span multiple major Flink releases and consist of the following:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Evolvable state schema in Apache Flink&lt;/li&gt;
  &lt;li&gt;Flexibility in swapping state backends, and&lt;/li&gt;
  &lt;li&gt;The State Processor API, an offline tool to read, write and modify state in Flink&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This post discusses the community’s efforts related to state management in Flink, provides some practical examples of how the different features and APIs can be utilized and covers some future ideas for new and improved ways of managing state in Apache Flink.&lt;/p&gt;

&lt;h1 id=&quot;stream-processing-what-is-state&quot;&gt;Stream processing: What is State?&lt;/h1&gt;

&lt;p&gt;To set the tone for the remainder of the post, let us first define state in stream processing. In stateful stream processing, state comprises the information that an application or stream processing engine remembers across events and streams as more real-time (unbounded) and/or offline (bounded) data flows through the system. Even the most trivial applications are inherently stateful: take a simple COUNT operation, where counting up to 10 requires remembering that you have already counted up to 9.&lt;/p&gt;

&lt;p&gt;To better understand how Flink manages state, one can think of Flink as a three-layered state abstraction, as illustrated in the diagram below.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-01-29-state-unlocked-interacting-with-state-in-apache-flink/managing-state-in-flink-visual-1.png&quot; width=&quot;600px&quot; alt=&quot;State in Apache Flink&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;On the top layer sits the Flink user code, for example, a &lt;code&gt;KeyedProcessFunction&lt;/code&gt; that contains some value state. This is a simple variable whose value state annotations make it automatically fault-tolerant, re-scalable and queryable by the runtime. These variables are backed by the configured state backend, which sits either on-heap or on-disk (RocksDB State Backend) and provides data locality, proximity to the computation and speed for per-record computations. Finally, savepoints keep your existing state intact across upgrades, the introduction of new features and bug fixes.&lt;/p&gt;

&lt;p&gt;A savepoint is a snapshot of the distributed, global state of an application at a logical point in time, stored in an external distributed file system or blob storage such as HDFS or S3. After an application upgrade or a code change (such as adding a new operator or changing a field), the Flink job can restart by reloading the application state from the savepoint into the state backend, making it local and available for the computation, and continue processing as if nothing had ever happened.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-01-29-state-unlocked-interacting-with-state-in-apache-flink/managing-state-in-flink-visual-2.png&quot; width=&quot;600px&quot; alt=&quot;State in Apache Flink&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;alert alert-info&quot;&gt;
 It is important to remember here that &lt;b&gt;state is one of the most valuable components of a Flink application&lt;/b&gt; carrying all the information about both where you are now and where you are going. State is among the most long-lived components in a Flink service since it can be carried across jobs, operators, configurations, new features and bug fixes.
&lt;/div&gt;

&lt;h1 id=&quot;schema-evolution-with-apache-flink&quot;&gt;Schema Evolution with Apache Flink&lt;/h1&gt;

&lt;p&gt;In the previous section, we explained how state is stored and persisted in a Flink application. Let’s now take a look at what happens when evolving state in a stateful Flink streaming application becomes necessary.&lt;/p&gt;

&lt;p&gt;Imagine an Apache Flink application that implements a &lt;code&gt;KeyedProcessFunction&lt;/code&gt; and contains some &lt;code&gt;ValueState&lt;/code&gt;. As illustrated below, when registering the type within the state descriptor, Flink users specify its &lt;code&gt;TypeInformation&lt;/code&gt;, which tells Flink how to serialize the type. &lt;code&gt;TypeInformation&lt;/code&gt; represents Flink’s internal type system, used to serialize data when it is shipped across the network or stored in state backends. Flink’s type system has built-in support for all the basic types such as longs, strings, doubles, arrays and basic collection types like lists and maps. Additionally, Flink supports most of the major composite types including Tuples, POJOs, Scala Case Classes and Apache Avro&lt;sup&gt;Ⓡ&lt;/sup&gt;. Finally, if an application’s type does not match any of the above, developers can either plug in their own serializer or let Flink fall back to Kryo.&lt;/p&gt;

&lt;h2 id=&quot;state-registration-with-built-in-serialization-in-apache-flink&quot;&gt;State registration with built-in serialization in Apache Flink&lt;/h2&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyFunction&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Output&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;transient&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ValueState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MyState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;valueState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Configuration&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MyState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;descriptor&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;my-state&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeInformation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MyState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;valueState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getRuntimeContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;descriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Typically, the schema of an application’s state evolves because of a change in business logic (adding or dropping fields, or changing data types). In all cases, the schema is determined by its serializer, and evolving it can be compared to altering a table in a database. Introducing a state variable for the first time is like running a &lt;code&gt;CREATE TABLE&lt;/code&gt; command: there is a lot of freedom in how it is defined. However, once that table contains data (registered rows), developers are limited in what they can change and must follow certain rules when updating it with an &lt;code&gt;ALTER TABLE&lt;/code&gt; statement. Schema migration in Apache Flink follows a similar principle, since the framework is essentially running an &lt;code&gt;ALTER TABLE&lt;/code&gt; statement across savepoints.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://flink.apache.org/downloads.html#apache-flink-182&quot;&gt;Flink 1.8&lt;/a&gt; comes with built-in support for &lt;a href=&quot;https://avro.apache.org/&quot;&gt;Apache Avro&lt;/a&gt; (specifically the &lt;a href=&quot;https://avro.apache.org/docs/1.7.7/spec.html&quot;&gt;1.7.7 specification&lt;/a&gt;) and evolves state schema according to Avro specifications by adding and removing types or even by swapping between generic and specific Avro record types.&lt;/p&gt;

&lt;p&gt;In &lt;a href=&quot;https://flink.apache.org/downloads.html#apache-flink-191&quot;&gt;Flink 1.9&lt;/a&gt; the community added support for schema evolution for POJOs, including the ability to remove existing fields from POJO types or add new fields. POJO schema evolution tends to be less flexible than Avro’s, since it is not possible to change either the declared field types or the class name of a POJO type, including its namespace.&lt;/p&gt;
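
&lt;p&gt;As a rough illustration of these rules, the sketch below shows the upgraded version of a hypothetical POJO state type. The names &lt;code&gt;PojoEvolutionSketch&lt;/code&gt;, &lt;code&gt;SessionStats&lt;/code&gt; and all of its fields are invented for this example. Compared with the version assumed to have written the savepoint, one field has been added and one removed, while the class name and the declared types of the remaining fields are unchanged.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Hypothetical example; the wrapper class only keeps the sketch in one file.
class PojoEvolutionSketch {

    // Upgraded version of the state type. It must keep the same fully
    // qualified class name as the version that wrote the savepoint.
    public static class SessionStats {
        // Unchanged fields: same name and same declared type as before.
        public String userId;
        public long clicks;

        // Added in the upgraded job.
        public long purchases;

        // Removed in the upgraded job: the previous version also declared
        //   public int legacyFlag;
        // Deleting the declaration is all that is needed on the user side.

        // Flink POJOs need a public no-argument constructor.
        public SessionStats() {}
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;According to the Flink documentation on state schema evolution, a newly added field is initialized to the Java default for its type on restore, and the stored value of a removed field is dropped. Renaming &lt;code&gt;SessionStats&lt;/code&gt; or changing &lt;code&gt;clicks&lt;/code&gt; from &lt;code&gt;long&lt;/code&gt; to &lt;code&gt;int&lt;/code&gt; would fall outside the supported changes.&lt;/p&gt;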

&lt;p&gt;With the community’s efforts related to schema evolution, Flink developers can now expect out-of-the-box support for both Avro and POJO formats, with backwards compatibility for all Flink state backends. Future work revolves around adding support for Scala Case Classes, Tuples and other formats. Make sure to subscribe to the &lt;a href=&quot;https://flink.apache.org/community.html&quot;&gt;Flink mailing list&lt;/a&gt; to contribute and stay on top of any upcoming additions in this space.&lt;/p&gt;

&lt;h2 id=&quot;peeking-under-the-hood&quot;&gt;Peeking Under the Hood&lt;/h2&gt;

&lt;p&gt;Now that we have explained how schema evolution in Flink works, let’s describe the challenges of performing schema serialization with Flink under the hood. Flink considers state as a core part of its API stability, in a way that developers should always be able to take a savepoint from one version of Flink and restart it on the next. With schema evolution, every migration needs to be backwards compatible and also compatible with the different state backends. While in the Flink code the state backends are represented as interfaces detailing how to store and retrieve bytes, in practice, they behave vastly differently, something that adds extra complexity to how schema evolution is executed in Flink.&lt;/p&gt;

&lt;p&gt;For instance, the heap state backend supports lazy serialization and eager deserialization: the per-record code path always works with Java objects, and serialization happens on a background thread. When restoring, Flink will eagerly deserialize all the data and then start the user code. If a developer plugs in a new serializer, the deserialization happens before Flink ever receives the information.&lt;/p&gt;

&lt;p&gt;The RocksDB state backend behaves in the exact opposite manner: it serializes eagerly, because items are stored on disk and RocksDB only consumes byte arrays. Deserialization, on the other hand, is lazy: restoring simply downloads files to the local disk, and Flink remains unaware of what the bytes mean until a serializer is registered.&lt;/p&gt;

&lt;p&gt;An additional challenge stems from the fact that different versions of user code contain different classes on their classpath, so the serializer used to write a savepoint is potentially unavailable at runtime.&lt;/p&gt;

&lt;p&gt;To overcome the previously mentioned challenges, we introduced what we call &lt;code&gt;TypeSerializerSnapshot&lt;/code&gt;. The &lt;code&gt;TypeSerializerSnapshot&lt;/code&gt; stores the configuration of the writer serializer in the snapshot. When restoring, Flink uses that configuration to read back the previous state and check its compatibility with the current version. This allows Flink to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Read the configuration used to write out a snapshot&lt;/li&gt;
  &lt;li&gt;Consume the new user code&lt;/li&gt;
  &lt;li&gt;Check if both items above are compatible&lt;/li&gt;
  &lt;li&gt;Consume the bytes from the snapshot and move forward or alert the user otherwise&lt;/li&gt;
&lt;/ul&gt;
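
&lt;p&gt;The steps above can be modelled in a few lines of plain Java. This is only a toy model under simplifying assumptions: the serializer configuration is reduced to a type name plus a comma-separated field list, and &lt;code&gt;RestoreCheckSketch&lt;/code&gt;, &lt;code&gt;writeConfig&lt;/code&gt;, &lt;code&gt;resolve&lt;/code&gt; and the &lt;code&gt;Compatibility&lt;/code&gt; enum are hypothetical names, not Flink APIs. The decision rule borrows the POJO constraints described earlier (fields may change, the type name may not).&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Toy model of the restore-time check; every name here is hypothetical.
class RestoreCheckSketch {

    enum Compatibility { COMPATIBLE_AS_IS, COMPATIBLE_AFTER_MIGRATION, INCOMPATIBLE }

    // At savepoint time: persist the writer's configuration next to the data.
    static byte[] writeConfig(String typeName, String fields) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(typeName);
        out.writeUTF(fields);
        out.flush();
        return bytes.toByteArray();
    }

    // At restore time: read the old configuration back and compare it
    // with what the new user code declares.
    static Compatibility resolve(byte[] snapshot, String newTypeName, String newFields)
            throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(snapshot));
        String oldTypeName = in.readUTF();
        String oldFields = in.readUTF();

        if (!oldTypeName.equals(newTypeName)) {
            // Nothing sensible can be done with the old bytes: alert the user.
            return Compatibility.INCOMPATIBLE;
        }
        if (oldFields.equals(newFields)) {
            // Same layout: the old bytes can be consumed directly.
            return Compatibility.COMPATIBLE_AS_IS;
        }
        // Same type, different fields: the old bytes must be migrated first.
        return Compatibility.COMPATIBLE_AFTER_MIGRATION;
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For example, a snapshot written with &lt;code&gt;writeConfig(&quot;SessionStats&quot;, &quot;userId,clicks&quot;)&lt;/code&gt; resolves to &lt;code&gt;COMPATIBLE_AFTER_MIGRATION&lt;/code&gt; when the new code declares an additional field, and to &lt;code&gt;INCOMPATIBLE&lt;/code&gt; when the type name differs. Flink’s actual interface, shown next, keeps the same shape but works on real serializers rather than strings.&lt;/p&gt;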

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TypeSerializerSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getCurrentVersion&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;writeSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataOutputView&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IOException&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
  &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;readSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readVersion&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;DataInputView&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;ClassLoader&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;userCodeClassLoader&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IOException&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;TypeSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;restoreSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;TypeSerializerSchemaCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;resolveSchemaCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;TypeSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;newSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;implementing-apache-avro-serialization-in-flink&quot;&gt;Implementing Apache Avro Serialization in Flink&lt;/h2&gt;

&lt;p&gt;Apache Avro is a data serialization format that has very well-defined schema migration semantics and supports both reader and writer schemas. During normal Flink execution the reader and writer schemas will be the same. However, when upgrading an application they may differ, and with schema evolution Flink will be able to migrate objects along with their schemas.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AvroSerializerSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeSerializerSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
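  &lt;span class=&quot;c1&quot;&gt;// Class of the current runtime type; assigned in readSnapshot().&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtimeType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtimeSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;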
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
  &lt;span class=&quot;nd&quot;&gt;@SuppressWarnings&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;WeakerAccess&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;AvroSerializerSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;AvroSerializerSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;runtimeSchema&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The Avro serializer uses the provided schemas and delegates to Apache Avro for all (de)serialization. The class above is the beginning of one possible implementation of a &lt;code&gt;TypeSerializerSnapshot&lt;/code&gt; that supports schema migration for Avro; the following sections walk through its methods.&lt;/p&gt;

&lt;h1 id=&quot;writing-out-the-snapshot&quot;&gt;Writing out the snapshot&lt;/h1&gt;

&lt;p&gt;When serializing out the snapshot, the snapshot configuration will write two pieces of information: the current snapshot configuration version and the serializer configuration.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getCurrentVersion&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;writeSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataOutputView&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IOException&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeUTF&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;runtimeSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;toString&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The version is used to version the snapshot configuration object itself, while the &lt;code&gt;writeSnapshot&lt;/code&gt; method writes out all the information we need to understand the current format: the runtime schema.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;readSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readVersion&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;DataInputView&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;ClassLoader&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;userCodeClassLoader&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IOException&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readVersion&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousSchemaDefinition&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;readUTF&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;previousSchema&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parseAvroSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;previousSchemaDefinition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;runtimeType&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;findClassOrFallbackToGeneric&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;userCodeClassLoader&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;previousSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getFullName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;runtimeSchema&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tryExtractAvroSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;userCodeClassLoader&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtimeType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, when Flink restores, it is able to read back the writer schema used to serialize the data. The current runtime schema is discovered on the class path using some Java reflection magic.&lt;/p&gt;

&lt;p&gt;Once we have both of these, we can compare them for compatibility. Perhaps nothing has changed and the schemas are compatible as is.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeSerializerSchemaCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;resolveSchemaCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;TypeSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;newSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(!(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;newSerializer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;instanceof&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AvroSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeSerializerSchemaCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;incompatible&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Objects&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;equals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;previousSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtimeSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeSerializerSchemaCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;compatibleAsIs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Otherwise, the schemas are compared using Avro’s compatibility checks and they may either be compatible with a migration or incompatible.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;    &lt;span class=&quot;c1&quot;&gt;// Avro expects the reader (current) schema first, then the writer (previous) schema.&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SchemaPairCompatibility&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;compatibility&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SchemaCompatibility&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;checkReaderWriterCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;runtimeSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;avroCompatibilityToFlinkCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If they are compatible after migration, Flink restores a new serializer that can read data written with the old schema and deserialize it into the new runtime type, which is in effect a migration.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;restoreSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;previousSchema&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AvroSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;runtimeType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtimeSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AvroSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;runtimeType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtimeSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtimeSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h1 id=&quot;the-state-processor-api-reading-writing-and-modifying-flink-state&quot;&gt;The State Processor API: Reading, writing and modifying Flink state&lt;/h1&gt;

&lt;p&gt;The State Processor API allows reading from and writing to Flink savepoints. Some interesting use cases are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Analyzing state for interesting patterns&lt;/li&gt;
  &lt;li&gt;Troubleshooting or auditing jobs by checking for state discrepancies&lt;/li&gt;
  &lt;li&gt;Bootstrapping state for new applications&lt;/li&gt;
  &lt;li&gt;Modifying savepoints such as:
    &lt;ul&gt;
      &lt;li&gt;Changing the maximum parallelism of a savepoint after deploying a Flink job&lt;/li&gt;
      &lt;li&gt;Introducing breaking schema updates to a Flink application&lt;/li&gt;
      &lt;li&gt;Correcting invalid state in a Flink savepoint&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a &lt;a href=&quot;https://flink.apache.org/feature/2019/09/13/state-processor-api.html&quot;&gt;previous blog post&lt;/a&gt;, we discussed the State Processor API in detail: the community’s motivation behind introducing the feature in Flink 1.9, what you can use the API for, and how you can use it. Essentially, the State Processor API is based on a relational model that maps your Flink job state to a database, as illustrated in the diagrams below. We encourage you to &lt;a href=&quot;https://flink.apache.org/feature/2019/09/13/state-processor-api.html&quot;&gt;read the previous story&lt;/a&gt; for more information on the API and how to use it. In a follow-up post, we will provide detailed tutorials on:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Reading Keyed and Operator State with the State Processor API and&lt;/li&gt;
  &lt;li&gt;Writing and Bootstrapping Keyed and Operator State with the State Processor API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stay tuned for more details and guidance around this feature of Flink.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-01-29-state-unlocked-interacting-with-state-in-apache-flink/managing-state-in-flink-state-processor-api-visual-1.png&quot; width=&quot;600px&quot; alt=&quot;State Processor API in Apache Flink&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-01-29-state-unlocked-interacting-with-state-in-apache-flink/managing-state-in-flink-state-processor-api-visual-2.png&quot; width=&quot;600px&quot; alt=&quot;State Processor API in Apache Flink&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h1 id=&quot;looking-ahead-more-ways-to-interact-with-state-in-flink&quot;&gt;Looking ahead: More ways to interact with State in Flink&lt;/h1&gt;

&lt;p&gt;The community is actively discussing how to extend the ways Flink developers interact with state in their Flink applications. Regarding the State Processor API, some thoughts revolve around broadening the API’s scope beyond its current ability to read from and write to both keyed and operator state. In upcoming releases, the State Processor API will be extended to support both reading from and writing to windows and to have a first-class integration with Flink’s Table API and SQL.&lt;/p&gt;

&lt;p&gt;Beyond widening the scope of the State Processor API, the Flink community is discussing a few additional ways to improve how developers interact with state in Flink. One of them is the proposal for a Unified Savepoint Format (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Binary+format+for+Keyed+State&quot;&gt;FLIP-41&lt;/a&gt;) for all keyed state backends. This improvement aims to introduce a unified binary format across all savepoints in all keyed state backends, which would drastically reduce the overhead of swapping the state backend in a Flink application. It would allow developers to take a savepoint in their application and restart it in a different state backend (for example, moving it from the heap to disk with the RocksDB state backend, and back), depending on the scalability and evolution of the application at different points in time.&lt;/p&gt;

&lt;p&gt;The community is also discussing the ability to have upgradability dry runs in upcoming Flink releases. Such functionality would allow developers to detect incompatible updates offline, without needing to start a new Flink job from scratch. For example, Flink users would be able to uncover topology or schema incompatibilities when upgrading a Flink job, without having to load the state back into a running Flink job in the first place. Additionally, with upgradability dry runs Flink users would be able to get information about the registered state through the streaming graph, without needing to access the state in the state backend.&lt;/p&gt;

&lt;p&gt;With all the exciting new functionality added in Flink 1.9, as well as some solid ideas and discussions around bringing state in Flink to the next level, the community is committed to making state in Apache Flink a fundamental element of the framework: something that is ever-present across versions and upgrades of your application, and a true first-class citizen in Apache Flink. We encourage you to sign up to the &lt;a href=&quot;https://flink.apache.org/community.html&quot;&gt;mailing list&lt;/a&gt; and stay on top of the announcements and new features in upcoming releases.&lt;/p&gt;
</description>
<pubDate>Wed, 29 Jan 2020 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2020/01/29/state-unlocked-interacting-with-state-in-apache-flink.html</link>
<guid isPermaLink="true">/news/2020/01/29/state-unlocked-interacting-with-state-in-apache-flink.html</guid>
</item>

<item>
<title>Advanced Flink Application Patterns Vol.1: Case Study of a Fraud Detection System</title>
<description>&lt;p&gt;In this series of blog posts you will learn about three powerful Flink patterns for building streaming applications:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Dynamic updates of application logic&lt;/li&gt;
  &lt;li&gt;Dynamic data partitioning (shuffle), controlled at runtime&lt;/li&gt;
  &lt;li&gt;Low latency alerting based on custom windowing logic (without using the window API)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These patterns expand the possibilities of what is achievable with statically defined data flows and provide the building blocks to fulfill complex business requirements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic updates of application logic&lt;/strong&gt; allow Flink jobs to change at runtime, without downtime from stopping and resubmitting the code.&lt;br /&gt;
&lt;br /&gt;
&lt;strong&gt;Dynamic data partitioning&lt;/strong&gt; provides the ability to change how events are distributed and grouped by Flink at runtime. Such functionality often becomes a natural requirement when building jobs with dynamically reconfigurable application logic.&lt;br /&gt;
&lt;br /&gt;
&lt;strong&gt;Custom window management&lt;/strong&gt; demonstrates how you can utilize the low level &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html&quot;&gt;process function API&lt;/a&gt;, when the native &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html&quot;&gt;window API&lt;/a&gt; is not exactly matching your requirements. Specifically, you will learn how to implement low latency alerting on windows and how to limit state growth with timers.&lt;/p&gt;

&lt;p&gt;These patterns build on top of core Flink functionality. However, they might not be immediately apparent from the framework’s documentation, as explaining and presenting the motivation behind them is not always trivial without a concrete use case. That is why we will showcase these patterns with a practical example that offers a real-world usage scenario for Apache Flink: a &lt;em&gt;Fraud Detection&lt;/em&gt; engine.
We hope that this series will place these powerful approaches into your tool belt and enable you to take on new and exciting tasks.&lt;/p&gt;

&lt;p&gt;In the first blog post of the series we will look at the high-level architecture of the demo application and describe its components and their interactions. We will then dive deep into the implementation details of the first pattern in the series: &lt;strong&gt;dynamic data partitioning&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You will be able to run the full Fraud Detection Demo application locally and look into the details of the implementation by using the accompanying GitHub repository.&lt;/p&gt;

&lt;h3 id=&quot;fraud-detection-demo&quot;&gt;Fraud Detection Demo&lt;/h3&gt;

&lt;p&gt;The full source code for our fraud detection demo is open source and available online. To run it locally, check out the following repository and follow the steps in the README:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/afedulov/fraud-detection-demo&quot;&gt;https://github.com/afedulov/fraud-detection-demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The demo is a self-contained application: building it from sources requires only &lt;code&gt;docker&lt;/code&gt; and &lt;code&gt;docker-compose&lt;/code&gt;. It includes the following components:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Apache Kafka (message broker) with ZooKeeper&lt;/li&gt;
  &lt;li&gt;Apache Flink (&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/concepts/glossary.html#flink-application-cluster&quot;&gt;application cluster&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Fraud Detection Web App&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The high-level goal of the Fraud Detection engine is to consume a stream of financial transactions and evaluate them against a set of rules. These rules are subject to frequent changes and tweaks. In a real production system, it is important to be able to add and remove them at runtime, without incurring an expensive penalty of stopping and restarting the job.&lt;/p&gt;

&lt;p&gt;When you navigate to the demo URL in your browser, you will be presented with the following UI:&lt;/p&gt;

&lt;center&gt;
 &lt;img src=&quot;/img/blog/2019-11-19-demo-fraud-detection/ui.png&quot; width=&quot;800px&quot; alt=&quot;Figure 1: Demo UI&quot; /&gt;
 &lt;br /&gt;
 &lt;i&gt;&lt;small&gt;Figure 1: Fraud Detection Demo UI&lt;/small&gt;&lt;/i&gt;
 &lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;On the left side, you can see a visual representation of financial transactions flowing through the system after you click the “Start” button. The slider at the top allows you to control the number of generated transactions per second. The middle section is devoted to managing the rules evaluated by Flink. From here, you can create new rules as well as issue control commands, such as clearing Flink’s state.&lt;/p&gt;

&lt;p&gt;Out of the box, the demo comes with a set of predefined sample rules. You can click the &lt;em&gt;Start&lt;/em&gt; button and, after some time, you will observe alerts displayed in the right section of the UI. These alerts are the result of Flink evaluating the generated transactions stream against the predefined rules.&lt;/p&gt;

&lt;p&gt;Our sample fraud detection system consists of three main components:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Frontend (React)&lt;/li&gt;
  &lt;li&gt;Backend (SpringBoot)&lt;/li&gt;
  &lt;li&gt;Fraud Detection application (Apache Flink)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Interactions between the main elements are depicted in &lt;em&gt;Figure 2&lt;/em&gt;.&lt;/p&gt;

&lt;center&gt;
 &lt;img src=&quot;/img/blog/2019-11-19-demo-fraud-detection/architecture.png&quot; width=&quot;800px&quot; alt=&quot;Figure 2: Demo Components&quot; /&gt;
 &lt;br /&gt;
 &lt;i&gt;&lt;small&gt;Figure 2: Fraud Detection Demo Components&lt;/small&gt;&lt;/i&gt;
 &lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;The Backend exposes a REST API to the Frontend for creating and deleting rules as well as issuing control commands for managing the demo execution. It then relays those Frontend actions to Flink by sending them via a “Control” Kafka topic. The Backend additionally includes a &lt;em&gt;Transactions Generator&lt;/em&gt; component, which sends an emulated stream of money transfer events to Flink via a separate “Transactions” topic. Alerts generated by Flink are consumed by the Backend from the “Alerts” topic and relayed to the UI via WebSockets.&lt;/p&gt;

&lt;p&gt;Now that you are familiar with the overall layout and the goal of our Fraud Detection engine, let’s go into the details of what is required to implement such a system.&lt;/p&gt;

&lt;h3 id=&quot;dynamic-data-partitioning&quot;&gt;Dynamic Data Partitioning&lt;/h3&gt;

&lt;p&gt;The first pattern we will look into is Dynamic Data Partitioning.&lt;/p&gt;

&lt;p&gt;If you have used Flink’s DataStream API in the past, you are undoubtedly familiar with the &lt;strong&gt;keyBy&lt;/strong&gt; method. Keying a stream shuffles all the records such that elements with the same key are assigned to the same partition. This means all records with the same key are processed by the same physical instance of the next operator.&lt;/p&gt;
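&lt;p&gt;As a rough illustration of why records with equal keys meet at the same operator instance, the sketch below maps a key to a partition index by hashing it. This is a deliberately simplified assumption and not Flink’s actual assignment logic, which works with key groups and is more involved. &lt;code&gt;KeyPartitioningSketch&lt;/code&gt; and &lt;code&gt;partitionFor&lt;/code&gt; are hypothetical names introduced only for this example.&lt;/p&gt;

```java
// Simplified, hypothetical model of key-based partitioning (not Flink's real algorithm).
final class KeyPartitioningSketch {

    // Maps a key to a partition index in the range [0, parallelism).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(Object key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    public static void main(String[] args) {
        int parallelism = 4;
        String[] keys = {"account-1", "account-2", "account-1"};
        for (String key : keys) {
            System.out.println(key + " goes to partition " + partitionFor(key, parallelism));
        }
    }
}
```

&lt;p&gt;Because the mapping depends only on the key, the first and third records in this example always land on the same partition.&lt;/p&gt;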

&lt;p&gt;In a typical streaming application, the choice of key is fixed, determined by some static field within the elements. For instance, when building a simple window-based aggregation of a stream of transactions, we might always group by the transaction’s account ID.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// [...]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;...&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowed&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;Transaction:&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getAccountId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/*window specification*/&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This approach is the main building block for achieving horizontal scalability in a wide range of use cases. However, in the case of an application striving to provide flexibility in business logic at runtime, this is not enough.
To understand why this is the case, let us start with articulating a realistic sample rule definition for our fraud detection system in the form of a functional requirement:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;“Whenever the &lt;strong&gt;sum&lt;/strong&gt; of the accumulated &lt;strong&gt;payment amount&lt;/strong&gt; from the same &lt;strong&gt;payer&lt;/strong&gt; to the same &lt;strong&gt;beneficiary&lt;/strong&gt; within the &lt;strong&gt;duration of a week&lt;/strong&gt; is &lt;strong&gt;greater&lt;/strong&gt; than &lt;strong&gt;1 000 000 $&lt;/strong&gt; - fire an alert.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this formulation we can spot a number of parameters that we would like to be able to specify in a newly-submitted rule and possibly even later modify or tweak at runtime:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Aggregation field (payment amount)&lt;/li&gt;
  &lt;li&gt;Grouping fields (payer + beneficiary)&lt;/li&gt;
  &lt;li&gt;Aggregation function (sum)&lt;/li&gt;
  &lt;li&gt;Window duration (1 week)&lt;/li&gt;
  &lt;li&gt;Limit (1 000 000)&lt;/li&gt;
  &lt;li&gt;Limit operator (greater)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Accordingly, we will use the following simple JSON format to define the aforementioned parameters:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;quot;ruleId&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;quot;ruleState&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;ACTIVE&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;quot;groupingKeyNames&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;payerId&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;beneficiaryId&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;quot;aggregateFieldName&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;paymentAmount&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;quot;aggregatorFunctionType&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;SUM&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;quot;limitOperatorType&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;GREATER&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;quot;limit&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;quot;windowMinutes&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10080&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
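
&lt;p&gt;To make the &lt;code&gt;limit&lt;/code&gt;, &lt;code&gt;limitOperatorType&lt;/code&gt; and &lt;code&gt;windowMinutes&lt;/code&gt; parameters concrete, here is a minimal sketch of how such a rule could be checked against an already aggregated value. &lt;code&gt;RuleSketch&lt;/code&gt;, &lt;code&gt;violates&lt;/code&gt; and the &lt;code&gt;LESS&lt;/code&gt; operator are assumptions made for illustration. The rule definition above only shows &lt;code&gt;GREATER&lt;/code&gt;, and the demo’s actual rule class may look different.&lt;/p&gt;

```java
import java.math.BigDecimal;
import java.time.Duration;

// Hypothetical sketch of evaluating a rule's limit; not the demo's actual Rule class.
final class RuleSketch {

    // GREATER comes from the sample rule; LESS is an assumed second operator.
    enum LimitOperatorType { GREATER, LESS }

    // Returns true when the aggregated value breaks the limit under the given operator.
    static boolean violates(BigDecimal aggregate, LimitOperatorType operator, BigDecimal limit) {
        switch (operator) {
            case GREATER:
                return aggregate.compareTo(limit) > 0;
            case LESS:
                return limit.compareTo(aggregate) > 0;
            default:
                throw new IllegalArgumentException("Unsupported operator: " + operator);
        }
    }

    public static void main(String[] args) {
        BigDecimal limit = new BigDecimal("1000000");
        Duration window = Duration.ofMinutes(10080);

        // 10080 minutes is exactly one week.
        System.out.println("window in days: " + window.toDays()); // prints 7

        System.out.println(violates(new BigDecimal("1000000.01"), LimitOperatorType.GREATER, limit)); // true
        System.out.println(violates(new BigDecimal("1000000"), LimitOperatorType.GREATER, limit));    // false
    }
}
```

&lt;p&gt;Note that a sum exactly equal to the limit does not fire an alert here, which matches the “greater than” wording of the requirement.&lt;/p&gt;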

&lt;p&gt;At this point, it is important to understand that &lt;strong&gt;&lt;code&gt;groupingKeyNames&lt;/code&gt;&lt;/strong&gt; determines the actual physical grouping of events: all Transactions with the same values of the specified parameters (e.g. &lt;em&gt;payer #25 -&amp;gt; beneficiary #12&lt;/em&gt;) have to be aggregated in the same physical instance of the evaluating operator. Naturally, the process of distributing data in such a way in Flink’s API is realised by a &lt;code&gt;keyBy()&lt;/code&gt; function.&lt;/p&gt;

&lt;p&gt;Most examples in Flink’s &lt;code&gt;keyBy()&lt;/code&gt; &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/api_concepts.html#define-keys-using-field-expressions&quot;&gt;documentation&lt;/a&gt; use a hard-coded &lt;code&gt;KeySelector&lt;/code&gt;, which extracts specific, fixed fields of the events. However, to support the desired flexibility, we have to extract them in a more dynamic fashion based on the specifications of the rules. For this, we will have to use one additional operator that prepares every event for dispatching to the correct aggregating instance.&lt;/p&gt;

&lt;p&gt;On a high level, our main processing pipeline looks like this:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Alert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alerts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DynamicKeyFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* some key selector */&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* actual calculations and alerting */&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We have previously established that each rule defines a &lt;strong&gt;&lt;code&gt;groupingKeyNames&lt;/code&gt;&lt;/strong&gt; parameter that specifies which combination of fields will be used for grouping the incoming events. Each rule might use an arbitrary combination of these fields. At the same time, every incoming event potentially needs to be evaluated against multiple rules. This implies that an event might need to be present simultaneously at multiple parallel instances of the evaluating operator, corresponding to different rules, and hence will need to be forked. Dispatching events in this way is the purpose of &lt;code&gt;DynamicKeyFunction()&lt;/code&gt;.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-11-19-demo-fraud-detection/shuffle_function_1.png&quot; width=&quot;800px&quot; alt=&quot;Figure 3: Forking events with Dynamic Key Function&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 3: Forking events with Dynamic Key Function&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;DynamicKeyFunction&lt;/code&gt; iterates over a set of defined rules and prepares every event to be processed by a &lt;code&gt;keyBy()&lt;/code&gt; function by extracting the required grouping keys:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DynamicKeyFunction&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;cm&quot;&gt;/* Simplified */&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rules&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;cm&quot;&gt;/* Rules that are initialized somehow.&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;                        Details will be discussed in a future blog post. */&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
      &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

      &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rules&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
           &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;
               &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
               &lt;span class=&quot;n&quot;&gt;KeysExtractor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getGroupingKeyNames&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
               &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRuleId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()));&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;KeysExtractor.getKey()&lt;/code&gt; uses reflection to extract the required values of &lt;code&gt;groupingKeyNames&lt;/code&gt; fields from events and combines them into a single concatenated String key, e.g. &lt;code&gt;&quot;{payerId=25;beneficiaryId=12}&quot;&lt;/code&gt;. Flink will calculate the hash of this key and assign the processing of this particular combination to a specific server in the cluster. This will allow tracking all transactions between &lt;em&gt;payer #25&lt;/em&gt; and &lt;em&gt;beneficiary #12&lt;/em&gt; and evaluating defined rules within the desired time window.&lt;/p&gt;
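
&lt;p&gt;A minimal sketch of this reflection-based key extraction might look as follows. It is not the demo’s actual &lt;code&gt;KeysExtractor&lt;/code&gt;: the nested &lt;code&gt;Transaction&lt;/code&gt; class, the use of a String array for the key names and the error handling are assumptions made to keep the example self-contained.&lt;/p&gt;

```java
import java.lang.reflect.Field;

// Hypothetical sketch of reflection-based key extraction; not the demo's real KeysExtractor.
final class KeysExtractorSketch {

    // Assumed event type carrying the two fields named in the sample rule.
    static final class Transaction {
        private final long payerId;
        private final long beneficiaryId;

        Transaction(long payerId, long beneficiaryId) {
            this.payerId = payerId;
            this.beneficiaryId = beneficiaryId;
        }
    }

    // Reads each named field from the event and joins them as {name=value;name=value}.
    static String getKey(String[] keyNames, Object event) {
        StringBuilder key = new StringBuilder("{");
        String separator = "";
        for (String name : keyNames) {
            try {
                Field field = event.getClass().getDeclaredField(name);
                field.setAccessible(true); // the fields are private
                key.append(separator).append(name).append('=').append(field.get(event));
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("Cannot read field " + name, e);
            }
            separator = ";";
        }
        return key.append('}').toString();
    }

    public static void main(String[] args) {
        Transaction tx = new Transaction(25L, 12L);
        // prints {payerId=25;beneficiaryId=12}
        System.out.println(getKey(new String[] {"payerId", "beneficiaryId"}, tx));
    }
}
```

&lt;p&gt;Two rules with different grouping fields produce different keys for the same event, which is what lets the subsequent &lt;code&gt;keyBy()&lt;/code&gt; route each copy of the event independently.&lt;/p&gt;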

&lt;p&gt;Notice that a wrapper class &lt;code&gt;Keyed&lt;/code&gt; with the following signature was introduced as the output type of &lt;code&gt;DynamicKeyFunction&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IN&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wrapped&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KEY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ID&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KEY&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(){&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Fields of this POJO carry the following information: &lt;code&gt;wrapped&lt;/code&gt; is the original transaction event, &lt;code&gt;key&lt;/code&gt; is the result of using &lt;code&gt;KeysExtractor&lt;/code&gt; and &lt;code&gt;id&lt;/code&gt; is the ID of the Rule that caused the dispatch of the event (according to the rule-specific grouping logic).&lt;/p&gt;

&lt;p&gt;Events of this type will be the input to the &lt;code&gt;keyBy()&lt;/code&gt; function in the main processing pipeline and allow the use of a simple lambda-expression as a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/api_concepts.html#define-keys-using-key-selector-functions&quot;&gt;&lt;code&gt;KeySelector&lt;/code&gt;&lt;/a&gt; for the final step of implementing dynamic data shuffle.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Alert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alerts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DynamicKeyFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DynamicAlertFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;By applying &lt;code&gt;DynamicKeyFunction&lt;/code&gt; we are implicitly copying events for performing parallel per-rule evaluation within a Flink cluster. By doing so, we achieve an important property: horizontal scalability of rule processing. Our system will be capable of handling more rules by adding more servers to the cluster, i.e. increasing the parallelism. This property is achieved at the cost of data duplication, which might become an issue depending on the specific set of parameters, such as incoming data rate, available network bandwidth, event payload size, etc. In a real-life scenario, additional optimizations can be applied, such as combined evaluation of rules which have the same &lt;code&gt;groupingKeyNames&lt;/code&gt;, or a filtering layer, which would strip events of all the fields that are not required for processing of a particular rule.&lt;/p&gt;
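&lt;p&gt;As a rough back-of-the-envelope illustration of the duplication cost mentioned above, the sketch below estimates the data volume entering the shuffle. It assumes that every event is copied exactly once per active rule and ignores the extra bytes for the key, the rule ID, and serialization overhead. The class name, the method name, and the workload numbers are hypothetical.&lt;/p&gt;

```java
public class DuplicationCostSketch {

    // Estimated bytes per second entering the keyBy() shuffle, assuming one
    // copy of each event per active rule and ignoring the extra bytes for
    // the key, the rule ID, and serialization framing.
    public static long shuffledBytesPerSecond(
            long eventsPerSecond, long payloadBytes, int activeRules) {
        return eventsPerSecond * payloadBytes * activeRules;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 10,000 events/s, 500-byte payloads, 20 rules.
        long bytes = shuffledBytesPerSecond(10_000L, 500L, 20);
        System.out.println(bytes + " bytes/s"); // prints 100000000 bytes/s
    }
}
```

&lt;p&gt;Under these assumptions the shuffled volume grows linearly with the number of rules, which is why merging rules that share the same &lt;code&gt;groupingKeyNames&lt;/code&gt; or trimming unused fields can pay off.&lt;/p&gt;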

&lt;h3 id=&quot;summary&quot;&gt;Summary:&lt;/h3&gt;

&lt;p&gt;In this blog post, we have discussed the motivation behind supporting dynamic, runtime changes to a Flink application by looking at a sample use case - a Fraud Detection engine. We have described the overall architecture and interactions between its components as well as provided references for building and running a demo Fraud Detection application in a dockerized setup. We then showed the details of implementing a &lt;strong&gt;dynamic data partitioning pattern&lt;/strong&gt; as the first underlying building block to enable flexible runtime configurations.&lt;/p&gt;

&lt;p&gt;To remain focused on describing the core mechanics of the pattern, we kept the complexity of the DSL and the underlying rules engine to a minimum. Going forward, it is easy to imagine adding extensions such as allowing more sophisticated rule definitions, including filtering of certain events, logical rules chaining, and other more advanced functionality.&lt;/p&gt;

&lt;p&gt;In the second part of this series, we will describe how the rules make their way into the running Fraud Detection engine. Additionally, we will go over the implementation details of the main processing function of the pipeline - &lt;em&gt;DynamicAlertFunction()&lt;/em&gt;.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-11-19-demo-fraud-detection/end-to-end.png&quot; width=&quot;800px&quot; alt=&quot;Figure 4: End-to-end pipeline&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 4: End-to-end pipeline&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;In the next article, we will see how Flink’s broadcast streams can be utilized to help steer the processing within the Fraud Detection engine at runtime (Dynamic Application Updates pattern).&lt;/p&gt;
</description>
<pubDate>Wed, 15 Jan 2020 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html</link>
<guid isPermaLink="true">/news/2020/01/15/demo-fraud-detection.html</guid>
</item>

<item>
<title>Apache Flink 1.8.3 Released</title>
<description>&lt;p&gt;The Apache Flink community released the third bugfix version of the Apache Flink 1.8 series.&lt;/p&gt;

&lt;p&gt;This release includes 45 fixes and minor improvements for Flink 1.8.2. A detailed list of all fixes and improvements is given below.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.8.3.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13723&quot;&gt;FLINK-13723&lt;/a&gt;] -         Use liquid-c for faster doc generation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13724&quot;&gt;FLINK-13724&lt;/a&gt;] -         Remove unnecessary whitespace from the docs&amp;#39; sidenav
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13725&quot;&gt;FLINK-13725&lt;/a&gt;] -         Use sassc for faster doc generation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13726&quot;&gt;FLINK-13726&lt;/a&gt;] -         Build docs with jekyll 4.0.0.pre.beta1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13791&quot;&gt;FLINK-13791&lt;/a&gt;] -         Speed up sidenav by using group_by
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12342&quot;&gt;FLINK-12342&lt;/a&gt;] -         Yarn Resource Manager Acquires Too Many Containers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13184&quot;&gt;FLINK-13184&lt;/a&gt;] -         Starting a TaskExecutor blocks the YarnResourceManager&amp;#39;s main thread
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13728&quot;&gt;FLINK-13728&lt;/a&gt;] -         Fix wrong closing tag order in sidenav
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13746&quot;&gt;FLINK-13746&lt;/a&gt;] -         Elasticsearch (v2.3.5) sink end-to-end test fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13749&quot;&gt;FLINK-13749&lt;/a&gt;] -         Make Flink client respect classloading policy
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13892&quot;&gt;FLINK-13892&lt;/a&gt;] -         HistoryServerTest failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13936&quot;&gt;FLINK-13936&lt;/a&gt;] -         NOTICE-binary is outdated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13966&quot;&gt;FLINK-13966&lt;/a&gt;] -         Jar sorting in collect_license_files.sh is locale dependent
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13995&quot;&gt;FLINK-13995&lt;/a&gt;] -         Fix shading of the licence information of netty
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13999&quot;&gt;FLINK-13999&lt;/a&gt;] -         Correct the documentation of MATCH_RECOGNIZE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14009&quot;&gt;FLINK-14009&lt;/a&gt;] -         Cron jobs broken due to verifying incorrect NOTICE-binary file
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14010&quot;&gt;FLINK-14010&lt;/a&gt;] -         Dispatcher &amp;amp; JobManagers don&amp;#39;t give up leadership when AM is shut down
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14043&quot;&gt;FLINK-14043&lt;/a&gt;] -         SavepointMigrationTestBase is super slow
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14107&quot;&gt;FLINK-14107&lt;/a&gt;] -         Kinesis consumer record emitter deadlock under event time alignment
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14175&quot;&gt;FLINK-14175&lt;/a&gt;] -         Upgrade KPL version in flink-connector-kinesis to fix application OOM
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14235&quot;&gt;FLINK-14235&lt;/a&gt;] -         Kafka010ProducerITCase&amp;gt;KafkaProducerTestBase.testOneToOneAtLeastOnceCustomOperator fails on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14315&quot;&gt;FLINK-14315&lt;/a&gt;] -         NPE with JobMaster.disconnectTaskManager
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14337&quot;&gt;FLINK-14337&lt;/a&gt;] -         HistoryServerTest.testHistoryServerIntegration failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14347&quot;&gt;FLINK-14347&lt;/a&gt;] -         YARNSessionFIFOITCase.checkForProhibitedLogContents found a log with prohibited string
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14370&quot;&gt;FLINK-14370&lt;/a&gt;] -         KafkaProducerAtLeastOnceITCase&amp;gt;KafkaProducerTestBase.testOneToOneAtLeastOnceRegularSink fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14398&quot;&gt;FLINK-14398&lt;/a&gt;] -         Further split input unboxing code into separate methods
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14413&quot;&gt;FLINK-14413&lt;/a&gt;] -         shade-plugin ApacheNoticeResourceTransformer uses platform-dependent encoding
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14434&quot;&gt;FLINK-14434&lt;/a&gt;] -         Dispatcher#createJobManagerRunner should not start JobManagerRunner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14562&quot;&gt;FLINK-14562&lt;/a&gt;] -         RMQSource leaves idle consumer after closing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14589&quot;&gt;FLINK-14589&lt;/a&gt;] -         Redundant slot requests with the same AllocationID leads to inconsistent slot table
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15036&quot;&gt;FLINK-15036&lt;/a&gt;] -         Container startup error will be handled out side of the YarnResourceManager&amp;#39;s main thread
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12848&quot;&gt;FLINK-12848&lt;/a&gt;] -         Method equals() in RowTypeInfo should consider fieldsNames
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13729&quot;&gt;FLINK-13729&lt;/a&gt;] -         Update website generation dependencies
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13965&quot;&gt;FLINK-13965&lt;/a&gt;] -         Keep hasDeprecatedKeys and deprecatedKeys methods in ConfigOption and mark it with @Deprecated annotation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13967&quot;&gt;FLINK-13967&lt;/a&gt;] -         Generate full binary licensing via collect_license_files.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13968&quot;&gt;FLINK-13968&lt;/a&gt;] -         Add travis check for the correctness of the binary licensing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13991&quot;&gt;FLINK-13991&lt;/a&gt;] -         Add git exclusion for 1.9+ features to 1.8
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14008&quot;&gt;FLINK-14008&lt;/a&gt;] -         Auto-generate binary licensing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14104&quot;&gt;FLINK-14104&lt;/a&gt;] -         Bump Jackson to 2.10.1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14123&quot;&gt;FLINK-14123&lt;/a&gt;] -         Lower the default value of taskmanager.memory.fraction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14215&quot;&gt;FLINK-14215&lt;/a&gt;] -         Add Docs for TM and JM Environment Variable Setting
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14334&quot;&gt;FLINK-14334&lt;/a&gt;] -         ElasticSearch docs refer to non-existent ExceptionUtils.containsThrowable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14639&quot;&gt;FLINK-14639&lt;/a&gt;] -         Fix the document of Metrics  that has an error for `User Scope` 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14646&quot;&gt;FLINK-14646&lt;/a&gt;] -         Check non-null for key in KeyGroupStreamPartitioner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14995&quot;&gt;FLINK-14995&lt;/a&gt;] -         Kinesis NOTICE is incorrect
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 11 Dec 2019 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/12/11/release-1.8.3.html</link>
<guid isPermaLink="true">/news/2019/12/11/release-1.8.3.html</guid>
</item>

<item>
<title>Running Apache Flink on Kubernetes with KUDO</title>
<description>&lt;p&gt;A common use case for Apache Flink is streaming data analytics together with Apache Kafka, which provides a pub/sub model and durability for data streams. To achieve elastic scalability, both are typically deployed in clustered environments, and increasingly on top of container orchestration platforms like Kubernetes. The &lt;a href=&quot;https://kubernetes.io/docs/concepts/extend-kubernetes/operator/&quot;&gt;Operator pattern&lt;/a&gt; provides an extension mechanism to Kubernetes that captures human operator knowledge about an application, like Flink, in software to automate its operation. &lt;a href=&quot;https://kudo.dev&quot;&gt;KUDO&lt;/a&gt; is an open source toolkit for building Operators using declarative YAML specs, with a focus on ease of use for cluster admins and developers.&lt;/p&gt;

&lt;p&gt;In this blog post we demonstrate how to orchestrate a streaming data analytics application based on Flink and Kafka with KUDO. It consists of a Flink job that checks financial transactions for fraud, and two microservices that generate and display the transactions. You can find more details about this demo in the &lt;a href=&quot;https://github.com/kudobuilder/operators/tree/master/repository/flink/docs/demo/financial-fraud&quot;&gt;KUDO Operators repository&lt;/a&gt;, including instructions for installing the dependencies.&lt;/p&gt;

&lt;p style=&quot;display: block; text-align: center; margin-top: 20px; margin-bottom: 20px&quot;&gt;
	&lt;img src=&quot;/img/blog/2019-11-06-flink-kubernetes-kudo/flink-kudo-architecture.png&quot; width=&quot;600px&quot; alt=&quot;Application: My App&quot; /&gt;
&lt;/p&gt;

&lt;h2 id=&quot;prerequisites&quot;&gt;Prerequisites&lt;/h2&gt;

&lt;p&gt;You can run this demo on your local machine using &lt;a href=&quot;https://github.com/kubernetes/minikube&quot;&gt;minikube&lt;/a&gt;. The instructions below were tested with minikube v1.5.1 and Kubernetes v1.16.2 but should work on any Kubernetes version above v1.15.0. First, start a minikube cluster with enough capacity:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;minikube start --cpus=6 --memory=9216 --disk-size=10g&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If you’re using a different way to provision Kubernetes, make sure you have at least 6 CPU Cores, 9 GB of RAM and 10 GB of disk space available.&lt;/p&gt;

&lt;p&gt;Install the &lt;code&gt;kubectl&lt;/code&gt; CLI tool. The KUDO CLI is a plugin for the Kubernetes CLI. The official instructions for installing and setting up kubectl are &lt;a href=&quot;https://kubernetes.io/docs/tasks/tools/install-kubectl/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Next, let’s install the KUDO CLI. At the time of this writing, the latest KUDO version is v0.10.0. You can find the CLI binaries for download &lt;a href=&quot;https://github.com/kudobuilder/kudo/releases&quot;&gt;here&lt;/a&gt;. Download the &lt;code&gt;kubectl-kudo&lt;/code&gt; binary for your OS and architecture.&lt;/p&gt;

&lt;p&gt;If you’re using Homebrew on macOS, you can install the CLI via:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ brew tap kudobuilder/tap
$ brew install kudo-cli
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now, let’s initialize KUDO on our Kubernetes cluster:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl kudo init
$KUDO_HOME has been configured at /Users/gerred/.kudo
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This will create several resources. First, it will create the &lt;a href=&quot;https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/&quot;&gt;Custom Resource Definitions&lt;/a&gt;, &lt;a href=&quot;https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/&quot;&gt;service account&lt;/a&gt;, and &lt;a href=&quot;https://kubernetes.io/docs/reference/access-authn-authz/rbac/&quot;&gt;role bindings&lt;/a&gt; necessary for KUDO to operate. It will also create an instance of the &lt;a href=&quot;https://kudo.dev/docs/architecture.html#components&quot;&gt;KUDO controller&lt;/a&gt; so that we can begin creating instances of applications.&lt;/p&gt;

&lt;p&gt;The KUDO CLI leverages the kubectl plugin system, which gives you all its functionality under &lt;code&gt;kubectl kudo&lt;/code&gt;. This is a convenient way to install and deal with your KUDO Operators. For our demo, we use Kafka and Flink which depend on ZooKeeper. To make the ZooKeeper Operator available on the cluster, run:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl kudo install zookeeper --version=0.3.0 --skip-instance
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;--skip-instance&lt;/code&gt; flag skips the creation of a ZooKeeper instance. The flink-demo Operator that we’re going to install below will create it as a dependency instead. Now let’s make the Kafka and Flink Operators available the same way:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl kudo install kafka --version=1.2.0 --skip-instance
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl kudo install flink --version=0.2.1 --skip-instance
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This installs all the Operator versions needed for our demo.&lt;/p&gt;

&lt;h2 id=&quot;financial-fraud-demo&quot;&gt;Financial Fraud Demo&lt;/h2&gt;

&lt;p&gt;In our financial fraud demo we have two microservices, called “generator” and “actor”. The generator produces transactions with random amounts and writes them into a Kafka topic. Occasionally, the value will be over 10,000 which is considered fraud for the purpose of this demo. The Flink job subscribes to the Kafka topic and detects fraudulent transactions. When it does, it submits them to another Kafka topic which the actor consumes. The actor simply displays each fraudulent transaction.&lt;/p&gt;

&lt;p&gt;The KUDO CLI by default installs Operators from the &lt;a href=&quot;https://github.com/kudobuilder/operators/&quot;&gt;official repository&lt;/a&gt;, but it also supports installation from your local filesystem. This is useful if you want to develop your own Operator, or modify this demo for your own purposes.&lt;/p&gt;

&lt;p&gt;First, clone the “kudobuilder/operators” repository via:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ git clone https://github.com/kudobuilder/operators.git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Next, change into the “operators” directory and install the demo-operator from your local filesystem:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ cd operators
$ kubectl kudo install repository/flink/docs/demo/financial-fraud/demo-operator --instance flink-demo
instance.kudo.dev/v1beta1/flink-demo created
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This time we didn’t include the &lt;code&gt;--skip-instance&lt;/code&gt; flag, so KUDO will actually deploy all the components, including Flink, Kafka, and ZooKeeper. KUDO orchestrates deployments and other lifecycle operations using &lt;a href=&quot;https://kudo.dev/docs/concepts.html#plan&quot;&gt;plans&lt;/a&gt; that were defined by the Operator developer. Plans are similar to &lt;a href=&quot;https://en.wikipedia.org/wiki/Runbook&quot;&gt;runbooks&lt;/a&gt; and encapsulate all the procedures required to operate the software. We can track the status of the deployment using this KUDO command:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl kudo plan status --instance flink-demo
Plan(s) for &quot;flink-demo&quot; in namespace &quot;default&quot;:
.
└── flink-demo (Operator-Version: &quot;flink-demo-0.1.4&quot; Active-Plan: &quot;deploy&quot;)
	└── Plan deploy (serial strategy) [IN_PROGRESS]
    	├── Phase dependencies [IN_PROGRESS]
    	│   ├── Step zookeeper (COMPLETE)
    	│   └── Step kafka (IN_PROGRESS)
    	├── Phase flink-cluster [PENDING]
    	│   └── Step flink (PENDING)
    	├── Phase demo [PENDING]
    	│   ├── Step gen (PENDING)
    	│   └── Step act (PENDING)
    	└── Phase flink-job [PENDING]
        	└── Step submit (PENDING)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The output shows that the “deploy” plan is in progress and that it consists of 4 phases: “dependencies”, “flink-cluster”, “demo” and “flink-job”. The “dependencies” phase includes steps for “zookeeper” and “kafka”. This is where both dependencies get installed, before KUDO continues to install the Flink cluster and the demo itself. We also see that ZooKeeper installation completed, and that Kafka installation is currently in progress. We can view details about Kafka’s deployment plan via:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl kudo plan status --instance flink-demo-kafka
Plan(s) for &quot;flink-demo-kafka&quot; in namespace &quot;default&quot;:
.
└── flink-demo-kafka (Operator-Version: &quot;kafka-1.2.0&quot; Active-Plan: &quot;deploy&quot;)
	├── Plan deploy (serial strategy) [IN_PROGRESS]
	│   └── Phase deploy-kafka [IN_PROGRESS]
	│   	└── Step deploy (IN_PROGRESS)
	└── Plan not-allowed (serial strategy) [NOT ACTIVE]
    	└── Phase not-allowed (serial strategy) [NOT ACTIVE]
        	└── Step not-allowed (serial strategy) [NOT ACTIVE]
            	└── not-allowed [NOT ACTIVE]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After Kafka has been successfully installed, the next phase, “flink-cluster”, will start and bring up, you guessed it, your Flink cluster. After this is done, the “demo” phase creates the generator and actor pods that generate and display transactions for this demo. Lastly, the “flink-job” phase submits the actual FinancialFraudJob to the Flink cluster. Once the Flink job is submitted, we will be able to see fraud logs in our actor pod shortly after.&lt;/p&gt;

&lt;p&gt;After a while, the state of all plans, phases and steps will change to “COMPLETE”. Now we can view the Flink dashboard to verify that our job is running. To access it from outside the Kubernetes cluster, first start the client proxy, then open the URL below in your browser:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl proxy
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href=&quot;http://127.0.0.1:8001/api/v1/namespaces/default/services/flink-demo-flink-jobmanager:ui/proxy/#/overview&quot;&gt;http://127.0.0.1:8001/api/v1/namespaces/default/services/flink-demo-flink-jobmanager:ui/proxy/#/overview&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It should look similar to this, depending on your local machine and how many cores you have available:&lt;/p&gt;

&lt;p style=&quot;display: block; text-align: center; margin-top: 20px; margin-bottom: 20px&quot;&gt;
	&lt;img src=&quot;/img/blog/2019-11-06-flink-kubernetes-kudo/flink-dashboard-ui.png&quot; width=&quot;600px&quot; alt=&quot;Application: My App&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;The job is up and running and we should now be able to see fraudulent transactions in the logs of the actor pod:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl logs $(kubectl get pod -l actor=flink-demo -o jsonpath=&quot;{.items[0].metadata.name}&quot;)
Broker:   flink-demo-kafka-kafka-0.flink-demo-kafka-svc:9093
Topic:   fraud

Detected Fraud:   TransactionAggregate {startTimestamp=0, endTimestamp=1563395831000, totalAmount=19895:
Transaction{timestamp=1563395778000, origin=1, target=&#39;3&#39;, amount=8341}
Transaction{timestamp=1563395813000, origin=1, target=&#39;3&#39;, amount=8592}
Transaction{timestamp=1563395817000, origin=1, target=&#39;3&#39;, amount=2802}
Transaction{timestamp=1563395831000, origin=1, target=&#39;3&#39;, amount=160}}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If you add the “-f” flag to the previous command, you can follow along as more transactions are streaming in and are evaluated by our Flink job.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In this blog post we demonstrated how to easily deploy an end-to-end streaming data application on Kubernetes using KUDO. We deployed a Flink job and two microservices, as well as all the required infrastructure (Flink, Kafka, and ZooKeeper), using just a few kubectl commands. To find out more about KUDO, visit the &lt;a href=&quot;https://kudo.dev&quot;&gt;project website&lt;/a&gt; or join the community on &lt;a href=&quot;https://kubernetes.slack.com/messages/kudo/&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;
</description>
<pubDate>Mon, 09 Dec 2019 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/12/09/flink-kubernetes-kudo.html</link>
<guid isPermaLink="true">/news/2019/12/09/flink-kubernetes-kudo.html</guid>
</item>

<item>
<title>How to query Pulsar Streams using Apache Flink</title>
<description>&lt;p&gt;In a previous &lt;a href=&quot;https://flink.apache.org/2019/05/03/pulsar-flink.html&quot;&gt;story&lt;/a&gt; on the Flink blog, we explained the different ways that &lt;a href=&quot;https://flink.apache.org/&quot;&gt;Apache Flink&lt;/a&gt; and &lt;a href=&quot;https://pulsar.apache.org/&quot;&gt;Apache Pulsar&lt;/a&gt; can integrate to provide elastic data processing at large scale. This blog post discusses the new developments and integrations between the two frameworks and showcases how you can leverage Pulsar’s built-in schema to query Pulsar streams in real time using Apache Flink.&lt;/p&gt;

&lt;h1 id=&quot;a-short-intro-to-apache-pulsar&quot;&gt;A short intro to Apache Pulsar&lt;/h1&gt;

&lt;p&gt;Apache Pulsar is a flexible pub/sub messaging system, backed by durable log storage. Some of the framework’s highlights include multi-tenancy, a unified message model, structured event streams and a cloud-native architecture that make it a perfect fit for a wide set of use cases, ranging from billing, payments and trading services all the way to the unification of the different messaging architectures in an organization. If you are interested in finding out more about Pulsar, you can visit the &lt;a href=&quot;https://pulsar.apache.org/docs/en/standalone/&quot;&gt;Apache Pulsar documentation&lt;/a&gt; or get in touch with the Pulsar community on &lt;a href=&quot;https://apache-pulsar.herokuapp.com&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;

&lt;h1 id=&quot;existing-pulsar--flink-integration-apache-flink-16&quot;&gt;Existing Pulsar &amp;amp; Flink integration (Apache Flink 1.6+)&lt;/h1&gt;

&lt;p&gt;The existing integration between Pulsar and Flink exploits Pulsar as a message queue in a Flink application. Flink developers can utilize Pulsar as a streaming source and streaming sink for their Flink applications by selecting a specific Pulsar source and connecting to their desired Pulsar cluster and topic:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// create and configure Pulsar consumer&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;PulsarSourceBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PulsarSourceBuilder&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SimpleStringSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; 
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;topic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inputTopic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subscriptionName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;subscription&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SourceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;src&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// ingest DataStream with Pulsar consumer&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;src&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Pulsar streams can then get connected to the Flink processing logic…&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// perform computation on DataStream (here a simple WordCount)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FlatMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;})&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;returns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;word&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;timeWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;reduce&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ReduceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;…and then get emitted back to Pulsar (used now as a sink), sending one’s computation results downstream, back to a Pulsar topic:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// emit result via Pulsar producer &lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlinkPulsarProducer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;outputTopic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;AuthenticationDisabled&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;toString&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBytes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;UTF_8&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Although this is a great first integration step, the existing design does not leverage the full power of Pulsar. In the integration with Flink 1.6.0, Pulsar is not utilized as durable storage and has no schema integration with Flink, so an application’s schema has to be described manually.&lt;/p&gt;

&lt;h1 id=&quot;pulsars-integration-with-flink-19-using-pulsar-as-a-flink-catalog&quot;&gt;Pulsar’s integration with Flink 1.9: Using Pulsar as a Flink catalog&lt;/h1&gt;

&lt;p&gt;The latest integration between &lt;a href=&quot;https://flink.apache.org/downloads.html#apache-flink-191&quot;&gt;Flink 1.9.0&lt;/a&gt; and Pulsar addresses most of the previously mentioned shortcomings. The &lt;a href=&quot;https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;contribution of Alibaba’s Blink to the Flink repository&lt;/a&gt; adds many enhancements and new features to the processing framework that make the integration with Pulsar significantly more powerful and impactful. Flink 1.9.0 brings Pulsar schema integration into the picture, makes the Table API a first-class citizen and provides an exactly-once streaming source and at-least-once streaming sink with Pulsar. Lastly, with schema integration, Pulsar can now be registered as a Flink catalog, making running Flink queries on top of Pulsar streams a matter of a few commands. In the following sections, we will take a closer look at the new integrations and provide examples of how to query Pulsar streams using Flink SQL.&lt;/p&gt;

&lt;h1 id=&quot;leveraging-the-flink--pulsar-schema-integration&quot;&gt;Leveraging the Flink &amp;lt;&amp;gt; Pulsar Schema Integration&lt;/h1&gt;

&lt;p&gt;Before delving into the integration details and how you can use Pulsar schema with Flink, let us describe how schema in Pulsar works. Schema in Apache Pulsar lives on the broker side of the framework, where it serves as the representation of the data, which makes an external schema registry unnecessary. Additionally, the data schema in Pulsar is associated with each topic, so both producers and consumers send data with predefined schema information, while the broker performs schema validation and manages schema multi-versioning and evolution through compatibility checks.&lt;/p&gt;

&lt;p&gt;Below you can find an example of Pulsar’s schema on both the producer and consumer side. On the producer side, you can specify which schema you want to use and Pulsar then sends a POJO class without the need to perform any serialization/deserialization. Similarly, on the consumer end, you can also specify the data schema and upon receiving the data, Pulsar will automatically validate the schema information, fetch the schema of the given version and then deserialize the data back to a POJO structure. Pulsar stores the schema information in the metadata of a Pulsar topic.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// Create producer with Struct schema and send messages&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Producer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;User&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;producer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newProducer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;AVRO&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;User&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;producer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newMessage&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;User&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;userName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;pulsar-user&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;userId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;send&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// Create consumer with Struct schema and receive messages&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Consumer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;User&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;consumer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newConsumer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;AVRO&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;User&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subscribe&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;consumer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;receive&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Let’s assume we have an application that specifies a schema to the producer and/or consumer. Upon receiving the schema information, the producer (or consumer) that is connected to the broker will transfer this information so that the broker can perform schema registration, validation and schema compatibility checks before returning or rejecting the schema, as illustrated in the diagram below:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/flink-pulsar-sql-blog-post-visual.png&quot; width=&quot;600px&quot; alt=&quot;Pulsar Schema&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Not only is Pulsar able to handle and store the schema information, it can also handle any schema evolution where necessary. Pulsar manages schema evolution in the broker, keeping track of all the different versions of your schema while performing any necessary compatibility checks.&lt;/p&gt;

&lt;p&gt;Moreover, when messages are published on the producer side, Pulsar will tag each message with the schema version as part of each message’s metadata. On the consumer side, when the message is received and the metadata is deserialized, Pulsar will check the schema version associated with this message and will fetch the corresponding schema information from the broker. As a result, when Pulsar integrates with a Flink application it uses the pre-existing schema information and maps individual messages with schema information to a different row in Flink’s type system.&lt;/p&gt;
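
&lt;p&gt;The version lookup described above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: &lt;code&gt;TopicSchemas&lt;/code&gt;, &lt;code&gt;TaggedMessage&lt;/code&gt; and &lt;code&gt;schemaFor&lt;/code&gt; are hypothetical names rather than part of the Pulsar client, schema definitions are reduced to strings, and versions are assumed to be assigned sequentially.&lt;/p&gt;

```java
// Illustration only: hypothetical names, not the Pulsar client API.
final class SchemaVersionSketch {

    /** Stand-in for the broker: keeps every schema version registered for one topic. */
    static final class TopicSchemas {
        private String[] definitions = new String[0];

        /** Stores a new schema definition and returns the version assigned to it. */
        int register(String definition) {
            String[] grown = java.util.Arrays.copyOf(definitions, definitions.length + 1);
            grown[definitions.length] = definition;
            definitions = grown;
            return definitions.length - 1;
        }

        /** Returns the definition stored under the given version. */
        String fetch(int version) {
            if (0 > version || version >= definitions.length) {
                throw new IllegalArgumentException("unknown schema version " + version);
            }
            return definitions[version];
        }
    }

    /** A message whose metadata carries the schema version it was written with. */
    static final class TaggedMessage {
        final int schemaVersion;
        final String payload;

        TaggedMessage(int schemaVersion, String payload) {
            this.schemaVersion = schemaVersion;
            this.payload = payload;
        }
    }

    /** Consumer side: read the version from the metadata, then ask the broker for that schema. */
    static String schemaFor(TopicSchemas broker, TaggedMessage message) {
        return broker.fetch(message.schemaVersion);
    }

    public static void main(String[] args) {
        TopicSchemas broker = new TopicSchemas();
        int v0 = broker.register("User(userName)");
        int v1 = broker.register("User(userName, userId)");

        // One message written before the schema evolved and one written after.
        TaggedMessage older = new TaggedMessage(v0, "pulsar-user");
        TaggedMessage newer = new TaggedMessage(v1, "pulsar-user,1");

        System.out.println(schemaFor(broker, older)); // User(userName)
        System.out.println(schemaFor(broker, newer)); // User(userName, userId)
    }
}
```

&lt;p&gt;Because every message carries its own version, messages written before and after a schema change can sit in the same topic and still be decoded with the schema they were produced with.&lt;/p&gt;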

&lt;p&gt;When Flink users do not interact with schema directly or make use of primitive schema (for example, using a topic to store a string or long number), Pulsar will convert the message payload into a Flink row, called ‘value’. For structured schema types, like JSON and AVRO, Pulsar will extract the individual fields from the schema information and map them to Flink’s type system. Finally, all metadata information associated with each message, such as the message key, topic, publish time, or event time, will be converted into metadata fields in a Flink row. Below we provide two examples of primitive schema and structured schema types and how these will be transformed from a Pulsar topic to Flink’s type system.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/flink-pulsar-sql-blog-post-visual-primitive-avro-schema.png&quot; width=&quot;600px&quot; alt=&quot;Primitive and AVRO Schema&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
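
&lt;p&gt;The two mappings can also be sketched in plain Java. This is a simplified illustration and not the connector’s implementation: &lt;code&gt;Row&lt;/code&gt;, &lt;code&gt;Metadata&lt;/code&gt; and the metadata field names used here (&lt;code&gt;key&lt;/code&gt;, &lt;code&gt;topic&lt;/code&gt;, &lt;code&gt;publishTime&lt;/code&gt;, &lt;code&gt;eventTime&lt;/code&gt;) are assumptions made for the example.&lt;/p&gt;

```java
// Illustration only: hypothetical names, not the Flink or Pulsar connector API.
final class RowMappingSketch {

    /** Minimal stand-in for a Flink row: parallel arrays of field names and values. */
    static final class Row {
        final String[] names;
        final Object[] values;

        Row(String[] names, Object[] values) {
            this.names = names;
            this.values = values;
        }

        /** Looks a field up by name. */
        Object get(String name) {
            for (int i = 0; i != names.length; i++) {
                if (names[i].equals(name)) {
                    return values[i];
                }
            }
            throw new IllegalArgumentException("no field named " + name);
        }
    }

    /** Per-message metadata mentioned in the text: key, topic, publish time, event time. */
    static final class Metadata {
        final String key;
        final String topic;
        final long publishTime;
        final long eventTime;

        Metadata(String key, String topic, long publishTime, long eventTime) {
            this.key = key;
            this.topic = topic;
            this.publishTime = publishTime;
            this.eventTime = eventTime;
        }
    }

    /** Primitive schema: the whole payload becomes a single field called "value". */
    static Row fromPrimitive(Object payload, Metadata meta) {
        return withMetadata(new String[] {"value"}, new Object[] {payload}, meta);
    }

    /** Structured schema (for example AVRO or JSON): each schema field becomes its own field. */
    static Row fromStruct(String[] fieldNames, Object[] fieldValues, Metadata meta) {
        return withMetadata(fieldNames, fieldValues, meta);
    }

    /** Appends the four metadata fields after the payload fields. */
    private static Row withMetadata(String[] names, Object[] values, Metadata meta) {
        int n = names.length;
        String[] allNames = java.util.Arrays.copyOf(names, n + 4);
        Object[] allValues = java.util.Arrays.copyOf(values, n + 4, Object[].class);
        allNames[n] = "key";
        allValues[n] = meta.key;
        allNames[n + 1] = "topic";
        allValues[n + 1] = meta.topic;
        allNames[n + 2] = "publishTime";
        allValues[n + 2] = meta.publishTime;
        allNames[n + 3] = "eventTime";
        allValues[n + 3] = meta.eventTime;
        return new Row(allNames, allValues);
    }

    public static void main(String[] args) {
        Metadata meta = new Metadata("k1", "test-source-topic", 1000L, 900L);

        Row primitive = fromPrimitive("hello", meta);
        Row struct = fromStruct(
            new String[] {"userName", "userId"},
            new Object[] {"pulsar-user", 1L},
            meta);

        System.out.println(primitive.get("value")); // hello
        System.out.println(struct.get("userId"));   // 1
        System.out.println(struct.get("topic"));    // test-source-topic
    }
}
```

&lt;p&gt;In both cases the metadata ends up as ordinary fields next to the payload fields, which is what lets a query filter or group on them like any other column.&lt;/p&gt;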

&lt;p&gt;Once all the schema information is mapped to Flink’s type system, you can start building a Pulsar source, sink or catalog in Flink based on the specified schema information as illustrated below:&lt;/p&gt;

&lt;h1 id=&quot;flink--pulsar-read-data-from-pulsar&quot;&gt;Flink &amp;amp; Pulsar: Read data from Pulsar&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;Create a Pulsar source for streaming queries&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;props&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Properties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;service.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;pulsar://...&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;admin.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;http://...&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;partitionDiscoveryIntervalMillis&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;5000&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;startingOffsets&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;earliest&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;test-source-topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;source&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;FlinkPulsarSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// you don&amp;#39;t need to provide type information to addSource since FlinkPulsarSource is ResultTypeQueryable&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;source&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// chain operations on dataStream of Row and sink the output&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// end method chaining&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;Register topics in Pulsar as streaming tables&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Properties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;service.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;admin.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adminUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;flushOnCheckpoint&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;failOnWrite&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;test-sink-topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Pulsar&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;properties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;inAppendMode&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerTableSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;sink-table&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;INSERT INTO sink-table .....&amp;quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sqlUpdate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h1 id=&quot;flink--pulsar-write-data-to-pulsar&quot;&gt;Flink &amp;amp; Pulsar: Write data to Pulsar&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;Create a Pulsar sink for streaming queries&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.....&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Properties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;service.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;admin.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adminUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;flushOnCheckpoint&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;failOnWrite&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;test-sink-topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;stream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;FlinkPulsarSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DummyTopicKeyExtractor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;Write a streaming table to Pulsar&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Properties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;service.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;admin.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adminUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;flushOnCheckpoint&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;failOnWrite&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;test-sink-topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Pulsar&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;properties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;inAppendMode&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerTableSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;sink-table&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;INSERT INTO sink-table .....&amp;quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sqlUpdate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In every case, Flink developers only need to specify the properties that tell Flink how to connect to a Pulsar cluster and then register the cluster as a source, sink or streaming table in Flink, without worrying about a schema registry or serialization/deserialization. Once all three elements are in place, Pulsar can be registered as a catalog in Flink, which drastically simplifies how you process and query data: for example, you can write a program that queries data from Pulsar, or use the Table API and SQL to query Pulsar data streams.&lt;/p&gt;

&lt;h1 id=&quot;next-steps--future-integration&quot;&gt;Next Steps &amp;amp; Future Integration&lt;/h1&gt;

&lt;p&gt;The goal of the integration between Pulsar and Flink is to simplify how developers use the two frameworks to build a unified data processing stack. As we move away from the classical Lambda architecture, where an online speed layer is combined with an offline batch layer to run data computations, Flink and Pulsar make a great combination. We see Flink as the unified computation engine, handling both online (streaming) and offline (batch) workloads, and Pulsar as the unified data storage layer; together they form a truly unified data processing stack that simplifies developer workloads.&lt;/p&gt;

&lt;p&gt;Both communities are still working to improve the integration. Ongoing efforts include a new source API (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface&quot;&gt;FLIP-27&lt;/a&gt;) that will allow the &lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discussion-Flink-Pulsar-Connector-td22019.html&quot;&gt;contribution of the Pulsar connectors to the Flink community&lt;/a&gt;, as well as a new subscription type in Pulsar called &lt;code&gt;Key_Shared&lt;/code&gt; that will allow efficient scaling of the source parallelism. Additional efforts focus on providing end-to-end, exactly-once guarantees (currently available only in the Pulsar source connector, not in the Pulsar sink connector) and on using Pulsar/BookKeeper as a Flink state backend.&lt;/p&gt;

&lt;p&gt;You can find a more detailed overview of the integration work between the two communities in this &lt;a href=&quot;https://youtu.be/3sBXXfgl5vs&quot;&gt;recording&lt;/a&gt; from Flink Forward Europe 2019, or subscribe to the &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink dev mailing list&lt;/a&gt; to follow the latest contribution and integration efforts between Flink and Pulsar.&lt;/p&gt;
</description>
<pubDate>Mon, 25 Nov 2019 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/11/25/query-pulsar-streams-using-apache-flink.html</link>
<guid isPermaLink="true">/news/2019/11/25/query-pulsar-streams-using-apache-flink.html</guid>
</item>

<item>
<title>Apache Flink 1.9.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.9 series.&lt;/p&gt;

&lt;p&gt;This release includes 96 fixes and minor improvements for Flink 1.9.0. The list below details all fixes and improvements.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.9.1.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11630&quot;&gt;FLINK-11630&lt;/a&gt;] -         TaskExecutor does not wait for Task termination when terminating itself
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13490&quot;&gt;FLINK-13490&lt;/a&gt;] -         Fix if one column value is null when reading JDBC, the following values are all null
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13941&quot;&gt;FLINK-13941&lt;/a&gt;] -         Prevent data-loss by not cleaning up small part files from S3.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12501&quot;&gt;FLINK-12501&lt;/a&gt;] -         AvroTypeSerializer does not work with types generated by avrohugger
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13386&quot;&gt;FLINK-13386&lt;/a&gt;] -         Fix some frictions in the new default Web UI
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13526&quot;&gt;FLINK-13526&lt;/a&gt;] -         Switching to a non existing catalog or database crashes sql-client
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13568&quot;&gt;FLINK-13568&lt;/a&gt;] -         DDL create table doesn&amp;#39;t allow STRING data type
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13805&quot;&gt;FLINK-13805&lt;/a&gt;] -         Bad Error Message when TaskManager is lost
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13806&quot;&gt;FLINK-13806&lt;/a&gt;] -         Metric Fetcher floods the JM log with errors when TM is lost
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14010&quot;&gt;FLINK-14010&lt;/a&gt;] -         Dispatcher &amp;amp; JobManagers don&amp;#39;t give up leadership when AM is shut down
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14145&quot;&gt;FLINK-14145&lt;/a&gt;] -         CompletedCheckpointStore#getLatestCheckpoint(true) returns wrong checkpoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13059&quot;&gt;FLINK-13059&lt;/a&gt;] -         Cassandra Connector leaks Semaphore on Exception and hangs on close
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13534&quot;&gt;FLINK-13534&lt;/a&gt;] -         Unable to query Hive table with decimal column
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13562&quot;&gt;FLINK-13562&lt;/a&gt;] -         Throws exception when FlinkRelMdColumnInterval meets two stage stream group aggregate
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13563&quot;&gt;FLINK-13563&lt;/a&gt;] -         TumblingGroupWindow should implement toString method
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13564&quot;&gt;FLINK-13564&lt;/a&gt;] -         Throw exception if constant with YEAR TO MONTH resolution was used for group windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13588&quot;&gt;FLINK-13588&lt;/a&gt;] -         StreamTask.handleAsyncException throws away the exception cause
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13653&quot;&gt;FLINK-13653&lt;/a&gt;] -         ResultStore should avoid using RowTypeInfo when creating a result
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13711&quot;&gt;FLINK-13711&lt;/a&gt;] -         Hive array values not properly displayed in SQL CLI
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13737&quot;&gt;FLINK-13737&lt;/a&gt;] -         flink-dist should add provided dependency on flink-examples-table
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13738&quot;&gt;FLINK-13738&lt;/a&gt;] -         Fix NegativeArraySizeException in LongHybridHashTable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13742&quot;&gt;FLINK-13742&lt;/a&gt;] -         Fix code generation when aggregation contains both distinct aggregate with and without filter
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13760&quot;&gt;FLINK-13760&lt;/a&gt;] -         Fix hardcode Scala version dependency in hive connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13761&quot;&gt;FLINK-13761&lt;/a&gt;] -         `SplitStream` should be deprecated because `SplitJavaStream` is deprecated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13789&quot;&gt;FLINK-13789&lt;/a&gt;] -         Transactional Id Generation fails due to user code impacting formatting string
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13823&quot;&gt;FLINK-13823&lt;/a&gt;] -         Incorrect debug log in CompileUtils
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13825&quot;&gt;FLINK-13825&lt;/a&gt;] -         The original plugins dir is not restored after e2e test run
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13831&quot;&gt;FLINK-13831&lt;/a&gt;] -         Free Slots / All Slots display error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13887&quot;&gt;FLINK-13887&lt;/a&gt;] -         Ensure defaultInputDependencyConstraint to be non-null when setting it in ExecutionConfig
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13897&quot;&gt;FLINK-13897&lt;/a&gt;] -         OSS FS NOTICE file is placed in wrong directory
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13933&quot;&gt;FLINK-13933&lt;/a&gt;] -         Hive Generic UDTF can not be used in table API both stream and batch mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13936&quot;&gt;FLINK-13936&lt;/a&gt;] -         NOTICE-binary is outdated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13966&quot;&gt;FLINK-13966&lt;/a&gt;] -         Jar sorting in collect_license_files.sh is locale dependent
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14009&quot;&gt;FLINK-14009&lt;/a&gt;] -         Cron jobs broken due to verifying incorrect NOTICE-binary file
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14049&quot;&gt;FLINK-14049&lt;/a&gt;] -         Update error message for failed partition updates to include task name
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14076&quot;&gt;FLINK-14076&lt;/a&gt;] -         &amp;#39;ClassNotFoundException: KafkaException&amp;#39; on Flink v1.9 w/ checkpointing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14107&quot;&gt;FLINK-14107&lt;/a&gt;] -         Kinesis consumer record emitter deadlock under event time alignment
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14119&quot;&gt;FLINK-14119&lt;/a&gt;] -         Clean idle state for RetractableTopNFunction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14139&quot;&gt;FLINK-14139&lt;/a&gt;] -         Fix potential memory leak of rest server when using session/standalone cluster
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14140&quot;&gt;FLINK-14140&lt;/a&gt;] -         The Flink Logo Displayed in Flink Python Shell is Broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14150&quot;&gt;FLINK-14150&lt;/a&gt;] -         Unnecessary __pycache__ directories appears in pyflink.zip
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14288&quot;&gt;FLINK-14288&lt;/a&gt;] -         Add Py4j NOTICE for source release
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13892&quot;&gt;FLINK-13892&lt;/a&gt;] -         HistoryServerTest failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14043&quot;&gt;FLINK-14043&lt;/a&gt;] -         SavepointMigrationTestBase is super slow
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12164&quot;&gt;FLINK-12164&lt;/a&gt;] -         JobMasterTest.testJobFailureWhenTaskExecutorHeartbeatTimeout is unstable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9900&quot;&gt;FLINK-9900&lt;/a&gt;] -         Fix unstable test ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13484&quot;&gt;FLINK-13484&lt;/a&gt;] -         ConnectedComponents end-to-end test instable with NoResourceAvailableException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13489&quot;&gt;FLINK-13489&lt;/a&gt;] -         Heavy deployment end-to-end test fails on Travis with TM heartbeat timeout
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13514&quot;&gt;FLINK-13514&lt;/a&gt;] -         StreamTaskTest.testAsyncCheckpointingConcurrentCloseAfterAcknowledge unstable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13530&quot;&gt;FLINK-13530&lt;/a&gt;] -         AbstractServerTest failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13585&quot;&gt;FLINK-13585&lt;/a&gt;] -         Fix sporadical deallock in TaskAsyncCallTest#testSetsUserCodeClassLoader()
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13599&quot;&gt;FLINK-13599&lt;/a&gt;] -         Kinesis end-to-end test failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13663&quot;&gt;FLINK-13663&lt;/a&gt;] -         SQL Client end-to-end test for modern Kafka failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13688&quot;&gt;FLINK-13688&lt;/a&gt;] -         HiveCatalogUseBlinkITCase.testBlinkUdf constantly failed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13739&quot;&gt;FLINK-13739&lt;/a&gt;] -         BinaryRowTest.testWriteString() fails in some environments
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13746&quot;&gt;FLINK-13746&lt;/a&gt;] -         Elasticsearch (v2.3.5) sink end-to-end test fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13769&quot;&gt;FLINK-13769&lt;/a&gt;] -         BatchFineGrainedRecoveryITCase.testProgram failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13807&quot;&gt;FLINK-13807&lt;/a&gt;] -         Flink-avro unit tests fails if the character encoding in the environment is not default to UTF-8
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13965&quot;&gt;FLINK-13965&lt;/a&gt;] -         Keep hasDeprecatedKeys and deprecatedKeys methods in ConfigOption and mark it with @Deprecated annotation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9941&quot;&gt;FLINK-9941&lt;/a&gt;] -         Flush in ScalaCsvOutputFormat before close method
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13336&quot;&gt;FLINK-13336&lt;/a&gt;] -         Remove the legacy batch fault tolerance page and redirect it to the new task failure recovery page
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13380&quot;&gt;FLINK-13380&lt;/a&gt;] -         Improve the usability of Flink session cluster on Kubernetes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13819&quot;&gt;FLINK-13819&lt;/a&gt;] -         Introduce RpcEndpoint State
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13845&quot;&gt;FLINK-13845&lt;/a&gt;] -         Drop all the content of removed &amp;quot;Checkpointed&amp;quot; interface
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13957&quot;&gt;FLINK-13957&lt;/a&gt;] -         Log dynamic properties on job submission
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13967&quot;&gt;FLINK-13967&lt;/a&gt;] -         Generate full binary licensing via collect_license_files.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13968&quot;&gt;FLINK-13968&lt;/a&gt;] -         Add travis check for the correctness of the binary licensing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13449&quot;&gt;FLINK-13449&lt;/a&gt;] -         Add ARM architecture to MemoryArchitecture
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Documentation
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13105&quot;&gt;FLINK-13105&lt;/a&gt;] -         Add documentation for blink planner&amp;#39;s built-in functions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13277&quot;&gt;FLINK-13277&lt;/a&gt;] -         add documentation of Hive source/sink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13354&quot;&gt;FLINK-13354&lt;/a&gt;] -         Add documentation for how to use blink planner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13355&quot;&gt;FLINK-13355&lt;/a&gt;] -         Add documentation for Temporal Table Join in blink planner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13356&quot;&gt;FLINK-13356&lt;/a&gt;] -         Add documentation for TopN and Deduplication in blink planner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13359&quot;&gt;FLINK-13359&lt;/a&gt;] -         Add documentation for DDL introduction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13362&quot;&gt;FLINK-13362&lt;/a&gt;] -         Add documentation for Kafka &amp;amp; ES &amp;amp; FileSystem DDL
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13363&quot;&gt;FLINK-13363&lt;/a&gt;] -         Add documentation for streaming aggregate performance tunning.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13706&quot;&gt;FLINK-13706&lt;/a&gt;] -         add documentation of how to use Hive functions in Flink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13942&quot;&gt;FLINK-13942&lt;/a&gt;] -         Add Overview page for Getting Started section
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13863&quot;&gt;FLINK-13863&lt;/a&gt;] -         Update Operations Playground to Flink 1.9.0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13937&quot;&gt;FLINK-13937&lt;/a&gt;] -         Fix wrong hive dependency version in documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13830&quot;&gt;FLINK-13830&lt;/a&gt;] -         The Document about Cluster on yarn have some problems
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14160&quot;&gt;FLINK-14160&lt;/a&gt;] -         Extend Operations Playground with --backpressure option
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13388&quot;&gt;FLINK-13388&lt;/a&gt;] -         Update UI screenshots in the documentation to the new default Web Frontend
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13415&quot;&gt;FLINK-13415&lt;/a&gt;] -         Document how to use hive connector in scala shell
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13517&quot;&gt;FLINK-13517&lt;/a&gt;] -         Restructure Hive Catalog documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13643&quot;&gt;FLINK-13643&lt;/a&gt;] -         Document the workaround for users with a different minor Hive version
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13757&quot;&gt;FLINK-13757&lt;/a&gt;] -         Fix wrong description of &quot;IS NOT TRUE&quot; function documentation
&lt;/li&gt;
&lt;/ul&gt;

</description>
<pubDate>Fri, 18 Oct 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2019/10/18/release-1.9.1.html</link>
<guid isPermaLink="true">/news/2019/10/18/release-1.9.1.html</guid>
</item>

<item>
<title>The State Processor API: How to Read, Write and Modify the State of Flink Applications</title>
<description>&lt;p&gt;Whether you are running Apache Flink&lt;sup&gt;Ⓡ&lt;/sup&gt; in production or evaluated Flink as a computation framework in the past, you’ve probably found yourself asking the question: How can I access, write or update state in a Flink savepoint? Ask no more! &lt;a href=&quot;https://flink.apache.org/news/2019/08/22/release-1.9.0.html&quot;&gt;Apache Flink 1.9.0&lt;/a&gt; introduces the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/libs/state_processor_api.html&quot;&gt;State Processor API&lt;/a&gt;, a powerful extension of the DataSet API that allows reading, writing and modifying state in Flink’s savepoints and checkpoints.&lt;/p&gt;

&lt;p&gt;In this post, we explain why this feature is a big step for Flink, what you can use it for, and how to use it. Finally, we will discuss the future of the State Processor API and how it aligns with our plans to evolve Flink into a system for &lt;a href=&quot;https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;unified batch and stream processing&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;stateful-stream-processing-with-apache-flink-until-flink-19&quot;&gt;Stateful Stream Processing with Apache Flink until Flink 1.9&lt;/h2&gt;

&lt;p&gt;All non-trivial stream processing applications are stateful and most of them are designed to run for months or years. Over time, many of them accumulate a lot of valuable state that can be very expensive or even impossible to rebuild if it gets lost due to a failure. In order to guarantee the consistency and durability of application state, Flink featured a sophisticated checkpointing and recovery mechanism from very early on. With every release, the Flink community has added more and more state-related features to improve checkpointing and recovery speed, the maintenance of applications, and practices to manage applications.&lt;/p&gt;

&lt;p&gt;However, a feature that was commonly requested by Flink users was the ability to access the state of an application “from the outside”. This request was motivated by the need to validate or debug the state of an application, to migrate the state of an application to another application, to evolve an application from the Heap State Backend to the RocksDB State Backend, or to import the initial state of an application from an external system like a relational database.&lt;/p&gt;

&lt;p&gt;Despite all those convincing reasons to expose application state externally, your access options have been fairly limited until now. Flink’s Queryable State feature only supports key-lookups (point queries) and does not guarantee the consistency of returned values (the value of a key might be different before and after an application recovered from a failure). Moreover, queryable state cannot be used to add or modify the state of an application. Also, savepoints, which are consistent snapshots of an application’s state, were not accessible because the application state is encoded with a custom binary format.&lt;/p&gt;

&lt;h2 id=&quot;reading-and-writing-application-state-with-the-state-processor-api&quot;&gt;Reading and Writing Application State with the State Processor API&lt;/h2&gt;

&lt;p&gt;The State Processor API that comes with Flink 1.9 is a true game-changer in how you can work with application state! In a nutshell, it extends the DataSet API with InputFormats and OutputFormats to read and write savepoint or checkpoint data. Due to the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/dev/table/common.html#integration-with-datastream-and-dataset-api&quot;&gt;interoperability of DataSet and Table API&lt;/a&gt;, you can even use relational Table API or SQL queries to analyze and process state data.&lt;/p&gt;

&lt;p&gt;For example, you can take a savepoint of a running stream processing application and analyze it with a DataSet batch program to verify that the application behaves correctly. Or you can read a batch of data from any store, preprocess it, and write the result to a savepoint that you use to bootstrap the state of a streaming application. It is now also possible to fix inconsistent state entries. Finally, the State Processor API opens up many ways to evolve a stateful application that were previously blocked by parameter and design choices that, once the application was started, could not be changed without losing all of its state. For example, you can now arbitrarily modify the data types of states, adjust the maximum parallelism of operators, split or merge operator state, re-assign operator UIDs, and so on.&lt;/p&gt;

&lt;h2 id=&quot;mapping-application-state-to-datasets&quot;&gt;Mapping Application State to DataSets&lt;/h2&gt;

&lt;p&gt;The State Processor API maps the state of a streaming application to one or more data sets that can be separately processed. In order to be able to use the API, you need to understand how this mapping works.&lt;/p&gt;

&lt;p&gt;But let’s first have a look at what a stateful Flink job looks like. A Flink job is composed of operators, typically one or more source operators, a few operators for the actual processing, and one or more sink operators. Each operator runs in parallel in one or more tasks and can work with different types of state. An operator can have zero, one, or more &lt;em&gt;“operator states”&lt;/em&gt; which are organized as lists that are scoped to the operator’s tasks. If the operator is applied on a keyed stream, it can also have zero, one, or more &lt;em&gt;“keyed states”&lt;/em&gt; which are scoped to a key that is extracted from each processed record. You can think of keyed state as a distributed key-value map.&lt;/p&gt;

&lt;p&gt;The following figure shows the application “MyApp” which consists of three operators called “Src”, “Proc”, and “Snk”. Src has one operator state (os1), Proc has one operator state (os2) and two keyed states (ks1, ks2) and Snk is stateless.&lt;/p&gt;

&lt;p style=&quot;display: block; text-align: center; margin-top: 20px; margin-bottom: 20px&quot;&gt;
	&lt;img src=&quot;/img/blog/2019-09-13-state-processor-api-blog/application-my-app-state-processor-api.png&quot; width=&quot;600px&quot; alt=&quot;Application: My App&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;A savepoint or checkpoint of MyApp consists of the data of all states, organized in a way that the states of each task can be restored. When processing the data of a savepoint (or checkpoint) with a batch job, we need a mental model that maps the data of the individual tasks’ states into data sets or tables. In fact, we can think of a savepoint as a database. Every operator (identified by its UID) represents a namespace. Each operator state of an operator is mapped to a dedicated table in the namespace with a single column that holds the state’s data of all tasks. All keyed states of an operator are mapped to a single table consisting of a column for the key, and one column for each keyed state. The following figure shows how a savepoint of MyApp is mapped to a database.&lt;/p&gt;

&lt;p style=&quot;display: block; text-align: center; margin-top: 20px; margin-bottom: 20px&quot;&gt;
	&lt;img src=&quot;/img/blog/2019-09-13-state-processor-api-blog/database-my-app-state-processor-api.png&quot; width=&quot;600px&quot; alt=&quot;Database: My App&quot; /&gt;
&lt;/p&gt;

&lt;p&gt;The figure shows how the values of Src’s operator state are mapped to a table with one column and five rows, one row for each list entry across all parallel tasks of Src. Operator state os2 of the operator “Proc” is similarly mapped to an individual table. The keyed states ks1 and ks2 are combined into a single table with three columns, one for the key, one for ks1 and one for ks2. The keyed table holds one row for each distinct key of both keyed states. Since the operator “Snk” does not have any state, its namespace is empty.&lt;/p&gt;
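
&lt;p&gt;To make this mental model concrete, the mapping can be sketched in a few lines of plain Python. This is an illustrative model only and does not use the State Processor API: the function names, the dictionary layout, and all state values are assumptions made up for this example. Only the operator and state names (Src, Proc, Snk, os1, os2, ks1, ks2) come from MyApp.&lt;/p&gt;

```python
# Illustrative model only: plain Python, not the Flink State Processor API.
# All function names, the dict layout, and the state values are hypothetical.


def operator_state_table(task_lists):
    """Flatten the per-task lists of one operator state into single-column rows."""
    return [(entry,) for entries in task_lists for entry in entries]


def keyed_state_table(keyed_states):
    """Combine all keyed states of one operator into a single table.

    keyed_states maps a state name to a dict of key to value. The result has
    one row per distinct key: (key, value in state 1, value in state 2, ...).
    None is used as a placeholder where a state holds no value for a key.
    """
    names = sorted(keyed_states)
    keys = sorted({key for state in keyed_states.values() for key in state})
    return [
        (key,) + tuple(keyed_states[name].get(key) for name in names)
        for key in keys
    ]


def savepoint_to_database(savepoint):
    """Map each operator UID to a namespace (dict) of tables (lists of rows)."""
    database = {}
    for uid, states in savepoint.items():
        tables = {}
        # Each operator state becomes its own single-column table.
        for name, task_lists in states.get("operator", {}).items():
            tables[name] = operator_state_table(task_lists)
        # All keyed states of the operator share one table.
        if states.get("keyed"):
            tables["keyed"] = keyed_state_table(states["keyed"])
        database[uid] = tables
    return database


# Hypothetical contents for MyApp; os1 is spread over three parallel tasks.
my_app = {
    "Src": {"operator": {"os1": [[1, 2], [3], [4, 5]]}},
    "Proc": {
        "operator": {"os2": [["a"], ["b", "c"]]},
        "keyed": {"ks1": {"k1": 10, "k2": 20}, "ks2": {"k2": "x", "k3": "y"}},
    },
    "Snk": {},
}

database = savepoint_to_database(my_app)
print(database["Src"]["os1"])
print(database["Proc"]["keyed"])
print(database["Snk"])
```

&lt;p&gt;In this sketch, os1 yields five single-column rows, one per list entry regardless of which task held it. The keyed table for Proc has one row per distinct key across ks1 and ks2, and the namespace for Snk stays empty. The None placeholder for missing values is a choice made for this sketch.&lt;/p&gt;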

&lt;p&gt;The State Processor API now offers methods to create, load, and write a savepoint. You can read a DataSet from a loaded savepoint or convert a DataSet into a state and add it to a savepoint. DataSets can be processed with the full feature set of the DataSet API. With these building blocks, all of the aforementioned use cases (and more) can be addressed. Please have a look at the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/libs/state_processor_api.html&quot;&gt;documentation&lt;/a&gt; if you’d like to learn how to use the State Processor API in detail.&lt;/p&gt;

&lt;h2 id=&quot;why-dataset-api&quot;&gt;Why DataSet API?&lt;/h2&gt;

&lt;p&gt;If you are familiar with &lt;a href=&quot;https://flink.apache.org/roadmap.html&quot;&gt;Flink’s roadmap&lt;/a&gt;, you might be surprised that the State Processor API is based on the DataSet API. The Flink community plans to extend the DataStream API with the concept of &lt;em&gt;BoundedStreams&lt;/em&gt; and deprecate the DataSet API. When designing this feature, we also evaluated the DataStream API and the Table API, but neither could provide the right feature set yet. Since we didn’t want to block this feature on the progress of Flink’s APIs, we decided to build it on the DataSet API, but kept its dependencies on the DataSet API to a minimum. Hence, migrating it to another API should be fairly easy.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;Flink users have requested a feature to access and modify the state of streaming applications from the outside for a long time. With the State Processor API, Flink 1.9.0 finally exposes application state as a data format that can be manipulated. This feature opens up many new possibilities for how users can maintain and manage Flink streaming applications, including arbitrary evolution of stream applications and exporting and bootstrapping of application state. To put it concisely, the State Processor API unlocks the black box that savepoints used to be.&lt;/p&gt;
</description>
<pubDate>Fri, 13 Sep 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/feature/2019/09/13/state-processor-api.html</link>
<guid isPermaLink="true">/feature/2019/09/13/state-processor-api.html</guid>
</item>

<item>
<title>Apache Flink 1.8.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released the second bugfix version of the Apache Flink 1.8 series.&lt;/p&gt;

&lt;p&gt;This release includes 23 fixes and minor improvements for Flink 1.8.1. The list below details all fixes and improvements.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.8.2.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;Bug&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13941&quot;&gt;FLINK-13941&lt;/a&gt;] -         Prevent data-loss by not cleaning up small part files from S3.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9526&quot;&gt;FLINK-9526&lt;/a&gt;] -         BucketingSink end-to-end test failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10368&quot;&gt;FLINK-10368&lt;/a&gt;] -         &amp;#39;Kerberized YARN on Docker test&amp;#39; unstable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12319&quot;&gt;FLINK-12319&lt;/a&gt;] -         StackOverFlowError in cep.nfa.sharedbuffer.SharedBuffer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12736&quot;&gt;FLINK-12736&lt;/a&gt;] -         ResourceManager may release TM with allocated slots
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12889&quot;&gt;FLINK-12889&lt;/a&gt;] -         Job keeps in FAILING state
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13059&quot;&gt;FLINK-13059&lt;/a&gt;] -         Cassandra Connector leaks Semaphore on Exception; hangs on close
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13159&quot;&gt;FLINK-13159&lt;/a&gt;] -         java.lang.ClassNotFoundException when restore job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13367&quot;&gt;FLINK-13367&lt;/a&gt;] -         Make ClosureCleaner detect writeReplace serialization override
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13369&quot;&gt;FLINK-13369&lt;/a&gt;] -         Recursive closure cleaner ends up with stackOverflow in case of circular dependency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13394&quot;&gt;FLINK-13394&lt;/a&gt;] -         Use fallback unsafe secure MapR in nightly.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13484&quot;&gt;FLINK-13484&lt;/a&gt;] -         ConnectedComponents end-to-end test instable with NoResourceAvailableException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13499&quot;&gt;FLINK-13499&lt;/a&gt;] -         Remove dependency on MapR artifact repository
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13508&quot;&gt;FLINK-13508&lt;/a&gt;] -         CommonTestUtils#waitUntilCondition() may attempt to sleep with negative time
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13586&quot;&gt;FLINK-13586&lt;/a&gt;] -         Method ClosureCleaner.clean broke backward compatibility between 1.8.0 and 1.8.1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13761&quot;&gt;FLINK-13761&lt;/a&gt;] -         `SplitStream` should be deprecated because `SplitJavaStream` is deprecated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13789&quot;&gt;FLINK-13789&lt;/a&gt;] -         Transactional Id Generation fails due to user code impacting formatting string
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13806&quot;&gt;FLINK-13806&lt;/a&gt;] -         Metric Fetcher floods the JM log with errors when TM is lost
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13807&quot;&gt;FLINK-13807&lt;/a&gt;] -         Flink-avro unit tests fails if the character encoding in the environment is not default to UTF-8
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13897&quot;&gt;FLINK-13897&lt;/a&gt;] -         OSS FS NOTICE file is placed in wrong directory
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Improvement&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12578&quot;&gt;FLINK-12578&lt;/a&gt;] -         Use secure URLs for Maven repositories
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12741&quot;&gt;FLINK-12741&lt;/a&gt;] -         Update docs about Kafka producer fault tolerance guarantees
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12749&quot;&gt;FLINK-12749&lt;/a&gt;] -         Add Flink Operations Playground documentation
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 11 Sep 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2019/09/11/release-1.8.2.html</link>
<guid isPermaLink="true">/news/2019/09/11/release-1.8.2.html</guid>
</item>

<item>
<title>Flink Community Update - September&#39;19</title>
<description>&lt;p&gt;This has been an exciting, fast-paced year for the Apache Flink community. But with over 10k messages across the mailing lists, 3k Jira tickets and 2k pull requests, it is not easy to keep up with the latest state of the project, let alone everything happening around it. With that in mind, we want to bring back regular community updates to the Flink blog.&lt;/p&gt;

&lt;p&gt;The first post in the series takes you on a little detour across the year, to freshen up and make sure you’re all up to date.&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#the-year-so-far-in-flink&quot; id=&quot;markdown-toc-the-year-so-far-in-flink&quot;&gt;The Year (so far) in Flink&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#integration-of-the-chinese-speaking-community&quot; id=&quot;markdown-toc-integration-of-the-chinese-speaking-community&quot;&gt;Integration of the Chinese-speaking community&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#improving-flinks-documentation&quot; id=&quot;markdown-toc-improving-flinks-documentation&quot;&gt;Improving Flink’s Documentation&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#adjusting-the-contribution-process-and-experience&quot; id=&quot;markdown-toc-adjusting-the-contribution-process-and-experience&quot;&gt;Adjusting the Contribution Process and Experience&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#new-committers-and-pmc-members&quot; id=&quot;markdown-toc-new-committers-and-pmc-members&quot;&gt;New Committers and PMC Members&lt;/a&gt;        &lt;ul&gt;
          &lt;li&gt;&lt;a href=&quot;#new-pmc-members&quot; id=&quot;markdown-toc-new-pmc-members&quot;&gt;New PMC Members&lt;/a&gt;&lt;/li&gt;
          &lt;li&gt;&lt;a href=&quot;#new-committers&quot; id=&quot;markdown-toc-new-committers&quot;&gt;New Committers&lt;/a&gt;&lt;/li&gt;
        &lt;/ul&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#the-bigger-picture&quot; id=&quot;markdown-toc-the-bigger-picture&quot;&gt;The Bigger Picture&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#upcoming-events&quot; id=&quot;markdown-toc-upcoming-events&quot;&gt;Upcoming Events&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#north-america&quot; id=&quot;markdown-toc-north-america&quot;&gt;North America&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#europe&quot; id=&quot;markdown-toc-europe&quot;&gt;Europe&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#asia&quot; id=&quot;markdown-toc-asia&quot;&gt;Asia&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h1 id=&quot;the-year-so-far-in-flink&quot;&gt;The Year (so far) in Flink&lt;/h1&gt;

&lt;p&gt;Two major versions were released this year: &lt;a href=&quot;https://flink.apache.org/news/2019/04/09/release-1.8.0.html&quot;&gt;Flink 1.8&lt;/a&gt; and &lt;a href=&quot;https://flink.apache.org/news/2019/08/22/release-1.9.0.html&quot;&gt;Flink 1.9&lt;/a&gt;, paving the way for the goal of making Flink the first framework to seamlessly support stream and batch processing with a single, unified runtime. The &lt;a href=&quot;https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;contribution of Blink&lt;/a&gt; to Apache Flink was key in accelerating the path to this vision and reduced the waiting time for long-pending user requests, such as Hive integration, (better) Python support, the rework of Flink’s Machine Learning library and fine-grained failure recovery (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures&quot;&gt;FLIP-1&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The 1.9 release was the result of the &lt;strong&gt;biggest community effort the project has experienced so far&lt;/strong&gt;, with the number of contributors soaring to 190 (see &lt;a href=&quot;#the-bigger-picture&quot;&gt;The Bigger Picture&lt;/a&gt;). For a quick overview of the upcoming work for Flink 1.10 (and beyond), have a look at the updated &lt;a href=&quot;https://flink.apache.org/roadmap.html&quot;&gt;roadmap&lt;/a&gt;!&lt;/p&gt;

&lt;h2 id=&quot;integration-of-the-chinese-speaking-community&quot;&gt;Integration of the Chinese-speaking community&lt;/h2&gt;

&lt;p&gt;As the number of Chinese-speaking Flink users rapidly grows, the community is working on translating resources and creating dedicated spaces for discussion to invite and include these users in the wider Flink community. Part of the ongoing work is described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-35%3A+Support+Chinese+Documents+and+Website&quot;&gt;FLIP-35&lt;/a&gt; and has resulted in:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A new user mailing list (user-zh@f.a.o) dedicated to Chinese-speakers.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;A Chinese translation of the Apache Flink &lt;a href=&quot;https://flink.apache.org/zh/&quot;&gt;website&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/zh/&quot;&gt;documentation&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Multiple meetups organized all over China, with the biggest one reaching a whopping number of 500+ participants. Some of these meetups were also organized in collaboration with communities from other projects, like Apache Pulsar and Apache Kafka.&lt;/li&gt;
&lt;/ul&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-09-05-flink-community-update/2019-09-05-flink-community-update_3.png&quot; width=&quot;800px&quot; alt=&quot;China Meetup&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;If you’re interested in knowing more about this work in progress, Robert Metzger and Fabian Hueske will be diving into “Inviting Apache Flink’s Chinese User Community” at the upcoming ApacheCon Europe 2019 (see &lt;a href=&quot;#upcoming-events&quot;&gt;Upcoming Events&lt;/a&gt;).&lt;/p&gt;

&lt;h2 id=&quot;improving-flinks-documentation&quot;&gt;Improving Flink’s Documentation&lt;/h2&gt;

&lt;p&gt;Besides the translation effort, the community has also been working quite hard on a &lt;strong&gt;Flink docs overhaul&lt;/strong&gt;. The main goals are to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Organize and clean up the structure of the docs;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Align the content with the overall direction of the project;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Improve the &lt;em&gt;getting-started&lt;/em&gt; material and make the content more accessible to different levels of Flink experience.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Given that there has been some confusion in the past caused by unclear definitions of core Flink concepts, one of the first completed efforts was to introduce a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/concepts/glossary.html#glossary&quot;&gt;Glossary&lt;/a&gt; in the docs. To get up to speed with the roadmap for the remaining efforts, you can refer to &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-42%3A+Rework+Flink+Documentation&quot;&gt;FLIP-42&lt;/a&gt; and the corresponding &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12639&quot;&gt;umbrella Jira ticket&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;adjusting-the-contribution-process-and-experience&quot;&gt;Adjusting the Contribution Process and Experience&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://flink.apache.org/contributing/how-to-contribute.html&quot;&gt;guidelines&lt;/a&gt; to contribute to Apache Flink have been reworked on the website, in an effort to lower the entry barrier for new contributors and reduce the overall friction in the contribution process. In addition, the Flink community discussed and adopted &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026&quot;&gt;bylaws&lt;/a&gt; to help the community collaborate and coordinate more smoothly.&lt;/p&gt;

&lt;p&gt;For code contributors, a &lt;a href=&quot;https://flink.apache.org/contributing/code-style-and-quality-preamble.html&quot;&gt;Code Style and Quality Guide&lt;/a&gt; that captures the expected standards for contributions was also added to the “Contributing” section of the Flink website.&lt;/p&gt;

&lt;p&gt;It’s important to stress that &lt;strong&gt;contributions are not restricted to code&lt;/strong&gt;. Non-code contributions such as mailing list support, documentation work or organization of community events are equally important to the development of the project and highly encouraged.&lt;/p&gt;

&lt;h2 id=&quot;new-committers-and-pmc-members&quot;&gt;New Committers and PMC Members&lt;/h2&gt;

&lt;p&gt;The Apache Flink community has welcomed &lt;strong&gt;5 new Committers&lt;/strong&gt; and &lt;strong&gt;4 PMC (Project Management Committee) Members&lt;/strong&gt; in 2019, so far:&lt;/p&gt;

&lt;h3 id=&quot;new-pmc-members&quot;&gt;New PMC Members&lt;/h3&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;Jincheng Sun, Kete (Kurt) Young, Kostas Kloudas, Thomas Weise
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id=&quot;new-committers&quot;&gt;New Committers&lt;/h3&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;Andrey Zagrebin, Hequn, Jiangjie (Becket) Qin, Rong Rong, Zhijiang Wang
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Congratulations and thank you for your hardworking commitment to Flink!&lt;/p&gt;

&lt;h1 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture&lt;/h1&gt;

&lt;p&gt;Flink continues to push the boundaries of (stream) data processing, and the community is proud to see an ever-increasingly diverse set of contributors, users and technologies join the ecosystem.&lt;/p&gt;

&lt;p&gt;In the timeframe of three releases, the project jumped from &lt;strong&gt;112 to 190 contributors&lt;/strong&gt;, also doubling down on the number of requested changes and improvements. To top it off, the Flink GitHub repository recently reached the milestone of &lt;strong&gt;10k stars&lt;/strong&gt;, all the way up from the incubation days in 2014.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-09-05-flink-community-update/2019-09-05-flink-community-update_1.png&quot; width=&quot;1000px&quot; alt=&quot;GitHub&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The activity across the user@ and dev@&lt;sup&gt;1&lt;/sup&gt; mailing lists shows a healthy heartbeat, and the gradual ramp up of user-zh@ suggests that this was a well-received community effort. Looking at the numbers for the same period in 2018, the dev@ mailing list has seen the biggest surge in activity, with an average growth of &lt;strong&gt;2.5x in the number of messages and distinct users&lt;/strong&gt; — a great reflection of the hyperactive pace of development of the Flink codebase.&lt;/p&gt;

&lt;p&gt;&lt;img style=&quot;float: right;&quot; src=&quot;/img/blog/2019-09-05-flink-community-update/2019-09-05-flink-community-update_2.png&quot; width=&quot;420px&quot; alt=&quot;Mailing Lists&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In support of these observations, the report for the financial year of 2019 from the Apache Software Foundation (ASF) features Flink as one of the most thriving open source projects, with mentions for:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Most Active Visits and Downloads&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Most Active Sources: Visits&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Most Active Sources: Clones&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Top Repositories by Number of Commits&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Top Most Active Apache Mailing Lists (user@ and dev@)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hats off to our fellows at Apache Beam for an astounding year, too! For more detailed insights, check the &lt;a href=&quot;https://s3.amazonaws.com/files-dist/AnnualReports/FY2018%20Annual%20Report.pdf&quot;&gt;full report&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup&gt;1. Excluding messages from “jira@apache.org”.&lt;/sup&gt;&lt;/p&gt;

&lt;h1 id=&quot;upcoming-events&quot;&gt;Upcoming Events&lt;/h1&gt;

&lt;p&gt;As the conference and meetup season ramps up again, here are some events to watch for talks about Flink and opportunities to mingle with the wider stream processing community.&lt;/p&gt;

&lt;h3 id=&quot;north-america&quot;&gt;North America&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;[Conference] &lt;strong&gt;&lt;a href=&quot;https://conferences.oreilly.com/strata/strata-ny&quot;&gt;Strata Data Conference 2019&lt;/a&gt;&lt;/strong&gt;, September 23-26, New York, USA
    &lt;p&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;[Meetup] &lt;strong&gt;&lt;a href=&quot;https://www.meetup.com/Bay-Area-Apache-Flink-Meetup/events/262680261/&quot;&gt;Apache Flink Bay Area Meetup&lt;/a&gt;&lt;/strong&gt;, September 24, San Francisco, USA
    &lt;p&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;[Conference] &lt;strong&gt;&lt;a href=&quot;https://www.meetup.com/Bay-Area-Apache-Flink-Meetup/events/262680261/&quot;&gt;Scale By The Bay 2019&lt;/a&gt;&lt;/strong&gt;, November 13-15, San Francisco, USA&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;europe&quot;&gt;Europe&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;[Meetup] &lt;strong&gt;&lt;a href=&quot;https://www.meetup.com/Apache-Flink-London-Meetup/events/264123672&quot;&gt;Apache Flink London Meetup&lt;/a&gt;&lt;/strong&gt;, September 23, London, UK
    &lt;p&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;[Conference] &lt;strong&gt;&lt;a href=&quot;https://europe-2019.flink-forward.org&quot;&gt;Flink Forward Europe 2019&lt;/a&gt;&lt;/strong&gt;, October 7-9, Berlin, Germany
    &lt;p&gt;&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;The next edition of Flink Forward Europe is around the corner and the &lt;a href=&quot;https://europe-2019.flink-forward.org/conference-program&quot;&gt;program&lt;/a&gt; has been announced, featuring 70+ talks as well as panel discussions and interactive “Ask Me Anything” sessions with core Flink committers. If you’re looking to learn more about Flink and share your experience with other community members, there really is &lt;a href=&quot;https://vimeo.com/296403091&quot;&gt;no better place&lt;/a&gt; than Flink Forward!&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; if you are a &lt;strong&gt;committer for any Apache project&lt;/strong&gt;, you can &lt;strong&gt;get a free ticket&lt;/strong&gt; by registering with your Apache email address and using the discount code: &lt;em&gt;FFEU19-ApacheCommitter&lt;/em&gt;.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;[Conference] &lt;strong&gt;&lt;a href=&quot;https://aceu19.apachecon.com/&quot;&gt;ApacheCon Berlin 2019&lt;/a&gt;&lt;/strong&gt;, October 22-24, Berlin, Germany&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;[Conference] &lt;strong&gt;&lt;a href=&quot;https://www.data2day.de/&quot;&gt;Data2Day 2019&lt;/a&gt;&lt;/strong&gt;, October 22-24, Ludwigshafen, Germany&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;[Conference] &lt;strong&gt;&lt;a href=&quot;https://bigdatatechwarsaw.eu&quot;&gt;Big Data Tech Warsaw 2020&lt;/a&gt;&lt;/strong&gt;, February 7, Warsaw, Poland
    &lt;p&gt;&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;The Call For Presentations (CFP) is now &lt;a href=&quot;https://bigdatatechwarsaw.eu/cfp/&quot;&gt;open&lt;/a&gt;.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;asia&quot;&gt;Asia&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;[Conference] &lt;strong&gt;&lt;a href=&quot;https://m.aliyun.com/markets/aliyun/developer/ffa2019&quot;&gt;Flink Forward Asia 2019&lt;/a&gt;&lt;/strong&gt;, November 28-30, Beijing, China
    &lt;p&gt;&lt;/p&gt;
    &lt;ul&gt;
      &lt;li&gt;The second edition of Flink Forward Asia is also happening later this year, in Beijing, and the CFP is &lt;a href=&quot;https://developer.aliyun.com/special/ffa2019&quot;&gt;open&lt;/a&gt; until September 20.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’d like to keep a closer eye on what’s happening in the community, subscribe to the &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;community mailing list&lt;/a&gt; to get fine-grained weekly updates, upcoming event announcements and more. Also, please reach out if you’re interested in organizing or being part of Flink events in your area!&lt;/p&gt;
</description>
<pubDate>Tue, 10 Sep 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2019/09/10/community-update.html</link>
<guid isPermaLink="true">/news/2019/09/10/community-update.html</guid>
</item>

<item>
<title>Apache Flink 1.9.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is proud to announce the release of Apache Flink
1.9.0.&lt;/p&gt;

&lt;p&gt;The Apache Flink project’s goal is to develop a stream processing system to
unify and power many forms of real-time and offline data processing
applications as well as event-driven applications. In this release, we have
made a huge step forward in that effort, by integrating Flink’s stream and
batch processing capabilities under a single, unified runtime.&lt;/p&gt;

&lt;p&gt;Significant features on this path are batch-style recovery for batch jobs and
a preview of the new Blink-based query engine for Table API and SQL queries.
We are also excited to announce the availability of the State Processor API,
which is one of the most frequently requested features and enables users to
read and write savepoints with Flink DataSet jobs. Finally, Flink 1.9 includes
a reworked WebUI and previews of Flink’s new Python Table API and its
integration with the Apache Hive ecosystem.&lt;/p&gt;

&lt;p&gt;This blog post describes all major new features and improvements, important
changes to be aware of and what to expect moving forward. For more details,
check the &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12344601&quot;&gt;complete release
changelog&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The binary distribution and source artifacts for this release are now
available via the &lt;a href=&quot;https://flink.apache.org/downloads.html&quot;&gt;Downloads&lt;/a&gt; page of
the Flink project, along with the updated
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/&quot;&gt;documentation&lt;/a&gt;.
Flink 1.9 is API-compatible with previous 1.x releases for APIs annotated with
the &lt;code&gt;@Public&lt;/code&gt; annotation.&lt;/p&gt;

&lt;p&gt;Please feel encouraged to download the release and share your thoughts with
the community through the Flink &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing
lists&lt;/a&gt; or
&lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;JIRA&lt;/a&gt;. As always,
feedback is very much appreciated!&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#fine-grained-batch-recovery-flip-1&quot; id=&quot;markdown-toc-fine-grained-batch-recovery-flip-1&quot;&gt;Fine-grained Batch Recovery (FLIP-1)&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#state-processor-api-flip-43&quot; id=&quot;markdown-toc-state-processor-api-flip-43&quot;&gt;State Processor API (FLIP-43)&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#stop-with-savepoint-flip-34&quot; id=&quot;markdown-toc-stop-with-savepoint-flip-34&quot;&gt;Stop-with-Savepoint (FLIP-34)&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#flink-webui-rework&quot; id=&quot;markdown-toc-flink-webui-rework&quot;&gt;Flink WebUI Rework&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#preview-of-the-new-blink-sql-query-processor&quot; id=&quot;markdown-toc-preview-of-the-new-blink-sql-query-processor&quot;&gt;Preview of the new Blink SQL Query Processor&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#preview-of-full-hive-integration-flink-10556&quot; id=&quot;markdown-toc-preview-of-full-hive-integration-flink-10556&quot;&gt;Preview of Full Hive Integration (FLINK-10556)&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#preview-of-the-new-python-table-api-flip-38&quot; id=&quot;markdown-toc-preview-of-the-new-python-table-api-flip-38&quot;&gt;Preview of the new Python Table API (FLIP-38)&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#important-changes&quot; id=&quot;markdown-toc-important-changes&quot;&gt;Important Changes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#release-notes&quot; id=&quot;markdown-toc-release-notes&quot;&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;

&lt;h3 id=&quot;fine-grained-batch-recovery-flip-1&quot;&gt;Fine-grained Batch Recovery (FLIP-1)&lt;/h3&gt;

&lt;p&gt;The time to recover a batch (DataSet, Table API and SQL) job from a task
failure was significantly reduced. Until Flink 1.9, task failures in batch
jobs were recovered by canceling all tasks and restarting the whole job, i.e.,
the job was started from scratch and all progress was voided. With this
release, Flink can be configured to limit the recovery to only those tasks
that are in the same &lt;strong&gt;failover region&lt;/strong&gt;. A failover region is the set of
tasks that are connected via pipelined data exchanges. Hence, the
batch-shuffle connections of a job define the boundaries of its failover
regions. More details are available in
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures&quot;&gt;FLIP-1&lt;/a&gt;.
&lt;img src=&quot;/img/blog/release-19-flip1.png&quot; alt=&quot;Fine-grained Batch Recovery&quot; title=&quot;Fine-grained Batch Recovery&quot; /&gt;&lt;/p&gt;

&lt;p&gt;To use this new failover strategy, apply the following
settings:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Make sure you have the entry &lt;code&gt;jobmanager.execution.failover-strategy:
region&lt;/code&gt; in your &lt;code&gt;flink-conf.yaml&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The configuration of the 1.9 distribution has that entry by default,
  but when reusing a configuration file from previous setups, you have to add
  it manually.&lt;/p&gt;

&lt;p&gt;Moreover, you need to set the &lt;code&gt;ExecutionMode&lt;/code&gt; of batch jobs in the
&lt;code&gt;ExecutionConfig&lt;/code&gt; to &lt;code&gt;BATCH&lt;/code&gt; so that data shuffles are not pipelined
and jobs have more than one failover region.&lt;/p&gt;
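
&lt;p&gt;As a minimal sketch of the &lt;code&gt;ExecutionMode&lt;/code&gt; setting described above, a DataSet program could configure it as follows. The class name &lt;code&gt;BatchFailoverExample&lt;/code&gt; is only illustrative, and the actual job definition is left out:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;import org.apache.flink.api.common.ExecutionMode;
import org.apache.flink.api.java.ExecutionEnvironment;

// Illustrative class name; not part of Flink.
public class BatchFailoverExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Use blocking (non-pipelined) data exchanges so that the job
        // can be split into more than one failover region.
        env.getConfig().setExecutionMode(ExecutionMode.BATCH);

        // ... define sources, transformations and sinks here,
        // then call env.execute() to run the job.
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The region failover strategy itself is still selected in &lt;code&gt;flink-conf.yaml&lt;/code&gt;; the code above only changes how data is exchanged between tasks.&lt;/p&gt;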

&lt;p&gt;The “Region” failover strategy also improves the recovery of “embarrassingly
parallel” streaming jobs, i.e., jobs without any shuffle like keyBy() or
rebalance. When such a job is recovered, only the tasks of the affected
pipeline (failover region) are restarted. For all other streaming jobs, the
recovery behavior is the same as in prior Flink versions.&lt;/p&gt;

&lt;h3 id=&quot;state-processor-api-flip-43&quot;&gt;State Processor API (FLIP-43)&lt;/h3&gt;

&lt;p&gt;Up to Flink 1.9, accessing the state of a job from the outside was limited to
the (still) experimental &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html&quot;&gt;Queryable
State&lt;/a&gt;.
This release introduces a new, powerful library to read, write and modify
state snapshots using the batch DataSet API. In practice, this means:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Flink job state can be bootstrapped by reading data from external systems,
such as external databases, and converting it into a savepoint.&lt;/li&gt;
  &lt;li&gt;State in savepoints can be queried using any of Flink’s batch APIs
(DataSet, Table, SQL), for example to analyze relevant state patterns or
check for discrepancies in state that can support application auditing or
troubleshooting.&lt;/li&gt;
  &lt;li&gt;The schema of state in savepoints can be migrated offline, compared to the
previous approach requiring online migration on schema access.&lt;/li&gt;
  &lt;li&gt;Invalid data in savepoints can be identified and corrected.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The new State Processor API covers all variations of snapshots: savepoints,
full checkpoints and incremental checkpoints. More details are available in
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-43%3A+State+Processor+API&quot;&gt;FLIP-43&lt;/a&gt;.&lt;/p&gt;
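
&lt;p&gt;To give a rough idea of what reading state looks like, the following sketch loads a savepoint and reads one operator&#39;s list state as a DataSet. The savepoint path, the operator uid &lt;code&gt;my-uid&lt;/code&gt;, the state name &lt;code&gt;list-state&lt;/code&gt; and the class name are placeholders, and the state backend must be chosen to match your own job. It assumes the State Processor API library is on the classpath:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;

// Illustrative class name; not part of Flink.
public class ReadSavepointExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment bEnv = ExecutionEnvironment.getExecutionEnvironment();

        // Placeholder path: point this at an existing savepoint.
        ExistingSavepoint savepoint =
            Savepoint.load(bEnv, &quot;hdfs:///path/to/savepoint&quot;, new MemoryStateBackend());

        // Placeholder uid and state name: they must match the operator
        // and the state descriptor used by the original streaming job.
        DataSet&amp;lt;Integer&amp;gt; listState =
            savepoint.readListState(&quot;my-uid&quot;, &quot;list-state&quot;, Types.INT);

        // From here on, the state is an ordinary DataSet.
        listState.print();
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;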

&lt;h3 id=&quot;stop-with-savepoint-flip-34&quot;&gt;Stop-with-Savepoint (FLIP-34)&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#operations&quot;&gt;Cancelling with a
savepoint&lt;/a&gt;
is a common operation for stopping/restarting, forking or updating Flink jobs.
However, the existing implementation did not guarantee output persistence to
external storage systems for exactly-once sinks. To improve the end-to-end
semantics when stopping a job, Flink 1.9 introduces a new &lt;code&gt;SUSPEND&lt;/code&gt; mode to
stop a job with a savepoint that is consistent with the emitted data.
You can suspend a job with Flink’s CLI client as follows:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;bin/flink stop -p [:targetDirectory] :jobId
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The final job state is set to &lt;code&gt;FINISHED&lt;/code&gt; on success, allowing
users to detect failures of the requested operation.&lt;/p&gt;

&lt;p&gt;More details are available in
&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103090212&quot;&gt;FLIP-34&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;flink-webui-rework&quot;&gt;Flink WebUI Rework&lt;/h3&gt;

&lt;p&gt;After a
&lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Change-underlying-Frontend-Architecture-for-Flink-Web-Dashboard-td24902.html&quot;&gt;discussion&lt;/a&gt;
about modernizing the internals of Flink’s WebUI, this component was
reconstructed using the latest stable version of Angular — basically, a bump
from Angular 1.x to 7.x. The redesigned version is the default in 1.9.0,
but there is a link to switch back to the old WebUI.&lt;/p&gt;

&lt;div class=&quot;row&quot;&gt; &lt;div class=&quot;col-sm-6&quot;&gt; &lt;span&gt;&lt;img class=&quot;thumbnail&quot; src=&quot;/img/blog/release-19-web1.png&quot; /&gt;&lt;/span&gt; &lt;/div&gt; &lt;div class=&quot;col-sm-6&quot;&gt; &lt;span&gt;&lt;img class=&quot;thumbnail&quot; src=&quot;/img/blog/release-19-web2.png&quot; /&gt;&lt;/span&gt; &lt;/div&gt;
    &lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Moving forward, feature parity for the old version of the WebUI 
will not be guaranteed.&lt;/p&gt;

&lt;h3 id=&quot;preview-of-the-new-blink-sql-query-processor&quot;&gt;Preview of the new Blink SQL Query Processor&lt;/h3&gt;

&lt;p&gt;Following the &lt;a href=&quot;/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;donation of
Blink&lt;/a&gt; to
Apache Flink, the community worked on integrating Blink’s query optimizer and
runtime for the Table API and SQL. As a first step, we refactored the
monolithic &lt;code&gt;flink-table&lt;/code&gt; module into smaller modules
(&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions&quot;&gt;FLIP-32&lt;/a&gt;).
This resulted in a clear separation of and well-defined interfaces between the
Java and Scala API modules and the optimizer and runtime modules.&lt;/p&gt;

&lt;p&gt;&lt;span&gt;&lt;img style=&quot;width:50%&quot; src=&quot;/img/blog/release-19-stack.png&quot; /&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Next, we extended Blink’s planner to implement the new optimizer interface
such that there are now two pluggable query processors to execute Table API
and SQL statements: the pre-1.9 Flink processor and the new Blink-based query
processor. The Blink-based query processor offers better SQL coverage (full TPC-H
coverage in 1.9, TPC-DS coverage is planned for the next release) and improved
performance for batch queries as the result of more extensive query
optimization (cost-based plan selection and more optimization rules), improved
code-generation, and tuned operator implementations.
The Blink-based query processor also provides a more powerful streaming runner,
with some new features (e.g. dimension table join, TopN, deduplication) and 
optimizations to solve data-skew in aggregation and more useful built-in
functions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The semantics and set of supported operations of the query
processors are mostly, but not fully, aligned.&lt;/p&gt;

&lt;p&gt;However, the integration of Blink’s query processor is not fully completed
yet. Therefore, the pre-1.9 Flink processor is still the default processor in
Flink 1.9 and recommended for production settings. You can enable the Blink
processor by configuring it via the &lt;code&gt;EnvironmentSettings&lt;/code&gt; when creating a
&lt;code&gt;TableEnvironment&lt;/code&gt;. The selected processor must be on the classpath of the
executing Java process. For cluster setups, both query processors are
automatically loaded with the default configuration. When running a query from
your IDE you need to explicitly &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/#table-program-dependencies&quot;&gt;add a planner
dependency&lt;/a&gt;
to your project.&lt;/p&gt;
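
&lt;p&gt;A minimal sketch of selecting the Blink-based processor through &lt;code&gt;EnvironmentSettings&lt;/code&gt; is shown below. The class name is illustrative, batch mode is chosen only as an example, and the Blink planner dependency is assumed to be on the classpath:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Illustrative class name; not part of Flink.
public class BlinkPlannerExample {
    public static void main(String[] args) {
        // Pick the Blink-based query processor instead of the default one.
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
            .useBlinkPlanner()
            .inBatchMode() // or inStreamingMode() for streaming queries
            .build();

        TableEnvironment tableEnv = TableEnvironment.create(settings);

        // ... register tables and run Table API or SQL queries on tableEnv.
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;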

&lt;h4 id=&quot;other-improvements-to-the-table-api-and-sql&quot;&gt;&lt;strong&gt;Other Improvements to the Table API and SQL&lt;/strong&gt;&lt;/h4&gt;

&lt;p&gt;Besides the exciting progress around the Blink planner, the community worked
on a whole set of other improvements to these interfaces, including:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Scala-free Table API and SQL for Java users
(&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions&quot;&gt;FLIP-32&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;As part of the refactoring and splitting of the flink-table module, two
separate API modules for Java and Scala were created. For Scala users,
nothing really changes, but Java users can use the Table API and/or SQL now
without pulling in a Scala dependency.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Rework of the Table API Type System&lt;/strong&gt;
&lt;strong&gt;(&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System&quot;&gt;FLIP-37&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;The community implemented a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/types.html#data-types&quot;&gt;new data type
system&lt;/a&gt;
to detach the Table API from Flink’s
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/types_serialization.html#flinks-typeinformation-class&quot;&gt;TypeInformation&lt;/a&gt;
class and improve its compliance with the SQL standard. This is still a
work in progress and expected to be completed in the next release. In
Flink 1.9, UDFs, among other things, are not yet ported to the new type
system.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Multi-column and Multi-row Transformations for Table API&lt;/strong&gt;
&lt;strong&gt;(&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97552739&quot;&gt;FLIP-29&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;The functionality of the Table API was extended with a set of
transformations that support multi-row and/or multi-column inputs and
outputs. These transformations significantly ease the implementation of
processing logic that would be cumbersome to implement with relational
operators.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;New, Unified Catalog APIs&lt;/strong&gt;
&lt;strong&gt;(&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs&quot;&gt;FLIP-30&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;We reworked the catalog APIs to store metadata and unified the handling of
internal and external catalogs. This effort was mainly initiated as a
prerequisite for the Hive integration (see below), but improves the overall
convenience of managing catalog metadata in Flink. Besides improving the
catalog interfaces, we also extended their functionality. Previously table
definitions for Table API or SQL queries were volatile. With Flink 1.9, the
metadata of tables which are registered with a SQL DDL statement can be
persisted in a catalog. This means you can add a table that is backed by a
Kafka topic to a Metastore catalog and from then on query this table
whenever your catalog is connected to Metastore.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;DDL Support in the SQL API
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10232&quot;&gt;FLINK-10232&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;

    &lt;p&gt;Up to this point, Flink SQL only supported DML statements (e.g. &lt;code&gt;SELECT&lt;/code&gt;,
&lt;code&gt;INSERT&lt;/code&gt;). External tables (table sources and sinks) had to be registered
via Java/Scala code or configuration files. For 1.9, we added support for
SQL DDL statements to register and remove tables and views (&lt;code&gt;CREATE TABLE&lt;/code&gt;,
&lt;code&gt;DROP TABLE&lt;/code&gt;). However, we have not yet added
stream-specific syntax extensions to define timestamp extraction and
watermark generation. Full support for streaming use cases is planned
for the next release.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;preview-of-full-hive-integration-flink-10556&quot;&gt;Preview of Full Hive Integration (FLINK-10556)&lt;/h3&gt;

&lt;p&gt;Apache Hive is widely used in Hadoop’s ecosystem to store and query large
amounts of structured data. Besides being a query processor, Hive features a
catalog called Metastore to manage and organize large datasets. A common
integration point for query processors is to integrate with Hive’s Metastore
in order to be able to tap into the data managed by Hive.&lt;/p&gt;

&lt;p&gt;Recently, the community started implementing an external catalog for Flink’s
Table API and SQL that connects to Hive’s Metastore. In Flink 1.9, users will
be able to query and process all data that is stored in Hive. As described
earlier, you will also be able to persist metadata of Flink tables in Metastore.
Moreover, the Hive integration includes support to use Hive’s UDFs in Flink
Table API or SQL queries. More details are available in
&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10556&quot;&gt;FLINK-10556&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While, previously, table definitions for Table API or SQL queries were always
volatile, the new catalog connector additionally allows persisting a table in
Metastore that is created with a SQL DDL statement (see above). This means
that you connect to Metastore and register a table that is, for example,
backed by a Kafka topic. From now on, you can query that table whenever your
catalog is connected to Metastore.&lt;/p&gt;

&lt;p&gt;Please note that the Hive support in Flink 1.9 is experimental. We are
planning to stabilize these features for the next release and are looking
forward to your feedback.&lt;/p&gt;

&lt;h3 id=&quot;preview-of-the-new-python-table-api-flip-38&quot;&gt;Preview of the new Python Table API (FLIP-38)&lt;/h3&gt;

&lt;p&gt;This release also introduces a first version of a Python Table API
(&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API&quot;&gt;FLIP-38&lt;/a&gt;).
This marks the start towards our goal of bringing
full-fledged Python support to Flink. The feature was designed as a slim
Python API wrapper around the Table API, basically translating Python Table
API method calls into Java Table API calls. In the initial version that ships
with Flink 1.9, the Python Table API does not support UDFs yet, but just
standard relational operations. Support for UDFs implemented in Python is on
the roadmap for future releases.&lt;/p&gt;

&lt;p&gt;If you’d like to try the new Python API, you have to manually &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/flinkDev/building.html#build-pyflink&quot;&gt;install
PyFlink&lt;/a&gt;.
From there, you can have a look at &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/tutorials/python_table_api.html&quot;&gt;this
walkthrough&lt;/a&gt;
or explore it on your own. The &lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Publish-the-PyFlink-into-PyPI-td31201.html&quot;&gt;community is currently
working&lt;/a&gt;
on preparing a &lt;code&gt;pyflink&lt;/code&gt; Python package that will be made available for
installation via &lt;code&gt;pip&lt;/code&gt;.&lt;/p&gt;

&lt;h2 id=&quot;important-changes&quot;&gt;Important Changes&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;The Table API and SQL are now part of the default configuration of the
Flink distribution. Before, the Table API and SQL had to be enabled by
moving the corresponding JAR file from ./opt to ./lib.&lt;/li&gt;
  &lt;li&gt;The machine learning library (flink-ml) has been removed in preparation for
&lt;a href=&quot;https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo/edit&quot;&gt;FLIP-39&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;The old DataSet and DataStream Python APIs have been removed in favor of
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API&quot;&gt;FLIP-38&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Flink can be compiled and run on Java 9. Note that certain components
interacting with external systems (connectors, filesystems, reporters) may
not work since the respective projects may have skipped Java 9 support.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-notes&quot;&gt;Release Notes&lt;/h2&gt;

&lt;p&gt;Please review the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/release-notes/flink-1.9.html&quot;&gt;release
notes&lt;/a&gt;
for a more detailed list of changes and new features if you plan to upgrade
your Flink setup to Flink 1.9.0.&lt;/p&gt;

&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;

&lt;p&gt;We would like to thank all contributors who have made this release possible:&lt;/p&gt;

&lt;p&gt;Abdul Qadeer (abqadeer), Aitozi, Alberto Romero, Aleksey Pak, Alexander
Fedulov, Alice Yan, Aljoscha Krettek, Aloys, Andrew Duffy, Andrey Zagrebin,
Ankur, Artsem Semianenka, Benchao Li, Biao Liu, Bo WANG, Bowen L, Chesnay
Schepler, Clark Yang, Congxian Qiu, Cristian, Danny Chan, David Moravek, Dawid
Wysakowicz, Dian Fu, EronWright, Fabian Hueske, Fabio Lombardelli, Fokko
Driesprong, Gao Yun, Gary Yao, Gen Luo, Gyula Fora, Hequn Cheng,
Hongtao Zhang, Huang Xingbo, HuangXingBo, Hugo Da Cruz Louro, Humberto
Rodríguez A, Hwanju Kim, Igal Shilman, Jamie Grier, Jark Wu, Jason, Jasper
Yue, Jeff Zhang, Jiangjie (Becket) Qin, Jiezhi.G, Jincheng Sun, Jing Zhang,
Jingsong Lee, Juan Gentile, Jungtaek Lim, Kailash Dayanand, Kevin
Bohinski, Konstantin Knauf, Konstantinos Papadopoulos, Kostas Kloudas, Kurt
Young, Lakshmi, Lakshmi Gururaja Rao, Leeviiii, LouisXu, Maximilian Michels,
Nico Kruber, Niels Basjes, Paul Lam, PengFei Li, Peter Huang, Pierre Zemb,
Piotr Nowojski, Piyush Narang, Richard Deurwaarder, Robert Metzger, Robert
Stoll, Romano Vacca, Rong Rong, Rui Li, Ryantaocer, Scott Mitchell, Seth
Wiesman, Shannon Carey, Shimin Yang, Stefan Richter, Stephan Ewen, Stephen
Connolly, Steven Wu, SuXingLee, TANG Wen-hui, Thomas Weise, Till Rohrmann,
Timo Walther, Tom Goong, TsReaper, Tzu-Li (Gordon) Tai, Ufuk Celebi,
Victor Wong, WangHengwei, Wei Zhong, WeiZhong94, Xintong Song, Xpray,
XuQianJin-Stars, Xuefu Zhang, Xupingyong, Yangze Guo, Yu Li, Yun Gao, Yun
Tang, Zhanchun Zhang, Zhenghua Gao, Zhijiang, Zhu Zhu, Zili
Chen, aloys, arganzheng, azagrebin, bd2019us, beyond1920, biao.liub,
blueszheng, boshu Zheng, chenqi, chummyhe89, chunpinghe, dcadmin,
dianfu, godfrey he, guanghui01.rong, hehuiyuan, hello, hequn8128, 
jackyyin, joongkeun.yang, klion26, lamber-ken, leesf, liguowei,
lincoln-lil, liyafan82, luoqi, mans2singh, maqingxiang, maxin, mjl, okidogi,
ozan, potseluev, qiangsi.lq, qiaoran, robbinli, shaoxuan-wang, shengqian.zhou,
shenlang.sl, shuai-xu, sunhaibotb, tianchen, tianchen92,
tison, tom_gong, vinoyang, vthinkxie, wanggeng3, wenhuitang, winifredtamg,
xl38154, xuyang1706, yangfei5, yanghua, yuzhao.cyz,
zhangxin516, zhangxinxing, zhaofaxian, zhijiang, zjuwangg, 林小铂,
黄培松, 时无两丶.&lt;/p&gt;
</description>
<pubDate>Thu, 22 Aug 2019 04:30:00 +0200</pubDate>
<link>https://flink.apache.org/news/2019/08/22/release-1.9.0.html</link>
<guid isPermaLink="true">/news/2019/08/22/release-1.9.0.html</guid>
</item>

<item>
<title>Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</title>
<description>&lt;style type=&quot;text/css&quot;&gt;
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{padding:10px 10px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{padding:10px 10px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;background-color:#eff0f1;}
.tg .tg-wide{padding:10px 30px;}
.tg .tg-top{vertical-align:top}
.tg .tg-topcenter{text-align:center;vertical-align:top}
.tg .tg-center{text-align:center;vertical-align:center}
&lt;/style&gt;

&lt;p&gt;In a &lt;a href=&quot;/2019/06/05/flink-network-stack.html&quot;&gt;previous blog post&lt;/a&gt;, we presented how Flink’s network stack works from the high-level abstractions to the low-level details. This second blog post in the series of network stack posts builds on this knowledge and discusses monitoring network-related metrics to identify effects such as backpressure or bottlenecks in throughput and latency. Although this post briefly covers what to do with backpressure, the topic of tuning the network stack will be further examined in a future post. If you are unfamiliar with the network stack, we highly recommend reading the &lt;a href=&quot;/2019/06/05/flink-network-stack.html&quot;&gt;network stack deep-dive&lt;/a&gt; first and then continuing here.&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#monitoring&quot; id=&quot;markdown-toc-monitoring&quot;&gt;Monitoring&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#backpressure-monitor&quot; id=&quot;markdown-toc-backpressure-monitor&quot;&gt;Backpressure Monitor&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#network-metrics&quot; id=&quot;markdown-toc-network-metrics&quot;&gt;Network Metrics&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#backpressure&quot; id=&quot;markdown-toc-backpressure&quot;&gt;Backpressure&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#resource-usage--throughput&quot; id=&quot;markdown-toc-resource-usage--throughput&quot;&gt;Resource Usage / Throughput&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#latency-tracking&quot; id=&quot;markdown-toc-latency-tracking&quot;&gt;Latency Tracking&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#conclusion&quot; id=&quot;markdown-toc-conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h2 id=&quot;monitoring&quot;&gt;Monitoring&lt;/h2&gt;

&lt;p&gt;Probably the most important part of network monitoring is &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/back_pressure.html&quot;&gt;monitoring backpressure&lt;/a&gt;, a situation where a system is receiving data at a higher rate than it can process¹. Such behaviour will result in the sender being backpressured and may be caused by two things:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;The receiver is slow.&lt;br /&gt;
This can happen because the receiver is backpressured itself, is unable to keep processing at the same rate as the sender, or is temporarily blocked by garbage collection, lack of system resources, or I/O.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The network channel is slow.&lt;br /&gt;
  Even though in such a case the receiver is not (directly) involved, we call the sender backpressured due to a potential oversubscription on network bandwidth shared by all subtasks running on the same machine. Beware that, in addition to Flink’s network stack, there may be more network users, such as sources and sinks, distributed file systems (checkpointing, network-attached storage), logging, and metrics. A previous &lt;a href=&quot;https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines&quot;&gt;capacity planning blog post&lt;/a&gt; provides some more insights.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;sup&gt;1&lt;/sup&gt; In case you are unfamiliar with backpressure and how it interacts with Flink, we recommend reading through &lt;a href=&quot;https://www.ververica.com/blog/how-flink-handles-backpressure&quot;&gt;this blog post on backpressure&lt;/a&gt; from 2015.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;
If backpressure occurs, it will bubble upstream and eventually reach your sources and slow them down. This is not a bad thing per se and merely states that you lack resources for the current load. However, you may want to improve your job so that it can cope with higher loads without using more resources. In order to do so, you need to find (1) where (at which task/operator) the bottleneck is and (2) what is causing it. Flink offers two mechanisms for identifying where the bottleneck is:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;directly via Flink’s web UI and its backpressure monitor, or&lt;/li&gt;
  &lt;li&gt;indirectly through some of the network metrics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Flink’s web UI is likely the first entry point for quick troubleshooting but has some disadvantages that we will explain below. On the other hand, Flink’s network metrics are better suited for continuous monitoring and reasoning about the exact nature of the bottleneck causing backpressure. We will cover both in the sections below. In both cases, you need to identify the origin of backpressure from the sources to the sinks. Your starting point for the current and future investigations will most likely be the operator after the last one that is experiencing backpressure. This specific operator is also highly likely to cause the backpressure in the first place.&lt;/p&gt;

&lt;h3 id=&quot;backpressure-monitor&quot;&gt;Backpressure Monitor&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/back_pressure.html&quot;&gt;backpressure monitor&lt;/a&gt; is only exposed via Flink’s web UI². Since it’s an active component that is only triggered on request, it is currently not available via metrics. The backpressure monitor samples the running tasks’ threads on all TaskManagers via &lt;code&gt;Thread.getStackTrace()&lt;/code&gt; and computes the number of samples where tasks were blocked on a buffer request. These tasks were either unable to send network buffers at the rate they were produced, or the downstream task(s) were slow at processing them and gave no credits for sending. The backpressure monitor will show the ratio of blocked to total requests. Since some backpressure is considered normal / temporary, it will show a status of&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;span style=&quot;color:green&quot;&gt;OK&lt;/span&gt; for &lt;code&gt;Ratio ≤ 0.10&lt;/code&gt;,&lt;/li&gt;
  &lt;li&gt;&lt;span style=&quot;color:orange&quot;&gt;LOW&lt;/span&gt; for &lt;code&gt;0.10 &amp;lt; Ratio ≤ 0.5&lt;/code&gt;, and&lt;/li&gt;
  &lt;li&gt;&lt;span style=&quot;color:red&quot;&gt;HIGH&lt;/span&gt; for &lt;code&gt;0.5 &amp;lt; Ratio ≤ 1&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Although you can tune things like the refresh-interval, the number of samples, or the delay between samples, normally, you would not need to touch these since the defaults already give good-enough results.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-07-23-network-stack-2/back_pressure_sampling_high.png&quot; width=&quot;600px&quot; alt=&quot;Backpressure sampling: high&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;&lt;sup&gt;2&lt;/sup&gt; You may also access the backpressure monitor via the REST API: &lt;code&gt;/jobs/:jobid/vertices/:vertexid/backpressure&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;
The backpressure monitor can help you find where (at which task/operator) backpressure originates. However, it does not help you reason further about its causes. Additionally, for larger jobs or higher parallelism, the backpressure monitor becomes too crowded to use and may also take some time to gather all information from all TaskManagers. Please also note that sampling may affect your running job’s performance.&lt;/p&gt;

&lt;h2 id=&quot;network-metrics&quot;&gt;Network Metrics&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#network&quot;&gt;Network&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#io&quot;&gt;task I/O&lt;/a&gt; metrics are more lightweight than the backpressure monitor and are continuously published for each running job. We can leverage those and get even more insights, not only for backpressure monitoring. The most relevant metrics for users are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;&lt;span style=&quot;color:orange&quot;&gt;up to Flink 1.8:&lt;/span&gt;&lt;/strong&gt; &lt;code&gt;outPoolUsage&lt;/code&gt;, &lt;code&gt;inPoolUsage&lt;/code&gt;&lt;br /&gt;
An estimate of the ratio of buffers used vs. buffers available in the respective local buffer pools.
When interpreting &lt;code&gt;inPoolUsage&lt;/code&gt; in Flink 1.5 - 1.8 with credit-based flow control, please note that this only relates to floating buffers (exclusive buffers are not part of the pool).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;&lt;span style=&quot;color:green&quot;&gt;Flink 1.9 and above:&lt;/span&gt;&lt;/strong&gt; &lt;code&gt;outPoolUsage&lt;/code&gt;, &lt;code&gt;inPoolUsage&lt;/code&gt;, &lt;code&gt;floatingBuffersUsage&lt;/code&gt;, &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt;&lt;br /&gt;
An estimate of the ratio of buffers used vs. buffers available in the respective local buffer pools.
Starting with Flink 1.9, &lt;code&gt;inPoolUsage&lt;/code&gt; is the sum of &lt;code&gt;floatingBuffersUsage&lt;/code&gt; and &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;code&gt;numRecordsOut&lt;/code&gt;, &lt;code&gt;numRecordsIn&lt;/code&gt;&lt;br /&gt;
Each metric comes with two scopes: one scoped to the operator and one scoped to the subtask. For network monitoring, the subtask-scoped metric is relevant and shows the total number of records it has sent/received. You may need to further look into these figures to extract the number of records within a certain time span or use the equivalent &lt;code&gt;…PerSecond&lt;/code&gt; metrics.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;code&gt;numBytesOut&lt;/code&gt;, &lt;code&gt;numBytesInLocal&lt;/code&gt;, &lt;code&gt;numBytesInRemote&lt;/code&gt;&lt;br /&gt;
The total number of bytes this subtask has emitted or read from a local/remote source. These are also available as meters via &lt;code&gt;…PerSecond&lt;/code&gt; metrics.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;code&gt;numBuffersOut&lt;/code&gt;, &lt;code&gt;numBuffersInLocal&lt;/code&gt;, &lt;code&gt;numBuffersInRemote&lt;/code&gt;&lt;br /&gt;
Similar to &lt;code&gt;numBytes…&lt;/code&gt; but counting the number of network buffers.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
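
&lt;p&gt;If your metrics system does not expose the &lt;code&gt;…PerSecond&lt;/code&gt; meters, a rate can be derived from two samples of a counter such as &lt;code&gt;numRecordsOut&lt;/code&gt;. The following is a minimal sketch; the class and method names are hypothetical and not part of Flink, and it assumes the counter was not reset between the two samples.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Hypothetical helper (not a Flink API): derives a per-second rate from two
// samples of a monotonically increasing counter such as numRecordsOut.
public class CounterRate {

    public static double perSecond(long earlierCount, long laterCount, long elapsedMillis) {
        if (elapsedMillis > 0L) {
            return (laterCount - earlierCount) * 1000.0 / elapsedMillis;
        }
        throw new IllegalArgumentException("elapsedMillis must be positive");
    }

    public static void main(String[] args) {
        // 3000 records emitted within 60 seconds
        System.out.println(perSecond(1000L, 4000L, 60000L)); // 50.0
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;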

&lt;div class=&quot;alert alert-warning&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;label label-warning&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Warning&lt;/span&gt;
For the sake of completeness and since they have been used in the past, we will briefly look at the &lt;code&gt;outputQueueLength&lt;/code&gt; and &lt;code&gt;inputQueueLength&lt;/code&gt; metrics. These are somewhat similar to the &lt;code&gt;[out,in]PoolUsage&lt;/code&gt; metrics but show the number of buffers sitting in a sender subtask’s output queues and in a receiver subtask’s input queues, respectively. Reasoning about absolute numbers of buffers, however, is difficult and there is also a special subtlety with local channels: since a local input channel does not have its own queue (it works with the output queue directly), its value will always be &lt;code&gt;0&lt;/code&gt; for that channel (see &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12576&quot;&gt;FLINK-12576&lt;/a&gt;). Consequently, if you only have local input channels, &lt;code&gt;inputQueueLength = 0&lt;/code&gt;.&lt;/p&gt;

  &lt;p&gt;Overall, &lt;strong&gt;we discourage the use of&lt;/strong&gt; &lt;code&gt;outputQueueLength&lt;/code&gt; &lt;strong&gt;and&lt;/strong&gt; &lt;code&gt;inputQueueLength&lt;/code&gt; because their interpretation highly depends on the current parallelism of the operator and the configured numbers of exclusive and floating buffers. Instead, we recommend using the various &lt;code&gt;*PoolUsage&lt;/code&gt; metrics, which reveal even more detailed insight.&lt;/p&gt;
&lt;/div&gt;

&lt;div class=&quot;alert alert-info&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
 If you reason about buffer usage, please keep the following in mind:&lt;/p&gt;

  &lt;ul&gt;
    &lt;li&gt;Any outgoing channel which has been used at least once will always occupy one buffer (since Flink 1.5).
      &lt;ul&gt;
        &lt;li&gt;&lt;strong&gt;&lt;span style=&quot;color:orange&quot;&gt;up to Flink 1.8:&lt;/span&gt;&lt;/strong&gt; This buffer (even if empty!) was always counted as a backlog of 1 and thus receivers tried to reserve a floating buffer for it.&lt;/li&gt;
        &lt;li&gt;&lt;strong&gt;&lt;span style=&quot;color:green&quot;&gt;Flink 1.9 and above:&lt;/span&gt;&lt;/strong&gt; A buffer is only counted in the backlog if it is ready for consumption, i.e. it is full or was flushed (see &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11082&quot;&gt;FLINK-11082&lt;/a&gt;).&lt;/li&gt;
      &lt;/ul&gt;
    &lt;/li&gt;
    &lt;li&gt;The receiver will only release a received buffer after deserialising the last record in it.&lt;/li&gt;
  &lt;/ul&gt;
&lt;/div&gt;

&lt;p&gt;The following sections make use of and combine these metrics to reason about backpressure and resource usage / efficiency with respect to throughput. A separate section will detail latency related metrics.&lt;/p&gt;

&lt;h3 id=&quot;backpressure&quot;&gt;Backpressure&lt;/h3&gt;

&lt;p&gt;Backpressure may be indicated by two different sets of metrics: (local) buffer pool usages as well as input/output queue lengths. They provide different levels of granularity but, unfortunately, neither is exhaustive and there is room for interpretation. Because of the inherent problems with interpreting queue lengths, we will focus below on the usage of input and output pools, which also provides more detail.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;If a subtask’s&lt;/strong&gt; &lt;code&gt;outPoolUsage&lt;/code&gt; &lt;strong&gt;is 100%&lt;/strong&gt;, it is backpressured. Whether the subtask is already blocking or still writing records into network buffers depends on how full the buffers are that the &lt;code&gt;RecordWriters&lt;/code&gt; are currently writing into.&lt;br /&gt;
&lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;&quot;&gt;&lt;/span&gt; This is different to what the backpressure monitor is showing!&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;An &lt;code&gt;inPoolUsage&lt;/code&gt; of 100% means that all floating buffers are assigned to channels and eventually backpressure will be exercised upstream. These floating buffers are in one of the following conditions: they are reserved for future use on a channel due to an exclusive buffer being utilised (remote input channels always try to maintain &lt;code&gt;#exclusive buffers&lt;/code&gt; credits), they are reserved for a sender’s backlog and wait for data, they may contain data and are enqueued in an input channel, or they may contain data and are being read by the receiver’s subtask (one record at a time).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;&lt;span style=&quot;color:orange&quot;&gt;up to Flink 1.8:&lt;/span&gt;&lt;/strong&gt; Due to &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11082&quot;&gt;FLINK-11082&lt;/a&gt;, an &lt;code&gt;inPoolUsage&lt;/code&gt; of 100% is quite common even in normal situations.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;&lt;span style=&quot;color:green&quot;&gt;Flink 1.9 and above:&lt;/span&gt;&lt;/strong&gt; If &lt;code&gt;inPoolUsage&lt;/code&gt; is constantly around 100%, this is a strong indicator for exercising backpressure upstream.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The following table summarises all combinations and their interpretation. Bear in mind, though, that backpressure may be minor or temporary (no need to look into it), on particular channels only, or caused by other JVM processes on a particular TaskManager, such as GC, synchronisation, I/O, resource shortage, instead of a specific subtask.&lt;/p&gt;

&lt;center&gt;
&lt;table class=&quot;tg&quot;&gt;
  &lt;tr&gt;
    &lt;th&gt;&lt;/th&gt;
    &lt;th class=&quot;tg-center&quot;&gt;&lt;code&gt;outPoolUsage&lt;/code&gt; low&lt;/th&gt;
    &lt;th class=&quot;tg-center&quot;&gt;&lt;code&gt;outPoolUsage&lt;/code&gt; high&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th class=&quot;tg-top&quot;&gt;&lt;code&gt;inPoolUsage&lt;/code&gt; low&lt;/th&gt;
    &lt;td class=&quot;tg-topcenter&quot;&gt;
      &lt;span class=&quot;glyphicon glyphicon-ok-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:green;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;/td&gt;
    &lt;td class=&quot;tg-topcenter&quot;&gt;
      &lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
      (backpressured, temporary situation: upstream is not backpressured yet or not anymore)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th class=&quot;tg-top&quot; rowspan=&quot;2&quot;&gt;
      &lt;code&gt;inPoolUsage&lt;/code&gt; high&lt;br /&gt;
      (&lt;strong&gt;&lt;span style=&quot;color:green&quot;&gt;Flink 1.9+&lt;/span&gt;&lt;/strong&gt;)&lt;/th&gt;
    &lt;td class=&quot;tg-topcenter&quot;&gt;
      if all upstream tasks’ &lt;code&gt;outPoolUsage&lt;/code&gt; are low: &lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
      (may eventually cause backpressure)&lt;/td&gt;
    &lt;td class=&quot;tg-topcenter&quot; rowspan=&quot;2&quot;&gt;
      &lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
      (backpressured by downstream task(s) or network, probably forwarding backpressure upstream)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;tg-topcenter&quot;&gt;if any upstream task’s &lt;code&gt;outPoolUsage&lt;/code&gt; is high: &lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
      (may exercise backpressure upstream and may be the source of backpressure)&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
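
&lt;p&gt;For automated alerting, the table above could be encoded roughly as follows. This is only a sketch for Flink 1.9+: the class, method, and result names are hypothetical, and the 0.9 cut-off between “low” and “high” is an assumption chosen for illustration, not a value defined by Flink.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Hypothetical helper (not a Flink API): encodes the table above.
public class PoolUsageTable {

    // Assumed cut-off between "low" and "high" usage (illustrative only)
    static final double HIGH = 0.9;

    // maxUpstreamOutPoolUsage: the highest outPoolUsage among all upstream subtasks
    public static String interpret(double inPoolUsage, double outPoolUsage,
                                   double maxUpstreamOutPoolUsage) {
        boolean inHigh = inPoolUsage >= HIGH;
        boolean outHigh = outPoolUsage >= HIGH;
        boolean anyUpstreamHigh = maxUpstreamOutPoolUsage >= HIGH;

        if (outHigh) {
            // right-hand column of the table
            return inHigh ? "BACKPRESSURED_BY_DOWNSTREAM_OR_NETWORK"
                          : "BACKPRESSURED_TEMPORARY";
        }
        if (!inHigh) {
            return "OK";
        }
        // inPoolUsage high, outPoolUsage low: depends on the upstream tasks
        return anyUpstreamHigh ? "POSSIBLE_SOURCE_OF_BACKPRESSURE"
                               : "MAY_EVENTUALLY_CAUSE_BACKPRESSURE";
    }

    public static void main(String[] args) {
        System.out.println(interpret(1.0, 0.2, 1.0)); // POSSIBLE_SOURCE_OF_BACKPRESSURE
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;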

&lt;p&gt;&lt;br /&gt;
We may even reason more about the cause of backpressure by looking at the network metrics of the subtasks of two consecutive tasks:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;If all subtasks of the receiver task have low &lt;code&gt;inPoolUsage&lt;/code&gt; values and any upstream subtask’s &lt;code&gt;outPoolUsage&lt;/code&gt; is high, then there may be a network bottleneck causing backpressure.
Since network is a shared resource among all subtasks of a TaskManager, this may not directly originate from this subtask, but rather from various concurrent operations, e.g. checkpoints, other streams, external connections, or other TaskManagers/processes on the same machine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Backpressure can also be caused by all parallel instances of a task or by a single task instance. The first usually happens because the task is performing some time-consuming operation that applies to all input partitions. The latter is usually the result of some kind of skew, either data skew or resource availability/allocation skew. In either case, you can find some hints on how to handle such situations in the &lt;a href=&quot;#span-classglyphicon-glyphicon-info-sign-aria-hiddentruespan-what-to-do-with-backpressure&quot;&gt;What to do with backpressure?&lt;/a&gt; box below.&lt;/p&gt;

&lt;div class=&quot;alert alert-info&quot;&gt;
  &lt;h3 class=&quot;no_toc&quot; id=&quot;span-classglyphicon-glyphicon-info-sign-aria-hiddentruespan-flink-19-and-above&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Flink 1.9 and above&lt;/h3&gt;

  &lt;ul&gt;
    &lt;li&gt;If &lt;code&gt;floatingBuffersUsage&lt;/code&gt; is not 100%, it is unlikely that there is backpressure. If it is 100% and any upstream task is backpressured, it suggests that this input is exercising backpressure on either a single, some or all input channels. To differentiate between those three situations you can use &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt;:
      &lt;ul&gt;
        &lt;li&gt;Assuming that &lt;code&gt;floatingBuffersUsage&lt;/code&gt; is around 100%, the higher the &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt; the more input channels are backpressured. In an extreme case of &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt; being close to 100%, it means that all channels are backpressured.&lt;/li&gt;
      &lt;/ul&gt;
    &lt;/li&gt;
  &lt;/ul&gt;

  &lt;p&gt;&lt;br /&gt;
The relation between &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt;, &lt;code&gt;floatingBuffersUsage&lt;/code&gt;, and the upstream tasks’ &lt;code&gt;outPoolUsage&lt;/code&gt; is summarised in the following table, which extends the table above with &lt;code&gt;inPoolUsage = floatingBuffersUsage + exclusiveBuffersUsage&lt;/code&gt;:&lt;/p&gt;

  &lt;center&gt;
&lt;table class=&quot;tg&quot;&gt;
  &lt;tr&gt;
    &lt;th&gt;&lt;/th&gt;
    &lt;th&gt;&lt;code&gt;exclusiveBuffersUsage&lt;/code&gt; low&lt;/th&gt;
    &lt;th&gt;&lt;code&gt;exclusiveBuffersUsage&lt;/code&gt; high&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th class=&quot;tg-top&quot; style=&quot;min-width:33%;&quot;&gt;
      &lt;code&gt;floatingBuffersUsage&lt;/code&gt; low +&lt;br /&gt;
      &lt;em&gt;all&lt;/em&gt; upstream &lt;code&gt;outPoolUsage&lt;/code&gt; low&lt;/th&gt;
    &lt;td class=&quot;tg-center&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-ok-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:green;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;/td&gt;
    &lt;td class=&quot;tg-center&quot;&gt;-&lt;sup&gt;3&lt;/sup&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th class=&quot;tg-top&quot; style=&quot;min-width:33%;&quot;&gt;
      &lt;code&gt;floatingBuffersUsage&lt;/code&gt; low +&lt;br /&gt;
      &lt;em&gt;any&lt;/em&gt; upstream &lt;code&gt;outPoolUsage&lt;/code&gt; high&lt;/th&gt;
    &lt;td class=&quot;tg-center&quot;&gt;
      &lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
      (potential network bottleneck)&lt;/td&gt;
    &lt;td class=&quot;tg-center&quot;&gt;-&lt;sup&gt;3&lt;/sup&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th class=&quot;tg-top&quot; style=&quot;min-width:33%;&quot;&gt;
      &lt;code&gt;floatingBuffersUsage&lt;/code&gt; high +&lt;br /&gt;
      &lt;em&gt;all&lt;/em&gt; upstream &lt;code&gt;outPoolUsage&lt;/code&gt; low&lt;/th&gt;
    &lt;td class=&quot;tg-center&quot;&gt;
      &lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
      (backpressure eventually appears on only some of the input channels)&lt;/td&gt;
    &lt;td class=&quot;tg-center&quot;&gt;
      &lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
      (backpressure eventually appears on most or all of the input channels)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th class=&quot;tg-top&quot; style=&quot;min-width:33%;&quot;&gt;
      &lt;code&gt;floatingBuffersUsage&lt;/code&gt; high +&lt;br /&gt;
      &lt;em&gt;any&lt;/em&gt; upstream &lt;code&gt;outPoolUsage&lt;/code&gt; high&lt;/th&gt;
    &lt;td class=&quot;tg-center&quot;&gt;
      &lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
      (backpressure on only some of the input channels)&lt;/td&gt;
    &lt;td class=&quot;tg-center&quot;&gt;
      &lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
      (backpressure on most or all of the input channels)&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;

  &lt;p&gt;&lt;sup&gt;3&lt;/sup&gt; this should not happen&lt;/p&gt;

&lt;/div&gt;

&lt;h3 id=&quot;resource-usage--throughput&quot;&gt;Resource Usage / Throughput&lt;/h3&gt;

&lt;p&gt;Besides the obvious use of each individual metric mentioned above, there are also a few combinations providing useful insight into what is happening in the network stack:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Low throughput with frequent &lt;code&gt;outPoolUsage&lt;/code&gt; values around 100% but low &lt;code&gt;inPoolUsage&lt;/code&gt; on all receivers is an indicator that the round-trip time of our credit notifications (which depends on your network’s latency) is too high for the default number of exclusive buffers to make use of your bandwidth. Consider increasing the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-buffers-per-channel&quot;&gt;buffers-per-channel&lt;/a&gt; parameter or try disabling credit-based flow control to verify.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Combining &lt;code&gt;numRecordsOut&lt;/code&gt; and &lt;code&gt;numBytesOut&lt;/code&gt; helps identify average serialised record sizes, which supports capacity planning for peak scenarios.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;If you want to reason about buffer fill rates and the influence of the output flusher, you may combine &lt;code&gt;numBytesInRemote&lt;/code&gt; with &lt;code&gt;numBuffersInRemote&lt;/code&gt;. When tuning for throughput (and not latency!), low buffer fill rates may indicate reduced network efficiency. In such cases, consider increasing the buffer timeout.
Please note that, as of Flink 1.8 and 1.9, &lt;code&gt;numBuffersOut&lt;/code&gt; only increases for buffers getting full or for an event cutting off a buffer (e.g. a checkpoint barrier) and may lag behind. Please also note that reasoning about buffer fill rates on local channels is unnecessary since buffering is an optimisation technique for remote channels with limited effect on local channels.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;You may also separate local from remote traffic using &lt;code&gt;numBytesInLocal&lt;/code&gt; and &lt;code&gt;numBytesInRemote&lt;/code&gt;, but in most cases this is unnecessary.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
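
&lt;p&gt;Two of the combinations above boil down to simple ratios, sketched below. The class and method names are hypothetical and not part of Flink. The buffer size of 32768 bytes used in the example is an assumption (the default memory segment size); substitute your configured value.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Hypothetical helper (not a Flink API): combines counters of one subtask.
public class ThroughputMetrics {

    // Average serialised record size in bytes: numBytesOut / numRecordsOut
    public static double averageRecordSize(long numBytesOut, long numRecordsOut) {
        if (numRecordsOut > 0L) {
            return (double) numBytesOut / numRecordsOut;
        }
        return 0.0;
    }

    // Average fill rate (0 to 1) of buffers received over remote channels
    public static double bufferFillRate(long numBytesInRemote, long numBuffersInRemote,
                                        int bufferSizeBytes) {
        if (numBuffersInRemote > 0L) {
            return (double) numBytesInRemote / ((double) numBuffersInRemote * bufferSizeBytes);
        }
        return 0.0;
    }

    public static void main(String[] args) {
        System.out.println(averageRecordSize(1000000L, 4000L));     // 250.0
        System.out.println(bufferFillRate(16384000L, 1000L, 32768)); // 0.5
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In this example, buffers arrive half full on average, which, when tuning for throughput, may hint at a buffer timeout that is too low.&lt;/p&gt;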

&lt;div class=&quot;alert alert-info&quot;&gt;
  &lt;h3 class=&quot;no_toc&quot; id=&quot;span-classglyphicon-glyphicon-info-sign-aria-hiddentruespan-what-to-do-with-backpressure&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; What to do with Backpressure?&lt;/h3&gt;

  &lt;p&gt;Assuming that you have identified where the source of backpressure (a bottleneck) is located, the next step is to analyse why this is happening. Below, we list some potential causes of backpressure from the more basic to the more complex ones. We recommend checking the basic causes first, before diving deeper into the more complex ones and potentially drawing false conclusions.&lt;/p&gt;

  &lt;p&gt;Please also recall that backpressure might be temporary and the result of a load spike, checkpointing, or a job restart with a data backlog waiting to be processed. In that case, you can often just ignore it. Alternatively, keep in mind that the process of analysing and solving the issue can be affected by the intermittent nature of your bottleneck. Having said that, here are a couple of things to check.&lt;/p&gt;

  &lt;h4 id=&quot;system-resources&quot;&gt;System Resources&lt;/h4&gt;

  &lt;p&gt;Firstly, you should check the affected machines’ basic resource usage like CPU, network, or disk I/O. If some resource is fully or heavily utilised you can do one of the following:&lt;/p&gt;

  &lt;ol&gt;
    &lt;li&gt;Try to optimise your code. Code profilers are helpful in this case.&lt;/li&gt;
    &lt;li&gt;Tune Flink for that specific resource.&lt;/li&gt;
    &lt;li&gt;Scale out by increasing the parallelism and/or increasing the number of machines in the cluster.&lt;/li&gt;
  &lt;/ol&gt;

  &lt;h4 id=&quot;garbage-collection&quot;&gt;Garbage Collection&lt;/h4&gt;

  &lt;p&gt;Oftentimes, performance issues arise from long GC pauses. You can verify whether you are in such a situation by either printing debug GC logs (via &lt;code&gt;-XX:+PrintGCDetails&lt;/code&gt;) or using a memory/GC profiler. Since dealing with GC issues is highly application-dependent and independent of Flink, we will not go into details here (&lt;a href=&quot;https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/index.html&quot;&gt;Oracle’s Garbage Collection Tuning Guide&lt;/a&gt; or &lt;a href=&quot;https://plumbr.io/java-garbage-collection-handbook&quot;&gt;Plumbr’s Java Garbage Collection handbook&lt;/a&gt; seem like a good start).&lt;/p&gt;
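
  &lt;p&gt;A minimal sketch of how GC logging might be switched on, assuming a Java 8 HotSpot JVM and that JVM options are passed to Flink’s processes via &lt;code&gt;env.java.opts&lt;/code&gt; in &lt;code&gt;flink-conf.yaml&lt;/code&gt;. The additional &lt;code&gt;-XX:+PrintGCDateStamps&lt;/code&gt; flag is our own choice for illustration.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;# flink-conf.yaml (sketch): pass GC logging flags to Flink's JVMs.
# These are Java 8 HotSpot flags; newer JVMs use different logging options.
env.java.opts: -XX:+PrintGCDetails -XX:+PrintGCDateStamps&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;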

  &lt;h4 id=&quot;cputhread-bottleneck&quot;&gt;CPU/Thread Bottleneck&lt;/h4&gt;

  &lt;p&gt;Sometimes a CPU bottleneck might not be visible at first glance if one or a couple of threads are causing the CPU bottleneck while the CPU usage of the overall machine remains relatively low. For instance, a single CPU-bottlenecked thread on a 48-core machine would result in only 2% CPU use. Consider using code profilers for this as they can identify hot threads by showing each thread’s CPU usage, for example.&lt;/p&gt;

  &lt;h4 id=&quot;thread-contention&quot;&gt;Thread Contention&lt;/h4&gt;

  &lt;p&gt;Similarly to the CPU/thread bottleneck issue above, a subtask may be bottlenecked due to high thread contention on shared resources. Again, CPU profilers are your best friend here! Consider looking for synchronisation overhead / lock contention in user code, although adding synchronisation in user code should be avoided and may even be dangerous! Also consider investigating shared system resources. The default JVM’s SSL implementation, for example, can become contended around the shared &lt;code&gt;/dev/urandom&lt;/code&gt; resource.&lt;/p&gt;

  &lt;h4 id=&quot;load-imbalance&quot;&gt;Load Imbalance&lt;/h4&gt;

  &lt;p&gt;If your bottleneck is caused by data skew, you can try to remove it or mitigate its impact by changing the data partitioning to separate heavy keys or by implementing local/pre-aggregation.&lt;/p&gt;

  &lt;p&gt;&lt;br /&gt;
This list is far from exhaustive. Generally, in order to reduce a bottleneck and thus backpressure, first analyse where it is happening and then find out why. The best place to start reasoning about the “why” is by checking what resources are fully utilised.&lt;/p&gt;
&lt;/div&gt;

&lt;h3 id=&quot;latency-tracking&quot;&gt;Latency Tracking&lt;/h3&gt;

&lt;p&gt;Tracking latencies at the various locations where they may occur is a topic of its own. In this section, we will focus on the time records wait inside Flink’s network stack, including the system’s network connections. In low throughput scenarios, these latencies are influenced directly by the output flusher via the buffer timeout parameter or indirectly by any application code latencies. When processing a record takes longer than expected or when (multiple) timers fire at the same time and block the receiver from processing incoming records, the time inside the network stack for following records is extended dramatically. We highly recommend adding your own metrics to your Flink job for better latency tracking in your job’s components and a broader view on the cause of delays.&lt;/p&gt;

&lt;p&gt;Flink offers some support for &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#latency-tracking&quot;&gt;tracking the latency&lt;/a&gt; of records passing through the system (outside of user code). However, this is disabled by default (see below why!) and must be enabled by setting a latency tracking interval either in Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#metrics-latency-interval&quot;&gt;configuration via &lt;code&gt;metrics.latency.interval&lt;/code&gt;&lt;/a&gt; or via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionConfig.html#setLatencyTrackingInterval-long-&quot;&gt;ExecutionConfig#setLatencyTrackingInterval()&lt;/a&gt;. Once enabled, Flink will collect latency histograms based on the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#metrics-latency-granularity&quot;&gt;granularity defined via &lt;code&gt;metrics.latency.granularity&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code&gt;single&lt;/code&gt;: one histogram for each operator subtask&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;operator&lt;/code&gt; (default): one histogram for each combination of source task and operator subtask&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;subtask&lt;/code&gt;: one histogram for each combination of source subtask and operator subtask (quadratic in the parallelism!)&lt;/li&gt;
&lt;/ul&gt;
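
&lt;p&gt;As a sketch, enabling latency tracking cluster-wide through the two configuration keys named above could look like this in &lt;code&gt;flink-conf.yaml&lt;/code&gt;. The interval value is purely illustrative and is given in milliseconds.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;# flink-conf.yaml (sketch): enable latency tracking.
# Interval between latency markers in milliseconds (illustrative value).
metrics.latency.interval: 1000
# One of: single, operator (default), subtask
metrics.latency.granularity: operator&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;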

&lt;p&gt;These metrics are collected through special “latency markers”: each source subtask will periodically emit a special record containing the timestamp of its creation. The latency markers then flow alongside normal records while not overtaking them on the wire or inside a buffer queue. However, &lt;em&gt;a latency marker does not enter application logic&lt;/em&gt; and overtakes records there. Latency markers therefore only measure the waiting time between the user code and not a full “end-to-end” latency. User code indirectly influences these waiting times, though!&lt;/p&gt;

&lt;p&gt;Since &lt;code&gt;LatencyMarkers&lt;/code&gt; sit in network buffers just like normal records, they will also wait for the buffer to be full or flushed due to buffer timeouts. When a channel is under high load, network buffering adds no latency. However, as soon as a channel is under low load, records and latency markers will experience an expected average delay of at most &lt;code&gt;buffer_timeout / 2&lt;/code&gt;. This delay will add to each network connection towards a subtask and should be taken into account when analysing a subtask’s latency metric.&lt;/p&gt;
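
&lt;p&gt;To get a feeling for the numbers, the bound above can be accumulated over the network connections a record passes through. The sketch below uses hypothetical names (not a Flink API) and assumes a buffer timeout of 100 ms and three low-load network connections between source and subtask.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Hypothetical helper (not a Flink API): upper bound on the expected average
// delay added by network buffering on channels under low load.
public class BufferTimeoutDelay {

    // Each low-load network connection adds at most bufferTimeout / 2 on average
    public static double expectedDelayMillis(long bufferTimeoutMillis, int lowLoadConnections) {
        return lowLoadConnections * (bufferTimeoutMillis / 2.0);
    }

    public static void main(String[] args) {
        // assumed values: 100 ms buffer timeout, 3 low-load connections
        System.out.println(expectedDelayMillis(100L, 3)); // 150.0
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;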

&lt;p&gt;By looking at the exposed latency tracking metrics for each subtask, for example at the 95th percentile, you should nevertheless be able to identify subtasks which are adding substantially to the overall source-to-sink latency and continue with optimising there.&lt;/p&gt;

&lt;div class=&quot;alert alert-info&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
Flink’s latency markers assume that the clocks on all machines in the cluster are in sync. We recommend setting up an automated clock synchronisation service (like NTP) to avoid false latency results.&lt;/p&gt;
&lt;/div&gt;

&lt;div class=&quot;alert alert-warning&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;label label-warning&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Warning&lt;/span&gt;
Enabling latency metrics can significantly impact the performance of the cluster (in particular for &lt;code&gt;subtask&lt;/code&gt; granularity) due to the sheer amount of metrics being added as well as the use of histograms which are quite expensive to maintain. It is highly recommended to only use them for debugging purposes.&lt;/p&gt;
&lt;/div&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In the previous sections we discussed how to monitor Flink’s network stack, which primarily involves identifying backpressure: where it occurs, where it originates from, and (potentially) why it occurs. This can be done in two ways: for simple cases and debugging sessions by using the backpressure monitor; for continuous monitoring, more in-depth analysis, and less runtime overhead by using Flink’s task and network stack metrics. Backpressure can be caused by the network layer itself but, in most cases, is caused by some subtask under high load. These two scenarios can be distinguished from one another by analysing the metrics as described above. We also provided some hints at monitoring resource usage and tracking network latencies that may add up from sources to sinks.&lt;/p&gt;

&lt;p&gt;Stay tuned for the third blog post in the series of network stack posts that will focus on tuning techniques and anti-patterns to avoid.&lt;/p&gt;

</description>
<pubDate>Tue, 23 Jul 2019 17:30:00 +0200</pubDate>
<link>https://flink.apache.org/2019/07/23/flink-network-stack-2.html</link>
<guid isPermaLink="true">/2019/07/23/flink-network-stack-2.html</guid>
</item>

<item>
<title>Apache Flink 1.8.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.8 series.&lt;/p&gt;

&lt;p&gt;This release includes more than 40 fixes and minor improvements for Flink 1.8.1. The list below details all improvements, sub-tasks and bug fixes.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.8.1.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;Sub-task&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10921&quot;&gt;FLINK-10921&lt;/a&gt;] -         Prioritize shard consumers in Kinesis Consumer by event time 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12617&quot;&gt;FLINK-12617&lt;/a&gt;] -         StandaloneJobClusterEntrypoint should default to random JobID for non-HA setups 
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Bug&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9445&quot;&gt;FLINK-9445&lt;/a&gt;] -         scala-shell uses plain java command
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10455&quot;&gt;FLINK-10455&lt;/a&gt;] -         Potential Kafka producer leak in case of failures
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10941&quot;&gt;FLINK-10941&lt;/a&gt;] -         Slots prematurely released which still contain unconsumed data 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11059&quot;&gt;FLINK-11059&lt;/a&gt;] -         JobMaster may continue using an invalid slot if releasing idle slot meet a timeout
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11107&quot;&gt;FLINK-11107&lt;/a&gt;] -         Avoid memory stateBackend to create arbitrary folders under HA path when no checkpoint path configured
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11897&quot;&gt;FLINK-11897&lt;/a&gt;] -         ExecutionGraphSuspendTest does not wait for all tasks to be submitted
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11915&quot;&gt;FLINK-11915&lt;/a&gt;] -         DataInputViewStream skip returns wrong value
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11987&quot;&gt;FLINK-11987&lt;/a&gt;] -         Kafka producer occasionally throws NullpointerException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12009&quot;&gt;FLINK-12009&lt;/a&gt;] -         Wrong check message about heartbeat interval for HeartbeatServices
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12042&quot;&gt;FLINK-12042&lt;/a&gt;] -         RocksDBStateBackend mistakenly uses default filesystem
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12112&quot;&gt;FLINK-12112&lt;/a&gt;] -         AbstractTaskManagerProcessFailureRecoveryTest process output logging does not work properly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12132&quot;&gt;FLINK-12132&lt;/a&gt;] -         The example in /docs/ops/deployment/yarn_setup.md should be updated due to the change FLINK-2021
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12184&quot;&gt;FLINK-12184&lt;/a&gt;] -         HistoryServerArchiveFetcher isn&amp;#39;t compatible with old version
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12219&quot;&gt;FLINK-12219&lt;/a&gt;] -         Yarn application can&amp;#39;t stop when flink job failed in per-job yarn cluster mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12247&quot;&gt;FLINK-12247&lt;/a&gt;] -         fix NPE when writing an archive file to a FileSystem
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12260&quot;&gt;FLINK-12260&lt;/a&gt;] -         Slot allocation failure by taskmanager registration timeout and race
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12296&quot;&gt;FLINK-12296&lt;/a&gt;] -         Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12297&quot;&gt;FLINK-12297&lt;/a&gt;] -         Make ClosureCleaner recursive
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12301&quot;&gt;FLINK-12301&lt;/a&gt;] -         Scala value classes inside case classes cannot be serialized anymore in Flink 1.8.0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12342&quot;&gt;FLINK-12342&lt;/a&gt;] -         Yarn Resource Manager Acquires Too Many Containers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12375&quot;&gt;FLINK-12375&lt;/a&gt;] -         flink-container job jar does not have read permissions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12416&quot;&gt;FLINK-12416&lt;/a&gt;] -         Docker build script fails on symlink creation ln -s
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12544&quot;&gt;FLINK-12544&lt;/a&gt;] -         Deadlock while releasing memory and requesting segment concurrent in SpillableSubpartition
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12547&quot;&gt;FLINK-12547&lt;/a&gt;] -         Deadlock when the task thread downloads jars using BlobClient
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12646&quot;&gt;FLINK-12646&lt;/a&gt;] -         Use reserved IP as unrouteable IP in RestClientTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12688&quot;&gt;FLINK-12688&lt;/a&gt;] -         Make serializer lazy initialization thread safe in StateDescriptor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12740&quot;&gt;FLINK-12740&lt;/a&gt;] -         SpillableSubpartitionTest deadlocks on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12835&quot;&gt;FLINK-12835&lt;/a&gt;] -         Time conversion is wrong in ManualClock
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12863&quot;&gt;FLINK-12863&lt;/a&gt;] -         Race condition between slot offerings and AllocatedSlotReport
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12865&quot;&gt;FLINK-12865&lt;/a&gt;] -         State inconsistency between RM and TM on the slot status
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12871&quot;&gt;FLINK-12871&lt;/a&gt;] -         Wrong SSL setup examples in docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12895&quot;&gt;FLINK-12895&lt;/a&gt;] -         TaskManagerProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure failed on travis 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12896&quot;&gt;FLINK-12896&lt;/a&gt;] -         TaskCheckpointStatisticDetailsHandler uses wrong value for JobID when archiving
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Improvement&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11126&quot;&gt;FLINK-11126&lt;/a&gt;] -         Filter out AMRMToken in the TaskManager credentials
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12137&quot;&gt;FLINK-12137&lt;/a&gt;] -         Add more proper explanation on flink streaming connectors 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12169&quot;&gt;FLINK-12169&lt;/a&gt;] -         Improve Javadoc of MessageAcknowledgingSourceBase
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12378&quot;&gt;FLINK-12378&lt;/a&gt;] -         Consolidate FileSystem Documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12391&quot;&gt;FLINK-12391&lt;/a&gt;] -         Add timeout to transfer.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12539&quot;&gt;FLINK-12539&lt;/a&gt;] -         StreamingFileSink: Make the class extendable to customize for different usecases
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Test&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12350&quot;&gt;FLINK-12350&lt;/a&gt;] -         RocksDBStateBackendTest doesn&amp;#39;t cover the incremental checkpoint code path
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Task&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12460&quot;&gt;FLINK-12460&lt;/a&gt;] -         Change taskmanager.tmp.dirs to io.tmp.dirs in configuration docs
&lt;/li&gt;
&lt;/ul&gt;

</description>
<pubDate>Tue, 02 Jul 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2019/07/02/release-1.8.1.html</link>
<guid isPermaLink="true">/news/2019/07/02/release-1.8.1.html</guid>
</item>

<item>
<title>A Practical Guide to Broadcast State in Apache Flink</title>
<description>&lt;p&gt;Since version 1.5.0, Apache Flink features a new type of state which is called Broadcast State. In this post, we explain what Broadcast State is, and show an example of how it can be applied to an application that evaluates dynamic patterns on an event stream. We walk you through the processing steps and the source code to implement this application in practice.&lt;/p&gt;

&lt;h2 id=&quot;what-is-broadcast-state&quot;&gt;What is Broadcast State?&lt;/h2&gt;

&lt;p&gt;The Broadcast State can be used to combine and jointly process two streams of events in a specific way. The events of the first stream are broadcasted to all parallel instances of an operator, which maintains them as state. The events of the other stream are not broadcasted but sent to individual instances of the same operator and processed together with the events of the broadcasted stream. 
The new broadcast state is a natural fit for applications that need to join a low-throughput and a high-throughput stream or need to dynamically update their processing logic. We will use a concrete example of the latter use case to explain the broadcast state and show its API in more detail in the remainder of this post.&lt;/p&gt;

&lt;h2 id=&quot;dynamic-pattern-evaluation-with-broadcast-state&quot;&gt;Dynamic Pattern Evaluation with Broadcast State&lt;/h2&gt;

&lt;p&gt;Imagine an e-commerce website that captures the interactions of all users as a stream of user actions. The company that operates the website is interested in analyzing the interactions to increase revenue, improve the user experience, and detect and prevent malicious behavior. 
The website implements a streaming application that detects a pattern on the stream of user events. However, the company wants to avoid modifying and redeploying the application every time the pattern changes. Instead, the application ingests a second stream of patterns and updates its active pattern when it receives a new pattern from the pattern stream. In the following, we discuss this application step-by-step and show how it leverages the broadcast state feature in Apache Flink.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/broadcastState/fig1.png&quot; width=&quot;600px&quot; alt=&quot;Broadcast State in Apache Flink.&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Our example application ingests two data streams. The first stream provides user actions on the website and is illustrated on the top left side of the above figure. A user interaction event consists of the type of the action (user login, user logout, add to cart, or complete payment) and the id of the user, which is encoded by color. The user action event stream in our illustration contains a logout action of User 1001 followed by a payment-complete event for User 1003, and an “add-to-cart” action of User 1002.&lt;/p&gt;

&lt;p&gt;The second stream provides action patterns that the application will evaluate. A pattern consists of two consecutive actions. In the figure above, the pattern stream contains the following two:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Pattern #1: A user logs in and immediately logs out without browsing additional pages on the e-commerce website.&lt;/li&gt;
  &lt;li&gt;Pattern #2: A user adds an item to the shopping cart and logs out without completing the purchase.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Such patterns help a business better analyze user behavior, detect malicious actions, and improve the website experience. For example, when items are added to a shopping cart with no follow-up purchase, the website team can investigate why users don’t complete the purchase and initiate specific programs to improve the website conversion (such as discount codes or limited free-shipping offers).&lt;/p&gt;

&lt;p&gt;On the right-hand side, the figure shows three parallel tasks of an operator that ingest the pattern and user action streams, evaluate the patterns on the action stream, and emit pattern matches downstream. For the sake of simplicity, the operator in our example only evaluates a single pattern with exactly two subsequent actions. The currently active pattern is replaced when a new pattern is received from the pattern stream. In principle, the operator could also be implemented to evaluate more complex patterns or multiple patterns concurrently which could be individually added or removed.&lt;/p&gt;

&lt;p&gt;Next, we describe step by step how the pattern matching application processes the user action and pattern streams.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/broadcastState/fig2.png&quot; width=&quot;600px&quot; alt=&quot;Broadcast State in Apache Flink.&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;First, a pattern is sent to the operator. The pattern is broadcasted to all three parallel tasks of the operator. The tasks store the pattern in their broadcast state. Since the broadcast state should only be updated using broadcasted data, the state of all tasks is always expected to be the same.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/broadcastState/fig3.png&quot; width=&quot;600px&quot; alt=&quot;Broadcast State in Apache Flink.&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Next, the first user actions are partitioned on the user id and shipped to the operator tasks. The partitioning ensures that all actions of the same user are processed by the same task. The figure above shows the state of the application after the first pattern and the first three action events were consumed by the operator tasks.&lt;/p&gt;

&lt;p&gt;When a task receives a new user action, it evaluates the currently active pattern by looking at the user’s latest and previous actions. For each user, the operator stores the previous action in the keyed state. Since the tasks in the figure above have received only a single action for each user so far (we just started the application), the pattern does not need to be evaluated. Finally, the previous action in the user’s keyed state is updated to the latest action, so that it can be looked up when the next action of the same user arrives.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/broadcastState/fig4.png&quot; width=&quot;600px&quot; alt=&quot;Broadcast State in Apache Flink.&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;After the first three actions are processed, the next event, the logout action of User 1001, is shipped to the task that processes the events of User 1001. When the task receives the action, it looks up the current pattern from the broadcast state and the previous action of User 1001. Since the pattern matches both actions, the task emits a pattern match event. Finally, the task updates its keyed state by overwriting the previous action with the latest one.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/broadcastState/fig5.png&quot; width=&quot;600px&quot; alt=&quot;Broadcast State in Apache Flink.&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;When a new pattern arrives in the pattern stream, it is broadcasted to all tasks and each task updates its broadcast state by replacing the current pattern with the new one.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/broadcastState/fig6.png&quot; width=&quot;600px&quot; alt=&quot;Broadcast State in Apache Flink.&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Once the broadcast state is updated with a new pattern, the matching logic continues as before, i.e., user action events are partitioned by key and evaluated by the responsible task.&lt;/p&gt;
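
&lt;p&gt;The matching steps described above can be sketched in plain Java, without any Flink dependency. This is only an illustrative sketch under stated assumptions, not the implementation used in this post: the class names &lt;code&gt;ActionPattern&lt;/code&gt;, &lt;code&gt;UserMatcher&lt;/code&gt;, and &lt;code&gt;BroadcastMatchDemo&lt;/code&gt; are hypothetical, one &lt;code&gt;UserMatcher&lt;/code&gt; instance stands in for the keyed state of a single user, and the pattern passed to &lt;code&gt;onAction&lt;/code&gt; stands in for the value a task would read from its broadcast state.&lt;/p&gt;

```java
// Hypothetical stand-in for a broadcasted pattern of two consecutive actions.
final class ActionPattern {
    final String firstAction;
    final String secondAction;

    ActionPattern(String firstAction, String secondAction) {
        this.firstAction = firstAction;
        this.secondAction = secondAction;
    }
}

// Hypothetical stand-in for the keyed state of one user (one instance per user id).
final class UserMatcher {
    // Previous action of this user; null until the first action arrives.
    private String prevAction;

    // Evaluates the active pattern against the previous and the latest action,
    // then remembers the latest action for the next call.
    boolean onAction(ActionPattern activePattern, String action) {
        boolean matched = false;
        if (activePattern != null) {
            // equals(null) is false, so a missing previous action never matches.
            if (activePattern.firstAction.equals(prevAction)) {
                matched = activePattern.secondAction.equals(action);
            }
        }
        prevAction = action;
        return matched;
    }
}

final class BroadcastMatchDemo {
    public static void main(String[] args) {
        ActionPattern loginLogout = new ActionPattern("login", "logout");
        UserMatcher user1001 = new UserMatcher();
        System.out.println(user1001.onAction(loginLogout, "login"));  // prints false
        System.out.println(user1001.onAction(loginLogout, "logout")); // prints true
    }
}
```

&lt;p&gt;In the sketch, replacing the active pattern simply means passing a different &lt;code&gt;ActionPattern&lt;/code&gt; to the next call, while the previous action of the user is kept. This mirrors the separation between the broadcast state, which is shared by all tasks, and the keyed state, which is maintained per user.&lt;/p&gt;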

&lt;h2 id=&quot;how-to-implement-an-application-with-broadcast-state&quot;&gt;How to Implement an Application with Broadcast State?&lt;/h2&gt;

&lt;p&gt;Until now, we conceptually discussed the application and explained how it uses broadcast state to evaluate dynamic patterns over event streams. Next, we’ll show how to implement the example application with Flink’s DataStream API and the broadcast state feature.&lt;/p&gt;

&lt;p&gt;Let’s start with the input data of the application. We have two data streams, actions and patterns. At this point, we don’t really care where the streams come from, which is why the sources are left as placeholders (&lt;code&gt;???&lt;/code&gt;) in the following snippet. The streams could be ingested from Apache Kafka, Kinesis, or any other system.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;actions&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;???&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;patterns&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;???&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;Action&lt;/code&gt; and &lt;code&gt;Pattern&lt;/code&gt; are POJOs with two fields each:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;code&gt;Action: Long userId, String action&lt;/code&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;code&gt;Pattern: String firstAction, String secondAction&lt;/code&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As a first step, we key the action stream on the &lt;code&gt;userId&lt;/code&gt; attribute.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;KeyedStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;actionsByUser&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;actions&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;KeySelector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;action&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;userId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Next, we prepare the broadcast state. Broadcast state is always represented as &lt;code&gt;MapState&lt;/code&gt;, the most versatile state primitive that Flink provides.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;MapStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Void&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bcStateDescriptor&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 
  &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;patterns&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;VOID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;POJO&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Since our application only evaluates and stores a single &lt;code&gt;Pattern&lt;/code&gt; at a time, we configure the broadcast state as a &lt;code&gt;MapState&lt;/code&gt; with key type &lt;code&gt;Void&lt;/code&gt; and value type &lt;code&gt;Pattern&lt;/code&gt;. The &lt;code&gt;Pattern&lt;/code&gt; is always stored in the &lt;code&gt;MapState&lt;/code&gt; with &lt;code&gt;null&lt;/code&gt; as key.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;BroadcastStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bcedPatterns&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;patterns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;broadcast&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bcStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Using the &lt;code&gt;MapStateDescriptor&lt;/code&gt; for the broadcast state, we apply the &lt;code&gt;broadcast()&lt;/code&gt; transformation on the patterns stream and receive a &lt;code&gt;BroadcastStream bcedPatterns&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;matches&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;actionsByUser&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bcedPatterns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;PatternEvaluator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After obtaining the keyed &lt;code&gt;actionsByUser&lt;/code&gt; stream and the broadcasted &lt;code&gt;bcedPatterns&lt;/code&gt; stream, we &lt;code&gt;connect()&lt;/code&gt; both streams and apply a &lt;code&gt;PatternEvaluator&lt;/code&gt; on the connected streams. &lt;code&gt;PatternEvaluator&lt;/code&gt; is a custom function that extends the abstract &lt;code&gt;KeyedBroadcastProcessFunction&lt;/code&gt; class. It applies the pattern matching logic that we discussed before and emits &lt;code&gt;Tuple2&amp;lt;Long, Pattern&amp;gt;&lt;/code&gt; records which contain the user id and the matched pattern.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PatternEvaluator&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedBroadcastProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
 
  &lt;span class=&quot;c1&quot;&gt;// handle for keyed state (per user)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;ValueState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prevActionState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// broadcast state descriptor&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;MapStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Void&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;patternDesc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
 
  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Configuration&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// initialize keyed state&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;prevActionState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getRuntimeContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;lastAction&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;patternDesc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;patterns&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;VOID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;POJO&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;cm&quot;&gt;/**&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;   * Called for each user action.&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;   * Evaluates the current pattern against the previous and&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;   * current action of the user.&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;   */&lt;/span&gt;
  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;Action&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; 
     &lt;span class=&quot;n&quot;&gt;ReadOnlyContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; 
     &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;c1&quot;&gt;// get current pattern from broadcast state&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;
     &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;patternDesc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
     &lt;span class=&quot;c1&quot;&gt;// access MapState with null as VOID default value&lt;/span&gt;
     &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;c1&quot;&gt;// get previous action of current user from keyed state&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prevAction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prevActionState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prevAction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
     &lt;span class=&quot;c1&quot;&gt;// user had an action before, check if pattern matches&lt;/span&gt;
     &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;firstAction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;equals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prevAction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; 
         &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;secondAction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;equals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
       &lt;span class=&quot;c1&quot;&gt;// MATCH&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getCurrentKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
     &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
   &lt;span class=&quot;c1&quot;&gt;// update keyed state and remember action for next pattern evaluation&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;prevActionState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

 &lt;span class=&quot;cm&quot;&gt;/**&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;  * Called for each new pattern.&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;  * Overwrites the current pattern with the new pattern.&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;  */&lt;/span&gt;
 &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
 &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processBroadcastElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; 
     &lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; 
     &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;c1&quot;&gt;// store the new pattern by updating the broadcast state&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;BroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Void&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bcState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;patternDesc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
   &lt;span class=&quot;c1&quot;&gt;// storing in MapState with null as VOID default value&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;bcState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;put&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;KeyedBroadcastProcessFunction&lt;/code&gt; class provides three methods to process records and emit results.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code&gt;processBroadcastElement()&lt;/code&gt; is called for each record of the broadcasted stream. In our &lt;code&gt;PatternEvaluator&lt;/code&gt; function, we simply put the received &lt;code&gt;Pattern&lt;/code&gt; record into the broadcast state using the &lt;code&gt;null&lt;/code&gt; key (remember, we only store a single pattern in the &lt;code&gt;MapState&lt;/code&gt;).&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;processElement()&lt;/code&gt; is called for each record of the keyed stream. It provides read-only access to the broadcast state to prevent modifications that would result in different broadcast states across the parallel instances of the function. The &lt;code&gt;processElement()&lt;/code&gt; method of the &lt;code&gt;PatternEvaluator&lt;/code&gt; retrieves the current pattern from the broadcast state and the previous action of the user from the keyed state. If both are present, it checks whether the previous and current action match the pattern and emits a pattern match record if that is the case. Finally, it updates the keyed state to the current user action.&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;onTimer()&lt;/code&gt; is called when a previously registered timer fires. Timers can be registered in the &lt;code&gt;processElement&lt;/code&gt; method and are used to perform computations or to clean up state in the future. We did not implement this method in our example to keep the code concise. However, it could be used to remove the last action of a user when the user was not active for a certain period of time to avoid growing state due to inactive users.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You might have noticed the context objects of the &lt;code&gt;KeyedBroadcastProcessFunction&lt;/code&gt;’s processing methods. The context objects give access to additional functionality such as:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The broadcast state (read-write or read-only, depending on the method),&lt;/li&gt;
  &lt;li&gt;A &lt;code&gt;TimerService&lt;/code&gt;, which gives access to the record’s timestamp, the current watermark, and which can register timers,&lt;/li&gt;
  &lt;li&gt;The current key (only available in &lt;code&gt;processElement()&lt;/code&gt;), and&lt;/li&gt;
  &lt;li&gt;A method to apply a function to the keyed state of each registered key (only available in &lt;code&gt;processBroadcastElement()&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;KeyedBroadcastProcessFunction&lt;/code&gt; has full access to Flink state and time features just like any other ProcessFunction and hence can be used to implement sophisticated application logic. Broadcast state was designed to be a versatile feature that adapts to different scenarios and use cases. Although we only discussed a fairly simple and restricted application, you can use broadcast state in many ways to implement the requirements of your application.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In this blog post, we walked you through an example application to explain what Apache Flink’s broadcast state is and how it can be used to evaluate dynamic patterns on event streams. We’ve also discussed the API and showed the source code of our example application.&lt;/p&gt;

&lt;p&gt;We invite you to check the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html&quot;&gt;documentation&lt;/a&gt; of this feature and provide feedback or suggestions for further improvements through our &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/flink-community/&quot;&gt;mailing list&lt;/a&gt;.&lt;/p&gt;
</description>
<pubDate>Wed, 26 Jun 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/2019/06/26/broadcast-state.html</link>
<guid isPermaLink="true">/2019/06/26/broadcast-state.html</guid>
</item>

<item>
<title>A Deep-Dive into Flink&#39;s Network Stack</title>
<description>&lt;style type=&quot;text/css&quot;&gt;
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{padding:10px 20px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{padding:10px 20px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;background-color:#eff0f1;}
.tg .tg-wide{padding:10px 30px;}
.tg .tg-top{vertical-align:top}
.tg .tg-center{text-align:center;vertical-align:center}
&lt;/style&gt;

&lt;p&gt;Flink’s network stack is one of the core components that make up the &lt;code&gt;flink-runtime&lt;/code&gt; module and sit at the heart of every Flink job. It connects individual work units (subtasks) from all TaskManagers. This is where your streamed-in data flows through, and it is therefore crucial to the performance of your Flink job, for both the throughput and the latency you observe. In contrast to the coordination channels between TaskManagers and JobManagers, which use RPCs via Akka, the network stack between TaskManagers relies on a much lower-level API using Netty.&lt;/p&gt;

&lt;p&gt;This blog post is the first in a series of posts about the network stack. In the sections below, we will first have a high-level look at what abstractions are exposed to the stream operators and then go into detail on the physical implementation and the various optimisations Flink applies. We will briefly present the result of these optimisations and Flink’s trade-off between throughput and latency. Future blog posts in this series will elaborate more on monitoring and metrics, tuning parameters, and common anti-patterns.&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#logical-view&quot; id=&quot;markdown-toc-logical-view&quot;&gt;Logical View&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#physical-transport&quot; id=&quot;markdown-toc-physical-transport&quot;&gt;Physical Transport&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#inflicting-backpressure-1&quot; id=&quot;markdown-toc-inflicting-backpressure-1&quot;&gt;Inflicting Backpressure (1)&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#credit-based-flow-control&quot; id=&quot;markdown-toc-credit-based-flow-control&quot;&gt;Credit-based Flow Control&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#inflicting-backpressure-2&quot; id=&quot;markdown-toc-inflicting-backpressure-2&quot;&gt;Inflicting Backpressure (2)&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#what-do-we-gain-where-is-the-catch&quot; id=&quot;markdown-toc-what-do-we-gain-where-is-the-catch&quot;&gt;What do we Gain? Where is the Catch?&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#writing-records-into-network-buffers-and-reading-them-again&quot; id=&quot;markdown-toc-writing-records-into-network-buffers-and-reading-them-again&quot;&gt;Writing Records into Network Buffers and Reading them again&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#flushing-buffers-to-netty&quot; id=&quot;markdown-toc-flushing-buffers-to-netty&quot;&gt;Flushing Buffers to Netty&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#buffer-builder--buffer-consumer&quot; id=&quot;markdown-toc-buffer-builder--buffer-consumer&quot;&gt;Buffer Builder &amp;amp; Buffer Consumer&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#latency-vs-throughput&quot; id=&quot;markdown-toc-latency-vs-throughput&quot;&gt;Latency vs. Throughput&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#conclusion&quot; id=&quot;markdown-toc-conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h2 id=&quot;logical-view&quot;&gt;Logical View&lt;/h2&gt;

&lt;p&gt;Flink’s network stack provides the following logical view to the subtasks when communicating with each other, for example during a network shuffle as required by a &lt;code&gt;keyBy()&lt;/code&gt;.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack1.png&quot; width=&quot;400px&quot; alt=&quot;Logical View on Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;It abstracts over the different settings of the following three concepts:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Subtask output type (&lt;code&gt;ResultPartitionType&lt;/code&gt;):
    &lt;ul&gt;
      &lt;li&gt;&lt;strong&gt;pipelined (bounded or unbounded):&lt;/strong&gt;
  Sending data downstream as soon as it is produced, potentially one-by-one, either as a bounded or unbounded stream of records.&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;blocking:&lt;/strong&gt;
  Sending data downstream only when the full result was produced.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Scheduling type:
    &lt;ul&gt;
      &lt;li&gt;&lt;strong&gt;all at once (eager):&lt;/strong&gt;
  Deploy all subtasks of the job at the same time (for streaming applications).&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;next stage on first output (lazy):&lt;/strong&gt;
  Deploy downstream tasks as soon as any of their producers generated output.&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;next stage on complete output:&lt;/strong&gt;
  Deploy downstream tasks when any or all of their producers have generated their full output set.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;Transport:
    &lt;ul&gt;
      &lt;li&gt;&lt;strong&gt;high throughput:&lt;/strong&gt;
  Instead of sending each record one-by-one, Flink buffers a bunch of records into its network buffers and sends them all together. This reduces the costs per record and leads to higher throughput.&lt;/li&gt;
      &lt;li&gt;&lt;strong&gt;low latency via buffer timeout:&lt;/strong&gt;
  By reducing the timeout of sending an incompletely filled buffer, you may sacrifice throughput for latency.&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We will have a look at the throughput and low latency optimisations in the sections below which look at the physical layers of the network stack. For this part, let us elaborate a bit more on the output and scheduling types. First of all, it is important to know that the subtask output type and the scheduling type are closely intertwined making only specific combinations of the two valid.&lt;/p&gt;

&lt;p&gt;Pipelined result partitions are streaming-style outputs which need a live target subtask to send data to. The target can be scheduled before results are produced or at first output. Batch jobs produce bounded result partitions while streaming jobs produce unbounded results.&lt;/p&gt;

&lt;p&gt;Batch jobs may also produce results in a blocking fashion, depending on the operator and connection pattern that is used. In that case, the complete result must be produced first before the receiving task can be scheduled. This allows batch jobs to work more efficiently and with lower resource usage.&lt;/p&gt;

&lt;p&gt;The following table summarises the valid combinations:
&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;table class=&quot;tg&quot;&gt;
  &lt;tr&gt;
    &lt;th&gt;Output Type&lt;/th&gt;
    &lt;th&gt;Scheduling Type&lt;/th&gt;
    &lt;th&gt;Applies to…&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td rowspan=&quot;2&quot;&gt;pipelined, unbounded&lt;/td&gt;
    &lt;td&gt;all at once&lt;/td&gt;
    &lt;td&gt;Streaming jobs&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;next stage on first output&lt;/td&gt;
    &lt;td&gt;n/a¹&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td rowspan=&quot;2&quot;&gt;pipelined, bounded&lt;/td&gt;
    &lt;td&gt;all at once&lt;/td&gt;
    &lt;td&gt;n/a²&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;next stage on first output&lt;/td&gt;
    &lt;td&gt;Batch jobs&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;blocking&lt;/td&gt;
    &lt;td&gt;next stage on complete output&lt;/td&gt;
    &lt;td&gt;Batch jobs&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;sup&gt;1&lt;/sup&gt; Currently not used by Flink. &lt;br /&gt;
&lt;sup&gt;2&lt;/sup&gt; This may become applicable to streaming jobs once the &lt;a href=&quot;/roadmap.html#batch-and-streaming-unification&quot;&gt;Batch/Streaming unification&lt;/a&gt; is done.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;
Additionally, for subtasks with more than one input, scheduling starts in one of two ways: after &lt;em&gt;all&lt;/em&gt; or after &lt;em&gt;any&lt;/em&gt; of the input producers have produced a record or their complete dataset. For tuning the output types and scheduling decisions in batch jobs, please have a look at &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionConfig.html#setExecutionMode-org.apache.flink.api.common.ExecutionMode-&quot;&gt;ExecutionConfig#setExecutionMode()&lt;/a&gt; - and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionMode.html#enum.constant.detail&quot;&gt;ExecutionMode&lt;/a&gt; in particular - as well as &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionConfig.html#setDefaultInputDependencyConstraint-org.apache.flink.api.common.InputDependencyConstraint-&quot;&gt;ExecutionConfig#setDefaultInputDependencyConstraint()&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;physical-transport&quot;&gt;Physical Transport&lt;/h2&gt;

&lt;p&gt;In order to understand the physical data connections, please recall that, in Flink, different tasks may share the same slot via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/operators/#task-chaining-and-resource-groups&quot;&gt;slot sharing groups&lt;/a&gt;. TaskManagers may also provide more than one slot to allow multiple subtasks of the same task to be scheduled onto the same TaskManager.&lt;/p&gt;

&lt;p&gt;For the example pictured below, we will assume a parallelism of 4 and a deployment with two task managers offering 2 slots each. TaskManager 1 executes subtasks A.1, A.2, B.1, and B.2 and TaskManager 2 executes subtasks A.3, A.4, B.3, and B.4. In a shuffle-type connection between task A and task B, for example from a &lt;code&gt;keyBy()&lt;/code&gt;, there are 2x4 logical connections to handle on each TaskManager, some of which are local, some remote:
&lt;br /&gt;&lt;/p&gt;

&lt;center&gt;
&lt;table class=&quot;tg&quot;&gt;
  &lt;tr&gt;
    &lt;th&gt;&lt;/th&gt;
    &lt;th class=&quot;tg-wide&quot;&gt;B.1&lt;/th&gt;
    &lt;th class=&quot;tg-wide&quot;&gt;B.2&lt;/th&gt;
    &lt;th class=&quot;tg-wide&quot;&gt;B.3&lt;/th&gt;
    &lt;th class=&quot;tg-wide&quot;&gt;B.4&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th class=&quot;tg-wide&quot;&gt;A.1&lt;/th&gt;
    &lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;local&lt;/td&gt;
    &lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;remote&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th class=&quot;tg-wide&quot;&gt;A.2&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th class=&quot;tg-wide&quot;&gt;A.3&lt;/th&gt;
    &lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;remote&lt;/td&gt;
    &lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;local&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;th class=&quot;tg-wide&quot;&gt;A.4&lt;/th&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
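&lt;p&gt;The table above can be reproduced with a small sketch. It is illustrative only and not Flink code: the class &lt;code&gt;ConnectionKinds&lt;/code&gt; and its methods are hypothetical names, and the placement of subtasks 1 and 2 on TaskManager 1 and subtasks 3 and 4 on TaskManager 2 is taken from the example.&lt;/p&gt;

```java
// Illustrative only: subtask-to-TaskManager placement taken from the example above.
final class ConnectionKinds {

    // Subtasks 1 and 2 run on TaskManager 1, subtasks 3 and 4 on TaskManager 2.
    static int taskManagerOf(int subtask) {
        return subtask > 2 ? 2 : 1;
    }

    // A logical connection is local when producer and consumer share a TaskManager.
    static boolean isLocal(int producer, int consumer) {
        return taskManagerOf(producer) == taskManagerOf(consumer);
    }

    public static void main(String[] args) {
        int[] subtasks = {1, 2, 3, 4};
        for (int a : subtasks) {
            for (int b : subtasks) {
                String kind = isLocal(a, b) ? "local" : "remote";
                System.out.println("A." + a + " -> B." + b + ": " + kind);
            }
        }
    }
}
```

&lt;p&gt;Running it lists all 16 logical connections of the shuffle, 8 local and 8 remote, matching the four quadrants of the table.&lt;/p&gt;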

&lt;p&gt;Each (remote) network connection between different tasks will get its own TCP channel in Flink’s network stack. However, if different subtasks of the same task are scheduled onto the same TaskManager, their network connections towards the same TaskManager will be multiplexed and share a single TCP channel for reduced resource usage. In our example, this would apply to A.1 → B.3, A.1 → B.4, as well as A.2 → B.3, and A.2 → B.4 as pictured below:
&lt;br /&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack2.png&quot; width=&quot;700px&quot; alt=&quot;Physical-transport-Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;The result of each subtask is called a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/partition/ResultPartition.html&quot;&gt;ResultPartition&lt;/a&gt;, which is split into separate &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/partition/ResultSubpartition.html&quot;&gt;ResultSubpartitions&lt;/a&gt;, one for each logical channel. At this point in the stack, Flink is not dealing with individual records anymore but instead with a group of serialised records assembled together into network buffers. The number of buffers available to each subtask in its own local buffer pool (one per sending and receiving side each) is limited to at most&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;#channels * buffers-per-channel + floating-buffers-per-gate
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The total number of buffers on a single TaskManager usually does not need configuration. See the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#configuring-the-network-buffers&quot;&gt;Configuring the Network Buffers&lt;/a&gt; documentation for details on how to do so if needed.&lt;/p&gt;
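&lt;p&gt;As a rough worked example of the per-subtask limit given above, the following sketch evaluates the formula for a single gate. The class and method names are hypothetical, and the values 2 and 8 are assumed here to be the defaults of &lt;code&gt;buffers-per-channel&lt;/code&gt; and &lt;code&gt;floating-buffers-per-gate&lt;/code&gt;; please check the configuration documentation of your Flink version before relying on them.&lt;/p&gt;

```java
// Hypothetical helper, not part of Flink: evaluates
// #channels * buffers-per-channel + floating-buffers-per-gate.
final class BufferLimit {

    static int maxBuffers(int channels, int buffersPerChannel, int floatingBuffersPerGate) {
        return channels * buffersPerChannel + floatingBuffersPerGate;
    }

    public static void main(String[] args) {
        // Assumed values: 4 channels (parallelism 4 as in the example),
        // 2 buffers per channel, 8 floating buffers per gate.
        System.out.println(maxBuffers(4, 2, 8)); // prints 16
    }
}
```

&lt;p&gt;Under these assumptions, a subtask with four channels could use at most 16 buffers in its local buffer pool.&lt;/p&gt;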

&lt;h3 id=&quot;inflicting-backpressure-1&quot;&gt;Inflicting Backpressure (1)&lt;/h3&gt;

&lt;p&gt;Whenever a subtask’s sending buffer pool is exhausted — buffers reside in either a result subpartition’s buffer queue or inside the lower, Netty-backed network stack — the producer is blocked, cannot continue, and experiences backpressure. The receiver works in a similar fashion: any incoming Netty buffer in the lower network stack needs to be made available to Flink via a network buffer. If there is no network buffer available in the appropriate subtask’s buffer pool, Flink will stop reading from this channel until a buffer becomes available. This would effectively backpressure all sending subtasks on this multiplex and therefore also throttle other receiving subtasks. The following picture illustrates this for an overloaded subtask B.4 which would cause backpressure on the multiplex and also stop subtask B.3 from receiving and processing further buffers, even though it still has capacity.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack3.png&quot; width=&quot;700px&quot; alt=&quot;Physical-transport-backpressure-Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;To prevent this situation from even happening, Flink 1.5 introduced its own flow control mechanism.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;credit-based-flow-control&quot;&gt;Credit-based Flow Control&lt;/h2&gt;

&lt;p&gt;Credit-based flow control makes sure that the receiver has the capacity to handle whatever is “on the wire”. It is based on the availability of network buffers as a natural extension of the mechanisms Flink had before. Instead of only having a shared local buffer pool, each remote input channel now has its own set of &lt;strong&gt;exclusive buffers&lt;/strong&gt;. Conversely, buffers in the local buffer pool are called &lt;strong&gt;floating buffers&lt;/strong&gt; as they will float around and are available to every input channel.&lt;/p&gt;

&lt;p&gt;Receivers will announce the availability of buffers as &lt;strong&gt;credits&lt;/strong&gt; to the sender (1 buffer = 1 credit). Each result subpartition will keep track of its &lt;strong&gt;channel credits&lt;/strong&gt;. Buffers are only forwarded to the lower network stack if credit is available and each sent buffer reduces the credit score by one. In addition to the buffers, we also send information about the current &lt;strong&gt;backlog&lt;/strong&gt; size which specifies how many buffers are waiting in this subpartition’s queue. The receiver will use this to ask for an appropriate number of floating buffers for faster backlog processing. It will try to acquire as many floating buffers as the backlog size but this may not always be possible and we may get some or no buffers at all. The receiver will make use of the retrieved buffers and will listen for further buffers becoming available to continue.
&lt;br /&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack4.png&quot; width=&quot;700px&quot; alt=&quot;Physical-transport-credit-flow-Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Credit-based flow control will use &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-buffers-per-channel&quot;&gt;buffers-per-channel&lt;/a&gt; to specify how many buffers are exclusive (mandatory) and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-floating-buffers-per-gate&quot;&gt;floating-buffers-per-gate&lt;/a&gt; for the local buffer pool (optional&lt;sup&gt;3&lt;/sup&gt;) thus achieving the same buffer limit as without flow control. The default values for these two parameters have been chosen so that the maximum (theoretical) throughput with flow control is at least as good as without flow control, given a healthy network with usual latencies. You may need to adjust these depending on your actual round-trip-time and bandwidth.
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;sup&gt;3&lt;/sup&gt; If there are not enough buffers available, each buffer pool will get the same share of the globally available ones (± 1).&lt;/p&gt;

&lt;h3 id=&quot;inflicting-backpressure-2&quot;&gt;Inflicting Backpressure (2)&lt;/h3&gt;

&lt;p&gt;As opposed to the receiver’s backpressure mechanisms without flow control, credits provide a more direct control: If a receiver cannot keep up, its available credits will eventually hit 0 and stop the sender from forwarding buffers to the lower network stack. There is backpressure on this logical channel only and there is no need to block reading from a multiplexed TCP channel. Other receivers are therefore not affected in processing available buffers.&lt;/p&gt;
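&lt;p&gt;The per-channel effect of credits can be illustrated with a heavily simplified model. This is a sketch under stated assumptions and not Flink’s implementation: the class &lt;code&gt;CreditDemo&lt;/code&gt;, the &lt;code&gt;Subpartition&lt;/code&gt; type and its fields are hypothetical, floating buffers and backlog announcements are left out, and each receiver is assumed to start with 2 credits and a backlog of 6 buffers.&lt;/p&gt;

```java
// Simplified model, not Flink code: one sender-side subpartition per logical channel.
final class CreditDemo {

    static final class Subpartition {
        int credit;   // buffers the receiver has announced it can take
        int backlog;  // buffers queued and waiting to be sent
        int sent;     // buffers handed to the lower network stack

        Subpartition(int credit, int backlog) {
            this.credit = credit;
            this.backlog = backlog;
        }

        // Forward buffers only while both credit and queued buffers remain;
        // each forwarded buffer costs one credit.
        void flush() {
            int n = Math.min(credit, backlog);
            credit -= n;
            backlog -= n;
            sent += n;
        }

        // The receiver announces newly available buffers as credit.
        void addCredit(int n) {
            credit += n;
        }
    }

    public static void main(String[] args) {
        Subpartition toB3 = new Subpartition(2, 6); // healthy receiver
        Subpartition toB4 = new Subpartition(2, 6); // overloaded receiver

        // Both channels share one connection; loop until B.3 has drained its backlog.
        while (toB3.backlog > 0) {
            toB3.flush();
            toB4.flush();
            toB3.addCredit(2); // B.3 frees its buffers and announces credit again
            // B.4 is stuck and never announces new credit
        }

        System.out.println("B.3 sent " + toB3.sent + ", backlog " + toB3.backlog);
        System.out.println("B.4 sent " + toB4.sent + ", backlog " + toB4.backlog);
    }
}
```

&lt;p&gt;In this model, the channel to B.4 stops after its two initial credits are used up, while the channel to B.3 drains its whole backlog over the same shared connection.&lt;/p&gt;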

&lt;h3 id=&quot;what-do-we-gain-where-is-the-catch&quot;&gt;What do we Gain? Where is the Catch?&lt;/h3&gt;

&lt;p&gt;&lt;img align=&quot;right&quot; src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack5.png&quot; width=&quot;300&quot; height=&quot;200&quot; alt=&quot;Physical-transport-credit-flow-checkpoints-Flink&#39;s Network Stack&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Since, with flow control, a channel in a multiplex cannot block another of its logical channels, the overall resource utilisation should increase. In addition, by having full control over how much data is “on the wire”, we are also able to improve &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/internals/stream_checkpointing.html#checkpointing&quot;&gt;checkpoint alignments&lt;/a&gt;: without flow control, it would take a while for the channel to fill the network stack’s internal buffers and propagate that the receiver is not reading anymore. During that time, a lot of buffers could be sitting around. Any checkpoint barrier would have to queue up behind these buffers and would thus have to wait until all of those have been processed before it can start (“Barriers never overtake records!”).&lt;/p&gt;

&lt;p&gt;However, the additional announce messages from the receiver may come at some additional cost, especially in setups using SSL-encrypted channels. Also, a single input channel cannot make use of all buffers in the buffer pool because exclusive buffers are not shared. It also cannot start right away with sending as much data as is available, so during ramp-up (if you are producing data faster than credits are announced in return) it may take longer to send data through. While this may affect your job’s performance, it is usually better to have flow control because of all its advantages. You may want to increase the number of exclusive buffers via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-buffers-per-channel&quot;&gt;buffers-per-channel&lt;/a&gt; at the cost of using more memory. The overall memory use compared to the previous implementation, however, may still be lower because lower network stacks do not need to buffer much data any more since we can always transfer that to Flink immediately.&lt;/p&gt;

&lt;p&gt;There is one more thing you may notice when using credit-based flow control: since we buffer less data between the sender and receiver, you may experience backpressure earlier. This is, however, desired and you do not really get any advantage by buffering more data. If you want to buffer more but keep flow control, you could consider increasing the number of floating buffers via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-floating-buffers-per-gate&quot;&gt;floating-buffers-per-gate&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;center&gt;
&lt;table class=&quot;tg&quot;&gt;
  &lt;tr&gt;
    &lt;th&gt;Advantages&lt;/th&gt;
    &lt;th&gt;Disadvantages&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;tg-top&quot;&gt;
    • better resource utilisation with data skew in multiplexed connections &lt;br /&gt;&lt;br /&gt;
    • improved checkpoint alignment&lt;br /&gt;&lt;br /&gt;
    • reduced memory use (less data in lower network layers)&lt;/td&gt;
    &lt;td class=&quot;tg-top&quot;&gt;
    • additional credit-announce messages&lt;br /&gt;&lt;br /&gt;
    • additional backlog-announce messages (piggy-backed with buffer messages, almost no overhead)&lt;br /&gt;&lt;br /&gt;
    • potential round-trip latency&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot;&gt;• backpressure appears earlier&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;alert alert-info&quot;&gt;
  &lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
If you need to turn off credit-based flow control, you can add this to your &lt;code&gt;flink-conf.yaml&lt;/code&gt;:&lt;/p&gt;

  &lt;p&gt;&lt;code&gt;taskmanager.network.credit-model: false&lt;/code&gt;&lt;/p&gt;

  &lt;p&gt;This parameter, however, is deprecated and will eventually be removed along with the non-credit-based flow control code.&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;writing-records-into-network-buffers-and-reading-them-again&quot;&gt;Writing Records into Network Buffers and Reading them again&lt;/h2&gt;

&lt;p&gt;The following picture extends the slightly more high-level view from above with further details of the network stack and its surrounding components, from the collection of a record in your sending operator to the receiving operator getting it:
&lt;br /&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack6.png&quot; width=&quot;700px&quot; alt=&quot;Physical-transport-complete-Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;After creating a record and passing it along, for example via &lt;code&gt;Collector#collect()&lt;/code&gt;, it is given to the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/writer/RecordWriter.html&quot;&gt;RecordWriter&lt;/a&gt; which serialises the record from a Java object into a sequence of bytes which eventually ends up in a network buffer that is handed along as described above. The RecordWriter first serialises the record to a flexible on-heap byte array using the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/serialization/SpanningRecordSerializer.html&quot;&gt;SpanningRecordSerializer&lt;/a&gt;. Afterwards, it tries to write these bytes into the associated network buffer of the target network channel. We will come back to this last part in the section below.&lt;/p&gt;

&lt;p&gt;On the receiver’s side, the lower network stack (Netty) is writing received buffers into the appropriate input channels. The (stream) task’s thread eventually reads from these queues and tries to deserialise the accumulated bytes into Java objects with the help of the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/reader/RecordReader.html&quot;&gt;RecordReader&lt;/a&gt; and going through the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/serialization/SpillingAdaptiveSpanningRecordDeserializer.html&quot;&gt;SpillingAdaptiveSpanningRecordDeserializer&lt;/a&gt;. Similar to the serialiser, this deserialiser must also deal with special cases like records spanning multiple network buffers, either because the record is just bigger than a network buffer (32KiB by default, set via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-memory-segment-size&quot;&gt;taskmanager.memory.segment-size&lt;/a&gt;) or because the serialised record was added to a network buffer which did not have enough remaining bytes. Flink will nevertheless use these bytes and continue writing the rest to a new network buffer.
&lt;br /&gt;&lt;/p&gt;

&lt;h3 id=&quot;flushing-buffers-to-netty&quot;&gt;Flushing Buffers to Netty&lt;/h3&gt;

&lt;p&gt;In the picture above, the credit-based flow control mechanics actually sit inside the “Netty Server” (and “Netty Client”) components and the buffer the RecordWriter is writing to is always added to the result subpartition in an empty state and then gradually filled with (serialised) records. But when does Netty actually get the buffer? Obviously, it cannot take bytes whenever they become available since that would not only add substantial costs due to cross-thread communication and synchronisation, but also make the whole buffering obsolete.&lt;/p&gt;

&lt;p&gt;In Flink, there are three situations that make a buffer available for consumption by the Netty server:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;a buffer becomes full when writing a record to it, or&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;the buffer timeout hits, or&lt;br /&gt;&lt;/li&gt;
  &lt;li&gt;a special event such as a checkpoint barrier is sent.&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;flush-after-buffer-full&quot;&gt;Flush after Buffer Full&lt;/h4&gt;

&lt;p&gt;The RecordWriter works with a local serialisation buffer for the current record and will gradually write these bytes to one or more network buffers sitting at the appropriate result subpartition queue. Although a RecordWriter can work on multiple subpartitions, each subpartition has only one RecordWriter writing data to it. The Netty server, on the other hand, is reading from multiple result subpartitions and multiplexing the appropriate ones into a single channel as described above. This is a classical producer-consumer pattern with the network buffers in the middle, as shown in the next picture. After (1) serialising and (2) writing data to the buffer, the RecordWriter updates the buffer’s writer index accordingly. Once the buffer is completely filled, the record writer will (3) acquire a new buffer from its local buffer pool for any remaining bytes of the current record (or for the next one) and add the new one to the subpartition queue. This will (4) notify the Netty server of data being available if it is not aware yet&lt;sup&gt;4&lt;/sup&gt;. Whenever Netty has capacity to handle this notification, it will (5) take the buffer and send it along the appropriate TCP channel.
&lt;br /&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack7.png&quot; width=&quot;500px&quot; alt=&quot;Record-writer-to-network-Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;sup&gt;4&lt;/sup&gt;We can assume it already got the notification if there are more finished buffers in the queue.
&lt;br /&gt;&lt;/p&gt;

&lt;h4 id=&quot;flush-after-buffer-timeout&quot;&gt;Flush after Buffer Timeout&lt;/h4&gt;

&lt;p&gt;In order to support low-latency use cases, we cannot rely only on buffers being full in order to send data downstream. There may be cases where a certain communication channel does not have many records flowing through it, and waiting for a full buffer would unnecessarily increase the latency of the few records you actually have. Therefore, a periodic process will flush whatever data is available down the stack: the output flusher. The periodic interval can be configured via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.html#setBufferTimeout-long-&quot;&gt;StreamExecutionEnvironment#setBufferTimeout&lt;/a&gt; and acts as an upper bound on the latency&lt;sup&gt;5&lt;/sup&gt; (for low-throughput channels). The following picture shows how it interacts with the other components: the RecordWriter serialises and writes into network buffers as before but concurrently, the output flusher may (3,4) notify the Netty server of data being available if Netty is not already aware (similar to the “buffer full” scenario above). When Netty handles this notification (5) it will consume the available data from the buffer and update the buffer’s reader index. The buffer stays in the queue, and any further operation on this buffer from the Netty server side will continue reading from the reader index next time.
&lt;br /&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack8.png&quot; width=&quot;500px&quot; alt=&quot;Record-writer-to-network-with-flusher-Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;sup&gt;5&lt;/sup&gt;Strictly speaking, the output flusher does not give any guarantees - it only sends a notification to Netty which can pick it up at will / capacity. This also means that the output flusher has no effect if the channel is backpressured.
&lt;br /&gt;&lt;/p&gt;
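
&lt;p&gt;As a minimal sketch of how the buffer timeout might be set for a whole job: the value of 10 ms and the job name are arbitrary choices for illustration, and the pipeline itself is only a placeholder. The example needs the Flink streaming dependency on the classpath:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BufferTimeoutExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // let the output flusher run every 10 ms instead of the default 100 ms
        env.setBufferTimeout(10);

        // placeholder pipeline, only here to make the job executable
        env.fromElements(1, 2, 3).print();
        env.execute(&amp;quot;buffer-timeout-example&amp;quot;);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;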

&lt;h4 id=&quot;flush-after-special-event&quot;&gt;Flush after special event&lt;/h4&gt;

&lt;p&gt;Some special events also trigger immediate flushes when they are sent through the RecordWriter. The most important ones are checkpoint barriers and end-of-partition events, which obviously should go quickly and not wait for the output flusher to kick in.
&lt;br /&gt;&lt;/p&gt;

&lt;h4 id=&quot;further-remarks&quot;&gt;Further remarks&lt;/h4&gt;

&lt;p&gt;In contrast to Flink &amp;lt; 1.5, please note that (a) network buffers are now placed in the subpartition queues directly and (b) we are not closing the buffer on each flush. This gives us a few advantages:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;less synchronisation overhead (output flusher and RecordWriter are independent)&lt;/li&gt;
  &lt;li&gt;in high-load scenarios where Netty is the bottleneck (either through backpressure or directly), we can still accumulate data in incomplete buffers&lt;/li&gt;
  &lt;li&gt;significant reduction of Netty notifications&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, you may notice an increased CPU use and TCP packet rate during low load scenarios. This is because, with the changes, Flink will use any &lt;em&gt;available&lt;/em&gt; CPU cycles to try to maintain the desired latency. Once the load increases, this will self-adjust by buffers filling up more. High load scenarios are not affected and even get a better throughput because of the reduced synchronisation overhead.
&lt;br /&gt;&lt;/p&gt;

&lt;h3 id=&quot;buffer-builder--buffer-consumer&quot;&gt;Buffer Builder &amp;amp; Buffer Consumer&lt;/h3&gt;

&lt;p&gt;If you want to dig deeper into how the producer-consumer mechanics are implemented in Flink, please take a closer look at the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferBuilder.html&quot;&gt;BufferBuilder&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferConsumer.html&quot;&gt;BufferConsumer&lt;/a&gt; classes which have been introduced in Flink 1.5. While reading is potentially only &lt;em&gt;per buffer&lt;/em&gt;, writing to it is &lt;em&gt;per record&lt;/em&gt; and thus on the hot path for all network communication in Flink. Therefore, it was very clear to us that we needed a lightweight connection between the task’s thread and the Netty thread which does not imply too much synchronisation overhead. For further details, we suggest checking out the &lt;a href=&quot;https://github.com/apache/flink/tree/release-1.8/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/buffer&quot;&gt;source code&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;latency-vs-throughput&quot;&gt;Latency vs. Throughput&lt;/h2&gt;

&lt;p&gt;Network buffers were introduced to get higher resource utilisation and higher throughput at the cost of having some records wait in buffers a little longer. Although an upper limit to this wait time can be given via the buffer timeout, you may be curious to find out more about the trade-off between these two dimensions: latency and throughput, as, obviously, you cannot get both. The following plot shows various values for the buffer timeout starting at 0 (flush with every record) to 100ms (the default) and shows the resulting throughput rates on a cluster with 100 nodes and 8 slots each running a job that has no business logic and thus only tests the network stack. For comparison, we also plot Flink 1.4 before the low-latency improvements (as described above) were added.
&lt;br /&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack9.png&quot; width=&quot;650px&quot; alt=&quot;Network-buffertimeout-Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;As you can see, with Flink 1.5+, even very low buffer timeouts such as 1ms (for low-latency scenarios) provide a maximum throughput as high as 75% of the default timeout where more data is buffered before being sent over the wire.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Now you know about result partitions, the different network connections and scheduling types for both batch and streaming. You also know about credit-based flow control and how the network stack works internally, in order to reason about network-related tuning parameters and about certain job behaviours. Future blog posts in this series will build upon this knowledge and go into more operational details including relevant metrics to look at, further network stack tuning, and common antipatterns to avoid. Stay tuned for more.&lt;/p&gt;

</description>
<pubDate>Wed, 05 Jun 2019 10:45:00 +0200</pubDate>
<link>https://flink.apache.org/2019/06/05/flink-network-stack.html</link>
<guid isPermaLink="true">/2019/06/05/flink-network-stack.html</guid>
</item>

<item>
<title>State TTL in Flink 1.8.0: How to Automatically Cleanup Application State in Apache Flink</title>
<description>&lt;p&gt;A common requirement for many stateful streaming applications is to automatically clean up application state for effective management of your state size, or to control how long the application state can be accessed (e.g. due to legal regulations like the GDPR). The state time-to-live (TTL) feature was initiated in Flink 1.6.0 and enabled application state cleanup and efficient state size management in Apache Flink.&lt;/p&gt;

&lt;p&gt;In this post, we motivate the State TTL feature and discuss its use cases. Moreover, we show how to use and configure it. We explain how Flink internally manages state with TTL and present some exciting additions to the feature in Flink 1.8.0. The blog post concludes with an outlook on future improvements and extensions.&lt;/p&gt;

&lt;h1 id=&quot;the-transient-nature-of-state&quot;&gt;The Transient Nature of State&lt;/h1&gt;

&lt;p&gt;There are two major reasons why state should be maintained only for a limited time. For example, let’s imagine a Flink application that ingests a stream of user login events and stores for each user the time of the last login to improve the experience of frequent visitors.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Controlling the size of state.&lt;/strong&gt;
Being able to efficiently manage an ever-growing state size is a primary use case for state TTL. Oftentimes, data needs to be persisted temporarily while there is some user activity around it, e.g. web sessions. When the activity ends, the data is no longer of interest but still occupies storage. Flink 1.8.0 introduces background cleanup of old state based on TTL that makes the eviction of no-longer-necessary data frictionless. Previously, the application developer had to take extra actions and explicitly remove useless state to free storage space. This manual cleanup procedure was not only error-prone but also less efficient than the new lazy method to remove state. In our example, the time of the last login might no longer be needed after some time because the user can then be treated as “infrequent”.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Complying with data protection and sensitive data requirements.&lt;/strong&gt;
Recent developments around data privacy regulations, such as the General Data Protection Regulation (GDPR) introduced by the European Union, make compliance with such data requirements or treating sensitive data a top priority for many use cases and applications. An example of such use cases includes applications that require keeping data for a specific timeframe and preventing access to it thereafter. This is a common challenge for companies providing short-term services to their customers. The state TTL feature gives guarantees for how long an application can access state and hence can help to comply with data protection regulations.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Both requirements can be addressed by a feature that periodically, yet continuously, removes the state for a key once it becomes unnecessary or unimportant and there is no requirement to keep it in storage any more.&lt;/p&gt;

&lt;h1 id=&quot;state-ttl-for-continuous-cleanup-of-application-state&quot;&gt;State TTL for continuous cleanup of application state&lt;/h1&gt;

&lt;p&gt;The 1.6.0 release of Apache Flink introduced the State TTL feature. It enabled developers of stream processing applications to configure the state of operators to expire and be cleaned up after a defined timeout (time-to-live). In Flink 1.8.0 the feature was extended with continuous cleanup of old entries (according to the TTL setting) for both the RocksDB and the heap state backends (FsStateBackend and MemoryStateBackend).&lt;/p&gt;

&lt;p&gt;In Flink’s DataStream API, application state is defined by a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/state.html#using-managed-keyed-state&quot;&gt;state descriptor&lt;/a&gt;. State TTL is configured by passing a &lt;code&gt;StateTtlConfig&lt;/code&gt; object to a state descriptor. The following Java example shows how to create a state TTL configuration and provide it to the state descriptor that holds the last login time of a user as a &lt;code&gt;Long&lt;/code&gt; value:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.api.common.state.StateTtlConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.api.common.time.Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.api.common.state.ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ttlConfig&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;days&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setUpdateType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;UpdateType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;OnCreateAndWrite&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setStateVisibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;StateVisibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;NeverReturnExpired&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
    
&lt;span class=&quot;n&quot;&gt;ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lastUserLogin&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 
    &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;lastUserLogin&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;lastUserLogin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;enableTimeToLive&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ttlConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Flink provides multiple options to configure the behavior of the state TTL functionality.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;When is the Time-to-Live reset?&lt;/strong&gt; 
By default, the expiration time of a state entry is updated when the state is modified. Optionally, it can also be updated on read access at the cost of an additional write operation to update the timestamp.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Can the expired state be accessed one last time?&lt;/strong&gt; 
State TTL employs a lazy strategy to clean up expired state. This can lead to the situation that an application attempts to read state which is expired but hasn’t been removed yet. You can configure whether such a read request returns the expired state or not. In either case, the expired state is immediately removed afterwards. While the option of returning expired state favors data availability, not returning expired state can be required for data protection regulations.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Which time semantics are used for the Time-to-Live timers?&lt;/strong&gt; 
With Flink 1.8.0, users can only define a state TTL in terms of processing time. The support for event time is planned for future Apache Flink releases.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
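
&lt;p&gt;The first two options map to builder methods on &lt;code&gt;StateTtlConfig&lt;/code&gt;. As a sketch, a configuration that deviates from the settings used in the earlier example (the 7-day TTL is simply carried over from there) might look like this:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;

StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    // also reset the TTL on every read, at the cost of an additional write
    .setUpdateType(StateTtlConfig.UpdateType.OnReadAndWrite)
    // return expired values as long as they have not been cleaned up yet
    .setStateVisibility(StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp)
    .build();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;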

&lt;p&gt;You can read more about how to use state TTL in the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#state-time-to-live-ttl&quot;&gt;Apache Flink documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Internally, the State TTL feature is implemented by storing an additional timestamp of the last relevant state access, along with the actual state value. While this approach adds some storage overhead, it allows Flink to check for the expired state during state access, checkpointing, recovery, or dedicated storage cleanup procedures.&lt;/p&gt;
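
&lt;p&gt;Conceptually, this boils down to wrapping each value with a timestamp and comparing it against the current time. The class below is a hypothetical, stand-alone illustration of that idea and not Flink’s internal implementation; the names and the exact expiry condition (expired once the full TTL has elapsed) are our assumptions:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;/** Hypothetical value-plus-timestamp wrapper; not a Flink class. */
public class TtlValueSketch {
    private final long value;
    private final long lastAccessTimestamp; // processing time in milliseconds

    public TtlValueSketch(long value, long lastAccessTimestamp) {
        this.value = value;
        this.lastAccessTimestamp = lastAccessTimestamp;
    }

    public long getValue() {
        return value;
    }

    /** Assumption: a value counts as expired once the full TTL has elapsed. */
    public boolean isExpired(long ttlMillis, long nowMillis) {
        return nowMillis - lastAccessTimestamp &amp;gt;= ttlMillis;
    }

    public static void main(String[] args) {
        long ttl = 7L * 24 * 60 * 60 * 1000; // seven days in milliseconds
        TtlValueSketch lastLogin = new TtlValueSketch(42L, 0L);
        System.out.println(lastLogin.isExpired(ttl, ttl - 1)); // false: one millisecond left
        System.out.println(lastLogin.isExpired(ttl, ttl));     // true: the TTL has elapsed
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;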

&lt;h1 id=&quot;taking-out-the-garbage&quot;&gt;“Taking out the Garbage”&lt;/h1&gt;

&lt;p&gt;When a state object is accessed in a read operation, Flink will check its timestamp and clear the state if it is expired (depending on the configured state visibility, the expired state is returned or not). Due to this lazy removal, expired state that is never accessed again will forever occupy storage space unless it is garbage collected.&lt;/p&gt;

&lt;p&gt;So how can the expired state be removed without the application logic explicitly taking care of it? In general, there are different possible strategies to remove it in the background.&lt;/p&gt;

&lt;h2 id=&quot;keep-full-state-snapshots-clean&quot;&gt;Keep full state snapshots clean&lt;/h2&gt;

&lt;p&gt;Flink 1.6.0 already supported automatic eviction of the expired state when a full snapshot for a checkpoint or savepoint is taken. Note that state eviction is not applied for incremental checkpoints. State eviction on full snapshots must be explicitly enabled as shown in the following example:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ttlConfig&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;days&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;cleanupFullSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The local storage stays untouched but the size of the stored snapshot is reduced. The local state of an operator will only be cleaned up when the operator reloads its state from a snapshot, i.e. in case of recovery or when starting from a savepoint.&lt;/p&gt;

&lt;p&gt;Due to these limitations, applications still need to actively remove state after it expired in Flink 1.6.0. To improve the user experience, Flink 1.8.0 introduces two more autonomous cleanup strategies, one for each of Flink’s two state backend types. We describe them below.&lt;/p&gt;

&lt;h2 id=&quot;incremental-cleanup-in-heap-state-backends&quot;&gt;Incremental cleanup in Heap state backends&lt;/h2&gt;

&lt;p&gt;This approach is specific to the Heap state backends (FsStateBackend and MemoryStateBackend). The idea is that the storage backend keeps a lazy global iterator over all state entries. Certain events, for instance state access, trigger an incremental cleanup. Every time an incremental cleanup is triggered, the iterator is advanced. The traversed state entries are checked and expired ones are removed. The following code example shows how to enable incremental cleanup:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ttlConfig&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;days&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// check 10 keys for every state access&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;cleanupIncrementally&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If enabled, every state access triggers a cleanup step. For every cleanup step, a certain number of state entries are checked for expiration. There are two tuning parameters. The first defines the number of state entries to check for each cleanup step. The second parameter is a flag to trigger a cleanup step after each processed record, in addition to each state access.&lt;/p&gt;

&lt;p&gt;There are two important caveats about this approach:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The first one is that the time spent for the incremental cleanup increases the record processing latency.&lt;/li&gt;
  &lt;li&gt;The second one should be practically negligible but still worth mentioning: if no state is accessed or no records are processed, expired state won’t be removed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;rocksdb-background-compaction-to-filter-out-expired-state&quot;&gt;RocksDB background compaction to filter out expired state&lt;/h2&gt;

&lt;p&gt;If your application uses the RocksDB state backend, you can enable another cleanup strategy which is based on a Flink-specific compaction filter. RocksDB periodically runs asynchronous compactions to merge state updates and reduce storage. The Flink compaction filter checks the expiration timestamp of state entries with TTL and discards all expired values.&lt;/p&gt;

&lt;p&gt;The first step to activate this feature is to configure the RocksDB state backend by setting the following Flink configuration option: &lt;code&gt;state.backend.rocksdb.ttl.compaction.filter.enabled&lt;/code&gt;. Once the RocksDB state backend is configured, the compaction cleanup strategy is enabled for a state as shown in the following code example:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ttlConfig&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;days&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;cleanupInRocksdbCompactFilter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Keep in mind that calling the Flink TTL filter slows down the RocksDB compaction.&lt;/p&gt;

&lt;h2 id=&quot;eager-state-cleanup-with-timers&quot;&gt;Eager State Cleanup with Timers&lt;/h2&gt;

&lt;p&gt;Another way to clean up state manually is based on Flink timers. This is an idea that the community is currently evaluating for future releases. With this approach, a cleanup timer is registered for every state access. This approach is more predictable because state is eagerly removed as soon as it expires. However, it is more expensive because the timers consume storage along with the original state.&lt;/p&gt;

&lt;h1 id=&quot;future-work&quot;&gt;Future work&lt;/h1&gt;

&lt;p&gt;Apart from including the timer-based cleanup strategy mentioned above, the Flink community plans to further improve the state TTL feature. Possible improvements include adding TTL support for event time (only processing time is supported at the moment) and enabling State TTL for queryable state.&lt;/p&gt;

&lt;p&gt;We encourage you to join the conversation and share your thoughts and ideas in the &lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;Apache Flink JIRA board&lt;/a&gt; or on the Apache Flink dev mailing list. Feedback and suggestions are always appreciated.&lt;/p&gt;

&lt;h1 id=&quot;summary&quot;&gt;Summary&lt;/h1&gt;

&lt;p&gt;Time-based state access restrictions and controlling the size of application state are common challenges in the world of stateful stream processing. Flink’s 1.8.0 release significantly improves the State TTL feature by adding support for continuous background cleanup of expired state objects. The new cleanup mechanisms relieve you of manually implementing state cleanup. They are also more efficient due to their lazy nature. State TTL gives you control over the size of your application state so that you can focus on the core logic of your applications.&lt;/p&gt;
</description>
<pubDate>Sun, 19 May 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/2019/05/19/state-ttl.html</link>
<guid isPermaLink="true">/2019/05/19/state-ttl.html</guid>
</item>

<item>
<title>Flux capacitor, huh? Temporal Tables and Joins in Streaming SQL</title>
<description>&lt;p&gt;Figuring out how to manage and model temporal data for effective point-in-time analysis was a longstanding battle, dating as far back as the early 1980s, that culminated with the introduction of temporal tables in the SQL standard in 2011. Up to that point, users were doomed to implement this as part of the application logic, often lengthening the development lifecycle and hurting the maintainability of the code. And, although there isn’t a single, commonly accepted definition of &lt;strong&gt;temporal data&lt;/strong&gt;, the challenge it represents is one and the same: how do we validate or enrich data against dynamically changing, historical datasets?&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-05-13-temporal-tables/TemporalTables1.png&quot; width=&quot;500px&quot; alt=&quot;Taxi Fares and Conversion Rates&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;For example:&lt;/strong&gt; given a stream with Taxi Fare events tied to the local currency of the ride location, we might want to convert the fare price to a common currency for further processing. As conversion rates excel at fluctuating over time, each Taxi Fare event would need to be matched to the rate that was valid at the time the event occurred in order to produce a reliable result.&lt;/p&gt;

&lt;h2 id=&quot;modelling-temporal-data-with-flink&quot;&gt;Modelling Temporal Data with Flink&lt;/h2&gt;

&lt;p&gt;In the 1.7 release, Flink introduced the concept of &lt;strong&gt;temporal tables&lt;/strong&gt; into its streaming SQL and Table API: parameterized views on append-only tables (that is, tables that only allow records to be inserted, never updated or deleted) that are interpreted as a changelog and keep data closely tied to its time context, so that each record is interpreted as valid only within a specific period of time. Transforming a stream into a temporal table requires:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Defining a &lt;strong&gt;primary key&lt;/strong&gt; and a &lt;strong&gt;versioning field&lt;/strong&gt; that can be used to keep track of the changes that happen over time;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Exposing the stream as a &lt;strong&gt;temporal table function&lt;/strong&gt; that maps each point in time to a static relation.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Going back to our example use case, a temporal table is just what we need to model the conversion rate data so that it is useful for point-in-time querying. Temporal table functions are implemented as an extension of Flink’s generic &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/udfs.html#table-functions&quot;&gt;table function&lt;/a&gt; class and can be defined in the same straightforward way to be used with the Table API or SQL parser.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.table.functions.TemporalTableFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
 
&lt;span class=&quot;o&quot;&gt;(...)&lt;/span&gt;
 
&lt;span class=&quot;c1&quot;&gt;// Get the stream and table environments.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
 
&lt;span class=&quot;c1&quot;&gt;// Provide a sample static data set of the rates history table.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ratesHistoryData&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
 
&lt;span class=&quot;n&quot;&gt;ratesHistoryData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;USD&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;102L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt; 
&lt;span class=&quot;n&quot;&gt;ratesHistoryData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;EUR&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;114L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt; 
&lt;span class=&quot;n&quot;&gt;ratesHistoryData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;YEN&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt; 
&lt;span class=&quot;n&quot;&gt;ratesHistoryData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;EUR&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;116L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt; 
&lt;span class=&quot;n&quot;&gt;ratesHistoryData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;USD&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;105L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
 
&lt;span class=&quot;c1&quot;&gt;// Create and register an example table using the sample data set.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ratesHistoryStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fromCollection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ratesHistoryData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
 
&lt;span class=&quot;n&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ratesHistory&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fromDataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ratesHistoryStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;r_currency, r_rate, r_proctime.proctime&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
 
&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerTable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;RatesHistory&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ratesHistory&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
 
&lt;span class=&quot;c1&quot;&gt;// Create and register the temporal table function &amp;quot;rates&amp;quot;.&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Define &amp;quot;r_proctime&amp;quot; as the versioning field and &amp;quot;r_currency&amp;quot; as the primary key.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TemporalTableFunction&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rates&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ratesHistory&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createTemporalTableFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;r_proctime&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;r_currency&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
 
&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Rates&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rates&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
 
&lt;span class=&quot;o&quot;&gt;(...)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;What does this &lt;strong&gt;Rates&lt;/strong&gt; function do, in practice? Imagine we would like to check what the conversion rates looked like at a given time — say, 11:00. We could simply do something like:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rates&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;11:00&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-05-13-temporal-tables/TemporalTables2.png&quot; width=&quot;650px&quot; alt=&quot;Point-in-time Querying&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Even though Flink does not yet support querying temporal table functions with a constant time attribute parameter, these functions can be used to cover a much more interesting scenario: temporal table joins.&lt;/p&gt;

&lt;h2 id=&quot;streaming-joins-using-temporal-tables&quot;&gt;Streaming Joins using Temporal Tables&lt;/h2&gt;

&lt;p&gt;Temporal tables reach their full potential when used in combination (erm, joined) with streaming data, for instance to power applications that must continuously check events against a reference dataset that changes over time for auditing or regulatory compliance. While efficient joins have long been a challenge for query processors due to their computational cost and resource consumption, joins over streaming data carry some additional challenges:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The &lt;strong&gt;unbounded&lt;/strong&gt; nature of streams means that inputs are continuously evaluated and intermediate join results can consume memory resources indefinitely. Flink gracefully manages its memory consumption out-of-the-box (even for heavier cases where joins require spilling to disk) and supports time-windowed joins to bound the amount of data that needs to be kept around as state;&lt;/li&gt;
  &lt;li&gt;Streaming data might be &lt;strong&gt;out-of-order&lt;/strong&gt; and &lt;strong&gt;late&lt;/strong&gt;, so it is not possible to enforce an ordering upfront and time handling requires some thinking to avoid unnecessary outputs and retractions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the particular case of temporal data, time-windowed joins are not enough (well, at least not without getting into some expensive tweaking): sooner or later, each reference record will fall outside of the window and be wiped from state, no longer being considered for future join results. To address this limitation, Flink has introduced support for temporal table joins to cover time-varying relations.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-05-13-temporal-tables/TemporalTables3.png&quot; width=&quot;500px&quot; alt=&quot;Temporal Table Join between Taxi Fares and Conversion Rates&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Each record from the append-only table on the probe side (&lt;code&gt;Taxi Fare&lt;/code&gt;) is joined with the version of the record from the temporal table on the build side (&lt;code&gt;Conversion Rate&lt;/code&gt;) that most closely matches the probe side record time attribute (&lt;code&gt;time&lt;/code&gt;) for the same value of the primary key (&lt;code&gt;currency&lt;/code&gt;). Remember the temporal table function (&lt;code&gt;Rates&lt;/code&gt;) we registered earlier? It can now be used to express this join as a simple SQL statement that would otherwise require a heavier statement with a subquery.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-05-13-temporal-tables/TemporalTables4.png&quot; width=&quot;700px&quot; alt=&quot;Regular Join vs. Temporal Table Join&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
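
&lt;p&gt;As a rough sketch of what such a statement can look like, the query below joins a probe-side table with the &lt;code&gt;Rates&lt;/code&gt; function registered earlier. The &lt;code&gt;TaxiFare&lt;/code&gt; table and its &lt;code&gt;f_fare&lt;/code&gt;, &lt;code&gt;f_currency&lt;/code&gt; and &lt;code&gt;f_proctime&lt;/code&gt; columns are assumptions made for illustration and are not defined in this post; &lt;code&gt;r_currency&lt;/code&gt; and &lt;code&gt;r_rate&lt;/code&gt; come from the &lt;code&gt;RatesHistory&lt;/code&gt; table above.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Hypothetical probe-side table: TaxiFare(f_fare, f_currency, f_proctime).
-- Each fare is matched with the rate version that is valid at f_proctime.
SELECT
  f.f_fare * r.r_rate AS converted_fare
FROM
  TaxiFare AS f,
  LATERAL TABLE (Rates(f.f_proctime)) AS r
WHERE r.r_currency = f.f_currency&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Because &lt;code&gt;Rates&lt;/code&gt; was registered with a processing-time versioning field, the sketch passes a processing-time attribute from the probe side. An event-time setup would likely need event-time attributes on both sides of the join.&lt;/p&gt;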

&lt;p&gt;Temporal table joins support both &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/streaming/joins.html#processing-time-temporal-joins&quot;&gt;processing&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/streaming/joins.html#event-time-temporal-joins&quot;&gt;event time&lt;/a&gt; semantics and effectively limit the amount of data kept in state while also allowing records on the build side to be arbitrarily old, as opposed to time-windowed joins. Probe-side records only need to be kept in state for a very short time to ensure correct semantics in the presence of out-of-order records. The challenges mentioned at the beginning of this section are overcome by:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Narrowing the &lt;strong&gt;scope&lt;/strong&gt; of the join: only the time-matching version of &lt;code&gt;ratesHistory&lt;/code&gt; is visible for a given &lt;code&gt;taxiFare.time&lt;/code&gt;;&lt;/li&gt;
  &lt;li&gt;Pruning &lt;strong&gt;unneeded records&lt;/strong&gt; from state: for cases using event time, records between current time and the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/event_time.html#event-time-and-watermarks&quot;&gt;watermark&lt;/a&gt; delay are persisted for both the probe and build side. These are discarded as soon as the watermark arrives and the results are emitted — allowing the join operation to move forward in time and the build table to “refresh” its version in state.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;All this means it is now possible to express continuous stream enrichment in relational and time-varying terms using Flink without dabbling in syntactic patchwork or compromising performance. In other words: stream time-travelling minus the flux capacitor. Extending this syntax to batch processing for enriching historic data with proper (event) time semantics is also part of the Flink roadmap!&lt;/p&gt;

&lt;p&gt;If you’d like to get some &lt;strong&gt;hands-on practice in joining streams with Flink SQL&lt;/strong&gt; (and Flink SQL in general), check out this &lt;a href=&quot;https://github.com/ververica/sql-training/wiki&quot;&gt;free training for Flink SQL&lt;/a&gt;. The training environment is based on Docker and can be set up in just a few minutes.&lt;/p&gt;

&lt;p&gt;Subscribe to the &lt;a href=&quot;/community.html#mailing-lists&quot;&gt;Apache Flink mailing lists&lt;/a&gt; to stay up-to-date with the latest developments in this space.&lt;/p&gt;
</description>
<pubDate>Tue, 14 May 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/2019/05/14/temporal-tables.html</link>
<guid isPermaLink="true">/2019/05/14/temporal-tables.html</guid>
</item>

<item>
<title>When Flink &amp; Pulsar Come Together</title>
<description>&lt;p&gt;The open source data technology frameworks &lt;a href=&quot;https://flink.apache.org/&quot;&gt;Apache Flink&lt;/a&gt; and &lt;a href=&quot;https://pulsar.apache.org/en/&quot;&gt;Apache Pulsar&lt;/a&gt; can integrate in different ways to provide elastic data processing at large scale. I recently gave a talk at &lt;a href=&quot;https://www.flink-forward.org/&quot;&gt;Flink Forward&lt;/a&gt; San Francisco 2019 and presented some of the integrations between the two frameworks for batch and streaming applications. In this post, I will give a short introduction to Apache Pulsar and its differentiating elements from other messaging systems and describe the ways that Pulsar and Flink can work together to provide a seamless developer experience for elastic data processing at scale.&lt;/p&gt;

&lt;h2 id=&quot;a-brief-introduction-to-apache-pulsar&quot;&gt;A brief introduction to Apache Pulsar&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://pulsar.apache.org/en/&quot;&gt;Apache Pulsar&lt;/a&gt; is an open-source distributed pub-sub messaging system under the stewardship of the &lt;a href=&quot;https://www.apache.org/&quot;&gt;Apache Software Foundation&lt;/a&gt;. Pulsar is a multi-tenant, high-performance solution for server-to-server messaging including multiple features such as native support for multiple clusters in a Pulsar instance, with seamless &lt;a href=&quot;https://pulsar.apache.org/docs/en/administration-geo&quot;&gt;geo-replication&lt;/a&gt; of messages across clusters, very low publish and end-to-end latency, seamless scalability to over a million topics, and guaranteed message delivery with &lt;a href=&quot;https://pulsar.apache.org/docs/en/concepts-architecture-overview#persistent-storage&quot;&gt;persistent message storage&lt;/a&gt; provided by &lt;a href=&quot;https://bookkeeper.apache.org/&quot;&gt;Apache BookKeeper&lt;/a&gt; among others. Let’s now discuss the primary differentiators between Pulsar and other pub-sub messaging frameworks:&lt;/p&gt;

&lt;p&gt;The first differentiating factor is that, although Pulsar provides a flexible pub-sub messaging system, it is also backed by durable log storage, combining messaging and storage under one framework. Because of that layered architecture, Pulsar provides instant failure recovery, independent scalability and balance-free cluster expansion.&lt;/p&gt;

&lt;p&gt;Pulsar’s architecture follows a similar pattern to other pub-sub systems as the framework is organized in topics as the main data entity, with producers sending data to, and consumers receiving data from a topic as shown in the diagram below.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/pulsar-flink/image-1.png&quot; width=&quot;400px&quot; alt=&quot;Pulsar producers and consumers&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;The second differentiator of Pulsar is that the framework is built from the get-go with &lt;a href=&quot;https://pulsar.apache.org/docs/en/concepts-multi-tenancy/&quot;&gt;multi-tenancy&lt;/a&gt; in mind. What that means is that each Pulsar topic has a hierarchical management structure making the allocation of resources as well as the resource management and coordination between teams efficient and easy. With Pulsar’s multi-tenancy structure, data platform maintainers can onboard new teams with no friction as Pulsar provides resource isolation at the property (tenant), namespace or topic level, while at the same time data can be shared across the cluster for easy collaboration and coordination.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/pulsar-flink/image-2.png&quot; width=&quot;640px&quot; alt=&quot;Apache Flink and Apache Pulsar&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Finally, Pulsar’s flexible messaging framework unifies the streaming and queuing data consumption models. As shown in the diagram below, Pulsar holds the data in the topic while multiple teams can consume the data independently depending on their workloads and data consumption patterns.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/pulsar-flink/image-3.png&quot; width=&quot;640px&quot; alt=&quot;Apache Flink and Apache Pulsar&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;pulsars-view-on-data-segmented-data-streams&quot;&gt;Pulsar’s view on data: Segmented data streams&lt;/h2&gt;

&lt;p&gt;Apache Flink is a streaming-first computation framework that perceives &lt;a href=&quot;/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;batch processing as a special case of streaming&lt;/a&gt;. Flink’s view on data streams distinguishes between batch and stream processing as bounded and unbounded data streams, assuming that for batch workloads the data stream is finite, with a beginning and an end.&lt;/p&gt;

&lt;p&gt;Apache Pulsar has a similar perspective to that of Apache Flink with regard to the data layer. The framework also uses streams as a unified view on all data, while its layered architecture allows traditional pub-sub messaging for streaming workloads and continuous data processing, or the use of &lt;em&gt;Segmented Streams&lt;/em&gt; and bounded data streams for batch and static workloads.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/pulsar-flink/image-4.png&quot; width=&quot;640px&quot; alt=&quot;Apache Flink and Apache Pulsar&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;With Pulsar, once a producer sends data to a topic, it is partitioned depending on the data traffic and then further segmented under those partitions (using Apache BookKeeper as segment store) to allow for parallel data processing, as illustrated in the diagram below. This allows a combination of traditional pub-sub messaging and distributed parallel computations in one framework.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/pulsar-flink/image-5.png&quot; width=&quot;640px&quot; alt=&quot;Apache Flink and Apache Pulsar&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;when-flink--pulsar-come-together&quot;&gt;When Flink + Pulsar come together&lt;/h2&gt;

&lt;p&gt;Apache Flink and Apache Pulsar integrate in multiple ways already. In the following sections, I will present some potential future integrations between the frameworks and share examples of existing ways in which you can utilize the frameworks together.&lt;/p&gt;

&lt;h3 id=&quot;potential-integrations&quot;&gt;Potential Integrations&lt;/h3&gt;

&lt;p&gt;Pulsar can integrate with Apache Flink in different ways. Some potential integrations include providing support for streaming workloads with the use of &lt;em&gt;Streaming Connectors&lt;/em&gt; and support for batch workloads with the use of &lt;em&gt;Batch Source Connectors&lt;/em&gt;. Pulsar also comes with native support for schemas that can integrate with Flink and provide structured access to the data, for example by using Flink SQL as a way of querying data in Pulsar. Finally, an alternative way of integrating the technologies could include using Pulsar as a state backend with Flink. Since Pulsar has a layered architecture (&lt;em&gt;Streams&lt;/em&gt; and &lt;em&gt;Segmented Streams&lt;/em&gt;, powered by Apache BookKeeper), it becomes natural to use Pulsar as a storage layer and store Flink state.&lt;/p&gt;

&lt;p&gt;From an architecture point of view, we can imagine the integration between the two frameworks as one that uses Apache Pulsar for a unified view of the data layer and Apache Flink as a unified computation and data processing framework and API.&lt;/p&gt;

&lt;h3 id=&quot;existing-integrations&quot;&gt;Existing Integrations&lt;/h3&gt;

&lt;p&gt;Integration between the two frameworks is ongoing and developers can already use Pulsar with Flink in multiple ways. For example, Pulsar can be used as a streaming source and streaming sink in Flink DataStream applications. Developers can ingest data from Pulsar into a Flink job that performs computations on real-time data and then sends the results back to a Pulsar topic through a streaming sink. Such an example is shown below:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// create and configure Pulsar consumer&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;PulsarSourceBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PulsarSourceBuilder&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SimpleStringSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;topic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inputTopic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subscriptionName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;subscription&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SourceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;src&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// ingest DataStream with Pulsar consumer&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;src&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// perform computation on DataStream (here a simple WordCount)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FlatMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
       &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;})&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;returns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;word&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;timeWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;reduce&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ReduceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;
       &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// emit result via Pulsar producer&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlinkPulsarProducer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;outputTopic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;AuthenticationDisabled&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;toString&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBytes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;UTF_8&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Developers can also use Pulsar as both a streaming source and a streaming table sink for Flink SQL or Table API queries, as shown in the example below:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// obtain a DataStream with words&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// register DataStream as Table &amp;quot;words&amp;quot; with two attributes (&amp;quot;word&amp;quot;, &amp;quot;ts&amp;quot;). &lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;//   &amp;quot;ts&amp;quot; is an event-time timestamp.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerDataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;words&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;word, ts.rowtime&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// create a TableSink that produces to Pulsar&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TableSink&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sink&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;PulsarJsonTableSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;outputTopic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;AuthenticationDisabled&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;ROUTING_KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// register Pulsar TableSink as table &amp;quot;wc&amp;quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerTableSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;&amp;quot;wc&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;sink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;configure&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;word&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;cnt&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;},&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeInformation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;LONG&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;}));&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// count words per 5 seconds and write result to table &amp;quot;wc&amp;quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sqlUpdate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;&amp;quot;INSERT INTO wc &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;&amp;quot;SELECT word, COUNT(*) AS cnt &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;&amp;quot;FROM words &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
   &lt;span class=&quot;s&quot;&gt;&amp;quot;GROUP BY word, TUMBLE(ts, INTERVAL &amp;#39;5&amp;#39; SECOND)&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
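
&lt;p&gt;To make the 5-second grouping in the query above concrete, the following standalone sketch shows how a timestamp maps to a tumbling window. It is an illustration only: &lt;code&gt;TumbleSketch&lt;/code&gt; and its methods are hypothetical helpers, not part of Flink or the Pulsar connector, and the sketch assumes epoch-millisecond timestamps with windows aligned to the epoch.&lt;/p&gt;

```java
// Hypothetical helper for illustration; not part of Flink or the Pulsar connector.
final class TumbleSketch {

    // Start (inclusive) of the tumbling window of length sizeMs that contains timestampMs.
    // Assumes windows are aligned to the epoch (no offset).
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - Math.floorMod(timestampMs, sizeMs);
    }

    // End (exclusive) of that same window.
    static long windowEnd(long timestampMs, long sizeMs) {
        return windowStart(timestampMs, sizeMs) + sizeMs;
    }

    public static void main(String[] args) {
        long size = 5_000L; // corresponds to INTERVAL '5' SECOND
        long[] timestamps = {0L, 4_999L, 5_000L, 12_345L};
        for (long ts : timestamps) {
            // Print each timestamp with the half-open window it falls into.
            System.out.println(ts + " -> [" + windowStart(ts, size) + ", " + windowEnd(ts, size) + ")");
        }
    }
}
```

&lt;p&gt;Under these assumptions, all rows whose &lt;code&gt;ts&lt;/code&gt; falls into the same half-open interval belong to one window, and the query emits one count per word for each such window.&lt;/p&gt;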

&lt;p&gt;Finally, Flink integrates with Pulsar for batch workloads as a batch sink, where all results are pushed to Pulsar after Apache Flink has completed the computation on a static data set. Such an example is shown below:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// obtain DataSet from arbitrary computation&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// create PulsarOutputFormat instance&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;OutputFormat&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pulsarOutputFormat&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;PulsarOutputFormat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; 
   &lt;span class=&quot;n&quot;&gt;topic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; 
   &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;AuthenticationDisabled&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; 
   &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;toString&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBytes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// write DataSet to Pulsar&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pulsarOutputFormat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Both Pulsar and Flink share a similar view on how the data and the computation level of an application can be &lt;em&gt;“streaming-first”&lt;/em&gt; with batch as a special case of streaming. With Pulsar’s Segmented Streams approach and Flink’s steps to unify batch and stream processing workloads under one framework, there are numerous ways of integrating the two technologies to provide elastic data processing at massive scale. Subscribe to the &lt;a href=&quot;/community.html#mailing-lists&quot;&gt;Apache Flink&lt;/a&gt; and &lt;a href=&quot;https://lists.apache.org/list.html?dev@pulsar.apache.org&quot;&gt;Apache Pulsar&lt;/a&gt; mailing lists to stay up-to-date with the latest developments in this space or share your thoughts and recommendations with both communities.&lt;/p&gt;
</description>
<pubDate>Fri, 03 May 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/2019/05/03/pulsar-flink.html</link>
<guid isPermaLink="true">/2019/05/03/pulsar-flink.html</guid>
</item>

<item>
<title>Apache Flink&#39;s Application to Season of Docs</title>
<description>&lt;p&gt;The Apache Flink community is happy to announce its application to the first edition of &lt;a href=&quot;https://developers.google.com/season-of-docs/&quot;&gt;Season of Docs&lt;/a&gt; by Google. The program is bringing together Open Source projects and technical writers to raise awareness for and improve documentation of Open Source projects. While the community is continuously looking for new contributors to collaborate on our documentation, we would like to take this chance to work with one or two technical writers to extend and restructure parts of &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/&quot;&gt;our documentation&lt;/a&gt; (details below).&lt;/p&gt;

&lt;p&gt;The community has discussed this opportunity on the &lt;a href=&quot;https://lists.apache.org/thread.html/3c789b6187da23ad158df59bbc598543b652e3cfc1010a14e294e16a@%3Cdev.flink.apache.org%3E&quot;&gt;dev mailing list&lt;/a&gt; and agreed on three project ideas to submit to the program. We have a great team of mentors (Stephan, Fabian, David, Jark &amp;amp; Konstantin) lined up and are very much looking forward to the first proposals by potential technical writers (given we are admitted to the program ;)). In case of questions, feel free to reach out to the community via &lt;a href=&quot;../../../../community.html#mailing-lists&quot;&gt;dev@flink.apache.org&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;project-ideas-list&quot;&gt;Project Ideas List&lt;/h2&gt;

&lt;h3 id=&quot;project-1-improve-documentation-of-stream-processing-concepts&quot;&gt;Project 1: Improve Documentation of Stream Processing Concepts&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Stream processing is the processing of data in motion―in other words, computing on data directly as it is produced or received. Apache Flink has pioneered the field of distributed, stateful stream processing over the last several years. As the community has pushed the boundaries of stream processing, we have introduced new concepts that users need to become familiar with to develop and operate Apache Flink applications efficiently.
The Apache Flink documentation [1] already contains a “concepts” section, but it a) is incomplete and b) lacks an overall structure &amp;amp; reading flow. In addition, “concepts” content is also spread over the development [2] &amp;amp; operations [3] documentation without references to the “concepts” section. An example of this can be found in [4] and [5].&lt;/p&gt;

&lt;p&gt;In this project, we would like to restructure, consolidate and extend the concepts documentation for Apache Flink to better guide users who want to become productive as quickly as possible. This includes better conceptual introductions to topics such as event time, state, and fault tolerance with proper linking to and from relevant deployment and development guides.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related material:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.8/&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/concepts/programming-model.html#time&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.8/concepts/programming-model.html#time&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/event_time.html&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/event_time.html&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;project-2-improve-documentation-of-flink-deployments--operations&quot;&gt;Project 2: Improve Documentation of Flink Deployments &amp;amp; Operations&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Stream processing is the processing of data in motion―in other words, computing on data directly as it is produced or received. Apache Flink has pioneered the field of distributed, stateful stream processing for the last few years. As a stateful distributed system in general and a continuously running, low-latency system in particular, Apache Flink deployments are non-trivial to set up and manage.
Unfortunately, the operations [1] and monitoring documentation [2] are arguably the weakest spots of the Apache Flink documentation. While they are comprehensive and often go into a lot of detail, they lack an overall structure and do not address common overarching concerns of operations teams in an efficient way.&lt;/p&gt;

&lt;p&gt;In this project, we would like to restructure this part of the documentation and extend it if possible. Ideas for extension include: discussion of session and per-job clusters, better documentation for containerized deployments (incl. K8s), capacity planning &amp;amp; integration into CI/CD pipelines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related material:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3 id=&quot;project-3-improve-documentation-for-relational-apis-table-api--sql&quot;&gt;Project 3: Improve Documentation for Relational APIs (Table API &amp;amp; SQL)&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Apache Flink features APIs at different levels of abstraction, which enable its users to trade conciseness for expressiveness. Flink’s relational APIs, SQL and the Table API, are “younger” than the DataStream and DataSet APIs, more high-level, and focused on data analytics use cases. A core principle of Flink’s SQL and Table API is that they can be used to process static (batch) and continuous (streaming) data and that a program or query produces the same result in both cases.
The documentation of Flink’s relational APIs has grown organically and can be improved in a few areas. There are several ongoing development efforts (e.g. Hive Integration, Python Support or Support for Interactive Programming) that aim to extend the scope of the Table API and SQL.&lt;/p&gt;

&lt;p&gt;The existing documentation could be reorganized to prepare for covering the new features. Moreover, it could be improved by adding a concepts section that describes the use cases and internals of the APIs in more detail. Finally, the documentation of built-in functions could be improved by adding more concrete examples.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related material:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table&quot;&gt;Table API &amp;amp; SQL docs main page&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/functions.html&quot;&gt;Built-in functions&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/common.html&quot;&gt;Concepts&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/streaming/&quot;&gt;Streaming Concepts&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

</description>
<pubDate>Wed, 17 Apr 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2019/04/17/sod.html</link>
<guid isPermaLink="true">/news/2019/04/17/sod.html</guid>
</item>

<item>
<title>Apache Flink 1.8.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is pleased to announce Apache Flink 1.8.0.  The
latest release includes more than 420 resolved issues and some exciting
additions to Flink that we describe in the following sections of this post.
Please check the &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12344274&quot;&gt;complete changelog&lt;/a&gt;
for more details.&lt;/p&gt;

&lt;p&gt;Flink 1.8.0 is API-compatible with previous 1.x.y releases for APIs annotated
with the &lt;code&gt;@Public&lt;/code&gt; annotation.  The release is available now and we encourage
everyone to &lt;a href=&quot;/downloads.html&quot;&gt;download the release&lt;/a&gt; and
check out the updated
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/&quot;&gt;documentation&lt;/a&gt;.
Feedback through the Flink &lt;a href=&quot;/community.html#mailing-lists&quot;&gt;mailing
lists&lt;/a&gt; or
&lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;JIRA&lt;/a&gt; is, as always,
very much appreciated!&lt;/p&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt; on the Flink project site.&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#important-changes&quot; id=&quot;markdown-toc-important-changes&quot;&gt;Important Changes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#known-issues&quot; id=&quot;markdown-toc-known-issues&quot;&gt;Known Issues&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#release-notes&quot; id=&quot;markdown-toc-release-notes&quot;&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;p&gt;With Flink 1.8.0 we come closer to our goals of enabling fast data processing
and building data-intensive applications for the Flink community in a seamless
way. We do this by cleaning up and refactoring Flink under the hood to allow
more efficient feature development in the future. This includes removal of the
legacy runtime components that were subsumed in the major rework of Flink’s
underlying distributed system architecture
(&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot;&gt;FLIP-6&lt;/a&gt;)
as well as refactorings on the Table API that prepare it for the future
addition of the Blink enhancements
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11439&quot;&gt;FLINK-11439&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Nevertheless, this release includes some important new features and bug fixes.
The most interesting of those are highlighted below. Please consult the
&lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12344274&quot;&gt;complete changelog&lt;/a&gt;
and the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/release-notes/flink-1.8.html&quot;&gt;release notes&lt;/a&gt;
for more details.&lt;/p&gt;

&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Finalized State Schema Evolution Story&lt;/strong&gt;: This release completes
the community-driven effort to provide a schema evolution story for
user state managed by Flink. This has been an effort that spanned two
releases, starting from 1.7.0 with the introduction of support for
Avro state schema evolution as well as a revamped serialization
compatibility abstraction.&lt;/p&gt;

    &lt;p&gt;Flink 1.8.0 finalizes this effort by extending support for schema
evolution to POJOs, upgrading all Flink built-in serializers to use
the new serialization compatibility abstractions, as well as making it
easier for advanced users who use custom state serializers to
implement the abstractions.  These different aspects of a complete
out-of-the-box schema evolution story are explained in detail below:&lt;/p&gt;

    &lt;ol&gt;
      &lt;li&gt;
        &lt;p&gt;Support for POJO state schema evolution: The pool of data types
that support state schema evolution has been expanded to include
POJOs. For state types that use POJOs, you can now add or remove
fields from your POJO while retaining backwards
compatibility. For a full overview of the list of data types that
now support schema evolution as well as their evolution
specifications and limitations, please refer to the State Schema
Evolution documentation page.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Upgrade all Flink serializers to use new serialization
compatibility abstractions: Back in 1.7.0, we introduced the new
serialization compatibility abstractions &lt;code&gt;TypeSerializerSnapshot&lt;/code&gt;
and &lt;code&gt;TypeSerializerSchemaCompatibility&lt;/code&gt;. Besides providing a more
expressive API to reflect schema compatibility between the data
stored in savepoints and the data registered at runtime, another
important aspect about the new abstraction is that it avoids the
need for Flink to Java-serialize the state serializer as state
metadata in savepoints.&lt;/p&gt;

        &lt;p&gt;In 1.8.0, all of Flink’s built-in serializers have been upgraded to
use the new abstractions, and therefore the serializers
themselves are no longer Java-serialized into savepoints. This
greatly improves interoperability of Flink savepoints, in terms
of state schema evolvability. For example, one outcome was the
support for POJO schema evolution, as mentioned
above. Another outcome is that all composite data types supported
by Flink (such as &lt;code&gt;Either&lt;/code&gt;, Scala case classes, Flink Java
&lt;code&gt;Tuple&lt;/code&gt;s, etc.) are generally evolvable as well when they have
a nested evolvable type, such as a POJO. For example, the &lt;code&gt;MyPojo&lt;/code&gt;
type in &lt;code&gt;ValueState&amp;lt;Tuple2&amp;lt;Integer, MyPojo&amp;gt;&amp;gt;&lt;/code&gt; or
&lt;code&gt;ListState&amp;lt;Either&amp;lt;Integer, MyPojo&amp;gt;&amp;gt;&lt;/code&gt;, which is a POJO, is allowed
to evolve its schema.&lt;/p&gt;

        &lt;p&gt;For users who are using custom &lt;code&gt;TypeSerializer&lt;/code&gt; implementations
for their state serializer and are still using the outdated
abstractions (i.e. &lt;code&gt;TypeSerializerConfigSnapshot&lt;/code&gt; and
&lt;code&gt;CompatibilityResult&lt;/code&gt;), we highly recommend upgrading to the new
abstractions to be future-proof. Please refer to the Custom State
Serialization documentation page for a detailed description of
the new abstractions.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Provide pre-defined snapshot implementations for common
serializers: For convenience, Flink 1.8.0 comes with two
predefined implementations for the &lt;code&gt;TypeSerializerSnapshot&lt;/code&gt; that
make the task of implementing these new abstractions easier
for most implementations of &lt;code&gt;TypeSerializer&lt;/code&gt;s:
&lt;code&gt;SimpleTypeSerializerSnapshot&lt;/code&gt; and
&lt;code&gt;CompositeTypeSerializerSnapshot&lt;/code&gt;. This section in the
documentation provides information on how to use these classes.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ol&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Continuous cleanup of old state based on TTL
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7811&quot;&gt;FLINK-7811&lt;/a&gt;)&lt;/strong&gt;: We
introduced TTL (time-to-live) for Keyed state in Flink 1.6
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9510&quot;&gt;FLINK-9510&lt;/a&gt;). This
feature enabled cleanup and made keyed state entries inaccessible after a
defined timeout. In addition, state was also cleaned up when
writing a savepoint/checkpoint.&lt;/p&gt;

    &lt;p&gt;Flink 1.8 introduces continuous cleanup of old entries for both the RocksDB
state backend
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10471&quot;&gt;FLINK-10471&lt;/a&gt;) and the heap
state backend
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10473&quot;&gt;FLINK-10473&lt;/a&gt;). This means
that old entries (according to the TTL setting) are continuously cleaned up.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;SQL pattern detection with user-defined functions and
aggregations&lt;/strong&gt;: The support of the MATCH_RECOGNIZE clause has been
extended by multiple features.  The addition of user-defined
functions allows for custom logic during pattern detection
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10597&quot;&gt;FLINK-10597&lt;/a&gt;),
while adding aggregations allows for more complex CEP definitions,
such as the following
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7599&quot;&gt;FLINK-7599&lt;/a&gt;).&lt;/p&gt;

    &lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;SELECT *
FROM Ticker
    MATCH_RECOGNIZE (
        ORDER BY rowtime
        MEASURES
            AVG(A.price) AS avgPrice
        ONE ROW PER MATCH
        AFTER MATCH SKIP TO FIRST B
        PATTERN (A+ B)
        DEFINE
            A AS AVG(A.price) &amp;lt; 15
    ) MR;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;RFC-compliant CSV format (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9964&quot;&gt;FLINK-9964&lt;/a&gt;)&lt;/strong&gt;: SQL tables can now be read and written in
a CSV table format that complies with RFC 4180. The format might also be
useful for general DataStream API users.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;New KafkaDeserializationSchema that gives direct access to ConsumerRecord
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8354&quot;&gt;FLINK-8354&lt;/a&gt;)&lt;/strong&gt;: For the
Flink &lt;code&gt;KafkaConsumers&lt;/code&gt;, we introduced a new &lt;code&gt;KafkaDeserializationSchema&lt;/code&gt; that
gives direct access to the Kafka &lt;code&gt;ConsumerRecord&lt;/code&gt;. This now allows access to
all data that Kafka provides for a record, including the headers. This
subsumes the &lt;code&gt;KeyedDeserializationSchema&lt;/code&gt; functionality, which is deprecated but
still available for now.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Per-shard watermarking option in FlinkKinesisConsumer
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5697&quot;&gt;FLINK-5697&lt;/a&gt;)&lt;/strong&gt;: The Kinesis
Consumer can now emit periodic watermarks that are derived from per-shard watermarks,
for correct event time processing with subtasks that consume multiple Kinesis shards.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;New consumer for DynamoDB Streams to capture table changes
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4582&quot;&gt;FLINK-4582&lt;/a&gt;)&lt;/strong&gt;: &lt;code&gt;FlinkDynamoDBStreamsConsumer&lt;/code&gt;
is a variant of the Kinesis consumer that supports retrieval of CDC-like streams from DynamoDB tables.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Support for global aggregates for subtask coordination
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10887&quot;&gt;FLINK-10887&lt;/a&gt;)&lt;/strong&gt;:
Designed as a solution for global source watermark tracking, &lt;code&gt;GlobalAggregateManager&lt;/code&gt;
allows sharing of information between parallel subtasks. This feature will
be integrated into streaming connectors for watermark synchronization and
can be used for other purposes with a user-defined aggregator.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
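
&lt;p&gt;As a minimal sketch of the continuous TTL cleanup described in the list above, the snippet below enables both cleanup strategies on a state descriptor. The class name, the state name &lt;code&gt;lastSeen&lt;/code&gt;, the one-day TTL and the cleanup size of 10 are illustrative assumptions, not values taken from the release. Please check the State TTL documentation of your Flink version for the exact options.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

// Hypothetical helper class, used for illustration only.
public class TtlCleanupSketch {

  // Builds a descriptor whose expired entries are cleaned up continuously.
  public static ValueStateDescriptor&lt;Long&gt; lastSeenDescriptor() {
    StateTtlConfig ttlConfig = StateTtlConfig
        .newBuilder(Time.days(1))         // assumed TTL: one day after the last write
        .cleanupIncrementally(10, false)  // heap backend: check up to 10 entries per state access
        .cleanupInRocksdbCompactFilter()  // RocksDB backend: drop expired entries during compaction
        .build();

    ValueStateDescriptor&lt;Long&gt; descriptor =
        new ValueStateDescriptor&lt;&gt;(&quot;lastSeen&quot;, Long.class);
    descriptor.enableTimeToLive(ttlConfig);
    return descriptor;
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each strategy only takes effect on the matching state backend, so configuring both keeps the descriptor usable if the backend changes. Depending on the Flink version, the RocksDB compaction filter may additionally need to be enabled in the backend configuration.&lt;/p&gt;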

&lt;h2 id=&quot;important-changes&quot;&gt;Important Changes&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Changes to bundling of Hadoop libraries with Flink
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11266&quot;&gt;FLINK-11266&lt;/a&gt;)&lt;/strong&gt;:
Convenience binaries that include Hadoop are no longer released.&lt;/p&gt;

    &lt;p&gt;If a deployment relies on &lt;code&gt;flink-shaded-hadoop2&lt;/code&gt; being included in
&lt;code&gt;flink-dist&lt;/code&gt;, then you must manually download a pre-packaged Hadoop
jar from the optional components section of the &lt;a href=&quot;/downloads.html&quot;&gt;download
page&lt;/a&gt; and copy it into the
&lt;code&gt;/lib&lt;/code&gt; directory.  Alternatively, a Flink distribution that includes
Hadoop can be built by packaging &lt;code&gt;flink-dist&lt;/code&gt; and activating the
&lt;code&gt;include-hadoop&lt;/code&gt; Maven profile.&lt;/p&gt;

    &lt;p&gt;As Hadoop is no longer included in &lt;code&gt;flink-dist&lt;/code&gt; by default, specifying
&lt;code&gt;-DwithoutHadoop&lt;/code&gt; when packaging &lt;code&gt;flink-dist&lt;/code&gt; no longer impacts the build.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;FlinkKafkaConsumer will now filter restored partitions based on topic
specification
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10342&quot;&gt;FLINK-10342&lt;/a&gt;)&lt;/strong&gt;:
Starting from Flink 1.8.0, the &lt;code&gt;FlinkKafkaConsumer&lt;/code&gt; always filters out
restored partitions that are no longer associated with a specified topic to
subscribe to in the restored execution. This behaviour did not exist in
previous versions of the &lt;code&gt;FlinkKafkaConsumer&lt;/code&gt;. If you wish to retain the
previous behaviour, please use the
&lt;code&gt;disableFilterRestoredPartitionsWithSubscribedTopics()&lt;/code&gt; configuration method
on the &lt;code&gt;FlinkKafkaConsumer&lt;/code&gt;.&lt;/p&gt;

    &lt;p&gt;Consider this example: a Kafka consumer was consuming from topic
&lt;code&gt;A&lt;/code&gt;, you took a savepoint, changed the consumer to consume from
topic &lt;code&gt;B&lt;/code&gt; instead, and then restarted the job from the savepoint.
Before this change, the consumer would consume from both topic &lt;code&gt;A&lt;/code&gt; and
&lt;code&gt;B&lt;/code&gt; after the restore, because the state recorded that it was consuming from topic
&lt;code&gt;A&lt;/code&gt;. With the change, the consumer only consumes from topic &lt;code&gt;B&lt;/code&gt; after the
restore, because it now filters the topics that are stored in state using the
configured topics.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Change in the Maven modules of Table API
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11064&quot;&gt;FLINK-11064&lt;/a&gt;)&lt;/strong&gt;: Users
that previously had a &lt;code&gt;flink-table&lt;/code&gt; dependency need to update their
dependencies to &lt;code&gt;flink-table-planner&lt;/code&gt; and the correct dependency of
&lt;code&gt;flink-table-api-*&lt;/code&gt;, depending on whether Java or Scala is used: one of
&lt;code&gt;flink-table-api-java-bridge&lt;/code&gt; or &lt;code&gt;flink-table-api-scala-bridge&lt;/code&gt;.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
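
&lt;p&gt;To make the &lt;code&gt;FlinkKafkaConsumer&lt;/code&gt; change in the list above more concrete, here is a minimal sketch of a consumer that opts out of the new filtering. The class name, broker address, group id and the choice of &lt;code&gt;SimpleStringSchema&lt;/code&gt; are assumptions made for illustration. Only the &lt;code&gt;disableFilterRestoredPartitionsWithSubscribedTopics()&lt;/code&gt; call is taken from the release announcement.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

// Hypothetical helper class, used for illustration only.
public class KafkaRestoreSketch {

  public static FlinkKafkaConsumer&lt;String&gt; consumerForTopicB() {
    Properties props = new Properties();
    props.setProperty(&quot;bootstrap.servers&quot;, &quot;localhost:9092&quot;); // assumed broker address
    props.setProperty(&quot;group.id&quot;, &quot;example-group&quot;);           // assumed consumer group

    FlinkKafkaConsumer&lt;String&gt; consumer =
        new FlinkKafkaConsumer&lt;&gt;(&quot;B&quot;, new SimpleStringSchema(), props);

    // Opt out of the 1.8.0 default: partitions of topic A found in restored
    // state are kept, so the job reads from both A and B after a restore.
    consumer.disableFilterRestoredPartitionsWithSubscribedTopics();
    return consumer;
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Without the opt-out call, the same consumer restored from a savepoint taken on topic &lt;code&gt;A&lt;/code&gt; would read from topic &lt;code&gt;B&lt;/code&gt; only.&lt;/p&gt;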

&lt;h2 id=&quot;known-issues&quot;&gt;Known Issues&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Discarded checkpoint can cause Tasks to fail
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11662&quot;&gt;FLINK-11662&lt;/a&gt;)&lt;/strong&gt;: There is
a race condition that can lead to erroneous checkpoint failures. This mostly
occurs when restarting from a savepoint or checkpoint takes a long time at the
sources of a job. If you see random checkpointing failures that don’t seem to
have a good explanation, you might be affected. Please see the Jira issue for
more details and a workaround for the problem.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-notes&quot;&gt;Release Notes&lt;/h2&gt;

&lt;p&gt;Please review the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/release-notes/flink-1.8.html&quot;&gt;release
notes&lt;/a&gt;
for a more detailed list of changes and new features if you plan to upgrade
your Flink setup to Flink 1.8.&lt;/p&gt;

&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;

&lt;p&gt;We would like to acknowledge all community members for contributing to this
release.  Special credits go to the following members for contributing to the
1.8.0 release (according to &lt;code&gt;git log --pretty=&quot;%an&quot; release-1.7.0..release-1.8.0 | sort | uniq&lt;/code&gt; without manual deduplication):&lt;/p&gt;

&lt;p&gt;Addison Higham, Aitozi, Aleksey Pak, Alexander Fedulov, Alexey Trenikhin, Aljoscha Krettek, Andrey Zagrebin, Artsem Semianenka, Asura7969, Avi, Barisa Obradovic, Benchao Li, Bo WANG, Chesnay Schepler, Congxian Qiu, Cristian, David Anderson, Dawid Wysakowicz, Dian Fu, DuBin, EAlexRojas, EronWright, Eugen Yushin, Fabian Hueske, Fokko Driesprong, Gary Yao, Hequn Cheng, Igal Shilman, Jamie Grier, JaryZhen, Jeff Zhang, Jihyun Cho, Jinhu Wu, Joerg Schad, KarmaGYZ, Kezhu Wang, Konstantin Knauf, Kostas Kloudas, Lakshmi, Lakshmi Gururaja Rao, Lavkesh Lahngir, Li, Shuangjiang, Mai Nakagawa, Matrix42, Matt, Maximilian Michels, Mododo, Nico Kruber, Paul Lin, Piotr Nowojski, Qi Yu, Qin, Robert, Robert Metzger, Romano Vacca, Rong Rong, Rune Skou Larsen, Seth Wiesman, Shannon Carey, Shimin Yang, Shuyi Chen, Stefan Richter, Stephan Ewen, SuXingLee, TANG Wen-hui, Tao Yang, Thomas Weise, Till Rohrmann, Timo Walther, Tom Goong, Tony Feng, Tony Wei, Tzu-Li (Gordon) Tai, Tzu-Li Chen, Ufuk Celebi, Xingcan Cui, Xpray, XuQianJin-Stars, Xue Yu, Yangze Guo, Ying Xu, Yiqun Lin, Yu Li, Yuanyang Wu, Yun Tang, ZILI CHEN, Zhanchun Zhang, Zhijiang, ZiLi Chen, acqua.csq, alex04.wang, ap, azagrebin, blueszheng, boshu Zheng, chengjie.wu, chensq, chummyhe89, eaglewatcherwb, hequn8128, ifndef-SleePy, intsmaze, jackyyin, jinhu.wjh, jparkie, jrthe42, junsheng.wu, kgorman, kkloudas, kkolman, klion26, lamber-ken, leesf, libenchao, lining, liuzhaokun, lzh3636, maqingxiang, mb-datadome, okidogi, park.yq, sunhaibotb, sunjincheng121, tison, unknown, vinoyang, wenhuitang, wind, xueyu, xuqianjin, yanghua, zentol, zhangzhanchun, zhijiang, zhuzhu.zz, zy, 仲炜, 砚田, 谢磊&lt;/p&gt;

</description>
<pubDate>Tue, 09 Apr 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2019/04/09/release-1.8.0.html</link>
<guid isPermaLink="true">/news/2019/04/09/release-1.8.0.html</guid>
</item>

<item>
<title>Flink and Prometheus: Cloud-native monitoring of streaming applications</title>
<description>&lt;p&gt;This blog post describes how developers can leverage Apache Flink’s built-in &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html&quot;&gt;metrics system&lt;/a&gt; together with &lt;a href=&quot;https://prometheus.io/&quot;&gt;Prometheus&lt;/a&gt; to observe and monitor streaming applications in an effective way. This is a follow-up post from my &lt;a href=&quot;https://flink-forward.org/&quot;&gt;Flink Forward&lt;/a&gt; Berlin 2018 talk (&lt;a href=&quot;https://www.slideshare.net/MaximilianBode1/monitoring-flink-with-prometheus&quot;&gt;slides&lt;/a&gt;, &lt;a href=&quot;https://www.ververica.com/flink-forward-berlin/resources/monitoring-flink-with-prometheus&quot;&gt;video&lt;/a&gt;). We will cover some basic Prometheus concepts and why it is a great fit for monitoring Apache Flink stream processing jobs. There is also an example to showcase how you can utilize Prometheus with Flink to gain insights into your applications and be alerted on potential degradations of your Flink jobs.&lt;/p&gt;

&lt;h2 id=&quot;why-prometheus&quot;&gt;Why Prometheus?&lt;/h2&gt;

&lt;p&gt;Prometheus is a metrics-based monitoring system that was originally created in 2012. The system is completely open-source (under the Apache License 2.0) with a vibrant community behind it, and it graduated from the Cloud Native Computing Foundation in 2018 – a sign of maturity, stability and production-readiness. It is designed to measure the overall health, behavior and performance of a service. Prometheus features a multi-dimensional data model as well as a flexible query language. It is designed for reliability and can easily be deployed in traditional or containerized environments. Some of the important Prometheus concepts are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Metrics:&lt;/strong&gt; Prometheus defines metrics as floating-point values that change over time. These time series have millisecond precision.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Labels&lt;/strong&gt; are the key-value pairs associated with time series that support Prometheus’ flexible and powerful data model – in contrast to hierarchical data structures that one might experience with traditional metrics systems.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Scrape:&lt;/strong&gt; Prometheus is a pull-based system and fetches (“scrapes”) metrics data from specified sources that expose HTTP endpoints with a text-based format.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;PromQL&lt;/strong&gt; is Prometheus’ &lt;a href=&quot;https://prometheus.io/docs/prometheus/latest/querying/basics/&quot;&gt;query language&lt;/a&gt;. It can be used for both building dashboards and setting up alert rules that will trigger when specific conditions are met.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When considering metrics and monitoring systems for your Flink jobs, there are many &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html&quot;&gt;options&lt;/a&gt;. Flink offers native support for exposing data to Prometheus via the &lt;code&gt;PrometheusReporter&lt;/code&gt; configuration. Setting up this integration is very easy.&lt;/p&gt;

&lt;p&gt;Prometheus is a great choice as usually Flink jobs are not running in isolation but in a greater context of microservices. For making metrics available to Prometheus from other parts of a larger system, there are two options: There exist &lt;a href=&quot;https://prometheus.io/docs/instrumenting/clientlibs/&quot;&gt;libraries for all major languages&lt;/a&gt; to instrument other applications. Additionally, there is a wide variety of &lt;a href=&quot;https://prometheus.io/docs/instrumenting/exporters/&quot;&gt;exporters&lt;/a&gt;, which are tools that expose metrics of third-party systems (like databases or Apache Kafka) as Prometheus metrics.&lt;/p&gt;

&lt;h2 id=&quot;prometheus-and-flink-in-action&quot;&gt;Prometheus and Flink in Action&lt;/h2&gt;

&lt;p&gt;We have provided a &lt;a href=&quot;https://github.com/mbode/flink-prometheus-example&quot;&gt;GitHub repository&lt;/a&gt; that demonstrates the integration described above. To have a look, clone the repository, make sure &lt;a href=&quot;https://docs.docker.com/install/&quot;&gt;Docker&lt;/a&gt; is installed and run:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;./gradlew composeUp
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This builds a Flink job using the build tool &lt;a href=&quot;https://gradle.org/&quot;&gt;Gradle&lt;/a&gt; and starts up a local environment based on &lt;a href=&quot;https://docs.docker.com/compose/&quot;&gt;Docker Compose&lt;/a&gt; running the job in a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/deployment/docker.html#flink-job-cluster&quot;&gt;Flink job cluster&lt;/a&gt; (reachable at &lt;a href=&quot;http://localhost:8081/&quot;&gt;http://localhost:8081&lt;/a&gt;) as well as a Prometheus instance (&lt;a href=&quot;http://localhost:9090/&quot;&gt;http://localhost:9090&lt;/a&gt;).&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-03-11-prometheus-monitoring/prometheusexamplejob.png&quot; width=&quot;600px&quot; alt=&quot;PrometheusExampleJob in Flink Web UI&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Job graph and custom metric for example job in Flink web interface.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;PrometheusExampleJob&lt;/code&gt; has three operators: Random numbers up to 10,000 are generated, then a map counts the events and creates a histogram of the values passed through. Finally, the events are discarded without further output. The very simple code below is from the second operator. It illustrates how easy it is to add custom metrics relevant to your business logic into your Flink job.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;FlinkMetricsExposingMapFunction&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RichMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;transient&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Counter&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eventCounter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Configuration&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;eventCounter&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getRuntimeContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getMetricGroup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;counter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;events&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;eventCounter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;inc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;center&gt;&lt;i&gt;&lt;small&gt;Excerpt from &lt;a href=&quot;https://github.com/mbode/flink-prometheus-example/blob/master/src/main/java/com/github/mbode/flink_prometheus_example/FlinkMetricsExposingMapFunction.java&quot;&gt;FlinkMetricsExposingMapFunction.java&lt;/a&gt; demonstrating custom Flink metric.&lt;/small&gt;&lt;/i&gt;&lt;/center&gt;
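
&lt;p&gt;The excerpt above registers only the counter. As a hedged sketch of how the histogram mentioned earlier could be registered in the same operator, the variant below adds one next to the counter. The class name, the metric name &lt;code&gt;value_histogram&lt;/code&gt; and the window size of 10,000 are assumptions for illustration, and &lt;code&gt;DescriptiveStatisticsHistogram&lt;/code&gt; is just one available &lt;code&gt;Histogram&lt;/code&gt; implementation (it lives in &lt;code&gt;flink-runtime&lt;/code&gt;). The linked repository remains the authoritative version.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.Histogram;
import org.apache.flink.runtime.metrics.DescriptiveStatisticsHistogram;

// Hypothetical variant of the map function, used for illustration only.
class CountingHistogramMapFunction extends RichMapFunction&lt;Integer, Integer&gt; {
  private transient Counter eventCounter;
  private transient Histogram valueHistogram;

  @Override
  public void open(Configuration parameters) {
    eventCounter = getRuntimeContext().getMetricGroup().counter(&quot;events&quot;);
    // Assumed metric name and window size: keeps the last 10,000 values.
    valueHistogram = getRuntimeContext().getMetricGroup()
        .histogram(&quot;value_histogram&quot;, new DescriptiveStatisticsHistogram(10_000));
  }

  @Override
  public Integer map(Integer value) {
    eventCounter.inc();           // count every event
    valueHistogram.update(value); // record the value in the histogram
    return value;                 // pass the event through unchanged
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Both metrics are declared &lt;code&gt;transient&lt;/code&gt; and created in &lt;code&gt;open()&lt;/code&gt;, because metric objects are registered per parallel task instance at runtime rather than serialized with the function.&lt;/p&gt;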

&lt;h2 id=&quot;configuring-prometheus-with-flink&quot;&gt;Configuring Prometheus with Flink&lt;/h2&gt;

&lt;p&gt;To start monitoring Flink with Prometheus, the following steps are necessary:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;
    &lt;p&gt;Make the &lt;code&gt;PrometheusReporter&lt;/code&gt; jar available to the classpath of the Flink cluster (it comes with the Flink distribution):&lt;/p&gt;

    &lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt; cp /opt/flink/opt/flink-metrics-prometheus-1.7.2.jar /opt/flink/lib
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#reporter&quot;&gt;Configure the reporter&lt;/a&gt; in Flink’s &lt;em&gt;flink-conf.yaml&lt;/em&gt;. All job managers and task managers will expose the metrics on the configured port.&lt;/p&gt;

    &lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt; metrics.reporters: prom
 metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
 metrics.reporter.prom.port: 9999
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Prometheus needs to know where to scrape metrics. In a static scenario, you can simply &lt;a href=&quot;https://prometheus.io/docs/prometheus/latest/configuration/configuration/&quot;&gt;configure Prometheus&lt;/a&gt; in &lt;em&gt;prometheus.yml&lt;/em&gt; with the following:&lt;/p&gt;

    &lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt; scrape_configs:
 - job_name: &#39;flink&#39;
   static_configs:
   - targets: [&#39;job-cluster:9999&#39;, &#39;taskmanager1:9999&#39;, &#39;taskmanager2:9999&#39;]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

    &lt;p&gt;In more dynamic scenarios we recommend using Prometheus’ service discovery support for different platforms such as Kubernetes, AWS EC2 and more.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Both custom metrics are now available in Prometheus:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-03-11-prometheus-monitoring/prometheus.png&quot; width=&quot;600px&quot; alt=&quot;Prometheus web UI with example metric&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Example metric in Prometheus web UI.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;More technical metrics from the Flink cluster (like checkpoint sizes or duration, Kafka offsets or resource consumption) are also available. If you are interested, you can check out the HTTP endpoints exposing all Prometheus metrics for the job manager and the two task managers at &lt;a href=&quot;http://localhost:9249/metrics&quot;&gt;http://localhost:9249&lt;/a&gt;, &lt;a href=&quot;http://localhost:9250/metrics&quot;&gt;http://localhost:9250&lt;/a&gt; and &lt;a href=&quot;http://localhost:9251/metrics&quot;&gt;http://localhost:9251&lt;/a&gt;, respectively.&lt;/p&gt;

&lt;p&gt;To test Prometheus’ alerting feature, kill one of the Flink task managers via&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;docker kill taskmanager1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Our Flink job can recover from this partial failure via the mechanism of &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/checkpointing.html&quot;&gt;Checkpointing&lt;/a&gt;. Nevertheless, after roughly one minute (as configured in the alert rule) the following alert will fire:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-03-11-prometheus-monitoring/prometheusalerts.png&quot; width=&quot;600px&quot; alt=&quot;Prometheus web UI with example alert&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Example alert in Prometheus web UI.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
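
&lt;p&gt;The alert rule itself is not shown in this post. As a rough sketch only, a Prometheus rule of the following shape would fire in the scenario above. The group name, the alert name and the metric &lt;code&gt;flink_taskmanager_Status_JVM_CPU_Time&lt;/code&gt; are assumptions, so please check the example repository for the rule that is actually used.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt; groups:
 - name: flink
   rules:
   # Hypothetical rule: fires once fewer than two task managers
   # have been reporting this metric for one minute.
   - alert: FlinkTaskManagerMissing
     expr: count(flink_taskmanager_Status_JVM_CPU_Time) &lt; 2
     for: 1m
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;for: 1m&lt;/code&gt; clause is what produces the roughly one-minute delay between killing the task manager and the alert firing.&lt;/p&gt;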

&lt;p&gt;In real-world situations alerts like this one can be routed through a component called &lt;a href=&quot;https://prometheus.io/docs/alerting/alertmanager/&quot;&gt;Alertmanager&lt;/a&gt; and be grouped into notifications to systems like email, PagerDuty or Slack.&lt;/p&gt;

&lt;p&gt;Go ahead and play around with the setup, and check out the &lt;a href=&quot;https://grafana.com/grafana&quot;&gt;Grafana&lt;/a&gt; instance reachable at &lt;a href=&quot;http://localhost:3000/&quot;&gt;http://localhost:3000&lt;/a&gt; (credentials &lt;em&gt;admin:flink&lt;/em&gt;) for visualizing Prometheus metrics. If there are any questions or problems, feel free to &lt;a href=&quot;https://github.com/mbode/flink-prometheus-example/issues&quot;&gt;create an issue&lt;/a&gt;. Once finished, do not forget to tear down the setup via&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;./gradlew composeDown
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Using Prometheus together with Flink provides an easy way for effective monitoring and alerting of your Flink jobs. Both projects have exciting and vibrant communities behind them with new developments and additions scheduled for upcoming releases. We encourage you to try the two technologies together as it has immensely improved our insights into Flink jobs running in production.&lt;/p&gt;
</description>
<pubDate>Mon, 11 Mar 2019 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/features/2019/03/11/prometheus-monitoring.html</link>
<guid isPermaLink="true">/features/2019/03/11/prometheus-monitoring.html</guid>
</item>

<item>
<title>What to expect from Flink Forward San Francisco 2019</title>
<description>&lt;p&gt;The third annual Flink Forward San Francisco is just a few weeks away! As always, Flink Forward will be the right place to meet and mingle with experienced Flink users, contributors, and committers. Attendees will hear and chat about the latest developments around Flink and learn from technical deep-dive sessions and exciting use cases that were put into production with Flink. The event will take place on April 1-2, 2019 at Hotel Nikko in San Francisco. The &lt;a href=&quot;https://sf-2019.flink-forward.org/program-committee&quot;&gt;program committee&lt;/a&gt; assembled an amazing &lt;a href=&quot;https://sf-2019.flink-forward.org/speakers&quot;&gt;lineup of speakers&lt;/a&gt; who will cover many different aspects of Apache Flink and stream processing.&lt;/p&gt;

&lt;p&gt;Some highlights of the program are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/conference-program#realtime-store-visit-predictions-at-scale&quot;&gt;Realtime Store Visit Predictions at Scale&lt;/a&gt;: Luca Giovagnoli from &lt;em&gt;Yelp&lt;/em&gt; will talk about a “multidisciplinary” Flink application that combines geospatial clustering algorithms, Machine Learning models, and cutting-edge stream-processing technology.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/conference-program#real-time-processing-with-flink-for-machine-learning-at-netflix&quot;&gt;Real-time Processing with Flink for Machine Learning at Netflix&lt;/a&gt;: Elliot Chow will discuss the practical aspects of using Apache Flink to power Machine Learning algorithms for video recommendations, search results ranking, and selection of artwork images at &lt;em&gt;Netflix&lt;/em&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/conference-program#building-production-flink-jobs-with-airstream-at-airbnb&quot;&gt;Building production Flink jobs with Airstream at Airbnb&lt;/a&gt;: Pala Muthiah and Hao Wang will reveal how &lt;em&gt;Airbnb&lt;/em&gt; builds real time data pipelines with Airstream, Airbnb’s computation framework that is powered by Flink SQL.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api&quot;&gt;When Table meets AI: Build Flink AI Ecosystem on Table API&lt;/a&gt;: Shaoxuan Wang from &lt;em&gt;Alibaba&lt;/em&gt; will discuss how they are building a solid AI ecosystem for unified batch/streaming Machine Learning data pipelines on top of Flink’s Table API.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/conference-program#adventures-in-scaling-from-zero-to-5-billion-data-points-per-day&quot;&gt;Adventures in Scaling from Zero to 5 Billion Data Points per Day&lt;/a&gt;: Dave Torok will take us through &lt;em&gt;Comcast’s&lt;/em&gt; journey in scaling the company’s operationalized Machine Learning framework from the very early days in production to processing more than 5 billion data points per day.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re new to Apache Flink or want to deepen your knowledge of the framework, Flink Forward again features a full day of training.&lt;/p&gt;

&lt;p&gt;You can choose from 3 training tracks:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/training-program#introduction-to-streaming-with-apache-flink&quot;&gt;Introduction to Streaming with Apache Flink&lt;/a&gt;: A hands-on, in-depth introduction to stream processing and Apache Flink, this course emphasizes those features of Flink that make it easy to build and manage accurate, fault tolerant applications on streams.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/training-program#analyzing-streaming-data-with-flink-sql&quot;&gt;Analyzing Streaming Data with Flink SQL&lt;/a&gt;: In this hands-on training, you will learn what it means to run SQL queries on data streams and how to fully leverage the potential of SQL on Flink. We’ll also cover some of the more recent features such as time-versioned joins and the MATCH_RECOGNIZE clause.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/training-program#apache-flink-troubleshooting---operations&quot;&gt;Troubleshooting and Operating Flink at large scale&lt;/a&gt;: In this training, we will focus on everything you need to run Apache Flink applications reliably and efficiently in production including topics like capacity planning, monitoring, troubleshooting and tuning Apache Flink.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you haven’t done so yet, check out the &lt;a href=&quot;http://sf-2019.flink-forward.org/conference-program&quot;&gt;full schedule&lt;/a&gt; and &lt;a href=&quot;https://sf-2019.flink-forward.org/register&quot;&gt;register&lt;/a&gt; your attendance. &lt;br /&gt;
I’m looking forward to meeting you at Flink Forward San Francisco.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Fabian&lt;/em&gt;&lt;/p&gt;
</description>
<pubDate>Wed, 06 Mar 2019 12:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/03/06/ffsf-preview.html</link>
<guid isPermaLink="true">/news/2019/03/06/ffsf-preview.html</guid>
</item>

<item>
<title>Monitoring Apache Flink Applications 101</title>
<description>&lt;!-- improve style of tables --&gt;
&lt;style&gt;
  table { border: 0px solid black; table-layout: auto; width: 800px; }
  th, td { border: 1px solid black; padding: 5px; padding-left: 10px; padding-right: 10px; }
  th { text-align: center }
  td { vertical-align: top }
&lt;/style&gt;

&lt;p&gt;This blog post provides an introduction to Apache Flink’s built-in monitoring
and metrics system, which allows developers to effectively monitor their Flink
jobs. Oftentimes, the task of picking the relevant metrics to monitor a
Flink application can be overwhelming for a DevOps team that is just starting
with stream processing and Apache Flink. Having worked with many organizations
that deploy Flink at scale, I would like to share my experience and some best
practices with the community.&lt;/p&gt;

&lt;p&gt;With business-critical applications running on Apache Flink, performance monitoring
becomes an increasingly important part of a successful production deployment. It 
ensures that any degradation or downtime is immediately identified and resolved
as quickly as possible.&lt;/p&gt;

&lt;p&gt;Monitoring goes hand-in-hand with observability, which is a prerequisite for
troubleshooting and performance tuning. Nowadays, with the complexity of modern
enterprise applications and the speed of delivery increasing, an engineering
team must understand and have a complete overview of its applications’ status at
any given point in time.&lt;/p&gt;

&lt;h2 id=&quot;flinks-metrics-system&quot;&gt;Flink’s Metrics System&lt;/h2&gt;

&lt;p&gt;The foundation for monitoring Flink jobs is its &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html&quot;&gt;metrics
system&lt;/a&gt;,
which consists of two components: &lt;code&gt;Metrics&lt;/code&gt; and &lt;code&gt;MetricsReporters&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;metrics&quot;&gt;Metrics&lt;/h3&gt;

&lt;p&gt;Flink comes with a comprehensive set of built-in metrics such as:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Used JVM Heap / NonHeap / Direct Memory (per Task-/JobManager)&lt;/li&gt;
  &lt;li&gt;Number of Job Restarts (per Job)&lt;/li&gt;
  &lt;li&gt;Number of Records Per Second (per Operator)&lt;/li&gt;
  &lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These metrics have different scopes and measure more general (e.g. JVM or
operating system) as well as Flink-specific aspects.&lt;/p&gt;

&lt;p&gt;As a user, you can and should add application-specific metrics to your
functions. Typically these include counters for the number of invalid records or
the number of records temporarily buffered in managed state. Besides counters,
Flink offers additional metrics types like gauges and histograms. For
instructions on how to register your own metrics with Flink’s metrics system
please check out &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#registering-metrics&quot;&gt;Flink’s
documentation&lt;/a&gt;.
In this blog post, we will focus on how to get the most out of Flink’s built-in
metrics.&lt;/p&gt;

&lt;h3 id=&quot;metricsreporters&quot;&gt;MetricsReporters&lt;/h3&gt;

&lt;p&gt;All metrics can be queried via Flink’s REST API. In addition, users can configure
MetricsReporters to send the metrics to external systems. Apache Flink provides
reporters for the most common monitoring tools out-of-the-box, including JMX,
Prometheus, Datadog, Graphite and InfluxDB. For information about how to
configure a reporter, check out Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#reporter&quot;&gt;MetricsReporter
documentation&lt;/a&gt;.&lt;/p&gt;
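
&lt;p&gt;As a rough illustration of querying metrics over REST, the following Python sketch builds a request URL for selected job metrics and parses the answer. It assumes the job metrics endpoint has the form &lt;code&gt;/jobs/{jobid}/metrics?get=...&lt;/code&gt; and returns a JSON list of objects with &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;value&lt;/code&gt; fields; please verify both against the REST API documentation of your Flink version. The host, port, job id and sample payload are hypothetical.&lt;/p&gt;

```python
import json
from urllib.parse import urlencode


def metrics_url(base_url, job_id, metric_names):
    """Build the URL for querying selected job metrics (assumed endpoint layout)."""
    query = urlencode({"get": ",".join(metric_names)}, safe=",")
    return f"{base_url}/jobs/{job_id}/metrics?{query}"


def parse_metrics(payload):
    """Turn a JSON list of {"id": ..., "value": ...} objects into a dict."""
    return {entry["id"]: entry["value"] for entry in json.loads(payload)}


# Hypothetical response body, used here instead of a live HTTP call.
sample = '[{"id": "uptime", "value": "2100000"}, {"id": "fullRestarts", "value": "7"}]'

print(metrics_url("http://localhost:8081", "my-job-id", ["uptime", "fullRestarts"]))
print(parse_metrics(sample))
```

&lt;p&gt;Polling like this can be handy for ad-hoc checks; for continuous monitoring, a configured reporter is usually the more robust choice.&lt;/p&gt;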

&lt;p&gt;In the remaining part of this blog post, we will go over some of the most
important metrics to monitor your Apache Flink application.&lt;/p&gt;

&lt;h2 id=&quot;monitoring-general-health&quot;&gt;Monitoring General Health&lt;/h2&gt;

&lt;p&gt;The first thing you want to monitor is whether your job is actually in a &lt;em&gt;RUNNING&lt;/em&gt;
state. In addition, it pays off to monitor the number of restarts and the time
since the last restart.&lt;/p&gt;

&lt;p&gt;Generally speaking, successful checkpointing is a strong indicator of the
general health of your application. For each checkpoint, checkpoint barriers
need to flow through the whole topology of your Flink job, and events and
barriers cannot overtake each other. Therefore, a successful checkpoint shows
that no channel is fully congested.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Metric&lt;/th&gt;
      &lt;th&gt;Scope&lt;/th&gt;
      &lt;th&gt;Description&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;uptime&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;job&lt;/td&gt;
      &lt;td&gt;The time that the job has been running without interruption.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;fullRestarts&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;job&lt;/td&gt;
      &lt;td&gt;The total number of full restarts since this job was submitted.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;numberOfCompletedCheckpoints&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;job&lt;/td&gt;
      &lt;td&gt;The number of successfully completed checkpoints.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;numberOfFailedCheckpoints&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;job&lt;/td&gt;
      &lt;td&gt;The number of failed checkpoints.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Dashboard Panels&lt;/strong&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-1.png&quot; width=&quot;800px&quot; alt=&quot;Uptime (35 minutes), Restarting Time (3 milliseconds) and Number of Full Restarts (7)&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Uptime (35 minutes), Restarting Time (3 milliseconds) and Number of Full Restarts (7)&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-2.png&quot; width=&quot;800px&quot; alt=&quot;Completed Checkpoints (18336), Failed (14)&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Completed Checkpoints (18336), Failed (14)&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Possible Alerts&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code&gt;ΔfullRestarts&lt;/code&gt; &amp;gt; &lt;code&gt;threshold&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;ΔnumberOfFailedCheckpoints&lt;/code&gt; &amp;gt; &lt;code&gt;threshold&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
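
&lt;p&gt;The Δ in these alerts refers to the increase of a counter over some time window. A minimal Python sketch of that check, assuming the monitoring system hands you a list of counter readings for the window (the sample values and the threshold are made up):&lt;/p&gt;

```python
def counter_increase(samples):
    """Return how much a monotonically increasing counter grew over the samples."""
    if len(samples) >= 2:
        return samples[-1] - samples[0]
    return 0


def should_alert(samples, threshold):
    """Alert when the counter grew by more than `threshold` within the window."""
    return counter_increase(samples) > threshold


# Hypothetical readings of fullRestarts, taken once per minute.
full_restarts = [7, 7, 8, 10]
print(should_alert(full_restarts, threshold=2))
```

&lt;p&gt;Most monitoring tools offer this kind of delta or rate function natively, so in practice you would express the rule in their query language.&lt;/p&gt;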

&lt;h2 id=&quot;monitoring-progress--throughput&quot;&gt;Monitoring Progress &amp;amp; Throughput&lt;/h2&gt;

&lt;p&gt;Knowing that your application is RUNNING and checkpointing is working fine is good,
but it does not tell you whether the application is actually making progress and
keeping up with the upstream systems.&lt;/p&gt;

&lt;h3 id=&quot;throughput&quot;&gt;Throughput&lt;/h3&gt;

&lt;p&gt;Flink provides multiple metrics to measure the throughput of your application.
For each operator or task (remember: a task can contain multiple &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/operators/#task-chaining-and-resource-groups&quot;&gt;chained
operators&lt;/a&gt;),
Flink counts the number of records and bytes going in and out. Out of those
metrics, the rate of outgoing records per operator is often the most intuitive
and easiest to reason about.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Metric&lt;/th&gt;
      &lt;th&gt;Scope&lt;/th&gt;
      &lt;th&gt;Description&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;numRecordsOutPerSecond&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;task&lt;/td&gt;
      &lt;td&gt;The number of records this operator/task sends per second.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;numRecordsOutPerSecond&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;operator&lt;/td&gt;
      &lt;td&gt;The number of records this operator sends per second.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Dashboard Panels&lt;/strong&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-3.png&quot; width=&quot;800px&quot; alt=&quot;Mean Records Out per Second per Operator&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Mean Records Out per Second per Operator&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Possible Alerts&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code&gt;numRecordsOutPerSecond&lt;/code&gt; = &lt;code&gt;0&lt;/code&gt; (for a non-Sink operator)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: Source operators always have zero incoming records. Sink operators
always have zero outgoing records because the metrics only count
Flink-internal communication. There is a &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7286&quot;&gt;JIRA
ticket&lt;/a&gt; to change this
behavior.&lt;/p&gt;

&lt;h3 id=&quot;progress&quot;&gt;Progress&lt;/h3&gt;

&lt;p&gt;For applications that use event time semantics, it is important that watermarks
progress over time. A watermark of time &lt;em&gt;t&lt;/em&gt; tells the framework that it
should no longer expect to receive events with a timestamp earlier than &lt;em&gt;t&lt;/em&gt;,
and in turn, to trigger all operations that were scheduled for a timestamp &amp;lt; &lt;em&gt;t&lt;/em&gt;.
For example, an event time window that ends at &lt;em&gt;t&lt;/em&gt; = 30 will be closed and
evaluated once the watermark passes 30.&lt;/p&gt;

&lt;p&gt;As a consequence, you should monitor the watermark at event time-sensitive
operators in your application, such as process functions and windows. If the
difference between the current processing time and the watermark, known as
event-time skew, is unusually high, then it typically implies one of two issues.
First, it could mean that you are simply processing old events, for example
during catch-up after a downtime or when your job is simply not able to keep up
and events are queuing up. Second, it could mean a single upstream sub-task has
not sent a watermark for a long time (for example because it did not receive any
events to base the watermark on), which also prevents the watermark in
downstream operators from progressing. This &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5017&quot;&gt;JIRA
ticket&lt;/a&gt; provides further
information and a workaround for the latter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Metric&lt;/th&gt;
      &lt;th&gt;Scope&lt;/th&gt;
      &lt;th&gt;Description&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;currentOutputWatermark&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;operator&lt;/td&gt;
      &lt;td&gt;The last watermark this operator has emitted.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Dashboard Panels&lt;/strong&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-4.png&quot; width=&quot;800px&quot; alt=&quot;Event Time Lag per Subtask of a single operator in the topology. In this case, the watermark is lagging a few seconds behind for each subtask.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Event Time Lag per Subtask of a single operator in the topology. In this case, the watermark is lagging a few seconds behind for each subtask.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Possible Alerts&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code&gt;currentProcessingTime - currentOutputWatermark&lt;/code&gt; &amp;gt; &lt;code&gt;threshold&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
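
&lt;p&gt;The lag in this alert is simply the wall-clock time minus the reported watermark. A small Python sketch, assuming the watermark is given in epoch milliseconds and that an operator which has not yet emitted a watermark reports Java’s &lt;code&gt;Long.MIN_VALUE&lt;/code&gt; (treat this sentinel as an assumption and check it against your setup):&lt;/p&gt;

```python
import time

# Assumption: the metric reports Long.MIN_VALUE until a first watermark was emitted.
NO_WATERMARK = -(2 ** 63)


def event_time_lag_ms(current_output_watermark, now_ms=None):
    """Return processing time minus watermark in ms, or None without a watermark."""
    if current_output_watermark == NO_WATERMARK:
        return None
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - current_output_watermark


def lag_exceeds(current_output_watermark, threshold_ms, now_ms=None):
    """True if a watermark exists and lags by more than `threshold_ms`."""
    lag = event_time_lag_ms(current_output_watermark, now_ms)
    return lag is not None and lag > threshold_ms


# Hypothetical reading: the watermark is 3.5 seconds behind the given clock value.
print(event_time_lag_ms(1000, now_ms=4500))
```

&lt;p&gt;Skipping operators without a watermark avoids alerts with absurdly large lag values right after a job starts.&lt;/p&gt;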

&lt;h3 id=&quot;keeping-up&quot;&gt;“Keeping Up”&lt;/h3&gt;

&lt;p&gt;When consuming from a message queue, there is often a direct way to monitor if
your application is keeping up. By using connector-specific metrics you can
monitor how far behind the head of the message queue your current consumer group
is. Flink forwards the underlying metrics from most sources.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Metric&lt;/th&gt;
      &lt;th&gt;Scope&lt;/th&gt;
      &lt;th&gt;Description&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;records-lag-max&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;user&lt;/td&gt;
      &lt;td&gt;Applies to &lt;code&gt;FlinkKafkaConsumer&lt;/code&gt;. The maximum lag in terms of the number of records for any partition in this window. An increasing value over time is your best indication that the consumer group is not keeping up with the producers.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;millisBehindLatest&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;user&lt;/td&gt;
      &lt;td&gt;Applies to &lt;code&gt;FlinkKinesisConsumer&lt;/code&gt;. The number of milliseconds a consumer is behind the head of the stream. For any consumer and Kinesis shard, this indicates how far it is behind the current time.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Possible Alerts&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code&gt;records-lag-max&lt;/code&gt; &amp;gt; &lt;code&gt;threshold&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;millisBehindLatest&lt;/code&gt; &amp;gt; &lt;code&gt;threshold&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
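
&lt;p&gt;Besides a fixed threshold, the table above suggests watching for a lag that keeps growing. One simple way to express that, sketched in Python with made-up samples and a hypothetical minimum number of data points:&lt;/p&gt;

```python
def is_steadily_increasing(samples, min_points=3):
    """True if there are enough samples and each one is larger than the previous."""
    if min_points > len(samples):
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


# Hypothetical readings of records-lag-max over consecutive intervals.
print(is_steadily_increasing([100, 250, 900]))
print(is_steadily_increasing([100, 90, 120]))
```

&lt;p&gt;A strict comparison like this is sensitive to noise; smoothing the series first or comparing window averages may work better for real traffic.&lt;/p&gt;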

&lt;h2 id=&quot;monitoring-latency&quot;&gt;Monitoring Latency&lt;/h2&gt;

&lt;p&gt;Generally speaking, latency is the delay between the creation of an event and
the time at which results based on this event become visible. Once the event is
created it is usually stored in a persistent message queue, before it is
processed by Apache Flink, which then writes the results to a database or calls
a downstream system. In such a pipeline, latency can be introduced at each stage
and for various reasons including the following:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;It might take a varying amount of time until events are persisted in the
message queue.&lt;/li&gt;
  &lt;li&gt;During periods of high load or during recovery, events might spend some time
in the message queue until they are processed by Flink (see previous section).&lt;/li&gt;
  &lt;li&gt;Some operators in a streaming topology need to buffer events for some time
(e.g. in a time window) for functional reasons.&lt;/li&gt;
  &lt;li&gt;Each computation in your Flink topology (framework or user code), as well as
each network shuffle, takes time and adds to latency.&lt;/li&gt;
  &lt;li&gt;If the application emits through a transactional sink, the sink will only
commit and publish transactions upon successful checkpoints of Flink, adding
latency usually up to the checkpointing interval for each record.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In practice, it has proven invaluable to add timestamps to your events at
multiple stages (at least at creation, persistence, ingestion by Flink,
publication by Flink, possibly sampling those to save bandwidth). The
differences between these timestamps can be exposed as a user-defined metric in
your Flink topology to derive the latency distribution of each stage.&lt;/p&gt;
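
&lt;p&gt;To make the idea concrete, here is a minimal Python sketch that derives per-stage delays from such timestamps. The stage names and the event layout are hypothetical, and the sketch assumes all timestamps are epoch milliseconds from reasonably synchronized clocks:&lt;/p&gt;

```python
# Hypothetical stage names, in the order an event passes through them.
STAGES = ["created", "persisted", "ingested", "published"]


def stage_latencies_ms(timestamps):
    """Return the delay between each pair of consecutive stages."""
    return {
        f"{earlier}_to_{later}": timestamps[later] - timestamps[earlier]
        for earlier, later in zip(STAGES, STAGES[1:])
    }


# Made-up timestamps for a single event.
event = {"created": 1000, "persisted": 1040, "ingested": 1900, "published": 2150}
print(stage_latencies_ms(event))
```

&lt;p&gt;In a Flink job, each of these differences would typically be fed into a histogram metric so that the distribution per stage becomes visible.&lt;/p&gt;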

&lt;p&gt;In the rest of this section, we will only consider latency that is introduced
inside the Flink topology and cannot be attributed to transactional sinks or
events being buffered for functional reasons (item 4 in the list above).&lt;/p&gt;

&lt;p&gt;To this end, Flink comes with a feature called &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#latency-tracking&quot;&gt;Latency
Tracking&lt;/a&gt;.
When enabled, Flink will insert so-called latency markers periodically at all
sources. For each sub-task, a latency distribution from each source to this
operator will be reported. The granularity of these histograms can be further
controlled by setting &lt;em&gt;metrics.latency.granularity&lt;/em&gt; as desired.&lt;/p&gt;

&lt;p&gt;Due to the potentially high number of histograms (in particular for
&lt;em&gt;metrics.latency.granularity: subtask&lt;/em&gt;), enabling latency tracking can
significantly impact the performance of the cluster. It is recommended to only
enable it to locate sources of latency during debugging.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Metric&lt;/th&gt;
      &lt;th&gt;Scope&lt;/th&gt;
      &lt;th&gt;Description&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;latency&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;operator&lt;/td&gt;
      &lt;td&gt;The latency from the source operator to this operator.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;restartingTime&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;job&lt;/td&gt;
      &lt;td&gt;The time it took to restart the job, or how long the current restart has been in progress.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Dashboard Panel&lt;/strong&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-5.png&quot; width=&quot;800px&quot; alt=&quot;Latency distribution between a source and a single sink subtask.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Latency distribution between a source and a single sink subtask.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;jvm-metrics&quot;&gt;JVM Metrics&lt;/h2&gt;

&lt;p&gt;So far we have only looked at Flink-specific metrics. As long as latency &amp;amp;
throughput of your application are in line with your expectations and it is
checkpointing consistently, this is probably everything you need. On the other
hand, if your job’s performance is starting to degrade, among the first metrics you
want to look at are memory consumption and CPU load of your Task- &amp;amp; JobManager
JVMs.&lt;/p&gt;

&lt;h3 id=&quot;memory&quot;&gt;Memory&lt;/h3&gt;

&lt;p&gt;Flink reports the usage of Heap, NonHeap, Direct &amp;amp; Mapped memory for JobManagers
and TaskManagers.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Heap memory - as with most JVM applications - is the most volatile and important
metric to watch. This is especially true when using Flink’s filesystem
statebackend as it keeps all state objects on the JVM Heap. If the size of
long-living objects on the Heap increases significantly, this can usually be
attributed to the size of your application state (check the 
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#checkpointing&quot;&gt;checkpointing metrics&lt;/a&gt;
for an estimated size of the on-heap state). The possible reasons for growing
state are very application-specific. Typically, an increasing number of keys, a
large event-time skew between different input streams or simply missing state
cleanup may cause growing state.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;NonHeap memory is dominated by the metaspace, which holds class metadata as well
as static content and whose size is unlimited by default. There is a
&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10317&quot;&gt;JIRA Ticket&lt;/a&gt; to limit the size
to 250 megabytes by default.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The biggest driver of Direct memory is by far the
number of Flink’s network buffers, which can be
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#configuring-the-network-buffers&quot;&gt;configured&lt;/a&gt;.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Mapped memory is usually close to zero as Flink does not use memory-mapped files.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In a containerized environment you should additionally monitor the overall
memory consumption of the Job- and TaskManager containers to ensure they don’t
exceed their resource limits. This is particularly important when using the
RocksDB statebackend, since RocksDB allocates a considerable amount of
memory off heap. To understand how much memory RocksDB might use, you can
check out &lt;a href=&quot;https://www.da-platform.com/blog/manage-rocksdb-memory-size-apache-flink&quot;&gt;this blog
post&lt;/a&gt;
by Stefan Richter.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Metric&lt;/th&gt;
      &lt;th&gt;Scope&lt;/th&gt;
      &lt;th&gt;Description&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;Status.JVM.Memory.NonHeap.Committed&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;job-/taskmanager&lt;/td&gt;
      &lt;td&gt;The amount of non-heap memory guaranteed to be available to the JVM (in bytes).&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;Status.JVM.Memory.Heap.Used&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;job-/taskmanager&lt;/td&gt;
      &lt;td&gt;The amount of heap memory currently used (in bytes).&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;Status.JVM.Memory.Heap.Committed&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;job-/taskmanager&lt;/td&gt;
      &lt;td&gt;The amount of heap memory guaranteed to be available to the JVM (in bytes).&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;Status.JVM.Memory.Direct.MemoryUsed&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;job-/taskmanager&lt;/td&gt;
      &lt;td&gt;The amount of memory used by the JVM for the direct buffer pool (in bytes).&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;Status.JVM.Memory.Mapped.MemoryUsed&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;job-/taskmanager&lt;/td&gt;
      &lt;td&gt;The amount of memory used by the JVM for the mapped buffer pool (in bytes).&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;Status.JVM.GarbageCollector.G1 Young Generation.Time&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;job-/taskmanager&lt;/td&gt;
      &lt;td&gt;The total time spent performing G1 Young Generation garbage collection.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;Status.JVM.GarbageCollector.G1 Old Generation.Time&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;job-/taskmanager&lt;/td&gt;
      &lt;td&gt;The total time spent performing G1 Old Generation garbage collection.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Dashboard Panel&lt;/strong&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-6.png&quot; width=&quot;800px&quot; alt=&quot;TaskManager memory consumption and garbage collection times.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;TaskManager memory consumption and garbage collection times.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-7.png&quot; width=&quot;800px&quot; alt=&quot;JobManager memory consumption and garbage collection times.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;JobManager memory consumption and garbage collection times.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Possible Alerts&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code&gt;container memory limit&lt;/code&gt; &amp;lt; &lt;code&gt;container memory + safety margin&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
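
&lt;p&gt;Spelled out, this alert fires once the container’s current usage plus a safety margin no longer fits under its limit. A tiny Python sketch with hypothetical values in megabytes (where the numbers come from, for example your container runtime, is left open):&lt;/p&gt;

```python
def memory_alert(container_limit, container_used, safety_margin):
    """True when usage plus the safety margin no longer fits under the limit."""
    return container_used + safety_margin > container_limit


# Hypothetical TaskManager container: 4096 MB limit, 256 MB safety margin.
print(memory_alert(container_limit=4096, container_used=3900, safety_margin=256))
print(memory_alert(container_limit=4096, container_used=3000, safety_margin=256))
```

&lt;p&gt;The margin gives you time to react before the container runtime kills the process for exceeding its limit.&lt;/p&gt;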

&lt;h3 id=&quot;cpu&quot;&gt;CPU&lt;/h3&gt;

&lt;p&gt;Besides memory, you should also monitor the CPU load of the TaskManagers. If
your TaskManagers are constantly under very high load, you might be able to
improve the overall performance by decreasing the number of task slots per
TaskManager (in case of a Standalone setup), by providing more resources to the
TaskManager (in case of a containerized setup), or by providing more
TaskManagers. In general, a system already running under very high load during
normal operations will need much more time to catch up after recovering from a
downtime. During this time you will see a much higher latency (event-time skew) than
usual.&lt;/p&gt;

&lt;p&gt;A sudden increase in the CPU load might also be attributed to high garbage
collection pressure, which should be visible in the JVM memory metrics as well.&lt;/p&gt;

&lt;p&gt;If one or a few TaskManagers are constantly under very high load, this can slow
down the whole topology due to long checkpoint alignment times and increasing
event-time skew. A common reason is skew in the partition key of the data, which
can be mitigated by pre-aggregating before the shuffle or keying on a more
evenly distributed key.&lt;/p&gt;
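
&lt;p&gt;One way to spot such outliers is to compare each TaskManager’s CPU load with the median across all TaskManagers. The following Python sketch does this for made-up load values between 0 and 1; the factor of 1.5 and the TaskManager names are arbitrary assumptions:&lt;/p&gt;

```python
from statistics import median


def overloaded_taskmanagers(cpu_load, factor=1.5):
    """Return TaskManagers whose load exceeds `factor` times the median load."""
    typical = median(cpu_load.values())
    return sorted(tm for tm, load in cpu_load.items() if load > factor * typical)


# Hypothetical readings of Status.JVM.CPU.Load per TaskManager.
loads = {"tm-1": 0.35, "tm-2": 0.40, "tm-3": 0.95, "tm-4": 0.38}
print(overloaded_taskmanagers(loads))
```

&lt;p&gt;If the same TaskManagers show up repeatedly, checking the distribution of records per key for the operators running on them is a reasonable next step.&lt;/p&gt;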

&lt;p&gt;&lt;strong&gt;Key Metrics&lt;/strong&gt;&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Metric&lt;/th&gt;
      &lt;th&gt;Scope&lt;/th&gt;
      &lt;th&gt;Description&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;Status.JVM.CPU.Load&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;job-/taskmanager&lt;/td&gt;
      &lt;td&gt;The recent CPU usage of the JVM.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Dashboard Panel&lt;/strong&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-8.png&quot; width=&quot;800px&quot; alt=&quot;TaskManager &amp;amp; JobManager CPU load.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;TaskManager &amp;amp; JobManager CPU load.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;system-resources&quot;&gt;System Resources&lt;/h2&gt;

&lt;p&gt;In addition to the JVM metrics above, it is also possible to use Flink’s metrics
system to gather insights about system resources, i.e. memory, CPU &amp;amp;
network-related metrics for the whole machine as opposed to the Flink processes
alone. System resource monitoring is disabled by default and requires additional
dependencies on the classpath. Please check out the 
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#system-resources&quot;&gt;Flink system resource metrics documentation&lt;/a&gt; for
additional guidance and details. System resource monitoring in Flink can be very
helpful in setups without existing host monitoring capabilities.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;This post tries to shed some light on Flink’s metrics and monitoring system. You
can utilise it as a starting point when you first think about how to
successfully monitor your Flink application. I highly recommend that you start
monitoring your Flink application early on in the development phase. This way
you will be able to improve your dashboards and alerts over time and, more
importantly, observe the performance impact of the changes to your application
throughout the development phase. By doing so, you can ask the right questions
about the runtime behaviour of your application, and learn much more about
Flink’s internals early on.&lt;/p&gt;

&lt;p&gt;Last but not least, this post only scratches the surface of the overall metrics
and monitoring capabilities of Apache Flink. I highly recommend going over
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html&quot;&gt;Flink’s metrics documentation&lt;/a&gt;
for a full reference of Flink’s metrics system.&lt;/p&gt;
</description>
<pubDate>Mon, 25 Feb 2019 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/02/25/monitoring-best-practices.html</link>
<guid isPermaLink="true">/news/2019/02/25/monitoring-best-practices.html</guid>
</item>

<item>
<title>Apache Flink 1.6.4 Released</title>
<description>&lt;p&gt;The Apache Flink community released the fourth bugfix version of the Apache Flink 1.6 series.&lt;/p&gt;

&lt;p&gt;This release includes more than 25 fixes and minor improvements for Flink 1.6.3. The list below details all fixes.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.6.4.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;Bug&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10721&quot;&gt;FLINK-10721&lt;/a&gt;] -         Kafka discovery-loop exceptions may be swallowed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10761&quot;&gt;FLINK-10761&lt;/a&gt;] -         MetricGroup#getAllVariables can deadlock
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10774&quot;&gt;FLINK-10774&lt;/a&gt;] -         connection leak when partition discovery is disabled and open throws exception
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10848&quot;&gt;FLINK-10848&lt;/a&gt;] -         Flink&amp;#39;s Yarn ResourceManager can allocate too many excess containers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11022&quot;&gt;FLINK-11022&lt;/a&gt;] -         Update LICENSE and NOTICE files for older releases
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11071&quot;&gt;FLINK-11071&lt;/a&gt;] -         Dynamic proxy classes cannot be resolved when deserializing job graph
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11084&quot;&gt;FLINK-11084&lt;/a&gt;] -         Incorrect ouput after two consecutive split and select
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11119&quot;&gt;FLINK-11119&lt;/a&gt;] -         Incorrect Scala example for Table Function
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11134&quot;&gt;FLINK-11134&lt;/a&gt;] -         Invalid REST API request should not log the full exception in Flink logs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11151&quot;&gt;FLINK-11151&lt;/a&gt;] -         FileUploadHandler stops working if the upload directory is removed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11173&quot;&gt;FLINK-11173&lt;/a&gt;] -         Proctime attribute validation throws an incorrect exception message
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11224&quot;&gt;FLINK-11224&lt;/a&gt;] -         Log is missing in scala-shell
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11232&quot;&gt;FLINK-11232&lt;/a&gt;] -         Empty Start Time of sub-task on web dashboard
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11234&quot;&gt;FLINK-11234&lt;/a&gt;] -         ExternalTableCatalogBuilder unable to build a batch-only table
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11235&quot;&gt;FLINK-11235&lt;/a&gt;] -         Elasticsearch connector leaks threads if no connection could be established
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11251&quot;&gt;FLINK-11251&lt;/a&gt;] -         Incompatible metric name on prometheus reporter
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11389&quot;&gt;FLINK-11389&lt;/a&gt;] -         Incorrectly use job information when call getSerializedTaskInformation in class TaskDeploymentDescriptor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11584&quot;&gt;FLINK-11584&lt;/a&gt;] -         ConfigDocsCompletenessITCase fails if DescriptionBuilder#linebreak() is used
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11585&quot;&gt;FLINK-11585&lt;/a&gt;] -         Prefix matching in ConfigDocsGenerator can result in wrong assignments
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10910&quot;&gt;FLINK-10910&lt;/a&gt;] -         Harden Kubernetes e2e test
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11079&quot;&gt;FLINK-11079&lt;/a&gt;] -         Skip deployment for flink-storm-examples
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11207&quot;&gt;FLINK-11207&lt;/a&gt;] -         Update Apache commons-compress from 1.4.1 to 1.18
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11262&quot;&gt;FLINK-11262&lt;/a&gt;] -         Bump jython-standalone to 2.7.1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11289&quot;&gt;FLINK-11289&lt;/a&gt;] -         Rework example module structure to account for licensing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11304&quot;&gt;FLINK-11304&lt;/a&gt;] -         Typo in time attributes doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11469&quot;&gt;FLINK-11469&lt;/a&gt;] -         fix Tuning Checkpoints and Large State doc
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Mon, 25 Feb 2019 01:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/02/25/release-1.6.4.html</link>
<guid isPermaLink="true">/news/2019/02/25/release-1.6.4.html</guid>
</item>

<item>
<title>Apache Flink 1.7.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released the second bugfix version of the Apache Flink 1.7 series.&lt;/p&gt;

&lt;p&gt;This release includes more than 40 fixes and minor improvements for Flink 1.7.1, covering several critical
recovery issues as well as problems in the Flink streaming connectors.&lt;/p&gt;

&lt;p&gt;All fixes are listed in detail below.
We highly recommend that all users upgrade to Flink 1.7.2.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11179&quot;&gt;FLINK-11179&lt;/a&gt;] -          JoinCancelingITCase#testCancelSortMatchWhileDoingHeavySorting test error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11180&quot;&gt;FLINK-11180&lt;/a&gt;] -         ProcessFailureCancelingITCase#testCancelingOnProcessFailure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11181&quot;&gt;FLINK-11181&lt;/a&gt;] -         SimpleRecoveryITCaseBase test error
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10721&quot;&gt;FLINK-10721&lt;/a&gt;] -         Kafka discovery-loop exceptions may be swallowed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10761&quot;&gt;FLINK-10761&lt;/a&gt;] -         MetricGroup#getAllVariables can deadlock
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10774&quot;&gt;FLINK-10774&lt;/a&gt;] -         connection leak when partition discovery is disabled and open throws exception
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10848&quot;&gt;FLINK-10848&lt;/a&gt;] -         Flink&amp;#39;s Yarn ResourceManager can allocate too many excess containers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11046&quot;&gt;FLINK-11046&lt;/a&gt;] -         ElasticSearch6Connector cause thread blocked when index failed with retry
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11071&quot;&gt;FLINK-11071&lt;/a&gt;] -         Dynamic proxy classes cannot be resolved when deserializing job graph
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11083&quot;&gt;FLINK-11083&lt;/a&gt;] -         CRowSerializerConfigSnapshot is not instantiable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11084&quot;&gt;FLINK-11084&lt;/a&gt;] -         Incorrect output after two consecutive split and select
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11100&quot;&gt;FLINK-11100&lt;/a&gt;] -         Presto S3 FileSystem E2E test broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11119&quot;&gt;FLINK-11119&lt;/a&gt;] -         Incorrect Scala example for Table Function
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11134&quot;&gt;FLINK-11134&lt;/a&gt;] -         Invalid REST API request should not log the full exception in Flink logs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11145&quot;&gt;FLINK-11145&lt;/a&gt;] -         Fix Hadoop version handling in binary release script
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11151&quot;&gt;FLINK-11151&lt;/a&gt;] -         FileUploadHandler stops working if the upload directory is removed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11168&quot;&gt;FLINK-11168&lt;/a&gt;] -         LargePlanTest times out on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11173&quot;&gt;FLINK-11173&lt;/a&gt;] -         Proctime attribute validation throws an incorrect exception message
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11187&quot;&gt;FLINK-11187&lt;/a&gt;] -         StreamingFileSink with S3 backend transient socket timeout issues 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11191&quot;&gt;FLINK-11191&lt;/a&gt;] -         Exception in code generation when ambiguous columns in MATCH_RECOGNIZE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11194&quot;&gt;FLINK-11194&lt;/a&gt;] -         missing Scala 2.12 build of HBase connector 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11201&quot;&gt;FLINK-11201&lt;/a&gt;] -         Document SBT dependency requirements when using MiniClusterResource
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11224&quot;&gt;FLINK-11224&lt;/a&gt;] -         Log is missing in scala-shell
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11227&quot;&gt;FLINK-11227&lt;/a&gt;] -         The DescriptorProperties contains some bounds checking errors
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11232&quot;&gt;FLINK-11232&lt;/a&gt;] -         Empty Start Time of sub-task on web dashboard
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11234&quot;&gt;FLINK-11234&lt;/a&gt;] -         ExternalTableCatalogBuilder unable to build a batch-only table
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11235&quot;&gt;FLINK-11235&lt;/a&gt;] -         Elasticsearch connector leaks threads if no connection could be established
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11246&quot;&gt;FLINK-11246&lt;/a&gt;] -         Fix distinct AGG visibility issues
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11251&quot;&gt;FLINK-11251&lt;/a&gt;] -         Incompatible metric name on prometheus reporter
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11279&quot;&gt;FLINK-11279&lt;/a&gt;] -         Invalid week interval parsing in ExpressionParser
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11302&quot;&gt;FLINK-11302&lt;/a&gt;] -         FlinkS3FileSystem uses an incorrect path for temporary files.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11389&quot;&gt;FLINK-11389&lt;/a&gt;] -         Incorrectly use job information when call getSerializedTaskInformation in class TaskDeploymentDescriptor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11419&quot;&gt;FLINK-11419&lt;/a&gt;] -         StreamingFileSink fails to recover after taskmanager failure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11436&quot;&gt;FLINK-11436&lt;/a&gt;] -         Java deserialization failure of the AvroSerializer when used in an old CompositeSerializers
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        New Feature
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10457&quot;&gt;FLINK-10457&lt;/a&gt;] -         Support SequenceFile for StreamingFileSink
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10910&quot;&gt;FLINK-10910&lt;/a&gt;] -         Harden Kubernetes e2e test
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11023&quot;&gt;FLINK-11023&lt;/a&gt;] -         Update LICENSE and NOTICE files for flink-connectors
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11079&quot;&gt;FLINK-11079&lt;/a&gt;] -         Skip deployment for flink-storm-examples
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11207&quot;&gt;FLINK-11207&lt;/a&gt;] -         Update Apache commons-compress from 1.4.1 to 1.18
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11216&quot;&gt;FLINK-11216&lt;/a&gt;] -         Back to top button is missing in the Joining document and is not properly placed in the Process Function document
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11262&quot;&gt;FLINK-11262&lt;/a&gt;] -         Bump jython-standalone to 2.7.1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11289&quot;&gt;FLINK-11289&lt;/a&gt;] -         Rework example module structure to account for licensing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11304&quot;&gt;FLINK-11304&lt;/a&gt;] -         Typo in time attributes doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11331&quot;&gt;FLINK-11331&lt;/a&gt;] -         Fix errors in tableApi.md and functions.md
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11469&quot;&gt;FLINK-11469&lt;/a&gt;] -         fix Tuning Checkpoints and Large State doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11473&quot;&gt;FLINK-11473&lt;/a&gt;] -         Clarify Documentation on Latency Tracking
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11628&quot;&gt;FLINK-11628&lt;/a&gt;] -         Cache maven on travis
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Fri, 15 Feb 2019 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/02/15/release-1.7.2.html</link>
<guid isPermaLink="true">/news/2019/02/15/release-1.7.2.html</guid>
</item>

<item>
<title>Batch as a Special Case of Streaming and Alibaba&#39;s contribution of Blink</title>
<description>&lt;p&gt;Last week, we &lt;a href=&quot;https://lists.apache.org/thread.html/2f7330e85d702a53b4a2b361149930b50f2e89d8e8a572f8ee2a0e6d@%3Cdev.flink.apache.org%3E&quot;&gt;broke the news&lt;/a&gt; that Alibaba decided to contribute its Flink-fork, called Blink, back to the Apache Flink project. Why is that a big thing for Flink, what will it mean for users and the community, and how does it fit into Flink’s overall vision? Let’s take a step back to understand this better…&lt;/p&gt;

&lt;h2 id=&quot;a-unified-approach-to-batch-and-streaming&quot;&gt;A Unified Approach to Batch and Streaming&lt;/h2&gt;

&lt;p&gt;Since its early days, Apache Flink has followed the philosophy of taking a unified approach to batch and streaming data processing. The core building block is &lt;em&gt;“continuous processing of unbounded data streams”&lt;/em&gt;: if you can do that, you can also do offline processing of bounded data sets (batch processing use cases), because these are just streams that happen to end at some point.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/unified-batch-streaming-blink/bounded-unbounded.png&quot; width=&quot;600px&quot; alt=&quot;Processing of bounded and unbounded data.&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
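
&lt;p&gt;As a rough illustration of this idea (a minimal Python sketch, not Flink code; the name &lt;code&gt;running_sum&lt;/code&gt; is made up for this example), the same continuous operator can consume either a bounded or an unbounded input. The only difference is whether the input ever ends:&lt;/p&gt;

```python
from itertools import count, islice


def running_sum(stream):
    """Continuous operator: emits an updated result for every incoming record."""
    total = 0
    for record in stream:
        total += record
        yield total


# Bounded input (batch): the stream ends, so the last emitted value is the final result.
bounded_result = list(running_sum([1, 2, 3, 4]))[-1]

# Unbounded input (streaming): the stream never ends, so we only inspect the first five results.
first_five = list(islice(running_sum(count(1)), 5))
```

&lt;p&gt;The operator itself never needs to know which kind of input it is reading.&lt;/p&gt;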

&lt;p&gt;The &lt;em&gt;“streaming first, with batch as a special case of streaming”&lt;/em&gt; philosophy is supported by various projects (for example &lt;a href=&quot;https://flink.apache.org&quot;&gt;Flink&lt;/a&gt;, &lt;a href=&quot;https://beam.apache.org&quot;&gt;Beam&lt;/a&gt;, etc.) and has often been cited as a powerful way to build data applications that &lt;a href=&quot;https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101&quot;&gt;generalize across real-time and offline processing&lt;/a&gt; and to greatly reduce the complexity of data infrastructures.&lt;/p&gt;

&lt;h3 id=&quot;why-are-there-still-batch-processors&quot;&gt;Why are there still batch processors?&lt;/h3&gt;

&lt;p&gt;However, &lt;em&gt;“batch is just a special case of streaming”&lt;/em&gt; does not mean that any stream processor is now the right tool for your batch processing use cases - the introduction of stream processors did not render batch processors obsolete:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Pure stream processing systems are very slow at batch processing workloads. No one would consider it a good idea to use a stream processor that shuffles through message queues to analyze large amounts of available data.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Unified APIs like &lt;a href=&quot;https://beam.apache.org&quot;&gt;Apache Beam&lt;/a&gt; often delegate to different runtimes depending on whether the data is continuous/unbounded or fixed/bounded. For example, the implementations of the batch and streaming runtime of Google Cloud Dataflow are different, to get the desired performance and resilience in each case.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;em&gt;Apache Flink&lt;/em&gt; has a streaming API that can do bounded/unbounded use cases, but still offers a separate DataSet API and runtime stack that is faster for batch use cases.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What is the reason for the above? Where did &lt;em&gt;“batch is just a special case of streaming”&lt;/em&gt; go wrong?&lt;/p&gt;

&lt;p&gt;The answer is simple: nothing is wrong with that paradigm. Unifying batch and streaming in the API is one aspect. The runtime must also exploit certain characteristics of the special case “bounded data” to handle batch processing use cases competitively. After all, batch processors have been built specifically for that special case.&lt;/p&gt;

&lt;h2 id=&quot;batch-on-top-of-a-streaming-runtime&quot;&gt;Batch on top of a Streaming Runtime&lt;/h2&gt;

&lt;p&gt;We always believed that it is possible to have a runtime that is state-of-the-art for both stream processing and batch processing use cases at the same time. A runtime that is streaming-first, but can exploit just the right amount of special properties of bounded streams to be as fast for batch use cases as dedicated batch processors. &lt;strong&gt;This is the unique approach that Flink takes.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Apache Flink has a network stack that supports both &lt;a href=&quot;https://www.ververica.com/flink-forward-berlin/resources/improving-throughput-and-latency-with-flinks-network-stack&quot;&gt;low-latency/high-throughput streaming data exchanges&lt;/a&gt;, as well as high-throughput batch shuffles. Flink has streaming runtime operators for many operations, but also specialized operators for bounded inputs, which get used when you choose the DataSet API or select the batch environment in the Table API.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/unified-batch-streaming-blink/stream-batch-joins.png&quot; width=&quot;500px&quot; alt=&quot;Streaming and batch joins.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;The figure illustrates a streaming join and a batch join. The batch join can read one input fully into a hash table and then probe with the other input. The stream join needs to build tables for both sides, because it needs to continuously process both inputs. 
For data larger than memory, the batch join can partition both data sets into subsets that fit in memory (data hits disk once) whereas the continuous nature of the stream join requires it to always keep all data in the table and repeatedly hit disk on cache misses.&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
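
&lt;p&gt;The difference between the two joins in the figure can be sketched in plain Python. This is a simplified in-memory illustration under our own assumptions, not Flink’s operator implementation; the function names and the &lt;code&gt;(side, key, value)&lt;/code&gt; event format are invented for the example:&lt;/p&gt;

```python
from collections import defaultdict


def batch_hash_join(build_side, probe_side):
    """Bounded inputs: read one side fully into a hash table, then probe with the other."""
    table = defaultdict(list)
    for key, value in build_side:
        table[key].append(value)
    return [(key, left, right)
            for key, right in probe_side
            for left in table.get(key, ())]


def symmetric_stream_join(events):
    """Unbounded inputs: keep a table per side, because a match can arrive on either input.

    `events` is an interleaved iterable of (side, key, value), with side "L" or "R".
    """
    tables = {"L": defaultdict(list), "R": defaultdict(list)}
    for side, key, value in events:
        other = "R" if side == "L" else "L"
        tables[side][key].append(value)          # remember the record for future matches
        for match in tables[other].get(key, ()):  # join against everything seen so far
            yield (key, value, match) if side == "L" else (key, match, value)


left = [(1, "a"), (2, "b")]
right = [(1, "x"), (1, "y"), (3, "z")]
batch_result = batch_hash_join(left, right)

events = [("L", 1, "a"), ("R", 1, "x"), ("R", 3, "z"), ("L", 2, "b"), ("R", 1, "y")]
stream_result = list(symmetric_stream_join(events))
```

&lt;p&gt;Both variants produce the same join result here, but the streaming variant has to retain state for both inputs indefinitely, while the batch variant only builds one table and can discard it once the probe side is exhausted.&lt;/p&gt;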

&lt;p&gt;Because of that, Apache Flink has demonstrated impressive batch processing performance since its early days. The benchmark below is somewhat older, but it validated our architectural approach early on.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/unified-batch-streaming-blink/sort-performance.png&quot; width=&quot;500px&quot; alt=&quot;Sort performance.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;Time to sort 3.2 TB (80 GB/node), in seconds&lt;br /&gt;
(&lt;a href=&quot;https://www.slideshare.net/FlinkForward/dongwon-kim-a-comparative-performance-evaluation-of-flink&quot; target=&quot;blank&quot;&gt;Presentation by Dongwon Kim, Flink Forward Berlin 2015&lt;/a&gt;.)&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;what-is-still-missing&quot;&gt;What is still missing?&lt;/h2&gt;

&lt;p&gt;To complete the approach and make Flink’s experience on bounded data (batch) state-of-the-art, we need to add a few more enhancements. We believe that these features are key to realizing our vision:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;(1) A truly unified runtime operator stack&lt;/strong&gt;: Currently the bounded and unbounded operators have a different network and threading model and don’t mix and match. The original reason was that batch operators followed a “pull model” (easier for batch algorithms), while streaming operators followed a “push model” (better latency/throughput characteristics). In a unified stack, continuous streaming operators are the foundation. When operating on bounded data without latency constraints, the API or the query optimizer can select from a larger set of operators. The optimizer can pick, for example, a specialized join operator that first consumes one input stream entirely before reading the second input stream.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;(2) Exploiting bounded streams to reduce the scope of fault tolerance&lt;/strong&gt;: When input data is bounded, it is possible to completely buffer data during shuffles (memory or disk) and replay that data after a failure. This makes recovery more fine grained and thus much more efficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;(3) Exploiting bounded stream operator properties for scheduling&lt;/strong&gt;: A continuous unbounded streaming application needs (by definition) all operators running at the same time. An application on bounded data can schedule operations one after another, depending on how the operators consume data (e.g., first build hash table, then probe hash table). This increases resource efficiency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;(4) Enabling these special case optimizations for the DataStream API&lt;/strong&gt;: Currently, only the Table API (which is unified across bounded/unbounded streams) activates these optimizations when working on bounded data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;(5) Performance and coverage for SQL&lt;/strong&gt;: SQL is the de-facto standard data language, and while it is also being rapidly adopted for continuous streaming use cases, there is absolutely no way past it for bounded/batch use cases. To be competitive with the best batch engines, Flink needs more coverage and performance for the SQL query execution. While the core data-plane in Flink is high performance, the speed of SQL execution ultimately depends a lot also on optimizer rules, a rich set of operators, and features like code generation.&lt;/p&gt;
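
&lt;p&gt;To make point (2) more concrete, here is a minimal Python sketch of recovery from a fully buffered shuffle. It is an illustration under simplifying assumptions (in-memory buffer, a single simulated failure), not Flink’s recovery mechanism, and all names in it are hypothetical:&lt;/p&gt;

```python
calls = {"upstream": 0}


def upstream(records):
    """Producer stage; counts how often it has to run."""
    calls["upstream"] += 1
    return [record * 2 for record in records]


def downstream(records, fail):
    """Consumer stage; optionally simulates a task failure."""
    if fail:
        raise RuntimeError("simulated task failure")
    return sum(records)


def run_with_buffered_shuffle(source):
    # Bounded input: the shuffle output can be materialized completely.
    shuffle_buffer = upstream(source)
    try:
        return downstream(shuffle_buffer, fail=True)
    except RuntimeError:
        # Only the failed consumer is restarted; it replays the buffered data.
        return downstream(shuffle_buffer, fail=False)


result = run_with_buffered_shuffle([1, 2, 3])
```

&lt;p&gt;In this sketch the producer stage runs exactly once even though the consumer fails and is restarted, which is what makes recovery on bounded data more fine-grained.&lt;/p&gt;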

&lt;h2 id=&quot;enter-blink&quot;&gt;Enter Blink&lt;/h2&gt;

&lt;p&gt;Blink is a fork of Apache Flink, originally created inside Alibaba to improve Flink’s behavior for internal use cases. Blink adds a series of improvements and integrations (see the &lt;a href=&quot;https://github.com/apache/flink/blob/blink/README.md&quot;&gt;Readme&lt;/a&gt; for details), many of which fall into the category of improved bounded-data/batch processing and SQL. In fact, of the above list of features for a unified batch/streaming system, Blink implements significant steps forward in all except (4):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unified Stream Operators:&lt;/strong&gt; Blink extends the Flink streaming runtime operator model to support selectively reading from different inputs, while keeping the push model for very low latency. This control over the inputs makes it possible to support algorithms like hybrid hash-joins on the same operator and threading model as continuous symmetric joins through RocksDB. These operators also form the basis for future features like &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API&quot;&gt;“Side Inputs”&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Table API &amp;amp; SQL Query Processor:&lt;/strong&gt; The SQL query processor is the component that has changed the most compared to the latest Flink master branch:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;While Flink currently translates queries either into DataSet or DataStream programs (depending on the characteristics of their inputs), Blink translates queries to a data flow of the aforementioned stream operators.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Blink adds many more runtime operators for common SQL operations like semi-joins, anti-joins, etc.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The query planner (optimizer) is still based on Apache Calcite, but has many more optimization rules (incl. join reordering) and uses a proper cost model for planning.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Stream operators are more aggressively chained.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The common data structures (sorters, hash tables) and serializers are extended to go even further in operating on binary data and saving serialization overhead. Code generation is used for the row serializers.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Improved Scheduling and Failure Recovery:&lt;/strong&gt; Finally, Blink implements several improvements for task scheduling and fault tolerance. The scheduling strategies use resources better by exploiting how the operators process their input data. The failover strategies recover at a finer granularity, along the boundaries of persistent shuffles. A failed JobManager can be replaced without restarting a running application.&lt;/p&gt;

&lt;p&gt;The changes in Blink result in a big improvement in performance. The below numbers were reported by the developers of Blink to give a rough impression of the performance gains.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/unified-batch-streaming-blink/blink-flink-tpch.png&quot; width=&quot;600px&quot; alt=&quot;TPC-H performance of Blink and Flink.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;Relative performance of Blink versus Flink 1.6.0 in the TPC-H benchmark, query by query.&lt;br /&gt;
The performance improvement is 10x on average.&lt;br /&gt;
(&lt;a href=&quot;https://www.ververica.com/flink-forward-berlin/resources/unified-engine-for-data-processing-and-ai&quot; target=&quot;blank&quot;&gt;Presentation by Xiaowei Jiang at Flink Forward Berlin, 2018&lt;/a&gt;.)&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/unified-batch-streaming-blink/blink-spark-tpcds.png&quot; width=&quot;600px&quot; alt=&quot;TPC-DS performance of Blink and Spark.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;Performance of Blink versus Spark in the TPC-DS benchmark, aggregate time for all queries together.&lt;br /&gt;
&lt;a href=&quot;https://www.bilibili.com/video/av42325467/?p=3&quot; target=&quot;blank&quot;&gt;Presentation by Xiaowei Jiang at Flink Forward Beijing, 2018&lt;/a&gt;.&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h2 id=&quot;how-do-we-plan-to-merge-blink-and-flink&quot;&gt;How do we plan to merge Blink and Flink?&lt;/h2&gt;

&lt;p&gt;Blink’s code is currently available as a &lt;a href=&quot;https://github.com/apache/flink/tree/blink&quot;&gt;branch&lt;/a&gt; in the Apache Flink repository. It is a challenge to merge such a large amount of changes while making the merge process as non-disruptive as possible and keeping public APIs as stable as possible.&lt;/p&gt;

&lt;p&gt;The community’s &lt;a href=&quot;https://lists.apache.org/thread.html/6066abd0f09fc1c41190afad67770ede8efd0bebc36f00938eecc118@%3Cdev.flink.apache.org%3E&quot;&gt;merge plan&lt;/a&gt; focuses initially on the bounded/batch processing features mentioned above and follows the following approach to ensure a smooth integration:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;To merge Blink’s &lt;em&gt;SQL/Table API query processor&lt;/em&gt; enhancements, we exploit the fact that both Flink and Blink have the same APIs: SQL and the Table API.
Following some restructuring of the Table/SQL module (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions&quot;&gt;FLIP-32&lt;/a&gt;) we plan to merge the Blink query planner (optimizer) and runtime (operators) as an additional query processor next to the current SQL runtime. Think of it as two different runners for the same APIs.&lt;br /&gt;
Initially, users will be able to select which query processor to use. After a transition period in which the new query processor will be developed to subsume the current query processor, the current processor will most likely be deprecated and eventually dropped. Given that SQL is such a well-defined interface, we anticipate that this transition will cause little friction for users. Mostly, it should be a pleasant surprise to have broader SQL feature coverage and a boost in performance.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To support the merge of Blink’s &lt;em&gt;enhancements to scheduling and recovery&lt;/em&gt; for jobs on bounded data, the Flink community is already working on refactoring its current scheduler and adding support for &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10429&quot;&gt;pluggable scheduling and fail-over strategies&lt;/a&gt;.&lt;br /&gt;
Once this effort is finished, we can add Blink’s scheduling and recovery strategies as a new scheduling strategy that is used by the new query processor. Eventually, we plan to use the new scheduling strategy also for bounded DataStream programs.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The extended catalog support, DDL support, as well as support for Hive’s catalog and integrations are currently going through separate design discussions. We plan to leverage existing code here whenever it makes sense.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;We believe that the data processing stack of the future is based on stream processing: The elegance of stream processing with its ability to model offline processing (batch), real-time data processing, and event-driven applications in the same way, while offering high performance and consistency is simply too compelling.&lt;/p&gt;

&lt;p&gt;Exploiting certain properties of bounded data is important for a stream processor to achieve the same performance as dedicated batch processors. While Flink has always supported batch processing, the project is taking the next step in building a unified runtime and towards &lt;strong&gt;becoming a stream processor that is competitive with batch processing systems even on their home turf: OLAP SQL.&lt;/strong&gt; The contribution of Alibaba’s Blink code helps the Flink community pick up speed on this development.&lt;/p&gt;
</description>
<pubDate>Wed, 13 Feb 2019 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html</link>
<guid isPermaLink="true">/news/2019/02/13/unified-batch-streaming-blink.html</guid>
</item>

<item>
<title>Apache Flink 1.5.6 Released</title>
<description>&lt;p&gt;The Apache Flink community released the sixth and last bugfix version of the Apache Flink 1.5 series.&lt;/p&gt;

&lt;p&gt;This release includes more than 47 fixes and minor improvements for Flink 1.5.5. All fixes are detailed in the list below.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.5.6.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.6&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.6&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.6&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10252&quot;&gt;FLINK-10252&lt;/a&gt;] -         Handle oversized metric messages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10863&quot;&gt;FLINK-10863&lt;/a&gt;] -         Assign uids to all operators
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8336&quot;&gt;FLINK-8336&lt;/a&gt;] -         YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9646&quot;&gt;FLINK-9646&lt;/a&gt;] -         ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10166&quot;&gt;FLINK-10166&lt;/a&gt;] -         Dependency problems when executing SQL query in sql-client
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10309&quot;&gt;FLINK-10309&lt;/a&gt;] -         Cancel with savepoint fails with java.net.ConnectException when using the per job-mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10419&quot;&gt;FLINK-10419&lt;/a&gt;] -         ClassNotFoundException while deserializing user exceptions from checkpointing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10455&quot;&gt;FLINK-10455&lt;/a&gt;] -         Potential Kafka producer leak in case of failures
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10482&quot;&gt;FLINK-10482&lt;/a&gt;] -         java.lang.IllegalArgumentException: Negative number of in progress checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10491&quot;&gt;FLINK-10491&lt;/a&gt;] -         Deadlock during spilling data in SpillableSubpartition 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10566&quot;&gt;FLINK-10566&lt;/a&gt;] -         Flink Planning is exponential in the number of stages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10581&quot;&gt;FLINK-10581&lt;/a&gt;] -         YarnConfigurationITCase.testFlinkContainerMemory test instability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10642&quot;&gt;FLINK-10642&lt;/a&gt;] -         CodeGen split fields errors when maxGeneratedCodeLength equals 1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10655&quot;&gt;FLINK-10655&lt;/a&gt;] -         RemoteRpcInvocation not overwriting ObjectInputStream&amp;#39;s ClassNotFoundException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10669&quot;&gt;FLINK-10669&lt;/a&gt;] -         Exceptions &amp;amp; errors are not properly checked in logs in e2e tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10670&quot;&gt;FLINK-10670&lt;/a&gt;] -         Fix Correlate codegen error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10674&quot;&gt;FLINK-10674&lt;/a&gt;] -         Fix handling of retractions after clean up
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10690&quot;&gt;FLINK-10690&lt;/a&gt;] -         Tests leak resources via Files.list
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10693&quot;&gt;FLINK-10693&lt;/a&gt;] -         Fix Scala EitherSerializer duplication
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10715&quot;&gt;FLINK-10715&lt;/a&gt;] -         E2e tests fail with ConcurrentModificationException in MetricRegistryImpl
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10750&quot;&gt;FLINK-10750&lt;/a&gt;] -         SocketClientSinkTest.testRetry fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10752&quot;&gt;FLINK-10752&lt;/a&gt;] -         Result of AbstractYarnClusterDescriptor#validateClusterResources is ignored
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10753&quot;&gt;FLINK-10753&lt;/a&gt;] -         Propagate and log snapshotting exceptions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10770&quot;&gt;FLINK-10770&lt;/a&gt;] -         Some generated functions are not opened properly.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10773&quot;&gt;FLINK-10773&lt;/a&gt;] -         Resume externalized checkpoint end-to-end test fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10821&quot;&gt;FLINK-10821&lt;/a&gt;] -         Resuming Externalized Checkpoint E2E test does not resume from Externalized Checkpoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10839&quot;&gt;FLINK-10839&lt;/a&gt;] -         Fix implementation of PojoSerializer.duplicate() w.r.t. subclass serializer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10856&quot;&gt;FLINK-10856&lt;/a&gt;] -         Harden resume from externalized checkpoint E2E test
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10857&quot;&gt;FLINK-10857&lt;/a&gt;] -         Conflict between JMX and Prometheus Metrics reporter
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10880&quot;&gt;FLINK-10880&lt;/a&gt;] -         Failover strategies should not be applied to Batch Execution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10913&quot;&gt;FLINK-10913&lt;/a&gt;] -         ExecutionGraphRestartTest.testRestartAutomatically unstable on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10925&quot;&gt;FLINK-10925&lt;/a&gt;] -         NPE in PythonPlanStreamer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10990&quot;&gt;FLINK-10990&lt;/a&gt;] -         Enforce minimum timespan in MeterView
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10998&quot;&gt;FLINK-10998&lt;/a&gt;] -         flink-metrics-ganglia has LGPL dependency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11011&quot;&gt;FLINK-11011&lt;/a&gt;] -         Elasticsearch 6 sink end-to-end test unstable
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4173&quot;&gt;FLINK-4173&lt;/a&gt;] -         Replace maven-assembly-plugin by maven-shade-plugin in flink-metrics
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9869&quot;&gt;FLINK-9869&lt;/a&gt;] -         Send PartitionInfo in batch to improve performance
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10613&quot;&gt;FLINK-10613&lt;/a&gt;] -         Remove logger casts in HBaseConnectorITCase
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10614&quot;&gt;FLINK-10614&lt;/a&gt;] -         Update test_batch_allround.sh e2e to new testing infrastructure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10637&quot;&gt;FLINK-10637&lt;/a&gt;] -         Start MiniCluster with random REST port
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10678&quot;&gt;FLINK-10678&lt;/a&gt;] -         Add a switch to run_test to configure if logs should be checked for errors/exceptions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10906&quot;&gt;FLINK-10906&lt;/a&gt;] -         docker-entrypoint.sh logs credentials during startup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10916&quot;&gt;FLINK-10916&lt;/a&gt;] -         Include duplicated user-specified uid into error message
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11005&quot;&gt;FLINK-11005&lt;/a&gt;] -         Define flink-sql-client uber-jar dependencies via artifactSet
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Test
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10606&quot;&gt;FLINK-10606&lt;/a&gt;] -         Construct NetworkEnvironment simple for tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10607&quot;&gt;FLINK-10607&lt;/a&gt;] -         Unify to remove duplicated NoOpResultPartitionConsumableNotifier
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10827&quot;&gt;FLINK-10827&lt;/a&gt;] -         Add test for duplicate() to SerializerTestBase
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 26 Dec 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/12/26/release-1.5.6.html</link>
<guid isPermaLink="true">/news/2018/12/26/release-1.5.6.html</guid>
</item>

<item>
<title>Apache Flink 1.6.3 Released</title>
<description>&lt;p&gt;The Apache Flink community released the third bugfix version of the Apache Flink 1.6 series.&lt;/p&gt;

&lt;p&gt;This release includes more than 80 fixes and minor improvements for Flink 1.6.2. All fixes are detailed in the list below.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.6.3.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10097&quot;&gt;FLINK-10097&lt;/a&gt;] -         More tests to increase StreamingFileSink test coverage
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10252&quot;&gt;FLINK-10252&lt;/a&gt;] -         Handle oversized metric messages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10367&quot;&gt;FLINK-10367&lt;/a&gt;] -         Avoid recursion stack overflow during releasing SingleInputGate
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10863&quot;&gt;FLINK-10863&lt;/a&gt;] -         Assign uids to all operators in general purpose testing job
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8336&quot;&gt;FLINK-8336&lt;/a&gt;] -         YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9635&quot;&gt;FLINK-9635&lt;/a&gt;] -         Local recovery scheduling can cause spread out of tasks
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9646&quot;&gt;FLINK-9646&lt;/a&gt;] -         ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9878&quot;&gt;FLINK-9878&lt;/a&gt;] -         IO worker threads BLOCKED on SSL Session Cache while CMS full gc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10149&quot;&gt;FLINK-10149&lt;/a&gt;] -         Flink Mesos allocates extra port when not configured to do so.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10166&quot;&gt;FLINK-10166&lt;/a&gt;] -         Dependency problems when executing SQL query in sql-client
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10309&quot;&gt;FLINK-10309&lt;/a&gt;] -         Cancel with savepoint fails with java.net.ConnectException when using the per job-mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10357&quot;&gt;FLINK-10357&lt;/a&gt;] -         Streaming File Sink end-to-end test failed with mismatch
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10359&quot;&gt;FLINK-10359&lt;/a&gt;] -         Scala example in DataSet docs is broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10364&quot;&gt;FLINK-10364&lt;/a&gt;] -         Test instability in NonHAQueryableStateFsBackendITCase#testMapState
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10419&quot;&gt;FLINK-10419&lt;/a&gt;] -         ClassNotFoundException while deserializing user exceptions from checkpointing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10425&quot;&gt;FLINK-10425&lt;/a&gt;] -         taskmanager.host is not respected
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10455&quot;&gt;FLINK-10455&lt;/a&gt;] -         Potential Kafka producer leak in case of failures
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10463&quot;&gt;FLINK-10463&lt;/a&gt;] -         Null literal cannot be properly parsed in Java Table API function call
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10481&quot;&gt;FLINK-10481&lt;/a&gt;] -         Wordcount end-to-end test in docker env unstable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10482&quot;&gt;FLINK-10482&lt;/a&gt;] -         java.lang.IllegalArgumentException: Negative number of in progress checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10491&quot;&gt;FLINK-10491&lt;/a&gt;] -         Deadlock during spilling data in SpillableSubpartition 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10566&quot;&gt;FLINK-10566&lt;/a&gt;] -         Flink Planning is exponential in the number of stages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10567&quot;&gt;FLINK-10567&lt;/a&gt;] -         Lost serialize fields when ttl state store with the mutable serializer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10570&quot;&gt;FLINK-10570&lt;/a&gt;] -         State grows unbounded when &amp;quot;within&amp;quot; constraint not applied
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10581&quot;&gt;FLINK-10581&lt;/a&gt;] -         YarnConfigurationITCase.testFlinkContainerMemory test instability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10642&quot;&gt;FLINK-10642&lt;/a&gt;] -         CodeGen split fields errors when maxGeneratedCodeLength equals 1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10655&quot;&gt;FLINK-10655&lt;/a&gt;] -         RemoteRpcInvocation not overwriting ObjectInputStream&amp;#39;s ClassNotFoundException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10663&quot;&gt;FLINK-10663&lt;/a&gt;] -         Closing StreamingFileSink can cause NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10669&quot;&gt;FLINK-10669&lt;/a&gt;] -         Exceptions &amp;amp; errors are not properly checked in logs in e2e tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10670&quot;&gt;FLINK-10670&lt;/a&gt;] -         Fix Correlate codegen error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10674&quot;&gt;FLINK-10674&lt;/a&gt;] -         Fix handling of retractions after clean up
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10681&quot;&gt;FLINK-10681&lt;/a&gt;] -         elasticsearch6.ElasticsearchSinkITCase fails if wrong JNA library installed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10690&quot;&gt;FLINK-10690&lt;/a&gt;] -         Tests leak resources via Files.list
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10693&quot;&gt;FLINK-10693&lt;/a&gt;] -         Fix Scala EitherSerializer duplication
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10715&quot;&gt;FLINK-10715&lt;/a&gt;] -         E2e tests fail with ConcurrentModificationException in MetricRegistryImpl
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10750&quot;&gt;FLINK-10750&lt;/a&gt;] -         SocketClientSinkTest.testRetry fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10752&quot;&gt;FLINK-10752&lt;/a&gt;] -         Result of AbstractYarnClusterDescriptor#validateClusterResources is ignored
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10753&quot;&gt;FLINK-10753&lt;/a&gt;] -         Propagate and log snapshotting exceptions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10763&quot;&gt;FLINK-10763&lt;/a&gt;] -         Interval join produces wrong result type in Scala API
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10770&quot;&gt;FLINK-10770&lt;/a&gt;] -         Some generated functions are not opened properly.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10773&quot;&gt;FLINK-10773&lt;/a&gt;] -         Resume externalized checkpoint end-to-end test fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10809&quot;&gt;FLINK-10809&lt;/a&gt;] -         Using DataStreamUtils.reinterpretAsKeyedStream produces corrupted keyed state after restore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10816&quot;&gt;FLINK-10816&lt;/a&gt;] -         Fix LockableTypeSerializer.duplicate() 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10821&quot;&gt;FLINK-10821&lt;/a&gt;] -         Resuming Externalized Checkpoint E2E test does not resume from Externalized Checkpoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10839&quot;&gt;FLINK-10839&lt;/a&gt;] -         Fix implementation of PojoSerializer.duplicate() w.r.t. subclass serializer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10842&quot;&gt;FLINK-10842&lt;/a&gt;] -         Waiting loops are broken in e2e/common.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10856&quot;&gt;FLINK-10856&lt;/a&gt;] -         Harden resume from externalized checkpoint E2E test
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10857&quot;&gt;FLINK-10857&lt;/a&gt;] -         Conflict between JMX and Prometheus Metrics reporter
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10880&quot;&gt;FLINK-10880&lt;/a&gt;] -         Failover strategies should not be applied to Batch Execution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10913&quot;&gt;FLINK-10913&lt;/a&gt;] -         ExecutionGraphRestartTest.testRestartAutomatically unstable on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10925&quot;&gt;FLINK-10925&lt;/a&gt;] -         NPE in PythonPlanStreamer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10946&quot;&gt;FLINK-10946&lt;/a&gt;] -         Resuming Externalized Checkpoint (rocks, incremental, scale up) end-to-end test failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10990&quot;&gt;FLINK-10990&lt;/a&gt;] -         Enforce minimum timespan in MeterView
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10992&quot;&gt;FLINK-10992&lt;/a&gt;] -         Jepsen: Do not use /tmp as HDFS Data Directory
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10997&quot;&gt;FLINK-10997&lt;/a&gt;] -         Avro-confluent-registry does not bundle any dependency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10998&quot;&gt;FLINK-10998&lt;/a&gt;] -         flink-metrics-ganglia has LGPL dependency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11011&quot;&gt;FLINK-11011&lt;/a&gt;] -         Elasticsearch 6 sink end-to-end test unstable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11017&quot;&gt;FLINK-11017&lt;/a&gt;] -         Time interval for window aggregations in SQL is wrongly translated if specified with YEAR_MONTH resolution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11029&quot;&gt;FLINK-11029&lt;/a&gt;] -         Incorrect parameter in Working with state doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11041&quot;&gt;FLINK-11041&lt;/a&gt;] -         ReinterpretDataStreamAsKeyedStreamITCase.testReinterpretAsKeyedStream failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11045&quot;&gt;FLINK-11045&lt;/a&gt;] -         UserCodeClassLoader has not been set correctly for RuntimeUDFContext in CollectionExecutor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11083&quot;&gt;FLINK-11083&lt;/a&gt;] -         CRowSerializerConfigSnapshot is not instantiable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11087&quot;&gt;FLINK-11087&lt;/a&gt;] -         Broadcast state migration Incompatibility from 1.5.3 to 1.7.0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11123&quot;&gt;FLINK-11123&lt;/a&gt;] -         Missing import in ML quickstart docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11136&quot;&gt;FLINK-11136&lt;/a&gt;] -         Fix the merge logic for DISTINCT aggregates
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4173&quot;&gt;FLINK-4173&lt;/a&gt;] -         Replace maven-assembly-plugin by maven-shade-plugin in flink-metrics
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10353&quot;&gt;FLINK-10353&lt;/a&gt;] -         Restoring a KafkaProducer with Semantic.EXACTLY_ONCE from a savepoint written with Semantic.AT_LEAST_ONCE fails with NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10608&quot;&gt;FLINK-10608&lt;/a&gt;] -         Add avro files generated by datastream-allround-test to RAT exclusions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10613&quot;&gt;FLINK-10613&lt;/a&gt;] -         Remove logger casts in HBaseConnectorITCase
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10614&quot;&gt;FLINK-10614&lt;/a&gt;] -         Update test_batch_allround.sh e2e to new testing infrastructure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10637&quot;&gt;FLINK-10637&lt;/a&gt;] -         Start MiniCluster with random REST port
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10678&quot;&gt;FLINK-10678&lt;/a&gt;] -         Add a switch to run_test to configure if logs should be checked for errors/exceptions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10692&quot;&gt;FLINK-10692&lt;/a&gt;] -         Harden Confluent schema E2E test
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10883&quot;&gt;FLINK-10883&lt;/a&gt;] -         Submitting a job without enough slots times out due to an unspecified timeout
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10906&quot;&gt;FLINK-10906&lt;/a&gt;] -         docker-entrypoint.sh logs credentials during startup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10916&quot;&gt;FLINK-10916&lt;/a&gt;] -         Include duplicated user-specified uid into error message
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10951&quot;&gt;FLINK-10951&lt;/a&gt;] -         Disable enforcing of YARN container virtual memory limits in tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11005&quot;&gt;FLINK-11005&lt;/a&gt;] -         Define flink-sql-client uber-jar dependencies via artifactSet
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Test
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10606&quot;&gt;FLINK-10606&lt;/a&gt;] -         Construct NetworkEnvironment simple for tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10607&quot;&gt;FLINK-10607&lt;/a&gt;] -         Unify to remove duplicated NoOpResultPartitionConsumableNotifier
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10827&quot;&gt;FLINK-10827&lt;/a&gt;] -         Add test for duplicate() to SerializerTestBase
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Wish
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10793&quot;&gt;FLINK-10793&lt;/a&gt;] -         Change visibility of TtlValue and TtlSerializer to public for external tools
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Sat, 22 Dec 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/12/22/release-1.6.3.html</link>
<guid isPermaLink="true">/news/2018/12/22/release-1.6.3.html</guid>
</item>

<item>
<title>Apache Flink 1.7.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.7 series.&lt;/p&gt;

&lt;p&gt;This release includes 27 fixes and minor improvements for Flink 1.7.0. All fixes are detailed in the list below.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.7.1.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;Sub-task&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10252&quot;&gt;FLINK-10252&lt;/a&gt;] -         Handle oversized metric messages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10367&quot;&gt;FLINK-10367&lt;/a&gt;] -         Avoid recursion stack overflow during releasing SingleInputGate
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10522&quot;&gt;FLINK-10522&lt;/a&gt;] -         Check if RecoverableWriter supportsResume and act accordingly.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10963&quot;&gt;FLINK-10963&lt;/a&gt;] -         Cleanup small objects uploaded to S3 as independent objects
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Bug&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8336&quot;&gt;FLINK-8336&lt;/a&gt;] -         YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9646&quot;&gt;FLINK-9646&lt;/a&gt;] -         ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10149&quot;&gt;FLINK-10149&lt;/a&gt;] -         Flink Mesos allocates extra port when not configured to do so.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10359&quot;&gt;FLINK-10359&lt;/a&gt;] -         Scala example in DataSet docs is broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10482&quot;&gt;FLINK-10482&lt;/a&gt;] -         java.lang.IllegalArgumentException: Negative number of in progress checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10566&quot;&gt;FLINK-10566&lt;/a&gt;] -         Flink Planning is exponential in the number of stages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10997&quot;&gt;FLINK-10997&lt;/a&gt;] -         Avro-confluent-registry does not bundle any dependency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11011&quot;&gt;FLINK-11011&lt;/a&gt;] -         Elasticsearch 6 sink end-to-end test unstable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11013&quot;&gt;FLINK-11013&lt;/a&gt;] -         Fix distinct aggregates for group window in Table API
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11017&quot;&gt;FLINK-11017&lt;/a&gt;] -         Time interval for window aggregations in SQL is wrongly translated if specified with YEAR_MONTH resolution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11029&quot;&gt;FLINK-11029&lt;/a&gt;] -         Incorrect parameter in Working with state doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11032&quot;&gt;FLINK-11032&lt;/a&gt;] -         Elasticsearch (v6.3.1) sink end-to-end test unstable on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11033&quot;&gt;FLINK-11033&lt;/a&gt;] -         Elasticsearch (v6.3.1) sink end-to-end test unstable on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11041&quot;&gt;FLINK-11041&lt;/a&gt;] -         ReinterpretDataStreamAsKeyedStreamITCase.testReinterpretAsKeyedStream failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11044&quot;&gt;FLINK-11044&lt;/a&gt;] -         RegisterTableSink docs incorrect
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11045&quot;&gt;FLINK-11045&lt;/a&gt;] -         UserCodeClassLoader has not been set correctly for RuntimeUDFContext in CollectionExecutor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11047&quot;&gt;FLINK-11047&lt;/a&gt;] -         CoGroupGroupSortTranslationTest does not compile with scala 2.12
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11085&quot;&gt;FLINK-11085&lt;/a&gt;] -         NoClassDefFoundError in presto-s3 filesystem
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11087&quot;&gt;FLINK-11087&lt;/a&gt;] -         Broadcast state migration Incompatibility from 1.5.3 to 1.7.0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11094&quot;&gt;FLINK-11094&lt;/a&gt;] -         Restored state in RocksDBStateBackend that has not been accessed in restored execution causes NPE on snapshot
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11123&quot;&gt;FLINK-11123&lt;/a&gt;] -         Missing import in ML quickstart docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11136&quot;&gt;FLINK-11136&lt;/a&gt;] -         Fix the merge logic for DISTINCT aggregates
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Improvement&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11080&quot;&gt;FLINK-11080&lt;/a&gt;] -         Define flink-connector-elasticsearch6 uber-jar dependencies via artifactSet
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Fri, 21 Dec 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/12/21/release-1.7.1.html</link>
<guid isPermaLink="true">/news/2018/12/21/release-1.7.1.html</guid>
</item>

<item>
<title>Apache Flink 1.7.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is pleased to announce Apache Flink 1.7.0. 
The latest release includes more than 420 resolved issues and some exciting additions to Flink that we describe in the following sections of this post. 
Please check the &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12343585&quot;&gt;complete changelog&lt;/a&gt; for more details.&lt;/p&gt;

&lt;p&gt;Flink 1.7.0 is API-compatible with previous 1.x.y releases for APIs annotated with the &lt;code&gt;@Public&lt;/code&gt; annotation.
The release is available now and we encourage everyone to &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;download the release&lt;/a&gt; and check out the updated &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/&quot;&gt;documentation&lt;/a&gt;.
Feedback through the Flink &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; or &lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;JIRA&lt;/a&gt; is, as always, very much appreciated!&lt;/p&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt; on the Flink project site.&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#flink-170---extending-the-reach-of-stream-processing&quot; id=&quot;markdown-toc-flink-170---extending-the-reach-of-stream-processing&quot;&gt;Flink 1.7.0 - Extending the reach of Stream Processing&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#release-notes&quot; id=&quot;markdown-toc-release-notes&quot;&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h2 id=&quot;flink-170---extending-the-reach-of-stream-processing&quot;&gt;Flink 1.7.0 - Extending the reach of Stream Processing&lt;/h2&gt;

&lt;p&gt;With Flink 1.7.0, we come closer to our goals of enabling fast data processing and letting the Flink community build data-intensive applications in a seamless way.
Our latest release includes some exciting new features and improvements such as support for Scala 2.12, an exactly-once S3 file sink, the integration of complex event processing with streaming SQL and more features that we explain below.&lt;/p&gt;

&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Scala 2.12 Support in Apache Flink&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7811&quot;&gt;FLINK-7811&lt;/a&gt;):
Apache Flink 1.7.0 is the first release that comes with full support for Scala 2.12.
This allows users to write Flink applications with a newer Scala version and to leverage the Scala 2.12 ecosystem.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;State Evolution&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9376&quot;&gt;FLINK-9376&lt;/a&gt;):
In many cases, a long-running Flink application needs to evolve during its lifetime because of changing requirements. 
Changing the user state without losing the current application progress in the form of its state is a crucial requirement for application evolution.&lt;/p&gt;

    &lt;p&gt;With Flink 1.7.0, the community added state evolution, which allows you to flexibly adapt the schema of a long-running application’s user state while maintaining compatibility with previous savepoints.
With state evolution, it is possible to add columns to or remove columns from your state schema in order to change which business features your application captures after it has been deployed.&lt;/p&gt;

    &lt;p&gt;State schema evolution now works out-of-the-box when using Avro’s generated classes as user state, meaning that the schema of the state can be evolved according to Avro’s specifications. 
While Avro types are the only built-in types that support schema evolution as of Flink 1.7, the community continues working to extend support to other types in future Flink releases.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Exactly-once S3 StreamingFileSink&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9752&quot;&gt;FLINK-9752&lt;/a&gt;):
The &lt;code&gt;StreamingFileSink&lt;/code&gt;, which was introduced in Flink 1.6.0, now also supports writing to S3 filesystems with exactly-once processing guarantees.
This feature allows users to build exactly-once end-to-end pipelines writing to S3.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;&lt;code&gt;MATCH_RECOGNIZE&lt;/code&gt; Support in Streaming SQL&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6935&quot;&gt;FLINK-6935&lt;/a&gt;):
This is a major addition to Apache Flink 1.7.0 that brings initial support for the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/streaming/match_recognize.html&quot;&gt;&lt;code&gt;MATCH_RECOGNIZE&lt;/code&gt;&lt;/a&gt; standard to Flink SQL.
This feature combines complex event processing (CEP) and SQL for easy pattern matching on data streams, thus enabling a whole set of new use cases.&lt;/p&gt;

    &lt;p&gt;This feature is currently in beta, so we welcome any feedback and suggestions from the community for future iterations and improvements.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Temporal Tables and Temporal Joins in Streaming SQL&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9712&quot;&gt;FLINK-9712&lt;/a&gt;):
A Temporal Table is a new concept in Apache Flink that gives a (parameterized) view on a table’s changing history and returns the content of the table at a specific point in time.&lt;/p&gt;

    &lt;p&gt;As an example, we can use a table with historical currency exchange rates. 
Such a table is constantly growing/evolving as time progresses and newly updated exchange rates are added. 
A Temporal Table is a view that can return the state of those exchange rates as of any given point in time.
With such a table it is possible to convert a stream of orders in different currencies to a common currency using the correct exchange rate.&lt;/p&gt;

    &lt;p&gt;Temporal Joins allow for memory- and computation-efficient joins of streaming data with an ever-changing/updating table, using either processing time or event time, while being ANSI SQL compliant.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Miscellaneous Features for Streaming SQL&lt;/strong&gt;:
Besides the major features mentioned above, Flink’s Table &amp;amp; SQL API has been extended to serve more use cases.&lt;/p&gt;

    &lt;p&gt;The following built-in functions were added to the APIs: &lt;code&gt;TO_BASE64&lt;/code&gt;, &lt;code&gt;LOG2&lt;/code&gt;, &lt;code&gt;LTRIM&lt;/code&gt;, &lt;code&gt;REPEAT&lt;/code&gt;, &lt;code&gt;REPLACE&lt;/code&gt;, &lt;code&gt;COSH&lt;/code&gt;, &lt;code&gt;SINH&lt;/code&gt;, &lt;code&gt;TANH&lt;/code&gt;&lt;/p&gt;

    &lt;p&gt;The SQL Client now supports the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/sqlClient.html#sql-views&quot;&gt;definition of views&lt;/a&gt; both in an environment file and within a CLI session. 
Furthermore, basic SQL statement auto-completion has been added to the CLI.&lt;/p&gt;

    &lt;p&gt;The community added an &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/connect.html#elasticsearch-connector&quot;&gt;Elasticsearch 6 table sink&lt;/a&gt; which allows storing the updating results of a dynamic table.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Versioned REST API&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7551&quot;&gt;FLINK-7551&lt;/a&gt;):
Beginning with Flink 1.7.0, the REST API is versioned. 
This guarantees the stability of Flink’s REST API so that third-party applications can be developed against a stable API in Flink. 
Thus, future Flink upgrades will not require changes to existing third-party integrations.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Kafka 2.0 Connector&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10598&quot;&gt;FLINK-10598&lt;/a&gt;):
Apache Flink 1.7.0 continues to add more connectors, making it even easier to interact with more external systems. 
In this release, the community added the Kafka 2.0 connector, which allows reading from and writing to Kafka 2.0 with exactly-once guarantees.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Local Recovery&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9635&quot;&gt;FLINK-9635&lt;/a&gt;):
Apache Flink 1.7.0 completes the local recovery feature by extending Flink’s scheduling to take previous deployment locations into account in case of recovery.&lt;/p&gt;

    &lt;p&gt;If local recovery is enabled, Flink will keep a local copy of the latest checkpoint on the machine where the task is running.
By scheduling tasks to their previous locations, Flink will, thus, minimize the network traffic for restoring state by reading checkpoint state from local disk. 
This feature considerably improves recovery speed.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Removal of Flink’s Legacy Mode&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10392&quot;&gt;FLINK-10392&lt;/a&gt;):
Apache Flink 1.7.0 marks the release where the Flip-6 effort has been fully completed and reached feature parity with the legacy mode. 
Consequently, this release removes support for the legacy mode.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
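
&lt;p&gt;As a minimal sketch of what the Scala 2.12 support means for a Maven build: assuming the Scala 2.12 builds follow the same artifact naming scheme as the Scala 2.11 ones (a &lt;code&gt;_2.12&lt;/code&gt; suffix in place of &lt;code&gt;_2.11&lt;/code&gt;), the Scala-dependent Flink dependencies would likely be declared as shown below. The two artifacts are illustrative only; please verify the exact coordinates on the Downloads page before relying on them.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;c&quot;&gt;&amp;lt;!-- Assumption: Scala 2.12 artifacts use the _2.12 suffix, mirroring the _2.11 naming --&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.12&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.0&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.12&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.0&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;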

&lt;h2 id=&quot;release-notes&quot;&gt;Release Notes&lt;/h2&gt;

&lt;p&gt;Please review the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/release-notes/flink-1.7.html&quot;&gt;release notes&lt;/a&gt; if you plan to upgrade your Flink setup to Flink 1.7.&lt;/p&gt;

&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;

&lt;p&gt;We would like to acknowledge all community members for contributing to this release. 
Special credits go to the following members for contributing to the 1.7.0 release (according to git):&lt;/p&gt;

&lt;p&gt;Aitozi, Alex Arkhipov, Alexander Koltsov, Alexey Trenikhin, Alice, Alice Yan, Aljoscha Krettek, Andrei Poluliakh, Andrey Zagrebin, Ashwin Sinha, Barisa Obradovic, Ben La Monica, Benoit Meriaux, Bowen Li, Chesnay Schepler, Christophe Jolif, Congxian Qiu, Craig Foster, David Anderson, Dawid Wysakowicz, Dian Fu, Diego Carvallo, Dimitris Palyvos, Eugen Yushin, Fabian Hueske, Florian Schmidt, Gary Yao, Guibo Pan, Hequn Cheng, Hiroaki Yoshida, Igal Shilman, JIN SUN, Jamie Grier, Jayant Ameta, Jeff Zhang, Jeffrey Chung, Jicaar, Jin Sun, Joe Malt, Johannes Dillmann, Jun Zhang, Kostas Kloudas, Krzysztof Białek, Lakshmi Gururaja Rao, Liu Biao, Mahesh Senniappan, Manuel Hoffmann, Mark Cho, Max Feng, Mike Pedersen, Mododo, Nico Kruber, Oleksandr Nitavskyi, Osman Şamil AKÇELİK, Patrick Lucas, Paul Lam, Piotr Nowojski, Rick Hofstede, Rong R, Rong Rong, Sayat Satybaldiyev, Sebastian Klemke, Seth Wiesman, Shimin Yang, Shuyi Chen, Stefan Richter, Stephan Ewen, Stephen Jason, Thomas Weise, Till Rohrmann, Timo Walther, Tzu-Li “tison” Chen, Tzu-Li (Gordon) Tai, Tzu-Li Chen, Wosin, Xingcan Cui, Xpray, Xue Yu, Yangze Guo, Ying Xu, Yun Tang, Zhijiang, blues Zheng, hequn8128, ifndef-SleePy, jerryjzhang, jrthe42, jyc.jia, kkolman, lihongli, linjun, linzhaoming, liurenjie1024, liuxianjiao, lrl, lsy, lzqdename, maqingxiang, maqingxiang-it, minwenjun, shuai-xu, sihuazhou, snuyanzin, wind, xuewei.linxuewei, xueyu, xuqianjin, yanghua, yangshimin, zhijiang, 谢磊, 陈梓立&lt;/p&gt;
</description>
<pubDate>Fri, 30 Nov 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/11/30/release-1.7.0.html</link>
<guid isPermaLink="true">/news/2018/11/30/release-1.7.0.html</guid>
</item>

<item>
<title>Apache Flink 1.6.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released the second bugfix version of the Apache Flink 1.6 series.&lt;/p&gt;

&lt;p&gt;This release includes more than 30 fixes and minor improvements for Flink 1.6.1. The list below details all fixes.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.6.2.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;Sub-task&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10242&quot;&gt;FLINK-10242&lt;/a&gt;] -         Latency marker interval should be configurable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10243&quot;&gt;FLINK-10243&lt;/a&gt;] -         Add option to reduce latency metrics granularity
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10331&quot;&gt;FLINK-10331&lt;/a&gt;] -         Reduce number of flush requests to the network stack
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10332&quot;&gt;FLINK-10332&lt;/a&gt;] -         Move data available notification in PipelinedSubpartition out of the synchronized block
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Bug&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5542&quot;&gt;FLINK-5542&lt;/a&gt;] -         YARN client incorrectly uses local YARN config to check vcore capacity
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9567&quot;&gt;FLINK-9567&lt;/a&gt;] -         Flink does not release resource in Yarn Cluster mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9788&quot;&gt;FLINK-9788&lt;/a&gt;] -         ExecutionGraph Inconsistency prevents Job from recovering
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9884&quot;&gt;FLINK-9884&lt;/a&gt;] -         Slot request may not be removed when it has already been assigned in slot manager
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9891&quot;&gt;FLINK-9891&lt;/a&gt;] -         Flink cluster is not shutdown in YARN mode when Flink client is stopped
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9932&quot;&gt;FLINK-9932&lt;/a&gt;] -         Timed-out TaskExecutor slot-offers to JobMaster leak the slot
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10135&quot;&gt;FLINK-10135&lt;/a&gt;] -         Certain cluster-level metrics are no longer exposed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10157&quot;&gt;FLINK-10157&lt;/a&gt;] -         Allow `null` user values in map state with TTL
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10222&quot;&gt;FLINK-10222&lt;/a&gt;] -         Table scalar function expression parses error when function name equals the exists keyword suffix
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10259&quot;&gt;FLINK-10259&lt;/a&gt;] -         Key validation for GroupWindowAggregate is broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10263&quot;&gt;FLINK-10263&lt;/a&gt;] -         User-defined function with LITERAL parameters yields CompileException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10316&quot;&gt;FLINK-10316&lt;/a&gt;] -         Add check to KinesisProducer that aws.region is set
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10354&quot;&gt;FLINK-10354&lt;/a&gt;] -         Savepoints should be counted as retained checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10363&quot;&gt;FLINK-10363&lt;/a&gt;] -         S3 FileSystem factory prints secrets into logs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10379&quot;&gt;FLINK-10379&lt;/a&gt;] -         Can not use Table Functions in Java Table API
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10383&quot;&gt;FLINK-10383&lt;/a&gt;] -         Hadoop configurations on the classpath seep into the S3 file system configs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10390&quot;&gt;FLINK-10390&lt;/a&gt;] -         DataDog MetricReporter leaks connections
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10400&quot;&gt;FLINK-10400&lt;/a&gt;] -         Return failed JobResult if job terminates in state FAILED or CANCELED
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10415&quot;&gt;FLINK-10415&lt;/a&gt;] -         RestClient does not react to lost connection
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10444&quot;&gt;FLINK-10444&lt;/a&gt;] -         Make S3 entropy injection work with FileSystem safety net
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10451&quot;&gt;FLINK-10451&lt;/a&gt;] -         TableFunctionCollector should handle the life cycle of ScalarFunction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10465&quot;&gt;FLINK-10465&lt;/a&gt;] -         Jepsen: runit supervised sshd is stopped on tear down
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10469&quot;&gt;FLINK-10469&lt;/a&gt;] -         FileChannel may not write the whole buffer in a single call to FileChannel.write(Buffer buffer)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10487&quot;&gt;FLINK-10487&lt;/a&gt;] -         fix invalid Flink SQL example
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10516&quot;&gt;FLINK-10516&lt;/a&gt;] -         YarnApplicationMasterRunner does not initialize FileSystem with correct Flink Configuration during setup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10524&quot;&gt;FLINK-10524&lt;/a&gt;] -         MemoryManagerConcurrentModReleaseTest.testConcurrentModificationWhileReleasing failed on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10532&quot;&gt;FLINK-10532&lt;/a&gt;] -         Broken links in documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10544&quot;&gt;FLINK-10544&lt;/a&gt;] -         Remove custom settings.xml for snapshot deployments
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Improvement&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9061&quot;&gt;FLINK-9061&lt;/a&gt;] -         Add entropy to s3 path for better scalability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10075&quot;&gt;FLINK-10075&lt;/a&gt;] -         HTTP connections to a secured REST endpoint flood the log
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10260&quot;&gt;FLINK-10260&lt;/a&gt;] -         Confusing log messages during TaskManager registration
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10282&quot;&gt;FLINK-10282&lt;/a&gt;] -         Provide separate thread-pool for REST endpoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10291&quot;&gt;FLINK-10291&lt;/a&gt;] -         Generate JobGraph with fixed/configurable JobID in StandaloneJobClusterEntrypoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10311&quot;&gt;FLINK-10311&lt;/a&gt;] -         HA end-to-end/Jepsen tests for standby Dispatchers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10312&quot;&gt;FLINK-10312&lt;/a&gt;] -         Wrong / missing exception when submitting job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10371&quot;&gt;FLINK-10371&lt;/a&gt;] -         Allow to enable SSL mutual authentication on REST endpoints by configuration
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10375&quot;&gt;FLINK-10375&lt;/a&gt;] -         ExceptionInChainedStubException hides wrapped exception in cause
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10582&quot;&gt;FLINK-10582&lt;/a&gt;] -         Make REST executor thread priority configurable
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Mon, 29 Oct 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/10/29/release-1.6.2.html</link>
<guid isPermaLink="true">/news/2018/10/29/release-1.6.2.html</guid>
</item>

<item>
<title>Apache Flink 1.5.5 Released</title>
<description>&lt;p&gt;The Apache Flink community released the fifth bugfix version of the Apache Flink 1.5 series.&lt;/p&gt;

&lt;p&gt;This release includes more than 20 fixes and minor improvements for Flink 1.5.4. The list below details all fixes.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.5.5.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.5&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.5&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.5&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10242&quot;&gt;FLINK-10242&lt;/a&gt;] -         Latency marker interval should be configurable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10243&quot;&gt;FLINK-10243&lt;/a&gt;] -         Add option to reduce latency metrics granularity
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10331&quot;&gt;FLINK-10331&lt;/a&gt;] -         Reduce number of flush requests to the network stack
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10332&quot;&gt;FLINK-10332&lt;/a&gt;] -         Move data available notification in PipelinedSubpartition out of the synchronized block
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5542&quot;&gt;FLINK-5542&lt;/a&gt;] -         YARN client incorrectly uses local YARN config to check vcore capacity
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9567&quot;&gt;FLINK-9567&lt;/a&gt;] -         Flink does not release resource in Yarn Cluster mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9788&quot;&gt;FLINK-9788&lt;/a&gt;] -         ExecutionGraph Inconsistency prevents Job from recovering
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9884&quot;&gt;FLINK-9884&lt;/a&gt;] -         Slot request may not be removed when it has already been assigned in slot manager
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9891&quot;&gt;FLINK-9891&lt;/a&gt;] -         Flink cluster is not shutdown in YARN mode when Flink client is stopped
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9932&quot;&gt;FLINK-9932&lt;/a&gt;] -         Timed-out TaskExecutor slot-offers to JobMaster leak the slot
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10135&quot;&gt;FLINK-10135&lt;/a&gt;] -         Certain cluster-level metrics are no longer exposed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10222&quot;&gt;FLINK-10222&lt;/a&gt;] -         Table scalar function expression parses error when function name equals the exists keyword suffix
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10259&quot;&gt;FLINK-10259&lt;/a&gt;] -         Key validation for GroupWindowAggregate is broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10316&quot;&gt;FLINK-10316&lt;/a&gt;] -         Add check to KinesisProducer that aws.region is set
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10354&quot;&gt;FLINK-10354&lt;/a&gt;] -         Savepoints should be counted as retained checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10400&quot;&gt;FLINK-10400&lt;/a&gt;] -         Return failed JobResult if job terminates in state FAILED or CANCELED
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10415&quot;&gt;FLINK-10415&lt;/a&gt;] -         RestClient does not react to lost connection
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10451&quot;&gt;FLINK-10451&lt;/a&gt;] -         TableFunctionCollector should handle the life cycle of ScalarFunction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10469&quot;&gt;FLINK-10469&lt;/a&gt;] -         FileChannel may not write the whole buffer in a single call to FileChannel.write(Buffer buffer)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10487&quot;&gt;FLINK-10487&lt;/a&gt;] -         fix invalid Flink SQL example
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10516&quot;&gt;FLINK-10516&lt;/a&gt;] -         YarnApplicationMasterRunner does not initialize FileSystem with correct Flink Configuration during setup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10524&quot;&gt;FLINK-10524&lt;/a&gt;] -         MemoryManagerConcurrentModReleaseTest.testConcurrentModificationWhileReleasing failed on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10544&quot;&gt;FLINK-10544&lt;/a&gt;] -         Remove custom settings.xml for snapshot deployments
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10075&quot;&gt;FLINK-10075&lt;/a&gt;] -         HTTP connections to a secured REST endpoint flood the log
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10260&quot;&gt;FLINK-10260&lt;/a&gt;] -         Confusing log messages during TaskManager registration
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10282&quot;&gt;FLINK-10282&lt;/a&gt;] -         Provide separate thread-pool for REST endpoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10312&quot;&gt;FLINK-10312&lt;/a&gt;] -         Wrong / missing exception when submitting job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10375&quot;&gt;FLINK-10375&lt;/a&gt;] -         ExceptionInChainedStubException hides wrapped exception in cause
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10582&quot;&gt;FLINK-10582&lt;/a&gt;] -         Make REST executor thread priority configurable
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Mon, 29 Oct 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/10/29/release-1.5.5.html</link>
<guid isPermaLink="true">/news/2018/10/29/release-1.5.5.html</guid>
</item>

<item>
<title>Apache Flink 1.6.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.6 series.&lt;/p&gt;

&lt;p&gt;This release includes 60 fixes and minor improvements for Flink 1.6.0. The list below details all fixes.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.6.1.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9637&quot;&gt;FLINK-9637&lt;/a&gt;] -         Add public user documentation for TTL feature
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10068&quot;&gt;FLINK-10068&lt;/a&gt;] -         Add documentation for async/RocksDB-based timers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10085&quot;&gt;FLINK-10085&lt;/a&gt;] -         Update AbstractOperatorRestoreTestBase
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10087&quot;&gt;FLINK-10087&lt;/a&gt;] -         Update BucketingSinkMigrationTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10089&quot;&gt;FLINK-10089&lt;/a&gt;] -         Update FlinkKafkaConsumerBaseMigrationTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10090&quot;&gt;FLINK-10090&lt;/a&gt;] -         Update ContinuousFileProcessingMigrationTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10091&quot;&gt;FLINK-10091&lt;/a&gt;] -         Update WindowOperatorMigrationTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10092&quot;&gt;FLINK-10092&lt;/a&gt;] -         Update StatefulJobSavepointMigrationITCase
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10109&quot;&gt;FLINK-10109&lt;/a&gt;] -         Add documentation for StreamingFileSink
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9289&quot;&gt;FLINK-9289&lt;/a&gt;] -         Parallelism of generated operators should have max parallelism of input
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9546&quot;&gt;FLINK-9546&lt;/a&gt;] -         The heartbeatTimeoutIntervalMs of HeartbeatMonitor should be larger than 0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9693&quot;&gt;FLINK-9693&lt;/a&gt;] -         Possible memory leak in jobmanager retaining archived checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9972&quot;&gt;FLINK-9972&lt;/a&gt;] -         Debug memory logging not working 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10011&quot;&gt;FLINK-10011&lt;/a&gt;] -         Old job resurrected during HA failover
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10063&quot;&gt;FLINK-10063&lt;/a&gt;] -         Jepsen: Automatically restart Mesos Processes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10101&quot;&gt;FLINK-10101&lt;/a&gt;] -         Mesos web ui url is missing.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10105&quot;&gt;FLINK-10105&lt;/a&gt;] -         Test failure because of jobmanager.execution.failover-strategy is outdated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10115&quot;&gt;FLINK-10115&lt;/a&gt;] -         Content-length limit is also applied to FileUploads
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10116&quot;&gt;FLINK-10116&lt;/a&gt;] -         createComparator fails on case class with Unit type fields prior to the join-key
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10141&quot;&gt;FLINK-10141&lt;/a&gt;] -         Reduce lock contention introduced with 1.5
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10142&quot;&gt;FLINK-10142&lt;/a&gt;] -         Reduce synchronization overhead for credit notifications
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10150&quot;&gt;FLINK-10150&lt;/a&gt;] -         Chained batch operators interfere with each other
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10151&quot;&gt;FLINK-10151&lt;/a&gt;] -         [State TTL] Fix false recursion call in TransformingStateTableKeyGroupPartitioner.tryAddToSource
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10154&quot;&gt;FLINK-10154&lt;/a&gt;] -         Make sure we always read at least one record in KinesisConnector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10169&quot;&gt;FLINK-10169&lt;/a&gt;] -         RowtimeValidator fails with custom TimestampExtractor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10172&quot;&gt;FLINK-10172&lt;/a&gt;] -         Inconsistency in ExpressionParser and ExpressionDsl for order by asc/desc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10192&quot;&gt;FLINK-10192&lt;/a&gt;] -         SQL Client table visualization mode does not update correctly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10193&quot;&gt;FLINK-10193&lt;/a&gt;] -         Default RPC timeout is used when triggering savepoint via JobMasterGateway
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10204&quot;&gt;FLINK-10204&lt;/a&gt;] -         StreamElementSerializer#copy broken for LatencyMarkers 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10255&quot;&gt;FLINK-10255&lt;/a&gt;] -         Standby Dispatcher locks submitted JobGraphs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10261&quot;&gt;FLINK-10261&lt;/a&gt;] -         INSERT INTO does not work with ORDER BY clause
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10267&quot;&gt;FLINK-10267&lt;/a&gt;] -         [State] Fix arbitrary iterator access on RocksDBMapIterator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10269&quot;&gt;FLINK-10269&lt;/a&gt;] -         Elasticsearch 6 UpdateRequest fail because of binary incompatibility
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10283&quot;&gt;FLINK-10283&lt;/a&gt;] -         FileCache logs unnecessary warnings
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10293&quot;&gt;FLINK-10293&lt;/a&gt;] -         RemoteStreamEnvironment does not forward port to RestClusterClient
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10314&quot;&gt;FLINK-10314&lt;/a&gt;] -         Blocking calls in Execution Graph creation bring down cluster
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10328&quot;&gt;FLINK-10328&lt;/a&gt;] -         Stopping the ZooKeeperSubmittedJobGraphStore should release all currently held locks
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10329&quot;&gt;FLINK-10329&lt;/a&gt;] -         Fail with exception if job cannot be removed by ZooKeeperSubmittedJobGraphStore#removeJobGraph
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        New Feature
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10022&quot;&gt;FLINK-10022&lt;/a&gt;] -         Add metrics for input/output buffers
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9013&quot;&gt;FLINK-9013&lt;/a&gt;] -         Document yarn.containers.vcores only being effective when adapting YARN config
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9446&quot;&gt;FLINK-9446&lt;/a&gt;] -         Compatibility table not up-to-date
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9795&quot;&gt;FLINK-9795&lt;/a&gt;] -         Update Mesos documentation for flip6
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9859&quot;&gt;FLINK-9859&lt;/a&gt;] -         More Akka config options
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9899&quot;&gt;FLINK-9899&lt;/a&gt;] -         Add more metrics to the Kinesis source connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9962&quot;&gt;FLINK-9962&lt;/a&gt;] -         allow users to specify TimeZone in DateTimeBucketer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10001&quot;&gt;FLINK-10001&lt;/a&gt;] -         Improve Kubernetes documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10006&quot;&gt;FLINK-10006&lt;/a&gt;] -         Improve logging in BarrierBuffer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10020&quot;&gt;FLINK-10020&lt;/a&gt;] -         Kinesis Consumer listShards should support more recoverable exceptions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10082&quot;&gt;FLINK-10082&lt;/a&gt;] -         Initialize StringBuilder in Slf4jReporter with estimated size
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10094&quot;&gt;FLINK-10094&lt;/a&gt;] -         Always backup default config for end-to-end tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10110&quot;&gt;FLINK-10110&lt;/a&gt;] -         Harden e2e Kafka shutdown
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10131&quot;&gt;FLINK-10131&lt;/a&gt;] -         Improve logging around ResultSubpartition
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10137&quot;&gt;FLINK-10137&lt;/a&gt;] -         YARN: Log completed Containers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10164&quot;&gt;FLINK-10164&lt;/a&gt;] -         Add support for resuming from savepoints to StandaloneJobClusterEntrypoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10170&quot;&gt;FLINK-10170&lt;/a&gt;] -         Support string representation for map and array types in descriptor-based Table API
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10185&quot;&gt;FLINK-10185&lt;/a&gt;] -         Make ZooKeeperStateHandleStore#releaseAndTryRemove synchronous
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10223&quot;&gt;FLINK-10223&lt;/a&gt;] -         TaskManagers should log their ResourceID during startup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10301&quot;&gt;FLINK-10301&lt;/a&gt;] -         Allow a custom Configuration in StreamNetworkBenchmarkEnvironment
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10325&quot;&gt;FLINK-10325&lt;/a&gt;] -         [State TTL] Refactor TtlListState to use only loops, no java stream API for performance
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Test
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10084&quot;&gt;FLINK-10084&lt;/a&gt;] -         Migration tests weren&amp;#39;t updated for 1.5
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 20 Sep 2018 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2018/09/20/release-1.6.1.html</link>
<guid isPermaLink="true">/news/2018/09/20/release-1.6.1.html</guid>
</item>

<item>
<title>Apache Flink 1.5.4 Released</title>
<description>&lt;p&gt;The Apache Flink community released the fourth bugfix version of the Apache Flink 1.5 series.&lt;/p&gt;

&lt;p&gt;This release includes more than 20 fixes and minor improvements for Flink 1.5.3. The list below details all fixes.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.5.4.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9878&quot;&gt;FLINK-9878&lt;/a&gt;] -         IO worker threads BLOCKED on SSL Session Cache while CMS full gc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10011&quot;&gt;FLINK-10011&lt;/a&gt;] -         Old job resurrected during HA failover
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10101&quot;&gt;FLINK-10101&lt;/a&gt;] -         Mesos web ui url is missing.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10115&quot;&gt;FLINK-10115&lt;/a&gt;] -         Content-length limit is also applied to FileUploads
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10116&quot;&gt;FLINK-10116&lt;/a&gt;] -         createComparator fails on case class with Unit type fields prior to the join-key
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10141&quot;&gt;FLINK-10141&lt;/a&gt;] -         Reduce lock contention introduced with 1.5
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10142&quot;&gt;FLINK-10142&lt;/a&gt;] -         Reduce synchronization overhead for credit notifications
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10150&quot;&gt;FLINK-10150&lt;/a&gt;] -         Chained batch operators interfere with each other
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10172&quot;&gt;FLINK-10172&lt;/a&gt;] -         Inconsistency in ExpressionParser and ExpressionDsl for order by asc/desc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10193&quot;&gt;FLINK-10193&lt;/a&gt;] -         Default RPC timeout is used when triggering savepoint via JobMasterGateway
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10204&quot;&gt;FLINK-10204&lt;/a&gt;] -         StreamElementSerializer#copy broken for LatencyMarkers 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10255&quot;&gt;FLINK-10255&lt;/a&gt;] -         Standby Dispatcher locks submitted JobGraphs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10261&quot;&gt;FLINK-10261&lt;/a&gt;] -         INSERT INTO does not work with ORDER BY clause
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10267&quot;&gt;FLINK-10267&lt;/a&gt;] -         [State] Fix arbitrary iterator access on RocksDBMapIterator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10293&quot;&gt;FLINK-10293&lt;/a&gt;] -         RemoteStreamEnvironment does not forward port to RestClusterClient
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10314&quot;&gt;FLINK-10314&lt;/a&gt;] -         Blocking calls in Execution Graph creation bring down cluster
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10328&quot;&gt;FLINK-10328&lt;/a&gt;] -         Stopping the ZooKeeperSubmittedJobGraphStore should release all currently held locks
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10329&quot;&gt;FLINK-10329&lt;/a&gt;] -         Fail with exception if job cannot be removed by ZooKeeperSubmittedJobGraphStore#removeJobGraph
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10082&quot;&gt;FLINK-10082&lt;/a&gt;] -         Initialize StringBuilder in Slf4jReporter with estimated size
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10131&quot;&gt;FLINK-10131&lt;/a&gt;] -         Improve logging around ResultSubpartition
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10137&quot;&gt;FLINK-10137&lt;/a&gt;] -         YARN: Log completed Containers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10185&quot;&gt;FLINK-10185&lt;/a&gt;] -         Make ZooKeeperStateHandleStore#releaseAndTryRemove synchronous
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10223&quot;&gt;FLINK-10223&lt;/a&gt;] -         TaskManagers should log their ResourceID during startup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10301&quot;&gt;FLINK-10301&lt;/a&gt;] -         Allow a custom Configuration in StreamNetworkBenchmarkEnvironment
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 20 Sep 2018 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2018/09/20/release-1.5.4.html</link>
<guid isPermaLink="true">/news/2018/09/20/release-1.5.4.html</guid>
</item>

<item>
<title>Apache Flink 1.5.3 Released</title>
<description>&lt;p&gt;The Apache Flink community released the third bugfix version of the Apache Flink 1.5 series.&lt;/p&gt;

&lt;p&gt;This release includes more than 20 fixes and minor improvements for Flink 1.5.2. The list below details all fixes.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.5.3.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9951&quot;&gt;FLINK-9951&lt;/a&gt;] -         Update scm developerConnection
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5750&quot;&gt;FLINK-5750&lt;/a&gt;] -         Incorrect translation of n-ary Union
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9289&quot;&gt;FLINK-9289&lt;/a&gt;] -         Parallelism of generated operators should have max parallism of input
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9546&quot;&gt;FLINK-9546&lt;/a&gt;] -         The heartbeatTimeoutIntervalMs of HeartbeatMonitor should be larger than 0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9655&quot;&gt;FLINK-9655&lt;/a&gt;] -         Externalized checkpoint E2E test fails on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9693&quot;&gt;FLINK-9693&lt;/a&gt;] -         Possible memory leak in jobmanager retaining archived checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9694&quot;&gt;FLINK-9694&lt;/a&gt;] -         Potentially NPE in CompositeTypeSerializerConfigSnapshot constructor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9923&quot;&gt;FLINK-9923&lt;/a&gt;] -         OneInputStreamTaskTest.testWatermarkMetrics fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9935&quot;&gt;FLINK-9935&lt;/a&gt;] -         Batch Table API: grouping by window and attribute causes java.lang.ClassCastException:
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9936&quot;&gt;FLINK-9936&lt;/a&gt;] -         Mesos resource manager unable to connect to master after failover
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9946&quot;&gt;FLINK-9946&lt;/a&gt;] -         Quickstart E2E test archetype version is hard-coded
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9969&quot;&gt;FLINK-9969&lt;/a&gt;] -         Unreasonable memory requirements to complete examples/batch/WordCount
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9972&quot;&gt;FLINK-9972&lt;/a&gt;] -         Debug memory logging not working 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9978&quot;&gt;FLINK-9978&lt;/a&gt;] -         Source release sha contains absolute file path
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9985&quot;&gt;FLINK-9985&lt;/a&gt;] -         Incorrect parameter order in document
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9988&quot;&gt;FLINK-9988&lt;/a&gt;] -           job manager does not respect property jobmanager.web.address
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10013&quot;&gt;FLINK-10013&lt;/a&gt;] -         Fix Kerberos integration for FLIP-6 YarnTaskExecutorRunner 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10033&quot;&gt;FLINK-10033&lt;/a&gt;] -         Let Task release reference to Invokable on shutdown
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10070&quot;&gt;FLINK-10070&lt;/a&gt;] -         Flink cannot be compiled with maven 3.0.x
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        New Feature
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10022&quot;&gt;FLINK-10022&lt;/a&gt;] -         Add metrics for input/output buffers
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9446&quot;&gt;FLINK-9446&lt;/a&gt;] -         Compatibility table not up-to-date
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9765&quot;&gt;FLINK-9765&lt;/a&gt;] -         Improve CLI responsiveness when cluster is not reachable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9806&quot;&gt;FLINK-9806&lt;/a&gt;] -         Add a canonical link element to documentation HTML
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9859&quot;&gt;FLINK-9859&lt;/a&gt;] -         More Akka config options
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9942&quot;&gt;FLINK-9942&lt;/a&gt;] -         Guard handlers against null fields in requests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9986&quot;&gt;FLINK-9986&lt;/a&gt;] -         Remove unnecessary information from .version.properties file
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9987&quot;&gt;FLINK-9987&lt;/a&gt;] -         Rework ClassLoader E2E test to not rely on .version.properties file
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10006&quot;&gt;FLINK-10006&lt;/a&gt;] -         Improve logging in BarrierBuffer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10016&quot;&gt;FLINK-10016&lt;/a&gt;] -         Make YARN/Kerberos end-to-end test stricter
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Tue, 21 Aug 2018 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2018/08/21/release-1.5.3.html</link>
<guid isPermaLink="true">/news/2018/08/21/release-1.5.3.html</guid>
</item>

<item>
<title>Apache Flink 1.6.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is proud to announce the 1.6.0 release. Over the past 2 months, the Flink community has worked hard to resolve more than 360 issues. Please check the &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12342760&quot;&gt;complete changelog&lt;/a&gt; for more details.&lt;/p&gt;

&lt;p&gt;Flink 1.6.0 is the seventh major release in the 1.x.y series. It is API-compatible with previous 1.x.y releases for APIs annotated with the &lt;code&gt;@Public&lt;/code&gt; annotation.&lt;/p&gt;

&lt;p&gt;We encourage everyone to &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;download the release&lt;/a&gt; and check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.6/&quot;&gt;documentation&lt;/a&gt;.
Feedback through the Flink &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; or &lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;JIRA&lt;/a&gt; is, as always, very much appreciated!&lt;/p&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt; on the Flink project site.&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#flink-16---the-next-step-in-stateful-stream-processing&quot; id=&quot;markdown-toc-flink-16---the-next-step-in-stateful-stream-processing&quot;&gt;Flink 1.6 - The next step in stateful stream processing&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#improving-flinks-state-support&quot; id=&quot;markdown-toc-improving-flinks-state-support&quot;&gt;Improving Flink’s State Support&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#extending-flinks-deployment-options&quot; id=&quot;markdown-toc-extending-flinks-deployment-options&quot;&gt;Extending Flink’s Deployment Options&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#enhancing-sql-and-table-api&quot; id=&quot;markdown-toc-enhancing-sql-and-table-api&quot;&gt;Enhancing SQL and Table API&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#more-connectors&quot; id=&quot;markdown-toc-more-connectors&quot;&gt;More Connectors&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#jepsen-based-distributed-tests-suite&quot; id=&quot;markdown-toc-jepsen-based-distributed-tests-suite&quot;&gt;Jepsen Based Distributed Tests Suite&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#various-other-features-and-improvements&quot; id=&quot;markdown-toc-various-other-features-and-improvements&quot;&gt;Various Other Features and Improvements&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#release-notes&quot; id=&quot;markdown-toc-release-notes&quot;&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h2 id=&quot;flink-16---the-next-step-in-stateful-stream-processing&quot;&gt;Flink 1.6 - The next step in stateful stream processing&lt;/h2&gt;

&lt;p&gt;In Flink 1.6.0 we continue the groundwork we laid out in earlier versions: Enabling Flink users to seamlessly run fast data processing and build data-driven and data-intensive applications effortlessly.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Flink’s state support is one of the key features which makes Flink so versatile and powerful when it comes to implementing all kinds of use cases. 
To make it even easier, the community added &lt;strong&gt;native support for state TTL&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9510&quot;&gt;FLINK-9510&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9938&quot;&gt;FLINK-9938&lt;/a&gt;). 
This feature allows state to be cleaned up after it has expired.
With Flink 1.6.0 &lt;strong&gt;timer state can now go out of core&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9485&quot;&gt;FLINK-9485&lt;/a&gt;) by storing the relevant state in RocksDB. 
Last but not least, we also &lt;strong&gt;improved the deletion of timers&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9423&quot;&gt;FLINK-9423&lt;/a&gt;) significantly.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;With Flink 1.5.0 we reworked Flink’s distributed architecture to add support for resource elasticity and different deployment scenarios, most notably a better container integration. 
In Flink 1.6.0 we follow up on some of the unfinished aspects of this work: &lt;strong&gt;All external communication, including job submissions, is now HTTP/REST based&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9280&quot;&gt;FLINK-9280&lt;/a&gt;) which eases container setups considerably. 
Flink 1.6.0 also comes with a &lt;strong&gt;container entrypoint&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9488&quot;&gt;FLINK-9488&lt;/a&gt;) which makes it easy to bootstrap a containerized job cluster.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Streaming SQL is one of the features with the most disruptive potential, because it makes Flink much more accessible. 
In Apache Flink 1.6.0 the community &lt;strong&gt;further improved the SQL Client CLI&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8863&quot;&gt;FLINK-8863&lt;/a&gt;), making the &lt;strong&gt;execution of streaming and batch queries&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8861&quot;&gt;FLINK-8861&lt;/a&gt;) against a multitude of data sources a piece of cake.
In addition, the &lt;strong&gt;full Avro support&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9444&quot;&gt;FLINK-9444&lt;/a&gt;) makes reading any kind of Avro data seamless. 
Last but not least, the community &lt;strong&gt;hardened Flink’s CEP library&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9418&quot;&gt;FLINK-9418&lt;/a&gt;) that can now handle significantly larger use cases.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;What would be a distributed processing engine without its connectors to talk to the outside world? 
In the latest Flink release we added a &lt;strong&gt;new StreamingFileSink&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9750&quot;&gt;FLINK-9750&lt;/a&gt;) that succeeds the &lt;code&gt;BucketingSink&lt;/code&gt; as the standard file sink. 
The community also added support for &lt;strong&gt;ElasticSearch 6.x&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7386&quot;&gt;FLINK-7386&lt;/a&gt;) and implemented multiple &lt;strong&gt;AvroDeserializationSchemas&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9338&quot;&gt;FLINK-9338&lt;/a&gt;) to easily ingest Avro data.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;

&lt;h3 id=&quot;improving-flinks-state-support&quot;&gt;Improving Flink’s State Support&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Support for State TTL&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9510&quot;&gt;FLINK-9510&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9938&quot;&gt;FLINK-9938&lt;/a&gt;):
This feature allows specifying a time-to-live (TTL) for Flink state.
Once the time-to-live has been exceeded, Flink will no longer give access to the respective state values.
Expired data is cleaned up on access, so the operator’s keyed state does not grow indefinitely and expired entries are not included in subsequent checkpoints.
This feature fully complies with new data protection regulations (e.g. GDPR).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Scalable Timers Based on RocksDB&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9485&quot;&gt;FLINK-9485&lt;/a&gt;):
Flink’s timer state can now be stored in RocksDB, allowing the technology to support significantly bigger timer state since it can go out of core/spill to disk. 
Previously, users were limited to the heap memory size. 
On top of that, snapshots of the timer state are now asynchronous, i.e., they no longer block the processing pipeline during checkpoints and can be incremental.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Faster Timer Deletions&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9423&quot;&gt;FLINK-9423&lt;/a&gt;): 
Flink’s internal timer data structure has been improved so that the deletion complexity is reduced from O(n) to O(log n).
This significantly improves the performance of Flink jobs that use timers.
Deleting timers is also exposed through a user-facing API now.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
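
&lt;p&gt;As a rough illustration of the state TTL semantics described above, the following Python sketch models a single state value that becomes unreadable once its time-to-live has been exceeded and is dropped on the next access. It is a toy model, not Flink’s implementation or API: the names &lt;code&gt;TtlValueState&lt;/code&gt; and &lt;code&gt;FakeClock&lt;/code&gt; are hypothetical, and the rule that every write restarts the TTL is an assumption made only for this example.&lt;/p&gt;

```python
import time


class FakeClock:
    """Hypothetical manual clock so the example runs deterministically."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


class TtlValueState:
    """Toy single-value state with a time-to-live (not Flink's API)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._written_at = None

    def update(self, value):
        # Assumption for this sketch: every write restarts the TTL.
        self._value = value
        self._written_at = self._clock()

    def value(self):
        if self._written_at is None:
            return None
        if self._clock() - self._written_at > self._ttl:
            # TTL exceeded: hide the value and drop it on access,
            # so it no longer occupies space in the state.
            self._value = None
            self._written_at = None
            return None
        return self._value


clock = FakeClock()
state = TtlValueState(ttl_seconds=10, clock=clock)
state.update("session-42")

clock.now = 5.0
print(state.value())  # prints session-42 (still within the TTL)

clock.now = 11.0
print(state.value())  # prints None (TTL exceeded, entry dropped)
```

&lt;p&gt;Passing the clock in as a parameter is a design choice of this sketch only; it keeps the expiry check easy to test without waiting for real time to pass.&lt;/p&gt;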

&lt;h3 id=&quot;extending-flinks-deployment-options&quot;&gt;Extending Flink’s Deployment Options&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Job Cluster Container Entrypoint&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9488&quot;&gt;FLINK-9488&lt;/a&gt;):
Flink 1.6.0 provides an easy-to-use container entrypoint to bootstrap a job cluster. 
Combining this entrypoint with a user-code jar creates a self-contained image which automatically executes the contained Flink job when deployed. 
Since the image already contains the Flink job, client communication is no longer necessary. 
Avoiding additional communication steps with the client reduces the number of moving parts and improves operations in a container environment significantly.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Fully RESTified Job Submission&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9280&quot;&gt;FLINK-9280&lt;/a&gt;):
The Flink client now sends all job-relevant content via a single POST call to the server. 
This allows a much easier integration with cluster management frameworks and container environments, since opening custom ports is no longer necessary.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;enhancing-sql-and-table-api&quot;&gt;Enhancing SQL and Table API&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;User-Defined Function in SQL Client CLI&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8863&quot;&gt;FLINK-8863&lt;/a&gt;):
The SQL Client CLI now supports the registration of user-defined functions. 
This considerably improves the CLI’s expressiveness, because SQL queries can be enriched with more powerful custom table, aggregate, and scalar functions.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Support for Batch Queries in SQL Client CLI&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8861&quot;&gt;FLINK-8861&lt;/a&gt;):
The SQL Client CLI now supports the execution of batch queries.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Support for INSERT INTO Statements in SQL Client CLI&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8858&quot;&gt;FLINK-8858&lt;/a&gt;):
By supporting SQL’s INSERT INTO statements, the SQL Client CLI can be used to submit long-running SQL queries to Flink that sink their results in external systems. 
The SQL Client itself can be shut down after submission without stopping the job.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Unified Table Sinks and Formats&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8866&quot;&gt;FLINK-8866&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8558&quot;&gt;FLINK-8558&lt;/a&gt;):
In the past, table sinks had to be configured programmatically and were tied to a specific format and implementation.
This release reworked these aspects by decoupling formats from connectors and improving how table sinks are discovered and configured. 
Table sinks can now be defined in a YAML file using string-based properties without having to write a single line of code.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;New Kafka Table Sink&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9846&quot;&gt;FLINK-9846&lt;/a&gt;):
The Kafka table sink now uses the new unified APIs and supports both JSON and Avro formats.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Full SQL Avro Support&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9444&quot;&gt;FLINK-9444&lt;/a&gt;):
Flink’s Table &amp;amp; SQL API now understands the full spectrum of Avro types including generic/specific records and logical types. 
The types are automatically mapped from and to Flink-equivalent types, which makes it possible to specify end-to-end ETL pipelines in SQL.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Improved Expressiveness of SQL and Table API&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5878&quot;&gt;FLINK-5878&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8688&quot;&gt;FLINK-8688&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6810&quot;&gt;FLINK-6810&lt;/a&gt;):
Flink’s Table &amp;amp; SQL API supports left, right, and full outer joins that allow for continuous result-updating queries.
SQL aggregate functions support the &lt;code&gt;DISTINCT&lt;/code&gt; keyword. 
Queries such as &lt;code&gt;COUNT(DISTINCT column)&lt;/code&gt; are supported for windowed and non-windowed aggregations.
Both SQL and Table API now include more built-in functions such as &lt;code&gt;MD5, SHA1, SHA2, LOG&lt;/code&gt;, and &lt;code&gt;UNNEST&lt;/code&gt; for multisets.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;more-connectors&quot;&gt;More Connectors&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;New StreamingFileSink&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9750&quot;&gt;FLINK-9750&lt;/a&gt;):
The new &lt;code&gt;StreamingFileSink&lt;/code&gt; is an exactly-once sink for writing to filesystems which capitalizes on the knowledge acquired from the previous &lt;code&gt;BucketingSink&lt;/code&gt;. 
Exactly-once is supported through integration of the sink with Flink’s checkpointing mechanism.
The new sink is built upon Flink’s own &lt;code&gt;FileSystem&lt;/code&gt; abstraction and it supports local file system and HDFS, with plans for S3 support in the near future.
It exposes pluggable file rolling and bucketing policies.
Apart from row-wise encoding formats, the new &lt;code&gt;StreamingFileSink&lt;/code&gt; comes with support for Parquet.
Other bulk-encoding formats like ORC can be easily added using the exposed APIs.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;ElasticSearch 6.x Connector and Improved Support for Older Versions&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7386&quot;&gt;FLINK-7386&lt;/a&gt;):
Flink now comes with a connector for Elasticsearch 6.x, which is built on top of Elasticsearch’s new &lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high.html&quot;&gt;high level REST client&lt;/a&gt;.
For older Elasticsearch versions which still use the native Java &lt;code&gt;TransportClient&lt;/code&gt;, Flink’s Elasticsearch connectors now support up to Elasticsearch version 5.6.10.
Some APIs in the public interface of the Elasticsearch connector’s &lt;code&gt;RequestIndexer&lt;/code&gt; have been deprecated.
Please refer to the Javadoc / documentation for the new preferred API.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Avro Deserialization Schemas&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9338&quot;&gt;FLINK-9338&lt;/a&gt;):
Flink now comes with a &lt;code&gt;DeserializationSchema&lt;/code&gt; which allows deserializing Avro-encoded messages.
It also adds out-of-the-box integration with Confluent’s schema registry.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;jepsen-based-distributed-tests-suite&quot;&gt;Jepsen Based Distributed Tests Suite&lt;/h3&gt;

&lt;p&gt;The Flink community added a Jepsen-based test suite (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9004&quot;&gt;FLINK-9004&lt;/a&gt;) which validates the behavior of Flink’s distributed cluster components under real-world faults.
It is a first step towards higher test coverage for Flink’s fault tolerance mechanisms.
The community intends to incrementally improve test coverage with it.&lt;/p&gt;

&lt;h3 id=&quot;various-other-features-and-improvements&quot;&gt;Various Other Features and Improvements&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Hardened CEP Library&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9418&quot;&gt;FLINK-9418&lt;/a&gt;):
The CEP operator’s internal NFA state is now backed by Flink state.
That way it can go out of core to support much larger use cases.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;More Expressive DataStream Joins&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8478&quot;&gt;FLINK-8478&lt;/a&gt;):
Flink 1.6.0 adds support for interval joins in the DataStream API. 
With this feature it is now possible to join together events from different streams where elements from one stream lie in a specified time interval relative to elements from the other stream.
Check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/joining.html&quot;&gt;documentation&lt;/a&gt; for more details.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Intra-Cluster Mutual Authentication&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9312&quot;&gt;FLINK-9312&lt;/a&gt;):
Flink’s cluster components now enforce mutual authentication with their peers. 
This allows only Flink components to talk to each other, making it impossible for malicious actors to impersonate Flink components in order to eavesdrop on the cluster communication.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
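
&lt;p&gt;To make the interval join condition from the list above more concrete, here is a minimal Python sketch of the matching rule: two events are paired when they share a key and the timestamp of the second lies within a given interval relative to the first. This is not the DataStream API. The &lt;code&gt;Event&lt;/code&gt; type and the &lt;code&gt;interval_join&lt;/code&gt; function are hypothetical, the bounds are assumed to be inclusive, and the nested loop over finite lists only mimics the result of a join that a streaming operator would compute incrementally.&lt;/p&gt;

```python
from collections import namedtuple

# Hypothetical event type used only for this illustration.
Event = namedtuple("Event", ["key", "timestamp", "payload"])


def interval_join(left, right, lower_bound, upper_bound):
    """Pair events with equal keys whose time offset lies in the interval.

    A pair (a, b) matches when b.timestamp - a.timestamp is between
    lower_bound and upper_bound, both ends included (an assumption).
    """
    joined = []
    for a in left:
        for b in right:
            if a.key != b.key:
                continue
            offset = b.timestamp - a.timestamp
            if upper_bound >= offset >= lower_bound:
                joined.append((a.payload, b.payload))
    return joined


orders = [
    Event("k1", 10, "order-1"),
    Event("k2", 20, "order-2"),
]
shipments = [
    Event("k1", 12, "ship-1"),     # offset +2 from order-1: matches
    Event("k1", 30, "ship-late"),  # offset +20 from order-1: too late
    Event("k2", 19, "ship-2"),     # offset -1 from order-2: matches
]

# Match shipments from 2 time units before to 5 time units after an order.
print(interval_join(orders, shipments, -2, 5))
# prints [('order-1', 'ship-1'), ('order-2', 'ship-2')]
```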

&lt;h2 id=&quot;release-notes&quot;&gt;Release Notes&lt;/h2&gt;

&lt;p&gt;Please review the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.6/release-notes/flink-1.6.html&quot;&gt;release notes&lt;/a&gt; if you plan to upgrade your Flink setup to Flink 1.6.&lt;/p&gt;

&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;

&lt;p&gt;According to git shortlog, the following 112 people contributed to the 1.6.0 release. Thanks to all contributors!&lt;/p&gt;

&lt;p&gt;Alejandro Alcalde, Alexander Koltsov, Alexey Tsitkin, Aljoscha Krettek, Andreas Fink, Andrey Zagrebin, Arunan Sugunakumar, Ashwin Sinha, Bill Lee, Bowen Li, Chesnay Schepler, Christophe Jolif, Clément Tamisier, Craig Foster, David Anderson, Dawid Wysakowicz, Deepak Sharnma, Dmitrii_Kniazev, EAlexRojas, Elias Levy, Eron Wright, Ethan Li, Fabian Hueske, Florian Schmidt, Franz Thoma, Gabor Gevay, Georgii Gobozov, Haohui Mai, Jamie Grier, Jeff Zhang, Jelmer Kuperus, Jiayi Liao, Jungtaek Lim, Kailash HD, Ken Geis, Ken Krugler, Lakshmi Gururaja Rao, Leonid Ishimnikov, Matrix42, Michael Gendelman, MichealShin, Moser Thomas W, Nico Duldhardt, Nico Kruber, Oleksandr Nitavskyi, PJ Fanning, Patrick Lucas, Pavel Shvetsov, Philippe Duveau, Piotr Nowojski, Qiu Congxian/klion26, Rinat Sharipov, Rong Rong, Rune Skou Larsen, Sayat Satybaldiyev, Shuyi Chen, Stefan Richter, Stephan Ewen, Stephen Parente, Thomas Weise, Till Rohrmann, Timo Walther, Tobii42, Tzu-Li (Gordon) Tai, Viktor Vlasov, Wosin, Xingcan Cui, Xpray, Yan Zhou, Yazdan.JS, Yun Tang, Zhijiang, Zsolt Donca, an4828, aria, binlijin, blueszheng, davidxdh, gyao, hequn8128, hzyuqi1, jerryjzhang, jparkie, juhoautio, kai-chi, kkloudas, klion26, lamber-ken, lincoln-lil, linjun, liurenjie1024, lsy, maqingxiang-it, maxbelov, mayyamus, minwenjun, neoremind, sampathBhat, shankarganesh1234, shuai.xus, sihuazhou, snuyanzin, triones.deng, vinoyang, xueyu, yangshimin, yuemeng, zhangminglei, zhouhai02, zjureel, 军长, 陈梓立&lt;/p&gt;
</description>
<pubDate>Thu, 09 Aug 2018 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2018/08/09/release-1.6.0.html</link>
<guid isPermaLink="true">/news/2018/08/09/release-1.6.0.html</guid>
</item>

<item>
<title>Apache Flink 1.5.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released the second bugfix version of the Apache Flink 1.5 series.&lt;/p&gt;

&lt;p&gt;This release includes more than 20 fixes and minor improvements for Flink 1.5.1. All fixes are listed in detail below.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.5.2.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9839&quot;&gt;FLINK-9839&lt;/a&gt;] -         End-to-end test: Streaming job with SSL
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5750&quot;&gt;FLINK-5750&lt;/a&gt;] -         Incorrect translation of n-ary Union
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8161&quot;&gt;FLINK-8161&lt;/a&gt;] -         Flakey YARNSessionCapacitySchedulerITCase on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8731&quot;&gt;FLINK-8731&lt;/a&gt;] -         TwoInputStreamTaskTest flaky on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9091&quot;&gt;FLINK-9091&lt;/a&gt;] -         Failure while enforcing releasability in building flink-json module
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9380&quot;&gt;FLINK-9380&lt;/a&gt;] -         Failing end-to-end tests should not clean up logs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9439&quot;&gt;FLINK-9439&lt;/a&gt;] -         DispatcherTest#testJobRecovery dead locks
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9575&quot;&gt;FLINK-9575&lt;/a&gt;] -         Potential race condition when removing JobGraph in HA
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9584&quot;&gt;FLINK-9584&lt;/a&gt;] -         Unclosed streams in Bucketing-/RollingSink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9658&quot;&gt;FLINK-9658&lt;/a&gt;] -         Test data output directories are no longer cleaned up
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9706&quot;&gt;FLINK-9706&lt;/a&gt;] -         DispatcherTest#testSubmittedJobGraphListener fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9743&quot;&gt;FLINK-9743&lt;/a&gt;] -         PackagedProgram.extractContainedLibraries fails on Windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9754&quot;&gt;FLINK-9754&lt;/a&gt;] -         Release scripts refers to non-existing profile
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9755&quot;&gt;FLINK-9755&lt;/a&gt;] -         Exceptions in RemoteInputChannel#notifyBufferAvailable() are not propagated to the responsible thread
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9762&quot;&gt;FLINK-9762&lt;/a&gt;] -         CoreOptions.TMP_DIRS wrongly managed on Yarn
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9766&quot;&gt;FLINK-9766&lt;/a&gt;] -         Incomplete/incorrect cleanup in RemoteInputChannelTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9771&quot;&gt;FLINK-9771&lt;/a&gt;] -          &amp;quot;Show Plan&amp;quot; option under Submit New Job in WebUI not working 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9772&quot;&gt;FLINK-9772&lt;/a&gt;] -         Documentation of Hadoop API outdated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9784&quot;&gt;FLINK-9784&lt;/a&gt;] -         Inconsistent use of &amp;#39;static&amp;#39; in AsyncIOExample.java
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9793&quot;&gt;FLINK-9793&lt;/a&gt;] -         When submitting a flink job with yarn-cluster, flink-dist*.jar is repeatedly uploaded
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9810&quot;&gt;FLINK-9810&lt;/a&gt;] -         JarListHandler does not close opened jars
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9838&quot;&gt;FLINK-9838&lt;/a&gt;] -         Slot request failed Exceptions after completing a job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9841&quot;&gt;FLINK-9841&lt;/a&gt;] -         Web UI only show partial taskmanager log 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9842&quot;&gt;FLINK-9842&lt;/a&gt;] -         Job submission fails via CLI with SSL enabled
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9847&quot;&gt;FLINK-9847&lt;/a&gt;] -         OneInputStreamTaskTest.testWatermarksNotForwardedWithinChainWhenIdle unstable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9857&quot;&gt;FLINK-9857&lt;/a&gt;] -         Processing-time timers fire too early
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9860&quot;&gt;FLINK-9860&lt;/a&gt;] -         Netty resource leak on receiver side
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9872&quot;&gt;FLINK-9872&lt;/a&gt;] -         SavepointITCase#testSavepointForJobWithIteration does not properly cancel jobs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9908&quot;&gt;FLINK-9908&lt;/a&gt;] -         Inconsistent state of SlotPool after ExecutionGraph cancellation 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9910&quot;&gt;FLINK-9910&lt;/a&gt;] -         Non-queued scheduling failure sometimes does not return the slot
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9911&quot;&gt;FLINK-9911&lt;/a&gt;] -         SlotPool#failAllocation is called outside of main thread
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        New Feature
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9499&quot;&gt;FLINK-9499&lt;/a&gt;] -         Allow REST API for running a job to provide job configuration as body of POST request
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9659&quot;&gt;FLINK-9659&lt;/a&gt;] -         Remove hard-coded sleeps in bucketing sink E2E test
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9748&quot;&gt;FLINK-9748&lt;/a&gt;] -         create_source_release pollutes flink root directory
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9768&quot;&gt;FLINK-9768&lt;/a&gt;] -         Only build flink-dist for binary releases
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9785&quot;&gt;FLINK-9785&lt;/a&gt;] -         Add remote addresses to LocalTransportException instances
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9801&quot;&gt;FLINK-9801&lt;/a&gt;] -         flink-dist is missing dependency on flink-examples
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9804&quot;&gt;FLINK-9804&lt;/a&gt;] -         KeyedStateBackend.getKeys() does not work on RocksDB MapState
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9811&quot;&gt;FLINK-9811&lt;/a&gt;] -         Add ITCase for interactions of Jar handlers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9873&quot;&gt;FLINK-9873&lt;/a&gt;] -         Log actual state when aborting checkpoint due to task not running
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9881&quot;&gt;FLINK-9881&lt;/a&gt;] -         Typo in a function name in table.scala
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9888&quot;&gt;FLINK-9888&lt;/a&gt;] -         Remove unsafe defaults from release scripts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9909&quot;&gt;FLINK-9909&lt;/a&gt;] -         Remove cancellation of input futures from ConjunctFutures
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Tue, 31 Jul 2018 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2018/07/31/release-1.5.2.html</link>
<guid isPermaLink="true">/news/2018/07/31/release-1.5.2.html</guid>
</item>

<item>
<title>Apache Flink 1.5.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.5 series.&lt;/p&gt;

&lt;p&gt;This release includes more than 60 fixes and minor improvements for Flink 1.5.0. The list below details all of them.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.5.1.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8977&quot;&gt;FLINK-8977&lt;/a&gt;] -         End-to-end test: Manually resume job after terminal failure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8982&quot;&gt;FLINK-8982&lt;/a&gt;] -         End-to-end test: Queryable state
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8989&quot;&gt;FLINK-8989&lt;/a&gt;] -         End-to-end test: ElasticSearch connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8996&quot;&gt;FLINK-8996&lt;/a&gt;] -         Include an operator with broadcast and union state
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9008&quot;&gt;FLINK-9008&lt;/a&gt;] -         End-to-end test: Quickstarts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9320&quot;&gt;FLINK-9320&lt;/a&gt;] -         Update `test-ha.sh` end-to-end test to use general purpose DataStream job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9322&quot;&gt;FLINK-9322&lt;/a&gt;] -         Add exception throwing map function that simulates failures to the general purpose DataStream job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9394&quot;&gt;FLINK-9394&lt;/a&gt;] -         Let externalized checkpoint resume e2e also test rescaling
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8785&quot;&gt;FLINK-8785&lt;/a&gt;] -         JobSubmitHandler does not handle JobSubmissionExceptions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8795&quot;&gt;FLINK-8795&lt;/a&gt;] -         Scala shell broken for Flip6
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8946&quot;&gt;FLINK-8946&lt;/a&gt;] -         TaskManager stop sending metrics after JobManager failover
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9174&quot;&gt;FLINK-9174&lt;/a&gt;] -         The type of state created in ProccessWindowFunction.proccess() is inconsistency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9215&quot;&gt;FLINK-9215&lt;/a&gt;] -         TaskManager Releasing  - org.apache.flink.util.FlinkException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9257&quot;&gt;FLINK-9257&lt;/a&gt;] -         End-to-end tests prints &amp;quot;All tests PASS&amp;quot; even if individual test-script returns non-zero exit code
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9258&quot;&gt;FLINK-9258&lt;/a&gt;] -         ConcurrentModificationException in ComponentMetricGroup.getAllVariables
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9326&quot;&gt;FLINK-9326&lt;/a&gt;] -         TaskManagerOptions.NUM_TASK_SLOTS does not work for local/embedded mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9374&quot;&gt;FLINK-9374&lt;/a&gt;] -         Flink Kinesis Producer does not backpressure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9398&quot;&gt;FLINK-9398&lt;/a&gt;] -         Flink CLI list running job returns all jobs except in CREATE state
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9437&quot;&gt;FLINK-9437&lt;/a&gt;] -         Revert cypher suite update
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9458&quot;&gt;FLINK-9458&lt;/a&gt;] -         Unable to recover from job failure on YARN with NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9467&quot;&gt;FLINK-9467&lt;/a&gt;] -         No Watermark display on Web UI
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9468&quot;&gt;FLINK-9468&lt;/a&gt;] -         Wrong calculation of outputLimit in LimitedConnectionsFileSystem
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9493&quot;&gt;FLINK-9493&lt;/a&gt;] -         Forward exception when releasing a TaskManager at the SlotPool
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9494&quot;&gt;FLINK-9494&lt;/a&gt;] -         Race condition in Dispatcher with concurrent granting and revoking of leaderhship
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9500&quot;&gt;FLINK-9500&lt;/a&gt;] -         FileUploadHandler does not handle EmptyLastHttpContent
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9524&quot;&gt;FLINK-9524&lt;/a&gt;] -         NPE from ProcTimeBoundedRangeOver.scala
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9530&quot;&gt;FLINK-9530&lt;/a&gt;] -         Task numRecords metrics broken for chains
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9554&quot;&gt;FLINK-9554&lt;/a&gt;] -         flink scala shell doesn&amp;#39;t work in yarn mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9567&quot;&gt;FLINK-9567&lt;/a&gt;] -         Flink does not release resource in Yarn Cluster mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9570&quot;&gt;FLINK-9570&lt;/a&gt;] -         SQL Client merging environments uses AbstractMap
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9580&quot;&gt;FLINK-9580&lt;/a&gt;] -         Potentially unclosed ByteBufInputStream in RestClient#readRawResponse
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9627&quot;&gt;FLINK-9627&lt;/a&gt;] -         Extending &amp;#39;KafkaJsonTableSource&amp;#39; according to comments will result in NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9629&quot;&gt;FLINK-9629&lt;/a&gt;] -         Datadog metrics reporter does not have shaded dependencies
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9633&quot;&gt;FLINK-9633&lt;/a&gt;] -         Flink doesn&amp;#39;t use the Savepoint path&amp;#39;s filesystem to create the OuptutStream on Task.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9634&quot;&gt;FLINK-9634&lt;/a&gt;] -         Deactivate previous location based scheduling if local recovery is disabled
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9636&quot;&gt;FLINK-9636&lt;/a&gt;] -         Network buffer leaks in requesting a batch of segments during canceling
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9646&quot;&gt;FLINK-9646&lt;/a&gt;] -         ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9654&quot;&gt;FLINK-9654&lt;/a&gt;] -         Internal error while deserializing custom Scala TypeSerializer instances
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9655&quot;&gt;FLINK-9655&lt;/a&gt;] -         Externalized checkpoint E2E test fails on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9665&quot;&gt;FLINK-9665&lt;/a&gt;] -         PrometheusReporter does not properly unregister metrics
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9676&quot;&gt;FLINK-9676&lt;/a&gt;] -         Deadlock during canceling task and recycling exclusive buffer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9677&quot;&gt;FLINK-9677&lt;/a&gt;] -         RestClient fails for large uploads
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9684&quot;&gt;FLINK-9684&lt;/a&gt;] -         HistoryServerArchiveFetcher not working properly with secure hdfs cluster
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9693&quot;&gt;FLINK-9693&lt;/a&gt;] -         Possible memory leak in jobmanager retaining archived checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9708&quot;&gt;FLINK-9708&lt;/a&gt;] -         Network buffer leaks when buffer request fails during buffer redistribution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9769&quot;&gt;FLINK-9769&lt;/a&gt;] -         FileUploads may be shared across requests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9770&quot;&gt;FLINK-9770&lt;/a&gt;] -         UI jar list broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9789&quot;&gt;FLINK-9789&lt;/a&gt;] -         Watermark metrics for an operator&amp;amp;task shadow each other
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        New Feature
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9153&quot;&gt;FLINK-9153&lt;/a&gt;] -         TaskManagerRunner should support rpc port range
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9280&quot;&gt;FLINK-9280&lt;/a&gt;] -         Extend JobSubmitHandler to accept jar files
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9316&quot;&gt;FLINK-9316&lt;/a&gt;] -         Expose operator unique ID to the user defined functions in DataStream .
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9564&quot;&gt;FLINK-9564&lt;/a&gt;] -         Expose end-to-end module directory to test scripts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9599&quot;&gt;FLINK-9599&lt;/a&gt;] -         Implement generic mechanism to receive files via rest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9669&quot;&gt;FLINK-9669&lt;/a&gt;] -         Introduce task manager assignment store
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9670&quot;&gt;FLINK-9670&lt;/a&gt;] -         Introduce slot manager factory
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9671&quot;&gt;FLINK-9671&lt;/a&gt;] -         Add configuration to enable task manager isolation.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4301&quot;&gt;FLINK-4301&lt;/a&gt;] -         Parameterize Flink version in Quickstart bash script
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8650&quot;&gt;FLINK-8650&lt;/a&gt;] -         Add tests and documentation for WINDOW clause
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8654&quot;&gt;FLINK-8654&lt;/a&gt;] -         Extend quickstart docs on how to submit jobs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9109&quot;&gt;FLINK-9109&lt;/a&gt;] -         Add flink modify command to documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9355&quot;&gt;FLINK-9355&lt;/a&gt;] -         Simplify configuration of local recovery to a simple on/off
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9372&quot;&gt;FLINK-9372&lt;/a&gt;] -         Typo on Elasticsearch website link (elastic.io --&amp;gt; elastic.co)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9409&quot;&gt;FLINK-9409&lt;/a&gt;] -         Remove flink-avro and flink-json from /opt
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9456&quot;&gt;FLINK-9456&lt;/a&gt;] -         Let ResourceManager notify JobManager about failed/killed TaskManagers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9508&quot;&gt;FLINK-9508&lt;/a&gt;] -         General Spell Check on Flink Docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9517&quot;&gt;FLINK-9517&lt;/a&gt;] -         Fixing broken links on CLI and Upgrade Docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9518&quot;&gt;FLINK-9518&lt;/a&gt;] -         SSL setup Docs config example has wrong keys password 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9549&quot;&gt;FLINK-9549&lt;/a&gt;] -         Fix FlickCEP Docs broken link and minor style changes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9573&quot;&gt;FLINK-9573&lt;/a&gt;] -         Check for leadership with leader session id
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9594&quot;&gt;FLINK-9594&lt;/a&gt;] -         Add documentation for e2e test changes introduced with FLINK-9257
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9595&quot;&gt;FLINK-9595&lt;/a&gt;] -         Add instructions to docs about ceased support of KPL version used in Kinesis connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9638&quot;&gt;FLINK-9638&lt;/a&gt;] -         Add helper script to run single e2e test
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9672&quot;&gt;FLINK-9672&lt;/a&gt;] -         Fail fatally if we cannot submit job on added JobGraph signal
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9707&quot;&gt;FLINK-9707&lt;/a&gt;] -         LocalFileSystem does not support concurrent directory creations
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9729&quot;&gt;FLINK-9729&lt;/a&gt;] -         Duplicate lines for &amp;quot;Weekday name (Sunday .. Saturday)&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9734&quot;&gt;FLINK-9734&lt;/a&gt;] -         Typo &amp;#39;field-deleimiter&amp;#39; in SQL client docs
&lt;/li&gt;
&lt;/ul&gt;

</description>
<pubDate>Thu, 12 Jul 2018 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2018/07/12/release-1.5.1.html</link>
<guid isPermaLink="true">/news/2018/07/12/release-1.5.1.html</guid>
</item>

<item>
<title>Apache Flink 1.5.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is thrilled to announce the 1.5.0 release. Over the past 5 months, the Flink community has been working hard to resolve more than 780 issues. Please check the &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12341764&amp;amp;projectId=12315522&quot;&gt;complete changelog&lt;/a&gt; for more detail.&lt;/p&gt;

&lt;p&gt;Flink 1.5.0 is the sixth major release in the 1.x.y series. As usual, it is API-compatible with previous 1.x.y releases for APIs annotated with the &lt;code&gt;@Public&lt;/code&gt; annotation.&lt;/p&gt;

&lt;p&gt;We encourage everyone to &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;download the release&lt;/a&gt; and check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.5/&quot;&gt;documentation&lt;/a&gt;.
Feedback through the Flink &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; or &lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;JIRA&lt;/a&gt; is, as always, very much appreciated!&lt;/p&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt; on the Flink project site.&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#flink-15---streaming-evolved&quot; id=&quot;markdown-toc-flink-15---streaming-evolved&quot;&gt;Flink 1.5 - Streaming Evolved&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#rewrite-of-flinks-deployment-and-process-model&quot; id=&quot;markdown-toc-rewrite-of-flinks-deployment-and-process-model&quot;&gt;Rewrite of Flink’s Deployment and Process Model&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#broadcast-state&quot; id=&quot;markdown-toc-broadcast-state&quot;&gt;Broadcast State&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#improvements-to-flinks-network-stack&quot; id=&quot;markdown-toc-improvements-to-flinks-network-stack&quot;&gt;Improvements to Flink’s Network Stack&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#task-local-state-recovery&quot; id=&quot;markdown-toc-task-local-state-recovery&quot;&gt;Task-Local State Recovery&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#extending-join-support-for-sql-and-table-api&quot; id=&quot;markdown-toc-extending-join-support-for-sql-and-table-api&quot;&gt;Extending Join Support for SQL and Table API&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#sql-cli-client&quot; id=&quot;markdown-toc-sql-cli-client&quot;&gt;SQL CLI Client&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#various-other-features-and-improvements&quot; id=&quot;markdown-toc-various-other-features-and-improvements&quot;&gt;Various Other Features and Improvements&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#release-notes&quot; id=&quot;markdown-toc-release-notes&quot;&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h2 id=&quot;flink-15---streaming-evolved&quot;&gt;Flink 1.5 - Streaming Evolved&lt;/h2&gt;

&lt;p&gt;We believe that the field of stream processing, and Apache Flink with it, is taking another major leap at the moment. Stream processing is not just faster analytics and a more principled way of building fast continuous data pipelines. Stream processing is becoming a paradigm to build data-driven and data-intensive applications - it brings together data processing logic and application/business logic.&lt;/p&gt;

&lt;p&gt;To help users realize the potential of this change, we put a lot of effort in this release into reworking some fundamental pieces of Flink. We want Flink to feel natural to users who do data engineering and data processing, as well as to users who build data-driven or event-driven applications (and of course to those who combine both aspects in their applications). This is an ongoing journey, but here are the first steps along the way:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;We have &lt;strong&gt;redesigned and reimplemented large parts of Flink’s process model&lt;/strong&gt;. This effort has been tracked under the name &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot;&gt;FLIP-6&lt;/a&gt;. Although not everything is complete yet, the changes in Flink 1.5 enable more natural Kubernetes deployments and switch to HTTP/REST for all external communication (to naturally interact with service proxies). At the same time, Flink 1.5 simplifies deployments on common cluster managers (YARN, Mesos) and features dynamic resource allocation.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Streaming &lt;strong&gt;broadcast state&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4940&quot;&gt;FLINK-4940&lt;/a&gt;) connects a broadcast stream (e.g., context data, machine learning models, rules/patterns, triggers, …) with other streams that may maintain (large) keyed state, such as feature vectors, state machines, etc. Prior to Flink 1.5, such use cases could not be easily built.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;To improve support for real-time applications with tight latency constraints, we made &lt;strong&gt;major improvements to Flink’s network stack&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7315&quot;&gt;FLINK-7315&lt;/a&gt;). Flink 1.5 achieves even lower latencies while maintaining a high throughput. In addition, we improved checkpoint stability under backpressure.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Streaming SQL is increasingly recognized as a simple and powerful way to perform streaming analytics, build data pipelines, do feature engineering, or incrementally keep applications updated on changing data. We added a &lt;strong&gt;SQL CLI for streaming SQL queries&lt;/strong&gt; (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client&quot;&gt;FLIP-24&lt;/a&gt;) to make this feature easier to get started with.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;

&lt;h3 id=&quot;rewrite-of-flinks-deployment-and-process-model&quot;&gt;Rewrite of Flink’s Deployment and Process Model&lt;/h3&gt;

&lt;p&gt;The rewrite of Flink’s deployment and process model (internally known as &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot;&gt;FLIP-6&lt;/a&gt;) has been in the works for more than a year and was a substantial effort from the Flink community. Many contributors from several organizations, such as data Artisans, Alibaba, and Dell EMC, collaborated on the design and implementation of this feature, which has been the most significant improvement of a Flink core component since the project’s inception.&lt;/p&gt;

&lt;p&gt;In a nutshell, the improvements add support for dynamic resource allocation and dynamic release of resources on YARN and Mesos schedulers for better resource utilization, failure recovery, and also dynamic scaling. Moreover, deployments on container management infrastructures like Kubernetes have been simplified and all requests to the JobManager now happen through REST. This includes job submission, cancellation, requesting job status, taking a savepoint, and so on.&lt;/p&gt;

&lt;p&gt;The work also builds the foundation for future improvements of Flink’s integration with Kubernetes. In a later version it will be possible to dockerize jobs and deploy them in a natural way as part of the container deployment, i.e., without starting a Flink cluster first. In addition, the work is a big step towards support for applications that are able to automatically adjust their parallelism.&lt;/p&gt;

&lt;p&gt;Note that Flink’s programming APIs are not affected by these improvements.&lt;/p&gt;

&lt;h3 id=&quot;broadcast-state&quot;&gt;Broadcast State&lt;/h3&gt;

&lt;p&gt;Support for broadcast state, i.e., state that is replicated across all parallel instances of a function, has been a frequently requested feature. Typical use cases for broadcast state involve two streams: a control or configuration stream that serves rules, patterns, or other configuration messages, and a regular data stream. The processing of the regular stream is configured by the messages of the control stream. By broadcasting rules or patterns to all parallel instances of a function, they can be applied to all events of the regular stream.&lt;/p&gt;

&lt;p&gt;Of course, broadcast state can be checkpointed and restored just like any other state in Flink, with exactly-once state consistency guarantees. Moreover, broadcast state unblocks the implementation of the “dynamic patterns” feature for Flink’s CEP library.&lt;/p&gt;
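
&lt;p&gt;To make the idea more concrete, the following is a minimal plain-Python simulation of the pattern described above. It does not use the Flink API; the class &lt;code&gt;ParallelInstance&lt;/code&gt;, its methods, and the rule format are hypothetical names chosen for illustration only.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Illustrative simulation only; this is not the Flink API.
class ParallelInstance:
    """One parallel instance of a function that holds broadcast state."""

    def __init__(self, name):
        self.name = name
        self.rules = {}  # broadcast state: identical on every instance

    def on_control_message(self, rule_name, banned_word):
        # Every instance receives every message of the control stream.
        self.rules[rule_name] = banned_word

    def on_event(self, event):
        # Each instance sees only its share of the regular data stream.
        matched = sorted(r for r, word in self.rules.items() if word in event)
        return (self.name, event, matched)


instances = [ParallelInstance("task-0"), ParallelInstance("task-1")]

# The control stream is broadcast to all parallel instances.
for instance in instances:
    instance.on_control_message("no-spam", "spam")

# The regular stream is partitioned across the instances.
events = ["hello", "buy spam now"]
results = [instances[i % 2].on_event(e) for i, e in enumerate(events)]&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Because both instances hold the same rules, the result for an event does not depend on which instance happens to process it.&lt;/p&gt;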

&lt;h3 id=&quot;improvements-to-flinks-network-stack&quot;&gt;Improvements to Flink’s Network Stack&lt;/h3&gt;

&lt;p&gt;The performance of a distributed streaming application heavily depends on the component that transfers events from one operator to another via a network connection. In the context of stream processing, two performance metrics, latency and throughput, are important.&lt;/p&gt;

&lt;p&gt;For Flink 1.5, the community worked on two efforts to improve Flink’s network stack: credit-based flow control and improved transfer latency. Credit-based flow control reduces the amount of data “on the wire” to a minimum while preserving high throughput. This significantly reduces the time to complete a checkpoint in back pressure situations. Moreover, Flink is now able to achieve much lower latencies without a reduction in throughput.&lt;/p&gt;
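
&lt;p&gt;The general idea behind credit-based flow control can be sketched in a few lines of plain Python. This is a simplified model under our own assumptions (one credit per free receiver buffer), not Flink’s actual network code; the &lt;code&gt;Sender&lt;/code&gt; and &lt;code&gt;Receiver&lt;/code&gt; classes are hypothetical.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from collections import deque


class Receiver:
    """Hypothetical receiver that grants one credit per free buffer."""

    def __init__(self, buffers):
        self.free_buffers = buffers
        self.queue = deque()

    def announce_credit(self):
        # Reserve all free buffers and announce them to the sender as credit.
        credit = self.free_buffers
        self.free_buffers = 0
        return credit

    def receive(self, record):
        self.queue.append(record)

    def process_one(self):
        # Processing a record frees its buffer again.
        self.queue.popleft()
        self.free_buffers += 1


class Sender:
    """Hypothetical sender that only transmits while it holds credit."""

    def __init__(self, records):
        self.backlog = deque(records)
        self.credit = 0

    def send(self, receiver):
        sent = 0
        while self.credit != 0 and self.backlog:
            receiver.receive(self.backlog.popleft())
            self.credit -= 1
            sent += 1
        return sent


receiver = Receiver(buffers=2)
sender = Sender(["a", "b", "c"])

sender.credit += receiver.announce_credit()
first = sender.send(receiver)    # "c" stays with the sender, not on the wire

receiver.process_one()           # frees one buffer
sender.credit += receiver.announce_credit()
second = sender.send(receiver)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In this model, data that the receiver cannot buffer is never sent, which is the property that keeps the amount of in-flight data small under back pressure.&lt;/p&gt;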

&lt;h3 id=&quot;task-local-state-recovery&quot;&gt;Task-Local State Recovery&lt;/h3&gt;

&lt;p&gt;Flink’s checkpointing mechanism writes copies of an application’s state to remote, persistent storage and loads them back in case of a failure. This mechanism ensures that state is not lost when an application fails. However, in case of a failure, it might take a while to load the state from the remote storage to recover the application.&lt;/p&gt;

&lt;p&gt;Improving the checkpointing and recovery efficiency is an ongoing effort in the Flink community. Prominent features of previous releases were asynchronous and incremental checkpointing. In this release, we improved the efficiency of failure recovery.&lt;/p&gt;

&lt;p&gt;Task-local state recovery leverages the fact that a job typically fails due to a single crashed operator, TaskManager, or machine. When writing the state of operators to the remote storage, Flink can now also keep a copy on the local disk of each machine. In case of failover, the scheduler tries to reschedule tasks to their previous machine and load the state from the local disk instead of the remote storage, resulting in faster recovery.&lt;/p&gt;
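
&lt;p&gt;The recovery preference described above can be illustrated with a small plain-Python sketch. The dictionaries stand in for the remote storage and a machine’s local disk, and the function names are hypothetical; this is not Flink’s implementation.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;def write_checkpoint(state, remote_store, local_store):
    """Write the state to remote storage and keep a local copy as well."""
    remote_store["latest"] = dict(state)  # copy that survives machine loss
    local_store["latest"] = dict(state)   # additional copy on the local disk


def restore(remote_store, local_store):
    """Prefer the local copy; fall back to the remote storage."""
    if "latest" in local_store:
        return dict(local_store["latest"]), "local"
    return dict(remote_store["latest"]), "remote"


remote, local_disk = {}, {}
write_checkpoint({"count": 42}, remote, local_disk)

# Task rescheduled to its previous machine: the local copy is available.
same_machine = restore(remote, local_disk)

# Task rescheduled to a different machine: only the remote copy exists.
other_machine = restore(remote, {})&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Both paths restore the same state; only the source of the data, and therefore the recovery time, differs.&lt;/p&gt;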

&lt;h3 id=&quot;extending-join-support-for-sql-and-table-api&quot;&gt;Extending Join Support for SQL and Table API&lt;/h3&gt;

&lt;p&gt;With the 1.5.0 release, Flink adds support for windowed outer equi-joins. Queries like the one shown below allow for joining of tables on bounded time ranges in both event-time and processing-time.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rideId&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;departureTime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arrivalTime&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Departures&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OUTER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Arrivals&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rideId&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rideId&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arrivalTime&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BETWEEN&lt;/span&gt; 
      &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;departureTime&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;departureTime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;2&amp;#39;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;HOUR&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For cases where two streaming tables should not be joined within a bounded time interval, Flink SQL also now supports non-windowed inner joins. This enables full-history matching, which is common in many standard SQL statements.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;address&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;productId&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;amount&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Users&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Orders&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;userId&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;userId&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
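
&lt;p&gt;Conceptually, a non-windowed join has to remember the rows of both inputs so that a row arriving later can still find its matches. The following plain-Python sketch illustrates this under simplifying assumptions (append-only inputs, in-memory state); the function names are hypothetical and this is not how Flink implements the operator.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from collections import defaultdict

# Join state: all rows seen so far on each side, grouped by the join key.
users_by_id = defaultdict(list)
orders_by_id = defaultdict(list)
output = []


def on_user(user_id, name):
    users_by_id[user_id].append(name)
    # Join the new user row against every order seen so far.
    for product in orders_by_id[user_id]:
        output.append((name, product))


def on_order(user_id, product):
    orders_by_id[user_id].append(product)
    # Join the new order row against every user seen so far.
    for name in users_by_id[user_id]:
        output.append((name, product))


on_order(1, "book")   # no matching user yet, but the row is kept in state
on_user(1, "Alice")   # matches the earlier order
on_order(1, "pen")    # matches the earlier user
on_user(2, "Bob")     # no orders for this user&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Since no time bound limits which rows can still match, the state in this sketch only grows, which is worth keeping in mind when choosing between windowed and non-windowed joins.&lt;/p&gt;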

&lt;h3 id=&quot;sql-cli-client&quot;&gt;SQL CLI Client&lt;/h3&gt;

&lt;p&gt;A few months ago, the community started an effort to add a service to execute streaming and batch SQL queries (FLIP-24). The new SQL CLI client is the first step of this effort and provides a SQL shell to run exploratory queries on data streams. The animation below shows a preview of this feature.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/sql_client_demo.gif&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;h3 id=&quot;various-other-features-and-improvements&quot;&gt;Various Other Features and Improvements&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.openstack.org/&quot;&gt;OpenStack&lt;/a&gt; provides software for creating public and private clouds on pools of resources. Flink now supports OpenStack’s S3-like file system, Swift, for checkpoint and savepoint storage. Swift can be used without Hadoop dependencies.&lt;/li&gt;
  &lt;li&gt;Reading and writing JSON messages from and to connectors has been improved. It’s now possible to parse a standard JSON schema in order to configure serializers and deserializers. The SQL CLI Client is able to read JSON records from Kafka.&lt;/li&gt;
  &lt;li&gt;Applications can be rescaled without manually triggering a savepoint. Under the hood, Flink will still take a savepoint, stop the application, and rescale it to the new parallelism.&lt;/li&gt;
  &lt;li&gt;Improved metrics for watermarks and latency. Flink now reports the minimum watermark in all operators, including sources. Moreover, the latency metrics were reworked for better integration with common metrics systems.&lt;/li&gt;
  &lt;li&gt;The &lt;code&gt;FileInputFormat&lt;/code&gt; (and many derived input formats) now supports reading files from multiple paths.&lt;/li&gt;
  &lt;li&gt;The &lt;code&gt;BucketingSink&lt;/code&gt; supports the specification of custom extensions for multiple parts.&lt;/li&gt;
  &lt;li&gt;The &lt;code&gt;CassandraOutputFormat&lt;/code&gt; can be used to emit &lt;code&gt;Row&lt;/code&gt; objects.&lt;/li&gt;
  &lt;li&gt;The Kinesis consumer allows for more customization.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;release-notes&quot;&gt;Release Notes&lt;/h2&gt;

&lt;p&gt;Please review the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.5/release-notes/flink-1.5.html&quot;&gt;release notes&lt;/a&gt; if you plan to upgrade your Flink setup to Flink 1.5.&lt;/p&gt;

&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;

&lt;p&gt;According to git shortlog, the following 106 people contributed to the 1.5.0 release. Thanks to all contributors!&lt;/p&gt;

&lt;p&gt;Aegeaner, Alejandro Alcalde, Aljoscha Krettek, Andreas Fink, Andrey Zagrebin, Ankit Parashar, Arunan Sugunakumar, Bartłomiej Tartanus, Bowen Li, Cristian, Dan Kelley, David Anderson, Dawid Wysakowicz, Dian Fu, Dmitrii_Kniazev, Dyana Rose, EAlexRojas, Eron Wright, Fabian Hueske, Florian Schmidt, Gabor Gevay, Greg Hogan, Gyula Fora, Jark Wu, Jelmer Kuperus, Joerg Schad, John Eismeier, Kailash HD, Ken Geis, Ken Krugler, Kent Murra, Leonid Ishimnikov, Malcolm Taylor, Matrix42, Michael Fong, Michael Gendelman, Moser Thomas W, Nico Kruber, PJ Fanning, Patrick Lucas, Pavel Shvetsov, Phetsarath, Sourigna, Philip Luppens, Piotr Nowojski, Qiu Congxian/klion26, Razvan, Robert Metzger, Rong Rong, Shuyi Chen, Stefan Richter, Stephan Ewen, Stephen Parente, Steven Langbroek, Thomas Weise, Till Rohrmann, Timo Walther, Tony Wei, Tzu-Li (Gordon) Tai, Ufuk Celebi, Vetriselvan1187, Xingcan Cui, Xpray, Yazdan.JS, Zhijiang, Zohar Mizrahi, aria, biao.liub, binlijin, davidxdh, eastcirclek, eskabetxe, gyao, hequn8128, hzyuqi1, ifndef-SleePy, jparkie, juhoautio, kkloudas, maqingxiang-it, maxbelov, mayyamus, mingleiZhang, neoremind, nichuanlei, okumin, shankarganesh1234, shuai.xus, sihuazhou, summerleafs, sunjincheng121, triones.deng, twalthr, uybhatti, vinoyang, wenlong.lwl, yanghua, yew1eb, yuemeng, zentol, zhangminglei, zhouhai02, zjureel, 军长, 金竹, 王振涛, 陈梓立&lt;/p&gt;
</description>
<pubDate>Fri, 25 May 2018 12:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2018/05/25/release-1.5.0.html</link>
<guid isPermaLink="true">/news/2018/05/25/release-1.5.0.html</guid>
</item>

<item>
<title>Apache Flink 1.3.3 Released</title>
<description>&lt;p&gt;The Apache Flink community released the third bugfix version of the Apache Flink 1.3 series.&lt;/p&gt;

&lt;p&gt;This release includes 4 critical fixes related to checkpointing and recovery. The list below details all fixes.&lt;/p&gt;

&lt;p&gt;We highly recommend that all Flink 1.3 series users upgrade to Flink 1.3.3.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7783&quot;&gt;FLINK-7783&lt;/a&gt;] -         Don&amp;#39;t always remove checkpoints in ZooKeeperCompletedCheckpointStore#recover()
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7283&quot;&gt;FLINK-7283&lt;/a&gt;] -         PythonPlanBinderTest issues with python paths
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8487&quot;&gt;FLINK-8487&lt;/a&gt;] -         State loss after multiple restart attempts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8807&quot;&gt;FLINK-8807&lt;/a&gt;] -         ZookeeperCompleted checkpoint store can get stuck in infinite loop
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8890&quot;&gt;FLINK-8890&lt;/a&gt;] -         Compare checkpoints with order in CompletedCheckpoint.checkpointsMatch()
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 15 Mar 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/03/15/release-1.3.3.html</link>
<guid isPermaLink="true">/news/2018/03/15/release-1.3.3.html</guid>
</item>

<item>
<title>Apache Flink 1.4.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released the second bugfix version of the Apache Flink 1.4 series.&lt;/p&gt;

&lt;p&gt;This release includes more than 10 fixes and minor improvements for Flink 1.4.1. The list below details all fixes.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.4.2.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.4.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.4.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.4.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6321&quot;&gt;FLINK-6321&lt;/a&gt;] -         RocksDB state backend Checkpointing is not working with KeyedCEP.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7756&quot;&gt;FLINK-7756&lt;/a&gt;] -         RocksDB state backend Checkpointing (Async and Incremental)  is not working with CEP.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8423&quot;&gt;FLINK-8423&lt;/a&gt;] -         OperatorChain#pushToOperator catch block may fail with NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8451&quot;&gt;FLINK-8451&lt;/a&gt;] -         CaseClassSerializer is not backwards compatible in 1.4
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8520&quot;&gt;FLINK-8520&lt;/a&gt;] -         CassandraConnectorITCase.testCassandraTableSink unstable on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8621&quot;&gt;FLINK-8621&lt;/a&gt;] -         PrometheusReporterTest.endpointIsUnavailableAfterReporterIsClosed unstable on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8692&quot;&gt;FLINK-8692&lt;/a&gt;] -         Mistake in MyMapFunction code snippet
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8735&quot;&gt;FLINK-8735&lt;/a&gt;] -         Add savepoint migration ITCase that covers operator state
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8741&quot;&gt;FLINK-8741&lt;/a&gt;] -         KafkaFetcher09/010/011 uses wrong user code classloader
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8772&quot;&gt;FLINK-8772&lt;/a&gt;] -         FlinkKafkaConsumerBase partitions discover missing a log parameter
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8791&quot;&gt;FLINK-8791&lt;/a&gt;] -         Fix documentation on how to link dependencies
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8798&quot;&gt;FLINK-8798&lt;/a&gt;] -         Make commons-logging a parent-first pattern
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8849&quot;&gt;FLINK-8849&lt;/a&gt;] -         Wrong link from concepts/runtime to doc on chaining
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8202&quot;&gt;FLINK-8202&lt;/a&gt;] -         Update queryable section on configuration page
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8574&quot;&gt;FLINK-8574&lt;/a&gt;] -         Add timestamps to travis logging messages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8576&quot;&gt;FLINK-8576&lt;/a&gt;] -         Log message for QueryableState loading failure too verbose
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8652&quot;&gt;FLINK-8652&lt;/a&gt;] -         Reduce log level of QueryableStateClient.getKvState() to DEBUG
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8308&quot;&gt;FLINK-8308&lt;/a&gt;] -         Update yajl-ruby dependency to 1.3.1 or higher
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 08 Mar 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/03/08/release-1.4.2.html</link>
<guid isPermaLink="true">/news/2018/03/08/release-1.4.2.html</guid>
</item>

<item>
<title>An Overview of End-to-End Exactly-Once Processing in Apache Flink (with Apache Kafka, too!)</title>
<description>&lt;p&gt;&lt;em&gt;This post is an adaptation of &lt;a href=&quot;https://berlin.flink-forward.org/kb_sessions/hit-me-baby-just-one-time-building-end-to-end-exactly-once-applications-with-flink/&quot;&gt;Piotr Nowojski’s presentation from Flink Forward Berlin 2017&lt;/a&gt;. You can find the slides and a recording of the presentation on the Flink Forward Berlin website.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Apache Flink 1.4.0, released in December 2017, introduced a significant milestone for stream processing with Flink: a new feature called &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7210&quot;&gt;relevant Jira here&lt;/a&gt;) that extracts the common logic of the two-phase commit protocol and makes it possible to build end-to-end exactly-once applications with Flink and a selection of data sources and sinks, including Apache Kafka versions 0.11 and beyond. It provides a layer of abstraction and requires a user to implement only a handful of methods to achieve end-to-end exactly-once semantics.&lt;/p&gt;

&lt;p&gt;If that’s all you need to hear, let us point you &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/apache/flink/streaming/api/functions/sink/TwoPhaseCommitSinkFunction.html&quot;&gt;to the relevant place in the Flink documentation&lt;/a&gt;, where you can read about how to put &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; to use.&lt;/p&gt;

&lt;p&gt;But if you’d like to learn more, in this post, we’ll share an in-depth overview of the new feature and what is happening behind the scenes in Flink.&lt;/p&gt;

&lt;p&gt;Throughout the rest of this post, we’ll:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Describe the role of Flink’s checkpoints for guaranteeing exactly-once results within a Flink application.&lt;/li&gt;
  &lt;li&gt;Show how Flink interacts with data sources and data sinks via the two-phase commit protocol to deliver &lt;em&gt;end-to-end&lt;/em&gt; exactly-once guarantees.&lt;/li&gt;
  &lt;li&gt;Walk through a simple example on how to use &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; to implement an exactly-once file sink.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;exactly-once-semantics-within-an-apache-flink-application&quot;&gt;Exactly-once Semantics Within an Apache Flink Application&lt;/h2&gt;

&lt;p&gt;When we say “exactly-once semantics”, what we mean is that each incoming event affects the final results exactly once. Even in case of a machine or software failure, there’s no duplicate data and no data that goes unprocessed.&lt;/p&gt;

&lt;p&gt;Flink has long provided exactly-once semantics &lt;em&gt;within&lt;/em&gt; a Flink application. Over the past few years, we’ve &lt;a href=&quot;https://data-artisans.com/blog/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink&quot;&gt;written in depth about Flink’s checkpointing&lt;/a&gt;, which is at the core of Flink’s ability to provide exactly-once semantics. The Flink documentation also &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html&quot;&gt;provides a thorough overview of the feature&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Before we continue, here’s a quick summary of the checkpointing algorithm because understanding checkpoints is necessary for understanding this broader topic.&lt;/p&gt;

&lt;p&gt;A checkpoint in Flink is a consistent snapshot of:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The current state of an application&lt;/li&gt;
  &lt;li&gt;The position in an input stream&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Flink generates checkpoints at a regular, configurable interval and then writes the checkpoint to a persistent storage system, such as S3 or HDFS. Writing the checkpoint data to the persistent storage happens asynchronously, which means that a Flink application continues to process data during the checkpointing process.&lt;/p&gt;

&lt;p&gt;In the event of a machine or software failure and upon restart, a Flink application resumes processing from the most recent successfully-completed checkpoint; Flink restores application state and rolls back to the correct position in the input stream from a checkpoint before processing starts again. This means that Flink computes results as though the failure never occurred.&lt;/p&gt;
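
&lt;p&gt;The effect of restoring both the state and the input position can be shown with a small plain-Python simulation. It is a toy model (a single operator summing a list of numbers, synchronous checkpoints), and the function &lt;code&gt;run&lt;/code&gt; and its parameters are hypothetical names used only for this illustration.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import copy

events = [3, 1, 4, 1, 5, 9]


def run(events, checkpoint_every, fail_at=None):
    """Sum all events, optionally simulating one failure at offset fail_at."""
    state = {"sum": 0}
    offset = 0
    # A checkpoint is a consistent pair of (state, input position).
    checkpoint = {"state": copy.deepcopy(state), "offset": offset}
    failed = False
    while offset != len(events):
        if fail_at == offset and not failed:
            # Simulated failure: restore state and rewind the input.
            failed = True
            state = copy.deepcopy(checkpoint["state"])
            offset = checkpoint["offset"]
            continue
        state["sum"] += events[offset]
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = {"state": copy.deepcopy(state), "offset": offset}
    return state["sum"]


without_failure = run(events, checkpoint_every=2)
with_failure = run(events, checkpoint_every=2, fail_at=3)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In the failing run, the event at offset 2 is processed twice, but because the state is rolled back together with the input position, it is counted only once in the final result.&lt;/p&gt;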

&lt;p&gt;Before Flink 1.4.0, exactly-once semantics were limited to the scope of &lt;em&gt;a Flink application only&lt;/em&gt; and did not extend to most of the external systems to which Flink sends data after processing.&lt;/p&gt;

&lt;p&gt;But Flink applications operate in conjunction with a wide range of data sinks, and developers should be able to maintain exactly-once semantics beyond the context of one component.&lt;/p&gt;

&lt;p&gt;To provide &lt;em&gt;end-to-end exactly-once&lt;/em&gt; semantics (that is, semantics that also apply to the external systems that Flink writes to, in addition to the state of the Flink application), these external systems must provide a means to commit or roll back writes in coordination with Flink’s checkpoints.&lt;/p&gt;

&lt;p&gt;One common approach for coordinating commits and rollbacks in a distributed system is the &lt;a href=&quot;https://en.wikipedia.org/wiki/Two-phase_commit_protocol&quot;&gt;two-phase commit protocol&lt;/a&gt;. In the next section, we’ll go behind the scenes and discuss how Flink’s &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; utilizes the two-phase commit protocol to provide end-to-end exactly-once semantics.&lt;/p&gt;

&lt;h2 id=&quot;end-to-end-exactly-once-applications-with-apache-flink&quot;&gt;End-to-end Exactly-once Applications with Apache Flink&lt;/h2&gt;

&lt;p&gt;We’ll walk through the two-phase commit protocol and how it enables end-to-end exactly-once semantics in a sample Flink application that reads from and writes to Kafka. Kafka is a popular messaging system to use along with Flink, and Kafka recently added support for transactions with its 0.11 release. &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#kafka-011&quot;&gt;This means that Flink now has the necessary mechanism to provide end-to-end exactly-once semantics&lt;/a&gt; in applications when receiving data from and writing data to Kafka.&lt;/p&gt;

&lt;p&gt;Flink’s support for end-to-end exactly-once semantics is not limited to Kafka and you can use it with any source / sink that provides the necessary coordination mechanism. For example, &lt;a href=&quot;http://pravega.io/&quot;&gt;Pravega&lt;/a&gt;, an open-source streaming storage system from Dell/EMC, also supports end-to-end exactly-once semantics with Flink via the &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt;.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/eo-post-graphic-1.png&quot; width=&quot;600px&quot; alt=&quot;A sample Flink application&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;In the sample Flink application that we’ll discuss today, we have:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;A data source that reads from Kafka (in Flink, a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#kafka-consumer&quot;&gt;KafkaConsumer&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;A windowed aggregation&lt;/li&gt;
  &lt;li&gt;A data sink that writes data back to Kafka (in Flink, a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#kafka-producer&quot;&gt;KafkaProducer&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For the data sink to provide exactly-once guarantees, it must write all data to Kafka within the scope of a transaction. A commit bundles all writes between two checkpoints.&lt;/p&gt;

&lt;p&gt;This ensures that writes are rolled back in case of a failure.&lt;/p&gt;

&lt;p&gt;However, in a distributed system with multiple, concurrently-running sink tasks, a simple commit or rollback is not sufficient, because all of the components must “agree” together on committing or rolling back to ensure a consistent result. Flink uses the two-phase commit protocol and its pre-commit phase to address this challenge.&lt;/p&gt;

&lt;p&gt;The start of a checkpoint represents the “pre-commit” phase of our two-phase commit protocol. When a checkpoint starts, the Flink JobManager injects a checkpoint barrier (which separates the records in the data stream into the set that goes into the current checkpoint vs. the set that goes into the next checkpoint) into the data stream.&lt;/p&gt;

&lt;p&gt;The barrier is passed from operator to operator. For every operator, it triggers the operator’s state backend to take a snapshot of its state.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/eo-post-graphic-2.png&quot; width=&quot;600px&quot; alt=&quot;A sample Flink application - precommit&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The data source stores its Kafka offsets, and after completing this, it passes the checkpoint barrier to the next operator.&lt;/p&gt;
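
&lt;p&gt;The barrier handling described above can be sketched for a single operator in plain Python. This is a simplified illustration with one input and a running sum as state; the &lt;code&gt;BARRIER&lt;/code&gt; marker and the function &lt;code&gt;run_operator&lt;/code&gt; are hypothetical and not part of Flink.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;BARRIER = "barrier"


def run_operator(stream):
    """Process records; on a barrier, snapshot the state and pass it on."""
    running_sum = 0
    snapshots = []
    forwarded = []
    for element in stream:
        if element == BARRIER:
            # Snapshot exactly the effect of all records before the barrier.
            snapshots.append(running_sum)
            # Then hand the barrier to the next operator.
            forwarded.append(BARRIER)
        else:
            running_sum += element
            forwarded.append(running_sum)
    return snapshots, forwarded


snapshots, forwarded = run_operator([1, 2, BARRIER, 3, BARRIER])&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each snapshot reflects the records that arrived before the corresponding barrier and none of the records after it, which is what makes the snapshots of different operators consistent with each other.&lt;/p&gt;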

&lt;p&gt;This approach works if an operator has internal state &lt;em&gt;only&lt;/em&gt;. &lt;em&gt;Internal state&lt;/em&gt; is everything that is stored and managed by Flink’s state backends (for example, the windowed sums in the second operator). When a process has only internal state, there is no need to perform any additional action during pre-commit aside from updating the data in the state backends before it is checkpointed. Flink takes care of correctly committing those writes in case of checkpoint success or aborting them in case of failure.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/eo-post-graphic-3.png&quot; width=&quot;600px&quot; alt=&quot;A sample Flink application - precommit without external state&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;However, when a process has &lt;em&gt;external&lt;/em&gt; state, this state must be handled a bit differently. External state usually comes in the form of writes to an external system such as Kafka. In that case, to provide exactly-once guarantees, the external system must provide support for transactions that integrates with a two-phase commit protocol.&lt;/p&gt;

&lt;p&gt;We know that the data sink in our example has such external state because it’s writing data to Kafka. In this case, in the pre-commit phase, the data sink must pre-commit its external transaction in addition to writing its state to the state backend.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/eo-post-graphic-4.png&quot; width=&quot;600px&quot; alt=&quot;A sample Flink application - precommit with external state&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The pre-commit phase finishes when the checkpoint barrier has passed through all of the operators and the triggered snapshot callbacks have completed. At this point, the checkpoint has completed successfully and consists of the state of the entire application, including pre-committed external state. In case of a failure, we would re-initialize the application from this checkpoint.&lt;/p&gt;

&lt;p&gt;The next step is to notify all operators that the checkpoint has succeeded. This is the commit phase of the two-phase commit protocol, in which the JobManager issues checkpoint-completed callbacks for every operator in the application. The data source and window operator have no external state, so they don’t have to take any action in the commit phase. The data sink does have external state, though, and commits the transaction with the external writes.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/eo-post-graphic-5.png&quot; width=&quot;600px&quot; alt=&quot;A sample Flink application - commit external state&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;So let’s put all of these different pieces together:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Once all of the operators complete their pre-commit, they issue a commit.&lt;/li&gt;
  &lt;li&gt;If at least one pre-commit fails, all others are aborted, and we roll back to the previous successfully-completed checkpoint.&lt;/li&gt;
  &lt;li&gt;After a successful pre-commit, the commit &lt;em&gt;must&lt;/em&gt; be guaranteed to eventually succeed – both our operators and our external system need to make this guarantee. If a commit fails (for example, due to an intermittent network issue), the entire Flink application fails, restarts according to the user’s restart strategy, and there is another commit attempt. This process is critical because if the commit does not eventually succeed, data loss occurs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Therefore, we can be sure that all operators agree on the final outcome of the checkpoint: all operators agree that the data is either committed or that the commit is aborted and rolled back.&lt;/p&gt;
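
&lt;p&gt;The agreement rule summarized above can be sketched in a few lines of plain Java. This is a minimal illustration only, not Flink’s implementation: the &lt;code&gt;Participant&lt;/code&gt; interface, the &lt;code&gt;FixedParticipant&lt;/code&gt; helper, and the &lt;code&gt;runCheckpoint&lt;/code&gt; method are hypothetical names introduced for this example, and the retrying of a failed commit is left out.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Hypothetical sketch of the two-phase commit decision rule; not Flink API.
interface Participant {
    boolean preCommit(); // phase 1: returns false if the pre-commit failed
    void commit();       // phase 2: must eventually succeed after a pre-commit
    void abort();        // roll back anything that was pre-committed
}

// Helper for the example: a participant whose pre-commit outcome is fixed up front.
class FixedParticipant implements Participant {
    final boolean preCommitSucceeds;
    String outcome = "pending";

    FixedParticipant(boolean preCommitSucceeds) {
        this.preCommitSucceeds = preCommitSucceeds;
    }

    public boolean preCommit() { return preCommitSucceeds; }
    public void commit() { outcome = "committed"; }
    public void abort() { outcome = "aborted"; }
}

class TwoPhaseCommitSketch {
    // Returns true if every participant committed, false if all were aborted.
    static boolean runCheckpoint(Participant[] participants) {
        boolean allPreCommitted = true;
        for (Participant p : participants) {
            if (!p.preCommit()) {
                allPreCommitted = false;
                break; // one failed pre-commit is enough to abort everything
            }
        }
        for (Participant p : participants) {
            if (allPreCommitted) {
                p.commit();
            } else {
                p.abort();
            }
        }
        return allPreCommitted;
    }

    public static void main(String[] args) {
        Participant[] ok = { new FixedParticipant(true), new FixedParticipant(true) };
        System.out.println(runCheckpoint(ok));
        Participant[] bad = { new FixedParticipant(true), new FixedParticipant(false) };
        System.out.println(runCheckpoint(bad));
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The sketch makes the decision in one place on purpose: no participant commits until every pre-commit has been observed, which is the role the completed checkpoint plays in Flink.&lt;/p&gt;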

&lt;h2 id=&quot;implementing-the-two-phase-commit-operator-in-flink&quot;&gt;Implementing the Two-Phase Commit Operator in Flink&lt;/h2&gt;

&lt;p&gt;The logic required to put a two-phase commit protocol together can be a little complicated, which is why Flink extracts the common logic of the protocol into the abstract &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; class.&lt;/p&gt;

&lt;p&gt;Let’s discuss how to extend a &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; using a simple file-based example. We need to implement only four methods; here is what each of them does for an exactly-once file sink:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;code&gt;beginTransaction&lt;/code&gt; - to begin the transaction, we create a temporary file in a temporary directory on our destination file system. Subsequently, we can write data to this file as we process it.&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;preCommit&lt;/code&gt; - on pre-commit, we flush the file, close it, and never write to it again. We’ll also start a new transaction for any subsequent writes that belong to the next checkpoint.&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;commit&lt;/code&gt; - on commit, we atomically move the pre-committed file to the actual destination directory. Please note that this increases the latency in the visibility of the output data.&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;abort&lt;/code&gt; - on abort, we delete the temporary file.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As we know, if there’s any failure, Flink restores the state of the application to the latest successful checkpoint. One potential catch is the rare case in which the failure occurs after a successful pre-commit but before notification of that fact (a commit) reaches our operator. In that case, Flink restores our operator to the state that has already been pre-committed but not yet committed.&lt;/p&gt;

&lt;p&gt;We must save enough information about pre-committed transactions in checkpointed state to be able to either &lt;code&gt;abort&lt;/code&gt; or &lt;code&gt;commit&lt;/code&gt; transactions after a restart. In our example, this would be the path to the temporary file and target directory.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; takes this scenario into account, and it always issues a preemptive commit when restoring state from a checkpoint. It is our responsibility to implement a commit in an idempotent way. Generally, this shouldn’t be an issue. In our example, we can recognize such a situation: the temporary file is not in the temporary directory, but has already been moved to the target directory.&lt;/p&gt;
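
&lt;p&gt;The four steps and the idempotent commit described above can be sketched with plain &lt;code&gt;java.nio.file&lt;/code&gt; calls. This is a minimal sketch under stated assumptions, not the Flink API: the &lt;code&gt;FileTransactionSketch&lt;/code&gt; class and its method names are hypothetical, a real sink would extend &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; instead, and the temporary and target directories are assumed to be on the same file system so that an atomic move is possible.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical file transaction; names are illustrative and not part of Flink.
class FileTransactionSketch {
    final Path tempFile;   // where in-flight data is written
    final Path targetFile; // where committed data becomes visible
    private Writer writer;

    // beginTransaction: create a temporary file and open it for writing.
    FileTransactionSketch(Path tempDir, Path targetDir, String name) throws IOException {
        this.tempFile = Files.createTempFile(tempDir, name, ".inprogress");
        this.targetFile = targetDir.resolve(name);
        this.writer = Files.newBufferedWriter(tempFile, StandardCharsets.UTF_8);
    }

    void write(String record) throws IOException {
        writer.write(record);
        writer.write(System.lineSeparator());
    }

    // preCommit: flush and close the file; nothing is written to it afterwards.
    void preCommit() throws IOException {
        writer.close(); // close() flushes buffered data first
        writer = null;
    }

    // commit: atomically move the file. Idempotent: if the temporary file is
    // already gone, an earlier commit attempt moved it, so there is nothing to do.
    void commit() throws IOException {
        if (Files.exists(tempFile)) {
            Files.move(tempFile, targetFile, StandardCopyOption.ATOMIC_MOVE);
        }
    }

    // abort: discard the temporary file.
    void abort() throws IOException {
        if (writer != null) {
            writer.close();
            writer = null;
        }
        Files.deleteIfExists(tempFile);
    }

    public static void main(String[] args) throws IOException {
        Path tempDir = Files.createTempDirectory("txn-tmp");
        Path targetDir = Files.createTempDirectory("txn-out");
        FileTransactionSketch txn = new FileTransactionSketch(tempDir, targetDir, "part-0");
        txn.write("hello");
        txn.preCommit();
        txn.commit();
        txn.commit(); // a repeated commit, as after a restart, is a no-op
        System.out.println(Files.readAllLines(txn.targetFile));
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The two paths held by the object are exactly the information that would have to be stored in checkpointed state so that a restarted operator can still commit or abort the transaction.&lt;/p&gt;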

&lt;p&gt;There are a handful of other edge cases that &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; takes into account, too. &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/apache/flink/streaming/api/functions/sink/TwoPhaseCommitSinkFunction.html&quot;&gt;Learn more in the Flink documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping Up&lt;/h2&gt;

&lt;p&gt;If you’ve made it this far, thanks for staying with us through a detailed post. Here are some key points that we covered:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Flink’s checkpointing system serves as Flink’s basis for supporting a two-phase commit protocol and providing end-to-end exactly-once semantics.&lt;/li&gt;
  &lt;li&gt;An advantage of this approach is that Flink does not materialize data in transit the way that some other systems do–there’s no need to write every stage of the computation to disk as is the case in most batch processing.&lt;/li&gt;
  &lt;li&gt;Flink’s new &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; extracts the common logic of the two-phase commit protocol and makes it possible to build end-to-end exactly-once applications with Flink and external systems that support transactions.&lt;/li&gt;
  &lt;li&gt;Starting with &lt;a href=&quot;https://data-artisans.com/blog/announcing-the-apache-flink-1-4-0-release&quot;&gt;Flink 1.4.0&lt;/a&gt;, both the Pravega and Kafka 0.11 producers provide exactly-once semantics; Kafka introduced transactions for the first time in Kafka 0.11, which is what made the Kafka exactly-once producer possible in Flink.&lt;/li&gt;
  &lt;li&gt;The &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#kafka-011&quot;&gt;Kafka 0.11 producer&lt;/a&gt; is implemented on top of the &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt;, and it offers very low overhead compared to the at-least-once Kafka producer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We’re very excited about what this new feature enables, and we look forward to being able to support additional producers with the &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; in the future.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post &lt;a href=&quot;https://data-artisans.com/blog/end-to-end-exactly-once-processing-apache-flink-apache-kafka&quot; target=&quot;_blank&quot;&gt;first appeared on the data Artisans blog&lt;/a&gt; and was contributed to Apache Flink and the Flink blog by the original authors Piotr Nowojski and Mike Winters.&lt;/em&gt;&lt;/p&gt;
&lt;link rel=&quot;canonical&quot; href=&quot;https://data-artisans.com/blog/end-to-end-exactly-once-processing-apache-flink-apache-kafka&quot; /&gt;

</description>
<pubDate>Thu, 01 Mar 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html</link>
<guid isPermaLink="true">/features/2018/03/01/end-to-end-exactly-once-apache-flink.html</guid>
</item>

<item>
<title>Apache Flink 1.4.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.4 series.&lt;/p&gt;

&lt;p&gt;This release includes more than 60 fixes and minor improvements for Flink 1.4.0. A detailed list of all fixes follows below.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.4.1.&lt;/p&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.4.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.4.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.4.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;        Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6321&quot;&gt;FLINK-6321&lt;/a&gt;] -         RocksDB state backend Checkpointing is not working with KeyedCEP.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7499&quot;&gt;FLINK-7499&lt;/a&gt;] -         double buffer release in SpillableSubpartitionView
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7756&quot;&gt;FLINK-7756&lt;/a&gt;] -         RocksDB state backend Checkpointing (Async and Incremental)  is not working with CEP.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7760&quot;&gt;FLINK-7760&lt;/a&gt;] -         Restore failing from external checkpointing metadata.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8323&quot;&gt;FLINK-8323&lt;/a&gt;] -         Fix Mod scala function bug
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5506&quot;&gt;FLINK-5506&lt;/a&gt;] -         Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6951&quot;&gt;FLINK-6951&lt;/a&gt;] -         Incompatible versions of httpcomponents jars for Flink kinesis connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7949&quot;&gt;FLINK-7949&lt;/a&gt;] -         AsyncWaitOperator is not restarting when queue is full
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8145&quot;&gt;FLINK-8145&lt;/a&gt;] -         IOManagerAsync not properly shut down in various tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8200&quot;&gt;FLINK-8200&lt;/a&gt;] -         RocksDBAsyncSnapshotTest should use temp fold instead of fold with fixed name
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8226&quot;&gt;FLINK-8226&lt;/a&gt;] -         Dangling reference generated after NFA clean up timed out SharedBufferEntry
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8230&quot;&gt;FLINK-8230&lt;/a&gt;] -         NPE in OrcRowInputFormat on nested structs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8235&quot;&gt;FLINK-8235&lt;/a&gt;] -         Cannot run spotbugs for single module
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8242&quot;&gt;FLINK-8242&lt;/a&gt;] -         ClassCastException in OrcTableSource.toOrcPredicate
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8248&quot;&gt;FLINK-8248&lt;/a&gt;] -         RocksDB state backend Checkpointing is not working with KeyedCEP in 1.4
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8249&quot;&gt;FLINK-8249&lt;/a&gt;] -         Kinesis Producer didnt configure region
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8261&quot;&gt;FLINK-8261&lt;/a&gt;] -         Typos in the shading exclusion for jsr305 in the quickstarts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8263&quot;&gt;FLINK-8263&lt;/a&gt;] -         Wrong packaging of flink-core in scala quickstarty
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8265&quot;&gt;FLINK-8265&lt;/a&gt;] -         Missing jackson dependency for flink-mesos
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8270&quot;&gt;FLINK-8270&lt;/a&gt;] -         TaskManagers do not use correct local path for shipped Keytab files in Yarn deployment modes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8275&quot;&gt;FLINK-8275&lt;/a&gt;] -         Flink YARN deployment with Kerberos enabled not working 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8278&quot;&gt;FLINK-8278&lt;/a&gt;] -         Scala examples in Metric documentation do not compile
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8283&quot;&gt;FLINK-8283&lt;/a&gt;] -         FlinkKafkaConsumerBase failing on Travis with no output in 10min
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8295&quot;&gt;FLINK-8295&lt;/a&gt;] -         Netty shading does not work properly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8306&quot;&gt;FLINK-8306&lt;/a&gt;] -         FlinkKafkaConsumerBaseTest has invalid mocks on final methods
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8318&quot;&gt;FLINK-8318&lt;/a&gt;] -         Conflict jackson library with ElasticSearch connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8325&quot;&gt;FLINK-8325&lt;/a&gt;] -         Add COUNT AGG support constant parameter, i.e. COUNT(*), COUNT(1) 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8352&quot;&gt;FLINK-8352&lt;/a&gt;] -         Flink UI Reports No Error on Job Submission Failures
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8355&quot;&gt;FLINK-8355&lt;/a&gt;] -         DataSet Should not union a NULL row for AGG without GROUP BY clause.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8371&quot;&gt;FLINK-8371&lt;/a&gt;] -         Buffers are not recycled in a non-spilled SpillableSubpartition upon release
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8398&quot;&gt;FLINK-8398&lt;/a&gt;] -         Stabilize flaky KinesisDataFetcherTests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8406&quot;&gt;FLINK-8406&lt;/a&gt;] -         BucketingSink does not detect hadoop file systems
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8409&quot;&gt;FLINK-8409&lt;/a&gt;] -         Race condition in KafkaConsumerThread leads to potential NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8419&quot;&gt;FLINK-8419&lt;/a&gt;] -         Kafka consumer&amp;#39;s offset metrics are not registered for dynamically discovered partitions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8421&quot;&gt;FLINK-8421&lt;/a&gt;] -         HeapInternalTimerService should reconfigure compatible key / namespace serializers on restore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8433&quot;&gt;FLINK-8433&lt;/a&gt;] -         Update code example for &amp;quot;Managed Operator State&amp;quot; documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8461&quot;&gt;FLINK-8461&lt;/a&gt;] -         Wrong logger configurations for shaded Netty
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8466&quot;&gt;FLINK-8466&lt;/a&gt;] -         ErrorInfo needs to hold Exception as SerializedThrowable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8484&quot;&gt;FLINK-8484&lt;/a&gt;] -         Kinesis consumer re-reads closed shards on job restart
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8485&quot;&gt;FLINK-8485&lt;/a&gt;] -         Running Flink inside Intellij no longer works after upgrading from 1.3.2 to 1.4.0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8489&quot;&gt;FLINK-8489&lt;/a&gt;] -         Data is not emitted by second ElasticSearch connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8496&quot;&gt;FLINK-8496&lt;/a&gt;] -         WebUI does not display TM MemorySegment metrics
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8499&quot;&gt;FLINK-8499&lt;/a&gt;] -         Kryo must not be child-first loaded
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8522&quot;&gt;FLINK-8522&lt;/a&gt;] -         DefaultOperatorStateBackend writes data in checkpoint that is never read.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8559&quot;&gt;FLINK-8559&lt;/a&gt;] -         Exceptions in RocksDBIncrementalSnapshotOperation#takeSnapshot cause job to get stuck
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8561&quot;&gt;FLINK-8561&lt;/a&gt;] -         SharedBuffer line 573 uses == to compare BufferEntries instead of .equals.
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8079&quot;&gt;FLINK-8079&lt;/a&gt;] -         Skip remaining E2E tests if one failed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8202&quot;&gt;FLINK-8202&lt;/a&gt;] -         Update queryable section on configuration page
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8243&quot;&gt;FLINK-8243&lt;/a&gt;] -         OrcTableSource should recursively read all files in nested directories of the input path.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8260&quot;&gt;FLINK-8260&lt;/a&gt;] -         Document API of Kafka 0.11 Producer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8264&quot;&gt;FLINK-8264&lt;/a&gt;] -         Add Scala to the parent-first loading patterns
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8271&quot;&gt;FLINK-8271&lt;/a&gt;] -         upgrade from deprecated classes to AmazonKinesis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8287&quot;&gt;FLINK-8287&lt;/a&gt;] -         Flink Kafka Producer docs should clearly state what partitioner is used by default
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8296&quot;&gt;FLINK-8296&lt;/a&gt;] -         Rework FlinkKafkaConsumerBestTest to not use Java reflection for dependency injection
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8346&quot;&gt;FLINK-8346&lt;/a&gt;] -         add S3 signature v4 workaround to docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8362&quot;&gt;FLINK-8362&lt;/a&gt;] -         Shade Elasticsearch dependencies away
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8455&quot;&gt;FLINK-8455&lt;/a&gt;] -         Add Hadoop to the parent-first loading patterns
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8473&quot;&gt;FLINK-8473&lt;/a&gt;] -         JarListHandler may fail with NPE if directory is deleted
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8571&quot;&gt;FLINK-8571&lt;/a&gt;] -         Provide an enhanced KeyedStream implementation to use ForwardPartitioner
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Test
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8472&quot;&gt;FLINK-8472&lt;/a&gt;] -         Extend migration tests for Flink 1.4
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 15 Feb 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/02/15/release-1.4.1.html</link>
<guid isPermaLink="true">/news/2018/02/15/release-1.4.1.html</guid>
</item>

<item>
<title>Managing Large State in Apache Flink: An Intro to Incremental Checkpointing</title>
<description>&lt;p&gt;Apache Flink was purpose-built for &lt;em&gt;stateful&lt;/em&gt; stream processing. However, what is state in a stream processing application? I defined state and stateful stream processing in a &lt;a href=&quot;http://flink.apache.org/features/2017/07/04/flink-rescalable-state.html&quot;&gt;previous blog post&lt;/a&gt;, and in case you need a refresher, &lt;em&gt;state is defined as memory in an application’s operators that stores information about previously-seen events that you can use to influence the processing of future events&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;State is a fundamental, enabling concept in stream processing required for a majority of complex use cases. Some examples highlighted in the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html&quot;&gt;Flink documentation&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;When an application searches for certain event patterns, the state stores the sequence of events encountered so far.&lt;/li&gt;
  &lt;li&gt;When aggregating events per minute, the state holds the pending aggregates.&lt;/li&gt;
  &lt;li&gt;When training a machine learning model over a stream of data points, the state holds the current version of the model parameters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;However, stateful stream processing is only useful in production environments if the state is fault tolerant. “Fault tolerance” means that even if there’s a software or machine failure, the computed end-result is accurate, with no data loss or double-counting of events.&lt;/p&gt;

&lt;p&gt;Flink’s fault tolerance has always been a powerful and popular feature, minimizing the impact of software or machine failure on your business and making it possible to guarantee exactly-once results from a Flink application.&lt;/p&gt;

&lt;p&gt;Core to this is &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/checkpointing.html&quot;&gt;checkpointing&lt;/a&gt;, which is the mechanism Flink uses to make application state fault tolerant. A checkpoint in Flink is a global, asynchronous snapshot of application state that’s taken on a regular interval and sent to durable storage (usually, a distributed file system). In the event of a failure, Flink restarts an application using the most recently completed checkpoint as a starting point. Some Apache Flink users run applications with gigabytes or even terabytes of application state. These users reported that with such large state, creating a checkpoint was often a slow and resource intensive operation, which is why in Flink 1.3 we introduced ‘incremental checkpointing.’&lt;/p&gt;

&lt;p&gt;Before incremental checkpointing, every single Flink checkpoint consisted of the full state of an application. We created the incremental checkpointing feature after we noticed that writing the full state for every checkpoint was often unnecessary, as the state changes from one checkpoint to the next were rarely that large. Incremental checkpointing instead maintains the differences (or ‘delta’) between each checkpoint and stores only the differences between the last checkpoint and the current state.&lt;/p&gt;

&lt;p&gt;Incremental checkpoints can provide a significant performance improvement for jobs with a very large state. Early testing of the feature by a production user with terabytes of state shows a drop in checkpoint time from more than 3 minutes down to 30 seconds after implementing incremental checkpoints. This is because the checkpoint doesn’t need to transfer the full state to durable storage on each checkpoint.&lt;/p&gt;

&lt;h3 id=&quot;how-to-start&quot;&gt;How to Start&lt;/h3&gt;

&lt;p&gt;Currently, you can only use incremental checkpointing with a RocksDB state back-end, and Flink uses RocksDB’s internal backup mechanism to consolidate checkpoint data over time. As a result, the incremental checkpoint history in Flink does not grow indefinitely, and Flink eventually consumes and prunes old checkpoints automatically.&lt;/p&gt;

&lt;p&gt;To enable incremental checkpointing in your application, I recommend you read &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/large_state_tuning.html#tuning-rocksdb&quot;&gt;the Apache Flink documentation on checkpointing&lt;/a&gt; for full details. In summary, you enable checkpointing as normal and turn on incremental checkpointing by setting the second parameter of the &lt;code&gt;RocksDBStateBackend&lt;/code&gt; constructor to &lt;code&gt;true&lt;/code&gt;.&lt;/p&gt;

&lt;h4 id=&quot;java-example&quot;&gt;Java Example&lt;/h4&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setStateBackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;RocksDBStateBackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filebackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id=&quot;scala-example&quot;&gt;Scala Example&lt;/h4&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;setStateBackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;RocksDBStateBackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filebackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;By default, Flink retains 1 completed checkpoint, so if you need a higher number, &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/state/checkpointing.html#related-config-options&quot;&gt;you can configure it with the following flag&lt;/a&gt;:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;checkpoints&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;num&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;retained&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id=&quot;how-it-works&quot;&gt;How it Works&lt;/h3&gt;

&lt;p&gt;Flink’s incremental checkpointing uses &lt;a href=&quot;https://github.com/facebook/rocksdb/wiki/Checkpoints&quot;&gt;RocksDB checkpoints&lt;/a&gt; as a foundation. RocksDB is a key-value store based on ‘&lt;a href=&quot;https://en.wikipedia.org/wiki/Log-structured_merge-tree&quot;&gt;log-structured-merge&lt;/a&gt;’ (LSM) trees that collects all changes in a mutable (changeable) in-memory buffer called a ‘memtable’. Any updates to the same key in the memtable replace previous values, and once the memtable is full, RocksDB writes it to disk with all entries sorted by their key and with light compression applied. Once RocksDB writes the memtable to disk it is immutable (unchangeable) and is now called a ‘sorted-string-table’ (sstable).&lt;/p&gt;

&lt;p&gt;A ‘compaction’ background task merges sstables to consolidate potential duplicates for each key, and over time RocksDB deletes the original sstables, with the merged sstable containing all information from across all the other sstables.&lt;/p&gt;

&lt;p&gt;On top of this, Flink tracks which sstable files RocksDB has created and deleted since the previous checkpoint, and as the sstables are immutable, Flink uses this to figure out the state changes. To do this, Flink triggers a flush in RocksDB, forcing all memtables into sstables on disk, and hard-links them in a local temporary directory. This step is synchronous with the processing pipeline; Flink performs all further steps asynchronously, so they do not block processing.&lt;/p&gt;

&lt;p&gt;Then Flink copies all new sstables to stable storage (e.g., HDFS, S3) to reference in the new checkpoint. Flink doesn’t copy sstables that already existed in the previous checkpoint to stable storage again but re-references them. New checkpoints no longer reference deleted files, because deleted sstables in RocksDB are always the result of compaction, which eventually replaces old tables with an sstable that is the result of a merge. This is how Flink’s incremental checkpoints can prune the checkpoint history.&lt;/p&gt;
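
&lt;p&gt;As a rough illustration of the copy-or-re-reference decision described above, the following Python sketch splits the sstables found in the local RocksDB directory into files to upload and files to re-reference. It is a simplified model under stated assumptions, not Flink’s actual implementation: the function name &lt;code&gt;plan_incremental_upload&lt;/code&gt; and the file names are hypothetical and chosen only for illustration.&lt;/p&gt;

```python
def plan_incremental_upload(local_sstables, previous_checkpoint):
    """Split local sstable names into files to upload and files to re-reference.

    Hypothetical helper: it models the decision only, not any real Flink API.
    """
    local = set(local_sstables)
    previous = set(previous_checkpoint)
    # Files that did not exist in the previous checkpoint must be copied.
    to_upload = sorted(local.difference(previous))
    # Files already in stable storage are only referenced again.
    to_reuse = sorted(local.intersection(previous))
    return to_upload, to_reuse


upload, reuse = plan_incremental_upload(
    ['sstable-(1,2,3)', 'sstable-(4)', 'sstable-(5)'],
    ['sstable-(1)', 'sstable-(2)', 'sstable-(3)', 'sstable-(4)'],
)
print('upload:', upload)
print('re-reference:', reuse)
```

&lt;p&gt;Because sstables are immutable, comparing file names is enough in this sketch; a file with a known name never needs to be uploaded a second time.&lt;/p&gt;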

&lt;p&gt;For tracking changes between checkpoints, the uploading of consolidated tables is redundant work. Flink performs the process incrementally, and typically adds only a small overhead, so we consider this worthwhile because it allows Flink to keep a shorter history of checkpoints to consider in a recovery.&lt;/p&gt;

&lt;h4 id=&quot;an-example&quot;&gt;An Example&lt;/h4&gt;

&lt;p&gt;&lt;img src=&quot;/img/blog/incremental_cp_impl_example.svg&quot; alt=&quot;Example setup&quot; /&gt;
&lt;em&gt;Example setup&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Take an example with a subtask of one operator that has a keyed state, and the number of retained checkpoints set at &lt;strong&gt;2&lt;/strong&gt;. The columns in the figure above show the state of the local RocksDB instance for each checkpoint, the files it references, and the counts in the shared state registry after the checkpoint completes.&lt;/p&gt;

&lt;p&gt;For checkpoint ‘CP 1’, the local RocksDB directory contains two sstable files. Flink considers these new and uploads them to stable storage using directory names that match the checkpoint name. When the checkpoint completes, Flink creates the two entries in the shared state registry and sets their counts to ‘1’. The key in the shared state registry is a composite of an operator, subtask, and the original sstable file name. The registry also keeps a mapping from the key to the file path in stable storage.&lt;/p&gt;

&lt;p&gt;For checkpoint ‘CP 2’, RocksDB has created two new sstable files, and the two older ones still exist. For checkpoint ‘CP 2’, Flink adds the two new files to stable storage and can reference the previous two files. When the checkpoint completes, Flink increases the counts for all referenced files by 1.&lt;/p&gt;

&lt;p&gt;For checkpoint ‘CP 3’, RocksDB’s compaction has merged &lt;code&gt;sstable-(1)&lt;/code&gt;, &lt;code&gt;sstable-(2)&lt;/code&gt;, and &lt;code&gt;sstable-(3)&lt;/code&gt; into &lt;code&gt;sstable-(1,2,3)&lt;/code&gt; and deleted the original files. This merged file contains the same information as the source files, with all duplicate entries eliminated. In addition to this merged file, &lt;code&gt;sstable-(4)&lt;/code&gt; still exists and there is now a new &lt;code&gt;sstable-(5)&lt;/code&gt; file. Flink adds the new &lt;code&gt;sstable-(1,2,3)&lt;/code&gt; and &lt;code&gt;sstable-(5)&lt;/code&gt; files to stable storage, re-references &lt;code&gt;sstable-(4)&lt;/code&gt; from checkpoint ‘CP 2’, and increases the counts for all referenced files by 1. The older ‘CP 1’ checkpoint is now deleted as the number of retained checkpoints (2) has been reached. As part of this deletion, Flink decreases the counts for all files referenced by ‘CP 1’ (&lt;code&gt;sstable-(1)&lt;/code&gt; and &lt;code&gt;sstable-(2)&lt;/code&gt;) by 1.&lt;/p&gt;

&lt;p&gt;For checkpoint ‘CP 4’, RocksDB has merged &lt;code&gt;sstable-(4)&lt;/code&gt;, &lt;code&gt;sstable-(5)&lt;/code&gt;, and a new &lt;code&gt;sstable-(6)&lt;/code&gt; into &lt;code&gt;sstable-(4,5,6)&lt;/code&gt;. Flink adds this new table to stable storage and references it together with &lt;code&gt;sstable-(1,2,3)&lt;/code&gt;. It increases the counts for &lt;code&gt;sstable-(1,2,3)&lt;/code&gt; and &lt;code&gt;sstable-(4,5,6)&lt;/code&gt; by 1 and then deletes ‘CP 2’ as the number of retained checkpoints has been reached. As the counts for &lt;code&gt;sstable-(1)&lt;/code&gt;, &lt;code&gt;sstable-(2)&lt;/code&gt;, and &lt;code&gt;sstable-(3)&lt;/code&gt; have now dropped to 0, Flink deletes them from stable storage.&lt;/p&gt;
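
&lt;p&gt;The reference counting in this walkthrough can be replayed with a small Python model. This is a minimal sketch, assuming a registry keyed only by file name (the real key also includes the operator and subtask); the class &lt;code&gt;SharedStateRegistry&lt;/code&gt; below and its methods are hypothetical and do not mirror Flink’s internal classes.&lt;/p&gt;

```python
from collections import Counter, deque


class SharedStateRegistry:
    """Toy reference counter for sstable files shared between checkpoints."""

    def __init__(self, num_retained):
        self.num_retained = num_retained
        self.counts = Counter()   # file name mapped to its reference count
        self.retained = deque()   # (checkpoint name, files), oldest first
        self.deleted = []         # files removed from stable storage, in order

    def complete_checkpoint(self, name, files):
        # A completed checkpoint adds one reference to every file it uses.
        for f in files:
            self.counts[f] += 1
        self.retained.append((name, list(files)))
        # One checkpoint is added per call, so at most one has to be dropped.
        if len(self.retained) == self.num_retained + 1:
            _, old_files = self.retained.popleft()
            for f in old_files:
                self.counts[f] -= 1
                if self.counts[f] == 0:
                    # No retained checkpoint needs this file any more.
                    del self.counts[f]
                    self.deleted.append(f)


registry = SharedStateRegistry(num_retained=2)
registry.complete_checkpoint('CP 1', ['sstable-(1)', 'sstable-(2)'])
registry.complete_checkpoint(
    'CP 2', ['sstable-(1)', 'sstable-(2)', 'sstable-(3)', 'sstable-(4)'])
registry.complete_checkpoint(
    'CP 3', ['sstable-(1,2,3)', 'sstable-(4)', 'sstable-(5)'])
registry.complete_checkpoint('CP 4', ['sstable-(1,2,3)', 'sstable-(4,5,6)'])
print('deleted:', registry.deleted)
print('counts:', dict(registry.counts))
```

&lt;p&gt;In this model, nothing is deleted when ‘CP 1’ is dropped, because ‘CP 2’ still references both of its files. Only when ‘CP 2’ is dropped do the counts for &lt;code&gt;sstable-(1)&lt;/code&gt;, &lt;code&gt;sstable-(2)&lt;/code&gt;, and &lt;code&gt;sstable-(3)&lt;/code&gt; reach 0.&lt;/p&gt;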

&lt;h3 id=&quot;race-conditions-and-concurrent-checkpoints&quot;&gt;Race Conditions and Concurrent Checkpoints&lt;/h3&gt;

&lt;p&gt;As Flink can execute multiple checkpoints in parallel, sometimes new checkpoints start before previous checkpoints are confirmed as completed. Because of this, you should consider which previous checkpoint to use as a basis for a new incremental checkpoint. Flink only references state from a checkpoint confirmed by the checkpoint coordinator so that it doesn’t unintentionally reference a deleted shared file.&lt;/p&gt;

&lt;h3 id=&quot;restoring-checkpoints-and-performance-considerations&quot;&gt;Restoring Checkpoints and Performance Considerations&lt;/h3&gt;

&lt;p&gt;If you enable incremental checkpointing, there are no further configuration steps needed to recover your state in case of failure. If a failure occurs, Flink’s &lt;code&gt;JobManager&lt;/code&gt; tells all tasks to restore from the last completed checkpoint, be it a full or incremental checkpoint. Each &lt;code&gt;TaskManager&lt;/code&gt; then downloads its share of the state from the checkpoint on the distributed file system.&lt;/p&gt;

&lt;p&gt;Though the feature can lead to a substantial improvement in checkpoint time for users with a large state, there are trade-offs to consider with incremental checkpointing. Overall, the process reduces the checkpointing time during normal operations but can lead to a longer recovery time depending on the size of your state. If the cluster failure is particularly severe and the Flink &lt;code&gt;TaskManager&lt;/code&gt;s have to read from multiple checkpoints, recovery can be a slower operation than when using non-incremental checkpointing. You can also no longer delete old checkpoints as newer checkpoints need them, and the history of differences between checkpoints can grow indefinitely over time. You need to plan for larger distributed storage to maintain the checkpoints and the network overhead to read from it.&lt;/p&gt;

&lt;p&gt;There are some strategies for improving the convenience/performance trade-off, and I recommend you read &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html#basics-of-incremental-checkpoints&quot;&gt;the Flink documentation&lt;/a&gt; for more details.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This post &lt;a href=&quot;https://data-artisans.com/blog/managing-large-state-apache-flink-incremental-checkpointing-overview&quot; target=&quot;_blank&quot;&gt;originally appeared on the data Artisans blog&lt;/a&gt; and was contributed to the Flink blog by Stefan Richter and Chris Ward.&lt;/em&gt;&lt;/p&gt;
&lt;link rel=&quot;canonical&quot; href=&quot;https://data-artisans.com/blog/managing-large-state-apache-flink-incremental-checkpointing-overview&quot; /&gt;

</description>
<pubDate>Tue, 30 Jan 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html</link>
<guid isPermaLink="true">/features/2018/01/30/incremental-checkpointing.html</guid>
</item>

<item>
<title>Apache Flink in 2017: Year in Review</title>
<description>&lt;p&gt;2017 was another exciting year for the Apache Flink® community, with 3 major version releases (&lt;a href=&quot;http://flink.apache.org/news/2017/02/06/release-1.2.0.html&quot;&gt;Flink 1.2.0 in February&lt;/a&gt;, &lt;a href=&quot;http://flink.apache.org/news/2017/06/01/release-1.3.0.html&quot;&gt;Flink 1.3.0 in June&lt;/a&gt;, and &lt;a href=&quot;http://flink.apache.org/news/2017/12/12/release-1.4.0.html&quot;&gt;Flink 1.4.0 in December&lt;/a&gt;) and the first-ever &lt;a href=&quot;https://sf-2017.flink-forward.org/&quot;&gt;Flink Forward in San Francisco&lt;/a&gt;, giving Flink community members in another corner of the globe an opportunity to connect. Users shared details about their innovative production deployments, redefining what is possible with a modern stream processing framework like Flink.&lt;/p&gt;

&lt;p&gt;In this post, we’ll look back on the project’s progress over the course of 2017, and we’ll also preview what 2018 has in store.&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#community-growth&quot; id=&quot;markdown-toc-community-growth&quot;&gt;Community Growth&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#github&quot; id=&quot;markdown-toc-github&quot;&gt;GitHub&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#meetups&quot; id=&quot;markdown-toc-meetups&quot;&gt;Meetups&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#flink-forward-2017&quot; id=&quot;markdown-toc-flink-forward-2017&quot;&gt;Flink Forward 2017&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#features-and-ecosystem&quot; id=&quot;markdown-toc-features-and-ecosystem&quot;&gt;Features and Ecosystem&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#flink-ecosystem-growth&quot; id=&quot;markdown-toc-flink-ecosystem-growth&quot;&gt;Flink Ecosystem Growth&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#feature-timeline-in-2017&quot; id=&quot;markdown-toc-feature-timeline-in-2017&quot;&gt;Feature Timeline in 2017&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#looking-ahead-to-2018&quot; id=&quot;markdown-toc-looking-ahead-to-2018&quot;&gt;Looking ahead to 2018&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h2 id=&quot;community-growth&quot;&gt;Community Growth&lt;/h2&gt;

&lt;h3 id=&quot;github&quot;&gt;GitHub&lt;/h3&gt;

&lt;p&gt;First, here’s a summary of community statistics from &lt;a href=&quot;https://github.com/apache/flink&quot;&gt;GitHub&lt;/a&gt;. At the time of writing:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Contributors&lt;/strong&gt; have increased from 258 in December 2016 to 352 in December 2017 (up &lt;strong&gt;36%&lt;/strong&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Stars&lt;/strong&gt; have increased from 1830 in December 2016 to 3036 in December 2017 (up &lt;strong&gt;65%&lt;/strong&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Forks&lt;/strong&gt; have increased from 1255 in December 2016 to 2070 in December 2017 (up &lt;strong&gt;65%&lt;/strong&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The community also welcomed &lt;strong&gt;10 new committers in 2017&lt;/strong&gt;: Kostas Kloudas, Jark Wu, Stefan Richter, Kurt Young, Theodore Vasiloudis, Xiaogang Shi, Dawid Wysakowicz, Shaoxuan Wang, Jincheng Sun and Haohui Mai.&lt;/p&gt;

&lt;p&gt;We also welcomed &lt;strong&gt;3 new members to the &lt;a href=&quot;http://www.apache.org/foundation/governance/pmcs.html&quot;&gt;project management committee (PMC)&lt;/a&gt;&lt;/strong&gt;: Greg Hogan, Tzu-Li (Gordon) Tai and Chesnay Schepler.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/blog/github-stats-2017.png&quot; alt=&quot;Apache Flink GitHub Stats&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Next, let’s take a look at a few other project stats, starting with number of commits. If we run:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;git log --pretty&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;oneline --after&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;12/31/2016 &lt;span class=&quot;p&quot;&gt;|&lt;/span&gt; wc -l&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Inside the Flink repository, we’ll see a total of &lt;strong&gt;2316&lt;/strong&gt; commits so far in 2017, bringing the all-time total commits to &lt;strong&gt;12,532&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now, let’s go a bit deeper. Here are instructions to take a look at this data yourself.&lt;/p&gt;

&lt;p&gt;Download and install gitstats from the &lt;a href=&quot;http://gitstats.sourceforge.net/&quot;&gt;project homepage&lt;/a&gt;, then clone the Apache Flink git repository:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;git clone git@github.com:apache/flink.git&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Generate the statistics:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;gitstats flink/ flink-stats/&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;View all the statistics as an HTML page using your default browser:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;open flink-stats/index.html&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Flink surpassed 1 million lines of code in 2016, and that trend continued in 2017 with the code base now clocking in at &lt;strong&gt;1,257,949&lt;/strong&gt; lines.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/blog/flink-lines-of-code-2017.png&quot; alt=&quot;Flink Total Lines of Code&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Monday remains the day of the week with the most commits over the project’s history, but Wednesday is catching up:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/blog/flink-dow-2017.png&quot; alt=&quot;Flink Commits by Day of Week&quot; /&gt;&lt;/p&gt;

&lt;p&gt;5 pm remains the preferred commit time, closely followed by 4 pm:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/blog/flink-hod-2017.png&quot; alt=&quot;Flink Commits by Hour of Day&quot; /&gt;&lt;/p&gt;

&lt;h3 id=&quot;meetups&quot;&gt;Meetups&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://www.meetup.com/topics/apache-flink/&quot;&gt;Apache Flink Meetup membership&lt;/a&gt; grew by &lt;strong&gt;20%&lt;/strong&gt; this year to a total of &lt;strong&gt;19,767&lt;/strong&gt; members at &lt;strong&gt;39&lt;/strong&gt; meetups listing Flink as a topic. With meetups on five continents, the Flink community is proud to be truly global.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/blog/flink-meetups-dec2017.png&quot; alt=&quot;Apache Flink Meetup Map&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;flink-forward-2017&quot;&gt;Flink Forward 2017&lt;/h2&gt;

&lt;p&gt;2017 was the first year we ran a Flink Forward conference in both &lt;a href=&quot;https://berlin-2017.flink-forward.org&quot;&gt;Berlin&lt;/a&gt; (September 11-13) and &lt;a href=&quot;https://sf-2017.flink-forward.org&quot;&gt;San Francisco&lt;/a&gt; (April 10-11), and over 350 members of our community attended each event for speaker sessions, training, and discussion about Flink.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://www.slideshare.net/FlinkForward/presentations&quot;&gt;Slides&lt;/a&gt; and &lt;a href=&quot;https://www.youtube.com/channel/UCY8_lgiZLZErZPF47a2hXMA&quot;&gt;videos&lt;/a&gt; are available for all speaker sessions, and if you’re interested in learning more about how organizations use Flink in production, we encourage you to browse and watch a couple.&lt;/p&gt;

&lt;p&gt;For 2018, Flink Forward will be back in &lt;a href=&quot;https://flink-forward.org/&quot;&gt;September in Berlin&lt;/a&gt;, and in &lt;a href=&quot;https://sf-2018.flink-forward.org/&quot;&gt;April in San Francisco&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/blog/speaker-logos-ff2017.png&quot; alt=&quot;Flink Forward Speakers&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;features-and-ecosystem&quot;&gt;Features and Ecosystem&lt;/h2&gt;

&lt;h3 id=&quot;flink-ecosystem-growth&quot;&gt;Flink Ecosystem Growth&lt;/h3&gt;

&lt;p&gt;Flink was added to a selection of distributions and integrations during 2017, making it easier for a wider user base to get started with Flink:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://hub.docker.com/r/_/flink/&quot;&gt;Official Docker image&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/mesos.html&quot;&gt;Official DC/OS and Mesos support&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://data-artisans.com/blog/dellemc-launches-open-source-pravega-complete-apache-flink-connector&quot;&gt;A Flink connector&lt;/a&gt; for &lt;a href=&quot;http://pravega.io&quot;&gt;Pravega&lt;/a&gt;, Dell/EMC’s streaming storage system.&lt;/li&gt;
  &lt;li&gt;Uber announced AthenaX, a streaming SQL platform &lt;a href=&quot;https://data-artisans.com/blog/uber-introduces-open-source-athenax-streaming-sql-platform-apache-flink&quot;&gt;powered by Apache Flink&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;data Artisans announced an early access program of a SaaS product based on Apache Flink, &lt;a href=&quot;https://data-artisans.com/blog/da-platform-2-stateful-stream-processing-with-apache-flink-made-easier&quot;&gt;dA Platform 2&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;feature-timeline-in-2017&quot;&gt;Feature Timeline in 2017&lt;/h3&gt;

&lt;p&gt;Just in time for the end of the year, our 1.4 release (&lt;a href=&quot;http://flink.apache.org/news/2017/12/12/release-1.4.0.html&quot;&gt;read the full release announcement&lt;/a&gt;) landed in mid-December, culminating 5 months of work and the resolution of more than 900 issues. This is the fifth major release in the 1.x.y series.&lt;/p&gt;

&lt;p&gt;Here’s a selection of major features added to Flink over the course of 2017:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/blog/flink-releases-2017.png&quot; alt=&quot;Flink Release Timeline 2017&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If you take a look at &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5016?jql=project%20%3D%20FLINK%20AND%20issuetype%20in%20(Bug%2C%20Improvement%2C%20%22New%20Feature%22)%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20resolved%20%3E%3D%202017-01-01%20AND%20resolved%20%3C%3D%202017-12-31%20ORDER%20BY%20resolved%20ASC&quot;&gt;the resolved issues and enhancements for 2017 on Jira&lt;/a&gt; you can see that the community resolved over 1,831 issues and feature additions.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/news/2016/12/19/2016-year-in-review.html#looking-ahead-to-2017&quot;&gt;Regarding roadmap commitments from 2016&lt;/a&gt;, there is mixed news, with some items now part of current releases, others scheduled for upcoming releases, and some that remain under discussion.&lt;/p&gt;

&lt;h2 id=&quot;looking-ahead-to-2018&quot;&gt;Looking ahead to 2018&lt;/h2&gt;

&lt;p&gt;A good source of information about the Flink community’s roadmap is the list of &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals&quot;&gt;Flink Improvement Proposals (FLIPs)&lt;/a&gt; in the project wiki. Below, we’ll highlight a selection of FLIPs accepted by the community as well as some that are still under discussion.&lt;/p&gt;

&lt;p&gt;Work is already underway on a number of these features, and some will be included in Flink 1.5 at the beginning of 2018.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Improved BLOB storage architecture&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-19:+Improved+BLOB+storage+architecture&quot;&gt;FLIP-19&lt;/a&gt; to consolidate API usage and improve concurrency.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Integration of SQL and CEP&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-20:+Integration+of+SQL+and+CEP&quot;&gt;FLIP-20&lt;/a&gt; to allow developers to create complex event processing (CEP) patterns using SQL statements.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Unified checkpoints and savepoints&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-10:+Unify+Checkpoints+and+Savepoints&quot;&gt;FLIP-10&lt;/a&gt;, to allow savepoints to be triggered automatically. This matters for program updates and error handling, because savepoints allow the user to modify both the job and the Flink version, whereas checkpoints can only be recovered with the same job.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;An improved Flink deployment and process model&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot;&gt;FLIP-6&lt;/a&gt;, to allow for better integration of Flink with cluster managers and deployment technologies such as Mesos, Docker, and Kubernetes.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Fine-grained recovery from task failures&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+:+Fine+Grained+Recovery+from+Task+Failures&quot;&gt;FLIP-1&lt;/a&gt; to improve recovery efficiency and only re-execute failed tasks, reducing the amount of state that Flink needs to transfer on recovery.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;An SQL Client&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client&quot;&gt;FLIP-24&lt;/a&gt; to add a service and a client to execute SQL queries against batch and streaming tables.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Serving of machine learning models&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-23+-+Model+Serving&quot;&gt;FLIP-23&lt;/a&gt; to add a library that allows users to apply offline-trained machine learning models to data streams.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re interested in getting involved with Flink, we encourage you to take a look at the FLIPs and to join the discussion via the &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink mailing lists&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Lastly, we’d like to extend a sincere thank you to all the Flink community for making 2017 a great year!&lt;/p&gt;
</description>
<pubDate>Thu, 21 Dec 2017 10:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2017/12/21/2017-year-in-review.html</link>
<guid isPermaLink="true">/news/2017/12/21/2017-year-in-review.html</guid>
</item>

<item>
<title>Apache Flink 1.4.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is pleased to announce the 1.4.0 release. Over the past 5 months, the
Flink community has been working hard to resolve more than 900 issues. See the &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12340533&quot;&gt;complete changelog&lt;/a&gt;
for more detail.&lt;/p&gt;

&lt;p&gt;This is the fifth major release in the 1.x.y series. It is API-compatible with the other 1.x.y
releases for APIs annotated with the @Public annotation.&lt;/p&gt;

&lt;p&gt;We encourage everyone to download the release and check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Feedback through the &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink mailing lists&lt;/a&gt; is, as always, gladly encouraged!&lt;/p&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads&lt;/a&gt; page on the Flink project site.&lt;/p&gt;

&lt;p&gt;The release includes improvements to many different aspects of Flink, including:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The ability to build end-to-end exactly-once applications with Flink and popular data sources and sinks such as Apache Kafka.&lt;/li&gt;
  &lt;li&gt;A more developer-friendly dependency structure as well as Hadoop-free Flink for Flink users who do not have Hadoop dependencies.&lt;/li&gt;
  &lt;li&gt;Support for JOIN and for new sources and sinks in the Table API and SQL, expanding the range of logic that can be expressed with these APIs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A summary of some of the features in the release is available below.&lt;/p&gt;

&lt;p&gt;For more background on the Flink 1.4.0 release and the work planned for the Flink 1.5.0 release, please refer to &lt;a href=&quot;http://flink.apache.org/news/2017/11/22/release-1.4-and-1.5-timeline.html&quot;&gt;this blog post&lt;/a&gt; on the Apache Flink blog.&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#end-to-end-exactly-once-applications-with-apache-flink-and-apache-kafka-and-twophasecommitsinkfunction&quot; id=&quot;markdown-toc-end-to-end-exactly-once-applications-with-apache-flink-and-apache-kafka-and-twophasecommitsinkfunction&quot;&gt;End-to-end Exactly Once Applications with Apache Flink and Apache Kafka and TwoPhaseCommitSinkFunction&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#table-api-and-streaming-sql-enhancements&quot; id=&quot;markdown-toc-table-api-and-streaming-sql-enhancements&quot;&gt;Table API and Streaming SQL Enhancements&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#a-significantly-improved-dependency-structure-and-reversed-class-loading&quot; id=&quot;markdown-toc-a-significantly-improved-dependency-structure-and-reversed-class-loading&quot;&gt;A Significantly-Improved Dependency Structure and Reversed Class Loading&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#hadoop-free-flink&quot; id=&quot;markdown-toc-hadoop-free-flink&quot;&gt;Hadoop-free Flink&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#improvements-to-flink-internals&quot; id=&quot;markdown-toc-improvements-to-flink-internals&quot;&gt;Improvements to Flink Internals&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#improvements-to-the-queryable-state-client&quot; id=&quot;markdown-toc-improvements-to-the-queryable-state-client&quot;&gt;Improvements to the Queryable State Client&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#metrics-and-monitoring&quot; id=&quot;markdown-toc-metrics-and-monitoring&quot;&gt;Metrics and Monitoring&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#connector-improvements-and-fixes&quot; id=&quot;markdown-toc-connector-improvements-and-fixes&quot;&gt;Connector improvements and fixes&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#release-notes---please-read&quot; id=&quot;markdown-toc-release-notes---please-read&quot;&gt;Release Notes - Please Read&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#changes-to-dynamic-class-loading-of-user-code&quot; id=&quot;markdown-toc-changes-to-dynamic-class-loading-of-user-code&quot;&gt;Changes to dynamic class loading of user code&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#no-more-avro-dependency-included-by-default&quot; id=&quot;markdown-toc-no-more-avro-dependency-included-by-default&quot;&gt;No more Avro dependency included by default&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#hadoop-free-flink-1&quot; id=&quot;markdown-toc-hadoop-free-flink-1&quot;&gt;Hadoop-free Flink&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#bundled-s3-filesystems&quot; id=&quot;markdown-toc-bundled-s3-filesystems&quot;&gt;Bundled S3 FileSystems&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;

&lt;h3 id=&quot;end-to-end-exactly-once-applications-with-apache-flink-and-apache-kafka-and-twophasecommitsinkfunction&quot;&gt;End-to-end Exactly Once Applications with Apache Flink and Apache Kafka and TwoPhaseCommitSinkFunction&lt;/h3&gt;

&lt;p&gt;Flink 1.4 includes a first version of an exactly-once producer for Apache Kafka 0.11. This producer
enables developers who build Flink applications with Kafka as a data source and sink to compute
exactly-once results not just within the Flink program, but truly “end-to-end” in the application.&lt;/p&gt;

&lt;p&gt;The common pattern used for exactly-once applications in Kafka and in other sinks–the two-phase
commit algorithm–has been extracted in Flink 1.4.0 into a common class, the
TwoPhaseCommitSinkFunction (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7210&quot;&gt;FLINK-7210&lt;/a&gt;). This
will make it easier for users to create their own exactly-once data sinks in the future.&lt;/p&gt;

&lt;h3 id=&quot;table-api-and-streaming-sql-enhancements&quot;&gt;Table API and Streaming SQL Enhancements&lt;/h3&gt;

&lt;p&gt;Flink SQL now supports windowed joins based on processing time and event time
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5725&quot;&gt;FLINK-5725&lt;/a&gt;). Users will be able to execute a
join between 2 streaming tables and compute windowed results according to these 2 different concepts
of time. The syntax and semantics in Flink are the same as standard SQL with JOIN and with Flink’s
streaming SQL more broadly.&lt;/p&gt;

&lt;p&gt;Flink SQL also now supports “INSERT INTO SELECT” queries, which makes it possible to write results
from SQL directly into a data sink (an external system that receives data from a Flink application).
This improves operability and ease-of-use of Flink SQL.&lt;/p&gt;

&lt;p&gt;The Table API now supports aggregations on streaming tables; previously, the only supported
operations on streaming tables were projection, selection, and union
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4557&quot;&gt;FLINK-4557&lt;/a&gt;). This feature was initially discussed in Flink
Improvement Proposal 11: &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations&quot;&gt;FLIP-11&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The release also adds support for new Table API and SQL sources and sinks, including a Kafka 0.11
source and JDBC sink.&lt;/p&gt;

&lt;p&gt;Lastly, Flink SQL now uses Apache Calcite 1.14, which was just released in October 2017
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7051&quot;&gt;FLINK-7051&lt;/a&gt;).&lt;/p&gt;

&lt;h3 id=&quot;a-significantly-improved-dependency-structure-and-reversed-class-loading&quot;&gt;A Significantly-Improved Dependency Structure and Reversed Class Loading&lt;/h3&gt;

&lt;p&gt;Flink 1.4.0 shades a number of dependencies to avoid subtle runtime conflicts, including:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;ASM&lt;/li&gt;
  &lt;li&gt;Guava&lt;/li&gt;
  &lt;li&gt;Jackson&lt;/li&gt;
  &lt;li&gt;Netty&lt;/li&gt;
  &lt;li&gt;Apache Zookeeper&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These changes improve Flink’s overall stability and remove friction when embedding Flink or calling
Flink “library style”.&lt;/p&gt;

&lt;p&gt;The release also introduces default reversed (child-first) class loading for dynamically-loaded user
code, allowing for different dependencies than those included in the core framework.&lt;/p&gt;

&lt;p&gt;For details on those changes please check out the relevant Jira issues:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7442&quot;&gt;FLINK-7442&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6529&quot;&gt;FLINK-6529&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;hadoop-free-flink&quot;&gt;Hadoop-free Flink&lt;/h3&gt;

&lt;p&gt;Apache Flink users without any Apache Hadoop dependencies can now run Flink without Hadoop. Flink
programs that do not rely on Hadoop components can now be much smaller, a benefit particularly in a
container-based setup resulting in less network traffic and better performance.&lt;/p&gt;

&lt;p&gt;This includes the addition of Flink’s own Amazon S3 filesystem implementations based on Hadoop’s S3a
and Presto’s S3 file system with properly shaded dependencies (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5706&quot;&gt;FLINK-5706&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The details of these changes regarding Hadoop-free Flink are available in the Jira issue:
&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2268&quot;&gt;FLINK-2268&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;improvements-to-flink-internals&quot;&gt;Improvements to Flink Internals&lt;/h3&gt;

&lt;p&gt;Flink 1.4.0 introduces a new blob storage architecture that was first discussed in
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-19%3A+Improved+BLOB+storage+architecture&quot;&gt;Flink Improvement Proposal 19&lt;/a&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6916&quot;&gt;FLINK-6916&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;This will enable easier integration with both the work being done in &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot;&gt;Flink Improvement Proposal 6&lt;/a&gt; in
the future and with other improvements in the 1.4.0 release, such as support for messages larger
than the maximum Akka Framesize (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6046&quot;&gt;FLINK-6046&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The improvement also enables Flink to leverage distributed file systems in high availability
settings for optimized distribution of deployment data to TaskManagers.&lt;/p&gt;

&lt;h3 id=&quot;improvements-to-the-queryable-state-client&quot;&gt;Improvements to the Queryable State Client&lt;/h3&gt;

&lt;p&gt;Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/state/queryable_state.html&quot;&gt;queryable state&lt;/a&gt; makes it possible for users to access application state directly in Flink
before the state has been sent to an external database or key-value store.&lt;/p&gt;

&lt;p&gt;Flink 1.4.0 introduces a range of improvements to the queryable state client, including a more
container-friendly architecture, a more user-friendly API that hides configuration parameters, and
the groundwork to be able to expose window state (the state of an in-flight window) in the future.&lt;/p&gt;

&lt;p&gt;For details about the changes to queryable state please refer to the umbrella Jira issue:
&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5675&quot;&gt;FLINK-5675&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;metrics-and-monitoring&quot;&gt;Metrics and Monitoring&lt;/h3&gt;

&lt;p&gt;Flink’s metrics system now also includes support for Prometheus, an increasingly-popular metrics and
reporting system within the Flink community (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6221&quot;&gt;FLINK-6221&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;And the Apache Kafka connector in Flink now exposes metrics for failed and successful offset commits
in the Kafka consumer callback (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6998&quot;&gt;FLINK-6998&lt;/a&gt;).&lt;/p&gt;

&lt;h3 id=&quot;connector-improvements-and-fixes&quot;&gt;Connector improvements and fixes&lt;/h3&gt;

&lt;p&gt;Flink 1.4.0 introduces an Apache Kafka 0.11 connector and, as described above, support for an
exactly-once producer for Kafka 0.11 (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6988&quot;&gt;FLINK-6988&lt;/a&gt;).&lt;/p&gt;
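
&lt;p&gt;As a minimal sketch, pulling the new connector into a Maven build could look like the following. The
&lt;code&gt;_2.11&lt;/code&gt; Scala suffix and the &lt;code&gt;1.4.0&lt;/code&gt; version are assumptions; adjust both to match
your own build.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;c&quot;&gt;&amp;lt;!-- Kafka 0.11 connector; the Scala suffix and version are assumptions --&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-connector-kafka-0.11_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.4.0&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;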

&lt;p&gt;Additionally, the Flink-Kafka consumer now supports dynamic partition discovery &amp;amp; topic discovery
based on regex. This means that the Flink-Kafka consumer can pick up new Kafka partitions without
needing to restart the job and while maintaining exactly-once guarantees
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4022&quot;&gt;FLINK-4022&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Flink’s Amazon Kinesis connector now uses updated versions of the Kinesis Consumer Library and
Kinesis Producer Library. This introduces improved retry logic to the connector and should
significantly reduce the number of failures caused by Flink writing too quickly to Kinesis
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7366&quot;&gt;FLINK-7366&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Flink’s Apache Cassandra connector now supports Scala tuples–previously, only streams of Java
tuples were supported (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4497&quot;&gt;FLINK-4497&lt;/a&gt;). Also, a bug was fixed in
the Cassandra connector that caused messages to be lost in certain instances
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4500&quot;&gt;FLINK-4500&lt;/a&gt;).&lt;/p&gt;

&lt;h2 id=&quot;release-notes---please-read&quot;&gt;Release Notes - Please Read&lt;/h2&gt;

&lt;p&gt;Some of these changes will require updating the configuration or Maven dependencies for existing
programs. Please read below to see if you might be affected.&lt;/p&gt;

&lt;h3 id=&quot;changes-to-dynamic-class-loading-of-user-code&quot;&gt;Changes to dynamic class loading of user code&lt;/h3&gt;

&lt;p&gt;As mentioned above, we changed the way Flink loads user code from the previous default of
&lt;em&gt;parent-first class loading&lt;/em&gt; (the default for Java) to &lt;em&gt;child-first class loading&lt;/em&gt;, which is a common
practice in Java Application Servers, where this is also referred to as inverted or reversed class
loading.&lt;/p&gt;

&lt;p&gt;This should not affect regular user code but will enable programs to use different versions of the
dependencies that come with Flink – for example Akka, Netty, or Jackson. If you want to change back
to the previous default, you can use the configuration setting &lt;code&gt;classloader.resolve-order: parent-first&lt;/code&gt;,
the new default being &lt;code&gt;child-first&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;no-more-avro-dependency-included-by-default&quot;&gt;No more Avro dependency included by default&lt;/h3&gt;

&lt;p&gt;Flink previously included Avro by default so user programs could simply use Avro and not worry about
adding any dependencies. This behavior was changed in Flink 1.4 because it can lead to dependency
clashes.&lt;/p&gt;

&lt;p&gt;You now must manually include the Avro dependency (&lt;code&gt;flink-avro&lt;/code&gt;) with your program jar (or add it to
the Flink lib folder) if you want to use Avro.&lt;/p&gt;
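
&lt;p&gt;For a Maven build, adding the dependency could look roughly like this. The &lt;code&gt;1.4.0&lt;/code&gt; version is
an assumption; use the version that matches your Flink installation.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;c&quot;&gt;&amp;lt;!-- Avro support is no longer bundled; the version shown is an assumption --&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-avro&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.4.0&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;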

&lt;h3 id=&quot;hadoop-free-flink-1&quot;&gt;Hadoop-free Flink&lt;/h3&gt;

&lt;p&gt;Starting with version 1.4, Flink can run without any Hadoop dependencies present in the classpath.
Along with simply running without Hadoop, this enables Flink to dynamically use whatever Hadoop
version is available in the classpath.&lt;/p&gt;

&lt;p&gt;You could, for example, download the Hadoop-free release of Flink and use it to run on any
supported version of YARN, and Flink would dynamically use the Hadoop dependencies from YARN.&lt;/p&gt;

&lt;p&gt;This also means that in cases where you used connectors to HDFS, such as the &lt;code&gt;BucketingSink&lt;/code&gt; or
&lt;code&gt;RollingSink&lt;/code&gt;, you now have to ensure that you either use a Flink distribution with bundled Hadoop
dependencies or make sure to include Hadoop dependencies when building a jar file for your
application.&lt;/p&gt;
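
&lt;p&gt;If you take the second route, one possible way to include Hadoop in a Maven build is Hadoop’s own
&lt;code&gt;hadoop-client&lt;/code&gt; artifact, sketched below. This artifact is an illustrative choice rather than
something this release prescribes, and &lt;code&gt;${hadoop.version}&lt;/code&gt; is a placeholder property that you
would define to match your cluster.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;c&quot;&gt;&amp;lt;!-- Illustrative only: hadoop.version is a placeholder you must define yourself --&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.hadoop&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;hadoop-client&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;${hadoop.version}&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;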

&lt;h3 id=&quot;bundled-s3-filesystems&quot;&gt;Bundled S3 FileSystems&lt;/h3&gt;

&lt;p&gt;Flink 1.4 comes bundled with two different S3 FileSystems based on the Presto S3 FileSystem and
the Hadoop S3A FileSystem. They don’t have dependencies (because all dependencies are
shaded/relocated) and you can use them by dropping the respective file from the &lt;code&gt;opt&lt;/code&gt; directory
into the &lt;code&gt;lib&lt;/code&gt; directory of your Flink installation. For more information about this, please refer
to the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/filesystems.html#built-in-file-systems&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;

&lt;p&gt;According to git shortlog, the following 106 people contributed to the 1.4.0 release. Thank you to
all contributors!&lt;/p&gt;

&lt;p&gt;Ajay Tripathy, Alejandro Alcalde, Aljoscha Krettek, Bang, Phiradet, Bowen Li, Chris Ward, Cristian,
Dan Kelley, David Anderson, Dawid Wysakowicz, Dian Fu, Dmitrii Kniazev, DmytroShkvyra, Fabian
Hueske, FlorianFan, Fokko Driesprong, Gabor Gevay, Gary Yao, Greg Hogan, Haohui Mai, Hequn Cheng,
James Lafa, Jark Wu, Jie Shen, Jing Fan, JingsongLi, Joerg Schad, Juan Paulo Gutierrez, Ken Geis,
Kent Murra, Kurt Young, Lim Chee Hau, Maximilian Bode, Michael Fong, Mike Kobit, Mikhail Lipkovich,
Nico Kruber, Novotnik, Petr, Nycholas de Oliveira e Oliveira, Patrick Lucas, Piotr Nowojski, Robert
Metzger, Rodrigo Bonifacio, Rong Rong, Scott Kidder, Sebastian Klemke, Shuyi Chen, Stefan Richter,
Stephan Ewen, Svend Vanderveken, Till Rohrmann, Tony Wei, Tzu-Li (Gordon) Tai, Ufuk Celebi, Usman
Younas, Vetriselvan1187, Vishnu Viswanath, Wright, Eron, Xingcan Cui, Xpray, Yestin, Yonatan Most,
Zhenzhong Xu, Zhijiang, adebski, asdf2014, bbayani, biao.liub, cactuslrd.lird, dawidwys, desktop,
fengyelei, godfreyhe, gosubpl, gyao, hongyuhong, huafengw, kkloudas, kl0u, lincoln-lil,
lingjinjiang, mengji.fy, minwenjun, mtunique, p1tz, paul, rtudoran, shaoxuan-wang, sirko
bretschneider, sunjincheng121, tedyu, twalthr, uybhatti, wangmiao1981, yew1eb, z00376786, zentol,
zhangminglei, zhe li, zhouhai02, zjureel, 付典, 军长, 宝牛, 淘江, 金竹&lt;/p&gt;
</description>
<pubDate>Tue, 12 Dec 2017 11:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2017/12/12/release-1.4.0.html</link>
<guid isPermaLink="true">/news/2017/12/12/release-1.4.0.html</guid>
</item>

<item>
<title>Looking Ahead to Apache Flink 1.4.0 and 1.5.0</title>
<description>&lt;p&gt;The Apache Flink 1.4.0 release is on track to happen in the next couple of weeks, and for all of the
readers out there who haven’t been following the release discussion on &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink’s developer mailing
list&lt;/a&gt;, we’d like to provide some details on
what’s coming in Flink 1.4.0 as well as a preview of what the Flink community will save for 1.5.0.&lt;/p&gt;

&lt;p&gt;Both releases include ambitious features that we believe will move Flink to an entirely new level in
terms of the types of problems it can solve and applications it can support. The community deserves
lots of credit for its hard work over the past few months, and we’re excited to see these features
in the hands of users.&lt;/p&gt;

&lt;p&gt;This post will describe how the community plans to get there and the rationale behind the approach.&lt;/p&gt;

&lt;h2 id=&quot;coming-soon-major-changes-to-flinks-runtime&quot;&gt;Coming soon: Major Changes to Flink’s Runtime&lt;/h2&gt;

&lt;p&gt;There are 3 significant improvements to the Apache Flink engine that the community has nearly
completed and that will have a meaningful impact on Flink’s operability and performance.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Rework of the deployment model and distributed process&lt;/li&gt;
  &lt;li&gt;Transition from configurable, fixed-interval network I/O to event-driven network I/O and application-level flow control for better backpressure handling&lt;/li&gt;
  &lt;li&gt;Faster recovery from failure&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Next, we’ll go through each of these improvements in more detail.&lt;/p&gt;

&lt;h2 id=&quot;reworking-flinks-deployment-model-and-distributed-processing&quot;&gt;Reworking Flink’s Deployment Model and Distributed Processing&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot;&gt;FLIP-6&lt;/a&gt; (FLIP is short for
FLink Improvement Proposal and FLIPs are proposals for bigger changes to Flink) is an initiative
that’s been in the works for more than a year and represents a major refactor of Flink’s deployment
model and distributed process. The underlying motivation for FLIP-6 was the fact that Flink is being
adopted by a wider range of developer communities–both developers coming from the big data and
analytics space as well as developers coming from the event-driven applications space.&lt;/p&gt;

&lt;p&gt;Modern, stateful stream processing has served as a convergence for these two developer communities.
Despite a significant overlap of the core concepts in the applications being built, each group of
developers has its own set of common tools, deployment models, and expected behaviors when working
with a stream processing framework like Flink.&lt;/p&gt;

&lt;p&gt;FLIP-6 will ensure that Flink fits naturally in both of these contexts, behaving as though it’s
native to each ecosystem and operating seamlessly within a broader technology stack. A few of the
specific changes in FLIP-6 that will have such an impact:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Leveraging cluster management frameworks to support full resource elasticity&lt;/li&gt;
  &lt;li&gt;First-class support for containerized environments such as Kubernetes and Docker&lt;/li&gt;
  &lt;li&gt;REST-based client-cluster communication to ease operations and 3rd party integrations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;FLIP-6, along with already-introduced features like
&lt;a href=&quot;https://data-artisans.com/blog/apache-flink-at-mediamath-rescaling-stateful-applications&quot;&gt;rescalable state&lt;/a&gt;,
lays the groundwork for dynamic scaling in Flink, meaning that Flink programs will be able to scale up or down
automatically based on required resources–a huge step forward in terms of ease of operability and
the efficiency of Flink applications.&lt;/p&gt;

&lt;h2 id=&quot;lower-latency-via-improvements-to-the-apache-flink-network-stack&quot;&gt;Lower Latency via Improvements to the Apache Flink Network Stack&lt;/h2&gt;

&lt;p&gt;Speed will always be a key consideration for users who build stream processing applications, and
Flink 1.5 will include a rework of the network stack that will even further improve Flink’s latency.
At the heart of this work is a transition from configurable, fixed-interval network I/O to
event-driven network I/O and application-level flow control, ensuring that Flink will use all available
network capacity, as well as credit-based flow control which offers more fine-grained backpressuring
for improved checkpoint alignments.&lt;/p&gt;

&lt;p&gt;In our testing (&lt;a href=&quot;https://www.slideshare.net/FlinkForward/flink-forward-berlin-2017-nico-kruber-building-a-network-stack-for-optimal-throughput-lowlatency-tradeoffs#26&quot;&gt;see slide 26 here&lt;/a&gt;),
we’ve seen a substantial improvement in latency using event-driven network I/O, and the community
is also doing work to make sure we’re able to provide this increase in speed without a measurable
throughput tradeoff.&lt;/p&gt;

&lt;h2 id=&quot;faster-recovery-from-failures&quot;&gt;Faster Recovery from Failures&lt;/h2&gt;

&lt;p&gt;Flink 1.3.0 introduced incremental checkpoints, making it possible to checkpoint only the state
updates since the last successfully-completed checkpoint, rather than the previous behavior of
always checkpointing the entire state of the application. This has led to significant
performance improvements for users with large state.&lt;/p&gt;

&lt;p&gt;Flink 1.5 will introduce task-local recovery, which means that Flink will store a second copy of the
most recent checkpoint on the local disk (or even in main memory) of a task manager. The primary
copy still goes to durable storage so that it’s resilient to machine failures.&lt;/p&gt;

&lt;p&gt;In case of failover, the scheduler will try to reschedule tasks to their previous task manager (in
other words, to the same machine again) if this is possible. The task can then recover from the
locally-kept state. This makes it possible to avoid reading all state from the distributed file
system (which is remote over the network). Especially in applications with very large state, not
having to read many gigabytes over the network and instead from local disk will result in
significant performance gains in recovery.&lt;/p&gt;

&lt;h2 id=&quot;the-proposed-timeline-for-flink-14-and-flink-15&quot;&gt;The Proposed Timeline for Flink 1.4 and Flink 1.5&lt;/h2&gt;

&lt;p&gt;The good news is that all 3 of the features described above are well underway, and in fact, much of
the work is already covered by open pull requests.&lt;/p&gt;

&lt;p&gt;But given these features’ importance and the complexity of the work involved, the community expected
that the QA and testing required would be extensive and would delay the release of the
otherwise-ready features also on the list for the next release.&lt;/p&gt;

&lt;p&gt;And so the community decided to withhold the 3 features above (deployment model rework, improvements
to the network stack, and faster recovery) to be included in a separate Flink 1.5 release that will
come shortly after the Flink 1.4 release. Flink 1.5 is estimated to come just a couple of months
after 1.4 rather than the typical 4-month cycle in between major releases.&lt;/p&gt;

&lt;p&gt;The soon-to-be-released Flink 1.4 represents the current state of Flink without merging those 3
features. And Flink 1.4 is a substantial release in its own right, including, but not limited to,
the following:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;A significantly improved dependency structure&lt;/strong&gt;, removing many of Flink’s dependencies and subtle runtime conflicts. This increases overall stability and removes friction when embedding Flink or calling Flink “library style”.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Reversed class loading for dynamically-loaded user code&lt;/strong&gt;, allowing for different dependencies than those included in the core framework.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;An Apache Kafka 0.11 exactly-once producer&lt;/strong&gt;, making it possible to build end-to-end exactly-once applications with Flink and Kafka.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Streaming SQL JOIN based on processing time and event time&lt;/strong&gt;, which gives users the full advantage of Flink’s time handling while using a SQL JOIN.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Table API / Streaming SQL Source and Sink Additions&lt;/strong&gt;, including a Kafka 0.11 source and JDBC sink.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Hadoop-free Flink&lt;/strong&gt;, meaning that users who don’t rely on any Hadoop components (such as YARN or HDFS) in their Flink applications can use Flink without Hadoop for the first time.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Improvements to queryable state&lt;/strong&gt;, including a more container-friendly architecture, a more user-friendly API that hides configuration parameters, and the groundwork to be able to expose window state (the state of an in-flight window) in the future.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Connector improvements and fixes&lt;/strong&gt; for a range of connectors including Kafka, Apache Cassandra, Amazon Kinesis, and more.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Improved RPC performance&lt;/strong&gt; for faster recovery from failure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The community decided it was best to get these features into a stable version of Flink as soon as
possible, and the separation of what could have been a single (and very substantial) Flink 1.4
release into 1.4 and 1.5 serves that purpose.&lt;/p&gt;

&lt;p&gt;We’re excited by what each of these represents for Apache Flink, and we’d like to extend our thanks
to the Flink community for all of their hard work.&lt;/p&gt;

&lt;p&gt;If you’d like to follow along with release discussions, &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;please subscribe to the dev@ mailing
list&lt;/a&gt;.&lt;/p&gt;

</description>
<pubDate>Wed, 22 Nov 2017 11:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2017/11/22/release-1.4-and-1.5-timeline.html</link>
<guid isPermaLink="true">/news/2017/11/22/release-1.4-and-1.5-timeline.html</guid>
</item>

<item>
<title>Apache Flink 1.3.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released the second bugfix version of the Apache Flink 1.3 series.&lt;/p&gt;

&lt;p&gt;This release includes more than 60 fixes and minor improvements for Flink 1.3.1. All fixes are listed in detail below.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.3.2.&lt;/p&gt;

&lt;div class=&quot;alert alert-warning&quot;&gt;
  Important Notice:

  &lt;p&gt;A user reported a bug in the FlinkKafkaConsumer
  (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7143&quot;&gt;FLINK-7143&lt;/a&gt;) that is causing
  incorrect partition assignment in large Kafka deployments in the presence of inconsistent broker
  metadata.  In that case multiple parallel instances of the FlinkKafkaConsumer may read from the
  same topic partition, leading to data duplication. In Flink 1.3.2 this bug is fixed but incorrect
  assignments from Flink 1.3.0 and 1.3.1 cannot be automatically fixed by upgrading to Flink 1.3.2
  via a savepoint because the upgraded version would resume the wrong partition assignment from the
  savepoint. If you believe you are affected by this bug (seeing messages from some partitions
  duplicated) please refer to the JIRA issue for an upgrade path that works around that.&lt;/p&gt;

  &lt;p&gt;Before attempting the more elaborate upgrade path, we suggest checking whether you are
  actually affected by this bug. We did not manage to reproduce it in various testing clusters and
  according to the reporting user, it only appeared in rare cases on their very large setup. This
  leads us to believe that most likely only a minority of setups would be affected by this bug.&lt;/p&gt;
&lt;/div&gt;

&lt;p&gt;Notable changes:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The default Kafka version for Flink Kafka Consumer 0.10 was bumped from 0.10.0.1 to 0.10.2.1.&lt;/li&gt;
  &lt;li&gt;Some default values for configurations of AWS API call behaviors in the Flink Kinesis Consumer
 were adapted for better default consumption performance: 1) &lt;code&gt;SHARD_GETRECORDS_MAX&lt;/code&gt; default changed
 to 10,000, and 2) &lt;code&gt;SHARD_GETRECORDS_INTERVAL_MILLIS&lt;/code&gt; default changed to 200ms.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;List of resolved issues:&lt;/p&gt;

&lt;h2&gt;Sub-task&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6665&quot;&gt;FLINK-6665&lt;/a&gt;] -         Pass a ScheduledExecutorService to the RestartStrategy
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6667&quot;&gt;FLINK-6667&lt;/a&gt;] -         Pass a callback type to the RestartStrategy, rather than the full ExecutionGraph
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6680&quot;&gt;FLINK-6680&lt;/a&gt;] -         App &amp;amp; Flink migration guide: updates for the 1.3 release
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Bug&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5488&quot;&gt;FLINK-5488&lt;/a&gt;] -         yarnClient should be closed in AbstractYarnClusterDescriptor for error conditions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6376&quot;&gt;FLINK-6376&lt;/a&gt;] -         when deploy flink cluster on the yarn, it is lack of hdfs delegation token.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6541&quot;&gt;FLINK-6541&lt;/a&gt;] -         Jar upload directory not created
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6654&quot;&gt;FLINK-6654&lt;/a&gt;] -         missing maven dependency on &amp;quot;flink-shaded-hadoop2-uber&amp;quot; in flink-dist
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6655&quot;&gt;FLINK-6655&lt;/a&gt;] -         Misleading error message when HistoryServer path is empty
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6742&quot;&gt;FLINK-6742&lt;/a&gt;] -         Improve error message when savepoint migration fails due to task removal
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6774&quot;&gt;FLINK-6774&lt;/a&gt;] -         build-helper-maven-plugin version not set
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6806&quot;&gt;FLINK-6806&lt;/a&gt;] -         rocksdb is not listed as state backend in doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6843&quot;&gt;FLINK-6843&lt;/a&gt;] -         ClientConnectionTest fails on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6867&quot;&gt;FLINK-6867&lt;/a&gt;] -         Elasticsearch 1.x ITCase still instable due to embedded node instability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6918&quot;&gt;FLINK-6918&lt;/a&gt;] -         Failing tests: ChainLengthDecreaseTest and ChainLengthIncreaseTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6945&quot;&gt;FLINK-6945&lt;/a&gt;] -         TaskCancelAsyncProducerConsumerITCase.testCancelAsyncProducerAndConsumer instable test case
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6964&quot;&gt;FLINK-6964&lt;/a&gt;] -         Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6965&quot;&gt;FLINK-6965&lt;/a&gt;] -         Avro is missing snappy dependency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6987&quot;&gt;FLINK-6987&lt;/a&gt;] -         TextInputFormatTest fails when run in path containing spaces
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6996&quot;&gt;FLINK-6996&lt;/a&gt;] -         FlinkKafkaProducer010 doesn&amp;#39;t guarantee at-least-once semantic
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7005&quot;&gt;FLINK-7005&lt;/a&gt;] -         Optimization steps are missing for nested registered tables
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7011&quot;&gt;FLINK-7011&lt;/a&gt;] -         Instable Kafka testStartFromKafkaCommitOffsets failures on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7025&quot;&gt;FLINK-7025&lt;/a&gt;] -         Using NullByteKeySelector for Unbounded ProcTime NonPartitioned Over
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7034&quot;&gt;FLINK-7034&lt;/a&gt;] -         GraphiteReporter cannot recover from lost connection
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7038&quot;&gt;FLINK-7038&lt;/a&gt;] -         Several misused &amp;quot;KeyedDataStream&amp;quot; term in docs and Javadocs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7041&quot;&gt;FLINK-7041&lt;/a&gt;] -         Deserialize StateBackend from JobCheckpointingSettings with user classloader
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7132&quot;&gt;FLINK-7132&lt;/a&gt;] -         Fix BulkIteration parallelism
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7133&quot;&gt;FLINK-7133&lt;/a&gt;] -         Fix Elasticsearch version interference
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7137&quot;&gt;FLINK-7137&lt;/a&gt;] -         Flink table API defaults top level fields as nullable and all nested fields within CompositeType as non-nullable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7143&quot;&gt;FLINK-7143&lt;/a&gt;] -         Partition assignment for Kafka consumer is not stable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7154&quot;&gt;FLINK-7154&lt;/a&gt;] -         Missing call to build CsvTableSource example
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7158&quot;&gt;FLINK-7158&lt;/a&gt;] -         Wrong test jar dependency in flink-clients
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7177&quot;&gt;FLINK-7177&lt;/a&gt;] -         DataSetAggregateWithNullValuesRule fails creating null literal for non-nullable type
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7178&quot;&gt;FLINK-7178&lt;/a&gt;] -         Datadog Metric Reporter Jar is Lacking Dependencies
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7180&quot;&gt;FLINK-7180&lt;/a&gt;] -         CoGroupStream perform checkpoint failed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7195&quot;&gt;FLINK-7195&lt;/a&gt;] -         FlinkKafkaConsumer should not respect fetched partitions to filter restored partition states
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7216&quot;&gt;FLINK-7216&lt;/a&gt;] -         ExecutionGraph can perform concurrent global restarts to scheduling
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7225&quot;&gt;FLINK-7225&lt;/a&gt;] -         Cutoff exception message in StateDescriptor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7226&quot;&gt;FLINK-7226&lt;/a&gt;] -         REST responses contain invalid content-encoding header
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7231&quot;&gt;FLINK-7231&lt;/a&gt;] -         SlotSharingGroups are not always released in time for new restarts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7234&quot;&gt;FLINK-7234&lt;/a&gt;] -         Fix CombineHint documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7241&quot;&gt;FLINK-7241&lt;/a&gt;] -         Fix YARN high availability documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7255&quot;&gt;FLINK-7255&lt;/a&gt;] -         ListStateDescriptor example uses wrong constructor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7258&quot;&gt;FLINK-7258&lt;/a&gt;] -         IllegalArgumentException in Netty bootstrap with large memory state segment size
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7266&quot;&gt;FLINK-7266&lt;/a&gt;] -         Don&amp;#39;t attempt to delete parent directory on S3
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7268&quot;&gt;FLINK-7268&lt;/a&gt;] -         Zookeeper Checkpoint Store interacting with Incremental State Handles can lead to loss of handles
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7281&quot;&gt;FLINK-7281&lt;/a&gt;] -         Fix various issues in (Maven) release infrastructure
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6365&quot;&gt;FLINK-6365&lt;/a&gt;] -         Adapt default values of the Kinesis connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6575&quot;&gt;FLINK-6575&lt;/a&gt;] -         Disable all tests on Windows that use HDFS
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6682&quot;&gt;FLINK-6682&lt;/a&gt;] -         Improve error message in case parallelism exceeds maxParallelism
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6789&quot;&gt;FLINK-6789&lt;/a&gt;] -         Remove duplicated test utility reducer in optimizer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6874&quot;&gt;FLINK-6874&lt;/a&gt;] -         Static and transient fields ignored for POJOs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6898&quot;&gt;FLINK-6898&lt;/a&gt;] -         Limit size of operator component in metric name
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6937&quot;&gt;FLINK-6937&lt;/a&gt;] -         Fix link markdown in Production Readiness Checklist doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6940&quot;&gt;FLINK-6940&lt;/a&gt;] -         Clarify the effect of configuring per-job state backend
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6998&quot;&gt;FLINK-6998&lt;/a&gt;] -         Kafka connector needs to expose metrics for failed/successful offset commits in the Kafka Consumer callback
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7004&quot;&gt;FLINK-7004&lt;/a&gt;] -         Switch to Travis Trusty image
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7032&quot;&gt;FLINK-7032&lt;/a&gt;] -         Intellij is constantly changing language level of sub projects back to 1.6
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7069&quot;&gt;FLINK-7069&lt;/a&gt;] -         Catch exceptions for each reporter separately
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7149&quot;&gt;FLINK-7149&lt;/a&gt;] -         Add checkpoint ID to &amp;#39;sendValues()&amp;#39; in GenericWriteAheadSink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7164&quot;&gt;FLINK-7164&lt;/a&gt;] -         Extend integration tests for (externalised) checkpoints, checkpoint store
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7174&quot;&gt;FLINK-7174&lt;/a&gt;] -         Bump dependency of Kafka 0.10.x to the latest one
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7211&quot;&gt;FLINK-7211&lt;/a&gt;] -         Exclude Gelly javadoc jar from release
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7224&quot;&gt;FLINK-7224&lt;/a&gt;] -         Incorrect Javadoc description in all Kafka consumer versions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7228&quot;&gt;FLINK-7228&lt;/a&gt;] -         Harden HistoryServerStaticFileHandlerTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7233&quot;&gt;FLINK-7233&lt;/a&gt;] -         TaskManagerHeapSizeCalculationJavaBashTest failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7287&quot;&gt;FLINK-7287&lt;/a&gt;] -         test instability in Kafka010ITCase.testCommitOffsetsToKafka
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7290&quot;&gt;FLINK-7290&lt;/a&gt;] -         Make release scripts modular
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Sat, 05 Aug 2017 11:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2017/08/05/release-1.3.2.html</link>
<guid isPermaLink="true">/news/2017/08/05/release-1.3.2.html</guid>
</item>

<item>
<title>A Deep Dive into Rescalable State in Apache Flink</title>
<description>&lt;p&gt;&lt;em&gt;Apache Flink 1.2.0, released in February 2017, introduced support for rescalable state. This post provides a detailed overview of stateful stream processing and rescalable state in Flink.&lt;/em&gt;
 &lt;br /&gt;
 &lt;br /&gt;&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#an-intro-to-stateful-stream-processing&quot; id=&quot;markdown-toc-an-intro-to-stateful-stream-processing&quot;&gt;An Intro to Stateful Stream Processing&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#state-in-apache-flink&quot; id=&quot;markdown-toc-state-in-apache-flink&quot;&gt;State in Apache Flink&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#rescaling-stateful-stream-processing-jobs&quot; id=&quot;markdown-toc-rescaling-stateful-stream-processing-jobs&quot;&gt;Rescaling Stateful Stream Processing Jobs&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#reassigning-operator-state-when-rescaling&quot; id=&quot;markdown-toc-reassigning-operator-state-when-rescaling&quot;&gt;Reassigning Operator State When Rescaling&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#reassigning-keyed-state-when-rescaling&quot; id=&quot;markdown-toc-reassigning-keyed-state-when-rescaling&quot;&gt;Reassigning Keyed State When Rescaling&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#wrapping-up&quot; id=&quot;markdown-toc-wrapping-up&quot;&gt;Wrapping Up&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h2 id=&quot;an-intro-to-stateful-stream-processing&quot;&gt;An Intro to Stateful Stream Processing&lt;/h2&gt;

&lt;p&gt;At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input.&lt;/p&gt;

&lt;p&gt;In contrast, operators in &lt;em&gt;stateless&lt;/em&gt; stream processing only consider their current inputs, without further context and knowledge about the past. A simple example to illustrate this difference: let us consider a source stream that emits events with schema &lt;code&gt;e = {event_id:int, event_value:int}&lt;/code&gt;. Our goal is, for each event, to extract and output the &lt;code&gt;event_value&lt;/code&gt;. We can easily achieve this with a simple source-map-sink pipeline, where the map function extracts the &lt;code&gt;event_value&lt;/code&gt; from the event and emits it downstream to an outputting sink. This is an instance of stateless stream processing.&lt;/p&gt;

&lt;p&gt;But what if we want to modify our job to output the &lt;code&gt;event_value&lt;/code&gt; only if it is larger than the value from the previous event? In this case, our map function obviously needs some way to remember the &lt;code&gt;event_value&lt;/code&gt; from a past event — and so this is an instance of stateful stream processing.&lt;/p&gt;

&lt;p&gt;This example should demonstrate that state is a fundamental, enabling concept in stream processing that is required for a majority of interesting use cases.&lt;/p&gt;
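
&lt;p&gt;The stateful variant can be sketched in a few lines of plain Java. This is a minimal illustration and not Flink API: the class name &lt;code&gt;LargerThanPreviousSketch&lt;/code&gt; and the method &lt;code&gt;shouldEmit&lt;/code&gt; are hypothetical, and emitting the very first event (which has no predecessor) is an assumption made here.&lt;/p&gt;

```java
// Hypothetical sketch in plain Java; the names are illustrative and not part of Flink.
final class LargerThanPreviousSketch {

    // The state: event_value of the previous event, or null before the first event.
    private Integer previousValue = null;

    // Returns true if the value should be emitted downstream.
    // Assumption: the first event is always emitted because there is nothing to compare with.
    boolean shouldEmit(int eventValue) {
        boolean emit = previousValue == null || eventValue > previousValue;
        // Remember the current value so that it can influence the next event.
        previousValue = eventValue;
        return emit;
    }

    public static void main(String[] args) {
        LargerThanPreviousSketch function = new LargerThanPreviousSketch();
        for (int value : new int[] {3, 5, 4, 4, 9}) {
            System.out.println(value + " emitted: " + function.shouldEmit(value));
        }
    }
}
```

&lt;p&gt;The single field &lt;code&gt;previousValue&lt;/code&gt; is exactly the kind of information that has to survive failures and rescaling, which is what the rest of this post is about.&lt;/p&gt;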

&lt;h2 id=&quot;state-in-apache-flink&quot;&gt;State in Apache Flink&lt;/h2&gt;

&lt;p&gt;Apache Flink is a massively parallel distributed system that allows stateful stream processing at large scale. For scalability, a Flink job is logically decomposed into a graph of operators, and the execution of each operator is physically decomposed into multiple parallel operator instances. Conceptually, each parallel operator instance in Flink is an independent task that can be scheduled on its own machine in a network-connected cluster of shared-nothing machines.&lt;/p&gt;

&lt;p&gt;For high throughput and low latency in this setting, network communications among tasks must be minimized. In Flink, network communication for stream processing only happens along the logical edges in the job’s operator graph (vertically), so that the stream data can be transferred from upstream to downstream operators.&lt;/p&gt;

&lt;p&gt;However, there is no communication between the parallel instances of an operator (horizontally). To avoid such network communication, data locality is a key principle in Flink and strongly affects how state is stored and accessed.&lt;/p&gt;

&lt;p&gt;For the sake of data locality, all state data in Flink is always bound to the task that runs the corresponding parallel operator instance and is co-located on the same machine that runs the task.&lt;/p&gt;

&lt;p&gt;Through this design, all state data for a task is local, and no network communication between tasks is required for state access. Avoiding this kind of traffic is crucial for the scalability of a massively parallel distributed system like Flink.&lt;/p&gt;

&lt;p&gt;For Flink’s stateful stream processing, we differentiate between two different types of state: operator state and keyed state. Operator state is scoped per parallel instance of an operator (sub-task), and keyed state can be thought of as &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html#keyed-state&quot;&gt;“operator state that has been partitioned, or sharded, with exactly one state-partition per key”&lt;/a&gt;. We could have easily implemented our previous example as operator state: all events that are routed through the operator instance can influence its value.&lt;/p&gt;

&lt;h2 id=&quot;rescaling-stateful-stream-processing-jobs&quot;&gt;Rescaling Stateful Stream Processing Jobs&lt;/h2&gt;

&lt;p&gt;Changing the parallelism (that is, changing the number of parallel subtasks that perform work for an operator) in stateless streaming is very easy. It requires only starting or stopping parallel instances of stateless operators and connecting them to, or disconnecting them from, their upstream and downstream operators, as shown in &lt;strong&gt;Figure 1A&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;On the other hand, changing the parallelism of stateful operators is much more involved because we must also (i) redistribute the previous operator state in a (ii) consistent, (iii) meaningful way. Remember that in Flink’s shared-nothing architecture, all state is local to the task that runs the owning parallel operator instance, and there is no communication between parallel operator instances at job runtime.&lt;/p&gt;

&lt;p&gt;However, there is already one mechanism in Flink that allows the exchange of operator state between tasks, in a consistent way, with exactly-once guarantees — Flink’s checkpointing!&lt;/p&gt;

&lt;p&gt;You can find details about Flink’s checkpoints in &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/stream_checkpointing.html&quot;&gt;the documentation&lt;/a&gt;. In a nutshell, a checkpoint is triggered when a checkpoint coordinator injects a special event (a so-called checkpoint barrier) into a stream.&lt;/p&gt;

&lt;p&gt;Checkpoint barriers flow downstream with the event stream from sources to sinks, and whenever an operator instance receives a barrier, the operator instance immediately snapshots its current state to a distributed storage system, e.g. HDFS.&lt;/p&gt;

&lt;p&gt;On restore, the new tasks for the job (which potentially run on different machines now) can again pick up the state data from the distributed storage system.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;center&gt;&lt;i&gt;Figure 1&lt;/i&gt;&lt;/center&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/stateless-stateful-streaming.svg&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;We can piggyback rescaling of stateful jobs on checkpointing, as shown in &lt;strong&gt;Figure 1B&lt;/strong&gt;. First, a checkpoint is triggered and sent to a distributed storage system. Next, the job is restarted with a changed parallelism and can access a consistent snapshot of all previous state from the distributed storage. While this solves (i) redistribution of a (ii) consistent state across machines, there is still one problem: without a clear 1:1 relationship between previous state and new parallel operator instances, how can we assign the state in a (iii) meaningful way?&lt;/p&gt;

&lt;p&gt;We could again assign the state from previous &lt;code&gt;map_1&lt;/code&gt; and &lt;code&gt;map_2&lt;/code&gt; to the new &lt;code&gt;map_1&lt;/code&gt; and &lt;code&gt;map_2&lt;/code&gt;. But this would leave &lt;code&gt;map_3&lt;/code&gt; with empty state. Depending on the type of state and concrete semantics of the job, this naive approach could lead to anything from inefficiency to incorrect results.&lt;/p&gt;

&lt;p&gt;In the following section, we’ll explain how we solved the problem of efficient, meaningful state reassignment in Flink. Each of Flink state’s two flavours, operator state and keyed state, requires a different approach to state assignment.&lt;/p&gt;

&lt;h2 id=&quot;reassigning-operator-state-when-rescaling&quot;&gt;Reassigning Operator State When Rescaling&lt;/h2&gt;

&lt;p&gt;First, we’ll discuss how state reassignment in rescaling works for operator state. A common real-world use-case of operator state in Flink is to maintain the current offsets for Kafka partitions in Kafka sources. Each Kafka source instance would maintain &lt;code&gt;&amp;lt;PartitionID, Offset&amp;gt;&lt;/code&gt; pairs – one pair for each Kafka partition that the source is reading – as operator state. How would we redistribute this operator state in case of rescaling? Ideally, we would like to reassign all &lt;code&gt;&amp;lt;PartitionID, Offset&amp;gt;&lt;/code&gt; pairs from the checkpoint round-robin across all parallel operator instances after the rescaling.&lt;/p&gt;

&lt;p&gt;As users, we are aware of the “meaning” of Kafka partition offsets, and we know that we can treat them as independent, redistributable units of state. The problem that remains is how we can share this domain-specific knowledge with Flink.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Figure 2A&lt;/strong&gt; illustrates the previous interface for checkpointing operator state in Flink. On snapshot, each operator instance returned an object that represented its complete state. In the case of a Kafka source, this object was a list of partition offsets.&lt;/p&gt;

&lt;p&gt;This snapshot object was then written to the distributed store. On restore, the object was read from distributed storage and passed to the operator instance as a parameter to the restore function.&lt;/p&gt;

&lt;p&gt;This approach was problematic for rescaling: how could Flink decompose the operator state into meaningful, redistributable partitions? Even though the state of the Kafka source was actually always a list of partition offsets, the previously-returned state object was a black box to Flink and therefore could not be redistributed.&lt;/p&gt;

&lt;p&gt;As a generalized approach to solve this black box problem, we introduced a slightly modified checkpointing interface, called &lt;code&gt;ListCheckpointed&lt;/code&gt;. &lt;strong&gt;Figure 2B&lt;/strong&gt; shows the new checkpointing interface, which returns and receives a list of state partitions. Introducing a list instead of a single object makes the meaningful partitioning of state explicit: each item in the list still remains a black box to Flink, but is considered an atomic, independently re-distributable part of the operator state.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;center&gt;&lt;i&gt;Figure 2&lt;/i&gt;&lt;/center&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/list-checkpointed.svg&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Our approach provides a simple API with which implementing operators can encode domain-specific knowledge about how to partition and merge units of state. With our new checkpointing interface, the Kafka source makes individual partition offsets explicit, and state reassignment becomes as easy as splitting and merging lists.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;FlinkKafkaConsumer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RichParallelSourceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CheckpointedFunction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
   &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;

   &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;transient&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ListState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;KafkaTopicPartition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offsetsOperatorState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

   &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
   &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;initializeState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FunctionInitializationContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

      &lt;span class=&quot;n&quot;&gt;OperatorStateStore&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stateStore&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getOperatorStateStore&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
      &lt;span class=&quot;c1&quot;&gt;// register the state with the backend&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;offsetsOperatorState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stateStore&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getSerializableListState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;kafka-offsets&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

      &lt;span class=&quot;c1&quot;&gt;// if the job was restarted, we set the restored offsets&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;isRestored&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
         &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;KafkaTopicPartition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kafkaOffset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offsetsOperatorState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;c1&quot;&gt;// ... restore logic&lt;/span&gt;
         &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

   &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
   &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;snapshotState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FunctionSnapshotContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

      &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;offsetsOperatorState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;clear&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

      &lt;span class=&quot;c1&quot;&gt;// write the partition offsets to the list of operator states&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;Entry&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;KafkaTopicPartition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;partition&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subscribedPartitionOffsets&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;entrySet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
         &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;offsetsOperatorState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;partition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()));&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

   &lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;

&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
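
&lt;p&gt;The “splitting and merging lists” step can be pictured with the following plain-Java sketch. It is only an illustration under assumptions: the class &lt;code&gt;ListStateRedistributionSketch&lt;/code&gt; and its &lt;code&gt;redistribute&lt;/code&gt; method are hypothetical, state items are represented as strings, and the simple round-robin order used here is not necessarily the order Flink’s own repartitioning produces.&lt;/p&gt;

```java
// Hypothetical sketch; names and the exact round-robin order are assumptions, not Flink API.
final class ListStateRedistributionSketch {

    // oldState[i] is the list of state items snapshotted by old subtask i.
    // Returns one list of state items per new subtask.
    static String[][] redistribute(String[][] oldState, int newParallelism) {
        // Merge: count all items across the old subtasks.
        int total = 0;
        for (String[] perSubtask : oldState) {
            total += perSubtask.length;
        }

        // New subtask i receives items i, i + p, i + 2p, ... of the merged list.
        String[][] newState = new String[newParallelism][];
        for (int i = 0; i != newParallelism; i++) {
            int size = (total - i + newParallelism - 1) / newParallelism;
            newState[i] = new String[size];
        }

        // Split: deal the items out round-robin, treating each item as an atomic unit.
        int index = 0;
        for (String[] perSubtask : oldState) {
            for (String item : perSubtask) {
                newState[index % newParallelism][index / newParallelism] = item;
                index++;
            }
        }
        return newState;
    }

    public static void main(String[] args) {
        String[][] oldState = {{"p0@10", "p1@7"}, {"p2@3", "p3@12"}};
        System.out.println(java.util.Arrays.deepToString(redistribute(oldState, 3)));
    }
}
```

&lt;p&gt;Because every list item is an independent unit, scaling from two to three subtasks in this sketch only moves whole partition offsets around; no item ever has to be interpreted or split by the framework.&lt;/p&gt;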

&lt;h2 id=&quot;reassigning-keyed-state-when-rescaling&quot;&gt;Reassigning Keyed State When Rescaling&lt;/h2&gt;
&lt;p&gt;The second flavour of state in Flink is keyed state. In contrast to operator state, keyed state is scoped by key, where the key is extracted from each stream event.&lt;/p&gt;

&lt;p&gt;To illustrate how keyed state differs from operator state, let’s use the following example. Assume we have a stream of events, where each event has the schema &lt;code&gt;{customer_id:int, value:int}&lt;/code&gt;. We have already learned that we can use operator state to compute and emit the running sum of values for all customers.&lt;/p&gt;

&lt;p&gt;Now assume we want to slightly modify our goal and compute a running sum of values for each individual &lt;code&gt;customer_id&lt;/code&gt;. This is a use case for keyed state, as one aggregated state must be maintained for each unique key in the stream.&lt;/p&gt;

&lt;p&gt;Note that keyed state is only available for keyed streams, which are created through the &lt;code&gt;keyBy()&lt;/code&gt; operation in Flink. The &lt;code&gt;keyBy()&lt;/code&gt; operation (i) specifies how to extract a key from each event and (ii) ensures that all events with the same key are always processed by the same parallel operator instance. As a result, all keyed state is transitively also bound to one parallel operator instance, because for each key, exactly one operator instance is responsible. This mapping from key to operator is deterministically computed through hash partitioning on the key.&lt;/p&gt;

&lt;p&gt;We can see that keyed state has one clear advantage over operator state when it comes to rescaling: we can easily figure out how to correctly split and redistribute the state across parallel operator instances. State reassignment simply follows the partitioning of the keyed stream. After rescaling, the state for each key must be assigned to the operator instance that is now responsible for that key, as determined by the hash partitioning of the keyed stream.&lt;/p&gt;

&lt;p&gt;While this automatically solves the problem of logically remapping the state to subtasks after rescaling, there is one more practical problem left to solve: how can we efficiently transfer the state to the subtasks’ local backends?&lt;/p&gt;

&lt;p&gt;When we’re not rescaling, each subtask can simply read the whole state as written to the checkpoint by a previous instance in one sequential read.&lt;/p&gt;

&lt;p&gt;When rescaling, however, this is no longer possible – the state for each subtask is now potentially scattered across the files written by all subtasks (think about what happens if you change the parallelism in &lt;code&gt;hash(key) mod parallelism&lt;/code&gt;). We have illustrated this problem in &lt;strong&gt;Figure 3A&lt;/strong&gt;. In this example, we show how keys are shuffled when rescaling from parallelism 3 to 4 for a key space of 0, 20, using identity as hash function to keep it easy to follow.&lt;/p&gt;
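
&lt;p&gt;To get a feeling for how much state moves under plain &lt;code&gt;hash(key) mod parallelism&lt;/code&gt;, the following sketch counts the keys whose subtask changes when the parallelism changes. The class &lt;code&gt;ModShuffleSketch&lt;/code&gt; is hypothetical, the identity hash follows the simplification used in the figure, and the key range 0 through 19 is an assumption chosen for the example.&lt;/p&gt;

```java
// Hypothetical sketch; not Flink API. Identity is used as the hash function.
final class ModShuffleSketch {

    // Subtask that owns a key under "hash(key) mod parallelism".
    static int subtaskFor(int key, int parallelism) {
        return Math.floorMod(key, parallelism);
    }

    // Counts the keys 0 .. numKeys - 1 that land on a different subtask after rescaling.
    static int countMovedKeys(int numKeys, int oldParallelism, int newParallelism) {
        int moved = 0;
        for (int key = 0; key != numKeys; key++) {
            if (subtaskFor(key, oldParallelism) != subtaskFor(key, newParallelism)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        System.out.println(countMovedKeys(20, 3, 4) + " of 20 keys change subtask");
    }
}
```

&lt;p&gt;For these 20 keys, 14 end up on a different subtask when going from parallelism 3 to 4; only keys 0, 1, 2, 12, 13 and 14 stay where they were. The state of a new subtask is therefore spread over the files of several old subtasks.&lt;/p&gt;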

&lt;p&gt;A naive approach might be to read all the previous subtask state from the checkpoint in every subtask and keep only the keys that belong to that subtask. While this approach can benefit from a sequential read pattern, each subtask potentially reads a large fraction of irrelevant state data, and the distributed file system receives a huge number of parallel read requests.&lt;/p&gt;

&lt;p&gt;Another approach could be to build an index that tracks the location of the state for each key in the checkpoint. With this approach, all subtasks could locate and read the matching keys very selectively. This approach would avoid reading irrelevant data, but it has two major downsides. A materialized index for all keys, i.e. a key-to-read-offset mapping, can potentially grow very large. Furthermore, this approach can also introduce a huge amount of random I/O (when seeking to the data for individual keys, see &lt;strong&gt;Figure 3A&lt;/strong&gt;), which typically entails very bad performance in distributed file systems.&lt;/p&gt;

&lt;p&gt;Flink’s approach sits in between those two extremes by introducing key-groups as the atomic unit of state assignment. How does this work? The number of key-groups must be determined before the job is started and (currently) cannot be changed after the fact. As key-groups are the atomic unit of state assignment, this also means that the number of key-groups is the upper limit for parallelism. In a nutshell, key-groups give us a way to trade between flexibility in rescaling (by setting an upper limit for parallelism) and the maximum overhead involved in indexing and restoring the state.&lt;/p&gt;

&lt;p&gt;We assign key-groups to subtasks as ranges. This makes the reads on restore not only sequential within each key-group, but often also across multiple key-groups. An additional benefit: this also keeps the metadata of key-group-to-subtask assignments very small. We do not maintain explicit lists of key-groups because it is sufficient to track the range boundaries.&lt;/p&gt;

&lt;p&gt;We have illustrated rescaling from parallelism 3 to 4 with 10 key-groups in &lt;strong&gt;Figure 3B&lt;/strong&gt;. As we can see, introducing key-groups and assigning them as ranges greatly improves the access pattern over the naive approach. Equations 2 and 3 in &lt;strong&gt;Figure 3B&lt;/strong&gt; also detail how we compute key-groups and the range assignment.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;center&gt;&lt;i&gt;Figure 3&lt;/i&gt;&lt;/center&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/key-groups.svg&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
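
&lt;p&gt;The key-group idea can also be written down as a small Java sketch: a key is first mapped to a key-group, and key-groups are then assigned to subtasks as contiguous ranges. This is an illustration under assumptions rather than Flink’s internal code: the class &lt;code&gt;KeyGroupSketch&lt;/code&gt; and its method names are hypothetical, the key hash is passed in directly, and the range formulas are one consistent choice that may differ in detail from the equations in &lt;strong&gt;Figure 3B&lt;/strong&gt;.&lt;/p&gt;

```java
// Hypothetical sketch; class and method names are illustrative and not Flink API.
final class KeyGroupSketch {

    // Maps a key hash to its key-group: hash mod number of key-groups.
    static int keyGroupFor(int keyHash, int numKeyGroups) {
        return Math.floorMod(keyHash, numKeyGroups);
    }

    // First key-group of the contiguous range owned by a subtask:
    // ceil(subtask * numKeyGroups / parallelism), in integer arithmetic.
    static int rangeStart(int subtask, int parallelism, int numKeyGroups) {
        return (subtask * numKeyGroups + parallelism - 1) / parallelism;
    }

    // Last key-group (inclusive) of the range owned by a subtask.
    static int rangeEnd(int subtask, int parallelism, int numKeyGroups) {
        return ((subtask + 1) * numKeyGroups - 1) / parallelism;
    }

    // Subtask that owns a given key-group; consistent with the ranges above.
    static int subtaskFor(int keyGroup, int parallelism, int numKeyGroups) {
        return keyGroup * parallelism / numKeyGroups;
    }

    public static void main(String[] args) {
        int numKeyGroups = 10;
        for (int parallelism : new int[] {3, 4}) {
            for (int subtask = 0; subtask != parallelism; subtask++) {
                System.out.println("parallelism " + parallelism + ", subtask " + subtask
                        + ": key-groups " + rangeStart(subtask, parallelism, numKeyGroups)
                        + ".." + rangeEnd(subtask, parallelism, numKeyGroups));
            }
        }
    }
}
```

&lt;p&gt;With 10 key-groups, this sketch yields the ranges 0..3, 4..6 and 7..9 for parallelism 3, and 0..2, 3..4, 5..7 and 8..9 for parallelism 4. Only the two range boundaries per subtask need to be stored, and each new subtask reads a few contiguous key-groups instead of individual keys.&lt;/p&gt;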

&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping Up&lt;/h2&gt;

&lt;p&gt;Thanks for staying with us, and we hope you now have a clear idea of how rescalable state works in Apache Flink and how to make use of rescaling in real-world scenarios.&lt;/p&gt;

&lt;p&gt;Flink 1.3.0, which was released in June 2017, adds more tooling for state management and fault tolerance in Flink, including incremental checkpoints. And the community is exploring features such as…&lt;/p&gt;

&lt;p&gt;• State replication&lt;br /&gt;
• State that isn’t bound to the lifecycle of a Flink job&lt;br /&gt;
• Automatic rescaling (with no savepoints required)&lt;/p&gt;

&lt;p&gt;…for Flink 1.4.0 and beyond.&lt;/p&gt;

&lt;p&gt;If you’d like to learn more, we recommend starting with the Apache Flink &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This is an excerpt from a post that originally appeared on the data Artisans blog. If you’d like to read the original post in its entirety, you can find it &lt;a href=&quot;https://data-artisans.com/blog/apache-flink-at-mediamath-rescaling-stateful-applications&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt; (external link).&lt;/em&gt;&lt;/p&gt;
</description>
<pubDate>Tue, 04 Jul 2017 11:00:00 +0200</pubDate>
<link>https://flink.apache.org/features/2017/07/04/flink-rescalable-state.html</link>
<guid isPermaLink="true">/features/2017/07/04/flink-rescalable-state.html</guid>
</item>

<item>
<title>Apache Flink 1.3.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.3 series.&lt;/p&gt;

&lt;p&gt;This release includes 50 fixes and minor improvements for Flink 1.3.0. All of them are listed below.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.3.1.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;        Bug
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6492&quot;&gt;FLINK-6492&lt;/a&gt;] -         Unclosed DataOutputViewStream in GenericArraySerializerConfigSnapshot#write()
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6602&quot;&gt;FLINK-6602&lt;/a&gt;] -         Table source with defined time attributes allows empty string
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6652&quot;&gt;FLINK-6652&lt;/a&gt;] -         Problem with DelimitedInputFormat
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6659&quot;&gt;FLINK-6659&lt;/a&gt;] -         RocksDBMergeIteratorTest, SavepointITCase leave temporary directories behind
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6669&quot;&gt;FLINK-6669&lt;/a&gt;] -         [Build] Scala style check error on Windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6685&quot;&gt;FLINK-6685&lt;/a&gt;] -         SafetyNetCloseableRegistry is closed prematurely in Task::triggerCheckpointBarrier
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6772&quot;&gt;FLINK-6772&lt;/a&gt;] -         Incorrect ordering of matched state events in Flink CEP
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6775&quot;&gt;FLINK-6775&lt;/a&gt;] -         StateDescriptor cannot be shared by multiple subtasks
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6780&quot;&gt;FLINK-6780&lt;/a&gt;] -         ExternalTableSource should add time attributes in the row type
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6783&quot;&gt;FLINK-6783&lt;/a&gt;] -         Wrongly extracted TypeInformations for WindowedStream::aggregate
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6797&quot;&gt;FLINK-6797&lt;/a&gt;] -         building docs fails with bundler 1.15
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6801&quot;&gt;FLINK-6801&lt;/a&gt;] -         PojoSerializerConfigSnapshot cannot deal with missing Pojo fields
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6804&quot;&gt;FLINK-6804&lt;/a&gt;] -         Inconsistent state migration behaviour between different state backends
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6807&quot;&gt;FLINK-6807&lt;/a&gt;] -         Elasticsearch 5 connector artifact not published to maven 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6808&quot;&gt;FLINK-6808&lt;/a&gt;] -         Stream join fails when checkpointing is enabled
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6809&quot;&gt;FLINK-6809&lt;/a&gt;] -         side outputs documentation: wrong variable name in java example code
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6812&quot;&gt;FLINK-6812&lt;/a&gt;] -         Elasticsearch 5 release artifacts not published to Maven central
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6815&quot;&gt;FLINK-6815&lt;/a&gt;] -         Javadocs don&amp;#39;t work anymore in Flink 1.4-SNAPSHOT
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6816&quot;&gt;FLINK-6816&lt;/a&gt;] -         Fix wrong usage of Scala string interpolation in Table API
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6833&quot;&gt;FLINK-6833&lt;/a&gt;] -         Race condition: Asynchronous checkpointing task can fail completed StreamTask
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6844&quot;&gt;FLINK-6844&lt;/a&gt;] -         TraversableSerializer should implement compatibility methods
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6848&quot;&gt;FLINK-6848&lt;/a&gt;] -         Extend the managed state docs with a Scala example
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6853&quot;&gt;FLINK-6853&lt;/a&gt;] -         Migrating from Flink 1.1 fails for FlinkCEP
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6869&quot;&gt;FLINK-6869&lt;/a&gt;] -         Scala serializers do not have the serialVersionUID specified
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6875&quot;&gt;FLINK-6875&lt;/a&gt;] -         Remote DataSet API job submission timing out
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6881&quot;&gt;FLINK-6881&lt;/a&gt;] -         Creating a table from a POJO and defining a time attribute fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6883&quot;&gt;FLINK-6883&lt;/a&gt;] -         Serializer for collection of Scala case classes are generated with different anonymous class names in 1.3
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6886&quot;&gt;FLINK-6886&lt;/a&gt;] -         Fix Timestamp field can not be selected in event time case when  toDataStream[T], `T` not a `Row` Type.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6896&quot;&gt;FLINK-6896&lt;/a&gt;] -         Creating a table from a POJO and use table sink to output fail
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6899&quot;&gt;FLINK-6899&lt;/a&gt;] -         Wrong state array size in NestedMapsStateTable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6914&quot;&gt;FLINK-6914&lt;/a&gt;] -         TrySerializer#ensureCompatibility causes StackOverflowException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6915&quot;&gt;FLINK-6915&lt;/a&gt;] -         EnumValueSerializer broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6921&quot;&gt;FLINK-6921&lt;/a&gt;] -         EnumValueSerializer cannot properly handle appended enum values
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6922&quot;&gt;FLINK-6922&lt;/a&gt;] -         Enum(Value)SerializerConfigSnapshot uses Java serialization to store enum values
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6930&quot;&gt;FLINK-6930&lt;/a&gt;] -         Selecting window start / end on row-based Tumble/Slide window causes NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6932&quot;&gt;FLINK-6932&lt;/a&gt;] -         Update the inaccessible Dataflow Model paper link
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6941&quot;&gt;FLINK-6941&lt;/a&gt;] -         Selecting window start / end on over window causes field not resolve exception
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6948&quot;&gt;FLINK-6948&lt;/a&gt;] -         EnumValueSerializer cannot handle removed enum values
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;        Improvement
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5354&quot;&gt;FLINK-5354&lt;/a&gt;] -         Split up Table API documentation into multiple pages 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6038&quot;&gt;FLINK-6038&lt;/a&gt;] -         Add deep links to Apache Bahir Flink streaming connector documentations
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6796&quot;&gt;FLINK-6796&lt;/a&gt;] -         Allow setting the user code class loader for AbstractStreamOperatorTestHarness
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6803&quot;&gt;FLINK-6803&lt;/a&gt;] -         Add test for PojoSerializer when Pojo changes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6859&quot;&gt;FLINK-6859&lt;/a&gt;] -         StateCleaningCountTrigger should not delete timer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6929&quot;&gt;FLINK-6929&lt;/a&gt;] -         Add documentation for Table API OVER windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6952&quot;&gt;FLINK-6952&lt;/a&gt;] -         Add link to Javadocs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6748&quot;&gt;FLINK-6748&lt;/a&gt;] -         Table API / SQL Docs: Table API Page
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;        Test
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6830&quot;&gt;FLINK-6830&lt;/a&gt;] -         Add ITTests for savepoint migration from 1.3
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6320&quot;&gt;FLINK-6320&lt;/a&gt;] -         Flakey JobManagerHAJobGraphRecoveryITCase
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6744&quot;&gt;FLINK-6744&lt;/a&gt;] -         Flaky ExecutionGraphSchedulingTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6913&quot;&gt;FLINK-6913&lt;/a&gt;] -         Instable StatefulJobSavepointMigrationITCase.testRestoreSavepoint
&lt;/li&gt;
&lt;/ul&gt;

</description>
<pubDate>Fri, 23 Jun 2017 18:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2017/06/23/release-1.3.1.html</link>
<guid isPermaLink="true">/news/2017/06/23/release-1.3.1.html</guid>
</item>

<item>
<title>Apache Flink 1.3.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is pleased to announce the 1.3.0 release. Over the past 4 months, the Flink community has been working hard to resolve more than 680 issues. See the &lt;a href=&quot;/blog/release_1.3.0-changelog.html&quot;&gt;complete changelog&lt;/a&gt; for more detail.&lt;/p&gt;

&lt;p&gt;This is the fourth major release in the 1.x.y series. It is API compatible with the other 1.x.y releases for APIs annotated with the @Public annotation.&lt;/p&gt;

&lt;p&gt;Users can now expect Flink releases on a four-month cycle. At the beginning of the 1.3 &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Flink+Release+and+Feature+Plan&quot;&gt;release cycle&lt;/a&gt;, the community decided to follow a strict &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Time-based+releases&quot;&gt;time-based release model&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We encourage everyone to download the release and check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/&quot;&gt;documentation&lt;/a&gt;. Feedback through the &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink mailing lists&lt;/a&gt; is, as always, gladly encouraged!&lt;/p&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;. Some highlights of the release are listed below.&lt;/p&gt;

&lt;h1 id=&quot;large-state-handlingrecovery&quot;&gt;Large State Handling/Recovery&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Incremental Checkpointing for RocksDB&lt;/strong&gt;: It is now possible to checkpoint only the difference from the previous successful checkpoint, rather than checkpointing the entire application state. This speeds up checkpointing and saves disk space, because the individual checkpoints are smaller. (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5053&quot;&gt;FLINK-5053&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Asynchronous snapshots for heap-based state backends&lt;/strong&gt;: The filesystem and memory statebackends now also support asynchronous snapshots using a copy-on-write HashMap implementation. Asynchronous snapshotting makes Flink more resilient to slow storage systems and expensive serialization. The time an operator blocks on a snapshot is reduced to a minimum (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6048&quot;&gt;FLINK-6048&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5715&quot;&gt;FLINK-5715&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Allow upgrades to state serializers:&lt;/strong&gt; Users can now upgrade serializers, while keeping their application state. One use case of this is upgrading custom serializers used for managed operator state/keyed state. Also, registration order for POJO types/Kryo types is now no longer fixed (&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html#custom-serialization-for-managed-state&quot;&gt;Documentation&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6178&quot;&gt;FLINK-6178&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Recover job state at the granularity of operator&lt;/strong&gt;: Before Flink 1.3, operator state was bound to Flink’s internal “Task” representation. This made it hard to change a job’s topology while keeping its state around. With this change, users are allowed to do more topology changes (un-chain operators) by restoring state into logical operators instead of “Tasks” (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5892&quot;&gt;FLINK-5892&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Fine-grained recovery&lt;/strong&gt; (beta): Instead of restarting the complete ExecutionGraph in case of a task failure, Flink is now able to restart only the affected subgraph and thereby significantly decrease recovery time (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4256&quot;&gt;FLINK-4256&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;datastream-api&quot;&gt;DataStream API&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Side Outputs&lt;/strong&gt;: This change allows users to have more than one output stream for an operator. Operator metadata, internal system information (debugging, performance etc.) or rejected/late elements are potential use-cases for this new API feature. &lt;strong&gt;The Window operator is now using this new feature for late window elements&lt;/strong&gt; (&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/side_output.html&quot;&gt;Side Outputs Documentation&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4460&quot;&gt;FLINK-4460&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Union Operator State&lt;/strong&gt;: Flink 1.2.0 introduced broadcast state functionality, but this had not yet been exposed via a public API. Flink 1.3.0 provides the Union Operator State API for exposing broadcast operator state. The union state will send the entire state across all parallel instances to each instance on restore, giving each operator a full view of the state (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5991&quot;&gt;FLINK-5991&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Per-Window State&lt;/strong&gt;: Previously, the state that a WindowFunction or ProcessWindowFunction could access was scoped to the key of the window but not the window itself. With this new feature, users can keep window state independent of the key (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5929&quot;&gt;FLINK-5929&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
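&lt;p&gt;To make the difference between the two operator state redistribution modes concrete, here is a minimal Python sketch. It is an illustration only: the function names are hypothetical, and the round-robin split merely stands in for whatever even-split partitioning Flink applies internally.&lt;/p&gt;

```python
def even_split_restore(snapshots, new_parallelism):
    """Spread the checkpointed list items across the new parallel instances."""
    items = [item for snapshot in snapshots for item in snapshot]
    # Round-robin split (an assumption made for illustration).
    return [items[i::new_parallelism] for i in range(new_parallelism)]


def union_restore(snapshots, new_parallelism):
    """Hand every new parallel instance the complete union of all items."""
    items = [item for snapshot in snapshots for item in snapshot]
    return [list(items) for _ in range(new_parallelism)]


if __name__ == "__main__":
    # State checkpointed by two parallel instances, restored with parallelism 3.
    checkpoint = [["a", "b"], ["c"]]
    print("even split:", even_split_restore(checkpoint, 3))
    print("union:     ", union_restore(checkpoint, 3))
```

&lt;p&gt;With union redistribution, each instance receives the full state on restore and decides for itself which entries to keep.&lt;/p&gt;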

&lt;h1 id=&quot;deployment-and-tooling&quot;&gt;Deployment and Tooling&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Flink HistoryServer&lt;/strong&gt;: Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/historyserver.html&quot;&gt;HistoryServer&lt;/a&gt; now allows you to query the status and statistics of completed jobs that have been archived by a JobManager (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1579&quot;&gt;FLINK-1579&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Watermark Monitoring in Web Front-end&lt;/strong&gt;: For easier diagnosis of watermark issues, the Flink JobManager front-end now provides a new tab to track the watermark of each operator (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3427&quot;&gt;FLINK-3427&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Datadog HTTP Metrics Reporter&lt;/strong&gt;: Datadog is a widely-used metrics system, and Flink now offers a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/metrics.html#datadog-orgapacheflinkmetricsdatadogdatadoghttpreporter&quot;&gt;Datadog reporter&lt;/a&gt; that contacts the Datadog http endpoint directly (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6013&quot;&gt;FLINK-6013&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Network Buffer Configuration&lt;/strong&gt;: We finally got rid of the tedious network buffer configuration and replaced it with a more generic approach. First of all, you may now follow the idiom “more is better” without any penalty on the latency which could previously occur due to excessive buffering in incoming and outgoing channels. Secondly, instead of defining an absolute number of network buffers, we now use fractions of the available JVM memory (10% by default). This should cover more use cases by default and may also be tweaked by defining a minimum and maximum size.&lt;/p&gt;

    &lt;p&gt;→ See &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#configuring-the-network-buffers&quot;&gt;Configuring the Network Buffers&lt;/a&gt; in the Flink documentation.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
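&lt;p&gt;The fraction-based sizing of network buffers can be sketched as a simple clamp. This is a rough illustration under stated assumptions: the function name is hypothetical, the 10% default comes from the text above, and the minimum (64 MB) and maximum (1 GB) bounds used here are assumed values; consult the linked configuration page for the actual keys and defaults.&lt;/p&gt;

```python
MB = 1024 * 1024
GB = 1024 * MB


def network_buffer_memory(jvm_memory_bytes, fraction=0.1, min_bytes=64 * MB, max_bytes=1 * GB):
    """Return the bytes reserved for network buffers (illustrative sketch).

    The fraction of the available JVM memory is clamped into the
    inclusive interval from min_bytes to max_bytes.
    """
    wanted = jvm_memory_bytes * fraction
    return int(min(max(wanted, min_bytes), max_bytes))


if __name__ == "__main__":
    for total in (100 * MB, 2 * GB, 64 * GB):
        print(total, "bytes of JVM memory, network buffers:", network_buffer_memory(total))
```

&lt;p&gt;Small JVMs fall back to the minimum, very large ones are capped at the maximum, and everything in between scales with the available memory.&lt;/p&gt;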

&lt;h1 id=&quot;table-api--sql&quot;&gt;Table API / SQL&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Support for Retractions in Table API / SQL&lt;/strong&gt;: As part of our endeavor to support continuous queries on &lt;a href=&quot;http://flink.apache.org/news/2017/04/04/dynamic-tables.html&quot;&gt;Dynamic Tables&lt;/a&gt;, Retraction is an important building block that will enable a whole range of new applications which require updating previously-emitted results. Examples for such use cases are computation of early results for long-running windows, updates due to late arriving data, or maintaining constantly changing results similar to materialized views in relational database systems. Flink 1.3.0 supports retraction for non-windowed aggregates. Results with updates can be either converted into a DataStream or materialized to external data stores using TableSinks with upsert or retraction support.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Extended support for aggregations in Table API / SQL&lt;/strong&gt;: With Flink 1.3.0, the Table API and SQL support many more types of aggregations, including
    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;GROUP BY window aggregations in SQL (via the window functions &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6011&quot;&gt;TUMBLE, HOP, and SESSION windows&lt;/a&gt;) for both batch and streaming.&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;SQL OVER window aggregations (only for streaming)&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Non-windowed aggregations (in streaming with retractions).&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;User-defined aggregation functions for custom aggregation logic.&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;External catalog support&lt;/strong&gt;: The Table API &amp;amp; SQL now allow registering external catalogs. Table API and SQL queries can then access table sources and their schemas from the external catalogs without registering those tables one by one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;→ See &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table_api.html#group-windows&quot;&gt;the Flink documentation&lt;/a&gt; for details about these features.&lt;/p&gt;
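&lt;p&gt;The idea behind retractions for non-windowed aggregates can be illustrated with a small Python sketch. This is a conceptual model only, not the Table API: the function name and the message layout (a flag followed by key and count) are assumptions made for illustration.&lt;/p&gt;

```python
from collections import defaultdict


def count_per_key_with_retractions(events):
    """Emit a changelog for a running COUNT grouped by key.

    Each message is (is_add, key, count). Before a new count is emitted for
    a key, the previously emitted count for that key is retracted.
    """
    counts = defaultdict(int)
    changelog = []
    for key in events:
        previous = counts[key]
        if previous != 0:
            # Withdraw the result that was emitted earlier for this key.
            changelog.append((False, key, previous))
        counts[key] = previous + 1
        changelog.append((True, key, previous + 1))
    return changelog


if __name__ == "__main__":
    for message in count_per_key_with_retractions(["a", "b", "a"]):
        print(message)
```

&lt;p&gt;A sink with retraction support applies both kinds of messages, whereas an upsert sink could simply overwrite the stored count for each key.&lt;/p&gt;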

&lt;div class=&quot;alert alert-warning&quot;&gt;
  The Table API / SQL documentation is currently being reworked. The community plans to publish the updated docs in the week of June 5th.
&lt;/div&gt;

&lt;h1 id=&quot;connectors&quot;&gt;Connectors&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;ElasticSearch 5.x support&lt;/strong&gt;: The ElasticSearch connectors have been restructured to have a common base module and specific modules for ES 1, 2 and 5, similar to how the Kafka connectors are organized. This will make fixes and future improvements available across all ES versions (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4988&quot;&gt;FLINK-4988&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Allow rescaling the Kinesis Consumer&lt;/strong&gt;: Flink 1.2.0 introduced rescalable state for DataStream programs. With Flink 1.3, the Kinesis Consumer also makes use of that engine feature (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4821&quot;&gt;FLINK-4821&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Transparent shard discovery for Kinesis Consumer&lt;/strong&gt;: The Kinesis consumer can now discover new shards without failing / restarting jobs when a resharding is happening (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4577&quot;&gt;FLINK-4577&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Allow setting custom start positions for the Kafka consumer&lt;/strong&gt;: With this change, you can instruct Flink’s Kafka consumer to start reading messages from a specific offset (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3123&quot;&gt;FLINK-3123&lt;/a&gt;) or earliest / latest offset (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4280&quot;&gt;FLINK-4280&lt;/a&gt;) without respecting committed offsets in Kafka.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Allow opting out of offset committing for the Kafka consumer&lt;/strong&gt;: By default, Flink’s Kafka consumer commits the offsets to the Kafka broker once a checkpoint has been completed. This change allows users to disable this mechanism (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3398&quot;&gt;FLINK-3398&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;cep-library&quot;&gt;CEP Library&lt;/h1&gt;

&lt;p&gt;The CEP library has been greatly enhanced and is now able to accommodate more use-cases out-of-the-box (expressivity enhancements), make more efficient use of the available resources, and adjust to changing runtime conditions, all without breaking backwards compatibility of operator state.&lt;/p&gt;

&lt;p&gt;Please note that the API of the CEP library has been updated with this release.&lt;/p&gt;

&lt;p&gt;Below are some of the main features of the revamped CEP library:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Make CEP operators rescalable&lt;/strong&gt;: Flink 1.2.0 introduced rescalable state for DataStream programs. With Flink 1.3, the CEP library also makes use of that engine feature (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5420&quot;&gt;FLINK-5420&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;New operators for the CEP library&lt;/strong&gt;:&lt;/p&gt;

    &lt;ul&gt;
      &lt;li&gt;
        &lt;p&gt;Quantifiers (*,+,?) for the pattern API (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3318&quot;&gt;FLINK-3318&lt;/a&gt;)&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Support for different continuity requirements (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6208&quot;&gt;FLINK-6208&lt;/a&gt;)&lt;/p&gt;
      &lt;/li&gt;
      &lt;li&gt;
        &lt;p&gt;Support for iterative conditions (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6197&quot;&gt;FLINK-6197&lt;/a&gt;)&lt;/p&gt;
      &lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;gelly-library&quot;&gt;Gelly Library&lt;/h1&gt;

&lt;ul&gt;
  &lt;li&gt;Unified driver for running Gelly examples (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4949&quot;&gt;FLINK-4949&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;PageRank algorithm for directed graphs (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4896&quot;&gt;FLINK-4896&lt;/a&gt;).&lt;/li&gt;
  &lt;li&gt;Add Circulant and Echo graph generators (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6393&quot;&gt;FLINK-6393&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;known-issues&quot;&gt;Known Issues&lt;/h1&gt;

&lt;div class=&quot;alert alert-warning&quot;&gt;
  There are two &lt;strong&gt;known issues&lt;/strong&gt; in Flink 1.3.0. Both will be addressed in the &lt;i&gt;1.3.1&lt;/i&gt; release.
  &lt;br /&gt;
  &lt;ul&gt;
  	&lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6783&quot;&gt;FLINK-6783&lt;/a&gt;: Wrongly extracted TypeInformations for &lt;code&gt;WindowedStream::aggregate&lt;/code&gt;&lt;/li&gt;
  	&lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6775&quot;&gt;FLINK-6775&lt;/a&gt;: StateDescriptor cannot be shared by multiple subtasks&lt;/li&gt;
  &lt;/ul&gt; 
&lt;/div&gt;

&lt;h1 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h1&gt;

&lt;p&gt;According to git shortlog, the following 103 people contributed to the 1.3.0 release. Thank you to all contributors!&lt;/p&gt;

&lt;p&gt;Addison Higham, Alexey Diomin, Aljoscha Krettek, Andrea Sella, Andrey Melentyev, Anton Mushin, barcahead, biao.liub, Bowen Li, Chen Qin, Chico Sokol, David Anderson, Dawid Wysakowicz, DmytroShkvyra, Fabian Hueske, Fabian Wollert, fengyelei, Flavio Pompermaier, FlorianFan, Fokko Driesprong, Geoffrey Mon, godfreyhe, gosubpl, Greg Hogan, guowei.mgw, hamstah, Haohui Mai, Hequn Cheng, hequn.chq, heytitle, hongyuhong, Jamie Grier, Jark Wu, jingzhang, Jinkui Shi, Jin Mingjian, Joerg Schad, Joshua Griffith, Jürgen Thomann, kaibozhou, Kathleen Sharp, Ken Geis, kkloudas, Kurt Young, lincoln-lil, lingjinjiang, liuyuzhong7, Lorenz Buehmann, manuzhang, Marc Tremblay, Mauro Cortellazzi, Max Kuklinski, mengji.fy, Mike Dias, mtunique, Nico Kruber, Omar Erminy, Patrick Lucas, paul, phoenixjiangnan, rami-alisawi, Ramkrishna, Rick Cox, Robert Metzger, Rodrigo Bonifacio, rtudoran, Seth Wiesman, Shaoxuan Wang, shijinkui, shuai.xus, Shuyi Chen, spkavuly, Stefano Bortoli, Stefan Richter, Stephan Ewen, Stephen Gran, sunjincheng121, tedyu, Till Rohrmann, tonycox, Tony Wei, twalthr, Tzu-Li (Gordon) Tai, Ufuk Celebi, Ventura Del Monte, Vijay Srinivasaraghavan, WangTaoTheTonic, wenlong.lwl, xccui, xiaogang.sxg, Xpray, zcb, zentol, zhangminglei, Zhenghua Gao, Zhijiang, Zhuoluo Yang, zjureel, Zohar Mizrahi, 士远, 槿瑜, 淘江, 金竹&lt;/p&gt;

</description>
<pubDate>Thu, 01 Jun 2017 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2017/06/01/release-1.3.0.html</link>
<guid isPermaLink="true">/news/2017/06/01/release-1.3.0.html</guid>
</item>

<item>
<title>Introducing Docker Images for Apache Flink</title>
<description>&lt;p&gt;For some time, the Apache Flink community has provided scripts to build a Docker image to run Flink. Now, starting with version 1.2.1, Flink will have a &lt;a href=&quot;https://hub.docker.com/r/_/flink/&quot;&gt;Docker image&lt;/a&gt; on the Docker Hub. This image is maintained by the Flink community and curated by the &lt;a href=&quot;https://github.com/docker-library/official-images&quot;&gt;Docker&lt;/a&gt; team to ensure it meets the quality standards for container images of the Docker community.&lt;/p&gt;

&lt;p&gt;A community-maintained way to run Apache Flink on Docker and other container runtimes and orchestrators is part of the ongoing effort by the Flink community to make Flink a first-class citizen of the container world.&lt;/p&gt;

&lt;p&gt;If you want to use the Docker image today you can get the latest version by running:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;docker pull flink
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;And to run a local Flink cluster with one TaskManager and the Web UI exposed on port 8081, run:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;docker run -t -p 8081:8081 flink local
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With this image there are various ways to start a Flink cluster, both locally and in a distributed environment. Take a look at the &lt;a href=&quot;https://hub.docker.com/r/_/flink/&quot;&gt;documentation&lt;/a&gt; that shows how to run a Flink cluster with multiple TaskManagers locally using Docker Compose or across multiple machines using Docker Swarm. You can also use the examples as a reference to create configurations for other platforms like Mesos and Kubernetes.&lt;/p&gt;

&lt;p&gt;While this announcement is an important milestone, it’s just the first step to help users run containerized Flink in production. There are &lt;a href=&quot;https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20component%20%3D%20Docker%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC&quot;&gt;improvements&lt;/a&gt; to be made in Flink itself and we will continue to improve these Docker images and the documentation and examples surrounding them.&lt;/p&gt;

&lt;p&gt;This is of course a team effort, so any contribution is welcome. The &lt;a href=&quot;https://github.com/docker-flink&quot;&gt;docker-flink&lt;/a&gt; GitHub organization hosts the source files to &lt;a href=&quot;https://github.com/docker-flink/docker-flink&quot;&gt;generate the images&lt;/a&gt; and the &lt;a href=&quot;https://github.com/docker-flink/docs/tree/master/flink&quot;&gt;documentation&lt;/a&gt; that is presented alongside the images on Docker Hub.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Disclaimer: The docker images are provided as a community project by individuals on a best-effort basis. They are not official releases by the Apache Flink PMC.&lt;/em&gt;&lt;/p&gt;
</description>
<pubDate>Tue, 16 May 2017 11:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2017/05/16/official-docker-image.html</link>
<guid isPermaLink="true">/news/2017/05/16/official-docker-image.html</guid>
</item>

<item>
<title>Apache Flink 1.2.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.2 series.&lt;/p&gt;

&lt;p&gt;This release includes many critical fixes for Flink 1.2.0. The list below details all fixes.&lt;/p&gt;

&lt;p&gt;We highly recommend that all users upgrade to Flink 1.2.1.&lt;/p&gt;

&lt;p&gt;Please note that there are two unresolved major issues in Flink 1.2.1 and 1.2.0:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6353&quot;&gt;FLINK-6353&lt;/a&gt; Restoring using CheckpointedRestoring does not work from 1.2 to 1.2&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6188&quot;&gt;FLINK-6188&lt;/a&gt; Some setParallelism() methods can’t cope with default parallelism&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.2.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.2.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.2.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Release Notes - Flink - Version 1.2.1&lt;/h2&gt;

&lt;h3&gt;        Sub-task
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5546&quot;&gt;FLINK-5546&lt;/a&gt;] -         java.io.tmpdir setted as project build directory in surefire plugin
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5640&quot;&gt;FLINK-5640&lt;/a&gt;] -         configure the explicit Unit Test file suffix
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5723&quot;&gt;FLINK-5723&lt;/a&gt;] -         Use &amp;quot;Used&amp;quot; instead of &amp;quot;Initial&amp;quot; to make taskmanager tag more readable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5825&quot;&gt;FLINK-5825&lt;/a&gt;] -         In yarn mode, a small pic can not be loaded
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;        Bug
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4813&quot;&gt;FLINK-4813&lt;/a&gt;] -         Having flink-test-utils as a dependency outside Flink fails the build
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4848&quot;&gt;FLINK-4848&lt;/a&gt;] -         keystoreFilePath should be checked against null in SSLUtils#createSSLServerContext
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5628&quot;&gt;FLINK-5628&lt;/a&gt;] -         CheckpointStatsTracker implements Serializable but isn&amp;#39;t
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5644&quot;&gt;FLINK-5644&lt;/a&gt;] -         Task#lastCheckpointSize metric broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5650&quot;&gt;FLINK-5650&lt;/a&gt;] -         Flink-python tests executing cost too long time
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5652&quot;&gt;FLINK-5652&lt;/a&gt;] -         Memory leak in AsyncDataStream
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5669&quot;&gt;FLINK-5669&lt;/a&gt;] -         flink-streaming-contrib DataStreamUtils.collect in local environment mode fails when offline
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5678&quot;&gt;FLINK-5678&lt;/a&gt;] -         User-defined TableFunctions do not support all types of parameters
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5699&quot;&gt;FLINK-5699&lt;/a&gt;] -         Cancel with savepoint fails with a NPE if savepoint target directory not set
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5701&quot;&gt;FLINK-5701&lt;/a&gt;] -         FlinkKafkaProducer should check asyncException on checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5708&quot;&gt;FLINK-5708&lt;/a&gt;] -         we should remove duplicated configuration options 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5732&quot;&gt;FLINK-5732&lt;/a&gt;] -         Java quick start mvn command line is incorrect
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5749&quot;&gt;FLINK-5749&lt;/a&gt;] -             unset HADOOP_HOME and HADOOP_CONF_DIR to avoid env in build machine failing the UT and IT
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5751&quot;&gt;FLINK-5751&lt;/a&gt;] -         404 in documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5771&quot;&gt;FLINK-5771&lt;/a&gt;] -         DelimitedInputFormat does not correctly handle multi-byte delimiters
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5773&quot;&gt;FLINK-5773&lt;/a&gt;] -         Cannot cast scala.util.Failure to org.apache.flink.runtime.messages.Acknowledge
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5806&quot;&gt;FLINK-5806&lt;/a&gt;] -         TaskExecutionState toString format have wrong key
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5814&quot;&gt;FLINK-5814&lt;/a&gt;] -         flink-dist creates wrong symlink when not used with cleaned before
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5817&quot;&gt;FLINK-5817&lt;/a&gt;] -         Fix test concurrent execution failure by test dir conflicts.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5828&quot;&gt;FLINK-5828&lt;/a&gt;] -         BlobServer create cache dir has concurrency safety problem
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5885&quot;&gt;FLINK-5885&lt;/a&gt;] -         Java code snippet instead of scala in documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5907&quot;&gt;FLINK-5907&lt;/a&gt;] -         RowCsvInputFormat bug on parsing tsv
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5934&quot;&gt;FLINK-5934&lt;/a&gt;] -         Scheduler in ExecutionGraph null if failure happens in ExecutionGraph.restoreLatestCheckpointedState
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5940&quot;&gt;FLINK-5940&lt;/a&gt;] -         ZooKeeperCompletedCheckpointStore cannot handle broken state handles
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5942&quot;&gt;FLINK-5942&lt;/a&gt;] -         Harden ZooKeeperStateHandleStore to deal with corrupted data
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5945&quot;&gt;FLINK-5945&lt;/a&gt;] -         Close function in OuterJoinOperatorBase#executeOnCollections
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5949&quot;&gt;FLINK-5949&lt;/a&gt;] -         Flink on YARN checks for Kerberos credentials for non-Kerberos authentication methods
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5962&quot;&gt;FLINK-5962&lt;/a&gt;] -         Cancel checkpoint canceller tasks in CheckpointCoordinator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5965&quot;&gt;FLINK-5965&lt;/a&gt;] -         Typo on DropWizard wrappers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5972&quot;&gt;FLINK-5972&lt;/a&gt;] -         Don&amp;#39;t allow shrinking merging windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5985&quot;&gt;FLINK-5985&lt;/a&gt;] -         Flink treats every task as stateful (making topology changes impossible)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6000&quot;&gt;FLINK-6000&lt;/a&gt;] -         Can not start HA cluster with start-cluster.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6001&quot;&gt;FLINK-6001&lt;/a&gt;] -         NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and allowedLateness
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6002&quot;&gt;FLINK-6002&lt;/a&gt;] -         Documentation: &amp;#39;MacOS X&amp;#39; under &amp;#39;Download and Start Flink&amp;#39; in Quickstart page is not rendered correctly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6006&quot;&gt;FLINK-6006&lt;/a&gt;] -         Kafka Consumer can lose state if queried partition list is incomplete on restore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6025&quot;&gt;FLINK-6025&lt;/a&gt;] -         User code ClassLoader not used when KryoSerializer fallbacks to serialization for copying
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6051&quot;&gt;FLINK-6051&lt;/a&gt;] -         Wrong metric scope names in documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6084&quot;&gt;FLINK-6084&lt;/a&gt;] -         Cassandra connector does not declare all dependencies
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6133&quot;&gt;FLINK-6133&lt;/a&gt;] -         fix build status in README.md
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6170&quot;&gt;FLINK-6170&lt;/a&gt;] -         Some checkpoint metrics rely on latest stat snapshot
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6181&quot;&gt;FLINK-6181&lt;/a&gt;] -         Zookeeper scripts use invalid regex
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6182&quot;&gt;FLINK-6182&lt;/a&gt;] -         Fix possible NPE in SourceStreamTask
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6183&quot;&gt;FLINK-6183&lt;/a&gt;] -         TaskMetricGroup may not be cleanup when Task.run() is never called or exits early
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6184&quot;&gt;FLINK-6184&lt;/a&gt;] -         Buffer metrics can cause NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6203&quot;&gt;FLINK-6203&lt;/a&gt;] -         DataSet Transformations
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6207&quot;&gt;FLINK-6207&lt;/a&gt;] -         Duplicate type serializers for async snapshots of CopyOnWriteStateTable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6308&quot;&gt;FLINK-6308&lt;/a&gt;] -         Task managers are not attaching to job manager on macos
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;        Improvement
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4326&quot;&gt;FLINK-4326&lt;/a&gt;] -         Flink start-up scripts should optionally start services on the foreground
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5217&quot;&gt;FLINK-5217&lt;/a&gt;] -         Deprecated interface Checkpointed make clear suggestion
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5331&quot;&gt;FLINK-5331&lt;/a&gt;] -         PythonPlanBinderTest idling extremely long
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5581&quot;&gt;FLINK-5581&lt;/a&gt;] -         Improve Kerberos security related documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5639&quot;&gt;FLINK-5639&lt;/a&gt;] -         Clarify License implications of RabbitMQ Connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5680&quot;&gt;FLINK-5680&lt;/a&gt;] -         Document env.ssh.opts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5681&quot;&gt;FLINK-5681&lt;/a&gt;] -         Make ReaperThread for SafetyNetCloseableRegistry a singleton
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5702&quot;&gt;FLINK-5702&lt;/a&gt;] -         Kafka Producer docs should warn if using setLogFailuresOnly, at-least-once is compromised
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5705&quot;&gt;FLINK-5705&lt;/a&gt;] -         webmonitor&amp;#39;s request/response use UTF-8 explicitly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5713&quot;&gt;FLINK-5713&lt;/a&gt;] -         Protect against NPE in WindowOperator window cleanup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5721&quot;&gt;FLINK-5721&lt;/a&gt;] -         Add FoldingState to State Documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5800&quot;&gt;FLINK-5800&lt;/a&gt;] -         Make sure that the CheckpointStreamFactory is instantiated once per operator only
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5805&quot;&gt;FLINK-5805&lt;/a&gt;] -         improve docs for ProcessFunction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5807&quot;&gt;FLINK-5807&lt;/a&gt;] -         improved wording for doc home page
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5837&quot;&gt;FLINK-5837&lt;/a&gt;] -         improve readability of the queryable state docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5876&quot;&gt;FLINK-5876&lt;/a&gt;] -         Mention Scala type fallacies for queryable state client serializers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5877&quot;&gt;FLINK-5877&lt;/a&gt;] -         Fix Scala snippet in Async I/O API doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5894&quot;&gt;FLINK-5894&lt;/a&gt;] -         HA docs are misleading re: state backends
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5895&quot;&gt;FLINK-5895&lt;/a&gt;] -         Reduce logging aggressiveness of FileSystemSafetyNet
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5938&quot;&gt;FLINK-5938&lt;/a&gt;] -         Replace ExecutionContext by Executor in Scheduler
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6212&quot;&gt;FLINK-6212&lt;/a&gt;] -         Missing reference to flink-avro dependency
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;        New Feature
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6139&quot;&gt;FLINK-6139&lt;/a&gt;] -         Documentation for building / preparing Flink for MapR
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;        Task
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2883&quot;&gt;FLINK-2883&lt;/a&gt;] -         Add documentation to forbid key-modifying ReduceFunction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3903&quot;&gt;FLINK-3903&lt;/a&gt;] -         Homebrew Installation
&lt;/li&gt;
&lt;/ul&gt;

</description>
<pubDate>Wed, 26 Apr 2017 20:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2017/04/26/release-1.2.1.html</link>
<guid isPermaLink="true">/news/2017/04/26/release-1.2.1.html</guid>
</item>

<item>
<title>Continuous Queries on Dynamic Tables</title>
<description>&lt;h4 id=&quot;analyzing-data-streams-with-sql&quot;&gt;Analyzing Data Streams with SQL&lt;/h4&gt;

&lt;p&gt;More and more companies are adopting stream processing and are migrating existing batch applications to streaming or implementing streaming solutions for new use cases. Many of those applications focus on analyzing streaming data. The data streams that are analyzed come from a wide variety of sources such as database transactions, clicks, sensor measurements, or IoT devices.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/streams.png&quot; style=&quot;width:45%;margin:10px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;Apache Flink is very well suited to power streaming analytics applications because it supports event-time semantics and stateful exactly-once processing, and achieves high throughput and low latency at the same time. Due to these features, Flink is able to compute exact and deterministic results from high-volume input streams in near real-time while providing exactly-once semantics in case of failures.&lt;/p&gt;

&lt;p&gt;Flink’s core API for stream processing, the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html&quot;&gt;DataStream API&lt;/a&gt;, is very expressive and provides primitives for many common operations. Among other features, it offers highly customizable windowing logic, different state primitives with varying performance characteristics, hooks to register and react to timers, and tooling for efficient asynchronous requests to external systems. On the other hand, many stream analytics applications follow similar patterns and do not require the level of expressiveness provided by the DataStream API. They could be expressed in a more natural and concise way using a domain-specific language. As we all know, SQL is the de facto standard for data analytics. For streaming analytics, SQL would enable a larger pool of people to specify applications on data streams in less time. However, no open-source stream processor offers decent SQL support yet.&lt;/p&gt;

&lt;h2 id=&quot;why-is-sql-on-streams-a-big-deal&quot;&gt;Why is SQL on Streams a Big Deal?&lt;/h2&gt;

&lt;p&gt;SQL is the most widely used language for data analytics for many good reasons:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;SQL is declarative: You specify what you want but not how to compute it.&lt;/li&gt;
  &lt;li&gt;SQL can be effectively optimized: An optimizer figures out an efficient plan to compute your result.&lt;/li&gt;
  &lt;li&gt;SQL can be efficiently evaluated: The processing engine knows exactly what to compute and how to do so efficiently.&lt;/li&gt;
  &lt;li&gt;And finally, everybody knows SQL and many tools speak it.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So being able to process and analyze data streams with SQL makes stream processing technology available to many more users. Moreover, it significantly reduces the time and effort to define efficient stream analytics applications due to SQL’s declarative nature and potential to be automatically optimized.&lt;/p&gt;

&lt;p&gt;However, SQL (and the relational data model and algebra) were not designed with streaming data in mind. Relations are (multi-)sets and not infinite sequences of tuples. When executing a SQL query, conventional database systems and query engines read and process a data set that is completely available and produce a fixed-size result. In contrast, data streams continuously provide new records such that data arrives over time. Hence, streaming queries have to continuously process the arriving data and never “complete”.&lt;/p&gt;

&lt;p&gt;That being said, processing streams with SQL is not impossible. Some relational database systems feature eager maintenance of materialized views, which is similar to evaluating SQL queries on streams of data. A materialized view is defined as a SQL query just like a regular (virtual) view. However, the result of the query is actually stored (or materialized) in memory or on disk such that the view does not need to be computed on-the-fly when it is queried. To prevent a materialized view from becoming stale, the database system needs to update the view whenever its base relations (the tables referenced in its definition query) are modified. If we consider the changes on the view’s base relations as a stream of modifications (or as a changelog stream), it becomes obvious that materialized view maintenance and SQL on streams are related.&lt;/p&gt;
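
&lt;p&gt;The relationship between view maintenance and a changelog stream can be sketched in a few lines of plain Python. This is an illustration only, not Flink code and not how any particular database implements view maintenance: the view (a row count per key), the integer keys, the encoding of operations, and the names &lt;code&gt;apply_change&lt;/code&gt; and &lt;code&gt;recompute&lt;/code&gt; are assumptions made for the example.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from collections import defaultdict

# Changelog operations on the base table (hypothetical encoding).
INSERT, DELETE = 1, -1


def apply_change(view, change):
    # Incrementally maintain the view: SELECT key, COUNT(*) GROUP BY key.
    op, key = change
    view[key] += op
    if view[key] == 0:
        del view[key]  # no base rows left for this key


def recompute(base_rows):
    # Reference result: evaluate the same query from scratch.
    result = defaultdict(int)
    for key in base_rows:
        result[key] += 1
    return dict(result)


# Stream of modifications to the base table, as (operation, key) pairs.
changelog = [(INSERT, 1), (INSERT, 2), (INSERT, 1), (DELETE, 2), (INSERT, 3)]

view = defaultdict(int)
for change in changelog:
    apply_change(view, change)
materialized = dict(view)

# Base table contents after all modifications: keys 1, 1 and 3.
assert materialized == recompute([1, 1, 3])
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In this sketch, applying the changelog one modification at a time leaves the view in the same state as recomputing it from scratch over the final contents of the base table.&lt;/p&gt;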

&lt;h2 id=&quot;flinks-relational-apis-table-api-and-sql&quot;&gt;Flink’s Relational APIs: Table API and SQL&lt;/h2&gt;

&lt;p&gt;Since version 1.1.0 (released in August 2016), Flink features two semantically equivalent relational APIs, the language-embedded Table API (for Java and Scala) and standard SQL. Both APIs are designed as unified APIs for online streaming and historic batch data. This means that,&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;a query produces exactly the same result regardless of whether its input is static batch data or streaming data.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Unified APIs for stream and batch processing are important for several reasons. First of all, users only need to learn a single API to process static and streaming data. Moreover, the same query can be used to analyze batch and streaming data, which makes it possible to jointly analyze historic and live data in the same query. We haven’t achieved complete unification of batch and streaming semantics yet, but the community is making very good progress towards this goal.&lt;/p&gt;

&lt;p&gt;The following code snippet shows two equivalent Table API and SQL queries that compute a simple windowed aggregate on a stream of temperature sensor measurements. The syntax of the SQL query is based on &lt;a href=&quot;https://calcite.apache.org&quot;&gt;Apache Calcite’s&lt;/a&gt; syntax for &lt;a href=&quot;https://calcite.apache.org/docs/reference.html#grouped-window-functions&quot;&gt;grouped window functions&lt;/a&gt; and will be supported in version 1.3.0 of Flink.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;setStreamTimeCharacteristic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;TimeCharacteristic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;EventTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// define a table source to read sensor data (sensorId, time, room, temp)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sensorTable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;???&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// can be a CSV file, Kafka topic, database, or ...&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// register the table source&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;registerTableSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;sensors&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sensorTable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// Table API&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tapiResult&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scan&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;sensors&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;   &lt;span class=&quot;c1&quot;&gt;// scan sensors table&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tumble&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;over&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hour&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;rowtime&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// define 1-hour window&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;room&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;                           &lt;span class=&quot;c1&quot;&gt;// group by window and room&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;room&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;temp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;avg&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;avgTemp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// compute average temperature&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// SQL&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sqlResult&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; |SELECT room, TUMBLE_END(rowtime, INTERVAL &amp;#39;1&amp;#39; HOUR), AVG(temp) AS avgTemp&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; |FROM sensors&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; |GROUP BY TUMBLE(rowtime, INTERVAL &amp;#39;1&amp;#39; HOUR), room&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; |&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stripMargin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;As you can see, both APIs are tightly integrated with each other and Flink’s primary &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html&quot;&gt;DataStream&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/index.html&quot;&gt;DataSet&lt;/a&gt; APIs. A &lt;code&gt;Table&lt;/code&gt; can be generated from and converted to a &lt;code&gt;DataSet&lt;/code&gt; or &lt;code&gt;DataStream&lt;/code&gt;. Hence, it is easily possible to scan an external table source such as a database or &lt;a href=&quot;https://parquet.apache.org&quot;&gt;Parquet&lt;/a&gt; file, do some preprocessing with a Table API query, convert the result into a &lt;code&gt;DataSet&lt;/code&gt; and run a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/libs/gelly/index.html&quot;&gt;Gelly&lt;/a&gt; graph algorithm on it. The queries defined in the example above can also be used to process batch data by changing the execution environment.&lt;/p&gt;

&lt;p&gt;Internally, both APIs are translated into the same logical representation, optimized by Apache Calcite, and compiled into DataStream or DataSet programs. In fact, the optimization and translation process does not know whether a query was defined using the Table API or SQL. If you are curious about the details of the optimization process, have a look at &lt;a href=&quot;http://flink.apache.org/news/2016/05/24/stream-sql.html&quot;&gt;a blog post&lt;/a&gt; that we published last year. Since the Table API and SQL are equivalent in terms of semantics and only differ in syntax, we always refer to both APIs when we talk about SQL in this post.&lt;/p&gt;

&lt;p&gt;In its current state (version 1.2.0), Flink’s relational APIs support a limited set of relational operators on data streams, including projections, filters, and windowed aggregates. All supported operators have in common that they never update result records that have already been emitted. This is clearly not an issue for record-at-a-time operators such as projection and filter. However, it affects operators that collect and process multiple records, such as windowed aggregates. Since emitted results cannot be updated, input records that arrive after a result has been emitted have to be discarded in Flink 1.2.0.&lt;/p&gt;

&lt;p&gt;The limitations of the current version are acceptable for applications that emit data to storage systems such as Kafka topics, message queues, or files which only support append operations and no updates or deletes. Common use cases that follow this pattern are for example continuous ETL and stream archiving applications that persist streams to an archive or prepare data for further online (streaming) analysis or later offline analysis. Since it is not possible to update previously emitted results, these kinds of applications have to make sure that the emitted results are correct and will not need to be corrected in the future. The following figure illustrates such applications.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/query-append-out.png&quot; style=&quot;width:60%;margin:10px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;While queries that only support appends are useful for some kinds of applications and certain types of storage systems, there are many streaming analytics use cases that need to update results. This includes streaming applications that cannot discard late arriving records, need early results for (long-running) windowed aggregates, or require non-windowed aggregates. In each of these cases, previously emitted result records need to be updated. Result-updating queries often materialize their result to an external database or key-value store in order to make it accessible and queryable for external applications. Applications that implement this pattern are dashboards, reporting applications, or &lt;a href=&quot;http://2016.flink-forward.org/kb_sessions/joining-infinity-windowless-stream-processing-with-flink/&quot;&gt;other applications&lt;/a&gt;, which require timely access to continuously updated results. The following figure illustrates these kinds of applications.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/query-update-out.png&quot; style=&quot;width:60%;margin:10px&quot; /&gt;
&lt;/center&gt;

&lt;h2 id=&quot;continuous-queries-on-dynamic-tables&quot;&gt;Continuous Queries on Dynamic Tables&lt;/h2&gt;

&lt;p&gt;Support for queries that update previously emitted results is the next big step for Flink’s relational APIs. This feature is so important because it vastly increases the scope of the APIs and the range of supported use cases. Moreover, many of the newly supported use cases can be challenging to implement using the DataStream API.&lt;/p&gt;

&lt;p&gt;When adding support for result-updating queries, we must preserve the unified semantics for stream and batch inputs. We achieve this through the concept of &lt;em&gt;Dynamic Tables&lt;/em&gt;. A dynamic table is a table that is continuously updated and can be queried like a regular, static table. However, in contrast to a query on a batch table, which terminates and returns a static table as its result, a query on a dynamic table runs continuously and produces a table that is continuously updated depending on the modifications to the input table. Hence, the resulting table is a dynamic table as well. This concept is very similar to materialized view maintenance as we discussed before.&lt;/p&gt;

&lt;p&gt;Assuming we can run queries on dynamic tables which produce new dynamic tables, the next question is: how do streams and dynamic tables relate to each other? The answer is that streams can be converted into dynamic tables and dynamic tables can be converted into streams. The following figure shows the conceptual model of processing a relational query on a stream.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/stream-query-stream.png&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;First, the stream is converted into a dynamic table. The dynamic table is queried with a continuous query, which produces a new dynamic table. Finally, the resulting table is converted back into a stream. It is important to note that this is only the logical model and does not imply how the query is actually executed. In fact, a continuous query is internally translated into a conventional DataStream program.&lt;/p&gt;

&lt;p&gt;In the following, we describe the different steps of this model:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Defining a dynamic table on a stream,&lt;/li&gt;
  &lt;li&gt;Querying a dynamic table, and&lt;/li&gt;
  &lt;li&gt;Emitting a dynamic table.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2 id=&quot;defining-a-dynamic-table-on-a-stream&quot;&gt;Defining a Dynamic Table on a Stream&lt;/h2&gt;

&lt;p&gt;The first step of evaluating a SQL query on a dynamic table is to define a dynamic table on a stream. This means we have to specify how the records of a stream modify the dynamic table. The stream must carry records with a schema that is mapped to the relational schema of the table. There are two modes to define a dynamic table on a stream: &lt;em&gt;Append Mode&lt;/em&gt; and &lt;em&gt;Update Mode&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In append mode each stream record is an insert modification to the dynamic table. Hence, all records of a stream are appended to the dynamic table such that it is ever-growing and infinite in size. The following figure illustrates the append mode.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/append-mode.png&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;In update mode a stream record can represent an insert, update, or delete modification on the dynamic table (append mode is in fact a special case of update mode). When defining a dynamic table on a stream via update mode, we can specify a unique key attribute on the table. In that case, update and delete operations are performed with respect to the key attribute. The update mode is visualized in the following figure.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/replace-mode.png&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;
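
&lt;p&gt;The two modes can be sketched in plain Scala without any Flink dependency. This is a minimal illustration and not Flink’s implementation: the &lt;code&gt;Change&lt;/code&gt; types, the function names, and the sample records are all hypothetical and only model how stream records modify a table.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Hypothetical change records; these names are not part of the Flink API.
sealed trait Change
case class Insert(key: String, value: Int) extends Change
case class Update(key: String, value: Int) extends Change
case class Delete(key: String) extends Change

object DynamicTableModes extends App {
  // Append mode: every stream record is an insert, so the table only grows.
  def appendMode(stream: Seq[(String, Int)]): Vector[(String, Int)] =
    stream.foldLeft(Vector.empty[(String, Int)])((table, record) =&amp;gt; table :+ record)

  // Update mode: records insert, update, or delete rows with respect to a unique key.
  def updateMode(stream: Seq[Change]): Map[String, Int] =
    stream.foldLeft(Map.empty[String, Int]) {
      case (table, Insert(k, v)) =&amp;gt; table + (k -&amp;gt; v)
      case (table, Update(k, v)) =&amp;gt; table + (k -&amp;gt; v)
      case (table, Delete(k))    =&amp;gt; table - k
    }

  // prints Vector((A,1), (B,2), (A,3))
  println(appendMode(Seq(&amp;quot;A&amp;quot; -&amp;gt; 1, &amp;quot;B&amp;quot; -&amp;gt; 2, &amp;quot;A&amp;quot; -&amp;gt; 3)))
  // prints Map(A -&amp;gt; 3)
  println(updateMode(Seq(Insert(&amp;quot;A&amp;quot;, 1), Insert(&amp;quot;B&amp;quot;, 2), Update(&amp;quot;A&amp;quot;, 3), Delete(&amp;quot;B&amp;quot;))))
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Append mode keeps every record as its own row, whereas update mode keeps at most one row per key.&lt;/p&gt;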

&lt;h2 id=&quot;querying-a-dynamic-table&quot;&gt;Querying a Dynamic Table&lt;/h2&gt;

&lt;p&gt;Once we have defined a dynamic table, we can run a query on it. Since dynamic tables change over time, we have to define what it means to query a dynamic table. Let’s imagine we take a snapshot of a dynamic table at a specific point in time. This snapshot can be treated as a regular static batch table. We denote a snapshot of a dynamic table &lt;em&gt;A&lt;/em&gt; at a point &lt;em&gt;t&lt;/em&gt; as &lt;em&gt;A[t]&lt;/em&gt;. The snapshot can be queried with any SQL query. The query produces a regular static table as result. We denote the result of a query &lt;em&gt;q&lt;/em&gt; on a dynamic table &lt;em&gt;A&lt;/em&gt; at time &lt;em&gt;t&lt;/em&gt; as &lt;em&gt;q(A[t])&lt;/em&gt;. If we repeatedly compute the result of a query on snapshots of a dynamic table for progressing points in time, we obtain many static result tables which are changing over time and effectively constitute a dynamic table. We define the semantics of a query on a dynamic table as follows.&lt;/p&gt;

&lt;p&gt;A query &lt;em&gt;q&lt;/em&gt; on a dynamic table &lt;em&gt;A&lt;/em&gt; produces a dynamic table &lt;em&gt;R&lt;/em&gt;, which is at each point in time &lt;em&gt;t&lt;/em&gt; equivalent to the result of applying &lt;em&gt;q&lt;/em&gt; on &lt;em&gt;A[t]&lt;/em&gt;, i.e., &lt;em&gt;R[t] = q(A[t])&lt;/em&gt;. This definition implies that running the same query &lt;em&gt;q&lt;/em&gt; on a batch table and on a streaming table produces the same result. In the following, we show two examples to illustrate the semantics of queries on dynamic tables.&lt;/p&gt;

&lt;p&gt;In the figure below, we see a dynamic input table &lt;em&gt;A&lt;/em&gt; on the left side, which is defined in append mode. At time &lt;em&gt;t = 8&lt;/em&gt;, &lt;em&gt;A&lt;/em&gt; consists of six rows (colored in blue). At time &lt;em&gt;t = 9&lt;/em&gt; and &lt;em&gt;t = 12&lt;/em&gt;, one row is appended to &lt;em&gt;A&lt;/em&gt; (visualized in green and orange, respectively). We run a simple query on table &lt;em&gt;A&lt;/em&gt; which is shown in the center of the figure. The query groups by attribute &lt;em&gt;k&lt;/em&gt; and counts the records per group. On the right-hand side we see the result of query &lt;em&gt;q&lt;/em&gt; at time &lt;em&gt;t = 8&lt;/em&gt; (blue), &lt;em&gt;t = 9&lt;/em&gt; (green), and &lt;em&gt;t = 12&lt;/em&gt; (orange). At each point in time &lt;em&gt;t&lt;/em&gt;, the result table is equivalent to a batch query on the dynamic table &lt;em&gt;A&lt;/em&gt; at time &lt;em&gt;t&lt;/em&gt;.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/query-groupBy-cnt.png&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The query in this example is a simple grouped (but not windowed) aggregation query. Hence, the size of the result table depends on the number of distinct grouping keys of the input table. Moreover, it is worth noticing that the query continuously updates result rows that it had previously emitted instead of merely adding new rows.&lt;/p&gt;
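
&lt;p&gt;The definition &lt;em&gt;R[t] = q(A[t])&lt;/em&gt; can be made concrete with a small plain-Scala sketch that recomputes a grouped count on snapshots of an append-mode table. The rows and arrival times are made up for illustration and do not reproduce the figure. Recomputing from scratch is used here only to show the semantics, not how such a query would be executed.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;object SnapshotSemantics extends App {
  // Rows of the append-mode table A as (arrivalTime, id, k); values are hypothetical.
  val a = Seq((1, 1, &amp;quot;A&amp;quot;), (3, 2, &amp;quot;B&amp;quot;), (8, 4, &amp;quot;A&amp;quot;), (9, 5, &amp;quot;C&amp;quot;), (12, 6, &amp;quot;B&amp;quot;))

  // A[t]: all rows that arrived up to and including time t.
  def snapshot(t: Int): Seq[(Int, Int, String)] = a.filter(_._1 &amp;lt;= t)

  // q: group by k and count the rows per group, evaluated like a batch query.
  def q(rows: Seq[(Int, Int, String)]): Map[String, Int] =
    rows.groupBy(_._3).map { case (k, group) =&amp;gt; k -&amp;gt; group.size }

  // R[t] = q(A[t]) for progressing points in time.
  for (t &amp;lt;- Seq(8, 9, 12)) println(s&amp;quot;R[$t] = ${q(snapshot(t)).toSeq.sorted}&amp;quot;)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Between two snapshots, existing groups change their count and new groups appear, which mirrors the updates and inserts on the result table described above.&lt;/p&gt;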

&lt;p&gt;The second example shows a similar query which differs in one important aspect. In addition to grouping on the key attribute &lt;em&gt;k&lt;/em&gt;, the query also groups records into tumbling windows of five seconds, which means that it computes a count for each value of &lt;em&gt;k&lt;/em&gt; every five seconds. Again, we use Calcite’s &lt;a href=&quot;https://calcite.apache.org/docs/reference.html#grouped-window-functions&quot;&gt;group window functions&lt;/a&gt; to specify this query. On the left side of the figure we see the input table &lt;em&gt;A&lt;/em&gt; and how it changes over time in append mode. On the right we see the result table and how it evolves over time.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/query-groupBy-window-cnt.png&quot; style=&quot;width:80%;margin:10px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;In contrast to the result of the first example, the resulting table grows relative to the time, i.e., every five seconds new result rows are computed (given that the input table received more records in the last five seconds). While the non-windowed query (mostly) updates rows of the result table, the windowed aggregation query only appends new rows to the result table.&lt;/p&gt;
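
&lt;p&gt;The append-only behavior of the windowed query can be sketched in the same way. The following plain-Scala snippet assigns hypothetical &lt;em&gt;(timestamp, k)&lt;/em&gt; rows to tumbling windows of five seconds and counts per window and key. The sample rows and the name &lt;code&gt;windowSize&lt;/code&gt; are assumptions made for this illustration.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;object WindowedCount extends App {
  // (timestampSeconds, k) rows; values are hypothetical.
  val a = Seq((1, &amp;quot;A&amp;quot;), (2, &amp;quot;B&amp;quot;), (4, &amp;quot;A&amp;quot;), (6, &amp;quot;A&amp;quot;), (7, &amp;quot;C&amp;quot;), (11, &amp;quot;B&amp;quot;))
  val windowSize = 5

  // Assign each row to the tumbling window [start, start + windowSize) that contains it,
  // then count the rows per (window, k). Each result row is (windowEnd, k, cnt).
  val result = a
    .groupBy { case (ts, k) =&amp;gt; (ts / windowSize * windowSize, k) }
    .map { case ((start, k), rows) =&amp;gt; (start + windowSize, k, rows.size) }
    .toSeq
    .sorted

  // Rows of earlier windows are never changed; later input only adds rows for later windows.
  result.foreach(println)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;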

&lt;p&gt;Although this blog post focuses on the semantics of SQL queries on dynamic tables and not on how to efficiently process such a query, we’d like to point out that it is not possible to compute the complete result of a query from scratch whenever an input table is updated. Instead, the query is compiled into a streaming program which continuously updates its result based on the changes on its input. This implies that not all valid SQL queries are supported but only those that can be continuously, incrementally, and efficiently computed. We plan to discuss details about the evaluation of SQL queries on dynamic tables in a follow-up blog post.&lt;/p&gt;

&lt;h2 id=&quot;emitting-a-dynamic-table&quot;&gt;Emitting a Dynamic Table&lt;/h2&gt;

&lt;p&gt;Querying a dynamic table yields another dynamic table, which represents the query’s results. Depending on the query and its input tables, the result table is continuously modified by insert, update, and delete changes just like a regular database table. It might be a table with a single row, which is constantly updated, an insert-only table without update modifications, or anything in between.&lt;/p&gt;

&lt;p&gt;Traditional database systems use logs to rebuild tables in case of failures and for replication. There are different logging techniques, such as UNDO, REDO, and UNDO/REDO logging. In a nutshell, UNDO logs record the previous value of a modified element to revert incomplete transactions, REDO logs record the new value of a modified element to redo lost changes of completed transactions, and UNDO/REDO logs record the old and the new value of a changed element to undo incomplete transactions and redo lost changes of completed transactions. Based on the principles of these logging techniques, a dynamic table can be converted into two types of changelog streams, a &lt;em&gt;REDO Stream&lt;/em&gt; and a &lt;em&gt;REDO+UNDO Stream&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;A dynamic table is converted into a redo+undo stream by converting the modifications on the table into stream messages. An insert modification is emitted as an insert message with the new row, a delete modification is emitted as a delete message with the old row, and an update modification is emitted as a delete message with the old row and an insert message with the new row. This behavior is illustrated in the following figure.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/undo-redo-mode.png&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The left shows a dynamic table which is maintained in append mode and serves as input to the query in the center. The result of the query is converted into a redo+undo stream, which is shown at the bottom. The first record &lt;em&gt;(1, A)&lt;/em&gt; of the input table results in a new record in the result table and hence in an insert message &lt;em&gt;+(A, 1)&lt;/em&gt; to the stream. The second input record with &lt;em&gt;k = ‘A’&lt;/em&gt; &lt;em&gt;(4, A)&lt;/em&gt; produces an update of the &lt;em&gt;(A, 1)&lt;/em&gt; record in the result table and hence yields a delete message &lt;em&gt;-(A, 1)&lt;/em&gt; and an insert message for &lt;em&gt;+(A, 2)&lt;/em&gt;. All downstream operators or data sinks need to be able to correctly handle both types of messages.&lt;/p&gt;

&lt;p&gt;A dynamic table can be converted into a redo stream in two cases: either it is an append-only table (i.e., it only has insert modifications) or it has a unique key attribute. Each insert modification on the dynamic table results in an insert message with the new row to the redo stream. Due to the restriction of redo streams, only tables with unique keys can have update and delete modifications. If a key is removed from the keyed dynamic table, either because a row is deleted or because the key attribute of a row was modified, a delete message with the removed key is emitted to the redo stream. An update modification yields an update message with the updating, i.e., new row. Since delete and update modifications are defined with respect to the unique key, the downstream operators need to be able to access previous values by key. The figure below shows how the result table of the same query as above is converted into a redo stream.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/redo-mode.png&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The row &lt;em&gt;(1, A)&lt;/em&gt; which yields an insert into the dynamic table results in the &lt;em&gt;+(A, 1)&lt;/em&gt; insert message. The row &lt;em&gt;(4, A)&lt;/em&gt; which produces an update yields the &lt;em&gt;*(A, 2)&lt;/em&gt; update message.&lt;/p&gt;
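
&lt;p&gt;To compare the two encodings side by side, the following plain-Scala sketch maintains the grouped count and emits both a redo+undo stream and a redo stream for the same changes. It is an illustration under stated assumptions, not Flink’s implementation: the input rows &lt;em&gt;(1, A)&lt;/em&gt; and &lt;em&gt;(4, A)&lt;/em&gt; are taken from the text, while &lt;em&gt;(2, B)&lt;/em&gt;, the &lt;code&gt;Row&lt;/code&gt; alias, and all other names are hypothetical.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;object ChangelogStreams extends App {
  type Row = (String, Int) // result row (k, cnt)

  // Input rows (id, k): (1, A) and (4, A) appear in the text, (2, B) is made up.
  val input = Seq((1, &amp;quot;A&amp;quot;), (2, &amp;quot;B&amp;quot;), (4, &amp;quot;A&amp;quot;))

  // Maintain the grouped count and log each change as (old row if any, new row).
  val (_, changes) =
    input.foldLeft((Map.empty[String, Int], Vector.empty[(Option[Row], Row)])) {
      case ((counts, log), (_, k)) =&amp;gt;
        val cnt = counts.getOrElse(k, 0) + 1
        (counts + (k -&amp;gt; cnt), log :+ ((counts.get(k).map(c =&amp;gt; (k, c)), (k, cnt))))
    }

  // Redo+undo stream: an update is a delete of the old row plus an insert of the new row.
  val redoUndo = changes.flatMap {
    case (Some(old), row) =&amp;gt; Seq(s&amp;quot;-$old&amp;quot;, s&amp;quot;+$row&amp;quot;)
    case (None, row)      =&amp;gt; Seq(s&amp;quot;+$row&amp;quot;)
  }

  // Redo stream with unique key k: an update is a single update message with the new row.
  val redo = changes.map {
    case (Some(_), row) =&amp;gt; s&amp;quot;*$row&amp;quot;
    case (None, row)    =&amp;gt; s&amp;quot;+$row&amp;quot;
  }

  println(redoUndo.mkString(&amp;quot; &amp;quot;)) // prints +(A,1) +(B,1) -(A,1) +(A,2)
  println(redo.mkString(&amp;quot; &amp;quot;))     // prints +(A,1) +(B,1) *(A,2)
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The redo stream carries fewer messages, but a consumer needs keyed access to the previous value in order to apply an update message.&lt;/p&gt;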

&lt;p&gt;Common use cases for redo streams are to write the result of a query to an append-only storage system, like rolling files or a Kafka topic, or to a data store with keyed access, such as Cassandra, a relational DBMS, or a compacted Kafka topic. It is also possible to materialize a dynamic table as keyed state inside of the streaming application that evaluates the continuous query and make it queryable from external systems. With this design Flink itself maintains the result of a continuous SQL query on a stream and serves key lookups on the result table, for instance from a dashboard application.&lt;/p&gt;

&lt;h2 id=&quot;what-will-change-when-switching-to-dynamic-tables&quot;&gt;What will Change When Switching to Dynamic Tables?&lt;/h2&gt;

&lt;p&gt;In version 1.2, all streaming operators of Flink’s relational APIs, like filter, project, and group window aggregates, only emit new rows and are not capable of updating previously emitted results. In contrast, dynamic tables are able to handle update and delete modifications. Now you might ask yourself, How does the processing model of the current version relate to the new dynamic table model? Will the semantics of the APIs completely change and do we need to reimplement the APIs from scratch to achieve the desired semantics?&lt;/p&gt;

&lt;p&gt;The answer to all these questions is simple. The current processing model is a subset of the dynamic table model. Using the terminology we introduced in this post, the current model converts a stream into a dynamic table in append mode, i.e., an infinitely growing table. Since all operators only accept insert changes and produce insert changes on their result table (i.e., emit new rows), all supported queries result in dynamic append tables, which are converted back into DataStreams using the redo model for append-only tables. Consequently, the semantics of the current model are completely covered and preserved by the new dynamic table model.&lt;/p&gt;

&lt;h2 id=&quot;conclusion-and-outlook&quot;&gt;Conclusion and Outlook&lt;/h2&gt;

&lt;p&gt;Flink’s relational APIs are great for implementing stream analytics applications in no time and are used in several production settings. In this blog post we discussed the future of the Table API and SQL. This effort will make Flink and stream processing accessible to more people. Moreover, the unified semantics for querying historic and real-time data as well as the concept of querying and maintaining dynamic tables will enable and significantly ease the implementation of many exciting use cases and applications. As this post focused on the semantics of relational queries on streams and dynamic tables, we did not discuss the details of how a query will be executed, which includes the internal implementation of retractions, handling of late events, support for early results, and bounding space requirements. We plan to publish a follow-up blog post on this topic at a later point in time.&lt;/p&gt;

&lt;p&gt;In recent months, many members of the Flink community have been discussing and contributing to the relational APIs. We have made great progress so far. While most work has focused on processing streams in append mode, the next steps on the agenda are to work on dynamic tables to support queries that update their results. If you are excited about the idea of processing streams with SQL and would like to contribute to this effort, please give feedback, join the discussions on the mailing list, or grab a JIRA issue to work on.&lt;/p&gt;
</description>
<pubDate>Tue, 04 Apr 2017 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2017/04/04/dynamic-tables.html</link>
<guid isPermaLink="true">/news/2017/04/04/dynamic-tables.html</guid>
</item>

<item>
<title>From Streams to Tables and Back Again: An Update on Flink&#39;s Table &amp; SQL API</title>
<description>&lt;p&gt;Stream processing can deliver a lot of value. Many organizations have recognized the benefit of managing large volumes of data in real-time, reacting quickly to trends, and providing customers with live services at scale. Streaming applications with well-defined business logic can deliver a competitive advantage.&lt;/p&gt;

&lt;p&gt;Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html&quot;&gt;DataStream&lt;/a&gt; abstraction is a powerful API which lets you flexibly define both basic and complex streaming pipelines. Additionally, it offers low-level operations such as &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/asyncio.html&quot;&gt;Async IO&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/process_function.html&quot;&gt;ProcessFunctions&lt;/a&gt;. However, many users do not need such a deep level of flexibility. They need an API which quickly solves 80% of their use cases where simple tasks can be defined using little code.&lt;/p&gt;

&lt;p&gt;To deliver the power of stream processing to a broader set of users, the Apache Flink community is developing APIs that provide simpler abstractions and more concise syntax so that users can focus on their business logic instead of advanced streaming concepts. Along with other APIs (such as &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/libs/cep.html&quot;&gt;CEP&lt;/a&gt; for complex event processing on streams), Flink offers a relational API that aims to unify stream and batch processing: the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/table_api.html&quot;&gt;Table &amp;amp; SQL API&lt;/a&gt;, often referred to as the Table API.&lt;/p&gt;

&lt;p&gt;Recently, contributors working for companies such as Alibaba, Huawei, data Artisans, and more decided to further develop the Table API. Over the past year, the Table API has been rewritten entirely. Since Flink 1.1, its core has been based on &lt;a href=&quot;http://calcite.apache.org/&quot;&gt;Apache Calcite&lt;/a&gt;, which parses SQL and optimizes all relational queries. Today, the Table API can address a wide range of use cases in both batch and stream environments with unified semantics.&lt;/p&gt;

&lt;p&gt;This blog post summarizes the current status of Flink’s Table API and showcases some of the recently added features in Apache Flink. Among the features presented here are the unified access to batch and streaming data, data transformation, and window operators.
The following paragraphs are not only supposed to give you a general overview of the Table API, but also to illustrate the potential of relational APIs in the future.&lt;/p&gt;

&lt;p&gt;Because the Table API is built on top of Flink’s core APIs, &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html&quot;&gt;DataStreams&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/index.html&quot;&gt;DataSets&lt;/a&gt; can be converted to a Table and vice-versa without much overhead. Hereafter, we show how to create tables from different sources and specify programs that can be executed locally or in a distributed setting. In this post, we will use the Scala version of the Table API, but there is also a Java version as well as a SQL API with an equivalent set of features.&lt;/p&gt;

&lt;h2 id=&quot;data-transformation-and-etl&quot;&gt;Data Transformation and ETL&lt;/h2&gt;

&lt;p&gt;A common task in every data processing pipeline is importing data from one or multiple systems, applying some transformations to it, then exporting the data to another system. The Table API can help to manage these recurring tasks. For reading data, the API provides a set of ready-to-use &lt;code&gt;TableSources&lt;/code&gt; such as a &lt;code&gt;CsvTableSource&lt;/code&gt; and &lt;code&gt;KafkaTableSource&lt;/code&gt;. However, it also allows the implementation of custom &lt;code&gt;TableSources&lt;/code&gt; that can hide configuration specifics (e.g. watermark generation) from users who are less familiar with streaming concepts.&lt;/p&gt;

&lt;p&gt;Let’s assume we have a CSV file that stores customer information. The values are delimited by a “|”-character and contain a customer identifier, name, timestamp of the last update, and preferences encoded in a comma-separated key-value string:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;42|Bob Smith|2016-07-23 16:10:11|color=12,length=200,size=200
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The following example illustrates how to read a CSV file and perform some data cleansing before converting it to a regular DataStream program.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// set up execution environment&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// configure table source&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customerSource&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;CsvTableSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;/path/to/customer_data.csv&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ignoreFirstLine&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fieldDelimiter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;|&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;id&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;LONG&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;last_update&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;TIMESTAMP&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;prefs&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// name your table source&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;registerTableSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;customers&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customerSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// define your table program&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scan&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;customers&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isNotNull&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;last_update&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;2016-01-01 00:00:00&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toTimestamp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lowerCase&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;prefs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// convert it to a data stream&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ds&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toDataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;ds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The Table API comes with a large set of built-in functions that make it easy to specify business logic using a language-integrated query (LINQ) syntax. In the example above, we filter out customers with invalid names and only select those that updated their preferences recently. We convert names to lowercase for normalization. For debugging purposes, we convert the table into a DataStream and print it.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;CsvTableSource&lt;/code&gt; supports both batch and stream environments. To execute the program above as a batch application, all the programmer has to do is replace the environment with an &lt;code&gt;ExecutionEnvironment&lt;/code&gt; and change the output conversion from &lt;code&gt;DataStream&lt;/code&gt; to &lt;code&gt;DataSet&lt;/code&gt;. The Table API program itself doesn’t change.&lt;/p&gt;

&lt;p&gt;In the example, we converted the table program to a data stream of &lt;code&gt;Row&lt;/code&gt; objects. However, we are not limited to row data types. The Table API supports all types from the underlying APIs such as Java and Scala Tuples, Case Classes, POJOs, or generic types that are serialized using Kryo. Let’s assume that we want to have a regular object (POJO) with the following format instead of generic rows:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Customer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Long&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prefs&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;java.util.Properties&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We can use the following table program to convert the CSV file into Customer objects. Flink takes care of creating objects and mapping fields for us.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ds&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scan&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;customers&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;last_update&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;update&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parseProperties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;prefs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;prefs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toDataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Customer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You might have noticed that the query above uses a function to parse the preferences field. Even though Flink’s Table API ships with a large set of built-in functions, it is often necessary to define custom scalar functions. In the example above, we use a user-defined function &lt;code&gt;parseProperties&lt;/code&gt;. The following code snippet shows how easily we can implement a scalar function.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;object&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;parseProperties&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ScalarFunction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eval&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Properties&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;props&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Properties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;str&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;,&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;=&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;foreach&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)))&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
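
&lt;p&gt;The &lt;code&gt;eval()&lt;/code&gt; logic above assumes that every comma-separated entry has the form &lt;code&gt;key=value&lt;/code&gt;; an entry without a separator would fail when &lt;code&gt;split(1)&lt;/code&gt; is accessed. As a minimal sketch in plain Scala, independent of Flink, a more defensive variant of the parsing step could look as follows. The object name &lt;code&gt;ParsePropertiesSketch&lt;/code&gt;, the method &lt;code&gt;parse&lt;/code&gt;, and the choice to silently skip malformed entries are assumptions made for illustration, not part of the Table API.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;import java.util.Properties

// Hypothetical helper for illustration; not part of Flink.
object ParsePropertiesSketch {
  // Parses &amp;quot;k1=v1,k2=v2&amp;quot; into Properties; entries without a key or a separator are skipped.
  def parse(str: String): Properties = {
    val props = new Properties()
    if (str != null) {
      str.split(&amp;quot;,&amp;quot;).foreach { kv =&amp;gt;
        // a limit of 2 keeps any further equals signs inside the value
        kv.split(&amp;quot;=&amp;quot;, 2) match {
          case Array(k, v) if k.trim.nonEmpty =&amp;gt; props.setProperty(k.trim, v.trim)
          case _ =&amp;gt; // malformed entry: skip
        }
      }
    }
    props
  }

  def main(args: Array[String]): Unit = {
    val props = parse(&amp;quot;color=blue,size=M,broken&amp;quot;)
    println(props.getProperty(&amp;quot;color&amp;quot;))  // blue
    println(props.getProperty(&amp;quot;broken&amp;quot;)) // null, because the entry was skipped
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The same body could be placed inside the &lt;code&gt;eval()&lt;/code&gt; method of the scalar function; whether malformed entries should be skipped or reported depends on the use case.&lt;/p&gt;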

&lt;p&gt;Scalar functions can be used to deserialize, extract, or convert values (and more). By overriding the &lt;code&gt;open()&lt;/code&gt; method we can even access runtime information such as distributed cached files or metrics. The &lt;code&gt;open()&lt;/code&gt; method is called only once during the runtime’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/task_lifecycle.html&quot;&gt;task lifecycle&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;unified-windowing-for-static-and-streaming-data&quot;&gt;Unified Windowing for Static and Streaming Data&lt;/h2&gt;

&lt;p&gt;Another very common task, especially when working with continuous data, is the definition of windows to split a stream into pieces of finite size, over which we can apply computations. At the moment, the Table API supports three types of windows: sliding windows, tumbling windows, and session windows (for general definitions of the different types of windows, we recommend &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/windows.html&quot;&gt;Flink’s documentation&lt;/a&gt;). All three window types work on &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/event_time.html&quot;&gt;event or processing time&lt;/a&gt;. Session windows can be defined over time intervals only, whereas sliding and tumbling windows can be defined over time intervals or a number of rows.&lt;/p&gt;

&lt;p&gt;Let’s assume that our customer data from the example above is an event stream of updates generated whenever the customer updated his or her preferences. We assume that events come from a TableSource that has assigned timestamps and watermarks. The definition of a window happens again in a LINQ-style fashion. The following example could be used to count the updates to the preferences during one day.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tumble&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;over&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.d&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ay&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;rowtime&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;from&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;to&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;prefs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;updates&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;By using the &lt;code&gt;on()&lt;/code&gt; parameter, we can specify whether the window is supposed to work on event-time or not. The Table API assumes that timestamps and watermarks are assigned correctly when using event-time. Elements with timestamps smaller than the last received watermark are dropped. Since the extraction of timestamps and generation of watermarks depends on the data source and requires some deeper knowledge of their origin, the TableSource or the upstream DataStream is usually responsible for assigning these properties.&lt;/p&gt;
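
&lt;p&gt;To make the semantics of the daily tumbling window more concrete, here is a rough sketch in plain Scala collections of what the query above computes on a finite set of events. It is not how Flink implements windows, and it ignores watermarks and late data. The names &lt;code&gt;Update&lt;/code&gt;, &lt;code&gt;windowStart&lt;/code&gt;, and &lt;code&gt;countPerDay&lt;/code&gt; are hypothetical, and timestamps are assumed to be non-negative epoch milliseconds.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Hypothetical illustration of tumbling-window semantics; not Flink code.
object TumbleSketch {
  // rowtime is assumed to be a non-negative timestamp in epoch milliseconds
  final case class Update(id: Int, rowtime: Long)

  val DayMillis: Long = 24L * 60 * 60 * 1000

  // Start of the tumbling window of the given size that contains the timestamp
  def windowStart(ts: Long, size: Long): Long = ts - (ts % size)

  // Counts updates per (id, window) and returns (id, from, to, updates)
  def countPerDay(updates: Seq[Update]): Seq[(Int, Long, Long, Int)] =
    updates
      .groupBy(u =&amp;gt; (u.id, windowStart(u.rowtime, DayMillis)))
      .map { case ((id, start), us) =&amp;gt; (id, start, start + DayMillis, us.size) }
      .toSeq
      .sortBy(r =&amp;gt; (r._1, r._2))

  def main(args: Array[String]): Unit = {
    val updates = Seq(
      Update(1, 1000L),
      Update(1, 2000L),
      Update(1, DayMillis + 5L),
      Update(2, 3000L))
    // customer 1 has two updates in the first day and one in the second
    countPerDay(updates).foreach(println)
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each event belongs to exactly one tumbling window, which is why a simple grouping key is enough here. Sliding windows assign an event to several overlapping windows, and session windows depend on the gaps between events, so they cannot be expressed with a single key per event.&lt;/p&gt;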

&lt;p&gt;The following code shows how to define other types of windows:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// using processing-time&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tumble&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;over&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;100.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rows&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;manyRowWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// using event-time&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Session&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;withGap&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;15.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minutes&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;rowtime&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;sessionWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Slide&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;over&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.d&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ay&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;every&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hour&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;rowtime&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;dailyWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Since batch is just a special case of streaming (where a batch happens to have a defined start and end point), it is also possible to apply all of these windows in a batch execution environment. Without any modification of the table program itself, we can run the code on a DataSet given that we specified a column named “rowtime”. This is particularly interesting if we want to compute exact results from time to time, so that late events that are heavily out of order can be included in the computation.&lt;/p&gt;

&lt;p&gt;At the moment, the Table API only supports so-called “group windows” that also exist in the DataStream API. Other windows such as SQL’s OVER clause windows are in development and &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations&quot;&gt;planned for Flink 1.3&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In order to demonstrate the expressiveness and capabilities of the API, here’s a snippet with a more advanced example of an exponentially decaying moving average over a sliding window of one hour which returns aggregated results every second. The table program weighs recent orders more heavily than older orders. This example is borrowed from &lt;a href=&quot;https://calcite.apache.org/docs/stream.html#hopping-windows&quot;&gt;Apache Calcite&lt;/a&gt; and shows what will be possible in future Flink releases for both the Table API and SQL.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Slide&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;over&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hour&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;every&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;second&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;productId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;productId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;unitPrice&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;rowtime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hour&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;rowtime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hour&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;user-defined-table-functions&quot;&gt;User-defined Table Functions&lt;/h2&gt;

&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/table_api.html#user-defined-table-functions&quot;&gt;User-defined table functions&lt;/a&gt; were added in Flink 1.2. These can be quite useful for table columns containing non-atomic values which need to be extracted and mapped to separate fields before processing. Table functions take an arbitrary number of scalar values and allow for returning an arbitrary number of rows as output instead of a single value, similar to a flatMap function in the DataStream or DataSet API. The output of a table function can then be joined with the original row in the table by using either a left-outer join or cross join.&lt;/p&gt;

&lt;p&gt;Using the previously-mentioned customer table, let’s assume we want to produce a table that contains the color and size preferences as separate columns. The table program would look like this:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// create an instance of the table function&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;extractPrefs&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PropertiesExtractor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// derive rows and join them with original row&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;extractPrefs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;prefs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;PropertiesExtractor&lt;/code&gt; is a user-defined table function that extracts the color and size. We are not interested in customers that haven’t set these preferences, so the function emits a row only if both properties are present in the string value. Since we are using a (cross) join in the program, customers without a result on the right side of the join will be filtered out.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PropertiesExtractor&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TableFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eval&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prefs&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Unit&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;c1&quot;&gt;// split string into (key, value) pairs&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pairs&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prefs&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;,&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;split&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;=&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
      &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;color&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pairs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;find&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;color&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pairs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;find&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;size&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;// emit a row if color and size are specified&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Some&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Some&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
      &lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// skip&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;override&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getResultType&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;RowTypeInfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
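
&lt;p&gt;The difference between joining a table function with a cross join and with a left-outer join can be sketched with plain Scala collections, where the table function behaves like the function passed to &lt;code&gt;flatMap&lt;/code&gt;. This is only an analogy under simplifying assumptions, not Flink code: the names &lt;code&gt;Customer&lt;/code&gt;, &lt;code&gt;extractPrefs&lt;/code&gt;, &lt;code&gt;crossJoin&lt;/code&gt;, and &lt;code&gt;leftOuterJoin&lt;/code&gt; are hypothetical, &lt;code&gt;prefs&lt;/code&gt; is assumed to be non-null, and missing values on the right side are modeled with &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Hypothetical illustration of table-function join semantics; not Flink code.
object TableFunctionJoinSketch {
  final case class Customer(id: Int, name: String, prefs: String)

  // Stand-in for the table function: emits zero or one (color, size) row per input
  def extractPrefs(prefs: String): Seq[(String, String)] = {
    val pairs = prefs
      .split(&amp;quot;,&amp;quot;)
      .map(_.split(&amp;quot;=&amp;quot;, 2))
      .collect { case Array(k, v) =&amp;gt; (k, v) }
      .toMap
    (pairs.get(&amp;quot;color&amp;quot;), pairs.get(&amp;quot;size&amp;quot;)) match {
      case (Some(c), Some(s)) =&amp;gt; Seq((c, s))
      case _ =&amp;gt; Seq.empty
    }
  }

  // Cross join: customers for which the function emits nothing disappear
  def crossJoin(cs: Seq[Customer]): Seq[(Int, String, String)] =
    cs.flatMap(c =&amp;gt; extractPrefs(c.prefs).map { case (color, size) =&amp;gt; (c.id, color, size) })

  // Left-outer join: such customers are kept and padded with empty values
  def leftOuterJoin(cs: Seq[Customer]): Seq[(Int, Option[String], Option[String])] =
    cs.flatMap { c =&amp;gt;
      val rows = extractPrefs(c.prefs)
      if (rows.isEmpty) Seq((c.id, Option.empty[String], Option.empty[String]))
      else rows.map { case (color, size) =&amp;gt; (c.id, Option(color), Option(size)) }
    }

  def main(args: Array[String]): Unit = {
    val customers = Seq(
      Customer(1, &amp;quot;alice&amp;quot;, &amp;quot;color=blue,size=M&amp;quot;),
      Customer(2, &amp;quot;bob&amp;quot;, &amp;quot;color=red&amp;quot;))
    println(crossJoin(customers))     // only customer 1 remains
    println(leftOuterJoin(customers)) // customer 2 is kept with empty color and size
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;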

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;There is significant interest in making streaming more accessible and easier to use. Flink’s Table API development is happening quickly, and we believe that soon, you will be able to implement large batch or streaming pipelines using purely relational APIs or even convert existing Flink jobs to table programs. The Table API is already a very useful tool since you can work around limitations and missing features at any time by switching back and forth between the DataSet/DataStream abstraction and the Table abstraction.&lt;/p&gt;

&lt;p&gt;Contributions like support of Apache Hive UDFs, external catalogs, more TableSources, additional windows, and more operators will make the Table API an even more useful tool. Particularly, the upcoming introduction of Dynamic Tables, which is worth a blog post of its own, shows that even in 2017, new relational APIs open the door to a number of possibilities.&lt;/p&gt;

&lt;p&gt;Try it out, or even better, join the design discussions on the &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; and &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel&quot;&gt;JIRA&lt;/a&gt; and start contributing!&lt;/p&gt;
</description>
<pubDate>Wed, 29 Mar 2017 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2017/03/29/table-sql-api-update.html</link>
<guid isPermaLink="true">/news/2017/03/29/table-sql-api-update.html</guid>
</item>

<item>
<title>Apache Flink 1.1.5 Released</title>
<description>&lt;p&gt;The Apache Flink community released the next bugfix version of the Apache Flink 1.1 series.&lt;/p&gt;

&lt;p&gt;This release includes critical fixes for HA recovery robustness, fault tolerance
guarantees of the Flink Kafka Connector, as well as classloading issues with the Kryo serializer.
We highly recommend that all users upgrade to Flink 1.1.5.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.5&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.5&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.5&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;release-notes---flink---version-115&quot;&gt;Release Notes - Flink - Version 1.1.5&lt;/h2&gt;

&lt;h3 id=&quot;bug&quot;&gt;Bug&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5701&quot;&gt;FLINK-5701&lt;/a&gt;] -         FlinkKafkaProducer should check asyncException on checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6006&quot;&gt;FLINK-6006&lt;/a&gt;] -         Kafka Consumer can lose state if queried partition list is incomplete on restore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5940&quot;&gt;FLINK-5940&lt;/a&gt;] -         ZooKeeperCompletedCheckpointStore cannot handle broken state handles
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5942&quot;&gt;FLINK-5942&lt;/a&gt;] -         Harden ZooKeeperStateHandleStore to deal with corrupted data
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6025&quot;&gt;FLINK-6025&lt;/a&gt;] -         User code ClassLoader not used when KryoSerializer fallbacks to serialization for copying
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5945&quot;&gt;FLINK-5945&lt;/a&gt;] -         Close function in OuterJoinOperatorBase#executeOnCollections
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5934&quot;&gt;FLINK-5934&lt;/a&gt;] -         Scheduler in ExecutionGraph null if failure happens in ExecutionGraph.restoreLatestCheckpointedState
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5771&quot;&gt;FLINK-5771&lt;/a&gt;] -         DelimitedInputFormat does not correctly handle multi-byte delimiters
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5647&quot;&gt;FLINK-5647&lt;/a&gt;] -         Fix RocksDB Backend Cleanup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2662&quot;&gt;FLINK-2662&lt;/a&gt;] -         CompilerException: &quot;Bug: Plan generation for Unions picked a ship strategy between binary plan operators.&quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5585&quot;&gt;FLINK-5585&lt;/a&gt;] -         NullPointer Exception in JobManager.updateAccumulators
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5484&quot;&gt;FLINK-5484&lt;/a&gt;] -         Add test for registered Kryo types
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5518&quot;&gt;FLINK-5518&lt;/a&gt;] -         HadoopInputFormat throws NPE when close() is called before open()
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;improvement&quot;&gt;Improvement&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5575&quot;&gt;FLINK-5575&lt;/a&gt;] -         in old releases, warn users and guide them to the latest stable docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5639&quot;&gt;FLINK-5639&lt;/a&gt;] -         Clarify License implications of RabbitMQ Connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5466&quot;&gt;FLINK-5466&lt;/a&gt;] -         Make production environment default in gulpfile
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 23 Mar 2017 19:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2017/03/23/release-1.1.5.html</link>
<guid isPermaLink="true">/news/2017/03/23/release-1.1.5.html</guid>
</item>

<item>
<title>Announcing Apache Flink 1.2.0</title>
<description>&lt;p&gt;The Apache Flink community is pleased to announce the 1.2.0 release. Over the past months, the Flink community has been working hard to resolve 650 issues. See the &lt;a href=&quot;http://flink.apache.org/blog/release_1.2.0-changelog.html&quot;&gt;complete changelog&lt;/a&gt; for more detail.&lt;/p&gt;

&lt;p&gt;This is the third major release in the 1.x.y series. It is API compatible with the other 1.x.y releases for APIs annotated with the @Public annotation.&lt;/p&gt;

&lt;p&gt;We encourage everyone to download the release and check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/&quot;&gt;documentation&lt;/a&gt;. Feedback through the &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink mailing lists&lt;/a&gt; is, as always, gladly encouraged!&lt;/p&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;. Some highlights of the release are listed below.&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#dynamic-scaling--key-groups&quot; id=&quot;markdown-toc-dynamic-scaling--key-groups&quot;&gt;Dynamic Scaling / Key Groups&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#rescalable-non-partitioned-state&quot; id=&quot;markdown-toc-rescalable-non-partitioned-state&quot;&gt;Rescalable Non-Partitioned State&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#processfunction&quot; id=&quot;markdown-toc-processfunction&quot;&gt;ProcessFunction&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#async-io&quot; id=&quot;markdown-toc-async-io&quot;&gt;Async I/O&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#run-flink-with-apache-mesos&quot; id=&quot;markdown-toc-run-flink-with-apache-mesos&quot;&gt;Run Flink with Apache Mesos&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#secure-data-access&quot; id=&quot;markdown-toc-secure-data-access&quot;&gt;Secure Data Access&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#queryable-state&quot; id=&quot;markdown-toc-queryable-state&quot;&gt;Queryable State&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#backwards-compatible-savepoints&quot; id=&quot;markdown-toc-backwards-compatible-savepoints&quot;&gt;Backwards compatible savepoints&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#table-api--sql&quot; id=&quot;markdown-toc-table-api--sql&quot;&gt;Table API &amp;amp; SQL&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#miscellaneous-improvements&quot; id=&quot;markdown-toc-miscellaneous-improvements&quot;&gt;Miscellaneous improvements&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h2 id=&quot;dynamic-scaling--key-groups&quot;&gt;Dynamic Scaling / Key Groups&lt;/h2&gt;

&lt;p&gt;Flink now supports changing the parallelism of a streaming job by restoring it from a savepoint with a different parallelism. Changing the parallelism of the entire job as well as of individual operators is supported.
In the &lt;code&gt;StreamExecutionEnvironment&lt;/code&gt;, users can set a new per-job configuration parameter called “max parallelism”. It determines the upper limit for the parallelism.&lt;/p&gt;

&lt;p&gt;By default, the value is set to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code&gt;128&lt;/code&gt;: for any parallelism &amp;lt;= 128&lt;/li&gt;
  &lt;li&gt;&lt;code&gt;MIN(nextPowerOfTwo(parallelism + (parallelism / 2)), 2^15)&lt;/code&gt;: for any parallelism &amp;gt; 128&lt;/li&gt;
&lt;/ul&gt;
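
&lt;p&gt;The default rule listed above can be sketched in a few lines of plain Java. This is only an illustration of the rule as stated in this post, not Flink’s actual implementation: the class &lt;code&gt;DefaultMaxParallelism&lt;/code&gt; and its methods are hypothetical names, and &lt;code&gt;nextPowerOfTwo&lt;/code&gt; is assumed to round up to the smallest power of two that is at least as large as its argument.&lt;/p&gt;

```java
// Sketch of the default "max parallelism" rule as stated in the text above.
// All names here are hypothetical and not part of Flink's API.
final class DefaultMaxParallelism {

    // Bounds quoted in the release notes: 128 and 2^15.
    static final int LOWER_BOUND = 128;
    static final int UPPER_BOUND = 32768; // 2^15

    // Smallest power of two that is greater than or equal to x (for x >= 1).
    static int nextPowerOfTwo(int x) {
        if (x == 1) {
            return 1;
        }
        // highestOneBit(x - 1) is the largest power of two strictly below x,
        // so doubling it yields the smallest power of two at or above x.
        return Integer.highestOneBit(x - 1) * 2;
    }

    // Default max parallelism for a given job parallelism.
    static int defaultMaxParallelism(int parallelism) {
        if (0 >= parallelism) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        if (LOWER_BOUND >= parallelism) {
            return LOWER_BOUND;
        }
        if (parallelism >= UPPER_BOUND) {
            // The rounded value would exceed 2^15 anyway; returning early
            // also avoids integer overflow for very large inputs.
            return UPPER_BOUND;
        }
        int candidate = nextPowerOfTwo(parallelism + parallelism / 2);
        return Math.min(candidate, UPPER_BOUND);
    }

    public static void main(String[] args) {
        // 200 + 100 = 300, which rounds up to 512.
        System.out.println(defaultMaxParallelism(200)); // prints 512
    }
}
```

&lt;p&gt;Under this rule, a job with parallelism 200 gets a default max parallelism of 512, while a job with parallelism 30000 is capped at 2^15 = 32768.&lt;/p&gt;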

&lt;p&gt;The following built-in functions and operators support rescaling:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Window operator&lt;/li&gt;
  &lt;li&gt;Rolling/Bucketing sink&lt;/li&gt;
  &lt;li&gt;Kafka consumers&lt;/li&gt;
  &lt;li&gt;Continuous File Processing source&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The write-ahead log Cassandra sink and the CEP operator are currently not rescalable. Users of the keyed state interfaces can use dynamic scaling without changing their code.&lt;/p&gt;

&lt;h2 id=&quot;rescalable-non-partitioned-state&quot;&gt;Rescalable Non-Partitioned State&lt;/h2&gt;

&lt;p&gt;As part of the dynamic scaling effort, the community has also added rescalable non-partitioned state for operators like the Kafka consumer that don’t use keyed state but instead use operator state.&lt;/p&gt;

&lt;p&gt;When a job is rescaled, the operator state needs to be redistributed among the parallel operator instances. In the case of the Kafka consumer, the assigned partitions and their offsets are redistributed.&lt;/p&gt;

&lt;h2 id=&quot;processfunction&quot;&gt;ProcessFunction&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;ProcessFunction&lt;/code&gt; is a low-level stream processing operation giving access to the basic building blocks of all (acyclic) streaming applications:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Events (stream elements)&lt;/li&gt;
  &lt;li&gt;State (fault tolerant, consistent)&lt;/li&gt;
  &lt;li&gt;Timers (event time and processing time)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ProcessFunction can be thought of as a FlatMapFunction with access to keyed state and timers.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/process_function.html&quot;&gt;ProcessFunction documentation&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;async-io&quot;&gt;Async I/O&lt;/h2&gt;

&lt;p&gt;Flink now has a dedicated Async I/O operator for making blocking calls asynchronously and in a checkpointed fashion. For example, there are many Flink applications that need to query external datastores for each element in a stream. To avoid slowing down the stream to the speed of the external system, the async I/O operator allows requests to overlap.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/asyncio.html&quot;&gt;Async I/O documentation&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;run-flink-with-apache-mesos&quot;&gt;Run Flink with Apache Mesos&lt;/h2&gt;

&lt;p&gt;The latest release further extends Flink’s deployment flexibility by adding support for Apache Mesos and DC/OS. In combination with Marathon, it is now possible to run a highly available Flink cluster on Mesos.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/mesos.html&quot;&gt;Mesos documentation&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;secure-data-access&quot;&gt;Secure Data Access&lt;/h2&gt;

&lt;p&gt;Flink is now able to authenticate against external services such as ZooKeeper, Kafka, HDFS and YARN using Kerberos.
Also, experimental support for encryption over the wire has been added.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/security-kerberos.html&quot;&gt;Kerberos documentation&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/security-ssl.html&quot;&gt;SSL setup documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;queryable-state&quot;&gt;Queryable State&lt;/h2&gt;

&lt;p&gt;This experimental feature allows users to query the current state of an operator.
If you have, for example, a flatMap() operator that keeps a running aggregate per key, queryable state allows you to retrieve the current aggregate value at any time by directly connecting to the TaskManager and retrieving that value.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/queryable_state.html&quot;&gt;Queryable State documentation&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;backwards-compatible-savepoints&quot;&gt;Backwards compatible savepoints&lt;/h2&gt;

&lt;p&gt;Flink 1.2.0 allows users to restart a job from a 1.1.4 savepoint. This makes major Flink version upgrades possible without losing application state. The following built-in operators are backwards compatible:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Window operator&lt;/li&gt;
  &lt;li&gt;Rolling/Bucketing sink&lt;/li&gt;
  &lt;li&gt;Kafka consumers&lt;/li&gt;
  &lt;li&gt;Continuous File Processing source&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/upgrading.html&quot;&gt;Upgrading Flink applications documentation&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;table-api--sql&quot;&gt;Table API &amp;amp; SQL&lt;/h2&gt;

&lt;p&gt;This release significantly expanded the performance, stability, and coverage of Flink’s Table API and SQL support for batch and streaming tables.&lt;/p&gt;

&lt;p&gt;The community added tumbling, sliding, and session group-window aggregations over streaming tables,
  e.g. &lt;code&gt;table.window(Session withGap 10.minutes on &#39;rowtime as &#39;w)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;SQL supports more built-in functions and operations,
  e.g. &lt;code&gt;EXISTS&lt;/code&gt;, &lt;code&gt;VALUES&lt;/code&gt;, &lt;code&gt;LIMIT&lt;/code&gt;, &lt;code&gt;CURRENT_DATE&lt;/code&gt;, &lt;code&gt;INITCAP&lt;/code&gt;, &lt;code&gt;NULLIF&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Both APIs support more data types and are better integrated,
  e.g. accessing a POJO field with &lt;code&gt;myPojo.get(&#39;field&#39;)&lt;/code&gt; or &lt;code&gt;myPojo.flatten()&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Users can now define their own scalar and table functions,
  e.g. &lt;code&gt;table.select(&#39;uid, parse(&#39;field) as &#39;parsed).join(split(&#39;parsed) as &#39;atom)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/table_api.html&quot;&gt;Flink Table API &amp;amp; SQL documentation&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;miscellaneous-improvements&quot;&gt;Miscellaneous improvements&lt;/h2&gt;
&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Metrics in the Flink web interface: A metrics system was added in Flink 1.1, and with this release, Flink provides a new tab in the web frontend that displays some of these metrics.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Kafka 0.10 support: Flink 1.2 now provides a connector for Apache Kafka 0.10.0.x, including support for consuming and producing messages with a timestamp using Flink’s internal event time (&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/kafka.html&quot;&gt;Kafka Connector Documentation&lt;/a&gt;)&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Evictor Semantics: Flink 1.2 ships with more expressive evictor semantics that allow the programmer to evict elements from a window both before and after the application of the window function, and to remove elements arbitrarily (&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/windows.html#evictors&quot;&gt;Evictor Semantics Documentation&lt;/a&gt;)&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;

&lt;p&gt;According to git shortlog, the following 122 people contributed to the 1.2.0 release. Thank you to all contributors!&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Abhishek R. Singh&lt;/li&gt;
  &lt;li&gt;Ahmad Ragab&lt;/li&gt;
  &lt;li&gt;Aleksandr Chermenin&lt;/li&gt;
  &lt;li&gt;Alexander Pivovarov&lt;/li&gt;
  &lt;li&gt;Alexander Shoshin&lt;/li&gt;
  &lt;li&gt;Alexey Diomin&lt;/li&gt;
  &lt;li&gt;Aljoscha Krettek&lt;/li&gt;
  &lt;li&gt;Andrey Melentyev&lt;/li&gt;
  &lt;li&gt;Anton Mushin&lt;/li&gt;
  &lt;li&gt;Bob Thorman&lt;/li&gt;
  &lt;li&gt;Boris Osipov&lt;/li&gt;
  &lt;li&gt;Bram Vogelaar&lt;/li&gt;
  &lt;li&gt;Bruno Aranda&lt;/li&gt;
  &lt;li&gt;David Anderson&lt;/li&gt;
  &lt;li&gt;Dominik&lt;/li&gt;
  &lt;li&gt;Evgeny_Kincharov&lt;/li&gt;
  &lt;li&gt;Fabian Hueske&lt;/li&gt;
  &lt;li&gt;Fokko Driesprong&lt;/li&gt;
  &lt;li&gt;Gabor Gevay&lt;/li&gt;
  &lt;li&gt;George&lt;/li&gt;
  &lt;li&gt;Gordon Tai&lt;/li&gt;
  &lt;li&gt;Greg Hogan&lt;/li&gt;
  &lt;li&gt;Gyula Fora&lt;/li&gt;
  &lt;li&gt;Haohui Mai&lt;/li&gt;
  &lt;li&gt;Holger Frydrych&lt;/li&gt;
  &lt;li&gt;HungUnicorn&lt;/li&gt;
  &lt;li&gt;Ismaël Mejía&lt;/li&gt;
  &lt;li&gt;Ivan Mushketyk&lt;/li&gt;
  &lt;li&gt;Jakub Havlik&lt;/li&gt;
  &lt;li&gt;Jark Wu&lt;/li&gt;
  &lt;li&gt;Jendrik Poloczek&lt;/li&gt;
  &lt;li&gt;Jincheng Sun&lt;/li&gt;
  &lt;li&gt;Josh&lt;/li&gt;
  &lt;li&gt;Joshi&lt;/li&gt;
  &lt;li&gt;Keiji Yoshida&lt;/li&gt;
  &lt;li&gt;Kirill Morozov&lt;/li&gt;
  &lt;li&gt;Kurt Young&lt;/li&gt;
  &lt;li&gt;Liwei Lin&lt;/li&gt;
  &lt;li&gt;Lorenz Buehmann&lt;/li&gt;
  &lt;li&gt;Maciek Próchniak&lt;/li&gt;
  &lt;li&gt;Makman2&lt;/li&gt;
  &lt;li&gt;Markus Müller&lt;/li&gt;
  &lt;li&gt;Martin Junghanns&lt;/li&gt;
  &lt;li&gt;Márton Balassi&lt;/li&gt;
  &lt;li&gt;Max Kuklinski&lt;/li&gt;
  &lt;li&gt;Maximilian Michels&lt;/li&gt;
  &lt;li&gt;Milosz Tanski&lt;/li&gt;
  &lt;li&gt;Nagarjun&lt;/li&gt;
  &lt;li&gt;Neelesh Srinivas Salian&lt;/li&gt;
  &lt;li&gt;Neil Derraugh&lt;/li&gt;
  &lt;li&gt;Nick Chadwick&lt;/li&gt;
  &lt;li&gt;Nico Kruber&lt;/li&gt;
  &lt;li&gt;Niels Basjes&lt;/li&gt;
  &lt;li&gt;Pattarawat Chormai&lt;/li&gt;
  &lt;li&gt;Piotr Godek&lt;/li&gt;
  &lt;li&gt;Raghav&lt;/li&gt;
  &lt;li&gt;Ramkrishna&lt;/li&gt;
  &lt;li&gt;Robert Metzger&lt;/li&gt;
  &lt;li&gt;Rohit Agarwal&lt;/li&gt;
  &lt;li&gt;Roman Maier&lt;/li&gt;
  &lt;li&gt;Sachin&lt;/li&gt;
  &lt;li&gt;Sachin Goel&lt;/li&gt;
  &lt;li&gt;Scott Kidder&lt;/li&gt;
  &lt;li&gt;Shannon Carey&lt;/li&gt;
  &lt;li&gt;Stefan Richter&lt;/li&gt;
  &lt;li&gt;Steffen Hausmann&lt;/li&gt;
  &lt;li&gt;Stephan Epping&lt;/li&gt;
  &lt;li&gt;Stephan Ewen&lt;/li&gt;
  &lt;li&gt;Sunny T&lt;/li&gt;
  &lt;li&gt;Suri&lt;/li&gt;
  &lt;li&gt;Theodore Vasiloudis&lt;/li&gt;
  &lt;li&gt;Till Rohrmann&lt;/li&gt;
  &lt;li&gt;Tony Wei&lt;/li&gt;
  &lt;li&gt;Tzu-Li (Gordon) Tai&lt;/li&gt;
  &lt;li&gt;Ufuk Celebi&lt;/li&gt;
  &lt;li&gt;Vijay Srinivasaraghavan&lt;/li&gt;
  &lt;li&gt;Vishnu Viswanath&lt;/li&gt;
  &lt;li&gt;WangTaoTheTonic&lt;/li&gt;
  &lt;li&gt;William-Sang&lt;/li&gt;
  &lt;li&gt;Yassine Marzougui&lt;/li&gt;
  &lt;li&gt;anton solovev&lt;/li&gt;
  &lt;li&gt;beyond1920&lt;/li&gt;
  &lt;li&gt;biao.liub&lt;/li&gt;
  &lt;li&gt;chobeat&lt;/li&gt;
  &lt;li&gt;danielblazevski&lt;/li&gt;
  &lt;li&gt;f7753&lt;/li&gt;
  &lt;li&gt;fengyelei&lt;/li&gt;
  &lt;li&gt;fengyelei 00406569&lt;/li&gt;
  &lt;li&gt;gallenvara&lt;/li&gt;
  &lt;li&gt;gaolun.gl&lt;/li&gt;
  &lt;li&gt;godfreyhe&lt;/li&gt;
  &lt;li&gt;heytitle&lt;/li&gt;
  &lt;li&gt;hzyuemeng1&lt;/li&gt;
  &lt;li&gt;iteblog&lt;/li&gt;
  &lt;li&gt;kl0u&lt;/li&gt;
  &lt;li&gt;larsbachmann&lt;/li&gt;
  &lt;li&gt;lincoln-lil&lt;/li&gt;
  &lt;li&gt;manuzhang&lt;/li&gt;
  &lt;li&gt;medale&lt;/li&gt;
  &lt;li&gt;miaoever&lt;/li&gt;
  &lt;li&gt;mtunique&lt;/li&gt;
  &lt;li&gt;radekg&lt;/li&gt;
  &lt;li&gt;renkai&lt;/li&gt;
  &lt;li&gt;sergey_sokur&lt;/li&gt;
  &lt;li&gt;shijinkui&lt;/li&gt;
  &lt;li&gt;shuai.xus&lt;/li&gt;
  &lt;li&gt;smarthi&lt;/li&gt;
  &lt;li&gt;swapnil-chougule&lt;/li&gt;
  &lt;li&gt;tedyu&lt;/li&gt;
  &lt;li&gt;tibor.moger&lt;/li&gt;
  &lt;li&gt;tonycox&lt;/li&gt;
  &lt;li&gt;twalthr&lt;/li&gt;
  &lt;li&gt;vasia&lt;/li&gt;
  &lt;li&gt;wenlong.lwl&lt;/li&gt;
  &lt;li&gt;wrighe3&lt;/li&gt;
  &lt;li&gt;xiaogang.sxg&lt;/li&gt;
  &lt;li&gt;yushi.wxg&lt;/li&gt;
  &lt;li&gt;yuzhongliu&lt;/li&gt;
  &lt;li&gt;zentol&lt;/li&gt;
  &lt;li&gt;zhuhaifengleon&lt;/li&gt;
  &lt;li&gt;淘江&lt;/li&gt;
  &lt;li&gt;魏偉哲&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Mon, 06 Feb 2017 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2017/02/06/release-1.2.0.html</link>
<guid isPermaLink="true">/news/2017/02/06/release-1.2.0.html</guid>
</item>

<item>
<title>Apache Flink 1.1.4 Released</title>
<description>&lt;p&gt;The Apache Flink community released the next bugfix version of the Apache Flink 1.1 series.&lt;/p&gt;

&lt;p&gt;This release includes major robustness improvements for checkpoint cleanup on failures and consumption of intermediate streams. We highly recommend that all users upgrade to Flink 1.1.4.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;note-for-rocksdb-backend-users&quot;&gt;Note for RocksDB Backend Users&lt;/h2&gt;

&lt;p&gt;We updated Flink’s RocksDB dependency version from &lt;code&gt;4.5.1&lt;/code&gt; to &lt;code&gt;4.11.2&lt;/code&gt;. Between these versions, some of RocksDB’s internal configuration defaults changed in ways that would affect the memory footprint of running Flink with RocksDB. Therefore, we manually reset them to the previous defaults. If you want to run with the new RocksDB 4.11.2 defaults, you can do this via:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;RocksDBStateBackend&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;backend&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;RocksDBStateBackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Use the new default options. Otherwise, the default for RocksDB 4.5.1&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// `PredefinedOptions.DEFAULT_ROCKS_4_5_1` will be used.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;backend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setPredefinedOptions&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PredefinedOptions&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;DEFAULT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;release-notes---flink---version-114&quot;&gt;Release Notes - Flink - Version 1.1.4&lt;/h2&gt;

&lt;h3 id=&quot;sub-task&quot;&gt;Sub-task&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4510&quot;&gt;FLINK-4510&lt;/a&gt;] -         Always create CheckpointCoordinator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4984&quot;&gt;FLINK-4984&lt;/a&gt;] -         Add Cancellation Barriers to BarrierTracker and BarrierBuffer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4985&quot;&gt;FLINK-4985&lt;/a&gt;] -         Report Declined/Canceled Checkpoints to Checkpoint Coordinator
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;bug&quot;&gt;Bug&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2662&quot;&gt;FLINK-2662&lt;/a&gt;] -         CompilerException: &amp;quot;Bug: Plan generation for Unions picked a ship strategy between binary plan operators.&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3680&quot;&gt;FLINK-3680&lt;/a&gt;] -         Remove or improve (not set) text in the Job Plan UI
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3813&quot;&gt;FLINK-3813&lt;/a&gt;] -         YARNSessionFIFOITCase.testDetachedMode failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4108&quot;&gt;FLINK-4108&lt;/a&gt;] -         NPE in Row.productArity
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4506&quot;&gt;FLINK-4506&lt;/a&gt;] -         CsvOutputFormat defaults allowNullValues to false, even though doc and declaration says true
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4581&quot;&gt;FLINK-4581&lt;/a&gt;] -         Table API throws &amp;quot;No suitable driver found for jdbc:calcite&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4586&quot;&gt;FLINK-4586&lt;/a&gt;] -         NumberSequenceIterator and Accumulator threading issue
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4619&quot;&gt;FLINK-4619&lt;/a&gt;] -         JobManager does not answer to client when restore from savepoint fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4727&quot;&gt;FLINK-4727&lt;/a&gt;] -         Kafka 0.9 Consumer should also checkpoint auto retrieved offsets even when no data is read
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4862&quot;&gt;FLINK-4862&lt;/a&gt;] -         NPE on EventTimeSessionWindows with ContinuousEventTimeTrigger
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4932&quot;&gt;FLINK-4932&lt;/a&gt;] -         Don&amp;#39;t let ExecutionGraph fail when in state Restarting
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4933&quot;&gt;FLINK-4933&lt;/a&gt;] -         ExecutionGraph.scheduleOrUpdateConsumers can fail the ExecutionGraph
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4977&quot;&gt;FLINK-4977&lt;/a&gt;] -         Enum serialization does not work in all cases
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4991&quot;&gt;FLINK-4991&lt;/a&gt;] -         TestTask hangs in testWatchDogInterruptsTask
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4998&quot;&gt;FLINK-4998&lt;/a&gt;] -         ResourceManager fails when num task slots &amp;gt; Yarn vcores
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5013&quot;&gt;FLINK-5013&lt;/a&gt;] -         Flink Kinesis connector doesn&amp;#39;t work on old EMR versions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5028&quot;&gt;FLINK-5028&lt;/a&gt;] -         Stream Tasks must not go through clean shutdown logic on cancellation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5038&quot;&gt;FLINK-5038&lt;/a&gt;] -         Errors in the &amp;quot;cancelTask&amp;quot; method prevent closeables from being closed early
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5039&quot;&gt;FLINK-5039&lt;/a&gt;] -         Avro GenericRecord support is broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5040&quot;&gt;FLINK-5040&lt;/a&gt;] -         Set correct input channel types with eager scheduling
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5050&quot;&gt;FLINK-5050&lt;/a&gt;] -         JSON.org license is CatX
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5057&quot;&gt;FLINK-5057&lt;/a&gt;] -         Cancellation timeouts are picked from wrong config
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5058&quot;&gt;FLINK-5058&lt;/a&gt;] -         taskManagerMemory attribute set wrong value in FlinkShell
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5063&quot;&gt;FLINK-5063&lt;/a&gt;] -         State handles are not properly cleaned up for declined or expired checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5073&quot;&gt;FLINK-5073&lt;/a&gt;] -         ZooKeeperCompleteCheckpointStore executes blocking delete operation in ZooKeeper client thread
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5075&quot;&gt;FLINK-5075&lt;/a&gt;] -         Kinesis consumer incorrectly determines shards as newly discovered when tested against Kinesalite
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5082&quot;&gt;FLINK-5082&lt;/a&gt;] -         Pull ExecutionService lifecycle management out of the JobManager
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5085&quot;&gt;FLINK-5085&lt;/a&gt;] -         Execute CheckpointCoodinator&amp;#39;s state discard calls asynchronously
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5114&quot;&gt;FLINK-5114&lt;/a&gt;] -         PartitionState update with finished execution fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5142&quot;&gt;FLINK-5142&lt;/a&gt;] -         Resource leak in CheckpointCoordinator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5149&quot;&gt;FLINK-5149&lt;/a&gt;] -         ContinuousEventTimeTrigger doesn&amp;#39;t fire at the end of the window
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5154&quot;&gt;FLINK-5154&lt;/a&gt;] -         Duplicate TypeSerializer when writing RocksDB Snapshot
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5158&quot;&gt;FLINK-5158&lt;/a&gt;] -         Handle ZooKeeperCompletedCheckpointStore exceptions in CheckpointCoordinator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5172&quot;&gt;FLINK-5172&lt;/a&gt;] -         In RocksDBStateBackend, set flink-core and flink-streaming-java to &amp;quot;provided&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5173&quot;&gt;FLINK-5173&lt;/a&gt;] -         Upgrade RocksDB dependency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5184&quot;&gt;FLINK-5184&lt;/a&gt;] -         Error result of compareSerialized in RowComparator class
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5193&quot;&gt;FLINK-5193&lt;/a&gt;] -         Recovering all jobs fails completely if a single recovery fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5197&quot;&gt;FLINK-5197&lt;/a&gt;] -         Late JobStatusChanged messages can interfere with running jobs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5214&quot;&gt;FLINK-5214&lt;/a&gt;] -         Clean up checkpoint files when failing checkpoint operation on TM
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5215&quot;&gt;FLINK-5215&lt;/a&gt;] -         Close checkpoint streams upon cancellation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5216&quot;&gt;FLINK-5216&lt;/a&gt;] -         CheckpointCoordinator&amp;#39;s &amp;#39;minPauseBetweenCheckpoints&amp;#39; refers to checkpoint start rather than checkpoint completion
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5218&quot;&gt;FLINK-5218&lt;/a&gt;] -         Eagerly close checkpoint streams on cancellation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5228&quot;&gt;FLINK-5228&lt;/a&gt;] -         LocalInputChannel re-trigger request and release deadlock
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5229&quot;&gt;FLINK-5229&lt;/a&gt;] -         Cleanup StreamTaskStates if a checkpoint operation of a subsequent operator fails 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5246&quot;&gt;FLINK-5246&lt;/a&gt;] -         Don&amp;#39;t discard unknown checkpoint messages in the CheckpointCoordinator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5248&quot;&gt;FLINK-5248&lt;/a&gt;] -         SavepointITCase doesn&amp;#39;t catch savepoint restore failure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5274&quot;&gt;FLINK-5274&lt;/a&gt;] -         LocalInputChannel throws NPE if partition reader is released
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5275&quot;&gt;FLINK-5275&lt;/a&gt;] -         InputChanelDeploymentDescriptors throws misleading Exception if producer failed/cancelled
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5276&quot;&gt;FLINK-5276&lt;/a&gt;] -         ExecutionVertex archiving can throw NPE with many previous attempts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5285&quot;&gt;FLINK-5285&lt;/a&gt;] -         CancelCheckpointMarker flood when using at least once mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5326&quot;&gt;FLINK-5326&lt;/a&gt;] -         IllegalStateException: Bug in Netty consumer logic: reader queue got notified by partition about available data,  but none was available
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5352&quot;&gt;FLINK-5352&lt;/a&gt;] -         Restore RocksDB 1.1.3 memory behavior
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;improvement&quot;&gt;Improvement&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3347&quot;&gt;FLINK-3347&lt;/a&gt;] -         TaskManager (or its ActorSystem) need to restart in case they notice quarantine
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3787&quot;&gt;FLINK-3787&lt;/a&gt;] -         Yarn client does not report unfulfillable container constraints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4445&quot;&gt;FLINK-4445&lt;/a&gt;] -         Ignore unmatched state when restoring from savepoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4715&quot;&gt;FLINK-4715&lt;/a&gt;] -         TaskManager should commit suicide after cancellation failure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4894&quot;&gt;FLINK-4894&lt;/a&gt;] -         Don&amp;#39;t block on buffer request after broadcastEvent 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4975&quot;&gt;FLINK-4975&lt;/a&gt;] -         Add a limit for how much data may be buffered during checkpoint alignment
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4996&quot;&gt;FLINK-4996&lt;/a&gt;] -         Make CrossHint @Public
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5046&quot;&gt;FLINK-5046&lt;/a&gt;] -         Avoid redundant serialization when creating the TaskDeploymentDescriptor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5123&quot;&gt;FLINK-5123&lt;/a&gt;] -         Add description how to do proper shading to Flink docs.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5169&quot;&gt;FLINK-5169&lt;/a&gt;] -         Make consumption of input channels fair
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5192&quot;&gt;FLINK-5192&lt;/a&gt;] -         Provide better log config templates
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5194&quot;&gt;FLINK-5194&lt;/a&gt;] -         Log heartbeats on TRACE level
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5196&quot;&gt;FLINK-5196&lt;/a&gt;] -         Don&amp;#39;t log InputChannelDescriptor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5198&quot;&gt;FLINK-5198&lt;/a&gt;] -         Overwrite TaskState toString
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5199&quot;&gt;FLINK-5199&lt;/a&gt;] -         Improve logging of submitted job graph actions in HA case
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5201&quot;&gt;FLINK-5201&lt;/a&gt;] -         Promote loaded config properties to INFO
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5207&quot;&gt;FLINK-5207&lt;/a&gt;] -         Decrease HadoopFileSystem logging
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5249&quot;&gt;FLINK-5249&lt;/a&gt;] -         description of datastream rescaling doesn&amp;#39;t match the figure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5259&quot;&gt;FLINK-5259&lt;/a&gt;] -         wrong execution environment in retry delays example
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5278&quot;&gt;FLINK-5278&lt;/a&gt;] -         Improve Task and checkpoint logging 
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;new-feature&quot;&gt;New Feature&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4976&quot;&gt;FLINK-4976&lt;/a&gt;] -         Add a way to abort in flight checkpoints
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;task&quot;&gt;Task&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4778&quot;&gt;FLINK-4778&lt;/a&gt;] -         Update program example in /docs/setup/cli.md due to the change in FLINK-2021
&lt;/li&gt;
&lt;/ul&gt;

</description>
<pubDate>Wed, 21 Dec 2016 10:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2016/12/21/release-1.1.4.html</link>
<guid isPermaLink="true">/news/2016/12/21/release-1.1.4.html</guid>
</item>

<item>
<title>Apache Flink in 2016: Year in Review</title>
<description>&lt;p&gt;2016 was an exciting year for the Apache Flink® community, and the
  &lt;a href=&quot;http://flink.apache.org/news/2016/03/08/release-1.0.0.html&quot; target=&quot;_blank&quot;&gt;release of Flink 1.0 in March&lt;/a&gt;
   marked the first time in Flink’s history that the community guaranteed API backward compatibility for all
   versions in a series. This step forward for Flink was followed by many new and exciting production deployments
   in organizations of all shapes and sizes, all around the globe.&lt;/p&gt;

&lt;p&gt;In this post, we’ll look back on the project’s progress over the course of 2016, and
we’ll also preview what 2017 has in store.&lt;/p&gt;

&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#community-growth&quot; id=&quot;markdown-toc-community-growth&quot;&gt;Community Growth&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#github&quot; id=&quot;markdown-toc-github&quot;&gt;GitHub&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#meetups&quot; id=&quot;markdown-toc-meetups&quot;&gt;Meetups&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#flink-forward-2016&quot; id=&quot;markdown-toc-flink-forward-2016&quot;&gt;Flink Forward 2016&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#features-and-ecosystem&quot; id=&quot;markdown-toc-features-and-ecosystem&quot;&gt;Features and Ecosystem&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#flink-ecosystem-growth&quot; id=&quot;markdown-toc-flink-ecosystem-growth&quot;&gt;Flink Ecosystem Growth&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#feature-timeline-in-2016&quot; id=&quot;markdown-toc-feature-timeline-in-2016&quot;&gt;Feature Timeline in 2016&lt;/a&gt;&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#looking-ahead-to-2017&quot; id=&quot;markdown-toc-looking-ahead-to-2017&quot;&gt;Looking ahead to 2017&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;/div&gt;

&lt;h2 id=&quot;community-growth&quot;&gt;Community Growth&lt;/h2&gt;

&lt;h3 id=&quot;github&quot;&gt;GitHub&lt;/h3&gt;
&lt;p&gt;First, here’s a summary of community statistics from &lt;a href=&quot;https://github.com/apache/flink&quot; target=&quot;_blank&quot;&gt;GitHub&lt;/a&gt;. At the time of writing:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;&lt;b&gt;Contributors&lt;/b&gt; have increased from 150 in December 2015 to 258 in December 2016 (up &lt;b&gt;72%&lt;/b&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;b&gt;Stars&lt;/b&gt; have increased from 813 in December 2015 to 1830 in December 2016 (up &lt;b&gt;125%&lt;/b&gt;)&lt;/li&gt;
  &lt;li&gt;&lt;b&gt;Forks&lt;/b&gt; have increased from 544 in December 2015 to 1255 in December 2016 (up &lt;b&gt;130%&lt;/b&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The community also welcomed &lt;b&gt;3 new committers in 2016&lt;/b&gt;: Chengxiang Li, Greg Hogan, and Tzu-Li (Gordon) Tai.&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;img src=&quot;/img/blog/github-stats-2016.png&quot; width=&quot;775&quot; alt=&quot;Apache Flink GitHub Stats&quot; /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;Next, let’s take a look at a few other project stats, starting with number of commits. If we run:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;git log --pretty=oneline --after=12/31/2015 | wc -l
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;…inside the Flink repository, we’ll see a total of &lt;strong&gt;1884&lt;/strong&gt; commits so far in 2016, bringing the all-time total commits to &lt;strong&gt;10,015&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Now, let’s go a bit deeper. Here are instructions in case you’d like to explore this data yourself.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Download gitstats from the &lt;a href=&quot;http://gitstats.sourceforge.net/&quot;&gt;project homepage&lt;/a&gt;. Or, on OS X with homebrew, type:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;brew install --HEAD homebrew/head-only/gitstats
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;Clone the Apache Flink git repository:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;git clone git@github.com:apache/flink.git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;Generate the statistics:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;gitstats flink/ flink-stats/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;ul&gt;
  &lt;li&gt;View all the statistics as an HTML page using your default browser:&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;open flink-stats/index.html
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;2016 is the year that Flink surpassed 1 million lines of code, now clocking in at &lt;strong&gt;1,034,137&lt;/strong&gt; lines.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/blog/flink-lines-of-code-2016.png&quot; align=&quot;center&quot; width=&quot;550&quot; alt=&quot;Flink Total Lines of Code&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Monday remains the day of the week with the most commits over the project’s history:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/blog/flink-dow-2016.png&quot; align=&quot;center&quot; width=&quot;550&quot; alt=&quot;Flink Commits by Day of Week&quot; /&gt;&lt;/p&gt;

&lt;p&gt;And 5pm is still solidly the preferred commit time:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/blog/flink-hod-2016.png&quot; align=&quot;center&quot; width=&quot;550&quot; alt=&quot;Flink Commits by Hour of Day&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;h3 id=&quot;meetups&quot;&gt;Meetups&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://www.meetup.com/topics/apache-flink/&quot; target=&quot;_blank&quot;&gt;Apache Flink Meetup membership&lt;/a&gt; grew by &lt;b&gt;240%&lt;/b&gt;
this year, and at the time of writing, there are 41 meetup groups listing Flink as a topic, with 16,541 members in total, up from 16 groups with 4,864 members in December 2015.
The Flink community is proud to be truly global in nature.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/blog/flink-meetups-dec2016.png&quot; width=&quot;775&quot; alt=&quot;Apache Flink Meetup Map&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;flink-forward-2016&quot;&gt;Flink Forward 2016&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;http://2016.flink-forward.org/&quot; target=&quot;_blank&quot;&gt;second annual Flink Forward conference&lt;/a&gt; took place in
Berlin on September 12-14, and over 350 members of the Flink community came together for speaker sessions, training,
and discussion about Flink. &lt;a href=&quot;http://2016.flink-forward.org/program/sessions/&quot; target=&quot;_blank&quot;&gt;Slides and videos&lt;/a&gt;
 from speaker sessions are available online, and we encourage you to take a look if you’re interested in learning more
 about how Flink is used in production in a wide range of organizations.&lt;/p&gt;

&lt;p&gt;Flink Forward will expand to &lt;a href=&quot;http://sf.flink-forward.org/&quot; target=&quot;_blank&quot;&gt;San Francisco in April 2017&lt;/a&gt;, and the &lt;a href=&quot;http://berlin.flink-forward.org/&quot; target=&quot;_blank&quot;&gt;third annual Berlin event
  is scheduled for September 2017&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/blog/speaker-logos-ff2016.png&quot; width=&quot;775&quot; alt=&quot;Flink Forward Speakers&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;features-and-ecosystem&quot;&gt;Features and Ecosystem&lt;/h2&gt;

&lt;h3 id=&quot;flink-ecosystem-growth&quot;&gt;Flink Ecosystem Growth&lt;/h3&gt;

&lt;p&gt;Flink was added to a selection of distributions during 2016, making it easier
for an even larger base of users to start working with Flink:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://aws.amazon.com/blogs/big-data/use-apache-flink-on-amazon-emr/&quot; target=&quot;_blank&quot;&gt;
    Amazon EMR&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://cloud.google.com/dataproc/docs/release-notes/service#november_29_2016&quot; target=&quot;_blank&quot;&gt;
    Google Cloud Dataproc&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://www.lightbend.com/blog/introducing-lightbend-fast-data-platform&quot; target=&quot;_blank&quot;&gt;
    Lightbend Fast Data Platform&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition, the Apache Beam and Flink communities teamed up to build a Flink runner for Beam that, according to the Google team, is &lt;a href=&quot;https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective&quot; target=&quot;_blank&quot;&gt;“sophisticated enough to be a compelling alternative to Cloud Dataflow when running on premise or on non-Google clouds”&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;feature-timeline-in-2016&quot;&gt;Feature Timeline in 2016&lt;/h3&gt;

&lt;p&gt;Here’s a selection of major features added to Flink over the course of 2016:&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/img/blog/flink-releases-2016.png&quot; width=&quot;775&quot; alt=&quot;Flink Release Timeline 2016&quot; /&gt;&lt;/p&gt;

&lt;p&gt;If you spend time in the &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4554?jql=project%20%3D%20FLINK%20AND%20issuetype%20%3D%20%22New%20Feature%22%20AND%20status%20%3D%20Resolved%20ORDER%20BY%20resolved%20DESC&quot; target=&quot;_blank&quot;&gt;Apache Flink JIRA project&lt;/a&gt;, you’ll see that the Flink community has addressed every single one of the roadmap items identified
in &lt;a href=&quot;http://flink.apache.org/news/2015/12/18/a-year-in-review.html&quot; target=&quot;_blank&quot;&gt;2015’s year in review post&lt;/a&gt;. Here’s to making that an annual tradition. :)&lt;/p&gt;

&lt;h2 id=&quot;looking-ahead-to-2017&quot;&gt;Looking ahead to 2017&lt;/h2&gt;

&lt;p&gt;A good source of information about the Flink community’s roadmap is the list of
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals&quot; target=&quot;_blank&quot;&gt;Flink
Improvement Proposals (FLIPs)&lt;/a&gt; in the project wiki. Below, we’ll highlight a selection of FLIPs
that have been accepted by the community as well as some that are still under discussion.&lt;/p&gt;

&lt;p&gt;We should note that work is already underway on a number of these features, and some will even be included in Flink 1.2 at the beginning of 2017.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;A new Flink deployment and process model&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot; target=&quot;_blank&quot;&gt;FLIP-6&lt;/a&gt;. This work ensures that Flink supports a wide
range of deployment types and cluster managers, making it possible to run Flink smoothly in any environment.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Dynamic scaling&lt;/strong&gt; for both key-value state &lt;a href=&quot;https://github.com/apache/flink/pull/2440&quot; target=&quot;_blank&quot;&gt;(as described in
this PR)&lt;/a&gt; &lt;em&gt;and&lt;/em&gt; non-partitioned state &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-8%3A+Rescalable+Non-Partitioned+State&quot; target=&quot;_blank&quot;&gt;(as described in FLIP-8)&lt;/a&gt;, ensuring that it’s always possible to split or merge state when scaling up or down, respectively.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Asynchronous I/O&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673&quot; target=&quot;_blank&quot;&gt;FLIP-12&lt;/a&gt;,
which makes I/O access less time-consuming without adding complexity or requiring extra checkpoint coordination.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Enhancements to the window evictor&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-4+%3A+Enhance+Window+Evictor&quot; target=&quot;_blank&quot;&gt;FLIP-4&lt;/a&gt;,
to provide users with more control over how elements are evicted from a window.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Fine-grained recovery from task failures&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures&quot; target=&quot;_blank&quot;&gt;FLIP-1&lt;/a&gt;,
to make it possible to restart only what needs to be restarted during recovery, building on cached intermediate results.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Unified checkpoints and savepoints&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-10%3A+Unify+Checkpoints+and+Savepoints&quot; target=&quot;_blank&quot;&gt;FLIP-10&lt;/a&gt;, to
allow savepoints to be triggered automatically. This is important for program updates and error handling because savepoints let the user modify both
 the job and the Flink version, whereas checkpoints can only be recovered with the same job.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Table API window aggregations&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations&quot; target=&quot;_blank&quot;&gt;FLIP-11&lt;/a&gt;, to support group-window and row-window aggregates on streaming and batch tables.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Side inputs&lt;/strong&gt;, as described in &lt;a href=&quot;https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit&quot; target=&quot;_blank&quot;&gt;this design document&lt;/a&gt;, to
enable the joining of a main, high-throughput stream with one or more inputs with static or slowly-changing data.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re interested in getting involved with Flink, we encourage you to take a look at the FLIPs and to join the discussion via the &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink mailing lists&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Lastly, we’d like to extend a sincere thank you to all of the Flink community for making 2016 a great year!&lt;/p&gt;
</description>
<pubDate>Mon, 19 Dec 2016 10:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2016/12/19/2016-year-in-review.html</link>
<guid isPermaLink="true">/news/2016/12/19/2016-year-in-review.html</guid>
</item>

<item>
<title>Apache Flink 1.1.3 Released</title>
<description>&lt;p&gt;The Apache Flink community released the next bugfix version of the Apache Flink 1.1 series.&lt;/p&gt;

&lt;p&gt;We recommend that all users upgrade to Flink 1.1.3.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;note-for-rocksdb-backend-users&quot;&gt;Note for RocksDB Backend Users&lt;/h2&gt;

&lt;p&gt;It is highly recommended to use the “fully async” mode for the RocksDB state backend. The “fully async” mode will most likely allow you to easily upgrade to Flink 1.2 (via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/savepoints.html&quot;&gt;savepoints&lt;/a&gt;) when it is released. The “semi async” mode will no longer be supported by Flink 1.2.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;RocksDBStateBackend&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;backend&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;RocksDBStateBackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;backend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;enableFullyAsyncSnapshots&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
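
&lt;p&gt;The snippet above needs the RocksDB state backend on the classpath. Below is a minimal sketch of the corresponding Maven dependency. It assumes the module is published as &lt;code&gt;flink-statebackend-rocksdb_2.10&lt;/code&gt;, with the same Scala suffix and version as the artifacts listed earlier; this artifact name is not stated in this post, so please verify it against the Flink 1.1 documentation before relying on it.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&amp;lt;!-- Assumed artifact name; check it against the Flink 1.1 docs --&amp;gt;
&amp;lt;dependency&amp;gt;
  &amp;lt;groupId&amp;gt;org.apache.flink&amp;lt;/groupId&amp;gt;
  &amp;lt;artifactId&amp;gt;flink-statebackend-rocksdb_2.10&amp;lt;/artifactId&amp;gt;
  &amp;lt;version&amp;gt;1.1.3&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;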

&lt;h2 id=&quot;release-notes---flink---version-113&quot;&gt;Release Notes - Flink - Version 1.1.3&lt;/h2&gt;

&lt;h2&gt;        Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2662&quot;&gt;FLINK-2662&lt;/a&gt;] -         CompilerException: &amp;quot;Bug: Plan generation for Unions picked a ship strategy between binary plan operators.&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4311&quot;&gt;FLINK-4311&lt;/a&gt;] -         TableInputFormat fails when reused on next split
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4329&quot;&gt;FLINK-4329&lt;/a&gt;] -         Fix Streaming File Source Timestamps/Watermarks Handling
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4485&quot;&gt;FLINK-4485&lt;/a&gt;] -         Finished jobs in yarn session fill /tmp filesystem
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4513&quot;&gt;FLINK-4513&lt;/a&gt;] -         Kafka connector documentation refers to Flink 1.1-SNAPSHOT
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4514&quot;&gt;FLINK-4514&lt;/a&gt;] -         ExpiredIteratorException in Kinesis Consumer on long catch-ups to head of stream
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4540&quot;&gt;FLINK-4540&lt;/a&gt;] -         Detached job execution may prevent cluster shutdown
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4544&quot;&gt;FLINK-4544&lt;/a&gt;] -         TaskManager metrics are vulnerable to custom JMX bean installation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4566&quot;&gt;FLINK-4566&lt;/a&gt;] -         ProducerFailedException does not properly preserve Exception causes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4588&quot;&gt;FLINK-4588&lt;/a&gt;] -         Fix Merging of Covering Window in MergingWindowSet
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4589&quot;&gt;FLINK-4589&lt;/a&gt;] -         Fix Merging of Covering Window in MergingWindowSet
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4616&quot;&gt;FLINK-4616&lt;/a&gt;] -         Kafka consumer doesn&amp;#39;t store last emitted watermarks per partition in state
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4618&quot;&gt;FLINK-4618&lt;/a&gt;] -         FlinkKafkaConsumer09 should start from the next record on startup from offsets in Kafka
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4619&quot;&gt;FLINK-4619&lt;/a&gt;] -         JobManager does not answer to client when restore from savepoint fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4636&quot;&gt;FLINK-4636&lt;/a&gt;] -         AbstractCEPPatternOperator fails to restore state
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4640&quot;&gt;FLINK-4640&lt;/a&gt;] -         Serialization of the initialValue of a Fold on WindowedStream fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4651&quot;&gt;FLINK-4651&lt;/a&gt;] -         Re-register processing time timers at the WindowOperator upon recovery.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4663&quot;&gt;FLINK-4663&lt;/a&gt;] -         Flink JDBCOutputFormat logs wrong WARN message
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4672&quot;&gt;FLINK-4672&lt;/a&gt;] -         TaskManager accidentally decorates Kill messages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4677&quot;&gt;FLINK-4677&lt;/a&gt;] -         Jars with no job executions produces NullPointerException in ClusterClient
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4702&quot;&gt;FLINK-4702&lt;/a&gt;] -         Kafka consumer must commit offsets asynchronously
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4727&quot;&gt;FLINK-4727&lt;/a&gt;] -         Kafka 0.9 Consumer should also checkpoint auto retrieved offsets even when no data is read
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4732&quot;&gt;FLINK-4732&lt;/a&gt;] -         Maven junction plugin security threat
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4777&quot;&gt;FLINK-4777&lt;/a&gt;] -         ContinuousFileMonitoringFunction may throw IOException when files are moved
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4788&quot;&gt;FLINK-4788&lt;/a&gt;] -         State backend class cannot be loaded, because fully qualified name converted to lower-case
&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;        Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4396&quot;&gt;FLINK-4396&lt;/a&gt;] -         GraphiteReporter class not found at startup of jobmanager
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4574&quot;&gt;FLINK-4574&lt;/a&gt;] -         Strengthen fetch interval implementation in Kinesis consumer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4723&quot;&gt;FLINK-4723&lt;/a&gt;] -         Unify behaviour of committed offsets to Kafka / ZK for Kafka 0.8 and 0.9 consumer
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 12 Oct 2016 11:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/10/12/release-1.1.3.html</link>
<guid isPermaLink="true">/news/2016/10/12/release-1.1.3.html</guid>
</item>

<item>
<title>Apache Flink 1.1.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released another bugfix version of the Apache Flink 1.1 series.&lt;/p&gt;

&lt;p&gt;We recommend that all users upgrade to Flink 1.1.2.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;Release Notes - Flink - Version 1.1.2&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4236&quot;&gt;FLINK-4236&lt;/a&gt;] -         Flink Dashboard stops showing list of uploaded jars if main method cannot be looked up
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4309&quot;&gt;FLINK-4309&lt;/a&gt;] -         Potential null pointer dereference in DelegatingConfiguration#keySet()
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4334&quot;&gt;FLINK-4334&lt;/a&gt;] -         Shaded Hadoop1 jar not fully excluded in Quickstart
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4341&quot;&gt;FLINK-4341&lt;/a&gt;] -         Kinesis connector does not emit maximum watermark properly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4402&quot;&gt;FLINK-4402&lt;/a&gt;] -         Wrong metrics parameter names in documentation 
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4409&quot;&gt;FLINK-4409&lt;/a&gt;] -         class conflict between jsr305-1.3.9.jar and flink-shaded-hadoop2-1.1.1.jar
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4411&quot;&gt;FLINK-4411&lt;/a&gt;] -         [py] Chained dual input children are not properly propagated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4412&quot;&gt;FLINK-4412&lt;/a&gt;] -         [py] Chaining does not properly handle broadcast variables
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4425&quot;&gt;FLINK-4425&lt;/a&gt;] -         &amp;quot;Out Of Memory&amp;quot; during savepoint deserialization
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4454&quot;&gt;FLINK-4454&lt;/a&gt;] -         Lookups for JobManager address in config
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4480&quot;&gt;FLINK-4480&lt;/a&gt;] -         Incorrect link to elastic.co in documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4486&quot;&gt;FLINK-4486&lt;/a&gt;] -         JobManager not fully running when yarn-session.sh finishes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4488&quot;&gt;FLINK-4488&lt;/a&gt;] -         Prevent cluster shutdown after job execution for non-detached jobs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4514&quot;&gt;FLINK-4514&lt;/a&gt;] -         ExpiredIteratorException in Kinesis Consumer on long catch-ups to head of stream
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4526&quot;&gt;FLINK-4526&lt;/a&gt;] -         ApplicationClient: remove redundant proxy messages
&lt;/li&gt;

&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3866&quot;&gt;FLINK-3866&lt;/a&gt;] -         StringArraySerializer claims type is immutable; shouldn&amp;#39;t
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3899&quot;&gt;FLINK-3899&lt;/a&gt;] -         Document window processing with Reduce/FoldFunction + WindowFunction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4302&quot;&gt;FLINK-4302&lt;/a&gt;] -         Add JavaDocs to MetricConfig
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4495&quot;&gt;FLINK-4495&lt;/a&gt;] -         Running multiple jobs on yarn (without yarn-session)
&lt;/li&gt;
&lt;/ul&gt;

</description>
<pubDate>Mon, 05 Sep 2016 11:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/09/05/release-1.1.2.html</link>
<guid isPermaLink="true">/news/2016/09/05/release-1.1.2.html</guid>
</item>

<item>
<title>Flink Forward 2016: Announcing Schedule, Keynotes, and Panel Discussion</title>
<description>&lt;p&gt;An update for the Flink community: the &lt;a href=&quot;http://flink-forward.org/kb_day/day-1/&quot;&gt;Flink Forward 2016 schedule&lt;/a&gt; is now available online. This year&#39;s event will include 2 days of talks from stream processing experts at Google, MapR, Alibaba, Netflix, Cloudera, and more. Following the talks is a full day of hands-on Flink training.&lt;/p&gt;

&lt;p&gt;Ted Dunning has been announced as a keynote speaker at the event. Ted is the VP of Incubator at &lt;a href=&quot;http://www.apache.org&quot;&gt;Apache Software Foundation&lt;/a&gt;, the Chief Application Architect at &lt;a href=&quot;http://www.mapr.com&quot;&gt;MapR Technologies&lt;/a&gt;, and a mentor on many recent projects. He&#39;ll present &lt;a href=&quot;http://flink-forward.org/kb_sessions/keynote-tba/&quot;&gt;&quot;How Can We Take Flink Forward?&quot;&lt;/a&gt; on the second day of the conference.&lt;/p&gt;

&lt;p&gt;Following Ted&#39;s keynote there will be a panel discussion on &lt;a href=&quot;http://flink-forward.org/kb_sessions/panel-large-scale-streaming-in-production/&quot;&gt;&quot;Large Scale Streaming in Production&quot;&lt;/a&gt;. As stream processing systems become more mainstream, companies are looking to empower their users to take advantage of this technology. We welcome leading stream processing experts Xiaowei Jiang &lt;a href=&quot;http://www.alibaba.com&quot;&gt;(Alibaba)&lt;/a&gt;, Monal Daxini &lt;a href=&quot;http://www.netflix.com&quot;&gt;(Netflix)&lt;/a&gt;, Maxim Fateev &lt;a href=&quot;http://www.uber.com&quot;&gt;(Uber)&lt;/a&gt;, and Ted Dunning &lt;a href=&quot;http://www.mapr.com&quot;&gt;(MapR Technologies)&lt;/a&gt; on stage to talk about the challenges they have faced and the solutions they have discovered while implementing stream processing systems at very large scale. The panel will be moderated by Jamie Grier &lt;a href=&quot;http://www.data-artisans.com&quot;&gt;(data Artisans)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The welcome keynote on Monday, September 12, will be presented by data Artisans&#39; co-founders Kostas Tzoumas and Stephan Ewen. They will talk about &lt;a href=&quot;http://flink-forward.org/kb_sessions/keynote-tba-2/&quot;&gt;&quot;The maturing data streaming ecosystem and Apache Flink’s accelerated growth&quot;&lt;/a&gt;. In this talk, Kostas and Stephan discuss several large-scale stream processing use cases that the data Artisans team has seen over the past year.&lt;/p&gt;

&lt;p&gt;And one more recent addition to the program: Maxim Fateev of Uber will present &lt;a href=&quot;http://flink-forward.org/kb_sessions/beyond-the-watermark-on-demand-backfilling-in-flink/&quot;&gt;&quot;Beyond the Watermark: On-Demand Backfilling in Flink&quot;&lt;/a&gt;. Flink’s time-progress model is built around a single watermark, which is incompatible with Uber’s business need for generating aggregates retroactively. Maxim&#39;s talk covers Uber&#39;s solution for on-demand backfilling.&lt;/p&gt;

&lt;p&gt;We hope to see many community members at Flink Forward 2016. Registration is available online: &lt;a href=&quot;http://flink-forward.org/registration/&quot;&gt;flink-forward.org/registration&lt;/a&gt;
&lt;/p&gt;
</description>
<pubDate>Wed, 24 Aug 2016 11:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/08/24/ff16-keynotes-panels.html</link>
<guid isPermaLink="true">/news/2016/08/24/ff16-keynotes-panels.html</guid>
</item>

<item>
<title>Flink 1.1.1 Released</title>
<description>&lt;p&gt;Today, the Flink community released Flink version 1.1.1.&lt;/p&gt;

&lt;p&gt;The Maven artifacts published on Maven central for 1.1.0 had a Hadoop dependency issue: No Hadoop 1 specific version (with version 1.1.0-hadoop1) was deployed and 1.1.0 artifacts have a dependency on Hadoop 1 instead of Hadoop 2.&lt;/p&gt;

&lt;p&gt;This release fixes the issue, and we &lt;strong&gt;highly recommend&lt;/strong&gt; that all users switch to it by bumping their Flink dependencies to version 1.1.1:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
</description>
<pubDate>Thu, 11 Aug 2016 11:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/08/11/release-1.1.1.html</link>
<guid isPermaLink="true">/news/2016/08/11/release-1.1.1.html</guid>
</item>

<item>
<title>Announcing Apache Flink 1.1.0</title>
<description>&lt;div class=&quot;alert alert-success&quot;&gt;&lt;strong&gt;Important&lt;/strong&gt;: The Maven artifacts published with version 1.1.0 on Maven central have a Hadoop dependency issue. It is highly recommended to use &lt;strong&gt;1.1.1&lt;/strong&gt; or &lt;strong&gt;1.1.1-hadoop1&lt;/strong&gt; as the Flink version.&lt;/div&gt;

&lt;p&gt;The Apache Flink community is pleased to announce the availability of Flink 1.1.0.&lt;/p&gt;

&lt;p&gt;This is the first major release in the 1.X.X series, which maintains API compatibility with 1.0.0. This means that applications written against the stable APIs of Flink 1.0.0 will compile and run with Flink 1.1.0. In total, 95 contributors provided bug fixes, improvements, and new features, resolving more than 450 JIRA issues. See the &lt;a href=&quot;/blog/release_1.1.0-changelog.html&quot;&gt;complete changelog&lt;/a&gt; for more details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We encourage everyone to &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;download the release&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/&quot;&gt;check out the documentation&lt;/a&gt;. Feedback through the Flink &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; is, as always, very welcome!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Some highlights of the release are listed in the following sections.&lt;/p&gt;

&lt;h2 id=&quot;connectors&quot;&gt;Connectors&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/connectors/index.html&quot;&gt;streaming connectors&lt;/a&gt; are a major part of Flink’s DataStream API. This release adds support for new external systems and further improves on the available connectors.&lt;/p&gt;

&lt;h3 id=&quot;continuous-file-system-sources&quot;&gt;Continuous File System Sources&lt;/h3&gt;

&lt;p&gt;A frequently requested feature for Flink 1.0 was to be able to monitor directories and process files continuously. Flink 1.1 now adds support for this via &lt;code&gt;FileProcessingMode&lt;/code&gt;s:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;readFile&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;textInputFormat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;&amp;quot;hdfs:///file-path&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;FileProcessingMode&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;PROCESS_CONTINUOUSLY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
  &lt;span class=&quot;mi&quot;&gt;5000&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// monitoring interval (millis)&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;FilePathFilter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createDefaultFilter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// file path filter&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This will monitor &lt;code&gt;hdfs:///file-path&lt;/code&gt; every &lt;code&gt;5000&lt;/code&gt; milliseconds. Check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/index.html#data-sources&quot;&gt;DataSource documentation for more details&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;kinesis-source-and-sink&quot;&gt;Kinesis Source and Sink&lt;/h3&gt;

&lt;p&gt;Flink 1.1 adds a Kinesis connector for both consuming (&lt;code&gt;FlinkKinesisConsumer&lt;/code&gt;) from and producing (&lt;code&gt;FlinkKinesisProducer&lt;/code&gt;) to &lt;a href=&quot;https://aws.amazon.com/kinesis/&quot;&gt;Amazon Kinesis Streams&lt;/a&gt;, which is a managed service purpose-built to make it easy to work with streaming data on AWS.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kinesis&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlinkKinesisConsumer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;stream-name&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/connectors/kinesis.html&quot;&gt;Kinesis connector documentation for more details&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;cassandra-sink&quot;&gt;Cassandra Sink&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;http://wiki.apache.org/cassandra/GettingStarted&quot;&gt;Apache Cassandra&lt;/a&gt; sink allows you to write from Flink to Cassandra. Flink can provide exactly-once guarantees if the query is idempotent, meaning it can be applied multiple times without changing the result.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;CassandraSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/connectors/cassandra.html&quot;&gt;Cassandra Sink documentation for more details&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;table-api-and-sql&quot;&gt;Table API and SQL&lt;/h2&gt;

&lt;p&gt;The Table API is a SQL-like expression language for relational stream and batch processing that can be easily embedded in Flink’s DataSet and DataStream APIs (for both Java and Scala).&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;toTable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custDs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;name, zipcode&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;zipcode = &amp;#39;12345&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;An initial version of this API was already available in Flink 1.0. For Flink 1.1, the community put a lot of work into reworking the architecture of the Table API and integrating it with &lt;a href=&quot;https://calcite.apache.org&quot;&gt;Apache Calcite&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this first version, SQL (and Table API) queries on streams are limited to selection, filter, and union operators. Compared to Flink 1.0, the revised Table API supports many more scalar functions and is able to read tables from external sources and write them back to external sinks.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
  &lt;span class=&quot;s&quot;&gt;&amp;quot;SELECT STREAM product, amount FROM Orders WHERE product LIKE &amp;#39;%Rubber%&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A more detailed introduction can be found in the &lt;a href=&quot;http://flink.apache.org/news/2016/05/24/stream-sql.html&quot;&gt;Flink blog&lt;/a&gt; and the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/table.html&quot;&gt;Table API documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;datastream-api&quot;&gt;DataStream API&lt;/h2&gt;

&lt;p&gt;The DataStream API now exposes &lt;strong&gt;session windows&lt;/strong&gt; and &lt;strong&gt;allowed lateness&lt;/strong&gt; as first-class citizens.&lt;/p&gt;

&lt;h3 id=&quot;session-windows&quot;&gt;Session Windows&lt;/h3&gt;

&lt;p&gt;Session windows are ideal for cases where the window boundaries need to adjust to the incoming data. This enables you to have windows that start at individual points in time for each key and that end once there has been a &lt;em&gt;certain period of inactivity&lt;/em&gt;. The configuration parameter is the session gap that specifies how long to wait for new data before considering a session as closed.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/session-windows.svg&quot; style=&quot;height:400px&quot; /&gt;
&lt;/center&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;selector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;EventTimeSessionWindows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;withGap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;minutes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)))&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;windowed&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transformation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id=&quot;support-for-late-elements&quot;&gt;Support for Late Elements&lt;/h3&gt;

&lt;p&gt;You can now specify how a windowed transformation should deal with late elements and how much lateness is allowed. The parameter for this is called &lt;em&gt;allowed lateness&lt;/em&gt;. It specifies how late an element may arrive and still be added to its window.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;selector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;assigner&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;allowedLateness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;windowed&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transformation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Elements that arrive within the allowed lateness are still put into windows and are considered when computing window results. If elements arrive after the allowed lateness they will be dropped. Flink will also make sure that any state held by the windowing operation is garbage collected once the watermark passes the end of a window plus the allowed lateness.&lt;/p&gt;
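
&lt;p&gt;As a rough illustration of this rule, the following plain-Java sketch (not Flink API code) decides whether an element is still accepted for a given window. The class &lt;code&gt;AllowedLatenessSketch&lt;/code&gt; and the helper &lt;code&gt;isAccepted&lt;/code&gt; are hypothetical names, the timestamps are made-up values in milliseconds, and the exact boundary handling inside Flink may differ:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;public class AllowedLatenessSketch {

  // Hypothetical helper: an element is still accepted as long as the watermark
  // has not yet passed the end of its window plus the allowed lateness.
  static boolean isAccepted(long windowEnd, long allowedLateness, long watermark) {
    return watermark &amp;lt;= windowEnd + allowedLateness;
  }

  public static void main(String[] args) {
    long windowEnd = 60_000L;       // assumed window covering the first minute
    long allowedLateness = 10_000L; // assumed allowed lateness of 10 seconds

    // Watermark at 65 s: the element is late, but within the allowed lateness.
    System.out.println(isAccepted(windowEnd, allowedLateness, 65_000L)); // prints true

    // Watermark at 75 s: past window end plus allowed lateness, so the element is dropped.
    System.out.println(isAccepted(windowEnd, allowedLateness, 75_000L)); // prints false
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;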

&lt;p&gt;Check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/windows.html&quot;&gt;Windows documentation for more details&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;scala-api-for-complex-event-processing-cep&quot;&gt;Scala API for Complex Event Processing (CEP)&lt;/h2&gt;

&lt;p&gt;Flink 1.0 added the initial version of the CEP library. The core of the library is a Pattern API, which allows you to easily specify patterns to match against in your event stream. While in Flink 1.0 this API was only available for Java, Flink 1.1 now exposes the same API for Scala, allowing you to specify your event patterns in a more concise manner.&lt;/p&gt;

&lt;p&gt;A more detailed introduction can be found in the &lt;a href=&quot;http://flink.apache.org/news/2016/04/06/cep-monitoring.html&quot;&gt;Flink blog&lt;/a&gt; and the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/libs/cep.html&quot;&gt;CEP documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;graph-generators-and-new-gelly-library-algorithms&quot;&gt;Graph Generators and New Gelly Library Algorithms&lt;/h2&gt;

&lt;p&gt;This release includes many enhancements and new features for graph processing. Gelly now provides a collection of scalable graph generators for common graph types, such as complete, cycle, grid, hypercube, and RMat graphs. A variety of new graph algorithms have been added to the Gelly library, including Global and Local Clustering Coefficient, HITS, and similarity measures (Jaccard and Adamic-Adar).&lt;/p&gt;

&lt;p&gt;For a full list of new graph processing features, check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/batch/libs/gelly.html&quot;&gt;Gelly documentation&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;metrics&quot;&gt;Metrics&lt;/h2&gt;

&lt;p&gt;Flink’s new metrics system allows you to easily gather and expose metrics from your user application to external systems. You can add counters, gauges, and histograms to your application via the runtime context:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Counter&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;counter&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getRuntimeContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getMetricGroup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;counter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;my-counter&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;All registered metrics will be exposed via reporters. Out of the box, Flink comes with support for JMX, Ganglia, Graphite, and StatsD. In addition to your custom metrics, Flink exposes many internal metrics like checkpoint sizes and JVM stats.&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/metrics.html&quot;&gt;Metrics documentation for more details&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;

&lt;p&gt;The following 95 people contributed to this release:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Abdullah Ozturk&lt;/li&gt;
  &lt;li&gt;Ajay Bhat&lt;/li&gt;
  &lt;li&gt;Alexey Savartsov&lt;/li&gt;
  &lt;li&gt;Aljoscha Krettek&lt;/li&gt;
  &lt;li&gt;Andrea Sella&lt;/li&gt;
  &lt;li&gt;Andrew Palumbo&lt;/li&gt;
  &lt;li&gt;Chenguang He&lt;/li&gt;
  &lt;li&gt;Chiwan Park&lt;/li&gt;
  &lt;li&gt;David Moravek&lt;/li&gt;
  &lt;li&gt;Dominik Bruhn&lt;/li&gt;
  &lt;li&gt;Dyana Rose&lt;/li&gt;
  &lt;li&gt;Fabian Hueske&lt;/li&gt;
  &lt;li&gt;Flavio Pompermaier&lt;/li&gt;
  &lt;li&gt;Gabor Gevay&lt;/li&gt;
  &lt;li&gt;Gabor Horvath&lt;/li&gt;
  &lt;li&gt;Geoffrey Mon&lt;/li&gt;
  &lt;li&gt;Gordon Tai&lt;/li&gt;
  &lt;li&gt;Greg Hogan&lt;/li&gt;
  &lt;li&gt;Gyula Fora&lt;/li&gt;
  &lt;li&gt;Henry Saputra&lt;/li&gt;
  &lt;li&gt;Ignacio N. Lucero Ascencio&lt;/li&gt;
  &lt;li&gt;Igor Berman&lt;/li&gt;
  &lt;li&gt;Ismaël Mejía&lt;/li&gt;
  &lt;li&gt;Ivan Mushketyk&lt;/li&gt;
  &lt;li&gt;Jark Wu&lt;/li&gt;
  &lt;li&gt;Jiri Simsa&lt;/li&gt;
  &lt;li&gt;Jonas Traub&lt;/li&gt;
  &lt;li&gt;Josh&lt;/li&gt;
  &lt;li&gt;Joshi&lt;/li&gt;
  &lt;li&gt;Joshua Herman&lt;/li&gt;
  &lt;li&gt;Ken Krugler&lt;/li&gt;
  &lt;li&gt;Konstantin Knauf&lt;/li&gt;
  &lt;li&gt;Lasse Dalegaard&lt;/li&gt;
  &lt;li&gt;Li Fanxi&lt;/li&gt;
  &lt;li&gt;MaBiao&lt;/li&gt;
  &lt;li&gt;Mao Wei&lt;/li&gt;
  &lt;li&gt;Mark Reddy&lt;/li&gt;
  &lt;li&gt;Martin Junghanns&lt;/li&gt;
  &lt;li&gt;Martin Liesenberg&lt;/li&gt;
  &lt;li&gt;Maximilian Michels&lt;/li&gt;
  &lt;li&gt;Michal Fijolek&lt;/li&gt;
  &lt;li&gt;Márton Balassi&lt;/li&gt;
  &lt;li&gt;Nathan Howell&lt;/li&gt;
  &lt;li&gt;Niels Basjes&lt;/li&gt;
  &lt;li&gt;Niels Zeilemaker&lt;/li&gt;
  &lt;li&gt;Phetsarath, Sourigna&lt;/li&gt;
  &lt;li&gt;Robert Metzger&lt;/li&gt;
  &lt;li&gt;Scott Kidder&lt;/li&gt;
  &lt;li&gt;Sebastian Klemke&lt;/li&gt;
  &lt;li&gt;Shahin&lt;/li&gt;
  &lt;li&gt;Shannon Carey&lt;/li&gt;
  &lt;li&gt;Shannon Quinn&lt;/li&gt;
  &lt;li&gt;Stefan Richter&lt;/li&gt;
  &lt;li&gt;Stefano Baghino&lt;/li&gt;
  &lt;li&gt;Stefano Bortoli&lt;/li&gt;
  &lt;li&gt;Stephan Ewen&lt;/li&gt;
  &lt;li&gt;Steve Cosenza&lt;/li&gt;
  &lt;li&gt;Sumit Chawla&lt;/li&gt;
  &lt;li&gt;Tatu Saloranta&lt;/li&gt;
  &lt;li&gt;Tianji Li&lt;/li&gt;
  &lt;li&gt;Till Rohrmann&lt;/li&gt;
  &lt;li&gt;Todd Lisonbee&lt;/li&gt;
  &lt;li&gt;Tony Baines&lt;/li&gt;
  &lt;li&gt;Trevor Grant&lt;/li&gt;
  &lt;li&gt;Ufuk Celebi&lt;/li&gt;
  &lt;li&gt;Vasudevan&lt;/li&gt;
  &lt;li&gt;Yijie Shen&lt;/li&gt;
  &lt;li&gt;Zack Pierce&lt;/li&gt;
  &lt;li&gt;Zhai Jia&lt;/li&gt;
  &lt;li&gt;chengxiang li&lt;/li&gt;
  &lt;li&gt;chobeat&lt;/li&gt;
  &lt;li&gt;danielblazevski&lt;/li&gt;
  &lt;li&gt;dawid&lt;/li&gt;
  &lt;li&gt;dawidwys&lt;/li&gt;
  &lt;li&gt;eastcirclek&lt;/li&gt;
  &lt;li&gt;erli ding&lt;/li&gt;
  &lt;li&gt;gallenvara&lt;/li&gt;
  &lt;li&gt;kl0u&lt;/li&gt;
  &lt;li&gt;mans2singh&lt;/li&gt;
  &lt;li&gt;markreddy&lt;/li&gt;
  &lt;li&gt;mjsax&lt;/li&gt;
  &lt;li&gt;nikste&lt;/li&gt;
  &lt;li&gt;omaralvarez&lt;/li&gt;
  &lt;li&gt;philippgrulich&lt;/li&gt;
  &lt;li&gt;ramkrishna&lt;/li&gt;
  &lt;li&gt;sahitya-pavurala&lt;/li&gt;
  &lt;li&gt;samaitra&lt;/li&gt;
  &lt;li&gt;smarthi&lt;/li&gt;
  &lt;li&gt;spkavuly&lt;/li&gt;
  &lt;li&gt;subhankar&lt;/li&gt;
  &lt;li&gt;twalthr&lt;/li&gt;
  &lt;li&gt;vasia&lt;/li&gt;
  &lt;li&gt;xueyan.li&lt;/li&gt;
  &lt;li&gt;zentol&lt;/li&gt;
  &lt;li&gt;卫乐&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Mon, 08 Aug 2016 15:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/08/08/release-1.1.0.html</link>
<guid isPermaLink="true">/news/2016/08/08/release-1.1.0.html</guid>
</item>

<item>
<title>Stream Processing for Everyone with SQL and Apache Flink</title>
<description>&lt;p&gt;The capabilities of open source systems for distributed stream processing have evolved significantly over the last years. Initially, the first systems in the field (notably &lt;a href=&quot;https://storm.apache.org&quot;&gt;Apache Storm&lt;/a&gt;) provided low latency processing, but were limited to at-least-once guarantees, processing-time semantics, and rather low-level APIs. Since then, several new systems emerged and pushed the state of the art of open source stream processing in several dimensions. Today, users of Apache Flink or &lt;a href=&quot;https://beam.incubator.apache.org&quot;&gt;Apache Beam&lt;/a&gt; can use fluent Scala and Java APIs to implement stream processing jobs that operate in event-time with exactly-once semantics at high throughput and low latency.&lt;/p&gt;

&lt;p&gt;In the meantime, stream processing has taken off in the industry. We are witnessing a rapidly growing interest in stream processing which is reflected by prevalent deployments of stream processing infrastructure such as &lt;a href=&quot;https://kafka.apache.org&quot;&gt;Apache Kafka&lt;/a&gt; and Apache Flink. The increasing number of available data streams results in a demand for people who can analyze streaming data and turn it into real-time insights. However, stream data analysis requires a special skill set including knowledge of streaming concepts such as the characteristics of unbounded streams, windows, time, and state as well as the skills to implement stream analysis jobs usually against Java or Scala APIs. People with this skill set are rare and hard to find.&lt;/p&gt;

&lt;p&gt;About six months ago, the Apache Flink community started an effort to add a SQL interface for stream data analysis. SQL is &lt;em&gt;the&lt;/em&gt; standard language to access and process data. Everybody who occasionally analyzes data is familiar with SQL. Consequently, a SQL interface for stream data processing will make this technology accessible to a much wider audience. Moreover, SQL support for streaming data will also enable new use cases such as interactive and ad-hoc stream analysis and significantly simplify many applications including stream ingestion and simple transformations. In this blog post, we report on the current status, architectural design, and future plans of the Apache Flink community to implement support for SQL as a language for analyzing data streams.&lt;/p&gt;

&lt;h2 id=&quot;where-did-we-come-from&quot;&gt;Where did we come from?&lt;/h2&gt;

&lt;p&gt;With the &lt;a href=&quot;http://flink.apache.org/news/2015/04/13/release-0.9.0-milestone1.html&quot;&gt;0.9.0-milestone1&lt;/a&gt; release, Apache Flink added an API to process relational data with SQL-like expressions called the Table API. The central concept of this API is a Table, a structured data set or stream on which relational operations can be applied. The Table API is tightly integrated with the DataSet and DataStream API. A Table can be easily created from a DataSet or DataStream and can also be converted back into a DataSet or DataStream, as the following example shows:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;execEnv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// obtain a DataSet from somewhere&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempData&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;DataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Long&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// convert the DataSet to a Table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempTable&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toTable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;tempF&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// compute your result&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;avgTempCTable&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempTable&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;like&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;room%&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3600&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;day&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; 
   &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;location&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;room&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;tempF&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.556&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;tempC&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;day&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;room&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;day&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;room&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;tempC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;avg&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;avgTempC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// convert result Table back into a DataSet and print it&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;avgTempCTable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toDataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Although the example shows Scala code, there is also an equivalent Java version of the Table API. The following picture depicts the original architecture of the Table API.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/stream-sql/old-table-api.png&quot; style=&quot;width:75%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;A Table is created from a DataSet or DataStream and transformed into a new Table by applying relational transformations such as &lt;code&gt;filter&lt;/code&gt;, &lt;code&gt;join&lt;/code&gt;, or &lt;code&gt;select&lt;/code&gt; to it. Internally, a logical table operator tree is constructed from the applied Table transformations. When a Table is translated back into a DataSet or DataStream, the respective translator translates the logical operator tree into DataSet or DataStream operators. Expressions like &lt;code&gt;&#39;location.like(&quot;room%&quot;)&lt;/code&gt; are compiled into Flink functions via code generation.&lt;/p&gt;
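
&lt;p&gt;To make the idea of a logical operator tree more concrete, here is a minimal sketch in plain Scala. It is only an illustration under simplifying assumptions: the names &lt;code&gt;LogicalNode&lt;/code&gt;, &lt;code&gt;Scan&lt;/code&gt;, &lt;code&gt;Filter&lt;/code&gt;, &lt;code&gt;Project&lt;/code&gt;, and &lt;code&gt;translate&lt;/code&gt; are hypothetical and are not Flink classes, rows are plain maps, and the “translator” evaluates the tree on an in-memory collection instead of generating DataSet or DataStream operators.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Plain-Scala model of a logical operator tree; none of these are Flink classes.
object LogicalPlanSketch {
  type Row = Map[String, Any]

  sealed trait LogicalNode
  case class Scan(rows: Seq[Row]) extends LogicalNode
  case class Filter(input: LogicalNode, predicate: Row =&amp;gt; Boolean) extends LogicalNode
  case class Project(input: LogicalNode, fields: Seq[String]) extends LogicalNode

  // The &amp;quot;translator&amp;quot;: walks the tree bottom-up and evaluates each operator.
  def translate(node: LogicalNode): Seq[Row] = node match {
    case Scan(rows) =&amp;gt; rows
    case Filter(in, predicate) =&amp;gt; translate(in).filter(predicate)
    case Project(in, fields) =&amp;gt; translate(in).map(row =&amp;gt; fields.map(f =&amp;gt; f -&amp;gt; row(f)).toMap)
  }

  def main(args: Array[String]): Unit = {
    val source = Scan(Seq(
      Map[String, Any](&amp;quot;location&amp;quot; -&amp;gt; &amp;quot;room1&amp;quot;, &amp;quot;tempF&amp;quot; -&amp;gt; 68.0),
      Map[String, Any](&amp;quot;location&amp;quot; -&amp;gt; &amp;quot;hall&amp;quot;, &amp;quot;tempF&amp;quot; -&amp;gt; 59.0)))
    // Roughly mirrors a where(...) followed by a select(...) on a Table.
    val plan = Project(
      Filter(source, row =&amp;gt; row(&amp;quot;location&amp;quot;).toString.startsWith(&amp;quot;room&amp;quot;)),
      Seq(&amp;quot;location&amp;quot;))
    translate(plan).foreach(println)
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The point of the sketch is the separation of concerns: the transformations only build the tree, and all execution logic lives in the translation step.&lt;/p&gt;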

&lt;p&gt;However, the original Table API had a few limitations. First of all, it could not stand alone. Table API queries always had to be embedded into a DataSet or DataStream program. Queries against batch Tables did not support outer joins, sorting, and many scalar functions which are commonly used in SQL queries. Queries against streaming tables only supported filters, union, and projections and no aggregations or joins. Also, the translation process did not leverage query optimization techniques except for the physical optimization that is applied to all DataSet programs.&lt;/p&gt;

&lt;h2 id=&quot;table-api-joining-forces-with-sql&quot;&gt;Table API joining forces with SQL&lt;/h2&gt;

&lt;p&gt;The discussion about adding support for SQL came up a few times in the Flink community. With Flink 0.9 and the availability of the Table API, code generation for relational expressions, and runtime operators, the foundation for such an extension seemed to be there and SQL support the next logical step. On the other hand, the community was also well aware of the multitude of dedicated “SQL-on-Hadoop” solutions in the open source landscape (&lt;a href=&quot;https://hive.apache.org&quot;&gt;Apache Hive&lt;/a&gt;, &lt;a href=&quot;https://drill.apache.org&quot;&gt;Apache Drill&lt;/a&gt;, &lt;a href=&quot;http://impala.io&quot;&gt;Apache Impala&lt;/a&gt;, &lt;a href=&quot;https://tajo.apache.org&quot;&gt;Apache Tajo&lt;/a&gt;, just to name a few). Given these alternatives, we figured that time would be better spent improving Flink in other ways than implementing yet another SQL-on-Hadoop solution.&lt;/p&gt;

&lt;p&gt;However, with the growing popularity of stream processing and the increasing adoption of Flink in this area, the Flink community saw the need for a simpler API to enable more users to analyze streaming data. About half a year ago, we decided to take the Table API to the next level, extend the stream processing capabilities of the Table API, and add support for SQL on streaming data. What we came up with was a revised architecture for a Table API that supports SQL (and Table API) queries on streaming and static data sources. We did not want to reinvent the wheel and decided to build the new Table API on top of &lt;a href=&quot;https://calcite.apache.org&quot;&gt;Apache Calcite&lt;/a&gt;, a popular SQL parser and optimizer framework. Apache Calcite is used by many projects including Apache Hive, Apache Drill, Cascading, and many &lt;a href=&quot;https://calcite.apache.org/docs/powered_by.html&quot;&gt;more&lt;/a&gt;. Moreover, the Calcite community put &lt;a href=&quot;https://calcite.apache.org/docs/stream.html&quot;&gt;SQL on streams&lt;/a&gt; on their roadmap which makes it a perfect fit for Flink’s SQL interface.&lt;/p&gt;

&lt;p&gt;Calcite is central in the new design as the following architecture sketch shows:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/stream-sql/new-table-api.png&quot; style=&quot;width:75%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The new architecture features two integrated APIs to specify relational queries, the Table API and SQL. Queries of both APIs are validated against a catalog of registered tables and converted into Calcite’s representation for logical plans. In this representation, stream and batch queries look exactly the same. Next, Calcite’s cost-based optimizer applies transformation rules and optimizes the logical plans. Depending on the nature of the sources (streaming or static) we use different rule sets. Finally, the optimized plan is translated into a regular Flink DataStream or DataSet program. This step again involves code generation to compile relational expressions into Flink functions.&lt;/p&gt;
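
&lt;p&gt;A transformation rule rewrites a logical plan into an equivalent plan. As a rough illustration in plain Scala, the following sketch merges two adjacent filters into one. The types &lt;code&gt;Plan&lt;/code&gt;, &lt;code&gt;Scan&lt;/code&gt;, and &lt;code&gt;Filter&lt;/code&gt; and the function &lt;code&gt;mergeFilters&lt;/code&gt; are hypothetical, conditions are kept as plain strings that are read as a conjunction, and neither Calcite’s actual rule API nor its cost model is shown.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Plain-Scala illustration of a rewrite rule; this is not Calcite&amp;#39;s rule API.
object FilterMergeSketch {
  sealed trait Plan
  case class Scan(table: String) extends Plan
  case class Filter(input: Plan, conditions: List[String]) extends Plan

  // Rule: Filter(Filter(x, a), b) becomes Filter(x, a ++ b).
  def mergeFilters(plan: Plan): Plan = plan match {
    case Filter(Filter(in, inner), outer) =&amp;gt; mergeFilters(Filter(in, inner ++ outer))
    case Filter(in, conditions) =&amp;gt; Filter(mergeFilters(in), conditions)
    case scan: Scan =&amp;gt; scan
  }

  def main(args: Array[String]): Unit = {
    val plan = Filter(
      Filter(Scan(&amp;quot;sensorData&amp;quot;), List(&amp;quot;location LIKE &amp;#39;room%&amp;#39;&amp;quot;)),
      List(&amp;quot;tempF &amp;gt; 50&amp;quot;))
    // Prints a plan with a single Filter that carries both conditions.
    println(mergeFilters(plan))
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;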

&lt;p&gt;The new architecture of the Table API maintains the basic principles of the original Table API and improves it. It keeps a uniform interface for relational queries on streaming and static data. In addition, we take advantage of Calcite’s query optimization framework and SQL parser. The design builds upon Flink’s established APIs, i.e., the DataStream API that offers low-latency, high-throughput stream processing with exactly-once semantics and consistent results due to event-time processing, and the DataSet API with robust and efficient in-memory operators and pipelined data exchange. Any improvements to Flink’s core APIs and engine will automatically improve the execution of Table API and SQL queries.&lt;/p&gt;

&lt;p&gt;With this effort, we are adding SQL support for both streaming and static data to Flink. However, we do not want to see this as a competing solution to dedicated, high-performance SQL-on-Hadoop solutions, such as Impala, Drill, and Hive. Instead, we see the sweet spot of Flink’s SQL integration primarily in providing access to streaming analytics to a wider audience. In addition, it will facilitate integrated applications that use Flink’s APIs as well as SQL while being executed on a single runtime engine.&lt;/p&gt;

&lt;h2 id=&quot;how-will-flinks-sql-on-streams-look-like&quot;&gt;What will Flink’s SQL on streams look like?&lt;/h2&gt;

&lt;p&gt;So far we discussed the motivation for and architecture of Flink’s stream SQL interface, but what will it actually look like? The new SQL interface is integrated into the Table API. DataStreams, DataSets, and external data sources can be registered as tables at the &lt;code&gt;TableEnvironment&lt;/code&gt; in order to make them queryable with SQL. The &lt;code&gt;TableEnvironment.sql()&lt;/code&gt; method takes a SQL query and returns its result as a Table. The following example shows a complete program that reads a streaming table from a JSON encoded Kafka topic, processes it with a SQL query and writes the resulting stream into another Kafka topic. Please note that the KafkaJsonSource and KafkaJsonSink are under development and not available yet. In the future, TableSources and TableSinks can be persisted to and loaded from files to ease reuse of source and sink definitions and to reduce boilerplate code.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// get environments&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;execEnv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// configure Kafka connection&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kafkaProps&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// define a JSON encoded Kafka topic as external table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sensorSource&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;KafkaJsonSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Long&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)](&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&amp;quot;sensorTopic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;kafkaProps&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;location&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;time&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;tempF&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// register external table&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;registerTableSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;sensorData&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sensorSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// define query on external table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;roomSensors&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&amp;quot;SELECT STREAM time, location AS room, (tempF - 32) * 0.556 AS tempC &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&amp;quot;FROM sensorData &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
    &lt;span class=&quot;s&quot;&gt;&amp;quot;WHERE location LIKE &amp;#39;room%&amp;#39;&amp;quot;&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// define a JSON encoded Kafka topic as external sink&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;roomSensorSink&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;KafkaJsonSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(...)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// define sink for room sensor data and execute query&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;roomSensors&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;roomSensorSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;execEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;You might have noticed that this example left out the most interesting aspects of stream data processing: window aggregates and joins. How will these operations be expressed in SQL? Well, that is a very good question. The Apache Calcite community put out an excellent proposal that discusses the syntax and semantics of &lt;a href=&quot;https://calcite.apache.org/docs/stream.html&quot;&gt;SQL on streams&lt;/a&gt;. It describes Calcite’s stream SQL as &lt;em&gt;“an extension to standard SQL, not another ‘SQL-like’ language”&lt;/em&gt;. This has several benefits. First, people who are familiar with standard SQL will be able to analyze data streams without learning a new syntax. Queries on static tables and streams are (almost) identical and can be easily ported. Moreover, it is possible to specify queries that reference static and streaming tables at the same time, which goes well together with Flink’s vision to handle batch processing as a special case of stream processing, i.e., as processing finite streams. Finally, using standard SQL for stream data analysis means following a well-established standard that is supported by many tools.&lt;/p&gt;

&lt;p&gt;Although we haven’t completely fleshed out the details of how windows will be defined in Flink’s SQL syntax and Table API, the following examples show what a tumbling window query could look like in SQL and the Table API.&lt;/p&gt;

&lt;h3 id=&quot;sql-following-the-syntax-proposal-of-calcites-streaming-sql-document&quot;&gt;SQL (following the syntax proposal of Calcite’s streaming SQL document)&lt;/h3&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STREAM&lt;/span&gt; 
  &lt;span class=&quot;n&quot;&gt;TUMBLE_END&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;day&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
  &lt;span class=&quot;k&quot;&gt;location&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;room&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; 
  &lt;span class=&quot;k&quot;&gt;AVG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tempF&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;556&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;avgTempC&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sensorData&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;location&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LIKE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;room%&amp;#39;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TUMBLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;location&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3 id=&quot;table-api&quot;&gt;Table API&lt;/h3&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;avgRoomTemp&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ingest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;sensorData&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;like&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;room%&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partitionBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tumbling&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;every&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Days&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;time&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;tempF&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.556&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;avg&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;avgTempCs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2 id=&quot;whats-up-next&quot;&gt;What’s up next?&lt;/h2&gt;

&lt;p&gt;The Flink community is actively working on SQL support for the next minor version Flink 1.1.0. In the first version, SQL (and Table API) queries on streams will be limited to selection, filter, and union operators. Compared to Flink 1.0.0, the revised Table API will support many more scalar functions and be able to read tables from external sources and write them back to external sinks. A lot of work went into reworking the architecture of the Table API and integrating Apache Calcite.&lt;/p&gt;

&lt;p&gt;In Flink 1.2.0, the feature set of SQL on streams will be significantly extended. Among other things, we plan to support different types of window aggregates and maybe also streaming joins. For this effort, we want to collaborate closely with the Apache Calcite community and help extend Calcite’s support for relational operations on streaming data when necessary.&lt;/p&gt;

&lt;p&gt;If this post made you curious and you want to try out Flink’s SQL interface and the new Table API, we encourage you to do so! Simply clone the SNAPSHOT &lt;a href=&quot;https://github.com/apache/flink/tree/master&quot;&gt;master branch&lt;/a&gt; and check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/table.html&quot;&gt;Table API documentation for the SNAPSHOT version&lt;/a&gt;. Please note that the branch is under heavy development, and hence some code examples in this blog post might not work. We are looking forward to your feedback and welcome contributions.&lt;/p&gt;
</description>
<pubDate>Tue, 24 May 2016 12:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/05/24/stream-sql.html</link>
<guid isPermaLink="true">/news/2016/05/24/stream-sql.html</guid>
</item>

<item>
<title>Flink 1.0.3 Released</title>
<description>&lt;p&gt;Today, the Flink community released Flink version &lt;strong&gt;1.0.3&lt;/strong&gt;, the third bugfix release of the 1.0 series.&lt;/p&gt;

&lt;p&gt;We &lt;strong&gt;recommend that all users update to this release&lt;/strong&gt; by bumping the version of your Flink dependencies to &lt;code&gt;1.0.3&lt;/code&gt; and updating the binaries on the server. You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;fixed-issues&quot;&gt;Fixed Issues&lt;/h2&gt;

&lt;h3 id=&quot;bug&quot;&gt;Bug&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3790&quot;&gt;FLINK-3790&lt;/a&gt;] [streaming] Use proper hadoop config in rolling sink&lt;/li&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3840&quot;&gt;FLINK-3840&lt;/a&gt;] Remove Testing Files in RocksDB Backend&lt;/li&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3835&quot;&gt;FLINK-3835&lt;/a&gt;] [optimizer] Add input id to JSON plan to resolve ambiguous input names&lt;/li&gt;
  &lt;li&gt;[hotfix] OptionSerializer.duplicate to respect stateful element serializer&lt;/li&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3803&quot;&gt;FLINK-3803&lt;/a&gt;] [runtime] Pass CheckpointStatsTracker to ExecutionGraph&lt;/li&gt;
  &lt;li&gt;[hotfix] [cep] Make cep window border treatment consistent&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;improvement&quot;&gt;Improvement&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3678&quot;&gt;FLINK-3678&lt;/a&gt;] [dist, docs] Make Flink logs directory configurable&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;docs&quot;&gt;Docs&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;[docs] Add note about S3AFileSystem ‘buffer.dir’ property&lt;/li&gt;
  &lt;li&gt;[docs] Update AWS S3 docs&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;tests&quot;&gt;Tests&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3860&quot;&gt;FLINK-3860&lt;/a&gt;] [connector-wikiedits] Add retry loop to WikipediaEditsSourceTest&lt;/li&gt;
  &lt;li&gt;[streaming-contrib] Fix port clash in DbStateBackend tests&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 11 May 2016 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/05/11/release-1.0.3.html</link>
<guid isPermaLink="true">/news/2016/05/11/release-1.0.3.html</guid>
</item>

<item>
<title>Flink 1.0.2 Released</title>
<description>&lt;p&gt;Today, the Flink community released Flink version &lt;strong&gt;1.0.2&lt;/strong&gt;, the second bugfix release of the 1.0 series.&lt;/p&gt;

&lt;p&gt;We &lt;strong&gt;recommend that all users update to this release&lt;/strong&gt; by bumping the version of your Flink dependencies to &lt;code&gt;1.0.2&lt;/code&gt; and updating the binaries on the server. You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;fixed-issues&quot;&gt;Fixed Issues&lt;/h2&gt;

&lt;h3 id=&quot;bug&quot;&gt;Bug&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3657&quot;&gt;FLINK-3657&lt;/a&gt;] [dataSet] Change access of DataSetUtils.countElements() to ‘public’&lt;/li&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3762&quot;&gt;FLINK-3762&lt;/a&gt;] [core] Enable Kryo reference tracking&lt;/li&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3732&quot;&gt;FLINK-3732&lt;/a&gt;] [core] Fix potential null dereference in ExecutionConfig#equals()&lt;/li&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3760&quot;&gt;FLINK-3760&lt;/a&gt;] Fix StateDescriptor.readObject&lt;/li&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3730&quot;&gt;FLINK-3730&lt;/a&gt;] Fix RocksDB Local Directory Initialization&lt;/li&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3712&quot;&gt;FLINK-3712&lt;/a&gt;] Make all dynamic properties available to the CLI frontend&lt;/li&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3688&quot;&gt;FLINK-3688&lt;/a&gt;] WindowOperator.trigger() does not emit Watermark anymore&lt;/li&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3697&quot;&gt;FLINK-3697&lt;/a&gt;] Properly access type information for nested POJO key selection&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;improvement&quot;&gt;Improvement&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3654&quot;&gt;FLINK-3654&lt;/a&gt;] Disable Write-Ahead-Log in RocksDB State&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;docs&quot;&gt;Docs&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2544&quot;&gt;FLINK-2544&lt;/a&gt;] [docs] Add Java 8 version for building PowerMock tests to docs&lt;/li&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3469&quot;&gt;FLINK-3469&lt;/a&gt;] [docs] Improve documentation for grouping keys&lt;/li&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3634&quot;&gt;FLINK-3634&lt;/a&gt;] [docs] Fix documentation for DataSetUtils.zipWithUniqueId()&lt;/li&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3711&quot;&gt;FLINK-3711&lt;/a&gt;] [docs] Documentation of Scala fold()() uses correct syntax&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;tests&quot;&gt;Tests&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3716&quot;&gt;FLINK-3716&lt;/a&gt;] [kafka consumer] Decreasing socket timeout so testFailOnNoBroker() will pass before JUnit timeout&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Fri, 22 Apr 2016 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/04/22/release-1.0.2.html</link>
<guid isPermaLink="true">/news/2016/04/22/release-1.0.2.html</guid>
</item>

<item>
<title>Flink Forward 2016 Call for Submissions Is Now Open</title>
<description>&lt;p&gt;We are happy to announce that the call for submissions for Flink Forward 2016 is now open! The conference will take place September 12-14, 2016 in Berlin, Germany, bringing together the open source stream processing community. Most Apache Flink committers will attend the conference, making it the ideal venue to learn more about the project and its roadmap and connect with the community.&lt;/p&gt;

&lt;p&gt;The conference welcomes submissions on everything Flink-related, including experiences with using Flink, products based on Flink, technical talks on extending Flink, as well as connecting Flink with other open source or proprietary software.&lt;/p&gt;

&lt;p&gt;Read more &lt;a href=&quot;http://flink-forward.org/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
</description>
<pubDate>Thu, 14 Apr 2016 12:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/04/14/flink-forward-announce.html</link>
<guid isPermaLink="true">/news/2016/04/14/flink-forward-announce.html</guid>
</item>

<item>
<title>Introducing Complex Event Processing (CEP) with Apache Flink</title>
<description>&lt;p&gt;With the ubiquity of sensor networks and smart devices continuously collecting more and more data, we face the challenge of analyzing an ever-growing stream of data in near real time. 
Being able to react quickly to changing trends or to deliver up-to-date business intelligence can be a decisive factor for a company’s success or failure. 
A key problem in real-time processing is the detection of event patterns in data streams.&lt;/p&gt;

&lt;p&gt;Complex event processing (CEP) addresses exactly this problem of matching continuously incoming events against a pattern. 
The result of a match is usually one or more complex events derived from the input events. 
In contrast to traditional DBMSs, where a query is executed on stored data, CEP executes data on a stored query. 
All data that is not relevant to the query can be discarded immediately. 
The advantages of this approach are obvious, given that CEP queries are applied to a potentially infinite stream of data. 
Furthermore, inputs are processed immediately. 
Once the system has seen all events for a matching sequence, results are emitted straight away. 
This is what gives CEP its real-time analytics capability.&lt;/p&gt;

&lt;p&gt;Consequently, CEP’s processing paradigm drew significant interest and found application in a wide variety of use cases. 
Most notably, CEP is used nowadays for financial applications such as stock market trend and credit card fraud detection. 
Moreover, it is used in RFID-based tracking and monitoring, for example, to detect thefts in a warehouse where items are not properly checked out. 
CEP can also be used to detect network intrusion by specifying patterns of suspicious user behaviour.&lt;/p&gt;

&lt;p&gt;Apache Flink, with its true streaming nature and its capabilities for low-latency as well as high-throughput stream processing, is a natural fit for CEP workloads. 
Consequently, the Flink community has introduced the first version of a new &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/libs/cep.html&quot;&gt;CEP library&lt;/a&gt; with &lt;a href=&quot;http://flink.apache.org/news/2016/03/08/release-1.0.0.html&quot;&gt;Flink 1.0&lt;/a&gt;. 
In the remainder of this blog post, we introduce Flink’s CEP library and we illustrate its ease of use through the example of monitoring a data center.&lt;/p&gt;

&lt;h2 id=&quot;monitoring-and-alert-generation-for-data-centers&quot;&gt;Monitoring and alert generation for data centers&lt;/h2&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/cep-monitoring.svg&quot; style=&quot;width:600px;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;Assume we have a data center with a number of racks. 
For each rack the power consumption and the temperature are monitored. 
Whenever such a measurement takes place, a new power or temperature event is generated, respectively. 
Based on this monitoring event stream, we want to detect racks that are about to overheat, and dynamically adapt their workload and cooling.&lt;/p&gt;

&lt;p&gt;For this scenario we use a two-stage approach. 
First, we monitor the temperature events. 
Whenever we see two consecutive events whose temperature exceeds a threshold value, we generate a temperature warning with the current average temperature. 
A temperature warning does not necessarily indicate that a rack is about to overheat. 
However, whenever we see two consecutive warnings with increasing temperatures, we issue an alert for this rack. 
This alert can then lead to countermeasures to cool the rack.&lt;/p&gt;
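
&lt;p&gt;Before turning to the Flink implementation, the two stages can be sketched in plain Java without any Flink dependency. This is only an illustration under simplifying assumptions: the class &lt;code&gt;TwoStageSketch&lt;/code&gt;, its methods, and the threshold of 100 are hypothetical names and values, all readings are assumed to come from a single rack, and time windows are ignored.&lt;/p&gt;

```java
// Plain-Java sketch of the two-stage detection logic (hypothetical, not Flink code).
final class TwoStageSketch {
    // Hypothetical threshold, chosen only for this example.
    static final double THRESHOLD = 100.0;

    // Stage 1: emit one warning (the average temperature) for every pair of
    // consecutive readings that both reach the threshold.
    static double[] warnings(double[] temps) {
        double[] out = new double[temps.length];
        int n = 0;
        for (int i = 1; temps.length > i; i++) {
            boolean firstHot = temps[i - 1] >= THRESHOLD;
            boolean secondHot = temps[i] >= THRESHOLD;
            if (firstHot) {
                if (secondHot) {
                    out[n++] = (temps[i - 1] + temps[i]) / 2;
                }
            }
        }
        return java.util.Arrays.copyOf(out, n);
    }

    // Stage 2: count one alert for every pair of consecutive warnings
    // whose average temperature is increasing.
    static int alerts(double[] warnings) {
        int count = 0;
        for (int i = 1; warnings.length > i; i++) {
            if (warnings[i] > warnings[i - 1]) {
                count++;
            }
        }
        return count;
    }
}
```

&lt;p&gt;For the readings 90, 101, 103, 107, 95, 102, this sketch produces two warnings with average temperatures 102 and 105, and therefore one alert. The CEP library expresses the same logic declaratively and additionally takes care of keying by rack and of time windows.&lt;/p&gt;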

&lt;h3 id=&quot;implementation-with-apache-flink&quot;&gt;Implementation with Apache Flink&lt;/h3&gt;

&lt;p&gt;First, we define the messages of the incoming monitoring event stream. 
Every monitoring message contains its originating rack ID. 
The temperature event additionally contains the current temperature, and the power consumption event contains the current voltage. 
We model the events as POJOs:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;abstract&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MonitoringEvent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TemperatureEvent&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;temperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PowerEvent&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;voltage&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now we can ingest the monitoring event stream using one of Flink’s connectors (e.g. Kafka, RabbitMQ, etc.). 
This will give us a &lt;code&gt;DataStream&amp;lt;MonitoringEvent&amp;gt; inputEventStream&lt;/code&gt; which we will use as the input for Flink’s CEP operator. 
But first, we have to define the event pattern to detect temperature warnings. 
The CEP library offers an intuitive &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/libs/cep.html#the-pattern-api&quot;&gt;Pattern API&lt;/a&gt; to easily define these complex patterns.&lt;/p&gt;

&lt;p&gt;Every pattern consists of a sequence of events which can have optional filter conditions assigned. 
A pattern always starts with a first event to which we will assign the name &lt;code&gt;&quot;First Event&quot;&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This pattern will match every monitoring event. 
Since we are only interested in &lt;code&gt;TemperatureEvents&lt;/code&gt; whose temperature is above a threshold value, we have to add an additional subtype constraint and a where clause:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TEMPERATURE_THRESHOLD&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;As stated before, we want to generate a &lt;code&gt;TemperatureWarning&lt;/code&gt; if and only if we see two consecutive &lt;code&gt;TemperatureEvents&lt;/code&gt; for the same rack whose temperatures are too high. 
The Pattern API offers the &lt;code&gt;next&lt;/code&gt; call which allows us to add a new event to our pattern. 
This event has to directly follow the first matching event in order for the whole pattern to match.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;warningPattern&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TEMPERATURE_THRESHOLD&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Second Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TEMPERATURE_THRESHOLD&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;within&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The final pattern definition also contains the &lt;code&gt;within&lt;/code&gt; API call which defines that two consecutive &lt;code&gt;TemperatureEvents&lt;/code&gt; have to occur within a time interval of 10 seconds for the pattern to match. 
Depending on the time characteristic setting, this can either be processing, ingestion or event time.&lt;/p&gt;
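
&lt;p&gt;The effect of the time constraint can be illustrated with a small plain-Java check. The helper &lt;code&gt;WithinSketch.within&lt;/code&gt; is hypothetical and not part of Flink’s API. It assumes timestamps in milliseconds, and the strict comparison at the window border is an assumption of this sketch rather than a statement about how the CEP library treats that case.&lt;/p&gt;

```java
// Hypothetical helper illustrating a time constraint between two events.
final class WithinSketch {
    // Returns true when the second event occurs no earlier than the first one
    // and the gap between them is smaller than the window (all in milliseconds).
    static boolean within(long firstTs, long secondTs, long windowMs) {
        long gap = secondTs - firstTs;
        if (0 > gap) {
            return false; // second event is older than the first one
        }
        return windowMs > gap;
    }
}
```

&lt;p&gt;With a window of 10 seconds, two events that are 9 seconds apart satisfy the constraint, while two events that are 15 seconds apart do not.&lt;/p&gt;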

&lt;p&gt;Having defined the event pattern, we can now apply it to the &lt;code&gt;inputEventStream&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;PatternStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempPatternStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CEP&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;inputEventStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;rackID&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;warningPattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Since we want to generate our warnings for each rack individually, we &lt;code&gt;keyBy&lt;/code&gt; the input event stream by the &lt;code&gt;&quot;rackID&quot;&lt;/code&gt; POJO field. 
This ensures that all events matching our pattern have the same rack ID.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;PatternStream&amp;lt;MonitoringEvent&amp;gt;&lt;/code&gt; gives us access to successfully matched event sequences. 
They can be accessed using the &lt;code&gt;select&lt;/code&gt; API call. 
The &lt;code&gt;select&lt;/code&gt; API call takes a &lt;code&gt;PatternSelectFunction&lt;/code&gt; which is called for every matching event sequence. 
The event sequence is provided as a &lt;code&gt;Map&amp;lt;String, MonitoringEvent&amp;gt;&lt;/code&gt; where each &lt;code&gt;MonitoringEvent&lt;/code&gt; is identified by its assigned event name. 
Our pattern select function generates a &lt;code&gt;TemperatureWarning&lt;/code&gt; event for each matching pattern.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TemperatureWarning&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;averageTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;warnings&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempPatternStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;first&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Second Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; 
            &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Now we have generated a new complex event stream &lt;code&gt;DataStream&amp;lt;TemperatureWarning&amp;gt; warnings&lt;/code&gt; from the initial monitoring event stream. 
This complex event stream can again be used as the input for another round of complex event processing. 
We use the &lt;code&gt;TemperatureWarnings&lt;/code&gt; to generate &lt;code&gt;TemperatureAlerts&lt;/code&gt; whenever we see two consecutive &lt;code&gt;TemperatureWarnings&lt;/code&gt; for the same rack with increasing temperatures. 
The &lt;code&gt;TemperatureAlerts&lt;/code&gt; have the following definition:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TemperatureAlert&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;First, we have to define our alert event pattern:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alertPattern&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Second Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;within&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This definition says that we want to see two &lt;code&gt;TemperatureWarnings&lt;/code&gt; within 20 seconds. 
The first event has the name &lt;code&gt;&quot;First Event&quot;&lt;/code&gt; and the second consecutive event has the name &lt;code&gt;&quot;Second Event&quot;&lt;/code&gt;. 
The individual events don’t have a where clause assigned, because we need access to both events in order to decide whether the temperature is increasing. 
Therefore, we apply the filter condition in the select clause. 
But first, we again obtain a &lt;code&gt;PatternStream&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;PatternStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alertPatternStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CEP&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;warnings&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;rackID&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;alertPattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Again, we &lt;code&gt;keyBy&lt;/code&gt; the warnings input stream by the &lt;code&gt;&quot;rackID&quot;&lt;/code&gt; so that we generate our alerts for each rack individually. 
Next, we apply the &lt;code&gt;flatSelect&lt;/code&gt; method, which gives us access to matching event sequences and allows us to output an arbitrary number of complex events. 
Thus, we generate a &lt;code&gt;TemperatureAlert&lt;/code&gt; only if the temperature is increasing.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureAlert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alerts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alertPatternStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatSelect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureAlert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;first&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Second Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getAverageTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getAverageTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;TemperatureAlert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()));&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;});&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;DataStream&amp;lt;TemperatureAlert&amp;gt; alerts&lt;/code&gt; is the data stream of temperature alerts for each rack. 
Based on these alerts we can now adapt the workload or cooling for overheating racks.&lt;/p&gt;
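
&lt;p&gt;To make the alert condition easier to reason about outside of a running job, the following plain-Java sketch mimics the logic above without any Flink dependency. It is only an approximation under stated assumptions: &lt;code&gt;AlertSketch&lt;/code&gt;, &lt;code&gt;Warning&lt;/code&gt;, &lt;code&gt;alertedRacks&lt;/code&gt; and the &lt;code&gt;timestampMillis&lt;/code&gt; field are hypothetical names, a map stands in for &lt;code&gt;keyBy&lt;/code&gt;, and the 20 second bound is treated as inclusive, which may differ from how the CEP library evaluates &lt;code&gt;within&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AlertSketch {

    // Hypothetical stand-in for TemperatureWarning; the timestamp field is an assumption.
    static final class Warning {
        final int rackID;
        final double averageTemperature;
        final long timestampMillis;

        Warning(int rackID, double averageTemperature, long timestampMillis) {
            this.rackID = rackID;
            this.averageTemperature = averageTemperature;
            this.timestampMillis = timestampMillis;
        }
    }

    // Returns one rack ID for every pair of consecutive warnings of the same rack
    // that lies within the time bound and shows a rising average temperature.
    static List&amp;lt;Integer&amp;gt; alertedRacks(List&amp;lt;Warning&amp;gt; warnings, long withinMillis) {
        // Remembers the latest warning per rack; plays the role of keyBy(&amp;quot;rackID&amp;quot;).
        Map&amp;lt;Integer, Warning&amp;gt; lastWarningByRack = new HashMap&amp;lt;&amp;gt;();
        List&amp;lt;Integer&amp;gt; alerts = new ArrayList&amp;lt;&amp;gt;();
        for (Warning second : warnings) {
            Warning first = lastWarningByRack.put(second.rackID, second);
            boolean inTime = first != null
                &amp;amp;&amp;amp; second.timestampMillis - first.timestampMillis &amp;lt;= withinMillis;
            if (inTime &amp;amp;&amp;amp; first.averageTemperature &amp;lt; second.averageTemperature) {
                alerts.add(second.rackID);
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        List&amp;lt;Warning&amp;gt; warnings = new ArrayList&amp;lt;&amp;gt;();
        warnings.add(new Warning(1, 101.0, 0L));
        warnings.add(new Warning(2, 105.0, 1_000L));
        warnings.add(new Warning(1, 103.5, 5_000L));   // rack 1: rising within 20 s
        warnings.add(new Warning(2, 104.0, 6_000L));   // rack 2: falling
        warnings.add(new Warning(1, 110.0, 60_000L));  // rack 1: rising, but too late
        System.out.println(alertedRacks(warnings, 20_000L)); // prints [1]
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Only the second warning for rack 1 produces an alert in this sketch: rack 2 cools down, and the third warning for rack 1 arrives after the time bound has passed.&lt;/p&gt;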

&lt;p&gt;The full source code for the presented example, as well as an example data source which randomly generates monitoring events, can be found in &lt;a href=&quot;https://github.com/tillrohrmann/cep-monitoring&quot;&gt;this repository&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In this blog post we have seen how easy it is to reason about event streams using Flink’s CEP library. 
Using the example of monitoring and alert generation for a data center, we have implemented a short program which notifies us when a rack is about to overheat and potentially fail.&lt;/p&gt;

&lt;p&gt;In the future, the Flink community will further extend the CEP library’s functionality and expressiveness. 
Next on the road map is support for a regular expression-like pattern specification, including Kleene star, lower and upper bounds, and negation. 
Furthermore, it is planned to allow the where-clause to access fields of previously matched events. 
This feature will make it possible to prune unpromising event sequences early.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;em&gt;Note:&lt;/em&gt; The example code requires Flink 1.0.1 or higher.&lt;/p&gt;

</description>
<pubDate>Wed, 06 Apr 2016 12:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/04/06/cep-monitoring.html</link>
<guid isPermaLink="true">/news/2016/04/06/cep-monitoring.html</guid>
</item>

<item>
<title>Flink 1.0.1 Released</title>
<description>&lt;p&gt;Today, the Flink community released Flink version &lt;strong&gt;1.0.1&lt;/strong&gt;, the first bugfix release of the 1.0 series.&lt;/p&gt;

&lt;p&gt;We &lt;strong&gt;recommend that all users update to this release&lt;/strong&gt; by bumping the version of your Flink dependencies to &lt;code&gt;1.0.1&lt;/code&gt; and updating the binaries on the server. You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;fixed-issues&quot;&gt;Fixed Issues&lt;/h2&gt;

&lt;h3&gt;Bug&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3179&quot;&gt;FLINK-3179&lt;/a&gt;] -         Combiner is not injected if Reduce or GroupReduce input is explicitly partitioned
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3472&quot;&gt;FLINK-3472&lt;/a&gt;] -         JDBCInputFormat.nextRecord(..) has misleading message on NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3491&quot;&gt;FLINK-3491&lt;/a&gt;] -         HDFSCopyUtilitiesTest fails on Windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3495&quot;&gt;FLINK-3495&lt;/a&gt;] -         RocksDB Tests can&amp;#39;t run on Windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3533&quot;&gt;FLINK-3533&lt;/a&gt;] -         Update the Gelly docs wrt examples and cluster execution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3563&quot;&gt;FLINK-3563&lt;/a&gt;] -         .returns() doesn&amp;#39;t compile when using .map() with a custom MapFunction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3566&quot;&gt;FLINK-3566&lt;/a&gt;] -         Input type validation often fails on custom TypeInfo implementations
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3578&quot;&gt;FLINK-3578&lt;/a&gt;] -         Scala DataStream API does not support Rich Window Functions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3595&quot;&gt;FLINK-3595&lt;/a&gt;] -         Kafka09 consumer thread does not interrupt when stuck in record emission
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3602&quot;&gt;FLINK-3602&lt;/a&gt;] -         Recursive Types are not supported / crash TypeExtractor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3621&quot;&gt;FLINK-3621&lt;/a&gt;] -         Misleading documentation of memory configuration parameters
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3629&quot;&gt;FLINK-3629&lt;/a&gt;] -         In wikiedits Quick Start example, &amp;quot;The first call, .window()&amp;quot; should be &amp;quot;The first call, .timeWindow()&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3651&quot;&gt;FLINK-3651&lt;/a&gt;] -         Fix faulty RollingSink Restore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3653&quot;&gt;FLINK-3653&lt;/a&gt;] -         recovery.zookeeper.storageDir is not documented on the configuration page
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3663&quot;&gt;FLINK-3663&lt;/a&gt;] -         FlinkKafkaConsumerBase.logPartitionInfo is missing a log marker
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3681&quot;&gt;FLINK-3681&lt;/a&gt;] -         CEP library does not support Java 8 lambdas as select function
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3682&quot;&gt;FLINK-3682&lt;/a&gt;] -         CEP operator does not set the processing timestamp correctly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3684&quot;&gt;FLINK-3684&lt;/a&gt;] -         CEP operator does not forward watermarks properly
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Improvement&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3570&quot;&gt;FLINK-3570&lt;/a&gt;] -         Replace random NIC selection heuristic by InetAddress.getLocalHost
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3575&quot;&gt;FLINK-3575&lt;/a&gt;] -         Update Working With State Section in Doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3591&quot;&gt;FLINK-3591&lt;/a&gt;] -         Replace Quickstart K-Means Example by Streaming Example
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2444&quot;&gt;FLINK-2444&lt;/a&gt;] -         Add tests for HadoopInputFormats
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2445&quot;&gt;FLINK-2445&lt;/a&gt;] -         Add tests for HadoopOutputFormats
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 06 Apr 2016 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/04/06/release-1.0.1.html</link>
<guid isPermaLink="true">/news/2016/04/06/release-1.0.1.html</guid>
</item>

<item>
<title>Announcing Apache Flink 1.0.0</title>
<description>&lt;p&gt;The Apache Flink community is pleased to announce the availability of the 1.0.0 release. The community put significant effort into improving and extending Apache Flink since the last release, focusing on improving the experience of writing and executing data stream processing pipelines in production.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/flink-1.0.png&quot; style=&quot;height:200px;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;Flink version 1.0.0 marks the beginning of the 1.x series of releases, which will maintain backwards compatibility with 1.0.0. This means that applications written against stable APIs of Flink 1.0.0 will compile and run with all Flink versions in the 1.x series. This is the first time in Flink’s history that we are formally guaranteeing compatibility, and we therefore see this release as a major milestone of the project, perhaps the most important since graduation as a top-level project.&lt;/p&gt;

&lt;p&gt;Apart from backwards compatibility, Flink 1.0.0 brings a variety of new user-facing features, as well as tons of bug fixes. About 64 contributors provided bug fixes, improvements, and new features such that in total more than 450 JIRA issues could be resolved.&lt;/p&gt;

&lt;p&gt;We encourage everyone to &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;download the release&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.0/&quot;&gt;check out the documentation&lt;/a&gt;. Feedback through the Flink &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; is, as always, very welcome!&lt;/p&gt;

&lt;h2 id=&quot;interface-stability-annotations&quot;&gt;Interface stability annotations&lt;/h2&gt;

&lt;p&gt;Flink 1.0.0 introduces interface stability annotations for API classes and methods. Interfaces defined as &lt;code&gt;@Public&lt;/code&gt; are guaranteed to remain stable across all releases of the 1.x series. The &lt;code&gt;@PublicEvolving&lt;/code&gt; annotation marks API features that may be subject to change in future versions.&lt;/p&gt;

&lt;p&gt;Flink’s stability annotations will help users to implement applications that compile and execute unchanged against future versions of Flink 1.x. This greatly reduces the complexity for users when upgrading to a newer Flink release.&lt;/p&gt;
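
&lt;p&gt;As a rough illustration of how such markers can be applied and inspected, the sketch below declares local stand-ins named &lt;code&gt;Public&lt;/code&gt; and &lt;code&gt;PublicEvolving&lt;/code&gt; instead of importing Flink’s own annotations. &lt;code&gt;StabilitySketch&lt;/code&gt;, &lt;code&gt;StableApi&lt;/code&gt; and &lt;code&gt;isEvolving&lt;/code&gt; are hypothetical names, and the runtime retention is an assumption made so that reflection works here; it is not a statement about how Flink declares its annotations.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class StabilitySketch {

    // Local stand-in for a marker on interfaces that stay stable across a release series.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface Public {}

    // Local stand-in for a marker on API features that may still change.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface PublicEvolving {}

    // A hypothetical API class: stable overall, with one method still evolving.
    @Public
    static class StableApi {
        public int stableCall() { return 1; }

        @PublicEvolving
        public int newerCall() { return 2; }
    }

    // Reports whether the named public, no-argument method carries the evolving marker.
    static boolean isEvolving(Class&amp;lt;?&amp;gt; type, String methodName) throws NoSuchMethodException {
        return type.getMethod(methodName).isAnnotationPresent(PublicEvolving.class);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(StableApi.class.isAnnotationPresent(Public.class)); // prints true
        System.out.println(isEvolving(StableApi.class, &amp;quot;newerCall&amp;quot;));   // prints true
        System.out.println(isEvolving(StableApi.class, &amp;quot;stableCall&amp;quot;));  // prints false
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;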

&lt;h2 id=&quot;out-of-core-state-support&quot;&gt;Out-of-core state support&lt;/h2&gt;

&lt;p&gt;Flink 1.0.0 adds a new state backend that uses RocksDB to store state (both windows and user-defined key-value state). &lt;a href=&quot;http://rocksdb.org/&quot;&gt;RocksDB&lt;/a&gt; is an embedded key/value store database, originally developed by Facebook.
When using this backend, active state in streaming programs can grow well beyond memory. The RocksDB files are stored in a distributed file system such as HDFS or S3 for backups.&lt;/p&gt;

&lt;h2 id=&quot;savepoints-and-version-upgrades&quot;&gt;Savepoints and version upgrades&lt;/h2&gt;

&lt;p&gt;Savepoints are checkpoints of the state of a running streaming job that can be manually triggered by the user while the job is running. Savepoints solve several production headaches, including code upgrades (both application and framework), cluster maintenance and migration, A/B testing and what-if scenarios, as well as testing and debugging. Read more about savepoints at the &lt;a href=&quot;http://data-artisans.com/how-apache-flink-enables-new-streaming-applications/&quot;&gt;data Artisans blog&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;library-for-complex-event-processing-cep&quot;&gt;Library for Complex Event Processing (CEP)&lt;/h2&gt;

&lt;p&gt;Complex Event Processing is one of the oldest and most important use cases of stream processing. The new CEP functionality in Flink allows you to use a distributed general-purpose stream processor instead of a specialized CEP system to detect complex patterns in event streams. Get started with &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/libs/cep.html&quot;&gt;CEP on Flink&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;enhanced-monitoring-interface-job-submission-checkpoint-statistics-and-backpressure-monitoring&quot;&gt;Enhanced monitoring interface: job submission, checkpoint statistics and backpressure monitoring&lt;/h2&gt;

&lt;p&gt;The web interface now allows users to submit jobs. Previous Flink releases had a separate service for submitting jobs. The new interface is part of the JobManager frontend. It also works on YARN now.&lt;/p&gt;

&lt;p&gt;Backpressure monitoring allows users to trigger a sampling mechanism which analyzes the time operators spend waiting for new network buffers. When senders spend most of their time waiting for new network buffers, they are experiencing backpressure from their downstream operators. Many users requested this feature for understanding bottlenecks in both batch and streaming applications.&lt;/p&gt;

&lt;h2 id=&quot;improved-checkpointing-control-and-monitoring&quot;&gt;Improved checkpointing control and monitoring&lt;/h2&gt;

&lt;p&gt;Checkpointing has been extended with a more fine-grained control mechanism. In previous versions, new checkpoints were triggered independently of the speed at which old checkpoints completed. This could lead to situations where new checkpoints pile up because they are triggered too frequently.&lt;/p&gt;

&lt;p&gt;The checkpoint coordinator now exposes statistics through our REST monitoring API and the web interface. Users can review the checkpoint size and duration on a per-operator basis and see the last completed checkpoints. This is helpful for identifying performance issues, such as processing slowdowns caused by checkpoints.&lt;/p&gt;

&lt;h2 id=&quot;improved-kafka-connector-and-support-for-kafka-09&quot;&gt;Improved Kafka connector and support for Kafka 0.9&lt;/h2&gt;

&lt;p&gt;Flink 1.0 supports both Kafka 0.8 and 0.9. With the new release, Flink exposes Kafka metrics for the producers and the 0.9 consumer through Flink’s accumulator system. We also enhanced the existing connector for Kafka 0.8, allowing users to subscribe to multiple topics in one source.&lt;/p&gt;

&lt;h2 id=&quot;changelog-and-known-issues&quot;&gt;Changelog and known issues&lt;/h2&gt;

&lt;p&gt;This release resolves more than 450 issues, including bug fixes, improvements, and new features. See the &lt;a href=&quot;/blog/release_1.0.0-changelog_known_issues.html#changelog&quot;&gt;complete changelog&lt;/a&gt; and &lt;a href=&quot;/blog/release_1.0.0-changelog_known_issues.html#known-issues&quot;&gt;known issues&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of contributors&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Abhishek Agarwal&lt;/li&gt;
  &lt;li&gt;Ajay Bhat&lt;/li&gt;
  &lt;li&gt;Aljoscha Krettek&lt;/li&gt;
  &lt;li&gt;Andra Lungu&lt;/li&gt;
  &lt;li&gt;Andrea Sella&lt;/li&gt;
  &lt;li&gt;Chesnay Schepler&lt;/li&gt;
  &lt;li&gt;Chiwan Park&lt;/li&gt;
  &lt;li&gt;Daniel Pape&lt;/li&gt;
  &lt;li&gt;Fabian Hueske&lt;/li&gt;
  &lt;li&gt;Filipe Correia&lt;/li&gt;
  &lt;li&gt;Frederick F. Kautz IV&lt;/li&gt;
  &lt;li&gt;Gabor Gevay&lt;/li&gt;
  &lt;li&gt;Gabor Horvath&lt;/li&gt;
  &lt;li&gt;Georgios Andrianakis&lt;/li&gt;
  &lt;li&gt;Greg Hogan&lt;/li&gt;
  &lt;li&gt;Gyula Fora&lt;/li&gt;
  &lt;li&gt;Henry Saputra&lt;/li&gt;
  &lt;li&gt;Hilmi Yildirim&lt;/li&gt;
  &lt;li&gt;Hubert Czerpak&lt;/li&gt;
  &lt;li&gt;Jark Wu&lt;/li&gt;
  &lt;li&gt;Johannes&lt;/li&gt;
  &lt;li&gt;Jun Aoki&lt;/li&gt;
  &lt;li&gt;Kostas Kloudas&lt;/li&gt;
  &lt;li&gt;Li Chengxiang&lt;/li&gt;
  &lt;li&gt;Lun Gao&lt;/li&gt;
  &lt;li&gt;Martin Junghanns&lt;/li&gt;
  &lt;li&gt;Martin Liesenberg&lt;/li&gt;
  &lt;li&gt;Matthias J. Sax&lt;/li&gt;
  &lt;li&gt;Maximilian Michels&lt;/li&gt;
  &lt;li&gt;Márton Balassi&lt;/li&gt;
  &lt;li&gt;Nick Dimiduk&lt;/li&gt;
  &lt;li&gt;Niels Basjes&lt;/li&gt;
  &lt;li&gt;Omer Katz&lt;/li&gt;
  &lt;li&gt;Paris Carbone&lt;/li&gt;
  &lt;li&gt;Patrice Freydiere&lt;/li&gt;
  &lt;li&gt;Peter Vandenabeele&lt;/li&gt;
  &lt;li&gt;Piotr Godek&lt;/li&gt;
  &lt;li&gt;Prez Cannady&lt;/li&gt;
  &lt;li&gt;Robert Metzger&lt;/li&gt;
  &lt;li&gt;Romeo Kienzler&lt;/li&gt;
  &lt;li&gt;Sachin Goel&lt;/li&gt;
  &lt;li&gt;Saumitra Shahapure&lt;/li&gt;
  &lt;li&gt;Sebastian Klemke&lt;/li&gt;
  &lt;li&gt;Stefano Baghino&lt;/li&gt;
  &lt;li&gt;Stephan Ewen&lt;/li&gt;
  &lt;li&gt;Stephen Samuel&lt;/li&gt;
  &lt;li&gt;Subhobrata Dey&lt;/li&gt;
  &lt;li&gt;Suneel Marthi&lt;/li&gt;
  &lt;li&gt;Ted Yu&lt;/li&gt;
  &lt;li&gt;Theodore Vasiloudis&lt;/li&gt;
  &lt;li&gt;Till Rohrmann&lt;/li&gt;
  &lt;li&gt;Timo Walther&lt;/li&gt;
  &lt;li&gt;Trevor Grant&lt;/li&gt;
  &lt;li&gt;Ufuk Celebi&lt;/li&gt;
  &lt;li&gt;Ulf Karlsson&lt;/li&gt;
  &lt;li&gt;Vasia Kalavri&lt;/li&gt;
  &lt;li&gt;fversaci&lt;/li&gt;
  &lt;li&gt;madhukar&lt;/li&gt;
  &lt;li&gt;qingmeng.wyh&lt;/li&gt;
  &lt;li&gt;ramkrishna&lt;/li&gt;
  &lt;li&gt;rtudoran&lt;/li&gt;
  &lt;li&gt;sahitya-pavurala&lt;/li&gt;
  &lt;li&gt;zhangminglei&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Tue, 08 Mar 2016 14:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2016/03/08/release-1.0.0.html</link>
<guid isPermaLink="true">/news/2016/03/08/release-1.0.0.html</guid>
</item>

<item>
<title>Flink 0.10.2 Released</title>
<description>&lt;p&gt;Today, the Flink community released Flink version &lt;strong&gt;0.10.2&lt;/strong&gt;, the second bugfix release of the 0.10 series.&lt;/p&gt;

&lt;p&gt;We &lt;strong&gt;recommend that all users update to this release&lt;/strong&gt; by bumping the version of your Flink dependencies to &lt;code&gt;0.10.2&lt;/code&gt; and updating the binaries on the server.&lt;/p&gt;

&lt;h2 id=&quot;issues-fixed&quot;&gt;Issues fixed&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3242&quot;&gt;FLINK-3242&lt;/a&gt;: Adjust StateBackendITCase for 0.10 signatures of state backends&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3236&quot;&gt;FLINK-3236&lt;/a&gt;: Flink user code classloader as parent classloader from Flink core classes&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2962&quot;&gt;FLINK-2962&lt;/a&gt;: Cluster startup script refers to unused variable&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3151&quot;&gt;FLINK-3151&lt;/a&gt;: Downgrade to Netty version 4.0.27.Final&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3224&quot;&gt;FLINK-3224&lt;/a&gt;: Call setInputType() on output formats that implement InputTypeConfigurable&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3218&quot;&gt;FLINK-3218&lt;/a&gt;: Fix overriding of user parameters when merging Hadoop configurations&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3189&quot;&gt;FLINK-3189&lt;/a&gt;: Fix argument parsing of CLI client INFO action&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3176&quot;&gt;FLINK-3176&lt;/a&gt;: Improve documentation for window apply&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3185&quot;&gt;FLINK-3185&lt;/a&gt;: Log error on failure during recovery&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3185&quot;&gt;FLINK-3185&lt;/a&gt;: Don’t swallow test failure Exception&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3147&quot;&gt;FLINK-3147&lt;/a&gt;: Expose HadoopOutputFormatBase fields as protected&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3145&quot;&gt;FLINK-3145&lt;/a&gt;: Pin Kryo version of transitive dependencies&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3143&quot;&gt;FLINK-3143&lt;/a&gt;: Update Closure Cleaner’s ASM references to ASM5&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3136&quot;&gt;FLINK-3136&lt;/a&gt;: Fix shaded imports in ClosureCleaner.scala&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3108&quot;&gt;FLINK-3108&lt;/a&gt;: JoinOperator’s with() calls the wrong TypeExtractor method&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3125&quot;&gt;FLINK-3125&lt;/a&gt;: Web server starts also when JobManager log files cannot be accessed.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3080&quot;&gt;FLINK-3080&lt;/a&gt;: Relax restrictions of DataStream.union()&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3081&quot;&gt;FLINK-3081&lt;/a&gt;: Properly stop periodic Kafka committer&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3082&quot;&gt;FLINK-3082&lt;/a&gt;: Fixed confusing error about an interface that no longer exists&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3067&quot;&gt;FLINK-3067&lt;/a&gt;: Enforce zkclient 0.7 for Kafka&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3020&quot;&gt;FLINK-3020&lt;/a&gt;: Set number of task slots to maximum parallelism in local execution&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 11 Feb 2016 09:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2016/02/11/release-0.10.2.html</link>
<guid isPermaLink="true">/news/2016/02/11/release-0.10.2.html</guid>
</item>

<item>
<title>Flink 2015: A year in review, and a lookout to 2016</title>
<description>&lt;p&gt;With 2015 ending, we thought that this would be a good time to reflect
on the amazing work done by the Flink community over this past year,
and how much this community has grown.&lt;/p&gt;

&lt;p&gt;Overall, we have seen Flink grow in terms of functionality from an
engine to one of the most complete open-source stream processing
frameworks available. The community grew from a relatively small and
geographically focused team to a truly global one, and one of the largest
big data communities in the Apache Software Foundation.&lt;/p&gt;

&lt;p&gt;We will also look at some interesting stats, including that the
busiest days for Flink are Mondays (who would have thought :-).&lt;/p&gt;

&lt;h1 id=&quot;community-growth&quot;&gt;Community growth&lt;/h1&gt;

&lt;p&gt;Let us start with some simple statistics from &lt;a href=&quot;https://github.com/apache/flink&quot;&gt;Flink’s
GitHub repository&lt;/a&gt;. During 2015, the
Flink community &lt;strong&gt;doubled&lt;/strong&gt; in size, from about 75 contributors to
over 150. Forks of the repository more than &lt;strong&gt;tripled&lt;/strong&gt; from 160 in
February 2015 to 544 in December 2015, and the number of stars of the
repository almost tripled from 289 to 813.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/community-growth.png&quot; style=&quot;height:400px;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;Although Flink started out geographically in Berlin, Germany, the
community is by now spread all around the globe, with many
contributors from North America, Europe, and Asia. A simple search at
meetup.com for groups that mention Flink as a focus area reveals &lt;a href=&quot;http://apache-flink.meetup.com/&quot;&gt;16
meetups around the globe&lt;/a&gt;:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/meetup-map.png&quot; style=&quot;height:400px;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;h1 id=&quot;flink-forward-2015&quot;&gt;Flink Forward 2015&lt;/h1&gt;

&lt;p&gt;One of the highlights of the year for Flink was undoubtedly the &lt;a href=&quot;http://2015.flink-forward.org/&quot;&gt;Flink
Forward&lt;/a&gt; conference, the first conference
on Apache Flink, which was held in October in Berlin. More than 250
participants (roughly half based outside Germany where the conference
was held) attended more than 33 technical talks from organizations
including Google, MongoDB, Bouygues Telecom, NFLabs, Euranova, RedHat,
IBM, Huawei, Intel, Ericsson, Capital One, Zalando, Amadeus, the Otto
Group, and ResearchGate. If you have not yet watched their talks,
check out the &lt;a href=&quot;http://2015.flink-forward.org/?post_type=day&quot;&gt;slides&lt;/a&gt; and
&lt;a href=&quot;https://www.youtube.com/playlist?list=PLDX4T_cnKjD31JeWR1aMOi9LXPRQ6nyHO&quot;&gt;videos&lt;/a&gt;
from Flink Forward.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/ff-speakers.png&quot; style=&quot;height:400px;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;h1 id=&quot;media-coverage&quot;&gt;Media coverage&lt;/h1&gt;

&lt;p&gt;And of course, interest in Flink was picked up by the tech
media. During 2015, articles about Flink appeared in
&lt;a href=&quot;http://www.infoq.com/Apache-Flink/news/&quot;&gt;InfoQ&lt;/a&gt;,
&lt;a href=&quot;http://www.zdnet.com/article/five-open-source-big-data-projects-to-watch/&quot;&gt;ZDNet&lt;/a&gt;,
&lt;a href=&quot;http://www.datanami.com/tag/apache-flink/&quot;&gt;Datanami&lt;/a&gt;,
&lt;a href=&quot;http://www.infoworld.com/article/2919602/hadoop/flink-hadoops-new-contender-for-mapreduce-spark.html&quot;&gt;Infoworld&lt;/a&gt;
(including being one of the &lt;a href=&quot;http://www.infoworld.com/article/2982429/open-source-tools/bossie-awards-2015-the-best-open-source-big-data-tools.html&quot;&gt;best open source big data tools of
2015&lt;/a&gt;),
the &lt;a href=&quot;http://blogs.gartner.com/nick-heudecker/apache-flink-offers-a-challenge-to-spark/&quot;&gt;Gartner
blog&lt;/a&gt;,
&lt;a href=&quot;http://dataconomy.com/tag/apache-flink/&quot;&gt;Dataconomy&lt;/a&gt;,
&lt;a href=&quot;http://sdtimes.com/tag/apache-flink/&quot;&gt;SDTimes&lt;/a&gt;, the &lt;a href=&quot;https://www.mapr.com/blog/apache-flink-new-way-handle-streaming-data&quot;&gt;MapR
blog&lt;/a&gt;,
&lt;a href=&quot;http://www.kdnuggets.com/2015/08/apache-flink-stream-processing.html&quot;&gt;KDnuggets&lt;/a&gt;,
and
&lt;a href=&quot;http://www.hadoopsphere.com/2015/02/distributed-data-processing-with-apache.html&quot;&gt;HadoopSphere&lt;/a&gt;.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/appeared-in.png&quot; style=&quot;height:400px;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;It is interesting to see that Hadoop Summit EMEA 2016 received a
whopping 17 (!) talk submissions that mention Flink in their
title and abstract:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/hadoop-summit.png&quot; style=&quot;height:400px;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;h1 id=&quot;fun-with-stats-when-do-committers-commit&quot;&gt;Fun with stats: when do committers commit?&lt;/h1&gt;

&lt;p&gt;To get some deeper insight into what is happening in the Flink
community, let us do some analytics on the git log of the project :-)
The easiest thing we can do is count the number of commits to the
repository in 2015. Running&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;git log --pretty=oneline --after=1/1/2015  | wc -l
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;on the Flink repository yields a total of &lt;strong&gt;2203 commits&lt;/strong&gt; in 2015.&lt;/p&gt;

&lt;p&gt;To dig deeper, we will use an open source tool called gitstats that
will give us some interesting statistics on committer
behavior. You can also create these yourself, and see many more, by
following four easy steps:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Download gitstats from the &lt;a href=&quot;http://gitstats.sourceforge.net/&quot;&gt;project homepage&lt;/a&gt;. For example, on OS X with Homebrew, type:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;brew install --HEAD homebrew/head-only/gitstats
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;ol start=&quot;2&quot;&gt;
  &lt;li&gt;Clone the Apache Flink git repository:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;git clone git@github.com:apache/flink.git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;ol start=&quot;3&quot;&gt;
  &lt;li&gt;Generate the statistics:&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;gitstats flink/ flink-stats/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;ol start=&quot;4&quot;&gt;
  &lt;li&gt;View all the statistics as an HTML page using your favorite browser (e.g., Chrome):&lt;/li&gt;
&lt;/ol&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;chrome flink-stats/index.html
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;First, we can see a steady growth of lines of code in Flink since the
initial Apache incubator project. During 2015, the codebase almost
&lt;strong&gt;doubled&lt;/strong&gt; from 500,000 LOC to 900,000 LOC.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/code-growth.png&quot; style=&quot;height:400px;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;It is interesting to see when committers commit. For Flink, Monday
afternoons are by far the most popular times to commit to the
repository:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/commit-stats.png&quot; style=&quot;height:400px;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;h1 id=&quot;feature-timeline&quot;&gt;Feature timeline&lt;/h1&gt;

&lt;p&gt;So, what were the major features added to Flink and the Flink
ecosystem during 2015? Here is a (non-exhaustive) chronological list:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/feature-timeline.png&quot; style=&quot;height:400px;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;h1 id=&quot;roadmap-for-2016&quot;&gt;Roadmap for 2016&lt;/h1&gt;

&lt;p&gt;With 2015 coming to a close, the Flink community has already started
discussing Flink’s roadmap for the future. Some highlights
are:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Runtime scaling of streaming jobs:&lt;/strong&gt; streaming jobs are running
  forever, and need to react to a changing environment. Runtime
  scaling means dynamically increasing and decreasing the
  parallelism of a job to sustain certain SLAs, or react to changing
  input throughput.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;SQL queries for static data sets and streams:&lt;/strong&gt; building on top of
  Flink’s Table API, users should be able to write SQL
  queries for static data sets, as well as SQL queries on data
  streams that continuously produce new results.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Streaming operators backed by managed memory:&lt;/strong&gt; currently,
  streaming operators like user-defined state and windows are backed
  by JVM heap objects. Moving those to Flink managed memory will add
  the ability to spill to disk, improve GC efficiency, and provide better
  control over memory utilization.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Library for detecting temporal event patterns:&lt;/strong&gt; a common use case
  for stream processing is detecting patterns in an event stream
  with timestamps. Flink makes this possible with its support for
  event time, so many of these operators can be surfaced in the form
  of a library.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Support for Apache Mesos, and resource-dynamic YARN support:&lt;/strong&gt;
  support for both Mesos and YARN, including dynamic allocation and
  release of resources for more resource elasticity (for both batch
  and stream processing).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Security:&lt;/strong&gt; encrypt both the messages exchanged between
  TaskManagers and JobManager and the connections for data
  exchange between workers.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;More streaming connectors, more runtime metrics, and continuous
  DataStream API enhancements:&lt;/strong&gt; add support for more sources and
  sinks (e.g., Amazon Kinesis, Cassandra, Flume, etc.), expose more
  metrics to the user, and provide continuous improvements to the
  DataStream API.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you are interested in these features, we highly encourage you to
take a look at the &lt;a href=&quot;https://docs.google.com/document/d/1ExmtVpeVVT3TIhO1JoBpC5JKXm-778DAD7eqw5GANwE/edit&quot;&gt;current
draft&lt;/a&gt;,
and &lt;a href=&quot;https://mail-archives.apache.org/mod_mbox/flink-dev/201512.mbox/browser&quot;&gt;join the
discussion&lt;/a&gt;
on the Flink mailing lists.&lt;/p&gt;

</description>
<pubDate>Fri, 18 Dec 2015 11:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2015/12/18/a-year-in-review.html</link>
<guid isPermaLink="true">/news/2015/12/18/a-year-in-review.html</guid>
</item>

<item>
<title>Storm Compatibility in Apache Flink: How to run existing Storm topologies on Flink</title>
<description>&lt;p&gt;&lt;a href=&quot;https://storm.apache.org&quot;&gt;Apache Storm&lt;/a&gt; was one of the first distributed and scalable stream processing systems available in the open source space offering (near) real-time tuple-by-tuple processing semantics.
Initially released by the developers at BackType in 2011 under the Eclipse Public License, it became popular very quickly.
Only shortly afterwards, Twitter acquired BackType.
Since then, Storm has been growing in popularity, is used in production at many big companies, and is the de facto industry standard for big data stream processing.
In 2013, Storm entered the Apache incubator program, followed by its graduation to a top-level project in 2014.&lt;/p&gt;

&lt;p&gt;Apache Flink is a stream processing engine that improves upon older technologies like Storm in several dimensions,
including &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html&quot;&gt;strong consistency guarantees&lt;/a&gt; (“exactly once”),
a higher-level &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html&quot;&gt;DataStream API&lt;/a&gt;,
support for &lt;a href=&quot;http://flink.apache.org/news/2015/12/04/Introducing-windows.html&quot;&gt;event time and a rich windowing system&lt;/a&gt;,
as well as &lt;a href=&quot;https://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/&quot;&gt;superior throughput with competitive low latency&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;While Flink offers several technical benefits over Storm, an existing investment in a codebase of applications developed for Storm often makes it difficult to switch engines.
For this reason, as part of the Flink 0.10 release, Flink ships with a Storm compatibility package that allows users to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Run &lt;strong&gt;unmodified&lt;/strong&gt; Storm topologies using Apache Flink, benefiting from superior performance.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Embed&lt;/strong&gt; Storm code (spouts and bolts) as operators inside Flink DataStream programs.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Only minor code changes are required in order to submit the program to Flink instead of Storm.
This minimizes the work for developers to run existing Storm topologies while leveraging Apache Flink’s fast and robust execution engine.&lt;/p&gt;

&lt;p&gt;We note that the Storm compatibility package is continuously improving and does not cover the full spectrum of Storm’s API.
However, it is powerful enough to cover many use cases.&lt;/p&gt;

&lt;h2 id=&quot;executing-storm-topologies-with-flink&quot;&gt;Executing Storm topologies with Flink&lt;/h2&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/flink-storm.png&quot; style=&quot;height:200px;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The easiest way to use the Storm compatibility package is by executing a whole Storm topology in Flink.
For this, you only need to replace the dependency &lt;code&gt;storm-core&lt;/code&gt; with &lt;code&gt;flink-storm&lt;/code&gt; in your Storm project and &lt;strong&gt;change two lines of code&lt;/strong&gt; in your original Storm program.&lt;/p&gt;

&lt;p&gt;The following example shows a simple Storm WordCount program that can be executed in Flink.
First, the program is assembled the Storm way without any code change to Spouts, Bolts, or the topology itself.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// assemble topology, the Storm way&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TopologyBuilder&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;TopologyBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setSpout&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;source&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StormFileSpout&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inputFilePath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setBolt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;tokenizer&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StormBoltTokenizer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
       &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;shuffleGrouping&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;source&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setBolt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;counter&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StormBoltCounter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
       &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fieldsGrouping&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;tokenizer&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Fields&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;word&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setBolt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;sink&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StormBoltFileSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputFilePath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
       &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;shuffleGrouping&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;counter&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In order to execute the topology, we need to translate it to a &lt;code&gt;FlinkTopology&lt;/code&gt; and submit it to a local or remote Flink cluster, very similar to submitting the application to a Storm cluster.&lt;sup&gt;&lt;a href=&quot;#fn1&quot; id=&quot;ref1&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// transform Storm topology to Flink program&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// replaces: StormTopology topology = builder.createTopology();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;FlinkTopology&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;topology&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlinkTopology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createTopology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;Config&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conf&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Config&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;runLocal&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;c1&quot;&gt;// use FlinkLocalCluster instead of LocalCluster&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;FlinkLocalCluster&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cluster&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlinkLocalCluster&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getLocalCluster&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;cluster&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;submitTopology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;WordCount&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;topology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
	&lt;span class=&quot;c1&quot;&gt;// use FlinkSubmitter instead of StormSubmitter&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;FlinkSubmitter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;submitTopology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;WordCount&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;topology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;As a shorter Flink-style alternative that replaces the Storm-style submission code, you can also use context-based job execution:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// transform Storm topology to Flink program (as above)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;FlinkTopology&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;topology&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlinkTopology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createTopology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// executes locally by default or remotely if submitted with Flink&amp;#39;s command-line client&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;topology&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After the code is packaged in a jar file (e.g., &lt;code&gt;StormWordCount.jar&lt;/code&gt;), it can be easily submitted to Flink via&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;bin/flink run StormWordCount.jar
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The Spouts and Bolts, as well as the topology assembly code, are not changed at all!
Only the translation and submission steps have to be changed to their Storm-API-compatible Flink counterparts.
This allows for minimal code changes and easy adaptation to Flink.&lt;/p&gt;

&lt;h3 id=&quot;embedding-spouts-and-bolts-in-flink-programs&quot;&gt;Embedding Spouts and Bolts in Flink programs&lt;/h3&gt;

&lt;p&gt;It is also possible to use Spouts and Bolts within a regular Flink DataStream program.
The compatibility package provides wrapper classes for Spouts and Bolts which are implemented as a Flink &lt;code&gt;SourceFunction&lt;/code&gt; and &lt;code&gt;StreamOperator&lt;/code&gt; respectively.
Those wrappers automatically translate incoming Flink POJO and &lt;code&gt;TupleXX&lt;/code&gt; records into Storm’s &lt;code&gt;Tuple&lt;/code&gt; type and emitted &lt;code&gt;Values&lt;/code&gt; back into either POJOs or &lt;code&gt;TupleXX&lt;/code&gt; types for further processing by Flink operators.
As Storm is type agnostic, it is required to specify the output type of embedded Spouts/Bolts manually to get a fully typed Flink streaming program.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// use regular Flink streaming environment&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// use Spout as source&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;source&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; 
  &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// Flink provided wrapper including original Spout&lt;/span&gt;
                &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SpoutWrapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;FileSpout&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;localFilePath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)),&lt;/span&gt; 
                &lt;span class=&quot;c1&quot;&gt;// specify output type manually&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;TypeExtractor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getForObject&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)));&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// FileSpout cannot be parallelized&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;text&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;source&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setParallelism&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// further processing with Flink&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tokens&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Tokenizer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// use Bolt for counting&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;counts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
  &lt;span class=&quot;n&quot;&gt;tokens&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;transform&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Counter&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
                   &lt;span class=&quot;c1&quot;&gt;// specify output type manually&lt;/span&gt;
                   &lt;span class=&quot;n&quot;&gt;TypeExtractor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getForObject&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)),&lt;/span&gt;
                   &lt;span class=&quot;c1&quot;&gt;// Flink provided wrapper including original Bolt&lt;/span&gt;
                   &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BoltWrapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;BoltCounter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()));&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// write result to file via Flink sink&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;counts&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeAsText&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;outputPath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// start Flink job&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;WordCount with Spout source and Bolt counter&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Although some boilerplate code is needed (we plan to address this soon!), the actual embedded Spout and Bolt code can be used unmodified.
We also note that the resulting program is fully typed, and type errors will be found by Flink’s type extractor even though the original Spouts and Bolts are not typed.&lt;/p&gt;

&lt;h2 id=&quot;outlook&quot;&gt;Outlook&lt;/h2&gt;

&lt;p&gt;The Storm compatibility package is currently in beta and under continuous development.
We are working on providing consistency guarantees for stateful Bolts.
Furthermore, we want to provide a better API integration for embedded Spouts and Bolts by providing a “StormExecutionEnvironment” as a special extension of Flink’s &lt;code&gt;StreamExecutionEnvironment&lt;/code&gt;.
We are also investigating the integration of Storm’s higher-level programming API Trident.&lt;/p&gt;

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;Flink’s compatibility package for Storm allows using unmodified Spouts and Bolts within Flink.
This enables you to even embed third-party Spouts and Bolts where the source code is not available.
While you can embed Spouts/Bolts in a Flink program and mix-and-match them with Flink operators, running whole topologies is the easiest way to get started and can be achieved with almost no code changes.&lt;/p&gt;

&lt;p&gt;If you want to try out Flink’s Storm compatibility package, check out our &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/storm_compatibility.html&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;hr /&gt;

&lt;p&gt;&lt;sup id=&quot;fn1&quot;&gt;1. We confess, there are three lines changed compared to a Storm project &lt;img class=&quot;emoji&quot; style=&quot;width:16px;height:16px;align:absmiddle&quot; src=&quot;/img/blog/smirk.png&quot; /&gt;—because the example covers local &lt;em&gt;and&lt;/em&gt; remote execution. &lt;a href=&quot;#ref1&quot; title=&quot;Back to text.&quot;&gt;↩&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

</description>
<pubDate>Fri, 11 Dec 2015 11:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2015/12/11/storm-compatibility.html</link>
<guid isPermaLink="true">/news/2015/12/11/storm-compatibility.html</guid>
</item>

<item>
<title>Introducing Stream Windows in Apache Flink</title>
<description>&lt;p&gt;The data analysis space is witnessing an evolution from batch to stream processing for many use cases. Although batch can be handled as a special case of stream processing, analyzing never-ending streaming data often requires a shift in mindset and comes with its own terminology (for example, “windowing” and “at-least-once”/“exactly-once” processing). This shift and the new terminology can be quite confusing for people who are new to stream processing. Apache Flink is a production-ready stream processor with an easy-to-use yet very expressive API to define advanced stream analysis programs. Flink’s API features very flexible window definitions on data streams, which make it stand out among other open source stream processors.&lt;/p&gt;

&lt;p&gt;In this blog post, we discuss the concept of windows for stream processing, present Flink’s built-in windows, and explain its support for custom windowing semantics.&lt;/p&gt;

&lt;h2 id=&quot;what-are-windows-and-what-are-they-good-for&quot;&gt;What are windows and what are they good for?&lt;/h2&gt;

&lt;p&gt;Consider the example of a traffic sensor that counts the number of vehicles passing a certain location every 15 seconds. The resulting stream could look like:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/window-intro/window-stream.png&quot; style=&quot;width:75%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;If you would like to know how many vehicles passed that location, you would simply sum the individual counts. However, the nature of a sensor stream is that it continuously produces data. Such a stream never ends, so it is not possible to compute a final sum that can be returned. Instead, it is possible to compute rolling sums, i.e., to return an updated sum record for each input event. This would yield a new stream of partial sums.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/window-intro/window-rolling-sum.png&quot; style=&quot;width:75%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;However, a stream of partial sums might not be what we are looking for, because it constantly updates the count and, more importantly, some information such as variation over time is lost. Hence, we might want to rephrase our question and ask for the number of cars that pass the location every minute. This requires us to group the elements of the stream into finite sets, each set corresponding to sixty seconds. This operation is called a &lt;em&gt;tumbling window&lt;/em&gt; operation.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/window-intro/window-tumbling-window.png&quot; style=&quot;width:75%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;Tumbling windows discretize a stream into non-overlapping windows. Some applications, however, need overlapping windows, for instance to compute smoothed aggregates. For example, we can compute every thirty seconds the number of cars that passed in the last minute. Such windows are called &lt;em&gt;sliding windows&lt;/em&gt;.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/window-intro/window-sliding-window.png&quot; style=&quot;width:75%;margin:15px&quot; /&gt;
&lt;/center&gt;
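
&lt;p&gt;As a rough sketch of how an element’s timestamp maps to tumbling and sliding windows, the following plain Scala snippet computes window start times by simple arithmetic. It does not use Flink; the object and method names are hypothetical, and timestamps are assumed to be non-negative numbers in the same unit as the window size.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Plain Scala sketch, independent of Flink. All names are hypothetical.
object WindowAssignment {

  // Start of the single tumbling window of length `size` that contains `ts`.
  // Assumes ts is non-negative.
  def tumblingStart(ts: Long, size: Long): Long = ts - (ts % size)

  // Starts of all sliding windows (length `size`, advancing by `slide`)
  // that contain `ts`, newest window first.
  // Starts may be negative for timestamps smaller than `size`.
  def slidingStarts(ts: Long, size: Long, slide: Long): Seq[Long] = {
    val lastStart = ts - (ts % slide)
    lastStart until (ts - size) by -slide
  }

  def main(args: Array[String]): Unit = {
    println(tumblingStart(75L, 60L))
    println(slidingStarts(75L, 60L, 30L).toList)
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With a window size of 60 seconds and a slide of 30 seconds, an element at second 75 falls into the sliding windows starting at seconds 60 and 30, but into only one tumbling window, the one starting at second 60.&lt;/p&gt;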

&lt;p&gt;Defining windows on a data stream as discussed before is a non-parallel operation. This is because each element of a stream must be processed by the same window operator that decides which windows the element should be added to. Windows on a full stream are called &lt;em&gt;AllWindows&lt;/em&gt; in Flink. For many applications, a data stream needs to be grouped into multiple logical streams on each of which a window operator can be applied. Think for example about a stream of vehicle counts from multiple traffic sensors (instead of only one sensor as in our previous example), where each sensor monitors a different location. By grouping the stream by sensor id, we can compute windowed traffic statistics for each location in parallel. In Flink, we call such partitioned windows simply &lt;em&gt;Windows&lt;/em&gt;, as they are the common case for distributed streams. The following figure shows tumbling windows that collect two elements over a stream of &lt;code&gt;(sensorId, count)&lt;/code&gt; pair elements.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/window-intro/windows-keyed.png&quot; style=&quot;width:75%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;Generally speaking, a window defines a finite set of elements on an unbounded stream. This set can be based on time (as in our previous examples), element counts, a combination of counts and time, or some custom logic to assign elements to windows. Flink’s DataStream API provides concise operators for the most common window operations as well as a generic windowing mechanism that allows users to define very custom windowing logic. In the following we present Flink’s time and count windows before discussing its windowing mechanism in detail.&lt;/p&gt;

&lt;h2 id=&quot;time-windows&quot;&gt;Time Windows&lt;/h2&gt;

&lt;p&gt;As their name suggests, time windows group stream elements by time. For example, a tumbling time window of one minute collects elements for one minute and applies a function to all elements in the window once that minute has passed.&lt;/p&gt;

&lt;p&gt;Defining tumbling and sliding time windows in Apache Flink is very easy:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// Stream of (sensorId, carCnt)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vehicleCnts&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tumblingCnts&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vehicleCnts&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// key stream by sensorId&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; 
  &lt;span class=&quot;c1&quot;&gt;// tumbling time window of 1 minute length&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;timeWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minutes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// compute sum over carCnt&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; 

&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;slidingCnts&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vehicleCnts&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; 
  &lt;span class=&quot;c1&quot;&gt;// sliding time window of 1 minute length and 30 seconds slide interval&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;timeWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minutes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;There is one aspect that we haven’t discussed yet, namely the exact meaning of “&lt;em&gt;collects elements for one minute&lt;/em&gt;” which boils down to the question, “&lt;em&gt;How does the stream processor interpret time?&lt;/em&gt;”.&lt;/p&gt;

&lt;p&gt;Apache Flink features three different notions of time, namely &lt;em&gt;processing time&lt;/em&gt;, &lt;em&gt;event time&lt;/em&gt;, and &lt;em&gt;ingestion time&lt;/em&gt;.&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;In &lt;strong&gt;processing time&lt;/strong&gt;, windows are defined with respect to the wall clock of the machine that builds and processes a window, i.e., a one minute processing time window collects elements for exactly one minute.&lt;/li&gt;
  &lt;li&gt;In &lt;strong&gt;event time&lt;/strong&gt;, windows are defined with respect to timestamps that are attached to each event record. This is common for many types of events, such as log entries, sensor data, etc., where the timestamp usually represents the time at which the event occurred. Event time has several benefits over processing time. First, it decouples the program semantics from the actual serving speed of the source and the processing performance of the system. Hence you can process historic data, which is served at maximum speed, and continuously produced data with the same program. It also prevents semantically incorrect results in case of backpressure or delays due to failure recovery. Second, event time windows compute correct results even if events arrive out of timestamp order, which is common if a data stream gathers events from distributed sources.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Ingestion time&lt;/strong&gt; is a hybrid of processing and event time. It assigns wall clock timestamps to records as soon as they arrive in the system (at the source) and continues processing with event time semantics based on the attached timestamps.&lt;/li&gt;
&lt;/ol&gt;
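
&lt;p&gt;To illustrate why event-time windows are insensitive to arrival order, here is a minimal plain Scala sketch (not Flink code). It assigns each record to a one-minute window using only the timestamp carried by the record. The record layout &lt;code&gt;(eventTimestampSeconds, carCnt)&lt;/code&gt; and all names are assumptions made for this example.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Plain Scala sketch, independent of Flink. All names are hypothetical.
object EventTimeGrouping {

  // Sums carCnt per one-minute window, keyed by the window start in seconds.
  // Only the timestamp inside each record is used, never the arrival position.
  def sumsPerMinute(events: Seq[(Long, Int)]): Map[Long, Int] =
    events
      .groupBy { case (ts, _) =&amp;gt; ts - (ts % 60) }
      .map { case (start, es) =&amp;gt; start -&amp;gt; es.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val inOrder    = Seq((5L, 9), (20L, 6), (65L, 8), (80L, 4))
    val outOfOrder = Seq((65L, 8), (5L, 9), (80L, 4), (20L, 6))
    // Both arrival orders produce the same per-window sums.
    println(sumsPerMinute(inOrder) == sumsPerMinute(outOfOrder))
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A processing-time window, in contrast, would group records by when they happen to arrive, so the two sequences above could end up in different windows.&lt;/p&gt;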

&lt;h2 id=&quot;count-windows&quot;&gt;Count Windows&lt;/h2&gt;

&lt;p&gt;Apache Flink also features count windows. A tumbling count window of 100 will collect 100 events in a window and evaluate the window when the 100th element has been added.&lt;/p&gt;

&lt;p&gt;In Flink’s DataStream API, tumbling and sliding count windows are defined as follows:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// Stream of (sensorId, carCnt)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vehicleCnts&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tumblingCnts&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vehicleCnts&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// key stream by sensorId&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// tumbling count window of 100 elements size&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;countWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// compute the carCnt sum &lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;slidingCnts&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vehicleCnts&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;c1&quot;&gt;// sliding count window of 100 elements size and 10 elements slide interval&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;countWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
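
&lt;p&gt;The effect of the two count windows on the counts of a single sensor can be approximated in plain Scala, without Flink. This is only a sketch: the names are hypothetical, and the sliding variant assumes that a window is evaluated after every &lt;code&gt;slide&lt;/code&gt; elements over at most the &lt;code&gt;size&lt;/code&gt; most recent elements.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Plain Scala sketch, independent of Flink. All names are hypothetical.
object CountWindows {

  // Tumbling count window: one sum per complete group of `size` elements.
  def tumblingSums(counts: Seq[Int], size: Int): Seq[Int] =
    counts.grouped(size).filter(_.size == size).map(_.sum).toList

  // Sliding count window (assumed semantics): after every `slide` elements,
  // sum the most recent `size` elements seen so far.
  def slidingSums(counts: Seq[Int], size: Int, slide: Int): Seq[Int] =
    (slide to counts.size by slide).map { end =&amp;gt;
      counts.slice(math.max(0, end - size), end).sum
    }

  def main(args: Array[String]): Unit = {
    val carCnts = Seq(1, 2, 3, 4, 5, 6)
    println(tumblingSums(carCnts, 2))   // sums of (1,2), (3,4), (5,6)
    println(slidingSums(carCnts, 4, 2)) // evaluated after 2, 4, and 6 elements
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;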

&lt;h2 id=&quot;dissecting-flinks-windowing-mechanics&quot;&gt;Dissecting Flink’s windowing mechanics&lt;/h2&gt;

&lt;p&gt;Flink’s built-in time and count windows cover a wide range of common window use cases. However, some applications require custom windowing logic that cannot be addressed by Flink’s built-in windows. To also support applications that need very specific windowing semantics, the DataStream API exposes interfaces for the internals of its windowing mechanics. These interfaces give very fine-grained control over the way that windows are built and evaluated.&lt;/p&gt;

&lt;p&gt;The following figure depicts Flink’s windowing mechanism and introduces the components involved.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/window-intro/window-mechanics.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;Elements that arrive at a window operator are handed to a &lt;code&gt;WindowAssigner&lt;/code&gt;. The WindowAssigner assigns elements to one or more windows, possibly creating new windows. A &lt;code&gt;Window&lt;/code&gt; itself is just an identifier for a list of elements and may provide some optional meta information, such as begin and end time in case of a &lt;code&gt;TimeWindow&lt;/code&gt;. Note that an element can be added to multiple windows, which also means that multiple windows can exist at the same time.&lt;/p&gt;

&lt;p&gt;Each window owns a &lt;code&gt;Trigger&lt;/code&gt; that decides when the window is evaluated or purged. The trigger is called for each element that is inserted into the window and when a previously registered timer times out. On each event, a trigger can decide to fire (i.e., evaluate), purge (remove the window and discard its content), or fire and then purge the window. A trigger that just fires evaluates the window and keeps it as it is, i.e., all elements remain in the window and are evaluated again when the trigger fires the next time. A window can be evaluated several times and exists until it is purged. Note that a window consumes memory until it is purged.&lt;/p&gt;
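
&lt;p&gt;The decisions a trigger can take may be easier to picture as a small data type. The following plain Scala model is a sketch only and is not Flink’s actual &lt;code&gt;Trigger&lt;/code&gt; interface; the type &lt;code&gt;Decision&lt;/code&gt; and the class &lt;code&gt;CountingTrigger&lt;/code&gt; are hypothetical names introduced for illustration.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Plain Scala model of the trigger decisions; not Flink&amp;#39;s Trigger interface.
sealed trait Decision
case object Continue     extends Decision // do nothing yet
case object Fire         extends Decision // evaluate, keep the window content
case object Purge        extends Decision // discard the content without evaluating
case object FireAndPurge extends Decision // evaluate, then discard the content

// Hypothetical trigger: fires and purges once `maxCount` elements were added.
class CountingTrigger(maxCount: Int) {
  private var seen = 0

  // Called once for every element that is inserted into the window.
  def onElement(): Decision = {
    seen += 1
    if (seen == maxCount) {
      seen = 0
      FireAndPurge
    } else {
      Continue
    }
  }
}

object TriggerDemo {
  def main(args: Array[String]): Unit = {
    val trigger = new CountingTrigger(3)
    // The third call returns FireAndPurge, the first two return Continue.
    println(List.fill(3)(trigger.onElement()))
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Returning &lt;code&gt;Fire&lt;/code&gt; instead of &lt;code&gt;FireAndPurge&lt;/code&gt; in this model would correspond to a window that is evaluated repeatedly and keeps consuming memory until something purges it.&lt;/p&gt;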

&lt;p&gt;When a Trigger fires, the list of window elements can be given to an optional &lt;code&gt;Evictor&lt;/code&gt;. The evictor can iterate through the list and decide to cut off some elements from the start of the list, i.e., remove some of the elements that entered the window first. The remaining elements are given to an evaluation function. If no Evictor was defined, the Trigger hands all the window elements directly to the evaluation function.&lt;/p&gt;

&lt;p&gt;The evaluation function receives the elements of a window (possibly filtered by an Evictor) and computes one or more result elements for the window. The DataStream API accepts different types of evaluation functions, including predefined aggregation functions such as &lt;code&gt;sum()&lt;/code&gt;, &lt;code&gt;min()&lt;/code&gt;, &lt;code&gt;max()&lt;/code&gt;, as well as a &lt;code&gt;ReduceFunction&lt;/code&gt;, &lt;code&gt;FoldFunction&lt;/code&gt;, or &lt;code&gt;WindowFunction&lt;/code&gt;. A WindowFunction is the most generic evaluation function and receives the window object (i.e., the metadata of the window), the list of window elements, and the window key (in case of a keyed window) as parameters.&lt;/p&gt;
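
&lt;p&gt;The interplay of evictor and evaluation function can be sketched in a few lines of plain Scala. This is an illustration under simplifying assumptions, not Flink’s &lt;code&gt;Evictor&lt;/code&gt; or &lt;code&gt;WindowFunction&lt;/code&gt; API: the window content is modeled as a simple sequence ordered by insertion, and the helper names are hypothetical.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Plain Scala sketch, independent of Flink. All names are hypothetical.
object EvictAndEvaluate {

  // Evictor: cut off elements from the start of the list so that
  // at most `maxSize` of the most recently added elements remain.
  def evictOldest[A](elements: Seq[A], maxSize: Int): Seq[A] =
    elements.drop(math.max(0, elements.size - maxSize))

  // Evaluation function in the spirit of sum(): one result per window.
  def evaluate(elements: Seq[Int]): Int = elements.sum

  def main(args: Array[String]): Unit = {
    val window = Seq(3, 5, 2, 8, 1)
    // The evictor keeps the newest three elements (2, 8, 1) before they are summed.
    println(evaluate(evictOldest(window, 3))) // prints 11
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;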

&lt;p&gt;These are the components that constitute Flink’s windowing mechanics. We now show step-by-step how to implement custom windowing logic with the DataStream API. We start with a stream of type &lt;code&gt;DataStream[IN]&lt;/code&gt; and key it using a key selector function that extracts a key of type &lt;code&gt;KEY&lt;/code&gt; to obtain a &lt;code&gt;KeyedStream[IN, KEY]&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;IN&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// create a keyed stream using a key selector function&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;keyed&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;KeyedStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;IN&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myKeySel&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;IN&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We apply a &lt;code&gt;WindowAssigner[IN, WINDOW]&lt;/code&gt; that creates windows of type &lt;code&gt;WINDOW&lt;/code&gt; resulting in a &lt;code&gt;WindowedStream[IN, KEY, WINDOW]&lt;/code&gt;. In addition, a &lt;code&gt;WindowAssigner&lt;/code&gt; also provides a default &lt;code&gt;Trigger&lt;/code&gt; implementation.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// create windowed stream using a WindowAssigner&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowed&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;WindowedStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;IN&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;KEY&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;WINDOW&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;keyed&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myAssigner&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;WindowAssigner&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;IN&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;WINDOW&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We can explicitly specify a &lt;code&gt;Trigger&lt;/code&gt; to override the default &lt;code&gt;Trigger&lt;/code&gt; provided by the &lt;code&gt;WindowAssigner&lt;/code&gt;. Note that specifying a trigger does not add an additional trigger condition but replaces the current trigger.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// override the default trigger of the WindowAssigner&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;windowed&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowed&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;trigger&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myTrigger&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Trigger&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;IN&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;WINDOW&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We may want to specify an optional &lt;code&gt;Evictor&lt;/code&gt; as follows.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// specify an optional evictor&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;windowed&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowed&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;evictor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myEvictor&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Evictor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;IN&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;WINDOW&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Finally, we apply a &lt;code&gt;WindowFunction&lt;/code&gt; that returns elements of type &lt;code&gt;OUT&lt;/code&gt; to obtain a &lt;code&gt;DataStream[OUT]&lt;/code&gt;.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// apply window function to windowed stream&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;OUT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowed&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;apply&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myWinFunc&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;WindowFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;IN&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;OUT&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;KEY&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;WINDOW&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With Flink’s internal windowing mechanics and their exposure through the DataStream API, it is possible to implement very custom windowing logic, such as session windows or windows that emit early results if the values exceed a certain threshold.&lt;/p&gt;

&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Support for various types of windows over continuous data streams is a must-have for modern stream processors. Apache Flink is a stream processor with a very strong feature set, including a very flexible mechanism to build and evaluate windows over continuous data streams. Flink provides pre-defined window operators for common use cases as well as a toolbox that allows users to define very custom windowing logic. The Flink community will add more pre-defined window operators as we learn the requirements from our users.&lt;/p&gt;
</description>
<pubDate>Fri, 04 Dec 2015 11:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2015/12/04/Introducing-windows.html</link>
<guid isPermaLink="true">/news/2015/12/04/Introducing-windows.html</guid>
</item>

<item>
<title>Flink 0.10.1 released</title>
<description>&lt;p&gt;Today, the Flink community released the first bugfix release of the 0.10 series of Flink.&lt;/p&gt;

&lt;p&gt;We recommend that all users update to this release by bumping the version of their Flink dependencies and updating the binaries on the server.&lt;/p&gt;

&lt;h2 id=&quot;issues-fixed&quot;&gt;Issues fixed&lt;/h2&gt;

&lt;ul class=&quot;list-unstyled&quot;&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2879&quot;&gt;FLINK-2879&lt;/a&gt;] -         Links in documentation are broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2938&quot;&gt;FLINK-2938&lt;/a&gt;] -         Streaming docs not in sync with latest state changes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2942&quot;&gt;FLINK-2942&lt;/a&gt;] -         Dangling operators in web UI&amp;#39;s program visualization (non-deterministic)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2967&quot;&gt;FLINK-2967&lt;/a&gt;] -         TM address detection might not always detect the right interface on slow networks / overloaded JMs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2977&quot;&gt;FLINK-2977&lt;/a&gt;] -         Cannot access HBase in a Kerberos secured Yarn cluster
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2987&quot;&gt;FLINK-2987&lt;/a&gt;] -         Flink 0.10 fails to start on YARN 2.6.0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2989&quot;&gt;FLINK-2989&lt;/a&gt;] -         Job Cancel button doesn&amp;#39;t work on Yarn
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3005&quot;&gt;FLINK-3005&lt;/a&gt;] -         Commons-collections object deserialization remote command execution vulnerability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3011&quot;&gt;FLINK-3011&lt;/a&gt;] -         Cannot cancel failing/restarting streaming job from the command line
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3019&quot;&gt;FLINK-3019&lt;/a&gt;] -         CLI does not list running/restarting jobs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3020&quot;&gt;FLINK-3020&lt;/a&gt;] -         Local streaming execution: set number of task manager slots to the maximum parallelism
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3024&quot;&gt;FLINK-3024&lt;/a&gt;] -         TimestampExtractor Does not Work When returning Long.MIN_VALUE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3032&quot;&gt;FLINK-3032&lt;/a&gt;] -         Flink does not start on Hadoop 2.7.1 (HDP), due to class conflict
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3043&quot;&gt;FLINK-3043&lt;/a&gt;] -         Kafka Connector description in Streaming API guide is wrong/outdated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3047&quot;&gt;FLINK-3047&lt;/a&gt;] -         Local batch execution: set number of task manager slots to the maximum parallelism
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3052&quot;&gt;FLINK-3052&lt;/a&gt;] -         Optimizer does not push properties out of bulk iterations
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2966&quot;&gt;FLINK-2966&lt;/a&gt;] -         Improve the way job duration is reported on web frontend.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2974&quot;&gt;FLINK-2974&lt;/a&gt;] -         Add periodic offset commit to Kafka Consumer if checkpointing is disabled
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3028&quot;&gt;FLINK-3028&lt;/a&gt;] -         Cannot cancel restarting job via web frontend
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3040&quot;&gt;FLINK-3040&lt;/a&gt;] -         Add docs describing how to configure State Backends
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3041&quot;&gt;FLINK-3041&lt;/a&gt;] -         Twitter Streaming Description section of Streaming Programming guide refers to an incorrect example &amp;#39;TwitterLocal&amp;#39;
&lt;/li&gt;
&lt;/ul&gt;

</description>
<pubDate>Fri, 27 Nov 2015 09:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2015/11/27/release-0.10.1.html</link>
<guid isPermaLink="true">/news/2015/11/27/release-0.10.1.html</guid>
</item>

<item>
<title>Announcing Apache Flink 0.10.0</title>
<description>&lt;p&gt;The Apache Flink community is pleased to announce the availability of the 0.10.0 release. The community put significant effort into improving and extending Apache Flink since the last release, focusing on data stream processing and operational features. About 80 contributors provided bug fixes, improvements, and new features, resolving more than 400 JIRA issues in total.&lt;/p&gt;

&lt;p&gt;For Flink 0.10.0, the focus of the community was to graduate the DataStream API from beta and to evolve Apache Flink into a production-ready stream data processor with a competitive feature set. These efforts resulted in support for event-time and out-of-order streams, exactly-once guarantees in the case of failures, a very flexible windowing mechanism, sophisticated operator state management, and a highly-available cluster operation mode. Flink 0.10.0 also brings a new monitoring dashboard with real-time system and job monitoring capabilities. Both batch and streaming modes of Flink benefit from the new high availability and improved monitoring features. Needless to say, Flink 0.10.0 also includes many more features, improvements, and bug fixes.&lt;/p&gt;

&lt;p&gt;We encourage everyone to &lt;a href=&quot;/downloads.html&quot;&gt;download the release&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-0.10/&quot;&gt;check out the documentation&lt;/a&gt;. Feedback through the Flink &lt;a href=&quot;/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; is, as always, very welcome!&lt;/p&gt;

&lt;h2 id=&quot;new-features&quot;&gt;New Features&lt;/h2&gt;

&lt;h3 id=&quot;event-time-stream-processing&quot;&gt;Event-time Stream Processing&lt;/h3&gt;

&lt;p&gt;Many stream processing applications consume data from sources that produce events with associated timestamps, such as sensor or user-interaction events. Very often, events have to be collected from several sources, so it is usually not guaranteed that events arrive at the stream processor in the exact order of their timestamps. Consequently, stream processors must take out-of-order elements into account in order to produce results which are correct and consistent with respect to the timestamps of the events. With release 0.10.0, Apache Flink supports event-time processing as well as ingestion-time and processing-time processing. See &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2674&quot;&gt;FLINK-2674&lt;/a&gt; for details.&lt;/p&gt;

&lt;h3 id=&quot;stateful-stream-processing&quot;&gt;Stateful Stream Processing&lt;/h3&gt;

&lt;p&gt;Operators that maintain and update state are a common pattern in many stream processing applications. Since streaming applications tend to run for a very long time, operator state can become very valuable and impossible to recompute. In order to enable fault-tolerance, operator state must be backed up to persistent storage in regular intervals. Flink 0.10.0 offers flexible interfaces to define, update, and query operator state and hooks to connect various state backends.&lt;/p&gt;

&lt;h3 id=&quot;highly-available-cluster-operations&quot;&gt;Highly-available Cluster Operations&lt;/h3&gt;

&lt;p&gt;Stream processing applications may be live for months. Therefore, a production-ready stream processor must be highly-available and continue to process data even in the face of failures. With release 0.10.0, Flink supports high availability modes for standalone cluster and &lt;a href=&quot;https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html&quot;&gt;YARN&lt;/a&gt; setups, eliminating any single point of failure. In this mode, Flink relies on &lt;a href=&quot;https://zookeeper.apache.org&quot;&gt;Apache ZooKeeper&lt;/a&gt; for leader election and for persisting small-sized metadata of running jobs. You can &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/jobmanager_high_availability.html&quot;&gt;check out the documentation&lt;/a&gt; to see how to enable high availability. See &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2287&quot;&gt;FLINK-2287&lt;/a&gt; for details.&lt;/p&gt;

&lt;h3 id=&quot;graduated-datastream-api&quot;&gt;Graduated DataStream API&lt;/h3&gt;

&lt;p&gt;The DataStream API was revised based on user feedback and with foresight for upcoming features and graduated from beta status to fully supported. The most obvious changes are related to the methods for stream partitioning and window operations. The new windowing system is based on the concepts of window assigners, triggers, and evictors, inspired by the &lt;a href=&quot;http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf&quot;&gt;Dataflow Model&lt;/a&gt;. The new API is fully described in the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html&quot;&gt;DataStream API documentation&lt;/a&gt;. This &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Migration+Guide%3A+0.9.x+to+0.10.x&quot;&gt;migration guide&lt;/a&gt; will help to port your Flink 0.9 DataStream programs to the revised API of Flink 0.10.0. See &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2674&quot;&gt;FLINK-2674&lt;/a&gt; and &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2877&quot;&gt;FLINK-2877&lt;/a&gt; for details.&lt;/p&gt;

&lt;h3 id=&quot;new-connectors-for-data-streams&quot;&gt;New Connectors for Data Streams&lt;/h3&gt;

&lt;p&gt;Apache Flink 0.10.0 features DataStream sources and sinks for many common data producers and stores. This includes an exactly-once rolling file sink which supports any file system, including HDFS, local FS, and S3. We also updated the &lt;a href=&quot;https://kafka.apache.org&quot;&gt;Apache Kafka&lt;/a&gt; producer to use the new producer API, and added connectors for &lt;a href=&quot;https://github.com/elastic/elasticsearch&quot;&gt;ElasticSearch&lt;/a&gt; and &lt;a href=&quot;https://nifi.apache.org&quot;&gt;Apache NiFi&lt;/a&gt;. More connectors for DataStream programs will be added by the community in the future. See the following JIRA issues for details: &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2583&quot;&gt;FLINK-2583&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2386&quot;&gt;FLINK-2386&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2372&quot;&gt;FLINK-2372&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2740&quot;&gt;FLINK-2740&lt;/a&gt;, and &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2558&quot;&gt;FLINK-2558&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;new-web-dashboard--real-time-monitoring&quot;&gt;New Web Dashboard &amp;amp; Real-time Monitoring&lt;/h3&gt;

&lt;p&gt;The 0.10.0 release features a newly designed and significantly improved monitoring dashboard for Apache Flink. The new dashboard visualizes the progress of running jobs and shows real-time statistics of processed data volumes and record counts. Moreover, it gives access to resource usage and JVM statistics of TaskManagers including JVM heap usage and garbage collection details. The following screenshot shows the job view of the new dashboard.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/new-dashboard-screenshot.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The web server that provides all monitoring statistics has been designed with a REST interface allowing other systems to also access the internal system metrics. See &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2357&quot;&gt;FLINK-2357&lt;/a&gt; for details.&lt;/p&gt;

&lt;h3 id=&quot;off-heap-managed-memory&quot;&gt;Off-heap Managed Memory&lt;/h3&gt;

&lt;p&gt;Flink’s internal operators (such as its sort algorithm and hash tables) write data to and read data from managed memory to achieve memory-safe operations and reduce garbage collection overhead. Until version 0.10.0, managed memory was allocated only from JVM heap memory. With this release, managed memory can also be allocated from off-heap memory. This will facilitate shorter TaskManager start-up times as well as reduce garbage collection pressure. See &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-0.10/setup/config.html#managed-memory&quot;&gt;the documentation&lt;/a&gt; to learn how to configure managed memory on off-heap memory. JIRA issue &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1320&quot;&gt;FLINK-1320&lt;/a&gt; contains further details.&lt;/p&gt;

&lt;h3 id=&quot;outer-joins&quot;&gt;Outer Joins&lt;/h3&gt;

&lt;p&gt;Outer joins have been one of the most frequently requested features for Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/programming_guide.html&quot;&gt;DataSet API&lt;/a&gt;. Although there was a workaround to implement outer joins as a CoGroup function, it had significant drawbacks, including added code complexity and not being fully memory-safe. With release 0.10.0, Flink adds native support for &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/dataset_transformations.html#outerjoin&quot;&gt;left, right, and full outer joins&lt;/a&gt; to the DataSet API. All outer joins are backed by a memory-safe operator implementation that leverages Flink’s managed memory. See &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-687&quot;&gt;FLINK-687&lt;/a&gt; and &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2107&quot;&gt;FLINK-2107&lt;/a&gt; for details.&lt;/p&gt;

&lt;h3 id=&quot;gelly-major-improvements-and-scala-api&quot;&gt;Gelly: Major Improvements and Scala API&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-0.10/libs/gelly_guide.html&quot;&gt;Gelly&lt;/a&gt; is Flink’s API and library for processing and analyzing large-scale graphs. Gelly was introduced with release 0.9.0 and has been very well received by users and contributors. Based on user feedback, Gelly has been improved since then. In addition, Flink 0.10.0 introduces a Scala API for Gelly. See &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2857&quot;&gt;FLINK-2857&lt;/a&gt; and &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1962&quot;&gt;FLINK-1962&lt;/a&gt; for details.&lt;/p&gt;

&lt;h2 id=&quot;more-improvements-and-fixes&quot;&gt;More Improvements and Fixes&lt;/h2&gt;

&lt;p&gt;The Flink community resolved more than 400 issues. The following list is a selection of new features and fixed bugs.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1851&quot;&gt;FLINK-1851&lt;/a&gt; Java Table API does not support Casting&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2152&quot;&gt;FLINK-2152&lt;/a&gt; Provide zipWithIndex utility in flink-contrib&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2158&quot;&gt;FLINK-2158&lt;/a&gt; NullPointerException in DateSerializer.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2240&quot;&gt;FLINK-2240&lt;/a&gt; Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2533&quot;&gt;FLINK-2533&lt;/a&gt; Gap based random sample optimization&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2555&quot;&gt;FLINK-2555&lt;/a&gt; Hadoop Input/Output Formats are unable to access secured HDFS clusters&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2565&quot;&gt;FLINK-2565&lt;/a&gt; Support primitive arrays as keys&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2582&quot;&gt;FLINK-2582&lt;/a&gt; Document how to build Flink with other Scala versions&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2584&quot;&gt;FLINK-2584&lt;/a&gt; ASM dependency is not shaded away&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2689&quot;&gt;FLINK-2689&lt;/a&gt; Reusing null object for joins with SolutionSet&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2703&quot;&gt;FLINK-2703&lt;/a&gt; Remove log4j classes from fat jar / document how to use Flink with logback&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2763&quot;&gt;FLINK-2763&lt;/a&gt; Bug in Hybrid Hash Join: Request to spill a partition with less than two buffers.&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2767&quot;&gt;FLINK-2767&lt;/a&gt; Add support Scala 2.11 to Scala shell&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2774&quot;&gt;FLINK-2774&lt;/a&gt; Import Java API classes automatically in Flink’s Scala shell&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2782&quot;&gt;FLINK-2782&lt;/a&gt; Remove deprecated features for 0.10&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2800&quot;&gt;FLINK-2800&lt;/a&gt; kryo serialization problem&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2834&quot;&gt;FLINK-2834&lt;/a&gt; Global round-robin for temporary directories&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2842&quot;&gt;FLINK-2842&lt;/a&gt; S3FileSystem is broken&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2874&quot;&gt;FLINK-2874&lt;/a&gt; Certain Avro generated getters/setters not recognized&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2895&quot;&gt;FLINK-2895&lt;/a&gt; Duplicate immutable object creation&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2964&quot;&gt;FLINK-2964&lt;/a&gt; MutableHashTable fails when spilling partitions without overflow segments&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;notice&quot;&gt;Notice&lt;/h2&gt;

&lt;p&gt;As previously announced, Flink 0.10.0 no longer supports Java 6. If you are still using Java 6, please consider upgrading to Java 8 (Java 7 ended its free support in April 2015).
Also note that some methods in the DataStream API had to be renamed as part of the API rework. For example the &lt;code&gt;groupBy&lt;/code&gt; method has been renamed to &lt;code&gt;keyBy&lt;/code&gt; and the windowing API changed. This &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Migration+Guide%3A+0.9.x+to+0.10.x&quot;&gt;migration guide&lt;/a&gt; will help to port your Flink 0.9 DataStream programs to the revised API of Flink 0.10.0.&lt;/p&gt;

&lt;h2 id=&quot;contributors&quot;&gt;Contributors&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Alexander Alexandrov&lt;/li&gt;
  &lt;li&gt;Marton Balassi&lt;/li&gt;
  &lt;li&gt;Enrique Bautista&lt;/li&gt;
  &lt;li&gt;Faye Beligianni&lt;/li&gt;
  &lt;li&gt;Bryan Bende&lt;/li&gt;
  &lt;li&gt;Ajay Bhat&lt;/li&gt;
  &lt;li&gt;Chris Brinkman&lt;/li&gt;
  &lt;li&gt;Dmitry Buzdin&lt;/li&gt;
  &lt;li&gt;Kun Cao&lt;/li&gt;
  &lt;li&gt;Paris Carbone&lt;/li&gt;
  &lt;li&gt;Ufuk Celebi&lt;/li&gt;
  &lt;li&gt;Shivani Chandna&lt;/li&gt;
  &lt;li&gt;Liang Chen&lt;/li&gt;
  &lt;li&gt;Felix Cheung&lt;/li&gt;
  &lt;li&gt;Hubert Czerpak&lt;/li&gt;
  &lt;li&gt;Vimal Das&lt;/li&gt;
  &lt;li&gt;Behrouz Derakhshan&lt;/li&gt;
  &lt;li&gt;Suminda Dharmasena&lt;/li&gt;
  &lt;li&gt;Stephan Ewen&lt;/li&gt;
  &lt;li&gt;Fengbin Fang&lt;/li&gt;
  &lt;li&gt;Gyula Fora&lt;/li&gt;
  &lt;li&gt;Lun Gao&lt;/li&gt;
  &lt;li&gt;Gabor Gevay&lt;/li&gt;
  &lt;li&gt;Piotr Godek&lt;/li&gt;
  &lt;li&gt;Sachin Goel&lt;/li&gt;
  &lt;li&gt;Anton Haglund&lt;/li&gt;
  &lt;li&gt;Gábor Hermann&lt;/li&gt;
  &lt;li&gt;Greg Hogan&lt;/li&gt;
  &lt;li&gt;Fabian Hueske&lt;/li&gt;
  &lt;li&gt;Martin Junghanns&lt;/li&gt;
  &lt;li&gt;Vasia Kalavri&lt;/li&gt;
  &lt;li&gt;Ulf Karlsson&lt;/li&gt;
  &lt;li&gt;Frederick F. Kautz&lt;/li&gt;
  &lt;li&gt;Samia Khalid&lt;/li&gt;
  &lt;li&gt;Johannes Kirschnick&lt;/li&gt;
  &lt;li&gt;Kostas Kloudas&lt;/li&gt;
  &lt;li&gt;Alexander Kolb&lt;/li&gt;
  &lt;li&gt;Johann Kovacs&lt;/li&gt;
  &lt;li&gt;Aljoscha Krettek&lt;/li&gt;
  &lt;li&gt;Sebastian Kruse&lt;/li&gt;
  &lt;li&gt;Andreas Kunft&lt;/li&gt;
  &lt;li&gt;Chengxiang Li&lt;/li&gt;
  &lt;li&gt;Chen Liang&lt;/li&gt;
  &lt;li&gt;Andra Lungu&lt;/li&gt;
  &lt;li&gt;Suneel Marthi&lt;/li&gt;
  &lt;li&gt;Tamara Mendt&lt;/li&gt;
  &lt;li&gt;Robert Metzger&lt;/li&gt;
  &lt;li&gt;Maximilian Michels&lt;/li&gt;
  &lt;li&gt;Chiwan Park&lt;/li&gt;
  &lt;li&gt;Sahitya Pavurala&lt;/li&gt;
  &lt;li&gt;Pietro Pinoli&lt;/li&gt;
  &lt;li&gt;Ricky Pogalz&lt;/li&gt;
  &lt;li&gt;Niraj Rai&lt;/li&gt;
  &lt;li&gt;Lokesh Rajaram&lt;/li&gt;
  &lt;li&gt;Johannes Reifferscheid&lt;/li&gt;
  &lt;li&gt;Till Rohrmann&lt;/li&gt;
  &lt;li&gt;Henry Saputra&lt;/li&gt;
  &lt;li&gt;Matthias Sax&lt;/li&gt;
  &lt;li&gt;Shiti Saxena&lt;/li&gt;
  &lt;li&gt;Chesnay Schepler&lt;/li&gt;
  &lt;li&gt;Peter Schrott&lt;/li&gt;
  &lt;li&gt;Saumitra Shahapure&lt;/li&gt;
  &lt;li&gt;Nikolaas Steenbergen&lt;/li&gt;
  &lt;li&gt;Thomas Sun&lt;/li&gt;
  &lt;li&gt;Peter Szabo&lt;/li&gt;
  &lt;li&gt;Viktor Taranenko&lt;/li&gt;
  &lt;li&gt;Kostas Tzoumas&lt;/li&gt;
  &lt;li&gt;Pieter-Jan Van Aeken&lt;/li&gt;
  &lt;li&gt;Theodore Vasiloudis&lt;/li&gt;
  &lt;li&gt;Timo Walther&lt;/li&gt;
  &lt;li&gt;Chengxuan Wang&lt;/li&gt;
  &lt;li&gt;Huang Wei&lt;/li&gt;
  &lt;li&gt;Dawid Wysakowicz&lt;/li&gt;
  &lt;li&gt;Rerngvit Yanggratoke&lt;/li&gt;
  &lt;li&gt;Nezih Yigitbasi&lt;/li&gt;
  &lt;li&gt;Ted Yu&lt;/li&gt;
  &lt;li&gt;Rucong Zhang&lt;/li&gt;
  &lt;li&gt;Vyacheslav Zholudev&lt;/li&gt;
  &lt;li&gt;Zoltán Zvara&lt;/li&gt;
&lt;/ul&gt;

</description>
<pubDate>Mon, 16 Nov 2015 09:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2015/11/16/release-0.10.0.html</link>
<guid isPermaLink="true">/news/2015/11/16/release-0.10.0.html</guid>
</item>

<item>
<title>Off-heap Memory in Apache Flink and the curious JIT compiler</title>
<description>&lt;p&gt;Running data-intensive code in the JVM and making it well-behaved is tricky. Systems that put billions of data objects naively onto the JVM heap face unpredictable OutOfMemoryErrors and Garbage Collection stalls. Of course, you still want to keep your data in memory as much as possible, for speed and responsiveness of the processing applications. In that context, “off-heap” has become almost something like a magic word to solve these problems.&lt;/p&gt;

&lt;p&gt;In this blog post, we will look at how Flink exploits off-heap memory. The feature is part of the upcoming release, but you can try it out with the latest nightly builds. We will also give a few interesting insights into the behavior of Java’s JIT compiler for highly optimized methods and loops.&lt;/p&gt;

&lt;h2 id=&quot;recap-memory-management-in-flink&quot;&gt;Recap: Memory Management in Flink&lt;/h2&gt;

&lt;p&gt;To understand Flink’s approach to off-heap memory, we need to recap Flink’s approach to custom managed memory. We have written an &lt;a href=&quot;/news/2015/05/11/Juggling-with-Bits-and-Bytes.html&quot;&gt;earlier blog post about how Flink manages JVM memory itself&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As a summary, the core part is that Flink implements its algorithms not against Java objects, arrays, or lists, but actually against a data structure similar to &lt;code&gt;java.nio.ByteBuffer&lt;/code&gt;. Flink uses its own specialized version, called &lt;a href=&quot;https://github.com/apache/flink/blob/release-0.9.1-rc1/flink-core/src/main/java/org/apache/flink/core/memory/MemorySegment.java&quot;&gt;&lt;code&gt;MemorySegment&lt;/code&gt;&lt;/a&gt; on which algorithms put and get at specific positions ints, longs, byte arrays, etc, and compare and copy memory. The memory segments are held and distributed by a central component (called &lt;code&gt;MemoryManager&lt;/code&gt;) from which algorithms request segments according to their calculated memory budgets.&lt;/p&gt;
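
&lt;p&gt;To make the idea of working against positions in a block of memory more concrete, here is a minimal sketch. The class &lt;code&gt;SegmentSketch&lt;/code&gt; is hypothetical and is not Flink’s &lt;code&gt;MemorySegment&lt;/code&gt;; it only wraps a &lt;code&gt;java.nio.ByteBuffer&lt;/code&gt; and exposes absolute-position accessors in the spirit described above.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.nio.ByteBuffer;

public class SegmentSketch {

  // Hypothetical stand-in for a memory segment: a fixed-size block of
  // memory that is addressed by absolute byte positions.
  private final ByteBuffer buffer;

  public SegmentSketch(int size) {
    this.buffer = ByteBuffer.allocate(size);
  }

  // Absolute writes and reads; they do not move the buffer position.
  public void putInt(int pos, int value) {
    buffer.putInt(pos, value);
  }

  public int getInt(int pos) {
    return buffer.getInt(pos);
  }

  public void putLong(int pos, long value) {
    buffer.putLong(pos, value);
  }

  public long getLong(int pos) {
    return buffer.getLong(pos);
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;An algorithm written this way decides itself where each value lives inside the segment, for example an int at position 0 followed by a long at position 8, instead of creating one Java object per record.&lt;/p&gt;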

&lt;p&gt;Don’t believe that this can be fast? Have a look at the &lt;a href=&quot;/news/2015/05/11/Juggling-with-Bits-and-Bytes.html&quot;&gt;benchmarks in the earlier blogpost&lt;/a&gt;, which show that it is actually often much faster than working on objects, due to better control over data layout (cache efficiency, data size), and reducing the pressure on Java’s Garbage Collector.&lt;/p&gt;

&lt;p&gt;This form of memory management has been in Flink for a long time. Anecdotally, the first public demo of Flink’s predecessor project &lt;em&gt;Stratosphere&lt;/em&gt;, at the VLDB conference in 2010, was running its programs with custom managed memory (although I believe few attendees were aware of that).&lt;/p&gt;

&lt;h2 id=&quot;why-actually-bother-with-off-heap-memory&quot;&gt;Why actually bother with off-heap memory?&lt;/h2&gt;

&lt;p&gt;Given that Flink has a sophisticated level of managing on-heap memory, why do we even bother with off-heap memory? It is true that &lt;em&gt;“out of memory”&lt;/em&gt; has been much less of a problem for Flink because of its heap memory management techniques. Nonetheless, there are a few good reasons to offer the possibility to move Flink’s managed memory out of the JVM heap:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Very large JVMs (100s of GBytes heap memory) tend to be tricky. It takes a long time to start them (allocate and initialize heap) and garbage collection stalls can be huge (minutes). While newer incremental garbage collectors (like G1) mitigate this problem to some extent, an even better solution is to just make the heap much smaller and allocate Flink’s managed memory chunks outside the heap.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;I/O and network efficiency: In many cases, we write MemorySegments to disk (spilling) or to the network (data transfer). Off-heap memory can be written/transferred with zero copies, while heap memory always incurs an additional memory copy.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Off-heap memory can actually be owned by other processes. That way, cached data survives process crashes (due to user code exceptions) and can be used for recovery. Flink does not exploit that, yet, but it is interesting future work.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The opposite question is also valid. Why should Flink ever not use off-heap memory?&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;On-heap is easier and interplays better with tools. Some container environments and monitoring tools get confused when the monitored heap size does not remotely reflect the amount of memory used by the process.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Short-lived memory segments are cheaper on the heap. Flink sometimes needs to allocate some short-lived buffers, which is cheaper on the heap than off-heap.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Some operations are actually a bit faster on heap memory (or the JIT compiler understands them better).&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;the-off-heap-memory-implementation&quot;&gt;The off-heap Memory Implementation&lt;/h2&gt;

&lt;p&gt;Given that all memory intensive internal algorithms are already implemented against the &lt;code&gt;MemorySegment&lt;/code&gt;, our implementation to switch to off-heap memory is actually trivial. You can compare it to replacing all &lt;code&gt;ByteBuffer.allocate(numBytes)&lt;/code&gt; calls with &lt;code&gt;ByteBuffer.allocateDirect(numBytes)&lt;/code&gt;. In Flink’s case it meant that we made the &lt;code&gt;MemorySegment&lt;/code&gt; abstract and added the &lt;code&gt;HeapMemorySegment&lt;/code&gt; and &lt;code&gt;OffHeapMemorySegment&lt;/code&gt; subclasses. The &lt;code&gt;OffHeapMemorySegment&lt;/code&gt; takes the off-heap memory pointer from a &lt;code&gt;java.nio.DirectByteBuffer&lt;/code&gt; and implements its specialized access methods using &lt;code&gt;sun.misc.Unsafe&lt;/code&gt;. We also made a few adjustments to the startup scripts and the deployment code to make sure that the JVM is permitted enough off-heap memory (direct memory, &lt;em&gt;-XX:MaxDirectMemorySize&lt;/em&gt;).&lt;/p&gt;
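
&lt;p&gt;The &lt;code&gt;ByteBuffer&lt;/code&gt; comparison can be illustrated with a few lines of plain Java. This is only a sketch of the analogy; the class name &lt;code&gt;BufferAllocation&lt;/code&gt; and its &lt;code&gt;allocate&lt;/code&gt; helper are made up for illustration and do not appear in Flink.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.nio.ByteBuffer;

public class BufferAllocation {

  // Allocates a buffer either on the JVM heap or in direct (off-heap) memory.
  public static ByteBuffer allocate(int numBytes, boolean offHeap) {
    return offHeap
        ? ByteBuffer.allocateDirect(numBytes)
        : ByteBuffer.allocate(numBytes);
  }

  public static void main(String[] args) {
    ByteBuffer heap = allocate(1024, false);
    ByteBuffer direct = allocate(1024, true);

    // Both buffers are used through the same access methods.
    heap.putLong(0, 1L);
    direct.putLong(0, 1L);

    System.out.println(heap.isDirect());   // prints false
    System.out.println(direct.isDirect()); // prints true
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The calling code does not change between the two cases; only the allocation site decides where the bytes live. The amount of direct memory the JVM may hand out is bounded by &lt;em&gt;-XX:MaxDirectMemorySize&lt;/em&gt;.&lt;/p&gt;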

&lt;p&gt;In practice we had to go one step further, to make the implementation perform well. While the &lt;code&gt;ByteBuffer&lt;/code&gt; is used in I/O code paths to compose headers and move bulk memory into place, the MemorySegment is part of the innermost loops of many algorithms (sorting, hash tables, …). That means that the access methods have to be as fast as possible.&lt;/p&gt;

&lt;h2 id=&quot;understanding-the-jit-and-tuning-the-implementation&quot;&gt;Understanding the JIT and tuning the implementation&lt;/h2&gt;

&lt;p&gt;Before our change, the &lt;code&gt;MemorySegment&lt;/code&gt; was a standalone, &lt;em&gt;final&lt;/em&gt; class (it had no subclasses). Via &lt;em&gt;Class Hierarchy Analysis (CHA)&lt;/em&gt;, the JIT compiler was able to determine that all of the accessor method calls go to one specific implementation. That way, all method calls can be perfectly de-virtualized and inlined, which is essential to performance, and the basis for all further optimizations (like vectorization of the calling loop).&lt;/p&gt;

&lt;p&gt;With two different memory segments loaded at the same time, the JIT compiler cannot perform the same level of optimization any more, which results in a noticeable difference in performance: A slowdown of about 2.7 x in the following example:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;Writing 100000 x 32768 bytes to 32768 bytes segment:

HeapMemorySegment    (standalone) : 1,441 msecs
OffHeapMemorySegment (standalone) : 1,628 msecs
HeapMemorySegment    (subclass)   : 3,841 msecs
OffHeapMemorySegment (subclass)   : 3,847 msecs
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;To get back to the original performance, we explored two approaches:&lt;/p&gt;

&lt;h3 id=&quot;approach-1-make-sure-that-only-one-memory-segment-implementation-is-ever-loaded&quot;&gt;Approach 1: Make sure that only one memory segment implementation is ever loaded.&lt;/h3&gt;

&lt;p&gt;We re-structured the code a bit to make sure that all places that produce long-lived and short-lived memory segments instantiate the same MemorySegment subclass (Heap- or Off-Heap segment). Using factories rather than directly instantiating the memory segment classes, this was straightforward.&lt;/p&gt;

&lt;p&gt;Experiments (see appendix) showed that the JIT compiler properly detects this (via hierarchy analysis) and that it can perform the same level of aggressive optimization as before, when there was only one &lt;code&gt;MemorySegment&lt;/code&gt; class.&lt;/p&gt;
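
&lt;p&gt;The factory idea can be sketched as follows. All names here (&lt;code&gt;SegmentFactorySketch&lt;/code&gt;, &lt;code&gt;Segment&lt;/code&gt;, &lt;code&gt;HeapSegment&lt;/code&gt;, &lt;code&gt;DirectSegment&lt;/code&gt;) are hypothetical and heavily simplified; this is not Flink’s actual code. The point is only that a single configuration switch decides which subclass every caller receives.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.nio.ByteBuffer;

public class SegmentFactorySketch {

  // Simplified abstract segment with two byte-level accessors.
  public abstract static class Segment {
    public abstract byte get(int pos);
    public abstract void put(int pos, byte value);
  }

  // Heap variant, backed by a byte array.
  public static final class HeapSegment extends Segment {
    private final byte[] memory;

    HeapSegment(int size) {
      this.memory = new byte[size];
    }

    @Override
    public byte get(int pos) {
      return memory[pos];
    }

    @Override
    public void put(int pos, byte value) {
      memory[pos] = value;
    }
  }

  // Off-heap variant, backed by a direct ByteBuffer.
  public static final class DirectSegment extends Segment {
    private final ByteBuffer memory;

    DirectSegment(int size) {
      this.memory = ByteBuffer.allocateDirect(size);
    }

    @Override
    public byte get(int pos) {
      return memory.get(pos);
    }

    @Override
    public void put(int pos, byte value) {
      memory.put(pos, value);
    }
  }

  // Chosen once; all segments produced by this factory share one subclass.
  private final boolean offHeap;

  public SegmentFactorySketch(boolean offHeap) {
    this.offHeap = offHeap;
  }

  public Segment allocate(int size) {
    return offHeap ? new DirectSegment(size) : new HeapSegment(size);
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If every allocation in a process goes through one such factory, the subclass that is never instantiated is, under the JVM’s lazy class loading, never loaded, so calls through the abstract type still have a single possible target.&lt;/p&gt;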

&lt;h3 id=&quot;approach-2-write-one-segment-that-handles-both-heap-and-off-heap-memory&quot;&gt;Approach 2: Write one segment that handles both heap and off-heap memory&lt;/h3&gt;

&lt;p&gt;We created a class &lt;code&gt;HybridMemorySegment&lt;/code&gt; which handles transparently both heap- and off-heap memory. It can be initialized either with a byte array (heap memory), or with a pointer to a memory region outside the heap (off-heap memory).&lt;/p&gt;

&lt;p&gt;Fortunately, there is a nice trick to do this without introducing code branches and specialized handling of the two different memory types. The trick is based on the way that the &lt;code&gt;sun.misc.Unsafe&lt;/code&gt; methods interpret object references. To illustrate this, we take the method that gets a long integer from a memory position:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;sun.misc.Unsafe.getLong(Object reference, long offset)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The method accepts an object reference, takes its memory address, and adds the offset to obtain a pointer. It then fetches the eight bytes at the address pointed to and interprets them as a long integer. Since the method accepts &lt;em&gt;null&lt;/em&gt; as the reference (and interprets it as &lt;em&gt;zero&lt;/em&gt;), one can write a method that fetches a long integer seamlessly from heap and off-heap memory as follows:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;HybridMemorySegment&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

  &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;byte&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heapMemory&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;  &lt;span class=&quot;c1&quot;&gt;// non-null in heap case, null in off-heap case&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;address&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;       &lt;span class=&quot;c1&quot;&gt;// may be absolute, or relative to byte[]&lt;/span&gt;


  &lt;span class=&quot;c1&quot;&gt;// method of interest&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pos&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;UNSAFE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getLong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;heapMemory&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;address&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pos&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;


  &lt;span class=&quot;c1&quot;&gt;// initialize for heap memory&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;HybridMemorySegment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;byte&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heapMemory&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;heapMemory&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;heapMemory&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;address&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;UNSAFE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;arrayBaseOffset&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;byte&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[].&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
  
  &lt;span class=&quot;c1&quot;&gt;// initialize for off-heap memory&lt;/span&gt;
  &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;HybridMemorySegment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offheapPointer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;heapMemory&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;address&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offheapPointer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;To check whether both cases (heap and off-heap) really result in the same code paths (no hidden branches inside the &lt;code&gt;Unsafe.getLong(Object, long)&lt;/code&gt; method) one can check out the C++ source code of &lt;code&gt;sun.misc.Unsafe&lt;/code&gt;, available here: &lt;a href=&quot;http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/prims/unsafe.cpp&quot;&gt;http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/prims/unsafe.cpp&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of particular interest is the macro in line 155, which is the base of all GET methods. Tracing the function calls (many are no-ops), one can see that both variants of Unsafe’s &lt;code&gt;getLong()&lt;/code&gt; result in the same code:
Either &lt;code&gt;0 + absolutePointer&lt;/code&gt; or &lt;code&gt;objectRefAddress + offset&lt;/code&gt;.&lt;/p&gt;
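
&lt;p&gt;The two addressing modes can also be tried out directly. The following is a small, self-contained sketch rather than Flink code: the class &lt;code&gt;UnsafeHybridDemo&lt;/code&gt; is hypothetical, and obtaining &lt;code&gt;sun.misc.Unsafe&lt;/code&gt; through the private &lt;code&gt;theUnsafe&lt;/code&gt; field is an assumption that holds on common HotSpot-based JDKs. Newer JDK releases deprecate these memory-access methods and may print warnings.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class UnsafeHybridDemo {

  // Obtains the Unsafe singleton reflectively (assumes the private
  // static field is named theUnsafe, as on HotSpot-based JDKs).
  static Unsafe loadUnsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField(&quot;theUnsafe&quot;);
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(&quot;sun.misc.Unsafe is not accessible&quot;, e);
    }
  }

  public static void main(String[] args) {
    Unsafe unsafe = loadUnsafe();

    // Heap case: the reference is the byte[], and the offset is
    // relative to the start of the array object.
    byte[] heap = new byte[16];
    long heapOffset = unsafe.arrayBaseOffset(byte[].class);
    unsafe.putLong(heap, heapOffset, 42L);

    // Off-heap case: the reference is null, and the offset is an
    // absolute native address.
    long address = unsafe.allocateMemory(16);
    try {
      unsafe.putLong(null, address, 42L);
      System.out.println(unsafe.getLong(heap, heapOffset));
      System.out.println(unsafe.getLong(null, address));
    } finally {
      unsafe.freeMemory(address);
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Both reads go through the same &lt;code&gt;getLong(Object, long)&lt;/code&gt; call; only the pair of reference and offset differs between the heap and the off-heap case.&lt;/p&gt;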

&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;

&lt;p&gt;We ended up choosing a combination of both techniques:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;For off-heap memory, we use the &lt;code&gt;HybridMemorySegment&lt;/code&gt; from approach (2), which can represent both heap and off-heap memory. That way, the same class represents the long-lived off-heap memory as well as the short-lived temporary buffers allocated (or wrapped) on the heap.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;We follow approach (1) and use factories to make sure that only one segment implementation is ever loaded, which gives peak performance. We can exploit the performance benefits of the &lt;code&gt;HeapMemorySegment&lt;/code&gt; on individual byte operations, and we have a mechanism in place to add further implementations of &lt;code&gt;MemorySegments&lt;/code&gt; for the case that Oracle really removes &lt;code&gt;sun.misc.Unsafe&lt;/code&gt; in future Java versions.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
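
&lt;p&gt;As a rough illustration of the factory idea, the following minimal sketch routes every allocation through a single entry point that is configured once at startup, so that only one segment subclass is ever instantiated in the process. All names in it (&lt;code&gt;SegmentFactorySketch&lt;/code&gt;, &lt;code&gt;Segment&lt;/code&gt;, &lt;code&gt;HeapSegment&lt;/code&gt;, &lt;code&gt;HybridSegment&lt;/code&gt;, &lt;code&gt;initialize&lt;/code&gt;, &lt;code&gt;allocate&lt;/code&gt;) are hypothetical and are not Flink’s actual classes, and the hybrid variant is approximated with a direct &lt;code&gt;ByteBuffer&lt;/code&gt; instead of &lt;code&gt;sun.misc.Unsafe&lt;/code&gt;:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.nio.ByteBuffer;

public class SegmentFactorySketch {

    // Hypothetical stand-in for the abstract segment type.
    public abstract static class Segment {
        public abstract byte get(int index);
        public abstract void put(int index, byte value);
    }

    // Heap variant: plain byte[] access.
    public static final class HeapSegment extends Segment {
        private final byte[] memory;
        HeapSegment(int size) { this.memory = new byte[size]; }
        @Override public byte get(int index) { return memory[index]; }
        @Override public void put(int index, byte value) { memory[index] = value; }
    }

    // Stand-in for the hybrid variant, backed here by a direct buffer (assumption).
    public static final class HybridSegment extends Segment {
        private final ByteBuffer memory;
        HybridSegment(int size) { this.memory = ByteBuffer.allocateDirect(size); }
        @Override public byte get(int index) { return memory.get(index); }
        @Override public void put(int index, byte value) { memory.put(index, value); }
    }

    // Set exactly once; afterwards every allocation yields the same subclass.
    private static Boolean offHeap;

    public static synchronized void initialize(boolean useOffHeap) {
        if (offHeap != null) {
            // Switching later would bring a second subclass into play.
            throw new IllegalStateException();
        }
        offHeap = useOffHeap;
    }

    public static synchronized Segment allocate(int size) {
        if (offHeap == null) {
            throw new IllegalStateException();
        }
        return offHeap ? new HybridSegment(size) : new HeapSegment(size);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In this sketch, a caller would invoke &lt;code&gt;initialize(false)&lt;/code&gt; once and then obtain all segments via &lt;code&gt;allocate(size)&lt;/code&gt;; a second call to &lt;code&gt;initialize&lt;/code&gt; is rejected so that the choice cannot change while the JVM is running.&lt;/p&gt;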

&lt;p&gt;The final code can be found in the Flink repository, under &lt;a href=&quot;https://github.com/apache/flink/tree/master/flink-core/src/main/java/org/apache/flink/core/memory&quot;&gt;https://github.com/apache/flink/tree/master/flink-core/src/main/java/org/apache/flink/core/memory&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Detailed micro benchmarks are in the appendix.  A summary of the findings is as follows:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;The &lt;code&gt;HybridMemorySegment&lt;/code&gt; performs equally well in heap and off-heap memory, as is to be expected (the code paths are the same)&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The &lt;code&gt;HeapMemorySegment&lt;/code&gt; is quite a bit faster in reading individual bytes, not so much at writing them. Access to a &lt;em&gt;byte[]&lt;/em&gt; is after all a bit cheaper than an invocation of a &lt;code&gt;sun.misc.Unsafe&lt;/code&gt; method, even when JIT-ed.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The abstract class &lt;code&gt;MemorySegment&lt;/code&gt; (with its subclasses &lt;code&gt;HeapMemorySegment&lt;/code&gt; and &lt;code&gt;HybridMemorySegment&lt;/code&gt;) performs as well as any specialized non-abstract class, as long as only one subclass is loaded. When both are loaded, performance may suffer by a factor of 2.7 on certain operations.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;How badly the performance degrades when both MemorySegment subclasses are loaded seems to depend heavily on the order in which the subclasses are loaded and operated on. Sometimes performance is affected more than at other times. This seems to be an artifact of the JIT’s code profiling and of how heavily it performs optimistic specialization towards certain subclasses.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There is still a bit of mystery left, specifically why code is sometimes faster when it performs more checks (has more instructions and an additional branch). Even though the branch is perfectly predictable, this seems counter-intuitive. The only explanation that we could come up with is that the branch optimizations (such as optimistic elimination) result in code that does better register allocation (for whatever reason, maybe the intermediate instructions just fit the allocation algorithm better).&lt;/p&gt;

&lt;h2 id=&quot;tldr&quot;&gt;tl;dr&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Off-heap memory in Flink complements the already very fast on-heap memory management. It improves the scalability to very large heap sizes and reduces memory copies for network and disk I/O.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Flink’s existing memory management infrastructure made the addition of off-heap memory simple. Off-heap memory is not only used for caching data; Flink can actually sort data off-heap and build hash tables off-heap.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;We play a few nice tricks in the implementation to make sure the code is as friendly as possible to the JIT compiler and processor, so that managed memory accesses are as fast as possible.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Understanding the JVM’s JIT compiler is tough - one needs a lot of (randomized) micro benchmarking to examine its behavior.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;hr /&gt;

&lt;h2 id=&quot;appendix-detailed-micro-benchmarks&quot;&gt;Appendix: Detailed Micro Benchmarks&lt;/h2&gt;

&lt;p&gt;These microbenchmarks test the performance of the different memory segment implementations on various operations.&lt;/p&gt;

&lt;p&gt;Each experiment tests the different implementations multiple times in different orders, to balance the advantage/disadvantage of the JIT compiler specializing towards certain code paths. All experiments were run 5x, discarding the fastest and slowest run, and then averaged. This compensates for the delay before the JIT kicks in.&lt;/p&gt;
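
&lt;p&gt;The aggregation over the runs (drop the fastest and the slowest, average the rest) can be sketched as follows. This is only an illustration, assuming each run yields a single wall-clock time in milliseconds; the class and method names are hypothetical, and the timings in &lt;code&gt;main&lt;/code&gt; are made up for the example rather than taken from the measurements below.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.Arrays;

public class TrimmedMean {

    // Averages the run times after dropping the fastest and the slowest run.
    // Expects at least three measurements.
    public static double trimmedMean(long[] runMillis) {
        long[] sorted = runMillis.clone();
        Arrays.sort(sorted);
        // Skip index 0 (fastest) and the last index (slowest).
        return Arrays.stream(sorted, 1, sorted.length - 1)
                .average()
                .getAsDouble();
    }

    public static void main(String[] args) {
        // Hypothetical timings of five runs, in milliseconds.
        long[] runs = {1450, 1441, 1439, 1502, 1443};
        System.out.println(trimmedMean(runs));
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Dropping the slowest run removes the measurement most affected by JIT warm-up, while dropping the fastest one keeps the average from being skewed by a single unusually lucky run.&lt;/p&gt;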

&lt;p&gt;My setup:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Oracle Java 8 (1.8.0_25)&lt;/li&gt;
  &lt;li&gt;4 GBytes JVM heap (the experiments need 1.4 GBytes Heap + 1 GBytes direct memory)&lt;/li&gt;
  &lt;li&gt;Intel Core i7-4700MQ CPU, 2.40GHz (4 cores, 8 hardware contexts)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The tested implementations are&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Type&lt;/th&gt;
      &lt;th&gt;Description&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt; &lt;em&gt;(exclusive)&lt;/em&gt;&lt;/td&gt;
      &lt;td&gt;The case where it is the only loaded MemorySegment subclass.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt; &lt;em&gt;(mixed)&lt;/em&gt;&lt;/td&gt;
      &lt;td&gt;The case where both the HeapMemorySegment and the HybridMemorySegment are loaded.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt; &lt;em&gt;(heap-exclusive)&lt;/em&gt;&lt;/td&gt;
      &lt;td&gt;Backed by heap memory, and the case where it is the only loaded MemorySegment class.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt; &lt;em&gt;(heap-mixed)&lt;/em&gt;&lt;/td&gt;
      &lt;td&gt;Backed by heap memory, and the case where both the HeapMemorySegment and the HybridMemorySegment are loaded.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt; &lt;em&gt;(off-heap-exclusive)&lt;/em&gt;&lt;/td&gt;
      &lt;td&gt;Backed by off-heap memory, and the case where it is the only loaded MemorySegment class.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt; &lt;em&gt;(off-heap-mixed)&lt;/em&gt;&lt;/td&gt;
      &lt;td&gt;Backed by off-heap memory, and the case where both the HeapMemorySegment and the HybridMemorySegment are loaded.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;Has no class hierarchy and no virtual methods at all.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt; &lt;em&gt;(heap)&lt;/em&gt;&lt;/td&gt;
      &lt;td&gt;Has no class hierarchy and no virtual methods at all, backed by heap memory.&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt; &lt;em&gt;(off-heap)&lt;/em&gt;&lt;/td&gt;
      &lt;td&gt;Has no class hierarchy and no virtual methods at all, backed by off-heap memory.&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;div class=&quot;small&quot;&gt;
&lt;h3 id=&quot;byte-accesses&quot;&gt;Byte accesses&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Writing 100000 x 32768 bytes to 32768 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, exclusive&lt;/td&gt;
      &lt;td&gt;1,441 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;3,841 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, exclusive&lt;/td&gt;
      &lt;td&gt;1,626 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, exclusive&lt;/td&gt;
      &lt;td&gt;1,628 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;3,848 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;3,847 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;1,442 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;1,623 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;1,620 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Reading 100000 x 32768 bytes from 32768 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, exclusive&lt;/td&gt;
      &lt;td&gt;1,326 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;1,378 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, exclusive&lt;/td&gt;
      &lt;td&gt;2,029 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, exclusive&lt;/td&gt;
      &lt;td&gt;2,030 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;2,047 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;2,049 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;1,331 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;2,030 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;2,030 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Writing 10 x 1073741824 bytes to 1073741824 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, exclusive&lt;/td&gt;
      &lt;td&gt;5,602 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;12,570 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, exclusive&lt;/td&gt;
      &lt;td&gt;5,691 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, exclusive&lt;/td&gt;
      &lt;td&gt;5,691 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;12,566 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;12,556 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;5,599 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;5,687 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;5,681 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Reading 10 x 1073741824 bytes from 1073741824 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, exclusive&lt;/td&gt;
      &lt;td&gt;4,243 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;4,265 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, exclusive&lt;/td&gt;
      &lt;td&gt;6,730 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, exclusive&lt;/td&gt;
      &lt;td&gt;6,725 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;6,933 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;6,926 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;4,247 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;6,919 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;6,916 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;byte-array-accesses&quot;&gt;Byte Array accesses&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Writing 100000 x 32 byte[1024] to 32768 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;164 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;163 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;163 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;165 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;182 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;176 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Reading 100000 x 32 byte[1024] from 32768 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;157 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;155 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;162 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;161 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;175 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;179 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Writing 10 x 1048576 byte[1024] to 1073741824 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;1,164 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;1,173 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;1,157 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;1,169 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;1,174 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;1,166 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Reading 10 x 1048576 byte[1024] from 1073741824 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;854 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;853 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;854 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;857 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;896 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;887 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;long-integer-accesses&quot;&gt;Long integer accesses&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;(note that the heap and off-heap segments use the same or comparable code for this)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Writing 100000 x 4096 longs to 32768 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;221 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;222 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;221 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;194 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;220 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;221 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Reading 100000 x 4096 longs from 32768 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;233 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;232 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;231 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;232 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;232 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;233 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Writing 10 x 134217728 longs to 1073741824 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;1,120 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;1,120 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;1,115 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;1,148 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;1,116 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;1,113 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Reading 10 x 134217728 longs from 1073741824 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;1,097 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;1,099 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;1,093 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;917 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;1,105 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;1,097 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;h3 id=&quot;integer-accesses&quot;&gt;Integer accesses&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;(note that the heap and off-heap segments use the same or comparable code for this)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Writing 100000 x 8192 ints to 32768 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;578 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;580 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;576 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;624 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;576 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;578 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Reading 100000 x 8192 ints from 32768 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;464 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;464 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;465 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;463 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;464 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;463 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Writing 10 x 268435456 ints to 1073741824 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;2,187 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;2,161 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;2,152 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;2,770 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;2,161 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;2,157 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;&lt;strong&gt;Reading 10 x 268435456 ints from 1073741824 bytes segment&lt;/strong&gt;&lt;/p&gt;

&lt;table class=&quot;table&quot;&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Segment&lt;/th&gt;
      &lt;th&gt;Time&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HeapMemorySegment&lt;/code&gt;, mixed&lt;/td&gt;
      &lt;td&gt;1,782 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, heap, mixed&lt;/td&gt;
      &lt;td&gt;1,783 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;HybridMemorySegment&lt;/code&gt;, off-heap, mixed&lt;/td&gt;
      &lt;td&gt;1,774 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHeapSegment&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;1,501 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, heap&lt;/td&gt;
      &lt;td&gt;1,774 msecs&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code&gt;PureHybridSegment&lt;/code&gt;, off-heap&lt;/td&gt;
      &lt;td&gt;1,771 msecs&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;

</description>
<pubDate>Wed, 16 Sep 2015 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2015/09/16/off-heap-memory.html</link>
<guid isPermaLink="true">/news/2015/09/16/off-heap-memory.html</guid>
</item>

<item>
<title>Announcing Flink Forward 2015</title>
<description>&lt;p&gt;&lt;a href=&quot;http://2015.flink-forward.org/&quot;&gt;Flink Forward 2015&lt;/a&gt; is the first
conference with Flink at its center that aims to bring together the
Apache Flink community in a single place. The organizers are holding
the conference on October 12 and 13 in Berlin, the place where
Apache Flink started.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/flink-forward-banner.png&quot; style=&quot;width:80%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The &lt;a href=&quot;http://2015.flink-forward.org/?post_type=day&quot;&gt;conference program&lt;/a&gt; has
been announced by the organizers and a program committee consisting of
Flink PMC members. The agenda contains talks from industry and
academia as well as a dedicated session on hands-on Flink training.&lt;/p&gt;

&lt;p&gt;Some highlights of the talks include&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;A keynote by &lt;a href=&quot;http://2015.flink-forward.org/?speaker=william-vambenepe&quot;&gt;William
Vambenepe&lt;/a&gt;,
lead of the product management team responsible for Big Data
services on Google Cloud Platform (BigQuery, Dataflow, etc…) on
data streaming, Google Cloud Dataflow, and Apache Flink.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Talks by several practitioners on how they are putting Flink to work
in their projects, including ResearchGate, Bouygues Telecom,
Amadeus, Telefonica, Capital One, Ericsson, and Otto Group.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Talks on how open source projects, including Apache Mahout, Apache
SAMOA (incubating), Apache Zeppelin (incubating), Apache BigTop, and
Apache Storm integrate with Apache Flink.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Talks by Flink committers on several aspects of the system, such as
fault tolerance, the internal runtime architecture, and others.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Check out the &lt;a href=&quot;http://2015.flink-forward.org/?post_type=day&quot;&gt;schedule&lt;/a&gt; and
register for the conference.&lt;/p&gt;

</description>
<pubDate>Thu, 03 Sep 2015 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2015/09/03/flink-forward.html</link>
<guid isPermaLink="true">/news/2015/09/03/flink-forward.html</guid>
</item>

<item>
<title>Apache Flink 0.9.1 available</title>
<description>&lt;p&gt;The Flink community is happy to announce that Flink 0.9.1 is now available.&lt;/p&gt;

&lt;p&gt;0.9.1 is a maintenance release, which includes many minor fixes across
several parts of the system. We recommend that all Flink users upgrade to this
latest stable version.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;/downloads.html&quot;&gt;Download the release&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10&quot;&gt;check out the
documentation&lt;/a&gt;. Feedback through the Flink mailing lists
is, as always, very welcome!&lt;/p&gt;

&lt;p&gt;The following &lt;a href=&quot;https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.9.1&quot;&gt;issues were fixed&lt;/a&gt;
for this release:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1916&quot;&gt;FLINK-1916&lt;/a&gt; EOFException when running delta-iteration job&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2089&quot;&gt;FLINK-2089&lt;/a&gt; “Buffer recycled” IllegalStateException during cancelling&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2189&quot;&gt;FLINK-2189&lt;/a&gt; NullPointerException in MutableHashTable&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2205&quot;&gt;FLINK-2205&lt;/a&gt; Confusing entries in JM Webfrontend Job Configuration section&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2229&quot;&gt;FLINK-2229&lt;/a&gt; Data sets involving non-primitive arrays cannot be unioned&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2238&quot;&gt;FLINK-2238&lt;/a&gt; Scala ExecutionEnvironment.fromCollection does not work with Sets&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2248&quot;&gt;FLINK-2248&lt;/a&gt; Allow disabling of stdout logging output&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2257&quot;&gt;FLINK-2257&lt;/a&gt; Open and close of RichWindowFunctions is not called&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2262&quot;&gt;FLINK-2262&lt;/a&gt; ParameterTool API misnamed function&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2280&quot;&gt;FLINK-2280&lt;/a&gt; GenericTypeComparator.compare() does not respect ascending flag&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2285&quot;&gt;FLINK-2285&lt;/a&gt; Active policy emits elements of the last window twice&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2286&quot;&gt;FLINK-2286&lt;/a&gt; Window ParallelMerge sometimes swallows elements of the last window&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2293&quot;&gt;FLINK-2293&lt;/a&gt; Division by Zero Exception&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2298&quot;&gt;FLINK-2298&lt;/a&gt; Allow setting custom YARN application names through the CLI&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2347&quot;&gt;FLINK-2347&lt;/a&gt; Rendering problem with Documentation website&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2353&quot;&gt;FLINK-2353&lt;/a&gt; Hadoop mapred IOFormat wrappers do not respect JobConfigurable interface&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2356&quot;&gt;FLINK-2356&lt;/a&gt; Resource leak in checkpoint coordinator&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2361&quot;&gt;FLINK-2361&lt;/a&gt; CompactingHashTable loses entries&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2362&quot;&gt;FLINK-2362&lt;/a&gt; distinct is missing in DataSet API documentation&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2381&quot;&gt;FLINK-2381&lt;/a&gt; Possible class not found Exception on failed partition producer&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2384&quot;&gt;FLINK-2384&lt;/a&gt; Deadlock during partition spilling&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2386&quot;&gt;FLINK-2386&lt;/a&gt; Implement Kafka connector using the new Kafka Consumer API&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2394&quot;&gt;FLINK-2394&lt;/a&gt; HadoopOutFormat OutputCommitter is default to FileOutputCommiter&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2412&quot;&gt;FLINK-2412&lt;/a&gt; Race leading to IndexOutOfBoundsException when querying for buffer while releasing SpillablePartition&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2422&quot;&gt;FLINK-2422&lt;/a&gt; Web client is showing a blank page if “Meta refresh” is disabled in browser&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2424&quot;&gt;FLINK-2424&lt;/a&gt; InstantiationUtil.serializeObject(Object) does not close output stream&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2437&quot;&gt;FLINK-2437&lt;/a&gt; TypeExtractor.analyzePojo has some problems around the default constructor detection&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2442&quot;&gt;FLINK-2442&lt;/a&gt; PojoType fields not supported by field position keys&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2447&quot;&gt;FLINK-2447&lt;/a&gt; TypeExtractor returns wrong type info when a Tuple has two fields of the same POJO type&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2450&quot;&gt;FLINK-2450&lt;/a&gt; IndexOutOfBoundsException in KryoSerializer&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2460&quot;&gt;FLINK-2460&lt;/a&gt; ReduceOnNeighborsWithExceptionITCase failure&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2527&quot;&gt;FLINK-2527&lt;/a&gt; If a VertexUpdateFunction calls setNewVertexValue more than once, the MessagingFunction will only see the first value set&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2540&quot;&gt;FLINK-2540&lt;/a&gt; LocalBufferPool.requestBuffer gets into infinite loop&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2542&quot;&gt;FLINK-2542&lt;/a&gt; It should be documented that it is required from a join key to override hashCode(), when it is not a POJO&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2555&quot;&gt;FLINK-2555&lt;/a&gt; Hadoop Input/Output Formats are unable to access secured HDFS clusters&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2560&quot;&gt;FLINK-2560&lt;/a&gt; Flink-Avro Plugin cannot be handled by Eclipse&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2572&quot;&gt;FLINK-2572&lt;/a&gt; Resolve base path of symlinked executable&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2584&quot;&gt;FLINK-2584&lt;/a&gt; ASM dependency is not shaded away&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Tue, 01 Sep 2015 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2015/09/01/release-0.9.1.html</link>
<guid isPermaLink="true">/news/2015/09/01/release-0.9.1.html</guid>
</item>

<item>
<title>Introducing Gelly: Graph Processing with Apache Flink</title>
<description>&lt;p&gt;This blog post introduces &lt;strong&gt;Gelly&lt;/strong&gt;, Apache Flink’s &lt;em&gt;graph-processing API and library&lt;/em&gt;. Flink’s native support
for iterations makes it a suitable platform for large-scale graph analytics.
By leveraging delta iterations, Gelly is able to map various graph processing models such as
vertex-centric or gather-sum-apply to Flink dataflows.&lt;/p&gt;

&lt;p&gt;Gelly allows Flink users to perform end-to-end data analysis in a single system.
Gelly can be seamlessly used with Flink’s DataSet API,
which means that pre-processing, graph creation, analysis, and post-processing can be done
in the same application. At the end of this post, we will go through a step-by-step example
in order to demonstrate that loading, transformation, filtering, graph creation, and analysis
can be performed in a single Flink program.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Overview&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;a href=&quot;#what-is-gelly&quot;&gt;What is Gelly?&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#graph-representation-and-creation&quot;&gt;Graph Representation and Creation&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#transformations-and-utilities&quot;&gt;Transformations and Utilities&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#iterative-graph-processing&quot;&gt;Iterative Graph Processing&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#library-of-graph-algorithms&quot;&gt;Library of Graph Algorithms&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#use-case-music-profiles&quot;&gt;Use-Case: Music Profiles&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#ongoing-and-future-work&quot;&gt;Ongoing and Future Work&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href=&quot;#top&quot;&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;what-is-gelly&quot;&gt;What is Gelly?&lt;/h2&gt;

&lt;p&gt;Gelly is a Graph API for Flink. It is currently supported in both Java and Scala.
The Scala methods are implemented as wrappers on top of the basic Java operations.
The API contains a set of utility functions for graph analysis, supports iterative graph
processing and introduces a library of graph algorithms.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/flink-stack.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;&lt;a href=&quot;#top&quot;&gt;Back to top&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;graph-representation-and-creation&quot;&gt;Graph Representation and Creation&lt;/h2&gt;

&lt;p&gt;In Gelly, a graph is represented by a DataSet of vertices and a DataSet of edges.
A vertex is defined by its unique ID and a value, whereas an edge is defined by its source ID,
target ID, and value. A vertex or edge for which a value is not specified will simply have the
value type set to &lt;code&gt;NullValue&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;A graph can be created from:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;DataSet of edges&lt;/strong&gt; and an optional &lt;strong&gt;DataSet of vertices&lt;/strong&gt; using &lt;code&gt;Graph.fromDataSet()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;DataSet of Tuple3&lt;/strong&gt; and an optional &lt;strong&gt;DataSet of Tuple2&lt;/strong&gt; using &lt;code&gt;Graph.fromTupleDataSet()&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Collection of edges&lt;/strong&gt; and an optional &lt;strong&gt;Collection of vertices&lt;/strong&gt; using &lt;code&gt;Graph.fromCollection()&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In all three cases, if the vertices are not provided,
Gelly will automatically produce the vertex IDs from the edge source and target IDs.&lt;/p&gt;
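
&lt;p&gt;To illustrate how a vertex set can be derived from edges alone, here is a minimal plain-Java sketch. It does not use the Gelly API; the class name, the method name and the array layout of an edge are assumptions made for this example.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.stream.LongStream;

final class VertexIdsFromEdges {
    // Hypothetical helper: edges[k] = {sourceId, targetId}; edge values are omitted.
    // Every ID that appears as a source or a target becomes a vertex ID exactly once.
    static long[] vertexIds(long[][] edges) {
        return Arrays.stream(edges)
                .flatMapToLong(edge -> LongStream.of(edge[0], edge[1]))
                .distinct()
                .sorted()
                .toArray();
    }

    public static void main(String[] args) {
        long[][] edges = {{1, 2}, {2, 3}, {1, 3}, {5, 3}};
        System.out.println(Arrays.toString(vertexIds(edges)));  // [1, 2, 3, 5]
    }
}
```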

&lt;p&gt;&lt;a href=&quot;#top&quot;&gt;Back to top&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;transformations-and-utilities&quot;&gt;Transformations and Utilities&lt;/h2&gt;

&lt;p&gt;Gelly’s transformations and utilities are methods of the Graph class. They include common graph metrics,
transformations and mutations, as well as neighborhood aggregations.&lt;/p&gt;

&lt;h4 id=&quot;common-graph-metrics&quot;&gt;Common Graph Metrics&lt;/h4&gt;
&lt;p&gt;These methods can be used to retrieve several graph metrics and properties, such as the number
of vertices, the number of edges, and the node degrees.&lt;/p&gt;

&lt;h4 id=&quot;transformations&quot;&gt;Transformations&lt;/h4&gt;
&lt;p&gt;The transformation methods enable several Graph operations, using high-level functions similar to
the ones provided by the batch processing API. These transformations can be applied one after the
other, yielding a new Graph after each step, in a fashion similar to operators on DataSets:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;inputGraph&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getUndirected&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;mapEdges&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;CustomEdgeMapper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Transformations can be applied on:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Vertices&lt;/strong&gt;: &lt;code&gt;mapVertices&lt;/code&gt;, &lt;code&gt;joinWithVertices&lt;/code&gt;, &lt;code&gt;filterOnVertices&lt;/code&gt;, &lt;code&gt;addVertex&lt;/code&gt;, …&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Edges&lt;/strong&gt;: &lt;code&gt;mapEdges&lt;/code&gt;, &lt;code&gt;filterOnEdges&lt;/code&gt;, &lt;code&gt;removeEdge&lt;/code&gt;, …&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Triplets&lt;/strong&gt; (source vertex, target vertex, edge): &lt;code&gt;getTriplets&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h4 id=&quot;neighborhood-aggregations&quot;&gt;Neighborhood Aggregations&lt;/h4&gt;

&lt;p&gt;Neighborhood methods allow vertices to perform an aggregation on their first-hop neighborhood.
This provides a vertex-centric view, where each vertex can access its neighboring edges and neighbor values.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;reduceOnEdges()&lt;/code&gt; provides access to the neighboring edges of a vertex,
i.e. the edge value and the vertex ID of the edge endpoint. In order to also access the
neighboring vertices’ values, one should call the &lt;code&gt;reduceOnNeighbors()&lt;/code&gt; function.
The scope of the neighborhood is defined by the EdgeDirection parameter, which can be IN, OUT or ALL,
to gather in-coming, out-going or all edges (neighbors) of a vertex.&lt;/p&gt;

&lt;p&gt;The two neighborhood
functions mentioned above can only be used when the aggregation function is associative and commutative.
If the function does not satisfy these properties, or if zero, one or more values should be returned
per vertex, use the more general &lt;code&gt;groupReduceOnEdges()&lt;/code&gt; and
&lt;code&gt;groupReduceOnNeighbors()&lt;/code&gt; functions instead.&lt;/p&gt;

&lt;p&gt;Consider the following graph, for instance:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/neighborhood.png&quot; style=&quot;width:60%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;Assume you want to compute the sum of the values of all incoming neighbors for each vertex.
We call the &lt;code&gt;reduceOnNeighbors()&lt;/code&gt; aggregation method, since the sum is an associative and commutative operation and the neighbors’ values are needed:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;graph&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;reduceOnNeighbors&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SumValues&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EdgeDirection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;IN&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The vertex with id 1 is the only node that has no incoming edges. The result is therefore:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/reduce-on-neighbors.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;
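
&lt;p&gt;The semantics of this aggregation can be sketched in plain Java, without the Gelly API. The class name, the method name, the example graph and the array layout are assumptions made for illustration; they do not reproduce the graph in the figure.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.function.LongBinaryOperator;

final class NeighborReduceSketch {
    // edges[k] = {source, target}; vertex IDs double as array indices (sketch assumption).
    // Returns one reduced value per vertex, or null for a vertex without incoming neighbors.
    static Long[] reduceOnIncomingNeighbors(long[] vertexValues, int[][] edges,
                                            LongBinaryOperator reducer) {
        Long[] result = new Long[vertexValues.length];
        for (int[] edge : edges) {
            long neighborValue = vertexValues[edge[0]];
            int target = edge[1];
            // Values are combined pairwise in edge order; the outcome is only
            // order-independent because the reducer is associative and commutative.
            result[target] = (result[target] == null)
                    ? neighborValue
                    : reducer.applyAsLong(result[target], neighborValue);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] values = {10, 20, 30, 40};
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {2, 3}, {1, 3}};
        // Vertex 0 has no incoming edges, so it yields no result.
        System.out.println(Arrays.toString(
                reduceOnIncomingNeighbors(values, edges, Long::sum)));  // [null, 10, 30, 50]
    }
}
```

&lt;p&gt;Passing the reducer as a parameter mirrors the idea that the same traversal works for any associative and commutative function, for example &lt;code&gt;Math::max&lt;/code&gt; instead of &lt;code&gt;Long::sum&lt;/code&gt;.&lt;/p&gt;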

&lt;p&gt;&lt;a href=&quot;#top&quot;&gt;Back to top&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;iterative-graph-processing&quot;&gt;Iterative Graph Processing&lt;/h2&gt;

&lt;p&gt;During the past few years, many different programming models for distributed graph processing
have been introduced: &lt;a href=&quot;http://delivery.acm.org/10.1145/2490000/2484843/a22-salihoglu.pdf?ip=141.23.53.206&amp;amp;id=2484843&amp;amp;acc=ACTIVE%20SERVICE&amp;amp;key=2BA2C432AB83DA15.0F42380CB8DD3307.4D4702B0C3E38B35.4D4702B0C3E38B35&amp;amp;CFID=706313474&amp;amp;CFTOKEN=60107876&amp;amp;__acm__=1440408958_b131e035942130653e5782409b5c0cde&quot;&gt;vertex-centric&lt;/a&gt;,
&lt;a href=&quot;http://researcher.ibm.com/researcher/files/us-ytian/giraph++.pdf&quot;&gt;partition-centric&lt;/a&gt;, &lt;a href=&quot;http://www.eecs.harvard.edu/cs261/notes/gonzalez-2012.htm&quot;&gt;gather-apply-scatter&lt;/a&gt;,
&lt;a href=&quot;http://infoscience.epfl.ch/record/188535/files/paper.pdf&quot;&gt;edge-centric&lt;/a&gt;, &lt;a href=&quot;http://www.vldb.org/pvldb/vol7/p1673-quamar.pdf&quot;&gt;neighborhood-centric&lt;/a&gt;.
Each one of these models targets a specific class of graph applications and each corresponding
system implementation optimizes the runtime respectively. In Gelly, we would like to exploit the
flexible dataflow model and the efficient iterations of Flink, to support multiple distributed
graph processing models on top of the same system.&lt;/p&gt;

&lt;p&gt;Currently, Gelly has methods for writing vertex-centric programs and provides support for programs
implemented using the gather-sum(accumulate)-apply model. We are also considering offering support
for the partition-centric computation model, using Flink’s &lt;code&gt;mapPartition()&lt;/code&gt; operator.
This model exposes the partition structure to the user and allows local graph structure exploitation
inside a partition to avoid unnecessary communication.&lt;/p&gt;

&lt;h4 id=&quot;vertex-centric&quot;&gt;Vertex-centric&lt;/h4&gt;

&lt;p&gt;Gelly wraps Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-0.8/spargel_guide.html&quot;&gt;Spargel API&lt;/a&gt; to
support the vertex-centric, Pregel-like programming model. Gelly’s &lt;code&gt;runVertexCentricIteration&lt;/code&gt; method accepts two user-defined functions:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;MessagingFunction:&lt;/strong&gt; defines what messages a vertex sends out for the next superstep.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;VertexUpdateFunction:&lt;/strong&gt; defines how a vertex will update its value based on the received messages.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The method will execute the vertex-centric iteration on the input Graph and return a new Graph, with updated vertex values.&lt;/p&gt;

&lt;p&gt;Gelly’s vertex-centric programming model exploits Flink’s efficient delta iteration operators.
Many iterative graph algorithms exhibit non-uniform behavior, where some vertices converge to
their final value faster than others. In such cases, the number of vertices that need to be
recomputed during an iteration decreases as the algorithm moves towards convergence.&lt;/p&gt;

&lt;p&gt;For example, consider a Single Source Shortest Paths problem on the following graph, where S
is the source node, i is the iteration counter and the edge values represent distances between nodes:&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/sssp.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;In each iteration, a vertex receives distances from its neighbors and adopts the minimum of
these distances and its current distance as the new value. Then, it propagates its new value
to its neighbors. If a vertex does not change value during an iteration, there is no need for
it to propagate its old distance to its neighbors, as they have already taken it into account.&lt;/p&gt;

&lt;p&gt;Flink’s &lt;code&gt;IterateDelta&lt;/code&gt; operator permits exploitation of this property as well as the
execution of computations solely on the active parts of the graph. The operator receives two inputs:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;the &lt;strong&gt;Solution Set&lt;/strong&gt;, which represents the current state of the input and&lt;/li&gt;
  &lt;li&gt;the &lt;strong&gt;Workset&lt;/strong&gt;, which determines which parts of the graph will be recomputed in the next iteration.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In the SSSP example above, the Workset contains the vertices which update their distances.
The user-defined iterative function is applied on these inputs to produce state updates.
These updates are efficiently applied on the state, which is kept in memory.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/iteration.png&quot; style=&quot;width:60%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;Internally, a vertex-centric iteration is a Flink delta iteration, where the initial Solution Set
is the vertex set of the input graph and the Workset is created by selecting the active vertices,
i.e. the ones that updated their value in the previous iteration. The messaging and vertex-update
functions are user-defined functions wrapped inside coGroup operators. In each superstep,
the active vertices (Workset) are coGrouped with the edges to generate the neighborhoods for
each vertex. The messaging function is then applied on each neighborhood. Next, the result of the
messaging function is coGrouped with the current vertex values (Solution Set) and the user-defined
vertex-update function is applied on the result. The output of this coGroup operator is finally
used to update the Solution Set and create the Workset input for the next iteration.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/vertex-centric-plan.png&quot; style=&quot;width:40%;margin:15px&quot; /&gt;
&lt;/center&gt;
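
&lt;p&gt;The interplay of Solution Set and Workset can be sketched in plain Java for the SSSP example. This is a single-machine illustration under stated assumptions, not Gelly’s implementation: the class name, the method name and the array-based edge layout are hypothetical, and edge values are assumed to be non-negative.&lt;/p&gt;

```java
import java.util.Arrays;

final class DeltaSsspSketch {
    static final long INF = Long.MAX_VALUE;

    // edges[k] = {source, target, distance}; vertex IDs double as array indices (sketch assumption).
    static long[] shortestPaths(int vertexCount, long[][] edges, int source) {
        long[] distance = new long[vertexCount];      // plays the role of the Solution Set
        Arrays.fill(distance, INF);
        distance[source] = 0;
        boolean[] active = new boolean[vertexCount];  // plays the role of the Workset
        active[source] = true;
        boolean anyActive = true;

        while (anyActive) {
            // Messaging step: only active vertices send candidate distances to their out-neighbors.
            long[] best = new long[vertexCount];
            Arrays.fill(best, INF);
            for (long[] edge : edges) {
                int from = (int) edge[0];
                int to = (int) edge[1];
                if (active[from]) {
                    best[to] = Math.min(best[to], distance[from] + edge[2]);
                }
            }
            // Update step: a vertex that improves its distance becomes active for the next superstep.
            anyActive = false;
            for (int v = 0; v != vertexCount; v++) {
                active[v] = distance[v] > best[v];
                if (active[v]) {
                    distance[v] = best[v];
                    anyActive = true;
                }
            }
        }
        return distance;  // unreachable vertices keep the value INF
    }

    public static void main(String[] args) {
        long[][] edges = {{0, 1, 4}, {0, 2, 1}, {2, 1, 2}, {1, 3, 5}};
        System.out.println(Arrays.toString(shortestPaths(4, edges, 0)));  // [0, 3, 1, 8]
    }
}
```

&lt;p&gt;The loop ends as soon as no vertex changes its value, which corresponds to an empty Workset.&lt;/p&gt;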

&lt;h4 id=&quot;gather-sum-apply&quot;&gt;Gather-Sum-Apply&lt;/h4&gt;

&lt;p&gt;Gelly supports a variation of the popular Gather-Sum-Apply-Scatter computation model,
introduced by PowerGraph. In GSA, a vertex pulls information from its neighbors, as opposed to the
vertex-centric approach, where updates are pushed from the incoming neighbors.
The &lt;code&gt;runGatherSumApplyIteration()&lt;/code&gt; method accepts three user-defined functions:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;GatherFunction:&lt;/strong&gt; gathers neighboring partial values along in-edges.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;SumFunction:&lt;/strong&gt; accumulates/reduces the values into a single one.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;ApplyFunction:&lt;/strong&gt; uses the result computed in the sum phase to update the current vertex’s value.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Like the vertex-centric model, GSA leverages Flink’s delta iteration operators because, in many cases,
vertex values do not need to be recomputed during an iteration.&lt;/p&gt;

&lt;p&gt;Let us reconsider the Single Source Shortest Paths algorithm. In each iteration, a vertex:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Gather:&lt;/strong&gt; retrieves the distances of its neighbors, each added to the corresponding edge value;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Sum:&lt;/strong&gt; compares the newly obtained distances in order to extract the minimum;&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Apply:&lt;/strong&gt; adopts the minimum distance computed in the sum step,
provided that it is lower than its current value. If a vertex’s value does not change during
an iteration, it no longer propagates its distance.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Internally, a Gather-Sum-Apply Iteration is a Flink delta iteration where the initial solution
set is the vertex input set and the workset is created by selecting the active vertices.&lt;/p&gt;

&lt;p&gt;The three functions (gather, sum and apply) are user-defined functions wrapped in map, reduce
and join operators respectively. In each superstep, the active vertices are joined with the
edges in order to create neighborhoods for each vertex. The gather function is then applied on
the neighborhood values via a map function. Afterwards, the result is grouped by the vertex ID
and reduced using the sum function. Finally, the outcome of the sum phase is joined with the
current vertex values (solution set) and the values are updated, which creates the new workset that
serves as input for the next iteration.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/GSA-plan.png&quot; style=&quot;width:40%;margin:15px&quot; /&gt;
&lt;/center&gt;
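
&lt;p&gt;The division of labor between the three functions can be sketched in plain Java for SSSP. This is an illustration under stated assumptions and not Gelly’s implementation: all names and the array-based edge layout are hypothetical, edge values are assumed to be non-negative, and for brevity the sketch gathers over every edge in each superstep instead of tracking a workset.&lt;/p&gt;

```java
import java.util.Arrays;

final class GsaSsspSketch {
    static final long INF = Long.MAX_VALUE;

    // Gather: the candidate distance a neighbor offers over one in-edge.
    static long gather(long neighborDistance, long edgeValue) {
        return neighborDistance == INF ? INF : neighborDistance + edgeValue;
    }

    // Sum: combine two candidates by keeping the smaller one.
    static long sum(long a, long b) {
        return Math.min(a, b);
    }

    // Apply: adopt the combined candidate only if it improves the current value.
    static long apply(long current, long summed) {
        return Math.min(current, summed);
    }

    // edges[k] = {source, target, distance}; vertex IDs double as array indices (sketch assumption).
    static long[] shortestPaths(int vertexCount, long[][] edges, int source) {
        long[] distance = new long[vertexCount];
        Arrays.fill(distance, INF);
        distance[source] = 0;
        boolean changed = true;
        while (changed) {
            long[] summed = new long[vertexCount];
            Arrays.fill(summed, INF);
            for (long[] edge : edges) {
                int from = (int) edge[0];
                int to = (int) edge[1];
                summed[to] = sum(summed[to], gather(distance[from], edge[2]));
            }
            changed = false;
            for (int v = 0; v != vertexCount; v++) {
                long updated = apply(distance[v], summed[v]);
                if (updated != distance[v]) {
                    distance[v] = updated;
                    changed = true;
                }
            }
        }
        return distance;
    }

    public static void main(String[] args) {
        long[][] edges = {{0, 1, 4}, {0, 2, 1}, {2, 1, 2}, {1, 3, 5}};
        System.out.println(Arrays.toString(shortestPaths(4, edges, 0)));  // [0, 3, 1, 8]
    }
}
```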

&lt;p&gt;&lt;a href=&quot;#top&quot;&gt;Back to top&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;library-of-graph-algorithms&quot;&gt;Library of Graph Algorithms&lt;/h2&gt;

&lt;p&gt;We are building a library of graph algorithms in Gelly to make it easy to analyze large-scale graphs.
These algorithms implement the &lt;code&gt;GraphAlgorithm&lt;/code&gt; interface and can be executed on
the input graph simply by calling a &lt;code&gt;run()&lt;/code&gt; method.&lt;/p&gt;

&lt;p&gt;We currently have implementations of the following algorithms:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;PageRank&lt;/li&gt;
  &lt;li&gt;Single-Source-Shortest-Paths&lt;/li&gt;
  &lt;li&gt;Label Propagation&lt;/li&gt;
  &lt;li&gt;Community Detection (based on &lt;a href=&quot;http://arxiv.org/pdf/0808.2633.pdf&quot;&gt;this paper&lt;/a&gt;)&lt;/li&gt;
  &lt;li&gt;Connected Components&lt;/li&gt;
  &lt;li&gt;GSA Connected Components&lt;/li&gt;
  &lt;li&gt;GSA PageRank&lt;/li&gt;
  &lt;li&gt;GSA Single-Source-Shortest-Paths&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Gelly also offers implementations of common graph algorithms through &lt;a href=&quot;https://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example&quot;&gt;examples&lt;/a&gt;.
Among them, one can find graph weighting schemes, like Jaccard Similarity and Euclidean Distance Weighting, 
as well as computation of common graph metrics.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;#top&quot;&gt;Back to top&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;use-case-music-profiles&quot;&gt;Use-Case: Music Profiles&lt;/h2&gt;

&lt;p&gt;In the following section, we go through a use-case scenario that combines the Flink DataSet API
with Gelly in order to process users’ music preferences to suggest additions to their playlist.&lt;/p&gt;

&lt;p&gt;First, we read a user’s music profile which is in the form of user-id, song-id and the number of
plays that each song has. We then filter out the list of songs the users do not wish to see in their
playlist. Then we compute the top songs per user (i.e. the songs a user listened to the most).
Finally, as a separate use-case on the same data set, we create a user-user similarity graph based
on the common songs and use this resulting graph to detect communities by calling Gelly’s Label Propagation
library method.&lt;/p&gt;

&lt;p&gt;For running the example implementation, please use the 0.10-SNAPSHOT version of Flink as a
dependency. The full example code base can be found &lt;a href=&quot;https://github.com/apache/flink/blob/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/MusicProfiles.java&quot;&gt;here&lt;/a&gt;. The public data set used for testing
can be found &lt;a href=&quot;http://labrosa.ee.columbia.edu/millionsong/tasteprofile&quot;&gt;here&lt;/a&gt;. This data set contains &lt;strong&gt;48,373,586&lt;/strong&gt; real user-id, song-id and
play-count triplets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The code snippets in this post try to reduce verbosity by skipping type parameters of generic functions. Please have a look at &lt;a href=&quot;https://github.com/apache/flink/blob/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/MusicProfiles.java&quot;&gt;the full example&lt;/a&gt; for the correct and complete code.&lt;/p&gt;

&lt;h4 id=&quot;filtering-out-bad-records&quot;&gt;Filtering out Bad Records&lt;/h4&gt;

&lt;p&gt;After reading the &lt;code&gt;(user-id, song-id, play-count)&lt;/code&gt; triplets from a CSV file and after parsing a
text file in order to retrieve the list of songs that a user would not want to include in a
playlist, we use a coGroup function to filter out the mismatches.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// read the user-song-play triplets.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;triplets&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;getUserSongTripletsData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// drop every triplet whose song ID (field 1) appears in the mismatches data set (field 0)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;validTriplets&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;triplets&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;coGroup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mismatches&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;equalTo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;with&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;CoGroupFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;coGroup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Iterable&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;triplets&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Iterable&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;invalidSongs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(!&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;invalidSongs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;iterator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;hasNext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple3&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;triplet&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;triplets&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// valid triplet&lt;/span&gt;
                                &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;triplet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
                            &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
                        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
                    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
                &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The coGroup matches triplets by song-id (second field) against the song-ids in the mismatches
list (first field). If the mismatch iterator is empty for a given song-id, no mismatch was found
for that song, and the triplets associated with it are collected.&lt;/p&gt;
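
&lt;p&gt;The same filtering logic can be sketched without Flink. The snippet below is only an illustration in plain Java: the &lt;code&gt;Triplet&lt;/code&gt; record, the &lt;code&gt;keepValid&lt;/code&gt; helper and the sample ids are hypothetical names introduced here, and a set lookup stands in for the distributed coGroup.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.Set;

final class ValidTripletsSketch {

    // Hypothetical stand-in for a Tuple3 of (user-id, song-id, play count); not a Flink type.
    record Triplet(String userId, String songId, int playCount) {}

    // Keeps the triplets whose song-id has no entry among the mismatched song-ids,
    // mirroring the "invalidSongs iterator is empty" branch of the coGroup.
    static Triplet[] keepValid(Triplet[] triplets, String[] mismatchedSongIds) {
        var mismatched = Set.copyOf(Arrays.asList(mismatchedSongIds));
        return Arrays.stream(triplets)
                .filter(t -> !mismatched.contains(t.songId()))
                .toArray(Triplet[]::new);
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        Triplet[] triplets = {
            new Triplet("user_1", "song_1", 100),
            new Triplet("user_1", "song_2", 10),
            new Triplet("user_2", "song_2", 5)
        };
        String[] mismatches = {"song_2"};
        for (Triplet t : keepValid(triplets, mismatches)) {
            System.out.println(t);
        }
    }
}
```

&lt;p&gt;In the Flink job the two inputs are distributed data sets, so the coGroup plays the role that the in-memory set plays in this sketch.&lt;/p&gt;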

&lt;h4 id=&quot;compute-the-top-songs-per-user&quot;&gt;Compute the Top Songs per User&lt;/h4&gt;

&lt;p&gt;As a next step, we would like to see which songs each user played most often. To this end, we
build a weighted, bipartite user-song graph in which edge source vertices are users, edge target
vertices are songs, and the edge weight represents the number of times the user listened to that
song.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/user-song-graph.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// create a user -&amp;gt; song weighted bipartite graph where the edge weights&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// correspond to play counts&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Graph&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;NullValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;userSongGraph&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Graph&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fromTupleDataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;validTriplets&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Consult the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/gelly/&quot;&gt;Gelly guide&lt;/a&gt; for details
on how to create a graph from a given DataSet of edges or from a collection.&lt;/p&gt;

&lt;p&gt;To retrieve the top song per user, we call the groupReduceOnEdges function, as it performs an
aggregation over the first-hop neighborhood taking just the edges into consideration. We
iterate through the edge values and collect the target (song) of the maximum-weight edge.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;//get the top track (most listened to) for each user&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;usersWithTopTrack&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;userSongGraph&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;groupReduceOnEdges&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;GetTopSongPerUser&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EdgeDirection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;OUT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;GetTopSongPerUser&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;EdgesFunctionWithVertexValue&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;iterateEdges&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Vertex&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;vertex&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Iterable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Edge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edges&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxPlaycount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;topSong&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Edge&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edges&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxPlaycount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;maxPlaycount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;topSong&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTarget&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vertex&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;topSong&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h4 id=&quot;creating-a-user-user-similarity-graph&quot;&gt;Creating a User-User Similarity Graph&lt;/h4&gt;

&lt;p&gt;Clustering users based on common interests, in this case, common top songs, could prove to be
very useful for advertisements or for recommending new musical compilations. In a user-user graph,
two users who listen to the same song will simply be linked together through an edge as depicted
in the figure below.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/user-song-to-user-user.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;To form the user-user graph in Flink, we take the edges from the user-song graph
(left-hand side of the image), keep only those whose play count exceeds a threshold, group them by
song-id, and then add all the users (source vertex ids) to an ArrayList.&lt;/p&gt;

&lt;p&gt;We then match users who listened to the same song two by two, creating a new edge to mark their
common interest (right-hand side of the image).&lt;/p&gt;

&lt;p&gt;Afterwards, we perform a &lt;code&gt;distinct()&lt;/code&gt; operation to avoid creating duplicate edges.
Now that we have the DataSet of edges of interest, creating a graph is as
straightforward as a call to the &lt;code&gt;Graph.fromDataSet()&lt;/code&gt; method.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// create a user-user similarity graph:&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// two users that listen to the same song are connected&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Edge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;similarUsers&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;userSongGraph&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getEdges&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
        &lt;span class=&quot;c1&quot;&gt;// filter out user-song edges that are below the playcount threshold&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FilterFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Edge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            	&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Edge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;playcountThreshold&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
                &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;})&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;reduceGroup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;GroupReduceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;reduce&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Iterable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Edge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edges&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Edge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;List&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;ArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
                    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Edge&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edges&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                        &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;edge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
                    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
                    &lt;span class=&quot;c1&quot;&gt;// once all listeners are gathered, pair them up two by two&lt;/span&gt;
                    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                            &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Edge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;users&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)));&lt;/span&gt;
                        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
                    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
                &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;})&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;distinct&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;Graph&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;similarUsersGraph&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Graph&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fromDataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;similarUsers&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getUndirected&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
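
&lt;p&gt;The pairing step for the listeners of a single song can also be looked at in isolation. The following is a minimal plain-Java sketch under our own assumptions: &lt;code&gt;pairUp&lt;/code&gt; and the sample user ids are hypothetical, and string arrays stand in for Gelly edges. It emits one pair for every two users in the list, including pairs that involve the last user.&lt;/p&gt;

```java
import java.util.stream.IntStream;

final class UserPairsSketch {

    // Returns every unordered pair (i, j) with j > i from the listeners of one song.
    // Each pair corresponds to one user-user edge in the similarity graph.
    static String[][] pairUp(String[] listeners) {
        int n = listeners.length;
        return IntStream.range(0, n).boxed()
                .flatMap(i -> IntStream.range(i + 1, n)
                        .mapToObj(j -> new String[] {listeners[i], listeners[j]}))
                .toArray(String[][]::new);
    }

    public static void main(String[] args) {
        // Made-up listeners of one song, for illustration only.
        for (String[] pair : pairUp(new String[] {"user_1", "user_2", "user_3"})) {
            System.out.println(pair[0] + " - " + pair[1]);
        }
    }
}
```

&lt;p&gt;A song with n listeners yields n(n-1)/2 pairs, which is one reason to filter out low play counts before grouping.&lt;/p&gt;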

&lt;p&gt;Having created a user-user graph, it makes sense to detect the communities that form in it.
To do so, we first initialize each vertex with a unique numeric label using the
&lt;code&gt;joinWithVertices()&lt;/code&gt; function, which takes a data set of Tuple2 as a parameter, joins
the id of each vertex with the first element of the tuple, and then applies a map function.
Finally, we call the &lt;code&gt;run()&lt;/code&gt; method with the LabelPropagation library algorithm passed
as a parameter. In the end, each vertex holds the most frequent label
among its neighbors.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// detect user communities using label propagation&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// initialize each vertex with a unique numeric label&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;idsWithInitialLabels&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataSetUtils&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;zipWithUniqueId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;similarUsersGraph&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getVertexIds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
                &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;f1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;f0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
                &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;});&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// update the vertex values and run the label propagation algorithm&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Vertex&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;verticesWithCommunity&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;similarUsersGraph&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;joinWithVertices&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;idsWithInitialLabels&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;MapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;idWithLabel&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;idWithLabel&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;f1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
                &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;})&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;LabelPropagation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;numIterations&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getVertices&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
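
&lt;p&gt;To make the idea behind label propagation concrete, here is a small single-machine sketch in plain Java. It is not Gelly&#39;s implementation: the class, the adjacency arrays, the synchronous update and the tie-breaking rule (smallest label wins) are all assumptions made for illustration. In every round, each vertex adopts the most frequent label among its neighbors.&lt;/p&gt;

```java
import java.util.Arrays;

final class LabelPropagationSketch {

    // Most frequent value in a non-empty array; ties go to the smallest label
    // (an arbitrary choice made for this sketch).
    static long mostFrequent(long[] labels) {
        long[] sorted = labels.clone();
        Arrays.sort(sorted);
        long best = sorted[0];
        int bestCount = 0;
        long current = sorted[0];
        int currentCount = 0;
        for (long label : sorted) {
            if (label == current) {
                currentCount++;
            } else {
                current = label;
                currentCount = 1;
            }
            if (currentCount > bestCount) {
                best = current;
                bestCount = currentCount;
            }
        }
        return best;
    }

    // One synchronous round: every vertex adopts the most frequent label among
    // its neighbors; isolated vertices keep their own label.
    static long[] step(int[][] neighbors, long[] labels) {
        long[] next = new long[labels.length];
        Arrays.setAll(next, v -> neighbors[v].length == 0
                ? labels[v]
                : mostFrequent(Arrays.stream(neighbors[v]).mapToLong(u -> labels[u]).toArray()));
        return next;
    }

    // Runs a fixed number of rounds, like the numIterations parameter above.
    static long[] propagate(int[][] neighbors, long[] initialLabels, int numIterations) {
        long[] labels = initialLabels.clone();
        for (int round = numIterations; round > 0; round--) {
            labels = step(neighbors, labels);
        }
        return labels;
    }

    public static void main(String[] args) {
        // Two disconnected triangles: {0, 1, 2} and {3, 4, 5}.
        int[][] neighbors = {{1, 2}, {0, 2}, {0, 1}, {4, 5}, {3, 5}, {3, 4}};
        long[] initial = {0, 1, 2, 3, 4, 5};
        System.out.println(Arrays.toString(propagate(neighbors, initial, 5)));
    }
}
```

&lt;p&gt;With the two disconnected triangles used in &lt;code&gt;main&lt;/code&gt;, each triangle settles on a single shared label, which is what marks its vertices as one community.&lt;/p&gt;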

&lt;p&gt;&lt;a href=&quot;#top&quot;&gt;Back to top&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;ongoing-and-future-work&quot;&gt;Ongoing and Future Work&lt;/h2&gt;

&lt;p&gt;Currently, Gelly matches the basic functionalities provided by most state-of-the-art graph
processing systems. Our vision is to turn Gelly into more than “yet another library for running
PageRank-like algorithms” by supporting generic iterations, implementing graph partitioning,
providing bipartite graph support and by offering numerous other features.&lt;/p&gt;

&lt;p&gt;We are also enriching Flink Gelly with a set of operators suitable for highly skewed graphs
as well as a Graph API built on Flink Streaming.&lt;/p&gt;

&lt;p&gt;In the near future, we would like to see how Gelly can be integrated with graph visualization
tools, graph database systems and sampling techniques.&lt;/p&gt;

&lt;p&gt;Curious? Read more about our plans for Gelly in the &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Flink+Gelly&quot;&gt;roadmap&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;#top&quot;&gt;Back to top&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;links&quot;&gt;Links&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/dev/libs/gelly/&quot;&gt;Gelly Documentation&lt;/a&gt;&lt;/p&gt;
</description>
<pubDate>Mon, 24 Aug 2015 00:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html</link>
<guid isPermaLink="true">/news/2015/08/24/introducing-flink-gelly.html</guid>
</item>

<item>
<title>Announcing Apache Flink 0.9.0</title>
<description>&lt;p&gt;The Apache Flink community is pleased to announce the availability of the 0.9.0 release. The release is the result of many months of hard work within the Flink community. It contains many new features and improvements which were previewed in the 0.9.0-milestone1 release and have been polished since then. This is the largest Flink release so far.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Download the release&lt;/a&gt; and check out &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.9/&quot;&gt;the documentation&lt;/a&gt;. Feedback through the Flink &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; is, as always, very welcome!&lt;/p&gt;

&lt;h2 id=&quot;new-features&quot;&gt;New Features&lt;/h2&gt;

&lt;h3 id=&quot;exactly-once-fault-tolerance-for-streaming-programs&quot;&gt;Exactly-once Fault Tolerance for streaming programs&lt;/h3&gt;

&lt;p&gt;This release introduces a new fault tolerance mechanism for streaming dataflows. The new checkpointing algorithm takes data sources and user-defined state into account and recovers from failures such that all records are reflected exactly once in the operator states.&lt;/p&gt;

&lt;p&gt;The checkpointing algorithm is lightweight and driven by barriers that are periodically injected into the data streams at the sources. As such, it has an extremely low coordination overhead and is able to sustain very high throughput rates. User-defined state can be automatically backed up to configurable storage by the fault tolerance mechanism.&lt;/p&gt;

&lt;p&gt;Please refer to &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/streaming_guide.html#stateful-computation&quot;&gt;the documentation on stateful computation&lt;/a&gt; for details on how to use fault-tolerant data streams with Flink.&lt;/p&gt;

&lt;p&gt;The fault tolerance mechanism requires data sources that can replay recent parts of the stream, such as &lt;a href=&quot;http://kafka.apache.org&quot;&gt;Apache Kafka&lt;/a&gt;. Read more &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/streaming_guide.html#apache-kafka&quot;&gt;about how to use the persistent Kafka source&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;table-api&quot;&gt;Table API&lt;/h3&gt;

&lt;p&gt;Flink’s new Table API offers a higher-level abstraction for interacting with structured data sources. The Table API allows users to execute logical, SQL-like queries on distributed data sets while allowing them to freely mix declarative queries with regular Flink operators. Here is an example that groups and joins two tables:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clickCounts&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clicks&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;userId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;userId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;url&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;activeUsers&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;clickCounts&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;userId&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;username&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Tables consist of logical attributes that can be selected by name rather than physical Java and Scala data types. This alleviates a lot of boilerplate code for common ETL tasks and raises the abstraction for Flink programs. Tables are available for both static and streaming data sources (DataSet and DataStream APIs).&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.9/libs/table.html&quot;&gt;Check out the Table guide for Java and Scala&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;gelly-graph-processing-api&quot;&gt;Gelly Graph Processing API&lt;/h3&gt;

&lt;p&gt;Gelly is a Java Graph API for Flink. It exposes a Graph data structure that wraps DataSets for vertices and edges, and provides methods for creating graphs from DataSets, graph transformations and utilities (e.g., in- and out-degrees of vertices), neighborhood aggregations, and iterative vertex-centric graph processing, as well as a library of common graph algorithms, including PageRank, SSSP, label propagation, and community detection.&lt;/p&gt;

&lt;p&gt;Gelly internally builds on top of Flink’s &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/iterations.html&quot;&gt;delta iterations&lt;/a&gt;. Iterative graph algorithms are executed leveraging mutable state, achieving performance similar to that of specialized graph processing systems.&lt;/p&gt;

&lt;p&gt;Gelly will eventually subsume Spargel, Flink’s Pregel-like API.&lt;/p&gt;

&lt;p&gt;Note: The Gelly library is still in beta status and subject to improvements and heavy performance tuning.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.9/libs/gelly_guide.html&quot;&gt;Check out the Gelly guide&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;flink-machine-learning-library&quot;&gt;Flink Machine Learning Library&lt;/h3&gt;

&lt;p&gt;This release includes the first version of Flink’s Machine Learning library. The library’s pipeline approach, which has been strongly inspired by scikit-learn’s abstraction of transformers and predictors, makes it easy to quickly set up a data processing pipeline and to get your job done.&lt;/p&gt;

&lt;p&gt;Flink distinguishes between transformers and predictors. Transformers are components that transform your input data into a new format, allowing you to extract features, cleanse your data, or sample from it. Predictors, on the other hand, are the components that take your input data and train a model on it. The model you obtain from a predictor can then be evaluated and used to make predictions on unseen data.&lt;/p&gt;

&lt;p&gt;Currently, the machine learning library contains transformers and predictors for multiple tasks. The library supports multiple linear regression using stochastic gradient descent to scale to large data sizes. Furthermore, it includes an alternating least squares (ALS) implementation to factorize large matrices. The matrix factorization can be used for collaborative filtering. An implementation of the communication-efficient distributed dual coordinate ascent (CoCoA) algorithm is the latest addition to the library. The CoCoA algorithm can be used to train distributed soft-margin SVMs.&lt;/p&gt;

&lt;p&gt;Note: The ML library is still in beta status and subject to improvements and heavy performance tuning.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.9/libs/ml/&quot;&gt;Check out FlinkML&lt;/a&gt;&lt;/p&gt;

&lt;h3 id=&quot;flink-on-yarn-leveraging-apache-tez&quot;&gt;Flink on YARN leveraging Apache Tez&lt;/h3&gt;

&lt;p&gt;We are introducing a new execution mode for Flink to be able to run restricted Flink programs on top of&lt;a href=&quot;http://tez.apache.org&quot;&gt; Apache Tez&lt;/a&gt;. This mode retains Flink’s APIs, optimizer, as well as Flink’s runtime operators, but instead of wrapping those in Flink tasks that are executed by Flink TaskManagers, it wraps them in Tez runtime tasks and builds a Tez DAG that represents the program.&lt;/p&gt;

&lt;p&gt;By using Flink on Tez, users have an additional choice for an execution platform for Flink programs. While Flink’s distributed runtime favors low latency, streaming shuffles, and iterative algorithms, Tez focuses on scalability and elastic resource usage in shared YARN clusters.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.9/setup/flink_on_tez.html&quot;&gt;Get started with Flink on Tez&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;reworked-distributed-runtime-on-akka&quot;&gt;Reworked Distributed Runtime on Akka&lt;/h3&gt;

&lt;p&gt;Flink’s RPC system has been replaced by the widely adopted &lt;a href=&quot;http://akka.io&quot;&gt;Akka&lt;/a&gt; framework. Akka’s concurrency model offers the right abstraction for developing a fast and robust distributed system. Akka’s own failure detection mechanism significantly improves the stability of Flink’s runtime, because the system can now react properly to node outages. Furthermore, Akka improves Flink’s scalability by introducing asynchronous messages to the system. These asynchronous messages allow Flink to run on many more nodes than before.&lt;/p&gt;

&lt;h3 id=&quot;improved-yarn-support&quot;&gt;Improved YARN support&lt;/h3&gt;

&lt;p&gt;Flink’s YARN client contains several improvements, such as a detached mode for starting a YARN session in the background and the ability to submit a single Flink job to a YARN cluster without starting a session, including a “fire and forget” mode. Flink is now also able to reallocate failed YARN containers to maintain the size of the requested cluster. This feature makes it possible to implement fault-tolerant setups on top of YARN. There is also an internal Java API to deploy and control a running YARN cluster. System integrators use it to control Flink on YARN within their Hadoop 2 clusters.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.9/setup/yarn_setup.html&quot;&gt;See the YARN docs&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;static-code-analysis-for-the-flink-optimizer-opening-the-udf-blackboxes&quot;&gt;Static Code Analysis for the Flink Optimizer: Opening the UDF blackboxes&lt;/h3&gt;

&lt;p&gt;This release introduces a first version of a static code analyzer that pre-interprets functions written by the user to get information about the function’s internal dataflow. The code analyzer can provide useful information about &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/programming_guide.html#semantic-annotations&quot;&gt;forwarded fields&lt;/a&gt; to Flink’s optimizer and thus speed up job execution. It also reports obvious mistakes in the code. For stability reasons, the code analyzer is initially disabled by default. It can be activated through&lt;/p&gt;

&lt;p&gt;ExecutionEnvironment.getExecutionConfig().setCodeAnalysisMode(…)&lt;/p&gt;

&lt;p&gt;either as an assistant that gives hints during the implementation or by directly applying the optimizations that have been found.&lt;/p&gt;

&lt;h2 id=&quot;more-improvements-and-fixes&quot;&gt;More Improvements and Fixes&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1605&quot;&gt;FLINK-1605&lt;/a&gt;: Flink is not exposing its Guava and ASM dependencies to Maven projects depending on Flink. We use the maven-shade-plugin to relocate these dependencies into our own namespace. This allows users to use any Guava or ASM version.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1417&quot;&gt;FLINK-1417&lt;/a&gt;: Automatic recognition and registration of Java types with Kryo and the internal serializers: Flink has its own type handling and serialization framework and falls back to Kryo for types that it cannot handle. To get the best performance, Flink automatically registers all types a user is using in their program with Kryo. Flink also registers serializers for Protocol Buffers, Thrift, Avro and Joda-Time automatically. Users can also manually register serializers with Kryo (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1399&quot;&gt;FLINK-1399&lt;/a&gt;).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1296&quot;&gt;FLINK-1296&lt;/a&gt;: Add support for sorting very large records&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1679&quot;&gt;FLINK-1679&lt;/a&gt;: “degreeOfParallelism” methods renamed to “parallelism”&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1501&quot;&gt;FLINK-1501&lt;/a&gt;: Add metrics library for monitoring TaskManagers&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1760&quot;&gt;FLINK-1760&lt;/a&gt;: Add support for building Flink with Scala 2.11&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1648&quot;&gt;FLINK-1648&lt;/a&gt;: Add a mode where the system automatically sets the parallelism to the available task slots&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1622&quot;&gt;FLINK-1622&lt;/a&gt;: Add groupCombine operator&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1589&quot;&gt;FLINK-1589&lt;/a&gt;: Add option to pass Configuration to LocalExecutor&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1504&quot;&gt;FLINK-1504&lt;/a&gt;: Add support for accessing secured HDFS clusters in standalone mode&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1478&quot;&gt;FLINK-1478&lt;/a&gt;: Add strictly local input split assignment&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1512&quot;&gt;FLINK-1512&lt;/a&gt;: Add CsvReader for reading into POJOs.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1461&quot;&gt;FLINK-1461&lt;/a&gt;: Add sortPartition operator&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1450&quot;&gt;FLINK-1450&lt;/a&gt;: Add Fold operator to the Streaming api&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1389&quot;&gt;FLINK-1389&lt;/a&gt;: Allow setting custom file extensions for files created by the FileOutputFormat&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1236&quot;&gt;FLINK-1236&lt;/a&gt;: Add support for localization of Hadoop Input Splits&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1179&quot;&gt;FLINK-1179&lt;/a&gt;: Add button to JobManager web interface to request stack trace of a TaskManager&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1105&quot;&gt;FLINK-1105&lt;/a&gt;: Add support for locally sorted output&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1688&quot;&gt;FLINK-1688&lt;/a&gt;: Add socket sink&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1436&quot;&gt;FLINK-1436&lt;/a&gt;: Improve usability of command line interface&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2174&quot;&gt;FLINK-2174&lt;/a&gt;: Allow comments in ‘slaves’ file&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1698&quot;&gt;FLINK-1698&lt;/a&gt;: Add polynomial base feature mapper to ML library&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1697&quot;&gt;FLINK-1697&lt;/a&gt;: Add alternating least squares algorithm for matrix factorization to ML library&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1792&quot;&gt;FLINK-1792&lt;/a&gt;: FLINK-456 Improve TM Monitoring: CPU utilization, hide graphs by default and show summary only&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1672&quot;&gt;FLINK-1672&lt;/a&gt;: Refactor task registration/unregistration&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2001&quot;&gt;FLINK-2001&lt;/a&gt;: DistanceMetric cannot be serialized&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1676&quot;&gt;FLINK-1676&lt;/a&gt;: enableForceKryo() is not working as expected&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1959&quot;&gt;FLINK-1959&lt;/a&gt;: Accumulators BROKEN after Partitioning&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1696&quot;&gt;FLINK-1696&lt;/a&gt;: Add multiple linear regression to ML library&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1820&quot;&gt;FLINK-1820&lt;/a&gt;: Bug in DoubleParser and FloatParser - empty String is not casted to 0&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1985&quot;&gt;FLINK-1985&lt;/a&gt;: Streaming does not correctly forward ExecutionConfig to runtime&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1828&quot;&gt;FLINK-1828&lt;/a&gt;: Impossible to output data to an HBase table&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1952&quot;&gt;FLINK-1952&lt;/a&gt;: Cannot run ConnectedComponents example: Could not allocate a slot on instance&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1848&quot;&gt;FLINK-1848&lt;/a&gt;: Paths containing a Windows drive letter cannot be used in FileOutputFormats&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1954&quot;&gt;FLINK-1954&lt;/a&gt;: Task Failures and Error Handling&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2004&quot;&gt;FLINK-2004&lt;/a&gt;: Memory leak in presence of failed checkpoints in KafkaSource&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2132&quot;&gt;FLINK-2132&lt;/a&gt;: Java version parsing is not working for OpenJDK&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2098&quot;&gt;FLINK-2098&lt;/a&gt;: Checkpoint barrier initiation at source is not aligned with snapshotting&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2069&quot;&gt;FLINK-2069&lt;/a&gt;: writeAsCSV function in DataStream Scala API creates no file&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2092&quot;&gt;FLINK-2092&lt;/a&gt;: Document (new) behavior of print() and execute()&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2177&quot;&gt;FLINK-2177&lt;/a&gt;: NullPointer in task resource release&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2054&quot;&gt;FLINK-2054&lt;/a&gt;: StreamOperator rework removed copy calls when passing output to a chained operator&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2196&quot;&gt;FLINK-2196&lt;/a&gt;: Missplaced Class in flink-java SortPartitionOperator&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2191&quot;&gt;FLINK-2191&lt;/a&gt;: Inconsistent use of Closure Cleaner in Streaming API&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2206&quot;&gt;FLINK-2206&lt;/a&gt;: JobManager webinterface shows 5 finished jobs at most&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2188&quot;&gt;FLINK-2188&lt;/a&gt;: Reading from big HBase Tables&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1781&quot;&gt;FLINK-1781&lt;/a&gt;: Quickstarts broken due to Scala Version Variables&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;notice&quot;&gt;Notice&lt;/h2&gt;

&lt;p&gt;The 0.9 series of Flink is the last version to support Java 6. If you are still using Java 6, please consider upgrading to Java 8 (Java 7 ended its free support in April 2015).&lt;/p&gt;

&lt;p&gt;Flink will require at least Java 7 in major releases after 0.9.0.&lt;/p&gt;
</description>
<pubDate>Wed, 24 Jun 2015 16:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2015/06/24/announcing-apache-flink-0.9.0-release.html</link>
<guid isPermaLink="true">/news/2015/06/24/announcing-apache-flink-0.9.0-release.html</guid>
</item>

<item>
<title>April 2015 in the Flink community</title>
<description>&lt;p&gt;April was a packed month for Apache Flink.&lt;/p&gt;

&lt;h3 id=&quot;flink-runner-for-google-cloud-dataflow&quot;&gt;Flink runner for Google Cloud Dataflow&lt;/h3&gt;

&lt;p&gt;A Flink runner for Google Cloud Dataflow was announced. See the blog
posts by &lt;a href=&quot;http://data-artisans.com/announcing-google-cloud-dataflow-on-flink-and-easy-flink-deployment-on-google-cloud/&quot;&gt;data Artisans&lt;/a&gt; and
the &lt;a href=&quot;http://googlecloudplatform.blogspot.de/2015/03/announcing-Google-Cloud-Dataflow-runner-for-Apache-Flink.html&quot;&gt;Google Cloud Platform Blog&lt;/a&gt;.
Google Cloud Dataflow programs can be written using an open-source
SDK and run on multiple backends, either as a managed service inside
Google’s infrastructure, or leveraging open source runners,
including Apache Flink.&lt;/p&gt;

&lt;h2 id=&quot;flink-090-milestone1-release&quot;&gt;Flink 0.9.0-milestone1 release&lt;/h2&gt;

&lt;p&gt;The highlight of April was of course the availability of &lt;a href=&quot;/news/2015/04/13/release-0.9.0-milestone1.html&quot;&gt;Flink 0.9-milestone1&lt;/a&gt;. This release was packed with new features, including a Python DataSet API, the new SQL-like Table API, FlinkML (a machine learning library on Flink), Gelly (Flink’s Graph API), and a mode to run Flink on YARN leveraging Tez. In case you missed it, check out the &lt;a href=&quot;/news/2015/04/13/release-0.9.0-milestone1.html&quot;&gt;release announcement blog post&lt;/a&gt; for details.&lt;/p&gt;

&lt;h2 id=&quot;conferences-and-meetups&quot;&gt;Conferences and meetups&lt;/h2&gt;

&lt;p&gt;April kicked off the conference season. Apache Flink was presented at ApacheCon in Texas (&lt;a href=&quot;http://www.slideshare.net/fhueske/apache-flink&quot;&gt;slides&lt;/a&gt;), in two talks at the Hadoop Summit in Brussels (see slides &lt;a href=&quot;http://www.slideshare.net/AljoschaKrettek/data-analysis-with-apache-flink-hadoop-summit-2015&quot;&gt;here&lt;/a&gt; and &lt;a href=&quot;http://www.slideshare.net/GyulaFra/flink-streaming-hadoopsummit&quot;&gt;here&lt;/a&gt;), and at the Hadoop User Groups of the Netherlands (&lt;a href=&quot;http://www.slideshare.net/stephanewen1/apache-flink-overview-and-use-cases-at-prehadoop-summit-meetups&quot;&gt;slides&lt;/a&gt;) and Stockholm. The brand new &lt;a href=&quot;http://www.meetup.com/Apache-Flink-Stockholm/&quot;&gt;Apache Flink meetup Stockholm&lt;/a&gt; was also established.&lt;/p&gt;

&lt;h2 id=&quot;google-summer-of-code&quot;&gt;Google Summer of Code&lt;/h2&gt;

&lt;p&gt;Three students will work on Flink during Google’s &lt;a href=&quot;https://www.google-melange.com/gsoc/homepage/google/gsoc2015&quot;&gt;Summer of Code program&lt;/a&gt; on distributed pattern matching, exact and approximate statistics for data streams and windows, as well as asynchronous iterations and updates.&lt;/p&gt;

&lt;h2 id=&quot;flink-on-the-web&quot;&gt;Flink on the web&lt;/h2&gt;

&lt;p&gt;Fabian Hueske gave an &lt;a href=&quot;http://www.infoq.com/news/2015/04/hueske-apache-flink?utm_campaign=infoq_content&amp;amp;utm_source=infoq&amp;amp;utm_medium=feed&amp;amp;utm_term=global&quot;&gt;interview at InfoQ&lt;/a&gt; on Apache Flink.&lt;/p&gt;

&lt;h2 id=&quot;upcoming-events&quot;&gt;Upcoming events&lt;/h2&gt;

&lt;p&gt;Stay tuned for a wealth of upcoming events! Two Flink talks will be presented at &lt;a href=&quot;http://berlinbuzzwords.de/15/sessions&quot;&gt;Berlin Buzzwords&lt;/a&gt;, and Flink will be presented at the &lt;a href=&quot;http://2015.hadoopsummit.org/san-jose/&quot;&gt;Hadoop Summit in San Jose&lt;/a&gt;. A &lt;a href=&quot;http://www.meetup.com/Apache-Flink-Meetup/events/220557545/&quot;&gt;training workshop on Apache Flink&lt;/a&gt; is being organized in Berlin. Finally, &lt;a href=&quot;http://2015.flink-forward.org/&quot;&gt;Flink Forward&lt;/a&gt;, the first conference to bring together the whole Flink community, is taking place in Berlin in October 2015.&lt;/p&gt;
</description>
<pubDate>Thu, 14 May 2015 12:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2015/05/14/Community-update-April.html</link>
<guid isPermaLink="true">/news/2015/05/14/Community-update-April.html</guid>
</item>

<item>
<title>Juggling with Bits and Bytes</title>
<description>&lt;h2 id=&quot;how-apache-flink-operates-on-binary-data&quot;&gt;How Apache Flink operates on binary data&lt;/h2&gt;

&lt;p&gt;Nowadays, a lot of open-source systems for analyzing large data sets are implemented in Java or other JVM-based programming languages. The most well-known example is Apache Hadoop, but also newer frameworks such as Apache Spark, Apache Drill, and also Apache Flink run on JVMs. A common challenge that JVM-based data analysis engines face is to store large amounts of data in memory - both for caching and for efficient processing such as sorting and joining of data. Managing the JVM memory well makes the difference between a system that is hard to configure and has unpredictable reliability and performance and a system that behaves robustly with few configuration knobs.&lt;/p&gt;

&lt;p&gt;In this blog post we discuss how Apache Flink manages memory, talk about its custom data de/serialization stack, and show how it operates on binary data.&lt;/p&gt;

&lt;h2 id=&quot;data-objects-lets-put-them-on-the-heap&quot;&gt;Data Objects? Let’s put them on the heap!&lt;/h2&gt;

&lt;p&gt;The most straightforward approach to processing lots of data in a JVM is to put it on the heap as objects and operate on these objects. Caching a data set as objects would be as simple as maintaining a list containing an object for each record. An in-memory sort would simply sort the list of objects.
However, this approach has a few notable drawbacks. First of all it is not trivial to watch and control heap memory usage when a lot of objects are created and invalidated constantly. Memory overallocation instantly kills the JVM with an &lt;code&gt;OutOfMemoryError&lt;/code&gt;. Another aspect is garbage collection on multi-GB JVMs which are flooded with new objects. The overhead of garbage collection in such environments can easily reach 50% and more. Finally, Java objects come with a certain space overhead depending on the JVM and platform. For data sets with many small objects this can significantly reduce the effectively usable amount of memory. Given proficient system design and careful, use-case specific system parameter tuning, heap memory usage can be more or less controlled and &lt;code&gt;OutOfMemoryErrors&lt;/code&gt; avoided. However, such setups are rather fragile especially if data characteristics or the execution environment change.&lt;/p&gt;

&lt;h2 id=&quot;what-is-flink-doing-about-that&quot;&gt;What is Flink doing about that?&lt;/h2&gt;

&lt;p&gt;Apache Flink has its roots in a research project which aimed to combine the best technologies of MapReduce-based systems and parallel database systems. Coming from this background, Flink has always had its own way of processing data in-memory. Instead of putting lots of objects on the heap, Flink serializes objects into a fixed number of pre-allocated memory segments. Its DBMS-style sort and join algorithms operate as much as possible on this binary data to keep the de/serialization overhead at a minimum. If more data needs to be processed than can be kept in memory, Flink’s operators partially spill data to disk. In fact, a lot of Flink’s internal implementations look more like C/C++ than common Java. The following figure gives a high-level overview of how Flink stores data serialized in memory segments and spills to disk if necessary.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/memory-mgmt.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;Flink’s style of active memory management and operating on binary data has several benefits:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Memory-safe execution &amp;amp; efficient out-of-core algorithms.&lt;/strong&gt; Due to the fixed number of allocated memory segments, it is trivial to monitor remaining memory resources. In case of memory shortage, processing operators can efficiently write larger batches of memory segments to disk and later read them back. Consequently, &lt;code&gt;OutOfMemoryErrors&lt;/code&gt; are effectively prevented.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Reduced garbage collection pressure.&lt;/strong&gt; Because all long-lived data is in binary representation in Flink’s managed memory, all data objects are short-lived or even mutable and can be reused. Short-lived objects can be more efficiently garbage-collected, which significantly reduces garbage collection pressure. Right now, the pre-allocated memory segments are long-lived objects on the JVM heap, but the Flink community is actively working on allocating off-heap memory for this purpose. This effort will result in much smaller JVM heaps and facilitate even faster garbage collection cycles.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Space efficient data representation.&lt;/strong&gt; Java objects have a storage overhead which can be avoided if the data is stored in a binary representation.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Efficient binary operations &amp;amp; cache sensitivity.&lt;/strong&gt; Binary data can be efficiently compared and operated on given a suitable binary representation. Furthermore, the binary representations can put related values, as well as hash codes, keys, and pointers, adjacently into memory. This typically yields data structures with more cache-efficient access patterns.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These properties of active memory management are very desirable in a data processing system for large-scale data analytics but have a significant price tag attached. Active memory management and operating on binary data are not trivial to implement; for example, using &lt;code&gt;java.util.HashMap&lt;/code&gt; is much easier than implementing a spillable hash table backed by byte arrays and a custom serialization stack. Of course, Apache Flink is not the only JVM-based data processing system that operates on serialized binary data. Projects such as &lt;a href=&quot;http://drill.apache.org/&quot;&gt;Apache Drill&lt;/a&gt;, &lt;a href=&quot;http://ignite.incubator.apache.org/&quot;&gt;Apache Ignite (incubating)&lt;/a&gt; or &lt;a href=&quot;http://projectgeode.org/&quot;&gt;Apache Geode (incubating)&lt;/a&gt; apply similar techniques, and it was recently announced that &lt;a href=&quot;http://spark.apache.org/&quot;&gt;Apache Spark&lt;/a&gt; will also evolve in this direction with &lt;a href=&quot;https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html&quot;&gt;Project Tungsten&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In the following we discuss in detail how Flink allocates memory, de/serializes objects, and operates on binary data. We will also show some performance numbers comparing processing objects on the heap and operating on binary data.&lt;/p&gt;

&lt;h2 id=&quot;how-does-flink-allocate-memory&quot;&gt;How does Flink allocate memory?&lt;/h2&gt;

&lt;p&gt;A Flink worker, called TaskManager, is composed of several internal components such as an actor system for coordination with the Flink master, an IOManager that takes care of spilling data to disk and reading it back, and a MemoryManager that coordinates memory usage. In the context of this blog post, the MemoryManager is of most interest.&lt;/p&gt;

&lt;p&gt;The MemoryManager takes care of allocating, accounting, and distributing MemorySegments to data processing operators such as sort and join operators. A &lt;a href=&quot;https://github.com/apache/flink/blob/release-0.9.0-milestone-1/flink-core/src/main/java/org/apache/flink/core/memory/MemorySegment.java&quot;&gt;MemorySegment&lt;/a&gt; is Flink’s distribution unit of memory and is backed by a regular Java byte array (size is 32 KB by default). A MemorySegment provides very efficient write and read access to its backing byte array using Java’s unsafe methods. You can think of a MemorySegment as a custom-tailored version of Java’s NIO ByteBuffer. In order to operate on multiple MemorySegments as if they were one larger chunk of consecutive memory, Flink uses logical views that implement Java’s &lt;code&gt;java.io.DataOutput&lt;/code&gt; and &lt;code&gt;java.io.DataInput&lt;/code&gt; interfaces.&lt;/p&gt;
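
&lt;p&gt;To make the ByteBuffer analogy concrete, here is a minimal sketch that writes two fixed-width records into a 32 KB byte array and reads one of them back by offset. It uses a plain &lt;code&gt;java.nio.ByteBuffer&lt;/code&gt; as a stand-in; the class &lt;code&gt;SegmentSketch&lt;/code&gt;, its methods, and the record layout (a long followed by an int) are hypothetical names chosen for illustration and are not part of Flink’s MemorySegment API.&lt;/p&gt;

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a memory segment: a fixed 32 KB byte array
// wrapped in a ByteBuffer. This is NOT Flink's MemorySegment class.
class SegmentSketch {
    static final int SEGMENT_SIZE = 32 * 1024;

    // Writes a (long id, int count) record at the given byte offset and
    // returns the offset just past the record.
    static int writeRecord(ByteBuffer segment, int offset, long id, int count) {
        segment.putLong(offset, id);
        segment.putInt(offset + Long.BYTES, count);
        return offset + Long.BYTES + Integer.BYTES;
    }

    // Reads the id field of the record that starts at the given offset.
    static long readId(ByteBuffer segment, int offset) {
        return segment.getLong(offset);
    }

    // Reads the count field of the record that starts at the given offset.
    static int readCount(ByteBuffer segment, int offset) {
        return segment.getInt(offset + Long.BYTES);
    }

    public static void main(String[] args) {
        ByteBuffer segment = ByteBuffer.wrap(new byte[SEGMENT_SIZE]);
        int next = writeRecord(segment, 0, 42L, 7);   // record 0 occupies bytes 0..11
        next = writeRecord(segment, next, 43L, 11);   // record 1 occupies bytes 12..23
        // prints "43 11 next=24"
        System.out.println(readId(segment, 12) + " " + readCount(segment, 12) + " next=" + next);
    }
}
```

&lt;p&gt;Because every record has a known width, a field can be read directly from its byte offset without creating an object per record, which is the property the sort and join operators rely on.&lt;/p&gt;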

&lt;p&gt;MemorySegments are allocated once at TaskManager start-up time and are destroyed when the TaskManager is shut down. Hence, they are reused and not garbage-collected over the whole lifetime of a TaskManager. After all internal data structures of a TaskManager have been initialized and all core services have been started, the MemoryManager starts creating MemorySegments. By default 70% of the JVM heap that is available after service initialization is allocated by the MemoryManager. It is also possible to configure an absolute amount of managed memory. The remaining JVM heap is used for objects that are instantiated during task processing, including objects created by user-defined functions. The following figure shows the memory distribution in the TaskManager JVM after startup.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/memory-alloc.png&quot; style=&quot;width:60%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;h2 id=&quot;how-does-flink-serialize-objects&quot;&gt;How does Flink serialize objects?&lt;/h2&gt;

&lt;p&gt;The Java ecosystem offers several libraries to convert objects into a binary representation and back. Common alternatives are standard Java serialization, &lt;a href=&quot;https://github.com/EsotericSoftware/kryo&quot;&gt;Kryo&lt;/a&gt;, &lt;a href=&quot;http://avro.apache.org/&quot;&gt;Apache Avro&lt;/a&gt;, &lt;a href=&quot;http://thrift.apache.org/&quot;&gt;Apache Thrift&lt;/a&gt;, or Google’s &lt;a href=&quot;https://github.com/google/protobuf&quot;&gt;Protobuf&lt;/a&gt;. Flink includes its own custom serialization framework in order to control the binary representation of data. This is important because operating on binary data, such as comparing or even manipulating it, requires exact knowledge of the serialization layout. Further, configuring the serialization layout with respect to the operations that are performed on binary data can yield a significant performance boost. Flink’s serialization stack also leverages the fact that the types of the objects going through de/serialization are known exactly before a program is executed.&lt;/p&gt;

&lt;p&gt;Flink programs can process data represented as arbitrary Java or Scala objects. Before a program is optimized, the data types at each processing step of the program’s data flow need to be identified. For Java programs, Flink features a reflection-based type extraction component to analyze the return types of user-defined functions. Scala programs are analyzed with the help of the Scala compiler. Flink represents each data type with a &lt;a href=&quot;https://github.com/apache/flink/blob/release-0.9.0-milestone-1/flink-core/src/main/java/org/apache/flink/api/common/typeinfo/TypeInformation.java&quot;&gt;TypeInformation&lt;/a&gt;. Flink has TypeInformations for several kinds of data types, including:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;BasicTypeInfo: Any (boxed) Java primitive type or java.lang.String.&lt;/li&gt;
  &lt;li&gt;BasicArrayTypeInfo: Any array of a (boxed) Java primitive type or java.lang.String.&lt;/li&gt;
  &lt;li&gt;WritableTypeInfo: Any implementation of Hadoop’s Writable interface.&lt;/li&gt;
  &lt;li&gt;TupleTypeInfo: Any Flink tuple (Tuple1 to Tuple25). Flink tuples are Java representations for fixed-length tuples with typed fields.&lt;/li&gt;
  &lt;li&gt;CaseClassTypeInfo: Any Scala CaseClass (including Scala tuples).&lt;/li&gt;
  &lt;li&gt;PojoTypeInfo: Any POJO (Java or Scala), i.e., an object with all fields either being public or accessible through getters and setters that follow the common naming conventions.&lt;/li&gt;
  &lt;li&gt;GenericTypeInfo: Any data type that cannot be identified as another type.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each TypeInformation provides a serializer for the data type it represents. For example, a BasicTypeInfo returns a serializer that writes the respective primitive type, the serializer of a WritableTypeInfo delegates de/serialization to the write() and readFields() methods of the object implementing Hadoop’s Writable interface, and a GenericTypeInfo returns a serializer that delegates serialization to Kryo. Object serialization to a DataOutput which is backed by Flink MemorySegments goes automatically through Java’s efficient unsafe operations. For data types that can be used as keys, i.e., compared and hashed, the TypeInformation provides TypeComparators. TypeComparators compare and hash objects and can - depending on the concrete data type - also efficiently compare binary representations and extract fixed-length binary key prefixes.&lt;/p&gt;

&lt;p&gt;Tuple, Pojo, and CaseClass types are composite types, i.e., containers for one or more possibly nested data types. As such, their serializers and comparators are also composite and delegate the serialization and comparison of their member data types to the respective serializers and comparators. The following figure illustrates the serialization of a (nested) &lt;code&gt;Tuple3&amp;lt;Integer, Double, Person&amp;gt;&lt;/code&gt; object where &lt;code&gt;Person&lt;/code&gt; is a POJO and defined as follows:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Person&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/data-serialization.png&quot; style=&quot;width:80%;margin:15px&quot; /&gt;
&lt;/center&gt;
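
&lt;p&gt;The delegation between composite and member serializers can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions: the &lt;code&gt;Serializer&lt;/code&gt; interface and the class &lt;code&gt;CompositeSerializerSketch&lt;/code&gt; are hypothetical and much simpler than Flink’s real serializer contract, the output goes to a &lt;code&gt;DataOutputStream&lt;/code&gt; instead of a MemorySegment-backed view, and the byte layout produced by &lt;code&gt;writeInt&lt;/code&gt; and &lt;code&gt;writeUTF&lt;/code&gt; is not necessarily the layout Flink uses.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class CompositeSerializerSketch {

    // Hypothetical interface for illustration only.
    interface Serializer&amp;lt;T&amp;gt; {
        void serialize(T value, DataOutput out) throws IOException;
        T deserialize(DataInput in) throws IOException;
    }

    public static class Person {
        public int id;
        public String name;
    }

    static final Serializer&amp;lt;Integer&amp;gt; INT = new Serializer&amp;lt;Integer&amp;gt;() {
        public void serialize(Integer v, DataOutput out) throws IOException { out.writeInt(v); }
        public Integer deserialize(DataInput in) throws IOException { return in.readInt(); }
    };

    static final Serializer&amp;lt;String&amp;gt; STRING = new Serializer&amp;lt;String&amp;gt;() {
        public void serialize(String v, DataOutput out) throws IOException { out.writeUTF(v); }
        public String deserialize(DataInput in) throws IOException { return in.readUTF(); }
    };

    // Composite serializer: each field is handed to the serializer of its own type.
    static final Serializer&amp;lt;Person&amp;gt; PERSON = new Serializer&amp;lt;Person&amp;gt;() {
        public void serialize(Person p, DataOutput out) throws IOException {
            INT.serialize(p.id, out);
            STRING.serialize(p.name, out);
        }
        public Person deserialize(DataInput in) throws IOException {
            Person p = new Person();
            p.id = INT.deserialize(in);
            p.name = STRING.deserialize(in);
            return p;
        }
    };

    static byte[] toBytes(Person p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            PERSON.serialize(p, new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Person fromBytes(byte[] data) {
        try {
            return PERSON.deserialize(new DataInputStream(new ByteArrayInputStream(data)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A serializer for &lt;code&gt;Tuple3&amp;lt;Integer, Double, Person&amp;gt;&lt;/code&gt; would follow the same pattern and simply call the serializers of its three fields one after the other.&lt;/p&gt;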

&lt;p&gt;Flink’s type system can be easily extended by providing custom TypeInformations, Serializers, and Comparators to improve the performance of serializing and comparing custom data types.&lt;/p&gt;

&lt;h2 id=&quot;how-does-flink-operate-on-binary-data&quot;&gt;How does Flink operate on binary data?&lt;/h2&gt;

&lt;p&gt;Similar to many other data processing APIs (including SQL), Flink’s APIs provide transformations to group, sort, and join data sets. These transformations operate on potentially very large data sets. Relational database systems have featured very efficient algorithms for these purposes for several decades, including external merge-sort, merge-join, and hybrid hash-join. Flink builds on this technology, but generalizes it to handle arbitrary objects using its custom serialization and comparison stack. In the following, we show how Flink operates on binary data using the example of Flink’s in-memory sort algorithm.&lt;/p&gt;

&lt;p&gt;Flink assigns a memory budget to its data processing operators. Upon initialization, a sort algorithm requests its memory budget from the MemoryManager and receives a corresponding set of MemorySegments. The set of MemorySegments becomes the memory pool of a so-called sort buffer, which collects the data that is to be sorted. The following figure illustrates how data objects are serialized into the sort buffer.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/sorting-binary-data-1.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The sort buffer is internally organized into two memory regions. The first region holds the full binary data of all objects. The second region contains pointers to the full binary object data and - depending on the key data type - fixed-length sort keys. When an object is added to the sort buffer, its binary data is appended to the first region, and a pointer (and possibly a key) is appended to the second region. The separation of actual data and pointers plus fixed-length keys is done for two purposes. It enables efficient swapping of fixed-length entries (key+pointer) and also reduces the data that needs to be moved when sorting. If the sort key is a variable-length data type such as a String, the fixed-length sort key must be a prefix key such as the first n characters of a String. Note that not all data types provide a fixed-length (prefix) sort key. When serializing objects into the sort buffer, both memory regions are extended with MemorySegments from the memory pool. Once the memory pool is empty and no more objects can be added, the sort buffer is completely filled and can be sorted. Flink’s sort buffer provides methods to compare and swap elements. This makes the actual sort algorithm pluggable. By default, Flink uses a Quicksort implementation which can fall back to HeapSort.
The following figure shows how two objects are compared.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/sorting-binary-data-2.png&quot; style=&quot;width:80%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The sort buffer compares two elements by comparing their binary fixed-length sort keys. The comparison is conclusive if it is done on a full key (not a prefix key) or if the binary prefix keys are not equal. If the prefix keys are equal (or the sort key data type does not provide a binary prefix key), the sort buffer follows the pointers to the actual object data, deserializes both objects, and compares the objects. Depending on the result of the comparison, the sort algorithm decides whether to swap the compared elements or not. The sort buffer swaps two elements by moving their fixed-length keys and pointers. The actual data is not moved. Once the sort algorithm finishes, the pointers in the sort buffer are correctly ordered. The following figure shows how the sorted data is returned from the sort buffer.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/sorting-binary-data-3.png&quot; style=&quot;width:80%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The sorted data is returned by sequentially reading the pointer region of the sort buffer, skipping the sort keys and following the sorted pointers to the actual data. This data is either deserialized and returned as objects or the binary representation is copied and written to disk in case of an external merge-sort (see this &lt;a href=&quot;http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html&quot;&gt;blog post on joins in Flink&lt;/a&gt;).&lt;/p&gt;
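
&lt;p&gt;The interplay of prefix keys, pointers, and the fallback to the full data can be sketched as follows. This is a simplified illustration, not Flink’s sort buffer: the class &lt;code&gt;PrefixKeySortSketch&lt;/code&gt; and its methods are hypothetical, a &lt;code&gt;String[]&lt;/code&gt; stands in for the region of serialized records (so “deserializing” is just an array lookup), an insertion sort replaces Quicksort for brevity, and the 8-byte prefix is assumed to be built from ASCII strings so that byte order and String order agree.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.nio.charset.StandardCharsets;

public class PrefixKeySortSketch {

    static final int PREFIX_BYTES = 8;

    // Packs the first PREFIX_BYTES bytes of the string (zero-padded) big-endian into a long.
    static long prefixKey(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        long key = 0;
        for (int i = 0; i &amp;lt; PREFIX_BYTES; i++) {
            int b = i &amp;lt; utf8.length ? (utf8[i] &amp;amp; 0xFF) : 0;
            key = (key &amp;lt;&amp;lt; 8) | b;
        }
        return key;
    }

    // Compares two entries by prefix key; follows the pointers only if the prefixes tie.
    static int compare(long[] keys, int[] pointers, String[] data, int i, int j) {
        int byPrefix = Long.compareUnsigned(keys[i], keys[j]);
        if (byPrefix != 0) {
            return byPrefix;
        }
        return data[pointers[i]].compareTo(data[pointers[j]]);
    }

    // Sorts by moving only keys and pointers; the record data itself never moves.
    static String[] sort(String[] data) {
        int n = data.length;
        long[] keys = new long[n];
        int[] pointers = new int[n];
        for (int i = 0; i &amp;lt; n; i++) {
            keys[i] = prefixKey(data[i]);
            pointers[i] = i;
        }
        for (int i = 1; i &amp;lt; n; i++) {
            for (int j = i; j &amp;gt; 0 &amp;amp;&amp;amp; compare(keys, pointers, data, j - 1, j) &amp;gt; 0; j--) {
                long k = keys[j]; keys[j] = keys[j - 1]; keys[j - 1] = k;
                int p = pointers[j]; pointers[j] = pointers[j - 1]; pointers[j - 1] = p;
            }
        }
        // Return the records by reading the pointer region in order.
        String[] sorted = new String[n];
        for (int i = 0; i &amp;lt; n; i++) {
            sorted[i] = data[pointers[i]];
        }
        return sorted;
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;In this sketch, &lt;code&gt;flink-core&lt;/code&gt; and &lt;code&gt;flink-cor&lt;/code&gt; share the same 8-byte prefix, so only that pair needs the full comparison. Any pair whose prefixes differ is ordered without touching the record data.&lt;/p&gt;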

&lt;h2 id=&quot;show-me-numbers&quot;&gt;Show me numbers!&lt;/h2&gt;

&lt;p&gt;So, what does operating on binary data mean for performance? We’ll run a benchmark that sorts 10 million &lt;code&gt;Tuple2&amp;lt;Integer, String&amp;gt;&lt;/code&gt; objects to find out. The values of the Integer field are sampled from a uniform distribution. The String field values have a length of 12 characters and are sampled from a long-tail distribution. The input data is provided by an iterator that returns a mutable object, i.e., the same tuple object instance is returned with different field values. Flink uses this technique when reading data from memory, network, or disk to avoid unnecessary object instantiations. The benchmarks are run in a JVM with 900 MB heap size which is approximately the required amount of memory to store and sort 10 million tuple objects on the heap without dying of an &lt;code&gt;OutOfMemoryError&lt;/code&gt;. We sort the tuples on the Integer field and on the String field using three sorting methods:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;&lt;strong&gt;Object-on-heap.&lt;/strong&gt; The tuples are stored in a regular &lt;code&gt;java.util.ArrayList&lt;/code&gt; with initial capacity set to 10 million entries and sorted using Java’s regular collection sort.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Flink-serialized.&lt;/strong&gt; The tuple fields are serialized into a sort buffer of 600 MB size using Flink’s custom serializers, sorted as described above, and finally deserialized again. When sorting on the Integer field, the full Integer is used as sort key such that the sort happens entirely on binary data (no deserialization of objects required). For sorting on the String field, an 8-byte prefix key is used and tuple objects are deserialized if the prefix keys are equal.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Kryo-serialized.&lt;/strong&gt; The tuple fields are serialized into a sort buffer of 600 MB size using Kryo serialization and sorted without binary sort keys. This means that each pair-wise comparison requires two objects to be deserialized.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;All sort methods are implemented using a single thread. The reported times are averaged over ten runs. After each run, we call &lt;code&gt;System.gc()&lt;/code&gt; to request a garbage collection run which does not go into measured execution time. The following figure shows the time to store the input data in memory, sort it, and read it back as objects.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/sort-benchmark.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;We see that Flink’s sort on binary data using its own serializers significantly outperforms the other two methods. Compared to the object-on-heap method, loading the data into memory is much faster. Because the object-on-heap method actually collects the objects, it cannot reuse object instances and has to re-create every tuple. This is less efficient than Flink’s serializers (or Kryo serialization). On the other hand, reading objects from the heap comes for free compared to deserialization. In our benchmark, object cloning was more expensive than serialization and deserialization combined. Looking at the sorting time, we see that sorting on the binary representation is also faster than Java’s collection sort. Sorting data that was serialized using Kryo without a binary sort key is much slower than both other methods. This is due to the heavy deserialization overhead. Sorting the tuples on their String field is faster than sorting on the Integer field due to the long-tailed value distribution, which significantly reduces the number of pair-wise comparisons. To get a better feeling for what is happening during sorting, we monitored the executing JVM using VisualVM. The following screenshots show heap memory usage, garbage collection activity, and CPU usage over the execution of 10 runs.&lt;/p&gt;

&lt;table width=&quot;100%&quot;&gt;
  &lt;tr&gt;
    &lt;th&gt;&lt;/th&gt;
    &lt;th&gt;&lt;center&gt;&lt;b&gt;Garbage Collection&lt;/b&gt;&lt;/center&gt;&lt;/th&gt;
    &lt;th&gt;&lt;center&gt;&lt;b&gt;Memory Usage&lt;/b&gt;&lt;/center&gt;&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;b&gt;Object-on-Heap (int)&lt;/b&gt;&lt;/td&gt;
    &lt;td&gt;&lt;img src=&quot;/img/blog/objHeap-int-gc.png&quot; style=&quot;width:80%&quot; /&gt;&lt;/td&gt;
    &lt;td&gt;&lt;img src=&quot;/img/blog/objHeap-int-mem.png&quot; style=&quot;width:80%&quot; /&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;b&gt;Flink-Serialized (int)&lt;/b&gt;&lt;/td&gt;
    &lt;td&gt;&lt;img src=&quot;/img/blog/flinkSer-int-gc.png&quot; style=&quot;width:80%&quot; /&gt;&lt;/td&gt;
    &lt;td&gt;&lt;img src=&quot;/img/blog/flinkSer-int-mem.png&quot; style=&quot;width:80%&quot; /&gt;&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;b&gt;Kryo-Serialized (int)&lt;/b&gt;&lt;/td&gt;
    &lt;td&gt;&lt;img src=&quot;/img/blog/kryoSer-int-gc.png&quot; style=&quot;width:80%&quot; /&gt;&lt;/td&gt;
    &lt;td&gt;&lt;img src=&quot;/img/blog/kryoSer-int-mem.png&quot; style=&quot;width:80%&quot; /&gt;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;The experiments run single-threaded on an 8-core machine, so full utilization of one core only corresponds to a 12.5% overall utilization. The screenshots show that operating on binary data significantly reduces garbage collection activity. For the object-on-heap approach, the garbage collector runs in very short intervals while filling the sort buffer and causes a lot of CPU usage even for a single processing thread (sorting itself does not trigger the garbage collector). The JVM garbage collects with multiple parallel threads, explaining the high overall CPU utilization. On the other hand, the methods that operate on serialized data rarely trigger the garbage collector and have a much lower CPU utilization. In fact the garbage collector does not run at all if the tuples are sorted on the Integer field using the flink-serialized method because no objects need to be deserialized for pair-wise comparisons. The kryo-serialized method requires slightly more garbage collection since it does not use binary sort keys and deserializes two objects for each comparison.&lt;/p&gt;

&lt;p&gt;The memory usage charts show that the flink-serialized and kryo-serialized methods constantly occupy a high amount of memory (plus some objects for operation). This is due to the pre-allocation of MemorySegments. The actual memory usage is much lower, because the sort buffers are not completely filled. The following table shows the memory consumption of each method. 10 million records result in about 280 MB of binary data (object data plus pointers and sort keys), depending on the serializer used and the presence and size of a binary sort key. Comparing this to the memory requirements of the object-on-heap approach, we see that operating on binary data can significantly improve memory efficiency. In our benchmark, more than twice as much data can be sorted in memory if it is serialized into a sort buffer instead of being held as objects on the heap.&lt;/p&gt;

&lt;table width=&quot;100%&quot;&gt;
  &lt;tr&gt;
  	&lt;th&gt;Occupied Memory&lt;/th&gt;
    &lt;th&gt;Object-on-Heap&lt;/th&gt;
    &lt;th&gt;Flink-Serialized&lt;/th&gt;
    &lt;th&gt;Kryo-Serialized&lt;/th&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;b&gt;Sort on Integer&lt;/b&gt;&lt;/td&gt;
    &lt;td&gt;approx. 700 MB (heap)&lt;/td&gt;
    &lt;td&gt;277 MB (sort buffer)&lt;/td&gt;
    &lt;td&gt;266 MB (sort buffer)&lt;/td&gt;
  &lt;/tr&gt;
  &lt;tr&gt;
    &lt;td&gt;&lt;b&gt;Sort on String&lt;/b&gt;&lt;/td&gt;
    &lt;td&gt;approx. 700 MB (heap)&lt;/td&gt;
    &lt;td&gt;315 MB (sort buffer)&lt;/td&gt;
    &lt;td&gt;266 MB (sort buffer)&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;

&lt;p&gt;&lt;br /&gt;&lt;/p&gt;

&lt;p&gt;To summarize, the experiments verify the previously stated benefits of operating on binary data.&lt;/p&gt;

&lt;h2 id=&quot;were-not-done-yet&quot;&gt;We’re not done yet!&lt;/h2&gt;

&lt;p&gt;Apache Flink features quite a few advanced techniques to safely and efficiently process huge amounts of data with limited memory resources. However, there are a few points that could make Flink even more efficient. The Flink community is working on moving the managed memory to off-heap memory. This will allow for smaller JVMs, lower garbage collection overhead, and also easier system configuration. With Flink’s Table API, the semantics of all operations such as aggregations and projections are known (in contrast to black-box user-defined functions). Hence we can generate code for Table API operations that directly operates on binary data. Further improvements include serialization layouts which are tailored towards the operations that are applied on the binary data, and code generation for serializers and comparators.&lt;/p&gt;

&lt;p&gt;The groundwork (and a lot more) for operating on binary data is done, but there is still some room for making Flink even better and faster. If you are crazy about performance and like to juggle with a lot of bits and bytes, join the Flink community!&lt;/p&gt;

&lt;h2 id=&quot;tldr-give-me-three-things-to-remember&quot;&gt;TL;DR; Give me three things to remember!&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Flink’s active memory management avoids nasty &lt;code&gt;OutOfMemoryErrors&lt;/code&gt; that kill your JVMs and reduces garbage collection overhead.&lt;/li&gt;
  &lt;li&gt;Flink features a highly efficient data de/serialization stack that facilitates operations on binary data and makes more data fit into memory.&lt;/li&gt;
  &lt;li&gt;Flink’s DBMS-style operators operate natively on binary data, yielding high performance in memory, and destage gracefully to disk if necessary.&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Mon, 11 May 2015 12:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html</link>
<guid isPermaLink="true">/news/2015/05/11/Juggling-with-Bits-and-Bytes.html</guid>
</item>

<item>
<title>Announcing Flink 0.9.0-milestone1 preview release</title>
<description>&lt;p&gt;The Apache Flink community is pleased to announce the availability of
the 0.9.0-milestone-1 release. The release is a preview of the
upcoming 0.9.0 release. It contains many new features which will be
available in the upcoming 0.9 release. Interested users are encouraged
to try it out and give feedback. As the version number indicates, this
release is a preview release that contains known issues.&lt;/p&gt;

&lt;p&gt;You can download the release
&lt;a href=&quot;http://flink.apache.org/downloads.html#preview&quot;&gt;here&lt;/a&gt; and check out the
latest documentation
&lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-master/&quot;&gt;here&lt;/a&gt;. Feedback
through the Flink &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing
lists&lt;/a&gt; is, as
always, very welcome!&lt;/p&gt;

&lt;h2 id=&quot;new-features&quot;&gt;New Features&lt;/h2&gt;

&lt;h3 id=&quot;table-api&quot;&gt;Table API&lt;/h3&gt;

&lt;p&gt;Flink’s new Table API offers a higher-level abstraction for
interacting with structured data sources. The Table API allows users
to execute logical, SQL-like queries on distributed data sets while
allowing them to freely mix declarative queries with regular Flink
operators. Here is an example that groups and joins two tables:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clickCounts&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;clicks&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;userId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;userId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;url&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;activeUsers&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;clickCounts&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;userId&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;username&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Tables consist of logical attributes that can be selected by name
rather than physical Java and Scala data types. This removes a lot
of boilerplate code for common ETL tasks and raises the abstraction
level of Flink programs. Tables are available for both static and streaming
data sources (DataSet and DataStream APIs).&lt;/p&gt;

&lt;p&gt;Check out the Table guide for Java and Scala
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/libs/table.html&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;gelly-graph-processing-api&quot;&gt;Gelly Graph Processing API&lt;/h3&gt;

&lt;p&gt;Gelly is a Java Graph API for Flink. It contains a set of utilities
for graph analysis, support for iterative graph processing, and a
library of graph algorithms. Gelly exposes a Graph data structure that
wraps DataSets for vertices and edges. It also provides methods for creating
graphs from DataSets, graph transformations and utilities
(e.g., in- and out-degrees of vertices), neighborhood aggregations, iterative
vertex-centric graph processing, and a library of common graph
algorithms, including PageRank, SSSP, label propagation, and community
detection.&lt;/p&gt;

&lt;p&gt;Gelly internally builds on top of Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/iterations.html&quot;&gt;delta
iterations&lt;/a&gt;. Iterative
graph algorithms are executed leveraging mutable state, achieving
performance similar to that of specialized graph processing systems.&lt;/p&gt;

&lt;p&gt;Gelly will eventually subsume Spargel, Flink’s Pregel-like API. Check
out the Gelly guide
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/libs/gelly.html&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;flink-machine-learning-library&quot;&gt;Flink Machine Learning Library&lt;/h3&gt;

&lt;p&gt;This release includes the first version of Flink’s Machine Learning
library. The library’s pipeline approach, which has been strongly
inspired by scikit-learn’s abstraction of transformers and estimators,
makes it easy to quickly set up a data processing pipeline and to get
your job done.&lt;/p&gt;

&lt;p&gt;Flink distinguishes between transformers and learners. Transformers
are components which transform your input data into a new format
allowing you to extract features, cleanse your data or to sample from
it. Learners on the other hand constitute the components which take
your input data and train a model on it. The model you obtain from the
learner can then be evaluated and used to make predictions on unseen
data.&lt;/p&gt;

&lt;p&gt;Currently, the machine learning library contains transformers and
learners for multiple tasks. The library supports multiple linear
regression using a stochastic gradient implementation to scale to
large data sizes. Furthermore, it includes an alternating least
squares (ALS) implementation to factorize large matrices. The matrix
factorization can be used to do collaborative filtering. An
implementation of the communication-efficient distributed dual
coordinate ascent (CoCoA) algorithm is the latest addition to the
library. The CoCoA algorithm can be used to train distributed
soft-margin SVMs.&lt;/p&gt;

&lt;h3 id=&quot;flink-on-yarn-leveraging-apache-tez&quot;&gt;Flink on YARN leveraging Apache Tez&lt;/h3&gt;

&lt;p&gt;We are introducing a new execution mode for Flink to be able to run
restricted Flink programs on top of &lt;a href=&quot;http://tez.apache.org&quot;&gt;Apache
Tez&lt;/a&gt;. This mode retains Flink’s APIs,
optimizer, as well as Flink’s runtime operators, but instead of
wrapping those in Flink tasks that are executed by Flink TaskManagers,
it wraps them in Tez runtime tasks and builds a Tez DAG that
represents the program.&lt;/p&gt;

&lt;p&gt;By using Flink on Tez, users have an additional choice for an
execution platform for Flink programs. While Flink’s distributed
runtime favors low latency, streaming shuffles, and iterative
algorithms, Tez focuses on scalability and elastic resource usage in
shared YARN clusters.&lt;/p&gt;

&lt;p&gt;Get started with Flink on Tez
&lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-master/setup/flink_on_tez.html&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;reworked-distributed-runtime-on-akka&quot;&gt;Reworked Distributed Runtime on Akka&lt;/h3&gt;

&lt;p&gt;Flink’s RPC system has been replaced by the widely adopted
&lt;a href=&quot;http://akka.io&quot;&gt;Akka&lt;/a&gt; framework. Akka’s concurrency model offers the
right abstraction to develop a fast as well as robust distributed
system. By using Akka’s own failure detection mechanism, the stability
of Flink’s runtime is significantly improved, because the system can
now react properly to node outages. Furthermore, Akka improves
Flink’s scalability by introducing asynchronous messages to the
system. These asynchronous messages allow Flink to be run on many more
nodes than before.&lt;/p&gt;

&lt;h3 id=&quot;exactly-once-processing-on-kafka-streaming-sources&quot;&gt;Exactly-once processing on Kafka Streaming Sources&lt;/h3&gt;

&lt;p&gt;This release introduces stream processing with exactly-once delivery
guarantees for Flink streaming programs that analyze streaming sources
that are persisted by &lt;a href=&quot;http://kafka.apache.org&quot;&gt;Apache Kafka&lt;/a&gt;. The
system internally tracks the Kafka offsets to ensure that Flink
can pick up data from Kafka where it left off in case of a failure.&lt;/p&gt;

&lt;p&gt;Read
&lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming_guide.html#apache-kafka&quot;&gt;here&lt;/a&gt;
on how to use the persistent Kafka source.&lt;/p&gt;

&lt;h3 id=&quot;improved-yarn-support&quot;&gt;Improved YARN support&lt;/h3&gt;

&lt;p&gt;Flink’s YARN client contains several improvements, such as a detached
mode for starting a YARN session in the background and the ability to
submit a single Flink job to a YARN cluster without starting a
session, including a “fire and forget” mode. Flink is now also able to
reallocate failed YARN containers to maintain the size of the
requested cluster. This feature makes it possible to implement fault-tolerant
setups on top of YARN. There is also an internal Java API to deploy
and control a running YARN cluster. This is being used by system
integrators to easily control Flink on YARN within their Hadoop 2
cluster.&lt;/p&gt;

&lt;p&gt;See the YARN docs
&lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-master/setup/yarn_setup.html&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;more-improvements-and-fixes&quot;&gt;More Improvements and Fixes&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1605&quot;&gt;FLINK-1605&lt;/a&gt;:
Flink is not exposing its Guava and ASM dependencies to Maven
projects depending on Flink. We use the maven-shade-plugin to
relocate these dependencies into our own namespace. This allows
users to use any Guava or ASM version.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1417&quot;&gt;FLINK-1417&lt;/a&gt;:
Automatic recognition and registration of Java types with Kryo and the
internal serializers: Flink has its own type handling and
serialization framework and falls back to Kryo for types that it cannot
handle. To get the best performance, Flink automatically registers
all types a user is using in their program with Kryo. Flink also
registers serializers for Protocol Buffers, Thrift, Avro, and Joda-Time
automatically. Users can also manually register serializers with Kryo
(https://issues.apache.org/jira/browse/FLINK-1399).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1296&quot;&gt;FLINK-1296&lt;/a&gt;: Add
support for sorting very large records&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1679&quot;&gt;FLINK-1679&lt;/a&gt;:
“degreeOfParallelism” methods renamed to “parallelism”&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1501&quot;&gt;FLINK-1501&lt;/a&gt;: Add
metrics library for monitoring TaskManagers&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1760&quot;&gt;FLINK-1760&lt;/a&gt;: Add
support for building Flink with Scala 2.11&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1648&quot;&gt;FLINK-1648&lt;/a&gt;: Add
a mode where the system automatically sets the parallelism to the
available task slots&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1622&quot;&gt;FLINK-1622&lt;/a&gt;: Add
groupCombine operator&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1589&quot;&gt;FLINK-1589&lt;/a&gt;: Add
option to pass Configuration to LocalExecutor&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1504&quot;&gt;FLINK-1504&lt;/a&gt;: Add
support for accessing secured HDFS clusters in standalone mode&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1478&quot;&gt;FLINK-1478&lt;/a&gt;: Add
strictly local input split assignment&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1512&quot;&gt;FLINK-1512&lt;/a&gt;: Add
CsvReader for reading into POJOs.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1461&quot;&gt;FLINK-1461&lt;/a&gt;: Add
sortPartition operator&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1450&quot;&gt;FLINK-1450&lt;/a&gt;: Add
Fold operator to the Streaming API&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1389&quot;&gt;FLINK-1389&lt;/a&gt;:
Allow setting custom file extensions for files created by the
FileOutputFormat&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1236&quot;&gt;FLINK-1236&lt;/a&gt;: Add
support for localization of Hadoop Input Splits&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1179&quot;&gt;FLINK-1179&lt;/a&gt;: Add
button to JobManager web interface to request stack trace of a
TaskManager&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1105&quot;&gt;FLINK-1105&lt;/a&gt;: Add
support for locally sorted output&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1688&quot;&gt;FLINK-1688&lt;/a&gt;: Add
socket sink&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1436&quot;&gt;FLINK-1436&lt;/a&gt;:
Improve usability of command line interface&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Mon, 13 Apr 2015 12:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2015/04/13/release-0.9.0-milestone1.html</link>
<guid isPermaLink="true">/news/2015/04/13/release-0.9.0-milestone1.html</guid>
</item>

<item>
<title>March 2015 in the Flink community</title>
<description>&lt;p&gt;March has been a busy month in the Flink community.&lt;/p&gt;

&lt;h3 id=&quot;scaling-als&quot;&gt;Scaling ALS&lt;/h3&gt;

&lt;p&gt;Flink committers employed at &lt;a href=&quot;http://data-artisans.com&quot;&gt;data Artisans&lt;/a&gt; published a &lt;a href=&quot;http://data-artisans.com/how-to-factorize-a-700-gb-matrix-with-apache-flink/&quot;&gt;blog post&lt;/a&gt; on how they scaled matrix factorization with Flink and Google Compute Engine to matrices with 28 billion elements.&lt;/p&gt;

&lt;h3 id=&quot;learn-about-the-internals-of-flink&quot;&gt;Learn about the internals of Flink&lt;/h3&gt;

&lt;p&gt;The community has started an effort to better document the internals
of Flink. Check out the first articles on the Flink wiki on &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741525&quot;&gt;how Flink
manages
memory&lt;/a&gt;,
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Data+exchange+between+tasks&quot;&gt;how tasks in Flink exchange
data&lt;/a&gt;,
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Type+System%2C+Type+Extraction%2C+Serialization&quot;&gt;type extraction and serialization in
Flink&lt;/a&gt;,
as well as &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Akka+and+Actors&quot;&gt;how Flink builds on Akka for distributed
coordination&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Also check out the &lt;a href=&quot;http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html&quot;&gt;new blog
post&lt;/a&gt;
on how Flink executes joins with several insights into Flink’s runtime.&lt;/p&gt;

&lt;h3 id=&quot;meetups-and-talks&quot;&gt;Meetups and talks&lt;/h3&gt;

&lt;p&gt;Flink’s machine learning efforts were presented at the &lt;a href=&quot;http://www.meetup.com/Machine-Learning-Stockholm/events/221144997/&quot;&gt;Machine
Learning Stockholm meetup
group&lt;/a&gt;. The
regular Berlin Flink meetup featured a talk on the past, present, and
future of Flink. The talk is available on
&lt;a href=&quot;https://www.youtube.com/watch?v=fw2DBE6ZiEQ&amp;amp;feature=youtu.be&quot;&gt;YouTube&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;in-the-flink-master&quot;&gt;In the Flink master&lt;/h2&gt;

&lt;h3 id=&quot;table-api-in-scala-and-java&quot;&gt;Table API in Scala and Java&lt;/h3&gt;

&lt;p&gt;The new &lt;a href=&quot;https://github.com/apache/flink/tree/master/flink-libraries/flink-table&quot;&gt;Table
API&lt;/a&gt;
in Flink is now available in both Java and Scala. Check out the
examples &lt;a href=&quot;https://github.com/apache/flink/blob/master/flink-libraries/flink-table/src/main/java/org/apache/flink/examples/java/JavaTableExample.java&quot;&gt;here (Java)&lt;/a&gt; and &lt;a href=&quot;https://github.com/apache/flink/tree/master/flink-libraries/flink-table/src/main/scala/org/apache/flink/examples/scala&quot;&gt;here (Scala)&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;additions-to-the-machine-learning-library&quot;&gt;Additions to the Machine Learning library&lt;/h3&gt;

&lt;p&gt;Flink’s &lt;a href=&quot;https://github.com/apache/flink/tree/master/flink-libraries/flink-ml&quot;&gt;Machine Learning
library&lt;/a&gt;
is seeing quite a bit of traction. Recent additions include the &lt;a href=&quot;http://arxiv.org/abs/1409.1458&quot;&gt;CoCoA
algorithm&lt;/a&gt; for distributed
optimization.&lt;/p&gt;

&lt;h3 id=&quot;exactly-once-delivery-guarantees-for-streaming-jobs&quot;&gt;Exactly-once delivery guarantees for streaming jobs&lt;/h3&gt;

&lt;p&gt;Flink streaming jobs now provide exactly-once processing guarantees
when coupled with persistent sources (notably &lt;a href=&quot;http://kafka.apache.org&quot;&gt;Apache
Kafka&lt;/a&gt;). Flink periodically checkpoints and
persists the offsets of the sources and restarts from those
checkpoints at failure recovery. This functionality is currently
limited in that it does not yet handle large state and iterative
programs.&lt;/p&gt;

</description>
<pubDate>Tue, 07 Apr 2015 12:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2015/04/07/march-in-flink.html</link>
<guid isPermaLink="true">/news/2015/04/07/march-in-flink.html</guid>
</item>

<item>
<title>Peeking into Apache Flink&#39;s Engine Room</title>
<description>&lt;h3 id=&quot;join-processing-in-apache-flink&quot;&gt;Join Processing in Apache Flink&lt;/h3&gt;

&lt;p&gt;Joins are prevalent operations in many data processing applications. Most data processing systems feature APIs that make joining data sets very easy. However, the internal algorithms for join processing are much more involved – especially if large data sets need to be efficiently handled. Therefore, join processing serves as a good example to discuss the salient design points and implementation details of a data processing system.&lt;/p&gt;

&lt;p&gt;In this blog post, we cut through Apache Flink’s layered architecture and take a look at its internals with a focus on how it handles joins. Specifically, I will&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;show how easy it is to join data sets using Flink’s fluent APIs,&lt;/li&gt;
  &lt;li&gt;discuss basic distributed join strategies, Flink’s join implementations, and its memory management,&lt;/li&gt;
  &lt;li&gt;talk about Flink’s optimizer that automatically chooses join strategies,&lt;/li&gt;
  &lt;li&gt;show some performance numbers for joining data sets of different sizes, and finally&lt;/li&gt;
  &lt;li&gt;briefly discuss joining of co-located and pre-sorted data sets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Disclaimer&lt;/em&gt;: This blog post is exclusively about equi-joins. Whenever I say “join” in the following, I actually mean “equi-join”.&lt;/p&gt;

&lt;h3 id=&quot;how-do-i-join-with-flink&quot;&gt;How do I join with Flink?&lt;/h3&gt;

&lt;p&gt;Flink provides fluent APIs in Java and Scala to write data flow programs. Flink’s APIs are centered around parallel data collections which are called data sets. Data sets are processed by applying transformations that compute new data sets. Flink’s transformations include Map and Reduce as known from MapReduce &lt;a href=&quot;http://research.google.com/archive/mapreduce.html&quot;&gt;[1]&lt;/a&gt; but also operators for joining, co-grouping, and iterative processing. The documentation gives an overview of all available transformations &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html&quot;&gt;[2]&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Joining two Scala case class data sets is very easy as the following example shows:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// define your data types&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PageVisit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;url&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ip&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;userId&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;User&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;email&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;country&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// get your data from somewhere&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;visits&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;DataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;PageVisit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;DataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;User&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;// filter the users data set&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;germanUsers&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;users&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;country&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;equals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;de&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// join data sets&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;germanVisits&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;DataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;PageVisit&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;User&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt;
      &lt;span class=&quot;c1&quot;&gt;// equi-join condition (PageVisit.userId = User.id)&lt;/span&gt;
     &lt;span class=&quot;n&quot;&gt;visits&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;germanUsers&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;userId&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;equalTo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;id&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Flink’s APIs also allow you to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;apply a user-defined join function to each pair of joined elements instead of returning a &lt;code&gt;($Left, $Right)&lt;/code&gt; tuple,&lt;/li&gt;
  &lt;li&gt;select fields of pairs of joined Tuple elements (projection), and&lt;/li&gt;
  &lt;li&gt;define composite join keys such as &lt;code&gt;.where(&amp;quot;orderDate&amp;quot;, &amp;quot;zipCode&amp;quot;).equalTo(&amp;quot;date&amp;quot;, &amp;quot;zip&amp;quot;)&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See the documentation for more details on Flink’s join features &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html#join&quot;&gt;[3]&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;how-does-flink-join-my-data&quot;&gt;How does Flink join my data?&lt;/h3&gt;

&lt;p&gt;Flink uses techniques which are well known from parallel database systems to efficiently execute parallel joins. A join operator must establish all pairs of elements from its input data sets for which the join condition evaluates to true. In a standalone system, the most straightforward implementation of a join is the so-called nested-loop join, which builds the full Cartesian product and evaluates the join condition for each pair of elements. This strategy has quadratic complexity and obviously does not scale to large inputs.&lt;/p&gt;
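
&lt;p&gt;To make the quadratic cost concrete, here is a minimal sketch of a nested-loop join on plain Scala collections. It is not Flink code; the object and function names are made up for this illustration, and the case classes simply mirror the ones from the API example above.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Illustrative sketch on plain collections, not Flink code.
object NestedLoopJoinSketch {
  case class PageVisit(url: String, ip: String, userId: Long)
  case class User(id: Long, name: String, email: String, country: String)

  // Evaluate the join condition for every pair of the Cartesian product.
  def nestedLoopJoin(visits: Seq[PageVisit], users: Seq[User]): Seq[(PageVisit, User)] =
    for {
      v &amp;lt;- visits          // outer loop
      u &amp;lt;- users           // inner loop: |visits| * |users| comparisons in total
      if v.userId == u.id  // equi-join condition
    } yield (v, u)
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;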

&lt;p&gt;In a distributed system joins are commonly processed in two steps:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;The data of both inputs is distributed across all parallel instances that participate in the join and&lt;/li&gt;
  &lt;li&gt;each parallel instance performs a standard stand-alone join algorithm on its local partition of the overall data.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The distribution of data across parallel instances must ensure that each valid join pair can be locally built by exactly one instance. For both steps, there are multiple valid strategies that can be independently picked and which are favorable in different situations. In Flink terminology, the first phase is called Ship Strategy and the second phase Local Strategy. In the following I will describe Flink’s ship and local strategies to join two data sets &lt;em&gt;R&lt;/em&gt; and &lt;em&gt;S&lt;/em&gt;.&lt;/p&gt;

&lt;h4 id=&quot;ship-strategies&quot;&gt;Ship Strategies&lt;/h4&gt;
&lt;p&gt;Flink features two ship strategies to establish a valid data partitioning for a join:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;the &lt;em&gt;Repartition-Repartition&lt;/em&gt; strategy (RR) and&lt;/li&gt;
  &lt;li&gt;the &lt;em&gt;Broadcast-Forward&lt;/em&gt; strategy (BF).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Repartition-Repartition strategy partitions both inputs, R and S, on their join key attributes using the same partitioning function. Each partition is assigned to exactly one parallel join instance and all data of that partition is sent to its associated instance. This ensures that all elements that share the same join key are shipped to the same parallel instance and can be locally joined. The cost of the RR strategy is a full shuffle of both data sets over the network.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/joins-repartition.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;
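
&lt;p&gt;The core idea of the Repartition-Repartition strategy can be sketched in a few lines of plain Scala. The helpers &lt;code&gt;partitionOf&lt;/code&gt; and &lt;code&gt;repartition&lt;/code&gt; are hypothetical names for this example and simple hash partitioning is assumed; Flink’s actual partitioner is not shown here.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Illustrative sketch on plain collections, not Flink code.
object RepartitionSketch {
  // Map a join key to one of `parallelism` instances (simple hash partitioning).
  def partitionOf[K](key: K, parallelism: Int): Int =
    Math.floorMod(key.hashCode, parallelism)

  // Group the records of one input by target instance. Calling this for R and S
  // with the same `parallelism` sends equal join keys to the same instance.
  def repartition[T, K](data: Seq[T], parallelism: Int)(key: T =&amp;gt; K): Map[Int, Seq[T]] =
    data.groupBy(t =&amp;gt; partitionOf(key(t), parallelism))
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Because both inputs go through the same partitioning function, each instance can join its partition of R with its partition of S without looking at any other instance.&lt;/p&gt;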

&lt;p&gt;The Broadcast-Forward strategy sends one complete data set (R) to each parallel instance that holds a partition of the other data set (S), i.e., each parallel instance receives the full data set R. Data set S remains local and is not shipped at all. The cost of the BF strategy depends on the size of R and the number of parallel instances it is shipped to. The size of S does not matter because S is not moved. The figure below illustrates how both ship strategies work.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/joins-broadcast.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;
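
&lt;p&gt;For comparison, the Broadcast-Forward distribution looks as follows when sketched on plain Scala collections. The names are hypothetical, and the partitions of S are assumed to be given as one sequence per parallel instance.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Illustrative sketch on plain collections, not Flink code.
object BroadcastForwardSketch {
  // Every instance receives all of R together with its own local partition of S.
  def broadcastForward[A, B](r: Seq[A], sPartitions: Seq[Seq[B]]): Seq[(Seq[A], Seq[B])] =
    sPartitions.map(localS =&amp;gt; (r, localS))

  // Records sent over the network: one full copy of R per instance, nothing of S.
  def shippedRecords[A, B](r: Seq[A], sPartitions: Seq[Seq[B]]): Long =
    r.size.toLong * sPartitions.size
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;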

&lt;p&gt;The Repartition-Repartition and Broadcast-Forward ship strategies establish suitable data distributions to execute a distributed join. Depending on the operations that are applied before the join, one or even both inputs of a join are already distributed in a suitable way across parallel instances. In this case, Flink will reuse such distributions and only ship one or no input at all.&lt;/p&gt;

&lt;h4 id=&quot;flinks-memory-management&quot;&gt;Flink’s Memory Management&lt;/h4&gt;
&lt;p&gt;Before delving into the details of Flink’s local join algorithms, I will briefly discuss Flink’s internal memory management. Data processing algorithms such as joining, grouping, and sorting need to hold portions of their input data in memory. While such algorithms perform best if there is enough memory available to hold all data, it is crucial to gracefully handle situations where the data size exceeds memory. Such situations are especially tricky in JVM-based systems such as Flink because the system needs to reliably recognize that it is short on memory. Failure to detect such situations can result in an &lt;code&gt;OutOfMemoryError&lt;/code&gt; and kill the JVM.&lt;/p&gt;

&lt;p&gt;Flink handles this challenge by actively managing its memory. When a worker node (TaskManager) is started, it allocates a fixed portion (70% by default) of the JVM’s heap memory that is available after initialization as 32KB byte arrays. These byte arrays are distributed as working memory to all algorithms that need to hold significant portions of data in memory. The algorithms receive their input data as Java data objects and serialize them into their working memory.&lt;/p&gt;

&lt;p&gt;This design has several nice properties. First, the number of data objects on the JVM heap is much lower resulting in less garbage collection pressure. Second, objects on the heap have a certain space overhead and the binary representation is more compact. Especially data sets of many small elements benefit from that. Third, an algorithm knows exactly when the input data exceeds its working memory and can react by writing some of its filled byte arrays to the worker’s local filesystem. After the content of a byte array is written to disk, it can be reused to process more data. Reading data back into memory is as simple as reading the binary data from the local filesystem. The following figure illustrates Flink’s memory management.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/joins-memmgmt.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;
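
&lt;p&gt;The following sketch shows the principle in a strongly simplified form: records are serialized into a fixed-size byte buffer, and when the buffer is full its content is handed to a spill callback before the buffer is reused. The class &lt;code&gt;SegmentWriter&lt;/code&gt; and its record layout are assumptions made for this example and do not correspond to Flink’s internal classes; a single segment stands in for the pool of 32KB byte arrays.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for working memory: a single reusable segment (32 KB by default).
// `spill` receives the filled bytes, e.g. to write them to the local filesystem.
class SegmentWriter(spill: Array[Byte] =&amp;gt; Unit, segmentSize: Int = 32 * 1024) {
  private val segment = ByteBuffer.allocate(segmentSize)

  // Serialize one (key, value) record into the segment instead of keeping it as an object.
  def write(key: Long, value: String): Unit = {
    val bytes = value.getBytes(StandardCharsets.UTF_8)
    val needed = 8 + 4 + bytes.length          // key + length prefix + payload
    require(needed &amp;lt;= segmentSize, &amp;quot;record does not fit into one segment&amp;quot;)
    if (segment.remaining() &amp;lt; needed) flush()  // memory exhausted: spill, then reuse
    segment.putLong(key).putInt(bytes.length).put(bytes)
  }

  // Hand the filled part of the segment to `spill` and make the segment reusable.
  def flush(): Unit = {
    spill(java.util.Arrays.copyOf(segment.array(), segment.position()))
    segment.clear()
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The writer always knows how many bytes are left in its segment, so it can decide to spill before the JVM runs out of heap space.&lt;/p&gt;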

&lt;p&gt;This active memory management makes Flink extremely robust for processing very large data sets on limited memory resources while preserving all benefits of in-memory processing if data is small enough to fit in-memory. De/serializing data into and from memory has a certain cost overhead compared to simply holding all data elements on the JVM’s heap. However, Flink features efficient custom de/serializers which also allow certain operations, such as comparisons, to be performed directly on serialized data without deserializing data objects from memory.&lt;/p&gt;

&lt;h4 id=&quot;local-strategies&quot;&gt;Local Strategies&lt;/h4&gt;

&lt;p&gt;After the data has been distributed across all parallel join instances using either a Repartition-Repartition or Broadcast-Forward ship strategy, each instance runs a local join algorithm to join the elements of its local partition. Flink’s runtime features two common join strategies to perform these local joins:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;the &lt;em&gt;Sort-Merge-Join&lt;/em&gt; strategy (SM) and&lt;/li&gt;
  &lt;li&gt;the &lt;em&gt;Hybrid-Hash-Join&lt;/em&gt; strategy (HH).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Sort-Merge-Join works by first sorting both input data sets on their join key attributes (Sort Phase) and merging the sorted data sets as a second step (Merge Phase). The sort is done in-memory if the local partition of a data set is small enough. Otherwise, an external merge-sort is done by collecting data until the working memory is filled, sorting it, writing the sorted data to the local filesystem, and starting over by filling the working memory again with more incoming data. After all input data has been received, sorted, and written as sorted runs to the local file system, a fully sorted stream can be obtained. This is done by reading the partially sorted runs from the local filesystem and sort-merging the records on the fly. Once the sorted streams of both inputs are available, both streams are sequentially read and merge-joined in a zig-zag fashion by comparing the sorted join key attributes, building join element pairs for matching keys, and advancing the sorted stream with the lower join key. The figure below shows how the Sort-Merge-Join strategy works.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/joins-smj.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;
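
&lt;p&gt;The merge phase can be sketched on plain Scala collections as shown below. This is an in-memory illustration only: the sort phase uses the standard library sort and never spills, and the name &lt;code&gt;sortMergeJoin&lt;/code&gt; as well as the &lt;code&gt;(key, value)&lt;/code&gt; record shape are assumptions for this example.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Illustrative sketch on plain collections, not Flink code.
object SortMergeJoinSketch {
  def sortMergeJoin[A, B](r: Seq[(Long, A)], s: Seq[(Long, B)]): Seq[(A, B)] = {
    val rs = r.sortBy(_._1).toVector   // sort phase (in-memory only in this sketch)
    val ss = s.sortBy(_._1).toVector
    val out = Vector.newBuilder[(A, B)]
    var i = 0
    var j = 0
    while (i &amp;lt; rs.length &amp;amp;&amp;amp; j &amp;lt; ss.length) {
      val kr = rs(i)._1
      val ks = ss(j)._1
      if (kr &amp;lt; ks) i += 1               // advance the stream with the lower key
      else if (kr &amp;gt; ks) j += 1
      else {
        // Equal keys: find the end of the key group on both sides and pair all records.
        val rEnd = rs.indexWhere(_._1 != kr, i) match { case -1 =&amp;gt; rs.length; case n =&amp;gt; n }
        val sEnd = ss.indexWhere(_._1 != kr, j) match { case -1 =&amp;gt; ss.length; case n =&amp;gt; n }
        for (a &amp;lt;- i until rEnd; b &amp;lt;- j until sEnd) out += ((rs(a)._2, ss(b)._2))
        i = rEnd
        j = sEnd
      }
    }
    out.result()
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;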

&lt;p&gt;The Hybrid-Hash-Join distinguishes its inputs as build-side and probe-side input and works in two phases, a build phase followed by a probe phase. In the build phase, the algorithm reads the build-side input and inserts all data elements into an in-memory hash table indexed by their join key attributes. If the hash table outgrows the algorithm’s working memory, parts of the hash table (ranges of hash indexes) are written to the local filesystem. The build phase ends after the build-side input has been fully consumed. In the probe phase, the algorithm reads the probe-side input and probes the hash table for each element using its join key attribute. If the element falls into a hash index range that was spilled to disk, the element is also written to disk. Otherwise, the element is immediately joined with all matching elements from the hash table. If the hash table completely fits into the working memory, the join is finished after the probe-side input has been fully consumed. Otherwise, the current hash table is dropped and a new hash table is built using spilled parts of the build-side input. This hash table is probed by the corresponding parts of the spilled probe-side input. Eventually, all data is joined. Hybrid-Hash-Joins perform best if the hash table completely fits into the working memory because an arbitrarily large probe-side input can be processed on-the-fly without materializing it. However, even if the build-side input does not fit into memory, the Hybrid-Hash-Join has very nice properties. In this case, in-memory processing is partially preserved and only a fraction of the build-side and probe-side data needs to be written to and read from the local filesystem. The next figure illustrates how the Hybrid-Hash-Join works.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/joins-hhj.png&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;
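
&lt;p&gt;A strongly simplified sketch of this two-pass behavior is shown below. It assumes that the key space is split into &lt;code&gt;numPartitions&lt;/code&gt; hash ranges of which only the first &lt;code&gt;inMemory&lt;/code&gt; ranges fit into working memory; in-memory sequences stand in for spill files. All names are hypothetical, and Flink’s implementation decides dynamically what to spill instead of using a fixed split.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Illustrative sketch on plain collections, not Flink code.
object HybridHashJoinSketch {
  def hybridHashJoin[A, B](build: Seq[(Long, A)], probe: Seq[(Long, B)],
                           numPartitions: Int, inMemory: Int): Seq[(A, B)] = {
    def part(key: Long): Int = Math.floorMod(key.hashCode, numPartitions)

    def joinInMemory(b: Seq[(Long, A)], p: Seq[(Long, B)]): Seq[(A, B)] = {
      val table = b.groupBy(_._1)                 // build phase: hash table on the join key
      p.flatMap { case (k, v) =&amp;gt;                  // probe phase: look up each probe record
        table.getOrElse(k, Seq.empty).map { case (_, a) =&amp;gt; (a, v) }
      }
    }

    // First pass: resident hash ranges are joined directly, the rest is &amp;quot;spilled&amp;quot;.
    val (residentBuild, spilledBuild) = build.partition(r =&amp;gt; part(r._1) &amp;lt; inMemory)
    val (residentProbe, spilledProbe) = probe.partition(r =&amp;gt; part(r._1) &amp;lt; inMemory)
    val firstPass = joinInMemory(residentBuild, residentProbe)

    // Later passes: build a new hash table per spilled range and probe it with
    // the spilled probe records of the same range.
    val laterPasses = (inMemory until numPartitions).flatMap { p =&amp;gt;
      joinInMemory(spilledBuild.filter(r =&amp;gt; part(r._1) == p),
                   spilledProbe.filter(r =&amp;gt; part(r._1) == p))
    }
    firstPass ++ laterPasses
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;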

&lt;h3 id=&quot;how-does-flink-choose-join-strategies&quot;&gt;How does Flink choose join strategies?&lt;/h3&gt;

&lt;p&gt;Ship and local strategies do not depend on each other and can be independently chosen. Therefore, Flink can execute a join of two data sets R and S in nine different ways by combining any of the three ship strategies (RR, BF with R being broadcasted, BF with S being broadcasted) with any of the three local strategies (SM, HH with R being build-side, HH with S being build-side). Each of these strategy combinations results in different execution performance depending on the data sizes and the available amount of working memory. In case of a small data set R and a much larger data set S, broadcasting R and using it as build-side input of a Hybrid-Hash-Join is usually a good choice because the much larger data set S is not shipped and not materialized (given that the hash table completely fits into memory). If both data sets are rather large or the join is performed on many parallel instances, repartitioning both inputs is a robust choice.&lt;/p&gt;

&lt;p&gt;Flink features a cost-based optimizer which automatically chooses the execution strategies for all operators including joins. Without going into the details of cost-based optimization, this is done by computing cost estimates for execution plans with different strategies and picking the plan with the lowest estimated cost. To do so, the optimizer estimates the amount of data which is shipped over the network and written to disk. If no reliable size estimates for the input data can be obtained, the optimizer falls back to robust default choices. A key feature of the optimizer is to reason about existing data properties. For example, if the data of one input is already partitioned in a suitable way, the generated candidate plans will not repartition this input. Hence, the choice of an RR ship strategy becomes more likely. The same applies for previously sorted data and the Sort-Merge-Join strategy. Flink programs can help the optimizer to reason about existing data properties by providing semantic information about user-defined functions &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/index.html#semantic-annotations&quot;&gt;[4]&lt;/a&gt;. While the optimizer is a killer feature of Flink, it can happen that a user knows better than the optimizer how to execute a specific join. Similar to relational database systems, Flink offers optimizer hints to tell the optimizer which join strategies to pick &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/dataset_transformations.html#join-algorithm-hints&quot;&gt;[5]&lt;/a&gt;.&lt;/p&gt;
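
&lt;p&gt;As a toy illustration of cost-based strategy selection, the sketch below compares the three ship strategies by network volume only. The cost formulas are a deliberate simplification made for this example (disk costs and existing data properties are ignored) and the names are hypothetical; Flink’s optimizer uses a richer cost model.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// Illustrative sketch, not Flink&amp;#39;s optimizer.
sealed trait ShipStrategy
case object RepartitionRepartition extends ShipStrategy
case object BroadcastR extends ShipStrategy
case object BroadcastS extends ShipStrategy

object ShipStrategyChooser {
  // Estimated network volume in GB for joining R and S on `parallelism` instances.
  def networkCost(strategy: ShipStrategy, rGb: Double, sGb: Double, parallelism: Int): Double =
    strategy match {
      case RepartitionRepartition =&amp;gt; rGb + sGb          // full shuffle of both inputs
      case BroadcastR             =&amp;gt; rGb * parallelism  // one copy of R per instance
      case BroadcastS             =&amp;gt; sGb * parallelism  // one copy of S per instance
    }

  // Pick the candidate with the lowest estimated cost.
  def cheapestShipStrategy(rGb: Double, sGb: Double, parallelism: Int): ShipStrategy =
    Seq[ShipStrategy](RepartitionRepartition, BroadcastR, BroadcastS)
      .minBy(networkCost(_, rGb, sGb, parallelism))
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Under this toy model and a parallelism of 80, joining 1GB with 1000GB favors broadcasting the small side (80GB shipped instead of 1001GB), whereas joining 100GB with 1000GB favors repartitioning (1100GB shipped instead of 8000GB).&lt;/p&gt;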

&lt;h3 id=&quot;how-is-flinks-join-performance&quot;&gt;How is Flink’s join performance?&lt;/h3&gt;

&lt;p&gt;Alright, that sounds good, but how fast are joins in Flink? Let’s have a look. We start with a benchmark of the single-core performance of Flink’s Hybrid-Hash-Join implementation and run a Flink program that executes a Hybrid-Hash-Join with parallelism 1. We run the program on an n1-standard-2 Google Compute Engine instance (2 vCPUs, 7.5GB memory) with two locally attached SSDs. We give 4GB as working memory to the join. The join program generates 1KB records for both inputs on-the-fly, i.e., the data is not read from disk. We run 1:N (Primary-Key/Foreign-Key) joins and generate the smaller input with unique Integer join keys and the larger input with randomly chosen Integer join keys that fall into the key range of the smaller input. Hence, each tuple of the larger side joins with exactly one tuple of the smaller side. The result of the join is immediately discarded. We vary the size of the build-side input from 1 million to 12 million elements (1GB to 12GB). The probe-side input is kept constant at 64 million elements (64GB). The following chart shows the average execution time of three runs for each setup.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/joins-single-perf.png&quot; style=&quot;width:85%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;The joins with 1 to 3 GB build side (blue bars) are pure in-memory joins. The other joins partially spill data to disk (4 to 12GB, orange bars). The results show that the performance of Flink’s Hybrid-Hash-Join remains stable as long as the hash table completely fits into memory. As soon as the hash table becomes larger than the working memory, parts of the hash table and corresponding parts of the probe side are spilled to disk. The chart shows that the performance of the Hybrid-Hash-Join gracefully decreases in this situation, i.e., there is no sharp increase in runtime when the join starts spilling. In combination with Flink’s robust memory management, this execution behavior gives smooth performance without the need for fine-grained, data-dependent memory tuning.&lt;/p&gt;

&lt;p&gt;So, Flink’s Hybrid-Hash-Join implementation performs well on a single thread even for limited memory resources, but how good is Flink’s performance when joining larger data sets in a distributed setting? For the next experiment we compare the performance of the most common join strategy combinations, namely:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;Broadcast-Forward, Hybrid-Hash-Join (broadcasting and building with the smaller side),&lt;/li&gt;
  &lt;li&gt;Repartition, Hybrid-Hash-Join (building with the smaller side), and&lt;/li&gt;
  &lt;li&gt;Repartition, Sort-Merge-Join&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;for different input size ratios:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;1GB     : 1000GB&lt;/li&gt;
  &lt;li&gt;10GB    : 1000GB&lt;/li&gt;
  &lt;li&gt;100GB   : 1000GB&lt;/li&gt;
  &lt;li&gt;1000GB  : 1000GB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Broadcast-Forward strategy is only executed for up to 10GB. Building a hash table from 100GB broadcasted data in 5GB working memory would result in spilling approximately 95GB (build input) + 950GB (probe input) in each parallel thread and require more than 8TB local disk storage on each machine.&lt;/p&gt;

&lt;p&gt;As in the single-core benchmark, we run 1:N joins, generate the data on-the-fly, and immediately discard the result after the join. We run the benchmark on 10 n1-highmem-8 Google Compute Engine instances. Each instance is equipped with 8 cores, 52GB RAM, 40GB of which are configured as working memory (5GB per core), and one local SSD for spilling to disk. All benchmarks are performed using the same configuration, i.e., no fine tuning for the respective data sizes is done. The programs are executed with a parallelism of 80.&lt;/p&gt;

&lt;center&gt;
&lt;img src=&quot;/img/blog/joins-dist-perf.png&quot; style=&quot;width:70%;margin:15px&quot; /&gt;
&lt;/center&gt;

&lt;p&gt;As expected, the Broadcast-Forward strategy performs best for very small inputs because the large probe side is not shipped over the network and is joined locally. However, when the size of the broadcasted side grows, two problems arise: the amount of data that is shipped increases, and each parallel instance has to process the full broadcasted data set. The performance of both Repartitioning strategies behaves similarly for growing input sizes, which indicates that these strategies are mainly limited by the cost of the data transfer (at most 2TB are shipped over the network and joined). Although the Sort-Merge-Join strategy shows the worst performance in all shown cases, it has a right to exist because it can nicely exploit sorted input data.&lt;/p&gt;
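
&lt;p&gt;The trade-off between the shipping strategies can be sketched with a rough cost model. The Java snippet below is only an illustration and not part of Flink: it assumes that Broadcast-Forward ships the smaller input once to each of the 80 parallel instances and that Repartition shuffles both inputs once, ignoring the share of data that happens to stay local. The class and method names (&lt;code&gt;JoinShippingCost&lt;/code&gt;, &lt;code&gt;broadcastForwardGb&lt;/code&gt;, &lt;code&gt;repartitionGb&lt;/code&gt;) are hypothetical.&lt;/p&gt;

```java
/** Back-of-the-envelope network volume of the two shipping strategies (simplified model). */
final class JoinShippingCost {

    /** Broadcast-Forward: the smaller side is sent to every parallel instance. */
    static long broadcastForwardGb(long smallGb, int parallelism) {
        return smallGb * parallelism;
    }

    /** Repartition: both inputs are shuffled once over the network. */
    static long repartitionGb(long smallGb, long largeGb) {
        return smallGb + largeGb;
    }

    public static void main(String[] args) {
        int parallelism = 80;                       // parallelism used in the benchmark
        long largeGb = 1000;                        // size of the probe side
        long[] smallSidesGb = {1, 10, 100, 1000};   // build side sizes from the benchmark

        for (long smallGb : smallSidesGb) {
            long broadcast = broadcastForwardGb(smallGb, parallelism);
            long repartition = repartitionGb(smallGb, largeGb);
            // The strategy that ships less data is the cheaper one in this model.
            String cheaper = repartition > broadcast ? "Broadcast-Forward" : "Repartition";
            System.out.println(smallGb + "GB : " + largeGb + "GB, broadcast ships ~"
                    + broadcast + "GB, repartition ships ~" + repartition
                    + "GB, cheaper: " + cheaper);
        }
    }
}
```

&lt;p&gt;Under these assumptions, broadcasting ships less data for the 1GB and 10GB build sides, while repartitioning ships less from 100GB upwards. The 1000GB : 1000GB case accounts for the 2TB mentioned above.&lt;/p&gt;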

&lt;h3 id=&quot;ive-got-sooo-much-data-to-join-do-i-really-need-to-ship-it&quot;&gt;I’ve got sooo much data to join, do I really need to ship it?&lt;/h3&gt;

&lt;p&gt;We have seen that off-the-shelf distributed joins work really well in Flink. But what if your data is so huge that you do not want to shuffle it across your cluster? We recently added some features to Flink for specifying semantic properties (partitioning and sorting) on input splits and co-located reading of local input files. With these tools at hand, it is possible to join pre-partitioned data sets from your local filesystem without sending a single byte over your cluster’s network. If the input data is even pre-sorted, the join can be done as a Sort-Merge-Join without sorting, i.e., the join is essentially done on-the-fly. Exploiting co-location requires a very special setup though. Data needs to be stored on the local filesystem because HDFS does not feature data co-location and might move file blocks across data nodes. That means you need to take care of many things yourself which HDFS would have done for you, including replication to avoid data loss. On the other hand, the performance gains of joining co-located and pre-sorted data can be quite substantial.&lt;/p&gt;
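
&lt;p&gt;To illustrate why pre-sorted inputs allow the join to run on-the-fly, here is a minimal, single-threaded Java sketch of the merge phase of a Sort-Merge-Join. It is not Flink’s implementation. It assumes plain integer keys, a 1:N relationship (unique keys on the first side), and the hypothetical names &lt;code&gt;PreSortedMergeJoin&lt;/code&gt; and &lt;code&gt;join&lt;/code&gt;.&lt;/p&gt;

```java
import java.util.Arrays;

/** Merge phase of a sort-merge join over inputs that are already sorted by key. */
final class PreSortedMergeJoin {

    /**
     * Joins a side with unique, ascending keys against a side with ascending
     * (possibly repeated) keys and returns the matching keys of the second side.
     * Both arrays are read once, front to back, and nothing is sorted.
     */
    static int[] join(int[] uniqueKeys, int[] repeatedKeys) {
        int[] matches = new int[repeatedKeys.length];
        int matchCount = 0;
        int i = 0; // position in the unique-key input
        int j = 0; // position in the repeated-key input
        while (i != uniqueKeys.length) {
            if (j == repeatedKeys.length) {
                break; // second input exhausted, no further matches possible
            }
            int cmp = Integer.compare(uniqueKeys[i], repeatedKeys[j]);
            if (cmp == 0) {
                matches[matchCount++] = repeatedKeys[j];
                j++; // stay on the unique key, it may match the next record too
            } else if (cmp > 0) {
                j++; // this record of the second input has no partner
            } else {
                i++; // move on to the next unique key
            }
        }
        return Arrays.copyOf(matches, matchCount);
    }

    public static void main(String[] args) {
        int[] customers = {1, 3, 4, 7};            // unique keys, ascending
        int[] orders = {1, 1, 2, 3, 7, 7, 7, 9};   // foreign keys, ascending
        System.out.println(Arrays.toString(join(customers, orders)));
    }
}
```

&lt;p&gt;Each input is scanned exactly once. Apart from the collected result, the join only keeps the current position in each input, which is why no sort buffer is needed when the data already arrives in key order.&lt;/p&gt;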

&lt;h3 id=&quot;tldr-what-should-i-remember-from-all-of-this&quot;&gt;tl;dr: What should I remember from all of this?&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Flink’s fluent Scala and Java APIs make joins and other data transformations easy as cake.&lt;/li&gt;
  &lt;li&gt;The optimizer makes the hard choices for you, but gives you control in case you know better.&lt;/li&gt;
  &lt;li&gt;Flink’s join implementations perform very well in memory and degrade gracefully when going to disk.&lt;/li&gt;
  &lt;li&gt;Due to Flink’s robust memory management, there is no need for job- or data-specific memory tuning to avoid a nasty &lt;code&gt;OutOfMemoryException&lt;/code&gt;. It just runs out-of-the-box.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4 id=&quot;references&quot;&gt;References&lt;/h4&gt;

&lt;p&gt;[1] &lt;a href=&quot;&quot;&gt;“MapReduce: Simplified data processing on large clusters”&lt;/a&gt;, Dean, Ghemawat, 2004 &lt;br /&gt;
[2] &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html&quot;&gt;Flink 0.8.1 documentation: Data Transformations&lt;/a&gt; &lt;br /&gt;
[3] &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.8/dataset_transformations.html#join&quot;&gt;Flink 0.8.1 documentation: Joins&lt;/a&gt; &lt;br /&gt;
[4] &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/index.html#semantic-annotations&quot;&gt;Flink 1.0 documentation: Semantic annotations&lt;/a&gt; &lt;br /&gt;
[5] &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/dataset_transformations.html#join-algorithm-hints&quot;&gt;Flink 1.0 documentation: Optimizer join hints&lt;/a&gt; &lt;br /&gt;&lt;/p&gt;
</description>
<pubDate>Fri, 13 Mar 2015 11:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html</link>
<guid isPermaLink="true">/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html</guid>
</item>

<item>
<title>February 2015 in the Flink community</title>
<description>&lt;p&gt;February might be the shortest month of the year, but this does not
mean that the Flink community has not been busy adding features to the
system and fixing bugs. Here’s a rundown of the activity in the Flink
community last month.&lt;/p&gt;

&lt;h3 id=&quot;release&quot;&gt;0.8.1 release&lt;/h3&gt;

&lt;p&gt;Flink 0.8.1 was released. This bugfixing release resolves a total of 22 issues.&lt;/p&gt;

&lt;h3 id=&quot;new-committer&quot;&gt;New committer&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/mxm&quot;&gt;Max Michels&lt;/a&gt; has been voted a committer by the Flink PMC.&lt;/p&gt;

&lt;h3 id=&quot;flink-adapter-for-apache-samoa&quot;&gt;Flink adapter for Apache SAMOA&lt;/h3&gt;

&lt;p&gt;&lt;a href=&quot;http://samoa.incubator.apache.org&quot;&gt;Apache SAMOA (incubating)&lt;/a&gt; is a
distributed streaming machine learning (ML) framework with a
programming abstraction for distributed streaming ML algorithms. SAMOA
runs on a variety of backend engines, currently Apache Storm and
Apache S4.  A &lt;a href=&quot;https://github.com/apache/incubator-samoa/pull/11&quot;&gt;pull
request&lt;/a&gt; is
available at the SAMOA repository that adds a Flink adapter for SAMOA.&lt;/p&gt;

&lt;h3 id=&quot;easy-flink-deployment-on-google-compute-cloud&quot;&gt;Easy Flink deployment on Google Compute Cloud&lt;/h3&gt;

&lt;p&gt;Flink is now integrated in bdutil, Google’s open source tool for
creating and configuring (Hadoop) clusters in Google Compute
Engine. Deployment of Flink clusters is now supported starting with
&lt;a href=&quot;https://groups.google.com/forum/#!topic/gcp-hadoop-announce/uVJ_6y9cGKM&quot;&gt;bdutil
1.2.0&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;flink-on-the-web&quot;&gt;Flink on the Web&lt;/h3&gt;

&lt;p&gt;A new blog post on &lt;a href=&quot;http://flink.apache.org/news/2015/02/09/streaming-example.html&quot;&gt;Flink
Streaming&lt;/a&gt;
was published on the blog. Flink was also mentioned in several articles on
the web. Here are some examples:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;http://dataconomy.com/how-flink-became-an-apache-top-level-project/&quot;&gt;How Flink became an Apache Top-Level Project&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://www.linkedin.com/pulse/stale-synchronous-parallelism-new-frontier-apache-flink-nam-luc-tran?utm_content=buffer461af&amp;amp;utm_medium=social&amp;amp;utm_source=linkedin.com&amp;amp;utm_campaign=buffer&quot;&gt;Stale Synchronous Parallelism: The new frontier for Apache Flink?&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;http://www.hadoopsphere.com/2015/02/distributed-data-processing-with-apache.html&quot;&gt;Distributed data processing with Apache Flink&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;http://www.hadoopsphere.com/2015/02/ciao-latency-hallo-speed.html&quot;&gt;Ciao latency, hello speed&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;in-the-flink-master&quot;&gt;In the Flink master&lt;/h2&gt;

&lt;p&gt;The following features have now been merged into Flink’s master repository.&lt;/p&gt;

&lt;h3 id=&quot;gelly&quot;&gt;Gelly&lt;/h3&gt;

&lt;p&gt;Gelly, Flink’s Graph API, allows users to manipulate graph-shaped data
directly. Here is, for example, a calculation of shortest paths in a
graph:&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Graph&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;graph&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Graph&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fromDataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;vertices&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;edges&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;DataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Vertex&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;singleSourceShortestPaths&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;graph&lt;/span&gt;
     &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;run&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SingleSourceShortestPaths&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;srcVertexId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
           &lt;span class=&quot;n&quot;&gt;maxIterations&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getVertices&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;See more Gelly examples
&lt;a href=&quot;https://github.com/apache/flink/tree/master/flink-libraries/flink-gelly-examples&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;flink-expressions&quot;&gt;Flink Expressions&lt;/h3&gt;

&lt;p&gt;The newly merged
&lt;a href=&quot;https://github.com/apache/flink/tree/master/flink-libraries/flink-table&quot;&gt;flink-table&lt;/a&gt;
module is the first step in Flink’s roadmap towards logical queries
and SQL support. Here’s a preview of how you can read two CSV files,
assign a logical schema to them, and apply transformations like filters and
joins using logical attributes rather than physical data types.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customers&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getCustomerDataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;as&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;mktSegment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;mktSegment&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;AUTOMOBILE&amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getOrdersDataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dateFormat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parse&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;orderDate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;before&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
 &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;as&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;orderId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;custId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;orderDate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;shipPrio&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;items&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt;
 &lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;customers&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;custId&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;===&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
   &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;orderId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;orderDate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;shipPrio&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;h3 id=&quot;access-to-hcatalog-tables&quot;&gt;Access to HCatalog tables&lt;/h3&gt;

&lt;p&gt;With the &lt;a href=&quot;https://github.com/apache/flink/tree/master/flink-batch-connectors/flink-hcatalog&quot;&gt;flink-hcatalog
module&lt;/a&gt;,
you can now conveniently access HCatalog/Hive tables. The module
supports projection (selection and order of fields) and partition
filters.&lt;/p&gt;

&lt;h3 id=&quot;access-to-secured-yarn-clustershdfs&quot;&gt;Access to secured YARN clusters/HDFS&lt;/h3&gt;

&lt;p&gt;With this change users can access Kerberos secured YARN (and HDFS)
Hadoop clusters.  Also, basic support for accessing secured HDFS with
a standalone Flink setup is now available.&lt;/p&gt;

</description>
<pubDate>Mon, 02 Mar 2015 11:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2015/03/02/february-2015-in-flink.html</link>
<guid isPermaLink="true">/news/2015/03/02/february-2015-in-flink.html</guid>
</item>

<item>
<title>Introducing Flink Streaming</title>
<description>&lt;p&gt;This post is the first of a series of blog posts on Flink Streaming,
the recent addition to Apache Flink that makes it possible to analyze
continuous data sources in addition to static files. Flink Streaming
uses the pipelined Flink engine to process data streams in real time
and offers a new API including the definition of flexible windows.&lt;/p&gt;

&lt;p&gt;In this post, we go through an example that uses the Flink Streaming
API to compute statistics on stock market data that arrive
continuously and combine the stock market data with Twitter streams.
See the &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/index.html&quot;&gt;Streaming Programming
Guide&lt;/a&gt; for a
detailed presentation of the Streaming API.&lt;/p&gt;

&lt;p&gt;First, we read a bunch of stock price streams and combine them into
one stream of market data. We apply several transformations on this
market data stream, like rolling aggregations per stock. Then we emit
price warning alerts when the prices are rapidly changing. Moving 
towards more advanced features, we compute rolling correlations
between the market data streams and a Twitter stream with stock mentions.&lt;/p&gt;

&lt;p&gt;To run the example implementation, please use the &lt;em&gt;0.9-SNAPSHOT&lt;/em&gt;
version of Flink as a dependency. The full example code base can be
found &lt;a href=&quot;https://github.com/mbalassi/flink/blob/stockprices/flink-staging/flink-streaming/flink-streaming-examples/src/main/scala/org/apache/flink/streaming/scala/examples/windowing/StockPrices.scala&quot;&gt;here&lt;/a&gt; in Scala and &lt;a href=&quot;https://github.com/mbalassi/flink/blob/stockprices/flink-staging/flink-streaming/flink-streaming-examples/src/main/java/org/apache/flink/streaming/examples/windowing/StockPrices.java&quot;&gt;here&lt;/a&gt; in Java 7.&lt;/p&gt;

&lt;h2 id=&quot;reading-from-multiple-inputs&quot;&gt;Reading from multiple inputs&lt;/h2&gt;

&lt;p&gt;First, let us create the stream of stock prices:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Read a socket stream of stock prices&lt;/li&gt;
  &lt;li&gt;Parse the text in the stream to create a stream of &lt;code&gt;StockPrice&lt;/code&gt; objects&lt;/li&gt;
  &lt;li&gt;Add four other sources tagged with the stock symbol.&lt;/li&gt;
  &lt;li&gt;Finally, merge the streams to create a unified stream.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;img alt=&quot;Reading from multiple inputs&quot; src=&quot;/img/blog/blog_multi_input.png&quot; width=&quot;70%&quot; class=&quot;img-responsive center-block&quot; /&gt;&lt;/p&gt;

&lt;div class=&quot;codetabs&quot;&gt;
  &lt;div data-lang=&quot;scala&quot;&gt;

    &lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Array&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

  &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getExecutionEnvironment&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;//Read from a socket stream and map it to StockPrice objects&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;socketStockStream&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;socketTextStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;localhost&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9999&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;split&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;,&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toDouble&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;})&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;//Generate other stock streams&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;SPX_Stream&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generateStock&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;SPX&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;FTSE_Stream&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generateStock&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;FTSE&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DJI_Stream&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generateStock&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;DJI&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;BUX_Stream&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generateStock&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;BUX&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;40&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;c1&quot;&gt;//Merge all stock streams together&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stockStream&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;socketStockStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;merge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;SPX_Stream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;FTSE_Stream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; 
    &lt;span class=&quot;nc&quot;&gt;DJI_Stream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;BUX_Stream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;stockStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;

  &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Stock stream&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

  &lt;/div&gt;
  &lt;div data-lang=&quot;java7&quot;&gt;

    &lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;main&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;args&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;//Read from a socket stream and map it to StockPrice objects&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;socketStockStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;socketTextStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;localhost&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;9999&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tokens&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

                &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
                &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                    &lt;span class=&quot;n&quot;&gt;tokens&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;,&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
                    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tokens&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;],&lt;/span&gt;
                        &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;parseDouble&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tokens&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]));&lt;/span&gt;
                &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;});&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;//Generate other stock streams&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SPX_stream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StockSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;SPX&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FTSE_stream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StockSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;FTSE&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DJI_stream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StockSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;DJI&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BUX_stream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StockSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;BUX&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;40&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;

    &lt;span class=&quot;c1&quot;&gt;//Merge all stock streams together&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stockStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;socketStockStream&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;merge&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SPX_stream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FTSE_stream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DJI_stream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BUX_stream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;stockStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Stock stream&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

  &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;See the
&lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/index.html#data-sources&quot;&gt;data sources documentation&lt;/a&gt;
for how to create streaming sources for Flink Streaming
programs. Flink also supports reading streams from
&lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/connectors/index.html&quot;&gt;external
sources&lt;/a&gt;
such as Apache Kafka, Apache Flume, RabbitMQ, and others. For the sake
of this example, the data streams are simply generated using the
&lt;code&gt;generateStock&lt;/code&gt; method (Scala) or the
&lt;code&gt;StockSource&lt;/code&gt; class (Java):&lt;/p&gt;

&lt;div class=&quot;codetabs&quot;&gt;
  &lt;div data-lang=&quot;scala&quot;&gt;

    &lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbols&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;SPX&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;FTSE&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;DJI&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;DJT&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;BUX&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;DAX&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;GOOG&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;generateStock&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sigma&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;price&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1000.0&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;price&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;price&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Random&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextGaussian&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sigma&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;Thread&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sleep&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Random&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextInt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

  &lt;/div&gt;
  &lt;div data-lang=&quot;java7&quot;&gt;

    &lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SYMBOLS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Arrays&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;asList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;SPX&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;FTSE&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;DJI&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;DJT&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;BUX&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;DAX&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;GOOG&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StockPrice&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Serializable&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;symbol&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;price&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;toString&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;StockPrice{&amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
                &lt;span class=&quot;s&quot;&gt;&amp;quot;symbol=&amp;#39;&amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;&amp;#39;\&amp;#39;&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
                &lt;span class=&quot;s&quot;&gt;&amp;quot;, count=&amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;price&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
                &lt;span class=&quot;sc&quot;&gt;&amp;#39;}&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StockSource&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SourceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

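    &lt;span class=&quot;c1&quot;&gt;//Starting price of the random walk (assumed to be 1000, as in the Scala version)&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DEFAULT_PRICE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1000.0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;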
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sigma&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StockSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sigma&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;symbol&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sigma&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sigma&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;invoke&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;price&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DEFAULT_PRICE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;Random&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Random&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;price&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;price&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;nextGaussian&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sigma&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Thread&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sleep&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;nextInt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

  &lt;/div&gt;
&lt;/div&gt;
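
&lt;p&gt;The generator above is a Gaussian random walk: each step adds a normally distributed increment, scaled by &lt;code&gt;sigma&lt;/code&gt;, to the previous price. As a minimal sketch outside of Flink, that step can be isolated in plain Java. The class name &lt;code&gt;RandomWalkSketch&lt;/code&gt;, its method names, and the fixed seed are assumptions made here for illustration (the seed only makes the walk reproducible); they are not part of the original job.&lt;/p&gt;

```java
import java.util.Random;

// Hypothetical helper, not part of the Flink example: isolates the random-walk step.
public class RandomWalkSketch {

    // Next price = previous price + (standard normal sample * sigma), as in the generator above.
    static double nextPrice(double price, double gaussianSample, int sigma) {
        return price + gaussianSample * sigma;
    }

    // Generates a fixed number of prices, starting from an initial price.
    static double[] walk(double start, int sigma, int steps, long seed) {
        Random random = new Random(seed); // seeded only so that the sketch is reproducible
        double[] prices = new double[steps];
        double price = start;
        for (int i = 0; i != steps; i++) {
            price = nextPrice(price, random.nextGaussian(), sigma);
            prices[i] = price;
        }
        return prices;
    }

    public static void main(String[] args) {
        // Five steps of a walk starting at 1000 with sigma = 10.
        for (double p : walk(1000.0, 10, 5, 42L)) {
            System.out.println(p);
        }
    }
}
```

&lt;p&gt;A larger &lt;code&gt;sigma&lt;/code&gt; gives larger price jumps per step, which is why the four generated streams above (sigma 10 to 40) drift away from the starting price at different rates.&lt;/p&gt;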

&lt;p&gt;To read from the text socket stream, make sure that you have a
socket running. For the sake of the example, executing the following
command in a terminal does the job. If
&lt;a href=&quot;http://netcat.sourceforge.net/&quot;&gt;netcat&lt;/a&gt; is not available
on your machine, you can get it from the linked page.&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;nc -lk 9999
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
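
&lt;p&gt;The map function applied to the socket stream splits each line on a comma and parses the second token as a price, so lines typed into the netcat session should look like &lt;code&gt;SPX,1001.5&lt;/code&gt;. The following is a minimal stand-alone sketch of that parsing step. The class and method names are hypothetical, and the trimming and the explicit error message are additions for illustration that the original job does not contain.&lt;/p&gt;

```java
// Hypothetical stand-alone sketch of the "SYMBOL,price" parsing done on the socket stream.
public class SocketLineSketch {

    // Returns {symbol, price}, or throws if the line is not of the form "SYMBOL,price".
    static Object[] parse(String line) {
        String[] tokens = line.split(",");
        if (tokens.length != 2) {
            throw new IllegalArgumentException("Expected SYMBOL,price but got: " + line);
        }
        String symbol = tokens[0].trim();
        // Double.parseDouble throws NumberFormatException on non-numeric input.
        double price = Double.parseDouble(tokens[1].trim());
        return new Object[] { symbol, price };
    }

    public static void main(String[] args) {
        Object[] parsed = parse("SPX,1001.5");
        System.out.println(parsed[0] + " at " + parsed[1]);
    }
}
```

&lt;p&gt;Without such a check, a malformed line would surface as an exception inside the running job, so it can be worth validating input early when experimenting by hand.&lt;/p&gt;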

&lt;p&gt;If we execute the program from our IDE, we see the
stock prices being generated:&lt;/p&gt;

&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;INFO    Job execution switched to status RUNNING.
INFO    Socket Stream(1/1) switched to SCHEDULED 
INFO    Socket Stream(1/1) switched to DEPLOYING
INFO    Custom Source(1/1) switched to SCHEDULED 
INFO    Custom Source(1/1) switched to DEPLOYING
…
1&amp;gt; StockPrice{symbol=&#39;SPX&#39;, count=1011.3405732645239}
2&amp;gt; StockPrice{symbol=&#39;SPX&#39;, count=1018.3381290039248}
1&amp;gt; StockPrice{symbol=&#39;DJI&#39;, count=1036.7454894073978}
3&amp;gt; StockPrice{symbol=&#39;DJI&#39;, count=1135.1170217478427}
3&amp;gt; StockPrice{symbol=&#39;BUX&#39;, count=1053.667523187687}
4&amp;gt; StockPrice{symbol=&#39;BUX&#39;, count=1036.552601487263}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href=&quot;#top&quot;&gt;Back to top&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;window-aggregations&quot;&gt;Window aggregations&lt;/h2&gt;

&lt;p&gt;We first compute aggregations on time-based windows of the
data. Flink provides &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/windows.html&quot;&gt;flexible windowing semantics&lt;/a&gt;: windows can
also be defined based on the count of records or on any custom user-defined
logic.&lt;/p&gt;

&lt;p&gt;We partition our stream into windows of 10 seconds and slide the
window every 5 seconds, so three statistics are computed every 5 seconds.
The first is the minimum price over all stocks, the second is the
maximum price per stock, and the third is the mean price per stock
(using a map window function). Aggregations and groupings can be
performed on named fields of POJOs, making the code more readable.&lt;/p&gt;

&lt;p&gt;&lt;img alt=&quot;Basic windowing aggregations&quot; src=&quot;/img/blog/blog_basic_window.png&quot; width=&quot;70%&quot; class=&quot;img-responsive center-block&quot; /&gt;&lt;/p&gt;
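
&lt;p&gt;To make the sliding-window semantics concrete, here is a minimal plain-Java sketch with no Flink involved. It assumes that an evaluation at time &lt;code&gt;t&lt;/code&gt; covers the records of the preceding 10 seconds, that is, timestamps in the interval (t - 10, t], and that evaluations happen every 5 seconds. The class, the method names, and the sample data are hypothetical, and the way Flink aligns and triggers windows internally is not modeled.&lt;/p&gt;

```java
// Hypothetical plain-Java sketch of a 10 s window evaluated every 5 s (no Flink involved).
public class SlidingWindowSketch {

    static final int WINDOW_SECONDS = 10;
    static final int SLIDE_SECONDS = 5;

    // True if a record with timestamp t lies in the interval (evalTime - 10, evalTime].
    static boolean inWindow(int t, int evalTime) {
        if (t > evalTime) {
            return false; // the record arrives after this evaluation
        }
        return t > evalTime - WINDOW_SECONDS; // the record is less than 10 s old
    }

    // Minimum price among the records in the window; NaN if the window is empty.
    static double minInWindow(int[] seconds, double[] prices, int evalTime) {
        double min = Double.NaN;
        for (int i = 0; i != seconds.length; i++) {
            if (inWindow(seconds[i], evalTime)) {
                min = Double.isNaN(min) ? prices[i] : Math.min(min, prices[i]);
            }
        }
        return min;
    }

    // Mean price among the records in the window; NaN if the window is empty.
    static double meanInWindow(int[] seconds, double[] prices, int evalTime) {
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i != seconds.length; i++) {
            if (inWindow(seconds[i], evalTime)) {
                sum += prices[i];
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        // Made-up sample data: arrival time in seconds and price of each record.
        int[] seconds = {1, 4, 7, 12, 14};
        double[] prices = {1000.0, 1010.0, 990.0, 1020.0, 1005.0};
        for (int evalTime = SLIDE_SECONDS; evalTime != 20; evalTime += SLIDE_SECONDS) {
            System.out.println("t=" + evalTime + "s min=" + minInWindow(seconds, prices, evalTime)
                    + " mean=" + meanInWindow(seconds, prices, evalTime));
        }
    }
}
```

&lt;p&gt;Because the slide (5 seconds) is shorter than the window (10 seconds), consecutive windows overlap: in the sample data, the record at second 7 contributes both to the evaluation at 10 seconds and to the one at 15 seconds.&lt;/p&gt;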

&lt;div class=&quot;codetabs&quot;&gt;

  &lt;div data-lang=&quot;scala&quot;&gt;

    &lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;//Define the desired time window&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowedStream&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stockStream&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;SECONDS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;every&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;SECONDS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;//Compute some simple statistics on a rolling window&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lowest&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowedStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;price&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxByStock&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowedStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;symbol&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;maxBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;price&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rollingMean&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowedStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;symbol&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mapWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;//Compute the mean of a window&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Iterable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nonEmpty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;foldLeft&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

  &lt;/div&gt;

  &lt;div data-lang=&quot;java7&quot;&gt;

    &lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;//Define the desired time window&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;WindowedDataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowedStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stockStream&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TimeUnit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;SECONDS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;every&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TimeUnit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;SECONDS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;//Compute some simple statistics on a rolling window&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lowest&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowedStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;minBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;price&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatten&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;maxByStock&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowedStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;symbol&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;maxBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;price&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatten&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rollingMean&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowedStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;symbol&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;mapWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;WindowMean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatten&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;//Compute the mean of a window&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;WindowMean&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; 
    &lt;span class=&quot;n&quot;&gt;WindowMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;mapWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Iterable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; 
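        &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;//Reset the accumulators so that each window gets its own mean&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
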
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;iterator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;hasNext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;price&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++;&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

  &lt;/div&gt;

&lt;/div&gt;

&lt;p&gt;Note that a windowed stream has to be flattened before it can be printed,
which discards the windowing logic. For example, execute
&lt;code&gt;maxByStock.flatten().print()&lt;/code&gt; to print the stream of maximum prices
per stock for each time window. In Scala, &lt;code&gt;flatten()&lt;/code&gt; is called implicitly
when needed.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;#top&quot;&gt;Back to top&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;data-driven-windows&quot;&gt;Data-driven windows&lt;/h2&gt;

&lt;p&gt;The most interesting events in the stream occur when the price of a stock
changes rapidly. We can send a warning whenever a stock price has changed
by more than 5% since the last warning. To do that, we use a delta-based window, which takes a
threshold that determines when the computation is triggered, a function that
computes the difference between two records, and a default value with which the first record
is compared. We also create a &lt;code&gt;Count&lt;/code&gt; data type to count the warnings
every 30 seconds.&lt;/p&gt;

&lt;p&gt;&lt;img alt=&quot;Data-driven windowing semantics&quot; src=&quot;/img/blog/blog_data_driven.png&quot; width=&quot;100%&quot; class=&quot;img-responsive center-block&quot; /&gt;&lt;/p&gt;

&lt;div class=&quot;codetabs&quot;&gt;

  &lt;div data-lang=&quot;scala&quot;&gt;

    &lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;defaultPrice&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;//Use delta policy to create price change warnings&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;priceWarnings&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stockStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;symbol&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Delta&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.05&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;priceChange&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;defaultPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mapWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sendWarning&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;//Count the number of warnings every half a minute&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;warningsPerStock&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;priceWarnings&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;symbol&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;SECONDS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;priceChange&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p1&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p2&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Double&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;nc&quot;&gt;Math&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;abs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;p1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;p2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;price&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sendWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Iterable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nonEmpty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;head&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

  &lt;/div&gt;

  &lt;div data-lang=&quot;java7&quot;&gt;

    &lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DEFAULT_PRICE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DEFAULT_STOCK_PRICE&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DEFAULT_PRICE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;//Use delta policy to create price change warnings&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;priceWarnings&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stockStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;symbol&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Delta&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mf&quot;&gt;0.05&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DeltaFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
        &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getDelta&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;oldDataPoint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;newDataPoint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Math&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;abs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;oldDataPoint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;price&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;newDataPoint&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;price&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;},&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DEFAULT_STOCK_PRICE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;mapWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SendWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatten&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;//Count the number of warnings every half a minute&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;warningsPerStock&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;priceWarnings&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Count&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;symbol&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TimeUnit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;SECONDS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatten&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Count&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Serializable&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;symbol&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;toString&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;Count{&amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
                &lt;span class=&quot;s&quot;&gt;&amp;quot;symbol=&amp;#39;&amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;symbol&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;sc&quot;&gt;&amp;#39;\&amp;#39;&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
                &lt;span class=&quot;s&quot;&gt;&amp;quot;, count=&amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
                &lt;span class=&quot;sc&quot;&gt;&amp;#39;}&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;SendWarning&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WindowMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;mapWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Iterable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StockPrice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; 
        &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;iterator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;hasNext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;iterator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;symbol&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

  &lt;/div&gt;

&lt;/div&gt;
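
&lt;p&gt;The trigger logic of the delta policy can also be sketched in plain Java without
any Flink dependency. The sketch below is a simplified model under stated assumptions,
not the implementation used by Flink: the class &lt;code&gt;DeltaTriggerSketch&lt;/code&gt;, its
methods and the sample prices are hypothetical and serve only as an illustration. Every price is
compared with a reference price, which starts at the default value and moves to the
current price each time a warning fires.&lt;/p&gt;

&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;import java.util.ArrayList;
import java.util.List;

//Hypothetical, Flink-free model of a delta-based trigger
public class DeltaTriggerSketch {

    //Relative change between the reference price and the current price
    static double relativeChange(double reference, double current) {
        return Math.abs(reference / current - 1);
    }

    //Returns the prices at which a warning would be sent
    static List&amp;lt;Double&amp;gt; warnings(double[] prices, double threshold, double defaultPrice) {
        List&amp;lt;Double&amp;gt; fired = new ArrayList&amp;lt;&amp;gt;();
        //The first record is compared with the default value
        double reference = defaultPrice;
        for (double price : prices) {
            if (relativeChange(reference, price) &amp;gt; threshold) {
                fired.add(price);
                //Later records are compared with the price of the last warning
                reference = price;
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        //Made-up sample prices
        double[] prices = {1000, 1020, 1060, 1070, 1000};
        //prints [1060.0, 1000.0]
        System.out.println(warnings(prices, 0.05, 1000));
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

&lt;p&gt;With these sample prices a warning fires at 1060 and again at 1000, because both
moves exceed the 5% threshold relative to the previous reference price, while 1020 and
1070 stay within it.&lt;/p&gt;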

&lt;p&gt;&lt;a href=&quot;#top&quot;&gt;Back to top&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;combining-with-a-twitter-stream&quot;&gt;Combining with a Twitter stream&lt;/h2&gt;

&lt;p&gt;Next, we will read a Twitter stream and correlate it with our stock
price stream. Flink has support for connecting to &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/connectors/twitter.html&quot;&gt;Twitter’s
API&lt;/a&gt;,
but for the sake of this example we generate dummy tweet data.&lt;/p&gt;

&lt;p&gt;&lt;img alt=&quot;Social media analytics&quot; src=&quot;/img/blog/blog_social_media.png&quot; width=&quot;100%&quot; class=&quot;img-responsive center-block&quot; /&gt;&lt;/p&gt;

&lt;div class=&quot;codetabs&quot;&gt;

  &lt;div data-lang=&quot;scala&quot;&gt;

    &lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;//Read a stream of tweets&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tweetStream&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generateTweets&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;//Extract the stock symbols&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mentionedSymbols&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tweetStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tweet&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tweet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot; &amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toUpperCase&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbols&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;contains&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;//Count the extracted symbols&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tweetsPerStock&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mentionedSymbols&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;symbol&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;SECONDS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;generateTweets&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;&amp;lt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;to&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;yield&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbols&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Random&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextInt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;symbols&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)))&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mkString&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot; &amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;nc&quot;&gt;Thread&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sleep&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Random&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nextInt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;500&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

  &lt;/div&gt;

  &lt;div data-lang=&quot;java7&quot;&gt;

    &lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;//Read a stream of tweets&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tweetStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;TweetSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;//Extract the stock symbols&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mentionedSymbols&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tweetStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlatMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot; &amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;toUpperCase&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FilterFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SYMBOLS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;contains&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;});&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;//Count the extracted symbols&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tweetsPerStock&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mentionedSymbols&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Count&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;symbol&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TimeUnit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;SECONDS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;count&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatten&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TweetSource&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SourceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;Random&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;StringBuilder&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stringBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;invoke&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;random&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Random&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;stringBuilder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StringBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;stringBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setLength&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;stringBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot; &amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
                &lt;span class=&quot;n&quot;&gt;stringBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SYMBOLS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;nextInt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SYMBOLS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())));&lt;/span&gt;
            &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stringBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;toString&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;Thread&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sleep&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;500&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

  &lt;/div&gt;

&lt;/div&gt;
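
&lt;p&gt;To make the data flow above easier to follow outside of Flink, here is a minimal plain-Java sketch of the same split, upper-case, filter and count logic, applied to a single batch of tweets that stands in for one 30-second window. The class name &lt;code&gt;TweetSymbolCountSketch&lt;/code&gt;, its helper methods and the three-entry symbol list are assumptions made for illustration; they are not part of the Flink API or of the example job.&lt;/p&gt;

```java
// Plain-Java sketch with no Flink dependency; all names here are hypothetical.
final class TweetSymbolCountSketch {

    // Illustrative symbol list; the example job uses its own SYMBOLS list.
    static final String[] SYMBOLS = {"SPX", "DAX", "GOOG"};

    // Returns the position of word in SYMBOLS, or -1 if it is not a known symbol.
    static int symbolIndex(String word) {
        int index = 0;
        for (String symbol : SYMBOLS) {
            if (symbol.equals(word)) {
                return index;
            }
            index++;
        }
        return -1;
    }

    // Counts symbol mentions over one batch of tweets (one window's worth of data).
    // The result is aligned with SYMBOLS: counts[i] belongs to SYMBOLS[i].
    static int[] countMentions(String[] tweets) {
        int[] counts = new int[SYMBOLS.length];
        for (String tweet : tweets) {
            // Same steps as the job: split on spaces, upper-case, keep known symbols.
            for (String word : tweet.split(" ")) {
                int index = symbolIndex(word.toUpperCase());
                if (index != -1) {
                    counts[index]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The leading space mimics the generated tweets; the resulting empty word is filtered out.
        String[] window = {" spx goog spx", " dax hello goog"};
        int[] counts = countMentions(window);
        int index = 0;
        for (String symbol : SYMBOLS) {
            System.out.println(symbol + ": " + counts[index]);
            index++;
        }
    }
}
```

&lt;p&gt;In the streaming job the grouping and the 30-second window do this bookkeeping per symbol; the sketch only shows what happens to the tweets that fall into one such window.&lt;/p&gt;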

&lt;p&gt;&lt;a href=&quot;#top&quot;&gt;Back to top&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;streaming-joins&quot;&gt;Streaming joins&lt;/h2&gt;

&lt;p&gt;Finally, we join real-time tweets and stock prices and compute a
rolling correlation between the number of price warnings and the
number of mentions of a given stock in the Twitter stream. As both of
these data streams are potentially infinite, we apply the join on a
30-second window.&lt;/p&gt;
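
&lt;p&gt;The value computed for each window is a Pearson correlation coefficient: the covariance of the two counts divided by the product of their standard deviations. As a rough, Flink-independent sketch of that arithmetic, the plain-Java class below takes the joined pairs of one window as an array of (warning count, tweet count) pairs. The class and method names are hypothetical, and returning &lt;code&gt;NaN&lt;/code&gt; for an empty window or a constant series is a choice made for this sketch, not behaviour prescribed by Flink.&lt;/p&gt;

```java
// Plain-Java sketch with no Flink dependency; all names here are hypothetical.
final class CorrelationSketch {

    // Pearson correlation of the pairs {warningCount, tweetCount} of one window.
    // Returns NaN for an empty window or when either series has zero variance.
    static double correlation(int[][] pairs) {
        if (pairs.length == 0) {
            return Double.NaN;
        }

        // Means of both series.
        double mean1 = 0.0;
        double mean2 = 0.0;
        for (int[] pair : pairs) {
            mean1 += pair[0];
            mean2 += pair[1];
        }
        mean1 /= pairs.length;
        mean2 /= pairs.length;

        // Sums of products and squares of the deviations from the means.
        double cov = 0.0;
        double var1 = 0.0;
        double var2 = 0.0;
        for (int[] pair : pairs) {
            double dx = pair[0] - mean1;
            double dy = pair[1] - mean2;
            cov += dx * dy;
            var1 += dx * dx;
            var2 += dy * dy;
        }

        // The 1/n factors of covariance and variances cancel, so raw sums suffice.
        double denominator = Math.sqrt(var1 * var2);
        if (denominator == 0.0) {
            return Double.NaN;
        }
        return cov / denominator;
    }

    public static void main(String[] args) {
        int[][] window = {{1, 2}, {2, 4}, {3, 6}};
        System.out.println(correlation(window));
    }
}
```

&lt;p&gt;Guarding against a zero denominator matters in practice: a window in which every pair carries the same count has no variance, and the plain division would otherwise yield &lt;code&gt;NaN&lt;/code&gt; implicitly.&lt;/p&gt;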

&lt;p&gt;&lt;img alt=&quot;Streaming joins&quot; src=&quot;/img/blog/blog_stream_join.png&quot; width=&quot;60%&quot; class=&quot;img-responsive center-block&quot; /&gt;&lt;/p&gt;

&lt;div class=&quot;codetabs&quot;&gt;

  &lt;div data-lang=&quot;scala&quot;&gt;

    &lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot; data-lang=&quot;scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;//Join warnings and parsed tweets&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tweetsAndWarning&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;warningsPerStock&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tweetsPerStock&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;onWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;SECONDS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;symbol&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;equalTo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;symbol&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rollingCorrelation&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tweetsAndWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;SECONDS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;mapWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;computeCorrelation&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rollingCorrelation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;//Compute rolling correlation&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;computeCorrelation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Iterable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;])&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;nonEmpty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;var1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;average&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;var2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;average&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;

    &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cov&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;average&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;zip&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xy&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;xy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_2&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)))&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Math&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;average&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Math&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))))&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d2&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Math&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;average&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;var2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Math&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mean2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))))&lt;/span&gt;

    &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cov&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;d1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
  &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

  &lt;/div&gt;

  &lt;div data-lang=&quot;java7&quot;&gt;

    &lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot; data-lang=&quot;java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;//Join warnings and parsed tweets&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tweetsAndWarning&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;warningsPerStock&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tweetsPerStock&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;onWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TimeUnit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;SECONDS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;symbol&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;equalTo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;symbol&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;with&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JoinFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
        &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Count&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Count&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;});&lt;/span&gt;

&lt;span class=&quot;c1&quot;&gt;//Compute rolling correlation&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rollingCorrelation&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tweetsAndWarning&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;30&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TimeUnit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;SECONDS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;mapWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;WindowCorrelation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;rollingCorrelation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;

&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;WindowCorrelation&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WindowMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;leftSum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rightSum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;leftMean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rightMean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cov&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;leftSd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rightSd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
    &lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;mapWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Iterable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; 
        &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;leftSum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;rightSum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;cov&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;leftSd&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;rightSd&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.;&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;//compute mean for both sides, save count&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pair&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;leftSum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pair&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;f0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;rightSum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pair&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;f1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;++;&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;leftMean&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;leftSum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;doubleValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;rightMean&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rightSum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;doubleValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;

        &lt;span class=&quot;c1&quot;&gt;//compute covariance &amp;amp; std. deviations&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pair&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;cov&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pair&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;f0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;leftMean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pair&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;f1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rightMean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;

        &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pair&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;values&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;leftSd&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Math&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;pow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pair&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;f0&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;leftMean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;rightSd&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Math&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;pow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pair&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;f1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rightMean&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;leftSd&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Math&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;leftSd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
        &lt;span class=&quot;n&quot;&gt;rightSd&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Math&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sqrt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rightSd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;

        &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cov&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;leftSd&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rightSd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
    &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;

  &lt;/div&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href=&quot;#top&quot;&gt;Back to top&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;other-things-to-try&quot;&gt;Other things to try&lt;/h2&gt;

&lt;p&gt;For a full feature overview, please check the &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/index.html&quot;&gt;Streaming Guide&lt;/a&gt;, which describes all the available API features.
You are very welcome to try out these features for different use cases; we are looking forward to hearing about your experiences. Feel free to &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;contact us&lt;/a&gt;.&lt;/p&gt;

&lt;h2 id=&quot;upcoming-for-streaming&quot;&gt;Upcoming for streaming&lt;/h2&gt;

&lt;p&gt;Some aspects of Flink Streaming are subject to change by the
next release, which will make this application look even nicer.&lt;/p&gt;

&lt;p&gt;Stay tuned for later blog posts on how Flink Streaming works
internally, fault tolerance, and performance measurements!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;#top&quot;&gt;Back to top&lt;/a&gt;&lt;/p&gt;
</description>
<pubDate>Mon, 09 Feb 2015 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2015/02/09/streaming-example.html</link>
<guid isPermaLink="true">/news/2015/02/09/streaming-example.html</guid>
</item>

<item>
<title>January 2015 in the Flink community</title>
<description>&lt;p&gt;Happy 2015! Here is a (hopefully digestible) summary of what happened last month in the Flink community.&lt;/p&gt;

&lt;h3 id=&quot;release&quot;&gt;0.8.0 release&lt;/h3&gt;

&lt;p&gt;Flink 0.8.0 was released. See &lt;a href=&quot;http://flink.apache.org/news/2015/01/21/release-0.8.html&quot;&gt;here&lt;/a&gt; for the release notes.&lt;/p&gt;

&lt;h3 id=&quot;flink-roadmap&quot;&gt;Flink roadmap&lt;/h3&gt;

&lt;p&gt;The community has published a &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Flink+Roadmap&quot;&gt;roadmap for 2015&lt;/a&gt; on the Flink wiki. Check it out to see what is coming up in Flink, and pick up an issue to contribute!&lt;/p&gt;

&lt;h3 id=&quot;articles-in-the-press&quot;&gt;Articles in the press&lt;/h3&gt;

&lt;p&gt;The Apache Software Foundation &lt;a href=&quot;https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces69&quot;&gt;announced&lt;/a&gt; Flink as a Top-Level Project. The announcement was picked up by the media, e.g., &lt;a href=&quot;http://sdtimes.com/inside-apache-software-foundations-newest-top-level-project-apache-flink/?utm_content=11232092&amp;amp;utm_medium=social&amp;amp;utm_source=twitter&quot;&gt;here&lt;/a&gt;, &lt;a href=&quot;http://www.datanami.com/2015/01/12/apache-flink-takes-route-distributed-data-processing/&quot;&gt;here&lt;/a&gt;, and &lt;a href=&quot;http://i-programmer.info/news/197-data-mining/8176-flink-reaches-top-level-status.html&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;hadoop-summit&quot;&gt;Hadoop Summit&lt;/h3&gt;

&lt;p&gt;A submitted abstract on Flink Streaming won the community vote for the track “The Future of Hadoop”.&lt;/p&gt;

&lt;h3 id=&quot;meetups-and-talks&quot;&gt;Meetups and talks&lt;/h3&gt;

&lt;p&gt;Flink was presented at the &lt;a href=&quot;http://www.meetup.com/Hadoop-User-Group-France/events/219778022/&quot;&gt;Paris Hadoop User Group&lt;/a&gt;, the &lt;a href=&quot;http://www.meetup.com/hadoop/events/167785202/&quot;&gt;Bay Area Hadoop User Group&lt;/a&gt;, the &lt;a href=&quot;http://www.meetup.com/Apache-Tez-User-Group/events/219302692/&quot;&gt;Apache Tez User Group&lt;/a&gt;, and &lt;a href=&quot;https://fosdem.org/2015/schedule/track/graph_processing/&quot;&gt;FOSDEM 2015&lt;/a&gt;. The January &lt;a href=&quot;http://www.meetup.com/Apache-Flink-Meetup/events/219639984/&quot;&gt;Flink meetup in Berlin&lt;/a&gt; had talks on recent community updates and new features.&lt;/p&gt;

&lt;h2 id=&quot;notable-code-contributions&quot;&gt;Notable code contributions&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Code contributions listed here may not be part of a release or even the Flink master repository yet.&lt;/p&gt;

&lt;h3 id=&quot;using-off-heap-memoryhttpsgithubcomapacheflinkpull290&quot;&gt;&lt;a href=&quot;https://github.com/apache/flink/pull/290&quot;&gt;Using off-heap memory&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;This pull request enables Flink to use off-heap memory for its internal memory uses (sort, hash, caching of intermediate data sets).&lt;/p&gt;

&lt;h3 id=&quot;gelly-flinks-graph-apihttpsgithubcomapacheflinkpull335&quot;&gt;&lt;a href=&quot;https://github.com/apache/flink/pull/335&quot;&gt;Gelly, Flink’s Graph API&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;This pull request introduces Gelly, Flink’s brand-new Graph API. Gelly offers a native graph programming abstraction with functionality for vertex-centric programming, as well as a set of ready-to-use graph algorithms. See &lt;a href=&quot;http://www.slideshare.net/vkalavri/largescale-graph-processing-with-apache-flink-graphdevroom-fosdem15&quot;&gt;this slide set&lt;/a&gt; for an overview of Gelly.&lt;/p&gt;

&lt;h3 id=&quot;semantic-annotationshttpsgithubcomapacheflinkpull311&quot;&gt;&lt;a href=&quot;https://github.com/apache/flink/pull/311&quot;&gt;Semantic annotations&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;Semantic annotations are a powerful mechanism to expose information about the behavior of Flink functions to Flink’s optimizer. The optimizer can leverage this information to generate more efficient execution plans. For example, the output of a Reduce operator that groups on the second field of a tuple is still partitioned on that field if the Reduce function does not modify the value of the second field. By exposing this information to the optimizer, the optimizer can generate plans that avoid expensive data shuffling and reuse the partitioned output of Reduce. Semantic annotations can be defined for most data types, including (nested) tuples and POJOs. See the snapshot documentation for details (not online yet).&lt;/p&gt;

&lt;h3 id=&quot;new-yarn-clienthttpsgithubcomapacheflinkpull292&quot;&gt;&lt;a href=&quot;https://github.com/apache/flink/pull/292&quot;&gt;New YARN client&lt;/a&gt;&lt;/h3&gt;

&lt;p&gt;The improved YARN client of Flink now allows users to deploy Flink on YARN for executing a single job. Older versions only supported a long-running YARN session. The code of the YARN client has been refactored to provide an (internal) Java API for controlling YARN clusters more easily.&lt;/p&gt;
</description>
<pubDate>Wed, 04 Feb 2015 11:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2015/02/04/january-in-flink.html</link>
<guid isPermaLink="true">/news/2015/02/04/january-in-flink.html</guid>
</item>

<item>
<title>Apache Flink 0.8.0 available</title>
<description>&lt;p&gt;We are pleased to announce the availability of Flink 0.8.0. This release includes new user-facing features as well as performance improvements and bug fixes, extends the support for filesystems, and introduces the Scala API and flexible windowing semantics for Flink Streaming. A total of 33 people have contributed to this release. A big thanks to all of them!&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://www.apache.org/dyn/closer.cgi/flink/flink-0.8.0/flink-0.8.0-bin-hadoop2.tgz&quot;&gt;Download Flink 0.8.0&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12328699&quot;&gt;See the release changelog&lt;/a&gt;&lt;/p&gt;

&lt;h2 id=&quot;overview-of-major-new-features&quot;&gt;Overview of major new features&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Extended filesystem support&lt;/strong&gt;: The former &lt;code&gt;DistributedFileSystem&lt;/code&gt; interface has been generalized to &lt;code&gt;HadoopFileSystem&lt;/code&gt;, which now supports all subclasses of &lt;code&gt;org.apache.hadoop.fs.FileSystem&lt;/code&gt;. This allows users to use all file systems supported by Hadoop with Apache Flink.
&lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.8/example_connectors.html&quot;&gt;See connecting to other systems&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Streaming Scala API&lt;/strong&gt;: As an alternative to the existing Java API, Flink Streaming is now also programmable in Scala. The Java and Scala APIs now have the same syntax and transformations and will be kept in sync in every future release.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Streaming windowing semantics&lt;/strong&gt;: The new windowing API offers an expressive way to define custom logic for triggering the execution of a stream window and removing elements. The new features include, among others, out-of-the-box support for windows based on logical or physical time and on data-driven properties of the events themselves. &lt;a href=&quot;http://ci.apache.org/projects/flink/flink-docs-release-0.8/streaming_guide.html#window-operators&quot;&gt;Read more here&lt;/a&gt;&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Mutable and immutable objects in runtime&lt;/strong&gt;: All Flink versions before 0.8.0 always passed the same objects to functions written by users. This is a common performance optimization, also used in other systems such as Hadoop.
 However, this is error-prone for new users because one has to carefully check that references to the object aren’t kept in the user function. Starting with 0.8.0, Flink allows users to configure a mode that disables this mechanism.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Performance and usability improvements&lt;/strong&gt;: The new Apache Flink 0.8.0 release brings several new features which will significantly improve the performance and the usability of the system. Amongst others, these features include:
    &lt;ul&gt;
      &lt;li&gt;Improved input split assignment which maximizes computation locality&lt;/li&gt;
      &lt;li&gt;Smart broadcasting mechanism which minimizes network I/O&lt;/li&gt;
      &lt;li&gt;Custom partitioners which let the user control how the data is partitioned within the cluster. This helps to prevent data skew and makes it possible to implement highly efficient algorithms.&lt;/li&gt;
      &lt;li&gt;coGroup operator now supports group sorting for its inputs&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Kryo is the new fallback serializer&lt;/strong&gt;: Apache Flink has a sophisticated type analysis and serialization framework that is able to handle commonly used types very efficiently.
 In addition to that, there is a fallback serializer for types which are not supported. Older versions of Flink used the reflective &lt;a href=&quot;http://avro.apache.org/&quot;&gt;Avro&lt;/a&gt; serializer for that purpose. With this release, Flink uses the powerful &lt;a href=&quot;https://github.com/EsotericSoftware/kryo&quot;&gt;Kryo&lt;/a&gt; and twitter-chill libraries to support types such as Java Collections and Scala-specific types.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;strong&gt;Hadoop 2.2.0+ is now the default Hadoop dependency&lt;/strong&gt;: With Flink 0.8.0, we made the “hadoop2” build profile the default build for Flink. This means that all users using Hadoop 1 (0.2X or 1.2.X versions) have to specify the version “0.8.0-hadoop1” in their pom files.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;HBase module updated&lt;/strong&gt;: The HBase version has been updated to 0.98.6.1. Also, HBase is now available for both the Hadoop1 and Hadoop2 profiles of Flink.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;contributors&quot;&gt;Contributors&lt;/h2&gt;

&lt;ul&gt;
  &lt;li&gt;Marton Balassi&lt;/li&gt;
  &lt;li&gt;Daniel Bali&lt;/li&gt;
  &lt;li&gt;Carsten Brandt&lt;/li&gt;
  &lt;li&gt;Moritz Borgmann&lt;/li&gt;
  &lt;li&gt;Stefan Bunk&lt;/li&gt;
  &lt;li&gt;Paris Carbone&lt;/li&gt;
  &lt;li&gt;Ufuk Celebi&lt;/li&gt;
  &lt;li&gt;Nils Engelbach&lt;/li&gt;
  &lt;li&gt;Stephan Ewen&lt;/li&gt;
  &lt;li&gt;Gyula Fora&lt;/li&gt;
  &lt;li&gt;Gabor Hermann&lt;/li&gt;
  &lt;li&gt;Fabian Hueske&lt;/li&gt;
  &lt;li&gt;Vasiliki Kalavri&lt;/li&gt;
  &lt;li&gt;Johannes Kirschnick&lt;/li&gt;
  &lt;li&gt;Aljoscha Krettek&lt;/li&gt;
  &lt;li&gt;Suneel Marthi&lt;/li&gt;
  &lt;li&gt;Robert Metzger&lt;/li&gt;
  &lt;li&gt;Felix Neutatz&lt;/li&gt;
  &lt;li&gt;Chiwan Park&lt;/li&gt;
  &lt;li&gt;Flavio Pompermaier&lt;/li&gt;
  &lt;li&gt;Mingliang Qi&lt;/li&gt;
  &lt;li&gt;Shiva Teja Reddy&lt;/li&gt;
  &lt;li&gt;Till Rohrmann&lt;/li&gt;
  &lt;li&gt;Henry Saputra&lt;/li&gt;
  &lt;li&gt;Kousuke Saruta&lt;/li&gt;
  &lt;li&gt;Chesney Schepler&lt;/li&gt;
  &lt;li&gt;Erich Schubert&lt;/li&gt;
  &lt;li&gt;Peter Szabo&lt;/li&gt;
  &lt;li&gt;Jonas Traub&lt;/li&gt;
  &lt;li&gt;Kostas Tzoumas&lt;/li&gt;
  &lt;li&gt;Timo Walther&lt;/li&gt;
  &lt;li&gt;Daniel Warneke&lt;/li&gt;
  &lt;li&gt;Chen Xu&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 21 Jan 2015 11:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2015/01/21/release-0.8.html</link>
<guid isPermaLink="true">/news/2015/01/21/release-0.8.html</guid>
</item>

</channel>
</rss>
