<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Flink Blog Feed</title>
<description>Flink Blog</description>
<link>https://flink.apache.org/blog</link>
<atom:link href="https://flink.apache.org/blog/feed.xml" rel="self" type="application/rss+xml" />
<item>
<title>The State of Flink on Docker</title>
<description>&lt;p&gt;With over 50 million downloads from Docker Hub, the Flink docker images are a very popular deployment option.&lt;/p&gt;
&lt;p&gt;The Flink community recently put some effort into improving the Docker experience for our users, with the goal of reducing confusion and improving usability.&lt;/p&gt;
&lt;p&gt;Let’s quickly break down the recent improvements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Reduce confusion: Flink used to have 2 Dockerfiles and a 3rd file maintained outside of the official repository — all with different features and varying stability. Now, we have one central place for all images: &lt;a href=&quot;https://github.com/apache/flink-docker&quot;&gt;apache/flink-docker&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Here, we keep all the Dockerfiles for the different releases. Check out the &lt;a href=&quot;https://github.com/apache/flink-docker/blob/master/README.md&quot;&gt;detailed readme&lt;/a&gt; of that repository for further explanation on the different branches, as well as the &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification&quot;&gt;Flink Improvement Proposal (FLIP-111)&lt;/a&gt; that contains the detailed planning.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;apache/flink-docker&lt;/code&gt; repository also seeds the &lt;a href=&quot;https://hub.docker.com/_/flink&quot;&gt;official Flink image on Docker Hub&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Improve Usability: The Dockerfiles are used for various purposes: &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html&quot;&gt;Native Docker deployments&lt;/a&gt;, &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/native_kubernetes.html&quot;&gt;Flink on Kubernetes&lt;/a&gt;, the (unofficial) &lt;a href=&quot;https://github.com/docker-flink/examples&quot;&gt;Flink helm example&lt;/a&gt; and the project’s &lt;a href=&quot;https://github.com/apache/flink/tree/master/flink-end-to-end-tests&quot;&gt;internal end to end tests&lt;/a&gt;. With one unified image, all these consumers of the images benefit from the same set of features, documentation and testing.&lt;/p&gt;
&lt;p&gt;The new images support &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#configure-options&quot;&gt;passing configuration variables&lt;/a&gt; via a &lt;code&gt;FLINK_PROPERTIES&lt;/code&gt; environment variable. Users can &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#using-plugins&quot;&gt;enable default plugins&lt;/a&gt; with the &lt;code&gt;ENABLE_BUILT_IN_PLUGINS&lt;/code&gt; environment variable. The images also allow loading custom jar paths and configuration files.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
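&lt;p&gt;For example, both variables can be passed when starting a container (the plugin jar name below is only an illustration; any plugin shipped in the Flink distribution’s &lt;code&gt;opt&lt;/code&gt; directory can be named here):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;FLINK_PROPERTIES=&quot;jobmanager.rpc.address: jobmanager&quot;
docker run \
--env FLINK_PROPERTIES=&quot;${FLINK_PROPERTIES}&quot; \
--env ENABLE_BUILT_IN_PLUGINS=&quot;flink-s3-fs-hadoop-1.11.1.jar&quot; \
flink:1.11.1 jobmanager
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;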
&lt;p&gt;Looking into the future, there are already some interesting potential improvements lined up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16260&quot;&gt;Java 11 Docker images&lt;/a&gt; (already completed)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15793&quot;&gt;Use vanilla docker-entrypoint with flink-kubernetes&lt;/a&gt; (in progress)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17167&quot;&gt;History server support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15587&quot;&gt;Support for OpenShift&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;how-do-i-get-started&quot;&gt;How do I get started?&lt;/h2&gt;
&lt;p&gt;This is a short tutorial on &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#start-a-session-cluster&quot;&gt;how to start a Flink Session Cluster&lt;/a&gt; with Docker.&lt;/p&gt;
&lt;p&gt;A &lt;em&gt;Flink Session cluster&lt;/em&gt; can be used to run multiple jobs. Each job needs to be submitted to the cluster after it has been deployed. To deploy a &lt;em&gt;Flink Session cluster&lt;/em&gt; with Docker, you need to start a &lt;em&gt;JobManager&lt;/em&gt; container. To enable communication between the containers, we first set a required Flink configuration property and create a network:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;FLINK_PROPERTIES=&quot;jobmanager.rpc.address: jobmanager&quot;
docker network create flink-network
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then we launch the JobManager:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;docker run \
--rm \
--name=jobmanager \
--network flink-network \
-p 8081:8081 \
--env FLINK_PROPERTIES=&quot;${FLINK_PROPERTIES}&quot; \
flink:1.11.1 jobmanager
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;and one or more &lt;em&gt;TaskManager&lt;/em&gt; containers:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;docker run \
--rm \
--name=taskmanager \
--network flink-network \
--env FLINK_PROPERTIES=&quot;${FLINK_PROPERTIES}&quot; \
flink:1.11.1 taskmanager
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You now have a fully functional Flink cluster running! You can access the web front end here: &lt;a href=&quot;http://localhost:8081/&quot;&gt;localhost:8081&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Let’s now submit one of Flink’s example jobs:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;c&quot;&gt;# 1: (optional) Download the Flink distribution, and unpack it&lt;/span&gt;
wget https://archive.apache.org/dist/flink/flink-1.11.1/flink-1.11.1-bin-scala_2.12.tgz
tar xf flink-1.11.1-bin-scala_2.12.tgz
&lt;span class=&quot;nb&quot;&gt;cd &lt;/span&gt;flink-1.11.1
&lt;span class=&quot;c&quot;&gt;# 2: Start the Flink job&lt;/span&gt;
./bin/flink run ./examples/streaming/TopSpeedWindowing.jar&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The main steps of the tutorial are also recorded in this short screencast:&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/flink-docker/flink-docker.gif&quot; width=&quot;882px&quot; height=&quot;730px&quot; alt=&quot;Demo video&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;strong&gt;Next steps&lt;/strong&gt;: Now that you’ve successfully completed this tutorial, we recommend checking out the full &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html&quot;&gt;Flink on Docker documentation&lt;/a&gt; for implementing more advanced deployment scenarios, such as Job Clusters, Docker Compose or our native Kubernetes integration.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;We encourage all readers to try out Flink on Docker to provide the community with feedback to further improve the experience.
Please refer to the user@flink.apache.org mailing list (&lt;a href=&quot;https://flink.apache.org/community.html#how-to-subscribe-to-a-mailing-list&quot;&gt;remember to subscribe first&lt;/a&gt;) for general questions, and to our &lt;a href=&quot;https://issues.apache.org/jira/issues/?jql=project+%3D+FLINK+AND+component+%3D+flink-docker&quot;&gt;issue tracker&lt;/a&gt; for specific bugs or improvements, or &lt;a href=&quot;https://flink.apache.org/contributing/how-to-contribute.html&quot;&gt;ideas for contributions&lt;/a&gt;!&lt;/p&gt;
</description>
<pubDate>Thu, 20 Aug 2020 02:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/08/20/flink-docker.html</link>
<guid isPermaLink="true">/news/2020/08/20/flink-docker.html</guid>
</item>
<item>
<title>Monitoring and Controlling Networks of IoT Devices with Flink Stateful Functions</title>
<description>&lt;p&gt;In this blog post, we’ll take a look at a class of use cases that is a natural fit for &lt;a href=&quot;https://flink.apache.org/stateful-functions.html&quot;&gt;Flink Stateful Functions&lt;/a&gt;: monitoring and controlling networks of connected devices (often called the “Internet of Things” (IoT)).&lt;/p&gt;
&lt;p&gt;IoT networks are composed of many individual but interconnected components, which makes it non-trivial to get high-level insight into the status, problems, or optimization opportunities in these networks. Each individual device “sees” only its own state, which means that the status of groups of devices, or even the network as a whole, is often a complex aggregation of the individual devices’ state. Diagnosing, controlling, or optimizing these groups of devices thus requires distributed logic that analyzes the “bigger picture” and then acts upon it.&lt;/p&gt;
&lt;p&gt;A powerful approach to implement this is using &lt;em&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Digital_twin&quot;&gt;digital twins&lt;/a&gt;&lt;/em&gt;: each device has a corresponding virtual entity (i.e. the digital twin), which also captures their relationships and interactions. The digital twins track the status of their corresponding devices and send updates to other twins, representing groups (such as geographical regions) of devices. Those, in turn, handle the logic to obtain the network’s aggregated view, or this “bigger picture” we mentioned before.&lt;/p&gt;
&lt;h1 id=&quot;our-scenario-datacenter-monitoring-and-alerting&quot;&gt;Our Scenario: Datacenter Monitoring and Alerting&lt;/h1&gt;
&lt;figure style=&quot;float:right;padding-left:1px;padding-top: 20px&quot;&gt;
&lt;img src=&quot;/img/blog/2020-08-18-statefun/rack.png&quot; width=&quot;350px&quot; /&gt;
&lt;figcaption style=&quot;padding-top: 10px;text-align:center&quot;&gt;&lt;b&gt;Fig.1&lt;/b&gt; An oversimplified view of a data center.&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;p&gt;There are many examples of the digital twins approach in the real world, such as &lt;a href=&quot;https://www.infoq.com/presentations/tesla-vpp/&quot;&gt;smart grids of batteries&lt;/a&gt;, &lt;a href=&quot;https://www.alibabacloud.com/solutions/intelligence-brain/city&quot;&gt;smart cities&lt;/a&gt;, or &lt;a href=&quot;https://www.youtube.com/watch?v=9y27FJgz5-M&quot;&gt;monitoring infrastructure software clusters&lt;/a&gt;. In this blogpost, we’ll use the example of data center monitoring and alert correlation implemented with Stateful Functions.&lt;/p&gt;
&lt;p&gt;Consider a very simplified view of a data center, consisting of many thousands of commodity servers arranged in server racks. Each server rack typically contains up to 40 servers, with a ToR (Top of the Rack) network switch connected to each server. The switches from all the racks connect through a larger switch (&lt;strong&gt;Fig. 1&lt;/strong&gt;).&lt;/p&gt;
&lt;p&gt;In this datacenter, many things can go wrong: a disk in a server can stop working, network cards can start dropping packets, or ToR switches might cease to function. The entire data center might also be affected by power supply degradation, causing servers to operate at reduced capacity. On-site engineers must be able to identify these incidents quickly and fix them promptly.&lt;/p&gt;
&lt;p&gt;Diagnosing individual server failures is rather straightforward: take a recent history of metric reports from that particular server, analyse it and pinpoint the anomaly. On the other hand, other incidents only make sense “together”, because they share a common root cause. Diagnosing or predicting causes of networking degradation at a rack or datacenter level requires an aggregate view of metrics (such as packet drop rates) from the individual machines and racks, and possibly some prediction model or diagnosis code that runs under certain conditions.&lt;/p&gt;
&lt;h2 id=&quot;monitoring-a-virtual-datacenter-via-digital-twins&quot;&gt;Monitoring a Virtual Datacenter via Digital Twins&lt;/h2&gt;
&lt;p&gt;For the sake of this blog post, our oversimplified data center has some servers and racks, each with a unique ID. Each server has a metrics-collecting daemon that publishes metrics to a message queue, and there is a provisioning service that operators use to commission and decommission servers.&lt;/p&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-08-18-statefun/1.png&quot; width=&quot;550px&quot; alt=&quot;&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;Our application will consume these server metrics and commission/decommission events, and produce server/rack/datacenter alerts. There will also be an operator consuming any alerts triggered by the monitoring system. In the next section, we’ll show how this use case can be naturally modeled with Stateful Functions (StateFun).&lt;/p&gt;
&lt;h2 id=&quot;implementing-the-use-case-with-flink-statefun&quot;&gt;Implementing the use case with Flink StateFun&lt;/h2&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;You can find the code for this example at: &lt;a href=&quot;https://github.com/igalshilman/iot-statefun-blogpost&quot;&gt;https://github.com/igalshilman/iot-statefun-blogpost&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;The basic building block for modeling a StateFun application is a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.1/concepts/application-building-blocks.html#stateful-functions&quot;&gt;&lt;em&gt;stateful function&lt;/em&gt;&lt;/a&gt;, which has the following properties:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;It has a unique logical address, as well as persisted, fault-tolerant state scoped to that address.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;It can &lt;em&gt;react&lt;/em&gt; to messages, both internal (i.e. sent from other stateful functions) and external (e.g. a message from Kafka).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Invocations of a specific function are serialized, so messages sent to a specific address are &lt;strong&gt;not&lt;/strong&gt; executed concurrently.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;There can be many billions of function instances in a single StateFun cluster.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
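&lt;p&gt;To make these properties a bit more concrete, here is a rough, minimal sketch of a stateful function. The example project linked above uses the embedded Java SDK; the sketch below uses the remote Python SDK instead, and the function type, state name and target address are purely illustrative (remote function state also has to be declared in the module definition):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;from google.protobuf.wrappers_pb2 import StringValue
from statefun import StatefulFunctions

functions = StatefulFunctions()

# One logical instance of this function exists per address, e.g. per server id.
@functions.bind('example/server')
def server_fun(context, message: StringValue):
    # Persisted, fault-tolerant state, scoped to this function's address.
    last_report = context.state('last_report').unpack(StringValue)
    if not last_report:
        last_report = StringValue()
    last_report.value = message.value
    context.state('last_report').pack(last_report)

    # React by messaging another stateful function,
    # e.g. the digital twin of the rack this server belongs to.
    context.pack_and_send('example/rack', 'rack-1', last_report)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;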
&lt;p&gt;To model our use case, we’ll define three functions: &lt;strong&gt;ServerFun&lt;/strong&gt;, &lt;strong&gt;RackFun&lt;/strong&gt; and &lt;strong&gt;DataCenterFun&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;ServerFun&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Each physical server is represented with its &lt;em&gt;digital twin&lt;/em&gt; stateful function. This function is responsible for:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Maintaining a sliding window of incoming metrics.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Applying a model that decides whether or not to trigger an alert.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Alerting if metrics are missing for too long.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Notifying its containing &lt;strong&gt;RackFun&lt;/strong&gt; about any open incidents.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;RackFun&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;While the &lt;em&gt;ServerFun&lt;/em&gt; is responsible for identifying server-local incidents, we need a function that correlates incidents happening on the different servers deployed in the same rack and:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Collects open incidents reported by the &lt;strong&gt;ServerFun&lt;/strong&gt; functions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Maintains a histogram of currently open incidents on this rack.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Applies a correlation model to the individual incidents sent by the &lt;strong&gt;ServerFun&lt;/strong&gt;, and reports high-level, related incidents as a single incident to the &lt;strong&gt;DataCenterFun&lt;/strong&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;strong&gt;DataCenterFun&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This function maintains a view of incidents across different racks in our datacenter.&lt;/p&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-08-18-statefun/2.png&quot; width=&quot;600px&quot; alt=&quot;&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;To summarize our plan:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Leaf functions ingest raw metric data (&lt;span style=&quot;color:blue&quot;&gt;blue&lt;/span&gt; lines), and apply localized logic to trigger an alert.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Intermediate functions operate on already summarized events (&lt;span style=&quot;color:orange&quot;&gt;orange&lt;/span&gt; lines) and correlate them into high-level events.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A root function correlates the high-level events across the intermediate functions into a single &lt;em&gt;healthy/not healthy&lt;/em&gt; value.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;how-does-it-really-look&quot;&gt;How does it really look?&lt;/h2&gt;
&lt;h3 id=&quot;serverfun&quot;&gt;ServerFun&lt;/h3&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-08-18-statefun/3_1.png&quot; width=&quot;600px&quot; alt=&quot;&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;This section associates a behaviour with every message that the function expects to be invoked with.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;metricsHistory&lt;/code&gt; buffer is our sliding window of the last 15 minutes worth of &lt;code&gt;ServerMetricReports&lt;/code&gt;. Note that this buffer is configured to expire entries 15 minutes after they were written.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;serverHealthState&lt;/code&gt; represents the current physical server state, open incidents and so on.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Let’s take a look at what happens when a &lt;code&gt;ServerMetricReport&lt;/code&gt; message arrives:&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-08-18-statefun/3_2.png&quot; width=&quot;600px&quot; alt=&quot;&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;Retrieve the previously computed &lt;code&gt;serverHealthState&lt;/code&gt; that is kept in state.&lt;/li&gt;
&lt;li&gt;Evaluate a model on the sliding window of previous metric reports, the current metric report, and the previously computed server state to obtain an assessment of the current server health.&lt;/li&gt;
&lt;li&gt;If the server is not believed to be healthy, emit an alert via an alerts topic, and also send a message to our containing rack with all the open incidents that this server currently has.&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;alert alert-warning&quot;&gt;
&lt;p&gt;We’ll omit the other handlers for brevity, but it’s important to mention that &lt;b&gt;onTimer&lt;/b&gt; makes sure that metric reports are coming in periodically; otherwise, it triggers an alert stating that we haven’t heard from that server for a long time.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id=&quot;rackfun&quot;&gt;RackFun&lt;/h3&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-08-18-statefun/5.png&quot; width=&quot;650px&quot; alt=&quot;&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;This function keeps a mapping between a &lt;code&gt;ServerId&lt;/code&gt; and a set of open incidents on that server.&lt;/li&gt;
&lt;li&gt;When new alerts are received, this function tries to correlate the alert with any other open alerts on that rack. If a correlated rack alert is present, this function notifies the &lt;strong&gt;DataCenterFun&lt;/strong&gt; about it.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&quot;datacenterfun&quot;&gt;DataCenterFun&lt;/h3&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-08-18-statefun/6.png&quot; width=&quot;650px&quot; alt=&quot;&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;ol&gt;
&lt;li&gt;A persisted mapping between a &lt;code&gt;RackId&lt;/code&gt; and the latest alert that rack reported.&lt;/li&gt;
&lt;li&gt;Through the use of ingress/egress pairs, this function can report back its current view of which racks are currently known to be unhealthy.&lt;/li&gt;
&lt;li&gt;An operator (via a front-end) can send a &lt;code&gt;GetUnhealthyRacks&lt;/code&gt; message addressed to that &lt;strong&gt;DataCenterFun&lt;/strong&gt;, and wait for the corresponding response message (&lt;code&gt;UnhealthyRacks&lt;/code&gt;). Whenever a rack reports &lt;em&gt;OK&lt;/em&gt;, it’ll be removed from the unhealthy racks map.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This pattern — where each layer of functions performs a stateful aggregation of events sent from the previous layer (or the input) — is useful for a whole class of problems. And, although we used connected devices to motivate this use case, it’s not limited to the IoT domain.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-08-18-statefun/7.png&quot; width=&quot;500px&quot; alt=&quot;&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;Stateful Functions provides the building blocks necessary for building complex distributed applications (here the digital twins that support analysis and interactions of the physical entities), while removing common complexities of distributed systems like service discovery, retries, circuit breakers, state management, scalability and similar challenges. If you’d like to learn more about Stateful Functions, head over to the official &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-master/&quot;&gt;documentation&lt;/a&gt;, where you can also find more hands-on tutorials to try out yourself!&lt;/p&gt;
</description>
<pubDate>Wed, 19 Aug 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/2020/08/19/statefun.html</link>
<guid isPermaLink="true">/2020/08/19/statefun.html</guid>
</item>
<item>
<title>Accelerating your workload with GPU and other external resources</title>
<description>&lt;p&gt;Apache Flink 1.11 introduces a new &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/ops/external_resources.html&quot;&gt;External Resource Framework&lt;/a&gt;,
which allows you to request external resources from the underlying resource management systems (e.g., Kubernetes) and accelerate your workload with
those resources. Since Flink currently provides a first-party GPU plugin, we will take GPU as an example and show how it benefits Flink applications
in the AI field. Other external resources (e.g. RDMA and SSD) can also be supported &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/ops/external_resources.html#implement-a-plugin-for-your-custom-resource-type&quot;&gt;in a pluggable manner&lt;/a&gt;.&lt;/p&gt;
&lt;h1 id=&quot;end-to-end-real-time-ai-with-gpu&quot;&gt;End-to-end real-time AI with GPU&lt;/h1&gt;
&lt;p&gt;Recently, AI and Machine Learning have gained additional popularity and have been widely used in various scenarios, such
as personalized recommendation and image recognition. &lt;a href=&quot;https://flink.apache.org/&quot;&gt;Flink&lt;/a&gt;, with the ability to support GPU
allocation, can be used to build an end-to-end real-time AI workflow.&lt;/p&gt;
&lt;h2 id=&quot;why-flink&quot;&gt;Why Flink&lt;/h2&gt;
&lt;p&gt;Typical AI workloads fall into two categories: training and inference.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-08-06-accelerate-with-external-resources/ai-workflow.png&quot; width=&quot;800px&quot; alt=&quot;Typical AI Workflow&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Typical AI Workflow&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;The training workload is usually a batch task, in which we train a model from a bounded dataset. On the other hand, the inference
workload tends to be a streaming job. It consumes an unbounded data stream, which contains image data, for example, and uses a model
to produce predictions as output. Both workloads need to do data preprocessing first. Flink, as a
&lt;a href=&quot;https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;unified batch and stream processing engine&lt;/a&gt;, can be used to build an end-to-end AI workflow naturally.&lt;/p&gt;
&lt;p&gt;In many cases, training and inference workloads can benefit greatly from leveraging GPUs. &lt;a href=&quot;https://azure.microsoft.com/en-us/blog/gpus-vs-cpus-for-deployment-of-deep-learning-models/&quot;&gt;Research&lt;/a&gt;
shows that a GPU cluster outperforms a CPU cluster of similar cost by about 400 percent. As training datasets
are getting bigger and models more complex, supporting GPUs has become mandatory for running AI workloads.&lt;/p&gt;
&lt;p&gt;With the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/ops/external_resources.html&quot;&gt;External Resource Framework&lt;/a&gt;
and its &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/ops/external_resources.html#plugin-for-gpu-resources&quot;&gt;GPU plugin&lt;/a&gt;, Flink
can now request GPU resources from the external resource management system and expose GPU information to operators. With this
feature, users can now easily build end-to-end training and real-time inference pipelines with GPU support on Flink.&lt;/p&gt;
&lt;h2 id=&quot;example-mnist-inference-with-flink&quot;&gt;Example: MNIST Inference with Flink&lt;/h2&gt;
&lt;p&gt;We take the MNIST inference task as an example to show how to use the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/ops/external_resources.html&quot;&gt;External Resource Framework&lt;/a&gt;
and how to leverage GPUs in Flink. MNIST is a database of handwritten digits, which is usually viewed as the HelloWorld of AI.
The goal is to recognize a 28px*28px picture of a number from 0 to 9.&lt;/p&gt;
&lt;p&gt;First, you need to set configurations for the external resource framework to enable GPU support:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;external-resources: gpu
&lt;span class=&quot;c&quot;&gt;# Define the driver factory class of gpu resource.&lt;/span&gt;
external-resource.gpu.driver-factory.class: org.apache.flink.externalresource.gpu.GPUDriverFactory
&lt;span class=&quot;c&quot;&gt;# Define the amount of gpu resource per TaskManager.&lt;/span&gt;
external-resource.gpu.amount: 1
&lt;span class=&quot;c&quot;&gt;# Enable the coordination mode if you run it in standalone mode&lt;/span&gt;
external-resource.gpu.param.discovery-script.args: --enable-coordination
&lt;span class=&quot;c&quot;&gt;# If you run it on Yarn&lt;/span&gt;
external-resource.gpu.yarn.config-key: yarn.io/gpu
&lt;span class=&quot;c&quot;&gt;# If you run it on Kubernetes&lt;/span&gt;
external-resource.gpu.kubernetes.config-key: nvidia.com/gpu&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For more details of the configuration, please refer to the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/external_resources.html#configurations-1&quot;&gt;official documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the MNIST inference task, we first need to read the images and do data preprocessing. You can download &lt;a href=&quot;http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz&quot;&gt;training&lt;/a&gt;
or &lt;a href=&quot;http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz&quot;&gt;testing&lt;/a&gt; data from &lt;a href=&quot;http://yann.lecun.com/exdb/mnist/&quot;&gt;this site&lt;/a&gt;.
We provide a simple &lt;a href=&quot;https://github.com/KarmaGYZ/flink-mnist/blob/master/src/main/java/org/apache/flink/MNISTReader.java&quot;&gt;MNISTReader&lt;/a&gt;.
It will read the image data located in the provided file path and transform each image into a list of floating point numbers.&lt;/p&gt;
&lt;p&gt;Then, we need a classifier to recognize those images. A one-layer pre-trained neural network, whose prediction accuracy is 92.14%,
is used in our classify operator. To leverage GPUs in order to accelerate the matrix-matrix multiplication, we use &lt;a href=&quot;https://github.com/jcuda/jcuda&quot;&gt;JCuda&lt;/a&gt;
to call the native Cuda API. The prediction logic of the &lt;a href=&quot;https://github.com/KarmaGYZ/flink-mnist/blob/master/src/main/java/org/apache/flink/MNISTClassifier.java&quot;&gt;MNISTClassifier&lt;/a&gt; is shown below.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MNISTClassifier&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RichMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Configuration&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Get the GPU information and select the first GPU.&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ExternalResourceInfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;externalResourceInfos&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getRuntimeContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExternalResourceInfos&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;resourceName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Optional&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;firstIndexOptional&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;externalResourceInfos&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;iterator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;index&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Initialize JCublas with the selected GPU&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;JCuda&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;cudaSetDevice&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;parseInt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;firstIndexOptional&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()));&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;JCublas&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;cublasInit&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Float&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Performs multiplication using JCublas. The matrixPointer points to our pre-trained model.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;JCublas&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;cublasSgemv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;sc&quot;&gt;&amp;#39;n&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DIMENSIONS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;f1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DIMENSIONS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;f0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.0f&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;matrixPointer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DIMENSIONS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;f1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inputPointer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.0f&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outputPointer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Read the result back from GPU.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;JCublas&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;cublasGetVector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DIMENSIONS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;f1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Sizeof&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;FLOAT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;outputPointer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pointer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;to&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DIMENSIONS&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;f1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The complete MNIST inference project can be found &lt;a href=&quot;https://github.com/KarmaGYZ/flink-mnist&quot;&gt;here&lt;/a&gt;. In this project, we simply
print the inference result to &lt;strong&gt;STDOUT&lt;/strong&gt;. In the actual production environment, you could also write the result to Elasticsearch or Kafka, for example.&lt;/p&gt;
&lt;p&gt;The MNIST inference task is just a simple case that shows you how the external resource framework works and what Flink can
do with GPU support. With Flink’s open source extension &lt;a href=&quot;https://github.com/alibaba/Alink&quot;&gt;Alink&lt;/a&gt;, which contains a lot of
pre-built algorithms based on Flink, and &lt;a href=&quot;https://github.com/alibaba/flink-ai-extended&quot;&gt;Tensorflow on Flink&lt;/a&gt;, some complex
AI workloads, such as online learning or real-time inference services, can easily be implemented as well.&lt;/p&gt;
&lt;h1 id=&quot;other-external-resources&quot;&gt;Other external resources&lt;/h1&gt;
&lt;p&gt;In addition to GPU support, there are many other external resources that can be used to accelerate jobs in some specific scenarios.
For example, FPGAs, used in some AI workloads, are supported by both Yarn and Kubernetes. Some low-latency network devices, like RDMA and Solarflare, also
provide their device plugin for Kubernetes. Currently, Yarn supports GPUs and FPGAs, while the list of Kubernetes’ device plugins can be found &lt;a href=&quot;https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#examples&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;With the external resource framework, you only need to implement a plugin that enables the operator to get the information
for these external resources; see &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/external_resources.html#implement-a-plugin-for-your-custom-resource-type&quot;&gt;Custom Plugin&lt;/a&gt;
for more details. If you just want to ensure that an external resource exists in the TaskManager, then you only need to find the
configuration key of that resource in the underlying resource management system and configure the external resource framework accordingly.&lt;/p&gt;
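&lt;p&gt;As a sketch, requesting one FPGA per TaskManager without a custom plugin could look like the configuration below. The resource name is arbitrary, and the Kubernetes config key is only an illustration, as it depends on the device plugin installed in your cluster:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;external-resources: fpga
# Amount of the resource to request for each TaskManager.
external-resource.fpga.amount: 1
# The config key understood by the underlying resource management system.
# On Yarn:
external-resource.fpga.yarn.config-key: yarn.io/fpga
# On Kubernetes (illustrative vendor-specific resource name):
external-resource.fpga.kubernetes.config-key: xilinx.com/fpga
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;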
&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;In the latest Flink release (Flink 1.11), an external resource framework has been introduced to support requesting various types of
resources from the underlying resource management systems, and supply all the necessary information for using these resources to the
operators. The first-party GPU plugin expands the application prospects of Flink in the AI domain. Different resource types can be supported
in a pluggable way. You can also implement your own plugins for custom resource types.&lt;/p&gt;
&lt;p&gt;Future developments in this area include implementing operator level resource isolation and fine-grained external resource scheduling.
The community may kick this work off once &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation&quot;&gt;FLIP-56&lt;/a&gt;
is finished. If you have any suggestions or questions for the community, we encourage you to sign up to the Apache Flink
&lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; and join the discussion there.&lt;/p&gt;
</description>
<pubDate>Thu, 06 Aug 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/08/06/external-resource.html</link>
<guid isPermaLink="true">/news/2020/08/06/external-resource.html</guid>
</item>
<item>
<title>PyFlink: The integration of Pandas into PyFlink</title>
<description>&lt;p&gt;Python has evolved into one of the most important programming languages for many fields of data processing. Python’s popularity has grown so much that it has pretty much become the default data processing language for data scientists. On top of that, there is a plethora of Python-based data processing tools, such as NumPy, Pandas, and Scikit-learn, that have gained additional popularity due to their flexibility and powerful functionalities.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-08-04-pyflink-pandas/python-scientific-stack.png&quot; width=&quot;450px&quot; alt=&quot;Python Scientific Stack&quot; /&gt;
&lt;/center&gt;
&lt;center&gt;
&lt;a href=&quot;https://speakerdeck.com/jakevdp/the-unexpected-effectiveness-of-python-in-science?slide=52&quot;&gt;Pic source: VanderPlas 2017, slide 52.&lt;/a&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;In an effort to meet user needs and demands, the Flink community hopes to leverage and make better use of these tools. To that end, the Flink community put great effort into integrating Pandas into PyFlink with the latest Flink version, 1.11. Some of the added features include &lt;strong&gt;support for Pandas UDFs&lt;/strong&gt; and the &lt;strong&gt;conversion between Pandas DataFrame and Table&lt;/strong&gt;. Pandas UDFs not only greatly improve the execution performance of Python UDFs, but also make it more convenient for users to leverage libraries such as Pandas and NumPy in Python UDFs. Additionally, support for the conversion between Pandas DataFrame and Table enables users to switch processing engines seamlessly without the need for an intermediate connector. In the remainder of this article, we will introduce how these functionalities work and how to use them with a step-by-step example.&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
Currently, only Scalar Pandas UDFs are supported in PyFlink.&lt;/p&gt;
&lt;/div&gt;
&lt;h1 id=&quot;pandas-udf-in-flink-111&quot;&gt;Pandas UDF in Flink 1.11&lt;/h1&gt;
&lt;p&gt;Using scalar Python UDF was already possible in Flink 1.10 as described in a &lt;a href=&quot;https://flink.apache.org/2020/04/09/pyflink-udf-support-flink.html&quot;&gt;previous article on the Flink blog&lt;/a&gt;. Scalar Python UDFs work based on three primary steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;the Java operator serializes one input row to bytes and sends them to the Python worker;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;the Python worker deserializes the input row and evaluates the Python UDF with it;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;the resulting row is serialized and sent back to the Java operator&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
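&lt;p&gt;For comparison, a row-at-a-time scalar Python UDF written against this API looks roughly as follows (the function itself is just an illustrative example):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;from pyflink.table import DataTypes
from pyflink.table.udf import udf

# A general (non-Pandas) scalar UDF: it is called once per row with plain Python
# values, so every row goes through the serialization steps described above.
@udf(input_types=[DataTypes.FLOAT(), DataTypes.FLOAT()], result_type=DataTypes.FLOAT())
def add(a, b):
    return a + b
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;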
&lt;p&gt;While support for Python UDFs in PyFlink greatly improved the user experience, it had some drawbacks, namely:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;High serialization/deserialization overhead&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Difficulty when leveraging popular Python libraries used by data scientists — such as Pandas or NumPy — that provide high-performance data structures and functions.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Pandas UDFs were introduced to address these drawbacks. For a Pandas UDF, a batch of rows is transferred between the JVM and the PVM in a columnar format (the &lt;a href=&quot;https://arrow.apache.org/docs/format/Columnar.html&quot;&gt;Arrow memory format&lt;/a&gt;). The batch of rows is converted into a collection of Pandas Series and transferred to the Pandas UDF, which can then leverage popular Python libraries (such as Pandas or NumPy) for the Python UDF implementation.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-08-04-pyflink-pandas/vm-communication.png&quot; width=&quot;550px&quot; alt=&quot;VM Communication&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;The performance of vectorized UDFs is usually much higher when compared to the normal Python UDF, as the serialization/deserialization overhead is minimized by falling back to &lt;a href=&quot;https://arrow.apache.org/&quot;&gt;Apache Arrow&lt;/a&gt;, while handling &lt;code&gt;pandas.Series&lt;/code&gt; as input/output allows us to take full advantage of the Pandas and NumPy libraries, making it a popular solution to parallelize Machine Learning and other large-scale, distributed data science workloads (e.g. feature engineering, distributed model application).&lt;/p&gt;
&lt;h1 id=&quot;conversion-between-pyflink-table-and-pandas-dataframe&quot;&gt;Conversion between PyFlink Table and Pandas DataFrame&lt;/h1&gt;
&lt;p&gt;Pandas DataFrame is the de-facto standard for working with tabular data in the Python community while PyFlink Table is Flink’s representation of the tabular data in Python. Enabling the conversion between PyFlink Table and Pandas DataFrame allows switching between PyFlink and Pandas seamlessly when processing data in Python. Users can process data by utilizing one execution engine and switch to a different one effortlessly. For example, in case users already have a Pandas DataFrame at hand and want to perform some expensive transformation, they can easily convert it to a PyFlink Table and leverage the power of the Flink engine. On the other hand, users can also convert a PyFlink Table to a Pandas DataFrame and perform the same transformation with the rich functionalities provided by the Pandas ecosystem.&lt;/p&gt;
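&lt;p&gt;As a small sketch of what this looks like (the environment setup and data below are only illustrative), a Pandas DataFrame can be turned into a PyFlink Table with &lt;code&gt;from_pandas&lt;/code&gt;, and a bounded Table can be collected back into a Pandas DataFrame with &lt;code&gt;to_pandas&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;from pyflink.table import BatchTableEnvironment, EnvironmentSettings
import pandas as pd

env_settings = EnvironmentSettings.new_instance().in_batch_mode().use_blink_planner().build()
t_env = BatchTableEnvironment.create(environment_settings=env_settings)

# Pandas DataFrame to PyFlink Table ...
pdf = pd.DataFrame({'id': [1, 2], 'temperature': [35.1, 36.7]})
table = t_env.from_pandas(pdf)

# ... and the (bounded) PyFlink Table back to a Pandas DataFrame.
result_pdf = table.to_pandas()
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;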
&lt;h1 id=&quot;examples&quot;&gt;Examples&lt;/h1&gt;
&lt;p&gt;Using Python in Apache Flink requires installing PyFlink, which is available on &lt;a href=&quot;https://pypi.org/project/apache-flink/&quot;&gt;PyPI&lt;/a&gt; and can be easily installed using &lt;code&gt;pip&lt;/code&gt;. Before installing PyFlink, check the working version of Python running in your system using:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;python --version
Python 3.7.6&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
Please note that Python 3.5 or higher is required to install and run PyFlink&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;python -m pip install apache-flink&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;using-pandas-udf&quot;&gt;Using Pandas UDF&lt;/h2&gt;
&lt;p&gt;Pandas UDFs take &lt;code&gt;pandas.Series&lt;/code&gt; as the input and return a &lt;code&gt;pandas.Series&lt;/code&gt; of the same length as the output. Pandas UDFs can be used in exactly the same places where non-Pandas functions are currently being utilized. To mark a UDF as a Pandas UDF, you only need to add an extra parameter &lt;code&gt;udf_type='pandas'&lt;/code&gt; in the udf decorator:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;nd&quot;&gt;@udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FLOAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;result_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FLOAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;pandas&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;interpolate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;temperature&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# takes id: pandas.Series and temperature: pandas.Series as input&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;id&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;temperature&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;temperature&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# use interpolate() to interpolate the missing temperature&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interpolated_df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupby&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;id&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;apply&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interpolate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;limit_direction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;both&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# output temperature: pandas.Series&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interpolated_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;temperature&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The Pandas UDF above uses the Pandas &lt;code&gt;dataframe.interpolate()&lt;/code&gt; function to interpolate the missing temperature data for each equipment id. This is a common IoT scenario whereby each equipment/device reports its id and temperature to be analyzed, but the temperature field may be null for various reasons.
You can register and use the function in the same way as a &lt;a href=&quot;https://flink.apache.org/2020/04/09/pyflink-udf-support-flink.html&quot;&gt;normal Python UDF&lt;/a&gt;. Below is a complete example of how to use the Pandas UDF in PyFlink.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.datastream&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.table&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.table.udf&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pandas&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pd&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_execution_environment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_parallelism&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_config&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_configuration&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_boolean&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;python.fn-execution.memory.managed&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;True&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FLOAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;result_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FLOAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;pandas&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;interpolate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;temperature&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# takes id: pandas.Series and temperature: pandas.Series as input&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;({&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;id&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;#39;temperature&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;temperature&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;})&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# use interpolate() to interpolate the missing temperature&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;interpolated_df&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;df&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupby&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;id&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;apply&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;group&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;interpolate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;limit_direction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;both&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# output temperature: pandas.Series&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interpolated_df&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;temperature&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;register_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;interpolate&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;interpolate&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;my_source_ddl&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; create table mySource (&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; id INT,&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; temperature FLOAT &lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; ) with (&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; &amp;#39;connector.type&amp;#39; = &amp;#39;filesystem&amp;#39;,&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; &amp;#39;format.type&amp;#39; = &amp;#39;csv&amp;#39;,&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; &amp;#39;connector.path&amp;#39; = &amp;#39;/tmp/input&amp;#39;&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; )&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;my_sink_ddl&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; create table mySink (&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; id INT,&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; temperature FLOAT &lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; ) with (&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; &amp;#39;connector.type&amp;#39; = &amp;#39;filesystem&amp;#39;,&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; &amp;#39;format.type&amp;#39; = &amp;#39;csv&amp;#39;,&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; &amp;#39;connector.path&amp;#39; = &amp;#39;/tmp/output&amp;#39;&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; )&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execute_sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_source_ddl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execute_sql&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;my_sink_ddl&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;from_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;mySource&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;\
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;id, interpolate(id, temperature) as temperature&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; \
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;insert_into&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;mySink&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;pandas_udf_demo&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To submit the job:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Firstly, you need to prepare the input data in the “/tmp/input” file. For example,&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo&lt;/span&gt; -e &lt;span class=&quot;s2&quot;&gt;&amp;quot;1,98.0\n1,\n1,100.0\n2,99.0&amp;quot;&lt;/span&gt; &amp;gt; /tmp/input&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Next, you can run this example on the command line,&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;python pandas_udf_demo.py&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The command builds and runs the Python Table API program in a local mini-cluster. You can also submit the Python Table API program to a remote cluster; see the job submission examples &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/cli.html#job-submission-examples&quot;&gt;here&lt;/a&gt; for more details.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Finally, you can check the execution result on the command line. As shown below, all the missing temperature values have been interpolated:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;cat /tmp/output
1,98.0
1,99.0
1,100.0
2,99.0&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;conversion-between-pyflink-table-and-pandas-dataframe-1&quot;&gt;Conversion between PyFlink Table and Pandas DataFrame&lt;/h2&gt;
&lt;p&gt;You can use the &lt;code&gt;from_pandas()&lt;/code&gt; method to create a PyFlink Table from a Pandas DataFrame or use the &lt;code&gt;to_pandas()&lt;/code&gt; method to convert a PyFlink Table to a Pandas DataFrame.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.datastream&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.table&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pandas&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pd&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;numpy&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;np&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_execution_environment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Create a PyFlink Table&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;pdf&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pd&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataFrame&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;np&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;random&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rand&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;from_pandas&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pdf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;a&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;b&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;a &amp;gt; 0.5&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Convert the PyFlink Table to a Pandas DataFrame&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;pdf&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;to_pandas&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pdf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h1 id=&quot;conclusion--upcoming-work&quot;&gt;Conclusion &amp;amp; Upcoming work&lt;/h1&gt;
&lt;p&gt;In this article, we introduced the integration of Pandas in Flink 1.11, including Pandas UDFs and the conversion between PyFlink Tables and Pandas DataFrames. In fact, the latest Apache Flink release adds many other excellent features to PyFlink, such as support for user-defined table functions and user-defined metrics for Python UDFs. What’s more, from Flink 1.11 you can build PyFlink with Cython support and “Cythonize” your Python UDFs to substantially improve code execution speed (up to 30x faster compared to Python UDFs in Flink 1.10).&lt;/p&gt;
&lt;p&gt;Future work by the community will focus on adding more features and bringing additional optimizations with follow-up releases. Such optimizations and additions include a Python DataStream API and more integration with the Python ecosystem, such as support for distributed Pandas in Flink. Stay tuned for more information and updates with the upcoming releases!&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-08-04-pyflink-pandas/mission-of-pyFlink.gif&quot; width=&quot;600px&quot; alt=&quot;Mission of PyFlink&quot; /&gt;
&lt;/center&gt;
</description>
<pubDate>Tue, 04 Aug 2020 02:00:00 +0200</pubDate>
<link>https://flink.apache.org/2020/08/04/pyflink-pandas-udf-support-flink.html</link>
<guid isPermaLink="true">/2020/08/04/pyflink-pandas-udf-support-flink.html</guid>
</item>
<item>
<title>Advanced Flink Application Patterns Vol.3: Custom Window Processing</title>
<description>&lt;style type=&quot;text/css&quot;&gt;
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{padding:10px 10px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{padding:10px 10px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;background-color:#eff0f1;}
.tg .tg-wide{padding:10px 30px;}
.tg .tg-top{vertical-align:top}
.tg .tg-topcenter{text-align:center;vertical-align:top}
.tg .tg-center{text-align:center;vertical-align:center}
&lt;/style&gt;
&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;In the previous articles of the series, we described how you can achieve
flexible stream partitioning based on dynamically-updated configurations
(a set of fraud-detection rules) and how you can utilize Flink&#39;s
Broadcast mechanism to distribute processing configuration at runtime
among the relevant operators. &lt;/p&gt;
&lt;p&gt;Following up directly where we left the discussion of the end-to-end
solution last time, in this article we will describe how you can use the
&quot;Swiss Army knife&quot; of Flink - the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/process_function.html&quot;&gt;&lt;em&gt;Process Function&lt;/em&gt;&lt;/a&gt; - to create an
implementation tailor-made to your streaming business
logic requirements. Our discussion will continue in the context of the
&lt;a href=&quot;/news/2020/01/15/demo-fraud-detection.html#fraud-detection-demo&quot;&gt;Fraud Detection engine&lt;/a&gt;. We will also demonstrate how you can
implement your own &lt;strong&gt;custom replacement for time windows&lt;/strong&gt; for cases
where the out-of-the-box windowing available from the DataStream API
does not satisfy your requirements. In particular, we will look at the
trade-offs that you can make when designing a solution which requires
low-latency reactions to individual events.&lt;/p&gt;
&lt;p&gt;This article will describe some high-level concepts that can be applied
independently, but it is recommended that you review the material in
&lt;a href=&quot;/news/2020/01/15/demo-fraud-detection.html&quot;&gt;part one&lt;/a&gt; and
&lt;a href=&quot;/news/2020/03/24/demo-fraud-detection-2.html&quot;&gt;part two&lt;/a&gt; of the series as well as check out the &lt;a href=&quot;https://github.com/afedulov/fraud-detection-demo&quot;&gt;code
base&lt;/a&gt; in order to make
it easier to follow along.&lt;/p&gt;
&lt;h2 id=&quot;processfunction-as-a-window&quot;&gt;ProcessFunction as a “Window”&lt;/h2&gt;
&lt;h3 id=&quot;low-latency&quot;&gt;Low Latency&lt;/h3&gt;
&lt;p&gt;Let’s start with a reminder of the type of fraud detection rule that we
would like to support:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;“Whenever the &lt;strong&gt;sum&lt;/strong&gt; of &lt;strong&gt;payments&lt;/strong&gt; from the same &lt;strong&gt;payer&lt;/strong&gt; to the
same &lt;strong&gt;beneficiary&lt;/strong&gt; within &lt;strong&gt;a 24-hour
period&lt;/strong&gt; is &lt;strong&gt;greater&lt;/strong&gt; than &lt;strong&gt;200 000 $&lt;/strong&gt; - trigger an alert.”&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In other words, given a stream of transactions partitioned by a key that
combines the payer and the beneficiary fields, we would like to look
back in time and determine, for each incoming transaction, if the sum of
all previous payments between the two specific participants exceeds the
defined threshold. In effect, the computation window is always moved
along to the position of the last observed event for a particular data
partitioning key.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-3/time-windows.png&quot; width=&quot;600px&quot; alt=&quot;Figure 1: Time Windows&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 1: Time Windows&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;One of the common key requirements for a fraud detection system is &lt;em&gt;low
response time&lt;/em&gt;. The sooner the fraudulent action gets detected, the
higher the chances that it can be blocked and its negative consequences
mitigated. This requirement is especially prominent in the financial
domain, where you have one important constraint - any time spent
evaluating a fraud detection model is time that a law-abiding user of
your system will spend waiting for a response. Swiftness of processing
often becomes a competitive advantage between payment systems,
and the time limit for producing an alert can be as low as &lt;em&gt;300-500
ms&lt;/em&gt;. This is all the time you get from the moment of ingestion of a
transaction event into a fraud detection system until an alert has to
become available to downstream systems. &lt;/p&gt;
&lt;p&gt;As you might know, Flink provides a powerful &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/windows.html&quot;&gt;Window
API&lt;/a&gt;
that is applicable for a wide range of use cases. However, if you go
over all of the available types of supported windows, you will realize
that none of them exactly match our main requirement for this use case -
the low-latency evaluation of &lt;em&gt;each&lt;/em&gt; incoming transaction. There is
no type of window in Flink that can express the &lt;em&gt;“x minutes/hours/days
back from the &lt;u&gt;current event&lt;/u&gt;”&lt;/em&gt; semantic. In the Window API, events
fall into windows (as defined by the window
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/windows.html#window-assigners&quot;&gt;assigners&lt;/a&gt;),
but they cannot themselves individually control the creation and
evaluation of windows*. As described above, our goal for the fraud
detection engine is to achieve immediate evaluation of the previous
relevant data points as soon as the new event is received. This raises
the question of whether the Window API is feasible in this case. The Window API offers some options for defining custom triggers, evictors, and window assigners, which might eventually produce the required result. However, it is usually difficult to get this right (and easy to break). Moreover, this approach does not provide access to broadcast state, which is required for implementing dynamic reconfiguration of business rules.&lt;/p&gt;
&lt;p&gt;*) apart from the session windows, but they are limited to assignments
based on the session &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/windows.html#session-windows&quot;&gt;gaps&lt;/a&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-3/evaluation-delays.png&quot; width=&quot;600px&quot; alt=&quot;Figure 2: Evaluation Delays&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 2: Evaluation Delays&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Let’s take an example of using a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/windows.html#sliding-windows&quot;&gt;sliding
window&lt;/a&gt;
from Flink’s Window API. Using sliding windows with a slide of &lt;em&gt;S&lt;/em&gt;
translates into an expected evaluation delay of &lt;em&gt;S/2&lt;/em&gt;.
This means that you would need to define a window slide of 600-1000 ms
to fulfill the low-latency requirement of 300-500 ms delay, even before
taking any actual computation time into account. The fact that Flink
stores separate window state for each sliding window pane renders this
approach infeasible under any moderately high load.&lt;/p&gt;
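&lt;p&gt;To make this concrete, here is a rough, purely illustrative sketch of what such a job could look like with the standard sliding-window API; the key selector, aggregate function and sink are hypothetical placeholders, and this is explicitly &lt;em&gt;not&lt;/em&gt; the approach taken by the fraud-detection engine. With a 24-hour window sliding every 800 ms, the expected evaluation delay alone is already around 400 ms, and Flink would have to keep state for a huge number of overlapping panes:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// Illustrative sketch only - not the approach used in this series.
DataStream&amp;lt;Transaction&amp;gt; transactions = sourceStream();   // hypothetical source

transactions
    .keyBy(transactionKeySelector)           // hypothetical key selector (payer + beneficiary)
    // 24 h window span with an 800 ms slide: ~400 ms expected evaluation delay
    .window(SlidingEventTimeWindows.of(Time.hours(24), Time.milliseconds(800)))
    .aggregate(new SumOfPaymentsAggregate()) // hypothetical AggregateFunction summing payments
    .addSink(alertSink());                   // hypothetical sink&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;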
&lt;p&gt;In order to satisfy the requirements, we need to create our own
low-latency window implementation. Luckily, Flink gives us all the tools
required to do so. &lt;code&gt;ProcessFunction&lt;/code&gt; is a low-level but powerful
building block in Flink&#39;s API. It has a simple contract:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;SomeProcessFunction&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;KeyType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;InputType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OutputType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;InputType&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OutputType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;){}&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;onTimer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OnTimerContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OutputType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{}&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Configuration&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;){}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;processElement()&lt;/code&gt; receives input events one by one. You can react to
each input by producing one or more output events to the next
operator by calling &lt;code&gt;out.collect(someOutput)&lt;/code&gt;. You can also pass data
to a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/side_output.html&quot;&gt;side
output&lt;/a&gt;
or ignore a particular input altogether (a minimal sketch of these calls follows this list).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;onTimer()&lt;/code&gt; is called by Flink when a previously-registered timer
fires. Both event time and processing time timers are supported.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;open()&lt;/code&gt; is equivalent to a constructor. It is called inside of the
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/glossary.html#flink-taskmanager&quot;&gt;TaskManager’s&lt;/a&gt;
JVM, and is used for initialization, such as registering
Flink-managed state. It is also the right place to initialize fields
that are not serializable and cannot be transferred from the
JobManager’s JVM.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
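&lt;p&gt;To illustrate these interaction points, here is a minimal, self-contained sketch of a hypothetical &lt;code&gt;KeyedProcessFunction&lt;/code&gt; (the class name, output tag and messages are purely illustrative and are not part of the demo code base) that emits to the main output, routes some events to a side output, and registers a processing-time timer:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// Hypothetical sketch - not the DynamicAlertFunction of the demo.
public class SketchProcessFunction extends KeyedProcessFunction&amp;lt;String, String, String&amp;gt; {

  // Side output for events that should not be processed in the main flow.
  private static final OutputTag&amp;lt;String&amp;gt; IGNORED =
      new OutputTag&amp;lt;String&amp;gt;(&amp;quot;ignored-events&amp;quot;) {};

  @Override
  public void processElement(String event, Context ctx, Collector&amp;lt;String&amp;gt; out) {
    if (event.isEmpty()) {
      ctx.output(IGNORED, event);            // pass the event to a side output
      return;                                // and ignore it in the main logic
    }
    out.collect(&amp;quot;processed: &amp;quot; + event);      // emit a result to the next operator
    // Schedule a callback 10 seconds from now (processing time).
    ctx.timerService().registerProcessingTimeTimer(
        ctx.timerService().currentProcessingTime() + 10_000);
  }

  @Override
  public void onTimer(long timestamp, OnTimerContext ctx, Collector&amp;lt;String&amp;gt; out) {
    out.collect(&amp;quot;timer fired at &amp;quot; + timestamp);  // react to the previously-registered timer
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;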
&lt;p&gt;Most importantly, &lt;code&gt;ProcessFunction&lt;/code&gt; also has access to the fault-tolerant
state, handled by Flink. This combination, together with Flink&#39;s
message processing and delivery guarantees, makes it possible to build
resilient event-driven applications with almost arbitrarily
sophisticated business logic. This includes creation and processing of
custom windows with state.&lt;/p&gt;
&lt;h3 id=&quot;implementation&quot;&gt;Implementation&lt;/h3&gt;
&lt;h4 id=&quot;state-and-clean-up&quot;&gt;State and Clean-up&lt;/h4&gt;
&lt;p&gt;In order to be able to process time windows, we need to keep track of
data belonging to the window inside our program. To ensure that this
data is fault-tolerant and can survive failures in a distributed system,
we should store it in Flink-managed state. As time
progresses, we do not need to keep all previous transactions. According
to the sample rule, all events that are older than 24 hours become
irrelevant. We are looking at a window of data that constantly moves and
where stale transactions need to be constantly moved out of scope (in
other words, cleaned up from state).&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-3/window-clean-up.png&quot; width=&quot;400px&quot; alt=&quot;Figure 3: Window Clean-up&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 3: Window Clean-up&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;We will
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/state/state.html#using-keyed-state&quot;&gt;use&lt;/a&gt;
&lt;code&gt;MapState&lt;/code&gt; to store the individual events of the window. In order to allow
efficient clean-up of the out-of-scope events, we will utilize event
timestamps as the &lt;code&gt;MapState&lt;/code&gt; keys.&lt;/p&gt;
&lt;p&gt;In the general case, we have to take into account the fact that there
might be different events with exactly the same timestamp; therefore,
instead of an individual &lt;code&gt;Transaction&lt;/code&gt; per key (timestamp), we will store sets of them.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;MapState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
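&lt;p&gt;For completeness, here is a rough sketch of how this state could be initialized inside &lt;code&gt;open()&lt;/code&gt;; the descriptor name and the type information hints are illustrative assumptions rather than a verbatim excerpt from the demo code base:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.Set;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.configuration.Configuration;

@Override
public void open(Configuration parameters) {
  // Register the MapState declared above: event timestamps are the map keys,
  // sets of transactions sharing that timestamp are the values.
  MapStateDescriptor&amp;lt;Long, Set&amp;lt;Transaction&amp;gt;&amp;gt; descriptor =
      new MapStateDescriptor&amp;lt;&amp;gt;(
          &amp;quot;windowState&amp;quot;,                      // state name (illustrative)
          BasicTypeInfo.LONG_TYPE_INFO,
          TypeInformation.of(new TypeHint&amp;lt;Set&amp;lt;Transaction&amp;gt;&amp;gt;() {}));
  windowState = getRuntimeContext().getMapState(descriptor);
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;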
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Side Note &lt;/span&gt;
When any Flink-managed state is used inside a
&lt;code&gt;KeyedProcessFunction&lt;/code&gt;, the data returned by the &lt;code&gt;state.value()&lt;/code&gt; call is
automatically scoped by the key of the &lt;em&gt;currently-processed event&lt;/em&gt;
- see Figure 4. If &lt;code&gt;MapState&lt;/code&gt; is used, the same principle applies, with
the difference that a &lt;code&gt;Map&lt;/code&gt; is returned instead of &lt;code&gt;MyObject&lt;/code&gt;. If you find yourself
compelled to do something like
&lt;code&gt;mapState.get(inputEvent.getKey())&lt;/code&gt;, you should probably be using
&lt;code&gt;ValueState&lt;/code&gt; instead of &lt;code&gt;MapState&lt;/code&gt;. As we want to store &lt;em&gt;multiple values
per event key&lt;/em&gt;, &lt;code&gt;MapState&lt;/code&gt; is the right choice in our case.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-3/keyed-state-scoping.png&quot; width=&quot;800px&quot; alt=&quot;Figure 4: Keyed State Scoping&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 4: Keyed State Scoping&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;/div&gt;
&lt;p&gt;As described in the &lt;a href=&quot;/news/2020/01/15/demo-fraud-detection.html&quot;&gt;first blog of the series&lt;/a&gt;, we are dispatching events based on the keys
specified in the active fraud detection rules. Multiple distinct rules
can be based on the same grouping key. This means that our alerting
function can potentially receive transactions scoped by the same key
(e.g. &lt;code&gt;{payerId=25;beneficiaryId=12}&lt;/code&gt;), but destined to be evaluated
according to different rules, which implies potentially different
lengths of the time windows. This raises the question of how we can best
store fault-tolerant window state within the &lt;code&gt;KeyedProcessFunction&lt;/code&gt;. One
approach would be to create and manage separate &lt;code&gt;MapStates&lt;/code&gt; per rule. Such
an approach, however, would be wasteful - we would separately hold state
for overlapping time windows, and therefore unnecessarily store
duplicate events. A better approach is to always store just enough data
to be able to evaluate all currently active rules which are scoped by
the same key. In order to achieve that, whenever a new rule is added, we
will determine whether its time window has the largest span and, if so, store it in
the broadcast state under the special reserved &lt;code&gt;WIDEST_RULE_KEY&lt;/code&gt;. This
information will be used during the state clean-up procedure, as
described later in this section.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processBroadcastElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Alert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;){&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;updateWidestWindowRule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;broadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;updateWidestWindowRule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;broadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;){&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;widestWindowRule&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;broadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WIDEST_RULE_KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;widestWindowRule&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;broadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;put&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WIDEST_RULE_KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;widestWindowRule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getWindowMillis&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getWindowMillis&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;broadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;put&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WIDEST_RULE_KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Let’s now look at the implementation of the main method,
&lt;code&gt;processElement()&lt;/code&gt;, in some detail.&lt;/p&gt;
&lt;p&gt;In the &lt;a href=&quot;/news/2020/01/15/demo-fraud-detection.html#dynamic-data-partitioning&quot;&gt;previous blog post&lt;/a&gt;, we described how &lt;code&gt;DynamicKeyFunction&lt;/code&gt; allowed
us to perform dynamic data partitioning based on the &lt;code&gt;groupingKeyNames&lt;/code&gt;
parameter in the rule definition. The subsequent description focuses
on the &lt;code&gt;DynamicAlertFunction&lt;/code&gt;, which makes use of the remaining rule
settings.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-3/sample-rule-definition.png&quot; width=&quot;700px&quot; alt=&quot;Figure 5: Sample Rule Definition&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 5: Sample Rule Definition&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;As described in the previous parts of the blog post
series, our alerting process function receives events of type
&lt;code&gt;Keyed&amp;lt;Transaction, String, Integer&amp;gt;&lt;/code&gt;, where &lt;code&gt;Transaction&lt;/code&gt; is the main
“wrapped” event, &lt;code&gt;String&lt;/code&gt; is the key (&lt;em&gt;payer #x - beneficiary #y&lt;/em&gt; in
Figure 1), and &lt;code&gt;Integer&lt;/code&gt; is the ID of the rule that caused the dispatch of
this event. This rule was previously &lt;a href=&quot;/news/2020/03/24/demo-fraud-detection-2.html#broadcast-state-pattern&quot;&gt;stored in the broadcast state&lt;/a&gt; and has to be retrieved from that state by its ID. Here is the
outline of the implementation:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DynamicAlertFunction&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedBroadcastProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Alert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;transient&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ReadOnlyContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Alert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;){&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Add Transaction to state&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;currentEventTime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getWrapped&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getEventTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// &amp;lt;--- (1)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;addToStateValuesSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;windowState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;currentEventTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getWrapped&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Calculate the aggregate value&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Descriptors&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;rulesDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// &amp;lt;--- (2)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowStartForEvent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getWindowStartTimestampFor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;currentEventTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;// &amp;lt;--- (3)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SimpleAccumulator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BigDecimal&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;aggregator&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RuleHelper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getAggregator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// &amp;lt;--- (4)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stateEventTime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keys&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isStateValueInWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stateEventTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowStartForEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;currentEventTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;aggregateValuesInState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stateEventTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;aggregator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Evaluate the rule and trigger an alert if violated&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BigDecimal&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;aggregateResult&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;aggregator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getLocalValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// &amp;lt;--- (5)&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;isRuleViolated&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;apply&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aggregateResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isRuleViolated&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;decisionTime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;System&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;currentTimeMillis&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Alert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRuleId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;decisionTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getWrapped&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;aggregateResult&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Register timers to ensure state cleanup&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cleanupTime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;currentEventTime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// &amp;lt;--- (6)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;timerService&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerEventTimeTimer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cleanupTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;
Here are the details of the steps:&lt;br /&gt;
1) We first add each new event to our window state:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;K&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;V&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;V&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;addToStateValuesSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MapState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;K&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;V&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mapState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;K&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;V&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;V&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;valuesSet&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;mapState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;valuesSet&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;valuesSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;valuesSet&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HashSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;valuesSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;mapState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;put&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;valuesSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;valuesSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;2) Next, we retrieve the previously-broadcasted rule, according to
which the incoming transaction needs to be evaluated.&lt;/p&gt;
&lt;p&gt;3) &lt;code&gt;getWindowStartTimestampFor&lt;/code&gt; determines, given the window span defined
in the rule and the current transaction timestamp, how far back in
time our evaluation should go.&lt;/p&gt;
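&lt;p&gt;Conceptually, this helper boils down to a simple subtraction. A possible implementation could look roughly like the following, assuming the rule keeps its window span in milliseconds, as &lt;code&gt;getWindowMillis()&lt;/code&gt; suggests (the actual code lives in the &lt;code&gt;Rule&lt;/code&gt; class of the demo repository and may differ in details):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch of the helper inside the Rule class (illustrative, not verbatim demo code).
public long getWindowStartTimestampFor(long currentEventTime) {
  // Look back exactly one window span from the timestamp of the current event.
  return currentEventTime - getWindowMillis();
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;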
&lt;p&gt;4) The aggregate value is calculated by iterating over all window state
entries and applying an aggregate function. It could be an &lt;em&gt;average,
max, min&lt;/em&gt; or, as in the example rule from the beginning of this
section, a &lt;em&gt;sum&lt;/em&gt;.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;boolean&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;isStateValueInWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stateEventTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowStartForEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;currentEventTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stateEventTime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowStartForEvent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stateEventTime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;currentEventTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;aggregateValuesInState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stateEventTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SimpleAccumulator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BigDecimal&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;aggregator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Set&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inWindow&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stateEventTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;inWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BigDecimal&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;aggregatedValue&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;FieldsExtractor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBigDecimalByName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getAggregateFieldName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;aggregator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;aggregatedValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;5) Having an aggregate value, we can compare it to the threshold value
that is specified in the rule definition and fire an alert, if
necessary.&lt;/p&gt;
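&lt;p&gt;A rough sketch of this comparison (the &lt;code&gt;rule.apply()&lt;/code&gt; threshold check and the
&lt;code&gt;Alert&lt;/code&gt; constructor shown here are assumptions for illustration only):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch: compare the aggregate to the threshold defined in the rule and emit an alert if it is violated.
BigDecimal aggregateResult = aggregator.getLocalValue();
if (rule.apply(aggregateResult)) {                        // hypothetical threshold check
    out.collect(new Alert(rule, value, aggregateResult)); // hypothetical Alert constructor
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;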
&lt;p&gt;6) At the end, we register a clean-up timer using
&lt;code&gt;ctx.timerService().registerEventTimeTimer()&lt;/code&gt;. This timer will be
responsible for removing the current transaction once it moves out of
scope.&lt;/p&gt;
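&lt;p&gt;Roughly, the registration could look like this (note the rounding to full seconds,
which is discussed in more detail right below):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch: round the event time down to the full second before registering the timer,
// so that at most one timer per key is created within any given second.
long cleanupTime = (currentEventTime / 1000) * 1000;
ctx.timerService().registerEventTimeTimer(cleanupTime);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;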
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note &lt;/span&gt;
Notice the rounding during timer creation. It is an important technique
which enables a reasonable trade-off between the precision with which
the timers will be triggered, and the number of timers being used.
Timers are stored in Flink’s fault-tolerant state, and managing them
with millisecond-level precision can be wasteful. In our case, with this
rounding, we will create at most one timer per key in any given second. Flink documentation provides some additional &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/process_function.html#timer-coalescing&quot;&gt;&lt;u&gt;details&lt;/u&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;7) The &lt;code&gt;onTimer&lt;/code&gt; method will trigger the clean-up of the window state.&lt;/p&gt;
&lt;p&gt;As previously described, we are always keeping as many events in the
state as required for the evaluation of an active rule with the widest
window span. This means that during the clean-up, we only need to remove
the state which is out of scope of this widest window.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-3/widest-window.png&quot; width=&quot;800px&quot; alt=&quot;Figure 6: Widest Window&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 6: Widest Window&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;This is how the clean-up procedure can be implemented:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;onTimer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OnTimerContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Alert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;widestWindowRule&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Descriptors&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;rulesDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WIDEST_RULE_KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Optional&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cleanupEventTimeWindow&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Optional&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;ofNullable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;widestWindowRule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;Rule:&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getWindowMillis&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Optional&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cleanupEventTimeThreshold&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cleanupEventTimeWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;timestamp&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Remove events that are older than (timestamp - widestWindowSpan)ms&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;cleanupEventTimeThreshold&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;ifPresent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;evictOutOfScopeElementsFromWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;evictOutOfScopeElementsFromWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;threshold&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;try&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Iterator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;keys&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keys&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;iterator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;while&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keys&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;hasNext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stateEventTime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;keys&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stateEventTime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;threshold&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;keys&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;remove&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;catch&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ex&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;throw&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;RuntimeException&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ex&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
You might be wondering why we did not use &lt;code&gt;ListState&lt;/code&gt;, since we always
iterate over all of the values of the window state anyway. This is actually
an optimization for the case when &lt;code&gt;RocksDBStateBackend&lt;/code&gt;
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/state_backends.html#the-rocksdbstatebackend&quot;&gt;is used&lt;/a&gt;. Iterating over a &lt;code&gt;ListState&lt;/code&gt; would cause all of the &lt;code&gt;Transaction&lt;/code&gt;
objects to be deserialized. Using &lt;code&gt;MapState&lt;/code&gt;&#39;s keys iterator only causes
deserialization of the keys (type &lt;code&gt;long&lt;/code&gt;), and therefore reduces the
computational overhead.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;This concludes the description of the implementation details. Our
approach triggers evaluation of a time window as soon as a new
transaction arrives and therefore fulfills the main requirement that we
have targeted: low delay for potentially issuing an alert. For the
complete implementation, please have a look at
&lt;a href=&quot;https://github.com/afedulov/fraud-detection-demo&quot;&gt;the project on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;improvements-and-optimizations&quot;&gt;Improvements and Optimizations&lt;/h2&gt;
&lt;p&gt;What are the pros and cons of the described approach?&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Pros:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Low latency capabilities&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Tailored solution with potential use-case specific optimizations&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Efficient state reuse (shared state for the rules with the same key)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Cons:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Cannot make use of potential future optimizations in the existing
Window API&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;No late event handling, which is available out of the box in the
Window API&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Quadratic computation complexity and potentially large state&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let’s now look at the latter two drawbacks and see if we can address
them.&lt;/p&gt;
&lt;h4 id=&quot;late-events&quot;&gt;Late events:&lt;/h4&gt;
&lt;p&gt;Processing late events raises a question: is it still meaningful
to re-evaluate the window when a late event arrives? If it is, you would
need to extend the widest window used for the clean-up by your maximum
expected out-of-orderness. This avoids evaluating a time window with
potentially incomplete data for such late firings (see Figure 7).&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-3/late-events.png&quot; width=&quot;500px&quot; alt=&quot;Figure 7: Late Events Handling&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 7: Late Events Handling&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;It can be argued, however, that for a use case that emphasizes low-latency
processing, such late triggering would be meaningless. In this case, we could
keep track of the most recent timestamp observed so far and, for events that
do not monotonically increase this value, only add them to the state while
skipping the aggregate calculation and the alert triggering logic.&lt;/p&gt;
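&lt;p&gt;A minimal sketch of this check, assuming an additional keyed &lt;code&gt;ValueState&amp;lt;Long&amp;gt;&lt;/code&gt;
(called &lt;code&gt;lastObservedEventTime&lt;/code&gt; here purely for illustration) that tracks the highest
event time seen for the current key:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch: lastObservedEventTime is an assumed extra ValueState&amp;lt;Long&amp;gt; per key.
Long lastSeen = lastObservedEventTime.value();
if (lastSeen == null || currentEventTime &amp;gt;= lastSeen) {
    lastObservedEventTime.update(currentEventTime);
} else {
    // Late event: it has already been added to the window state above,
    // but we skip the aggregate calculation and the alert triggering logic.
    return;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;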
&lt;h4 id=&quot;redundant-re-computations-and-state-size&quot;&gt;Redundant Re-computations and State Size:&lt;/h4&gt;
&lt;p&gt;In the described implementation we keep individual transactions in state
and iterate over them to calculate the aggregate again and again on every new
event. This is clearly suboptimal, as it wastes computational resources on
repeated calculations.&lt;/p&gt;
&lt;p&gt;What is the main reason to keep the individual transactions in state?
The granularity of stored events directly corresponds to the precision
of the time window calculation. Because we store transactions
individually, we can precisely discard individual transactions as soon as
they leave the exact 2592000000 ms time window (30 days in ms). At this
point, it is worth asking: do we really need millisecond precision when
evaluating such a long time window, or is it OK to accept potential false
positives in exceptional cases? If the answer for your use case is that such
precision is not needed, you could implement an additional optimization based
on bucketing and pre-aggregation. The idea of this optimization can be broken
down as follows (a minimal sketch follows the list):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Instead of storing individual events, create a parent class that can
either contain fields of a single transaction, or combined values,
calculated based on applying an aggregate function to a set of
transactions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Instead of using timestamps in milliseconds as &lt;code&gt;MapState&lt;/code&gt; keys, round
them to the level of “resolution” that you are willing to accept
(for instance, a full minute). Each entry therefore represents a
bucket.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Whenever a window is evaluated, append the new transaction’s data to
the bucket aggregate instead of storing individual data points per
transaction.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
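&lt;p&gt;A minimal sketch of this bucketing idea, assuming a hypothetical &lt;code&gt;Aggregate&lt;/code&gt;
container class and the window state re-declared as &lt;code&gt;MapState&amp;lt;Long, Aggregate&amp;gt;&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch only: Aggregate is an assumed container that either wraps a single
// transaction or holds pre-aggregated values; windowState is assumed to be
// MapState&amp;lt;Long, Aggregate&amp;gt; in this variant.
long bucketSizeMillis = 60 * 1000; // accepted &quot;resolution&quot;: one full minute
long bucketKey = (currentEventTime / bucketSizeMillis) * bucketSizeMillis;
BigDecimal amount =
    FieldsExtractor.getBigDecimalByName(rule.getAggregateFieldName(), transaction);
Aggregate bucket = windowState.get(bucketKey);
if (bucket == null) {
    bucket = new Aggregate();
}
bucket.add(amount); // pre-aggregate instead of storing the individual event
windowState.put(bucketKey, bucket);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;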
&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-3/pre-aggregation.png&quot; width=&quot;700px&quot; alt=&quot;Figure 8: Pre-aggregation&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 8: Pre-aggregation&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h4 id=&quot;state-data-and-serializers&quot;&gt;State Data and Serializers&lt;/h4&gt;
&lt;p&gt;Another question that we can ask ourselves in order to further optimize
the implementation is how probable it is to get different events with
exactly the same timestamp. In the described implementation, we
demonstrated one way of approaching this question by storing sets of
transactions per timestamp in &lt;code&gt;MapState&amp;lt;Long, Set&amp;lt;Transaction&amp;gt;&amp;gt;&lt;/code&gt;. Such
a choice, however, might have a more significant effect on performance
than one might anticipate. The reason is that Flink does not currently
provide a native &lt;code&gt;Set&lt;/code&gt; serializer and will enforce a fallback to the less
efficient &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/types_serialization.html#general-class-types&quot;&gt;Kryo
serializer&lt;/a&gt;
instead
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16729&quot;&gt;FLINK-16729&lt;/a&gt;). A
meaningful alternative strategy is to assume that, in a normal scenario,
no two distinct events can have exactly the same timestamp and to turn
the window state into a &lt;code&gt;MapState&amp;lt;Long, Transaction&amp;gt;&lt;/code&gt; type. You can use
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/side_output.html&quot;&gt;side-outputs&lt;/a&gt;
to collect and monitor any unexpected occurrences which contradict your
assumption. During performance optimization, I generally recommend that you
&lt;a href=&quot;https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html#disabling-kryo&quot;&gt;disable the fallback to
Kryo&lt;/a&gt;
and verify where your application might be further optimized by ensuring
that &lt;a href=&quot;https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html#performance-comparison&quot;&gt;more efficient
serializers&lt;/a&gt;
are being used.&lt;/p&gt;
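&lt;p&gt;One way to do this is the standard &lt;code&gt;ExecutionConfig&lt;/code&gt; switch that disables generic
types, which makes the job fail fast whenever a Kryo fallback would otherwise happen:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Throw an exception instead of silently falling back to Kryo for generic types:
env.getConfig().disableGenericTypes();
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;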
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Tip:&lt;/span&gt;
You can quickly determine which serializer is going to be
used for your classes by setting a breakpoint and verifying the type of
the returned &lt;code&gt;TypeInformation&lt;/code&gt;.
&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;table class=&quot;tg&quot;&gt;
&lt;tr&gt;
&lt;td class=&quot;tg-topcenter&quot;&gt;
&lt;img src=&quot;/img/blog/patterns-blog-3/type-pojo.png&quot; alt=&quot;POJO&quot; /&gt;&lt;/td&gt;
&lt;td class=&quot;tg-topcenter&quot;&gt;
&lt;i&gt;PojoTypeInfo&lt;/i&gt; indicates that an efficient Flink POJO serializer will be used.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;tg-top&quot;&gt;
&lt;img src=&quot;/img/blog/patterns-blog-3/type-kryo.png&quot; alt=&quot;Kryo&quot; /&gt;&lt;/td&gt;
&lt;td class=&quot;tg-topcenter&quot;&gt;
&lt;i&gt;GenericTypeInfo&lt;/i&gt; indicates the fallback to a Kryo serializer.&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;/div&gt;
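&lt;p&gt;As a complement to the breakpoint approach from the tip above, a quick programmatic
check along these lines can also reveal which serializer Flink derives for a class
(&lt;code&gt;Transaction&lt;/code&gt; stands in for your own type):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// PojoTypeInfo means the efficient POJO serializer will be used,
// GenericTypeInfo means the Kryo fallback.
TypeInformation&amp;lt;Transaction&amp;gt; typeInfo = TypeInformation.of(Transaction.class);
System.out.println(typeInfo.getClass().getSimpleName());
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;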
&lt;p&gt;&lt;strong&gt;Event pruning&lt;/strong&gt;: instead of storing complete events and putting
additional stress on the ser/de machinery, we can reduce the data of individual
events to only the relevant information. This would potentially require
“unpacking” individual events into fields and storing those fields in a
generic &lt;code&gt;Map&amp;lt;String, Object&amp;gt;&lt;/code&gt; data structure, based on the
configurations of the active rules.&lt;/p&gt;
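&lt;p&gt;A rough sketch of this idea (both the &lt;code&gt;fieldsReferencedByActiveRules&lt;/code&gt; collection
and the generic &lt;code&gt;FieldsExtractor.getByName&lt;/code&gt; accessor are hypothetical and shown only
for illustration):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch only: keep just the fields that the currently active rules reference.
Map&amp;lt;String, Object&amp;gt; prunedEvent = new HashMap&amp;lt;&amp;gt;();
for (String fieldName : fieldsReferencedByActiveRules) {       // assumed helper collection
    prunedEvent.put(fieldName, FieldsExtractor.getByName(fieldName, transaction)); // hypothetical accessor
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;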
&lt;p&gt;While this adjustment could potentially produce significant improvements
for objects of large size, it should not be your first pick as it can
easily turn into a premature optimization.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary:&lt;/h2&gt;
&lt;p&gt;This article concludes the description of the implementation of the
fraud detection engine that we started in &lt;a href=&quot;/news/2020/01/15/demo-fraud-detection.html&quot;&gt;part one&lt;/a&gt;. In this blog
post we demonstrated how &lt;code&gt;ProcessFunction&lt;/code&gt; can be utilized to
&quot;impersonate&quot; a window with sophisticated custom logic. We have
discussed the pros and cons of such an approach and elaborated on how custom,
use-case-specific optimizations can be applied, something that would
not be directly possible with the Window API.&lt;/p&gt;
&lt;p&gt;The goal of this blog post was to illustrate the power and flexibility
of Apache Flink’s APIs. At their core are the pillars of Flink, which
spare you, as a developer, very significant amounts of work and
generalize well to a wide range of use cases by providing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Efficient data exchange in a distributed cluster&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Horizontal scalability via data partitioning&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Fault-tolerant state with quick, local access&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Convenient abstraction for working with this state, which is as simple as using a
local variable&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Multi-threaded, parallel execution engine. &lt;code&gt;ProcessFunction&lt;/code&gt; code runs
in a single thread, without the need for synchronization. Flink
handles all the parallel execution aspects and correct access to the
shared state, without you, as a developer, having to think about it
(concurrency is hard).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All these aspects make it possible to build applications with Flink that
go well beyond trivial streaming ETL use cases and enable implementation
of arbitrarily sophisticated, distributed event-driven applications.
With Flink, you can rethink approaches to a wide range of use cases
which normally would rely on using stateless parallel execution nodes
and “pushing” the concerns of state fault tolerance to a database, an
approach that is often destined to run into scalability issues in the
face of ever-increasing data volumes.&lt;/p&gt;
</description>
<pubDate>Thu, 30 Jul 2020 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/07/30/demo-fraud-detection-3.html</link>
<guid isPermaLink="true">/news/2020/07/30/demo-fraud-detection-3.html</guid>
</item>
<item>
<title>Flink SQL Demo: Building an End-to-End Streaming Application</title>
<description>&lt;p&gt;Apache Flink 1.11 has released many exciting new features, including many developments in Flink SQL which is evolving at a fast pace. This article takes a closer look at how to quickly build streaming applications with Flink SQL from a practical point of view.&lt;/p&gt;
&lt;p&gt;In the following sections, we describe how to integrate Kafka, MySQL, Elasticsearch, and Kibana with Flink SQL to analyze e-commerce user behavior in real-time. All exercises in this blogpost are performed in the Flink SQL CLI, and the entire process uses standard SQL syntax, without a single line of Java/Scala code or IDE installation. The final result of this demo is shown in the following figure:&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-07-28-flink-sql-demo/image1.gif&quot; width=&quot;650px&quot; alt=&quot;Demo Overview&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h1 id=&quot;preparation&quot;&gt;Preparation&lt;/h1&gt;
&lt;p&gt;Prepare a Linux or MacOS computer with Docker installed.&lt;/p&gt;
&lt;h2 id=&quot;starting-the-demo-environment&quot;&gt;Starting the Demo Environment&lt;/h2&gt;
&lt;p&gt;The components required in this demo are all managed in containers, so we will use &lt;code&gt;docker-compose&lt;/code&gt; to start them. First, download the &lt;code&gt;docker-compose.yml&lt;/code&gt; file that defines the demo environment, for example by running the following commands:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;mkdir flink-sql-demo; cd flink-sql-demo;
wget https://raw.githubusercontent.com/wuchong/flink-sql-demo/v1.11-EN/docker-compose.yml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The Docker Compose environment consists of the following containers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Flink SQL CLI:&lt;/strong&gt; used to submit queries and visualize their results.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Flink Cluster:&lt;/strong&gt; a Flink JobManager and a Flink TaskManager container to execute queries.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;MySQL:&lt;/strong&gt; MySQL 5.7 and a pre-populated &lt;code&gt;category&lt;/code&gt; table in the database. The &lt;code&gt;category&lt;/code&gt; table will be joined with data in Kafka to enrich the real-time data.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kafka:&lt;/strong&gt; mainly used as a data source. The DataGen component automatically writes data into a Kafka topic.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Zookeeper:&lt;/strong&gt; this component is required by Kafka.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Elasticsearch:&lt;/strong&gt; mainly used as a data sink.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kibana:&lt;/strong&gt; used to visualize the data in Elasticsearch.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;DataGen:&lt;/strong&gt; the data generator. After the container is started, user behavior data is automatically generated and sent to the Kafka topic. By default, 2000 data entries are generated each second for about 1.5 hours. You can modify DataGen’s &lt;code&gt;speedup&lt;/code&gt; parameter in &lt;code&gt;docker-compose.yml&lt;/code&gt; to adjust the generation rate (which takes effect after Docker Compose is restarted).&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;alert alert-danger&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-danger&quot; style=&quot;display: inline-block&quot;&gt; Note &lt;/span&gt;
Before starting the containers, we recommend configuring Docker so that sufficient resources are available and the environment does not become unresponsive. We suggest running Docker at 3-4 GB memory and 3-4 CPU cores.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;To start all containers, run the following command in the directory that contains the &lt;code&gt;docker-compose.yml&lt;/code&gt; file.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;docker-compose up -d
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This command automatically starts all the containers defined in the Docker Compose configuration in a detached mode. Run &lt;code&gt;docker ps&lt;/code&gt; to check whether the 9 containers are running properly. You can also visit &lt;a href=&quot;http://localhost:5601/&quot;&gt;http://localhost:5601/&lt;/a&gt; to see if Kibana is running normally.&lt;/p&gt;
&lt;p&gt;Don’t forget to run the following command to stop all containers after you have finished the tutorial:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;docker-compose down
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;entering-the-flink-sql-cli-client&quot;&gt;Entering the Flink SQL CLI client&lt;/h2&gt;
&lt;p&gt;To enter the SQL CLI client run:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;docker-compose &lt;span class=&quot;nb&quot;&gt;exec &lt;/span&gt;sql-client ./sql-client.sh&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The command starts the SQL CLI client in the container.
You should see the welcome screen of the CLI client.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-07-28-flink-sql-demo/image3.png&quot; width=&quot;500px&quot; alt=&quot;Flink SQL CLI welcome page&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;creating-a-kafka-table-using-ddl&quot;&gt;Creating a Kafka table using DDL&lt;/h2&gt;
&lt;p&gt;The DataGen container continuously writes events into the Kafka &lt;code&gt;user_behavior&lt;/code&gt; topic. This data contains the user behavior on the day of November 27, 2017 (behaviors include “click”, “like”, “purchase” and “add to shopping cart” events). Each row represents a user behavior event, with the user ID, product ID, product category ID, event type, and timestamp in JSON format. Note that the dataset is from the &lt;a href=&quot;https://tianchi.aliyun.com/dataset/dataDetail?dataId=649&quot;&gt;Alibaba Cloud Tianchi public dataset&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the directory that contains &lt;code&gt;docker-compose.yml&lt;/code&gt;, run the following command to view the first 10 data entries generated in the Kafka topic:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;docker-compose exec kafka bash -c &#39;kafka-console-consumer.sh --topic user_behavior --bootstrap-server kafka:9094 --from-beginning --max-messages 10&#39;
{&quot;user_id&quot;: &quot;952483&quot;, &quot;item_id&quot;:&quot;310884&quot;, &quot;category_id&quot;: &quot;4580532&quot;, &quot;behavior&quot;: &quot;pv&quot;, &quot;ts&quot;: &quot;2017-11-27T00:00:00Z&quot;}
{&quot;user_id&quot;: &quot;794777&quot;, &quot;item_id&quot;:&quot;5119439&quot;, &quot;category_id&quot;: &quot;982926&quot;, &quot;behavior&quot;: &quot;pv&quot;, &quot;ts&quot;: &quot;2017-11-27T00:00:00Z&quot;}
...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In order to make the events in the Kafka topic accessible to Flink SQL, we run the following DDL statement in SQL CLI to create a table that connects to the topic in the Kafka cluster:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;user_behavior&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;user_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;item_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;category_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;behavior&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TIMESTAMP&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;proctime&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PROCTIME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- generates processing-time attribute using computed column&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;WATERMARK&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;5&amp;#39;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SECOND&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- defines watermark on ts column, marks ts as event-time attribute&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;kafka&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- using kafka connector&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;topic&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;user_behavior&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- kafka topic&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;scan.startup.mode&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;earliest-offset&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- reading from the beginning&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;properties.bootstrap.servers&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;kafka:9094&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- kafka broker address&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;format&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;json&amp;#39;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- the data format is json&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The above snippet declares five fields based on the data format. In addition, it uses the computed column syntax and the built-in &lt;code&gt;PROCTIME()&lt;/code&gt; function to declare a virtual column that generates the processing-time attribute. It also uses the &lt;code&gt;WATERMARK&lt;/code&gt; syntax to declare the watermark strategy on the &lt;code&gt;ts&lt;/code&gt; field (tolerating 5 seconds of out-of-orderness). Therefore, the &lt;code&gt;ts&lt;/code&gt; field becomes an event-time attribute. For more information about time attributes and DDL syntax, see the following official documentation:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/streaming/time_attributes.html&quot;&gt;Time attributes in Flink’s Table API &amp;amp; SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/create.html#create-table&quot;&gt;DDL Syntax in Flink SQL&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;After creating the &lt;code&gt;user_behavior&lt;/code&gt; table in the SQL CLI, run &lt;code&gt;SHOW TABLES;&lt;/code&gt; and &lt;code&gt;DESCRIBE user_behavior;&lt;/code&gt; to see registered tables and table details. Also, run the command &lt;code&gt;SELECT * FROM user_behavior;&lt;/code&gt; directly in the SQL CLI to preview the data (press &lt;code&gt;q&lt;/code&gt; to exit).&lt;/p&gt;
&lt;p&gt;Next, we discover more about Flink SQL through three real-world scenarios.&lt;/p&gt;
&lt;h1 id=&quot;hourly-trading-volume&quot;&gt;Hourly Trading Volume&lt;/h1&gt;
&lt;h2 id=&quot;creating-an-elasticsearch-table-using-ddl&quot;&gt;Creating an Elasticsearch table using DDL&lt;/h2&gt;
&lt;p&gt;Let’s create an Elasticsearch result table in the SQL CLI. We need two columns in this case: &lt;code&gt;hour_of_day&lt;/code&gt; and &lt;code&gt;buy_cnt&lt;/code&gt; (trading volume).&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;buy_cnt_per_hour&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;hour_of_day&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;buy_cnt&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;elasticsearch-7&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- using elasticsearch connector&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;hosts&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;http://elasticsearch:9200&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- elasticsearch address&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;index&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;buy_cnt_per_hour&amp;#39;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- elasticsearch index name, similar to database table name&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;There is no need to create the &lt;code&gt;buy_cnt_per_hour&lt;/code&gt; index in Elasticsearch in advance since Elasticsearch will automatically create the index if it does not exist.&lt;/p&gt;
&lt;h2 id=&quot;submitting-a-query&quot;&gt;Submitting a Query&lt;/h2&gt;
&lt;p&gt;The hourly trading volume is the number of “buy” behaviors completed each hour. Therefore, we can use a &lt;code&gt;TUMBLE&lt;/code&gt; window function to assign data into hourly windows. Then, we count the number of “buy” records in each window. To implement this, we can filter out the “buy” data first and then apply &lt;code&gt;COUNT(*)&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;buy_cnt_per_hour&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HOUR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TUMBLE_START&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HOUR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)),&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;user_behavior&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;behavior&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;buy&amp;#39;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TUMBLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HOUR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Here, we use the built-in &lt;code&gt;HOUR&lt;/code&gt; function to extract the value for each hour in the day from a &lt;code&gt;TIMESTAMP&lt;/code&gt; column. Use &lt;code&gt;INSERT INTO&lt;/code&gt; to start a Flink SQL job that continuously writes results into the Elasticsearch &lt;code&gt;buy_cnt_per_hour&lt;/code&gt; index. The Elasticsearch result table can be seen as a materialized view of the query. You can find more information about Flink’s window aggregation in the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html#group-windows&quot;&gt;Apache Flink documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;After running the previous query in the Flink SQL CLI, we can observe the submitted task on the &lt;a href=&quot;http://localhost:8081&quot;&gt;Flink Web UI&lt;/a&gt;. This task is a streaming task and therefore runs continuously.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-07-28-flink-sql-demo/image4.jpg&quot; width=&quot;800px&quot; alt=&quot;Flink Dashboard&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;using-kibana-to-visualize-results&quot;&gt;Using Kibana to Visualize Results&lt;/h2&gt;
&lt;p&gt;Access Kibana at &lt;a href=&quot;http://localhost:5601&quot;&gt;http://localhost:5601&lt;/a&gt;. First, configure an index pattern by clicking “Management” in the left-side toolbar and find “Index Patterns”. Next, click “Create Index Pattern” and enter the full index name &lt;code&gt;buy_cnt_per_hour&lt;/code&gt; to create the index pattern. After creating the index pattern, we can explore data in Kibana.&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note &lt;/span&gt;
Since we are using a TUMBLE window of one hour here, it might take about four minutes from the time the containers start until the first row is emitted. Until then, the index does not exist and Kibana is unable to find it.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Click “Discover” in the left-side toolbar. Kibana lists the content of the created index.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-07-28-flink-sql-demo/image5.jpg&quot; width=&quot;800px&quot; alt=&quot;Kibana Discover&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Next, create a dashboard to display various views. Click “Dashboard” on the left side of the page to create a dashboard named “User Behavior Analysis”. Then, click “Create New” to create a new view. Select “Area” (area graph), then select the &lt;code&gt;buy_cnt_per_hour&lt;/code&gt; index, and draw the trading volume area chart as illustrated in the configuration on the left side of the following diagram. Apply the changes by clicking the “▶” play button. Then, save it as “Hourly Trading Volume”.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-07-28-flink-sql-demo/image6.jpg&quot; width=&quot;800px&quot; alt=&quot;Hourly Trading Volume&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;You can see that during the early morning hours the number of transactions is at its lowest for the entire day.&lt;/p&gt;
&lt;p&gt;As real-time data is added into the indices, you can enable auto-refresh in Kibana to see real-time visualization changes and updates. You can do so by clicking the time picker and entering a refresh interval (e.g. 3 seconds) in the “Refresh every” field.&lt;/p&gt;
&lt;h1 id=&quot;cumulative-number-of-unique-visitors-every-10-min&quot;&gt;Cumulative number of Unique Visitors every 10-min&lt;/h1&gt;
&lt;p&gt;Another interesting visualization is the cumulative number of unique visitors (UV). For example, the number of UV at 10:00 represents the total number of UV from 00:00 to 10:00. Therefore, the curve is monotonically increasing.&lt;/p&gt;
&lt;p&gt;Let’s create another Elasticsearch table in the SQL CLI to store the UV results. This table contains 3 columns: date, time and cumulative UVs.
The &lt;code&gt;date_str&lt;/code&gt; and &lt;code&gt;time_str&lt;/code&gt; columns are defined as the primary key; the Elasticsearch sink will use them to calculate the document ID and work in upsert mode to update the UV value under that document ID.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cumulative_uv&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;date_str&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;time_str&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;uv&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;PRIMARY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;KEY&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;date_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ENFORCED&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;elasticsearch-7&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;hosts&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;http://elasticsearch:9200&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;index&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;cumulative_uv&amp;#39;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We can extract the date and time using the &lt;code&gt;DATE_FORMAT&lt;/code&gt; function based on the &lt;code&gt;ts&lt;/code&gt; field. As the section title describes, we only need to report every 10 minutes. So, we can use &lt;code&gt;SUBSTR&lt;/code&gt; and the string concat function &lt;code&gt;||&lt;/code&gt; to convert the time value into a 10-minute interval time string, such as &lt;code&gt;12:00&lt;/code&gt; or &lt;code&gt;12:10&lt;/code&gt;.
Next, we group the data by &lt;code&gt;date_str&lt;/code&gt; and perform a &lt;code&gt;COUNT DISTINCT&lt;/code&gt; aggregation on &lt;code&gt;user_id&lt;/code&gt; to get the current cumulative UV for that day. Additionally, we perform a &lt;code&gt;MAX&lt;/code&gt; aggregation on the &lt;code&gt;time_str&lt;/code&gt; field to get the current stream time: the maximum event time observed so far.
As the maximum time is also part of the primary key of the sink, the final result is that a new point is inserted into Elasticsearch every 10 minutes, and the latest point is continuously updated until the next 10-minute point is generated.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cumulative_uv&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;date_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;MAX&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;DISTINCT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;user_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;uv&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DATE_FORMAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;yyyy-MM-dd&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;date_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SUBSTR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DATE_FORMAT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ts&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;HH:mm&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;||&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;0&amp;#39;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;time_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;user_id&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;user_behavior&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;date_str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After submitting this query, we create a &lt;code&gt;cumulative_uv&lt;/code&gt; index pattern in Kibana. We then create a “Line” (line graph) on the dashboard, by selecting the &lt;code&gt;cumulative_uv&lt;/code&gt; index, and drawing the cumulative UV curve according to the configuration on the left side of the following figure before finally saving the curve.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-07-28-flink-sql-demo/image7.jpg&quot; width=&quot;800px&quot; alt=&quot;Cumulative Unique Visitors every 10-min&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h1 id=&quot;top-categories&quot;&gt;Top Categories&lt;/h1&gt;
&lt;p&gt;The last visualization represents the category rankings to inform us about the most popular categories on our e-commerce site. Since our data source offers events for more than 5,000 categories, which provides no additional significance to our analytics, we would like to reduce the data so that it only includes the top-level categories. We will use the data in our MySQL database by joining it as a dimension table with our Kafka events to map sub-categories to top-level categories.&lt;/p&gt;
&lt;p&gt;Create a table in the SQL CLI to make the data in MySQL accessible to Flink SQL.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;category_dim&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sub_category_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;parent_category_name&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;jdbc&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;url&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;jdbc:mysql://mysql:3306/flink&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;table-name&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;category&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;username&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;root&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;password&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;123456&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;lookup.cache.max-rows&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;5000&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;lookup.cache.ttl&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;10min&amp;#39;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The underlying JDBC connector implements the &lt;code&gt;LookupTableSource&lt;/code&gt; interface, so the created JDBC table &lt;code&gt;category_dim&lt;/code&gt; can be used as a temporal table (i.e. lookup table) out-of-the-box in the data enrichment.&lt;/p&gt;
&lt;p&gt;In addition, create an Elasticsearch table to store the category statistics.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;top_category&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;category_name&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;PRIMARY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;KEY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ENFORCED&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;buy_cnt&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;elasticsearch-7&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;hosts&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;http://elasticsearch:9200&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;index&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;top_category&amp;#39;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In order to enrich the category names, we use Flink SQL’s temporal table joins to join a dimension table. You can access more information about &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/streaming/joins.html#join-with-a-temporal-table&quot;&gt;temporal joins&lt;/a&gt; in the Flink documentation.&lt;/p&gt;
&lt;p&gt;Additionally, we use the &lt;code&gt;CREATE VIEW&lt;/code&gt; syntax to register the query as a logical view, allowing us to easily reference this query in subsequent queries and simplify nested queries. Please note that creating a logical view does not trigger the execution of the job and the view results are not persisted. Therefore, this statement is lightweight and does not have additional overhead.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;VIEW&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rich_user_behavior&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;user_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;item_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;behavior&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;parent_category_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;category_name&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;user_behavior&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;category_dim&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SYSTEM_TIME&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OF&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;proctime&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;C&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;U&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;category_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;C&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sub_category_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Finally, we group the enriched data by category name to count the number of &lt;code&gt;buy&lt;/code&gt; events and write the result to Elasticsearch’s &lt;code&gt;top_category&lt;/code&gt; index.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;top_category&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;category_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;buy_cnt&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rich_user_behavior&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;behavior&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;buy&amp;#39;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;category_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After submitting the query, we create a &lt;code&gt;top_category&lt;/code&gt; index pattern in Kibana. We then create a “Horizontal Bar” (bar graph) on the dashboard by selecting the &lt;code&gt;top_category&lt;/code&gt; index and drawing the category ranking according to the configuration shown on the left side of the following diagram, before finally saving the list.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-07-28-flink-sql-demo/image8.jpg&quot; width=&quot;800px&quot; alt=&quot;Top Categories&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;As illustrated in the diagram, the clothing and shoes categories far exceed the other categories on the e-commerce website.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;We have now implemented three practical applications and created charts for them. We can return to the dashboard page and drag-and-drop each view to give our dashboard a more formal and intuitive style, as illustrated at the beginning of the blogpost. Of course, Kibana also provides a rich set of graphics and visualization features, and the user_behavior logs contain a lot more interesting information to explore. Using Flink SQL, you can analyze data in more dimensions, while Kibana allows you to display more views and observe real-time changes in its charts!&lt;/p&gt;
&lt;h1 id=&quot;summary&quot;&gt;Summary&lt;/h1&gt;
&lt;p&gt;In the previous sections, we described how to use Flink SQL to integrate Kafka, MySQL, Elasticsearch, and Kibana to quickly build a real-time analytics application. The entire process can be completed using standard SQL syntax, without a line of Java or Scala code. We hope that this article provides some clear and practical examples of the convenience and power of Flink SQL, featuring an easy connection to various external systems, native support for event time and out-of-order handling, dimension table joins and a wide range of built-in functions. We hope you have fun following the examples in this blogpost!&lt;/p&gt;
</description>
<pubDate>Tue, 28 Jul 2020 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html</link>
<guid isPermaLink="true">/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html</guid>
</item>
<item>
<title>Flink Community Update - July&#39;20</title>
<description>&lt;p&gt;As July draws to an end, we look back at a monthful of activity in the Flink community, including two releases (!) and some work around improving the first-time contribution experience in the project.&lt;/p&gt;
&lt;p&gt;Also, events are starting to pick up again, so we’ve put together a list of some great ones you can (virtually) attend in August!&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#the-past-month-in-flink&quot; id=&quot;markdown-toc-the-past-month-in-flink&quot;&gt;The Past Month in Flink&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#flink-releases&quot; id=&quot;markdown-toc-flink-releases&quot;&gt;Flink Releases&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#flink-111&quot; id=&quot;markdown-toc-flink-111&quot;&gt;Flink 1.11&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#flink-1111&quot; id=&quot;markdown-toc-flink-1111&quot;&gt;Flink 1.11.1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#gearing-up-for-flink-112&quot; id=&quot;markdown-toc-gearing-up-for-flink-112&quot;&gt;Gearing up for Flink 1.12&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#new-committers-and-pmc-members&quot; id=&quot;markdown-toc-new-committers-and-pmc-members&quot;&gt;New Committers and PMC Members&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#new-pmc-members&quot; id=&quot;markdown-toc-new-pmc-members&quot;&gt;New PMC Members&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-bigger-picture&quot; id=&quot;markdown-toc-the-bigger-picture&quot;&gt;The Bigger Picture&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#a-look-into-the-evolution-of-flink-releases&quot; id=&quot;markdown-toc-a-look-into-the-evolution-of-flink-releases&quot;&gt;A Look Into the Evolution of Flink Releases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#first-time-contributor-guide&quot; id=&quot;markdown-toc-first-time-contributor-guide&quot;&gt;First-time Contributor Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#replacing-charged-words-in-the-flink-repo&quot; id=&quot;markdown-toc-replacing-charged-words-in-the-flink-repo&quot;&gt;Replacing “charged” words in the Flink repo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#upcoming-events-and-more&quot; id=&quot;markdown-toc-upcoming-events-and-more&quot;&gt;Upcoming Events (and More!)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h1 id=&quot;the-past-month-in-flink&quot;&gt;The Past Month in Flink&lt;/h1&gt;
&lt;h2 id=&quot;flink-releases&quot;&gt;Flink Releases&lt;/h2&gt;
&lt;h3 id=&quot;flink-111&quot;&gt;Flink 1.11&lt;/h3&gt;
&lt;p&gt;A couple of weeks ago, Flink 1.11 was announced in what was (again) the biggest Flink release to date (&lt;em&gt;see &lt;a href=&quot;#a-look-into-the-evolution-of-flink-releases&quot;&gt;“A Look Into the Evolution of Flink Releases”&lt;/a&gt;&lt;/em&gt;)! The new release brought significant improvements to usability as well as new features to Flink users across the API stack. Some highlights of Flink 1.11 are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Unaligned checkpoints to cope with high backpressure scenarios;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The new source API, which simplifies and unifies the implementation of (custom) sources;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Support for Change Data Capture (CDC) and other common use cases in the Table API/SQL;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pandas UDFs and other performance optimizations in PyFlink, making it more powerful for data science and ML workloads.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For a more detailed look into the release, you can recap the &lt;a href=&quot;https://flink.apache.org/news/2020/07/06/release-1.11.0.html&quot;&gt;announcement blogpost&lt;/a&gt; and join the upcoming meetup on &lt;a href=&quot;https://www.meetup.com/seattle-flink/events/271922632/&quot;&gt;“What’s new in Flink 1.11?”&lt;/a&gt;, where you’ll be able to ask Aljoscha Krettek (Flink PMC Member) anything release-related. The community has also been working on a series of blogposts that deep-dive into the most significant features and improvements in 1.11, so keep an eye on the &lt;a href=&quot;https://flink.apache.org/blog/&quot;&gt;Flink blog&lt;/a&gt;!&lt;/p&gt;
&lt;h3 id=&quot;flink-1111&quot;&gt;Flink 1.11.1&lt;/h3&gt;
&lt;p&gt;Shortly after releasing Flink 1.11, the community announced the first patch version to cover some outstanding issues in the major release. This version is &lt;strong&gt;particularly important for users of the Table API/SQL&lt;/strong&gt;, as it addresses known limitations that affect the usability of new features like changelog sources and support for JDBC catalogs.&lt;/p&gt;
&lt;p&gt;You can find a detailed list with all the improvements and bugfixes that went into Flink 1.11.1 in the &lt;a href=&quot;https://flink.apache.org/news/2020/07/21/release-1.11.1.html&quot;&gt;announcement blogpost&lt;/a&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&quot;gearing-up-for-flink-112&quot;&gt;Gearing up for Flink 1.12&lt;/h2&gt;
&lt;p&gt;The Flink 1.12 release cycle was kicked off last week, and a discussion about which features will go into the upcoming release is underway in &lt;a href=&quot;https://lists.apache.org/thread.html/rb01160c7c9c26304a7665f9a252d4ed1583173620df307015c095fcf%40%3Cdev.flink.apache.org%3E&quot;&gt;this @dev Mailing List thread&lt;/a&gt;. While we wait for more of these ideas to turn into proposals and JIRA issues, here are some recent FLIPs that are already being discussed in the Flink community:&lt;/p&gt;
&lt;table class=&quot;table table-bordered&quot;&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;FLIP&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866298&quot;&gt;FLIP-130&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Support Python DataStream API&lt;/b&gt;&lt;/li&gt;
&lt;p&gt;Python support in Flink has so far been limited to the Table API/SQL. These APIs are high-level and convenient, but have some limitations for more complex stream processing use cases. To expand the usability of PyFlink to a broader set of use cases, FLIP-130 proposes to also support it in the DataStream API, starting with stateless operations.&lt;/p&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-132+Temporal+Table+DDL&quot;&gt;FLIP-132&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Temporal Table DDL&lt;/b&gt;&lt;/li&gt;
&lt;p&gt;Flink SQL users can&#39;t currently create temporal tables using SQL DDL, which forces them to change context frequently for use cases that require them. FLIP-132 proposes to extend the DDL syntax to support temporal tables, which in turn will also bring &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/joins.html#join-with-a-temporal-table&quot;&gt;temporal joins&lt;/a&gt; with changelog sources to Flink SQL.&lt;/p&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr /&gt;
&lt;h2 id=&quot;new-committers-and-pmc-members&quot;&gt;New Committers and PMC Members&lt;/h2&gt;
&lt;p&gt;The Apache Flink community has welcomed &lt;strong&gt;2 new PMC Members&lt;/strong&gt; since the last update. Congratulations!&lt;/p&gt;
&lt;h3 id=&quot;new-pmc-members&quot;&gt;New PMC Members&lt;/h3&gt;
&lt;div class=&quot;row&quot;&gt;
&lt;div class=&quot;col-lg-3&quot;&gt;
&lt;div class=&quot;text-center&quot;&gt;
&lt;img class=&quot;img-circle&quot; src=&quot;https://avatars0.githubusercontent.com/u/8957547?s=400&amp;amp;u=4560f775da9ebc5f3aa2e1563f57cdad03862ce8&amp;amp;v=4&quot; width=&quot;90&quot; height=&quot;90&quot; /&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/PiotrNowojski&quot;&gt;Piotr Nowojski&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;col-lg-3&quot;&gt;
&lt;div class=&quot;text-center&quot;&gt;
&lt;img class=&quot;img-circle&quot; src=&quot;https://avatars0.githubusercontent.com/u/6239804?s=460&amp;amp;u=6cd81b1ab38fcc6a5736fcfa957c51093bf060e2&amp;amp;v=4&quot; width=&quot;90&quot; height=&quot;90&quot; /&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/LiyuApache&quot;&gt;Yu Li&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr /&gt;
&lt;h1 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture&lt;/h1&gt;
&lt;h2 id=&quot;a-look-into-the-evolution-of-flink-releases&quot;&gt;A Look Into the Evolution of Flink Releases&lt;/h2&gt;
&lt;p&gt;It’s &lt;a href=&quot;https://flink.apache.org/news/2020/04/01/community-update.html#a-look-into-the-flink-repository&quot;&gt;been a while&lt;/a&gt; since we had a look at community numbers, so this time we’d like to shed some light on the evolution of contributors and, well, work across releases. Let’s have a look at some &lt;em&gt;git&lt;/em&gt; data:&lt;/p&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-07-29-community-update/2020-07-29_releases.png&quot; width=&quot;600px&quot; alt=&quot;Flink Releases&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;If we consider Flink 1.8 (Apr. 2019) as the baseline, the Flink community more than &lt;strong&gt;tripled&lt;/strong&gt; the number of implemented and/or resolved issues in a single release with the support of an &lt;strong&gt;additional ~100 contributors&lt;/strong&gt; in Flink 1.11. This is pretty impressive on its own, and even more so if you consider that Flink contributors are distributed around the globe, working across different locations and timezones!&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&quot;first-time-contributor-guide&quot;&gt;First-time Contributor Guide&lt;/h2&gt;
&lt;p&gt;Flink has an extensive guide for &lt;a href=&quot;https://flink.apache.org/contributing/how-to-contribute.html&quot;&gt;code and non-code contributions&lt;/a&gt; that helps new community members navigate the project and get familiar with existing contribution guidelines. In particular for code contributions, knowing where to start can be difficult, given the sheer size of the Flink codebase and the pace of development of the project.&lt;/p&gt;
&lt;p&gt;To better guide new contributors, a brief section was added to the guide on &lt;a href=&quot;https://flink.apache.org/contributing/contribute-code.html#looking-for-what-to-contribute&quot;&gt;how to look for what to contribute&lt;/a&gt; and the &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18704?filter=12349196&quot;&gt;&lt;em&gt;starter&lt;/em&gt; label&lt;/a&gt; has been revived in Jira to highlight issues that are suitable for first-time contributors.&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note &lt;/span&gt;
As a reminder, you no longer need to ask for contributor permissions to start contributing to Flink. Once you’ve found something you’d like to work on, read the &lt;a href=&quot;https://flink.apache.org/contributing/contribute-code.html&quot;&gt;contribution guide&lt;/a&gt; carefully and reach out to a Flink Committer, who will be able to help you get started.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id=&quot;replacing-charged-words-in-the-flink-repo&quot;&gt;Replacing “charged” words in the Flink repo&lt;/h2&gt;
&lt;p&gt;The community is working on gradually replacing words that are outdated and carry a negative connotation in the Flink codebase, such as “master/slave” and “whitelist/blacklist”. The progress of this work can be tracked in &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18209&quot;&gt;FLINK-18209&lt;/a&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;h1 id=&quot;upcoming-events-and-more&quot;&gt;Upcoming Events (and More!)&lt;/h1&gt;
&lt;p&gt;We’re happy to see the “high season” of virtual events approaching, with a lot of great conferences taking place in the coming month, as well as some meetups. Here, we highlight some of the Flink talks happening in those events, but we recommend checking out the complete event programs!&lt;/p&gt;
&lt;p&gt;As usual, we also leave you with some resources to read and explore.&lt;/p&gt;
&lt;table class=&quot;table table-bordered&quot;&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span class=&quot;glyphicon glyphicon glyphicon-console&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Events&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;b&gt;Virtual Flink Meetup (Jul. 29)&lt;/b&gt;
&lt;p&gt;&lt;a href=&quot;https://www.meetup.com/seattle-flink/events/271922632/&quot;&gt;What’s new in Flink 1.11? + Q&amp;amp;A with Aljoscha Krettek&lt;/a&gt;&lt;/p&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;b&gt;DC Thursday (Jul. 30)&lt;/b&gt;
&lt;p&gt;&lt;a href=&quot;https://www.eventbrite.com/e/dc-thurs-apache-flink-w-stephan-ewen-tickets-112137488246?utm_campaign=Events%20%26%20Talks&amp;amp;utm_content=135006406&amp;amp;utm_medium=social&amp;amp;utm_source=twitter&amp;amp;hss_channel=tw-2581958070&quot;&gt;Interview and Community Q&amp;amp;A with Stephan Ewen&lt;/a&gt;&lt;/p&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;b&gt;KubeCon + CloudNativeCon Europe (Aug. 17-20)&lt;/b&gt;
&lt;p&gt;&lt;a href=&quot;https://kccnceu20.sched.com/event/ZelA/stateful-serverless-and-the-elephant-in-the-room-stephan-ewen-ververica&quot;&gt;Stateful Serverless and the Elephant in the Room&lt;/a&gt;&lt;/p&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;b&gt;DataEngBytes (Aug. 20-21)&lt;/b&gt;
&lt;p&gt;&lt;a href=&quot;https://dataengconf.com.au/&quot;&gt;Change Data Capture with Flink SQL and Debezium&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://dataengconf.com.au/&quot;&gt;Sweet Streams are Made of These: Data Driven Development with Stream Processing&lt;/a&gt;&lt;/p&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;b&gt;Beam Summit (Aug. 24-29)&lt;/b&gt;
&lt;p&gt;&lt;a href=&quot;https://2020.beamsummit.org/sessions/streaming-fast-slow/&quot;&gt;Streaming, Fast and Slow&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://2020.beamsummit.org/sessions/building-stateful-streaming-pipelines/&quot;&gt;Building Stateful Streaming Pipelines With Beam&lt;/a&gt;&lt;/p&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span class=&quot;glyphicon glyphicon-fire&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Blogposts&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;b&gt;Flink 1.11 Series&lt;/b&gt;
&lt;li&gt;&lt;a href=&quot;https://flink.apache.org/news/2020/07/14/application-mode.html&quot;&gt;Application Deployment in Flink: Current State and the new Application Mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flink.apache.org/2020/07/23/catalogs.html&quot;&gt;Sharing is caring - Catalogs in Flink SQL (Tutorial)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flink.apache.org/2020/07/28/flink-sql-demo-building-e2e-streaming-application.html&quot;&gt;Flink SQL Demo: Building an End-to-End Streaming Application (Tutorial)&lt;/a&gt;&lt;/li&gt;
&lt;p&gt;&lt;/p&gt;
&lt;b&gt;Other&lt;/b&gt;
&lt;li&gt;&lt;a href=&quot;https://blogs.oracle.com/javamagazine/streaming-analytics-with-java-and-apache-flink?source=:em:nw:mt::RC_WWMK200429P00043:NSL400072808&amp;amp;elq_mid=167902&amp;amp;sh=162609181316181313222609291604350235&amp;amp;cmid=WWMK200429P00043C0004&quot;&gt;Streaming analytics with Java and Apache Flink (Tutorial)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.ververica.com/blog/flink-for-online-machine-learning-and-real-time-processing-at-weibo&quot;&gt;Flink for online Machine Learning and real-time processing at Weibo&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.ververica.com/blog/data-driven-matchmaking-at-azar-with-apache-flink&quot;&gt;Data-driven Matchmaking at Azar with Apache Flink&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span class=&quot;glyphicon glyphicon glyphicon-certificate&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Flink Packages&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;&lt;p&gt;&lt;a href=&quot;https://flink-packages.org/&quot;&gt;Flink Packages&lt;/a&gt; is a website where you can explore (and contribute to) the Flink &lt;br /&gt; ecosystem of connectors, extensions, APIs, tools and integrations. &lt;b&gt;New in:&lt;/b&gt; &lt;/p&gt;
&lt;li&gt;&lt;a href=&quot;https://flink-packages.org/packages/flink-metrics-signalfx&quot;&gt; SignalFx Metrics Reporter&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flink-packages.org/packages/yauaa&quot;&gt;Yauaa: Yet Another UserAgent Analyzer&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr /&gt;
&lt;p&gt;If you’d like to keep a closer eye on what’s happening in the community, subscribe to the Flink &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;@community mailing list&lt;/a&gt; to get fine-grained weekly updates, upcoming event announcements and more.&lt;/p&gt;
</description>
<pubDate>Mon, 27 Jul 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/07/27/community-update.html</link>
<guid isPermaLink="true">/news/2020/07/27/community-update.html</guid>
</item>
<item>
<title>Sharing is caring - Catalogs in Flink SQL</title>
<description>&lt;p&gt;With an ever-growing number of people working with data, it’s a common practice for companies to build self-service platforms with the goal of democratizing data access across different teams and — especially — to enable users from any background to be independent in their data needs. In such environments, metadata management becomes a crucial aspect. Without it, users often work blindly, spending too much time searching for datasets and their location, figuring out data formats, and similar cumbersome tasks.&lt;/p&gt;
&lt;p&gt;In this blog post, we want to give you a high level overview of catalogs in Flink. We’ll describe why you should consider using them and what you can achieve with one in place. To round it up, we’ll also showcase how simple it is to combine catalogs and Flink, in the form of an end-to-end example that you can try out yourself.&lt;/p&gt;
&lt;h2 id=&quot;why-do-i-need-a-catalog&quot;&gt;Why do I need a catalog?&lt;/h2&gt;
&lt;p&gt;Frequently, companies start building a data platform with a metastore, catalog, or schema registry of some sort already in place. Those let you clearly separate making the data available from consuming it. That separation has a few benefits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Improved productivity&lt;/strong&gt; - The most obvious one. Making data reusable and shifting the focus to building new models/pipelines rather than data cleansing and discovery.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Security&lt;/strong&gt; - You can control the access to certain features of the data. For example, you can make the schema of the dataset publicly available, but limit the actual access to the underlying data only to particular teams.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compliance&lt;/strong&gt; - If you have all the metadata in a central entity, it’s much easier to ensure compliance with GDPR and similar regulations and legal requirements.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;what-is-stored-in-a-catalog&quot;&gt;What is stored in a catalog?&lt;/h2&gt;
&lt;p&gt;Almost all data sets can be described by certain properties that must be known in order to consume them. Those include the following (a sample table definition illustrating them follows this list):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Schema&lt;/strong&gt; - It describes the actual contents of the data: what columns it has, what constraints (e.g. keys) updates should be performed on, which fields can act as time attributes, what the rules for watermark generation are, and so on.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Location&lt;/strong&gt; - Does the data come from Kafka or a file in a filesystem? How do you connect to the external system? Which topic or file name do you use?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Format&lt;/strong&gt; - Is the data serialized as JSON, CSV, or maybe Avro records?&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Statistics&lt;/strong&gt; - You can also store additional information that can be useful when creating an execution plan of your query. For example, you can choose the best join algorithm, based on the number of rows in joined datasets.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
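&lt;p&gt;To make these properties more tangible, here is a minimal sketch of a table definition with the metadata annotated. The table name, columns, and connector options below are made up purely for illustration:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- Schema: columns, constraints, time attributes and watermark rules
CREATE TABLE orders (
  order_id   BIGINT,
  amount     DOUBLE,
  order_time TIMESTAMP(3),
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
  -- Location: which external system the data lives in and how to reach it
  'connector' = 'kafka',
  'topic' = 'orders',
  'properties.bootstrap.servers' = 'kafka:9092',
  -- Format: how the records are serialized
  'format' = 'json'
);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A catalog that stores such a definition spares every consumer from re-declaring the schema, connection, and format for each new query.&lt;/p&gt;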
&lt;p&gt;Catalogs don’t have to be limited to the metadata of datasets. You can usually store other objects that can be reused in different scenarios, such as the ones below (see the short sketch after this list):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Functions&lt;/strong&gt; - It’s very common to have domain-specific functions that can be helpful in different use cases. Instead of having to create them in each place separately, you can just create them once and share them with others.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Queries&lt;/strong&gt; - Those can be useful when you don’t want to persist a data set, but want to provide a recipe for creating it from other sources instead.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
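&lt;p&gt;As a rough sketch of what this can look like in Flink SQL (the class name, table, and view names below are placeholders, not taken from any of the demos in this post), a function and a query-backed view can be registered once and then reused by everyone who connects to the catalog:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;-- a function backed by a Java class, shared with other users via the catalog
CREATE FUNCTION normalize_currency AS 'com.example.udf.NormalizeCurrency' LANGUAGE JAVA;

-- a query stored as a view: a reusable recipe rather than a persisted data set
CREATE VIEW large_orders AS
SELECT order_id, amount, order_time
FROM orders
WHERE amount &amp;gt; 1000;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Whether such objects are actually persisted and shared depends on the catalog implementation, as discussed in the next section.&lt;/p&gt;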
&lt;h2 id=&quot;catalogs-support-in-flink-sql&quot;&gt;Catalogs support in Flink SQL&lt;/h2&gt;
&lt;p&gt;Starting from version 1.9, Flink has a set of Catalog APIs that allow integrating Flink with various catalog implementations. With the help of those APIs, you can query tables in Flink that were created in your external catalogs (e.g. Hive Metastore). Additionally, depending on the catalog implementation, you can create new objects such as tables or views from Flink, reuse them across different jobs, and possibly even use them in other tools compatible with that catalog. In other words, you can see catalogs as having a two-fold purpose:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Provide an out-of-the-box integration with ecosystems such as RDBMSs or Hive that allows you to query external objects like tables, views, or functions with no additional connector configuration. The connector properties are automatically derived from the catalog itself.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Act as a persistent store for Flink-specific metadata. In this mode, we additionally store connector properties alongside the logical metadata (e.g. schema, object name). That approach enables you to, for example, store in Hive a full definition of a Kafka-backed table with records serialized with Avro, which can later be used by Flink. However, as it incorporates Flink-specific properties, it cannot be used by other tools that leverage Hive Metastore.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As of Flink 1.11, there are two catalog implementations supported by the community:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;A comprehensive Hive catalog&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A Postgres catalog (preview, read-only, for now)&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
Flink does not store data at rest; it is a compute engine and requires other systems to consume input from and write its output to. This means that Flink does not own the lifecycle of the data. Integration with Catalogs does not change that. Flink uses catalogs for metadata management only.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;All you need to do to start querying your tables defined in either of these metastores is to create the corresponding catalogs with connection parameters. Once this is done, you can use them the way you would in any relational database management system.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;c1&quot;&gt;-- create a catalog which gives access to the backing Postgres installation&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CATALOG&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgres&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;type&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;jdbc&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;property-version&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;1&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;base-url&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;jdbc:postgresql://postgres:5432/&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;default-database&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;postgres&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;username&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;postgres&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;password&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;example&amp;#39;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- create a catalog which gives access to the backing Hive installation&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CATALOG&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;type&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;hive&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;property-version&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;1&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;hive-version&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;2.3.6&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;hive-conf-dir&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;/opt/hive-conf&amp;#39;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After creating the catalogs, you can confirm that they are available to Flink and also list the databases or tables in each of these catalogs:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;show&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;catalogs&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;default_catalog&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;postgres&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- switch the default catalog to Hive&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;catalog&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;show&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;databases&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- hive&amp;#39;s default database&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;show&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tables&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;dev_orders&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;use&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;catalog&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgres&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;show&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tables&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prod_customer&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prod_nation&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prod_rates&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prod_region&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;region_stats&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- describe the schema of a table in Postgres, the Postgres types are automatically mapped to&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- Flink&amp;#39;s type system&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;describe&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prod_customer&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;root&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;-- c_custkey: INT NOT NULL&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;-- c_name: VARCHAR(25) NOT NULL&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;-- c_address: VARCHAR(40) NOT NULL&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;-- c_nationkey: INT NOT NULL&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;-- c_phone: CHAR(15) NOT NULL&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;-- c_acctbal: DOUBLE NOT NULL&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;-- c_mktsegment: CHAR(10) NOT NULL&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;c1&quot;&gt;-- c_comment: VARCHAR(117) NOT NULL&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now that you know which tables are available, you can write your first query.
In this scenario, we keep customer orders in Hive (&lt;code&gt;dev_orders&lt;/code&gt;) because of their volume, and reference customer data in Postgres (&lt;code&gt;prod_customer&lt;/code&gt;) to be able to easily update it. Let’s write a query that shows customers and their orders by region and order priority for a specific day.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;USE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CATALOG&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgres&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;r_name&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;region&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;o_orderpriority&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;priority&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;DISTINCT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c_custkey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;number_of_customers&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;o_orderkey&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;number_of_orders&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dev_orders&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- we need to fully qualify the table in hive because we set the&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- current catalog to Postgres&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prod_customer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o_custkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c_custkey&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prod_nation&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c_nationkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n_nationkey&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prod_region&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;n_regionkey&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r_regionkey&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;FLOOR&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;o_ordertime&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TO&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TIMESTAMP&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;2020-04-01 0:00:00.000&amp;#39;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o_orderpriority&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;4-NOT SPECIFIED&amp;#39;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o_orderpriority&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ORDER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;r_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o_orderpriority&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Flink’s catalog support also covers storing Flink-specific objects in external catalogs that might not be fully usable by the corresponding external tools. The most notable use case for this is storing a table that describes a Kafka topic in a Hive catalog. Take the following DDL statement, which contains a watermark declaration as well as a set of connector properties that Hive does not recognize. You won’t be able to query the table with Hive, but it will be persisted and can be reused by different Flink jobs.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;USE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CATALOG&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prod_lineitem&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_orderkey&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTEGER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_partkey&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTEGER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_suppkey&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTEGER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_linenumber&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTEGER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_quantity&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DOUBLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_extendedprice&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DOUBLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_discount&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DOUBLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_tax&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DOUBLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_currency&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_returnflag&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_linestatus&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_ordertime&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TIMESTAMP&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_shipinstruct&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_shipmode&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_comment&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_proctime&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PROCTIME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;WATERMARK&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l_ordertime&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l_ordertime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;5&amp;#39;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SECONDS&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;kafka&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;topic&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;lineitem&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;scan.startup.mode&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;earliest-offset&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;properties.bootstrap.servers&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;kafka:9092&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;properties.group.id&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;testGroup&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;format&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;csv&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;csv.field-delimiter&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;|&amp;#39;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With &lt;code&gt;prod_lineitem&lt;/code&gt; stored in Hive, you can now write a query that will enrich the incoming stream with static data kept in Postgres. To illustrate how this works, let’s calculate the item prices based on the current currency rates:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;n&quot;&gt;USE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;CATALOG&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;postgres&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_proctime&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;querytime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_orderkey&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;order&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_linenumber&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;linenumber&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_currency&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;currency&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rs_rate&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cur_rate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;l_extendedprice&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l_discount&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l_tax&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rs_rate&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;open_in_euro&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;hive&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;default&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;`&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prod_lineitem&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prod_rates&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SYSTEM_TIME&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OF&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l_proctime&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rs_symbol&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;l_currency&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;l_linestatus&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;O&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The query above uses a &lt;code&gt;FOR SYSTEM_TIME AS OF&lt;/code&gt; &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/streaming/temporal_tables.html#temporal-table&quot;&gt;clause&lt;/a&gt; for executing a temporal join. If you’d like to learn more about the different kinds of joins you can do in Flink, I highly encourage you to check &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/queries.html#joins&quot;&gt;this documentation page&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Catalogs can be extremely powerful when building data platforms aimed at reusing the work of different teams in an organization. Centralizing the metadata is a common practice for improving productivity, security, and compliance when working with data.&lt;/p&gt;
&lt;p&gt;Flink provides flexible metadata management capabilities that aim at reducing the cumbersome, repetitive work needed before querying the data, such as defining schemas and connection properties. As of version 1.11, Flink provides a native, comprehensive integration with Hive Metastore and a read-only version for Postgres catalogs.&lt;/p&gt;
&lt;p&gt;You can get started with Flink and catalogs by reading &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/catalogs.html&quot;&gt;the docs&lt;/a&gt;. If you want to play around with Flink SQL (e.g. try out how catalogs work in Flink yourself), you can check &lt;a href=&quot;https://github.com/fhueske/flink-sql-demo&quot;&gt;this demo&lt;/a&gt; prepared by our colleagues Fabian and Timo — it runs in a dockerized environment, and we used it for the examples in this blog post.&lt;/p&gt;
</description>
<pubDate>Thu, 23 Jul 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/2020/07/23/catalogs.html</link>
<guid isPermaLink="true">/2020/07/23/catalogs.html</guid>
</item>
<item>
<title>Apache Flink 1.11.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.11 series.&lt;/p&gt;
&lt;p&gt;This release includes 44 fixes and minor improvements for Flink 1.11.0. Below you will find a detailed list of all fixes and improvements.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.11.1.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.11.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.11.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.11.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15794&quot;&gt;FLINK-15794&lt;/a&gt;] - Rethink default value of kubernetes.container.image
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18324&quot;&gt;FLINK-18324&lt;/a&gt;] - Translate updated data type and function page into Chinese
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18387&quot;&gt;FLINK-18387&lt;/a&gt;] - Translate &amp;quot;BlackHole SQL Connector&amp;quot; page into Chinese
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18388&quot;&gt;FLINK-18388&lt;/a&gt;] - Translate &amp;quot;CSV Format&amp;quot; page into Chinese
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18391&quot;&gt;FLINK-18391&lt;/a&gt;] - Translate &amp;quot;Avro Format&amp;quot; page into Chinese
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18395&quot;&gt;FLINK-18395&lt;/a&gt;] - Translate &amp;quot;ORC Format&amp;quot; page into Chinese
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18469&quot;&gt;FLINK-18469&lt;/a&gt;] - Add Application Mode to release notes.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18524&quot;&gt;FLINK-18524&lt;/a&gt;] - Scala varargs cause exception for new inference
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15414&quot;&gt;FLINK-15414&lt;/a&gt;] - KafkaITCase#prepare failed in travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16181&quot;&gt;FLINK-16181&lt;/a&gt;] - IfCallGen will throw NPE for primitive types in blink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16572&quot;&gt;FLINK-16572&lt;/a&gt;] - CheckPubSubEmulatorTest is flaky on Azure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17543&quot;&gt;FLINK-17543&lt;/a&gt;] - Rerunning failed azure jobs fails when uploading logs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17636&quot;&gt;FLINK-17636&lt;/a&gt;] - SingleInputGateTest.testConcurrentReadStateAndProcessAndClose: Trying to read from released RecoveredInputChannel
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18097&quot;&gt;FLINK-18097&lt;/a&gt;] - History server doesn&amp;#39;t clean all job json files
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18419&quot;&gt;FLINK-18419&lt;/a&gt;] - Can not create a catalog from user jar
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18434&quot;&gt;FLINK-18434&lt;/a&gt;] - Can not select fields with JdbcCatalog
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18440&quot;&gt;FLINK-18440&lt;/a&gt;] - ROW_NUMBER function: ROW/RANGE not allowed with RANK, DENSE_RANK or ROW_NUMBER functions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18461&quot;&gt;FLINK-18461&lt;/a&gt;] - Changelog source can&amp;#39;t be insert into upsert sink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18470&quot;&gt;FLINK-18470&lt;/a&gt;] - Tests RocksKeyGroupsRocksSingleStateIteratorTest#testMergeIteratorByte &amp;amp; RocksKeyGroupsRocksSingleStateIteratorTest#testMergeIteratorShort fail locally
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18471&quot;&gt;FLINK-18471&lt;/a&gt;] - flink-runtime lists &amp;quot;org.uncommons.maths:uncommons-maths:1.2.2a&amp;quot; as a bundled dependency, but it isn&amp;#39;t
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18477&quot;&gt;FLINK-18477&lt;/a&gt;] - ChangelogSocketExample does not work
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18478&quot;&gt;FLINK-18478&lt;/a&gt;] - AvroDeserializationSchema does not work with types generated by avrohugger
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18485&quot;&gt;FLINK-18485&lt;/a&gt;] - Kerberized YARN per-job on Docker test failed during unzip jce_policy-8.zip
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18519&quot;&gt;FLINK-18519&lt;/a&gt;] - Propagate exception to client when execution fails for REST submission
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18520&quot;&gt;FLINK-18520&lt;/a&gt;] - New Table Function type inference fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18529&quot;&gt;FLINK-18529&lt;/a&gt;] - Query Hive table and filter by timestamp partition can fail
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18539&quot;&gt;FLINK-18539&lt;/a&gt;] - StreamExecutionEnvironment#addSource(SourceFunction, TypeInformation) doesn&amp;#39;t use the user defined type information
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18573&quot;&gt;FLINK-18573&lt;/a&gt;] - InfluxDB reporter cannot be loaded as plugin
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18583&quot;&gt;FLINK-18583&lt;/a&gt;] - The _id field is incorrectly set to index in Elasticsearch6 DynamicTableSink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18585&quot;&gt;FLINK-18585&lt;/a&gt;] - Dynamic index can not work in new DynamicTableSink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18591&quot;&gt;FLINK-18591&lt;/a&gt;] - Fix the format issue for metrics web page
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18186&quot;&gt;FLINK-18186&lt;/a&gt;] - Various updates on Kubernetes standalone document
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18422&quot;&gt;FLINK-18422&lt;/a&gt;] - Update Prefer tag in documentation &amp;#39;Fault Tolerance training lesson&amp;#39;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18457&quot;&gt;FLINK-18457&lt;/a&gt;] - Fix invalid links in &amp;quot;Detecting Patterns&amp;quot; page of &amp;quot;Streaming Concepts&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18472&quot;&gt;FLINK-18472&lt;/a&gt;] - Local Installation Getting Started Guide
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18484&quot;&gt;FLINK-18484&lt;/a&gt;] - RowSerializer arity error does not provide specific information about the mismatch
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18501&quot;&gt;FLINK-18501&lt;/a&gt;] - Mapping of Pluggable Filesystems to scheme is not properly logged
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18526&quot;&gt;FLINK-18526&lt;/a&gt;] - Add the configuration of Python UDF using Managed Memory in the doc of Pyflink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18532&quot;&gt;FLINK-18532&lt;/a&gt;] - Remove Beta tag from MATCH_RECOGNIZE docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18561&quot;&gt;FLINK-18561&lt;/a&gt;] - Build manylinux1 with better compatibility instead of manylinux2014 Python Wheel Packages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18593&quot;&gt;FLINK-18593&lt;/a&gt;] - Hive bundle jar URLs are broken
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Test
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18534&quot;&gt;FLINK-18534&lt;/a&gt;] - KafkaTableITCase.testKafkaDebeziumChangelogSource failed with &amp;quot;Topic &amp;#39;changelog_topic&amp;#39; already exists&amp;quot;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18502&quot;&gt;FLINK-18502&lt;/a&gt;] - Add the page &amp;#39;legacySourceSinks.zh.md&amp;#39; into the directory &amp;#39;docs/dev/table&amp;#39;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18505&quot;&gt;FLINK-18505&lt;/a&gt;] - Correct the content of &amp;#39;sourceSinks.zh.md&amp;#39;
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Tue, 21 Jul 2020 20:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/07/21/release-1.11.1.html</link>
<guid isPermaLink="true">/news/2020/07/21/release-1.11.1.html</guid>
</item>
<item>
<title>Application Deployment in Flink: Current State and the new Application Mode</title>
<description>&lt;p&gt;With the rise of stream processing and real-time analytics as a critical tool for modern
businesses, an increasing number of organizations build platforms with Apache Flink at their
core and offer it internally as a service. Many talks with related topics from companies
like &lt;a href=&quot;https://www.youtube.com/watch?v=VX3S9POGAdU&quot;&gt;Uber&lt;/a&gt;, &lt;a href=&quot;https://www.youtube.com/watch?v=VX3S9POGAdU&quot;&gt;Netflix&lt;/a&gt;
and &lt;a href=&quot;https://www.youtube.com/watch?v=cH9UdK0yYjc&quot;&gt;Alibaba&lt;/a&gt; in the latest editions of Flink Forward further
illustrate this trend.&lt;/p&gt;
&lt;p&gt;These platforms aim at simplifying application submission internally by lifting all the
operational burden from the end user. To submit Flink applications, these platforms
usually expose only a centralized or low-parallelism endpoint (&lt;em&gt;e.g.&lt;/em&gt; a Web frontend)
for application submission that we will call the &lt;em&gt;Deployer&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;One of the roadblocks that platform developers and maintainers often mention is that the
Deployer can be a heavy resource consumer that is difficult to provision for. Provisioning
for average load can lead to the Deployer service being overwhelmed with deployment
requests (in the worst case, for all production applications in a short period of time),
while planning based on top load leads to unnecessary costs. Building on this observation,
Flink 1.11 introduces the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/#application-mode&quot;&gt;Application Mode&lt;/a&gt;
as a deployment option, which allows for a lightweight, more scalable application
submission process that manages to spread the application deployment load more evenly
across the nodes in the cluster.&lt;/p&gt;
&lt;p&gt;In order to understand the problem and how the Application Mode solves it, we start by
briefly describing the current status of application execution in Flink, before
outlining the architectural changes introduced by the new deployment mode and how to
leverage them.&lt;/p&gt;
&lt;h1 id=&quot;application-execution-in-flink&quot;&gt;Application Execution in Flink&lt;/h1&gt;
&lt;p&gt;The execution of an application in Flink mainly involves three entities: the &lt;em&gt;Client&lt;/em&gt;,
the &lt;em&gt;JobManager&lt;/em&gt; and the &lt;em&gt;TaskManagers&lt;/em&gt;. The Client is responsible for submitting the application to the
cluster, the JobManager is responsible for the necessary bookkeeping during execution,
and the TaskManagers are the ones doing the actual computation. For more details please
refer to &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/concepts/flink-architecture.html&quot;&gt;Flink’s Architecture&lt;/a&gt;
documentation page.&lt;/p&gt;
&lt;h2 id=&quot;current-deployment-modes&quot;&gt;Current Deployment Modes&lt;/h2&gt;
&lt;p&gt;Before the introduction of the Application Mode in version 1.11, Flink allowed users to execute an application either on a
&lt;em&gt;Session&lt;/em&gt; or a &lt;em&gt;Per-Job Cluster&lt;/em&gt;. The differences between the two have to do with the cluster
lifecycle and the resource isolation guarantees they provide.&lt;/p&gt;
&lt;h3 id=&quot;session-mode&quot;&gt;Session Mode&lt;/h3&gt;
&lt;p&gt;Session Mode assumes an already running cluster and uses the resources of that cluster to
execute any submitted application. Applications executed in the same (session) cluster use,
and consequently compete for, the same resources. This has the advantage that you do not
pay the resource overhead of spinning up a full cluster for every submitted job. But, if
one of the jobs misbehaves or brings down a TaskManager, then all jobs running on that
TaskManager will be affected by the failure. Apart from a negative impact on the job that
caused the failure, this implies a potential massive recovery process with all the
restarting jobs accessing the file system concurrently and making it unavailable to other
services. Additionally, having a single cluster running multiple jobs implies more load
for the JobManager, which is responsible for the bookkeeping of all the jobs in the
cluster. This mode is ideal for short jobs where startup latency is of high importance,
&lt;em&gt;e.g.&lt;/em&gt; interactive queries.&lt;/p&gt;
&lt;h3 id=&quot;per-job-mode&quot;&gt;Per-Job Mode&lt;/h3&gt;
&lt;p&gt;In Per-Job Mode, the available cluster manager framework (&lt;em&gt;e.g.&lt;/em&gt; YARN or Kubernetes) is
used to spin up a Flink cluster for each submitted job, which is available to that job
only. When the job finishes, the cluster is shut down and any lingering resources
(&lt;em&gt;e.g.&lt;/em&gt; files) are cleaned up. This mode allows for better resource isolation, as a
misbehaving job cannot affect any other job. In addition, it spreads the load of
bookkeeping across multiple entities, as each application has its own JobManager.
Given the aforementioned resource isolation concerns of the Session Mode, users often
opt for the Per-Job Mode for long-running jobs, where they are willing to accept some
increase in startup latency in favor of resilience.&lt;/p&gt;
&lt;p&gt;To summarize, in Session Mode, the cluster lifecycle is independent of any job running on
the cluster and all jobs running on the cluster share its resources. The per-job mode
chooses to pay the price of spinning up a cluster for every submitted job, in order to
provide better resource isolation guarantees as the resources are not shared across jobs.
In this case, the lifecycle of the cluster is bound to that of the job.&lt;/p&gt;
&lt;h2 id=&quot;application-submission&quot;&gt;Application Submission&lt;/h2&gt;
&lt;p&gt;Flink application execution consists of two stages: &lt;em&gt;pre-flight&lt;/em&gt;, when the users’ &lt;code&gt;main()&lt;/code&gt;
method is called; and &lt;em&gt;runtime&lt;/em&gt;, which is triggered as soon as the user code calls &lt;code&gt;execute()&lt;/code&gt;.
The &lt;code&gt;main()&lt;/code&gt; method constructs the user program using one of Flink’s APIs
(DataStream API, Table API, DataSet API). When the &lt;code&gt;main()&lt;/code&gt; method calls &lt;code&gt;env.execute()&lt;/code&gt;,
the user-defined pipeline is translated into a form that Flink’s runtime can understand,
called the &lt;em&gt;job graph&lt;/em&gt;, and it is shipped to the cluster.&lt;/p&gt;
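&lt;p&gt;To make these two stages concrete, below is a minimal sketch of a Flink application in Java
(the class name, host and port are placeholders): everything before the call to
&lt;code&gt;execute()&lt;/code&gt; belongs to the &lt;em&gt;pre-flight&lt;/em&gt; phase, while &lt;code&gt;execute()&lt;/code&gt;
triggers the translation into a job graph and its submission.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MyApplication {

    public static void main(String[] args) throws Exception {
        // Pre-flight: the pipeline is assembled wherever main() runs.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.socketTextStream(&quot;localhost&quot;, 9999)  // placeholder source
           .print();

        // Runtime: the pipeline is translated into a job graph and submitted for execution.
        env.execute(&quot;my-application&quot;);
    }
}
&lt;/code&gt;&lt;/pre&gt;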
&lt;p&gt;Despite their differences, both session and per-job modes execute the application’s &lt;code&gt;main()&lt;/code&gt;
method, &lt;em&gt;i.e.&lt;/em&gt; the &lt;em&gt;pre-flight&lt;/em&gt; phase, on the client side.&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;This is usually not a problem for individual users who already have all the dependencies
of their jobs locally, and then submit their applications through a client running on
their machine. But in the case of submission through a remote entity like the Deployer,
this process includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;downloading the application’s dependencies locally,&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;executing the &lt;code&gt;main()&lt;/code&gt; method to extract the job graph,&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;shipping the job graph and its dependencies to the cluster for execution, and&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;potentially, waiting for the result.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This makes the Client a heavy resource consumer as it may need substantial network
bandwidth to download dependencies and ship binaries to the cluster, and CPU cycles to
execute the &lt;code&gt;main()&lt;/code&gt; method. This problem is even more pronounced as more users share
the same Client.&lt;/p&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-07-14-application-mode/session-per-job.png&quot; width=&quot;75%&quot; alt=&quot;Session and Per-Job Mode&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:150%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;The figure above illustrates the two deployment modes using 3 applications depicted in
&lt;span style=&quot;color:red&quot;&gt;red&lt;/span&gt;, &lt;span style=&quot;color:blue&quot;&gt;blue&lt;/span&gt; and &lt;span style=&quot;color:green&quot;&gt;green&lt;/span&gt;.
Each one has a parallelism of 3. The black rectangles represent
different processes: TaskManagers, JobManagers and the Deployer; and we assume a single
Deployer process in all scenarios. The colored triangles represent the load of the
submission process, while the colored rectangles represent the load of the TaskManager
and JobManager processes. As shown in the figure, the Deployer in both per-job and
session mode shares the same load. Their difference lies in the distribution of the
tasks and the JobManager load. In the Session Mode, there is a single JobManager for
all the jobs in the cluster while in the per-job mode, there is one for each job. In
addition, tasks in Session Mode are assigned randomly to TaskManagers while in Per-Job
Mode, each TaskManager can only have tasks of a single job.&lt;/p&gt;
&lt;h1 id=&quot;application-mode&quot;&gt;Application Mode&lt;/h1&gt;
&lt;p&gt;&lt;img style=&quot;float: right;margin-left:10px;margin-right: 15px;&quot; src=&quot;/img/blog/2020-07-14-application-mode/application.png&quot; width=&quot;320px&quot; alt=&quot;Application Mode&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The Application Mode builds on the above observations and tries to combine the resource
isolation of the per-job mode with a lightweight and scalable application submission
process. To achieve this, it creates a cluster &lt;em&gt;per submitted application&lt;/em&gt;, but this
time, the &lt;code&gt;main()&lt;/code&gt; method of the application is executed on the JobManager.&lt;/p&gt;
&lt;p&gt;Creating a cluster per application can be seen as creating a session cluster shared
only among the jobs of a particular application and torn down when the application
finishes. With this architecture, the Application Mode provides the same resource
isolation and load balancing guarantees as the Per-Job Mode, but at the granularity of
a whole application. This makes sense, as jobs belonging to the same application are
expected to be correlated and treated as a unit.&lt;/p&gt;
&lt;p&gt;Executing the &lt;code&gt;main()&lt;/code&gt; method on the JobManager saves not only the CPU cycles required
for extracting the job graph, but also the bandwidth required on the client for
downloading the dependencies locally and shipping the job graph and its dependencies
to the cluster. Furthermore, it spreads the network load more evenly, as there is one
JobManager per application. This is illustrated in the figure above, where we have the
same scenario as in the session and per-job deployment mode section, but this time
the client load has shifted to the JobManager of each application.&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
In the Application Mode, the main() method is executed on the cluster and not on the Client, as in the other modes.
This may have implications for your code as, for example, any paths you register in your
environment using registerCachedFile() must be accessible by the JobManager of
your application.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Compared to the Per-Job Mode, the Application Mode allows the submission of applications
consisting of multiple jobs. The order of job execution is not affected by the deployment
mode but by the call used to launch the job. Using the blocking &lt;code&gt;execute()&lt;/code&gt; method
establishes an order and will lead to the execution of the “next” job being postponed
until “this” job finishes. In contrast, the non-blocking &lt;code&gt;executeAsync()&lt;/code&gt; method will
immediately continue to submit the “next” job as soon as the current job is submitted.&lt;/p&gt;
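&lt;p&gt;As an illustration, here is a sketch in Java (with placeholder job logic) of an application
consisting of two jobs: the first is submitted with the blocking &lt;code&gt;execute()&lt;/code&gt; call,
the second with the non-blocking &lt;code&gt;executeAsync()&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MultiJobApplication {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Job 1: execute() blocks, so the second job is only submitted once this one finishes.
        env.fromElements(1, 2, 3).print();
        env.execute(&quot;first-job&quot;);

        // Job 2: executeAsync() returns as soon as the job has been submitted.
        env.fromElements(4, 5, 6).print();
        env.executeAsync(&quot;second-job&quot;);
    }
}
&lt;/code&gt;&lt;/pre&gt;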
&lt;h2 id=&quot;reducing-network-requirements&quot;&gt;Reducing Network Requirements&lt;/h2&gt;
&lt;p&gt;As described above, by executing the application’s &lt;code&gt;main()&lt;/code&gt; method on the JobManager,
the Application Mode manages to save a lot of the resources previously required during
job submission. But there is still room for improvement.&lt;/p&gt;
&lt;p&gt;Focusing on YARN, which already supports all the optimizations mentioned here&lt;sup id=&quot;fnref:2&quot;&gt;&lt;a href=&quot;#fn:2&quot; class=&quot;footnote&quot;&gt;2&lt;/a&gt;&lt;/sup&gt;, and
even with the Application Mode in place, the Client is still required to send the user
jar to the JobManager. In addition, &lt;em&gt;for each application&lt;/em&gt;, the Client has to ship to
the cluster the “flink-dist” directory which contains the binaries of the framework
itself, including the &lt;code&gt;flink-dist.jar&lt;/code&gt;, &lt;code&gt;lib/&lt;/code&gt; and &lt;code&gt;plugin/&lt;/code&gt; directories. These two can
account for a substantial amount of bandwidth on the client side. Furthermore, shipping
the same flink-dist binaries on every submission is a waste of both bandwidth and
storage space, which can be alleviated by simply allowing applications to share the
same binaries.&lt;/p&gt;
&lt;p&gt;In Flink 1.11, we introduce options that allow the user to:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Specify a remote path to a directory where YARN can find the Flink distribution binaries, and&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Specify a remote path where YARN can find the user jar.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;For 1., we leverage YARN’s distributed cache and allow applications to share these
binaries. So, if an application happens to find copies of Flink on the local storage
of its TaskManager due to a previous application that was executed on the same
TaskManager, it will not even have to download it internally.&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
Both optimizations are available to all deployment modes on YARN, and not only the Application Mode.&lt;/p&gt;
&lt;/div&gt;
&lt;h1 id=&quot;example-application-mode-on-yarn&quot;&gt;Example: Application Mode on Yarn&lt;/h1&gt;
&lt;p&gt;For a full description, please refer to the official Flink documentation and more
specifically to the page that refers to your cluster management framework, &lt;em&gt;e.g.&lt;/em&gt;
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/yarn_setup.html#run-an-application-in-application-mode&quot;&gt;YARN&lt;/a&gt;
or &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/native_kubernetes.html#flink-kubernetes-application&quot;&gt;Kubernetes&lt;/a&gt;.
Here we will give some examples around YARN, where all the above features are available.&lt;/p&gt;
&lt;p&gt;To launch an application in Application Mode, you can use:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;&lt;b&gt;./bin/flink run-application -t yarn-application&lt;/b&gt; ./MyApplication.jar&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this command, all configuration parameters, such as the path to a savepoint to
be used to bootstrap the application’s state or the required JobManager/TaskManager
memory sizes, can be specified by their configuration option, prefixed by &lt;code&gt;-D&lt;/code&gt;. For
a catalog of the available configuration options, please refer to Flink’s
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html&quot;&gt;configuration page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As an example, the command to specify the memory sizes of the JobManager and the
TaskManager would look like:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./bin/flink run-application -t yarn-application \
&lt;b&gt;-Djobmanager.memory.process.size=2048m&lt;/b&gt; \
&lt;b&gt;-Dtaskmanager.memory.process.size=4096m&lt;/b&gt; \
./MyApplication.jar
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As discussed earlier, the above will make sure that your application’s &lt;code&gt;main()&lt;/code&gt; method
will be executed on the JobManager.&lt;/p&gt;
&lt;p&gt;To further save the bandwidth of shipping the Flink distribution to the cluster, consider
pre-uploading the Flink distribution to a location accessible by YARN and using the
&lt;code&gt;yarn.provided.lib.dirs&lt;/code&gt; configuration option, as shown below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./bin/flink run-application -t yarn-application \
-Djobmanager.memory.process.size=2048m \
-Dtaskmanager.memory.process.size=4096m \
&lt;b&gt;-Dyarn.provided.lib.dirs=&quot;hdfs://myhdfs/remote-flink-dist-dir&quot;&lt;/b&gt; \
./MyApplication.jar
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Finally, in order to further save the bandwidth required to submit your application jar,
you can pre-upload it to HDFS, and specify the remote path that points to
&lt;code&gt;./MyApplication.jar&lt;/code&gt;, as shown below:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;./bin/flink run-application -t yarn-application \
-Djobmanager.memory.process.size=2048m \
-Dtaskmanager.memory.process.size=4096m \
-Dyarn.provided.lib.dirs=&quot;hdfs://myhdfs/remote-flink-dist-dir&quot; \
&lt;b&gt;hdfs://myhdfs/jars/MyApplication.jar&lt;/b&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will make the job submission extra lightweight as the needed Flink jars and the
application jar are going to be picked up from the specified remote locations rather
than be shipped to the cluster by the Client. The only thing the Client will ship to
the cluster is the configuration of your application which includes all the
aforementioned paths.&lt;/p&gt;
&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;We hope that this discussion helped you understand the differences between the various
deployment modes offered by Flink and will help you to make informed decisions about
which one is suitable in your own setup. Feel free to play around with them and report
any issues you may find. If you have any questions or requests, do not hesitate to post
them in the &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt;
and, hopefully, see you (virtually) at one of our conferences or meetups soon!&lt;/p&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;ol&gt;
&lt;li id=&quot;fn:1&quot;&gt;
&lt;p&gt;The only exceptions are the Web Submission and the Standalone per-job implementation. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn:2&quot;&gt;
&lt;p&gt;Support for Kubernetes will come soon. &lt;a href=&quot;#fnref:2&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
<pubDate>Tue, 14 Jul 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/07/14/application-mode.html</link>
<guid isPermaLink="true">/news/2020/07/14/application-mode.html</guid>
</item>
<item>
<title>Apache Flink 1.11.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is proud to announce the release of Flink 1.11.0! More than 200 contributors worked on over 1.3k issues to bring significant improvements to usability as well as new features to Flink users across the whole API stack. Some highlights that we’re particularly excited about are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The core engine is introducing &lt;strong&gt;unaligned checkpoints&lt;/strong&gt;, a major change to Flink’s fault tolerance mechanism that improves checkpointing performance under heavy backpressure.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;A &lt;strong&gt;new Source API&lt;/strong&gt; that simplifies the implementation of (custom) sources by unifying batch and streaming execution, as well as offloading internals such as event-time handling, watermark generation or idleness detection to Flink.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Flink SQL is introducing &lt;strong&gt;Support for Change Data Capture (CDC)&lt;/strong&gt; to easily consume and interpret database changelogs from tools like Debezium. The renewed &lt;strong&gt;FileSystem Connector&lt;/strong&gt; also expands the set of use cases and formats supported in the Table API/SQL, enabling scenarios like streaming data directly from Kafka to Hive.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Multiple performance optimizations to PyFlink, including support for &lt;strong&gt;vectorized User-defined Functions (Pandas UDFs)&lt;/strong&gt;. This improves interoperability with libraries like Pandas and NumPy, making Flink more powerful for data science and ML workloads.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Read on for all major new features and improvements, important changes to be aware of and what to expect moving forward!&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#unaligned-checkpoints-beta&quot; id=&quot;markdown-toc-unaligned-checkpoints-beta&quot;&gt;Unaligned Checkpoints (Beta)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#unified-watermark-generators&quot; id=&quot;markdown-toc-unified-watermark-generators&quot;&gt;Unified Watermark Generators&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#new-data-source-api-beta&quot; id=&quot;markdown-toc-new-data-source-api-beta&quot;&gt;New Data Source API (Beta)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#application-mode-deployments&quot; id=&quot;markdown-toc-application-mode-deployments&quot;&gt;Application Mode Deployments&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#other-improvements&quot; id=&quot;markdown-toc-other-improvements&quot;&gt;Other Improvements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#table-apisql-support-for-change-data-capture-cdc&quot; id=&quot;markdown-toc-table-apisql-support-for-change-data-capture-cdc&quot;&gt;Table API/SQL: Support for Change Data Capture (CDC)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#table-apisql-jdbc-catalog-interface-and-postgres-catalog&quot; id=&quot;markdown-toc-table-apisql-jdbc-catalog-interface-and-postgres-catalog&quot;&gt;Table API/SQL: JDBC Catalog Interface and Postgres Catalog&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#table-apisql-filesystem-connector-with-support-for-avro-orc-and-parquet&quot; id=&quot;markdown-toc-table-apisql-filesystem-connector-with-support-for-avro-orc-and-parquet&quot;&gt;Table API/SQL: FileSystem Connector with Support for Avro, ORC and Parquet&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#table-apisql-support-for-python-udfs&quot; id=&quot;markdown-toc-table-apisql-support-for-python-udfs&quot;&gt;Table API/SQL: Support for Python UDFs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#other-improvements-to-the-table-apisql&quot; id=&quot;markdown-toc-other-improvements-to-the-table-apisql&quot;&gt;Other Improvements to the Table API/SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#pyflink-support-for-pandas-udfs&quot; id=&quot;markdown-toc-pyflink-support-for-pandas-udfs&quot;&gt;PyFlink: Support for Pandas UDFs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#other-improvements-to-pyflink&quot; id=&quot;markdown-toc-other-improvements-to-pyflink&quot;&gt;Other Improvements to PyFlink&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#important-changes&quot; id=&quot;markdown-toc-important-changes&quot;&gt;Important Changes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#release-notes&quot; id=&quot;markdown-toc-release-notes&quot;&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;The binary distribution and source artifacts are now available on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt; of the Flink website, and the most recent distribution of PyFlink is available on &lt;a href=&quot;https://pypi.org/project/apache-flink/&quot;&gt;PyPI&lt;/a&gt;. Please review the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html&quot;&gt;release notes&lt;/a&gt; carefully, and check the complete &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12346364&amp;amp;styleName=Html&amp;amp;projectId=12315522&quot;&gt;release changelog&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/flink-docs-release-1.11/&quot;&gt;updated documentation&lt;/a&gt; for more details.&lt;/p&gt;
&lt;p&gt;We encourage you to download the release and share your feedback with the community through the &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink mailing lists&lt;/a&gt; or &lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;JIRA&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;
&lt;h3 id=&quot;unaligned-checkpoints-beta&quot;&gt;Unaligned Checkpoints (Beta)&lt;/h3&gt;
&lt;p&gt;Triggering a checkpoint in Flink will cause a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/internals/stream_checkpointing.html#barriers&quot;&gt;checkpoint barrier&lt;/a&gt; to flow from the sources of your topology all the way towards the sinks. For operators that receive more than one input stream, the barriers flowing through each channel need to be aligned before the operator can snapshot its state and forward the checkpoint barrier — typically, this alignment will take just a few milliseconds to complete, but it can become a bottleneck in backpressured pipelines as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Checkpoint barriers will flow much slower through backpressured channels, effectively blocking the remaining channels and their upstream operators during checkpointing;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Slow checkpoint barrier propagation leads to longer checkpointing times and can, worst case, result in little to no progress in the application.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To improve the performance of checkpointing under backpressure scenarios, the community is rolling out the first iteration of unaligned checkpoints (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints&quot;&gt;FLIP-76&lt;/a&gt;) with Flink 1.11. Compared to the original checkpointing mechanism (Fig. 1), this approach doesn’t wait for barrier alignment across input channels, instead allowing barriers to overtake in-flight records (i.e., data stored in buffers) and forwarding them downstream before the synchronous part of the checkpoint takes place (Fig. 2).&lt;/p&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;div class=&quot;row&quot;&gt;
&lt;div class=&quot;col-lg-6&quot;&gt;
&lt;div class=&quot;text-center&quot;&gt;
&lt;figure&gt;
&lt;img src=&quot;/img/blog/2020-07-06-release-1.11.0/image1.gif&quot; width=&quot;600px&quot; alt=&quot;Aligned Checkpoints&quot; /&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;figcaption&gt;&lt;i&gt;&lt;b&gt;Fig.1:&lt;/b&gt; Aligned Checkpoints&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;col-lg-6&quot;&gt;
&lt;div class=&quot;text-center&quot;&gt;
&lt;figure&gt;
&lt;img src=&quot;/img/blog/2020-07-06-release-1.11.0/image2.png&quot; width=&quot;600px&quot; alt=&quot;Unaligned Checkpoints&quot; /&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;figcaption&gt;&lt;i&gt;&lt;b&gt;Fig.2:&lt;/b&gt; Unaligned Checkpoints&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div style=&quot;line-height:150%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;Because in-flight records have to be persisted as part of the snapshot, unaligned checkpoints will lead to increased checkpoint sizes. On the upside, &lt;strong&gt;checkpointing times are heavily reduced&lt;/strong&gt;, so users will see more progress (even in unstable environments) as more up-to-date checkpoints will lighten the recovery process. You can learn more about the current limitations of unaligned checkpoints in the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/checkpoints.html#unaligned-checkpoints&quot;&gt;documentation&lt;/a&gt;, and track the improvement work planned for this feature in &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14551&quot;&gt;FLINK-14551&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As with any beta feature, we appreciate early feedback that you might want to share with the community after giving unaligned checkpoints a try!&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot;&gt;Info&lt;/span&gt; To enable this feature, you need to configure the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/streaming/api/environment/CheckpointConfig.html&quot;&gt;&lt;code&gt;enableUnalignedCheckpoints&lt;/code&gt;&lt;/a&gt; option in your &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/state/checkpointing.html#enabling-and-configuring-checkpointing&quot;&gt;checkpoint config&lt;/a&gt;. Please note that unaligned checkpoints can only be enabled if &lt;code&gt;checkpointingMode&lt;/code&gt; is set to &lt;code&gt;CheckpointingMode.EXACTLY_ONCE&lt;/code&gt;.&lt;/p&gt;
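&lt;p&gt;As a minimal sketch of what this looks like in code (the checkpoint interval and pipeline below are
placeholders), unaligned checkpoints are enabled through the checkpoint config of the execution
environment:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnalignedCheckpointsExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 10 seconds; unaligned checkpoints require exactly-once mode.
        env.enableCheckpointing(10000, CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().enableUnalignedCheckpoints();

        env.socketTextStream(&quot;localhost&quot;, 9999).print();  // placeholder pipeline
        env.execute(&quot;unaligned-checkpoints-example&quot;);
    }
}
&lt;/code&gt;&lt;/pre&gt;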
&lt;h3 id=&quot;unified-watermark-generators&quot;&gt;Unified Watermark Generators&lt;/h3&gt;
&lt;p&gt;So far, watermark generation (previously also called &lt;em&gt;assignment&lt;/em&gt;) relied on two different interfaces: &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/streaming/api/functions/AssignerWithPunctuatedWatermarks.html&quot;&gt;&lt;code&gt;AssignerWithPunctuatedWatermarks&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/streaming/api/functions/AssignerWithPeriodicWatermarks.html&quot;&gt;&lt;code&gt;AssignerWithPeriodicWatermarks&lt;/code&gt;&lt;/a&gt;, which were closely intertwined with timestamp extraction. This made it difficult to implement long-requested features like support for idleness detection, besides leading to code duplication and a maintenance burden. With &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-126%3A+Unify+%28and+separate%29+Watermark+Assigners&quot;&gt;FLIP-126&lt;/a&gt;, the legacy watermark assigners are unified into a single interface: the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/WatermarkGenerator.html&quot;&gt;&lt;code&gt;WatermarkGenerator&lt;/code&gt;&lt;/a&gt;, and detached from the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/TimestampAssigner.html&quot;&gt;&lt;code&gt;TimestampAssigner&lt;/code&gt;&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This gives users more control over watermark emission and simplifies the implementation of new connectors that need to support watermark assignment and timestamp extraction at the source (see &lt;em&gt;&lt;a href=&quot;#new-data-source-api-beta&quot;&gt;New Data Source API&lt;/a&gt;&lt;/em&gt;). Multiple &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11//dev/event_timestamps_watermarks.html#introduction-to-watermark-strategies&quot;&gt;strategies for watermarking&lt;/a&gt; are available out-of-the-box as convenience methods in Flink 1.11 (e.g. &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/WatermarkStrategy.html#forBoundedOutOfOrderness-java.time.Duration-&quot;&gt;&lt;code&gt;forBoundedOutOfOrderness&lt;/code&gt;&lt;/a&gt;, &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/WatermarkStrategy.html#forMonotonousTimestamps--&quot;&gt;&lt;code&gt;forMonotonousTimestamps&lt;/code&gt;&lt;/a&gt;), though you can also choose to customize your own.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Support for Watermark Idleness Detection&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/api/common/eventtime/WatermarkStrategy.html#withIdleness-java.time.Duration-&quot;&gt;&lt;code&gt;WatermarkStrategy.withIdleness()&lt;/code&gt;&lt;/a&gt; method allows you to mark a stream as idle if no events arrive within a configured time (i.e. a timeout duration), which in turn allows handling event time skew properly and preventing idle partitions from holding back the event time progress of the entire application. Users can already benefit from &lt;strong&gt;per-partition idleness detection&lt;/strong&gt; in the Kafka connector, which has been adapted to use the new interfaces (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17669&quot;&gt;FLINK-17669&lt;/a&gt;).&lt;/p&gt;
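&lt;p&gt;As a small sketch of the new interfaces (assuming a hypothetical &lt;code&gt;MyEvent&lt;/code&gt; type with a
millisecond timestamp field), a bounded-out-of-orderness strategy with idleness detection could be
configured as follows:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WatermarkStrategyExample {

    // Hypothetical event type, used only for this sketch.
    public static class MyEvent {
        public long timestamp;

        public MyEvent() {}

        public MyEvent(long timestamp) {
            this.timestamp = timestamp;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream&amp;lt;MyEvent&amp;gt; events = env.fromElements(new MyEvent(1L), new MyEvent(2L));

        // Allow events to be up to 20 seconds out of order and mark a partition
        // as idle if it produces no events for one minute.
        events.assignTimestampsAndWatermarks(
                WatermarkStrategy
                        .&amp;lt;MyEvent&amp;gt;forBoundedOutOfOrderness(Duration.ofSeconds(20))
                        .withIdleness(Duration.ofMinutes(1))
                        .withTimestampAssigner((event, recordTimestamp) -&amp;gt; event.timestamp))
              .print();

        env.execute(&quot;watermark-strategy-example&quot;);
    }
}
&lt;/code&gt;&lt;/pre&gt;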
&lt;p&gt;&lt;span class=&quot;label label-info&quot;&gt;Note&lt;/span&gt; &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-126%3A+Unify+%28and+separate%29+Watermark+Assigners&quot;&gt;FLIP-126&lt;/a&gt; introduces no breaking changes, but we recommend that users give preference to the new &lt;code&gt;WatermarkGenerator&lt;/code&gt; interface moving forward, in preparation for the deprecation of the legacy watermark assigners in future releases.&lt;/p&gt;
&lt;h3 id=&quot;new-data-source-api-beta&quot;&gt;New Data Source API (Beta)&lt;/h3&gt;
&lt;p&gt;Up to this point, writing a production-grade source connector for Flink was a non-trivial task that required users to be somewhat familiar with Flink internals and account for implementation details like event time assignment, watermark generation or idleness detection in their code. Flink 1.11 introduces a new Data Source API (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface&quot;&gt;FLIP-27&lt;/a&gt;) to overcome these limitations, as well as the need to rewrite separate code for batch and streaming execution.&lt;/p&gt;
&lt;center&gt;
&lt;figure&gt;
&lt;img src=&quot;/img/blog/2020-07-06-release-1.11.0/image3.png&quot; width=&quot;600px&quot; alt=&quot;Data Source API&quot; /&gt;
&lt;/figure&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:150%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;Separating the work of split discovery and the actual reading of the consumed data (i.e. the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/sources.html#data-source-concepts&quot;&gt;&lt;em&gt;splits&lt;/em&gt;&lt;/a&gt;) in different components — resp. the &lt;code&gt;SplitEnumerator&lt;/code&gt; and &lt;code&gt;SourceReader&lt;/code&gt; — allows mixing and matching different enumeration strategies and split readers.&lt;/p&gt;
&lt;p&gt;As an example, the existing Kafka connector has multiple strategies for partition discovery that are intermingled with the rest of the code. With the new interfaces in place, it would only need a single reader implementation and there could be several split enumerators for the different partition discovery strategies.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Batch and Streaming Unification&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Source connectors implemented using the Data Source API will be able to work both as a bounded (&lt;em&gt;batch&lt;/em&gt;) and unbounded (&lt;em&gt;streaming&lt;/em&gt;) source. The difference between both cases is minimal: for bounded input, the &lt;code&gt;SplitEnumerator&lt;/code&gt; will generate a fixed set of splits and each split is finite; for unbounded input, either the splits are not finite or the &lt;code&gt;SplitEnumerator&lt;/code&gt; keeps generating new splits.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Implicit Watermark and Event Time Handling&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;TimestampAssigner&lt;/code&gt; and &lt;code&gt;WatermarkGenerator&lt;/code&gt; run transparently as part of the &lt;code&gt;SourceReader&lt;/code&gt; component, so users also don’t have to implement any timestamp extraction or watermark generation code.&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot;&gt;Note&lt;/span&gt; The existing source connectors have not yet been reimplemented using the Data Source API — this is planned for upcoming releases. If you’re looking to implement a new source, please refer to the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/sources.html#data-sources&quot;&gt;Data Source documentation&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/sources.html#the-data-source-api&quot;&gt;the tips on source development&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;application-mode-deployments&quot;&gt;Application Mode Deployments&lt;/h3&gt;
&lt;p&gt;Prior to Flink 1.11, jobs in a Flink application could either be submitted to a long-running &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/flink-architecture.html#flink-session-cluster&quot;&gt;Flink Session Cluster&lt;/a&gt; (&lt;em&gt;session mode&lt;/em&gt;) or a dedicated &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/concepts/flink-architecture.html#flink-job-cluster&quot;&gt;Flink Job Cluster&lt;/a&gt; (&lt;em&gt;job mode&lt;/em&gt;). For both these modes, the &lt;code&gt;main()&lt;/code&gt; method of user programs runs on the &lt;em&gt;client&lt;/em&gt; side. This presents some challenges: on one hand, if the client is part of a large installation, it can easily become a bottleneck for &lt;code&gt;JobGraph&lt;/code&gt; generation; and on the other, it’s not a good fit for containerized environments like Docker or Kubernetes.&lt;/p&gt;
&lt;p&gt;From this release on, Flink gets an additional deployment mode: &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/deployment/#application-mode&quot;&gt;Application Mode&lt;/a&gt; (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-85+Flink+Application+Mode&quot;&gt;FLIP-85&lt;/a&gt;); where the &lt;code&gt;main()&lt;/code&gt; method runs on the cluster, rather than the client. The job submission becomes a one-step process: you package your application logic and dependencies into an executable job JAR and the cluster entrypoint (&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/client/deployment/application/ApplicationClusterEntryPoint.html&quot;&gt;&lt;code&gt;ApplicationClusterEntryPoint&lt;/code&gt;&lt;/a&gt;) is responsible for calling the &lt;code&gt;main()&lt;/code&gt; method to extract the &lt;code&gt;JobGraph&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In Flink 1.11, the community worked to already support &lt;em&gt;application mode&lt;/em&gt; in Kubernetes (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10934&quot;&gt;FLINK-10934&lt;/a&gt;).&lt;/p&gt;
&lt;h3 id=&quot;other-improvements&quot;&gt;Other Improvements&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Unified Memory Configuration for JobManagers (&lt;a href=&quot;https://jira.apache.org/jira/browse/FLINK-16614&quot;&gt;FLIP-116&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Following the work started in Flink 1.10 to improve memory management and configuration, this release introduces a new memory model that aligns the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/memory/mem_setup_jobmanager.html&quot;&gt;JobManagers’ configuration options&lt;/a&gt; and terminology with that introduced in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors&quot;&gt;FLIP-49&lt;/a&gt; for TaskManagers. This affects all deployment types: standalone, YARN, Mesos and the new active Kubernetes integration.&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;label label-danger&quot;&gt;Attention&lt;/span&gt; Reusing a previous Flink configuration without any adjustments can result in differently computed memory parameters for the JVM and, as a result, performance changes or even failures. Make sure to check the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_migration.html#migrate-job-manager-memory-configuration&quot;&gt;migration guide&lt;/a&gt; if you’re planning to update to the latest version.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Improvements to the Flink WebUI (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-75%3A+Flink+Web+UI+Improvement+Proposal&quot;&gt;FLIP-75&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In Flink 1.11, the community kicked off a series of improvements to the Flink WebUI. The first to roll out are better TaskManager and JobManager log display (&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427143&quot;&gt;FLIP-103&lt;/a&gt;), as well as a new thread dump utility (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14816&quot;&gt;FLINK-14816&lt;/a&gt;). Some additional work planned for upcoming releases includes better backpressure detection, more flexible and configurable exception display and support for displaying the history of subtask failure attempts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Docker Image Unification (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-111%3A+Docker+image+unification&quot;&gt;FLIP-111&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;With this release, all Docker-related resources have been consolidated into &lt;a href=&quot;https://github.com/apache/flink-docker&quot;&gt;apache/flink-docker&lt;/a&gt; and the entry point script has been extended to allow users to run the default Docker image in different modes without the need to create a custom image. The &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/ops/deployment/docker.html#customize-flink-image&quot;&gt;updated documentation&lt;/a&gt; describes in detail how to use and customize the official Flink Docker image for different environments and deployment modes.&lt;/p&gt;
&lt;hr /&gt;
&lt;h3 id=&quot;table-apisql-support-for-change-data-capture-cdc&quot;&gt;Table API/SQL: Support for Change Data Capture (CDC)&lt;/h3&gt;
&lt;p&gt;Change Data Capture (CDC) has become a popular pattern to capture committed changes from a database and propagate those changes to downstream consumers, for example to keep multiple datastores in sync and avoid common pitfalls such as &lt;a href=&quot;https://thorben-janssen.com/dual-writes/&quot;&gt;dual writes&lt;/a&gt;. Being able to easily ingest and interpret these changelogs into the Table API/SQL has been a highly demanded feature in the Flink community — and it’s now possible with Flink 1.11.&lt;/p&gt;
&lt;p&gt;To extend the scope of the Table API/SQL to use cases like CDC, Flink 1.11 introduces new table source and sink interfaces with &lt;strong&gt;changelog mode&lt;/strong&gt; (see &lt;em&gt;&lt;a href=&quot;#other-improvements-to-the-table-apisql&quot;&gt;New TableSource and TableSink Interfaces&lt;/a&gt;&lt;/em&gt;) and support for the &lt;a href=&quot;https://debezium.io/&quot;&gt;Debezium&lt;/a&gt; and &lt;a href=&quot;https://github.com/alibaba/canal&quot;&gt;Canal&lt;/a&gt; formats (&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=147427289&quot;&gt;FLIP-105&lt;/a&gt;). This means that dynamic table sources are no longer limited to append-only operations and can ingest these external changelogs (&lt;code&gt;INSERT&lt;/code&gt; events), interpret them into change operations (&lt;code&gt;INSERT&lt;/code&gt;, &lt;code&gt;UPDATE&lt;/code&gt;, &lt;code&gt;DELETE&lt;/code&gt; events) and emit them downstream with the change type.&lt;/p&gt;
&lt;center&gt;
&lt;figure&gt;
&lt;img src=&quot;/img/blog/2020-07-06-release-1.11.0/image4.png&quot; width=&quot;500px&quot; alt=&quot;CDC&quot; /&gt;
&lt;/figure&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:150%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;Users have to specify either &lt;code&gt;“format=debezium-json”&lt;/code&gt; or &lt;code&gt;“format=canal-json”&lt;/code&gt; in their &lt;code&gt;CREATE TABLE&lt;/code&gt; statement to consume changelogs using SQL DDL.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_table&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;...&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- e.g. &amp;#39;kafka&amp;#39;&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;format&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;debezium-json&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;debezium-json.schema-include&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;true&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- default: false (Debezium can be configured to include or exclude the message schema)&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;debezium-json.ignore-parse-errors&amp;#39;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;true&amp;#39;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- default: false&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Out of the box, Flink 1.11 only supports Kafka as a changelog source and only JSON-encoded changelogs, with Avro (Debezium) and Protobuf (Canal) planned for future releases. There are also plans to support MySQL binlogs and Kafka compacted topics as sources, as well as to extend changelog support to batch execution.&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;label label-danger&quot;&gt;Attention&lt;/span&gt; There is a known issue (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18461&quot;&gt;FLINK-18461&lt;/a&gt;) that prevents changelog sources from being used to write to upsert sinks (e.g. MySQL, HBase, Elasticsearch). This will be fixed in the next patch release (1.11.1).&lt;/p&gt;
&lt;h3 id=&quot;table-apisql-jdbc-catalog-interface-and-postgres-catalog&quot;&gt;Table API/SQL: JDBC Catalog Interface and Postgres Catalog&lt;/h3&gt;
&lt;p&gt;Flink 1.11 introduces a generic JDBC catalog interface (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog&quot;&gt;FLIP-93&lt;/a&gt;) that enables users of the Table API/SQL to &lt;strong&gt;derive table schemas automatically&lt;/strong&gt; from connections to relational databases over &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connect.html#jdbc-connector&quot;&gt;JDBC&lt;/a&gt;. This eliminates the previous need for manual schema definition and type conversion, and also makes it possible to check for schema errors at compile time instead of at runtime.&lt;/p&gt;
&lt;p&gt;The first implementation, rolling out with the new release, is the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/catalogs.html#postgrescatalog&quot;&gt;Postgres catalog&lt;/a&gt;.&lt;/p&gt;
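&lt;p&gt;As a rough, hedged sketch (assuming an existing &lt;code&gt;TableEnvironment&lt;/code&gt; &lt;code&gt;t_env&lt;/code&gt;), the snippet below queries a Postgres-backed table through a Postgres catalog that has already been registered, for example via the SQL Client configuration or the Java Catalog API. The catalog name &lt;code&gt;mypg&lt;/code&gt; and the table &lt;code&gt;orders&lt;/code&gt; are hypothetical:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# hedged sketch: query a Postgres-backed table through an already-registered
# JDBC/Postgres catalog; 'mypg' and 'orders' are hypothetical names
t_env.execute_sql('USE CATALOG mypg')

# the table schema is derived automatically from the Postgres metadata,
# so no CREATE TABLE statement is needed before querying
orders = t_env.sql_query('SELECT * FROM orders')&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;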
&lt;h3 id=&quot;table-apisql-filesystem-connector-with-support-for-avro-orc-and-parquet&quot;&gt;Table API/SQL: FileSystem Connector with Support for Avro, ORC and Parquet&lt;/h3&gt;
&lt;p&gt;To improve the user experience for end-to-end streaming ETL use cases, the Flink community worked on a new FileSystem Connector for the Table API/SQL (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-115%3A+Filesystem+connector+in+Table&quot;&gt;FLIP-115&lt;/a&gt;). The implementation is based on Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/filesystems/index.html&quot;&gt;FileSystem abstraction&lt;/a&gt; and reuses &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/streamfile_sink.html&quot;&gt;StreamingFileSink&lt;/a&gt; to ensure the same set of capabilities and consistent behaviour with the DataStream API.&lt;/p&gt;
&lt;p&gt;This also means that Table API/SQL users can now make use of all formats already supported by StreamingFileSink, like (Avro) Parquet, as well as the new formats introduced with this release, like Avro (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11395&quot;&gt;FLINK-11395&lt;/a&gt;) and Orc (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10114&quot;&gt;FLINK-10114&lt;/a&gt;).&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;my_table&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;column_name1&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;column_name2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;part_name1&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;part_name2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PARTITIONED&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;part_name1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;part_name2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;filesystem&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;path&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;file:///path/to/file&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;format&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;...&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;-- supported formats: Avro, ORC, Parquet, CSV, JSON&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The new all-rounder FileSystem Connector transparently handles batch and streaming execution, provides exactly-once guarantees and has full partition support, greatly expanding the scope of usage of the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connect.html#file-system-connector&quot;&gt;legacy connector&lt;/a&gt;. This allows users to easily implement common use cases like &lt;strong&gt;directly streaming data from Kafka to Hive&lt;/strong&gt;.&lt;/p&gt;
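&lt;p&gt;As a small, hedged sketch of such a pipeline (assuming an existing &lt;code&gt;TableEnvironment&lt;/code&gt; &lt;code&gt;t_env&lt;/code&gt;), the statement below continuously writes a Kafka-backed table into a table defined on the new FileSystem connector. The table names &lt;code&gt;kafka_orders&lt;/code&gt; and &lt;code&gt;fs_sink&lt;/code&gt; are hypothetical and assumed to be declared with DDL similar to the examples above:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# hedged sketch: continuously stream rows from a Kafka source table into a
# filesystem sink table; both table names are hypothetical
t_env.execute_sql(
    'INSERT INTO fs_sink SELECT user_id, order_amount, log_ts FROM kafka_orders'
)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;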
&lt;p&gt;You can track the upcoming improvements to the FileSystem Connector in &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17778&quot;&gt;FLINK-17778&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;table-apisql-support-for-python-udfs&quot;&gt;Table API/SQL: Support for Python UDFs&lt;/h3&gt;
&lt;p&gt;Prior to this release, users of the Table API/SQL were limited to defining UDFs in either Java or Scala. In Flink 1.11, the community worked on expanding the usage scope of the Python language beyond PyFlink and providing support for Python UDFs in the SQL DDL syntax (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-106%3A+Support+Python+UDF+in+SQL+Function+DDL&quot;&gt;FLIP-106&lt;/a&gt;), as well as the SQL Client (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-114%3A+Support+Python+UDF+in+SQL+Client&quot;&gt;FLIP-114&lt;/a&gt;). Users can also register Python UDFs in the system catalog via SQL DDL or the Java Catalog API, so that functions can be shared between jobs.&lt;/p&gt;
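&lt;p&gt;A minimal, hedged sketch of the DDL, assuming an existing &lt;code&gt;TableEnvironment&lt;/code&gt; &lt;code&gt;t_env&lt;/code&gt; and a hypothetical Python module &lt;code&gt;my_udfs&lt;/code&gt; exposing a function &lt;code&gt;my_upper&lt;/code&gt; on the Python path:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# hedged sketch: register a Python UDF via SQL DDL (FLIP-106) and use it in a query;
# 'my_udfs.my_upper' and 'my_table' are hypothetical names
t_env.execute_sql(
    "CREATE TEMPORARY FUNCTION my_upper AS 'my_udfs.my_upper' LANGUAGE PYTHON"
)
result = t_env.sql_query('SELECT my_upper(name) FROM my_table')&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;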
&lt;h3 id=&quot;other-improvements-to-the-table-apisql&quot;&gt;Other Improvements to the Table API/SQL&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;DDL and DML Compatibility for the Hive Connector (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-123%3A+DDL+and+DML+compatibility+for+Hive+connector&quot;&gt;FLIP-123&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Starting from Flink 1.11, users can write SQL statements directly using Hive syntax (HiveQL) in the Table API/SQL and the SQL Client. For this purpose, an additional dialect was introduced and users can now dynamically switch between Flink (&lt;code&gt;default&lt;/code&gt;) and Hive (&lt;code&gt;hive&lt;/code&gt;) on a per-statement basis. For a complete list of supported DDL and DML statements, check the Hive dialect &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/hive/hive_dialect.html#hive-dialect&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Extensions and Improvements to the Flink SQL Syntax&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Flink 1.11 introduces the concept of &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/create.html#create-table&quot;&gt;primary key constraints&lt;/a&gt; to leverage runtime optimizations in Flink SQL DDL (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP+87%3A+Primary+key+constraints+in+Table+API&quot;&gt;FLIP-87&lt;/a&gt;);&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;View objects are now fully supported in SQL DDL using the &lt;code&gt;CREATE&lt;/code&gt;/&lt;code&gt;ALTER&lt;/code&gt;/&lt;code&gt;DROP VIEW&lt;/code&gt; statements (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-71%3A+E2E+View+support+in+FLINK+SQL&quot;&gt;FLIP-71&lt;/a&gt;);&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Users can now specify or override table options in their DQL/DML statements using &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/hints.html#dynamic-table-options&quot;&gt;dynamic table options&lt;/a&gt; (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL&quot;&gt;FLIP-113&lt;/a&gt;). A short sketch combining dynamic table options with primary key constraints follows after this list.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To make connector properties less verbose and improve exception handling, some key properties have been refactored (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory&quot;&gt;FLIP-122&lt;/a&gt;). This change does not break compatibility, so users can still use the old property keys.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
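&lt;p&gt;As a hedged illustration of two of the extensions above (assuming an existing &lt;code&gt;TableEnvironment&lt;/code&gt; &lt;code&gt;t_env&lt;/code&gt;), the sketch below declares a non-enforced primary key in the DDL and overrides a connector option at query time with a dynamic table options hint. The connector options are placeholders, and the hint requires &lt;code&gt;table.dynamic-table-options.enabled&lt;/code&gt; to be set to &lt;code&gt;true&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# hedged sketch: primary key constraint (FLIP-87) plus a dynamic table
# options hint (FLIP-113); connector options are placeholders
t_env.execute_sql('''
    CREATE TABLE users (
        user_id BIGINT,
        name STRING,
        PRIMARY KEY (user_id) NOT ENFORCED
    ) WITH (
        'connector' = '...'
    )
''')

# override/add table options for this statement only
result = t_env.sql_query(
    "SELECT * FROM users /*+ OPTIONS('scan.startup.mode'='earliest-offset') */"
)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;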
&lt;p&gt;&lt;strong&gt;New TableSource and TableSink Interfaces (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces&quot;&gt;FLIP-95&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Flink 1.11 introduces new table source and sink interfaces (resp. &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/table/connector/source/DynamicTableSource.html&quot;&gt;&lt;code&gt;DynamicTableSource&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/table/connector/sink/DynamicTableSink.html&quot;&gt;&lt;code&gt;DynamicTableSink&lt;/code&gt;&lt;/a&gt;) that unify batch and streaming execution, provide more efficient data processing with the Blink planner and offer support for handling changelogs (see &lt;em&gt;&lt;a href=&quot;#table-apisql-support-for-change-data-capture-cdc&quot;&gt;Support for Change Data Capture (CDC)&lt;/a&gt;&lt;/em&gt;). The new interfaces also make it easier for users to &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sourceSinks.html#full-stack-example&quot;&gt;implement custom connectors&lt;/a&gt; or modify existing ones. For an end-to-end example on how to implement a custom scan table source with a decoding format that supports changelog semantics, check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sourceSinks.html#full-stack-example&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot;&gt;Note&lt;/span&gt; Although compatibility is not immediately affected, we recommend that Table API/SQL users update any sources and sinks to the new interface stack.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Refactored TableEnvironment Interface (&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878&quot;&gt;FLIP-84&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The semantics to describe similar behaviours in the &lt;code&gt;TableEnvironment&lt;/code&gt; and &lt;code&gt;Table&lt;/code&gt; interfaces have diverged over time, leading to an inconsistent and sometimes unclear user experience. To improve this and make programming more fluent in the Table API/SQL, Flink 1.11 introduces new methods that unify behaviours like execution triggering (e.g. &lt;code&gt;executeSql()&lt;/code&gt;) and result representation (e.g. &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/table/api/TableResult.html#print--&quot;&gt;&lt;code&gt;print()&lt;/code&gt;&lt;/a&gt;, &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/api/java/org/apache/flink/table/api/TableResult.html#collect--&quot;&gt;&lt;code&gt;collect()&lt;/code&gt;&lt;/a&gt;), and also lay the groundwork for important features like &lt;a href=&quot;https://lists.apache.org/thread.html/r076e63bf6c8ed42d1b9ed2b406029696274a3a90cc360bc3a03e65d2%40%3Cdev.flink.apache.org%3E&quot;&gt;multi-statement execution support&lt;/a&gt; in future releases.&lt;/p&gt;
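&lt;p&gt;A short, hedged sketch of the unified execution methods in PyFlink, assuming an existing &lt;code&gt;TableEnvironment&lt;/code&gt; &lt;code&gt;t_env&lt;/code&gt;, a hypothetical source table &lt;code&gt;my_source&lt;/code&gt;, and that the Python &lt;code&gt;TableResult&lt;/code&gt; mirrors its Java counterpart:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# hedged sketch of the unified execution methods introduced with FLIP-84;
# assumes the Python TableResult mirrors the Java API (e.g. print())
t_env.execute_sql("CREATE TABLE my_sink (c STRING) WITH ('connector' = 'print')")

# execute_sql() triggers execution directly and returns a TableResult
table_result = t_env.execute_sql('INSERT INTO my_sink SELECT name FROM my_source')
table_result.print()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;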
&lt;p&gt;&lt;span class=&quot;label label-info&quot;&gt;Note&lt;/span&gt; The methods deprecated with FLIP-84 will not be immediately removed, but we recommend that users adopt the newly introduced methods. For a complete list of new and deprecated methods, check the “Summary” section of &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878&quot;&gt;FLIP-84&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;New Type Inference for Table API UDFs (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-65%3A+New+type+inference+for+Table+API+UDFs&quot;&gt;FLIP-65&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In Flink 1.9, the community started working on a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/types.html#data-types&quot;&gt;new data type system&lt;/a&gt; for the Table API to improve its compliance with the SQL standard (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System&quot;&gt;FLIP-37&lt;/a&gt;). This work is now close to being completed in Flink 1.11, with the exposure of Table API UDFs to the new type system (scalar and table functions, with aggregate functions planned for the next release).&lt;/p&gt;
&lt;hr /&gt;
&lt;h3 id=&quot;pyflink-support-for-pandas-udfs&quot;&gt;PyFlink: Support for Pandas UDFs&lt;/h3&gt;
&lt;p&gt;Up to this release, Python UDFs in PyFlink only supported scalar values of standard Python types. This presented some limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;High serialization/deserialization overhead in the process of transferring data between the JVM and the Python processes;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Hard to integrate with common Python libraries for high-performance numerical processing like pandas and NumPy.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To overcome these limitations, the community introduced support for (scalar) &lt;strong&gt;vectorized Python UDFs&lt;/strong&gt; based on &lt;a href=&quot;https://pandas.pydata.org/pandas-docs/stable/getting_started/overview.html&quot;&gt;pandas&lt;/a&gt; in Flink 1.11 (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink&quot;&gt;FLIP-97&lt;/a&gt;). The performance of vectorized UDFs is usually much higher, as the serialization/deserialization overhead is minimized by relying on &lt;a href=&quot;https://arrow.apache.org/&quot;&gt;Apache Arrow&lt;/a&gt;, and handling &lt;code&gt;pandas.Series&lt;/code&gt; as input/output makes it possible to take full advantage of the pandas and NumPy libraries. This makes Pandas UDFs a popular solution to parallelize Machine Learning and other large-scale, distributed data science workloads (e.g. feature engineering, distributed model application).&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;nd&quot;&gt;@udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;pandas&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To mark a UDF as a Pandas UDF, you only need to add an extra parameter &lt;code&gt;udf_type=”pandas”&lt;/code&gt; in the udf decorator, as described in the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/dev/table/python/vectorized_python_udfs.html#vectorized-user-defined-functions&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
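&lt;p&gt;Once defined, the function can be registered and used like any other Python UDF (assuming an existing &lt;code&gt;TableEnvironment&lt;/code&gt; &lt;code&gt;t_env&lt;/code&gt;; the table &lt;code&gt;my_table&lt;/code&gt; and its &lt;code&gt;BIGINT&lt;/code&gt; columns are hypothetical):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# register the pandas UDF defined above and use it from SQL;
# 'my_table' with BIGINT columns 'a' and 'b' is a hypothetical table
t_env.register_function('add', add)
result = t_env.sql_query('SELECT add(a, b) FROM my_table')&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;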
&lt;h3 id=&quot;other-improvements-to-pyflink&quot;&gt;Other Improvements to PyFlink&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Conversion fromPandas/toPandas (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame&quot;&gt;FLIP-120&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Arrow is also supported as an optimization to convert between PyFlink tables and &lt;a href=&quot;https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html&quot;&gt;&lt;code&gt;pandas.DataFrames&lt;/code&gt;&lt;/a&gt;, enabling users to switch processing engines seamlessly without the need for an intermediate connector. For examples on how to use the new &lt;code&gt;fromPandas()&lt;/code&gt; and &lt;code&gt;toPandas()&lt;/code&gt; methods in PyFlink, check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/python/conversion_of_pandas.html#conversions-between-pyflink-table-and-pandas-dataframe&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
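&lt;p&gt;A minimal sketch of the round trip, assuming an existing &lt;code&gt;TableEnvironment&lt;/code&gt; &lt;code&gt;t_env&lt;/code&gt;; in the Python API the methods are exposed as &lt;code&gt;from_pandas()&lt;/code&gt; and &lt;code&gt;to_pandas()&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import pandas as pd
import numpy as np

# create a PyFlink Table from a pandas DataFrame ...
pdf = pd.DataFrame(np.random.rand(100, 2), columns=['a', 'b'])
table = t_env.from_pandas(pdf, ['a', 'b'])

# ... and convert a (bounded) Table back into a pandas DataFrame
result_pdf = table.to_pandas()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;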
&lt;p&gt;&lt;strong&gt;Support for User-defined Table Functions (UDTFs) (&lt;a href=&quot;https://jira.apache.org/jira/browse/FLINK-14500&quot;&gt;FLINK-14500&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;From Flink 1.11, you can define and register custom &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/python/python_udfs.html#table-functions&quot;&gt;UDTFs&lt;/a&gt; in PyFlink. Similar to a Python UDF, a UDTF takes zero, one or multiple scalar values as input, but can return an arbitrary number of rows as output instead of a single value.&lt;/p&gt;
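&lt;p&gt;A short, hedged sketch of a Python UDTF that splits a comma-separated string into one row per token, registered with the generic &lt;code&gt;register_function&lt;/code&gt; call (assuming an existing &lt;code&gt;TableEnvironment&lt;/code&gt; &lt;code&gt;t_env&lt;/code&gt;):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from pyflink.table import DataTypes
from pyflink.table.udf import udtf

# a table function returns an iterable: each yielded value becomes one output row
@udtf(input_types=DataTypes.STRING(), result_types=DataTypes.STRING())
def split(line):
    for word in line.split(','):
        yield word

t_env.register_function('split', split)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;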
&lt;p&gt;&lt;strong&gt;Cython Performance Optimization for UDFs (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-121%3A+Support+Cython+Optimizing+Python+User+Defined+Function&quot;&gt;FLIP-121&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://cython.readthedocs.io/en/latest/src/quickstart/cythonize.html&quot;&gt;Cython&lt;/a&gt; is a compiled superset of the Python language that is often used to improve the performance of large-scale numeric processing in Python, as it optimizes execution to machine code-level speed and pairs well with popular C-based libraries like NumPy. From Flink 1.11, you can build &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/flinkDev/building.html#build-pyflink&quot;&gt;PyFlink with Cython support&lt;/a&gt; and “Cythonize” your Python UDFs to substantially improve code execution speed (up to 30x faster, compared to Python UDFs in Flink 1.10).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;User-defined Metrics in Python UDFs (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-112%3A+Support+User-Defined+Metrics+in++Python+UDF&quot;&gt;FLIP-112&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;To make it easier for users to monitor and debug the execution of Python UDFs, PyFlink now allows gathering and exposing metrics to external systems, as well as defining user scopes and variables. You can access the metrics system from a UDF by calling &lt;code&gt;function_context.get_metric_group()&lt;/code&gt; in the open method, as described in the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/dev/table/python/metrics.html#registering-metrics&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
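&lt;p&gt;A hedged sketch of a scalar Python UDF that registers and increments a counter metric in its &lt;code&gt;open&lt;/code&gt; method (class and metric names are hypothetical):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from pyflink.table import DataTypes
from pyflink.table.udf import ScalarFunction, udf

class CountingUpper(ScalarFunction):
    def open(self, function_context):
        # register a counter in the UDF's metric group (FLIP-112)
        self.counter = function_context.get_metric_group().counter('invocations')

    def eval(self, s):
        self.counter.inc()
        return s.upper()

counting_upper = udf(CountingUpper(), DataTypes.STRING(), DataTypes.STRING())&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;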
&lt;hr /&gt;
&lt;h2 id=&quot;important-changes&quot;&gt;Important Changes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://jira.apache.org/jira/browse/FLINK-17339&quot;&gt;FLINK-17339&lt;/a&gt;] The Blink planner is the &lt;strong&gt;default&lt;/strong&gt; in the Table API/SQL starting from Flink 1.11. This was already the case for the SQL Client since Flink 1.10. The old Flink planner is still supported, but not actively developed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5763&quot;&gt;FLINK-5763&lt;/a&gt;] Savepoints now contain all their state inside a single directory (both metadata and program state). This makes it straightforward to figure out which files make up the state of a savepoint and allows users to &lt;strong&gt;relocate savepoints&lt;/strong&gt; by simply moving a directory.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16408&quot;&gt;FLINK-16408&lt;/a&gt;] To reduce pressure on the JVM metaspace, the user code class loader is being reused by a &lt;code&gt;TaskExecutor&lt;/code&gt; as long as there is at least a single slot allocated for the respective job. This changes Flink’s recovery behaviour slightly, so that it will not reload static fields.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11086&quot;&gt;FLINK-11086&lt;/a&gt;] Flink now supports Hadoop versions above &lt;strong&gt;Hadoop 3.0.0&lt;/strong&gt;. Note that the Flink project does not provide any updated “flink-shaded-hadoop-*” jars. Users need to provide Hadoop dependencies through the &lt;code&gt;HADOOP_CLASSPATH&lt;/code&gt; environment variable (recommended) or the lib/ folder.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16963&quot;&gt;FLINK-16963&lt;/a&gt;] All &lt;code&gt;MetricReporters&lt;/code&gt; that come with Flink have been converted to plugins. These should no longer be placed into &lt;code&gt;/lib&lt;/code&gt; (which may result in dependency conflicts), but &lt;code&gt;/plugins/&amp;lt;some_directory&amp;gt;&lt;/code&gt; instead.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12639&quot;&gt;FLINK-12639&lt;/a&gt;] The Flink &lt;strong&gt;documentation&lt;/strong&gt; is undergoing some &lt;strong&gt;rework&lt;/strong&gt;, so you might notice that the navigation and organization of content look slightly different starting from Flink 1.11.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;release-notes&quot;&gt;Release Notes&lt;/h2&gt;
&lt;p&gt;Please review the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/release-notes/flink-1.11.html&quot;&gt;release notes&lt;/a&gt; carefully for a detailed list of changes and new features if you plan to upgrade your setup to Flink 1.11. This version is API-compatible with previous 1.x releases for APIs annotated with the @Public annotation.&lt;/p&gt;
&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;
&lt;p&gt;The Apache Flink community would like to thank all the 200+ contributors that have made this release possible:&lt;/p&gt;
&lt;p&gt;Aitozi, Alexander Fedulov, Alexey Trenikhin, Aljoscha Krettek, Andrey Zagrebin, Arvid Heise, Ayush Saxena, Bairos, Bartosz Krasinski, Benchao Li, Benoit Hanotte, Benoît Paris, Bhagavan Das, Canbin Zheng, Cedric Chen, Chesnay Schepler, Colm O hEigeartaigh, Congxian Qiu, CrazyTomatoOo, Danish Amjad, Danny Chan, David Anderson, Dawid Wysakowicz, Dian Fu, Dominik Wosiński, Echo Lee, Ethan Marsh, Etienne Chauchot, Fabian Hueske, Fabian Paul, Flavio Pompermaier, Gao Yun, Gary Yao, Ghildiyal, Grebennikov Roman, GuoWei Ma, Guru Prasad, Gyula Fora, Hequn Cheng, Hu Guang, HuFeiHu, HuangXingBo, Igal Shilman, Ismael Juma, Jacob Sevart, Jark Wu, Jaskaran Bindra, Jason K, Jeff Yang, Jeff Zhang, Jerry Wang, Jiangjie (Becket) Qin, Jiayi, Jiayi Liao, Jiayi-Liao, Jincheng Sun, Jing Zhang, Jingsong Lee, JingsongLi, Jun Qin, JunZhang, Jörn Kottmann, Kevin Bohinski, Konstantin Knauf, Kostas Kloudas, Kurt Young, Leonard Xu, Lining Jing, Liupengcheng, LululuAlu, Marta Paes Moreira, Matt Welke, Max Kuklinski, Maximilian Michels, Nico Kruber, Niels Basjes, Oleksandr Nitavskyi, Paul Lam, Paul Lin, PengFei Li, PengchengLiu, Piotr Nowojski, Prem Santosh, Qingsheng Ren, Rafi Aroch, Raymond Farrelly, Richard Deurwaarder, Robert Metzger, RocMarshal, Roey Shem Tov, Roman, Roman Khachatryan, Rong Rong, RoyRuan, Rui Li, Seth Wiesman, Shaobin.Ou, Shengkai, Shuiqiang Chen, Shuo Cheng, Sivaprasanna, Sivaprasanna S, SteNicholas, Stefan Richter, Stephan Ewen, Steve OU, Steve Whelan, Tartarus, Terry Wang, Thomas Weise, Till Rohrmann, Timo Walther, TsReaper, Tzu-Li (Gordon) Tai, Victor Wong, Wei Zhong, Weike DONG, Xiaogang Zhou, Xintong Song, Xu Bai, Xuannan, Yadong Xie, Yang Wang, Yangze Guo, Yichao Yang, Ying, Yu Li, Yuan Mei, Yun Gao, Yun Tang, Yuval Itzchakov, Zakelly, Zhao, Zhenghua Gao, Zhijiang, Zhu Zhu, acqua.csq, austin ce, azagrebin, bdine, bowen.li, caoyingjie, caozhen, caozhen1937, chaojianok, chen, chendonglin, comsir, cpugputpu, czhang2, dianfu, edu05, eduardowt, fangliang, felixzheng, fmyblack, gauss, gk0916, godfrey he, godfreyhe, guliziduo, guowei.mgw, hehuiyuan, hequn8128, hpeter, huangxingbo, huzheng, ifndef-SleePy, jingwen-ywb, jrthe42, kevin.cyj, klion26, lamber-ken, leesf, libenchao, lijiewang.wlj, liuyongvs, lsy, lumen, machinedoll, mans2singh, molsionmo, oliveryunchang, openinx, paul8263, ptmagic, qqibrow, sev7e0, shuai-xu, shuai.xu, shuiqiangchen, snuyanzin, spafka, sunhaibotb, sunjincheng121, testfixer, tison, vinoyang, vthinkxie, wangtong, wangxianghu, wangxiyuan, wangxlong, wangyang0918, wenlong.lwl, whlwanghailong, william, windWheel, wooplevip, wuxuyang, xushiwei, xuyang1706, yanghua, yangyichao-mango, yuzhao.cyz, zentol, zhanglibing, zhangmang, zhangzhanchun, zhengcanbin, zhengshuli, zhenxianyimeng, zhijiang, zhongyong jin, zhule, zhuxiaoshang, zjuwangg, zoudan, zoudaokoulife, zzchun, “lzh576177775”, 骚sir, 厉颖, 张军, 曹建华, 漫步云端&lt;/p&gt;
</description>
<pubDate>Mon, 06 Jul 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/07/06/release-1.11.0.html</link>
<guid isPermaLink="true">/news/2020/07/06/release-1.11.0.html</guid>
</item>
<item>
<title>Flink on Zeppelin Notebooks for Interactive Data Analysis - Part 2</title>
<description>&lt;p&gt;In a previous post, we introduced the basics of Flink on Zeppelin and how to do Streaming ETL. In this second part of the “Flink on Zeppelin” series of posts, I will share how to
perform streaming data visualization via Flink on Zeppelin and how to use Apache Flink UDFs in Zeppelin.&lt;/p&gt;
&lt;h1 id=&quot;streaming-data-visualization&quot;&gt;Streaming Data Visualization&lt;/h1&gt;
&lt;p&gt;With &lt;a href=&quot;https://zeppelin.apache.org/&quot;&gt;Zeppelin&lt;/a&gt;, you can build a real-time streaming dashboard without writing a single line of JavaScript/HTML/CSS code.&lt;/p&gt;
&lt;p&gt;Overall, Zeppelin supports 3 kinds of streaming data analytics:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Single Mode&lt;/li&gt;
&lt;li&gt;Update Mode&lt;/li&gt;
&lt;li&gt;Append Mode&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;single-mode&quot;&gt;Single Mode&lt;/h3&gt;
&lt;p&gt;Single mode is used for cases where the result of a SQL statement is always a single row, as in the following example.
The output is rendered as HTML, and you can specify a paragraph-local property &lt;code&gt;template&lt;/code&gt; for the final output.
In the template, &lt;code&gt;{i}&lt;/code&gt; serves as a placeholder for the i-th column of the result.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_single_mode.gif&quot; width=&quot;80%&quot; alt=&quot;Single Mode&quot; /&gt;
&lt;/center&gt;
&lt;h3 id=&quot;update-mode&quot;&gt;Update Mode&lt;/h3&gt;
&lt;p&gt;Update mode is suitable for cases where the output consists of more than one row
and is continuously updated. Here’s an example that uses &lt;code&gt;GROUP BY&lt;/code&gt;.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_update_mode.gif&quot; width=&quot;80%&quot; alt=&quot;Update Mode&quot; /&gt;
&lt;/center&gt;
&lt;h3 id=&quot;append-mode&quot;&gt;Append Mode&lt;/h3&gt;
&lt;p&gt;Append mode is suitable for cases where output rows are only ever appended.
For instance, the example below uses a tumbling window.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_append_mode.gif&quot; width=&quot;80%&quot; alt=&quot;Append Mode&quot; /&gt;
&lt;/center&gt;
&lt;h1 id=&quot;udf&quot;&gt;UDF&lt;/h1&gt;
&lt;p&gt;SQL is a very powerful language, especially for expressing data flow. But most of the time, you need to handle complicated business logic that cannot be expressed in SQL.
In these cases, UDFs (user-defined functions) come in particularly handy. In Zeppelin, you can write Scala or Python UDFs, and you can also import Scala, Python and Java UDFs.
Here are two examples of Scala and Python UDFs:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Scala UDF&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;flink&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ScalaUpper&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ScalarFunction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eval&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toUpperCase&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;btenv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;registerFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;scala_upper&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ScalaUpper&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Python UDF&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;o&quot;&gt;%&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;flink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pyflink&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PythonUpper&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ScalarFunction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;eval&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;upper&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;bt_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;register_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;python_upper&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PythonUpper&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After you define the UDFs, you can use them directly in SQL:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use Scala UDF in SQL&lt;/li&gt;
&lt;/ul&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_scala_udf.png&quot; width=&quot;100%&quot; alt=&quot;Scala UDF&quot; /&gt;
&lt;/center&gt;
&lt;ul&gt;
&lt;li&gt;Use Python UDF in SQL&lt;/li&gt;
&lt;/ul&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-06-23-flink-on-zeppelin-part2/flink_python_udf.png&quot; width=&quot;100%&quot; alt=&quot;Python UDF&quot; /&gt;
&lt;/center&gt;
&lt;h1 id=&quot;summary&quot;&gt;Summary&lt;/h1&gt;
&lt;p&gt;In this post, we explained how to perform streaming data visualization via Flink on Zeppelin and how to use UDFs.
Beyond that, you can do much more with Flink in Zeppelin, such as batch processing and Hive integration.
You can check the articles below for more details, and here’s a list of &lt;a href=&quot;https://www.youtube.com/watch?v=YxPo0Fosjjg&amp;amp;list=PL4oy12nnS7FFtg3KV1iS5vDb0pTz12VcX&quot;&gt;Flink on Zeppelin tutorial videos&lt;/a&gt; for your reference.&lt;/p&gt;
&lt;h1 id=&quot;references&quot;&gt;References&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://zeppelin.apache.org&quot;&gt;Apache Zeppelin official website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Flink on Zeppelin tutorials - &lt;a href=&quot;https://medium.com/@zjffdu/flink-on-zeppelin-part-1-get-started-2591aaa6aa47&quot;&gt;Part 1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Flink on Zeppelin tutorials - &lt;a href=&quot;https://medium.com/@zjffdu/flink-on-zeppelin-part-2-batch-711731df5ad9&quot;&gt;Part 2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Flink on Zeppelin tutorials - &lt;a href=&quot;https://medium.com/@zjffdu/flink-on-zeppelin-part-3-streaming-5fca1e16754&quot;&gt;Part 3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Flink on Zeppelin tutorials - &lt;a href=&quot;https://medium.com/@zjffdu/flink-on-zeppelin-part-4-advanced-usage-998b74908cd9&quot;&gt;Part 4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=YxPo0Fosjjg&amp;amp;list=PL4oy12nnS7FFtg3KV1iS5vDb0pTz12VcX&quot;&gt;Flink on Zeppelin tutorial videos&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Tue, 23 Jun 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/ecosystem/2020/06/23/flink-on-zeppelin-part2.html</link>
<guid isPermaLink="true">/ecosystem/2020/06/23/flink-on-zeppelin-part2.html</guid>
</item>
<item>
<title>Flink on Zeppelin Notebooks for Interactive Data Analysis - Part 1</title>
<description>&lt;p&gt;The latest release of &lt;a href=&quot;https://zeppelin.apache.org/&quot;&gt;Apache Zeppelin&lt;/a&gt; comes with a redesigned interpreter for Apache Flink (version Flink 1.10+ is only supported moving forward)
that allows developers to use Flink directly on Zeppelin notebooks for interactive data analysis. I wrote 2 posts about how to use Flink in Zeppelin. This is part-1 where I explain how the Flink interpreter in Zeppelin works,
and provide a tutorial for running Streaming ETL with Flink on Zeppelin.&lt;/p&gt;
&lt;h1 id=&quot;the-flink-interpreter-in-zeppelin-09&quot;&gt;The Flink Interpreter in Zeppelin 0.9&lt;/h1&gt;
&lt;p&gt;The Flink interpreter can be accessed and configured from Zeppelin’s interpreter settings page.
The interpreter has been refactored so that Flink users can now take advantage of Zeppelin to write Flink applications in three languages,
namely Scala, Python (PyFlink) and SQL (for both batch &amp;amp; streaming execution).
Zeppelin 0.9 now comes with the Flink interpreter group, consisting of the five interpreters below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;%flink - Provides a Scala environment&lt;/li&gt;
&lt;li&gt;%flink.pyflink - Provides a Python environment&lt;/li&gt;
&lt;li&gt;%flink.ipyflink - Provides an IPython environment&lt;/li&gt;
&lt;li&gt;%flink.ssql - Provides a streaming SQL environment&lt;/li&gt;
&lt;li&gt;%flink.bsql - Provides a batch SQL environment&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Not only has the interpreter been extended to support writing Flink applications in three languages, but the available execution modes for Flink have also been extended and now include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Running Flink in Local Mode&lt;/li&gt;
&lt;li&gt;Running Flink in Remote Mode&lt;/li&gt;
&lt;li&gt;Running Flink in Yarn Mode&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;You can find more information about how to get started with Zeppelin and about all the execution modes for Flink applications in the &lt;a href=&quot;https://github.com/apache/zeppelin/tree/master/notebook/Flink%20Tutorial&quot;&gt;Zeppelin notebooks&lt;/a&gt; referenced in this post.&lt;/p&gt;
&lt;h1 id=&quot;flink-on-zeppelin-for-stream-processing&quot;&gt;Flink on Zeppelin for Stream processing&lt;/h1&gt;
&lt;p&gt;Performing stream processing jobs with Apache Flink on Zeppelin allows you to run most major streaming use cases,
such as streaming ETL and real-time data analytics, with the use of Flink SQL and specific UDFs.
Below we showcase how you can execute streaming ETL using Flink on Zeppelin:&lt;/p&gt;
&lt;p&gt;You can use Flink SQL to perform streaming ETL by following the steps below
(for the full tutorial, please refer to the &lt;a href=&quot;https://github.com/apache/zeppelin/blob/master/notebook/Flink%20Tutorial/4.%20Streaming%20ETL_2EYD56B9B.zpln&quot;&gt;Flink Tutorial/Streaming ETL tutorial&lt;/a&gt; of the Zeppelin distribution):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Step 1. Create a source table to represent the source data.&lt;/li&gt;
&lt;/ul&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-06-15-flink-on-zeppelin/create_source.png&quot; width=&quot;80%&quot; alt=&quot;Create Source Table&quot; /&gt;
&lt;/center&gt;
&lt;ul&gt;
&lt;li&gt;Step 2. Create a sink table to represent the processed data.&lt;/li&gt;
&lt;/ul&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-06-15-flink-on-zeppelin/create_sink.png&quot; width=&quot;80%&quot; alt=&quot;Create Sink Table&quot; /&gt;
&lt;/center&gt;
&lt;ul&gt;
&lt;li&gt;Step 3. After creating the source and sink tables, we can run an insert statement to trigger the stream processing job, as follows:&lt;/li&gt;
&lt;/ul&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-06-15-flink-on-zeppelin/etl.png&quot; width=&quot;80%&quot; alt=&quot;ETL&quot; /&gt;
&lt;/center&gt;
&lt;ul&gt;
&lt;li&gt;Step 4. After initiating the streaming job, you can use another SQL statement to query the sink table and verify the results of your job. Here you can see the top 10 records, which are refreshed every 3 seconds.&lt;/li&gt;
&lt;/ul&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-06-15-flink-on-zeppelin/preview.png&quot; width=&quot;80%&quot; alt=&quot;Preview&quot; /&gt;
&lt;/center&gt;
&lt;h1 id=&quot;summary&quot;&gt;Summary&lt;/h1&gt;
&lt;p&gt;In this post, we explained how the redesigned Flink interpreter works in Zeppelin 0.9.0 and provided some examples for performing streaming ETL jobs with
Flink and Zeppelin. In the next post, I will talk about how to do streaming data visualization via Flink on Zeppelin.
Besides that, you can find an additional &lt;a href=&quot;https://medium.com/@zjffdu/flink-on-zeppelin-part-2-batch-711731df5ad9&quot;&gt;tutorial for batch processing with Flink on Zeppelin&lt;/a&gt;, as well as guides on using Flink on Zeppelin for
more advanced operations like resource isolation, job concurrency &amp;amp; parallelism, multiple Hadoop &amp;amp; Hive environments and more, in our series of posts on Medium.
And here’s a list of &lt;a href=&quot;https://www.youtube.com/watch?v=YxPo0Fosjjg&amp;amp;list=PL4oy12nnS7FFtg3KV1iS5vDb0pTz12VcX&quot;&gt;Flink on Zeppelin tutorial videos&lt;/a&gt; for your reference.&lt;/p&gt;
&lt;h1 id=&quot;references&quot;&gt;References&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;http://zeppelin.apache.org&quot;&gt;Apache Zeppelin official website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Flink on Zeppelin tutorials - &lt;a href=&quot;https://medium.com/@zjffdu/flink-on-zeppelin-part-1-get-started-2591aaa6aa47&quot;&gt;Part 1&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Flink on Zeppelin tutorials - &lt;a href=&quot;https://medium.com/@zjffdu/flink-on-zeppelin-part-2-batch-711731df5ad9&quot;&gt;Part 2&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Flink on Zeppelin tutorials - &lt;a href=&quot;https://medium.com/@zjffdu/flink-on-zeppelin-part-3-streaming-5fca1e16754&quot;&gt;Part 3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Flink on Zeppelin tutorials - &lt;a href=&quot;https://medium.com/@zjffdu/flink-on-zeppelin-part-4-advanced-usage-998b74908cd9&quot;&gt;Part 4&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=YxPo0Fosjjg&amp;amp;list=PL4oy12nnS7FFtg3KV1iS5vDb0pTz12VcX&quot;&gt;Flink on Zeppelin tutorial videos&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Mon, 15 Jun 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/06/15/flink-on-zeppelin-part1.html</link>
<guid isPermaLink="true">/news/2020/06/15/flink-on-zeppelin-part1.html</guid>
</item>
<item>
<title>Flink Community Update - June&#39;20</title>
<description>&lt;p&gt;And suddenly it’s June. The previous month has been calm on the surface, but quite hectic underneath — the final testing phase for Flink 1.11 is moving at full speed, Stateful Functions 2.1 is out in the wild and Flink has made it into Google Season of Docs 2020.&lt;/p&gt;
&lt;p&gt;To top it off, a piece of good news: &lt;a href=&quot;https://www.flink-forward.org/global-2020&quot;&gt;Flink Forward&lt;/a&gt; is back on October 19-22 as a free virtual event!&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#the-past-month-in-flink&quot; id=&quot;markdown-toc-the-past-month-in-flink&quot;&gt;The Past Month in Flink&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#flink-stateful-functions-21-release&quot; id=&quot;markdown-toc-flink-stateful-functions-21-release&quot;&gt;Flink Stateful Functions 2.1 Release&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#testing-is-on-for-flink-111&quot; id=&quot;markdown-toc-testing-is-on-for-flink-111&quot;&gt;Testing is ON for Flink 1.11&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#flink-minor-releases&quot; id=&quot;markdown-toc-flink-minor-releases&quot;&gt;Flink Minor Releases&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#flink-1101&quot; id=&quot;markdown-toc-flink-1101&quot;&gt;Flink 1.10.1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#new-committers-and-pmc-members&quot; id=&quot;markdown-toc-new-committers-and-pmc-members&quot;&gt;New Committers and PMC Members&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#new-committers&quot; id=&quot;markdown-toc-new-committers&quot;&gt;New Committers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-bigger-picture&quot; id=&quot;markdown-toc-the-bigger-picture&quot;&gt;The Bigger Picture&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#flink-forward-global-virtual-conference-2020&quot; id=&quot;markdown-toc-flink-forward-global-virtual-conference-2020&quot;&gt;Flink Forward Global Virtual Conference 2020&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#google-season-of-docs-2020&quot; id=&quot;markdown-toc-google-season-of-docs-2020&quot;&gt;Google Season of Docs 2020&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h1 id=&quot;the-past-month-in-flink&quot;&gt;The Past Month in Flink&lt;/h1&gt;
&lt;h2 id=&quot;flink-stateful-functions-21-release&quot;&gt;Flink Stateful Functions 2.1 Release&lt;/h2&gt;
&lt;p&gt;It might seem like &lt;a href=&quot;https://flink.apache.org/news/2020/04/07/release-statefun-2.0.0.html&quot;&gt;Stateful Functions 2.0 was announced&lt;/a&gt; only a handful of weeks ago (and it was!), but the Flink community has just released Stateful Functions 2.1! This release introduces two new features: state expiration for any kind of persisted state and support for UNIX Domain Sockets (UDS) to improve the performance of inter-container communication in co-located deployments; as well as other important changes that improve the overall stability and testability of the project. You can read the &lt;a href=&quot;https://flink.apache.org/news/2020/06/09/release-statefun-2.1.0.html&quot;&gt;announcement blogpost&lt;/a&gt; for more details on the release!&lt;/p&gt;
&lt;p&gt;As the community around StateFun grows, the release cycle will follow this pattern of smaller and more frequent releases to incorporate user feedback and allow for faster iteration. If you’d like to get involved, we’re always &lt;a href=&quot;https://github.com/apache/flink-statefun#contributing&quot;&gt;looking for new contributors&lt;/a&gt; — especially around SDKs for other languages (e.g. Go, Rust, Javascript).&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&quot;testing-is-on-for-flink-111&quot;&gt;Testing is ON for Flink 1.11&lt;/h2&gt;
&lt;p&gt;Things have been pretty quiet in the Flink community, as all efforts shifted to testing the newest features shipping with Flink 1.11. While we wait for a voting Release Candidate (RC) to be out, you can check the progress of testing in &lt;a href=&quot;https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=364&amp;amp;projectKey=FLINK&quot;&gt;this JIRA burndown board&lt;/a&gt; and learn more about some of the &lt;a href=&quot;https://flink.apache.org/news/2020/05/07/community-update.html#warming-up-for-flink-111&quot;&gt;upcoming features&lt;/a&gt; in these Flink Forward videos:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=ssEmeLcL5Uk&quot;&gt;Rethinking of fault tolerance in Flink: what lies ahead?&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=t7fAN3xNJ3Q&quot;&gt;It’s finally here: Python on Flink &amp;amp; Flink on Zeppelin&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=KDD8e4GE12w&quot;&gt;A deep dive into Flink SQL&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=4ce1H9CRyEc&quot;&gt;Production-Ready Flink and Hive Integration - what story you can tell now?&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We encourage the wider community to also get involved in testing once the voting RC is out. Keep an eye on the &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;@dev mailing list&lt;/a&gt; for updates!&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&quot;flink-minor-releases&quot;&gt;Flink Minor Releases&lt;/h2&gt;
&lt;h3 id=&quot;flink-1101&quot;&gt;Flink 1.10.1&lt;/h3&gt;
&lt;p&gt;The community released Flink 1.10.1, covering some outstanding bugs in Flink 1.10. You can find more in the &lt;a href=&quot;https://flink.apache.org/news/2020/05/12/release-1.10.1.html&quot;&gt;announcement blogpost&lt;/a&gt;!&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&quot;new-committers-and-pmc-members&quot;&gt;New Committers and PMC Members&lt;/h2&gt;
&lt;p&gt;The Apache Flink community has welcomed &lt;strong&gt;2 new Committers&lt;/strong&gt; since the last update. Congratulations!&lt;/p&gt;
&lt;h3 id=&quot;new-committers&quot;&gt;New Committers&lt;/h3&gt;
&lt;div class=&quot;row&quot;&gt;
&lt;div class=&quot;col-lg-3&quot;&gt;
&lt;div class=&quot;text-center&quot;&gt;
&lt;img class=&quot;img-circle&quot; src=&quot;https://avatars3.githubusercontent.com/u/4471524?s=400&amp;amp;v=4&quot; width=&quot;90&quot; height=&quot;90&quot; /&gt;
&lt;p&gt;Benchao Li&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;col-lg-3&quot;&gt;
&lt;div class=&quot;text-center&quot;&gt;
&lt;img class=&quot;img-circle&quot; src=&quot;https://avatars0.githubusercontent.com/u/6509172?s=400&amp;amp;v=4&quot; width=&quot;90&quot; height=&quot;90&quot; /&gt;
&lt;p&gt;Xintong Song&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr /&gt;
&lt;h1 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture&lt;/h1&gt;
&lt;h2 id=&quot;flink-forward-global-virtual-conference-2020&quot;&gt;Flink Forward Global Virtual Conference 2020&lt;/h2&gt;
&lt;p&gt;After a first successful &lt;a href=&quot;https://www.youtube.com/playlist?list=PLDX4T_cnKjD0ngnBSU-bYGfgVv17MiwA7&quot;&gt;virtual conference&lt;/a&gt; last April, Flink Forward will be hosting a second free virtual edition on October 19-22. This time around, the conference will feature two days of hands-on training and two full days of conference talks!&lt;/p&gt;
&lt;p&gt;Got a Flink story to share? Maybe your recent adventures with Stateful Functions? The &lt;a href=&quot;https://www.flink-forward.org/global-2020/call-for-presentations&quot;&gt;Call for Presentations is now open&lt;/a&gt; and accepting submissions from the community until &lt;strong&gt;June 19th, 11:59 PM CEST&lt;/strong&gt;.&lt;/p&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-06-10-community-update/FlinkForward_Banner_CFP_Global_2020.png&quot; width=&quot;600px&quot; alt=&quot;Flink Forward Global 2020&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;hr /&gt;
&lt;h2 id=&quot;google-season-of-docs-2020&quot;&gt;Google Season of Docs 2020&lt;/h2&gt;
&lt;p&gt;In the last update, we announced that Flink was applying to &lt;a href=&quot;https://developers.google.com/season-of-docs&quot;&gt;Google Season of Docs (GSoD)&lt;/a&gt; again this year. The good news: we’ve made it into the shortlist of accepted projects! This represents an invaluable opportunity for the Flink community to collaborate with technical writers to improve the Table API &amp;amp; SQL documentation. We’re honored to have seen a great number of people reach out over the last couple of weeks, and look forward to receiving applications from this week on!&lt;/p&gt;
&lt;p&gt;If you’re interested in learning more about our project idea or want to get involved in GSoD as a technical writer, check out the &lt;a href=&quot;https://flink.apache.org/news/2020/05/04/season-of-docs.html&quot;&gt;announcement blogpost&lt;/a&gt; and &lt;a href=&quot;https://developers.google.com/season-of-docs/docs/tech-writer-application-hints&quot;&gt;submit your application&lt;/a&gt;. The deadline for GSoD applications is &lt;strong&gt;July 9th, 18:00 UTC&lt;/strong&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;If you’d like to keep a closer eye on what’s happening in the community, subscribe to the Flink &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;@community mailing list&lt;/a&gt; to get fine-grained weekly updates, upcoming event announcements and more.&lt;/p&gt;
</description>
<pubDate>Thu, 11 Jun 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/06/11/community-update.html</link>
<guid isPermaLink="true">/news/2020/06/11/community-update.html</guid>
</item>
<item>
<title>Stateful Functions 2.1.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is happy to announce the release of Stateful Functions (StateFun) 2.1.0! This release introduces new features around state expiration and performance improvements for co-located deployments, as well as other important changes that improve the stability and testability of the project. As the community around StateFun grows, the release cycle will follow this pattern of smaller and more frequent releases to incorporate user feedback and allow for faster iteration.&lt;/p&gt;
&lt;p&gt;The binary distribution and source artifacts are now available on the updated &lt;a href=&quot;https://flink.apache.org/downloads.html&quot;&gt;Downloads&lt;/a&gt; page of the Flink website, and the most recent Python SDK distribution is available on &lt;a href=&quot;https://pypi.org/project/apache-flink-statefun/&quot;&gt;PyPI&lt;/a&gt;. For more details, check the complete &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12347861&quot;&gt;release changelog&lt;/a&gt; and the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.1/&quot;&gt;updated documentation&lt;/a&gt;. We encourage you to download the release and share your feedback with the community through the &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink mailing lists&lt;/a&gt; or &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-18016?jql=project%20%3D%20FLINK%20AND%20component%20%3D%20%22Stateful%20Functions%22%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC&quot;&gt;JIRA&lt;/a&gt;!&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#support-for-state-time-to-live-ttl&quot; id=&quot;markdown-toc-support-for-state-time-to-live-ttl&quot;&gt;Support for State Time-To-Live (TTL)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#improved-performance-with-unix-domain-sockets-uds&quot; id=&quot;markdown-toc-improved-performance-with-unix-domain-sockets-uds&quot;&gt;Improved Performance with UNIX Domain Sockets (UDS)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#important-changes&quot; id=&quot;markdown-toc-important-changes&quot;&gt;Important Changes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#release-notes&quot; id=&quot;markdown-toc-release-notes&quot;&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;
&lt;h3 id=&quot;support-for-state-time-to-live-ttl&quot;&gt;Support for State Time-To-Live (TTL)&lt;/h3&gt;
&lt;p&gt;Being able to define state expiration and a state cleanup strategy is a useful feature for stateful applications — for example, to keep state size from growing indefinitely or to work with sensitive data. In previous StateFun versions, users could implement this behavior manually using &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/sdk/java.html#sending-delayed-messages&quot;&gt;delayed messages&lt;/a&gt; as state expiration callbacks. For StateFun 2.1, the community has worked on enabling users to configure any persisted state to expire and be purged after a given duration (i.e. the state time-to-live) (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17644&quot;&gt;FLINK-17644&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17875&quot;&gt;FLINK-17875&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Persisted state can be configured to expire after the last &lt;em&gt;write&lt;/em&gt; operation (&lt;code&gt;AFTER_WRITE&lt;/code&gt;) or after the last &lt;em&gt;read or write&lt;/em&gt; operation (&lt;code&gt;AFTER_READ_AND_WRITE&lt;/code&gt;). For the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.1/sdk/java.html#state-expiration&quot;&gt;Java SDK&lt;/a&gt;, users can configure State TTL in the definition of their persisted fields:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;nd&quot;&gt;@Persisted&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;PersistedValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PersistedValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;my-value&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Expiration&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;expireAfterWriting&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Duration&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;ofHours&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
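&lt;p&gt;For comparison, here is a minimal sketch of configuring the read-and-write expiration mode in the Java SDK. It assumes an &lt;code&gt;Expiration.expireAfterReadingOrWriting&lt;/code&gt; factory method analogous to &lt;code&gt;expireAfterWriting&lt;/code&gt; above; please check the linked SDK documentation for the exact method name.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Minimal sketch: expire state one hour after the last read or write.
// Assumption: expireAfterReadingOrWriting is the read-and-write counterpart of expireAfterWriting.
@Persisted
PersistedValue&amp;lt;Integer&amp;gt; table = PersistedValue.of(
    &amp;quot;my-value&amp;quot;,
    Integer.class,
    Expiration.expireAfterReadingOrWriting(Duration.ofHours(1)));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;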
&lt;p&gt;For &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.1/concepts/distributed_architecture.html#remote-functions&quot;&gt;remote functions&lt;/a&gt; using e.g. the Python SDK, users can configure State TTL in their &lt;code&gt;module.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;functions:
  - function:
      states:
        - name: xxxx
          expireAfter: 5min # optional key
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;b&gt;Note:&lt;/b&gt;
The state expiration mode for remote functions is currently restricted to AFTER_READ_AND_WRITE, and the TTL that is applied is the longest duration configured across all registered state, rather than a per-state setting. This is planned to be improved in upcoming releases (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17954&quot;&gt;FLINK-17954&lt;/a&gt;).
&lt;/div&gt;
&lt;h3 id=&quot;improved-performance-with-unix-domain-sockets-uds&quot;&gt;Improved Performance with UNIX Domain Sockets (UDS)&lt;/h3&gt;
&lt;p&gt;Stateful functions can be &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.1/concepts/distributed_architecture.html#deployment-styles-for-functions&quot;&gt;deployed in multiple ways&lt;/a&gt;, even within the same application. For deployments where functions are &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.1/concepts/distributed_architecture.html#co-located-functions&quot;&gt;co-located&lt;/a&gt; with the Flink StateFun workers, it’s common to use Kubernetes to deploy pods consisting of a Flink StateFun container and the function sidecar container, communicating via the pod-local network. To improve the performance of such deployments, StateFun 2.1 allows using &lt;a href=&quot;https://troydhanson.github.io/network/Unix_domain_sockets.html&quot;&gt;Unix Domain Sockets&lt;/a&gt; (UDS) to communicate between containers in the same pod (i.e. the same machine) (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17611&quot;&gt;FLINK-17611&lt;/a&gt;), which drastically reduces the overhead of going through the network stack.&lt;/p&gt;
&lt;p&gt;Users can &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-master/sdk/modules.html#defining-functions&quot;&gt;enable transport via UDS&lt;/a&gt; in a remote module by specifying the following in their &lt;code&gt;module.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;functions:
  - function:
      spec:
        - endpoint: http(s)+unix://&amp;lt;socket-file-path&amp;gt;/&amp;lt;serve-url-path&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;important-changes&quot;&gt;Important Changes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17712&quot;&gt;FLINK-17712&lt;/a&gt;] The Flink version in StateFun 2.1 has been upgraded to 1.10.1, the most recent patch version.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17533&quot;&gt;FLINK-17533&lt;/a&gt;] StateFun 2.1 now supports concurrent checkpoints, which means applications will no longer fail on savepoints that are triggered concurrently with a checkpoint.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16928&quot;&gt;FLINK-16928&lt;/a&gt;] StateFun 2.0 used the Flink legacy scheduler due to a &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16927&quot;&gt;bug in Flink 1.10&lt;/a&gt;. In 2.1, this change has been reverted, and StateFun uses the new Flink scheduler again.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17516&quot;&gt;FLINK-17516&lt;/a&gt;] The coverage for end-to-end StateFun tests has been extended to also include exactly-once semantics verification (with failure recovery).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;release-notes&quot;&gt;Release Notes&lt;/h2&gt;
&lt;p&gt;Please review the &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12347861&quot;&gt;release notes&lt;/a&gt; for a detailed list of changes and new features if you plan to upgrade your setup to Stateful Functions 2.1.&lt;/p&gt;
&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;
&lt;p&gt;The Apache Flink community would like to thank all contributors that have made this release possible:&lt;/p&gt;
&lt;p&gt;abc863377, Authuir, Chesnay Schepler, Congxian Qiu, David Anderson, Dian Fu, Francesco Guardiani, Igal Shilman, Marta Paes Moreira, Patrick Wiener, Rafi Aroch, Seth Wiesman, Stephan Ewen, Tzu-Li (Gordon) Tai&lt;/p&gt;
&lt;p&gt;If you’d like to get involved, we’re always &lt;a href=&quot;https://github.com/apache/flink-statefun#contributing&quot;&gt;looking for new contributors&lt;/a&gt; — especially around SDKs for other languages like Go, Rust or Javascript.&lt;/p&gt;
</description>
<pubDate>Tue, 09 Jun 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/06/09/release-statefun-2.1.0.html</link>
<guid isPermaLink="true">/news/2020/06/09/release-statefun-2.1.0.html</guid>
</item>
<item>
<title>Apache Flink 1.10.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.10 series.&lt;/p&gt;
&lt;p&gt;This release includes 158 fixes and minor improvements for Flink 1.10.0. A detailed list of all fixes and improvements can be found below.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.10.1.&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
FLINK-16684 changed the builders of the StreamingFileSink to make them compilable in Scala. This change is source compatible but binary incompatible. If using the StreamingFileSink, please recompile your user code against 1.10.1 before upgrading.&lt;/p&gt;
&lt;/div&gt;
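&lt;p&gt;As a rough, hypothetical sketch of the kind of builder usage this note refers to (the code itself does not need to change; only a recompile against 1.10.1 is required):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Hypothetical user code using the StreamingFileSink builder (FLINK-16684):
// source compatible with 1.10.1, but must be recompiled before upgrading.
StreamingFileSink&amp;lt;String&amp;gt; sink = StreamingFileSink
    .forRowFormat(new Path(&amp;quot;/tmp/output&amp;quot;), new SimpleStringEncoder&amp;lt;String&amp;gt;(&amp;quot;UTF-8&amp;quot;))
    .build();
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;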
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
FLINK-16683 Flink no longer supports starting clusters with .bat scripts. Users should instead use environments like WSL or Cygwin and work with the .sh scripts.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.10.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.10.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.10.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt;Sub-task&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14126&quot;&gt;FLINK-14126&lt;/a&gt;] - Elasticsearch Xpack Machine Learning doesn&amp;#39;t support ARM
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15143&quot;&gt;FLINK-15143&lt;/a&gt;] - Create document for FLIP-49 TM memory model and configuration guide
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15561&quot;&gt;FLINK-15561&lt;/a&gt;] - Unify Kerberos credentials checking
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15790&quot;&gt;FLINK-15790&lt;/a&gt;] - Make FlinkKubeClient and its implementations asynchronous
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15817&quot;&gt;FLINK-15817&lt;/a&gt;] - Kubernetes Resource leak while deployment exception happens
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16049&quot;&gt;FLINK-16049&lt;/a&gt;] - Remove outdated &amp;quot;Best Practices&amp;quot; section from Application Development Section
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16131&quot;&gt;FLINK-16131&lt;/a&gt;] - Translate &amp;quot;Amazon S3&amp;quot; page of &amp;quot;File Systems&amp;quot; into Chinese
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16389&quot;&gt;FLINK-16389&lt;/a&gt;] - Bump Kafka 0.10 to 0.10.2.2
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Bug&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2336&quot;&gt;FLINK-2336&lt;/a&gt;] - ArrayIndexOutOfBoundsException in TypeExtractor when mapping
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10918&quot;&gt;FLINK-10918&lt;/a&gt;] - incremental Keyed state with RocksDB throws cannot create directory error in windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11193&quot;&gt;FLINK-11193&lt;/a&gt;] - Rocksdb timer service factory configuration option is not settable per job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13483&quot;&gt;FLINK-13483&lt;/a&gt;] - PrestoS3FileSystemITCase.testDirectoryListing fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14038&quot;&gt;FLINK-14038&lt;/a&gt;] - ExecutionGraph deploy failed due to akka timeout
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14311&quot;&gt;FLINK-14311&lt;/a&gt;] - Streaming File Sink end-to-end test failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14316&quot;&gt;FLINK-14316&lt;/a&gt;] - Stuck in &amp;quot;Job leader ... lost leadership&amp;quot; error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15417&quot;&gt;FLINK-15417&lt;/a&gt;] - Remove the docker volume or mount when starting Mesos e2e cluster
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15669&quot;&gt;FLINK-15669&lt;/a&gt;] - SQL client can&amp;#39;t cancel flink job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15772&quot;&gt;FLINK-15772&lt;/a&gt;] - Shaded Hadoop S3A with credentials provider end-to-end test fails on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15811&quot;&gt;FLINK-15811&lt;/a&gt;] - StreamSourceOperatorWatermarksTest.testNoMaxWatermarkOnAsyncCancel fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15812&quot;&gt;FLINK-15812&lt;/a&gt;] - HistoryServer archiving is done in Dispatcher main thread
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15838&quot;&gt;FLINK-15838&lt;/a&gt;] - Dangling CountDownLatch.await(timeout)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15852&quot;&gt;FLINK-15852&lt;/a&gt;] - Job is submitted to the wrong session cluster
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15904&quot;&gt;FLINK-15904&lt;/a&gt;] - Make Kafka Consumer work with activated &amp;quot;disableGenericTypes()&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15936&quot;&gt;FLINK-15936&lt;/a&gt;] - TaskExecutorTest#testSlotAcceptance deadlocks
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15953&quot;&gt;FLINK-15953&lt;/a&gt;] - Job Status is hard to read for some Statuses
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16013&quot;&gt;FLINK-16013&lt;/a&gt;] - List and map config options could not be parsed correctly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16014&quot;&gt;FLINK-16014&lt;/a&gt;] - S3 plugin ClassNotFoundException SAXParser
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16025&quot;&gt;FLINK-16025&lt;/a&gt;] - Service could expose blob server port mismatched with JM Container
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16026&quot;&gt;FLINK-16026&lt;/a&gt;] - Travis failed due to python setup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16047&quot;&gt;FLINK-16047&lt;/a&gt;] - Blink planner produces wrong aggregate results with state clean up
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16067&quot;&gt;FLINK-16067&lt;/a&gt;] - Flink&amp;#39;s CalciteParser swallows error position information
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16068&quot;&gt;FLINK-16068&lt;/a&gt;] - table with keyword-escaped columns and computed_column_expression columns
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16070&quot;&gt;FLINK-16070&lt;/a&gt;] - Blink planner can not extract correct unique key for UpsertStreamTableSink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16108&quot;&gt;FLINK-16108&lt;/a&gt;] - StreamSQLExample is failed if running in blink planner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16111&quot;&gt;FLINK-16111&lt;/a&gt;] - Kubernetes deployment does not respect &amp;quot;taskmanager.cpu.cores&amp;quot;.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16113&quot;&gt;FLINK-16113&lt;/a&gt;] - ExpressionReducer shouldn&amp;#39;t escape the reduced string value
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16115&quot;&gt;FLINK-16115&lt;/a&gt;] - Aliyun oss filesystem could not work with plugin mechanism
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16139&quot;&gt;FLINK-16139&lt;/a&gt;] - Co-location constraints are not reset on task recovery in DefaultScheduler
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16161&quot;&gt;FLINK-16161&lt;/a&gt;] - Statistics zero should be unknown in HiveCatalog
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16170&quot;&gt;FLINK-16170&lt;/a&gt;] - SearchTemplateRequest ClassNotFoundException when use flink-sql-connector-elasticsearch7
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16220&quot;&gt;FLINK-16220&lt;/a&gt;] - JsonRowSerializationSchema throws cast exception : NullNode cannot be cast to ArrayNode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16231&quot;&gt;FLINK-16231&lt;/a&gt;] - Hive connector is missing jdk.tools exclusion against Hive 2.x.x
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16234&quot;&gt;FLINK-16234&lt;/a&gt;] - Fix unstable cases in StreamingJobGraphGeneratorTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16241&quot;&gt;FLINK-16241&lt;/a&gt;] - Remove the license and notice file in flink-ml-lib module on release-1.10 branch
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16242&quot;&gt;FLINK-16242&lt;/a&gt;] - BinaryGeneric serialization error cause checkpoint failure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16262&quot;&gt;FLINK-16262&lt;/a&gt;] - Class loader problem with FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16269&quot;&gt;FLINK-16269&lt;/a&gt;] - Generic type can not be matched when convert table to stream.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16281&quot;&gt;FLINK-16281&lt;/a&gt;] - parameter &amp;#39;maxRetryTimes&amp;#39; can not work in JDBCUpsertTableSink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16301&quot;&gt;FLINK-16301&lt;/a&gt;] - Annoying &amp;quot;Cannot find FunctionDefinition&amp;quot; messages with SQL for f_proctime or =
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16308&quot;&gt;FLINK-16308&lt;/a&gt;] - SQL connector download links are broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16313&quot;&gt;FLINK-16313&lt;/a&gt;] - flink-state-processor-api: surefire execution unstable on Azure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16331&quot;&gt;FLINK-16331&lt;/a&gt;] - Remove source licenses for old WebUI
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16345&quot;&gt;FLINK-16345&lt;/a&gt;] - Computed column can not refer time attribute column
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16360&quot;&gt;FLINK-16360&lt;/a&gt;] - connector on hive 2.0.1 don&amp;#39;t support type conversion from STRING to VARCHAR
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16371&quot;&gt;FLINK-16371&lt;/a&gt;] - HadoopCompressionBulkWriter fails with &amp;#39;java.io.NotSerializableException&amp;#39;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16373&quot;&gt;FLINK-16373&lt;/a&gt;] - EmbeddedLeaderService: IllegalStateException: The RPC connection is already closed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16413&quot;&gt;FLINK-16413&lt;/a&gt;] - Reduce hive source parallelism when limit push down
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16414&quot;&gt;FLINK-16414&lt;/a&gt;] - create udaf/udtf function using sql causing ValidationException: SQL validation failed. null
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16433&quot;&gt;FLINK-16433&lt;/a&gt;] - TableEnvironmentImpl doesn&amp;#39;t clear buffered operations when it fails to translate the operation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16435&quot;&gt;FLINK-16435&lt;/a&gt;] - Replace since decorator with versionadd to mark the version an API was introduced
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16467&quot;&gt;FLINK-16467&lt;/a&gt;] - MemorySizeTest#testToHumanReadableString() is not portable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16526&quot;&gt;FLINK-16526&lt;/a&gt;] - Fix exception when computed column expression references a keyword column name
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16541&quot;&gt;FLINK-16541&lt;/a&gt;] - Document of table.exec.shuffle-mode is incorrect
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16550&quot;&gt;FLINK-16550&lt;/a&gt;] - HadoopS3* tests fail with NullPointerException exceptions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16560&quot;&gt;FLINK-16560&lt;/a&gt;] - Forward Configuration in PackagedProgramUtils#getPipelineFromProgram
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16567&quot;&gt;FLINK-16567&lt;/a&gt;] - Get the API error of the StreamQueryConfig on Page &amp;quot;Query Configuration&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16573&quot;&gt;FLINK-16573&lt;/a&gt;] - Kinesis consumer does not properly shutdown RecordFetcher threads
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16576&quot;&gt;FLINK-16576&lt;/a&gt;] - State inconsistency on restore with memory state backends
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16626&quot;&gt;FLINK-16626&lt;/a&gt;] - Prevent REST handler from being closed more than once
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16632&quot;&gt;FLINK-16632&lt;/a&gt;] - SqlDateTimeUtils#toSqlTimestamp(String, String) may yield incorrect result
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16635&quot;&gt;FLINK-16635&lt;/a&gt;] - Incompatible okio dependency in flink-metrics-influxdb module
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16646&quot;&gt;FLINK-16646&lt;/a&gt;] - flink read orc file throw a NullPointerException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16647&quot;&gt;FLINK-16647&lt;/a&gt;] - Miss file extension when inserting to hive table with compression
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16652&quot;&gt;FLINK-16652&lt;/a&gt;] - BytesColumnVector should init buffer in Hive 3.x
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16662&quot;&gt;FLINK-16662&lt;/a&gt;] - Blink Planner failed to generate JobGraph for POJO DataStream converting to Table (Cannot determine simple type name)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16664&quot;&gt;FLINK-16664&lt;/a&gt;] - Unable to set DataStreamSource parallelism to default (-1)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16675&quot;&gt;FLINK-16675&lt;/a&gt;] - TableEnvironmentITCase. testClearOperation fails on travis nightly build
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16684&quot;&gt;FLINK-16684&lt;/a&gt;] - StreamingFileSink builder does not work with Scala
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16696&quot;&gt;FLINK-16696&lt;/a&gt;] - Savepoint trigger documentation is insufficient
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16703&quot;&gt;FLINK-16703&lt;/a&gt;] - AkkaRpcActor state machine does not record transition to terminating state.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16705&quot;&gt;FLINK-16705&lt;/a&gt;] - LocalExecutor tears down MiniCluster before client can retrieve JobResult
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16718&quot;&gt;FLINK-16718&lt;/a&gt;] - KvStateServerHandlerTest leaks Netty ByteBufs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16727&quot;&gt;FLINK-16727&lt;/a&gt;] - Fix cast exception when having time point literal as parameters
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16732&quot;&gt;FLINK-16732&lt;/a&gt;] - Failed to call Hive UDF with constant return value
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16740&quot;&gt;FLINK-16740&lt;/a&gt;] - OrcSplitReaderUtil::logicalTypeToOrcType fails to create decimal type with precision &amp;lt; 10
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16759&quot;&gt;FLINK-16759&lt;/a&gt;] - HiveModuleTest failed to compile on release-1.10
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16767&quot;&gt;FLINK-16767&lt;/a&gt;] - Failed to read Hive table with RegexSerDe
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16771&quot;&gt;FLINK-16771&lt;/a&gt;] - NPE when filtering by decimal column
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16821&quot;&gt;FLINK-16821&lt;/a&gt;] - Run Kubernetes test failed with invalid named &amp;quot;minikube&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16822&quot;&gt;FLINK-16822&lt;/a&gt;] - The config set by SET command does not work
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16825&quot;&gt;FLINK-16825&lt;/a&gt;] - PrometheusReporterEndToEndITCase should rely on path returned by DownloadCache
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16836&quot;&gt;FLINK-16836&lt;/a&gt;] - Losing leadership does not clear rpc connection in JobManagerLeaderListener
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16860&quot;&gt;FLINK-16860&lt;/a&gt;] - Failed to push filter into OrcTableSource when upgrading to 1.9.2
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16888&quot;&gt;FLINK-16888&lt;/a&gt;] - Re-add jquery license file under &amp;quot;/licenses&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16901&quot;&gt;FLINK-16901&lt;/a&gt;] - Flink Kinesis connector NOTICE should have contents of AWS KPL&amp;#39;s THIRD_PARTY_NOTICES file manually merged in
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16913&quot;&gt;FLINK-16913&lt;/a&gt;] - ReadableConfigToConfigurationAdapter#getEnum throws UnsupportedOperationException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16916&quot;&gt;FLINK-16916&lt;/a&gt;] - The logic of NullableSerializer#copy is wrong
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16944&quot;&gt;FLINK-16944&lt;/a&gt;] - Compile error in DumpCompiledPlanTest and PreviewPlanDumpTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16980&quot;&gt;FLINK-16980&lt;/a&gt;] - Python UDF doesn&amp;#39;t work with protobuf 3.6.1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16981&quot;&gt;FLINK-16981&lt;/a&gt;] - flink-runtime tests are crashing the JVM on Java11 because of PowerMock
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17062&quot;&gt;FLINK-17062&lt;/a&gt;] - Fix the conversion from Java row type to Python row type
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17066&quot;&gt;FLINK-17066&lt;/a&gt;] - Update pyarrow version bounds less than 0.14.0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17093&quot;&gt;FLINK-17093&lt;/a&gt;] - Python UDF doesn&amp;#39;t work when the input column is from composite field
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17107&quot;&gt;FLINK-17107&lt;/a&gt;] - CheckpointCoordinatorConfiguration#isExactlyOnce() is inconsistent with StreamConfig#getCheckpointMode()
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17114&quot;&gt;FLINK-17114&lt;/a&gt;] - When the pyflink job runs in local mode and the command &amp;quot;python&amp;quot; points to Python 2.7, the startup of the Python UDF worker will fail.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17124&quot;&gt;FLINK-17124&lt;/a&gt;] - The PyFlink Job runs into infinite loop if the Python UDF imports job code
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17152&quot;&gt;FLINK-17152&lt;/a&gt;] - FunctionDefinitionUtil generate wrong resultType and acc type of AggregateFunctionDefinition
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17308&quot;&gt;FLINK-17308&lt;/a&gt;] - ExecutionGraphCache cachedExecutionGraphs not cleanup cause OOM Bug
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17313&quot;&gt;FLINK-17313&lt;/a&gt;] - Validation error when insert decimal/varchar with precision into sink using TypeInformation of row
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17334&quot;&gt;FLINK-17334&lt;/a&gt;] - Flink does not support HIVE UDFs with primitive return types
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17338&quot;&gt;FLINK-17338&lt;/a&gt;] - LocalExecutorITCase.testBatchQueryCancel test timeout
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17359&quot;&gt;FLINK-17359&lt;/a&gt;] - Entropy key is not resolved if flink-s3-fs-hadoop is added as a plugin
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17403&quot;&gt;FLINK-17403&lt;/a&gt;] - Fix invalid classpath in BashJavaUtilsITCase
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17471&quot;&gt;FLINK-17471&lt;/a&gt;] - Move LICENSE and NOTICE files to root directory of python distribution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17483&quot;&gt;FLINK-17483&lt;/a&gt;] - Update flink-sql-connector-elasticsearch7 NOTICE file to correctly reflect bundled dependencies
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17496&quot;&gt;FLINK-17496&lt;/a&gt;] - Performance regression with amazon-kinesis-producer 0.13.1 in Flink 1.10.x
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17499&quot;&gt;FLINK-17499&lt;/a&gt;] - LazyTimerService used to register timers via State Processing API incorrectly mixes event time timers with processing time timers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17514&quot;&gt;FLINK-17514&lt;/a&gt;] - TaskCancelerWatchdog does not kill TaskManager
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;New Feature&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17275&quot;&gt;FLINK-17275&lt;/a&gt;] - Add core training exercises
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Improvement&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9656&quot;&gt;FLINK-9656&lt;/a&gt;] - Environment java opts for flink run
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15094&quot;&gt;FLINK-15094&lt;/a&gt;] - Warning about using private constructor of java.nio.DirectByteBuffer in Java 11
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15584&quot;&gt;FLINK-15584&lt;/a&gt;] - Give nested data type of ROWs in ValidationException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15616&quot;&gt;FLINK-15616&lt;/a&gt;] - Move boot error messages from python-udf-boot.log to taskmanager&amp;#39;s log file
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15989&quot;&gt;FLINK-15989&lt;/a&gt;] - Rewrap OutOfMemoryError in allocateUnpooledOffHeap with better message
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16018&quot;&gt;FLINK-16018&lt;/a&gt;] - Improve error reporting when submitting batch job (instead of AskTimeoutException)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16125&quot;&gt;FLINK-16125&lt;/a&gt;] - Make zookeeper.connect optional for Kafka connectors
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16167&quot;&gt;FLINK-16167&lt;/a&gt;] - Update documentation about python shell execution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16191&quot;&gt;FLINK-16191&lt;/a&gt;] - Improve error message on Windows when RocksDB Paths are too long
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16280&quot;&gt;FLINK-16280&lt;/a&gt;] - Fix sample code errors in the documentation about elasticsearch connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16288&quot;&gt;FLINK-16288&lt;/a&gt;] - Setting the TTL for discarding task pods on Kubernetes.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16293&quot;&gt;FLINK-16293&lt;/a&gt;] - Document using plugins in Kubernetes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16343&quot;&gt;FLINK-16343&lt;/a&gt;] - Improve exception message when reading an unbounded source in batch mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16406&quot;&gt;FLINK-16406&lt;/a&gt;] - Increase default value for JVM Metaspace to minimise its OutOfMemoryError
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16538&quot;&gt;FLINK-16538&lt;/a&gt;] - Restructure Python Table API documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16604&quot;&gt;FLINK-16604&lt;/a&gt;] - Column key in JM configuration is too narrow
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16683&quot;&gt;FLINK-16683&lt;/a&gt;] - Remove scripts for starting Flink on Windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16697&quot;&gt;FLINK-16697&lt;/a&gt;] - Disable JMX rebinding
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16763&quot;&gt;FLINK-16763&lt;/a&gt;] - Should not use BatchTableEnvironment for Python UDF in the document of flink-1.10
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16772&quot;&gt;FLINK-16772&lt;/a&gt;] - Bump derby to 10.12.1.1+ or exclude it
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16790&quot;&gt;FLINK-16790&lt;/a&gt;] - enables the interpretation of backslash escapes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16862&quot;&gt;FLINK-16862&lt;/a&gt;] - Remove example url in quickstarts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16874&quot;&gt;FLINK-16874&lt;/a&gt;] - Respect the dynamic options when calculating memory options in taskmanager.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16942&quot;&gt;FLINK-16942&lt;/a&gt;] - ES 5 sink should allow users to select netty transport client
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17065&quot;&gt;FLINK-17065&lt;/a&gt;] - Add documentation about the Python versions supported for PyFlink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17125&quot;&gt;FLINK-17125&lt;/a&gt;] - Add a Usage Notes Page to Answer Common Questions Encountered by PyFlink Users
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17254&quot;&gt;FLINK-17254&lt;/a&gt;] - Improve the PyFlink documentation and examples to use SQL DDL for source/sink definition
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17276&quot;&gt;FLINK-17276&lt;/a&gt;] - Add checkstyle to training exercises
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17277&quot;&gt;FLINK-17277&lt;/a&gt;] - Apply IntelliJ recommendations to training exercises
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17278&quot;&gt;FLINK-17278&lt;/a&gt;] - Add Travis to the training exercises
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17279&quot;&gt;FLINK-17279&lt;/a&gt;] - Use gradle build scans for training exercises
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17316&quot;&gt;FLINK-17316&lt;/a&gt;] - Have HourlyTips solutions use TumblingEventTimeWindows.of
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Task&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15741&quot;&gt;FLINK-15741&lt;/a&gt;] - Fix TTL docs after enabling RocksDB compaction filter by default (needs Chinese translation)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15933&quot;&gt;FLINK-15933&lt;/a&gt;] - update content of how generic table schema is stored in hive via HiveCatalog
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15991&quot;&gt;FLINK-15991&lt;/a&gt;] - Create Chinese documentation for FLIP-49 TM memory model
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16004&quot;&gt;FLINK-16004&lt;/a&gt;] - Exclude flink-rocksdb-state-memory-control-test jars from the dist
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16454&quot;&gt;FLINK-16454&lt;/a&gt;] - Update the copyright year in NOTICE files
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16530&quot;&gt;FLINK-16530&lt;/a&gt;] - Add documentation about &amp;quot;GROUPING SETS&amp;quot; and &amp;quot;CUBE&amp;quot; support in streaming mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16592&quot;&gt;FLINK-16592&lt;/a&gt;] - The doc of Streaming File Sink has a mistake of grammar
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Tue, 12 May 2020 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/05/12/release-1.10.1.html</link>
<guid isPermaLink="true">/news/2020/05/12/release-1.10.1.html</guid>
</item>
<item>
<title>Flink Community Update - May&#39;20</title>
<description>&lt;p&gt;Can you smell it? It’s release month! It took a while, but now that we’re &lt;a href=&quot;https://flink.apache.org/news/2020/04/01/community-update.html&quot;&gt;all caught up with the past&lt;/a&gt;, the Community Update is here to stay. This time around, we’re warming up for Flink 1.11 and peeping back to the month of April in the Flink community — with the release of Stateful Functions 2.0, a new self-paced Flink training and some efforts to improve the Flink documentation experience.&lt;/p&gt;
&lt;p&gt;Last month also marked the debut of Flink Forward Virtual Conference 2020: what did you think? If you missed it altogether or just want to recap some of the sessions, the &lt;a href=&quot;https://www.youtube.com/playlist?list=PLDX4T_cnKjD0ngnBSU-bYGfgVv17MiwA7&quot;&gt;videos&lt;/a&gt; and &lt;a href=&quot;https://www.slideshare.net/FlinkForward&quot;&gt;slides&lt;/a&gt; are now available!&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#the-past-month-in-flink&quot; id=&quot;markdown-toc-the-past-month-in-flink&quot;&gt;The Past Month in Flink&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#flink-stateful-functions-20-is-out&quot; id=&quot;markdown-toc-flink-stateful-functions-20-is-out&quot;&gt;Flink Stateful Functions 2.0 is out!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#warming-up-for-flink-111&quot; id=&quot;markdown-toc-warming-up-for-flink-111&quot;&gt;Warming up for Flink 1.11&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#flink-minor-releases&quot; id=&quot;markdown-toc-flink-minor-releases&quot;&gt;Flink Minor Releases&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#flink-193&quot; id=&quot;markdown-toc-flink-193&quot;&gt;Flink 1.9.3&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#flink-1101&quot; id=&quot;markdown-toc-flink-1101&quot;&gt;Flink 1.10.1&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#new-committers-and-pmc-members&quot; id=&quot;markdown-toc-new-committers-and-pmc-members&quot;&gt;New Committers and PMC Members&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#new-pmc-members&quot; id=&quot;markdown-toc-new-pmc-members&quot;&gt;New PMC Members&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#new-committers&quot; id=&quot;markdown-toc-new-committers&quot;&gt;New Committers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-bigger-picture&quot; id=&quot;markdown-toc-the-bigger-picture&quot;&gt;The Bigger Picture&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#a-new-self-paced-apache-flink-training&quot; id=&quot;markdown-toc-a-new-self-paced-apache-flink-training&quot;&gt;A new self-paced Apache Flink training&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#google-season-of-docs-2020&quot; id=&quot;markdown-toc-google-season-of-docs-2020&quot;&gt;Google Season of Docs 2020&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#and-something-to-read&quot; id=&quot;markdown-toc-and-something-to-read&quot;&gt;…and something to read!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h1 id=&quot;the-past-month-in-flink&quot;&gt;The Past Month in Flink&lt;/h1&gt;
&lt;h2 id=&quot;flink-stateful-functions-20-is-out&quot;&gt;Flink Stateful Functions 2.0 is out!&lt;/h2&gt;
&lt;p&gt;In the beginning of April, the Flink community announced the &lt;a href=&quot;https://flink.apache.org/news/2020/04/07/release-statefun-2.0.0.html&quot;&gt;release of Stateful Functions 2.0&lt;/a&gt; — the first as part of the Apache Flink project. From this release, you can use Flink as the base of a (stateful) serverless platform with out-of-the-box consistent and scalable state, and efficient messaging between functions. You can even run your stateful functions on platforms like AWS Lambda, as Gordon (&lt;a href=&quot;https://twitter.com/tzulitai&quot;&gt;@tzulitai&lt;/a&gt;) demonstrated in &lt;a href=&quot;https://www.youtube.com/watch?v=tuSylBadNSo&amp;amp;list=PLDX4T_cnKjD0ngnBSU-bYGfgVv17MiwA7&amp;amp;index=27&amp;amp;t=8s&quot;&gt;his Flink Forward talk&lt;/a&gt;.&lt;/p&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-05-06-community-update/2020-05-06-community-update_2.png&quot; width=&quot;550px&quot; alt=&quot;Stateful Functions&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;It’s been encouraging to see so many questions about Stateful Functions popping up in the &lt;a href=&quot;https://lists.apache.org/list.html?user@flink.apache.org:lte=3M:statefun&quot;&gt;mailing list&lt;/a&gt; and Stack Overflow! If you’d like to get involved, we’re always &lt;a href=&quot;https://github.com/apache/flink-statefun#contributing&quot;&gt;looking for new contributors&lt;/a&gt; — especially around SDKs for other languages like Go, Javascript and Rust.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&quot;warming-up-for-flink-111&quot;&gt;Warming up for Flink 1.11&lt;/h2&gt;
&lt;p&gt;The final preparations for the release of Flink 1.11 are well underway, with the feature freeze scheduled for May 15th, and there are a lot of new features and improvements to look out for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;On the &lt;strong&gt;usability&lt;/strong&gt; side, you can expect a big focus on smoothing data ingestion with contributions like support for Change Data Capture (CDC) in the Table API/SQL (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL&quot;&gt;FLIP-105&lt;/a&gt;), easy streaming data ingestion into Apache Hive (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-115%3A+Filesystem+connector+in+Table&quot;&gt;FLIP-115&lt;/a&gt;) or support for Pandas DataFrames in PyFlink (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-120%3A+Support+conversion+between+PyFlink+Table+and+Pandas+DataFrame&quot;&gt;FLIP-120&lt;/a&gt;). A great deal of effort has also gone into maturing PyFlink, with the introduction of user defined metrics in Python UDFs (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-112%3A+Support+User-Defined+Metrics+in++Python+UDF&quot;&gt;FLIP-112&lt;/a&gt;) and the extension of Python UDF support beyond the Python Table API (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-106%3A+Support+Python+UDF+in+SQL+Function+DDL&quot;&gt;FLIP-106&lt;/a&gt;,&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-114%3A+Support+Python+UDF+in+SQL+Client&quot;&gt;FLIP-114&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;On the &lt;strong&gt;operational&lt;/strong&gt; side, the much-anticipated new Source API (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface&quot;&gt;FLIP-27&lt;/a&gt;) will unify batch and streaming sources and improve out-of-the-box event-time behavior, while unaligned checkpoints (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints&quot;&gt;FLIP-76&lt;/a&gt;) and changes to network memory management will make it possible to speed up checkpointing under backpressure — this is part of a bigger effort to rethink fault tolerance that will introduce many other non-trivial changes to Flink. You can learn more about it in &lt;a href=&quot;https://youtu.be/ssEmeLcL5Uk&quot;&gt;this&lt;/a&gt; recent Flink Forward talk!&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Throw into the mix improvements around type systems, the WebUI, metrics reporting, supported formats and…we can’t wait! To get an overview of the ongoing developments, have a look at &lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Development-progress-of-Apache-Flink-1-11-tp40718.html&quot;&gt;this thread&lt;/a&gt;. We encourage the community to get involved in testing once an RC (Release Candidate) is out. Keep an eye on the &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;@dev mailing list&lt;/a&gt; for updates!&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&quot;flink-minor-releases&quot;&gt;Flink Minor Releases&lt;/h2&gt;
&lt;h3 id=&quot;flink-193&quot;&gt;Flink 1.9.3&lt;/h3&gt;
&lt;p&gt;The community released Flink 1.9.3, covering some outstanding bugs from Flink 1.9! You can find more details in the &lt;a href=&quot;https://flink.apache.org/news/2020/04/24/release-1.9.3.html&quot;&gt;announcement blogpost&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;flink-1101&quot;&gt;Flink 1.10.1&lt;/h3&gt;
&lt;p&gt;Also in the pipeline is the release of Flink 1.10.1, already in the &lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-1-10-1-release-candidate-2-td41019.html&quot;&gt;RC voting&lt;/a&gt; phase. So, you can expect Flink 1.10.1 to be released soon!&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&quot;new-committers-and-pmc-members&quot;&gt;New Committers and PMC Members&lt;/h2&gt;
&lt;p&gt;The Apache Flink community has welcomed &lt;strong&gt;3 PMC Members&lt;/strong&gt; and &lt;strong&gt;2 new Committers&lt;/strong&gt; since the last update. Congratulations!&lt;/p&gt;
&lt;h3 id=&quot;new-pmc-members&quot;&gt;New PMC Members&lt;/h3&gt;
&lt;div class=&quot;row&quot;&gt;
&lt;div class=&quot;col-lg-3&quot;&gt;
&lt;div class=&quot;text-center&quot;&gt;
&lt;img class=&quot;img-circle&quot; src=&quot;https://avatars2.githubusercontent.com/u/6242259?s=400&amp;amp;u=6e39f4fdbabc8ce4ccde9125166f791957d3ae80&amp;amp;v=4&quot; width=&quot;90&quot; height=&quot;90&quot; /&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/dwysakowicz&quot;&gt;Dawid Wysakowicz&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;col-lg-3&quot;&gt;
&lt;div class=&quot;text-center&quot;&gt;
&lt;img class=&quot;img-circle&quot; src=&quot;https://avatars1.githubusercontent.com/u/4971479?s=400&amp;amp;u=49d4f217e26186606ab13a17a23a038b62b86682&amp;amp;v=4&quot; width=&quot;90&quot; height=&quot;90&quot; /&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/HequnC&quot;&gt;Hequn Cheng&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;col-lg-3&quot;&gt;
&lt;div class=&quot;text-center&quot;&gt;
&lt;img class=&quot;img-circle&quot; src=&quot;https://avatars3.githubusercontent.com/u/12387855?s=400&amp;amp;u=37edbfccb6908541f359433f420f9f1bc25bc714&amp;amp;v=4&quot; width=&quot;90&quot; height=&quot;90&quot; /&gt;
&lt;p&gt;Zhijiang Wang&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;h3 id=&quot;new-committers&quot;&gt;New Committers&lt;/h3&gt;
&lt;div class=&quot;row&quot;&gt;
&lt;div class=&quot;col-lg-3&quot;&gt;
&lt;div class=&quot;text-center&quot;&gt;
&lt;img class=&quot;img-circle&quot; src=&quot;https://avatars3.githubusercontent.com/u/11538663?s=400&amp;amp;u=f4643f1981e2a8f8a1962c34511b0d32a31d9502&amp;amp;v=4&quot; width=&quot;90&quot; height=&quot;90&quot; /&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/snntrable&quot;&gt;Konstantin Knauf&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;col-lg-3&quot;&gt;
&lt;div class=&quot;text-center&quot;&gt;
&lt;img class=&quot;img-circle&quot; src=&quot;https://avatars1.githubusercontent.com/u/1891970?s=400&amp;amp;u=b7718355ceb1f4a8d1e554c3ae7221e2f32cc8e0&amp;amp;v=4&quot; width=&quot;90&quot; height=&quot;90&quot; /&gt;
&lt;p&gt;&lt;a href=&quot;https://twitter.com/sjwiesman&quot;&gt;Seth Wiesman&lt;/a&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;hr /&gt;
&lt;h1 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture&lt;/h1&gt;
&lt;h2 id=&quot;a-new-self-paced-apache-flink-training&quot;&gt;A new self-paced Apache Flink training&lt;/h2&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;This week, the Flink website received the invaluable contribution of a self-paced training course curated by David (&lt;a href=&quot;https://twitter.com/alpinegizmo&quot;&gt;@alpinegizmo&lt;/a&gt;) — or, what used to be the entire set of training materials under &lt;a href=&quot;https://training.ververica.com&quot;&gt;training.ververica.com&lt;/a&gt;. The new materials guide you through the very basics of Flink and the DataStream API, and round off each concept section with hands-on exercises to help you better assimilate what you learned.&lt;/p&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-05-06-community-update/2020-05-06-community-update_1.png&quot; width=&quot;1000px&quot; alt=&quot;Self-paced Flink Training&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:140%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;Whether you’re new to Flink or just looking to strengthen your foundations, this training is the most comprehensive way to get started and is now completely open source: &lt;a href=&quot;https://flink.apache.org/training.html&quot;&gt;https://flink.apache.org/training.html&lt;/a&gt;. For now, the materials are only available in English, but the community intends to also provide a Chinese translation in the future.&lt;/p&gt;
&lt;hr /&gt;
&lt;h2 id=&quot;google-season-of-docs-2020&quot;&gt;Google Season of Docs 2020&lt;/h2&gt;
&lt;p&gt;Google Season of Docs (GSOD) is a great initiative organized by &lt;a href=&quot;https://opensource.google.com/&quot;&gt;Google Open Source&lt;/a&gt; to pair technical writers with mentors to work on documentation for open source projects. Last year, the Flink community submitted &lt;a href=&quot;https://flink.apache.org/news/2019/04/17/sod.html&quot;&gt;an application&lt;/a&gt; that unfortunately didn’t make the cut — but we are trying again! This time, with a project idea to improve the Table API &amp;amp; SQL documentation:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1) Restructure the Table API &amp;amp; SQL Documentation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Reworking the current documentation structure would make it possible to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Lower the entry barrier to Flink for non-programmatic (i.e. SQL) users.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Make the available features more easily discoverable.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Improve the flow and logical correlation of topics.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=127405685&quot;&gt;FLIP-60&lt;/a&gt; contains a detailed proposal on how to reorganize the existing documentation, which can be used as a starting point.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2) Extend the Table API &amp;amp; SQL Documentation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Some areas of the documentation have insufficient detail or are not &lt;a href=&quot;https://flink.apache.org/contributing/docs-style.html#general-guiding-principles&quot;&gt;accessible&lt;/a&gt; for new Flink users. Examples of topics and sections that require attention include planners, built-in functions, connectors, and the overview and concepts sections. There is a lot of work to be done, and the technical writer could choose which areas to focus on — these improvements could then be added to the documentation rework umbrella issue (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12639&quot;&gt;FLINK-12639&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;If you’re interested in learning more about this project idea or want to get involved in GSoD as a technical writer, check out the &lt;a href=&quot;https://flink.apache.org/news/2020/05/04/season-of-docs.html&quot;&gt;announcement blogpost&lt;/a&gt;.&lt;/p&gt;
&lt;hr /&gt;
&lt;h1 id=&quot;and-something-to-read&quot;&gt;…and something to read!&lt;/h1&gt;
&lt;p&gt;Events across the globe have pretty much come to a halt, so we’ll leave you with some interesting resources to read and explore instead. In addition to this written content, you can also recap the sessions from the &lt;a href=&quot;https://www.youtube.com/playlist?list=PLDX4T_cnKjD0ngnBSU-bYGfgVv17MiwA7&quot;&gt;Flink Forward Virtual Conference&lt;/a&gt;!&lt;/p&gt;
&lt;table class=&quot;table table-bordered&quot;&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Links&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span class=&quot;glyphicon glyphicon glyphicon-bookmark&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Blogposts&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://medium.com/@abdelkrim.hadjidj/event-driven-supply-chain-for-crisis-with-flinksql-be80cb3ad4f9&quot;&gt;Event-Driven Supply Chain for Crisis with FlinkSQL and Zeppelin&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://flink.apache.org/news/2020/04/21/memory-management-improvements-flink-1.10.html&quot;&gt;Memory Management Improvements with Apache Flink 1.10&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html&quot;&gt;Flink Serialization Tuning Vol. 1: Choosing your Serializer — if you can&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span class=&quot;glyphicon glyphicon-console&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Tutorials&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://flink.apache.org/2020/04/09/pyflink-udf-support-flink.html&quot;&gt;PyFlink: Introducing Python Support for UDFs in Flink&#39;s Table API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://dev.to/morsapaes/flink-stateful-functions-where-to-start-2j39&quot;&gt;Flink Stateful Functions: where to start?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span class=&quot;glyphicon glyphicon glyphicon-certificate&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Flink Packages&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;&lt;p&gt;&lt;a href=&quot;https://flink-packages.org/&quot;&gt;Flink Packages&lt;/a&gt; is a website where you can explore (and contribute to) the Flink &lt;br /&gt; ecosystem of connectors, extensions, APIs, tools and integrations. &lt;b&gt;New in:&lt;/b&gt; &lt;/p&gt;
&lt;li&gt;&lt;a href=&quot;https://flink-packages.org/packages/spillable-state-backend-for-flink&quot;&gt;Spillable State Backend for Flink&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flink-packages.org/packages/flink-memory-calculator&quot;&gt;Flink Memory Calculator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flink-packages.org/packages/ververica-platform-community-edition&quot;&gt;Ververica Platform Community Edition&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If you’d like to keep a closer eye on what’s happening in the community, subscribe to the Flink &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;@community mailing list&lt;/a&gt; to get fine-grained weekly updates, upcoming event announcements and more.&lt;/p&gt;
</description>
<pubDate>Thu, 07 May 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/05/07/community-update.html</link>
<guid isPermaLink="true">/news/2020/05/07/community-update.html</guid>
</item>
<item>
<title>Applying to Google Season of Docs 2020</title>
<description>&lt;p&gt;The Flink community is thrilled to share that the project is applying again to &lt;a href=&quot;https://developers.google.com/season-of-docs/&quot;&gt;Google Season of Docs&lt;/a&gt; (GSoD) this year! If you’re unfamiliar with the program, GSoD is a great initiative organized by &lt;a href=&quot;https://opensource.google.com/&quot;&gt;Google Open Source&lt;/a&gt; to pair technical writers with mentors to work on documentation for open source projects. The &lt;a href=&quot;https://developers.google.com/season-of-docs/docs/2019/participants&quot;&gt;first edition&lt;/a&gt; supported over 40 projects, including some other cool Apache Software Foundation (ASF) members like Apache Airflow and Apache Cassandra.&lt;/p&gt;
&lt;h1 id=&quot;why-apply&quot;&gt;Why Apply?&lt;/h1&gt;
&lt;p&gt;As one of the most active projects in the ASF, Flink is experiencing a boom in contributions and some major changes to its codebase. And, while the project has also seen a significant increase in activity when it comes to writing, reviewing and translating documentation, it’s hard to keep up with the pace.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-05-04-season-of-docs/2020-04-30-season-of-docs_1.png&quot; width=&quot;650px&quot; alt=&quot;GitHub 1&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;Since last year, the community has been working on &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-42%3A+Rework+Flink+Documentation&quot;&gt;FLIP-42&lt;/a&gt; to improve the documentation experience and bring a more accessible information architecture to Flink. After &lt;a href=&quot;https://www.mail-archive.com/dev@flink.apache.org/msg36987.html&quot;&gt;some discussion&lt;/a&gt;, we agreed that GSoD would be a valuable opportunity to double down on this effort and collaborate with someone who is passionate about technical writing…and Flink!&lt;/p&gt;
&lt;h1 id=&quot;how-can-you-contribute&quot;&gt;How can you contribute?&lt;/h1&gt;
&lt;p&gt;If working shoulder to shoulder with the Flink community on documentation sounds exciting, we’d love to hear from you! You can read more about our idea for this year’s project below and, depending on whether it is accepted, &lt;a href=&quot;https://developers.google.com/season-of-docs/docs/tech-writer-guide&quot;&gt;apply&lt;/a&gt; as a technical writer. If you have any questions or just want to know more about the project idea, ping us at &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;dev@flink.apache.org&lt;/a&gt;!&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
Please &lt;a href=&quot;mailto:dev-subscribe@flink.apache.org&quot;&gt;subscribe&lt;/a&gt; to the Apache Flink mailing list before reaching out.
If you are not subscribed then responses to your message will not go through.
You can always &lt;a href=&quot;mailto:dev-unsubscribe@flink.apache.org&quot;&gt;unsubscribe&lt;/a&gt; at any time.
&lt;/div&gt;
&lt;h2 id=&quot;project-improve-the-table-api--sql-documentation&quot;&gt;Project: Improve the Table API &amp;amp; SQL Documentation&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://flink.apache.org/&quot;&gt;Apache Flink&lt;/a&gt; is a stateful stream processor supporting a broad set of use cases and featuring APIs at different levels of abstraction that allow users to trade off expressiveness and usability, as well as work with their language of choice (Java/Scala, SQL or Python). The Table API &amp;amp; SQL are Flink’s high-level relational abstractions and focus on data analytics use cases. A core principle is that either API can be used to process static (batch) and continuous (streaming) data with the same syntax, yielding the same results.&lt;/p&gt;
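&lt;p&gt;As a quick, hypothetical illustration of this principle (the table name &lt;code&gt;user_clicks&lt;/code&gt; is made up and assumed to be registered beforehand), the following sketch submits the same SQL query through the Table API; whether the underlying data is bounded (batch) or unbounded (streaming) only changes the execution mode, not the query or its result:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// the environment can be created in streaming or in batch mode; the query stays the same
EnvironmentSettings settings = EnvironmentSettings.newInstance()
        .useBlinkPlanner()
        .inStreamingMode()   // or .inBatchMode()
        .build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

// &quot;user_clicks&quot; is a hypothetical table, e.g. registered via a CREATE TABLE statement
Table result = tableEnv.sqlQuery(
        &quot;SELECT user_id, COUNT(*) AS clicks FROM user_clicks GROUP BY user_id&quot;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;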
&lt;p&gt;As the Flink community works on extending the scope of the Table API &amp;amp; SQL, a lot of new features are being added and some underlying structures are also being refactored. At the same time, the documentation for these APIs is growing into a somewhat unruly structure and has potential for improvement in some areas.&lt;/p&gt;
&lt;p&gt;The project has two main workstreams: restructuring and extending the Table API &amp;amp; SQL documentation. These can be worked on by one person as a bigger effort or assigned to different technical writers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1) Restructure the Table API &amp;amp; SQL Documentation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Reworking the current documentation structure would make it possible to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Lower the entry barrier to Flink for non-programmatic (i.e. SQL) users.&lt;/li&gt;
&lt;li&gt;Make the available features more easily discoverable.&lt;/li&gt;
&lt;li&gt;Improve the flow and logical correlation of topics.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=127405685&quot;&gt;FLIP-60&lt;/a&gt; contains a detailed proposal on how to reorganize the existing documentation, which can be used as a starting point.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2) Extend the Table API &amp;amp; SQL Documentation&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Some areas of the documentation have insufficient detail or are not &lt;a href=&quot;https://flink.apache.org/contributing/docs-style.html#general-guiding-principles&quot;&gt;accessible&lt;/a&gt; for new Flink users. Examples of topics and sections that require attention include planners, built-in functions, connectors, and the overview and concepts sections. There is a lot of work to be done, and the technical writer could choose which areas to focus on — these improvements could then be added to the documentation rework umbrella issue (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12639&quot;&gt;FLINK-12639&lt;/a&gt;).&lt;/p&gt;
&lt;h3 id=&quot;project-mentors&quot;&gt;Project Mentors&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://twitter.com/aljoscha&quot;&gt;Aljoscha Krettek&lt;/a&gt; (Apache Flink and Apache Beam PMC Member)&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://twitter.com/sjwiesman&quot;&gt;Seth Wiesman&lt;/a&gt; (Apache Flink Committer)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;related-resources&quot;&gt;Related Resources&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;FLIP-60: &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=127405685&quot;&gt;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=127405685&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Table API &amp;amp; SQL Documentation: &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;How to Contribute Documentation: &lt;a href=&quot;https://flink.apache.org/contributing/contribute-documentation.html&quot;&gt;https://flink.apache.org/contributing/contribute-documentation.html&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Documentation Style Guide: &lt;a href=&quot;https://flink.apache.org/contributing/docs-style.html&quot;&gt;https://flink.apache.org/contributing/docs-style.html&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We look forward to receiving feedback on this GSoD application and also to continue improving the documentation experience for Flink users. Join us!&lt;/p&gt;
</description>
<pubDate>Mon, 04 May 2020 08:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/05/04/season-of-docs.html</link>
<guid isPermaLink="true">/news/2020/05/04/season-of-docs.html</guid>
</item>
<item>
<title>Apache Flink 1.9.3 Released</title>
<description>&lt;p&gt;The Apache Flink community released the third bugfix version of the Apache Flink 1.9 series.&lt;/p&gt;
&lt;p&gt;This release includes 38 fixes and minor improvements for Flink 1.9.2. The list below details all fixes and improvements.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.9.3.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15143&quot;&gt;FLINK-15143&lt;/a&gt;] - Create document for FLIP-49 TM memory model and configuration guide
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16389&quot;&gt;FLINK-16389&lt;/a&gt;] - Bump Kafka 0.10 to 0.10.2.2
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11193&quot;&gt;FLINK-11193&lt;/a&gt;] - Rocksdb timer service factory configuration option is not settable per job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14316&quot;&gt;FLINK-14316&lt;/a&gt;] - Stuck in &amp;quot;Job leader ... lost leadership&amp;quot; error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14560&quot;&gt;FLINK-14560&lt;/a&gt;] - The value of taskmanager.memory.size in flink-conf.yaml is set to zero will cause taskmanager not to work
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15010&quot;&gt;FLINK-15010&lt;/a&gt;] - Temp directories flink-netty-shuffle-* are not cleaned up
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15085&quot;&gt;FLINK-15085&lt;/a&gt;] - HistoryServer dashboard config json out of sync
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15386&quot;&gt;FLINK-15386&lt;/a&gt;] - SingleJobSubmittedJobGraphStore.putJobGraph has a logic error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15575&quot;&gt;FLINK-15575&lt;/a&gt;] - Azure Filesystem Shades Wrong Package &amp;quot;httpcomponents&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15638&quot;&gt;FLINK-15638&lt;/a&gt;] - releasing/create_release_branch.sh does not set version in flink-python/pyflink/version.py
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15812&quot;&gt;FLINK-15812&lt;/a&gt;] - HistoryServer archiving is done in Dispatcher main thread
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15844&quot;&gt;FLINK-15844&lt;/a&gt;] - Removal of JobWithJars.buildUserCodeClassLoader method without Configuration breaks backwards compatibility
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15863&quot;&gt;FLINK-15863&lt;/a&gt;] - Fix docs stating that savepoints are relocatable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16047&quot;&gt;FLINK-16047&lt;/a&gt;] - Blink planner produces wrong aggregate results with state clean up
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16242&quot;&gt;FLINK-16242&lt;/a&gt;] - BinaryGeneric serialization error cause checkpoint failure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16308&quot;&gt;FLINK-16308&lt;/a&gt;] - SQL connector download links are broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16373&quot;&gt;FLINK-16373&lt;/a&gt;] - EmbeddedLeaderService: IllegalStateException: The RPC connection is already closed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16573&quot;&gt;FLINK-16573&lt;/a&gt;] - Kinesis consumer does not properly shutdown RecordFetcher threads
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16576&quot;&gt;FLINK-16576&lt;/a&gt;] - State inconsistency on restore with memory state backends
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16696&quot;&gt;FLINK-16696&lt;/a&gt;] - Savepoint trigger documentation is insufficient
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16703&quot;&gt;FLINK-16703&lt;/a&gt;] - AkkaRpcActor state machine does not record transition to terminating state.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16836&quot;&gt;FLINK-16836&lt;/a&gt;] - Losing leadership does not clear rpc connection in JobManagerLeaderListener
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16860&quot;&gt;FLINK-16860&lt;/a&gt;] - Failed to push filter into OrcTableSource when upgrading to 1.9.2
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16916&quot;&gt;FLINK-16916&lt;/a&gt;] - The logic of NullableSerializer#copy is wrong
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-17062&quot;&gt;FLINK-17062&lt;/a&gt;] - Fix the conversion from Java row type to Python row type
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14278&quot;&gt;FLINK-14278&lt;/a&gt;] - Pass in ioExecutor into AbstractDispatcherResourceManagerComponentFactory
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15908&quot;&gt;FLINK-15908&lt;/a&gt;] - Add description of support &amp;#39;pip install&amp;#39; to 1.9.x documents
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15909&quot;&gt;FLINK-15909&lt;/a&gt;] - Add PyPI release process into the subsequent release of 1.9.x
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15938&quot;&gt;FLINK-15938&lt;/a&gt;] - Idle state not cleaned in StreamingJoinOperator and StreamingSemiAntiJoinOperator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16018&quot;&gt;FLINK-16018&lt;/a&gt;] - Improve error reporting when submitting batch job (instead of AskTimeoutException)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16031&quot;&gt;FLINK-16031&lt;/a&gt;] - Improve the description in the README file of PyFlink 1.9.x
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16167&quot;&gt;FLINK-16167&lt;/a&gt;] - Update documentation about python shell execution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16280&quot;&gt;FLINK-16280&lt;/a&gt;] - Fix sample code errors in the documentation about elasticsearch connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16697&quot;&gt;FLINK-16697&lt;/a&gt;] - Disable JMX rebinding
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16862&quot;&gt;FLINK-16862&lt;/a&gt;] - Remove example url in quickstarts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16942&quot;&gt;FLINK-16942&lt;/a&gt;] - ES 5 sink should allow users to select netty transport client
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11767&quot;&gt;FLINK-11767&lt;/a&gt;] - Introduce new TypeSerializerUpgradeTestBase, new PojoSerializerUpgradeTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-16454&quot;&gt;FLINK-16454&lt;/a&gt;] - Update the copyright year in NOTICE files
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Fri, 24 Apr 2020 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/04/24/release-1.9.3.html</link>
<guid isPermaLink="true">/news/2020/04/24/release-1.9.3.html</guid>
</item>
<item>
<title>Memory Management Improvements with Apache Flink 1.10</title>
<description>&lt;p&gt;Apache Flink 1.10 comes with significant changes to the memory model of the Task Managers and to the configuration options for your Flink applications. These recently-introduced changes make Flink more adaptable to all kinds of deployment environments (e.g. Kubernetes, Yarn, Mesos), providing strict control over its memory consumption. In this post, we describe Flink’s memory model as it stands in Flink 1.10, how to set up and manage the memory consumption of your Flink applications, and the recent changes the community implemented in the latest Apache Flink release.&lt;/p&gt;
&lt;h2 id=&quot;introduction-to-flinks-memory-model&quot;&gt;Introduction to Flink’s memory model&lt;/h2&gt;
&lt;p&gt;Having a clear understanding of Apache Flink’s memory model allows you to manage resources for the various workloads more efficiently. The following diagram illustrates the main memory components in Flink:&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-04-21-memory-management-improvements-flink-1.10/total-process-memory.svg&quot; width=&quot;400px&quot; alt=&quot;Flink: Total Process Memory&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Flink: Total Process Memory&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;The Task Manager process is a JVM process. On a high level, its memory consists of the &lt;em&gt;JVM Heap&lt;/em&gt; and &lt;em&gt;Off-Heap&lt;/em&gt; memory. These types of memory are consumed either by Flink directly or by the JVM for its own purposes (e.g. metaspace). There are two major memory consumers within Flink: the user code of job operator tasks and the framework itself, which consumes memory for internal data structures, network buffers, etc.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Please note that&lt;/strong&gt; the user code has direct access to all memory types: &lt;em&gt;JVM Heap, Direct&lt;/em&gt; and &lt;em&gt;Native memory&lt;/em&gt;. Therefore, Flink cannot really control its allocation and usage. There are however two types of Off-Heap memory which are consumed by tasks and controlled explicitly by Flink:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Managed Memory&lt;/em&gt; (Off-Heap)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Network Buffers&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The latter is part of the &lt;em&gt;JVM Direct Memory&lt;/em&gt;, allocated for user record data exchange between operator tasks.&lt;/p&gt;
&lt;h2 id=&quot;how-to-set-up-flink-memory&quot;&gt;How to set up Flink memory&lt;/h2&gt;
&lt;p&gt;With the latest release, Flink 1.10 provides both high-level and fine-grained tuning of memory components in order to offer a better user experience. There are essentially three alternatives for setting up memory in Task Managers.&lt;/p&gt;
&lt;p&gt;The first two — and simplest — alternatives are configuring one of the two following options for total memory available for the JVM process of the Task Manager:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Total Process Memory&lt;/em&gt;: total memory consumed by the Flink Java application (including user code) and by the JVM to run the whole process.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Total Flink Memory&lt;/em&gt;: only the memory consumed by the Flink Java application, including user code but excluding memory allocated by the JVM to run it.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is advisable to configure the &lt;em&gt;Total Flink Memory&lt;/em&gt; for standalone deployments where explicitly declaring how much memory is given to Flink is a common practice, while the outer &lt;em&gt;JVM overhead&lt;/em&gt; is of little interest. For the cases of deploying Flink in containerized environments (such as &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/kubernetes.html&quot;&gt;Kubernetes&lt;/a&gt;, &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/yarn_setup.html&quot;&gt;Yarn&lt;/a&gt; or &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/mesos.html&quot;&gt;Mesos&lt;/a&gt;), the &lt;em&gt;Total Process Memory&lt;/em&gt; option is recommended instead, because it becomes the size for the total memory of the requested container. Containerized environments usually strictly enforce this memory limit.&lt;/p&gt;
&lt;p&gt;If you want more fine-grained control over the size of the &lt;em&gt;JVM Heap&lt;/em&gt; and &lt;em&gt;Managed Memory&lt;/em&gt; (Off-Heap), there is also a third alternative: configuring both the &lt;em&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_setup.html#task-operator-heap-memory&quot;&gt;Task Heap&lt;/a&gt;&lt;/em&gt; and &lt;em&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_setup.html#managed-memory&quot;&gt;Managed Memory&lt;/a&gt;&lt;/em&gt; explicitly. This alternative gives a clear separation between the heap memory and any other memory types.&lt;/p&gt;
&lt;p&gt;In line with the community’s efforts to &lt;a href=&quot;https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;unify batch and stream processing&lt;/a&gt;, this model works universally for both scenarios. It allows sharing the &lt;em&gt;JVM Heap&lt;/em&gt; memory between the user code of operator tasks in any workload and the heap state backend in stream processing scenarios. In a similar way, the &lt;em&gt;Managed Memory&lt;/em&gt; can be used for batch spilling and for the RocksDB state backend in streaming.&lt;/p&gt;
&lt;p&gt;The remaining memory components are automatically adjusted either based on their default values or additionally configured parameters. Flink also checks the overall consistency. You can find more information about the different memory components in the corresponding &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html&quot;&gt;documentation&lt;/a&gt;. Additionally, you can try different configuration options with the &lt;a href=&quot;https://docs.google.com/spreadsheets/d/1mJaMkMPfDJJ-w6nMXALYmTc4XxiV30P5U7DzgwLkSoE/edit#gid=0&quot;&gt;configuration spreadsheet&lt;/a&gt; of &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors&quot;&gt;FLIP-49&lt;/a&gt; and check the corresponding results for your individual case.&lt;/p&gt;
&lt;p&gt;If you are migrating from a Flink version older than 1.10, we suggest following the steps in the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_migration.html&quot;&gt;migration guide&lt;/a&gt; of the Flink documentation.&lt;/p&gt;
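&lt;p&gt;As a minimal &lt;code&gt;flink-conf.yaml&lt;/code&gt; sketch, the three alternatives described in this section map to the following configuration options (the sizes are arbitrary example values, not recommendations; in practice you would pick exactly one alternative rather than combining them):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;# Alternative 1: total memory of the Task Manager JVM process (recommended for containers)
taskmanager.memory.process.size: 4096m

# Alternative 2: memory used by Flink only, excluding JVM Metaspace and JVM Overhead
# (recommended for standalone deployments)
# taskmanager.memory.flink.size: 3072m

# Alternative 3: fine-grained control over Task Heap and Managed Memory
# taskmanager.memory.task.heap.size: 1024m
# taskmanager.memory.managed.size: 1024m&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;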
&lt;h2 id=&quot;other-components&quot;&gt;Other components&lt;/h2&gt;
&lt;p&gt;While configuring Flink’s memory, the size of the different memory components can either be fixed with the value of the respective option or tuned using multiple options. Below we provide some more insight into the memory setup.&lt;/p&gt;
&lt;h3 id=&quot;fractions-of-the-total-flink-memory&quot;&gt;Fractions of the Total Flink Memory&lt;/h3&gt;
&lt;p&gt;This method allows a proportional breakdown of the &lt;em&gt;Total Flink Memory&lt;/em&gt; where the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_setup.html#managed-memory&quot;&gt;Managed Memory&lt;/a&gt; (if not set explicitly) and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html#capped-fractionated-components&quot;&gt;Network Buffers&lt;/a&gt; can take certain fractions of it. The remaining memory is then assigned to the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_setup.html#task-operator-heap-memory&quot;&gt;Task Heap&lt;/a&gt; (if not set explicitly) and other fixed &lt;em&gt;JVM Heap&lt;/em&gt; and &lt;em&gt;Off-Heap components&lt;/em&gt;. The following picture represents an example of such a setup:&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-04-21-memory-management-improvements-flink-1.10/flink-memory-setup.svg&quot; width=&quot;800px&quot; alt=&quot;Flink: Example of Memory Setup&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Flink: Example of Memory Setup&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Please note that&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Flink will verify that the size of the derived &lt;em&gt;Network Memory&lt;/em&gt; is between its minimum and maximum value, otherwise Flink’s startup will fail. The maximum and minimum limits have default values which can be overwritten by the respective configuration options.&lt;/li&gt;
&lt;li&gt;In general, the configured fractions are treated by Flink as hints. Under certain scenarios, the derived value might not match the fraction. For example, if the &lt;em&gt;Total Flink Memory&lt;/em&gt; and the &lt;em&gt;Task Heap&lt;/em&gt; are configured to fixed values, the &lt;em&gt;Managed Memory&lt;/em&gt; will get a certain fraction and the &lt;em&gt;Network Memory&lt;/em&gt; will get the remaining memory which might not exactly match its fraction.&lt;/li&gt;
&lt;/ul&gt;
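&lt;p&gt;For reference, the fractions and the &lt;em&gt;Network Memory&lt;/em&gt; bounds mentioned above correspond to configuration options along the following lines (the values shown are the defaults in Flink 1.10; adjust them to your workload):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;# Managed Memory as a fraction of the Total Flink Memory (if not set explicitly)
taskmanager.memory.managed.fraction: 0.4

# Network Memory as a fraction of the Total Flink Memory, capped by a minimum and a maximum
taskmanager.memory.network.fraction: 0.1
taskmanager.memory.network.min: 64mb
taskmanager.memory.network.max: 1gb&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;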
&lt;h3 id=&quot;more-hints-to-control-the-container-memory-limit&quot;&gt;More hints to control the container memory limit&lt;/h3&gt;
&lt;p&gt;The heap and direct memory usage are managed by the JVM. There are also many other possible sources of native memory consumption in Apache Flink or its user applications which are not managed by Flink or the JVM. Controlling their limits is often difficult, which complicates debugging of potential memory leaks. If Flink’s process allocates too much memory in an unmanaged way, it can often result in Task Manager containers being killed in containerized environments. In this case, it may be hard to understand which type of memory consumption has exceeded its limit. Flink 1.10 introduces some specific tuning options to clearly represent such components. Although Flink cannot always enforce strict limits and borders among them, the idea here is to explicitly plan the memory usage. Below we provide some examples of how the memory setup can prevent containers from exceeding their memory limit, followed by a short configuration sketch:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_tuning.html#rocksdb-state-backend&quot;&gt;RocksDB state cannot grow too big&lt;/a&gt;. The memory consumption of RocksDB state backend is accounted for in the &lt;em&gt;Managed Memory&lt;/em&gt;. RocksDB respects its limit by default (only since Flink 1.10). You can increase the &lt;em&gt;Managed Memory&lt;/em&gt; size to improve RocksDB’s performance or decrease it to save resources.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_setup.html#configure-off-heap-memory-direct-or-native&quot;&gt;User code or its dependencies consume significant off-heap memory&lt;/a&gt;. Tuning the &lt;em&gt;Task Off-Heap&lt;/em&gt; option can assign additional direct or native memory to the user code or any of its dependencies. Flink cannot control native allocations but it sets the limit for &lt;em&gt;JVM Direct&lt;/em&gt; memory allocations. The &lt;em&gt;Direct&lt;/em&gt; memory limit is enforced by the JVM.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html#jvm-parameters&quot;&gt;JVM metaspace requires additional memory&lt;/a&gt;. If you encounter &lt;code&gt;OutOfMemoryError: Metaspace&lt;/code&gt;, Flink provides an option to increase its limit and the JVM will ensure that it is not exceeded.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_detail.html#capped-fractionated-components&quot;&gt;JVM requires more internal memory&lt;/a&gt;. There is no direct control over certain types of JVM process allocations but Flink provides &lt;em&gt;JVM Overhead&lt;/em&gt; options. The options allow declaring an additional amount of memory, anticipated for those allocations and not covered by other options.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
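&lt;p&gt;As a rough illustration, the options mentioned above correspond to configuration keys along the following lines (all sizes are arbitrary example values, not recommendations):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-yaml&quot;&gt;# additional direct/native memory for user code and its dependencies
taskmanager.memory.task.off-heap.size: 256m

# raise the JVM Metaspace limit if you encounter OutOfMemoryError: Metaspace
taskmanager.memory.jvm-metaspace.size: 256m

# reserve extra memory for other JVM-internal allocations
taskmanager.memory.jvm-overhead.fraction: 0.1
taskmanager.memory.jvm-overhead.min: 192m
taskmanager.memory.jvm-overhead.max: 1gb&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;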
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The latest Flink release (Flink 1.10) introduces some significant changes to Flink’s memory configuration, making it possible to manage your application memory and debug Flink significantly better than before. Future developments in this area also include adopting a similar memory model for the job manager process in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP+116%3A+Unified+Memory+Configuration+for+Job+Managers&quot;&gt;FLIP-116&lt;/a&gt;, so stay tuned for more additions and features in upcoming releases. If you have any suggestions or questions for the community, we encourage you to sign up to the Apache Flink &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; and become part of the discussion there.&lt;/p&gt;
</description>
<pubDate>Tue, 21 Apr 2020 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/04/21/memory-management-improvements-flink-1.10.html</link>
<guid isPermaLink="true">/news/2020/04/21/memory-management-improvements-flink-1.10.html</guid>
</item>
<item>
<title>Flink Serialization Tuning Vol. 1: Choosing your Serializer — if you can</title>
<description>&lt;p&gt;Almost every Flink job has to exchange data between its operators, and since these records may be sent not only to another instance in the same JVM but also to a separate process, they need to be serialized to bytes first. Similarly, Flink’s off-heap state backend is based on a local embedded RocksDB instance, which is implemented in native C++ code and thus also needs transformation into bytes on every state access. Wire and state serialization alone can easily cost a lot of your job’s performance if not executed correctly and thus, whenever you look into the profiler output of your Flink job, you will most likely see serialization among the top consumers of CPU cycles.&lt;/p&gt;
&lt;p&gt;Since serialization is so crucial to your Flink job, we would like to highlight Flink’s serialization stack in a series of blog posts starting with looking at the different ways Flink can serialize your data types.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#recap-flink-serialization&quot; id=&quot;markdown-toc-recap-flink-serialization&quot;&gt;Recap: Flink Serialization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#choice-of-serializer&quot; id=&quot;markdown-toc-choice-of-serializer&quot;&gt;Choice of Serializer&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#pojoserializer&quot; id=&quot;markdown-toc-pojoserializer&quot;&gt;PojoSerializer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#tuple-data-types&quot; id=&quot;markdown-toc-tuple-data-types&quot;&gt;Tuple Data Types&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#row-data-types&quot; id=&quot;markdown-toc-row-data-types&quot;&gt;Row Data Types&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#avro&quot; id=&quot;markdown-toc-avro&quot;&gt;Avro&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#avro-specific&quot; id=&quot;markdown-toc-avro-specific&quot;&gt;Avro Specific&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#avro-generic&quot; id=&quot;markdown-toc-avro-generic&quot;&gt;Avro Generic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#avro-reflect&quot; id=&quot;markdown-toc-avro-reflect&quot;&gt;Avro Reflect&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#kryo&quot; id=&quot;markdown-toc-kryo&quot;&gt;Kryo&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#disabling-kryo&quot; id=&quot;markdown-toc-disabling-kryo&quot;&gt;Disabling Kryo&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#apache-thrift-via-kryo&quot; id=&quot;markdown-toc-apache-thrift-via-kryo&quot;&gt;Apache Thrift (via Kryo)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#protobuf-via-kryo&quot; id=&quot;markdown-toc-protobuf-via-kryo&quot;&gt;Protobuf (via Kryo)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#state-schema-evolution&quot; id=&quot;markdown-toc-state-schema-evolution&quot;&gt;State Schema Evolution&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#performance-comparison&quot; id=&quot;markdown-toc-performance-comparison&quot;&gt;Performance Comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#conclusion&quot; id=&quot;markdown-toc-conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h1 id=&quot;recap-flink-serialization&quot;&gt;Recap: Flink Serialization&lt;/h1&gt;
&lt;p&gt;Flink handles &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/types_serialization.html&quot;&gt;data types and serialization&lt;/a&gt; with its own type descriptors, generic type extraction, and type serialization framework. We recommend reading through the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/types_serialization.html&quot;&gt;documentation&lt;/a&gt; first in order to be able to follow the arguments we present below. In essence, Flink tries to infer information about your job’s data types for wire and state serialization, and to be able to use grouping, joining, and aggregation operations by referring to individual field names, e.g.
&lt;code&gt;stream.keyBy(&quot;ruleId&quot;)&lt;/code&gt; or
&lt;code&gt;dataSet.join(another).where(&quot;name&quot;).equalTo(&quot;personName&quot;)&lt;/code&gt;. It also allows optimizations in the serialization format as well as reducing unnecessary de/serializations (mainly in certain Batch operations as well as in the SQL/Table APIs).&lt;/p&gt;
&lt;h1 id=&quot;choice-of-serializer&quot;&gt;Choice of Serializer&lt;/h1&gt;
&lt;p&gt;Apache Flink’s out-of-the-box serialization can be roughly divided into the following groups:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Flink-provided special serializers&lt;/strong&gt; for basic types (Java primitives and their boxed form), arrays, composite types (tuples, Scala case classes, Rows), and a few auxiliary types (Option, Either, Lists, Maps, …),&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;POJOs&lt;/strong&gt;; a public, standalone class with a public no-argument constructor and all non-static, non-transient fields in the class hierarchy either public or with a public getter- and a setter-method; see &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/types_serialization.html#rules-for-pojo-types&quot;&gt;POJO Rules&lt;/a&gt;,&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Generic types&lt;/strong&gt;; user-defined data types that are not recognized as a POJO and then serialized via &lt;a href=&quot;https://github.com/EsotericSoftware/kryo&quot;&gt;Kryo&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Alternatively, you can also register &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/custom_serializers.html&quot;&gt;custom serializers&lt;/a&gt; for user-defined data types. This includes writing your own serializers or integrating other serialization systems like &lt;a href=&quot;https://developers.google.com/protocol-buffers/&quot;&gt;Google Protobuf&lt;/a&gt; or &lt;a href=&quot;https://thrift.apache.org/&quot;&gt;Apache Thrift&lt;/a&gt; via &lt;a href=&quot;https://github.com/EsotericSoftware/kryo&quot;&gt;Kryo&lt;/a&gt;. Overall, this gives quite a number of different options for serializing user-defined data types, and we will elaborate on seven of them in the sections below.&lt;/p&gt;
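&lt;p&gt;As a minimal sketch of such a registration (&lt;code&gt;MyCustomType&lt;/code&gt; and &lt;code&gt;MyCustomKryoSerializer&lt;/code&gt; are hypothetical classes used only for illustration), registering a custom Kryo serializer on the execution config could look like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// use MyCustomKryoSerializer whenever Kryo has to serialize MyCustomType
env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, MyCustomKryoSerializer.class);

// alternatively, register it as the default serializer for the type and its subtypes
env.getConfig().addDefaultKryoSerializer(MyCustomType.class, MyCustomKryoSerializer.class);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;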
&lt;h2 id=&quot;pojoserializer&quot;&gt;PojoSerializer&lt;/h2&gt;
&lt;p&gt;As outlined above, if your data type is not covered by a specialized serializer but follows the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/types_serialization.html#rules-for-pojo-types&quot;&gt;POJO Rules&lt;/a&gt;, it will be serialized with the &lt;a href=&quot;https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java&quot;&gt;PojoSerializer&lt;/a&gt; which uses Java reflection to access an object’s fields. It is fast, generic, Flink-specific, and supports &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/state/schema_evolution.html&quot;&gt;state schema evolution&lt;/a&gt; out of the box. If a composite data type cannot be serialized as a POJO, you will find the following message (or similar) in your cluster logs:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;15:45:51,460 INFO org.apache.flink.api.java.typeutils.TypeExtractor - Class … cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on “Data Types &amp;amp; Serialization” for details of the effect on performance.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;This means that the PojoSerializer will not be used; instead, Flink will fall back to Kryo for serialization (see below). We will have a more detailed look into a few (more) situations that can lead to unexpected Kryo fallbacks in the second part of this blog post series.&lt;/p&gt;
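&lt;p&gt;As a minimal sketch, a data type that follows the POJO Rules could look like the following (a hypothetical class, shown here only to illustrate the rules):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// public standalone class, public no-argument constructor, all fields public
// (or accessible via public getters and setters), so Flink serializes it with the PojoSerializer
public class Person {
    public String name;
    public int age;

    public Person() {}

    public Person(String name, int age) {
        this.name = name;
        this.age = age;
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;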
&lt;h2 id=&quot;tuple-data-types&quot;&gt;Tuple Data Types&lt;/h2&gt;
&lt;p&gt;Flink comes with a predefined set of tuple types which all have a fixed length and contain a set of strongly-typed fields of potentially different types. There are implementations for &lt;code&gt;Tuple0&lt;/code&gt;, &lt;code&gt;Tuple1&amp;lt;T0&amp;gt;&lt;/code&gt;, …, &lt;code&gt;Tuple25&amp;lt;T0, T1, ..., T24&amp;gt;&lt;/code&gt; and they may serve as easy-to-use wrappers that spare the creation of POJOs for each and every combination of objects you need to pass between computations. With the exception of &lt;code&gt;Tuple0&lt;/code&gt;, these are serialized and deserialized with the &lt;a href=&quot;https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/TupleSerializer.java&quot;&gt;TupleSerializer&lt;/a&gt; and the according fields’ serializers. Since tuple classes are completely under the control of Flink, both actions can be performed without reflection by accessing the appropriate fields directly. This certainly is a (performance) advantage when working with tuples instead of POJOs. Tuples, however, are not as flexible and certainly less descriptive in code.&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
Since &lt;code&gt;Tuple0&lt;/code&gt; does not contain any data and therefore is probably a bit special anyway, it will use a special serializer implementation: &lt;a href=&quot;https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/Tuple0Serializer.java&quot;&gt;Tuple0Serializer&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;
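&lt;p&gt;A short sketch of how tuples are typically used (the values are made up for illustration):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// tuple fields are strongly typed and accessed by position: f0, f1, ...
Tuple2&amp;lt;String, Integer&amp;gt; person = Tuple2.of(&quot;Alice&quot;, 42);

String  name = person.f0; // &quot;Alice&quot;
Integer age  = person.f1; // 42&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;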
&lt;h2 id=&quot;row-data-types&quot;&gt;Row Data Types&lt;/h2&gt;
&lt;p&gt;Row types are mainly used by the Table and SQL APIs of Flink. A &lt;code&gt;Row&lt;/code&gt; groups an arbitrary number of objects together similar to the tuples above. These fields are not strongly typed and may all be of different types. Because field types are missing, Flink’s type extraction cannot automatically extract type information and users of a &lt;code&gt;Row&lt;/code&gt; need to manually tell Flink about the row’s field types. The &lt;a href=&quot;https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/RowSerializer.java&quot;&gt;RowSerializer&lt;/a&gt; will then make use of these types for efficient serialization.&lt;/p&gt;
&lt;p&gt;Row type information can be provided in two ways:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;you can have your source or operator implement &lt;code&gt;ResultTypeQueryable&amp;lt;Row&amp;gt;&lt;/code&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;RowSource&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SourceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ResultTypeQueryable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeInformation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getProducedType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;ROW&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;INT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;OBJECT_ARRAY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;you can provide the types when building the job graph by using &lt;code&gt;SingleOutputStreamOperator#returns()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sourceStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;RowSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;returns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;ROW&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;INT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;OBJECT_ARRAY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;alert alert-warning&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-warning&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Warning&lt;/span&gt;
If you fail to provide the type information for a &lt;code&gt;Row&lt;/code&gt;, Flink identifies that &lt;code&gt;Row&lt;/code&gt; is not a valid POJO type according to the rules above and falls back to Kryo serialization (see below) which you will also see in the logs as:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;13:10:11,148 INFO org.apache.flink.api.java.typeutils.TypeExtractor - Class class org.apache.flink.types.Row cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on &quot;Data Types &amp;amp; Serialization&quot; for details of the effect on performance.&lt;/code&gt;&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id=&quot;avro&quot;&gt;Avro&lt;/h2&gt;
&lt;p&gt;Flink offers built-in support for the &lt;a href=&quot;http://avro.apache.org/&quot;&gt;Apache Avro&lt;/a&gt; serialization framework (currently using version 1.8.2) by adding the &lt;code&gt;org.apache.flink:flink-avro&lt;/code&gt; dependency to your job. Flink’s &lt;a href=&quot;https://github.com/apache/flink/blob/release-1.10.0/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroSerializer.java&quot;&gt;AvroSerializer&lt;/a&gt; can then use Avro’s specific, generic, and reflective data serialization and make use of Avro’s performance and flexibility, especially in terms of &lt;a href=&quot;https://avro.apache.org/docs/current/spec.html#Schema+Resolution&quot;&gt;evolving the schema&lt;/a&gt; when the classes change over time.&lt;/p&gt;
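&lt;p&gt;With Maven, for example, the dependency could be added as follows (shown here for Flink 1.10.0; adjust the version to your Flink release):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&amp;lt;dependency&amp;gt;
  &amp;lt;groupId&amp;gt;org.apache.flink&amp;lt;/groupId&amp;gt;
  &amp;lt;artifactId&amp;gt;flink-avro&amp;lt;/artifactId&amp;gt;
  &amp;lt;version&amp;gt;1.10.0&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;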
&lt;h3 id=&quot;avro-specific&quot;&gt;Avro Specific&lt;/h3&gt;
&lt;p&gt;Avro specific records will be automatically detected by checking that the given type’s type hierarchy contains the &lt;code&gt;SpecificRecordBase&lt;/code&gt; class. You can either specify your concrete Avro type, or—if you want to be more generic and allow different types in your operator—use the &lt;code&gt;SpecificRecordBase&lt;/code&gt; type (or a subtype) in your user functions, in &lt;code&gt;ResultTypeQueryable#getProducedType()&lt;/code&gt;, or in &lt;code&gt;SingleOutputStreamOperator#returns()&lt;/code&gt;. Since specific records use generated Java code, they are strongly typed and allow direct access to the fields via known getters and setters.&lt;/p&gt;
&lt;div class=&quot;alert alert-warning&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-warning&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Warning&lt;/span&gt; If you specify the Flink type as &lt;code&gt;SpecificRecord&lt;/code&gt; and not &lt;code&gt;SpecificRecordBase&lt;/code&gt;, Flink will not see this as an Avro type. Instead, it will use Kryo to de/serialize any objects which may be considerably slower.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id=&quot;avro-generic&quot;&gt;Avro Generic&lt;/h3&gt;
&lt;p&gt;Avro’s &lt;code&gt;GenericRecord&lt;/code&gt; types cannot, unfortunately, be used automatically since they require the user to &lt;a href=&quot;https://avro.apache.org/docs/1.8.2/gettingstartedjava.html#Serializing+and+deserializing+without+code+generation&quot;&gt;specify a schema&lt;/a&gt; (either manually or by retrieving it from some schema registry). With that schema, you can provide the right type information by either of the following options just like for the Row Types above:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;implement &lt;code&gt;ResultTypeQueryable&amp;lt;GenericRecord&amp;gt;&lt;/code&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AvroGenericSource&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SourceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GenericRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ResultTypeQueryable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GenericRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;GenericRecordAvroTypeInfo&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;producedType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;AvroGenericSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;producedType&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;GenericRecordAvroTypeInfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeInformation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GenericRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getProducedType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;producedType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;provide type information when building the job graph by using &lt;code&gt;SingleOutputStreamOperator#returns()&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;GenericRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sourceStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;AvroGenericSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;returns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;GenericRecordAvroTypeInfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Without this type information, Flink will fall back to Kryo for serialization which would serialize the schema into every record, over and over again. As a result, the serialized form will be bigger and more costly to create.&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
Since Avro’s &lt;code&gt;Schema&lt;/code&gt; class is not serializable, it cannot be sent around as-is. You can work around this by converting it to a String and parsing it back when needed. If you only do this once on initialization, there is practically no difference compared to sending it directly.&lt;/p&gt;
&lt;/div&gt;
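&lt;p&gt;A small sketch of this workaround (field and method names are illustrative, not from the Flink API) could look like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch: ship the Avro schema as a String since Schema itself is not serializable.
private final String schemaString; // e.g. schema.toString(), set in the constructor
private transient Schema schema;   // parsed again on the worker

private Schema getSchema() {
  if (schema == null) {
    // parse once on initialization; afterwards there is practically no overhead
    schema = new Schema.Parser().parse(schemaString);
  }
  return schema;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;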
&lt;h3 id=&quot;avro-reflect&quot;&gt;Avro Reflect&lt;/h3&gt;
&lt;p&gt;The third way of using Avro is to exchange Flink’s PojoSerializer (for POJOs according to the rules above) for Avro’s reflection-based serializer. This can be enabled by calling&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;enableForceAvro&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;kryo&quot;&gt;Kryo&lt;/h2&gt;
&lt;p&gt;Any class or object that does not fall into the categories above and is not covered by a Flink-provided special serializer is de/serialized with a fallback to &lt;a href=&quot;https://github.com/EsotericSoftware/kryo&quot;&gt;Kryo&lt;/a&gt; (currently version 2.24.0), a powerful and generic serialization framework for Java. Flink calls such a type a &lt;em&gt;generic type&lt;/em&gt; and you may stumble upon &lt;code&gt;GenericTypeInfo&lt;/code&gt; when debugging code. If you are using Kryo serialization, make sure to register your types with Kryo:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerKryoType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MyCustomType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Registering types adds them to an internal map of classes to tags so that, during serialization, Kryo does not have to add the fully qualified class names as a prefix into the serialized form. Instead, Kryo uses these (integer) tags to identify the underlying classes and reduce serialization overhead.&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
Flink will store Kryo serializer mappings from type registrations in its checkpoints and savepoints and will retain them across job (re)starts.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id=&quot;disabling-kryo&quot;&gt;Disabling Kryo&lt;/h3&gt;
&lt;p&gt;If desired, you can disable the Kryo fallback, i.e. the ability to serialize generic types, by calling&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;disableGenericTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is mostly useful for finding out where these fallbacks are applied and replacing them with better serializers. If your job has any generic types with this configuration, it will fail with&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Exception in thread “main” java.lang.UnsupportedOperationException: Generic types have been disabled in the ExecutionConfig and type … is treated as a generic type.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;If you cannot immediately tell from the type where it is being used, this message also comes with a stack trace that you can use to set breakpoints and find out more details in your IDE.&lt;/p&gt;
&lt;h2 id=&quot;apache-thrift-via-kryo&quot;&gt;Apache Thrift (via Kryo)&lt;/h2&gt;
&lt;p&gt;In addition to the variants above, Flink also allows you to &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/custom_serializers.html#register-a-custom-serializer-for-your-flink-program&quot;&gt;register other type serialization frameworks&lt;/a&gt; with Kryo. After adding the appropriate dependencies from the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/custom_serializers.html#register-a-custom-serializer-for-your-flink-program&quot;&gt;documentation&lt;/a&gt; (&lt;code&gt;com.twitter:chill-thrift&lt;/code&gt; and &lt;code&gt;org.apache.thrift:libthrift&lt;/code&gt;), you can use &lt;a href=&quot;https://thrift.apache.org/&quot;&gt;Apache Thrift&lt;/a&gt; as follows:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addDefaultKryoSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MyCustomType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TBaseSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This only works if generic types are not disabled and &lt;code&gt;MyCustomType&lt;/code&gt; is a Thrift-generated data type. If the data type is not generated by Thrift, Flink will fail at runtime with an exception like this:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;java.lang.ClassCastException: class MyCustomType cannot be cast to class org.apache.thrift.TBase (MyCustomType and org.apache.thrift.TBase are in unnamed module of loader ‘app’)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
Please note that &lt;code&gt;TBaseSerializer&lt;/code&gt; can be registered as a default Kryo serializer as above (and as specified in &lt;a href=&quot;https://github.com/twitter/chill/blob/v0.7.6/chill-thrift/src/main/java/com/twitter/chill/thrift/TBaseSerializer.java&quot;&gt;its documentation&lt;/a&gt;) or via &lt;code&gt;registerTypeWithKryoSerializer&lt;/code&gt;. In practice, we found both ways to work. We also saw no difference when additionally registering the Thrift classes on top of the call above. This may be different in your scenario.&lt;/p&gt;
&lt;/div&gt;
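&lt;p&gt;For completeness, the alternative registration mentioned in the note is sketched below; in our tests both variants behaved the same, but your scenario may differ:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Alternative: register TBaseSerializer for the concrete Thrift-generated type.
env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, TBaseSerializer.class);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;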
&lt;h2 id=&quot;protobuf-via-kryo&quot;&gt;Protobuf (via Kryo)&lt;/h2&gt;
&lt;p&gt;In a way similar to Apache Thrift, &lt;a href=&quot;https://developers.google.com/protocol-buffers/&quot;&gt;Google Protobuf&lt;/a&gt; may be &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/custom_serializers.html#register-a-custom-serializer-for-your-flink-program&quot;&gt;registered as a custom serializer&lt;/a&gt; after adding the right dependencies (&lt;code&gt;com.twitter:chill-protobuf&lt;/code&gt; and &lt;code&gt;com.google.protobuf:protobuf-java&lt;/code&gt;):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerTypeWithKryoSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MyCustomType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ProtobufSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will work as long as generic types have not been disabled (this would disable Kryo for good). If &lt;code&gt;MyCustomType&lt;/code&gt; is not a Protobuf-generated class, your Flink job will fail at runtime with the following exception:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;java.lang.ClassCastException: class &lt;code&gt;MyCustomType&lt;/code&gt; cannot be cast to class com.google.protobuf.Message (&lt;code&gt;MyCustomType&lt;/code&gt; and com.google.protobuf.Message are in unnamed module of loader ‘app’)&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
Please note that &lt;code&gt;ProtobufSerializer&lt;/code&gt; can be registered as a default Kryo serializer (as specified in &lt;a href=&quot;https://github.com/twitter/chill/blob/v0.7.6/chill-protobuf/src/main/java/com/twitter/chill/protobuf/ProtobufSerializer.java&quot;&gt;its documentation&lt;/a&gt;) or via &lt;code&gt;registerTypeWithKryoSerializer&lt;/code&gt; (as presented here). In practice, we found both ways to work. We also saw no difference when additionally registering your Protobuf classes on top of the call above. This may be different in your scenario.&lt;/p&gt;
&lt;/div&gt;
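&lt;p&gt;Similarly, the alternative mentioned in the note, i.e. registering &lt;code&gt;ProtobufSerializer&lt;/code&gt; as a default Kryo serializer, is sketched below:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Alternative: register ProtobufSerializer as a default Kryo serializer for the type.
env.getConfig().addDefaultKryoSerializer(MyCustomType.class, ProtobufSerializer.class);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;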
&lt;h1 id=&quot;state-schema-evolution&quot;&gt;State Schema Evolution&lt;/h1&gt;
&lt;p&gt;Before taking a closer look at the performance of each of the serializers described above, we would like to emphasize that performance is not everything that counts inside a real-world Flink job. Types for storing state, for example, should be able to evolve their schema (add/remove/change fields) throughout the lifetime of the job without losing previous state. This is what Flink calls &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/schema_evolution.html&quot;&gt;State Schema Evolution&lt;/a&gt;. Currently, as of Flink 1.10, there are only two serializers that support out-of-the-box schema evolution: POJO and Avro. For anything else, if you want to change the state schema, you will have to either implement your own &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/state/custom_serialization.html&quot;&gt;custom serializers&lt;/a&gt; or use the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/libs/state_processor_api.html&quot;&gt;State Processor API&lt;/a&gt; to modify your state for the new code.&lt;/p&gt;
&lt;h1 id=&quot;performance-comparison&quot;&gt;Performance Comparison&lt;/h1&gt;
&lt;p&gt;With so many options for serialization, it is actually not easy to make the right choice. We already saw some technical advantages and disadvantages of each of them outlined above. Since serializers are at the core of your Flink jobs and usually also sit on the hot path (per record invocations), let us actually take a deeper look into their performance with the help of the Flink benchmarks project at &lt;a href=&quot;https://github.com/dataArtisans/flink-benchmarks&quot;&gt;https://github.com/dataArtisans/flink-benchmarks&lt;/a&gt;. This project adds a few micro-benchmarks on top of Flink (some more low-level than others) to track performance regressions and improvements. Flink’s continuous benchmarks for monitoring the serialization stack’s performance are implemented in &lt;a href=&quot;https://github.com/dataArtisans/flink-benchmarks/blob/master/src/main/java/org/apache/flink/benchmark/SerializationFrameworkMiniBenchmarks.java&quot;&gt;SerializationFrameworkMiniBenchmarks.java&lt;/a&gt;. This is only a subset of all available serialization benchmarks though and you will find the complete set in &lt;a href=&quot;https://github.com/dataArtisans/flink-benchmarks/blob/master/src/main/java/org/apache/flink/benchmark/full/SerializationFrameworkAllBenchmarks.java&quot;&gt;SerializationFrameworkAllBenchmarks.java&lt;/a&gt;. All of these use the same definition of a small POJO that may cover average use cases. Essentially (without constructors, getters, and setters), these are the data types that it uses for evaluating performance:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyPojo&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;operationNames&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MyOperation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;operations&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;otherId1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;otherId2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;otherId3&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Object&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;someObject&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyOperation&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;protected&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is mapped to tuples, rows, Avro specific records, Thrift and Protobuf representations appropriately and sent through a simple Flink job at parallelism 4 where the data type is used during network communication like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setParallelism&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;4&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;PojoSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RECORDS_PER_INVOCATION&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;rebalance&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DiscardingSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;());&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After running this through the &lt;a href=&quot;http://openjdk.java.net/projects/code-tools/jmh/&quot;&gt;jmh&lt;/a&gt; micro-benchmarks defined in &lt;a href=&quot;https://github.com/dataArtisans/flink-benchmarks/blob/master/src/main/java/org/apache/flink/benchmark/full/SerializationFrameworkAllBenchmarks.java&quot;&gt;SerializationFrameworkAllBenchmarks.java&lt;/a&gt;, I retrieved the following performance results for Flink 1.10 on my machine (in number of operations per millisecond):
&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-04-15-flink-serialization-performance-results.svg&quot; width=&quot;800px&quot; alt=&quot;Flink serializer performance comparison results&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;A few takeaways from these numbers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The default fallback from POJO to Kryo reduces performance by 75%.&lt;br /&gt;
Registering types with Kryo significantly improves its performance, leaving it only 64% behind the POJO serializer.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Avro GenericRecord and SpecificRecord are serialized at roughly the same speed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Avro Reflect serialization is even slower than Kryo default (-45%).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Tuples are the fastest, closely followed by Rows. Both leverage fast specialized serialization code based on direct access without Java reflection.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Using a (nested) Tuple instead of a POJO may speed up your job by 42% (but is less flexible!).
Having code-generation for the PojoSerializer (&lt;a href=&quot;https://jira.apache.org/jira/browse/FLINK-3599&quot;&gt;FLINK-3599&lt;/a&gt;) may actually close that gap (or at least move closer to the RowSerializer). If you feel like giving the implementation a go, please give the Flink community a note and we will see whether we can make that happen.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you cannot use POJOs, try to define your data type with one of the serialization frameworks that generate specific code for it: Protobuf, Avro, Thrift (in that order, performance-wise).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt; As with all benchmarks, please bear in mind that these numbers only give a hint on Flink’s serializer performance in a specific scenario. They may be different with your data types but the rough classification is probably the same. If you want to be sure, please verify the results with your data types. You should be able to copy from &lt;code&gt;SerializationFrameworkAllBenchmarks.java&lt;/code&gt; to set up your own micro-benchmarks or integrate different serialization benchmarks into your own tooling.&lt;/p&gt;
&lt;/div&gt;
&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;In the sections above, we looked at how Flink performs serialization for different sorts of data types and elaborated the technical advantages and disadvantages. For data types used in Flink state, you probably want to leverage either POJO or Avro types which, currently, are the only ones supporting state evolution out of the box and allow your stateful application to develop over time. POJOs are usually faster in the de/serialization while Avro may support more flexible schema evolution and may integrate better with external systems. Please note, however, that you can use different serializers for external vs. internal components or even state vs. network communication.&lt;/p&gt;
&lt;p&gt;The fastest de/serialization is achieved with Flink’s internal tuple and row serializers which can access these types’ fields directly without going via reflection. With roughly 30% decreased throughput as compared to tuples, Protobuf and POJO types do not perform too badly on their own and are more flexible and maintainable. Avro (specific and generic) records as well as Thrift data types further reduce performance by 20% and 30%, respectively. You definitely want to avoid Kryo as that reduces throughput further by around 50% and more!&lt;/p&gt;
&lt;p&gt;The next article in this series will use this finding as a starting point to look into a few common pitfalls and obstacles of avoiding Kryo, how to get the most out of the PojoSerializer, and a few more tuning techniques with respect to serialization. Stay tuned for more.&lt;/p&gt;
</description>
<pubDate>Wed, 15 Apr 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/04/15/flink-serialization-tuning-vol-1.html</link>
<guid isPermaLink="true">/news/2020/04/15/flink-serialization-tuning-vol-1.html</guid>
</item>
<item>
<title>PyFlink: Introducing Python Support for UDFs in Flink&#39;s Table API</title>
<description>&lt;p&gt;Flink 1.9 introduced the Python Table API, allowing developers and data engineers to write Python Table API jobs for Table transformations and analysis, such as Python ETL or aggregate jobs. However, Python users faced some limitations when it came to support for Python UDFs in Flink 1.9, preventing them from extending the system’s built-in functionality.&lt;/p&gt;
&lt;p&gt;In Flink 1.10, the community further extended the support for Python by adding Python UDFs in PyFlink. Additionally, both the Python UDF environment and dependency management are now supported, allowing users to import third-party libraries in the UDFs, leveraging Python’s rich set of third-party libraries.&lt;/p&gt;
&lt;h1 id=&quot;python-support-for-udfs-in-flink-110&quot;&gt;Python Support for UDFs in Flink 1.10&lt;/h1&gt;
&lt;p&gt;Before diving into how you can define and use Python UDFs, we explain the motivation and background behind how UDFs work in PyFlink and provide some additional context about the implementation of our approach. Below we give a brief introduction on the PyFlink architecture from job submission, all the way to executing the Python UDF.&lt;/p&gt;
&lt;p&gt;The PyFlink architecture mainly includes two parts — local and cluster — as shown in the architecture visual below. The local phase is the compilation of the job, and the cluster phase is the execution of the job.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-04-09-pyflink-udfs/pyflink-udf-architecture.png&quot; width=&quot;600px&quot; alt=&quot;PyFlink UDF Architecture&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;For the local part, the Python API is a mapping of the Java API: each time Python executes a method in the figure above, it will synchronously call the corresponding Java method through Py4J, and finally generate a Java JobGraph before submitting it to the cluster.&lt;/p&gt;
&lt;p&gt;For the cluster part, just like for ordinary Java jobs, the JobMaster schedules tasks to TaskManagers. Tasks that include a Python UDF involve the execution of both Java and Python operators in a TaskManager. In the Python UDF operator, various gRPC services are used to provide different kinds of communication between the Java VM and the Python VM, such as DataService for data transmission, StateService for state access, and the Logging and Metrics services. These services are built on Beam’s Fn API. While currently only Process mode is supported for Python workers, support for Docker mode and External service mode is also being considered for future Flink releases.&lt;/p&gt;
&lt;h1 id=&quot;how-to-use-pyflink-with-udfs-in-flink-110&quot;&gt;How to use PyFlink with UDFs in Flink 1.10&lt;/h1&gt;
&lt;p&gt;This section provides some Python user defined function (UDF) examples, including how to install PyFlink, how to define/register/invoke UDFs in PyFlink and how to execute the job.&lt;/p&gt;
&lt;h2 id=&quot;install-pyflink&quot;&gt;Install PyFlink&lt;/h2&gt;
&lt;p&gt;Using Python in Apache Flink requires installing PyFlink. PyFlink is available through PyPI and can be easily installed using pip:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;python -m pip install apache-flink&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
Please note that Python 3.5 or higher is required to install and run PyFlink.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;define-a-python-udf&quot;&gt;Define a Python UDF&lt;/h2&gt;
&lt;p&gt;There are many ways to define a Python scalar function, besides extending the base class &lt;code&gt;ScalarFunction&lt;/code&gt;. The following example shows the different ways of defining a Python scalar function that takes two columns of &lt;code&gt;BIGINT&lt;/code&gt; as input parameters and returns the sum of them as the result.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;c&quot;&gt;# option 1: extending the base class `ScalarFunction`&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ScalarFunction&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;eval&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# option 2: Python function&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# option 3: lambda function&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# option 4: callable function&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;CallableAdd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;object&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__call__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;CallableAdd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# option 5: partial function&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;partial_add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;functools&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partial&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partial_add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;k&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;register-a-python-udf&quot;&gt;Register a Python UDF&lt;/h2&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;c&quot;&gt;# register the Python function&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;table_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;register_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;add&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;invoke-a-python-udf&quot;&gt;Invoke a Python UDF&lt;/h2&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;c&quot;&gt;# use the function in Python Table API&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;my_table&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;add(a, b)&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Below, you can find a complete example of using Python UDF.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.datastream&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.table&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.table.descriptors&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OldCsv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FileSystem&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;pyflink.table.udf&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_execution_environment&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_parallelism&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;add&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;register_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;add&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FileSystem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;/tmp/input&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; \
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;with_format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OldCsv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;a&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;b&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt; \
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;with_schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;a&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;b&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt; \
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_temporary_table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;mySource&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FileSystem&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;/tmp/output&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt; \
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;with_format&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OldCsv&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;sum&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt; \
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;with_schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;sum&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt; \
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create_temporary_table&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;mySink&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;from_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;mySource&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;\
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;add(a, b)&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; \
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;insert_into&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;#39;mySink&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;tutorial_job&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;submit-the-job&quot;&gt;Submit the job&lt;/h2&gt;
&lt;p&gt;Firstly, you need to prepare the input data in the “/tmp/input” file. For example,&lt;/p&gt;
&lt;p&gt;&lt;code&gt;$ echo &quot;1,2&quot; &amp;gt; /tmp/input&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Next, you can run this example on the command line,&lt;/p&gt;
&lt;p&gt;&lt;code&gt;$ python python_udf_sum.py&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;The command builds and runs the Python Table API program in a local mini-cluster. You can also submit the Python Table API program to a remote cluster using different command-line options (see more details &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/cli.html#job-submission-examples&quot;&gt;here&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Finally, you can see the execution result on the command line:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;$ cat /tmp/output
3&lt;/code&gt;&lt;/p&gt;
&lt;h2 id=&quot;python-udf-dependency-management&quot;&gt;Python UDF dependency management&lt;/h2&gt;
&lt;p&gt;In many cases, you would like to import third-party dependencies in the Python UDF. The example below provides detailed guidance on how to manage such dependencies.&lt;/p&gt;
&lt;p&gt;Suppose you want to use the &lt;code&gt;mpmath&lt;/code&gt; library to perform the addition from the example above. The Python UDF may look like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;nd&quot;&gt;@udf&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input_types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()],&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result_type&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataTypes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;mpmath&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;fadd&lt;/span&gt; &lt;span class=&quot;c&quot;&gt;# add third-party dependency&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fadd&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;j&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To make it available on worker nodes that do not already contain the dependency, you can specify the dependencies with the following commands and API:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;cd&lt;/span&gt; /tmp
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;echo &lt;/span&gt;&lt;span class=&quot;nv&quot;&gt;mpmath&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;==&lt;/span&gt;1.1.0 &amp;gt; requirements.txt
&lt;span class=&quot;nv&quot;&gt;$ &lt;/span&gt;pip download -d cached_dir -r requirements.txt --no-binary :all:&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;n&quot;&gt;t_env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;set_python_requirements&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;/tmp/requirements.txt&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;/tmp/cached_dir&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A &lt;code&gt;requirements.txt&lt;/code&gt; file that defines the third-party dependencies is used. If the dependencies cannot be accessed in the cluster, then you can specify a directory containing the installation packages of these dependencies by using the parameter “&lt;code&gt;requirements_cached_dir&lt;/code&gt;”, as illustrated in the example above. The dependencies will be uploaded to the cluster and installed offline.&lt;/p&gt;
&lt;h1 id=&quot;conclusion--upcoming-work&quot;&gt;Conclusion &amp;amp; Upcoming work&lt;/h1&gt;
&lt;p&gt;In this blog post, we introduced the architecture of Python UDFs in PyFlink and provided some examples on how to define, register and invoke UDFs. Flink 1.10 brings Python support in the framework to new levels, allowing Python users to write even more magic with their preferred language. The community is actively working towards continuously improving the functionality and performance of PyFlink. Future work in upcoming releases will introduce support for Pandas UDFs in scalar and aggregate functions, add support to use Python UDFs through the SQL client to further expand the usage scope of Python UDFs, provide support for a Python ML Pipeline API and finally work towards even more performance improvements. The picture below provides more details on the roadmap for succeeding releases.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-04-09-pyflink-udfs/roadmap-of-pyflink.png&quot; width=&quot;600px&quot; alt=&quot;Roadmap of PyFlink&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
</description>
<pubDate>Thu, 09 Apr 2020 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/2020/04/09/pyflink-udf-support-flink.html</link>
<guid isPermaLink="true">/2020/04/09/pyflink-udf-support-flink.html</guid>
</item>
<item>
<title>Stateful Functions 2.0 - An Event-driven Database on Apache Flink</title>
<description>&lt;p&gt;Today, we are announcing the release of Stateful Functions (StateFun) 2.0 — the first release of Stateful Functions as part of the Apache Flink project.
This release marks a big milestone: Stateful Functions 2.0 is not only an API update, but the &lt;strong&gt;first version of an event-driven database&lt;/strong&gt; that is built on Apache Flink.&lt;/p&gt;
&lt;p&gt;Stateful Functions 2.0 makes it possible to combine StateFun’s powerful approach to state and composition with the elasticity, rapid scaling/scale-to-zero and rolling upgrade capabilities of FaaS implementations like AWS Lambda and modern resource orchestration frameworks like Kubernetes.&lt;/p&gt;
&lt;p&gt;With these features, Stateful Functions 2.0 addresses &lt;a href=&quot;https://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-3.pdf&quot;&gt;two of the most cited shortcomings&lt;/a&gt; of many FaaS setups today: consistent state and efficient messaging between functions.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#an-event-driven-database&quot; id=&quot;markdown-toc-an-event-driven-database&quot;&gt;An Event-driven Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#event-driven-database-vs-requestresponse-database&quot; id=&quot;markdown-toc-event-driven-database-vs-requestresponse-database&quot;&gt;“Event-driven Database” vs. “Request/Response Database”&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#state-and-consistency&quot; id=&quot;markdown-toc-state-and-consistency&quot;&gt;State and Consistency&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#remote-co-located-or-embedded-functions&quot; id=&quot;markdown-toc-remote-co-located-or-embedded-functions&quot;&gt;Remote, Co-located or Embedded Functions&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#remote-functions&quot; id=&quot;markdown-toc-remote-functions&quot;&gt;Remote Functions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#co-located-functions&quot; id=&quot;markdown-toc-co-located-functions&quot;&gt;Co-located Functions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#embedded-functions&quot; id=&quot;markdown-toc-embedded-functions&quot;&gt;Embedded Functions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#loading-data-into-the-database&quot; id=&quot;markdown-toc-loading-data-into-the-database&quot;&gt;Loading Data into the Database&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#try-it-out-and-get-involved&quot; id=&quot;markdown-toc-try-it-out-and-get-involved&quot;&gt;Try it out and get involved!&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#thank-you&quot; id=&quot;markdown-toc-thank-you&quot;&gt;Thank you!&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;an-event-driven-database&quot;&gt;An Event-driven Database&lt;/h2&gt;
&lt;p&gt;When Stateful Functions joined Apache Flink at the beginning of this year, the project had started as a library on top of Flink to build general-purpose event-driven applications. Users would implement &lt;em&gt;functions&lt;/em&gt; that receive and send messages, and maintain state in persistent variables. Flink provided the runtime with efficient exactly-once state and messaging. Stateful Functions 1.0 was a FaaS-inspired mix between stream processing and actor programming — on steroids.&lt;/p&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;center&gt;
&lt;figure&gt;
&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image2.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 1&quot; /&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;figcaption&gt;&lt;i&gt;&lt;b&gt;Fig.1:&lt;/b&gt; A ride-sharing app as a Stateful Functions example.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:150%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;In version 2.0, Stateful Functions now physically decouples the functions from Flink and the JVM, to invoke them through simple services. That makes it possible to execute functions on a FaaS platform, a Kubernetes deployment or behind a (micro) service.&lt;/p&gt;
&lt;p&gt;Flink invokes the functions through a service endpoint via HTTP or gRPC based on incoming events, and supplies state access. The system makes sure that only one invocation per entity (&lt;code&gt;type&lt;/code&gt;+&lt;code&gt;ID&lt;/code&gt;) is ongoing at any point in time, thus guaranteeing consistency through isolation.
By supplying state access as part of the function invocation, the functions themselves behave like stateless applications and can be managed with the same simplicity and benefits: rapid scalability, scale-to-zero, rolling/zero-downtime upgrades and so on.&lt;/p&gt;
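&lt;p&gt;To make that contract concrete, here is a minimal, SDK-free sketch in Python (illustrative only, not the actual StateFun protocol or API): the runtime hands the function its current state together with the incoming event, and the function returns the updated state plus any outgoing messages, so the function process itself keeps nothing between invocations.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Illustrative sketch: state travels with the invocation and is handed back,
# so the function can run on any stateless platform (FaaS, plain containers, ...).
def handle_invocation(state: dict, event: dict):
    count = state.get('seen', 0) + 1            # read the state shipped with the request
    new_state = {'seen': count}                 # updated state, returned to the runtime
    outgoing = [{'target': event['reply_to'],   # messages for the runtime to route
                 'payload': 'seen %d times' % count}]
    return new_state, outgoing&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;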
&lt;center&gt;
&lt;figure&gt;
&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image5.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 2&quot; /&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;figcaption&gt;&lt;i&gt;&lt;b&gt;Fig.2:&lt;/b&gt; In Stateful Functions 2.0, functions are stateless and state access is part of the function invocation.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:150%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;The functions can be implemented in any programming language that can handle HTTP requests or bring up a gRPC server. The &lt;a href=&quot;https://github.com/apache/flink-statefun&quot;&gt;StateFun project&lt;/a&gt; includes a very slim SDK for Python, taking requests and dispatching them to annotated functions. We aim to provide similar SDKs for other languages, such as Go, JavaScript or Rust. Users do not need to write any Flink code (or JVM code) at all; data ingresses/egresses and function endpoints can be defined in a compact YAML spec.&lt;/p&gt;
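&lt;p&gt;As a rough sketch of what this looks like with the Python SDK (modelled on the project’s Python walkthrough; treat the function type, endpoint path and exact signatures here as illustrative rather than authoritative), a function is bound to a logical type and served behind a plain HTTP endpoint:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from flask import Flask, request, make_response
from statefun import StatefulFunctions, RequestReplyHandler

functions = StatefulFunctions()

@functions.bind('example/greeter')           # the function type referenced from the YAML module spec
def greet(context, message):
    # react to the message; state access and outgoing messages go through the context
    pass

handler = RequestReplyHandler(functions)     # dispatches StateFun request/reply invocations

app = Flask(__name__)

@app.route('/statefun', methods=['POST'])
def handle():
    response = make_response(handler(request.data))
    response.headers.set('Content-Type', 'application/octet-stream')
    return response&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;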
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;div class=&quot;row&quot;&gt;
&lt;div class=&quot;col-lg-6&quot;&gt;
&lt;div class=&quot;text-center&quot;&gt;
&lt;figure&gt;
&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image3.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 3&quot; /&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;figcaption&gt;&lt;i&gt;&lt;b&gt;Fig.3:&lt;/b&gt; A module declaring a remote endpoint and a function type.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div class=&quot;col-lg-6&quot;&gt;
&lt;div class=&quot;text-center&quot;&gt;
&lt;figure&gt;
&lt;div style=&quot;line-height:540%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image10.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 4&quot; /&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;figcaption&gt;&lt;i&gt;&lt;b&gt;Fig.4:&lt;/b&gt; A Python implementation of a simple classifier function.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;/div&gt;
&lt;div style=&quot;line-height:150%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;The Flink processes (and the JVM) are not executing any user-code at all — though this is possible, for performance reasons (see &lt;a href=&quot;#embedded-functions&quot;&gt;Embedded Functions&lt;/a&gt;). Rather than running application-specific dataflows, Flink here stores the state of the functions and provides the dynamic messaging plane through which functions message each other, carefully dispatching messages/invocations to the event-driven functions/services to maintain consistency guarantees.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Effectively, Flink takes the role of the database, but tailored towards event-driven functions and services.
It integrates state storage with the messaging between (and the invocations of) functions and services.
Because of this, Stateful Functions 2.0 can be thought of as an “Event-driven Database” on Apache Flink.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&quot;event-driven-database-vs-requestresponse-database&quot;&gt;“Event-driven Database” vs. “Request/Response Database”&lt;/h2&gt;
&lt;p&gt;In the case of a traditional database or key/value store (let’s call them request/response databases), the application issues queries to the database (e.g. SQL via JDBC, GET/PUT via HTTP). In contrast, an event-driven database like StateFun &lt;strong&gt;&lt;em&gt;inverts&lt;/em&gt;&lt;/strong&gt; that relationship between database and application: the database invokes the functions/services based on arriving messages. This fits very naturally with FaaS and many event-driven application architectures.&lt;/p&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;center&gt;
&lt;figure&gt;
&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image7.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 5&quot; /&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;figcaption&gt;&lt;i&gt;&lt;b&gt;Fig.5:&lt;/b&gt; Stateful Functions 2.0 inverts the relationship between database and application.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:150%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;In the case of applications built on request/response databases, the database is responsible only for the state. Communication between different functions/services is a separate concern handled within the application layer. In contrast to that, an event-driven database takes care of both state storage and message transport, in a tightly integrated manner.&lt;/p&gt;
&lt;p&gt;Similar to &lt;a href=&quot;https://www.brianstorti.com/the-actor-model/&quot;&gt;Actor Programming&lt;/a&gt;, Stateful Functions uses the idea of &lt;em&gt;addressable entities&lt;/em&gt; - here, the entity is a function &lt;code&gt;type&lt;/code&gt; with an invocation scoped to an &lt;code&gt;ID&lt;/code&gt;. These addressable entities own the state and are the targets of messages. The difference to actor systems is that the application logic is external and the addressable entities are not physical objects in memory (i.e. actors), but rows in Flink’s managed state, together with the entities’ mailboxes.&lt;/p&gt;
&lt;h3 id=&quot;state-and-consistency&quot;&gt;State and Consistency&lt;/h3&gt;
&lt;p&gt;Besides matching the needs of serverless applications and FaaS well, the event-driven database approach also helps with simplifying consistent state management.&lt;/p&gt;
&lt;p&gt;Consider the example below, with two entities of an application — for example two microservices (&lt;em&gt;Service 1&lt;/em&gt;, &lt;em&gt;Service 2&lt;/em&gt;). &lt;em&gt;Service 1&lt;/em&gt; is invoked, updates the state in the database, and sends a request to &lt;em&gt;Service 2&lt;/em&gt;. Assume that this request fails. There is, in general, no way for &lt;em&gt;Service 1&lt;/em&gt; to know whether &lt;em&gt;Service 2&lt;/em&gt; processed the request and updated its state or not (c.f. &lt;a href=&quot;https://en.wikipedia.org/wiki/Two_Generals%27_Problem&quot;&gt;Two Generals Problem&lt;/a&gt;). To work around that, many techniques exist — making requests idempotent and retrying, commit/rollback protocols, or external transaction coordinators, for example. Solving this in the application layer is complex enough, and including the database into these approaches only adds more complexity.&lt;/p&gt;
&lt;p&gt;In the scenario where the event-driven database takes care of state and messaging, we have a much easier problem to solve. Assume one shard of the database receives the initial message, updates its state, invokes &lt;em&gt;Service 1&lt;/em&gt;, and routes the message produced by the function to another shard, to be delivered to &lt;em&gt;Service 2&lt;/em&gt;. Now assume that the message transport errored — it may or may not have succeeded; we cannot know for certain. Because the database is in charge of state and messaging, it can offer a generic solution to make sure that either both go through or none does, for example through transactions or &lt;a href=&quot;https://dl.acm.org/doi/abs/10.14778/3137765.3137777&quot;&gt;consistent snapshots&lt;/a&gt;. The application functions are stateless and their invocations have no side effects, which means they can safely be re-invoked without any implications for consistency.&lt;/p&gt;
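&lt;p&gt;To make this concrete with a toy sketch (plain Python, not Flink internals): because the runtime applies an invocation’s state update and its outgoing messages as one unit, a possibly-failed attempt can simply be retried, and we never end up with the state changed but the messages missing, or vice versa.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# Toy illustration only. In StateFun/Flink this atomicity comes from consistent
# snapshots; a single in-memory update stands in for that mechanism here.
def apply_invocation(store, outbox, address, event, function):
    new_state, messages = function(store.get(address, {}), event)
    # commit the state change and the outgoing messages together
    store[address] = new_state
    outbox.extend(messages)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;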
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;center&gt;
&lt;figure&gt;
&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image8.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 6&quot; /&gt;
&lt;br /&gt;&lt;br /&gt;
&lt;figcaption&gt;&lt;i&gt;&lt;b&gt;Fig.6:&lt;/b&gt; The event-driven database integrates state access and messaging, guaranteeing consistency.&lt;/i&gt;&lt;/figcaption&gt;
&lt;/figure&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:150%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;That is the big lesson we learned from working on stream processing technology over the past years: &lt;strong&gt;state access/updates and messaging need to be integrated&lt;/strong&gt;. This gives you consistency, scalable behavior, and backpressure that accounts for both state-access and compute bottlenecks.&lt;/p&gt;
&lt;p&gt;Despite state and computation being physically separated here, the scheduling/dispatching of function invocations is still integrated and physically co-located with state access, preserving the consistency guarantees given by physical state/compute co-location.&lt;/p&gt;
&lt;h2 id=&quot;remote-co-located-or-embedded-functions&quot;&gt;Remote, Co-located or Embedded Functions&lt;/h2&gt;
&lt;p&gt;Functions can be deployed in various ways that trade off loose coupling and independent scaling with performance overhead. Each module of functions can be of a different kind, so some functions can run remote, while others could run embedded.&lt;/p&gt;
&lt;h3 id=&quot;remote-functions&quot;&gt;Remote Functions&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Remote Functions&lt;/em&gt; are the mechanism described so far, where functions are deployed separately from the Flink StateFun cluster. The state/messaging tier (i.e. the Flink processes) and the function tier can be deployed and scaled independently. All function invocations are remote and have to go through the endpoint service.&lt;/p&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image6.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 7&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:150%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;In a similar way as databases are accessed via a standardized protocol (e.g. ODBC/JDBC for relational databases, REST for many key/value stores), StateFun 2.0 invokes functions and services through a standardized protocol: HTTP or gRPC with data in a well-defined ProtoBuf schema.&lt;/p&gt;
&lt;h3 id=&quot;co-located-functions&quot;&gt;Co-located Functions&lt;/h3&gt;
&lt;p&gt;An alternative way of deploying functions is &lt;em&gt;co-location&lt;/em&gt; with the Flink JVM processes. In such a setup, each Flink TaskManager would talk to one function process sitting “next to it”. A common way to do this is to use a system like Kubernetes and deploy pods consisting of a Flink container and the function container that communicate via the pod-local network.&lt;/p&gt;
&lt;p&gt;This mode supports different languages while avoiding routing invocations through a Service/Gateway/LoadBalancer, but it cannot scale the state and compute parts independently.&lt;/p&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image9.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 8&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:150%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;This style of deployment is similar to how &lt;a href=&quot;https://beam.apache.org/roadmap/portability/&quot;&gt;Apache Beam’s portability layer&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/tutorials/python_table_api.html&quot;&gt;Flink’s Python API&lt;/a&gt; deploy their non-JVM language SDKs.&lt;/p&gt;
&lt;h3 id=&quot;embedded-functions&quot;&gt;Embedded Functions&lt;/h3&gt;
&lt;p&gt;&lt;em&gt;Embedded Functions&lt;/em&gt; are the mode of Stateful Functions 1.0 and Flink’s Java/Scala stream processing APIs. Functions are deployed into the JVM and are directly invoked with the messages and state access. This is the most performant way, though at the cost of only supporting JVM languages.&lt;/p&gt;
&lt;div style=&quot;line-height:60%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-04-07-release-statefun-2.0.0/image11.png&quot; width=&quot;600px&quot; alt=&quot;Statefun 9&quot; /&gt;
&lt;/center&gt;
&lt;div style=&quot;line-height:150%;&quot;&gt;
&lt;br /&gt;
&lt;/div&gt;
&lt;p&gt;Following the database analogy, embedded functions are a bit like &lt;em&gt;stored procedures&lt;/em&gt;, but in a principled way: the functions here are normal Java/Scala/Kotlin functions implementing standard interfaces and can be developed or tested in any IDE.&lt;/p&gt;
&lt;h2 id=&quot;loading-data-into-the-database&quot;&gt;Loading Data into the Database&lt;/h2&gt;
&lt;p&gt;When building a new stateful application, you usually don’t start from a completely blank slate. Often, the application has initial state, such as initial “bootstrap” state, or state from previous versions of the application. When using a database, one could simply bulk load the data to prepare the application.&lt;/p&gt;
&lt;p&gt;The equivalent step for Flink would be to write a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/ops/state/savepoints.html&quot;&gt;savepoint&lt;/a&gt; that contains the initial state. Savepoints are snapshots of the state of the distributed stream processing application and can be passed to Flink to start processing from that state. Think of them as a database dump, but of a distributed streaming database. In the case of StateFun, the savepoint would contain the state of the functions.&lt;/p&gt;
&lt;p&gt;To create a savepoint for a Stateful Functions program, check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/deployment-and-operations/state-bootstrap.html&quot;&gt;State Bootstrapping API&lt;/a&gt; that is part of StateFun 2.0. The State Bootstrapping API uses Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/batch/&quot;&gt;DataSet API&lt;/a&gt;, but we plan to expand this to use SQL in the next versions.&lt;/p&gt;
&lt;h2 id=&quot;try-it-out-and-get-involved&quot;&gt;Try it out and get involved!&lt;/h2&gt;
&lt;p&gt;We hope that we could convey some of the excitement we feel about Stateful Functions. If we managed to pique your curiosity, try it out — for example, starting with &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/getting-started/python_walkthrough.html&quot;&gt;this walkthrough&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The project is still in a comparatively early stage, so if you want to get involved, there is lots to work on: SDKs for other languages (e.g. Go, JavaScript, Rust), ingresses/egresses and tools for testing, among others.&lt;/p&gt;
&lt;p&gt;To follow the project and learn more, please check out these resources:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Code: &lt;a href=&quot;https://github.com/apache/flink-statefun&quot;&gt;https://github.com/apache/flink-statefun&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Docs: &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/&quot;&gt;https://ci.apache.org/projects/flink/flink-statefun-docs-release-2.0/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Apache Flink project site: &lt;a href=&quot;https://flink.apache.org/&quot;&gt;https://flink.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Apache Flink on Twitter: &lt;a href=&quot;https://twitter.com/apacheflink&quot;&gt;@ApacheFlink&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Stateful Functions Webpage: &lt;a href=&quot;https://statefun.io&quot;&gt;https://statefun.io&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Stateful Functions on Twitter: &lt;a href=&quot;https://twitter.com/statefun_io&quot;&gt;@StateFun_IO&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;thank-you&quot;&gt;Thank you!&lt;/h2&gt;
&lt;p&gt;The Apache Flink community would like to thank all contributors that have made this release possible:&lt;/p&gt;
&lt;p&gt;David Anderson, Dian Fu, Igal Shilman, Seth Wiesman, Stephan Ewen, Tzu-Li (Gordon) Tai, hequn8128&lt;/p&gt;
</description>
<pubDate>Tue, 07 Apr 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/04/07/release-statefun-2.0.0.html</link>
<guid isPermaLink="true">/news/2020/04/07/release-statefun-2.0.0.html</guid>
</item>
<item>
<title>Flink Community Update - April&#39;20</title>
<description>&lt;p&gt;While things slow down around us, the Apache Flink community is privileged to remain as active as ever. This blogpost combs through the past few months to give you an update on the state of things in Flink — from core releases to Stateful Functions; from some good old community stats to a new development blog.&lt;/p&gt;
&lt;p&gt;And since now it’s more important than ever to keep up the spirits, we’d like to invite you to join the &lt;a href=&quot;https://www.flink-forward.org/sf-2020&quot;&gt;Flink Forward Virtual Conference&lt;/a&gt;, on April 22-24 (see &lt;a href=&quot;#upcoming-events&quot;&gt;Upcoming Events&lt;/a&gt;). Hope to see you there!&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#the-year-so-far-in-flink&quot; id=&quot;markdown-toc-the-year-so-far-in-flink&quot;&gt;The Year (so far) in Flink&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#flink-110-release&quot; id=&quot;markdown-toc-flink-110-release&quot;&gt;Flink 1.10 Release&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#stateful-functions-contribution-and-20-release&quot; id=&quot;markdown-toc-stateful-functions-contribution-and-20-release&quot;&gt;Stateful Functions Contribution and 2.0 Release&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#building-up-to-flink-111&quot; id=&quot;markdown-toc-building-up-to-flink-111&quot;&gt;Building up to Flink 1.11&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#new-committers-and-pmc-members&quot; id=&quot;markdown-toc-new-committers-and-pmc-members&quot;&gt;New Committers and PMC Members&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#new-pmc-members&quot; id=&quot;markdown-toc-new-pmc-members&quot;&gt;New PMC Members&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#new-committers&quot; id=&quot;markdown-toc-new-committers&quot;&gt;New Committers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-bigger-picture&quot; id=&quot;markdown-toc-the-bigger-picture&quot;&gt;The Bigger Picture&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#a-look-into-the-flink-repository&quot; id=&quot;markdown-toc-a-look-into-the-flink-repository&quot;&gt;A Look into the Flink Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#flink-community-packages&quot; id=&quot;markdown-toc-flink-community-packages&quot;&gt;Flink Community Packages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#flink-engine-room&quot; id=&quot;markdown-toc-flink-engine-room&quot;&gt;Flink “Engine Room”&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#upcoming-events&quot; id=&quot;markdown-toc-upcoming-events&quot;&gt;Upcoming Events&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#flink-forward-virtual-conference&quot; id=&quot;markdown-toc-flink-forward-virtual-conference&quot;&gt;Flink Forward Virtual Conference&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#others&quot; id=&quot;markdown-toc-others&quot;&gt;Others&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h1 id=&quot;the-year-so-far-in-flink&quot;&gt;The Year (so far) in Flink&lt;/h1&gt;
&lt;h2 id=&quot;flink-110-release&quot;&gt;Flink 1.10 Release&lt;/h2&gt;
&lt;p&gt;To kick off the new year, the Flink community &lt;a href=&quot;https://flink.apache.org/news/2020/02/11/release-1.10.0.html&quot;&gt;released Flink 1.10&lt;/a&gt; with the record contribution of over 200 engineers. This release introduced significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and advances in Python support (PyFlink). Flink 1.10 also marked the completion of the &lt;a href=&quot;https://flink.apache.org/news/2019/08/22/release-1.9.0.html#preview-of-the-new-blink-sql-query-processor&quot;&gt;Blink integration&lt;/a&gt;, hardening streaming SQL and bringing mature batch processing to Flink with production-ready Hive integration and TPC-DS coverage.&lt;/p&gt;
&lt;p&gt;The community is now discussing the &lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Releasing-Flink-1-10-1-td38689.html#a38690&quot;&gt;release of Flink 1.10.1&lt;/a&gt;, covering some outstanding bugs from Flink 1.10.&lt;/p&gt;
&lt;h2 id=&quot;stateful-functions-contribution-and-20-release&quot;&gt;Stateful Functions Contribution and 2.0 Release&lt;/h2&gt;
&lt;p&gt;Last January, the first version of Stateful Functions (&lt;a href=&quot;https://statefun.io/&quot;&gt;statefun.io&lt;/a&gt;) code was pushed to the &lt;a href=&quot;https://github.com/apache/flink-statefun&quot;&gt;Flink repository&lt;/a&gt;. Stateful Functions started out as an API to build general purpose event-driven applications on Flink, taking advantage of its advanced state management mechanism to cut the “middleman” that usually handles state coordination in such applications (e.g. a database).&lt;/p&gt;
&lt;p&gt;In a &lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Update-on-Flink-Stateful-Functions-what-are-the-next-steps-tp38646.html&quot;&gt;recent update&lt;/a&gt;, some new features were announced, like multi-language support (including a Python SDK), function unit testing and Stateful Functions’ own flavor of the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/state_processor_api.html&quot;&gt;State Processor API&lt;/a&gt;. The release cycle will be independent from core Flink releases and the Release Candidate (RC) has been created — so, &lt;strong&gt;you can expect Stateful Functions 2.0 to be released very soon!&lt;/strong&gt;&lt;/p&gt;
&lt;h2 id=&quot;building-up-to-flink-111&quot;&gt;Building up to Flink 1.11&lt;/h2&gt;
&lt;p&gt;Amidst the usual outpouring of discussion threads, JIRA tickets and FLIPs, the community is working at full steam on bringing Flink 1.11 to life in the next few months. The feature freeze is currently scheduled for late April, so the release is expected around mid-May.
The upcoming release will focus on new features and integrations that broaden the scope of Flink use cases, as well as core runtime enhancements to streamline the operations of complex deployments.&lt;/p&gt;
&lt;p&gt;Some of the plans on the use case side include support for changelog streams in the Table API/SQL (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL&quot;&gt;FLIP-105&lt;/a&gt;), easy streaming data ingestion into Apache Hive (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-115%3A+Filesystem+connector+in+Table&quot;&gt;FLIP-115&lt;/a&gt;) and support for Pandas DataFrames in PyFlink. On the operational side, the much anticipated new Source API (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface&quot;&gt;FLIP-27&lt;/a&gt;) will unify batch and streaming sources, and improve out-of-the-box event-time behavior, while unaligned checkpoints (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints&quot;&gt;FLIP-76&lt;/a&gt;) and some changes to network memory management will make it possible to speed up checkpointing under backpressure.&lt;/p&gt;
&lt;p&gt;Throw in improvements around the type system, the WebUI, metrics reporting and supported formats, and this release is bound to keep the community busy. For a complete overview of the ongoing development, check &lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Features-of-Apache-Flink-1-11-td38724.html#a38793&quot;&gt;this discussion&lt;/a&gt; and follow the weekly updates on the Flink &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;@community mailing list&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;new-committers-and-pmc-members&quot;&gt;New Committers and PMC Members&lt;/h2&gt;
&lt;p&gt;The Apache Flink community has welcomed &lt;strong&gt;1 PMC (Project Management Committee) Member&lt;/strong&gt; and &lt;strong&gt;5 new Committers&lt;/strong&gt; since the last update (September 2019):&lt;/p&gt;
&lt;h3 id=&quot;new-pmc-members&quot;&gt;New PMC Members&lt;/h3&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;Jark Wu
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;new-committers&quot;&gt;New Committers&lt;/h3&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;Zili Chen, Jingsong Lee, Yu Li, Dian Fu, Zhu Zhu
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Congratulations to all, and thank you for your hard work and commitment to Flink!&lt;/p&gt;
&lt;h1 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture&lt;/h1&gt;
&lt;h2 id=&quot;a-look-into-the-flink-repository&quot;&gt;A Look into the Flink Repository&lt;/h2&gt;
&lt;p&gt;In the &lt;a href=&quot;https://flink.apache.org/news/2019/09/10/community-update.html&quot;&gt;last update&lt;/a&gt;, we shared some numbers around Flink releases and mailing list activity. This time, we’re looking into the activity in the Flink repository and how it’s evolving.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-03-30-flink-community-update/2020-03-30-flink-community-update_1.png&quot; width=&quot;725px&quot; alt=&quot;GitHub 1&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;There is a clear upward trend in the number of contributions to the repository, based on the number of commits. This reflects the &lt;strong&gt;fast pace of development&lt;/strong&gt; the project is experiencing and also the &lt;strong&gt;successful integration of China-based Flink contributors&lt;/strong&gt; that started early last year. To complement these observations, the repository registered a &lt;strong&gt;1.5x increase in the number of individual contributors in 2019&lt;/strong&gt;, compared to the previous year.&lt;/p&gt;
&lt;p&gt;But did this increase in capacity produce any other measurable benefits?&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-03-30-flink-community-update/2020-03-30-flink-community-update_2.png&quot; width=&quot;725px&quot; alt=&quot;GitHub 2&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;If we look at the average time of Pull Request (PR) “resolution”, it seems like it did: &lt;strong&gt;the average time it takes to close a PR these days has been steadily decreasing&lt;/strong&gt; since last year, sitting at 5 to 6 days for the past few months.&lt;/p&gt;
&lt;p&gt;These are great indicators of the health of Flink as an open source project!&lt;/p&gt;
&lt;h2 id=&quot;flink-community-packages&quot;&gt;Flink Community Packages&lt;/h2&gt;
&lt;p&gt;If you missed the launch of &lt;a href=&quot;http://flink-packages.org/&quot;&gt;flink-packages.org&lt;/a&gt;, here’s a reminder! Ververica has &lt;a href=&quot;https://www.ververica.com/blog/announcing-flink-community-packages&quot;&gt;created (and open sourced)&lt;/a&gt; a website that showcases the work of the community to push forward the ecosystem surrounding Flink. There, you can explore existing packages (like the Pravega and Pulsar Flink connectors, or the Flink Kubernetes operators developed by Google and Lyft) and also submit your own contributions to the ecosystem.&lt;/p&gt;
&lt;h2 id=&quot;flink-engine-room&quot;&gt;Flink “Engine Room”&lt;/h2&gt;
&lt;p&gt;The community has recently launched the &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewrecentblogposts.action?key=FLINK&quot;&gt;“Engine Room”&lt;/a&gt;, a dedicated space in Flink’s Wiki for knowledge sharing between contributors. The goal of this initiative is to make ongoing development on Flink internals more transparent across different work streams, and also to help new contributors get on board with best practices. The first blogpost is already up and sheds light on the &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/2020/03/22/Migrating+Flink%27s+CI+Infrastructure+from+Travis+CI+to+Azure+Pipelines&quot;&gt;migration of Flink’s CI infrastructure from Travis to Azure Pipelines&lt;/a&gt;.&lt;/p&gt;
&lt;h1 id=&quot;upcoming-events&quot;&gt;Upcoming Events&lt;/h1&gt;
&lt;h2 id=&quot;flink-forward-virtual-conference&quot;&gt;Flink Forward Virtual Conference&lt;/h2&gt;
&lt;p&gt;The organization of Flink Forward had to make the hard decision of cancelling this year’s event in San Francisco. But all is not lost! &lt;strong&gt;Flink Forward SF will be held online on April 22-24 and you can register (for free)&lt;/strong&gt; &lt;a href=&quot;https://www.flink-forward.org/sf-2020&quot;&gt;here&lt;/a&gt;. Join the community for interactive talks and Q&amp;amp;A sessions with core Flink contributors and companies like Splunk, Lyft, Netflix or Google.&lt;/p&gt;
&lt;h2 id=&quot;others&quot;&gt;Others&lt;/h2&gt;
&lt;p&gt;Events across the globe have come to a halt due to the growing concerns around COVID-19, so this time we’ll leave you with some interesting content to read instead. In addition to this written content, you can also recap last year’s sessions from &lt;a href=&quot;https://www.youtube.com/playlist?list=PLDX4T_cnKjD207Aa8b5CsZjc7Z_KRezGz&quot;&gt;Flink Forward Berlin&lt;/a&gt; and &lt;a href=&quot;https://www.youtube.com/playlist?list=PLDX4T_cnKjD3ANoNinSx3Au-poZTHvbF5&quot;&gt;Flink Forward China&lt;/a&gt;!&lt;/p&gt;
&lt;table class=&quot;table table-bordered&quot;&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Links&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span class=&quot;glyphicon glyphicon glyphicon-bookmark&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Blogposts&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://medium.com/bird-engineering/replayable-process-functions-in-flink-time-ordering-and-timers-28007a0210e1&quot;&gt;Replayable Process Functions: Time, Ordering, and Timers @Bird&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://engineering.salesforce.com/application-log-intelligence-performance-insights-at-salesforce-using-flink-92955f30573f&quot;&gt;Application Log Intelligence &amp;amp; Performance Insights at Salesforce Using Flink @Salesforce&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://flink.apache.org/news/2020/01/29/state-unlocked-interacting-with-state-in-apache-flink.html&quot;&gt;State Unlocked: Interacting with State in Apache Flink&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html&quot;&gt;Advanced Flink Application Patterns Vol.1: Case Study of a Fraud Detection System&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flink.apache.org/news/2020/03/24/demo-fraud-detection-2.html&quot;&gt;Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html&quot;&gt;Apache Beam: How Beam Runs on Top of Flink&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flink.apache.org/features/2020/03/27/flink-for-data-warehouse.html&quot;&gt;Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;span class=&quot;glyphicon glyphicon-console&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Tutorials&lt;/td&gt;
&lt;td&gt;&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://medium.com/@zjffdu/flink-on-zeppelin-part-3-streaming-5fca1e16754&quot;&gt;Flink on Zeppelin — (Part 3). Streaming&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://aws.amazon.com/blogs/big-data/streaming-etl-with-apache-flink-and-amazon-kinesis-data-analytics&quot;&gt;Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flink.apache.org/news/2020/02/20/ddl.html&quot;&gt;No Java Required: Configuring Sources and Sinks in SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://flink.apache.org/news/2020/02/07/a-guide-for-unit-testing-in-apache-flink.html&quot;&gt;A Guide for Unit Testing in Apache Flink&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;If you’d like to keep a closer eye on what’s happening in the community, subscribe to the Flink &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;@community mailing list&lt;/a&gt; to get fine-grained weekly updates, upcoming event announcements and more.&lt;/p&gt;
</description>
<pubDate>Wed, 01 Apr 2020 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2020/04/01/community-update.html</link>
<guid isPermaLink="true">/news/2020/04/01/community-update.html</guid>
</item>
<item>
<title>Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration</title>
<description>&lt;p&gt;In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#introduction&quot; id=&quot;markdown-toc-introduction&quot;&gt;Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#flink-and-its-integration-with-hive-comes-into-the-scene&quot; id=&quot;markdown-toc-flink-and-its-integration-with-hive-comes-into-the-scene&quot;&gt;Flink and Its Integration With Hive Comes into the Scene&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#unified-metadata-management&quot; id=&quot;markdown-toc-unified-metadata-management&quot;&gt;Unified Metadata Management&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#stream-processing&quot; id=&quot;markdown-toc-stream-processing&quot;&gt;Stream Processing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#compatible-with-more-hive-versions&quot; id=&quot;markdown-toc-compatible-with-more-hive-versions&quot;&gt;Compatible with More Hive Versions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#reuse-hive-user-defined-functions-udfs&quot; id=&quot;markdown-toc-reuse-hive-user-defined-functions-udfs&quot;&gt;Reuse Hive User Defined Functions (UDFs)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#enhanced-read-and-write-on-hive-data&quot; id=&quot;markdown-toc-enhanced-read-and-write-on-hive-data&quot;&gt;Enhanced Read and Write on Hive Data&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#formats&quot; id=&quot;markdown-toc-formats&quot;&gt;Formats&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#more-data-types&quot; id=&quot;markdown-toc-more-data-types&quot;&gt;More Data Types&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#roadmap&quot; id=&quot;markdown-toc-roadmap&quot;&gt;Roadmap&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#summary&quot; id=&quot;markdown-toc-summary&quot;&gt;Summary&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;What are some of the latest requirements for your data warehouse and data infrastructure in 2020?&lt;/p&gt;
&lt;p&gt;We’ve come up with some for you.&lt;/p&gt;
&lt;p&gt;Firstly, today’s businesses are shifting to a more real-time fashion, and thus demand the ability to process online streaming data with low latency for near-real-time or even real-time analytics. People are becoming less and less tolerant of delays between when data is generated and when it arrives, ready to use. Hours or even days of delay are not acceptable anymore. Users expect minutes, or even seconds, of end-to-end latency for data in their warehouse, to get quicker-than-ever insights.&lt;/p&gt;
&lt;p&gt;Secondly, the infrastructure should be able to handle both offline batch data for offline analytics and exploration, and online streaming data for more timely analytics. Both are indispensable as they both have very valid use cases. Apart from the real-time processing mentioned above, batch processing would still exist as it’s good for ad hoc queries, explorations, and full-size calculations. Your modern infrastructure should not force users to choose between one or the other; it should offer both options for a world-class data infrastructure.&lt;/p&gt;
&lt;p&gt;Thirdly, data practitioners, including data engineers, data scientists, analysts, and operations teams, need a more unified infrastructure than ever before for easier ramp-up and higher working efficiency. The big data landscape has been fragmented for years - companies may have one set of infrastructure for real time processing, one set for batch, one set for OLAP, etc. That, oftentimes, comes as a result of the legacy of lambda architecture, which was popular in the era when stream processors were not as mature as today and users had to periodically run batch processing as a way to correct streaming pipelines. Well, it’s a different era now! As stream processing becomes mainstream and dominant, end users no longer want to learn a scattered set of skills and maintain many moving parts with all kinds of tools and pipelines. Instead, what they really need is a unified analytics platform that can be mastered easily and that simplifies operational complexity.&lt;/p&gt;
&lt;p&gt;If any of these resonate with you, you just found the right post to read: by strengthening Flink’s integration with Hive to production grade, we have never been closer to this vision.&lt;/p&gt;
&lt;h2 id=&quot;flink-and-its-integration-with-hive-comes-into-the-scene&quot;&gt;Flink and Its Integration With Hive Comes into the Scene&lt;/h2&gt;
&lt;p&gt;Apache Flink is a proven, scalable system for handling extremely high volumes of streaming data at very low latency at many large tech companies.&lt;/p&gt;
&lt;p&gt;Despite its huge success in the real-time processing domain, at its deep root, Flink has been faithfully following its inborn philosophy of being &lt;a href=&quot;https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;a unified data processing engine for both batch and streaming&lt;/a&gt;, and taking a streaming-first approach in its architecture to do batch processing. By making batch a special case of streaming, Flink really leverages its cutting edge streaming capabilities and applies them to batch scenarios to gain the best offline performance. Flink’s batch performance was already quite outstanding in the early days and has become even more impressive as the community started merging Blink, Alibaba’s fork of Flink, back into Flink in 1.9 and finished the effort in 1.10.&lt;/p&gt;
&lt;p&gt;On the other hand, Apache Hive has established itself as a focal point of the data warehousing ecosystem. It serves not only as a SQL engine for big data analytics and ETL, but also as a data management platform, where data is discovered and defined. As businesses evolve, they put new requirements on the data warehouse.&lt;/p&gt;
&lt;p&gt;Thus we started integrating Flink and Hive as a beta version in Flink 1.9. Over the past few months, we have been listening to users’ requests and feedback, extensively enhancing our product, and running rigorous benchmarks (which will be published soon separately). I’m glad to announce that the integration between Flink and Hive is at production grade in &lt;a href=&quot;https://flink.apache.org/news/2020/02/11/release-1.10.0.html&quot;&gt;Flink 1.10&lt;/a&gt; and we can’t wait to walk you through the details.&lt;/p&gt;
&lt;h3 id=&quot;unified-metadata-management&quot;&gt;Unified Metadata Management&lt;/h3&gt;
&lt;p&gt;Hive Metastore has evolved into the de facto metadata hub over the years in the Hadoop, or even the cloud, ecosystem. Many companies have a single Hive Metastore service instance in production to manage all of their schemas, either Hive or non-Hive metadata, as the single source of truth.&lt;/p&gt;
&lt;p&gt;In 1.9 we introduced Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_catalog.html&quot;&gt;HiveCatalog&lt;/a&gt;, connecting Flink to users’ rich metadata pool. The meaning of &lt;code&gt;HiveCatalog&lt;/code&gt; is two-fold here. First, it allows Apache Flink users to utilize Hive Metastore to store and manage Flink’s metadata, including tables, UDFs, and statistics of data. Second, it enables Flink to access Hive’s existing metadata, so that Flink itself can read and write Hive tables.&lt;/p&gt;
&lt;p&gt;In Flink 1.10, users can store Flink’s own tables, views, UDFs and statistics in Hive Metastore for all of the compatible Hive versions mentioned below. &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_catalog.html#example&quot;&gt;Here’s an end-to-end example&lt;/a&gt; of how to store a Flink Kafka source table in Hive Metastore and later query the table in Flink SQL.&lt;/p&gt;
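&lt;p&gt;For instance, registering the catalog from the Python Table API looks roughly like the snippet below (the database name and the directory holding &lt;code&gt;hive-site.xml&lt;/code&gt; are placeholders; see the linked example for the complete end-to-end walkthrough):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;from pyflink.table import BatchTableEnvironment, EnvironmentSettings
from pyflink.table.catalog import HiveCatalog

settings = EnvironmentSettings.new_instance().in_batch_mode().use_blink_planner().build()
t_env = BatchTableEnvironment.create(environment_settings=settings)

# point Flink at the Hive Metastore; the conf dir must contain hive-site.xml
hive_catalog = HiveCatalog('myhive', 'mydatabase', '/opt/hive-conf')
t_env.register_catalog('myhive', hive_catalog)
t_env.use_catalog('myhive')

# tables, views and UDFs referenced from now on are resolved against (and can be
# persisted in) Hive Metastore
orders = t_env.sql_query('SELECT * FROM mydatabase.orders')&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;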
&lt;h3 id=&quot;stream-processing&quot;&gt;Stream Processing&lt;/h3&gt;
&lt;p&gt;The Hive integration feature in Flink 1.10 empowers users to re-imagine what they can accomplish with their Hive data and unlock stream processing use cases:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;join real-time streaming data in Flink with offline Hive data for more complex data processing&lt;/li&gt;
&lt;li&gt;backfill Hive data with Flink directly in a unified fashion&lt;/li&gt;
&lt;li&gt;leverage Flink to move real-time data into Hive more quickly, greatly shortening the end-to-end latency between when data is generated and when it arrives at your data warehouse for analytics, from hours — or even days — to minutes&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;compatible-with-more-hive-versions&quot;&gt;Compatible with More Hive Versions&lt;/h3&gt;
&lt;p&gt;In Flink 1.10, we brought full coverage to most Hive versions including 1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 2.3, and 3.1. Take a look &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/#supported-hive-versions&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;reuse-hive-user-defined-functions-udfs&quot;&gt;Reuse Hive User Defined Functions (UDFs)&lt;/h3&gt;
&lt;p&gt;Users can &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_functions.html#hive-user-defined-functions&quot;&gt;reuse all kinds of Hive UDFs in Flink&lt;/a&gt; since Flink 1.9.&lt;/p&gt;
&lt;p&gt;This is a great win for Flink users with a history in the Hive ecosystem, as they may have developed custom business logic in their Hive UDFs. Being able to run these functions without any rewrite saves users a lot of time and makes for a much smoother experience when they migrate to Flink.&lt;/p&gt;
&lt;p&gt;To take it a step further, Flink 1.10 introduces &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/hive_functions.html#use-hive-built-in-functions-via-hivemodule&quot;&gt;compatibility of Hive built-in functions via HiveModule&lt;/a&gt;. Over the years, the Hive community has developed a few hundred built-in functions that are super handy for users. For those built-in functions that don’t exist in Flink yet, users are now able to leverage the existing Hive built-in functions that they are familiar with and complete their jobs seamlessly.&lt;/p&gt;
&lt;h3 id=&quot;enhanced-read-and-write-on-hive-data&quot;&gt;Enhanced Read and Write on Hive Data&lt;/h3&gt;
&lt;p&gt;Flink 1.10 extends its read and write capabilities on Hive data to all the common use cases with better performance.&lt;/p&gt;
&lt;p&gt;On the reading side, Flink can now read Hive regular tables, partitioned tables, and views. Many optimization techniques have been developed around reading, including partition pruning and projection pushdown to transport less data from file storage, limit pushdown for faster experimentation and exploration, and a vectorized reader for ORC files.&lt;/p&gt;
&lt;p&gt;On the writing side, Flink 1.10 introduces “INSERT INTO” and “INSERT OVERWRITE” to its syntax, and can write not only to Hive’s regular tables, but also to partitioned tables with either static or dynamic partitions.&lt;/p&gt;
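&lt;p&gt;As a hedged example (the table, column and partition names here are made up), writing into a statically partitioned Hive table from the Python Table API of Flink 1.10 could look roughly like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;# assumes a HiveCatalog is registered and selected, as sketched earlier
t_env.sql_update(
    &amp;quot;INSERT OVERWRITE orders PARTITION (ds='2020-03-27') &amp;quot;
    &amp;quot;SELECT order_id, amount FROM staging_orders&amp;quot;)
t_env.execute('write-to-hive')&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;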
&lt;h3 id=&quot;formats&quot;&gt;Formats&lt;/h3&gt;
&lt;p&gt;Your engine should be able to handle all common types of file formats to give you the freedom of choosing one over another in order to fit your business needs. Flink is no exception. We have tested the following table storage formats: text, CSV, SequenceFile, ORC, and Parquet.&lt;/p&gt;
&lt;h3 id=&quot;more-data-types&quot;&gt;More Data Types&lt;/h3&gt;
&lt;p&gt;In Flink 1.10, we added support for a few more frequently-used Hive data types that were not covered by Flink 1.9. Flink users now should have a full, smooth experience to query and manipulate Hive data from Flink.&lt;/p&gt;
&lt;h3 id=&quot;roadmap&quot;&gt;Roadmap&lt;/h3&gt;
&lt;p&gt;Integration between any two systems is a never-ending story.&lt;/p&gt;
&lt;p&gt;We are constantly improving Flink itself and the Flink-Hive integration also gets improved by collecting user feedback and working with folks in this vibrant community.&lt;/p&gt;
&lt;p&gt;After careful consideration and prioritization of the feedback we received, we have planned many of the requests below for the next Flink release, 1.11.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hive streaming sink so that Flink can stream data into Hive tables, bringing a real streaming experience to Hive&lt;/li&gt;
&lt;li&gt;Native Parquet reader for better performance&lt;/li&gt;
&lt;li&gt;Additional interoperability - support creating Hive tables, views, functions in Flink&lt;/li&gt;
&lt;li&gt;Better out-of-the-box experience with built-in dependencies, including documentation&lt;/li&gt;
&lt;li&gt;JDBC driver so that users can reuse their existing tooling to run SQL jobs on Flink&lt;/li&gt;
&lt;li&gt;Hive syntax and semantic compatible mode&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you have more feature requests or discover bugs, please reach out to the community through the mailing lists and JIRA.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;
&lt;p&gt;Data warehousing is shifting to a more real-time fashion, and Apache Flink can make a difference for your organization in this space.&lt;/p&gt;
&lt;p&gt;Flink 1.10 brings production-ready Hive integration and empowers users to achieve more in both metadata management and unified batch and streaming data processing.&lt;/p&gt;
&lt;p&gt;We encourage all our users to get their hands on Flink 1.10. You are very welcome to join the community in development, discussions, and all other kinds of collaborations in this topic.&lt;/p&gt;
</description>
<pubDate>Fri, 27 Mar 2020 03:30:00 +0100</pubDate>
<link>https://flink.apache.org/features/2020/03/27/flink-for-data-warehouse.html</link>
<guid isPermaLink="true">/features/2020/03/27/flink-for-data-warehouse.html</guid>
</item>
<item>
<title>Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic</title>
<description>&lt;p&gt;In the &lt;a href=&quot;https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html&quot;&gt;first article&lt;/a&gt; of the series, we gave a high-level description of the objectives and required functionality of a Fraud Detection engine. We also described how to make data partitioning in Apache Flink customizable based on modifiable rules instead of using a hardcoded &lt;code&gt;KeysExtractor&lt;/code&gt; implementation.&lt;/p&gt;
&lt;p&gt;We intentionally omitted details of how the applied rules are initialized and what possibilities exist for updating them at runtime. In this post, we will address exactly these details. You will learn how the approach to data partitioning described in &lt;a href=&quot;https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html&quot;&gt;Part 1&lt;/a&gt; can be applied in combination with a dynamic configuration. These two patterns, when used together, can eliminate the need to recompile the code and redeploy your Flink job for a wide range of modifications of the business logic.&lt;/p&gt;
&lt;h2 id=&quot;rules-broadcasting&quot;&gt;Rules Broadcasting&lt;/h2&gt;
&lt;p&gt;Let’s first have a look at the &lt;a href=&quot;https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html#dynamic-data-partitioning&quot;&gt;previously-defined&lt;/a&gt; data-processing pipeline:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Alert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alerts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DynamicKeyFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DynamicAlertFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;DynamicKeyFunction&lt;/code&gt; provides dynamic data partitioning while &lt;code&gt;DynamicAlertFunction&lt;/code&gt; is responsible for executing the main logic of processing transactions and sending alert messages according to defined rules.&lt;/p&gt;
&lt;p&gt;Vol.1 of this series simplified the use case and assumed that the applied set of rules is pre-initialized and accessible via the &lt;code&gt;List&amp;lt;Rules&amp;gt;&lt;/code&gt; within &lt;code&gt;DynamicKeyFunction&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DynamicKeyFunction&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;/* Simplified */&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rules&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;cm&quot;&gt;/* Rules that are initialized somehow.*/&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Rules can obviously be added to this list directly inside the code of the Flink job at initialization time (create a &lt;code&gt;List&lt;/code&gt; object and use its &lt;code&gt;add&lt;/code&gt; method). A major drawback of doing so is that every rule modification requires recompiling and redeploying the job. In a real fraud detection system, rules are expected to change on a frequent basis, which makes this approach unacceptable from the point of view of business and operational requirements. A different approach is needed.&lt;/p&gt;
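&lt;p&gt;As a minimal sketch of what such a hardcoded setup looks like (the &lt;code&gt;hardcodedRule&lt;/code&gt; variables are hypothetical &lt;code&gt;Rule&lt;/code&gt; instances and are not part of the original project):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Hardcoded rules: every change requires recompiling and redeploying the job.
List&amp;lt;Rule&amp;gt; rules = new ArrayList&amp;lt;&amp;gt;();
rules.add(hardcodedRule1); // Rule instances built from values baked into the code
rules.add(hardcodedRule2);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;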
&lt;p&gt;Next, let’s take a look at a sample rule definition that we introduced in the previous post of the series:&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-2/rule-dsl.png&quot; width=&quot;800px&quot; alt=&quot;Figure 1: Rule definition&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 1: Rule definition&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;The previous post covered use of &lt;code&gt;groupingKeyNames&lt;/code&gt; by &lt;code&gt;DynamicKeyFunction&lt;/code&gt; to extract message keys. Parameters from the second part of this rule are used by &lt;code&gt;DynamicAlertFunction&lt;/code&gt;: they define the actual logic of the performed operations and their parameters (such as the alert-triggering limit). This means that the same rule must be present in both &lt;code&gt;DynamicKeyFunction&lt;/code&gt; and &lt;code&gt;DynamicAlertFunction&lt;/code&gt;. To achieve this result, we will use the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/state/broadcast_state.html&quot;&gt;broadcast data distribution mechanism&lt;/a&gt; of Apache Flink.&lt;/p&gt;
&lt;p&gt;Figure 2 presents the final job graph of the system that we are building:&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-2/job-graph.png&quot; width=&quot;800px&quot; alt=&quot;Figure 2: Job Graph of the Fraud Detection Flink Job&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 2: Job Graph of the Fraud Detection Flink Job&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;The main blocks of the transaction processing pipeline are:&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Transaction Source&lt;/strong&gt; that consumes transaction messages from Kafka partitions in parallel. &lt;br /&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Dynamic Key Function&lt;/strong&gt; that performs data enrichment with a dynamic key. The subsequent &lt;code&gt;keyBy&lt;/code&gt; hashes this dynamic key and partitions the data accordingly among all parallel instances of the following operator.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Dynamic Alert Function&lt;/strong&gt; that accumulates a data window and creates Alerts based on it.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;data-exchange-inside-apache-flink&quot;&gt;Data Exchange inside Apache Flink&lt;/h2&gt;
&lt;p&gt;The job graph above also indicates various data exchange patterns between the operators. In order to understand how the broadcast pattern works, let’s take a short detour and discuss what methods of message propagation exist in Apache Flink’s distributed runtime.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;FORWARD&lt;/strong&gt; connection after the Transaction Source means that all data consumed by one of the parallel instances of the Transaction Source operator is transferred to exactly one instance of the subsequent &lt;code&gt;DynamicKeyFunction&lt;/code&gt; operator. It also indicates the same level of parallelism of the two connected operators (12 in the above case). This communication pattern is illustrated in Figure 3. Orange circles represent transactions, and dotted rectangles depict parallel instances of the conjoined operators.&lt;/li&gt;
&lt;/ul&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-2/forward.png&quot; width=&quot;800px&quot; alt=&quot;Figure 3: FORWARD message passing across operator instances&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 3: FORWARD message passing across operator instances&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;HASH&lt;/strong&gt; connection between &lt;code&gt;DynamicKeyFunction&lt;/code&gt; and &lt;code&gt;DynamicAlertFunction&lt;/code&gt; means that for each message a hash code is calculated and messages are evenly distributed among available parallel instances of the next operator. Such a connection needs to be explicitly “requested” from Flink by using &lt;code&gt;keyBy&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-2/hash.png&quot; width=&quot;800px&quot; alt=&quot;Figure 4: HASHED message passing across operator instances (via `keyBy`)&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 4: HASHED message passing across operator instances (via `keyBy`)&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;REBALANCE&lt;/strong&gt; distribution is either caused by an explicit call to &lt;code&gt;rebalance()&lt;/code&gt; or by a change of parallelism (12 -&amp;gt; 1 in the case of the job graph from Figure 2). Calling &lt;code&gt;rebalance()&lt;/code&gt; causes data to be repartitioned in a round-robin fashion and can help to mitigate data skew in certain scenarios.&lt;/li&gt;
&lt;/ul&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-2/rebalance.png&quot; width=&quot;800px&quot; alt=&quot;Figure 5: REBALANCE message passing across operator instances&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 5: REBALANCE message passing across operator instances&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;The Fraud Detection job graph in Figure 2 contains an additional data source: &lt;em&gt;Rules Source&lt;/em&gt;. It also consumes from Kafka. Rules are “mixed into” the main processing data flow through the &lt;strong&gt;BROADCAST&lt;/strong&gt; channel. Unlike other methods of transmitting data between operators, such as &lt;code&gt;forward&lt;/code&gt;, &lt;code&gt;hash&lt;/code&gt; or &lt;code&gt;rebalance&lt;/code&gt; that make each message available for processing in only one of the parallel instances of the receiving operator, &lt;code&gt;broadcast&lt;/code&gt; makes each message available at the input of all of the parallel instances of the operator to which the &lt;em&gt;broadcast stream&lt;/em&gt; is connected. This makes &lt;code&gt;broadcast&lt;/code&gt; applicable to a wide range of tasks that need to affect the processing of all messages, regardless of their key or source partition.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/patterns-blog-2/broadcast.png&quot; width=&quot;800px&quot; alt=&quot;Figure 6: BROADCAST message passing across operator instances&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 6: BROADCAST message passing across operator instances&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
There are actually a few more specialized data partitioning schemes in Flink which we did not mention here. If you want to find out more, please refer to Flink’s documentation on &lt;strong&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/#physical-partitioning&quot;&gt;stream partitioning&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id=&quot;broadcast-state-pattern&quot;&gt;Broadcast State Pattern&lt;/h2&gt;
&lt;p&gt;In order to make use of the Rules Source, we need to “connect” it to the main data stream:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// Streams setup&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;[...]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rulesUpdateStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;[...]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BroadcastStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rulesStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rulesUpdateStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;broadcast&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RULES_STATE_DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Processing pipeline setup&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Alert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alerts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rulesStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DynamicKeyFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rulesStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DynamicAlertFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, the broadcast stream can be created from any regular stream by calling the &lt;code&gt;broadcast&lt;/code&gt; method and specifying a state descriptor. Flink assumes that broadcasted data needs to be stored and retrieved while processing events of the main data flow and, therefore, always automatically creates a corresponding &lt;em&gt;broadcast state&lt;/em&gt; from this state descriptor. This is different from all other Apache Flink state types, which you need to initialize yourself in the &lt;code&gt;open()&lt;/code&gt; method of the processing function. Also note that broadcast state always has a key-value format (&lt;code&gt;MapState&lt;/code&gt;).&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RULES_STATE_DESCRIPTOR&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;rules&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Connecting to &lt;code&gt;rulesStream&lt;/code&gt; causes some changes in the signature of the processing functions. The previous article presented it in a slightly simplified way as a &lt;code&gt;ProcessFunction&lt;/code&gt;. However, &lt;code&gt;DynamicKeyFunction&lt;/code&gt; is actually a &lt;code&gt;BroadcastProcessFunction&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;abstract&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;BroadcastProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IN1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IN2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OUT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;abstract&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IN1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ReadOnlyContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OUT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;abstract&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processBroadcastElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IN2&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;OUT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The difference is the addition of the &lt;code&gt;processBroadcastElement&lt;/code&gt; method through which messages of the rules stream will arrive. The following new version of &lt;code&gt;DynamicKeyFunction&lt;/code&gt; allows modifying the list of data-distribution keys at runtime through this stream:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DynamicKeyFunction&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;BroadcastProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processBroadcastElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;broadcastState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RULES_STATE_DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;broadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;put&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRuleId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ReadOnlyContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;){&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ReadOnlyBroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rulesState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;RULES_STATE_DESCRIPTOR&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;Entry&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;entry&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rulesState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;immutableEntries&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;entry&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeysExtractor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getGroupingKeyNames&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRuleId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()));&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In the above code, &lt;code&gt;processElement()&lt;/code&gt; receives Transactions, and &lt;code&gt;processBroadcastElement()&lt;/code&gt; receives Rule updates. When a new rule is created, it is distributed as depicted in Figure 6 and saved in all parallel instances of the operator by &lt;code&gt;processBroadcastElement()&lt;/code&gt;. We use a Rule’s ID as the key to store and reference individual rules. Instead of iterating over a hardcoded &lt;code&gt;List&amp;lt;Rule&amp;gt;&lt;/code&gt;, we iterate over entries in the dynamically-updated broadcast state.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;DynamicAlertFunction&lt;/code&gt; follows the same logic with respect to storing the rules in the broadcast &lt;code&gt;MapState&lt;/code&gt;. As described in &lt;a href=&quot;https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html&quot;&gt;Part 1&lt;/a&gt;, each message in the &lt;code&gt;processElement&lt;/code&gt; input is intended to be processed by one specific rule and comes “pre-marked” with a corresponding ID by &lt;code&gt;DynamicKeyFunction&lt;/code&gt;. All we need to do is retrieve the definition of the corresponding rule from &lt;code&gt;BroadcastState&lt;/code&gt; using the provided ID and process the incoming message according to the logic that this rule requires. At this stage, we will also add messages to the internal function state in order to perform calculations on the required time window of data. We will consider how this is done in the final blog of the series about Fraud Detection.&lt;/p&gt;
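&lt;p&gt;To make this more concrete, the following is a simplified sketch (not taken verbatim from the project) of what the rule lookup inside &lt;code&gt;DynamicAlertFunction&lt;/code&gt; could look like. It assumes that &lt;code&gt;Keyed&lt;/code&gt; exposes the rule ID through a &lt;code&gt;getId()&lt;/code&gt; getter and omits the windowing logic that the next part of the series covers:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;public class DynamicAlertFunction
    extends KeyedBroadcastProcessFunction&amp;lt;String, Keyed&amp;lt;Transaction, String, Integer&amp;gt;, Rule, Alert&amp;gt; {

  @Override
  public void processBroadcastElement(Rule rule, Context ctx, Collector&amp;lt;Alert&amp;gt; out) throws Exception {
    // Same pattern as in DynamicKeyFunction: store the rule in broadcast state under its ID.
    ctx.getBroadcastState(RULES_STATE_DESCRIPTOR).put(rule.getRuleId(), rule);
  }

  @Override
  public void processElement(Keyed&amp;lt;Transaction, String, Integer&amp;gt; value,
                             ReadOnlyContext ctx,
                             Collector&amp;lt;Alert&amp;gt; out) throws Exception {
    // Look up the rule that DynamicKeyFunction pre-marked this message with.
    Rule rule = ctx.getBroadcastState(RULES_STATE_DESCRIPTOR).get(value.getId());
    // Evaluate the transaction against this rule and emit an Alert when its limit is exceeded;
    // the required window calculations are omitted here.
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;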
&lt;h1 id=&quot;summary&quot;&gt;Summary&lt;/h1&gt;
&lt;p&gt;In this blog post, we continued our investigation of the use case of a Fraud Detection System built with Apache Flink. We looked into different ways in which data can be distributed between parallel operator instances and, most importantly, examined broadcast state. We demonstrated how dynamic partitioning, a pattern described in the &lt;a href=&quot;https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html&quot;&gt;first part&lt;/a&gt; of the series, can be combined with and enhanced by the functionality of the broadcast state pattern. The ability to send dynamic updates at runtime is a powerful feature of Apache Flink that is applicable in a variety of other use cases, such as controlling state (cleanup/insert/fix), running A/B experiments or executing updates of ML model coefficients.&lt;/p&gt;
</description>
<pubDate>Tue, 24 Mar 2020 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2020/03/24/demo-fraud-detection-2.html</link>
<guid isPermaLink="true">/news/2020/03/24/demo-fraud-detection-2.html</guid>
</item>
<item>
<title>Apache Beam: How Beam Runs on Top of Flink</title>
<description>&lt;p&gt;Note: This blog post is based on the talk &lt;a href=&quot;https://www.youtube.com/watch?v=hxHGLrshnCY&quot;&gt;“Beam on Flink: How Does It Actually Work?”&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://flink.apache.org/&quot;&gt;Apache Flink&lt;/a&gt; and &lt;a href=&quot;https://beam.apache.org/&quot;&gt;Apache Beam&lt;/a&gt; are open-source frameworks for parallel, distributed data processing at scale. Unlike Flink, Beam does not come with a full-blown execution engine of its own but plugs into other execution engines, such as Apache Flink, Apache Spark, or Google Cloud Dataflow. In this blog post we discuss the reasons to use Flink together with Beam for your batch and stream processing needs. We also take a closer look at how Beam works with Flink to provide an idea of the technical aspects of running Beam pipelines with Flink. We hope you find some useful information on how and why the two frameworks can be utilized in combination. For more information, you can refer to the corresponding &lt;a href=&quot;https://beam.apache.org/documentation/runners/flink/&quot;&gt;documentation&lt;/a&gt; on the Beam website or contact the community through the &lt;a href=&quot;https://beam.apache.org/community/contact-us/&quot;&gt;Beam mailing list&lt;/a&gt;.&lt;/p&gt;
&lt;h1 id=&quot;what-is-apache-beam&quot;&gt;What is Apache Beam&lt;/h1&gt;
&lt;p&gt;&lt;a href=&quot;https://beam.apache.org/&quot;&gt;Apache Beam&lt;/a&gt; is an open-source, unified model for defining batch and streaming data-parallel processing pipelines. It is unified in the sense that you use a single API, in contrast to using a separate API for batch and streaming as is the case in Flink. Beam was originally developed by Google, which released it in 2014 as the Cloud Dataflow SDK. In 2016, it was donated to &lt;a href=&quot;https://www.apache.org/&quot;&gt;the Apache Software Foundation&lt;/a&gt; under the name Beam. It has been developed by the open-source community ever since. With Apache Beam, developers can write data processing jobs, also known as pipelines, in multiple languages, e.g. Java, Python, Go, SQL. A pipeline is then executed by one of Beam’s Runners. A Runner is responsible for translating Beam pipelines such that they can run on an execution engine. Every supported execution engine has a Runner. The following Runners are available: Apache Flink, Apache Spark, Apache Samza, Hazelcast Jet, Google Cloud Dataflow, and others.&lt;/p&gt;
&lt;p&gt;The execution model and the API of Apache Beam are similar to Flink’s. Both frameworks are inspired by the &lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf&quot;&gt;MapReduce&lt;/a&gt;, &lt;a href=&quot;https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41378.pdf&quot;&gt;MillWheel&lt;/a&gt;, and &lt;a href=&quot;https://research.google/pubs/pub43864/&quot;&gt;Dataflow&lt;/a&gt; papers. Like Flink, Beam is designed for parallel, distributed data processing. Both have similar transformations, support for windowing, event/processing time, watermarks, timers, triggers, and much more. However, since Beam is not a full runtime, it focuses on providing the framework for building portable, multi-language batch and stream processing pipelines that can run across several execution engines. The idea is that you write your pipeline once and feed it with either batch or streaming data. When you run it, you just pick one of the supported backends to execute. A large integration test suite in Beam called “ValidatesRunner” ensures that the results will be the same, regardless of which backend you choose for the execution.&lt;/p&gt;
&lt;p&gt;One of the most exciting developments in the Beam technology is the framework’s support for multiple programming languages including Java, Python, Go, Scala and SQL. Essentially, developers can write their applications in a programming language of their choice. Beam, with the help of the Runners, translates the program to one of the execution engines, as shown in the diagram below.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-02-22-beam-on-flink/flink-runner-beam-beam-vision.png&quot; width=&quot;600px&quot; alt=&quot;The vision of Apache Beam&quot; /&gt;
&lt;/center&gt;
&lt;h1 id=&quot;reasons-to-use-beam-with-flink&quot;&gt;Reasons to use Beam with Flink&lt;/h1&gt;
&lt;p&gt;Why would you want to use Beam with Flink instead of directly using Flink? Ultimately, Beam and Flink complement each other and provide additional value to the user. The main reasons for using Beam with Flink are the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Beam provides a unified API for both batch and streaming scenarios.&lt;/li&gt;
&lt;li&gt;Beam comes with native support for different programming languages, like Python or Go with all their libraries like Numpy, Pandas, Tensorflow, or TFX.&lt;/li&gt;
&lt;li&gt;You get the power of Apache Flink like its exactly-once semantics, strong memory management and robustness.&lt;/li&gt;
&lt;li&gt;Beam programs run on your existing Flink infrastructure or infrastructure for other supported Runners, like Spark or Google Cloud Dataflow.&lt;/li&gt;
&lt;li&gt;You get additional features like side inputs and cross-language pipelines that are not supported natively in Flink but only supported when using Beam with Flink.&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id=&quot;the-flink-runner-in-beam&quot;&gt;The Flink Runner in Beam&lt;/h1&gt;
&lt;p&gt;The Flink Runner in Beam translates Beam pipelines into Flink jobs. The translation can be parameterized using Beam’s pipeline options which are parameters for settings like configuring the job name, parallelism, checkpointing, or metrics reporting.&lt;/p&gt;
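&lt;p&gt;As a rough illustration (this snippet is not part of the original post and the option values are made up), selecting the Flink Runner and a few of these options programmatically from a &lt;code&gt;main(String[] args)&lt;/code&gt; method could look like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Pick the Flink Runner and set a few pipeline options before building the pipeline.
FlinkPipelineOptions options = PipelineOptionsFactory.fromArgs(args).as(FlinkPipelineOptions.class);
options.setRunner(FlinkRunner.class);
options.setJobName(&amp;quot;beam-on-flink-example&amp;quot;); // hypothetical job name
options.setParallelism(4);                        // hypothetical parallelism

Pipeline pipeline = Pipeline.create(options);
// ... apply transforms to the pipeline ...
pipeline.run().waitUntilFinish();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;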
&lt;p&gt;If you are familiar with a DataSet or a DataStream, you will have no problems understanding what a PCollection is. PCollection stands for parallel collection in Beam and is exactly what DataSet/DataStream would be in Flink. Due to Beam’s unified API, there is only one type of transformation result: the PCollection.&lt;/p&gt;
&lt;p&gt;Beam pipelines are composed of transforms. Transforms are like operators in Flink and come in two flavors: primitive and composite transforms. The beauty of all this is that Beam only comes with a small set of primitive transforms which are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Source&lt;/code&gt; (for loading data)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ParDo&lt;/code&gt; (think of a flat map operator on steroids)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GroupByKey&lt;/code&gt; (think of keyBy() in Flink)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;AssignWindows&lt;/code&gt; (windows can be assigned at any point in time in Beam)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Flatten&lt;/code&gt; (like a union() operation in Flink)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Composite transforms are built by combining the above primitive transforms. For example, &lt;code&gt;Combine = GroupByKey + ParDo&lt;/code&gt;.&lt;/p&gt;
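&lt;p&gt;As a small illustration of this composition (a hedged sketch, not taken from the original post), a per-key sum can be expressed with just the &lt;code&gt;GroupByKey&lt;/code&gt; and &lt;code&gt;ParDo&lt;/code&gt; primitives:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// input is assumed to be a PCollection&amp;lt;KV&amp;lt;String, Long&amp;gt;&amp;gt; of (key, value) pairs.
PCollection&amp;lt;KV&amp;lt;String, Long&amp;gt;&amp;gt; sums = input
    .apply(GroupByKey.&amp;lt;String, Long&amp;gt;create())
    .apply(ParDo.of(new DoFn&amp;lt;KV&amp;lt;String, Iterable&amp;lt;Long&amp;gt;&amp;gt;, KV&amp;lt;String, Long&amp;gt;&amp;gt;() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        long sum = 0;
        for (Long value : c.element().getValue()) {
          sum += value;
        }
        c.output(KV.of(c.element().getKey(), sum));
      }
    }));&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;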
&lt;h1 id=&quot;flink-runner-internals&quot;&gt;Flink Runner Internals&lt;/h1&gt;
&lt;p&gt;Although understanding its internals is not a prerequisite for using the Flink Runner in Beam, we provide more details of how the Flink Runner works to show how the two frameworks can integrate and work together to provide state-of-the-art streaming data pipelines.&lt;/p&gt;
&lt;p&gt;The Flink Runner has two translation paths. Depending on whether we execute in batch or streaming mode, the Runner either translates into Flink’s DataSet or into Flink’s DataStream API. Since multi-language support has been added to Beam, another two translation paths have been added. To summarize the four modes:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;The Classic Flink Runner for batch jobs:&lt;/strong&gt; Executes batch Java pipelines&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Classic Flink Runner for streaming jobs:&lt;/strong&gt; Executes streaming Java pipelines&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Portable Flink Runner for batch jobs:&lt;/strong&gt; Executes Java as well as Python, Go and other supported SDK pipelines for batch scenarios&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The Portable Flink Runner for streaming jobs:&lt;/strong&gt; Executes Java as well as Python, Go and other supported SDK pipelines for streaming scenarios&lt;/li&gt;
&lt;/ol&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-02-22-beam-on-flink/flink-runner-beam-runner-translation-paths.png&quot; width=&quot;300px&quot; alt=&quot;The 4 translation paths in the Beam&#39;s Flink Runner&quot; /&gt;
&lt;/center&gt;
&lt;h2 id=&quot;the-classic-flink-runner-in-beam&quot;&gt;The “Classic” Flink Runner in Beam&lt;/h2&gt;
&lt;p&gt;The classic Flink Runner was the initial version of the Runner, hence the “classic” name. Beam pipelines are represented as a graph in Java which is composed of the aforementioned composite and primitive transforms. Beam provides translators which traverse the graph in topological order. Topological order means that we start from all the sources first as we iterate through the graph. Presented with a transform from the graph, the Flink Runner generates the API calls as you would normally when writing a Flink job.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-02-22-beam-on-flink/classic-flink-runner-beam.png&quot; width=&quot;600px&quot; alt=&quot;The Classic Flink Runner in Beam&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;While Beam and Flink share very similar concepts, there are enough differences between the two frameworks that Beam pipelines cannot be translated 1:1 into a Flink program. In the following sections, we will present the key differences:&lt;/p&gt;
&lt;h3 id=&quot;serializers-vs-coders&quot;&gt;Serializers vs Coders&lt;/h3&gt;
&lt;p&gt;When data is transferred over the wire in Flink, it has to be turned into bytes. This is done with the help of serializers. Flink has a type system to instantiate the correct serializer for a given type, e.g. &lt;code&gt;StringTypeSerializer&lt;/code&gt; for a String. Apache Beam also has its own type system which is similar to Flink’s but uses slightly different interfaces. Serializers are called Coders in Beam. In order to make a Beam Coder run in Flink, we have to make the two serializer types compatible. This is done by creating a special Flink type information that looks like a native Flink one but calls the appropriate Beam coder. That way, we can use Beam’s coders although we are executing the Beam job with Flink. Flink operators expect a TypeInformation, e.g. &lt;code&gt;StringTypeInformation&lt;/code&gt;, for which we use a &lt;code&gt;CoderTypeInformation&lt;/code&gt; in Beam. The type information returns the serializer, in this case a &lt;code&gt;CoderTypeSerializer&lt;/code&gt;, which calls the underlying Beam Coder.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-02-22-beam-on-flink/flink-runner-beam-serializers-coders.png&quot; width=&quot;300px&quot; alt=&quot;Serializers vs Coders&quot; /&gt;
&lt;/center&gt;
&lt;h3 id=&quot;read&quot;&gt;Read&lt;/h3&gt;
&lt;p&gt;The &lt;code&gt;Read&lt;/code&gt; transform provides a way to read data into your pipeline in Beam. The Read transform is supported by two wrappers in Beam, the &lt;code&gt;SourceInputFormat&lt;/code&gt; for batch processing and the &lt;code&gt;UnboundedSourceWrapper&lt;/code&gt; for stream processing.&lt;/p&gt;
&lt;h3 id=&quot;pardo&quot;&gt;ParDo&lt;/h3&gt;
&lt;p&gt;&lt;code&gt;ParDo&lt;/code&gt; is the swiss army knife of Beam and can be compared to a &lt;code&gt;RichFlatMapFunction&lt;/code&gt; in Flink with additional features such as &lt;code&gt;SideInputs&lt;/code&gt;, &lt;code&gt;SideOutputs&lt;/code&gt;, State and Timers. &lt;code&gt;ParDo&lt;/code&gt; is essentially translated by the Flink runner using the &lt;code&gt;FlinkDoFnFunction&lt;/code&gt; for batch processing or the &lt;code&gt;FlinkStatefulDoFnFunction&lt;/code&gt;, while for streaming scenarios the translation is executed with the &lt;code&gt;DoFnOperator&lt;/code&gt; that takes care of checkpointing and buffering of data during checkpoints, watermark emissions and maintenance of state and timers. This is all executed by Beam’s interface, called the &lt;code&gt;DoFnRunner&lt;/code&gt;, that encapsulates Beam-specific execution logic, like retrieving state, executing state and timers, or reporting metrics.&lt;/p&gt;
&lt;h3 id=&quot;side-inputs&quot;&gt;Side Inputs&lt;/h3&gt;
&lt;p&gt;In addition to the main input, ParDo transforms can have a number of side inputs. A side input can be a static set of data that you want to have available at all parallel instances. However, it is more flexible than that. You can have keyed and even windowed side inputs which update based on the window size. This is a very powerful concept which does not exist in Flink but is added on top of Flink using Beam.&lt;/p&gt;
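&lt;p&gt;To give a rough idea (an illustrative sketch, not from the original post; the variable names are made up), a singleton side input can be attached to a &lt;code&gt;ParDo&lt;/code&gt; like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Turn a one-element PCollection into a side input visible to all parallel instances.
final PCollectionView&amp;lt;Long&amp;gt; threshold = thresholds.apply(View.asSingleton());

PCollection&amp;lt;Long&amp;gt; filtered = values.apply(
    ParDo.of(new DoFn&amp;lt;Long, Long&amp;gt;() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        // The side input is read through the context, here acting as a simple threshold.
        if (c.element() &amp;gt;= c.sideInput(threshold)) {
          c.output(c.element());
        }
      }
    }).withSideInputs(threshold));&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;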
&lt;h3 id=&quot;assignwindows&quot;&gt;AssignWindows&lt;/h3&gt;
&lt;p&gt;In Flink, windows are assigned by the &lt;code&gt;WindowOperator&lt;/code&gt; when you use &lt;code&gt;window()&lt;/code&gt; in the API. In Beam, windows can be assigned at any point in time. Any element is implicitly part of a window. If no window is assigned explicitly, the element is part of the &lt;code&gt;GlobalWindow&lt;/code&gt;. Window information is stored for each element in a wrapper called &lt;code&gt;WindowedValue&lt;/code&gt;. The window information is only used once we issue a &lt;code&gt;GroupByKey&lt;/code&gt;.&lt;/p&gt;
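&lt;p&gt;For instance (an illustrative sketch, not from the original post), assigning fixed one-minute windows to an existing &lt;code&gt;PCollection&lt;/code&gt; only attaches window information to each element; nothing is grouped until a &lt;code&gt;GroupByKey&lt;/code&gt; follows:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Attach fixed one-minute windows; each element is carried in a WindowedValue internally.
// The actual grouping per key and window happens later, at GroupByKey.
PCollection&amp;lt;KV&amp;lt;String, Long&amp;gt;&amp;gt; windowed = events
    .apply(Window.&amp;lt;KV&amp;lt;String, Long&amp;gt;&amp;gt;into(FixedWindows.of(Duration.standardMinutes(1))));&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;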
&lt;h3 id=&quot;groupbykey&quot;&gt;GroupByKey&lt;/h3&gt;
&lt;p&gt;Most of the time it is useful to partition the data by a key. In Flink, this is done via the &lt;code&gt;keyBy()&lt;/code&gt; API call. In Beam the &lt;code&gt;GroupByKey&lt;/code&gt; transform can only be applied if the input is of the form &lt;code&gt;KV&amp;lt;Key, Value&amp;gt;&lt;/code&gt;. Unlike Flink where the key can even be nested inside the data, Beam enforces the key to always be explicit. The &lt;code&gt;GroupByKey&lt;/code&gt; transform then groups the data by key and by window which is similar to what &lt;code&gt;keyBy(..).window(..)&lt;/code&gt; would give us in Flink. Beam has its own set of libraries to do that because Beam has its own set of window functions and triggers. Essentially, GroupByKey is very similar to what the WindowOperator does in Flink.&lt;/p&gt;
&lt;h3 id=&quot;flatten&quot;&gt;Flatten&lt;/h3&gt;
&lt;p&gt;The Flatten operator takes multiple DataSet/DataStreams, called P[arallel]Collections in Beam, and combines them into one collection. This is equivalent to Flink’s &lt;code&gt;union()&lt;/code&gt; operation.&lt;/p&gt;
&lt;h2 id=&quot;the-portable-flink-runner-in-beam&quot;&gt;The “Portable” Flink Runner in Beam&lt;/h2&gt;
&lt;p&gt;The portable Flink Runner in Beam is the evolution of the classic Runner. Classic Runners are tied to the JVM ecosystem, but the Beam community wanted to move past this and also execute Python, Go and other languages. This adds another dimension to Beam in terms of portability because, as previously mentioned, Beam already had portability across execution engines. It was necessary to change the translation logic of the Runner to be able to support language portability.&lt;/p&gt;
&lt;p&gt;There are two important building blocks for portable Runners:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;A common pipeline format across all the languages: The Runner API&lt;/li&gt;
&lt;li&gt;A common interface during execution for the communication between the Runner and the code written in any language: The Fn API&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The Runner API provides a universal representation of the pipeline as Protobuf, which contains the transforms, types, and user code. Protobuf was chosen as the format because every language has libraries available for it. Similarly, for the execution part, Beam introduced the Fn API interface to handle the communication between the Runner/execution engine and the user code that may be written in a different language and executes in a different process. Fn API is pronounced “fun API”; you may guess why.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-02-22-beam-on-flink/flink-runner-beam-language-portability.png&quot; width=&quot;600px&quot; alt=&quot;Language Portability in Apache Beam&quot; /&gt;
&lt;/center&gt;
&lt;h2 id=&quot;how-are-beam-programs-translated-in-language-portability&quot;&gt;How Are Beam Programs Translated In Language Portability?&lt;/h2&gt;
&lt;p&gt;Users write their Beam pipelines in one language, but they may get executed in an environment based on a completely different language. How does that work? To explain that, let’s follow the lifecycle of a pipeline. Let’s suppose we use the Python SDK to write the pipeline. Before submitting the pipeline via the Job API to Beam’s JobServer, Beam would convert it to the Runner API, the language-agnostic format we described before. The JobServer is also a Beam component that handles the staging of the required dependencies during execution. The JobServer will then kick-off the translation which is similar to the classic Runner. However, an important change is the so-called &lt;code&gt;ExecutableStage&lt;/code&gt; transform. It is essentially a ParDo transform that we already know but designed for holding language-dependent code. Beam tries to combine as many of these transforms into one “executable stage”. The result again is a Flink program which is then sent to the Flink cluster and executed there. The major difference compared to the classic Runner is that during execution we will start &lt;em&gt;environments&lt;/em&gt; to execute the aforementioned &lt;em&gt;ExecutableStages&lt;/em&gt;. The following environments are available:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Docker-based (the default)&lt;/li&gt;
&lt;li&gt;Process-based (a simple process is started)&lt;/li&gt;
&lt;li&gt;Externally-provided (K8s or other schedulers)&lt;/li&gt;
&lt;li&gt;Embedded (intended for testing and only works with Java)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Environments hold the &lt;em&gt;SDK Harness&lt;/em&gt; which is the code that handles the execution and the communication with the Runner over the Fn API. For example, when Flink executes Python code, it sends the data to the Python environment containing the Python SDK Harness. Sending data to an external process involves a minor overhead which we have measured to be 5-10% slower than the classic Java pipelines. However, Beam uses a fusion of transforms to execute as many transforms as possible in the same environment which share the same input or output. That’s why in real-world scenarios the overhead could be much lower.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-02-22-beam-on-flink/flink-runner-beam-language-portability-architecture.png&quot; width=&quot;600px&quot; alt=&quot;Language Portability Architecture in beam&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;Environments can be present for many languages. This opens up an entirely new type of pipelines: cross-language pipelines. In cross-language pipelines we can combine transforms of two or more languages, e.g. a machine learning pipeline with the feature generation written in Java and the learning written in Python. All this can be run on top of Flink.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Using Apache Beam with Apache Flink combines (a.) the power of Flink with (b.) the flexibility of Beam. All it takes to run Beam is a Flink cluster, which you may already have. Apache Beam’s fully-fledged Python API is probably the most compelling argument for using Beam with Flink, but the unified API which allows you to “write once” and “execute anywhere” is also very appealing to Beam users. On top of this, features like side inputs and a rich connector ecosystem are also reasons why people like Beam.&lt;/p&gt;
&lt;p&gt;With the introduction of schemas, a new format for handling type information, Beam is heading in a similar direction as Flink with its type system which is essential for the Table API or SQL. Speaking of, the next Flink release will include a Python version of the Table API which is based on the language portability of Beam. Looking ahead, the Beam community plans to extend the support for interactive programs like notebooks. TFX, which is built with Beam, is a very powerful way to solve many problems around training and validating machine learning models.&lt;/p&gt;
&lt;p&gt;For many years, Beam and Flink have inspired and learned from each other. With the Python support being based on Beam in Flink, they only seem to come closer to each other. That’s all the better for the community, and also users have more options and functionality to choose from.&lt;/p&gt;
</description>
<pubDate>Sat, 22 Feb 2020 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html</link>
<guid isPermaLink="true">/ecosystem/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink.html</guid>
</item>
<item>
<title>No Java Required: Configuring Sources and Sinks in SQL</title>
<description>&lt;h1 id=&quot;introduction&quot;&gt;Introduction&lt;/h1&gt;
&lt;p&gt;The recent &lt;a href=&quot;https://flink.apache.org/news/2020/02/11/release-1.10.0.html&quot;&gt;Apache Flink 1.10 release&lt;/a&gt; includes many exciting features.
In particular, it marks the end of the community’s year-long effort to merge in the &lt;a href=&quot;https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;Blink SQL contribution&lt;/a&gt; from Alibaba.
The reason the community chose to spend so much time on the contribution is that SQL works.
It allows Flink to offer a truly unified interface over batch and streaming and makes stream processing accessible to a broad audience of developers and analysts.
Best of all, Flink SQL is ANSI-SQL compliant, which means if you’ve ever used a database in the past, you already know it&lt;sup id=&quot;fnref:1&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt;!&lt;/p&gt;
&lt;p&gt;A lot of work focused on improving runtime performance and progressively extending its coverage of the SQL standard.
Flink now supports the full TPC-DS query set for batch queries, reflecting the readiness of its SQL engine to address the needs of modern data warehouse-like workloads.
Its streaming SQL supports an almost equal set of features - those that are well defined on a streaming runtime - including &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/streaming/joins.html&quot;&gt;complex joins&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/match_recognize.html&quot;&gt;MATCH_RECOGNIZE&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As important as this work is, the community also strives to make these features generally accessible to the broadest audience possible.
That is why the Flink community is excited in 1.10 to offer production-ready DDL syntax (e.g., &lt;code&gt;CREATE TABLE&lt;/code&gt;, &lt;code&gt;DROP TABLE&lt;/code&gt;) and a refactored catalog interface.&lt;/p&gt;
&lt;h1 id=&quot;accessing-your-data-where-it-lives&quot;&gt;Accessing Your Data Where It Lives&lt;/h1&gt;
&lt;p&gt;Flink does not store data at rest; it is a compute engine and requires other systems to consume input from and write its output to.
Those that have used Flink’s &lt;code&gt;DataStream&lt;/code&gt; API in the past will be familiar with connectors that allow for interacting with external systems.
Flink has a vast connector ecosystem that includes all major message queues, filesystems, and databases.&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
If your favorite system does not have a connector maintained in the central Apache Flink repository, check out the &lt;a href=&quot;https://flink-packages.org&quot;&gt;flink packages website&lt;/a&gt;, which has a growing number of community-maintained components.
&lt;/div&gt;
&lt;p&gt;While these connectors are battle-tested and production-ready, they are written in Java and configured in code, which means they are not amenable to pure SQL or Table applications.
For a holistic SQL experience, not only the queries but also the table definitions need to be written in SQL.&lt;/p&gt;
&lt;h1 id=&quot;create-table-statements&quot;&gt;CREATE TABLE Statements&lt;/h1&gt;
&lt;p&gt;While Flink SQL has long provided table abstractions atop some of Flink’s most popular connectors, configurations were not always so straightforward.
Beginning in 1.10, Flink supports defining tables through &lt;code&gt;CREATE TABLE&lt;/code&gt; statements.
With this feature, users can now create logical tables, backed by various external systems, in pure SQL.&lt;/p&gt;
&lt;p&gt;By defining tables in SQL, developers can write queries against logical schemas that are abstracted away from the underlying physical data store. Coupled with Flink SQL’s unified approach to batch and stream processing, Flink provides a straight line from discovery to production.&lt;/p&gt;
&lt;p&gt;Users can define tables over static data sets, anything from a local CSV file to a full-fledged data lake or even Hive.
Leveraging Flink’s efficient batch processing capabilities, they can perform ad-hoc queries searching for exciting insights.
Once something interesting is identified, businesses can gain real-time and continuous insights by merely altering the table so that it is powered by a message queue such as Kafka.
Because Flink guarantees SQL queries have unified semantics over batch and streaming, users can be confident that redeploying this query as a continuous streaming application over a message queue will output identical results.&lt;/p&gt;
&lt;figure class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot; data-lang=&quot;sql&quot;&gt;&lt;span class=&quot;c1&quot;&gt;-- Define a table called orders that is backed by a Kafka topic&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- The definition includes all relevant Kafka properties,&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- the underlying format (JSON) and even defines a&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- watermarking algorithm based on one of the fields&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- so that this table can be used with event time.&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;user_id&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;product&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;order_time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TIMESTAMP&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;WATERMARK&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;order_time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;order_time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;5&amp;#39;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SECONDS&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.type&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;kafka&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.version&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;universal&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.topic&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;orders&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.startup-mode&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;earliest-offset&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.properties.bootstrap.servers&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;localhost:9092&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;format.type&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;json&amp;#39;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- Define a table called product_analysis&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- on top of ElasticSearch 7 where we &lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- can write the results of our query. &lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;product_analysis&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;product&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tracking_time&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TIMESTAMP&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;units_sold&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;BIGINT&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.type&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;elasticsearch&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.version&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;7&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.hosts&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;localhost:9200&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.index&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;product_analysis&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s1&quot;&gt;&amp;#39;connector.document.type&amp;#39;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;analysis&amp;#39;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- A simple query that analyzes order data&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- from Kafka and writes results into &lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;-- ElasticSearch. &lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;product_analysis&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;product&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TUMBLE_START&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;order_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tracking_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;COUNT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;units_sold&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;orders&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;product&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TUMBLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;order_time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
&lt;h1 id=&quot;catalogs&quot;&gt;Catalogs&lt;/h1&gt;
&lt;p&gt;While being able to create tables is important, it often isn’t enough.
A business analyst, for example, shouldn’t have to know what properties to set for Kafka, or even have to know what the underlying data source is, to be able to write a query.&lt;/p&gt;
&lt;p&gt;To solve this problem, Flink 1.10 also ships with a revamped catalog system for managing metadata about tables and user-defined functions.
With catalogs, users can create tables once and reuse them across jobs and sessions.
Now, the team managing a data set can create a table and immediately make it accessible to other groups within their organization.&lt;/p&gt;
&lt;p&gt;The most notable catalog that Flink integrates with today is Hive Metastore.
The Hive catalog allows Flink to fully interoperate with Hive and serve as a more efficient query engine.
Flink supports reading and writing Hive tables, using Hive UDFs, and even leveraging Hive’s metastore catalog to persist Flink-specific metadata.&lt;/p&gt;
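&lt;p&gt;To make the idea more concrete, here is a minimal Java sketch of registering a Hive Metastore-backed catalog through the Table API. It assumes the &lt;code&gt;flink-connector-hive&lt;/code&gt; dependency is on the classpath; the catalog name, default database, configuration directory and Hive version are placeholder values that would need to match your environment.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class RegisterHiveCatalogExample {
    public static void main(String[] args) {
        // Create a TableEnvironment using the Blink planner in batch mode.
        TableEnvironment tableEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // Placeholder settings: catalog name, default database,
        // directory containing hive-site.xml and the Hive version.
        HiveCatalog hiveCatalog = new HiveCatalog(
            &amp;quot;myhive&amp;quot;, &amp;quot;default&amp;quot;, &amp;quot;/opt/hive-conf&amp;quot;, &amp;quot;2.3.4&amp;quot;);

        // Register the catalog and make it the current one; tables created
        // from now on are persisted in the Hive Metastore and can be reused
        // across jobs and sessions.
        tableEnv.registerCatalog(&amp;quot;myhive&amp;quot;, hiveCatalog);
        tableEnv.useCatalog(&amp;quot;myhive&amp;quot;);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;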
&lt;h1 id=&quot;looking-ahead&quot;&gt;Looking Ahead&lt;/h1&gt;
&lt;p&gt;Flink SQL has made enormous strides to democratize stream processing, and 1.10 marks a significant milestone in that development.
However, we are not ones to rest on our laurels, and the community is committed to raising the bar on standards while lowering the barriers to entry.
The community is looking to add more catalogs, such as JDBC and Apache Pulsar.
We encourage you to sign up for the &lt;a href=&quot;https://flink.apache.org/community.html&quot;&gt;mailing list&lt;/a&gt; and stay on top of the announcements and new features in upcoming releases.&lt;/p&gt;
&lt;hr /&gt;
&lt;div class=&quot;footnotes&quot;&gt;
&lt;ol&gt;
&lt;li id=&quot;fn:1&quot;&gt;
&lt;p&gt;My colleague Timo, who has worked on Flink SQL from the beginning, has the entire SQL standard printed on his desk and references it before any changes are merged. It’s enormous. &lt;a href=&quot;#fnref:1&quot; class=&quot;reversefootnote&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
<pubDate>Thu, 20 Feb 2020 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2020/02/20/ddl.html</link>
<guid isPermaLink="true">/news/2020/02/20/ddl.html</guid>
</item>
<item>
<title>Apache Flink 1.10.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is excited to hit the double digits and announce the release of Flink 1.10.0! As a result of the biggest community effort to date, with over 1.2k issues implemented and more than 200 contributors, this release introduces significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and great advances in Python support (PyFlink).&lt;/p&gt;
&lt;p&gt;Flink 1.10 also marks the completion of the &lt;a href=&quot;https://flink.apache.org/news/2019/08/22/release-1.9.0.html#preview-of-the-new-blink-sql-query-processor&quot;&gt;Blink integration&lt;/a&gt;, hardening streaming SQL and bringing mature batch processing to Flink with production-ready Hive integration and TPC-DS coverage. This blog post describes all major new features and improvements, important changes to be aware of and what to expect moving forward.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#improved-memory-management-and-configuration&quot; id=&quot;markdown-toc-improved-memory-management-and-configuration&quot;&gt;Improved Memory Management and Configuration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#unified-logic-for-job-submission&quot; id=&quot;markdown-toc-unified-logic-for-job-submission&quot;&gt;Unified Logic for Job Submission&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#native-kubernetes-integration-beta&quot; id=&quot;markdown-toc-native-kubernetes-integration-beta&quot;&gt;Native Kubernetes Integration (Beta)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#table-apisql-production-ready-hive-integration&quot; id=&quot;markdown-toc-table-apisql-production-ready-hive-integration&quot;&gt;Table API/SQL: Production-ready Hive Integration&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#other-improvements-to-the-table-apisql&quot; id=&quot;markdown-toc-other-improvements-to-the-table-apisql&quot;&gt;Other Improvements to the Table API/SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#pyflink-support-for-native-user-defined-functions-udfs&quot; id=&quot;markdown-toc-pyflink-support-for-native-user-defined-functions-udfs&quot;&gt;PyFlink: Support for Native User Defined Functions (UDFs)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#important-changes&quot; id=&quot;markdown-toc-important-changes&quot;&gt;Important Changes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#release-notes&quot; id=&quot;markdown-toc-release-notes&quot;&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;The binary distribution and source artifacts are now available on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt; of the Flink website. For more details, check the complete &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12345845&quot;&gt;release changelog&lt;/a&gt; and the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/&quot;&gt;updated documentation&lt;/a&gt;. We encourage you to download the release and share your feedback with the community through the &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink mailing lists&lt;/a&gt; or &lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;JIRA&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;
&lt;h3 id=&quot;improved-memory-management-and-configuration&quot;&gt;Improved Memory Management and Configuration&lt;/h3&gt;
&lt;p&gt;The current &lt;code&gt;TaskExecutor&lt;/code&gt; memory configuration in Flink has some shortcomings that make it hard to reason about or optimize resource utilization, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Different configuration models for memory footprint in Streaming and Batch execution;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Complex and user-dependent configuration of off-heap state backends (i.e. RocksDB) in Streaming execution.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;To make memory options more explicit and intuitive to users, Flink 1.10 introduces significant changes to the &lt;code&gt;TaskExecutor&lt;/code&gt; memory model and configuration logic (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors&quot;&gt;FLIP-49&lt;/a&gt;). These changes make Flink more adaptable to all kinds of deployment environments (e.g. Kubernetes, Yarn, Mesos), giving users strict control over its memory consumption.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Managed Memory Extension&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Managed memory was extended to also account for memory usage of &lt;code&gt;RocksDBStateBackend&lt;/code&gt;. While batch jobs can use either on-heap or off-heap memory, streaming jobs with &lt;code&gt;RocksDBStateBackend&lt;/code&gt; can use off-heap memory only. Therefore, to allow users to switch between Streaming and Batch execution without having to modify cluster configurations, managed memory is now always off-heap.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Simplified RocksDB Configuration&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Configuring an off-heap state backend like RocksDB used to involve a good deal of manual tuning, like decreasing the JVM heap size or setting Flink to use off-heap memory. This can now be achieved through Flink’s out-of-the-box configuration, and adjusting the memory budget for &lt;code&gt;RocksDBStateBackend&lt;/code&gt; is as simple as resizing the managed memory size.&lt;/p&gt;
&lt;p&gt;Another important improvement was to allow Flink to bind RocksDB native memory usage (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7289&quot;&gt;FLINK-7289&lt;/a&gt;), preventing it from exceeding its total memory budget — this is especially relevant in containerized environments like Kubernetes. For details on how to enable and tune this feature, refer to &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/state/large_state_tuning.html#tuning-rocksdb&quot;&gt;Tuning RocksDB&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;label label-danger&quot;&gt;Note&lt;/span&gt; FLIP-49 changes the process of cluster resource configuration, which may require tuning your clusters for upgrades from previous Flink versions. For a comprehensive overview of the changes introduced and tuning guidance, consult the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/memory/mem_setup.html&quot;&gt;memory setup documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;unified-logic-for-job-submission&quot;&gt;Unified Logic for Job Submission&lt;/h3&gt;
&lt;p&gt;Prior to this release, job submission was part of the duties of the Execution Environments and closely tied to the different deployment targets (e.g. Yarn, Kubernetes, Mesos). This led to a poor separation of concerns and, over time, to a growing number of customized environments that users needed to configure and manage separately.&lt;/p&gt;
&lt;p&gt;In Flink 1.10, job submission logic is abstracted into the generic &lt;code&gt;Executor&lt;/code&gt; interface (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-73%3A+Introducing+Executors+for+job+submission&quot;&gt;FLIP-73&lt;/a&gt;). The addition of the &lt;code&gt;ExecutorCLI&lt;/code&gt; (&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=133631524&quot;&gt;FLIP-81&lt;/a&gt;) introduces a unified way to specify configuration parameters for &lt;strong&gt;any&lt;/strong&gt; &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/cli.html#deployment-targets&quot;&gt;execution target&lt;/a&gt;. To round out this effort, the process of result retrieval was also decoupled from job submission with the introduction of a &lt;code&gt;JobClient&lt;/code&gt; (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API&quot;&gt;FLIP-74&lt;/a&gt;), responsible for fetching the &lt;code&gt;JobExecutionResult&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;span&gt;
&lt;center&gt;
&lt;img vspace=&quot;8&quot; style=&quot;width:100%&quot; src=&quot;/img/blog/2020-02-11-release-1.10.0/flink_1.10_zeppelin.png&quot; /&gt;
&lt;/center&gt;
&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;In particular, these changes make it much easier to programmatically use Flink in downstream frameworks — for example, Apache Beam or Zeppelin interactive notebooks — by providing users with a unified entry point to Flink. For users working with Flink across multiple target environments, the transition to a configuration-based execution process also significantly reduces boilerplate code and maintenance overhead.&lt;/p&gt;
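&lt;p&gt;As a minimal sketch of what the decoupled submission path can look like from user code, the snippet below submits a job without blocking and inspects it through the returned &lt;code&gt;JobClient&lt;/code&gt;. It assumes the &lt;code&gt;executeAsync()&lt;/code&gt; entry point introduced alongside this work; the job name and printed output are illustrative only.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.core.execution.JobClient;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class JobClientExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // A trivial pipeline, just enough to have something to submit.
        env.fromElements(1, 2, 3).print();

        // Submit without blocking; the returned JobClient can later be used
        // to query the job status or fetch the execution result.
        JobClient client = env.executeAsync(&amp;quot;job-client-example&amp;quot;);
        System.out.println(&amp;quot;Submitted job &amp;quot; + client.getJobID());
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;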
&lt;h3 id=&quot;native-kubernetes-integration-beta&quot;&gt;Native Kubernetes Integration (Beta)&lt;/h3&gt;
&lt;p&gt;For users looking to get started with Flink on a containerized environment, deploying and managing a standalone cluster on top of Kubernetes requires some upfront knowledge about containers, operators and environment-specific tools like &lt;code&gt;kubectl&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;In Flink 1.10, we rolled out the first phase of &lt;strong&gt;Active Kubernetes Integration&lt;/strong&gt; (&lt;a href=&quot;https://jira.apache.org/jira/browse/FLINK-9953&quot;&gt;FLINK-9953&lt;/a&gt;) with support for session clusters (with per-job planned). In this context, “active” means that Flink’s ResourceManager (&lt;code&gt;K8sResMngr&lt;/code&gt;) natively communicates with Kubernetes to allocate new pods on-demand, similar to Flink’s Yarn and Mesos integration. Users can also leverage namespaces to launch Flink clusters for multi-tenant environments with limited aggregate resource consumption. RBAC roles and service accounts with sufficient permissions need to be configured beforehand.&lt;/p&gt;
&lt;p&gt;&lt;span&gt;
&lt;center&gt;
&lt;img vspace=&quot;8&quot; style=&quot;width:75%&quot; src=&quot;/img/blog/2020-02-11-release-1.10.0/flink_1.10_nativek8s.png&quot; /&gt;
&lt;/center&gt;
&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;As introduced in &lt;a href=&quot;#unified-logic-for-job-submission&quot;&gt;Unified Logic For Job Submission&lt;/a&gt;, all command-line options in Flink 1.10 are mapped to a unified configuration. For this reason, users can simply refer to the Kubernetes config options and submit a job to an existing Flink session on Kubernetes in the CLI using:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;./bin/flink run -d -e kubernetes-session -Dkubernetes.cluster-id&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&amp;lt;ClusterId&amp;gt; examples/streaming/WindowJoin.jar&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you want to try out this preview feature, we encourage you to walk through the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/native_kubernetes.html&quot;&gt;Native Kubernetes setup&lt;/a&gt;, play around with it and share feedback with the community.&lt;/p&gt;
&lt;h3 id=&quot;table-apisql-production-ready-hive-integration&quot;&gt;Table API/SQL: Production-ready Hive Integration&lt;/h3&gt;
&lt;p&gt;Hive integration was announced as a preview feature in Flink 1.9. This preview allowed users to persist Flink-specific metadata (e.g. Kafka tables) in Hive Metastore using SQL DDL, call UDFs defined in Hive and use Flink for reading and writing Hive tables. Flink 1.10 rounds out this effort with further developments that bring production-ready Hive integration to Flink with full compatibility of &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/#supported-hive-versions&quot;&gt;most Hive versions&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id=&quot;native-partition-support-for-batch-sql&quot;&gt;Native Partition Support for Batch SQL&lt;/h4&gt;
&lt;p&gt;So far, only writes to non-partitioned Hive tables were supported. In Flink 1.10, the Flink SQL syntax has been extended with &lt;code&gt;INSERT OVERWRITE&lt;/code&gt; and &lt;code&gt;PARTITION&lt;/code&gt; (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-63%3A+Rework+table+partition+support&quot;&gt;FLIP-63&lt;/a&gt;), enabling users to write into both static and dynamic partitions in Hive.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Static Partition Writing&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OVERWRITE&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tablename1&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PARTITION&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partcol1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;partcol2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;val2&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;...)]&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;select_statement1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_statement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Dynamic Partition Writing&lt;/strong&gt;&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;INSERT&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;INTO&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;|&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OVERWRITE&lt;/span&gt; &lt;span class=&quot;err&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tablename1&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;select_statement1&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;from_statement&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Fully supporting partitioned tables allows users to take advantage of partition pruning on read, which significantly increases the performance of these operations by reducing the amount of data that needs to be scanned.&lt;/p&gt;
&lt;h4 id=&quot;further-optimizations&quot;&gt;Further Optimizations&lt;/h4&gt;
&lt;p&gt;Besides partition pruning, Flink 1.10 introduces more &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/hive/read_write_hive.html#optimizations&quot;&gt;read optimizations&lt;/a&gt; to Hive integration, such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Projection pushdown:&lt;/strong&gt; Flink leverages projection pushdown to minimize data transfer between Flink and Hive tables by omitting unnecessary fields from table scans. This is especially beneficial for tables with a large number of columns.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;LIMIT pushdown:&lt;/strong&gt; for queries with the &lt;code&gt;LIMIT&lt;/code&gt; clause, Flink will limit the number of output records wherever possible to minimize the amount of data transferred across the network.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;ORC Vectorization on Read:&lt;/strong&gt; to boost read performance for ORC files, Flink now uses the native ORC Vectorized Reader by default for Hive versions above 2.0.0 and columns with non-complex data types.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&quot;pluggable-modules-as-flink-system-objects-beta&quot;&gt;Pluggable Modules as Flink System Objects (Beta)&lt;/h4&gt;
&lt;p&gt;Flink 1.10 introduces a generic mechanism for pluggable modules in the Flink table core, with a first focus on system functions (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Pluggable+Modules&quot;&gt;FLIP-68&lt;/a&gt;). With modules, users can extend Flink’s system objects — for example, to use Hive built-in functions that behave like Flink system functions. This release ships with a pre-implemented &lt;code&gt;HiveModule&lt;/code&gt;, supporting multiple Hive versions, but users can also &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/modules.html&quot;&gt;write their own pluggable modules&lt;/a&gt;.&lt;/p&gt;
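&lt;p&gt;The following hedged sketch illustrates the module mechanism: it loads a &lt;code&gt;HiveModule&lt;/code&gt; into a table environment so that Hive built-in functions resolve alongside Flink system functions. It assumes the &lt;code&gt;flink-connector-hive&lt;/code&gt; dependency is available; the module name and Hive version are example values only.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.module.hive.HiveModule;

public class HiveModuleExample {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // Load the Hive module; its built-in functions then become
        // resolvable alongside Flink system functions.
        tableEnv.loadModule(&amp;quot;hive&amp;quot;, new HiveModule(&amp;quot;2.3.4&amp;quot;));

        // The core module plus the newly loaded Hive module should be listed.
        for (String module : tableEnv.listModules()) {
            System.out.println(module);
        }
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;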
&lt;h3 id=&quot;other-improvements-to-the-table-apisql&quot;&gt;Other Improvements to the Table API/SQL&lt;/h3&gt;
&lt;h4 id=&quot;watermarks-and-computed-columns-in-sql-ddl&quot;&gt;Watermarks and Computed Columns in SQL DDL&lt;/h4&gt;
&lt;p&gt;Flink 1.10 supports stream-specific syntax extensions to define time attributes and watermark generation in Flink SQL DDL (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-66%3A+Support+Time+Attribute+in+SQL+DDL&quot;&gt;FLIP-66&lt;/a&gt;). This allows time-based operations, like windowing, and the definition of &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/create.html#create-table&quot;&gt;watermark strategies&lt;/a&gt; on tables created using DDL statements.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;TABLE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;table_name&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;WATERMARK&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FOR&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;columnName&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;watermark_strategy_expression&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;WITH&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This release also introduces support for virtual computed columns (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-70%3A+Flink+SQL+Computed+Column+Design&quot;&gt;FLIP-70&lt;/a&gt;) that can be derived based on other columns in the same table or deterministic expressions (e.g. literal values, UDFs and built-in functions). In Flink, computed columns are useful to define time attributes &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/create.html#create-table&quot;&gt;upon table creation&lt;/a&gt;.&lt;/p&gt;
&lt;h4 id=&quot;additional-extensions-to-sql-ddl&quot;&gt;Additional Extensions to SQL DDL&lt;/h4&gt;
&lt;p&gt;There is now a clear distinction between temporary/persistent and system/catalog functions (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog&quot;&gt;FLIP-57&lt;/a&gt;). This not only eliminates ambiguity in function reference, but also allows for deterministic function resolution order (i.e. in case of naming collision, system functions will precede catalog functions, with temporary functions taking precedence over persistent functions for both dimensions).&lt;/p&gt;
&lt;p&gt;Following the groundwork in FLIP-57, we extended the SQL DDL syntax to support the creation of catalog functions, temporary functions and temporary system functions (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-79+Flink+Function+DDL+Support&quot;&gt;FLIP-79&lt;/a&gt;):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;CREATE&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;TEMPORARY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;TEMPORARY&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;SYSTEM&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FUNCTION&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IF&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;NOT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;EXISTS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;catalog_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.][&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;db_name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.]&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;function_name&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;identifier&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;LANGUAGE&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;JAVA&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;|&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SCALA&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For a complete overview of the current state of DDL support in Flink SQL, check the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/sql/&quot;&gt;updated documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;span class=&quot;label label-danger&quot;&gt;Note&lt;/span&gt; In order to correctly handle and guarantee a consistent behavior across meta-objects (tables, views, functions) in the future, some object declaration methods in the Table API have been deprecated in favor of methods that are closer to standard SQL DDL (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module&quot;&gt;FLIP-64&lt;/a&gt;).&lt;/p&gt;
&lt;h4 id=&quot;full-tpc-ds-coverage-for-batch&quot;&gt;Full TPC-DS Coverage for Batch&lt;/h4&gt;
&lt;p&gt;TPC-DS is a widely used industry-standard decision support benchmark to evaluate and measure the performance of SQL-based data processing engines. In Flink 1.10, all TPC-DS queries are supported end-to-end (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11491&quot;&gt;FLINK-11491&lt;/a&gt;), reflecting the readiness of its SQL engine to address the needs of modern data warehouse-like workloads.&lt;/p&gt;
&lt;h3 id=&quot;pyflink-support-for-native-user-defined-functions-udfs&quot;&gt;PyFlink: Support for Native User Defined Functions (UDFs)&lt;/h3&gt;
&lt;p&gt;A preview of PyFlink was introduced in the previous release, making headway towards the goal of full Python support in Flink. For this release, the focus was to enable users to register and use Python User-Defined Functions (UDF, with UDTF/UDAF planned) in the Table API/SQL (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table&quot;&gt;FLIP-58&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;&lt;span&gt;
&lt;center&gt;
&lt;img vspace=&quot;8&quot; hspace=&quot;100&quot; style=&quot;width:75%&quot; src=&quot;/img/blog/2020-02-11-release-1.10.0/flink_1.10_pyflink.gif&quot; /&gt;
&lt;/center&gt;
&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;If you are interested in the underlying implementation — leveraging Apache Beam’s &lt;a href=&quot;https://beam.apache.org/roadmap/portability/&quot;&gt;Portability Framework&lt;/a&gt; — refer to the “Architecture” section of FLIP-58 and also to &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-78%3A+Flink+Python+UDF+Environment+and+Dependency+Management&quot;&gt;FLIP-78&lt;/a&gt;. These data structures lay the required foundation for Pandas support and for PyFlink to eventually reach the DataStream API.&lt;/p&gt;
&lt;p&gt;From Flink 1.10, users can also easily install PyFlink through &lt;code&gt;pip&lt;/code&gt; using:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;pip install apache-flink&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For a preview of other improvements planned for PyFlink, check &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14500&quot;&gt;FLINK-14500&lt;/a&gt; and get involved in the &lt;a href=&quot;http://apache-flink.147419.n8.nabble.com/Re-DISCUSS-What-parts-of-the-Python-API-should-we-focus-on-next-td1285.html&quot;&gt;discussion&lt;/a&gt; for requested user features.&lt;/p&gt;
&lt;h2 id=&quot;important-changes&quot;&gt;Important Changes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10725&quot;&gt;FLINK-10725&lt;/a&gt;] Flink can now be compiled and run on Java 11.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://jira.apache.org/jira/browse/FLINK-15495&quot;&gt;FLINK-15495&lt;/a&gt;] The Blink planner is now the default in the SQL Client, so that users can benefit from all the latest features and improvements. The switch from the old planner in the Table API is also planned for the next release, so we recommend that users start getting familiar with the Blink planner. A short sketch of how to opt in to the Blink planner explicitly in the Table API follows this list.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13025&quot;&gt;FLINK-13025&lt;/a&gt;] There is a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/connectors/elasticsearch.html#elasticsearch-connector&quot;&gt;new Elasticsearch sink connector&lt;/a&gt;, fully supporting Elasticsearch 7.x versions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15115&quot;&gt;FLINK-15115&lt;/a&gt;] The connectors for Kafka 0.8 and 0.9 have been marked as deprecated and will no longer be actively supported. If you are still using these versions or have any other related concerns, please reach out to the @dev mailing list.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14516&quot;&gt;FLINK-14516&lt;/a&gt;] The non-credit-based network flow control code was removed, along with the configuration option &lt;code&gt;taskmanager.network.credit.model&lt;/code&gt;. Moving forward, Flink will always use credit-based flow control.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12122&quot;&gt;FLINK-12122&lt;/a&gt;] &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot;&gt;FLIP-6&lt;/a&gt; was rolled out with Flink 1.5.0 and introduced a code regression related to the way slots are allocated from &lt;code&gt;TaskManagers&lt;/code&gt;. To use a scheduling strategy that is closer to the pre-FLIP behavior, where Flink tries to spread out the workload across all currently available &lt;code&gt;TaskManagers&lt;/code&gt;, users can set &lt;code&gt;cluster.evenly-spread-out-slots: true&lt;/code&gt; in the &lt;code&gt;flink-conf.yaml&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11956&quot;&gt;FLINK-11956&lt;/a&gt;] &lt;code&gt;s3-hadoop&lt;/code&gt; and &lt;code&gt;s3-presto&lt;/code&gt; filesystems no longer use class relocations and should be loaded through &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/filesystems/#pluggable-file-systems&quot;&gt;plugins&lt;/a&gt;; in return, they now seamlessly integrate with all credential providers. We strongly recommend using other filesystems only as plugins as well, as we will continue to remove relocations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Flink 1.9 shipped with a refactored Web UI, with the legacy one being kept around as backup in case something wasn’t working as expected. No issues have been reported so far, so &lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Remove-old-WebUI-td35218.html&quot;&gt;the community voted&lt;/a&gt; to drop the legacy Web UI in Flink 1.10.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
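&lt;p&gt;As referenced in the Blink planner item above, the following sketch shows one way to opt in to the Blink planner explicitly when creating a table environment in the Table API. It is illustrative only, since the SQL Client already defaults to the Blink planner in this release.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class BlinkPlannerExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Explicitly select the Blink planner in streaming mode.
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
            .useBlinkPlanner()
            .inStreamingMode()
            .build();

        // Table programs registered on this environment run on the Blink planner.
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;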
&lt;h2 id=&quot;release-notes&quot;&gt;Release Notes&lt;/h2&gt;
&lt;p&gt;Please review the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.10/release-notes/flink-1.10.html&quot;&gt;release notes&lt;/a&gt; carefully for a detailed list of changes and new features if you plan to upgrade your setup to Flink 1.10. This version is API-compatible with previous 1.x releases for APIs annotated with the @Public annotation.&lt;/p&gt;
&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;
&lt;p&gt;The Apache Flink community would like to thank all contributors that have made this release possible:&lt;/p&gt;
&lt;p&gt;Achyuth Samudrala, Aitozi, Alberto Romero, Alec.Ch, Aleksey Pak, Alexander Fedulov, Alice Yan, Aljoscha Krettek, Aloys, Andrey Zagrebin, Arvid Heise, Benchao Li, Benoit Hanotte, Benoît Paris, Bhagavan Das, Biao Liu, Chesnay Schepler, Congxian Qiu, Cyrille Chépélov, César Soto Valero, David Anderson, David Hrbacek, David Moravek, Dawid Wysakowicz, Dezhi Cai, Dian Fu, Dyana Rose, Eamon Taaffe, Fabian Hueske, Fawad Halim, Fokko Driesprong, Frey Gao, Gabor Gevay, Gao Yun, Gary Yao, GatsbyNewton, GitHub, Grebennikov Roman, GuoWei Ma, Gyula Fora, Haibo Sun, Hao Dang, Henvealf, Hongtao Zhang, HuangXingBo, Hwanju Kim, Igal Shilman, Jacob Sevart, Jark Wu, Jeff Martin, Jeff Yang, Jeff Zhang, Jiangjie (Becket) Qin, Jiayi, Jiayi Liao, Jincheng Sun, Jing Zhang, Jingsong Lee, JingsongLi, Joao Boto, John Lonergan, Kaibo Zhou, Konstantin Knauf, Kostas Kloudas, Kurt Young, Leonard Xu, Ling Wang, Lining Jing, Liupengcheng, LouisXu, Mads Chr. Olesen, Marco Zühlke, Marcos Klein, Matyas Orhidi, Maximilian Bode, Maximilian Michels, Nick Pavlakis, Nico Kruber, Nicolas Deslandes, Pablo Valtuille, Paul Lam, Paul Lin, PengFei Li, Piotr Nowojski, Piotr Przybylski, Piyush Narang, Ricco Chen, Richard Deurwaarder, Robert Metzger, Roman, Roman Grebennikov, Roman Khachatryan, Rong Rong, Rui Li, Ryan Tao, Scott Kidder, Seth Wiesman, Shannon Carey, Shaobin.Ou, Shuo Cheng, Stefan Richter, Stephan Ewen, Steve OU, Steven Wu, Terry Wang, Thesharing, Thomas Weise, Till Rohrmann, Timo Walther, Tony Wei, TsReaper, Tzu-Li (Gordon) Tai, Victor Wong, WangHengwei, Wei Zhong, WeiZhong94, Wind (Jiayi Liao), Xintong Song, XuQianJin-Stars, Xuefu Zhang, Xupingyong, Yadong Xie, Yang Wang, Yangze Guo, Yikun Jiang, Ying, YngwieWang, Yu Li, Yuan Mei, Yun Gao, Yun Tang, Zhanchun Zhang, Zhenghua Gao, Zhijiang, Zhu Zhu, a-suiniaev, azagrebin, beyond1920, biao.liub, blueszheng, bowen.li, caoyingjie, catkint, chendonglin, chenqi, chunpinghe, cyq89051127, danrtsey.wy, dengziming, dianfu, eskabetxe, fanrui, forideal, gentlewang, godfrey he, godfreyhe, haodang, hehuiyuan, hequn8128, hpeter, huangxingbo, huzheng, ifndef-SleePy, jiemotongxue, joe, jrthe42, kevin.cyj, klion26, lamber-ken, libenchao, liketic, lincoln-lil, lining, liuyongvs, liyafan82, lz, mans2singh, mojo, openinx, ouyangwulin, shining-huang, shuai-xu, shuo.cs, stayhsfLee, sunhaibotb, sunjincheng121, tianboxiu, tianchen, tianchen92, tison, tszkitlo40, unknown, vinoyang, vthinkxie, wangpeibin, wangxiaowei, wangxiyuan, wangxlong, wangyang0918, whlwanghailong, xuchao0903, xuyang1706, yanghua, yangjf2019, yongqiang chai, yuzhao.cyz, zentol, zhangzhanchum, zhengcanbin, zhijiang, zhongyong jin, zhuzhu.zz, zjuwangg, zoudaokoulife, 砚田, 谢磊, 张志豪, 曹建华&lt;/p&gt;
</description>
<pubDate>Tue, 11 Feb 2020 03:30:00 +0100</pubDate>
<link>https://flink.apache.org/news/2020/02/11/release-1.10.0.html</link>
<guid isPermaLink="true">/news/2020/02/11/release-1.10.0.html</guid>
</item>
<item>
<title>A Guide for Unit Testing in Apache Flink</title>
<description>&lt;p&gt;Writing unit tests is one of the essential tasks of designing a production-grade application. Without tests, a single change in code can result in cascades of failure in production. Thus unit tests should be written for all types of applications, be it a simple job cleaning data and training a model or a complex multi-tenant, real-time data processing system. In the following sections, we provide a guide for unit testing of Apache Flink applications.
Apache Flink provides a robust unit testing framework to make sure your applications behave as expected in production. You need to include the following dependencies to use the provided framework.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-test-utils_${scala.binary.version}&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;${flink.version}&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;scope&amp;gt;&lt;/span&gt;test&lt;span class=&quot;nt&quot;&gt;&amp;lt;/scope&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-runtime_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.0&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;scope&amp;gt;&lt;/span&gt;test&lt;span class=&quot;nt&quot;&gt;&amp;lt;/scope&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;classifier&amp;gt;&lt;/span&gt;tests&lt;span class=&quot;nt&quot;&gt;&amp;lt;/classifier&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.0&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;scope&amp;gt;&lt;/span&gt;test&lt;span class=&quot;nt&quot;&gt;&amp;lt;/scope&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;classifier&amp;gt;&lt;/span&gt;tests&lt;span class=&quot;nt&quot;&gt;&amp;lt;/classifier&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The strategy of writing unit tests differs for various operators. You can break down the strategy into the following three buckets:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Stateless Operators&lt;/li&gt;
&lt;li&gt;Stateful Operators&lt;/li&gt;
&lt;li&gt;Timed Process Operators&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id=&quot;stateless-operators&quot;&gt;Stateless Operators&lt;/h1&gt;
&lt;p&gt;Writing unit tests for a stateless operator is a breeze. You need to follow the basic norm of writing a test case, i.e., create an instance of the function class and test the appropriate methods. Let’s take an example of a simple &lt;code&gt;Map&lt;/code&gt; operator.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyStatelessMap&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;hello &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The test case for the above operator should look like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;nd&quot;&gt;@Test&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;testMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MyStatelessMap&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;statelessMap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;MyStatelessMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;statelessMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;hello world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Pretty simple, right? Let’s take a look at one for the &lt;code&gt;FlatMap&lt;/code&gt; operator.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyStatelessFlatMap&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlatMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;hello &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;FlatMap&lt;/code&gt; operators require a &lt;code&gt;Collector&lt;/code&gt; object along with the input. For the test case, we have two options:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Mock the &lt;code&gt;Collector&lt;/code&gt; object using Mockito&lt;/li&gt;
&lt;li&gt;Use the &lt;code&gt;ListCollector&lt;/code&gt; provided by Flink&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;I prefer the second method as it requires fewer lines of code and is suitable for most cases.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;nd&quot;&gt;@Test&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;testFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MyStatelessFlatMap&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;statelessFlatMap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;MyStatelessFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ListCollector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;listCollector&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ListCollector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;statelessFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;listCollector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Lists&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;hello world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h1 id=&quot;stateful-operators&quot;&gt;Stateful Operators&lt;/h1&gt;
&lt;p&gt;Writing test cases for stateful operators requires more effort. You need to check whether the operator state is updated correctly and cleaned up properly, in addition to verifying the output of the operator.&lt;/p&gt;
&lt;p&gt;Let’s take the example of a stateful &lt;code&gt;FlatMap&lt;/code&gt; function:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StatefulFlatMap&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RichFlatMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ValueState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Configuration&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getRuntimeContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;previousInput&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;hello &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;){&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot; &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The intricate part of writing tests for the above class is mocking the configuration as well as the runtime context of the application. Flink provides TestHarness classes so that users don’t have to create the mock objects themselves. Using the &lt;code&gt;KeyedOneInputStreamOperatorTestHarness&lt;/code&gt;, the test looks like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.streaming.api.operators.StreamFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.streaming.runtime.streamrecord.StreamRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Test&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;testFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;StatefulFlatMap&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;statefulFlatMap&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;StatefulFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// OneInputStreamOperatorTestHarness takes the input and output types as type parameters &lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;OneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// KeyedOneInputStreamOperatorTestHarness takes three arguments:&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Flink operator object, key selector and key type&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedOneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;statefulFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;1&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// test first record&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ValueState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;statefulFlatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRuntimeContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;previousInput&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stateValue&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Lists&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;hello world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;extractOutputStreamRecords&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stateValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// test second record&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;parallel&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Lists&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;hello world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;hello parallel world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;extractOutputStreamRecords&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;parallel&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousInput&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The test harness provides many helper methods, three of which are being used here:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;open&lt;/code&gt;: calls the &lt;code&gt;open&lt;/code&gt; method of the &lt;code&gt;FlatMap&lt;/code&gt; function with the relevant parameters and also initializes the context.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;processElement&lt;/code&gt;: allows users to pass an input element as well as the timestamp associated with the element.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;extractOutputStreamRecords&lt;/code&gt;: gets the output records along with their timestamps from the &lt;code&gt;Collector&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The test harness simplifies unit testing of stateful functions to a large extent.&lt;/p&gt;
&lt;p&gt;You might also need to check whether the state value is set correctly. You can get the state value directly from the operator by retrieving the state from its runtime context in the same way it was created, as demonstrated in the previous example.&lt;/p&gt;
&lt;h1 id=&quot;timed-process-operators&quot;&gt;Timed Process Operators&lt;/h1&gt;
&lt;p&gt;Writing tests for process functions that work with time is quite similar to writing tests for stateful functions, because you can also use the test harness.
However, you need to take care of another aspect: providing timestamps for events and controlling the current time of the application. By setting the current (processing or event) time, you can trigger registered timers, which will call the &lt;code&gt;onTimer&lt;/code&gt; method of the function.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyProcessFunction&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;timerService&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerProcessingTimeTimer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;hello &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;onTimer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;long&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;OnTimerContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;format&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Timer triggered at timestamp %d&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;timestamp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We need to test both methods of the &lt;code&gt;KeyedProcessFunction&lt;/code&gt;, i.e., &lt;code&gt;processElement&lt;/code&gt; as well as &lt;code&gt;onTimer&lt;/code&gt;. Using a test harness, we can control the current time of the function and thus trigger the timer at will rather than waiting for a specific time.&lt;/p&gt;
&lt;p&gt;Let’s take a look at the test cases:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;nd&quot;&gt;@Test&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;testProcessElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MyProcessFunction&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;myProcessFunction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;MyProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;OneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedOneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedProcessOperator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;1&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Function time is initialized to 0&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Lists&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;hello world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;extractOutputStreamRecords&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Test&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;testOnTimer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MyProcessFunction&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;myProcessFunction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;MyProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;OneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedOneInputStreamOperatorTestHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedProcessOperator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;myProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;1&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;numProcessingTimeTimers&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Function time is set to 50&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProcessingTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;50&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Assert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;assertEquals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Lists&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;hello world&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamRecord&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Timer triggered at timestamp 50&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;testHarness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;extractOutputStreamRecords&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The mechanism for testing multi-input stream operators, such as CoProcess functions, is similar to the one described in this article. You should use the TwoInput variant of the harness for these operators, such as &lt;code&gt;TwoInputStreamOperatorTestHarness&lt;/code&gt;.&lt;/p&gt;
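&lt;p&gt;For illustration, a minimal sketch of such a test could look like the following, assuming that &lt;code&gt;TwoInputStreamOperatorTestHarness&lt;/code&gt;, the &lt;code&gt;CoProcessOperator&lt;/code&gt; wrapper and the &lt;code&gt;processElement1&lt;/code&gt;/&lt;code&gt;processElement2&lt;/code&gt; overloads shown here are available in your Flink version. The inline &lt;code&gt;CoProcessFunction&lt;/code&gt; is a hypothetical one that only tags records from its two inputs:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.streaming.api.operators.co.CoProcessOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
import org.apache.flink.streaming.util.TwoInputStreamOperatorTestHarness;
import org.apache.flink.util.Collector;
import org.junit.Assert;
import org.junit.Test;
@Test
public void testCoProcess() throws Exception {
    // a trivial CoProcessFunction, defined inline only for this sketch
    CoProcessFunction&amp;lt;String, String, String&amp;gt; myCoProcessFunction =
        new CoProcessFunction&amp;lt;String, String, String&amp;gt;() {
            @Override
            public void processElement1(String in, Context ctx, Collector&amp;lt;String&amp;gt; out) {
                out.collect(&amp;quot;left &amp;quot; + in);
            }
            @Override
            public void processElement2(String in, Context ctx, Collector&amp;lt;String&amp;gt; out) {
                out.collect(&amp;quot;right &amp;quot; + in);
            }
        };
    // the TwoInput harness wraps the operator, which in turn wraps the function
    TwoInputStreamOperatorTestHarness&amp;lt;String, String, String&amp;gt; testHarness =
        new TwoInputStreamOperatorTestHarness&amp;lt;&amp;gt;(new CoProcessOperator&amp;lt;&amp;gt;(myCoProcessFunction));
    testHarness.open();
    // each input has its own processElement method on the harness
    testHarness.processElement1(new StreamRecord&amp;lt;&amp;gt;(&amp;quot;hello&amp;quot;, 10));
    testHarness.processElement2(new StreamRecord&amp;lt;&amp;gt;(&amp;quot;world&amp;quot;, 20));
    // Lists is the same helper used in the earlier examples
    Assert.assertEquals(
        Lists.newArrayList(
            new StreamRecord&amp;lt;&amp;gt;(&amp;quot;left hello&amp;quot;, 10),
            new StreamRecord&amp;lt;&amp;gt;(&amp;quot;right world&amp;quot;, 20)),
        testHarness.extractOutputStreamRecords());
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;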
&lt;h1 id=&quot;summary&quot;&gt;Summary&lt;/h1&gt;
&lt;p&gt;In the previous sections we showcased how unit testing in Apache Flink works for stateless, stateful and time-aware operators. We hope you found the steps easy to follow and execute while developing your Flink applications. If you have any questions or feedback you can reach out to me &lt;a href=&quot;https://www.kharekartik.dev/about/&quot;&gt;here&lt;/a&gt; or contact the community on the &lt;a href=&quot;https://flink.apache.org/community.html&quot;&gt;Apache Flink user mailing list&lt;/a&gt;.&lt;/p&gt;
</description>
<pubDate>Fri, 07 Feb 2020 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2020/02/07/a-guide-for-unit-testing-in-apache-flink.html</link>
<guid isPermaLink="true">/news/2020/02/07/a-guide-for-unit-testing-in-apache-flink.html</guid>
</item>
<item>
<title>Apache Flink 1.9.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released the second bugfix version of the Apache Flink 1.9 series.&lt;/p&gt;
&lt;p&gt;This release includes 117 fixes and minor improvements for Flink 1.9.1. Below is a detailed list of all fixes and improvements.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.9.2.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12122&quot;&gt;FLINK-12122&lt;/a&gt;] - Spread out tasks evenly across all available registered TaskManagers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13360&quot;&gt;FLINK-13360&lt;/a&gt;] - Add documentation for HBase connector for Table API &amp;amp; SQL
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13361&quot;&gt;FLINK-13361&lt;/a&gt;] - Add documentation for JDBC connector for Table API &amp;amp; SQL
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13723&quot;&gt;FLINK-13723&lt;/a&gt;] - Use liquid-c for faster doc generation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13724&quot;&gt;FLINK-13724&lt;/a&gt;] - Remove unnecessary whitespace from the docs&amp;#39; sidenav
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13725&quot;&gt;FLINK-13725&lt;/a&gt;] - Use sassc for faster doc generation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13726&quot;&gt;FLINK-13726&lt;/a&gt;] - Build docs with jekyll 4.0.0.pre.beta1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13791&quot;&gt;FLINK-13791&lt;/a&gt;] - Speed up sidenav by using group_by
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13817&quot;&gt;FLINK-13817&lt;/a&gt;] - Expose whether web submissions are enabled
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13818&quot;&gt;FLINK-13818&lt;/a&gt;] - Check whether web submission are enabled
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14535&quot;&gt;FLINK-14535&lt;/a&gt;] - Cast exception is thrown when count distinct on decimal fields
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14735&quot;&gt;FLINK-14735&lt;/a&gt;] - Improve batch schedule check input consumable performance
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10377&quot;&gt;FLINK-10377&lt;/a&gt;] - Remove precondition in TwoPhaseCommitSinkFunction.notifyCheckpointComplete
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10435&quot;&gt;FLINK-10435&lt;/a&gt;] - Client sporadically hangs after Ctrl + C
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11120&quot;&gt;FLINK-11120&lt;/a&gt;] - TIMESTAMPADD function handles TIME incorrectly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11835&quot;&gt;FLINK-11835&lt;/a&gt;] - ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange failed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12342&quot;&gt;FLINK-12342&lt;/a&gt;] - Yarn Resource Manager Acquires Too Many Containers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12399&quot;&gt;FLINK-12399&lt;/a&gt;] - FilterableTableSource does not use filters on job run
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13184&quot;&gt;FLINK-13184&lt;/a&gt;] - Starting a TaskExecutor blocks the YarnResourceManager&amp;#39;s main thread
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13589&quot;&gt;FLINK-13589&lt;/a&gt;] - DelimitedInputFormat index error on multi-byte delimiters with whole file input splits
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13702&quot;&gt;FLINK-13702&lt;/a&gt;] - BaseMapSerializerTest.testDuplicate fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13708&quot;&gt;FLINK-13708&lt;/a&gt;] - Transformations should be cleared because a table environment could execute multiple job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13740&quot;&gt;FLINK-13740&lt;/a&gt;] - TableAggregateITCase.testNonkeyedFlatAggregate failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13749&quot;&gt;FLINK-13749&lt;/a&gt;] - Make Flink client respect classloading policy
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13758&quot;&gt;FLINK-13758&lt;/a&gt;] - Failed to submit JobGraph when registered hdfs file in DistributedCache
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13799&quot;&gt;FLINK-13799&lt;/a&gt;] - Web Job Submit Page displays stream of error message when web submit is disables in the config
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13827&quot;&gt;FLINK-13827&lt;/a&gt;] - Shell variable should be escaped in start-scala-shell.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13862&quot;&gt;FLINK-13862&lt;/a&gt;] - Update Execution Plan docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13945&quot;&gt;FLINK-13945&lt;/a&gt;] - Instructions for building flink-shaded against vendor repository don&amp;#39;t work
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13969&quot;&gt;FLINK-13969&lt;/a&gt;] - Resuming Externalized Checkpoint (rocks, incremental, scale down) end-to-end test fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13995&quot;&gt;FLINK-13995&lt;/a&gt;] - Fix shading of the licence information of netty
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13999&quot;&gt;FLINK-13999&lt;/a&gt;] - Correct the documentation of MATCH_RECOGNIZE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14066&quot;&gt;FLINK-14066&lt;/a&gt;] - Pyflink building failure in master and 1.9.0 version
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14074&quot;&gt;FLINK-14074&lt;/a&gt;] - MesosResourceManager can&amp;#39;t create new taskmanagers in Session Cluster Mode.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14175&quot;&gt;FLINK-14175&lt;/a&gt;] - Upgrade KPL version in flink-connector-kinesis to fix application OOM
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14200&quot;&gt;FLINK-14200&lt;/a&gt;] - Temporal Table Function Joins do not work on Tables (only TableSources) on the query side
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14235&quot;&gt;FLINK-14235&lt;/a&gt;] - Kafka010ProducerITCase&amp;gt;KafkaProducerTestBase.testOneToOneAtLeastOnceCustomOperator fails on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14315&quot;&gt;FLINK-14315&lt;/a&gt;] - NPE with JobMaster.disconnectTaskManager
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14337&quot;&gt;FLINK-14337&lt;/a&gt;] - HistoryServer does not handle NPE on corruped archives properly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14347&quot;&gt;FLINK-14347&lt;/a&gt;] - YARNSessionFIFOITCase.checkForProhibitedLogContents found a log with prohibited string
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14355&quot;&gt;FLINK-14355&lt;/a&gt;] - Example code in state processor API docs doesn&amp;#39;t compile
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14370&quot;&gt;FLINK-14370&lt;/a&gt;] - KafkaProducerAtLeastOnceITCase&amp;gt;KafkaProducerTestBase.testOneToOneAtLeastOnceRegularSink fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14382&quot;&gt;FLINK-14382&lt;/a&gt;] - Incorrect handling of FLINK_PLUGINS_DIR on Yarn
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14398&quot;&gt;FLINK-14398&lt;/a&gt;] - Further split input unboxing code into separate methods
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14413&quot;&gt;FLINK-14413&lt;/a&gt;] - Shade-plugin ApacheNoticeResourceTransformer uses platform-dependent encoding
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14434&quot;&gt;FLINK-14434&lt;/a&gt;] - Dispatcher#createJobManagerRunner should not start JobManagerRunner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14445&quot;&gt;FLINK-14445&lt;/a&gt;] - Python module build failed when making sdist
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14447&quot;&gt;FLINK-14447&lt;/a&gt;] - Network metrics doc table render confusion
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14459&quot;&gt;FLINK-14459&lt;/a&gt;] - Python module build hangs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14524&quot;&gt;FLINK-14524&lt;/a&gt;] - PostgreSQL JDBC sink generates invalid SQL in upsert mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14547&quot;&gt;FLINK-14547&lt;/a&gt;] - UDF cannot be in the join condition in blink planner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14561&quot;&gt;FLINK-14561&lt;/a&gt;] - Don&amp;#39;t write FLINK_PLUGINS_DIR ENV variable to Flink configuration
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14562&quot;&gt;FLINK-14562&lt;/a&gt;] - RMQSource leaves idle consumer after closing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14574&quot;&gt;FLINK-14574&lt;/a&gt;] - flink-s3-fs-hadoop doesn&amp;#39;t work with plugins mechanism
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14589&quot;&gt;FLINK-14589&lt;/a&gt;] - Redundant slot requests with the same AllocationID leads to inconsistent slot table
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14641&quot;&gt;FLINK-14641&lt;/a&gt;] - Fix description of metric `fullRestarts`
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14673&quot;&gt;FLINK-14673&lt;/a&gt;] - Shouldn&amp;#39;t expect HMS client to throw NoSuchObjectException for non-existing function
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14683&quot;&gt;FLINK-14683&lt;/a&gt;] - RemoteStreamEnvironment&amp;#39;s construction function has a wrong method
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14701&quot;&gt;FLINK-14701&lt;/a&gt;] - Slot leaks if SharedSlotOversubscribedException happens
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14784&quot;&gt;FLINK-14784&lt;/a&gt;] - CsvTableSink miss delimiter when row start with null member
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14817&quot;&gt;FLINK-14817&lt;/a&gt;] - &amp;quot;Streaming Aggregation&amp;quot; document contains misleading code examples
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14846&quot;&gt;FLINK-14846&lt;/a&gt;] - Correct the default writerbuffer size documentation of RocksDB
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14910&quot;&gt;FLINK-14910&lt;/a&gt;] - DisableAutoGeneratedUIDs fails on keyBy
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14930&quot;&gt;FLINK-14930&lt;/a&gt;] - OSS Filesystem Uses Wrong Shading Prefix
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14949&quot;&gt;FLINK-14949&lt;/a&gt;] - Task cancellation can be stuck against out-of-thread error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14951&quot;&gt;FLINK-14951&lt;/a&gt;] - State TTL backend end-to-end test fail when taskManager has multiple slot
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14953&quot;&gt;FLINK-14953&lt;/a&gt;] - Parquet table source should use schema type to build FilterPredicate
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14960&quot;&gt;FLINK-14960&lt;/a&gt;] - Dependency shading of table modules test fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14976&quot;&gt;FLINK-14976&lt;/a&gt;] - Cassandra Connector leaks Semaphore on Throwable; hangs on close
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15001&quot;&gt;FLINK-15001&lt;/a&gt;] - The digest of sub-plan reuse should contain retraction traits for stream physical nodes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15013&quot;&gt;FLINK-15013&lt;/a&gt;] - Flink (on YARN) sometimes needs too many slots
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15030&quot;&gt;FLINK-15030&lt;/a&gt;] - Potential deadlock for bounded blocking ResultPartition.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15036&quot;&gt;FLINK-15036&lt;/a&gt;] - Container startup error will be handled out side of the YarnResourceManager&amp;#39;s main thread
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15063&quot;&gt;FLINK-15063&lt;/a&gt;] - Input group and output group of the task metric are reversed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15065&quot;&gt;FLINK-15065&lt;/a&gt;] - RocksDB configurable options doc description error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15076&quot;&gt;FLINK-15076&lt;/a&gt;] - Source thread should be interrupted during the Task cancellation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15234&quot;&gt;FLINK-15234&lt;/a&gt;] - Hive table created from flink catalog table shouldn&amp;#39;t have null properties in parameters
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15240&quot;&gt;FLINK-15240&lt;/a&gt;] - is_generic key is missing for Flink table stored in HiveCatalog
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15259&quot;&gt;FLINK-15259&lt;/a&gt;] - HiveInspector.toInspectors() should convert Flink constant to Hive constant
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15266&quot;&gt;FLINK-15266&lt;/a&gt;] - NPE in blink planner code gen
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15361&quot;&gt;FLINK-15361&lt;/a&gt;] - ParquetTableSource should pass predicate in projectFields
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15412&quot;&gt;FLINK-15412&lt;/a&gt;] - LocalExecutorITCase#testParameterizedTypes failed in travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15413&quot;&gt;FLINK-15413&lt;/a&gt;] - ScalarOperatorsTest failed in travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15418&quot;&gt;FLINK-15418&lt;/a&gt;] - StreamExecMatchRule not set FlinkRelDistribution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15421&quot;&gt;FLINK-15421&lt;/a&gt;] - GroupAggsHandler throws java.time.LocalDateTime cannot be cast to java.sql.Timestamp
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15435&quot;&gt;FLINK-15435&lt;/a&gt;] - ExecutionConfigTests.test_equals_and_hash in pyFlink fails when cpu core numbers is 6
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15443&quot;&gt;FLINK-15443&lt;/a&gt;] - Use JDBC connector write FLOAT value occur ClassCastException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15478&quot;&gt;FLINK-15478&lt;/a&gt;] - FROM_BASE64 code gen type wrong
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15489&quot;&gt;FLINK-15489&lt;/a&gt;] - WebUI log refresh not working
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15522&quot;&gt;FLINK-15522&lt;/a&gt;] - Misleading root cause exception when cancelling the job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15523&quot;&gt;FLINK-15523&lt;/a&gt;] - ConfigConstants generally excluded from japicmp
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15543&quot;&gt;FLINK-15543&lt;/a&gt;] - Apache Camel not bundled but listed in flink-dist NOTICE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15549&quot;&gt;FLINK-15549&lt;/a&gt;] - Integer overflow in SpillingResettableMutableObjectIterator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15577&quot;&gt;FLINK-15577&lt;/a&gt;] - WindowAggregate RelNodes missing Window specs in digest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15615&quot;&gt;FLINK-15615&lt;/a&gt;] - Docs: wrong guarantees stated for the file sink
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11135&quot;&gt;FLINK-11135&lt;/a&gt;] - Reorder Hadoop config loading in HadoopUtils
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12848&quot;&gt;FLINK-12848&lt;/a&gt;] - Method equals() in RowTypeInfo should consider fieldsNames
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13729&quot;&gt;FLINK-13729&lt;/a&gt;] - Update website generation dependencies
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14008&quot;&gt;FLINK-14008&lt;/a&gt;] - Auto-generate binary licensing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14104&quot;&gt;FLINK-14104&lt;/a&gt;] - Bump Jackson to 2.10.1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14123&quot;&gt;FLINK-14123&lt;/a&gt;] - Lower the default value of taskmanager.memory.fraction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14206&quot;&gt;FLINK-14206&lt;/a&gt;] - Let fullRestart metric count fine grained restarts as well
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14215&quot;&gt;FLINK-14215&lt;/a&gt;] - Add Docs for TM and JM Environment Variable Setting
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14251&quot;&gt;FLINK-14251&lt;/a&gt;] - Add FutureUtils#forward utility
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14334&quot;&gt;FLINK-14334&lt;/a&gt;] - ElasticSearch docs refer to non-existent ExceptionUtils.containsThrowable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14335&quot;&gt;FLINK-14335&lt;/a&gt;] - ExampleIntegrationTest in testing docs is incorrect
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14408&quot;&gt;FLINK-14408&lt;/a&gt;] - In OldPlanner, UDF open method can not be invoke when SQL is optimized
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14557&quot;&gt;FLINK-14557&lt;/a&gt;] - Clean up the package of py4j
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14639&quot;&gt;FLINK-14639&lt;/a&gt;] - Metrics User Scope docs refer to wrong class
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14646&quot;&gt;FLINK-14646&lt;/a&gt;] - Check non-null for key in KeyGroupStreamPartitioner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14825&quot;&gt;FLINK-14825&lt;/a&gt;] - Rework state processor api documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14995&quot;&gt;FLINK-14995&lt;/a&gt;] - Kinesis NOTICE is incorrect
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15113&quot;&gt;FLINK-15113&lt;/a&gt;] - fs.azure.account.key not hidden from global configuration
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15554&quot;&gt;FLINK-15554&lt;/a&gt;] - Bump jetty-util-ajax to 9.3.24
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15657&quot;&gt;FLINK-15657&lt;/a&gt;] - Fix the python table api doc link in Python API tutorial
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15700&quot;&gt;FLINK-15700&lt;/a&gt;] - Improve Python API Tutorial doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15726&quot;&gt;FLINK-15726&lt;/a&gt;] - Fixing error message in StreamExecTableSourceScan
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 30 Jan 2020 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2020/01/30/release-1.9.2.html</link>
<guid isPermaLink="true">/news/2020/01/30/release-1.9.2.html</guid>
</item>
<item>
<title>State Unlocked: Interacting with State in Apache Flink</title>
<description>&lt;h1 id=&quot;introduction&quot;&gt;Introduction&lt;/h1&gt;
&lt;p&gt;With stateful stream-processing becoming the norm for complex event-driven applications and real-time analytics, &lt;a href=&quot;https://flink.apache.org/&quot;&gt;Apache Flink&lt;/a&gt; is often the backbone for running business logic and managing an organization’s most valuable asset — its data — as application state in Flink.&lt;/p&gt;
&lt;p&gt;In order to provide a state-of-the-art experience to Flink developers, the Apache Flink community makes significant efforts to provide the safety and future-proof guarantees organizations need while managing state in Flink. In particular, Flink developers should have sufficient means to access and modify their state, and bootstrapping state with existing data from external systems should be a piece of cake. These efforts span multiple Flink major releases and consist of the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Evolvable state schema in Apache Flink&lt;/li&gt;
&lt;li&gt;Flexibility in swapping state backends, and&lt;/li&gt;
&lt;li&gt;The State Processor API, an offline tool to read, write and modify state in Flink&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This post discusses the community’s efforts related to state management in Flink, provides some practical examples of how the different features and APIs can be utilized and covers some future ideas for new and improved ways of managing state in Apache Flink.&lt;/p&gt;
&lt;h1 id=&quot;stream-processing-what-is-state&quot;&gt;Stream processing: What is State?&lt;/h1&gt;
&lt;p&gt;To set the tone for the remainder of the post, let us first try to explain the very definition of state in stream processing. When it comes to stateful stream processing, state comprises the information that an application or stream processing engine will remember across events and streams as more realtime (unbounded) and/or offline (bounded) data flows through the system. Even the most trivial applications are inherently stateful; take the example of a simple COUNT operation: when counting up to 10, you essentially need to remember that you have already counted up to 9.&lt;/p&gt;
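&lt;p&gt;As a rough illustration (a sketch, not a snippet from the Flink documentation), a count operation applied on a keyed stream has to keep its running total in state so that each new event can build on what has already been counted:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;
// A hypothetical stateful counter: the running total is the state that the
// application must remember across events (having counted to 9 before the 10th arrives).
public class CountFunction extends RichFlatMapFunction&amp;lt;String, Long&amp;gt; {
    private transient ValueState&amp;lt;Long&amp;gt; count;
    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
            new ValueStateDescriptor&amp;lt;&amp;gt;(&amp;quot;count&amp;quot;, Types.LONG));
    }
    @Override
    public void flatMap(String in, Collector&amp;lt;Long&amp;gt; out) throws Exception {
        Long current = count.value();
        long updated = (current == null ? 0L : current) + 1;
        count.update(updated);
        out.collect(updated);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;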
&lt;p&gt;To better understand how Flink manages state, one can think of Flink as a three-layered state abstraction, as illustrated in the diagram below.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-01-29-state-unlocked-interacting-with-state-in-apache-flink/managing-state-in-flink-visual-1.png&quot; width=&quot;600px&quot; alt=&quot;State in Apache Flink&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;On the top layer sits the Flink user code, for example a &lt;code&gt;KeyedProcessFunction&lt;/code&gt; that contains some value state. This is a simple variable whose value state annotation makes it automatically fault-tolerant, re-scalable and queryable by the runtime. These variables are backed by the configured state backend that sits either on-heap or on-disk (RocksDB State Backend) and provides data locality, proximity to the computation and speed when it comes to per-record computations. Finally, when it comes to upgrades, the introduction of new features or bug fixes, savepoints are what keep your existing state intact.&lt;/p&gt;
&lt;p&gt;A savepoint is a snapshot of the distributed, global state of an application at a logical point-in-time, stored in an external distributed file system or blob storage such as HDFS or S3. Upon upgrading an application or implementing a code change — such as adding a new operator or changing a field — the Flink job can restart by re-loading the application state from the savepoint into the state backend, making it local and available to the computation, and continue processing as if nothing had ever happened.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-01-29-state-unlocked-interacting-with-state-in-apache-flink/managing-state-in-flink-visual-2.png&quot; width=&quot;600px&quot; alt=&quot;State in Apache Flink&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
It is important to remember here that &lt;b&gt;state is one of the most valuable components of a Flink application&lt;/b&gt; carrying all the information about both where you are now and where you are going. State is among the most long-lived components in a Flink service since it can be carried across jobs, operators, configurations, new features and bug fixes.
&lt;/div&gt;
&lt;h1 id=&quot;schema-evolution-with-apache-flink&quot;&gt;Schema Evolution with Apache Flink&lt;/h1&gt;
&lt;p&gt;In the previous section, we explained how state is stored and persisted in a Flink application. Let’s now take a look at what happens when evolving state in a stateful Flink streaming application becomes necessary.&lt;/p&gt;
&lt;p&gt;Imagine an Apache Flink application that implements a &lt;code&gt;KeyedProcessFunction&lt;/code&gt; and contains some &lt;code&gt;ValueState&lt;/code&gt;. As illustrated below, when registering the type within the state descriptor, Flink users specify a &lt;code&gt;TypeInformation&lt;/code&gt; that tells Flink how to serialize the state. &lt;code&gt;TypeInformation&lt;/code&gt; represents Flink’s internal type system and is used to serialize data when it is shipped across the network or stored in state backends. Flink’s type system has built-in support for all the basic types such as longs, strings, doubles, arrays and basic collection types like lists and maps. Additionally, Flink supports most of the major composite types including Tuples, POJOs, Scala Case Classes and Apache Avro&lt;sup&gt;Ⓡ&lt;/sup&gt;. Finally, if an application’s type does not match any of the above, developers can plug in their own serializer, or Flink falls back to Kryo.&lt;/p&gt;
&lt;h2 id=&quot;state-registration-with-built-in-serialization-in-apache-flink&quot;&gt;State registration with built-in serialization in Apache Flink&lt;/h2&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MyFunction&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Output&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;transient&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ValueState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MyState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;valueState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Configuration&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MyState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;descriptor&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;my-state&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeInformation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MyState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;valueState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getRuntimeContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;descriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Typically, evolving the schema of an application’s state happens because of some business logic change (adding or dropping fields or changing data types). In all cases, the schema is determined by its serializer and can be thought of in terms analogous to an alter table statement in a database. When a state variable is first introduced, it is like running a &lt;code&gt;CREATE TABLE&lt;/code&gt; command: there is a lot of freedom in how it is defined. However, once the table contains data (registered rows), developers are limited in what they can change and which rules they must follow when issuing an &lt;code&gt;ALTER TABLE&lt;/code&gt; statement. Schema migration in Apache Flink follows a similar principle, since the framework is essentially running an &lt;code&gt;ALTER TABLE&lt;/code&gt; statement across savepoints.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://flink.apache.org/downloads.html#apache-flink-182&quot;&gt;Flink 1.8&lt;/a&gt; comes with built-in support for &lt;a href=&quot;https://avro.apache.org/&quot;&gt;Apache Avro&lt;/a&gt; (specifically the &lt;a href=&quot;https://avro.apache.org/docs/1.7.7/spec.html&quot;&gt;1.7.7 specification&lt;/a&gt;) and evolves state schema according to Avro specifications by adding and removing types or even by swapping between generic and specific Avro record types.&lt;/p&gt;
&lt;p&gt;In &lt;a href=&quot;https://flink.apache.org/downloads.html#apache-flink-191&quot;&gt;Flink 1.9&lt;/a&gt; the community added support for schema evolution for POJOs, including the ability to remove existing fields from POJO types or add new fields. POJO schema evolution tends to be less flexible — when compared to Avro — since it is not possible to change either the declared field types or the class name of a POJO type, including its namespace.&lt;/p&gt;
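&lt;p&gt;As a quick illustration of what POJO schema evolution looks like in practice, consider the following minimal sketch (the &lt;code&gt;Customer&lt;/code&gt; class and its fields are hypothetical and not part of any Flink API):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// A hypothetical POJO kept in Flink state. Flink recognizes it as a POJO because the
// class is public, has a public no-argument constructor and all fields are public
// (or exposed through getters and setters).
public class Customer {
    public long customerId;
    public String name;

    // Field added in a newer version of the application. When the job is restored from
    // a savepoint taken with the previous version of this class, Flink 1.9+ migrates the
    // state, and this field is simply null for previously stored records.
    public String email;

    public Customer() {}
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Removing a field works analogously: its previously stored values are dropped during migration.&lt;/p&gt;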
&lt;p&gt;With the community’s efforts related to schema evolution, Flink developers can now expect out-of-the-box support for both Avro and POJO formats, with backwards compatibility for all Flink state backends. Future work revolves around adding support for Scala Case Classes, Tuples and other formats. Make sure to subscribe to the &lt;a href=&quot;https://flink.apache.org/community.html&quot;&gt;Flink mailing list&lt;/a&gt; to contribute and stay on top of any upcoming additions in this space.&lt;/p&gt;
&lt;h2 id=&quot;peeking-under-the-hood&quot;&gt;Peeking Under the Hood&lt;/h2&gt;
&lt;p&gt;Now that we have explained how schema evolution in Flink works, let’s describe the challenges of performing schema serialization with Flink under the hood. Flink considers state a core part of its API stability: developers should always be able to take a savepoint with one version of Flink and restore it with the next. With schema evolution, every migration needs to be backwards compatible and also compatible with the different state backends. While in the Flink code the state backends are represented as interfaces detailing how to store and retrieve bytes, in practice they behave vastly differently, which adds extra complexity to how schema evolution is executed in Flink.&lt;/p&gt;
&lt;p&gt;For instance, the heap state backend supports lazy serialization and eager deserialization, so the per-record code path always works with Java objects, with serialization happening on a background thread. When restoring, Flink eagerly deserializes all the data and only then starts the user code. If a developer plugs in a new serializer, the deserialization happens before Flink ever receives that information.&lt;/p&gt;
&lt;p&gt;The RocksDB state backend behaves in the exact opposite manner: it performs eager serialization (because items are stored on disk and RocksDB only consumes byte arrays) and lazy deserialization. Restoring simply downloads files to the local disk, so Flink is unaware of what the bytes mean until a serializer is registered.&lt;/p&gt;
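&lt;p&gt;To make the contrast concrete, here is a minimal sketch of how an application can switch between the two behaviors simply by configuring a different state backend on its execution environment (the checkpoint URIs below are placeholders):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendConfiguration {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Heap-based backend: state is kept as Java objects on the JVM heap and only
        // serialized when a checkpoint or savepoint is written to the given URI.
        env.setStateBackend(new FsStateBackend(&amp;quot;hdfs://namenode:8020/flink/checkpoints&amp;quot;));

        // RocksDB backend: state is eagerly serialized to bytes and kept on local disk.
        // Uncomment to use it instead of the heap-based backend.
        // env.setStateBackend(new RocksDBStateBackend(&amp;quot;hdfs://namenode:8020/flink/checkpoints&amp;quot;));

        // ... define and execute the streaming job here
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;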
&lt;p&gt;An additional challenge stems from the fact that different versions of the user code contain different classes on their classpath, so the serializer that was used to write a savepoint may no longer be available at restore time.&lt;/p&gt;
&lt;p&gt;To overcome the previously mentioned challenges, we introduced what we call &lt;code&gt;TypeSerializerSnapshot&lt;/code&gt;. The &lt;code&gt;TypeSerializerSnapshot&lt;/code&gt; stores the configuration of the writer serializer in the snapshot. When restoring, it uses that configuration to read back the previous state and check its compatibility with the current version. This allows Flink to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Read the configuration used to write out a snapshot&lt;/li&gt;
&lt;li&gt;Consume the new user code&lt;/li&gt;
&lt;li&gt;Check if both items above are compatible&lt;/li&gt;
&lt;li&gt;Consume the bytes from the snapshot and move forward or alert the user otherwise&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;interface&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TypeSerializerSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getCurrentVersion&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;writeSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataOutputView&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IOException&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;readSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readVersion&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataInputView&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ClassLoader&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;userCodeClassLoader&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IOException&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TypeSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;restoreSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TypeSerializerSchemaCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;resolveSchemaCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TypeSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;newSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;implementing-apache-avro-serialization-in-flink&quot;&gt;Implementing Apache Avro Serialization in Flink&lt;/h2&gt;
&lt;p&gt;Apache Avro is a data serialization format that has very well-defined schema migration semantics and supports both reader and writer schemas. During normal Flink execution the reader and writer schemas will be the same. However, when upgrading an application they may be different, and with schema evolution Flink is able to migrate objects between schemas.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;AvroSerializerSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeSerializerSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtimeSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@SuppressWarnings&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;WeakerAccess&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;AvroSerializerSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;AvroSerializerSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;runtimeSchema&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is a sketch of our Avro serializer. It uses the provided schemas and delegates to Apache Avro for all (de)-serialization. Let’s take a look at one possible implementation of a &lt;code&gt;TypeSerializerSnapshot&lt;/code&gt; that supports schema migration for Avro.&lt;/p&gt;
&lt;h1 id=&quot;writing-out-the-snapshot&quot;&gt;Writing out the snapshot&lt;/h1&gt;
&lt;p&gt;When serializing out the snapshot, the snapshot configuration writes two pieces of information: the current snapshot configuration version and the serializer configuration.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt; &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getCurrentVersion&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;writeSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DataOutputView&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IOException&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;writeUTF&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;runtimeSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;toString&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The version is used to version the snapshot configuration object itself, while the &lt;code&gt;writeSnapshot&lt;/code&gt; method writes out all the information we need to understand the current format: the runtime schema.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt; &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;readSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readVersion&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataInputView&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ClassLoader&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;userCodeClassLoader&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IOException&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;assert&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;readVersion&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousSchemaDefinition&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;in&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;readUTF&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;previousSchema&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parseAvroSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;previousSchemaDefinition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;runtimeType&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;findClassOrFallbackToGeneric&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;userCodeClassLoader&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;previousSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getFullName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;runtimeSchema&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tryExtractAvroSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;userCodeClassLoader&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtimeType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, when Flink restores, it is able to read back the writer schema that was used to serialize the data. The current runtime schema is discovered on the classpath using some Java reflection magic.&lt;/p&gt;
&lt;p&gt;Once we have both of these we can compare them for compatibility. Perhaps nothing has changed and the schemas are compatible as is.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt; &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeSerializerSchemaCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;resolveSchemaCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TypeSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;newSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(!(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;newSerializer&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;instanceof&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AvroSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeSerializerSchemaCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;incompatible&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Objects&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;equals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;previousSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtimeSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeSerializerSchemaCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;compatibleAsIs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Otherwise, the schemas are compared using Avro’s compatibility checks and they may either be compatible with a migration or incompatible.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt; &lt;span class=&quot;kd&quot;&gt;final&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SchemaPairCompatibility&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;compatibility&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;SchemaCompatibility&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;checkReaderWriterCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;previousSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtimeSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;err&quot;&gt;​&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;avroCompatibilityToFlinkCompatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;compatibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If they are compatible with migration, Flink restores a new serializer that can read the old schema and deserialize it into the new runtime type, which is in effect a migration.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt; &lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;restoreSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;previousSchema&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AvroSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;runtimeType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtimeSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;previousSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;AvroSerializer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;runtimeType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtimeSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;runtimeSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h1 id=&quot;the-state-processor-api-reading-writing-and-modifying-flink-state&quot;&gt;The State Processor API: Reading, writing and modifying Flink state&lt;/h1&gt;
&lt;p&gt;The State Processor API allows reading from and writing to Flink savepoints. Some of the interesting use cases it enables are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Analyzing state for interesting patterns&lt;/li&gt;
&lt;li&gt;Troubleshooting or auditing jobs by checking for state discrepancies&lt;/li&gt;
&lt;li&gt;Bootstrapping state for new applications&lt;/li&gt;
&lt;li&gt;Modifying savepoints such as:
&lt;ul&gt;
&lt;li&gt;Changing the maximum parallelism of a savepoint after deploying a Flink job&lt;/li&gt;
&lt;li&gt;Introducing breaking schema updates to a Flink application&lt;/li&gt;
&lt;li&gt;Correcting invalid state in a Flink savepoint&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In a &lt;a href=&quot;https://flink.apache.org/feature/2019/09/13/state-processor-api.html&quot;&gt;previous blog post&lt;/a&gt;, we discussed the State Processor API in detail: the community’s motivation behind introducing the feature in Flink 1.9, what you can use the API for and how you can use it. Essentially, the State Processor API is based around a relational model that maps your Flink job state to a database, as illustrated in the diagram below. We encourage you to &lt;a href=&quot;https://flink.apache.org/feature/2019/09/13/state-processor-api.html&quot;&gt;read the previous story&lt;/a&gt; for more information on the API and how to use it. In a follow-up post, we will provide detailed tutorials on:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reading Keyed and Operator State with the State Processor API and&lt;/li&gt;
&lt;li&gt;Writing and Bootstrapping Keyed and Operator State with the State Processor API&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Stay tuned for more details and guidance around this feature of Flink.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-01-29-state-unlocked-interacting-with-state-in-apache-flink/managing-state-in-flink-state-processor-api-visual-1.png&quot; width=&quot;600px&quot; alt=&quot;State Processor API in Apache Flink&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2020-01-29-state-unlocked-interacting-with-state-in-apache-flink/managing-state-in-flink-state-processor-api-visual-2.png&quot; width=&quot;600px&quot; alt=&quot;State Processor API in Apache Flink&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
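&lt;p&gt;To give a first impression of what interacting with a savepoint looks like, the following is a minimal sketch that reads keyed state with the State Processor API. It assumes a hypothetical streaming job that keeps a per-key count in a &lt;code&gt;ValueState&lt;/code&gt; named &lt;code&gt;&quot;count&quot;&lt;/code&gt; in an operator with uid &lt;code&gt;&quot;counter-uid&quot;&lt;/code&gt;; the savepoint path, state name and uid are assumptions made for the purpose of the example:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.util.Collector;

public class ReadCountsFromSavepoint {

    public static void main(String[] args) throws Exception {
        // The State Processor API is built on top of the batch DataSet API.
        ExecutionEnvironment bEnv = ExecutionEnvironment.getExecutionEnvironment();

        // Load an existing savepoint; the path is a placeholder.
        ExistingSavepoint savepoint =
            Savepoint.load(bEnv, &amp;quot;hdfs:///path/to/savepoint&amp;quot;, new MemoryStateBackend());

        // Read the keyed state of the operator with uid &amp;quot;counter-uid&amp;quot; into a DataSet.
        DataSet&amp;lt;Tuple2&amp;lt;String, Long&amp;gt;&amp;gt; counts =
            savepoint.readKeyedState(&amp;quot;counter-uid&amp;quot;, new CountReader());

        counts.print();
    }

    /** Emits one (key, count) pair for every key found in the savepoint. */
    public static class CountReader extends KeyedStateReaderFunction&amp;lt;String, Tuple2&amp;lt;String, Long&amp;gt;&amp;gt; {

        private transient ValueState&amp;lt;Long&amp;gt; countState;

        @Override
        public void open(Configuration parameters) {
            // The descriptor must match the one used by the original streaming job.
            countState = getRuntimeContext().getState(
                new ValueStateDescriptor&amp;lt;&amp;gt;(&amp;quot;count&amp;quot;, Long.class));
        }

        @Override
        public void readKey(String key, Context ctx, Collector&amp;lt;Tuple2&amp;lt;String, Long&amp;gt;&amp;gt; out) throws Exception {
            out.collect(Tuple2.of(key, countState.value()));
        }
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;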
&lt;h1 id=&quot;looking-ahead-more-ways-to-interact-with-state-in-flink&quot;&gt;Looking ahead: More ways to interact with State in Flink&lt;/h1&gt;
&lt;p&gt;There is a lot of discussion happening in the community related to extending the way Flink developers interact with state in their Flink applications. Regarding the State Processor API, some thoughts revolve around further broadening the API’s scope beyond its current ability to read from and write to both keyed and operator state. In upcoming releases, the State Processor API will be extended to support reading from and writing to windows, and to have first-class integration with Flink’s Table API and SQL.&lt;/p&gt;
&lt;p&gt;Beyond widening the scope of the State Processor API, the Flink community is discussing a few additional ways to improve how developers interact with state in Flink. One of them is the proposal for a Unified Savepoint Format (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-41%3A+Unify+Binary+format+for+Keyed+State&quot;&gt;FLIP-41&lt;/a&gt;) for all keyed state backends. This improvement aims at introducing a unified binary format across all savepoints in all keyed state backends, which drastically reduces the overhead of swapping the state backend in a Flink application. It would allow developers to take a savepoint of their application and restart it with a different state backend — for example, moving it from the heap to disk (RocksDB state backend) and back — depending on the scalability and evolution of the application at different points in time.&lt;/p&gt;
&lt;p&gt;The community is also discussing the ability to have upgradability dry runs in upcoming Flink releases. Having such functionality in Flink would allow developers to detect incompatible updates offline, without needing to start a new Flink job from scratch. For example, Flink users would be able to uncover topology or schema incompatibilities upon upgrading a Flink job, without having to load the state back into a running Flink job in the first place. Additionally, with upgradability dry runs, Flink users would be able to get information about the registered state through the streaming graph, without needing to access the state in the state backend.&lt;/p&gt;
&lt;p&gt;With all the exciting new functionality added in Flink 1.9, as well as some solid ideas and discussions around bringing state in Flink to the next level, the community is committed to making state in Apache Flink a fundamental element of the framework: something that is ever-present across versions and upgrades of your application and a true first-class citizen in Apache Flink. We encourage you to sign up for the &lt;a href=&quot;https://flink.apache.org/community.html&quot;&gt;mailing list&lt;/a&gt; and stay on top of the announcements and new features in upcoming releases.&lt;/p&gt;
</description>
<pubDate>Wed, 29 Jan 2020 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2020/01/29/state-unlocked-interacting-with-state-in-apache-flink.html</link>
<guid isPermaLink="true">/news/2020/01/29/state-unlocked-interacting-with-state-in-apache-flink.html</guid>
</item>
<item>
<title>Advanced Flink Application Patterns Vol.1: Case Study of a Fraud Detection System</title>
<description>&lt;p&gt;In this series of blog posts you will learn about three powerful Flink patterns for building streaming applications:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;/news/2020/03/24/demo-fraud-detection-2.html&quot;&gt;Dynamic updates of application logic&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Dynamic data partitioning (shuffle), controlled at runtime&lt;/li&gt;
&lt;li&gt;Low latency alerting based on custom windowing logic (without using the window API)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These patterns expand the possibilities of what is achievable with statically defined data flows and provide the building blocks to fulfill complex business requirements.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dynamic updates of application logic&lt;/strong&gt; allow Flink jobs to change at runtime, without downtime from stopping and resubmitting the code.&lt;br /&gt;
&lt;br /&gt;
&lt;strong&gt;Dynamic data partitioning&lt;/strong&gt; provides the ability to change how events are distributed and grouped by Flink at runtime. Such functionality often becomes a natural requirement when building jobs with dynamically reconfigurable application logic.&lt;br /&gt;
&lt;br /&gt;
&lt;strong&gt;Custom window management&lt;/strong&gt; demonstrates how you can utilize the low level &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/process_function.html&quot;&gt;process function API&lt;/a&gt;, when the native &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/windows.html&quot;&gt;window API&lt;/a&gt; is not exactly matching your requirements. Specifically, you will learn how to implement low latency alerting on windows and how to limit state growth with timers.&lt;/p&gt;
&lt;p&gt;These patterns build on top of core Flink functionality, however, they might not be immediately apparent from the framework’s documentation as explaining and presenting the motivation behind them is not always trivial without a concrete use case. That is why we will showcase these patterns with a practical example that offers a real-world usage scenario for Apache Flink — a &lt;em&gt;Fraud Detection&lt;/em&gt; engine.
We hope that this series will place these powerful approaches into your tool belt and enable you to take on new and exciting tasks.&lt;/p&gt;
&lt;p&gt;In the first blog post of the series we will look at the high-level architecture of the demo application, describe its components and their interactions. We will then deep dive into the implementation details of the first pattern in the series - &lt;strong&gt;dynamic data partitioning&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;You will be able to run the full Fraud Detection Demo application locally and look into the details of the implementation by using the accompanying GitHub repository.&lt;/p&gt;
&lt;h3 id=&quot;fraud-detection-demo&quot;&gt;Fraud Detection Demo&lt;/h3&gt;
&lt;p&gt;The full source code for our fraud detection demo is open source and available online. To run it locally, check out the following repository and follow the steps in the README:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/afedulov/fraud-detection-demo&quot;&gt;https://github.com/afedulov/fraud-detection-demo&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;You will see that the demo is a self-contained application - it requires only &lt;code&gt;docker&lt;/code&gt; and &lt;code&gt;docker-compose&lt;/code&gt; to be built from sources and includes the following components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Apache Kafka (message broker) with ZooKeeper&lt;/li&gt;
&lt;li&gt;Apache Flink (&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/concepts/glossary.html#flink-application-cluster&quot;&gt;application cluster&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Fraud Detection Web App&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The high-level goal of the Fraud Detection engine is to consume a stream of financial transactions and evaluate them against a set of rules. These rules are subject to frequent changes and tweaks. In a real production system, it is important to be able to add and remove them at runtime, without incurring an expensive penalty of stopping and restarting the job.&lt;/p&gt;
&lt;p&gt;When you navigate to the demo URL in your browser, you will be presented with the following UI:&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-11-19-demo-fraud-detection/ui.png&quot; width=&quot;800px&quot; alt=&quot;Figure 1: Demo UI&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 1: Fraud Detection Demo UI&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;On the left side, you can see a visual representation of financial transactions flowing through the system after you click the “Start” button. The slider at the top allows you to control the number of generated transactions per second. The middle section is devoted to managing the rules evaluated by Flink. From here, you can create new rules as well as issue control commands, such as clearing Flink’s state.&lt;/p&gt;
&lt;p&gt;The demo comes out-of-the-box with a set of predefined sample rules. You can click the &lt;em&gt;Start&lt;/em&gt; button and, after some time, you will observe alerts displayed in the right section of the UI. These alerts are the result of Flink evaluating the generated transaction stream against the predefined rules.&lt;/p&gt;
&lt;p&gt;Our sample fraud detection system consists of three main components:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Frontend (React)&lt;/li&gt;
&lt;li&gt;Backend (SpringBoot)&lt;/li&gt;
&lt;li&gt;Fraud Detection application (Apache Flink)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Interactions between the main elements are depicted in &lt;em&gt;Figure 2&lt;/em&gt;.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-11-19-demo-fraud-detection/architecture.png&quot; width=&quot;800px&quot; alt=&quot;Figure 2: Demo Components&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 2: Fraud Detection Demo Components&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;The Backend exposes a REST API to the Frontend for creating/deleting rules as well as issuing control commands for managing the demo execution. It then relays those Frontend actions to Flink by sending them via a “Control” Kafka topic. The Backend additionally includes a &lt;em&gt;Transactions Generator&lt;/em&gt; component, which sends an emulated stream of money transfer events to Flink via a separate “Transactions” topic. Alerts generated by Flink are consumed by the Backend from the “Alerts” topic and relayed to the UI via WebSockets.&lt;/p&gt;
&lt;p&gt;Now that you are familiar with the overall layout and the goal of our Fraud Detection engine, let’s go into the details of what is required to implement such a system.&lt;/p&gt;
&lt;h3 id=&quot;dynamic-data-partitioning&quot;&gt;Dynamic Data Partitioning&lt;/h3&gt;
&lt;p&gt;The first pattern we will look into is Dynamic Data Partitioning.&lt;/p&gt;
&lt;p&gt;If you have used Flink’s DataStream API in the past, you are undoubtedly familiar with the &lt;strong&gt;keyBy&lt;/strong&gt; method. Keying a stream shuffles all the records such that elements with the same key are assigned to the same partition. This means all records with the same key are processed by the same physical instance of the next operator.&lt;/p&gt;
&lt;p&gt;In a typical streaming application, the choice of key is fixed, determined by some static field within the elements. For instance, when building a simple window-based aggregation of a stream of transactions, we might always group by the transaction’s account id.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// [...]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;...&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;windowed&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nl&quot;&gt;Transaction:&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getAccountId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/*window specification*/&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This approach is the main building block for achieving horizontal scalability in a wide range of use cases. However, in the case of an application striving to provide flexibility in business logic at runtime, this is not enough.
To understand why this is the case, let us start by articulating a realistic sample rule definition for our fraud detection system in the form of a functional requirement:&lt;/p&gt;
&lt;p&gt;&lt;em&gt;“Whenever the &lt;strong&gt;sum&lt;/strong&gt; of the accumulated &lt;strong&gt;payment amount&lt;/strong&gt; from the same &lt;strong&gt;payer&lt;/strong&gt; to the same &lt;strong&gt;beneficiary&lt;/strong&gt; within the &lt;strong&gt;duration of a week&lt;/strong&gt; is &lt;strong&gt;greater&lt;/strong&gt; than &lt;strong&gt;1 000 000 $&lt;/strong&gt; - fire an alert.”&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;In this formulation we can spot a number of parameters that we would like to be able to specify in a newly-submitted rule and possibly even later modify or tweak at runtime:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Aggregation field (payment amount)&lt;/li&gt;
&lt;li&gt;Grouping fields (payer + beneficiary)&lt;/li&gt;
&lt;li&gt;Aggregation function (sum)&lt;/li&gt;
&lt;li&gt;Window duration (1 week)&lt;/li&gt;
&lt;li&gt;Limit (1 000 000)&lt;/li&gt;
&lt;li&gt;Limit operator (greater)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Accordingly, we will use the following simple JSON format to define the aforementioned parameters:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-json&quot;&gt;&lt;span class=&quot;p&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;quot;ruleId&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;quot;ruleState&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;ACTIVE&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;quot;groupingKeyNames&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s2&quot;&gt;&amp;quot;payerId&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;beneficiaryId&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;],&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;quot;aggregateFieldName&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;paymentAmount&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;quot;aggregatorFunctionType&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;SUM&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;quot;limitOperatorType&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;s2&quot;&gt;&amp;quot;GREATER&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;quot;limit&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;quot;windowMinutes&amp;quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;10080&lt;/span&gt;
&lt;span class=&quot;p&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
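&lt;p&gt;On the Flink side, such JSON messages can be deserialized into plain Java objects. The following is a hypothetical sketch of what such a &lt;code&gt;Rule&lt;/code&gt; class could look like; the actual class in the demo repository may differ in its details:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.math.BigDecimal;
import java.util.List;

// Hypothetical POJO mirroring the JSON rule format above. Any JSON library (e.g. Jackson)
// can be used to turn messages from the &amp;quot;Control&amp;quot; Kafka topic into Rule objects.
public class Rule {

    public enum RuleState { ACTIVE, PAUSE, DELETE }
    public enum AggregatorFunctionType { SUM, AVG, MIN, MAX }
    public enum LimitOperatorType { EQUAL, NOT_EQUAL, GREATER, LESS, GREATER_EQUAL, LESS_EQUAL }

    private int ruleId;
    private RuleState ruleState;
    private List&amp;lt;String&amp;gt; groupingKeyNames;   // e.g. [&amp;quot;payerId&amp;quot;, &amp;quot;beneficiaryId&amp;quot;]
    private String aggregateFieldName;          // e.g. &amp;quot;paymentAmount&amp;quot;
    private AggregatorFunctionType aggregatorFunctionType;
    private LimitOperatorType limitOperatorType;
    private BigDecimal limit;
    private int windowMinutes;

    public Rule() {}

    public int getRuleId() { return ruleId; }
    public List&amp;lt;String&amp;gt; getGroupingKeyNames() { return groupingKeyNames; }
    // ... remaining getters and setters omitted for brevity
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;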
&lt;p&gt;At this point, it is important to understand that &lt;strong&gt;&lt;code&gt;groupingKeyNames&lt;/code&gt;&lt;/strong&gt; determine the actual physical grouping of events - all Transactions with the same values of specified parameters (e.g. &lt;em&gt;payer #25 -&amp;gt; beneficiary #12&lt;/em&gt;) have to be aggregated in the same physical instance of the evaluating operator. Naturally, the process of distributing data in such a way in Flink’s API is realised by a &lt;code&gt;keyBy()&lt;/code&gt; function.&lt;/p&gt;
&lt;p&gt;Most examples in Flink’s &lt;code&gt;keyBy()&lt;/code&gt; &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/api_concepts.html#define-keys-using-field-expressions&quot;&gt;documentation&lt;/a&gt; use a hard-coded &lt;code&gt;KeySelector&lt;/code&gt;, which extracts specific, fixed fields from the events. However, to support the desired flexibility, we have to extract the keys in a more dynamic fashion, based on the specifications of the rules. For this, we will have to use one additional operator that prepares every event for dispatching to the correct aggregating instance.&lt;/p&gt;
&lt;p&gt;On a high level, our main processing pipeline looks like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Alert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alerts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DynamicKeyFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* some key selector */&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;cm&quot;&gt;/* actual calculations and alerting */&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We have previously established that each rule defines a &lt;strong&gt;&lt;code&gt;groupingKeyNames&lt;/code&gt;&lt;/strong&gt; parameter that specifies which combination of fields will be used for grouping the incoming events. Each rule might use an arbitrary combination of these fields. At the same time, every incoming event potentially needs to be evaluated against multiple rules. This implies that events might simultaneously need to be present at multiple parallel instances of the evaluating operators that correspond to different rules, and hence will need to be forked. Ensuring such event dispatching is the purpose of &lt;code&gt;DynamicKeyFunction()&lt;/code&gt;.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-11-19-demo-fraud-detection/shuffle_function_1.png&quot; width=&quot;800px&quot; alt=&quot;Figure 3: Forking events with Dynamic Key Function&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 3: Forking events with Dynamic Key Function&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;DynamicKeyFunction&lt;/code&gt; iterates over a set of defined rules and prepares every event to be processed by a &lt;code&gt;keyBy()&lt;/code&gt; function by extracting the required grouping keys:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;DynamicKeyFunction&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;/* Simplified */&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;List&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rules&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;cm&quot;&gt;/* Rules that are initialized somehow.&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; Details will be discussed in a future blog post. */&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Transaction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Rule&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rules&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;KeysExtractor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getGroupingKeyNames&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;event&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;rule&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRuleId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()));&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;KeysExtractor.getKey()&lt;/code&gt; uses reflection to extract the required values of the &lt;code&gt;groupingKeyNames&lt;/code&gt; fields from events and combines them into a single concatenated String key, e.g. &lt;code&gt;&quot;{payerId=25;beneficiaryId=12}&quot;&lt;/code&gt;. Flink will calculate the hash of this key and assign the processing of this particular combination to a specific server in the cluster. This allows tracking all transactions between &lt;em&gt;payer #25&lt;/em&gt; and &lt;em&gt;beneficiary #12&lt;/em&gt; and evaluating the defined rules within the desired time window.&lt;/p&gt;
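&lt;p&gt;To make this more tangible, here is a minimal sketch of what such a reflection-based extractor could look like. It is not the demo’s actual &lt;code&gt;KeysExtractor&lt;/code&gt; implementation; the exact method signature and the assumption that the grouping key names arrive as a list of field names are illustrative only:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.lang.reflect.Field;
import java.util.List;

public class KeysExtractorSketch {

    /* Sketch only. Builds a key such as &quot;{payerId=25;beneficiaryId=12}&quot; by
       reading the listed fields from the event object via reflection. */
    public static String getKey(List&amp;lt;String&amp;gt; keyNames, Object event) throws Exception {
        StringBuilder sb = new StringBuilder(&quot;{&quot;);
        for (int i = 0; i &amp;lt; keyNames.size(); i++) {
            Field field = event.getClass().getDeclaredField(keyNames.get(i));
            field.setAccessible(true);
            sb.append(keyNames.get(i)).append(&quot;=&quot;).append(field.get(event));
            if (i &amp;lt; keyNames.size() - 1) {
                sb.append(&quot;;&quot;);
            }
        }
        return sb.append(&quot;}&quot;).toString();
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A production implementation would additionally need error handling for missing fields and a stable field ordering, but the resulting key format stays the same.&lt;/p&gt;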
&lt;p&gt;Notice that a wrapper class &lt;code&gt;Keyed&lt;/code&gt; with the following signature was introduced as the output type of &lt;code&gt;DynamicKeyFunction&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;IN&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;IN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wrapped&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KEY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ID&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KEY&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(){&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Fields of this POJO carry the following information: &lt;code&gt;wrapped&lt;/code&gt; is the original transaction event, &lt;code&gt;key&lt;/code&gt; is the result of using &lt;code&gt;KeysExtractor&lt;/code&gt; and &lt;code&gt;id&lt;/code&gt; is the ID of the Rule that caused the dispatch of the event (according to the rule-specific grouping logic).&lt;/p&gt;
&lt;p&gt;Events of this type will be the input to the &lt;code&gt;keyBy()&lt;/code&gt; function in the main processing pipeline and allow the use of a simple lambda-expression as a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/api_concepts.html#define-keys-using-key-selector-functions&quot;&gt;&lt;code&gt;KeySelector&lt;/code&gt;&lt;/a&gt; for the final step of implementing dynamic data shuffle.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Alert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alerts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;transactions&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DynamicKeyFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;keyed&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;DynamicAlertFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;By applying &lt;code&gt;DynamicKeyFunction&lt;/code&gt;, we implicitly copy events in order to perform parallel per-rule evaluation within a Flink cluster. By doing so, we achieve an important property - horizontal scalability of rules’ processing. Our system will be capable of handling more rules by adding more servers to the cluster, i.e. by increasing the parallelism. This property comes at the cost of data duplication, which might become an issue depending on parameters such as the incoming data rate, available network bandwidth, event payload size, etc. In a real-life scenario, additional optimizations can be applied, such as combined evaluation of rules that have the same &lt;code&gt;groupingKeyNames&lt;/code&gt;, or a filtering layer that strips events of all fields that are not required for processing a particular rule.&lt;/p&gt;
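&lt;p&gt;As a rough illustration of the first optimization, the dispatch loop could group the rules once by their key definition and emit a single record per distinct grouping key instead of one record per rule. The helper below is a hypothetical sketch (assuming &lt;code&gt;getGroupingKeyNames()&lt;/code&gt; returns a &lt;code&gt;List&amp;lt;String&amp;gt;&lt;/code&gt;), not part of the demo code:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RuleGroupingSketch {

    /* Hypothetical helper: groups rules that share the same groupingKeyNames,
       so that DynamicKeyFunction could emit one record per distinct key
       definition and evaluate the whole group of rules downstream. */
    public static Map&amp;lt;List&amp;lt;String&amp;gt;, List&amp;lt;Rule&amp;gt;&amp;gt; groupByKeyDefinition(List&amp;lt;Rule&amp;gt; rules) {
        return rules.stream()
                .collect(Collectors.groupingBy(Rule::getGroupingKeyNames));
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With such a grouping in place, each event would be shuffled at most once per distinct key definition rather than once per rule, reducing the duplication described above.&lt;/p&gt;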
&lt;h3 id=&quot;summary&quot;&gt;Summary:&lt;/h3&gt;
&lt;p&gt;In this blog post, we have discussed the motivation behind supporting dynamic, runtime changes to a Flink application by looking at a sample use case - a Fraud Detection engine. We have described the overall architecture and interactions between its components as well as provided references for building and running a demo Fraud Detection application in a dockerized setup. We then showed the details of implementing a &lt;strong&gt;dynamic data partitioning pattern&lt;/strong&gt; as the first underlying building block to enable flexible runtime configurations.&lt;/p&gt;
&lt;p&gt;To remain focused on describing the core mechanics of the pattern, we kept the complexity of the DSL and the underlying rules engine to a minimum. Going forward, it is easy to imagine adding extensions such as allowing more sophisticated rule definitions, including filtering of certain events, logical rules chaining, and other more advanced functionality.&lt;/p&gt;
&lt;p&gt;In the second part of this series, we will describe how the rules make their way into the running Fraud Detection engine. Additionally, we will go over the implementation details of the main processing function of the pipeline - &lt;em&gt;DynamicAlertFunction()&lt;/em&gt;.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-11-19-demo-fraud-detection/end-to-end.png&quot; width=&quot;800px&quot; alt=&quot;Figure 4: End-to-end pipeline&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Figure 4: End-to-end pipeline&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;In the &lt;a href=&quot;/news/2020/03/24/demo-fraud-detection-2.html&quot;&gt;next article&lt;/a&gt;, we will see how Flink’s broadcast streams can be utilized to help steer the processing within the Fraud Detection engine at runtime (Dynamic Application Updates pattern).&lt;/p&gt;
</description>
<pubDate>Wed, 15 Jan 2020 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html</link>
<guid isPermaLink="true">/news/2020/01/15/demo-fraud-detection.html</guid>
</item>
<item>
<title>Apache Flink 1.8.3 Released</title>
<description>&lt;p&gt;The Apache Flink community released the third bugfix version of the Apache Flink 1.8 series.&lt;/p&gt;
&lt;p&gt;This release includes 45 fixes and minor improvements for Flink 1.8.2. The list below provides a detailed overview of all fixes and improvements.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.8.3.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13723&quot;&gt;FLINK-13723&lt;/a&gt;] - Use liquid-c for faster doc generation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13724&quot;&gt;FLINK-13724&lt;/a&gt;] - Remove unnecessary whitespace from the docs&amp;#39; sidenav
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13725&quot;&gt;FLINK-13725&lt;/a&gt;] - Use sassc for faster doc generation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13726&quot;&gt;FLINK-13726&lt;/a&gt;] - Build docs with jekyll 4.0.0.pre.beta1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13791&quot;&gt;FLINK-13791&lt;/a&gt;] - Speed up sidenav by using group_by
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12342&quot;&gt;FLINK-12342&lt;/a&gt;] - Yarn Resource Manager Acquires Too Many Containers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13184&quot;&gt;FLINK-13184&lt;/a&gt;] - Starting a TaskExecutor blocks the YarnResourceManager&amp;#39;s main thread
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13728&quot;&gt;FLINK-13728&lt;/a&gt;] - Fix wrong closing tag order in sidenav
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13746&quot;&gt;FLINK-13746&lt;/a&gt;] - Elasticsearch (v2.3.5) sink end-to-end test fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13749&quot;&gt;FLINK-13749&lt;/a&gt;] - Make Flink client respect classloading policy
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13892&quot;&gt;FLINK-13892&lt;/a&gt;] - HistoryServerTest failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13936&quot;&gt;FLINK-13936&lt;/a&gt;] - NOTICE-binary is outdated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13966&quot;&gt;FLINK-13966&lt;/a&gt;] - Jar sorting in collect_license_files.sh is locale dependent
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13995&quot;&gt;FLINK-13995&lt;/a&gt;] - Fix shading of the licence information of netty
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13999&quot;&gt;FLINK-13999&lt;/a&gt;] - Correct the documentation of MATCH_RECOGNIZE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14009&quot;&gt;FLINK-14009&lt;/a&gt;] - Cron jobs broken due to verifying incorrect NOTICE-binary file
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14010&quot;&gt;FLINK-14010&lt;/a&gt;] - Dispatcher &amp;amp; JobManagers don&amp;#39;t give up leadership when AM is shut down
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14043&quot;&gt;FLINK-14043&lt;/a&gt;] - SavepointMigrationTestBase is super slow
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14107&quot;&gt;FLINK-14107&lt;/a&gt;] - Kinesis consumer record emitter deadlock under event time alignment
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14175&quot;&gt;FLINK-14175&lt;/a&gt;] - Upgrade KPL version in flink-connector-kinesis to fix application OOM
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14235&quot;&gt;FLINK-14235&lt;/a&gt;] - Kafka010ProducerITCase&amp;gt;KafkaProducerTestBase.testOneToOneAtLeastOnceCustomOperator fails on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14315&quot;&gt;FLINK-14315&lt;/a&gt;] - NPE with JobMaster.disconnectTaskManager
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14337&quot;&gt;FLINK-14337&lt;/a&gt;] - HistoryServerTest.testHistoryServerIntegration failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14347&quot;&gt;FLINK-14347&lt;/a&gt;] - YARNSessionFIFOITCase.checkForProhibitedLogContents found a log with prohibited string
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14370&quot;&gt;FLINK-14370&lt;/a&gt;] - KafkaProducerAtLeastOnceITCase&amp;gt;KafkaProducerTestBase.testOneToOneAtLeastOnceRegularSink fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14398&quot;&gt;FLINK-14398&lt;/a&gt;] - Further split input unboxing code into separate methods
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14413&quot;&gt;FLINK-14413&lt;/a&gt;] - shade-plugin ApacheNoticeResourceTransformer uses platform-dependent encoding
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14434&quot;&gt;FLINK-14434&lt;/a&gt;] - Dispatcher#createJobManagerRunner should not start JobManagerRunner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14562&quot;&gt;FLINK-14562&lt;/a&gt;] - RMQSource leaves idle consumer after closing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14589&quot;&gt;FLINK-14589&lt;/a&gt;] - Redundant slot requests with the same AllocationID leads to inconsistent slot table
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-15036&quot;&gt;FLINK-15036&lt;/a&gt;] - Container startup error will be handled out side of the YarnResourceManager&amp;#39;s main thread
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12848&quot;&gt;FLINK-12848&lt;/a&gt;] - Method equals() in RowTypeInfo should consider fieldsNames
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13729&quot;&gt;FLINK-13729&lt;/a&gt;] - Update website generation dependencies
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13965&quot;&gt;FLINK-13965&lt;/a&gt;] - Keep hasDeprecatedKeys and deprecatedKeys methods in ConfigOption and mark it with @Deprecated annotation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13967&quot;&gt;FLINK-13967&lt;/a&gt;] - Generate full binary licensing via collect_license_files.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13968&quot;&gt;FLINK-13968&lt;/a&gt;] - Add travis check for the correctness of the binary licensing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13991&quot;&gt;FLINK-13991&lt;/a&gt;] - Add git exclusion for 1.9+ features to 1.8
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14008&quot;&gt;FLINK-14008&lt;/a&gt;] - Auto-generate binary licensing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14104&quot;&gt;FLINK-14104&lt;/a&gt;] - Bump Jackson to 2.10.1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14123&quot;&gt;FLINK-14123&lt;/a&gt;] - Lower the default value of taskmanager.memory.fraction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14215&quot;&gt;FLINK-14215&lt;/a&gt;] - Add Docs for TM and JM Environment Variable Setting
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14334&quot;&gt;FLINK-14334&lt;/a&gt;] - ElasticSearch docs refer to non-existent ExceptionUtils.containsThrowable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14639&quot;&gt;FLINK-14639&lt;/a&gt;] - Fix the document of Metrics that has an error for `User Scope`
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14646&quot;&gt;FLINK-14646&lt;/a&gt;] - Check non-null for key in KeyGroupStreamPartitioner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14995&quot;&gt;FLINK-14995&lt;/a&gt;] - Kinesis NOTICE is incorrect
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 11 Dec 2019 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/12/11/release-1.8.3.html</link>
<guid isPermaLink="true">/news/2019/12/11/release-1.8.3.html</guid>
</item>
<item>
<title>Running Apache Flink on Kubernetes with KUDO</title>
<description>&lt;p&gt;A common use case for Apache Flink is streaming data analytics together with Apache Kafka, which provides a pub/sub model and durability for data streams. To achieve elastic scalability, both are typically deployed in clustered environments, and increasingly on top of container orchestration platforms like Kubernetes. The &lt;a href=&quot;https://kubernetes.io/docs/concepts/extend-kubernetes/operator/&quot;&gt;Operator pattern&lt;/a&gt; provides an extension mechanism to Kubernetes that captures human operator knowledge about an application, like Flink, in software to automate its operation. &lt;a href=&quot;https://kudo.dev&quot;&gt;KUDO&lt;/a&gt; is an open source toolkit for building Operators using declarative YAML specs, with a focus on ease of use for cluster admins and developers.&lt;/p&gt;
&lt;p&gt;In this blog post we demonstrate how to orchestrate a streaming data analytics application based on Flink and Kafka with KUDO. It consists of a Flink job that checks financial transactions for fraud, and two microservices that generate and display the transactions. You can find more details about this demo in the &lt;a href=&quot;https://github.com/kudobuilder/operators/tree/master/repository/flink/docs/demo/financial-fraud&quot;&gt;KUDO Operators repository&lt;/a&gt;, including instructions for installing the dependencies.&lt;/p&gt;
&lt;p style=&quot;display: block; text-align: center; margin-top: 20px; margin-bottom: 20px&quot;&gt;
&lt;img src=&quot;/img/blog/2019-11-06-flink-kubernetes-kudo/flink-kudo-architecture.png&quot; width=&quot;600px&quot; alt=&quot;Application: My App&quot; /&gt;
&lt;/p&gt;
&lt;h2 id=&quot;prerequisites&quot;&gt;Prerequisites&lt;/h2&gt;
&lt;p&gt;You can run this demo on your local machine using &lt;a href=&quot;https://github.com/kubernetes/minikube&quot;&gt;minikube&lt;/a&gt;. The instructions below were tested with minikube v1.5.1 and Kubernetes v1.16.2 but should work on any Kubernetes version above v1.15.0. First, start a minikube cluster with enough capacity:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;minikube start --cpus=6 --memory=9216 --disk-size=10g&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;If you’re using a different way to provision Kubernetes, make sure you have at least 6 CPU Cores, 9 GB of RAM and 10 GB of disk space available.&lt;/p&gt;
&lt;p&gt;Install the &lt;code&gt;kubectl&lt;/code&gt; CLI tool. The KUDO CLI is a plugin for the Kubernetes CLI. The official instructions for installing and setting up kubectl are &lt;a href=&quot;https://kubernetes.io/docs/tasks/tools/install-kubectl/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Next, let’s install the KUDO CLI. At the time of this writing, the latest KUDO version is v0.10.0. You can find the CLI binaries for download &lt;a href=&quot;https://github.com/kudobuilder/kudo/releases&quot;&gt;here&lt;/a&gt;. Download the &lt;code&gt;kubectl-kudo&lt;/code&gt; binary for your OS and architecture.&lt;/p&gt;
&lt;p&gt;If you’re using Homebrew on macOS, you can install the CLI via:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ brew tap kudobuilder/tap
$ brew install kudo-cli
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now, let’s initialize KUDO on our Kubernetes cluster:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl kudo init
$KUDO_HOME has been configured at /Users/gerred/.kudo
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will create several resources. First, it will create the &lt;a href=&quot;https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/&quot;&gt;Custom Resource Definitions&lt;/a&gt;, &lt;a href=&quot;https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/&quot;&gt;service account&lt;/a&gt;, and &lt;a href=&quot;https://kubernetes.io/docs/reference/access-authn-authz/rbac/&quot;&gt;role bindings&lt;/a&gt; necessary for KUDO to operate. It will also create an instance of the &lt;a href=&quot;https://kudo.dev/docs/architecture.html#components&quot;&gt;KUDO controller&lt;/a&gt; so that we can begin creating instances of applications.&lt;/p&gt;
&lt;p&gt;The KUDO CLI leverages the kubectl plugin system, which gives you all its functionality under &lt;code&gt;kubectl kudo&lt;/code&gt;. This is a convenient way to install and deal with your KUDO Operators. For our demo, we use Kafka and Flink which depend on ZooKeeper. To make the ZooKeeper Operator available on the cluster, run:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl kudo install zookeeper --version=0.3.0 --skip-instance
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;--skip-instance&lt;/code&gt; flag skips the creation of a ZooKeeper instance. The flink-demo Operator that we’re going to install below will create it as a dependency instead. Now let’s make the Kafka and Flink Operators available the same way:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl kudo install kafka --version=1.2.0 --skip-instance
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl kudo install flink --version=0.2.1 --skip-instance
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This installs all the Operator versions needed for our demo.&lt;/p&gt;
&lt;h2 id=&quot;financial-fraud-demo&quot;&gt;Financial Fraud Demo&lt;/h2&gt;
&lt;p&gt;In our financial fraud demo, we have two microservices, called “generator” and “actor”. The generator produces transactions with random amounts and writes them into a Kafka topic. Occasionally, the value will be over 10,000, which is considered fraud for the purpose of this demo. The Flink job subscribes to the Kafka topic and detects fraudulent transactions. When it does, it submits them to another Kafka topic, which the actor consumes. The actor simply displays each fraudulent transaction.&lt;/p&gt;
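&lt;p&gt;Conceptually, the detection boils down to checking transaction amounts against that 10,000 threshold. The snippet below is only a rough sketch of this idea in Flink’s DataStream API, not the actual FinancialFraudJob shipped with the demo; the &lt;code&gt;Transaction&lt;/code&gt; type and its &lt;code&gt;getAmount()&lt;/code&gt; accessor are assumptions for illustration, and the log output further below suggests the real job additionally aggregates amounts per time window:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Rough sketch only, assuming a Transaction POJO with a getAmount() accessor.
// The actual demo job appears to aggregate transactions per window before applying the check.
DataStream&amp;lt;Transaction&amp;gt; suspicious = transactions
        .filter(tx -&amp;gt; tx.getAmount() &amp;gt; 10_000);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;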
&lt;p&gt;The KUDO CLI by default installs Operators from the &lt;a href=&quot;https://github.com/kudobuilder/operators/&quot;&gt;official repository&lt;/a&gt;, but it also supports installation from your local filesystem. This is useful if you want to develop your own Operator, or modify this demo for your own purposes.&lt;/p&gt;
&lt;p&gt;First, clone the “kudobuilder/operators” repository via:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ git clone https://github.com/kudobuilder/operators.git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, change into the “operators” directory and install the demo-operator from your local filesystem:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ cd operators
$ kubectl kudo install repository/flink/docs/demo/financial-fraud/demo-operator --instance flink-demo
instance.kudo.dev/v1beta1/flink-demo created
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This time we didn’t include the &lt;code&gt;--skip-instance&lt;/code&gt; flag, so KUDO will actually deploy all the components, including Flink, Kafka, and ZooKeeper. KUDO orchestrates deployments and other lifecycle operations using &lt;a href=&quot;https://kudo.dev/docs/concepts.html#plan&quot;&gt;plans&lt;/a&gt; that were defined by the Operator developer. Plans are similar to &lt;a href=&quot;https://en.wikipedia.org/wiki/Runbook&quot;&gt;runbooks&lt;/a&gt; and encapsulate all the procedures required to operate the software. We can track the status of the deployment using this KUDO command:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl kudo plan status --instance flink-demo
Plan(s) for &quot;flink-demo&quot; in namespace &quot;default&quot;:
.
└── flink-demo (Operator-Version: &quot;flink-demo-0.1.4&quot; Active-Plan: &quot;deploy&quot;)
└── Plan deploy (serial strategy) [IN_PROGRESS]
├── Phase dependencies [IN_PROGRESS]
│ ├── Step zookeeper (COMPLETE)
│ └── Step kafka (IN_PROGRESS)
├── Phase flink-cluster [PENDING]
│ └── Step flink (PENDING)
├── Phase demo [PENDING]
│ ├── Step gen (PENDING)
│ └── Step act (PENDING)
└── Phase flink-job [PENDING]
└── Step submit (PENDING)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The output shows that the “deploy” plan is in progress and that it consists of 4 phases: “dependencies”, “flink-cluster”, “demo” and “flink-job”. The “dependencies” phase includes steps for “zookeeper” and “kafka”. This is where both dependencies get installed, before KUDO continues to install the Flink cluster and the demo itself. We also see that ZooKeeper installation completed, and that Kafka installation is currently in progress. We can view details about Kafka’s deployment plan via:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl kudo plan status --instance flink-demo-kafka
Plan(s) for &quot;flink-demo-kafka&quot; in namespace &quot;default&quot;:
.
└── flink-demo-kafka (Operator-Version: &quot;kafka-1.2.0&quot; Active-Plan: &quot;deploy&quot;)
├── Plan deploy (serial strategy) [IN_PROGRESS]
│ └── Phase deploy-kafka [IN_PROGRESS]
│ └── Step deploy (IN_PROGRESS)
└── Plan not-allowed (serial strategy) [NOT ACTIVE]
└── Phase not-allowed (serial strategy) [NOT ACTIVE]
└── Step not-allowed (serial strategy) [NOT ACTIVE]
└── not-allowed [NOT ACTIVE]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After Kafka was successfully installed, the next phase “flink-cluster” will start and bring up, you guessed it, your flink-cluster. After this is done, the demo phase creates the generator and actor pods that generate and display transactions for this demo. Lastly, we have the flink-job phase in which we submit the actual FinancialFraudJob to the Flink cluster. Once the Flink job is submitted, we will be able to see fraud logs in our actor pod shortly after.&lt;/p&gt;
&lt;p&gt;After a while, the state of all plans, phases and steps will change to “COMPLETE”. Now we can view the Flink dashboard to verify that our job is running. To access it from outside the Kubernetes cluster, first start the client proxy, then open the URL below in your browser:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl proxy
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;a href=&quot;http://127.0.0.1:8001/api/v1/namespaces/default/services/flink-demo-flink-jobmanager:ui/proxy/#/overview&quot;&gt;http://127.0.0.1:8001/api/v1/namespaces/default/services/flink-demo-flink-jobmanager:ui/proxy/#/overview&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;It should look similar to this, depending on your local machine and how many cores you have available:&lt;/p&gt;
&lt;p style=&quot;display: block; text-align: center; margin-top: 20px; margin-bottom: 20px&quot;&gt;
&lt;img src=&quot;/img/blog/2019-11-06-flink-kubernetes-kudo/flink-dashboard-ui.png&quot; width=&quot;600px&quot; alt=&quot;Application: My App&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;The job is up and running, and we should now be able to see fraudulent transactions in the logs of the actor pod:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;$ kubectl logs $(kubectl get pod -l actor=flink-demo -o jsonpath=&quot;{.items[0].metadata.name}&quot;)
Broker: flink-demo-kafka-kafka-0.flink-demo-kafka-svc:9093
Topic: fraud
Detected Fraud: TransactionAggregate {startTimestamp=0, endTimestamp=1563395831000, totalAmount=19895:
Transaction{timestamp=1563395778000, origin=1, target=&#39;3&#39;, amount=8341}
Transaction{timestamp=1563395813000, origin=1, target=&#39;3&#39;, amount=8592}
Transaction{timestamp=1563395817000, origin=1, target=&#39;3&#39;, amount=2802}
Transaction{timestamp=1563395831000, origin=1, target=&#39;3&#39;, amount=160}}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If you add the “-f” flag to the previous command, you can follow along as more transactions are streaming in and are evaluated by our Flink job.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this blog post we demonstrated how to easily deploy an end-to-end streaming data application on Kubernetes using KUDO. We deployed a Flink job and two microservices, as well as all the required infrastructure - Flink, Kafka, and ZooKeeper using just a few kubectl commands. To find out more about KUDO, visit the &lt;a href=&quot;https://kudo.dev&quot;&gt;project website&lt;/a&gt; or join the community on &lt;a href=&quot;https://kubernetes.slack.com/messages/kudo/&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;
</description>
<pubDate>Mon, 09 Dec 2019 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/12/09/flink-kubernetes-kudo.html</link>
<guid isPermaLink="true">/news/2019/12/09/flink-kubernetes-kudo.html</guid>
</item>
<item>
<title>How to query Pulsar Streams using Apache Flink</title>
<description>&lt;p&gt;In a previous &lt;a href=&quot;https://flink.apache.org/2019/05/03/pulsar-flink.html&quot;&gt;story&lt;/a&gt; on the Flink blog, we explained the different ways that &lt;a href=&quot;https://flink.apache.org/&quot;&gt;Apache Flink&lt;/a&gt; and &lt;a href=&quot;https://pulsar.apache.org/&quot;&gt;Apache Pulsar&lt;/a&gt; can integrate to provide elastic data processing at large scale. This blog post discusses the new developments and integrations between the two frameworks and showcases how you can leverage Pulsar’s built-in schema to query Pulsar streams in real time using Apache Flink.&lt;/p&gt;
&lt;h1 id=&quot;a-short-intro-to-apache-pulsar&quot;&gt;A short intro to Apache Pulsar&lt;/h1&gt;
&lt;p&gt;Apache Pulsar is a flexible pub/sub messaging system, backed by durable log storage. Some of the framework’s highlights include multi-tenancy, a unified message model, structured event streams and a cloud-native architecture that make it a perfect fit for a wide set of use cases, ranging from billing, payments and trading services all the way to the unification of the different messaging architectures in an organization. If you are interested in finding out more about Pulsar, you can visit the &lt;a href=&quot;https://pulsar.apache.org/docs/en/standalone/&quot;&gt;Apache Pulsar documentation&lt;/a&gt; or get in touch with the Pulsar community on &lt;a href=&quot;https://apache-pulsar.herokuapp.com&quot;&gt;Slack&lt;/a&gt;.&lt;/p&gt;
&lt;h1 id=&quot;existing-pulsar--flink-integration-apache-flink-16&quot;&gt;Existing Pulsar &amp;amp; Flink integration (Apache Flink 1.6+)&lt;/h1&gt;
&lt;p&gt;The existing integration between Pulsar and Flink exploits Pulsar as a message queue in a Flink application. Flink developers can utilize Pulsar as a streaming source and streaming sink for their Flink applications by selecting a specific Pulsar source and connecting to their desired Pulsar cluster and topic:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// create and configure Pulsar consumer&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;PulsarSourceBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PulsarSourceBuilder&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SimpleStringSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;topic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inputTopic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subscriptionName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;subscription&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SourceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;src&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// ingest DataStream with Pulsar consumer&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;src&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Pulsar streams can then get connected to the Flink processing logic…&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// perform computation on DataStream (here a simple WordCount)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FlatMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;})&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;returns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;word&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;timeWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;reduce&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ReduceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;…and then get emitted back to Pulsar (used now as a sink), sending one’s computation results downstream, back to a Pulsar topic:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// emit result via Pulsar producer &lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlinkPulsarProducer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;outputTopic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;AuthenticationDisabled&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;toString&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBytes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;UTF_8&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Although this is a great first integration step, the existing design is not leveraging the full power of Pulsar. Some shortcomings of the integration with Flink 1.6.0 relate to Pulsar neither being utilized as durable storage nor having schema integration with Flink, resulting in manual input when describing an application’s schema registry.&lt;/p&gt;
&lt;h1 id=&quot;pulsars-integration-with-flink-19-using-pulsar-as-a-flink-catalog&quot;&gt;Pulsar’s integration with Flink 1.9: Using Pulsar as a Flink catalog&lt;/h1&gt;
&lt;p&gt;The latest integration between &lt;a href=&quot;https://flink.apache.org/downloads.html#apache-flink-191&quot;&gt;Flink 1.9.0&lt;/a&gt; and Pulsar addresses most of the previously mentioned shortcomings. The &lt;a href=&quot;https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;contribution of Alibaba’s Blink to the Flink repository&lt;/a&gt; adds many enhancements and new features to the processing framework that make the integration with Pulsar significantly more powerful and impactful. Flink 1.9.0 brings Pulsar schema integration into the picture, makes the Table API a first-class citizen and provides an exactly-once streaming source and at-least-once streaming sink with Pulsar. Lastly, with schema integration, Pulsar can now be registered as a Flink catalog, making running Flink queries on top of Pulsar streams a matter of a few commands. In the following sections, we will take a closer look at the new integrations and provide examples of how to query Pulsar streams using Flink SQL.&lt;/p&gt;
&lt;h1 id=&quot;leveraging-the-flink--pulsar-schema-integration&quot;&gt;Leveraging the Flink &amp;lt;&amp;gt; Pulsar Schema Integration&lt;/h1&gt;
&lt;p&gt;Before delving into the integration details and how you can use Pulsar schema with Flink, let us describe how schema in Pulsar works. Schema in Apache Pulsar already lives alongside the data and serves as its representation on the broker side of the framework, something that makes a separate schema registry in an external system obsolete. Additionally, the data schema in Pulsar is associated with each topic, so both producers and consumers send data with predefined schema information, while the broker performs schema validation and manages schema multi-versioning and evolution in compatibility checks.&lt;/p&gt;
&lt;p&gt;Below you can find an example of Pulsar’s schema on both the producer and consumer side. On the producer side, you can specify which schema you want to use and Pulsar then sends a POJO class without the need to perform any serialization/deserialization. Similarly, on the consumer end, you can also specify the data schema and upon receiving the data, Pulsar will automatically validate the schema information, fetch the schema of the given version and then deserialize the data back to a POJO structure. Pulsar stores the schema information in the metadata of a Pulsar topic.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// Create producer with Struct schema and send messages&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Producer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;User&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;producer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newProducer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;AVRO&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;User&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;producer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newMessage&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;User&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;userName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;pulsar-user&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;userId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;send&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// Create consumer with Struct schema and receive messages&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Consumer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;User&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;consumer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newConsumer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;AVRO&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;User&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;consumer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;receive&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Let’s assume we have an application that specifies a schema to the producer and/or consumer. Upon receiving the schema information, the producer (or consumer) — that is connected to the broker — will transfer such information so that the broker can then perform schema registration, validations and schema compatibility checks before returning or rejecting the schema as illustrated in the diagram below:&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/flink-pulsar-sql-blog-post-visual.png&quot; width=&quot;600px&quot; alt=&quot;Pulsar Schema&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Not only is Pulsar able to handle and store the schema information, but is additionally able to handle any schema evolution — where necessary. Pulsar will effectively manage any schema evolution in the broker, keeping track of all different versions of your schema while performing any necessary compatibility checks.&lt;/p&gt;
&lt;p&gt;Moreover, when messages are published on the producer side, Pulsar will tag each message with the schema version as part of each message’s metadata. On the consumer side, when the message is received and the metadata is deserialized, Pulsar will check the schema version associated with this message and will fetch the corresponding schema information from the broker. As a result, when Pulsar integrates with a Flink application it uses the pre-existing schema information and maps individual messages with schema information to a different row in Flink’s type system.&lt;/p&gt;
&lt;p&gt;For the cases when Flink users do not interact with schema directly or make use of primitive schema (for example, using a topic to store a string or long number), Pulsar will either convert the message payload into a Flink row, called ‘value’ or — for the cases of structured schema types, like JSON and AVRO — Pulsar will extract the individual fields from the schema information and will map the fields to Flink’s type system. Finally, all metadata information associated with each message, such as the message key, topic, publish time, or event time will be converted into metadata fields in a Flink row. Below we provide two examples of primitive schema and structured schema types and how these will be transformed from a Pulsar topic to Flink’s type system.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/flink-pulsar-sql-blog-post-visual-primitive-avro-schema.png&quot; width=&quot;600px&quot; alt=&quot;Primitive and AVRO Schema&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Once all the schema information is mapped to Flink’s type system, you can start building a Pulsar source, sink or catalog in Flink based on the specified schema information as illustrated below:&lt;/p&gt;
&lt;h1 id=&quot;flink--pulsar-read-data-from-pulsar&quot;&gt;Flink &amp;amp; Pulsar: Read data from Pulsar&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;Create a Pulsar source for streaming queries&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;props&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Properties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;service.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;pulsar://...&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;admin.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;http://...&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;partitionDiscoveryIntervalMillis&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;5000&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;startingOffsets&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;earliest&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;test-source-topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;source&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;FlinkPulsarSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// you don&amp;#39;t need to provide a type information to addSource since FlinkPulsarSource is ResultTypeQueryable&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;dataStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;source&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// chain operations on dataStream of Row and sink the output&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// end method chaining&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Register topics in Pulsar as streaming tables&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Properties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;service.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;admin.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adminUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;flushOnCheckpoint&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;failOnWrite&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;test-sink-topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Pulsar&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;properties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;inAppendMode&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerTableSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;sink-table&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;INSERT INTO sink-table .....&amp;quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sqlUpdate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h1 id=&quot;flink--pulsar-write-data-to-pulsar&quot;&gt;Flink &amp;amp; Pulsar: Write data to Pulsar&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;Create a Pulsar sink for streaming queries&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;.....&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Properties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;service.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;admin.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adminUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;flushOnCheckpoint&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;failOnWrite&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;test-sink-topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;stream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;FlinkPulsarSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;DummyTopicKeyExtractor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Write a streaming table to Pulsar&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Properties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;service.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;admin.url&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;adminUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;flushOnCheckpoint&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;failOnWrite&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;test-sink-topic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;Pulsar&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;properties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prop&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;inAppendMode&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerTableSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;sink-table&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;INSERT INTO sink-table .....&amp;quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sqlUpdate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In every instance, Flink developers only need to specify the properties that describe how Flink connects to a Pulsar cluster, without worrying about any schema registry or serialization/deserialization actions, and can register the Pulsar cluster as a source, sink or streaming table in Flink. Once all three elements are in place, Pulsar can be registered as a catalog in Flink, which drastically simplifies how you process and query data, for example by writing a program that queries data from Pulsar or by using the Table API and SQL to query Pulsar data streams.&lt;/p&gt;
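&lt;p&gt;To make the last point concrete, the following is a minimal sketch (in Java) of running a continuous SQL query over a Pulsar-backed table, assuming the table has already been registered through the connector as in the snippets above. The table name &lt;code&gt;source-table&lt;/code&gt; and the field &lt;code&gt;word&lt;/code&gt; are hypothetical names used only for illustration.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

// ... register &amp;quot;source-table&amp;quot; against a Pulsar topic via the connector, as shown above ...

// Continuous SQL query over the Pulsar stream; &amp;quot;word&amp;quot; is a hypothetical field.
Table result = tEnv.sqlQuery(
    &amp;quot;SELECT word, COUNT(*) AS cnt FROM `source-table` GROUP BY word&amp;quot;);

// The grouped aggregation produces an updating result, so convert it to a retract stream.
tEnv.toRetractStream(result, Row.class).print();

env.execute();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The same query could equally be expressed with the Table API instead of SQL; both run against the registered Pulsar table in the same way.&lt;/p&gt;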
&lt;h1 id=&quot;next-steps--future-integration&quot;&gt;Next Steps &amp;amp; Future Integration&lt;/h1&gt;
&lt;p&gt;The goal of the integration between Pulsar and Flink is to simplify how developers use the two frameworks to build a unified data processing stack. As we move beyond the classical Lambda architecture, where an online speed layer is combined with an offline batch layer to run data computations, Flink and Pulsar make a great combination for a truly unified data processing stack. We see Flink as the unified computation engine, handling both online (streaming) and offline (batch) workloads, and Pulsar as the unified data storage layer, together simplifying developer workloads.&lt;/p&gt;
&lt;p&gt;There is still a lot of ongoing work in both communities to make the integration even better, such as a new source API (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface&quot;&gt;FLIP-27&lt;/a&gt;) that will enable the &lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discussion-Flink-Pulsar-Connector-td22019.html&quot;&gt;contribution of the Pulsar connectors to the Flink community&lt;/a&gt;, as well as a new &lt;code&gt;Key_Shared&lt;/code&gt; subscription type in Pulsar that will allow efficient scaling of the source parallelism. Additional efforts focus on providing end-to-end exactly-once guarantees (currently available only in the Pulsar source connector, not the sink connector) and on using Pulsar/BookKeeper as a Flink state backend.&lt;/p&gt;
&lt;p&gt;You can find a more detailed overview of the integration work between the two communities in this &lt;a href=&quot;https://youtu.be/3sBXXfgl5vs&quot;&gt;recording&lt;/a&gt; from Flink Forward Europe 2019, or subscribe to the &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink dev mailing list&lt;/a&gt; to follow the latest contribution and integration efforts between Flink and Pulsar.&lt;/p&gt;
</description>
<pubDate>Mon, 25 Nov 2019 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/11/25/query-pulsar-streams-using-apache-flink.html</link>
<guid isPermaLink="true">/news/2019/11/25/query-pulsar-streams-using-apache-flink.html</guid>
</item>
<item>
<title>Apache Flink 1.9.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.9 series.&lt;/p&gt;
&lt;p&gt;This release includes 96 fixes and minor improvements for Flink 1.9.0. The list below details all fixes and improvements.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.9.1.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.9.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11630&quot;&gt;FLINK-11630&lt;/a&gt;] - TaskExecutor does not wait for Task termination when terminating itself
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13490&quot;&gt;FLINK-13490&lt;/a&gt;] - Fix if one column value is null when reading JDBC, the following values are all null
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13941&quot;&gt;FLINK-13941&lt;/a&gt;] - Prevent data-loss by not cleaning up small part files from S3.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12501&quot;&gt;FLINK-12501&lt;/a&gt;] - AvroTypeSerializer does not work with types generated by avrohugger
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13386&quot;&gt;FLINK-13386&lt;/a&gt;] - Fix some frictions in the new default Web UI
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13526&quot;&gt;FLINK-13526&lt;/a&gt;] - Switching to a non existing catalog or database crashes sql-client
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13568&quot;&gt;FLINK-13568&lt;/a&gt;] - DDL create table doesn&amp;#39;t allow STRING data type
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13805&quot;&gt;FLINK-13805&lt;/a&gt;] - Bad Error Message when TaskManager is lost
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13806&quot;&gt;FLINK-13806&lt;/a&gt;] - Metric Fetcher floods the JM log with errors when TM is lost
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14010&quot;&gt;FLINK-14010&lt;/a&gt;] - Dispatcher &amp;amp; JobManagers don&amp;#39;t give up leadership when AM is shut down
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14145&quot;&gt;FLINK-14145&lt;/a&gt;] - CompletedCheckpointStore#getLatestCheckpoint(true) returns wrong checkpoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13059&quot;&gt;FLINK-13059&lt;/a&gt;] - Cassandra Connector leaks Semaphore on Exception and hangs on close
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13534&quot;&gt;FLINK-13534&lt;/a&gt;] - Unable to query Hive table with decimal column
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13562&quot;&gt;FLINK-13562&lt;/a&gt;] - Throws exception when FlinkRelMdColumnInterval meets two stage stream group aggregate
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13563&quot;&gt;FLINK-13563&lt;/a&gt;] - TumblingGroupWindow should implement toString method
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13564&quot;&gt;FLINK-13564&lt;/a&gt;] - Throw exception if constant with YEAR TO MONTH resolution was used for group windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13588&quot;&gt;FLINK-13588&lt;/a&gt;] - StreamTask.handleAsyncException throws away the exception cause
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13653&quot;&gt;FLINK-13653&lt;/a&gt;] - ResultStore should avoid using RowTypeInfo when creating a result
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13711&quot;&gt;FLINK-13711&lt;/a&gt;] - Hive array values not properly displayed in SQL CLI
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13737&quot;&gt;FLINK-13737&lt;/a&gt;] - flink-dist should add provided dependency on flink-examples-table
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13738&quot;&gt;FLINK-13738&lt;/a&gt;] - Fix NegativeArraySizeException in LongHybridHashTable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13742&quot;&gt;FLINK-13742&lt;/a&gt;] - Fix code generation when aggregation contains both distinct aggregate with and without filter
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13760&quot;&gt;FLINK-13760&lt;/a&gt;] - Fix hardcode Scala version dependency in hive connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13761&quot;&gt;FLINK-13761&lt;/a&gt;] - `SplitStream` should be deprecated because `SplitJavaStream` is deprecated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13789&quot;&gt;FLINK-13789&lt;/a&gt;] - Transactional Id Generation fails due to user code impacting formatting string
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13823&quot;&gt;FLINK-13823&lt;/a&gt;] - Incorrect debug log in CompileUtils
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13825&quot;&gt;FLINK-13825&lt;/a&gt;] - The original plugins dir is not restored after e2e test run
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13831&quot;&gt;FLINK-13831&lt;/a&gt;] - Free Slots / All Slots display error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13887&quot;&gt;FLINK-13887&lt;/a&gt;] - Ensure defaultInputDependencyConstraint to be non-null when setting it in ExecutionConfig
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13897&quot;&gt;FLINK-13897&lt;/a&gt;] - OSS FS NOTICE file is placed in wrong directory
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13933&quot;&gt;FLINK-13933&lt;/a&gt;] - Hive Generic UDTF can not be used in table API both stream and batch mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13936&quot;&gt;FLINK-13936&lt;/a&gt;] - NOTICE-binary is outdated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13966&quot;&gt;FLINK-13966&lt;/a&gt;] - Jar sorting in collect_license_files.sh is locale dependent
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14009&quot;&gt;FLINK-14009&lt;/a&gt;] - Cron jobs broken due to verifying incorrect NOTICE-binary file
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14049&quot;&gt;FLINK-14049&lt;/a&gt;] - Update error message for failed partition updates to include task name
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14076&quot;&gt;FLINK-14076&lt;/a&gt;] - &amp;#39;ClassNotFoundException: KafkaException&amp;#39; on Flink v1.9 w/ checkpointing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14107&quot;&gt;FLINK-14107&lt;/a&gt;] - Kinesis consumer record emitter deadlock under event time alignment
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14119&quot;&gt;FLINK-14119&lt;/a&gt;] - Clean idle state for RetractableTopNFunction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14139&quot;&gt;FLINK-14139&lt;/a&gt;] - Fix potential memory leak of rest server when using session/standalone cluster
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14140&quot;&gt;FLINK-14140&lt;/a&gt;] - The Flink Logo Displayed in Flink Python Shell is Broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14150&quot;&gt;FLINK-14150&lt;/a&gt;] - Unnecessary __pycache__ directories appears in pyflink.zip
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14288&quot;&gt;FLINK-14288&lt;/a&gt;] - Add Py4j NOTICE for source release
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13892&quot;&gt;FLINK-13892&lt;/a&gt;] - HistoryServerTest failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14043&quot;&gt;FLINK-14043&lt;/a&gt;] - SavepointMigrationTestBase is super slow
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12164&quot;&gt;FLINK-12164&lt;/a&gt;] - JobMasterTest.testJobFailureWhenTaskExecutorHeartbeatTimeout is unstable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9900&quot;&gt;FLINK-9900&lt;/a&gt;] - Fix unstable test ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13484&quot;&gt;FLINK-13484&lt;/a&gt;] - ConnectedComponents end-to-end test instable with NoResourceAvailableException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13489&quot;&gt;FLINK-13489&lt;/a&gt;] - Heavy deployment end-to-end test fails on Travis with TM heartbeat timeout
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13514&quot;&gt;FLINK-13514&lt;/a&gt;] - StreamTaskTest.testAsyncCheckpointingConcurrentCloseAfterAcknowledge unstable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13530&quot;&gt;FLINK-13530&lt;/a&gt;] - AbstractServerTest failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13585&quot;&gt;FLINK-13585&lt;/a&gt;] - Fix sporadical deallock in TaskAsyncCallTest#testSetsUserCodeClassLoader()
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13599&quot;&gt;FLINK-13599&lt;/a&gt;] - Kinesis end-to-end test failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13663&quot;&gt;FLINK-13663&lt;/a&gt;] - SQL Client end-to-end test for modern Kafka failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13688&quot;&gt;FLINK-13688&lt;/a&gt;] - HiveCatalogUseBlinkITCase.testBlinkUdf constantly failed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13739&quot;&gt;FLINK-13739&lt;/a&gt;] - BinaryRowTest.testWriteString() fails in some environments
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13746&quot;&gt;FLINK-13746&lt;/a&gt;] - Elasticsearch (v2.3.5) sink end-to-end test fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13769&quot;&gt;FLINK-13769&lt;/a&gt;] - BatchFineGrainedRecoveryITCase.testProgram failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13807&quot;&gt;FLINK-13807&lt;/a&gt;] - Flink-avro unit tests fails if the character encoding in the environment is not default to UTF-8
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13965&quot;&gt;FLINK-13965&lt;/a&gt;] - Keep hasDeprecatedKeys and deprecatedKeys methods in ConfigOption and mark it with @Deprecated annotation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9941&quot;&gt;FLINK-9941&lt;/a&gt;] - Flush in ScalaCsvOutputFormat before close method
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13336&quot;&gt;FLINK-13336&lt;/a&gt;] - Remove the legacy batch fault tolerance page and redirect it to the new task failure recovery page
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13380&quot;&gt;FLINK-13380&lt;/a&gt;] - Improve the usability of Flink session cluster on Kubernetes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13819&quot;&gt;FLINK-13819&lt;/a&gt;] - Introduce RpcEndpoint State
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13845&quot;&gt;FLINK-13845&lt;/a&gt;] - Drop all the content of removed &amp;quot;Checkpointed&amp;quot; interface
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13957&quot;&gt;FLINK-13957&lt;/a&gt;] - Log dynamic properties on job submission
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13967&quot;&gt;FLINK-13967&lt;/a&gt;] - Generate full binary licensing via collect_license_files.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13968&quot;&gt;FLINK-13968&lt;/a&gt;] - Add travis check for the correctness of the binary licensing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13449&quot;&gt;FLINK-13449&lt;/a&gt;] - Add ARM architecture to MemoryArchitecture
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Documentation
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13105&quot;&gt;FLINK-13105&lt;/a&gt;] - Add documentation for blink planner&amp;#39;s built-in functions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13277&quot;&gt;FLINK-13277&lt;/a&gt;] - add documentation of Hive source/sink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13354&quot;&gt;FLINK-13354&lt;/a&gt;] - Add documentation for how to use blink planner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13355&quot;&gt;FLINK-13355&lt;/a&gt;] - Add documentation for Temporal Table Join in blink planner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13356&quot;&gt;FLINK-13356&lt;/a&gt;] - Add documentation for TopN and Deduplication in blink planner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13359&quot;&gt;FLINK-13359&lt;/a&gt;] - Add documentation for DDL introduction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13362&quot;&gt;FLINK-13362&lt;/a&gt;] - Add documentation for Kafka &amp;amp; ES &amp;amp; FileSystem DDL
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13363&quot;&gt;FLINK-13363&lt;/a&gt;] - Add documentation for streaming aggregate performance tunning.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13706&quot;&gt;FLINK-13706&lt;/a&gt;] - add documentation of how to use Hive functions in Flink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13942&quot;&gt;FLINK-13942&lt;/a&gt;] - Add Overview page for Getting Started section
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13863&quot;&gt;FLINK-13863&lt;/a&gt;] - Update Operations Playground to Flink 1.9.0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13937&quot;&gt;FLINK-13937&lt;/a&gt;] - Fix wrong hive dependency version in documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13830&quot;&gt;FLINK-13830&lt;/a&gt;] - The Document about Cluster on yarn have some problems
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-14160&quot;&gt;FLINK-14160&lt;/a&gt;] - Extend Operations Playground with --backpressure option
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13388&quot;&gt;FLINK-13388&lt;/a&gt;] - Update UI screenshots in the documentation to the new default Web Frontend
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13415&quot;&gt;FLINK-13415&lt;/a&gt;] - Document how to use hive connector in scala shell
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13517&quot;&gt;FLINK-13517&lt;/a&gt;] - Restructure Hive Catalog documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13643&quot;&gt;FLINK-13643&lt;/a&gt;] - Document the workaround for users with a different minor Hive version
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13757&quot;&gt;FLINK-13757&lt;/a&gt;] - Fix wrong description of &quot;IS NOT TRUE&quot; function documentation
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Fri, 18 Oct 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2019/10/18/release-1.9.1.html</link>
<guid isPermaLink="true">/news/2019/10/18/release-1.9.1.html</guid>
</item>
<item>
<title>The State Processor API: How to Read, write and modify the state of Flink applications</title>
<description>&lt;p&gt;Whether you are running Apache Flink&lt;sup&gt;Ⓡ&lt;/sup&gt; in production or evaluated Flink as a computation framework in the past, you’ve probably found yourself asking the question: How can I access, write or update state in a Flink savepoint? Ask no more! &lt;a href=&quot;https://flink.apache.org/news/2019/08/22/release-1.9.0.html&quot;&gt;Apache Flink 1.9.0&lt;/a&gt; introduces the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/libs/state_processor_api.html&quot;&gt;State Processor API&lt;/a&gt;, a powerful extension of the DataSet API that allows reading, writing and modifying state in Flink’s savepoints and checkpoints.&lt;/p&gt;
&lt;p&gt;In this post, we explain why this feature is a big step for Flink, what you can use it for, and how to use it. Finally, we will discuss the future of the State Processor API and how it aligns with our plans to evolve Flink into a system for &lt;a href=&quot;https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;unified batch and stream processing&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;stateful-stream-processing-with-apache-flink-until-flink-19&quot;&gt;Stateful Stream Processing with Apache Flink until Flink 1.9&lt;/h2&gt;
&lt;p&gt;All non-trivial stream processing applications are stateful and most of them are designed to run for months or years. Over time, many of them accumulate a lot of valuable state that can be very expensive or even impossible to rebuild if it gets lost due to a failure. In order to guarantee the consistency and durability of application state, Flink has featured a sophisticated checkpointing and recovery mechanism from very early on. With every release, the Flink community has added more and more state-related features to improve checkpointing and recovery speed, the maintenance of applications, and practices to manage them.&lt;/p&gt;
&lt;p&gt;However, a feature that was commonly requested by Flink users was the ability to access the state of an application “from the outside”. This request was motivated by the need to validate or debug the state of an application, to migrate the state of an application to another application, to evolve an application from the Heap State Backend to the RocksDB State Backend, or to import the initial state of an application from an external system like a relational database.&lt;/p&gt;
&lt;p&gt;Despite all those convincing reasons to expose application state externally, your access options have been fairly limited until now. Flink’s Queryable State feature only supports key-lookups (point queries) and does not guarantee the consistency of returned values (the value of a key might be different before and after an application recovered from a failure). Moreover, queryable state cannot be used to add or modify the state of an application. Also, savepoints, which are consistent snapshots of an application’s state, were not accessible because the application state is encoded with a custom binary format.&lt;/p&gt;
&lt;h2 id=&quot;reading-and-writing-application-state-with-the-state-processor-api&quot;&gt;Reading and Writing Application State with the State Processor API&lt;/h2&gt;
&lt;p&gt;The State Processor API that comes with Flink 1.9 is a true game-changer in how you can work with application state! In a nutshell, it extends the DataSet API with Input and OutputFormats to read and write savepoint or checkpoint data. Due to the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/dev/table/common.html#integration-with-datastream-and-dataset-api&quot;&gt;interoperability of DataSet and Table API&lt;/a&gt;, you can even use relational Table API or SQL queries to analyze and process state data.&lt;/p&gt;
&lt;p&gt;For example, you can take a savepoint of a running stream processing application and analyze it with a DataSet batch program to verify that the application behaves correctly. Or you can read a batch of data from any store, preprocess it, and write the result to a savepoint that you use to bootstrap the state of a streaming application. It’s also possible to fix inconsistent state entries now. Finally, the State Processor API opens up many ways to evolve a stateful application that were previously blocked by parameter and design choices that could not be changed without losing all the state of the application after it was started. For example, you can now arbitrarily modify the data types of states, adjust the maximum parallelism of operators, split or merge operator state, re-assign operator UIDs, and so on.&lt;/p&gt;
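&lt;p&gt;As a rough sketch of the bootstrapping use case, the snippet below reads a text file with the DataSet API, turns it into (non-keyed) operator state, and writes a new savepoint that a streaming application can start from. The operator UID &lt;code&gt;src-uid&lt;/code&gt;, the state name &lt;code&gt;os1&lt;/code&gt;, and the file paths are assumptions for this example; the exact interfaces are described in the documentation linked above.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.StateBootstrapFunction;

// A user-defined function that copies every record into operator list state.
public class WordBootstrapper extends StateBootstrapFunction&amp;lt;String&amp;gt; {

    private transient ListState&amp;lt;String&amp;gt; state;

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        state = context.getOperatorStateStore()
            .getListState(new ListStateDescriptor&amp;lt;&amp;gt;(&amp;quot;os1&amp;quot;, Types.STRING));
    }

    @Override
    public void processElement(String value, Context ctx) throws Exception {
        state.add(value);
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        // nothing to do here; the managed list state is snapshotted automatically
    }
}

ExecutionEnvironment bEnv = ExecutionEnvironment.getExecutionEnvironment();

// Read and preprocess data from any store; a plain text file keeps the example small.
DataSet&amp;lt;String&amp;gt; words = bEnv.readTextFile(&amp;quot;file:///tmp/words.txt&amp;quot;);

// Turn the DataSet into bootstrapped operator state.
BootstrapTransformation&amp;lt;String&amp;gt; transformation = OperatorTransformation
    .bootstrapWith(words)
    .transform(new WordBootstrapper());

// Write a new savepoint that holds this state for the operator with UID &amp;quot;src-uid&amp;quot;.
Savepoint
    .create(new MemoryStateBackend(), 128)   // 128 = max parallelism of the new savepoint
    .withOperator(&amp;quot;src-uid&amp;quot;, transformation)
    .write(&amp;quot;file:///tmp/bootstrapped-savepoint&amp;quot;);

bEnv.execute(&amp;quot;bootstrap state&amp;quot;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;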
&lt;h2 id=&quot;mapping-application-state-to-datasets&quot;&gt;Mapping Application State to DataSets&lt;/h2&gt;
&lt;p&gt;The State Processor API maps the state of a streaming application to one or more data sets that can be separately processed. In order to be able to use the API, you need to understand how this mapping works.&lt;/p&gt;
&lt;p&gt;But let’s first have a look at what a stateful Flink job looks like. A Flink job is composed of operators, typically one or more source operators, a few operators for the actual processing, and one or more sink operators. Each operator runs in parallel in one or more tasks and can work with different types of state. An operator can have zero, one, or more &lt;em&gt;“operator states”&lt;/em&gt; which are organized as lists that are scoped to the operator’s tasks. If the operator is applied on a keyed stream, it can also have zero, one, or more &lt;em&gt;“keyed states”&lt;/em&gt; which are scoped to a key that is extracted from each processed record. You can think of keyed state as a distributed key-value map.&lt;/p&gt;
&lt;p&gt;The following figure shows the application “MyApp” which consists of three operators called “Src”, “Proc”, and “Snk”. Src has one operator state (os1), Proc has one operator state (os2) and two keyed states (ks1, ks2) and Snk is stateless.&lt;/p&gt;
&lt;p style=&quot;display: block; text-align: center; margin-top: 20px; margin-bottom: 20px&quot;&gt;
&lt;img src=&quot;/img/blog/2019-09-13-state-processor-api-blog/application-my-app-state-processor-api.png&quot; width=&quot;600px&quot; alt=&quot;Application: My App&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;A savepoint or checkpoint of MyApp consists of the data of all states, organized in a way that the states of each task can be restored. When processing the data of a savepoint (or checkpoint) with a batch job, we need a mental model that maps the data of the individual tasks’ states into data sets or tables. In fact, we can think of a savepoint as a database. Every operator (identified by its UID) represents a namespace. Each operator state of an operator is mapped to a dedicated table in the namespace with a single column that holds the state’s data of all tasks. All keyed states of an operator are mapped to a single table consisting of a column for the key, and one column for each keyed state. The following figure shows how a savepoint of MyApp is mapped to a database.&lt;/p&gt;
&lt;p style=&quot;display: block; text-align: center; margin-top: 20px; margin-bottom: 20px&quot;&gt;
&lt;img src=&quot;/img/blog/2019-09-13-state-processor-api-blog/database-my-app-state-processor-api.png&quot; width=&quot;600px&quot; alt=&quot;Database: My App&quot; /&gt;
&lt;/p&gt;
&lt;p&gt;The figure shows how the values of Src’s operator state are mapped to a table with one column and five rows, one row for each list entry across all parallel tasks of Src. Operator state os2 of the operator “Proc” is similarly mapped to an individual table. The keyed states ks1 and ks2 are combined into a single table with three columns, one for the key, one for ks1 and one for ks2. The keyed table holds one row for each distinct key of both keyed states. Since the operator “Snk” does not have any state, its namespace is empty.&lt;/p&gt;
&lt;p&gt;The State Processor API now offers methods to create, load, and write a savepoint. You can read a DataSet from a loaded savepoint or convert a DataSet into a state and add it to a savepoint. DataSets can be processed with the full feature set of the DataSet API. With these building blocks, all of the before-mentioned use cases (and more) can be addressed. Please have a look at the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/libs/state_processor_api.html&quot;&gt;documentation&lt;/a&gt; if you’d like to learn how to use the State Processor API in detail.&lt;/p&gt;
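&lt;p&gt;For illustration, here is a minimal sketch of the reading side in Java: loading a savepoint of MyApp and reading one of its operator states as a DataSet. The savepoint path, the operator UID &lt;code&gt;src-uid&lt;/code&gt;, and the state name &lt;code&gt;os1&lt;/code&gt; are assumptions for this example.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;

ExecutionEnvironment bEnv = ExecutionEnvironment.getExecutionEnvironment();

// Load an existing savepoint (or checkpoint) of MyApp.
ExistingSavepoint savepoint = Savepoint.load(
    bEnv, &amp;quot;hdfs:///savepoints/savepoint-myapp-1234&amp;quot;, new MemoryStateBackend());

// Read operator state os1 of the operator with UID &amp;quot;src-uid&amp;quot; as a regular DataSet.
DataSet&amp;lt;String&amp;gt; os1 = savepoint.readListState(&amp;quot;src-uid&amp;quot;, &amp;quot;os1&amp;quot;, Types.STRING);

// From here on the full DataSet API is available, e.g. to validate the state.
System.out.println(&amp;quot;entries in os1: &amp;quot; + os1.count());&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Keyed state can be read in a similar way with a user-defined reader function, and writing state back works through the bootstrapping methods sketched earlier; see the documentation for the full set of methods.&lt;/p&gt;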
&lt;h2 id=&quot;why-dataset-api&quot;&gt;Why DataSet API?&lt;/h2&gt;
&lt;p&gt;In case you are familiar with &lt;a href=&quot;https://flink.apache.org/roadmap.html&quot;&gt;Flink’s roadmap&lt;/a&gt;, you might be surprised that the State Processor API is based on the DataSet API. The Flink community plans to extend the DataStream API with the concept of &lt;em&gt;BoundedStreams&lt;/em&gt; and deprecate the DataSet API. When designing this feature, we also evaluated the DataStream API and the Table API, but neither could provide the right feature set yet. Since we didn’t want to block this feature on the progress of Flink’s APIs, we decided to build it on the DataSet API, but kept its dependencies on the DataSet API to a minimum. Hence, migrating it to another API should be fairly easy.&lt;/p&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;
&lt;p&gt;Flink users have requested a feature to access and modify the state of streaming applications from the outside for a long time. With the State Processor API, Flink 1.9.0 finally exposes application state as a data format that can be manipulated. This feature opens up many new possibilities for how users can maintain and manage Flink streaming applications, including arbitrary evolution of stream applications and exporting and bootstrapping of application state. To put it concisely, the State Processor API unlocks the black box that savepoints used to be.&lt;/p&gt;
</description>
<pubDate>Fri, 13 Sep 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/feature/2019/09/13/state-processor-api.html</link>
<guid isPermaLink="true">/feature/2019/09/13/state-processor-api.html</guid>
</item>
<item>
<title>Apache Flink 1.8.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released the second bugfix version of the Apache Flink 1.8 series.&lt;/p&gt;
&lt;p&gt;This release includes 23 fixes and minor improvements for Flink 1.8.1. The list below details all fixes and improvements.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.8.2.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13941&quot;&gt;FLINK-13941&lt;/a&gt;] - Prevent data-loss by not cleaning up small part files from S3.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9526&quot;&gt;FLINK-9526&lt;/a&gt;] - BucketingSink end-to-end test failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10368&quot;&gt;FLINK-10368&lt;/a&gt;] - &amp;#39;Kerberized YARN on Docker test&amp;#39; unstable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12319&quot;&gt;FLINK-12319&lt;/a&gt;] - StackOverFlowError in cep.nfa.sharedbuffer.SharedBuffer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12736&quot;&gt;FLINK-12736&lt;/a&gt;] - ResourceManager may release TM with allocated slots
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12889&quot;&gt;FLINK-12889&lt;/a&gt;] - Job keeps in FAILING state
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13059&quot;&gt;FLINK-13059&lt;/a&gt;] - Cassandra Connector leaks Semaphore on Exception; hangs on close
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13159&quot;&gt;FLINK-13159&lt;/a&gt;] - java.lang.ClassNotFoundException when restore job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13367&quot;&gt;FLINK-13367&lt;/a&gt;] - Make ClosureCleaner detect writeReplace serialization override
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13369&quot;&gt;FLINK-13369&lt;/a&gt;] - Recursive closure cleaner ends up with stackOverflow in case of circular dependency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13394&quot;&gt;FLINK-13394&lt;/a&gt;] - Use fallback unsafe secure MapR in nightly.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13484&quot;&gt;FLINK-13484&lt;/a&gt;] - ConnectedComponents end-to-end test instable with NoResourceAvailableException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13499&quot;&gt;FLINK-13499&lt;/a&gt;] - Remove dependency on MapR artifact repository
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13508&quot;&gt;FLINK-13508&lt;/a&gt;] - CommonTestUtils#waitUntilCondition() may attempt to sleep with negative time
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13586&quot;&gt;FLINK-13586&lt;/a&gt;] - Method ClosureCleaner.clean broke backward compatibility between 1.8.0 and 1.8.1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13761&quot;&gt;FLINK-13761&lt;/a&gt;] - `SplitStream` should be deprecated because `SplitJavaStream` is deprecated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13789&quot;&gt;FLINK-13789&lt;/a&gt;] - Transactional Id Generation fails due to user code impacting formatting string
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13806&quot;&gt;FLINK-13806&lt;/a&gt;] - Metric Fetcher floods the JM log with errors when TM is lost
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13807&quot;&gt;FLINK-13807&lt;/a&gt;] - Flink-avro unit tests fails if the character encoding in the environment is not default to UTF-8
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-13897&quot;&gt;FLINK-13897&lt;/a&gt;] - OSS FS NOTICE file is placed in wrong directory
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12578&quot;&gt;FLINK-12578&lt;/a&gt;] - Use secure URLs for Maven repositories
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12741&quot;&gt;FLINK-12741&lt;/a&gt;] - Update docs about Kafka producer fault tolerance guarantees
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12749&quot;&gt;FLINK-12749&lt;/a&gt;] - Add Flink Operations Playground documentation
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 11 Sep 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2019/09/11/release-1.8.2.html</link>
<guid isPermaLink="true">/news/2019/09/11/release-1.8.2.html</guid>
</item>
<item>
<title>Flink Community Update - September&#39;19</title>
<description>&lt;p&gt;This has been an exciting, fast-paced year for the Apache Flink community. But with over 10k messages across the mailing lists, 3k Jira tickets and 2k pull requests, it is not easy to keep up with the latest state of the project. Plus everything happening around it. With that in mind, we want to bring back regular community updates to the Flink blog.&lt;/p&gt;
&lt;p&gt;The first post in the series takes you on a little detour across the year, to freshen up and make sure you’re all up to date.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#the-year-so-far-in-flink&quot; id=&quot;markdown-toc-the-year-so-far-in-flink&quot;&gt;The Year (so far) in Flink&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#integration-of-the-chinese-speaking-community&quot; id=&quot;markdown-toc-integration-of-the-chinese-speaking-community&quot;&gt;Integration of the Chinese-speaking community&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#improving-flinks-documentation&quot; id=&quot;markdown-toc-improving-flinks-documentation&quot;&gt;Improving Flink’s Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#adjusting-the-contribution-process-and-experience&quot; id=&quot;markdown-toc-adjusting-the-contribution-process-and-experience&quot;&gt;Adjusting the Contribution Process and Experience&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#new-committers-and-pmc-members&quot; id=&quot;markdown-toc-new-committers-and-pmc-members&quot;&gt;New Committers and PMC Members&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#new-pmc-members&quot; id=&quot;markdown-toc-new-pmc-members&quot;&gt;New PMC Members&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#new-committers&quot; id=&quot;markdown-toc-new-committers&quot;&gt;New Committers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#the-bigger-picture&quot; id=&quot;markdown-toc-the-bigger-picture&quot;&gt;The Bigger Picture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#upcoming-events&quot; id=&quot;markdown-toc-upcoming-events&quot;&gt;Upcoming Events&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#north-america&quot; id=&quot;markdown-toc-north-america&quot;&gt;North America&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#europe&quot; id=&quot;markdown-toc-europe&quot;&gt;Europe&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#asia&quot; id=&quot;markdown-toc-asia&quot;&gt;Asia&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h1 id=&quot;the-year-so-far-in-flink&quot;&gt;The Year (so far) in Flink&lt;/h1&gt;
&lt;p&gt;Two major versions were released this year: &lt;a href=&quot;https://flink.apache.org/news/2019/04/09/release-1.8.0.html&quot;&gt;Flink 1.8&lt;/a&gt; and &lt;a href=&quot;https://flink.apache.org/news/2019/08/22/release-1.9.0.html&quot;&gt;Flink 1.9&lt;/a&gt;; paving the way for the goal of making Flink the first framework to seamlessly support stream and batch processing with a single, unified runtime. The &lt;a href=&quot;https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;contribution of Blink&lt;/a&gt; to Apache Flink was key in accelerating the path to this vision and reduced the waiting time for long-pending user requests — such as Hive integration, (better) Python support, the rework of Flink’s Machine Learning library and…fine-grained failure recovery (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures&quot;&gt;FLIP-1&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The 1.9 release was the result of the &lt;strong&gt;biggest community effort the project has experienced so far&lt;/strong&gt;, with the number of contributors soaring to 190 (see &lt;a href=&quot;#the-bigger-picture&quot;&gt;The Bigger Picture&lt;/a&gt;). For a quick overview of the upcoming work for Flink 1.10 (and beyond), have a look at the updated &lt;a href=&quot;https://flink.apache.org/roadmap.html&quot;&gt;roadmap&lt;/a&gt;!&lt;/p&gt;
&lt;h2 id=&quot;integration-of-the-chinese-speaking-community&quot;&gt;Integration of the Chinese-speaking community&lt;/h2&gt;
&lt;p&gt;As the number of Chinese-speaking Flink users rapidly grows, the community is working on translating resources and creating dedicated spaces for discussion to invite and include these users in the wider Flink community. Part of the ongoing work is described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-35%3A+Support+Chinese+Documents+and+Website&quot;&gt;FLIP-35&lt;/a&gt; and has resulted in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A new user mailing list (user-zh@f.a.o) dedicated to Chinese-speakers.&lt;/li&gt;
&lt;li&gt;A Chinese translation of the Apache Flink &lt;a href=&quot;https://flink.apache.org/zh/&quot;&gt;website&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/zh/&quot;&gt;documentation&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Multiple meetups organized all over China, with the biggest one reaching a whopping 500+ participants. Some of these meetups were also organized in collaboration with communities from other projects, like Apache Pulsar and Apache Kafka.&lt;/li&gt;
&lt;/ul&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-09-05-flink-community-update/2019-09-05-flink-community-update_3.png&quot; width=&quot;800px&quot; alt=&quot;China Meetup&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;In case you’re interested in knowing more about this work in progress, Robert Metzger and Fabian Hueske will be diving into “Inviting Apache Flink’s Chinese User Community” at the upcoming ApacheCon Europe 2019 (see &lt;a href=&quot;#upcoming-events&quot;&gt;Upcoming Events&lt;/a&gt;).&lt;/p&gt;
&lt;h2 id=&quot;improving-flinks-documentation&quot;&gt;Improving Flink’s Documentation&lt;/h2&gt;
&lt;p&gt;Besides the translation effort, the community has also been working quite hard on a &lt;strong&gt;Flink docs overhaul&lt;/strong&gt;. The main goals are to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Organize and clean up the structure of the docs;&lt;/li&gt;
&lt;li&gt;Align the content with the overall direction of the project;&lt;/li&gt;
&lt;li&gt;Improve the &lt;em&gt;getting-started&lt;/em&gt; material and make the content more accessible to different levels of Flink experience.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Given that there has been some confusion in the past regarding the unclear definition of core Flink concepts, one of the first completed efforts was to introduce a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/concepts/glossary.html#glossary&quot;&gt;Glossary&lt;/a&gt; in the docs. To get up to speed with the roadmap for the remaining efforts, you can refer to &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-42%3A+Rework+Flink+Documentation&quot;&gt;FLIP-42&lt;/a&gt; and the corresponding &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12639&quot;&gt;umbrella Jira ticket&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;adjusting-the-contribution-process-and-experience&quot;&gt;Adjusting the Contribution Process and Experience&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://flink.apache.org/contributing/how-to-contribute.html&quot;&gt;guidelines&lt;/a&gt; to contribute to Apache Flink have been reworked on the website, in an effort to lower the entry barrier for new contributors and reduce the overall friction in the contribution process. In addition, the Flink community discussed and adopted &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026&quot;&gt;bylaws&lt;/a&gt; to help the community collaborate and coordinate more smoothly.&lt;/p&gt;
&lt;p&gt;For code contributors, a &lt;a href=&quot;https://flink.apache.org/contributing/code-style-and-quality-preamble.html&quot;&gt;Code Style and Quality Guide&lt;/a&gt; that captures the expected standards for contributions was also added to the “Contributing” section of the Flink website.&lt;/p&gt;
&lt;p&gt;It’s important to stress that &lt;strong&gt;contributions are not restricted to code&lt;/strong&gt;. Non-code contributions such as mailing list support, documentation work or organization of community events are equally as important to the development of the project and highly encouraged.&lt;/p&gt;
&lt;h2 id=&quot;new-committers-and-pmc-members&quot;&gt;New Committers and PMC Members&lt;/h2&gt;
&lt;p&gt;The Apache Flink community has welcomed &lt;strong&gt;5 new Committers&lt;/strong&gt; and &lt;strong&gt;4 PMC (Project Management Committee) Members&lt;/strong&gt; in 2019, so far:&lt;/p&gt;
&lt;h3 id=&quot;new-pmc-members&quot;&gt;New PMC Members&lt;/h3&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;Jincheng Sun, Kete (Kurt) Young, Kostas Kloudas, Thomas Weise
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;new-committers&quot;&gt;New Committers&lt;/h3&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;Andrey Zagrebin, Hequn, Jiangjie (Becket) Qin, Rong Rong, Zhijiang Wang
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Congratulations, and thank you for your hard work and commitment to Flink!&lt;/p&gt;
&lt;h1 id=&quot;the-bigger-picture&quot;&gt;The Bigger Picture&lt;/h1&gt;
&lt;p&gt;Flink continues to push the boundaries of (stream) data processing, and the community is proud to see an ever-increasingly diverse set of contributors, users and technologies join the ecosystem.&lt;/p&gt;
&lt;p&gt;In the timeframe of three releases, the project jumped from &lt;strong&gt;112 to 190 contributors&lt;/strong&gt;, also doubling the number of requested changes and improvements. To top it off, the Flink GitHub repository recently reached the milestone of &lt;strong&gt;10k stars&lt;/strong&gt;, all the way up from the incubation days in 2014.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-09-05-flink-community-update/2019-09-05-flink-community-update_1.png&quot; width=&quot;1000px&quot; alt=&quot;GitHub&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;The activity across the user@ and dev@&lt;sup&gt;1&lt;/sup&gt; mailing lists shows a healthy heartbeat, and the gradual ramp up of user-zh@ suggests that this was a well-received community effort. Looking at the numbers for the same period in 2018, the dev@ mailing list has seen the biggest surge in activity, with an average growth of &lt;strong&gt;2.5x in the number of messages and distinct users&lt;/strong&gt; — a great reflection of the hyperactive pace of development of the Flink codebase.&lt;/p&gt;
&lt;p&gt;&lt;img style=&quot;float: right;&quot; src=&quot;/img/blog/2019-09-05-flink-community-update/2019-09-05-flink-community-update_2.png&quot; width=&quot;420px&quot; alt=&quot;Mailing Lists&quot; /&gt;&lt;/p&gt;
&lt;p&gt;In support of these observations, the report for the financial year of 2019 from the Apache Software Foundation (ASF) features Flink as one of the most thriving open source projects, with mentions for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Most Active Visits and Downloads&lt;/li&gt;
&lt;li&gt;Most Active Sources: Visits&lt;/li&gt;
&lt;li&gt;Most Active Sources: Clones&lt;/li&gt;
&lt;li&gt;Top Repositories by Number of Commits&lt;/li&gt;
&lt;li&gt;Top Most Active Apache Mailing Lists (user@ and dev@)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Hats off to our fellows at Apache Beam for an astounding year, too! For more detailed insights, check the &lt;a href=&quot;https://s3.amazonaws.com/files-dist/AnnualReports/FY2018%20Annual%20Report.pdf&quot;&gt;full report&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup&gt;1. Excluding messages from “jira@apache.org”.&lt;/sup&gt;&lt;/p&gt;
&lt;h1 id=&quot;upcoming-events&quot;&gt;Upcoming Events&lt;/h1&gt;
&lt;p&gt;As the conference and meetup season ramps up again, here are some events to keep an eye out for talks about Flink and opportunities to mingle with the wider stream processing community.&lt;/p&gt;
&lt;h3 id=&quot;north-america&quot;&gt;North America&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[Conference] &lt;strong&gt;&lt;a href=&quot;https://conferences.oreilly.com/strata/strata-ny&quot;&gt;Strata Data Conference 2019&lt;/a&gt;&lt;/strong&gt;, September 23-26, New York, USA
&lt;p&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;[Meetup] &lt;strong&gt;&lt;a href=&quot;https://www.meetup.com/Bay-Area-Apache-Flink-Meetup/events/262680261/&quot;&gt;Apache Flink Bay Area Meetup&lt;/a&gt;&lt;/strong&gt;, September 24, San Francisco, USA
&lt;p&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;[Conference] &lt;strong&gt;&lt;a href=&quot;https://www.meetup.com/Bay-Area-Apache-Flink-Meetup/events/262680261/&quot;&gt;Scale By The Bay 2019&lt;/a&gt;&lt;/strong&gt;, November 13-15, San Francisco, USA&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;europe&quot;&gt;Europe&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[Meetup] &lt;strong&gt;&lt;a href=&quot;https://www.meetup.com/Apache-Flink-London-Meetup/events/264123672&quot;&gt;Apache Flink London Meetup&lt;/a&gt;&lt;/strong&gt;, September 23, London, UK
&lt;p&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;[Conference] &lt;strong&gt;&lt;a href=&quot;https://europe-2019.flink-forward.org&quot;&gt;Flink Forward Europe 2019&lt;/a&gt;&lt;/strong&gt;, October 7-9, Berlin, Germany
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The next edition of Flink Forward Europe is around the corner and the &lt;a href=&quot;https://europe-2019.flink-forward.org/conference-program&quot;&gt;program&lt;/a&gt; has been announced, featuring 70+ talks as well as panel discussions and interactive “Ask Me Anything” sessions with core Flink committers. If you’re looking to learn more about Flink and share your experience with other community members, there really is &lt;a href=&quot;https://vimeo.com/296403091&quot;&gt;no better place&lt;/a&gt; than Flink Forward!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; if you are a &lt;strong&gt;committer for any Apache project&lt;/strong&gt;, you can &lt;strong&gt;get a free ticket&lt;/strong&gt; by registering with your Apache email address and using the discount code: &lt;em&gt;FFEU19-ApacheCommitter&lt;/em&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;[Conference] &lt;strong&gt;&lt;a href=&quot;https://aceu19.apachecon.com/&quot;&gt;ApacheCon Berlin 2019&lt;/a&gt;&lt;/strong&gt;, October 22-24, Berlin, Germany&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;[Conference] &lt;strong&gt;&lt;a href=&quot;https://www.data2day.de/&quot;&gt;Data2Day 2019&lt;/a&gt;&lt;/strong&gt;, October 22-24, Ludwigshafen, Germany&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;[Conference] &lt;strong&gt;&lt;a href=&quot;https://bigdatatechwarsaw.eu&quot;&gt;Big Data Tech Warsaw 2020&lt;/a&gt;&lt;/strong&gt;, February 7, Warsaw, Poland
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The Call For Presentations (CFP) is now &lt;a href=&quot;https://bigdatatechwarsaw.eu/cfp/&quot;&gt;open&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;asia&quot;&gt;Asia&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[Conference] &lt;strong&gt;&lt;a href=&quot;https://m.aliyun.com/markets/aliyun/developer/ffa2019&quot;&gt;Flink Forward Asia 2019&lt;/a&gt;&lt;/strong&gt;, November 28-30, Beijing, China
&lt;p&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The second edition of Flink Forward Asia is also happening later this year, in Beijing, and the CFP is &lt;a href=&quot;https://developer.aliyun.com/special/ffa2019&quot;&gt;open&lt;/a&gt; until September 20.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you’d like to keep a closer eye on what’s happening in the community, subscribe to the &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;community mailing list&lt;/a&gt; to get fine-grained weekly updates, upcoming event announcements and more. Also, please reach out if you’re interested in organizing or being part of Flink events in your area!&lt;/p&gt;
</description>
<pubDate>Tue, 10 Sep 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2019/09/10/community-update.html</link>
<guid isPermaLink="true">/news/2019/09/10/community-update.html</guid>
</item>
<item>
<title>Apache Flink 1.9.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is proud to announce the release of Apache Flink
1.9.0.&lt;/p&gt;
&lt;p&gt;The Apache Flink project’s goal is to develop a stream processing system to
unify and power many forms of real-time and offline data processing
applications as well as event-driven applications. In this release, we have
made a huge step forward in that effort, by integrating Flink’s stream and
batch processing capabilities under a single, unified runtime.&lt;/p&gt;
&lt;p&gt;Significant features on this path are batch-style recovery for batch jobs and
a preview of the new Blink-based query engine for Table API and SQL queries.
We are also excited to announce the availability of the State Processor API,
which is one of the most frequently requested features and enables users to
read and write savepoints with Flink DataSet jobs. Finally, Flink 1.9 includes
a reworked WebUI and previews of Flink’s new Python Table API and its
integration with the Apache Hive ecosystem.&lt;/p&gt;
&lt;p&gt;This blog post describes all major new features and improvements, important
changes to be aware of and what to expect moving forward. For more details,
check the &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12344601&quot;&gt;complete release
changelog&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The binary distribution and source artifacts for this release are now
available via the &lt;a href=&quot;https://flink.apache.org/downloads.html&quot;&gt;Downloads&lt;/a&gt; page of
the Flink project, along with the updated
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/&quot;&gt;documentation&lt;/a&gt;.
Flink 1.9 is API-compatible with previous 1.x releases for APIs annotated with
the &lt;code&gt;@Public&lt;/code&gt; annotation.&lt;/p&gt;
&lt;p&gt;Please feel encouraged to download the release and share your thoughts with
the community through the Flink &lt;a href=&quot;https://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing
lists&lt;/a&gt; or
&lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;JIRA&lt;/a&gt;. As always,
feedback is very much appreciated!&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#fine-grained-batch-recovery-flip-1&quot; id=&quot;markdown-toc-fine-grained-batch-recovery-flip-1&quot;&gt;Fine-grained Batch Recovery (FLIP-1)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#state-processor-api-flip-43&quot; id=&quot;markdown-toc-state-processor-api-flip-43&quot;&gt;State Processor API (FLIP-43)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#stop-with-savepoint-flip-34&quot; id=&quot;markdown-toc-stop-with-savepoint-flip-34&quot;&gt;Stop-with-Savepoint (FLIP-34)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#flink-webui-rework&quot; id=&quot;markdown-toc-flink-webui-rework&quot;&gt;Flink WebUI Rework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#preview-of-the-new-blink-sql-query-processor&quot; id=&quot;markdown-toc-preview-of-the-new-blink-sql-query-processor&quot;&gt;Preview of the new Blink SQL Query Processor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#preview-of-full-hive-integration-flink-10556&quot; id=&quot;markdown-toc-preview-of-full-hive-integration-flink-10556&quot;&gt;Preview of Full Hive Integration (FLINK-10556)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#preview-of-the-new-python-table-api-flip-38&quot; id=&quot;markdown-toc-preview-of-the-new-python-table-api-flip-38&quot;&gt;Preview of the new Python Table API (FLIP-38)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#important-changes&quot; id=&quot;markdown-toc-important-changes&quot;&gt;Important Changes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#release-notes&quot; id=&quot;markdown-toc-release-notes&quot;&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;
&lt;h3 id=&quot;fine-grained-batch-recovery-flip-1&quot;&gt;Fine-grained Batch Recovery (FLIP-1)&lt;/h3&gt;
&lt;p&gt;The time to recover a batch (DataSet, Table API and SQL) job from a task
failure was significantly reduced. Until Flink 1.9, task failures in batch
jobs were recovered by canceling all tasks and restarting the whole job, i.e.,
the job was started from scratch and all progress was voided. With this
release, Flink can be configured to limit the recovery to only those tasks
that are in the same &lt;strong&gt;failover region&lt;/strong&gt;. A failover region is the set of
tasks that are connected via pipelined data exchanges. Hence, the
batch-shuffle connections of a job define the boundaries of its failover
regions. More details are available in
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures&quot;&gt;FLIP-1&lt;/a&gt;.
&lt;img src=&quot;/img/blog/release-19-flip1.png&quot; alt=&quot;alt_text&quot; title=&quot;Fine-grained Batch
Recovery&quot; /&gt;&lt;/p&gt;
&lt;p&gt;To use this new failover strategy, you need the following
settings:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Make sure you have the entry &lt;code&gt;jobmanager.execution.failover-strategy:
region&lt;/code&gt; in your &lt;code&gt;flink-conf.yaml&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The configuration of the 1.9 distribution has that entry by default,
but when reusing a configuration file from previous setups, you have to add
it manually.&lt;/p&gt;
&lt;p&gt;Moreover, you need to set the &lt;code&gt;ExecutionMode&lt;/code&gt; of batch jobs in the
&lt;code&gt;ExecutionConfig&lt;/code&gt; to &lt;code&gt;BATCH&lt;/code&gt; so that data shuffles are not pipelined
and jobs have more than one failover region.&lt;/p&gt;
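&lt;p&gt;As a rough sketch of this second step (assuming a DataSet job; the classes used are the standard DataSet API ones), the execution mode can be set on the job’s &lt;code&gt;ExecutionConfig&lt;/code&gt; like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;import org.apache.flink.api.common.ExecutionMode;
import org.apache.flink.api.java.ExecutionEnvironment;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// With BATCH mode, data shuffles are not pipelined, so the batch-shuffle
// connections split the job into more than one failover region.
env.getConfig().setExecutionMode(ExecutionMode.BATCH);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;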
&lt;p&gt;The “Region” failover strategy also improves the recovery of “embarrassingly
parallel” streaming jobs, i.e., jobs without any shuffle like keyBy() or
rebalance. When such a job is recovered, only the tasks of the affected
pipeline (failover region) are restarted. For all other streaming jobs, the
recovery behavior is the same as in prior Flink versions.&lt;/p&gt;
&lt;h3 id=&quot;state-processor-api-flip-43&quot;&gt;State Processor API (FLIP-43)&lt;/h3&gt;
&lt;p&gt;Up to Flink 1.9, accessing the state of a job from the outside was limited to
the (still) experimental &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html&quot;&gt;Queryable
State&lt;/a&gt;.
This release introduces a new, powerful library to read, write and modify
state snapshots using the batch DataSet API. In practice, this means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Flink job state can be bootstrapped by reading data from external systems,
such as external databases, and converting it into a savepoint.&lt;/li&gt;
&lt;li&gt;State in savepoints can be queried using any of Flink’s batch APIs
(DataSet, Table, SQL), for example to analyze relevant state patterns or
check for discrepancies in state that can support application auditing or
troubleshooting.&lt;/li&gt;
&lt;li&gt;The schema of state in savepoints can be migrated offline, compared to the
previous approach requiring online migration on schema access.&lt;/li&gt;
&lt;li&gt;Invalid data in savepoints can be identified and corrected.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The new State Processor API covers all variations of snapshots: savepoints,
full checkpoints and incremental checkpoints. More details are available in
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-43%3A+State+Processor+API&quot;&gt;FLIP-43&lt;/a&gt;.&lt;/p&gt;
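&lt;p&gt;To give an idea of what this looks like in code, here is a minimal sketch of reading operator state from an existing savepoint with the new library; the savepoint path, operator uid and state name below are placeholders for your own job:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;

ExecutionEnvironment bEnv = ExecutionEnvironment.getExecutionEnvironment();
// Load an existing savepoint (path and state backend are placeholders).
ExistingSavepoint savepoint =
    Savepoint.load(bEnv, &quot;hdfs:///savepoints/savepoint-123&quot;, new MemoryStateBackend());
// Read the list state &quot;offsets&quot; of the operator with uid &quot;my-source&quot; as a DataSet.
DataSet&amp;lt;Long&amp;gt; offsets = savepoint.readListState(&quot;my-source&quot;, &quot;offsets&quot;, Types.LONG);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;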
&lt;h3 id=&quot;stop-with-savepoint-flip-34&quot;&gt;Stop-with-Savepoint (FLIP-34)&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/savepoints.html#operations&quot;&gt;Cancelling with a
savepoint&lt;/a&gt;
is a common operation for stopping/restarting, forking or updating Flink jobs.
However, the existing implementation did not guarantee output persistence to
external storage systems for exactly-once sinks. To improve the end-to-end
semantics when stopping a job, Flink 1.9 introduces a new &lt;code&gt;SUSPEND&lt;/code&gt; mode to
stop a job with a savepoint that is consistent with the emitted data.
You can suspend a job with Flink’s CLI client as follows:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;bin/flink stop -p [:targetDirectory] :jobId
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The final job state is set to &lt;code&gt;FINISHED&lt;/code&gt; on success, allowing
users to detect failures of the requested operation.&lt;/p&gt;
&lt;p&gt;More details are available in
&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=103090212&quot;&gt;FLIP-34&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;flink-webui-rework&quot;&gt;Flink WebUI Rework&lt;/h3&gt;
&lt;p&gt;After a
&lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Change-underlying-Frontend-Architecture-for-Flink-Web-Dashboard-td24902.html&quot;&gt;discussion&lt;/a&gt;
about modernizing the internals of Flink’s WebUI, this component was
reconstructed using the latest stable version of Angular — basically, a bump
from Angular 1.x to 7.x. The redesigned version is the default in 1.9.0;
however, there is a link to switch to the old WebUI.&lt;/p&gt;
&lt;div class=&quot;row&quot;&gt; &lt;div class=&quot;col-sm-6&quot;&gt; &lt;span&gt;&lt;img class=&quot;thumbnail&quot; src=&quot;/img/blog/release-19-web1.png&quot; /&gt;&lt;/span&gt; &lt;/div&gt; &lt;div class=&quot;col-sm-6&quot;&gt; &lt;span&gt;&lt;img class=&quot;thumbnail&quot; src=&quot;/img/blog/release-19-web2.png&quot; /&gt;&lt;/span&gt; &lt;/div&gt;
&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Moving forward, feature parity for the old version of the WebUI
will not be guaranteed.&lt;/p&gt;
&lt;h3 id=&quot;preview-of-the-new-blink-sql-query-processor&quot;&gt;Preview of the new Blink SQL Query Processor&lt;/h3&gt;
&lt;p&gt;Following the &lt;a href=&quot;/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;donation of
Blink&lt;/a&gt; to
Apache Flink, the community worked on integrating Blink’s query optimizer and
runtime for the Table API and SQL. As a first step, we refactored the
monolithic &lt;code&gt;flink-table&lt;/code&gt; module into smaller modules
(&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions&quot;&gt;FLIP-32&lt;/a&gt;).
This resulted in a clear separation of and well-defined interfaces between the
Java and Scala API modules and the optimizer and runtime modules.&lt;/p&gt;
&lt;p&gt;&lt;span&gt;&lt;img style=&quot;width:50%&quot; src=&quot;/img/blog/release-19-stack.png&quot; /&gt;&lt;/span&gt;&lt;/p&gt;
&lt;p&gt;Next, we extended Blink’s planner to implement the new optimizer interface
such that there are now two pluggable query processors to execute Table API
and SQL statements: the pre-1.9 Flink processor and the new Blink-based query
processor. The Blink-based query processor offers better SQL coverage (full TPC-H
coverage in 1.9, TPC-DS coverage is planned for the next release) and improved
performance for batch queries as the result of more extensive query
optimization (cost-based plan selection and more optimization rules), improved
code-generation, and tuned operator implementations.
The Blink-based query processor also provides a more powerful streaming runner
with new features (e.g. dimension table join, TopN, deduplication),
optimizations to address data skew in aggregations, and more useful built-in
functions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The semantics and set of supported operations of the query
processors are mostly, but not fully aligned.&lt;/p&gt;
&lt;p&gt;However, the integration of Blink’s query processor is not fully completed
yet. Therefore, the pre-1.9 Flink processor is still the default processor in
Flink 1.9 and recommended for production settings. You can enable the Blink
processor by configuring it via the &lt;code&gt;EnvironmentSettings&lt;/code&gt; when creating a
&lt;code&gt;TableEnvironment&lt;/code&gt;. The selected processor must be on the classpath of the
executing Java process. For cluster setups, both query processors are
automatically loaded with the default configuration. When running a query from
your IDE you need to explicitly &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/#table-program-dependencies&quot;&gt;add a planner
dependency&lt;/a&gt;
to your project.&lt;/p&gt;
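&lt;p&gt;For illustration, enabling the Blink processor for a streaming Table program could look roughly like this (assuming the Blink planner dependency is on the classpath):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Select the Blink-based query processor instead of the pre-1.9 Flink processor.
EnvironmentSettings settings = EnvironmentSettings.newInstance()
    .useBlinkPlanner()
    .inStreamingMode()
    .build();
TableEnvironment tEnv = TableEnvironment.create(settings);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;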
&lt;h4 id=&quot;other-improvements-to-the-table-api-and-sql&quot;&gt;&lt;strong&gt;Other Improvements to the Table API and SQL&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Besides the exciting progress around the Blink planner, the community worked
on a whole set of other improvements to these interfaces, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Scala-free Table API and SQL for Java users
(&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions&quot;&gt;FLIP-32&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;As part of the refactoring and splitting of the flink-table module, two
separate API modules for Java and Scala were created. For Scala users,
nothing really changes, but Java users can use the Table API and/or SQL now
without pulling in a Scala dependency.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Rework of the Table API Type System&lt;/strong&gt;
&lt;strong&gt;(&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System&quot;&gt;FLIP-37&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The community implemented a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/types.html#data-types&quot;&gt;new data type
system&lt;/a&gt;
to detach the Table API from Flink’s
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/types_serialization.html#flinks-typeinformation-class&quot;&gt;TypeInformation&lt;/a&gt;
class and improve its compliance with the SQL standard. This is still a
work in progress and expected to be completed in the next release. In
Flink 1.9, UDFs are―among other things―not ported to the new type system
yet.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Multi-column and Multi-row Transformations for Table API&lt;/strong&gt;
&lt;strong&gt;(&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97552739&quot;&gt;FLIP-29&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The functionality of the Table API was extended with a set of
transformations that support multi-row and/or multi-column inputs and
outputs. These transformations significantly ease the implementation of
processing logic that would be cumbersome to implement with relational
operators.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;New, Unified Catalog APIs&lt;/strong&gt;
&lt;strong&gt;(&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs&quot;&gt;FLIP-30&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;We reworked the catalog APIs to store metadata and unified the handling of
internal and external catalogs. This effort was mainly initiated as a
prerequisite for the Hive integration (see below), but improves the overall
convenience of managing catalog metadata in Flink. Besides improving the
catalog interfaces, we also extended their functionality. Previously table
definitions for Table API or SQL queries were volatile. With Flink 1.9, the
metadata of tables which are registered with a SQL DDL statement can be
persisted in a catalog. This means you can add a table that is backed by a
Kafka topic to a Metastore catalog and from then on query this table
whenever your catalog is connected to Metastore.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;DDL Support in the SQL API
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10232&quot;&gt;FLINK-10232&lt;/a&gt;)&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Up to this point, Flink SQL only supported DML statements (e.g. &lt;code&gt;SELECT&lt;/code&gt;,
&lt;code&gt;INSERT&lt;/code&gt;). External tables (table sources and sinks) had to be registered
via Java/Scala code or configuration files. For 1.9, we added support for
SQL DDL statements to register and remove tables and views (&lt;code&gt;CREATE TABLE&lt;/code&gt;,
&lt;code&gt;DROP TABLE&lt;/code&gt;; see the sketch after this list). However, we did not add
stream-specific syntax extensions to define timestamp extraction and
watermark generation, yet. Full support for streaming use cases is planned
for the next release.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
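&lt;p&gt;As a small, hedged example of the new DDL support: with a &lt;code&gt;TableEnvironment&lt;/code&gt; such as the one created in the sketch above, a table can be registered and removed via SQL DDL passed to &lt;code&gt;sqlUpdate()&lt;/code&gt;. The table name, schema and connector properties below are purely illustrative placeholders:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;// Register a hypothetical CSV-backed table; adjust the WITH properties to your connector.
tEnv.sqlUpdate(
    &quot;CREATE TABLE Orders (order_id BIGINT, price DOUBLE) &quot; +
    &quot;WITH ('connector.type' = 'filesystem', 'connector.path' = '/tmp/orders.csv', 'format.type' = 'csv')&quot;);

// Remove it again.
tEnv.sqlUpdate(&quot;DROP TABLE Orders&quot;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;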
&lt;h3 id=&quot;preview-of-full-hive-integration-flink-10556&quot;&gt;Preview of Full Hive Integration (FLINK-10556)&lt;/h3&gt;
&lt;p&gt;Apache Hive is widely used in Hadoop’s ecosystem to store and query large
amounts of structured data. Besides being a query processor, Hive features a
catalog called Metastore to manage and organize large datasets. A common
integration point for query processors is to integrate with Hive’s Metastore
in order to be able to tap into the data managed by Hive.&lt;/p&gt;
&lt;p&gt;Recently, the community started implementing an external catalog for Flink’s
Table API and SQL that connects to Hive’s Metastore. In Flink 1.9, users will
be able to query and process all data that is stored in Hive. As described
earlier, you will also be able to persist metadata of Flink tables in Metastore.
Moreover, the Hive integration includes support to use Hive’s UDFs in Flink
Table API or SQL queries. More details are available in
&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10556&quot;&gt;FLINK-10556&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;While table definitions for Table API or SQL queries were previously always
volatile, the new catalog connector additionally allows persisting a table in
Metastore that is created with a SQL DDL statement (see above). This means
that you can connect to Metastore and register a table that is, for example,
backed by a Kafka topic. From now on, you can query that table whenever your
catalog is connected to Metastore.&lt;/p&gt;
&lt;p&gt;Please note that the Hive support in Flink 1.9 is experimental. We are
planning to stabilize these features for the next release and are looking
forward to your feedback.&lt;/p&gt;
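&lt;p&gt;As a rough sketch (the catalog name, default database, configuration directory and Hive version below are placeholders, and the &lt;code&gt;flink-connector-hive&lt;/code&gt; dependency is assumed to be available), connecting a &lt;code&gt;TableEnvironment&lt;/code&gt; to a Hive Metastore could look like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;import org.apache.flink.table.catalog.hive.HiveCatalog;

// Points Flink at an existing Hive Metastore via the directory containing hive-site.xml.
HiveCatalog hive = new HiveCatalog(&quot;myhive&quot;, &quot;default&quot;, &quot;/opt/hive-conf&quot;, &quot;2.3.4&quot;);
tEnv.registerCatalog(&quot;myhive&quot;, hive);
tEnv.useCatalog(&quot;myhive&quot;);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;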
&lt;h3 id=&quot;preview-of-the-new-python-table-api-flip-38&quot;&gt;Preview of the new Python Table API (FLIP-38)&lt;/h3&gt;
&lt;p&gt;This release also introduces a first version of a Python Table API
(&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API&quot;&gt;FLIP-38&lt;/a&gt;).
This marks the start towards our goal of bringing
full-fledged Python support to Flink. The feature was designed as a slim
Python API wrapper around the Table API, basically translating Python Table
API method calls into Java Table API calls. In the initial version that ships
with Flink 1.9, the Python Table API does not support UDFs yet, only
standard relational operations. Support for UDFs implemented in Python is on
the roadmap for future releases.&lt;/p&gt;
&lt;p&gt;If you’d like to try the new Python API, you have to manually &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/flinkDev/building.html#build-pyflink&quot;&gt;install
PyFlink&lt;/a&gt;.
From there, you can have a look at &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/tutorials/python_table_api.html&quot;&gt;this
walkthrough&lt;/a&gt;
or explore it on your own. The &lt;a href=&quot;http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Publish-the-PyFlink-into-PyPI-td31201.html&quot;&gt;community is currently
working&lt;/a&gt;
on preparing a &lt;code&gt;pyflink&lt;/code&gt; Python package that will be made available for
installation via &lt;code&gt;pip&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&quot;important-changes&quot;&gt;Important Changes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;The Table API and SQL are now part of the default configuration of the
Flink distribution. Before, the Table API and SQL had to be enabled by
moving the corresponding JAR file from ./opt to ./lib.&lt;/li&gt;
&lt;li&gt;The machine learning library (flink-ml) has been removed in preparation for
&lt;a href=&quot;https://docs.google.com/document/d/1StObo1DLp8iiy0rbukx8kwAJb0BwDZrQrMWub3DzsEo/edit&quot;&gt;FLIP-39&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;The old DataSet and DataStream Python APIs have been removed in favor of
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-38%3A+Python+Table+API&quot;&gt;FLIP-38&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Flink can be compiled and run on Java 9. Note that certain components
interacting with external systems (connectors, filesystems, reporters) may
not work since the respective projects may have skipped Java 9 support.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;release-notes&quot;&gt;Release Notes&lt;/h2&gt;
&lt;p&gt;Please review the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.9/release-notes/flink-1.9.html&quot;&gt;release
notes&lt;/a&gt;
for a more detailed list of changes and new features if you plan to upgrade
your Flink setup to Flink 1.9.0.&lt;/p&gt;
&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;
&lt;p&gt;We would like to thank all contributors who have made this release possible:&lt;/p&gt;
&lt;p&gt;Abdul Qadeer (abqadeer), Aitozi, Alberto Romero, Aleksey Pak, Alexander
Fedulov, Alice Yan, Aljoscha Krettek, Aloys, Andrew Duffy, Andrey Zagrebin,
Ankur, Artsem Semianenka, Benchao Li, Biao Liu, Bo WANG, Bowen L, Chesnay
Schepler, Clark Yang, Congxian Qiu, Cristian, Danny Chan, David Moravek, Dawid
Wysakowicz, Dian Fu, EronWright, Fabian Hueske, Fabio Lombardelli, Fokko
Driesprong, Gao Yun, Gary Yao, Gen Luo, Gyula Fora, Hequn Cheng,
Hongtao Zhang, Huang Xingbo, HuangXingBo, Hugo Da Cruz Louro, Humberto
Rodríguez A, Hwanju Kim, Igal Shilman, Jamie Grier, Jark Wu, Jason, Jasper
Yue, Jeff Zhang, Jiangjie (Becket) Qin, Jiezhi.G, Jincheng Sun, Jing Zhang,
Jingsong Lee, Juan Gentile, Jungtaek Lim, Kailash Dayanand, Kevin
Bohinski, Konstantin Knauf, Konstantinos Papadopoulos, Kostas Kloudas, Kurt
Young, Lakshmi, Lakshmi Gururaja Rao, Leeviiii, LouisXu, Maximilian Michels,
Nico Kruber, Niels Basjes, Paul Lam, PengFei Li, Peter Huang, Pierre Zemb,
Piotr Nowojski, Piyush Narang, Richard Deurwaarder, Robert Metzger, Robert
Stoll, Romano Vacca, Rong Rong, Rui Li, Ryantaocer, Scott Mitchell, Seth
Wiesman, Shannon Carey, Shimin Yang, Stefan Richter, Stephan Ewen, Stephen
Connolly, Steven Wu, SuXingLee, TANG Wen-hui, Thomas Weise, Till Rohrmann,
Timo Walther, Tom Goong, TsReaper, Tzu-Li (Gordon) Tai, Ufuk Celebi,
Victor Wong, WangHengwei, Wei Zhong, WeiZhong94, Xintong Song, Xpray,
XuQianJin-Stars, Xuefu Zhang, Xupingyong, Yangze Guo, Yu Li, Yun Gao, Yun
Tang, Zhanchun Zhang, Zhenghua Gao, Zhijiang, Zhu Zhu, Zili
Chen, aloys, arganzheng, azagrebin, bd2019us, beyond1920, biao.liub,
blueszheng, boshu Zheng, chenqi, chummyhe89, chunpinghe, dcadmin,
dianfu, godfrey he, guanghui01.rong, hehuiyuan, hello, hequn8128,
jackyyin, joongkeun.yang, klion26, lamber-ken, leesf, liguowei,
lincoln-lil, liyafan82, luoqi, mans2singh, maqingxiang, maxin, mjl, okidogi,
ozan, potseluev, qiangsi.lq, qiaoran, robbinli, shaoxuan-wang, shengqian.zhou,
shenlang.sl, shuai-xu, sunhaibotb, tianchen, tianchen92,
tison, tom_gong, vinoyang, vthinkxie, wanggeng3, wenhuitang, winifredtamg,
xl38154, xuyang1706, yangfei5, yanghua, yuzhao.cyz,
zhangxin516, zhangxinxing, zhaofaxian, zhijiang, zjuwangg, 林小铂,
黄培松, 时无两丶.&lt;/p&gt;
</description>
<pubDate>Thu, 22 Aug 2019 04:30:00 +0200</pubDate>
<link>https://flink.apache.org/news/2019/08/22/release-1.9.0.html</link>
<guid isPermaLink="true">/news/2019/08/22/release-1.9.0.html</guid>
</item>
<item>
<title>Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing</title>
<description>&lt;style type=&quot;text/css&quot;&gt;
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{padding:10px 10px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{padding:10px 10px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;background-color:#eff0f1;}
.tg .tg-wide{padding:10px 30px;}
.tg .tg-top{vertical-align:top}
.tg .tg-topcenter{text-align:center;vertical-align:top}
.tg .tg-center{text-align:center;vertical-align:center}
&lt;/style&gt;
&lt;p&gt;In a &lt;a href=&quot;/2019/06/05/flink-network-stack.html&quot;&gt;previous blog post&lt;/a&gt;, we presented how Flink’s network stack works from the high-level abstractions to the low-level details. This second blog post in the series of network stack posts extends on this knowledge and discusses monitoring network-related metrics to identify effects such as backpressure or bottlenecks in throughput and latency. Although this post briefly covers what to do with backpressure, the topic of tuning the network stack will be further examined in a future post. If you are unfamiliar with the network stack we highly recommend reading the &lt;a href=&quot;/2019/06/05/flink-network-stack.html&quot;&gt;network stack deep-dive&lt;/a&gt; first and then continuing here.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#monitoring&quot; id=&quot;markdown-toc-monitoring&quot;&gt;Monitoring&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#backpressure-monitor&quot; id=&quot;markdown-toc-backpressure-monitor&quot;&gt;Backpressure Monitor&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#network-metrics&quot; id=&quot;markdown-toc-network-metrics&quot;&gt;Network Metrics&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#backpressure&quot; id=&quot;markdown-toc-backpressure&quot;&gt;Backpressure&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#resource-usage--throughput&quot; id=&quot;markdown-toc-resource-usage--throughput&quot;&gt;Resource Usage / Throughput&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#latency-tracking&quot; id=&quot;markdown-toc-latency-tracking&quot;&gt;Latency Tracking&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#conclusion&quot; id=&quot;markdown-toc-conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;monitoring&quot;&gt;Monitoring&lt;/h2&gt;
&lt;p&gt;Probably the most important part of network monitoring is &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/back_pressure.html&quot;&gt;monitoring backpressure&lt;/a&gt;, a situation where a system is receiving data at a higher rate than it can process¹. Such behaviour will result in the sender being backpressured and may be caused by two things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The receiver is slow.&lt;br /&gt;
This can happen because the receiver is backpressured itself, is unable to keep processing at the same rate as the sender, or is temporarily blocked by garbage collection, lack of system resources, or I/O.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The network channel is slow.&lt;br /&gt;
Even though in such case the receiver is not (directly) involved, we call the sender backpressured due to a potential oversubscription on network bandwidth shared by all subtasks running on the same machine. Beware that, in addition to Flink’s network stack, there may be more network users, such as sources and sinks, distributed file systems (checkpointing, network-attached storage), logging, and metrics. A previous &lt;a href=&quot;https://www.ververica.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines&quot;&gt;capacity planning blog post&lt;/a&gt; provides some more insights.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;sup&gt;1&lt;/sup&gt; In case you are unfamiliar with backpressure and how it interacts with Flink, we recommend reading through &lt;a href=&quot;https://www.ververica.com/blog/how-flink-handles-backpressure&quot;&gt;this blog post on backpressure&lt;/a&gt; from 2015.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
If backpressure occurs, it will bubble upstream and eventually reach your sources and slow them down. This is not a bad thing per-se and merely states that you lack resources for the current load. However, you may want to improve your job so that it can cope with higher loads without using more resources. In order to do so, you need to find (1) where (at which task/operator) the bottleneck is and (2) what is causing it. Flink offers two mechanisms for identifying where the bottleneck is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;directly via Flink’s web UI and its backpressure monitor, or&lt;/li&gt;
&lt;li&gt;indirectly through some of the network metrics.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Flink’s web UI is likely the first entry point for a quick troubleshooting but has some disadvantages that we will explain below. On the other hand, Flink’s network metrics are better suited for continuous monitoring and reasoning about the exact nature of the bottleneck causing backpressure. We will cover both in the sections below. In both cases, you need to identify the origin of backpressure from the sources to the sinks. Your starting point for the current and future investigations will most likely be the operator after the last one that is experiencing backpressure. This specific operator is also highly likely to cause the backpressure in the first place.&lt;/p&gt;
&lt;h3 id=&quot;backpressure-monitor&quot;&gt;Backpressure Monitor&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/back_pressure.html&quot;&gt;backpressure monitor&lt;/a&gt; is only exposed via Flink’s web UI². Since it’s an active component that is only triggered on request, it is currently not available via metrics. The backpressure monitor samples the running tasks’ threads on all TaskManagers via &lt;code&gt;Thread.getStackTrace()&lt;/code&gt; and computes the number of samples where tasks were blocked on a buffer request. These tasks were either unable to send network buffers at the rate they were produced, or the downstream task(s) were slow at processing them and gave no credits for sending. The backpressure monitor will show the ratio of blocked to total requests. Since some backpressure is considered normal / temporary, it will show a status of&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;span style=&quot;color:green&quot;&gt;OK&lt;/span&gt; for &lt;code&gt;ratio ≤ 0.10&lt;/code&gt;,&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;color:orange&quot;&gt;LOW&lt;/span&gt; for &lt;code&gt;0.10 &amp;lt; Ratio ≤ 0.5&lt;/code&gt;, and&lt;/li&gt;
&lt;li&gt;&lt;span style=&quot;color:red&quot;&gt;HIGH&lt;/span&gt; for &lt;code&gt;0.5 &amp;lt; Ratio ≤ 1&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Although you can tune things like the refresh-interval, the number of samples, or the delay between samples, normally, you would not need to touch these since the defaults already give good-enough results.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-07-23-network-stack-2/back_pressure_sampling_high.png&quot; width=&quot;600px&quot; alt=&quot;Backpressure sampling:high&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;sup&gt;2&lt;/sup&gt; You may also access the backpressure monitor via the REST API: &lt;code&gt;/jobs/:jobid/vertices/:vertexid/backpressure&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
The backpressure monitor can help you find where (at which task/operator) backpressure originates from. However, it does not support you in further reasoning about the causes of it. Additionally, for larger jobs or higher parallelism, the backpressure monitor becomes too crowded to use and may also take some time to gather all information from all TaskManagers. Please also note that sampling may affect your running job’s performance.&lt;/p&gt;
&lt;h2 id=&quot;network-metrics&quot;&gt;Network Metrics&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#network&quot;&gt;Network&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#io&quot;&gt;task I/O&lt;/a&gt; metrics are more lightweight than the backpressure monitor and are continuously published for each running job. We can leverage those and get even more insights, not only for backpressure monitoring. The most relevant metrics for users are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style=&quot;color:orange&quot;&gt;up to Flink 1.8:&lt;/span&gt;&lt;/strong&gt; &lt;code&gt;outPoolUsage&lt;/code&gt;, &lt;code&gt;inPoolUsage&lt;/code&gt;&lt;br /&gt;
An estimate on the ratio of buffers used vs. buffers available in the respective local buffer pools.
While interpreting &lt;code&gt;inPoolUsage&lt;/code&gt; in Flink 1.5 - 1.8 with credit-based flow control, please note that this only relates to floating buffers (exclusive buffers are not part of the pool).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style=&quot;color:green&quot;&gt;Flink 1.9 and above:&lt;/span&gt;&lt;/strong&gt; &lt;code&gt;outPoolUsage&lt;/code&gt;, &lt;code&gt;inPoolUsage&lt;/code&gt;, &lt;code&gt;floatingBuffersUsage&lt;/code&gt;, &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt;&lt;br /&gt;
An estimate on the ratio of buffers used vs. buffers available in the respective local buffer pools.
Starting with Flink 1.9, &lt;code&gt;inPoolUsage&lt;/code&gt; is the sum of &lt;code&gt;floatingBuffersUsage&lt;/code&gt; and &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;numRecordsOut&lt;/code&gt;, &lt;code&gt;numRecordsIn&lt;/code&gt;&lt;br /&gt;
Each metric comes with two scopes: one scoped to the operator and one scoped to the subtask. For network monitoring, the subtask-scoped metric is relevant and shows the total number of records it has sent/received. You may need to further look into these figures to extract the number of records within a certain time span or use the equivalent &lt;code&gt;…PerSecond&lt;/code&gt; metrics.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;numBytesOut&lt;/code&gt;, &lt;code&gt;numBytesInLocal&lt;/code&gt;, &lt;code&gt;numBytesInRemote&lt;/code&gt;&lt;br /&gt;
The total number of bytes this subtask has emitted or read from a local/remote source. These are also available as meters via &lt;code&gt;…PerSecond&lt;/code&gt; metrics.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;numBuffersOut&lt;/code&gt;, &lt;code&gt;numBuffersInLocal&lt;/code&gt;, &lt;code&gt;numBuffersInRemote&lt;/code&gt;&lt;br /&gt;
Similar to &lt;code&gt;numBytes…&lt;/code&gt; but counting the number of network buffers.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;alert alert-warning&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-warning&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Warning&lt;/span&gt;
For the sake of completeness and since they have been used in the past, we will briefly look at the &lt;code&gt;outputQueueLength&lt;/code&gt; and &lt;code&gt;inputQueueLength&lt;/code&gt; metrics. These are somewhat similar to the &lt;code&gt;[out,in]PoolUsage&lt;/code&gt; metrics but show the number of buffers sitting in a sender subtask’s output queues and in a receiver subtask’s input queues, respectively. Reasoning about absolute numbers of buffers, however, is difficult and there is also a special subtlety with local channels: since a local input channel does not have its own queue (it works with the output queue directly), its value will always be &lt;code&gt;0&lt;/code&gt; for that channel (see &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12576&quot;&gt;FLINK-12576&lt;/a&gt;) and for the case where you only have local input channels, then &lt;code&gt;inputQueueLength = 0&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Overall, &lt;strong&gt;we discourage the use of&lt;/strong&gt; &lt;code&gt;outputQueueLength&lt;/code&gt; &lt;strong&gt;and&lt;/strong&gt; &lt;code&gt;inputQueueLength&lt;/code&gt; because their interpretation highly depends on the current parallelism of the operator and the configured numbers of exclusive and floating buffers. Instead, we recommend using the various &lt;code&gt;*PoolUsage&lt;/code&gt; metrics which even reveal more detailed insight.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
If you reason about buffer usage, please keep the following in mind:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Any outgoing channel which has been used at least once will always occupy one buffer (since Flink 1.5).
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;span style=&quot;color:orange&quot;&gt;up to Flink 1.8:&lt;/span&gt;&lt;/strong&gt; This buffer (even if empty!) was always counted as a backlog of 1 and thus receivers tried to reserve a floating buffer for it.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;span style=&quot;color:green&quot;&gt;Flink 1.9 and above:&lt;/span&gt;&lt;/strong&gt; A buffer is only counted in the backlog if it is ready for consumption, i.e. it is full or was flushed (see FLINK-11082)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;The receiver will only release a received buffer after deserialising the last record in it.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;The following sections make use of and combine these metrics to reason about backpressure and resource usage / efficiency with respect to throughput. A separate section will detail latency related metrics.&lt;/p&gt;
&lt;h3 id=&quot;backpressure&quot;&gt;Backpressure&lt;/h3&gt;
&lt;p&gt;Backpressure may be indicated by two different sets of metrics: (local) buffer pool usages as well as input/output queue lengths. They provide a different level of granularity but, unfortunately, none of these are exhaustive and there is room for interpretation. Because of the inherent problems with interpreting these queue lengths we will focus on the usage of input and output pools below which also provides more detail.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;If a subtask’s&lt;/strong&gt; &lt;code&gt;outPoolUsage&lt;/code&gt; &lt;strong&gt;is 100%&lt;/strong&gt;, it is backpressured. Whether the subtask is already blocking or still writing records into network buffers depends on how full the buffers are that the &lt;code&gt;RecordWriters&lt;/code&gt; are currently writing into.&lt;br /&gt;
&lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;&quot;&gt;&lt;/span&gt; This is different to what the backpressure monitor is showing!&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;An &lt;code&gt;inPoolUsage&lt;/code&gt; of 100% means that all floating buffers are assigned to channels and eventually backpressure will be exercised upstream. These floating buffers are in either of the following conditions: they are reserved for future use on a channel due to an exclusive buffer being utilised (remote input channels always try to maintain &lt;code&gt;#exclusive buffers&lt;/code&gt; credits), they are reserved for a sender’s backlog and wait for data, they may contain data and are enqueued in an input channel, or they may contain data and are being read by the receiver’s subtask (one record at a time).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style=&quot;color:orange&quot;&gt;up to Flink 1.8:&lt;/span&gt;&lt;/strong&gt; Due to &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11082&quot;&gt;FLINK-11082&lt;/a&gt;, an &lt;code&gt;inPoolUsage&lt;/code&gt; of 100% is quite common even in normal situations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;span style=&quot;color:green&quot;&gt;Flink 1.9 and above:&lt;/span&gt;&lt;/strong&gt; If &lt;code&gt;inPoolUsage&lt;/code&gt; is constantly around 100%, this is a strong indicator for exercising backpressure upstream.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The following table summarises all combinations and their interpretation. Bear in mind, though, that backpressure may be minor or temporary (no need to look into it), on particular channels only, or caused by other JVM processes on a particular TaskManager, such as GC, synchronisation, I/O, resource shortage, instead of a specific subtask.&lt;/p&gt;
&lt;center&gt;
&lt;table class=&quot;tg&quot;&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th class=&quot;tg-center&quot;&gt;&lt;code&gt;outPoolUsage&lt;/code&gt; low&lt;/th&gt;
&lt;th class=&quot;tg-center&quot;&gt;&lt;code&gt;outPoolUsage&lt;/code&gt; high&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;tg-top&quot;&gt;&lt;code&gt;inPoolUsage&lt;/code&gt; low&lt;/th&gt;
&lt;td class=&quot;tg-topcenter&quot;&gt;
&lt;span class=&quot;glyphicon glyphicon-ok-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:green;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td class=&quot;tg-topcenter&quot;&gt;
&lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
(backpressured, temporary situation: upstream is not backpressured yet or not anymore)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;tg-top&quot; rowspan=&quot;2&quot;&gt;
&lt;code&gt;inPoolUsage&lt;/code&gt; high&lt;br /&gt;
(&lt;strong&gt;&lt;span style=&quot;color:green&quot;&gt;Flink 1.9+&lt;/span&gt;&lt;/strong&gt;)&lt;/th&gt;
&lt;td class=&quot;tg-topcenter&quot;&gt;
if all upstream tasks’ &lt;code&gt;outPoolUsage&lt;/code&gt; are low: &lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
(may eventually cause backpressure)&lt;/td&gt;
&lt;td class=&quot;tg-topcenter&quot; rowspan=&quot;2&quot;&gt;
&lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
(backpressured by downstream task(s) or network, probably forwarding backpressure upstream)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;tg-topcenter&quot;&gt;if any upstream task’s &lt;code&gt;outPoolUsage&lt;/code&gt; is high: &lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
(may exercise backpressure upstream and may be the source of backpressure)&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;
We may even reason more about the cause of backpressure by looking at the network metrics of the subtasks of two consecutive tasks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If all subtasks of the receiver task have low &lt;code&gt;inPoolUsage&lt;/code&gt; values and any upstream subtask’s &lt;code&gt;outPoolUsage&lt;/code&gt; is high, then there may be a network bottleneck causing backpressure.
Since the network is a shared resource among all subtasks of a TaskManager, the bottleneck may not directly originate from this subtask, but rather from various concurrent operations, e.g. checkpoints, other streams, external connections, or other TaskManagers/processes on the same machine.&lt;/li&gt;
&lt;/ul&gt;
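&lt;p&gt;If you want to track these pool usage metrics continuously, for example to feed an external dashboard or alerting system, you can either attach one of Flink’s metric reporters or poll the monitoring REST API. The following is a minimal sketch of the latter, assuming the vertex-level metrics endpoint and subtask-prefixed metric names (e.g. &lt;code&gt;0.buffers.inPoolUsage&lt;/code&gt;); both the endpoint layout and the exact metric identifiers should be verified against the REST API and metrics documentation of your Flink version.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PoolUsageProbe {
    public static void main(String[] args) throws Exception {
        // All values below are placeholders for illustration only.
        String restBase = &amp;quot;http://localhost:8081&amp;quot;;   // JobManager REST endpoint
        String jobId    = &amp;quot;&amp;lt;job-id&amp;gt;&amp;quot;;                // e.g. taken from /jobs/overview
        String vertexId = &amp;quot;&amp;lt;vertex-id&amp;gt;&amp;quot;;             // e.g. taken from the job details
        // Assumed metric names for subtask 0; check the metric list of your Flink version.
        String metrics  = &amp;quot;0.buffers.inPoolUsage,0.buffers.outPoolUsage&amp;quot;;

        URL url = new URL(restBase + &amp;quot;/jobs/&amp;quot; + jobId + &amp;quot;/vertices/&amp;quot; + vertexId
                + &amp;quot;/metrics?get=&amp;quot; + metrics);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            // The endpoint returns a JSON array of id/value pairs, one per requested metric.
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;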
&lt;p&gt;Backpressure can also be caused by all parallel instances of a task or by a single task instance. The former usually happens because the task is performing some time-consuming operation that applies to all input partitions. The latter is usually the result of some kind of skew, either data skew or resource availability/allocation skew. In either case, you can find some hints on how to handle such situations in the &lt;a href=&quot;#span-classlabel-label-info-styledisplay-inline-blockspan-classglyphicon-glyphicon-info-sign-aria-hiddentruespan-what-to-do-with-backpressurespan&quot;&gt;What to do with backpressure?&lt;/a&gt; box below.&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;h3 class=&quot;no_toc&quot; id=&quot;span-classglyphicon-glyphicon-info-sign-aria-hiddentruespan-flink-19-and-above&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Flink 1.9 and above&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;If &lt;code&gt;floatingBuffersUsage&lt;/code&gt; is not 100%, it is unlikely that there is backpressure. If it is 100% and any upstream task is backpressured, it suggests that this input is exercising backpressure on a single input channel, on some of them, or on all of them. To differentiate between those three situations, you can use &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt;:
&lt;ul&gt;
&lt;li&gt;Assuming that &lt;code&gt;floatingBuffersUsage&lt;/code&gt; is around 100%, the higher the &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt; the more input channels are backpressured. In an extreme case of &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt; being close to 100%, it means that all channels are backpressured.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;br /&gt;
The relation between &lt;code&gt;exclusiveBuffersUsage&lt;/code&gt;, &lt;code&gt;floatingBuffersUsage&lt;/code&gt;, and the upstream tasks’ &lt;code&gt;outPoolUsage&lt;/code&gt; is summarised in the following table, which extends the table above with &lt;code&gt;inPoolUsage = floatingBuffersUsage + exclusiveBuffersUsage&lt;/code&gt;:&lt;/p&gt;
&lt;center&gt;
&lt;table class=&quot;tg&quot;&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;exclusiveBuffersUsage&lt;/code&gt; low&lt;/th&gt;
&lt;th&gt;&lt;code&gt;exclusiveBuffersUsage&lt;/code&gt; high&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;tg-top&quot; style=&quot;min-width:33%;&quot;&gt;
&lt;code&gt;floatingBuffersUsage&lt;/code&gt; low +&lt;br /&gt;
&lt;em&gt;all&lt;/em&gt; upstream &lt;code&gt;outPoolUsage&lt;/code&gt; low&lt;/th&gt;
&lt;td class=&quot;tg-center&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-ok-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:green;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;/td&gt;
&lt;td class=&quot;tg-center&quot;&gt;-&lt;sup&gt;3&lt;/sup&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;tg-top&quot; style=&quot;min-width:33%;&quot;&gt;
&lt;code&gt;floatingBuffersUsage&lt;/code&gt; low +&lt;br /&gt;
&lt;em&gt;any&lt;/em&gt; upstream &lt;code&gt;outPoolUsage&lt;/code&gt; high&lt;/th&gt;
&lt;td class=&quot;tg-center&quot;&gt;
&lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
(potential network bottleneck)&lt;/td&gt;
&lt;td class=&quot;tg-center&quot;&gt;-&lt;sup&gt;3&lt;/sup&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;tg-top&quot; style=&quot;min-width:33%;&quot;&gt;
&lt;code&gt;floatingBuffersUsage&lt;/code&gt; high +&lt;br /&gt;
&lt;em&gt;all&lt;/em&gt; upstream &lt;code&gt;outPoolUsage&lt;/code&gt; low&lt;/th&gt;
&lt;td class=&quot;tg-center&quot;&gt;
&lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
(backpressure eventually appears on only some of the input channels)&lt;/td&gt;
&lt;td class=&quot;tg-center&quot;&gt;
&lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:orange;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
(backpressure eventually appears on most or all of the input channels)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;tg-top&quot; style=&quot;min-width:33%;&quot;&gt;
&lt;code&gt;floatingBuffersUsage&lt;/code&gt; high +&lt;br /&gt;
any upstream &lt;code&gt;outPoolUsage&lt;/code&gt; high&lt;/th&gt;
&lt;td class=&quot;tg-center&quot;&gt;
&lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
(backpressure on only some of the input channels)&lt;/td&gt;
&lt;td class=&quot;tg-center&quot;&gt;
&lt;span class=&quot;glyphicon glyphicon-remove-sign&quot; aria-hidden=&quot;true&quot; style=&quot;color:red;font-size:1.5em;&quot;&gt;&lt;/span&gt;&lt;br /&gt;
(backpressure on most or all of the input channels)&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;p&gt;&lt;sup&gt;3&lt;/sup&gt; this should not happen&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id=&quot;resource-usage--throughput&quot;&gt;Resource Usage / Throughput&lt;/h3&gt;
&lt;p&gt;Besides the obvious use of each individual metric mentioned above, there are also a few combinations providing useful insight into what is happening in the network stack:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Low throughput with frequent &lt;code&gt;outPoolUsage&lt;/code&gt; values around 100% but low &lt;code&gt;inPoolUsage&lt;/code&gt; on all receivers is an indicator that the round-trip time of our credit notification (which depends on your network’s latency) is too high for the default number of exclusive buffers to make full use of your bandwidth. Consider increasing the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-buffers-per-channel&quot;&gt;buffers-per-channel&lt;/a&gt; parameter or try disabling credit-based flow control to verify.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Combining &lt;code&gt;numRecordsOut&lt;/code&gt; and &lt;code&gt;numBytesOut&lt;/code&gt; helps identify the average serialised record size, which supports you in capacity planning for peak scenarios (see the sketch after this list).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;If you want to reason about buffer fill rates and the influence of the output flusher, you may combine &lt;code&gt;numBytesInRemote&lt;/code&gt; with &lt;code&gt;numBuffersInRemote&lt;/code&gt; (also shown in the sketch after this list). When tuning for throughput (and not latency!), low buffer fill rates may indicate reduced network efficiency. In such cases, consider increasing the buffer timeout.
Please note that, as of Flink 1.8 and 1.9, &lt;code&gt;numBuffersOut&lt;/code&gt; only increases for buffers getting full or for an event cutting off a buffer (e.g. a checkpoint barrier) and may lag behind. Please also note that reasoning about buffer fill rates on local channels is unnecessary since buffering is an optimisation technique for remote channels with limited effect on local channels.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;You may also separate local from remote traffic using &lt;code&gt;numBytesInLocal&lt;/code&gt; and &lt;code&gt;numBytesInRemote&lt;/code&gt;, but in most cases this is unnecessary.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
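&lt;p&gt;As a small illustration of the two points referenced above, here is a sketch of how such derived values could be computed from sampled counter values, and how the buffer timeout can be raised when tuning for throughput. The sampled numbers are made up for the example.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ThroughputDerivedMetrics {
    public static void main(String[] args) {
        // Hypothetical values sampled from the numRecordsOut, numBytesOut,
        // numBytesInRemote and numBuffersInRemote counters of one subtask.
        long numRecordsOut      = 1_200_000L;
        long numBytesOut        = 96_000_000L;
        long numBytesInRemote   = 64_000_000L;
        long numBuffersInRemote = 4_000L;

        // Average serialised record size, useful for capacity planning.
        double avgRecordSizeBytes = (double) numBytesOut / numRecordsOut;

        // Average buffer fill on remote input channels; compared to the default
        // 32 KiB buffer size, low values hint at reduced network efficiency.
        double avgBufferFillBytes = (double) numBytesInRemote / numBuffersInRemote;

        System.out.printf(&amp;quot;avg record size: %.1f bytes%n&amp;quot;, avgRecordSizeBytes);
        System.out.printf(&amp;quot;avg buffer fill: %.1f bytes%n&amp;quot;, avgBufferFillBytes);

        // When tuning for throughput rather than latency, a higher buffer timeout
        // lets buffers fill up further before being flushed (100 ms is an example).
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setBufferTimeout(100);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;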
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;h3 class=&quot;no_toc&quot; id=&quot;span-classglyphicon-glyphicon-info-sign-aria-hiddentruespan-what-to-do-with-backpressure&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; What to do with Backpressure?&lt;/h3&gt;
&lt;p&gt;Assuming that you identified where the source of backpressure (a bottleneck) is located, the next step is to analyse why this is happening. Below, we list some potential causes of backpressure, from the more basic to the more complex ones. We recommend checking the basic causes first, before diving deeper into the more complex ones and potentially drawing false conclusions.&lt;/p&gt;
&lt;p&gt;Please also recall that backpressure might be temporary and the result of a load spike, checkpointing, or a job restart with a data backlog waiting to be processed. In that case, you can often just ignore it. Alternatively, keep in mind that the process of analysing and solving the issue can be affected by the intermittent nature of your bottleneck. Having said that, here are a couple of things to check.&lt;/p&gt;
&lt;h4 id=&quot;system-resources&quot;&gt;System Resources&lt;/h4&gt;
&lt;p&gt;Firstly, you should check the basic resource usage of the affected machines, such as CPU, network, or disk I/O. If some resource is fully or heavily utilised, you can do one of the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Try to optimise your code. Code profilers are helpful in this case.&lt;/li&gt;
&lt;li&gt;Tune Flink for that specific resource.&lt;/li&gt;
&lt;li&gt;Scale out by increasing the parallelism and/or increasing the number of machines in the cluster (see the snippet after this list).&lt;/li&gt;
&lt;/ol&gt;
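&lt;p&gt;As a tiny illustration of the third option, the parallelism can be raised either for the whole job or only for the operator that turned out to be the bottleneck. This is just a sketch; the parallelism values and the toy pipeline are placeholders.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ScaleOutExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Raise the default parallelism for all operators of the job.
        env.setParallelism(4);

        DataStream&amp;lt;String&amp;gt; words = env.fromElements(&amp;quot;flink&amp;quot;, &amp;quot;network&amp;quot;, &amp;quot;stack&amp;quot;);

        // ...or scale only the bottlenecked operator.
        words.map(String::toUpperCase)
                .setParallelism(8)
                .print();

        env.execute(&amp;quot;scale-out example&amp;quot;);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;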
&lt;h4 id=&quot;garbage-collection&quot;&gt;Garbage Collection&lt;/h4&gt;
&lt;p&gt;Oftentimes, performance issues arise from long GC pauses. You can verify whether you are in such a situation by either printing debug GC logs (via &lt;code&gt;-XX:+PrintGCDetails&lt;/code&gt;) or by using some memory/GC profilers. Since dealing with GC issues is highly application-dependent and independent of Flink, we will not go into details here (&lt;a href=&quot;https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/index.html&quot;&gt;Oracle’s Garbage Collection Tuning Guide&lt;/a&gt; or &lt;a href=&quot;https://plumbr.io/java-garbage-collection-handbook&quot;&gt;Plumbr’s Java Garbage Collection handbook&lt;/a&gt; seem like a good start).&lt;/p&gt;
&lt;h4 id=&quot;cputhread-bottleneck&quot;&gt;CPU/Thread Bottleneck&lt;/h4&gt;
&lt;p&gt;Sometimes a CPU bottleneck is not visible at first glance because only one or a few threads are causing it while the overall CPU usage of the machine remains relatively low. For instance, a single CPU-bottlenecked thread on a 48-core machine would result in only 2% CPU use. Consider using code profilers for this as they can identify hot threads by showing each thread’s CPU usage, for example.&lt;/p&gt;
&lt;h4 id=&quot;thread-contention&quot;&gt;Thread Contention&lt;/h4&gt;
&lt;p&gt;Similarly to the CPU/thread bottleneck issue above, a subtask may be bottlenecked due to high thread contention on shared resources. Again, CPU profilers are your best friend here! Consider looking for synchronisation overhead / lock contention in user code — although adding synchronisation in user code should be avoided and may even be dangerous! Also consider investigating shared system resources. The default JVM’s SSL implementation, for example, can become contended around the shared &lt;code&gt;/dev/urandom&lt;/code&gt; resource.&lt;/p&gt;
&lt;h4 id=&quot;load-imbalance&quot;&gt;Load Imbalance&lt;/h4&gt;
&lt;p&gt;If your bottleneck is caused by data skew, you can try to remove it or mitigate its impact by changing the data partitioning to separate heavy keys or by implementing local/pre-aggregation.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
This list is far from exhaustive. Generally, in order to reduce a bottleneck and thus backpressure, first analyse where it is happening and then find out why. The best place to start reasoning about the “why” is by checking what resources are fully utilised.&lt;/p&gt;
&lt;/div&gt;
&lt;h3 id=&quot;latency-tracking&quot;&gt;Latency Tracking&lt;/h3&gt;
&lt;p&gt;Tracking latencies at the various locations they may occur is a topic of its own. In this section, we will focus on the time records wait inside Flink’s network stack — including the system’s network connections. In low throughput scenarios, these latencies are influenced directly by the output flusher via the buffer timeout parameter or indirectly by any application code latencies. When processing a record takes longer than expected or when (multiple) timers fire at the same time — and block the receiver from processing incoming records — the time inside the network stack for following records is extended dramatically. We highly recommend adding your own metrics to your Flink job for better latency tracking in your job’s components and a broader view on the cause of delays.&lt;/p&gt;
&lt;p&gt;Flink offers some support for &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/metrics.html#latency-tracking&quot;&gt;tracking the latency&lt;/a&gt; of records passing through the system (outside of user code). However, this is disabled by default (see below why!) and must be enabled by setting a latency tracking interval either in Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#metrics-latency-interval&quot;&gt;configuration via &lt;code&gt;metrics.latency.interval&lt;/code&gt;&lt;/a&gt; or via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionConfig.html#setLatencyTrackingInterval-long-&quot;&gt;ExecutionConfig#setLatencyTrackingInterval()&lt;/a&gt;. Once enabled, Flink will collect latency histograms based on the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#metrics-latency-granularity&quot;&gt;granularity defined via &lt;code&gt;metrics.latency.granularity&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;single&lt;/code&gt;: one histogram for each operator subtask&lt;/li&gt;
&lt;li&gt;&lt;code&gt;operator&lt;/code&gt; (default): one histogram for each combination of source task and operator subtask&lt;/li&gt;
&lt;li&gt;&lt;code&gt;subtask&lt;/code&gt;: one histogram for each combination of source subtask and operator subtask (quadratic in the parallelism!)&lt;/li&gt;
&lt;/ul&gt;
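&lt;p&gt;If you prefer to enable latency tracking programmatically rather than through the configuration file, a minimal sketch using the &lt;code&gt;ExecutionConfig&lt;/code&gt; method linked above could look as follows. The 1000 ms interval and the trivial pipeline are only examples; the granularity is still taken from &lt;code&gt;metrics.latency.granularity&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LatencyTrackingSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Emit a latency marker from every source subtask once per second,
        // equivalent to setting metrics.latency.interval: 1000 in the configuration.
        env.getConfig().setLatencyTrackingInterval(1000L);

        // Trivial pipeline so that the example actually runs; replace with your job.
        env.fromElements(1, 2, 3).print();

        env.execute(&amp;quot;latency-tracked job&amp;quot;);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;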
&lt;p&gt;These metrics are collected through special “latency markers”: each source subtask will periodically emit a special record containing the timestamp of its creation. The latency markers then flow alongside normal records while not overtaking them on the wire or inside a buffer queue. However, &lt;em&gt;a latency marker does not enter application logic&lt;/em&gt; and overtakes records there. Latency markers therefore only measure the waiting time between user code invocations and not a full “end-to-end” latency. User code indirectly influences these waiting times, though!&lt;/p&gt;
&lt;p&gt;Since &lt;code&gt;LatencyMarkers&lt;/code&gt; sit in network buffers just like normal records, they will also wait for the buffer to become full or to be flushed due to a buffer timeout. When a channel is under high load, the network’s buffering of data adds no latency. However, as soon as one channel is under low load, records and latency markers will experience an expected average delay of at most &lt;code&gt;buffer_timeout / 2&lt;/code&gt;. This delay adds to each network connection towards a subtask and should be taken into account when analysing a subtask’s latency metric.&lt;/p&gt;
&lt;p&gt;By looking at the exposed latency tracking metrics for each subtask, for example at the 95th percentile, you should nevertheless be able to identify subtasks which are adding substantially to the overall source-to-sink latency and continue with optimising there.&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
Flink’s latency markers assume that the clocks on all machines in the cluster are in sync. We recommend setting up an automated clock synchronisation service (like NTP) to avoid false latency results.&lt;/p&gt;
&lt;/div&gt;
&lt;div class=&quot;alert alert-warning&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-warning&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-warning-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Warning&lt;/span&gt;
Enabling latency metrics can significantly impact the performance of the cluster (in particular for &lt;code&gt;subtask&lt;/code&gt; granularity) due to the sheer amount of metrics being added as well as the use of histograms which are quite expensive to maintain. It is highly recommended to only use them for debugging purposes.&lt;/p&gt;
&lt;/div&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In the previous sections we discussed how to monitor Flink’s network stack, which primarily involves identifying backpressure: where it occurs, where it originates from, and (potentially) why it occurs. This can be done in two ways: for simple cases and debugging sessions, by using the backpressure monitor; for continuous monitoring, more in-depth analysis, and less runtime overhead, by using Flink’s task and network stack metrics. Backpressure can be caused by the network layer itself but, in most cases, is caused by some subtask under high load. These two scenarios can be distinguished from one another by analysing the metrics as described above. We also provided some hints on monitoring resource usage and on tracking network latencies that may add up from sources to sinks.&lt;/p&gt;
&lt;p&gt;Stay tuned for the third blog post in the series of network stack posts that will focus on tuning techniques and anti-patterns to avoid.&lt;/p&gt;
</description>
<pubDate>Tue, 23 Jul 2019 17:30:00 +0200</pubDate>
<link>https://flink.apache.org/2019/07/23/flink-network-stack-2.html</link>
<guid isPermaLink="true">/2019/07/23/flink-network-stack-2.html</guid>
</item>
<item>
<title>Apache Flink 1.8.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.8 series.&lt;/p&gt;
&lt;p&gt;This release includes more than 40 fixes and minor improvements for Flink 1.8.1. The list below includes a detailed overview of all improvements, sub-tasks and bug fixes.&lt;/p&gt;
&lt;p&gt;We highly recommend all users upgrade to Flink 1.8.1.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.8.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10921&quot;&gt;FLINK-10921&lt;/a&gt;] - Prioritize shard consumers in Kinesis Consumer by event time
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12617&quot;&gt;FLINK-12617&lt;/a&gt;] - StandaloneJobClusterEntrypoint should default to random JobID for non-HA setups
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9445&quot;&gt;FLINK-9445&lt;/a&gt;] - scala-shell uses plain java command
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10455&quot;&gt;FLINK-10455&lt;/a&gt;] - Potential Kafka producer leak in case of failures
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10941&quot;&gt;FLINK-10941&lt;/a&gt;] - Slots prematurely released which still contain unconsumed data
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11059&quot;&gt;FLINK-11059&lt;/a&gt;] - JobMaster may continue using an invalid slot if releasing idle slot meet a timeout
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11107&quot;&gt;FLINK-11107&lt;/a&gt;] - Avoid memory stateBackend to create arbitrary folders under HA path when no checkpoint path configured
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11897&quot;&gt;FLINK-11897&lt;/a&gt;] - ExecutionGraphSuspendTest does not wait for all tasks to be submitted
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11915&quot;&gt;FLINK-11915&lt;/a&gt;] - DataInputViewStream skip returns wrong value
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11987&quot;&gt;FLINK-11987&lt;/a&gt;] - Kafka producer occasionally throws NullpointerException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12009&quot;&gt;FLINK-12009&lt;/a&gt;] - Wrong check message about heartbeat interval for HeartbeatServices
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12042&quot;&gt;FLINK-12042&lt;/a&gt;] - RocksDBStateBackend mistakenly uses default filesystem
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12112&quot;&gt;FLINK-12112&lt;/a&gt;] - AbstractTaskManagerProcessFailureRecoveryTest process output logging does not work properly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12132&quot;&gt;FLINK-12132&lt;/a&gt;] - The example in /docs/ops/deployment/yarn_setup.md should be updated due to the change FLINK-2021
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12184&quot;&gt;FLINK-12184&lt;/a&gt;] - HistoryServerArchiveFetcher isn&amp;#39;t compatible with old version
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12219&quot;&gt;FLINK-12219&lt;/a&gt;] - Yarn application can&amp;#39;t stop when flink job failed in per-job yarn cluster mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12247&quot;&gt;FLINK-12247&lt;/a&gt;] - fix NPE when writing an archive file to a FileSystem
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12260&quot;&gt;FLINK-12260&lt;/a&gt;] - Slot allocation failure by taskmanager registration timeout and race
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12296&quot;&gt;FLINK-12296&lt;/a&gt;] - Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12297&quot;&gt;FLINK-12297&lt;/a&gt;] - Make ClosureCleaner recursive
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12301&quot;&gt;FLINK-12301&lt;/a&gt;] - Scala value classes inside case classes cannot be serialized anymore in Flink 1.8.0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12342&quot;&gt;FLINK-12342&lt;/a&gt;] - Yarn Resource Manager Acquires Too Many Containers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12375&quot;&gt;FLINK-12375&lt;/a&gt;] - flink-container job jar does not have read permissions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12416&quot;&gt;FLINK-12416&lt;/a&gt;] - Docker build script fails on symlink creation ln -s
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12544&quot;&gt;FLINK-12544&lt;/a&gt;] - Deadlock while releasing memory and requesting segment concurrent in SpillableSubpartition
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12547&quot;&gt;FLINK-12547&lt;/a&gt;] - Deadlock when the task thread downloads jars using BlobClient
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12646&quot;&gt;FLINK-12646&lt;/a&gt;] - Use reserved IP as unrouteable IP in RestClientTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12688&quot;&gt;FLINK-12688&lt;/a&gt;] - Make serializer lazy initialization thread safe in StateDescriptor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12740&quot;&gt;FLINK-12740&lt;/a&gt;] - SpillableSubpartitionTest deadlocks on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12835&quot;&gt;FLINK-12835&lt;/a&gt;] - Time conversion is wrong in ManualClock
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12863&quot;&gt;FLINK-12863&lt;/a&gt;] - Race condition between slot offerings and AllocatedSlotReport
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12865&quot;&gt;FLINK-12865&lt;/a&gt;] - State inconsistency between RM and TM on the slot status
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12871&quot;&gt;FLINK-12871&lt;/a&gt;] - Wrong SSL setup examples in docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12895&quot;&gt;FLINK-12895&lt;/a&gt;] - TaskManagerProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure failed on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12896&quot;&gt;FLINK-12896&lt;/a&gt;] - TaskCheckpointStatisticDetailsHandler uses wrong value for JobID when archiving
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11126&quot;&gt;FLINK-11126&lt;/a&gt;] - Filter out AMRMToken in the TaskManager credentials
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12137&quot;&gt;FLINK-12137&lt;/a&gt;] - Add more proper explanation on flink streaming connectors
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12169&quot;&gt;FLINK-12169&lt;/a&gt;] - Improve Javadoc of MessageAcknowledgingSourceBase
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12378&quot;&gt;FLINK-12378&lt;/a&gt;] - Consolidate FileSystem Documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12391&quot;&gt;FLINK-12391&lt;/a&gt;] - Add timeout to transfer.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12539&quot;&gt;FLINK-12539&lt;/a&gt;] - StreamingFileSink: Make the class extendable to customize for different usecases
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Test
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12350&quot;&gt;FLINK-12350&lt;/a&gt;] - RocksDBStateBackendTest doesn&amp;#39;t cover the incremental checkpoint code path
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-12460&quot;&gt;FLINK-12460&lt;/a&gt;] - Change taskmanager.tmp.dirs to io.tmp.dirs in configuration docs
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Tue, 02 Jul 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2019/07/02/release-1.8.1.html</link>
<guid isPermaLink="true">/news/2019/07/02/release-1.8.1.html</guid>
</item>
<item>
<title>A Practical Guide to Broadcast State in Apache Flink</title>
<description>&lt;p&gt;Since version 1.5.0, Apache Flink features a new type of state which is called Broadcast State. In this post, we explain what Broadcast State is, and show an example of how it can be applied to an application that evaluates dynamic patterns on an event stream. We walk you through the processing steps and the source code to implement this application in practice.&lt;/p&gt;
&lt;h2 id=&quot;what-is-broadcast-state&quot;&gt;What is Broadcast State?&lt;/h2&gt;
&lt;p&gt;The Broadcast State can be used to combine and jointly process two streams of events in a specific way. The events of the first stream are broadcasted to all parallel instances of an operator, which maintains them as state. The events of the other stream are not broadcasted but sent to individual instances of the same operator and processed together with the events of the broadcasted stream.
The new broadcast state is a natural fit for applications that need to join a low-throughput and a high-throughput stream or need to dynamically update their processing logic. We will use a concrete example of the latter use case to explain the broadcast state and show its API in more detail in the remainder of this post.&lt;/p&gt;
&lt;h2 id=&quot;dynamic-pattern-evaluation-with-broadcast-state&quot;&gt;Dynamic Pattern Evaluation with Broadcast State&lt;/h2&gt;
&lt;p&gt;Imagine an e-commerce website that captures the interactions of all users as a stream of user actions. The company that operates the website is interested in analyzing the interactions to increase revenue, improve the user experience, and detect and prevent malicious behavior.
The website implements a streaming application that detects a pattern on the stream of user events. However, the company wants to avoid modifying and redeploying the application every time the pattern changes. Instead, the application ingests a second stream of patterns and updates its active pattern when it receives a new pattern from the pattern stream. In the following, we discuss this application step-by-step and show how it leverages the broadcast state feature in Apache Flink.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/broadcastState/fig1.png&quot; width=&quot;600px&quot; alt=&quot;Broadcast State in Apache Flink.&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Our example application ingests two data streams. The first stream provides user actions on the website and is illustrated on the top left side of the above figure. A user interaction event consists of the type of the action (user login, user logout, add to cart, or complete payment) and the id of the user, which is encoded by color. The user action event stream in our illustration contains a logout action of User 1001 followed by a payment-complete event for User 1003, and an “add-to-cart” action of User 1002.&lt;/p&gt;
&lt;p&gt;The second stream provides action patterns that the application will evaluate. A pattern consists of two consecutive actions. In the figure above, the pattern stream contains the following two:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Pattern #1: A user logs in and immediately logs out without browsing additional pages on the e-commerce website.&lt;/li&gt;
&lt;li&gt;Pattern #2: A user adds an item to the shopping cart and logs out without completing the purchase.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Such patterns help a business in better analyzing user behavior, detecting malicious actions, and improving the website experience. For example, in the case of items being added to a shopping cart with no follow-up purchase, the website team can take appropriate actions to understand better the reasons why users don’t complete a purchase and initiate specific programs to improve the website conversion (such as providing discount codes, limited free shipping offers, etc.)&lt;/p&gt;
&lt;p&gt;On the right-hand side, the figure shows three parallel tasks of an operator that ingest the pattern and user action streams, evaluate the patterns on the action stream, and emit pattern matches downstream. For the sake of simplicity, the operator in our example only evaluates a single pattern with exactly two subsequent actions. The currently active pattern is replaced when a new pattern is received from the pattern stream. In principle, the operator could also be implemented to evaluate more complex patterns or multiple patterns concurrently which could be individually added or removed.&lt;/p&gt;
&lt;p&gt;We will describe how the pattern matching application processes the user action and pattern streams.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/broadcastState/fig2.png&quot; width=&quot;600px&quot; alt=&quot;Broadcast State in Apache Flink.&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;First a pattern is sent to the operator. The pattern is broadcasted to all three parallel tasks of the operator. The tasks store the pattern in their broadcast state. Since the broadcast state should only be updated using broadcasted data, the state of all tasks is always expected to be the same.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/broadcastState/fig3.png&quot; width=&quot;600px&quot; alt=&quot;Broadcast State in Apache Flink.&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Next, the first user actions are partitioned on the user id and shipped to the operator tasks. The partitioning ensures that all actions of the same user are processed by the same task. The figure above shows the state of the application after the first pattern and the first three action events were consumed by the operator tasks.&lt;/p&gt;
&lt;p&gt;When a task receives a new user action, it evaluates the currently active pattern by looking at the user’s latest and previous actions. For each user, the operator stores the previous action in the keyed state. Since the tasks in the figure above only received a single action for each user so far (we just started the application), the pattern does not need to be evaluated. Finally, the previous action in the user’s keyed state is updated to the latest action, to be able to look it up when the next action of the same user arrives.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/broadcastState/fig4.png&quot; width=&quot;600px&quot; alt=&quot;Broadcast State in Apache Flink.&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;After the first three actions are processed, the next event, the logout action of User 1001, is shipped to the task that processes the events of User 1001. When the task receives the action, it looks up the current pattern from the broadcast state and the previous action of User 1001. Since the pattern matches both actions, the task emits a pattern match event. Finally, the task updates its keyed state by overriding the previous event with the latest action.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/broadcastState/fig5.png&quot; width=&quot;600px&quot; alt=&quot;Broadcast State in Apache Flink.&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;When a new pattern arrives in the pattern stream, it is broadcasted to all tasks and each task updates its broadcast state by replacing the current pattern with the new one.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/broadcastState/fig6.png&quot; width=&quot;600px&quot; alt=&quot;Broadcast State in Apache Flink.&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Once the broadcast state is updated with a new pattern, the matching logic continues as before, i.e., user action events are partitioned by key and evaluated by the responsible task.&lt;/p&gt;
&lt;h2 id=&quot;how-to-implement-an-application-with-broadcast-state&quot;&gt;How to Implement an Application with Broadcast State?&lt;/h2&gt;
&lt;p&gt;Until now, we conceptually discussed the application and explained how it uses broadcast state to evaluate dynamic patterns over event streams. Next, we’ll show how to implement the example application with Flink’s DataStream API and the broadcast state feature.&lt;/p&gt;
&lt;p&gt;Let’s start with the input data of the application. We have two data streams, actions and patterns. At this point, we don’t really care where the streams come from. The streams could be ingested from Apache Kafka or Kinesis or any other system.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;actions&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;???&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;patterns&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;???&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;code&gt;Action&lt;/code&gt; and &lt;code&gt;Pattern&lt;/code&gt; are Pojos with two fields each:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Action: Long userId, String action&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;Pattern: String firstAction, String secondAction&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
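&lt;p&gt;For completeness, here is a minimal sketch of what these two Pojos could look like. Only the field names and types above are given by the example; the class bodies, including the constructors, are assumptions. Flink treats a class as a Pojo if it is public, has a public no-argument constructor, and all fields are either public or accessible through getters and setters.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Hypothetical Pojo definitions matching the fields listed above
// (in a real project, each class would live in its own file).
public class Action {
    public Long userId;
    public String action;

    public Action() {}  // public no-argument constructor required for Pojos

    public Action(Long userId, String action) {
        this.userId = userId;
        this.action = action;
    }
}

public class Pattern {
    public String firstAction;
    public String secondAction;

    public Pattern() {}

    public Pattern(String firstAction, String secondAction) {
        this.firstAction = firstAction;
        this.secondAction = secondAction;
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With such classes in place, you could, for local testing, create the two streams via &lt;code&gt;env.fromElements(...)&lt;/code&gt; instead of connecting to Kafka or Kinesis.&lt;/p&gt;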
&lt;p&gt;As a first step, we key the action stream on the &lt;code&gt;userId&lt;/code&gt; attribute.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;KeyedStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;actionsByUser&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;actions&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;KeySelector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;action&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;userId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Next, we prepare the broadcast state. Broadcast state is always represented as &lt;code&gt;MapState&lt;/code&gt;, the most versatile state primitive that Flink provides.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;MapStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Void&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bcStateDescriptor&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;patterns&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;VOID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;POJO&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Since our application only evaluates and stores a single &lt;code&gt;Pattern&lt;/code&gt; at a time, we configure the broadcast state as a &lt;code&gt;MapState&lt;/code&gt; with key type &lt;code&gt;Void&lt;/code&gt; and value type &lt;code&gt;Pattern&lt;/code&gt;. The &lt;code&gt;Pattern&lt;/code&gt; is always stored in the &lt;code&gt;MapState&lt;/code&gt; with &lt;code&gt;null&lt;/code&gt; as key.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;BroadcastStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bcedPatterns&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;patterns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;broadcast&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bcStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Using the &lt;code&gt;MapStateDescriptor&lt;/code&gt; for the broadcast state, we apply the &lt;code&gt;broadcast()&lt;/code&gt; transformation on the patterns stream and receive a &lt;code&gt;BroadcastStream bcedPatterns&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;matches&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;actionsByUser&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;bcedPatterns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;process&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;PatternEvaluator&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;After we obtained the keyed &lt;code&gt;actionsByUser&lt;/code&gt; stream and the broadcasted &lt;code&gt;bcedPatterns&lt;/code&gt; stream, we &lt;code&gt;connect()&lt;/code&gt; both streams and apply a &lt;code&gt;PatternEvaluator&lt;/code&gt; on the connected streams. &lt;code&gt;PatternEvaluator&lt;/code&gt; is a custom function that implements the &lt;code&gt;KeyedBroadcastProcessFunction&lt;/code&gt; interface. It applies the pattern matching logic that we discussed before and emits &lt;code&gt;Tuple2&amp;lt;Long, Pattern&amp;gt;&lt;/code&gt; records which contain the user id and the matched pattern.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PatternEvaluator&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;KeyedBroadcastProcessFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// handle for keyed state (per user)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ValueState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prevActionState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// broadcast state descriptor&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;MapStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Void&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;patternDesc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Configuration&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;conf&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// initialize keyed state&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prevActionState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getRuntimeContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;lastAction&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;patternDesc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MapStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;patterns&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;VOID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;POJO&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;/**&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * Called for each user action.&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * Evaluates the current pattern against the previous and&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * current action of the user.&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; */&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Action&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ReadOnlyContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// get current pattern from broadcast state&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;patternDesc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// access MapState with null as VOID default value&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// get previous action of current user from keyed state&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prevAction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prevActionState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prevAction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// user had an action before, check if pattern matches&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;firstAction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;equals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prevAction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;secondAction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;equals&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// MATCH&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getCurrentKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// update keyed state and remember action for next pattern evaluation&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;prevActionState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;action&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt;/**&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * Called for each new pattern.&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; * Overwrites the current pattern with the new pattern.&lt;/span&gt;
&lt;span class=&quot;cm&quot;&gt; */&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;processBroadcastElement&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Context&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// store the new pattern by updating the broadcast state&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;BroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Void&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;bcState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ctx&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBroadcastState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;patternDesc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// storing in MapState with null as VOID default value&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;bcState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;put&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kc&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;KeyedBroadcastProcessFunction&lt;/code&gt; interface provides three methods to process records and emit results.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;processBroadcastElement()&lt;/code&gt; is called for each record of the broadcasted stream. In our &lt;code&gt;PatternEvaluator&lt;/code&gt; function, we simply put the received &lt;code&gt;Pattern&lt;/code&gt; record into the broadcast state using the &lt;code&gt;null&lt;/code&gt; key (remember, we only store a single pattern in the &lt;code&gt;MapState&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;code&gt;processElement()&lt;/code&gt; is called for each record of the keyed stream. It provides read-only access to the broadcast state to prevent modifications that would result in different broadcast states across the parallel instances of the function. The &lt;code&gt;processElement()&lt;/code&gt; method of the &lt;code&gt;PatternEvaluator&lt;/code&gt; retrieves the current pattern from the broadcast state and the previous action of the user from the keyed state. If both are present, it checks whether the previous and current action match the pattern and emits a pattern match record if that is the case. Finally, it updates the keyed state to the current user action.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;onTimer()&lt;/code&gt; is called when a previously registered timer fires. Timers can be registered in the &lt;code&gt;processElement&lt;/code&gt; method and are used to perform computations or to clean up state in the future. We did not implement this method in our example to keep the code concise. However, it could be used to remove the last action of a user if the user has not been active for a certain period of time, to avoid growing state due to inactive users (see the sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
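&lt;p&gt;As a hedged illustration of that last point, the following sketch shows what such inactivity-based cleanup could look like. It is not part of the original example: the constant &lt;code&gt;ONE_HOUR&lt;/code&gt; and the additional keyed &lt;code&gt;ValueState&amp;lt;Long&amp;gt;&lt;/code&gt; named &lt;code&gt;lastSeenState&lt;/code&gt; are assumptions, while &lt;code&gt;prevActionState&lt;/code&gt; refers to the keyed state used above.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// in processElement(), after handling the record:
// remember when the user was last seen and register a cleanup timer
// (ONE_HOUR is an assumed constant, e.g. 60 * 60 * 1000 ms)
long now = ctx.timerService().currentProcessingTime();
lastSeenState.update(now);
ctx.timerService().registerProcessingTimeTimer(now + ONE_HOUR);

// called when a previously registered timer fires
@Override
public void onTimer(
    long timestamp,
    OnTimerContext ctx,
    Collector&amp;lt;Tuple2&amp;lt;Long, Pattern&amp;gt;&amp;gt; out) throws Exception {
  Long lastSeen = lastSeenState.value();
  // only clean up if the user really was inactive for the whole period
  if (lastSeen != null &amp;amp;&amp;amp; timestamp &amp;gt;= lastSeen + ONE_HOUR) {
    prevActionState.clear();
    lastSeenState.clear();
  }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;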
&lt;p&gt;You might have noticed the context objects of the &lt;code&gt;KeyedBroadcastProcessFunction&lt;/code&gt;’s processing methods. The context objects give access to additional functionality such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The broadcast state (read-write or read-only, depending on the method),&lt;/li&gt;
&lt;li&gt;A &lt;code&gt;TimerService&lt;/code&gt;, which gives access to the record’s timestamp, the current watermark, and which can register timers,&lt;/li&gt;
&lt;li&gt;The current key (only available in &lt;code&gt;processElement()&lt;/code&gt;), and&lt;/li&gt;
&lt;li&gt;A method to apply a function to the keyed state of each registered key (only available in &lt;code&gt;processBroadcastElement()&lt;/code&gt;; see the sketch after this list)&lt;/li&gt;
&lt;/ul&gt;
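&lt;p&gt;As a small, hypothetical sketch of the last point: in &lt;code&gt;processBroadcastElement()&lt;/code&gt; the context’s &lt;code&gt;applyToKeyedState()&lt;/code&gt; method could be used to touch the keyed state of every registered key, for example to drop all previously stored actions whenever a new pattern arrives. The descriptor &lt;code&gt;prevActionDesc&lt;/code&gt; is an assumed &lt;code&gt;ValueStateDescriptor&lt;/code&gt; for the keyed state of the example; it is not part of the code shown above.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// inside processBroadcastElement(): clear the stored previous action of every user
ctx.applyToKeyedState(
    prevActionDesc,                   // assumed ValueStateDescriptor for the keyed state
    (key, state) -&amp;gt; state.clear());   // invoked once per registered key
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;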
&lt;p&gt;The &lt;code&gt;KeyedBroadcastProcessFunction&lt;/code&gt; has full access to Flink state and time features just like any other &lt;code&gt;ProcessFunction&lt;/code&gt; and hence can be used to implement sophisticated application logic. Broadcast state was designed to be a versatile feature that adapts to different scenarios and use cases. Although we only discussed a fairly simple and restricted application, you can use broadcast state in many ways to implement the requirements of your application.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this blog post, we walked you through an example application to explain what Apache Flink’s broadcast state is and how it can be used to evaluate dynamic patterns on event streams. We’ve also discussed the API and shown the source code of our example application.&lt;/p&gt;
&lt;p&gt;We invite you to check the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html&quot;&gt;documentation&lt;/a&gt; of this feature and provide feedback or suggestions for further improvements through our &lt;a href=&quot;http://mail-archives.apache.org/mod_mbox/flink-community/&quot;&gt;mailing list&lt;/a&gt;.&lt;/p&gt;
</description>
<pubDate>Wed, 26 Jun 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/2019/06/26/broadcast-state.html</link>
<guid isPermaLink="true">/2019/06/26/broadcast-state.html</guid>
</item>
<item>
<title>A Deep-Dive into Flink&#39;s Network Stack</title>
<description>&lt;style type=&quot;text/css&quot;&gt;
.tg {border-collapse:collapse;border-spacing:0;}
.tg td{padding:10px 20px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{padding:10px 20px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;background-color:#eff0f1;}
.tg .tg-wide{padding:10px 30px;}
.tg .tg-top{vertical-align:top}
.tg .tg-center{text-align:center;vertical-align:center}
&lt;/style&gt;
&lt;p&gt;Flink’s network stack is one of the core components that make up the &lt;code&gt;flink-runtime&lt;/code&gt; module and sit at the heart of every Flink job. It connects individual work units (subtasks) from all TaskManagers. This is where your streamed-in data flows through, and it is therefore crucial to the performance of your Flink job, for both the throughput and the latency you observe. In contrast to the coordination channels between TaskManagers and JobManagers, which use RPCs via Akka, the network stack between TaskManagers relies on a much lower-level API using Netty.&lt;/p&gt;
&lt;p&gt;This blog post is the first in a series of posts about the network stack. In the sections below, we will first have a high-level look at what abstractions are exposed to the stream operators and then go into detail on the physical implementation and various optimisations Flink did. We will briefly present the result of these optimisations and Flink’s trade-off between throughput and latency. Future blog posts in this series will elaborate more on monitoring and metrics, tuning parameters, and common anti-patterns.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#logical-view&quot; id=&quot;markdown-toc-logical-view&quot;&gt;Logical View&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#physical-transport&quot; id=&quot;markdown-toc-physical-transport&quot;&gt;Physical Transport&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#inflicting-backpressure-1&quot; id=&quot;markdown-toc-inflicting-backpressure-1&quot;&gt;Inflicting Backpressure (1)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#credit-based-flow-control&quot; id=&quot;markdown-toc-credit-based-flow-control&quot;&gt;Credit-based Flow Control&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#inflicting-backpressure-2&quot; id=&quot;markdown-toc-inflicting-backpressure-2&quot;&gt;Inflicting Backpressure (2)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#what-do-we-gain-where-is-the-catch&quot; id=&quot;markdown-toc-what-do-we-gain-where-is-the-catch&quot;&gt;What do we Gain? Where is the Catch?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#writing-records-into-network-buffers-and-reading-them-again&quot; id=&quot;markdown-toc-writing-records-into-network-buffers-and-reading-them-again&quot;&gt;Writing Records into Network Buffers and Reading them again&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#flushing-buffers-to-netty&quot; id=&quot;markdown-toc-flushing-buffers-to-netty&quot;&gt;Flushing Buffers to Netty&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#buffer-builder--buffer-consumer&quot; id=&quot;markdown-toc-buffer-builder--buffer-consumer&quot;&gt;Buffer Builder &amp;amp; Buffer Consumer&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#latency-vs-throughput&quot; id=&quot;markdown-toc-latency-vs-throughput&quot;&gt;Latency vs. Throughput&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#conclusion&quot; id=&quot;markdown-toc-conclusion&quot;&gt;Conclusion&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;logical-view&quot;&gt;Logical View&lt;/h2&gt;
&lt;p&gt;Flink’s network stack provides the following logical view to the subtasks when communicating with each other, for example during a network shuffle as required by a &lt;code&gt;keyBy()&lt;/code&gt;.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack1.png&quot; width=&quot;400px&quot; alt=&quot;Logical View on Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;It abstracts over the different settings of the following three concepts:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Subtask output type (&lt;code&gt;ResultPartitionType&lt;/code&gt;):
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;pipelined (bounded or unbounded):&lt;/strong&gt;
Sending data downstream as soon as it is produced, potentially one-by-one, either as a bounded or unbounded stream of records.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;blocking:&lt;/strong&gt;
Sending data downstream only when the full result was produced.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Scheduling type:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;all at once (eager):&lt;/strong&gt;
Deploy all subtasks of the job at the same time (for streaming applications).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;next stage on first output (lazy):&lt;/strong&gt;
Deploy downstream tasks as soon as any of their producers generated output.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;next stage on complete output:&lt;/strong&gt;
Deploy downstream tasks when any or all of their producers have generated their full output set.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Transport:
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;high throughput:&lt;/strong&gt;
Instead of sending each record one-by-one, Flink buffers a bunch of records into its network buffers and sends them altogether. This reduces the costs per record and leads to higher throughput.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;low latency via buffer timeout:&lt;/strong&gt;
By reducing the timeout of sending an incompletely filled buffer, you may sacrifice throughput for latency.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We will have a look at the throughput and low-latency optimisations in the sections below, which cover the physical layers of the network stack. For this part, let us elaborate a bit more on the output and scheduling types. First of all, it is important to know that the subtask output type and the scheduling type are closely intertwined, making only specific combinations of the two valid.&lt;/p&gt;
&lt;p&gt;Pipelined result partitions are streaming-style outputs which need a live target subtask to send data to. The target can be scheduled before results are produced or at first output. Batch jobs produce bounded result partitions while streaming jobs produce unbounded results.&lt;/p&gt;
&lt;p&gt;Batch jobs may also produce results in a blocking fashion, depending on the operator and connection pattern that is used. In that case, the complete result must be produced first before the receiving task can be scheduled. This allows batch jobs to work more efficiently and with lower resource usage.&lt;/p&gt;
&lt;p&gt;The following table summarises the valid combinations:
&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;table class=&quot;tg&quot;&gt;
&lt;tr&gt;
&lt;th&gt;Output Type&lt;/th&gt;
&lt;th&gt;Scheduling Type&lt;/th&gt;
&lt;th&gt;Applies to…&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td rowspan=&quot;2&quot;&gt;pipelined, unbounded&lt;/td&gt;
&lt;td&gt;all at once&lt;/td&gt;
&lt;td&gt;Streaming jobs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;next stage on first output&lt;/td&gt;
&lt;td&gt;n/a¹&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td rowspan=&quot;2&quot;&gt;pipelined, bounded&lt;/td&gt;
&lt;td&gt;all at once&lt;/td&gt;
&lt;td&gt;n/a²&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;next stage on first output&lt;/td&gt;
&lt;td&gt;Batch jobs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;blocking&lt;/td&gt;
&lt;td&gt;next stage on complete output&lt;/td&gt;
&lt;td&gt;Batch jobs&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup&gt;1&lt;/sup&gt; Currently not used by Flink. &lt;br /&gt;
&lt;sup&gt;2&lt;/sup&gt; This may become applicable to streaming jobs once the &lt;a href=&quot;/roadmap.html#batch-and-streaming-unification&quot;&gt;Batch/Streaming unification&lt;/a&gt; is done.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;
Additionally, for subtasks with more than one input, scheduling can start in two ways: after &lt;em&gt;all&lt;/em&gt; or after &lt;em&gt;any&lt;/em&gt; of the input producers have produced a record/their complete dataset. For tuning the output types and scheduling decisions in batch jobs, please have a look at &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionConfig.html#setExecutionMode-org.apache.flink.api.common.ExecutionMode-&quot;&gt;ExecutionConfig#setExecutionMode()&lt;/a&gt; - and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionMode.html#enum.constant.detail&quot;&gt;ExecutionMode&lt;/a&gt; in particular - as well as &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionConfig.html#setDefaultInputDependencyConstraint-org.apache.flink.api.common.InputDependencyConstraint-&quot;&gt;ExecutionConfig#setDefaultInputDependencyConstraint()&lt;/a&gt;.&lt;/p&gt;
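&lt;p&gt;As a rough sketch (not from the original post) of what such tuning could look like for a DataSet program, using the options linked above:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.ExecutionMode;
import org.apache.flink.api.common.InputDependencyConstraint;
import org.apache.flink.api.java.ExecutionEnvironment;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// exchange data between stages in a blocking fashion instead of pipelining it
env.getConfig().setExecutionMode(ExecutionMode.BATCH);
// only schedule a subtask once all of its input producers have produced data
env.getConfig().setDefaultInputDependencyConstraint(InputDependencyConstraint.ALL);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;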
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;physical-transport&quot;&gt;Physical Transport&lt;/h2&gt;
&lt;p&gt;In order to understand the physical data connections, please recall that, in Flink, different tasks may share the same slot via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/operators/#task-chaining-and-resource-groups&quot;&gt;slot sharing groups&lt;/a&gt;. TaskManagers may also provide more than one slot to allow multiple subtasks of the same task to be scheduled onto the same TaskManager.&lt;/p&gt;
&lt;p&gt;For the example pictured below, we will assume a parallelism of 4 and a deployment with two task managers offering 2 slots each. TaskManager 1 executes subtasks A.1, A.2, B.1, and B.2 and TaskManager 2 executes subtasks A.3, A.4, B.3, and B.4. In a shuffle-type connection between task A and task B, for example from a &lt;code&gt;keyBy()&lt;/code&gt;, there are 2x4 logical connections to handle on each TaskManager, some of which are local, some remote:
&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;table class=&quot;tg&quot;&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th class=&quot;tg-wide&quot;&gt;B.1&lt;/th&gt;
&lt;th class=&quot;tg-wide&quot;&gt;B.2&lt;/th&gt;
&lt;th class=&quot;tg-wide&quot;&gt;B.3&lt;/th&gt;
&lt;th class=&quot;tg-wide&quot;&gt;B.4&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;tg-wide&quot;&gt;A.1&lt;/th&gt;
&lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;local&lt;/td&gt;
&lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;remote&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;tg-wide&quot;&gt;A.2&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;tg-wide&quot;&gt;A.3&lt;/th&gt;
&lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;remote&lt;/td&gt;
&lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot; rowspan=&quot;2&quot;&gt;local&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&quot;tg-wide&quot;&gt;A.4&lt;/th&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Each (remote) network connection between different tasks will get its own TCP channel in Flink’s network stack. However, if different subtasks of the same task are scheduled onto the same TaskManager, their network connections towards the same TaskManagers will be multiplexed and share a single TCP channel for reduced resource usage. In our example, this would apply to A.1 → B.3, A.1 → B.4, as well as A.2 → B.3, and A.2 → B.4 as pictured below:
&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack2.png&quot; width=&quot;700px&quot; alt=&quot;Physical-transport-Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;The results of each subtask are called &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/partition/ResultPartition.html&quot;&gt;ResultPartition&lt;/a&gt;, each split into separate &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/partition/ResultSubpartition.html&quot;&gt;ResultSubpartitions&lt;/a&gt; — one for each logical channel. At this point in the stack, Flink is not dealing with individual records anymore but instead with a group of serialised records assembled together into network buffers. The number of buffers available to each subtask in its own local buffer pool (one per sending and receiving side each) is limited to at most&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;#channels * buffers-per-channel + floating-buffers-per-gate
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The total number of buffers on a single TaskManager usually does not need configuration. See the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#configuring-the-network-buffers&quot;&gt;Configuring the Network Buffers&lt;/a&gt; documentation for details on how to do so if needed.&lt;/p&gt;
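&lt;p&gt;As a quick, illustrative calculation: a receiving subtask with 4 input channels and, for instance, 2 exclusive buffers per channel and 8 floating buffers per gate would use at most 4 * 2 + 8 = 16 buffers for that gate.&lt;/p&gt;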
&lt;h3 id=&quot;inflicting-backpressure-1&quot;&gt;Inflicting Backpressure (1)&lt;/h3&gt;
&lt;p&gt;Whenever a subtask’s sending buffer pool is exhausted — buffers reside in either a result subpartition’s buffer queue or inside the lower, Netty-backed network stack — the producer is blocked, cannot continue, and experiences backpressure. The receiver works in a similar fashion: any incoming Netty buffer in the lower network stack needs to be made available to Flink via a network buffer. If there is no network buffer available in the appropriate subtask’s buffer pool, Flink will stop reading from this channel until a buffer becomes available. This would effectively backpressure all sending subtasks on this multiplex and therefore also throttle other receiving subtasks. The following picture illustrates this for an overloaded subtask B.4 which would cause backpressure on the multiplex and also stop subtask B.3 from receiving and processing further buffers, even though it still has capacity.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack3.png&quot; width=&quot;700px&quot; alt=&quot;Physical-transport-backpressure-Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;To prevent this situation from even happening, Flink 1.5 introduced its own flow control mechanism.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;credit-based-flow-control&quot;&gt;Credit-based Flow Control&lt;/h2&gt;
&lt;p&gt;Credit-based flow control makes sure that whatever is “on the wire” will have capacity at the receiver to handle. It is based on the availability of network buffers as a natural extension of the mechanisms Flink had before. Instead of only having a shared local buffer pool, each remote input channel now has its own set of &lt;strong&gt;exclusive buffers&lt;/strong&gt;. Conversely, buffers in the local buffer pool are called &lt;strong&gt;floating buffers&lt;/strong&gt; as they will float around and are available to every input channel.&lt;/p&gt;
&lt;p&gt;Receivers will announce the availability of buffers as &lt;strong&gt;credits&lt;/strong&gt; to the sender (1 buffer = 1 credit). Each result subpartition will keep track of its &lt;strong&gt;channel credits&lt;/strong&gt;. Buffers are only forwarded to the lower network stack if credit is available and each sent buffer reduces the credit score by one. In addition to the buffers, we also send information about the current &lt;strong&gt;backlog&lt;/strong&gt; size which specifies how many buffers are waiting in this subpartition’s queue. The receiver will use this to ask for an appropriate number of floating buffers for faster backlog processing. It will try to acquire as many floating buffers as the backlog size but this may not always be possible and we may get some or no buffers at all. The receiver will make use of the retrieved buffers and will listen for further buffers becoming available to continue.
&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack4.png&quot; width=&quot;700px&quot; alt=&quot;Physical-transport-credit-flow-Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Credit-based flow control will use &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-buffers-per-channel&quot;&gt;buffers-per-channel&lt;/a&gt; to specify how many buffers are exclusive (mandatory) and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-floating-buffers-per-gate&quot;&gt;floating-buffers-per-gate&lt;/a&gt; for the local buffer pool (optional&lt;sup&gt;3&lt;/sup&gt;) thus achieving the same buffer limit as without flow control. The default values for these two parameters have been chosen so that the maximum (theoretical) throughput with flow control is at least as good as without flow control, given a healthy network with usual latencies. You may need to adjust these depending on your actual round-trip-time and bandwidth.
&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup&gt;3&lt;/sup&gt;If there are not enough buffers available, each buffer pool will get the same share of the globally available ones (± 1).&lt;/p&gt;
&lt;h3 id=&quot;inflicting-backpressure-2&quot;&gt;Inflicting Backpressure (2)&lt;/h3&gt;
&lt;p&gt;As opposed to the receiver’s backpressure mechanisms without flow control, credits provide a more direct control: If a receiver cannot keep up, its available credits will eventually hit 0 and stop the sender from forwarding buffers to the lower network stack. There is backpressure on this logical channel only and there is no need to block reading from a multiplexed TCP channel. Other receivers are therefore not affected in processing available buffers.&lt;/p&gt;
&lt;h3 id=&quot;what-do-we-gain-where-is-the-catch&quot;&gt;What do we Gain? Where is the Catch?&lt;/h3&gt;
&lt;p&gt;&lt;img align=&quot;right&quot; src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack5.png&quot; width=&quot;300&quot; height=&quot;200&quot; alt=&quot;Physical-transport-credit-flow-checkpoints-Flink&#39;s Network Stack&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Since, with flow control, a channel in a multiplex cannot block another of its logical channels, the overall resource utilisation should increase. In addition, by having full control over how much data is “on the wire”, we are also able to improve &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/internals/stream_checkpointing.html#checkpointing&quot;&gt;checkpoint alignments&lt;/a&gt;: without flow control, it would take a while for the channel to fill the network stack’s internal buffers and propagate that the receiver is not reading anymore. During that time, a lot of buffers could be sitting around. Any checkpoint barrier would have to queue up behind these buffers and would thus have to wait until all of those have been processed before it can start (“Barriers never overtake records!”).&lt;/p&gt;
&lt;p&gt;However, the additional announce messages from the receiver may come at some additional costs, especially in setups using SSL-encrypted channels. Also, a single input channel cannot make use of all buffers in the buffer pool because exclusive buffers are not shared. It can also not start sending as much data as is available right away, so during ramp-up (if you are producing data faster than announcing credits in return) it may take longer to send data through. While this may affect your job’s performance, it is usually better to have flow control because of all its advantages. You may want to increase the number of exclusive buffers via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-buffers-per-channel&quot;&gt;buffers-per-channel&lt;/a&gt; at the cost of using more memory. The overall memory use compared to the previous implementation, however, may still be lower because lower network stacks do not need to buffer much data any more since we can always transfer that to Flink immediately.&lt;/p&gt;
&lt;p&gt;There is one more thing you may notice when using credit-based flow control: since we buffer less data between the sender and receiver, you may experience backpressure earlier. This is, however, desired and you do not really get any advantage by buffering more data. If you want to buffer more but keep flow control, you could consider increasing the number of floating buffers via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-network-memory-floating-buffers-per-gate&quot;&gt;floating-buffers-per-gate&lt;/a&gt;.&lt;/p&gt;
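&lt;p&gt;Both parameters are set in &lt;code&gt;flink-conf.yaml&lt;/code&gt;; the values below are purely illustrative:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;taskmanager.network.memory.buffers-per-channel: 4
taskmanager.network.memory.floating-buffers-per-gate: 16
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;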
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;table class=&quot;tg&quot;&gt;
&lt;tr&gt;
&lt;th&gt;Advantages&lt;/th&gt;
&lt;th&gt;Disadvantages&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;tg-top&quot;&gt;
• better resource utilisation with data skew in multiplexed connections &lt;br /&gt;&lt;br /&gt;
• improved checkpoint alignment&lt;br /&gt;&lt;br /&gt;
• reduced memory use (less data in lower network layers)&lt;/td&gt;
&lt;td class=&quot;tg-top&quot;&gt;
• additional credit-announce messages&lt;br /&gt;&lt;br /&gt;
• additional backlog-announce messages (piggy-backed with buffer messages, almost no overhead)&lt;br /&gt;&lt;br /&gt;
• potential round-trip latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&quot;tg-center&quot; colspan=&quot;2&quot;&gt;• backpressure appears earlier&lt;/td&gt;
&lt;/tr&gt;
&lt;/table&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;div class=&quot;alert alert-info&quot;&gt;
&lt;p&gt;&lt;span class=&quot;label label-info&quot; style=&quot;display: inline-block&quot;&gt;&lt;span class=&quot;glyphicon glyphicon-info-sign&quot; aria-hidden=&quot;true&quot;&gt;&lt;/span&gt; Note&lt;/span&gt;
If you need to turn off credit-based flow control, you can add this to your &lt;code&gt;flink-conf.yaml&lt;/code&gt;:&lt;/p&gt;
&lt;p&gt;&lt;code&gt;taskmanager.network.credit-model: false&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;This parameter, however, is deprecated and will eventually be removed along with the non-credit-based flow control code.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;writing-records-into-network-buffers-and-reading-them-again&quot;&gt;Writing Records into Network Buffers and Reading them again&lt;/h2&gt;
&lt;p&gt;The following picture extends the slightly more high-level view from above with further details of the network stack and its surrounding components, from the collection of a record in your sending operator to the receiving operator getting it:
&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack6.png&quot; width=&quot;700px&quot; alt=&quot;Physical-transport-complete-Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;After creating a record and passing it along, for example via &lt;code&gt;Collector#collect()&lt;/code&gt;, it is given to the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/writer/RecordWriter.html&quot;&gt;RecordWriter&lt;/a&gt; which serialises the record from a Java object into a sequence of bytes which eventually ends up in a network buffer that is handed along as described above. The RecordWriter first serialises the record to a flexible on-heap byte array using the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/serialization/SpanningRecordSerializer.html&quot;&gt;SpanningRecordSerializer&lt;/a&gt;. Afterwards, it tries to write these bytes into the associated network buffer of the target network channel. We will come back to this last part in the section below.&lt;/p&gt;
&lt;p&gt;On the receiver’s side, the lower network stack (netty) is writing received buffers into the appropriate input channels. The (stream) task’s thread eventually reads from these queues and tries to deserialise the accumulated bytes into Java objects with the help of the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/reader/RecordReader.html&quot;&gt;RecordReader&lt;/a&gt; and by going through the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/api/serialization/SpillingAdaptiveSpanningRecordDeserializer.html&quot;&gt;SpillingAdaptiveSpanningRecordDeserializer&lt;/a&gt;. Similar to the serialiser, this deserialiser must also deal with special cases like records spanning multiple network buffers, either because the record is just bigger than a network buffer (32KiB by default, set via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/config.html#taskmanager-memory-segment-size&quot;&gt;taskmanager.memory.segment-size&lt;/a&gt;) or because the serialised record was added to a network buffer which did not have enough remaining bytes. Flink will nevertheless use these bytes and continue writing the rest to a new network buffer.
&lt;br /&gt;&lt;/p&gt;
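&lt;p&gt;If records regularly span multiple buffers, the buffer (memory segment) size could, for example, be raised in &lt;code&gt;flink-conf.yaml&lt;/code&gt; (the value below is illustrative):&lt;/p&gt;
&lt;p&gt;&lt;code&gt;taskmanager.memory.segment-size: 64kb&lt;/code&gt;&lt;/p&gt;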
&lt;h3 id=&quot;flushing-buffers-to-netty&quot;&gt;Flushing Buffers to Netty&lt;/h3&gt;
&lt;p&gt;In the picture above, the credit-based flow control mechanics actually sit inside the “Netty Server” (and “Netty Client”) components and the buffer the RecordWriter is writing to is always added to the result subpartition in an empty state and then gradually filled with (serialised) records. But when does Netty actually get the buffer? Obviously, it cannot take bytes whenever they become available since that would not only add substantial costs due to cross-thread communication and synchronisation, but also make the whole buffering obsolete.&lt;/p&gt;
&lt;p&gt;In Flink, there are three situations that make a buffer available for consumption by the Netty server:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;a buffer becomes full when writing a record to it, or&lt;br /&gt;&lt;/li&gt;
&lt;li&gt;the buffer timeout hits, or&lt;br /&gt;&lt;/li&gt;
&lt;li&gt;a special event such as a checkpoint barrier is sent.&lt;br /&gt;
&lt;br /&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&quot;flush-after-buffer-full&quot;&gt;Flush after Buffer Full&lt;/h4&gt;
&lt;p&gt;The RecordWriter works with a local serialisation buffer for the current record and will gradually write these bytes to one or more network buffers sitting at the appropriate result subpartition queue. Although a RecordWriter can work on multiple subpartitions, each subpartition has only one RecordWriter writing data to it. The Netty server, on the other hand, is reading from multiple result subpartitions and multiplexing the appropriate ones into a single channel as described above. This is a classical producer-consumer pattern with the network buffers in the middle, as shown in the next picture. After (1) serialising and (2) writing data to the buffer, the RecordWriter updates the buffer’s writer index accordingly. Once the buffer is completely filled, the record writer will (3) acquire a new buffer from its local buffer pool for any remaining bytes of the current record - or for the next one - and add the new one to the subpartition queue. This will (4) notify the Netty server of data being available if it is not aware yet&lt;sup&gt;4&lt;/sup&gt;. Whenever Netty has capacity to handle this notification, it will (5) take the buffer and send it along the appropriate TCP channel.
&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack7.png&quot; width=&quot;500px&quot; alt=&quot;Record-writer-to-network-Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup&gt;4&lt;/sup&gt;We can assume it already got the notification if there are more finished buffers in the queue.
&lt;br /&gt;&lt;/p&gt;
&lt;h4 id=&quot;flush-after-buffer-timeout&quot;&gt;Flush after Buffer Timeout&lt;/h4&gt;
&lt;p&gt;In order to support low-latency use cases, we cannot only rely on buffers being full in order to send data downstream. There may be cases where a certain communication channel does not have many records flowing through it, which would unnecessarily increase the latency of the few records you actually have. Therefore, a periodic process will flush whatever data is available down the stack: the output flusher. The periodic interval can be configured via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/streaming/api/environment/StreamExecutionEnvironment.html#setBufferTimeout-long-&quot;&gt;StreamExecutionEnvironment#setBufferTimeout&lt;/a&gt; and acts as an upper bound on the latency&lt;sup&gt;5&lt;/sup&gt; (for low-throughput channels). The following picture shows how it interacts with the other components: the RecordWriter serialises and writes into network buffers as before but concurrently, the output flusher may (3,4) notify the Netty server of data being available if Netty is not already aware (similar to the “buffer full” scenario above). When Netty handles this notification (5) it will consume the available data from the buffer and update the buffer’s reader index. The buffer stays in the queue - any further operation on this buffer from the Netty server side will continue reading from the reader index next time.
&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack8.png&quot; width=&quot;500px&quot; alt=&quot;Record-writer-to-network-with-flusher-Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;sup&gt;5&lt;/sup&gt;Strictly speaking, the output flusher does not give any guarantees - it only sends a notification to Netty which can pick it up at will / capacity. This also means that the output flusher has no effect if the channel is backpressured.
&lt;br /&gt;&lt;/p&gt;
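&lt;p&gt;As a small sketch of how this flush interval is configured in a streaming program (the 5 ms value is only an example):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// let the output flusher run every 5 ms instead of the default 100 ms
env.setBufferTimeout(5);
// a timeout of 0 flushes after every record; -1 flushes only when buffers are full
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;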
&lt;h4 id=&quot;flush-after-special-event&quot;&gt;Flush after special event&lt;/h4&gt;
&lt;p&gt;Some special events also trigger immediate flushes if being sent through the RecordWriter. The most important ones are checkpoint barriers or end-of-partition events which obviously should go quickly and not wait for the output flusher to kick in.
&lt;br /&gt;&lt;/p&gt;
&lt;h4 id=&quot;further-remarks&quot;&gt;Further remarks&lt;/h4&gt;
&lt;p&gt;In contrast to Flink &amp;lt; 1.5, please note that (a) network buffers are now placed in the subpartition queues directly and (b) we are not closing the buffer on each flush. This gives us a few advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;less synchronisation overhead (output flusher and RecordWriter are independent)&lt;/li&gt;
&lt;li&gt;in high-load scenarios where Netty is the bottleneck (either through backpressure or directly), we can still accumulate data in incomplete buffers&lt;/li&gt;
&lt;li&gt;significant reduction of Netty notifications&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, you may notice an increased CPU use and TCP packet rate during low load scenarios. This is because, with the changes, Flink will use any &lt;em&gt;available&lt;/em&gt; CPU cycles to try to maintain the desired latency. Once the load increases, this will self-adjust by buffers filling up more. High load scenarios are not affected and even get a better throughput because of the reduced synchronisation overhead.
&lt;br /&gt;&lt;/p&gt;
&lt;h3 id=&quot;buffer-builder--buffer-consumer&quot;&gt;Buffer Builder &amp;amp; Buffer Consumer&lt;/h3&gt;
&lt;p&gt;If you want to dig deeper into how the producer-consumer mechanics are implemented in Flink, please take a closer look at the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferBuilder.html&quot;&gt;BufferBuilder&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/api/java/org/apache/flink/runtime/io/network/buffer/BufferConsumer.html&quot;&gt;BufferConsumer&lt;/a&gt; classes which have been introduced in Flink 1.5. While reading is potentially only &lt;em&gt;per buffer&lt;/em&gt;, writing to it is &lt;em&gt;per record&lt;/em&gt; and thus on the hot path for all network communication in Flink. Therefore, it was very clear to us that we needed a lightweight connection between the task’s thread and the Netty thread which does not imply too much synchronisation overhead. For further details, we suggest to check out the &lt;a href=&quot;https://github.com/apache/flink/tree/release-1.8/flink-runtime/src/main/java/org/apache/flink/runtime/io/network/buffer&quot;&gt;source code&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;latency-vs-throughput&quot;&gt;Latency vs. Throughput&lt;/h2&gt;
&lt;p&gt;Network buffers were introduced to get higher resource utilisation and higher throughput at the cost of having some records wait in buffers a little longer. Although an upper limit to this wait time can be given via the buffer timeout, you may be curious to find out more about the trade-off between these two dimensions: latency and throughput, as, obviously, you cannot get both. The following plot shows various values for the buffer timeout starting at 0 (flush with every record) to 100ms (the default) and shows the resulting throughput rates on a cluster with 100 nodes and 8 slots each running a job that has no business logic and thus only tests the network stack. For comparison, we also plot Flink 1.4 before the low-latency improvements (as described above) were added.
&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-06-05-network-stack/flink-network-stack9.png&quot; width=&quot;650px&quot; alt=&quot;Network-buffertimeout-Flink&#39;s Network Stack&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;As you can see, with Flink 1.5+, even very low buffer timeouts such as 1ms (for low-latency scenarios) provide a maximum throughput as high as 75% of that achieved with the default timeout, where more data is buffered before being sent over the wire.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Now you know about result partitions, the different network connections and scheduling types for both batch and streaming. You also know about credit-based flow control and how the network stack works internally, in order to reason about network-related tuning parameters and about certain job behaviours. Future blog posts in this series will build upon this knowledge and go into more operational details including relevant metrics to look at, further network stack tuning, and common antipatterns to avoid. Stay tuned for more.&lt;/p&gt;
</description>
<pubDate>Wed, 05 Jun 2019 10:45:00 +0200</pubDate>
<link>https://flink.apache.org/2019/06/05/flink-network-stack.html</link>
<guid isPermaLink="true">/2019/06/05/flink-network-stack.html</guid>
</item>
<item>
<title>State TTL in Flink 1.8.0: How to Automatically Cleanup Application State in Apache Flink</title>
<description>&lt;p&gt;A common requirement for many stateful streaming applications is to automatically cleanup application state for effective management of your state size, or to control how long the application state can be accessed (e.g. due to legal regulations like the GDPR). The state time-to-live (TTL) feature was initiated in Flink 1.6.0 and enabled application state cleanup and efficient state size management in Apache Flink.&lt;/p&gt;
&lt;p&gt;In this post, we motivate the State TTL feature and discuss its use cases. Moreover, we show how to use and configure it. We explain how Flink internally manages state with TTL and present some exciting additions to the feature in Flink 1.8.0. The blog post concludes with an outlook on future improvements and extensions.&lt;/p&gt;
&lt;h1 id=&quot;the-transient-nature-of-state&quot;&gt;The Transient Nature of State&lt;/h1&gt;
&lt;p&gt;There are two major reasons why state should be maintained only for a limited time. For example, let’s imagine a Flink application that ingests a stream of user login events and stores for each user the time of the last login to improve the experience of frequent visitors.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Controlling the size of state.&lt;/strong&gt;
Being able to efficiently manage an ever-growing state size is a primary use case for state TTL. Oftentimes, data needs to be persisted temporarily while there is some user activity around it, e.g. web sessions. When the activity ends there is no longer interest in that data while it still occupies storage. Flink 1.8.0 introduces background cleanup of old state based on TTL that makes the eviction of no-longer-necessary data frictionless. Previously, the application developer had to take extra actions and explicitly remove useless state to free storage space. This manual clean up procedure was not only error prone but also less efficient than the new lazy method to remove state. Following our previous example of storing the time of the last login, this might not be necessary after some time because the user can be treated as “infrequent” later on.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Complying with data protection and sensitive data requirements.&lt;/strong&gt;
Recent developments around data privacy regulations, such as the General Data Protection Regulation (GDPR) introduced by the European Union, make compliance with such data requirements or treating sensitive data a top priority for many use cases and applications. An example of such use cases includes applications that require keeping data for a specific timeframe and preventing access to it thereafter. This is a common challenge for companies providing short-term services to their customers. The state TTL feature gives guarantees for how long an application can access state and hence can help to comply with data protection regulations.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both requirements can be addressed by a feature that periodically, yet continuously, removes the state for a key once it becomes unnecessary or unimportant and there is no requirement to keep it in storage any more.&lt;/p&gt;
&lt;h1 id=&quot;state-ttl-for-continuous-cleanup-of-application-state&quot;&gt;State TTL for continuous cleanup of application state&lt;/h1&gt;
&lt;p&gt;The 1.6.0 release of Apache Flink introduced the State TTL feature. It enabled developers of stream processing applications to configure the state of operators to expire and be cleaned up after a defined timeout (time-to-live). In Flink 1.8.0 the feature was extended with continuous background cleanup of old entries, according to the TTL setting, for both the RocksDB and the heap state backends (FSStateBackend and MemoryStateBackend).&lt;/p&gt;
&lt;p&gt;In Flink’s DataStream API, application state is defined by a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/stream/state/state.html#using-managed-keyed-state&quot;&gt;state descriptor&lt;/a&gt;. State TTL is configured by passing a &lt;code&gt;StateTtlConfig&lt;/code&gt; object to a state descriptor. The following Java example shows how to create a state TTL configuration and provide it to the state descriptor that holds the last login time of a user as a &lt;code&gt;Long&lt;/code&gt; value:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.api.common.state.StateTtlConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.api.common.time.Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.api.common.state.ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ttlConfig&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;days&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setUpdateType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;UpdateType&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;OnCreateAndWrite&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setStateVisibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;StateVisibility&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;NeverReturnExpired&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;lastUserLogin&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ValueStateDescriptor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;lastUserLogin&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;lastUserLogin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;enableTimeToLive&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ttlConfig&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Flink provides multiple options to configure the behavior of the state TTL functionality.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;When is the Time-to-Live reset?&lt;/strong&gt;
By default, the expiration time of a state entry is updated when the state is modified. Optionally, it can also be updated on read access at the cost of an additional write operation to update the timestamp (see the sketch after this list).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Can the expired state be accessed one last time?&lt;/strong&gt;
State TTL employs a lazy strategy to clean up expired state. This can lead to the situation that an application attempts to read state which is expired but hasn’t been removed yet. You can configure whether such a read request returns the expired state or not. In either case, the expired state is immediately removed afterwards. While the option of returning expired state favors data availability, not returning expired state can be required for data protection regulations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Which time semantics are used for the Time-to-Live timers?&lt;/strong&gt;
With Flink 1.8.0, users can only define a state TTL in terms of processing time. The support for event time is planned for future Apache Flink releases.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
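&lt;p&gt;For example, the configuration from the first snippet could be adjusted so that reads also reset the expiration time and expired state that has not been cleaned up yet is still returned (a sketch combining the options above):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    // also reset the time-to-live on read access
    .setUpdateType(StateTtlConfig.UpdateType.OnReadAndWrite)
    // return expired state if it has not been removed yet
    .setStateVisibility(StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp)
    .build();
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;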
&lt;p&gt;You can read more about how to use state TTL in the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#state-time-to-live-ttl&quot;&gt;Apache Flink documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Internally, the State TTL feature is implemented by storing an additional timestamp of the last relevant state access, along with the actual state value. While this approach adds some storage overhead, it allows Flink to check for the expired state during state access, checkpointing, recovery, or dedicated storage cleanup procedures.&lt;/p&gt;
&lt;h1 id=&quot;taking-out-the-garbage&quot;&gt;“Taking out the Garbage”&lt;/h1&gt;
&lt;p&gt;When a state object is accessed in a read operation, Flink will check its timestamp and clear the state if it is expired (depending on the configured state visibility, the expired state is returned or not). Due to this lazy removal, expired state that is never accessed again will forever occupy storage space unless it is garbage collected.&lt;/p&gt;
&lt;p&gt;So how can the expired state be removed without the application logic explicitly taking care of it? In general, there are different possible strategies to remove it in the background.&lt;/p&gt;
&lt;h2 id=&quot;keep-full-state-snapshots-clean&quot;&gt;Keep full state snapshots clean&lt;/h2&gt;
&lt;p&gt;Flink 1.6.0 already supported automatic eviction of the expired state when a full snapshot for a checkpoint or savepoint is taken. Note that state eviction is not applied for incremental checkpoints. State eviction on full snapshots must be explicitly enabled as shown in the following example:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ttlConfig&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;days&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;cleanupFullSnapshot&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The local storage stays untouched but the size of the stored snapshot is reduced. The local state of an operator will only be cleaned up when the operator reloads its state from a snapshot, i.e. in case of recovery or when starting from a savepoint.&lt;/p&gt;
&lt;p&gt;Due to these limitations, applications still need to actively remove state after it expired in Flink 1.6.0. To improve the user experience, Flink 1.8.0 introduces two more autonomous cleanup strategies, one for each of Flink’s two state backend types. We describe them below.&lt;/p&gt;
&lt;h2 id=&quot;incremental-cleanup-in-heap-state-backends&quot;&gt;Incremental cleanup in Heap state backends&lt;/h2&gt;
&lt;p&gt;This approach is specific to the Heap state backends (FSStateBackend and MemoryStateBackend). The idea is that the storage backend keeps a lazy global iterator over all state entries. Certain events, for instance state access, trigger an incremental cleanup. Every time an incremental cleanup is triggered, the iterator is advanced. The traversed state entries are checked, and expired ones are removed. The following code example shows how to enable incremental cleanup:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ttlConfig&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;days&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// check 10 keys for every state access&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;cleanupIncrementally&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;false&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;If enabled, every state access triggers a cleanup step. For every cleanup step, a certain number of state entries are checked for expiration. There are two tuning parameters: the first defines the number of state entries to check per cleanup step, and the second is a flag that triggers an additional cleanup step for every processed record, in addition to every state access.&lt;/p&gt;
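&lt;p&gt;For illustration, enabling the second flag as well would look like the following variant (a sketch, not part of the original configuration above):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    // check 10 entries per cleanup step and additionally trigger a
    // cleanup step for every processed record
    .cleanupIncrementally(10, true)
    .build();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;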
&lt;p&gt;There are two important caveats about this approach:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The first one is that the time spent on incremental cleanup increases the record processing latency.&lt;/li&gt;
&lt;li&gt;The second one should be practically negligible but is still worth mentioning: if no state is accessed or no records are processed, expired state won’t be removed.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;rocksdb-background-compaction-to-filter-out-expired-state&quot;&gt;RocksDB background compaction to filter out expired state&lt;/h2&gt;
&lt;p&gt;If your application uses the RocksDB state backend, you can enable another cleanup strategy which is based on a Flink specific compaction filter. RocksDB periodically runs asynchronous compactions to merge state updates and reduce storage. The Flink compaction filter checks the expiration timestamp of state entries with TTL and discards all expired values.&lt;/p&gt;
&lt;p&gt;The first step to activate this feature is to configure the RocksDB state backend by setting the following Flink configuration option: &lt;code&gt;state.backend.rocksdb.ttl.compaction.filter.enabled&lt;/code&gt;. Once the RocksDB state backend is configured, the compaction cleanup strategy is enabled for a state as shown in the following code example:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ttlConfig&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StateTtlConfig&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;newBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;days&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;7&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;cleanupInRocksdbCompactFilter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Keep in mind that calling the Flink TTL filter slows down the RocksDB compaction.&lt;/p&gt;
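&lt;p&gt;For completeness, here is a minimal sketch of setting the backend-level switch mentioned above programmatically. Note that placing &lt;code&gt;state.backend.rocksdb.ttl.compaction.filter.enabled: true&lt;/code&gt; in &lt;code&gt;flink-conf.yaml&lt;/code&gt; is the usual route; this snippet only illustrates the option key:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.configuration.Configuration;

// the option key is the one quoted in the text above; how this Configuration
// object is wired into a concrete deployment is left out of this sketch
Configuration conf = new Configuration();
conf.setBoolean(&amp;quot;state.backend.rocksdb.ttl.compaction.filter.enabled&amp;quot;, true);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;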
&lt;h2 id=&quot;eager-state-cleanup-with-timers&quot;&gt;Eager State Cleanup with Timers&lt;/h2&gt;
&lt;p&gt;Another way to manually clean up state is based on Flink timers. This is an idea that the community is currently evaluating for future releases. With this approach, a cleanup timer is registered for every state access. This approach is more predictable because state is eagerly removed as soon as it expires. However, it is also more expensive because the timers consume storage along with the original state.&lt;/p&gt;
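&lt;p&gt;To make the idea more tangible, the following is a minimal sketch of the general timer-based pattern written as a user-level &lt;code&gt;KeyedProcessFunction&lt;/code&gt;. It is not the built-in strategy the community is evaluating, and all class, field and state names are illustrative:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// illustrative example: register a cleanup timer on every state access and
// drop the state for a key once its TTL has passed without further accesses
public class TimerBasedCleanup extends KeyedProcessFunction&amp;lt;String, String, String&amp;gt; {

    private static final long TTL = Time.days(7).toMilliseconds();

    private transient ValueState&amp;lt;String&amp;gt; lastValue;  // the state we want to expire
    private transient ValueState&amp;lt;Long&amp;gt; lastAccess;    // timestamp of the latest access

    @Override
    public void open(Configuration parameters) {
        lastValue = getRuntimeContext().getState(
                new ValueStateDescriptor&amp;lt;&amp;gt;(&amp;quot;last-value&amp;quot;, String.class));
        lastAccess = getRuntimeContext().getState(
                new ValueStateDescriptor&amp;lt;&amp;gt;(&amp;quot;last-access&amp;quot;, Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector&amp;lt;String&amp;gt; out) throws Exception {
        long now = ctx.timerService().currentProcessingTime();
        lastValue.update(value);
        lastAccess.update(now);
        // register an eager cleanup timer for this access
        ctx.timerService().registerProcessingTimeTimer(now + TTL);
        out.collect(value);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector&amp;lt;String&amp;gt; out) throws Exception {
        // only clear the state if no later access has pushed the expiration forward
        Long last = lastAccess.value();
        if (last != null &amp;amp;&amp;amp; timestamp &amp;gt;= last + TTL) {
            lastValue.clear();
            lastAccess.clear();
        }
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;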
&lt;h1 id=&quot;future-work&quot;&gt;Future work&lt;/h1&gt;
&lt;p&gt;Apart from including the timer-based cleanup strategy mentioned above, the Flink community has plans to further improve the state TTL feature. The possible improvements include adding support for TTL based on event time (only processing time is supported at the moment) and enabling State TTL for queryable state.&lt;/p&gt;
&lt;p&gt;We encourage you to join the conversation and share your thoughts and ideas in the &lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;Apache Flink JIRA board&lt;/a&gt; or by subscribing to the Apache Flink dev mailing list. Feedback or suggestions are always appreciated and we look forward to hearing your thoughts on the Flink mailing lists.&lt;/p&gt;
&lt;h1 id=&quot;summary&quot;&gt;Summary&lt;/h1&gt;
&lt;p&gt;Time-based state access restrictions and controlling the size of application state are common challenges in the world of stateful stream processing. Flink’s 1.8.0 release significantly improves the State TTL feature by adding support for continuous background cleanup of expired state objects. The new cleanup mechanisms relieve you of the burden of manually implementing state cleanup. They are also more efficient due to their lazy nature. State TTL gives you control over the size of your application state so that you can focus on the core logic of your applications.&lt;/p&gt;
</description>
<pubDate>Sun, 19 May 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/2019/05/19/state-ttl.html</link>
<guid isPermaLink="true">/2019/05/19/state-ttl.html</guid>
</item>
<item>
<title>Flux capacitor, huh? Temporal Tables and Joins in Streaming SQL</title>
<description>&lt;p&gt;Figuring out how to manage and model temporal data for effective point-in-time analysis was a longstanding battle, dating as far back as the early 80’s, that culminated with the introduction of temporal tables in the SQL standard in 2011. Up to that point, users were doomed to implement this as part of the application logic, often hurting the length of the development lifecycle as well as the maintainability of the code. And, although there isn’t a single, commonly accepted definition of &lt;strong&gt;temporal data&lt;/strong&gt;, the challenge it represents is one and the same: how do we validate or enrich data against dynamically changing, historical datasets?&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-05-13-temporal-tables/TemporalTables1.png&quot; width=&quot;500px&quot; alt=&quot;Taxi Fares and Conversion Rates&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;For example:&lt;/strong&gt; given a stream with Taxi Fare events tied to the local currency of the ride location, we might want to convert the fare price to a common currency for further processing. As conversion rates excel at fluctuating over time, each Taxi Fare event would need to be matched to the rate that was valid at the time the event occurred in order to produce a reliable result.&lt;/p&gt;
&lt;h2 id=&quot;modelling-temporal-data-with-flink&quot;&gt;Modelling Temporal Data with Flink&lt;/h2&gt;
&lt;p&gt;In the 1.7 release, Flink has introduced the concept of &lt;strong&gt;temporal tables&lt;/strong&gt; into its streaming SQL and Table API: parameterized views on append-only tables — or, any table that only allows records to be inserted, never updated or deleted — that are interpreted as a changelog and keep data closely tied to time context, so that it can be interpreted as valid only within a specific period of time. Transforming a stream into a temporal table requires:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Defining a &lt;strong&gt;primary key&lt;/strong&gt; and a &lt;strong&gt;versioning field&lt;/strong&gt; that can be used to keep track of the changes that happen over time;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Exposing the stream as a &lt;strong&gt;temporal table function&lt;/strong&gt; that maps each point in time to a static relation.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Going back to our example use case, a temporal table is just what we need to model the conversion rate data such as to make it useful for point-in-time querying. Temporal table functions are implemented as an extension of Flink’s generic &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/udfs.html#table-functions&quot;&gt;table function&lt;/a&gt; class and can be defined in the same straightforward way to be used with the Table API or SQL parser.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;org.apache.flink.table.functions.TemporalTableFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;(...)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Get the stream and table environments.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Provide a sample static data set of the rates history table.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;List&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ratesHistoryData&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ArrayList&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ratesHistoryData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;USD&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;102L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ratesHistoryData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;EUR&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;114L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ratesHistoryData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;YEN&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ratesHistoryData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;EUR&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;116L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ratesHistoryData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;USD&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;105L&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Create and register an example table using the sample data set.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ratesHistoryStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fromCollection&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ratesHistoryData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ratesHistory&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;fromDataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ratesHistoryStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;r_currency, r_rate, r_proctime.proctime&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerTable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;RatesHistory&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ratesHistory&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Create and register the temporal table function &amp;quot;rates&amp;quot;.&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Define &amp;quot;r_proctime&amp;quot; as the versioning field and &amp;quot;r_currency&amp;quot; as the primary key.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TemporalTableFunction&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rates&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ratesHistory&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createTemporalTableFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;r_proctime&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;r_currency&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Rates&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rates&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;(...)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;What does this &lt;strong&gt;Rates&lt;/strong&gt; function do, in practice? Imagine we would like to check what the conversion rates looked like at a given time — say, 11:00. We could simply do something like:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Rates&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s1&quot;&gt;&amp;#39;11:00&amp;#39;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-05-13-temporal-tables/TemporalTables2.png&quot; width=&quot;650px&quot; alt=&quot;Point-in-time Querying&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Even though Flink does not yet support querying temporal table functions with a constant time attribute parameter, these functions can be used to cover a much more interesting scenario: temporal table joins.&lt;/p&gt;
&lt;h2 id=&quot;streaming-joins-using-temporal-tables&quot;&gt;Streaming Joins using Temporal Tables&lt;/h2&gt;
&lt;p&gt;Temporal tables reach their full potential when used in combination — erm, joined — with streaming data, for instance to power applications that must continuously whitelist against a reference dataset that changes over time for auditing or regulatory compliance. While efficient joins have long been an enduring challenge for query processors due to computational cost and resource consumption, joins over streaming data carry some additional challenges:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;unbounded&lt;/strong&gt; nature of streams means that inputs are continuously evaluated and intermediate join results can consume memory resources indefinitely. Flink gracefully manages its memory consumption out-of-the-box (even for heavier cases where joins require spilling to disk) and supports time-windowed joins to bound the amount of data that needs to be kept around as state;&lt;/li&gt;
&lt;li&gt;Streaming data might be &lt;strong&gt;out-of-order&lt;/strong&gt; and &lt;strong&gt;late&lt;/strong&gt;, so it is not possible to enforce an ordering upfront and time handling requires some thinking to avoid unnecessary outputs and retractions.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the particular case of temporal data, time-windowed joins are not enough (well, at least not without getting into some expensive tweaking): sooner or later, each reference record will fall outside of the window and be wiped from state, no longer being considered for future join results. To address this limitation, Flink has introduced support for temporal table joins to cover time-varying relations.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-05-13-temporal-tables/TemporalTables3.png&quot; width=&quot;500px&quot; alt=&quot;Temporal Table Join between Taxi Fares and Conversion Rates&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Each record from the append-only table on the probe side (&lt;code&gt;Taxi Fare&lt;/code&gt;) is joined with the version of the record from the temporal table on the build side (&lt;code&gt;Conversion Rate&lt;/code&gt;) that most closely matches the probe side record time attribute (&lt;code&gt;time&lt;/code&gt;) for the same value of the primary key (&lt;code&gt;currency&lt;/code&gt;). Remember the temporal table function (&lt;code&gt;Rates&lt;/code&gt;) we registered earlier? It can now be used to express this join as a simple SQL statement that would otherwise require a heavier statement with a subquery.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-05-13-temporal-tables/TemporalTables4.png&quot; width=&quot;700px&quot; alt=&quot;Regular Join vs. Temporal Table Join&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
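&lt;p&gt;To make the comparison above concrete, here is a rough sketch of what such a temporal table join could look like with the &lt;code&gt;Rates&lt;/code&gt; function registered earlier. The probe-side table and column names (&lt;code&gt;taxiFares&lt;/code&gt;, &lt;code&gt;t_currency&lt;/code&gt;, &lt;code&gt;t_fare&lt;/code&gt;, &lt;code&gt;t_proctime&lt;/code&gt;) are assumptions made for illustration and do not appear in the earlier snippet:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// sketch only: assumes a probe-side table &amp;quot;taxiFares&amp;quot; with a processing-time
// attribute &amp;quot;t_proctime&amp;quot; has been registered with the same table environment
Table convertedFares = tEnv.sqlQuery(
    &amp;quot;SELECT t.t_fare * r.r_rate AS converted_fare &amp;quot; +
    &amp;quot;FROM taxiFares AS t, &amp;quot; +
    &amp;quot;LATERAL TABLE (Rates(t.t_proctime)) AS r &amp;quot; +
    &amp;quot;WHERE t.t_currency = r.r_currency&amp;quot;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;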
&lt;p&gt;Temporal table joins support both &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/streaming/joins.html#processing-time-temporal-joins&quot;&gt;processing&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/streaming/joins.html#event-time-temporal-joins&quot;&gt;event time&lt;/a&gt; semantics and effectively limit the amount of data kept in state while also allowing records on the build side to be arbitrarily old, as opposed to time-windowed joins. Probe-side records only need to be kept in state for a very short time to ensure correct semantics in presence of out-of-order records. The challenges mentioned in the beginning of this section are overcome by:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Narrowing the &lt;strong&gt;scope&lt;/strong&gt; of the join: only the time-matching version of &lt;code&gt;ratesHistory&lt;/code&gt; is visible for a given &lt;code&gt;taxiFare.time&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;Pruning &lt;strong&gt;unneeded records&lt;/strong&gt; from state: for cases using event time, records between current time and the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/event_time.html#event-time-and-watermarks&quot;&gt;watermark&lt;/a&gt; delay are persisted for both the probe and build side. These are discarded as soon as the watermark arrives and the results are emitted — allowing the join operation to move forward in time and the build table to “refresh” its version in state.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;All this means it is now possible to express continuous stream enrichment in relational and time-varying terms using Flink without dabbling into syntactic patchwork or compromising performance. In other words: stream time-travelling minus the flux capacitor. Extending this syntax to batch processing for enriching historic data with proper (event) time semantics is also part of the Flink roadmap!&lt;/p&gt;
&lt;p&gt;If you’d like to get some &lt;strong&gt;hands-on practice in joining streams with Flink SQL&lt;/strong&gt; (and Flink SQL in general), check out this &lt;a href=&quot;https://github.com/ververica/sql-training/wiki&quot;&gt;free training for Flink SQL&lt;/a&gt;. The training environment is based on Docker and can be set up in just a few minutes.&lt;/p&gt;
&lt;p&gt;Subscribe to the &lt;a href=&quot;/community.html#mailing-lists&quot;&gt;Apache Flink mailing lists&lt;/a&gt; to stay up-to-date with the latest developments in this space.&lt;/p&gt;
</description>
<pubDate>Tue, 14 May 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/2019/05/14/temporal-tables.html</link>
<guid isPermaLink="true">/2019/05/14/temporal-tables.html</guid>
</item>
<item>
<title>When Flink &amp; Pulsar Come Together</title>
<description>&lt;p&gt;The open source data technology frameworks &lt;a href=&quot;https://flink.apache.org/&quot;&gt;Apache Flink&lt;/a&gt; and &lt;a href=&quot;https://pulsar.apache.org/en/&quot;&gt;Apache Pulsar&lt;/a&gt; can integrate in different ways to provide elastic data processing at large scale. I recently gave a talk at &lt;a href=&quot;https://www.flink-forward.org/&quot;&gt;Flink Forward&lt;/a&gt; San Francisco 2019 and presented some of the integrations between the two frameworks for batch and streaming applications. In this post, I will give a short introduction to Apache Pulsar and its differentiating elements from other messaging systems and describe the ways that Pulsar and Flink can work together to provide a seamless developer experience for elastic data processing at scale.&lt;/p&gt;
&lt;h2 id=&quot;a-brief-introduction-to-apache-pulsar&quot;&gt;A brief introduction to Apache Pulsar&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://pulsar.apache.org/en/&quot;&gt;Apache Pulsar&lt;/a&gt; is an open-source distributed pub-sub messaging system under the stewardship of the &lt;a href=&quot;https://www.apache.org/&quot;&gt;Apache Software Foundation&lt;/a&gt;. Pulsar is a multi-tenant, high-performance solution for server-to-server messaging including multiple features such as native support for multiple clusters in a Pulsar instance, with seamless &lt;a href=&quot;https://pulsar.apache.org/docs/en/administration-geo&quot;&gt;geo-replication&lt;/a&gt; of messages across clusters, very low publish and end-to-end latency, seamless scalability to over a million topics, and guaranteed message delivery with &lt;a href=&quot;https://pulsar.apache.org/docs/en/concepts-architecture-overview#persistent-storage&quot;&gt;persistent message storage&lt;/a&gt; provided by &lt;a href=&quot;https://bookkeeper.apache.org/&quot;&gt;Apache BookKeeper&lt;/a&gt; among others. Let’s now discuss the primary differentiators between Pulsar and other pub-sub messaging frameworks:&lt;/p&gt;
&lt;p&gt;The first differentiating factor stems from the fact that although Pulsar provides a flexible pub-sub messaging system it is also backed by durable log storage — hence combining both messaging and storage under one framework. Because of that layered architecture, Pulsar provides instant failure recovery, independent scalability and balance-free cluster expansion.&lt;/p&gt;
&lt;p&gt;Pulsar’s architecture follows a similar pattern to other pub-sub systems: the framework is organized around topics as the main data entity, with producers sending data to, and consumers receiving data from, a topic as shown in the diagram below.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/pulsar-flink/image-1.png&quot; width=&quot;400px&quot; alt=&quot;Pulsar producers and consumers&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;The second differentiator of Pulsar is that the framework is built from the get-go with &lt;a href=&quot;https://pulsar.apache.org/docs/en/concepts-multi-tenancy/&quot;&gt;multi-tenancy&lt;/a&gt; in mind. What that means is that each Pulsar topic has a hierarchical management structure making the allocation of resources as well as the resource management and coordination between teams efficient and easy. With Pulsar’s multi-tenancy structure, data platform maintainers can onboard new teams with no friction as Pulsar provides resource isolation at the property (tenant), namespace or topic level, while at the same time data can be shared across the cluster for easy collaboration and coordination.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/pulsar-flink/image-2.png&quot; width=&quot;640px&quot; alt=&quot;Apache Flink and Apache Pulsar&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
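&lt;p&gt;As a small illustration of that hierarchy, a Pulsar topic name spells out the tenant and namespace it belongs to. The following sketch subscribes to such a topic with the Pulsar Java client; the service URL, tenant, namespace, topic and subscription names are made up for the example:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;

PulsarClient client = PulsarClient.builder()
    .serviceUrl(&amp;quot;pulsar://localhost:6650&amp;quot;)
    .build();

// topics are addressed as persistent://&amp;lt;tenant&amp;gt;/&amp;lt;namespace&amp;gt;/&amp;lt;topic&amp;gt;,
// which is what enables isolation at the tenant and namespace level
Consumer&amp;lt;byte[]&amp;gt; consumer = client.newConsumer()
    .topic(&amp;quot;persistent://analytics-team/fraud-detection/transactions&amp;quot;)
    .subscriptionName(&amp;quot;flink-ingest&amp;quot;)
    .subscribe();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;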
&lt;p&gt;Finally, Pulsar’s flexible messaging framework unifies the streaming and queuing data consumption models and provides greater flexibility. As shown in the below diagram, Pulsar holds the data in the topic while multiple teams can consume the data independently depending on their workloads and data consumption patterns.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/pulsar-flink/image-3.png&quot; width=&quot;640px&quot; alt=&quot;Apache Flink and Apache Pulsar&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;pulsars-view-on-data-segmented-data-streams&quot;&gt;Pulsar’s view on data: Segmented data streams&lt;/h2&gt;
&lt;p&gt;Apache Flink is a streaming-first computation framework that perceives &lt;a href=&quot;/news/2019/02/13/unified-batch-streaming-blink.html&quot;&gt;batch processing as a special case of streaming&lt;/a&gt;. Flink’s view on data streams distinguishes batch and stream processing between bounded and unbounded data streams, assuming that for batch workloads the data stream is finite, with a beginning and an end.&lt;/p&gt;
&lt;p&gt;Apache Pulsar has a similar perspective to that of Apache Flink with regard to the data layer. The framework also uses streams as a unified view on all data, while its layered architecture allows traditional pub-sub messaging for streaming workloads and continuous data processing, or the usage of &lt;em&gt;Segmented Streams&lt;/em&gt; and bounded data streams for batch and static workloads.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/pulsar-flink/image-4.png&quot; width=&quot;640px&quot; alt=&quot;Apache Flink and Apache Pulsar&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;With Pulsar, once a producer sends data to a topic, the data is partitioned depending on the data traffic and then further segmented under those partitions — using Apache BookKeeper as the segment store — to allow for parallel data processing, as illustrated in the diagram below. This allows a combination of traditional pub-sub messaging and distributed parallel computations in one framework.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/pulsar-flink/image-5.png&quot; width=&quot;640px&quot; alt=&quot;Apache Flink and Apache Pulsar&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;when-flink--pulsar-come-together&quot;&gt;When Flink + Pulsar come together&lt;/h2&gt;
&lt;p&gt;Apache Flink and Apache Pulsar integrate in multiple ways already. In the following sections, I will present some potential future integrations between the frameworks and share examples of existing ways in which you can utilize the frameworks together.&lt;/p&gt;
&lt;h3 id=&quot;potential-integrations&quot;&gt;Potential Integrations&lt;/h3&gt;
&lt;p&gt;Pulsar can integrate with Apache Flink in different ways. Some potential integrations include providing support for streaming workloads with the use of &lt;em&gt;Streaming Connectors&lt;/em&gt; and support for batch workloads with the use of &lt;em&gt;Batch Source Connectors&lt;/em&gt;. Pulsar also comes with native schema support that can integrate with Flink and provide structured access to the data, for example by using Flink SQL as a way of querying data in Pulsar. Finally, an alternative way of integrating the technologies could include using Pulsar as a state backend with Flink. Since Pulsar has a layered architecture (&lt;em&gt;Streams&lt;/em&gt; and &lt;em&gt;Segmented Streams&lt;/em&gt;, powered by Apache BookKeeper), it becomes natural to use Pulsar as a storage layer and store Flink state.&lt;/p&gt;
&lt;p&gt;From an architecture point of view, we can imagine the integration between the two frameworks as one that uses Apache Pulsar for a unified view of the data layer and Apache Flink as a unified computation and data processing framework and API.&lt;/p&gt;
&lt;h3 id=&quot;existing-integrations&quot;&gt;Existing Integrations&lt;/h3&gt;
&lt;p&gt;Integration between the two frameworks is ongoing and developers can already use Pulsar with Flink in multiple ways. For example, Pulsar can be used as a streaming source and streaming sink in Flink DataStream applications. Developers can ingest data from Pulsar into a Flink job that performs computations on the real-time data, and then send the data back to a Pulsar topic as a streaming sink. Such an example is shown below:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// create and configure Pulsar consumer&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;PulsarSourceBuilder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;PulsarSourceBuilder&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;SimpleStringSchema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;topic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;inputTopic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subscriptionName&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;subscription&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;SourceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;src&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// ingest DataStream with Pulsar consumer&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;src&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// perform computation on DataStream (here a simple WordCount)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatMap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FlatMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;})&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;returns&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;word&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;timeWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;5&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;reduce&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ReduceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;c2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;count&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// emit result via Pulsar producer&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlinkPulsarProducer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;outputTopic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;AuthenticationDisabled&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;toString&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBytes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;UTF_8&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Another integration between the two frameworks that developers can take advantage of includes using Pulsar as both a streaming source and a streaming table sink for Flink SQL or Table API queries as shown in the example below:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// obtain a DataStream with words&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// register DataStream as Table &amp;quot;words&amp;quot; with two attributes (&amp;quot;word&amp;quot;, &amp;quot;ts&amp;quot;). &lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// &amp;quot;ts&amp;quot; is an event-time timestamp.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerDataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;words&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;word, ts.rowtime&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// create a TableSink that produces to Pulsar&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TableSink&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sink&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;PulsarJsonTableSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;outputTopic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;AuthenticationDisabled&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ROUTING_KEY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// register Pulsar TableSink as table &amp;quot;wc&amp;quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;registerTableSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;wc&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;sink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;configure&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]{&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;word&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;cnt&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;},&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TypeInformation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[]{&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;LONG&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;}));&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// count words per 5 seconds and write result to table &amp;quot;wc&amp;quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sqlUpdate&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;INSERT INTO wc &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;SELECT word, COUNT(*) AS cnt &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;FROM words &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;GROUP BY word, TUMBLE(ts, INTERVAL &amp;#39;5&amp;#39; SECOND)&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Finally, Flink integrates with Pulsar for batch workloads as a batch sink, where all results get pushed to Pulsar after Apache Flink has completed the computation on a static data set. Such an example is shown below:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// obtain DataSet from arbitrary computation&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;WordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// create PulsarOutputFormat instance&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;OutputFormat&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pulsarOutputFormat&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;PulsarOutputFormat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;serviceUrl&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;topic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;AuthenticationDisabled&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;wordWithCount&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;toString&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getBytes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// write DataSet to Pulsar&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;output&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;pulsarOutputFormat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Both Pulsar and Flink share a similar view on how the data and the computation level of an application can be &lt;em&gt;“streaming-first”&lt;/em&gt;, with batch as a special case of streaming. With Pulsar’s Segmented Streams approach and Flink’s steps to unify batch and stream processing workloads under one framework, there are numerous ways of integrating the two technologies together to provide elastic data processing at massive scale. Subscribe to the &lt;a href=&quot;/community.html#mailing-lists&quot;&gt;Apache Flink&lt;/a&gt; and &lt;a href=&quot;https://lists.apache.org/list.html?dev@pulsar.apache.org&quot;&gt;Apache Pulsar&lt;/a&gt; mailing lists to stay up-to-date with the latest developments in this space or share your thoughts and recommendations with both communities.&lt;/p&gt;
</description>
<pubDate>Fri, 03 May 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/2019/05/03/pulsar-flink.html</link>
<guid isPermaLink="true">/2019/05/03/pulsar-flink.html</guid>
</item>
<item>
<title>Apache Flink&#39;s Application to Season of Docs</title>
<description>&lt;p&gt;The Apache Flink community is happy to announce its application to the first edition of &lt;a href=&quot;https://developers.google.com/season-of-docs/&quot;&gt;Season of Docs&lt;/a&gt; by Google. The program is bringing together Open Source projects and technical writers to raise awareness for and improve documentation of Open Source projects. While the community is continuously looking for new contributors to collaborate on our documentation, we would like to take this chance to work with one or two technical writers to extend and restructure parts of &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-stable/&quot;&gt;our documentation&lt;/a&gt; (details below).&lt;/p&gt;
&lt;p&gt;The community has discussed this opportunity on the &lt;a href=&quot;https://lists.apache.org/thread.html/3c789b6187da23ad158df59bbc598543b652e3cfc1010a14e294e16a@%3Cdev.flink.apache.org%3E&quot;&gt;dev mailing list&lt;/a&gt; and agreed on three project ideas to submit to the program. We have a great team of mentors (Stephan, Fabian, David, Jark &amp;amp; Konstantin) lined up and are very much looking forward to the first proposals by potential technical writers (given we are admitted to the program ;)). In case of questions feel free to reach out to the community via &lt;a href=&quot;../../../../community.html#mailing-lists&quot;&gt;dev@flink.apache.org&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;project-ideas-list&quot;&gt;Project Ideas List&lt;/h2&gt;
&lt;h3 id=&quot;project-1-improve-documentation-of-stream-processing-concepts&quot;&gt;Project 1: Improve Documentation of Stream Processing Concepts&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Stream processing is the processing of data in motion―in other words, computing on data directly as it is produced or received. Apache Flink has pioneered the field of distributed, stateful stream processing over the last several years. As the community has pushed the boundaries of stream processing, we have introduced new concepts that users need to become familiar with to develop and operate Apache Flink applications efficiently.
The Apache Flink documentation [1] already contains a “concepts” section, but it is a) incomplete and b) lacks an overall structure &amp;amp; reading flow. In addition, “concepts” content is also spread over the development [2] &amp;amp; operations [3] documentation without references to the “concepts” section. An example of this can be found in [4] and [5].&lt;/p&gt;
&lt;p&gt;In this project, we would like to restructure, consolidate and extend the concepts documentation for Apache Flink to better guide users who want to become productive as quickly as possible. This includes better conceptual introductions to topics such as event time, state, and fault tolerance with proper linking to and from relevant deployment and development guides.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related material:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.8/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/concepts/programming-model.html#time&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.8/concepts/programming-model.html#time&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/event_time.html&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/event_time.html&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&quot;project-2-improve-documentation-of-flink-deployments--operations&quot;&gt;Project 2: Improve Documentation of Flink Deployments &amp;amp; Operations&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Stream processing is the processing of data in motion―in other words, computing on data directly as it is produced or received. Apache Flink has pioneered the field of distributed, stateful stream processing for the last few years. As a stateful distributed system in general and a continuously running, low-latency system in particular, Apache Flink deployments are non-trivial to set up and manage.
Unfortunately, the operations [1] and monitoring documentation [2] are arguably the weakest spots of the Apache Flink documentation. While it is comprehensive and often goes into a lot of detail, it lacks an overall structure and does not address common overarching concerns of operations teams in an efficient way.&lt;/p&gt;
&lt;p&gt;In this project, we would like to restructure this part of the documentation and extend it if possible. Ideas for extension include: discussion of session and per-job clusters, better documentation for containerized deployments (incl. K8s), capacity planning &amp;amp; integration into CI/CD pipelines.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related material:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring&quot;&gt;https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h3 id=&quot;project-3-improve-documentation-for-relational-apis-table-api--sql&quot;&gt;Project 3: Improve Documentation for Relational APIs (Table API &amp;amp; SQL)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Description:&lt;/strong&gt; Apache Flink features APIs at different levels of abstraction which enable its users to trade conciseness for expressiveness. Flink’s relational APIs, SQL and the Table API, are “younger” than the DataStream and DataSet APIs, more high-level and focus on data analytics use cases. A core principle of Flink’s SQL and Table API is that they can be used to process static (batch) and continuous (streaming) data and that a program or query produces the same result in both cases.
The documentation of Flink’s relational APIs has organically grown and can be improved in a few areas. There are several on-going development efforts (e.g. Hive Integration, Python Support or Support for Interactive Programming) that aim to extend the scope of the Table API and SQL.&lt;/p&gt;
&lt;p&gt;The existing documentation could be reorganized to prepare for covering the new features. Moreover, it could be improved by adding a concepts section that describes the use cases and internals of the APIs in more detail. The documentation of built-in functions could also be improved by adding more concrete examples.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Related material:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table&quot;&gt;Table API &amp;amp; SQL docs main page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/functions.html&quot;&gt;Built-in functions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/common.html&quot;&gt;Concepts&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/streaming/&quot;&gt;Streaming Concepts&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;
</description>
<pubDate>Wed, 17 Apr 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2019/04/17/sod.html</link>
<guid isPermaLink="true">/news/2019/04/17/sod.html</guid>
</item>
<item>
<title>Apache Flink 1.8.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is pleased to announce Apache Flink 1.8.0. The
latest release includes more than 420 resolved issues and some exciting
additions to Flink that we describe in the following sections of this post.
Please check the &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12344274&quot;&gt;complete changelog&lt;/a&gt;
for more details.&lt;/p&gt;
&lt;p&gt;Flink 1.8.0 is API-compatible with previous 1.x.y releases for APIs annotated
with the &lt;code&gt;@Public&lt;/code&gt; annotation. The release is available now and we encourage
everyone to &lt;a href=&quot;/downloads.html&quot;&gt;download the release&lt;/a&gt; and
check out the updated
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/&quot;&gt;documentation&lt;/a&gt;.
Feedback through the Flink &lt;a href=&quot;/community.html#mailing-lists&quot;&gt;mailing
lists&lt;/a&gt; or
&lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;JIRA&lt;/a&gt; is, as always,
very much appreciated!&lt;/p&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt; on the Flink project site.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#important-changes&quot; id=&quot;markdown-toc-important-changes&quot;&gt;Important Changes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#known-issues&quot; id=&quot;markdown-toc-known-issues&quot;&gt;Known Issues&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#release-notes&quot; id=&quot;markdown-toc-release-notes&quot;&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;p&gt;With Flink 1.8.0 we come closer to our goals of enabling fast data processing
and building data-intensive applications for the Flink community in a seamless
way. We do this by cleaning up and refactoring Flink under the hood to allow
more efficient feature development in the future. This includes removal of the
legacy runtime components that were subsumed in the major rework of Flink’s
underlying distributed system architecture
(&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot;&gt;FLIP-6&lt;/a&gt;)
as well as refactorings on the Table API that prepare it for the future
addition of the Blink enhancements
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11439&quot;&gt;FLINK-11439&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Nevertheless, this release includes some important new features and bug fixes.
The most interesting of those are highlighted below. Please consult the
&lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12344274&quot;&gt;complete changelog&lt;/a&gt;
and the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/release-notes/flink-1.8.html&quot;&gt;release notes&lt;/a&gt;
for more details.&lt;/p&gt;
&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Finalized State Schema Evolution Story&lt;/strong&gt;: This release completes
the community driven effort to provide a schema evolution story for
user state managed by Flink. This has been an effort that spanned 2
releases, starting from 1.7.0 with the introduction of support for
Avro state schema evolution as well as a revamped serialization
compatibility abstraction.&lt;/p&gt;
&lt;p&gt;Flink 1.8.0 finalizes this effort by extending support for schema
evolution to POJOs, upgrading all Flink built-in serializers to use
the new serialization compatibility abstractions, as well as making it
easier for advanced users who use custom state serializers to
implement the abstractions. These different aspects for a complete
out-of-the-box schema evolution story are explained in detail below:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Support for POJO state schema evolution: The pool of data types
that support state schema evolution has been expanded to include
POJOs. For state types that use POJOs, you can now add or remove
fields from your POJO while retaining backwards
compatibility. For a full overview of the list of data types that
now support schema evolution as well as their evolution
specifications and limitations, please refer to the State Schema
Evolution documentation page.&lt;/p&gt;
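&lt;p&gt;As a minimal sketch (the &lt;code&gt;Customer&lt;/code&gt; class and its field names are made up for illustration), a POJO registered as keyed state can gain a field in a later version of the job while older savepoints remain restorable:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;// Hypothetical state type; a public no-argument constructor and public fields
// (or getters/setters) make it eligible for Flink&#39;s POJO serializer.
public class Customer {
    public long id;
    public String name;
    // Added in a later job version; with POJO schema evolution the field is
    // initialized to its default value (null) when restoring an older savepoint.
    public String email;

    public Customer() {}
}

// Registering the state inside a keyed rich function:
ValueStateDescriptor&amp;lt;Customer&amp;gt; descriptor =
    new ValueStateDescriptor&amp;lt;&amp;gt;(&quot;customer&quot;, Customer.class);
ValueState&amp;lt;Customer&amp;gt; customerState = getRuntimeContext().getState(descriptor);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;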
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Upgrade all Flink serializers to use new serialization
compatibility abstractions: Back in 1.7.0, we introduced the new
serialization compatibility abstractions &lt;code&gt;TypeSerializerSnapshot&lt;/code&gt;
and &lt;code&gt;TypeSerializerSchemaCompatibility&lt;/code&gt;. Besides providing a more
expressive API to reflect schema compatibility between the data
stored in savepoints and the data registered at runtime, another
important aspect about the new abstraction is that it avoids the
need for Flink to Java-serialize the state serializer as state
metadata in savepoints.&lt;/p&gt;
&lt;p&gt;In 1.8.0, all of Flink’s built-in serializers have been upgraded to
use the new abstractions, and therefore the serializers
themselves are no longer Java-serialized into savepoints. This
greatly improves interoperability of Flink savepoints, in terms
of state schema evolvability. For example, one outcome was the
support for POJO schema evolution, as previously mentioned
above. Another outcome is that all composite data types supported
by Flink (such as &lt;code&gt;Either&lt;/code&gt;, Scala case classes, Flink Java
&lt;code&gt;Tuple&lt;/code&gt;s, etc.) are generally evolvable as well when they contain
a nested evolvable type, such as a POJO. For example, the POJO type &lt;code&gt;MyPojo&lt;/code&gt;
nested in &lt;code&gt;ValueState&amp;lt;Tuple2&amp;lt;Integer, MyPojo&amp;gt;&amp;gt;&lt;/code&gt; or
&lt;code&gt;ListState&amp;lt;Either&amp;lt;Integer, MyPojo&amp;gt;&amp;gt;&lt;/code&gt; is allowed
to evolve its schema.&lt;/p&gt;
&lt;p&gt;For users who are using custom &lt;code&gt;TypeSerializer&lt;/code&gt; implementations
for their state serializer and are still using the outdated
abstractions (i.e. &lt;code&gt;TypeSerializerConfigSnapshot&lt;/code&gt; and
&lt;code&gt;CompatibilityResult&lt;/code&gt;), we highly recommend upgrading to the new
abstractions to be future proof. Please refer to the Custom State
Serialization documentation page for a detailed description on
the new abstractions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Provide pre-defined snapshot implementations for common
serializers: For convenience, Flink 1.8.0 comes with two
predefined implementations for the &lt;code&gt;TypeSerializerSnapshot&lt;/code&gt; that
make the task of implementing these new abstractions easier
for most implementations of &lt;code&gt;TypeSerializer&lt;/code&gt;s -
&lt;code&gt;SimpleTypeSerializerSnapshot&lt;/code&gt; and
&lt;code&gt;CompositeTypeSerializerSnapshot&lt;/code&gt;. This section in the
documentation provides information on how to use these classes.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Continuous cleanup of old state based on TTL
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7811&quot;&gt;FLINK-7811&lt;/a&gt;)&lt;/strong&gt;: We
introduced TTL (time-to-live) for Keyed state in Flink 1.6
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9510&quot;&gt;FLINK-9510&lt;/a&gt;). This
feature enabled cleanup and made keyed state entries inaccessible after a
defined timeout. In addition, state would also be cleaned up when
writing a savepoint/checkpoint.&lt;/p&gt;
&lt;p&gt;Flink 1.8 introduces continuous cleanup of old entries for both the RocksDB
state backend
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10471&quot;&gt;FLINK-10471&lt;/a&gt;) and the heap
state backend
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10473&quot;&gt;FLINK-10473&lt;/a&gt;). This means
that old entries (according to the TTL setting) are continuously cleaned up.&lt;/p&gt;
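&lt;p&gt;As a sketch of how TTL is configured per state descriptor (the state name and the seven-day timeout are chosen purely for illustration), expired entries become inaccessible to user code and are removed by the cleanup strategies of the respective backend:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

// Entries expire seven days after they were last written.
StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
    .build();

ValueStateDescriptor&amp;lt;Long&amp;gt; descriptor =
    new ValueStateDescriptor&amp;lt;&amp;gt;(&quot;lastLogin&quot;, Long.class);
descriptor.enableTimeToLive(ttlConfig);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;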
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;SQL pattern detection with user-defined functions and
aggregations&lt;/strong&gt;: The support of the MATCH_RECOGNIZE clause has been
extended by multiple features. The addition of user-defined
functions allows for custom logic during pattern detection
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10597&quot;&gt;FLINK-10597&lt;/a&gt;),
while adding aggregations allows for more complex CEP definitions,
such as the following
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7599&quot;&gt;FLINK-7599&lt;/a&gt;).&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;SELECT *
FROM Ticker
    MATCH_RECOGNIZE (
        ORDER BY rowtime
        MEASURES
            AVG(A.price) AS avgPrice
        ONE ROW PER MATCH
        AFTER MATCH SKIP TO FIRST B
        PATTERN (A+ B)
        DEFINE
            A AS AVG(A.price) &amp;lt; 15
    ) MR;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;RFC-compliant CSV format (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9964&quot;&gt;FLINK-9964&lt;/a&gt;)&lt;/strong&gt;: The SQL tables can now be read and written in
an RFC 4180-compliant CSV table format. The format might also be
useful for general DataStream API users.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;New KafkaDeserializationSchema that gives direct access to ConsumerRecord
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8354&quot;&gt;FLINK-8354&lt;/a&gt;)&lt;/strong&gt;: For the
Flink &lt;code&gt;KafkaConsumers&lt;/code&gt;, we introduced a new &lt;code&gt;KafkaDeserializationSchema&lt;/code&gt; that
gives direct access to the Kafka &lt;code&gt;ConsumerRecord&lt;/code&gt;. This now allows access to
all data that Kafka provides for a record, including the headers. This
subsumes the &lt;code&gt;KeyedSerializationSchema&lt;/code&gt; functionality, which is deprecated but
still available for now.&lt;/p&gt;
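&lt;p&gt;A rough sketch of such a schema (the &lt;code&gt;Tuple2&lt;/code&gt; output type and the UTF-8 decoding are just one possible choice for this example) might look like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;// Sketch: emit (topic, value) pairs; the full ConsumerRecord, including
// headers, key, partition and offset, is available in deserialize().
class TopicAndValueSchema implements KafkaDeserializationSchema&amp;lt;Tuple2&amp;lt;String, String&amp;gt;&amp;gt; {

    @Override
    public boolean isEndOfStream(Tuple2&amp;lt;String, String&amp;gt; nextElement) {
        return false; // unbounded stream
    }

    @Override
    public Tuple2&amp;lt;String, String&amp;gt; deserialize(ConsumerRecord&amp;lt;byte[], byte[]&amp;gt; record) {
        String value = new String(record.value(), StandardCharsets.UTF_8);
        return Tuple2.of(record.topic(), value);
    }

    @Override
    public TypeInformation&amp;lt;Tuple2&amp;lt;String, String&amp;gt;&amp;gt; getProducedType() {
        return Types.TUPLE(Types.STRING, Types.STRING);
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;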
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Per-shard watermarking option in FlinkKinesisConsumer
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5697&quot;&gt;FLINK-5697&lt;/a&gt;)&lt;/strong&gt;: The Kinesis
Consumer can now emit periodic watermarks that are derived from per-shard watermarks,
for correct event time processing with subtasks that consume multiple Kinesis shards.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;New consumer for DynamoDB Streams to capture table changes
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4582&quot;&gt;FLINK-4582&lt;/a&gt;)&lt;/strong&gt;: &lt;code&gt;FlinkDynamoDBStreamsConsumer&lt;/code&gt;
is a variant of the Kinesis consumer that supports retrieval of CDC-like streams from DynamoDB tables.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Support for global aggregates for subtask coordination
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10887&quot;&gt;FLINK-10887&lt;/a&gt;)&lt;/strong&gt;:
Designed as a solution for global source watermark tracking, &lt;code&gt;GlobalAggregateManager&lt;/code&gt;
allows sharing of information between parallel subtasks. This feature will
be integrated into streaming connectors for watermark synchronization and
can be used for other purposes with a user defined aggregator.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;important-changes&quot;&gt;Important Changes&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Changes to bundling of Hadoop libraries with Flink
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11266&quot;&gt;FLINK-11266&lt;/a&gt;)&lt;/strong&gt;:
Convenience binaries that include Hadoop are no longer released.&lt;/p&gt;
&lt;p&gt;If a deployment relies on &lt;code&gt;flink-shaded-hadoop2&lt;/code&gt; being included in
&lt;code&gt;flink-dist&lt;/code&gt;, then you must manually download a pre-packaged Hadoop
jar from the optional components section of the &lt;a href=&quot;/downloads.html&quot;&gt;download
page&lt;/a&gt; and copy it into the
&lt;code&gt;/lib&lt;/code&gt; directory. Alternatively, a Flink distribution that includes
Hadoop can be built by packaging &lt;code&gt;flink-dist&lt;/code&gt; and activating the
&lt;code&gt;include-hadoop&lt;/code&gt; Maven profile.&lt;/p&gt;
&lt;p&gt;As Hadoop is no longer included in &lt;code&gt;flink-dist&lt;/code&gt; by default, specifying
&lt;code&gt;-DwithoutHadoop&lt;/code&gt; when packaging &lt;code&gt;flink-dist&lt;/code&gt; no longer impacts the build.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;FlinkKafkaConsumer will now filter restored partitions based on topic
specification
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10342&quot;&gt;FLINK-10342&lt;/a&gt;)&lt;/strong&gt;:
Starting from Flink 1.8.0, the &lt;code&gt;FlinkKafkaConsumer&lt;/code&gt; now always filters out
restored partitions that are no longer associated with a specified topic to
subscribe to in the restored execution. This behaviour did not exist in
previous versions of the &lt;code&gt;FlinkKafkaConsumer&lt;/code&gt;. If you wish to retain the
previous behaviour, please use the
&lt;code&gt;disableFilterRestoredPartitionsWithSubscribedTopics()&lt;/code&gt; configuration method
on the &lt;code&gt;FlinkKafkaConsumer&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Consider this example: you had a Kafka consumer reading from
topic &lt;code&gt;A&lt;/code&gt;, took a savepoint, changed your Kafka consumer to instead
consume from topic &lt;code&gt;B&lt;/code&gt;, and then restarted your job from the savepoint.
Before this change, your consumer would consume from both topics &lt;code&gt;A&lt;/code&gt; and
&lt;code&gt;B&lt;/code&gt; because it was stored in state that the consumer was consuming from topic
&lt;code&gt;A&lt;/code&gt;. With the change, your consumer would only consume from topic &lt;code&gt;B&lt;/code&gt; after
restore because it now filters the topics that are stored in state using the
configured topics.&lt;/p&gt;
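&lt;p&gt;For illustration, a sketch of the scenario above (broker address, group id and topic names are placeholders), with the opt-out applied to retain the old behaviour:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;Properties props = new Properties();
props.setProperty(&quot;bootstrap.servers&quot;, &quot;kafka:9092&quot;); // placeholder address
props.setProperty(&quot;group.id&quot;, &quot;my-consumer-group&quot;);   // placeholder group id

// The job is now subscribed to topic B only.
FlinkKafkaConsumer&amp;lt;String&amp;gt; consumer =
    new FlinkKafkaConsumer&amp;lt;&amp;gt;(&quot;B&quot;, new SimpleStringSchema(), props);

// Opt out of the new 1.8 behaviour: keep consuming restored partitions of
// topic A that are still present in the savepoint state.
consumer.disableFilterRestoredPartitionsWithSubscribedTopics();
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;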
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Change in the Maven modules of Table API
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11064&quot;&gt;FLINK-11064&lt;/a&gt;)&lt;/strong&gt;: Users
that had a &lt;code&gt;flink-table&lt;/code&gt; dependency before, need to update their
dependencies to &lt;code&gt;flink-table-planner&lt;/code&gt; and the correct dependency of
&lt;code&gt;flink-table-api-*&lt;/code&gt;, depending on whether Java or Scala is used: one of
&lt;code&gt;flink-table-api-java-bridge&lt;/code&gt; or &lt;code&gt;flink-table-api-scala-bridge&lt;/code&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;known-issues&quot;&gt;Known Issues&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Discarded checkpoint can cause Tasks to fail
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11662&quot;&gt;FLINK-11662&lt;/a&gt;)&lt;/strong&gt;: There is
a race condition that can lead to erroneous checkpoint failures. This mostly
occurs when restarting from a savepoint or checkpoint takes a long time at the
sources of a job. If you see random checkpointing failures that don’t seem to
have a good explanation you might be affected. Please see the Jira issue for
more details and a workaround for the problem.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;release-notes&quot;&gt;Release Notes&lt;/h2&gt;
&lt;p&gt;Please review the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.8/release-notes/flink-1.8.html&quot;&gt;release
notes&lt;/a&gt;
for a more detailed list of changes and new features if you plan to upgrade
your Flink setup to Flink 1.8.&lt;/p&gt;
&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;
&lt;p&gt;We would like to acknowledge all community members for contributing to this
release. Special credits go to the following members for contributing to the
1.8.0 release (according to &lt;code&gt;git log --pretty=&quot;%an&quot; release-1.7.0..release-1.8.0 | sort | uniq&lt;/code&gt; without manual deduplication):&lt;/p&gt;
&lt;p&gt;Addison Higham, Aitozi, Aleksey Pak, Alexander Fedulov, Alexey Trenikhin, Aljoscha Krettek, Andrey Zagrebin, Artsem Semianenka, Asura7969, Avi, Barisa Obradovic, Benchao Li, Bo WANG, Chesnay Schepler, Congxian Qiu, Cristian, David Anderson, Dawid Wysakowicz, Dian Fu, DuBin, EAlexRojas, EronWright, Eugen Yushin, Fabian Hueske, Fokko Driesprong, Gary Yao, Hequn Cheng, Igal Shilman, Jamie Grier, JaryZhen, Jeff Zhang, Jihyun Cho, Jinhu Wu, Joerg Schad, KarmaGYZ, Kezhu Wang, Konstantin Knauf, Kostas Kloudas, Lakshmi, Lakshmi Gururaja Rao, Lavkesh Lahngir, Li, Shuangjiang, Mai Nakagawa, Matrix42, Matt, Maximilian Michels, Mododo, Nico Kruber, Paul Lin, Piotr Nowojski, Qi Yu, Qin, Robert, Robert Metzger, Romano Vacca, Rong Rong, Rune Skou Larsen, Seth Wiesman, Shannon Carey, Shimin Yang, Shuyi Chen, Stefan Richter, Stephan Ewen, SuXingLee, TANG Wen-hui, Tao Yang, Thomas Weise, Till Rohrmann, Timo Walther, Tom Goong, Tony Feng, Tony Wei, Tzu-Li (Gordon) Tai, Tzu-Li Chen, Ufuk Celebi, Xingcan Cui, Xpray, XuQianJin-Stars, Xue Yu, Yangze Guo, Ying Xu, Yiqun Lin, Yu Li, Yuanyang Wu, Yun Tang, ZILI CHEN, Zhanchun Zhang, Zhijiang, ZiLi Chen, acqua.csq, alex04.wang, ap, azagrebin, blueszheng, boshu Zheng, chengjie.wu, chensq, chummyhe89, eaglewatcherwb, hequn8128, ifndef-SleePy, intsmaze, jackyyin, jinhu.wjh, jparkie, jrthe42, junsheng.wu, kgorman, kkloudas, kkolman, klion26, lamber-ken, leesf, libenchao, lining, liuzhaokun, lzh3636, maqingxiang, mb-datadome, okidogi, park.yq, sunhaibotb, sunjincheng121, tison, unknown, vinoyang, wenhuitang, wind, xueyu, xuqianjin, yanghua, zentol, zhangzhanchun, zhijiang, zhuzhu.zz, zy, 仲炜, 砚田, 谢磊&lt;/p&gt;
</description>
<pubDate>Tue, 09 Apr 2019 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2019/04/09/release-1.8.0.html</link>
<guid isPermaLink="true">/news/2019/04/09/release-1.8.0.html</guid>
</item>
<item>
<title>Flink and Prometheus: Cloud-native monitoring of streaming applications</title>
<description>&lt;p&gt;This blog post describes how developers can leverage Apache Flink’s built-in &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html&quot;&gt;metrics system&lt;/a&gt; together with &lt;a href=&quot;https://prometheus.io/&quot;&gt;Prometheus&lt;/a&gt; to observe and monitor streaming applications in an effective way. This is a follow-up post from my &lt;a href=&quot;https://flink-forward.org/&quot;&gt;Flink Forward&lt;/a&gt; Berlin 2018 talk (&lt;a href=&quot;https://www.slideshare.net/MaximilianBode1/monitoring-flink-with-prometheus&quot;&gt;slides&lt;/a&gt;, &lt;a href=&quot;https://www.ververica.com/flink-forward-berlin/resources/monitoring-flink-with-prometheus&quot;&gt;video&lt;/a&gt;). We will cover some basic Prometheus concepts and why it is a great fit for monitoring Apache Flink stream processing jobs. There is also an example to showcase how you can utilize Prometheus with Flink to gain insights into your applications and be alerted on potential degradations of your Flink jobs.&lt;/p&gt;
&lt;h2 id=&quot;why-prometheus&quot;&gt;Why Prometheus?&lt;/h2&gt;
&lt;p&gt;Prometheus is a metrics-based monitoring system that was originally created in 2012. The system is completely open-source (under the Apache License 2.0) with a vibrant community behind it and it graduated from the Cloud Native Computing Foundation (CNCF) last year – a sign of maturity, stability and production-readiness. As we mentioned, the system is based on metrics and it is designed to measure the overall health, behavior and performance of a service. Prometheus features a multi-dimensional data model as well as a flexible query language. It is designed for reliability and can easily be deployed in traditional or containerized environments. Some of the important Prometheus concepts are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Metrics:&lt;/strong&gt; Prometheus defines metrics as floating-point values that change over time. These time series have millisecond precision.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Labels&lt;/strong&gt; are the key-value pairs associated with time series that support Prometheus’ flexible and powerful data model – in contrast to hierarchical data structures that one might experience with traditional metrics systems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Scrape:&lt;/strong&gt; Prometheus is a pull-based system and fetches (“scrapes”) metrics data from specified sources that expose HTTP endpoints with a text-based format.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;PromQL&lt;/strong&gt; is Prometheus’ &lt;a href=&quot;https://prometheus.io/docs/prometheus/latest/querying/basics/&quot;&gt;query language&lt;/a&gt;. It can be used for both building dashboards and setting up alert rules that will trigger when specific conditions are met.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When considering metrics and monitoring systems for your Flink jobs, there are many &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html&quot;&gt;options&lt;/a&gt;. Flink offers native support for exposing data to Prometheus via the &lt;code&gt;PrometheusReporter&lt;/code&gt; configuration. Setting up this integration is very easy.&lt;/p&gt;
&lt;p&gt;Prometheus is a great choice as usually Flink jobs are not running in isolation but in a greater context of microservices. For making metrics available to Prometheus from other parts of a larger system, there are two options: There exist &lt;a href=&quot;https://prometheus.io/docs/instrumenting/clientlibs/&quot;&gt;libraries for all major languages&lt;/a&gt; to instrument other applications. Additionally, there is a wide variety of &lt;a href=&quot;https://prometheus.io/docs/instrumenting/exporters/&quot;&gt;exporters&lt;/a&gt;, which are tools that expose metrics of third-party systems (like databases or Apache Kafka) as Prometheus metrics.&lt;/p&gt;
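&lt;p&gt;As a small, illustrative sketch using the official Prometheus Java client (the metric name is made up; any service in the surrounding architecture, not just Flink, can expose metrics this way):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;import io.prometheus.client.Counter;

// Register a counter with the default registry; an HTTP endpoint (for example
// via io.prometheus.client.exporter.HTTPServer) then exposes it for scraping.
static final Counter requests = Counter.build()
    .name(&quot;myapp_requests_total&quot;)
    .help(&quot;Total number of requests handled by the service.&quot;)
    .register();

void handleRequest() {
    requests.inc();
    // ... actual request handling ...
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;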
&lt;h2 id=&quot;prometheus-and-flink-in-action&quot;&gt;Prometheus and Flink in Action&lt;/h2&gt;
&lt;p&gt;We have provided a &lt;a href=&quot;https://github.com/mbode/flink-prometheus-example&quot;&gt;GitHub repository&lt;/a&gt; that demonstrates the integration described above. To have a look, clone the repository, make sure &lt;a href=&quot;https://docs.docker.com/install/&quot;&gt;Docker&lt;/a&gt; is installed and run:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;./gradlew composeUp
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This builds a Flink job using the build tool &lt;a href=&quot;https://gradle.org/&quot;&gt;Gradle&lt;/a&gt; and starts up a local environment based on &lt;a href=&quot;https://docs.docker.com/compose/&quot;&gt;Docker Compose&lt;/a&gt; running the job in a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/deployment/docker.html#flink-job-cluster&quot;&gt;Flink job cluster&lt;/a&gt; (reachable at &lt;a href=&quot;http://localhost:8081/&quot;&gt;http://localhost:8081&lt;/a&gt;) as well as a Prometheus instance (&lt;a href=&quot;http://localhost:9090/&quot;&gt;http://localhost:9090&lt;/a&gt;).&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-03-11-prometheus-monitoring/prometheusexamplejob.png&quot; width=&quot;600px&quot; alt=&quot;PrometheusExampleJob in Flink Web UI&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Job graph and custom metric for example job in Flink web interface.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;PrometheusExampleJob&lt;/code&gt; has three operators: Random numbers up to 10,000 are generated, then a map counts the events and creates a histogram of the values passed through. Finally, the events are discarded without further output. The very simple code below is from the second operator. It illustrates how easy it is to add custom metrics relevant to your business logic into your Flink job.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;FlinkMetricsExposingMapFunction&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RichMapFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;transient&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Counter&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eventCounter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;open&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Configuration&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parameters&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;eventCounter&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getRuntimeContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getMetricGroup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;().&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;counter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;events&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Integer&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;eventCounter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;inc&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;value&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;center&gt;&lt;i&gt;&lt;small&gt;Excerpt from &lt;a href=&quot;https://github.com/mbode/flink-prometheus-example/blob/master/src/main/java/com/github/mbode/flink_prometheus_example/FlinkMetricsExposingMapFunction.java&quot;&gt;FlinkMetricsExposingMapFunction.java&lt;/a&gt; demonstrating custom Flink metric.&lt;/small&gt;&lt;/i&gt;&lt;/center&gt;
&lt;h2 id=&quot;configuring-prometheus-with-flink&quot;&gt;Configuring Prometheus with Flink&lt;/h2&gt;
&lt;p&gt;To start monitoring Flink with Prometheus, the following steps are necessary:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Make the &lt;code&gt;PrometheusReporter&lt;/code&gt; jar available to the classpath of the Flink cluster (it comes with the Flink distribution):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt; cp /opt/flink/opt/flink-metrics-prometheus-1.7.2.jar /opt/flink/lib
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#reporter&quot;&gt;Configure the reporter&lt;/a&gt; in Flink’s &lt;em&gt;flink-conf.yaml&lt;/em&gt;. All job managers and task managers will expose the metrics on the configured port.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt; metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9999
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Prometheus needs to know where to scrape metrics. In a static scenario, you can simply &lt;a href=&quot;https://prometheus.io/docs/prometheus/latest/configuration/configuration/&quot;&gt;configure Prometheus&lt;/a&gt; in &lt;em&gt;prometheus.yml&lt;/em&gt; with the following:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;scrape_configs:
  - job_name: &#39;flink&#39;
    static_configs:
      - targets: [&#39;job-cluster:9999&#39;, &#39;taskmanager1:9999&#39;, &#39;taskmanager2:9999&#39;]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;In more dynamic scenarios we recommend using Prometheus’ service discovery support for different platforms such as Kubernetes, AWS EC2 and more.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Both custom metrics are now available in Prometheus:&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-03-11-prometheus-monitoring/prometheus.png&quot; width=&quot;600px&quot; alt=&quot;Prometheus web UI with example metric&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Example metric in Prometheus web UI.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;More technical metrics from the Flink cluster (like checkpoint sizes or duration, Kafka offsets or resource consumption) are also available. If you are interested, you can check out the HTTP endpoints exposing all Prometheus metrics for the job manager and the two task managers on &lt;a href=&quot;http://localhost:9249/metrics&quot;&gt;http://localhost:9249&lt;/a&gt;, &lt;a href=&quot;http://localhost:9250/metrics&quot;&gt;http://localhost:9250&lt;/a&gt; and &lt;a href=&quot;http://localhost:9251/metrics&quot;&gt;http://localhost:9251&lt;/a&gt;, respectively.&lt;/p&gt;
&lt;p&gt;To test Prometheus’ alerting feature, kill one of the Flink task managers via&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;docker kill taskmanager1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Our Flink job can recover from this partial failure via the mechanism of &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/state/checkpointing.html&quot;&gt;Checkpointing&lt;/a&gt;. Nevertheless, after roughly one minute (as configured in the alert rule) the following alert will fire:&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-03-11-prometheus-monitoring/prometheusalerts.png&quot; width=&quot;600px&quot; alt=&quot;Prometheus web UI with example alert&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Example alert in Prometheus web UI.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;In real-world situations alerts like this one can be routed through a component called &lt;a href=&quot;https://prometheus.io/docs/alerting/alertmanager/&quot;&gt;Alertmanager&lt;/a&gt; and be grouped into notifications to systems like email, PagerDuty or Slack.&lt;/p&gt;
&lt;p&gt;Go ahead and play around with the setup, and check out the &lt;a href=&quot;https://grafana.com/grafana&quot;&gt;Grafana&lt;/a&gt; instance reachable at &lt;a href=&quot;http://localhost:3000/&quot;&gt;http://localhost:3000&lt;/a&gt; (credentials &lt;em&gt;admin:flink&lt;/em&gt;) for visualizing Prometheus metrics. If there are any questions or problems, feel free to &lt;a href=&quot;https://github.com/mbode/flink-prometheus-example/issues&quot;&gt;create an issue&lt;/a&gt;. Once finished, do not forget to tear down the setup via&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;./gradlew composeDown
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Using Prometheus together with Flink provides an easy way for effective monitoring and alerting of your Flink jobs. Both projects have exciting and vibrant communities behind them with new developments and additions scheduled for upcoming releases. We encourage you to try the two technologies together as it has immensely improved our insights into Flink jobs running in production.&lt;/p&gt;
</description>
<pubDate>Mon, 11 Mar 2019 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/features/2019/03/11/prometheus-monitoring.html</link>
<guid isPermaLink="true">/features/2019/03/11/prometheus-monitoring.html</guid>
</item>
<item>
<title>What to expect from Flink Forward San Francisco 2019</title>
<description>&lt;p&gt;The third annual Flink Forward San Francisco is just a few weeks away! As always, Flink Forward will be the right place to meet and mingle with experienced Flink users, contributors, and committers. Attendees will hear and chat about the latest developments around Flink and learn from technical deep-dive sessions and exciting use cases that were put into production with Flink. The event will take place on April 1-2, 2019 at Hotel Nikko in San Francisco. The &lt;a href=&quot;https://sf-2019.flink-forward.org/program-committee&quot;&gt;program committee&lt;/a&gt; assembled an amazing &lt;a href=&quot;https://sf-2019.flink-forward.org/speakers&quot;&gt;lineup of speakers&lt;/a&gt; who will cover many different aspects of Apache Flink and stream processing.&lt;/p&gt;
&lt;p&gt;Some highlights of the program are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/conference-program#realtime-store-visit-predictions-at-scale&quot;&gt;Realtime Store Visit Predictions at Scale&lt;/a&gt;: Luca Giovagnoli from &lt;em&gt;Yelp&lt;/em&gt; will talk about a “multidisciplinary” Flink application that combines geospatial clustering algorithms, Machine Learning models, and cutting-edge stream-processing technology.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/conference-program#real-time-processing-with-flink-for-machine-learning-at-netflix&quot;&gt;Real-time Processing with Flink for Machine Learning at Netflix&lt;/a&gt;: Elliot Chow will discuss the practical aspects of using Apache Flink to power Machine Learning algorithms for video recommendations, search results ranking, and selection of artwork images at &lt;em&gt;Netflix&lt;/em&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/conference-program#building-production-flink-jobs-with-airstream-at-airbnb&quot;&gt;Building production Flink jobs with Airstream at Airbnb&lt;/a&gt;: Pala Muthiah and Hao Wang will reveal how &lt;em&gt;Airbnb&lt;/em&gt; builds real time data pipelines with Airstream, Airbnb’s computation framework that is powered by Flink SQL.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/conference-program#when-table-meets-ai--build-flink-ai-ecosystem-on-table-api&quot;&gt;When Table meets AI: Build Flink AI Ecosystem on Table API&lt;/a&gt;: Shaoxuan Wang from &lt;em&gt;Alibaba&lt;/em&gt; will discuss how they are building a solid AI ecosystem for unified batch/streaming Machine Learning data pipelines on top of Flink’s Table API.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/conference-program#adventures-in-scaling-from-zero-to-5-billion-data-points-per-day&quot;&gt;Adventures in Scaling from Zero to 5 Billion Data Points per Day&lt;/a&gt;: Dave Torok will take us through &lt;em&gt;Comcast’s&lt;/em&gt; journey in scaling the company’s operationalized Machine Learning framework from the very early days in production to processing more than 5 billion data points per day.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you’re new to Apache Flink or want to deepen your knowledge of the framework, Flink Forward again features a full day of training.&lt;/p&gt;
&lt;p&gt;You can choose from 3 training tracks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/training-program#introduction-to-streaming-with-apache-flink&quot;&gt;Introduction to Streaming with Apache Flink&lt;/a&gt;: A hands-on, in-depth introduction to stream processing and Apache Flink, this course emphasizes those features of Flink that make it easy to build and manage accurate, fault tolerant applications on streams.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/training-program#analyzing-streaming-data-with-flink-sql&quot;&gt;Analyzing Streaming Data with Flink SQL&lt;/a&gt;: In this hands-on training, you will learn what it means to run SQL queries on data streams and how to fully leverage the potential of SQL on Flink. We’ll also cover some of the more recent features such as time-versioned joins and the MATCH_RECOGNIZE clause.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&quot;https://sf-2019.flink-forward.org/training-program#apache-flink-troubleshooting---operations&quot;&gt;Troubleshooting and Operating Flink at large scale&lt;/a&gt;: In this training, we will focus on everything you need to run Apache Flink applications reliably and efficiently in production including topics like capacity planning, monitoring, troubleshooting and tuning Apache Flink.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you haven’t done so yet, check out the &lt;a href=&quot;http://sf-2019.flink-forward.org/conference-program&quot;&gt;full schedule&lt;/a&gt; and &lt;a href=&quot;https://sf-2019.flink-forward.org/register&quot;&gt;register&lt;/a&gt; your attendance. &lt;br /&gt;
I’m looking forward to meeting you at Flink Forward San Francisco.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Fabian&lt;/em&gt;&lt;/p&gt;
</description>
<pubDate>Wed, 06 Mar 2019 12:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/03/06/ffsf-preview.html</link>
<guid isPermaLink="true">/news/2019/03/06/ffsf-preview.html</guid>
</item>
<item>
<title>Monitoring Apache Flink Applications 101</title>
<description>&lt;!-- improve style of tables --&gt;
&lt;style&gt;
table { border: 0px solid black; table-layout: auto; width: 800px; }
th, td { border: 1px solid black; padding: 5px; padding-left: 10px; padding-right: 10px; }
th { text-align: center }
td { vertical-align: top }
&lt;/style&gt;
&lt;p&gt;This blog post provides an introduction to Apache Flink’s built-in monitoring
and metrics system that allows developers to effectively monitor their Flink
jobs. Oftentimes, the task of picking the relevant metrics to monitor a
Flink application can be overwhelming for a DevOps team that is just starting
with stream processing and Apache Flink. Having worked with many organizations
that deploy Flink at scale, I would like to share my experience and some best
practices with the community.&lt;/p&gt;
&lt;p&gt;With business-critical applications running on Apache Flink, performance monitoring
becomes an increasingly important part of a successful production deployment. It
ensures that any degradation or downtime is immediately identified and resolved
as quickly as possible.&lt;/p&gt;
&lt;p&gt;Monitoring goes hand-in-hand with observability, which is a prerequisite for
troubleshooting and performance tuning. Nowadays, with the complexity of modern
enterprise applications and the speed of delivery increasing, an engineering
team must understand and have a complete overview of its applications’ status at
any given point in time.&lt;/p&gt;
&lt;h2 id=&quot;flinks-metrics-system&quot;&gt;Flink’s Metrics System&lt;/h2&gt;
&lt;p&gt;The foundation for monitoring Flink jobs is its &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html&quot;&gt;metrics
system&lt;/a&gt;
which consists of two components: &lt;code&gt;Metrics&lt;/code&gt; and &lt;code&gt;MetricsReporters&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&quot;metrics&quot;&gt;Metrics&lt;/h3&gt;
&lt;p&gt;Flink comes with a comprehensive set of built-in metrics such as:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Used JVM Heap / NonHeap / Direct Memory (per Task-/JobManager)&lt;/li&gt;
&lt;li&gt;Number of Job Restarts (per Job)&lt;/li&gt;
&lt;li&gt;Number of Records Per Second (per Operator)&lt;/li&gt;
&lt;li&gt;…&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These metrics have different scopes and measure more general (e.g. JVM or
operating system) as well as Flink-specific aspects.&lt;/p&gt;
&lt;p&gt;As a user, you can and should add application-specific metrics to your
functions. Typically these include counters for the number of invalid records or
the number of records temporarily buffered in managed state. Besides counters,
Flink offers additional metrics types like gauges and histograms. For
instructions on how to register your own metrics with Flink’s metrics system
please check out &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#registering-metrics&quot;&gt;Flink’s
documentation&lt;/a&gt;.
In this blog post, we will focus on how to get the most out of Flink’s built-in
metrics.&lt;/p&gt;
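&lt;p&gt;As a short sketch of such an application-specific metric (the function and metric names are made up for this example), a counter for invalid records can be registered in the &lt;code&gt;open()&lt;/code&gt; method of a rich function:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;class ValidatingMapFunction extends RichMapFunction&amp;lt;String, String&amp;gt; {
    private transient Counter invalidRecords;

    @Override
    public void open(Configuration parameters) {
        // Registered under the operator&#39;s metric group.
        invalidRecords = getRuntimeContext().getMetricGroup().counter(&quot;invalidRecords&quot;);
    }

    @Override
    public String map(String value) {
        if (value == null || value.isEmpty()) {
            invalidRecords.inc();
        }
        return value;
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;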
&lt;h3 id=&quot;metricsreporters&quot;&gt;MetricsReporters&lt;/h3&gt;
&lt;p&gt;All metrics can be queried via Flink’s REST API. However, users can configure
MetricsReporters to send the metrics to external systems. Apache Flink provides
reporters to the most common monitoring tools out-of-the-box including JMX,
Prometheus, Datadog, Graphite and InfluxDB. For information about how to
configure a reporter check out Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#reporter&quot;&gt;MetricsReporter
documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In the remaining part of this blog post, we will go over some of the most
important metrics to monitor your Apache Flink application.&lt;/p&gt;
&lt;h2 id=&quot;monitoring-general-health&quot;&gt;Monitoring General Health&lt;/h2&gt;
&lt;p&gt;The first thing you want to monitor is whether your job is actually in a &lt;em&gt;RUNNING&lt;/em&gt;
state. In addition, it pays off to monitor the number of restarts and the time
since the last restart.&lt;/p&gt;
&lt;p&gt;Generally speaking, successful checkpointing is a strong indicator of the
general health of your application. For each checkpoint, checkpoint barriers
need to flow through the whole topology of your Flink job and events and
barriers cannot overtake each other. Therefore, a successful checkpoint shows
that no channel is fully congested.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Metrics&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;uptime&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;job&lt;/td&gt;
&lt;td&gt;The time that the job has been running without interruption.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;fullRestarts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;job&lt;/td&gt;
&lt;td&gt;The total number of full restarts since this job was submitted.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;numberOfCompletedCheckpoints&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;job&lt;/td&gt;
&lt;td&gt;The number of successfully completed checkpoints.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;numberOfFailedCheckpoints&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;job&lt;/td&gt;
&lt;td&gt;The number of failed checkpoints.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example Dashboard Panels&lt;/strong&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-1.png&quot; width=&quot;800px&quot; alt=&quot;Uptime (35 minutes), Restarting Time (3 milliseconds) and Number of Full Restarts (7)&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Uptime (35 minutes), Restarting Time (3 milliseconds) and Number of Full Restarts (7)&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-2.png&quot; width=&quot;800px&quot; alt=&quot;Completed Checkpoints (18336), Failed (14)&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Completed Checkpoints (18336), Failed (14)&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Possible Alerts&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ΔfullRestarts&lt;/code&gt; &amp;gt; &lt;code&gt;threshold&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ΔnumberOfFailedCheckpoints&lt;/code&gt; &amp;gt; &lt;code&gt;threshold&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;monitoring-progress--throughput&quot;&gt;Monitoring Progress &amp;amp; Throughput&lt;/h2&gt;
&lt;p&gt;Knowing that your application is RUNNING and checkpointing is working fine is good,
but it does not tell you whether the application is actually making progress and
keeping up with the upstream systems.&lt;/p&gt;
&lt;h3 id=&quot;throughput&quot;&gt;Throughput&lt;/h3&gt;
&lt;p&gt;Flink provides multiple metrics to measure the throughput of our application.
For each operator or task (remember: a task can contain multiple &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/stream/operators/#task-chaining-and-resource-groups&quot;&gt;chained
operators&lt;/a&gt;),
Flink counts the number of records and bytes going in and out. Out of those
metrics, the rate of outgoing records per operator is often the most intuitive
and easiest to reason about.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Metrics&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;numRecordsOutPerSecond&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;task&lt;/td&gt;
&lt;td&gt;The number of records this operator/task sends per second.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;numRecordsOutPerSecond&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;operator&lt;/td&gt;
&lt;td&gt;The number of records this operator sends per second.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example Dashboard Panels&lt;/strong&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-3.png&quot; width=&quot;800px&quot; alt=&quot;Mean Records Out per Second per Operator&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Mean Records Out per Second per Operator&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Possible Alerts&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;numRecordsOutPerSecond&lt;/code&gt; = &lt;code&gt;0&lt;/code&gt; (for a non-Sink operator)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: Source operators always have zero incoming records. Sink operators
always have zero outgoing records because the metrics only count
Flink-internal communication. There is a &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7286&quot;&gt;JIRA
ticket&lt;/a&gt; to change this
behavior.&lt;/p&gt;
&lt;h3 id=&quot;progress&quot;&gt;Progress&lt;/h3&gt;
&lt;p&gt;For applications that use event time semantics, it is important that watermarks
progress over time. A watermark of time &lt;em&gt;t&lt;/em&gt; tells the framework that it
should no longer expect to receive events with a timestamp earlier than &lt;em&gt;t&lt;/em&gt;,
and, in turn, to trigger all operations that were scheduled for a timestamp &amp;lt; &lt;em&gt;t&lt;/em&gt;.
For example, an event time window that ends at &lt;em&gt;t&lt;/em&gt; = 30 will be closed and
evaluated once the watermark passes 30.&lt;/p&gt;
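&lt;p&gt;For instance, a minimal sketch of such a window (the &lt;code&gt;events&lt;/code&gt; stream and its field positions are assumed for illustration); results are emitted once the watermark passes the end timestamp of each 30-second window:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;// events: DataStream&amp;lt;Tuple2&amp;lt;String, Long&amp;gt;&amp;gt; with timestamps and watermarks assigned;
// Time here is org.apache.flink.streaming.api.windowing.time.Time.
DataStream&amp;lt;Tuple2&amp;lt;String, Long&amp;gt;&amp;gt; counts = events
    .keyBy(0)
    .window(TumblingEventTimeWindows.of(Time.seconds(30)))
    .sum(1);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;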
&lt;p&gt;As a consequence, you should monitor the watermark at event time-sensitive
operators in your application, such as process functions and windows. If the
difference between the current processing time and the watermark, known as
event-time skew, is unusually high, then it typically implies one of two issues.
First, it could mean that you are simply processing old events, for example
during catch-up after a downtime or when your job is not able to keep up
and events are queuing up. Second, it could mean a single upstream sub-task has
not sent a watermark for a long time (for example because it did not receive any
events to base the watermark on), which also prevents the watermark in
downstream operators from progressing. This &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5017&quot;&gt;JIRA
ticket&lt;/a&gt; provides further
information and a workaround for the latter.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Metrics&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;currentOutputWatermark&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;operator&lt;/td&gt;
&lt;td&gt;The last watermark this operator has emitted.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example Dashboard Panels&lt;/strong&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-4.png&quot; width=&quot;800px&quot; alt=&quot;Event Time Lag per Subtask of a single operator in the topology. In this case, the watermark is lagging a few seconds behind for each subtask.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Event Time Lag per Subtask of a single operator in the topology. In this case, the watermark is lagging a few seconds behind for each subtask.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Possible Alerts&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;currentProcessingTime - currentOutputWatermark&lt;/code&gt; &amp;gt; &lt;code&gt;threshold&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;keeping-up&quot;&gt;“Keeping Up”&lt;/h3&gt;
&lt;p&gt;When consuming from a message queue, there is often a direct way to monitor if
your application is keeping up. By using connector-specific metrics you can
monitor how far behind the head of the message queue your current consumer group
is. Flink forwards the underlying metrics from most sources.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Metrics&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;records-lag-max&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;user&lt;/td&gt;
&lt;td&gt;applies to &lt;code&gt;FlinkKafkaConsumer&lt;/code&gt;. The maximum lag in terms of the number of records for any partition in this window. An increasing value over time is your best indication that the consumer group is not keeping up with the producers.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;millisBehindLatest&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;user&lt;/td&gt;
&lt;td&gt;applies to &lt;code&gt;FlinkKinesisConsumer&lt;/code&gt;. The number of milliseconds a consumer is behind the head of the stream. For any consumer and Kinesis shard, this indicates how far it is behind the current time.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Possible Alerts&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;records-lag-max&lt;/code&gt; &amp;gt; &lt;code&gt;threshold&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;millisBehindLatest&lt;/code&gt; &amp;gt; &lt;code&gt;threshold&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;monitoring-latency&quot;&gt;Monitoring Latency&lt;/h2&gt;
&lt;p&gt;Generally speaking, latency is the delay between the creation of an event and
the time at which results based on this event become visible. Once the event is
created it is usually stored in a persistent message queue, before it is
processed by Apache Flink, which then writes the results to a database or calls
a downstream system. In such a pipeline, latency can be introduced at each stage
and for various reasons including the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;It might take a varying amount of time until events are persisted in the
message queue.&lt;/li&gt;
&lt;li&gt;During periods of high load or during recovery, events might spend some time
in the message queue until they are processed by Flink (see previous section).&lt;/li&gt;
&lt;li&gt;Some operators in a streaming topology need to buffer events for some time
(e.g. in a time window) for functional reasons.&lt;/li&gt;
&lt;li&gt;Each computation in your Flink topology (framework or user code), as well as
each network shuffle, takes time and adds to latency.&lt;/li&gt;
&lt;li&gt;If the application emits through a transactional sink, the sink will only
commit and publish transactions upon successful checkpoints of Flink, adding
latency usually up to the checkpointing interval for each record.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In practice, it has proven invaluable to add timestamps to your events at
multiple stages (at least at creation, persistence, ingestion by Flink,
publication by Flink, possibly sampling those to save bandwidth). The
differences between these timestamps can be exposed as a user-defined metric in
your Flink topology to derive the latency distribution of each stage.&lt;/p&gt;
&lt;p&gt;In the rest of this section, we will only consider latency, which is introduced
inside the Flink topology and cannot be attributed to transactional sinks or
events being buffered for functional reasons (4.).&lt;/p&gt;
&lt;p&gt;To this end, Flink comes with a feature called &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#latency-tracking&quot;&gt;Latency
Tracking&lt;/a&gt;.
When enabled, Flink will insert so-called latency markers periodically at all
sources. For each sub-task, a latency distribution from each source to this
operator will be reported. The granularity of these histograms can be further
controlled by setting &lt;em&gt;metrics.latency.granularity&lt;/em&gt; as desired.&lt;/p&gt;
&lt;p&gt;Due to the potentially high number of histograms (in particular for
&lt;em&gt;metrics.latency.granularity: subtask&lt;/em&gt;), enabling latency tracking can
significantly impact the performance of the cluster. It is recommended to only
enable it to locate sources of latency during debugging.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Metrics&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;latency&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;operator&lt;/td&gt;
&lt;td&gt;The latency from the source operator to this operator.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;restartingTime&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;job&lt;/td&gt;
&lt;td&gt;The time it took to restart the job, or how long the current restart has been in progress.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example Dashboard Panel&lt;/strong&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-5.png&quot; width=&quot;800px&quot; alt=&quot;Latency distribution between a source and a single sink subtask.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;Latency distribution between a source and a single sink subtask.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;jvm-metrics&quot;&gt;JVM Metrics&lt;/h2&gt;
&lt;p&gt;So far we have only looked at Flink-specific metrics. As long as the latency &amp;amp;
throughput of your application are in line with your expectations and it is
checkpointing consistently, this is probably everything you need. On the other
hand, if your job’s performance is starting to degrade, among the first metrics you
want to look at are the memory consumption and CPU load of your Task- &amp;amp; JobManager
JVMs.&lt;/p&gt;
&lt;h3 id=&quot;memory&quot;&gt;Memory&lt;/h3&gt;
&lt;p&gt;Flink reports the usage of Heap, NonHeap, Direct &amp;amp; Mapped memory for JobManagers
and TaskManagers.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Heap memory - as with most JVM applications - is the most volatile and important
metric to watch. This is especially true when using Flink’s filesystem
statebackend as it keeps all state objects on the JVM Heap. If the size of
long-living objects on the Heap increases significantly, this can usually be
attributed to the size of your application state (check the
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#checkpointing&quot;&gt;checkpointing metrics&lt;/a&gt;
for an estimated size of the on-heap state). The possible reasons for growing
state are very application-specific. Typically, an increasing number of keys, a
large event-time skew between different input streams, or simply missing state
cleanup may cause growing state (see the state TTL sketch after this list).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;NonHeap memory is dominated by the metaspace, which holds class metadata as well as
static content and whose size is unlimited by default. There is a
&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10317&quot;&gt;JIRA ticket&lt;/a&gt; to limit its size
to 250 megabytes by default.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The biggest driver of Direct memory is by far the
number of Flink’s network buffers, which can be
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/config.html#configuring-the-network-buffers&quot;&gt;configured&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Mapped memory is usually close to zero as Flink does not use memory-mapped files.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
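&lt;p&gt;As a rough sketch of one way to address missing state cleanup (available since Flink 1.6),
keyed state can be configured with a time-to-live so that stale entries are eventually removed.
The seven-day TTL and the state name below are arbitrary examples; the configuration would
typically live in the &lt;code&gt;open()&lt;/code&gt; method of a rich function:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

// Drop state entries that have not been written for seven days (example value).
StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
    .build();

// Attach the TTL configuration to the state descriptor before acquiring the state.
ValueStateDescriptor&lt;Long&gt; descriptor =
    new ValueStateDescriptor&lt;&gt;(&quot;lastSeen&quot;, Long.class);
descriptor.enableTimeToLive(ttlConfig);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;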
&lt;p&gt;In a containerized environment you should additionally monitor the overall
memory consumption of the Job- and TaskManager containers to ensure they don’t
exceed their resource limits. This is particularly important when using the
RocksDB statebackend, since RocksDB allocates a considerable amount of
memory off heap. To understand how much memory RocksDB might use, you can
check out &lt;a href=&quot;https://www.da-platform.com/blog/manage-rocksdb-memory-size-apache-flink&quot;&gt;this blog
post&lt;/a&gt;
by Stefan Richter.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Metrics&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Status.JVM.Memory.NonHeap.Committed&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;job-/taskmanager&lt;/td&gt;
&lt;td&gt;The amount of non-heap memory guaranteed to be available to the JVM (in bytes).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Status.JVM.Memory.Heap.Used&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;job-/taskmanager&lt;/td&gt;
&lt;td&gt;The amount of heap memory currently used (in bytes).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Status.JVM.Memory.Heap.Committed&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;job-/taskmanager&lt;/td&gt;
&lt;td&gt;The amount of heap memory guaranteed to be available to the JVM (in bytes).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Status.JVM.Memory.Direct.MemoryUsed&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;job-/taskmanager&lt;/td&gt;
&lt;td&gt;The amount of memory used by the JVM for the direct buffer pool (in bytes).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Status.JVM.Memory.Mapped.MemoryUsed&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;job-/taskmanager&lt;/td&gt;
&lt;td&gt;The amount of memory used by the JVM for the mapped buffer pool (in bytes).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Status.JVM.GarbageCollector.G1 Young Generation.Time&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;job-/taskmanager&lt;/td&gt;
&lt;td&gt;The total time spent performing G1 Young Generation garbage collection.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Status.JVM.GarbageCollector.G1 Old Generation.Time&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;job-/taskmanager&lt;/td&gt;
&lt;td&gt;The total time spent performing G1 Old Generation garbage collection.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example Dashboard Panel&lt;/strong&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-6.png&quot; width=&quot;800px&quot; alt=&quot;TaskManager memory consumption and garbage collection times.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;TaskManager memory consumption and garbage collection times.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-7.png&quot; width=&quot;800px&quot; alt=&quot;JobManager memory consumption and garbage collection times.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;JobManager memory consumption and garbage collection times.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Possible Alerts&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;container memory limit&lt;/code&gt; &amp;lt; &lt;code&gt;container memory + safety margin&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;cpu&quot;&gt;CPU&lt;/h3&gt;
&lt;p&gt;Besides memory, you should also monitor the CPU load of the TaskManagers. If
your TaskManagers are constantly under very high load, you might be able to
improve the overall performance by decreasing the number of task slots per
TaskManager (in case of a Standalone setup), by providing more resources to the
TaskManager (in case of a containerized setup), or by providing more
TaskManagers. In general, a system that is already running under very high load during
normal operations will need much more time to catch up after recovering from
downtime. During this time you will see a much higher latency (event-time skew) than
usual.&lt;/p&gt;
&lt;p&gt;A sudden increase in the CPU load might also be attributed to high garbage
collection pressure, which should be visible in the JVM memory metrics as well.&lt;/p&gt;
&lt;p&gt;If one or a few TaskManagers are constantly under very high load, this can slow
down the whole topology due to long checkpoint alignment times and increasing
event-time skew. A common reason is skew in the partition key of the data, which
can be mitigated by pre-aggregating before the shuffle or keying on a more
evenly distributed key.&lt;/p&gt;
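&lt;p&gt;The following is a rough sketch of the pre-aggregation idea under hypothetical names
(&lt;code&gt;events&lt;/code&gt;, &lt;code&gt;MyEvent&lt;/code&gt;, &lt;code&gt;getUserId()&lt;/code&gt;): each record is assigned a
random salt in a map step, a first stage counts per salted key, and a second stage merges the
partial counts per original key, so that a single hot key no longer overloads one sub-task:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.concurrent.ThreadLocalRandom;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

// events is assumed to be a DataStream&lt;MyEvent&gt; coming from some source.
// Assign each record a random salt (10 buckets is an arbitrary choice) so that the
// salted key becomes a stable field of the record before keyBy.
DataStream&lt;Tuple3&lt;String, Integer, Long&gt;&gt; salted = events
    .map(new MapFunction&lt;MyEvent, Tuple3&lt;String, Integer, Long&gt;&gt;() {
        @Override
        public Tuple3&lt;String, Integer, Long&gt; map(MyEvent event) {
            return Tuple3.of(event.getUserId(),
                             ThreadLocalRandom.current().nextInt(10), 1L);
        }
    });

// Stage 1: pre-aggregate per (userId, salt), so a hot key is split over 10 sub-tasks.
DataStream&lt;Tuple3&lt;String, Integer, Long&gt;&gt; partialCounts = salted
    .keyBy(0, 1)
    .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
    .sum(2);

// Stage 2: merge the per-window partial counts into a running total per userId.
DataStream&lt;Tuple3&lt;String, Integer, Long&gt;&gt; totalCounts = partialCounts
    .keyBy(0)
    .sum(2);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;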
&lt;p&gt;&lt;strong&gt;Key Metrics&lt;/strong&gt;&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Scope&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Status.JVM.CPU.Load&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;job-/taskmanager&lt;/td&gt;
&lt;td&gt;The recent CPU usage of the JVM.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Example Dashboard Panel&lt;/strong&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/2019-02-21-monitoring-best-practices/fig-8.png&quot; width=&quot;800px&quot; alt=&quot;TaskManager &amp;amp; JobManager CPU load.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;&lt;small&gt;TaskManager &amp;amp; JobManager CPU load.&lt;/small&gt;&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;system-resources&quot;&gt;System Resources&lt;/h2&gt;
&lt;p&gt;In addition to the JVM metrics above, it is also possible to use Flink’s metrics
system to gather insights about system resources, i.e. memory, CPU &amp;amp;
network-related metrics for the whole machine as opposed to the Flink processes
alone. System resource monitoring is disabled by default and requires additional
dependencies on the classpath. Please check out the
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html#system-resources&quot;&gt;Flink system resource metrics documentation&lt;/a&gt; for
additional guidance and details. System resource monitoring in Flink can be very
helpful in setups without existing host monitoring capabilities.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;This post tries to shed some light on Flink’s metrics and monitoring system. You
can utilise it as a starting point when you first think about how to
successfully monitor your Flink application. I highly recommend that you start
monitoring your Flink application early on in the development phase. This way
you will be able to improve your dashboards and alerts over time and, more
importantly, observe the performance impact of the changes to your application
throughout the development phase. By doing so, you can ask the right questions
about the runtime behaviour of your application, and learn much more about
Flink’s internals early on.&lt;/p&gt;
&lt;p&gt;Last but not least, this post only scratches the surface of the overall metrics
and monitoring capabilities of Apache Flink. I highly recommend going over
&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/monitoring/metrics.html&quot;&gt;Flink’s metrics documentation&lt;/a&gt;
for a full reference of Flink’s metrics system.&lt;/p&gt;
</description>
<pubDate>Mon, 25 Feb 2019 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/02/25/monitoring-best-practices.html</link>
<guid isPermaLink="true">/news/2019/02/25/monitoring-best-practices.html</guid>
</item>
<item>
<title>Apache Flink 1.6.4 Released</title>
<description>&lt;p&gt;The Apache Flink community released the fourth bugfix version of the Apache Flink 1.6 series.&lt;/p&gt;
&lt;p&gt;This release includes more than 25 fixes and minor improvements for Flink 1.6.3. The list below includes a detailed list of all fixes.&lt;/p&gt;
&lt;p&gt;We highly recommend all users to upgrade to Flink 1.6.4.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10721&quot;&gt;FLINK-10721&lt;/a&gt;] - Kafka discovery-loop exceptions may be swallowed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10761&quot;&gt;FLINK-10761&lt;/a&gt;] - MetricGroup#getAllVariables can deadlock
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10774&quot;&gt;FLINK-10774&lt;/a&gt;] - connection leak when partition discovery is disabled and open throws exception
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10848&quot;&gt;FLINK-10848&lt;/a&gt;] - Flink&amp;#39;s Yarn ResourceManager can allocate too many excess containers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11022&quot;&gt;FLINK-11022&lt;/a&gt;] - Update LICENSE and NOTICE files for older releases
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11071&quot;&gt;FLINK-11071&lt;/a&gt;] - Dynamic proxy classes cannot be resolved when deserializing job graph
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11084&quot;&gt;FLINK-11084&lt;/a&gt;] - Incorrect output after two consecutive split and select
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11119&quot;&gt;FLINK-11119&lt;/a&gt;] - Incorrect Scala example for Table Function
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11134&quot;&gt;FLINK-11134&lt;/a&gt;] - Invalid REST API request should not log the full exception in Flink logs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11151&quot;&gt;FLINK-11151&lt;/a&gt;] - FileUploadHandler stops working if the upload directory is removed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11173&quot;&gt;FLINK-11173&lt;/a&gt;] - Proctime attribute validation throws an incorrect exception message
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11224&quot;&gt;FLINK-11224&lt;/a&gt;] - Log is missing in scala-shell
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11232&quot;&gt;FLINK-11232&lt;/a&gt;] - Empty Start Time of sub-task on web dashboard
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11234&quot;&gt;FLINK-11234&lt;/a&gt;] - ExternalTableCatalogBuilder unable to build a batch-only table
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11235&quot;&gt;FLINK-11235&lt;/a&gt;] - Elasticsearch connector leaks threads if no connection could be established
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11251&quot;&gt;FLINK-11251&lt;/a&gt;] - Incompatible metric name on prometheus reporter
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11389&quot;&gt;FLINK-11389&lt;/a&gt;] - Incorrectly use job information when call getSerializedTaskInformation in class TaskDeploymentDescriptor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11584&quot;&gt;FLINK-11584&lt;/a&gt;] - ConfigDocsCompletenessITCase fails when DescriptionBuilder#linebreak() is used
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11585&quot;&gt;FLINK-11585&lt;/a&gt;] - Prefix matching in ConfigDocsGenerator can result in wrong assignments
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10910&quot;&gt;FLINK-10910&lt;/a&gt;] - Harden Kubernetes e2e test
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11079&quot;&gt;FLINK-11079&lt;/a&gt;] - Skip deployment for flink-storm-examples
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11207&quot;&gt;FLINK-11207&lt;/a&gt;] - Update Apache commons-compress from 1.4.1 to 1.18
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11262&quot;&gt;FLINK-11262&lt;/a&gt;] - Bump jython-standalone to 2.7.1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11289&quot;&gt;FLINK-11289&lt;/a&gt;] - Rework example module structure to account for licensing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11304&quot;&gt;FLINK-11304&lt;/a&gt;] - Typo in time attributes doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11469&quot;&gt;FLINK-11469&lt;/a&gt;] - fix Tuning Checkpoints and Large State doc
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Mon, 25 Feb 2019 01:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/02/25/release-1.6.4.html</link>
<guid isPermaLink="true">/news/2019/02/25/release-1.6.4.html</guid>
</item>
<item>
<title>Apache Flink 1.7.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released the second bugfix version of the Apache Flink 1.7 series.&lt;/p&gt;
&lt;p&gt;This release includes more than 40 fixes and minor improvements for Flink 1.7.1, covering several critical
recovery issues as well as problems in the Flink streaming connectors.&lt;/p&gt;
&lt;p&gt;The list below includes a detailed list of all fixes.
We highly recommend all users to upgrade to Flink 1.7.2.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11179&quot;&gt;FLINK-11179&lt;/a&gt;] - JoinCancelingITCase#testCancelSortMatchWhileDoingHeavySorting test error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11180&quot;&gt;FLINK-11180&lt;/a&gt;] - ProcessFailureCancelingITCase#testCancelingOnProcessFailure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11181&quot;&gt;FLINK-11181&lt;/a&gt;] - SimpleRecoveryITCaseBase test error
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10721&quot;&gt;FLINK-10721&lt;/a&gt;] - Kafka discovery-loop exceptions may be swallowed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10761&quot;&gt;FLINK-10761&lt;/a&gt;] - MetricGroup#getAllVariables can deadlock
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10774&quot;&gt;FLINK-10774&lt;/a&gt;] - connection leak when partition discovery is disabled and open throws exception
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10848&quot;&gt;FLINK-10848&lt;/a&gt;] - Flink&amp;#39;s Yarn ResourceManager can allocate too many excess containers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11046&quot;&gt;FLINK-11046&lt;/a&gt;] - ElasticSearch6Connector cause thread blocked when index failed with retry
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11071&quot;&gt;FLINK-11071&lt;/a&gt;] - Dynamic proxy classes cannot be resolved when deserializing job graph
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11083&quot;&gt;FLINK-11083&lt;/a&gt;] - CRowSerializerConfigSnapshot is not instantiable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11084&quot;&gt;FLINK-11084&lt;/a&gt;] - Incorrect output after two consecutive split and select
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11100&quot;&gt;FLINK-11100&lt;/a&gt;] - Presto S3 FileSystem E2E test broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11119&quot;&gt;FLINK-11119&lt;/a&gt;] - Incorrect Scala example for Table Function
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11134&quot;&gt;FLINK-11134&lt;/a&gt;] - Invalid REST API request should not log the full exception in Flink logs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11145&quot;&gt;FLINK-11145&lt;/a&gt;] - Fix Hadoop version handling in binary release script
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11151&quot;&gt;FLINK-11151&lt;/a&gt;] - FileUploadHandler stops working if the upload directory is removed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11168&quot;&gt;FLINK-11168&lt;/a&gt;] - LargePlanTest times out on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11173&quot;&gt;FLINK-11173&lt;/a&gt;] - Proctime attribute validation throws an incorrect exception message
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11187&quot;&gt;FLINK-11187&lt;/a&gt;] - StreamingFileSink with S3 backend transient socket timeout issues
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11191&quot;&gt;FLINK-11191&lt;/a&gt;] - Exception in code generation when ambiguous columns in MATCH_RECOGNIZE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11194&quot;&gt;FLINK-11194&lt;/a&gt;] - missing Scala 2.12 build of HBase connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11201&quot;&gt;FLINK-11201&lt;/a&gt;] - Document SBT dependency requirements when using MiniClusterResource
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11224&quot;&gt;FLINK-11224&lt;/a&gt;] - Log is missing in scala-shell
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11227&quot;&gt;FLINK-11227&lt;/a&gt;] - The DescriptorProperties contains some bounds checking errors
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11232&quot;&gt;FLINK-11232&lt;/a&gt;] - Empty Start Time of sub-task on web dashboard
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11234&quot;&gt;FLINK-11234&lt;/a&gt;] - ExternalTableCatalogBuilder unable to build a batch-only table
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11235&quot;&gt;FLINK-11235&lt;/a&gt;] - Elasticsearch connector leaks threads if no connection could be established
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11246&quot;&gt;FLINK-11246&lt;/a&gt;] - Fix distinct AGG visibility issues
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11251&quot;&gt;FLINK-11251&lt;/a&gt;] - Incompatible metric name on prometheus reporter
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11279&quot;&gt;FLINK-11279&lt;/a&gt;] - Invalid week interval parsing in ExpressionParser
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11302&quot;&gt;FLINK-11302&lt;/a&gt;] - FlinkS3FileSystem uses an incorrect path for temporary files.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11389&quot;&gt;FLINK-11389&lt;/a&gt;] - Incorrectly use job information when call getSerializedTaskInformation in class TaskDeploymentDescriptor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11419&quot;&gt;FLINK-11419&lt;/a&gt;] - StreamingFileSink fails to recover after taskmanager failure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11436&quot;&gt;FLINK-11436&lt;/a&gt;] - Java deserialization failure of the AvroSerializer when used in an old CompositeSerializers
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; New Feature
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10457&quot;&gt;FLINK-10457&lt;/a&gt;] - Support SequenceFile for StreamingFileSink
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10910&quot;&gt;FLINK-10910&lt;/a&gt;] - Harden Kubernetes e2e test
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11023&quot;&gt;FLINK-11023&lt;/a&gt;] - Update LICENSE and NOTICE files for flink-connectors
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11079&quot;&gt;FLINK-11079&lt;/a&gt;] - Skip deployment for flink-storm-examples
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11207&quot;&gt;FLINK-11207&lt;/a&gt;] - Update Apache commons-compress from 1.4.1 to 1.18
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11216&quot;&gt;FLINK-11216&lt;/a&gt;] - Back to top button is missing in the Joining document and is not properly placed in the Process Function document
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11262&quot;&gt;FLINK-11262&lt;/a&gt;] - Bump jython-standalone to 2.7.1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11289&quot;&gt;FLINK-11289&lt;/a&gt;] - Rework example module structure to account for licensing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11304&quot;&gt;FLINK-11304&lt;/a&gt;] - Typo in time attributes doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11331&quot;&gt;FLINK-11331&lt;/a&gt;] - Fix errors in tableApi.md and functions.md
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11469&quot;&gt;FLINK-11469&lt;/a&gt;] - fix Tuning Checkpoints and Large State doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11473&quot;&gt;FLINK-11473&lt;/a&gt;] - Clarify Documenation on Latency Tracking
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11628&quot;&gt;FLINK-11628&lt;/a&gt;] - Cache maven on travis
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Fri, 15 Feb 2019 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/02/15/release-1.7.2.html</link>
<guid isPermaLink="true">/news/2019/02/15/release-1.7.2.html</guid>
</item>
<item>
<title>Batch as a Special Case of Streaming and Alibaba&#39;s contribution of Blink</title>
<description>&lt;p&gt;Last week, we &lt;a href=&quot;https://lists.apache.org/thread.html/2f7330e85d702a53b4a2b361149930b50f2e89d8e8a572f8ee2a0e6d@%3Cdev.flink.apache.org%3E&quot;&gt;broke the news&lt;/a&gt; that Alibaba decided to contribute its Flink-fork, called Blink, back to the Apache Flink project. Why is that a big thing for Flink, what will it mean for users and the community, and how does it fit into Flink’s overall vision? Let’s take a step back to understand this better…&lt;/p&gt;
&lt;h2 id=&quot;a-unified-approach-to-batch-and-streaming&quot;&gt;A Unified Approach to Batch and Streaming&lt;/h2&gt;
&lt;p&gt;Since its early days, Apache Flink has followed the philosophy of taking a unified approach to batch and streaming data processing. The core building block is &lt;em&gt;“continuous processing of unbounded data streams”&lt;/em&gt;: if you can do that, you can also do offline processing of bounded data sets (batch processing use cases), because these are just streams that happen to end at some point.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/unified-batch-streaming-blink/bounded-unbounded.png&quot; width=&quot;600px&quot; alt=&quot;Processing of bounded and unbounded data.&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;The &lt;em&gt;“streaming first, with batch as a special case of streaming”&lt;/em&gt; philosophy is supported by various projects (for example &lt;a href=&quot;https://flink.apache.org&quot;&gt;Flink&lt;/a&gt;, &lt;a href=&quot;https://beam.apache.org&quot;&gt;Beam&lt;/a&gt;, etc.) and often been cited as a powerful way to build data applications that &lt;a href=&quot;https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101&quot;&gt;generalize across real-time and offline processing&lt;/a&gt; and to help greatly reduce the complexity of data infrastructures.&lt;/p&gt;
&lt;h3 id=&quot;why-are-there-still-batch-processors&quot;&gt;Why are there still batch processors?&lt;/h3&gt;
&lt;p&gt;However, &lt;em&gt;“batch is just a special case of streaming”&lt;/em&gt; does not mean that any stream processor is now the right tool for your batch processing use cases - the introduction of stream processors did not render batch processors obsolete:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Pure stream processing systems are very slow at batch processing workloads. No one would consider it a good idea to use a stream processor that shuffles through message queues to analyze large amounts of available data.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Unified APIs like &lt;a href=&quot;https://beam.apache.org&quot;&gt;Apache Beam&lt;/a&gt; often delegate to different runtimes depending on whether the data is continuous/unbounded or finite/bounded. For example, the implementations of the batch and streaming runtime of Google Cloud Dataflow are different, to get the desired performance and resilience in each case.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;em&gt;Apache Flink&lt;/em&gt; has a streaming API that can do bounded/unbounded use cases, but still offers a separate DataSet API and runtime stack that is faster for batch use cases.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;What is the reason for the above? Where did &lt;em&gt;“batch is just a special case of streaming”&lt;/em&gt; go wrong?&lt;/p&gt;
&lt;p&gt;The answer is simple: nothing is wrong with that paradigm. Unifying batch and streaming in the API is one aspect. One also needs to exploit certain characteristics of the special case “bounded data” in the runtime to competitively handle batch processing use cases. After all, batch processors have been built specifically for that special case.&lt;/p&gt;
&lt;h2 id=&quot;batch-on-top-of-a-streaming-runtime&quot;&gt;Batch on top of a Streaming Runtime&lt;/h2&gt;
&lt;p&gt;We always believed that it is possible to have a runtime that is state-of-the-art for both stream processing and batch processing use cases at the same time. A runtime that is streaming-first, but can exploit just the right amount of special properties of bounded streams to be as fast for batch use cases as dedicated batch processors. &lt;strong&gt;This is the unique approach that Flink takes.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Apache Flink has a network stack that supports both &lt;a href=&quot;https://www.ververica.com/flink-forward-berlin/resources/improving-throughput-and-latency-with-flinks-network-stack&quot;&gt;low-latency/high-throughput streaming data exchanges&lt;/a&gt;, as well as high-throughput batch shuffles. Flink has streaming runtime operators for many operations, but also specialized operators for bounded inputs, which get used when you choose the DataSet API or select the batch environment in the Table API.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/unified-batch-streaming-blink/stream-batch-joins.png&quot; width=&quot;500px&quot; alt=&quot;Streaming and batch joins.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;The figure illustrates a streaming join and a batch join. The batch join can read one input fully into a hash table and then probe with the other input. The stream join needs to build tables for both sides, because it needs to continuously process both inputs.
For data larger than memory, the batch join can partition both data sets into subsets that fit in memory (data hits disk once) whereas the continuous nature of the stream join requires it to always keep all data in the table and repeatedly hit disk on cache misses.&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
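&lt;p&gt;As a toy illustration of the build/probe pattern described in the figure caption above
(plain Java, unrelated to Flink’s actual operator implementations), a bounded input can be
loaded fully into a hash table and the other input then streamed against it:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy build/probe hash join: it relies on the build side being bounded and
// fitting into memory, which is exactly the property a batch join can exploit.
public class ToyHashJoin {

    public static List&lt;String&gt; join(List&lt;String[]&gt; buildSide, List&lt;String[]&gt; probeSide) {
        // Build phase: read the bounded input fully into a hash table keyed by column 0.
        Map&lt;String, List&lt;String[]&gt;&gt; hashTable = new HashMap&lt;&gt;();
        for (String[] row : buildSide) {
            hashTable.computeIfAbsent(row[0], k -&gt; new ArrayList&lt;&gt;()).add(row);
        }

        // Probe phase: stream the second input against the table, one row at a time.
        List&lt;String&gt; joined = new ArrayList&lt;&gt;();
        for (String[] row : probeSide) {
            for (String[] match : hashTable.getOrDefault(row[0], Collections.emptyList())) {
                joined.add(String.join(&quot;,&quot;, match) + &quot; | &quot; + String.join(&quot;,&quot;, row));
            }
        }
        return joined;
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A continuous stream-stream join cannot make this simplification: since both inputs keep
growing, it has to maintain state for both sides, as the figure describes.&lt;/p&gt;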
&lt;p&gt;Because of that, Apache Flink has actually been demonstrating some pretty impressive batch processing performance since its early days. The benchmark below is a bit older, but it validated our architectural approach early on.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/unified-batch-streaming-blink/sort-performance.png&quot; width=&quot;500px&quot; alt=&quot;Sort performance.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;Time to sort 3.2 TB (80 GB/node), in seconds&lt;br /&gt;
(&lt;a href=&quot;https://www.slideshare.net/FlinkForward/dongwon-kim-a-comparative-performance-evaluation-of-flink&quot; target=&quot;blank&quot;&gt;Presentation by Dongwon Kim, Flink Forward Berlin 2015&lt;/a&gt;.)&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;what-is-still-missing&quot;&gt;What is still missing?&lt;/h2&gt;
&lt;p&gt;To conclude the approach and make Flink’s experience on bounded data (batch) state-of-the-art, we need to add a few more enhancements. We believe that these features are key to realizing our vision:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(1) A truly unified runtime operator stack&lt;/strong&gt;: Currently the bounded and unbounded operators have a different network and threading model and don’t mix and match. The original reason was that batch operators followed a “pull model” (easier for batch algorithms), while streaming operators followed a “push model” (better latency/throughput characteristics). In a unified stack, continuous streaming operators are the foundation. When operating on bounded data without latency constraints, the API or the query optimizer can select from a larger set of operators. The optimizer can pick, for example, a specialized join operator that first consumes one input stream entirely before reading the second input stream.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(2) Exploiting bounded streams to reduce the scope of fault tolerance&lt;/strong&gt;: When input data is bounded, it is possible to completely buffer data during shuffles (memory or disk) and replay that data after a failure. This makes recovery more fine grained and thus much more efficient.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(3) Exploiting bounded stream operator properties for scheduling&lt;/strong&gt;: A continuous unbounded streaming application needs (by definition) all operators running at the same time. An application on bounded data can schedule operations after another, depending on how the operators consume data (e.g., first build hash table, then probe hash table). This increases resource efficiency.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(4) Enabling these special case optimizations for the DataStream API&lt;/strong&gt;: Currently, only the Table API (which is unified across bounded/unbounded streams) activates these optimizations when working on bounded data.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;(5) Performance and coverage for SQL&lt;/strong&gt;: SQL is the de-facto standard data language, and while it is also being rapidly adopted for continuous streaming use cases, there is absolutely no way past it for bounded/batch use cases. To be competitive with the best batch engines, Flink needs more coverage and performance for the SQL query execution. While the core data-plane in Flink is high performance, the speed of SQL execution ultimately depends a lot also on optimizer rules, a rich set of operators, and features like code generation.&lt;/p&gt;
&lt;h2 id=&quot;enter-blink&quot;&gt;Enter Blink&lt;/h2&gt;
&lt;p&gt;Blink is a fork of Apache Flink, originally created inside Alibaba to improve Flink’s behavior for internal use cases. Blink adds a series of improvements and integrations (see the &lt;a href=&quot;https://github.com/apache/flink/blob/blink/README.md&quot;&gt;Readme&lt;/a&gt; for details), many of which fall into the category of improved bounded-data/batch processing and SQL. In fact, of the above list of features for a unified batch/streaming system, Blink implements significant steps forward in all except (4):&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Unified Stream Operators:&lt;/strong&gt; Blink extends the Flink streaming runtime operator model to support selectively reading from different inputs, while keeping the push model for very low latency. This control over the inputs helps to now support algorithms like hybrid hash-joins on the same operator and threading model as continuous symmetric joins through RocksDB. These operators also form the basis for future features like &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API&quot;&gt;“Side Inputs”&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Table API &amp;amp; SQL Query Processor:&lt;/strong&gt; The SQL query processor is the component that changed the most compared to the latest Flink master branch:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;While Flink currently translates queries either into DataSet or DataStream programs (depending on the characteristics of their inputs), Blink translates queries to a data flow of the aforementioned stream operators.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Blink adds many more runtime operators for common SQL operations like semi-joins, anti-joins, etc.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The query planner (optimizer) is still based on Apache Calcite, but has many more optimization rules (incl. join reordering) and uses a proper cost model for planning.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Stream operators are more aggressively chained.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The common data structures (sorters, hash tables) and serializers are extended to go even further in operating on binary data and saving serialization overhead. Code generation is used for the row serializers.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Improved Scheduling and Failure Recovery:&lt;/strong&gt; Finally, Blink implements several improvements for task scheduling and fault tolerance. The scheduling strategies use resources better by exploiting how the operators process their input data. The failover strategies recover more fine-grained along the boundaries of persistent shuffles. A failed JobManager can be replaced without restarting a running application.&lt;/p&gt;
&lt;p&gt;The changes in Blink result in a big improvement in performance. The below numbers were reported by the developers of Blink to give a rough impression of the performance gains.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/unified-batch-streaming-blink/blink-flink-tpch.png&quot; width=&quot;600px&quot; alt=&quot;TPC-H performance of Blink and Flink.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;Relative performance of Blink versus Flink 1.6.0 in the TPC-H benchmark, query by query.&lt;br /&gt;
The performance improvement is on average 10x.&lt;br /&gt;
(&lt;a href=&quot;https://www.ververica.com/flink-forward-berlin/resources/unified-engine-for-data-processing-and-ai&quot; target=&quot;blank&quot;&gt;Presentation by Xiaowei Jiang at Flink Forward Berlin, 2018&lt;/a&gt;.)&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/unified-batch-streaming-blink/blink-spark-tpcds.png&quot; width=&quot;600px&quot; alt=&quot;TPC-DS performance of Blink and Spark.&quot; /&gt;
&lt;br /&gt;
&lt;i&gt;Performance of Blink versus Spark in the TPC-DS benchmark, aggregate time for all queries together.&lt;br /&gt;
&lt;a href=&quot;https://www.bilibili.com/video/av42325467/?p=3&quot; target=&quot;blank&quot;&gt;Presentation by Xiaowei Jiang at Flink Forward Beijing, 2018&lt;/a&gt;.&lt;/i&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h2 id=&quot;how-do-we-plan-to-merge-blink-and-flink&quot;&gt;How do we plan to merge Blink and Flink?&lt;/h2&gt;
&lt;p&gt;Blink’s code is currently available as a &lt;a href=&quot;https://github.com/apache/flink/tree/blink&quot;&gt;branch&lt;/a&gt; in the Apache Flink repository. It is a challenge to merge such a large amount of changes while making the merge process as non-disruptive as possible and keeping public APIs as stable as possible.&lt;/p&gt;
&lt;p&gt;The community’s &lt;a href=&quot;https://lists.apache.org/thread.html/6066abd0f09fc1c41190afad67770ede8efd0bebc36f00938eecc118@%3Cdev.flink.apache.org%3E&quot;&gt;merge plan&lt;/a&gt; focuses initially on the bounded/batch processing features mentioned above and follows the following approach to ensure a smooth integration:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;To merge Blink’s &lt;em&gt;SQL/Table API query processor&lt;/em&gt; enhancements, we exploit the fact that both Flink and Blink have the same APIs: SQL and the Table API.
Following some restructuring of the Table/SQL module (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions&quot;&gt;FLIP-32&lt;/a&gt;) we plan to merge the Blink query planner (optimizer) and runtime (operators) as an additional query processor next to the current SQL runtime. Think of it as two different runners for the same APIs.&lt;br /&gt;
Initially, users will be able to select which query processor to use. After a transition period in which the new query processor will be developed to subsume the current query processor, the current processor will most likely be deprecated and eventually dropped. Given that SQL is such a well-defined interface, we anticipate that this transition will cause little friction for users; if anything, it should be a pleasant surprise to have broader SQL feature coverage and a boost in performance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To support the merge of Blink’s &lt;em&gt;enhancements to scheduling and recovery&lt;/em&gt; for jobs on bounded data, the Flink community is already working on refactoring its current scheduler and adding support for &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10429&quot;&gt;pluggable scheduling and fail-over strategies&lt;/a&gt;.&lt;br /&gt;
Once this effort is finished, we can add Blink’s scheduling and recovery strategies as a new scheduling strategy that is used by the new query processor. Eventually, we plan to use the new scheduling strategy also for bounded DataStream programs.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The extended catalog support, DDL support, as well as support for Hive’s catalog and integrations is currently going through separate design discussions. We plan to leverage existing code here whenever it makes sense.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;summary&quot;&gt;Summary&lt;/h2&gt;
&lt;p&gt;We believe that the data processing stack of the future is based on stream processing: The elegance of stream processing with its ability to model offline processing (batch), real-time data processing, and event-driven applications in the same way, while offering high performance and consistency is simply too compelling.&lt;/p&gt;
&lt;p&gt;Exploiting certain properties of bounded data is important for a stream processor to achieve the same performance as dedicated batch processors. While Flink always supported batch processing, the project is taking the next step in building a unified runtime and towards &lt;strong&gt;becoming a stream processor that is competitive with batch processing systems even on their home turf: OLAP SQL.&lt;/strong&gt; The contribution of Alibaba’s Blink code helps the Flink community to pick up the speed on this development.&lt;/p&gt;
</description>
<pubDate>Wed, 13 Feb 2019 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2019/02/13/unified-batch-streaming-blink.html</link>
<guid isPermaLink="true">/news/2019/02/13/unified-batch-streaming-blink.html</guid>
</item>
<item>
<title>Apache Flink 1.5.6 Released</title>
<description>&lt;p&gt;The Apache Flink community released the sixth and last bugfix version of the Apache Flink 1.5 series.&lt;/p&gt;
&lt;p&gt;This release includes more than 47 fixes and minor improvements for Flink 1.5.5. The list below includes a detailed list of all fixes.&lt;/p&gt;
&lt;p&gt;We highly recommend all users to upgrade to Flink 1.5.6.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.6&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.6&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.6&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10252&quot;&gt;FLINK-10252&lt;/a&gt;] - Handle oversized metric messages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10863&quot;&gt;FLINK-10863&lt;/a&gt;] - Assign uids to all operators
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8336&quot;&gt;FLINK-8336&lt;/a&gt;] - YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9646&quot;&gt;FLINK-9646&lt;/a&gt;] - ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10166&quot;&gt;FLINK-10166&lt;/a&gt;] - Dependency problems when executing SQL query in sql-client
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10309&quot;&gt;FLINK-10309&lt;/a&gt;] - Cancel with savepoint fails with java.net.ConnectException when using the per job-mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10419&quot;&gt;FLINK-10419&lt;/a&gt;] - ClassNotFoundException while deserializing user exceptions from checkpointing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10455&quot;&gt;FLINK-10455&lt;/a&gt;] - Potential Kafka producer leak in case of failures
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10482&quot;&gt;FLINK-10482&lt;/a&gt;] - java.lang.IllegalArgumentException: Negative number of in progress checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10491&quot;&gt;FLINK-10491&lt;/a&gt;] - Deadlock during spilling data in SpillableSubpartition
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10566&quot;&gt;FLINK-10566&lt;/a&gt;] - Flink Planning is exponential in the number of stages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10581&quot;&gt;FLINK-10581&lt;/a&gt;] - YarnConfigurationITCase.testFlinkContainerMemory test instability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10642&quot;&gt;FLINK-10642&lt;/a&gt;] - CodeGen split fields errors when maxGeneratedCodeLength equals 1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10655&quot;&gt;FLINK-10655&lt;/a&gt;] - RemoteRpcInvocation not overwriting ObjectInputStream&amp;#39;s ClassNotFoundException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10669&quot;&gt;FLINK-10669&lt;/a&gt;] - Exceptions &amp;amp; errors are not properly checked in logs in e2e tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10670&quot;&gt;FLINK-10670&lt;/a&gt;] - Fix Correlate codegen error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10674&quot;&gt;FLINK-10674&lt;/a&gt;] - Fix handling of retractions after clean up
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10690&quot;&gt;FLINK-10690&lt;/a&gt;] - Tests leak resources via Files.list
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10693&quot;&gt;FLINK-10693&lt;/a&gt;] - Fix Scala EitherSerializer duplication
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10715&quot;&gt;FLINK-10715&lt;/a&gt;] - E2e tests fail with ConcurrentModificationException in MetricRegistryImpl
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10750&quot;&gt;FLINK-10750&lt;/a&gt;] - SocketClientSinkTest.testRetry fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10752&quot;&gt;FLINK-10752&lt;/a&gt;] - Result of AbstractYarnClusterDescriptor#validateClusterResources is ignored
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10753&quot;&gt;FLINK-10753&lt;/a&gt;] - Propagate and log snapshotting exceptions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10770&quot;&gt;FLINK-10770&lt;/a&gt;] - Some generated functions are not opened properly.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10773&quot;&gt;FLINK-10773&lt;/a&gt;] - Resume externalized checkpoint end-to-end test fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10821&quot;&gt;FLINK-10821&lt;/a&gt;] - Resuming Externalized Checkpoint E2E test does not resume from Externalized Checkpoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10839&quot;&gt;FLINK-10839&lt;/a&gt;] - Fix implementation of PojoSerializer.duplicate() w.r.t. subclass serializer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10856&quot;&gt;FLINK-10856&lt;/a&gt;] - Harden resume from externalized checkpoint E2E test
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10857&quot;&gt;FLINK-10857&lt;/a&gt;] - Conflict between JMX and Prometheus Metrics reporter
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10880&quot;&gt;FLINK-10880&lt;/a&gt;] - Failover strategies should not be applied to Batch Execution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10913&quot;&gt;FLINK-10913&lt;/a&gt;] - ExecutionGraphRestartTest.testRestartAutomatically unstable on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10925&quot;&gt;FLINK-10925&lt;/a&gt;] - NPE in PythonPlanStreamer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10990&quot;&gt;FLINK-10990&lt;/a&gt;] - Enforce minimum timespan in MeterView
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10998&quot;&gt;FLINK-10998&lt;/a&gt;] - flink-metrics-ganglia has LGPL dependency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11011&quot;&gt;FLINK-11011&lt;/a&gt;] - Elasticsearch 6 sink end-to-end test unstable
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4173&quot;&gt;FLINK-4173&lt;/a&gt;] - Replace maven-assembly-plugin by maven-shade-plugin in flink-metrics
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9869&quot;&gt;FLINK-9869&lt;/a&gt;] - Send PartitionInfo in batch to Improve performance
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10613&quot;&gt;FLINK-10613&lt;/a&gt;] - Remove logger casts in HBaseConnectorITCase
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10614&quot;&gt;FLINK-10614&lt;/a&gt;] - Update test_batch_allround.sh e2e to new testing infrastructure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10637&quot;&gt;FLINK-10637&lt;/a&gt;] - Start MiniCluster with random REST port
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10678&quot;&gt;FLINK-10678&lt;/a&gt;] - Add a switch to run_test to configure if logs should be checked for errors/exceptions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10906&quot;&gt;FLINK-10906&lt;/a&gt;] - docker-entrypoint.sh logs credentials during startup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10916&quot;&gt;FLINK-10916&lt;/a&gt;] - Include duplicated user-specified uid into error message
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11005&quot;&gt;FLINK-11005&lt;/a&gt;] - Define flink-sql-client uber-jar dependencies via artifactSet
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Test
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10606&quot;&gt;FLINK-10606&lt;/a&gt;] - Construct NetworkEnvironment simple for tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10607&quot;&gt;FLINK-10607&lt;/a&gt;] - Unify to remove duplicated NoOpResultPartitionConsumableNotifier
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10827&quot;&gt;FLINK-10827&lt;/a&gt;] - Add test for duplicate() to SerializerTestBase
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 26 Dec 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/12/26/release-1.5.6.html</link>
<guid isPermaLink="true">/news/2018/12/26/release-1.5.6.html</guid>
</item>
<item>
<title>Apache Flink 1.6.3 Released</title>
<description>&lt;p&gt;The Apache Flink community released the third bugfix version of the Apache Flink 1.6 series.&lt;/p&gt;
&lt;p&gt;This release includes more than 80 fixes and minor improvements for Flink 1.6.2. A detailed list of all fixes can be found below.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.6.3.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10097&quot;&gt;FLINK-10097&lt;/a&gt;] - More tests to increase StreamingFileSink test coverage
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10252&quot;&gt;FLINK-10252&lt;/a&gt;] - Handle oversized metric messages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10367&quot;&gt;FLINK-10367&lt;/a&gt;] - Avoid recursion stack overflow during releasing SingleInputGate
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10863&quot;&gt;FLINK-10863&lt;/a&gt;] - Assign uids to all operators in general purpose testing job
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8336&quot;&gt;FLINK-8336&lt;/a&gt;] - YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9635&quot;&gt;FLINK-9635&lt;/a&gt;] - Local recovery scheduling can cause spread out of tasks
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9646&quot;&gt;FLINK-9646&lt;/a&gt;] - ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9878&quot;&gt;FLINK-9878&lt;/a&gt;] - IO worker threads BLOCKED on SSL Session Cache while CMS full gc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10149&quot;&gt;FLINK-10149&lt;/a&gt;] - Flink Mesos allocates an extra port when not configured to do so.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10166&quot;&gt;FLINK-10166&lt;/a&gt;] - Dependency problems when executing SQL query in sql-client
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10309&quot;&gt;FLINK-10309&lt;/a&gt;] - Cancel with savepoint fails with java.net.ConnectException when using the per job-mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10357&quot;&gt;FLINK-10357&lt;/a&gt;] - Streaming File Sink end-to-end test failed with mismatch
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10359&quot;&gt;FLINK-10359&lt;/a&gt;] - Scala example in DataSet docs is broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10364&quot;&gt;FLINK-10364&lt;/a&gt;] - Test instability in NonHAQueryableStateFsBackendITCase#testMapState
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10419&quot;&gt;FLINK-10419&lt;/a&gt;] - ClassNotFoundException while deserializing user exceptions from checkpointing
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10425&quot;&gt;FLINK-10425&lt;/a&gt;] - taskmanager.host is not respected
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10455&quot;&gt;FLINK-10455&lt;/a&gt;] - Potential Kafka producer leak in case of failures
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10463&quot;&gt;FLINK-10463&lt;/a&gt;] - Null literal cannot be properly parsed in Java Table API function call
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10481&quot;&gt;FLINK-10481&lt;/a&gt;] - Wordcount end-to-end test in docker env unstable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10482&quot;&gt;FLINK-10482&lt;/a&gt;] - java.lang.IllegalArgumentException: Negative number of in progress checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10491&quot;&gt;FLINK-10491&lt;/a&gt;] - Deadlock during spilling data in SpillableSubpartition
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10566&quot;&gt;FLINK-10566&lt;/a&gt;] - Flink Planning is exponential in the number of stages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10567&quot;&gt;FLINK-10567&lt;/a&gt;] - Lost serialize fields when ttl state store with the mutable serializer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10570&quot;&gt;FLINK-10570&lt;/a&gt;] - State grows unbounded when &amp;quot;within&amp;quot; constraint not applied
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10581&quot;&gt;FLINK-10581&lt;/a&gt;] - YarnConfigurationITCase.testFlinkContainerMemory test instability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10642&quot;&gt;FLINK-10642&lt;/a&gt;] - CodeGen split fields errors when maxGeneratedCodeLength equals 1
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10655&quot;&gt;FLINK-10655&lt;/a&gt;] - RemoteRpcInvocation not overwriting ObjectInputStream&amp;#39;s ClassNotFoundException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10663&quot;&gt;FLINK-10663&lt;/a&gt;] - Closing StreamingFileSink can cause NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10669&quot;&gt;FLINK-10669&lt;/a&gt;] - Exceptions &amp;amp; errors are not properly checked in logs in e2e tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10670&quot;&gt;FLINK-10670&lt;/a&gt;] - Fix Correlate codegen error
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10674&quot;&gt;FLINK-10674&lt;/a&gt;] - Fix handling of retractions after clean up
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10681&quot;&gt;FLINK-10681&lt;/a&gt;] - elasticsearch6.ElasticsearchSinkITCase fails if wrong JNA library installed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10690&quot;&gt;FLINK-10690&lt;/a&gt;] - Tests leak resources via Files.list
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10693&quot;&gt;FLINK-10693&lt;/a&gt;] - Fix Scala EitherSerializer duplication
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10715&quot;&gt;FLINK-10715&lt;/a&gt;] - E2e tests fail with ConcurrentModificationException in MetricRegistryImpl
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10750&quot;&gt;FLINK-10750&lt;/a&gt;] - SocketClientSinkTest.testRetry fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10752&quot;&gt;FLINK-10752&lt;/a&gt;] - Result of AbstractYarnClusterDescriptor#validateClusterResources is ignored
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10753&quot;&gt;FLINK-10753&lt;/a&gt;] - Propagate and log snapshotting exceptions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10763&quot;&gt;FLINK-10763&lt;/a&gt;] - Interval join produces wrong result type in Scala API
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10770&quot;&gt;FLINK-10770&lt;/a&gt;] - Some generated functions are not opened properly.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10773&quot;&gt;FLINK-10773&lt;/a&gt;] - Resume externalized checkpoint end-to-end test fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10809&quot;&gt;FLINK-10809&lt;/a&gt;] - Using DataStreamUtils.reinterpretAsKeyedStream produces corrupted keyed state after restore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10816&quot;&gt;FLINK-10816&lt;/a&gt;] - Fix LockableTypeSerializer.duplicate()
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10821&quot;&gt;FLINK-10821&lt;/a&gt;] - Resuming Externalized Checkpoint E2E test does not resume from Externalized Checkpoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10839&quot;&gt;FLINK-10839&lt;/a&gt;] - Fix implementation of PojoSerializer.duplicate() w.r.t. subclass serializer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10842&quot;&gt;FLINK-10842&lt;/a&gt;] - Waiting loops are broken in e2e/common.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10856&quot;&gt;FLINK-10856&lt;/a&gt;] - Harden resume from externalized checkpoint E2E test
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10857&quot;&gt;FLINK-10857&lt;/a&gt;] - Conflict between JMX and Prometheus Metrics reporter
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10880&quot;&gt;FLINK-10880&lt;/a&gt;] - Failover strategies should not be applied to Batch Execution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10913&quot;&gt;FLINK-10913&lt;/a&gt;] - ExecutionGraphRestartTest.testRestartAutomatically unstable on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10925&quot;&gt;FLINK-10925&lt;/a&gt;] - NPE in PythonPlanStreamer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10946&quot;&gt;FLINK-10946&lt;/a&gt;] - Resuming Externalized Checkpoint (rocks, incremental, scale up) end-to-end test failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10990&quot;&gt;FLINK-10990&lt;/a&gt;] - Enforce minimum timespan in MeterView
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10992&quot;&gt;FLINK-10992&lt;/a&gt;] - Jepsen: Do not use /tmp as HDFS Data Directory
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10997&quot;&gt;FLINK-10997&lt;/a&gt;] - Avro-confluent-registry does not bundle any dependency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10998&quot;&gt;FLINK-10998&lt;/a&gt;] - flink-metrics-ganglia has LGPL dependency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11011&quot;&gt;FLINK-11011&lt;/a&gt;] - Elasticsearch 6 sink end-to-end test unstable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11017&quot;&gt;FLINK-11017&lt;/a&gt;] - Time interval for window aggregations in SQL is wrongly translated if specified with YEAR_MONTH resolution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11029&quot;&gt;FLINK-11029&lt;/a&gt;] - Incorrect parameter in Working with state doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11041&quot;&gt;FLINK-11041&lt;/a&gt;] - ReinterpretDataStreamAsKeyedStreamITCase.testReinterpretAsKeyedStream failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11045&quot;&gt;FLINK-11045&lt;/a&gt;] - UserCodeClassLoader has not been set correctly for RuntimeUDFContext in CollectionExecutor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11083&quot;&gt;FLINK-11083&lt;/a&gt;] - CRowSerializerConfigSnapshot is not instantiable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11087&quot;&gt;FLINK-11087&lt;/a&gt;] - Broadcast state migration Incompatibility from 1.5.3 to 1.7.0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11123&quot;&gt;FLINK-11123&lt;/a&gt;] - Missing import in ML quickstart docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11136&quot;&gt;FLINK-11136&lt;/a&gt;] - Fix the merge logic for DISTINCT aggregates
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4173&quot;&gt;FLINK-4173&lt;/a&gt;] - Replace maven-assembly-plugin by maven-shade-plugin in flink-metrics
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10353&quot;&gt;FLINK-10353&lt;/a&gt;] - Restoring a KafkaProducer with Semantic.EXACTLY_ONCE from a savepoint written with Semantic.AT_LEAST_ONCE fails with NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10608&quot;&gt;FLINK-10608&lt;/a&gt;] - Add avro files generated by datastream-allround-test to RAT exclusions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10613&quot;&gt;FLINK-10613&lt;/a&gt;] - Remove logger casts in HBaseConnectorITCase
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10614&quot;&gt;FLINK-10614&lt;/a&gt;] - Update test_batch_allround.sh e2e to new testing infrastructure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10637&quot;&gt;FLINK-10637&lt;/a&gt;] - Start MiniCluster with random REST port
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10678&quot;&gt;FLINK-10678&lt;/a&gt;] - Add a switch to run_test to configure whether logs should be checked for errors/exceptions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10692&quot;&gt;FLINK-10692&lt;/a&gt;] - Harden Confluent schema E2E test
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10883&quot;&gt;FLINK-10883&lt;/a&gt;] - Submitting a job without enough slots times out due to an unspecified timeout
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10906&quot;&gt;FLINK-10906&lt;/a&gt;] - docker-entrypoint.sh logs credentials during startup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10916&quot;&gt;FLINK-10916&lt;/a&gt;] - Include duplicated user-specified uid into error message
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10951&quot;&gt;FLINK-10951&lt;/a&gt;] - Disable enforcing of YARN container virtual memory limits in tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11005&quot;&gt;FLINK-11005&lt;/a&gt;] - Define flink-sql-client uber-jar dependencies via artifactSet
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Test
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10606&quot;&gt;FLINK-10606&lt;/a&gt;] - Construct NetworkEnvironment simple for tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10607&quot;&gt;FLINK-10607&lt;/a&gt;] - Unify to remove duplicated NoOpResultPartitionConsumableNotifier
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10827&quot;&gt;FLINK-10827&lt;/a&gt;] - Add test for duplicate() to SerializerTestBase
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Wish
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10793&quot;&gt;FLINK-10793&lt;/a&gt;] - Change visibility of TtlValue and TtlSerializer to public for external tools
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Sat, 22 Dec 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/12/22/release-1.6.3.html</link>
<guid isPermaLink="true">/news/2018/12/22/release-1.6.3.html</guid>
</item>
<item>
<title>Apache Flink 1.7.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.7 series.&lt;/p&gt;
&lt;p&gt;This release includes 27 fixes and minor improvements for Flink 1.7.0. A detailed list of all fixes can be found below.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.7.1.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.7.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10252&quot;&gt;FLINK-10252&lt;/a&gt;] - Handle oversized metric messages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10367&quot;&gt;FLINK-10367&lt;/a&gt;] - Avoid recursion stack overflow during releasing SingleInputGate
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10522&quot;&gt;FLINK-10522&lt;/a&gt;] - Check if RecoverableWriter supportsResume and act accordingly.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10963&quot;&gt;FLINK-10963&lt;/a&gt;] - Cleanup small objects uploaded to S3 as independent objects
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8336&quot;&gt;FLINK-8336&lt;/a&gt;] - YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9646&quot;&gt;FLINK-9646&lt;/a&gt;] - ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10149&quot;&gt;FLINK-10149&lt;/a&gt;] - Flink Mesos allocates an extra port when not configured to do so.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10359&quot;&gt;FLINK-10359&lt;/a&gt;] - Scala example in DataSet docs is broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10482&quot;&gt;FLINK-10482&lt;/a&gt;] - java.lang.IllegalArgumentException: Negative number of in progress checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10566&quot;&gt;FLINK-10566&lt;/a&gt;] - Flink Planning is exponential in the number of stages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10997&quot;&gt;FLINK-10997&lt;/a&gt;] - Avro-confluent-registry does not bundle any dependency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11011&quot;&gt;FLINK-11011&lt;/a&gt;] - Elasticsearch 6 sink end-to-end test unstable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11013&quot;&gt;FLINK-11013&lt;/a&gt;] - Fix distinct aggregates for group window in Table API
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11017&quot;&gt;FLINK-11017&lt;/a&gt;] - Time interval for window aggregations in SQL is wrongly translated if specified with YEAR_MONTH resolution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11029&quot;&gt;FLINK-11029&lt;/a&gt;] - Incorrect parameter in Working with state doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11032&quot;&gt;FLINK-11032&lt;/a&gt;] - Elasticsearch (v6.3.1) sink end-to-end test unstable on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11033&quot;&gt;FLINK-11033&lt;/a&gt;] - Elasticsearch (v6.3.1) sink end-to-end test unstable on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11041&quot;&gt;FLINK-11041&lt;/a&gt;] - ReinterpretDataStreamAsKeyedStreamITCase.testReinterpretAsKeyedStream failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11044&quot;&gt;FLINK-11044&lt;/a&gt;] - RegisterTableSink docs incorrect
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11045&quot;&gt;FLINK-11045&lt;/a&gt;] - UserCodeClassLoader has not been set correctly for RuntimeUDFContext in CollectionExecutor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11047&quot;&gt;FLINK-11047&lt;/a&gt;] - CoGroupGroupSortTranslationTest does not compile with scala 2.12
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11085&quot;&gt;FLINK-11085&lt;/a&gt;] - NoClassDefFoundError in presto-s3 filesystem
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11087&quot;&gt;FLINK-11087&lt;/a&gt;] - Broadcast state migration Incompatibility from 1.5.3 to 1.7.0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11094&quot;&gt;FLINK-11094&lt;/a&gt;] - Restored state in RocksDBStateBackend that has not been accessed in restored execution causes NPE on snapshot
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11123&quot;&gt;FLINK-11123&lt;/a&gt;] - Missing import in ML quickstart docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11136&quot;&gt;FLINK-11136&lt;/a&gt;] - Fix the merge logic for DISTINCT aggregates
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-11080&quot;&gt;FLINK-11080&lt;/a&gt;] - Define flink-connector-elasticsearch6 uber-jar dependencies via artifactSet
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Fri, 21 Dec 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/12/21/release-1.7.1.html</link>
<guid isPermaLink="true">/news/2018/12/21/release-1.7.1.html</guid>
</item>
<item>
<title>Apache Flink 1.7.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is pleased to announce Apache Flink 1.7.0.
The latest release includes more than 420 resolved issues and some exciting additions to Flink that we describe in the following sections of this post.
Please check the &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12343585&quot;&gt;complete changelog&lt;/a&gt; for more details.&lt;/p&gt;
&lt;p&gt;Flink 1.7.0 is API-compatible with previous 1.x.y releases for APIs annotated with the &lt;code&gt;@Public&lt;/code&gt; annotation.
The release is available now and we encourage everyone to &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;download the release&lt;/a&gt; and check out the updated &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/&quot;&gt;documentation&lt;/a&gt;.
Feedback through the Flink &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; or &lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;JIRA&lt;/a&gt; is, as always, very much appreciated!&lt;/p&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt; on the Flink project site.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#flink-170---extending-the-reach-of-stream-processing&quot; id=&quot;markdown-toc-flink-170---extending-the-reach-of-stream-processing&quot;&gt;Flink 1.7.0 - Extending the reach of Stream Processing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#release-notes&quot; id=&quot;markdown-toc-release-notes&quot;&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;flink-170---extending-the-reach-of-stream-processing&quot;&gt;Flink 1.7.0 - Extending the reach of Stream Processing&lt;/h2&gt;
&lt;p&gt;In Flink 1.7.0 we come closer to our goals of enabling fast data processing and building data-intensive applications for the Flink community in a seamless way.
Our latest release includes some exciting new features and improvements such as support for Scala 2.12, an exactly-once S3 file sink, the integration of complex event processing with streaming SQL and more features that we explain below.&lt;/p&gt;
&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Scala 2.12 Support in Apache Flink&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7811&quot;&gt;FLINK-7811&lt;/a&gt;):
Apache Flink 1.7.0 is the first release which comes with full support for Scala 2.12.
This allows users to write Flink applications with a newer Scala version and to leverage the Scala 2.12 ecosystem.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;State Evolution&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9376&quot;&gt;FLINK-9376&lt;/a&gt;):
In many cases, a long-running Flink application needs to evolve during its lifetime because of changing requirements.
Being able to change the user state without losing the progress that the application has accumulated in that state is a crucial requirement for application evolution.&lt;/p&gt;
&lt;p&gt;With Flink 1.7.0, the community added state evolution, which allows you to flexibly adapt the schema of a long-running application’s user state while maintaining compatibility with previous savepoints.
With state evolution it is possible to add columns to or remove columns from your state schema in order to change which business features are captured by your application after it has been deployed.&lt;/p&gt;
&lt;p&gt;State schema evolution now works out-of-the-box when using Avro’s generated classes as user state, meaning that the schema of the state can be evolved according to Avro’s specifications.
While Avro types are the only built-in types that support schema evolution as of Flink 1.7, the community continues working to extend support to other types in future Flink releases. A brief sketch of declaring Avro-typed state follows after this feature list.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Exactly-once S3 StreamingFileSink&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9752&quot;&gt;FLINK-9752&lt;/a&gt;):
The &lt;code&gt;StreamingFileSink&lt;/code&gt;, which was introduced in Flink 1.6.0, is now extended to also support writing to S3 filesystems with exactly-once processing guarantees.
This allows users to build exactly-once end-to-end pipelines that write to S3; a minimal sketch follows after this feature list.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;MATCH_RECOGNIZE&lt;/code&gt; Support in Streaming SQL&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6935&quot;&gt;FLINK-6935&lt;/a&gt;):
This is a major addition to Apache Flink 1.7.0 that provides initial support for the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/streaming/match_recognize.html&quot;&gt;&lt;code&gt;MATCH_RECOGNIZE&lt;/code&gt;&lt;/a&gt; standard in Flink SQL.
This feature combines complex event processing (CEP) and SQL for easy pattern matching on data streams, thus enabling a whole set of new use cases.&lt;/p&gt;
&lt;p&gt;This feature is currently in a beta phase, so we welcome any feedback and suggestions from the community for future iterations and improvements. An example query is sketched after this feature list.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Temporal Tables and Temporal Joins in Streaming SQL&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9712&quot;&gt;FLINK-9712&lt;/a&gt;):
Temporal Tables is a new concept in Apache Flink that gives a (parameterized) view on a table’s changing history and returns the content of a table at a specific point in time.&lt;/p&gt;
&lt;p&gt;As an example, we can use a table with historical currency exchange rates.
Such a table is constantly growing/evolving as time progresses and newly updated exchange rates are added.
A Temporal Table is a view that can return the state of those exchange rates at any given point in time.
With such a table, it is possible to convert a stream of orders in different currencies to a common currency using the correct exchange rate.&lt;/p&gt;
&lt;p&gt;Temporal Joins allow for memory- and computation-efficient joins of streaming data with an ever-changing/updating table, using either processing time or event time, while being ANSI SQL compliant. A condensed sketch follows after this feature list.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Miscellaneous Features for Streaming SQL&lt;/strong&gt;:
Besides the major features mentioned above, Flink’s Table &amp;amp; SQL API has been extended to serve more use cases.&lt;/p&gt;
&lt;p&gt;The following built-in functions were added to the APIs: &lt;code&gt;TO_BASE64&lt;/code&gt;, &lt;code&gt;LOG2&lt;/code&gt;, &lt;code&gt;LTRIM&lt;/code&gt;, &lt;code&gt;REPEAT&lt;/code&gt;, &lt;code&gt;REPLACE&lt;/code&gt;, &lt;code&gt;COSH&lt;/code&gt;, &lt;code&gt;SINH&lt;/code&gt;, &lt;code&gt;TANH&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;The SQL Client now supports the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/sqlClient.html#sql-views&quot;&gt;definition of views&lt;/a&gt; both in an environment file and within a CLI session.
Furthermore, basic SQL statement auto-completion has been added to the CLI.&lt;/p&gt;
&lt;p&gt;The community added an &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/connect.html#elasticsearch-connector&quot;&gt;Elasticsearch 6 table sink&lt;/a&gt; which allows storing the updating results of a dynamic table.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Versioned REST API&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7551&quot;&gt;FLINK-7551&lt;/a&gt;):
Beginning with Flink 1.7.0, the REST API is versioned.
This guarantees the stability of Flink’s REST API so that third-party applications can be developed against a stable API in Flink.
Thus, future Flink upgrades will not require changes to existing third-party integrations.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Kafka 2.0 Connector&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10598&quot;&gt;FLINK-10598&lt;/a&gt;):
Apache Flink 1.7.0 continues to add more connectors, making it even easier to interact with more external systems.
In this release, the community added the Kafka 2.0 connector, which allows reading from and writing to Kafka 2.0 with exactly-once guarantees. A minimal consumer sketch follows after this feature list.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Local Recovery&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9635&quot;&gt;FLINK-9635&lt;/a&gt;):
Apache Flink 1.7.0 completes the local recovery feature by extending Flink’s scheduling to take previous deployment locations into account in case of recovery.&lt;/p&gt;
&lt;p&gt;If local recovery is enabled, Flink will keep a local copy of the latest checkpoint on the machine where the task is running.
By scheduling tasks to their previous locations, Flink can read the checkpoint state from local disk and thus minimize the network traffic needed for restoring state.
This feature considerably improves recovery speed.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Removal of Flink’s Legacy Mode&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10392&quot;&gt;FLINK-10392&lt;/a&gt;):
Apache Flink 1.7.0 marks the release in which the FLIP-6 effort has been fully completed and has reached feature parity with the legacy mode.
Consequently, this release removes support for the legacy mode.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
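&lt;p&gt;The following is a minimal, hypothetical sketch of the state evolution feature described above. It assumes a class named &lt;code&gt;Account&lt;/code&gt; that was generated from an Avro schema and uses it as the type of keyed state; adding an optional field to that Avro schema and restoring the job from a savepoint is then handled by the built-in serializer compatibility checks.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Sketch only: &amp;quot;Account&amp;quot; is assumed to be a class generated from an Avro schema.
public class BalanceTracker extends KeyedProcessFunction&amp;lt;String, Account, Long&amp;gt; {

    private transient ValueState&amp;lt;Account&amp;gt; lastAccount;

    @Override
    public void open(Configuration parameters) {
        // Declaring the state with the Avro-generated type is what makes the
        // schema of this state evolvable across savepoints.
        lastAccount = getRuntimeContext().getState(
                new ValueStateDescriptor&amp;lt;&amp;gt;(&amp;quot;last-account&amp;quot;, Account.class));
    }

    @Override
    public void processElement(Account account, Context ctx, Collector&amp;lt;Long&amp;gt; out) throws Exception {
        lastAccount.update(account);
        out.collect(account.getBalance());
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;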
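&lt;p&gt;For the exactly-once S3 &lt;code&gt;StreamingFileSink&lt;/code&gt;, a minimal sketch could look as follows. The bucket name, the socket source and the checkpoint interval are placeholders chosen for this example, and one of the S3 filesystem implementations (for example &lt;code&gt;flink-s3-fs-hadoop&lt;/code&gt;) is assumed to be on the classpath.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class S3SinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing must be enabled so that the sink can commit files exactly once.
        env.enableCheckpointing(60000);

        DataStream&amp;lt;String&amp;gt; events = env.socketTextStream(&amp;quot;localhost&amp;quot;, 9999);

        StreamingFileSink&amp;lt;String&amp;gt; sink = StreamingFileSink
                .forRowFormat(new Path(&amp;quot;s3://my-bucket/output&amp;quot;), new SimpleStringEncoder&amp;lt;String&amp;gt;(&amp;quot;UTF-8&amp;quot;))
                .build();

        events.addSink(sink);
        env.execute(&amp;quot;Exactly-once S3 sink sketch&amp;quot;);
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;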
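&lt;p&gt;As a rough illustration of the new &lt;code&gt;MATCH_RECOGNIZE&lt;/code&gt; support, the sketch below detects a simple price drop followed by a recovery. The &lt;code&gt;Ticker&lt;/code&gt; table with the columns &lt;code&gt;symbol&lt;/code&gt;, &lt;code&gt;price&lt;/code&gt; and the event-time attribute &lt;code&gt;rowtime&lt;/code&gt; is an assumption made for this example and would have to be registered beforehand.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class MatchRecognizeSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

        // Sketch only: registering the hypothetical &amp;quot;Ticker&amp;quot; table (symbol, price, rowtime)
        // is omitted here.
        Table result = tableEnv.sqlQuery(
            &amp;quot;SELECT * FROM Ticker &amp;quot; +
            &amp;quot;MATCH_RECOGNIZE ( &amp;quot; +
            &amp;quot;  PARTITION BY symbol &amp;quot; +
            &amp;quot;  ORDER BY rowtime &amp;quot; +
            &amp;quot;  MEASURES A.price AS dropPrice, C.price AS recoveryPrice &amp;quot; +
            &amp;quot;  ONE ROW PER MATCH &amp;quot; +
            &amp;quot;  AFTER MATCH SKIP PAST LAST ROW &amp;quot; +
            &amp;quot;  PATTERN (A B C) &amp;quot; +
            &amp;quot;  DEFINE B AS B.price &amp;lt; A.price, C AS C.price &amp;gt; B.price )&amp;quot;);
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;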
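&lt;p&gt;A condensed sketch of defining a temporal table function and using it in a temporal join could look like the following. The &lt;code&gt;RatesHistory&lt;/code&gt; and &lt;code&gt;Orders&lt;/code&gt; tables and their column names are assumptions made for this example and would have to be registered in the &lt;code&gt;TableEnvironment&lt;/code&gt; beforehand.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.functions.TemporalTableFunction;

public class TemporalJoinSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

        // Sketch only: registering &amp;quot;RatesHistory&amp;quot; (r_currency, r_rate, r_proctime) and
        // &amp;quot;Orders&amp;quot; (o_currency, o_amount, o_proctime) is omitted here; the proctime
        // columns are processing-time attributes.
        Table ratesHistory = tableEnv.scan(&amp;quot;RatesHistory&amp;quot;);

        TemporalTableFunction rates =
                ratesHistory.createTemporalTableFunction(&amp;quot;r_proctime&amp;quot;, &amp;quot;r_currency&amp;quot;);
        tableEnv.registerFunction(&amp;quot;Rates&amp;quot;, rates);

        // Join each order against the exchange rate that is valid at its processing time.
        Table result = tableEnv.sqlQuery(
            &amp;quot;SELECT o.o_amount * r.r_rate AS amount_in_base_currency &amp;quot; +
            &amp;quot;FROM Orders AS o, LATERAL TABLE (Rates(o.o_proctime)) AS r &amp;quot; +
            &amp;quot;WHERE o.o_currency = r.r_currency&amp;quot;);
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;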
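&lt;p&gt;A minimal sketch of consuming from Kafka with the new universal connector is shown below. The broker address, consumer group and topic name are placeholders, and the &lt;code&gt;flink-connector-kafka&lt;/code&gt; dependency is assumed to be on the classpath.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class Kafka2ConsumerSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty(&amp;quot;bootstrap.servers&amp;quot;, &amp;quot;localhost:9092&amp;quot;); // placeholder broker address
        props.setProperty(&amp;quot;group.id&amp;quot;, &amp;quot;example-group&amp;quot;);           // placeholder consumer group

        // The universal connector tracks the latest Kafka client, so the class name
        // no longer carries a Kafka version suffix.
        DataStream&amp;lt;String&amp;gt; lines = env.addSource(
                new FlinkKafkaConsumer&amp;lt;&amp;gt;(&amp;quot;example-topic&amp;quot;, new SimpleStringSchema(), props));

        lines.print();
        env.execute(&amp;quot;Kafka 2.0 consumer sketch&amp;quot;);
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;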
&lt;h2 id=&quot;release-notes&quot;&gt;Release Notes&lt;/h2&gt;
&lt;p&gt;Please review the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.7/release-notes/flink-1.7.html&quot;&gt;release notes&lt;/a&gt; if you plan to upgrade your Flink setup to Flink 1.7.&lt;/p&gt;
&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;
&lt;p&gt;We would like to acknowledge all community members for contributing to this release.
Special credits go to the following members for contributing to the 1.7.0 release (according to git):&lt;/p&gt;
&lt;p&gt;Aitozi, Alex Arkhipov, Alexander Koltsov, Alexey Trenikhin, Alice, Alice Yan, Aljoscha Krettek, Andrei Poluliakh, Andrey Zagrebin, Ashwin Sinha, Barisa Obradovic, Ben La Monica, Benoit Meriaux, Bowen Li, Chesnay Schepler, Christophe Jolif, Congxian Qiu, Craig Foster, David Anderson, Dawid Wysakowicz, Dian Fu, Diego Carvallo, Dimitris Palyvos, Eugen Yushin, Fabian Hueske, Florian Schmidt, Gary Yao, Guibo Pan, Hequn Cheng, Hiroaki Yoshida, Igal Shilman, JIN SUN, Jamie Grier, Jayant Ameta, Jeff Zhang, Jeffrey Chung, Jicaar, Jin Sun, Joe Malt, Johannes Dillmann, Jun Zhang, Kostas Kloudas, Krzysztof Białek, Lakshmi Gururaja Rao, Liu Biao, Mahesh Senniappan, Manuel Hoffmann, Mark Cho, Max Feng, Mike Pedersen, Mododo, Nico Kruber, Oleksandr Nitavskyi, Osman Şamil AKÇELİK, Patrick Lucas, Paul Lam, Piotr Nowojski, Rick Hofstede, Rong R, Rong Rong, Sayat Satybaldiyev, Sebastian Klemke, Seth Wiesman, Shimin Yang, Shuyi Chen, Stefan Richter, Stephan Ewen, Stephen Jason, Thomas Weise, Till Rohrmann, Timo Walther, Tzu-Li “tison” Chen, Tzu-Li (Gordon) Tai, Tzu-Li Chen, Wosin, Xingcan Cui, Xpray, Xue Yu, Yangze Guo, Ying Xu, Yun Tang, Zhijiang, blues Zheng, hequn8128, ifndef-SleePy, jerryjzhang, jrthe42, jyc.jia, kkolman, lihongli, linjun, linzhaoming, liurenjie1024, liuxianjiao, lrl, lsy, lzqdename, maqingxiang, maqingxiang-it, minwenjun, shuai-xu, sihuazhou, snuyanzin, wind, xuewei.linxuewei, xueyu, xuqianjin, yanghua, yangshimin, zhijiang, 谢磊, 陈梓立&lt;/p&gt;
</description>
<pubDate>Fri, 30 Nov 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/11/30/release-1.7.0.html</link>
<guid isPermaLink="true">/news/2018/11/30/release-1.7.0.html</guid>
</item>
<item>
<title>Apache Flink 1.6.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released the second bugfix version of the Apache Flink 1.6 series.&lt;/p&gt;
&lt;p&gt;This release includes more than 30 fixes and minor improvements for Flink 1.6.1. A detailed list of all fixes can be found below.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.6.2.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10242&quot;&gt;FLINK-10242&lt;/a&gt;] - Latency marker interval should be configurable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10243&quot;&gt;FLINK-10243&lt;/a&gt;] - Add option to reduce latency metrics granularity
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10331&quot;&gt;FLINK-10331&lt;/a&gt;] - Reduce number of flush requests to the network stack
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10332&quot;&gt;FLINK-10332&lt;/a&gt;] - Move data available notification in PipelinedSubpartition out of the synchronized block
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5542&quot;&gt;FLINK-5542&lt;/a&gt;] - YARN client incorrectly uses local YARN config to check vcore capacity
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9567&quot;&gt;FLINK-9567&lt;/a&gt;] - Flink does not release resource in Yarn Cluster mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9788&quot;&gt;FLINK-9788&lt;/a&gt;] - ExecutionGraph Inconsistency prevents Job from recovering
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9884&quot;&gt;FLINK-9884&lt;/a&gt;] - Slot request may not be removed when it has already been assigned in slot manager
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9891&quot;&gt;FLINK-9891&lt;/a&gt;] - Flink cluster is not shutdown in YARN mode when Flink client is stopped
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9932&quot;&gt;FLINK-9932&lt;/a&gt;] - Timed-out TaskExecutor slot-offers to JobMaster leak the slot
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10135&quot;&gt;FLINK-10135&lt;/a&gt;] - Certain cluster-level metrics are no longer exposed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10157&quot;&gt;FLINK-10157&lt;/a&gt;] - Allow `null` user values in map state with TTL
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10222&quot;&gt;FLINK-10222&lt;/a&gt;] - Table scalar function expression parses error when function name equals the exists keyword suffix
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10259&quot;&gt;FLINK-10259&lt;/a&gt;] - Key validation for GroupWindowAggregate is broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10263&quot;&gt;FLINK-10263&lt;/a&gt;] - User-defined function with LITERAL parameters yields CompileException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10316&quot;&gt;FLINK-10316&lt;/a&gt;] - Add check to KinesisProducer that aws.region is set
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10354&quot;&gt;FLINK-10354&lt;/a&gt;] - Savepoints should be counted as retained checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10363&quot;&gt;FLINK-10363&lt;/a&gt;] - S3 FileSystem factory prints secrets into logs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10379&quot;&gt;FLINK-10379&lt;/a&gt;] - Can not use Table Functions in Java Table API
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10383&quot;&gt;FLINK-10383&lt;/a&gt;] - Hadoop configurations on the classpath seep into the S3 file system configs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10390&quot;&gt;FLINK-10390&lt;/a&gt;] - DataDog MetricReporter leaks connections
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10400&quot;&gt;FLINK-10400&lt;/a&gt;] - Return failed JobResult if job terminates in state FAILED or CANCELED
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10415&quot;&gt;FLINK-10415&lt;/a&gt;] - RestClient does not react to lost connection
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10444&quot;&gt;FLINK-10444&lt;/a&gt;] - Make S3 entropy injection work with FileSystem safety net
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10451&quot;&gt;FLINK-10451&lt;/a&gt;] - TableFunctionCollector should handle the life cycle of ScalarFunction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10465&quot;&gt;FLINK-10465&lt;/a&gt;] - Jepsen: runit supervised sshd is stopped on tear down
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10469&quot;&gt;FLINK-10469&lt;/a&gt;] - FileChannel may not write the whole buffer in a single call to FileChannel.write(Buffer buffer)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10487&quot;&gt;FLINK-10487&lt;/a&gt;] - fix invalid Flink SQL example
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10516&quot;&gt;FLINK-10516&lt;/a&gt;] - YarnApplicationMasterRunner does not initialize FileSystem with correct Flink Configuration during setup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10524&quot;&gt;FLINK-10524&lt;/a&gt;] - MemoryManagerConcurrentModReleaseTest.testConcurrentModificationWhileReleasing failed on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10532&quot;&gt;FLINK-10532&lt;/a&gt;] - Broken links in documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10544&quot;&gt;FLINK-10544&lt;/a&gt;] - Remove custom settings.xml for snapshot deployments
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9061&quot;&gt;FLINK-9061&lt;/a&gt;] - Add entropy to s3 path for better scalability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10075&quot;&gt;FLINK-10075&lt;/a&gt;] - HTTP connections to a secured REST endpoint flood the log
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10260&quot;&gt;FLINK-10260&lt;/a&gt;] - Confusing log messages during TaskManager registration
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10282&quot;&gt;FLINK-10282&lt;/a&gt;] - Provide separate thread-pool for REST endpoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10291&quot;&gt;FLINK-10291&lt;/a&gt;] - Generate JobGraph with fixed/configurable JobID in StandaloneJobClusterEntrypoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10311&quot;&gt;FLINK-10311&lt;/a&gt;] - HA end-to-end/Jepsen tests for standby Dispatchers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10312&quot;&gt;FLINK-10312&lt;/a&gt;] - Wrong / missing exception when submitting job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10371&quot;&gt;FLINK-10371&lt;/a&gt;] - Allow to enable SSL mutual authentication on REST endpoints by configuration
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10375&quot;&gt;FLINK-10375&lt;/a&gt;] - ExceptionInChainedStubException hides wrapped exception in cause
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10582&quot;&gt;FLINK-10582&lt;/a&gt;] - Make REST executor thread priority configurable
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Mon, 29 Oct 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/10/29/release-1.6.2.html</link>
<guid isPermaLink="true">/news/2018/10/29/release-1.6.2.html</guid>
</item>
<item>
<title>Apache Flink 1.5.5 Released</title>
<description>&lt;p&gt;The Apache Flink community released the fifth bugfix version of the Apache Flink 1.5 series.&lt;/p&gt;
&lt;p&gt;This release includes more than 20 fixes and minor improvements for Flink 1.5.4. A detailed list of all fixes can be found below.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.5.5.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.5&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.5&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.5&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10242&quot;&gt;FLINK-10242&lt;/a&gt;] - Latency marker interval should be configurable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10243&quot;&gt;FLINK-10243&lt;/a&gt;] - Add option to reduce latency metrics granularity
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10331&quot;&gt;FLINK-10331&lt;/a&gt;] - Reduce number of flush requests to the network stack
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10332&quot;&gt;FLINK-10332&lt;/a&gt;] - Move data available notification in PipelinedSubpartition out of the synchronized block
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5542&quot;&gt;FLINK-5542&lt;/a&gt;] - YARN client incorrectly uses local YARN config to check vcore capacity
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9567&quot;&gt;FLINK-9567&lt;/a&gt;] - Flink does not release resource in Yarn Cluster mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9788&quot;&gt;FLINK-9788&lt;/a&gt;] - ExecutionGraph Inconsistency prevents Job from recovering
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9884&quot;&gt;FLINK-9884&lt;/a&gt;] - Slot request may not be removed when it has already been assigned in slot manager
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9891&quot;&gt;FLINK-9891&lt;/a&gt;] - Flink cluster is not shutdown in YARN mode when Flink client is stopped
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9932&quot;&gt;FLINK-9932&lt;/a&gt;] - Timed-out TaskExecutor slot-offers to JobMaster leak the slot
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10135&quot;&gt;FLINK-10135&lt;/a&gt;] - Certain cluster-level metrics are no longer exposed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10222&quot;&gt;FLINK-10222&lt;/a&gt;] - Table scalar function expression parses error when function name equals the exists keyword suffix
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10259&quot;&gt;FLINK-10259&lt;/a&gt;] - Key validation for GroupWindowAggregate is broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10316&quot;&gt;FLINK-10316&lt;/a&gt;] - Add check to KinesisProducer that aws.region is set
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10354&quot;&gt;FLINK-10354&lt;/a&gt;] - Savepoints should be counted as retained checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10400&quot;&gt;FLINK-10400&lt;/a&gt;] - Return failed JobResult if job terminates in state FAILED or CANCELED
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10415&quot;&gt;FLINK-10415&lt;/a&gt;] - RestClient does not react to lost connection
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10451&quot;&gt;FLINK-10451&lt;/a&gt;] - TableFunctionCollector should handle the life cycle of ScalarFunction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10469&quot;&gt;FLINK-10469&lt;/a&gt;] - FileChannel may not write the whole buffer in a single call to FileChannel.write(Buffer buffer)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10487&quot;&gt;FLINK-10487&lt;/a&gt;] - fix invalid Flink SQL example
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10516&quot;&gt;FLINK-10516&lt;/a&gt;] - YarnApplicationMasterRunner does not initialize FileSystem with correct Flink Configuration during setup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10524&quot;&gt;FLINK-10524&lt;/a&gt;] - MemoryManagerConcurrentModReleaseTest.testConcurrentModificationWhileReleasing failed on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10544&quot;&gt;FLINK-10544&lt;/a&gt;] - Remove custom settings.xml for snapshot deployments
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10075&quot;&gt;FLINK-10075&lt;/a&gt;] - HTTP connections to a secured REST endpoint flood the log
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10260&quot;&gt;FLINK-10260&lt;/a&gt;] - Confusing log messages during TaskManager registration
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10282&quot;&gt;FLINK-10282&lt;/a&gt;] - Provide separate thread-pool for REST endpoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10312&quot;&gt;FLINK-10312&lt;/a&gt;] - Wrong / missing exception when submitting job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10375&quot;&gt;FLINK-10375&lt;/a&gt;] - ExceptionInChainedStubException hides wrapped exception in cause
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10582&quot;&gt;FLINK-10582&lt;/a&gt;] - Make REST executor thread priority configurable
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Mon, 29 Oct 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/10/29/release-1.5.5.html</link>
<guid isPermaLink="true">/news/2018/10/29/release-1.5.5.html</guid>
</item>
<item>
<title>Apache Flink 1.6.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.6 series.&lt;/p&gt;
&lt;p&gt;This release includes 60 fixes and minor improvements for Flink 1.6.1. The list below details all resolved issues.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.6.1.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.6.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9637&quot;&gt;FLINK-9637&lt;/a&gt;] - Add public user documentation for TTL feature
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10068&quot;&gt;FLINK-10068&lt;/a&gt;] - Add documentation for async/RocksDB-based timers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10085&quot;&gt;FLINK-10085&lt;/a&gt;] - Update AbstractOperatorRestoreTestBase
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10087&quot;&gt;FLINK-10087&lt;/a&gt;] - Update BucketingSinkMigrationTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10089&quot;&gt;FLINK-10089&lt;/a&gt;] - Update FlinkKafkaConsumerBaseMigrationTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10090&quot;&gt;FLINK-10090&lt;/a&gt;] - Update ContinuousFileProcessingMigrationTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10091&quot;&gt;FLINK-10091&lt;/a&gt;] - Update WindowOperatorMigrationTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10092&quot;&gt;FLINK-10092&lt;/a&gt;] - Update StatefulJobSavepointMigrationITCase
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10109&quot;&gt;FLINK-10109&lt;/a&gt;] - Add documentation for StreamingFileSink
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9289&quot;&gt;FLINK-9289&lt;/a&gt;] - Parallelism of generated operators should have max parallelism of input
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9546&quot;&gt;FLINK-9546&lt;/a&gt;] - The heartbeatTimeoutIntervalMs of HeartbeatMonitor should be larger than 0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9693&quot;&gt;FLINK-9693&lt;/a&gt;] - Possible memory leak in jobmanager retaining archived checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9972&quot;&gt;FLINK-9972&lt;/a&gt;] - Debug memory logging not working
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10011&quot;&gt;FLINK-10011&lt;/a&gt;] - Old job resurrected during HA failover
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10063&quot;&gt;FLINK-10063&lt;/a&gt;] - Jepsen: Automatically restart Mesos Processes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10101&quot;&gt;FLINK-10101&lt;/a&gt;] - Mesos web ui url is missing.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10105&quot;&gt;FLINK-10105&lt;/a&gt;] - Test failure because of jobmanager.execution.failover-strategy is outdated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10115&quot;&gt;FLINK-10115&lt;/a&gt;] - Content-length limit is also applied to FileUploads
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10116&quot;&gt;FLINK-10116&lt;/a&gt;] - createComparator fails on case class with Unit type fields prior to the join-key
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10141&quot;&gt;FLINK-10141&lt;/a&gt;] - Reduce lock contention introduced with 1.5
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10142&quot;&gt;FLINK-10142&lt;/a&gt;] - Reduce synchronization overhead for credit notifications
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10150&quot;&gt;FLINK-10150&lt;/a&gt;] - Chained batch operators interfere with each other
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10151&quot;&gt;FLINK-10151&lt;/a&gt;] - [State TTL] Fix false recursion call in TransformingStateTableKeyGroupPartitioner.tryAddToSource
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10154&quot;&gt;FLINK-10154&lt;/a&gt;] - Make sure we always read at least one record in KinesisConnector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10169&quot;&gt;FLINK-10169&lt;/a&gt;] - RowtimeValidator fails with custom TimestampExtractor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10172&quot;&gt;FLINK-10172&lt;/a&gt;] - Inconsistency in ExpressionParser and ExpressionDsl for order by asc/desc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10192&quot;&gt;FLINK-10192&lt;/a&gt;] - SQL Client table visualization mode does not update correctly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10193&quot;&gt;FLINK-10193&lt;/a&gt;] - Default RPC timeout is used when triggering savepoint via JobMasterGateway
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10204&quot;&gt;FLINK-10204&lt;/a&gt;] - StreamElementSerializer#copy broken for LatencyMarkers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10255&quot;&gt;FLINK-10255&lt;/a&gt;] - Standby Dispatcher locks submitted JobGraphs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10261&quot;&gt;FLINK-10261&lt;/a&gt;] - INSERT INTO does not work with ORDER BY clause
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10267&quot;&gt;FLINK-10267&lt;/a&gt;] - [State] Fix arbitrary iterator access on RocksDBMapIterator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10269&quot;&gt;FLINK-10269&lt;/a&gt;] - Elasticsearch 6 UpdateRequest fail because of binary incompatibility
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10283&quot;&gt;FLINK-10283&lt;/a&gt;] - FileCache logs unnecessary warnings
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10293&quot;&gt;FLINK-10293&lt;/a&gt;] - RemoteStreamEnvironment does not forward port to RestClusterClient
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10314&quot;&gt;FLINK-10314&lt;/a&gt;] - Blocking calls in Execution Graph creation bring down cluster
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10328&quot;&gt;FLINK-10328&lt;/a&gt;] - Stopping the ZooKeeperSubmittedJobGraphStore should release all currently held locks
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10329&quot;&gt;FLINK-10329&lt;/a&gt;] - Fail with exception if job cannot be removed by ZooKeeperSubmittedJobGraphStore#removeJobGraph
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; New Feature
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10022&quot;&gt;FLINK-10022&lt;/a&gt;] - Add metrics for input/output buffers
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9013&quot;&gt;FLINK-9013&lt;/a&gt;] - Document yarn.containers.vcores only being effective when adapting YARN config
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9446&quot;&gt;FLINK-9446&lt;/a&gt;] - Compatibility table not up-to-date
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9795&quot;&gt;FLINK-9795&lt;/a&gt;] - Update Mesos documentation for flip6
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9859&quot;&gt;FLINK-9859&lt;/a&gt;] - More Akka config options
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9899&quot;&gt;FLINK-9899&lt;/a&gt;] - Add more metrics to the Kinesis source connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9962&quot;&gt;FLINK-9962&lt;/a&gt;] - allow users to specify TimeZone in DateTimeBucketer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10001&quot;&gt;FLINK-10001&lt;/a&gt;] - Improve Kubernetes documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10006&quot;&gt;FLINK-10006&lt;/a&gt;] - Improve logging in BarrierBuffer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10020&quot;&gt;FLINK-10020&lt;/a&gt;] - Kinesis Consumer listShards should support more recoverable exceptions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10082&quot;&gt;FLINK-10082&lt;/a&gt;] - Initialize StringBuilder in Slf4jReporter with estimated size
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10094&quot;&gt;FLINK-10094&lt;/a&gt;] - Always backup default config for end-to-end tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10110&quot;&gt;FLINK-10110&lt;/a&gt;] - Harden e2e Kafka shutdown
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10131&quot;&gt;FLINK-10131&lt;/a&gt;] - Improve logging around ResultSubpartition
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10137&quot;&gt;FLINK-10137&lt;/a&gt;] - YARN: Log completed Containers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10164&quot;&gt;FLINK-10164&lt;/a&gt;] - Add support for resuming from savepoints to StandaloneJobClusterEntrypoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10170&quot;&gt;FLINK-10170&lt;/a&gt;] - Support string representation for map and array types in descriptor-based Table API
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10185&quot;&gt;FLINK-10185&lt;/a&gt;] - Make ZooKeeperStateHandleStore#releaseAndTryRemove synchronous
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10223&quot;&gt;FLINK-10223&lt;/a&gt;] - TaskManagers should log their ResourceID during startup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10301&quot;&gt;FLINK-10301&lt;/a&gt;] - Allow a custom Configuration in StreamNetworkBenchmarkEnvironment
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10325&quot;&gt;FLINK-10325&lt;/a&gt;] - [State TTL] Refactor TtlListState to use only loops, no java stream API for performance
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Test
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10084&quot;&gt;FLINK-10084&lt;/a&gt;] - Migration tests weren&amp;#39;t updated for 1.5
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 20 Sep 2018 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2018/09/20/release-1.6.1.html</link>
<guid isPermaLink="true">/news/2018/09/20/release-1.6.1.html</guid>
</item>
<item>
<title>Apache Flink 1.5.4 Released</title>
<description>&lt;p&gt;The Apache Flink community released the fourth bugfix version of the Apache Flink 1.5 series.&lt;/p&gt;
&lt;p&gt;This release includes more than 20 fixes and minor improvements for Flink 1.5.4. The list below details all resolved issues.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.5.4.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9878&quot;&gt;FLINK-9878&lt;/a&gt;] - IO worker threads BLOCKED on SSL Session Cache while CMS full gc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10011&quot;&gt;FLINK-10011&lt;/a&gt;] - Old job resurrected during HA failover
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10101&quot;&gt;FLINK-10101&lt;/a&gt;] - Mesos web ui url is missing.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10115&quot;&gt;FLINK-10115&lt;/a&gt;] - Content-length limit is also applied to FileUploads
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10116&quot;&gt;FLINK-10116&lt;/a&gt;] - createComparator fails on case class with Unit type fields prior to the join-key
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10141&quot;&gt;FLINK-10141&lt;/a&gt;] - Reduce lock contention introduced with 1.5
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10142&quot;&gt;FLINK-10142&lt;/a&gt;] - Reduce synchronization overhead for credit notifications
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10150&quot;&gt;FLINK-10150&lt;/a&gt;] - Chained batch operators interfere with each other
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10172&quot;&gt;FLINK-10172&lt;/a&gt;] - Inconsistency in ExpressionParser and ExpressionDsl for order by asc/desc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10193&quot;&gt;FLINK-10193&lt;/a&gt;] - Default RPC timeout is used when triggering savepoint via JobMasterGateway
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10204&quot;&gt;FLINK-10204&lt;/a&gt;] - StreamElementSerializer#copy broken for LatencyMarkers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10255&quot;&gt;FLINK-10255&lt;/a&gt;] - Standby Dispatcher locks submitted JobGraphs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10261&quot;&gt;FLINK-10261&lt;/a&gt;] - INSERT INTO does not work with ORDER BY clause
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10267&quot;&gt;FLINK-10267&lt;/a&gt;] - [State] Fix arbitrary iterator access on RocksDBMapIterator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10293&quot;&gt;FLINK-10293&lt;/a&gt;] - RemoteStreamEnvironment does not forward port to RestClusterClient
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10314&quot;&gt;FLINK-10314&lt;/a&gt;] - Blocking calls in Execution Graph creation bring down cluster
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10328&quot;&gt;FLINK-10328&lt;/a&gt;] - Stopping the ZooKeeperSubmittedJobGraphStore should release all currently held locks
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10329&quot;&gt;FLINK-10329&lt;/a&gt;] - Fail with exception if job cannot be removed by ZooKeeperSubmittedJobGraphStore#removeJobGraph
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10082&quot;&gt;FLINK-10082&lt;/a&gt;] - Initialize StringBuilder in Slf4jReporter with estimated size
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10131&quot;&gt;FLINK-10131&lt;/a&gt;] - Improve logging around ResultSubpartition
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10137&quot;&gt;FLINK-10137&lt;/a&gt;] - YARN: Log completed Containers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10185&quot;&gt;FLINK-10185&lt;/a&gt;] - Make ZooKeeperStateHandleStore#releaseAndTryRemove synchronous
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10223&quot;&gt;FLINK-10223&lt;/a&gt;] - TaskManagers should log their ResourceID during startup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10301&quot;&gt;FLINK-10301&lt;/a&gt;] - Allow a custom Configuration in StreamNetworkBenchmarkEnvironment
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 20 Sep 2018 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2018/09/20/release-1.5.4.html</link>
<guid isPermaLink="true">/news/2018/09/20/release-1.5.4.html</guid>
</item>
<item>
<title>Apache Flink 1.5.3 Released</title>
<description>&lt;p&gt;The Apache Flink community released the third bugfix version of the Apache Flink 1.5 series.&lt;/p&gt;
&lt;p&gt;This release includes more than 20 fixes and minor improvements for Flink 1.5.3. The list below details all resolved issues.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.5.3.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9951&quot;&gt;FLINK-9951&lt;/a&gt;] - Update scm developerConnection
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5750&quot;&gt;FLINK-5750&lt;/a&gt;] - Incorrect translation of n-ary Union
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9289&quot;&gt;FLINK-9289&lt;/a&gt;] - Parallelism of generated operators should have max parallelism of input
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9546&quot;&gt;FLINK-9546&lt;/a&gt;] - The heartbeatTimeoutIntervalMs of HeartbeatMonitor should be larger than 0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9655&quot;&gt;FLINK-9655&lt;/a&gt;] - Externalized checkpoint E2E test fails on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9693&quot;&gt;FLINK-9693&lt;/a&gt;] - Possible memory leak in jobmanager retaining archived checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9694&quot;&gt;FLINK-9694&lt;/a&gt;] - Potentially NPE in CompositeTypeSerializerConfigSnapshot constructor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9923&quot;&gt;FLINK-9923&lt;/a&gt;] - OneInputStreamTaskTest.testWatermarkMetrics fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9935&quot;&gt;FLINK-9935&lt;/a&gt;] - Batch Table API: grouping by window and attribute causes java.lang.ClassCastException:
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9936&quot;&gt;FLINK-9936&lt;/a&gt;] - Mesos resource manager unable to connect to master after failover
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9946&quot;&gt;FLINK-9946&lt;/a&gt;] - Quickstart E2E test archetype version is hard-coded
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9969&quot;&gt;FLINK-9969&lt;/a&gt;] - Unreasonable memory requirements to complete examples/batch/WordCount
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9972&quot;&gt;FLINK-9972&lt;/a&gt;] - Debug memory logging not working
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9978&quot;&gt;FLINK-9978&lt;/a&gt;] - Source release sha contains absolute file path
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9985&quot;&gt;FLINK-9985&lt;/a&gt;] - Incorrect parameter order in document
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9988&quot;&gt;FLINK-9988&lt;/a&gt;] - job manager does not respect property jobmanager.web.address
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10013&quot;&gt;FLINK-10013&lt;/a&gt;] - Fix Kerberos integration for FLIP-6 YarnTaskExecutorRunner
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10033&quot;&gt;FLINK-10033&lt;/a&gt;] - Let Task release reference to Invokable on shutdown
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10070&quot;&gt;FLINK-10070&lt;/a&gt;] - Flink cannot be compiled with maven 3.0.x
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; New Feature
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10022&quot;&gt;FLINK-10022&lt;/a&gt;] - Add metrics for input/output buffers
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9446&quot;&gt;FLINK-9446&lt;/a&gt;] - Compatibility table not up-to-date
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9765&quot;&gt;FLINK-9765&lt;/a&gt;] - Improve CLI responsiveness when cluster is not reachable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9806&quot;&gt;FLINK-9806&lt;/a&gt;] - Add a canonical link element to documentation HTML
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9859&quot;&gt;FLINK-9859&lt;/a&gt;] - More Akka config options
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9942&quot;&gt;FLINK-9942&lt;/a&gt;] - Guard handlers against null fields in requests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9986&quot;&gt;FLINK-9986&lt;/a&gt;] - Remove unnecessary information from .version.properties file
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9987&quot;&gt;FLINK-9987&lt;/a&gt;] - Rework ClassLoader E2E test to not rely on .version.properties file
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10006&quot;&gt;FLINK-10006&lt;/a&gt;] - Improve logging in BarrierBuffer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-10016&quot;&gt;FLINK-10016&lt;/a&gt;] - Make YARN/Kerberos end-to-end test stricter
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Tue, 21 Aug 2018 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2018/08/21/release-1.5.3.html</link>
<guid isPermaLink="true">/news/2018/08/21/release-1.5.3.html</guid>
</item>
<item>
<title>Apache Flink 1.6.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is proud to announce the 1.6.0 release. Over the past 2 months, the Flink community has worked hard to resolve more than 360 issues. Please check the &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12342760&quot;&gt;complete changelog&lt;/a&gt; for more details.&lt;/p&gt;
&lt;p&gt;Flink 1.6.0 is the seventh major release in the 1.x.y series. It is API-compatible with previous 1.x.y releases for APIs annotated with the &lt;code&gt;@Public&lt;/code&gt; annotation.&lt;/p&gt;
&lt;p&gt;We encourage everyone to &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;download the release&lt;/a&gt; and check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.6/&quot;&gt;documentation&lt;/a&gt;.
Feedback through the Flink &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; or &lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;JIRA&lt;/a&gt; is, as always, very much appreciated!&lt;/p&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt; on the Flink project site.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#flink-16---the-next-step-in-stateful-stream-processing&quot; id=&quot;markdown-toc-flink-16---the-next-step-in-stateful-stream-processing&quot;&gt;Flink 1.6 - The next step in stateful stream processing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#improving-flinks-state-support&quot; id=&quot;markdown-toc-improving-flinks-state-support&quot;&gt;Improving Flink’s State Support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#extending-flinks-deployment-options&quot; id=&quot;markdown-toc-extending-flinks-deployment-options&quot;&gt;Extending Flink’s Deployment Options&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#enhancing-sql-and-table-api&quot; id=&quot;markdown-toc-enhancing-sql-and-table-api&quot;&gt;Enhancing SQL and Table API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#more-connectors&quot; id=&quot;markdown-toc-more-connectors&quot;&gt;More Connectors&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#jepsen-based-distributed-tests-suite&quot; id=&quot;markdown-toc-jepsen-based-distributed-tests-suite&quot;&gt;Jepsen Based Distributed Tests Suite&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#various-other-features-and-improvements&quot; id=&quot;markdown-toc-various-other-features-and-improvements&quot;&gt;Various Other Features and Improvements&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#release-notes&quot; id=&quot;markdown-toc-release-notes&quot;&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;flink-16---the-next-step-in-stateful-stream-processing&quot;&gt;Flink 1.6 - The next step in stateful stream processing&lt;/h2&gt;
&lt;p&gt;In Flink 1.6.0 we build on the groundwork laid in earlier versions: enabling Flink users to run fast data processing and to build data-driven, data-intensive applications seamlessly.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Flink’s state support is one of the key features that make Flink so versatile and powerful for implementing all kinds of use cases.
To make it even easier, the community added &lt;strong&gt;native support for state TTL&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9510&quot;&gt;FLINK-9510&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9938&quot;&gt;FLINK-9938&lt;/a&gt;).
This feature allows state to be cleaned up automatically after it has expired.
With Flink 1.6.0 &lt;strong&gt;timer state can now go out of core&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9485&quot;&gt;FLINK-9485&lt;/a&gt;) by storing the relevant state in RocksDB.
Last but not least, we also &lt;strong&gt;improved the deletion of timers&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9423&quot;&gt;FLINK-9423&lt;/a&gt;) significantly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;With Flink 1.5.0 we reworked Flink’s distributed architecture to add support for resource elasticity and different deployment scenarios, most notably a better container integration.
In Flink 1.6.0 we follow up on some of the unfinished aspects of this work: &lt;strong&gt;All external communication, including job submissions, is now HTTP/REST based&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9280&quot;&gt;FLINK-9280&lt;/a&gt;) which eases container setups considerably.
Flink 1.6.0 also comes with a &lt;strong&gt;container entrypoint&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9488&quot;&gt;FLINK-9488&lt;/a&gt;) which makes it easy to bootstrap a containerized job cluster.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Streaming SQL is one of the features with the most disruptive potential, because it makes Flink much more accessible.
In Apache Flink 1.6.0 the community &lt;strong&gt;further improved the SQL CLI&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8863&quot;&gt;FLINK-8863&lt;/a&gt;), making the &lt;strong&gt;execution of streaming and batch queries&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8861&quot;&gt;FLINK-8861&lt;/a&gt;) against a multitude of data sources a piece of cake.
In addition, the &lt;strong&gt;full Avro support&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9444&quot;&gt;FLINK-9444&lt;/a&gt;) makes reading any kind of Avro data seamless.
Last but not least, the community &lt;strong&gt;hardened Flink’s CEP library&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9418&quot;&gt;FLINK-9418&lt;/a&gt;) that can now handle significantly larger use cases.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;What would be a distributed processing engine without its connectors to talk to the outside world?
In the latest Flink release we added a &lt;strong&gt;new StreamingFileSink&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9750&quot;&gt;FLINK-9750&lt;/a&gt;) that succeeds the &lt;code&gt;BucketingSink&lt;/code&gt; as the standard file sink.
The community also added support for &lt;strong&gt;ElasticSearch 6.x&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7386&quot;&gt;FLINK-7386&lt;/a&gt;) and implemented multiple &lt;strong&gt;AvroDeserializationSchemas&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9338&quot;&gt;FLINK-9338&lt;/a&gt;) to easily ingest Avro data.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;
&lt;h3 id=&quot;improving-flinks-state-support&quot;&gt;Improving Flink’s State Support&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Support for State TTL&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9510&quot;&gt;FLINK-9510&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9938&quot;&gt;FLINK-9938&lt;/a&gt;):
This feature allows users to specify a time-to-live (TTL) for Flink state.
Once the time-to-live has been exceeded, Flink no longer gives access to the respective state values.
Expired data is cleaned up on access so that keyed operator state does not grow indefinitely, and expired entries are not included in subsequent checkpoints.
This feature helps applications comply with new data protection regulations (e.g. GDPR).
A minimal configuration sketch follows after this list.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Scalable Timers Based on RocksDB&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9485&quot;&gt;FLINK-9485&lt;/a&gt;):
Flink’s timer state can now be stored in RocksDB, allowing Flink to support significantly bigger timer state since it can go out of core and spill to disk.
Previously, users were limited to the heap memory size.
On top of that, snapshots of the timer state are now asynchronous, i.e., they no longer block the processing pipeline during checkpoints and can be incremental.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Faster Timer Deletions&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9423&quot;&gt;FLINK-9423&lt;/a&gt;):
Flink’s internal timer data structure was improved such that the complexity of deleting a timer is reduced from O(n) to O(log n).
This significantly speeds up Flink jobs that use many timers.
Deleting timers is now also exposed through a user-facing API.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
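&lt;p&gt;As a small illustration of the new TTL setting, the following is a minimal sketch of attaching a &lt;code&gt;StateTtlConfig&lt;/code&gt; to a state descriptor, e.g. inside the &lt;code&gt;open()&lt;/code&gt; method of a rich function; the state name and the 60-second TTL are placeholder values chosen for illustration.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

// Expire entries 60 seconds after they were last written.
StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.seconds(60))
    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
    .build();

// Attach the TTL configuration to a regular state descriptor.
ValueStateDescriptor&amp;lt;String&amp;gt; lastLogin =
    new ValueStateDescriptor&amp;lt;&amp;gt;(&amp;quot;lastLogin&amp;quot;, String.class);
lastLogin.enableTimeToLive(ttlConfig);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;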
&lt;h3 id=&quot;extending-flinks-deployment-options&quot;&gt;Extending Flink’s Deployment Options&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Job Cluster Container Entrypoint&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9488&quot;&gt;FLINK-9488&lt;/a&gt;):
Flink 1.6.0 provides an easy-to-use container entrypoint to bootstrap a job cluster.
Combining this entrypoint with a user-code jar creates a self-contained image which automatically executes the contained Flink job when deployed.
Since the image already contains the Flink job, client communication is no longer necessary.
Avoiding additional communication steps with the client reduces the number of moving parts and improves operations in a container environment significantly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Fully RESTified Job Submission&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9280&quot;&gt;FLINK-9280&lt;/a&gt;):
The Flink client now sends all job-relevant content via a single POST call to the server.
This allows a much easier integration with cluster management frameworks and container environments, since opening custom ports is no longer necessary.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;enhancing-sql-and-table-api&quot;&gt;Enhancing SQL and Table API&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;User-Defined Function in SQL Client CLI&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8863&quot;&gt;FLINK-8863&lt;/a&gt;):
The SQL Client CLI now supports the registration of user-defined functions.
This considerably improves the CLI’s expressiveness, because SQL queries can be enriched with more powerful custom table, aggregate, and scalar functions.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Support for Batch Queries in SQL Client CLI&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8861&quot;&gt;FLINK-8861&lt;/a&gt;):
The SQL Client CLI now supports the execution of batch queries.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Support for INSERT INTO Statements in SQL Client CLI&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8858&quot;&gt;FLINK-8858&lt;/a&gt;):
By supporting SQL’s INSERT INTO statements, the SQL Client CLI can be used to submit long-running SQL queries to Flink that sink their results in external systems.
The SQL Client itself can be shut down after submission without stopping the job.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Unified Table Sinks and Formats&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8866&quot;&gt;FLINK-8866&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8558&quot;&gt;FLINK-8558&lt;/a&gt;):
In the past, table sinks had to be configured programmatically and were tied to a specific format and implementation.
This release reworked these aspects by decoupling formats from connectors and improving how table sinks are discovered and configured.
Table sinks can now be defined in a YAML file using string-based properties without having to write a single line of code.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;New Kafka Table Sink&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9846&quot;&gt;FLINK-9846&lt;/a&gt;):
The Kafka table sink now uses the new unified APIs and supports both JSON and Avro formats.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Full SQL Avro Support&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9444&quot;&gt;FLINK-9444&lt;/a&gt;):
Flink’s Table &amp;amp; SQL API now understands the full spectrum of Avro types including generic/specific records and logical types.
The types are automatically mapped from and to Flink-equivalent types, making it possible to specify end-to-end ETL pipelines in SQL.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Improved Expressiveness of SQL and Table API&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5878&quot;&gt;FLINK-5878&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8688&quot;&gt;FLINK-8688&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6810&quot;&gt;FLINK-6810&lt;/a&gt;):
Flink’s Table &amp;amp; SQL API supports left, right, and full outer joins that allow for continuous result-updating queries.
SQL aggregate functions support the &lt;code&gt;DISTINCT&lt;/code&gt; keyword.
Queries such as &lt;code&gt;COUNT(DISTINCT column)&lt;/code&gt; are supported for windowed and non-windowed aggregations.
Both SQL and Table API now include more built-in functions such as &lt;code&gt;MD5, SHA1, SHA2, LOG&lt;/code&gt;, and &lt;code&gt;UNNEST&lt;/code&gt; for multisets.
A short example of a &lt;code&gt;DISTINCT&lt;/code&gt; aggregation follows after this list.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
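&lt;p&gt;As a small sketch of the new &lt;code&gt;DISTINCT&lt;/code&gt; support, the following Java Table API snippet runs a continuous distinct-count query; the table &lt;code&gt;Clicks&lt;/code&gt; and its columns are hypothetical and assumed to have been registered beforehand.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = TableEnvironment.getTableEnvironment(env);

// Continuous, result-updating query using the DISTINCT keyword.
// &amp;quot;Clicks(userId, url)&amp;quot; stands in for a previously registered table.
Table distinctUrlsPerUser = tEnv.sqlQuery(
    &amp;quot;SELECT userId, COUNT(DISTINCT url) AS urls FROM Clicks GROUP BY userId&amp;quot;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;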
&lt;h3 id=&quot;more-connectors&quot;&gt;More Connectors&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;New StreamingFileSink&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9750&quot;&gt;FLINK-9750&lt;/a&gt;):
The new &lt;code&gt;StreamingFileSink&lt;/code&gt; is an exactly-once sink for writing to filesystems which capitalizes on the knowledge acquired from the previous &lt;code&gt;BucketingSink&lt;/code&gt;.
Exactly-once is supported through integration of the sink with Flink’s checkpointing mechanism.
The new sink is built upon Flink’s own &lt;code&gt;FileSystem&lt;/code&gt; abstraction and it supports local file system and HDFS, with plans for S3 support in the near future.
It exposes pluggable file rolling and bucketing policies.
Apart from row-wise encoding formats, the new &lt;code&gt;StreamingFileSink&lt;/code&gt; comes with support for Parquet.
Other bulk-encoding formats like ORC can easily be added using the exposed APIs.
A minimal usage sketch follows after this list.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;ElasticSearch 6.x Connector and Improved Support for Older Versions&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7386&quot;&gt;FLINK-7386&lt;/a&gt;):
Flink now comes with a connector for ElasticSearch 6.x, which is built on top of Elasticsearch’s new &lt;a href=&quot;https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high.html&quot;&gt;high level REST client&lt;/a&gt;.
For older ElasticSearch versions which still use the native Java &lt;code&gt;TransportClient&lt;/code&gt;, Flink’s Elasticsearch connectors now support up to Elasticsearch version 5.6.10.
Some APIs in the public interface of the ElasticSearch connector&#39;s &lt;code&gt;RequestIndexer&lt;/code&gt; have been deprecated.
Please refer to the Javadoc / documentation for the new preferred API.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Avro Deserialization Schemas&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9338&quot;&gt;FLINK-9338&lt;/a&gt;):
Flink now comes with a &lt;code&gt;DeserializationSchema&lt;/code&gt; for deserializing Avro-encoded messages.
It also adds out-of-the-box integration with Confluent’s schema registry.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
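&lt;p&gt;A minimal sketch of the row-format variant of the new sink is shown below; the output path is a placeholder and &lt;code&gt;stream&lt;/code&gt; is assumed to be an existing &lt;code&gt;DataStream&amp;lt;String&amp;gt;&lt;/code&gt; in a job with checkpointing enabled.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

// Write each record as a UTF-8 encoded line; the target directory is a placeholder.
StreamingFileSink&amp;lt;String&amp;gt; sink = StreamingFileSink
    .forRowFormat(new Path(&amp;quot;/tmp/flink-output&amp;quot;), new SimpleStringEncoder&amp;lt;String&amp;gt;(&amp;quot;UTF-8&amp;quot;))
    .build();

// &amp;quot;stream&amp;quot; is assumed to be a DataStream&amp;lt;String&amp;gt; produced earlier in the job.
stream.addSink(sink);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;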
&lt;h3 id=&quot;jepsen-based-distributed-tests-suite&quot;&gt;Jepsen Based Distributed Tests Suite&lt;/h3&gt;
&lt;p&gt;The Flink community added a Jepsen based test suite (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9004&quot;&gt;FLINK-9004&lt;/a&gt;) which validates the behavior of Flink’s distributed cluster components under real-world faults.
It is a first step towards a higher test coverage for Flink’s fault tolerance mechanisms.
The community intends to incrementally improve test coverage with it.&lt;/p&gt;
&lt;h3 id=&quot;various-other-features-and-improvements&quot;&gt;Various Other Features and Improvements&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Hardened CEP Library&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9418&quot;&gt;FLINK-9418&lt;/a&gt;):
The CEP operator’s internal NFA state is now backed by Flink state.
That way it can go out of core to support much larger use cases.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;More Expressive DataStream Joins&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8478&quot;&gt;FLINK-8478&lt;/a&gt;):
Flink 1.6.0 adds support for interval joins in the DataStream API.
With this feature it is now possible to join events from two streams where elements of one stream lie in a specified time interval relative to elements of the other stream.
Check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/operators/joining.html&quot;&gt;documentation&lt;/a&gt; for more details; a small sketch is shown after this list.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Intra-Cluster Mutual Authentication&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9312&quot;&gt;FLINK-9312&lt;/a&gt;):
Flink’s cluster components now enforce mutual authentication with their peers.
This allows only Flink components to talk to each other, making it impossible for malicious actors to impersonate Flink components in order to eavesdrop on the cluster communication.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
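&lt;p&gt;The sketch below shows the shape of such an interval join; &lt;code&gt;orders&lt;/code&gt; and &lt;code&gt;shipments&lt;/code&gt; are assumed to be &lt;code&gt;KeyedStream&lt;/code&gt;s over hypothetical &lt;code&gt;Order&lt;/code&gt; and &lt;code&gt;Shipment&lt;/code&gt; types sharing a key, and the job is assumed to run on event time, which interval joins require in this release.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

// Emit a result for every shipment that occurs within four hours after its order.
orders
    .intervalJoin(shipments)
    .between(Time.hours(0), Time.hours(4))
    .process(new ProcessJoinFunction&amp;lt;Order, Shipment, String&amp;gt;() {
        @Override
        public void processElement(Order order, Shipment shipment, Context ctx, Collector&amp;lt;String&amp;gt; out) {
            // orderId and shipmentId are hypothetical fields of the example types.
            out.collect(order.orderId + &amp;quot; shipped as &amp;quot; + shipment.shipmentId);
        }
    });&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;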
&lt;h2 id=&quot;release-notes&quot;&gt;Release Notes&lt;/h2&gt;
&lt;p&gt;Please review the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.6/release-notes/flink-1.6.html&quot;&gt;release notes&lt;/a&gt; if you plan to upgrade your Flink setup to Flink 1.6.&lt;/p&gt;
&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;
&lt;p&gt;According to git shortlog, the following 112 people contributed to the 1.6.0 release. Thanks to all contributors!&lt;/p&gt;
&lt;p&gt;Alejandro Alcalde, Alexander Koltsov, Alexey Tsitkin, Aljoscha Krettek, Andreas Fink, Andrey Zagrebin, Arunan Sugunakumar, Ashwin Sinha, Bill Lee, Bowen Li, Chesnay Schepler, Christophe Jolif, Clément Tamisier, Craig Foster, David Anderson, Dawid Wysakowicz, Deepak Sharnma, Dmitrii_Kniazev, EAlexRojas, Elias Levy, Eron Wright, Ethan Li, Fabian Hueske, Florian Schmidt, Franz Thoma, Gabor Gevay, Georgii Gobozov, Haohui Mai, Jamie Grier, Jeff Zhang, Jelmer Kuperus, Jiayi Liao, Jungtaek Lim, Kailash HD, Ken Geis, Ken Krugler, Lakshmi Gururaja Rao, Leonid Ishimnikov, Matrix42, Michael Gendelman, MichealShin, Moser Thomas W, Nico Duldhardt, Nico Kruber, Oleksandr Nitavskyi, PJ Fanning, Patrick Lucas, Pavel Shvetsov, Philippe Duveau, Piotr Nowojski, Qiu Congxian/klion26, Rinat Sharipov, Rong Rong, Rune Skou Larsen, Sayat Satybaldiyev, Shuyi Chen, Stefan Richter, Stephan Ewen, Stephen Parente, Thomas Weise, Till Rohrmann, Timo Walther, Tobii42, Tzu-Li (Gordon) Tai, Viktor Vlasov, Wosin, Xingcan Cui, Xpray, Yan Zhou, Yazdan.JS, Yun Tang, Zhijiang, Zsolt Donca, an4828, aria, binlijin, blueszheng, davidxdh, gyao, hequn8128, hzyuqi1, jerryjzhang, jparkie, juhoautio, kai-chi, kkloudas, klion26, lamber-ken, lincoln-lil, linjun, liurenjie1024, lsy, maqingxiang-it, maxbelov, mayyamus, minwenjun, neoremind, sampathBhat, shankarganesh1234, shuai.xus, sihuazhou, snuyanzin, triones.deng, vinoyang, xueyu, yangshimin, yuemeng, zhangminglei, zhouhai02, zjureel, 军长, 陈梓立&lt;/p&gt;
</description>
<pubDate>Thu, 09 Aug 2018 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2018/08/09/release-1.6.0.html</link>
<guid isPermaLink="true">/news/2018/08/09/release-1.6.0.html</guid>
</item>
<item>
<title>Apache Flink 1.5.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released the second bugfix version of the Apache Flink 1.5 series.&lt;/p&gt;
&lt;p&gt;This release includes more than 20 fixes and minor improvements for Flink 1.5.2. The list below details all resolved issues.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.5.2.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9839&quot;&gt;FLINK-9839&lt;/a&gt;] - End-to-end test: Streaming job with SSL
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5750&quot;&gt;FLINK-5750&lt;/a&gt;] - Incorrect translation of n-ary Union
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8161&quot;&gt;FLINK-8161&lt;/a&gt;] - Flakey YARNSessionCapacitySchedulerITCase on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8731&quot;&gt;FLINK-8731&lt;/a&gt;] - TwoInputStreamTaskTest flaky on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9091&quot;&gt;FLINK-9091&lt;/a&gt;] - Failure while enforcing releasability in building flink-json module
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9380&quot;&gt;FLINK-9380&lt;/a&gt;] - Failing end-to-end tests should not clean up logs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9439&quot;&gt;FLINK-9439&lt;/a&gt;] - DispatcherTest#testJobRecovery dead locks
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9575&quot;&gt;FLINK-9575&lt;/a&gt;] - Potential race condition when removing JobGraph in HA
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9584&quot;&gt;FLINK-9584&lt;/a&gt;] - Unclosed streams in Bucketing-/RollingSink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9658&quot;&gt;FLINK-9658&lt;/a&gt;] - Test data output directories are no longer cleaned up
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9706&quot;&gt;FLINK-9706&lt;/a&gt;] - DispatcherTest#testSubmittedJobGraphListener fails on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9743&quot;&gt;FLINK-9743&lt;/a&gt;] - PackagedProgram.extractContainedLibraries fails on Windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9754&quot;&gt;FLINK-9754&lt;/a&gt;] - Release scripts refers to non-existing profile
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9755&quot;&gt;FLINK-9755&lt;/a&gt;] - Exceptions in RemoteInputChannel#notifyBufferAvailable() are not propagated to the responsible thread
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9762&quot;&gt;FLINK-9762&lt;/a&gt;] - CoreOptions.TMP_DIRS wrongly managed on Yarn
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9766&quot;&gt;FLINK-9766&lt;/a&gt;] - Incomplete/incorrect cleanup in RemoteInputChannelTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9771&quot;&gt;FLINK-9771&lt;/a&gt;] - &amp;quot;Show Plan&amp;quot; option under Submit New Job in WebUI not working
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9772&quot;&gt;FLINK-9772&lt;/a&gt;] - Documentation of Hadoop API outdated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9784&quot;&gt;FLINK-9784&lt;/a&gt;] - Inconsistent use of &amp;#39;static&amp;#39; in AsyncIOExample.java
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9793&quot;&gt;FLINK-9793&lt;/a&gt;] - When submitting a flink job with yarn-cluster, flink-dist*.jar is repeatedly uploaded
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9810&quot;&gt;FLINK-9810&lt;/a&gt;] - JarListHandler does not close opened jars
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9838&quot;&gt;FLINK-9838&lt;/a&gt;] - Slot request failed Exceptions after completing a job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9841&quot;&gt;FLINK-9841&lt;/a&gt;] - Web UI only show partial taskmanager log
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9842&quot;&gt;FLINK-9842&lt;/a&gt;] - Job submission fails via CLI with SSL enabled
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9847&quot;&gt;FLINK-9847&lt;/a&gt;] - OneInputStreamTaskTest.testWatermarksNotForwardedWithinChainWhenIdle unstable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9857&quot;&gt;FLINK-9857&lt;/a&gt;] - Processing-time timers fire too early
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9860&quot;&gt;FLINK-9860&lt;/a&gt;] - Netty resource leak on receiver side
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9872&quot;&gt;FLINK-9872&lt;/a&gt;] - SavepointITCase#testSavepointForJobWithIteration does not properly cancel jobs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9908&quot;&gt;FLINK-9908&lt;/a&gt;] - Inconsistent state of SlotPool after ExecutionGraph cancellation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9910&quot;&gt;FLINK-9910&lt;/a&gt;] - Non-queued scheduling failure sometimes does not return the slot
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9911&quot;&gt;FLINK-9911&lt;/a&gt;] - SlotPool#failAllocation is called outside of main thread
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; New Feature
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9499&quot;&gt;FLINK-9499&lt;/a&gt;] - Allow REST API for running a job to provide job configuration as body of POST request
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9659&quot;&gt;FLINK-9659&lt;/a&gt;] - Remove hard-coded sleeps in bucketing sink E2E test
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9748&quot;&gt;FLINK-9748&lt;/a&gt;] - create_source_release pollutes flink root directory
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9768&quot;&gt;FLINK-9768&lt;/a&gt;] - Only build flink-dist for binary releases
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9785&quot;&gt;FLINK-9785&lt;/a&gt;] - Add remote addresses to LocalTransportException instances
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9801&quot;&gt;FLINK-9801&lt;/a&gt;] - flink-dist is missing dependency on flink-examples
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9804&quot;&gt;FLINK-9804&lt;/a&gt;] - KeyedStateBackend.getKeys() does not work on RocksDB MapState
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9811&quot;&gt;FLINK-9811&lt;/a&gt;] - Add ITCase for interactions of Jar handlers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9873&quot;&gt;FLINK-9873&lt;/a&gt;] - Log actual state when aborting checkpoint due to task not running
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9881&quot;&gt;FLINK-9881&lt;/a&gt;] - Typo in a function name in table.scala
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9888&quot;&gt;FLINK-9888&lt;/a&gt;] - Remove unsafe defaults from release scripts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9909&quot;&gt;FLINK-9909&lt;/a&gt;] - Remove cancellation of input futures from ConjunctFutures
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Tue, 31 Jul 2018 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2018/07/31/release-1.5.2.html</link>
<guid isPermaLink="true">/news/2018/07/31/release-1.5.2.html</guid>
</item>
<item>
<title>Apache Flink 1.5.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.5 series.&lt;/p&gt;
&lt;p&gt;This release includes more than 60 fixes and minor improvements for Flink 1.5.0. A detailed list of all fixes can be found below.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.5.1.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.5.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8977&quot;&gt;FLINK-8977&lt;/a&gt;] - End-to-end test: Manually resume job after terminal failure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8982&quot;&gt;FLINK-8982&lt;/a&gt;] - End-to-end test: Queryable state
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8989&quot;&gt;FLINK-8989&lt;/a&gt;] - End-to-end test: ElasticSearch connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8996&quot;&gt;FLINK-8996&lt;/a&gt;] - Include an operator with broadcast and union state
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9008&quot;&gt;FLINK-9008&lt;/a&gt;] - End-to-end test: Quickstarts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9320&quot;&gt;FLINK-9320&lt;/a&gt;] - Update `test-ha.sh` end-to-end test to use general purpose DataStream job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9322&quot;&gt;FLINK-9322&lt;/a&gt;] - Add exception throwing map function that simulates failures to the general purpose DataStream job
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9394&quot;&gt;FLINK-9394&lt;/a&gt;] - Let externalized checkpoint resume e2e also test rescaling
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8785&quot;&gt;FLINK-8785&lt;/a&gt;] - JobSubmitHandler does not handle JobSubmissionExceptions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8795&quot;&gt;FLINK-8795&lt;/a&gt;] - Scala shell broken for Flip6
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8946&quot;&gt;FLINK-8946&lt;/a&gt;] - TaskManager stop sending metrics after JobManager failover
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9174&quot;&gt;FLINK-9174&lt;/a&gt;] - The type of state created in ProccessWindowFunction.proccess() is inconsistency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9215&quot;&gt;FLINK-9215&lt;/a&gt;] - TaskManager Releasing - org.apache.flink.util.FlinkException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9257&quot;&gt;FLINK-9257&lt;/a&gt;] - End-to-end tests prints &amp;quot;All tests PASS&amp;quot; even if individual test-script returns non-zero exit code
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9258&quot;&gt;FLINK-9258&lt;/a&gt;] - ConcurrentModificationException in ComponentMetricGroup.getAllVariables
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9326&quot;&gt;FLINK-9326&lt;/a&gt;] - TaskManagerOptions.NUM_TASK_SLOTS does not work for local/embedded mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9374&quot;&gt;FLINK-9374&lt;/a&gt;] - Flink Kinesis Producer does not backpressure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9398&quot;&gt;FLINK-9398&lt;/a&gt;] - Flink CLI list running job returns all jobs except in CREATE state
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9437&quot;&gt;FLINK-9437&lt;/a&gt;] - Revert cypher suite update
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9458&quot;&gt;FLINK-9458&lt;/a&gt;] - Unable to recover from job failure on YARN with NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9467&quot;&gt;FLINK-9467&lt;/a&gt;] - No Watermark display on Web UI
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9468&quot;&gt;FLINK-9468&lt;/a&gt;] - Wrong calculation of outputLimit in LimitedConnectionsFileSystem
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9493&quot;&gt;FLINK-9493&lt;/a&gt;] - Forward exception when releasing a TaskManager at the SlotPool
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9494&quot;&gt;FLINK-9494&lt;/a&gt;] - Race condition in Dispatcher with concurrent granting and revoking of leaderhship
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9500&quot;&gt;FLINK-9500&lt;/a&gt;] - FileUploadHandler does not handle EmptyLastHttpContent
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9524&quot;&gt;FLINK-9524&lt;/a&gt;] - NPE from ProcTimeBoundedRangeOver.scala
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9530&quot;&gt;FLINK-9530&lt;/a&gt;] - Task numRecords metrics broken for chains
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9554&quot;&gt;FLINK-9554&lt;/a&gt;] - flink scala shell doesn&amp;#39;t work in yarn mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9567&quot;&gt;FLINK-9567&lt;/a&gt;] - Flink does not release resource in Yarn Cluster mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9570&quot;&gt;FLINK-9570&lt;/a&gt;] - SQL Client merging environments uses AbstractMap
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9580&quot;&gt;FLINK-9580&lt;/a&gt;] - Potentially unclosed ByteBufInputStream in RestClient#readRawResponse
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9627&quot;&gt;FLINK-9627&lt;/a&gt;] - Extending &amp;#39;KafkaJsonTableSource&amp;#39; according to comments will result in NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9629&quot;&gt;FLINK-9629&lt;/a&gt;] - Datadog metrics reporter does not have shaded dependencies
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9633&quot;&gt;FLINK-9633&lt;/a&gt;] - Flink doesn&amp;#39;t use the Savepoint path&amp;#39;s filesystem to create the OuptutStream on Task.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9634&quot;&gt;FLINK-9634&lt;/a&gt;] - Deactivate previous location based scheduling if local recovery is disabled
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9636&quot;&gt;FLINK-9636&lt;/a&gt;] - Network buffer leaks in requesting a batch of segments during canceling
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9646&quot;&gt;FLINK-9646&lt;/a&gt;] - ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9654&quot;&gt;FLINK-9654&lt;/a&gt;] - Internal error while deserializing custom Scala TypeSerializer instances
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9655&quot;&gt;FLINK-9655&lt;/a&gt;] - Externalized checkpoint E2E test fails on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9665&quot;&gt;FLINK-9665&lt;/a&gt;] - PrometheusReporter does not properly unregister metrics
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9676&quot;&gt;FLINK-9676&lt;/a&gt;] - Deadlock during canceling task and recycling exclusive buffer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9677&quot;&gt;FLINK-9677&lt;/a&gt;] - RestClient fails for large uploads
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9684&quot;&gt;FLINK-9684&lt;/a&gt;] - HistoryServerArchiveFetcher not working properly with secure hdfs cluster
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9693&quot;&gt;FLINK-9693&lt;/a&gt;] - Possible memory leak in jobmanager retaining archived checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9708&quot;&gt;FLINK-9708&lt;/a&gt;] - Network buffer leaks when buffer request fails during buffer redistribution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9769&quot;&gt;FLINK-9769&lt;/a&gt;] - FileUploads may be shared across requests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9770&quot;&gt;FLINK-9770&lt;/a&gt;] - UI jar list broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9789&quot;&gt;FLINK-9789&lt;/a&gt;] - Watermark metrics for an operator&amp;amp;task shadow each other
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; New Feature
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9153&quot;&gt;FLINK-9153&lt;/a&gt;] - TaskManagerRunner should support rpc port range
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9280&quot;&gt;FLINK-9280&lt;/a&gt;] - Extend JobSubmitHandler to accept jar files
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9316&quot;&gt;FLINK-9316&lt;/a&gt;] - Expose operator unique ID to the user defined functions in DataStream .
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9564&quot;&gt;FLINK-9564&lt;/a&gt;] - Expose end-to-end module directory to test scripts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9599&quot;&gt;FLINK-9599&lt;/a&gt;] - Implement generic mechanism to receive files via rest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9669&quot;&gt;FLINK-9669&lt;/a&gt;] - Introduce task manager assignment store
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9670&quot;&gt;FLINK-9670&lt;/a&gt;] - Introduce slot manager factory
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9671&quot;&gt;FLINK-9671&lt;/a&gt;] - Add configuration to enable task manager isolation.
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4301&quot;&gt;FLINK-4301&lt;/a&gt;] - Parameterize Flink version in Quickstart bash script
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8650&quot;&gt;FLINK-8650&lt;/a&gt;] - Add tests and documentation for WINDOW clause
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8654&quot;&gt;FLINK-8654&lt;/a&gt;] - Extend quickstart docs on how to submit jobs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9109&quot;&gt;FLINK-9109&lt;/a&gt;] - Add flink modify command to documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9355&quot;&gt;FLINK-9355&lt;/a&gt;] - Simplify configuration of local recovery to a simple on/off
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9372&quot;&gt;FLINK-9372&lt;/a&gt;] - Typo on Elasticsearch website link (elastic.io --&amp;gt; elastic.co)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9409&quot;&gt;FLINK-9409&lt;/a&gt;] - Remove flink-avro and flink-json from /opt
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9456&quot;&gt;FLINK-9456&lt;/a&gt;] - Let ResourceManager notify JobManager about failed/killed TaskManagers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9508&quot;&gt;FLINK-9508&lt;/a&gt;] - General Spell Check on Flink Docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9517&quot;&gt;FLINK-9517&lt;/a&gt;] - Fixing broken links on CLI and Upgrade Docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9518&quot;&gt;FLINK-9518&lt;/a&gt;] - SSL setup Docs config example has wrong keys password
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9549&quot;&gt;FLINK-9549&lt;/a&gt;] - Fix FlickCEP Docs broken link and minor style changes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9573&quot;&gt;FLINK-9573&lt;/a&gt;] - Check for leadership with leader session id
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9594&quot;&gt;FLINK-9594&lt;/a&gt;] - Add documentation for e2e test changes introduced with FLINK-9257
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9595&quot;&gt;FLINK-9595&lt;/a&gt;] - Add instructions to docs about ceased support of KPL version used in Kinesis connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9638&quot;&gt;FLINK-9638&lt;/a&gt;] - Add helper script to run single e2e test
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9672&quot;&gt;FLINK-9672&lt;/a&gt;] - Fail fatally if we cannot submit job on added JobGraph signal
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9707&quot;&gt;FLINK-9707&lt;/a&gt;] - LocalFileSystem does not support concurrent directory creations
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9729&quot;&gt;FLINK-9729&lt;/a&gt;] - Duplicate lines for &amp;quot;Weekday name (Sunday .. Saturday)&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-9734&quot;&gt;FLINK-9734&lt;/a&gt;] - Typo &amp;#39;field-deleimiter&amp;#39; in SQL client docs
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 12 Jul 2018 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2018/07/12/release-1.5.1.html</link>
<guid isPermaLink="true">/news/2018/07/12/release-1.5.1.html</guid>
</item>
<item>
<title>Apache Flink 1.5.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is thrilled to announce the 1.5.0 release. Over the past 5 months, the Flink community has been working hard to resolve more than 780 issues. Please check the &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12341764&amp;amp;projectId=12315522&quot;&gt;complete changelog&lt;/a&gt; for more detail.&lt;/p&gt;
&lt;p&gt;Flink 1.5.0 is the sixth major release in the 1.x.y series. As usual, it is API-compatible with previous 1.x.y releases for APIs annotated with the &lt;code&gt;@Public&lt;/code&gt; annotation.&lt;/p&gt;
&lt;p&gt;We encourage everyone to &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;download the release&lt;/a&gt; and check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.5/&quot;&gt;documentation&lt;/a&gt;.
Feedback through the Flink &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; or &lt;a href=&quot;https://issues.apache.org/jira/projects/FLINK/summary&quot;&gt;JIRA&lt;/a&gt; is, as always, very much appreciated!&lt;/p&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt; on the Flink project site.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#flink-15---streaming-evolved&quot; id=&quot;markdown-toc-flink-15---streaming-evolved&quot;&gt;Flink 1.5 - Streaming Evolved&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#rewrite-of-flinks-deployment-and-process-model&quot; id=&quot;markdown-toc-rewrite-of-flinks-deployment-and-process-model&quot;&gt;Rewrite of Flink’s Deployment and Process Model&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#broadcast-state&quot; id=&quot;markdown-toc-broadcast-state&quot;&gt;Broadcast State&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#improvements-to-flinks-network-stack&quot; id=&quot;markdown-toc-improvements-to-flinks-network-stack&quot;&gt;Improvements to Flink’s Network Stack&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#task-local-state-recovery&quot; id=&quot;markdown-toc-task-local-state-recovery&quot;&gt;Task-Local State Recovery&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#extending-join-support-for-sql-and-table-api&quot; id=&quot;markdown-toc-extending-join-support-for-sql-and-table-api&quot;&gt;Extending Join Support for SQL and Table API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#sql-cli-client&quot; id=&quot;markdown-toc-sql-cli-client&quot;&gt;SQL CLI Client&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#various-other-features-and-improvements&quot; id=&quot;markdown-toc-various-other-features-and-improvements&quot;&gt;Various Other Features and Improvements&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#release-notes&quot; id=&quot;markdown-toc-release-notes&quot;&gt;Release Notes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;flink-15---streaming-evolved&quot;&gt;Flink 1.5 - Streaming Evolved&lt;/h2&gt;
&lt;p&gt;We believe that the field of stream processing, and Apache Flink with it, is taking another major leap at the moment. Stream processing is not just faster analytics and a more principled way of building fast continuous data pipelines. Stream processing is becoming a paradigm to build data-driven and data-intensive applications - it brings together data processing logic and application/business logic.&lt;/p&gt;
&lt;p&gt;To help users realize the potential of this change, we spent a lot of effort in this release to rework some fundamental pieces of Flink. We want Flink to feel natural to users who do data engineering / data processing, as well as users who build data/event-driven applications (and of course those who combine both aspects inside their applications). This is an ongoing journey, but here are the first steps on this way:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;We have &lt;strong&gt;redesigned and reimplemented large parts of Flink’s process model&lt;/strong&gt;. This effort has been tracked under the name &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot;&gt;FLIP-6&lt;/a&gt;. While not all is completed yet, the changes in Flink 1.5 enable more natural Kubernetes deployments and switch to HTTP/REST for all external communication (to naturally interact with service proxies). Simultaneously, Flink 1.5 simplifies deployments on common cluster managers (YARN, Mesos) and features dynamic resource allocation.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Streaming &lt;strong&gt;broadcast state&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4940&quot;&gt;FLINK-4940&lt;/a&gt;) connects a broadcasted stream (e.g., context data, machine learning models, rules/patterns, triggers, …) with other streams that may maintain (large) keyed state, such as feature vectors, state machines, etc. Prior to Flink 1.5, such use cases could not be easily built.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;To improve support for real-time applications with tight latency constraints, we made &lt;strong&gt;major improvements to Flink’s network stack&lt;/strong&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7315&quot;&gt;FLINK-7315&lt;/a&gt;). Flink 1.5 achieves even lower latencies while maintaining a high throughput. In addition, we improved checkpoint stability under backpressure.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Streaming SQL is more and more recognized as a simple and powerful way to perform streaming analytics, build data pipelines, do feature engineering, or incrementally keep applications updated on changing data. We added a &lt;strong&gt;SQL CLI for streaming SQL queries&lt;/strong&gt; (&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client&quot;&gt;FLIP-24&lt;/a&gt;) to make this feature easier to get started with.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;
&lt;h3 id=&quot;rewrite-of-flinks-deployment-and-process-model&quot;&gt;Rewrite of Flink’s Deployment and Process Model&lt;/h3&gt;
&lt;p&gt;The rewrite of Flink’s deployment and process model (internally known as &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot;&gt;FLIP-6&lt;/a&gt;) has been in the works for more than a year and was a substantial effort from the Flink community. Many contributors from several organizations, such as data Artisans, Alibaba, and Dell EMC, collaborated on the design and implementation of this feature, which has been the most significant improvement of a Flink core component since the project’s inception.&lt;/p&gt;
&lt;p&gt;In a nutshell, the improvements add support for dynamic resource allocation and dynamic release of resources on YARN and Mesos schedulers for better resource utilization, failure recovery, and also dynamic scaling. Moreover, deployments on container management infrastructures like Kubernetes have been simplified and all requests to the JobManager now happen through REST. This includes job submission, cancellation, requesting job status, taking a savepoint, and so on.&lt;/p&gt;
&lt;p&gt;The work also builds the foundation for future improvements of Flink’s integration with Kubernetes. In a later version it will be possible to dockerize jobs and deploy them in a natural way as part of the container deployment, i.e., without starting a Flink cluster first. In addition, the work is a big step towards support for applications that are able to automatically adjust their parallelism.&lt;/p&gt;
&lt;p&gt;Note that Flink’s programming APIs are not affected by these improvements.&lt;/p&gt;
&lt;h3 id=&quot;broadcast-state&quot;&gt;Broadcast State&lt;/h3&gt;
&lt;p&gt;Support for broadcast state, i.e., state that is replicated across all parallel instances of a function, has been a frequently requested feature. Typical use cases for broadcast state involve two streams: a control or configuration stream that serves rules, patterns, or other configuration messages, and a regular data stream. The processing of the regular stream is configured by the messages of the control stream. By broadcasting rules or patterns to all parallel instances of a function, they can be applied to all events of the regular stream.&lt;/p&gt;
&lt;p&gt;Of course, broadcast state can be checkpointed and restored just like any other state in Flink, with exactly-once state consistency guarantees. Moreover, broadcast state unblocks the implementation of the “dynamic patterns” feature for Flink’s CEP library.&lt;/p&gt;
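&lt;p&gt;To give a feel for the new API, here is a minimal, hedged sketch (not code from the release itself) that broadcasts a low-volume rule stream and connects it to a keyed event stream. It assumes two existing &lt;code&gt;DataStream&amp;lt;String&amp;gt;&lt;/code&gt; inputs named &lt;code&gt;events&lt;/code&gt; and &lt;code&gt;ruleStream&lt;/code&gt;; the stream names and the single rule key are placeholders.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Minimal sketch: keep the latest rule in broadcast state and apply it to every keyed event.
// Assumes DataStream&amp;lt;String&amp;gt; events and DataStream&amp;lt;String&amp;gt; ruleStream already exist.
final MapStateDescriptor&amp;lt;String, String&amp;gt; ruleDescriptor =
    new MapStateDescriptor&amp;lt;&amp;gt;(&amp;quot;rules&amp;quot;, String.class, String.class);

// Replicate the rule stream to all parallel instances.
BroadcastStream&amp;lt;String&amp;gt; broadcastRules = ruleStream.broadcast(ruleDescriptor);

events
    .keyBy(value -&amp;gt; value)
    .connect(broadcastRules)
    .process(new KeyedBroadcastProcessFunction&amp;lt;String, String, String, String&amp;gt;() {

        @Override
        public void processElement(String event, ReadOnlyContext ctx, Collector&amp;lt;String&amp;gt; out) throws Exception {
            // Regular stream: read the (read-only) broadcast state.
            String rule = ctx.getBroadcastState(ruleDescriptor).get(&amp;quot;active-rule&amp;quot;);
            out.collect(rule == null ? event : rule + &amp;quot;:&amp;quot; + event);
        }

        @Override
        public void processBroadcastElement(String rule, Context ctx, Collector&amp;lt;String&amp;gt; out) throws Exception {
            // Control stream: update the broadcast state on every parallel instance.
            ctx.getBroadcastState(ruleDescriptor).put(&amp;quot;active-rule&amp;quot;, rule);
        }
    })
    .print();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Because the rules live in broadcast state, every parallel instance sees the same rules, and that state is checkpointed together with the rest of the job.&lt;/p&gt;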
&lt;h3 id=&quot;improvements-to-flinks-network-stack&quot;&gt;Improvements to Flink’s Network Stack&lt;/h3&gt;
&lt;p&gt;The performance of a distributed streaming application heavily depends on the component that transfers events from one operator to another via a network connection. In the context of stream processing, two performance metrics, latency and throughput, are important.&lt;/p&gt;
&lt;p&gt;For Flink 1.5, the community worked on two efforts to improve Flink’s network stack, credit-based flow control and improving the transfer latency. Credit-based flow control reduces the amount of data “on the wire” to a minimum while preserving high throughput. This significantly reduces the time to complete a checkpoint in back pressure situations. Moreover, Flink is now able to achieve much lower latencies without a reduction in throughput.&lt;/p&gt;
&lt;h3 id=&quot;task-local-state-recovery&quot;&gt;Task-Local State Recovery&lt;/h3&gt;
&lt;p&gt;Flink’s checkpointing mechanism writes copies of an application’s state to a remote, persistent storage and loads it back in case of a failure. This mechanism ensures that state is not lost when an application fails. However, in case of a failure, it might take a while to load the state from the remote storage to recover the application.&lt;/p&gt;
&lt;p&gt;Improving the checkpointing and recovery efficiency is an ongoing effort in the Flink community. Prominent features of previous releases were asynchronous and incremental checkpointing. In this release, we improved the efficiency of failure recovery.&lt;/p&gt;
&lt;p&gt;Task-local state recovery leverages the fact that a job typically fails due to a single crashed operator, TaskManager, or machine. When writing the state of operators to the remote storage, Flink can now also keep a copy on the local disk of each machine. In case of failover, the scheduler tries to reschedule tasks to their previous machine and load the state from the local disk instead of the remote storage, resulting in faster recovery.&lt;/p&gt;
&lt;h3 id=&quot;extending-join-support-for-sql-and-table-api&quot;&gt;Extending Join Support for SQL and Table API&lt;/h3&gt;
&lt;p&gt;With the 1.5.0 release, Flink adds support for windowed outer equi-joins. Queries like the one shown below allow for joining of tables on bounded time ranges in both event-time and processing-time.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rideId&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;departureTime&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arrivalTime&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Departures&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LEFT&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;OUTER&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Arrivals&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rideId&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rideId&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;a&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;arrivalTime&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BETWEEN&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;departureTime&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AND&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;d&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;departureTime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;2&amp;#39;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;HOURS&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For cases where two streaming tables should be joined without restricting the match to a bounded time interval, Flink SQL now also supports non-windowed inner joins. This enables full-history matching, which is common in many standard SQL statements.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;address&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;productId&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;amount&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Users&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;JOIN&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Orders&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;ON&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;u&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;userId&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;o&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;userId&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;sql-cli-client&quot;&gt;SQL CLI Client&lt;/h3&gt;
&lt;p&gt;A few months ago, the community started an effort to add a service to execute streaming and batch SQL queries (FLIP-24). The new SQL CLI client is the first step of this effort and provides a SQL shell to run exploratory queries on data streams. The animation below shows a preview of this feature.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/sql_client_demo.gif&quot; style=&quot;width:90%;margin:15px&quot; /&gt;
&lt;/center&gt;
&lt;h3 id=&quot;various-other-features-and-improvements&quot;&gt;Various Other Features and Improvements&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://www.openstack.org/&quot;&gt;OpenStack&lt;/a&gt; provides software for creating public and private clouds on pools of resources. Flink now supports OpenStack’s S3-like file system, Swift, for checkpoint and savepoint storage. Swift can be used without Hadoop dependencies.&lt;/li&gt;
&lt;li&gt;Reading and writing JSON messages from and to connectors has been improved. It’s now possible to parse a standard JSON schema in order to configure serializers and deserializers. The SQL CLI Client is able to read JSON records from Kafka.&lt;/li&gt;
&lt;li&gt;Applications can be rescaled without manually triggering a savepoint. Under the hood, Flink will still take a savepoint, stop the application, and rescale it to the new parallelism.&lt;/li&gt;
&lt;li&gt;Improved metrics for watermarks and latency. Flink now reports the minimum watermark in all operators, including sources. Moreover, the latency metrics were reworked for better integration with common metrics systems.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;FileInputFormat&lt;/code&gt; (and many derived input formats) now supports reading files from multiple paths.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;BucketingSink&lt;/code&gt; supports the specification of custom extensions for multiple parts.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;CassandraOutputFormat&lt;/code&gt; can be used to emit &lt;code&gt;Row&lt;/code&gt; objects.&lt;/li&gt;
&lt;li&gt;The Kinesis consumer allows for more customization.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;release-notes&quot;&gt;Release Notes&lt;/h2&gt;
&lt;p&gt;Please review the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.5/release-notes/flink-1.5.html&quot;&gt;release notes&lt;/a&gt; if you plan to upgrade your Flink setup to Flink 1.5.&lt;/p&gt;
&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;
&lt;p&gt;According to git shortlog, the following 106 people contributed to the 1.5.0 release. Thanks to all contributors!&lt;/p&gt;
&lt;p&gt;Aegeaner, Alejandro Alcalde, Aljoscha Krettek, Andreas Fink, Andrey Zagrebin, Ankit Parashar, Arunan Sugunakumar, Bartłomiej Tartanus, Bowen Li, Cristian, Dan Kelley, David Anderson, Dawid Wysakowicz, Dian Fu, Dmitrii_Kniazev, Dyana Rose, EAlexRojas, Eron Wright, Fabian Hueske, Florian Schmidt, Gabor Gevay, Greg Hogan, Gyula Fora, Jark Wu, Jelmer Kuperus, Joerg Schad, John Eismeier, Kailash HD, Ken Geis, Ken Krugler, Kent Murra, Leonid Ishimnikov, Malcolm Taylor, Matrix42, Michael Fong, Michael Gendelman, Moser Thomas W, Nico Kruber, PJ Fanning, Patrick Lucas, Pavel Shvetsov, Phetsarath, Sourigna, Philip Luppens, Piotr Nowojski, Qiu Congxian/klion26, Razvan, Robert Metzger, Rong Rong, Shuyi Chen, Stefan Richter, Stephan Ewen, Stephen Parente, Steven Langbroek, Thomas Weise, Till Rohrmann, Timo Walther, Tony Wei, Tzu-Li (Gordon) Tai, Ufuk Celebi, Vetriselvan1187, Xingcan Cui, Xpray, Yazdan.JS, Zhijiang, Zohar Mizrahi, aria, biao.liub, binlijin, davidxdh, eastcirclek, eskabetxe, gyao, hequn8128, hzyuqi1, ifndef-SleePy, jparkie, juhoautio, kkloudas, maqingxiang-it, maxbelov, mayyamus, mingleiZhang, neoremind, nichuanlei, okumin, shankarganesh1234, shuai.xus, sihuazhou, summerleafs, sunjincheng121, triones.deng, twalthr, uybhatti, vinoyang, wenlong.lwl, yanghua, yew1eb, yuemeng, zentol, zhangminglei, zhouhai02, zjureel, 军长, 金竹, 王振涛, 陈梓立&lt;/p&gt;
</description>
<pubDate>Fri, 25 May 2018 12:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2018/05/25/release-1.5.0.html</link>
<guid isPermaLink="true">/news/2018/05/25/release-1.5.0.html</guid>
</item>
<item>
<title>Apache Flink 1.3.3 Released</title>
<description>&lt;p&gt;The Apache Flink community released the third bugfix version of the Apache Flink 1.3 series.&lt;/p&gt;
&lt;p&gt;This release includes 4 critical fixes related to checkpointing and recovery. A detailed list of all fixes can be found below.&lt;/p&gt;
&lt;p&gt;We highly recommend that all Flink 1.3 series users upgrade to Flink 1.3.3.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7783&quot;&gt;FLINK-7783&lt;/a&gt;] - Don&amp;#39;t always remove checkpoints in ZooKeeperCompletedCheckpointStore#recover()
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7283&quot;&gt;FLINK-7283&lt;/a&gt;] - PythonPlanBinderTest issues with python paths
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8487&quot;&gt;FLINK-8487&lt;/a&gt;] - State loss after multiple restart attempts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8807&quot;&gt;FLINK-8807&lt;/a&gt;] - ZookeeperCompleted checkpoint store can get stuck in infinite loop
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8890&quot;&gt;FLINK-8890&lt;/a&gt;] - Compare checkpoints with order in CompletedCheckpoint.checkpointsMatch()
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 15 Mar 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/03/15/release-1.3.3.html</link>
<guid isPermaLink="true">/news/2018/03/15/release-1.3.3.html</guid>
</item>
<item>
<title>Apache Flink 1.4.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released the second bugfix version of the Apache Flink 1.4 series.&lt;/p&gt;
&lt;p&gt;This release includes more than 10 fixes and minor improvements for Flink 1.4.1. A detailed list of all fixes can be found below.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.4.2.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.4.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.4.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.4.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6321&quot;&gt;FLINK-6321&lt;/a&gt;] - RocksDB state backend Checkpointing is not working with KeyedCEP.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7756&quot;&gt;FLINK-7756&lt;/a&gt;] - RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8423&quot;&gt;FLINK-8423&lt;/a&gt;] - OperatorChain#pushToOperator catch block may fail with NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8451&quot;&gt;FLINK-8451&lt;/a&gt;] - CaseClassSerializer is not backwards compatible in 1.4
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8520&quot;&gt;FLINK-8520&lt;/a&gt;] - CassandraConnectorITCase.testCassandraTableSink unstable on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8621&quot;&gt;FLINK-8621&lt;/a&gt;] - PrometheusReporterTest.endpointIsUnavailableAfterReporterIsClosed unstable on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8692&quot;&gt;FLINK-8692&lt;/a&gt;] - Mistake in MyMapFunction code snippet
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8735&quot;&gt;FLINK-8735&lt;/a&gt;] - Add savepoint migration ITCase that covers operator state
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8741&quot;&gt;FLINK-8741&lt;/a&gt;] - KafkaFetcher09/010/011 uses wrong user code classloader
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8772&quot;&gt;FLINK-8772&lt;/a&gt;] - FlinkKafkaConsumerBase partitions discover missing a log parameter
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8791&quot;&gt;FLINK-8791&lt;/a&gt;] - Fix documentation on how to link dependencies
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8798&quot;&gt;FLINK-8798&lt;/a&gt;] - Make commons-logging a parent-first pattern
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8849&quot;&gt;FLINK-8849&lt;/a&gt;] - Wrong link from concepts/runtime to doc on chaining
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8202&quot;&gt;FLINK-8202&lt;/a&gt;] - Update queryable section on configuration page
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8574&quot;&gt;FLINK-8574&lt;/a&gt;] - Add timestamps to travis logging messages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8576&quot;&gt;FLINK-8576&lt;/a&gt;] - Log message for QueryableState loading failure too verbose
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8652&quot;&gt;FLINK-8652&lt;/a&gt;] - Reduce log level of QueryableStateClient.getKvState() to DEBUG
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8308&quot;&gt;FLINK-8308&lt;/a&gt;] - Update yajl-ruby dependency to 1.3.1 or higher
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 08 Mar 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/03/08/release-1.4.2.html</link>
<guid isPermaLink="true">/news/2018/03/08/release-1.4.2.html</guid>
</item>
<item>
<title>An Overview of End-to-End Exactly-Once Processing in Apache Flink (with Apache Kafka, too!)</title>
<description>&lt;p&gt;&lt;em&gt;This post is an adaptation of &lt;a href=&quot;https://berlin.flink-forward.org/kb_sessions/hit-me-baby-just-one-time-building-end-to-end-exactly-once-applications-with-flink/&quot;&gt;Piotr Nowojski’s presentation from Flink Forward Berlin 2017&lt;/a&gt;. You can find the slides and a recording of the presentation on the Flink Forward Berlin website.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Apache Flink 1.4.0, released in December 2017, introduced a significant milestone for stream processing with Flink: a new feature called &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7210&quot;&gt;relevant Jira here&lt;/a&gt;) that extracts the common logic of the two-phase commit protocol and makes it possible to build end-to-end exactly-once applications with Flink and a selection of data sources and sinks, including Apache Kafka versions 0.11 and beyond. It provides a layer of abstraction and requires a user to implement only a handful of methods to achieve end-to-end exactly-once semantics.&lt;/p&gt;
&lt;p&gt;If that’s all you need to hear, let us point you &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/apache/flink/streaming/api/functions/sink/TwoPhaseCommitSinkFunction.html&quot;&gt;to the relevant place in the Flink documentation&lt;/a&gt;, where you can read about how to put &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; to use.&lt;/p&gt;
&lt;p&gt;But if you’d like to learn more, in this post, we’ll share an in-depth overview of the new feature and what is happening behind the scenes in Flink.&lt;/p&gt;
&lt;p&gt;Throughout the rest of this post, we’ll:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Describe the role of Flink’s checkpoints for guaranteeing exactly-once results within a Flink application.&lt;/li&gt;
&lt;li&gt;Show how Flink interacts with data sources and data sinks via the two-phase commit protocol to deliver &lt;em&gt;end-to-end&lt;/em&gt; exactly-once guarantees.&lt;/li&gt;
&lt;li&gt;Walk through a simple example on how to use &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; to implement an exactly-once file sink.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;exactly-once-semantics-within-an-apache-flink-application&quot;&gt;Exactly-once Semantics Within an Apache Flink Application&lt;/h2&gt;
&lt;p&gt;When we say “exactly-once semantics”, what we mean is that each incoming event affects the final results exactly once. Even in case of a machine or software failure, there’s no duplicate data and no data that goes unprocessed.&lt;/p&gt;
&lt;p&gt;Flink has long provided exactly-once semantics &lt;em&gt;within&lt;/em&gt; a Flink application. Over the past few years, we’ve &lt;a href=&quot;https://data-artisans.com/blog/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink&quot;&gt;written in depth about Flink’s checkpointing&lt;/a&gt;, which is at the core of Flink’s ability to provide exactly-once semantics. The Flink documentation also &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html&quot;&gt;provides a thorough overview of the feature&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Before we continue, here’s a quick summary of the checkpointing algorithm because understanding checkpoints is necessary for understanding this broader topic.&lt;/p&gt;
&lt;p&gt;A checkpoint in Flink is a consistent snapshot of:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;The current state of an application&lt;/li&gt;
&lt;li&gt;The position in an input stream&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Flink generates checkpoints on a regular, configurable interval and then writes the checkpoint to a persistent storage system, such as S3 or HDFS. Writing the checkpoint data to the persistent storage happens asynchronously, which means that a Flink application continues to process data during the checkpointing process.&lt;/p&gt;
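&lt;p&gt;As a small illustration (not part of the original post), enabling this mechanism takes only a couple of lines; the interval and checkpoint path below are placeholders.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch: turn on periodic checkpointing and point it at durable storage.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Trigger a checkpoint every 60 seconds.
env.enableCheckpointing(60000);

// Persist checkpoint data to a remote file system such as HDFS or S3.
env.setStateBackend(new FsStateBackend(&amp;quot;hdfs:///flink/checkpoints&amp;quot;));&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;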
&lt;p&gt;In the event of a machine or software failure and upon restart, a Flink application resumes processing from the most recent successfully-completed checkpoint; Flink restores application state and rolls back to the correct position in the input stream from a checkpoint before processing starts again. This means that Flink computes results as though the failure never occurred.&lt;/p&gt;
&lt;p&gt;Before Flink 1.4.0, exactly-once semantics were limited to the scope of &lt;em&gt;a Flink application only&lt;/em&gt; and did not extend to most of the external systems to which Flink sends data after processing.&lt;/p&gt;
&lt;p&gt;But Flink applications operate in conjunction with a wide range of data sinks, and developers should be able to maintain exactly-once semantics beyond the context of one component.&lt;/p&gt;
&lt;p&gt;To provide &lt;em&gt;end-to-end exactly-once&lt;/em&gt; semantics–that is, semantics that also apply to the external systems that Flink writes to in addition to the state of the Flink application–these external systems must provide a means to commit or roll back writes that coordinate with Flink’s checkpoints.&lt;/p&gt;
&lt;p&gt;One common approach for coordinating commits and rollbacks in a distributed system is the &lt;a href=&quot;https://en.wikipedia.org/wiki/Two-phase_commit_protocol&quot;&gt;two-phase commit protocol&lt;/a&gt;. In the next section, we’ll go behind the scenes and discuss how Flink’s &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; utilizes the two-phase commit protocol to provide end-to-end exactly-once semantics.&lt;/p&gt;
&lt;h2 id=&quot;end-to-end-exactly-once-applications-with-apache-flink&quot;&gt;End-to-end Exactly Once Applications with Apache Flink&lt;/h2&gt;
&lt;p&gt;We’ll walk through the two-phase commit protocol and how it enables end-to-end exactly-once semantics in a sample Flink application that reads from and writes to Kafka. Kafka is a popular messaging system to use along with Flink, and Kafka recently added support for transactions with its 0.11 release. &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#kafka-011&quot;&gt;This means that Flink now has the necessary mechanism to provide end-to-end exactly-once semantics&lt;/a&gt; in applications when receiving data from and writing data to Kafka.&lt;/p&gt;
&lt;p&gt;Flink’s support for end-to-end exactly-once semantics is not limited to Kafka and you can use it with any source / sink that provides the necessary coordination mechanism. For example, &lt;a href=&quot;http://pravega.io/&quot;&gt;Pravega&lt;/a&gt;, an open-source streaming storage system from Dell/EMC, also supports end-to-end exactly-once semantics with Flink via the &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt;.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/eo-post-graphic-1.png&quot; width=&quot;600px&quot; alt=&quot;A sample Flink application&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;In the sample Flink application that we’ll discuss today, we have:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;A data source that reads from Kafka (in Flink, a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#kafka-consumer&quot;&gt;KafkaConsumer&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;A windowed aggregation&lt;/li&gt;
&lt;li&gt;A data sink that writes data back to Kafka (in Flink, a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#kafka-producer&quot;&gt;KafkaProducer&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
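&lt;p&gt;To make the setup concrete, here is a hedged sketch of such a pipeline in Java, based on the Kafka 0.11 connector. The topic names, broker address, window size, and producer properties are illustrative assumptions; consult the Kafka connector documentation for the exact constructor variants available in your Flink version.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.Properties;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;
import org.apache.flink.util.Collector;

public class ExactlyOnceKafkaPipeline {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Exactly-once sinks only work with checkpointing enabled.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        Properties props = new Properties();
        props.setProperty(&quot;bootstrap.servers&quot;, &quot;localhost:9092&quot;); // illustrative address
        props.setProperty(&quot;group.id&quot;, &quot;exactly-once-demo&quot;);        // illustrative group id
        // In practice the Kafka transaction.timeout.ms setting usually needs to be raised; see the connector docs.

        // Source: read lines from an input topic.
        DataStream&lt;String&gt; lines = env.addSource(
                new FlinkKafkaConsumer011&lt;&gt;(&quot;input-topic&quot;, new SimpleStringSchema(), props));

        // Windowed aggregation: count words per 10-second window.
        DataStream&lt;String&gt; counts = lines
                .flatMap(new Tokenizer())
                .keyBy(0)
                .timeWindow(Time.seconds(10))
                .sum(1)
                .map(t -&gt; t.f0 + &quot;,&quot; + t.f1);

        // Sink: write results back to Kafka inside Kafka transactions.
        counts.addSink(new FlinkKafkaProducer011&lt;&gt;(
                &quot;output-topic&quot;,
                new KeyedSerializationSchemaWrapper&lt;&gt;(new SimpleStringSchema()),
                props,
                FlinkKafkaProducer011.Semantic.EXACTLY_ONCE));

        env.execute(&quot;exactly-once-kafka-pipeline&quot;);
    }

    // Splits each line into (word, 1) pairs.
    public static final class Tokenizer implements FlatMapFunction&lt;String, Tuple2&lt;String, Integer&gt;&gt; {
        @Override
        public void flatMap(String line, Collector&lt;Tuple2&lt;String, Integer&gt;&gt; out) {
            for (String word : line.toLowerCase().split(&quot;\\W+&quot;)) {
                if (!word.isEmpty()) {
                    out.collect(new Tuple2&lt;&gt;(word, 1));
                }
            }
        }
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;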
&lt;p&gt;For the data sink to provide exactly-once guarantees, it must write all data to Kafka within the scope of a transaction. A commit bundles all writes between two checkpoints.&lt;/p&gt;
&lt;p&gt;This ensures that writes are rolled back in case of a failure.&lt;/p&gt;
&lt;p&gt;However, in a distributed system with multiple, concurrently-running sink tasks, a simple commit or rollback is not sufficient, because all of the components must “agree” together on committing or rolling back to ensure a consistent result. Flink uses the two-phase commit protocol and its pre-commit phase to address this challenge.&lt;/p&gt;
&lt;p&gt;The starting of a checkpoint represents the “pre-commit” phase of our two-phase commit protocol. When a checkpoint starts, the Flink JobManager injects a checkpoint barrier (which separates the records in the data stream into the set that goes into the current checkpoint vs. the set that goes into the next checkpoint) into the data stream.&lt;/p&gt;
&lt;p&gt;The barrier is passed from operator to operator. For every operator, it triggers the operator’s state backend to take a snapshot of its state.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/eo-post-graphic-2.png&quot; width=&quot;600px&quot; alt=&quot;A sample Flink application - precommit&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;The data source stores its Kafka offsets, and after completing this, it passes the checkpoint barrier to the next operator.&lt;/p&gt;
&lt;p&gt;This approach works if an operator has internal state &lt;em&gt;only&lt;/em&gt;. &lt;em&gt;Internal state&lt;/em&gt; is everything that is stored and managed by Flink’s state backends - for example, the windowed sums in the second operator. When a process has only internal state, there is no need to perform any additional action during pre-commit aside from updating the data in the state backends before it is checkpointed. Flink takes care of correctly committing those writes in case of checkpoint success or aborting them in case of failure.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/eo-post-graphic-3.png&quot; width=&quot;600px&quot; alt=&quot;A sample Flink application - precommit without external state&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;However, when a process has &lt;em&gt;external&lt;/em&gt; state, this state must be handled a bit differently. External state usually comes in the form of writes to an external system such as Kafka. In that case, to provide exactly-once guarantees, the external system must provide support for transactions that integrates with a two-phase commit protocol.&lt;/p&gt;
&lt;p&gt;We know that the data sink in our example has such external state because it’s writing data to Kafka. In this case, in the pre-commit phase, the data sink must pre-commit its external transaction in addition to writing its state to the state backend.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/eo-post-graphic-4.png&quot; width=&quot;600px&quot; alt=&quot;A sample Flink application - precommit with external state&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;The pre-commit phase finishes when the checkpoint barrier passes through all of the operators and the triggered snapshot callbacks complete. At this point, the checkpoint has completed successfully and consists of the state of the entire application, including pre-committed external state. In case of a failure, we would re-initialize the application from this checkpoint.&lt;/p&gt;
&lt;p&gt;The next step is to notify all operators that the checkpoint has succeeded. This is the commit phase of the two-phase commit protocol and the JobManager issues checkpoint-completed callbacks for every operator in the application. The data source and window operator have no external state, and so in the commit phase, these operators don’t have to take any action. The data sink does have external state, though, and commits the transaction with the external writes.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/eo-post-graphic-5.png&quot; width=&quot;600px&quot; alt=&quot;A sample Flink application - commit external state&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;So let’s put all of these different pieces together:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Once all of the operators complete their pre-commit, they issue a commit.&lt;/li&gt;
&lt;li&gt;If at least one pre-commit fails, all others are aborted, and we roll back to the previous successfully-completed checkpoint.&lt;/li&gt;
&lt;li&gt;After a successful pre-commit, the commit &lt;em&gt;must&lt;/em&gt; be guaranteed to eventually succeed – both our operators and our external system need to make this guarantee. If a commit fails (for example, due to an intermittent network issue), the entire Flink application fails, restarts according to the user’s restart strategy, and there is another commit attempt. This process is critical because if the commit does not eventually succeed, data loss occurs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Therefore, we can be sure that all operators agree on the final outcome of the checkpoint: all operators agree that the data is either committed or that the commit is aborted and rolled back.&lt;/p&gt;
&lt;h2 id=&quot;implementing-the-two-phase-commit-operator-in-flink&quot;&gt;Implementing the Two-Phase Commit Operator in Flink&lt;/h2&gt;
&lt;p&gt;All the logic required to put a two-phase commit protocol together can be a little complicated, which is why Flink extracts the common logic of the two-phase commit protocol into the abstract &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; class.&lt;/p&gt;
&lt;p&gt;Let’s discuss how to extend a &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; on a simple file-based example. We need to implement only four methods; their implementations for an exactly-once file sink are described below (a minimal sketch follows the list):&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;beginTransaction&lt;/code&gt; - to begin the transaction, we create a temporary file in a temporary directory on our destination file system. Subsequently, we can write data to this file as we process it.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;preCommit&lt;/code&gt; - on pre-commit, we flush the file, close it, and never write to it again. We’ll also start a new transaction for any subsequent writes that belong to the next checkpoint.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;commit&lt;/code&gt; - on commit, we atomically move the pre-committed file to the actual destination directory. Please note that this increases the latency in the visibility of the output data.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;abort&lt;/code&gt; - on abort, we delete the temporary file.&lt;/li&gt;
&lt;/ol&gt;
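&lt;p&gt;To illustrate, here is a minimal, self-contained sketch of those four operations for a file sink, written in plain Java with &lt;code&gt;java.nio.file&lt;/code&gt;. It is a conceptual model only: the class and method signatures are hypothetical and deliberately simplified, and the real &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; hooks are documented in the Flink Javadoc linked elsewhere in this post.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

// Conceptual sketch of an exactly-once file sink&#39;s transaction handling.
// Each numbered method corresponds to one of the four hooks described above.
public class FileSinkTransactions {

    // Per-transaction state: the temporary file being written and its final destination.
    public static class FileTransaction {
        final Path tempFile;
        final Path targetFile;
        Writer writer;

        FileTransaction(Path tempFile, Path targetFile) {
            this.tempFile = tempFile;
            this.targetFile = targetFile;
        }
    }

    private final Path tempDir;
    private final Path targetDir;

    public FileSinkTransactions(Path tempDir, Path targetDir) {
        this.tempDir = tempDir;
        this.targetDir = targetDir;
    }

    // 1. beginTransaction: create a temporary file that all records of this checkpoint go into.
    public FileTransaction beginTransaction() throws IOException {
        String name = UUID.randomUUID().toString();
        FileTransaction txn = new FileTransaction(tempDir.resolve(name), targetDir.resolve(name));
        txn.writer = Files.newBufferedWriter(txn.tempFile, StandardCharsets.UTF_8);
        return txn;
    }

    // Writing a record simply appends to the transaction&#39;s temporary file.
    public void write(FileTransaction txn, String record) throws IOException {
        txn.writer.write(record);
        txn.writer.write('\n');
    }

    // 2. preCommit: flush and close the file; nothing may be written to it afterwards.
    public void preCommit(FileTransaction txn) throws IOException {
        txn.writer.flush();
        txn.writer.close();
    }

    // 3. commit: atomically move the pre-committed file to the target directory.
    // The move must be idempotent: if the file was already moved, treat it as success.
    public void commit(FileTransaction txn) throws IOException {
        if (Files.exists(txn.targetFile) &amp;&amp; !Files.exists(txn.tempFile)) {
            return; // already committed by a previous attempt
        }
        Files.move(txn.tempFile, txn.targetFile, StandardCopyOption.ATOMIC_MOVE);
    }

    // 4. abort: delete the temporary file so its contents never become visible.
    public void abort(FileTransaction txn) throws IOException {
        Files.deleteIfExists(txn.tempFile);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note how the check in &lt;code&gt;commit&lt;/code&gt; makes the operation idempotent; the next paragraphs explain why that matters.&lt;/p&gt;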
&lt;p&gt;As we know, if there’s any failure, Flink restores the state of the application to the latest successful checkpoint. One potential catch is in a rare case when the failure occurs after a successful pre-commit but before notification of that fact (a commit) reaches our operator. In that case, Flink restores our operator to the state that has already been pre-committed but not yet committed.&lt;/p&gt;
&lt;p&gt;We must save enough information about pre-committed transactions in checkpointed state to be able to either &lt;code&gt;abort&lt;/code&gt; or &lt;code&gt;commit&lt;/code&gt; transactions after a restart. In our example, this would be the path to the temporary file and target directory.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; takes this scenario into account, and it always issues a preemptive commit when restoring state from a checkpoint. It is our responsibility to implement a commit in an idempotent way. Generally, this shouldn’t be an issue. In our example, we can recognize such a situation: the temporary file is not in the temporary directory, but has already been moved to the target directory.&lt;/p&gt;
&lt;p&gt;There are a handful of other edge cases that &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; takes into account, too. &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/api/java/org/apache/flink/streaming/api/functions/sink/TwoPhaseCommitSinkFunction.html&quot;&gt;Learn more in the Flink documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping Up&lt;/h2&gt;
&lt;p&gt;If you’ve made it this far, thanks for staying with us through a detailed post. Here are some key points that we covered:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Flink’s checkpointing system serves as Flink’s basis for supporting a two-phase commit protocol and providing end-to-end exactly-once semantics.&lt;/li&gt;
&lt;li&gt;An advantage of this approach is that Flink does not materialize data in transit the way that some other systems do; there’s no need to write every stage of the computation to disk, as is the case in most batch processing.&lt;/li&gt;
&lt;li&gt;Flink’s new &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; extracts the common logic of the two-phase commit protocol and makes it possible to build end-to-end exactly-once applications with Flink and external systems that support transactions.&lt;/li&gt;
&lt;li&gt;Starting with &lt;a href=&quot;https://data-artisans.com/blog/announcing-the-apache-flink-1-4-0-release&quot;&gt;Flink 1.4.0&lt;/a&gt;, both the Pravega and Kafka 0.11 producers provide exactly-once semantics; Kafka introduced transactions for the first time in Kafka 0.11, which is what made the Kafka exactly-once producer possible in Flink.&lt;/li&gt;
&lt;li&gt;The &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/connectors/kafka.html#kafka-011&quot;&gt;Kafka 0.11 producer&lt;/a&gt; is implemented on top of the &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt;, and it offers very low overhead compared to the at-least-once Kafka producer.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We’re very excited about what this new feature enables, and we look forward to being able to support additional producers with the &lt;code&gt;TwoPhaseCommitSinkFunction&lt;/code&gt; in the future.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This post &lt;a href=&quot;https://data-artisans.com/blog/end-to-end-exactly-once-processing-apache-flink-apache-kafka&quot; target=&quot;_blank&quot;&gt; first appeared on the data Artisans blog &lt;/a&gt;and was contributed to Apache Flink and the Flink blog by the original authors Piotr Nowojski and Mike Winters.&lt;/em&gt;&lt;/p&gt;
&lt;link rel=&quot;canonical&quot; href=&quot;https://data-artisans.com/blog/end-to-end-exactly-once-processing-apache-flink-apache-kafka&quot; /&gt;
</description>
<pubDate>Thu, 01 Mar 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html</link>
<guid isPermaLink="true">/features/2018/03/01/end-to-end-exactly-once-apache-flink.html</guid>
</item>
<item>
<title>Apache Flink 1.4.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.4 series.&lt;/p&gt;
&lt;p&gt;This release includes more than 60 fixes and minor improvements for Flink 1.4.0. The list below details all fixes.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.4.1.&lt;/p&gt;
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.4.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.4.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.11&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.4.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6321&quot;&gt;FLINK-6321&lt;/a&gt;] - RocksDB state backend Checkpointing is not working with KeyedCEP.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7499&quot;&gt;FLINK-7499&lt;/a&gt;] - double buffer release in SpillableSubpartitionView
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7756&quot;&gt;FLINK-7756&lt;/a&gt;] - RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7760&quot;&gt;FLINK-7760&lt;/a&gt;] - Restore failing from external checkpointing metadata.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8323&quot;&gt;FLINK-8323&lt;/a&gt;] - Fix Mod scala function bug
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5506&quot;&gt;FLINK-5506&lt;/a&gt;] - Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6951&quot;&gt;FLINK-6951&lt;/a&gt;] - Incompatible versions of httpcomponents jars for Flink kinesis connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7949&quot;&gt;FLINK-7949&lt;/a&gt;] - AsyncWaitOperator is not restarting when queue is full
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8145&quot;&gt;FLINK-8145&lt;/a&gt;] - IOManagerAsync not properly shut down in various tests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8200&quot;&gt;FLINK-8200&lt;/a&gt;] - RocksDBAsyncSnapshotTest should use temp fold instead of fold with fixed name
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8226&quot;&gt;FLINK-8226&lt;/a&gt;] - Dangling reference generated after NFA clean up timed out SharedBufferEntry
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8230&quot;&gt;FLINK-8230&lt;/a&gt;] - NPE in OrcRowInputFormat on nested structs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8235&quot;&gt;FLINK-8235&lt;/a&gt;] - Cannot run spotbugs for single module
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8242&quot;&gt;FLINK-8242&lt;/a&gt;] - ClassCastException in OrcTableSource.toOrcPredicate
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8248&quot;&gt;FLINK-8248&lt;/a&gt;] - RocksDB state backend Checkpointing is not working with KeyedCEP in 1.4
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8249&quot;&gt;FLINK-8249&lt;/a&gt;] - Kinesis Producer didnt configure region
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8261&quot;&gt;FLINK-8261&lt;/a&gt;] - Typos in the shading exclusion for jsr305 in the quickstarts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8263&quot;&gt;FLINK-8263&lt;/a&gt;] - Wrong packaging of flink-core in scala quickstarty
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8265&quot;&gt;FLINK-8265&lt;/a&gt;] - Missing jackson dependency for flink-mesos
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8270&quot;&gt;FLINK-8270&lt;/a&gt;] - TaskManagers do not use correct local path for shipped Keytab files in Yarn deployment modes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8275&quot;&gt;FLINK-8275&lt;/a&gt;] - Flink YARN deployment with Kerberos enabled not working
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8278&quot;&gt;FLINK-8278&lt;/a&gt;] - Scala examples in Metric documentation do not compile
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8283&quot;&gt;FLINK-8283&lt;/a&gt;] - FlinkKafkaConsumerBase failing on Travis with no output in 10min
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8295&quot;&gt;FLINK-8295&lt;/a&gt;] - Netty shading does not work properly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8306&quot;&gt;FLINK-8306&lt;/a&gt;] - FlinkKafkaConsumerBaseTest has invalid mocks on final methods
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8318&quot;&gt;FLINK-8318&lt;/a&gt;] - Conflict jackson library with ElasticSearch connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8325&quot;&gt;FLINK-8325&lt;/a&gt;] - Add COUNT AGG support constant parameter, i.e. COUNT(*), COUNT(1)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8352&quot;&gt;FLINK-8352&lt;/a&gt;] - Flink UI Reports No Error on Job Submission Failures
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8355&quot;&gt;FLINK-8355&lt;/a&gt;] - DataSet Should not union a NULL row for AGG without GROUP BY clause.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8371&quot;&gt;FLINK-8371&lt;/a&gt;] - Buffers are not recycled in a non-spilled SpillableSubpartition upon release
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8398&quot;&gt;FLINK-8398&lt;/a&gt;] - Stabilize flaky KinesisDataFetcherTests
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8406&quot;&gt;FLINK-8406&lt;/a&gt;] - BucketingSink does not detect hadoop file systems
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8409&quot;&gt;FLINK-8409&lt;/a&gt;] - Race condition in KafkaConsumerThread leads to potential NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8419&quot;&gt;FLINK-8419&lt;/a&gt;] - Kafka consumer&amp;#39;s offset metrics are not registered for dynamically discovered partitions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8421&quot;&gt;FLINK-8421&lt;/a&gt;] - HeapInternalTimerService should reconfigure compatible key / namespace serializers on restore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8433&quot;&gt;FLINK-8433&lt;/a&gt;] - Update code example for &amp;quot;Managed Operator State&amp;quot; documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8461&quot;&gt;FLINK-8461&lt;/a&gt;] - Wrong logger configurations for shaded Netty
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8466&quot;&gt;FLINK-8466&lt;/a&gt;] - ErrorInfo needs to hold Exception as SerializedThrowable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8484&quot;&gt;FLINK-8484&lt;/a&gt;] - Kinesis consumer re-reads closed shards on job restart
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8485&quot;&gt;FLINK-8485&lt;/a&gt;] - Running Flink inside Intellij no longer works after upgrading from 1.3.2 to 1.4.0
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8489&quot;&gt;FLINK-8489&lt;/a&gt;] - Data is not emitted by second ElasticSearch connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8496&quot;&gt;FLINK-8496&lt;/a&gt;] - WebUI does not display TM MemorySegment metrics
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8499&quot;&gt;FLINK-8499&lt;/a&gt;] - Kryo must not be child-first loaded
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8522&quot;&gt;FLINK-8522&lt;/a&gt;] - DefaultOperatorStateBackend writes data in checkpoint that is never read.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8559&quot;&gt;FLINK-8559&lt;/a&gt;] - Exceptions in RocksDBIncrementalSnapshotOperation#takeSnapshot cause job to get stuck
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8561&quot;&gt;FLINK-8561&lt;/a&gt;] - SharedBuffer line 573 uses == to compare BufferEntries instead of .equals.
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8079&quot;&gt;FLINK-8079&lt;/a&gt;] - Skip remaining E2E tests if one failed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8202&quot;&gt;FLINK-8202&lt;/a&gt;] - Update queryable section on configuration page
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8243&quot;&gt;FLINK-8243&lt;/a&gt;] - OrcTableSource should recursively read all files in nested directories of the input path.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8260&quot;&gt;FLINK-8260&lt;/a&gt;] - Document API of Kafka 0.11 Producer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8264&quot;&gt;FLINK-8264&lt;/a&gt;] - Add Scala to the parent-first loading patterns
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8271&quot;&gt;FLINK-8271&lt;/a&gt;] - upgrade from deprecated classes to AmazonKinesis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8287&quot;&gt;FLINK-8287&lt;/a&gt;] - Flink Kafka Producer docs should clearly state what partitioner is used by default
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8296&quot;&gt;FLINK-8296&lt;/a&gt;] - Rework FlinkKafkaConsumerBestTest to not use Java reflection for dependency injection
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8346&quot;&gt;FLINK-8346&lt;/a&gt;] - add S3 signature v4 workaround to docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8362&quot;&gt;FLINK-8362&lt;/a&gt;] - Shade Elasticsearch dependencies away
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8455&quot;&gt;FLINK-8455&lt;/a&gt;] - Add Hadoop to the parent-first loading patterns
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8473&quot;&gt;FLINK-8473&lt;/a&gt;] - JarListHandler may fail with NPE if directory is deleted
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8571&quot;&gt;FLINK-8571&lt;/a&gt;] - Provide an enhanced KeyedStream implementation to use ForwardPartitioner
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Test
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-8472&quot;&gt;FLINK-8472&lt;/a&gt;] - Extend migration tests for Flink 1.4
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 15 Feb 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2018/02/15/release-1.4.1.html</link>
<guid isPermaLink="true">/news/2018/02/15/release-1.4.1.html</guid>
</item>
<item>
<title>Managing Large State in Apache Flink: An Intro to Incremental Checkpointing</title>
<description>&lt;p&gt;Apache Flink was purpose-built for &lt;em&gt;stateful&lt;/em&gt; stream processing. However, what is state in a stream processing application? I defined state and stateful stream processing in a &lt;a href=&quot;http://flink.apache.org/features/2017/07/04/flink-rescalable-state.html&quot;&gt;previous blog post&lt;/a&gt;, and in case you need a refresher, &lt;em&gt;state is defined as memory in an application’s operators that stores information about previously-seen events that you can use to influence the processing of future events&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;State is a fundamental, enabling concept in stream processing required for a majority of complex use cases. Some examples highlighted in the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html&quot;&gt;Flink documentation&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;When an application searches for certain event patterns, the state stores the sequence of events encountered so far.&lt;/li&gt;
&lt;li&gt;When aggregating events per minute, the state holds the pending aggregates.&lt;/li&gt;
&lt;li&gt;When training a machine learning model over a stream of data points, the state holds the current version of the model parameters.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, stateful stream processing is only useful in production environments if the state is fault tolerant. “Fault tolerance” means that even if there’s a software or machine failure, the computed end-result is accurate, with no data loss or double-counting of events.&lt;/p&gt;
&lt;p&gt;Flink’s fault tolerance has always been a powerful and popular feature, minimizing the impact of software or machine failure on your business and making it possible to guarantee exactly-once results from a Flink application.&lt;/p&gt;
&lt;p&gt;Core to this is &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/checkpointing.html&quot;&gt;checkpointing&lt;/a&gt;, which is the mechanism Flink uses to make application state fault tolerant. A checkpoint in Flink is a global, asynchronous snapshot of application state that’s taken on a regular interval and sent to durable storage (usually, a distributed file system). In the event of a failure, Flink restarts an application using the most recently completed checkpoint as a starting point. Some Apache Flink users run applications with gigabytes or even terabytes of application state. These users reported that with such large state, creating a checkpoint was often a slow and resource intensive operation, which is why in Flink 1.3 we introduced ‘incremental checkpointing.’&lt;/p&gt;
&lt;p&gt;Before incremental checkpointing, every single Flink checkpoint consisted of the full state of an application. We created the incremental checkpointing feature after we noticed that writing the full state for every checkpoint was often unnecessary, as the state changes from one checkpoint to the next were rarely that large. Incremental checkpointing instead maintains the differences (or ‘delta’) between each checkpoint and stores only the differences between the last checkpoint and the current state.&lt;/p&gt;
&lt;p&gt;Incremental checkpoints can provide a significant performance improvement for jobs with a very large state. Early testing of the feature by a production user with terabytes of state shows a drop in checkpoint time from more than 3 minutes down to 30 seconds after implementing incremental checkpoints. This is because the checkpoint doesn’t need to transfer the full state to durable storage on each checkpoint.&lt;/p&gt;
&lt;h3 id=&quot;how-to-start&quot;&gt;How to Start&lt;/h3&gt;
&lt;p&gt;Currently, you can only use incremental checkpointing with a RocksDB state back-end, and Flink uses RocksDB’s internal backup mechanism to consolidate checkpoint data over time. As a result, the incremental checkpoint history in Flink does not grow indefinitely, and Flink eventually consumes and prunes old checkpoints automatically.&lt;/p&gt;
&lt;p&gt;To enable incremental checkpointing in your application, I recommend you read &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/large_state_tuning.html#tuning-rocksdb&quot;&gt;the Apache Flink documentation on checkpointing&lt;/a&gt; for full details. In summary, you enable checkpointing as normal and additionally enable incremental checkpointing in the state backend’s constructor by setting the second parameter to &lt;code&gt;true&lt;/code&gt;.&lt;/p&gt;
&lt;h4 id=&quot;java-example&quot;&gt;Java Example&lt;/h4&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setStateBackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;RocksDBStateBackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filebackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h4 id=&quot;scala-example&quot;&gt;Scala Example&lt;/h4&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;setStateBackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;RocksDBStateBackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filebackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;kc&quot;&gt;true&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;By default, Flink retains 1 completed checkpoint, so if you need a higher number, &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/state/checkpointing.html#related-config-options&quot;&gt;you can configure it with the following flag&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;state&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;checkpoints&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;num&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;retained&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;how-it-works&quot;&gt;How it Works&lt;/h3&gt;
&lt;p&gt;Flink’s incremental checkpointing uses &lt;a href=&quot;https://github.com/facebook/rocksdb/wiki/Checkpoints&quot;&gt;RocksDB checkpoints&lt;/a&gt; as a foundation. RocksDB is a key-value store based on ‘&lt;a href=&quot;https://en.wikipedia.org/wiki/Log-structured_merge-tree&quot;&gt;log-structured-merge&lt;/a&gt;’ (LSM) trees that collects all changes in a mutable (changeable) in-memory buffer called a ‘memtable’. Any updates to the same key in the memtable replace previous values, and once the memtable is full, RocksDB writes it to disk with all entries sorted by their key and with light compression applied. Once RocksDB writes the memtable to disk it is immutable (unchangeable) and is now called a ‘sorted-string-table’ (sstable).&lt;/p&gt;
&lt;p&gt;A ‘compaction’ background task merges sstables to consolidate potential duplicates for each key, and over time RocksDB deletes the original sstables, with the merged sstable containing all information from across all the other sstables.&lt;/p&gt;
&lt;p&gt;On top of this, Flink tracks which sstable files RocksDB has created and deleted since the previous checkpoint, and as the sstables are immutable, Flink uses this to figure out the state changes. To do this, Flink triggers a flush in RocksDB, forcing all memtables into sstables on disk and hard-linking them in a local temporary directory. This part of the process is synchronous to the processing pipeline; Flink performs all further steps asynchronously and does not block processing.&lt;/p&gt;
&lt;p&gt;Then Flink copies all new sstables to stable storage (e.g., HDFS, S3) to reference in the new checkpoint. Flink doesn’t copy the sstables that already existed in the previous checkpoint to stable storage again; it re-references them. Any new checkpoints will no longer reference deleted files, because deleted sstables in RocksDB are always the result of compaction, and compaction eventually replaces old tables with a merged sstable. This is how Flink’s incremental checkpoints can prune the checkpoint history.&lt;/p&gt;
&lt;p&gt;For tracking changes between checkpoints, the uploading of consolidated tables is redundant work, but Flink performs the process incrementally and it typically adds only a small overhead. We consider this worthwhile because it allows Flink to keep a shorter history of checkpoints to consider in a recovery.&lt;/p&gt;
&lt;h4 id=&quot;an-example&quot;&gt;An Example&lt;/h4&gt;
&lt;p&gt;&lt;img src=&quot;/img/blog/incremental_cp_impl_example.svg&quot; alt=&quot;Example setup&quot; /&gt;
&lt;em&gt;Example setup&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Take an example with a subtask of one operator that has a keyed state, and the number of retained checkpoints set at &lt;strong&gt;2&lt;/strong&gt;. The columns in the figure above show the state of the local RocksDB instance for each checkpoint, the files it references, and the counts in the shared state registry after the checkpoint completes.&lt;/p&gt;
&lt;p&gt;For checkpoint ‘CP 1’, the local RocksDB directory contains two sstable files; Flink considers these new and uploads them to stable storage, using directory names that match the checkpoint name. When the checkpoint completes, Flink creates the two entries in the shared state registry and sets their counts to ‘1’. The key in the shared state registry is a composite of an operator, subtask, and the original sstable file name. The registry also keeps a mapping from the key to the file path in stable storage.&lt;/p&gt;
&lt;p&gt;For checkpoint ‘CP 2’, RocksDB has created two new sstable files, and the two older ones still exist. For checkpoint ‘CP 2’, Flink adds the two new files to stable storage and can reference the previous two files. When the checkpoint completes, Flink increases the counts for all referenced files by 1.&lt;/p&gt;
&lt;p&gt;For checkpoint ‘CP 3’, RocksDB’s compaction has merged &lt;code&gt;sstable-(1)&lt;/code&gt;, &lt;code&gt;sstable-(2)&lt;/code&gt;, and &lt;code&gt;sstable-(3)&lt;/code&gt; into &lt;code&gt;sstable-(1,2,3)&lt;/code&gt; and deleted the original files. This merged file contains the same information as the source files, with all duplicate entries eliminated. In addition to this merged file, &lt;code&gt;sstable-(4)&lt;/code&gt; still exists and there is now a new &lt;code&gt;sstable-(5)&lt;/code&gt; file. Flink adds the new &lt;code&gt;sstable-(1,2,3)&lt;/code&gt; and &lt;code&gt;sstable-(5)&lt;/code&gt; files to stable storage, re-references &lt;code&gt;sstable-(4)&lt;/code&gt; from checkpoint ‘CP 2’, and increases the counts for all referenced files by 1. The older ‘CP 1’ checkpoint is now deleted as the number of retained checkpoints (2) has been reached. As part of this deletion, Flink decreases the counts for all files referenced by ‘CP 1’ (&lt;code&gt;sstable-(1)&lt;/code&gt; and &lt;code&gt;sstable-(2)&lt;/code&gt;) by 1.&lt;/p&gt;
&lt;p&gt;For checkpoint ‘CP-4’, RocksDB has merged &lt;code&gt;sstable-(4)&lt;/code&gt;, &lt;code&gt;sstable-(5)&lt;/code&gt;, and a new &lt;code&gt;sstable-(6)&lt;/code&gt; into &lt;code&gt;sstable-(4,5,6)&lt;/code&gt;. Flink adds this new table to stable storage and references it together with &lt;code&gt;sstable-(1,2,3)&lt;/code&gt;, increases the counts for &lt;code&gt;sstable-(1,2,3)&lt;/code&gt; and &lt;code&gt;sstable-(4,5,6)&lt;/code&gt; by 1, and then deletes ‘CP-2’ as the number of retained checkpoints has been reached. As the counts for &lt;code&gt;sstable-(1)&lt;/code&gt;, &lt;code&gt;sstable-(2)&lt;/code&gt;, and &lt;code&gt;sstable-(3)&lt;/code&gt; have now dropped to 0, Flink deletes them from stable storage.&lt;/p&gt;
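&lt;p&gt;To make the reference counting in this walkthrough concrete, here is a small, purely illustrative Java sketch. It only models the counting behaviour described above; it is not Flink’s actual internal implementation, and the class and method names are hypothetical.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.HashMap;
import java.util.Map;

// Illustrative model of the shared state registry described above: each shared
// sstable file is registered once per retained checkpoint that references it and
// is deleted from stable storage when no retained checkpoint references it anymore.
public class SharedStateCounting {

    private static class Entry {
        final String stableStoragePath;
        int referenceCount;

        Entry(String stableStoragePath) {
            this.stableStoragePath = stableStoragePath;
        }
    }

    // Key: composite of operator, subtask, and original sstable file name.
    private final Map&lt;String, Entry&gt; entries = new HashMap&lt;&gt;();

    // Called when a completed checkpoint references a file (newly uploaded or re-referenced).
    public void register(String key, String stableStoragePath) {
        Entry entry = entries.computeIfAbsent(key, k -&gt; new Entry(stableStoragePath));
        entry.referenceCount++;
    }

    // Called when a retained checkpoint is dropped; files nobody references anymore are deleted.
    public void unregister(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return;
        }
        entry.referenceCount--;
        if (entry.referenceCount &lt;= 0) {
            entries.remove(key);
            deleteFromStableStorage(entry.stableStoragePath);
        }
    }

    private void deleteFromStableStorage(String path) {
        // Placeholder: a real system would remove the file from HDFS, S3, or similar.
        System.out.println(&quot;deleting &quot; + path);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;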
&lt;h3 id=&quot;race-conditions-and-concurrent-checkpoints&quot;&gt;Race Conditions and Concurrent Checkpoints&lt;/h3&gt;
&lt;p&gt;As Flink can execute multiple checkpoints in parallel, sometimes new checkpoints start before confirming previous checkpoints as completed. Because of this, you should consider which of the previous checkpoints to use as a basis for a new incremental checkpoint. Flink only references state from a checkpoint confirmed by the checkpoint coordinator so that it doesn’t unintentionally reference a deleted shared file.&lt;/p&gt;
&lt;h3 id=&quot;restoring-checkpoints-and-performance-considerations&quot;&gt;Restoring Checkpoints and Performance Considerations&lt;/h3&gt;
&lt;p&gt;If you enable incremental checkpointing, there are no further configuration steps needed to recover your state in case of failure. If a failure occurs, Flink’s &lt;code&gt;JobManager&lt;/code&gt; tells all tasks to restore from the last completed checkpoint, be it a full or incremental checkpoint. Each &lt;code&gt;TaskManager&lt;/code&gt; then downloads its share of the state from the checkpoint on the distributed file system.&lt;/p&gt;
&lt;p&gt;Though the feature can lead to a substantial improvement in checkpoint time for users with a large state, there are trade-offs to consider with incremental checkpointing. Overall, the process reduces the checkpointing time during normal operations but can lead to a longer recovery time depending on the size of your state. If the cluster failure is particularly severe and the Flink &lt;code&gt;TaskManager&lt;/code&gt;s have to read from multiple checkpoints, recovery can be a slower operation than when using non-incremental checkpointing. You can also no longer delete old checkpoints as newer checkpoints need them, and the history of differences between checkpoints can grow indefinitely over time. You need to plan for larger distributed storage to maintain the checkpoints and the network overhead to read from it.&lt;/p&gt;
&lt;p&gt;There are some strategies for improving the convenience/performance trade-off, and I recommend you read &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/checkpoints.html#basics-of-incremental-checkpoints&quot;&gt;the Flink documentation&lt;/a&gt; for more details.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This post &lt;a href=&quot;https://data-artisans.com/blog/managing-large-state-apache-flink-incremental-checkpointing-overview&quot; target=&quot;_blank&quot;&gt; originally appeared on the data Artisans blog &lt;/a&gt;and was contributed to the Flink blog by Stefan Richter and Chris Ward.&lt;/em&gt;&lt;/p&gt;
&lt;link rel=&quot;canonical&quot; href=&quot;https://data-artisans.com/blog/managing-large-state-apache-flink-incremental-checkpointing-overview&quot; /&gt;
</description>
<pubDate>Tue, 30 Jan 2018 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/features/2018/01/30/incremental-checkpointing.html</link>
<guid isPermaLink="true">/features/2018/01/30/incremental-checkpointing.html</guid>
</item>
<item>
<title>Apache Flink in 2017: Year in Review</title>
<description>&lt;p&gt;2017 was another exciting year for the Apache Flink® community, with 3 major version releases (&lt;a href=&quot;http://flink.apache.org/news/2017/02/06/release-1.2.0.html&quot;&gt;Flink 1.2.0 in February&lt;/a&gt;, &lt;a href=&quot;http://flink.apache.org/news/2017/06/01/release-1.3.0.html&quot;&gt;Flink 1.3.0 in June&lt;/a&gt;, and &lt;a href=&quot;http://flink.apache.org/news/2017/12/12/release-1.4.0.html&quot;&gt;Flink 1.4.0 in December&lt;/a&gt;) and the first-ever &lt;a href=&quot;https://sf-2017.flink-forward.org/&quot;&gt;Flink Forward in San Francisco&lt;/a&gt;, giving Flink community members in another corner of the globe an opportunity to connect. Users shared details about their innovative production deployments, redefining what is possible with a modern stream processing framework like Flink.&lt;/p&gt;
&lt;p&gt;In this post, we’ll look back on the project’s progress over the course of 2017, and we’ll also preview what 2018 has in store.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#community-growth&quot; id=&quot;markdown-toc-community-growth&quot;&gt;Community Growth&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#github&quot; id=&quot;markdown-toc-github&quot;&gt;Github&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#meetups&quot; id=&quot;markdown-toc-meetups&quot;&gt;Meetups&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#flink-forward-2017&quot; id=&quot;markdown-toc-flink-forward-2017&quot;&gt;Flink Forward 2017&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#features-and-ecosystem&quot; id=&quot;markdown-toc-features-and-ecosystem&quot;&gt;Features and Ecosystem&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#flink-ecosystem-growth&quot; id=&quot;markdown-toc-flink-ecosystem-growth&quot;&gt;Flink Ecosystem Growth&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#feature-timeline-in-2017&quot; id=&quot;markdown-toc-feature-timeline-in-2017&quot;&gt;Feature Timeline in 2017&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#looking-ahead-to-2018&quot; id=&quot;markdown-toc-looking-ahead-to-2018&quot;&gt;Looking ahead to 2018&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;community-growth&quot;&gt;Community Growth&lt;/h2&gt;
&lt;h3 id=&quot;github&quot;&gt;Github&lt;/h3&gt;
&lt;p&gt;First, here’s a summary of community statistics from &lt;a href=&quot;https://github.com/apache/flink&quot;&gt;GitHub&lt;/a&gt;. At the time of writing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Contributors&lt;/strong&gt; have increased from 258 in December 2016 to 352 in December 2017 (up &lt;strong&gt;36%&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Stars&lt;/strong&gt; have increased from 1830 in December 2016 to 3036 in December 2017 (up &lt;strong&gt;65%&lt;/strong&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Forks&lt;/strong&gt; have increased from 1255 in December 2016 to 2070 in December 2017 (up &lt;strong&gt;65%&lt;/strong&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The community also welcomed &lt;strong&gt;10 new committers in 2017&lt;/strong&gt;: Kostas Kloudas, Jark Wu, Stefan Richter, Kurt Young, Theodore Vasiloudis, Xiaogang Shi, Dawid Wysakowicz, Shaoxuan Wang, Jincheng Sun and Haohui Mai.&lt;/p&gt;
&lt;p&gt;We also welcomed &lt;strong&gt;3 new members to the &lt;a href=&quot;http://www.apache.org/foundation/governance/pmcs.html&quot;&gt;project management committee (PMC)&lt;/a&gt;&lt;/strong&gt;: Greg Hogan, Tzu-Li (Gordon) Tai and Chesnay Schepler.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/blog/github-stats-2017.png&quot; alt=&quot;Apache Flink GitHub Stats&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Next, let’s take a look at a few other project stats, starting with number of commits. If we run:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;git log --pretty&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;oneline --after&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;12/31/2016 &lt;span class=&quot;p&quot;&gt;|&lt;/span&gt; wc -l&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Inside the Flink repository, we’ll see a total of &lt;strong&gt;2316&lt;/strong&gt; commits so far in 2017, bringing the all-time total commits to &lt;strong&gt;12,532&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Now, let’s go a bit deeper; here are instructions for taking a look at this data yourself.&lt;/p&gt;
&lt;p&gt;Download and install gitstats from the &lt;a href=&quot;http://gitstats.sourceforge.net/&quot;&gt;project homepage&lt;/a&gt;, then clone the Apache Flink git repository:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;git clone git@github.com:apache/flink.git&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Generate the statistics:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;gitstats flink/ flink-stats/&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;View all the statistics as an HTML page using your default browser:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-bash&quot;&gt;open flink-stats/index.html&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Flink surpassed 1 million lines of code in 2016, and that trend continued in 2017 with the code base now clocking in at &lt;strong&gt;1,257,949&lt;/strong&gt; lines.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/blog/flink-lines-of-code-2017.png&quot; alt=&quot;Flink Total Lines of Code&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Monday remains the day of the week with the most commits over the project’s history, but Wednesday is catching up:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/blog/flink-dow-2017.png&quot; alt=&quot;Flink Commits by Day of Week&quot; /&gt;&lt;/p&gt;
&lt;p&gt;5 pm remains the preferred commit time, closely followed by 4 pm:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/blog/flink-hod-2017.png&quot; alt=&quot;Flink Commits by Hour of Day&quot; /&gt;&lt;/p&gt;
&lt;h3 id=&quot;meetups&quot;&gt;Meetups&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://www.meetup.com/topics/apache-flink/&quot;&gt;Apache Flink Meetup membership&lt;/a&gt; grew by &lt;strong&gt;20%&lt;/strong&gt; this year to a total of &lt;strong&gt;19,767&lt;/strong&gt; members at &lt;strong&gt;39&lt;/strong&gt; meetups listing Flink as a topic. With meetups on five continents, the Flink community is proud to be truly global.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/blog/flink-meetups-dec2017.png&quot; alt=&quot;Apache Flink Meetup Map&quot; /&gt;&lt;/p&gt;
&lt;h2 id=&quot;flink-forward-2017&quot;&gt;Flink Forward 2017&lt;/h2&gt;
&lt;p&gt;2017 was the first year we ran a Flink Forward conference in both &lt;a href=&quot;https://berlin-2017.flink-forward.org&quot;&gt;Berlin&lt;/a&gt; (September 11-13) and &lt;a href=&quot;https://sf-2017.flink-forward.org&quot;&gt;San Francisco&lt;/a&gt; (April 10-11), and over 350 members of our community attended each event for speaker sessions, training, and discussion about Flink.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.slideshare.net/FlinkForward/presentations&quot;&gt;Slides&lt;/a&gt; and &lt;a href=&quot;https://www.youtube.com/channel/UCY8_lgiZLZErZPF47a2hXMA&quot;&gt;videos&lt;/a&gt; are available for all speaker sessions, and if you’re interested in learning more about how organizations use Flink in production, we encourage you to browse and watch a couple.&lt;/p&gt;
&lt;p&gt;For 2018, Flink Forward will be back in &lt;a href=&quot;https://flink-forward.org/&quot;&gt;September in Berlin&lt;/a&gt;, and in &lt;a href=&quot;https://sf-2018.flink-forward.org/&quot;&gt;April in San Francisco&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/blog/speaker-logos-ff2017.png&quot; alt=&quot;Flink Forward Speakers&quot; /&gt;&lt;/p&gt;
&lt;h2 id=&quot;features-and-ecosystem&quot;&gt;Features and Ecosystem&lt;/h2&gt;
&lt;h3 id=&quot;flink-ecosystem-growth&quot;&gt;Flink Ecosystem Growth&lt;/h3&gt;
&lt;p&gt;Flink was added to a selection of distributions and integrations during 2017, making it easier for a wider user base to get started with Flink:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://hub.docker.com/r/_/flink/&quot;&gt;Official Docker image&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/mesos.html&quot;&gt;Official DC/OS and Mesos support&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://data-artisans.com/blog/dellemc-launches-open-source-pravega-complete-apache-flink-connector&quot;&gt;A Flink connector&lt;/a&gt; for &lt;a href=&quot;http://pravega.io&quot;&gt;Pravega&lt;/a&gt;, Dell/EMC’s streaming storage system.&lt;/li&gt;
&lt;li&gt;Uber announced AthenaX, a streaming SQL platform &lt;a href=&quot;https://data-artisans.com/blog/uber-introduces-open-source-athenax-streaming-sql-platform-apache-flink&quot;&gt;powered by Apache Flink&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;dataArtisans announced an early access program of a SaaS product based on Apache Flink, &lt;a href=&quot;https://data-artisans.com/blog/da-platform-2-stateful-stream-processing-with-apache-flink-made-easier&quot;&gt;dA Platform 2&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;feature-timeline-in-2017&quot;&gt;Feature Timeline in 2017&lt;/h3&gt;
&lt;p&gt;Just in time for the end of the year, our 1.4 release (&lt;a href=&quot;http://flink.apache.org/news/2017/12/12/release-1.4.0.html&quot;&gt;read the full release announcement&lt;/a&gt;) landed in mid-December, culminating 5 months of work and the resolution of more than 900 issues. This is the fifth major release in the 1.x.y series.&lt;/p&gt;
&lt;p&gt;Here’s a selection of major features added to Flink over the course of 2017:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/blog/flink-releases-2017.png&quot; alt=&quot;Flink Release Timeline 2017&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If you take a look at &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5016?jql=project%20%3D%20FLINK%20AND%20issuetype%20in%20(Bug%2C%20Improvement%2C%20%22New%20Feature%22)%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20resolved%20%3E%3D%202017-01-01%20AND%20resolved%20%3C%3D%202017-12-31%20ORDER%20BY%20resolved%20ASC&quot;&gt;the resolved issues and enhancements for 2017 on Jira&lt;/a&gt; you can see that the community resolved over 1,831 issues and feature additions.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;/news/2016/12/19/2016-year-in-review.html#looking-ahead-to-2017&quot;&gt;Regarding roadmap commitments from 2016&lt;/a&gt;, there is mixed news, with some items a part of current releases, others scheduled for upcoming releases and some that remain under discussion.&lt;/p&gt;
&lt;h2 id=&quot;looking-ahead-to-2018&quot;&gt;Looking ahead to 2018&lt;/h2&gt;
&lt;p&gt;A good source of information about the Flink community’s roadmap is the list of &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals&quot;&gt;Flink Improvement Proposals (FLIPs)&lt;/a&gt; in the project wiki. Below, we’ll highlight a selection of FLIPs accepted by the community as well as some that are still under discussion.&lt;/p&gt;
&lt;p&gt;Work is already underway on a number of these features, and some will be included in Flink 1.5 at the beginning of 2018.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Improved BLOB storage architecture&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-19:+Improved+BLOB+storage+architecture&quot;&gt;FLIP-19&lt;/a&gt; to consolidate API usage and improve concurrency.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Integration of SQL and CEP&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-20:+Integration+of+SQL+and+CEP&quot;&gt;FLIP-20&lt;/a&gt; to allow developers to create complex event processing (CEP) patterns using SQL statements.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Unified checkpoints and savepoints&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-10:+Unify+Checkpoints+and+Savepoints&quot;&gt;FLIP-10&lt;/a&gt;, to allow savepoints to be triggered automatically–important for program updates for the sake of error handling because savepoints allow the user to modify both the job and Flink version whereas checkpoints can only be recovered with the same job.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;An improved Flink deployment and process model&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot;&gt;FLIP-6&lt;/a&gt;, to allow for better integration with Flink and cluster managers and deployment technologies such as Mesos, Docker, and Kubernetes.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fine-grained recovery from task failures&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+:+Fine+Grained+Recovery+from+Task+Failures&quot;&gt;FLIP-1&lt;/a&gt;, to improve recovery efficiency by re-executing only the failed tasks, reducing the amount of state that Flink needs to transfer on recovery.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;An SQL Client&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client&quot;&gt;FLIP-24&lt;/a&gt; to add a service and a client to execute SQL queries against batch and streaming tables.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Serving of machine learning models&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-23+-+Model+Serving&quot;&gt;FLIP-23&lt;/a&gt; to add a library that allows users to apply offline-trained machine learning models to data streams.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you’re interested in getting involved with Flink, we encourage you to take a look at the FLIPs and to join the discussion via the &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink mailing lists&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Lastly, we’d like to extend a sincere thank you to all the Flink community for making 2017 a great year!&lt;/p&gt;
</description>
<pubDate>Thu, 21 Dec 2017 10:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2017/12/21/2017-year-in-review.html</link>
<guid isPermaLink="true">/news/2017/12/21/2017-year-in-review.html</guid>
</item>
<item>
<title>Apache Flink 1.4.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is pleased to announce the 1.4.0 release. Over the past 5 months, the
Flink community has been working hard to resolve more than 900 issues. See the &lt;a href=&quot;https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&amp;amp;version=12340533&quot;&gt;complete changelog&lt;/a&gt;
for more detail.&lt;/p&gt;
&lt;p&gt;This is the fifth major release in the 1.x.y series. It is API-compatible with the other 1.x.y
releases for APIs annotated with the @Public annotation.&lt;/p&gt;
&lt;p&gt;We encourage everyone to download the release and check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Feedback through the &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink mailing lists&lt;/a&gt; is, as always, gladly encouraged!&lt;/p&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads&lt;/a&gt; page on the Flink project site.&lt;/p&gt;
&lt;p&gt;The release includes improvements to many different aspects of Flink, including:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The ability to build end-to-end exactly-once applications with Flink and popular data sources and sinks such as Apache Kafka.&lt;/li&gt;
&lt;li&gt;A more developer-friendly dependency structure as well as Hadoop-free Flink for Flink users who do not have Hadoop dependencies.&lt;/li&gt;
&lt;li&gt;Support for JOIN and for new sources and sinks in table API and SQL, expanding the range of logic that can be expressed with these APIs.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A summary of some of the features in the release is available below.&lt;/p&gt;
&lt;p&gt;For more background on the Flink 1.4.0 release and the work planned for the Flink 1.5.0 release, please refer to &lt;a href=&quot;http://flink.apache.org/news/2017/11/22/release-1.4-and-1.5-timeline.html&quot;&gt;this blog post&lt;/a&gt; on the Apache Flink blog.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#new-features-and-improvements&quot; id=&quot;markdown-toc-new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#end-to-end-exactly-once-applications-with-apache-flink-and-apache-kafka-and-twophasecommitsinkfunction&quot; id=&quot;markdown-toc-end-to-end-exactly-once-applications-with-apache-flink-and-apache-kafka-and-twophasecommitsinkfunction&quot;&gt;End-to-end Exactly Once Applications with Apache Flink and Apache Kafka and TwoPhaseCommitSinkFunction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#table-api-and-streaming-sql-enhancements&quot; id=&quot;markdown-toc-table-api-and-streaming-sql-enhancements&quot;&gt;Table API and Streaming SQL Enhancements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#a-significantly-improved-dependency-structure-and-reversed-class-loading&quot; id=&quot;markdown-toc-a-significantly-improved-dependency-structure-and-reversed-class-loading&quot;&gt;A Significantly-Improved Dependency Structure and Reversed Class Loading&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#hadoop-free-flink&quot; id=&quot;markdown-toc-hadoop-free-flink&quot;&gt;Hadoop-free Flink&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#improvements-to-flink-internals&quot; id=&quot;markdown-toc-improvements-to-flink-internals&quot;&gt;Improvements to Flink Internals&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#improvements-to-the-queryable-state-client&quot; id=&quot;markdown-toc-improvements-to-the-queryable-state-client&quot;&gt;Improvements to the Queryable State Client&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#metrics-and-monitoring&quot; id=&quot;markdown-toc-metrics-and-monitoring&quot;&gt;Metrics and Monitoring&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#connector-improvements-and-fixes&quot; id=&quot;markdown-toc-connector-improvements-and-fixes&quot;&gt;Connector improvements and fixes&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#release-notes---please-read&quot; id=&quot;markdown-toc-release-notes---please-read&quot;&gt;Release Notes - Please Read&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#changes-to-dynamic-class-loading-of-user-code&quot; id=&quot;markdown-toc-changes-to-dynamic-class-loading-of-user-code&quot;&gt;Changes to dynamic class loading of user code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#no-more-avro-dependency-included-by-default&quot; id=&quot;markdown-toc-no-more-avro-dependency-included-by-default&quot;&gt;No more Avro dependency included by default&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#hadoop-free-flink-1&quot; id=&quot;markdown-toc-hadoop-free-flink-1&quot;&gt;Hadoop-free Flink&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#bundled-s3-filesystems&quot; id=&quot;markdown-toc-bundled-s3-filesystems&quot;&gt;Bundled S3 FileSystems&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;new-features-and-improvements&quot;&gt;New Features and Improvements&lt;/h2&gt;
&lt;h3 id=&quot;end-to-end-exactly-once-applications-with-apache-flink-and-apache-kafka-and-twophasecommitsinkfunction&quot;&gt;End-to-end Exactly Once Applications with Apache Flink and Apache Kafka and TwoPhaseCommitSinkFunction&lt;/h3&gt;
&lt;p&gt;Flink 1.4 includes a first version of an exactly-once producer for Apache Kafka 0.11. This producer
enables developers who build Flink applications with Kafka as a data source and sink to compute
exactly-once results not just within the Flink program, but truly “end-to-end” in the application.&lt;/p&gt;
&lt;p&gt;The common pattern used for exactly-once applications in Kafka and in other sinks–the two-phase
commit algorithm–has been extracted in Flink 1.4.0 into a common class, the
TwoPhaseCommitSinkFunction (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7210&quot;&gt;FLINK-7210&lt;/a&gt;). This
will make it easier for users to create their own exactly-once data sinks in the future.&lt;/p&gt;
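&lt;p&gt;As a rough illustration of how this fits together, here is a minimal sketch (topic names, hosts, and the checkpoint interval are placeholders) of a job that enables the exactly-once semantic of the Kafka 0.11 producer. Because transactions are committed when checkpoints complete, checkpointing must be enabled.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch only: class and package names follow the Flink 1.4 Kafka 0.11 connector
// and may differ slightly in other releases.
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class ExactlyOnceKafkaSinkExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // transactions are committed when a checkpoint completes
        env.enableCheckpointing(5000);

        DataStream&amp;lt;String&amp;gt; results = env.socketTextStream(&amp;quot;localhost&amp;quot;, 9999); // placeholder source

        Properties props = new Properties();
        props.setProperty(&amp;quot;bootstrap.servers&amp;quot;, &amp;quot;localhost:9092&amp;quot;);

        results.addSink(new FlinkKafkaProducer011&amp;lt;String&amp;gt;(
                &amp;quot;output-topic&amp;quot;, // placeholder topic
                new KeyedSerializationSchemaWrapper&amp;lt;String&amp;gt;(new SimpleStringSchema()),
                props,
                FlinkKafkaProducer011.Semantic.EXACTLY_ONCE));

        env.execute(&amp;quot;Exactly-once Kafka 0.11 sink&amp;quot;);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;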
&lt;h3 id=&quot;table-api-and-streaming-sql-enhancements&quot;&gt;Table API and Streaming SQL Enhancements&lt;/h3&gt;
&lt;p&gt;Flink SQL now supports windowed joins based on processing time and event time
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5725&quot;&gt;FLINK-5725&lt;/a&gt;). Users can execute a
join between two streaming tables and compute windowed results according to these two different notions
of time. The syntax and semantics of these joins follow standard SQL and are consistent with Flink’s
streaming SQL more broadly.&lt;/p&gt;
&lt;p&gt;Flink SQL also now supports “INSERT INTO SELECT” queries, which makes it possible to write results
from SQL directly into a data sink (an external system that receives data from a Flink application).
This improves operability and ease-of-use of Flink SQL.&lt;/p&gt;
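&lt;p&gt;To make the two SQL additions above concrete, here is an illustrative sketch. The table and column names (&lt;code&gt;Orders&lt;/code&gt;, &lt;code&gt;Shipments&lt;/code&gt;, &lt;code&gt;Clicks&lt;/code&gt;, &lt;code&gt;ResultSink&lt;/code&gt;) are hypothetical, and the query strings would be passed to the SQL methods of a &lt;code&gt;TableEnvironment&lt;/code&gt; against registered tables.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Illustrative query strings only; they assume tables registered in a TableEnvironment.
public class StreamingSqlExamples {

    // Windowed join: pair each order with shipments that happened within four hours
    // of the order, based on the time attributes of the two tables.
    static final String WINDOWED_JOIN =
        &amp;quot;SELECT o.orderId, s.shipTime &amp;quot; +
        &amp;quot;FROM Orders AS o, Shipments AS s &amp;quot; +
        &amp;quot;WHERE o.orderId = s.orderId &amp;quot; +
        &amp;quot;AND s.shipTime BETWEEN o.orderTime AND o.orderTime + INTERVAL &amp;#39;4&amp;#39; HOUR&amp;quot;;

    // INSERT INTO SELECT: write a continuous aggregation result directly into a
    // registered sink table.
    static final String INSERT_INTO =
        &amp;quot;INSERT INTO ResultSink &amp;quot; +
        &amp;quot;SELECT userName, COUNT(url) AS cnt FROM Clicks GROUP BY userName&amp;quot;;
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;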
&lt;p&gt;The Table API now supports aggregations on streaming tables; previously, the only supported
operations on streaming tables were projection, selection, and union
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4557&quot;&gt;FLINK-4557&lt;/a&gt;). This feature was initially discussed in Flink
Improvement Proposal 11: &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations&quot;&gt;FLIP-11&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The release also adds support for new table API and SQL sources and sinks, including a Kafka 0.11
source and JDBC sink.&lt;/p&gt;
&lt;p&gt;Lastly, Flink SQL now uses Apache Calcite 1.14, which was just released in October 2017
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7051&quot;&gt;FLINK-7051&lt;/a&gt;).&lt;/p&gt;
&lt;h3 id=&quot;a-significantly-improved-dependency-structure-and-reversed-class-loading&quot;&gt;A Significantly-Improved Dependency Structure and Reversed Class Loading&lt;/h3&gt;
&lt;p&gt;Flink 1.4.0 shades a number of dependencies, removing many subtle runtime conflicts. The shaded dependencies include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ASM&lt;/li&gt;
&lt;li&gt;Guava&lt;/li&gt;
&lt;li&gt;Jackson&lt;/li&gt;
&lt;li&gt;Netty&lt;/li&gt;
&lt;li&gt;Apache Zookeeper&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These changes improve Flink’s overall stability and remove friction when embedding Flink or calling
Flink “library style”.&lt;/p&gt;
&lt;p&gt;The release also introduces default reversed (child-first) class loading for dynamically-loaded user
code, allowing for different dependencies than those included in the core framework.&lt;/p&gt;
&lt;p&gt;For details on those changes please check out the relevant Jira issues:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7442&quot;&gt;FLINK-7442&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6529&quot;&gt;FLINK-6529&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;hadoop-free-flink&quot;&gt;Hadoop-free Flink&lt;/h3&gt;
&lt;p&gt;Apache Flink users without any Apache Hadoop dependencies can now run Flink without Hadoop. Flink
programs that do not rely on Hadoop components can now be much smaller, a benefit particularly in
container-based setups, resulting in less network traffic and better performance.&lt;/p&gt;
&lt;p&gt;This includes the addition of Flink’s own Amazon S3 filesystem implementations based on Hadoop’s S3a
and Presto’s S3 file system with properly shaded dependencies (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5706&quot;&gt;FLINK-5706&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The details of these changes regarding Hadoop-free Flink are available in the Jira issue:
&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2268&quot;&gt;FLINK-2268&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;improvements-to-flink-internals&quot;&gt;Improvements to Flink Internals&lt;/h3&gt;
&lt;p&gt;Flink 1.4.0 introduces a new blob storage architecture that was first discussed in
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-19%3A+Improved+BLOB+storage+architecture&quot;&gt;Flink Improvement Proposal 19&lt;/a&gt; (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6916&quot;&gt;FLINK-6916&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;This will enable easier integration with both the work being done in &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot;&gt;Flink Improvement Proposal 6&lt;/a&gt; in
the future and with other improvements in the 1.4.0 release, such as support for messages larger
than the maximum Akka Framesize (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6046&quot;&gt;FLINK-6046&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;The improvement also enables Flink to leverage distributed file systems in high availability
settings for optimized distribution of deployment data to TaskManagers.&lt;/p&gt;
&lt;h3 id=&quot;improvements-to-the-queryable-state-client&quot;&gt;Improvements to the Queryable State Client&lt;/h3&gt;
&lt;p&gt;Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/state/queryable_state.html&quot;&gt;queryable state&lt;/a&gt; makes it possible for users to access application state directly in Flink
before the state has been sent to an external database or key-value store.&lt;/p&gt;
&lt;p&gt;Flink 1.4.0 introduces a range of improvements to the queryable state client, including a more
container-friendly architecture, a more user-friendly API that hides configuration parameters, and
the groundwork to be able to expose window state (the state of an in-flight window) in the future.&lt;/p&gt;
&lt;p&gt;For details about the changes to queryable state please refer to the umbrella Jira issue:
&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5675&quot;&gt;FLINK-5675&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;metrics-and-monitoring&quot;&gt;Metrics and Monitoring&lt;/h3&gt;
&lt;p&gt;Flink’s metrics system now also includes support for Prometheus, an increasingly-popular metrics and
reporting system within the Flink community (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6221&quot;&gt;FLINK-6221&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;And the Apache Kafka connector in Flink now exposes metrics for failed and successful offset commits
in the Kafka consumer callback (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6998&quot;&gt;FLINK-6998&lt;/a&gt;).&lt;/p&gt;
&lt;h3 id=&quot;connector-improvements-and-fixes&quot;&gt;Connector improvements and fixes&lt;/h3&gt;
&lt;p&gt;Flink 1.4.0 introduces an Apache Kafka 0.11 connector and, as described above, support for an
exactly-once producer for Kafka 0.11 (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6988&quot;&gt;FLINK-6988&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Additionally, the Flink-Kafka consumer now supports dynamic partition discovery &amp;amp; topic discovery
based on regex. This means that the Flink-Kafka consumer can pick up new Kafka partitions without
needing to restart the job and while maintaining exactly-once guarantees
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4022&quot;&gt;FLINK-4022&lt;/a&gt;).&lt;/p&gt;
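&lt;p&gt;Below is a minimal sketch (bootstrap servers, group id, topic pattern, and discovery interval are placeholder values) of how partition and topic discovery can be switched on; the interval property and the &lt;code&gt;Pattern&lt;/code&gt;-based constructor follow the Flink 1.4 Kafka connector.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class DiscoveringKafkaSource {

    public static FlinkKafkaConsumer011&amp;lt;String&amp;gt; create() {
        Properties props = new Properties();
        props.setProperty(&amp;quot;bootstrap.servers&amp;quot;, &amp;quot;localhost:9092&amp;quot;);
        props.setProperty(&amp;quot;group.id&amp;quot;, &amp;quot;example-group&amp;quot;);
        // check for newly added partitions (and new matching topics) every 30 seconds
        props.setProperty(&amp;quot;flink.partition-discovery.interval-millis&amp;quot;, &amp;quot;30000&amp;quot;);

        // subscribe to all topics matching the pattern; topics created later that
        // match are picked up without restarting the job
        return new FlinkKafkaConsumer011&amp;lt;String&amp;gt;(
                Pattern.compile(&amp;quot;input-topic-[0-9]+&amp;quot;), new SimpleStringSchema(), props);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;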
&lt;p&gt;Flink’s Apache Kinesis connector now uses updated versions of the Kinesis Client Library and the
Kinesis Producer Library. This introduces improved retry logic to the connector and should
significantly reduce the number of failures caused by Flink writing too quickly to Kinesis
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7366&quot;&gt;FLINK-7366&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Flink’s Apache Cassandra connector now supports Scala tuples–previously, only streams of Java
tuples were supported (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4497&quot;&gt;FLINK-4497&lt;/a&gt;). Also, a bug was fixed in
the Cassandra connector that caused messages to be lost in certain instances
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4500&quot;&gt;FLINK-4500&lt;/a&gt;).&lt;/p&gt;
&lt;h2 id=&quot;release-notes---please-read&quot;&gt;Release Notes - Please Read&lt;/h2&gt;
&lt;p&gt;Some of these changes will require updating the configuration or Maven dependencies for existing
programs. Please read below to see if you might be affected.&lt;/p&gt;
&lt;h3 id=&quot;changes-to-dynamic-class-loading-of-user-code&quot;&gt;Changes to dynamic class loading of user code&lt;/h3&gt;
&lt;p&gt;As mentioned above, we changed the way Flink loads user code from the previous default of
&lt;em&gt;parent-first class loading&lt;/em&gt; (the default for Java) to &lt;em&gt;child-first classloading&lt;/em&gt;, which is a common
practice in Java Application Servers, where this is also referred to as inverted or reversed class
loading.&lt;/p&gt;
&lt;p&gt;This should not affect regular user code, but it enables programs to use different versions of
dependencies that come with Flink – for example Akka, Netty, or Jackson. If you want to change back
to the previous behavior, you can use the configuration setting &lt;code&gt;classloader.resolve-order: parent-first&lt;/code&gt;;
the new default is &lt;code&gt;child-first&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&quot;no-more-avro-dependency-included-by-default&quot;&gt;No more Avro dependency included by default&lt;/h3&gt;
&lt;p&gt;Flink previously included Avro by default so user programs could simply use Avro and not worry about
adding any dependencies. This behavior was changed in Flink 1.4 because it can lead to dependency
clashes.&lt;/p&gt;
&lt;p&gt;You now must manually include the Avro dependency (&lt;code&gt;flink-avro&lt;/code&gt;) with your program jar (or add it to
the Flink lib folder) if you want to use Avro.&lt;/p&gt;
&lt;h3 id=&quot;hadoop-free-flink-1&quot;&gt;Hadoop-free Flink&lt;/h3&gt;
&lt;p&gt;Starting with version 1.4, Flink can run without any Hadoop dependencies present in the classpath.
Along with simply running without Hadoop, this enables Flink to dynamically use whatever Hadoop
version is available in the classpath.&lt;/p&gt;
&lt;p&gt;You could, for example, download the Hadoop-free release of Flink but use that to run on any
supported version of YARN, and Flink would dynamically use the Hadoop dependencies from YARN.&lt;/p&gt;
&lt;p&gt;This also means that in cases where you used connectors to HDFS, such as the &lt;code&gt;BucketingSink&lt;/code&gt; or
&lt;code&gt;RollingSink&lt;/code&gt;, you now have to ensure that you either use a Flink distribution with bundled Hadoop
dependencies or make sure to include Hadoop dependencies when building a jar file for your
application.&lt;/p&gt;
&lt;h3 id=&quot;bundled-s3-filesystems&quot;&gt;Bundled S3 FileSystems&lt;/h3&gt;
&lt;p&gt;Flink 1.4 comes bundled with two different S3 FileSystems based on the Presto S3 FileSystem and
the Hadoop S3A FileSystem. They bring no additional dependencies (all of their dependencies are
shaded/relocated), and you can use them by dropping the respective jar file from the &lt;code&gt;opt&lt;/code&gt; directory
into the &lt;code&gt;lib&lt;/code&gt; directory of your Flink installation. For more information about this, please refer
to the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/filesystems.html#built-in-file-systems&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;
&lt;p&gt;According to git shortlog, the following 106 people contributed to the 1.4.0 release. Thank you to
all contributors!&lt;/p&gt;
&lt;p&gt;Ajay Tripathy, Alejandro Alcalde, Aljoscha Krettek, Bang, Phiradet, Bowen Li, Chris Ward, Cristian,
Dan Kelley, David Anderson, Dawid Wysakowicz, Dian Fu, Dmitrii Kniazev, DmytroShkvyra, Fabian
Hueske, FlorianFan, Fokko Driesprong, Gabor Gevay, Gary Yao, Greg Hogan, Haohui Mai, Hequn Cheng,
James Lafa, Jark Wu, Jie Shen, Jing Fan, JingsongLi, Joerg Schad, Juan Paulo Gutierrez, Ken Geis,
Kent Murra, Kurt Young, Lim Chee Hau, Maximilian Bode, Michael Fong, Mike Kobit, Mikhail Lipkovich,
Nico Kruber, Novotnik, Petr, Nycholas de Oliveira e Oliveira, Patrick Lucas, Piotr Nowojski, Robert
Metzger, Rodrigo Bonifacio, Rong Rong, Scott Kidder, Sebastian Klemke, Shuyi Chen, Stefan Richter,
Stephan Ewen, Svend Vanderveken, Till Rohrmann, Tony Wei, Tzu-Li (Gordon) Tai, Ufuk Celebi, Usman
Younas, Vetriselvan1187, Vishnu Viswanath, Wright, Eron, Xingcan Cui, Xpray, Yestin, Yonatan Most,
Zhenzhong Xu, Zhijiang, adebski, asdf2014, bbayani, biao.liub, cactuslrd.lird, dawidwys, desktop,
fengyelei, godfreyhe, gosubpl, gyao, hongyuhong, huafengw, kkloudas, kl0u, lincoln-lil,
lingjinjiang, mengji.fy, minwenjun, mtunique, p1tz, paul, rtudoran, shaoxuan-wang, sirko
bretschneider, sunjincheng121, tedyu, twalthr, uybhatti, wangmiao1981, yew1eb, z00376786, zentol,
zhangminglei, zhe li, zhouhai02, zjureel, 付典, 军长, 宝牛, 淘江, 金竹&lt;/p&gt;
</description>
<pubDate>Tue, 12 Dec 2017 11:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2017/12/12/release-1.4.0.html</link>
<guid isPermaLink="true">/news/2017/12/12/release-1.4.0.html</guid>
</item>
<item>
<title>Looking Ahead to Apache Flink 1.4.0 and 1.5.0</title>
<description>&lt;p&gt;The Apache Flink 1.4.0 release is on track to happen in the next couple of weeks, and for all of the
readers out there who haven’t been following the release discussion on &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink’s developer mailing
list&lt;/a&gt;, we’d like to provide some details on
what’s coming in Flink 1.4.0 as well as a preview of what the Flink community will save for 1.5.0.&lt;/p&gt;
&lt;p&gt;Both releases include ambitious features that we believe will move Flink to an entirely new level in
terms of the types of problems it can solve and applications it can support. The community deserves
lots of credit for its hard work over the past few months, and we’re excited to see these features
in the hands of users.&lt;/p&gt;
&lt;p&gt;This post will describe how the community plans to get there and the rationale behind the approach.&lt;/p&gt;
&lt;h2 id=&quot;coming-soon-major-changes-to-flinks-runtime&quot;&gt;Coming soon: Major Changes to Flink’s Runtime&lt;/h2&gt;
&lt;p&gt;There are 3 significant improvements to the Apache Flink engine that the community has nearly
completed and that will have a meaningful impact on Flink’s operability and performance.&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Rework of the deployment model and distributed process&lt;/li&gt;
&lt;li&gt;Transition from configurable, fixed-interval network I/O to event-driven network I/O and application-level flow control for better backpressure handling&lt;/li&gt;
&lt;li&gt;Faster recovery from failure&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Next, we’ll go through each of these improvements in more detail.&lt;/p&gt;
&lt;h2 id=&quot;reworking-flinks-deployment-model-and-distributed-processing&quot;&gt;Reworking Flink’s Deployment Model and Distributed Processing&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot;&gt;FLIP-6&lt;/a&gt; (FLIP is short for
FLink Improvement Proposal and FLIPs are proposals for bigger changes to Flink) is an initiative
that’s been in the works for more than a year and represents a major refactor of Flink’s deployment
model and distributed process. The underlying motivation for FLIP-6 was the fact that Flink is being
adopted by a wider range of developer communities–both developers coming from the big data and
analytics space as well as developers coming from the event-driven applications space.&lt;/p&gt;
&lt;p&gt;Modern, stateful stream processing has served as a point of convergence for these two developer communities.
Despite a significant overlap of the core concepts in the applications being built, each group of
developers has its own set of common tools, deployment models, and expected behaviors when working
with a stream processing framework like Flink.&lt;/p&gt;
&lt;p&gt;FLIP-6 will ensure that Flink fits naturally in both of these contexts, behaving as though it’s
native to each ecosystem and operating seamlessly within a broader technology stack. A few of the
specific changes in FLIP-6 that will have such an impact:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Leveraging cluster management frameworks to support full resource elasticity&lt;/li&gt;
&lt;li&gt;First-class support for containerized environments such as Kubernetes and Docker&lt;/li&gt;
&lt;li&gt;REST-based client-cluster communication to ease operations and 3rd party integrations&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;FLIP-6, along with already-introduced features like
&lt;a href=&quot;https://data-artisans.com/blog/apache-flink-at-mediamath-rescaling-stateful-applications&quot;&gt;rescalable state&lt;/a&gt;,
lays the groundwork for dynamic scaling in Flink, meaning that Flink programs will be able to scale up or down
automatically based on required resources–a huge step forward in terms of ease of operability and
the efficiency of Flink applications.&lt;/p&gt;
&lt;h2 id=&quot;lower-latency-via-improvements-to-the-apache-flink-network-stack&quot;&gt;Lower Latency via Improvements to the Apache Flink Network Stack&lt;/h2&gt;
&lt;p&gt;Speed will always be a key consideration for users who build stream processing applications, and
Flink 1.5 will include a rework of the network stack that will even further improve Flink’s latency.
At the heart of this work is a transition from configurable, fixed-interval network I/O to
event-driven network I/O and application-level flow control, ensuring that Flink will use all available
network capacity, as well as credit-based flow control which offers more fine-grained backpressuring
for improved checkpoint alignments.&lt;/p&gt;
&lt;p&gt;In our testing (&lt;a href=&quot;https://www.slideshare.net/FlinkForward/flink-forward-berlin-2017-nico-kruber-building-a-network-stack-for-optimal-throughput-lowlatency-tradeoffs#26&quot;&gt;see slide 26 here&lt;/a&gt;),
we’ve seen a substantial improvement in latency using event-driven network I/O, and the community
is also doing work to make sure we’re able to provide this increase in speed without a measurable
throughput tradeoff.&lt;/p&gt;
&lt;h2 id=&quot;faster-recovery-from-failures&quot;&gt;Faster Recovery from Failures&lt;/h2&gt;
&lt;p&gt;Flink 1.3.0 introduced incremental checkpoints, making it possible to checkpoint only the state
updates since the last successfully-completed checkpoint, rather than the previous behavior of
always checkpointing the entire state of the application. This has led to significant
performance improvements for users with large state.&lt;/p&gt;
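&lt;p&gt;For reference, incremental checkpoints are an opt-in feature of the RocksDB state backend; a minimal sketch (the checkpoint URI is a placeholder) of how a job enables them:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IncrementalCheckpointSetup {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60000); // checkpoint every minute

        // the second argument enables incremental checkpoints: only the state changes
        // since the last completed checkpoint are written to durable storage
        env.setStateBackend(new RocksDBStateBackend(&amp;quot;hdfs:///flink/checkpoints&amp;quot;, true));

        // ... define and execute the job as usual ...
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;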
&lt;p&gt;Flink 1.5 will introduce task-local recovery, which means that Flink will store a second copy of the
most recent checkpoint on the local disk (or even in main memory) of a task manager. The primary
copy still goes to durable storage so that it’s resilient to machine failures.&lt;/p&gt;
&lt;p&gt;In case of failover, the scheduler will try to reschedule tasks to their previous task manager (in
other words, to the same machine again) if this is possible. The task can then recover from the
locally-kept state. This makes it possible to avoid reading all state from the distributed file
system (which is remote over the network). Especially in applications with very large state, not
having to read many gigabytes over the network and instead from local disk will result in
significant performance gains in recovery.&lt;/p&gt;
&lt;h2 id=&quot;the-proposed-timeline-for-flink-14-and-flink-15&quot;&gt;The Proposed Timeline for Flink 1.4 and Flink 1.5&lt;/h2&gt;
&lt;p&gt;The good news is that all 3 of the features described above are well underway, and in fact, much of
the work is already covered by open pull requests.&lt;/p&gt;
&lt;p&gt;But given these features’ importance and the complexity of the work involved, the community expected
that the QA and testing required would be extensive and would delay the release of the
otherwise-ready features also on the list for the next release.&lt;/p&gt;
&lt;p&gt;And so the community decided to withhold the 3 features above (deployment model rework, improvements
to the network stack, and faster recovery) and include them in a separate Flink 1.5 release that will
come shortly after the Flink 1.4 release. Flink 1.5 is estimated to come just a couple of months
after 1.4 rather than the typical 4-month cycle in between major releases.&lt;/p&gt;
&lt;p&gt;The soon-to-be-released Flink 1.4 represents the current state of Flink without merging those 3
features. And Flink 1.4 is a substantial release in its own right, including, but not limited to,
the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;A significantly improved dependency structure&lt;/strong&gt;, removing many of Flink’s dependencies and subtle runtime conflicts. This increases overall stability and removes friction when embedding Flink or calling Flink “library style”.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reversed class loading for dynamically-loaded user code&lt;/strong&gt;, allowing for different dependencies than those included in the core framework.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;An Apache Kafka 0.11 exactly-once producer&lt;/strong&gt;, making it possible to build end-to-end exactly once applications with Flink and Kafka.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Streaming SQL JOIN based on processing time and event time&lt;/strong&gt;, which gives users the full advantage of Flink’s time handling while using a SQL JOIN.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Table API / Streaming SQL Source and Sink Additions&lt;/strong&gt;, including a Kafka 0.11 source and JDBC sink.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hadoop-free Flink&lt;/strong&gt;, meaning that users who don’t rely on any Hadoop components (such as YARN or HDFS) in their Flink applications can use Flink without Hadoop for the first time.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improvements to queryable state&lt;/strong&gt;, including a more container-friendly architecture, a more user-friendly API that hides configuration parameters, and the groundwork to be able to expose window state (the state of an in-flight window) in the future.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Connector improvements and fixes&lt;/strong&gt; for a range of connectors including Kafka, Apache Cassandra, Amazon Kinesis, and more.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Improved RPC performance&lt;/strong&gt; for faster recovery from failure.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The community decided it was best to get these features into a stable version of Flink as soon as
possible, and the separation of what could have been a single (and very substantial) Flink 1.4
release into 1.4 and 1.5 serves that purpose.&lt;/p&gt;
&lt;p&gt;We’re excited by what each of these represents for Apache Flink, and we’d like to extend our thanks
to the Flink community for all of their hard work.&lt;/p&gt;
&lt;p&gt;If you’d like to follow along with release discussions, &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;please subscribe to the dev@ mailing
list&lt;/a&gt;.&lt;/p&gt;
</description>
<pubDate>Wed, 22 Nov 2017 11:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2017/11/22/release-1.4-and-1.5-timeline.html</link>
<guid isPermaLink="true">/news/2017/11/22/release-1.4-and-1.5-timeline.html</guid>
</item>
<item>
<title>Apache Flink 1.3.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released the second bugfix version of the Apache Flink 1.3 series.&lt;/p&gt;
&lt;p&gt;This release includes more than 60 fixes and minor improvements for Flink 1.3.1. A detailed list of all fixes can be found below.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.3.2.&lt;/p&gt;
&lt;div class=&quot;alert alert-warning&quot;&gt;
Important Notice:
&lt;p&gt;A user reported a bug in the FlinkKafkaConsumer
(&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7143&quot;&gt;FLINK-7143&lt;/a&gt;) that is causing
incorrect partition assignment in large Kafka deployments in the presence of inconsistent broker
metadata. In that case multiple parallel instances of the FlinkKafkaConsumer may read from the
same topic partition, leading to data duplication. In Flink 1.3.2 this bug is fixed but incorrect
assignments from Flink 1.3.0 and 1.3.1 cannot be automatically fixed by upgrading to Flink 1.3.2
via a savepoint because the upgraded version would resume the wrong partition assignment from the
savepoint. If you believe you are affected by this bug (seeing messages from some partitions
duplicated) please refer to the JIRA issue for an upgrade path that works around that.&lt;/p&gt;
&lt;p&gt;Before attempting the more elaborate upgrade path, we suggest checking whether you are
actually affected by this bug. We did not manage to reproduce it in various testing clusters and
according to the reporting user, it only appeared in rare cases on their very large setup. This
leads us to believe that most likely only a minority of setups would be affected by this bug.&lt;/p&gt;
&lt;/div&gt;
&lt;p&gt;Notable changes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The default Kafka version for Flink Kafka Consumer 0.10 was bumped from 0.10.0.1 to 0.10.2.1.&lt;/li&gt;
&lt;li&gt;Some default values for configurations of AWS API call behaviors in the Flink Kinesis Consumer
were adapted for better default consumption performance: 1) &lt;code&gt;SHARD_GETRECORDS_MAX&lt;/code&gt; default changed
to 10,000, and 2) &lt;code&gt;SHARD_GETRECORDS_INTERVAL_MILLIS&lt;/code&gt; default changed to 200ms (see the configuration sketch after this list).&lt;/li&gt;
&lt;/ul&gt;
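&lt;p&gt;A minimal configuration sketch (the stream name and region are placeholders) showing how these values can also be set explicitly through &lt;code&gt;ConsumerConfigConstants&lt;/code&gt; when constructing the consumer:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import java.util.Properties;

import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KinesisSourceConfig {

    public static FlinkKinesisConsumer&amp;lt;String&amp;gt; create() {
        Properties props = new Properties();
        props.setProperty(AWSConfigConstants.AWS_REGION, &amp;quot;us-east-1&amp;quot;);

        // explicit values matching the new 1.3.2 defaults; tune as needed
        props.setProperty(ConsumerConfigConstants.SHARD_GETRECORDS_MAX, &amp;quot;10000&amp;quot;);
        props.setProperty(ConsumerConfigConstants.SHARD_GETRECORDS_INTERVAL_MILLIS, &amp;quot;200&amp;quot;);

        return new FlinkKinesisConsumer&amp;lt;String&amp;gt;(&amp;quot;example-stream&amp;quot;, new SimpleStringSchema(), props);
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;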
&lt;p&gt;Updated Maven dependencies:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;List of resolved issues:&lt;/p&gt;
&lt;h2&gt; Sub-task
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6665&quot;&gt;FLINK-6665&lt;/a&gt;] - Pass a ScheduledExecutorService to the RestartStrategy
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6667&quot;&gt;FLINK-6667&lt;/a&gt;] - Pass a callback type to the RestartStrategy, rather than the full ExecutionGraph
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6680&quot;&gt;FLINK-6680&lt;/a&gt;] - App &amp;amp; Flink migration guide: updates for the 1.3 release
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Bug
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5488&quot;&gt;FLINK-5488&lt;/a&gt;] - yarnClient should be closed in AbstractYarnClusterDescriptor for error conditions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6376&quot;&gt;FLINK-6376&lt;/a&gt;] - when deploy flink cluster on the yarn, it is lack of hdfs delegation token.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6541&quot;&gt;FLINK-6541&lt;/a&gt;] - Jar upload directory not created
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6654&quot;&gt;FLINK-6654&lt;/a&gt;] - missing maven dependency on &amp;quot;flink-shaded-hadoop2-uber&amp;quot; in flink-dist
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6655&quot;&gt;FLINK-6655&lt;/a&gt;] - Misleading error message when HistoryServer path is empty
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6742&quot;&gt;FLINK-6742&lt;/a&gt;] - Improve error message when savepoint migration fails due to task removal
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6774&quot;&gt;FLINK-6774&lt;/a&gt;] - build-helper-maven-plugin version not set
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6806&quot;&gt;FLINK-6806&lt;/a&gt;] - rocksdb is not listed as state backend in doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6843&quot;&gt;FLINK-6843&lt;/a&gt;] - ClientConnectionTest fails on travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6867&quot;&gt;FLINK-6867&lt;/a&gt;] - Elasticsearch 1.x ITCase still instable due to embedded node instability
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6918&quot;&gt;FLINK-6918&lt;/a&gt;] - Failing tests: ChainLengthDecreaseTest and ChainLengthIncreaseTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6945&quot;&gt;FLINK-6945&lt;/a&gt;] - TaskCancelAsyncProducerConsumerITCase.testCancelAsyncProducerAndConsumer instable test case
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6964&quot;&gt;FLINK-6964&lt;/a&gt;] - Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6965&quot;&gt;FLINK-6965&lt;/a&gt;] - Avro is missing snappy dependency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6987&quot;&gt;FLINK-6987&lt;/a&gt;] - TextInputFormatTest fails when run in path containing spaces
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6996&quot;&gt;FLINK-6996&lt;/a&gt;] - FlinkKafkaProducer010 doesn&amp;#39;t guarantee at-least-once semantic
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7005&quot;&gt;FLINK-7005&lt;/a&gt;] - Optimization steps are missing for nested registered tables
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7011&quot;&gt;FLINK-7011&lt;/a&gt;] - Instable Kafka testStartFromKafkaCommitOffsets failures on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7025&quot;&gt;FLINK-7025&lt;/a&gt;] - Using NullByteKeySelector for Unbounded ProcTime NonPartitioned Over
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7034&quot;&gt;FLINK-7034&lt;/a&gt;] - GraphiteReporter cannot recover from lost connection
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7038&quot;&gt;FLINK-7038&lt;/a&gt;] - Several misused &amp;quot;KeyedDataStream&amp;quot; term in docs and Javadocs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7041&quot;&gt;FLINK-7041&lt;/a&gt;] - Deserialize StateBackend from JobCheckpointingSettings with user classloader
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7132&quot;&gt;FLINK-7132&lt;/a&gt;] - Fix BulkIteration parallelism
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7133&quot;&gt;FLINK-7133&lt;/a&gt;] - Fix Elasticsearch version interference
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7137&quot;&gt;FLINK-7137&lt;/a&gt;] - Flink table API defaults top level fields as nullable and all nested fields within CompositeType as non-nullable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7143&quot;&gt;FLINK-7143&lt;/a&gt;] - Partition assignment for Kafka consumer is not stable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7154&quot;&gt;FLINK-7154&lt;/a&gt;] - Missing call to build CsvTableSource example
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7158&quot;&gt;FLINK-7158&lt;/a&gt;] - Wrong test jar dependency in flink-clients
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7177&quot;&gt;FLINK-7177&lt;/a&gt;] - DataSetAggregateWithNullValuesRule fails creating null literal for non-nullable type
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7178&quot;&gt;FLINK-7178&lt;/a&gt;] - Datadog Metric Reporter Jar is Lacking Dependencies
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7180&quot;&gt;FLINK-7180&lt;/a&gt;] - CoGroupStream perform checkpoint failed
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7195&quot;&gt;FLINK-7195&lt;/a&gt;] - FlinkKafkaConsumer should not respect fetched partitions to filter restored partition states
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7216&quot;&gt;FLINK-7216&lt;/a&gt;] - ExecutionGraph can perform concurrent global restarts to scheduling
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7225&quot;&gt;FLINK-7225&lt;/a&gt;] - Cutoff exception message in StateDescriptor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7226&quot;&gt;FLINK-7226&lt;/a&gt;] - REST responses contain invalid content-encoding header
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7231&quot;&gt;FLINK-7231&lt;/a&gt;] - SlotSharingGroups are not always released in time for new restarts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7234&quot;&gt;FLINK-7234&lt;/a&gt;] - Fix CombineHint documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7241&quot;&gt;FLINK-7241&lt;/a&gt;] - Fix YARN high availability documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7255&quot;&gt;FLINK-7255&lt;/a&gt;] - ListStateDescriptor example uses wrong constructor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7258&quot;&gt;FLINK-7258&lt;/a&gt;] - IllegalArgumentException in Netty bootstrap with large memory state segment size
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7266&quot;&gt;FLINK-7266&lt;/a&gt;] - Don&amp;#39;t attempt to delete parent directory on S3
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7268&quot;&gt;FLINK-7268&lt;/a&gt;] - Zookeeper Checkpoint Store interacting with Incremental State Handles can lead to loss of handles
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7281&quot;&gt;FLINK-7281&lt;/a&gt;] - Fix various issues in (Maven) release infrastructure
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt; Improvement
&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6365&quot;&gt;FLINK-6365&lt;/a&gt;] - Adapt default values of the Kinesis connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6575&quot;&gt;FLINK-6575&lt;/a&gt;] - Disable all tests on Windows that use HDFS
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6682&quot;&gt;FLINK-6682&lt;/a&gt;] - Improve error message in case parallelism exceeds maxParallelism
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6789&quot;&gt;FLINK-6789&lt;/a&gt;] - Remove duplicated test utility reducer in optimizer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6874&quot;&gt;FLINK-6874&lt;/a&gt;] - Static and transient fields ignored for POJOs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6898&quot;&gt;FLINK-6898&lt;/a&gt;] - Limit size of operator component in metric name
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6937&quot;&gt;FLINK-6937&lt;/a&gt;] - Fix link markdown in Production Readiness Checklist doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6940&quot;&gt;FLINK-6940&lt;/a&gt;] - Clarify the effect of configuring per-job state backend
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6998&quot;&gt;FLINK-6998&lt;/a&gt;] - Kafka connector needs to expose metrics for failed/successful offset commits in the Kafka Consumer callback
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7004&quot;&gt;FLINK-7004&lt;/a&gt;] - Switch to Travis Trusty image
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7032&quot;&gt;FLINK-7032&lt;/a&gt;] - Intellij is constantly changing language level of sub projects back to 1.6
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7069&quot;&gt;FLINK-7069&lt;/a&gt;] - Catch exceptions for each reporter separately
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7149&quot;&gt;FLINK-7149&lt;/a&gt;] - Add checkpoint ID to &amp;#39;sendValues()&amp;#39; in GenericWriteAheadSink
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7164&quot;&gt;FLINK-7164&lt;/a&gt;] - Extend integration tests for (externalised) checkpoints, checkpoint store
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7174&quot;&gt;FLINK-7174&lt;/a&gt;] - Bump dependency of Kafka 0.10.x to the latest one
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7211&quot;&gt;FLINK-7211&lt;/a&gt;] - Exclude Gelly javadoc jar from release
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7224&quot;&gt;FLINK-7224&lt;/a&gt;] - Incorrect Javadoc description in all Kafka consumer versions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7228&quot;&gt;FLINK-7228&lt;/a&gt;] - Harden HistoryServerStaticFileHandlerTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7233&quot;&gt;FLINK-7233&lt;/a&gt;] - TaskManagerHeapSizeCalculationJavaBashTest failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7287&quot;&gt;FLINK-7287&lt;/a&gt;] - test instability in Kafka010ITCase.testCommitOffsetsToKafka
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-7290&quot;&gt;FLINK-7290&lt;/a&gt;] - Make release scripts modular
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Sat, 05 Aug 2017 11:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2017/08/05/release-1.3.2.html</link>
<guid isPermaLink="true">/news/2017/08/05/release-1.3.2.html</guid>
</item>
<item>
<title>A Deep Dive into Rescalable State in Apache Flink</title>
<description>&lt;p&gt;&lt;em&gt;Apache Flink 1.2.0, released in February 2017, introduced support for rescalable state. This post provides a detailed overview of stateful stream processing and rescalable state in Flink.&lt;/em&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#an-intro-to-stateful-stream-processing&quot; id=&quot;markdown-toc-an-intro-to-stateful-stream-processing&quot;&gt;An Intro to Stateful Stream Processing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#state-in-apache-flink&quot; id=&quot;markdown-toc-state-in-apache-flink&quot;&gt;State in Apache Flink&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#rescaling-stateful-stream-processing-jobs&quot; id=&quot;markdown-toc-rescaling-stateful-stream-processing-jobs&quot;&gt;Rescaling Stateful Stream Processing Jobs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#reassigning-operator-state-when-rescaling&quot; id=&quot;markdown-toc-reassigning-operator-state-when-rescaling&quot;&gt;Reassigning Operator State When Rescaling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#reassigning-keyed-state-when-rescaling&quot; id=&quot;markdown-toc-reassigning-keyed-state-when-rescaling&quot;&gt;Reassigning Keyed State When Rescaling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#wrapping-up&quot; id=&quot;markdown-toc-wrapping-up&quot;&gt;Wrapping Up&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;an-intro-to-stateful-stream-processing&quot;&gt;An Intro to Stateful Stream Processing&lt;/h2&gt;
&lt;p&gt;At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input.&lt;/p&gt;
&lt;p&gt;In contrast, operators in &lt;em&gt;stateless&lt;/em&gt; stream processing only consider their current inputs, without further context and knowledge about the past. A simple example to illustrate this difference: let us consider a source stream that emits events with schema &lt;code&gt;e = {event_id:int, event_value:int}&lt;/code&gt;. Our goal is, for each event, to extract and output the &lt;code&gt;event_value&lt;/code&gt;. We can easily achieve this with a simple source-map-sink pipeline, where the map function extracts the &lt;code&gt;event_value&lt;/code&gt; from the event and emits it downstream to an outputting sink. This is an instance of stateless stream processing.&lt;/p&gt;
&lt;p&gt;But what if we want to modify our job to output the &lt;code&gt;event_value&lt;/code&gt; only if it is larger than the value from the previous event? In this case, our map function obviously needs some way to remember the &lt;code&gt;event_value&lt;/code&gt; from a past event — and so this is an instance of stateful stream processing.&lt;/p&gt;
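&lt;p&gt;As a rough sketch of what such a stateful operator can look like in Flink (class and field names are illustrative, and the state here is keyed state, so the function is meant to run on a stream keyed by &lt;code&gt;event_id&lt;/code&gt;):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Emits event_value only if it is larger than the value of the previous event
// seen for the same key (the first event of a key is emitted as well).
public class LargerThanPrevious extends RichFlatMapFunction&amp;lt;LargerThanPrevious.Event, Integer&amp;gt; {

    // remembers the last event_value seen for the current key
    private transient ValueState&amp;lt;Integer&amp;gt; lastValue;

    @Override
    public void open(Configuration parameters) {
        lastValue = getRuntimeContext().getState(
                new ValueStateDescriptor&amp;lt;Integer&amp;gt;(&amp;quot;last-value&amp;quot;, Integer.class));
    }

    @Override
    public void flatMap(Event e, Collector&amp;lt;Integer&amp;gt; out) throws Exception {
        Integer previous = lastValue.value();
        if (previous == null || e.eventValue &amp;gt; previous) {
            out.collect(e.eventValue);
        }
        lastValue.update(e.eventValue);
    }

    // hypothetical event POJO matching the schema from the text
    public static class Event {
        public int eventId;
        public int eventValue;
    }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;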
&lt;p&gt;This example should demonstrate that state is a fundamental, enabling concept in stream processing that is required for a majority of interesting use cases.&lt;/p&gt;
&lt;h2 id=&quot;state-in-apache-flink&quot;&gt;State in Apache Flink&lt;/h2&gt;
&lt;p&gt;Apache Flink is a massively parallel distributed system that allows stateful stream processing at large scale. For scalability, a Flink job is logically decomposed into a graph of operators, and the execution of each operator is physically decomposed into multiple parallel operator instances. Conceptually, each parallel operator instance in Flink is an independent task that can be scheduled on its own machine in a network-connected cluster of shared-nothing machines.&lt;/p&gt;
&lt;p&gt;For high throughput and low latency in this setting, network communications among tasks must be minimized. In Flink, network communication for stream processing only happens along the logical edges in the job’s operator graph (vertically), so that the stream data can be transferred from upstream to downstream operators.&lt;/p&gt;
&lt;p&gt;However, there is no communication between the parallel instances of an operator (horizontally). To avoid such network communication, data locality is a key principle in Flink and strongly affects how state is stored and accessed.&lt;/p&gt;
&lt;p&gt;For the sake of data locality, all state data in Flink is always bound to the task that runs the corresponding parallel operator instance and is co-located on the same machine that runs the task.&lt;/p&gt;
&lt;p&gt;Through this design, all state data for a task is local, and no network communication between tasks is required for state access. Avoiding this kind of traffic is crucial for the scalability of a massively parallel distributed system like Flink.&lt;/p&gt;
&lt;p&gt;For Flink’s stateful stream processing, we differentiate between two different types of state: operator state and keyed state. Operator state is scoped per parallel instance of an operator (sub-task), and keyed state can be thought of as &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html#keyed-state&quot;&gt;“operator state that has been partitioned, or sharded, with exactly one state-partition per key”&lt;/a&gt;. We could have easily implemented our previous example as operator state: all events that are routed through the operator instance can influence its value.&lt;/p&gt;
&lt;h2 id=&quot;rescaling-stateful-stream-processing-jobs&quot;&gt;Rescaling Stateful Stream Processing Jobs&lt;/h2&gt;
&lt;p&gt;Changing the parallelism (that is, changing the number of parallel subtasks that perform work for an operator) in stateless streaming is very easy. It requires only starting or stopping parallel instances of stateless operators and dis-/connecting them to/from their upstream and downstream operators as shown in &lt;strong&gt;Figure 1A&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;On the other hand, changing the parallelism of stateful operators is much more involved because we must also (i) redistribute the previous operator state in a (ii) consistent, (iii) meaningful way. Remember that in Flink’s shared-nothing architecture, all state is local to the task that runs the owning parallel operator instance, and there is no communication between parallel operator instances at job runtime.&lt;/p&gt;
&lt;p&gt;However, there is already one mechanism in Flink that allows the exchange of operator state between tasks, in a consistent way, with exactly-once guarantees — Flink’s checkpointing!&lt;/p&gt;
&lt;p&gt;You can find details about Flink’s checkpoints in &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/stream_checkpointing.html&quot;&gt;the documentation&lt;/a&gt;. In a nutshell, a checkpoint is triggered when a checkpoint coordinator injects a special event (a so-called checkpoint barrier) into a stream.&lt;/p&gt;
&lt;p&gt;Checkpoint barriers flow downstream with the event stream from sources to sinks, and whenever an operator instance receives a barrier, the operator instance immediately snapshots its current state to a distributed storage system, e.g. HDFS.&lt;/p&gt;
&lt;p&gt;On restore, the new tasks for the job (which potentially run on different machines now) can again pick up the state data from the distributed storage system.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;center&gt;&lt;i&gt;Figure 1&lt;/i&gt;&lt;/center&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/stateless-stateful-streaming.svg&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;We can piggyback rescaling of stateful jobs on checkpointing, as shown in &lt;strong&gt;Figure 1B&lt;/strong&gt;. First, a checkpoint is triggered and sent to a distributed storage system. Next, the job is restarted with a changed parallelism and can access a consistent snapshot of all previous state from the distributed storage. While this solves (i) redistribution of a (ii) consistent state across machines, there is still one problem: without a clear 1:1 relationship between previous state and new parallel operator instances, how can we assign the state in a (iii) meaningful way?&lt;/p&gt;
&lt;p&gt;We could again assign the state from previous &lt;code&gt;map_1&lt;/code&gt; and &lt;code&gt;map_2&lt;/code&gt; to the new &lt;code&gt;map_1&lt;/code&gt; and &lt;code&gt;map_2&lt;/code&gt;. But this would leave &lt;code&gt;map_3&lt;/code&gt; with empty state. Depending on the type of state and concrete semantics of the job, this naive approach could lead to anything from inefficiency to incorrect results.&lt;/p&gt;
&lt;p&gt;In the following section, we’ll explain how we solved the problem of efficient, meaningful state reassignment in Flink. Each of Flink’s two flavors of state, operator state and keyed state, requires a different approach to state reassignment.&lt;/p&gt;
&lt;h2 id=&quot;reassigning-operator-state-when-rescaling&quot;&gt;Reassigning Operator State When Rescaling&lt;/h2&gt;
&lt;p&gt;First, we’ll discuss how state reassignment in rescaling works for operator state. A common real-world use-case of operator state in Flink is to maintain the current offsets for Kafka partitions in Kafka sources. Each Kafka source instance would maintain &lt;code&gt;&amp;lt;PartitionID, Offset&amp;gt;&lt;/code&gt; pairs (one pair for each Kafka partition that the source is reading) as operator state. How would we redistribute this operator state in case of rescaling? Ideally, we would like to reassign all &lt;code&gt;&amp;lt;PartitionID, Offset&amp;gt;&lt;/code&gt; pairs from the checkpoint in a round-robin fashion across all parallel operator instances after the rescaling.&lt;/p&gt;
&lt;p&gt;As a user, we are aware of the “meaning” of Kafka partition offsets, and we know that we can treat them as independent, redistributable units of state. The problem of how we can share this domain-specific knowledge with Flink remains.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Figure 2A&lt;/strong&gt; illustrates the previous interface for checkpointing operator state in Flink. On snapshot, each operator instance returned an object that represented its complete state. In the case of a Kafka source, this object was a list of partition offsets.&lt;/p&gt;
&lt;p&gt;This snapshot object was then written to the distributed store. On restore, the object was read from distributed storage and passed to the operator instance as a parameter to the restore function.&lt;/p&gt;
&lt;p&gt;This approach was problematic for rescaling: how could Flink decompose the operator state into meaningful, redistributable partitions? Even though the Kafka source was actually always a list of partition offsets, the previously-returned state object was a black box to Flink and therefore could not be redistributed.&lt;/p&gt;
&lt;p&gt;As a generalized approach to solving this black box problem, we slightly modified the checkpointing interface; the new interface is called &lt;code&gt;ListCheckpointed&lt;/code&gt;. &lt;strong&gt;Figure 2B&lt;/strong&gt; shows the new checkpointing interface, which returns and receives a list of state partitions. Introducing a list instead of a single object makes the meaningful partitioning of state explicit: each item in the list still remains a black box to Flink, but is considered an atomic, independently re-distributable part of the operator state.&lt;/p&gt;
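&lt;p&gt;For reference, here is a rough sketch of the shape of the &lt;code&gt;ListCheckpointed&lt;/code&gt; interface (based on the Flink 1.3 API; consult the javadocs for the authoritative definition). Each element of the returned list is treated as an atomic, independently redistributable unit of operator state.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch of the ListCheckpointed interface (Flink 1.3).
public interface ListCheckpointed&amp;lt;T extends Serializable&amp;gt; {

    // return the state of this operator instance as a list of redistributable parts
    List&amp;lt;T&amp;gt; snapshotState(long checkpointId, long timestamp) throws Exception;

    // receive a (possibly re-partitioned) list of state parts on restore
    void restoreState(List&amp;lt;T&amp;gt; state) throws Exception;
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;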
&lt;p&gt;&lt;br /&gt;&lt;center&gt;&lt;i&gt;Figure 2&lt;/i&gt;&lt;/center&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/list-checkpointed.svg&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Our approach provides a simple API with which implementing operators can encode domain-specific knowledge about how to partition and merge units of state. With our new checkpointing interface, the Kafka source makes individual partition offsets explicit, and state reassignment becomes as easy as splitting and merging lists.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;FlinkKafkaConsumer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;RichParallelSourceFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;T&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;implements&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CheckpointedFunction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;transient&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ListState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;KafkaTopicPartition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offsetsOperatorState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;initializeState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FunctionInitializationContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;OperatorStateStore&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stateStore&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getOperatorStateStore&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// register the state with the backend&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;offsetsOperatorState&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stateStore&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getSerializableListState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;kafka-offsets&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// if the job was restarted, we set the restored offsets&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;isRestored&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;KafkaTopicPartition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kafkaOffset&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;offsetsOperatorState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// ... restore logic&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;nd&quot;&gt;@Override&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;snapshotState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;FunctionSnapshotContext&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;context&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;throws&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Exception&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;offsetsOperatorState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;clear&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// write the partition offsets to the list of operator states&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;Entry&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;KafkaTopicPartition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Long&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;partition&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subscribedPartitionOffsets&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;entrySet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;offsetsOperatorState&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;add&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Tuple2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getKey&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;partition&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getValue&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()));&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// ...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;reassigning-keyed-state-when-rescaling&quot;&gt;Reassigning Keyed State When Rescaling&lt;/h2&gt;
&lt;p&gt;The second flavour of state in Flink is keyed state. In contrast to operator state, keyed state is scoped by key, where the key is extracted from each stream event.&lt;/p&gt;
&lt;p&gt;To illustrate how keyed state differs from operator state, let’s use the following example. Assume we have a stream of events, where each event has the schema &lt;code&gt;{customer_id:int, value:int}&lt;/code&gt;. We have already learned that we can use operator state to compute and emit the running sum of values for all customers.&lt;/p&gt;
&lt;p&gt;Now assume we want to slightly modify our goal and compute a running sum of values for each individual &lt;code&gt;customer_id&lt;/code&gt;. This is a use case for keyed state, as one aggregated state must be maintained for each unique key in the stream.&lt;/p&gt;
&lt;p&gt;Note that keyed state is only available for keyed streams, which are created through the &lt;code&gt;keyBy()&lt;/code&gt; operation in Flink. The &lt;code&gt;keyBy()&lt;/code&gt; operation (i) specifies how to extract a key from each event and (ii) ensures that all events with the same key are always processed by the same parallel operator instance. As a result, all keyed state is transitively also bound to one parallel operator instance, because for each key, exactly one operator instance is responsible. This mapping from key to operator is deterministically computed through hash partitioning on the key.&lt;/p&gt;
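&lt;p&gt;To make this concrete, here is a minimal sketch of the per-customer running sum with keyed state (not part of the original post). It assumes an input &lt;code&gt;DataStream&amp;lt;Event&amp;gt;&lt;/code&gt; called &lt;code&gt;events&lt;/code&gt;, where &lt;code&gt;Event&lt;/code&gt; is a POJO with public &lt;code&gt;int&lt;/code&gt; fields &lt;code&gt;customerId&lt;/code&gt; and &lt;code&gt;value&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// 'events' is an assumed DataStream&amp;lt;Event&amp;gt;; Event is an assumed POJO with
// public int fields customerId and value.
events
    .keyBy(&amp;quot;customerId&amp;quot;) // hash-partition the stream by customer id
    .flatMap(new RichFlatMapFunction&amp;lt;Event, Tuple2&amp;lt;Integer, Integer&amp;gt;&amp;gt;() {

        // keyed state: one running sum per customer id
        private transient ValueState&amp;lt;Integer&amp;gt; runningSum;

        @Override
        public void open(Configuration parameters) {
            runningSum = getRuntimeContext().getState(
                new ValueStateDescriptor&amp;lt;&amp;gt;(&amp;quot;running-sum&amp;quot;, Integer.class));
        }

        @Override
        public void flatMap(Event event, Collector&amp;lt;Tuple2&amp;lt;Integer, Integer&amp;gt;&amp;gt; out) throws Exception {
            Integer current = runningSum.value();
            int updated = (current == null ? 0 : current) + event.value;
            runningSum.update(updated);
            out.collect(Tuple2.of(event.customerId, updated));
        }
    });
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;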
&lt;p&gt;We can see that keyed state has one clear advantage over operator state when it comes to rescaling: we can easily figure out how to correctly split and redistribute the state across parallel operator instances. State reassignment simply follows the partitioning of the keyed stream. After rescaling, the state for each key must be assigned to the operator instance that is now responsible for that key, as determined by the hash partitioning of the keyed stream.&lt;/p&gt;
&lt;p&gt;While this automatically solves the problem of logically remapping the state to sub-tasks after rescaling, there is one more practical problem left to solve: how can we efficiently transfer the state to the subtasks’ local backends?&lt;/p&gt;
&lt;p&gt;When we’re not rescaling, each subtask can simply read the whole state as written to the checkpoint by a previous instance in one sequential read.&lt;/p&gt;
&lt;p&gt;When rescaling, however, this is no longer possible – the state for each subtask is now potentially scattered across the files written by all subtasks (think about what happens if you change the parallelism in &lt;code&gt;hash(key) mod parallelism&lt;/code&gt;). We have illustrated this problem in &lt;strong&gt;Figure 3A&lt;/strong&gt;. In this example, we show how keys are shuffled when rescaling from parallelism 3 to 4 for a key space of 0 to 20, using the identity function as the hash function to keep it easy to follow.&lt;/p&gt;
&lt;p&gt;A naive approach might be to read all the previous subtask state from the checkpoint in all sub-tasks and filter out the matching keys for each sub-task. While this approach can benefit from a sequential read pattern, each subtask potentially reads a large fraction of irrelevant state data, and the distributed file system receives a huge number of parallel read requests.&lt;/p&gt;
&lt;p&gt;Another approach could be to build an index that tracks the location of the state for each key in the checkpoint. With this approach, all sub-tasks could locate and read the matching keys very selectively. This approach would avoid reading irrelevant data, but it has two major downsides. A materialized index for all keys, i.e. a key-to-read-offset mapping, can potentially grow very large. Furthermore, this approach can also introduce a huge amount of random I/O (when seeking to the data for individual keys, see &lt;strong&gt;Figure 3A&lt;/strong&gt;), which typically entails very bad performance in distributed file systems.&lt;/p&gt;
&lt;p&gt;Flink’s approach sits in between those two extremes by introducing key-groups as the atomic unit of state assignment. How does this work? The number of key-groups must be determined before the job is started and (currently) cannot be changed after the fact. As key-groups are the atomic unit of state assignment, this also means that the number of key-groups is the upper limit for parallelism. In a nutshell, key-groups give us a way to trade flexibility in rescaling (by setting an upper limit for parallelism) against the maximum overhead involved in indexing and restoring the state.&lt;/p&gt;
&lt;p&gt;We assign key-groups to subtasks as ranges. This makes the reads on restore not only sequential within each key-group, but often also across multiple key-groups. An additional benefit: this also keeps the metadata of key-group-to-subtask assignments very small. We do not maintain explicit lists of key-groups because it is sufficient to track the range boundaries.&lt;/p&gt;
&lt;p&gt;We have illustrated rescaling from parallelism 3 to 4 with 10 key-groups in &lt;strong&gt;Figure 3B&lt;/strong&gt;. As we can see, introducing key-groups and assigning them as ranges greatly improves the access pattern over the naive approach. Equations 2 and 3 in &lt;strong&gt;Figure 3B&lt;/strong&gt; also detail how we compute key-groups and the range assignment.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;center&gt;&lt;i&gt;Figure 3&lt;/i&gt;&lt;/center&gt;&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/key-groups.svg&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
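&lt;p&gt;To make the equations in &lt;strong&gt;Figure 3B&lt;/strong&gt; a bit more tangible, here is a small, simplified sketch of the two computations (not from the original post; the actual implementation in Flink additionally applies murmur hashing to the key hash before the modulo):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Simplified sketch of the key-group computations from Figure 3B.
public final class KeyGroupSketch {

    // key -&amp;gt; key-group; maxParallelism is the (fixed) number of key-groups
    static int assignToKeyGroup(Object key, int maxParallelism) {
        // Flink additionally applies murmur hashing to key.hashCode() here
        return Math.abs(key.hashCode() % maxParallelism);
    }

    // key-group -&amp;gt; operator (subtask) index, assigning key-groups as ranges
    static int operatorForKeyGroup(int keyGroup, int parallelism, int maxParallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 10; // number of key-groups, as in Figure 3B
        int parallelism = 4;     // parallelism after rescaling
        for (int keyGroup = 0; keyGroup &amp;lt; maxParallelism; keyGroup++) {
            System.out.println(&amp;quot;key-group &amp;quot; + keyGroup + &amp;quot; -&amp;gt; subtask &amp;quot;
                    + operatorForKeyGroup(keyGroup, parallelism, maxParallelism));
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;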
&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping Up&lt;/h2&gt;
&lt;p&gt;Thanks for staying with us, and we hope you now have a clear idea of how rescalable state works in Apache Flink and how to make use of rescaling in real-world scenarios.&lt;/p&gt;
&lt;p&gt;Flink 1.3.0, which was released earlier this month, adds more tooling for state management and fault tolerance in Flink, including incremental checkpoints. And the community is exploring features such as…&lt;/p&gt;
&lt;p&gt;• State replication&lt;br /&gt;
• State that isn’t bound to the lifecycle of a Flink job&lt;br /&gt;
• Automatic rescaling (with no savepoints required)&lt;/p&gt;
&lt;p&gt;…for Flink 1.4.0 and beyond.&lt;/p&gt;
&lt;p&gt;If you’d like to learn more, we recommend starting with the Apache Flink &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html&quot;&gt;documentation&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;This is an excerpt from a post that originally appeared on the data Artisans blog. If you’d like to read the original post in its entirety, you can find it &lt;a href=&quot;https://data-artisans.com/blog/apache-flink-at-mediamath-rescaling-stateful-applications&quot; target=&quot;_blank&quot;&gt;here&lt;/a&gt; (external link).&lt;/em&gt;&lt;/p&gt;
</description>
<pubDate>Tue, 04 Jul 2017 11:00:00 +0200</pubDate>
<link>https://flink.apache.org/features/2017/07/04/flink-rescalable-state.html</link>
<guid isPermaLink="true">/features/2017/07/04/flink-rescalable-state.html</guid>
</item>
<item>
<title>Apache Flink 1.3.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.3 series.&lt;/p&gt;
&lt;p&gt;This release includes 50 fixes and minor improvements for Flink 1.3.0. The list below details all fixes and improvements.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.3.1.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.3.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;h3&gt; Bug
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6492&quot;&gt;FLINK-6492&lt;/a&gt;] - Unclosed DataOutputViewStream in GenericArraySerializerConfigSnapshot#write()
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6602&quot;&gt;FLINK-6602&lt;/a&gt;] - Table source with defined time attributes allows empty string
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6652&quot;&gt;FLINK-6652&lt;/a&gt;] - Problem with DelimitedInputFormat
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6659&quot;&gt;FLINK-6659&lt;/a&gt;] - RocksDBMergeIteratorTest, SavepointITCase leave temporary directories behind
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6669&quot;&gt;FLINK-6669&lt;/a&gt;] - [Build] Scala style check errror on Windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6685&quot;&gt;FLINK-6685&lt;/a&gt;] - SafetyNetCloseableRegistry is closed prematurely in Task::triggerCheckpointBarrier
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6772&quot;&gt;FLINK-6772&lt;/a&gt;] - Incorrect ordering of matched state events in Flink CEP
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6775&quot;&gt;FLINK-6775&lt;/a&gt;] - StateDescriptor cannot be shared by multiple subtasks
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6780&quot;&gt;FLINK-6780&lt;/a&gt;] - ExternalTableSource should add time attributes in the row type
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6783&quot;&gt;FLINK-6783&lt;/a&gt;] - Wrongly extracted TypeInformations for WindowedStream::aggregate
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6797&quot;&gt;FLINK-6797&lt;/a&gt;] - building docs fails with bundler 1.15
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6801&quot;&gt;FLINK-6801&lt;/a&gt;] - PojoSerializerConfigSnapshot cannot deal with missing Pojo fields
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6804&quot;&gt;FLINK-6804&lt;/a&gt;] - Inconsistent state migration behaviour between different state backends
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6807&quot;&gt;FLINK-6807&lt;/a&gt;] - Elasticsearch 5 connector artifact not published to maven
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6808&quot;&gt;FLINK-6808&lt;/a&gt;] - Stream join fails when checkpointing is enabled
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6809&quot;&gt;FLINK-6809&lt;/a&gt;] - side outputs documentation: wrong variable name in java example code
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6812&quot;&gt;FLINK-6812&lt;/a&gt;] - Elasticsearch 5 release artifacts not published to Maven central
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6815&quot;&gt;FLINK-6815&lt;/a&gt;] - Javadocs don&amp;#39;t work anymore in Flink 1.4-SNAPSHOT
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6816&quot;&gt;FLINK-6816&lt;/a&gt;] - Fix wrong usage of Scala string interpolation in Table API
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6833&quot;&gt;FLINK-6833&lt;/a&gt;] - Race condition: Asynchronous checkpointing task can fail completed StreamTask
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6844&quot;&gt;FLINK-6844&lt;/a&gt;] - TraversableSerializer should implement compatibility methods
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6848&quot;&gt;FLINK-6848&lt;/a&gt;] - Extend the managed state docs with a Scala example
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6853&quot;&gt;FLINK-6853&lt;/a&gt;] - Migrating from Flink 1.1 fails for FlinkCEP
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6869&quot;&gt;FLINK-6869&lt;/a&gt;] - Scala serializers do not have the serialVersionUID specified
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6875&quot;&gt;FLINK-6875&lt;/a&gt;] - Remote DataSet API job submission timing out
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6881&quot;&gt;FLINK-6881&lt;/a&gt;] - Creating a table from a POJO and defining a time attribute fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6883&quot;&gt;FLINK-6883&lt;/a&gt;] - Serializer for collection of Scala case classes are generated with different anonymous class names in 1.3
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6886&quot;&gt;FLINK-6886&lt;/a&gt;] - Fix Timestamp field can not be selected in event time case when toDataStream[T], `T` not a `Row` Type.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6896&quot;&gt;FLINK-6896&lt;/a&gt;] - Creating a table from a POJO and use table sink to output fail
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6899&quot;&gt;FLINK-6899&lt;/a&gt;] - Wrong state array size in NestedMapsStateTable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6914&quot;&gt;FLINK-6914&lt;/a&gt;] - TrySerializer#ensureCompatibility causes StackOverflowException
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6915&quot;&gt;FLINK-6915&lt;/a&gt;] - EnumValueSerializer broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6921&quot;&gt;FLINK-6921&lt;/a&gt;] - EnumValueSerializer cannot properly handle appended enum values
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6922&quot;&gt;FLINK-6922&lt;/a&gt;] - Enum(Value)SerializerConfigSnapshot uses Java serialization to store enum values
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6930&quot;&gt;FLINK-6930&lt;/a&gt;] - Selecting window start / end on row-based Tumble/Slide window causes NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6932&quot;&gt;FLINK-6932&lt;/a&gt;] - Update the inaccessible Dataflow Model paper link
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6941&quot;&gt;FLINK-6941&lt;/a&gt;] - Selecting window start / end on over window causes field not resolve exception
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6948&quot;&gt;FLINK-6948&lt;/a&gt;] - EnumValueSerializer cannot handle removed enum values
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt; Improvement
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5354&quot;&gt;FLINK-5354&lt;/a&gt;] - Split up Table API documentation into multiple pages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6038&quot;&gt;FLINK-6038&lt;/a&gt;] - Add deep links to Apache Bahir Flink streaming connector documentations
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6796&quot;&gt;FLINK-6796&lt;/a&gt;] - Allow setting the user code class loader for AbstractStreamOperatorTestHarness
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6803&quot;&gt;FLINK-6803&lt;/a&gt;] - Add test for PojoSerializer when Pojo changes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6859&quot;&gt;FLINK-6859&lt;/a&gt;] - StateCleaningCountTrigger should not delete timer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6929&quot;&gt;FLINK-6929&lt;/a&gt;] - Add documentation for Table API OVER windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6952&quot;&gt;FLINK-6952&lt;/a&gt;] - Add link to Javadocs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6748&quot;&gt;FLINK-6748&lt;/a&gt;] - Table API / SQL Docs: Table API Page
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt; Test
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6830&quot;&gt;FLINK-6830&lt;/a&gt;] - Add ITTests for savepoint migration from 1.3
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6320&quot;&gt;FLINK-6320&lt;/a&gt;] - Flakey JobManagerHAJobGraphRecoveryITCase
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6744&quot;&gt;FLINK-6744&lt;/a&gt;] - Flaky ExecutionGraphSchedulingTest
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6913&quot;&gt;FLINK-6913&lt;/a&gt;] - Instable StatefulJobSavepointMigrationITCase.testRestoreSavepoint
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Fri, 23 Jun 2017 18:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2017/06/23/release-1.3.1.html</link>
<guid isPermaLink="true">/news/2017/06/23/release-1.3.1.html</guid>
</item>
<item>
<title>Apache Flink 1.3.0 Release Announcement</title>
<description>&lt;p&gt;The Apache Flink community is pleased to announce the 1.3.0 release. Over the past 4 months, the Flink community has been working hard to resolve more than 680 issues. See the &lt;a href=&quot;/blog/release_1.3.0-changelog.html&quot;&gt;complete changelog&lt;/a&gt; for more detail.&lt;/p&gt;
&lt;p&gt;This is the fourth major release in the 1.x.y series. It is API compatible with the other 1.x.y releases for APIs annotated with the @Public annotation.&lt;/p&gt;
&lt;p&gt;Users can now expect Flink releases on a 4-month cycle. At the beginning of the 1.3 &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Flink+Release+and+Feature+Plan&quot;&gt;release cycle&lt;/a&gt;, the community decided to follow a strict &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Time-based+releases&quot;&gt;time-based release model&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;We encourage everyone to download the release and check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/&quot;&gt;documentation&lt;/a&gt;. Feedback through the &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink mailing lists&lt;/a&gt; is, as always, gladly encouraged!&lt;/p&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;. Some highlights of the release are listed below.&lt;/p&gt;
&lt;h1 id=&quot;large-state-handlingrecovery&quot;&gt;Large State Handling/Recovery&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Incremental Checkpointing for RocksDB&lt;/strong&gt;: It is now possible to checkpoint only the difference from the previous successful checkpoint, rather than checkpointing the entire application state. This speeds up checkpointing and saves disk space, because the individual checkpoints are smaller. A short configuration sketch follows after this list. (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5053&quot;&gt;FLINK-5053&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Asynchronous snapshots for heap-based state backends&lt;/strong&gt;: The filesystem and memory statebackends now also support asynchronous snapshots using a copy-on-write HashMap implementation. Asynchronous snapshotting makes Flink more resilient to slow storage systems and expensive serialization. The time an operator blocks on a snapshot is reduced to a minimum (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6048&quot;&gt;FLINK-6048&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5715&quot;&gt;FLINK-5715&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Allow upgrades to state serializers:&lt;/strong&gt; Users can now upgrade serializers, while keeping their application state. One use case of this is upgrading custom serializers used for managed operator state/keyed state. Also, registration order for POJO types/Kryo types is now no longer fixed (&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/state.html#custom-serialization-for-managed-state&quot;&gt;Documentation&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6178&quot;&gt;FLINK-6178&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Recover job state at the granularity of operator&lt;/strong&gt;: Before Flink 1.3, operator state was bound to Flink’s internal “Task” representation. This made it hard to change a job’s topology while keeping its state around. With this change, users are allowed to do more topology changes (un-chain operators) by restoring state into logical operators instead of “Tasks” (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5892&quot;&gt;FLINK-5892&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Fine-grained recovery&lt;/strong&gt; (beta): Instead of restarting the complete ExecutionGraph in case of a task failure, Flink is now able to restart only the affected subgraph and thereby significantly decrease recovery time (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4256&quot;&gt;FLINK-4256&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
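&lt;p&gt;As a short configuration sketch for the incremental checkpointing feature mentioned above (not part of the original announcement; the checkpoint URI and the checkpoint interval are placeholders, and the second constructor argument of the RocksDB state backend requests incremental checkpoints):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch: enabling incremental checkpoints with the RocksDB state backend (Flink 1.3).
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60000); // trigger a checkpoint every 60 seconds

// the second constructor argument requests incremental checkpointing;
// &amp;quot;hdfs:///flink/checkpoints&amp;quot; is a placeholder checkpoint location
env.setStateBackend(new RocksDBStateBackend(&amp;quot;hdfs:///flink/checkpoints&amp;quot;, true));
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;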
&lt;h1 id=&quot;datastream-api&quot;&gt;DataStream API&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Side Outputs&lt;/strong&gt;: This change allows users to have more than one output stream for an operator. Operator metadata, internal system information (debugging, performance etc.) or rejected/late elements are potential use-cases for this new API feature. &lt;strong&gt;The Window operator is now using this new feature for late window elements&lt;/strong&gt; (&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/side_output.html&quot;&gt;Side Outputs Documentation&lt;/a&gt;, &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4460&quot;&gt;FLINK-4460&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Union Operator State&lt;/strong&gt;: Flink 1.2.0 introduced broadcast state functionality, but this had not yet been exposed via a public API. Flink 1.3.0 provides the Union Operator State API for exposing broadcast operator state. The union state will send the entire state across all parallel instances to each instance on restore, giving each operator a full view of the state (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5991&quot;&gt;FLINK-5991&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Per-Window State&lt;/strong&gt;: Previously, the state that a WindowFunction or ProcessWindowFunction could access was scoped to the key of the window but not the window itself. With this new feature, users can keep window state independent of the key (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5929&quot;&gt;FLINK-5929&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id=&quot;deployment-and-tooling&quot;&gt;Deployment and Tooling&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Flink HistoryServer&lt;/strong&gt;: Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/historyserver.html&quot;&gt;HistoryServer&lt;/a&gt; now allows you to query the status and statistics of completed jobs that have been archived by a JobManager (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-1579&quot;&gt;FLINK-1579&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Watermark Monitoring in Web Front-end&lt;/strong&gt;: For easier diagnosis of watermark issues, the Flink JobManager front-end now provides a new tab to track the watermark of each operator (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3427&quot;&gt;FLINK-3427&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Datadog HTTP Metrics Reporter&lt;/strong&gt;: Datadog is a widely-used metrics system, and Flink now offers a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/metrics.html#datadog-orgapacheflinkmetricsdatadogdatadoghttpreporter&quot;&gt;Datadog reporter&lt;/a&gt; that contacts the Datadog http endpoint directly (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6013&quot;&gt;FLINK-6013&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Network Buffer Configuration&lt;/strong&gt;: We finally got rid of the tedious network buffer configuration and replaced it with a more generic approach. First of all, you may now follow the idiom “more is better” without any penalty on the latency which could previously occur due to excessive buffering in incoming and outgoing channels. Secondly, instead of defining an absolute number of network buffers, we now use fractions of the available JVM memory (10% by default). This should cover more use cases by default and may also be tweaked by defining a minimum and maximum size.&lt;/p&gt;
&lt;p&gt;→ See &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/config.html#configuring-the-network-buffers&quot;&gt;Configuring the Network Buffers&lt;/a&gt; in the Flink documentation.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id=&quot;table-api--sql&quot;&gt;Table API / SQL&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Support for Retractions in Table API / SQL&lt;/strong&gt;: As part of our endeavor to support continuous queries on &lt;a href=&quot;http://flink.apache.org/news/2017/04/04/dynamic-tables.html&quot;&gt;Dynamic Tables&lt;/a&gt;, retraction is an important building block that will enable a whole range of new applications which require updating previously-emitted results. Examples of such use cases include computing early results for long-running windows, updating results due to late-arriving data, or maintaining constantly changing results similar to materialized views in relational database systems. Flink 1.3.0 supports retraction for non-windowed aggregates. Results with updates can be either converted into a DataStream or materialized to external data stores using TableSinks with upsert or retraction support.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Extended support for aggregations in Table API / SQL&lt;/strong&gt;: With Flink 1.3.0, the Table API and SQL support many more types of aggregations, including
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;GROUP BY window aggregations in SQL (via the window functions &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6011&quot;&gt;TUMBLE, HOP, and SESSION windows&lt;/a&gt;) for both batch and streaming.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;SQL OVER window aggregations (only for streaming)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Non-windowed aggregations (in streaming with retractions).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;User-defined aggregation functions for custom aggregation logic.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;External catalog support&lt;/strong&gt;: The Table API &amp;amp; SQL now allows registering external catalogs. Table API and SQL queries can then access table sources and their schemas from the external catalogs without registering those tables one by one.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;→ See &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/table_api.html#group-windows&quot;&gt;the Flink documentation&lt;/a&gt; for details about these features.&lt;/p&gt;
&lt;div class=&quot;alert alert-warning&quot;&gt;
The Table API / SQL documentation is currently being reworked. The community plans to publish the updated docs in the week of June 5th.
&lt;/div&gt;
&lt;h1 id=&quot;connectors&quot;&gt;Connectors&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;ElasticSearch 5.x support&lt;/strong&gt;: The ElasticSearch connectors have been restructured to have a common base module and specific modules for ES 1, 2 and 5, similar to how the Kafka connectors are organized. This will make fixes and future improvements available across all ES versions (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4988&quot;&gt;FLINK-4988&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Allow rescaling the Kinesis Consumer&lt;/strong&gt;: Flink 1.2.0 introduced rescalable state for DataStream programs. With Flink 1.3, the Kinesis Consumer also makes use of that engine feature (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4821&quot;&gt;FLINK-4821&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Transparent shard discovery for Kinesis Consumer&lt;/strong&gt;: The Kinesis consumer can now discover new shards without failing / restarting jobs when a resharding is happening (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4577&quot;&gt;FLINK-4577&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Allow setting custom start positions for the Kafka consumer&lt;/strong&gt;: With this change, you can instruct Flink’s Kafka consumer to start reading messages from a specific offset (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3123&quot;&gt;FLINK-3123&lt;/a&gt;) or earliest / latest offset (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4280&quot;&gt;FLINK-4280&lt;/a&gt;) without respecting committed offsets in Kafka. A short usage sketch follows after this list.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Allow opting out of offset committing for the Kafka consumer&lt;/strong&gt;: By default, Flink’s Kafka consumer commits the offsets to the Kafka broker once a checkpoint has been completed. This change allows users to disable this mechanism (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3398&quot;&gt;FLINK-3398&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
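&lt;p&gt;As a short usage sketch for the new Kafka consumer start positions (not part of the original announcement; the topic name, consumer properties, and the Kafka 0.10 consumer class are illustrative choices):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Sketch: choosing a start position for the Kafka consumer (Flink 1.3 API).
Properties props = new Properties();
props.setProperty(&amp;quot;bootstrap.servers&amp;quot;, &amp;quot;localhost:9092&amp;quot;);
props.setProperty(&amp;quot;group.id&amp;quot;, &amp;quot;my-group&amp;quot;);

FlinkKafkaConsumer010&amp;lt;String&amp;gt; consumer =
        new FlinkKafkaConsumer010&amp;lt;&amp;gt;(&amp;quot;my-topic&amp;quot;, new SimpleStringSchema(), props);

// ignore committed group offsets and start from the earliest record instead
consumer.setStartFromEarliest();
// alternatives: setStartFromLatest(), setStartFromGroupOffsets() (the default)

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream&amp;lt;String&amp;gt; stream = env.addSource(consumer);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;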
&lt;h1 id=&quot;cep-library&quot;&gt;CEP Library&lt;/h1&gt;
&lt;p&gt;The CEP library has been greatly enhanced and is now able to accommodate more use-cases out-of-the-box (expressivity enhancements), make more efficient use of the available resources, and adjust to changing runtime conditions, all without breaking backwards compatibility of operator state.&lt;/p&gt;
&lt;p&gt;Please note that the API of the CEP library has been updated with this release.&lt;/p&gt;
&lt;p&gt;Below are some of the main features of the revamped CEP library:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Make CEP operators rescalable&lt;/strong&gt;: Flink 1.2.0 introduced rescalable state for DataStream programs. With Flink 1.3, the CEP library also makes use of that engine feature (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5420&quot;&gt;FLINK-5420&lt;/a&gt;).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;New operators for the CEP library&lt;/strong&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Quantifiers (*,+,?) for the pattern API (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3318&quot;&gt;FLINK-3318&lt;/a&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Support for different continuity requirements (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6208&quot;&gt;FLINK-6208&lt;/a&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Support for iterative conditions (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6197&quot;&gt;FLINK-6197&lt;/a&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id=&quot;gelly-library&quot;&gt;Gelly Library&lt;/h1&gt;
&lt;ul&gt;
&lt;li&gt;Unified driver for running Gelly examples (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4949&quot;&gt;FLINK-4949&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;PageRank algorithm for directed graphs (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4896&quot;&gt;FLINK-4896&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Add Circulant and Echo graph generators (&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6393&quot;&gt;FLINK-6393&lt;/a&gt;).&lt;/li&gt;
&lt;/ul&gt;
&lt;h1 id=&quot;known-issues&quot;&gt;Known Issues&lt;/h1&gt;
&lt;div class=&quot;alert alert-warning&quot;&gt;
There are two &lt;strong&gt;known issues&lt;/strong&gt; in Flink 1.3.0. Both will be addressed in the &lt;i&gt;1.3.1&lt;/i&gt; release.
&lt;br /&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6783&quot;&gt;FLINK-6783&lt;/a&gt;: Wrongly extracted TypeInformations for &lt;code&gt;WindowedStream::aggregate&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6775&quot;&gt;FLINK-6775&lt;/a&gt;: StateDescriptor cannot be shared by multiple subtasks&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h1 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h1&gt;
&lt;p&gt;According to git shortlog, the following 103 people contributed to the 1.3.0 release. Thank you to all contributors!&lt;/p&gt;
&lt;p&gt;Addison Higham, Alexey Diomin, Aljoscha Krettek, Andrea Sella, Andrey Melentyev, Anton Mushin, barcahead, biao.liub, Bowen Li, Chen Qin, Chico Sokol, David Anderson, Dawid Wysakowicz, DmytroShkvyra, Fabian Hueske, Fabian Wollert, fengyelei, Flavio Pompermaier, FlorianFan, Fokko Driesprong, Geoffrey Mon, godfreyhe, gosubpl, Greg Hogan, guowei.mgw, hamstah, Haohui Mai, Hequn Cheng, hequn.chq, heytitle, hongyuhong, Jamie Grier, Jark Wu, jingzhang, Jinkui Shi, Jin Mingjian, Joerg Schad, Joshua Griffith, Jürgen Thomann, kaibozhou, Kathleen Sharp, Ken Geis, kkloudas, Kurt Young, lincoln-lil, lingjinjiang, liuyuzhong7, Lorenz Buehmann, manuzhang, Marc Tremblay, Mauro Cortellazzi, Max Kuklinski, mengji.fy, Mike Dias, mtunique, Nico Kruber, Omar Erminy, Patrick Lucas, paul, phoenixjiangnan, rami-alisawi, Ramkrishna, Rick Cox, Robert Metzger, Rodrigo Bonifacio, rtudoran, Seth Wiesman, Shaoxuan Wang, shijinkui, shuai.xus, Shuyi Chen, spkavuly, Stefano Bortoli, Stefan Richter, Stephan Ewen, Stephen Gran, sunjincheng121, tedyu, Till Rohrmann, tonycox, Tony Wei, twalthr, Tzu-Li (Gordon) Tai, Ufuk Celebi, Ventura Del Monte, Vijay Srinivasaraghavan, WangTaoTheTonic, wenlong.lwl, xccui, xiaogang.sxg, Xpray, zcb, zentol, zhangminglei, Zhenghua Gao, Zhijiang, Zhuoluo Yang, zjureel, Zohar Mizrahi, 士远, 槿瑜, 淘江, 金竹&lt;/p&gt;
</description>
<pubDate>Thu, 01 Jun 2017 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2017/06/01/release-1.3.0.html</link>
<guid isPermaLink="true">/news/2017/06/01/release-1.3.0.html</guid>
</item>
<item>
<title>Introducing Docker Images for Apache Flink</title>
<description>&lt;p&gt;For some time, the Apache Flink community has provided scripts to build a Docker image to run Flink. Now, starting with version 1.2.1, Flink will have a &lt;a href=&quot;https://hub.docker.com/r/_/flink/&quot;&gt;Docker image&lt;/a&gt; on the Docker Hub. This image is maintained by the Flink community and curated by the &lt;a href=&quot;https://github.com/docker-library/official-images&quot;&gt;Docker&lt;/a&gt; team to ensure it meets the quality standards for container images of the Docker community.&lt;/p&gt;
&lt;p&gt;A community-maintained way to run Apache Flink on Docker and other container runtimes and orchestrators is part of the ongoing effort by the Flink community to make Flink a first-class citizen of the container world.&lt;/p&gt;
&lt;p&gt;If you want to use the Docker image today you can get the latest version by running:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;docker pull flink
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;And to run a local Flink cluster with one TaskManager and the Web UI exposed on port 8081, run:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;docker run -t -p 8081:8081 flink local
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;With this image there are various ways to start a Flink cluster, both locally and in a distributed environment. Take a look at the &lt;a href=&quot;https://hub.docker.com/r/_/flink/&quot;&gt;documentation&lt;/a&gt; that shows how to run a Flink cluster with multiple TaskManagers locally using Docker Compose or across multiple machines using Docker Swarm. You can also use the examples as a reference to create configurations for other platforms like Mesos and Kubernetes.&lt;/p&gt;
&lt;p&gt;While this announcement is an important milestone, it’s just the first step to help users run containerized Flink in production. There are &lt;a href=&quot;https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20component%20%3D%20Docker%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC&quot;&gt;improvements&lt;/a&gt; to be made in Flink itself, and we will continue to improve these Docker images as well as the documentation and examples surrounding them.&lt;/p&gt;
&lt;p&gt;This is of course a team effort, so any contribution is welcome. The &lt;a href=&quot;https://github.com/docker-flink&quot;&gt;docker-flink&lt;/a&gt; GitHub organization hosts the source files to &lt;a href=&quot;https://github.com/docker-flink/docker-flink&quot;&gt;generate the images&lt;/a&gt; and the &lt;a href=&quot;https://github.com/docker-flink/docs/tree/master/flink&quot;&gt;documentation&lt;/a&gt; that is presented alongside the images on Docker Hub.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Disclaimer: The docker images are provided as a community project by individuals on a best-effort basis. They are not official releases by the Apache Flink PMC.&lt;/em&gt;&lt;/p&gt;
</description>
<pubDate>Tue, 16 May 2017 11:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2017/05/16/official-docker-image.html</link>
<guid isPermaLink="true">/news/2017/05/16/official-docker-image.html</guid>
</item>
<item>
<title>Apache Flink 1.2.1 Released</title>
<description>&lt;p&gt;The Apache Flink community released the first bugfix version of the Apache Flink 1.2 series.&lt;/p&gt;
&lt;p&gt;This release includes many critical fixes for Flink 1.2.0. The list below details all fixes.&lt;/p&gt;
&lt;p&gt;We highly recommend that all users upgrade to Flink 1.2.1.&lt;/p&gt;
&lt;p&gt;Please note that there are two unresolved major issues in Flink 1.2.1 and 1.2.0:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6353&quot;&gt;FLINK-6353&lt;/a&gt; Restoring using CheckpointedRestoring does not work from 1.2 to 1.2&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6188&quot;&gt;FLINK-6188&lt;/a&gt; Some setParallelism() methods can’t cope with default parallelism&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.2.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.2.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.2.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Release Notes - Flink - Version 1.2.1&lt;/h2&gt;
&lt;h3&gt; Sub-task
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5546&quot;&gt;FLINK-5546&lt;/a&gt;] - java.io.tmpdir setted as project build directory in surefire plugin
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5640&quot;&gt;FLINK-5640&lt;/a&gt;] - configure the explicit Unit Test file suffix
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5723&quot;&gt;FLINK-5723&lt;/a&gt;] - Use &amp;quot;Used&amp;quot; instead of &amp;quot;Initial&amp;quot; to make taskmanager tag more readable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5825&quot;&gt;FLINK-5825&lt;/a&gt;] - In yarn mode, a small pic can not be loaded
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt; Bug
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4813&quot;&gt;FLINK-4813&lt;/a&gt;] - Having flink-test-utils as a dependency outside Flink fails the build
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4848&quot;&gt;FLINK-4848&lt;/a&gt;] - keystoreFilePath should be checked against null in SSLUtils#createSSLServerContext
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5628&quot;&gt;FLINK-5628&lt;/a&gt;] - CheckpointStatsTracker implements Serializable but isn&amp;#39;t
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5644&quot;&gt;FLINK-5644&lt;/a&gt;] - Task#lastCheckpointSize metric broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5650&quot;&gt;FLINK-5650&lt;/a&gt;] - Flink-python tests executing cost too long time
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5652&quot;&gt;FLINK-5652&lt;/a&gt;] - Memory leak in AsyncDataStream
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5669&quot;&gt;FLINK-5669&lt;/a&gt;] - flink-streaming-contrib DataStreamUtils.collect in local environment mode fails when offline
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5678&quot;&gt;FLINK-5678&lt;/a&gt;] - User-defined TableFunctions do not support all types of parameters
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5699&quot;&gt;FLINK-5699&lt;/a&gt;] - Cancel with savepoint fails with a NPE if savepoint target directory not set
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5701&quot;&gt;FLINK-5701&lt;/a&gt;] - FlinkKafkaProducer should check asyncException on checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5708&quot;&gt;FLINK-5708&lt;/a&gt;] - we should remove duplicated configuration options
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5732&quot;&gt;FLINK-5732&lt;/a&gt;] - Java quick start mvn command line is incorrect
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5749&quot;&gt;FLINK-5749&lt;/a&gt;] - unset HADOOP_HOME and HADOOP_CONF_DIR to avoid env in build machine failing the UT and IT
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5751&quot;&gt;FLINK-5751&lt;/a&gt;] - 404 in documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5771&quot;&gt;FLINK-5771&lt;/a&gt;] - DelimitedInputFormat does not correctly handle multi-byte delimiters
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5773&quot;&gt;FLINK-5773&lt;/a&gt;] - Cannot cast scala.util.Failure to org.apache.flink.runtime.messages.Acknowledge
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5806&quot;&gt;FLINK-5806&lt;/a&gt;] - TaskExecutionState toString format have wrong key
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5814&quot;&gt;FLINK-5814&lt;/a&gt;] - flink-dist creates wrong symlink when not used with cleaned before
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5817&quot;&gt;FLINK-5817&lt;/a&gt;] - Fix test concurrent execution failure by test dir conflicts.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5828&quot;&gt;FLINK-5828&lt;/a&gt;] - BlobServer create cache dir has concurrency safety problem
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5885&quot;&gt;FLINK-5885&lt;/a&gt;] - Java code snippet instead of scala in documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5907&quot;&gt;FLINK-5907&lt;/a&gt;] - RowCsvInputFormat bug on parsing tsv
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5934&quot;&gt;FLINK-5934&lt;/a&gt;] - Scheduler in ExecutionGraph null if failure happens in ExecutionGraph.restoreLatestCheckpointedState
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5940&quot;&gt;FLINK-5940&lt;/a&gt;] - ZooKeeperCompletedCheckpointStore cannot handle broken state handles
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5942&quot;&gt;FLINK-5942&lt;/a&gt;] - Harden ZooKeeperStateHandleStore to deal with corrupted data
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5945&quot;&gt;FLINK-5945&lt;/a&gt;] - Close function in OuterJoinOperatorBase#executeOnCollections
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5949&quot;&gt;FLINK-5949&lt;/a&gt;] - Flink on YARN checks for Kerberos credentials for non-Kerberos authentication methods
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5962&quot;&gt;FLINK-5962&lt;/a&gt;] - Cancel checkpoint canceller tasks in CheckpointCoordinator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5965&quot;&gt;FLINK-5965&lt;/a&gt;] - Typo on DropWizard wrappers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5972&quot;&gt;FLINK-5972&lt;/a&gt;] - Don&amp;#39;t allow shrinking merging windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5985&quot;&gt;FLINK-5985&lt;/a&gt;] - Flink treats every task as stateful (making topology changes impossible)
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6000&quot;&gt;FLINK-6000&lt;/a&gt;] - Can not start HA cluster with start-cluster.sh
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6001&quot;&gt;FLINK-6001&lt;/a&gt;] - NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and allowedLateness
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6002&quot;&gt;FLINK-6002&lt;/a&gt;] - Documentation: &amp;#39;MacOS X&amp;#39; under &amp;#39;Download and Start Flink&amp;#39; in Quickstart page is not rendered correctly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6006&quot;&gt;FLINK-6006&lt;/a&gt;] - Kafka Consumer can lose state if queried partition list is incomplete on restore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6025&quot;&gt;FLINK-6025&lt;/a&gt;] - User code ClassLoader not used when KryoSerializer fallbacks to serialization for copying
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6051&quot;&gt;FLINK-6051&lt;/a&gt;] - Wrong metric scope names in documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6084&quot;&gt;FLINK-6084&lt;/a&gt;] - Cassandra connector does not declare all dependencies
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6133&quot;&gt;FLINK-6133&lt;/a&gt;] - fix build status in README.md
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6170&quot;&gt;FLINK-6170&lt;/a&gt;] - Some checkpoint metrics rely on latest stat snapshot
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6181&quot;&gt;FLINK-6181&lt;/a&gt;] - Zookeeper scripts use invalid regex
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6182&quot;&gt;FLINK-6182&lt;/a&gt;] - Fix possible NPE in SourceStreamTask
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6183&quot;&gt;FLINK-6183&lt;/a&gt;] - TaskMetricGroup may not be cleanup when Task.run() is never called or exits early
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6184&quot;&gt;FLINK-6184&lt;/a&gt;] - Buffer metrics can cause NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6203&quot;&gt;FLINK-6203&lt;/a&gt;] - DataSet Transformations
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6207&quot;&gt;FLINK-6207&lt;/a&gt;] - Duplicate type serializers for async snapshots of CopyOnWriteStateTable
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6308&quot;&gt;FLINK-6308&lt;/a&gt;] - Task managers are not attaching to job manager on macos
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt; Improvement
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4326&quot;&gt;FLINK-4326&lt;/a&gt;] - Flink start-up scripts should optionally start services on the foreground
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5217&quot;&gt;FLINK-5217&lt;/a&gt;] - Deprecated interface Checkpointed make clear suggestion
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5331&quot;&gt;FLINK-5331&lt;/a&gt;] - PythonPlanBinderTest idling extremely long
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5581&quot;&gt;FLINK-5581&lt;/a&gt;] - Improve Kerberos security related documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5639&quot;&gt;FLINK-5639&lt;/a&gt;] - Clarify License implications of RabbitMQ Connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5680&quot;&gt;FLINK-5680&lt;/a&gt;] - Document env.ssh.opts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5681&quot;&gt;FLINK-5681&lt;/a&gt;] - Make ReaperThread for SafetyNetCloseableRegistry a singleton
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5702&quot;&gt;FLINK-5702&lt;/a&gt;] - Kafka Producer docs should warn if using setLogFailuresOnly, at-least-once is compromised
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5705&quot;&gt;FLINK-5705&lt;/a&gt;] - webmonitor&amp;#39;s request/response use UTF-8 explicitly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5713&quot;&gt;FLINK-5713&lt;/a&gt;] - Protect against NPE in WindowOperator window cleanup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5721&quot;&gt;FLINK-5721&lt;/a&gt;] - Add FoldingState to State Documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5800&quot;&gt;FLINK-5800&lt;/a&gt;] - Make sure that the CheckpointStreamFactory is instantiated once per operator only
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5805&quot;&gt;FLINK-5805&lt;/a&gt;] - improve docs for ProcessFunction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5807&quot;&gt;FLINK-5807&lt;/a&gt;] - improved wording for doc home page
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5837&quot;&gt;FLINK-5837&lt;/a&gt;] - improve readability of the queryable state docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5876&quot;&gt;FLINK-5876&lt;/a&gt;] - Mention Scala type fallacies for queryable state client serializers
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5877&quot;&gt;FLINK-5877&lt;/a&gt;] - Fix Scala snippet in Async I/O API doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5894&quot;&gt;FLINK-5894&lt;/a&gt;] - HA docs are misleading re: state backends
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5895&quot;&gt;FLINK-5895&lt;/a&gt;] - Reduce logging aggressiveness of FileSystemSafetyNet
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5938&quot;&gt;FLINK-5938&lt;/a&gt;] - Replace ExecutionContext by Executor in Scheduler
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6212&quot;&gt;FLINK-6212&lt;/a&gt;] - Missing reference to flink-avro dependency
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt; New Feature
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6139&quot;&gt;FLINK-6139&lt;/a&gt;] - Documentation for building / preparing Flink for MapR
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt; Task
&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2883&quot;&gt;FLINK-2883&lt;/a&gt;] - Add documentation to forbid key-modifying ReduceFunction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3903&quot;&gt;FLINK-3903&lt;/a&gt;] - Homebrew Installation
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 26 Apr 2017 20:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2017/04/26/release-1.2.1.html</link>
<guid isPermaLink="true">/news/2017/04/26/release-1.2.1.html</guid>
</item>
<item>
<title>Continuous Queries on Dynamic Tables</title>
<description>&lt;h4 id=&quot;analyzing-data-streams-with-sql&quot;&gt;Analyzing Data Streams with SQL&lt;/h4&gt;
&lt;p&gt;More and more companies are adopting stream processing and are migrating existing batch applications to streaming or implementing streaming solutions for new use cases. Many of those applications focus on analyzing streaming data. The data streams that are analyzed come from a wide variety of sources such as database transactions, clicks, sensor measurements, or IoT devices.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/streams.png&quot; style=&quot;width:45%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;Apache Flink is very well suited to power streaming analytics applications because it provides support for event-time semantics, stateful exactly-once processing, and achieves high throughput and low latency at the same time. Due to these features, Flink is able to compute exact and deterministic results from high-volume input streams in near real-time while providing exactly-once semantics in case of failures.&lt;/p&gt;
&lt;p&gt;Flink’s core API for stream processing, the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html&quot;&gt;DataStream API&lt;/a&gt;, is very expressive and provides primitives for many common operations. Among other features, it offers highly customizable windowing logic, different state primitives with varying performance characteristics, hooks to register and react on timers, and tooling for efficient asynchronous requests to external systems. On the other hand, many stream analytics applications follow similar patterns and do not require the level of expressiveness as provided by the DataStream API. They could be expressed in a more natural and concise way using a domain specific language. As we all know, SQL is the de-facto standard for data analytics. For streaming analytics, SQL would enable a larger pool of people to specify applications on data streams in less time. However, no open source stream processor offers decent SQL support yet.&lt;/p&gt;
&lt;h2 id=&quot;why-is-sql-on-streams-a-big-deal&quot;&gt;Why is SQL on Streams a Big Deal?&lt;/h2&gt;
&lt;p&gt;SQL is the most widely used language for data analytics for many good reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;SQL is declarative: You specify what you want but not how to compute it.&lt;/li&gt;
&lt;li&gt;SQL can be effectively optimized: An optimizer figures out an efficient plan to compute your result.&lt;/li&gt;
&lt;li&gt;SQL can be efficiently evaluated: The processing engine knows exactly what to compute and how to do so efficiently.&lt;/li&gt;
&lt;li&gt;And finally, everybody knows and many tools speak SQL.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So being able to process and analyze data streams with SQL makes stream processing technology available to many more users. Moreover, it significantly reduces the time and effort to define efficient stream analytics applications due to SQL’s declarative nature and potential to be automatically optimized.&lt;/p&gt;
&lt;p&gt;However, SQL (and the relational data model and algebra) was not designed with streaming data in mind. Relations are (multi-)sets and not infinite sequences of tuples. When executing a SQL query, conventional database systems and query engines read and process a data set, which is completely available, and produce a fixed-size result. In contrast, data streams continuously provide new records such that data arrives over time. Hence, streaming queries have to continuously process the arriving data and never “complete”.&lt;/p&gt;
&lt;p&gt;That being said, processing streams with SQL is not impossible. Some relational database systems feature eager maintenance of materialized views, which is similar to evaluating SQL queries on streams of data. A materialized view is defined as a SQL query just like a regular (virtual) view. However, the result of the query is actually stored (or materialized) in memory or on disk such that the view does not need to be computed on-the-fly when it is queried. To prevent a materialized view from becoming stale, the database system needs to update the view whenever its base relations (the tables referenced in its definition query) are modified. If we consider the changes on the view’s base relations as a stream of modifications (or as a changelog stream), it becomes obvious that materialized view maintenance and SQL on streams are somehow related.&lt;/p&gt;
&lt;h2 id=&quot;flinks-relational-apis-table-api-and-sql&quot;&gt;Flink’s Relational APIs: Table API and SQL&lt;/h2&gt;
&lt;p&gt;Since version 1.1.0 (released in August 2016), Flink features two semantically equivalent relational APIs, the language-embedded Table API (for Java and Scala) and standard SQL. Both APIs are designed as unified APIs for online streaming and historic batch data. This means that,&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;a query produces exactly the same result regardless whether its input is static batch data or streaming data.&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Unified APIs for stream and batch processing are important for several reasons. First of all, users only need to learn a single API to process static and streaming data. Moreover, the same query can be used to analyze batch and streaming data, which makes it possible to jointly analyze historic and live data in the same query. We haven’t achieved complete unification of batch and streaming semantics yet, but the community is making very good progress towards this goal.&lt;/p&gt;
&lt;p&gt;The following code snippet shows two equivalent Table API and SQL queries that compute a simple windowed aggregate on a stream of temperature sensor measurements. The syntax of the SQL query is based on &lt;a href=&quot;https://calcite.apache.org&quot;&gt;Apache Calcite’s&lt;/a&gt; syntax for &lt;a href=&quot;https://calcite.apache.org/docs/reference.html#grouped-window-functions&quot;&gt;grouped window functions&lt;/a&gt; and will be supported in version 1.3.0 of Flink.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;setStreamTimeCharacteristic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;TimeCharacteristic&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;EventTime&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// define a table source to read sensor data (sensorId, time, room, temp)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sensorTable&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;???&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// can be a CSV file, Kafka topic, database, or ...&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// register the table source&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;registerTableSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;sensors&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sensorTable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Table API&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tapiResult&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scan&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;sensors&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// scan sensors table&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tumble&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;over&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hour&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;rowtime&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// define 1-hour window&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;room&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// group by window and room&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;room&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;temp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;avg&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;avgTemp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// compute average temperature&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// SQL&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sqlResult&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; |SELECT room, TUMBLE_END(rowtime, INTERVAL &amp;#39;1&amp;#39; HOUR), AVG(temp) AS avgTemp&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; |FROM sensors&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; |GROUP BY TUMBLE(rowtime, INTERVAL &amp;#39;1&amp;#39; HOUR), room&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt; |&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;stripMargin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As you can see, both APIs are tightly integrated with each other and Flink’s primary &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html&quot;&gt;DataStream&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/index.html&quot;&gt;DataSet&lt;/a&gt; APIs. A &lt;code&gt;Table&lt;/code&gt; can be generated from and converted to a &lt;code&gt;DataSet&lt;/code&gt; or &lt;code&gt;DataStream&lt;/code&gt;. Hence, it is easily possible to scan an external table source such as a database or &lt;a href=&quot;https://parquet.apache.org&quot;&gt;Parquet&lt;/a&gt; file, do some preprocessing with a Table API query, convert the result into a &lt;code&gt;DataSet&lt;/code&gt; and run a &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/libs/gelly/index.html&quot;&gt;Gelly&lt;/a&gt; graph algorithm on it. The queries defined in the example above can also be used to process batch data by changing the execution environment.&lt;/p&gt;
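&lt;p&gt;As a small, purely illustrative sketch of this interplay (the field names and input data below are made up and not part of the original example), a Table API query can preprocess a &lt;code&gt;DataSet&lt;/code&gt; and hand the result back to the DataSet API:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;import org.apache.flink.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.types.Row

val env = ExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)

// illustrative batch input of (orderId, amount) records
val orders = env.fromElements((1L, 250.0), (2L, 17.5), (3L, 310.0))

// preprocess with the Table API ...
val bigOrders = orders
  .toTable(tEnv, &amp;#39;orderId, &amp;#39;amount)
  .filter(&amp;#39;amount &amp;gt; 100)

// ... and continue with the DataSet API, e.g. as input to a Gelly algorithm
val ds: DataSet[Row] = bigOrders.toDataSet[Row]
ds.print()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;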
&lt;p&gt;Internally, both APIs are translated into the same logical representation, optimized by Apache Calcite, and compiled into DataStream or DataSet programs. In fact, the optimization and translation process does not know whether a query was defined using the Table API or SQL. If you are curious about the details of the optimization process, have a look at &lt;a href=&quot;http://flink.apache.org/news/2016/05/24/stream-sql.html&quot;&gt;a blog post&lt;/a&gt; that we published last year. Since the Table API and SQL are equivalent in terms of semantics and only differ in syntax, we always refer to both APIs when we talk about SQL in this post.&lt;/p&gt;
&lt;p&gt;In its current state (version 1.2.0), Flink’s relational APIs support a limited set of relational operators on data streams, including projections, filters, and windowed aggregates. All supported operators have in common that they never update result records that have already been emitted. This is clearly not an issue for record-at-a-time operators such as projection and filter. However, it affects operators that collect and process multiple records, such as windowed aggregates. Since emitted results cannot be updated, input records that arrive after a result has been emitted have to be discarded in Flink 1.2.0.&lt;/p&gt;
&lt;p&gt;The limitations of the current version are acceptable for applications that emit data to storage systems such as Kafka topics, message queues, or files which only support append operations and no updates or deletes. Common use cases that follow this pattern are for example continuous ETL and stream archiving applications that persist streams to an archive or prepare data for further online (streaming) analysis or later offline analysis. Since it is not possible to update previously emitted results, these kinds of applications have to make sure that the emitted results are correct and will not need to be corrected in the future. The following figure illustrates such applications.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/query-append-out.png&quot; style=&quot;width:60%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;While queries that only support appends are useful for some kinds of applications and certain types of storage systems, there are many streaming analytics use cases that need to update results. This includes streaming applications that cannot discard late-arriving records, need early results for (long-running) windowed aggregates, or require non-windowed aggregates. In each of these cases, previously emitted result records need to be updated. Result-updating queries often materialize their result to an external database or key-value store in order to make it accessible and queryable for external applications. Applications that implement this pattern are dashboards, reporting applications, or &lt;a href=&quot;http://2016.flink-forward.org/kb_sessions/joining-infinity-windowless-stream-processing-with-flink/&quot;&gt;other applications&lt;/a&gt;, which require timely access to continuously updated results. The following figure illustrates this kind of application.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/query-update-out.png&quot; style=&quot;width:60%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;h2 id=&quot;continuous-queries-on-dynamic-tables&quot;&gt;Continuous Queries on Dynamic Tables&lt;/h2&gt;
&lt;p&gt;Support for queries that update previously emitted results is the next big step for Flink’s relational APIs. This feature is so important because it vastly increases the scope of the APIs and the range of supported use cases. Moreover, many of the newly supported use cases can be challenging to implement using the DataStream API.&lt;/p&gt;
&lt;p&gt;So when adding support for result-updating queries, we must of course preserve the unified semantics for stream and batch inputs. We achieve this by the concept of &lt;em&gt;Dynamic Tables&lt;/em&gt;. A dynamic table is a table that is continuously updated and can be queried like a regular, static table. However, in contrast to a query on a batch table which terminates and returns a static table as result, a query on a dynamic table runs continuously and produces a table that is continuously updated depending on the modification on the input table. Hence, the resulting table is a dynamic table as well. This concept is very similar to materialized view maintenance as we discussed before.&lt;/p&gt;
&lt;p&gt;Assuming we can run queries on dynamic tables which produce new dynamic tables, the next question is, How do streams and dynamic tables relate to each other? The answer is that streams can be converted into dynamic tables and dynamic tables can be converted into streams. The following figure shows the conceptual model of processing a relational query on a stream.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/stream-query-stream.png&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;First, the stream is converted into a dynamic table. The dynamic table is queried with a continuous query, which produces a new dynamic table. Finally, the resulting table is converted back into a stream. It is important to note that this is only the logical model and does not imply how the query is actually executed. In fact, a continuous query is internally translated into a conventional DataStream program.&lt;/p&gt;
&lt;p&gt;In the following, we describe the different steps of this model:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Defining a dynamic table on a stream,&lt;/li&gt;
&lt;li&gt;Querying a dynamic table, and&lt;/li&gt;
&lt;li&gt;Emitting a dynamic table.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&quot;defining-a-dynamic-table-on-a-stream&quot;&gt;Defining a Dynamic Table on a Stream&lt;/h2&gt;
&lt;p&gt;The first step of evaluating a SQL query on a dynamic table is to define a dynamic table on a stream. This means we have to specify how the records of a stream modify the dynamic table. The stream must carry records with a schema that is mapped to the relational schema of the table. There are two modes to define a dynamic table on a stream: &lt;em&gt;Append Mode&lt;/em&gt; and &lt;em&gt;Update Mode&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;In append mode each stream record is an insert modification to the dynamic table. Hence, all records of a stream are appended to the dynamic table such that it is ever-growing and infinite in size. The following figure illustrates the append mode.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/append-mode.png&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;
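&lt;p&gt;As a small, purely illustrative sketch of append mode (update mode has no corresponding public API at this point), a &lt;code&gt;DataStream&lt;/code&gt; can be registered as a table whose rows are exactly the records of the stream, so every new record is appended as an insert. The stream contents and field names below are made up for the example:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)

// illustrative stream of (rowtime, k) records, similar to the figures of this post
val stream: DataStream[(Long, String)] = env.fromElements((1L, &amp;quot;A&amp;quot;), (2L, &amp;quot;B&amp;quot;), (4L, &amp;quot;A&amp;quot;))

// register the stream as table &amp;quot;A&amp;quot; in append mode: each stream record becomes an insert
tEnv.registerDataStream(&amp;quot;A&amp;quot;, stream, &amp;#39;rowtime, &amp;#39;k)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;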
&lt;p&gt;In update mode a stream record can represent an insert, update, or delete modification on the dynamic table (append mode is in fact a special case of update mode). When defining a dynamic table on a stream via update mode, we can specify a unique key attribute on the table. In that case, update and delete operations are performed with respect to the key attribute. The update mode is visualized in the following figure.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/replace-mode.png&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;h2 id=&quot;querying-a-dynamic-table&quot;&gt;Querying a Dynamic Table&lt;/h2&gt;
&lt;p&gt;Once we have defined a dynamic table, we can run a query on it. Since dynamic tables change over time, we have to define what it means to query a dynamic table. Let’s imagine we take a snapshot of a dynamic table at a specific point in time. This snapshot can be treated as a regular static batch table. We denote a snapshot of a dynamic table &lt;em&gt;A&lt;/em&gt; at a point &lt;em&gt;t&lt;/em&gt; as &lt;em&gt;A[t]&lt;/em&gt;. The snapshot can be queried with any SQL query. The query produces a regular static table as result. We denote the result of a query &lt;em&gt;q&lt;/em&gt; on a dynamic table &lt;em&gt;A&lt;/em&gt; at time &lt;em&gt;t&lt;/em&gt; as &lt;em&gt;q(A[t])&lt;/em&gt;. If we repeatedly compute the result of a query on snapshots of a dynamic table for progressing points in time, we obtain many static result tables which are changing over time and effectively constitute a dynamic table. We define the semantics of a query on a dynamic table as follows.&lt;/p&gt;
&lt;p&gt;A query &lt;em&gt;q&lt;/em&gt; on a dynamic table &lt;em&gt;A&lt;/em&gt; produces a dynamic table &lt;em&gt;R&lt;/em&gt;, which is at each point in time &lt;em&gt;t&lt;/em&gt; equivalent to the result of applying &lt;em&gt;q&lt;/em&gt; on &lt;em&gt;A[t]&lt;/em&gt;, i.e., &lt;em&gt;R[t] = q(A[t])&lt;/em&gt;. This definition implies that running the same query &lt;em&gt;q&lt;/em&gt; on a batch table and on a streaming table produces the same result. In the following, we show two examples to illustrate the semantics of queries on dynamic tables.&lt;/p&gt;
&lt;p&gt;In the figure below, we see a dynamic input table &lt;em&gt;A&lt;/em&gt; on the left side, which is defined in append mode. At time &lt;em&gt;t = 8&lt;/em&gt;, &lt;em&gt;A&lt;/em&gt; consists of six rows (colored in blue). At time &lt;em&gt;t = 9&lt;/em&gt; and &lt;em&gt;t = 12&lt;/em&gt;, one row is appended to &lt;em&gt;A&lt;/em&gt; (visualized in green and orange, respectively). We run a simple query on table &lt;em&gt;A&lt;/em&gt; which is shown in the center of the figure. The query groups by attribute &lt;em&gt;k&lt;/em&gt; and counts the records per group. On the right hand side we see the result of query &lt;em&gt;q&lt;/em&gt; at time &lt;em&gt;t = 8&lt;/em&gt; (blue), &lt;em&gt;t = 9&lt;/em&gt; (green), and &lt;em&gt;t = 12&lt;/em&gt; (orange). At each point in time t, the result table is equivalent to a batch query on the dynamic table &lt;em&gt;A&lt;/em&gt; at time &lt;em&gt;t&lt;/em&gt;.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/query-groupBy-cnt.png&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;The query in this example is a simple grouped (but not windowed) aggregation query. Hence, the size of the result table depends on the number of distinct grouping keys of the input table. Moreover, it is worth noticing that the query continuously updates result rows that it had previously emitted instead of merely adding new rows.&lt;/p&gt;
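&lt;p&gt;For reference, the query of this first example corresponds to the following SQL (assuming the append-mode table of the figure is registered as &lt;code&gt;A&lt;/code&gt;, as in the sketch above). Note that, as discussed above, such non-windowed aggregates are not yet supported as continuous queries in Flink 1.2.0; the snippet is only meant to make the semantics &lt;em&gt;R[t] = q(A[t])&lt;/em&gt; concrete.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// assuming the StreamTableEnvironment tEnv and the append-mode table A registered above
// grouped (non-windowed) count per key, as shown in the figure
val result = tEnv.sql(
  &amp;quot;&amp;quot;&amp;quot;
    |SELECT k, COUNT(*) AS cnt
    |FROM A
    |GROUP BY k
  &amp;quot;&amp;quot;&amp;quot;.stripMargin)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;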
&lt;p&gt;The second example shows a similar query which differs in one important aspect. In addition to grouping on the key attribute &lt;em&gt;k&lt;/em&gt;, the query also groups records into tumbling windows of five seconds, which means that it computes a count for each value of &lt;em&gt;k&lt;/em&gt; every five seconds. Again, we use Calcite’s &lt;a href=&quot;https://calcite.apache.org/docs/reference.html#grouped-window-functions&quot;&gt;group window functions&lt;/a&gt; to specify this query. On the left side of the figure we see the input table &lt;em&gt;A&lt;/em&gt; and how it changes over time in append mode. On the right we see the result table and how it evolves over time.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/query-groupBy-window-cnt.png&quot; style=&quot;width:80%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;In contrast to the result of the first example, the resulting table grows over time, i.e., every five seconds new result rows are computed (given that the input table received more records in the last five seconds). While the non-windowed query (mostly) updates rows of the result table, the windowed aggregation query only appends new rows to the result table.&lt;/p&gt;
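&lt;p&gt;The second example’s query can be expressed with the same grouped window functions as the windowed aggregation at the beginning of this post (again assuming the table and attribute names of the figure):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// assuming tEnv and the append-mode table A (with fields rowtime and k) from above
// per-key count over 5-second tumbling windows, as shown in the figure
val windowedResult = tEnv.sql(
  &amp;quot;&amp;quot;&amp;quot;
    |SELECT k, TUMBLE_END(rowtime, INTERVAL &amp;#39;5&amp;#39; SECOND) AS wEnd, COUNT(*) AS cnt
    |FROM A
    |GROUP BY TUMBLE(rowtime, INTERVAL &amp;#39;5&amp;#39; SECOND), k
  &amp;quot;&amp;quot;&amp;quot;.stripMargin)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;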
&lt;p&gt;Although this blog post focuses on the semantics of SQL queries on dynamic tables and not on how to efficiently process such a query, we’d like to point out that it is not possible to compute the complete result of a query from scratch whenever an input table is updated. Instead, the query is compiled into a streaming program which continuously updates its result based on the changes on its input. This implies that not all valid SQL queries are supported but only those that can be continuously, incrementally, and efficiently computed. We plan to discuss details about the evaluation of SQL queries on dynamic tables in a follow-up blog post.&lt;/p&gt;
&lt;h2 id=&quot;emitting-a-dynamic-table&quot;&gt;Emitting a Dynamic Table&lt;/h2&gt;
&lt;p&gt;Querying a dynamic table yields another dynamic table, which represents the query’s results. Depending on the query and its input tables, the result table is continuously modified by insert, update, and delete changes just like a regular database table. It might be a table with a single row, which is constantly updated, an insert-only table without update modifications, or anything in between.&lt;/p&gt;
&lt;p&gt;Traditional database systems use logs to rebuild tables in case of failures and for replication. There are different logging techniques, such as UNDO, REDO, and UNDO/REDO logging. In a nutshell, UNDO logs record the previous value of a modified element to revert incomplete transactions, REDO logs record the new value of a modified element to redo lost changes of completed transactions, and UNDO/REDO logs record the old and the new value of a changed element to undo incomplete transactions and redo lost changes of completed transactions. Based on the principles of these logging techniques, a dynamic table can be converted into two types of changelog streams, a &lt;em&gt;REDO Stream&lt;/em&gt; and a &lt;em&gt;REDO+UNDO Stream&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;A dynamic table is converted into a redo+undo stream by converting the modifications on the table into stream messages. An insert modification is emitted as an insert message with the new row, a delete modification is emitted as a delete message with the old row, and an update modification is emitted as a delete message with the old row and an insert message with the new row. This behavior is illustrated in the following figure.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/undo-redo-mode.png&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;The left side shows a dynamic table that is maintained in append mode and serves as input to the query in the center. The result of the query is converted into a redo+undo stream, which is shown at the bottom. The first record &lt;em&gt;(1, A)&lt;/em&gt; of the input table results in a new record in the result table and hence in an insert message &lt;em&gt;+(A, 1)&lt;/em&gt; to the stream. The second input record with &lt;em&gt;k = ‘A’&lt;/em&gt;, &lt;em&gt;(4, A)&lt;/em&gt;, produces an update of the &lt;em&gt;(A, 1)&lt;/em&gt; record in the result table and hence yields a delete message &lt;em&gt;-(A, 1)&lt;/em&gt; and an insert message &lt;em&gt;+(A, 2)&lt;/em&gt;. All downstream operators or data sinks need to be able to correctly handle both types of messages.&lt;/p&gt;
&lt;p&gt;A dynamic table can be converted into a redo stream in two cases: either it is an append-only table (i.e., it only has insert modifications) or it has a unique key attribute. Each insert modification on the dynamic table results in an insert message with the new row to the redo stream. Due to the restriction of redo streams, only tables with unique keys can have update and delete modifications. If a key is removed from the keyed dynamic table, either because a row is deleted or because the key attribute of a row was modified, a delete message with the removed key is emitted to the redo stream. An update modification yields an update message with the updated, i.e., new, row. Since delete and update modifications are defined with respect to the unique key, the downstream operators need to be able to access previous values by key. The figure below shows how the result table of the same query as above is converted into a redo stream.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/dynamic-tables/redo-mode.png&quot; style=&quot;width:70%;margin:10px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;The row &lt;em&gt;(1, A)&lt;/em&gt; which yields an insert into the dynamic table results in the &lt;em&gt;+(A, 1)&lt;/em&gt; insert message. The row &lt;em&gt;(4, A)&lt;/em&gt; which produces an update yields the &lt;em&gt;*(A, 2)&lt;/em&gt; update message.&lt;/p&gt;
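&lt;p&gt;To make the difference between the two stream types concrete, here is a small, purely hypothetical encoding of changelog messages (these case classes are not part of Flink’s API). Updating the count of key &lt;em&gt;A&lt;/em&gt; from 1 to 2 is encoded as a delete plus an insert in a redo+undo stream, but as a single update message in a redo stream:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// hypothetical changelog message types, used only to illustrate the two stream kinds
object ChangelogEncoding {
  sealed trait ChangeMessage
  case class InsertMsg(key: String, cnt: Long) extends ChangeMessage  // +(k, cnt)
  case class DeleteMsg(key: String, cnt: Long) extends ChangeMessage  // -(k, cnt)
  case class UpdateMsg(key: String, cnt: Long) extends ChangeMessage  // *(k, cnt), requires a unique key

  // the count of key &amp;quot;A&amp;quot; changes from 1 to 2:
  val redoUndo: Seq[ChangeMessage] = Seq(DeleteMsg(&amp;quot;A&amp;quot;, 1L), InsertMsg(&amp;quot;A&amp;quot;, 2L))
  val redoOnly: Seq[ChangeMessage] = Seq(UpdateMsg(&amp;quot;A&amp;quot;, 2L))
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;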
&lt;p&gt;Common use cases for redo streams are to write the result of a query to an append-only storage system, like rolling files or a Kafka topic, or to a data store with keyed access, such as Cassandra, a relational DBMS, or a compacted Kafka topic. It is also possible to materialize a dynamic table as keyed state inside of the streaming application that evaluates the continuous query and make it queryable from external systems. With this design Flink itself maintains the result of a continuous SQL query on a stream and serves key lookups on the result table, for instance from a dashboard application.&lt;/p&gt;
&lt;h2 id=&quot;what-will-change-when-switching-to-dynamic-tables&quot;&gt;What will Change When Switching to Dynamic Tables?&lt;/h2&gt;
&lt;p&gt;In version 1.2, all streaming operators of Flink’s relational APIs, like filter, project, and group window aggregates, only emit new rows and are not capable of updating previously emitted results. In contrast, dynamic tables are able to handle update and delete modifications. Now you might ask yourself, How does the processing model of the current version relate to the new dynamic table model? Will the semantics of the APIs completely change and do we need to reimplement the APIs from scratch to achieve the desired semantics?&lt;/p&gt;
&lt;p&gt;The answer to all these questions is simple. The current processing model is a subset of the dynamic table model. Using the terminology we introduced in this post, the current model converts a stream into a dynamic table in append mode, i.e., an infinitely growing table. Since all operators only accept insert changes and produce insert changes on their result table (i.e., emit new rows), all supported queries result in dynamic append tables, which are converted back into DataStreams using the redo model for append-only tables. Consequently, the semantics of the current model are completely covered and preserved by the new dynamic table model.&lt;/p&gt;
&lt;h2 id=&quot;conclusion-and-outlook&quot;&gt;Conclusion and Outlook&lt;/h2&gt;
&lt;p&gt;Flink’s relational APIs make it possible to implement stream analytics applications in very little time and are already used in several production settings. In this blog post we discussed the future of the Table API and SQL. This effort will make Flink and stream processing accessible to more people. Moreover, the unified semantics for querying historic and real-time data as well as the concept of querying and maintaining dynamic tables will enable and significantly ease the implementation of many exciting use cases and applications. Since this post focused on the semantics of relational queries on streams and dynamic tables, we did not discuss the details of how a query will be executed, which includes the internal implementation of retractions, handling of late events, support for early results, and bounding space requirements. We plan to publish a follow-up blog post on this topic at a later point in time.&lt;/p&gt;
&lt;p&gt;In recent months, many members of the Flink community have been discussing and contributing to the relational APIs. We made great progress so far. While most work has focused on processing streams in append mode, the next steps on the agenda are to work on dynamic tables to support queries that update their results. If you are excited about the idea of processing streams with SQL and would like to contribute to this effort, please give feedback, join the discussions on the mailing list, or grab a JIRA issue to work on.&lt;/p&gt;
</description>
<pubDate>Tue, 04 Apr 2017 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2017/04/04/dynamic-tables.html</link>
<guid isPermaLink="true">/news/2017/04/04/dynamic-tables.html</guid>
</item>
<item>
<title>From Streams to Tables and Back Again: An Update on Flink&#39;s Table &amp; SQL API</title>
<description>&lt;p&gt;Stream processing can deliver a lot of value. Many organizations have recognized the benefit of managing large volumes of data in real-time, reacting quickly to trends, and providing customers with live services at scale. Streaming applications with well-defined business logic can deliver a competitive advantage.&lt;/p&gt;
&lt;p&gt;Flink’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html&quot;&gt;DataStream&lt;/a&gt; abstraction is a powerful API which lets you flexibly define both basic and complex streaming pipelines. Additionally, it offers low-level operations such as &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/asyncio.html&quot;&gt;Async IO&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/process_function.html&quot;&gt;ProcessFunctions&lt;/a&gt;. However, many users do not need such a deep level of flexibility. They need an API which quickly solves 80% of their use cases where simple tasks can be defined using little code.&lt;/p&gt;
&lt;p&gt;To deliver the power of stream processing to a broader set of users, the Apache Flink community is developing APIs that provide simpler abstractions and more concise syntax so that users can focus on their business logic instead of advanced streaming concepts. Along with other APIs (such as &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/libs/cep.html&quot;&gt;CEP&lt;/a&gt; for complex event processing on streams), Flink offers a relational API that aims to unify stream and batch processing: the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/table_api.html&quot;&gt;Table &amp;amp; SQL API&lt;/a&gt;, often referred to as the Table API.&lt;/p&gt;
&lt;p&gt;Recently, contributors working for companies such as Alibaba, Huawei, data Artisans, and more decided to further develop the Table API. Over the past year, the Table API has been rewritten entirely. Since Flink 1.1, its core has been based on &lt;a href=&quot;http://calcite.apache.org/&quot;&gt;Apache Calcite&lt;/a&gt;, which parses SQL and optimizes all relational queries. Today, the Table API can address a wide range of use cases in both batch and stream environments with unified semantics.&lt;/p&gt;
&lt;p&gt;This blog post summarizes the current status of Flink’s Table API and showcases some of the recently added features in Apache Flink. Among the features presented here are unified access to batch and streaming data, data transformations, and window operators.
The following paragraphs not only give you a general overview of the Table API, but also illustrate the potential of relational APIs in the future.&lt;/p&gt;
&lt;p&gt;Because the Table API is built on top of Flink’s core APIs, &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/datastream_api.html&quot;&gt;DataStreams&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/index.html&quot;&gt;DataSets&lt;/a&gt; can be converted to a Table and vice-versa without much overhead. Hereafter, we show how to create tables from different sources and specify programs that can be executed locally or in a distributed setting. In this post, we will use the Scala version of the Table API, but there is also a Java version as well as a SQL API with an equivalent set of features.&lt;/p&gt;
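&lt;p&gt;As a small sketch of how lightweight this conversion is (the stream contents and field names are made up for the example), a &lt;code&gt;DataStream&lt;/code&gt; can be turned into a &lt;code&gt;Table&lt;/code&gt; with named fields and queried right away; the opposite direction is shown in the examples below.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)

// illustrative stream of (userId, url) click events
val clicks: DataStream[(Long, String)] = env.fromElements((42L, &amp;quot;/home&amp;quot;), (43L, &amp;quot;/cart&amp;quot;))

// expose the stream as a Table with named fields and continue with relational operators
val clickTable = clicks.toTable(tEnv, &amp;#39;userId, &amp;#39;url)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;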
&lt;h2 id=&quot;data-transformation-and-etl&quot;&gt;Data Transformation and ETL&lt;/h2&gt;
&lt;p&gt;A common task in every data processing pipeline is importing data from one or multiple systems, applying some transformations to it, and then exporting the data to another system. The Table API can help to manage these recurring tasks. For reading data, the API provides a set of ready-to-use &lt;code&gt;TableSources&lt;/code&gt; such as a &lt;code&gt;CsvTableSource&lt;/code&gt; and &lt;code&gt;KafkaTableSource&lt;/code&gt;. However, it also allows the implementation of custom &lt;code&gt;TableSources&lt;/code&gt; that can hide configuration specifics (e.g. watermark generation) from users who are less familiar with streaming concepts.&lt;/p&gt;
&lt;p&gt;Let’s assume we have a CSV file that stores customer information. The values are delimited by a “|”-character and contain a customer identifier, name, timestamp of the last update, and preferences encoded in a comma-separated key-value string:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;42|Bob Smith|2016-07-23 16:10:11|color=12,length=200,size=200
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The following example illustrates how to read a CSV file and perform some data cleansing before converting it to a regular DataStream program.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// set up execution environment&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// configure table source&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customerSource&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;CsvTableSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;builder&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;/path/to/customer_data.csv&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ignoreFirstLine&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fieldDelimiter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;|&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;id&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;LONG&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;last_update&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;TIMESTAMP&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;field&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;prefs&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;build&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// name your table source&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;registerTableSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;customers&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;customerSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// define your table program&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scan&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;customers&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;filter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;isNotNull&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;last_update&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;2016-01-01 00:00:00&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toTimestamp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;lowerCase&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;prefs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// convert it to a data stream&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ds&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toDataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;ds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The Table API comes with a large set of built-in functions that make it easy to specify business logic using a language integrated query (LINQ) syntax. In the example above, we filter out customers with invalid names and only select those that updated their preferences recently. We convert names to lowercase for normalization. For debugging purposes, we convert the table into a DataStream and print it.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;CsvTableSource&lt;/code&gt; supports both batch and stream environments. If the programmer wants to execute the program above in a batch application, all they have to do is replace the environment with an &lt;code&gt;ExecutionEnvironment&lt;/code&gt; and change the output conversion from &lt;code&gt;DataStream&lt;/code&gt; to &lt;code&gt;DataSet&lt;/code&gt;. The Table API program itself doesn’t change, as the sketch below illustrates.&lt;/p&gt;
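&lt;p&gt;A minimal sketch of that batch variant, reusing the &lt;code&gt;customerSource&lt;/code&gt; defined above (only the environment and the final conversion differ):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;import org.apache.flink.api.scala._
import org.apache.flink.table.api.TableEnvironment
import org.apache.flink.table.api.scala._
import org.apache.flink.types.Row

// batch environment instead of a streaming one
val env = ExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)
tEnv.registerTableSource(&amp;quot;customers&amp;quot;, customerSource)  // same CsvTableSource as above

// the table program itself is unchanged
val table = tEnv
  .scan(&amp;quot;customers&amp;quot;)
  .filter(&amp;#39;name.isNotNull &amp;amp;&amp;amp; &amp;#39;last_update &amp;gt; &amp;quot;2016-01-01 00:00:00&amp;quot;.toTimestamp)
  .select(&amp;#39;id, &amp;#39;name.lowerCase(), &amp;#39;prefs)

// convert to a DataSet instead of a DataStream; print() triggers execution
val ds: DataSet[Row] = table.toDataSet[Row]
ds.print()&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;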
&lt;p&gt;In the streaming example above, we converted the table program into a data stream of &lt;code&gt;Row&lt;/code&gt; objects. However, we are not limited to row data types. The Table API supports all types from the underlying APIs such as Java and Scala Tuples, Case Classes, POJOs, or generic types that are serialized using Kryo. Let’s assume that we want a regular object (POJO) with the following format instead of generic rows:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Customer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Int&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;name&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;update&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Long&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;var&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prefs&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;java.util.Properties&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;We can use the following table program to convert the CSV file into Customer objects. Flink takes care of creating objects and mapping fields for us.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ds&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tEnv&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;scan&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;customers&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;name&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;last_update&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;update&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;parseProperties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;prefs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;prefs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toDataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Customer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You might have noticed that the query above uses a function to parse the preferences field. Even though Flink’s Table API ships with a large set of built-in functions, it is often necessary to define custom user-defined scalar functions. In the example above we use the user-defined function &lt;code&gt;parseProperties&lt;/code&gt;. The following code snippet shows how easily we can implement a scalar function.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;object&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;parseProperties&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ScalarFunction&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eval&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Properties&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;props&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Properties&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;str&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;,&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(\&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;=&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;foreach&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;setProperty&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)))&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;props&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Scalar functions can be used to deserialize, extract, or convert values (and more). By overriding the &lt;code&gt;open()&lt;/code&gt; method we even get access to runtime information such as distributed cache files or metrics. The &lt;code&gt;open()&lt;/code&gt; method is called only once during the runtime’s &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.3/internals/task_lifecycle.html&quot;&gt;task lifecycle&lt;/a&gt;.&lt;/p&gt;
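&lt;p&gt;As a rough sketch of such a function (the &lt;code&gt;FunctionContext&lt;/code&gt; parameter and the default-properties idea are illustrative assumptions, not taken from the example above):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;import java.util.Properties
import org.apache.flink.table.functions.{FunctionContext, ScalarFunction}

// sketch: a variant of parseProperties that sets up state once per task
object parsePropertiesWithDefaults extends ScalarFunction {

  private var defaults: Properties = _

  override def open(context: FunctionContext): Unit = {
    // called once per task before the first eval(); a good place to read
    // distributed cache files or to register metrics via the context
    defaults = new Properties()
    defaults.setProperty(&amp;quot;color&amp;quot;, &amp;quot;unknown&amp;quot;)
  }

  def eval(str: String): Properties = {
    val props = new Properties(defaults)
    str
      .split(&amp;quot;,&amp;quot;)
      .map(_.split(&amp;quot;=&amp;quot;))
      .foreach(split =&amp;gt; props.setProperty(split(0), split(1)))
    props
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;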
&lt;h2 id=&quot;unified-windowing-for-static-and-streaming-data&quot;&gt;Unified Windowing for Static and Streaming Data&lt;/h2&gt;
&lt;p&gt;Another very common task, especially when working with continuous data, is the definition of windows to split a stream into pieces of finite size, over which we can apply computations. At the moment, the Table API supports three types of windows: sliding windows, tumbling windows, and session windows (for general definitions of the different types of windows, we recommend &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/windows.html&quot;&gt;Flink’s documentation&lt;/a&gt;). All three window types work on &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/event_time.html&quot;&gt;event or processing time&lt;/a&gt;. Session windows can be defined over time intervals; sliding and tumbling windows can be defined over time intervals or over a number of rows.&lt;/p&gt;
&lt;p&gt;Let’s assume that our customer data from the example above is an event stream of updates, generated whenever a customer updates his or her preferences. We assume that the events come from a TableSource that has assigned timestamps and watermarks. The definition of a window again happens in a LINQ-style fashion. The following example could be used to count the number of preference updates per customer and day.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tumble&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;over&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.d&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ay&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;rowtime&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;from&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;to&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;prefs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;count&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;updates&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;By using the &lt;code&gt;on()&lt;/code&gt; parameter, we can specify whether the window is supposed to work on event-time or not. The Table API assumes that timestamps and watermarks are assigned correctly when using event-time. Elements with timestamps smaller than the last received watermark are dropped. Since the extraction of timestamps and the generation of watermarks depend on the data source and require some deeper knowledge of its origin, the TableSource or the upstream DataStream is usually responsible for assigning these properties.&lt;/p&gt;
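&lt;p&gt;For illustration, an upstream DataStream could assign timestamps and watermarks before the table is created. The following is a minimal sketch; the &lt;code&gt;CustomerUpdate&lt;/code&gt; type, its fields, and the one-minute out-of-orderness bound are hypothetical:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// hypothetical event type; updateTime is an epoch timestamp in milliseconds
case class CustomerUpdate(id: Int, name: String, updateTime: Long, prefs: String)

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

val updates: DataStream[CustomerUpdate] = env.fromElements(
  CustomerUpdate(1, &amp;quot;anna&amp;quot;, 1490000000000L, &amp;quot;color=red,size=m&amp;quot;))

// assign timestamps and generate watermarks that tolerate events
// arriving up to one minute out of order
val withTimestamps = updates.assignTimestampsAndWatermarks(
  new BoundedOutOfOrdernessTimestampExtractor[CustomerUpdate](Time.minutes(1)) {
    override def extractTimestamp(e: CustomerUpdate): Long = e.updateTime
  })&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;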
&lt;p&gt;The following code shows how to define other types of windows:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// using processing-time&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tumble&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;over&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;100.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;rows&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;manyRowWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// using event-time&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Session&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;withGap&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;15.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;minutes&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;rowtime&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;sessionWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Slide&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;over&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.d&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ay&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;every&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hour&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;rowtime&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;dailyWindow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Since batch is just a special case of streaming (where a batch happens to have a defined start and end point), it is also possible to apply all of these windows in a batch execution environment. Without any modification of the table program itself, we can run the code on a DataSet given that we specified a column named “rowtime”. This is particularly interesting if we want to compute exact results from time to time, so that late events that are heavily out-of-order can be included in the computation.&lt;/p&gt;
&lt;p&gt;At the moment, the Table API only supports so-called “group windows” that also exist in the DataStream API. Other windows such as SQL’s OVER clause windows are in development and &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations&quot;&gt;planned for Flink 1.3&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In order to demonstrate the expressiveness and capabilities of the API, here’s a snippet with a more advanced example of an exponentially decaying moving average over a sliding window of one hour which returns aggregated results every second. The table program weighs recent orders more heavily than older orders. This example is borrowed from &lt;a href=&quot;https://calcite.apache.org/docs/stream.html#hopping-windows&quot;&gt;Apache Calcite&lt;/a&gt; and shows what will be possible in future Flink releases for both the Table API and SQL.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Slide&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;over&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hour&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;every&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;second&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;productId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;productId&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;unitPrice&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;rowtime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hour&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;rowtime&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;start&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;exp&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;1.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;hour&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sum&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;user-defined-table-functions&quot;&gt;User-defined Table Functions&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/table_api.html#user-defined-table-functions&quot;&gt;User-defined table functions&lt;/a&gt; were added in Flink 1.2. These can be quite useful for table columns containing non-atomic values which need to be extracted and mapped to separate fields before processing. Table functions take an arbitrary number of scalar values and allow for returning an arbitrary number of rows as output instead of a single value, similar to a flatMap function in the DataStream or DataSet API. The output of a table function can then be joined with the original row in the table by using either a left-outer join or cross join.&lt;/p&gt;
&lt;p&gt;Using the previously-mentioned customer table, let’s assume we want to produce a table that contains the color and size preferences as separate columns. The table program would look like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// create an instance of the table function&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;extractPrefs&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PropertiesExtractor&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// derive rows and join them with original row&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;table&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;join&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;extractPrefs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;prefs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;username&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;PropertiesExtractor&lt;/code&gt; is a user-defined table function that extracts the color and size. We are not interested in customers that haven’t set these preferences and thus don’t emit anything unless both properties are present in the string value. Since we are using a (cross) join in the program, customers without a result on the right side of the join are filtered out.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PropertiesExtractor&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TableFunction&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;eval&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;prefs&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Unit&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// split string into (key, value) pairs&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pairs&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;prefs&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;,&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;split&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;=&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;split&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;color&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pairs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;find&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(\&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;color&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(\&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pairs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;find&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(\&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_1&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;size&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(\&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;_&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.\&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;_2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// emit a row if color and size are specified&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;size&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;match&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Some&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Some&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;c&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;s&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;case&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;_&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&amp;gt;&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// skip&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;override&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getResultType&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;RowTypeInfo&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Types&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;STRING&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
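&lt;p&gt;If we wanted to keep customers without extractable preferences instead, the same table function could be applied with a left outer join, as mentioned above; &lt;code&gt;'color&lt;/code&gt; and &lt;code&gt;'size&lt;/code&gt; would then be null for those customers. A brief sketch:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// keep every customer; 'color and 'size are null whenever the
// function emits no row for the given preferences string
table
  .leftOuterJoin(extractPrefs('prefs) as ('color, 'size))
  .select('id, 'username, 'color, 'size)&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;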
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;There is significant interest in making streaming more accessible and easier to use. Flink’s Table API development is happening quickly, and we believe that soon, you will be able to implement large batch or streaming pipelines using purely relational APIs or even convert existing Flink jobs to table programs. The Table API is already a very useful tool since you can work around limitations and missing features at any time by switching back and forth between the DataSet/DataStream abstractions and the Table abstraction.&lt;/p&gt;
&lt;p&gt;Contributions like support for Apache Hive UDFs, external catalogs, more TableSources, additional windows, and more operators will make the Table API an even more useful tool. In particular, the upcoming introduction of Dynamic Tables, which is worth a blog post of its own, shows that even in 2017, new relational APIs open the door to a number of possibilities.&lt;/p&gt;
&lt;p&gt;Try it out, or even better, join the design discussions on the &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; and &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel&quot;&gt;JIRA&lt;/a&gt; and start contributing!&lt;/p&gt;
</description>
<pubDate>Wed, 29 Mar 2017 14:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2017/03/29/table-sql-api-update.html</link>
<guid isPermaLink="true">/news/2017/03/29/table-sql-api-update.html</guid>
</item>
<item>
<title>Apache Flink 1.1.5 Released</title>
<description>&lt;p&gt;The Apache Flink community released the next bugfix version of the Apache Flink 1.1 series.&lt;/p&gt;
&lt;p&gt;This release includes critical fixes for HA recovery robustness, the fault tolerance
guarantees of the Flink Kafka Connector, and classloading issues with the Kryo serializer.
We highly recommend that all users upgrade to Flink 1.1.5.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.5&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.5&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.5&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;release-notes---flink---version-115&quot;&gt;Release Notes - Flink - Version 1.1.5&lt;/h2&gt;
&lt;h3 id=&quot;bug&quot;&gt;Bug&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5701&quot;&gt;FLINK-5701&lt;/a&gt;] - FlinkKafkaProducer should check asyncException on checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6006&quot;&gt;FLINK-6006&lt;/a&gt;] - Kafka Consumer can lose state if queried partition list is incomplete on restore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5940&quot;&gt;FLINK-5940&lt;/a&gt;] - ZooKeeperCompletedCheckpointStore cannot handle broken state handles
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5942&quot;&gt;FLINK-5942&lt;/a&gt;] - Harden ZooKeeperStateHandleStore to deal with corrupted data
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-6025&quot;&gt;FLINK-6025&lt;/a&gt;] - User code ClassLoader not used when KryoSerializer fallbacks to serialization for copying
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5945&quot;&gt;FLINK-5945&lt;/a&gt;] - Close function in OuterJoinOperatorBase#executeOnCollections
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5934&quot;&gt;FLINK-5934&lt;/a&gt;] - Scheduler in ExecutionGraph null if failure happens in ExecutionGraph.restoreLatestCheckpointedState
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5771&quot;&gt;FLINK-5771&lt;/a&gt;] - DelimitedInputFormat does not correctly handle multi-byte delimiters
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5647&quot;&gt;FLINK-5647&lt;/a&gt;] - Fix RocksDB Backend Cleanup
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2662&quot;&gt;FLINK-2662&lt;/a&gt;] - CompilerException: &quot;Bug: Plan generation for Unions picked a ship strategy between binary plan operators.&quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5585&quot;&gt;FLINK-5585&lt;/a&gt;] - NullPointer Exception in JobManager.updateAccumulators
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5484&quot;&gt;FLINK-5484&lt;/a&gt;] - Add test for registered Kryo types
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5518&quot;&gt;FLINK-5518&lt;/a&gt;] - HadoopInputFormat throws NPE when close() is called before open()
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;improvement&quot;&gt;Improvement&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5575&quot;&gt;FLINK-5575&lt;/a&gt;] - in old releases, warn users and guide them to the latest stable docs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5639&quot;&gt;FLINK-5639&lt;/a&gt;] - Clarify License implications of RabbitMQ Connector
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5466&quot;&gt;FLINK-5466&lt;/a&gt;] - Make production environment default in gulpfile
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Thu, 23 Mar 2017 19:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2017/03/23/release-1.1.5.html</link>
<guid isPermaLink="true">/news/2017/03/23/release-1.1.5.html</guid>
</item>
<item>
<title>Announcing Apache Flink 1.2.0</title>
<description>&lt;p&gt;The Apache Flink community is pleased to announce the 1.2.0 release. Over the past months, the Flink community has been working hard to resolve 650 issues. See the &lt;a href=&quot;http://flink.apache.org/blog/release_1.2.0-changelog.html&quot;&gt;complete changelog&lt;/a&gt; for more detail.&lt;/p&gt;
&lt;p&gt;This is the third major release in the 1.x.y series. It is API compatible with the other 1.x.y releases for APIs annotated with the @Public annotation.&lt;/p&gt;
&lt;p&gt;We encourage everyone to download the release and check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/&quot;&gt;documentation&lt;/a&gt;. Feedback through the &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink mailing lists&lt;/a&gt; is, as always, gladly encouraged!&lt;/p&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;. Some highlights of the release are listed below.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#dynamic-scaling--key-groups&quot; id=&quot;markdown-toc-dynamic-scaling--key-groups&quot;&gt;Dynamic Scaling / Key Groups&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#rescalable-non-partitioned-state&quot; id=&quot;markdown-toc-rescalable-non-partitioned-state&quot;&gt;Rescalable Non-Partitioned State&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#processfunction&quot; id=&quot;markdown-toc-processfunction&quot;&gt;ProcessFunction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#async-io&quot; id=&quot;markdown-toc-async-io&quot;&gt;Async I/O&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#run-flink-with-apache-mesos&quot; id=&quot;markdown-toc-run-flink-with-apache-mesos&quot;&gt;Run Flink with Apache Mesos&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#secure-data-access&quot; id=&quot;markdown-toc-secure-data-access&quot;&gt;Secure Data Access&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#queryable-state&quot; id=&quot;markdown-toc-queryable-state&quot;&gt;Queryable State&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#backwards-compatible-savepoints&quot; id=&quot;markdown-toc-backwards-compatible-savepoints&quot;&gt;Backwards compatible savepoints&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#table-api--sql&quot; id=&quot;markdown-toc-table-api--sql&quot;&gt;Table API &amp;amp; SQL&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#miscellaneous-improvements&quot; id=&quot;markdown-toc-miscellaneous-improvements&quot;&gt;Miscellaneous improvements&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#list-of-contributors&quot; id=&quot;markdown-toc-list-of-contributors&quot;&gt;List of Contributors&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;dynamic-scaling--key-groups&quot;&gt;Dynamic Scaling / Key Groups&lt;/h2&gt;
&lt;p&gt;Flink now supports changing the parallelism of a streaming job by restoring it from a savepoint with a different parallelism. Changing both the parallelism of the entire job and the parallelism of individual operators is supported.
In the &lt;code&gt;StreamExecutionEnvironment&lt;/code&gt;, users can set a new per-job configuration parameter called “max parallelism”. It determines the upper limit for the parallelism.&lt;/p&gt;
&lt;p&gt;By default, the value is set to the following (see the sketch after this list):&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;128&lt;/code&gt; : for all parallelism &amp;lt;= 128&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MIN(nextPowerOfTwo(parallelism + (parallelism / 2)), 2^15)&lt;/code&gt;: for all parallelism &amp;gt; 128&lt;/li&gt;
&lt;/ul&gt;
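&lt;p&gt;For illustration, here is a small sketch of the default rule above (not Flink’s actual implementation). For example, a job with parallelism 200 gets a default max parallelism of 512:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;// sketch of the default rule described above, not Flink's actual code
def nextPowerOfTwo(x: Int): Int = {
  var p = 1
  while (p &amp;lt; x) p *= 2
  p
}

def defaultMaxParallelism(parallelism: Int): Int =
  if (parallelism &amp;lt;= 128) 128
  else math.min(nextPowerOfTwo(parallelism + parallelism / 2), 1 &amp;lt;&amp;lt; 15)

// defaultMaxParallelism(200) == 512  (nextPowerOfTwo(300) = 512)
// defaultMaxParallelism(64)  == 128&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;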
&lt;p&gt;The following built-in functions and operators support rescaling:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Window operator&lt;/li&gt;
&lt;li&gt;Rolling/Bucketing sink&lt;/li&gt;
&lt;li&gt;Kafka consumers&lt;/li&gt;
&lt;li&gt;Continuous File Processing source&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The write-ahead log Cassandra sink and the CEP operator are currently not rescalable. Users of the keyed state interfaces can make use of dynamic scaling without changing their code.&lt;/p&gt;
&lt;h2 id=&quot;rescalable-non-partitioned-state&quot;&gt;Rescalable Non-Partitioned State&lt;/h2&gt;
&lt;p&gt;As part of the dynamic scaling effort, the community has also added rescalable non-partitioned state for operators like the Kafka consumer that don’t use keyed state but instead use operator state.&lt;/p&gt;
&lt;p&gt;In case of rescaling, the operator state needs to be redistributed among the parallel consumer instances. In case of the Kafka consumer, the assigned partitions and their offsets are redistributed.&lt;/p&gt;
&lt;h2 id=&quot;processfunction&quot;&gt;ProcessFunction&lt;/h2&gt;
&lt;p&gt;The &lt;code&gt;ProcessFunction&lt;/code&gt; is a low-level stream processing operation giving access to the basic building blocks of all (acyclic) streaming applications:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Events (stream elements)&lt;/li&gt;
&lt;li&gt;State (fault tolerant, consistent)&lt;/li&gt;
&lt;li&gt;Timers (event time and processing time)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The ProcessFunction can be thought of as a FlatMapFunction with access to keyed state and timers.&lt;/p&gt;
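&lt;p&gt;To make this more concrete, here is a rough sketch of a keyed ProcessFunction; the &lt;code&gt;CountWithKey&lt;/code&gt; type, the one-minute timeout, and the field layout are illustrative and not taken from the release notes:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;import org.apache.flink.api.common.state.ValueStateDescriptor
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

// state kept per key: the key itself and a running count
case class CountWithKey(key: String, count: Long)

// counts events per key and emits the current count one minute (event time)
// after the last event for that key; must be applied to a KeyedStream,
// e.g. stream.keyBy(_._1).process(new CountWithTimeout)
class CountWithTimeout extends ProcessFunction[(String, Long), (String, Long)] {

  private lazy val state = getRuntimeContext.getState(
    new ValueStateDescriptor[CountWithKey](&amp;quot;count&amp;quot;, classOf[CountWithKey]))

  override def processElement(
      value: (String, Long),
      ctx: ProcessFunction[(String, Long), (String, Long)]#Context,
      out: Collector[(String, Long)]): Unit = {
    val current = Option(state.value()).getOrElse(CountWithKey(value._1, 0L))
    state.update(current.copy(count = current.count + 1))
    // register an event-time timer relative to this element's timestamp
    val ts: Long = ctx.timestamp()
    ctx.timerService().registerEventTimeTimer(ts + 60 * 1000)
  }

  override def onTimer(
      timestamp: Long,
      ctx: ProcessFunction[(String, Long), (String, Long)]#OnTimerContext,
      out: Collector[(String, Long)]): Unit = {
    val current = state.value()
    if (current != null) {
      out.collect((current.key, current.count))
    }
  }
}&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;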
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/process_function.html&quot;&gt;ProcessFunction documentation&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;async-io&quot;&gt;Async I/O&lt;/h2&gt;
&lt;p&gt;Flink now has a dedicated Async I/O operator for making blocking calls asynchronously and in a checkpointed fashion. For example, there are many Flink applications that need to query external datastores for each element in a stream. To avoid slowing down the stream to the speed of the external system, the async I/O operator allows requests to overlap.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/asyncio.html&quot;&gt;Async I/O documentation&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;run-flink-with-apache-mesos&quot;&gt;Run Flink with Apache Mesos&lt;/h2&gt;
&lt;p&gt;The latest release further extends Flink’s deployment flexibility by adding support for Apache Mesos and DC/OS. In combination with Marathon, it is now possible to run a highly available Flink cluster on Mesos.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/mesos.html&quot;&gt;Mesos documentation&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;secure-data-access&quot;&gt;Secure Data Access&lt;/h2&gt;
&lt;p&gt;Flink is now able to authenticate against external services such as Zookeeper, Kafka, HDFS and YARN using Kerberos.
Also, experimental support for encryption over the wire has been added.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/security-kerberos.html&quot;&gt;Kerberos documentation&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/setup/security-ssl.html&quot;&gt;SSL setup documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;queryable-state&quot;&gt;Queryable State&lt;/h2&gt;
&lt;p&gt;This experimental feature allows users to query the current state of an operator.
If you have, for example, a flatMap() operator that keeps a running aggregate per key, queryable state allows you to retrieve the current aggregate value at any time by directly connecting to the TaskManager and retrieving that value.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/queryable_state.html&quot;&gt;Queryable State documentation&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;backwards-compatible-savepoints&quot;&gt;Backwards compatible savepoints&lt;/h2&gt;
&lt;p&gt;Flink 1.2.0 allows users to restart a job from a 1.1.4 savepoint. This makes major Flink version upgrades possible without losing application state. The following built-in operators are backwards compatible:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Window operator&lt;/li&gt;
&lt;li&gt;Rolling/Bucketing sink&lt;/li&gt;
&lt;li&gt;Kafka consumers&lt;/li&gt;
&lt;li&gt;Continuous File Processing source&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/ops/upgrading.html&quot;&gt;Upgrading Flink applications documentation&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;table-api--sql&quot;&gt;Table API &amp;amp; SQL&lt;/h2&gt;
&lt;p&gt;This release significantly expanded the performance, stability, and coverage of Flink’s Table API and SQL support for batch and streaming tables.&lt;/p&gt;
&lt;p&gt;The community added tumbling, sliding, and session group-window aggregations over streaming tables
e.g. &lt;code&gt;table.window(Session withGap 10.minutes on &#39;rowtime as &#39;w)&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;SQL supports more built-in functions and operations
e.g. &lt;code&gt;EXISTS&lt;/code&gt;, &lt;code&gt;VALUES&lt;/code&gt;, &lt;code&gt;LIMIT&lt;/code&gt;, &lt;code&gt;CURRENT_DATE&lt;/code&gt;, &lt;code&gt;INITCAP&lt;/code&gt;, &lt;code&gt;NULLIF&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Both APIs support more data types and are better integrated
e.g. access a POJO field &lt;code&gt;myPojo.get(&#39;field&#39;)&lt;/code&gt;, &lt;code&gt;myPojo.flatten()&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;Users can now define their own scalar and table functions
e.g. &lt;code&gt;table.select(&#39;uid, parse(&#39;field) as &#39;parsed).join(split(&#39;parsed) as &#39;atom)&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/table_api.html&quot;&gt;Flink Table API &amp;amp; SQL documentation&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&quot;miscellaneous-improvements&quot;&gt;Miscellaneous improvements&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Metrics in Flink web interface: A metrics system was added in Flink 1.1, and with this release, Flink provides a new tab in the web frontend to see some of the metrics in the web UI.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Kafka 0.10 support: Flink 1.2 now provides a connector for Apache Kafka 0.10.0.x, including support for consuming and producing messages with a timestamp using Flink’s internal event time (&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/connectors/kafka.html&quot;&gt;Kafka Connector Documentation&lt;/a&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Evictor Semantics: Flink 1.2 ships with more expressive evictor semantics that allow the programmer to evict elements from a window both before and after the application of the window function, and to remove elements arbitrarily (&lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/windows.html#evictors&quot;&gt;Evictor Semantics Documentation&lt;/a&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;
&lt;p&gt;According to git shortlog, the following 122 people contributed to the 1.2.0 release. Thank you to all contributors!&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Abhishek R. Singh&lt;/li&gt;
&lt;li&gt;Ahmad Ragab&lt;/li&gt;
&lt;li&gt;Aleksandr Chermenin&lt;/li&gt;
&lt;li&gt;Alexander Pivovarov&lt;/li&gt;
&lt;li&gt;Alexander Shoshin&lt;/li&gt;
&lt;li&gt;Alexey Diomin&lt;/li&gt;
&lt;li&gt;Aljoscha Krettek&lt;/li&gt;
&lt;li&gt;Andrey Melentyev&lt;/li&gt;
&lt;li&gt;Anton Mushin&lt;/li&gt;
&lt;li&gt;Bob Thorman&lt;/li&gt;
&lt;li&gt;Boris Osipov&lt;/li&gt;
&lt;li&gt;Bram Vogelaar&lt;/li&gt;
&lt;li&gt;Bruno Aranda&lt;/li&gt;
&lt;li&gt;David Anderson&lt;/li&gt;
&lt;li&gt;Dominik&lt;/li&gt;
&lt;li&gt;Evgeny_Kincharov&lt;/li&gt;
&lt;li&gt;Fabian Hueske&lt;/li&gt;
&lt;li&gt;Fokko Driesprong&lt;/li&gt;
&lt;li&gt;Gabor Gevay&lt;/li&gt;
&lt;li&gt;George&lt;/li&gt;
&lt;li&gt;Gordon Tai&lt;/li&gt;
&lt;li&gt;Greg Hogan&lt;/li&gt;
&lt;li&gt;Gyula Fora&lt;/li&gt;
&lt;li&gt;Haohui Mai&lt;/li&gt;
&lt;li&gt;Holger Frydrych&lt;/li&gt;
&lt;li&gt;HungUnicorn&lt;/li&gt;
&lt;li&gt;Ismaël Mejía&lt;/li&gt;
&lt;li&gt;Ivan Mushketyk&lt;/li&gt;
&lt;li&gt;Jakub Havlik&lt;/li&gt;
&lt;li&gt;Jark Wu&lt;/li&gt;
&lt;li&gt;Jendrik Poloczek&lt;/li&gt;
&lt;li&gt;Jincheng Sun&lt;/li&gt;
&lt;li&gt;Josh&lt;/li&gt;
&lt;li&gt;Joshi&lt;/li&gt;
&lt;li&gt;Keiji Yoshida&lt;/li&gt;
&lt;li&gt;Kirill Morozov&lt;/li&gt;
&lt;li&gt;Kurt Young&lt;/li&gt;
&lt;li&gt;Liwei Lin&lt;/li&gt;
&lt;li&gt;Lorenz Buehmann&lt;/li&gt;
&lt;li&gt;Maciek Próchniak&lt;/li&gt;
&lt;li&gt;Makman2&lt;/li&gt;
&lt;li&gt;Markus Müller&lt;/li&gt;
&lt;li&gt;Martin Junghanns&lt;/li&gt;
&lt;li&gt;Márton Balassi&lt;/li&gt;
&lt;li&gt;Max Kuklinski&lt;/li&gt;
&lt;li&gt;Maximilian Michels&lt;/li&gt;
&lt;li&gt;Milosz Tanski&lt;/li&gt;
&lt;li&gt;Nagarjun&lt;/li&gt;
&lt;li&gt;Neelesh Srinivas Salian&lt;/li&gt;
&lt;li&gt;Neil Derraugh&lt;/li&gt;
&lt;li&gt;Nick Chadwick&lt;/li&gt;
&lt;li&gt;Nico Kruber&lt;/li&gt;
&lt;li&gt;Niels Basjes&lt;/li&gt;
&lt;li&gt;Pattarawat Chormai&lt;/li&gt;
&lt;li&gt;Piotr Godek&lt;/li&gt;
&lt;li&gt;Raghav&lt;/li&gt;
&lt;li&gt;Ramkrishna&lt;/li&gt;
&lt;li&gt;Robert Metzger&lt;/li&gt;
&lt;li&gt;Rohit Agarwal&lt;/li&gt;
&lt;li&gt;Roman Maier&lt;/li&gt;
&lt;li&gt;Sachin&lt;/li&gt;
&lt;li&gt;Sachin Goel&lt;/li&gt;
&lt;li&gt;Scott Kidder&lt;/li&gt;
&lt;li&gt;Shannon Carey&lt;/li&gt;
&lt;li&gt;Stefan Richter&lt;/li&gt;
&lt;li&gt;Steffen Hausmann&lt;/li&gt;
&lt;li&gt;Stephan Epping&lt;/li&gt;
&lt;li&gt;Stephan Ewen&lt;/li&gt;
&lt;li&gt;Sunny T&lt;/li&gt;
&lt;li&gt;Suri&lt;/li&gt;
&lt;li&gt;Theodore Vasiloudis&lt;/li&gt;
&lt;li&gt;Till Rohrmann&lt;/li&gt;
&lt;li&gt;Tony Wei&lt;/li&gt;
&lt;li&gt;Tzu-Li (Gordon) Tai&lt;/li&gt;
&lt;li&gt;Ufuk Celebi&lt;/li&gt;
&lt;li&gt;Vijay Srinivasaraghavan&lt;/li&gt;
&lt;li&gt;Vishnu Viswanath&lt;/li&gt;
&lt;li&gt;WangTaoTheTonic&lt;/li&gt;
&lt;li&gt;William-Sang&lt;/li&gt;
&lt;li&gt;Yassine Marzougui&lt;/li&gt;
&lt;li&gt;anton solovev&lt;/li&gt;
&lt;li&gt;beyond1920&lt;/li&gt;
&lt;li&gt;biao.liub&lt;/li&gt;
&lt;li&gt;chobeat&lt;/li&gt;
&lt;li&gt;danielblazevski&lt;/li&gt;
&lt;li&gt;f7753&lt;/li&gt;
&lt;li&gt;fengyelei&lt;/li&gt;
&lt;li&gt;fengyelei 00406569&lt;/li&gt;
&lt;li&gt;gallenvara&lt;/li&gt;
&lt;li&gt;gaolun.gl&lt;/li&gt;
&lt;li&gt;godfreyhe&lt;/li&gt;
&lt;li&gt;heytitle&lt;/li&gt;
&lt;li&gt;hzyuemeng1&lt;/li&gt;
&lt;li&gt;iteblog&lt;/li&gt;
&lt;li&gt;kl0u&lt;/li&gt;
&lt;li&gt;larsbachmann&lt;/li&gt;
&lt;li&gt;lincoln-lil&lt;/li&gt;
&lt;li&gt;manuzhang&lt;/li&gt;
&lt;li&gt;medale&lt;/li&gt;
&lt;li&gt;miaoever&lt;/li&gt;
&lt;li&gt;mtunique&lt;/li&gt;
&lt;li&gt;radekg&lt;/li&gt;
&lt;li&gt;renkai&lt;/li&gt;
&lt;li&gt;sergey_sokur&lt;/li&gt;
&lt;li&gt;shijinkui&lt;/li&gt;
&lt;li&gt;shuai.xus&lt;/li&gt;
&lt;li&gt;smarthi&lt;/li&gt;
&lt;li&gt;swapnil-chougule&lt;/li&gt;
&lt;li&gt;tedyu&lt;/li&gt;
&lt;li&gt;tibor.moger&lt;/li&gt;
&lt;li&gt;tonycox&lt;/li&gt;
&lt;li&gt;twalthr&lt;/li&gt;
&lt;li&gt;vasia&lt;/li&gt;
&lt;li&gt;wenlong.lwl&lt;/li&gt;
&lt;li&gt;wrighe3&lt;/li&gt;
&lt;li&gt;xiaogang.sxg&lt;/li&gt;
&lt;li&gt;yushi.wxg&lt;/li&gt;
&lt;li&gt;yuzhongliu&lt;/li&gt;
&lt;li&gt;zentol&lt;/li&gt;
&lt;li&gt;zhuhaifengleon&lt;/li&gt;
&lt;li&gt;淘江&lt;/li&gt;
&lt;li&gt;魏偉哲&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Mon, 06 Feb 2017 13:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2017/02/06/release-1.2.0.html</link>
<guid isPermaLink="true">/news/2017/02/06/release-1.2.0.html</guid>
</item>
<item>
<title>Apache Flink 1.1.4 Released</title>
<description>&lt;p&gt;The Apache Flink community released the next bugfix version of the Apache Flink 1.1 series.&lt;/p&gt;
&lt;p&gt;This release includes major robustness improvements for checkpoint cleanup on failures and the consumption of intermediate streams. We highly recommend that all users upgrade to Flink 1.1.4.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.4&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;note-for-rocksdb-backend-users&quot;&gt;Note for RocksDB Backend Users&lt;/h2&gt;
&lt;p&gt;We updated Flink’s RocksDB dependency version from &lt;code&gt;4.5.1&lt;/code&gt; to &lt;code&gt;4.11.2&lt;/code&gt;. Between these versions, some of RocksDB’s internal configuration defaults changed in ways that affect the memory footprint of running Flink with RocksDB. Therefore, we manually reset them to the previous defaults. If you want to run with the new RocksDB 4.11.2 defaults, you can do this via:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;RocksDBStateBackend&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;backend&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;RocksDBStateBackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// Use the new default options. Otherwise, the default for RocksDB 4.5.1&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// `PredefinedOptions.DEFAULT_ROCKS_4_5_1` will be used.&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;backend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;setPredefinedOptions&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;PredefinedOptions&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;DEFAULT&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;release-notes---flink---version-114&quot;&gt;Release Notes - Flink - Version 1.1.4&lt;/h2&gt;
&lt;h3 id=&quot;sub-task&quot;&gt;Sub-task&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4510&quot;&gt;FLINK-4510&lt;/a&gt;] - Always create CheckpointCoordinator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4984&quot;&gt;FLINK-4984&lt;/a&gt;] - Add Cancellation Barriers to BarrierTracker and BarrierBuffer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4985&quot;&gt;FLINK-4985&lt;/a&gt;] - Report Declined/Canceled Checkpoints to Checkpoint Coordinator
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;bug&quot;&gt;Bug&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2662&quot;&gt;FLINK-2662&lt;/a&gt;] - CompilerException: &amp;quot;Bug: Plan generation for Unions picked a ship strategy between binary plan operators.&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3680&quot;&gt;FLINK-3680&lt;/a&gt;] - Remove or improve (not set) text in the Job Plan UI
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3813&quot;&gt;FLINK-3813&lt;/a&gt;] - YARNSessionFIFOITCase.testDetachedMode failed on Travis
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4108&quot;&gt;FLINK-4108&lt;/a&gt;] - NPE in Row.productArity
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4506&quot;&gt;FLINK-4506&lt;/a&gt;] - CsvOutputFormat defaults allowNullValues to false, even though doc and declaration says true
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4581&quot;&gt;FLINK-4581&lt;/a&gt;] - Table API throws &amp;quot;No suitable driver found for jdbc:calcite&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4586&quot;&gt;FLINK-4586&lt;/a&gt;] - NumberSequenceIterator and Accumulator threading issue
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4619&quot;&gt;FLINK-4619&lt;/a&gt;] - JobManager does not answer to client when restore from savepoint fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4727&quot;&gt;FLINK-4727&lt;/a&gt;] - Kafka 0.9 Consumer should also checkpoint auto retrieved offsets even when no data is read
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4862&quot;&gt;FLINK-4862&lt;/a&gt;] - NPE on EventTimeSessionWindows with ContinuousEventTimeTrigger
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4932&quot;&gt;FLINK-4932&lt;/a&gt;] - Don&amp;#39;t let ExecutionGraph fail when in state Restarting
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4933&quot;&gt;FLINK-4933&lt;/a&gt;] - ExecutionGraph.scheduleOrUpdateConsumers can fail the ExecutionGraph
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4977&quot;&gt;FLINK-4977&lt;/a&gt;] - Enum serialization does not work in all cases
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4991&quot;&gt;FLINK-4991&lt;/a&gt;] - TestTask hangs in testWatchDogInterruptsTask
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4998&quot;&gt;FLINK-4998&lt;/a&gt;] - ResourceManager fails when num task slots &amp;gt; Yarn vcores
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5013&quot;&gt;FLINK-5013&lt;/a&gt;] - Flink Kinesis connector doesn&amp;#39;t work on old EMR versions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5028&quot;&gt;FLINK-5028&lt;/a&gt;] - Stream Tasks must not go through clean shutdown logic on cancellation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5038&quot;&gt;FLINK-5038&lt;/a&gt;] - Errors in the &amp;quot;cancelTask&amp;quot; method prevent closeables from being closed early
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5039&quot;&gt;FLINK-5039&lt;/a&gt;] - Avro GenericRecord support is broken
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5040&quot;&gt;FLINK-5040&lt;/a&gt;] - Set correct input channel types with eager scheduling
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5050&quot;&gt;FLINK-5050&lt;/a&gt;] - JSON.org license is CatX
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5057&quot;&gt;FLINK-5057&lt;/a&gt;] - Cancellation timeouts are picked from wrong config
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5058&quot;&gt;FLINK-5058&lt;/a&gt;] - taskManagerMemory attribute set wrong value in FlinkShell
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5063&quot;&gt;FLINK-5063&lt;/a&gt;] - State handles are not properly cleaned up for declined or expired checkpoints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5073&quot;&gt;FLINK-5073&lt;/a&gt;] - ZooKeeperCompleteCheckpointStore executes blocking delete operation in ZooKeeper client thread
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5075&quot;&gt;FLINK-5075&lt;/a&gt;] - Kinesis consumer incorrectly determines shards as newly discovered when tested against Kinesalite
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5082&quot;&gt;FLINK-5082&lt;/a&gt;] - Pull ExecutionService lifecycle management out of the JobManager
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5085&quot;&gt;FLINK-5085&lt;/a&gt;] - Execute CheckpointCoodinator&amp;#39;s state discard calls asynchronously
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5114&quot;&gt;FLINK-5114&lt;/a&gt;] - PartitionState update with finished execution fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5142&quot;&gt;FLINK-5142&lt;/a&gt;] - Resource leak in CheckpointCoordinator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5149&quot;&gt;FLINK-5149&lt;/a&gt;] - ContinuousEventTimeTrigger doesn&amp;#39;t fire at the end of the window
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5154&quot;&gt;FLINK-5154&lt;/a&gt;] - Duplicate TypeSerializer when writing RocksDB Snapshot
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5158&quot;&gt;FLINK-5158&lt;/a&gt;] - Handle ZooKeeperCompletedCheckpointStore exceptions in CheckpointCoordinator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5172&quot;&gt;FLINK-5172&lt;/a&gt;] - In RocksDBStateBackend, set flink-core and flink-streaming-java to &amp;quot;provided&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5173&quot;&gt;FLINK-5173&lt;/a&gt;] - Upgrade RocksDB dependency
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5184&quot;&gt;FLINK-5184&lt;/a&gt;] - Error result of compareSerialized in RowComparator class
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5193&quot;&gt;FLINK-5193&lt;/a&gt;] - Recovering all jobs fails completely if a single recovery fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5197&quot;&gt;FLINK-5197&lt;/a&gt;] - Late JobStatusChanged messages can interfere with running jobs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5214&quot;&gt;FLINK-5214&lt;/a&gt;] - Clean up checkpoint files when failing checkpoint operation on TM
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5215&quot;&gt;FLINK-5215&lt;/a&gt;] - Close checkpoint streams upon cancellation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5216&quot;&gt;FLINK-5216&lt;/a&gt;] - CheckpointCoordinator&amp;#39;s &amp;#39;minPauseBetweenCheckpoints&amp;#39; refers to checkpoint start rather then checkpoint completion
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5218&quot;&gt;FLINK-5218&lt;/a&gt;] - Eagerly close checkpoint streams on cancellation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5228&quot;&gt;FLINK-5228&lt;/a&gt;] - LocalInputChannel re-trigger request and release deadlock
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5229&quot;&gt;FLINK-5229&lt;/a&gt;] - Cleanup StreamTaskStates if a checkpoint operation of a subsequent operator fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5246&quot;&gt;FLINK-5246&lt;/a&gt;] - Don&amp;#39;t discard unknown checkpoint messages in the CheckpointCoordinator
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5248&quot;&gt;FLINK-5248&lt;/a&gt;] - SavepointITCase doesn&amp;#39;t catch savepoint restore failure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5274&quot;&gt;FLINK-5274&lt;/a&gt;] - LocalInputChannel throws NPE if partition reader is released
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5275&quot;&gt;FLINK-5275&lt;/a&gt;] - InputChanelDeploymentDescriptors throws misleading Exception if producer failed/cancelled
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5276&quot;&gt;FLINK-5276&lt;/a&gt;] - ExecutionVertex archiving can throw NPE with many previous attempts
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5285&quot;&gt;FLINK-5285&lt;/a&gt;] - CancelCheckpointMarker flood when using at least once mode
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5326&quot;&gt;FLINK-5326&lt;/a&gt;] - IllegalStateException: Bug in Netty consumer logic: reader queue got notified by partition about available data, but none was available
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5352&quot;&gt;FLINK-5352&lt;/a&gt;] - Restore RocksDB 1.1.3 memory behavior
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;improvement&quot;&gt;Improvement&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3347&quot;&gt;FLINK-3347&lt;/a&gt;] - TaskManager (or its ActorSystem) need to restart in case they notice quarantine
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3787&quot;&gt;FLINK-3787&lt;/a&gt;] - Yarn client does not report unfulfillable container constraints
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4445&quot;&gt;FLINK-4445&lt;/a&gt;] - Ignore unmatched state when restoring from savepoint
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4715&quot;&gt;FLINK-4715&lt;/a&gt;] - TaskManager should commit suicide after cancellation failure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4894&quot;&gt;FLINK-4894&lt;/a&gt;] - Don&amp;#39;t block on buffer request after broadcastEvent
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4975&quot;&gt;FLINK-4975&lt;/a&gt;] - Add a limit for how much data may be buffered during checkpoint alignment
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4996&quot;&gt;FLINK-4996&lt;/a&gt;] - Make CrossHint @Public
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5046&quot;&gt;FLINK-5046&lt;/a&gt;] - Avoid redundant serialization when creating the TaskDeploymentDescriptor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5123&quot;&gt;FLINK-5123&lt;/a&gt;] - Add description how to do proper shading to Flink docs.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5169&quot;&gt;FLINK-5169&lt;/a&gt;] - Make consumption of input channels fair
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5192&quot;&gt;FLINK-5192&lt;/a&gt;] - Provide better log config templates
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5194&quot;&gt;FLINK-5194&lt;/a&gt;] - Log heartbeats on TRACE level
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5196&quot;&gt;FLINK-5196&lt;/a&gt;] - Don&amp;#39;t log InputChannelDescriptor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5198&quot;&gt;FLINK-5198&lt;/a&gt;] - Overwrite TaskState toString
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5199&quot;&gt;FLINK-5199&lt;/a&gt;] - Improve logging of submitted job graph actions in HA case
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5201&quot;&gt;FLINK-5201&lt;/a&gt;] - Promote loaded config properties to INFO
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5207&quot;&gt;FLINK-5207&lt;/a&gt;] - Decrease HadoopFileSystem logging
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5249&quot;&gt;FLINK-5249&lt;/a&gt;] - description of datastream rescaling doesn&amp;#39;t match the figure
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5259&quot;&gt;FLINK-5259&lt;/a&gt;] - wrong execution environment in retry delays example
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-5278&quot;&gt;FLINK-5278&lt;/a&gt;] - Improve Task and checkpoint logging
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;new-feature&quot;&gt;New Feature&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4976&quot;&gt;FLINK-4976&lt;/a&gt;] - Add a way to abort in flight checkpoints
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;task&quot;&gt;Task&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4778&quot;&gt;FLINK-4778&lt;/a&gt;] - Update program example in /docs/setup/cli.md due to the change in FLINK-2021
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 21 Dec 2016 10:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2016/12/21/release-1.1.4.html</link>
<guid isPermaLink="true">/news/2016/12/21/release-1.1.4.html</guid>
</item>
<item>
<title>Apache Flink in 2016: Year in Review</title>
<description>&lt;p&gt;2016 was an exciting year for the Apache Flink® community, and the
&lt;a href=&quot;http://flink.apache.org/news/2016/03/08/release-1.0.0.html&quot; target=&quot;_blank&quot;&gt;release of Flink 1.0 in March&lt;/a&gt;
marked the first time in Flink’s history that the community guaranteed API backward compatibility for all
versions in a series. This step forward for Flink was followed by many new and exciting production deployments
in organizations of all shapes and sizes, all around the globe.&lt;/p&gt;
&lt;p&gt;In this post, we’ll look back on the project’s progress over the course of 2016, and
we’ll also preview what 2017 has in store.&lt;/p&gt;
&lt;div class=&quot;page-toc&quot;&gt;
&lt;ul id=&quot;markdown-toc&quot;&gt;
&lt;li&gt;&lt;a href=&quot;#community-growth&quot; id=&quot;markdown-toc-community-growth&quot;&gt;Community Growth&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#github&quot; id=&quot;markdown-toc-github&quot;&gt;Github&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#meetups&quot; id=&quot;markdown-toc-meetups&quot;&gt;Meetups&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#flink-forward-2016&quot; id=&quot;markdown-toc-flink-forward-2016&quot;&gt;Flink Forward 2016&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#features-and-ecosystem&quot; id=&quot;markdown-toc-features-and-ecosystem&quot;&gt;Features and Ecosystem&lt;/a&gt; &lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;#flink-ecosystem-growth&quot; id=&quot;markdown-toc-flink-ecosystem-growth&quot;&gt;Flink Ecosystem Growth&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#feature-timeline-in-2016&quot; id=&quot;markdown-toc-feature-timeline-in-2016&quot;&gt;Feature Timeline in 2016&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;#looking-ahead-to-2017&quot; id=&quot;markdown-toc-looking-ahead-to-2017&quot;&gt;Looking ahead to 2017&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;h2 id=&quot;community-growth&quot;&gt;Community Growth&lt;/h2&gt;
&lt;h3 id=&quot;github&quot;&gt;Github&lt;/h3&gt;
&lt;p&gt;First, here’s a summary of community statistics from &lt;a href=&quot;https://github.com/apache/flink&quot; target=&quot;_blank&quot;&gt;GitHub&lt;/a&gt;. At the time of writing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;b&gt;Contributors&lt;/b&gt; have increased from 150 in December 2015 to 258 in December 2016 (up &lt;b&gt;72%&lt;/b&gt;)&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Stars&lt;/b&gt; have increased from 813 in December 2015 to 1830 in December 2016 (up &lt;b&gt;125%&lt;/b&gt;)&lt;/li&gt;
&lt;li&gt;&lt;b&gt;Forks&lt;/b&gt; have increased from 544 in December 2015 to 1255 in December 2016 (up &lt;b&gt;130%&lt;/b&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The community also welcomed &lt;b&gt;3 new committers in 2016&lt;/b&gt;: Chengxiang Li, Greg Hogan, and Tzu-Li (Gordon) Tai.&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;img src=&quot;/img/blog/github-stats-2016.png&quot; width=&quot;775&quot; alt=&quot;Apache Flink GitHub Stats&quot; /&gt;
&lt;br /&gt;
&lt;br /&gt;&lt;/p&gt;
&lt;p&gt;Next, let’s take a look at a few other project stats, starting with the number of commits. If we run:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;git log --pretty=oneline --after=12/31/2015 | wc -l
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;…inside the Flink repository, we’ll see a total of &lt;strong&gt;1,884&lt;/strong&gt; commits so far in 2016, bringing the all-time total commits to &lt;strong&gt;10,015&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;Now, let’s go a bit deeper. Here are instructions in case you’d like to take a look at this data yourself.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Download gitstats from the &lt;a href=&quot;http://gitstats.sourceforge.net/&quot;&gt;project homepage&lt;/a&gt;. Or, on OS X with homebrew, type:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;brew install --HEAD homebrew/head-only/gitstats
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Clone the Apache Flink git repository:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;git clone git@github.com:apache/flink.git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;Generate the statistics&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;gitstats flink/ flink-stats/
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;ul&gt;
&lt;li&gt;View all the statistics as an HTML page in your default browser:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code&gt;open flink-stats/index.html
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;2016 is the year that Flink surpassed 1 million lines of code, now clocking in at &lt;strong&gt;1,034,137&lt;/strong&gt; lines.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/blog/flink-lines-of-code-2016.png&quot; align=&quot;center&quot; width=&quot;550&quot; alt=&quot;Flink Total Lines of Code&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Monday remains the day of the week with the most commits over the project’s history:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/blog/flink-dow-2016.png&quot; align=&quot;center&quot; width=&quot;550&quot; alt=&quot;Flink Commits by Day of Week&quot; /&gt;&lt;/p&gt;
&lt;p&gt;And 5pm is still solidly the preferred commit time:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/blog/flink-hod-2016.png&quot; align=&quot;center&quot; width=&quot;550&quot; alt=&quot;Flink Commits by Hour of Day&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;br /&gt;&lt;/p&gt;
&lt;h3 id=&quot;meetups&quot;&gt;Meetups&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://www.meetup.com/topics/apache-flink/&quot; target=&quot;_blank&quot;&gt;Apache Flink Meetup membership&lt;/a&gt; grew by &lt;b&gt;240%&lt;/b&gt;
this year, and at the time of writing, there are 41 meetups comprising 16,541 members that list Flink as a topic, up from 16 groups with 4,864 members in December 2015.
The Flink community is proud to be truly global in nature.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/blog/flink-meetups-dec2016.png&quot; width=&quot;775&quot; alt=&quot;Apache Flink Meetup Map&quot; /&gt;&lt;/p&gt;
&lt;h2 id=&quot;flink-forward-2016&quot;&gt;Flink Forward 2016&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;http://2016.flink-forward.org/&quot; target=&quot;_blank&quot;&gt;second annual Flink Forward conference &lt;/a&gt;took place in
Berlin on September 12-14, and over 350 members of the Flink community came together for speaker sessions, training,
and discussion about Flink. &lt;a href=&quot;http://2016.flink-forward.org/program/sessions/&quot; target=&quot;_blank&quot;&gt;Slides and videos&lt;/a&gt;
from speaker sessions are available online, and we encourage you to take a look if you’re interested in learning more
about how Flink is used in production in a wide range of organizations.&lt;/p&gt;
&lt;p&gt;Flink Forward will be expanding to &lt;a href=&quot;http://sf.flink-forward.org/&quot; target=&quot;_blank&quot;&gt;San Francisco in April 2017&lt;/a&gt;, and the &lt;a href=&quot;http://berlin.flink-forward.org/&quot; target=&quot;_blank&quot;&gt;third-annual Berlin event
is scheduled for September 2017.&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/blog/speaker-logos-ff2016.png&quot; width=&quot;775&quot; alt=&quot;Flink Forward Speakers&quot; /&gt;&lt;/p&gt;
&lt;h2 id=&quot;features-and-ecosystem&quot;&gt;Features and Ecosystem&lt;/h2&gt;
&lt;h3 id=&quot;flink-ecosystem-growth&quot;&gt;Flink Ecosystem Growth&lt;/h3&gt;
&lt;p&gt;Flink was added to a selection of distributions during 2016, making it easier
for an even larger base of users to start working with Flink:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://aws.amazon.com/blogs/big-data/use-apache-flink-on-amazon-emr/&quot; target=&quot;_blank&quot;&gt;
Amazon EMR&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://cloud.google.com/dataproc/docs/release-notes/service#november_29_2016&quot; target=&quot;_blank&quot;&gt;
Google Cloud Dataproc&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.lightbend.com/blog/introducing-lightbend-fast-data-platform&quot; target=&quot;_blank&quot;&gt;
Lightbend Fast Data Platform&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In addition, the Apache Beam and Flink communities teamed up to build a Flink runner for Beam that, according to the Google team, is &lt;a href=&quot;https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective&quot; target=&quot;_blank&quot;&gt;“sophisticated enough to be a compelling alternative to Cloud Dataflow when running on premise or on non-Google clouds”&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;feature-timeline-in-2016&quot;&gt;Feature Timeline in 2016&lt;/h3&gt;
&lt;p&gt;Here’s a selection of major features added to Flink over the course of 2016:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;/img/blog/flink-releases-2016.png&quot; width=&quot;775&quot; alt=&quot;Flink Release Timeline 2016&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If you spend time in the &lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4554?jql=project%20%3D%20FLINK%20AND%20issuetype%20%3D%20%22New%20Feature%22%20AND%20status%20%3D%20Resolved%20ORDER%20BY%20resolved%20DESC&quot; target=&quot;_blank&quot;&gt;Apache Flink JIRA project&lt;/a&gt;, you’ll see that the Flink community has addressed every single one of the roadmap items identified
in &lt;a href=&quot;http://flink.apache.org/news/2015/12/18/a-year-in-review.html&quot; target=&quot;_blank&quot;&gt;2015’s year in review post&lt;/a&gt;. Here’s to making that an annual tradition. :)&lt;/p&gt;
&lt;h2 id=&quot;looking-ahead-to-2017&quot;&gt;Looking ahead to 2017&lt;/h2&gt;
&lt;p&gt;A good source of information about the Flink community’s roadmap is the list of
&lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals&quot; target=&quot;_blank&quot;&gt;Flink
Improvement Proposals (FLIPs)&lt;/a&gt; in the project wiki. Below, we’ll highlight a selection of FLIPs
that have been accepted by the community as well as some that are still under discussion.&lt;/p&gt;
&lt;p&gt;We should note that work is already underway on a number of these features, and some will even be included in Flink 1.2 at the beginning of 2017.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;A new Flink deployment and process model&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077&quot; target=&quot;_blank&quot;&gt;FLIP-6&lt;/a&gt;. This work ensures that Flink supports a wide
range of deployment types and cluster managers, making it possible to run Flink smoothly in any environment.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Dynamic scaling&lt;/strong&gt; for both key-value state &lt;a href=&quot;https://github.com/apache/flink/pull/2440&quot; target=&quot;_blank&quot;&gt;(as described in
this PR)&lt;/a&gt; &lt;em&gt;and&lt;/em&gt; non-partitioned state &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-8%3A+Rescalable+Non-Partitioned+State&quot; target=&quot;_blank&quot;&gt;(as described in FLIP-8)&lt;/a&gt;, ensuring that it’s always possible to split or merge state when scaling up or down, respectively.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Asynchronous I/O&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673&quot; target=&quot;_blank&quot;&gt;FLIP-12&lt;/a&gt;, which makes I/O access a less time-consuming process without adding complexity or the need for extra checkpoint coordination.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Enhancements to the window evictor&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-4+%3A+Enhance+Window+Evictor&quot; target=&quot;_blank&quot;&gt;FLIP-4&lt;/a&gt;,
to provide users with more control over how elements are evicted from a window.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Fine-grained recovery from task failures&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-1+%3A+Fine+Grained+Recovery+from+Task+Failures&quot; target=&quot;_blank&quot;&gt;FLIP-1&lt;/a&gt;,
to make it possible to restart only what needs to be restarted during recovery, building on cached intermediate results.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Unified checkpoints and savepoints&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-10%3A+Unify+Checkpoints+and+Savepoints&quot; target=&quot;_blank&quot;&gt;FLIP-10&lt;/a&gt;, to
allow savepoints to be triggered automatically. This matters for error handling during program updates, because savepoints allow the user to modify both
the job and the Flink version, whereas checkpoints can only be recovered with the same job.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Table API window aggregations&lt;/strong&gt;, as described in &lt;a href=&quot;https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations&quot; target=&quot;_blank&quot;&gt;FLIP-11&lt;/a&gt;, to support group-window and row-window aggregates on streaming and batch tables.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Side inputs&lt;/strong&gt;, as described in &lt;a href=&quot;https://docs.google.com/document/d/1hIgxi2Zchww_5fWUHLoYiXwSBXjv-M5eOv-MKQYN3m4/edit&quot; target=&quot;_blank&quot;&gt;this design document&lt;/a&gt;, to
enable joining a main, high-throughput stream with one or more inputs containing static or slowly-changing data.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you’re interested in getting involved with Flink, we encourage you to take a look at the FLIPs and to join the discussion via the &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;Flink mailing lists&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Lastly, we’d like to extend a sincere thank you to all of the Flink community for making 2016 a great year!&lt;/p&gt;
</description>
<pubDate>Mon, 19 Dec 2016 10:00:00 +0100</pubDate>
<link>https://flink.apache.org/news/2016/12/19/2016-year-in-review.html</link>
<guid isPermaLink="true">/news/2016/12/19/2016-year-in-review.html</guid>
</item>
<item>
<title>Apache Flink 1.1.3 Released</title>
<description>&lt;p&gt;The Apache Flink community released the next bugfix version of the Apache Flink 1.1 series.&lt;/p&gt;
&lt;p&gt;We recommend that all users upgrade to Flink 1.1.3.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.3&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;note-for-rocksdb-backend-users&quot;&gt;Note for RocksDB Backend Users&lt;/h2&gt;
&lt;p&gt;It is highly recommended to use the “fully async” mode for the RocksDB state backend. The “fully async” mode will most likely allow you to easily upgrade to Flink 1.2 (via &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/savepoints.html&quot;&gt;savepoints&lt;/a&gt;) when it is released. The “semi async” mode will no longer be supported by Flink 1.2.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;RocksDBStateBackend&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;backend&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;RocksDBStateBackend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;backend&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;enableFullyAsyncSnapshots&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;();&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;release-notes---flink---version-113&quot;&gt;Release Notes - Flink - Version 1.1.3&lt;/h2&gt;
&lt;h2&gt;Bug&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2662&quot;&gt;FLINK-2662&lt;/a&gt;] - CompilerException: &amp;quot;Bug: Plan generation for Unions picked a ship strategy between binary plan operators.&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4311&quot;&gt;FLINK-4311&lt;/a&gt;] - TableInputFormat fails when reused on next split
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4329&quot;&gt;FLINK-4329&lt;/a&gt;] - Fix Streaming File Source Timestamps/Watermarks Handling
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4485&quot;&gt;FLINK-4485&lt;/a&gt;] - Finished jobs in yarn session fill /tmp filesystem
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4513&quot;&gt;FLINK-4513&lt;/a&gt;] - Kafka connector documentation refers to Flink 1.1-SNAPSHOT
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4514&quot;&gt;FLINK-4514&lt;/a&gt;] - ExpiredIteratorException in Kinesis Consumer on long catch-ups to head of stream
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4540&quot;&gt;FLINK-4540&lt;/a&gt;] - Detached job execution may prevent cluster shutdown
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4544&quot;&gt;FLINK-4544&lt;/a&gt;] - TaskManager metrics are vulnerable to custom JMX bean installation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4566&quot;&gt;FLINK-4566&lt;/a&gt;] - ProducerFailedException does not properly preserve Exception causes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4588&quot;&gt;FLINK-4588&lt;/a&gt;] - Fix Merging of Covering Window in MergingWindowSet
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4589&quot;&gt;FLINK-4589&lt;/a&gt;] - Fix Merging of Covering Window in MergingWindowSet
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4616&quot;&gt;FLINK-4616&lt;/a&gt;] - Kafka consumer doesn&amp;#39;t store last emitted watermarks per partition in state
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4618&quot;&gt;FLINK-4618&lt;/a&gt;] - FlinkKafkaConsumer09 should start from the next record on startup from offsets in Kafka
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4619&quot;&gt;FLINK-4619&lt;/a&gt;] - JobManager does not answer to client when restore from savepoint fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4636&quot;&gt;FLINK-4636&lt;/a&gt;] - AbstractCEPPatternOperator fails to restore state
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4640&quot;&gt;FLINK-4640&lt;/a&gt;] - Serialization of the initialValue of a Fold on WindowedStream fails
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4651&quot;&gt;FLINK-4651&lt;/a&gt;] - Re-register processing time timers at the WindowOperator upon recovery.
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4663&quot;&gt;FLINK-4663&lt;/a&gt;] - Flink JDBCOutputFormat logs wrong WARN message
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4672&quot;&gt;FLINK-4672&lt;/a&gt;] - TaskManager accidentally decorates Kill messages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4677&quot;&gt;FLINK-4677&lt;/a&gt;] - Jars with no job executions produces NullPointerException in ClusterClient
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4702&quot;&gt;FLINK-4702&lt;/a&gt;] - Kafka consumer must commit offsets asynchronously
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4727&quot;&gt;FLINK-4727&lt;/a&gt;] - Kafka 0.9 Consumer should also checkpoint auto retrieved offsets even when no data is read
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4732&quot;&gt;FLINK-4732&lt;/a&gt;] - Maven junction plugin security threat
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4777&quot;&gt;FLINK-4777&lt;/a&gt;] - ContinuousFileMonitoringFunction may throw IOException when files are moved
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4788&quot;&gt;FLINK-4788&lt;/a&gt;] - State backend class cannot be loaded, because fully qualified name converted to lower-case
&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;Improvement&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4396&quot;&gt;FLINK-4396&lt;/a&gt;] - GraphiteReporter class not found at startup of jobmanager
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4574&quot;&gt;FLINK-4574&lt;/a&gt;] - Strengthen fetch interval implementation in Kinesis consumer
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4723&quot;&gt;FLINK-4723&lt;/a&gt;] - Unify behaviour of committed offsets to Kafka / ZK for Kafka 0.8 and 0.9 consumer
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 12 Oct 2016 11:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/10/12/release-1.1.3.html</link>
<guid isPermaLink="true">/news/2016/10/12/release-1.1.3.html</guid>
</item>
<item>
<title>Apache Flink 1.1.2 Released</title>
<description>&lt;p&gt;The Apache Flink community released another bugfix version of the Apache Flink 1.1 series.&lt;/p&gt;
&lt;p&gt;We recommend that all users upgrade to Flink 1.1.2.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.2&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;h2&gt;Release Notes - Flink - Version 1.1.2&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4236&quot;&gt;FLINK-4236&lt;/a&gt;] - Flink Dashboard stops showing list of uploaded jars if main method cannot be looked up
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4309&quot;&gt;FLINK-4309&lt;/a&gt;] - Potential null pointer dereference in DelegatingConfiguration#keySet()
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4334&quot;&gt;FLINK-4334&lt;/a&gt;] - Shaded Hadoop1 jar not fully excluded in Quickstart
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4341&quot;&gt;FLINK-4341&lt;/a&gt;] - Kinesis connector does not emit maximum watermark properly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4402&quot;&gt;FLINK-4402&lt;/a&gt;] - Wrong metrics parameter names in documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4409&quot;&gt;FLINK-4409&lt;/a&gt;] - class conflict between jsr305-1.3.9.jar and flink-shaded-hadoop2-1.1.1.jar
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4411&quot;&gt;FLINK-4411&lt;/a&gt;] - [py] Chained dual input children are not properly propagated
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4412&quot;&gt;FLINK-4412&lt;/a&gt;] - [py] Chaining does not properly handle broadcast variables
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4425&quot;&gt;FLINK-4425&lt;/a&gt;] - &amp;quot;Out Of Memory&amp;quot; during savepoint deserialization
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4454&quot;&gt;FLINK-4454&lt;/a&gt;] - Lookups for JobManager address in config
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4480&quot;&gt;FLINK-4480&lt;/a&gt;] - Incorrect link to elastic.co in documentation
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4486&quot;&gt;FLINK-4486&lt;/a&gt;] - JobManager not fully running when yarn-session.sh finishes
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4488&quot;&gt;FLINK-4488&lt;/a&gt;] - Prevent cluster shutdown after job execution for non-detached jobs
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4514&quot;&gt;FLINK-4514&lt;/a&gt;] - ExpiredIteratorException in Kinesis Consumer on long catch-ups to head of stream
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4526&quot;&gt;FLINK-4526&lt;/a&gt;] - ApplicationClient: remove redundant proxy messages
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3866&quot;&gt;FLINK-3866&lt;/a&gt;] - StringArraySerializer claims type is immutable; shouldn&amp;#39;t
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3899&quot;&gt;FLINK-3899&lt;/a&gt;] - Document window processing with Reduce/FoldFunction + WindowFunction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4302&quot;&gt;FLINK-4302&lt;/a&gt;] - Add JavaDocs to MetricConfig
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-4495&quot;&gt;FLINK-4495&lt;/a&gt;] - Running multiple jobs on yarn (without yarn-session)
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Mon, 05 Sep 2016 11:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/09/05/release-1.1.2.html</link>
<guid isPermaLink="true">/news/2016/09/05/release-1.1.2.html</guid>
</item>
<item>
<title>Flink Forward 2016: Announcing Schedule, Keynotes, and Panel Discussion</title>
<description>&lt;p&gt;An update for the Flink community: the &lt;a href=&quot;http://flink-forward.org/kb_day/day-1/&quot;&gt;Flink Forward 2016 schedule&lt;/a&gt; is now available online. This year&#39;s event will include 2 days of talks from stream processing experts at Google, MapR, Alibaba, Netflix, Cloudera, and more. Following the talks is a full day of hands-on Flink training.&lt;/p&gt;
&lt;p&gt;Ted Dunning has been announced as a keynote speaker at the event. Ted is the VP of Incubator at &lt;a href=&quot;http://www.apache.org&quot;&gt;Apache Software Foundation&lt;/a&gt;, the Chief Application Architect at &lt;a href=&quot;http://www.mapr.com&quot;&gt;MapR Technologies&lt;/a&gt;, and a mentor on many recent projects. He&#39;ll present &lt;a href=&quot;http://flink-forward.org/kb_sessions/keynote-tba/&quot;&gt;&quot;How Can We Take Flink Forward?&quot;&lt;/a&gt; on the second day of the conference.&lt;/p&gt;
&lt;p&gt;Following Ted&#39;s keynote there will be a panel discussion on &lt;a href=&quot;http://flink-forward.org/kb_sessions/panel-large-scale-streaming-in-production/&quot;&gt;&quot;Large Scale Streaming in Production&quot;&lt;/a&gt;. As stream processing systems become more mainstream, companies are looking to empower their users to take advantage of this technology. We welcome leading stream processing experts Xiaowei Jiang &lt;a href=&quot;http://www.alibaba.com&quot;&gt;(Alibaba)&lt;/a&gt;, Monal Daxini &lt;a href=&quot;http://www.netflix.com&quot;&gt;(Netflix)&lt;/a&gt;, Maxim Fateev &lt;a href=&quot;http://www.uber.com&quot;&gt;(Uber)&lt;/a&gt;, and Ted Dunning &lt;a href=&quot;http://www.mapr.com&quot;&gt;(MapR Technologies)&lt;/a&gt; on stage to talk about the challenges they have faced and the solutions they have discovered while implementing stream processing systems at very large scale. The panel will be moderated by Jamie Grier &lt;a href=&quot;http://www.data-artisans.com&quot;&gt;(data Artisans)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The welcome keynote on Monday, September 12, will be presented by data Artisans&#39; co-founders Kostas Tzoumas and Stephan Ewen. They will talk about &lt;a href=&quot;http://flink-forward.org/kb_sessions/keynote-tba-2/&quot;&gt;&quot;The maturing data streaming ecosystem and Apache Flink’s accelerated growth&quot;&lt;/a&gt;. In this talk, Kostas and Stephan discuss several large-scale stream processing use cases that the data Artisans team has seen over the past year.&lt;/p&gt;
&lt;p&gt;And one more recent addition to the program: Maxim Fateev of Uber will present &lt;a href=&quot;http://flink-forward.org/kb_sessions/beyond-the-watermark-on-demand-backfilling-in-flink/&quot;&gt;&quot;Beyond the Watermark: On-Demand Backfilling in Flink&quot;&lt;/a&gt;. Flink’s time-progress model is built around a single watermark, which is incompatible with Uber’s business need for generating aggregates retroactively. Maxim&#39;s talk covers Uber&#39;s solution for on-demand backfilling.&lt;/p&gt;
&lt;p&gt;We hope to see many community members at Flink Forward 2016. Registration is available online: &lt;a href=&quot;http://flink-forward.org/registration/&quot;&gt;flink-forward.org/registration&lt;/a&gt;
&lt;/p&gt;
</description>
<pubDate>Wed, 24 Aug 2016 11:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/08/24/ff16-keynotes-panels.html</link>
<guid isPermaLink="true">/news/2016/08/24/ff16-keynotes-panels.html</guid>
</item>
<item>
<title>Flink 1.1.1 Released</title>
<description>&lt;p&gt;Today, the Flink community released Flink version 1.1.1.&lt;/p&gt;
&lt;p&gt;The Maven artifacts published on Maven central for 1.1.0 had a Hadoop dependency issue: no Hadoop 1 specific version (with version 1.1.0-hadoop1) was deployed, and the 1.1.0 artifacts have a dependency on Hadoop 1 instead of Hadoop 2.&lt;/p&gt;
&lt;p&gt;This has been fixed with this release, and we &lt;strong&gt;highly recommend&lt;/strong&gt; that all users switch to this version of Flink by bumping their Flink dependencies to version 1.1.1:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-java&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-streaming-java_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.apache.flink&lt;span class=&quot;nt&quot;&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;flink-clients_2.10&lt;span class=&quot;nt&quot;&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.1.1&lt;span class=&quot;nt&quot;&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class=&quot;nt&quot;&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
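&lt;p&gt;If your setup still depends on Hadoop 1, the same fix applies with the Hadoop 1 specific version suffix. A minimal sketch, shown for &lt;code&gt;flink-java&lt;/code&gt; only (the other dependencies follow the same pattern):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-xml&quot;&gt;&amp;lt;dependency&amp;gt;
  &amp;lt;groupId&amp;gt;org.apache.flink&amp;lt;/groupId&amp;gt;
  &amp;lt;artifactId&amp;gt;flink-java&amp;lt;/artifactId&amp;gt;
  &amp;lt;!-- Hadoop 1 variant of the same bugfix release --&amp;gt;
  &amp;lt;version&amp;gt;1.1.1-hadoop1&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;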
&lt;p&gt;You can find the binaries on the updated &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
</description>
<pubDate>Thu, 11 Aug 2016 11:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/08/11/release-1.1.1.html</link>
<guid isPermaLink="true">/news/2016/08/11/release-1.1.1.html</guid>
</item>
<item>
<title>Announcing Apache Flink 1.1.0</title>
<description>&lt;div class=&quot;alert alert-success&quot;&gt;&lt;strong&gt;Important&lt;/strong&gt;: The Maven artifacts published with version 1.1.0 on Maven central have a Hadoop dependency issue. It is highly recommended to use &lt;strong&gt;1.1.1&lt;/strong&gt; or &lt;strong&gt;1.1.1-hadoop1&lt;/strong&gt; as the Flink version.&lt;/div&gt;
&lt;p&gt;The Apache Flink community is pleased to announce the availability of Flink 1.1.0.&lt;/p&gt;
&lt;p&gt;This release is the first major release in the 1.X.X series of releases, which maintains API compatibility with 1.0.0. This means that your applications written against stable APIs of Flink 1.0.0 will compile and run with Flink 1.1.0. 95 contributors provided bug fixes, improvements, and new features, resolving more than 450 JIRA issues in total. See the &lt;a href=&quot;/blog/release_1.1.0-changelog.html&quot;&gt;complete changelog&lt;/a&gt; for more details.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;We encourage everyone to &lt;a href=&quot;http://flink.apache.org/downloads.html&quot;&gt;download the release&lt;/a&gt; and &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/&quot;&gt;check out the documentation&lt;/a&gt;. Feedback through the Flink &lt;a href=&quot;http://flink.apache.org/community.html#mailing-lists&quot;&gt;mailing lists&lt;/a&gt; is, as always, very welcome!&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Some highlights of the release are listed in the following sections.&lt;/p&gt;
&lt;h2 id=&quot;connectors&quot;&gt;Connectors&lt;/h2&gt;
&lt;p&gt;The &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/connectors/index.html&quot;&gt;streaming connectors&lt;/a&gt; are a major part of Flink’s DataStream API. This release adds support for new external systems and further improves on the available connectors.&lt;/p&gt;
&lt;h3 id=&quot;continuous-file-system-sources&quot;&gt;Continuous File System Sources&lt;/h3&gt;
&lt;p&gt;A frequently requested feature for Flink 1.0 was to be able to monitor directories and process files continuously. Flink 1.1 now adds support for this via &lt;code&gt;FileProcessingMode&lt;/code&gt;s:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;stream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;readFile&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;textInputFormat&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;hdfs:///file-path&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;FileProcessingMode&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;PROCESS_CONTINUOUSLY&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;mi&quot;&gt;5000&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// monitoring interval (millis)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;FilePathFilter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;createDefaultFilter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;());&lt;/span&gt; &lt;span class=&quot;c1&quot;&gt;// file path filter&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This will monitor &lt;code&gt;hdfs:///file-path&lt;/code&gt; every &lt;code&gt;5000&lt;/code&gt; milliseconds. Check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/index.html#data-sources&quot;&gt;DataSource documentation for more details&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;kinesis-source-and-sink&quot;&gt;Kinesis Source and Sink&lt;/h3&gt;
&lt;p&gt;Flink 1.1 adds a Kinesis connector for both consuming (&lt;code&gt;FlinkKinesisConsumer&lt;/code&gt;) from and producing (&lt;code&gt;FlinkKinesisProducer&lt;/code&gt;) to &lt;a href=&quot;https://aws.amazon.com/kinesis/&quot;&gt;Amazon Kinesis Streams&lt;/a&gt;, which is a managed service purpose-built to make it easy to work with streaming data on AWS.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kinesis&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;env&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;FlinkKinesisConsumer&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&amp;gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;stream-name&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;schema&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;config&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
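&lt;p&gt;Writing to Kinesis works the other way around: a producer is attached as a sink. The following is a minimal sketch that writes the consumed records back out; the stream name, partition key, and region property are placeholders, and the exact configuration keys are listed in the connector documentation:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;Properties producerConfig = new Properties();
// Region (and credentials) configuration as described in the connector docs;
// the property key below is illustrative.
producerConfig.put(&amp;quot;aws.region&amp;quot;, &amp;quot;us-east-1&amp;quot;);

FlinkKinesisProducer&amp;lt;String&amp;gt; producer =
    new FlinkKinesisProducer&amp;lt;&amp;gt;(new SimpleStringSchema(), producerConfig);
producer.setDefaultStream(&amp;quot;output-stream-name&amp;quot;); // placeholder target stream
producer.setDefaultPartition(&amp;quot;0&amp;quot;);                // default partition key

kinesis.addSink(producer);
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;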
&lt;p&gt;Check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/connectors/kinesis.html&quot;&gt;Kinesis connector documentation for more details&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;cassandra-sink&quot;&gt;Cassandra Sink&lt;/h3&gt;
&lt;p&gt;The &lt;a href=&quot;http://wiki.apache.org/cassandra/GettingStarted&quot;&gt;Apache Cassandra&lt;/a&gt; sink allows you to write from Flink to Cassandra. Flink can provide exactly-once guarantees if the query is idempotent, meaning it can be applied multiple times without changing the result.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;CassandraSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;addSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
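&lt;p&gt;In practice the sink is configured through a small builder. The following sketch (using a hypothetical &lt;code&gt;example.temperatures&lt;/code&gt; table and a local contact point) shows an idempotent &lt;code&gt;INSERT&lt;/code&gt; query, which is what makes the exactly-once guarantee described above possible; the exact builder methods are covered in the connector documentation.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Illustrative sketch: write a DataStream&amp;lt;Tuple2&amp;lt;String, Double&amp;gt;&amp;gt; to Cassandra
// with an idempotent INSERT (hypothetical keyspace/table &amp;quot;example.temperatures&amp;quot;).
CassandraSink.addSink(input)
    .setQuery(&amp;quot;INSERT INTO example.temperatures (sensor, temp) VALUES (?, ?);&amp;quot;)
    .setClusterBuilder(new ClusterBuilder() {
        @Override
        public Cluster buildCluster(Cluster.Builder builder) {
            return builder.addContactPoint(&amp;quot;127.0.0.1&amp;quot;).build();
        }
    })
    .build();&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;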
&lt;p&gt;Check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/connectors/cassandra.html&quot;&gt;Cassandra Sink documentation for more details&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;table-api-and-sql&quot;&gt;Table API and SQL&lt;/h2&gt;
&lt;p&gt;The Table API is a SQL-like expression language for relational stream and batch processing that can be easily embedded in Flink’s DataSet and DataStream APIs (for both Java and Scala).&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;custT&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;toTable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;custDs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;name, zipcode&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;zipcode = &amp;#39;12345&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;An initial version of this API was already available in Flink 1.0. For Flink 1.1, the community put a lot of work into reworking the architecture of the Table API and integrating it with &lt;a href=&quot;https://calcite.apache.org&quot;&gt;Apache Calcite&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In this first version, SQL (and Table API) queries on streams are limited to selection, filter, and union operators. Compared to Flink 1.0, the revised Table API supports many more scalar functions and is able to read tables from external sources and write them back to external sinks.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;result&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;SELECT STREAM product, amount FROM Orders WHERE product LIKE &amp;#39;%Rubber%&amp;#39;&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A more detailed introduction can be found in the &lt;a href=&quot;http://flink.apache.org/news/2016/05/24/stream-sql.html&quot;&gt;Flink blog&lt;/a&gt; and the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/table.html&quot;&gt;Table API documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;datastream-api&quot;&gt;DataStream API&lt;/h2&gt;
&lt;p&gt;The DataStream API now exposes &lt;strong&gt;session windows&lt;/strong&gt; and &lt;strong&gt;allowed lateness&lt;/strong&gt; as first-class citizens.&lt;/p&gt;
&lt;h3 id=&quot;session-windows&quot;&gt;Session Windows&lt;/h3&gt;
&lt;p&gt;Session windows are ideal for cases where the window boundaries need to adjust to the incoming data. This enables you to have windows that start at individual points in time for each key and that end once there has been a &lt;em&gt;certain period of inactivity&lt;/em&gt;. The configuration parameter is the session gap that specifies how long to wait for new data before considering a session as closed.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/session-windows.svg&quot; style=&quot;height:400px&quot; /&gt;
&lt;/center&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;selector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;EventTimeSessionWindows&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;withGap&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;minutes&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;windowed&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transformation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;support-for-late-elements&quot;&gt;Support for Late Elements&lt;/h3&gt;
&lt;p&gt;You can now specify how a windowed transformation should deal with late elements and how much lateness is allowed. The parameter for this is called &lt;em&gt;allowed lateness&lt;/em&gt; and specifies by how much time elements may be late.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;input&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;key&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;selector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;).&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;assigner&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;allowedLateness&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;windowed&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;transformation&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;(&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;function&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Elements that arrive within the allowed lateness are still put into windows and are considered when computing window results. If elements arrive after the allowed lateness they will be dropped. Flink will also make sure that any state held by the windowing operation is garbage collected once the watermark passes the end of a window plus the allowed lateness.&lt;/p&gt;
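&lt;p&gt;As a concrete, purely illustrative example, the following snippet counts elements per key in one-minute event-time windows and keeps each window’s state around for an additional 30 seconds of allowed lateness; the input is assumed to be a &lt;code&gt;DataStream&lt;/code&gt; of &lt;code&gt;Tuple2&amp;lt;String, Integer&amp;gt;&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Illustrative sketch: per-key counts in 1-minute event-time windows,
// accepting elements that are up to 30 seconds late.
DataStream&amp;lt;Tuple2&amp;lt;String, Integer&amp;gt;&amp;gt; counts = input
    .keyBy(0)
    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
    .allowedLateness(Time.seconds(30))
    .sum(1);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;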
&lt;p&gt;Check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/windows.html&quot;&gt;Windows documentation for more details&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;scala-api-for-complex-event-processing-cep&quot;&gt;Scala API for Complex Event Processing (CEP)&lt;/h2&gt;
&lt;p&gt;Flink 1.0 added the initial version of the CEP library. The core of the library is a Pattern API, which allows you to easily specify patterns to match against in your event stream. While in Flink 1.0 this API was only available for Java, Flink 1.1 now exposes the same API for Scala, allowing you to specify your event patterns in a more concise manner.&lt;/p&gt;
&lt;p&gt;A more detailed introduction can be found in the &lt;a href=&quot;http://flink.apache.org/news/2016/04/06/cep-monitoring.html&quot;&gt;Flink blog&lt;/a&gt; and the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/streaming/libs/cep.html&quot;&gt;CEP documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;graph-generators-and-new-gelly-library-algorithms&quot;&gt;Graph generators and new Gelly library algorithms&lt;/h2&gt;
&lt;p&gt;This release includes many enhancements and new features for graph processing. Gelly now provides a collection of scalable graph generators for common graph types, such as complete, cycle, grid, hypercube, and RMat graphs. A variety of new graph algorithms have been added to the Gelly library, including Global and Local Clustering Coefficient, HITS, and similarity measures (Jaccard and Adamic-Adar).&lt;/p&gt;
&lt;p&gt;For a full list of new graph processing features, check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/batch/libs/gelly.html&quot;&gt;Gelly documentation&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;metrics&quot;&gt;Metrics&lt;/h2&gt;
&lt;p&gt;Flink’s new metrics system allows you to easily gather and expose metrics from your user application to external systems. You can add counters, gauges, and histograms to your application via the runtime context:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Counter&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;counter&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;getRuntimeContext&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getMetricGroup&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;counter&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;my-counter&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
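&lt;p&gt;Gauges and histograms are registered in the same way. As a minimal sketch, a gauge simply wraps a callback that returns the current value whenever the metric is reported (the &lt;code&gt;currentValue&lt;/code&gt; field here is hypothetical):&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Illustrative sketch: expose the current value of an operator field as a gauge.
getRuntimeContext()
    .getMetricGroup()
    .gauge(&amp;quot;my-gauge&amp;quot;, new Gauge&amp;lt;Long&amp;gt;() {
        @Override
        public Long getValue() {
            return currentValue; // a field maintained by the operator
        }
    });&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;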
&lt;p&gt;All registered metrics will be exposed via reporters. Out of the box, Flink comes with support for JMX, Ganglia, Graphite, and statsD. In addition to your custom metrics, Flink exposes many internal metrics like checkpoint sizes and JVM stats.&lt;/p&gt;
&lt;p&gt;Check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-release-1.1/apis/metrics.html&quot;&gt;Metrics documentation for more details&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;list-of-contributors&quot;&gt;List of Contributors&lt;/h2&gt;
&lt;p&gt;The following 95 people contributed to this release:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Abdullah Ozturk&lt;/li&gt;
&lt;li&gt;Ajay Bhat&lt;/li&gt;
&lt;li&gt;Alexey Savartsov&lt;/li&gt;
&lt;li&gt;Aljoscha Krettek&lt;/li&gt;
&lt;li&gt;Andrea Sella&lt;/li&gt;
&lt;li&gt;Andrew Palumbo&lt;/li&gt;
&lt;li&gt;Chenguang He&lt;/li&gt;
&lt;li&gt;Chiwan Park&lt;/li&gt;
&lt;li&gt;David Moravek&lt;/li&gt;
&lt;li&gt;Dominik Bruhn&lt;/li&gt;
&lt;li&gt;Dyana Rose&lt;/li&gt;
&lt;li&gt;Fabian Hueske&lt;/li&gt;
&lt;li&gt;Flavio Pompermaier&lt;/li&gt;
&lt;li&gt;Gabor Gevay&lt;/li&gt;
&lt;li&gt;Gabor Horvath&lt;/li&gt;
&lt;li&gt;Geoffrey Mon&lt;/li&gt;
&lt;li&gt;Gordon Tai&lt;/li&gt;
&lt;li&gt;Greg Hogan&lt;/li&gt;
&lt;li&gt;Gyula Fora&lt;/li&gt;
&lt;li&gt;Henry Saputra&lt;/li&gt;
&lt;li&gt;Ignacio N. Lucero Ascencio&lt;/li&gt;
&lt;li&gt;Igor Berman&lt;/li&gt;
&lt;li&gt;Ismaël Mejía&lt;/li&gt;
&lt;li&gt;Ivan Mushketyk&lt;/li&gt;
&lt;li&gt;Jark Wu&lt;/li&gt;
&lt;li&gt;Jiri Simsa&lt;/li&gt;
&lt;li&gt;Jonas Traub&lt;/li&gt;
&lt;li&gt;Josh&lt;/li&gt;
&lt;li&gt;Joshi&lt;/li&gt;
&lt;li&gt;Joshua Herman&lt;/li&gt;
&lt;li&gt;Ken Krugler&lt;/li&gt;
&lt;li&gt;Konstantin Knauf&lt;/li&gt;
&lt;li&gt;Lasse Dalegaard&lt;/li&gt;
&lt;li&gt;Li Fanxi&lt;/li&gt;
&lt;li&gt;MaBiao&lt;/li&gt;
&lt;li&gt;Mao Wei&lt;/li&gt;
&lt;li&gt;Mark Reddy&lt;/li&gt;
&lt;li&gt;Martin Junghanns&lt;/li&gt;
&lt;li&gt;Martin Liesenberg&lt;/li&gt;
&lt;li&gt;Maximilian Michels&lt;/li&gt;
&lt;li&gt;Michal Fijolek&lt;/li&gt;
&lt;li&gt;Márton Balassi&lt;/li&gt;
&lt;li&gt;Nathan Howell&lt;/li&gt;
&lt;li&gt;Niels Basjes&lt;/li&gt;
&lt;li&gt;Niels Zeilemaker&lt;/li&gt;
&lt;li&gt;Phetsarath, Sourigna&lt;/li&gt;
&lt;li&gt;Robert Metzger&lt;/li&gt;
&lt;li&gt;Scott Kidder&lt;/li&gt;
&lt;li&gt;Sebastian Klemke&lt;/li&gt;
&lt;li&gt;Shahin&lt;/li&gt;
&lt;li&gt;Shannon Carey&lt;/li&gt;
&lt;li&gt;Shannon Quinn&lt;/li&gt;
&lt;li&gt;Stefan Richter&lt;/li&gt;
&lt;li&gt;Stefano Baghino&lt;/li&gt;
&lt;li&gt;Stefano Bortoli&lt;/li&gt;
&lt;li&gt;Stephan Ewen&lt;/li&gt;
&lt;li&gt;Steve Cosenza&lt;/li&gt;
&lt;li&gt;Sumit Chawla&lt;/li&gt;
&lt;li&gt;Tatu Saloranta&lt;/li&gt;
&lt;li&gt;Tianji Li&lt;/li&gt;
&lt;li&gt;Till Rohrmann&lt;/li&gt;
&lt;li&gt;Todd Lisonbee&lt;/li&gt;
&lt;li&gt;Tony Baines&lt;/li&gt;
&lt;li&gt;Trevor Grant&lt;/li&gt;
&lt;li&gt;Ufuk Celebi&lt;/li&gt;
&lt;li&gt;Vasudevan&lt;/li&gt;
&lt;li&gt;Yijie Shen&lt;/li&gt;
&lt;li&gt;Zack Pierce&lt;/li&gt;
&lt;li&gt;Zhai Jia&lt;/li&gt;
&lt;li&gt;chengxiang li&lt;/li&gt;
&lt;li&gt;chobeat&lt;/li&gt;
&lt;li&gt;danielblazevski&lt;/li&gt;
&lt;li&gt;dawid&lt;/li&gt;
&lt;li&gt;dawidwys&lt;/li&gt;
&lt;li&gt;eastcirclek&lt;/li&gt;
&lt;li&gt;erli ding&lt;/li&gt;
&lt;li&gt;gallenvara&lt;/li&gt;
&lt;li&gt;kl0u&lt;/li&gt;
&lt;li&gt;mans2singh&lt;/li&gt;
&lt;li&gt;markreddy&lt;/li&gt;
&lt;li&gt;mjsax&lt;/li&gt;
&lt;li&gt;nikste&lt;/li&gt;
&lt;li&gt;omaralvarez&lt;/li&gt;
&lt;li&gt;philippgrulich&lt;/li&gt;
&lt;li&gt;ramkrishna&lt;/li&gt;
&lt;li&gt;sahitya-pavurala&lt;/li&gt;
&lt;li&gt;samaitra&lt;/li&gt;
&lt;li&gt;smarthi&lt;/li&gt;
&lt;li&gt;spkavuly&lt;/li&gt;
&lt;li&gt;subhankar&lt;/li&gt;
&lt;li&gt;twalthr&lt;/li&gt;
&lt;li&gt;vasia&lt;/li&gt;
&lt;li&gt;xueyan.li&lt;/li&gt;
&lt;li&gt;zentol&lt;/li&gt;
&lt;li&gt;卫乐&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Mon, 08 Aug 2016 15:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/08/08/release-1.1.0.html</link>
<guid isPermaLink="true">/news/2016/08/08/release-1.1.0.html</guid>
</item>
<item>
<title>Stream Processing for Everyone with SQL and Apache Flink</title>
<description>&lt;p&gt;The capabilities of open source systems for distributed stream processing have evolved significantly over the last years. Initially, the first systems in the field (notably &lt;a href=&quot;https://storm.apache.org&quot;&gt;Apache Storm&lt;/a&gt;) provided low latency processing, but were limited to at-least-once guarantees, processing-time semantics, and rather low-level APIs. Since then, several new systems emerged and pushed the state of the art of open source stream processing in several dimensions. Today, users of Apache Flink or &lt;a href=&quot;https://beam.incubator.apache.org&quot;&gt;Apache Beam&lt;/a&gt; can use fluent Scala and Java APIs to implement stream processing jobs that operate in event-time with exactly-once semantics at high throughput and low latency.&lt;/p&gt;
&lt;p&gt;In the meantime, stream processing has taken off in the industry. We are witnessing a rapidly growing interest in stream processing which is reflected by prevalent deployments of streaming processing infrastructure such as &lt;a href=&quot;https://kafka.apache.org&quot;&gt;Apache Kafka&lt;/a&gt; and Apache Flink. The increasing number of available data streams results in a demand for people that can analyze streaming data and turn it into real-time insights. However, stream data analysis requires a special skill set including knowledge of streaming concepts such as the characteristics of unbounded streams, windows, time, and state as well as the skills to implement stream analysis jobs usually against Java or Scala APIs. People with this skill set are rare and hard to find.&lt;/p&gt;
&lt;p&gt;About six months ago, the Apache Flink community started an effort to add a SQL interface for stream data analysis. SQL is &lt;em&gt;the&lt;/em&gt; standard language to access and process data. Everybody who occasionally analyzes data is familiar with SQL. Consequently, a SQL interface for stream data processing will make this technology accessible to a much wider audience. Moreover, SQL support for streaming data will also enable new use cases such as interactive and ad-hoc stream analysis and significantly simplify many applications including stream ingestion and simple transformations. In this blog post, we report on the current status, architectural design, and future plans of the Apache Flink community to implement support for SQL as a language for analyzing data streams.&lt;/p&gt;
&lt;h2 id=&quot;where-did-we-come-from&quot;&gt;Where did we come from?&lt;/h2&gt;
&lt;p&gt;With the &lt;a href=&quot;http://flink.apache.org/news/2015/04/13/release-0.9.0-milestone1.html&quot;&gt;0.9.0-milestone1&lt;/a&gt; release, Apache Flink added an API to process relational data with SQL-like expressions, called the Table API. The central concept of this API is a Table, a structured data set or stream on which relational operations can be applied. The Table API is tightly integrated with the DataSet and DataStream API. A Table can be easily created from a DataSet or DataStream and can also be converted back into a DataSet or DataStream, as the following example shows:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;execEnv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;ExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// obtain a DataSet from somewhere&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempData&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;DataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Long&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)]&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// convert the DataSet to a Table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempTable&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempData&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toTable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;tempF&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// compute your result&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;avgTempCTable&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempTable&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;like&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;room%&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;time&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;3600&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;24&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;day&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;location&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;room&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;tempF&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.556&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;tempC&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;groupBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;day&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;room&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;day&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;room&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;tempC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;avg&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;avgTempC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// convert result Table back into a DataSet and print it&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;avgTempCTable&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toDataSet&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;Row&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;].&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Although the example shows Scala code, there is also an equivalent Java version of the Table API. The following picture depicts the original architecture of the Table API.&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/stream-sql/old-table-api.png&quot; style=&quot;width:75%;margin:15px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;A Table is created from a DataSet or DataStream and transformed into a new Table by applying relational transformations such as &lt;code&gt;filter&lt;/code&gt;, &lt;code&gt;join&lt;/code&gt;, or &lt;code&gt;select&lt;/code&gt; on them. Internally, a logical table operator tree is constructed from the applied Table transformations. When a Table is translated back into a DataSet or DataStream, the respective translator translates the logical operator tree into DataSet or DataStream operators. Expressions like &lt;code&gt;&#39;location.like(&quot;room%&quot;)&lt;/code&gt; are compiled into Flink functions via code generation.&lt;/p&gt;
&lt;p&gt;However, the original Table API had a few limitations. First of all, it could not stand alone. Table API queries had to be always embedded into a DataSet or DataStream program. Queries against batch Tables did not support outer joins, sorting, and many scalar functions which are commonly used in SQL queries. Queries against streaming tables only supported filters, union, and projections and no aggregations or joins. Also, the translation process did not leverage query optimization techniques except for the physical optimization that is applied to all DataSet programs.&lt;/p&gt;
&lt;h2 id=&quot;table-api-joining-forces-with-sql&quot;&gt;Table API joining forces with SQL&lt;/h2&gt;
&lt;p&gt;The discussion about adding support for SQL came up a few times in the Flink community. With Flink 0.9 and the availability of the Table API, code generation for relational expressions, and runtime operators, the foundation for such an extension seemed to be there and SQL support the next logical step. On the other hand, the community was also well aware of the multitude of dedicated “SQL-on-Hadoop” solutions in the open source landscape (&lt;a href=&quot;https://hive.apache.org&quot;&gt;Apache Hive&lt;/a&gt;, &lt;a href=&quot;https://drill.apache.org&quot;&gt;Apache Drill&lt;/a&gt;, &lt;a href=&quot;http://impala.io&quot;&gt;Apache Impala&lt;/a&gt;, &lt;a href=&quot;https://tajo.apache.org&quot;&gt;Apache Tajo&lt;/a&gt;, just to name a few). Given these alternatives, we figured that time would be better spent improving Flink in other ways than implementing yet another SQL-on-Hadoop solution.&lt;/p&gt;
&lt;p&gt;However, with the growing popularity of stream processing and the increasing adoption of Flink in this area, the Flink community saw the need for a simpler API to enable more users to analyze streaming data. About half a year ago, we decided to take the Table API to the next level, extend the stream processing capabilities of the Table API, and add support for SQL on streaming data. What we came up with was a revised architecture for a Table API that supports SQL (and Table API) queries on streaming and static data sources. We did not want to reinvent the wheel and decided to build the new Table API on top of &lt;a href=&quot;https://calcite.apache.org&quot;&gt;Apache Calcite&lt;/a&gt;, a popular SQL parser and optimizer framework. Apache Calcite is used by many projects including Apache Hive, Apache Drill, Cascading, and many &lt;a href=&quot;https://calcite.apache.org/docs/powered_by.html&quot;&gt;more&lt;/a&gt;. Moreover, the Calcite community put &lt;a href=&quot;https://calcite.apache.org/docs/stream.html&quot;&gt;SQL on streams&lt;/a&gt; on their roadmap which makes it a perfect fit for Flink’s SQL interface.&lt;/p&gt;
&lt;p&gt;Calcite is central in the new design as the following architecture sketch shows:&lt;/p&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/stream-sql/new-table-api.png&quot; style=&quot;width:75%;margin:15px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;The new architecture features two integrated APIs to specify relational queries, the Table API and SQL. Queries of both APIs are validated against a catalog of registered tables and converted into Calcite’s representation for logical plans. In this representation, stream and batch queries look exactly the same. Next, Calcite’s cost-based optimizer applies transformation rules and optimizes the logical plans. Depending on the nature of the sources (streaming or static) we use different rule sets. Finally, the optimized plan is translated into a regular Flink DataStream or DataSet program. This step involves again code generation to compile relational expressions into Flink functions.&lt;/p&gt;
&lt;p&gt;The new architecture of the Table API maintains the basic principles of the original Table API and improves it. It keeps a uniform interface for relational queries on streaming and static data. In addition, we take advantage of Calcite’s query optimization framework and SQL parser. The design builds upon Flink’s established APIs, i.e., the DataStream API that offers low-latency, high-throughput stream processing with exactly-once semantics and consistent results due to event-time processing, and the DataSet API with robust and efficient in-memory operators and pipelined data exchange. Any improvements to Flink’s core APIs and engine will automatically improve the execution of Table API and SQL queries.&lt;/p&gt;
&lt;p&gt;With this effort, we are adding SQL support for both streaming and static data to Flink. However, we do not want to see this as a competing solution to dedicated, high-performance SQL-on-Hadoop solutions, such as Impala, Drill, and Hive. Instead, we see the sweet spot of Flink’s SQL integration primarily in providing access to streaming analytics to a wider audience. In addition, it will facilitate integrated applications that use Flink’s APIs as well as SQL while being executed on a single runtime engine.&lt;/p&gt;
&lt;h2 id=&quot;how-will-flinks-sql-on-streams-look-like&quot;&gt;What will Flink’s SQL on streams look like?&lt;/h2&gt;
&lt;p&gt;So far we discussed the motivation for and architecture of Flink’s stream SQL interface, but what will it actually look like? The new SQL interface is integrated into the Table API. DataStreams, DataSets, and external data sources can be registered as tables at the &lt;code&gt;TableEnvironment&lt;/code&gt; in order to make them queryable with SQL. The &lt;code&gt;TableEnvironment.sql()&lt;/code&gt; method takes a SQL query and returns its result as a Table. The following example shows a complete program that reads a streaming table from a JSON encoded Kafka topic, processes it with a SQL query, and writes the resulting stream into another Kafka topic. Please note that the KafkaJsonSource and KafkaJsonSink are under development and not available yet. In the future, TableSources and TableSinks can be persisted to and loaded from files to ease reuse of source and sink definitions and to reduce boilerplate code.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;c1&quot;&gt;// get environments&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;execEnv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;StreamExecutionEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getExecutionEnvironment&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;getTableEnvironment&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// configure Kafka connection&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;kafkaProps&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// define a JSON encoded Kafka topic as external table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sensorSource&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;KafkaJsonSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;[(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;String&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Long&lt;/span&gt;, &lt;span class=&quot;kt&quot;&gt;Double&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)](&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;sensorTopic&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;kafkaProps&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;location&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;time&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&amp;quot;tempF&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// register external table&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;registerTableSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;sensorData&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sensorSource&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// define query in external table&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;roomSensors&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;sql&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;SELECT STREAM time, location AS room, (tempF - 32) * 0.556 AS tempC &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;FROM sensorData &amp;quot;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt;
&lt;span class=&quot;s&quot;&gt;&amp;quot;WHERE location LIKE &amp;#39;room%&amp;#39;&amp;quot;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// define a JSON encoded Kafka topic as external sink&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;roomSensorSink&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;KafkaJsonSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(...)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;// define sink for room sensor data and execute query&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;roomSensors&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;toSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;roomSensorSink&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;execEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;You might have noticed that this example left out the most interesting aspects of stream data processing: window aggregates and joins. How will these operations be expressed in SQL? Well, that is a very good question. The Apache Calcite community put out an excellent proposal that discusses the syntax and semantics of &lt;a href=&quot;https://calcite.apache.org/docs/stream.html&quot;&gt;SQL on streams&lt;/a&gt;. It describes Calcite’s stream SQL as &lt;em&gt;“an extension to standard SQL, not another ‘SQL-like’ language”&lt;/em&gt;. This has several benefits. First, people who are familiar with standard SQL will be able to analyze data streams without learning a new syntax. Queries on static tables and streams are (almost) identical and can be easily ported. Moreover it is possible to specify queries that reference static and streaming tables at the same time which goes well together with Flink’s vision to handle batch processing as a special case of stream processing, i.e., as processing finite streams. Finally, using standard SQL for stream data analysis means following a well established standard that is supported by many tools.&lt;/p&gt;
&lt;p&gt;Although we haven’t completely fleshed out the details of how windows will be defined in Flink’s SQL syntax and Table API, the following examples show what a tumbling window query could look like in SQL and the Table API.&lt;/p&gt;
&lt;h3 id=&quot;sql-following-the-syntax-proposal-of-calcites-streaming-sql-document&quot;&gt;SQL (following the syntax proposal of Calcite’s streaming SQL document)&lt;/h3&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-sql&quot;&gt;&lt;span class=&quot;k&quot;&gt;SELECT&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;STREAM&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TUMBLE_END&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;day&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;location&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;room&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;AVG&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;tempF&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;556&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;AS&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;avgTempC&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;FROM&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;sensorData&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;WHERE&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;location&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;LIKE&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;room%&amp;#39;&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;GROUP&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;BY&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TUMBLE&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;time&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;INTERVAL&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;&amp;#39;1&amp;#39;&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;DAY&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;),&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;location&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h3 id=&quot;table-api&quot;&gt;Table API&lt;/h3&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-scala&quot;&gt;&lt;span class=&quot;k&quot;&gt;val&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;avgRoomTemp&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;Table&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tableEnv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ingest&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;sensorData&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;like&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;room%&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;))&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;partitionBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;window&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nc&quot;&gt;Tumbling&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;every&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;Days&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;on&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;time&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;w&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;end&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;location&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;((&lt;/span&gt;&lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;tempF&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;32&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;mf&quot;&gt;0.556&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;avg&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;-Symbol&quot;&gt;&amp;#39;avgTempC&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;h2 id=&quot;whats-up-next&quot;&gt;What’s up next?&lt;/h2&gt;
&lt;p&gt;The Flink community is actively working on SQL support for the next minor version Flink 1.1.0. In the first version, SQL (and Table API) queries on streams will be limited to selection, filter, and union operators. Compared to Flink 1.0.0, the revised Table API will support many more scalar functions and be able to read tables from external sources and write them back to external sinks. A lot of work went into reworking the architecture of the Table API and integrating Apache Calcite.&lt;/p&gt;
&lt;p&gt;In Flink 1.2.0, the feature set of SQL on streams will be significantly extended. Among other things, we plan to support different types of window aggregates and maybe also streaming joins. For this effort, we want to closely collaborate with the Apache Calcite community and help extending Calcite’s support for relational operations on streaming data when necessary.&lt;/p&gt;
&lt;p&gt;If this post made you curious and you want to try out Flink’s SQL interface and the new Table API, we encourage you to do so! Simply clone the SNAPSHOT &lt;a href=&quot;https://github.com/apache/flink/tree/master&quot;&gt;master branch&lt;/a&gt; and check out the &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/table.html&quot;&gt;Table API documentation for the SNAPSHOT version&lt;/a&gt;. Please note that the branch is under heavy development, and hence some code examples in this blog post might not work. We are looking forward to your feedback and welcome contributions.&lt;/p&gt;
</description>
<pubDate>Tue, 24 May 2016 12:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/05/24/stream-sql.html</link>
<guid isPermaLink="true">/news/2016/05/24/stream-sql.html</guid>
</item>
<item>
<title>Flink 1.0.3 Released</title>
<description>&lt;p&gt;Today, the Flink community released Flink version &lt;strong&gt;1.0.3&lt;/strong&gt;, the third bugfix release of the 1.0 series.&lt;/p&gt;
&lt;p&gt;We &lt;strong&gt;recommend all users updating to this release&lt;/strong&gt; by bumping the version of your Flink dependencies to &lt;code&gt;1.0.3&lt;/code&gt; and updating the binaries on the server. You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;fixed-issues&quot;&gt;Fixed Issues&lt;/h2&gt;
&lt;h3 id=&quot;bug&quot;&gt;Bug&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3790&quot;&gt;FLINK-3790&lt;/a&gt;] [streaming] Use proper hadoop config in rolling sink&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3840&quot;&gt;FLINK-3840&lt;/a&gt;] Remove Testing Files in RocksDB Backend&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3835&quot;&gt;FLINK-3835&lt;/a&gt;] [optimizer] Add input id to JSON plan to resolve ambiguous input names&lt;/li&gt;
&lt;li&gt;[hotfix] OptionSerializer.duplicate to respect stateful element serializer&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3803&quot;&gt;FLINK-3803&lt;/a&gt;] [runtime] Pass CheckpointStatsTracker to ExecutionGraph&lt;/li&gt;
&lt;li&gt;[hotfix] [cep] Make cep window border treatment consistent&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;improvement&quot;&gt;Improvement&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3678&quot;&gt;FLINK-3678&lt;/a&gt;] [dist, docs] Make Flink logs directory configurable&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;docs&quot;&gt;Docs&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[docs] Add note about S3AFileSystem ‘buffer.dir’ property&lt;/li&gt;
&lt;li&gt;[docs] Update AWS S3 docs&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;tests&quot;&gt;Tests&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3860&quot;&gt;FLINK-3860&lt;/a&gt;] [connector-wikiedits] Add retry loop to WikipediaEditsSourceTest&lt;/li&gt;
&lt;li&gt;[streaming-contrib] Fix port clash in DbStateBackend tests&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 11 May 2016 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/05/11/release-1.0.3.html</link>
<guid isPermaLink="true">/news/2016/05/11/release-1.0.3.html</guid>
</item>
<item>
<title>Flink 1.0.2 Released</title>
<description>&lt;p&gt;Today, the Flink community released Flink version &lt;strong&gt;1.0.2&lt;/strong&gt;, the second bugfix release of the 1.0 series.&lt;/p&gt;
&lt;p&gt;We &lt;strong&gt;recommend all users updating to this release&lt;/strong&gt; by bumping the version of your Flink dependencies to &lt;code&gt;1.0.2&lt;/code&gt; and updating the binaries on the server. You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;fixed-issues&quot;&gt;Fixed Issues&lt;/h2&gt;
&lt;h3 id=&quot;bug&quot;&gt;Bug&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3657&quot;&gt;FLINK-3657&lt;/a&gt;] [dataSet] Change access of DataSetUtils.countElements() to ‘public’&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3762&quot;&gt;FLINK-3762&lt;/a&gt;] [core] Enable Kryo reference tracking&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3732&quot;&gt;FLINK-3732&lt;/a&gt;] [core] Fix potential null deference in ExecutionConfig#equals()&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3760&quot;&gt;FLINK-3760&lt;/a&gt;] Fix StateDescriptor.readObject&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3730&quot;&gt;FLINK-3730&lt;/a&gt;] Fix RocksDB Local Directory Initialization&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3712&quot;&gt;FLINK-3712&lt;/a&gt;] Make all dynamic properties available to the CLI frontend&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3688&quot;&gt;FLINK-3688&lt;/a&gt;] WindowOperator.trigger() does not emit Watermark anymore&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3697&quot;&gt;FLINK-3697&lt;/a&gt;] Properly access type information for nested POJO key selection&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;improvement&quot;&gt;Improvement&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3654&quot;&gt;FLINK-3654&lt;/a&gt;] Disable Write-Ahead-Log in RocksDB State&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;docs&quot;&gt;Docs&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2544&quot;&gt;FLINK-2544&lt;/a&gt;] [docs] Add Java 8 version for building PowerMock tests to docs&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3469&quot;&gt;FLINK-3469&lt;/a&gt;] [docs] Improve documentation for grouping keys&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3634&quot;&gt;FLINK-3634&lt;/a&gt;] [docs] Fix documentation for DataSetUtils.zipWithUniqueId()&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3711&quot;&gt;FLINK-3711&lt;/a&gt;][docs] Documentation of Scala fold()() uses correct syntax&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;tests&quot;&gt;Tests&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3716&quot;&gt;FLINK-3716&lt;/a&gt;] [kafka consumer] Decreasing socket timeout so testFailOnNoBroker() will pass before JUnit timeout&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Fri, 22 Apr 2016 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/04/22/release-1.0.2.html</link>
<guid isPermaLink="true">/news/2016/04/22/release-1.0.2.html</guid>
</item>
<item>
<title>Flink Forward 2016 Call for Submissions Is Now Open</title>
<description>&lt;p&gt;We are happy to announce that the call for submissions for Flink Forward 2016 is now open! The conference will take place September 12-14, 2016 in Berlin, Germany, bringing together the open source stream processing community. Most Apache Flink committers will attend the conference, making it the ideal venue to learn more about the project and its roadmap and connect with the community.&lt;/p&gt;
&lt;p&gt;The conference welcomes submissions on everything Flink-related, including experiences with using Flink, products based on Flink, technical talks on extending Flink, as well as connecting Flink with other open source or proprietary software.&lt;/p&gt;
&lt;p&gt;Read more &lt;a href=&quot;http://flink-forward.org/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
</description>
<pubDate>Thu, 14 Apr 2016 12:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/04/14/flink-forward-announce.html</link>
<guid isPermaLink="true">/news/2016/04/14/flink-forward-announce.html</guid>
</item>
<item>
<title>Introducing Complex Event Processing (CEP) with Apache Flink</title>
<description>&lt;p&gt;With the ubiquity of sensor networks and smart devices continuously collecting more and more data, we face the challenge of analyzing an ever-growing stream of data in near real-time.
Being able to react quickly to changing trends or to deliver up to date business intelligence can be a decisive factor for a company’s success or failure.
A key problem in real time processing is the detection of event patterns in data streams.&lt;/p&gt;
&lt;p&gt;Complex event processing (CEP) addresses exactly this problem of matching continuously incoming events against a pattern.
The result of a matching is usually one or more complex events which are derived from the input events.
In contrast to traditional DBMSs where a query is executed on stored data, CEP executes data on a stored query.
All data which is not relevant for the query can be immediately discarded.
The advantages of this approach are obvious, given that CEP queries are applied on a potentially infinite stream of data.
Furthermore, inputs are processed immediately.
Once the system has seen all events for a matching sequence, results are emitted straight away.
This aspect effectively leads to CEP’s real time analytics capability.&lt;/p&gt;
&lt;p&gt;Consequently, CEP’s processing paradigm drew significant interest and found application in a wide variety of use cases.
Most notably, CEP is used nowadays for financial applications such as stock market trend and credit card fraud detection.
Moreover, it is used in RFID-based tracking and monitoring, for example, to detect thefts in a warehouse where items are not properly checked out.
CEP can also be used to detect network intrusion by specifying patterns of suspicious user behaviour.&lt;/p&gt;
&lt;p&gt;Apache Flink with its true streaming nature and its capabilities for low latency as well as high throughput stream processing is a natural fit for CEP workloads.
Consequently, the Flink community has introduced the first version of a new &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/libs/cep.html&quot;&gt;CEP library&lt;/a&gt; with &lt;a href=&quot;http://flink.apache.org/news/2016/03/08/release-1.0.0.html&quot;&gt;Flink 1.0&lt;/a&gt;.
In the remainder of this blog post, we introduce Flink’s CEP library and we illustrate its ease of use through the example of monitoring a data center.&lt;/p&gt;
&lt;h2 id=&quot;monitoring-and-alert-generation-for-data-centers&quot;&gt;Monitoring and alert generation for data centers&lt;/h2&gt;
&lt;center&gt;
&lt;img src=&quot;/img/blog/cep-monitoring.svg&quot; style=&quot;width:600px;margin:15px&quot; /&gt;
&lt;/center&gt;
&lt;p&gt;Assume we have a data center with a number of racks.
For each rack the power consumption and the temperature are monitored.
Whenever such a measurement takes place, a new power or temperature event is generated, respectively.
Based on this monitoring event stream, we want to detect racks that are about to overheat, and dynamically adapt their workload and cooling.&lt;/p&gt;
&lt;p&gt;For this scenario we use a two-stage approach.
First, we monitor the temperature events.
Whenever we see two consecutive events whose temperature exceeds a threshold value, we generate a temperature warning with the current average temperature.
A temperature warning does not necessarily indicate that a rack is about to overheat.
But whenever we see two consecutive warnings with increasing temperatures, we want to issue an alert for this rack.
This alert can then lead to countermeasures to cool the rack.&lt;/p&gt;
&lt;h3 id=&quot;implementation-with-apache-flink&quot;&gt;Implementation with Apache Flink&lt;/h3&gt;
&lt;p&gt;First, we define the messages of the incoming monitoring event stream.
Every monitoring message contains its originating rack ID.
The temperature event additionally contains the current temperature and the power consumption event contains the current voltage.
We model the events as POJOs:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;abstract&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;MonitoringEvent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TemperatureEvent&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;temperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;PowerEvent&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;extends&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;voltage&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now we can ingest the monitoring event stream using one of Flink’s connectors (e.g. Kafka, RabbitMQ, etc.).
This will give us a &lt;code&gt;DataStream&amp;lt;MonitoringEvent&amp;gt; inputEventStream&lt;/code&gt; which we will use as the input for Flink’s CEP operator.&lt;/p&gt;
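&lt;p&gt;As a minimal sketch, assuming a custom source function (here called &lt;code&gt;MonitoringEventSource&lt;/code&gt; purely for illustration) that emits the monitoring events, obtaining such a stream could look like this:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// Obtain the streaming execution environment
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// MonitoringEventSource stands in for any source of MonitoringEvents,
// e.g. a custom SourceFunction or a connector source such as Kafka
DataStream&amp;lt;MonitoringEvent&amp;gt; inputEventStream = env.addSource(new MonitoringEventSource());&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;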
&lt;p&gt;But first, we have to define the event pattern to detect temperature warnings.
The CEP library offers an intuitive &lt;a href=&quot;https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/libs/cep.html#the-pattern-api&quot;&gt;Pattern API&lt;/a&gt; to easily define these complex patterns.&lt;/p&gt;
&lt;p&gt;Every pattern consists of a sequence of events which can have optional filter conditions assigned.
A pattern always starts with a first event to which we will assign the name &lt;code&gt;“First Event”&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This pattern will match every monitoring event.
Since we are only interested in &lt;code&gt;TemperatureEvents&lt;/code&gt; whose temperature is above a threshold value, we have to add an additional subtype constraint and a where clause:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TEMPERATURE_THRESHOLD&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;As stated before, we want to generate a &lt;code&gt;TemperatureWarning&lt;/code&gt; if and only if we see two consecutive &lt;code&gt;TemperatureEvents&lt;/code&gt; for the same rack whose temperatures are too high.
The Pattern API offers the &lt;code&gt;next&lt;/code&gt; call which allows us to add a new event to our pattern.
This event has to directly follow the first matching event for the whole pattern to match.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;warningPattern&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TEMPERATURE_THRESHOLD&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Second Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;subtype&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;class&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;where&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;evt&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TEMPERATURE_THRESHOLD&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;within&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;10&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The final pattern definition also contains the &lt;code&gt;within&lt;/code&gt; API call which defines that two consecutive &lt;code&gt;TemperatureEvents&lt;/code&gt; have to occur within a time interval of 10 seconds for the pattern to match.
Depending on the time characteristic setting, this can be processing, ingestion, or event time.&lt;/p&gt;
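&lt;p&gt;The time characteristic is set on the &lt;code&gt;StreamExecutionEnvironment&lt;/code&gt;; as a small sketch, selecting event time could look as follows:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// env is the StreamExecutionEnvironment used to build the job.
// Interpret the 10 second interval of the within() call in event time;
// TimeCharacteristic.ProcessingTime and TimeCharacteristic.IngestionTime are the alternatives.
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that event time additionally requires timestamps and watermarks to be assigned to the input stream.&lt;/p&gt;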
&lt;p&gt;Having defined the event pattern, we can now apply it on the &lt;code&gt;inputEventStream&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;PatternStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempPatternStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CEP&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;inputEventStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;rackID&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;warningPattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Since we want to generate our warnings for each rack individually, we &lt;code&gt;keyBy&lt;/code&gt; the input event stream by the &lt;code&gt;“rackID”&lt;/code&gt; POJO field.
This ensures that all events matching our pattern have the same rack ID.&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;PatternStream&amp;lt;MonitoringEvent&amp;gt;&lt;/code&gt; gives us access to successfully matched event sequences.
They can be retrieved using the &lt;code&gt;select&lt;/code&gt; API call.
The &lt;code&gt;select&lt;/code&gt; API call takes a &lt;code&gt;PatternSelectFunction&lt;/code&gt; which is called for every matching event sequence.
The event sequence is provided as a &lt;code&gt;Map&amp;lt;String, MonitoringEvent&amp;gt;&lt;/code&gt; where each &lt;code&gt;MonitoringEvent&lt;/code&gt; is identified by its assigned event name.
Our pattern select function generates a &lt;code&gt;TemperatureWarning&lt;/code&gt; event for each matching pattern.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TemperatureWarning&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;double&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;averageTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;warnings&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;tempPatternStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;select&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;MonitoringEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;first&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureEvent&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Second Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(),&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Now we have generated a new complex event stream &lt;code&gt;DataStream&amp;lt;TemperatureWarning&amp;gt; warnings&lt;/code&gt; from the initial monitoring event stream.
This complex event stream can again be used as the input for another round of complex event processing.
We use the &lt;code&gt;TemperatureWarnings&lt;/code&gt; to generate &lt;code&gt;TemperatureAlerts&lt;/code&gt; whenever we see two consecutive &lt;code&gt;TemperatureWarnings&lt;/code&gt; for the same rack with increasing temperatures.
The &lt;code&gt;TemperatureAlerts&lt;/code&gt; have the following definition:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;kd&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;kd&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;nc&quot;&gt;TemperatureAlert&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;kd&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;...&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;First, we have to define our alert event pattern:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;?&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alertPattern&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;begin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;next&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Second Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;within&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Time&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;seconds&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;));&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This definition says that we want to see two &lt;code&gt;TemperatureWarnings&lt;/code&gt; within 20 seconds.
The first event has the name &lt;code&gt;“First Event”&lt;/code&gt; and the second consecutive event has the name &lt;code&gt;“Second Event”&lt;/code&gt;.
The individual events don’t have a where clause assigned, because we need access to both events in order to decide whether the temperature is increasing.
Therefore, we apply the filter condition in the select clause.
But first, we again obtain a &lt;code&gt;PatternStream&lt;/code&gt;.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;PatternStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alertPatternStream&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;CEP&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;warnings&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;keyBy&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;rackID&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;),&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;alertPattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Again, we &lt;code&gt;keyBy&lt;/code&gt; the warnings input stream by the &lt;code&gt;&quot;rackID&quot;&lt;/code&gt; so that we generate our alerts for each rack individually.
Next, we apply the &lt;code&gt;flatSelect&lt;/code&gt; method, which gives us access to matching event sequences and allows us to output an arbitrary number of complex events.
Thus, we generate a &lt;code&gt;TemperatureAlert&lt;/code&gt; if and only if the temperature is increasing.&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;&lt;span class=&quot;n&quot;&gt;DataStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureAlert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alerts&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;alertPatternStream&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;flatSelect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;Map&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;String&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;Collector&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;TemperatureAlert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;first&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;First Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;TemperatureWarning&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pattern&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&amp;quot;Second Event&amp;quot;&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;);&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getAverageTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;lt;&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;second&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getAverageTemperature&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;())&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;{&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;out&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;collect&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;TemperatureAlert&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;first&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;na&quot;&gt;getRackID&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;()));&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;o&quot;&gt;});&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;DataStream&amp;lt;TemperatureAlert&amp;gt; alerts&lt;/code&gt; is the data stream of temperature alerts for each rack.
Based on these alerts we can now adapt the workload or cooling for overheating racks.&lt;/p&gt;
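&lt;p&gt;To sketch how such a job could be wrapped up, the alerts might, for instance, simply be printed before the program is executed:&lt;/p&gt;
&lt;div class=&quot;highlight&quot;&gt;&lt;pre&gt;&lt;code class=&quot;language-java&quot;&gt;// For illustration: emit the alerts to stdout and start the job
alerts.print();

env.execute(&amp;quot;CEP monitoring job&amp;quot;);&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;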
&lt;p&gt;The full source code for the presented example, as well as an example data source which randomly generates monitoring events, can be found in &lt;a href=&quot;https://github.com/tillrohrmann/cep-monitoring&quot;&gt;this repository&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;In this blog post we have seen how easy it is to reason about event streams using Flink’s CEP library.
Using the example of monitoring and alert generation for a data center, we have implemented a short program which notifies us when a rack is about to overheat and potentially fail.&lt;/p&gt;
&lt;p&gt;In the future, the Flink community will further extend the CEP library’s functionality and expressiveness.
Next on the road map is support for a regular expression-like pattern specification, including Kleene star, lower and upper bounds, and negation.
Furthermore, it is planned to allow the where-clause to access fields of previously matched events.
This feature will make it possible to prune unpromising event sequences early.&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;em&gt;Note:&lt;/em&gt; The example code requires Flink 1.0.1 or higher.&lt;/p&gt;
</description>
<pubDate>Wed, 06 Apr 2016 12:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/04/06/cep-monitoring.html</link>
<guid isPermaLink="true">/news/2016/04/06/cep-monitoring.html</guid>
</item>
<item>
<title>Flink 1.0.1 Released</title>
<description>&lt;p&gt;Today, the Flink community released Flink version &lt;strong&gt;1.0.1&lt;/strong&gt;, the first bugfix release of the 1.0 series.&lt;/p&gt;
&lt;p&gt;We &lt;strong&gt;recommend that all users update to this release&lt;/strong&gt; by bumping the version of your Flink dependencies to &lt;code&gt;1.0.1&lt;/code&gt; and updating the binaries on the server. You can find the binaries on the updated &lt;a href=&quot;/downloads.html&quot;&gt;Downloads page&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;fixed-issues&quot;&gt;Fixed Issues&lt;/h2&gt;
&lt;h3&gt;Bug&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3179&quot;&gt;FLINK-3179&lt;/a&gt;] - Combiner is not injected if Reduce or GroupReduce input is explicitly partitioned
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3472&quot;&gt;FLINK-3472&lt;/a&gt;] - JDBCInputFormat.nextRecord(..) has misleading message on NPE
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3491&quot;&gt;FLINK-3491&lt;/a&gt;] - HDFSCopyUtilitiesTest fails on Windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3495&quot;&gt;FLINK-3495&lt;/a&gt;] - RocksDB Tests can&amp;#39;t run on Windows
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3533&quot;&gt;FLINK-3533&lt;/a&gt;] - Update the Gelly docs wrt examples and cluster execution
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3563&quot;&gt;FLINK-3563&lt;/a&gt;] - .returns() doesn&amp;#39;t compile when using .map() with a custom MapFunction
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3566&quot;&gt;FLINK-3566&lt;/a&gt;] - Input type validation often fails on custom TypeInfo implementations
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3578&quot;&gt;FLINK-3578&lt;/a&gt;] - Scala DataStream API does not support Rich Window Functions
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3595&quot;&gt;FLINK-3595&lt;/a&gt;] - Kafka09 consumer thread does not interrupt when stuck in record emission
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3602&quot;&gt;FLINK-3602&lt;/a&gt;] - Recursive Types are not supported / crash TypeExtractor
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3621&quot;&gt;FLINK-3621&lt;/a&gt;] - Misleading documentation of memory configuration parameters
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3629&quot;&gt;FLINK-3629&lt;/a&gt;] - In wikiedits Quick Start example, &amp;quot;The first call, .window()&amp;quot; should be &amp;quot;The first call, .timeWindow()&amp;quot;
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3651&quot;&gt;FLINK-3651&lt;/a&gt;] - Fix faulty RollingSink Restore
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3653&quot;&gt;FLINK-3653&lt;/a&gt;] - recovery.zookeeper.storageDir is not documented on the configuration page
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3663&quot;&gt;FLINK-3663&lt;/a&gt;] - FlinkKafkaConsumerBase.logPartitionInfo is missing a log marker
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3681&quot;&gt;FLINK-3681&lt;/a&gt;] - CEP library does not support Java 8 lambdas as select function
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3682&quot;&gt;FLINK-3682&lt;/a&gt;] - CEP operator does not set the processing timestamp correctly
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3684&quot;&gt;FLINK-3684&lt;/a&gt;] - CEP operator does not forward watermarks properly
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Improvement&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3570&quot;&gt;FLINK-3570&lt;/a&gt;] - Replace random NIC selection heuristic by InetAddress.getLocalHost
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3575&quot;&gt;FLINK-3575&lt;/a&gt;] - Update Working With State Section in Doc
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-3591&quot;&gt;FLINK-3591&lt;/a&gt;] - Replace Quickstart K-Means Example by Streaming Example
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Test&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2444&quot;&gt;FLINK-2444&lt;/a&gt;] - Add tests for HadoopInputFormats
&lt;/li&gt;
&lt;li&gt;[&lt;a href=&quot;https://issues.apache.org/jira/browse/FLINK-2445&quot;&gt;FLINK-2445&lt;/a&gt;] - Add tests for HadoopOutputFormats
&lt;/li&gt;
&lt;/ul&gt;
</description>
<pubDate>Wed, 06 Apr 2016 10:00:00 +0200</pubDate>
<link>https://flink.apache.org/news/2016/04/06/release-1.0.1.html</link>
<guid isPermaLink="true">/news/2016/04/06/release-1.0.1.html</guid>
</item>
</channel>
</rss>