"use strict";(function(){const t={cache:!0};t.doc={id:"id",field:["title","content"],store:["title","href","section"]};const e=FlexSearch.create("balance",t);window.bookSearchIndex=e,e.add({id:0,href:"/what-is-flink/flink-architecture/",title:"Architecture",section:"About",content:` What is Apache Flink? — Architecture # Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.
Here, we explain important aspects of Flink’s architecture.
Process Unbounded and Bounded Data # Any kind of data is produced as a stream of events. Credit card transactions, sensor measurements, machine logs, or user interactions on a website or mobile application, all of these data are generated as a stream.
Data can be processed as unbounded or bounded streams.
Unbounded streams have a start but no defined end. They do not terminate and provide data as it is generated. Unbounded streams must be continuously processed, i.e., events must be promptly handled after they have been ingested. It is not possible to wait for all input data to arrive because the input is unbounded and will not be complete at any point in time. Processing unbounded data often requires that events are ingested in a specific order, such as the order in which events occurred, to be able to reason about result completeness.
Bounded streams have a defined start and end. Bounded streams can be processed by ingesting all data before performing any computations. Ordered ingestion is not required to process bounded streams because a bounded data set can always be sorted. Processing of bounded streams is also known as batch processing.
Apache Flink excels at processing unbounded and bounded data sets. Precise control of time and state enables Flink’s runtime to run any kind of application on unbounded streams. Bounded streams are internally processed by algorithms and data structures that are specifically designed for fixed-sized data sets, yielding excellent performance.
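To make the distinction concrete, the following is a small, illustrative sketch (not taken from the Flink documentation) of a single DataStream program that processes a bounded input with batch-style execution; the class and method names reflect the Java DataStream API of recent Flink releases and should be checked against the documentation of your version.
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BoundedOrUnboundedExample {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // The input below is bounded, so the same program can be executed with
    // batch-style scheduling and algorithms; removing this line (and using an
    // unbounded source) runs it as a continuous streaming job instead.
    env.setRuntimeMode(RuntimeExecutionMode.BATCH);

    env.fromElements("credit-card", "sensor", "machine-log", "click")
        .filter(event -> event.length() > 5)
        .print();

    env.execute("bounded-example");
  }
}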
Convince yourself by exploring the use cases that have been built on top of Flink.
Deploy Applications Anywhere # Apache Flink is a distributed system and requires compute resources in order to execute applications. Flink integrates with all common cluster resource managers such as Hadoop YARN and Kubernetes but can also be set up to run as a stand-alone cluster.
Flink is designed to work well with each of the previously listed resource managers. This is achieved by resource-manager-specific deployment modes that allow Flink to interact with each resource manager in its idiomatic way.
When deploying a Flink application, Flink automatically identifies the required resources based on the application’s configured parallelism and requests them from the resource manager. In case of a failure, Flink replaces the failed container by requesting new resources. All communication to submit or control an application happens via REST calls. This eases the integration of Flink in many environments.
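As an illustration of this REST-based control plane, the sketch below polls a JobManager for an overview of its jobs. It is a minimal example under stated assumptions: 8081 is the default REST port in many setups, and the /jobs/overview path should be verified against the REST API reference of your Flink version; the HTTP client is the standard java.net.http client.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JobOverviewClient {
  public static void main(String[] args) throws Exception {
    // Assumed JobManager REST address; adjust host and port for your deployment.
    HttpRequest request = HttpRequest
        .newBuilder(URI.create("http://localhost:8081/jobs/overview"))
        .GET()
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());

    // The response is a JSON document listing running and finished jobs.
    System.out.println(response.body());
  }
}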
Run Applications at any Scale # Flink is designed to run stateful streaming applications at any scale. Applications are parallelized into possibly thousands of tasks that are distributed and concurrently executed in a cluster. Therefore, an application can leverage virtually unlimited amounts of CPUs, main memory, disk and network IO. Moreover, Flink easily maintains very large application state. Its asynchronous and incremental checkpointing algorithm ensures minimal impact on processing latencies while guaranteeing exactly-once state consistency.
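As a small, hedged sketch of how parallelism is configured (the concrete numbers are arbitrary placeholders; in a real deployment the default parallelism is often set via configuration rather than in code):
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismExample {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Default parallelism for all operators of this job.
    env.setParallelism(4);
    // Upper bound for later rescaling of keyed state (number of key groups).
    env.setMaxParallelism(128);

    env.fromElements(1, 2, 3, 4, 5)
        // an individual operator can override the job-wide default
        .filter(x -> x % 2 == 1).setParallelism(2)
        .print();

    env.execute("parallelism-example");
  }
}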
Users reported impressive scalability numbers for Flink applications running in their production environments, such as
applications processing multiple trillions of events per day, applications maintaining multiple terabytes of state, and applications running on thousands of cores. Leverage In-Memory Performance # Stateful Flink applications are optimized for local state access. Task state is always maintained in memory or, if the state size exceeds the available memory, in access-efficient on-disk data structures. Hence, tasks perform all computations by accessing local, often in-memory, state yielding very low processing latencies. Flink guarantees exactly-once state consistency in case of failures by periodically and asynchronously checkpointing the local state to durable storage.
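To make this concrete, the following hedged sketch shows how local state handling and durable checkpoint storage are commonly configured in the Java DataStream API; EmbeddedRocksDBStateBackend requires the separate flink-statebackend-rocksdb dependency, and the checkpoint interval and storage path are placeholder assumptions.
import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalStateSetup {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Keep task state in local, access-efficient on-disk structures (RocksDB)
    // so that state larger than memory can still be accessed locally.
    env.setStateBackend(new EmbeddedRocksDBStateBackend());

    // Periodically and asynchronously checkpoint the local state to durable storage.
    env.enableCheckpointing(60_000); // every 60 seconds
    env.getCheckpointConfig().setCheckpointStorage("file:///tmp/flink-checkpoints");

    // A trivial pipeline stands in for the actual application logic.
    env.fromElements("a", "b", "c").print();

    env.execute("local-state-example");
  }
}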
`}),e.add({id:1,href:"/documentation/flink-stable/",title:"Flink $FlinkStableShortVersion (stable)",section:"Documentation",content:" Flink documentation (latest stable release) # You can find the Flink documentation for the latest stable release here. "}),e.add({id:2,href:"/getting-started/with-flink/",title:"With Flink",section:"Getting Started",content:" Getting Started with Flink # Read how you can get started with Flink here. "}),e.add({id:3,href:"/what-is-flink/flink-applications/",title:"Applications",section:"About",content:` What is Apache Flink? — Applications # Apache Flink is a framework for stateful computations over unbounded and bounded data streams. Flink provides multiple APIs at different levels of abstraction and offers dedicated libraries for common use cases.
Here, we present Flink’s easy-to-use and expressive APIs and libraries.
Building Blocks for Streaming Applications # The types of applications that can be built with and executed by a stream processing framework are defined by how well the framework controls streams, state, and time. In the following, we describe these building blocks for stream processing applications and explain Flink’s approaches to handle them.
Streams # Obviously, streams are a fundamental aspect of stream processing. However, streams can have different characteristics that affect how a stream can and should be processed. Flink is a versatile processing framework that can handle any kind of stream.
Bounded and unbounded streams: Streams can be unbounded or bounded, i.e., fixed-sized data sets. Flink has sophisticated features to process unbounded streams, but also dedicated operators to efficiently process bounded streams. Real-time and recorded streams: All data are generated as streams. There are two ways to process the data: processing it in real time as it is generated, or persisting the stream to a storage system, e.g., a file system or object store, and processing it later. Flink applications can process recorded or real-time streams. State # Every non-trivial streaming application is stateful, i.e., only applications that apply transformations on individual events do not require state. Any application that runs basic business logic needs to remember events or intermediate results to access them at a later point in time, for example when the next event is received or after a specific time duration.
Application state is a first-class citizen in Flink. You can see that by looking at all the features that Flink provides in the context of state handling.
Multiple State Primitives: Flink provides state primitives for different data structures, such as atomic values, lists, or maps. Developers can choose the state primitive that is most efficient based on the access pattern of the function. Pluggable State Backends: Application state is managed in and checkpointed by a pluggable state backend. Flink features different state backends that store state in memory or in RocksDB, an efficient embedded on-disk data store. Custom state backends can be plugged in as well. Exactly-once state consistency: Flink’s checkpointing and recovery algorithms guarantee the consistency of application state in case of a failure. Hence, failures are transparently handled and do not affect the correctness of an application. Very Large State: Flink is able to maintain application state of several terabytes in size due to its asynchronous and incremental checkpoint algorithm. Scalable Applications: Flink supports scaling of stateful applications by redistributing the state to more or fewer workers. Time # Time is another important ingredient of streaming applications. Most event streams have inherent time semantics because each event is produced at a specific point in time. Moreover, many common stream computations are based on time, such as window aggregations, sessionization, pattern detection, and time-based joins. An important aspect of stream processing is how an application measures time, i.e., the difference between event-time and processing-time.
Flink provides a rich set of time-related features.
Event-time Mode: Applications that process streams with event-time semantics compute results based on timestamps of the events. Thereby, event-time processing allows for accurate and consistent results regardless of whether recorded or real-time events are processed. Watermark Support: Flink employs watermarks to reason about time in event-time applications. Watermarks are also a flexible mechanism to trade off the latency and completeness of results. Late Data Handling: When processing streams in event-time mode with watermarks, it can happen that a computation was considered completed before all associated events have arrived. Such events are called late events. Flink features multiple options to handle late events, such as rerouting them via side outputs and updating previously completed results. Processing-time Mode: In addition to its event-time mode, Flink also supports processing-time semantics which performs computations as triggered by the wall-clock time of the processing machine. The processing-time mode can be suitable for certain applications with strict low-latency requirements that can tolerate approximate results. Layered APIs # Flink provides three layered APIs. Each API offers a different trade-off between conciseness and expressiveness and targets different use cases.
We briefly present each API, discuss its applications, and show a code example.
The ProcessFunctions # ProcessFunctions are the most expressive function interfaces that Flink offers. Flink provides ProcessFunctions to process individual events from one or two input streams or events that were grouped in a window. ProcessFunctions provide fine-grained control over time and state. A ProcessFunction can arbitrarily modify its state and register timers that will trigger a callback function in the future. Hence, ProcessFunctions can implement complex per-event business logic as required for many stateful event-driven applications.
The following example shows a KeyedProcessFunction that operates on a KeyedStream and matches START and END events. When a START event is received, the function remembers its timestamp in state and registers a timer that fires four hours later. If an END event is received before the timer fires, the function computes the duration between the END and the START event, clears the state, and emits the duration. Otherwise, the timer just fires and clears the state.
/**
 * Matches keyed START and END events and computes the difference between
 * both elements' timestamps. The first String field is the key attribute,
 * the second String attribute marks START and END events.
 */
public static class StartEndDuration
    extends KeyedProcessFunction<String, Tuple2<String, String>, Tuple2<String, Long>> {

  private ValueState<Long> startTime;

  @Override
  public void open(Configuration conf) {
    // obtain state handle
    startTime = getRuntimeContext()
        .getState(new ValueStateDescriptor<Long>("startTime", Long.class));
  }

  /** Called for each processed event. */
  @Override
  public void processElement(
      Tuple2<String, String> in,
      Context ctx,
      Collector<Tuple2<String, Long>> out) throws Exception {

    switch (in.f1) {
      case "START":
        // set the start time if we receive a start event.
        startTime.update(ctx.timestamp());
        // register a timer in four hours from the start event.
        ctx.timerService()
            .registerEventTimeTimer(ctx.timestamp() + 4 * 60 * 60 * 1000);
        break;
      case "END":
        // emit the duration between start and end event
        Long sTime = startTime.value();
        if (sTime != null) {
          out.collect(Tuple2.of(in.f0, ctx.timestamp() - sTime));
          // clear the state
          startTime.clear();
        }
        break;
      default:
        // do nothing
    }
  }

  /** Called when a timer fires. */
  @Override
  public void onTimer(
      long timestamp,
      OnTimerContext ctx,
      Collector<Tuple2<String, Long>> out) {
    // Timeout interval exceeded. Cleaning up the state.
    startTime.clear();
  }
}
The example illustrates the expressive power of the KeyedProcessFunction but also highlights that it is a rather verbose interface.
The DataStream API # The DataStream API provides primitives for many common stream processing operations, such as windowing, record-at-a-time transformations, and enriching events by querying an external data store. The DataStream API is available for Java and Scala and is based on functions, such as map(), reduce(), and aggregate(). Functions can be defined by implementing interfaces or as Java or Scala lambda functions.
The following example shows how to sessionize a clickstream and count the number of clicks per session.
// a stream of website clicks
DataStream<Click> clicks = ...

DataStream<Tuple2<String, Long>> result = clicks
    // project clicks to userId and add a 1 for counting
    .map(
        // define function by implementing the MapFunction interface.
        new MapFunction<Click, Tuple2<String, Long>>() {
          @Override
          public Tuple2<String, Long> map(Click click) {
            return Tuple2.of(click.userId, 1L);
          }
        })
    // key by userId (field 0)
    .keyBy(0)
    // define session window with 30 minute gap
    .window(EventTimeSessionWindows.withGap(Time.minutes(30L)))
    // count clicks per session. Define function as lambda function.
    .reduce((a, b) -> Tuple2.of(a.f0, a.f1 + b.f1));
SQL & Table API # Flink features two relational APIs, the Table API and SQL. Both APIs are unified APIs for batch and stream processing, i.e., queries are executed with the same semantics on unbounded, real-time streams or bounded, recorded streams and produce the same results. The Table API and SQL leverage Apache Calcite for parsing, validation, and query optimization. They can be seamlessly integrated with the DataStream and DataSet APIs and support user-defined scalar, aggregate, and table-valued functions.
Flink’s relational APIs are designed to ease the definition of data analytics, data pipelining, and ETL applications.
The following example shows the SQL query to sessionize a clickstream and count the number of clicks per session. This is the same use case as in the example of the DataStream API.
SELECT userId, COUNT(*) FROM clicks GROUP BY SESSION(clicktime, INTERVAL '30' MINUTE), userId Libraries # Flink features several libraries for common data processing use cases. The libraries are typically embedded in an API and not fully self-contained. Hence, they can benefit from all features of the API and be integrated with other libraries.
Complex Event Processing (CEP): Pattern detection is a very common use case for event stream processing. Flink’s CEP library provides an API to specify patterns of events (think of regular expressions or state machines). The CEP library is integrated with Flink’s DataStream API, such that patterns are evaluated on DataStreams. Applications for the CEP library include network intrusion detection, business process monitoring, and fraud detection. A short, illustrative sketch of the pattern API follows after this library overview.
DataSet API: The DataSet API is Flink’s core API for batch processing applications. The primitives of the DataSet API include map, reduce, (outer) join, co-group, and iterate. All operations are backed by algorithms and data structures that operate on serialized data in memory and spill to disk if the data size exceeds the memory budget. The data processing algorithms of Flink’s DataSet API are inspired by traditional database operators, such as hybrid hash-join or external merge-sort. Starting with Flink 1.12, the DataSet API has been soft-deprecated.
Gelly: Gelly is a library for scalable graph processing and analysis. Gelly is implemented on top of and integrated with the DataSet API. Hence, it benefits from its scalable and robust operators. Gelly features built-in algorithms, such as label propagation, triangle enumeration, and page rank, but also provides a Graph API that eases the implementation of custom graph algorithms.
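Picking up the forward reference from the CEP paragraph above, here is a minimal, hedged sketch of the pattern API: it assumes a hypothetical LoginEvent POJO and raises an alert when three failed logins for the same user occur within one minute. The class and method names follow the flink-cep library and should be verified against the CEP documentation of your Flink version.
import java.util.List;
import java.util.Map;

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.time.Time;

public class FailedLoginPattern {

  /** Hypothetical event type, used for illustration only. */
  public static class LoginEvent {
    public String userId;
    public boolean success;
  }

  public static DataStream<String> detect(DataStream<LoginEvent> logins) {
    // Three failed login events for the same user within one minute.
    Pattern<LoginEvent, ?> pattern = Pattern.<LoginEvent>begin("fail")
        .where(new SimpleCondition<LoginEvent>() {
          @Override
          public boolean filter(LoginEvent event) {
            return !event.success;
          }
        })
        .times(3)
        .within(Time.minutes(1));

    PatternStream<LoginEvent> matches = CEP.pattern(logins.keyBy(e -> e.userId), pattern);

    // Turn each complete match into a simple alert string.
    return matches.select(new PatternSelectFunction<LoginEvent, String>() {
      @Override
      public String select(Map<String, List<LoginEvent>> match) {
        return "Three failed logins for user " + match.get("fail").get(0).userId;
      }
    });
  }
}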
`}),e.add({id:4,href:"/documentation/flink-master/",title:"Flink Master (snapshot)",section:"Documentation",content:" Flink documentation (latest snapshot) # You can find the Flink documentation for the latest snapshot here. "}),e.add({id:5,href:"/getting-started/with-flink-kubernetes-operator/",title:"With Flink Kubernetes Operator",section:"Getting Started",content:" Getting Started with Flink Kubernetes Operator # Read how you can get started with Flink Kubernetes Operator here. "}),e.add({id:6,href:"/documentation/flink-kubernetes-operator-stable/",title:"Kubernetes Operator $FlinkKubernetesOperatorStableShortVersion (latest)",section:"Documentation",content:" Flink Kubernetes Operator documentation (latest stable release) # You can find the Flink Kubernetes Operator documentation for the latest stable release here. "}),e.add({id:7,href:"/what-is-flink/flink-operations/",title:"Operations",section:"About",content:` What is Apache Flink? — Operations # Apache Flink is a framework for stateful computations over unbounded and bounded data streams. Since many streaming applications are designed to run continuously with minimal downtime, a stream processor must provide excellent failure recovery, as well as tooling to monitor and maintain applications while they are running.
Apache Flink puts a strong focus on the operational aspects of stream processing. Here, we explain Flink’s failure recovery mechanism and present its features to manage and supervise running applications.
Run Your Applications Non-Stop 24/7 # Machine and process failures are ubiquitous in distributed systems. A distributed stream processor like Flink must recover from failures in order to be able to run streaming applications 24/7. Obviously, this means not only restarting an application after a failure but also ensuring that its internal state remains consistent, such that the application can continue processing as if the failure had never happened.
Flink provides several features to ensure that applications keep running and remain consistent:
Consistent Checkpoints: Flink’s recovery mechanism is based on consistent checkpoints of an application’s state. In case of a failure, the application is restarted and its state is loaded from the latest checkpoint. In combination with resettable stream sources, this feature can guarantee exactly-once state consistency. Efficient Checkpoints: Checkpointing the state of an application can be quite expensive if the application maintains terabytes of state. Flink can perform asynchronous and incremental checkpoints, in order to keep the impact of checkpoints on the application’s latency SLAs very small. End-to-End Exactly-Once: Flink features transactional sinks for specific storage systems that guarantee that data is written out exactly once, even in case of failures. Integration with Cluster Managers: Flink is tightly integrated with cluster managers, such as Hadoop YARN or Kubernetes. When a process fails, a new process is automatically started to take over its work. High-Availability Setup: Flink features a high-availability mode that eliminates all single points of failure. The HA mode is based on Apache ZooKeeper, a battle-proven service for reliable distributed coordination. Update, Migrate, Suspend, and Resume Your Applications # Streaming applications that power business-critical services need to be maintained. Bugs need to be fixed and improvements or new features need to be implemented. However, updating a stateful streaming application is not trivial. Often one cannot simply stop the application and restart a fixed or improved version because one cannot afford to lose the state of the application.
Flink’s Savepoints are a unique and powerful feature that solves the issue of updating stateful applications and many other related challenges. A savepoint is a consistent snapshot of an application’s state and therefore very similar to a checkpoint. However, in contrast to checkpoints, savepoints need to be manually triggered and are not automatically removed when an application is stopped. A savepoint can be used to start a state-compatible application and initialize its state. Savepoints enable the following features:
Application Evolution: Savepoints can be used to evolve applications. A fixed or improved version of an application can be restarted from a savepoint that was taken from a previous version of the application. It is also possible to start the application from an earlier point in time (given such a savepoint exists) to repair incorrect results produced by the flawed version. Cluster Migration: Using savepoints, applications can be migrated (or cloned) to different clusters. Flink Version Updates: An application can be migrated to run on a new Flink version using a savepoint. Application Scaling: Savepoints can be used to increase or decrease the parallelism of an application. A/B Tests and What-If Scenarios: The performance or quality of two (or more) different versions of an application can be compared by starting all versions from the same savepoint. Pause and Resume: An application can be paused by taking a savepoint and stopping it. At any later point in time, the application can be resumed from the savepoint. Archiving: Savepoints can be archived to be able to reset the state of an application to an earlier point in time. Monitor and Control Your Applications # Just like any other service, continuously running streaming applications need to be supervised and integrated into the operations infrastructure, i.e., monitoring and logging services, of an organization. Monitoring helps to anticipate problems and react ahead of time. Logging enables root-cause analysis to investigate failures. Finally, easily accessible interfaces to control running applications are an important feature.
Flink integrates nicely with many common logging and monitoring services and provides a REST API to control applications and query information.
Web UI: Flink features a web UI to inspect, monitor, and debug running applications. It can also be used to submit applications for execution or cancel them. Logging: Flink implements the popular slf4j logging interface and integrates with the logging frameworks log4j or logback. Metrics: Flink features a sophisticated metrics system to collect and report system and user-defined metrics. Metrics can be exported to several reporters, including JMX, Ganglia, Graphite, Prometheus, StatsD, Datadog, and Slf4j. REST API: Flink exposes a REST API to submit a new application, take a savepoint of a running application, or cancel an application. The REST API also exposes metadata and collected metrics of running or completed applications. `}),e.add({id:8,href:"/getting-started/with-flink-cdc/",title:"With Flink CDC",section:"Getting Started",content:" Getting Started with Flink CDC # Read how you can get started with Flink CDC here. "}),e.add({id:9,href:"/documentation/flink-kubernetes-operator-master/",title:"Kubernetes Operator Main (snapshot)",section:"Documentation",content:" Flink Kubernetes Operator documentation (latest snapshot) # You can find the Flink Kubernetes Operator documentation for the latest snapshot here. "}),e.add({id:10,href:"/what-is-flink/use-cases/",title:"Use Cases",section:"About",content:` Use Cases # Apache Flink is an excellent choice to develop and run many different types of applications due to its extensive feature set. Flink’s features include support for stream and batch processing, sophisticated state management, event-time processing semantics, and exactly-once consistency guarantees for state. Moreover, Flink can be deployed on various resource providers such as YARN and Kubernetes, but also as a stand-alone cluster on bare-metal hardware. Configured for high availability, Flink does not have a single point of failure. Flink has been proven to scale to thousands of cores and terabytes of application state, delivers high throughput and low latency, and powers some of the world’s most demanding stream processing applications.
Below, we explore the most common types of applications that are powered by Flink and give pointers to real-world examples.
Event-driven Applications Data Analytics Applications Data Pipeline Applications Event-driven Applications # What are event-driven applications? # An event-driven application is a stateful application that ingests events from one or more event streams and reacts to incoming events by triggering computations, state updates, or external actions.
Event-driven applications are an evolution of the traditional application design with separated compute and data storage tiers. In this architecture, applications read data from and persist data to a remote transactional database.
In contrast, event-driven applications are based on stateful stream processing applications. In this design, data and computation are co-located, which yields local (in-memory or disk) data access. Fault-tolerance is achieved by periodically writing checkpoints to a remote persistent storage. The figure below depicts the difference between the traditional application architecture and event-driven applications.
What are the advantages of event-driven applications? # Instead of querying a remote database, event-driven applications access their data locally which yields better performance, both in terms of throughput and latency. The periodic checkpoints to a remote persistent storage can be asynchronous and incremental. Hence, the impact of checkpointing on the regular event processing is very small. However, the event-driven application design provides more benefits than just local data access. In the tiered architecture, it is common that multiple applications share the same database. Hence, any change of the database, such as changing the data layout due to an application update or scaling the service, needs to be coordinated. Since each event-driven application is responsible for its own data, changes to the data representation or scaling the application requires less coordination.
How does Flink support event-driven applications? # The limits of event-driven applications are defined by how well a stream processor can handle time and state. Many of Flink’s outstanding features are centered around these concepts. Flink provides a rich set of state primitives that can manage very large data volumes (up to several terabytes) with exactly-once consistency guarantees. Moreover, Flink’s support for event-time, highly customizable window logic, and fine-grained control of time as provided by the ProcessFunction enable the implementation of advanced business logic. In addition, Flink features a library for Complex Event Processing (CEP) to detect patterns in data streams.
However, Flink’s outstanding feature for event-driven applications is its support for savepoints. A savepoint is a consistent state image that can be used as a starting point for compatible applications. Given a savepoint, an application can be updated or rescaled, or multiple versions of an application can be started for A/B testing.
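For illustration, a savepoint can be triggered for a running job through the REST API mentioned in the operations section. The sketch below is a hedged example: the port, job id, target directory, and the JSON field names of the savepoint trigger request should be double-checked against the REST API reference of your Flink version, and the text block requires Java 15 or later.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TriggerSavepoint {
  public static void main(String[] args) throws Exception {
    String jobId = args[0]; // id of the running job, e.g. copied from the web UI

    // Ask the JobManager to draw a savepoint without stopping the job.
    String body = """
        {"target-directory": "file:///tmp/savepoints", "cancel-job": false}""";

    HttpRequest request = HttpRequest
        .newBuilder(URI.create("http://localhost:8081/jobs/" + jobId + "/savepoints"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();

    HttpResponse<String> response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString());

    // The response contains a trigger id that can be polled for completion.
    System.out.println(response.body());
  }
}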
What are typical event-driven applications? # Fraud detection Anomaly detection Rule-based alerting Business process monitoring Web application (social network) Data Analytics Applications # What are data analytics applications? # Analytical jobs extract information and insight from raw data. Traditionally, analytics are performed as batch queries or applications on bounded data sets of recorded events. In order to incorporate the latest data into the result of the analysis, it has to be added to the analyzed data set and the query or application is rerun. The results are written to a storage system or emitted as reports.
With a sophisticated stream processing engine, analytics can also be performed in a real-time fashion. Instead of reading finite data sets, streaming queries or applications ingest real-time event streams and continuously produce and update results as events are consumed. The results are either written to an external database or maintained as internal state. A dashboard application can read the latest results from the external database or directly query the internal state of the application.
Apache Flink supports streaming as well as batch analytical applications as shown in the figure below.
What are the advantages of streaming analytics applications? # The advantages of continuous streaming analytics compared to batch analytics are not limited to a much lower latency from event to insight due to the elimination of periodic imports and query execution. In contrast to batch queries, streaming queries do not have to deal with artificial boundaries in the input data which are caused by periodic imports and the bounded nature of the input.
Another aspect is a simpler application architecture. A batch analytics pipeline consists of several independent components to periodically schedule data ingestion and query execution. Reliably operating such a pipeline is non-trivial because failures of one component affect the following steps of the pipeline. In contrast, a streaming analytics application that runs on a sophisticated stream processor like Flink incorporates all steps from data ingestion to continuous result computation. Therefore, it can rely on the engine’s failure recovery mechanism.
How does Flink support data analytics applications? # Flink provides very good support for continuous streaming as well as batch analytics. Specifically, it features an ANSI-compliant SQL interface with unified semantics for batch and streaming queries. SQL queries compute the same result regardless of whether they are run on a static data set of recorded events or on a real-time event stream. Rich support for user-defined functions ensures that custom code can be executed in SQL queries. If even more custom logic is required, Flink’s DataStream API or DataSet API provide lower-level control.
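To illustrate the unified semantics, the hedged sketch below defines a source table and runs a continuous aggregation through the Table API's SQL entry point; the schema and the datagen connector options are illustrative assumptions, swapping in batch mode produces the same result on the bounded input, and the exact builder methods should be checked against the Table API documentation of your Flink version.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UnifiedSqlExample {
  public static void main(String[] args) {
    // Streaming semantics; EnvironmentSettings.newInstance().inBatchMode().build()
    // would run the very same query with batch semantics on bounded input.
    TableEnvironment tEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inStreamingMode().build());

    // Illustrative bounded source table; replace the connector with Kafka, files, etc.
    tEnv.executeSql(
        "CREATE TABLE clicks (" +
        " userId STRING," +
        " clicktime TIMESTAMP(3)," +
        " WATERMARK FOR clicktime AS clicktime - INTERVAL '5' SECOND" +
        ") WITH ('connector' = 'datagen', 'number-of-rows' = '1000')");

    // The same query text works for recorded (batch) and real-time (streaming) input.
    tEnv.executeSql("SELECT userId, COUNT(*) AS cnt FROM clicks GROUP BY userId").print();
  }
}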
What are typical data analytics applications? # Quality monitoring of Telco networks Analysis of product updates & experiment evaluation in mobile applications Ad-hoc analysis of live data in consumer technology Large-scale graph analysis Data Pipeline Applications # What are data pipelines? # Extract-transform-load (ETL) is a common approach to convert and move data between storage systems. Often ETL jobs are periodically triggered to copy data from transactional database systems to an analytical database or a data warehouse.
Data pipelines serve a similar purpose as ETL jobs. They transform and enrich data and can move it from one storage system to another. However, they operate in a continuous streaming mode instead of being periodically triggered. Hence, they are able to read records from sources that continuously produce data and move it with low latency to their destination. For example, a data pipeline might monitor a file system directory for new files and write their data into an event log. Another application might materialize an event stream to a database or incrementally build and refine a search index.
The figure below depicts the difference between periodic ETL jobs and continuous data pipelines.
What are the advantages of data pipelines? # The obvious advantage of continuous data pipelines over periodic ETL jobs is the reduced latency of moving data to its destination. Moreover, data pipelines are more versatile and can be employed for more use cases because they are able to continuously consume and emit data.
How does Flink support data pipelines? # Many common data transformation or enrichment tasks can be addressed by Flink’s SQL interface (or Table API) and its support for user-defined functions. Data pipelines with more advanced requirements can be realized by using the DataStream API which is more generic. Flink provides a rich set of connectors to various storage systems such as Kafka, Kinesis, Elasticsearch, and JDBC database systems. It also features continuous sources for file systems that monitor directories and sinks that write files in a time-bucketed fashion.
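The following is a hedged sketch of such a pipeline in the Java DataStream API: it continuously reads raw click events from a Kafka topic and writes them to time-bucketed files. The broker address, topic, and output path are placeholders, and KafkaSource and FileSink come from the separately released flink-connector-kafka and flink-connector-files artifacts, so the exact builder methods should be checked against the connector versions you use.
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ClickEventPipeline {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Checkpointing is required for the FileSink to commit finished files.
    env.enableCheckpointing(10_000);

    // Continuously read raw events from a Kafka topic (placeholder broker and topic).
    KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers("kafka:9092")
        .setTopics("clicks")
        .setGroupId("click-pipeline")
        .setStartingOffsets(OffsetsInitializer.earliest())
        .setValueOnlyDeserializer(new SimpleStringSchema())
        .build();

    // Write the records to files; the default bucket assigner buckets them by time.
    FileSink<String> sink = FileSink
        .forRowFormat(new Path("file:///tmp/click-events"), new SimpleStringEncoder<String>("UTF-8"))
        .build();

    env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-clicks")
        .filter(line -> !line.isEmpty()) // a trivial stand-in for transformation or enrichment
        .sinkTo(sink);

    env.execute("continuous-click-pipeline");
  }
}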
What are typical data pipeline applications? # Real-time search index building in e-commerce Continuous ETL in e-commerce `}),e.add({id:11,href:"/getting-started/with-flink-ml/",title:"With Flink ML",section:"Getting Started",content:" Getting Started with Flink ML # Read how you can get started with Flink ML here. "}),e.add({id:12,href:"/documentation/flink-cdc-stable/",title:"CDC $FlinkCDCStableShortVersion (stable)",section:"Documentation",content:" Flink CDC documentation (latest stable release) # You can find the Flink CDC documentation for the latest stable release here. "}),e.add({id:13,href:"/downloads/",title:"Downloads",section:"Apache Flink® — Stateful Computations over Data Streams",content:` Apache Flink® Downloads # Apache Flink # Apache Flink® 1.19.0 is the latest stable release.
Apache Flink 1.19.0 # Apache Flink 1.19.0 (asc, sha512)
Apache Flink 1.19.0 Source Release (asc, sha512)
Release Notes # Please have a look at the Release Notes for Apache Flink 1.19.0 if you plan to upgrade your Flink setup from a previous version.
Apache Flink 1.18.1 # Apache Flink 1.18.1 (asc, sha512)
Apache Flink 1.18.1 Source Release (asc, sha512)
Release Notes # Please have a look at the Release Notes for Apache Flink 1.18.1 if you plan to upgrade your Flink setup from a previous version.
Apache Flink 1.17.2 # Apache Flink 1.17.2 (asc, sha512)
Apache Flink 1.17.2 Source Release (asc, sha512)
Release Notes # Please have a look at the Release Notes for Apache Flink 1.17.2 if you plan to upgrade your Flink setup from a previous version.
Apache Flink 1.16.3 # Apache Flink 1.16.3 (asc, sha512)
Apache Flink 1.16.3 Source Release (asc, sha512)
Release Notes # Please have a look at the Release Notes for Apache Flink 1.16.3 if you plan to upgrade your Flink setup from a previous version.
Apache Flink connectors # These are connectors that are released separately from the main Flink releases.
Apache Flink AWS Connectors 4.2.0 # Apache Flink AWS Connectors 4.2.0 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.17.x
1.18.x
Apache Flink Cassandra Connector 3.1.0 # Apache Flink Cassandra Connector 3.1.0 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.16.x
1.17.x
Apache Flink Elasticsearch Connector 3.0.1 # Apache Flink Elasticsearch Connector 3.0.1 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.16.x
1.17.x
Apache Flink Google Cloud PubSub Connector 3.0.2 # Apache Flink Google Cloud PubSub Connector 3.0.2 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.17.x
1.18.x
Apache Flink HBase Connector 3.0.0 # Apache Flink HBase Connector 3.0.0 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.16.x
1.17.x
Apache Flink JDBC Connector 3.1.2 # Apache Flink JDBC Connector 3.1.2 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.16.x
1.17.x
1.18.x
Apache Flink Kafka Connector 3.1.0 # Apache Flink Kafka Connector 3.1.0 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.17.x
1.18.x
Apache Flink MongoDB Connector 1.1.0 # Apache Flink MongoDB Connector 1.1.0 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.17.x
1.18.x
Apache Flink Opensearch Connector 1.1.0 # Apache Flink Opensearch Connector 1.1.0 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.17.x
1.18.x
Apache Flink Pulsar Connector 3.0.1 # Apache Flink Pulsar Connector 3.0.1 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.16.x Apache Flink Pulsar Connector 4.1.0 # Apache Flink Pulsar Connector 4.1.0 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.17.x
1.18.x
Apache Flink RabbitMQ Connector 3.0.1 # Apache Flink RabbitMQ Connector 3.0.1 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.16.x
1.17.x
Apache Flink Stateful Functions # Apache Flink® Stateful Functions 3.3 is the latest stable release.
Apache Flink Stateful Functions 3.3.0 # Apache Flink Stateful Functions 3.3.0 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.16.2 Apache Flink ML # Apache Flink® ML 2.3 is the latest stable release.
Apache Flink ML 2.3.0 # Apache Flink ML 2.3.0 Source Release (asc, sha512)
Apache Flink ML 2.3.0 Python Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.17.* Apache Flink ML 2.2.0 # Apache Flink ML 2.2.0 Source Release (asc, sha512)
Apache Flink ML 2.2.0 Python Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.15.* Apache Flink Kubernetes Operator # Apache Flink® Kubernetes Operator 1.8 is the latest stable release.
Apache Flink Kubernetes Operator 1.8.0 # Apache Flink Kubernetes Operator 1.8.0 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.18.x
1.17.x
1.16.x
1.15.x
Apache Flink Kubernetes Operator 1.7.0 # Apache Flink Kubernetes Operator 1.7.0 Source Release (asc, sha512)
This component is compatible with Apache Flink version(s):
1.18.x
1.17.x
1.16.x
1.15.x
Additional Components # These are components that the Flink project develops which are not part of the main Flink release:
Pre-bundled Hadoop 2.8.3 # Pre-bundled Hadoop 2.8.3 Source Release (asc, sha512)
Pre-bundled Hadoop 2.7.5 # Pre-bundled Hadoop 2.7.5 Source Release (asc, sha512)
Pre-bundled Hadoop 2.6.5 # Pre-bundled Hadoop 2.6.5 Source Release (asc, sha512)
Pre-bundled Hadoop 2.4.1 # Pre-bundled Hadoop 2.4.1 Source Release (asc, sha512)
Apache Flink-shaded 18.0 Source Release # Apache Flink-shaded 18.0 Source Release Source Release (asc, sha512)
Apache Flink-shaded 17.0 Source Release # Apache Flink-shaded 17.0 Source Release Source Release (asc, sha512)
Apache Flink-shaded 16.2 Source Release # Apache Flink-shaded 16.2 Source Release Source Release (asc, sha512)
Apache Flink-connector-parent 1.1.0 Source release # Apache Flink-connector-parent 1.1.0 Source release Source Release (asc, sha512)
Verifying Hashes and Signatures # Along with our releases, we also provide sha512 hashes in *.sha512 files and cryptographic signatures in *.asc files. The Apache Software Foundation has an extensive tutorial to verify hashes and signatures which you can follow by using any of these release-signing KEYS.
Maven Dependencies # Apache Flink # You can add the following dependencies to your pom.xml to include Apache Flink in your project. These dependencies include a local execution environment and thus support local testing.
Scala API: To use the Scala API, replace the flink-java artifact id with flink-scala_2.12 and flink-streaming-java with flink-streaming-scala_2.12. <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.19.0</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java</artifactId> <version>1.19.0</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients</artifactId> <version>1.19.0</version> </dependency> Apache Flink Stateful Functions # You can add the following dependencies to your pom.xml to include Apache Flink Stateful Functions in your project.
<dependency> <groupId>org.apache.flink</groupId> <artifactId>statefun-sdk</artifactId> <version>3.3.0</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>statefun-flink-harness</artifactId> <version>3.3.0</version> </dependency> The statefun-sdk dependency is the only one you will need to start developing applications. The statefun-flink-harness dependency includes a local execution environment that allows you to locally test your application in an IDE.
Apache Flink ML # You can add the following dependencies to your pom.xml to include Apache Flink ML in your project.
<dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-ml-core</artifactId> <version>2.3.0</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-ml-iteration</artifactId> <version>2.3.0</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-ml-lib</artifactId> <version>2.3.0</version> </dependency> Advanced users can import only a minimal set of Flink ML dependencies for their target use cases:
Use artifact flink-ml-core in order to develop custom ML algorithms. Use artifacts flink-ml-core and flink-ml-iteration in order to develop custom ML algorithms which require iteration. Use artifact flink-ml-lib in order to use the off-the-shelf ML algorithms from Flink ML. Apache Flink Kubernetes Operator # You can add the following dependencies to your pom.xml to include Apache Flink Kubernetes Operator in your project.
<dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-kubernetes-operator</artifactId> <version>1.8.0</version> </dependency> Update Policy for old releases # As of March 2017, the Flink community decided to support the current and previous minor release with bugfixes. If 1.2.x is the current release, 1.1.y is the previous minor supported release. Both versions will receive bugfixes for critical issues.
As of March 2023, the Flink community decided that upon release of a new Flink minor version, the community will perform one final bugfix release for resolved critical/blocker issues in the Flink minor version losing support. If 1.16.1 is the current release and 1.15.4 is the latest previous patch version, once 1.17.0 is released we will create a 1.15.5 to flush out any resolved critical/blocker issues.
Note that the community is always open to discussing bugfix releases for even older versions. Please get in touch with the developers for that on the dev@flink.apache.org mailing list.
All stable releases # All Flink releases are available via https://archive.apache.org/dist/flink/ including checksums and cryptographic signatures. At the time of writing, this includes the following versions:
Apache Flink # Apache Flink 1.19.0 - 2024-03-18 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.18.1 - 2024-01-19 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.18.0 - 2023-10-25 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.17.2 - 2023-11-29 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.17.1 - 2023-05-25 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.17.0 - 2023-03-23 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.16.3 - 2023-11-20 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.16.2 - 2023-05-25 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.16.1 - 2023-01-30 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.16.0 - 2022-10-28 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.15.4 - 2023-03-15 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.15.3 - 2022-11-10 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.15.2 - 2022-08-24 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.15.1 - 2022-07-06 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.15.0 - 2022-05-05 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.14.6 - 2022-09-28 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.14.5 - 2022-06-22 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.14.4 - 2022-03-02 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.14.3 - 2022-01-17 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.14.2 - 2021-12-16 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.14.0 - 2021-09-29 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.13.6 - 2022-02-18 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.13.5 - 2021-12-16 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.13.3 - 2021-10-19 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.13.2 - 2021-08-02 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.13.1 - 2021-05-28 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.13.0 - 2021-04-30 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.12.7 - 2021-12-16 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.12.5 - 2021-08-06 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.12.4 - 2021-05-21 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.12.3 - 2021-04-29 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.12.2 - 2021-03-03 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.12.1 - 2021-01-19 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.12.0 - 2020-12-08 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.11.6 - 2021-12-16 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.11.4 - 2021-08-09 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.11.3 - 2020-12-18 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.11.2 - 2020-09-17 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.11.1 - 2020-07-21 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.11.0 - 2020-07-06 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.10.3 - 2021-01-29 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.10.2 - 2020-08-25 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.10.1 - 2020-05-12 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.10.0 - 2020-02-11 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.9.3 
- 2020-04-24 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.9.2 - 2020-01-30 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.9.1 - 2019-10-18 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.9.0 - 2019-08-22 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.8.3 - 2019-12-11 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.8.2 - 2019-09-11 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.8.1 - 2019-07-02 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.8.0 - 2019-04-09 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.7.2 - 2019-02-15 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.7.1 - 2018-12-21 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.7.0 - 2018-11-30 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.6.4 - 2019-02-25 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.6.3 - 2018-12-22 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.6.2 - 2018-10-29 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.6.1 - 2018-09-19 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.6.0 - 2018-08-08 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.5.6 - 2018-12-21 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.5.5 - 2018-10-29 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.5.4 - 2018-09-19 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.5.3 - 2018-08-21 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.5.2 - 2018-07-31 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.5.1 - 2018-07-12 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.5.0 - 2018-05-25 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.4.2 - 2018-03-08 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.4.1 - 2018-02-15 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.4.0 - 2017-11-29 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.3.3 - 2018-03-15 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.3.2 - 2017-08-05 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.3.1 - 2017-06-23 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.3.0 - 2017-06-01 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.2.1 - 2017-04-26 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.2.0 - 2017-02-06 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.1.5 - 2017-03-22 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.1.4 - 2016-12-21 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.1.3 - 2016-10-13 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.1.2 - 2016-09-05 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.1.1 - 2016-08-11 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.1.0 - 2016-08-08 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.0.3 - 2016-05-12 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.0.2 - 2016-04-23 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.0.1 - 2016-04-06 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 1.0.0 - 2016-03-08 (Source, Binaries, Docs, Javadocs, Scaladocs ) Apache Flink 0.10.2 - 2016-02-11 (Source, Binaries) Apache Flink 0.10.1 - 2015-11-27 (Source, Binaries) Apache Flink 0.10.0 - 2015-11-16 (Source, Binaries) Apache Flink 0.9.1 - 2015-09-01 (Source, Binaries) Apache Flink 0.9.0 - 2015-06-24 (Source, 
Binaries) Apache Flink 0.9.0-milestone-1 - 2015-04-13 (Source, Binaries) Apache Flink 0.8.1 - 2015-02-20 (Source, Binaries) Apache Flink 0.8.0 - 2015-01-22 (Source, Binaries) Apache Flink 0.7.0-incubating - 2014-11-04 (Source, Binaries) Apache Flink 0.6.1-incubating - 2014-09-26 (Source, Binaries) Apache Flink 0.6-incubating - 2014-08-26 (Source, Binaries) Apache Flink connectors # Flink Elasticsearch Connector 3.0.0 - 2022-11-09 (Source) Flink AWS Connectors 3.0.0 - 2022-11-28 (Source) Flink Cassandra Connector 3.0.0 - 2022-11-30 (Source) Flink AWS Connectors 4.0.0 - 2022-12-09 (Source) Flink Pulsar Connector 3.0.0 - 2022-12-20 (Source) Flink JDBC Connector 3.0.0 - 2022-11-30 (Source) Flink RabbitMQ Connectors 3.0.0 - 2022-12-13 (Source) Flink Opensearch Connector 1.0.0 - 2022-12-21 (Source) Flink Google Cloud PubSub Connector 3.0.0 - 2023-01-31 (Source) Flink MongoDB Connector 1.0.0 - 2023-04-03 (Source) Flink AWS Connectors 4.1.0 - 2023-04-03 (Source) Flink Kafka Connector 3.0.0 - 2023-04-21 (Source) Flink MongoDB Connector 1.0.1 - 2023-04-24 (Source) Flink JDBC Connector 3.1.0 - 2023-05-05 (Source) Flink RabbitMQ Connectors 3.0.1 - 2023-05-08 (Source) Flink Elasticsearch Connector 3.0.1 - 2023-05-08 (Source) Flink Opensearch Connector 1.0.1 - 2023-05-08 (Source) Flink Pulsar Connector 4.0.0 - 2023-05-08 (Source) Flink Google Cloud PubSub Connector 3.0.1 - 2023-05-09 (Source) Flink Cassandra Connector 3.1.0 - 2023-05-25 (Source) Flink Pulsar Connector 3.0.1 - 2023-06-07 (Source) Flink JDBC Connector 3.1.1 - 2023-06-28 (Source) Flink MongoDB Connector 1.0.2 - 2023-08-15 (Source) Flink HBase Connector 3.0.0 - 2023-09-1 (Source) Flink Kafka Connector 3.0.1 - 2023-10-30 (Source) Flink AWS Connectors 4.2.0 - 2023-11-30 (Source) Flink Kafka Connector 3.0.2 - 2023-12-01 (Source) Flink Pulsar Connector 4.1.0 - 2023-12-28 (Source) Flink Google Cloud PubSub Connector 3.0.2 - 2024-01-12 (Source) Flink Opensearch Connector 1.1.0 - 2024-02-01 (Source) Flink Kafka Connector 3.1.0 - 2024-02-07 (Source) Flink MongoDB Connector 1.1.0 - 2024-02-19 (Source) Flink JDBC Connector 3.1.2 - 2024-02-21 (Source) Apache Flink Stateful Functions # Apache Flink Stateful Functions 3.3.0 - 2023-09-19 (Source, Docs, Javadocs) Apache Flink Stateful Functions 3.2.0 - 2022-01-27 (Source, Docs, Javadocs) Apache Flink Stateful Functions 3.1.1 - 2021-12-22 (Source, Docs, Javadocs) Apache Flink Stateful Functions 3.1.0 - 2021-08-30 (Source, Docs, Javadocs) Apache Flink Stateful Functions 3.0.0 - 2021-04-14 (Source, Docs, Javadocs) Apache Flink Stateful Functions 2.2.2 - 2021-01-02 (Source, Docs, Javadocs) Apache Flink Stateful Functions 2.2.1 - 2020-11-09 (Source, Docs, Javadocs) Apache Flink Stateful Functions 2.2.0 - 2020-09-28 (Source, Docs, Javadocs) Apache Flink Stateful Functions 2.1.0 - 2020-06-08 (Source, Docs, Javadocs) Apache Flink Stateful Functions 2.0.0 - 2020-04-02 (Source, Docs, Javadocs) Apache Flink Shaded # Apache Flink Shaded 18.0 - 2024-01-11 (Source) Apache Flink Shaded 17.0 - 2023-05-08 (Source) Apache Flink Shaded 16.2 - 2023-11-17 (Source) Apache Flink Shaded 16.1 - 2022-11-24 (Source) Apache Flink Shaded 16.0 - 2022-10-07 (Source) Apache Flink Shaded 15.0 - 2022-01-21 (Source) Apache Flink Shaded 14.0 - 2021-07-21 (Source) Apache Flink Shaded 13.0 - 2021-04-06 (Source) Apache Flink Shaded 12.0 - 2020-10-09 (Source) Apache Flink Shaded 11.0 - 2020-05-29 (Source) Apache Flink Shaded 10.0 - 2020-02-17 (Source) Apache Flink Shaded 9.0 - 2019-11-23 (Source) Apache Flink Shaded 8.0 - 2019-08-28 (Source) 
Apache Flink Shaded 7.0 - 2019-05-30 (Source) Apache Flink Shaded 6.0 - 2019-02-12 (Source) Apache Flink Shaded 5.0 - 2018-10-15 (Source) Apache Flink Shaded 4.0 - 2018-06-06 (Source) Apache Flink Shaded 3.0 - 2018-02-28 (Source) Apache Flink Shaded 2.0 - 2017-10-30 (Source) Apache Flink Shaded 1.0 - 2017-07-27 (Source) Apache Flink ML # Apache Flink ML 2.3.0 - 2023-07-01 (Source) Apache Flink ML 2.2.0 - 2023-04-19 (Source) Apache Flink ML 2.1.0 - 2022-07-12 (Source) Apache Flink ML 2.0.0 - 2021-01-07 (Source) Apache Flink Kubernetes Operator # Apache Flink Kubernetes Operator 1.8.0 - 2024-03-21 (Source, Helm Chart) Apache Flink Kubernetes Operator 1.7.0 - 2023-11-22 (Source, Helm Chart) Apache Flink Kubernetes Operator 1.6.1 - 2023-10-27 (Source, Helm Chart) Apache Flink Kubernetes Operator 1.6.0 - 2023-08-15 (Source, Helm Chart) Apache Flink Kubernetes Operator 1.5.0 - 2023-05-17 (Source, Helm Chart) Apache Flink Kubernetes Operator 1.4.0 - 2023-02-22 (Source, Helm Chart) Apache Flink Kubernetes Operator 1.3.1 - 2023-01-10 (Source, Helm Chart) Apache Flink Kubernetes Operator 1.3.0 - 2022-12-14 (Source, Helm Chart) Apache Flink Kubernetes Operator 1.2.0 - 2022-10-05 (Source, Helm Chart) Apache Flink Kubernetes Operator 1.1.0 - 2022-07-25 (Source, Helm Chart) Apache Flink Kubernetes Operator 1.0.1 - 2022-06-27 (Source, Helm Chart) Apache Flink Kubernetes Operator 1.0.0 - 2022-06-04 (Source, Helm Chart) Apache Flink Kubernetes Operator 0.1.0 - 2022-04-02 (Source, Helm Chart) Apache Flink Table Store # Apache Flink Table Store 0.3.0 - 2023-01-13 (Source, Binaries) Apache Flink Table Store 0.2.0 - 2022-08-29 (Source, Binaries) Apache Flink Table Store 0.1.0 - 2022-05-11 (Source, Binaries) `}),e.add({id:14,href:"/what-is-flink/powered-by/",title:"Powered By",section:"About",content:` Powered By Flink # Apache Flink powers business-critical applications in many companies and enterprises around the globe. On this page, we present a few notable Flink users that run interesting use cases in production and link to resources that discuss their applications in more detail.
More Flink users are listed in the Powered by Flink directory in the project wiki. Please note that the list is not comprehensive. We only add users that explicitly ask to be listed.
If you would like to be included on this page, please reach out to the Flink user mailing list and let us know.
Alibaba, the world’s largest retailer, uses a fork of Flink called Blink to optimize search rankings in real time. Read more about Flink’s role at Alibaba
Amazon Managed Service for Apache Flink is a fully managed Amazon service that enables you to use an Apache Flink application to process and analyze streaming data.
BetterCloud, a multi-SaaS management platform, uses Flink to surface near real-time intelligence from SaaS application activity. See BetterCloud at Flink Forward SF 2017
Bouygues Telecom is running 30 production applications powered by Flink and is processing 10 billion raw events per day. See Bouygues Telecom at Flink Forward 2016
Capital One, a Fortune 500 financial services company, uses Flink for real-time activity monitoring and alerting. Learn about Capital One’s fraud detection use case
Comcast, a global media and technology company, uses Flink for operationalizing machine learning models and near-real-time event stream processing. Learn about Flink at Comcast
Criteo is the advertising platform for the open internet and uses Flink for real-time revenue monitoring and near-real-time event processing. Learn about Criteo’s Flink use case
Didi Chuxing (“DiDi”), the world’s leading mobile transportation platform, uses Apache Flink for real-time monitoring, feature extraction, and ETL. Learn about Didi’s Flink use case
Drivetribe, a digital community founded by the former hosts of “Top Gear”, uses Flink for metrics and content recommendations. Read about Flink in the Drivetribe stack
eBay’s monitoring platform is powered by Flink and evaluates thousands of customizable alert rules on metrics and log streams. Learn more about Flink at eBay
Ericsson used Flink to build a real-time anomaly detector with machine learning over large infrastructures. Read a detailed overview on O’Reilly Ideas
Gojek is a Super App: one app with over 20 services. It uses Flink to power its self-serve platform, empowering data-driven decisions across functions. Read more on the Gojek engineering blog
Huawei is a leading global provider of ICT infrastructure and smart devices. Huawei Cloud provides Cloud Service based on Flink. Learn about how Flink powers Cloud Service
King, the creators of Candy Crush Saga, use Flink to provide data science teams with a real-time analytics dashboard. Learn more about King’s Flink implementation
Klaviyo leverages Apache Flink to scale its real-time analytics system that deduplicates and aggregates over a million events per second. Read about real-time analytics at Klaviyo
Kuaishou, one of the leading short video sharing apps in China, uses Apache Flink to build a real-time monitoring platform for short videos and live streaming. Read about real-time monitoring at Kuaishou
Lyft uses Flink as the processing engine for its streaming platform, for example to consistently generate features for machine learning. Read more about Streaming at Lyft
MediaMath, a programmatic marketing company, uses Flink to power its real-time reporting infrastructure. See MediaMath at Flink Forward SF 2017
Mux, an analytics company for streaming video providers, uses Flink for real-time anomaly detection and alerting. Read more about how Mux is using Flink
OPPO, one of the largest mobile phone manufacturers in China, built a real-time data warehouse with Flink to analyze the effects of operating activities and short-term interests of users. Read more about how OPPO is using Flink
Otto Group, the world’s second-largest online retailer, uses Flink for business intelligence stream processing. See Otto at Flink Forward 2016
OVH leverages Flink to develop streaming-oriented applications such as real-time Business Intelligence or alerting systems. Read more about how OVH is using Flink
Pinterest runs thousands of experiments every day on a platform for real-time experiment analytics that is based on Apache Flink. Read more about real-time experiment analytics at Pinterest
Razorpay, one of India’s largest payment gateways, built its in-house platform Mitra with Apache Flink to scale AI feature generation and model serving in real time. Read more about data science with Flink at Razorpay
ResearchGate, a social network for scientists, uses Flink for network analysis and near-duplicate detection. See ResearchGate at Flink Forward 2016
SK telecom is South Korea’s largest wireless carrier and uses Flink for several applications, including smart factory and mobility applications. Learn more about one of SK telecom’s use cases
Telefónica NEXT’s TÜV-certified Data Anonymization Platform is powered by Flink. Read more about Telefónica NEXT
Tencent, one of the largest Internet companies, built an in-house platform with Apache Flink to improve the efficiency of developing and operating real-time applications. Read more about Tencent’s platform.
Uber built its internal SQL-based, open-source streaming analytics platform AthenaX on Apache Flink. Read more on the Uber engineering blog
Vip, one of the largest warehouse sale websites for big brands in China, uses Flink to stream and ETL data into Apache Hive in real time for data processing and analytics. Read more about Vip’s story.
Xiaomi, one of the largest electronics companies in China, built a platform with Flink to improve the efficiency of developing and operating real-time applications and uses it for real-time recommendations. Learn more about how Xiaomi is using Flink.
Yelp utilizes Flink to power its data connectors ecosystem and stream processing infrastructure. Find out more by watching a Flink Forward talk
Zalando, one of the largest e-commerce companies in Europe, uses Flink for real-time process monitoring and ETL. Read more on the Zalando Tech Blog `}),e.add({id:15,href:"/getting-started/with-flink-stateful-functions/",title:"With Flink Stateful Functions",section:"Getting Started",content:" Getting Started with Flink Stateful Functions # Read how you can get started with Flink Stateful Functions here. "}),e.add({id:16,href:"/documentation/flink-cdc-master/",title:"CDC Master (snapshot)",section:"Documentation",content:" Flink CDC documentation (latest snapshot) # You can find the Flink CDC documentation for the latest snapshot here. "}),e.add({id:17,href:"/what-is-flink/roadmap/",title:"Roadmap",section:"About",content:` Roadmap # Preamble: This roadmap is meant to provide users and contributors with a high-level summary of ongoing efforts, grouped by the major threads to which the efforts belong. With so much that is happening in Flink, we hope that this helps with understanding the direction of the project. The roadmap contains both efforts in early stages as well as nearly completed efforts, so that users may get a better impression of the overall status and direction of those developments.
More details and various smaller changes can be found in the FLIPs.
The roadmap is continuously updated. New features and efforts should be added to the roadmap once there is consensus that they will happen and what they will roughly look like for the user.
Last Update: 2023-09-01
Feature Radar # The feature radar is meant to give users guidance regarding feature maturity, as well as which features are approaching end-of-life. For questions, please contact the developer mailing list: dev@flink.apache.org
Feature Stages #
MVP: Have a look, and consider whether this can help you in the future.
Beta: You can benefit from this, but you should carefully evaluate the feature.
Ready and Evolving: Ready to use in production, but be aware that you may need to make some adjustments to your application and setup in the future, when you upgrade Flink.
Stable: Unrestricted use in production.
Approaching End-of-Life: Stable, still feel free to use, but think about alternatives. Not a good match for new long-lived projects.
Deprecated: Start looking for alternatives now.
Scenarios We Focus On # Batch / Streaming Unification and Mixing # Flink is a streaming data system at its core that executes “batch as a special case of streaming”. Efficient execution of batch jobs is powerful in its own right; but even more so, batch processing capabilities (efficient processing of bounded streams) open the way for a seamless unification of batch and streaming applications. Unified streaming/batch up-levels the streaming data paradigm: It gives users consistent semantics across their real-time and lag-time applications. Furthermore, streaming applications often need to be complemented by batch (bounded stream) processing, for example when reprocessing data after bugs or data quality issues, or when bootstrapping new applications. A unified API and system make this much easier.
Both the DataStream API and SQL provide a unified API for executing the same application in either batch or streaming mode. There have been several efforts to make this unification more seamless, such as the unified Source API (FLIP-27) and the SinkV2 API (FLIP-191). Beyond unification, we want to go one step further. Our goal is to allow mixing and switching between batch/streaming execution in the future to make it a seamless experience. We already support checkpointing when some tasks are finished, and bounded stream programs shut down with a final checkpoint (FLIP-147). There are initial discussions and designs about jobs with mixed batch/streaming execution, so stay tuned for more news in that area.
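To make the existing unified execution modes concrete, here is a minimal sketch (not taken from any FLIP; the class name and sample data are illustrative assumptions) of how the same DataStream program can already be switched between batch and streaming execution:
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
public class UnifiedModeSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // BATCH runs the bounded input as a batch job; STREAMING (or AUTOMATIC)
        // would run the very same pipeline as a streaming job.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        env.fromElements(1, 2, 3, 4)
           .map(x -> x * 2)
           .returns(Types.INT) // explicit type info, since lambdas lose generic types
           .print();
        env.execute("unified-mode-sketch");
    }
}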
Dynamic checkpoint intervals for processing a bounded stream of historical data followed by an unbounded stream of incremental data (FLIP-309).
An event notification mechanism for the boundary between the bounded and unbounded parts of a stream. This can unlock many exciting features and improvements, such as FLINK-19830.
Bootstrapping state using a batch job (bounded stream program) with a final checkpoint, and continuing processing with a streaming job (unbounded stream program) from that checkpoint and state.
Unified SQL Platform # The community has been building Flink into a powerful basis for a unified (batch and streaming) SQL analytics platform and is continuing to do so.
SQL has very strong cross-batch-streaming semantics, allowing users to use the same queries for ad-hoc analytics and as continuous queries. Flink already contains an efficient unified query engine and a wide set of integrations. With user feedback, those are continuously improved.
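As a small, hedged illustration of this cross-batch-streaming property (the table name, schema, and datagen connector below are assumptions chosen to keep the example self-contained, not something prescribed by the roadmap), the very same query can be submitted through a streaming or a batch TableEnvironment:
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
public class UnifiedSqlSketch {
    public static void main(String[] args) {
        // Swap inStreamingMode() for inBatchMode() and the same SQL runs as a
        // one-off bounded job instead of a continuous query.
        EnvironmentSettings settings = EnvironmentSettings.newInstance().inStreamingMode().build();
        TableEnvironment tableEnv = TableEnvironment.create(settings);
        // Hypothetical bounded source table; 'datagen' keeps the sketch runnable without external systems.
        tableEnv.executeSql(
            "CREATE TEMPORARY TABLE orders (product STRING, amount INT) "
                + "WITH ('connector' = 'datagen', 'number-of-rows' = '100')");
        // The same aggregation works as an ad-hoc (batch) query or as a continuous (streaming) query.
        tableEnv.executeSql(
            "SELECT product, SUM(amount) AS total_amount FROM orders GROUP BY product").print();
    }
}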
Going Beyond a SQL Stream/Batch Processing Engine # The experience of updating Flink SQL-based jobs has been rather cumbersome, as changes could lead to new job graphs that make restoring from savepoints/checkpoints impossible. FLIP-190, which has already been shipped as an MVP, targets this. To extend the capabilities of a stream-batch processing engine and make Flink ready for the unified SQL platform, there is an ongoing effort to let Flink better manage data and metadata, including DELETE/UPDATE, Call Procedures, rich DDLs, Time Travel, and so on. This is especially useful for building a lakehouse with Flink and Paimon/Iceberg/Hudi. There are also initial discussions about supporting a JSON data type in Flink SQL. This would enable Flink SQL to better analyze semi-structured data and better adapt to NoSQL databases.
Platform Infrastructure # After FLIP-163, the community is working again on a set of SQL Client usability improvements (FLIP-189, FLIP-222) aimed at improving the user experience of the SQL Client. To simplify building production SQL platforms with Flink, we are improving the SQL Gateway component as the service layer of the Flink SQL platform. There are many exciting ongoing features around it, including support for application mode (FLIP-316), a JDBC driver client (FLIP-293), persisted catalog registration (FLIP-295), authentication, and high availability.
Support for Common Languages # Hive syntax compatibility helps users migrate existing Hive SQL tasks to Flink seamlessly and makes it convenient for users who are familiar with Hive syntax to write SQL queries against tables registered in Flink. So far, Hive syntax compatibility has reached 94.1%, measured using the Hive qtest suite. The Flink community is continuously improving the compatibility and execution performance (FLINK-29717). With FLIP-216 there is now an initiative to introduce pluggable SQL dialects, using the Hive syntax as the first example. This makes it easier for Flink to support other SQL dialects in the future, for example Spark SQL and PostgreSQL.
Towards Streaming Warehouses # Flink has become the leading technology and de facto standard for stream processing. The concept of unifying streaming and batch data processing is gaining recognition and is being successfully implemented in more and more companies. To further unify streaming-batch analytics, Flink has proposed the concept of the Streaming Warehouse. This new concept aims to unify not only computation but also storage, ensuring that data flows and is processed in real time. As a result, the data in the warehouse is always up-to-date, and any analytics or insights generated from it reflect the current state of the business. This combines the advantages of traditional data warehouses with real-time insights.
The Apache Flink community initiated the Flink Table Store subproject (FLIP-188) with the vision of streaming-batch unified storage. With the project growing rapidly, Flink Table Store joined the Apache incubator as an independent project called Apache Paimon. Apache Paimon has its own roadmap under the documentation. The unified storage opens the way for Flink to improve the performance and experience of streaming-batch unified applications.
OLAP is an important scenario that follows Flink streaming-batch data processing: users need an OLAP engine to analyze data in the streaming warehouse. Flink can execute “OLAP as a special case of batch”, and the community is exploring possible improvements for short-lived jobs without affecting streaming and batch processing. This is a nice-to-have capability that would bring great value to users by turning Flink into a unified streaming-batch-OLAP data processing system.
In order to build an efficient streaming warehouse, there are a lot of things that need to be improved in Flink, for example:
Support rich warehouse APIs to manage data and metadata, such as CTAS/RTAS (FLIP-303), CALL (FLIP-311), TRUNCATE (FLIP-302), and so on.
CBO (cost-based optimization) with statistics in streaming lakehouses for streaming queries.
Make full use of the layout and indexes of the streaming lakehouse to reduce data reading and processing for streaming queries.
Improvements for short-lived jobs to support OLAP queries with low latency and concurrent execution.
Engine Evolution # Disaggregated State Management # One major advantage of Flink is its efficient and easy-to-use state management mechanism. However, this mechanism has evolved little since it was created and is not well suited to the cloud-native era. In the past several releases, we have made significant efforts to improve state snapshotting (FLIP-76 unaligned checkpoints, FLIP-158 generic incremental checkpoints) and state repartitioning. In doing so, we gradually found that many problems (for example, slow state snapshotting and state recovery) are root-caused by computation and state management being bound together, especially for large jobs with large state. Hence, starting from Flink 2.0, we aim to disaggregate Flink computation and state management, which we believe is more suitable for a modern cloud-native architecture.
In the new design, the DFS serves as the primary storage. Checkpoints are shareable between operators, so we do not need to compute and store multiple copies of the same state table. Queryable state APIs can be provided based on these checkpoints. Compaction and clean-up of state files are no longer bound to the same TaskManager, so we can do better load balancing and avoid bursts of CPU and network usage.
Evolution of Flink APIs # With Flink 2.0 approaching, the community is planning to evolve the APIs of Apache Flink.
We are planning to remove some long-deprecated APIs in Flink 2.0 to make Flink move faster, including:
the DataSet API, all Scala APIs, the legacy SinkV1 API, and the legacy TableSource/TableSink API;
deprecated methods/fields/classes in the DataStream API, Table API, and REST API;
deprecated configuration options and metrics.
We are also planning to retire the legacy SourceFunction/SinkFunction APIs and the Queryable State API in the long term. This may not happen soon, as the prerequisites for users to migrate away from these APIs are not fully met at the moment. We are aware of some problems with the current DataStream API, such as its exposure of, and dependencies on, Flink internal implementations, which require significant changes to fix. To provide a smooth migration experience, the community is designing a new ProcessFunction API, which aims to gradually replace the DataStream API in the long term.
Flink as an Application # The goal of these efforts is to make it feel natural to deploy (long-running streaming) Flink applications. Instead of starting a cluster and submitting a job to that cluster, these efforts support deploying a streaming job as a self-contained application.
For example, as a simple Kubernetes deployment, deployed and scaled like a regular application without extra workflows.
There is currently a Flink Kubernetes Operator subproject being developed by the community, which has its own roadmap in its documentation. Streaming query as an application: make the SQL Client/Gateway support submitting SQL jobs in application mode (FLIP-316). Performance # Continuous work is going into improving the performance of both Flink streaming and batch processing.
Large-Scale Streaming Jobs # Streaming joins are a headache for Flink users because of their large-scale state. The community is putting a lot of effort into further improving the performance of streaming joins, for example with mini-batch joins, multi-way joins, and reduced duplicated state. The community is also continuously improving and working on other kinds of joins, such as the unordered async lookup join and the processing-time temporal join (FLIP-326). These can be very efficient alternatives to regular streaming joins. Change data capture and processing with Flink SQL is widely used, and the community is improving cost and performance in this case, e.g. reducing the normalize and materialize state.
Faster Batch Queries # The community's goal is to make Flink's performance on bounded streams (batch use cases) competitive with that of dedicated batch processors. While Flink has been shown to handle some batch processing use cases faster than widely used batch processors, there are ongoing efforts to make sure this is the case for broader use cases: The community has introduced Dynamic Partition Pruning (DPP), which aims to minimize the I/O cost of data read from the data sources. There are ongoing efforts to further reduce I/O and shuffle costs, such as Runtime Filter (FLIP-324). Operator Fusion CodeGen (FLIP-315) improves the execution performance of a query by fusing an operator DAG into a single optimized operator that eliminates virtual function calls and leverages CPU registers for intermediate data. The community has added support for adaptive batch execution and scheduling (FLIP-187). We are trying to support broader adaptive cases, such as Adaptive Query Execution, which makes use of runtime statistics to choose the most efficient query execution plan. The community has started improving scheduler and execution performance (FLINK-25318) for short-lived jobs to support OLAP. Flink executes “OLAP as a special case of batch”; we are trying to extend Flink to execute low-latency and concurrent queries in a session cluster, so that users can perform streaming, batch, and OLAP data processing on the unified Flink engine.
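For readers who want to experiment with the mini-batch direction mentioned above, here is a small, hedged sketch; the option keys shown are the currently documented mini-batch settings, the values are illustrative assumptions, and the roadmap items above go beyond what these options cover today:
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
public class MiniBatchSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv =
            TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());
        // Enable mini-batch processing so that stateful operators such as group
        // aggregations buffer records briefly and access state in batches rather than per record.
        tableEnv.getConfig().getConfiguration().setString("table.exec.mini-batch.enabled", "true");
        tableEnv.getConfig().getConfiguration().setString("table.exec.mini-batch.allow-latency", "2 s");
        tableEnv.getConfig().getConfiguration().setString("table.exec.mini-batch.size", "5000");
        // Queries registered on this environment now run with mini-batch optimizations.
    }
}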
Stability # The community keeps improving the stability of jobs by better tolerating failures and speeding up the recovery process.
Instability of the environment is unavoidable. It can lead to crashes of JobManager and TaskManager nodes, or to slow data processing. The community has introduced speculative execution (FLIP-168, FLIP-245, FLIP-281) for batch jobs to reduce the impact of problematic machines that slow down data processing.
A JobManager node crash is usually unacceptable for a batch job because the job has to be re-run from the very beginning. Therefore, the community is planning to improve the JobManager recovery process to avoid re-running finished stages. Another planned improvement is to retain running tasks when the JobManager node goes down unexpectedly, to further reduce the impact of a JobManager crash. This can also benefit streaming jobs, even if they have periodic checkpointing, by avoiding interruption or regression of data processing in this case.
Usability # Now and then we hear people say that, while Flink is powerful in functionality, it is not that easy to master. We hear these concerns, and the community is working on several efforts to improve the usability of Flink.
We are working on reducing the number of configuration options that users need to specify, as well as making them easier to understand and tune. This includes:
Removing options that require in-depth knowledge of Flink internals to understand and use.
Making Flink automatically and dynamically decide the proper behavior where possible.
Improving the default values of the options so that users do not need to touch them in most cases.
Improving the definition and description of the options so that they are easier to understand and work with when necessary.
We have already made some progress in this direction. Flink 1.17 requires fewer than 10 configuration options to achieve sufficiently good performance on TPC-DS. Hybrid shuffle supports dynamically switching between different shuffle modes and decouples its memory footprint from the parallelism of the job.
Developer Experience # Ecosystem # There is almost no use case in which Apache Flink is used on its own. It has established itself as part of many data-related reference architectures. In fact, you’ll find the squirrel logo in several of these architectures.
All connectors will be hosted in an external repository going forward, and many of them have already been successfully externalized. See the mailing list thread. Catalog as a first-class citizen: the Flink catalog lets users issue batch and streaming queries against external systems without registering DDLs/schemas manually. Supporting catalogs is recommended as the highest priority for connectors. The community is working on supporting more catalogs for connectors (e.g. GlueCatalog, SchemaRegistryCatalog). There is also ongoing work on introducing new connectors (e.g. Pinot, Redshift, ClickHouse).
Documentation # There are various dedicated efforts to simplify the maintenance and structure (more intuitive navigation/reading) of the documentation.
Docs Tech Stack: FLIP-157 General Docs Structure: FLIP-42 SQL Docs: FLIP-60 `}),e.add({id:18,href:"/getting-started/training-course/",title:"Training Course",section:"Getting Started",content:" Training Course # Read all about the Flink Training Course here. "}),e.add({id:19,href:"/what-is-flink/community/",title:"Community & Project Info",section:"About",content:` Community & Project Info # How do I get help from Apache Flink? # There are many ways to get help from the Apache Flink community. The mailing lists are the primary place where all Flink committers are present. For user support and questions use the user mailing list. You can also join the community on Slack. Some committers are also monitoring Stack Overflow. Please remember to tag your questions with the apache-flink tag. Bugs and feature requests can either be discussed on the dev mailing list or on Jira. Those interested in contributing to Flink should check out the contribution guide.
Mailing Lists # Name Subscribe Digest Unsubscribe Post Archive news@flink.apache.org
News and announcements from the Flink community Subscribe Subscribe Unsubscribe Read only list Archives community@flink.apache.org
Broader community discussions related to meetups, conferences, blog posts and job offers Subscribe Subscribe Unsubscribe Post Archives user@flink.apache.org
User support and questions mailing list Subscribe Subscribe Unsubscribe Post Archives user-zh@flink.apache.org
User support and questions mailing list Subscribe Subscribe Unsubscribe Post Archives dev@flink.apache.org
Development related discussions Subscribe Subscribe Unsubscribe Post Archives builds@flink.apache.org
Build notifications of Flink main repository Subscribe Subscribe Unsubscribe Read only list Archives issues@flink.apache.org
Mirror of all Jira activity Subscribe Subscribe Unsubscribe Read only list Archives commits@flink.apache.org
All commits to our repositories Subscribe Subscribe Unsubscribe Read only list Archives Please make sure you are subscribed to the mailing list you are posting to! If you are not subscribed to the mailing list, your message will either be rejected (dev@ list) or you won't receive the response (user@ list). How to subscribe to a mailing list # Before you can post a message to a mailing list, you need to subscribe to the list first.
Send an email without any contents or subject to *listname*-subscribe@flink.apache.org (replace listname with dev, user, user-zh, ...).
Wait till you receive an email with the subject “confirm subscribe to *listname*@flink.apache.org”.
Reply to that email, without editing the subject or including any contents.
Wait till you receive an email with the subject “WELCOME to *listname*@flink.apache.org”.
If you send us an email with a code snippet, make sure that:
you do not link to files in external services, as such files can change, get deleted, or the link might break, making an archived email thread useless;
you paste text instead of screenshots of text;
you keep formatting when pasting code in order to keep the code readable;
there are enough import statements to avoid ambiguities.
Slack # You can join the Apache Flink community on Slack. After creating an account in Slack, don’t forget to introduce yourself in #introductions. Due to Slack limitations, the invite link expires after 100 invites. If it is expired, please reach out to the Dev mailing list. Any existing Slack member can also invite anyone else to join.
There are a couple of community rules:
Be respectful - This is the most important rule!
All important decisions and conclusions must be reflected back to the mailing lists. “If it didn’t happen on a mailing list, it didn’t happen.” - The Apache Mottos
Use Slack threads to keep parallel conversations from overwhelming a channel.
Use either #pyflink (for all Python Flink questions) or #troubleshooting (for all other Flink questions).
Please do not direct message people for troubleshooting, Jira assigning, or PR review. Doing this can result in removal from Slack.
Note: All messages from public channels in our Slack are permanently stored and published in the Apache Flink Slack archive on linen.dev. The purpose of this archive is to allow search engines to find past discussions in the Flink Slack.
Stack Overflow # Committers are watching Stack Overflow for the apache-flink tag.
Make sure to tag your questions there accordingly to get answers from the Flink community.
Issue Tracker # We use Jira to track all code related issues: https://issues.apache.org/jira/browse/FLINK. You must have a JIRA account in order to log cases and issues.
If you don’t have an ASF JIRA account, you can request one at the ASF Self-serve portal.
All issue activity is also mirrored to the issues mailing list.
Reporting Security Issues # If you wish to report a security vulnerability, please contact security@apache.org. Apache Flink follows the typical Apache vulnerability handling process for reporting vulnerabilities. Note that vulnerabilities should not be publicly disclosed until the project has responded.
Meetups # There are plenty of meetups on meetup.com featuring Flink.
Source Code # Main Repositories # Flink Core Repository
ASF repository: https://gitbox.apache.org/repos/asf/flink.git GitHub mirror: https://github.com/apache/flink.git Flink Docker Repository
ASF repository: https://gitbox.apache.org/repos/asf/flink-docker.git GitHub mirror: https://github.com/apache/flink-docker.git Flink Kubernetes Operator Repository
ASF repository: https://gitbox.apache.org/repos/asf/flink-kubernetes-operator.git GitHub mirror: https://github.com/apache/flink-kubernetes-operator Flink CDC Repository
ASF repository: https://gitbox.apache.org/repos/asf/flink-cdc.git GitHub mirror: https://github.com/apache/flink-cdc Flink ML Repository
ASF repository: https://gitbox.apache.org/repos/asf/flink-ml.git GitHub mirror: https://github.com/apache/flink-ml Flink Stateful Functions Repository
ASF repository: https://gitbox.apache.org/repos/asf/flink-statefun.git GitHub mirror: https://github.com/apache/flink-statefun Flink Stateful Functions Docker Repository
ASF repository: https://gitbox.apache.org/repos/asf/flink-statefun-docker.git GitHub mirror: https://github.com/apache/flink-statefun-docker Flink Website Repository
ASF repository: https://gitbox.apache.org/repos/asf/flink-web.git GitHub mirror: https://github.com/apache/flink-web.git Complete List of Repositories # The complete list of repositories of Apache Flink can be found under https://gitbox.apache.org/repos/asf#flink.
Training # Ververica currently maintains free Apache Flink training. Their training website has slides and exercises with solutions. The slides are also available on SlideShare.
Project Wiki # The Apache Flink project wiki contains a range of relevant resources for Flink users. However, some content on the wiki might be out-of-date. When in doubt, please refer to the Flink documentation.
Flink Forward # Flink Forward is a conference happening yearly in different locations around the world. Up to date information about the conference is available on Flink-Forward.org.
People # Please find the most up-to-date list here.
Materials / Apache Flink Logos # The materials page offers assets such as the Apache Flink logo in different image formats, or the Flink color scheme.
`}),e.add({id:20,href:"/documentation/flinkml-stable/",title:"ML $FlinkMLStableShortVersion (stable)",section:"Documentation",content:" Flink ML documentation (latest stable release) # You can find the Flink ML documentation for the latest stable release here. "}),e.add({id:21,href:"/documentation/flinkml-master/",title:"ML Master (snapshot)",section:"Documentation",content:" Flink ML documentation (latest snapshot) # You can find the Flink ML documentation for the latest snapshot here. "}),e.add({id:22,href:"/what-is-flink/security/",title:"Security",section:"About",content:` Security # Security Updates # This section lists fixed vulnerabilities in Flink.
CVE ID / Affected Flink versions / Notes:
CVE-2020-1960: affects 1.1.0 to 1.1.5, 1.2.0 to 1.2.1, 1.3.0 to 1.3.3, 1.4.0 to 1.4.2, 1.5.0 to 1.5.6, 1.6.0 to 1.6.4, 1.7.0 to 1.7.2, 1.8.0 to 1.8.3, 1.9.0 to 1.9.2, 1.10.0. Users are advised to upgrade to Flink 1.9.3 or 1.10.1 or later versions, or remove the port parameter from the reporter configuration (see advisory for details).
CVE-2020-17518: affects 1.5.1 to 1.11.2. Fixed in commit a5264a6f41524afe8ceadf1d8ddc8c80f323ebc4. Users are advised to upgrade to Flink 1.11.3 or 1.12.0 or later versions.
CVE-2020-17519: affects 1.11.0, 1.11.1, 1.11.2. Fixed in commit b561010b0ee741543c3953306037f00d7a9f0801. Users are advised to upgrade to Flink 1.11.3 or 1.12.0 or later versions.
CVE-2023-41834: affects Flink Stateful Functions 3.1.0, 3.1.1, 3.2.0. Fixed in commit b06c0a23a5a622d48efc8395699b2e4502bd92be. Users are advised to upgrade to Flink Stateful Functions 3.3.0 or later versions.
Frequently Asked Questions # During a security analysis of Flink, I noticed that Flink allows for remote code execution, is this an issue? # Apache Flink is a framework for executing user-supplied code in clusters. Users can submit code to Flink processes, which will be executed unconditionally, without any attempts to limit what code can run. Starting other processes, establishing network connections or accessing and modifying local files is possible.
Historically, we’ve received numerous remote code execution vulnerability reports, which we had to reject, as this is by design.
We strongly discourage users from exposing Flink processes to the public internet. Within company networks or “cloud” accounts, we recommend restricting access to a Flink cluster via appropriate means.
I found a vulnerability in Flink, how do I report it? # Thanks a lot for looking into the security of Apache Flink! We appreciate reports improving the security of Flink. We accept vulnerability reports through the Apache Security Team, via their private email address security@apache.org.
If you want to discuss a potential security issue privately with the Flink PMC, you can reach us also via private@flink.apache.org.
`}),e.add({id:23,href:"/what-is-flink/special-thanks/",title:"Special Thanks",section:"About",content:` Special Thanks # General Apache sponsors # Without those sponsors, the ASF would simply not exist or sustain its activities:
https://www.apache.org/foundation/thanks.html
For those who want to know more about the Apache Sponsorship Program, please check:
https://www.apache.org/foundation/sponsorship.html
Thanks!
Organizations who helped our project … # We would also like to thank the companies and organizations who sponsored machines or services for helping the development of Apache Flink:
Alibaba donated 8 machines (32 vCPU, 64 GB) to run Flink CI builds for the Flink repository and Flink pull requests. AWS donated service costs for flink-connector-aws tests that hit real AWS services. Ververica donated a machine (1 vCPU, 2 GB) for hosting flink-ci repositories and a machine (8 vCPU, 64 GB) for running Flink benchmarks. `}),e.add({id:24,href:"/documentation/flink-stateful-functions-stable/",title:"Stateful Functions $StateFunStableShortVersion (stable)",section:"Documentation",content:" Flink Stateful Functions documentation (latest stable release) # You can find the Flink Stateful Functions documentation for the latest stable release here. "}),e.add({id:25,href:"/getting-started/",title:"Getting Started",section:"Apache Flink® — Stateful Computations over Data Streams",content:" Documentation # "}),e.add({id:26,href:"/documentation/flink-stateful-functions-master/",title:"Stateful Functions Master (snapshot)",section:"Documentation",content:" Flink Stateful Functions documentation (latest snapshot) # You can find the Flink Stateful Functions documentation for the latest snapshot here. "}),e.add({id:27,href:"/documentation/",title:"Documentation",section:"Apache Flink® — Stateful Computations over Data Streams",content:" Documentation # "}),e.add({id:28,href:"/how-to-contribute/overview/",title:"Overview",section:"How to Contribute",content:` How To Contribute # Apache Flink is developed by an open and friendly community. Everybody is cordially welcome to join the community and contribute to Apache Flink. There are several ways to interact with the community and to contribute to Flink, including asking questions, filing bug reports, proposing new features, joining discussions on the mailing lists, contributing code or documentation, improving the website, or testing release candidates.
What do you want to do? # Contributing to Apache Flink goes beyond writing code for the project. Below, we list different opportunities to help the project:
Area Further information Report a Bug To report a problem with Flink, open Flink’s Jira, log in if necessary, and click on the red Create button at the top. Please give detailed information about the problem you encountered and, if possible, add a description that helps to reproduce the problem. Contribute Code Read the Code Contribution Guide Help With Code Reviews Read the Code Review Guide Help Preparing a Release Releasing a new version consists of the following steps: Building a new release candidate and starting a vote (usually for 72 hours) on the dev@flink.apache.org list Testing the release candidate and voting (+1 if no issues were found, -1 if the release candidate has issues). Going back to step 1 if the release candidate had issues. Otherwise we publish the release. Read the test procedure for a release. Contribute Documentation Read the Documentation Contribution Guide Support Flink Users Reply to questions on the user mailing list Reply to Flink related questions on Stack Overflow with the apache-flink, flink-streaming or flink-sql tag Check the latest issues in Jira for tickets which are actually user questions Improve The Website Read the Website Contribution Guide Spread the Word About Flink Organize or attend a Flink Meetup Contribute to the Flink blog Share your conference, meetup or blog post on the community@flink.apache.org mailing list, or tweet about it, tagging the @ApacheFlink handle. Any other question? Reach out to the dev@flink.apache.org mailing list to get help! Further reading # How to become a committer # Committers are community members that have write access to the project’s repositories, i.e., they can modify the code, documentation, and website by themselves and also accept other contributions.
There is no strict protocol for becoming a committer or PMC member. Candidates for new committers are typically people that are active contributors and community members.
Candidates for new committers are suggested by current committers or PMC members, and voted upon by the PMC.
If you would like to become a committer, you should engage with the community and start contributing to Apache Flink in any of the above ways. You might also want to talk to other committers and ask for their advice and guidance.
What are we looking for in Committers # Being a committer means being recognized as a significant contributor to the project (community or technology), and having the tools to help with the development. Committer candidates are community members who have made good contributions over an extended period of time and want to continue their contributions.
Community contributions include helping to answer user questions on the mailing list, verifying release candidates, giving talks, organizing community events, and other forms of evangelism and community building. The “Apache Way” has a strong focus on the project community, and committers can be recognized for outstanding community contributions even without any code contributions.
Code/technology contributions include contributed pull requests (patches), design discussions, reviews, testing, and other help in identifying and fixing bugs. Especially constructive and high quality design discussions, as well as helping other contributors, are strong indicators.
While the prior points give ways to identify promising candidates, the following are “must haves” for any committer candidate:
Being community-minded: The candidate understands the meritocratic principles of community management. They do not always optimize for as much personal contribution as possible, but will help and empower others where it makes sense.
We trust that a committer candidate will use their write access to the repositories responsibly, and if in doubt, conservatively. Flink is a big system, and it is important that committers are aware of what they know and what they don’t know. When in doubt, committers should ask for a second pair of eyes rather than commit to parts that they are not well familiar with. (Even the most seasoned committers follow this practice.)
They have shown themselves to be respectful towards other community members and constructive in discussions.
What are we looking for in PMC members # The PMC is the official controlling body of the project. PMC members “must” be able to perform the official responsibilities of the PMC (verify releases and growth of committers/PMC). We “want” them to be people that have a vision for Flink, technology and community wise.
For the avoidance of doubt, not every PMC member needs to know all details of how exactly Flink’s release process works (it is okay to understand the gist and how to find the details). Likewise, not every PMC member needs to be a visionary. We strive to build a PMC that covers all parts well, understanding that each member brings different strengths.
Ideally, we find candidates among active community members who have shown initiative to shape the direction of Flink (technology and community) and have shown willingness to learn the official processes, such as how to create or verify releases.
A PMC member is also a committer. Candidates are already committers or automatically become committers when joining the PMC. Hence, the “What are we looking for in committers?” section also applies to PMC candidates.
A PMC member has a lot of power in a project. A single PMC member can block many decisions and generally stall and harm the project in many ways. We hence must trust the PMC candidates to be level-headed, constructive, supportive, and willing to “disagree and commit” at times.
`}),e.add({id:29,href:"/how-to-contribute/contribute-code/",title:"Contribute Code",section:"How to Contribute",content:` Contributing Code # Apache Flink is maintained, improved, and extended by code contributions of volunteers. We welcome contributions to Flink, but due to the size of the project and to preserve the high quality of the code base, we follow a contribution process that is explained in this document.
Please feel free to ask questions at any time. Either send a mail to the Dev mailing list or comment on the Jira issue you are working on.
IMPORTANT: Please read this document carefully before starting to work on a code contribution. Follow the process and guidelines explained below. Contributing to Apache Flink does not start with opening a pull request. We expect contributors to reach out to us first to discuss the overall approach together. Without consensus with the Flink committers, contributions might require substantial rework or will not be reviewed.
Looking for what to contribute # If you have a good idea for a contribution, you can proceed to the code contribution process. If you are looking for something to contribute, you can browse the open, unassigned Jira issues in Flink’s bug tracker and then follow the code contribution process. If you are very new to the Flink project and want to learn about it and its contribution process, you can check the starter issues, which are annotated with a starter label.
Code Contribution Process # Note: The code contribution process changed in June 2019. The community decided to shift the "backpressure" from pull requests to Jira, by requiring contributors to get consensus (indicated by being assigned to the ticket) before opening a pull request. 1. Discuss: Create a Jira ticket or mailing list discussion and reach consensus.
Agree on importance, relevance, scope of the ticket, discuss the implementation approach and find a committer willing to review and merge the change.
Only committers can assign a Jira ticket.
2. Implement: Implement the change according to the Code Style and Quality Guide and the approach agreed upon in the Jira ticket.
Only start working on the implementation if there is consensus on the approach (e.g. you are assigned to the ticket).
3. Review: Open a pull request and work with the reviewer.
Pull requests belonging to unassigned Jira tickets, or not authored by the assignee, will not be reviewed or merged by the community.
4. Merge: A committer of Flink checks if the contribution fulfills the requirements and merges the code into the codebase.
Note: trivial hot fixes such as typos or syntax errors can be opened as a [hotfix] pull request, without a Jira ticket. 1. Create Jira Ticket and Reach Consensus # The first step for making a contribution to Apache Flink is to reach consensus with the Flink community. This means agreeing on the scope and implementation approach of a change.
In most cases, the discussion should happen in Flink’s bug tracker: Jira.
The following types of changes require a [DISCUSS] thread on the Flink Dev mailing list:
big changes (major new features; big refactorings involving multiple components);
potentially controversial changes or issues;
changes with very unclear approaches or multiple equally valid approaches.
Do not open a Jira ticket for these types of changes before the discussion has come to a conclusion. Jira tickets based on a dev@ discussion need to link to that discussion and should summarize the outcome.
Requirements for a Jira ticket to get consensus:
Formal requirements:
The Title describes the problem concisely.
The Description gives all the details needed to understand the problem or feature request.
The Component field is set: many committers and contributors only focus on certain subsystems of Flink. Setting the appropriate component is important for getting their attention.
There is agreement that the ticket solves a valid problem, and that it is a good fit for Flink. The Flink community considers the following aspects:
Does the contribution alter the behavior of features or components in a way that may break previous users’ programs and setups? If yes, there needs to be a discussion and agreement that this change is desirable.
Does the contribution conceptually fit well into Flink? Is it too much of a special case, such that it makes things more complicated for the common case, or bloats the abstractions / APIs?
Does the feature fit well into Flink’s architecture? Will it scale and keep Flink flexible for the future, or will the feature restrict Flink in the future?
Is the feature a significant new addition (rather than an improvement to an existing part)? If yes, will the Flink community commit to maintaining this feature?
Does this feature align well with Flink’s roadmap and currently ongoing efforts?
Does the feature produce added value for Flink users or developers? Or does it introduce the risk of regression without adding relevant user or developer benefit?
Could the contribution live in another repository, e.g., Apache Bahir or another external repository?
Is this a contribution just for the sake of getting a commit in an open source project (fixing typos, style changes merely for taste reasons)?
There is consensus on how to solve the problem. This includes considerations such as:
API and data backwards compatibility and migration strategies;
testing strategies;
impact on Flink’s build time;
dependencies and their licenses.
If a change is identified as a large or controversial change in the discussion on Jira, it might require a Flink Improvement Proposal (FLIP) or a discussion on the Dev mailing list to reach agreement and consensus.
Contributors can expect to get a first reaction from a committer within a few days after opening the ticket. If a ticket doesn’t get any attention, we recommend reaching out to the developer mailing list. Note that the Flink community sometimes does not have the capacity to accept all incoming contributions.
Once all requirements for the ticket are met, a committer will assign somebody to the Assignee field of the ticket to work on it. Only committers have the permission to assign somebody.
Pull requests belonging to unassigned Jira tickets will not be reviewed or merged by the community.
2. Implement your change # Once you’ve been assigned to a Jira issue, you may start to implement the required changes.
Here are some further points to keep in mind while implementing:
Set up a Flink development environment.
Follow the Code Style and Quality Guide of Flink.
Take any discussions and requirements from the Jira issue or design document into account.
Do not mix unrelated issues into one contribution.
3. Open a Pull Request # Considerations before opening a pull request:
Make sure that mvn clean verify passes on your changes, so that the code builds and all tests and checks pass.
Execute the end-to-end tests of Flink.
Make sure no unrelated or unnecessary reformatting changes are included.
Make sure your commit history adheres to the requirements.
Make sure your change has been rebased onto the latest commits in your base branch.
Make sure the pull request refers to the respective Jira issue, and that each Jira issue is assigned to exactly one pull request (in the case of multiple pull requests for one Jira issue, resolve that situation first).
Considerations before or right after opening a pull request:
Make sure that the branch is building successfully on Azure DevOps. Code changes in Flink are reviewed and accepted through GitHub pull requests.
There is a separate guide on how to review a pull request, including our pull request review process. As a code author, you should prepare your pull request to meet all requirements.
4. Merge change # The code will be merged by a committer of Flink once the review is finished. The Jira ticket will be closed afterwards.
`}),e.add({id:30,href:"/how-to-contribute/reviewing-prs/",title:"Review Pull Requests",section:"How to Contribute",content:` How to Review a Pull Request # This guide is for all committers and contributors that want to help with reviewing code contributions. Thank you for your effort - good reviews are one of the most important and crucial parts of an open source project. This guide should help the community to make reviews such that:
Contributors have a good contribution experience. Our reviews are structured and check all important aspects of a contribution. We make sure to keep a high code quality in Flink. We avoid situations where contributors and reviewers spend a lot of time refining a contribution that gets rejected later. Review Checklist # Every review needs to check the following six aspects. We encourage reviewers to check these aspects in order, to avoid spending time on detailed code quality reviews when formal requirements are not met or there is no consensus in the community to accept the change.
1. Is the Contribution Well-Described? # Check whether the contribution is sufficiently well-described to support a good review. Trivial changes and fixes do not need a long description. If the implementation is exactly according to a prior discussion on Jira or the development mailing list, only a short reference to that discussion is needed. If the implementation is different from the agreed approach in the consensus discussion, a detailed description of the implementation is required for any further review of the contribution.
Any pull request that changes functionality or behavior needs to describe the big picture of these changes, so that reviewers know what to look for (and don’t have to dig through the code to hopefully understand what the change does).
A contribution is well-described if the following questions 2, 3, and 4 can be answered without looking at the code.
2. Is There Consensus that the Change or Feature Should Go into Flink? # This question can be directly answered from the linked Jira issue. For pull requests that are created without prior consensus, a discussion in Jira to seek consensus will be needed.
For [hotfix] pull requests, consensus needs to be checked in the pull request.
3. Does the Contribution Need Attention from some Specific Committers and Is There Time Commitment from These Committers? # Some changes require attention and approval from specific committers. For example, changes in parts that are either very performance sensitive, or have a critical impact on distributed coordination and fault tolerance, need input from a committer who is deeply familiar with the component.
As a rule of thumb, special attention is required when the Pull Request description answers one of the questions in the template section “Does this pull request potentially affect one of the following parts” with ‘yes’.
This question can be answered with
Does not need specific attention.
Needs specific attention for X (X can be, for example, checkpointing, jobmanager, etc.).
Has specific attention for X by @committerA, @contributorB.
If the pull request needs specific attention, one of the tagged committers/contributors should give the final approval.
4. Does the Implementation Follow the Agreed Upon Overall Approach/Architecture? # In this step, we check if a contribution follows the agreed upon approach from the previous discussion in Jira or the mailing lists.
This question should be answerable from the Pull Request description (or the linked Jira) as much as possible.
We recommend checking this before diving into the details of commenting on individual parts of the change.
5. Is the Overall Code Quality Good, Meeting the Standard We Want to Maintain in Flink? # This is the detailed code review of the actual changes, covering:
Are the changes doing what is described in the Jira ticket or design document?
Does the code follow the right software engineering practices? Is the code correct, robust, maintainable, testable?
Are the changes performance-aware, when changing a performance-sensitive part?
Are the changes sufficiently covered by tests? Are the tests executing fast, i.e., are heavy-weight integration tests only used when necessary?
Does the code format follow Flink’s checkstyle pattern?
Does the code avoid introducing additional compiler warnings?
If dependencies have been changed, were the NOTICE files updated?
Code guidelines can be found in the Flink Code Style and Quality Guide.
6. Are the English and Chinese documentation updated? # If the pull request introduces a new feature, the feature should be documented. The Flink community maintains both English and Chinese documentation, so both should be updated. If you are not familiar with the Chinese language, please open a Jira issue assigned to the chinese-translation component for the Chinese documentation translation and link it with the current Jira issue. If you are familiar with the Chinese language, you are encouraged to update both sides in one pull request.
See more about how to contribute documentation.
`}),e.add({id:31,href:"/how-to-contribute/code-style-and-quality-preamble/",title:"Code Style and Quality Guide",section:"How to Contribute",content:` Apache Flink Code Style and Quality Guide # Preamble # Pull Requests & Changes # Common Coding Guide # Java Language Guide # Scala Language Guide # Components Guide # Formatting Guide # This is an attempt to capture the code and quality standard that we want to maintain.
A code contribution (or any piece of code) can be evaluated in various ways: One set of properties is whether the code is correct and efficient. This requires solving the logical or algorithmic problem correctly and well.
Another set of properties is whether the code follows an intuitive design and architecture, whether it is well structured with right separation of concerns, and whether the code is easily understandable and makes its assumptions explicit. That set of properties requires solving the software engineering problem well. A good solution implies that the code is easily testable, maintainable also by other people than the original authors (because it is harder to accidentally break), and efficient to evolve.
While the first set of properties has rather objective approval criteria, the second set of properties is much harder to assess, but is of high importance for an open source project like Apache Flink. To make the code base inviting to many contributors, to make contributions easy to understand for developers who did not write the original code, and to make the code robust in the face of many contributions, well engineered code is crucial. [1] Well engineered code is also easier to keep correct and fast over time.
This is of course not a full guide on how to write well engineered code. There is a world of big books that try to capture that. This guide is meant as a checklist of best practices, patterns, anti-patterns, and common mistakes that we observed in the context of developing Flink.
A big part of high-quality open source contributions is about helping the reviewer to understand the contribution and double-check the implications, so an important part of this guide is about how to structure a pull request for review.
[1] In earlier days, we (the Flink community) did not always pay sufficient attention to this, making some components of Flink harder to evolve and to contribute to.
`}),e.add({id:32,href:"/how-to-contribute/contribute-documentation/",title:"Contribute Documentation",section:"How to Contribute",content:` Contribute Documentation # Good documentation is crucial for any kind of software. This is especially true for sophisticated software systems such as distributed data processing engines like Apache Flink. The Apache Flink community aims to provide concise, precise, and complete documentation and welcomes any contribution to improve Apache Flink’s documentation.
Obtain the documentation sources # Apache Flink’s documentation is maintained in the same git repository as the code base. This is done to ensure that code and documentation can be easily kept in sync.
The easiest way to contribute documentation is to fork Flink’s mirrored repository on GitHub into your own GitHub account by clicking on the fork button at the top right. If you have no GitHub account, you can create one for free.
Next, clone your fork to your local machine.
git clone https://github.com/<your-user-name>/flink.git
The documentation is located in the docs/ subdirectory of the Flink code base.
Before you start working on the documentation … # … please make sure there exists a Jira issue that corresponds to your contribution. We require all documentation changes to refer to a Jira issue, except for trivial fixes such as typos.
Also, have a look at the Documentation Style Guide for some guidance on how to write accessible, consistent and inclusive documentation.
Update or extend the documentation # The Flink documentation is written in Markdown. Markdown is a lightweight markup language which can be translated to HTML.
In order to update or extend the documentation you have to modify the Markdown (.md) files. Please verify your changes by starting the build script in preview mode.
./build_docs.sh -p
The script compiles the Markdown files into static HTML pages and starts a local webserver. Open your browser at http://localhost:1313/ to view the compiled documentation including your changes. The served documentation is automatically re-compiled and updated when you modify and save Markdown files and refresh your browser.
Please feel free to ask any questions you have on the developer mailing list.
Chinese documentation translation # The Flink community maintains both English and Chinese documentation. If you want to update or extend the documentation, both the English and Chinese versions should be updated. If you are not familiar with the Chinese language, please open a JIRA issue tagged with the chinese-translation component for the Chinese documentation translation and link it with the current JIRA issue. If you are familiar with the Chinese language, you are encouraged to update both sides in one pull request.
NOTE: The Flink community is still in the process of translating the documentation into Chinese, so some documents may not have been translated yet. If the document you are updating is not translated yet, you can just copy the English changes to the Chinese document.
The Chinese documents are located in the content.zh/docs folder. You can update or extend the Chinese file in the content.zh/docs folder according to the English documents changes.
Submit your contribution # The Flink project accepts documentation contributions through the GitHub Mirror as Pull Requests. Pull requests are a simple way of offering a patch by providing a pointer to a code branch that contains the changes.
To prepare and submit a pull request follow these steps.
Commit your changes to your local git repository. The commit message should point to the corresponding Jira issue by starting with [FLINK-XXXX].
Push your committed contribution to your fork of the Flink repository at GitHub.
git push origin myBranch
Go to the website of your repository fork (https://github.com/<your-user-name>/flink) and use the “Create Pull Request” button to start creating a pull request. Make sure that the base fork is apache/flink master and the head fork selects the branch with your changes. Give the pull request a meaningful description and submit it.
It is also possible to attach a patch to a Jira issue.
`}),e.add({id:33,href:"/how-to-contribute/",title:"How to Contribute",section:"Apache Flink® — Stateful Computations over Data Streams",content:" How to contribute # "}),e.add({id:34,href:"/how-to-contribute/documentation-style-guide/",title:"Documentation Style Guide",section:"How to Contribute",content:` Documentation Style Guide # This guide provides an overview of the essential style guidelines for writing and contributing to the Flink documentation. It’s meant to support your contribution journey in the greater community effort to improve and extend existing documentation — and help make it more accessible, consistent and inclusive.
Language # The Flink documentation is maintained in US English and Chinese — when extending or updating the documentation, both versions should be addressed in one pull request. If you are not familiar with the Chinese language, make sure that your contribution is complemented by these additional steps:
Open a JIRA ticket for the translation, tagged with the chinese-translation component; Link the ticket to the original contribution JIRA ticket. Looking for guidance on translating existing documentation to Chinese? Go ahead and consult this translation specification.
Language Style # Below, you find some basic guidelines that can help ensure readability and accessibility in your writing. For a deeper and more complete dive into language style, also refer to the General Guiding Principles.
Voice and Tone # Use active voice. Active voice supports brevity and makes content more engaging. If you add by zombies after the verb in a sentence and it still makes sense, you are using the passive voice.
Active Voice “You can run this example in your IDE or on the command line.” Passive Voice “This example can be run in your IDE or on the command line (by zombies).” Use you, never we. Using we can be confusing and patronizing to some users, giving the impression that “we are all members of a secret club and you didn’t get a membership invite”. Address the user as you.
Avoid gender- and culture-specific language. There is no need to identify gender in documentation: technical writing should be gender-neutral. Also, jargon and conventions that you take for granted in your own language or culture are often different elsewhere. Humor is a staple example: a great joke in one culture can be widely misinterpreted in another.
Avoid qualifying and prejudging actions. For a user that is frustrated or struggling to complete an action, using words like quick or easy can lead to a poor documentation experience.
Avoid using uppercase words to highlight or emphasize statements. Highlighting key words with e.g. bold or italic font usually appears more polite. If you want to draw attention to important but not obvious statements, try to group them into separate paragraphs starting with a label, highlighted with a corresponding HTML tag:
<span class="label label-info">Note</span> <span class="label label-warning">Warning</span> <span class="label label-danger">Danger</span> Using Flink-specific Terms # Use clear definitions of terms or provide additional instructions on what something means by adding a link to a helpful resource, such as other documentation pages or the Flink Glossary. The Glossary is a work in progress, so you can also propose new terms by opening a pull-request.
Repository # Markdown files (.md) should have a short name that summarizes the topic covered, spelled in lowercase and with dashes (-) separating the words. The Chinese version file should have the same name as the English version, but stored in the content.zh folder.
Syntax # The documentation website is generated using Hugo and the pages are written in Markdown, a lightweight portable format for web publishing (but not limited to it).
Extended Syntax # Markdown can also be used in combination with GitHub Flavored Markdown and plain HTML. For example, some contributors prefer to use HTML tags for images and are free to do so with this intermix.
Front Matter # In addition to Markdown, each file contains a YAML front matter block that will be used to set variables and metadata on the page. The front matter must be the first thing in the file and must be specified as a valid YAML set between triple-dashed lines.
Apache License # For every documentation file, the front matter should be immediately followed by the Apache License statement. For both language versions, this block must be stated in US English and copied in the exact same words as in the following example.
--- title: Concepts layout: redirect --- <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. --> Below are the front matter variables most commonly used along the Flink documentation.
Variable Possible Values Description Layout layout {base,plain,redirect} The layout file to use. Layout files are located under the _layouts directory. Content title %s The title to be used as the top-level (Level-1) heading for the page. Navigation nav-id %s The ID of the page. Other pages can use this ID as their nav-parent_id. nav-parent_id {root,%s} The ID of the parent page. The lowest navigation level is root. nav-pos %d The relative position of pages per navigation level. nav-title %s The title to use as an override of the default link text (title). Documentation-wide information and configuration settings that sit under _config.yml are also available to the front matter through the site variable. These settings can be accessed using the following syntax:
{{ "{{ site.CONFIG_KEY " }}}} The placeholder will be replaced with the value of the variable named CONFIG_KEY when generating the documentation.
Formatting # Listed in the following sections are the basic formatting guidelines to get you started with writing documentation that is consistent and simple to navigate.
Headings # In Markdown, headings are any line prefixed with a hash (#), with the number of hashes indicating the level of the heading. Headings should be nested and consecutive — never skip a header level for styling reasons!
Syntax Level Description # Heading Level-1 The page title is defined in the Front Matter, so this level should not be used. ## Heading Level-2 Starting level for Sections. Used to organize content by higher-level topics or goals. ### Heading Level-3 Sub-sections. Used in each Section to separate supporting information or tasks. #### Heading Level-4 Best Practice # Use descriptive language in the phrasing of headings. For example, for a documentation page on dynamic tables, “Dynamic Tables and Continuous Queries” is more descriptive than “Background” or “Technical Information”.
Table of Contents # In the documentation build, the Table Of Contents (TOC) is automatically generated from the headings of the page using the following line of markup:
{{ "{:toc" }}} All headings up to Level-3 are considered. To exclude a particular heading from the TOC:
{{ "# Excluded Heading {:.no_toc" }}} Best Practice # Write a short and concise introduction to the topic that is being covered and place it before the TOC. A little context, such as an outline of key messages, goes a long way in ensuring that the documentation is consistent and accessible to all levels of knowledge.
Navigation # In the documentation build, navigation is defined using properties configured in the front-matter variables of each page.
It’s possible to use Back to Top links in extensive documentation pages, so that users can navigate to the top of the page without having to scroll back up manually. In markup, this is implemented as a placeholder that is replaced with a default link when generating the documentation:
{{ "{% top " }}%} Best Practice # It’s recommended to use Back to Top links at least at the end of each Level-2 section.
Annotations # In case you want to include edge cases, tightly related information or nice-to-knows in the documentation, it’s a (very) good practice to highlight them using special annotations.
To highlight a tip or piece of information that may be helpful to know:
<div class="alert alert-info"> // Info Message </div> To signal danger of pitfalls or call attention to an important piece of information that is crucial to follow:
<div class="alert alert-danger"> // Danger Message </div> Links # Adding links to documentation is an effective way to guide the user to a better understanding of the topic being discussed without the risk of over-explaining.
Links to sections in the page. Each heading generates an implicit identifier to directly link it within a page. This identifier is generated by making the heading lowercase and replacing internal spaces with hyphens.
Heading: ## Heading Title ID: #heading-title [Link Text](#heading-title) Links to other pages of the Flink documentation.
[Link Text]({% link path/to/link-page.md %}) Links to external pages
[Link Text](external_url) Best Practice # Use descriptive links that provide information on the action or destination. For example, avoid using “Learn More” or “Click Here” links.
Visual Elements # Figures and other visual elements are placed under the root fig folder and can be referenced in documentation pages using a syntax similar to that of links.
Best Practice # Use flowcharts, tables and figures where appropriate or necessary for additional clarification, but never as a standalone source of information. Make sure that any text included in those elements is large enough to read and that the overall resolution is adequate.
Code # Inline code. Small code snippets or references to language constructs in normal text flow should be highlighted using surrounding backticks ( \` ).
Code blocks. Code that represents self-contained examples, feature walkthroughs, demonstration of best practices or other useful scenarios should be wrapped using a fenced code block with appropriate syntax highlighting. One way of achieving this with markup is:
\`\`\`java // Java Code \`\`\` When specifying multiple programming languages, each code block should be styled as a tab:
<div class="codetabs" markdown="1"> <div data-lang="java" markdown="1"> \`\`\`java // Java Code \`\`\` </div> <div data-lang="scala" markdown="1"> \`\`\`scala // Scala Code \`\`\` </div> </div> These code blocks are often used to learn and explore, so there are a few best practices to keep in mind:
Showcase key development tasks. Reserve code examples for common implementation scenarios that are meaningful for users. Leave more lengthy and complicated examples for tutorials or walkthroughs.
Ensure the code is standalone. Code examples should be self-contained and not have external dependencies (except for outlier cases such as examples on how to use specific connectors). Include all import statements without using wildcards, so that newcomers can understand and learn which packages are being used.
Avoid shortcuts. For example, handle exceptions and cleanup as you would in real-world code.
Include comments, but don’t overdo it. Provide an introduction describing the main functionality of the code and possible caveats that might not be obvious from reading it. Use comments to clarify implementation details and to describe the expected output.
Commands in code blocks. Commands can be documented using bash syntax highlighted code blocks. The following items should be considered when adding commands to the documentation:
Use long parameter names. Long parameter names help the reader to understand the purpose of the command. They should be preferred over their short counterparts. One parameter per line. Long parameter names can make a command harder to read, so putting one parameter per line improves readability. You need to add a backslash \\ escaping the newline at the end of each intermediate line to support copy&paste. Indentation. Each new parameter line should be indented by 6 spaces. Use the $ prefix to indicate the start of a command. A code block with multiple commands can be hard to read. Putting a dollar sign $ in front of each new command helps identify where each command starts. A properly formatted command looks like this:
$ ./bin/flink run-application \\ --target kubernetes-application \\ -Dkubernetes.cluster-id=my-first-application-cluster \\ -Dkubernetes.container.image=custom-image-name \\ local:///opt/flink/usrlib/my-flink-job.jar General Guiding Principles # This style guide has the overarching goal of setting the foundation for documentation that is Accessible, Consistent, Objective, Logical and Inclusive.
Accessible # The Flink community is diverse and international, so you need to think wide and globally when writing documentation. Not everyone speaks English at a native level and the level of experience with Flink (and stream processing in general) ranges from absolute beginners to experienced advanced users. Ensure technical accuracy and linguistic clarity in the content you produce so that it can be understood by all users.
Consistent # Stick to the basic guidelines detailed in this style guide and use your own best judgment to uniformly spell, capitalize, hyphenate, bold and italicize text. Correct grammar, punctuation and spelling are desirable, but not a hard requirement — documentation contributions are open to any level of language proficiency.
Objective # Keep your sentences short and to the point. As a rule of thumb, if a sentence is shorter than 14 words, readers will likely understand 90 percent of its content. Sentences with more than 25 words are usually harder to understand and should be revised and split, when possible. Being concise and using well-known keywords also allows users to navigate to relevant documentation with little effort.
Logical # Be mindful that most users will scan through online content and only read around 28 percent of it. This underscores the importance of grouping related ideas together into a clear hierarchy of information and using focused, descriptive headings. Placing the most relevant information in the first two paragraphs of each section is a good practice that increases the “return on time invested” for the user.
Inclusive # Use positive language and concrete, relatable examples to ensure the content is findable and welcoming to all users. The documentation is translated to other languages, so using simple language and familiar words also helps reduce the translation effort.
`}),e.add({id:35,href:"/how-to-contribute/improve-website/",title:"Contribute to the Website",section:"How to Contribute",content:` Improving the Website # The Apache Flink website presents Apache Flink and its community. It serves several purposes including:
Informing visitors about Apache Flink and its features. Encouraging visitors to download and use Flink. Encouraging visitors to engage with the community. We welcome any contribution to improve our website. This document contains all information that is necessary to improve Flink’s website.
Obtain the website sources # The website of Apache Flink is hosted in a dedicated git repository which is mirrored to GitHub at https://github.com/apache/flink-web.
The easiest way to contribute website updates is to fork the mirrored website repository on GitHub into your own GitHub account by clicking on the fork button at the top right. If you have no GitHub account, you can create one for free.
Next, clone your fork to your local machine.
git clone https://github.com/<your-user-name>/flink-web.git The flink-web directory contains the cloned repository. The website resides in the asf-site branch of the repository. Run the following commands to enter the directory and switch to the asf-site branch.
cd flink-web git checkout asf-site Directory structure and files # Flink’s website is written in Markdown. Markdown is a lightweight markup language which can be translated to HTML. We use Hugo to generate static HTML files from Markdown.
The files and directories in the website git repository have the following roles:
All files ending with .md are Markdown files. These files are translated into static HTML files. The docs directory contains all documentation, themes and other content that’s needed to render and/or generate the website. The docs/content/docs folder contains all English content. The docs/content.zh/docs folder contains all Chinese content. The docs/content/posts folder contains all blog posts. The content/ directory contains the HTML files generated by Hugo. It is important to place the files in this directory since the Apache infrastructure that hosts the Flink website pulls the HTML content from this directory. (For committers: When pushing changes to the website git, also push the updates in the content/ directory!) Update or extend the documentation # You can update and extend the website by modifying or adding Markdown files or any other resources such as CSS files. To verify your changes, start the build script in preview mode.
./build.sh The script compiles the Markdown files into HTML and starts a local webserver. Open your browser at http://localhost:1313 to view the website including your changes. The Chinese translation is located at http://localhost:1313/zh/. The served website is automatically re-compiled and updated when you modify and save any file and refresh your browser.
To add an external hyperlink to the official documentation of Flink in your documentations or blog posts, please use the following syntax:
{{< docs_link file="relative_path/" name="Title">}} For example:
{{< docs_link file="flink-docs-stable/docs/dev/datastream/side_output/" name="Side Output">}} Please feel free to ask any questions you have on the developer mailing list.
Submit your contribution # The Flink project accepts website contributions through the GitHub Mirror as Pull Requests. Pull requests are a simple way of offering a patch by providing a pointer to a code branch that contains the changes.
To prepare and submit a pull request follow these steps.
Commit your changes to your local git repository. Unless your contribution is a major rework of the website, please squash it into a single commit.
Push the commit to a dedicated branch of your fork of the Flink repository at GitHub.
git push origin myBranch Go to the website of your repository fork (https://github.com/<your-user-name>/flink-web) and use the “Create Pull Request” button to start creating a pull request. Make sure that the base fork is apache/flink-web asf-site and the head fork selects the branch with your changes. Give the pull request a meaningful description and submit it.
Committer section # This section is only relevant for committers.
ASF website git repositories # ASF writable: https://gitbox.apache.org/repos/asf/flink-web.git
Details on how to set the credentials for the ASF git repository are linked here.
Merging a pull request # Contributions are expected to be made on the source files only (no modifications of the compiled files in the content/ directory). Before pushing a website change, please run the build script
./build.sh add the changes to the content/ directory as an additional commit and push the changes to the ASF base repository.
`}),e.add({id:36,href:"/how-to-contribute/getting-help/",title:"Getting Help",section:"How to Contribute",content:` Getting Help # Having a Question? # The Apache Flink community answers many user questions every day. You can search for answers and advice in the archives or reach out to the community for help and guidance.
User Mailing List # Many Flink users, contributors, and committers are subscribed to Flink’s user mailing list. The user mailing list is a very good place to ask for help.
Before posting to the mailing list, you can search the mailing list archives for email threads that discuss issues related to yours on the following websites.
Apache Mailing List Archive If you’d like to post to the mailing list, you need to
subscribe to the mailing list by sending an email to user-subscribe@flink.apache.org, confirm the subscription by replying to the confirmation email, and send your email to user@flink.apache.org. Please note that you won’t receive a response to your mail if you are not subscribed.
Slack # You can join the Apache Flink community on Slack. After creating an account in Slack, don’t forget to introduce yourself in #introductions. Due to Slack limitations the invite link expires after 100 invites. If it is expired, please reach out to the Dev mailing list. Any existing Slack member can also invite anyone else to join.
There are a couple of community rules:
Be respectful - This is the most important rule! All important decisions and conclusions must be reflected back to the mailing lists. “If it didn’t happen on a mailing list, it didn’t happen.” - The Apache Mottos Use Slack threads to keep parallel conversations from overwhelming a channel. Use either #pyflink (for all Python Flink questions) or #troubleshooting (for all other Flink questions). Please do not direct message people for troubleshooting, Jira assignment, or PR review. Doing this can result in removal from Slack. Stack Overflow # Many members of the Flink community are active on Stack Overflow. You can search for questions and answers or post your questions using the [apache-flink] tag.
Found a Bug? # If you observe an unexpected behavior that might be caused by a bug, you can search for reported bugs or file a bug report in Flink’s JIRA.
If you are unsure whether the unexpected behavior happened due to a bug or not, please post a question to the user mailing list.
Got an Error Message? # Identifying the cause for an error message can be challenging. In the following, we list the most common error messages and explain how to handle them.
I have a NotSerializableException. # Flink uses Java serialization to distribute copies of the application logic (the functions and operations you implement, as well as the program configuration, etc.) to the parallel worker processes. Because of that, all functions that you pass to the API must be serializable, as defined by java.io.Serializable.
If your function is an anonymous inner class, consider the following:
make the function a standalone class, or a static inner class. use a Java 8 lambda function. If your function is already a static class, check the fields that you assign when you create an instance of the class. One of the fields most likely holds a non-serializable type.
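As a minimal sketch of these suggestions (the class and field names below are hypothetical, not taken from the Flink documentation), the non-serializable helper is declared transient and created in the open() method of a rich function, so it is never shipped as part of the serialized function object:
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;

// Hypothetical helper type that does not implement java.io.Serializable.
class HttpLookupClient {
    String lookup(String key) { return "value-for-" + key; }
}

// A standalone (or static inner) class instead of an anonymous inner class.
class LookupMapper extends RichMapFunction<String, String> {
    // transient: this field is not shipped with the serialized function
    private transient HttpLookupClient client;

    @Override
    public void open(Configuration parameters) {
        // created on the worker, after the function has been deserialized
        client = new HttpLookupClient();
    }

    @Override
    public String map(String value) {
        return client.lookup(value);
    }
}
For simple stateless logic, a lambda that does not capture any non-serializable state avoids the problem in the first place.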
In Java, use a RichFunction and initialize the problematic fields in the open() method. In Scala, you can often simply use “lazy val” to defer initialization until the distributed execution happens. This may come at a minor performance cost. You can naturally also use a RichFunction in Scala. Using the Scala API, I get an error about implicit values and evidence parameters. # This error means that the implicit value for the type information could not be provided. Make sure that you have an import org.apache.flink.streaming.api.scala._ (DataStream API) or an import org.apache.flink.api.scala._ (DataSet API) statement in your code.
If you are using Flink operations inside functions or classes that take generic parameters, then a TypeInformation must be available for that parameter. This can be achieved by using a context bound:
def myFunction[T: TypeInformation](input: DataSet[T]): DataSet[Seq[T]] = { input.reduceGroup( i => i.toSeq ) } See Type Extraction and Serialization for an in-depth discussion of how Flink handles types.
I see a ClassCastException: X cannot be cast to X. # When you see an exception in the style com.foo.X cannot be cast to com.foo.X (or cannot be assigned to com.foo.X), it means that multiple versions of the class com.foo.X have been loaded by different class loaders, and types of that class are attempted to be assigned to each other.
The reason for that can be:
Class duplication through child-first classloading. That is an intended mechanism to allow users to use different versions of the same dependencies that Flink uses. However, if different copies of these classes move between Flink’s core and the user application code, such an exception can occur. To verify that this is the reason, try setting classloader.resolve-order: parent-first in the configuration. If that makes the error disappear, please write to the mailing list to check if that may be a bug.
Caching of classes from different execution attempts, for example by utilities like Guava’s Interners, or Avro’s Schema cache. Try to not use interners, or reduce the scope of the interner/cache to make sure a new cache is created whenever a new task execution is started.
I have an AbstractMethodError or NoSuchFieldError. # Such errors typically indicate a mix-up in some dependency version. That means a different version of a dependency (a library) is loaded during the execution compared to the version that code was compiled against.
From Flink 1.4.0 on, dependencies in your application JAR file may have different versions compared to dependencies used by Flink’s core, or other dependencies in the classpath (for example from Hadoop). That requires child-first classloading to be activated, which is the default.
If you see these problems in Flink 1.4+, one of the following may be true:
You have a dependency version conflict within your application code. Make sure all your dependency versions are consistent. You are conflicting with a library that Flink cannot support via child-first classloading. Currently these are the Scala standard library classes, as well as Flink’s own classes, logging APIs, and any Hadoop core classes. My DataStream application produces no output, even though events are going in. # If your DataStream application uses Event Time, check that your watermarks get updated. If no watermarks are produced, event time windows might never trigger, and the application would produce no results.
You can check in Flink’s web UI (watermarks section) whether watermarks are making progress.
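One common culprit is a watermark strategy that never advances, for example because one source split is idle. The following is a minimal, hedged sketch (the event class and field names are hypothetical) of a bounded-out-of-orderness strategy with a timestamp assigner and an idleness timeout:
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class WatermarkExample {

    // Hypothetical event type with an epoch-millisecond timestamp field.
    public static class SensorEvent {
        public long timestampMillis;
        public double value;
    }

    public static WatermarkStrategy<SensorEvent> strategy() {
        return WatermarkStrategy
                // tolerate events that arrive up to 5 seconds out of order
                .<SensorEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                // tell Flink where the event-time timestamp lives
                .withTimestampAssigner((event, recordTimestamp) -> event.timestampMillis)
                // keep watermarks advancing even if a source split becomes idle
                .withIdleness(Duration.ofMinutes(1));
    }
}
The strategy is then passed to assignTimestampsAndWatermarks on the stream (or to the source, where supported); without a timestamp assigner and watermark generation, event-time windows cannot fire.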
I see an exception reporting “Insufficient number of network buffers”. # If you run Flink with a very high parallelism, you may need to increase the number of network buffers.
By default, Flink takes 10% of the JVM heap size for network buffers, with a minimum of 64MB and a maximum of 1GB. You can adjust all these values via taskmanager.network.memory.fraction, taskmanager.network.memory.min, and taskmanager.network.memory.max.
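For reference, one hedged way to experiment with these settings from code (the option keys are taken from the text above; in a real deployment they would normally go into the Flink configuration file) is to pass a Configuration into a local execution environment:
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class NetworkBufferConfigExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Give the network stack a larger share of the JVM memory.
        conf.setString("taskmanager.network.memory.fraction", "0.2");
        conf.setString("taskmanager.network.memory.min", "128mb");
        conf.setString("taskmanager.network.memory.max", "2gb");

        // Only local/MiniCluster-style environments pick up this programmatic
        // configuration; a standalone or YARN/Kubernetes cluster reads its own
        // configuration file instead.
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment(conf);
        env.setParallelism(4);
    }
}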
Please refer to the Configuration Reference for details.
My job fails with various exceptions from the HDFS/Hadoop code. What can I do? # The most common cause for that is that the Hadoop version in Flink’s classpath is different than the Hadoop version of the cluster you want to connect to (HDFS / YARN).
The easiest way to fix that is to pick a Hadoop-free Flink version and simply export the Hadoop path and classpath from the cluster.
`}),e.add({id:37,href:"/2024/03/21/apache-flink-kubernetes-operator-1.8.0-release-announcement/",title:"Apache Flink Kubernetes Operator 1.8.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is excited to announce the release of Flink Kubernetes Operator 1.8.0!
The release includes many improvements to the operator core, the autoscaler, and introduces new features like TaskManager memory auto-tuning.
We encourage you to download the release and share your experience with the community through the Flink mailing lists or JIRA! We’re looking forward to your feedback!
Highlights # Flink Autotuning # We’re excited to announce our latest addition to the autoscaling module: Flink Autotuning.
Flink Autotuning complements Flink Autoscaling by auto-adjusting critical settings of the Flink configuration. For this release, we support auto-configuring Flink memory which is a huge source of pain for users. Flink has various memory pools (e.g. heap memory, network memory, state backend memory, JVM metaspace) which all need to be assigned fractions of the available memory upfront in order for a Flink job to run properly.
Assigning too little memory results in pipeline failures, which is why most users end up assigning way too much memory. Based on our experience, we’ve seen that heap memory is at least 50% over-provisioned, even after using Flink Autoscaling. The reason is that Flink Autoscaling is primarily CPU-driven to optimize pipeline throughput, but doesn’t change the ratio between CPU/Memory on the containers.
Resource savings are nice to have, but the real power of Flink Autotuning is the reduced time to production.
With Flink Autoscaling and Flink Autotuning, all users need to do is set a max memory size for the TaskManagers, just like they would normally configure TaskManager memory. Flink Autotuning then automatically adjusts the various memory pools and brings down the total container memory size. It does that by observing the actual max memory usage on the TaskManagers or by calculating the exact number of network buffers required for the job topology. The adjustments are made together with Flink Autoscaling, so there is no extra downtime involved.
Flink Autotuning can be enabled by setting:
# Autoscaling needs to be enabled job.autoscaler.enabled: true # Turn on Autotuning job.autoscaler.memory.tuning.enabled: true For future releases, we are planning to auto-tune more aspects of the Flink configuration, e.g. the number of task slots.
Another area for improvement is how managed memory is auto-configured. If no managed memory is used, e.g. the heap-based state backend is used, managed memory will be set to zero which helps save a lot of memory. If managed memory is used, e.g. via RocksDB, the configured managed memory will be kept constant because Flink currently lacks metrics to accurately measure the usage of managed memory. Nevertheless, users already benefit from the resource savings and optimizations for heap, metaspace, and network memory. RocksDB users can solely focus their attention on configuring managed memory. For RocksDB, we also added an option to add all saved memory to the managed memory. This is beneficial when running with RocksDB to maximize the in-memory performance.
Improved Accuracy of Autoscaling Metrics # So far, Flink Autoscaling relied on sampling scaling metrics within the current metric window. The resulting accuracy depended on the number of samples and the sampling interval. For this release, whenever possible, we use Flink’s accumulated metrics which provide cumulative counters of metrics like records processed or time spent processing. This allows us to derive the exact metric value for the window.
For example, to calculate the average records processed per time unit, we measure the accumulated number of records processed once at the start of the metric window, e.g. 1000 records. Then we measure a second time when the metric window closes, e.g. 1500. By subtracting the former from the latter, we can calculate the exact number of records processed: 1500-1000 = 500. We can then divide by the metric window duration to get the average number of records processed.
Rescale time estimation # We now measure the actual required restart time for applying autoscaling decisions. Previously, users had to manually configure the estimated maximum restart time via job.autoscaler.restart.time. If the new feature is enabled, this setting is now only used for the first scaling. After the first scaling, the actual restart time has been observed and will be taken into account for future scalings.
This feature can be enabled via:
job.autoscaler.restart.time-tracking.enabled: true For the next release, we are considering enabling it by default.
Autoscaling for Session Cluster Jobs # Autoscaling used to be an application / job cluster only feature. Now it is also supported for session clusters.
Improved Standalone Autoscaler # Since 1.7.0, Flink Autoscaling is now also available in a standalone module without the need to run on top of Kubernetes.
We merged notable improvements to the standalone autoscaler:
The control loop now supports multiple threads We implemented a JdbcAutoScalerStateStore for storing state via JDBC-supported databases We implemented a JdbcAutoScalerEventHandler for emitting events to JDBC-supported databases Savepoint Trigger Nonce # A common request is to support a streamlined, user-friendly way of redeploying from a target savepoint. Previously this was only possible by deleting the CR and recreating it with initialSavepointPath. A big downside of this approach is a loss of savepoint/checkpoint history in the status that some platforms may need, resulting in non-cleaned-up savepoints.
We introduced a savepointRedeployNonce field in the job spec similar to other action trigger nonces.
If the nonce changes to a new non-null value, the job will be redeployed from the path specified in initialSavepointPath (or from empty state if the path is empty).
Cluster Shutdown and Resource Cleanup Improvements # We improved the shutdown behavior and added better and more consistent logging. We now scale down the JobManager replicas to zero before removing the JobManager deployment. This ensures that the TaskManager shutdown is clean because the owner reference to the JobManager deployment is not removed immediately, which gives TaskManagers time to shut down.
Custom Flink Resource Mutator # Users already had the ability to provide custom resource validators to the operator. With this release, we added support for custom mutators. See the docs.
Smaller Operator image # Through build optimizations, we were able to reduce the size of the Docker image by 20%.
Experimental Features # Cluster Resource Capacity Check # The operator can automatically check if sufficient resources are available for an autoscaling decision. The information is retrieved from the Kubernetes cluster based on the available node metrics and the maximum node size of the Kubernetes Cluster Autoscaler.
The feature can be turned on by setting this configuration value:
kubernetes.operator.cluster.resource-view.refresh-interval: 5 min Release Notes # The release notes can be found here.
Release Resources # The source artifacts and helm chart are available on the Downloads page of the Flink website. You can easily try out the new features shipped in the official 1.8.0 release by adding the Helm chart to your own local registry:
$ helm repo add flink-kubernetes-operator-1.8.0 https://archive.apache.org/dist/flink/flink-kubernetes-operator-1.8.0/ $ helm install flink-kubernetes-operator flink-kubernetes-operator-1.8.0/flink-kubernetes-operator --set webhook.create=false You can also find official Kubernetes Operator Docker images of the new version on Dockerhub.
For more details, check the updated documentation and the release notes. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # 1996fanrui, Alexander Fedulov, AncyRominus, Caican Cai, Cancai Cai, ConradJam, Domenic Bove, Dominik Dębowczyk, Gabor Somogyi, Guillaume Vauvert, Gyula Fora, Hao Xin, Jerry Wang, Justin Chen, Máté Czagány, Maximilian Michels, Peter Huang, Rui Fan, Ryan van Huuksloot, Samrat, Tony Garrard, Yang-LI-CS, ensctom, fengfei02, flashJd, Nicolas Fraison
`}),e.add({id:38,href:"/2024/03/18/announcing-the-release-of-apache-flink-1.19/",title:"Announcing the Release of Apache Flink 1.19",section:"Flink Blog",content:`The Apache Flink PMC is pleased to announce the release of Apache Flink 1.19.0. As usual, we are looking at a packed release with a wide variety of improvements and new features. Overall, 162 people contributed to this release completing 33 FLIPs and 600+ issues. Thank you!
Let’s dive into the highlights.
Flink SQL Improvements # Custom Parallelism for Table/SQL Sources # Now in Flink 1.19, you can set a custom parallelism for performance tuning via the scan.parallelism option. The first available connector is DataGen (Kafka connector is on the way). Here is an example using SQL Client:
-- set parallelism within the ddl CREATE TABLE Orders ( order_number BIGINT, price DECIMAL(32,2), buyer ROW<first_name STRING, last_name STRING>, order_time TIMESTAMP(3) ) WITH ( 'connector' = 'datagen', 'scan.parallelism' = '4' ); -- or set parallelism via dynamic table option SELECT * FROM Orders /*+ OPTIONS('scan.parallelism'='4') */; More Information
Documentation FLIP-367: Support Setting Parallelism for Table/SQL Sources Configurable SQL Gateway Java Options # A new option env.java.opts.sql-gateway for specifying the Java options is introduced in Flink 1.19, so you can fine-tune the memory settings, garbage collection behavior, and other relevant Java parameters for SQL Gateway.
More Information
FLINK-33203 Configure Different State TTLs Using SQL Hint # Starting from Flink 1.18, Table API and SQL users can set state time-to-live (TTL) individually for stateful operators via the SQL compiled plan. In Flink 1.19, users have a more flexible way to specify custom TTL values for regular joins and group aggregations directly within their queries by utilizing the STATE_TTL hint. This improvement means that you no longer need to alter your compiled plan to set specific TTLs for these frequently used operators. With the introduction of STATE_TTL hints, you can streamline your workflow and dynamically adjust the TTL based on your operational requirements.
Here is an example:
-- set state ttl for join SELECT /*+ STATE_TTL('Orders'= '1d', 'Customers' = '20d') */ * FROM Orders LEFT OUTER JOIN Customers ON Orders.o_custkey = Customers.c_custkey; -- set state ttl for aggregation SELECT /*+ STATE_TTL('o' = '1d') */ o_orderkey, SUM(o_totalprice) AS revenue FROM Orders AS o GROUP BY o_orderkey; More Information
Documentation FLIP-373: Support Configuring Different State TTLs using SQL Hint Named Parameters for Functions and Procedures # Named parameters now can be used when calling a function or stored procedure. With named parameters, users do not need to strictly specify the parameter position, just specify the parameter name and its corresponding value. At the same time, if non-essential parameters are not specified, they will default to being filled with null.
Here’s an example of defining a function with one mandatory parameter and two optional parameters using named parameters:
public static class NamedArgumentsTableFunction extends TableFunction<Object> { @FunctionHint( output = @DataTypeHint("STRING"), arguments = { @ArgumentHint(name = "in1", isOptional = false, type = @DataTypeHint("STRING")), @ArgumentHint(name = "in2", isOptional = true, type = @DataTypeHint("STRING")), @ArgumentHint(name = "in3", isOptional = true, type = @DataTypeHint("STRING"))}) public void eval(String arg1, String arg2, String arg3) { collect(arg1 + ", " + arg2 + "," + arg3); } } When calling the function in SQL, parameters can be specified by name, for example:
SELECT * FROM TABLE(myFunction(in1 => 'v1', in3 => 'v3', in2 => 'v2')); Also the optional parameters can be omitted:
SELECT * FROM TABLE(myFunction(in1 => 'v1')); More Information
Documentation FLIP-387: Support named parameters for functions and call procedures Window TVF Aggregation Features # Supports SESSION Window TVF in Streaming Mode
Now users can use SESSION Window TVF in streaming mode. A simple example is as follows: -- session window with partition keys SELECT * FROM TABLE( SESSION(TABLE Bid PARTITION BY item, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES)); -- apply aggregation on the session windowed table with partition keys SELECT window_start, window_end, item, SUM(price) AS total_price FROM TABLE( SESSION(TABLE Bid PARTITION BY item, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES)) GROUP BY item, window_start, window_end; Supports Changelog Inputs for Window TVF Aggregation
Window aggregation operators (generated based on Window TVF Function) can now handle changelog streams (e.g., CDC data sources, etc.). Users are recommended to migrate from legacy window aggregation to the new syntax for more complete feature support. More Information
Documentation New UDF Type: AsyncScalarFunction # The common UDF type ScalarFunction works well for CPU-intensive operations, but less well for IO-bound or otherwise long-running computations. In Flink 1.19, we have a new AsyncScalarFunction, a user-defined asynchronous variant of ScalarFunction that allows function calls to be issued concurrently.
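As a hedged sketch of such a UDF (the lookup logic and class name are hypothetical), the FLIP-400 contract lets eval complete a CompletableFuture with the result instead of returning a value, so many invocations can be in flight at once:
import java.util.concurrent.CompletableFuture;
import org.apache.flink.table.functions.AsyncScalarFunction;

// Hypothetical asynchronous scalar UDF: the first eval parameter is the
// future to complete with the result; the remaining parameters are the
// SQL arguments.
public class AsyncEnrichFunction extends AsyncScalarFunction {
    public void eval(CompletableFuture<String> result, String key) {
        // A real implementation would call a non-blocking client and complete
        // the future from its callback; supplyAsync stands in for that here.
        CompletableFuture.supplyAsync(() -> "enriched-" + key)
                .whenComplete((value, error) -> {
                    if (error != null) {
                        result.completeExceptionally(error);
                    } else {
                        result.complete(value);
                    }
                });
    }
}
The function would then be registered and invoked like any other scalar UDF, e.g. via createTemporarySystemFunction and a SQL call.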
More Information
FLIP-400: AsyncScalarFunction for asynchronous scalar function support Tuning: MiniBatch Optimization for Regular Joins # Record amplification is a pain point when performing cascading joins in Flink. In Flink 1.19, the new mini-batch optimization can be used for regular joins to reduce intermediate results in such cascading join scenarios.
More Information
minibatch-regular-joins FLIP-415: Introduce a new join operator to support minibatch Runtime & Coordination Improvements # Dynamic Source Parallelism Inference for Batch Jobs # In Flink 1.19, we support dynamic source parallelism inference for batch jobs, which allows source connectors to dynamically infer the parallelism based on the actual amount of data to consume. This feature is a significant improvement over previous versions, which only assigned a fixed default parallelism to source vertices. Source connectors need to implement the inference interface to enable dynamic parallelism inference. Currently, the FileSource connector has already been developed with this functionality in place. Additionally, the configuration execution.batch.adaptive.auto-parallelism.default-source-parallelism will be used as the upper bound of source parallelism inference, and it no longer defaults to 1. Instead, if it is not set, the upper bound of allowed parallelism set via execution.batch.adaptive.auto-parallelism.max-parallelism will be used. If that configuration is also not set, the default parallelism set via parallelism.default or StreamExecutionEnvironment#setParallelism() will be used instead.
More Information
Documentation FLIP-379: Support dynamic source parallelism inference for batch jobs Standard YAML for Flink Configuration # Starting with Flink 1.19, Flink has officially introduced full support for the standard YAML 1.2 syntax. The default configuration file has been changed to config.yaml and placed in the conf/ directory. Users should directly modify this file to configure Flink. If users want to use the legacy configuration file flink-conf.yaml, they just need to copy this file into the conf/ directory. Once the legacy configuration file flink-conf.yaml is detected, Flink will prioritize using it as the configuration file. In the upcoming Flink 2.0, the flink-conf.yaml configuration file will no longer work.
More Information
flink-configuration-file FLIP-366: Support standard YAML for Flink configuration Profiling JobManager/TaskManager on Flink Web # In Flink 1.19, we support triggering profiling at the JobManager/TaskManager level, allowing users to create a profiling instance with arbitrary intervals and event modes (supported by async-profiler). Users can easily submit profiles and export results in the Flink Web UI.
For example, users can simply submit a profiling instance with a specified period and mode by “Creating a Profiling Instance” after identifying a candidate TaskManager/JobManager with a performance bottleneck,
and then easily download the interactive HTML file after the profiling instance is complete.
More Information
Documentation FLIP-375: Built-in cross-platform powerful java profiler New Config Options for Administrator JVM Options # A set of administrator JVM options are available, which prepend the user-set extra JVM options for platform-wide JVM tuning.
More Information
Documentation FLIP-397: Add config options for administrator JVM options Beta Support for Java 21 # Apache Flink was made ready to compile and run with Java 21. This feature is still in beta mode. Issues should be reported in Flink’s bug tracker.
More Information
FLINK-33163 Checkpoints Improvements # Using Larger Checkpointing Interval When Source is Processing Backlog # IsProcessingBacklog is introduced to indicate whether a record should be processed with low latency or high throughput. Connector developers can update the Source implementation to utilize the SplitEnumeratorContext#setIsProcessingBacklog method to report whether the records are backlog records. If the source is backlog-aware, users can set execution.checkpointing.interval-during-backlog to use a larger checkpoint interval and enhance throughput while the job is processing backlog.
More Information
FLINK-32514 FLIP-309: Support using larger checkpointing interval when source is processing backlog CheckpointsCleaner Clean Individual Checkpoint States in Parallel # Now when disposing of no longer needed checkpoints, every state handle/state file will be disposed in parallel by the ioExecutor, vastly improving the disposal speed of a single checkpoint (for large checkpoints, the disposal time can be improved from 10 minutes to < 1 minute). The old behavior can be restored by setting state.checkpoint.cleaner.parallel-mode to false.
More Information
FLINK-33090 Trigger Checkpoints through Command Line Client # The command line interface supports triggering a checkpoint manually. Usage:
./bin/flink checkpoint $JOB_ID [-full] By specifying the ‘-full’ option, a full checkpoint is triggered. Otherwise an incremental checkpoint is triggered if the job is configured to take incremental ones periodically.
More Information
FLINK-6755 Connector API Improvements # New Interfaces to SinkV2 That Are Consistent with Source API # In Flink 1.19, some changes were made to the SinkV2 API to align it with the Source API.
The following interfaces are deprecated: TwoPhaseCommittingSink, StatefulSink, WithPreWriteTopology, WithPreCommitTopology, WithPostCommitTopology.
The following new interfaces have been introduced: CommitterInitContext, CommittingSinkWriter, WriterInitContext, StatefulSinkWriter.
The following interface method’s parameter has been changed: Sink#createWriter.
The original interfaces will remain available during the 1.19 release line, but they will be removed in subsequent releases.
More Information
FLINK-33973 FLIP-372: Enhance and synchronize Sink API to match the Source API New Committer Metrics to Track the Status of Committables # The TwoPhaseCommittingSink#createCommitter method parameterization has been changed: a new CommitterInitContext parameter has been added. The original method will remain available during the 1.19 release line, but it will be removed in subsequent releases.
More Information
FLINK-25857 FLIP-371: Provide initialization context for Committer creation in TwoPhaseCommittingSink Important Deprecations # In preparation for the release of Flink 2.0 later this year, the community has decided to officially deprecate multiple APIs that were approaching end of life for a while.
Flink’s org.apache.flink.api.common.time.Time is now officially deprecated and will be dropped in Flink 2.0. Please migrate it to Java’s own Duration class. Methods supporting the Duration class that replace the deprecated Time-based methods were introduced. org.apache.flink.runtime.jobgraph.RestoreMode#LEGACY is deprecated. Please use RestoreMode#CLAIM or RestoreMode#NO_CLAIM mode instead to get a clear state file ownership when restoring. The old method of resolving schema compatibility has been deprecated; please migrate to the new method following Migrating from deprecated TypeSerializerSnapshot#resolveSchemaCompatibility(TypeSerializer newSerializer) before Flink 1.19. Configuring serialization behavior through hard-coded calls is deprecated, e.g., ExecutionConfig#enableForceKryo(). Please use the options pipeline.serialization-config, pipeline.force-avro, pipeline.force-kryo, and pipeline.generic-types. Registration of instance-level serializers is deprecated; use class-level serializers instead. We have deprecated all setXxx and getXxx methods except getString(String key, String defaultValue) and setString(String key, String value), such as: setInteger, setLong, getInteger and getLong etc. Users and developers are recommended to use get and set methods with ConfigOption instead of string as key. The non-ConfigOption objects in the StreamExecutionEnvironment, CheckpointConfig, and ExecutionConfig and their corresponding getter/setter interfaces are now deprecated. These objects and methods are planned to be removed in Flink 2.0. The deprecated interfaces include the getter and setter methods of RestartStrategy, CheckpointStorage, and StateBackend. org.apache.flink.api.common.functions.RuntimeContext#getExecutionConfig is now officially deprecated and will be dropped in Flink 2.0. Please migrate all related usages to the new getter method:
Migrate TypeInformation#createSerializer to RuntimeContext#createTypeSerializer
Migrate RuntimeContext#getExecutionConfig.getGlobalJobParameters to RuntimeContext#getGlobalJobParameters
Migrate RuntimeContext#getExecutionConfig.isObjectReuseEnabled() to RuntimeContext#isObjectReuseEnabled org.apache.flink.api.common.functions.RichFunction#open(Configuration parameters) method has been deprecated and will be removed in future versions. Users are encouraged to migrate to the new RichFunction#open(OpenContext openContext). org.apache.flink.configuration.AkkaOptions is deprecated and replaced with RpcOptions. Upgrade Notes # The Flink community tries to ensure that upgrades are as seamless as possible. However, certain changes may require users to make adjustments to certain parts of the program when upgrading to version 1.19. Please refer to the release notes for a comprehensive list of adjustments to make and issues to check during the upgrading process.
List of Contributors # The Apache Flink community would like to express gratitude to all the contributors who made this release possible:
Adi Polak, Ahmed Hamdy, Akira Ajisaka, Alan Sheinberg, Aleksandr Pilipenko, Alex Wu, Alexander Fedulov, Archit Goyal, Asha Boyapati, Benchao Li, Bo Cui, Cheena Budhiraja, Chesnay Schepler, Dale Lane, Danny Cranmer, David Moravek, Dawid Wysakowicz, Deepyaman Datta, Dian Fu, Dmitriy Linevich, Elkhan Dadashov, Eric Brzezenski, Etienne Chauchot, Fang Yong, Feng Jiajie, Feng Jin, Ferenc Csaky, Gabor Somogyi, Gyula Fora, Hang Ruan, Hangxiang Yu, Hanyu Zheng, Hjw, Hong Liang Teoh, Hongshun Wang, HuangXingBo, Jack, Jacky Lau, James Hughes, Jane Chan, Jerome Gagnon, Jeyhun Karimov, Jiabao Sun, JiangXin, Jiangjie (Becket) Qin, Jim Hughes, Jing Ge, Jinzhong Li, JunRuiLee, Laffery, Leonard Xu, Lijie Wang, Martijn Visser, Marton Balassi, Matt Wang, Matthias Pohl, Matthias Schwalbe, Matyas Orhidi, Maximilian Michels, Mingliang Liu, Máté Czagány, Panagiotis Garefalakis, ParyshevSergey, Patrick Lucas, Peter Huang, Peter Vary, Piotr Nowojski, Prabhu Joseph, Pranav Sharma, Qingsheng Ren, Robin Moffatt, Roc Marshal, Rodrigo Meneses, Roman, Roman Khachatryan, Ron, Rui Fan, Ruibin Xing, Ryan Skraba, Samrat002, Sergey Nuyanzin, Shammon FY, Shengkai, Stefan Richter, SuDewei, TBCCC, Tartarus0zm, Thomas Weise, Timo Walther, Varun, Venkata krishnan Sowrirajan, Vladimir Matveev, Wang FeiFan, Weihua Hu, Weijie Guo, Wencong Liu, Xiangyu Feng, Xianxun Ye, Xiaogang Zhou, Xintong Song, XuShuai, Xuyang, Yanfei Lei, Yangze Guo, Yi Zhang, Yu Chen, Yuan Mei, Yubin Li, Yuepeng Pan, Yun Gao, Yun Tang, Yuxin Tan, Zakelly, Zhanghao Chen, Zhu Zhu, archzi, bvarghese1, caicancai, caodizhou, dongwoo6kim, duanyc, eason.qin, fengjiajie, fengli, gongzhongqiang, gyang94, hejufang, jiangxin, jiaoqingbo, jingge, lijingwei.5018, lincoln lee, liuyongvs, luoyuxia, mimaomao, murong00, polaris6, pvary, sharath1709, simplejason, sunxia, sxnan, tzy123-123, wangfeifan, wangzzu, xiangyu0xf, xiarui, xingbo, xuyang, yeming, yhx, yinhan.yh, yunfan123, yunfengzhou-hub, yunhong, yuxia Luo, yuxiang, zoudan, 周仁祥, 曹帝胄, 朱通通, 马越
`}),e.add({id:39,href:"/2024/01/19/apache-flink-1.18.1-release-announcement/",title:"Apache Flink 1.18.1 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce the first bug fix release of the Flink 1.18 series.
This release includes 47 bug fixes, vulnerability fixes, and minor improvements for Flink 1.18. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.
We highly recommend all users upgrade to Flink 1.18.1.
Note: Users that have state compression enabled should not migrate to 1.18.1 (or 1.18.0) due to a critical bug that could lead to data loss. Please refer to FLINK-34063 for more information.
Release Artifacts # Maven Dependencies # <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.18.1</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java</artifactId> <version>1.18.1</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients</artifactId> <version>1.18.1</version> </dependency> Binaries # You can find the binaries on the updated Downloads page.
Docker Images # library/flink (official images) apache/flink (ASF repository) PyPi # apache-flink==1.18.1 Release Notes # Release Notes - Flink - Version 1.18.1 Bug [FLINK-31650] - Incorrect busyMsTimePerSecond metric value for FINISHED task [FLINK-33158] - Cryptic exception when there is a StreamExecSort in JsonPlan [FLINK-33171] - Consistent implicit type coercion support for equal and non-equal comparisons for codegen [FLINK-33223] - MATCH_RECOGNIZE AFTER MATCH clause can not be deserialised from a compiled plan [FLINK-33225] - Python API incorrectly passes \`JVM_ARGS\` as single argument [FLINK-33313] - RexNodeExtractor fails to extract conditions with binary literal [FLINK-33352] - OpenAPI spec is lacking mappings for discriminator properties [FLINK-33395] - The join hint doesn't work when appears in subquery [FLINK-33474] - ShowPlan throws undefined exception In Flink Web Submit Page [FLINK-33523] - DataType ARRAY<INT NOT NULL> fails to cast into Object[] [FLINK-33529] - PyFlink fails with "No module named 'cloudpickle" [FLINK-33541] - RAND_INTEGER can't be existed in a IF statement [FLINK-33567] - Flink documentation should only display connector downloads links when a connector is available [FLINK-33588] - Fix Flink Checkpointing Statistics Bug [FLINK-33613] - Python UDF Runner process leak in Process Mode [FLINK-33693] - Force aligned barrier logic doesn't work when the aligned checkpoint timeout is enabled [FLINK-33752] - When Duration is greater than or equal to 1 day, the display unit is ms. [FLINK-33793] - java.lang.NoSuchMethodError when checkpointing in Google Cloud Storage [FLINK-33872] - Checkpoint history does not display for completed jobs New Feature [FLINK-33071] - Log checkpoint statistics Improvement [FLINK-24819] - Higher APIServer cpu load after using SharedIndexInformer replaced naked Kubernetes watch [FLINK-32611] - Redirect to Apache Paimon's link instead of legacy flink table store [FLINK-33041] - Add an introduction about how to migrate DataSet API to DataStream [FLINK-33161] - [benchmark] Java17 profile for benchmarks [FLINK-33501] - Rely on Maven wrapper instead of having custom Maven installation logic [FLINK-33598] - Watch HA configmap via name instead of lables to reduce pressure on APIserver `}),e.add({id:40,href:"/2023/11/29/apache-flink-1.16.3-release-announcement/",title:"Apache Flink 1.16.3 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce the third bug fix release of the Flink 1.16 series.
This release includes 52 bug fixes, vulnerability fixes, and minor improvements for Flink 1.16. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.
We highly recommend all users upgrade to Flink 1.16.3.
Release Artifacts # Maven Dependencies # <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.16.3</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java</artifactId> <version>1.16.3</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients</artifactId> <version>1.16.3</version> </dependency> Binaries # You can find the binaries on the updated Downloads page.
Docker Images # library/flink (official images) apache/flink (ASF repository) PyPi # apache-flink==1.16.3 Release Notes # Release Notes - Flink - Version 1.16.3 Bug [FLINK-32316] - Duplicated announceCombinedWatermark task maybe scheduled if jobmanager failover [FLINK-32362] - SourceAlignment announceCombinedWatermark period task maybe lost [FLINK-32411] - SourceCoordinator thread leaks when job recovers from checkpoint [FLINK-32414] - Watermark alignment will cause flink jobs to hang forever when any source subtask has no SourceSplit [FLINK-32496] - Sources with idleness and alignment always wait for alignment when part of multiple sources is idle [FLINK-27415] - Read empty csv file throws exception in FileSystem table connector [FLINK-28185] - "Invalid negative offset" when using OffsetsInitializer.timestamp(.) [FLINK-29913] - Shared state would be discarded by mistake when maxConcurrentCheckpoint>1 [FLINK-30559] - May get wrong result for \`if\` expression if it's string data type [FLINK-30596] - Multiple POST /jars/:jarid/run requests with the same jobId, runs duplicate jobs [FLINK-30751] - Remove references to disableDataSync in RocksDB documentation [FLINK-30966] - Flink SQL IF FUNCTION logic error [FLINK-31139] - not upload empty state changelog file [FLINK-31967] - SQL with LAG function NullPointerException [FLINK-32023] - execution.buffer-timeout cannot be set to -1 ms [FLINK-32136] - Pyflink gateway server launch fails when purelib != platlib [FLINK-32172] - KafkaExample can not run with args [FLINK-32199] - MetricStore does not remove metrics of nonexistent parallelism in TaskMetricStore when scale down job parallelism [FLINK-32217] - Retain metric store can cause NPE [FLINK-32254] - FineGrainedSlotManager may not allocate enough taskmanagers if maxSlotNum is configured [FLINK-32296] - Flink SQL handle array of row incorrectly [FLINK-32548] - Make watermark alignment ready for production use [FLINK-32583] - RestClient can deadlock if request made after Netty event executor terminated [FLINK-32592] - (Stream)ExEnv#initializeContextEnvironment isn't thread-safe [FLINK-32655] - RecreateOnResetOperatorCoordinator did not forward notifyCheckpointAborted to the real OperatorCoordinator [FLINK-32680] - Job vertex names get messed up once there is a source vertex chained with a MultipleInput vertex in job graph [FLINK-32760] - Version Conflict in flink-sql-connector-hive for shaded.parquet prefix packages [FLINK-32888] - File upload runs into EndOfDataDecoderException [FLINK-32909] - The jobmanager.sh pass arguments failed [FLINK-33010] - NPE when using GREATEST() in Flink SQL [FLINK-33149] - Bump snappy-java to 1.1.10.4 [FLINK-33291] - The release profile of Flink does include enforcing the Java version only in a "soft" way Improvement [FLINK-29542] - Unload.md wrongly writes UNLOAD operation as LOAD operation [FLINK-32314] - Ignore class-loading errors after RPC system shutdown [FLINK-32371] - Bump snappy-java to 1.1.10.1 `}),e.add({id:41,href:"/2023/11/29/apache-flink-1.17.2-release-announcement/",title:"Apache Flink 1.17.2 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce the second bug fix release of the Flink 1.17 series.
This release includes 82 bug fixes, vulnerability fixes, and minor improvements for Flink 1.17. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.
We highly recommend all users upgrade to Flink 1.17.2.
Release Artifacts # Maven Dependencies # <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.17.2</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java</artifactId> <version>1.17.2</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients</artifactId> <version>1.17.2</version> </dependency> Binaries # You can find the binaries on the updated Downloads page.
Docker Images # library/flink (official images) apache/flink (ASF repository) PyPi # apache-flink==1.17.2 Release Notes # Release Notes - Flink - Version 1.17.2 Bug [FLINK-27415] - Read empty csv file throws exception in FileSystem table connector [FLINK-28513] - Flink Table API CSV streaming sink throws SerializedThrowable exception [FLINK-29913] - Shared state would be discarded by mistake when maxConcurrentCheckpoint>1 [FLINK-30559] - May get wrong result for \`if\` expression if it's string data type [FLINK-30596] - Multiple POST /jars/:jarid/run requests with the same jobId, runs duplicate jobs [FLINK-30751] - Remove references to disableDataSync in RocksDB documentation [FLINK-30966] - Flink SQL IF FUNCTION logic error [FLINK-31139] - not upload empty state changelog file [FLINK-31519] - The watermark alignment docs is outdated after FLIP-217 finished [FLINK-31812] - SavePoint from /jars/:jarid:/run api on body is not anymore set to null if empty [FLINK-31967] - SQL with LAG function NullPointerException [FLINK-31974] - JobManager crashes after KubernetesClientException exception with FatalExitExceptionHandler [FLINK-32023] - execution.buffer-timeout cannot be set to -1 ms [FLINK-32034] - Python's DistUtils is deprecated as of 3.10 [FLINK-32056] - Update the used Pulsar connector in flink-python to 4.0.0 [FLINK-32110] - TM native memory leak when using time window in Pyflink ThreadMode [FLINK-32136] - Pyflink gateway server launch fails when purelib != platlib [FLINK-32141] - SharedStateRegistry print too much info log [FLINK-32172] - KafkaExample can not run with args [FLINK-32199] - MetricStore does not remove metrics of nonexistent parallelism in TaskMetricStore when scale down job parallelism [FLINK-32217] - Retain metric store can cause NPE [FLINK-32219] - SQL client hangs when executing EXECUTE PLAN [FLINK-32226] - RestClusterClient leaks jobgraph file if submission fails [FLINK-32249] - A Java string should be used instead of a Calcite NlsString to construct the column comment of CatalogTable [FLINK-32254] - FineGrainedSlotManager may not allocate enough taskmangers if maxSlotNum is configured [FLINK-32296] - Flink SQL handle array of row incorrectly [FLINK-32316] - Duplicated announceCombinedWatermark task maybe scheduled if jobmanager failover [FLINK-32362] - SourceAlignment announceCombinedWatermark period task maybe lost [FLINK-32411] - SourceCoordinator thread leaks when job recovers from checkpoint [FLINK-32414] - Watermark alignment will cause flink jobs to hang forever when any source subtask has no SourceSplit [FLINK-32447] - table hints lost when they inside a view referenced by an external query [FLINK-32456] - JSON_OBJECTAGG & JSON_ARRAYAGG cannot be used with other aggregate functions [FLINK-32465] - KerberosLoginProvider.isLoginPossible does accidental login with keytab [FLINK-32496] - Sources with idleness and alignment always wait for alignment when part of multiple sources is idle [FLINK-32548] - Make watermark alignment ready for production use [FLINK-32578] - Cascaded group by window time columns on a proctime window aggregate may result hang for ever [FLINK-32583] - RestClient can deadlock if request made after Netty event executor terminated [FLINK-32592] - (Stream)ExEnv#initializeContextEnvironment isn't thread-safe [FLINK-32628] - build_wheels_on_macos fails on AZP [FLINK-32655] - RecreateOnResetOperatorCoordinator did not forward notifyCheckpointAborted to the real OperatorCoordinator [FLINK-32680] - Job vertex names get messed up once there is a source 
vertex chained with a MultipleInput vertex in job graph [FLINK-32760] - Version Conflict in flink-sql-connector-hive for shaded.parquet prefix packages [FLINK-32876] - ExecutionTimeBasedSlowTaskDetector treats unscheduled tasks as slow tasks and causes speculative execution to fail. [FLINK-32888] - File upload runs into EndOfDataDecoderException [FLINK-32909] - The jobmanager.sh pass arguments failed [FLINK-32962] - Failure to install python dependencies from requirements file [FLINK-32974] - RestClusterClient always leaks flink-rest-client-jobgraphs* directories [FLINK-33010] - NPE when using GREATEST() in Flink SQL [FLINK-33149] - Bump snappy-java to 1.1.10.4 [FLINK-33171] - Consistent implicit type coercion support for equal and non-equal comparisons for codegen [FLINK-33291] - The release profile of Flink does include enforcing the Java version only in a "soft" way [FLINK-33352] - OpenAPI spec is lacking mappings for discriminator properties [FLINK-33442] - UnsupportedOperationException thrown from RocksDBIncrementalRestoreOperation [FLINK-33474] - ShowPlan throws undefined exception In Flink Web Submit Page Improvement [FLINK-31774] - Add document for delete and update statement [FLINK-32186] - Support subtask stack auto-search when redirecting from subtask backpressure tab [FLINK-32304] - Reduce rpc-akka jar size [FLINK-32314] - Ignore class-loading errors after RPC system shutdown [FLINK-32371] - Bump snappy-java to 1.1.10.1 [FLINK-32457] - update current documentation of JSON_OBJECTAGG/JSON_ARRAYAGG to clarify the limitation [FLINK-32458] - support mixed use of JSON_OBJECTAGG & JSON_ARRAYAGG with other aggregate functions [FLINK-32547] - Add missing doc for Timestamp support in ProtoBuf format [FLINK-32758] - PyFlink bounds are overly restrictive and outdated [FLINK-33316] - Avoid unnecessary heavy getStreamOperatorFactory [FLINK-33487] - Add the new Snowflake connector to supported list `}),e.add({id:42,href:"/2023/11/22/apache-flink-kubernetes-operator-1.7.0-release-announcement/",title:"Apache Flink Kubernetes Operator 1.7.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is excited to announce the release of Flink Kubernetes Operator 1.7.0! The release introduces a large number of improvements to the autoscaler, including a complete decoupling from Kubernetes to support more Flink environments in the future. It’s important to call out that the release explicitly drops support for Flink 1.13 and 1.14 as agreed by the community.
We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA! We hope you like the new release and we’d be eager to learn about your experience with it.
Flink Version Support Policy Change # Previously, the operator kept adding supported Flink versions without any policy for removing that support later. This resulted in a growing number of legacy code paths in the core logic.
To keep technical debt at reasonable levels, the community decided to adopt a new Flink version support policy for the operator.
Starting from 1.7.0, the operator will only support the last 4 Flink minor versions as of the date of the operator release. For 1.7.0 this translates to: 1.18, 1.17, 1.16, 1.15. The operator will simply ignore changes made to resources with unsupported Flink versions. This also means that resources with unsupported versions cannot be deleted once the operator is upgraded. To work around this temporarily, users can upgrade the Flink version of the resource before deleting it.
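For example, a resource stuck on a no-longer-supported version can first be moved to a supported version and then deleted. A minimal sketch with kubectl (the resource name my-flink-job is just a placeholder, and the spec fields should be checked against your CRD version):
# bump the Flink version to one the operator still supports, let it reconcile, then delete
$ kubectl patch flinkdeployment my-flink-job --type merge -p '{"spec": {"flinkVersion": "v1_16"}}'
$ kubectl delete flinkdeployment my-flink-job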
Highlights # Decoupled Autoscaler Module # Starting from 1.7.0, the autoscaler logic is decoupled from Kubernetes and the Flink Kubernetes Operator. The flink-autoscaler module no longer contains any Kubernetes-related dependencies; instead, it defines a set of generic interfaces that are implemented by the operator.
As part of the decoupling effort, we released the initial version of the Standalone Autoscaler, which serves as a limited alternative for anyone not currently using the Flink Kubernetes Operator. It supports scaling a single Flink cluster of any type, including: Flink standalone cluster, MiniCluster, Flink YARN session cluster, and Flink YARN application cluster.
The Standalone Autoscaler runs as a separate Java process. Please read the Autoscaler Standalone section for setup instructions. The Standalone Autoscaler is limited to Flink 1.18.
For the best possible integration, we recommend using the autoscaler as part of the Flink Kubernetes Operator. The Standalone Autoscaler is not intended to replace it, either now or in the future.
To align with the new structure, the autoscaler-related configuration keys lose the kubernetes.operator. prefix going forward:
# Old / Deprecated keys # kubernetes.operator.job.autoscaler.enabled # kubernetes.operator.job.autoscaler.metrics.window # New Keys job.autoscaler.enabled job.autoscaler.metrics.window Visit the Extensibility of Autoscaler doc page to get more information.
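For example, in a FlinkDeployment this change looks roughly as follows (a minimal sketch: only the relevant part of the spec is shown and the values are arbitrary):
spec:
  flinkConfiguration:
    # old form, deprecated since operator 1.7.0:
    # kubernetes.operator.job.autoscaler.enabled: "true"
    # new keys:
    job.autoscaler.enabled: "true"
    job.autoscaler.metrics.window: "10m"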
Improved source metric tracking # Flink currently reports misleadingly low business metrics for sources that spend too much time fetching / polling input (for example, IO-bound sources). This led to the autoscaler not scaling up sources that were actually running beyond their capacity.
To tackle this problem, we introduced a new mechanism in the autoscaler that automatically detects when sources are running at full capacity (and backlog is building up). In these situations the autoscaler switches to a different way of computing the maximum capacity (true processing rate) of the affected source vertices, which is much more accurate in these cases. We currently refer to this mechanism as the “observed true processing rate”. The feature is enabled by default and should not need any custom configuration.
Savepoint triggering improvements # To provide more flexibility to users, periodic savepoint triggering now supports configuring the trigger schedule using a Cron expression in Quartz format. You can find detailed info on the syntax here.
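For instance, the Quartz expression 0 0 2 * * ? (seconds, minutes, hours, day-of-month, month, day-of-week) fires every day at 02:00. A sketch of how such a schedule could be wired into the operator configuration is shown below; the key name here is purely illustrative, so please check the periodic savepoint documentation for the exact property:
# hypothetical key name, shown only to illustrate the Quartz cron syntax
kubernetes.operator.periodic.savepoint.schedule: 0 0 2 * * ?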
Operator Rate Limiter # A small but operationally important change is that the operator now enables rate limiting for resource events by default. This helps work around some corner cases where the operator was previously overloading the API server on error loops.
The rate limiter is now enabled by default with the following config:
kubernetes.operator.rate-limiter.limit: 5 kubernetes.operator.rate-limiter.refresh-period: 15 s Java 17 and 21 support # The operator can now be built and executed on Java 17 and 21 and we have enabled integration testing for these versions as well.
At the moment we are not releasing separate operator Docker images for the different Java versions; these need to be built and bundled by users themselves. The official Kubernetes Operator image remains on Java 11.
Release Resources # The source artifacts and helm chart are available on the Downloads page of the Flink website. You can easily try out the new features shipped in the official 1.7.0 release by adding the Helm chart to your own local registry:
$ helm repo add flink-kubernetes-operator-1.7.0 https://archive.apache.org/dist/flink/flink-kubernetes-operator-1.7.0/ $ helm install flink-kubernetes-operator flink-kubernetes-operator-1.7.0/flink-kubernetes-operator --set webhook.create=false You can also find official Kubernetes Operator Docker images of the new version on Dockerhub.
For more details, check the updated documentation and the release notes. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # Alexander Fedulov, Clara Xiong, Daren Wong, Dongwoo Kim, Gabor Somogyi, Gyula Fora, Manan Mangal, Maximilian Michels, Nicolas Fraison, Peter Huang, Praneeth Ramesh, Rui Fan, Sergey Nuyanzin, SteNicholas, Zhanghao, Zhenqiu Huang, mehdid93
`}),e.add({id:43,href:"/2023/10/27/apache-flink-kubernetes-operator-1.6.1-release-announcement/",title:"Apache Flink Kubernetes Operator 1.6.1 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce the first bug fix release of the Flink Kubernetes Operator 1.6 series.
The release contains fixes for several critical issues, and some doc improvements for the autoscaler.
We highly recommend all users to upgrade to Flink Kubernetes Operator 1.6.1.
Release Notes # Bug # [FLINK-32890] Correct HA patch check for zookeeper metadata store [FLINK-33011] Never accidentally delete HA metadata for last state deployments Documentation improvement # [FLINK-32868][docs] Document the need to backport FLINK-30213 for using autoscaler with older version Flinks [docs][autoscaler] Autoscaler docs and default config improvement Release Resources # The source artifacts and helm chart are available on the Downloads page of the Flink website. You can easily try out the new features shipped in the official 1.6.1 release by adding the Helm chart to your own local registry:
$ helm repo add flink-kubernetes-operator-1.6.1 https://archive.apache.org/dist/flink/flink-kubernetes-operator-1.6.1/ $ helm install flink-kubernetes-operator flink-kubernetes-operator-1.6.1/flink-kubernetes-operator --set webhook.create=false You can also find official Kubernetes Operator Docker images of the new version on Dockerhub.
For more details, check the updated documentation and the release notes. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # Gyula Fora, Nicolas Fraison, Zhanghao
`}),e.add({id:44,href:"/2023/10/24/announcing-the-release-of-apache-flink-1.18/",title:"Announcing the Release of Apache Flink 1.18",section:"Flink Blog",content:`The Apache Flink PMC is pleased to announce the release of Apache Flink 1.18.0. As usual, we are looking at a packed release with a wide variety of improvements and new features. Overall, 174 people contributed to this release completing 18 FLIPS and 700+ issues. Thank you!
Let’s dive into the highlights.
Towards a Streaming Lakehouse # Flink SQL Improvements # Introduce Flink JDBC Driver For SQL Gateway # Flink 1.18 comes with a JDBC Driver for the Flink SQL Gateway. So, you can now use any SQL Client that supports JDBC to interact with your tables via Flink SQL. Here is an example using SQLLine.
sqlline> !connect jdbc:flink://localhost:8083 sqlline version 1.12.0 sqlline> !connect jdbc:flink://localhost:8083 Enter username for jdbc:flink://localhost:8083: Enter password for jdbc:flink://localhost:8083: 0: jdbc:flink://localhost:8083> CREATE TABLE T( . . . . . . . . . . . . . . .)> a INT, . . . . . . . . . . . . . . .)> b VARCHAR(10) . . . . . . . . . . . . . . .)> ) WITH ( . . . . . . . . . . . . . . .)> 'connector' = 'filesystem', . . . . . . . . . . . . . . .)> 'path' = 'file:///tmp/T.csv', . . . . . . . . . . . . . . .)> 'format' = 'csv' . . . . . . . . . . . . . . .)> ); No rows affected (0.122 seconds) 0: jdbc:flink://localhost:8083> INSERT INTO T VALUES (1, 'Hi'), (2, 'Hello'); +----------------------------------+ | job id | +----------------------------------+ | fbade1ab4450fc57ebd5269fdf60dcfd | +----------------------------------+ 1 row selected (1.282 seconds) 0: jdbc:flink://localhost:8083> SELECT * FROM T; +---+-------+ | a | b | +---+-------+ | 1 | Hi | | 2 | Hello | +---+-------+ 2 rows selected (1.955 seconds) 0: jdbc:flink://localhost:8083> More Information
Documentation FLIP-293: Introduce Flink Jdbc Driver For SQL Gateway Stored Procedure Support for Flink Connectors # Stored procedures have been an indispensable tool in traditional databases, offering a convenient way to encapsulate complex logic for data manipulation and administrative tasks. They also offer the potential for enhanced performance, since they can trigger the handling of data operations directly within an external database. Other popular data systems like Trino and Iceberg automate and simplify common maintenance tasks into small sets of procedures, which greatly reduces users’ administrative burden.
This new update primarily targets developers of Flink connectors, who can now predefine custom stored procedures into connectors via the Catalog interface. The primary benefit to users is that connector-specific tasks that previously may have required writing custom Flink code can now be replaced with simple calls that encapsulate, standardize, and potentially optimize the underlying operations. Users can execute procedures using the familiar CALL syntax, and discover a connector’s available procedures with SHOW PROCEDURES. Stored procedures within connectors improves the extensibility of Flink’s SQL and Table APIs, and should unlock smoother data access and management for users.
Users can use CALL to directly invoke built-in stored procedures provided by their catalog. For the built-in stored procedures of a given catalog, please refer to that catalog’s documentation. For example, when using the Apache Paimon catalog, you can use a stored procedure to trigger compaction for a table.
CREATE TABLE \`paimon\`.\`default\`.\`T\` ( id BIGINT PRIMARY KEY NOT ENFORCED, dt STRING, -- format 'yyyy-MM-dd' v STRING ); -- use catalog before call procedures USE CATALOG \`paimon\`; -- compact the whole table using call statement CALL sys.compact('default.T'); More Information
Documentation FLIP-311: Support Call Stored Procedure Extended DDL Support # From this release onwards, Flink supports
REPLACE TABLE AS SELECT and CREATE OR REPLACE TABLE AS SELECT. Both of these commands, as well as the previously supported CREATE TABLE AS, now support atomicity, provided the underlying connector also supports it.
Moreover, Apache Flink now supports TRUNCATE TABLE in batch execution mode. As before, the underlying connector needs to implement and provide this capability.
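As a rough illustration of these statements (the table names are made up for this example, and atomic behaviour depends on the underlying connector):
-- recreate a derived table from the result of a query; with a supporting connector this is atomic
CREATE OR REPLACE TABLE sales_report AS
SELECT region, SUM(amount) AS total_amount FROM sales GROUP BY region;
-- remove all rows from the table (batch execution mode only)
TRUNCATE TABLE sales_report;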
And, finally, we have also implemented support for adding, dropping and listing partitions via
ALTER TABLE ADD PARTITION ALTER TABLE DROP PARTITION SHOW PARTITIONS More Information
Documentation on TRUNCATE Documentation on CREATE OR REPLACE Documentation on ALTER TABLE FLIP-302: Support TRUNCATE TABLE statement in batch mode FLIP-303: Support REPLACE TABLE AS SELECT statement FLIP-305: Support atomic for CREATE TABLE AS SELECT(CTAS) statement Time Traveling # Flink supports the time travel SQL syntax for querying historical versions of data that allows users to specify a point in time and retrieve the data and schema of a table as it appeared at that time. With time travel, users can easily analyze and compare historical versions of data.
For example, a user can query a table at a specified point in time with the following statement:
-- Query the table \`tb\` for data on November 11, 2022 SELECT * FROM tb FOR SYSTEM_TIME AS OF TIMESTAMP '2022-11-11 00:00:00'; More information
Documentation FLIP-308: Support Time Travel Streaming Execution Improvements # Support Operator-Level State TTL in Table API & SQL # Starting from Flink 1.18, Table API and SQL users can set state time-to-live (TTL) individually for stateful operators. This means that for scenarios like stream regular joins, users can now set different TTLs for the left and right streams. In previous versions, state expiration could only be controlled at the pipeline level using the configuration table.exec.state.ttl. With the introduction of operator-level state retention, users can now optimize resource usage according to their specific requirements.
More Information
Documentation FLIP-292: Enhance COMPILED PLAN to support operator-level state TTL configuration Watermark Alignment and Idleness Detection in SQL # You can now configure watermark alignment and source idleness timeouts in pure SQL via hints. Previously, these features were only available in the DataStream API.
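A sketch of what this can look like using the dynamic table options hint (the table name is made up, and the option names follow FLIP-296, so double-check them against the documentation of your Flink version and connector):
SELECT * FROM orders
/*+ OPTIONS('scan.watermark.alignment.group' = 'orders-group', 'scan.watermark.alignment.max-drift' = '1min', 'scan.watermark.idle-timeout' = '30s') */;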
More Information
Documentation FLIP-296: Extend watermark-related features for SQL Batch Execution Improvements # Hybrid Shuffle supports Remote Storage # Hybrid Shuffle now supports storing the shuffle data in remote storage. The remote storage path can be configured with the option taskmanager.network.hybrid-shuffle.remote.path. Hybrid Shuffle also uses less network memory than before by decoupling memory usage from parallelism, improving stability and ease of use.
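For example, in flink-conf.yaml (the bucket path is just a placeholder):
# store hybrid shuffle data in remote storage instead of only on local disks
taskmanager.network.hybrid-shuffle.remote.path: s3://my-bucket/flink-shuffle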
More Information
Documentation FLIP-301: Hybrid Shuffle supports Remote Storage Performance Improvements & TPC-DS Benchmark # In previous releases, the community worked extensively to improve Flink’s batch processing performance, which has led to significant improvements. In this release cycle, community contributors continued to put significant effort into further improving Flink’s batch performance.
Runtime Filter for Flink SQL # Runtime filter is a common method for optimizing Join performance. It is designed to dynamically generate filter conditions for certain Join queries at runtime to reduce the amount of scanned or shuffled data, avoid unnecessary I/O and network transmission, and speed up the query. We introduced runtime filters in Flink 1.18 and verified their effectiveness through the TPC-DS benchmark, observing up to 3x speedups for some queries with this feature enabled.
Operator Fusion Codegen for Flink SQL # Operator Fusion Codegen improves the execution performance of a query by fusing an operator DAG into a single optimized operator that eliminates virtual function calls, leverages CPU registers for intermediate data, and reduces instruction cache misses. As a general technical optimization, we verified its effectiveness through TPC-DS. In version 1.18, only some batch operators (Calc, HashAgg, and HashJoin) support fusion codegen, but they already show significant performance gains on some queries.
Note that both features are disabled by default in Flink 1.18, and the community is looking for feedback from users before enabling them by default. They can be enabled via table.optimizer.runtime-filter.enabled and table.exec.operator-fusion-codegen.enabled, respectively.
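To experiment with them, both options can be switched on per session, for example from the SQL Client:
SET 'table.optimizer.runtime-filter.enabled' = 'true';
SET 'table.exec.operator-fusion-codegen.enabled' = 'true';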
Since Flink 1.16, the Apache Flink Community has been continuously tracking the performance of its batch engine via the TPC-DS benchmarking framework. After significant improvements in Flink 1.17 (dynamic join-reordering, dynamic local aggregations), the two improvements described in the previous sections (operator fusion, runtime filters) lead to a 14% performance improvement compared to Flink 1.17 and a 54% improvement compared to Flink 1.16 on a 10T dataset with partitioned tables.
More Information
FLIP-324: Introduce Runtime Filter for Flink Batch Jobs FLIP-315: Support Operator Fusion Codegen for Flink SQL Benchmarking repository Towards Cloud-Native Elasticity # Elasticity describes the ability of a system to adapt to workload changes in a non-disruptive, ideally automatic manner. It is a defining characteristic of cloud-native systems and for long-running streaming workloads it is particularly important. As such, elasticity improvements are an area of continuous investment in the Apache Flink community. Recent initiatives include the Kubernetes Autoscaler, numerous improvements to rescaling performance and last but not least the Adaptive Scheduler.
The Adaptive Scheduler was first introduced in Flink 1.15 and constitutes a centerpiece of a fully-elastic Apache Flink deployment. At its core, it allows jobs to change their resource requirements and parallelism during runtime. In addition, it also adapts to the available resources in the cluster by only rescaling once the cluster can satisfy the minimum required resources of the job.
Until Flink 1.18, the adaptive scheduler was primarily used in Reactive Mode, which meant that a single job by design would always use all the available resources in the cluster. Please see this blog post on how to autoscale Flink Jobs in Reactive Mode using a Horizontal Pod Autoscaler on Kubernetes.
With Flink 1.18 the adaptive scheduler becomes much more powerful and more widely applicable and is on a trajectory to becoming the default scheduler for streaming workloads on Apache Flink.
Dynamic Fine-Grained Rescaling via REST API # Despite the underlying capabilities of the Adaptive Scheduler, the ability to change the resource requirements of a Job during runtime has not yet been exposed to the end user directly. This changes in Flink 1.18. You can now change the parallelism of any individual task of your job via the Flink Web UI and REST API while the job is running.
Under the hood, Apache Flink performs a regular rescaling operation as soon as the required resources for the new parallelism have been acquired. The rescaling operation is not based on a Savepoint, but on an ordinary, periodic checkpoint, which means it does not introduce any additional snapshot. The rescaling operation happens nearly instantaneously, with very short downtime for jobs with small state size.
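As a hedged sketch, a parallelism change for a single vertex can be requested via the resource-requirements endpoint introduced by FLIP-291 (the job and vertex IDs are placeholders; please verify the exact endpoint and payload shape against the REST API documentation of your Flink version):
$ curl -X PUT http://localhost:8081/jobs/<job-id>/resource-requirements -H 'Content-Type: application/json' -d '{"<vertex-id>": {"parallelism": {"lowerBound": 1, "upperBound": 4}}}'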
In conjunction with the backpressure monitor of the Apache Flink Web UI, it is now easier than ever to find and maintain an efficient, backpressure-free parallelism for each of the tasks:
If a task is very busy (red), you increase the parallelism. If a task is mostly idle (blue), you decrease the parallelism. More Information
Documentation FLIP-291: Externalized Declarative Resource Management Faster Rescaling with RocksDB # Rescaling times when using the RocksDB state backend with incremental checkpoints have improved by about 30% at the 99th percentile.
We increased the potential for parallel download from just downloading state handles in parallel to downloading individual files in parallel.
Furthermore, we deactivated write-ahead-logging for batch-inserting into the temporary RocksDB instances we use for rescaling.
More Information
FLINK-32326 FLINK-32345 Support for Java 17 # Java 17 was released in 2021 and is the latest long-term support (LTS) release of Java with an end-of-life in 2029. So, it was about time that Apache Flink added support for it. What does this mean concretely? As of Flink 1.18, you can now run Apache Flink on Java 17 and the official Docker repository includes an image based on Java 17.
docker pull flink:1.18.0-java17 If your cluster runs on Java 17, this, of course, also allows you to use Java 17 features in your user programs and to compile them for a Java 17 target.
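If you also want your own user code compiled for Java 17, the standard Maven compiler setting is enough; this is plain Maven configuration, not anything Flink-specific:
<properties> <!-- build user jars against and for a Java 17 runtime --> <maven.compiler.release>17</maven.compiler.release> </properties>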
More Information
Documentation FLINK-15736 Others Improvements # Production-Ready Watermark Alignment # Supported as “Beta” since Flink 1.16 and Flink 1.17, watermark alignment has been thoroughly tested at scale in the real world. Over that time the community has collected and addressed bugs and performance issues as they were discovered. With the resolution of these issues, we are now happy to recommend watermark alignment for general use.
More Information
FLINK-32548 Pluggable Failure Handling # Apache Flink serves as the foundation for numerous stream processing platforms at companies like Apple, Netflix or Uber. It is also the basis for various commercial stream processing services. Therefore, its ability to easily integrate into the wider ecosystem of these internal as well as vendor platforms becomes increasingly important. The catalog modification listener and pluggable failure handlers fall into this category of improvements.
More Information
Documentation FLIP-304: Pluggable Failure Enrichers SQL Client Quality of Life Improvements # In 1.18 the SQL Client received a collection of usability improvements:
The SQL Client is now more colorful, with the ability to enable SQL syntax highlighting and switch among 7 different color schemes. It is now easier to edit and navigate through very large queries. It is now possible to turn line numbers on and off. More Information
FLIP-189: SQL Client Usability Improvements Apache Pekko instead of Akka # A year ago, Lightbend announced changing the license of future versions of Akka (2.7+) from Apache 2.0 to BSL. It was also announced that Akka 2.6, the version that Apache Flink uses, would receive security updates and critical bug fixes until September of 2023. As September 2023 was approaching, we decided to switch from Akka to Apache Pekko (incubating). Apache Pekko (incubating) is a fork of Akka 2.6.x, prior to the Akka project’s adoption of the Business Source License. Pekko recently released Apache Pekko 1.0.1-incubating, which enabled us to already use it in Flink 1.18 - just in time. While our mid-term plan is to drop the dependency on Akka or Pekko altogether (see FLINK-29281), the switch to Pekko presents a good short-term solution and ensures that the Apache Pekko and Apache Flink Community can address critical bug fixes and security vulnerabilities throughout our software supply chain.
More Information
FLINK-32468 Calcite Upgrade(s) # In Apache Flink 1.18, Apache Calcite was gradually upgraded from 1.29 to 1.32. The immediate benefits of these upgrades are bug fixes, a smarter optimizer, and performance improvements. At the parser level, joins can now be grouped into trees using parentheses (as specified in SQL-92), e.g. SELECT * FROM a JOIN (b JOIN c ON b.x = c.x) ON a.y = c.y; see also CALCITE-35. In addition, the upgrade to Calcite 1.31+ has unblocked the support of Session Windows via Table-Valued Functions (see CALCITE-4865, FLINK-24024) and, as a corollary, the deprecation of the legacy group window aggregations. Due to CALCITE-4861, Flink’s casting behavior has changed slightly. Some corner cases might behave differently now: for example, casting from FLOAT/DOUBLE 9234567891.12 to INT/BIGINT now follows Java overflow behavior.
More Information
FLINK-27998 FLINK-28744 FLINK-29319 Important Deprecations # In preparation for the release of Flink 2.0 next year, the community has decided to officially deprecate multiple APIs that were approaching end of life for a while.
SourceFunction is now officially deprecated and will be dropped in Flink 2.0. If you are still using a connector that is built on top of SourceFunction please migrate it to Source. SinkFunction is not officially deprecated, but it is also approaching end-of-life and will be superseded by SinkV2. Queryable State is now officially deprecated and will be dropped in Flink 2.0. The DataSet API is now officially deprecated. Users are recommended to migrate to the DataStream API with execution mode BATCH. Upgrade Notes # The Flink community tries to ensure that upgrades are as seamless as possible. However, certain changes may require users to make adjustments to certain parts of the program when upgrading to version 1.18. Please refer to the release notes for a comprehensive list of adjustments to make and issues to check during the upgrading process.
List of Contributors # The Apache Flink community would like to express gratitude to all the contributors who made this release possible:
Aitozi, Akinfolami Akin-Alamu, Alain Brown, Aleksandr Pilipenko, Alexander Fedulov, Anton Kalashnikov, Archit Goyal, Bangui Dunn, Benchao Li, BoYiZhang, Chesnay Schepler, Chris Nauroth, Colten Pilgreen, Danny Cranmer, David Christle, David Moravek, Dawid Wysakowicz, Deepyaman Datta, Dian Fu, Dian Qi, Dong Lin, Eric Xiao, Etienne Chauchot, Feng Jin, Ferenc Csaky, Fruzsina Nagy, Gabor Somogyi, Gunnar Morling, Gyula Fora, HaiYang Chen, Hang Ruan, Hangxiang Yu, Hanyu Zheng, Hong Liang Teoh, Hongshun Wang, Huston, Jacky Lau, James Hughes, Jane Chan, Jark Wu, Jayadeep Jayaraman, Jia Liu, JiangXin, Joao Boto, Junrui Lee, Juntao Hu, K.I. (Dennis) Jung, Kaiqi Dong, L, Leomax_Sun, Leonard Xu, Licho, Lijie Wang, Liu Jiangang, Lyn Zhang, Maomao Min, Martijn Visser, Marton Balassi, Mason Chen, Matthew de Detrich, Matthias Pohl, Min, Mingliang Liu, Mohsen Rezaei, Mrart, Mulavar, Nicholas Jiang, Nicolas Fraison, Noah, Panagiotis Garefalakis, Patrick Lucas, Paul Lin, Peter Vary, Piotr Nowojski, Qingsheng Ren, Ran Tao, Rich Bowen, Robert Metzger, Roc Marshal, Roman Khachatryan, Ron, Rui Fan, Ryan Skraba, Samrat002, Sergey Nuyanzin, Sergio Morales, Shammon FY, ShammonFY, Shengkai, Shuiqiang Chen, Stefan Richter, Tartarus0zm, Timo Walther, Tzu-Li (Gordon) Tai, Venkata krishnan Sowrirajan, Wang FeiFan, Weihua Hu, Weijie Guo, Wencong Liu, Xiaogang Zhou, Xintong Song, XuShuai, Yanfei Lei, Yu Chen, Yubin Li, Yun Gao, Yun Tang, Yuxin Tan, Zakelly, Zhanghao Chen, ZhengYiWeng, Zhu Zhu, archzi, baiwuchang, cailiuyang, chenyuzhi, darenwkt, dongwoo kim, eason.qin, felixzh, fengli, frankeshi, fredia, godfrey he, haishui, hehuiyuan, huangxingbo, jiangxin, jiaoqingbo, jinfeng, jingge, kevin.cyj, kristoffSC, leixin, leiyanfei, liming.1018, lincoln lee, lincoln.lil, liujiangang, liuyongvs, luoyuxia, maigeiye, mas-chen, novakov-alexey, oleksandr.nitavskyi, pegasas, sammieliu, shammon, shammon FY, shuiqiangchen, slfan1989, sunxia, tison, tsreaper, wangfeifan, wangkang, whjshj, wuqqq, xiangyu0xf, xincheng.ljr, xmzhou, xuyu, xzw, yuanweining, yuchengxin, yunfengzhou-hub, yunhong, yuxia Luo, yuxiqian, zekai-li, zhangmang, zhengyunhong.zyh, zzzzzzzs, 沈嘉琦
`}),e.add({id:45,href:"/2023/09/19/stateful-functions-3.3.0-release-announcement/",title:"Stateful Functions 3.3.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is excited to announce the release of Stateful Functions 3.3.0!
Stateful Functions is a cross-platform stack for building Stateful Serverless applications, making it radically simpler to develop scalable, consistent, and elastic distributed applications. This new release upgrades the Flink runtime to 1.16.2.
The binary distribution and source artifacts are now available on the updated Downloads page of the Flink website, and the most recent Java SDK, Python SDK, GoLang SDK and JavaScript SDK distributions are available on Maven, PyPI, GitHub, and npm respectively. You can also find official StateFun Docker images of the new version on Dockerhub.
For more details, check the complete release notes and the updated documentation. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA!
New Features # Fixed CVE-2023-41834 # Stateful Functions versions 3.1.0, 3.1.1 and 3.2.0 allowed HTTP header injection due to Improper Neutralization of CRLF Sequences. Attackers could potentially inject malicious content into the HTTP response that is sent to the user. This could include injecting a fake login form or other phishing content, or injecting malicious JavaScript code that can steal user credentials or perform other malicious actions on the user’s behalf.
Stateful Functions 3.3.0 has fixed this security vulnerability. More details can be found on the Security page.
Upgraded Flink dependency to 1.16.2 # Stateful Functions 3.3.0 runtime uses Flink 1.16.2 underneath. This means that Stateful Functions benefits from the latest improvements and stabilisations that went into Flink. For more information see Flink’s release announcement.
Release Notes # Please review the release notes for a detailed list of changes and new features if you plan to upgrade your setup to Stateful Functions 3.3.0.
List of Contributors # Till Rohrmann, Mingmin Xu, Igal Shilman, Martijn Visser, Chesnay Schepler, SiddiqueAhmad, Galen Warren, Seth Wiesman, FilKarnicki, Tzu-Li (Gordon) Tai
If you’d like to get involved, we’re always looking for new contributors.
`}),e.add({id:46,href:"/2023/08/15/apache-flink-kubernetes-operator-1.6.0-release-announcement/",title:"Apache Flink Kubernetes Operator 1.6.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is excited to announce the release of Flink Kubernetes Operator 1.6.0! The release features a large number of improvements all across the operator.
We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA! We hope you like the new release and we’d be eager to learn about your experience with it.
Highlights # Improved and simplified rollback mechanism # Previously, the rollback mechanism had some serious limitations, always requiring the presence of HA metadata. This prevented rollbacks in many cases, for instance when the new application failed terminally after the upgrade.
1.6.0 introduces several core improvements to the rollback mechanism to leverage the robust upgrade flow and cover a much wider range of failure scenarios.
Experimental support for Flink 1.18 and in-place rescaling # Flink 1.18 introduces a new endpoint as part of FLIP-291 that allows users to rescale operators (job vertices) through the REST API. The operator now has built-in support for applying vertex parallelism overrides through the REST API to reduce downtime.
This feature enables the autoscaler to execute very quick scale up/down actions when used with Flink 1.18.
In-place scaling is only available when the job uses the adaptive scheduler (jobmanager.scheduler: adaptive).
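A minimal sketch of enabling this through the deployment spec (field names follow the FlinkDeployment CRD; unrelated parts of the spec are omitted):
spec:
  flinkVersion: v1_18
  flinkConfiguration:
    # in-place rescaling requires the adaptive scheduler
    jobmanager.scheduler: adaptive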
Namespace and Flink Version specific config defaults # The operator now supports setting default configuration on a per-namespace and per-Flink-version level. This allows users, for example, to set config defaults differently for Flink 1.18 (such as enabling the adaptive scheduler by default), or to use different reconciliation/operator settings for different namespaces.
Syntax:
# Version Specific Defaults kubernetes.operator.default-configuration.flink-version.v1_17.key: value # Namespace Specific Defaults kubernetes.operator.default-configuration.namespace.ns1.key: value JobManager startup probe for more robust upgrades # The operator now automatically applies a startup probe for the JobManager deployments allowing the reconciler to better detect startup failures. This further hardens the upgrade mechanism and drastically reduces the number of cases where manual intervention is necessary from users.
Flink client upgrade to 1.17 # We have upgraded the Flink client libraries used by the operator to the 1.17.1 release which helped simplify the build and packaging significantly due to the improvements in the flink-kubernetes module shading.
General Autoscaler Improvements # The release also contains several improvements to the autoscaler module. This includes improved metrics tracking, better observability and less noisy logging.
We also reworked the way parallelism overrides are applied. Instead of changing the spec, parallelism overrides are now applied on the fly on top of the user provided spec. This allows us to seamlessly carry over the autoscaler settings even when the spec changes.
Release Resources # The source artifacts and helm chart are available on the Downloads page of the Flink website. You can easily try out the new features shipped in the official 1.6.0 release by adding the Helm chart to your own local registry:
$ helm repo add flink-kubernetes-operator-1.6.0 https://archive.apache.org/dist/flink/flink-kubernetes-operator-1.6.0/ $ helm install flink-kubernetes-operator flink-kubernetes-operator-1.6.0/flink-kubernetes-operator --set webhook.create=false You can also find official Kubernetes Operator Docker images of the new version on Dockerhub.
For more details, check the updated documentation and the release notes. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # Alexander Fedulov, ConradJam, Fangbin Sun, Gyula Fora, James Busche, Mate Czagany, Matyas Orhidi, Maximilian Michels, Nicolas Fraison, Oleksandr Nitavskyi, Tamir Sagi, Thomas, Xin Hao, Xingcan Cui, Daren Wong, Fabio Wanner, kenankule, llussy, yangjf2019,
`}),e.add({id:47,href:"/2023/08/04/announcing-three-new-apache-flink-connectors-the-new-connector-versioning-strategy-and-externalization/",title:"Announcing three new Apache Flink connectors, the new connector versioning strategy and externalization",section:"Flink Blog",content:` New connectors # We’re excited to announce that Apache Flink now supports three new connectors: Amazon DynamoDB, MongoDB and OpenSearch! The connectors are available for both the DataStream and Table/SQL APIs.
Amazon DynamoDB - This connector includes a sink that provides at-least-once delivery guarantees. MongoDB connector - This connector includes a source and sink that provide at-least-once guarantees. OpenSearch sink - This connector includes a sink that provides at-least-once guarantees.
Connector | Date Released | Supported Flink Versions
Amazon DynamoDB sink | 2022-12-02 | 1.15+
MongoDB connector | 2023-03-31 | 1.16+
OpenSearch sink | 2022-12-21 | 1.16+
List of Contributors # The Apache Flink community would like to express gratitude to all the new connector contributors:
Andriy Redko, Chesnay Schepler, Danny Cranmer, darenwkt, Hong Liang Teoh, Jiabao Sun, Leonid Ilyevsky, Martijn Visser, nir.tsruya, Sergey Nuyanzin, Weijie Guo, Yuri Gusev, Yuxin Tan
Externalized connectors # The community has externalized connectors from Flink’s main repository. This effort was driven by the following benefits:
Faster releases of connectors: New features can be added more quickly, bugs can be fixed immediately, and we can have faster security patches in case of direct or indirect (through dependencies) security flaws. Adding newer connector features to older Flink versions: By having stable connector APIs, the same connector artifact may be used with different Flink versions. Thus, new features can also immediately be used with older Flink versions. More activity and contributions around connectors: By easing the contribution and development process around connectors, we will see faster development and also more connectors. Documentation: Standardized documentation and user experience for the connectors, regardless of where they are maintained. A faster Flink CI: By not needing to build and test connectors, the Flink CI pipeline will be faster and Flink developers will experience fewer build instabilities (which mostly come from connectors). That should speed up Flink development. The following connectors have been moved to individual repositories:
Kafka / Upsert-Kafka Cassandra Elasticsearch MongoDB OpenSearch RabbitMQ Google Cloud PubSub Pulsar JDBC HBase Hive AWS connectors: Firehose Kinesis DynamoDB Versioning # Connectors continue to use the same Maven dependency groupId and artifactId. However, the JAR artifact version has changed and now uses the format, <major>.<minor>.<patch>-<flink-major>.<flink-minor>. For example, to use the DynamoDB connector for Flink 1.17, add the following dependency to your project:
<dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-connector-dynamodb</artifactId> <version>4.1.0-1.17</version> </dependency> You can find the maven dependency for a connector in the Flink connectors documentation for a specific Flink version. Use the Flink Downloads page to verify which version your connector is compatible with.
Contributing # Similarly, when creating JIRAs to report issues or to contribute to externalized connectors, the Affects Version/s and Fix Version/s fields should now use the connector version instead of a Flink version. The format should be <connector-name>-<major>.<minor>.<patch>. For example, use opensearch-1.1.0 for the OpenSearch connector. All other fields in the JIRA like Component/s remain the same.
For more information on how to contribute to externalized connectors, see the Externalized Connector development wiki.
`}),e.add({id:48,href:"/2023/07/03/sigmod-systems-award-for-apache-flink/",title:"SIGMOD Systems Award for Apache Flink",section:"Flink Blog",content:`Apache Flink received the 2023 SIGMOD Systems Award, which is awarded to an individual or set of individuals to recognize the development of a software or hardware system whose technical contributions have had significant impact on the theory or practice of large-scale data management systems:
The 2023 SIGMOD Systems Award goes to Apache Flink:
“Apache Flink greatly expanded the use of stream data-processing.”
Winning the SIGMOD Systems Award indicates high recognition from academia of Flink’s technological advancement and industry influence. This is a significant achievement by the whole Apache Flink community, including the more than 1,400 contributors and many others who contributed in ways beyond code.
`}),e.add({id:49,href:"/2023/05/25/apache-flink-1.16.2-release-announcement/",title:"Apache Flink 1.16.2 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce the second bug fix release of the Flink 1.16 series.
This release includes 104 bug fixes, vulnerability fixes, and minor improvements for Flink 1.16. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.
We highly recommend all users upgrade to Flink 1.16.2.
Release Artifacts # Maven Dependencies # <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.16.2</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java</artifactId> <version>1.16.2</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients</artifactId> <version>1.16.2</version> </dependency> Binaries # You can find the binaries on the updated Downloads page.
Docker Images # library/flink (official images) apache/flink (ASF repository) PyPi # apache-flink==1.16.2 Release Notes # Bug [FLINK-27246] - Code of method "processElement(Lorg/apache/flink/streaming/runtime/streamrecord/StreamRecord;)V" of class "HashAggregateWithKeys$9211" grows beyond 64 KB [FLINK-27800] - addInEdge check state error [FLINK-27848] - ZooKeeperLeaderElectionDriver keeps writing leader information, using up zxid [FLINK-28786] - Cannot run PyFlink 1.16 on MacOS with M1 chip [FLINK-29852] - Adaptive Scheduler duplicates operators for each parallel instance in the Web UI [FLINK-30461] - Some rocksdb sst files will remain forever [FLINK-30462] - DefaultMultipleComponentLeaderElectionService saves wrong leader session ID [FLINK-30477] - Not properly blocking retries when timeout occurs in AsyncWaitOperator [FLINK-30561] - ChangelogStreamHandleReaderWithCache cause FileNotFoundException [FLINK-30567] - Wrong insert overwrite behavior when the table contains uppercase character with Hive dialect [FLINK-30679] - Can not load the data of hive dim table when project-push-down is introduced [FLINK-30792] - clean up not uploaded state changes after materialization complete [FLINK-30803] - PyFlink mishandles script dependencies [FLINK-30864] - Optional pattern at the start of a group pattern not working [FLINK-30876] - Fix ResetTransformationProcessor don't reset the transformation of ExecNode in BatchExecMultiInput.rootNode [FLINK-30881] - Crictl/Minikube version mismatch causes errors in k8s setup [FLINK-30885] - Optional group pattern starts with non-optional looping pattern get wrong result on followed-by [FLINK-30917] - The user configured max parallelism does not take effect when using adaptive batch scheduler [FLINK-30989] - Configuration table.exec.spill-compression.block-size not take effect in batch job [FLINK-31017] - Early-started partial match timeout not yield completed matches [FLINK-31041] - Build up of pending global failures causes JM instability [FLINK-31042] - AfterMatchSkipStrategy not working on notFollowedBy ended pattern [FLINK-31043] - KeyError exception is thrown in CachedMapState [FLINK-31077] - Trigger checkpoint failed but it were shown as COMPLETED by rest API [FLINK-31083] - Python ProcessFunction with OutputTag cannot be reused [FLINK-31099] - Chained WindowOperator throws NPE in PyFlink ThreadMode [FLINK-31131] - The INITIALIZING of ExecutionState is missed in the state_machine doc [FLINK-31162] - Avoid setting private tokens to AM container context when kerberos delegation token fetch is disabled [FLINK-31182] - CompiledPlan cannot deserialize BridgingSqlFunction with MissingTypeStrategy [FLINK-31183] - Flink Kinesis EFO Consumer can fail to stop gracefully [FLINK-31185] - Python BroadcastProcessFunction not support side output [FLINK-31272] - Duplicate operators appear in the StreamGraph for Python DataStream API jobs [FLINK-31273] - Left join with IS_NULL filter be wrongly pushed down and get wrong join results [FLINK-31283] - Correct the description of building from source with scala version [FLINK-31286] - Python processes are still alive when shutting down a session cluster directly without stopping the jobs [FLINK-31293] - Request memory segment from LocalBufferPool may hanging forever. 
[FLINK-31305] - KafkaWriter doesn't wait for errors for in-flight records before completing flush [FLINK-31319] - Kafka new source partitionDiscoveryIntervalMs=0 cause bounded source can not quit [FLINK-31346] - Batch shuffle IO scheduler does not throw TimeoutException if numRequestedBuffers is greater than 0 [FLINK-31386] - Fix the potential deadlock issue of blocking shuffle [FLINK-31414] - exceptions in the alignment timer are ignored [FLINK-31437] - Wrong key 'lookup.cache.caching-missing-key' in connector documentation [FLINK-31478] - TypeError: a bytes-like object is required, not 'JavaList' is thrown when ds.execute_and_collect() is called on a KeyedStream [FLINK-31503] - "org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype" is thrown when executing Python UDFs in SQL Client [FLINK-31588] - The unaligned checkpoint type is wrong at subtask level [FLINK-31632] - watermark aligned idle source can't resume [FLINK-31652] - Flink should handle the delete event if the pod was deleted while pending [FLINK-31653] - Using\`if\` statement for a string subtype of the row type may meet npe in code generated by codegen [FLINK-31657] - ConfigurationInfo generates incorrect openapi schema [FLINK-31670] - ElasticSearch connector's document was not incorrect linked to external repo [FLINK-31683] - Align the outdated Chinese filesystem connector docs [FLINK-31690] - The current key is not set for KeyedCoProcessOperator [FLINK-31707] - Constant string cannot be used as input arguments of Pandas UDAF [FLINK-31743] - Avoid relocating the RocksDB's log failure when filename exceeds 255 characters [FLINK-31763] - Convert requested buffers to overdraft buffers when pool size is decreased [FLINK-31959] - Correct the unaligned checkpoint type at checkpoint level [FLINK-31963] - java.lang.ArrayIndexOutOfBoundsException when scaling down with unaligned checkpoints [FLINK-32010] - KubernetesLeaderRetrievalDriver always waits for lease update to resolve leadership [FLINK-32027] - Batch jobs could hang at shuffle phase when max parallelism is really large [FLINK-32029] - FutureUtils.handleUncaughtException swallows exceptions that are caused by the exception handler code Improvement [FLINK-25874] - PyFlink package dependencies conflict [FLINK-29729] - Fix credential info configured in flink-conf.yaml is lost during creating ParquetReader [FLINK-30962] - Improve error messaging when launching py4j gateway server [FLINK-31031] - Disable the output buffer of Python process to make it more convenient for interactive users [FLINK-31227] - Remove 'scala version' from file sink modules [FLINK-31651] - Improve logging of granting/revoking leadership in JobMasterServiceLeadershipRunner to INFO level [FLINK-31692] - Integrate MongoDB connector docs into Flink website [FLINK-31703] - Update Flink docs for AWS v4.1.0 [FLINK-31764] - Get rid of numberOfRequestedOverdraftMemorySegments in LocalBufferPool [FLINK-31779] - Track stable branch of externalized connector instead of specific release tag [FLINK-31799] - Python connector download link should refer to the url defined in externalized repository [FLINK-31984] - Savepoint on S3 should be relocatable if entropy injection is not effective [FLINK-32024] - Short code related to externalized connector retrieve version from its own data yaml `}),e.add({id:50,href:"/2023/05/25/apache-flink-1.17.1-release-announcement/",title:"Apache Flink 1.17.1 Release Announcement",section:"Flink 
Blog",content:`The Apache Flink Community is pleased to announce the first bug fix release of the Flink 1.17 series.
This release includes 75 bug fixes, vulnerability fixes, and minor improvements for Flink 1.17. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.
We highly recommend all users upgrade to Flink 1.17.1.
Release Artifacts # Maven Dependencies # <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.17.1</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java</artifactId> <version>1.17.1</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients</artifactId> <version>1.17.1</version> </dependency> Binaries # You can find the binaries on the updated Downloads page.
Docker Images # library/flink (official images) apache/flink (ASF repository) PyPi # apache-flink==1.17.1 Release Notes # Release Notes - Flink - Version 1.17.1 Bug [FLINK-28786] - Cannot run PyFlink 1.16 on MacOS with M1 chip [FLINK-30989] - Configuration table.exec.spill-compression.block-size not take effect in batch job [FLINK-31131] - The INITIALIZING of ExecutionState is missed in the state_machine doc [FLINK-31165] - Over Agg: The window rank function without order by error in top N query [FLINK-31273] - Left join with IS_NULL filter be wrongly pushed down and get wrong join results [FLINK-31293] - Request memory segment from LocalBufferPool may hanging forever. [FLINK-31305] - KafkaWriter doesn't wait for errors for in-flight records before completing flush [FLINK-31414] - exceptions in the alignment timer are ignored [FLINK-31424] - NullPointer when using StatementSet for multiple sinks [FLINK-31437] - Wrong key 'lookup.cache.caching-missing-key' in connector documentation [FLINK-31478] - TypeError: a bytes-like object is required, not 'JavaList' is thrown when ds.execute_and_collect() is called on a KeyedStream [FLINK-31503] - "org.apache.beam.sdk.options.PipelineOptionsRegistrar: Provider org.apache.beam.sdk.options.DefaultPipelineOptionsRegistrar not a subtype" is thrown when executing Python UDFs in SQL Client [FLINK-31541] - Get metrics in Flink WEB UI error [FLINK-31557] - Metric viewUpdater and reporter task in a SingleThreadScheduledExecutor lead to inaccurate PerSecond related metrics [FLINK-31588] - The unaligned checkpoint type is wrong at subtask level [FLINK-31612] - ClassNotFoundException when using GCS path as HA directory [FLINK-31626] - HsSubpartitionFileReaderImpl should recycle skipped read buffers. [FLINK-31628] - ArrayIndexOutOfBoundsException in watermark processing [FLINK-31632] - watermark aligned idle source can't resume [FLINK-31652] - Flink should handle the delete event if the pod was deleted while pending [FLINK-31653] - Using\`if\` statement for a string subtype of the row type may meet npe in code generated by codegen [FLINK-31657] - ConfigurationInfo generates incorrect openapi schema [FLINK-31670] - ElasticSearch connector's document was not incorrect linked to external repo [FLINK-31683] - Align the outdated Chinese filesystem connector docs [FLINK-31690] - The current key is not set for KeyedCoProcessOperator [FLINK-31707] - Constant string cannot be used as input arguments of Pandas UDAF [FLINK-31711] - OpenAPI spec omits complete-statement request body [FLINK-31733] - Model name clashes in OpenAPI spec [FLINK-31735] - JobDetailsInfo plan incorrectly documented as string [FLINK-31738] - FlameGraphTypeQueryParameter#Type clashes with java.reflect.Type in generated clients [FLINK-31743] - Avoid relocating the RocksDB's log failure when filename exceeds 255 characters [FLINK-31758] - Some external connectors sql client jar has a wrong download url in document [FLINK-31763] - Convert requested buffers to overdraft buffers when pool size is decreased [FLINK-31792] - Errors are not reported in the Web UI [FLINK-31818] - parsing error of 'security.kerberos.access.hadoopFileSystems' in flink-conf.yaml [FLINK-31834] - Azure Warning: no space left on device [FLINK-31839] - Token delegation fails when both flink-s3-fs-hadoop and flink-s3-fs-presto plugins are used [FLINK-31882] - SqlGateway will throw exception when executing DeleteFromFilterOperation [FLINK-31959] - Correct the unaligned checkpoint type at checkpoint level [FLINK-31962] - libssl not found 
when running CI [FLINK-31963] - java.lang.ArrayIndexOutOfBoundsException when scaling down with unaligned checkpoints [FLINK-32010] - KubernetesLeaderRetrievalDriver always waits for lease update to resolve leadership [FLINK-32027] - Batch jobs could hang at shuffle phase when max parallelism is really large [FLINK-32029] - FutureUtils.handleUncaughtException swallows exceptions that are caused by the exception handler code Improvement [FLINK-29542] - Unload.md wrongly writes UNLOAD operation as LOAD operation [FLINK-31398] - Don't wrap with TemporaryClassLoaderContext in OperationExecutor [FLINK-31651] - Improve logging of granting/revoking leadership in JobMasterServiceLeadershipRunner to INFO level [FLINK-31656] - Obtain delegation tokens early to support external file system usage in blob server [FLINK-31692] - Integrate MongoDB connector docs into Flink website [FLINK-31702] - Integrate Opensearch connector docs into Flink docs v1.17/master [FLINK-31703] - Update Flink docs for AWS v4.1.0 [FLINK-31764] - Get rid of numberOfRequestedOverdraftMemorySegments in LocalBufferPool [FLINK-31779] - Track stable branch of externalized connector instead of specific release tag [FLINK-31799] - Python connector download link should refer to the url defined in externalized repository [FLINK-31984] - Savepoint on S3 should be relocatable if entropy injection is not effective [FLINK-32001] - SupportsRowLevelUpdate does not support returning only a part of the columns. [FLINK-32024] - Short code related to externalized connector retrieve version from its own data yaml [FLINK-32099] - create flink_data volume for operations playground [FLINK-32112] - Fix the deprecated state backend sample config in Chinese document Technical Debt [FLINK-31704] - Pulsar docs should be pulled from dedicated branch [FLINK-31705] - Remove Conjars `}),e.add({id:51,href:"/2023/05/17/apache-flink-kubernetes-operator-1.5.0-release-announcement/",title:"Apache Flink Kubernetes Operator 1.5.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is excited to announce the release of Flink Kubernetes Operator 1.5.0! The release focuses on improvements to the job autoscaler that was introduced in the previous release and general operational hardening of the operator.
We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA! We hope you like the new release and we’d be eager to learn about your experience with it.
Autoscaler improvements # Algorithm improvements and better scale down behaviour # The release contains important improvements to the core autoscaling logic. This includes improved stability of scaling decisions (leading to fewer parallelism oscillations) and better handling of slow or idle streams.
There are also some fixes related to output ratio computation and propagation that greatly improve the autoscaler on more complex streaming pipelines.
This version also introduces new metrics for tracking the number of scaling decisions and scaling errors together with some more Kubernetes events to improve the observability of the system.
Improved default configuration # We have simplified and improved some default autoscaler configs for a better out-of-the-box user experience.
Some notable changes:
kubernetes.operator.job.autoscaler.metrics.window: 5m -> 10m
kubernetes.operator.job.autoscaler.target.utilization.boundary: 0.1 -> 0.4
kubernetes.operator.job.autoscaler.scale-up.grace-period: 10m -> 1h
kubernetes.operator.job.autoscaler.history.max.count: 1 -> 3
kubernetes.operator.job.autoscaler.scaling.effectiveness.detection.enabled: true -> false
kubernetes.operator.job.autoscaler.catch-up.duration: 10m -> 5m
kubernetes.operator.job.autoscaler.restart.time: 5m -> 3m
CRD Changes # Ephemeral storage support # Stateful streaming jobs often rely on ephemeral storage to store the working state of the pipeline. Previously it was only possible to change the ephemeral storage size through the pod template mechanism. The 1.5.0 release adds a new field to the task and jobmanager specification that allows configuring this similarly to other resources:
spec:
  ...
  taskManager:
    resource:
      memory: "2048m"
      cpu: 8
      ephemeralStorage: "10G"
Make sure you upgrade the CRD together with the operator deployment to be able to access this feature. For more details check the docs.
General operations # Fabric8 and JOSDK version bump # The operator has been updated to use the latest Java Operator SDK and Fabric8 versions that contain important fixes for production environments.
Health probe and canary resources # Previous operator versions already contained a rudimentary health probe to catch simple startup errors but did not have a good mechanism to catch errors that developed during the lifetime of the running operator.
The 1.5.0 version adds two significant improvements here:
Improved health probe to detect informer errors after startup
Introduce canary resources for detecting general operator problems
The new canary resource feature allows users to deploy special dummy resources (canaries) into selected namespaces. The operator health probe will then monitor that these resources are reconciled in a timely manner. This allows the operator health probe to catch any slowdowns and other general reconciliation issues not covered otherwise.
Canary FlinkDeployment:
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: canary
  labels:
    "flink.apache.org/canary": "true"
Release Resources # The source artifacts and helm chart are available on the Downloads page of the Flink website. You can easily try out the new features shipped in the official 1.5.0 release by adding the Helm chart to your own local registry:
$ helm repo add flink-kubernetes-operator-1.5.0 https://archive.apache.org/dist/flink/flink-kubernetes-operator-1.5.0/
$ helm install flink-kubernetes-operator flink-kubernetes-operator-1.5.0/flink-kubernetes-operator --set webhook.create=false
You can also find official Kubernetes Operator Docker images of the new version on Dockerhub.
For more details, check the updated documentation and the release notes. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # Gyula Fora, Marton Balassi, Mate Czagany, Maximilian Michels, Rafał Boniecki, Rodrigo Meneses, Tamir Sagi, Xin Hao, Xin Li, Zhanghao Chen, Zhenqiu Huang, Daren Wong, Gaurav Miglani, Peter Vary, Tan Kim, yangjf2019
`}),e.add({id:52,href:"/2023/05/12/howto-test-a-batch-source-with-the-new-source-framework/",title:"Howto test a batch source with the new Source framework",section:"Flink Blog",content:` Introduction # The Flink community has recently designed a new Source framework based on FLIP-27. This article is the continuation of the "Howto create a batch source with the new Source framework" article. Now it is time to test the created source! Like the previous article, this one was built while implementing the Flink batch source for Cassandra.
Unit testing the source # Testing the serializers # Example Cassandra SplitSerializer and SplitEnumeratorStateSerializer
In the previous article, we created serializers for Split and SplitEnumeratorState. We should now test them in unit tests. To test serde, we create an object, serialize it using the serializer, deserialize it using the same serializer, and finally assert on the equality of the two objects. Thus, hashCode() and equals() need to be implemented for the serialized objects.
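For illustration, here is a minimal round-trip test sketch; MySplit and MySplitSerializer are placeholder names for your own split and serializer classes, not the actual Cassandra ones:
import static org.junit.jupiter.api.Assertions.assertEquals;
import java.io.IOException;
import java.math.BigInteger;
import org.junit.jupiter.api.Test;

class MySplitSerializerTest {
    @Test
    void testSplitSerdeRoundTrip() throws IOException {
        // MySplit / MySplitSerializer are placeholders for your own classes
        MySplit split = new MySplit(BigInteger.ZERO, BigInteger.valueOf(100));
        MySplitSerializer serializer = new MySplitSerializer();
        byte[] serialized = serializer.serialize(split);
        MySplit deserialized = serializer.deserialize(serializer.getVersion(), serialized);
        // relies on equals()/hashCode() being implemented on the split
        assertEquals(split, deserialized);
    }
}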
Other unit tests # Of course, we also need to unit test low-level processing, such as query building, or any other processing that does not require a running backend.
Integration testing the source # For tests that require a running backend, Flink provides a JUnit5 source test framework. It is composed of different parts gathered in a test suite:
The Flink environment
The backend environment
The checkpointing semantics
The test context
Example Cassandra SourceITCase For the test to be integrated into the Flink CI, the test class must be named *ITCase. It can be named differently if the test belongs somewhere else. The class extends SourceTestSuiteBase. This test suite already provides all the necessary tests (single split, multiple splits, idle reader, etc.). It is targeted at batch and streaming sources, so for our batch source case here, the tests below need to be disabled as they are targeted at streaming sources. They can be disabled by overriding them in the ITCase and annotating them with @Disabled:
testSourceMetrics
testSavepoint
testScaleUp
testScaleDown
testTaskManagerFailure
Of course we can add our own integration test cases, for example tests on limits, tests on low-level splitting, or any test that requires a running backend. But for most cases we only need to provide Flink test environment classes to configure the ITCase:
Flink environment # We add this annotated field to our ITCase and we’re done
@TestEnv MiniClusterTestEnvironment flinkTestEnvironment = new MiniClusterTestEnvironment(); Backend environment # Example Cassandra TestEnvironment
To test the connector we need a backend to run the connector against. This TestEnvironment provides everything related to the backend: the container, its configuration, the session to connect to it, and all the elements bound to the whole test case (table space, initialization requests …)
We add this annotated field to our ITCase
@TestExternalSystem MyBackendTestEnvironment backendTestEnvironment = new MyBackendTestEnvironment(); To integrate with JUnit5, BackendTestEnvironment implements TestResource. This environment is scoped to the test suite, so it is where we set up the backend and shared resources (session, tablespace, etc.) by implementing the startup() and tearDown() methods. For that we advise the use of Testcontainers, which relies on Docker images to provide a real backend instance (not a mock) that is representative for integration tests. Several backends are supported out of the box by Testcontainers. We need to configure the test containers this way:
Redirect container output (error and standard output) to Flink logs
Set the different timeouts to cope with CI server load
Set retry mechanisms for connections, initialization requests, etc., for the same reason
Checkpointing semantics # In big data execution engines, there are two levels of guarantee regarding sources and sinks:
At least once: upon failure and recovery, some records may be reflected multiple times but none will be lost
Exactly once: upon failure and recovery, every record will be reflected exactly once
With the following code we verify that the source supports exactly-once semantics:
@TestSemantics CheckpointingMode[] semantics = new CheckpointingMode[] {CheckpointingMode.EXACTLY_ONCE}; That being said, we could encounter a problem while running the tests: the default assertions in the Flink source test framework assume that the data is read in the same order it was written. This is untrue for most big data backends where ordering is usually not deterministic. To support unordered checks and still use all the framework provided tests, we need to override SourceTestSuiteBase#checkResultWithSemantic in our ITCase:
@Override
protected void checkResultWithSemantic(
        CloseableIterator<Pojo> resultIterator,
        List<List<Pojo>> testData,
        CheckpointingMode semantic,
        Integer limit) {
    if (limit != null) {
        Runnable runnable =
                () ->
                        CollectIteratorAssertions.assertUnordered(resultIterator)
                                .withNumRecordsLimit(limit)
                                .matchesRecordsFromSource(testData, semantic);
        assertThat(runAsync(runnable)).succeedsWithin(DEFAULT_COLLECT_DATA_TIMEOUT);
    } else {
        CollectIteratorAssertions.assertUnordered(resultIterator)
                .matchesRecordsFromSource(testData, semantic);
    }
}
This is a copy-paste of the parent method where CollectIteratorAssertions.assertOrdered() is replaced by CollectIteratorAssertions.assertUnordered().
Test context # Example Cassandra TestContext
The test context provides Flink with means to interact with the backend, like inserting test data, creating tables or constructing the source. It is scoped to the test case (and not to the test suite).
It is linked to the ITCase through a factory of TestContext as shown below.
@TestContext TestContextFactory contextFactory = new TestContextFactory(testEnvironment); TestContext implements DataStreamSourceExternalContext:
We don’t connect to the backend at each test case, so the shared resources such as the session are created by the backend test environment (test suite scoped). They are then passed to the test context via its constructor. It is also in the constructor that we initialize test case backend resources such as the test case table.
close(): drop the created test case resources
getProducedType(): specify the test output type of the source, such as a test Pojo for example
getConnectorJarPaths(): provide a list of attached jars. Most of the time, we can return an empty list as maven already adds the jars to the test classpath
createSource(): here we create the source as a user would have done. It will be provided to the test cases by the Flink test framework
createSourceSplitDataWriter(): here we create an ExternalSystemSplitDataWriter responsible for writing test data, which comes as a list of produced type objects such as defined in getProducedType()
generateTestData(): produce the list of test data that will be given to the ExternalSystemSplitDataWriter. We must make sure that equals() returns false when 2 records of this list belong to different splits. The easiest way to do that is to include the split id in the produced type and make sure equals() and hashCode() are properly implemented to include this field.
Contributing the source to Flink # Lately, the Flink community has externalized all the connectors to external repositories that are sub-repositories of the official Apache Flink repository. This is mainly to decouple the release of Flink from the release of the connectors. To distribute the created source, we need to follow this official wiki page.
Conclusion # This concludes the series of articles about creating a batch source with the new Flink framework. This was needed as, apart from the javadocs, the documentation about testing is missing for now. I hope you enjoyed reading and I hope the Flink community will receive a source PR from you soon :)
`}),e.add({id:53,href:"/2023/05/09/howto-migrate-a-real-life-batch-pipeline-from-the-dataset-api-to-the-datastream-api/",title:"Howto migrate a real-life batch pipeline from the DataSet API to the DataStream API",section:"Flink Blog",content:` Introduction # The Flink community has been deprecating the DataSet API since version 1.12 as part of the work on FLIP-131: Consolidate the user-facing Dataflow SDKs/APIs (and deprecate the DataSet API). This blog article illustrates the migration of a real-life batch DataSet pipeline to a batch DataStream pipeline. All the code presented in this article is available in the tpcds-benchmark-flink repo. The use case shown here is extracted from a broader work comparing the performance of different Flink APIs by implementing TPC-DS queries using these APIs.
What is TPC-DS? # TPC-DS is a decision support benchmark that models several generally applicable aspects of a decision support system. The purpose of TPC-DS benchmarks is to provide relevant, objective performance data of Big Data engines to industry users.
Chosen TPC-DS query # The chosen query for this article is Query3 because it contains the most common analytics operators (filter, join, aggregation, group by, order by, limit). It represents an analytic query on store sales. Its SQL code is presented here:
SELECT dt.d_year, item.i_brand_id brand_id, item.i_brand brand, SUM(ss_ext_sales_price) sum_agg
FROM date_dim dt, store_sales, item
WHERE dt.d_date_sk = store_sales.ss_sold_date_sk
  AND store_sales.ss_item_sk = item.i_item_sk
  AND item.i_manufact_id = 128
  AND dt.d_moy = 11
GROUP BY dt.d_year, item.i_brand, item.i_brand_id
ORDER BY dt.d_year, sum_agg DESC, brand_id
LIMIT 100
The initial DataSet pipeline # The pipeline we are migrating is this batch pipeline that implements the above query using the DataSet API and Row as dataset element type.
Migrating the DataSet pipeline to a DataStream pipeline # Instead of going through all of the code, which is available here, we will rather focus on some key areas of the migration. The code is based on the latest release of Flink at the time this article was written: version 1.16.0.
DataStream is a unified API that allows running pipelines in both batch and streaming modes. To execute a DataStream pipeline in batch mode, it is not enough to set the execution mode in the Flink execution environment; some operations also need to be migrated. Indeed, the DataStream API semantics are those of a streaming pipeline: the arriving data is considered infinite. So, compared to the DataSet API, which operates on finite data, some operations need to be adapted.
Setting the execution environment # We start by moving from ExecutionEnvironment to StreamExecutionEnvironment. Then, as the source in this pipeline is bounded, we can use either the default streaming execution mode or the batch mode. In batch mode the tasks of the job can be separated into stages that can be executed one after another. In streaming mode all tasks need to be running all the time and records are sent to downstream tasks as soon as they are available.
Here we keep the default streaming mode, which gives good performance on this pipeline and would allow running the same pipeline with no change on an unbounded source.
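For reference, a minimal sketch of this setup (class name is illustrative); switching to RuntimeExecutionMode.BATCH would be a one-line change:
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PipelineSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Keep the default STREAMING mode; RuntimeExecutionMode.BATCH would run the bounded pipeline in stages.
        env.setRuntimeMode(RuntimeExecutionMode.STREAMING);
        // ... build the pipeline on env, then call env.execute()
    }
}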
Using the streaming sources and datasets # Sources: DataSource becomes DataStreamSource after the call to env.createInput().
Datasets: DataSet is now DataStream and its subclasses.
Migrating the join operation # The DataStream join operator does not yet support aggregations in batch mode (see FLINK-22587 for details). Basically, the problem is with the trigger of the default GlobalWindow which never fires so the records are never output. We will work around this problem by applying a custom EndOfStream window. It is a window assigner that assigns all the records to a single TimeWindow. So, like with the GlobalWindow, all the records are assigned to the same window except that this window’s trigger is based on the end of the window (which is set to Long.MAX_VALUE). As we are on a bounded source, at some point the watermark will advance to +INFINITY (Long.MAX_VALUE) and will thus cross the end of the time window and consequently fire the trigger and output the records.
Now that we have a working triggering, we need to call a standard join with the Row::join function.
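As an illustration, the migrated join can look like the sketch below; EndOfStreamWindows stands for the custom window assigner just described, and the key field positions are assumptions for the sketch, not the repository's exact code:
import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.types.Row;

public class JoinStep {
    // Illustrative field positions; the real schema indices differ.
    private static final int SS_ITEM_SK = 1;
    private static final int I_ITEM_SK = 0;

    public static DataStream<Row> joinSalesWithItems(DataStream<Row> storeSales, DataStream<Row> items) {
        return storeSales
                .join(items)
                .where(new KeySelector<Row, Long>() {
                    @Override
                    public Long getKey(Row sale) {
                        return (Long) sale.getField(SS_ITEM_SK);
                    }
                })
                .equalTo(new KeySelector<Row, Long>() {
                    @Override
                    public Long getKey(Row item) {
                        return (Long) item.getField(I_ITEM_SK);
                    }
                })
                .window(new EndOfStreamWindows()) // hypothetical custom assigner described above
                .apply(new JoinFunction<Row, Row, Row>() {
                    @Override
                    public Row join(Row sale, Row item) {
                        return Row.join(sale, item);
                    }
                });
    }
}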
Migrating the group by and reduce (sum) operations # The DataStream API no longer has a groupBy() method; we now use the keyBy() method. A downstream aggregation will be applied on elements with the same key exactly as a GroupReduceFunction would have done on a DataSet, except it will not need to materialize the collection of data. Indeed, the following operator is a reducer: the downstream summing operation is still done through a ReduceFunction, but this time the operator reduces the elements incrementally instead of receiving the rows as a Collection. To make the sum, we store the partially aggregated sum in the reduced row. Due to the incremental reduce, we also need to distinguish whether we received an already reduced row (in that case, we read the partially aggregated sum) or a fresh row (in that case we just read the corresponding price field). Also, please note that, as in the join case, we need to specify windowing for the aggregation.
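A minimal sketch of such an incremental reducer, assuming (purely for illustration) that the price sits at a fixed position and that the partial sum is carried in a trailing field that is null on fresh rows:
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.types.Row;

public class SumReducer implements ReduceFunction<Row> {
    // Illustrative positions: the real schema of the joined rows differs.
    private static final int PRICE_POS = 3;   // ss_ext_sales_price
    private static final int SUM_POS = 4;     // partial aggregate, assumed null on fresh rows

    @Override
    public Row reduce(Row left, Row right) {
        Row reduced = Row.copy(left);
        reduced.setField(SUM_POS, partialSum(left) + partialSum(right));
        return reduced;
    }

    // Read the partial sum of an already reduced row, or the price field of a fresh row.
    private static double partialSum(Row row) {
        Object partial = row.getField(SUM_POS);
        return partial != null ? (Double) partial : ((Number) row.getField(PRICE_POS)).doubleValue();
    }
}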
Migrating the order by operation # Sorting the datastream is done by applying a KeyedProcessFunction.
But, as said above, the DataStream semantics are those of a streaming pipeline: the arriving data is considered infinite. As such, we need to “divide” the data to have output times. For that we set a timer to output the resulting data. We set the timer to fire at the end of the EndOfStream window, meaning that the timer will fire at the end of the batch.
To sort the data, we store the incoming rows inside a ListState and sort them at output time, when the timer fires in the onTimer() callback.
Another thing: to be able to use Flink state, we need to key the datastream beforehand, even if there is no group-by key, because Flink state is designed per key. Thus, we key by a fake static key so that there is a single state.
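Below is a hedged sketch of such a sorting function; the key type, the sort field position and the END_OF_WINDOW timestamp are illustrative assumptions, not the article's exact code:
import java.util.ArrayList;
import java.util.List;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.types.Row;
import org.apache.flink.util.Collector;

public class SortFunction extends KeyedProcessFunction<Integer, Row, Row> {
    // End of the (illustrative) EndOfStream window: the timer fires once the bounded input is exhausted.
    private static final long END_OF_WINDOW = Long.MAX_VALUE - 1;
    private static final int YEAR_POS = 0; // illustrative sort field position

    private transient ListState<Row> rows;

    @Override
    public void open(Configuration parameters) {
        rows = getRuntimeContext().getListState(new ListStateDescriptor<>("rows", Row.class));
    }

    @Override
    public void processElement(Row row, Context ctx, Collector<Row> out) throws Exception {
        rows.add(row);                                             // buffer the incoming rows
        ctx.timerService().registerEventTimeTimer(END_OF_WINDOW);  // (re)register the output timer
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Row> out) throws Exception {
        List<Row> buffered = new ArrayList<>();
        rows.get().forEach(buffered::add);
        buffered.sort((r1, r2) -> ((Integer) r1.getField(YEAR_POS)).compareTo((Integer) r2.getField(YEAR_POS)));
        buffered.forEach(out::collect);
        rows.clear();
    }
}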
Migrating the limit operation # As all the elements of the DataStream were keyed by the same "0" key, they are kept in the same "group". So we can implement the SQL LIMIT with a ProcessFunction with a counter that will output only the first 100 elements.
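A minimal sketch of that counter-based limit, shown as a KeyedProcessFunction so the counter can live in keyed state (class and field names are illustrative):
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.types.Row;
import org.apache.flink.util.Collector;

public class LimitFunction extends KeyedProcessFunction<Integer, Row, Row> {
    private static final int LIMIT = 100; // matches the query's LIMIT 100
    private transient ValueState<Integer> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Integer.class));
    }

    @Override
    public void processElement(Row row, Context ctx, Collector<Row> out) throws Exception {
        int seen = count.value() == null ? 0 : count.value();
        if (seen < LIMIT) {
            out.collect(row);          // emit only the first LIMIT rows
            count.update(seen + 1);
        }
    }
}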
Migrating the sink operation # As with sources, there were big changes in sinks with recent versions of Flink. We now use the Sink interface, which requires an Encoder. But the resulting code is very similar to the one using the DataSet API. It’s only that the Encoder#encode() method writes bytes where TextOutputFormat.TextFormatter#format() wrote Strings.
Conclusion # As you saw for the migration of the join operation, the new unified DataStream API has some limitations left in batch mode. In addition, the order by and limit resulting code is quite manual and requires the help of the Flink state API for the migration. For all these reasons, the Flink community recommends using Flink SQL for batch pipelines. It results in much simpler code, good performance and out-of-the-box analytics capabilities. You can find the equivalent Query3 code that uses the Flink SQL/Table API in the Query3ViaFlinkSQLCSV class.
`}),e.add({id:54,href:"/2023/05/03/howto-create-a-batch-source-with-the-new-source-framework/",title:"Howto create a batch source with the new Source framework",section:"Flink Blog",content:` Introduction # The Flink community has recently designed a new Source framework based on FLIP-27. Some connectors have migrated to this new framework. This article is a how-to for creating a batch source using this new framework. It was built while implementing the Flink batch source for Cassandra. If you are interested in contributing or migrating connectors, this blog post is for you!
Implementing the source components # The source architecture is depicted in the diagrams below:
Source # Example Cassandra Source
The Source interface only does the “glue” between all the other components. Its role is to instantiate all of them and to define the source Boundedness. We also do the source configuration here, along with user configuration validation.
SourceReader # Example Cassandra SourceReader
As shown in the graphic above, the instances of the SourceReader (which we will simply call readers in the rest of this article) run in parallel in task managers to read the actual data, which is divided into splits. Readers request splits from the SplitEnumerator and the resulting splits are assigned to them in return.
Flink provides the SourceReaderBase implementation that takes care of all the threading. Flink also provides a useful extension to this class for most cases: SingleThreadMultiplexSourceReaderBase . This class has the threading model already configured: each SplitReader instance reads splits using one thread (but there are several SplitReader instances that live among task managers).
What we have left to do in the SourceReader class is:
Provide a SplitReader supplier
Create a RecordEmitter
Create the shared resources for the SplitReaders (sessions, etc.). As the SplitReader supplier is created in the SourceReader constructor in a super() call, using a SourceReader factory to create the shared resources and pass them to the supplier is a good idea.
Implement start(): here we should ask the enumerator for our first split
Override close() in the SourceReaderBase parent class to free up any created resources (the shared resources for example)
Implement initializedState() to create a mutable SplitState from a Split
Implement toSplitType() to create a Split from the mutable SplitState
Implement onSplitFinished(): here, as it is a batch source (finite data), we should ask the Enumerator for the next split
Split and SplitState # Example Cassandra Split
The SourceSplit represents a partition of the source data. What defines a split depends on the backend we are reading from. It could be a (partition start, partition end) tuple or an (offset, split size) tuple for example.
In any case, the Split object should be seen as an immutable object: any update to it should be done on the associated SplitState. The split state is the one that will be stored inside the Flink checkpoints. A checkpoint may happen between two fetches for one split. So, if we’re reading a split, we must store in the split state the current state of the reading process. This current state needs to be something serializable (because it will be part of a checkpoint) and something that the backend source can resume from. That way, in case of failover, the reading can be resumed from where it was left off. Thus we ensure there will be no duplicates or lost data.
For example, if the records reading order is deterministic in the backend, then the split state can store the number n of already read records to restart at n+1 after failover.
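As an illustration, here is a hypothetical split over a token range together with a companion split state tracking how many records were already read (all names and fields are assumptions, not the Cassandra implementation):
import java.math.BigInteger;
import org.apache.flink.api.connector.source.SourceSplit;

public class MySplit implements SourceSplit {
    private final BigInteger rangeStart; // immutable description of the partition
    private final BigInteger rangeEnd;

    public MySplit(BigInteger rangeStart, BigInteger rangeEnd) {
        this.rangeStart = rangeStart;
        this.rangeEnd = rangeEnd;
    }

    public BigInteger getRangeStart() { return rangeStart; }
    public BigInteger getRangeEnd() { return rangeEnd; }

    @Override
    public String splitId() {
        return rangeStart + "-" + rangeEnd;
    }
}

// Mutable companion: what goes into checkpoints so an interrupted read can resume after failover.
class MySplitState {
    private final MySplit split;
    private long recordsAlreadyRead; // resume reading at recordsAlreadyRead + 1

    MySplitState(MySplit split) { this.split = split; }

    MySplit toSplit() { return split; }
    long getRecordsAlreadyRead() { return recordsAlreadyRead; }
    void recordRead() { recordsAlreadyRead++; }
}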
SplitEnumerator and SplitEnumeratorState # Example Cassandra SplitEnumerator and SplitEnumeratorState
The SplitEnumerator is responsible for creating the splits and serving them to the readers. Whenever possible, it is preferable to generate the splits lazily, meaning that each time a reader asks the enumerator for a split, the enumerator generates one on demand and assigns it to the reader. For that we implement SplitEnumerator#handleSplitRequest(). Lazy split generation is preferable to split discovery, in which we pre-generate all the splits and store them waiting to assign them to the readers. Indeed, in some situations, the number of splits can be enormous and consume a lot of memory, which could be problematic in case of straggling readers. The framework offers the ability to act upon reader registration by implementing addReader() but, as we do lazy split generation, we have nothing to do there. In some cases, generating a split is too costly, so we can pre-generate a batch (not all) of splits to amortize this cost. The number/size of batched splits needs to be taken into account to avoid consuming too much memory.
Long story short, the tricky part of the source implementation is splitting the source data. The right equilibrium to find is not to have too many splits (which could lead to too much memory consumption) nor too few (which could lead to sub-optimal parallelism). One good way to reach this equilibrium is to evaluate the size of the source data upfront and allow the user to specify the maximum memory a split will take. That way they can configure this parameter according to the memory available on the task managers. This parameter is optional so the source needs to provide a default value. Also, the source needs to check that the user-provided max-split-size is not too small, which would lead to too many splits. The general rule of thumb is to give the user some freedom but protect them from unwanted behavior. For these safety measures, rigid thresholds don’t work well as the source may start to fail when the thresholds are suddenly exceeded.
For example, if we enforce that the number of splits must be below twice the parallelism and the job is regularly run on a growing table, at some point there will be more and more splits of max-split-size and the threshold will be exceeded. Of course, the size of the source data needs to be evaluated without reading the actual data. For the Cassandra connector it was done like this.
Another important topic is state. If the job manager fails, the split enumerator needs to recover. For that, as for the split, we need to provide a state for the enumerator that will be part of a checkpoint. Upon recovery, the enumerator is reconstructed and receives an enumerator state for recovering its previous state. Upon checkpointing, the enumerator returns its state when SplitEnumerator#snapshotState() is called. The state must contain everything needed to resume where the enumerator was left off after failover. In a lazy split generation scenario, the state will contain everything needed to generate the next split whenever asked to. It can be for example the start offset of the next split, the split size, the number of splits still to generate, etc. But the SplitEnumeratorState must also contain a list of splits, not the list of discovered splits, but a list of splits to reassign. Indeed, whenever a reader fails, if it was assigned splits after the last checkpoint, then the checkpoint will not contain those splits. Consequently, upon restoration, the reader won’t have the splits assigned anymore. There is a callback to deal with that case: addSplitsBack(). There, the splits that were assigned to the failing reader can be put back into the enumerator state for later re-assignment to readers. There is no memory size risk here as the number of splits to reassign is pretty low.
The above topics are the most important ones regarding splitting. There are two methods left to implement: the usual start()/close() methods for resource creation/disposal. Regarding the implementation of start(), the Flink connector framework provides the enumeratorContext#callAsync() utility to run long processing asynchronously, such as split preparation or split discovery (if lazy split generation is impossible). Indeed, the start() method runs in the source coordinator thread, and we don’t want to block it for a long time.
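A hedged sketch of lazy split assignment; MySplit, MyEnumeratorState and its generateNextSplit()/addSplitsToReassign() helpers are illustrative assumptions, not the actual Cassandra code:
import java.io.IOException;
import java.util.List;
import java.util.Optional;
import org.apache.flink.api.connector.source.SplitEnumerator;
import org.apache.flink.api.connector.source.SplitEnumeratorContext;

public class MySplitEnumerator implements SplitEnumerator<MySplit, MyEnumeratorState> {
    private final SplitEnumeratorContext<MySplit> context;
    private final MyEnumeratorState state; // hypothetical: knows how to generate the next split lazily

    public MySplitEnumerator(SplitEnumeratorContext<MySplit> context, MyEnumeratorState state) {
        this.context = context;
        this.state = state;
    }

    @Override
    public void start() {
        // long preparation work (e.g. evaluating the table size) would go through context.callAsync()
    }

    @Override
    public void handleSplitRequest(int subtaskId, String requesterHostname) {
        // Lazy generation: produce one split on demand and assign it to the requesting reader.
        Optional<MySplit> next = state.generateNextSplit(); // hypothetical helper on the enumerator state
        if (next.isPresent()) {
            context.assignSplit(next.get(), subtaskId);
        } else {
            context.signalNoMoreSplits(subtaskId);
        }
    }

    @Override
    public void addSplitsBack(List<MySplit> splits, int subtaskId) {
        state.addSplitsToReassign(splits); // keep the splits of a failed reader for later re-assignment
    }

    @Override
    public void addReader(int subtaskId) {
        // nothing to do: splits are only generated on request
    }

    @Override
    public MyEnumeratorState snapshotState(long checkpointId) {
        return state;
    }

    @Override
    public void close() throws IOException {
        // free backend resources if any
    }
}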
SplitReader # Example Cassandra SplitReader
This class is responsible for reading the actual splits that it receives when the framework calls handleSplitsChanges(). The main part of the split reader is the fetch() implementation, where we read all the splits received and return the read records as a RecordsBySplits object. This object contains a map of the split ids to the belonging records and also the ids of the finished splits. Important points need to be considered:
The fetch call must be non-blocking. If any call in its code is synchronous and potentially long, an escape from the fetch() must be provided. When the framework calls wakeUp(), we should interrupt the fetch, for example by setting an AtomicBoolean. The fetch call needs to be re-entrant: an already read split must not be re-read. We should remove it from the list of splits to read and add its id to the finished splits (along with empty splits) in the RecordsBySplits that we return. It is totally fine for the implementer to exit the fetch() method early. Also, a failure could interrupt the fetch. In both cases the framework will call fetch() again later on. In that case, the fetch method must resume the reading from where it was left off using the split state already discussed. If resuming the read of a split is impossible because of backend constraints, then the only solution is to read splits atomically (either not read the split at all, or read it entirely). That way, in case of an interrupted fetch, nothing will be output and the split can be read again from the beginning at the next fetch call, leading to no duplicates. But if the split is read entirely, there are points to consider:
We should ensure that the total split content (records from the source) fits in memory, for example by specifying a max split size in bytes (see SplitEnumerator)
The split state becomes useless, only a Split class is needed
RecordEmitter # Example Cassandra RecordEmitter
The SplitReader reads records in the form of an intermediary record format that the implementer provides for each record. It can be the raw format returned by the backend or any format allowing the actual record to be extracted afterwards. This format is not the final output format expected by the source. It contains anything needed to do the conversion to the record output format. We need to implement RecordEmitter#emitRecord() to do this conversion. A good pattern here is to initialize the RecordEmitter with a mapping Function. The implementation must be idempotent. Indeed, the method may be interrupted in the middle. In that case, the same set of records will be passed to the record emitter again later.
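A minimal sketch of such a mapping-based emitter; BackendRow, MyPojo and MySplitState are illustrative placeholders:
import java.util.function.Function;
import org.apache.flink.api.connector.source.SourceOutput;
import org.apache.flink.connector.base.source.reader.RecordEmitter;

public class MyRecordEmitter implements RecordEmitter<BackendRow, MyPojo, MySplitState> {
    private final Function<BackendRow, MyPojo> converter; // stateless mapping keeps emitRecord() idempotent

    public MyRecordEmitter(Function<BackendRow, MyPojo> converter) {
        this.converter = converter;
    }

    @Override
    public void emitRecord(BackendRow element, SourceOutput<MyPojo> output, MySplitState splitState) {
        output.collect(converter.apply(element));
        splitState.recordRead(); // advance the split state so an interrupted fetch can resume
    }
}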
Serializers # Example Cassandra SplitSerializer and SplitEnumeratorStateSerializer
We need to provide singleton serializers for:
Split: splits are serialized when sending them from the enumerator to the reader, and when checkpointing the reader’s current state
SplitEnumeratorState: the serializer is used for the result of SplitEnumerator#snapshotState()
For both, we need to implement SimpleVersionedSerializer. Care needs to be taken at some important points:
Using Java serialization is forbidden in Flink, mainly for migration concerns. We should rather manually write the fields of the objects using ObjectOutputStream. When a class is not supported by the ObjectOutputStream (not String, Integer, Long…), we should write the size of the object in bytes as an Integer and then write the object converted to byte[]. A similar method is used to serialize collections: first write the number of elements of the collection, then serialize all the contained objects. Of course, for deserialization we do the exact same reading in the same order. There can be a lot of splits, so we should cache the OutputStream used in SplitSerializer. We can do so by using: ThreadLocal<DataOutputSerializer> SERIALIZER_CACHE = ThreadLocal.withInitial(() -> new DataOutputSerializer(64));
The initial stream size depends on the size of a split.
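Putting this together, here is a hedged sketch of a split serializer using that cached stream; the MySplit fields written here are the illustrative ones used earlier in this series, not the actual Cassandra ones:
import java.io.IOException;
import java.math.BigInteger;
import org.apache.flink.core.io.SimpleVersionedSerializer;
import org.apache.flink.core.memory.DataInputDeserializer;
import org.apache.flink.core.memory.DataOutputSerializer;

public class MySplitSerializer implements SimpleVersionedSerializer<MySplit> {
    private static final ThreadLocal<DataOutputSerializer> SERIALIZER_CACHE =
            ThreadLocal.withInitial(() -> new DataOutputSerializer(64));

    @Override
    public int getVersion() {
        return 1;
    }

    @Override
    public byte[] serialize(MySplit split) throws IOException {
        DataOutputSerializer out = SERIALIZER_CACHE.get();
        // Manually write the fields (no Java serialization): here the two BigIntegers as length-prefixed byte[].
        writeBigInteger(out, split.getRangeStart());
        writeBigInteger(out, split.getRangeEnd());
        byte[] result = out.getCopyOfBuffer();
        out.clear();
        return result;
    }

    @Override
    public MySplit deserialize(int version, byte[] serialized) throws IOException {
        DataInputDeserializer in = new DataInputDeserializer(serialized);
        return new MySplit(readBigInteger(in), readBigInteger(in));
    }

    private static void writeBigInteger(DataOutputSerializer out, BigInteger value) throws IOException {
        byte[] bytes = value.toByteArray();
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    private static BigInteger readBigInteger(DataInputDeserializer in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new BigInteger(bytes);
    }
}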
Testing the source # For the sake of concision of this article, testing the source will be the subject of the next article. Stay tuned!
Conclusion # This article, gathering field feedback from the implementation, was needed as the javadocs cannot cover all the implementation details of high-performance and maintainable sources. I hope you enjoyed reading and that it gave you the desire to contribute a new connector to the Flink project!
`}),e.add({id:55,href:"/2023/04/19/apache-flink-ml-2.2.0-release-announcement/",title:"Apache Flink ML 2.2.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is excited to announce the release of Flink ML 2.2.0! This release focuses on enriching Flink ML’s feature engineering algorithms. The library now includes 33 feature engineering algorithms, making it a more comprehensive library for feature engineering tasks.
With the addition of these algorithms, we believe Flink ML library is ready for use in production jobs that require feature engineering capabilities, whose input can then be consumed by both offline and online machine learning tasks.
We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA! We hope you like the new release and we’d be eager to learn about your experience with it.
Notable Features # Introduced API and infrastructure for online serving # In machine learning, one of the main goals of model training is to deploy the trained model to perform online inference, where the model server must respond to incoming requests with millisecond-level latency. However, prior releases of Flink ML only supported nearline inference using the Flink runtime, which may not meet the requirements of online inference use-cases.
With FLIP-289, Flink ML now provides an API and infrastructure for users to load a ModelServable from model data generated by an Estimator. This ModelServable can be replicated across multiple model servers to process online inference requests in parallel. As the ModelServable is effectively a UDF that does not rely on Flink runtime, it can also be integrated as a UDF into other serving or processing frameworks to serve the model trained by Flink ML.
As a first step, the LogisticRegressionModelServable has been added to serve the logistic regression model online, and more servables will be added in the future. This new feature enables Flink ML to be used for both offline and online machine learning tasks, making it more versatile for a wider range of use cases.
Added 27 feature engineering algorithms # Flink ML 2.2.0 significantly expanded the coverage of feature engineering algorithms, increasing the number from 6 to 33. Flink ML now covers 28 out of the 33 feature engineering algorithms provided in Spark ML, making it a more comprehensive library for feature engineering tasks.
Feature engineering is a critical step in modern AI infrastructures as it can preprocess data not only for traditional machine learning algorithms like GBT but also for deep learning algorithms and large language models like Transformer, which are increasingly popular. With the addition of these algorithms, we hope Flink ML can be more useful in machine-learning tasks for Flink users.
All feature engineering algorithms can be easily accessed through the drop-down list on the left side of this Flink ML page. For each algorithm, we have provided Python and Java examples to demonstrate how to use them.
Added two production-validated online learning algorithms # Flink ML offers a significant advantage over other machine learning libraries in terms of its ability to perform online learning using Flink’s streaming runtime. To leverage this strength, we implemented two online algorithms in Flink ML and successfully used them in a production machine learning job at Alibaba.
This job involves dynamically clustering similar logs and detecting errors in the logs to help site reliability engineers. By using OnlineStandardScaler and AgglomerativeClustering to standardize and cluster logs in real-time, the job is able to update models more frequently with a much simpler infrastructure setup. We presented this work at Flink Forward Asia last year, and it will soon be integrated into the open-source project SREWorks.
With these online algorithms, Flink ML provides users with the ability to continuously update models using new data in real-time, resulting in more accurate and up-to-date predictions. This can be particularly useful in use cases where data is constantly streaming in, and it’s important to make quick decisions based on the latest available information.
Upgrade Notes # This release is fully backward compatible with Flink ML 2.1. Users should be able to upgrade to Flink ML 2.2.0 without worrying about any incompatibilities or breaking changes.
Release Notes and Resources # Please take a look at the release notes for a detailed list of changes and new features.
The binary distribution and source artifacts are now available on the updated Downloads page of the Flink website, and the most recent distribution of Flink ML Python package is available on PyPI.
List of Contributors # The Apache Flink community would like to thank each one of the contributors that have made this release possible:
Zhipeng Zhang, Dong Lin, Fan Hong, JiangXin, Zsombor Chikan, huangxingbo, taosiyuan163, vacaly, weibozhao, yunfengzhou-hub
`}),e.add({id:56,href:"/2023/03/23/announcing-the-release-of-apache-flink-1.17/",title:"Announcing the Release of Apache Flink 1.17",section:"Flink Blog",content:`The Apache Flink PMC is pleased to announce Apache Flink release 1.17.0. Apache Flink is the leading stream processing standard, and the concept of unified stream and batch data processing is being successfully adopted in more and more companies. Thanks to our excellent community and contributors, Apache Flink continues to grow as a technology and remains one of the most active projects in the Apache Software Foundation. Flink 1.17 had 172 contributors enthusiastically participating and saw the completion of 7 FLIPs and 600+ issues, bringing many exciting new features and improvements to the community.
Towards Streaming Warehouses # In order to achieve greater efficiency in the realm of streaming warehouse, Flink 1.17 contains substantial improvements to both the performance of batch processing and the semantics of streaming processing. These improvements represent a significant stride towards the creation of a more efficient and streamlined data warehouse, capable of processing large quantities of data in real-time.
For batch processing, this release includes several new features and improvements:
Streaming Warehouse API: FLIP-282 introduces the new Delete and Update API in Flink SQL which only works in batch mode. External storage systems like Flink Table Store can implement row-level modification via this new API. The ALTER TABLE syntax is enhanced by including the ability to ADD/MODIFY/DROP columns, primary keys, and watermarks, making it easier for users to maintain their table schema. Batch Execution Improvements: Execution of batch workloads has been significantly improved in Flink 1.17 in terms of performance, stability and usability. Performance-wise, a 26% TPC-DS improvement on a 10T dataset is achieved with strategy and operator optimizations, such as new join reordering and adaptive local hash aggregation, Hive aggregate functions improvements, and the Hybrid Shuffle Mode enhancements. Stability-wise, Speculative Execution now supports all operators, and the Adaptive Batch Scheduler is more robust against data skew. Usability-wise, the tuning effort required for batch workloads has been reduced. The Adaptive Batch Scheduler is now the default scheduler in batch mode. The Hybrid Shuffle Mode is compatible with Speculative Execution and the Adaptive Batch Scheduler, alongside various configuration simplifications. SQL Client/Gateway: Apache Flink 1.17 introduces the “gateway mode” for SQL Client, allowing users to submit SQL queries to a SQL Gateway for enhanced functionality. Users can use SQL statements to manage job lifecycles, including displaying job information and stopping running jobs. This provides a powerful tool for managing Flink jobs. For stream processing, the following features and improvements are realized:
Streaming SQL Semantics: Non-deterministic operations may bring incorrect results or exceptions, which is a challenging topic in streaming SQL. Incorrect optimization plans and functional issues have been fixed, and the experimental feature of PLAN_ADVICE is introduced to inform SQL users of potential correctness risks and optimization suggestions. Checkpoint Improvements: The generic incremental checkpoint improvements enhance the speed and stability of the checkpoint procedure, and the unaligned checkpoint has improved stability under backpressure and is production-ready in Flink 1.17. Users can manually trigger checkpoints with self-defined checkpoint types while a job is running with the newly introduced REST interface for triggering checkpoints. Watermark Alignment Enhancement: Efficient watermark processing directly affects the execution efficiency of event time applications. In Flink 1.17, FLIP-217 introduces an improvement to watermark alignment by aligning data emission across splits within a source operator. This improvement results in more efficient coordination of watermark progress in the source, which in turn mitigates excessive buffering by downstream operators and enhances the overall efficiency of streaming job execution. StateBackend Upgrade: The updated version of FRocksDB to 6.20.3-ververica-2.0 brings improvements to RocksDBStateBackend like sharing memory between slots, and now supports Apple Silicon chipsets like the Mac M1. Batch processing # As a unified stream and batch data processing engine, Flink stands out particularly in the field of stream processing. In order to improve its batch processing capabilities, the community contributors put a lot of effort into improving Flink’s batch performance and ecosystem in version 1.17. This makes it easier for users to build a streaming warehouse based on Flink.
Speculative Execution # Speculative Execution for sinks is now supported. Previously, Speculative Execution was not enabled for sinks to avoid instability or incorrect results. In Flink 1.17, the context of sinks is improved so that sinks, including new sinks and OutputFormat sinks, are aware of the number of attempts. With the number of attempts, sinks are able to isolate the produced data of different attempts of the same subtask, even if the attempts are running at the same time. The FinalizeOnMaster interface is also improved so that OutputFormat sinks can see which attempts are finished and then properly commit the written data. Once a sink can work well with concurrent attempts, it can implement the decorative interface SupportsConcurrentExecutionAttempts so that Speculative Execution is allowed to be performed on it. Some built in sinks are enabled to do Speculative Execution, including DiscardingSink, PrintSinkFunction, PrintSink, FileSink, FileSystemOutputFormat and HiveTableSink.
The slow task detection is improved for Speculative Execution. Previously, it only considered the execution time of tasks when deciding which tasks are slow. It now takes the input data volume of tasks into account. Tasks which have a longer execution time but consume more data may not be considered as slow. This improvement helps to eliminate the negative impacts from data skew on slow task detection.
Adaptive Batch Scheduler # Adaptive Batch Scheduler is now used for batch jobs by default. This scheduler can automatically decide a proper parallelism of each job vertex, based on how much data the vertex processes. It is also the only scheduler which supports speculative execution.
The configuration of Adaptive Batch Scheduler is improved for ease of use. Users no longer need to explicitly set the global default parallelism to -1 to enable automatically deciding parallelism. Instead, the global default parallelism, if set, will be used as the upper bound when deciding the parallelism. The keys of Adaptive Batch Scheduler configuration options are also renamed to be easier to understand.
The capabilities of Adaptive Batch Scheduler are also improved. It now supports evenly distributing data to downstream tasks, based on fine-grained data distribution information. The limitation that the decided parallelism of vertices can only be a power of 2 is no longer needed and therefore removed.
Hybrid Shuffle Mode # Various important improvements to the Hybrid Shuffle Mode are available in this release.
Hybrid Shuffle Mode now supports Adaptive Batch Scheduler and Speculative Execution. Hybrid Shuffle Mode now supports reusing intermediate data when possible, which brings significant performance improvements. The stability is improved to avoid stability issues in large scale production. More details can be found at the Hybrid-Shuffle section of the documentation.
TPC-DS Benchmark # Starting with Flink 1.16, the performance of the Batch engine has continuously been optimized. In Flink 1.16, dynamic partition pruning was introduced, but not all TPC-DS queries could be optimized. In Flink 1.17, the algorithm has been improved, and most of the TPC-DS results are now optimized. In Flink 1.17, a dynamic programming join-reorder algorithm is introduced, which works better and has a larger search space compared to the previous algorithm. The planner can automatically select the appropriate join-reorder algorithm based on the number of joins in a query, so that users no longer need to care about the join-reorder algorithms. (Note: the join-reorder is disabled by default, and you need to enable it when running TPC-DS.) In the operator layer, a dynamic hash local aggregation strategy is introduced, which can dynamically determine, according to the data distribution, whether the local hash aggregation operation is needed to improve performance. In the runtime layer, some unnecessary virtual function calls are removed to speed up the execution. To summarize, Flink 1.17 has a 26% performance improvement compared to Flink 1.16 on a 10T dataset for partitioned tables.
SQL Client / Gateway # Apache Flink 1.17 introduces a new feature called “gateway mode” for the SQL Client, which enhances its functionality by allowing it to connect to a remote gateway and submit SQL queries like it does in embedded mode. This new mode offers users much more convenience when working with the SQL Gateway.
In addition, the SQL Client/SQL Gateway now provides new support for managing job lifecycles through SQL statements. Users can use SQL statements to display all job information stored in the JobManager and stop running jobs using their unique job IDs. With this new feature, SQL Client/Gateway now has almost the same functionality as Flink CLI, making it another powerful tool for managing Flink jobs.
SQL API # Row-Level SQL Delete & Update are becoming more and more important in modern big data workflows. The use cases include deleting a set of rows for regulatory compliance, updating a set of rows for data correction, etc. Many popular engines such as Trino or Hive have supported them. In Flink 1.17, the new Delete & Update API is introduced in Flink, which works in batch mode and is exposed to connectors. Now external storage systems can implement row-level modification via this new API. Furthermore, the ALTER TABLE syntax is extended to include the ability to ADD/MODIFY/DROP columns, primary keys, and watermarks. These enhanced capabilities provide users with the flexibility to maintain their table schema metadata according to their needs.
Hive Compatibility # Apache Flink 1.17 brings new improvements to the Hive table sink, making it more efficient than ever before. In previous versions, the Hive table sink only supported automatic file compaction in streaming mode, but not in batch mode. In Flink 1.17, the Hive table sink can now automatically compact newly written files in batch mode as well. This feature can greatly reduce the number of small files. Also, for using Hive built-in functions via HiveModule, Flink introduces several native Hive aggregation functions including SUM/COUNT/AVG/MIN/MAX to HiveModule. These functions can be executed using the hash-based aggregation operator and then bring significant performance improvements.
Streaming Processing # In Flink 1.17, difficult Streaming SQL semantics and correctness issues are addressed, checkpoint performance is optimized, watermark alignment is enhanced, the Streaming FileSink expands its ABFS (Azure Blob Filesystem) support, and Calcite and FRocksDB have been upgraded to newer versions. These improvements further enhance the capabilities of Flink in the field of stream processing.
Streaming SQL Semantics # In terms of correctness and semantic enhancement, Flink 1.17 introduces the experimental feature PLAN_ADVICE that detects potential correctness risks and provides optimization suggestions. For example, if an NDU (Non-deterministic Updates) issue is detected by EXPLAIN PLAN_ADVICE, the optimizer will append the advice at the end of the physical plan, tag the advice id on the relational nodes of the related operations, and recommend that users update their configurations accordingly. By providing users with this specific advice, the optimizer can help them improve the accuracy and reliability of their query results.
== Optimized Physical Plan With Advice == ... advice[1]: [WARNING] The column(s): day(generated by non-deterministic function: CURRENT_TIMESTAMP ) can not satisfy the determinism requirement for correctly processing update message('UB'/'UA'/'D' in changelogMode, not 'I' only), this usually happens when input node has no upsertKey(upsertKeys=[{}]) or current node outputs non-deterministic update messages. Please consider removing these non-deterministic columns or making them deterministic by using deterministic functions. The PLAN_ADVICE also helps users improve the performance and efficiency of their queries, for example when a GroupAggregate operation is detected that can be optimized to a more efficient local-global aggregation. By providing users with this specific advice for optimization, the optimizer enables users to easily improve the performance and efficiency of their queries.
== Optimized Physical Plan With Advice == ... advice[1]: [ADVICE] You might want to enable local-global two-phase optimization by configuring ('table.optimizer.agg-phase-strategy' to 'AUTO'). In addition, Flink 1.17 resolved several incorrect plan optimizations which led to incorrect results reported in FLINK-29849, FLINK-30006, and FLINK-30841.
Checkpoint Improvements # Generic Incremental Checkpoint (GIC) aims to improve the speed and stability of the checkpoint procedure. Some experimental results in the WordCount case are shown below. More details can be found in this blog post.
Table1: Benefits after enabling GIC in WordCount case
Table2: Costs after enabling GIC in WordCount case
Unaligned Checkpoints (UC) greatly increase the completed ratio of checkpoints under backpressure. The previous UC implementation would write many small files which may cause high load for the namenode of HDFS. In this release, this problem is resolved to make UC more usable in the production environment.
In 1.17, a REST API is provided so that users can manually trigger checkpoints with a self-defined checkpoint type while a job is running. For example, for a job running with incremental checkpoint, users can trigger a full checkpoint periodically or manually to break the incremental checkpoint chain to avoid referring to files from a long time ago.
Watermark Alignment Support # In earlier versions, FLIP-182 proposed a solution called watermark alignment to tackle the issue of data skew in event time applications caused by imbalanced sources. However, it had a limitation that the source parallelism had to match the number of splits. This was because a source operator with multiple splits might need to buffer a considerable amount of data if one split emitted data faster than another. To address this limitation, Flink 1.17 introduced FLIP-217, which enhances watermark alignment to align data emission across splits within a source operator while considering watermark boundaries. This enhancement ensures more coordinated watermark progress in the source, preventing downstream operators from buffering excessive data and improving the execution efficiency of streaming jobs.
Streaming FileSink Expansion # Following the addition of ABFS support, the FileSink is now able to function in streaming mode with a total of five different filesystems: HDFS, S3, OSS, ABFS, and Local. This expansion effectively covers the majority of main filesystems, providing a comprehensive range of options and increased versatility for users.
RocksDBStateBackend Upgrade # This release has updated the version of FRocksDB to 6.20.3-ververica-2.0 which brings improvements for RocksDBStateBackend:
Support building FRocksDB Java on Apple Silicon chipsets, such as Mac M1 and M2
Improve the performance of the compaction filter by avoiding expensive ToString() calls
Upgrade the ZLIB version of FRocksDB to avoid memory corruption
Add the periodic_compaction_seconds option to RocksJava
Please see FLINK-30836 for more details.
This release also widens the scope of sharing memory between slots to TaskManager, which can help to increase the memory efficiency if the memory usage of slots in a TaskManager is uneven. Furthermore, it can reduce the overall memory consumption at the expense of resource isolation after tuning. Read more about state.backend.rocksdb.memory.fixed-per-tm configuration.
Calcite Upgrade # Flink 1.17 is upgraded to Calcite version 1.29.0 to improve the performance and efficiency of the Flink SQL system. Flink 1.16 uses Calcite 1.26.0 which has severe issues with RexNode simplification caused by the SEARCH operator. This leads to wrong data from query optimization as reported in CALCITE-4325 and CALCITE-4352. By upgrading the version of Calcite, Flink can take advantage of its improved performance and new features in Flink SQL processing. This resolves multiple bugs and leads to faster query processing times.
Others # PyFlink # The Flink 1.17 release includes updates to PyFlink, the Python interface for Apache Flink. Notable improvements include support for Python 3.10 and execution capabilities on Apple Silicon chipsets, such as the Mac M1 and M2 computers. Additionally, the release includes minor optimizations that enhance cross-process communication stability between Java and Python processes, enable the specification of data types of Python UDFs via strings to improve usability, and support access to job parameters in Python UDFs. This release focuses on improving PyFlink’s functionality and usability, rather than introducing new major features. However, these enhancements are expected to improve the user experience and facilitate efficient data processing.
Daily Performance Benchmark # In Flink 1.17, daily performance monitoring has been integrated into the #flink-dev-benchmarks Slack channel. This feature is crucial in quickly identifying regressions and ensuring the quality of the code. Once a regression is identified through the Slack channel or the speed center, developers can refer to the guidance provided in the Benchmark’s wiki to address the issue effectively. This feature helps the community take a proactive approach in ensuring system performance, resulting in a better product and increased user satisfaction.
Subtask Level Flame Graph # Starting with Flink 1.17, Flame Graph provides “drill down” visualizations to the task level, which allows users to gain a more detailed understanding of the performance of their tasks. This feature is a significant improvement over previous versions of Flame Graph, as it empowers users to select a subtask of interest and see the corresponding flame graph. By doing so, users can identify specific areas where their tasks may be experiencing performance issues and take steps to address them. This can lead to significant improvements in the overall efficiency and effectiveness of their data processing pipelines.
Generalized Delegation Token Support # Previously, Flink supported Kerberos authentication and Hadoop based tokens. With FLIP-272 being finalized, Flink’s delegation token framework is generalized to make it authentication protocol agnostic. This will allow contributors in the future to add support for non-Hadoop compliant frameworks where the authentication protocol is not based on Kerberos. Additionally, FLIP-211 was implemented which improves Flink’s interactions with Kerberos: It reduces the number of requests that are necessary to exchange delegation tokens in Flink.
Upgrade Notes # The Flink community tries to ensure that upgrades are as seamless as possible. However, certain changes may require users to make adjustments to certain parts of the program when upgrading to version 1.17. Please refer to the release notes for a comprehensive list of adjustments to make and issues to check during the upgrading process.
List of Contributors # The Apache Flink community would like to express gratitude to all the contributors who made this release possible:
Ahmed Hamdy, Aitozi, Aleksandr Pilipenko, Alexander Fedulov, Alexander Preuß, Anton Kalashnikov, Arvid Heise, Bo Cui, Brayno, Carlos Castro, ChangZhuo Chen (陳昌倬), Chen Qin, Chesnay Schepler, Clemens, ConradJam, Danny Cranmer, Dawid Wysakowicz, Dian Fu, Dong Lin, Dongjoon Hyun, Elphas Toringepi, Eric Xiao, Fabian Paul, Ferenc Csaky, Gabor Somogyi, Gen Luo, Gunnar Morling, Gyula Fora, Hangxiang Yu, Hong Liang Teoh, HuangXingBo, Jacky Lau, Jane Chan, Jark Wu, Jiale, Jin, Jing Ge, Jinzhong Li, Joao Boto, John Roesler, Jun He, JunRuiLee, Junrui Lee, Juntao Hu, Krzysztof Chmielewski, Leonard Xu, Licho, Lijie Wang, Mark Canlas, Martijn Visser, MartijnVisser, Martin Liu, Marton Balassi, Mason Chen, Matt, Matthias Pohl, Maximilian Michels, Mingliang Liu, Mulavar, Nico Kruber, Noah, Paul Lin, Peter Huang, Piotr Nowojski, Qing Lim, QingWei, Qingsheng Ren, Rakesh, Ran Tao, Robert Metzger, Roc Marshal, Roman Khachatryan, Ron, Rui Fan, Ryan Skraba, Salva Alcántara, Samrat, Samrat Deb, Samrat002, Sebastian Mattheis, Sergey Nuyanzin, Seth Saperstein, Shengkai, Shuiqiang Chen, Smirnov Alexander, Sriram Ganesh, Steven van Rossum, Tartarus0zm, Timo Walther, Venkata krishnan Sowrirajan, Wei Zhong, Weihua Hu, Weijie Guo, Xianxun Ye, Xintong Song, Yash Mayya, YasuoStudyJava, Yu Chen, Yubin Li, Yufan Sheng, Yun Gao, Yun Tang, Yuxin Tan, Zakelly, Zhanghao Chen, Zhenqiu Huang, Zhu Zhu, ZmmBigdata, bzhaoopenstack, chengshuo.cs, chenxujun, chenyuzhi, chenyuzhi459, chenzihao, dependabot[bot], fanrui, fengli, frankeshi, fredia, godfreyhe, gongzhongqiang, harker2015, hehuiyuan, hiscat, huangxingbo, hunter-cloud09, ifndef-SleePy, jeremyber-aws, jiangjiguang, jingge, kevin.cyj, kristoffSC, kurt, laughingman7743, libowen, lincoln lee, lincoln.lil, liujiangang, liujingmao, liuyongvs, liuzhuang2017, luoyuxia, mas-chen, moqimoqidea, muggleChen, noelo, ouyangwulin, ramkrish86, saikikun, sammieliu, shihong90, shuiqiangchen, snuyanzin, sunxia, sxnan, tison, todd5167, tonyzhu918, wangfeifan, wenbingshen, xuyang, yiksanchan, yunfengzhou-hub, yunhong, yuxia Luo, yuzelin, zhangjingcun, zhangmang, zhengyunhong.zyh, zhouli, zoucao, 沈嘉琦
`}),e.add({id:57,href:"/2023/03/15/apache-flink-1.15.4-release-announcement/",title:"Apache Flink 1.15.4 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce the fourth bug fix release of the Flink 1.15 series.
This release includes 53 bug fixes, vulnerability fixes, and minor improvements for Flink 1.15. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.
We highly recommend all users upgrade to Flink 1.15.4.
Release Artifacts # Maven Dependencies # <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.15.4</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java</artifactId> <version>1.15.4</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients</artifactId> <version>1.15.4</version> </dependency> Binaries # You can find the binaries on the updated Downloads page.
Docker Images # library/flink (official images) apache/flink (ASF repository) PyPi # apache-flink==1.15.4 Release Notes # Bug [FLINK-27341] - TaskManager running together with JobManager are bind to 127.0.0.1 [FLINK-27800] - addInEdge check state error [FLINK-27944] - IO metrics collision happens if a task has union inputs [FLINK-28526] - Fail to lateral join with UDTF from Table with timstamp column [FLINK-28695] - Fail to send partition request to restarted taskmanager [FLINK-28742] - Table.to_pandas fails with lit("xxx") [FLINK-28863] - Snapshot result of RocksDB native savepoint should have empty shared-state [FLINK-29231] - PyFlink UDAF produces different results in the same sliding window [FLINK-29234] - Dead lock in DefaultLeaderElectionService [FLINK-30133] - HadoopModuleFactory creates error if the security module cannot be loaded [FLINK-30168] - PyFlink Deserialization Error with Object Array [FLINK-30304] - Possible Deadlock in Kinesis/Firehose/DynamoDB Connector [FLINK-30308] - ClassCastException: class java.io.ObjectStreamClass$Caches$1 cannot be cast to class java.util.Map is showing in the logging when the job shutdown [FLINK-30366] - Python Group Agg failed in cleaning the idle state [FLINK-30461] - Some rocksdb sst files will remain forever [FLINK-30637] - In linux-aarch64 environment, using “is” judgment to match the window type of overwindow have returned incorrect matching results [FLINK-30679] - Can not load the data of hive dim table when project-push-down is introduced [FLINK-30803] - PyFlink mishandles script dependencies [FLINK-30864] - Optional pattern at the start of a group pattern not working [FLINK-30885] - Optional group pattern starts with non-optional looping pattern get wrong result on followed-by [FLINK-31041] - Build up of pending global failures causes JM instability [FLINK-31043] - KeyError exception is thrown in CachedMapState [FLINK-31183] - Flink Kinesis EFO Consumer can fail to stop gracefully [FLINK-31272] - Duplicate operators appear in the StreamGraph for Python DataStream API jobs [FLINK-31283] - Correct the description of building from source with scala version [FLINK-31286] - Python processes are still alive when shutting down a session cluster directly without stopping the jobs Improvement [FLINK-27327] - Add description about changing max parallelism explicitly leads to state incompatibility [FLINK-29155] - Improve default config of grpcServer in Process Mode [FLINK-29639] - Add ResourceId in TransportException for debugging [FLINK-29729] - Fix credential info configured in flink-conf.yaml is lost during creating ParquetReader [FLINK-29966] - Replace and redesign the Python api documentation base [FLINK-30633] - Update AWS SDKv2 to v2.19.14 [FLINK-30724] - Update doc of kafka per-partition watermark to FLIP-27 source [FLINK-30962] - Improve error messaging when launching py4j gateway server [FLINK-31031] - Disable the output buffer of Python process to make it more convenient for interactive users Sub-task [FLINK-30462] - DefaultMultipleComponentLeaderElectionService saves wrong leader session ID `}),e.add({id:58,href:"/2023/02/27/apache-flink-kubernetes-operator-1.4.0-release-announcement/",title:"Apache Flink Kubernetes Operator 1.4.0 Release Announcement",section:"Flink Blog",content:`We are proud to announce the latest stable release of the operator. In addition to the expected stability improvements and fixes, the 1.4.0 release introduces the first version of the long-awaited autoscaler module.
Flink Streaming Job Autoscaler # A highly requested feature for Flink applications is the ability to scale the pipeline based on incoming data load and the utilization of the dataflow. While Flink has already provided some of the required building blocks, this feature has not yet been realized in the open source ecosystem.
With FLIP-271 the community set out to build such an autoscaler component as part of the Kubernetes Operator subproject. The Kubernetes Operator proved to be a great place for the autoscaler module as it already contains all the necessary bits for managing and upgrading production streaming applications.
Fast-forward to the 1.4.0 release, and we now have the first fully functional autoscaler implementation in the operator, ready to be tested and used in production applications. For more detailed information, please refer to the Autoscaler Documentation.
Overview # The autoscaler uses Flink task metrics to effectively and independently scale the job vertices of the streaming pipeline. This removes backpressure from the job to ensure an optimal flow of data at the lowest possible resource usage. All kinds of jobs, including SQL jobs, can be scaled with this method.
The approach is based on "Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows" by Kalavri et al. Shoutout to our fellow Flink community member and committer Vasiliki Kalavri! The metrics used include:
number of pending records (source only); number of partitions (source only); ingestion rate (source only); processing rate; and time spent processing (utilization). The algorithm starts from the sources and recursively computes the required processing capacity for each operator in the pipeline. At the source vertices, the target data rate (processing capacity) is equal to the data rate in Kafka. For other operators, it is computed as the sum of the output data rates of the input (upstream) operators.
Users configure the target utilization percentage of the operators in the pipeline, e.g., keep all operators between 60% and 80% busy. The autoscaler then finds a configuration such that the output rates of all operators match the input rates of all their downstream operators at the targeted utilization; a small sketch of this computation follows below.
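The following Java sketch illustrates the core idea described above. It is illustrative only and not the operator's actual implementation; the class, method, and variable names are assumptions:
import java.util.List;

/** Illustrative only: not the operator's implementation. */
final class ScalingSketch {

    /** Records per second one subtask can handle at 100% busyness (the "true" processing rate). */
    static double processingCapacityPerSubtask(double observedProcessingRate, double busyTimeRatio) {
        return observedProcessingRate / busyTimeRatio;
    }

    /** Target data rate of a vertex: the sum of the upstream vertices' output rates
     *  (for sources, this is simply the ingestion rate). */
    static double targetDataRate(List<Double> upstreamOutputRates) {
        return upstreamOutputRates.stream().mapToDouble(Double::doubleValue).sum();
    }

    /** Parallelism needed to process targetRate while keeping subtasks at the target utilization. */
    static int requiredParallelism(double targetRate, double capacityPerSubtask, double targetUtilization) {
        return (int) Math.ceil(targetRate / (capacityPerSubtask * targetUtilization));
    }
}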
In this example we see an upscale operation:
Similarly, as load decreases, the autoscaler adjusts individual operator parallelism levels over time to match the current rate.
The operator reports detailed JobVertex level metrics about the evaluated Flink job metrics that are collected and used in the scaling decision. This includes:
Utilization, input rate and target rate metrics; scaling thresholds; parallelism and max parallelism changes over time. These metrics are reported under the Kubernetes Operator Resource metric group:
[resource_prefix].Autoscaler.[jobVertexID].[ScalingMetric].Current/Average Limitations # While we are very happy with the progress we have made in the last few months, the autoscaler is still in an early stage of development. We rely on users to share feedback so we can improve and make this a very robust component.
The autoscaler currently requires Flink 1.17. Source scaling requires the standard connector metrics and currently works best with Kafka sources. ZooKeeper HA Support # Until now the operator only integrated with the Flink Kubernetes HA mechanism for last-state and other types of application upgrades. 1.4.0 adds support for ZooKeeper HA storage as well.
While ZooKeeper is a slightly older solution, many users are still using it for HA metadata even in the Kubernetes world.
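For reference, the Flink configuration passed to a deployment for ZooKeeper HA looks roughly like the following (the quorum address and storage path are placeholders, not defaults):
high-availability: zookeeper
high-availability.zookeeper.quorum: zookeeper-quorum:2181
high-availability.storageDir: s3://flink-ha/metadata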
Release Resources # The source artifacts and helm chart are available on the Downloads page of the Flink website. You can easily try out the new features shipped in the official 1.4.0 release by adding the Helm chart to your own local registry:
$ helm repo add flink-kubernetes-operator-1.4.0 https://archive.apache.org/dist/flink/flink-kubernetes-operator-1.4.0/ $ helm install flink-kubernetes-operator flink-kubernetes-operator-1.4.0/flink-kubernetes-operator --set webhook.create=false You can also find official Kubernetes Operator Docker images of the new version on Dockerhub.
For more details, check the updated documentation and the release notes. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # Anton Ippolitov, FabioWanner, Gabor Somogyi, Gyula Fora, James Busche, Kyle Ahn, Matyas Orhidi, Maximilian Michels, Mohemmad Zaid Khan, Márton Balassi, Navaneesh Kumar, Ottomata, Peter Huang, Rodrigo, Shang Yuanchun, Shipeng Xie, Swathi Chandrashekar, Tony Garrard, Usamah Jassat, Vincent Chenal, Zsombor Chikan, Peter Vary
`}),e.add({id:59,href:"/2023/01/30/apache-flink-1.16.1-release-announcement/",title:"Apache Flink 1.16.1 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce the first bug fix release of the Flink 1.16 series.
This release includes 84 bug fixes, vulnerability fixes, and minor improvements for Flink 1.16. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.
We highly recommend all users upgrade to Flink 1.16.1.
Release Artifacts # Maven Dependencies # <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.16.1</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java</artifactId> <version>1.16.1</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients</artifactId> <version>1.16.1</version> </dependency> Binaries # You can find the binaries on the updated Downloads page.
Docker Images # library/flink (official images) apache/flink (ASF repository) PyPi # apache-flink==1.16.1 Upgrade Notes # FLINK-28988 - Incorrect result for filter after temporal join The filter will not be pushed down into both inputs of the event time temporal join. This may cause incompatible plan changes compared to Flink 1.16.0, e.g., when the left input is an upsert source (like upsert-kafka connector), the query plan will remove the ChangelogNormalize node in Flink 1.16.1, while it did appear in 1.16.0.
FLINK-29849 - Event time temporal join on an upsert source may produce incorrect execution plan This resolves the correctness issue when doing an event time temporal join with a versioned table backed by an upsert source. When the right input of the join is an upsert source, it no longer generates a ChangelogNormalize node for it. This is an incompatible plan change compared to 1.16.0
FLINK-30383 - UseLogicalIdentifier makes datadog consider metric as custom The Datadog reporter now adds a “flink.” prefix to metric identifiers if “useLogicalIdentifier” is enabled. This is required for these metrics to be recognized as Flink metrics, not custom ones.
Release Notes # Bug [FLINK-16582] - NettyBufferPoolTest may have warns on NettyBuffer leak [FLINK-26037] - TaskManagerRunner may crash during shutdown sequence [FLINK-26890] - DynamoDB consumer error consuming partitions close to retention [FLINK-27341] - TaskManager running together with JobManager are bind to 127.0.0.1 [FLINK-27944] - IO metrics collision happens if a task has union inputs [FLINK-28102] - Flink AkkaRpcSystemLoader fails when temporary directory is a symlink [FLINK-28526] - Fail to lateral join with UDTF from Table with timstamp column [FLINK-28695] - Fail to send partition request to restarted taskmanager [FLINK-28742] - Table.to_pandas fails with lit("xxx") [FLINK-28786] - Cannot run PyFlink 1.16 on MacOS with M1 chip [FLINK-28863] - Snapshot result of RocksDB native savepoint should have empty shared-state [FLINK-28960] - Pulsar throws java.lang.NoClassDefFoundError: javax/xml/bind/annotation/XmlElement [FLINK-28988] - Incorrect result for filter after temporal join [FLINK-29231] - PyFlink UDAF produces different results in the same sliding window [FLINK-29234] - Dead lock in DefaultLeaderElectionService [FLINK-29298] - LocalBufferPool request buffer from NetworkBufferPool hanging [FLINK-29479] - Support whether using system PythonPath for PyFlink jobs [FLINK-29539] - dnsPolicy in FlinkPod is not overridable [FLINK-29615] - MetricStore does not remove metrics of nonexistent subtasks when adaptive scheduler lowers job parallelism [FLINK-29627] - Sink - Duplicate key exception during recover more than 1 committable. [FLINK-29677] - Prevent dropping the current catalog [FLINK-29728] - TablePlanner prevents Flink from starting is working directory is a symbolic link [FLINK-29749] - flink info command support dynamic properties [FLINK-29781] - ChangelogNormalize uses wrong keys after transformation by WatermarkAssignerChangelogNormalizeTransposeRule [FLINK-29803] - Table API Scala APIs lack proper source jars [FLINK-29817] - Published metadata for apache-flink in pypi are inconsistent and causes poetry to fail [FLINK-29827] - [Connector][AsyncSinkWriter] Checkpointed states block writer from sending records [FLINK-29839] - HiveServer2 endpoint doesn't support TGetInfoType value 'CLI_ODBC_KEYWORDS' [FLINK-29849] - Event time temporal join on an upsert source may produce incorrect execution plan [FLINK-29857] - Fix failure to connect to 'HiveServer2Endpoint' when using hive3 beeline [FLINK-29899] - Stacktrace printing in DefaultExecutionGraphCacheTest is confusing maven test log output [FLINK-29923] - Hybrid Shuffle may face deadlock when running a task need to execute big size data [FLINK-29927] - AkkaUtils#getAddress may cause memory leak [FLINK-30030] - Unexpected behavior for overwrite in Hive dialect [FLINK-30133] - HadoopModuleFactory creates error if the security module cannot be loaded [FLINK-30168] - PyFlink Deserialization Error with Object Array [FLINK-30189] - HsSubpartitionFileReader may load data that has been consumed from memory [FLINK-30239] - The flame graph doesn't work due to groupExecutionsByLocation has bug [FLINK-30304] - Possible Deadlock in Kinesis/Firehose/DynamoDB Connector [FLINK-30308] - ClassCastException: class java.io.ObjectStreamClass$Caches$1 cannot be cast to class java.util.Map is showing in the logging when the job shutdown [FLINK-30334] - SourceCoordinator error splitRequest check cause HybridSource loss of data and hang [FLINK-30359] - Encountered NoClassDefFoundError when using flink-sql-connector-elasticsearch6 [FLINK-30366] - Python 
Group Agg failed in cleaning the idle state [FLINK-30525] - Cannot open jobmanager configuration web page [FLINK-30558] - The metric 'numRestarts' reported in SchedulerBase will be overridden by metric 'fullRestarts' [FLINK-30637] - In linux-aarch64 environment, using “is” judgment to match the window type of overwindow have returned incorrect matching results Improvement [FLINK-27327] - Add description about changing max parallelism explicitly leads to state incompatibility [FLINK-29134] - fetch metrics may cause oom(ThreadPool task pile up) [FLINK-29155] - Improve default config of grpcServer in Process Mode [FLINK-29244] - Add metric lastMaterializationDuration to ChangelogMaterializationMetricGroup [FLINK-29458] - When two tables have the same field, do not specify the table name,Exception will be thrown:SqlValidatorException :Column 'currency' is ambiguous [FLINK-29639] - Add ResourceId in TransportException for debugging [FLINK-29693] - MiniClusterExtension should respect DEFAULT_PARALLELISM if set [FLINK-29834] - Clear static Jackson TypeFactory cache on CL release [FLINK-29966] - Replace and redesign the Python api documentation base [FLINK-30016] - Update Flink 1.16 release notes about updated oshi-core [FLINK-30116] - Don't Show Env Vars in Web UI [FLINK-30183] - We should add a proper error message in case the deprecated reflection-based instantiation of a reporter is triggered [FLINK-30357] - Wrong link in connector/jdbc doc. [FLINK-30436] - Integrate Opensearch connector docs into Flink docs v1.16 [FLINK-30592] - The unsupported hive version is not deleted on the hive overview document [FLINK-30633] - Update AWS SDKv2 to v2.19.14 [FLINK-30724] - Update doc of kafka per-partition watermark to FLIP-27 source Technical Debt [FLINK-27731] - Remove Hugo Modules integration [FLINK-29157] - Clarify the contract between CompletedCheckpointStore and SharedStateRegistry [FLINK-29957] - Rework connector docs integration [FLINK-29958] - Add new connector_artifact shortcode [FLINK-29972] - Pin Flink docs to Elasticsearch Connector 3.0.0 [FLINK-29973] - connector_artifact should append Flink minor version [FLINK-30291] - Integrate flink-connector-aws into Flink docs [FLINK-30382] - Flink 1.16 to integrate KDS/KDF docs from flink-connector-aws [FLINK-30383] - UseLogicalIdentifier makes datadog consider metric as custom `}),e.add({id:60,href:"/2023/01/20/delegation-token-framework-obtain-distribute-and-use-temporary-credentials-automatically/",title:"Delegation Token Framework: Obtain, Distribute and Use Temporary Credentials Automatically",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce that the upcoming minor version of Flink (1.17) includes the Delegation Token Framework proposed in FLIP-272. This enables Flink to authenticate to external services at a central location (JobManager) and distribute authentication tokens to the TaskManagers.
Introduction # Authentication in distributed systems is not an easy task. Previously all worker nodes (TaskManagers) reading from or writing to an external system needed to authenticate on their own. In such a case several things can go wrong, including but not limited to:
Too many authentication requests (potentially resulting in rejected requests); a large number of retries on authentication failures; recurring propagation and timely updates of temporary credentials; dependency issues when external system libraries share the same dependency in different versions; each authentication mechanism and its temporary credentials being different, making standardization challenging; … The aim of the Delegation Token Framework is to solve the above challenges. The framework is authentication protocol agnostic and pluggable. The primary design concept is that authentication happens only at a single location (the JobManager), and the obtained temporary credentials are propagated automatically to all TaskManagers, where they can be used. Re-obtaining tokens is also handled in the JobManager.
New authentication providers can be added with a small amount of code, which Flink loads automatically. At the moment the following external systems are supported:
Hadoop filesystems HBase S3 Planned, but not yet implemented/contributed:
Kafka Hive The design and implementation approach has already been proven in Apache Spark. Gabor is a Spark committer who championed this feature in the Spark community. The most notable improvement we achieved compared to the current state in Spark is that the framework in Flink is already authentication protocol agnostic (and not bound to Kerberos).
Documentation # For more details please refer to the following documentation:
Delegation Tokens In General How to use Kerberos delegation tokens Development details # Major tickets where the framework has been added:
FLINK-21232 Kerberos delegation token framework FLINK-29918 Generalized delegation token support FLINK-30704 Add S3 delegation token support Example implementation # Adding a new authentication protocol is relatively straightforward:
Check out the example implementation; change FlinkTestJavaDelegationTokenProvider.obtainDelegationTokens to obtain a custom token from any external service; change FlinkTestJavaDelegationTokenReceiver.onNewTokensObtained to receive the previously obtained tokens on all TaskManagers; use the tokens for external service authentication; compile the project and put it on the classpath (adding it inside a plugin is also supported); and enjoy that Flink does all the heavy lifting behind the scenes :-) A rough sketch of such a provider/receiver pair is shown below. Example implementation testing # The existing providers are tested with the Flink Kubernetes Operator, but one can use any other supported deployment model, because the framework is not bound to any of them. We chose the Kubernetes Operator so that we could provide a completely containerized and easily reproducible test environment.
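To make the steps above concrete, here is a rough Java sketch of a custom provider/receiver pair. The interface and return-type shapes are assumptions based on the method names mentioned above (serviceName, init, obtainDelegationTokens, onNewTokensObtained); the helper bodies are placeholders, and the authoritative signatures should be taken from the linked example implementation in your Flink version.
import java.util.Optional;
import org.apache.flink.configuration.Configuration;

// Sketch only: in a real project these classes would live in separate files and,
// as described above, be discovered by Flink automatically (typically via service loading).
public class MyServiceDelegationTokenProvider implements DelegationTokenProvider {
    @Override public String serviceName() { return "my-service"; } // must match the receiver
    @Override public void init(Configuration configuration) {}     // read connection settings here
    @Override public boolean delegationTokensRequired() { return true; }
    @Override public ObtainedDelegationTokens obtainDelegationTokens() {
        byte[] token = fetchTokenFromMyService();                   // placeholder helper below
        long validUntil = System.currentTimeMillis() + 3_600_000;   // ask Flink to re-obtain in ~1h
        return new ObtainedDelegationTokens(token, Optional.of(validUntil));
    }
    private byte[] fetchTokenFromMyService() {
        // Placeholder: call the external service's token endpoint here.
        return new byte[0];
    }
}

public class MyServiceDelegationTokenReceiver implements DelegationTokenReceiver {
    @Override public String serviceName() { return "my-service"; }
    @Override public void init(Configuration configuration) {}
    @Override public void onNewTokensObtained(byte[] tokens) {
        // Runs on every TaskManager: hand the fresh token to whatever client
        // library talks to the external service (hypothetical call, shown commented out).
        // MyServiceClient.updateCredentials(tokens);
    }
}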
An example tutorial can be found here on external system authentication.
Summary # The Delegation Token Framework is feature complete on the master branch and becomes generally available with the release of Flink 1.17. The framework obtains authentication tokens at a central location and propagates them to all workers on a recurring basis.
Any connector to an external system which supports authentication can be a potential user of this framework. To support authentication in your connector we encourage you to implement your own DelegationTokenProvider/DelegationTokenReceiver pair.
`}),e.add({id:61,href:"/2023/01/13/apache-flink-table-store-0.3.0-release-announcement/",title:"Apache Flink Table Store 0.3.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is pleased to announce the release of the Apache Flink Table Store (0.3.0).
We highly recommend all users upgrade to Flink Table Store 0.3.0. The release resolves more than 150 issues, completed by nearly 30 contributors.
Please check out the full documentation for detailed information and user guides.
Flink Table Store 0.3 delivers many exciting features, enhances its capabilities as data lake storage, and greatly improves the availability of its streaming pipelines. Some important features are described below.
Changelog Producer: Full-Compaction # You may consider using the full-compaction changelog producer if:
You are using a partial-update or aggregation table, where at write time Table Store cannot know the merged result and therefore cannot generate the corresponding changelog; or your input cannot produce a complete changelog but you still want to get rid of the costly normalize operator.
By specifying 'changelog-producer' = 'full-compaction', Table Store will compare the results between full compactions and produce the differences as the changelog. The changelog latency is therefore determined by the frequency of full compactions. By specifying the changelog-producer.compaction-interval table property (default value 30min), users can define the maximum interval between two full compactions to bound that latency.
Dedicated Compaction Job && Multiple Writers # By default, Table Store writers will perform compaction as needed when writing records. This is sufficient for most use cases, but there are two downsides:
This may result in unstable write throughput because throughput might temporarily drop when performing a compaction. Compaction will mark some data files as “deleted”. If multiple writers mark the same file a conflict will occur when committing the changes. Table Store will automatically resolve the conflict, but this may result in job restarts. To avoid these downsides, users can also choose to skip compactions in writers, and run a dedicated job only for compaction. As compactions are performed only by the dedicated job, writers can continuously write records without pausing and no conflicts will ever occur.
To skip compactions in writers, set write-only to true.
To run a dedicated job for compaction, follow these instructions.
Flink SQL currently does not support statements related to compactions, so we have to submit the compaction job through flink run.
Run the following command to submit a compaction job for the table.
<FLINK_HOME>/bin/flink run \\ -c org.apache.flink.table.store.connector.action.FlinkActions \\ /path/to/flink-table-store-dist-<version>.jar \\ compact \\ --warehouse <warehouse-path> \\ --database <database-name> \\ --table <table-name> Aggregation Table # Sometimes users only care about aggregated results. The aggregation merge engine aggregates each value field with the latest data one by one under the same primary key according to the aggregate function.
Each field that is not part of the primary keys must be given an aggregate function, specified by the fields.<field-name>.aggregate-function table property.
For example:
CREATE TABLE MyTable ( product_id BIGINT, price DOUBLE, sales BIGINT, PRIMARY KEY (product_id) NOT ENFORCED ) WITH ( 'merge-engine' = 'aggregation', 'fields.price.aggregate-function' = 'max', 'fields.sales.aggregate-function' = 'sum' ); Schema Evolution # In version 0.2, the research and development of Schema Evolution has begun. In version 0.3, some of the capabilities of Schema Evolution have been completed. You can use below operations in Spark-SQL (Flink SQL completes the following syntax in 1.17):
Adding New Columns Renaming Column Name Dropping Columns Flink Table Store ensures that the above operations are safe, and the old data will automatically adapt to the new schema when it is read.
For example:
CREATE TABLE T (i INT, j INT); INSERT INTO T VALUES (1, 1); ALTER TABLE T ADD COLUMN k INT; ALTER TABLE T RENAME COLUMN i to a; INSERT INTO T VALUES (2, 2, 2); SELECT * FROM T; -- outputs (1, 1, NULL) and (2, 2, 2) Flink Lookup Join # Lookup Joins are a type of join in streaming queries. They are used to enrich a table with data that is queried from Table Store. The join requires one table to have a processing time attribute and the other table to be backed by a lookup source connector.
Table Store supports lookup joins on unpartitioned tables with primary keys in Flink.
The lookup join operator will maintain a local RocksDB cache and pull the latest updates of the table in real time. The lookup join operator will only pull the necessary data, so your filter conditions are very important for performance.
This feature is only suitable for tables containing at most tens of millions of records to avoid excessive use of local disks.
Time Traveling # You can use the Snapshots Table and Scan Mode for time traveling.
You can view all current snapshots by reading the snapshots table. You can travel back in time through the from-timestamp and from-snapshot options. For example:
SELECT * FROM T$snapshots; /* +--------------+------------+--------------+-------------------------+ | snapshot_id | schema_id | commit_kind | commit_time | +--------------+------------+--------------+-------------------------+ | 2 | 0 | APPEND | 2022-10-26 11:44:15.600 | | 1 | 0 | APPEND | 2022-10-26 11:44:15.148 | +--------------+------------+--------------+-------------------------+ 2 rows in set */ SELECT * FROM T /*+ OPTIONS('from-snapshot'='1') */; Audit Log Table # If you need to audit the changelog of the table, you can use the audit_log system table. Through the audit_log table, you can get the rowkind column when you read the incremental data of the table. You can use this column for filtering and other operations to complete the audit.
There are four values for rowkind:
+I: Insertion operation. -U: Update operation with the previous content of the updated row. +U: Update operation with new content of the updated row. -D: Deletion operation. For example:
SELECT * FROM MyTable$audit_log; /* +------------------+-----------------+-----------------+ | rowkind | column_0 | column_1 | +------------------+-----------------+-----------------+ | +I | ... | ... | +------------------+-----------------+-----------------+ | -U | ... | ... | +------------------+-----------------+-----------------+ | +U | ... | ... | +------------------+-----------------+-----------------+ 3 rows in set */ Ecosystem # Flink Table Store continues to strengthen its ecosystem and is gradually completing read and write support across all computing engines. Every engine listed below has been enhanced in 0.3.
Spark write is now supported, but INSERT OVERWRITE and streaming write are still unsupported. S3 and OSS are supported by all computing engines. Hive 3.1 is supported. The latest version of Trino (JDK 17) is supported. Getting started # Please refer to the getting started guide for more details.
What’s Next? # In the upcoming 0.4.0 release you can expect the following additional features:
Independent Java APIs decoupled from Flink; Spark: enhanced batch write, plus streaming write and streaming read; Flink: complete DDL & DML, providing more management operations; Changelog producer: lookup, bringing the stream-reading delay in every scenario below one minute; multi-table consistent materialized views in real time; Data Integration: Schema Evolution integration and whole-database integration. Please give the release a try, share your feedback on the Flink mailing list and contribute to the project!
List of Contributors # The Apache Flink community would like to thank every one of the contributors that have made this release possible:
Feng Wang, Hannankan, Jane Chan, Jia Liu, Jingsong Lee, Jonathan Leitschuh, JunZhang, Kirill Listopad, Liwei Li, MOBIN-F, Nicholas Jiang, Wang Luning, WencongLiu, Yubin Li, gongzhongqiang, houhang1005, liuzhuang2017, openinx, tsreaper, wuyouwuyoulian, zhuangchong, zjureel (shammon), 吴祥平
`}),e.add({id:62,href:"/2023/01/10/apache-flink-kubernetes-operator-1.3.1-release-announcement/",title:"Apache Flink Kubernetes Operator 1.3.1 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce the first bug fix release of the Flink Kubernetes Operator 1.3 series.
The release contains fixes for several critical issues and some major stability improvements for the application upgrade mechanism.
We highly recommend all users upgrade to Flink Kubernetes Operator 1.3.1.
Release Notes # Bug # [FLINK-30329] - flink-kubernetes-operator helm chart does not work with dynamic config because of use of volumeMount subPath [FLINK-30361] - Cluster deleted and created back while updating replicas [FLINK-30406] - Jobmanager Deployment error without HA metadata should not lead to unrecoverable error [FLINK-30437] - State incompatibility issue might cause state loss [FLINK-30527] - Last-state suspend followed by flinkVersion change may lead to state loss [FLINK-30528] - Job may be stuck in upgrade loop when last-state fallback is disabled and deployment is missing Improvement # [FLINK-28875] - Add FlinkSessionJobControllerTest [FLINK-30408] - Add unit test for HA metadata check logic Release Resources # The source artifacts and helm chart are available on the Downloads page of the Flink website. You can easily try out the new features shipped in the official 1.3.1 release by adding the Helm chart to your own local registry:
$ helm repo add flink-kubernetes-operator-1.3.1 https://archive.apache.org/dist/flink/flink-kubernetes-operator-1.3.1/ $ helm install flink-kubernetes-operator flink-kubernetes-operator-1.3.1/flink-kubernetes-operator --set webhook.create=false You can also find official Kubernetes Operator Docker images of the new version on Dockerhub.
For more details, check the updated documentation and the release notes. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # Gyula Fora, Andrew Otto, Swathi Chandrashekar, Peter Vary
`}),e.add({id:63,href:"/2022/12/14/apache-flink-kubernetes-operator-1.3.0-release-announcement/",title:"Apache Flink Kubernetes Operator 1.3.0 Release Announcement",section:"Flink Blog",content:`The Flink community is happy to announce that the latest Flink Kubernetes Operator version went live today. Beyond the regular operator improvements and fixes, the 1.3.0 version also integrates better with some popular infrastructure management tools like OLM and Argo CD. These improvements are clear indicators that the original intention of the Flink community, namely to provide the de facto standard solution for managing Flink applications on Kubernetes, is making steady progress toward becoming a reality.
Release Highlights # Upgrade to Fabric8 6.x.x and JOSDK 4.x.x Restart unhealthy Flink clusters Contribute the Flink Kubernetes Operator to OperatorHub Publish flink-kubernetes-operator-api module separately Upgrade to Fabric8 6.x.x and JOSDK 4.x.x # Two important framework components were upgraded with the current operator release: the Fabric8 client to v6.2.0 and the JOSDK to v4.1.0. These upgrades, among other things, contain important informer improvements that help reduce or completely eliminate certain intermittent issues where the operator loses track of managed Custom Resources.
With the new JOSDK version, the operator now supports leader election and allows users to run standby operator replicas to reduce downtime due to operator failures. Read more about this in the docs.
Restart unhealthy Flink clusters # Flink has its own restart strategies, which work fine in most cases, but there are certain circumstances in which Flink can get stuck in restart loops, often ending up in an OutOfMemoryError: Metaspace state from which the job cannot recover. If the root cause is just a temporary outage of some external system, for example, the Flink job could be resurrected by simply performing a full restart of the application.
This restart can now be triggered by the operator itself. The operator can watch the actual retry count of a Flink job and restart it when too many restarts occur within a defined time window, for example:
kubernetes.operator.job.health-check.enabled: false kubernetes.operator.job.restart-check.duration-window: 2m kubernetes.operator.job.restart-check.threshold: 64 The operator checks the retry count of a job at every defined interval. If the actual job retry count is bigger than the threshold and the timestamps are inside the grace period, the operator initiates a full job restart.
Contribute the Flink Kubernetes Operator to OperatorHub # The Apache Flink Kubernetes Operator has been contributed to OperatorHub.io by the Flink community. OperatorHub.io aims to be a central location for finding a wide array of operators that have been built by the community. An OLM bundle generator ensures that the resources required by OperatorHub.io are automatically derived from Helm charts.
Publish flink-kubernetes-operator-api module separately # With the current operator release the Flink community introduces a more lightweight dependency model for interacting with the Flink Kubernetes Operator programmatically. We have refactored the existing operator modules and introduced a new module, called flink-kubernetes-operator-api, that contains the generated CRD classes and only a minimal set of dependencies, to make the operator client as slim as possible.
What’s Next? # “One of the most challenging aspects of running an always-on streaming pipeline is the correct sizing of Flink deployments. … Clearly, it would be desirable to automatically adjust the resources for Flink deployments. This process is referred to as autoscaling.” - The Flink community is planning to propose an operator based vertex autoscaler. See FLIP-271 for further details. Beyond the autoscaler, which is one of the most anticipated features of the operator, the community is continuing to improve the stability, operability, and usability of the Apache Flink Kubernetes Operator on every front.
Release Resources # The source artifacts and helm chart are available on the Downloads page of the Flink website. You can easily try out the new features shipped in the official 1.3.0 release by adding the Helm chart to your own local registry:
$ helm repo add flink-kubernetes-operator-1.3.0 https://archive.apache.org/dist/flink/flink-kubernetes-operator-1.3.0/ $ helm install flink-kubernetes-operator flink-kubernetes-operator-1.3.0/flink-kubernetes-operator --set webhook.create=false You can also find official Kubernetes Operator Docker images of the new version on Dockerhub.
For more details, check the updated documentation and the release notes. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # Chesnay Schepler, Clara Xiong, Denis Nuțiu, Gabor Somogyi, Gyula Fora, James Busche, Jeesmon Jacob, Marton Balassi, Matyas Orhidi, Maximilian Michels, Sriram Ganesh, Steven Zhang, Thomas Weise, Tony Garrard, Usamah Jassat, Xin Hao, Yaroslav Tkachenko, Zezae Oh, Zhenqiu Huang, Zhiming, clarax, darenwkt, jiangzho, judy.zhu, pvary, ted chang, tison, yangjf2019, zhou-jiang
`}),e.add({id:64,href:"/2022/11/25/optimising-the-throughput-of-async-sinks-using-a-custom-ratelimitingstrategy/",title:"Optimising the throughput of async sinks using a custom RateLimitingStrategy",section:"Flink Blog",content:` Introduction # When designing a Flink data processing job, one of the key concerns is maximising job throughput. Sink throughput is a crucial factor because it can determine the entire job’s throughput. We generally want the highest possible write rate in the sink without overloading the destination. However, since the factors impacting a destination’s performance are variable over the job’s lifetime, the sink needs to adjust its write rate dynamically. Depending on the sink’s destination, it helps to tune the write rate using a different RateLimitingStrategy.
This post explains how you can optimise sink throughput by configuring a custom RateLimitingStrategy on a connector that builds on the AsyncSinkBase (FLIP-171). In the sections below, we cover the design logic behind the AsyncSinkBase and the RateLimitingStrategy, then we take you through two example implementations of rate limiting strategies, specifically the CongestionControlRateLimitingStrategy and TokenBucketRateLimitingStrategy.
Background of the AsyncSinkBase # When implementing the AsyncSinkBase, our goal was to simplify building new async sinks to custom destinations by providing common async sink functionality used with at-least-once processing. This has allowed users to more easily write sinks to custom destinations, such as Amazon Kinesis Data Streams and Amazon Kinesis Firehose. An additional async sink to Amazon DynamoDB (FLIP-252) is also being developed at the time of writing.
The AsyncSinkBase provides the core implementation which handles the mechanics of async requests and responses. This includes retrying failed messages, deciding when to flush records to the destination, and persisting un-flushed records to state during checkpointing. In order to increase throughput, the async sink also dynamically adjusts the request rate depending on the destination’s responses. Read more about this in our previous 1.15 release blog post or watch our FlinkForward talk recording explaining the design of the Async Sink.
Configuring the AsyncSinkBase # When designing the AsyncSinkBase, we wanted users to be able to tune their custom connector implementations based on their use case and needs, without having to understand the low-level workings of the base sink itself.
So, as part of our initial implementation in Flink 1.15, we exposed configurations such as maxBatchSize, maxInFlightRequests, maxBufferedRequests, maxBatchSizeInBytes, maxTimeInBufferMS and maxRecordSizeInBytes so that users can adapt the flushing and writing behaviour of the sink.
In Flink 1.16, we have further extended this configurability to the RateLimitingStrategy used by the AsyncSinkBase (FLIP-242). With this change, users can now customise how the AsyncSinkBase dynamically adjusts the request rate in real-time to optimise throughput whilst mitigating back pressure. Example customisations include changing the mathematical function used to scale the request rate, implementing a cool off period between rate adjustments, or implementing a token bucket RateLimitingStrategy.
Rationale behind the RateLimitingStrategy interface # public interface RateLimitingStrategy { // Information provided to the RateLimitingStrategy void registerInFlightRequest(RequestInfo requestInfo); void registerCompletedRequest(ResultInfo resultInfo); // Controls offered to the RateLimitingStrategy boolean shouldBlock(RequestInfo requestInfo); int getMaxBatchSize(); } There are 2 core ideas behind the RateLimitingStrategy interface:
Information methods: we need methods to provide the RateLimitingStrategy with sufficient information to track the rate of requests or the rate of sent messages (each request can comprise multiple messages). Control methods: we also need methods to allow the RateLimitingStrategy to control the sink’s request rate. These are the types of methods that we see in the RateLimitingStrategy interface. With registerInFlightRequest() and registerCompletedRequest(), the RateLimitingStrategy has sufficient information to track the number of in-flight requests and messages, as well as the rate of these requests.
With shouldBlock(), the RateLimitingStrategy can decide to postpone new requests until a specified condition is met (e.g. current in-flight requests must not exceed a given number). This allows the RateLimitingStrategy to control the rate of requests to the destination. It can decide to increase throughput or to increase backpressure in the Flink job graph.
With getMaxBatchSize(), the RateLimitingStrategy can dynamically adjust the number of messages packaged into a single request. This can be useful to optimise sink throughput if the request size affects the destination’s performance.
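As a small illustration (not the behaviour of any shipped strategy, and maxBatchSizeLimit is an assumed field alongside the in-flight counters used in the examples below), a getMaxBatchSize() implementation could cap batches by the remaining in-flight message budget:
@Override
public int getMaxBatchSize() {
    // Illustrative only: cap the batch at the remaining in-flight message budget,
    // but always allow at least one message per batch.
    long remainingBudget = maxInFlightMessages - currentInFlightMessages;
    return (int) Math.max(1, Math.min(maxBatchSizeLimit, remainingBudget));
}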
Implementing a custom RateLimitingStrategy # [Example 1] CongestionControlRateLimitingStrategy # The AsyncSinkBase comes pre-packaged with the CongestionControlRateLimitingStrategy. In this section, we explore its implementation.
This strategy is modelled after TCP congestion control, and aims to discover a destination’s highest possible request rate. It achieves this by increasing the request rate until the sink is throttled by the destination, at which point it will reduce the request rate.
In this RateLimitingStrategy, we want to dynamically adjust the request rate by:
Setting a maximum number of in-flight requests at any time; setting a maximum number of in-flight messages at any time (each request can comprise multiple messages); increasing the maximum number of in-flight messages after each successful request, to maximise the request rate; decreasing the maximum number of in-flight messages after an unsuccessful request, to prevent overloading the destination; and independently keeping track of the maximum number of in-flight messages if there are multiple sink subtasks. This strategy means we will start with a low request rate (slow start), but aggressively increase it until the destination throttles us, which allows us to discover the highest possible request rate. It will also adjust the request rate if the conditions of the destination change (e.g. another client starts writing to the same destination). This strategy works well if destinations implement traffic shaping and throttle once the bandwidth limit is reached (e.g. Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose).
First, we implement the information methods to keep track of the number of in-flight requests and in-flight messages.
public class CongestionControlRateLimitingStrategy implements RateLimitingStrategy { // ... @Override public void registerInFlightRequest(RequestInfo requestInfo) { currentInFlightRequests++; currentInFlightMessages += requestInfo.getBatchSize(); } @Override public void registerCompletedRequest(ResultInfo resultInfo) { currentInFlightRequests = Math.max(0, currentInFlightRequests - 1); currentInFlightMessages = Math.max(0, currentInFlightMessages - resultInfo.getBatchSize()); if (resultInfo.getFailedMessages() > 0) { maxInFlightMessages = scalingStrategy.scaleDown(maxInFlightMessages); } else { maxInFlightMessages = scalingStrategy.scaleUp(maxInFlightMessages); } } // ... } Then we implement the control methods to dynamically adjust the request rate.
We keep a current value for maxInFlightMessages and maxInFlightRequests, and postpone all new requests if maxInFlightRequests or maxInFlightMessages have been reached.
Every time a request completes, the CongestionControlRateLimitingStrategy will check if there are any failed messages in the response. If there are, it will decrease maxInFlightMessages. If there are no failed messages, it will increase maxInFlightMessages. This gives us indirect control of the rate of messages being written to the destination.
Side note: The default CongestionControlRateLimitingStrategy uses an Additive Increase / Multiplicative Decrease (AIMD) scaling strategy. This is also used in TCP congestion control to avoid overloading the destination by increasing write rate slowly but backing off quickly if throttled.
public class CongestionControlRateLimitingStrategy implements RateLimitingStrategy { // ... @Override public void registerCompletedRequest(ResultInfo resultInfo) { // ... if (resultInfo.getFailedMessages() > 0) { maxInFlightMessages = scalingStrategy.scaleDown(maxInFlightMessages); } else { maxInFlightMessages = scalingStrategy.scaleUp(maxInFlightMessages); } } public boolean shouldBlock(RequestInfo requestInfo) { return currentInFlightRequests >= maxInFlightRequests || (currentInFlightMessages + requestInfo.getBatchSize() > maxInFlightMessages); } // ... } [Example 2] TokenBucketRateLimitingStrategy # The CongestionControlRateLimitingStrategy is rather aggressive, and relies on a robust server-side rate limiting strategy. In the event we don’t have a robust server-side rate limiting strategy, we can implement a client-side rate limiting strategy.
As an example, we can look at the token bucket rate limiting strategy. This strategy allows us to set the exact rate of the sink (e.g. requests per second, messages per second). If the limits are set correctly, we will avoid overloading the destination altogether.
In this strategy, we want to do the following:
Implement a TokenBucket that has a given initial number of tokens (e.g. 10). These tokens refill at a given rate (e.g. 1 token per second). When preparing an async request, we check if the token bucket has sufficient tokens. If not, we postpone the request. Let’s look at an example implementation:
public class TokenBucketRateLimitingStrategy implements RateLimitingStrategy { private final Bucket bucket; public TokenBucketRateLimitingStrategy() { Refill refill = Refill.intervally(1, Duration.ofSeconds(1)); Bandwidth limit = Bandwidth.classic(10, refill); this.bucket = Bucket4j.builder() .addLimit(limit) .build(); } // ... (information methods not needed) @Override public boolean shouldBlock(RequestInfo requestInfo) { return !bucket.tryConsume(requestInfo.getBatchSize()); } } In the above example, we use the Bucket4j library's Token Bucket implementation and map 1 message to 1 token; a request is blocked whenever the bucket cannot supply enough tokens (tryConsume returns false). Since our token bucket has a size of 10 tokens and a refill rate of 1 token per second, we can be sure that we will not exceed a burst of 10 messages, and will also not exceed a sustained rate of 1 message per second.
This would be useful if we know that our destination will failover ungracefully if a rate of 1 message per second is exceeded, or if we intentionally want to limit our sink’s throughput to provide higher bandwidth for other clients writing to the same destination.
Specifying a custom RateLimitingStrategy # To specify a custom RateLimitingStrategy, we have to specify it in the AsyncSinkWriterConfiguration which is passed into the constructor of the AsyncSinkWriter. For example:
class MyCustomSinkWriter<InputT> extends AsyncSinkWriter<InputT, MyCustomRequestEntry> { MyCustomSinkWriter( ElementConverter<InputT, MyCustomRequestEntry> elementConverter, Sink.InitContext context, Collection<BufferedRequestState<MyCustomRequestEntry>> states) { super( elementConverter, context, AsyncSinkWriterConfiguration.builder() // ... .setRateLimitingStrategy(new TokenBucketRateLimitingStrategy()) .build(), states); } } Summary # From Apache Flink 1.16 we can customise the RateLimitingStrategy used to dynamically adjust the behaviour of the Async Sink at runtime. This allows users to tune their connector implementations based on specific use cases and needs, without having to understand the base sink’s low-level workings.
We hope this extension will be useful for you. If you have any feedback, feel free to reach out!
`}),e.add({id:65,href:"/2022/11/10/apache-flink-1.15.3-release-announcement/",title:"Apache Flink 1.15.3 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce the third bug fix release of the Flink 1.15 series.
This release includes 59 bug fixes, vulnerability fixes, and minor improvements for Flink 1.15. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.
We highly recommend all users upgrade to Flink 1.15.3.
Release Artifacts # Maven Dependencies # <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.15.3</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java</artifactId> <version>1.15.3</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients</artifactId> <version>1.15.3</version> </dependency> Binaries # You can find the binaries on the updated Downloads page.
Docker Images # library/flink (official images) apache/flink (ASF repository) PyPi # apache-flink==1.15.3 Release Notes # Bug [FLINK-26726] - Remove the unregistered task from readersAwaitingSplit [FLINK-26890] - DynamoDB consumer error consuming partitions close to retention [FLINK-27384] - In the Hive dimension table, when the data is changed on the original partition, the create_time configuration does not take effect [FLINK-27400] - Pulsar connector subscribed the system topic when using the regex [FLINK-27415] - Read empty csv file throws exception in FileSystem table connector [FLINK-27492] - Flink table scala example does not including the scala-api jars [FLINK-27579] - The param client.timeout can not be set by dynamic properties when stopping the job [FLINK-27611] - ConcurrentModificationException during Flink-Pulsar checkpoint notification [FLINK-27954] - JobVertexFlameGraphHandler does not work on standby Dispatcher [FLINK-28084] - Pulsar unordered reader should disable retry and delete reconsume logic. [FLINK-28265] - Inconsistency in Kubernetes HA service: broken state handle [FLINK-28488] - KafkaMetricWrapper does incorrect cast [FLINK-28609] - Flink-Pulsar connector fails on larger schemas [FLINK-28863] - Snapshot result of RocksDB native savepoint should have empty shared-state [FLINK-28934] - Pulsar Source put all the splits to only one parallelism when using Exclusive subscription [FLINK-28951] - Header in janino generated java files can merge with line numbers [FLINK-28959] - 504 gateway timeout when consume large number of topics using TopicPatten [FLINK-28960] - Pulsar throws java.lang.NoClassDefFoundError: javax/xml/bind/annotation/XmlElement [FLINK-28975] - withIdleness marks all streams from FLIP-27 sources as idle [FLINK-28976] - Changelog 1st materialization delayed unneccesarily [FLINK-29130] - Correct the doc description of state.backend.local-recovery [FLINK-29138] - Project pushdown not work for lookup source [FLINK-29205] - FlinkKinesisConsumer not respecting Credential Provider configuration for EFO [FLINK-29207] - Pulsar message eventTime may be incorrectly set to a negative number [FLINK-29253] - DefaultJobmanagerRunnerRegistry#localCleanupAsync calls close instead of closeAsync [FLINK-29324] - Calling Kinesis connector close method before subtask starts running results in NPE [FLINK-29325] - Fix documentation bug on how to enable batch mode for streaming examples [FLINK-29381] - Key_Shared subscription isn't works in the latest Pulsar connector [FLINK-29395] - [Kinesis][EFO] Issue using EFO consumer at timestamp with empty shard [FLINK-29397] - Race condition in StreamTask can lead to NPE if changelog is disabled [FLINK-29459] - Sink v2 has bugs in supporting legacy v1 implementations with global committer [FLINK-29477] - ClassCastException when collect primitive array to Python [FLINK-29479] - Support whether using system PythonPath for PyFlink jobs [FLINK-29483] - flink python udf arrow in thread model bug [FLINK-29500] - InitializeOnMaster uses wrong parallelism with AdaptiveScheduler [FLINK-29509] - Set correct subtaskId during recovery of committables [FLINK-29512] - Align SubtaskCommittableManager checkpointId with CheckpointCommittableManagerImpl checkpointId during recovery [FLINK-29539] - dnsPolicy in FlinkPod is not overridable [FLINK-29567] - Revert sink output metric names from numRecordsSend back to numRecordsOut [FLINK-29613] - Wrong message size assertion in Pulsar's batch message [FLINK-29627] - Sink - Duplicate key exception during 
recover more than 1 committable. [FLINK-29645] - BatchExecutionKeyedStateBackend is using incorrect ExecutionConfig when creating serializer [FLINK-29749] - flink info command support dynamic properties [FLINK-29803] - Table API Scala APIs lack proper source jars [FLINK-29827] - [Connector][AsyncSinkWriter] Checkpointed states block writer from sending records [FLINK-29927] - AkkaUtils#getAddress may cause memory leak Improvement [FLINK-24906] - Improve CSV format handling and support [FLINK-28733] - jobmanager.sh should support dynamic properties [FLINK-28909] - Add ribbon filter policy option in RocksDBConfiguredOptions [FLINK-29134] - fetch metrics may cause oom(ThreadPool task pile up) [FLINK-29158] - Fix logging in DefaultCompletedCheckpointStore [FLINK-29223] - Missing info output for when filtering JobGraphs based on their persisted JobResult [FLINK-29255] - FLIP-258 - Enforce binary compatibility in patch releases [FLINK-29476] - Kinesis Connector retry mechanism not applied to EOFException [FLINK-29503] - Add backpressureLevel field without hyphens [FLINK-29504] - Jar upload spec should define a schema `}),e.add({id:66,href:"/2022/10/28/announcing-the-release-of-apache-flink-1.16/",title:"Announcing the Release of Apache Flink 1.16",section:"Flink Blog",content:`Apache Flink continues to grow at a rapid pace and is one of the most active communities in Apache. Flink 1.16 had over 240 contributors enthusiastically participating, with 19 FLIPs and 1100+ issues completed, bringing a lot of exciting features to the community.
Flink has become the leading technology and de facto standard for stream processing, and the concept of unified stream and batch data processing is gradually gaining recognition and is being successfully implemented in more and more companies. Previously, the unified stream and batch concept placed more emphasis on a unified API and a unified computing framework. This year, building on that, Flink proposed the next development direction: the Flink Streaming Warehouse (Streamhouse), which further broadens the scope of stream-batch unification by unifying not only computation but also storage, thus enabling unified real-time analytics.
In 1.16, the Flink community has completed many improvements for both batch and stream processing:
For batch processing, 1.16 brings all-round improvements in ease of use, stability and performance; it is a milestone version for Flink batch processing and an important step towards maturity. Ease of use: with the introduction of SQL Gateway and full compatibility with HiveServer2, users can submit Flink SQL jobs and Hive SQL jobs very easily and connect to the existing Hive ecosystem. Functionality: join hints let Flink SQL users manually specify join strategies to avoid unreasonable execution plans, and Hive SQL compatibility has reached 94%, so users can migrate from Hive to Flink at very low cost. Stability: a speculative execution mechanism reduces the long-tail subtasks of a job, and an improved HashJoin with a failure-rollback mechanism avoids join failures. Performance: dynamic partition pruning reduces scan I/O and improves join processing for star-schema queries, yielding a 30% improvement in the TPC-DS benchmark, and the new hybrid shuffle mode improves resource usage and processing performance. For stream processing, there are a number of significant improvements: the Changelog State Backend provides users with second-level or even millisecond-level checkpoints to dramatically improve the fault-tolerance experience, while also lowering end-to-end latency for transactional sink jobs. Lookup join is widely used in stream processing; slow lookups, low throughput and delayed updates are addressed through a common cache mechanism, asynchronous I/O and retryable lookup. These features solve pain points that users often complain about and support richer scenarios. Since the early days of Flink SQL, some non-deterministic operations could cause incorrect results or exceptions, which caused great distress to users; in 1.16 we spent a lot of effort to solve most of these problems, and we will continue to improve in the future. With the further refinement of stream-batch unification and the continuous iteration of the Flink Table Store (0.2 has been released), the Flink community is pushing the streaming warehouse from concept to reality and maturity step by step.
Understanding Streaming Warehouses # To be precise, a streaming warehouse makes the data warehouse itself streaming, allowing the data in each layer of the warehouse to flow in real time. The goal is to realize a streaming service with end-to-end real-time performance through a unified API and computing framework. Please refer to the article for more details.
Batch processing # Flink is a unified stream and batch processing engine, and thanks to our long-term investment, stream processing has taken the leading role. We are also putting more effort into improving batch processing to make Flink an excellent computing engine for batch workloads as well, which makes the overall experience of stream-batch unification smoother.
SQL Gateway # Feedback from various channels indicates that SQL Gateway is a highly anticipated feature, especially for batch users. This feature was finally completed in 1.16 (see FLIP-91 for the design). SQL Gateway is an extension and enhancement of SQL Client, supporting multi-tenancy and pluggable API protocols (endpoints), and solving the problem that SQL Client can only serve a single user and cannot be integrated with external services or components. Currently SQL Gateway supports the REST API and the HiveServer2 protocol, and users can connect to SQL Gateway via cURL, Postman, or HTTP clients in various programming languages to submit stream jobs, batch jobs and even OLAP jobs. For the HiveServer2 endpoint, please refer to the Hive Compatibility section for more details.
Hive Compatibility # To reduce the cost of migrating Hive to Flink, we introduce HiveServer2 Endpoint and Hive Syntax Improvements in this version:
The HiveServer2 endpoint allows users to interact with SQL Gateway using Hive JDBC/Beeline and brings Flink into the Hive ecosystem (DBeaver, Apache Superset, Apache DolphinScheduler, and Apache Zeppelin). When users connect to the HiveServer2 endpoint, the SQL Gateway registers the Hive catalog, switches to the Hive dialect, and uses batch execution mode to execute jobs. With these steps, users get the same experience as with HiveServer2.
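As a rough illustration of the Hive JDBC path mentioned above, the following sketch connects to the gateway's HiveServer2 endpoint. The host, port, username and query are placeholders, and it assumes the Hive JDBC driver is on the classpath; it is not an official example.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class GatewayJdbcSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: point it at the host/port where the SQL Gateway's
        // HiveServer2 endpoint is listening in your environment.
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "user", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}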
Hive syntax is already the de facto standard for big data processing. Flink has improved compatibility with Hive syntax and added support for several Hive constructs commonly used in production. Hive syntax compatibility helps users migrate existing Hive SQL tasks to Flink, and makes it convenient for users who are familiar with Hive syntax to write SQL queries against tables registered in Flink. The compatibility is measured using the Hive qtest suite, which contains more than 12K SQL cases. So far, for Hive 2.3, compatibility has reached 94.1% across all Hive queries, and 97.3% if ACID queries are excluded.
Join Hints for Flink SQL # Join hints are a common industry solution for working around shortcomings of the optimizer by manually influencing the execution plan. Join is the most widely used operator in batch jobs, and Flink supports a variety of join strategies. Missing statistics or a poor cost model can lead the optimizer to the wrong choice of join strategy, which causes slow execution or even job failure. By specifying a join hint, users make the optimizer choose the specified join strategy whenever possible, avoiding various shortcomings of the optimizer and ensuring the production availability of the batch job (see the sketch below).
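As an illustration only, this minimal sketch shows how a join hint might be supplied from the Table API. The table names are hypothetical and the BROADCAST hint name is an assumption based on the batch join hints described above, so check the 1.16 SQL documentation for the exact hints available.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class JoinHintSketch {
    public static void main(String[] args) {
        // Batch mode, since join hints target batch join strategy selection.
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Hypothetical tables 'orders' and 'customers' are assumed to be registered already.
        // The /*+ BROADCAST(customers) */ hint asks the optimizer to broadcast the small table.
        tEnv.executeSql(
            "SELECT /*+ BROADCAST(customers) */ o.id, c.name "
                + "FROM orders o JOIN customers c ON o.customer_id = c.id");
    }
}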
Adaptive Hash Join # For batch jobs, the hash join operator may fail if the input data is severely skewed, which is a very bad experience for users. To solve this, we introduce an adaptive hash join that automatically falls back to a sort-merge join once the hash join fails at runtime. This mechanism ensures that the join always succeeds and improves stability by gracefully degrading from a hash join to the more robust sort-merge join.
Speculative Execution for Batch Jobs # Speculative execution is introduced in Flink 1.16 to mitigate batch job slowness caused by problematic nodes. A problematic node may have hardware problems, unexpectedly busy I/O, or high CPU load. These problems can make the hosted tasks run much slower than tasks on other nodes and affect the overall execution time of a batch job.
When speculative execution is enabled, Flink keeps detecting slow tasks. Once slow tasks are detected, the nodes on which they are located are identified as problematic and blocked via the blocklist mechanism (FLIP-224). The scheduler creates new attempts for the slow tasks and deploys them to nodes that are not blocked, while the existing attempts keep running. The new attempts process the same input data and produce the same output as the original attempt. Whichever attempt finishes first is admitted as the only finished attempt of the task, and the remaining attempts are canceled.
Most existing sources can work with speculative execution (FLIP-245). Only if a source uses SourceEvent does it need to implement the SupportsHandleExecutionAttemptSourceEvent interface to support speculative execution. Sinks do not support speculative execution yet, so speculative execution will not happen on sinks at the moment.
The Web UI & REST API are also improved (FLIP-249) to display multiple concurrent attempts of tasks and blocked task managers.
Hybrid Shuffle Mode # We have introduced a new Hybrid Shuffle Mode for batch executions. It combines the advantages of blocking shuffle and pipelined shuffle (in streaming mode).
Like blocking shuffle, it does not require upstream and downstream tasks to run simultaneously, which allows executing a job with few resources. Like pipelined shuffle, it does not require downstream tasks to wait until upstream tasks finish, which reduces the overall execution time of the job when sufficient resources are available. By providing different spilling strategies, it adapts to custom preferences between persisting less data and restarting fewer tasks on failures. Note: this feature is experimental and not activated by default.
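To experiment with it, the shuffle mode is selected via the execution.batch-shuffle-mode option. The hybrid value shown below is an assumption based on this announcement, so double-check the exact enum name in the documentation. A minimal sketch:
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridShuffleSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The value ALL_EXCHANGES_HYBRID_FULL is assumed from the 1.16 announcement;
        // verify it (and the selective variant) in the execution configuration docs.
        conf.setString("execution.batch-shuffle-mode", "ALL_EXCHANGES_HYBRID_FULL");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);
        // The rebalance() introduces a data exchange so the shuffle mode actually matters.
        env.fromSequence(0, 1_000_000).rebalance().print();
        env.execute("hybrid-shuffle-sketch");
    }
}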
Further improvements of blocking shuffle # We further improved blocking shuffle usability and performance in this version, including adaptive network buffer allocation, sequential IO optimization, and result partition reuse, which allows multiple consumer vertices to reuse the same physical result partition and thereby reduces disk IO and storage space. These optimizations achieve an overall 7% performance gain on the TPC-DS test at a scale of 10 TB. In addition, two more compression algorithms (LZO and ZSTD) with higher compression ratios were introduced; compared to the default LZ4 compression algorithm, they can further reduce storage space at some CPU cost.
Dynamic Partition Pruning # For batch jobs, partitioned tables are more widely used than non-partitioned tables in production environments. Flink already supports static partition pruning, where the optimizer pushes partition-field filter conditions from the WHERE clause down into the source connector during the optimization phase, thus reducing unnecessary partition scan IO. The star schema is the simplest of the most commonly used data mart patterns. We have found that many user jobs cannot use static partition pruning because the partition pruning information is only known at execution time. This requires dynamic partition pruning, where the pruning information is collected at runtime from data in other related tables, again reducing unnecessary partition scan IO for partitioned tables. Dynamic partition pruning has been validated on the 10 TB TPC-DS dataset to improve performance by up to 30%. A small illustrative query is sketched below.
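Purely as an illustration of the star-schema pattern where dynamic partition pruning applies, the following sketch joins a hypothetical fact table partitioned by a date key to a small dimension table with a selective filter. The table and column names are made up, and no special syntax is required since the optimizer applies the pruning automatically when applicable.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class DynamicPartitionPruningSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Hypothetical tables: 'sales' is partitioned by 'sold_date_key', 'date_dim' is a
        // small dimension table. With dynamic partition pruning, only the 'sales' partitions
        // matching the filtered dimension rows are scanned.
        tEnv.executeSql(
            "SELECT d.d_year, SUM(s.amount) "
                + "FROM sales s JOIN date_dim d ON s.sold_date_key = d.date_key "
                + "WHERE d.d_year = 2022 "
                + "GROUP BY d.d_year");
    }
}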
Stream Processing # In 1.16, we have made improvements in checkpoints, SQL, connectors and other areas, so that Flink's stream processing continues to lead.
Generalized incremental checkpoint # The changelog state backend aims at making checkpoint intervals shorter and more predictable. This release makes it production-ready, adapts it to the existing state backends, and improves its usability:
Support state migration
Support local recovery
Introduce file cache to optimize restoring
Support switch based on checkpoint
Improve the monitoring experience of changelog state backend: expose changelog’s metrics and expose changelog’s configuration to the web UI
Table 1: The comparison between Changelog Enabled / Changelog Disabled on value state (see this blog for more details)
Percentile | End-to-End Duration | Checkpointed Data Size* | Full Checkpoint Data Size*
50% | 311ms / 5s | 14.8MB / 3.05GB | 24.2GB / 18.5GB
90% | 664ms / 6s | 23.5MB / 4.52GB | 25.2GB / 19.3GB
99% | 1s / 7s | 36.6MB / 5.19GB | 25.6GB / 19.6GB
99.9% | 1s / 10s | 52.8MB / 6.49GB | 25.7GB / 19.8GB
RocksDB rescaling improvement & rescaling benchmark # Rescaling is a frequent operation for cloud services built on Apache Flink. This release leverages deleteRange to optimize rescaling of the incremental RocksDB state backend: deleteRange avoids massive scan-and-delete operations, and for upscaling with a large number of states that need to be deleted, restore speed can be improved by roughly 2 to 10 times.
Improve monitoring experience and usability of state backend # This release also improves the monitoring experience and usability of state backend. Previously, RocksDB’s log was located in its own DB folder, which made debugging RocksDB not so easy. This release lets RocksDB’s log stay in Flink’s log directory by default. RocksDB statistics-based metrics are introduced to help debug the performance at the database level, e.g. total block cache hit/miss count within the DB.
Support overdraft buffer # A new concept of overdraft network buffers is introduced to mitigate the effects of uninterruptibly blocking a subtask thread during back pressure; it is controlled through the taskmanager.network.memory.max-overdraft-buffers-per-gate configuration parameter.
Starting from 1.16.0, a Flink subtask can by default request up to 5 extra (overdraft) buffers beyond the regularly configured amount. This change can slightly increase the memory consumption of a Flink job but vastly reduce the checkpoint duration of unaligned checkpoints. Overdraft buffers come into play when a subtask is back-pressured by downstream subtasks and requires more than a single network buffer to finish what it is currently doing. Read more about this in the documentation.
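As a rough sketch of adjusting the limit named above (the key is quoted from this section; the value 10 is arbitrary), for example in a local test setup:
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class OverdraftBufferSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Raise the per-gate overdraft buffer limit from the default of 5 to 10.
        // On a real cluster this is a TaskManager option and is normally set in
        // flink-conf.yaml rather than in application code.
        conf.setInteger("taskmanager.network.memory.max-overdraft-buffers-per-gate", 10);

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
        env.enableCheckpointing(10_000);
        env.getCheckpointConfig().enableUnalignedCheckpoints();
        env.fromSequence(0, 1_000_000).rebalance().print();
        env.execute("overdraft-buffer-sketch");
    }
}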
Timeout aligned to unaligned checkpoint barrier in the output buffers of an upstream subtask # This release updates the timing of switching from Aligned Checkpoint (AC) to Unaligned Checkpoint (UC). With UC enabled, if execution.checkpointing.aligned-checkpoint-timeout is configured, each checkpoint will still begin as an AC, but when the global checkpoint duration exceeds the aligned-checkpoint-timeout, if the AC has not been completed, then the checkpoint will be switched to unaligned.
Previously, switching a single subtask had to wait for all barriers from upstream. Under severe back pressure, the downstream subtask might not receive all barriers within the checkpointing timeout, causing the checkpoint to fail.
In this release, if the barrier cannot be sent from the output buffer to the downstream task within the execution.checkpointing.aligned-checkpoint-timeout, Flink lets upstream subtasks switch to UC first to send barriers to downstream, thereby decreasing the probability of checkpoint timeout during back pressure. More details can be found in this documentation.
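A minimal sketch of enabling unaligned checkpoints together with the aligned-checkpoint timeout described above; the timeout value of 30 s is an arbitrary example, and the option key is the one quoted in this section.
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AlignedCheckpointTimeoutSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Each checkpoint starts aligned; after 30 s of alignment it may switch to unaligned.
        conf.setString("execution.checkpointing.aligned-checkpoint-timeout", "30 s");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
        env.enableCheckpointing(60_000);
        env.getCheckpointConfig().enableUnalignedCheckpoints();
        env.fromSequence(0, 1_000_000).rebalance().print();
        env.execute("aligned-checkpoint-timeout-sketch");
    }
}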
Non-Determinism In Stream Processing # Users often complain about the high cost of understanding stream processing. One of the pain points is non-determinism in stream processing (usually not intuitive), which may cause wrong results or errors. These pain points have been around since the early days of Flink SQL.
For complex streaming jobs, now it’s possible to detect and resolve potential correctness issues before running. If the problems can’t be resolved completely, a detailed message could prompt users to adjust the SQL so as to avoid introducing non-deterministic problems. More details can be found in the documentation.
Enhanced Lookup Join # Lookup join is widely used in stream processing, and we have introduced several improvements:
Adds a unified abstraction for lookup source cache and related metrics to speed up lookup queries Introduces the configurable asynchronous mode (ALLOW_UNORDERED) via job configuration or lookup hint to significantly improve query throughput without compromising correctness. Retryable lookup mechanism gives users more tools to solve the delayed updates issue in external systems. Retry Support For Async I/O # Introduces a built-in retry mechanism for asynchronous I/O that is transparent to the user’s existing code, allowing flexibility to meet the user’s retry and exception handling needs.
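The lookup hint below is a rough sketch of the asynchronous and retryable lookup options mentioned above. The tables and option values are hypothetical, and the exact LOOKUP hint option names should be confirmed against the 1.16 lookup join documentation.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LookupJoinHintSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical tables: 'orders' is a stream with a processing-time attribute,
        // 'customers' is a lookup (dimension) table. The LOOKUP hint options shown here
        // (async mode, unordered output, fixed-delay retry) are assumptions based on this
        // section; verify them in the official docs.
        tEnv.executeSql(
            "SELECT /*+ LOOKUP('table'='customers', 'async'='true', 'output-mode'='allow_unordered', "
                + "'retry-predicate'='lookup_miss', 'retry-strategy'='fixed_delay', "
                + "'fixed-delay'='1s', 'max-attempts'='3') */ "
                + "o.order_id, c.name "
                + "FROM orders AS o "
                + "JOIN customers FOR SYSTEM_TIME AS OF o.proc_time AS c "
                + "ON o.customer_id = c.id");
    }
}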
PyFlink # In Flink 1.15, we introduced a new execution mode, 'thread' mode, in which user-defined Python functions are executed in the JVM via JNI instead of in a separate Python process. In Flink 1.15 it was only supported for Python scalar functions in the Table API & SQL. In this release, we have provided more comprehensive support: it now also covers the Python DataStream API as well as Python table functions in the Table API & SQL.
We are also continuing to fill in the last few missing features in the Python API. In this release, we have provided more comprehensive support for the Python DataStream API, adding features such as side outputs and broadcast state, and have finalized the windowing support. We have also added support for more connectors and formats in the Python DataStream API, e.g. the Elasticsearch, Kinesis, Pulsar and hybrid source connectors, and the ORC and Parquet formats. With all these features added, the Python API now covers most of the notable features of the Java & Scala APIs, and users should be able to develop most kinds of Flink jobs smoothly in Python.
Others # New SQL Syntax # In 1.16, we extended the DDL syntax to help users make better use of SQL:
USING JAR supports dynamic loading of UDF jars to help platform developers easily manage UDFs. CREATE TABLE AS SELECT (CTAS) lets users create new tables based on existing tables and queries. ANALYZE TABLE lets users manually generate table statistics so that the optimizer can produce better execution plans (see the sketch below). Cache in DataStream for Interactive Programming # Supports caching the result of a transformation via DataStream#cache. The cached intermediate result is generated lazily the first time it is computed so that it can be reused by later jobs. If the cache is lost, it is recomputed using the original transformations. Currently only batch mode is supported. This feature is very useful for ML and interactive programming in Python.
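A minimal sketch combining the new CTAS and ANALYZE TABLE statements; the table names and connector options are placeholders, and the exact statistics clauses should be checked against the 1.16 SQL reference.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class NewSqlSyntaxSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // CTAS: derive a new table from a query against a hypothetical 'orders' table.
        tEnv.executeSql(
            "CREATE TABLE big_orders WITH ("
                + "  'connector' = 'filesystem', 'path' = '/tmp/big_orders', 'format' = 'csv'"
                + ") AS SELECT * FROM orders WHERE amount > 1000");

        // ANALYZE TABLE: generate statistics the optimizer can use for better plans.
        tEnv.executeSql("ANALYZE TABLE big_orders COMPUTE STATISTICS");
    }
}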
History Server & Completed Jobs Information Enhancement # We have enhanced the experiences of viewing completed jobs’ information in this release.
JobManager / HistoryServer WebUI now provides detailed execution time metrics, including duration tasks spent in each execution state and the accumulated busy / idle / back-pressured time during running. JobManager / HistoryServer WebUI now provides aggregation of major SubTask metrics, grouped by Task or TaskManager. JobManager / HistoryServer WebUI now provides more environmental information, including environment variables, JVM options and classpath. HistoryServer now supports browsing logs from external log archiving services. Protobuf format # Flink now supports the Protocol Buffers (Protobuf) format. This allows you to use this format directly in your Table API or SQL applications.
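A rough sketch of declaring a table with the new Protobuf format. The connector, topic, class name and option keys (in particular protobuf.message-class-name) are assumptions, so verify them in the Protobuf format documentation.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ProtobufFormatSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Hypothetical Kafka-backed table whose payload is decoded with the Protobuf format.
        // The generated class com.example.OrderProto is assumed to be on the classpath.
        tEnv.executeSql(
            "CREATE TABLE orders_proto ("
                + "  order_id BIGINT,"
                + "  amount DOUBLE"
                + ") WITH ("
                + "  'connector' = 'kafka',"
                + "  'topic' = 'orders',"
                + "  'properties.bootstrap.servers' = 'localhost:9092',"
                + "  'format' = 'protobuf',"
                + "  'protobuf.message-class-name' = 'com.example.OrderProto'"
                + ")");
    }
}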
Introduce configurable RateLimitingStrategy for Async Sink # The Async Sink was introduced in 1.15 to allow users to easily implement their own custom asynchronous sinks. We have now extended it to support a configurable RateLimitingStrategy. This means sink implementers can customize how their Async Sink behaves when requests fail, depending on the specific sink. If no RateLimitingStrategy is specified, it falls back to the existing AIMDScalingStrategy default.
Upgrade Notes # We aim to make upgrades as smooth as possible, but some of the changes require users to adjust some parts of the program when upgrading to Apache Flink 1.16. Please take a look at the release notes for a list of adjustments to make and issues to check during upgrades.
List of Contributors # The Apache Flink community would like to thank each one of the contributors that have made this release possible:
1996fanrui, Ada Wang, Ada Wong, Ahmed Hamdy, Aitozi, Alexander Fedulov, Alexander Preuß, Alexander Trushev, Andriy Redko, Anton Kalashnikov, Arvid Heise, Ben Augarten, Benchao Li, BiGsuw, Biao Geng, Bobby Richard, Brayno, CPS794, Cheng Pan, Chengkai Yang, Chesnay Schepler, Danny Cranmer, David N Perkins, Dawid Wysakowicz, Dian Fu, DingGeGe, EchoLee5, Etienne Chauchot, Fabian Paul, Ferenc Csaky, Francesco Guardiani, Gabor Somogyi, Gen Luo, Gyula Fora, Haizhou Zhao, Hangxiang Yu, Hao Wang, Hong Liang Teoh, Hong Teoh, Hongbo Miao, HuangXingBo, Ingo Bürk, Jacky Lau, Jane Chan, Jark Wu, Jay Li, Jia Liu, Jie Wang, Jin, Jing Ge, Jing Zhang, Jingsong Lee, Jinhu Wu, Joe Moser, Joey Pereira, Jun He, JunRuiLee, Juntao Hu, JustDoDT, Kai Chen, Krzysztof Chmielewski, Krzysztof Dziolak, Kyle Dong, LeoZhang, Levani Kokhreidze, Lihe Ma, Lijie Wang, Liu Jiangang, Luning Wang, Marios Trivyzas, Martijn Visser, MartijnVisser, Mason Chen, Matthias Pohl, Metehan Yıldırım, Michael, Mingde Peng, Mingliang Liu, Mulavar, Márton Balassi, Nie yingping, Niklas Semmler, Paul Lam, Paul Lin, Paul Zhang, PengYuan, Piotr Nowojski, Qingsheng Ren, Qishang Zhong, Ran Tao, Robert Metzger, Roc Marshal, Roman Boyko, Roman Khachatryan, Ron, Ron Cohen, Ruanshubin, Rudi Kershaw, Rufus Refactor, Ryan Skraba, Sebastian Mattheis, Sergey, Sergey Nuyanzin, Shengkai, Shubham Bansal, SmirAlex, Smirnov Alexander, SteNicholas, Steven van Rossum, Suhan Mao, Tan Yuxin, Tartarus0zm, TennyZhuang, Terry Wang, Thesharing, Thomas Weise, Timo Walther, Tom, Tony Wei, Weijie Guo, Wencong Liu, WencongLiu, Xintong Song, Xuyang, Yangze Guo, Yi Tang, Yu Chen, Yuan Huang, Yubin Li, Yufan Sheng, Yufei Zhang, Yun Gao, Yun Tang, Yuxin Tan, Zakelly, Zhanghao Chen, Zhu Zhu, Zichen Liu, Zili Sun, acquachen, bgeng777, billyrrr, bzhao, caoyu, chenlei677, chenzihao, chenzihao5, coderap, cphe, davidliu, dependabot[bot], dkkb, dusukang, empcl, eyys, fanrui, fengjiankun, fengli, fredia, gabor.g.somogyi, godfreyhe, gongzhongqiang, harker2015, hongli, huangxingbo, huweihua, jayce, jaydonzhou, jiabao.sun, kevin.cyj, kurt, lidefu, lijiewang.wlj, liliwei, lincoln lee, lincoln.lil, littleeleventhwolf, liufangqi, liujia10, liujiangang, liujingmao, liuyongvs, liuzhuang2017, longwang, lovewin99, luoyuxia, mans2singh, maosuhan, mayue.fight, mayuehappy, nieyingping, pengmide, pengmingde, polaris6, pvary, qinjunjerry, realdengziqi, root, shammon, shihong90, shuiqiangchen, slinkydeveloper, snailHumming, snuyanzin, suxinglee, sxnan, tison, trushev, tsreaper, unknown, wangfeifan, wangyang0918, wangzhiwu, wenbingshen, xiangqiao123, xuyang, yangjf2019, yangjunhan, yangsanity, yangxin, ylchou, yuchengxin, yunfengzhou-hub, yuxia Luo, yuzelin, zhangchaoming, zhangjingcun, zhangmang, zhangzhengqi3, zhaoweinan, zhengyunhong.zyh, zhenyu xing, zhouli, zhuanshenbsj1, zhuzhu.zz, zoucao, zp, 周磊, 饶紫轩,, 鲍健昕 愚鲤, 帝国阿三
`}),e.add({id:67,href:"/2022/10/13/apache-flink-table-store-0.2.1-release-announcement/",title:"Apache Flink Table Store 0.2.1 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce the first bug fix release of the Flink Table Store 0.2 series.
This release includes 13 bug fixes, vulnerability fixes, and minor improvements for Flink Table Store 0.2. Below you will find a list of all bugfixes and improvements. For a complete list of all changes see: JIRA.
We highly recommend all users upgrade to Flink Table Store 0.2.1.
Release Artifacts # Binaries # You can find the binaries on the updated Downloads page.
Release Notes # Bug [FLINK-29098] - StoreWriteOperator#prepareCommit should let logSinkFunction flush first before fetching offset [FLINK-29241] - Can not overwrite from empty input [FLINK-29273] - Page not enough Exception in SortBufferMemTable [FLINK-29278] - BINARY type is not supported in table store [FLINK-29295] - Clear RecordWriter slower to avoid causing frequent compaction conflicts [FLINK-29367] - Avoid manifest corruption for incorrect checkpoint recovery [FLINK-29369] - Commit delete file failure due to Checkpoint aborted [FLINK-29385] - AddColumn in flink table store should check the duplicate field names [FLINK-29412] - Connection leak in orc reader Improvement [FLINK-29154] - Support LookupTableSource for table store [FLINK-29181] - log.system can be congiured by dynamic options [FLINK-29226] - Throw exception for streaming insert overwrite [FLINK-29276] - Flush all memory in SortBufferMemTable.clear `}),e.add({id:68,href:"/2022/10/07/apache-flink-kubernetes-operator-1.2.0-release-announcement/",title:"Apache Flink Kubernetes Operator 1.2.0 Release Announcement",section:"Flink Blog",content:`We are proud to announce the latest stable release of the operator. The 1.2.0 release adds support for the Standalone Kubernetes deployment mode and includes several improvements to the core logic.
Release Highlights # Standalone deployment mode support Improved upgrade flow Readiness and liveness probes Flexible job jar handling Standalone deployment mode support # Until now the operator relied exclusively on Flink’s built-in Native Kubernetes integration to deploy and manage Flink clusters. When using the Native deployment mode the Flink cluster communicates directly with Kubernetes to allocate/deallocate TaskManager resources on the fly. While this leads to a very simple deployment model, in some environments it also means higher security exposure as the user code running on the Flink cluster may gain the same Kubernetes access privileges.
Flink Kubernetes Operator 1.2.0 brings Standalone mode support for FlinkDeployment resources.
When using the standalone mode, the operator itself sets up the Job and TaskManager resources for the Flink cluster. Flink processes then run without any need for Kubernetes access. In fact in this mode the Flink cluster itself is unaware that it is running in a Kubernetes environment. If unknown or external code is being executed on the Flink cluster then Standalone mode adds another layer of security.
The default deployment mode is Native. Native deployment mode remains the recommended mode for standard operator use and when running your own Flink jobs.
The deployment mode can be set using the mode field in the deployment spec.
apiVersion: flink.apache.org/v1beta1 kind: FlinkDeployment ... spec: ... mode: native/standalone Improved upgrade flow # There have been a number of important changes that improve the job submission and upgrade flow. The operator now distinguishes configuration & spec changes that do not require the redeployment of the Flink cluster resources (such as setting a new periodic savepoint interval). These improvements now avoid unnecessary job downtime in many cases.
Leveraging the standalone deployment mode the operator now also supports rescaling jobs directly using Flink’s reactive scheduler. When changing the parallelism of an application FlinkDeployment with mode: standalone set and scheduler-mode: reactive in the flinkConfiguration the operator will simply increase the number of TaskManagers to match the new parallelism and let Flink do the scaling automatically (reactively). Same as with the reactive scaling itself, this is considered to be an experimental feature.
There are also some important fixes to problems that might occur when switching between Flink versions or using the stateless upgrade mode.
Readiness and Liveness probes # From an operational perspective it is very important to be able to determine the health of the Kubernetes Operator process. The operator now exposes a health endpoint by default together with a liveness and readiness probe.
Flexible job jar handling # The 1.2.0 release now makes the jobSpec.jarURI parameter optional to allow users to run jobs using dependencies that are already bundled in the Flink classpath.
This can be especially valuable in session deployments when multiple jobs, reusing the same artifacts, are deployed with different configurations.
Release Resources # The source artifacts and helm chart are now available on the updated Downloads page of the Flink website.
$ helm repo add flink-kubernetes-operator-1.2.0 https://archive.apache.org/dist/flink/flink-kubernetes-operator-1.2.0/
$ helm install flink-kubernetes-operator flink-kubernetes-operator-1.2.0/flink-kubernetes-operator --set webhook.create=false
You can also find official Kubernetes Operator Docker images of the new version on Dockerhub (https://hub.docker.com/r/apache/flink-kubernetes-operator). For more details, check the updated documentation and the release notes. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # Aitozi, Avocadomaster, ConradJam, Dylan Meissner, Gabor Somogyi, Gaurav Miglani, Gyula Fora, Jeesmon Jacob, Joao Ubaldo, Marton Balassi, Matyas Orhidi, Maximilian Michels, Nicholas Jiang, Peter Huang, Robson Roberto Souza Peixoto, Thomas Weise, Tim, Usamah Jassat, Xin Hao, Yaroslav Tkachenko
`}),e.add({id:69,href:"/2022/09/28/apache-flink-1.14.6-release-announcement/",title:"Apache Flink 1.14.6 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce another bug fix release for Flink 1.14.
This release includes 34 bug fixes, vulnerability fixes and minor improvements for Flink 1.14. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.
We highly recommend all users to upgrade to Flink 1.14.6.
Release Artifacts # Maven Dependencies # <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.14.6</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java_2.11</artifactId> <version>1.14.6</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients_2.11</artifactId> <version>1.14.6</version> </dependency> Binaries # You can find the binaries on the updated Downloads page.
Docker Images # library/flink (official images) apache/flink (ASF repository) PyPi # apache-flink==1.14.6 Release Notes # Bug [FLINK-24862] - The user-defined hive udaf/udtf cannot be used normally in hive dialect [FLINK-25454] - Negative time in throughput calculator [FLINK-27041] - KafkaSource in batch mode failing if any topic partition is empty [FLINK-27399] - Pulsar connector didn't set start consuming position correctly [FLINK-27418] - Flink SQL TopN result is wrong [FLINK-27683] - Insert into (column1, column2) Values(.....) fails with SQL hints [FLINK-27762] - Kafka WakeupException during handling splits changes [FLINK-28019] - Error in RetractableTopNFunction when retracting a stale record with state ttl enabled [FLINK-28057] - LD_PRELOAD is hardcoded to x64 on flink-docker [FLINK-28357] - Watermark issue when recovering Finished sources [FLINK-28454] - Fix the wrong timestamp example of KafkaSource [FLINK-28609] - Flink-Pulsar connector fails on larger schemas [FLINK-28880] - Fix CEP doc with wrong result of strict contiguity of looping patterns [FLINK-28908] - Coder for LIST type is incorrectly chosen is PyFlink [FLINK-28978] - Kinesis connector doesn't work for new AWS regions [FLINK-29130] - Correct the doc description of state.backend.local-recovery [FLINK-29138] - Project pushdown not work for lookup source Improvement [FLINK-27865] - Add guide and example for configuring SASL and SSL in Kafka SQL connector document [FLINK-28094] - Upgrade AWS SDK to support ap-southeast-3 `}),e.add({id:70,href:"/2022/09/08/regarding-akkas-licensing-change/",title:"Regarding Akka's licensing change",section:"Flink Blog",content:`On September 7th Lightbend announced a license change for the Akka project, the TL;DR being that you will need a commercial license to use future versions of Akka (2.7+) in production if you exceed a certain revenue threshold.
Within a few hours of the announcement several people reached out to the Flink project, worrying about the impact this has on Flink, as we use Akka internally.
The purpose of this blogpost is to clarify our position on the matter.
Please be aware that this topic is still quite fresh, and things are subject to change.
Should anything significant change we will amend this blogpost and inform you via the usual channels.
Give me the short version # Flink is not in any immediate danger and we will ensure that users are not affected by this change.
The licensing of Flink will not change; it will stay Apache-licensed and will only contain dependencies that are compatible with it.
We will not use Akka versions with the new license.
What’s the plan going forward? # For now, we’ll stay on Akka 2.6, the current latest version that is still available under the original license. Historically Akka has been incredibly stable, and combined with our limited use of features, we do not expect this to be a problem.
Meanwhile, we will
observe how the situation unfolds (in particular w.r.t. community forks) and look into a replacement for Akka. Should a community fork be created (which at this time seems possible), we will in all likelihood switch to that fork for 1.15+.
What if a new security vulnerability is found in Akka 2.6? # That is the big unknown.
Even though we will be able to upgrade to 2.6.20 (the (apparently) last planned release for Akka 2.6) in Flink 1.17, the unfortunate reality is that 2.6 will no longer be supported from that point onwards.
Should a CVE be discovered after that it is unlikely to be fixed in Akka 2.6.
We cannot provide a definitive answer as to how that case would be handled, as it depends on what the CVE is and/or whether a community fork already exists at the time.
Update - September 9th: Akka 2.6 will continue to receive critical security updates and critical bug fixes under the current Apache 2 license until September of 2023.
Will critical vulnerabilities and bugs be patched in 2.6.x?
Yes, critical security updates and critical bugs will be patched in Akka v2.6.x under the current Apache 2 license until September of 2023.
How does Flink use Akka? # Akka is used in the coordination layer of Flink to
exchange status messages between processes/components (e.g., JobManager and TaskManager), enforce certain guarantees w.r.t. multi-threading (i.e., only one thread can make changes to the internal state of a component), and observe components for unexpected crashes (i.e., notice and handle TaskManager thread crashes). What this means is that we are using very few functionalities of Akka.
Additionally, that we use Akka is an implementation detail that the vast majority of Flink code isn’t aware of, meaning that we can replace it with something else without having to change Flink significantly.
`}),e.add({id:71,href:"/2022/08/29/apache-flink-table-store-0.2.0-release-announcement/",title:"Apache Flink Table Store 0.2.0 Release Announcement",section:"Flink Blog",content:` The Apache Flink community is pleased to announce the release of the Apache Flink Table Store (0.2.0).
Please check out the full documentation for detailed information and user guides.
What is Flink Table Store # Flink Table Store is a data lake storage for ingesting streaming changelogs (updates/deletes) and serving high-performance queries in real time.
As a new type of updatable data lake, Flink Table Store has the following features:
High-throughput data ingestion while offering good query performance. High-performance queries with primary key filters, as fast as 100ms. Streaming reads are available on the lake storage, and the lake storage can also be integrated with Kafka to provide second-level streaming reads. Notable Features # In this release, we have accomplished many exciting features.
Catalog # This release introduces Table Store’s own catalog and supports automatic synchronization to the Hive Metastore.
CREATE CATALOG tablestore WITH ( 'type'='table-store', 'warehouse'='hdfs://nn:8020/warehouse/path', -- optional hive metastore 'metastore'='hive', 'uri'='thrift://<hive-metastore-host-name>:<port>' ); USE CATALOG tablestore; CREATE TABLE my_table ... Ecosystem # In this release, we provide support for Flink 1.14 and provide read support for multiple compute engines.
Engine | Version | Feature | Read Pushdown
Flink | 1.14 | read, write | Projection, Filter
Flink | 1.15 | read, write | Projection, Filter
Hive | 2.3 | read | Projection, Filter
Spark | 2.4 | read | Projection, Filter
Spark | 3.0 | read | Projection, Filter
Spark | 3.1 | read | Projection, Filter
Spark | 3.2 | read | Projection, Filter
Spark | 3.3 | read | Projection, Filter
Trino | 358 | read | Projection, Filter
Trino | 388 | read | Projection, Filter
Append-only # The append-only table feature is a performance improvement: it only accepts INSERT_ONLY data, which is appended to the storage instead of updating or de-duplicating existing data, and is hence suitable for use cases that do not require updates (such as log data synchronization).
CREATE TABLE my_table ( ... ) WITH ( 'write-mode' = 'append-only', ... ) Streaming writing to an Append-only table also has asynchronous compaction, so you don’t have to worry about small files.
Rescale Bucket # Since the total number of buckets dramatically influences performance, Table Store allows users to tune the bucket number with the ALTER TABLE command and then reorganize the necessary partitions, while old partitions remain unchanged (see the sketch below).
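A rough sketch of the rescale workflow, under the assumption that the bucket number is a table option named 'bucket' and that affected partitions are reorganized with INSERT OVERWRITE; the table name, partition and value are placeholders, so follow the Table Store documentation for the exact procedure.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class RescaleBucketSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Assumed option name 'bucket': raise the bucket count used for new writes.
        tEnv.executeSql("ALTER TABLE my_table SET ('bucket' = '16')");

        // Rewrite only the data of the partition that needs the new bucket layout;
        // partitions that are not rewritten keep their old layout.
        tEnv.executeSql(
            "INSERT OVERWRITE my_table SELECT * FROM my_table WHERE dt = '2022-08-29'");
    }
}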
Getting started # Please refer to the getting started guide for more details.
What’s Next? # In the upcoming 0.3.0 release you can expect the following additional features:
Streaming changelog concurrent writes, with compaction separated out. Aggregation tables, to build your materialized views. Changelog producing for partial-update/aggregation tables. Full schema evolution support for dropping and renaming columns. Lookup support for Flink dimension-table joins. Please give the release a try, share your feedback on the Flink mailing list and contribute to the project!
We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # The Apache Flink community would like to thank every one of the contributors that have made this release possible:
Jane Chan, Jia Liu, Jingsong Lee, liliwei, Nicholas Jiang, openinx, tsreaper
`}),e.add({id:72,href:"/2022/08/24/apache-flink-1.15.2-release-announcement/",title:"Apache Flink 1.15.2 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce the second bug fix release of the Flink 1.15 series.
This release includes 30 bug fixes, vulnerability fixes, and minor improvements for Flink 1.15. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.
We highly recommend all users upgrade to Flink 1.15.2.
Release Artifacts # Maven Dependencies # <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.15.2</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java</artifactId> <version>1.15.2</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients</artifactId> <version>1.15.2</version> </dependency> Binaries # You can find the binaries on the updated Downloads page.
Docker Images # library/flink (official images) apache/flink (ASF repository) PyPi # apache-flink==1.15.2 Upgrade Notes # For Table API: 1.15.0 and 1.15.1 generated non-deterministic UIDs for operators that make it difficult/impossible to restore state or upgrade to next patch version. A new table.exec.uid.generation config option (with correct default behavior) disables setting a UID for new pipelines from non-compiled plans. Existing pipelines can set table.exec.uid.generation=ALWAYS if the 1.15.0/1 behavior was acceptable due to a stable environment. See FLINK-28861 for more information.
Release Notes # Bug [FLINK-23528] - stop-with-savepoint can fail with FlinkKinesisConsumer [FLINK-25097] - Bug in inner join when the filter condition is boolean type [FLINK-26931] - Pulsar sink's producer name should be unique [FLINK-27399] - Pulsar connector didn't set start consuming position correctly [FLINK-27570] - Checkpoint path error does not cause the job to stop [FLINK-27794] - The primary key obtained from MySQL is incorrect by using MysqlCatalog [FLINK-27856] - Adding pod template without spec crashes job manager [FLINK-28027] - Initialise Async Sink maximum number of in flight messages to low number for rate limiting strategy [FLINK-28057] - LD_PRELOAD is hardcoded to x64 on flink-docker [FLINK-28226] - 'Run kubernetes pyflink application test' fails while pulling image [FLINK-28239] - Table-Planner-Loader lacks access to commons-math3 [FLINK-28240] - NettyShuffleMetricFactory#RequestedMemoryUsageMetric#getValue may throw ArithmeticException when the total segments of NetworkBufferPool is 0 [FLINK-28250] - exactly-once sink kafka cause out of memory [FLINK-28269] - Kubernetes test failed with permission denied [FLINK-28322] - DataStreamScanProvider's new method is not compatible [FLINK-28357] - Watermark issue when recovering Finished sources [FLINK-28404] - Annotation @InjectClusterClient does not work correctly with RestClusterClient [FLINK-28454] - Fix the wrong timestamp example of KafkaSource [FLINK-28577] - 1.15.1 web ui console report error about checkpoint size [FLINK-28602] - StateChangeFsUploader cannot close stream normally while enabling compression [FLINK-28817] - NullPointerException in HybridSource when restoring from checkpoint [FLINK-28835] - Savepoint and checkpoint capabilities and limitations table is incorrect [FLINK-28861] - Non-deterministic UID generation might cause issues during restore [FLINK-28880] - Fix CEP doc with wrong result of strict contiguity of looping patterns [FLINK-28908] - Coder for LIST type is incorrectly chosen is PyFlink [FLINK-28978] - Kinesis connector doesn't work for new AWS regions [FLINK-28994] - Enable withCredentials for Flink UI Improvement [FLINK-27199] - Bump Pulsar to 2.10.0 for fixing the unstable Pulsar test environment. [FLINK-27865] - Add guide and example for configuring SASL and SSL in Kafka SQL connector document [FLINK-28094] - Upgrade AWS SDK to support ap-southeast-3 [FLINK-28140] - Improve the documentation by adding Python examples [FLINK-28486] - [docs-zh] Flink FileSystem SQL Connector Doc is not right `}),e.add({id:73,href:"/2022/07/25/apache-flink-kubernetes-operator-1.1.0-release-announcement/",title:"Apache Flink Kubernetes Operator 1.1.0 Release Announcement",section:"Flink Blog",content:`The community has continued to work hard on improving the Flink Kubernetes Operator capabilities since our first production ready release we launched about two months ago.
With the release of Flink Kubernetes Operator 1.1.0 we are proud to announce a number of exciting new features improving the overall experience of managing Flink resources and the operator itself in production environments.
Release Highlights # A non-exhaustive list of some of the more exciting features added in the release:
Kubernetes Events on application and job state changes New operator metrics Unified and more robust reconciliation flow Periodic savepoints Custom Flink Resource Listeners Dynamic watched namespaces New built-in examples For Flink SQL and PyFlink Experimental autoscaling support Kubernetes Events for Application and Job State Changes # The operator now emits native Kubernetes Events on relevant Flink Deployment and Job changes. This includes status changes, custom resource specification changes, deployment failures, etc.
Events:
  Type    Reason         Age   From                   Message
  ----    ------         ---   ----                   -------
  Normal  Submit         53m   JobManagerDeployment   Starting deployment
  Normal  StatusChanged  52m   Job                    Job status changed from RECONCILING to CREATED
  Normal  StatusChanged  52m   Job                    Job status changed from CREATED to RUNNING
New Operator Metrics # The first version of the operator only came with basic system level metrics to monitor the JVM process.
In 1.1.0 we have introduced a wide range of additional metrics related to lifecycle-management, Kubernetes API server access and the Java Operator SDK framework the operator itself is built on. These metrics allow operator administrators to get a comprehensive view of what’s happening in the environment.
For details check the list of supported metrics.
Unified and more robust reconciliation flow # We have refactored and streamlined the core reconciliation flow responsible for executing and tracking resource upgrades, savepoints, rollbacks and other operations.
In the process we made a number of important improvements to tolerate operator failures and temporary Kubernetes API outages more gracefully, which is critical in production environments.
Periodic Savepoints # By popular demand, we have introduced periodic savepoints for applications and session jobs, using the following simple configuration option:
flinkConfiguration: ... kubernetes.operator.periodic.savepoint.interval: 6h Old savepoints are cleaned up automatically according to the user configured policy:
kubernetes.operator.savepoint.history.max.count: 5 kubernetes.operator.savepoint.history.max.age: 48h Custom Flink Resource Listeners # The operator allows users to listen to events and status updates triggered for the Flink Resources managed by the operator.
This feature enables tighter integration with the user’s own data platform. By implementing the FlinkResourceListener interface users can listen to both events and status updates per resource type (FlinkDeployment / FlinkSessionJob). The interface methods will be called after the respective events have been triggered by the system.
New SQL and Python Job Examples # To demonstrate the power of the operator for all Flink use-cases, we have added examples showcasing how to deploy Flink SQL and Python jobs.
We have also added a brief README for the examples to make it easier for you to find what you are looking for.
Dynamic watched namespaces # The operator can watch and manage custom resources in an arbitrary list of namespaces. The watched namespaces can be defined through the property kubernetes.operator.watched.namespaces: ns1,ns2. The list of watched namespaces can be changed anytime in the corresponding config map; however, the operator ignores the changes unless dynamic namespace watching is enabled.
This is controlled by the property kubernetes.operator.dynamic.namespaces.enabled: true.
Experimental autoscaling support # In this version we have taken the first steps toward enabling Kubernetes native autoscaling integration for the operator. The FlinkDeployment CRD now exposes the scale subresource which allows us to create HPA policies directly in Kubernetes that will monitor the task manager pods.
This integration is still very much experimental but we are planning to build on top of this in the upcoming releases to provide a reliable scaling mechanism.
You can find an example scaling policy here.
What’s Next? # In the coming months, our focus will be on the following key areas:
Standalone deployment mode support Hardening of rollback mechanism and stability conditions Scaling improvements Support for older Flink versions These features will allow the operator and users to benefit more from the recent advancements in Flink’s scheduling capabilities.
Upgrading to 1.1.0 # The new 1.1.0 release is backward compatible as long as you follow our operator upgrade guide.
Please ensure that CRDs are updated in order to enable some of the new features.
The upgrade should not impact any currently deployed Flink resources.
Release Resources # The source artifacts and helm chart are available on the Downloads page of the Flink website. You can easily try out the new features shipped in the official 1.1.0 release by following our quickstart guide.
You can also find official Kubernetes Operator Docker images of the new version on Dockerhub.
For more details, check the updated documentation and the release notes. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # Aitozi, Biao Geng, Chethan, ConradJam, Dora Marsal, Gyula Fora, Hao Xin, Hector Miuler Malpica Gallegos, Jaganathan Asokan, Jeesmon Jacob, Jim Busche, Maksim Aniskov, Marton Balassi, Matyas Orhidi, Nicholas Jiang, Peng Yuan, Peter Vary, Thomas Weise, Xin Hao, Yang Wang
`}),e.add({id:74,href:"/2022/07/12/apache-flink-ml-2.1.0-release-announcement/",title:"Apache Flink ML 2.1.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is excited to announce the release of Flink ML 2.1.0! This release focuses on improving Flink ML’s infrastructure, such as Python SDK, memory management, and benchmark framework, to facilitate the development of performant, memory-safe, and easy-to-use algorithm libraries. We validated the enhanced infrastructure by implementing, benchmarking, and optimizing 10 new algorithms in Flink ML, and confirmed that Flink ML can meet or exceed the performance of selected algorithms from alternative popular ML libraries. In addition, this release added example Python and Java programs for each algorithm in the library to help users learn and use Flink ML.
With the improvements and performance benchmarks made in this release, we believe Flink ML’s infrastructure is ready for use by the interested developers in the community to build performant pythonic machine learning libraries.
We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA! We hope you like the new release and we’d be eager to learn about your experience with it.
Notable Features # API and Infrastructure # Supporting fine-grained per-operator memory management # Before this release, algorithm operators with internal states (e.g. the training data to be replayed for each round of iteration) store state data using the state-backend API (e.g. ListState). Such an operator either needs to store all data in memory, which risks OOM, or it needs to always store data on disk. In the latter case, it needs to read and de-serialize all data from disks repeatedly in each round of iteration even if the data can fit in RAM, leading to sub-optimal performance when the training data size is small. This makes it hard for developers to write performant and memory-safe operators.
This release enhances the Flink ML infrastructure with the mechanism to specify the amount of managed memory that an operator can consume. This allows algorithm operators to write and read data from managed memory when the data size is below the quota, and automatically spill those data that exceeds the memory quota to disks to avoid OOM. Algorithm developers can use this mechanism to achieve optimal algorithm performance as input data size varies. Please feel free to check out the implementation of the KMeans operator for example.
Improved infrastructure for developing online learning algorithms # A key objective of Flink ML is to facilitate the development of online learning applications. In the last release, we enhanced the Flink ML API with setModelData() and getModelData(), which allows users of online learning algorithms to transmit and persist model data as unbounded data streams. This release continues the effort by improving and validating the infrastructure needed to develop online learning algorithms.
Specifically, this release added two online learning algorithm prototypes (i.e. OnlineKMeans and OnlineLogisticRegression) with tests covering the entire lifecycle of using these algorithms. These two algorithms introduce concepts such as global batch size and model version, together with metrics and APIs to set and get those values. While the online algorithm prototypes have not been optimized for prediction accuracy yet, this line of work is an important step toward setting up best practices for building online learning algorithms in Flink ML. We hope more contributors from the community can join this effort.
Algorithm benchmark framework # An easy-to-use benchmark framework is critical to developing and maintaining performant algorithm libraries in Flink ML. This release added a benchmark framework that provides APIs to write pluggable and reusable data generators, takes benchmark configuration in JSON format, and outputs benchmark results in JSON format to enable custom analysis. An off-the-shelf script is provided to visualize benchmark results using Matplotlib. Feel free to check out this README for instructions on how to use this benchmark framework.
The benchmark framework currently supports evaluating algorithm throughput. In the future release, we plan to support evaluating algorithm latency and accuracy.
Python SDK # This release enhances the Python SDK so that operators in the Flink ML Python library can invoke the corresponding operators in the Java library. The Python operator is a thin-wrapper around the Java operator and delivers the same performance as the Java operator during execution. This capability significantly improves developer velocity by allowing algorithm developers to maintain both the Python and the Java libraries of algorithms without having to implement those algorithms twice.
Algorithm Library # This release continues to extend the algorithm library in Flink ML, with the focus on validating the functionalities and the performance of Flink ML infrastructure using representative algorithms in different categories.
Below are the lists of algorithms newly supported in this release, grouped by their categories:
Feature engineering (MinMaxScaler, StringIndexer, VectorAssembler, StandardScaler, Bucketizer) Online learning (OnlineKMeans, OnlineLogisticRegression) Regression (LinearRegression) Classification (LinearSVC) Evaluation (BinaryClassificationEvaluator) Example Python and Java programs for these algorithms are provided on the Apache Flink ML website to help users learn and try out Flink ML. We also provide example benchmark configuration files in the repo for users to validate Flink ML performance. Feel free to check out this README for instructions on how to run those benchmarks.
Upgrade Notes # Please review this note for a list of adjustments to make and issues to check if you plan to upgrade to Flink ML 2.1.0.
This note discusses any critical information about incompatibilities and breaking changes, performance changes, and any other changes that might impact your production deployment of Flink ML.
Flink dependency is changed from 1.14 to 1.15.
This change introduces all the breaking changes listed in the Flink 1.15 release notes.
Release Notes and Resources # Please take a look at the release notes for a detailed list of changes and new features.
The source artifacts are now available on the updated Downloads page of the Flink website, and the most recent distribution of the Flink ML Python package is available on PyPI.
List of Contributors # The Apache Flink community would like to thank each one of the contributors that have made this release possible:
Yunfeng Zhou, Zhipeng Zhang, huangxingbo, weibo, Dong Lin, Yun Gao, Jingsong Li and mumuhhh.
`}),e.add({id:75,href:"/2022/07/11/flip-147-support-checkpoints-after-tasks-finished-part-one/",title:"FLIP-147: Support Checkpoints After Tasks Finished - Part One",section:"Flink Blog",content:` Motivation # Flink is a distributed processing engine for both unbounded and bounded streams of data. In recent versions, Flink has unified the DataStream API and the Table / SQL API to support both streaming and batch cases. Since most users require both types of data processing pipelines, the unification helps reduce the complexity of developing, operating, and maintaining consistency between streaming and batch backfilling jobs, like the case for Alibaba.
Flink provides two execution modes under the unified programming API: the streaming mode and the batch mode. The streaming mode processes records incrementally based on state, and thus supports both bounded and unbounded sources. The batch mode works with bounded sources and usually performs better for bounded jobs because it executes all the tasks in topological order and avoids random state access by pre-sorting the input records. Although batch mode is often the preferred mode for bounded jobs, streaming mode is also required for various reasons. For example, users may want to deal with records containing retractions or exploit the property that data is roughly sorted by event time in streaming mode (like the case in Kappa+ Architecture). Moreover, users often have mixed jobs involving both unbounded streams and bounded side-inputs, which also require streaming execution mode.
Figure 1. A comparison of the streaming mode and batch mode for the example Count operator. In streaming mode, the arriving elements are not sorted, and the operator reads and writes the state corresponding to each element. In batch mode, the arriving elements are first sorted as a whole and then processed. In streaming mode, checkpointing is the vital mechanism for supporting exactly-once guarantees. By periodically snapshotting the aligned states of operators, Flink can recover from the latest checkpoint and continue execution when a failover happens. However, previously Flink could not take checkpoints if any task had finished. This caused problems for jobs with both bounded and unbounded sources: if there are no checkpoints after the bounded part has finished, the unbounded part might need to reprocess a large amount of records in case of a failure.
Furthermore, being unable to take checkpoints with finished tasks is a problem for jobs using two-phase-commit sinks to achieve end-to-end exactly-once processing. Two-phase-commit sinks first write data to temporary files or external transactions, and commit the data only after a checkpoint completes, to ensure the data is not replayed on failure. However, if a job contains bounded sources, committing the results would not be possible after the bounded sources finish. For the same reason, bounded jobs had no way to commit the last piece of data after the first source task finished, and previously bounded jobs simply ignored the uncommitted data when finishing. These behaviors caused a lot of confusion and were frequently raised on the user mailing list.
Therefore, to complete the support of streaming mode for jobs using bounded sources, it is important for us to
Support taking checkpoints with finished tasks. Furthermore, revise the process of finishing so that all the data can always be committed. The remainder of this post briefly describes the changes we made to achieve the above goals. In the next post, we’ll share more details on how they are implemented.
Support Checkpointing with Finished Tasks # The core idea of supporting checkpoints with finished tasks is to mark the finished operators in checkpoints and skip executing these operators after recovery. As illustrated in Figure 2, a checkpoint is composed of the states of all the operators. If all the subtasks of an operator have finished, we can mark it as fully finished and skip the execution of this operator on startup. For other operators, their states are composed of the states of all the running subtasks. The states are repartitioned on restart and all the new subtasks are restarted with the assigned states.
Figure 2. An illustration of the extended checkpoint format. To support creating such a checkpoint for jobs with finished tasks, we extended the checkpoint procedure. Previously the checkpoint coordinator inside the JobManager first notified all the sources to take snapshots, and the sources then notified their descendants by broadcasting barrier events. Since the sources might now have already finished, the checkpoint coordinator instead treats the running tasks that have no running precedent tasks as “new sources”, and it notifies these tasks to initiate the checkpoint. Finally, if the subtasks of an operator have either finished by the time the checkpoint is triggered or have finished processing all the data when snapshotting their states, the operator is marked as fully finished.
The changes to the checkpoint procedure are transparent to users, with one exception: for checkpoints that contain finished tasks, we disallow adding new operators as precedents of the fully finished ones, since that would give the fully finished operators running precedents after restarting, which conflicts with the design that tasks finish in topological order.
Revise the Process of Finishing # Based on the ability to take checkpoints with finished tasks, we could then solve the issue that two-phase-commit operators could not commit all the data when running in streaming mode. As background, Flink jobs have two ways to finish:
All sources are bounded and have processed all the input records. The job finishes after all the input records are processed and all the results are committed to external systems. Users execute stop-with-savepoint [--drain]. The job takes a savepoint and then finishes. With --drain, the job is stopped permanently and is also required to commit all the data. Without --drain, the job might be resumed from the savepoint later, so not all data needs to be committed, as long as the state of the data can be recovered from the savepoint. Let’s first have a look at the case of bounded sources. To achieve end-to-end exactly-once, two-phase-commit operators only commit data after a checkpoint following this piece of data has succeeded. However, previously there was no such opportunity for the data between the last periodic checkpoint and the end of the job, and that data was lost. Note that it would also not be correct to commit the data directly when the job finishes, since if there is a failover after that (for example, due to other unfinished tasks failing), the data would be replayed and cause duplication.
The case of stop-with-savepoint --drain also has problems. The previous implementation first stalls the execution and takes a savepoint. After the savepoint succeeds, all the source tasks stop actively. Although the savepoint seems to provide the opportunity to commit all the data, some processing logic is in fact executed while the job is being stopped, and the records it produces would be discarded by mistake. For example, the endInput() method of operators is called during the stopping phase, and some operators, like the async operator, might still emit new records in this method.
Finally, although stop-with-savepoint without draining is not required to commit all the data, we want the job finishing process to be unified for all the cases to keep the code clean.
To fix the remaining issues, we need to modify the process of finishing to ensure all the data gets committed in the required cases. An intuitive idea is to directly insert a step into the tasks’ lifecycle that waits for the next checkpoint, as shown in the left part of Figure 3. However, this does not solve all the issues.
Figure 3. A comparison of the two options to ensure tasks commit all the data before finishing. The first option directly inserts a step into the tasks’ lifecycle to wait for the next checkpoint, which prevents the tasks from waiting for the same checkpoint / savepoint. The second option decouples the notification of finishing operator logic from finishing tasks, so all the tasks can first process all records and then wait for the same checkpoint / savepoint. For the case of bounded sources, the intuitive idea works, but it might have performance issues in some cases: as exemplified in Figure 4, if there are multiple cascading tasks containing two-phase-commit sinks, each task waits for the next checkpoint separately, so the job needs to wait for three more checkpoints while finishing, which might significantly prolong the total execution time.
Figure 4. An example job that contains a chain of tasks containing two-phase-commit operators. For the case of stop-with-savepoint [--drain], the intuitive idea does not work, since different tasks would have to wait for different checkpoints / savepoints, and we could therefore not finish the job with one specific savepoint.
Therefore, we did not take the intuitive option. Instead, we decoupled “finishing operator logic” and “finishing tasks”: all the tasks first finish their execution logic as a whole, including calling lifecycle methods like endInput(), and then each task waits for the next checkpoint concurrently. In addition, for stop-with-savepoint we adjusted the implementation in the same way: all the tasks first finish executing the operators’ logic and then simply wait for the next savepoint to happen before finishing. This way, the finishing processes are unified and the data can be fully committed in all cases.
Based on this idea, as shown in the right part of Figure 3, to decouple “finishing operator logic” from “finishing tasks” we introduced a new EndOfData event. For each task, after executing all the operator logic it first notifies its descendants with an EndOfData event, so that the descendants also get the chance to finish executing their operator logic. Then all the tasks can wait for the next checkpoint or the specified savepoint concurrently to commit all the remaining data before finishing.
Finally, it is also worth mentioning that we have clarified and renamed the close() and dispose() methods in the operators’ lifecycle. The two methods are in fact different: close() is only called when the task finishes normally, while dispose() is called both on normal finishing and on failover. However, this was not clear from their names. Therefore, we renamed the two methods to finish() and close():
finish() marks the termination of the operator, and no more records are allowed after finish() is called. It should only be called when sources are finished or when the --drain parameter is specified. close() is used to do cleanup and release all the held resources. Conclusion # By supporting checkpoints after tasks finish and revising the process of finishing, we can support checkpoints for jobs with both bounded and unbounded sources, and ensure that bounded jobs get all output records committed before they finish. The motivation behind this change is to ensure data consistency, result completeness, and failure recovery if there are bounded sources in the pipeline. The final checkpoint mechanism was first implemented in Flink 1.14 and enabled by default in Flink 1.15. If you have any questions, please feel free to start a discussion or report an issue in the dev or user mailing list.
`}),e.add({id:76,href:"/2022/07/11/flip-147-support-checkpoints-after-tasks-finished-part-two/",title:"FLIP-147: Support Checkpoints After Tasks Finished - Part Two",section:"Flink Blog",content:`In the first part of this blog, we have briefly introduced the work to support checkpoints after tasks get finished and revised the process of finishing. In this part we will present more details on the implementation, including how we support checkpoints with finished tasks and the revised protocol of the finish process.
Implementation of Checkpointing with Finished Tasks # As described in part one, to support checkpoints after some tasks are finished, the core idea is to mark the finished operators in checkpoints and skip executing these operators after recovery. To implement this idea, we enhanced the checkpointing procedure to generate a flag for finished operators and to use that flag on recovery. This section presents more details on the process of taking checkpoints with finished tasks and recovering from such checkpoints.
Previously, checkpointing only worked when all tasks were running. As shown in Figure 1, in this case the checkpoint coordinator first notifies all the source tasks, and the source tasks then notify the downstream tasks to take snapshots via barrier events. Similarly, if there are finished tasks, we need to find the new “source” tasks to initiate the checkpoint, namely those tasks that are still running but have no running precedent tasks. The CheckpointCoordinator does this computation atomically on the JobManager side, based on the latest states recorded in the execution graph.
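To make the selection rule concrete, here is a small, purely illustrative Java sketch (not Flink's actual CheckpointCoordinator code) of picking the tasks to trigger: the RUNNING tasks whose precedent tasks have all FINISHED.

import java.util.List;
import java.util.stream.Collectors;

// Illustrative only; not Flink's actual CheckpointCoordinator logic.
public final class TriggerTaskSelection {

    enum TaskState { RUNNING, FINISHED }

    // A toy task holding references to its upstream (precedent) tasks.
    record Task(String id, TaskState state, List<Task> precedents) {}

    // Tasks to send the "trigger checkpoint" RPC to: running tasks whose precedents have all finished.
    // Tasks without precedents (the original sources, if still running) also qualify.
    static List<Task> tasksToTrigger(List<Task> allTasks) {
        return allTasks.stream()
                .filter(t -> t.state() == TaskState.RUNNING)
                .filter(t -> t.precedents().stream().allMatch(p -> p.state() == TaskState.FINISHED))
                .collect(Collectors.toList());
    }
}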
There might be race conditions when triggering tasks: when the checkpoint coordinator decides to trigger one task and starts emitting the RPC, it is possible that the task has just finished and is reporting the FINISHED status to the JobManager. In this case, the RPC message would fail and the checkpoint would be aborted.
Figure 1. The tasks chosen as the new sources when taking a checkpoint with finished tasks. The principle is to choose the running tasks whose precedent tasks are all finished. In order to keep track of the finished status of each operator, we need to extend the checkpoint format. A checkpoint consists of the states of all the stateful operators, and the state of one operator consists of the entries from all its parallel instances. Note that the concept of Task is not reflected in the checkpoint. A task is more of a physical execution container that drives the behavior of operators. It is not well defined across multiple executions of the same job, since job upgrades might modify the operators contained in one task. Therefore, the finished status should be attached to the operators.
As shown in Figure 2, operators can be classified into three types according to their finished status:
Fully finished: If all the instances of an operator are finished, we could view the logic of the operators as fully executed and we should skip the execution of the operator after recovery. We need to store a special flag for this kind of operator. Partially finished: If only some instances of an operator are finished, then we still need to continue executing the remaining logic of this operator. As a whole we could view the state of the operator as the set of entries collected from all the running instances, which represents the remaining workload for this operator. No finished instances: In this case, the state of the operator is the same as the one taken when no tasks are finished. Figure 2. An illustration of the extended checkpoint format. If the job is later restored from a checkpoint taken with finished tasks, we would skip executing all the logic for fully finished operators, and execute normally for the operators with no finished instances.
However, this would be a bit complex for the partially finished operators. The state of partially finished operators would be redistributed to all the instances, similar to rescaling when the parallelism is changed. Among all the types of states that Flink offers, the keyed state and operator state with even-split redistribution would work normally, but the broadcast state and operator state with union redistribution would be affected for the following reasons:
The broadcast state always replicates the state of the first subtask to the other subtasks. If the first subtask has finished, an empty state would be distributed and the operator would run from scratch, which is not correct. The operator state with union redistribution merges the states of all the subtasks and then sends the merged state to all the subtasks. Based on this behavior, some operators may choose one subtask to store a shared value, and after restarting this value will be distributed to all the subtasks. However, if this chosen subtask has finished, the state would be lost. These two issues do not occur when rescaling, since there are no finished tasks in that scenario. To address them, for the broadcast state we instead choose one of the running subtasks to acquire the current state. For the operator state with union redistribution, we have to collect the states of all the subtasks to maintain the semantics; thus, we currently abort the checkpoint if some, but not all, subtasks of an operator using this kind of state have finished.
In principle, you should be able to modify your job (which changes the dataflow graph) and restore from a previous checkpoint. That said, there are certain graph modifications that are not supported. These kinds of changes include adding a new operator as the precedent of a fully finished one. Flink would check for such modifications and throw exceptions while restoring.
The Revised Process of Finishing # As described in part one, based on the ability to take checkpoints with finished tasks, we revised the process of finishing so that we can always commit all the data for two-phase-commit sinks. We’ll show the detailed protocol of the finishing process in this section.
How did Jobs in Flink Finish Before? # A job might finish in two ways: all sources finish or users execute stop-with-savepoint [--drain]. Let’s first have a look at the detailed process of finishing before FLIP-147.
When sources finish # If all the sources are bounded, the job will finish after all the input records are processed and all the results are committed to external systems. In this case, the sources first emit a MAX_WATERMARK (Long.MAX_VALUE) and then start to terminate the task. On termination, a task calls endInput(), close() and dispose() for all the operators, then emits an EndOfPartitionEvent to the downstream tasks. The intermediate tasks start terminating after receiving an EndOfPartitionEvent from all the input channels, and this process continues until the last task is finished.
1. Source operators emit MAX_WATERMARK
2. On received MAX_WATERMARK for non-source operators
  a. Trigger all the event-time timers
  b. Emit MAX_WATERMARK
3. Source tasks finished
  a. endInput(inputId) for all the operators
  b. close() for all the operators
  c. dispose() for all the operators
  d. Emit EndOfPartitionEvent
  e. Task cleanup
4. On received EndOfPartitionEvent for non-source tasks
  a. endInput(int inputId) for all the operators
  b. close() for all the operators
  c. dispose() for all the operators
  d. Emit EndOfPartitionEvent
  e. Task cleanup
When users execute stop-with-savepoint [--drain] # Users can execute the command stop-with-savepoint [--drain] for both bounded and unbounded jobs to trigger jobs to finish. In this case, Flink first triggers a synchronous savepoint and all the tasks stall after seeing the synchronous savepoint. If the savepoint succeeds, all the source operators finish actively and the job finishes the same way as in the above scenario.
1. Trigger a savepoint
2. Sources received savepoint trigger RPC
  a. If with --drain
    i. Source operators emit MAX_WATERMARK
  b. Source emits savepoint barrier
3. On received MAX_WATERMARK for non-source operators
  a. Trigger all the event-time timers
  b. Emit MAX_WATERMARK
4. On received savepoint barrier for non-source operators
  a. The task blocks till the savepoint succeeds
5. Finish the source tasks actively
  a. If with --drain
    i. endInput(inputId) for all the operators
  b. close() for all the operators
  c. dispose() for all the operators
  d. Emit EndOfPartitionEvent
  e. Task cleanup
6. On received EndOfPartitionEvent for non-source tasks
  a. If with --drain
    i. endInput(int inputId) for all the operators
  b. close() for all the operators
  c. dispose() for all the operators
  d. Emit EndOfPartitionEvent
  e. Task cleanup
The --drain parameter is supported with stop-with-savepoint: if it is not specified, the job is expected to resume from this savepoint later, otherwise the job is expected to terminate permanently. Thus we only emit MAX_WATERMARK to trigger all the event-time timers and call endInput() in the latter case.
Revise the Finishing Steps # As described in part one, after revising the process of finishing, we have decoupled the process of “finishing operator logic” from “finishing tasks” by introducing a new EndOfData event. After the revision, each task first notifies its descendants with an EndOfData event after executing all its logic, so that the descendants also get the chance to finish executing their operator logic; then all the tasks can wait for the next checkpoint or the specified savepoint concurrently to commit all the remaining data. This section will present the detailed protocol of the revised process. Since we have renamed close() / dispose() to finish() / close(), we’ll stick to the new terminology in the following description.
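As a minimal, purely illustrative sketch of what the renamed lifecycle looks like from an operator implementer's point of view (the buffering and the flushToExternalSystem helper below are hypothetical; a real exactly-once sink would rely on Flink's two-phase-commit machinery instead):

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

// Illustrative only: buffers records and hands them to an external system on normal termination.
public class BufferingOperator extends AbstractStreamOperator<String>
        implements OneInputStreamOperator<String, String> {

    private final List<String> pending = new ArrayList<>();

    @Override
    public void processElement(StreamRecord<String> element) {
        pending.add(element.getValue());
        output.collect(element); // forward the record downstream unchanged
    }

    @Override
    public void finish() throws Exception {
        // Called only on normal termination (bounded input fully consumed, or
        // stop-with-savepoint --drain). No more records may be emitted afterwards.
        flushToExternalSystem(pending); // hypothetical helper standing in for a real commit
        super.finish();
    }

    @Override
    public void close() throws Exception {
        // Called on both normal termination and failover: only release held resources here.
        pending.clear();
        super.close();
    }

    private void flushToExternalSystem(List<String> records) {
        // placeholder for handing the buffered records to an external system
    }
}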
The revised process of finishing is shown as follows:
1. Source tasks finished due to no more records or stop-with-savepoint
  a. If no more records or stop-with-savepoint --drain
    i. Source operators emit MAX_WATERMARK
    ii. endInput(inputId) for all the operators
    iii. finish() for all the operators
    iv. Emit EndOfData[isDrain = true] event
  b. Else if stop-with-savepoint
    i. Emit EndOfData[isDrain = false] event
  c. Wait for the next checkpoint / the savepoint taken after the operators finished to complete
  d. close() for all the operators
  e. Emit EndOfPartitionEvent
  f. Task cleanup
2. On received MAX_WATERMARK for non-source operators
  a. Trigger all the event-time timers
  b. Emit MAX_WATERMARK
3. On received EndOfData for non-source tasks
  a. If isDrain
    i. endInput(int inputId) for all the operators
    ii. finish() for all the operators
  b. Emit EndOfData[isDrain = the flag value of the received event]
4. On received EndOfPartitionEvent for non-source tasks
  a. Wait for the next checkpoint / the savepoint taken after the operators finished to complete
  b. close() for all the operators
  c. Emit EndOfPartitionEvent
  d. Task cleanup
Figure 3. An example job of the revised process of finishing. An example of the job finishing process is shown in Figure 3.
Let’s first have a look at the example in which all the source tasks are bounded. If Task C finishes after processing all the records, it first emits the MAX_WATERMARK, then finishes the operators and emits the EndOfData event. After that, it waits for the next checkpoint to complete and then emits the EndOfPartitionEvent.
Task D finishes all its operators right after receiving the EndOfData event. Since any checkpoint taken after the operators finish can commit all the pending records and serve as the final checkpoint, Task D’s final checkpoint would be the same as Task C’s, because the barrier must be emitted after the EndOfData event.
Task E is a bit different in that it has two inputs. Task A might continue to run for a while; thus, Task E needs to wait until it receives an EndOfData event from the other input as well before finishing its operators, and its final checkpoint might be different.
On the other hand, when using stop-with-savepoint [--drain], the process is similar except that all the tasks need to wait for the exact savepoint before finishing instead of just any checkpoints. Moreover, since both Task C and Task A would finish at the same time, Task E would also be able to wait for this particular savepoint before finishing.
Conclusion # In this part we have presented more details of how checkpoints are taken with finished tasks and of the revised process of finishing. We hope these details provide more insight into the design and implementation of this part of the work. Still, if you have any questions, please feel free to start a discussion or report an issue in the dev or user mailing list.
`}),e.add({id:77,href:"/2022/07/06/apache-flink-1.15.1-release-announcement/",title:"Apache Flink 1.15.1 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce the first bug fix release of the Flink 1.15 series.
This release includes 62 bug fixes, vulnerability fixes, and minor improvements for Flink 1.15. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.
We highly recommend all users upgrade to Flink 1.15.1.
Release Artifacts # Maven Dependencies # <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.15.1</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java</artifactId> <version>1.15.1</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients</artifactId> <version>1.15.1</version> </dependency> Binaries # You can find the binaries on the updated Downloads page.
Docker Images # library/flink (official images) apache/flink (ASF repository) PyPi # apache-flink==1.15.1 Release Notes # The community is aware of 3 issues that were introduced with 1.15.0 that remain unresolved. Efforts are underway to fix these issues for Flink 1.15.2:
[FLINK-28861] - Non-deterministic UID generation might cause issues during restore for Table/SQL API [FLINK-28060] - Kafka commit on checkpointing fails repeatedly after a broker restart [FLINK-28322] - DataStreamScanProvider's new method is not compatible Bug [FLINK-22984] - UnsupportedOperationException when using Python UDF to generate watermark [FLINK-24491] - ExecutionGraphInfo may not be archived when the dispatcher terminates [FLINK-24735] - SQL client crashes with \`Cannot add expression of different type to set\` [FLINK-26645] - Pulsar Source subscribe to a single topic partition will consume all partitions from that topic [FLINK-27041] - KafkaSource in batch mode failing if any topic partition is empty [FLINK-27140] - Move JobResultStore dirty entry creation into ioExecutor [FLINK-27174] - Non-null check for bootstrapServers field is incorrect in KafkaSink [FLINK-27218] - Serializer in OperatorState has not been updated when new Serializers are NOT incompatible [FLINK-27223] - State access doesn't work as expected when cache size is set to 0 [FLINK-27247] - ScalarOperatorGens.numericCasting is not compatible with legacy behavior [FLINK-27255] - Flink-avro does not support serialization and deserialization of avro schema longer than 65535 characters [FLINK-27282] - Fix the bug of wrong positions mapping in RowCoder [FLINK-27367] - SQL CAST between INT and DATE is broken [FLINK-27368] - SQL CAST(' 1 ' as BIGINT) returns wrong result [FLINK-27409] - Cleanup stale slot allocation record when the resource requirement of a job is empty [FLINK-27418] - Flink SQL TopN result is wrong [FLINK-27420] - Suspended SlotManager fails to re-register metrics when started again [FLINK-27465] - AvroRowDeserializationSchema.convertToTimestamp fails with negative nano seconds [FLINK-27487] - KafkaMetricWrappers do incorrect cast [FLINK-27545] - Update examples in PyFlink shell [FLINK-27563] - Resource Providers - Yarn doc page has minor display error [FLINK-27606] - CompileException when using UDAF with merge() method [FLINK-27676] - Output records from on_timer are behind the triggering watermark in PyFlink [FLINK-27683] - Insert into (column1, column2) Values(.....) 
fails with SQL hints [FLINK-27711] - Correct the typo of set_topics_pattern by changing it to set_topic_pattern for Pulsar Connector [FLINK-27733] - Rework on_timer output behind watermark bug fix [FLINK-27734] - Not showing checkpoint interval properly in WebUI when checkpoint is disabled [FLINK-27760] - NPE is thrown when executing PyFlink jobs in batch mode [FLINK-27762] - Kafka WakeupException during handling splits changes [FLINK-27797] - PythonTableUtils.getCollectionInputFormat cannot correctly handle None values [FLINK-27848] - ZooKeeperLeaderElectionDriver keeps writing leader information, using up zxid [FLINK-27881] - The key(String) in PulsarMessageBuilder returns null [FLINK-27890] - SideOutputExample.java fails [FLINK-27910] - FileSink not enforcing rolling policy if started from scratch [FLINK-27933] - Savepoint status cannot be queried from standby jobmanager [FLINK-27955] - PyFlink installation failure on Windows OS [FLINK-27999] - NoSuchMethodError when using Hive 3 dialect [FLINK-28018] - the start index to create empty splits in BinaryInputFormat#createInputSplits is inappropriate [FLINK-28019] - Error in RetractableTopNFunction when retracting a stale record with state ttl enabled [FLINK-28114] - The path of the Python client interpreter could not point to an archive file in distributed file system Improvement [FLINK-24586] - SQL functions should return STRING instead of VARCHAR(2000) [FLINK-26788] - AbstractDeserializationSchema should add cause when throwing a FlinkRuntimeException [FLINK-26909] - Allow setting parallelism to -1 from CLI [FLINK-27064] - Centralize ArchUnit rules for production code [FLINK-27480] - KafkaSources sharing the groupId might lead to InstanceAlreadyExistException warning [FLINK-27534] - Apply scalafmt to 1.15 branch [FLINK-27776] - Throw exception when UDAF used in sliding window does not implement merge method in PyFlink [FLINK-27935] - Add Pyflink example of create temporary view document Technical Debt [FLINK-25694] - Upgrade Presto to resolve GSON/Alluxio Vulnerability Sub-task [FLINK-26052] - Update chinese documentation regarding FLIP-203 [FLINK-26588] - Translate the new SQL CAST documentation to Chinese [FLINK-27382] - Make Job mode wait with cluster shutdown until the cleanup is done `}),e.add({id:78,href:"/2022/06/22/apache-flink-1.14.5-release-announcement/",title:"Apache Flink 1.14.5 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce another bug fix release for Flink 1.14.
This release includes 67 bug fixes, vulnerability fixes, and minor improvements for Flink 1.14. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.
We highly recommend all users upgrade to Flink 1.14.5.
Release Artifacts # Maven Dependencies # <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-java</artifactId> <version>1.14.5</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-streaming-java_2.11</artifactId> <version>1.14.5</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> <artifactId>flink-clients_2.11</artifactId> <version>1.14.5</version> </dependency> Binaries # You can find the binaries on the updated Downloads page.
Docker Images # library/flink (official images) apache/flink (ASF repository) PyPi # apache-flink==1.14.5 Release Notes # Sub-task [FLINK-25800] - Update wrong links in the datastream/execution_mode.md page. Bug [FLINK-22984] - UnsupportedOperationException when using Python UDF to generate watermark [FLINK-24491] - ExecutionGraphInfo may not be archived when the dispatcher terminates [FLINK-25227] - Comparing the equality of the same (boxed) numeric values returns false [FLINK-25440] - Apache Pulsar Connector Document description error about 'Starting Position'. [FLINK-25904] - NullArgumentException when accessing checkpoint stats on standby JobManager [FLINK-26016] - FileSystemLookupFunction does not produce correct results when hive table uses columnar storage [FLINK-26018] - Unnecessary late events when using the new KafkaSource [FLINK-26049] - The tolerable-failed-checkpoints logic is invalid when checkpoint trigger failed [FLINK-26285] - ZooKeeperStateHandleStore does not handle not existing nodes properly in getAllAndLock [FLINK-26334] - When timestamp - offset + windowSize < 0, elements cannot be assigned to the correct window [FLINK-26381] - Wrong document order of Chinese version [FLINK-26395] - The description of RAND_INTEGER is wrong in SQL function documents [FLINK-26504] - Fix the incorrect type error in unbounded Python UDAF [FLINK-26536] - PyFlink RemoteKeyedStateBackend#merge_namespaces bug [FLINK-26543] - Fix the issue that exceptions generated in startup are missed in Python loopback mode [FLINK-26550] - Correct the information of checkpoint failure [FLINK-26607] - There are multiple MAX_LONG_VALUE value errors in pyflink code [FLINK-26629] - Error in code comment for SubtaskStateMapper.RANGE [FLINK-26645] - Pulsar Source subscribe to a single topic partition will consume all partitions from that topic [FLINK-26708] - TimestampsAndWatermarksOperator should not propagate WatermarkStatus [FLINK-26738] - Default value of StateDescriptor is valid when enable state ttl config [FLINK-26775] - PyFlink WindowOperator#process_element register wrong cleanup timer [FLINK-26846] - Gauge metrics doesn't work in PyFlink [FLINK-26855] - ImportError: cannot import name 'environmentfilter' from 'jinja2' [FLINK-26920] - Job executes failed with "The configured managed memory fraction for Python worker process must be within (0, 1], was: %s." [FLINK-27108] - State cache clean up doesn't work as expected [FLINK-27174] - Non-null check for bootstrapServers field is incorrect in KafkaSink [FLINK-27223] - State access doesn't work as expected when cache size is set to 0 [FLINK-27255] - Flink-avro does not support serialization and deserialization of avro schema longer than 65535 characters [FLINK-27315] - Fix the demo of MemoryStateBackendMigration [FLINK-27409] - Cleanup stale slot allocation record when the resource requirement of a job is empty [FLINK-27442] - Module flink-sql-avro-confluent-registry does not configure Confluent repo [FLINK-27545] - Update examples in PyFlink shell [FLINK-27676] - Output records from on_timer are behind the triggering watermark in PyFlink [FLINK-27733] - Rework on_timer output behind watermark bug fix [FLINK-27751] - Dependency resolution from repository.jboss.org fails on CI [FLINK-27760] - NPE is thrown when executing PyFlink jobs in batch mode New Feature [FLINK-26382] - Add Chinese documents for flink-training exercises Improvement [FLINK-5151] - Add discussion about object mutations to heap-based state backend docs. 
[FLINK-23843] - Exceptions during "SplitEnumeratorContext.runInCoordinatorThread()" should cause Global Failure instead of Process Kill [FLINK-24274] - Wrong parameter order in documentation of State Processor API [FLINK-24384] - Count checkpoints failed in trigger phase into numberOfFailedCheckpoints [FLINK-26130] - Document why and when user would like to increase network buffer size [FLINK-26575] - Improve the info message when restoring keyed state backend [FLINK-26650] - Avoid to print stack trace for checkpoint trigger failure if not all tasks are started [FLINK-26788] - AbstractDeserializationSchema should add cause when thow a FlinkRuntimeException [FLINK-27088] - The example of using StringDeserializer for deserializing Kafka message value as string has an error [FLINK-27480] - KafkaSources sharing the groupId might lead to InstanceAlreadyExistException warning [FLINK-27776] - Throws exception when udaf used in sliding window does not implement merge method in PyFlink Technical Debt [FLINK-25694] - Upgrade Presto to resolve GSON/Alluxio Vulnerability [FLINK-26352] - Missing license header in WebUI source files [FLINK-26961] - Update multiple Jackson dependencies to v2.13.2 and v2.13.2.1 `}),e.add({id:79,href:"/2022/06/17/adaptive-batch-scheduler-automatically-decide-parallelism-of-flink-batch-jobs/",title:"Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs",section:"Flink Blog",content:` Introduction # Deciding proper parallelisms for operators is not easy for many users. For batch jobs, a parallelism that is too small may result in long execution times and costly failover recovery, while an unnecessarily large parallelism may result in resource waste and more overhead in task deployment and network shuffling.
To decide a proper parallelism, one needs to know how much data each operator needs to process. However, it can be hard to predict the data volume to be processed by a job because it can differ from day to day, and it can be even harder or downright impossible (due to complex operators or UDFs) to predict the data volume to be processed by each operator.
To solve this problem, we introduced the adaptive batch scheduler in Flink 1.15. The adaptive batch scheduler can automatically decide parallelism of an operator according to the size of its consumed datasets. Here are the benefits the adaptive batch scheduler can bring:
Batch job users can be relieved from parallelism tuning. Parallelism tuning is fine-grained, considering each operator separately. This is particularly beneficial for SQL jobs, which previously could only be configured with a single global parallelism. Parallelism tuning can better fit consumed datasets whose volume varies from day to day. Get Started # To automatically decide the parallelism of operators, you need to:
Configure to use the adaptive batch scheduler. Set the parallelism of operators to -1. Configure to use adaptive batch scheduler # To use the adaptive batch scheduler, you need to set the following configurations:
Set jobmanager.scheduler: AdaptiveBatch. Leave execution.batch-shuffle-mode unset or explicitly set it to ALL-EXCHANGES-BLOCKING (the default value). Currently, the adaptive batch scheduler only supports batch jobs whose shuffle mode is ALL-EXCHANGES-BLOCKING. In addition, there are several related configuration options to control the upper and lower bounds of tuned parallelisms, to specify the expected data volume to be processed by each operator, and to specify the default parallelism of sources. More details can be found in the feature documentation page.
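These options normally go into flink-conf.yaml; purely for illustration, the sketch below sets them programmatically. The option keys are the ones named above; everything else (such as the class name) is made up for the example.

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AdaptiveBatchSchedulerSetup {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Use the adaptive batch scheduler.
        conf.setString("jobmanager.scheduler", "AdaptiveBatch");
        // The adaptive batch scheduler currently requires blocking shuffles (this is the default).
        conf.setString("execution.batch-shuffle-mode", "ALL-EXCHANGES-BLOCKING");

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
        // ... build the batch job here, leaving operator parallelism unset (-1) so that the
        // scheduler can decide it, as described in the next section.
    }
}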
Set the parallelism of operators to -1 # The adaptive batch scheduler only automatically decides parallelism of operators whose parallelism is not set (which means the parallelism is -1). To leave parallelism unset, you should configure as follows:
Set parallelism.default: -1 for all jobs. Set table.exec.resource.default-parallelism: -1 for SQL jobs. Don't call setParallelism() for operators in DataStream/DataSet jobs. Don't call setParallelism() on StreamExecutionEnvironment/ExecutionEnvironment in DataStream/DataSet jobs. Implementation Details # In this section, we will elaborate on the details of the implementation. Before that, we need to briefly introduce some of the concepts involved:
JobVertex and JobGraph: A job vertex is an operator chain formed by chaining several operators together for better performance. The job graph is a data flow consisting of job vertices. ExecutionVertex and ExecutionGraph: An execution vertex represents a parallel subtask of a job vertex, which will eventually be instantiated as a physical task. For example, a job vertex with a parallelism of 100 will generate 100 execution vertices. The execution graph is the physical execution topology consisting of all execution vertices. More details about the above concepts can be found in the Flink documentation. Note that the adaptive batch scheduler decides the parallelism of operators by deciding the parallelism of the job vertices which consist of these operators. To automatically decide parallelism of job vertices, we introduced the following changes:
Enabled the scheduler to collect the sizes of finished datasets. Introduced a new component, VertexParallelismDecider, to compute proper parallelisms of job vertices according to the sizes of their consumed results. Enabled the execution graph to be built up dynamically, to allow the parallelisms of job vertices to be decided lazily. The execution graph starts with an empty execution topology and then gradually attaches the vertices during job execution. Introduced the adaptive batch scheduler to update and schedule the dynamic execution graph. The details will be introduced in the following sections.
Fig. 1 - The overall structure of automatically deciding parallelism Collect sizes of consumed datasets # The adaptive batch scheduler decides the parallelism of vertices by the size of input results, so the scheduler needs to know the sizes of result partitions produced by tasks. We introduced a numBytesProduced counter to record the size of each produced result partition, the accumulated result of the counter will be sent to the scheduler when tasks finish.
Decide proper parallelisms of job vertices # We introduced a new component VertexParallelismDecider to compute proper parallelisms of job vertices according to the sizes of their consumed results. The computation algorithm is as follows:
Suppose
V is the number of bytes of data the user expects each task to process. totalBytes_non-broadcast is the sum of the non-broadcast result sizes consumed by this job vertex. totalBytes_broadcast is the sum of the broadcast result sizes consumed by this job vertex. maxBroadcastRatio is the maximum ratio of broadcast bytes that affects the parallelism calculation. normalize(x) is a function that rounds x to the closest power of 2. Then the parallelism of this job vertex, P, will be:
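The formula itself is rendered as an image in the original post and is not reproduced verbatim here; reconstructed from the definitions above and the broadcast-ratio cap described below, it amounts to P = normalize( totalBytes_non-broadcast / ( V − min(totalBytes_broadcast, maxBroadcastRatio × V) ) ), i.e. each task is budgeted V bytes, of which at most maxBroadcastRatio × V may be taken up by broadcast data.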
Note that we introduced two special treatments in the above formula:
Limit the maximum ratio of broadcast bytes Normalize the parallelism to the closest power of 2 However, the above formula cannot be used to decide the parallelism of the source vertices, because source vertices have no input. To solve this, we introduced the configuration option jobmanager.adaptive-batch-scheduler.default-source-parallelism to allow users to manually configure the parallelism of source vertices. Note that not all data sources need this option, because some data sources can automatically infer their parallelism (for example, HiveTableSource; see HiveParallelismInference for more detail). For such sources, it is recommended to let them decide their parallelism themselves.
Limit the maximum ratio of broadcast bytes # As you can see, we limit the maximum ratio of broadcast bytes that affects the parallelism calculation to maxBroadcastRatio. That is, the non-broadcast bytes processed by each task are at least (1 - maxBroadcastRatio) * V. Otherwise, when the total broadcast bytes are close to V, even if the total non-broadcast bytes are very small, a large parallelism may be computed, which is unnecessary and may lead to resource waste and large task deployment overhead.
Generally, the broadcast dataset is usually relatively small compared with the other co-processed datasets, so we set the maximum ratio to 0.5 by default. The value is hard-coded in the first version, and we may make it configurable later.
Normalize the parallelism to the closest power of 2 # The normalization is there to avoid introducing data skew. To better understand this section, we suggest you read the Flexible subpartition mapping section first.
Taking Fig. 4 (b) as an example, A1/A2 produce 4 subpartitions, and the decided parallelism of B is 3. In this case, B1 will consume 1 subpartition, B2 will consume 1 subpartition, and B3 will consume 2 subpartitions. We assume that all subpartitions contain the same amount of data, which means B3 will consume twice the data of the other tasks; data skew is thus introduced by the subpartition mapping.
To solve this problem, we need to make the subpartitions evenly consumed by downstream tasks, which means the number of subpartitions should be a multiple of the number of downstream tasks. For simplicity, we require the user-specified max parallelism to be 2^N, and then adjust the calculated parallelism to the closest 2^M (M <= N), so that we can guarantee that subpartitions will be evenly consumed by downstream tasks.
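As an illustration of the normalization step (a sketch only, not Flink's actual code), the following rounds a computed parallelism to the closest power of two, capped by a power-of-two max parallelism:

// Illustrative only; not Flink's actual implementation.
public final class ParallelismNormalizer {

    // Rounds the computed parallelism to the closest power of two, never exceeding
    // maxParallelism, which is assumed to be a power of two (2^N) as required above.
    static int normalizeToPowerOfTwo(int parallelism, int maxParallelism) {
        int p = Math.max(1, Math.min(parallelism, maxParallelism));
        int lower = Integer.highestOneBit(p);        // largest power of two <= p
        int upper = (lower == p) ? p : lower << 1;   // smallest power of two >= p
        int closest = (p - lower) < (upper - p) ? lower : upper; // ties round up
        return Math.min(closest, maxParallelism);
    }
}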
Note that this is a temporary solution; the ultimate solution would be the auto-rebalancing of workloads, which may come soon.
Build up execution graph dynamically # Before the adaptive batch scheduler was introduced to Flink, the execution graph was fully built in a static way before starting scheduling. To allow parallelisms of job vertices to be decided lazily, the execution graph must be able to be built up dynamically.
Create execution vertices and execution edges lazily # A dynamic execution graph means that a Flink job starts with an empty execution topology, and then gradually attaches vertices during job execution, as shown in Fig. 2.
The execution topology consists of execution vertices and execution edges. The execution vertices will be created and attached to the execution topology only when:
The parallelism of the corresponding job vertex is decided. All upstream execution vertices are already attached. The parallelism of the job vertex needs to be decided first so that Flink knows how many execution vertices should be created. Upstream execution vertices need to be attached first so that Flink can connect the newly created execution vertices to the upstream vertices with execution edges.
Fig. 2 - Build up execution graph dynamically Flexible subpartition mapping # Before the adaptive batch scheduler was introduced to Flink, when deploying a task, Flink needed to know the parallelism of its consumer job vertex. This is because the consumer vertex parallelism is used to decide the number of subpartitions produced by each upstream task. The reason behind this is that, for one result partition, different subpartitions serve different consumer execution vertices. More specifically, one consumer execution vertex only consumes data from the subpartition with the same index.
Taking Fig. 3 as an example, the parallelism of the consumer B is 2, so the result partition produced by A1/A2 should contain 2 subpartitions: the subpartition with index 0 serves B1, and the subpartition with index 1 serves B2.
Fig. 3 - How subpartitions serve consumer execution vertices with a static execution graph But obviously, this doesn't work for dynamic graphs, because when a job vertex is deployed, the parallelism of its consumer job vertices may not have been decided yet. To enable Flink to work in this case, we need a way to allow a job vertex to run without knowing the parallelism of its consumer job vertices.
To achieve this goal, we can set the number of subpartitions to the max parallelism of the consumer job vertex. Then, when the consumer execution vertices are deployed, they are assigned a subpartition range to consume. Suppose N is the number of consumer execution vertices and P is the number of subpartitions. For the k-th consumer execution vertex, the consumed subpartition range should be:
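The range formula appears as an image in the original post; reconstructed from the example in Fig. 4 below (4 subpartitions consumed by 3 tasks as ranges of sizes 1, 1 and 2), it divides the P subpartitions as evenly as possible, with the k-th consumer (k starting from 0) consuming the subpartition range [ floor(k × P / N), floor((k + 1) × P / N) ).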
Taking Fig. 4 as an example, the max parallelism of B is 4, so A1/A2 have 4 subpartitions. If the decided parallelism of B is 2, the subpartition mapping will be as in Fig. 4 (a); if the decided parallelism of B is 3, the subpartition mapping will be as in Fig. 4 (b).
Fig. 4 - How subpartitions serve consumer execution vertices with a dynamic graph Update and schedule the dynamic execution graph # Scheduling with the adaptive batch scheduler is similar to scheduling with the default scheduler; the only difference is that an empty dynamic execution graph is generated initially and vertices are attached later. Before handling any scheduling event, the scheduler tries to decide the parallelisms of job vertices and then initializes them, generating the execution vertices, connecting execution edges, and updating the execution graph.
The scheduler will try to decide the parallelism of all job vertices before handling each scheduling event, and the parallelism decision will be made for each job vertex in topological order:
For source vertices, the parallelism should have been decided before scheduling starts. For non-source vertices, the parallelism can be decided only when all of their consumed results are fully produced. After deciding the parallelism, the scheduler tries to initialize the job vertices in topological order. A job vertex that can be initialized should meet the following conditions:
The parallelism of the job vertex has been decided and the job vertex has not been initialized yet. All upstream job vertices have been initialized. Future improvement # Auto-rebalancing of workloads # When running batch jobs, data skew may occur (a task needs to process much more data than the other tasks), which leads to long-tail tasks and slows down the completion of jobs. Users usually hope that the system can solve this problem automatically. One typical data skew case is that some subpartitions have a significantly larger amount of data than others. This case can be solved by finer-grained subpartitions and auto-rebalancing of the workload. The work on the adaptive batch scheduler can be considered the first step towards it, because the requirements of auto-rebalancing are similar to those of the adaptive batch scheduler: both need support for dynamic graphs and the collection of result partition sizes. Based on the implementation of the adaptive batch scheduler, we can solve the above problem by increasing the max parallelism (for finer-grained subpartitions) and simply changing the subpartition range division algorithm (for auto-rebalancing). In the current design, the subpartition range is divided according to the number of subpartitions; we can change it to divide according to the amount of data in the subpartitions, so that the amount of data within each subpartition range is approximately the same, as sketched below. In this way, the workloads of downstream tasks can be balanced.
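Purely as an illustration of that range-division idea (this is not Flink code, just a greedy sketch under the assumption that per-subpartition byte counts are available and that there are at least as many subpartitions as consumers):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only; a greedy sketch of byte-based range division, not Flink code.
public final class ByteBasedRangeDivision {

    // Splits the subpartitions into numConsumers consecutive ranges [start, end) so that each
    // range carries roughly the same number of bytes. Assumes numConsumers >= 1 and that there
    // are at least numConsumers subpartitions.
    static List<int[]> divideByBytes(long[] subpartitionBytes, int numConsumers) {
        long total = Arrays.stream(subpartitionBytes).sum();
        List<int[]> ranges = new ArrayList<>();
        long accumulated = 0;
        int start = 0;
        for (int consumer = 0; consumer < numConsumers; consumer++) {
            int end;
            if (consumer == numConsumers - 1) {
                end = subpartitionBytes.length; // the last consumer takes everything that is left
            } else {
                // Bytes that should be covered in total once this consumer's range ends.
                long target = Math.round(total * (consumer + 1) / (double) numConsumers);
                // Leave at least one subpartition for each remaining consumer.
                int maxEnd = subpartitionBytes.length - (numConsumers - consumer - 1);
                end = start;
                while (end < maxEnd && (end == start || accumulated < target)) {
                    accumulated += subpartitionBytes[end++];
                }
            }
            ranges.add(new int[] {start, end});
            start = end;
        }
        return ranges;
    }
}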
Fig. 5 - Auto-rebalance with finer grained subpartitions `}),e.add({id:80,href:"/2022/06/05/apache-flink-kubernetes-operator-1.0.0-release-announcement/",title:"Apache Flink Kubernetes Operator 1.0.0 Release Announcement",section:"Flink Blog",content:`In the last two months since our initial preview release the community has been hard at work to stabilize and improve the core Flink Kubernetes Operator logic. We are now proud to announce the first production ready release of the operator project.
Release Highlights # The Flink Kubernetes Operator 1.0.0 version brings numerous improvements and new features to almost every aspect of the operator.
New v1beta1 API version & compatibility guarantees Session Job Management support Support for Flink 1.13, 1.14 and 1.15 Deployment recovery and rollback New Operator metrics Improved configuration management Custom validators Savepoint history and cleanup New API version and compatibility guarantees # The 1.0.0 release brings a new API version: v1beta1.
Don’t let the name confuse you, we consider v1beta1 the first production ready API release, and we will maintain backward compatibility for your applications going forward.
If you are already using the 0.1.0 preview release you can read about the upgrade process here, or check our detailed compatibility guarantees.
Session Job Management # One of the most exciting new features of 1.0.0 is the introduction of the FlinkSessionJob resource. In contrast with the FlinkDeployment that allows us to manage Application and Session Clusters, the FlinkSessionJob allows users to manage Flink jobs on a running Session deployment.
This is extremely valuable in environments where users want to deploy Flink jobs quickly and iteratively and also allows cluster administrators to manage the session cluster independently of the running jobs.
Example:
apiVersion: flink.apache.org/v1beta1 kind: FlinkSessionJob metadata: name: basic-session-job-example spec: deploymentName: basic-session-cluster job: jarURI: https://repo1.maven.org/maven2/org/apache/flink/flink-examples-streaming_2.12/1.15.0/flink-examples-streaming_2.12-1.15.0-TopSpeedWindowing.jar parallelism: 4 upgradeMode: stateless Multi-version Flink support # The Flink Kubernetes Operator now supports the following Flink versions out-of-the box:
Flink 1.15 (Recommended) Flink 1.14 Flink 1.13 Flink 1.15 comes with a set of features that allow deeper integration for the operator. We recommend using Flink 1.15 to get the best possible operational experience.
Deployment Recovery and Rollbacks # We have added two new features to make Flink cluster operations smoother when using the operator.
Now the operator will try to recover Flink JobManager deployments that went missing for some reason. Maybe it was accidentally deleted by the user or another service in the cluster. As long as HA was enabled and the job did not fatally fail, the operator will try to restore the job from the latest available checkpoint.
We also added experimental support for application upgrade rollbacks. With this feature the operator will monitor new application upgrades and, if they don’t become stable (healthy & running) within a configurable period, they will be rolled back to the latest stable specification previously deployed.
While this feature will likely see improvements and new settings in the coming versions, it already provides benefits in cases where we have a large number of jobs with strong uptime requirements where it’s better to roll back than be stuck in a failing state.
Improved Operator Metrics # Beyond the existing JVM based system metrics, additional Operator specific metrics were added to the current release.
Scope | Metrics | Description | Type
Namespace | FlinkDeployment.Count | Number of managed FlinkDeployment instances per namespace | Gauge
Namespace | FlinkDeployment.<Status>.Count | Number of managed FlinkDeployment resources per <Status> per namespace. <Status> can take values from: READY, DEPLOYED_NOT_READY, DEPLOYING, MISSING, ERROR | Gauge
Namespace | FlinkSessionJob.Count | Number of managed FlinkSessionJob instances per namespace | Gauge
What's Next? # Our intention is to advance further on the Operator Maturity Model by adding more dynamic/automatic features
Standalone deployment mode support FLIP-225 Auto-scaling using Horizontal Pod Autoscaler Dynamic change of watched namespaces Pluggable Status and Event reporters (Making it easier to integrate with proprietary control planes) SQL jobs support Release Resources # The source artifacts and helm chart are now available on the updated Downloads page of the Flink website.
The official 1.0.0 release archive doubles as a Helm repository that you can easily register locally:
$ helm repo add flink-kubernetes-operator-1.0.0 https://archive.apache.org/dist/flink/flink-kubernetes-operator-1.0.0/ $ helm install flink-kubernetes-operator flink-kubernetes-operator-1.0.0/flink-kubernetes-operator --set webhook.create=false You can also find official Kubernetes Operator Docker images of the new version on Dockerhub.
For more details, check the updated documentation and the release notes. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # The Apache Flink community would like to thank each and every one of the contributors that have made this release possible:
Aitozi, Biao Geng, ConradJam, Fuyao Li, Gyula Fora, Jaganathan Asokan, James Busche, liuzhuo, Márton Balassi, Matyas Orhidi, Nicholas Jiang, Ted Chang, Thomas Weise, Xin Hao, Yang Wang, Zili Chen
`}),e.add({id:81,href:"/2022/05/30/improving-speed-and-stability-of-checkpointing-with-generic-log-based-incremental-checkpoints/",title:"Improving speed and stability of checkpointing with generic log-based incremental checkpoints",section:"Flink Blog",content:` Introduction # One of the most important characteristics of stream processing systems is end-to-end latency, i.e. the time it takes for the results of processing an input record to reach the outputs. In the case of Flink, end-to-end latency mostly depends on the checkpointing mechanism, because processing results should only become visible after the state of the stream is persisted to non-volatile storage (this is assuming exactly-once mode; in other modes, results can be published immediately).
Furthermore, checkpoint duration also defines the reasonable interval with which checkpoints are made. A shorter interval provides the following advantages:
Lower latency for transactional sinks: Transactional sinks commit on checkpoints, so faster checkpoints mean more frequent commits. More predictable checkpoint intervals: Currently, the duration of a checkpoint depends on the size of the artifacts that need to be persisted in the checkpoint storage. Less work on recovery: The more frequent the checkpoints, the fewer events need to be re-processed after recovery. The following are the main factors affecting checkpoint duration in Flink:
Barrier travel time and alignment duration Time to take a state snapshot and persist it onto durable, highly-available storage (such as S3) Recent improvements such as Unaligned checkpoints and Buffer debloating try to address (1), especially in the presence of back-pressure. Previously, Incremental checkpoints were introduced to reduce the size of a snapshot, thereby reducing the time required to store it (2).
However, there are still some cases where this duration is high:
Every checkpoint is delayed by at least one task with high parallelism # With the existing incremental checkpoint implementation of the RocksDB state backend, every subtask needs to periodically perform some form of compaction. That compaction results in new, relatively big files, which in turn increase the upload time (2). The probability of at least one node performing such compaction and thus slowing down the whole checkpoint grows proportionally to the number of nodes. In large deployments, almost every checkpoint becomes delayed by some node.
Unnecessary delay before uploading state snapshot # State backends don't start any snapshotting work until the task receives at least one checkpoint barrier, increasing the effective checkpoint duration. This is suboptimal if the upload time is comparable to the checkpoint interval; instead, a snapshot could be uploaded continuously throughout the interval.
This work discusses the mechanism introduced in Flink 1.15 to address the above cases by continuously persisting state changes on non-volatile storage while performing materialization in the background. The basic idea is described in the following section, and then important implementation details are highlighted. Subsequent sections discuss benchmarking results, limitations, and future work.
High-level Overview # The core idea is to introduce a state changelog (a log that records state changes); this changelog allows operators to persist state changes in a very fine-grained manner, as described below:
Stateful operators write the state changes to the state changelog, in addition to applying them to the state tables in RocksDB or the in-mem Hashtable. An operator can acknowledge a checkpoint as soon as the changes in the log have reached the durable checkpoint storage. The state tables are persisted periodically as well, independent of the checkpoints. We call this procedure the materialization of the state on the durable checkpoint storage. Once the state is materialized on the checkpoint storage, the state changelog can be truncated to the point where the state is materialized. This can be illustrated as follows:
This approach mirrors what database systems do, adjusted to distributed checkpoints:
Changes (inserts/updates/deletes) are written to the transaction log, and the transaction is considered durable once the log is synced to disk (or other durable storage). The changes are also materialized in the tables (so the database system can efficiently query the table). The tables are usually persisted asynchronously. Once all relevant parts of the changed tables have been persisted, the transaction log can be truncated, which is similar to the materialization procedure in our approach.
Such a design makes a number of trade-offs:
Increased use of network IO and remote storage space for the changelog Increased memory usage to buffer state changes Increased time to replay state changes during the recovery process The last one may or may not be compensated by more frequent checkpoints. More frequent checkpoints mean less re-processing is needed after recovery.
System architecture # Changelog storage (DSTL) # The component that is responsible for actually storing state changes has the following requirements.
Durability # Changelog constitutes a part of a checkpoint, and therefore the same durability guarantees as for checkpoints must be provided. However, the duration for which the changelog is stored is expected to be short (until the changes are materialized).
Workload # The workload is write-heavy: changelog is written continuously, and it is only read in case of failure. Once written, data can not be modified.
Latency # We target checkpoint duration of 1s in the Flink 1.15 MVP for 99% of checkpoints. Therefore, an individual write request must complete within that duration or less (if parallelism is 100, then 99.99% of write requests must complete within 1s).
Consistency # Once a change is persisted (and acknowledged to JM), it must be available for replay to enable recovery (this can be achieved by using a single machine, quorum, or synchronous replication).
Concurrency # Each task writes to its own changelog, which prevents concurrency issues across multiple tasks. However, when a task is restarted, it needs to write to the same log, which may cause concurrency issues. This is addressed by:
Using unique log segment identifiers while writing Fencing previous execution attempts on JM when handling checkpoint acknowledgments After closing the log, treating it as Flink state, which is read-only and is discarded by a single JM (leader) To emphasize the difference in durability requirements and usage compared to other systems (durable, short-lived, append-only), the component is called “Durable Short-term Log” (DSTL).
DSTL can be implemented in many ways, such as Distributed Log, Distributed File System* (DFS), or even a database. In the MVP version in Flink 1.15, we chose DFS because of the following reasons:
No additional external dependency; DFS is readily available in most environments and is already used to store checkpoints No additional stateful components to manage; using any other persistence medium would incur additional operational overhead DFS natively provides durability and consistency guarantees, which would otherwise need to be taken care of when implementing a new customized distributed log storage (in particular, replication) On the other hand, the DFS approach has the following disadvantages:
Higher latency than, for example, a Distributed Log writing to local disks Scalability is limited by the DFS (most storage providers start rate-limiting at some point) However, after some initial experimentation, we think the performance of popular DFS could satisfy 80% of the use cases, and more results will be illustrated with the MVP version in a later section. * DFS here makes no distinction between DFS and object stores.
Using RocksDB as an example, this approach can be illustrated at a high level as follows. State updates are replicated to both RocksDB and the DSTL by the Changelog State Backend. The DSTL continuously writes state changes to DFS and flushes them periodically and on checkpoint. That way, checkpoint time only depends on the time to flush a small amount of data. RocksDB, on the other hand, is still used for querying the state. Furthermore, its SSTables are periodically uploaded to DFS, which is called “materialization”. That upload is independent of, and much less frequent than, the checkpointing procedure, with 10 minutes as the default interval.
There are a few more issues worth highlighting here:
State cleanup # State changelog needs to be truncated once the corresponding changes are materialized. It becomes more complicated with re-scaling and sharing the underlying files across multiple operators. However, Flink already provides a mechanism called SharedStateRegistry similar to file system reference counting. Log fragments can be viewed as shared state objects, and therefore can be tracked by this SharedStateRegistry (please see this article for more information on how SharedStateRegistry was used previously).
DFS-specific issues # Small files problem # One issue with using a DFS is that many more, and likely smaller, files are created for each checkpoint. And with the increased checkpoint frequency, there are more checkpoints. To mitigate this, state changes related to the same job on a TM are grouped into a single file.
High tail latency # DFS is known for high tail latencies, although this has been improving in recent years. To address the high-tail-latency problem, write requests are retried when they fail to complete within a timeout, which is 1 second by default (but can be configured manually).
Benchmark results # The improvement of checkpoint stability and speed after enabling Changelog highly depends on the factors below:
The difference between the changelog diff size and the full state size (or incremental state size, if comparing changelog to incremental checkpoints). The ability to upload the updates continuously during the checkpoint (e.g. an operator might maintain state in memory and only update Flink state objects on checkpoint - in this case, changelog wouldn’t help much). The ability to group updates from multiple tasks (multiple tasks must be deployed on a single TM). Grouping the updates leads to fewer files being created thereby reducing the load on DFS, which improves the stability. The ability of the underlying backend to accumulate updates to the same key before flushing (This makes state change log potentially contain more updates compared to just the final value, leading to a larger incremental changelog state size) The speed of the underlying durable storage (the faster it is, the less significant the improvement) The following setup was used in the experiment:
Parallelism: 50 Running time: 21h State backend: RocksDB (incremental checkpoint enabled) Storage: S3 (Presto plugin) Machine type: AWS m5.xlarge (4 slots per TM) Checkpoint interval: 10ms State Table materialization interval: 3m Input rate: 50K events per second ValueState workload # A workload updating mostly the new keys each time would benefit the most.
Changelog Disabled Changelog Enabled Records processed 3,808,629,616 3,810,508,130 Checkpoints made 10,023 108,649 Checkpoint duration, 90% 6s 664ms Checkpoint duration, 99.9% 10s 1s Full checkpoint size *, 99% 19.6GB 25.6GB Recovery time (local recovery disabled) 20-21s 35-65s (depending on the checkpoint) As can be seen from the above table, checkpoint duration is reduced 10 times for 99.9% of checkpoints, while space usage increases by 30%, and recovery time increases by 66%-225%.
More details about the checkpoints (Changelog Enabled / Changelog Disabled):
Percentile End to End Duration Checkpointed Data Size * Full Checkpoint Data Size * 50% 311ms / 5s 14.8MB / 3.05GB 24.2GB / 18.5GB 90% 664ms / 6s 23.5MB / 4.52GB 25.2GB / 19.3GB 99% 1s / 7s 36.6MB / 5.19GB 25.6GB / 19.6GB 99.9% 1s / 10s 52.8MB / 6.49GB 25.7GB / 19.8GB * Checkpointed Data Size is the size of data persisted after receiving the necessary number of checkpoint barriers, during the so-called synchronous and asynchronous checkpoint phases. Most of the data is persisted pre-emptively (i.e. after the previous checkpoint and before the current one), and that’s why this size is much lower when the Changelog is enabled. * Full checkpoint size is the total size of all the files comprising the checkpoint, including any files reused from the previous checkpoints. Compared to a normal checkpoint, the one with a changelog is less compact, keeping all the historical values since the last materialization, and therefore consumes much more space.
Window workload # This workload used Processing Time Sliding Window. As can be seen below, checkpoints are still faster, resulting in 3 times shorter durations; but storage amplification is much higher in this case (45 times more space consumed):
Checkpoint Statistics for Window Workload with Changelog Enabled / Changelog Disabled
Percentile End to End Duration Checkpointed Data Size Full Checkpoint Data Size 50% 791ms / 1s 269MB / 1.18GB 85.5GB / 1.99GB 90% 1s / 1s 292MB / 1.36GB 97.4GB / 2.16GB 99% 1s / 6s 310MB / 1.67GB 103GB / 2.26GB 99.9% 2s / 6s 324MB / 1.87GB 104GB / 2.30GB The increase in space consumption (Full Checkpoint Data Size) can be attributed to:
Assigning each element to multiple sliding windows (and persisting the state changelog for each). While RocksDB and Heap have the same issue, with changelog the impact is multiplied even further. As mentioned above, if the underlying state backend (i.e. RocksDB) is able to accumulate multiple state updates for the same key without flushing, the snapshot will be smaller in size than the changelog. In this particular case of sliding windows, the updates to their contents are eventually followed by purging the window. If those updates and the purge happen during the same checkpoint, then it's quite likely that the window is not included in the snapshot. This also implies that the faster the window is purged, the smaller the size of the snapshot is. Conclusion and future work # Generic log-based incremental checkpointing is released as an MVP version in Flink 1.15. This version demonstrates that solutions based on modern DFS can provide good enough latency. Furthermore, checkpointing time and stability are improved significantly by using the Changelog. However, some trade-offs must be made before using it (in particular, space amplification). In the next releases, we plan to enable more use cases for Changelog, e.g., by reducing recovery time via local recovery and improving compatibility.
Another direction is further reducing latency. This can be achieved by using faster storage, such as Apache Bookkeeper or Apache Kafka.
Besides that, we are investigating other applications of Changelog, such as WAL for sinks and queryable states.
We encourage you to try out this feature and assess the pros and cons of using it in your setup. The simplest way to do this is to add the following to your flink-conf.yaml:
state.backend.changelog.enabled: true state.backend.changelog.storage: filesystem dstl.dfs.base-path: <location similar to state.checkpoints.dir> Please see the full documentation here.
Acknowledgments # We thank Stephan Ewen for the initial idea of the project, and many other engineers including Piotr Nowojski, Yu Li and Yun Tang for design discussions and code reviews.
References # FLIP-158 generic log-based incremental checkpoints documentation Unaligned checkpoints Buffer debloating Incremental checkpoints `}),e.add({id:82,href:"/2022/05/23/getting-into-low-latency-gears-with-apache-flink-part-two/",title:"Getting into Low-Latency Gears with Apache Flink - Part Two",section:"Flink Blog",content:`This series of blog posts presents a collection of low-latency techniques in Flink. In part one, we discussed the types of latency in Flink and how we measure end-to-end latency, and presented a few techniques that optimize latency directly. In this post, we will continue with a few more direct latency optimization techniques. Just like in part one, for each optimization technique, we will clarify what it is, when to use it, and what to keep in mind when using it. We will also show experimental results to support our statements.
Direct latency optimization # Spread work across time # When you use timers or do windowing in a job, timer or window firing may create load spikes due to heavy computation or state access. If the allocated resources cannot cope with these load spikes, timer or window firing will take a long time to finish. This often results in high latency.
To avoid this situation, you should change your code to spread out the workload as much as possible such that you do not accumulate too much work to be done at a single point in time. In the case of windowing, you should consider using incremental window aggregation with AggregateFunction or ReduceFunction. In the case of timers in a ProcessFunction, the operations executed in the onTimer() method should be optimized such that the time spent there is reduced to a minimum. If you see latency spikes resulting from a global aggregation or if you need to collect events in a well-defined order to perform certain computations, you can consider adding a pre-aggregation phase in front of the current operator.
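To make this more concrete, here is a minimal sketch of incremental window aggregation with an AggregateFunction; the event type, field meanings, and window size are illustrative assumptions, not the actual WindowingJob used in the experiments below.
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;

// Computes a per-key average incrementally: the work happens on every add() call,
// so window firing only has to emit the already-computed result.
public class AverageAggregate
        implements AggregateFunction<Tuple2<String, Long>, Tuple2<Long, Long>, Double> {

    @Override
    public Tuple2<Long, Long> createAccumulator() {
        return Tuple2.of(0L, 0L);
    }

    @Override
    public Tuple2<Long, Long> add(Tuple2<String, Long> value, Tuple2<Long, Long> acc) {
        return Tuple2.of(acc.f0 + value.f1, acc.f1 + 1);
    }

    @Override
    public Double getResult(Tuple2<Long, Long> acc) {
        return ((double) acc.f0) / acc.f1;
    }

    @Override
    public Tuple2<Long, Long> merge(Tuple2<Long, Long> a, Tuple2<Long, Long> b) {
        return Tuple2.of(a.f0 + b.f0, a.f1 + b.f1);
    }
}
// Usage: input.keyBy(e -> e.f0)
//             .window(TumblingEventTimeWindows.of(Time.seconds(10)))
//             .aggregate(new AverageAggregate());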
You can apply this optimization if you are using timer-based processing (e.g., timers, windowing) and an efficient aggregation can be applied whenever an event arrives instead of waiting for timers to fire.
Keep in mind that when you spread work across time, you should consider not only computation but also state access, especially when using RocksDB. Spreading one type of work while accumulating the other may result in higher latencies.
WindowingJob already does incremental window aggregation with AggregateFunction. To show the latency improvement of this technique, we compared WindowingJob with a variant that does not do incremental aggregation, WindowingJobNoAggregation, both running with the commonly used rocksdb state backend. As the results below show, without incremental window aggregation, the latency would increase from 720 ms to 1.7 seconds.
Access external systems efficiently # Using async I/O # When interacting with external systems (e.g., RDBMS, object stores, web services) in a Flink job for data enrichment, the latency in getting responses from external systems often dominates the overall latency of the job. With Flink’s Async I/O API (e.g., AsyncDataStream.unorderedWait() or AsyncDataStream.orderedWait()), a single parallel function instance can handle many requests concurrently and receive responses asynchronously. This reduces latencies because the waiting time for responses is amortized over multiple requests.
You can apply this optimization if the client of your external system supports asynchronous requests. If it does not, you can use a thread pool of multiple clients to handle synchronous requests in parallel. You can also use a cache to speed up lookups if the data in the external system is not changing frequently. A cache, however, comes at the cost of working with outdated data.
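As a rough sketch of the async I/O pattern, the example below enriches a stream of keys with a hypothetical asynchronous client; the client class, its query() method, and the timeout/capacity values in the usage comment are assumptions for illustration only.
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

public class AsyncEnrichmentFunction extends RichAsyncFunction<String, String> {

    private transient MyAsyncClient client;

    @Override
    public void open(Configuration parameters) {
        client = new MyAsyncClient();
    }

    @Override
    public void asyncInvoke(String key, ResultFuture<String> resultFuture) {
        // The future completes the result later, without blocking the operator thread,
        // so many requests can be in flight concurrently.
        client.query(key).thenAccept(value ->
                resultFuture.complete(Collections.singletonList(key + "," + value)));
    }

    // Hypothetical stand-in for an asynchronous database or HTTP client.
    private static class MyAsyncClient {
        CompletableFuture<String> query(String key) {
            return CompletableFuture.supplyAsync(() -> "value-for-" + key);
        }
    }
}
// Usage (unordered results, 50 ms timeout, at most 100 in-flight requests):
// AsyncDataStream.unorderedWait(input, new AsyncEnrichmentFunction(), 50, TimeUnit.MILLISECONDS, 100);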
In this experiment, we simulated an external system that returns responses within 1 to 6 ms randomly, and we kept the external system responses in a cache in our job for 1 second. The results below show the comparison between two jobs: EnrichingJobSync and EnrichingJobAsync. By using async I/O, the latency was reduced from around 600 ms to 100 ms.
Using a streaming join # If you are enriching a stream of events with an external database where the data changes frequently, and the changes can be converted to a data stream, then another option is to use connected streams and a CoProcessFunction to do a streaming join. This can usually achieve lower latencies than the per-record lookup approach. An alternative approach is to pre-load external data into the job, but a full streaming join can usually achieve better accuracy because it does not work with stale data and takes event time into account. Please refer to this webinar for more details on streaming joins.
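A minimal sketch of such a streaming join is shown below; the use of plain String events and last-value-wins semantics on the changelog side are simplifying assumptions.
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

public class EnrichmentJoin extends CoProcessFunction<String, String, String> {

    // Latest reference value for the current key, kept in keyed state.
    private transient ValueState<String> referenceData;

    @Override
    public void open(Configuration parameters) {
        referenceData = getRuntimeContext().getState(
                new ValueStateDescriptor<>("reference", String.class));
    }

    @Override
    public void processElement1(String event, Context ctx, Collector<String> out) throws Exception {
        // Main stream: join each event against the latest known reference value.
        out.collect(event + "," + referenceData.value());
    }

    @Override
    public void processElement2(String update, Context ctx, Collector<String> out) throws Exception {
        // Changelog stream: keep only the latest value per key.
        referenceData.update(update);
    }
}
// Usage: events.keyBy(e -> e).connect(changelog.keyBy(c -> c)).process(new EnrichmentJoin());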
Tune checkpointing # There are two aspects of checkpointing that impact latency: checkpoint alignment time, as well as checkpoint frequency and duration in the case of end-to-end exactly-once processing with transactional sinks.
Reduce checkpoint alignment time # During checkpoint alignment, operators block the event processing from the channels where checkpoint barriers have been received in order to wait for the checkpoint barriers from other channels. Longer alignment time will result in higher latencies.
There are different ways to reduce checkpoint alignment time:
Improve the throughput. Any improvement in throughput helps process the buffers sitting in front of a checkpoint barrier faster. Scale up or scale out. This is the same as the technique of “allocate enough resources” described in part one. Increased processing power helps reduce backpressure and checkpoint alignment time. Use unaligned checkpointing. In this case, checkpoint barriers will not wait until the data is processed but skip over and pass on to the next operator immediately. Skipped-over data, however, has to be checkpointed as well in order to be consistent. Flink can also be configured to automatically switch over from aligned to unaligned checkpointing after a certain alignment time has passed. Buffer less data. You can reduce the buffered data size by tuning the number of exclusive and floating buffers. With less data buffered in the network stack, the checkpoint barrier can arrive at operators more quickly. However, reducing buffers has an adverse effect on throughput and is just mentioned here for completeness. Flink 1.14 improves buffer handling by introducing a feature called buffer debloating. Buffer debloating can dynamically adjust buffer size based on the current throughput such that the buffered data can be worked off by the receiver within a configured fixed duration, e.g., 1 second. This reduces the buffered data during the alignment phase and can be used in combination with unaligned checkpointing to reduce the checkpoint alignment time. Tune checkpoint duration and frequency # If you are working with transactional sinks with exactly-once semantics, the output events are committed to external systems (e.g., Kafka) only upon checkpoint completion. In this case, tuning other options may not help if you do not tune checkpointing. Instead, you need to have fast and more frequent checkpointing.
To have fast checkpointing, you need to reduce the checkpoint duration. To achieve that, you can, for example, turn on rocksdb incremental checkpointing, reduce the state stored in Flink, clean up state that is not needed anymore, avoid putting caches into managed state, store only necessary fields in state, optimize the serialization format, etc. You can also scale up or scale out, the same as the technique of “allocate enough resources” described in part one. This has two effects: it reduces backpressure because of the increased processing power, and with the increased parallelism, writing checkpoints to remote storage can finish more quickly. You can also tune checkpoint alignment time, as described in the previous section, to reduce the checkpoint duration. If you use Flink 1.15 or later, you can enable the changelog feature. It may help reduce the asynchronous duration of checkpointing.
To have more frequent checkpointing, you can reduce the checkpoint interval, the minimum pause between checkpoints, or use concurrent checkpoints. But keep in mind that concurrent checkpoints introduce more runtime overhead.
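The snippet below is a hedged sketch of the checkpoint-related knobs discussed above, expressed through the Java API; the concrete interval, pause, and timeout values are illustrative and have to be adapted to your job.
import java.time.Duration;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointTuning {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // More frequent checkpoints: 10 s interval with a short minimum pause.
        env.enableCheckpointing(10_000, CheckpointingMode.EXACTLY_ONCE);
        CheckpointConfig config = env.getCheckpointConfig();
        config.setMinPauseBetweenCheckpoints(5_000);
        // Concurrent checkpoints add runtime overhead, so keep a single one in flight.
        config.setMaxConcurrentCheckpoints(1);
        // Reduce alignment time under backpressure by letting barriers overtake buffered data.
        config.enableUnalignedCheckpoints(true);
        // Or only switch to unaligned checkpoints after alignment has taken too long.
        config.setAlignedCheckpointTimeout(Duration.ofSeconds(30));
    }
}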
Another option is to not use exactly-once sinks but to switch to at-least-once sinks. The result of this is that you may have (correct but) duplicated output events, so the downstream application that consumes your job's output may need to perform additional deduplication.
Process events on arrival # In a stream processing pipeline, there often exists a delay between the time an event is received and the time the event can be processed (e.g., after having seen all events up to a certain point in event time). The amount of delay may be significant for pipelines with very low latency requirements. For example, a fraud detection job usually requires a sub-second level of latency. In this case, you could process events with a ProcessFunction immediately when they arrive and deal with out-of-order events yourself (in case of event-time processing) depending on your business requirements, e.g., drop them or send them to a side output for special processing. Please refer to this Flink blog post for a great example of a low latency fraud detection job with implementation details.
You can apply this optimization if your job has a sub-second level latency requirement (e.g., hundreds of milliseconds) and the reduced watermarking interval still contributes a significant part of the latency.
Keep in mind that this may change your job logic considerably since you have to deal with out-of-order events by yourself.
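The following is a minimal sketch of this pattern; the event type, the lateness rule, and the side-output handling are illustrative assumptions that would need to match your business requirements.
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class ProcessOnArrival extends KeyedProcessFunction<String, ProcessOnArrival.MyEvent, String> {

    // Side output for events that arrive behind the current watermark.
    public static final OutputTag<MyEvent> OUT_OF_ORDER = new OutputTag<MyEvent>("out-of-order") {};

    @Override
    public void processElement(MyEvent event, Context ctx, Collector<String> out) {
        if (event.timestamp < ctx.timerService().currentWatermark()) {
            // Out-of-order event: route it for special processing instead of waiting for timers.
            ctx.output(OUT_OF_ORDER, event);
        } else {
            // Process immediately on arrival rather than buffering until a window fires.
            out.collect(event.key + " processed at " + ctx.timerService().currentProcessingTime());
        }
    }

    // Hypothetical event type used only for illustration.
    public static class MyEvent {
        public String key;
        public long timestamp;
    }
}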
Summary # Following part one, this blog post presented a few more latency optimization techniques with a focus on direct latency optimization. In the next part, we will focus on techniques that optimize latency by increasing throughput. Stay tuned!
`}),e.add({id:83,href:"/2022/05/18/getting-into-low-latency-gears-with-apache-flink-part-one/",title:"Getting into Low-Latency Gears with Apache Flink - Part One",section:"Flink Blog",content:`Apache Flink is a stream processing framework well known for its low latency processing capabilities. It is generic and suitable for a wide range of use cases. As a Flink application developer or a cluster administrator, you need to find the right gear that is best for your application. In other words, you don't want to be driving a luxury sports car while only using the first gear.
In this multi-part series, we will present a collection of low-latency techniques in Flink. Part one starts with types of latency in Flink and the way we measure the end-to-end latency, followed by a few techniques that optimize latency directly. Part two continues with a few more direct latency optimization techniques. Further parts of this series will cover techniques that improve latencies by optimizing throughput. For each optimization technique, we will clarify what it is, when to use it, and what to keep in mind when using it. We will also show experimental results to support our statements.
This series of blog posts is a write-up of our talk in Flink Forward Global 2021 and includes additional latency optimization techniques and details.
Latency # Types of latency # Latency can refer to different things. LatencyMarkers in Flink measure the time it takes for the markers to travel from each source operator to each downstream operator. As LatencyMarkers bypass user functions in operators, the measured latencies do not reflect the entire end-to-end latency but only a part of it. Flink also supports tracking the state access latency, which measures the response latency when state is read/written. One can also manually measure the time taken by some operators, or get this data with profilers. However, what users usually care about is the end-to-end latency, including the time spent in user-defined functions, in the stream processing framework, and when state is accessed. End-to-end latency is what we will focus on.
How we measure end-to-end latency # There are two scenarios to consider. In the first scenario, a pipeline does a simple transformation, and there are no timers or any other complex event time logic. For example, a pipeline that produces one output event for each input event. In this case, we measure the processing delay as the latency, that is, t2 - t1 as shown in the diagram.
The second scenario is where complex event time logic is involved (e.g., timers, aggregation, windowing). In this case, we measure the event time lag as the latency, that is, current processing time - current watermark. The event time lag gives us the difference between the expected output time and the actual output time.
In both scenarios, we capture a histogram and show the 99th percentile of the end-to-end latency. The latency we measure here includes the time an event stays in the source message queue (e.g., Kafka). The reason for this is that it covers the scenarios where a source operator in a pipeline is backpressured by other operators. The more the source operator is backpressured, the longer the messages stay in the message queue. So, including the time events stay in the message queue gives us a good indication of how slow or fast a pipeline is.
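For reference, a hedged sketch of exposing the event time lag as a custom gauge is shown below; the metric name and the operator placement are assumptions, not the setup used in our experiments.
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Gauge;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class EventTimeLagGauge<T> extends ProcessFunction<T, T> {

    private transient long lastLagMs;

    @Override
    public void open(Configuration parameters) {
        // Expose the last observed lag between processing time and the current watermark.
        getRuntimeContext().getMetricGroup().gauge("eventTimeLagMs", (Gauge<Long>) () -> lastLagMs);
    }

    @Override
    public void processElement(T value, Context ctx, Collector<T> out) {
        lastLagMs = ctx.timerService().currentProcessingTime()
                - ctx.timerService().currentWatermark();
        out.collect(value);
    }
}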
Low-latency optimization techniques # We will discuss low-latency techniques in two groups: techniques that optimize latency directly and techniques that improve latency by optimizing throughput. Each of these techniques can be as simple as a configuration change or may require code changes, or both. We have created a git repository containing the example jobs used in our experiments to support our statements. Keep in mind that all the experimental results we will show are specific to those jobs and the environment they run in. Your job may show different results depending on where the latency bottleneck is.
Direct latency optimization # Allocate enough resources # An obvious but often forgotten low-latency technique is to allocate enough resources to your job. Flink has some metrics (e.g., idleTimeMsPerSecond, busyTimeMsPerSecond, backPressureTimeMsPerSecond) to indicate whether an operator/subtask is busy or not. This can also be spotted easily in the job graph on Flink’s Web UI if you are using Flink 1.13 or later. If some operators in your job are 100% busy, they will backpressure upstream operators and the backpressure may propagate up to the source operators. Backpressure slows down the pipeline and results in high latency. If you scale up your job by adding more CPU/memory resources or scale out by increasing the parallelism, your job will be able to process events faster or process more events in parallel which leads to reduced latencies. We recommend having an average load below 70% under normal circumstances to accommodate load spikes that come from input data, timers, windowing, or other sources. You should adjust the threshold based on your job resource usage patterns and your latency requirements.
You can apply this optimization if your job or part of it is running at its total CPU/memory capacity and you have more resources that can be allocated to the job. In the case of scaling out with high parallelism, your streaming job must be able to make use of the additional resources. For example, the job should not have fixed parallelisms in the code, the job should not be bottlenecked on the source streams, and the input streams must be partitionable by keys so that they can be processed in parallel without severe data skew. In the case of scaling up by allocating more CPU cores, your streaming job must not be bottlenecked on a single thread or any other resources.
Keep in mind that allocating more resources may result in increased financial costs, especially when you are running jobs in the cloud.
Below are the experimental results of WindowingJob. As you can see from the graph at the left, when the parallelism was 2, the two subtasks were often 100% busy. After we increased the parallelism to 3, the three subtasks were around 75% busy. As a result, the 99th percentile latency reduces from around 3 seconds to 650 milliseconds.
Use applicable state backends # When using the filesystem (Flink 1.12 or earlier) or hashmap (Flink 1.13 or later) state backend, the state objects are stored in memory and can be accessed directly. In contrast, when using the rocksdb state backend, every state access has to go through a (de-)serialization process which may additionally involve disk accesses. So using the filesystem/hashmap state backend can help reduce latency.
You can apply this optimization if your state size is very small compared to the memory you can allocate to your job and your state size will not grow beyond your memory capacity. You can set the managed memory size to 0 if not needed. Since Flink 1.13, you can always start with the hashmap state backend and seamlessly switch to the rocksdb state backend via savepoints when the state increases to the size that is close to your memory capacity. Note that you should closely monitor the memory usage and perform the switch before an out-of-memory happens. Please refer to this Flink blog post for best practices when using the rocksdb state backend.
Keep in mind that heap-based state backends use more memory compared with RocksDB due to their copy-on-write data structure and Java’s on-heap object representation. Heap-based state backends can be affected by the garbage collector which makes them less predictable and may lead to high tail latencies. Also, as of now, there is no support for incremental checkpointing (this is being developed in FLIP-151). You should measure the difference before you make the switch.
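A minimal sketch of selecting the state backend programmatically is shown below; the same can be achieved with the state.backend option in flink-conf.yaml.
import org.apache.flink.runtime.state.hashmap.HashMapStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class StateBackendSelection {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Small state that fits in memory: keep state objects on the heap for direct access.
        env.setStateBackend(new HashMapStateBackend());

        // If state grows close to the memory capacity, switch (e.g., via a savepoint) to RocksDB:
        // env.setStateBackend(new org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend(true));
    }
}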
Our experiments with the previously mentioned WindowingJob after switching the state backend from rocksdb to hashmap show a further reduction of the latency down to 500ms. Depending on your job’s state access pattern, you may see larger or smaller improvements. The graph on the right shows the garbage collection's impact on the latency.
Emit watermarks quickly # When using a periodic watermark generator, Flink generates a watermark every 200 ms. This means that, by default, each parallel watermark generator does not produce watermark updates until 200 ms have passed. While this may be sufficient for many cases, if you are aiming for sub-second latencies, you could try reducing the interval even further, for example, to 100 ms.
You can apply this optimization if you use event time and a periodic watermark generator, and you are aiming for sub-second latencies.
Keep in mind that watermark generation that is too frequent may also degrade performance because more watermarks must be processed by the framework. Moreover, even though watermarks are only created every 200 milliseconds, watermarks may arrive at much higher frequencies further downstream in your job because tasks may receive watermarks from multiple parallel watermark generators.
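A small sketch of lowering the periodic watermark interval via the Java API follows; it is equivalent to setting pipeline.auto-watermark-interval, and the 100 ms value is just the example used here.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class WatermarkInterval {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Emit watermarks every 100 ms instead of the default 200 ms.
        env.getConfig().setAutoWatermarkInterval(100L);
    }
}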
We re-ran the previous WindowingJob experiment with the reduced watermark interval pipeline.auto-watermark-interval: 100ms and reduced the latency further to 430ms.
Flush network buffers early # Flink uses buffers when sending data from one task to another over the network. Buffers are flushed and sent out when they are filled up or when the default timeout of 100ms has passed. Again, if you are aiming for sub-second latencies, you can lower the timeout to reduce latencies.
You can apply this optimization if you are aiming for sub-second latencies.
Keep in mind that a network buffer timeout that is too low may reduce throughput.
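Below is a small sketch of lowering the buffer timeout programmatically, equivalent to the execution.buffer-timeout setting used in the experiment that follows; the 10 ms value is illustrative.
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BufferTimeoutExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Flush network buffers every 10 ms instead of the default 100 ms.
        env.setBufferTimeout(10);
    }
}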
As seen in the following experiment results, by using execution.buffer-timeout: 10 ms in WindowingJob, we again reduced the latency (now to 370ms).
Summary # In part one of this multi-part series, we discussed types of latency in Flink and the way we measure end-to-end latency. Then we presented a few latency optimization techniques with a focus on direct latency optimization. For each technique, we explained what it is, when to use it, and what to keep in mind when using it. Part two will continue with a few more direct latency optimization techniques. Stay tuned!
`}),e.add({id:84,href:"/2022/05/11/apache-flink-table-store-0.1.0-release-announcement/",title:"Apache Flink Table Store 0.1.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is pleased to announce the preview release of the Apache Flink Table Store (0.1.0).
Please check out the full documentation for detailed information and user guides.
Note: Flink Table Store is still in beta status and undergoing rapid development. We do not recommend that you use it directly in a production environment.
What is Flink Table Store # In the past years, thanks to our numerous contributors and users, Apache Flink has established itself as one of the best distributed computing engines, especially for stateful stream processing at large scale. However, there are still a few challenges people face when they try to obtain insights from their data in real time. Among these challenges, one prominent problem is the lack of storage that caters to all the computing patterns.
As of now it is quite common that people deploy a few storage systems to work with Flink for different purposes. A typical setup is a message queue for stream processing, a scannable file system / object store for batch processing and ad-hoc queries, and a K-V store for lookups. Such an architecture poses challenges to data quality and system maintenance, due to its complexity and heterogeneity. This is becoming a major issue that hurts the end-to-end user experience of the streaming and batch unification brought by Apache Flink.
The goal of Flink Table Store is to address the above issues. This is an important step for the project: it extends Flink's capabilities from computing to the storage domain, so we can provide a better end-to-end experience to users.
Flink Table Store aims to provide a unified storage abstraction, so users don't have to build the hybrid storage by themselves. More specifically, Table Store offers the following core capabilities:
Support storage of large datasets and allow reads/writes in both batch and streaming manner. Support streaming queries with minimum latency down to milliseconds. Support Batch/OLAP queries with minimum latency down to the second level. Support incremental snapshots for stream consumption by default. So users don't need to solve the problem of combining different stores by themselves. In this preview version, as shown in the architecture above:
Users can use Flink to insert data into the Table Store, either by streaming the change log captured from databases, or by loading the data in batches from the other stores like data warehouses. Users can use Flink to query the table store in different ways, including streaming queries and Batch/OLAP queries. It is also worth noting that users can use other engines such as Apache Hive to query from the table store as well. Under the hood, Table Store uses a hybrid storage architecture, using a Lake Store to store historical data and a Queue system (Apache Kafka integration is currently supported) to store incremental data. It provides incremental snapshots for hybrid streaming reads. Table Store's Lake Store stores data as columnar files on a file system / object store, and uses an LSM structure to support a large volume of data updates and high-performance queries. Many thanks to the following systems for their inspiration: Apache Iceberg and RocksDB.
Getting started # Please refer to the getting started guide for more details.
What's Next? # The community is currently working on hardening the core logic, stabilizing the storage format and adding the remaining bits for making the Flink Table Store production-ready.
In the upcoming 0.2.0 release you can expect (at least) the following additional features:
Ecosystem: Support Flink Table Store Reader for Apache Hive Engine Core: Support adjusting the number of buckets Core: Support for append-only data, so Table Store is not limited to update scenarios Core: Full Schema Evolution Improvements based on feedback from the preview release In the medium term, you can also expect:
Ecosystem: Support Flink Table Store Reader for Trino, PrestoDB and Apache Spark Flink Table Store Service to accelerate updates and improve query performance Please give the preview release a try, share your feedback on the Flink mailing list and contribute to the project!
Release Resources # The source artifacts and binaries are now available on the updated Downloads page of the Flink website.
We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # The Apache Flink community would like to thank every one of the contributors that have made this release possible:
Jane Chan, Jiangjie (Becket) Qin, Jingsong Lee, Leonard Xu, Nicholas Jiang, Shen Zhu, tsreaper, Yubin Li
`}),e.add({id:85,href:"/2022/05/06/exploring-the-thread-mode-in-pyflink/",title:"Exploring the thread mode in PyFlink",section:"Flink Blog",content:`PyFlink was introduced in Flink 1.9 with the purpose of bringing the power of Flink to Python users and allowing them to develop Flink jobs in the Python language. The functionality has become more and more mature through the development in the past releases.
Before Flink 1.15, Python user-defined functions were executed in separate Python processes (based on the Apache Beam Portability Framework). This brings additional serialization/deserialization overhead as well as communication overhead. In scenarios where the data size is big, e.g., image processing, this overhead becomes non-negligible. Besides, since it involves inter-process communication, the processing latency is also non-negligible, which is unacceptable in scenarios where the latency is critical, e.g., quantitative trading.
In Flink 1.15, we have introduced a new execution mode named 'thread' mode (based on PEMJA) where the Python user-defined functions will be executed in the JVM as a thread instead of a separate Python process. In this article, we will dig into the details about this execution mode and also share some benchmark data to give users a basic understanding of how it works and which scenarios it’s applicable for.
Process Mode # Fig. 1 - PyFlink Architecture Overview From Fig. 1, we can see the architecture of PyFlink. As shown on the left side of Fig. 1, users can use the PyFlink API (Python Table API & SQL or Python DataStream API) to declare the logic of their jobs, which is finally translated into a JobGraph (the DAG of the job) that can be recognized by Flink’s execution framework. It should be noted that Python operators (Flink operators whose purpose is to execute Python user-defined functions) will be used to execute the Python user-defined functions.
On the right side of Fig. 1, it shows the details of the Python operators where the Python user-defined functions were executed in separate Python processes.
In order to communicate with the Python worker process, a series of communication services are required between the Python operator (which runs in the JVM) and the Python worker (which runs in the Python VM). PyFlink has employed the Apache Beam Portability framework to execute Python user-defined functions, which provides the basic building blocks required for PyFlink.
Fig. 2 - PyFlink Runtime in Process Mode Process mode can be executed stably and efficiently in most scenarios. It is enough for most users. However, in some scenarios, it doesn’t work well due to the additional serialization/deserialization overhead. One of the most typical scenarios is image processing, where the input data size is often very big. Besides, since it involves inter-process communication, the processing latency is also non-negligible, which is unacceptable in scenarios where latency is critical, e.g., quantitative trading. In order to overcome these problems, we have introduced a new execution mode (thread mode) where Python user-defined functions will be executed in the JVM as a thread instead of a separate Python process. In the following section, we will dig into the details of this new execution mode.
PEMJA # Before digging into the thread mode, let's first introduce PEMJA, the library at the core of the thread mode architecture.
As we all know, Java Native Interface (JNI) is a standard programming interface for writing Java native methods and embedding the Java virtual machine into native applications. What’s more, CPython provides Python/C API to help embed Python in C Applications.
So if we combine these two interfaces together, we can embed Python in Java Application. Since this library solves a general problem that Python and Java could call each other, we have open sourced it as an independent project, and PyFlink has depended on PEMJA since Flink 1.15 to support thread mode.
PEMJA Architecture # Fig. 3 - PEMJA Architecture As we can see from the architecture of PEMJA in Fig. 3, JVM and PVM can call each other in the same process through PEMJA Library.
Firstly, PEMJA will start a daemon thread in the JVM, which is responsible for initializing the Python Environment and creating a Python Main Interpreter owned by this process. The reason why PEMJA uses a dedicated thread to initialize the Python Environment is to avoid potential deadlocks in the Python Interpreter. The Python Interpreter could deadlock when trying to acquire the GIL through methods such as PyGILState_* in the Python/C API concurrently. It should be noted that PEMJA doesn't call those methods directly; however, third-party libraries such as numpy may call them. To get around this, we use a dedicated thread to initialize the Python Environment.
Then, each Java worker thread can invoke the Python functions through the Python ThreadState created from Python Main Interpreter.
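As a rough illustration of how PEMJA embeds Python in a Java application, the sketch below is based on PEMJA's public README; the exact class and method names should be verified against the PEMJA version you use, and the embedded Python snippet is purely illustrative.
import pemja.core.PythonInterpreter;
import pemja.core.PythonInterpreterConfig;

public class PemjaExample {
    public static void main(String[] args) {
        // Which CPython interpreter to embed (assumption: python3 is on the PATH).
        PythonInterpreterConfig config = PythonInterpreterConfig.newBuilder()
                .setPythonExec("python3")
                .build();

        PythonInterpreter interpreter = new PythonInterpreter(config);
        interpreter.exec("import json");
        interpreter.set("s", "{\"a\": \"HELLO\"}");
        interpreter.exec("result = json.loads(s)['a'].lower()");
        String result = interpreter.get("result", String.class);
        System.out.println(result); // expected to print "hello"
        interpreter.close();
    }
}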
Comparison with other solutions # Framework Principle Limitations Jython Python compiler implemented in Java Only support for Python2 GraalVM Truffle framework Compatibility issues with various Python ecological libraries Works only with GraalVM JPype JNI + Python/C API Don’t support Java calling Python Only support for CPython Jep JNI + Python/C API Difficult to integrate Performance is not good enough Only support for CPython PEMJA JNI + Python/C API Only support for CPython In the table above, we have made a basic comparison of the popular solutions of Java/Python calling libraries.
Jython: Jython is a Python interpreter implemented in Java language. Because its implementation language is Java, the interoperability between code implemented by Python syntax and Java code will be very natural. However, Jython does not support Python 3 anymore, and it is no longer actively maintained.
GraalVM: GraalVM makes use of the Truffle framework to support interoperability between Python and Java. However, it has the limitation that not all the Python libraries are supported. As we know, many Python libraries rely on standard CPython to implement their C extensions. The other problem is that it only works with GraalVM, which means high migration costs.
JPype: Similar to PEMJA, JPype is also a framework built using JNI and Python/C API, but JPype only supports calling Java from Python.
Jep: Similar to PEMJA, Jep is also a framework built using JNI and the Python/C API, and it supports calling Python from Java. However, it doesn't publish a jar to the Maven repository, and the process of loading native packages needs to be specified in advance through JVM parameters or environment variables when the JVM starts, which makes it difficult to integrate. Furthermore, our benchmark shows that the performance is not very good.
PEMJA: Similar to Jep and JPype, PEMJA is built on CPython, so it cannot support other Python interpreters, such as PyPy. Since CPython is the most widely used implementation and the standard Python runtime officially provided by the Python project, most libraries in the Python ecosystem are built on the CPython runtime and therefore work with PEMJA naturally.
Thread Mode # Fig. 4 - PyFlink Runtime in Thread Mode From the picture above, we can see that in thread mode, the Python user-defined function runs in the same process as the Python operator (which runs in the JVM). PEMJA is used as a bridge between the Java code and the Python code.
Since the Python user-defined function runs in the JVM, each input record received from the upstream operators is passed to the Python user-defined function directly instead of being buffered and passed in a batch. Therefore, thread mode can have lower latency compared to process mode. Currently, if users want to achieve lower latency in process mode, they usually need to configure python.fn-execution.bundle.size or python.fn-execution.bundle.time to a lower value. However, since process mode involves inter-process communication, the latency is still a little high in some scenarios; this is not a problem anymore in thread mode. Besides, configuring python.fn-execution.bundle.size or python.fn-execution.bundle.time to a lower value usually affects the overall performance of the job, which is also not a problem in thread mode.
Comparisons between process mode and thread mode # Execution Mode Benefits Limitations Process Mode Better resource isolation IPC overhead High implementation complexity Thread Mode Higher throughput Lower latency Less checkpoint time Less usage restrictions Only support for CPython Multiple jobs cannot use different Python interpreters in session mode Performance is affected by the GIL Benefits of thread mode # Since it processes data in batches in process mode, currently Python user-defined functions cannot be used in some scenarios, e.g. used in the Join (Table API & SQL) condition and taking columns both from the left table and the right table as inputs. However, this is no longer a big problem in thread mode because data is handled one record at a time instead of in batches.
Unlike process mode which sends and receives data asynchronously in batches, in thread mode, data will be processed synchronously one by one. So usually it will have lower latency and also less checkpoint time. In terms of performance, since there is no inter-process communication, it could avoid data serialization/deserialization and communication overhead, as well as the stage of copying and context switching between kernel space and user space, so it usually will have better performance in thread mode.
Limitations # However, there are also some limitations for thread mode:
It only supports CPython which is also one of the most used Python interpreters. It doesn’t support session mode well and so it’s recommended that users only use thread mode in per-job or application deployments. The reason is it doesn’t support using different Python interpreters for the jobs running in the same TaskManager. This limitation comes from the fact that many Python libraries assume that they will only be initialized once in the process, so they use a lot of static variables. Usage # The execution mode could be configured via the configuration python.execution-mode. It has two possible values:
process: The Python user-defined functions will be executed in a separate Python process. (default) thread: The Python user-defined functions will be executed in the same process as Java operators. For example, you could configure it as following in Python Table API:
# Specify \`process\` mode table_env.get_config().set("python.execution-mode", "process") # Specify \`thread\` mode table_env.get_config().set("python.execution-mode", "thread") It should be noted that since this is still the first release of 'thread' mode, currently there are still many limitations about it, e.g. it only supports Python ScalarFunction of Python Table API & SQL. It will fall back to 'process' mode where 'thread' mode is not supported. So it may happen that you configure a job to execute in thread mode, however, it’s actually executed in 'process' execution mode.
Benchmark # Test environment # OS: Alibaba Cloud Linux (Aliyun Linux) release 2.1903 LTS (Hunting Beagle)
CPU: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
Memory: 16G
CPython: Python 3.7.3
JDK: OpenJDK Runtime Environment (build 1.8.0_292-b10)
PyFlink: 1.15.0
Test results # Here, we test the json processing which is a very common scenario for PyFlink users.
The UDF implementation is as follows:
# python udf @udf(result_type=DataTypes.STRING(), func_type="general") def json_value_lower(s: str): import json a = json.loads(s) a['a'] = a['a'].lower() return json.dumps(a) // Java UDF public class JsonValueLower extends ScalarFunction { private transient ObjectMapper mapper; private transient ObjectWriter writer; @Override public void open(FunctionContext context) throws Exception { this.mapper = new ObjectMapper(); this.writer = mapper.writerWithDefaultPrettyPrinter(); } public String eval(String s) { try { StringObject object = mapper.readValue(s, StringObject.class); object.setA(object.a.toLowerCase()); return writer.writeValueAsString(object); } catch (JsonProcessingException e) { throw new RuntimeException("Failed to read json value", e); } } private static class StringObject { private String a; public String getA() { return a; } public void setA(String a) { this.a = a; } @Override public String toString() { return "StringObject{" + "a='" + a + '\\'' + '}'; } } } The test results are as follows:
Type (input data size) QPS (1w = 10,000) Latency Checkpoint Time Java UDF (100k) 900 2ms 100ms Java UDF (10k) 1w 20us 10ms Java UDF (1k) 5w 1us 10ms Java UDF (100) 28w 200ns 10ms Process Mode (100k) 900 5s-10s 5s Process Mode (10k) 7000 5s-10s 3s Process Mode (1k) 3.6w 3s 3s Process Mode (100) 12w 2s 2s Thread Mode (100k) 1200 1ms 100ms Thread Mode (10k) 1.2w 20us 10ms Thread Mode (1k) 5w 3us 10ms Thread Mode (100) 12w 1us 10ms As we can see from the test results:
If you care about latency and checkpoint time, thread mode is your better choice. The processing latency could be decreased from several seconds in process mode to microseconds in thread mode.
Thread mode can bring better performance than process mode when data serialization/deserialization is not negligible relative to the UDF computation itself. Compared to process mode, the benchmark has shown that throughput can be increased by 2x in thread mode in common scenarios such as JSON processing. However, if the UDF computation itself is slow and dominates the processing time, then process mode is recommended instead, because it is more mature and has better resource isolation.
When the performance of Python UDF is close to that of Java UDF, the end-to-end performance of thread mode will be close to that of Java UDF.
Summary & Future work # In this article, we have introduced the 'thread' execution mode in PyFlink, which is a new feature introduced in Flink 1.15. Compared with the 'process' execution mode, users will get better performance, lower latency, and less checkpoint time in 'thread' mode. However, there are also some limitations to 'thread' mode, e.g., poor support for the session deployment mode.
It should be noted that since this is still the first release of 'thread' mode, currently there are still many limitations about it, e.g., it only supports the Python ScalarFunction of the Python Table API & SQL. We're planning to extend it to other places where Python user-defined functions can be used in the next releases.
`}),e.add({id:86,href:"/2022/05/06/improvements-to-flink-operations-snapshots-ownership-and-savepoint-formats/",title:"Improvements to Flink operations: Snapshots Ownership and Savepoint Formats",section:"Flink Blog",content:`Flink has become a well-established data streaming engine and a mature project requires some shifting of priorities from thinking purely about new features towards improving stability and operational simplicity. In the last couple of releases, the Flink community has tried to address some known friction points, which includes improvements to the snapshotting process. Snapshotting takes a global, consistent image of the state of a Flink job and is integral to fault-tolerance and exactly-once processing. Snapshots include savepoints and checkpoints.
This post will outline the journey of improving snapshotting in past releases and the upcoming improvements in Flink 1.15, which includes making it possible to take savepoints in the native state backend specific format as well as clarifying snapshots ownership.
Past improvements to the snapshotting process # Flink 1.13 was the first release where we announced unaligned checkpoints to be production-ready. We encouraged people to use them if their jobs are backpressured to a point where it causes issues for checkpoints. We also unified the binary format of savepoints across all different state backends, which enables switching the state backend when restoring from a savepoint.
Flink 1.14 also brought additional improvements. As an alternative and as a complement to unaligned checkpoints, we introduced a feature called “buffer debloating”. This is built around the concept of automatically adjusting the amount of in-flight data that needs to be aligned while snapshotting. We also fixed another long-standing problem and made it possible to continue checkpointing even if there are finished tasks in a JobGraph.
New improvements to the snapshotting process # You can expect more improvements in Flink 1.15! We continue to be invested in making it easy to operate Flink clusters and have tackled the following problems. :)
Savepoints can be expensive to take and restore from if taken for a very large state stored in the RocksDB state backend. In order to circumvent this issue, we have seen users leveraging the externalized incremental checkpoints instead of savepoints in order to benefit from the native RocksDB format. However, checkpoints and savepoints serve different operational purposes. Thus, we now made it possible to take savepoints in the native state backend specific format, while still maintaining some characteristics of savepoints (i.e. making them relocatable).
Another issue reported with externalized checkpoints is that it is not clear who owns the checkpoint files (Flink or the user?). This is especially problematic when it comes to incremental RocksDB checkpoints, where you can easily end up in a situation where you do not know which checkpoints depend on which files, which makes it tough to clean those files up. To solve this issue, we added explicit restore modes (CLAIM, NO_CLAIM, and LEGACY) which clearly define whether Flink should take care of cleaning up the snapshots or whether it should remain the user's responsibility.
The new restore modes # The restore mode determines who takes ownership of the files that make up savepoints or externalized checkpoints after they are restored. Snapshots, which are either checkpoints or savepoints in this context, can be owned either by a user or by Flink itself. If a snapshot is owned by a user, Flink will not delete its files and will not depend on the existence of such files, since they might be deleted outside of Flink's control.
The restore modes are CLAIM, NO_CLAIM, and LEGACY (for backwards compatibility). You can pass the restore mode like this:
$ bin/flink run -s :savepointPath -restoreMode :mode -n [:runArgs] While each restore mode serves a specific purpose, we believe the default NO_CLAIM mode is a good tradeoff in most situations, as it provides clear ownership with a small price for the first checkpoint after the restore.
Let&rsquo;s dig further into each of the modes.
LEGACY mode # The legacy mode is how Flink dealt with snapshots until version 1.15. In this mode, Flink will never delete the initial checkpoint. Unfortunately, at the same time, it is not clear whether a user can ever safely delete it either. The problem here is that Flink might immediately build an incremental checkpoint on top of the restored one. Therefore, subsequent checkpoints depend on the restored checkpoint. Overall, the ownership is not well defined in this mode.
NO_CLAIM (default) mode # To fix the issue of files that no one can reliably claim ownership of, we introduced the NO_CLAIM mode as the new default. In this mode, Flink will not assume ownership of the snapshot and will leave the files in the user&rsquo;s control and never delete any of the files. You can start multiple jobs from the same snapshot in this mode.
In order to make sure Flink does not depend on any of the files from that snapshot, it will force the first (successful) checkpoint to be a full checkpoint as opposed to an incremental one. This only makes a difference for state.backend: rocksdb, because all other state backends always take full checkpoints.
Once the first full checkpoint completes, all subsequent checkpoints are taken as usual/configured. Consequently, once a checkpoint succeeds, you can manually delete the original snapshot. You cannot do this earlier, because without any completed checkpoints, Flink will - upon failure - try to recover from the initial snapshot.
CLAIM mode # If you do not want to sacrifice any performance while taking the first checkpoint, we suggest looking into the CLAIM mode. In this mode, Flink claims ownership of the snapshot and essentially treats it like a checkpoint: it controls the lifecycle and might delete it if it is not needed for recovery anymore. Hence, it is not safe to manually delete the snapshot or to start two jobs from the same snapshot. Flink keeps around a configured number of checkpoints.
Savepoint format # You can now trigger savepoints in the native format of state backends. This has been introduced to combine two characteristics, one from savepoints and one from checkpoints:
self-contained, relocatable, and owned by users (as savepoints are)
lightweight and thus fast to take and recover from (as checkpoints are)
In order to provide the two features in a single concept, we provided a way for Flink to create a savepoint in the (native) binary format of the used state backend. This brings a significant difference especially in combination with the state.backend: rocksdb setting and incremental snapshots.
That state backend can leverage RocksDB&rsquo;s native on-disk data structures, which are usually referred to as SST files. Incremental checkpoints leverage those files: they are collections of SST files plus some additional metadata, which can be quickly reloaded into the working directory of RocksDB upon restore.
Native savepoints can use the same mechanism of uploading the SST files instead of dumping the entire state into the canonical Flink format. There is one additional benefit over simply using externalized incremental checkpoints: native savepoints are still relocatable and self-contained in a single directory. For checkpoints this does not hold, because a single SST file can be used by multiple checkpoints and is therefore put into a common shared directory. That is why they are called incremental.
You can choose the savepoint format when triggering the savepoint like this:
# take an intermediate savepoint $ bin/flink savepoint --type [native/canonical] :jobId [:targetDirectory] # stop the job with a savepoint $ bin/flink stop --type [native/canonical] --savepointPath [:targetDirectory] :jobId Capabilities and limitations # Unfortunately it is not possible to provide the same guarantees for all types of snapshots (canonical or native savepoints and aligned or unaligned checkpoints). The main difference between checkpoints and savepoints is that savepoints are still triggered and owned by users. Flink does not create them automatically nor ever depends on their existence. Their main purpose is still for planned, manual backups, whereas checkpoints are used for recovery. In database terms, savepoints are similar to backups, whereas checkpoints are like recovery logs.
Having additional dimensions of properties in each of the two main snapshot categories does not make things easier, therefore we list what you can achieve with every type of snapshot.
The following table gives an overview of capabilities and limitations for the various types of savepoints and checkpoints.
✓ - Flink fully supports this type of snapshot
x - Flink doesn&rsquo;t support this type of snapshot
Operation | Canonical Savepoint | Native Savepoint | Aligned Checkpoint | Unaligned Checkpoint
State backend change | ✓ | x | x | x
State Processor API (writing) | ✓ | x | x | x
State Processor API (reading) | ✓ | ✓ | ✓ | x
Self-contained and relocatable | ✓ | ✓ | x | x
Schema evolution | ✓ | ✓ | ✓ | ✓
Arbitrary job upgrade | ✓ | ✓ | ✓ | x
Non-arbitrary job upgrade | ✓ | ✓ | ✓ | x
Flink minor version upgrade | ✓ | ✓ | ✓ | x
Flink bug/patch version upgrade | ✓ | ✓ | ✓ | ✓
Rescaling | ✓ | ✓ | ✓ | ✓
State backend change - you can restore from the snapshot with a different state.backend than the one for which the snapshot was taken.
State Processor API (writing) - The ability to create new snapshots via the State Processor API.
State Processor API (reading) - The ability to read state from an existing snapshot via the State Processor API.
Self-contained and relocatable - One snapshot directory contains everything it needs for recovery. You can move the directory around.
Schema evolution - Changing the data type of the state in your UDFs.
Arbitrary job upgrade - Restoring the snapshot with a different partitioning type (rescale, rebalance, map, etc.) or with a different record type for an existing operator. In other words, you can add arbitrary operators anywhere in your job graph.
Non-arbitrary job upgrade - In contrast to the above, you can still add new operators, but certain limitations apply: you cannot change the partitioning for existing operators or the data type of records being exchanged.
Flink minor version upgrade - Restoring a snapshot which was taken with an older minor version of Flink (1.x → 1.y).
Flink bug/patch version upgrade - Restoring a snapshot which was taken with an older patch version of Flink (1.14.x → 1.14.y).
Rescaling - Restoring the snapshot with a different parallelism than was used during the snapshot creation.
Summary # We hope the changes we introduced over the last releases make it easier to operate Flink with respect to snapshotting. We are eager to hear from you if any of the new features have helped you solve problems you&rsquo;ve faced in the past. At the same time, if you still struggle with an issue or had to work around some obstacles, please let us know! Maybe we will be able to incorporate your approach or find a different solution together.
`}),e.add({id:87,href:"/2022/05/05/announcing-the-release-of-apache-flink-1.15/",title:"Announcing the Release of Apache Flink 1.15",section:"Flink Blog",content:`Thanks to our well-organized and open community, Apache Flink continues to grow as a technology and remain one of the most active projects in the Apache community. With the release of Flink 1.15, we are proud to announce a number of exciting changes.
One of the main concepts that makes Apache Flink stand out is the unification of batch (aka bounded) and stream (aka unbounded) data processing, which helps reduce the complexity of development. A lot of effort went into this unification in the previous releases, and you can expect more efforts in this direction.
Apache Flink is not only growing when it comes to contributions and users, but also beyond its original use cases. We are seeing a trend towards more business/analytics use cases implemented in low-/no-code. Flink SQL is the feature in the Flink ecosystem that enables such use cases and this is why its popularity continues to grow.
Apache Flink is an essential building block in data pipelines/architectures and is used with many other technologies in order to drive all sorts of use cases. While new ideas/products may appear in this domain, existing technologies continue to establish themselves as standards for solving mission-critical problems. Knowing that we have such a wide reach and play a role in the success of many projects, it is important that the experience of integrating Apache Flink with the cloud infrastructures and existing systems is as seamless and easy as possible.
In the 1.15 release the Apache Flink community made significant progress across all these areas. Still, those are not the only things that made it into 1.15. The contributors improved the experience of operating Apache Flink by making it much easier and more transparent to handle checkpoints and savepoints and their ownership, by making auto-scaling more seamless and complete (removing side effects of use cases in which different data sources produce varying amounts of data), and by adding the ability to upgrade SQL jobs without losing their state. By continuing to support checkpoints after tasks finish and by adding window table-valued functions in batch mode, the experience of unified stream and batch processing was improved once more, making hybrid use cases much easier. In the SQL space, not only has the first step towards version upgrades been added, but also JSON functions that make it easier to import and export structured data in SQL. Both will allow users to better rely on Flink SQL for production use cases in the long term. To establish Apache Flink as part of the data processing ecosystem, we improved cloud interoperability and added more sink connectors and formats. And yes, we enabled a Scala-free runtime (the hype is real).
Operating Apache Flink with ease # Even Flink jobs that have been built and tuned by the best engineering teams still need to be operated, usually on a long-term basis. The many deployment patterns, APIs, tunable configs, and use cases covered by Apache Flink mean that operation support is vital and can be burdensome.
In this release, we listened to user feedback and made operating Flink much easier. Handling checkpoints and savepoints and their ownership is now more transparent, auto-scaling has become more seamless and complete (by removing side effects of use cases where different data sources produce varying amounts of data), and SQL jobs can now be upgraded without losing their state.
Clarification of checkpoint and savepoint semantics # Checkpoints and savepoints are essential cornerstones of Flink’s fault tolerance strategy (see the comparison). The purpose of savepoints has always been to put transitions, backups, and upgrades of Flink jobs in the control of users. Checkpoints, on the other hand, are intended to be fully controlled by Flink and guarantee fault tolerance through fast recovery, failover, etc. Both concepts are quite similar, and the underlying implementation also shares aspects of the same ideas.
However, both concepts grew apart by following specific feature requests and sometimes neglecting the overarching idea and strategy. Based on user feedback, it became apparent that the two concepts should be aligned and harmonized better and, above all, made clearer!
There have been situations in which users relied on checkpoints to stop/restart jobs when savepoints would have been the right way to go. It was also not clear that savepoints are slower since they don’t include some of the features that make taking checkpoints so fast. In some cases like resuming from a retained checkpoint - where the checkpoint is somehow considered as a savepoint - it is unclear to the user when they can actually clean it up.
With FLIP-193 (Snapshots ownership) the community aims to make ownership the only difference between savepoints and checkpoints. In the 1.15 release the community has fixed some of these shortcomings by supporting native and incremental savepoints. Savepoints have always used the canonical format, which made them slower, and writing a full savepoint naturally takes longer than writing one incrementally. With 1.15, if users take savepoints in the native format while using the RocksDB state backend, savepoints are automatically taken incrementally. The documentation has also been clarified to provide a better overview and understanding of the differences between checkpoints and savepoints. The semantics for resuming from a savepoint or retained checkpoint have also been clarified by introducing the CLAIM and NO_CLAIM modes. With the CLAIM mode Flink takes over ownership of an existing snapshot; with NO_CLAIM it creates its own copy and leaves the existing one up to the user. Please note that NO_CLAIM mode is the new default behavior. The old semantics of resuming from a savepoint or retained checkpoint are still accessible but have to be manually selected by choosing LEGACY mode.
Elastic scaling with reactive mode and the adaptive scheduler # Driven by the increasing number of cloud services built on top of Apache Flink, the project is becoming more and more cloud native which makes elastic scaling even more important.
This release improves metrics for the reactive mode, which is a job-scope mode where the JobManager will try to use all TaskManager resources available. To do this, we made all the metrics in the Job scope work correctly when reactive mode is enabled.
We also added an exception history for the adaptive scheduler, which is a new scheduler that first declares the required resources and waits for them before deciding on the parallelism with which to execute a job.
Furthermore, downscaling is sped up significantly. The TaskManager now has a dedicated shutdown code path, where it actively deregisters itself from the cluster instead of relying on heartbeats, giving the JobManager a clear signal for downscaling.
Adaptive batch scheduler # In 1.15, we introduced a new scheduler to Apache Flink: the Adaptive Batch Scheduler. The new scheduler can automatically decide parallelisms of job vertices for batch jobs, according to the volume of data each vertex needs to process.
Major benefits of this scheduler include:
Ease-of-use: Batch job users can be relieved from parallelism tuning.
Adaptive: Automatically tuned parallelisms can better fit consumed datasets whose volume varies from day to day.
Fine-grained: The parallelism of each job vertex is tuned individually. This allows vertices of SQL batch jobs to be automatically assigned different, appropriate parallelisms.
Watermark alignment across data sources # Having data sources that increase watermarks at different paces could lead to problems with downstream operators. For example, some operators might need to buffer excessive amounts of data which could lead to huge operator states. This is why we introduced watermark alignment in this release.
For sources based on the new source interface, watermark alignment can be activated. Users can define alignment groups to pause consuming from sources which are too far ahead of others. The ideal scenario for aligned watermarks is when there are two or more sources that produce watermarks at different speeds and when each source has the same parallelism as its number of splits/shards/partitions.
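As a short illustration, the Java sketch below attaches an alignment group to a watermark strategy; Event, env, and kafkaSource are hypothetical placeholders, and the withWatermarkAlignment arguments are the group name, the maximum allowed watermark drift, and the update interval:
WatermarkStrategy&lt;Event&gt; strategy = WatermarkStrategy
    .&lt;Event&gt;forBoundedOutOfOrderness(Duration.ofSeconds(5))
    // sources in &#34;alignment-group-1&#34; pause once they drift more than 20 seconds ahead,
    // re-checking the group watermark every second
    .withWatermarkAlignment(&#34;alignment-group-1&#34;, Duration.ofSeconds(20), Duration.ofSeconds(1));
DataStream&lt;Event&gt; events = env.fromSource(kafkaSource, strategy, &#34;aligned-events&#34;);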
SQL version upgrades # The execution plan of SQL queries and its resulting topology is based on optimization rules and a cost model. This means that even minimal changes could introduce a completely different topology. This dynamism makes guaranteeing snapshot compatibility very challenging across different Flink versions. As part of the 1.15 efforts, the community focused on keeping the same query (via the same topology) up and running even after upgrades.
At the core of SQL upgrades are JSON plans (please note that we only have documentation in our JavaDocs for now and are still working on updating the documentation), which are JSON representations of an optimized query execution plan. JSON plans have been introduced for internal use already in previous releases and will now be exposed externally. Both the Table API and SQL will provide a way to compile and execute a plan which guarantees the same topology throughout different versions. This feature will be released as an experimental MVP. Users who want to give it a try already can create a JSON plan that can then be used to restore a Flink job based on the old operator structure. The full feature can be expected in Flink 1.16.
Reliable upgrades make Flink SQL more dependable for production use cases in the long term.
Changelog state backend # In Flink 1.15, we introduced the MVP feature of the changelog state backend, which aims at making checkpoint intervals shorter and more predictable with the following advantages:
Shorter end-to-end latency: end-to-end latency mostly depends on the checkpointing mechanism, especially for transactional sinks. Transactional sinks commit on checkpoints, so faster checkpoints mean more frequent commits.
More predictable checkpoint intervals: currently, checkpointing time largely depends on the size of the artifacts that need to be persisted on the checkpoint storage. By keeping that size consistently small, checkpointing time becomes more predictable.
Less work on recovery: the more frequently checkpoints are taken, the less data needs to be re-processed after each recovery.
The changelog state backend helps achieve the above by continuously persisting state changes on non-volatile storage while performing state materialization in the background.
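Enabling the changelog state backend is a configuration change. A minimal sketch, assuming the state.backend.changelog.* options and a DFS-backed changelog storage (the path is a placeholder):
state.backend.changelog.enabled: true
state.backend.changelog.storage: filesystem
dstl.dfs.base-path: s3://my-bucket/changelog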
Repeatable cleanup # In previous releases of Flink, cleaning up job-related artifacts was attempted only once, which might have resulted in abandoned artifacts in case of an error. In this version, Flink tries to run the cleanup again to avoid leaving artifacts behind. By default, this retry mechanism runs until it succeeds. Users can change this behavior by configuring the repeatable cleanup options. Disabling the retry strategy will make Flink behave as in previous releases.
There is still work in progress around cleaning up checkpoints, which is covered by FLINK-26606.
OpenAPI # Flink now provides an experimental REST API specification following the OpenAPI standard. This allows the REST API to be used with standard tools that implement the OpenAPI standard. You can find the specification here.
Improvements to application mode # When running Flink in application mode, it can now be guaranteed that jobs will take a savepoint after they are completed if they have been configured to do so (see execution.shutdown-on-application-finish).
The recovery and cleanup of jobs running in application mode have also been improved. The local state can be persisted in the working directory, which makes recovering from local storage easier.
Unification of stream and batch processing - more progress # In the latest release, we picked up new efforts and continued previous ones towards the goal of unifying stream and batch processing.
Final checkpoints # In Flink 1.14, final checkpoints were added as a feature that had to be enabled manually. Since the last release, we listened to user feedback and decided to enable it by default. For more information, and how to disable this feature, please refer to the documentation. This change in configuration can prolong the shutdown sequence of bounded streaming jobs, because jobs have to wait for a final checkpoint before being allowed to finish.
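If the longer shutdown is a concern, the feature can be switched off again. A minimal sketch, assuming the configuration key the feature was introduced with in 1.14:
execution.checkpointing.checkpoints-after-tasks-finish.enabled: false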
Window table-valued functions # Window table-valued functions have only been available for unbounded data streams. With this release they will also be usable in BATCH mode. While working on this change, window table-valued functions have also been improved in general by implementing a dedicated operator which no longer requires those window functions to be used with aggregators.
Flink SQL # Community metrics indicate that Flink SQL is widely used and becomes more popular every day. The community made several improvements but we’d like to go into two in more detail.
CAST/Type system enhancements # Data appears in all sorts and shapes but is often not in the type that you need it to be, which is why casting is one of the most common operations in SQL. In Flink 1.15, the default behavior of a failing CAST has changed from returning a null to returning an error, which makes it more compliant with the SQL standard. The old casting behavior can still be used by calling the newly introduced TRY_CAST function or restored via a configuration flag.
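As a small illustration, here is a sketch in Java using the Table API; table.exec.legacy-cast-behaviour is the assumed name of that configuration flag:
TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());
// This now fails at runtime in 1.15 instead of silently returning NULL:
// tEnv.executeSql(&#34;SELECT CAST(&#39;not-a-number&#39; AS INT)&#34;).print();
// TRY_CAST keeps the lenient behavior and returns NULL on failure:
tEnv.executeSql(&#34;SELECT TRY_CAST(&#39;not-a-number&#39; AS INT)&#34;).print();
// Assumed flag name to restore the pre-1.15 casting behavior globally:
tEnv.getConfig().getConfiguration().setString(&#34;table.exec.legacy-cast-behaviour&#34;, &#34;ENABLED&#34;);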
In addition, many bugs have been fixed and improvements made to the casting functionality, to ensure correct results.
JSON functions # JSON is one of the most popular data formats and SQL users increasingly need to build and read these data structures. Multiple JSON functions have been added to Flink SQL according to the SQL 2016 standard. They allow users to inspect, create, and modify JSON strings using the Flink SQL dialect.
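As a quick taste, reusing a TableEnvironment tEnv as in the casting sketch above (a sketch only; the printed output is indicative):
// Build a JSON object and a JSON array directly in SQL:
tEnv.executeSql(
    &#34;SELECT JSON_OBJECT(&#39;name&#39; VALUE &#39;Flink&#39;, &#39;version&#39; VALUE 15), JSON_ARRAY(1, 2, 3)&#34;).print();
// prints a row roughly like: {&#34;name&#34;:&#34;Flink&#34;,&#34;version&#34;:15} | [1,2,3]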
Community enablement # Enabling people to build streaming data pipelines to solve their use cases is our goal. The community is well aware that a technology like Apache Flink is never used on its own and will always be part of a bigger architecture. Thus, it is important that Flink operates well in the cloud, connects seamlessly to other systems, and continues to support programming languages like Java and Python.
Cloud interoperability # There are users operating Flink deployments in cloud infrastructures from various cloud providers. There are also services that offer to manage Flink deployments for users on their platform.
In Flink 1.15, a recoverable writer for Google Cloud Storage has been added. We also organized the connectors in the Flink ecosystem and put some focus on connectors for the AWS ecosystem (i.e. KDS, Firehose).
The Elasticsearch sink # There was significant work on Flink’s overall connector ecosystem, but we want to highlight the Elasticsearch sink because it was implemented with the new connector interfaces, which offers asynchronous functionality coupled with end-to-end semantics. This sink will act as a template in the future.
A Scala-free Flink # A detailed blog post
already explains the ins and outs of why Scala users can now use the Flink Java API with any Scala version (including Scala 3).
In the end, removing Scala is just part of a larger effort of cleaning up and updating various technologies from the Flink ecosystem.
Starting in Flink 1.14, we removed the Mesos integration, isolated Akka, deprecated the DataSet Java API, and hid the Table API behind an abstraction. There’s already a lot of traction in the community towards these endeavors.
PyFlink # Before Flink 1.15, Python user-defined functions were executed in separate Python processes, which caused additional serialization/deserialization and communication overhead. In scenarios with large amounts of data, e.g. image processing, this overhead becomes non-negligible. Moreover, since it involves inter-process communication, the processing latency is also non-negligible, which is unacceptable in latency-critical scenarios such as quantitative trading. In Flink 1.15, we have introduced a new execution mode named &rsquo;thread&rsquo; mode, in which Python user-defined functions are executed in the JVM as a thread instead of in a separate Python process. Benchmarks have shown that throughput could be increased by 2x in common scenarios such as JSON processing. Processing latency also decreases from several seconds to microseconds. It should be noted that since this is still the first release of &rsquo;thread&rsquo; mode, it currently only supports Python ScalarFunction used in the Python Table API &amp; SQL. We&rsquo;re planning to extend it to other areas in which Python user-defined functions can be used in the next releases.
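Switching modes is a single configuration change; a minimal sketch, assuming the python.execution-mode option (process mode remains the default):
python.execution-mode: thread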
Other # Further work has been done on the connector testing framework. If you want to contribute a connector or improve on one, you should definitely have a look.
Some long-awaited features have been added, including the CSV format and the small file compaction in the unified sink interface.
The sink API has been upgraded to version 2 and we encourage every connector maintainer to upgrade to this version.
Summary # Apache Flink is now easier to operate, made even more progress towards aligning stream and batch processing, became more accessible through improvements in the SQL components, and now integrates better with other technologies.
It is also worth mentioning that the community has set up a new home for the CDC connectors, that the connector repository will be externalized (with the Elasticsearch sink as a first example), and that there is now a Kubernetes operator (see the announcement blog post) maintained by the community.
Moving forward, the community will continue to focus on making Apache Flink a true unified stream and batch processor and work on better integrating Flink into the cloud-native ecosystem.
Upgrade Notes # While we aim to make upgrades as smooth as possible, some of the changes require users to adjust some parts of the program when upgrading to Apache Flink 1.15. Please take a look at the release notes for a list of applicable adjustments and issues during upgrades. The one big thing worth mentioning when upgrading is that the updated dependencies no longer carry a Scala version suffix. Get the details here.
List of Contributors # The Apache Flink community would like to thank each and every one of the contributors that have made this release possible:
Ada Wong, Ahmed Hamdy, Aitozi, Alexander Fedulov, Alexander Preuß, Alexander Trushev, Ali Bahadir Zeybek, Anton Kalashnikov, Arvid Heise, Bernard Joseph Jean Bruno, Bo Cui, Brian Zhou, Camile, ChangLi, Chengkai Yang, Chesnay Schepler, Daisy T, Danny Cranmer, David Anderson, David Moravek, David N Perkins, Dawid Wysakowicz, Denis-Cosmin Nutiu, Dian Fu, Dong Lin, Eelis Kostiainen, Etienne Chauchot, Fabian Paul, Francesco Guardiani, Gabor Somogyi, Galen Warren, Gao Yun, Gen Luo, GitHub, Gyula Fora, Hang Ruan, Hangxiang Yu, Honnix, Horace Lee, Ingo Bürk, JIN FENG, Jack, Jane Chan, Jark Wu, JianZhangYang, Jiangjie (Becket) Qin, JianzhangYang, Jiayi Liao, Jing, Jing Ge, Jing Zhang, Jingsong Lee, JingsongLi, Jinzhong Li, Joao Boto, Joey Lee, John Karp, Jon Gillham, Jun Qin, Junfan Zhang, Juntao Hu, Kexin, Kexin Hui, Kirill Listopad, Konstantin Knauf, LB-Yu, Leonard Xu, Lijie Wang, Liu Jiangang, Maciej Bryński, Marios Trivyzas, MartijnVisser, Mason Chen, Matthias Pohl, Michal Ciesielczyk, Mika, Mika Naylor, Mrart, Mulavar, Nick Burkard, Nico Kruber, Nicolas Raga, Nicolaus Weidner, Niklas Semmler, Nikolay, Nuno Afonso, Oleg Smirnov, Paul Lin, Paul Zhang, PengFei Li, Piotr Nowojski, Px, Qingsheng Ren, Robert Metzger, Roc Marshal, Roman, Roman Khachatryan, Ruanshubin, Rudi Kershaw, Rui Li, Ryan Scudellari, Ryan Skraba, Sebastian Mattheis, Sergey, Sergey Nuyanzin, Shen Zhu, Shengkai, Shuo Cheng, Sike Bai, SteNicholas, Steffen Hausmann, Stephan Ewen, Tartarus0zm, Thesharing, Thomas Weise, Till Rohrmann, Timo Walther, Tony Wei, Victor Xu, Wenhao Ji, X-czh, Xianxun Ye, Xin Yu, Xinbin Huang, Xintong Song, Xuannan, Yang Wang, Yangze Guo, Yao Zhang, Yi Tang, Yibo Wen, Yuan Mei, Yuanhao Tian, Yubin Li, Yuepeng Pan, Yufan Sheng, Yufei Zhang, Yuhao Bi, Yun Gao, Yun Tang, Yuval Itzchakov, Yuxin Tan, Zakelly, Zhu Zhu, Zichen Liu, Zongwen Li, atptour2017, baisike, bgeng777, camilesing, chenxyz707, chenzihao, chuixue, dengziming, dijkwxyz, fanrui, fengli, fenyi, fornaix, gaurav726, godfrey he, godfreyhe, gongzhongqiang, haochenhao, hapihu, hehuiyuan, hongshuboy, huangxingbo, huweihua, iyupeng, jiaoqingbo, jinfeng, jxjgsylsg, kevin.cyj, kylewang, lbb, liliwei, liming.1018, lincoln lee, liufangqi, liujiangang, liushouwei, liuyongvs, lixiaobao14, lmagic233, lovewin99, lujiefsi, luoyuxia, lz, mans2singh, martijnvisser, mayue.fight, nanmu42, oogetyboogety, paul8263, pusheng.li01, qianchutao, realdengziqi, ruanhang1993, sammieliu, shammon, shihong90, shitou, shouweikun, shouzuo1, shuo.cs, siavash119, simenliuxing, sjwiesman, slankka, slinkydeveloper, snailHumming, snuyanzin, sujun, sujun1, syhily, tsreaper, txdong-sz, unknown, vahmed-hamdy, wangfeifan, wangpengcheng, wangyang0918, wangzhiwu, wangzhuo, wgzhao, wsz94, xiangqiao123, xmarker, xuyang, xuyu, xuzifu666, yangjunhan, yangze.gyz, ysymi, yuxia Luo, zhang chaoming, zhangchaoming, zhangjiaogg, zhangjingcun, zhangjun02, zhangmang, zlzhang0122, zoucao, zp, zzccctv, 周平, 子扬, 李锐, 蒋龙, 龙三, 庄天翼
`}),e.add({id:88,href:"/2022/04/03/apache-flink-kubernetes-operator-0.1.0-release-announcement/",title:"Apache Flink Kubernetes Operator 0.1.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce the preview release of the Apache Flink Kubernetes Operator (0.1.0)
The Flink Kubernetes Operator allows users to easily manage their Flink deployment lifecycle using native Kubernetes tooling.
The operator takes care of submitting, savepointing, upgrading and generally managing Flink jobs using the built-in Flink Kubernetes integration. This way users do not have to use the Flink clients (e.g. the CLI) or interact with the Flink jobs manually; they only have to declare the desired deployment specification and the operator will take care of the rest. It also makes it easier to integrate Flink job management with CI/CD tooling.
Core Features
Deploy and monitor Flink Application and Session deployments
Upgrade, suspend and delete Flink deployments
Full logging and metrics integration
Getting started # For a detailed getting started guide please check the documentation site.
FlinkDeployment CR overview # When using the operator, users create FlinkDeployment objects to describe their Flink application and session clusters deployments.
A minimal application deployment yaml would look like this:
apiVersion: flink.apache.org/v1alpha1
kind: FlinkDeployment
metadata:
  namespace: default
  name: basic-example
spec:
  image: flink:1.14
  flinkVersion: v1_14
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: &#34;2&#34;
  serviceAccount: flink
  jobManager:
    replicas: 1
    resource:
      memory: &#34;2048m&#34;
      cpu: 1
  taskManager:
    resource:
      memory: &#34;2048m&#34;
      cpu: 1
  job:
    jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar
    parallelism: 2
    upgradeMode: stateless
Once applied to the cluster using kubectl apply -f your-deployment.yaml the operator will spin up the application cluster for you. If you would like to upgrade or make changes to your application, you can simply modify the yaml and submit it again; the operator will execute the necessary steps (savepoint, shutdown, redeploy etc.) to upgrade your application.
To stop and delete your application cluster you can simply call kubectl delete -f your-deployment.yaml.
You can read more about the job management features on the documentation site.
What&rsquo;s Next? # The community is currently working on hardening the core operator logic, stabilizing the APIs and adding the remaining bits for making the Flink Kubernetes Operator production ready.
In the upcoming 1.0.0 release you can expect (at least) the following additional features:
Support for Session Job deployments
Job upgrade rollback strategies
Pluggable validation logic
Operator deployment customization
Improvements based on feedback from the preview release
In the medium term you can also expect:
Support for standalone / reactive deployment modes
Support for other job types such as SQL or Python
Please give the preview release a try, share your feedback on the Flink mailing list and contribute to the project!
Release Resources # The source artifacts and helm chart are now available on the updated Downloads page of the Flink website.
The official 0.1.0 release archive doubles as a Helm repository that you can easily register locally:
$ helm repo add flink-kubernetes-operator-0.1.0 https://archive.apache.org/dist/flink/flink-kubernetes-operator-0.1.0/ $ helm install flink-kubernetes-operator flink-kubernetes-operator-0.1.0/flink-kubernetes-operator --set webhook.create=false You can also find official Kubernetes Operator Docker images of the new version on Dockerhub.
For more details, check the updated documentation and the release notes. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # The Apache Flink community would like to thank each and every one of the contributors that have made this release possible:
Aitozi, Biao Geng, Gyula Fora, Hao Xin, Jaegu Kim, Jaganathan Asokan, Junfan Zhang, Marton Balassi, Matyas Orhidi, Nicholas Jiang, Sandor Kelemen, Thomas Weise, Yang Wang, 愚鲤
`}),e.add({id:89,href:"/2022/03/16/the-generic-asynchronous-base-sink/",title:"The Generic Asynchronous Base Sink",section:"Flink Blog",content:`Flink sinks share a lot of similar behavior. Most sinks batch records according to user-defined buffering hints, sign requests, write them to the destination, retry unsuccessful or throttled requests, and participate in checkpointing.
This is why for Flink 1.15 we have decided to create the AsyncSinkBase (FLIP-171), an abstract sink with a number of common functionalities extracted.
This is a base implementation for asynchronous sinks, which you should use whenever you need to implement a sink that doesn&rsquo;t offer transactional capabilities. Adding support for a new destination now only requires a lightweight shim that implements the specific interfaces of the destination using a client that supports async requests.
This common abstraction will reduce the effort required to maintain individual sinks that extend from this abstract sink, with bug fixes and improvements to the sink core benefiting all implementations that extend it. The design of AsyncSinkBase focuses on extensibility and a broad support of destinations. The core of the sink is kept generic and free of any connector-specific dependencies.
The sink base is designed to participate in checkpointing to provide at-least-once semantics and can work directly with destinations that provide a client that supports asynchronous requests.
In this post, we will go over the details of the AsyncSinkBase so that you can start using it to build your own concrete sink.
Adding the base sink as a dependency # In order to use the base sink, you will need to add the following dependency to your project. The example below follows the Maven syntax:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-connector-base&lt;/artifactId&gt; &lt;version&gt;\${flink.version}&lt;/version&gt; &lt;/dependency&gt; The Public Interfaces of AsyncSinkBase # Generic Types # &lt;InputT&gt; – type of elements in a DataStream that should be passed to the sink
&lt;RequestEntryT&gt; – type of a payload containing the element and additional metadata that is required to submit a single element to the destination
Element Converter Interface # ElementConverter
public interface ElementConverter&lt;InputT, RequestEntryT&gt; extends Serializable { RequestEntryT apply(InputT element, SinkWriter.Context context); } The concrete sink implementation should provide a way to convert from an element in the DataStream to the payload type that contains all the additional metadata required to submit that element to the destination by the sink. Ideally, this would be encapsulated from the end user since it allows concrete sink implementers to adapt to changes in the destination API without breaking end user code.
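As a tiny sketch, a converter from String elements to a request-entry type might look like this; Record and its constructor are hypothetical placeholders for whatever payload type your destination needs:
ElementConverter&lt;String, Record&gt; converter =
    // wrap each String element as a destination-specific request entry
    (element, context) -&gt; new Record(element.getBytes(java.nio.charset.StandardCharsets.UTF_8));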
Sink Writer Interface # AsyncSinkWriter
There is a buffer in the sink writer that holds the request entries that have been sent to the sink but not yet written to the destination. An element of the buffer is a RequestEntryWrapper&lt;RequestEntryT&gt; consisting of the RequestEntryT along with the size of that record.
public abstract class AsyncSinkWriter&lt;InputT, RequestEntryT extends Serializable&gt; implements StatefulSink.StatefulSinkWriter&lt;InputT, BufferedRequestState&lt;RequestEntryT&gt;&gt; { // ... protected abstract void submitRequestEntries( List&lt;RequestEntryT&gt; requestEntries, Consumer&lt;List&lt;RequestEntryT&gt;&gt; requestResult); // ... } We will submit the requestEntries asynchronously to the destination from here. Sink implementers should use the client libraries of the destination they intend to write to, to perform this.
Should any elements fail to be persisted, they will be requeued in the buffer for retry using requestResult.accept(...list of failed entries...). However, retrying any element that is known to be faulty and consistently failing will result in that element being requeued forever, so a sensible strategy for determining what should be retried is highly recommended. If no errors were returned, we must indicate this with requestResult.accept(Collections.emptyList()).
If at any point, it is determined that a fatal error has occurred and that we should throw a runtime exception from the sink, we can call getFatalExceptionCons().accept(...); from anywhere in the concrete sink writer.
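Putting these pieces together, a minimal sketch of a concrete implementation could look like the following; myAsyncClient, its putRecords call, the shape of its response, and the Record entry type are hypothetical placeholders for whatever destination client you use:
@Override
protected void submitRequestEntries(
        List&lt;Record&gt; requestEntries, Consumer&lt;List&lt;Record&gt;&gt; requestResult) {
    myAsyncClient.putRecords(requestEntries).whenComplete((response, error) -&gt; {
        if (error != null) {
            // unrecoverable failure: surface it through the fatal exception consumer
            getFatalExceptionCons().accept(new RuntimeException(error));
        } else if (!response.failedRecords().isEmpty()) {
            // requeue only the rejected entries so they are retried later
            requestResult.accept(response.failedRecords());
        } else {
            // everything was persisted; signal success with an empty list
            requestResult.accept(java.util.Collections.emptyList());
        }
    });
}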
public abstract class AsyncSinkWriter&lt;InputT, RequestEntryT extends Serializable&gt; implements StatefulSink.StatefulSinkWriter&lt;InputT, BufferedRequestState&lt;RequestEntryT&gt;&gt; { // ... protected abstract long getSizeInBytes(RequestEntryT requestEntry); // ... } The async sink has a concept of size of elements in the buffer. This allows users to specify a byte size threshold beyond which elements will be flushed. However the sink implementer is best positioned to determine what the most sensible measure of size for each RequestEntryT is. If there is no way to determine the size of a record, then the value 0 may be returned, and the sink will not flush based on record size triggers.
public abstract class AsyncSinkWriter&lt;InputT, RequestEntryT extends Serializable&gt; implements StatefulSink.StatefulSinkWriter&lt;InputT, BufferedRequestState&lt;RequestEntryT&gt;&gt; { // ... public AsyncSinkWriter( ElementConverter&lt;InputT, RequestEntryT&gt; elementConverter, Sink.InitContext context, int maxBatchSize, int maxInFlightRequests, int maxBufferedRequests, long maxBatchSizeInBytes, long maxTimeInBufferMS, long maxRecordSizeInBytes) { /* ... */ } // ... } By default, the method snapshotState returns all the elements in the buffer to be saved for snapshots. Any elements that were previously removed from the buffer are guaranteed to be persisted in the destination by a preceding call to AsyncWriter#flush(true). You may want to save additional state from the concrete sink. You can achieve this by overriding snapshotState, and restoring from the saved state in the constructor. You will receive the saved state by overriding restoreWriter in your concrete sink. In this method, you should construct a sink writer, passing in the recovered state.
class MySinkWriter&lt;InputT&gt; extends AsyncSinkWriter&lt;InputT, RequestEntryT&gt; { MySinkWriter( // ... Collection&lt;BufferedRequestState&lt;Record&gt;&gt; initialStates) { super( // ... initialStates); // restore concrete sink state from initialStates } @Override public List&lt;BufferedRequestState&lt;RequestEntryT&gt;&gt; snapshotState(long checkpointId) { super.snapshotState(checkpointId); // ... } } Sink Interface # AsyncSinkBase
class MySink&lt;InputT&gt; extends AsyncSinkBase&lt;InputT, RequestEntryT&gt; { // ... @Override public StatefulSinkWriter&lt;InputT, BufferedRequestState&lt;RequestEntryT&gt;&gt; createWriter(InitContext context) { return new MySinkWriter(context); } // ... } AsyncSinkBase implementations return their own extension of the AsyncSinkWriter from createWriter().
At the time of writing, the Kinesis Data Streams sink and Kinesis Data Firehose sink are using this base sink.
Metrics # There are three metrics that exist automatically when you implement sinks on top of the base (and, thus, should not be implemented by yourself).
CurrentSendTime Gauge - returns the amount of time in milliseconds it took for the most recent request to write records to complete, whether successful or not.
NumBytesOut Counter - counts the total number of bytes the sink has tried to write to the destination, using the method getSizeInBytes to determine the size of each record. This will double count failures that may need to be retried.
NumRecordsOut Counter - similar to above, this counts the total number of records the sink has tried to write to the destination. This will double count failures that may need to be retried.
Sink Behavior # There are six sink configuration settings that control the buffering, flushing, and retry behavior of the sink.
int maxBatchSize - maximum number of elements that may be passed in the list to submitRequestEntries to be written downstream.
int maxInFlightRequests - maximum number of uncompleted calls to submitRequestEntries that the SinkWriter will allow at any given point. Once this point has been reached, writes and callbacks to add elements to the buffer may block until one or more requests to submitRequestEntries completes.
int maxBufferedRequests - maximum buffer length. Callbacks to add elements to the buffer and calls to write will block if this length has been reached and will only unblock once elements from the buffer have been removed for flushing.
long maxBatchSizeInBytes - a flush will be attempted if the most recent call to write introduces an element to the buffer such that the total size of the buffer is greater than or equal to this threshold value.
long maxTimeInBufferMS - maximum amount of time an element may remain in the buffer. In most cases elements are flushed as a result of the batch size (in bytes or number) being reached or during a snapshot. However, there are scenarios where an element may remain in the buffer for a long time or even forever. To mitigate this, a timer is constantly active in the buffer: while the buffer is not empty, it will flush every maxTimeInBufferMS milliseconds.
long maxRecordSizeInBytes - maximum size in bytes allowed for a single record, as determined by getSizeInBytes().
Destinations typically have a defined throughput limit and will begin throttling or rejecting requests once that limit is approached. We employ Additive Increase Multiplicative Decrease (AIMD) as a strategy for selecting the optimal batch size.
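To make the mapping concrete, here is a sketch of how a concrete writer might pass these settings to the AsyncSinkWriter constructor shown earlier; the numbers are purely illustrative, not recommendations:
super(
    elementConverter,
    context,
    500,              // maxBatchSize: flush once 500 entries have been collected for a batch
    16,               // maxInFlightRequests: at most 16 concurrent submitRequestEntries calls
    10000,            // maxBufferedRequests: writes block once 10000 entries are buffered
    4 * 1024 * 1024,  // maxBatchSizeInBytes: flush once the buffer reaches 4 MiB
    5000,             // maxTimeInBufferMS: flush at the latest every 5 seconds
    1024 * 1024);     // maxRecordSizeInBytes: largest single record the sink accepts (1 MiB)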
Summary # The AsyncSinkBase is a new abstraction that makes creating and maintaining async sinks easier. This will be available in Flink 1.15 and we hope that you will try it out and give us feedback on it.
`}),e.add({id:90,href:"/2022/03/11/apache-flink-1.14.4-release-announcement/",title:"Apache Flink 1.14.4 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce another bug fix release for Flink 1.14.
This release includes 51 bug and vulnerability fixes and minor improvements for Flink 1.14. Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.
We highly recommend all users to upgrade to Flink 1.14.4.
Release Artifacts # Maven Dependencies # &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.14.4&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.14.4&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.14.4&lt;/version&gt; &lt;/dependency&gt; Binaries # You can find the binaries on the updated Downloads page.
Docker Images # library/flink (official images) apache/flink (ASF repository) PyPi # apache-flink==1.14.4 Release Notes # Sub-task [FLINK-21788] - Throw PartitionNotFoundException if the partition file has been lost for blocking shuffle [FLINK-24954] - Reset read buffer request timeout on buffer recycling for sort-shuffle [FLINK-25653] - Move buffer recycle in SortMergeSubpartitionReader out of lock to avoid deadlock [FLINK-25654] - Remove the redundant lock in SortMergeResultPartition [FLINK-25879] - Track used search terms in Matomo [FLINK-25880] - Implement Matomo in Flink documentation Bug [FLINK-21752] - NullPointerException on restore in PojoSerializer [FLINK-23946] - Application mode fails fatally when being shut down [FLINK-24334] - Configuration kubernetes.flink.log.dir not working [FLINK-24407] - Pulsar connector chinese document link to Pulsar document location incorrectly. [FLINK-24607] - SourceCoordinator may miss to close SplitEnumerator when failover frequently [FLINK-25171] - When the DDL statement was executed, the column names of the Derived Columns were not validated [FLINK-25199] - StreamEdges are not unique in self-union, which blocks propagation of watermarks [FLINK-25362] - Incorrect dependencies in Table Confluent/Avro docs [FLINK-25407] - Network stack deadlock when cancellation happens during initialisation [FLINK-25466] - TTL configuration could parse in StateTtlConfig#DISABLED [FLINK-25486] - Perjob can not recover from checkpoint when zookeeper leader changes [FLINK-25494] - Duplicate element serializer during DefaultOperatorStateBackendSnapshotStrategy#syncPrepareResources [FLINK-25678] - TaskExecutorStateChangelogStoragesManager.shutdown is not thread-safe [FLINK-25683] - wrong result if table transfrom to DataStream then window process in batch mode [FLINK-25728] - Potential memory leaks in StreamMultipleInputProcessor [FLINK-25732] - Dispatcher#requestMultipleJobDetails returns non-serialiable collection [FLINK-25827] - Potential memory leaks in SourceOperator [FLINK-25856] - Fix use of UserDefinedType in from_elements [FLINK-25883] - The value of DEFAULT_BUNDLE_PROCESSOR_CACHE_SHUTDOWN_THRESHOLD_S is too large [FLINK-25893] - ResourceManagerServiceImpl&#39;s lifecycle can lead to exceptions [FLINK-25952] - Savepoint on S3 are not relocatable even if entropy injection is not enabled [FLINK-26039] - Incorrect value getter in map unnest table function [FLINK-26159] - Pulsar Connector: should add description MAX_FETCH_RECORD in doc to explain slow consumption [FLINK-26160] - Pulsar Connector: stopCursor description should be changed. Connector only stop when auto discovery is disabled. [FLINK-26187] - Chinese docs override english aliases [FLINK-26304] - GlobalCommitter can receive failed committables New Feature [FLINK-20188] - Add Documentation for new File Source [FLINK-21407] - Clarify which sources and APIs support which formats Improvement [FLINK-20830] - Add a type of HEADLESS_CLUSTER_IP for rest service type [FLINK-24880] - Error messages &quot;OverflowError: timeout value is too large&quot; shown when executing PyFlink jobs [FLINK-25160] - Make doc clear: tolerable-failed-checkpoints counts consecutive failures [FLINK-25611] - Remove CoordinatorExecutorThreadFactory thread creation guards [FLINK-25650] - Document unaligned checkpoints performance limitations (larger records/flat map/timers/...) 
[FLINK-25767] - Translation of page &#39;Working with State&#39; is incomplete [FLINK-25818] - Add explanation how Kafka Source deals with idleness when parallelism is higher then the number of partitions Technical Debt [FLINK-25576] - Update com.h2database:h2 to 2.0.206 [FLINK-25785] - Update com.h2database:h2 to 2.0.210 `}),e.add({id:91,href:"/2022/02/22/scala-free-in-one-fifteen/",title:"Scala Free in One Fifteen",section:"Flink Blog",content:`Flink 1.15 is right around the corner, and among the many improvements is a Scala free classpath. Users can now leverage the Java API from any Scala version, including Scala 3!
Fig.1 Flink 1.15 Scala 3 Example This blog will discuss what has historically made supporting multiple Scala versions so complex, how we achieved this milestone, and the future of Scala in Apache Flink.
TLDR: All Scala dependencies are now isolated to the flink-scala jar. To remove Scala from the user-code classpath, remove this jar from the lib directory of the Flink distribution. $ rm flink-dist/lib/flink-scala* The Classpath and Scala # If you have worked with a JVM-based application, you have probably heard the term classpath. The classpath defines where the JVM will search for a given classfile when it needs to be loaded. There may only be one instance of a classfile on each classpath, forcing any dependency Flink exposes onto users. That is why the Flink community works hard to keep our classpath &ldquo;clean&rdquo; - or free of unnecessary dependencies. We achieve this through a combination of shaded dependencies, child first class loading, and a plugins abstraction for optional components.
The Apache Flink runtime is primarily written in Java but contains critical components that forced Scala on the default classpath. And because Scala does not maintain binary compatibility across minor releases, this historically required cross-building components for all versions of Scala. But due to many reasons - breaking changes in the compiler, a new standard library, and a reworked macro system - this was easier said than done.
Hiding Scala # As mentioned above, Flink uses Scala in a few key components: the Mesos integration, the serialization stack, RPC, and the table planner. Instead of removing these dependencies or finding ways to cross-build them, the community hid Scala. It still exists in the codebase but no longer leaks into the user code classloader.
In 1.14, we took our first steps in hiding Scala from our users. We dropped the support for Apache Mesos, partially implemented in Scala, which Kubernetes very much eclipsed in terms of adoption. Next, we isolated our RPC system into a dedicated classloader, including Akka. With these changes, the runtime itself no longer relied on Scala (hence why flink-runtime lost its Scala suffix), but Scala was still ever-present in the API layer.
These changes, and the ease with which we implemented them, started to make people wonder what else might be possible. After all, we isolated Akka in less than a month, a task stuck in the backlog for years, thought to be too time-consuming.
The next logical step was to decouple the DataStream / DataSet Java APIs from Scala. This primarily entailed a few cleanups of some test classes, but also identifying code paths that are only relevant for the Scala API. These paths were then migrated into the Scala API modules and are only used if required.
For example, the Kryo serializer, which we always extended to support certain Scala types, now only includes them if an application uses the Scala APIs.
Finally, it was time to tackle the Table API, specifically the table planner, which contains 378,655 lines of Scala code at the time of writing. The table planner provides parsing, planning, and optimization of SQL and Table API queries into highly optimized Java code. It is the most extensive Scala codebase in Flink and it cannot be ported easily to Java. Using what we learned from building dedicated classloaders for the RPC stack and conditional classloading for the serializers, we hid the planner behind an abstraction that does not expose any of its internals, including Scala.
The Future of Scala in Apache Flink # While most of these changes happened behind the scenes, they resulted in one very user-facing change: removing many scala suffixes. You can find a list of all dependencies that lost their Scala suffix at the end of this post12.
Additionally, changes to the Table API required several changes to the packaging and the distribution, which some power users relying on the planner internals might need to adapt to3.
Going forward, Flink will continue to support Scala packages for the DataStream and Table APIs compiled against Scala 2.12, while the Java API is now unlocked for users to leverage components from any Scala version. We are already seeing new Scala 3 wrappers pop up in the community and are excited to see how users leverage these tools in their streaming pipelines456!
flink-cep, flink-clients, flink-connector-elasticsearch-base, flink-connector-elasticsearch6, flink-connector-elasticsearch7, flink-connector-gcp-pubsub, flink-connector-hbase-1.4, flink-connector-hbase-2.2, flink-connector-hbase-base, flink-connector-jdbc, flink-connector-kafka, flink-connector-kinesis, flink-connector-nifi, flink-connector-pulsar, flink-connector-rabbitmq, flink-connector-testing, flink-connector-twitter, flink-connector-wikiedits, flink-container, flink-dstl-dfs, flink-gelly, flink-hadoop-bulk, flink-kubernetes, flink-runtime-web, flink-sql-connector-elasticsearch6, flink-sql-connector-elasticsearch7, flink-sql-connector-hbase-1.4, flink-sql-connector-hbase-2.2, flink-sql-connector-kafka, flink-sql-connector-kinesis, flink-sql-connector-rabbitmq, flink-state-processor-api, flink-statebackend-rocksdb, flink-streaming-java, flink-table-api-java-bridge, flink-test-utils, flink-yarn, flink-table-runtime, flink-table-api-java-bridge&#160;&#x21a9;&#xfe0e;
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/configuration/overview/#which-dependencies-do-you-need&#160;&#x21a9;&#xfe0e;
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/configuration/advanced/#anatomy-of-table-dependencies&#160;&#x21a9;&#xfe0e;
https://github.com/ariskk/flink4s&#160;&#x21a9;&#xfe0e;
https://github.com/findify/flink-adt&#160;&#x21a9;&#xfe0e;
https://github.com/sjwiesman/flink-scala-3&#160;&#x21a9;&#xfe0e;
`}),e.add({id:92,href:"/2022/02/18/apache-flink-1.13.6-release-announcement/",title:"Apache Flink 1.13.6 Release Announcement",section:"Flink Blog",content:`The Apache Flink Community is pleased to announce another bug fix release for Flink 1.13.
This release includes 99 bug and vulnerability fixes and minor improvements for Flink 1.13 including another upgrade of Apache Log4j (to 2.17.1). Below you will find a list of all bugfixes and improvements (excluding improvements to the build infrastructure and build stability). For a complete list of all changes see: JIRA.
We highly recommend all users to upgrade to Flink 1.13.6.
Release Artifacts # Maven Dependencies # &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.13.6&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.13.6&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.13.6&lt;/version&gt; &lt;/dependency&gt; Binaries # You can find the binaries on the updated Downloads page.
Docker Images # library/flink (official images) apache/flink (ASF repository) PyPi # apache-flink==1.13.6 Release Notes # Bug [FLINK-15987] - SELECT 1.0e0 / 0.0e0 throws NumberFormatException [FLINK-17914] - HistoryServer deletes cached archives if archive listing fails [FLINK-20195] - Jobs endpoint returns duplicated jobs [FLINK-20370] - Result is wrong when sink primary key is not the same with query [FLINK-21289] - Application mode ignores the pipeline.classpaths configuration [FLINK-23919] - PullUpWindowTableFunctionIntoWindowAggregateRule generates invalid Calc for Window TVF [FLINK-24232] - Archiving of suspended jobs prevents breaks subsequent archive attempts [FLINK-24255] - Test Environment / Mini Cluster do not forward configuration. [FLINK-24310] - A bug in the BufferingSink example in the doc [FLINK-24318] - Casting a number to boolean has different results between &#39;select&#39; fields and &#39;where&#39; condition [FLINK-24334] - Configuration kubernetes.flink.log.dir not working [FLINK-24366] - Unnecessary/misleading error message about failing restores when tasks are already canceled. [FLINK-24401] - TM cannot exit after Metaspace OOM [FLINK-24465] - Wrong javadoc and documentation for buffer timeout [FLINK-24492] - incorrect implicit type conversion between numeric and (var)char [FLINK-24506] - checkpoint directory is not configurable through the Flink configuration passed into the StreamExecutionEnvironment [FLINK-24509] - FlinkKafkaProducer example is not compiling due to incorrect constructer signature used [FLINK-24540] - Fix Resource leak due to Files.list [FLINK-24543] - Zookeeper connection issue causes inconsistent state in Flink [FLINK-24563] - Comparing timstamp_ltz with random string throws NullPointerException [FLINK-24597] - RocksdbStateBackend getKeysAndNamespaces would return duplicate data when using MapState [FLINK-24621] - JobManager fails to recover 1.13.1 checkpoint due to InflightDataRescalingDescriptor [FLINK-24662] - PyFlink sphinx check failed with &quot;node class &#39;meta&#39; is already registered, its visitors will be overridden&quot; [FLINK-24667] - Channel state writer would fail the task directly if meeting exception previously [FLINK-24676] - Schema does not match if explain insert statement with partial column [FLINK-24678] - Correct the metric name of map state contains latency [FLINK-24708] - \`ConvertToNotInOrInRule\` has a bug which leads to wrong result [FLINK-24728] - Batch SQL file sink forgets to close the output stream [FLINK-24761] - Fix PartitionPruner code gen compile fail [FLINK-24846] - AsyncWaitOperator fails during stop-with-savepoint [FLINK-24860] - Fix the wrong position mappings in the Python UDTF [FLINK-24885] - ProcessElement Interface parameter Collector : java.lang.NullPointerException [FLINK-24922] - Fix spelling errors in the word &quot;parallism&quot; [FLINK-25022] - ClassLoader leak with ThreadLocals on the JM when submitting a job through the REST API [FLINK-25067] - Correct the description of RocksDB&#39;s background threads [FLINK-25084] - Field names must be unique. 
Found duplicates [FLINK-25091] - Official website document FileSink orc compression attribute reference error [FLINK-25096] - Issue in exceptions API(/jobs/:jobid/exceptions) in flink 1.13.2 [FLINK-25199] - StreamEdges are not unique in self-union, which blocks propagation of watermarks [FLINK-25362] - Incorrect dependencies in Table Confluent/Avro docs [FLINK-25468] - Local recovery fails if local state storage and RocksDB working directory are not on the same volume [FLINK-25486] - Perjob can not recover from checkpoint when zookeeper leader changes [FLINK-25494] - Duplicate element serializer during DefaultOperatorStateBackendSnapshotStrategy#syncPrepareResources [FLINK-25513] - CoFlatMapFunction requires both two flat_maps to yield something [FLINK-25559] - SQL JOIN causes data loss [FLINK-25683] - wrong result if table transfrom to DataStream then window process in batch mode [FLINK-25728] - Potential memory leaks in StreamMultipleInputProcessor [FLINK-25732] - Dispatcher#requestMultipleJobDetails returns non-serialiable collection Improvement [FLINK-21407] - Clarify which sources and APIs support which formats [FLINK-20443] - ContinuousProcessingTimeTrigger doesn&#39;t fire at the end of the window [FLINK-21467] - Document possible recommended usage of Bounded{One/Multi}Input.endInput and emphasize that they could be called multiple times [FLINK-23842] - Add log messages for reader registrations and split requests. [FLINK-24631] - Avoiding directly use the labels as selector for deployment and service [FLINK-24739] - State requirements for Flink&#39;s application mode in the documentation [FLINK-24987] - Enhance ExternalizedCheckpointCleanup enum [FLINK-25160] - Make doc clear: tolerable-failed-checkpoints counts consecutive failures [FLINK-25415] - implement retrial on connections to Cassandra container [FLINK-25611] - Remove CoordinatorExecutorThreadFactory thread creation guards [FLINK-25818] - Add explanation how Kafka Source deals with idleness when parallelism is higher then the number of partitions Technical Debt [FLINK-24740] - Update testcontainers dependency to v1.16.2 [FLINK-24796] - Exclude javadocs / node[_modules] directories from CI compile artifact [FLINK-25472] - Update to Log4j 2.17.1 [FLINK-25375] - Update Log4j to 2.17.0 [FLINK-25576] - Update com.h2database:h2 to 2.0.206 `}),e.add({id:93,href:"/2022/01/31/stateful-functions-3.2.0-release-announcement/",title:"Stateful Functions 3.2.0 Release Announcement",section:"Flink Blog",content:`Stateful Functions is a cross-platform stack for building Stateful Serverless applications, making it radically simpler to develop scalable, consistent, and elastic distributed applications. This new release brings various improvements to the StateFun runtime, a leaner way to specify StateFun module components, and a brand new JavaScript SDK!
The binary distribution and source artifacts are now available on the updated Downloads page of the Flink website, and the most recent Java SDK, Python SDK, GoLang SDK, and JavaScript SDK distributions are available on Maven, PyPI, GitHub, and npm respectively. You can also find official StateFun Docker images of the new version on Docker Hub.
For more details, check the complete release changelog and the updated documentation. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA!
New Features # A brand new JavaScript SDK for NodeJS # Stateful Functions provides a unified model for building stateful applications across various programming languages and deployment environments. The community is thrilled to release an official JavaScript SDK as part of the 3.2.0 release.
const http = require(&#34;http&#34;); const {messageBuilder, StateFun, Context} = require(&#34;apache-flink-statefun&#34;); let statefun = new StateFun(); statefun.bind({ typename: &#34;com.example.fns/greeter&#34;, fn(context, message) { const name = message.asString(); let seen = context.storage.seen || 0; seen = seen + 1; context.storage.seen = seen; context.send( messageBuilder({typename: &#39;com.example.fns/inbox&#39;, id: name, value: \`&#34;Hello \${name} for the \${seen}th time!&#34;\`}) ); }, specs: [{ name: &#34;seen&#34;, type: StateFun.intType(), }] }); http.createServer(statefun.handler()).listen(8000); As with the Python, Java and Go SDKs, the JavaScript SDK includes:
An address-scoped storage acting as a key-value store for a particular address. A unified cross-language way to send, receive, and store values across languages. Dynamic ValueSpecs that describe the state name, type, and optionally an expiration configuration at runtime. You can get started by adding the SDK to your project.
npm install apache-flink-statefun@3.2.0
For a detailed SDK tutorial, we would like to encourage you to visit:
JavaScript SDK Documentation Support different remote functions module names # With the newly introduced configuration option statefun.remote.module-name, it is possible to override the default remote module file name (module.yaml).
To provide a different name, for example prod.yaml located at /flink/usrlib/prod.yaml, one can add the following to one's flink-conf.yaml:
statefun.remote.module-name: /flink/usrlib/prod.yaml For more information see FLINK-25308.
Allow creating custom metrics # The embedded SDK now supports registering custom counters. For more information see FLINK-22533.
Upgraded Flink dependency to 1.14.3 # Stateful Functions 3.2.0 runtime uses Flink 1.14.3 underneath. This means that Stateful Functions benefits from the latest improvements and stabilisations that went into Flink. For more information see Flink&rsquo;s release announcement.
Release Notes # Please review the release notes for a detailed list of changes and new features if you plan to upgrade your setup to Stateful Functions 3.2.0.
List of Contributors # Seth Wiesman, Igal Shilman, Till Rohrmann, Stephan Ewen, Tzu-Li (Gordon) Tai, Ingo Bürk, Evans Ye, neoXfire, Galen Warren
If you’d like to get involved, we’re always looking for new contributors.
`}),e.add({id:94,href:"/2022/01/20/pravega-flink-connector-101/",title:"Pravega Flink Connector 101",section:"Flink Blog",content:`Pravega, which is now a CNCF sandbox project, is a cloud-native storage system based on abstractions for both batch and streaming data consumption. Pravega streams (a new storage abstraction) are durable, consistent, and elastic, while natively supporting long-term data retention. In comparison, Apache Flink is a popular real-time computing engine that provides unified batch and stream processing. Flink provides high-throughput, low-latency computation, as well as support for complex event processing and state management. Both Pravega and Flink share the same design philosophy and treat data streams as primitives. This makes them a great match when constructing storage+computing data pipelines which can unify batch and streaming use cases.
That&rsquo;s also the main reason why Pravega has chosen to use Flink as the first integrated execution engine among the various distributed computing engines on the market. With the help of Flink, users can use flexible APIs for windowing, complex event processing (CEP), or table abstractions to process streaming data easily and enrich the data being stored. Since its inception in 2016, Pravega has established communication with Flink PMC members and developed the connector together.
In 2017, the Pravega Flink connector module started to move out of the Pravega main repository and has been maintained in a new separate repository since then. During years of development, many features have been implemented, including:
exactly-once processing guarantees for both Reader and Writer, supporting end-to-end exactly-once processing pipelines
seamless integration with Flink&rsquo;s checkpoints and savepoints
parallel Readers and Writers supporting high-throughput and low-latency processing
support for Batch, Streaming, and Table API to access Pravega Streams
These key features make streaming pipeline applications easier to develop without worrying about performance and correctness, which are common pain points for many streaming use cases.
In this blog post, we will discuss how to use this connector to read and write Pravega streams with the Flink DataStream API.
Basic usages # Dependency # To use this connector in your application, add the dependency to your project:
&lt;dependency&gt; &lt;groupId&gt;io.pravega&lt;/groupId&gt; &lt;artifactId&gt;pravega-connectors-flink-1.13_2.12&lt;/artifactId&gt; &lt;version&gt;0.10.1&lt;/version&gt; &lt;/dependency&gt; In the above example,
1.13 is the Flink major version, which is embedded in the middle of the artifact name. The Pravega Flink connector maintains compatibility with the three most recent major versions of Flink.
0.10.1 is the version that aligns with the Pravega version.
You can find the latest release with a support matrix on the GitHub Releases page.
API introduction # Configurations # The connector provides a common top-level object PravegaConfig for Pravega connection configurations. The config object automatically configures itself from environment variables, system properties and program arguments.
The basic controller URI and the default scope can be set like this:
Controller URI: environment variable PRAVEGA_CONTROLLER_URI, system property pravega.controller.uri, program argument --controller; default value tcp://localhost:9090
Default Scope: environment variable PRAVEGA_SCOPE, system property pravega.scope, program argument --scope; no default value
The recommended way to create an instance of PravegaConfig is the following:
// From default environment PravegaConfig config = PravegaConfig.fromDefaults(); // From program arguments ParameterTool params = ParameterTool.fromArgs(args); PravegaConfig config = PravegaConfig.fromParams(params); // From user specification PravegaConfig config = PravegaConfig.fromDefaults() .withControllerURI(&#34;tcp://...&#34;) .withDefaultScope(&#34;SCOPE-NAME&#34;) .withCredentials(credentials) .withHostnameValidation(false); Serialization/Deserialization # Pravega has defined io.pravega.client.stream.Serializer for the serialization/deserialization, while Flink has also defined standard interfaces for the purpose.
org.apache.flink.api.common.serialization.SerializationSchema org.apache.flink.api.common.serialization.DeserializationSchema For interoperability with other Pravega client applications, we provide built-in adapters, PravegaSerializationSchema and PravegaDeserializationSchema, to support processing Pravega stream data produced by non-Flink applications.
Here is the adapter for Pravega Java serializer:
import io.pravega.client.stream.impl.JavaSerializer; ... DeserializationSchema&lt;MyEvent&gt; adapter = new PravegaDeserializationSchema&lt;&gt;( MyEvent.class, new JavaSerializer&lt;MyEvent&gt;()); FlinkPravegaReader # FlinkPravegaReader is a Flink SourceFunction implementation which supports parallel reads from one or more Pravega streams. Internally, it initiates a Pravega reader group and creates Pravega EventStreamReader instances to read the data from the stream(s). It provides a builder-style API for construction and allows StreamCuts to mark the start and end of the read.
You can use it like this:
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); // Enable Flink checkpoint to make state fault tolerant env.enableCheckpointing(60000); // Define the Pravega configuration ParameterTool params = ParameterTool.fromArgs(args); PravegaConfig config = PravegaConfig.fromParams(params); // Define the event deserializer DeserializationSchema&lt;MyClass&gt; deserializer = ... // Define the data stream FlinkPravegaReader&lt;MyClass&gt; pravegaSource = FlinkPravegaReader.&lt;MyClass&gt;builder() .forStream(...) .withPravegaConfig(config) .withDeserializationSchema(deserializer) .build(); DataStream&lt;MyClass&gt; stream = env.addSource(pravegaSource) .setParallelism(4) .uid(&#34;pravega-source&#34;); FlinkPravegaWriter # FlinkPravegaWriter is a Flink SinkFunction implementation which supports parallel writes to Pravega streams.
It supports three writer modes that relate to guarantees about the persistence of events emitted by the sink to a Pravega Stream:
Best-effort - Any write failures will be ignored and there could be data loss. At-least-once (default) - All events are persisted in Pravega. Duplicate events are possible, due to retries or in case of failure and subsequent recovery. Exactly-once - All events are persisted in Pravega using a transactional approach integrated with the Flink checkpointing feature. Internally, it will initiate several Pravega EventStreamWriter or TransactionalEventStreamWriter (depending on the writer mode) instances to write data to the stream. It provides a builder-style API for construction.
A basic usage looks like this:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); // Define the Pravega configuration PravegaConfig config = PravegaConfig.fromParams(params); // Define the event serializer SerializationSchema&lt;MyClass&gt; serializer = ... // Define the event router for selecting the Routing Key PravegaEventRouter&lt;MyClass&gt; router = ... // Define the sink function FlinkPravegaWriter&lt;MyClass&gt; pravegaSink = FlinkPravegaWriter.&lt;MyClass&gt;builder() .forStream(...) .withPravegaConfig(config) .withSerializationSchema(serializer) .withEventRouter(router) .withWriterMode(EXACTLY_ONCE) .build(); DataStream&lt;MyClass&gt; stream = ... stream.addSink(pravegaSink) .setParallelism(4) .uid(&#34;pravega-sink&#34;); You can see some more examples here.
Internals of reader and writer # Checkpoint integration # Flink has periodic checkpoints based on the Chandy-Lamport algorithm to make state in Flink fault-tolerant. By allowing state and the corresponding stream positions to be recovered, the application is given the same semantics as a failure-free execution.
Pravega also has its own Checkpoint concept which is to create a consistent &ldquo;point in time&rdquo; persistence of the state of each Reader in the Reader Group, by using a specialized Event (Checkpoint Event) to signal each Reader to preserve its state. Once a Checkpoint has been completed, the application can use the Checkpoint to reset all the Readers in the Reader Group to the known consistent state represented by the Checkpoint.
This means that our end-to-end recovery story differs from that of other messaging systems such as Kafka, which takes a more tightly coupled approach: offsets are persisted in the Flink task state and Flink performs the coordination. The connector instead delegates Pravega source recovery completely to the Pravega server and uses only a lightweight hook to connect the two. We collaborated with the Flink community and added a new interface, ExternallyInducedSource (FLINK-6390), to allow such external calls for checkpointing. The connector integrates this interface to guarantee exactly-once semantics during failure recovery.
The checkpoint mechanism works as a two-step process:
The master hook handler from the JobManager initiates the triggerCheckpoint request to the ReaderCheckpointHook that was registered with the JobManager during FlinkPravegaReader source initialization. The ReaderCheckpointHook handler notifies Pravega to checkpoint the current reader state. This is a non-blocking call that returns a future once Pravega readers are done with the checkpointing. Once the future completes, the Pravega checkpoint will be persisted in a &ldquo;master state&rdquo; of a Flink checkpoint.
A Checkpoint event will be sent by Pravega as part of the data stream flow and, upon receiving the event, the FlinkPravegaReader will initiate a triggerCheckpoint request to effectively let Flink continue and complete the checkpoint process.
End-to-end exactly-once semantics # In the early years of big data processing, results from real-time stream processing were always considered inaccurate/approximate/speculative. However, this correctness is extremely important for some use cases and in some industries such as finance.
This constraint stems mainly from two issues:
unordered data sources in event time
the guarantee of end-to-end exactly-once semantics
During recent years of development, watermarking has been introduced as a tradeoff between correctness and latency, and it is now considered a good solution for unordered data sources in event time.
The guarantee of end-to-end exactly-once semantics is trickier. When we say “exactly-once semantics”, what we mean is that each incoming event affects the final results exactly once. Even in the event of a machine or software failure, there is no duplicate data and no data that goes unprocessed. This is quite difficult because of the demands of message acknowledgment and recovery during such fast processing, and it is also why some early distributed streaming engines like Storm (without Trident) chose to support &ldquo;at-least-once&rdquo; guarantees.
Flink is one of the first streaming systems that was able to provide exactly-once semantics due to its delicate checkpoint mechanism. But to make it work end-to-end, the final stage needs to apply the semantic to external message system sinks that support commits and rollbacks.
To work around this problem, Pravega introduced transactional writes. A Pravega transaction allows an application to prepare a set of events that can be written &ldquo;all at once&rdquo; to a Stream. This allows an application to &ldquo;commit&rdquo; a bunch of events atomically. When writes are idempotent, it is possible to implement end-to-end exactly-once pipelines together with Flink.
To build such an end-to-end solution requires coordination between Flink and the Pravega sink, which is still challenging. A common approach for coordinating commits and rollbacks in a distributed system is the two-phase commit protocol. We used this protocol and, together with the Flink community, implemented the sink function in a two-phase commit way coordinated with Flink checkpoints.
The Flink community then extracted the common logic from the two-phase commit protocol and provided a general interface TwoPhaseCommitSinkFunction (FLINK-7210) to make it possible to build end-to-end exactly-once applications with other message systems that have transaction support. This includes Apache Kafka versions 0.11 and above. There is an official Flink blog post that describes this feature in detail.
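To make the two-phase commit pattern more concrete, here is a minimal sketch of a custom sink built on TwoPhaseCommitSinkFunction. The MyTxn transaction handle and the method bodies are illustrative placeholders only, not the Pravega connector's actual implementation:
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.base.VoidSerializer;
import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer;
import org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction;

// Hypothetical transaction handle; a real sink would wrap a transaction of the external system.
class MyTxn implements java.io.Serializable {}

public class MyExactlyOnceSink extends TwoPhaseCommitSinkFunction<String, MyTxn, Void> {

    public MyExactlyOnceSink() {
        super(new KryoSerializer<>(MyTxn.class, new ExecutionConfig()), VoidSerializer.INSTANCE);
    }

    @Override
    protected MyTxn beginTransaction() {
        // Open a new transaction on the external system for the next checkpoint interval.
        return new MyTxn();
    }

    @Override
    protected void invoke(MyTxn txn, String value, Context context) {
        // Write the record into the open transaction; it is not yet visible to readers.
    }

    @Override
    protected void preCommit(MyTxn txn) {
        // Phase 1: flush pending writes as part of the Flink checkpoint.
    }

    @Override
    protected void commit(MyTxn txn) {
        // Phase 2: make the transaction visible once the checkpoint has completed.
    }

    @Override
    protected void abort(MyTxn txn) {
        // Discard the transaction on failure or when restoring from an older checkpoint.
    }
}
Flink invokes preCommit() when a checkpoint is taken and commit() only after the checkpoint has been confirmed, which is exactly the coordination described above.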
Summary # The Pravega Flink connector enables Pravega to connect to Flink and allows Pravega to act as a key data store in a streaming pipeline. Both projects share a common design philosophy and can integrate well with each other. Pravega has its own concept of checkpointing and has implemented transactional writes to support end-to-end exactly-once guarantees.
Future plans # FlinkPravegaInputFormat and FlinkPravegaOutputFormat are now provided to support batch reads and writes in Flink, but these are under the legacy DataSet API. Since Flink is now making efforts to unify batch and streaming, it is improving its APIs and providing new interfaces for the source and sink APIs in the Flink 1.11 and 1.12 releases. We will continue to work with the Flink community and integrate with the new APIs.
We will also put more effort into SQL / Table API support in order to provide a better user experience since it is simpler to understand and even more powerful to use in some cases.
Note: the original blog post can be found here.
`}),e.add({id:95,href:"/2022/01/17/apache-flink-1.14.3-release-announcement/",title:"Apache Flink 1.14.3 Release Announcement",section:"Flink Blog",content:`The Apache Flink community released the second bugfix version of the Apache Flink 1.14 series. The first bugfix release was 1.14.2, an emergency release addressing an Apache Log4j zero-day vulnerability (CVE-2021-44228); Flink 1.14.1 was abandoned. That means this release is the first bugfix release of the Flink 1.14 series that contains fixes unrelated to the mentioned CVE.
This release includes 164 fixes and minor improvements for Flink 1.14.0. The list below includes bugfixes and improvements. For a complete list of all changes see: JIRA.
We highly recommend that all users upgrade to Flink 1.14.3.
Updated Maven dependencies:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.14.3&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.14.3&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.14.3&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
Release Notes - Flink - Version 1.14.3 Sub-task [FLINK-24355] - Expose the flag for enabling checkpoints after tasks finish in the Web UI Bug [FLINK-15987] - SELECT 1.0e0 / 0.0e0 throws NumberFormatException [FLINK-17914] - HistoryServer deletes cached archives if archive listing fails [FLINK-19142] - Local recovery can be broken if slot hijacking happened during a full restart [FLINK-20195] - Jobs endpoint returns duplicated jobs [FLINK-20370] - Result is wrong when sink primary key is not the same with query [FLINK-21289] - Application mode ignores the pipeline.classpaths configuration [FLINK-21345] - NullPointerException LogicalCorrelateToJoinFromTemporalTableFunctionRule.scala:157 [FLINK-22113] - UniqueKey constraint is lost with multiple sources join in SQL [FLINK-22954] - Don&#39;t support consuming update and delete changes when use table function that does not contain table field [FLINK-23614] - The resulting scale of TRUNCATE(DECIMAL, ...) is not correct [FLINK-23704] - FLIP-27 sources are not generating LatencyMarkers [FLINK-23827] - Fix ModifiedMonotonicity inference for some node [FLINK-23919] - PullUpWindowTableFunctionIntoWindowAggregateRule generates invalid Calc for Window TVF [FLINK-24156] - BlobServer crashes due to SocketTimeoutException in Java 11 [FLINK-24232] - Archiving of suspended jobs prevents breaks subsequent archive attempts [FLINK-24291] - Decimal precision is lost when deserializing in test cases [FLINK-24310] - A bug in the BufferingSink example in the doc [FLINK-24315] - Cannot rebuild watcher thread while the K8S API server is unavailable [FLINK-24318] - Casting a number to boolean has different results between &#39;select&#39; fields and &#39;where&#39; condition [FLINK-24331] - PartiallyFinishedSourcesITCase fails with &quot;No downstream received 0 from xxx;&quot; [FLINK-24336] - PyFlink TableEnvironment executes the SQL randomly MalformedURLException with the configuration for &#39;pipeline.classpaths&#39; [FLINK-24344] - Handling of IOExceptions when triggering checkpoints doesn&#39;t cause job failover [FLINK-24353] - Bash scripts do not respect dynamic configurations when calculating memory sizes [FLINK-24366] - Unnecessary/misleading error message about failing restores when tasks are already canceled. [FLINK-24371] - Support SinkWriter preCommit without the need of a committer [FLINK-24377] - TM resource may not be properly released after heartbeat timeout [FLINK-24380] - Flink should handle the state transition of the pod from Pending to Failed [FLINK-24381] - Table API exceptions may leak sensitive configuration values [FLINK-24401] - TM cannot exit after Metaspace OOM [FLINK-24407] - Pulsar connector chinese document link to Pulsar document location incorrectly. 
[FLINK-24408] - org.codehaus.janino.InternalCompilerException: Compiling &quot;StreamExecValues$200&quot;: Code of method &quot;nextRecord(Ljava/lang/Object;)Ljava/lang/Object;&quot; of class &quot;StreamExecValues$200&quot; grows beyond 64 KB [FLINK-24409] - Kafka topics with periods in their names generate a constant stream of errors [FLINK-24431] - [Kinesis][EFO] EAGER registration strategy does not work when job fails over [FLINK-24432] - RocksIteratorWrapper.seekToLast() calls the wrong RocksIterator method [FLINK-24465] - Wrong javadoc and documentation for buffer timeout [FLINK-24467] - Set min and max buffer size even if the difference less than threshold [FLINK-24468] - NPE when notifyNewBufferSize [FLINK-24469] - Incorrect calcualtion of the buffer size in case of channel data skew [FLINK-24480] - EqualiserCodeGeneratorTest fails on azure [FLINK-24488] - KafkaRecordSerializationSchemaBuilder does not forward timestamp [FLINK-24492] - incorrect implicit type conversion between numeric and (var)char [FLINK-24506] - checkpoint directory is not configurable through the Flink configuration passed into the StreamExecutionEnvironment [FLINK-24540] - Fix Resource leak due to Files.list [FLINK-24543] - Zookeeper connection issue causes inconsistent state in Flink [FLINK-24550] - Can not access job information from a standby jobmanager UI [FLINK-24551] - BUFFER_DEBLOAT_SAMPLES property is taken from the wrong configuration [FLINK-24552] - Ineffective buffer debloat configuration in randomized tests [FLINK-24563] - Comparing timstamp_ltz with random string throws NullPointerException [FLINK-24596] - Bugs in sink.buffer-flush before upsert-kafka [FLINK-24597] - RocksdbStateBackend getKeysAndNamespaces would return duplicate data when using MapState [FLINK-24600] - Duplicate 99th percentile displayed in checkpoint summary [FLINK-24608] - Sinks built with the unified sink framework do not receive timestamps when used in Table API [FLINK-24613] - Documentation on orc supported data types is outdated [FLINK-24647] - ClusterUncaughtExceptionHandler does not log the exception [FLINK-24654] - NPE on RetractableTopNFunction when some records were cleared by state ttl [FLINK-24662] - PyFlink sphinx check failed with &quot;node class &#39;meta&#39; is already registered, its visitors will be overridden&quot; [FLINK-24667] - Channel state writer would fail the task directly if meeting exception previously [FLINK-24676] - Schema does not match if explain insert statement with partial column [FLINK-24678] - Correct the metric name of map state contains latency [FLINK-24691] - FLINK SQL SUM() causes a precision error [FLINK-24704] - Exception occurs when the input record loses monotonicity on the sort key field of UpdatableTopNFunction [FLINK-24706] - AkkaInvocationHandler silently ignores deserialization errors [FLINK-24708] - \`ConvertToNotInOrInRule\` has a bug which leads to wrong result [FLINK-24728] - Batch SQL file sink forgets to close the output stream [FLINK-24733] - Data loss in pulsar source when using shared mode [FLINK-24738] - Fail during announcing buffer size to released local channel [FLINK-24761] - Fix PartitionPruner code gen compile fail [FLINK-24773] - KafkaCommitter should fail on unknown Exception [FLINK-24777] - Processed (persisted) in-flight data description miss on Monitoring Checkpointing page [FLINK-24789] - IllegalStateException with CheckpointCleaner being closed already [FLINK-24792] - OperatorCoordinatorSchedulerTest crashed JVM on AZP [FLINK-24835] - &quot;group by&quot; 
in the interval join will throw a exception [FLINK-24846] - AsyncWaitOperator fails during stop-with-savepoint [FLINK-24858] - TypeSerializer version mismatch during eagerly restore [FLINK-24874] - Dropdown menu is not properly shown in UI [FLINK-24885] - ProcessElement Interface parameter Collector : java.lang.NullPointerException [FLINK-24919] - UnalignedCheckpointITCase hangs on Azure [FLINK-24922] - Fix spelling errors in the word &quot;parallism&quot; [FLINK-24937] - &quot;kubernetes application HA test&quot; hangs on azure [FLINK-24938] - Checkpoint cleaner is closed before checkpoints are discarded [FLINK-25022] - ClassLoader leak with ThreadLocals on the JM when submitting a job through the REST API [FLINK-25067] - Correct the description of RocksDB&#39;s background threads [FLINK-25084] - Field names must be unique. Found duplicates [FLINK-25091] - Official website document FileSink orc compression attribute reference error [FLINK-25096] - Issue in exceptions API(/jobs/:jobid/exceptions) in flink 1.13.2 [FLINK-25126] - FlinkKafkaInternalProducer state is not reset if transaction finalization fails [FLINK-25132] - KafkaSource cannot work with object-reusing DeserializationSchema [FLINK-25134] - Unused RetryRule in KafkaConsumerTestBase swallows retries [FLINK-25222] - Remove NetworkFailureProxy used for Kafka connector tests [FLINK-25271] - ApplicationDispatcherBootstrapITCase. testDispatcherRecoversAfterLosingAndRegainingLeadership failed on azure [FLINK-25294] - Incorrect cloudpickle import [FLINK-25375] - Update Log4j to 2.17.0 [FLINK-25418] - The dir_cache is specified in the flink task. When there is no network, you will still download the python third-party library [FLINK-25446] - Avoid sanity check on read bytes on DataInputStream#read(byte[]) [FLINK-25468] - Local recovery fails if local state storage and RocksDB working directory are not on the same volume [FLINK-25477] - The directory structure of the State Backends document is not standardized [FLINK-25513] - CoFlatMapFunction requires both two flat_maps to yield something Improvement [FLINK-20443] - ContinuousProcessingTimeTrigger doesn&#39;t fire at the end of the window [FLINK-21467] - Document possible recommended usage of Bounded{One/Multi}Input.endInput and emphasize that they could be called multiple times [FLINK-23519] - Aggregate State Backend Latency by State Level [FLINK-23798] - Avoid using reflection to get filter when partition filter is enabled [FLINK-23842] - Add log messages for reader registrations and split requests. 
[FLINK-23914] - Make connector testing framework more verbose on test failure [FLINK-24117] - Remove unHandledErrorListener in ZooKeeperLeaderElectionDriver and ZooKeeperLeaderRetrievalDriver [FLINK-24148] - Add bloom filter policy option in RocksDBConfiguredOptions [FLINK-24382] - RecordsOut metric for sinks is inaccurate [FLINK-24437] - Remove unhandled exception handler from CuratorFramework before closing it [FLINK-24460] - Rocksdb Iterator Error Handling Improvement [FLINK-24481] - Translate buffer debloat documenation to chinese [FLINK-24529] - flink sql job cannot use custom job name [FLINK-24631] - Avoiding directly use the labels as selector for deployment and service [FLINK-24670] - Restructure unaligned checkpoints documentation page to &quot;Checkpointing under back pressure&quot; [FLINK-24690] - Clarification of buffer size threshold calculation in BufferDebloater [FLINK-24695] - Update how to configure unaligned checkpoints in the documentation [FLINK-24739] - State requirements for Flink&#39;s application mode in the documentation [FLINK-24813] - Improve ImplicitTypeConversionITCase [FLINK-24880] - Error messages &quot;OverflowError: timeout value is too large&quot; shown when executing PyFlink jobs [FLINK-24958] - correct the example and link for temporal table function documentation [FLINK-24987] - Enhance ExternalizedCheckpointCleanup enum [FLINK-25092] - Implement artifact cacher for Bash based Elasticsearch test Technical Debt [FLINK-24367] - Add a fallback AkkaRpcSystemLoader for tests in the IDE [FLINK-24445] - Move RPC System packaging to package phase [FLINK-24455] - FallbackAkkaRpcSystemLoader should check for maven errors [FLINK-24513] - AkkaRpcSystemLoader must be an ITCase [FLINK-24559] - flink-rpc-akka-loader does not bundle flink-rpc-akka [FLINK-24609] - flink-rpc-akka uses wrong Scala version property for parser-combinators [FLINK-24859] - Document new File formats [FLINK-25472] - Update to Log4j 2.17.1 `}),e.add({id:96,href:"/2022/01/07/apache-flink-ml-2.0.0-release-announcement/",title:"Apache Flink ML 2.0.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is excited to announce the release of Flink ML 2.0.0! Flink ML is a library that provides APIs and infrastructure for building stream-batch unified machine learning algorithms, that can be easy-to-use and performant with (near-) real-time latency.
This release involves a major refactor of the earlier Flink ML library and introduces major features that extend the Flink ML API and the iteration runtime, such as supporting stages with multi-input multi-output, graph-based stage composition, and a new stream-batch unified iteration library. Moreover, we added five algorithm implementations in this release, which is the start of a long-term initiative to provide a large number of off-the-shelf algorithms in Flink ML with state-of-the-art performance.
We believe this release is an important step towards extending Apache Flink to a wide range of machine learning use cases, especially the real-time machine learning scenarios.
We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA! We hope you like the new release and we’d be eager to learn about your experience with it.
Notable Features # API and Infrastructure # Supporting stages requiring multi-input multi-output # Stages in a machine learning workflow might take multiple inputs and return multiple outputs. For example, a graph embedding algorithm might need to read two tables, which represent the edges and nodes of the graph, respectively. And a workflow might need a stage that splits the input dataset into two output datasets, one for training and one for testing.
With this capability, algorithm developers can assemble a machine learning workflow as a directed acyclic graph (DAG) of pre-defined stages. And this workflow can be configured and deployed without users knowing the implementation details of this graph. This improvement could considerably expand the applicability and usability of Flink ML.
Supporting online learning with APIs exposing model data # In a native online learning scenario, we have a long-running job that keeps processing training data and updating a machine learning model. And we could have multiple jobs deployed in web servers which do online inference. It is necessary to transmit the latest model data from the training job to those inference jobs in (near-) real-time latency.
The traditional Estimator/Transformer paradigm does not provide APIs to expose this model data in a streaming manner. Users have to repeatedly call fit() to update model data. Although users might be able to update model data once every few minutes, it is likely very inefficient, if not impossible, to update model data once every few seconds with this approach.
With FLIP-173, model data can be exposed as an unbounded stream via the getModelData() API. Then algorithm users can transfer the model data to web servers in real-time and use the up-to-date model data to do online inference. This feature could significantly strengthen Flink ML’s capability to support online learning applications.
Improved usability for managing parameters # We care a lot about usability and developer velocity in Flink ML. In this release, we refactored and significantly simplified the experience of defining, getting and setting parameters for algorithms.
With FLIP-174, parameters can be defined as static variables of an interface, and any algorithm that implements the interface could inherit these variable definitions without additional work. Commonly used parameter validators are provided as part of the infrastructure.
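As an illustration of this pattern, a shared parameter interface could look like the following sketch. The class names (Param, IntParam, ParamValidators, WithParams) follow FLIP-174; exact packages and signatures should be checked against the released flink-ml API:
import org.apache.flink.ml.param.IntParam;
import org.apache.flink.ml.param.Param;
import org.apache.flink.ml.param.ParamValidators;
import org.apache.flink.ml.param.WithParams;

// Any stage that implements this interface inherits the "k" parameter definition,
// including its description, default value, and validator, without extra code.
public interface HasK<T> extends WithParams<T> {

    Param<Integer> K = new IntParam("k", "The number of clusters to create.", 2, ParamValidators.gt(0));

    default int getK() {
        return get(K);
    }

    default T setK(int value) {
        return set(K, value);
    }
}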
Tools for composing a DAG of stages into a new stage # One of the most useful tools in existing ML libraries (e.g. Scikit-learn, Flink, Spark) is the Pipeline, which allows users to compose an estimator from an ordered list of estimators and transformers, without having to explicitly implement fit/transform for the resulting estimator/transformer.
FLIP-175 extended this capability from pipelines to DAGs. Users can now compose an estimator from a DAG of estimators and transformers. This composition capability allows developers to slice a complex workflow into simpler modules and re-use the modules across multiple workflows. We believe this capability could significantly improve the experience of building and deploying complex workflows using Flink ML.
Stream-batch Unified Iteration Library # To support training machine learning algorithms and adjusting model parameters dynamically based on prediction results, it is necessary to have native support for processing data iteratively. Since Flink uses a DAG to describe the processing logic, the iteration library needs to be provided on top of Flink separately. Besides, since we need to support both offline training and online training / adjustment, the iteration library should support both streaming and batch cases.
FLIP-176 implements a stream-batch unified iteration library. It provides the ability to transmit records back to preceding operators and to track the progress of rounds inside the iteration. Users can directly use the DataStream API and Table API to express the execution logic inside the iteration. Besides, the new iteration library extends Flink&rsquo;s checkpointing mechanism to also support exactly-once failover for jobs using iterations.
Python SDK # Nowadays many machine learning practitioners are used to developing machine learning workflows in Python due to its ease-of-use and excellent ecosystem. To meet the needs of these users, a Python package dedicated for Flink ML is created starting from this release. The Python package currently provides APIs similar to their Java counterparts for developing machine learning algorithms.
Users can install Flink ML Python package through pip using the following command:
pip install apache-flink-ml In the future we will enhance the Python SDK to enable its interoperability with Flink ML’s Java library, for example, allowing users to express machine learning workflows in Python, where workflows consist of a mixture of stages from the Flink ML Java library as well as stages implemented in Python (e.g. a TensorFlow program).
Algorithm Library # Now that the Flink ML API re-design is done, we have started the initiative to add off-the-shelf algorithms to Flink ML. The release of Flink ML 2.0.0 is closely related to the Alink project - an Apache Flink ecosystem project open sourced by Alibaba. The connection between the Flink community and the developers of the Alink project dates back to 2017. The Alink developers have made significant contributions to designing the new Flink ML APIs and to refactoring, optimizing, and migrating algorithms from Alink to Flink. Our long-term goal is to provide a library of performant algorithms that are easy to use, debug, and customize for your needs.
We have implemented five algorithms in this release, i.e. logistic regression, k-means, k-nearest neighbors, naive bayes and one-hot encoder. For now these algorithms focus on validating the APIs and iteration runtime. In addition to adding more and more algorithms, we will also stress test and optimize their performance to make sure these algorithms have state-of-the-art performance. Stay tuned!
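To give a flavor of the resulting API, here is a small sketch of training and applying the k-means implementation, following the Estimator/Model pattern described above. The package paths, setters, and column names are based on the FLIP designs and should be double-checked against the 2.0.0 documentation:
import org.apache.flink.ml.clustering.kmeans.KMeans;
import org.apache.flink.ml.clustering.kmeans.KMeansModel;
import org.apache.flink.ml.linalg.Vectors;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class KMeansExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // A tiny in-memory training table with a single "features" column.
        Table input = tEnv.fromDataStream(
            env.fromElements(
                Vectors.dense(0.0, 0.0),
                Vectors.dense(0.3, 0.0),
                Vectors.dense(9.0, 9.0),
                Vectors.dense(9.6, 9.6))).as("features");

        // Estimator -> fit() -> Model, as in the classic Estimator/Transformer paradigm.
        KMeans kmeans = new KMeans().setK(2).setFeaturesCol("features").setPredictionCol("prediction");
        KMeansModel model = kmeans.fit(input);

        // The fitted model can be applied to (possibly unbounded) tables ...
        Table predictions = model.transform(input)[0];
        predictions.execute().print();

        // ... and its model data is itself exposed as a table (FLIP-173).
        Table centroids = model.getModelData()[0];
    }
}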
Related Work # Flink ML project moved to a separate repository # To accelerate the development of Flink ML, the effort has moved to the new repository flink-ml under the Flink project. We here follow a similar approach like the Stateful Functions effort, where a separate repository has helped to speed up the development by allowing for more light-weight contribution workflows and separate release cycles.
GitHub organization created for Flink ecosystem projects # To facilitate community collaboration on ecosystem projects that extend the capabilities of Apache Flink, the Apache Flink PMC has granted permission to use flink-extended as the name of this GitHub organization, which provides a neutral place to host the code of ecosystem projects.
Two Flink ML related projects have been moved to this organization. dl-on-flink provides the capability to implement Flink ML stages using TensorFlow. And clink is a library that facilitates the implementation of Flink ML stages using C++ in order to support e.g. real-time feature engineering.
We hope you can join this effort and share your Flink ecosystem projects in this Github organization. And stay tuned for more updates on ecosystem projects.
Upgrade Notes # Please review this note for a list of adjustments to make and issues to check if you plan to upgrade to Flink ML 2.0.0.
This note discusses any critical information about incompatibilities and breaking changes, performance changes, and any other changes that might impact your production deployment of Flink ML.
Module names are changed.
We have replaced the flink-ml-api module with the flink-ml-core_2.12 module.
For users who have a dependency on flink-ml-api, please replace it with flink-ml-core_2.12.
PipelineStage and its subclasses are changed.
FLIP-173 made major changes to PipelineStage and its subclasses. Changes include class rename, method signature change, method removal etc.
Users who use PipelineStage and its subclasses should use the new APIs introduced in FLIP-173.
Param-related classes are changed.
FLIP-174 made major changes to the param-related classes. Changes include class rename, method signature change, method removal etc.
Users who use classes such as Params and WithParams should use the new APIs introduced in FLIP-174.
Flink dependency is changed from 1.12 to 1.14.
This change introduces all the breaking changes listed in the Flink 1.14 release notes. One major change is that the DataSet API is not supported anymore.
Users who use DataSet::iterate should switch to using the datastream-based iteration API introduced in FLIP-176.
Release Notes and Resources # Please take a look at the release notes for a detailed list of changes and new features.
The binary distribution and source artifacts are now available on the updated Downloads page of the Flink website, and the most recent distribution of Flink ML Python package is available on PyPI.
List of Contributors # The Apache Flink community would like to thank each one of the contributors that have made this release possible:
Yun Gao, Dong Lin, Zhipeng Zhang, huangxingbo, Yunfeng Zhou, Jiangjie (Becket) Qin, weibo, abdelrahman-ik.
`}),e.add({id:97,href:"/2022/01/04/how-we-improved-scheduler-performance-for-large-scale-jobs-part-one/",title:"How We Improved Scheduler Performance for Large-scale Jobs - Part One",section:"Flink Blog",content:` Introduction # When scheduling large-scale jobs in Flink 1.12, a lot of time is required to initialize jobs and deploy tasks. The scheduler also requires a large amount of heap memory in order to store the execution topology and host temporary deployment descriptors. For example, for a job with a topology that contains two vertices connected with an all-to-all edge and a parallelism of 10k (which means there are 10k source tasks and 10k sink tasks and every source task is connected to all sink tasks), Flink’s JobManager would require 30 GiB of heap memory and more than 4 minutes to deploy all of the tasks.
Furthermore, task deployment may block the JobManager&rsquo;s main thread for a long time and the JobManager will not be able to respond to any other requests from TaskManagers. This could lead to heartbeat timeouts that trigger a failover. In the worst case, this will render the Flink cluster unusable because it cannot deploy the job.
To improve the performance of the scheduler for large-scale jobs, we&rsquo;ve implemented several optimizations in Flink 1.13 and 1.14:
Introduce the concept of consuming groups to optimize procedures related to the complexity of topologies, including the initialization, scheduling, failover, and partition release. This also reduces the memory required to store the topology;
Introduce a cache to optimize task deployment, which makes the process faster and requires less memory;
Leverage characteristics of the logical topology and the scheduling topology to speed up the building of pipelined regions.
Benchmarking Results # To estimate the effect of our optimizations, we conducted several experiments to compare the performance of Flink 1.12 (before the optimizations) with Flink 1.14 (after the optimizations). The job in our experiments contains two vertices connected with an all-to-all edge. The parallelisms of these vertices are both 10K. To distribute the temporary deployment descriptors via the blob server, we set the configuration blob.offload.minsize to 100 KiB (down from the default value of 1 MiB). This configuration means that blobs larger than the set value will be distributed via the blob server, and the size of the deployment descriptors in our test job is about 270 KiB. The results of our experiments are illustrated below:
Table 1 - The comparison of time cost between Flink 1.12 and 1.14
Procedure | Flink 1.12 | Flink 1.14 | Reduction (%)
Job Initialization | 11,431 ms | 627 ms | 94.51%
Task Deployment | 63,118 ms | 17,183 ms | 72.78%
Computing tasks to restart on failover | 37,195 ms | 170 ms | 99.55%
In addition to quicker speeds, the memory usage is significantly reduced. It requires 30 GiB of heap memory for a JobManager to deploy the test job and keep it running stably with Flink 1.12, while the minimum heap memory required by the JobManager with Flink 1.14 is only 2 GiB. There are also fewer occurrences of long-term garbage collection. When running the test job with Flink 1.12, a garbage collection that lasts more than 10 seconds occurs during both job initialization and task deployment. With Flink 1.14, since there is no long-term garbage collection, there is also a decreased risk of heartbeat timeouts, which creates better cluster stability.
In our experiment, it took more than 4 minutes for the large-scale job with Flink 1.12 to transition to running (excluding the time spent on allocating resources). With Flink 1.14, it took no more than 30 seconds (excluding the time spent on allocating resources). The time cost is reduced by 87%. Thus, for users who are running large-scale jobs for production and want better scheduling performance, please consider upgrading Flink to 1.14.
In part two of this blog post, we are going to talk about these improvements in detail.
`}),e.add({id:98,href:"/2022/01/04/how-we-improved-scheduler-performance-for-large-scale-jobs-part-two/",title:"How We Improved Scheduler Performance for Large-scale Jobs - Part Two",section:"Flink Blog",content:`Part one of this blog post briefly introduced the optimizations we’ve made to improve the performance of the scheduler; compared to Flink 1.12, the time cost and memory usage of scheduling large-scale jobs in Flink 1.14 is significantly reduced. In part two, we will elaborate on the details of these optimizations.
Reducing complexity with groups # A distribution pattern describes how consumer tasks are connected to producer tasks. Currently, there are two distribution patterns in Flink: pointwise and all-to-all. When the distribution pattern is pointwise between two vertices, the computational complexity of traversing all edges is O(n). When the distribution pattern is all-to-all, the complexity of traversing all edges is O(n²), which means that complexity increases rapidly when the scale goes up.
Fig. 1 - Two distribution patterns in Flink In Flink 1.12, the ExecutionEdge class is used to store the information of connections between tasks. This means that for the all-to-all distribution pattern, there would be O(n²) ExecutionEdges, which would take up a lot of memory for large-scale jobs. For two JobVertices connected with an all-to-all edge and a parallelism of 10K, it would take more than 4 GiB memory to store 100M ExecutionEdges. Since there can be multiple all-to-all connections between vertices in production jobs, the amount of memory required would increase rapidly.
As we can see in Fig. 1, for two JobVertices connected with the all-to-all distribution pattern, all IntermediateResultPartitions produced by upstream ExecutionVertices are isomorphic, which means that the downstream ExecutionVertices they connect to are exactly the same. The downstream ExecutionVertices belonging to the same JobVertex are also isomorphic, as the upstream IntermediateResultPartitions they connect to are the same too. Since every JobEdge has exactly one distribution type, we can divide vertices and result partitions into groups according to the distribution type of the JobEdge.
For the all-to-all distribution pattern, since all downstream ExecutionVertices belonging to the same JobVertex are isomorphic and belong to a single group, all the result partitions they consume are connected to this group. This group is called ConsumerVertexGroup. Inversely, all the upstream result partitions are grouped into a single group, and all the consumer vertices are connected to this group. This group is called ConsumedPartitionGroup.
The basic idea of our optimizations is to put all the vertices that consume the same result partitions into one ConsumerVertexGroup, and put all the result partitions with the same consumer vertices into one ConsumedPartitionGroup.
Fig. 2 - How partitions and vertices are grouped w.r.t. distribution patterns When scheduling tasks, Flink needs to iterate over all the connections between result partitions and consumer vertices. In the past, since there were O(n²) edges in total, the overall complexity of the iteration was O(n²). Now ExecutionEdge is replaced with ConsumerVertexGroup and ConsumedPartitionGroup. As all the isomorphic result partitions are connected to the same downstream ConsumerVertexGroup, when the scheduler iterates over all the connections, it just needs to iterate over the group once. The computational complexity decreases from O(n²) to O(n).
For the pointwise distribution pattern, one ConsumedPartitionGroup is connected to one ConsumerVertexGroup point-to-point. The number of groups is the same as the number of ExecutionEdges. Thus, the computational complexity of iterating over the groups is still O(n).
For the example job we mentioned above, replacing ExecutionEdges with the groups can effectively reduce the memory usage of ExecutionGraph from more than 4 GiB to about 12 MiB. Based on the concept of groups, we further optimized several procedures, including job initialization, scheduling tasks, failover, and partition releasing. These procedures all involve traversing all consumer vertices for all the partitions. After the optimization, their overall computational complexity decreases from O(n²) to O(n).
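The following deliberately simplified sketch (these are not Flink's actual internal classes) captures the essence of the change: each side of an all-to-all edge keeps one shared group object, so iterating over its connections touches each partition and each vertex only once.
import java.util.List;
import java.util.function.BiConsumer;

// Illustrative stand-ins only, not Flink's internal ConsumerVertexGroup/ConsumedPartitionGroup.
class ConsumerVertexGroup {
    final List<Integer> vertexIds;              // all consumer vertices of one JobVertex
    ConsumerVertexGroup(List<Integer> vertexIds) { this.vertexIds = vertexIds; }
}

class ConsumedPartitionGroup {
    final List<Integer> partitionIds;           // all result partitions with identical consumers
    ConsumedPartitionGroup(List<Integer> partitionIds) { this.partitionIds = partitionIds; }
}

class AllToAllConnection {
    // One shared group per side replaces the n * n individual ExecutionEdge objects.
    final ConsumedPartitionGroup partitions;
    final ConsumerVertexGroup consumers;

    AllToAllConnection(ConsumedPartitionGroup partitions, ConsumerVertexGroup consumers) {
        this.partitions = partitions;
        this.consumers = consumers;
    }

    // Iterating over all connections now costs O(n) instead of O(n²) over all pairs.
    void forEachConnection(BiConsumer<Integer, ConsumerVertexGroup> action) {
        for (Integer partitionId : partitions.partitionIds) {
            action.accept(partitionId, consumers);
        }
    }
}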
Optimizations related to task deployment # The problem # In Flink 1.12, it takes a long time to deploy tasks for large-scale jobs if they contain all-to-all edges. Furthermore, a heartbeat timeout may happen during or after task deployment, which makes the cluster unstable.
Currently, task deployment includes the following steps:
A JobManager creates TaskDeploymentDescriptors for each task, which happens in the JobManager&rsquo;s main thread;
The JobManager serializes TaskDeploymentDescriptors asynchronously;
The JobManager ships serialized TaskDeploymentDescriptors to TaskManagers via RPC messages;
TaskManagers create new tasks based on the TaskDeploymentDescriptors and execute them.
A TaskDeploymentDescriptor (TDD) contains all the information required by TaskManagers to create a task. At the beginning of task deployment, a JobManager creates the TDDs for all tasks. Since this happens in the main thread, the JobManager cannot respond to any other requests. For large-scale jobs, the main thread may get blocked for a long time, heartbeat timeouts may happen, and a failover would be triggered.
A JobManager can become a bottleneck during task deployment since all descriptors are transmitted from it to all TaskManagers. For large-scale jobs, these temporary descriptors would require a lot of heap memory and cause frequent long-term garbage collection pauses.
Thus, we need to speed up the creation of the TDDs. Furthermore, if the size of descriptors can be reduced, then they will be transmitted faster, which leads to faster task deployments.
The solution # Cache ShuffleDescriptors # ShuffleDescriptors are used to describe the information of result partitions that a task consumes and can be the largest part of a TaskDeploymentDescriptor. For an all-to-all edge, when the parallelisms of both upstream and downstream vertices are n, the number of ShuffleDescriptors for each downstream vertex is n, since they are connected to n upstream vertices. Thus, the total count of the ShuffleDescriptors for the vertices is n².
However, the ShuffleDescriptors for the downstream vertices are all the same since they all consume the same upstream result partitions. Therefore, Flink doesn&rsquo;t need to create ShuffleDescriptors for each downstream vertex individually. Instead, it can create them once and cache them to be reused. This will decrease the overall complexity of creating TaskDeploymentDescriptors for tasks from O(n²) to O(n).
To decrease the size of RPC messages and reduce the transmission of replicated data over the network, the cached ShuffleDescriptors can be compressed. For the example job we mentioned above, if the parallelisms of vertices are both 10k, then each downstream vertex has 10k ShuffleDescriptors. After compression, the size of the serialized value would be reduced by 72%.
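As a generic illustration of the serialize-once-and-compress idea (plain Java, not the scheduler's actual code), the descriptors shared by one group can be serialized and compressed a single time, and the resulting bytes reused for every downstream task:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.zip.GZIPOutputStream;

class ShuffleDescriptorCache {
    // One cached, compressed blob per group of identical descriptors.
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    byte[] getOrCompute(String groupId, Serializable descriptors) {
        return cache.computeIfAbsent(groupId, id -> serializeAndCompress(descriptors));
    }

    private static byte[] serializeAndCompress(Serializable descriptors) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            try (ObjectOutputStream out = new ObjectOutputStream(gzip)) {
                out.writeObject(descriptors);   // serialize once for the whole group
            }
            return bytes.toByteArray();         // compressed bytes reused for every consumer task
        } catch (IOException e) {
            throw new RuntimeException("Failed to serialize shuffle descriptors", e);
        }
    }
}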
Distribute ShuffleDescriptors via the blob server # A blob (binary large object) is a collection of binary data used to store large files. Flink hosts a blob server to transport large-sized data between the JobManager and TaskManagers. When a JobManager decides to transmit a large file to TaskManagers, it first stores the file in the blob server (which also uploads the file to the distributed file system) and gets a token representing the blob, called the blob key. It then transmits the blob key instead of the blob file to TaskManagers. When TaskManagers get the blob key, they retrieve the file from the distributed file system (DFS). The blobs are stored in the blob cache on TaskManagers so that they only need to retrieve the file once.
During task deployment, the JobManager is responsible for distributing the ShuffleDescriptors to TaskManagers via RPC messages. The messages will be garbage collected once they are sent. However, if the JobManager cannot send the messages as fast as they are created, these messages would take up a lot of space in heap memory and become a heavy burden for the garbage collector to deal with. There will be more long-term garbage collections that stop the world and slow down the task deployment.
To solve this problem, the blob server can be used to distribute large ShuffleDescriptors. The JobManager first sends ShuffleDescriptors to the blob server, which stores ShuffleDescriptors in the DFS. TaskManagers request ShuffleDescriptors from the DFS once they begin to process TaskDeploymentDescriptors. With this change, the JobManager doesn&rsquo;t need to keep all the copies of ShuffleDescriptors in heap memory until they are sent. Moreover, the frequency of garbage collections for large-scale jobs is significantly reduced. Also, task deployment will be faster since there will be no bottleneck during task deployment anymore, because the DFS provides multiple distributed nodes for TaskManagers to download the ShuffleDescriptors from.
Fig. 3 - How ShuffleDescriptors are distributed To avoid running out of space on the local disk, the cache will be cleared when the related partitions are no longer valid and a size limit is added for ShuffleDescriptors in the blob cache on TaskManagers. If the overall size exceeds the limit, the least recently used cached value will be removed. This ensures that the local disks on the JobManager and TaskManagers won&rsquo;t be filled up with ShuffleDescriptors, especially in session mode.
Optimizations when building pipelined regions # In Flink, there are two types of data exchanges: pipelined and blocking. When using blocking data exchanges, result partitions are first fully produced and then consumed by the downstream vertices. The produced results are persisted and can be consumed multiple times. When using pipelined data exchanges, result partitions are produced and consumed concurrently. The produced results are not persisted and can be consumed only once.
Since the pipelined data stream is produced and consumed simultaneously, Flink needs to make sure that the vertices connected via pipelined data exchanges execute at the same time. These vertices form a pipelined region. The pipelined region is the basic unit of scheduling and failover by default. During scheduling, all vertices in a pipelined region will be scheduled together, and all pipelined regions in the graph will be scheduled one by one topologically.
Fig. 4 - The LogicalPipelinedRegion and the SchedulingPipelinedRegion Currently, there are two types of pipelined regions in the scheduler: LogicalPipelinedRegion and SchedulingPipelinedRegion. The LogicalPipelinedRegion denotes the pipelined regions on the logical level. It consists of JobVertices and forms the JobGraph. The SchedulingPipelinedRegion denotes the pipelined regions on the execution level. It consists of ExecutionVertices and forms the ExecutionGraph. Like ExecutionVertices are derived from a JobVertex, SchedulingPipelinedRegions are derived from a LogicalPipelinedRegion, as Fig. 4 shows.
During the construction of pipelined regions, a problem arises: there may be cyclic dependencies between pipelined regions. A pipelined region can be scheduled if and only if all its dependencies have finished. However, if two pipelined regions have cyclic dependencies on each other, there is a scheduling deadlock: both wait for the other one to be scheduled first, and neither can be scheduled. Therefore, Tarjan’s strongly connected components algorithm is adopted to discover the cyclic dependencies between regions and merge them into one pipelined region. It traverses all the edges in the topology. For the all-to-all distribution pattern, the number of edges is O(n²). Thus, the computational complexity of this algorithm is O(n²), and it significantly slows down the initialization of the scheduler.
Fig. 5 - The topology with scheduling deadlock
To speed up the construction of pipelined regions, the correspondence between the logical topology and the scheduling topology can be leveraged. Since a SchedulingPipelinedRegion is derived from exactly one LogicalPipelinedRegion, Flink traverses all LogicalPipelinedRegions and converts them into SchedulingPipelinedRegions one by one. The conversion varies based on the distribution patterns of the edges that connect the vertices in the LogicalPipelinedRegion.
If there are any all-to-all distribution patterns inside the region, the entire region can be converted into one SchedulingPipelinedRegion directly. For an all-to-all edge with a pipelined data exchange, all the regions connected to this edge must execute simultaneously, which means they are merged into one region. For an all-to-all edge with a blocking data exchange, cyclic dependencies are introduced, as Fig. 5 shows, and all the regions it connects must be merged into one region to avoid a scheduling deadlock, as Fig. 6 shows. Since there is no need to run Tarjan’s algorithm, the computational complexity in this case is O(n).
If there are only pointwise distribution patterns inside a region, Tarjan’s strongly connected components algorithm is still used to ensure there are no cyclic dependencies. Since there are only pointwise distribution patterns, the number of edges in the topology is O(n), and the computational complexity of the algorithm is O(n).
Fig. 6 - How to convert a LogicalPipelinedRegion to ScheduledPipelinedRegions
After the optimization, the overall computational complexity of building pipelined regions decreases from O(n²) to O(n). In our experiments, for a job which contains two vertices connected with a blocking all-to-all edge, when their parallelisms are both 10K, the time for building pipelined regions decreases by 99%, from 8,257 ms to 120 ms.
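To summarize the two conversion paths in code, here is a hedged sketch; LogicalRegion, Vertex, and buildRegionsWithTarjan are simplified stand-ins for the internal classes described above, not Flink's actual implementation:

import java.util.List;
import java.util.Set;

interface Vertex {}

interface LogicalRegion {
    boolean hasAllToAllEdge();
    Set<Vertex> vertices();
}

final class RegionConverter {

    /** Converts one logical pipelined region into one or more scheduling regions. */
    static List<Set<Vertex>> convert(LogicalRegion region) {
        if (region.hasAllToAllEdge()) {
            // Any all-to-all edge (pipelined or blocking) means everything connected to it
            // must be merged into a single region, so no cycle detection is needed: O(n).
            return List.of(Set.copyOf(region.vertices()));
        }
        // Only pointwise edges remain, so the edge count is O(n) and running Tarjan's
        // strongly connected components algorithm also stays O(n).
        return buildRegionsWithTarjan(region);
    }

    static List<Set<Vertex>> buildRegionsWithTarjan(LogicalRegion region) {
        // Placeholder for merging vertices that form strongly connected components.
        return List.of(Set.copyOf(region.vertices()));
    }
}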
Summary # All in all, we have made several optimizations to improve the scheduler’s performance for large-scale jobs in Flink 1.13 and 1.14. The optimizations involve job initialization, scheduling, task deployment, and failover. If you have any questions about them, please feel free to start a discussion on the dev mailing list.
`}),e.add({id:99,href:"/2021/12/22/apache-flink-statefun-log4j-emergency-release/",title:"Apache Flink StateFun Log4j emergency release",section:"Flink Blog",content:`The Apache Flink community has released an emergency bugfix version of Apache Flink Stateful Function 3.1.1.
This release includes a version upgrade of Apache Flink to 1.13.5, which upgrades Log4j to address CVE-2021-44228 and CVE-2021-45046.
We highly recommend that all users upgrade to the latest patch release.
You can find the source and binaries on the updated Downloads page, and Docker images in the apache/flink-statefun dockerhub repository.
`}),e.add({id:100,href:"/2021/12/16/apache-flink-log4j-emergency-releases/",title:"Apache Flink Log4j emergency releases",section:"Flink Blog",content:`The Apache Flink community has released emergency bugfix versions of Apache Flink for the 1.11, 1.12, 1.13 and 1.14 series.
These releases only include a version upgrade for Log4j to address CVE-2021-44228 and CVE-2021-45046.
We highly recommend that all users upgrade to the respective patch release.
You can find the source and binaries on the updated Downloads page, and Docker images in the apache/flink dockerhub repository.
We are publishing this announcement earlier than usual to give users access to the updated source/binary releases as soon as possible. As a result, certain artifacts are not yet available:
Maven artifacts are currently being synced to Maven Central and will become available over the next 24 hours. The 1.11.6/1.12.7 Python binaries will be published at a later date. This post will be continuously updated to reflect the latest state.
The newly released versions are: 1.14.2, 1.13.5, 1.12.7, 1.11.6. To clarify and avoid confusion: the 1.14.1 / 1.13.4 / 1.12.6 / 1.11.5 releases, which were supposed to only contain a Log4j upgrade to 2.15.0, were skipped because CVE-2021-45046 was discovered during the release publication. Some artifacts were published to Maven Central, but no source/binary releases nor Docker images are available for those versions.
`}),e.add({id:101,href:"/2021/12/10/advise-on-apache-log4j-zero-day-cve-2021-44228/",title:"Advise on Apache Log4j Zero Day (CVE-2021-44228)",section:"Flink Blog",content:` Please see [this](/news/2021/12/16/log4j-patch-releases) for our updated recommendation regarding this CVE. Yesterday, a new Zero Day for Apache Log4j was reported. It is by now tracked under CVE-2021-44228.
Apache Flink is bundling a version of Log4j that is affected by this vulnerability. We recommend that users follow the advisory of the Apache Log4j community. For Apache Flink, this currently translates to setting the following property in your flink-conf.yaml:
env.java.opts: -Dlog4j2.formatMsgNoLookups=true
If you are already setting env.java.opts.jobmanager, env.java.opts.taskmanager, env.java.opts.client, or env.java.opts.historyserver, you should instead append the system property to those existing parameter lists.
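For example, if you already set env.java.opts.jobmanager and env.java.opts.taskmanager, the resulting flink-conf.yaml entries could look roughly like this (an illustration only; -XX:+UseG1GC stands in for whatever options you already configure):

env.java.opts.jobmanager: -XX:+UseG1GC -Dlog4j2.formatMsgNoLookups=true
env.java.opts.taskmanager: -XX:+UseG1GC -Dlog4j2.formatMsgNoLookups=true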
Once Log4j has been upgraded to 2.15.0 in Apache Flink, this workaround will no longer be necessary. This effort is tracked in FLINK-25240. It will be included in Flink 1.15.0, Flink 1.14.1 and Flink 1.13.3. We expect Flink 1.14.1 to be released in the next 1-2 weeks. The other releases will follow in their regular cadence.
`}),e.add({id:102,href:"/2021/11/03/flink-backward-the-apache-flink-retrospective/",title:"Flink Backward - The Apache Flink Retrospective",section:"Flink Blog",content:`It has now been a month since the community released Apache Flink 1.14 into the wild. We had a comprehensive look at the enhancements, additions, and fixups in the release announcement blog post, and now we will look at the development cycle from a different angle. Based on feedback collected from contributors involved in this release, we will explore the experiences and processes behind it all.
A retrospective on the release cycle # From the team, we collected emotions that have been attributed to points in time of the 1.14 release cycle:
The overall sentiment seems to be quite good. A ship crushed a robot two times, someone felt sick towards the end, an octopus causing negative emotions appeared in June&hellip;
We looked at the origin of these emotions and analyzed what went well and what could be improved. We also incorporated some feedback gathered from the community.
Problems faced # No release is perfect, and the community is constantly looking to improve.
Apache Flink has active contributors from around the globe, many of whom do not speak English as a first language. The community is still ironing out processes for delivering high-quality documentation and blog posts from a content perspective. It is a work in progress but we have contributors focusing on this component.
Each Flink release is built with the help of hundreds of contributors, each working on different parts of the project. Changes to one module may affect others in ways that are not always obvious. To maintain quality, the community supports an expansive test suite. Invariably, some tests are found to be flaky. Whenever we discover a test issue, the community opens a blocker issue that we must resolve before the next release. In practice, this leads to contributors triaging most test instabilities towards the end of each release cycle. From now on, we want to be more mindful of these failures and prioritize them when discovered.
Finally, the community pushed back the planned feature freeze for 1.14 by two weeks. Two weeks is an improvement over previous release cycles, but we hope to continue improving this metric for 1.15.
Things enjoyed # The implementation of some features, such as buffer debloating and fine-grained resource management, went smoothly. Though a few issues are now popping up as people begin using them in production, it is satisfying to see an engineering effort go according to plan.
We also said goodbye to some components, the old table planner and integrated Mesos support. As any developer will tell you, there&rsquo;s nothing better than deleting old code and reducing complexity.
What we want to achieve through process changes # Transparency - let the community participate # When approaching a release, usually a couple of weeks after the previous release has been done, we set up bi-weekly meetings for the community to discuss any issues regarding the release. The usefulness of those meetings varied a lot, and so we started to track the efforts in the Apache Flink Confluence wiki.
We came up with a system to label the current states of each feature: “independent”, “won’t make it”, “very unlikely”, “will make it”, “done”, and “done done”. We introduced the “done done” state since we lacked a shared understanding of the definition of done. To qualify for “done done”, the feature is manually tested by someone not involved in the implementation. Additionally, there must exist comprehensive documentation that enables users to use the feature.
After each meeting, we provided updates on the mailing list and created a corresponding burn-down chart. Those efforts have been well received by our contributors, although they might still require some improvement.
The meeting used to only be for those driving the primary efforts, but we opened it up to the whole community for this release. While nobody ended up joining, we will continue to make the meetings open to everyone.
Stability - reduce building and testing pain # At one point, as we were coming close to the feature freeze, the master branch became quite unstable. Although we have encountered this issue in the past, building and testing Flink under such conditions was not ideal.
As a result, we focused on reducing stability issues, and the release managers have tried to organize and manage this effort. In future development cycles, the whole community needs to focus on the stability of the master branch. There are already improvements in the making, and they will hopefully enhance the experience of contributing significantly.
Documentation - make it user-friendly # Coming back to Apache traditions, most of the documentation (if any) was still being pushed after the feature freeze. As mentioned before, documentation is required to achieve the level of &ldquo;done done&rdquo;. Going forward, we will keep more of an eye on pushing documentation earlier in the development process. Apache Flink is an amazing piece of software that can solve so many problems, but we can do so much more in improving the user experience and introducing it to a wider audience.
API consistency - a timeless, joyful experience # The issue of API consistency was not caused by the 1.14 release, but popped up during the development cycle nevertheless, including a bigger discussion on the mailing list. While we tried to be transparent about the stability guarantees of an API (there are no guarantees across major versions), this was not made very clear or easy to find. Since many users rely on PublicEvolving APIs (due to a lack of Public API additions), this resulted in problems for downstream projects.
Moving forward, we will document more clearly what the guarantees are and introduce a process for promoting PublicEvolving APIs. This might involve generating a report on any removed/modified PublicEvolving APIs during the release cycle so that downstream projects can prepare for the changes.
Some noteworthy items # The first iteration for the buffer debloat feature was done in a Hackathon.
Our Apache Flink 1.14 Release wiki page has 167 historic versions. For comparison, FLIP 147 (one of the most active FLIPs) has just 76.
With FLINK-2491, we closed the third most watched issue in the Apache Flink Jira. This makes sense since FLINK-2491 was created 6 years ago (August 6, 2015). The second oldest issue was created in 2017.
❤️
An open source community is more than just working on software. Apache Flink is the perfect example of software that is collaborated on in all parts of the world. The active mailing list, the discussions on FLIPs, and the interactions on Jira tickets all document how people work together to build something great. We should never forget that.
In the meantime, the community is already working towards Apache Flink 1.15. If you would like to become a contributor, please reach out via the dev mailing list. We are happy to help you find a ticket to get started on.
`}),e.add({id:103,href:"/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-one/",title:"Sort-Based Blocking Shuffle Implementation in Flink - Part One",section:"Flink Blog",content:`Part one of this blog post will explain the motivation behind introducing sort-based blocking shuffle, present benchmark results, and provide guidelines on how to use this new feature.
How data gets passed around between operators # Data shuffling is an important stage in batch processing applications and describes how data is sent from one operator to the next. In this phase, the output data of the upstream operator is spilled to persistent storage such as disk; the downstream operator then reads the corresponding data and processes it. Blocking shuffle means that intermediate results from operator A are not sent to operator B until operator A has completely finished.
The hash-based and sort-based blocking shuffle are two main blocking shuffle implementations widely adopted by existing distributed data processing frameworks:
Hash-Based Approach: The core idea behind the hash-based approach is to write data consumed by different consumer tasks to different files; each file can then serve as a natural boundary for the partitioned data.
Sort-Based Approach: The core idea behind the sort-based approach is to write all the produced data together first and then leverage sorting to cluster data belonging to different data partitions or even keys.
The sort-based blocking shuffle was introduced in Flink 1.12 and further optimized and made production-ready in 1.13 for both stability and performance. We hope you enjoy the improvements and any feedback is highly appreciated.
Motivation behind the sort-based implementation # The hash-based blocking shuffle has been supported in Flink for a long time. However, compared to the sort-based approach, it can have several weaknesses:
Stability: For batch jobs with high parallelism (tens of thousands of subtasks), the hash-based approach opens many files concurrently while writing or reading data, which puts high pressure on the file system (e.g., maintaining too much file metadata, exhausting inodes or file descriptors). We have encountered many stability issues when running large-scale batch jobs via the hash-based blocking shuffle.
Performance: For large-scale batch jobs, the hash-based approach can produce too many small files: for each data shuffle (or connection), the number of output files is (producer parallelism) * (consumer parallelism) and the average size of each file is (shuffle data size) / (number of files). The random IO caused by writing/reading these fragmented files can heavily influence the shuffle performance, especially on spinning disks. See the benchmark results section for more information.
By introducing the sort-based blocking shuffle implementation, fewer data files are created and opened, and reads are more sequential. As a result, better stability and performance can be achieved.
Moreover, the sort-based implementation can save network buffers for large-scale batch jobs. For the hash-based implementation, the network buffers needed for each output result partition are proportional to the consumers’ parallelism. For the sort-based implementation, the network memory consumption can be decoupled from the parallelism, which means that a fixed size of network memory can satisfy requests for all result partitions, though more network memory may lead to better performance.
Benchmark results on stability and performance # Aside from the problem of consuming too many file descriptors and inodes mentioned in the above section, the hash-based blocking shuffle also has a known issue of creating too many files which blocks the TaskExecutor’s main thread (FLINK-21201). In addition, some large-scale jobs like q78 and q80 of the tpc-ds benchmark failed to run on the hash-based blocking shuffle in our tests because of the “connection reset by peer” exception which is similar to the issue reported in FLINK-19925 (reading shuffle data by Netty threads can influence network stability).
We ran the tpc-ds test suite (10T scale with 1050 max parallelism) for both the hash-based and the sort-based blocking shuffle. The results show that the sort-based shuffle can achieve a 2-6x performance gain over the hash-based one on spinning disks. If we exclude the computation time, up to a 10x performance gain can be achieved for some jobs. Here are some performance results of our tests:
Jobs       Time used for Sort-Shuffle (s)   Time used for Hash-Shuffle (s)   Speedup Factor
q4.sql     986                              5371                             5.45
q11.sql    348                              798                              2.29
q14b.sql   883                              2129                             2.51
q17.sql    269                              781                              2.90
q23a.sql   418                              1199                             2.87
q23b.sql   376                              843                              2.24
q25.sql    413                              873                              2.11
q29.sql    354                              1038                             2.93
q31.sql    223                              498                              2.23
q50.sql    215                              550                              2.56
q64.sql    217                              442                              2.04
q74.sql    270                              962                              3.56
q75.sql    166                              713                              4.30
q93.sql    204                              540                              2.65

The throughput per disk of the new sort-based implementation can reach up to 160MB/s for both writing and reading on our testing nodes:

                       Disk SDI   Disk SDJ   Disk SDK
Writing Speed (MB/s)   189        173        186
Reading Speed (MB/s)   112        154        158

Note: The following table shows the settings for our test cluster. Because we have a large available memory size per node, those jobs of small shuffle size will exchange their shuffle data purely via memory (page cache). As a result, evident performance differences are seen only between those jobs which shuffle a large amount of data.

Number of Nodes   Memory Size Per Node   Cores Per Node   Disks Per Node
12                About 400G             96               3

How to use this new feature # The sort-based blocking shuffle is introduced mainly for large-scale batch jobs but it also works well for batch jobs with low parallelism.
The sort-based blocking shuffle is not enabled by default. You can enable it by setting the taskmanager.network.sort-shuffle.min-parallelism config option to a lower value: for parallelism smaller than this threshold, the hash-based blocking shuffle will be used; otherwise, the sort-based blocking shuffle will be used (this option has no influence on streaming applications). Setting this option to 1 will disable the hash-based blocking shuffle entirely.
For spinning disks and large-scale batch jobs, you should use the sort-based blocking shuffle. For low parallelism (several hundred processes or fewer) on solid state drives, both implementations should be fine.
There are several other config options that can have an impact on the performance of the sort-based blocking shuffle:
taskmanager.network.blocking-shuffle.compression.enabled: This enables shuffle data compression, which can reduce both the network and the disk IO with some CPU overhead. It is recommended to enable shuffle data compression unless the data compression ratio is low. It works for both sort-based and hash-based blocking shuffle.
taskmanager.network.sort-shuffle.min-buffers: This declares the minimum number of required network buffers that can be used as the in-memory sort-buffer per result partition for data caching and sorting. Increasing the value of this option may improve the blocking shuffle performance. Several hundreds of megabytes of memory is usually enough for large-scale batch jobs.
taskmanager.memory.framework.off-heap.batch-shuffle.size: This configuration defines the maximum memory size that can be used by data reading of the sort-based blocking shuffle per task manager. Increasing the value of this option may improve the shuffle read performance, and usually, several hundreds of megabytes of memory is enough for large-scale batch jobs. Because this memory is cut from the framework off-heap memory, you may also need to increase taskmanager.memory.framework.off-heap.size before you increase this value.
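As a starting point, a flink-conf.yaml for a large-scale batch job on spinning disks might combine these options roughly as follows. The concrete values are illustrative assumptions, not recommendations from this post; tune them for your workload (assuming the default 32 KB network buffer size, 2048 buffers correspond to about 64 MB of sort memory):

taskmanager.network.sort-shuffle.min-parallelism: 1
taskmanager.network.blocking-shuffle.compression.enabled: true
taskmanager.network.sort-shuffle.min-buffers: 2048
taskmanager.memory.framework.off-heap.batch-shuffle.size: 512m
taskmanager.memory.framework.off-heap.size: 1g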
For more information about blocking shuffle in Flink, please refer to the official documentation.
Note: From the optimization mechanism in part two, we can see that the IO scheduling relies on the concurrent data read requests of the downstream consumer tasks for more sequential reads. As a result, if the downstream consumer tasks run one by one (for example, because of limited resources), the advantage brought by IO scheduling disappears, which can hurt performance. We may further optimize this scenario in future versions.
What&rsquo;s next? # For details on the design and implementation of this feature, please refer to the second part of this blog!
`}),e.add({id:104,href:"/2021/10/26/sort-based-blocking-shuffle-implementation-in-flink-part-two/",title:"Sort-Based Blocking Shuffle Implementation in Flink - Part Two",section:"Flink Blog",content:`Part one of this blog post explained the motivation behind introducing sort-based blocking shuffle, presented benchmark results, and provided guidelines on how to use this new feature.
Like sort-merge shuffle implemented by other distributed data processing frameworks, the whole sort-based shuffle process in Flink consists of several important stages, including collecting data in memory, sorting the collected data in memory, spilling the sorted data to files, and reading the shuffle data from these spilled files. However, Flink’s implementation has some core differences, including the multiple data region file structure, the removal of file merge, and IO scheduling.
In part two of this blog post, we will give you insight into some core design considerations and implementation details of the sort-based blocking shuffle in Flink and list several ideas for future improvement.
Design considerations # There are several core objectives we want to achieve for the new sort-based blocking shuffle to be implemented in Flink:
Produce fewer (small) files # As discussed above, the hash-based blocking shuffle would produce too many small files for large-scale batch jobs. Producing fewer files can help to improve both stability and performance. The sort-merge approach has been widely adopted to solve this problem. By first writing to the in-memory buffer and then sorting and spilling the data into a file after the in-memory buffer is full, the number of output files can be reduced, which becomes (total data size) / (in-memory buffer size). Then by merging the produced files together, the number of files can be further reduced and larger data blocks can provide better sequential reads.
Flink’s sort-based blocking shuffle adopts a similar logic. A core difference is that data spilling will always append data to the same file so only one file will be spilled for each output, which means fewer files are produced.
Open fewer files concurrently # The hash-based implementation will open all partition files when writing and reading data which will consume resources like file descriptors and native memory. Exhaustion of file descriptors will lead to stability issues like &ldquo;too many open files&rdquo;.
By always writing/reading only one file per data result partition and sharing the same opened file channel among all the concurrent data reads from the downstream consumer tasks, Flink’s sort-based blocking shuffle implementation can greatly reduce the number of concurrently opened files.
Create more sequential disk IO # Although the hash-based implementation writes and reads each output file sequentially, the large amount of writing and reading can cause random IO because of the large number of files being processed concurrently, which means that reducing the number of files can also achieve more sequential IO.
In addition to producing larger files, there are some other optimizations implemented by Flink. In the data writing phase, by merging small outputs into larger batches and writing them through the writev system call, more sequential write IO can be achieved. In the data reading phase, more sequential read IO is achieved by IO scheduling. In short, Flink always tries to read data in file offset order, which maximizes sequential reads. Please refer to the IO scheduling section for more information.
Have less disk IO amplification # The sort-merge approach can reduce the number of files and produce larger data blocks by merging the spilled data files together. One downside of this approach is that it writes and reads the same data multiple times because of the data merging; theoretically, it may also take up more storage space than the total size of the shuffle data.
Flink’s implementation eliminates the data merging phase by spilling all data of one result partition together into one file. As a result, both the total amount of disk IO and the required storage space are reduced. Without data merging, however, the data blocks are not combined into larger ones. With the IO scheduling technique, Flink can still achieve good sequential reading and high disk IO throughput, as the benchmark results in part one show.
Decouple memory consumption from parallelism # Similar to the sort-merge implementation in other distributed data processing systems, Flink’s implementation uses a piece of fixed size (configurable) in-memory buffer for data sorting and the buffer does not necessarily need to be extended after the task parallelism is changed, though increasing the size may lead to better performance for large-scale batch jobs.
Note: This only decouples the memory consumption from the parallelism at the data producer side. On the data consumer side, there is an improvement which works for both streaming and batch jobs (see FLINK-16428).
Implementation details # Here are several core components and algorithms implemented in Flink’s sort-based blocking shuffle:
In-memory sort # In the sort-spill phase, data records are serialized to the in-memory sort buffer first. When the sort buffer is full or all output has been finished, the data in the sort buffer will be copied and spilled into the target data file in the specific order. The following is the sort buffer interface in Flink:
public interface SortBuffer {

    /** Appends data of the specified channel to this SortBuffer. */
    boolean append(ByteBuffer source, int targetChannel, Buffer.DataType dataType) throws IOException;

    /** Copies data in this SortBuffer to the target MemorySegment. */
    BufferWithChannel copyIntoSegment(MemorySegment target);

    long numRecords();

    long numBytes();

    boolean hasRemaining();

    void finish();

    boolean isFinished();

    void release();

    boolean isReleased();
}

Currently, Flink does not need to sort records by key on the data producer side, so the default implementation of the sort buffer only sorts data by subpartition index, which is achieved by binary bucket sort. More specifically, each data record is serialized with a 16-byte binary header attached. Of the 16 bytes, 4 bytes store the record length, 4 bytes store the data type (event or data buffer), and 8 bytes store a pointer to the next record belonging to the same subpartition, i.e., the next record to be consumed by the same downstream data consumer. When reading data from the sort buffer, all records of the same subpartition are copied one by one following the pointers in the record headers, which guarantees that, for each subpartition, records are read and spilled in the same order as they were emitted by the producer task. The following picture shows the internal structure of the in-memory binary sort buffer:
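To make the header layout concrete, a minimal sketch of writing such a 16-byte header could look like the following; the field order and helper names are assumptions for illustration, not Flink's exact binary format:

import java.nio.ByteBuffer;

final class SortBufferRecordHeader {

    static final int HEADER_SIZE = 16;

    /** Writes the 16-byte header described above in front of a serialized record. */
    static void write(ByteBuffer buffer, int recordLength, int dataType, long nextRecordOfSameSubpartition) {
        buffer.putInt(recordLength);                  // 4 bytes: length of the serialized record
        buffer.putInt(dataType);                      // 4 bytes: event or data buffer
        buffer.putLong(nextRecordOfSameSubpartition); // 8 bytes: pointer to the next record of the same subpartition
    }
}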
Storage structure # The data of each blocking result partition is stored as a physical data file on the disk. The data file consists of multiple data regions; one data spill produces one data region. In each data region, the data is clustered by subpartition ID (index), and each subpartition corresponds to one data consumer.
The following picture shows the structure of a simple data file. This data file has three data regions (R1, R2, R3) and three consumers (C1, C2, C3). Data blocks B1.1, B2.1 and B3.1 will be consumed by C1, data blocks B1.2, B2.2 and B3.2 will be consumed by C2, and data blocks B1.3, B2.3 and B3.3 will be consumed by C3.
In addition to the data file, for each result partition there is also an index file which contains pointers into the data file. The index file has the same number of regions as the data file. In each region, there are n index entries (n equals the number of subpartitions). Each index entry consists of two parts: the file offset of the target data in the data file and the data size. To reduce the disk IO caused by index file access, Flink caches the index data in unmanaged heap memory if the index file is smaller than 4 MB. The following picture illustrates the relationship between the index file and the data file:
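As an illustration of how such an index can be used, the sketch below locates one consumer's data block in one region; the assumption of an 8-byte offset plus an 8-byte size per entry is ours for readability and should not be read as Flink's actual on-disk format:

import java.nio.ByteBuffer;

final class IndexLookup {

    // Assumption for illustration: one index entry = 8-byte file offset + 8-byte data size.
    static final int INDEX_ENTRY_SIZE = 16;

    /** Returns {fileOffset, dataSize} for the given region and subpartition. */
    static long[] locate(ByteBuffer indexData, int region, int subpartition, int numSubpartitions) {
        int entryPosition = (region * numSubpartitions + subpartition) * INDEX_ENTRY_SIZE;
        long fileOffset = indexData.getLong(entryPosition);
        long dataSize = indexData.getLong(entryPosition + 8);
        return new long[] {fileOffset, dataSize};
    }
}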
IO scheduling # Based on the storage structure described above, we introduced the IO scheduling technique to achieve more sequential reads for the sort-based blocking shuffle in Flink. The core idea behind IO scheduling is pretty simple. Just like the elevator algorithm for disk scheduling, the IO scheduling for sort-based blocking shuffle always tries to serve data read requests in the file offset order. More formally, we have n data regions indexed from 0 to n-1 in a result partition file. In each data region, there are m data subpartitions to be consumed by m downstream data consumers. These data consumers read data concurrently.
// let data_regions be the list of data regions, indexed from 0 to n - 1
// let data_readers be the queue of concurrent downstream data readers, indexed from 0 to m - 1
for (data_region in data_regions) {
    data_reader = poll_reader_of_the_smallest_file_offset(data_readers);
    if (data_reader == null)
        break;
    reading_buffers = request_reading_buffers();
    if (reading_buffers.isEmpty())
        break;
    read_data(data_region, data_reader, reading_buffers);
}

Broadcast optimization # Shuffle data broadcast in Flink refers to sending the same collection of data to all the downstream data consumers. Instead of copying and writing the same data multiple times, Flink optimizes this process by copying and spilling the broadcast data only once, which improves the data broadcast performance.
More specifically, when broadcasting a data record to the sort buffer, the record will be copied and stored once. A similar thing happens when spilling the broadcast data into files. For index data, the only difference is that all the index entries for different downstream consumers point to the same data in the data file.
Data compression # Data compression is a simple but really useful technique to improve blocking shuffle performance. Similar to the data compression implementation of the hash-based blocking shuffle, data is compressed per buffer after it is copied from the in-memory sort buffer and before it is spilled to disk. If the data size becomes even larger after compression, the original uncompressed data buffer will be kept. Then the corresponding downstream data consumers are responsible for decompressing the received shuffle data when processing it. In fact, the sort-based blocking shuffle reuses those building blocks implemented for the hash-based blocking shuffle directly. The following picture illustrates the shuffle data compression process:
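The rule of keeping the original buffer when compression does not help can be sketched as follows; Compressor is a hypothetical interface standing in for Flink's internal buffer compressor:

interface Compressor {
    byte[] compress(byte[] input);
}

final class MaybeCompress {

    /** Compresses one buffer, falling back to the original bytes if compression does not shrink it. */
    static byte[] apply(Compressor compressor, byte[] buffer) {
        byte[] compressed = compressor.compress(buffer);
        return compressed.length < buffer.length ? compressed : buffer;
    }
}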
Future improvements # TCP Connection Reuse: This improvement is also useful for streaming applications which can improve the network stability. There are already tickets opened for it: FLINK-22643 and FLINK-15455.
Multi-Disks Load Balance: In production environments, there are usually multiple disks per node; better load balancing can lead to better performance. The relevant issues are FLINK-21790 and FLINK-21789.
External/Remote Shuffle Service: Implementing an external/remote shuffle service can further improve the shuffle IO performance because, as a centralized service, it can collect more information and make more optimized decisions, for example, further merging of data going to the same downstream task, better node-level load balancing, handling of stragglers, shared resources, and so on. There are several relevant issues: FLINK-13247, FLINK-22672, FLINK-19551 and FLINK-10653.
Enable the Choice of SSD/HDD: In production environments, there are usually both SSD and HDD storage. Some jobs may prefer SSD for the faster speed, some jobs may prefer HDD for larger space and cheaper price. Enabling the choice of SSD/HDD can improve the usability of Flink’s blocking shuffle.
`}),e.add({id:105,href:"/2021/10/19/apache-flink-1.13.3-released/",title:"Apache Flink 1.13.3 Released",section:"Flink Blog",content:`The Apache Flink community released the third bugfix version of the Apache Flink 1.13 series.
This release includes 136 fixes and minor improvements for Flink 1.13.2. The list below includes bugfixes and improvements. For a complete list of all changes see: JIRA.
We highly recommend that all users upgrade to Flink 1.13.3.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.13.3</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.13.3</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.13.3</version>
</dependency>

You can find the binaries on the updated Downloads page.
Below you can find more information on changes that might affect the behavior of Flink:
Propagate unique keys for fromChangelogStream (FLINK-24033) # StreamTableEnvironment.fromChangelogStream might produce a different stream because primary keys were not properly considered before.
Table API &lsquo;Primary Key&rsquo; feature was not working correctly (FLINK-23895 FLINK-20374) # Various primary key issues have been fixed that effectively made it impossible to use this feature. The change might affect savepoint backwards compatibility for affected pipelines. Pipelines that were not affected should be able to restore from a savepoint without issues. The resulting changelog stream might be different after these changes.
Clarify SourceFunction#cancel() contract about interrupting (FLINK-23527) # The contract of the SourceFunction#cancel() method with respect to interruptions has been clarified:
The source itself should not interrupt the source thread. The source can expect not to be interrupted during a clean cancellation procedure.
taskmanager.slot.timeout falls back to akka.ask.timeout (FLINK-22002) # The config option taskmanager.slot.timeout now falls back to akka.ask.timeout if no value has been configured.
Increase akka.ask.timeout for tests using the MiniCluster (FLINK-23906) # The default akka.ask.timeout used by the MiniCluster has been increased to 5 minutes. If you want to use a smaller value, then you have to set it explicitly in the passed configuration. The change is due to the fact that messages cannot get lost in a single-process minicluster, so this timeout (which otherwise helps to detect message loss in distributed setups) has no benefit here. The increased timeout reduces the number of false-positive timeouts, for example during heavy tests on loaded CI/CD workers or during debugging.
`}),e.add({id:106,href:"/2021/09/29/apache-flink-1.14.0-release-announcement/",title:"Apache Flink 1.14.0 Release Announcement",section:"Flink Blog",content:`The Apache Software Foundation recently released its annual report and Apache Flink once again made it on the list of the top 5 most active projects! This remarkable activity also shows in the new 1.14.0 release. Once again, more than 200 contributors worked on over 1,000 issues. We are proud of how this community is consistently moving the project forward.
This release brings many new features and improvements in areas such as the SQL API, more connector support, checkpointing, and PyFlink. A major area of changes in this release is the integrated streaming &amp; batch experience. We believe that, in practice, unbounded stream processing goes hand-in-hand with bounded- and batch processing tasks, because many use cases require processing historic data from various sources alongside streaming data. Examples are data exploration when developing new applications, bootstrapping state for new applications, training models to be applied in a streaming application, or re-processing data after fixes/upgrades.
In Flink 1.14, we finally made it possible to mix bounded and unbounded streams in an application: Flink now supports taking checkpoints of applications that are partially running and partially finished (some operators reached the end of the bounded inputs). Additionally, bounded streams now take a final checkpoint when reaching their end to ensure smooth committing of results in sinks.
The batch execution mode now supports programs that use a mixture of the DataStream API and the SQL/Table API (previously only pure Table/SQL or DataStream programs).
The unified Source and Sink APIs have gotten an update, and we started consolidating the connector ecosystem around the unified APIs. We added a new hybrid source that can bridge between multiple storage systems. You can now do things like read old data from Amazon S3 and then switch over to Apache Kafka.
In addition, this release furthers our initiative in making Flink more self-tuning and easier to operate, without necessarily requiring a lot of Stream-Processor-specific knowledge. We started this initiative in the previous release with reactive scaling and are now adding automatic network memory tuning (a.k.a. Buffer Debloating). This feature speeds up checkpoints under high load while maintaining high throughput and without increasing checkpoint size. The mechanism continuously adjusts the network buffers to ensure the best throughput while having minimal in-flight data. See the Buffer Debloating section for more details.
There are many more improvements and new additions throughout various components, as we discuss below. We also had to say goodbye to some features that have been superseded by newer ones in recent releases; most prominently, we are removing the old SQL execution engine and the active integration with Apache Mesos.
We hope you like the new release and we are eager to learn about your experience with it: which previously unsolved problems it solves and which new use cases it unlocks for you.
The Unified Batch and Stream Processing Experience # One of Flink&rsquo;s unique characteristics is how it integrates stream- and batch processing, using unified APIs and a runtime that supports multiple execution paradigms.
As motivated in the introduction, we believe that stream- and batch processing always go hand in hand. This quote from a report on Facebook’s streaming infrastructure echoes this sentiment nicely.
Streaming versus batch processing is not an either/or decision. Originally, all data warehouse processing at Facebook was batch processing. We began developing Puma and Swift about five years ago. As we showed in Section [&hellip;], using a mix of streaming and batch processing can speed up long pipelines by hours.
Having both the real-time and the historic computations in the same engine also ensures consistency between semantics and makes results well comparable. Here is an article by Alibaba about unifying business reporting with Apache Flink and getting consistent reports that way.
While unified streaming &amp; batch are already possible in earlier versions, this release brings some features that unlock new use cases, as well as a series of quality-of-life improvements.
Checkpointing and Bounded Streams # Flink&rsquo;s checkpointing mechanism could originally only create checkpoints when all tasks in an application&rsquo;s DAG were running. This meant that applications using both bounded and unbounded data sources were not really possible. In addition, applications on bounded inputs that were executed in a streaming way (not in a batch way) stopped checkpointing towards the end of the processing, when some tasks finished. Without checkpoints, the latest output data was not committed, resulting in lingering data for exactly-once sinks.
With FLIP-147 Flink now supports checkpoints after tasks are finished, and takes a final checkpoint at the end of a bounded stream, ensuring that all sink data is committed before the job ends (similar to how stop-with-savepoint behaves).
To activate this feature, add execution.checkpointing.checkpoints-after-tasks-finish.enabled: true to your configuration. Keeping with the opt-in tradition for big and new features, this is not activated by default in Flink 1.14. We expect it to become the default mode in the next release.
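If you prefer setting the option programmatically, a minimal sketch (assuming Flink 1.14 and the configuration key quoted above) could look like this:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FinalCheckpointsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("execution.checkpointing.checkpoints-after-tasks-finish.enabled", true);

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(conf);
        env.enableCheckpointing(10_000); // take a checkpoint every 10 seconds
        // ... define sources, transformations and sinks, then call env.execute(...)
    }
}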
Background: While the batch execution mode is often the preferable way to run applications over bounded streams, there are various reasons to use streaming execution mode over bounded streams. For example, the sink being used might only support streaming execution (e.g., the Kafka sink), or you may want to exploit the streaming-inherent quasi-ordering-by-time in your application, as motivated by the Kappa+ Architecture.
Batch Execution for mixed DataStream and Table/SQL Applications # SQL and the Table API are becoming the default starting points for new projects. The declarative nature and richness of built-in types and operations make it easy to develop applications fast. It is not uncommon, however, for developers to eventually hit the limit of SQL&rsquo;s expressiveness for certain types of event-driven business logic (or hit the point when it becomes grotesque to express that logic in SQL).
At that point, the natural step is to blend in a piece of stateful DataStream API logic, before switching back to SQL again.
In Flink 1.14, bounded batch-executed SQL/Table programs can convert their intermediate Tables to a DataStream, apply some DataStream API operations, and convert the result back to a Table. Under the hood, Flink builds a dataflow DAG mixing declarative optimized SQL execution with batch-executed DataStream logic. Check out the documentation for details.
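A rough sketch of that back-and-forth (assuming Flink 1.14, a registered table named orders, and trivial placeholder logic) might look like this:

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class MixedBatchJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setRuntimeMode(RuntimeExecutionMode.BATCH); // bounded inputs, batch-executed
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        Table orders = tEnv.sqlQuery("SELECT * FROM orders");       // declarative SQL part
        DataStream<Row> rows = tEnv.toDataStream(orders);           // switch to the DataStream API

        DataStream<Row> filtered = rows.filter(row -> row != null); // custom DataStream logic goes here

        Table result = tEnv.fromDataStream(filtered);               // and back to a Table
        tEnv.createTemporaryView("filtered_orders", result);
    }
}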
Hybrid Source # The new Hybrid Source produces a combined stream from multiple sources, by reading those sources one after the other, seamlessly switching over from one source to the other.
The motivating use case for the Hybrid Source was to read streams from tiered storage setups as if there was one stream that spans all tiers. For example, new data may land in Kafka and is eventually migrated to S3 (typically in compressed columnar format, for cost efficiency and performance). The Hybrid Source can read this as one contiguous logical stream, starting with the historic data on S3 and transitioning over to the more recent data in Kafka.
We believe that this is an exciting step in realizing the full promise of logs and the Kappa Architecture. Even if older parts of an event log are physically migrated to different storage (for reasons such as cost, better compression, faster reads) you can still treat and process it as one contiguous log.
Flink 1.14 adds the core functionality of the Hybrid Source. Over the next releases, we expect to add more utilities and patterns for typical switching strategies.
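In code, the switch-over boils down to a builder call. The sketch below assumes Flink 1.14 with the FileSource and KafkaSource connectors already configured elsewhere (their construction is omitted):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.connector.base.source.hybrid.HybridSource;
import org.apache.flink.connector.file.src.FileSource;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HybridSourceSketch {

    /** Reads the bounded history first (e.g. files on S3), then switches to the Kafka tail. */
    static DataStream<String> historicThenLive(
            StreamExecutionEnvironment env,
            FileSource<String> fileSource,
            KafkaSource<String> kafkaSource) {

        HybridSource<String> hybridSource =
                HybridSource.builder(fileSource).addSource(kafkaSource).build();

        return env.fromSource(hybridSource, WatermarkStrategy.noWatermarks(), "hybrid-source");
    }
}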
Consolidating Sources and Sinks # With the new unified (streaming/batch) source and sink APIs now being stable, we started the big effort to consolidate all connectors around those APIs. At the same time, we are better aligning connectors between the DataStream and SQL/Table API. First are the Kafka and File Sources and Sinks for the DataStream API.
The result of this effort (which we expect to span at least 1-2 further releases) will be a much smoother and more consistent experience for Flink users when connecting to external systems.
Improvements to Operations # Buffer debloating # Buffer Debloating is a new technology in Flink that minimizes checkpoint latency and cost. It does so by automatically tuning the usage of network memory to ensure high throughput, while minimizing the amount of in-flight data.
Apache Flink buffers a certain amount of data in its network stack to be able to utilize the bandwidth of fast networks. A Flink application running with high throughput uses some (or all) of that memory. Aligned checkpoints flow with the data through the network buffers in milliseconds.
During (temporary) backpressure from a resource bottleneck such as an external system, data skew, or (temporarily) increased load, Flink was buffering a lot more data inside its network buffers than necessary to utilize enough network bandwidth for the application&rsquo;s current – backpressured – throughput. This actually has an adverse effect because more buffered data means that the checkpoints need to do more work. Aligned checkpoint barriers need to wait for more data to be processed, unaligned checkpoints need to persist more in-flight data.
This is where Buffer Debloating comes into play: It changes the network stack from keeping up to X bytes of data to keeping data that is worth X milliseconds of receiver computing time. With the default setting of 1000 milliseconds, that means the network stack will buffer as much data as the receiving task can process in 1000 milliseconds. These values are constantly measured and adjusted, so the system keeps this characteristic even under varying conditions. As a result, Flink can now provide stable and predictable alignment times for aligned checkpoints under backpressure, and can vastly reduce the amount of in-flight data stored in unaligned checkpoints under backpressure.
Buffer Debloating acts as a complementary feature, or even alternative, to unaligned checkpoints. Check out the documentation to see how to activate this feature.
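As a hedged pointer (double-check the exact keys against the documentation), activating buffer debloating in Flink 1.14 is expected to look roughly like this in flink-conf.yaml:

taskmanager.network.memory.buffer-debloat.enabled: true
taskmanager.network.memory.buffer-debloat.target: 1s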
Fine-grained Resource Management # Fine-grained resource management is an advanced new feature that increases the resource utilization of large shared clusters.
Flink clusters execute various data processing workloads. Different data processing steps typically need different resources such as compute resources and memory. For example, most map() functions are fairly lightweight, but large windows with long retention can benefit from lots of memory. By default, Flink manages resources in coarse-grained units called slots, which are slices of a TaskManager&rsquo;s resources. Streaming pipelines fill a slot with one parallel subtask of each operator, so each slot holds a pipeline of subtasks. Through &lsquo;slot sharing groups&rsquo;, users can influence how subtasks are assigned to slots.
With fine-grained resource management, TaskManager slots can now be dynamically sized. Transformations and operators specify what resource profiles they would like (CPU size, memory pools, disk space) and Flink&rsquo;s Resource Manager and TaskManagers slice off that specific part of a TaskManager&rsquo;s total resources. You can think of it as a minimal lightweight resource orchestration layer within Flink. The figure below illustrates the difference between the current default mode of shared fixed-size slots and the new fine-grained resource management feature.
You may be wondering why we implement such a feature in Flink, when we also integrate with full-fledged resource orchestration frameworks like Kubernetes or YARN. There are several situations where the additional resource management layer within Flink significantly increases the resource utilization:
For many small slots, the overhead of dedicated TaskManagers is very high (JVM overhead, Flink control data structures). Slot-sharing implicitly works around this by sharing the slots between all operator types, which means sharing resources between lightweight operators (which need small slots) and heavyweight operators (which need large slots). However, this only works well when all operators share the same parallelism, which is not always optimal. Furthermore, certain operators work better when run in isolation (for example, ML training operators that need dedicated GPU resources).
Kubernetes and YARN often take quite some time to fulfill requests, especially on loaded clusters. For many batch jobs, efficiency gets lost while waiting for the requests to be fulfilled.
So when should you use this feature? For most streaming and batch jobs, the default resource management mechanism is perfectly suitable. Fine-grained resource management can help you increase resource efficiency if you have either long-running streaming jobs or fast batch jobs where different stages have different resource requirements, and where you may have already tuned the parallelism of different operators to different values.
Alibaba&rsquo;s internal Flink-based platform has used this mechanism for some time now and the resource utilization of the cluster has improved significantly.
Please refer to the Fine-grained Resource Management documentation for details on how to use this feature.
Connectors # Connector Metrics # Metrics for connectors have been standardized in this release (see FLIP-33). The community will gradually pull metrics through all connectors, as we rework them onto the new unified APIs over the next releases. In Flink 1.14, we cover the Kafka connector and (partially) the FileSystem connectors.
Connectors are the entry and exit points for data in a Flink job. If a job is not running as expected, the connector telemetry is among the first parts to be checked. We believe this will become a nice improvement when operating Flink applications in production.
Pulsar Connector # In this release, Flink added the Apache Pulsar connector. The Pulsar connector reads data from Pulsar topics and supports both streaming and batch execution modes. With the support of the transaction functionality (introduced in Pulsar 2.8.0), the Pulsar connector provides exactly-once delivery semantics to ensure that a message is delivered exactly once to a consumer, even if a producer retries sending that message.
To support the different message-ordering and scaling requirements of different use cases, the Pulsar source connector exposes four subscription types:
Exclusive
Shared
Failover
Key-Shared
The connector currently supports the DataStream API. Table API/SQL bindings are expected to be contributed in a future release. For details about how to use the Pulsar connector, see Apache Pulsar Connector.
PyFlink # Performance Improvements through Chaining # Similar to how the Java APIs chain transformation functions/operators within a task to avoid serialization overhead, PyFlink now chains Python functions. In PyFlink&rsquo;s case, the chaining not only eliminates serialization overhead, but also reduces RPC round trips between the Java and Python processes. This provides a significant boost to PyFlink&rsquo;s overall performance.
Python function chaining was already available for Python UDFs used in the Table API &amp; SQL. In Flink 1.14, chaining is also exploited for the cPython functions in Python DataStream API.
Loopback Mode for Debugging # Python functions are normally executed in a separate Python process next to Flink&rsquo;s JVM. This architecture makes it difficult to debug Python code.
PyFlink 1.14 introduces a loopback mode, which is activated by default for local deployments. In this mode, user-defined Python functions will be executed in the Python process of the client, which is the entry point process that starts the PyFlink program and contains the DataStream API and Table API code that builds the dataflow DAG. Users can now easily debug their Python functions by setting breakpoints in their IDEs when launching a PyFlink job locally.
Miscellaneous Improvements # There are also many other improvements to PyFlink, such as support for executing jobs in YARN application mode and support for compressed tgz files as Python archives. Check out the Python API documentation for more details.
Goodbye Legacy SQL Engine and Mesos Support # Maintaining an open source project also means sometimes saying good-bye to some beloved features.
When we added the Blink SQL Engine to Flink more than two years ago, it was clear that it would eventually replace the previous SQL engine. Blink was faster and more feature-complete. For a year now, Blink has been the default SQL engine. With Flink 1.14 we finally remove all code from the previous SQL engine. This allowed us to drop many outdated interfaces and reduce confusion for users about which interfaces to use when implementing custom connectors or functions. It will also help us in the future to make faster changes to the SQL engine.
The active integration with Apache Mesos was also removed, because we saw little interest by users in this feature and we could not gather enough contributors willing to help maintaining this part of the system. Flink 1.14 can no longer run on Mesos without the help of projects like Marathon, and the Flink Resource Manager can no longer request and release resources from Mesos for workloads with changing resource requirements.
Upgrade Notes # While we aim to make upgrades as smooth as possible, some of the changes require users to adjust some parts of the program when upgrading to Apache Flink 1.14. Please take a look at the release notes for a list of adjustments to make and issues to check during upgrades.
List of Contributors # The Apache Flink community would like to thank each one of the contributors that have made this release possible:
adavis9592, Ada Wong, aidenma, Aitozi, Ankush Khanna, anton, Anton Kalashnikov, Arvid Heise, Ashwin Kolhatkar, Authuir, bgeng777, Brian Zhou, camile.sing, caoyingjie, Cemre Mengu, chennuo, Chesnay Schepler, chuixue, CodeCooker17, comsir, Daisy T, Danny Cranmer, David Anderson, David Moravek, Dawid Wysakowicz, dbgp2021, Dian Fu, Dong Lin, Edmondsky, Elphas Toringepi, Emre Kartoglu, ericliuk, Eron Wright, est08zw, Etienne Chauchot, Fabian Paul, fangliang, fangyue1, fengli, Francesco Guardiani, FuyaoLi2017, fuyli, Gabor Somogyi, gaoyajun02, Gen Luo, gentlewangyu, GitHub, godfrey he, godfreyhe, gongzhongqiang, Guokuai Huang, GuoWei Ma, Gyula Fora, hackergin, hameizi, Hang Ruan, Han Wei, hapihu, hehuiyuan, hstdream, Huachao Mao, HuangXiao, huangxingbo, huxixiang, Ingo Bürk, Jacklee, Jan Brusch, Jane, Jane Chan, Jark Wu, JasonLee, Jiajie Zhong, Jiangjie (Becket) Qin, Jianzhang Chen, Jiayi Liao, Jing, Jingsong Lee, JingsongLi, Jing Zhang, jinxing64, junfan.zhang, Jun Qin, Jun Zhang, kanata163, Kevin Bohinski, kevin.cyj, Kevin Fan, Kurt Young, kylewang, Lars Bachmann, lbb, LB Yu, LB-Yu, LeeJiangchuan, Leeviiii, leiyanfei, Leonard Xu, LightGHLi, Lijie Wang, liliwei, lincoln lee, Linyu, liuyanpunk, lixiaobao14, luoyuxia, Lyn Zhang, lys0716, MaChengLong, mans2singh, Marios Trivyzas, martijnvisser, Matthias Pohl, Mayi, mayue.fight, Michael Li, Michal Ciesielczyk, Mika, Mika Naylor, MikuSugar, movesan, Mulan, Nico Kruber, Nicolas Raga, Nicolaus Weidner, paul8263, Paul Lin, pierre xiong, Piotr Nowojski, Qingsheng Ren, Rainie Li, Robert Metzger, Roc Marshal, Roman, Roman Khachatryan, Rui Li, sammieliu, sasukerui, Senbin Lin, Senhong Liu, Serhat Soydan, Seth Wiesman, sharkdtu, Shengkai, Shen Zhu, shizhengchao, Shuo Cheng, shuo.cs, simenliuxing, sjwiesman, Srinivasulu Punuru, Stefan Gloutnikov, SteNicholas, Stephan Ewen, sujun, sv3ndk, Svend Vanderveken, syhily, Tartarus0zm, Terry Wang, Thesharing, Thomas Weise, tiegen, Till Rohrmann, Timo Walther, tison, Tony Wei, trushev, tsreaper, TsReaper, Tzu-Li (Gordon) Tai, wangfeifan, wangwei1025, wangxianghu, wangyang0918, weizheng92, Wenhao Ji, Wenlong Lyu, wenqiao, WilliamSong11, wuren, wysstartgo, Xintong Song, yanchenyun, yangminghua, yangqu, Yang Wang, Yangyang ZHANG, Yangze Guo, Yao Zhang, yfhanfei, yiksanchan, Yik San Chan, Yi Tang, yljee, Youngwoo Kim, Yuan Mei, Yubin Li, Yufan Sheng, yulei0824, Yun Gao, Yun Tang, yuxia Luo, Zakelly, zhang chaoming, zhangjunfan, zhangmang, zhangzhengqi3, zhao_wei_nan, zhaown, zhaoxing, ZhiJie Yang, Zhilong Hong, Zhiwen Sun, Zhu Zhu, zlzhang0122, zoran, Zor X. LIU, zoucao, Zsombor Chikan, 子扬, 莫辞
`}),e.add({id:107,href:"/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-one/",title:"Implementing a Custom Source Connector for Table API and SQL - Part One ",section:"Flink Blog",content:`Part one of this tutorial will teach you how to build and run a custom source connector to be used with Table API and SQL, two high-level abstractions in Flink. The tutorial comes with a bundled docker-compose setup that lets you easily run the connector. You can then try it out with Flink’s SQL client.
Introduction # Apache Flink is a data processing engine that aims to keep state locally in order to do computations efficiently. However, Flink does not &ldquo;own&rdquo; the data but relies on external systems to ingest and persist data. Connecting to external data input (sources) and external data storage (sinks) is usually summarized under the term connectors in Flink.
Since connectors are such important components, Flink ships with connectors for some popular systems. But sometimes you may need to read in an uncommon data format and what Flink provides is not enough. This is why Flink also provides extension points for building custom connectors if you want to connect to a system that is not supported by an existing connector.
Once you have a source and a sink defined for Flink, you can use its declarative APIs (in the form of the Table API and SQL) to execute queries for data analysis.
The Table API provides more programmatic access while SQL is a more universal query language. It is named Table API because of its relational functions on tables: how to obtain a table, how to output a table, and how to perform query operations on the table.
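As a quick illustration of that difference, here is a minimal, self-contained sketch (not part of this tutorial's IMAP connector; the class name is made up, and it uses the built-in datagen connector purely so it can run on its own) showing the same projection expressed once with the Table API and once with SQL:
import static org.apache.flink.table.api.Expressions.$;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class TableApiVsSqlSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Register a table backed by a synthetic source so the sketch runs on its own.
        tEnv.executeSql(
                "CREATE TABLE T (subject STRING, content STRING) " +
                "WITH ('connector' = 'datagen', 'number-of-rows' = '5')");

        // Programmatic, relational style: obtain a table and derive a new one from it.
        Table viaTableApi = tEnv.from("T").select($("subject"));

        // The same projection expressed as a SQL query.
        Table viaSql = tEnv.sqlQuery("SELECT subject FROM T");

        viaTableApi.execute().print();
        viaSql.execute().print();
    }
}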
In this two-part tutorial, you will explore some of these APIs and concepts by implementing your own custom source connector for reading in data from an email inbox. You will then use Flink to process emails through the IMAP protocol.
Part one will focus on building a custom source connector and part two will focus on integrating it.
Prerequisites # This tutorial assumes that you have some familiarity with Java and object-oriented programming.
You are encouraged to follow along with the code in this repository.
It would also be useful to have docker-compose installed on your system in order to use the script included in the repository that builds and runs the connector.
Understand the infrastructure required for a connector # In order to create a connector which works with Flink, you need:
A factory class (a blueprint for creating other objects from string properties) that tells Flink with which identifier (in this case, “imap”) our connector can be addressed, which configuration options it exposes, and how the connector can be instantiated. Since Flink uses the Java Service Provider Interface (SPI) to discover factories located in different modules, you will also need to add some configuration details.
The table source object, a specific instance of the connector during the planning stage. It is responsible for back-and-forth communication with the optimizer and acts like another factory for creating the connector's runtime implementation. There are also more advanced features, such as abilities, that can be implemented to improve connector performance.
A runtime implementation from the connector obtained during the planning stage. The runtime logic is implemented in Flink&rsquo;s core connector interfaces and does the actual work of producing rows of dynamic table data. The runtime instances are shipped to the Flink cluster.
Let us look at this sequence (factory class → table source → runtime implementation) in reverse order.
Establish the runtime implementation of the connector # You first need to have a source connector which can be used in Flink&rsquo;s runtime system, defining how data comes in and how the source can be executed in the cluster. There are a few different interfaces available for implementing the actual source of the data and having it be discoverable in Flink.
For complex connectors, you may want to implement the Source interface which gives you a lot of control. For simpler use cases, you can use the SourceFunction interface. There are already a few different implementations of the SourceFunction interface for common use cases, such as the FromElementsFunction class and the RichSourceFunction class. You will use the latter.
Hint The Source interface is the new abstraction whereas the SourceFunction interface is slowly being phased out. All connectors will eventually implement the Source interface. RichSourceFunction is a base class for implementing a data source that has access to context information and some lifecycle methods. There is a run() method inherited from the SourceFunction interface that you need to implement. It is invoked once and can be used to produce the data either once for a bounded result or within a loop for an unbounded stream.
For example, to create a bounded data source, you could implement this method so that it reads all existing emails and then closes. To create an unbounded source, you could only look at new emails coming in while the source is active. You can also combine these behaviors and expose them through configuration options.
When you first create the class and implement the interface, it should look something like this:
import org.apache.flink.streaming.api.functions.source.RichSourceFunction; import org.apache.flink.table.data.RowData; public class ImapSource extends RichSourceFunction&lt;RowData&gt; { @Override public void run(SourceContext&lt;RowData&gt; ctx) throws Exception {} @Override public void cancel() {} } Note that internal data structures (RowData) are used because that is required by the table runtime.
In the run() method, you get access to a context object inherited from the SourceFunction interface, which is a bridge to Flink and allows you to output data. Since the source does not produce any data yet, the next step is to make it produce some static data in order to test that the data flows correctly:
import org.apache.flink.streaming.api.functions.source.RichSourceFunction; import org.apache.flink.table.data.GenericRowData; import org.apache.flink.table.data.RowData; import org.apache.flink.table.data.StringData; public class ImapSource extends RichSourceFunction&lt;RowData&gt; { @Override public void run(SourceContext&lt;RowData&gt; ctx) throws Exception { ctx.collect(GenericRowData.of( StringData.fromString(&#34;Subject 1&#34;), StringData.fromString(&#34;Hello, World!&#34;) )); } @Override public void cancel(){} } You do not need to implement the cancel() method yet because the source finishes instantly.
Create and configure a dynamic table source for the data stream # Dynamic tables are the core concept of Flink’s Table API and SQL support for streaming data and, as the name suggests, change over time. You can imagine a data stream being logically converted into a table that is constantly changing. For this tutorial, the emails that will be read in will be interpreted as a (source) table that is queryable. It can be viewed as a specific instance of a connector class.
You will now implement a DynamicTableSource interface. There are two types of dynamic table sources: ScanTableSource and LookupTableSource. Scan sources read the entire table on the external system while lookup sources look for specific rows based on keys. The former will fit the use case of this tutorial.
This is what a scan table source implementation would look like:
import org.apache.flink.table.connector.ChangelogMode; import org.apache.flink.table.connector.source.DynamicTableSource; import org.apache.flink.table.connector.source.ScanTableSource; import org.apache.flink.table.connector.source.SourceFunctionProvider; public class ImapTableSource implements ScanTableSource { @Override public ChangelogMode getChangelogMode() { return ChangelogMode.insertOnly(); } @Override public ScanRuntimeProvider getScanRuntimeProvider(ScanContext ctx) { boolean bounded = true; final ImapSource source = new ImapSource(); return SourceFunctionProvider.of(source, bounded); } @Override public DynamicTableSource copy() { return new ImapTableSource(); } @Override public String asSummaryString() { return &#34;IMAP Table Source&#34;; } } ChangelogMode informs Flink of the kinds of changes the planner can expect during runtime. For example, whether the source produces only new rows, also updates to existing ones, or whether it can remove previously produced rows. Our source will only produce (insertOnly()) new rows.
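To make the ChangelogMode remark a bit more concrete, here is a small sketch (not needed for the IMAP source, which stays insert-only; the helper class is made up for illustration) of what a source that also emits updates and deletes could declare instead of ChangelogMode.insertOnly():
import org.apache.flink.table.connector.ChangelogMode;
import org.apache.flink.types.RowKind;

final class ChangelogModeSketch {
    private ChangelogModeSketch() {}

    // Sketch: a changelog mode for a source that produces inserts, updates, and deletes.
    static ChangelogMode upsertLike() {
        return ChangelogMode.newBuilder()
                .addContainedKind(RowKind.INSERT)
                .addContainedKind(RowKind.UPDATE_AFTER)
                .addContainedKind(RowKind.DELETE)
                .build();
    }
}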
ScanRuntimeProvider allows Flink to create the actual runtime implementation you established previously (for reading the data). Flink even provides utilities like SourceFunctionProvider to wrap it into an instance of SourceFunction, which is one of the base runtime interfaces.
You will also need to indicate whether the source is bounded or not. Currently, the source is bounded, but you will have to change this later.
Create a factory class for the connector so it can be discovered by Flink # You now have a working source connector, but in order to use it in Table API or SQL, it needs to be discoverable by Flink. You also need to define how the connector is addressable from a SQL statement when creating a source table.
You need to implement a Factory, which is a base interface that creates object instances from a list of key-value pairs in Flink&rsquo;s Table API and SQL. A factory is uniquely identified by its class name and factoryIdentifier(). For this tutorial, you will implement the more specific DynamicTableSourceFactory, which allows you to configure a dynamic table connector as well as create DynamicTableSource instances.
import java.util.HashSet; import java.util.Set; import org.apache.flink.configuration.ConfigOption; import org.apache.flink.table.connector.source.DynamicTableSource; import org.apache.flink.table.factories.DynamicTableSourceFactory; import org.apache.flink.table.factories.FactoryUtil; public class ImapTableSourceFactory implements DynamicTableSourceFactory { @Override public String factoryIdentifier() { return &#34;imap&#34;; } @Override public Set&lt;ConfigOption&lt;?&gt;&gt; requiredOptions() { return new HashSet&lt;&gt;(); } @Override public Set&lt;ConfigOption&lt;?&gt;&gt; optionalOptions() { return new HashSet&lt;&gt;(); } @Override public DynamicTableSource createDynamicTableSource(Context ctx) { final FactoryUtil.TableFactoryHelper factoryHelper = FactoryUtil.createTableFactoryHelper(this, ctx); factoryHelper.validate(); return new ImapTableSource(); } } There are currently no configuration options but they can be added and also validated within the createDynamicTableSource() function. There is a small helper utility, TableFactoryHelper, that Flink offers which ensures that required options are set and that no unknown options are provided.
Finally, you need to register your factory for Java&rsquo;s Service Provider Interfaces (SPI). Classes that implement this interface can be discovered and should be added to the file src/main/resources/META-INF/services/org.apache.flink.table.factories.Factory with the fully qualified class name of your factory:
// if you created your class in the package org.example.acme, it should be named the following: org.example.acme.ImapTableSourceFactory Test the custom connector # You should now have a working source connector. If you are following along with the provided repository, you can test it by running:
$ cd testing/ $ ./build_and_run.sh This builds the connector, starts a Flink cluster, a test email server (which you will need later), and the SQL client (which is bundled in the regular Flink distribution) for you. If successful, you should see the SQL CLI:
Flink SQL Client
You can now create a table (with a &ldquo;subject&rdquo; column and a &ldquo;content&rdquo; column) with your connector by executing the following statement with the SQL client:
CREATE TABLE T (subject STRING, content STRING) WITH (&#39;connector&#39; = &#39;imap&#39;); SELECT * FROM T; Note that the schema must be exactly as written since it is currently hardcoded into the connector.
You should be able to see the static data you provided in your source connector earlier, which would be &ldquo;Subject 1&rdquo; and &ldquo;Hello, World!&rdquo;.
Now that you have a working connector, the next step is to make it do something more useful than returning static data.
Summary # In this tutorial, you looked into the infrastructure required for a connector and configured its runtime implementation to define how it should be executed in a cluster. You also defined a dynamic table source that reads the entire stream-converted table from the external source, made the connector discoverable by Flink through creating a factory class for it, and then tested it.
Next Steps # In part two, you will integrate this connector with an email inbox through the IMAP protocol.
`}),e.add({id:108,href:"/2021/09/07/implementing-a-custom-source-connector-for-table-api-and-sql-part-two/",title:"Implementing a custom source connector for Table API and SQL - Part Two ",section:"Flink Blog",content:`In part one of this tutorial, you learned how to build a custom source connector for Flink. In part two, you will learn how to integrate the connector with a test email inbox through the IMAP protocol and filter out emails using Flink SQL.
Goals # Part two of the tutorial will teach you how to:
integrate a source connector which connects to a mailbox using the IMAP protocol
use Jakarta Mail, a Java library that can send and receive email via the IMAP protocol
write Flink SQL and execute the queries in the Ververica Platform for a nicer visualization
You are encouraged to follow along with the code in this repository. It provides a boilerplate project that also comes with a bundled docker-compose setup that lets you easily run the connector. You can then try it out with Flink’s SQL client.
Prerequisites # This tutorial assumes that you have:
followed the steps outlined in part one of this tutorial
some familiarity with Java and object-oriented programming
Understand how to fetch emails via the IMAP protocol # Now that you have a working source connector that can run on Flink, it is time to connect to an email server via IMAP (an Internet protocol that allows email clients to retrieve messages from a mail server) so that Flink can process emails instead of the static test data.
You will use Jakarta Mail, a Java library that can be used to send and receive email via IMAP. For simplicity, authentication will use a plain username and password.
This tutorial will focus more on how to implement a connector for Flink. If you want to learn more about the details of how IMAP or Jakarta Mail work, you are encouraged to explore a more extensive implementation at this repository. It offers a wide range of information to be read from emails, as well as options to ingest existing emails alongside new ones, connecting with SSL, and more. It also supports different formats for reading email content and implements some connector abilities such as reading metadata.
In order to fetch emails, you will need to connect to the email server, register a listener for new emails and collect them whenever they arrive, and enter a loop to keep the connector running.
Add configuration options - server information and credentials # In order to connect to your IMAP server, you will need at least the following:
hostname (of the mail server)
port number
username
password
You will start by creating a class to encapsulate the configuration options. You will make use of Lombok to help with some boilerplate code. By adding the @Data and @SuperBuilder annotations, Lombok will generate this boilerplate (such as getters and a builder) for all the fields of the immutable class.
import lombok.Data; import lombok.experimental.SuperBuilder; import javax.annotation.Nullable; import java.io.Serializable; @Data @SuperBuilder(toBuilder = true) public class ImapSourceOptions implements Serializable { private static final long serialVersionUID = 1L; private final String host; private final @Nullable Integer port; private final @Nullable String user; private final @Nullable String password; } Now you can add an instance of this class to the ImapSource and ImapTableSource classes previously created (in part one) so it can be used there. Take note of the column names with which the table has been created. This will help later. You will also switch the source to be unbounded now as we will change the implementation in a bit to continuously listen for new emails.
Hint The column names would be "subject" and "content" with the SQL executed in part one:
CREATE TABLE T (subject STRING, content STRING) WITH ('connector' = 'imap'); import org.apache.flink.streaming.api.functions.source.RichSourceFunction; import org.apache.flink.table.data.RowData; import java.util.List; import java.util.stream.Collectors; public class ImapSource extends RichSourceFunction&lt;RowData&gt; { private final ImapSourceOptions options; private final List&lt;String&gt; columnNames; public ImapSource( ImapSourceOptions options, List&lt;String&gt; columnNames ) { this.options = options; this.columnNames = columnNames.stream() .map(String::toUpperCase) .collect(Collectors.toList()); } // ... } import org.apache.flink.table.connector.source.DynamicTableSource; import org.apache.flink.table.connector.source.ScanTableSource; import java.util.List; public class ImapTableSource implements ScanTableSource { private final ImapSourceOptions options; private final List&lt;String&gt; columnNames; public ImapTableSource( ImapSourceOptions options, List&lt;String&gt; columnNames ) { this.options = options; this.columnNames = columnNames; } // … @Override public ScanRuntimeProvider getScanRuntimeProvider(ScanContext ctx) { final boolean bounded = false; final ImapSource source = new ImapSource(options, columnNames); return SourceFunctionProvider.of(source, bounded); } @Override public DynamicTableSource copy() { return new ImapTableSource(options, columnNames); } // … } Finally, in the ImapTableSourceFactory class, you need to create a ConfigOption&lt;&gt; for the hostname, port number, username, and password. Then you need to report them to Flink. Host, user, and password are mandatory and can be added to requiredOptions(); the port is optional and can be added to optionalOptions() instead.
import org.apache.flink.configuration.ConfigOption; import org.apache.flink.configuration.ConfigOptions; import org.apache.flink.table.factories.DynamicTableSourceFactory; import java.util.HashSet; import java.util.Set; public class ImapTableSourceFactory implements DynamicTableSourceFactory { public static final ConfigOption&lt;String&gt; HOST = ConfigOptions.key(&#34;host&#34;).stringType().noDefaultValue(); public static final ConfigOption&lt;Integer&gt; PORT = ConfigOptions.key(&#34;port&#34;).intType().noDefaultValue(); public static final ConfigOption&lt;String&gt; USER = ConfigOptions.key(&#34;user&#34;).stringType().noDefaultValue(); public static final ConfigOption&lt;String&gt; PASSWORD = ConfigOptions.key(&#34;password&#34;).stringType().noDefaultValue(); // … @Override public Set&lt;ConfigOption&lt;?&gt;&gt; requiredOptions() { final Set&lt;ConfigOption&lt;?&gt;&gt; options = new HashSet&lt;&gt;(); options.add(HOST); options.add(USER); options.add(PASSWORD); return options; } @Override public Set&lt;ConfigOption&lt;?&gt;&gt; optionalOptions() { final Set&lt;ConfigOption&lt;?&gt;&gt; options = new HashSet&lt;&gt;(); options.add(PORT); return options; } // … } Now take a look at the createDynamicTableSource() function in the ImapTableSourceFactory class. Recall that previously (in part one) you used a small helper utility TableFactoryHelper, that Flink offers which ensures that required options are set and that no unknown options are provided. You can now use it to automatically make sure that the required options of hostname, port number, username, and password are all provided when creating a table using this connector. The helper function will throw an error message if one required option is missing. You can also use it to access the provided options (getOptions()), convert them into an instance of the ImapTableSource class created earlier, and provide the instance to the table source:
import java.util.List; import java.util.stream.Collectors; import org.apache.flink.table.factories.DynamicTableSourceFactory; import org.apache.flink.table.factories.FactoryUtil; import org.apache.flink.table.catalog.Column; public class ImapTableSourceFactory implements DynamicTableSourceFactory { // ... @Override public DynamicTableSource createDynamicTableSource(Context ctx) { final FactoryUtil.TableFactoryHelper factoryHelper = FactoryUtil.createTableFactoryHelper(this, ctx); factoryHelper.validate(); final ImapSourceOptions options = ImapSourceOptions.builder() .host(factoryHelper.getOptions().get(HOST)) .port(factoryHelper.getOptions().get(PORT)) .user(factoryHelper.getOptions().get(USER)) .password(factoryHelper.getOptions().get(PASSWORD)) .build(); final List&lt;String&gt; columnNames = ctx.getCatalogTable().getResolvedSchema().getColumns().stream() .filter(Column::isPhysical) .map(Column::getName) .collect(Collectors.toList()); return new ImapTableSource(options, columnNames); } } Hint Ideally, you would use connector metadata instead of column names. You can refer again to the accompanying repository which does implement this using metadata fields. To test these new configuration options, run:
$ cd testing/ $ ./build_and_run.sh Once you see the Flink SQL client start up, execute the following statements to create a table with your connector:
CREATE TABLE T (subject STRING, content STRING) WITH (&#39;connector&#39; = &#39;imap&#39;); SELECT * FROM T; This time it will fail because the required options are not provided:
[ERROR] Could not execute SQL statement. Reason: org.apache.flink.table.api.ValidationException: One or more required options are missing. Missing required options are: host password user Connect to the source email server # Now that you have configured the required options to connect to the email server, it is time to actually connect to the server.
Going back to the ImapSource class, you first need to convert the options given to the table source into a Properties object, which is what you can pass to the Jakarta library. You can also set various other properties here as well (e.g., enabling SSL).
The specific properties that the Jakarta library understands are documented here.
import org.apache.flink.streaming.api.functions.source.RichSourceFunction; import org.apache.flink.table.data.RowData; import java.util.Properties; public class ImapSource extends RichSourceFunction&lt;RowData&gt; { // … private Properties getSessionProperties() { Properties props = new Properties(); props.put(&#34;mail.store.protocol&#34;, &#34;imap&#34;); props.put(&#34;mail.imap.auth&#34;, true); props.put(&#34;mail.imap.host&#34;, options.getHost()); if (options.getPort() != null) { props.put(&#34;mail.imap.port&#34;, options.getPort()); } return props; } } Now create a method (connect()) which sets up the connection:
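As an aside, not needed for this tutorial's minimal connector: if you later want an SSL connection (for example to Gmail, as shown towards the end of this post), standard Jakarta Mail properties like the ones below could be set in the same way. Treat this as a sketch only; the finished connector in the accompanying repository exposes SSL through its own configuration options, and the helper class here is made up for illustration.
import java.util.Properties;

final class ImapSslPropertiesSketch {
    private ImapSslPropertiesSketch() {}

    // Sketch: extra "mail.imap.*" properties an implicit-SSL connection would
    // typically need on top of the tutorial's getSessionProperties().
    static Properties withSsl(Properties props) {
        props.put("mail.imap.ssl.enable", true);
        props.put("mail.imap.port", 993); // common IMAPS port; adjust for your server
        return props;
    }
}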
import jakarta.mail.*; import com.sun.mail.imap.IMAPFolder; import org.apache.flink.streaming.api.functions.source.RichSourceFunction; import org.apache.flink.table.data.RowData; public class ImapSource extends RichSourceFunction&lt;RowData&gt; { // … private transient Store store; private transient IMAPFolder folder; private void connect() throws Exception { final Session session = Session.getInstance(getSessionProperties(), null); store = session.getStore(); store.connect(options.getUser(), options.getPassword()); final Folder genericFolder = store.getFolder(&#34;INBOX&#34;); folder = (IMAPFolder) genericFolder; if (!folder.isOpen()) { folder.open(Folder.READ_ONLY); } } } You can now use this method to connect to the mail server when the source is created. Create a loop to keep the source running while collecting email counts. Lastly, implement methods to cancel and close the connection:
import jakarta.mail.*; import org.apache.flink.streaming.api.functions.source.RichSourceFunction; import org.apache.flink.streaming.api.functions.source.SourceFunction; import org.apache.flink.table.data.RowData; public class ImapSource extends RichSourceFunction&lt;RowData&gt; { private transient volatile boolean running = false; // … @Override public void run(SourceFunction.SourceContext&lt;RowData&gt; ctx) throws Exception { connect(); running = true; // TODO: Listen for new messages while (running) { // Trigger some IMAP request to force the server to send a notification folder.getMessageCount(); Thread.sleep(250); } } @Override public void cancel() { running = false; } @Override public void close() throws Exception { if (folder != null) { folder.close(); } if (store != null) { store.close(); } } } There is a request trigger to the server in every loop iteration. This is crucial as it ensures that the server will keep sending notifications. A more sophisticated approach would be to make use of the IDLE protocol.
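To make that last remark more concrete, here is a rough sketch of what an IDLE-based loop could look like, assuming the mail server supports the IMAP IDLE extension; the helper class and method are made up for illustration, and the folder and running flag correspond to the fields used above:
import com.sun.mail.imap.IMAPFolder;
import jakarta.mail.MessagingException;
import java.util.function.BooleanSupplier;

final class IdleLoopSketch {
    private IdleLoopSketch() {}

    // Instead of polling getMessageCount() every 250 ms, block in IMAPFolder.idle(),
    // which returns once the server pushes a notification (e.g. a new message).
    // The registered MessageCountListener is still what collects the messages.
    static void runIdleLoop(IMAPFolder folder, BooleanSupplier running) throws MessagingException {
        while (running.getAsBoolean()) {
            folder.idle();
        }
    }
}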
Note Since the source is not checkpointable, no state fault tolerance will be possible.
Collect incoming emails # Now you need to listen for new emails arriving in the inbox folder and collect them. To begin, hardcode the schema and only return the email’s subject. Fortunately, Jakarta provides a simple hook (addMessageCountListener()) to get notified when new messages arrive on the server. You can use this in place of the “TODO” comment above:
import jakarta.mail.*; import jakarta.mail.event.MessageCountAdapter; import jakarta.mail.event.MessageCountEvent; import org.apache.flink.streaming.api.functions.source.RichSourceFunction; import org.apache.flink.table.data.GenericRowData; import org.apache.flink.table.data.StringData; import org.apache.flink.table.data.RowData; public class ImapSource extends RichSourceFunction&lt;RowData&gt; { @Override public void run(SourceFunction.SourceContext&lt;RowData&gt; ctx) throws Exception { // … folder.addMessageCountListener(new MessageCountAdapter() { @Override public void messagesAdded(MessageCountEvent e) { collectMessages(ctx, e.getMessages()); } }); // … } private void collectMessages(SourceFunction.SourceContext&lt;RowData&gt; ctx, Message[] messages) { for (Message message : messages) { try { ctx.collect(GenericRowData.of(StringData.fromString(message.getSubject()))); } catch (MessagingException ignored) {} } } } Now build the project again and start up the SQL client:
$ cd testing/ $ ./build_and_run.sh This time, you will connect to a GreenMail server which is started as part of the setup:
CREATE TABLE T ( subject STRING ) WITH ( &#39;connector&#39; = &#39;imap&#39;, &#39;host&#39; = &#39;greenmail&#39;, &#39;port&#39; = &#39;3143&#39;, &#39;user&#39; = &#39;alice&#39;, &#39;password&#39; = &#39;alice&#39; ); SELECT * FROM T; The query above should now run continuously, but no rows will be produced yet since the test server&#39;s inbox is still empty. You first need to send an email to the server. If you have mailx installed, you can do so by executing the following in your terminal:
$ echo &#34;This is the email body&#34; | mailx -Sv15-compat \\ -s&#34;Email Subject&#34; \\ -Smta=&#34;smtp://alice:alice@localhost:3025&#34; \\ alice@acme.org An email with the subject “Email Subject” should now have appeared as a row in your output. Your source connector is working!
However, since you are still hard-coding the schema produced by the source, defining the table with a different schema will produce errors. You want to be able to define which fields of an email interest you and then produce the data accordingly. To do this, you will use the list of column names from earlier and then look at it when you collect the emails.
import org.apache.flink.table.data.GenericRowData; import org.apache.flink.table.data.RowData; import org.apache.flink.table.data.TimestampData; public class ImapSource extends RichSourceFunction&lt;RowData&gt; { private void collectMessages(SourceFunction.SourceContext&lt;RowData&gt; ctx, Message[] messages) { for (Message message : messages) { try { collectMessage(ctx, message); } catch (MessagingException ignored) {} } } private void collectMessage(SourceFunction.SourceContext&lt;RowData&gt; ctx, Message message) throws MessagingException { final GenericRowData row = new GenericRowData(columnNames.size()); for (int i = 0; i &lt; columnNames.size(); i++) { switch (columnNames.get(i)) { case &#34;SUBJECT&#34;: row.setField(i, StringData.fromString(message.getSubject())); break; case &#34;SENT&#34;: row.setField(i, TimestampData.fromInstant(message.getSentDate().toInstant())); break; case &#34;RECEIVED&#34;: row.setField(i, TimestampData.fromInstant(message.getReceivedDate().toInstant())); break; // ... } } ctx.collect(row); } } You should now have a working source where you can select any of the columns that are supported. Try it out again in the SQL client, but this time specifying all the columns (&ldquo;subject&rdquo;, &ldquo;sent&rdquo;, &ldquo;received&rdquo;) supported above:
CREATE TABLE T ( subject STRING, sent TIMESTAMP(3), received TIMESTAMP(3) ) WITH ( &#39;connector&#39; = &#39;imap&#39;, &#39;host&#39; = &#39;greenmail&#39;, &#39;port&#39; = &#39;3143&#39;, &#39;user&#39; = &#39;alice&#39;, &#39;password&#39; = &#39;alice&#39; ); SELECT * FROM T; Use the mailx command from earlier to send emails to the GreenMail server and you should see them appear. You can also try selecting only some of the columns, or write more complex queries.
Test the connector with a real mail server on the Ververica Platform # If you want to test the connector with a real mail server, you can import it into Ververica Platform Community Edition. To begin, make sure that you have the Ververica Platform up and running.
Since the example connector in this blog post is still a bit limited, you will use the finished connector in this repository instead. You can clone that repository and build it the same way to obtain the JAR file.
For this example, let&rsquo;s connect to a Gmail account. This requires SSL and comes with an additional caveat that you need to enable two-factor authentication and create an application password to use instead of your real password.
First, head to SQL → Connectors. There you can create a new connector by uploading your JAR file. The platform will detect the connector options automatically. Afterwards, go back to the SQL Editor and you should now be able to use the connector.
Ververica Platform - SQL Editor
Summary # Apache Flink is designed for easy extensibility and allows users to access many different external systems as data sources or sinks through a versatile set of connectors. It can read and write data from and to databases as well as local and distributed file systems.
Flink also exposes APIs on top of which custom connectors can be built. In this two-part blog series, you explored some of these APIs and concepts and learned how to implement your own custom source connector that can read in data from an email inbox. You then used Flink to process incoming emails through the IMAP protocol and wrote some Flink SQL.
`}),e.add({id:109,href:"/2021/08/31/stateful-functions-3.1.0-release-announcement/",title:"Stateful Functions 3.1.0 Release Announcement",section:"Flink Blog",content:`Stateful Functions is a cross-platform stack for building Stateful Serverless applications, making it radically simpler to develop scalable, consistent, and elastic distributed applications. This new release brings various improvements to the StateFun runtime, a leaner way to specify StateFun module components, and a brand new GoLang SDK!
The binary distribution and source artifacts are now available on the updated Downloads page of the Flink website, and the most recent Java SDK, Python SDK, and GoLang SDK distributions are available on Maven, PyPI, and GitHub respectively. You can also find official StateFun Docker images of the new version on Docker Hub.
For more details, check the complete release changelog and the updated documentation. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA!
New Features # Delayed Message Cancellation # Stateful Functions communicate by sending messages, but sometimes it is helpful for a function to send a message to itself. For example, you may want to set a time limit for a customer onboarding flow to complete. This can easily be implemented by sending a message with a delay. But up until now, there was no way to indicate to the StateFun runtime that a particular delayed message is not necessary anymore (e.g., because the customer had completed their onboarding flow). With StateFun 3.1, it is now possible to cancel a delayed message.
... context.send_after(timedelta(days=3), message_builder(target_typename=&#34;fns/onboarding&#34;, target_id=&#34;user-1234&#34;, str_value=&#34;send a reminder email&#34;), cancellation_token=&#34;flow-1234&#34;) ... To cancel the message at a later time, simply call
context.cancel_delayed_message(&#34;flow-1234&#34;) Please note that a message cancellation occurs on a best-effort basis, as the message might have already been delivered or enqueued for immediate delivery on a remote worker’s mailbox.
New way to specify components # StateFun applications consist of multiple configuration components, such as remote function endpoints and ingress and egress definitions, defined in a YAML format. In this release, we&rsquo;ve added a new structure that treats each StateFun component as a standalone YAML document. Thus, a module.yaml file becomes simply a collection of components.
kind: io.statefun.endpoints.v2/http spec: functions: com.example/* urlPathTemplate: https://bar.foo.com/{function.name} --- kind: io.statefun.kafka.v1/ingress spec: id: com.example/my-ingress address: kafka-broker:9092 consumerGroupId: my-consumer-group topics: - topic: message-topic valueType: io.statefun.types/string targets: - com.example/greeter --- kind: io.statefun.kafka.v1/egress spec: id: com.example/my-egress address: kafka-broker:9092 deliverySemantic: type: exactly-once transactionTimeout: 15min --- While this might seem like a minor cosmetic improvement, this change opens the door to more flexible configuration management options in future releases - such as managing each component as a custom K8s resource definition or even behind a REST API. StateFun still supports the legacy module format in version 3.0 for backward compatibility, but users are encouraged to upgrade. The community is providing an automated migration tool to ease the transition.
Pluggable transport for remote function invocations # Starting with this release, it is possible to plug in a custom mechanism that invokes a remote stateful function. Users who wish to use a customized transport need to register it as an extension and then reference it directly from the endpoint component definition.
For example:
kind: io.statefun.endpoints.v2/http spec: functions: com.foo.bar/* urlPathTemplate: https://{function.name}/ maxNumBatchRequests: 10000 transport: type: com.foo.bar/pubsub some_property1: some_value1 For a complete example of a custom transport, you can start exploring here, along with a reference usage here.
Asynchronous, non-blocking remote function invocation (beta) # This release includes a new, opt-in transport implementation that is built on top of the asynchronous Netty framework. This transport enables much higher resource utilization, higher throughput, and lower remote function invocation latency.
To enable this new transport, set the transport type to io.statefun.transports.v1/async, as in the following example:
kind: io.statefun.endpoints.v2/http spec: functions: fns/* urlPathTemplate: https://api-gateway.foo.bar/{function.name} maxNumBatchRequests: 10000 transport: type: io.statefun.transports.v1/async call: 2m connect: 20s Take it for a spin!
A brand new GoLang SDK # Stateful Functions provides a unified model for building stateful applications across various programming languages and deployment environments. The community is thrilled to release an official GoLang SDK as part of the 3.1.0 release.
import ( &#34;fmt&#34; &#34;github.com/apache/flink-statefun/statefun-sdk-go/v3/pkg/statefun&#34; &#34;net/http&#34; ) type Greeter struct { SeenCount statefun.ValueSpec } func (g *Greeter) Invoke(ctx statefun.Context, message statefun.Message) error { storage := ctx.Storage() // Read the current value of the state // or zero value if no value is set var count int32 storage.Get(g.SeenCount, &amp;count) count += 1 // Update the state which will // be made persistent by the runtime storage.Set(g.SeenCount, count) name := message.AsString() greeting := fmt.Sprintf(&#34;Hello there %s at the %d-th time!\\n&#34;, name, count) ctx.Send(statefun.MessageBuilder{ Target: *ctx.Caller(), Value: greeting, }) return nil } func main() { greeter := &amp;Greeter{ SeenCount: statefun.ValueSpec{ Name: &#34;seen_count&#34;, ValueType: statefun.Int32Type, }, } builder := statefun.StatefulFunctionsBuilder() _ = builder.WithSpec(statefun.StatefulFunctionSpec{ FunctionType: statefun.TypeNameFrom(&#34;com.example.fns/greeter&#34;), States: []statefun.ValueSpec{greeter.SeenCount}, Function: greeter, }) http.Handle(&#34;/statefun&#34;, builder.AsHandler()) _ = http.ListenAndServe(&#34;:8000&#34;, nil) } As with the Python and Java SDKs, the Go SDK includes:
An address-scoped storage acting as a key-value store for a particular address. A unified way to send, receive, and store values across languages. Dynamic ValueSpec to describe the state name, type, and possibly expiration configuration at runtime. You can get started by adding the SDK to your go.mod file.
require github.com/apache/flink-statefun/statefun-sdk-go/v3 v3.1.0
For a detailed SDK tutorial, we would like to encourage you to visit:
GoLang SDK Showcase
GoLang Greeter
GoLang SDK Documentation
Release Notes # Please review the release notes for a detailed list of changes and new features if you plan to upgrade your setup to Stateful Functions 3.1.0.
List of Contributors # Evans Ye, George Birbilis, Igal Shilman, Konstantin Knauf, Seth Wiesman, Siddique Ahmad, Tzu-Li (Gordon) Tai, ariskk, austin ce
If you’d like to get involved, we’re always looking for new contributors.
`}),e.add({id:110,href:"/2021/08/31/help-us-stabilize-apache-flink-1.14.0-rc0/",title:"Help us stabilize Apache Flink 1.14.0 RC0",section:"Flink Blog",content:` Hint Update 29th of September: Today Apache Flink 1.14 has been released. For sure we'd still like to hear your feedback. Dear Flink Community,
we are excited to announce the first release candidate of Apache Flink 1.14. 🎉
A lot of features and fixes went into this release, including improvements to the unified batch and streaming experience, an increase in fault tolerance by reducing in-flight data, and more developments on connectors and components. It wouldn&rsquo;t have been possible without your help. Around 211 people have made contributions!
Two weeks ago (August 16th) we declared the feature freeze. This means that only a few small, almost-ready features will go into the release from this moment on. We are now in the process of stabilizing the release and need your help! As you can see on the 1.14 release coordination page, a lot of focus is on documentation and testing.
If you would like to contribute to the squirrel community, a great way would be to download the release candidate and test it. You can run some existing Flink jobs or pick one of the test issues. We would greatly appreciate any feedback you can provide on the JIRA tickets or on the mailing list.
We continue to be grateful and inspired by the community who believe in the project and want to help create a great user experience and product for all Flink users.
Many thanks!
`}),e.add({id:111,href:"/2021/08/09/apache-flink-1.11.4-released/",title:"Apache Flink 1.11.4 Released",section:"Flink Blog",content:`The Apache Flink community released the next bugfix version of the Apache Flink 1.11 series.
This release includes 78 fixes and minor improvements for Flink 1.11.4. The list below provides a detailed overview of all fixes and improvements.
We highly recommend that all users upgrade to Flink 1.11.4.
Updated Maven dependencies:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.11.4&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.11.4&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.11.4&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
Release Notes - Flink - Version 1.11.4 Sub-task [FLINK-21070] - Overloaded aggregate functions cause converter errors [FLINK-21486] - Add sanity check when switching from Rocks to Heap timers Bug [FLINK-15262] - kafka connector doesn&#39;t read from beginning immediately when &#39;connector.startup-mode&#39; = &#39;earliest-offset&#39; [FLINK-16443] - Fix wrong fix for user-code CheckpointExceptions [FLINK-18438] - TaskManager start failed [FLINK-19369] - BlobClientTest.testGetFailsDuringStreamingForJobPermanentBlob hangs [FLINK-19436] - TPC-DS end-to-end test (Blink planner) failed during shutdown [FLINK-19771] - NullPointerException when accessing null array from postgres in JDBC Connector [FLINK-20288] - Correct documentation about savepoint self-contained [FLINK-20383] - DataSet allround end-to-end test fails with NullPointerException [FLINK-20626] - Canceling a job when it is failing will result in job hanging in CANCELING state [FLINK-20666] - Fix the deserialized Row losing the field_name information in PyFlink [FLINK-20675] - Asynchronous checkpoint failure would not fail the job anymore [FLINK-20680] - Fails to call var-arg function with no parameters [FLINK-20752] - FailureRateRestartBackoffTimeStrategy allows one less restart than configured [FLINK-20793] - Fix NamesTest due to code style refactor [FLINK-20803] - Version mismatch between spotless-maven-plugin and google-java-format plugin [FLINK-20832] - Deliver bootstrap resouces ourselves for website and documentation [FLINK-20841] - Fix compile error due to duplicated generated files [FLINK-20913] - Improve new HiveConf(jobConf, HiveConf.class) [FLINK-20989] - Functions in ExplodeFunctionUtil should handle null data to avoid NPE [FLINK-21008] - Residual HA related Kubernetes ConfigMaps and ZooKeeper nodes when cluster entrypoint received SIGTERM in shutdown [FLINK-21009] - Can not disable certain options in Elasticsearch 7 connector [FLINK-21013] - Blink planner does not ingest timestamp into StreamRecord [FLINK-21028] - Streaming application didn&#39;t stop properly [FLINK-21030] - Broken job restart for job with disjoint graph [FLINK-21071] - Snapshot branches running against flink-docker dev-master branch [FLINK-21132] - BoundedOneInput.endInput is called when taking synchronous savepoint [FLINK-21138] - KvStateServerHandler is not invoked with user code classloader [FLINK-21148] - YARNSessionFIFOSecuredITCase cannot connect to BlobServer [FLINK-21208] - pyarrow exception when using window with pandas udaf [FLINK-21213] - e2e test fail with &#39;As task is already not running, no longer decline checkpoint&#39; [FLINK-21215] - Checkpoint was declined because one input stream is finished [FLINK-21216] - StreamPandasConversionTests Fails [FLINK-21274] - At per-job mode, during the exit of the JobManager process, if ioExecutor exits at the end, the System.exit() method will not be executed. 
[FLINK-21289] - Application mode ignores the pipeline.classpaths configuration [FLINK-21312] - SavepointITCase.testStopSavepointWithBoundedInputConcurrently is unstable [FLINK-21323] - Stop-with-savepoint is not supported by SourceOperatorStreamTask [FLINK-21453] - BoundedOneInput.endInput is NOT called when doing stop with savepoint WITH drain [FLINK-21497] - JobLeaderIdService completes leader future despite no leader being elected [FLINK-21550] - ZooKeeperHaServicesTest.testSimpleClose fail [FLINK-21606] - TaskManager connected to invalid JobManager leading to TaskSubmissionException [FLINK-21609] - SimpleRecoveryITCaseBase.testRestartMultipleTimes fails on azure [FLINK-21654] - YARNSessionCapacitySchedulerITCase.testStartYarnSessionClusterInQaTeamQueue fail because of NullPointerException [FLINK-21725] - DataTypeExtractor extracts wrong fields ordering for Tuple12 [FLINK-21753] - Cycle references between memory manager and gc cleaner action [FLINK-21980] - ZooKeeperRunningJobsRegistry creates an empty znode [FLINK-21986] - taskmanager native memory not release timely after restart [FLINK-22081] - Entropy key not resolved if flink-s3-fs-hadoop is added as a plugin [FLINK-22109] - Misleading exception message if the number of arguments of a nested function is incorrect [FLINK-22184] - Rest client shutdown on failure runs in netty thread [FLINK-22424] - Writing to already released buffers potentially causing data corruption during job failover/cancellation [FLINK-22489] - subtask backpressure indicator shows value for entire job [FLINK-22597] - JobMaster cannot be restarted [FLINK-22815] - Disable unaligned checkpoints for broadcast partitioning [FLINK-22946] - Network buffer deadlock introduced by unaligned checkpoint [FLINK-23164] - JobMasterTest.testMultipleStartsWork unstable on azure [FLINK-23166] - ZipUtils doesn&#39;t handle properly for softlinks inside the zip file Improvement [FLINK-9844] - PackagedProgram does not close URLClassLoader [FLINK-18182] - Upgrade AWS SDK in flink-connector-kinesis to include new region af-south-1 [FLINK-19415] - Move Hive document to &quot;Table &amp; SQL Connectors&quot; from &quot;Table API &amp; SQL&quot; [FLINK-20651] - Use Spotless/google-java-format for code formatting/enforcement [FLINK-20770] - Incorrect description for config option kubernetes.rest-service.exposed.type [FLINK-20790] - Generated classes should not be put under src/ directory [FLINK-20792] - Allow shorthand invocation of spotless [FLINK-20805] - Blink runtime classes partially ignored by spotless [FLINK-20866] - Add how to list jobs in Yarn deployment documentation when HA enabled [FLINK-20906] - Update copyright year to 2021 for NOTICE files [FLINK-21020] - Bump Jackson to 2.10.5[.1] / 2.12.1 [FLINK-21123] - Upgrade Beanutils 1.9.x to 1.9.4 [FLINK-21164] - Jar handlers don&#39;t cleanup temporarily extracted jars [FLINK-21210] - ApplicationClusterEntryPoints should explicitly close PackagedProgram [FLINK-21411] - The components on which Flink depends may contain vulnerabilities. If yes, fix them. 
[FLINK-21735] - Harden JobMaster#updateTaskExecutionState() [FLINK-22142] - Remove console logging for Kafka connector for AZP runs [FLINK-22208] - Bump snappy-java to 1.1.5+ [FLINK-22470] - The root cause of the exception encountered during compiling the job was not exposed to users in certain cases [FLINK-23312] - Use -Dfast for building e2e tests on AZP `}),e.add({id:112,href:"/2021/08/06/apache-flink-1.12.5-released/",title:"Apache Flink 1.12.5 Released",section:"Flink Blog",content:`The Apache Flink community released the next bugfix version of the Apache Flink 1.12 series.
This release includes 76 fixes and minor improvements for Flink 1.12.4. The list below provides a detailed overview of all fixes and improvements.
We highly recommend that all users upgrade to Flink 1.12.5.
Updated Maven dependencies:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.12.5&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.12.5&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.12.5&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
Release Notes - Flink - Version 1.12.5
Bug [FLINK-19925] - Errors$NativeIoException: readAddress(..) failed: Connection reset by peer [FLINK-20321] - Get NPE when using AvroDeserializationSchema to deserialize null input [FLINK-20888] - ContinuousFileReaderOperator should not close the output on close() [FLINK-21329] - &quot;Local recovery and sticky scheduling end-to-end test&quot; does not finish within 600 seconds [FLINK-21445] - Application mode does not set the configuration when building PackagedProgram [FLINK-21469] - stop-with-savepoint --drain doesn&#39;t advance watermark for sources chained to MultipleInputStreamTask [FLINK-21952] - Make all the &quot;Connection reset by peer&quot; exception wrapped as RemoteTransportException [FLINK-22015] - SQL filter containing OR and IS NULL will produce an incorrect result. [FLINK-22105] - SubtaskCheckpointCoordinatorTest.testForceAlignedCheckpointResultingInPriorityEvents unstable [FLINK-22157] - Join &amp; Select a part of composite primary key will cause ArrayIndexOutOfBoundsException [FLINK-22312] - YARNSessionFIFOSecuredITCase&gt;YARNSessionFIFOITCase.checkForProhibitedLogContents due to the heartbeat exception with Yarn RM [FLINK-22408] - Flink Table Parsr Hive Drop Partitions Syntax unparse is Error [FLINK-22419] - testScheduleRunAsync fail [FLINK-22434] - Dispatcher does not store suspended jobs in execution graph store [FLINK-22443] - can not be execute an extreme long sql under batch mode [FLINK-22494] - Avoid discarding checkpoints in case of failure [FLINK-22496] - ClusterEntrypointTest.testCloseAsyncShouldBeExecutedInShutdownHook failed [FLINK-22502] - DefaultCompletedCheckpointStore drops unrecoverable checkpoints silently [FLINK-22547] - OperatorCoordinatorHolderTest. verifyCheckpointEventOrderWhenCheckpointFutureCompletesLate fail [FLINK-22564] - Kubernetes-related ITCases do not fail even in case of failure [FLINK-22592] - numBuffersInLocal is always zero when using unaligned checkpoints [FLINK-22613] - FlinkKinesisITCase.testStopWithSavepoint fails [FLINK-22683] - The total Flink/process memory of memoryConfiguration in /taskmanagers can be null or incorrect value [FLINK-22698] - RabbitMQ source does not stop unless message arrives in queue [FLINK-22704] - ZooKeeperHaServicesTest.testCleanupJobData failed [FLINK-22721] - Breaking HighAvailabilityServices interface by adding new method [FLINK-22733] - Type mismatch thrown in DataStream.union if parameter is KeyedStream for Python DataStream API [FLINK-22756] - DispatcherTest.testJobStatusIsShownDuringTermination fail [FLINK-22788] - Code of equals method grows beyond 64 KB [FLINK-22814] - New sources are not defining/exposing checkpointStartDelayNanos metric [FLINK-22815] - Disable unaligned checkpoints for broadcast partitioning [FLINK-22819] - YARNFileReplicationITCase fails with &quot;The YARN application unexpectedly switched to state FAILED during deployment&quot; [FLINK-22820] - Stopping Yarn session cluster will cause fatal error [FLINK-22833] - Source tasks (both old and new) are not reporting checkpointStartDelay via CheckpointMetrics [FLINK-22856] - Move our Azure pipelines away from Ubuntu 16.04 by September [FLINK-22886] - Thread leak in RocksDBStateUploader [FLINK-22898] - HiveParallelismInference limit return wrong parallelism [FLINK-22908] - FileExecutionGraphInfoStoreTest.testPutSuspendedJobOnClusterShutdown should wait until job is running [FLINK-22927] - Exception on JobClient.get_job_status().result() [FLINK-22946] - Network buffer deadlock introduced by unaligned checkpoint [FLINK-22952] - 
docs_404_check fail on azure due to ruby version not available [FLINK-22963] - The description of taskmanager.memory.task.heap.size in the official document is incorrect [FLINK-22964] - Connector-base exposes dependency to flink-core. [FLINK-22987] - Scala suffix check isn&#39;t working [FLINK-23010] - HivePartitionFetcherContextBase::getComparablePartitionValueList can return partitions that don&#39;t exist [FLINK-23030] - PartitionRequestClientFactory#createPartitionRequestClient should throw when network failure [FLINK-23045] - RunnablesTest.testExecutorService_uncaughtExceptionHandler fails on azure [FLINK-23074] - There is a class conflict between flink-connector-hive and flink-parquet [FLINK-23076] - DispatcherTest.testWaitingForJobMasterLeadership fails on azure [FLINK-23119] - Fix the issue that the exception that General Python UDAF is unsupported is not thrown in Compile Stage. [FLINK-23120] - ByteArrayWrapperSerializer.serialize should use writeInt to serialize the length [FLINK-23133] - The dependencies are not handled properly when mixing use of Python Table API and Python DataStream API [FLINK-23135] - Flink SQL Error while applying rule AggregateReduceGroupingRule [FLINK-23164] - JobMasterTest.testMultipleStartsWork unstable on azure [FLINK-23166] - ZipUtils doesn&#39;t handle properly for softlinks inside the zip file [FLINK-23182] - Connection leak in RMQSource [FLINK-23184] - CompileException Assignment conversion not possible from type &quot;int&quot; to type &quot;short&quot; [FLINK-23201] - The check on alignmentDurationNanos seems to be too strict [FLINK-23223] - When flushAlways is enabled the subpartition may lose notification of data availability [FLINK-23233] - OperatorEventSendingCheckpointITCase.testOperatorEventLostWithReaderFailure fails on azure [FLINK-23248] - SinkWriter is not closed when failing [FLINK-23417] - MiniClusterITCase.testHandleBatchJobsWhenNotEnoughSlot fails on Azure [FLINK-23429] - State Processor API failed with FileNotFoundException when working with state files on Cloud Storage Improvement [FLINK-17857] - Kubernetes and docker e2e tests could not run on Mac OS after migration [FLINK-18182] - Upgrade AWS SDK in flink-connector-kinesis to include new region af-south-1 [FLINK-20695] - Zookeeper node under leader and leaderlatch is not deleted after job finished [FLINK-21229] - Support ssl connection with schema registry format [FLINK-21411] - The components on which Flink depends may contain vulnerabilities. If yes, fix them. [FLINK-22708] - Propagate savepoint settings from StreamExecutionEnvironment to StreamGraph [FLINK-22747] - Update commons-io to 2.8 [FLINK-22757] - Update GCS documentation [FLINK-22774] - Update Kinesis SQL connector&#39;s Guava to 27.0-jre [FLINK-22939] - Generalize JDK switch in azure setup [FLINK-23009] - Bump up Guava in Kinesis Connector [FLINK-23052] - cron_snapshot_deployment_maven unstable on maven [FLINK-23312] - Use -Dfast for building e2e tests on AZP `}),e.add({id:113,href:"/2021/08/06/apache-flink-1.13.2-released/",title:"Apache Flink 1.13.2 Released",section:"Flink Blog",content:`The Apache Flink community released the second bugfix version of the Apache Flink 1.13 series.
This release includes 127 fixes and minor improvements for Flink 1.13.2. The list below includes bugfixes and improvements. For a complete list of all changes see: JIRA.
We highly recommend that all users upgrade to Flink 1.13.2.
Updated Maven dependencies:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.13.2&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.13.2&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.13.2&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
Release Notes - Flink - Version 1.13.2 Sub-task [FLINK-22726] - Hive GROUPING__ID returns different value in older versions Bug [FLINK-20888] - ContinuousFileReaderOperator should not close the output on close() [FLINK-20975] - HiveTableSourceITCase.testPartitionFilter fails on AZP [FLINK-21389] - ParquetInputFormat should not need parquet schema as user input [FLINK-21445] - Application mode does not set the configuration when building PackagedProgram [FLINK-21952] - Make all the &quot;Connection reset by peer&quot; exception wrapped as RemoteTransportException [FLINK-22045] - Set log level for shaded zookeeper logger [FLINK-22195] - YARNHighAvailabilityITCase.testClusterClientRetrieval because of TestTimedOutException [FLINK-22203] - KafkaChangelogTableITCase.testKafkaCanalChangelogSource fail due to ConcurrentModificationException [FLINK-22272] - Some scenes can&#39;t drop table by hive catalog [FLINK-22312] - YARNSessionFIFOSecuredITCase&gt;YARNSessionFIFOITCase.checkForProhibitedLogContents due to the heartbeat exception with Yarn RM [FLINK-22376] - SequentialChannelStateReaderImpl may recycle buffer twice [FLINK-22443] - can not be execute an extreme long sql under batch mode [FLINK-22462] - JdbcExactlyOnceSinkE2eTest.testInsert failed because of too many clients. [FLINK-22464] - OperatorEventSendingCheckpointITCase.testOperatorEventLostWithReaderFailure hangs with \`AdaptiveScheduler\` [FLINK-22492] - KinesisTableApiITCase with wrong results [FLINK-22496] - ClusterEntrypointTest.testCloseAsyncShouldBeExecutedInShutdownHook failed [FLINK-22545] - JVM crashes when runing OperatorEventSendingCheckpointITCase.testOperatorEventAckLost [FLINK-22547] - OperatorCoordinatorHolderTest. verifyCheckpointEventOrderWhenCheckpointFutureCompletesLate fail [FLINK-22613] - FlinkKinesisITCase.testStopWithSavepoint fails [FLINK-22662] - YARNHighAvailabilityITCase.testKillYarnSessionClusterEntrypoint fail [FLINK-22683] - The total Flink/process memory of memoryConfiguration in /taskmanagers can be null or incorrect value [FLINK-22686] - Incompatible subtask mappings while resuming from unaligned checkpoints [FLINK-22689] - Table API Documentation Row-Based Operations Example Fails [FLINK-22698] - RabbitMQ source does not stop unless message arrives in queue [FLINK-22725] - SlotManagers should unregister metrics at the start of suspend() [FLINK-22730] - Lookup join condition with CURRENT_DATE fails to filter records [FLINK-22746] - Links to connectors in docs are broken [FLINK-22759] - Correct the applicability of RocksDB related options as per operator [FLINK-22760] - HiveParser::setCurrentTimestamp fails with hive-3.1.2 [FLINK-22777] - Restore lost sections in Try Flink DataStream API example [FLINK-22779] - KafkaChangelogTableITCase.testKafkaDebeziumChangelogSource fail due to ConcurrentModificationException [FLINK-22786] - sql-client can not create .flink-sql-history file [FLINK-22795] - Throw better exception when executing remote SQL file in SQL Client [FLINK-22796] - Update mem_setup_tm documentation [FLINK-22814] - New sources are not defining/exposing checkpointStartDelayNanos metric [FLINK-22815] - Disable unaligned checkpoints for broadcast partitioning [FLINK-22819] - YARNFileReplicationITCase fails with &quot;The YARN application unexpectedly switched to state FAILED during deployment&quot; [FLINK-22820] - Stopping Yarn session cluster will cause fatal error [FLINK-22833] - Source tasks (both old and new) are not reporting checkpointStartDelay via CheckpointMetrics [FLINK-22856] - Move our 
Azure pipelines away from Ubuntu 16.04 by September [FLINK-22863] - ArrayIndexOutOfBoundsException may happen when building rescale edges [FLINK-22884] - Select view columns fail when store metadata with hive [FLINK-22886] - Thread leak in RocksDBStateUploader [FLINK-22890] - Few tests fail in HiveTableSinkITCase [FLINK-22894] - Window Top-N should allow n=1 [FLINK-22898] - HiveParallelismInference limit return wrong parallelism [FLINK-22908] - FileExecutionGraphInfoStoreTest.testPutSuspendedJobOnClusterShutdown should wait until job is running [FLINK-22927] - Exception on JobClient.get_job_status().result() [FLINK-22945] - StackOverflowException can happen when a large scale job is CANCELING/FAILING [FLINK-22946] - Network buffer deadlock introduced by unaligned checkpoint [FLINK-22948] - Scala example for toDataStream does not compile [FLINK-22952] - docs_404_check fail on azure due to ruby version not available [FLINK-22961] - Incorrect calculation of alignment timeout for LocalInputChannel [FLINK-22963] - The description of taskmanager.memory.task.heap.size in the official document is incorrect [FLINK-22964] - Connector-base exposes dependency to flink-core. [FLINK-22966] - NPE in StateAssignmentOperation when rescaling [FLINK-22980] - FileExecutionGraphInfoStoreTest hangs on azure [FLINK-22982] - java.lang.ClassCastException when using Python UDF [FLINK-22987] - Scala suffix check isn&#39;t working [FLINK-22993] - CompactFileWriter won&#39;t emit EndCheckpoint with Long.MAX_VALUE checkpointId [FLINK-23001] - flink-avro-glue-schema-registry lacks scala suffix [FLINK-23003] - Resource leak in RocksIncrementalSnapshotStrategy [FLINK-23010] - HivePartitionFetcherContextBase::getComparablePartitionValueList can return partitions that don&#39;t exist [FLINK-23018] - State factories should handle extended state descriptors [FLINK-23024] - RPC result TaskManagerInfoWithSlots not serializable [FLINK-23025] - sink-buffer-max-rows and sink-buffer-flush-interval options produce a lot of duplicates [FLINK-23030] - PartitionRequestClientFactory#createPartitionRequestClient should throw when network failure [FLINK-23034] - NPE in JobDetailsDeserializer during the reading old version of ExecutionState [FLINK-23045] - RunnablesTest.testExecutorService_uncaughtExceptionHandler fails on azure [FLINK-23073] - Fix space handling in Row CSV timestamp parser [FLINK-23074] - There is a class conflict between flink-connector-hive and flink-parquet [FLINK-23092] - Built-in UDAFs could not be mixed use with Python UDAF in group window [FLINK-23096] - HiveParser could not attach the sessionstate of hive [FLINK-23119] - Fix the issue that the exception that General Python UDAF is unsupported is not thrown in Compile Stage. [FLINK-23120] - ByteArrayWrapperSerializer.serialize should use writeInt to serialize the length [FLINK-23121] - Fix the issue that the InternalRow as arguments in Python UDAF [FLINK-23129] - When cancelling any running job of multiple jobs in an application cluster, JobManager shuts down [FLINK-23133] - The dependencies are not handled properly when mixing use of Python Table API and Python DataStream API [FLINK-23151] - KinesisTableApiITCase.testTableApiSourceAndSink fails on azure [FLINK-23166] - ZipUtils doesn&#39;t handle properly for softlinks inside the zip file [FLINK-23182] - Connection leak in RMQSource [FLINK-23184] - CompileException Assignment conversion not possible from type &quot;int&quot; to type &quot;short&quot; [FLINK-23188] - Unsupported function definition: IFNULL. 
Only user defined functions are supported as inline functions [FLINK-23196] - JobMasterITCase fail on azure due to BindException [FLINK-23201] - The check on alignmentDurationNanos seems to be too strict [FLINK-23223] - When flushAlways is enabled the subpartition may lose notification of data availability [FLINK-23233] - OperatorEventSendingCheckpointITCase.testOperatorEventLostWithReaderFailure fails on azure [FLINK-23235] - SinkITCase.writerAndCommitterAndGlobalCommitterExecuteInStreamingMode fails on azure [FLINK-23248] - SinkWriter is not closed when failing [FLINK-23259] - [DOCS]The &#39;window&#39; link on page docs/dev/datastream/operators/overview is failed and 404 is returned [FLINK-23260] - [DOCS]The link on page docs/libs/gelly/overview is failed and 404 is returned [FLINK-23270] - Impove description of Regular Joins section [FLINK-23280] - Python ExplainDetails does not have JSON_EXECUTION_PLAN option [FLINK-23306] - FlinkRelMdUniqueKeys causes exception when used with new Schema [FLINK-23359] - Fix the number of available slots in testResourceCanBeAllocatedForDifferentJobAfterFree [FLINK-23368] - Fix the wrong mapping of state cache in PyFlink [FLINK-23429] - State Processor API failed with FileNotFoundException when working with state files on Cloud Storage New Feature [FLINK-22770] - Expose SET/RESET from the parser Improvement [FLINK-18182] - Upgrade AWS SDK in flink-connector-kinesis to include new region af-south-1 [FLINK-20140] - Add documentation of TableResult.collect for Python Table API [FLINK-21229] - Support ssl connection with schema registry format [FLINK-21393] - Implement ParquetAvroInputFormat [FLINK-21411] - The components on which Flink depends may contain vulnerabilities. If yes, fix them. [FLINK-22528] - Document latency tracking metrics for state accesses [FLINK-22638] - Keep channels blocked on alignment timeout [FLINK-22655] - When using -i &lt;init.sql&gt; option to initialize SQL Client session It should be possible to annotate the script with -- [FLINK-22722] - Add Documentation for Kafka New Source [FLINK-22747] - Update commons-io to 2.8 [FLINK-22766] - Report metrics of KafkaConsumer in Kafka new source [FLINK-22774] - Update Kinesis SQL connector&#39;s Guava to 27.0-jre [FLINK-22855] - Translate the &#39;Overview of Python API&#39; page into Chinese. 
[FLINK-22873] - Add ToC to configuration documentation [FLINK-22905] - Fix missing comma in SQL example in &quot;Versioned Table&quot; page [FLINK-22939] - Generalize JDK switch in azure setup [FLINK-22996] - The description about coalesce is wrong [FLINK-23009] - Bump up Guava in Kinesis Connector [FLINK-23052] - cron_snapshot_deployment_maven unstable on maven [FLINK-23138] - Raise an exception if types other than PickledBytesTypeInfo are specified for state descriptor [FLINK-23156] - Change the reference of &#39;docs/dev/table/sql/queries&#39; [FLINK-23157] - Fix missing comma in SQL example in &quot;Versioned View&quot; page [FLINK-23162] - Create table uses time_ltz in the column name and it&#39;s expression which results in exception [FLINK-23168] - Catalog shouldn&#39;t merge properties for alter DB operation [FLINK-23178] - Raise an error for writing stream data into partitioned hive tables without a partition committer [FLINK-23200] - Correct grammatical mistakes in &#39;Table API&#39; page of &#39;Table API &amp; SQL&#39; [FLINK-23226] - Flink Chinese doc learn-flink/etl transformation.svg display issue [FLINK-23312] - Use -Dfast for building e2e tests on AZP `}),e.add({id:114,href:"/2021/07/07/how-to-identify-the-source-of-backpressure/",title:"How to identify the source of backpressure?",section:"Flink Blog",content:` Backpressure monitoring in the web UI
The backpressure topic was tackled from different angles over the last couple of years. However, when it comes to identifying and analyzing sources of backpressure, things have changed quite a bit in the recent Flink releases (especially with new additions to metrics and the web UI in Flink 1.13). This post will try to clarify some of these changes and go into more detail about how to track down the source of backpressure, but first&hellip;
What is backpressure? # This has been explained very well in an old, but still accurate, post by Ufuk Celebi. I highly recommend reading it if you are not familiar with this concept. For a much deeper and low-level understanding of the topic and how Flink’s network stack works, there is a more advanced explanation available here.
At a high level, backpressure happens if some operator(s) in the Job Graph cannot process records at the same rate as they are received. This fills up the input buffers of the subtask that is running this slow operator. Once the input buffers are full, backpressure propagates to the output buffers of the upstream subtasks. Once those are filled up as well, the upstream subtasks are forced to slow down their record processing rate to match the rate of the downstream operator that is causing the bottleneck. Backpressure keeps propagating upstream in this way until it reaches the source operators.
As long as the load and available resources are static and none of the operators produce short bursts of data (like windowing operators), those input/output buffers should only be in one of two states: almost empty or almost full. If the downstream operator or subtask is able to keep up with the influx of data, the buffers will be empty. If not, then the buffers will be full [1]. In fact, checking the buffers’ usage metrics was the basis of the previously recommended way to detect and analyze backpressure, described a couple of years back by Nico Kruber. As I mentioned in the beginning, Flink now offers much better tools to do the same job, but before we get to that, there are two questions worth asking.
Why should I care about backpressure? # Backpressure is an indicator that your machines or operators are overloaded. The buildup of backpressure directly affects the end-to-end latency of the system, as records are waiting longer in the queues before being processed. Secondly, aligned checkpointing takes longer with backpressure, while unaligned checkpoints will be larger (you can read more about aligned and unaligned checkpoints in the documentation). If you are struggling with checkpoint barrier propagation times, taking care of backpressure would most likely help to solve the problem. Lastly, you might just want to optimize your job in order to reduce the costs of running the job.
In order to address the problem for all cases, one needs to be aware of it, then locate and analyze it.
Why shouldn’t I care about backpressure? # Frankly, you do not always have to care about the presence of backpressure. Almost by definition, lack of backpressure means that your cluster is at least ever so slightly underutilized and over-provisioned. If you want to minimize idling resources, you probably can not avoid incurring some backpressure. This is especially true for batch processing.
How to detect and track down the source of backpressure? # One way to detect backpressure is to use metrics, however, in Flink 1.13 it’s no longer necessary to dig so deep. In most cases, it should be enough to just look at the job graph in the Web UI.
The first thing to note in the example above is that different tasks have different colors. Those colors represent a combination of two factors: under how much backpressure this task is and how busy it is. Idling tasks will be blue, fully busy tasks will be red hot, and fully backpressured tasks will be black. Anything in between will be a combination/shade of those three colors. With this knowledge, one can easily spot the backpressured tasks (black). The busiest (red) task downstream of the backpressured tasks will most likely be the source of the backpressure (the bottleneck).
If you click on one particular task and go into the “BackPressure” tab, you will be able to further dissect the problem and check the busy/backpressured/idle status of every subtask in that task. For example, this is especially handy if there is data skew and not all subtasks are equally utilized.
Backpressure among subtasks
In the above example, we can clearly see which subtasks are idling, which are backpressured, and that none of them are busy. And frankly, in a nutshell, that should be enough to quickly understand what is happening with your Job :) However, there are a couple of more details worth explaining.
What are those numbers? # If you are curious how it works underneath, we can go a little deeper. At the base of this new mechanism we have three new metrics that are exposed and calculated by each subtask:
idleTimeMsPerSecond busyTimeMsPerSecond backPressuredTimeMsPerSecond Each of them measures the average time in milliseconds per second that the subtask spent being idle, busy, or backpressured respectively. Apart from some rounding errors they should complement each other and add up to 1000ms/s. In essence, they are quite similar to, for example, CPU usage metrics.
Another important detail is that they are being averaged over a short period of time (a couple of seconds) and they take into account everything that is happening inside the subtask’s thread: operators, functions, timers, checkpointing, records serialization/deserialization, network stack, and other Flink internal overheads. A WindowOperator that is busy firing timers and producing results will be reported as busy or backpressured. A function doing some expensive computation in CheckpointedFunction#snapshotState call, for instance flushing internal buffers, will also be reported as busy.
One limitation, however, is that busyTimeMsPerSecond and idleTimeMsPerSecond metrics are oblivious to anything that is happening in separate threads, outside of the main subtask’s execution loop. Fortunately, this is only relevant for two cases:
Custom threads that you manually spawn in your operators (a discouraged practice). Old-style sources that implement the deprecated SourceFunction interface. Such sources will report NaN/N/A as the value for busyTimeMsPerSecond. For more information on the topic of Data Sources please take a look here. Old-style sources do not report busy time
In order to present those raw numbers in the web UI, those metrics need to be aggregated from all subtasks (on the job graph we are showing only tasks). This is why the web UI presents the maximal value from all subtasks of a given task and why the aggregated maximal values of busy and backpressured may not add up to 100%. One subtask can be backpressured at 60%, while another can be busy at 60%. This can result in a task that is both backpressured and busy at 60%.
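If you prefer to consume these numbers programmatically, for example for alerting or a custom dashboard, the same metrics are exposed through Flink's monitoring REST API. The following is a minimal sketch, assuming a JobManager reachable at localhost:8081 and the aggregated subtask metrics endpoint; the job and vertex IDs are placeholders that you would first look up via the /jobs endpoints.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BackpressureMetricsProbe {

    public static void main(String[] args) throws Exception {
        // Placeholders: take the job ID from GET /jobs and the vertex (task) ID
        // from GET /jobs/{jobId}.
        String jobId = "replace-with-job-id";
        String vertexId = "replace-with-vertex-id";

        // Assumed REST endpoint that aggregates the per-subtask metrics of one task.
        String url = "http://localhost:8081/jobs/" + jobId
            + "/vertices/" + vertexId + "/subtasks/metrics"
            + "?get=busyTimeMsPerSecond,backPressuredTimeMsPerSecond,idleTimeMsPerSecond";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

        // Values are reported in milliseconds per second, so a backPressuredTimeMsPerSecond
        // that stays close to 1000 marks a task that is almost permanently waiting on its
        // downstream.
        System.out.println(response.body());
    }
}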
Varying load # There is one more thing. Do you remember that those metrics are measured and averaged over a couple of seconds? Keep this in mind when analyzing jobs or tasks with varying load, such as (sub)tasks containing a WindowOperator that is firing periodically. Both the subtask with a constant load of 50% and the subtask that alternates every second between being fully busy and fully idle will be reporting the same value of busyTimeMsPerSecond of 500ms/s.
Furthermore, varying load and especially firing windows can move the bottleneck to a different place in the job graph:
Bottleneck alternating between two tasks
SlidingWindowOperator
In this particular example, SlidingWindowOperator was the bottleneck as long as it was accumulating records. However, as soon as it starts to fire its windows (once every 10 seconds), the downstream task SlidingWindowCheckMapper -&gt; Sink: SlidingWindowCheckPrintSink becomes the bottleneck and SlidingWindowOperator gets backpressured. As those busy/backpressured/idle metrics are averaging time over a couple of seconds, this subtlety is not immediately visible and has to be read between the lines. On top of that, the web UI is updating its state only once every 10 seconds, which makes spotting more frequent changes a bit more difficult.
What can I do with backpressure? # In general this is a complex topic that is worthy of a dedicated blog post. It was, to a certain extent, addressed in previous blog posts. In short, there are two high-level ways of dealing with backpressure. Either add more resources (more machines, faster CPU, more RAM, better network, using SSDs…) or optimize usage of the resources you already have (optimize the code, tune the configuration, avoid data skew). In either case, you first need to analyze what is causing backpressure by:
Identifying the presence of backpressure. Locating which subtask(s) or machines are causing it. Digging deeper into what part of the code is causing it and which resource is scarce. Backpressure monitoring improvements and metrics can help you with the first two points. To tackle the last one, profiling the code can be the way to go. To help with profiling, also starting from Flink 1.13, Flame Graphs are integrated into Flink&rsquo;s web UI. Flame Graphs are a well-known profiling and visualization technique, and I encourage you to give them a try.
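If you want to experiment with Flame Graphs on a toy job before enabling them on a real cluster, the sketch below shows one way to switch them on for a local environment with the web UI. Treat the rest.flamegraph.enabled key and the need for the flink-runtime-web dependency as assumptions to double-check against the documentation of your Flink version.

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LocalFlameGraphExample {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed configuration key that enables Flame Graph sampling in the web UI.
        conf.setBoolean("rest.flamegraph.enabled", true);

        // Needs flink-runtime-web on the classpath so that the local web UI is started.
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.createLocalEnvironmentWithWebUI(conf);

        env.fromSequence(0, Long.MAX_VALUE)
            .map(i -> i * i) // some CPU work so that something shows up in the graph
            .print();

        env.execute("flame-graph-demo");
    }
}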
But keep in mind that after locating where the bottleneck is, you can analyze it the same way you would any other non-distributed application (by checking resource utilization, attaching a profiler, etc). Usually there is no silver bullet for problems like this. You can try to scale up but sometimes it might not be easy or practical to do.
Anyway&hellip; The aforementioned improvements to backpressure monitoring allow us to easily detect the source of backpressure, and Flame Graphs can help us to analyze why a particular subtask is causing problems. Together those two features should make the previously quite tedious process of debugging and performance analysis of Flink jobs that much easier! Please upgrade to Flink 1.13.x and try them out!
[1] There is a third possibility. In a rare case when network exchange is actually the bottleneck in your job, the downstream task will have empty input buffers, while upstream output buffers will be full. `}),e.add({id:115,href:"/2021/05/28/apache-flink-1.13.1-released/",title:"Apache Flink 1.13.1 Released",section:"Flink Blog",content:`The Apache Flink community released the first bugfix version of the Apache Flink 1.13 series.
This release includes 82 fixes and minor improvements for Flink 1.13.1. The list below includes bugfixes and improvements. For a complete list of all changes see: JIRA.
We highly recommend all users upgrade to Flink 1.13.1.
Updated Maven dependencies:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.13.1&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.13.1&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.13.1&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
Release Notes - Flink - Version 1.13.1 Sub-task [FLINK-22378] - Type mismatch when declaring SOURCE_WATERMARK on TIMESTAMP_LTZ column [FLINK-22666] - Make structured type&#39;s fields more lenient during casting Bug [FLINK-12351] - AsyncWaitOperator should deep copy StreamElement when object reuse is enabled [FLINK-17170] - Cannot stop streaming job with savepoint which uses kinesis consumer [FLINK-19449] - LEAD/LAG cannot work correctly in streaming mode [FLINK-21181] - Buffer pool is destroyed error when outputting data over a timer after cancellation. [FLINK-21247] - flink iceberg table map&lt;string,string&gt; cannot convert to datastream [FLINK-21469] - stop-with-savepoint --drain doesn&#39;t advance watermark for sources chained to MultipleInputStreamTask [FLINK-21923] - SplitAggregateRule will be abnormal, when the sum/count and avg in SQL at the same time [FLINK-22109] - Misleading exception message if the number of arguments of a nested function is incorrect [FLINK-22294] - Hive reading fail when getting file numbers on different filesystem nameservices [FLINK-22355] - Simple Task Manager Memory Model image does not show up [FLINK-22356] - Filesystem/Hive partition file is not committed when watermark is applied on rowtime of TIMESTAMP_LTZ type [FLINK-22408] - Flink Table Parsr Hive Drop Partitions Syntax unparse is Error [FLINK-22424] - Writing to already released buffers potentially causing data corruption during job failover/cancellation [FLINK-22431] - AdaptiveScheduler does not log failure cause when recovering [FLINK-22434] - Dispatcher does not store suspended jobs in execution graph store [FLINK-22438] - add numRecordsOut metric for Async IO [FLINK-22442] - Using scala api to change the TimeCharacteristic of the PatternStream is invalid [FLINK-22463] - IllegalArgumentException is thrown in WindowAttachedWindowingStrategy when two phase is enabled for distinct agg [FLINK-22479] - [Kinesis][Consumer] Potential lock-up under error condition [FLINK-22489] - subtask backpressure indicator shows value for entire job [FLINK-22494] - Avoid discarding checkpoints in case of failure [FLINK-22502] - DefaultCompletedCheckpointStore drops unrecoverable checkpoints silently [FLINK-22511] - Fix the bug of non-composite result type in Python TableAggregateFunction [FLINK-22512] - Can&#39;t call current_timestamp with hive dialect for hive-3.1 [FLINK-22522] - BytesHashMap has many verbose logs [FLINK-22523] - TUMBLE TVF should throw helpful exception when specifying second interval parameter [FLINK-22525] - The zone id in exception message should be GMT+08:00 instead of GMT+8:00 [FLINK-22535] - Resource leak would happen if exception thrown during AbstractInvokable#restore of task life [FLINK-22555] - LGPL-2.1 files in flink-python jars [FLINK-22573] - AsyncIO can timeout elements after completion [FLINK-22574] - Adaptive Scheduler: Can not cancel restarting job [FLINK-22592] - numBuffersInLocal is always zero when using unaligned checkpoints [FLINK-22596] - Active timeout is not triggered if there were no barriers [FLINK-22618] - Fix incorrect free resource metrics of task managers [FLINK-22654] - SqlCreateTable toString()/unparse() lose CONSTRAINTS and watermarks [FLINK-22661] - HiveInputFormatPartitionReader can return invalid data [FLINK-22688] - Root Exception can not be shown on Web UI in Flink 1.13.0 [FLINK-22706] - Source NOTICE outdated regarding docs/ [FLINK-22721] - Breaking HighAvailabilityServices interface by adding new method [FLINK-22733] - Type mismatch thrown in 
DataStream.union if parameter is KeyedStream for Python DataStream API Improvement [FLINK-18952] - Add 10 minutes to DataStream API documentation [FLINK-20695] - Zookeeper node under leader and leaderlatch is not deleted after job finished [FLINK-22250] - flink-sql-parser model Class ParserResource lack ParserResource.properties [FLINK-22301] - Statebackend and CheckpointStorage type is not shown in the Web UI [FLINK-22304] - Refactor some interfaces for TVF based window to improve the extendability [FLINK-22470] - The root cause of the exception encountered during compiling the job was not exposed to users in certain cases [FLINK-22560] - Filter maven metadata from all jars [FLINK-22699] - Make ConstantArgumentCount public API [FLINK-22708] - Propagate savepoint settings from StreamExecutionEnvironment to StreamGraph [FLINK-22725] - SlotManagers should unregister metrics at the start of suspend() `}),e.add({id:116,href:"/2021/05/21/apache-flink-1.12.4-released/",title:"Apache Flink 1.12.4 Released",section:"Flink Blog",content:`The Apache Flink community released the next bugfix version of the Apache Flink 1.12 series.
This release includes 21 fixes and minor improvements for Flink 1.12.3. The list below details all fixes and improvements.
We highly recommend all users upgrade to Flink 1.12.4.
Updated Maven dependencies:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.12.4&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.12.4&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.12.4&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
Release Notes - Flink - Version 1.12.4
Bug [FLINK-17170] - Cannot stop streaming job with savepoint which uses kinesis consumer [FLINK-20114] - Fix a few KafkaSource-related bugs [FLINK-21181] - Buffer pool is destroyed error when outputting data over a timer after cancellation. [FLINK-22109] - Misleading exception message if the number of arguments of a nested function is incorrect [FLINK-22368] - UnalignedCheckpointITCase hangs on azure [FLINK-22424] - Writing to already released buffers potentially causing data corruption during job failover/cancellation [FLINK-22438] - add numRecordsOut metric for Async IO [FLINK-22442] - Using scala api to change the TimeCharacteristic of the PatternStream is invalid [FLINK-22479] - [Kinesis][Consumer] Potential lock-up under error condition [FLINK-22489] - subtask backpressure indicator shows value for entire job [FLINK-22555] - LGPL-2.1 files in flink-python jars [FLINK-22557] - Japicmp fails on 1.12 branch [FLINK-22573] - AsyncIO can timeout elements after completion [FLINK-22577] - KubernetesLeaderElectionAndRetrievalITCase is failing [FLINK-22597] - JobMaster cannot be restarted Improvement [FLINK-18952] - Add 10 minutes to DataStream API documentation [FLINK-20553] - Add end-to-end test case for new Kafka source [FLINK-22470] - The root cause of the exception encountered during compiling the job was not exposed to users in certain cases [FLINK-22539] - Restructure the Python dependency management documentation [FLINK-22544] - Add the missing documentation about the command line options for PyFlink [FLINK-22560] - Filter maven metadata from all jars `}),e.add({id:117,href:"/2021/05/06/scaling-flink-automatically-with-reactive-mode/",title:"Scaling Flink automatically with Reactive Mode",section:"Flink Blog",content:` Introduction # Streaming jobs which run for several days or longer usually experience variations in workload during their lifetime. These variations can originate from seasonal spikes, such as day vs. night, weekdays vs. weekend or holidays vs. non-holidays, sudden events or simply the growing popularity of your product. Although some of these variations are more predictable than others, in all cases there is a change in job resource demand that needs to be addressed if you want to ensure the same quality of service for your customers.
A simple way of quantifying the mismatch between the required resources and the available resources is to measure the space between the actual load and the number of available workers. As pictured below, in the case of static resource allocation, you can see that there&rsquo;s a big gap between the actual load and the available workers — hence, we are wasting resources. For elastic resource allocation, the gap between the red and black line is consistently small.
Manually rescaling a Flink job has been possible since Flink 1.2 introduced rescalable state, which allows you to stop-and-restore a job with a different parallelism. For example, if your job is running with a parallelism of p=100 and your load increases, you can restart it with p=200 to cope with the additional data.
The problem with this approach is that you have to orchestrate a rescale operation with custom tools by yourself, including error handling and similar tasks.
Reactive Mode introduces a new option in Flink 1.13: you monitor your Flink cluster and add or remove resources depending on some metrics; Flink will do the rest. Reactive Mode is a mode in which the JobManager tries to use all TaskManager resources available in the cluster.
The big benefit of Reactive Mode is that you don&rsquo;t need any specific knowledge to scale Flink anymore. Flink basically behaves like a fleet of servers (e.g. webservers, caches, batch processing) that you can expand or shrink as you wish. Since this is such a common pattern, there is a lot of infrastructure available for handling such cases: all major cloud providers offer utilities to monitor specific metrics and automatically scale a set of machines accordingly. For example, this would be provided through Auto Scaling groups in AWS, and Managed Instance groups in Google Cloud. Similarly, Kubernetes provides Horizontal Pod Autoscalers.
What is interesting, as a side note, is that unlike most auto scalable &ldquo;fleets of servers&rdquo;, Flink is a stateful system, often processing valuable data requiring strong correctness guarantees (comparable to a database). But, unlike many traditional databases, Flink is resilient enough (through checkpointing and state backups) to adjust to changing workloads by just adding or removing resources, with very few requirements (i.e., a simple blob store for state backups).
Getting Started # If you want to try out Reactive Mode yourself locally, follow these steps using a Flink 1.13.0 distribution:
# These instructions assume you are in the root directory of a Flink distribution. # Put Job into usrlib/ directory mkdir usrlib cp ./examples/streaming/TopSpeedWindowing.jar usrlib/ # Submit Job in Reactive Mode ./bin/standalone-job.sh start -Dscheduler-mode=reactive -Dexecution.checkpointing.interval=&#34;10s&#34; -j org.apache.flink.streaming.examples.windowing.TopSpeedWindowing # Start first TaskManager ./bin/taskmanager.sh start You have now started a Flink job in Reactive Mode. The web interface shows that the job is running on one TaskManager. If you want to scale up the job, simply add another TaskManager to the cluster:
# Start additional TaskManager ./bin/taskmanager.sh start To scale down, remove a TaskManager instance:
# Remove a TaskManager ./bin/taskmanager.sh stop Reactive Mode also works when deploying Flink on Docker or using the standalone Kubernetes deployment (both only as application clusters).
Demo on Kubernetes # In this section, we want to demonstrate the new Reactive Mode in a real-world scenario. You can use this demo as a starting point for your own scalable deployment of Flink on Kubernetes, or as a template for building your own deployment using a different setup.
The Setup # The central idea of this demo is to use a Kubernetes Horizontal Pod Autoscaler, which monitors the CPU load of all TaskManager pods and adjusts their replication factor accordingly. On high CPU load, the autoscaler should add more TaskManagers, distributing the load across more machines. On low load, it should stop TaskManagers to save resources.
The whole setup is presented here:
Let&rsquo;s discuss the components:
Flink
The JobManager is deployed as a Kubernetes job. We are submitting a container that is based on the official Flink Docker image, but has the jar file of our job added to it. The Flink job simply reads data from a Kafka topic and does some expensive math operations per event received. We use these math operations to generate high CPU loads, without requiring a large Kafka deployment. The TaskManager(s) are deployed as a Kubernetes deployment, which is scaled through a Horizontal Pod Autoscaler. In this experiment, the autoscaler is monitoring the CPU load of the pods in the deployment. The number of pods is adjusted between 1 and 15 pods by the autoscaler. Additional Components:
We have a Zookeeper and Kafka deployment (each with one pod) to provide a Kafka topic that serves as the input for the Flink job. The Data Generator pod produces simple string messages at an adjustable rate to the Kafka topic. In this experiment, the rate is following a sine wave. For monitoring, we are deploying Prometheus and Grafana. The entire setup is available on GitHub if you want to try this out yourself.
Results # We&rsquo;ve deployed all the above components on a hosted Kubernetes cluster, running it for several days. The results are best examined based on the following Grafana dashboard:
Reactive Mode Experiment Results
Let&rsquo;s take a closer look at the dashboard:
On the top left, you can see the Kafka consumer lag, reported by Flink&rsquo;s Kafka consumer (source), which reports the queue size of unprocessed messages. A high lag means that Flink is not processing messages as fast as they are produced: we need to scale up.
The lag is usually following the throughput of data coming from Kafka. When the throughput is the highest, the reported lag is at ~75k messages. In low throughput times, it is basically at zero.
On the top right, you&rsquo;ll see the throughput, measured in records per second, as reported by Flink. The throughput is roughly following a sine wave, peaking at 6k messages per second, and going down to almost zero.
The bottom left chart shows the CPU load per TaskManager. We&rsquo;ve added this metric to the dashboard because this is what the pod autoscaler in Kubernetes will use to decide on the replica count of the TaskManager deployment. You can see that, as soon as a certain CPU load is reached, additional TaskManagers are started.
In the bottom right chart, you can see the TaskManager count over time. When the throughput (and CPU load) is peaking, we&rsquo;re running on 5 TaskManagers (with some peaks up to even 8). On low throughput, we&rsquo;re running the minimal number of just one TaskManager. This chart showcases nicely that Reactive Mode is working as expected in this experiment: the number of TaskManagers is adjusting to the load on the system.
Lessons Learned: Configuring a low heartbeat timeout for a smooth scale down # When we initially started with the experiment, we noticed some anomalies in the behavior of Flink, depicted in this chart:
Reactive Mode not scaling down properly
In all the charts, we see sudden spikes or drops: the consumer lag goes up to 600k messages (that&rsquo;s 8 times more than the usual 75k lag we observe at peak), and the throughput seems to peak (and drop). On the &ldquo;Number of TaskManagers&rdquo; chart, we see that we are not following the throughput line very nicely. We are wasting resources by allocating too many TaskManagers for the given rate.
We see that these issues are only occurring when the load is decreasing, and Reactive Mode is supposed to scale down. So what is happening here?
The Flink JobManager is sending periodic heartbeats to the TaskManagers, to check if they are still alive. These heartbeats have a default timeout of 50 seconds. This value might seem high, but in high load scenarios, there might be network congestion, garbage collection pauses, or other disruptions that cause slow heartbeats. We don&rsquo;t want to consider a TaskManager dead just because of a temporary disruption.
However, this default value is causing problems in this experiment: When the Kubernetes autoscaler notices that the CPU load has gone down, it will reduce the replica count of the TaskManager deployment, stopping at least one TaskManager instance. Flink will almost immediately stop processing messages, because of the connection loss in the data transport layer of Flink. However, the JobManager will wait for 50 seconds (the default heartbeat timeout) before the TaskManager is considered dead.
During this waiting period, the throughput is at zero and messages are queuing in Kafka (causing spikes in the consumer lag). Once Flink is running again, Flink will try to catch up on the queued messages, causing a spike in CPU load. The autoscaler notices this load spike and allocates more TaskManagers.
We are only seeing this effect on scale down, because scaling down is much more disruptive than scaling up. Scaling up, which means adding TaskManagers, disrupts the processing only for the duration of a job restart (which is fast since our application state is just a few bytes for the Kafka offsets), while scaling down disrupts the processing for roughly 50 seconds.
To mitigate this issue, we have reduced the heartbeat.timeout in our experiment to 8 seconds. Additionally, we are looking into improving the behavior of the JobManager to detect TaskManager losses better and faster.
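For reference, the mitigation itself is a single configuration entry (heartbeat.timeout, in milliseconds), which in a standalone or Kubernetes setup would go into flink-conf.yaml. The Java sketch below only illustrates the equivalent for a programmatically configured local cluster and assumes that HeartbeatManagerOptions.HEARTBEAT_TIMEOUT is the corresponding option constant.

import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.HeartbeatManagerOptions;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LowHeartbeatTimeoutExample {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Lower the JobManager <-> TaskManager heartbeat timeout from the default
        // 50 s to 8 s so that a stopped TaskManager is detected quickly.
        conf.set(HeartbeatManagerOptions.HEARTBEAT_TIMEOUT, 8_000L);

        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.createLocalEnvironment(2, conf);

        env.fromSequence(0, 1_000)
            .map(i -> i + 1)
            .print();

        env.execute("heartbeat-timeout-demo");
    }
}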
Conclusion # In this blog post, we&rsquo;ve introduced Reactive Mode, a big step forward in Flink&rsquo;s ability to dynamically adjust to changing workloads, reducing resource utilization and overall costs. The blog post demonstrated Reactive Mode on Kubernetes, including some lessons learned.
Reactive Mode is a new feature in Flink 1.13 and is currently in the MVP (Minimum Viable Product) phase of product development. Before experimenting with it, or using it in production, please check the documentation, in particular the current limitations section. In this phase, the biggest limitation is that only standalone application mode deployments are supported (i.e. no active resource managers or session clusters).
The community is actively looking for feedback on this feature, to continue improving Flink&rsquo;s resource elasticity. If you have any feedback, please reach out to the dev@ mailing list or to me personally on Twitter.
`}),e.add({id:118,href:"/2021/05/03/apache-flink-1.13.0-release-announcement/",title:"Apache Flink 1.13.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is excited to announce the release of Flink 1.13.0! More than 200 contributors worked on over 1,000 issues for this new version.
The release brings us a big step forward in one of our major efforts: Making Stream Processing Applications as natural and as simple to manage as any other application. The new reactive scaling mode means that scaling streaming applications in and out now works like in any other application by just changing the number of parallel processes.
The release also prominently features a series of improvements that help users better understand the performance of applications. When the streams don&rsquo;t flow as fast as you&rsquo;d hope, these can help you to understand why: Load and backpressure visualization to identify bottlenecks, CPU flame graphs to identify hot code paths in your application, and State Access Latencies to see how the State Backends are keeping up.
Beyond those features, the Flink community has added a ton of improvements all over the system, some of which we discuss in this article. We hope you enjoy the new release and features. Towards the end of the article, we describe changes to be aware of when upgrading from earlier versions of Apache Flink.
We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
Notable features # Reactive scaling # Reactive scaling is the latest piece in Flink&rsquo;s initiative to make Stream Processing Applications as natural and as simple to manage as any other application.
Flink has a dual nature when it comes to resource management and deployments: You can deploy Flink applications onto resource orchestrators like Kubernetes or Yarn in such a way that Flink actively manages the resources and allocates and releases workers as needed. That is especially useful for jobs and applications that rapidly change their required resources, like batch applications and ad-hoc SQL queries. The application parallelism rules, the number of workers follows. In the context of Flink applications, we call this active scaling.
For long-running streaming applications, it is often a nicer model to just deploy them like any other long-running application: The application doesn&rsquo;t really need to know that it runs on K8s, EKS, Yarn, etc. and doesn&rsquo;t try to acquire a specific amount of workers; instead, it just uses the number of workers that are given to it. The number of workers rules, the application parallelism adjusts to that. In the context of Flink, we call that reactive scaling.
The Application Deployment Mode started this effort, making deployments more application-like (by avoiding two separate deployment steps to (1) start a cluster and (2) submit an application). The reactive scaling mode completes this, and you now don&rsquo;t have to use extra tools (scripts, or a K8s operator) anymore to keep the number of workers, and the application parallelism settings in sync.
You can now put an auto-scaler around Flink applications like around other typical applications — as long as you are mindful about the cost of rescaling when configuring the autoscaler: Stateful streaming applications must move state around when scaling.
To try the reactive-scaling mode, add the scheduler-mode: reactive config entry and deploy an application cluster (standalone or Kubernetes). Check out the reactive scaling docs for more details.
Analyzing application performance # Like for any application, analyzing and understanding the performance of a Flink application is critical. Often even more critical, because Flink applications are typically data-intensive (processing high volumes of data) and are at the same time expected to provide results within (near-) real-time latencies.
When an application doesn&rsquo;t keep up with the data rate anymore, or an application takes more resources than you&rsquo;d expect it would, these new tools can help you track down the causes:
Bottleneck detection, Back Pressure monitoring
The first question during performance analysis is often: Which operation is the bottleneck?
To help answer that, Flink exposes metrics about the degree to which tasks are busy (doing work) and back-pressured (have the capacity to do work but cannot because their successor operators cannot accept more results). Candidates for bottlenecks are the busy operators whose predecessors are back-pressured.
Flink 1.13 brings an improved back pressure metric system (using task mailbox timings rather than thread stack sampling), and a reworked graphical representation of the job&rsquo;s dataflow with color-coding and ratios for busyness and backpressure.
CPU flame graphs in Web UI
The next question during performance analysis is typically: What part of work in the bottlenecked operator is expensive?
One visually effective means to investigate that is Flame Graphs. They help answer questions like:
Which methods are currently consuming CPU resources?
How does one method&rsquo;s CPU consumption compare to other methods?
Which series of calls on the stack led to executing a particular method?
Flame Graphs are constructed by repeatedly sampling the thread stack traces. Every method call is represented by a bar, where the length of the bar is proportional to the number of times it is present in the samples. When enabled, the graphs are shown in a new UI component for the selected operator.
The Flame Graphs documentation contains more details and instructions for enabling this feature.
Access Latency Metrics for State
Another possible performance bottleneck can be the state backend, especially when your state is larger than the main memory available to Flink and you are using the RocksDB state backend.
That&rsquo;s not saying RocksDB is slow (we love RocksDB!), but it has some requirements to achieve good performance. For example, it is easy to accidentally starve RocksDB&rsquo;s demand for IOPs on cloud setups with the wrong type of disk resources.
On top of the CPU flame graphs, the new state backend latency metrics can help you understand whether your state backend is responsive. For example, if you see that RocksDB state accesses start to take milliseconds, you probably need to look into your memory and I/O configuration. These metrics can be activated by setting the state.backend.rocksdb.latency-track-enabled option. The metrics are sampled, and their collection should have a marginal impact on the RocksDB state backend performance.
Switching State Backend with savepoints # You can now change the state backend of a Flink application when resuming from a savepoint. That means the application&rsquo;s state is no longer locked into the state backend that was used when the application was initially started.
This makes it possible, for example, to initially start with the HashMap State Backend (pure in-memory in JVM Heap) and later switch to the RocksDB State Backend, once the state grows too large.
Under the hood, Flink now has a canonical savepoint format, which all state backends use when creating a data snapshot for a savepoint.
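A minimal sketch of what the switch looks like in code: a job that previously ran with the HashMap state backend is resubmitted with RocksDB configured instead and resumed from its savepoint (for example via flink run -s <savepointPath>). It assumes the Flink 1.13 state backend classes and the flink-statebackend-rocksdb dependency; the checkpoint path is a placeholder.

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SwitchStateBackendExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The job used to run with new HashMapStateBackend(). Thanks to the canonical
        // savepoint format, the same job can be resumed with a different backend,
        // here RocksDB with incremental checkpoints enabled.
        env.setStateBackend(new EmbeddedRocksDBStateBackend(true));

        // Placeholder: point this at your durable checkpoint storage.
        env.getCheckpointConfig().setCheckpointStorage("s3://my-bucket/checkpoints");

        // The pipeline itself stays exactly the same as before the switch.
        env.fromSequence(0, 100)
            .keyBy(i -> i % 10)
            .reduce(Long::sum)
            .print();

        env.execute("switch-state-backend");
    }
}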
User-specified pod templates for Kubernetes deployments # The native Kubernetes deployment (where Flink actively talks to K8s to start and stop pods) now supports custom pod templates.
With those templates, users can set up and configure the JobManagers and TaskManagers pods in a Kubernetes-y way, with flexibility beyond the configuration options that are directly built into Flink&rsquo;s Kubernetes integration.
Unaligned Checkpoints - production-ready # Unaligned Checkpoints have matured to the point where we encourage all users to try them out, if they see issues with their application under backpressure.
In particular, these changes make Unaligned Checkpoints easier to use:
You can now rescale applications from unaligned checkpoints. This comes in handy if your application needs to be scaled from a retained checkpoint because you cannot (afford to) create a savepoint.
Enabling unaligned checkpoints is cheaper for applications that are not back-pressured. Unaligned checkpoints can now trigger adaptively with a timeout, meaning a checkpoint starts as an aligned checkpoint (not storing any in-flight events) and falls back to an unaligned checkpoint (storing some in-flight events), if the alignment phase takes longer than a certain time.
Find out more about how to enable unaligned checkpoints in the Checkpointing Documentation.
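For completeness, here is a minimal sketch of opting in from the DataStream API. The enableUnalignedCheckpoints() call is the switch described above; the configuration key mentioned in the comment for the adaptive alignment timeout is an assumption to verify against the checkpointing documentation.

import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnalignedCheckpointsExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 10 seconds (exactly-once mode is the default).
        env.enableCheckpointing(10_000);

        CheckpointConfig checkpointConfig = env.getCheckpointConfig();
        // Let checkpoint barriers overtake in-flight records under backpressure.
        checkpointConfig.enableUnalignedCheckpoints();

        // The adaptive behavior (start aligned, fall back to unaligned after a timeout)
        // is driven by a configuration option; for Flink 1.13 the assumed key is
        // execution.checkpointing.alignment-timeout.

        env.fromSequence(0, Long.MAX_VALUE)
            .print();

        env.execute("unaligned-checkpoints-demo");
    }
}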
Machine Learning Library moving to a separate repository # To accelerate the development of Flink&rsquo;s Machine Learning efforts (streaming, batch, and unified machine learning), the effort has moved to the new repository flink-ml under the Flink project. Here we follow an approach similar to the Stateful Functions effort, where a separate repository has helped to speed up development by allowing for more lightweight contribution workflows and separate release cycles.
Stay tuned for more updates in the Machine Learning efforts, like the interplay with ALink (suite of many common Machine Learning Algorithms on Flink) or the Flink &amp; TensorFlow integration.
Notable SQL &amp; Table API improvements # Like in previous releases, SQL and the Table API remain an area of big developments.
Windows via Table-valued functions # Defining time windows is one of the most frequent operations in streaming SQL queries. Flink 1.13 introduces a new way to define windows: via Table-valued Functions. This approach is both more expressive (lets you define new types of windows) and fully in line with the SQL standard.
Flink 1.13 supports TUMBLE and HOP windows in the new syntax, SESSION windows will follow in a subsequent release. To demonstrate the increased expressiveness, consider the two examples below.
A new CUMULATE window function that assigns windows with an expanding step size until the maximum window size is reached:
SELECT window_time, window_start, window_end, SUM(price) AS total_price FROM TABLE(CUMULATE(TABLE Bid, DESCRIPTOR(bidtime), INTERVAL &#39;2&#39; MINUTES, INTERVAL &#39;10&#39; MINUTES)) GROUP BY window_start, window_end, window_time; You can reference the window start and window end time of the table-valued window functions, making new types of constructs possible. Beyond regular windowed aggregations and windowed joins, you can, for example, now express windowed Top-K aggregations:
SELECT window_time, ... FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY window_start, window_end ORDER BY total_price DESC) as rank FROM t ) WHERE rank &lt;= 100; Improved interoperability between DataStream API and Table API/SQL # This release radically simplifies mixing DataStream API and Table API programs.
The Table API is a great way to develop applications, with its declarative nature and its many built-in functions. But sometimes, you need to escape to the DataStream API for its expressiveness, flexibility, and explicit control over the state.
The new methods StreamTableEnvironment.toDataStream()/.fromDataStream() can model a DataStream from the DataStream API as a table source or sink. Notable improvements include:
Automatic type conversion between the DataStream and Table API type systems.
Seamless integration of event time configurations; watermarks flow through boundaries for high consistency.
The Row class (representing row events from the Table API) has received a major overhaul (improving the behavior of the toString()/hashCode()/equals() methods) and now supports accessing fields by name, with support for sparse representations.
Table table = tableEnv.fromDataStream( dataStream, Schema.newBuilder() .columnByMetadata(&#34;rowtime&#34;, &#34;TIMESTAMP(3)&#34;) .watermark(&#34;rowtime&#34;, &#34;SOURCE_WATERMARK()&#34;) .build()); DataStream&lt;Row&gt; dataStream = tableEnv.toDataStream(table) .keyBy(r -&gt; r.getField(&#34;user&#34;)) .window(...); SQL Client: Init scripts and Statement Sets # The SQL Client is a convenient way to run and deploy SQL streaming and batch jobs directly, without writing any code from the command line, or as part of a CI/CD workflow.
This release vastly improves the functionality of the SQL client. Almost all operations that are available to Java applications (when programmatically launching queries from the TableEnvironment) are now supported in the SQL Client and as SQL scripts. That means SQL users need much less glue code for their SQL deployments.
Easier Configuration and Code Sharing
The support of YAML files to configure the SQL Client will be discontinued. Instead, the client accepts one or more initialization scripts to configure a session before the main SQL script gets executed.
These init scripts would typically be shared across teams/deployments and could be used for loading common catalogs, applying common configuration settings, or defining standard views.
./sql-client.sh -i init1.sql init2.sql -f sqljob.sql More config options
A greater set of recognized config options and improved SET/RESET commands make it easier to define and control the execution from within the SQL client and SQL scripts.
Multi-query Support with Statement Sets
Multi-query execution lets you execute multiple SQL queries (or statements) as a single Flink job. This is particularly useful for streaming SQL queries that run indefinitely.
Statement Sets are the mechanism to group the queries together that should be executed together.
The following is an example of a SQL script that can be run via the SQL client. It sets up and configures the environment and executes multiple queries. The script captures end-to-end the queries and all environment setup and configuration work, making it a self-contained deployment artifact.
-- set up a catalog CREATE CATALOG hive_catalog WITH (&#39;type&#39; = &#39;hive&#39;); USE CATALOG hive_catalog; -- or use temporary objects CREATE TEMPORARY TABLE clicks ( user_id BIGINT, page_id BIGINT, viewtime TIMESTAMP ) WITH ( &#39;connector&#39; = &#39;kafka&#39;, &#39;topic&#39; = &#39;clicks&#39;, &#39;properties.bootstrap.servers&#39; = &#39;...&#39;, &#39;format&#39; = &#39;avro&#39; ); -- set the execution mode for jobs SET execution.runtime-mode=streaming; -- set the sync/async mode for INSERT INTOs SET table.dml-sync=false; -- set the job&#39;s parallelism SET parallelism.default=10; -- set the job name SET pipeline.name = my_flink_job; -- restore state from the specific savepoint path SET execution.savepoint.path=/tmp/flink-savepoints/savepoint-bb0dab; BEGIN STATEMENT SET; INSERT INTO pageview_pv_sink SELECT page_id, count(1) FROM clicks GROUP BY page_id; INSERT INTO pageview_uv_sink SELECT page_id, count(distinct user_id) FROM clicks GROUP BY page_id; END; Hive query syntax compatibility # You can now write SQL queries against Flink using the Hive SQL syntax. In addition to Hive&rsquo;s DDL dialect, Flink now also accepts the commonly-used Hive DML and DQL dialects.
To use the Hive SQL dialect, set table.sql-dialect to hive and load the HiveModule. The latter is important because Hive&rsquo;s built-in functions are required for proper syntax and semantics compatibility. The following example illustrates that:
CREATE CATALOG myhive WITH (&#39;type&#39; = &#39;hive&#39;); -- setup HiveCatalog USE CATALOG myhive; LOAD MODULE hive; -- setup HiveModule USE MODULES hive,core; SET table.sql-dialect = hive; -- enable Hive dialect SELECT key, value FROM src CLUSTER BY key; -- run some Hive queries Please note that the Hive dialect no longer supports Flink&rsquo;s SQL syntax for DML and DQL statements. Switch back to the default dialect for Flink&rsquo;s syntax.
Improved behavior of SQL time functions # Working with time is a crucial element of any data processing. At the same time, handling different time zones, dates, and times is an incredibly delicate task when working with data.
In Flink 1.13, we put much effort into simplifying the usage of time-related functions. We adjusted (made more specific) the return types of functions such as PROCTIME(), CURRENT_TIMESTAMP, and NOW().
Moreover, you can now also define an event time attribute on a TIMESTAMP_LTZ column to gracefully do window processing with the support of Daylight Saving Time.
Please see the release notes for a complete list of changes.
Notable PyFlink improvements # The general theme of this release in PyFlink is to bring the Python DataStream API and Table API closer to feature parity with the Java/Scala APIs.
Stateful operations in the Python DataStream API # With Flink 1.13, Python programmers now also get to enjoy the full potential of Apache Flink&rsquo;s stateful stream processing APIs. The rearchitected Python DataStream API, introduced in Flink 1.12, now has full stateful capabilities, allowing users to remember information from events in the state and act on it later.
That stateful processing capability is the basis of many of the more sophisticated processing operations, which need to remember information across individual events (for example, Windowing Operations).
This example shows a custom counting window implementation, using state:
class CountWindowAverage(FlatMapFunction): def __init__(self, window_size): self.window_size = window_size def open(self, runtime_context: RuntimeContext): descriptor = ValueStateDescriptor(&#34;average&#34;, Types.TUPLE([Types.LONG(), Types.LONG()])) self.sum = runtime_context.get_state(descriptor) def flat_map(self, value): current_sum = self.sum.value() if current_sum is None: current_sum = (0, 0) # update the count current_sum = (current_sum[0] + 1, current_sum[1] + value[1]) # if the count reaches window_size, emit the average and clear the state if current_sum[0] &gt;= self.window_size: self.sum.clear() yield value[0], current_sum[1] // current_sum[0] else: self.sum.update(current_sum) ds = ... # type: DataStream ds.key_by(lambda row: row[0]) \\ .flat_map(CountWindowAverage(5)) User-defined Windows in the PyFlink DataStream API # Flink 1.13 adds support for user-defined windows to the PyFlink DataStream API. Programs can now use windows beyond the standard window definitions.
Because windows are at the heart of all programs that process unbounded streams (by splitting the stream into &ldquo;buckets&rdquo; of bounded size), this greatly increases the expressiveness of the API.
Row-based operation in the PyFlink Table API # The Python Table API now supports row-based operations, i.e., custom transformation functions on rows. These functions are an easy way to apply data transformations on tables beyond the built-in functions.
This is an example of using a map() operation in Python Table API:
@udf(result_type=DataTypes.ROW( [DataTypes.FIELD(&#34;c1&#34;, DataTypes.BIGINT()), DataTypes.FIELD(&#34;c2&#34;, DataTypes.STRING())])) def increment_column(r: Row) -&gt; Row: return Row(r[0] + 1, r[1]) table = ... # type: Table mapped_result = table.map(increment_column) In addition to map(), the API also supports flat_map(), aggregate(), flat_aggregate(), and other row-based operations. This brings the Python Table API a big step closer to feature parity with the Java Table API.
Batch execution mode for PyFlink DataStream programs # The PyFlink DataStream API now also supports the batch execution mode for bounded streams, which was introduced for the Java DataStream API in Flink 1.12.
The batch execution mode simplifies operations and improves the performance of programs on bounded streams, by exploiting the bounded stream nature to bypass state backends and checkpoints.
Other improvements # Flink Documentation via Hugo
The Flink Documentation has been migrated from Jekyll to Hugo. If you find something missing, please let us know. We are also curious to hear if you like the new look &amp; feel.
Exception histories in the Web UI
The Flink Web UI now presents up to n of the last exceptions that caused a job to fail. This helps debug scenarios where a root failure caused subsequent failures; the root failure cause can be found in the exception history.
Better exception / failure-cause reporting for unsuccessful checkpoints
Flink now provides statistics for checkpoints that failed or were aborted to make it easier to determine the failure cause without having to analyze the logs.
Prior versions of Flink reported metrics (e.g., size of persisted data, trigger time) only when a checkpoint succeeded.
Exactly-once JDBC sink
Starting with Flink 1.13, the JDBC sink can guarantee exactly-once delivery of results for XA-compliant databases by transactionally committing results on checkpoints. The target database must have (or be linked to) an XA Transaction Manager.
The connector currently exists only for the DataStream API, and can be created through the JdbcSink.exactlyOnceSink(...) method (or by instantiating the JdbcXaSinkFunction directly).
PyFlink Table API supports User-Defined Aggregate Functions in Group Windows
Group Windows in PyFlink’s Table API now support both general Python User-defined Aggregate Functions (UDAFs) and Pandas UDAFs. Such functions are critical to many analysis and ML training programs.
Flink 1.13 improves upon previous releases, where these functions were only supported in unbounded Group-by aggregations.
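As a rough sketch of what this looks like (the table, column names, and window size below are invented for illustration, not taken from the release), a Pandas UDAF can now be applied inside a tumbling group window of the Python Table API:
from pyflink.table import DataTypes
from pyflink.table.expressions import col, lit
from pyflink.table.udf import udaf
from pyflink.table.window import Tumble

# a Pandas UDAF receives the values of each group/window as a pandas.Series
@udaf(result_type=DataTypes.FLOAT(), func_type="pandas")
def mean_udaf(v):
    return v.mean()

orders = ...  # type: Table, assumed to have columns: currency, amount, rowtime (event time)
result = orders \
    .window(Tumble.over(lit(1).days).on(col("rowtime")).alias("w")) \
    .group_by(col("currency"), col("w")) \
    .select(col("currency"), mean_udaf(col("amount")), col("w").start)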
Improved Sort-Merge Shuffle for Batch Execution
Flink 1.13 improves the memory stability and performance of the sort-merge blocking shuffle for batch-executed programs, initially introduced in Flink 1.12 via FLIP-148.
Programs with higher parallelism (1000s) should no longer frequently trigger OutOfMemoryError: Direct Memory. The performance (especially on spinning disks) is improved through better I/O scheduling and broadcast optimizations.
HBase connector supports async lookup and lookup cache
The HBase Lookup Table Source now supports an async lookup mode and a lookup cache. This greatly benefits the performance of Table/SQL jobs with lookup joins against HBase, while reducing the I/O requests to HBase in the typical case.
In prior versions, the HBase Lookup Source only communicated synchronously, resulting in lower pipeline utilization and throughput.
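As an illustrative sketch only (the table name, schema, and addresses are invented, and the exact option keys should be checked against the HBase connector documentation for your Flink version), such a lookup table could be declared from the Python Table API roughly like this:
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(
    EnvironmentSettings.new_instance().in_streaming_mode().build())

# hypothetical HBase-backed dimension table with async lookup and a bounded lookup cache
t_env.execute_sql("""
    CREATE TABLE customers_dim (
        rowkey STRING,
        cf ROW<name STRING, country STRING>,
        PRIMARY KEY (rowkey) NOT ENFORCED
    ) WITH (
        'connector' = 'hbase-2.2',
        'table-name' = 'customers',
        'zookeeper.quorum' = 'zk-host:2181',
        'lookup.async' = 'true',
        'lookup.cache.max-rows' = '10000',
        'lookup.cache.ttl' = '10min'
    )
""")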
Changes to consider when upgrading to Flink 1.13 #
FLINK-21709 - The old planner of the Table & SQL API has been deprecated in Flink 1.13 and will be dropped in Flink 1.14. The Blink engine has been the default planner for some releases now and will be the only one going forward. That means that both the BatchTableEnvironment and SQL/DataSet interoperability are reaching the end of their life. Please use the unified TableEnvironment for batch and stream processing going forward.
FLINK-22352 - The community decided to deprecate the Apache Mesos support for Apache Flink. It is subject to removal in the future. Users are encouraged to switch to a different resource manager.
FLINK-21935 - The state.backend.async option is deprecated. Snapshots are always asynchronous now (as they were by default before) and there is no option to configure a synchronous snapshot anymore.
FLINK-17012 - The tasks’ RUNNING state was split into two states: INITIALIZING and RUNNING. A task is INITIALIZING while it loads the checkpointed state and, in the case of unaligned checkpoints, until the checkpointed in-flight data has been recovered. By making the state-restoring phase explicit, monitoring systems can better determine when tasks are really back to doing work.
FLINK-21698 - The CAST operation between the NUMERIC type and the TIMESTAMP type is problematic and is therefore no longer supported: statements like CAST(numeric AS TIMESTAMP(3)) will now fail. Please use TO_TIMESTAMP(FROM_UNIXTIME(numeric)) instead.
FLINK-22133 - The unified source API for connectors has a minor breaking change: the SplitEnumerator.snapshotState() method was adjusted to accept the checkpoint ID of the checkpoint for which the snapshot is created.
FLINK-19463 - The old StateBackend interfaces were deprecated because they had overloaded semantics which many users found confusing. This is a pure API change and does not affect the runtime characteristics of applications. For full details on how to update existing pipelines, please see the migration guide.
Resources # The binary distribution and source artifacts are now available on the updated Downloads page of the Flink website, and the most recent distribution of PyFlink is available on PyPI.
Please review the release notes carefully if you plan to upgrade your setup to Flink 1.13. This version is API-compatible with previous 1.x releases for APIs annotated with the @Public annotation.
You can also check the complete release changelog and updated documentation for a detailed list of changes and new features.
List of Contributors # The Apache Flink community would like to thank each one of the contributors that have made this release possible:
acqua.csq, AkisAya, Alexander Fedulov, Aljoscha Krettek, Ammar Al-Batool, Andrey Zagrebin, anlen321, Anton Kalashnikov, appleyuchi, Arvid Heise, Austin Cawley-Edwards, austin ce, azagrebin, blublinsky, Brian Zhou, bytesmithing, caozhen1937, chen qin, Chesnay Schepler, Congxian Qiu, Cristian, cxiiiiiii, Danny Chan, Danny Cranmer, David Anderson, Dawid Wysakowicz, dbgp2021, Dian Fu, DinoZhang, dixingxing, Dong Lin, Dylan Forciea, est08zw, Etienne Chauchot, fanrui03, Flora Tao, FLRNKS, fornaix, fuyli, George, Giacomo Gamba, GitHub, godfrey he, GuoWei Ma, Gyula Fora, hackergin, hameizi, Haoyuan Ge, Harshvardhan Chauhan, Haseeb Asif, hehuiyuan, huangxiao, HuangXiao, huangxingbo, HuangXingBo, humengyu2012, huzekang, Hwanju Kim, Ingo Bürk, I. Raleigh, Ivan, iyupeng, Jack, Jane, Jark Wu, Jerry Wang, Jiangjie (Becket) Qin, JiangXin, Jiayi Liao, JieFang.He, Jie Wang, jinfeng, Jingsong Lee, JingsongLi, Jing Zhang, Joao Boto, JohnTeslaa, Jun Qin, kanata163, kevin.cyj, KevinyhZou, Kezhu Wang, klion26, Kostas Kloudas, kougazhang, Kurt Young, laughing, legendtkl, leiqiang, Leonard Xu, liaojiayi, Lijie Wang, liming.1018, lincoln lee, lincoln-lil, liushouwei, liuyufei, LM Kang, lometheus, luyb, Lyn Zhang, Maciej Obuchowski, Maciek Próchniak, mans2singh, Marek Sabo, Matthias Pohl, meijie, Mika Naylor, Miklos Gergely, Mohit Paliwal, Moritz Manner, morsapaes, Mulan, Nico Kruber, openopen2, paul8263, Paul Lam, Peidian li, pengkangjing, Peter Huang, Piotr Nowojski, Qinghui Xu, Qingsheng Ren, Raghav Kumar Gautam, Rainie Li, Ricky Burnett, Rion Williams, Robert Metzger, Roc Marshal, Roman, Roman Khachatryan, Ruguo, Ruguo Yu, Rui Li, Sebastian Liu, Seth Wiesman, sharkdtu, sharkdtu(涂小刚), Shengkai, shizhengchao, shouweikun, Shuo Cheng, simenliuxing, SteNicholas, Stephan Ewen, Suo Lu, sv3ndk, Svend Vanderveken, taox, Terry Wang, Thelgis Kotsos, Thesharing, Thomas Weise, Till Rohrmann, Timo Walther, Ting Sun, totoro, totorooo, TsReaper, Tzu-Li (Gordon) Tai, V1ncentzzZ, vthinkxie, wangfeifan, wangpeibin, wangyang0918, wangyemao-github, Wei Zhong, Wenlong Lyu, wineandcheeze, wjc, xiaoHoly, Xintong Song, xixingya, xmarker, Xue Wang, Yadong Xie, yangsanity, Yangze Guo, Yao Zhang, Yuan Mei, yulei0824, Yu Li, Yun Gao, Yun Tang, yuruguo, yushujun, Yuval Itzchakov, yuzhao.cyz, zck, zhangjunfan, zhangzhengqi3, zhao_wei_nan, zhaown, zhaoxing, Zhenghua Gao, Zhenqiu Huang, zhisheng, zhongqishang, zhushang, zhuxiaoshang, Zhu Zhu, zjuwangg, zoucao, zoudan, 左元, 星, 肖佳文, 龙三
`}),e.add({id:119,href:"/2021/04/29/apache-flink-1.12.3-released/",title:"Apache Flink 1.12.3 Released",section:"Flink Blog",content:`The Apache Flink community released the next bugfix version of the Apache Flink 1.12 series.
This release includes 73 fixes and minor improvements for Flink 1.12.2. The list below includes a detailed list of all fixes and improvements.
We highly recommend that all users upgrade to Flink 1.12.3.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.12.3</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.12.3</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.12.3</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Bug [FLINK-18071] - CoordinatorEventsExactlyOnceITCase.checkListContainsSequence fails on CI [FLINK-20547] - Batch job fails due to the exception in network stack [FLINK-20654] - Unaligned checkpoint recovery may lead to corrupted data stream [FLINK-20722] - HiveTableSink should copy the record when converting RowData to Row [FLINK-20752] - FailureRateRestartBackoffTimeStrategy allows one less restart than configured [FLINK-20761] - Cannot read hive table/partition whose location path contains comma [FLINK-20977] - USE DATABASE &amp; USE CATALOG fails with quoted identifiers containing characters to be escaped in Flink SQL client [FLINK-21008] - Residual HA related Kubernetes ConfigMaps and ZooKeeper nodes when cluster entrypoint received SIGTERM in shutdown [FLINK-21012] - AvroFileFormatFactory uses non-deserializable lambda function [FLINK-21133] - FLIP-27 Source does not work with synchronous savepoint [FLINK-21148] - YARNSessionFIFOSecuredITCase cannot connect to BlobServer [FLINK-21159] - KafkaSourceEnumerator not sending NoMoreSplitsEvent to unassigned reader [FLINK-21178] - Task failure will not trigger master hook&#39;s reset() [FLINK-21289] - Application mode ignores the pipeline.classpaths configuration [FLINK-21387] - DispatcherTest.testInvalidCallDuringInitialization times out on azp [FLINK-21388] - Parquet DECIMAL logical type is not properly supported in ParquetSchemaConverter [FLINK-21431] - UpsertKafkaTableITCase.testTemporalJoin hang [FLINK-21434] - When UDAF return ROW type, and the number of fields is more than 14, the crash happend [FLINK-21497] - JobLeaderIdService completes leader future despite no leader being elected [FLINK-21515] - SourceStreamTaskTest.testStopWithSavepointShouldNotInterruptTheSource is failing [FLINK-21518] - CheckpointCoordinatorTest.testMinCheckpointPause fails fatally on AZP [FLINK-21523] - ArrayIndexOutOfBoundsException occurs while run a hive streaming job with partitioned table source [FLINK-21535] - UnalignedCheckpointITCase.execute failed with &quot;OutOfMemoryError: Java heap space&quot; [FLINK-21550] - ZooKeeperHaServicesTest.testSimpleClose fail [FLINK-21552] - The managed memory was not released if exception was thrown in createPythonExecutionEnvironment [FLINK-21606] - TaskManager connected to invalid JobManager leading to TaskSubmissionException [FLINK-21609] - SimpleRecoveryITCaseBase.testRestartMultipleTimes fails on azure [FLINK-21654] - YARNSessionCapacitySchedulerITCase.testStartYarnSessionClusterInQaTeamQueue fail because of NullPointerException [FLINK-21661] - SHARD_GETRECORDS_INTERVAL_MILLIS wrong use? 
[FLINK-21685] - Flink JobManager failed to restart from checkpoint in kubernetes HA setup [FLINK-21691] - KafkaSource fails with NPE when setting it up [FLINK-21707] - Job is possible to hang when restarting a FINISHED task with POINTWISE BLOCKING consumers [FLINK-21710] - FlinkRelMdUniqueKeys gets incorrect result on TableScan after project push-down [FLINK-21725] - DataTypeExtractor extracts wrong fields ordering for Tuple12 [FLINK-21733] - WatermarkAssigner incorrectly recomputing the rowtime index which may cause ArrayIndexOutOfBoundsException [FLINK-21746] - flink sql fields in row access error about scalarfunction [FLINK-21753] - Cycle references between memory manager and gc cleaner action [FLINK-21817] - New Kafka Source might break subtask and split assignment upon rescale [FLINK-21833] - TemporalRowTimeJoinOperator.java will lead to the state expansion by short-life-cycle &amp; huge RowData, although config idle.state.retention.time [FLINK-21889] - source:canal-cdc , sink:upsert-kafka, print &quot;select * from sinkTable&quot;, throw NullException [FLINK-21922] - The method partition_by in Over doesn&#39;t work for expression dsl [FLINK-21933] - [kinesis][efo] EFO consumer treats interrupts as retryable exceptions [FLINK-21941] - testSavepointRescalingOutPartitionedOperatorStateList fail [FLINK-21942] - KubernetesLeaderRetrievalDriver not closed after terminated which lead to connection leak [FLINK-21944] - AbstractArrowPythonAggregateFunctionOperator.dispose should consider whether arrowSerializer is null [FLINK-21969] - PythonTimestampsAndWatermarksOperator emitted the Long.MAX_VALUE watermark before emitting all the data [FLINK-21980] - ZooKeeperRunningJobsRegistry creates an empty znode [FLINK-21986] - taskmanager native memory not release timely after restart [FLINK-21992] - Fix availability notification in UnionInputGate [FLINK-21996] - Transient RPC failure without TaskManager failure can lead to split assignment loss [FLINK-22006] - Could not run more than 20 jobs in a native K8s session when K8s HA enabled [FLINK-22024] - Maven: Entry has not been leased from this pool / fix for release 1.12 [FLINK-22053] - NumberSequenceSource causes fatal exception when less splits than parallelism. 
[FLINK-22055] - RPC main thread executor may schedule commands with wrong time unit of delay [FLINK-22061] - The DEFAULT_NON_SPLITTABLE_FILE_ENUMERATOR defined in FileSource should points to NonSplittingRecursiveEnumerator [FLINK-22081] - Entropy key not resolved if flink-s3-fs-hadoop is added as a plugin [FLINK-22082] - Nested projection push down doesn&#39;t work for data such as row(array(row)) [FLINK-22124] - The job finished without any exception if error was thrown during state access [FLINK-22172] - Fix the bug of shared resource among Python Operators of the same slot is not released [FLINK-22184] - Rest client shutdown on failure runs in netty thread [FLINK-22191] - PyFlinkStreamUserDefinedFunctionTests.test_udf_in_join_condition_2 fail due to NPE [FLINK-22327] - NPE exception happens if it throws exception in finishBundle during job shutdown [FLINK-22339] - Fix some encoding exceptions were not thrown in cython coders [FLINK-22345] - CoordinatorEventsExactlyOnceITCase hangs on azure [FLINK-22385] - Type mismatch in NetworkBufferPool Improvement [FLINK-20533] - Add histogram support to Datadog reporter [FLINK-21382] - Standalone K8s documentation does not explain usage of standby JobManagers [FLINK-21521] - Pretty print K8s specifications [FLINK-21690] - remove redundant tolerableCheckpointFailureNumber setting in CheckpointConfig [FLINK-21735] - Harden JobMaster#updateTaskExecutionState() [FLINK-22051] - Better document the distinction between stop-with-savepoint and stop-with-savepoint-with-drain [FLINK-22142] - Remove console logging for Kafka connector for AZP runs [FLINK-22208] - Bump snappy-java to 1.1.5+ [FLINK-22297] - Perform early check to ensure that the length of the result is the same as the input for Pandas UDF `}),e.add({id:120,href:"/2021/04/15/stateful-functions-3.0.0-remote-functions-front-and-center/",title:"Stateful Functions 3.0.0: Remote Functions Front and Center",section:"Flink Blog",content:`The Apache Flink community is happy to announce the release of Stateful Functions (StateFun) 3.0.0! Stateful Functions is a cross-platform stack for building Stateful Serverless applications, making it radically simpler to develop scalable, consistent, and elastic distributed applications.
This new release brings remote functions to the front and center of StateFun, making the disaggregated setup that separates the application logic from the StateFun cluster the default. It is now easier, more efficient, and more ergonomic to write applications that live in their own processes or containers. With the new Java SDK this is now also possible for all JVM languages, in addition to Python.
Background # Starting with the first StateFun release, before the project was donated to the Apache Software Foundation, our focus was: making scalable stateful applications easy to build and run.
The first StateFun version introduced an SDK that allowed writing stateful functions that build up a StateFun application packaged and deployed as a particular Flink job submitted to a Flink cluster. Having functions executing within the same JVM as Flink has some advantages, such as the deployment&rsquo;s performance and immutability. However, it had a few limitations:
❌ ⠀Functions can be written only in a JVM-based language.
❌ ⠀A blocking call or CPU-heavy task in one function can affect other functions and operations that need to complete in a timely manner, such as checkpointing.
❌ ⠀Deploying a new version of a function required a stateful upgrade of the backing Flink job.
With StateFun 2.0.0, the first official release after the project was donated to Apache Flink, the community introduced the concept of remote functions, together with an additional SDK for the Python language. A remote function is a function that executes in a separate process and is invoked via HTTP by the StateFun cluster processes. Remote functions introduce a new and exciting capability: state and compute disaggregation - allowing users to scale the functions independently of the StateFun cluster, which essentially plays the role of handling messaging and state in a consistent and fault-tolerant manner.
While remote functions did address the limitations (1) and (2) mentioned above, we still had some room to improve:
❌ ⠀A stateful restart of the StateFun processes is required to register a new function or to change the state definitions of an existing function.
❌ ⠀The SDK had a few friction points around state and messaging ergonomics - it had a heavy dependency on Google’s Protocol Buffers for its multi-language object representation.
As business requirements evolve, the application logic naturally evolves with it. For StateFun applications, this often means typical changes such as adding new functions to the application or updating some existing functions to include new state to be persisted. This is where the first limitation becomes an issue - such operations require a stateful restart of the StateFun cluster in order for the changes to be discovered, meaning that all functions of the application would have some downtime for this to take effect. With remote functions being standalone instances that are supposedly independent of the StateFun cluster processes, this is obviously non-ideal. By making remote functions the default in StateFun, we’re aiming at enabling full flexibility and ease of operations for application upgrades.
The second limitation, around state and messaging ergonomics, had come up a few times from our users. Prior to this release, all state values and message objects were strictly required to be Protobuf objects. This made it cumbersome to use common types such as JSON or simple strings as state and messages.
With the new StateFun 3.0.0 release, the community has enhanced the remote functions protocol (the protocol that describes how StateFun communicates with the remote function processes) to address all the issues mentioned above. Building on the new protocol, we rewrote the Python SDK and introduced a brand new remote Java SDK.
New Features # Unified Language SDKs # One of the goals that we set out to achieve with the SDKs is a unified set of concepts across all the languages. Having standard and unified SDK concepts across the board makes it straightforward for users to switch the languages their functions are implemented in.
Here is the same function written with the updated Python SDK and newly added Java SDK in StateFun 3.0.0:
Python #
@functions.bind(typename="example/greeter", specs=[ValueSpec(name="visits", type=IntType)])
async def greeter(context: Context, message: Message):
    # update the visit count.
    visits = context.storage.visits or 0
    visits += 1
    context.storage.visits = visits

    # compute a greeting
    name = message.as_string()
    greeting = f"Hello there {name} at the {visits}th time!"

    caller = context.caller
    context.send(message_builder(target_typename=caller.typename,
                                 target_id=caller.id,
                                 str_value=greeting))
Java #
static final class Greeter implements StatefulFunction {

  static final ValueSpec<Integer> VISITS = ValueSpec.named("visits").withIntType();

  @Override
  public CompletableFuture<Void> apply(Context context, Message message) {
    // update the visits count
    int visits = context.storage().get(VISITS).orElse(0);
    visits++;
    context.storage().set(VISITS, visits);

    // compute a greeting
    var name = message.asUtf8String();
    var greeting = String.format("Hello there %s at the %d-th time!\n", name, visits);

    // reply to the caller with a greeting
    var caller = context.caller().get();
    context.send(
        MessageBuilder.forAddress(caller)
            .withValue(greeting)
            .build());

    return context.done();
  }
}
Although there are some language-specific differences, the terms and concepts are the same:
an address-scoped storage acting as a key-value store for a particular address.
a unified cross-language way to send, receive, and store values across languages (see also Cross-Language Type System below).
ValueSpec to describe the state name, type, and possibly expiration configuration.
Please note that it is no longer necessary to declare the state ahead of time in a module.yaml. For a detailed SDK tutorial, we would like to encourage you to visit:
Java SDK showcase
Python SDK showcase
Cross-Language Type System # StateFun 3.0.0 introduces a new type system with cross-language support for common primitive types, such as boolean, integer, long, etc. This is of course all transparent for the user, so you don’t need to worry about it. Functions implemented in various languages (e.g. Java or Python) can message each other by directly sending supported primitive values as message arguments. Moreover, the type system is used for state values as well - so, you can expect that a function can safely read previous state after reimplementing it in a different language.
The type system is also easily extensible to support custom message types, such as JSON or Protobuf messages. StateFun makes this simple by providing builder utilities to help you create custom types.
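For example, with the Python SDK a custom JSON-backed type might be defined and used roughly like in the following sketch; the typename, function, and state name are invented for illustration, and make_json_type is assumed to be one of the SDK's builder utilities:
from statefun import StatefulFunctions, ValueSpec, Context, Message, make_json_type

functions = StatefulFunctions()

# a custom type backed by JSON; values are plain dicts, serialized with json.dumps/json.loads
UserProfileType = make_json_type(typename="example/UserProfile")

@functions.bind(typename="example/profile-tracker",
                specs=[ValueSpec(name="profile", type=UserProfileType)])
async def profile_tracker(context: Context, message: Message):
    # read the custom-typed state, update it, and write it back
    profile = context.storage.profile or {"visits": 0}
    profile["visits"] += 1
    context.storage.profile = profile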
Dynamic Registration of State and Functions # Starting with this release it is now possible to dynamically register new functions without going through a stateful upgrade cycle of the StateFun cluster (which entails the standard process of performing a stateful restart of a Flink job). This is achieved with a new endpoint definition that supports target URL templating.
Consider the following definition:
endpoints:
  - endpoint:
      meta:
        kind: http
      spec:
        functions: example/*
        urlPathTemplate: https://loadbalancer.svc.cluster.local/{function.name}
With this definition, all messages being addressed to functions under the namespace example will be forwarded to the specified templated URL. For example, a message being addressed to a function of typename example/greeter would be forwarded to https://loadbalancer.svc.cluster.local/greeter.
This unlocks the possibility to dynamically introduce new functions into the topology without ever restarting the Stateful Functions application.
Summary # With 3.0.0, we’ve brought remote functions to the front and center of StateFun. This is done by a new remote function protocol that:
✅ ⠀Allows registering a new function or changing the state definitions of an existing function to happen dynamically, without any downtime, and
✅ ⠀Provides a cross-language type system, which comes along with a few built-in primitive types, that can be used for messaging and state.
A new Java SDK was added for remote functions to extend the array of supported languages to also include all JVM-based languages. The language SDKs now have unified concepts and constructs in their APIs so that they will all feel familiar to work with when switching between languages for your functions. In upcoming releases, the community is also looking forward to continuing to build on top of the new remote function protocol to provide even more language SDKs, such as Golang.
Release Resources # The binary distribution and source artifacts are now available on the updated Downloads page of the Flink website, and the most recent Python SDK distribution is available on PyPI. You can also find official StateFun Docker images of the new version on Dockerhub.
For more details, check the updated documentation and the release notes for a detailed list of changes and new features if you plan to upgrade your setup to Stateful Functions 3.0.0. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
List of Contributors # The Apache Flink community would like to thank all contributors that have made this release possible:
Authuir, Chesnay Schepler, David Anderson, Dian Fu, Frans King, Galen Warren, Guillaume Vauvert, Igal Shilman, Ismaël Mejía, Kartik Khare, Konstantin Knauf, Marta Paes Moreira, Patrick Lucas, Patrick Wiener, Rafi Aroch, Robert Metzger, RocMarshal, Seth Wiesman, Siddique Ahmad, SteNicholas, Stephan Ewen, Timothy Bess, Tymur Yarosh, Tzu-Li (Gordon) Tai, Ufuk Celebi, abc863377, billyrrr, congxianqiu, danp11, hequn8128, kaibo, klion26, morsapaes, slinkydeveloper, wangchao, wangzzu, winder If you’d like to get involved, we’re always looking for new contributors.
`}),e.add({id:121,href:"/2021/03/11/a-rundown-of-batch-execution-mode-in-the-datastream-api/",title:"A Rundown of Batch Execution Mode in the DataStream API",section:"Flink Blog",content:`Flink has been following the mantra that Batch is a Special Case of Streaming since the very early days. As the project evolved to address specific uses cases, different core APIs ended up being implemented for batch (DataSet API) and streaming execution (DataStream API), but the higher-level Table API/SQL was subsequently designed following this mantra of unification. With Flink 1.12, the community worked on bringing a similarly unified behaviour to the DataStream API, and took the first steps towards enabling efficient batch execution in the DataStream API.
The idea behind making the DataStream API a unified abstraction for batch and streaming execution instead of maintaining separate APIs is two-fold:
Reusability: efficient batch and stream processing under the same API would allow you to easily switch between both execution modes without rewriting any code. So, a job could be easily reused to process real-time and historical data.
Operational simplicity: providing a unified API would mean using a single set of connectors, maintaining a single codebase and being able to easily implement mixed execution pipelines e.g. for use cases like backfilling.
The difference between BATCH and STREAMING vs BOUNDED and UNBOUNDED is subtle, and a common source of confusion — so, let&rsquo;s start by clarifying that. These terms might seem mostly interchangeable, but in reality serve different purposes:
Bounded and unbounded refer to the characteristics of the streams you want to process: whether or not they are known to have an end. The terms are also sometimes applied to the applications processing these streams: an application that only processes bounded streams is a bounded stream processing application that eventually finishes; while an unbounded stream processing application processes an unbounded stream and runs forever (or until canceled).
Batch and streaming are execution modes. Batch execution is only applicable to bounded streams/applications because it exploits the fact that it can process the whole data (e.g. from a partition) in a batch rather than event-by-event, and possibly execute different batches one after the other. Continuous streaming execution runs everything at the same time, continuously processes (small groups of) events and is applicable to both bounded and unbounded applications.
Based on that differentiation, there are two main scenarios that result from the combination of these properties:
A bounded Stream Processing Application that is executed in batch mode, which you can call a Batch (Processing) Application.
An unbounded Stream Processing Application that is executed in streaming mode. This is the combination that has been the primary use case for the DataStream API in Flink.
It’s also possible to have a bounded Stream Processing Application that is executed in streaming mode, but this combination is less significant and likely to be used e.g. in a test environment or in other rare corner cases.
Which API and execution mode should I use? # Before going into the choice of execution mode, try looking at your use case from a different angle: do you need to process structured data? Does your data have a schema of some sort? The Table API/SQL will most likely be the right choice. In fact, the majority of batch use cases should be expressed with the Table API/SQL! Finite, bounded data can most often be organized, described with a schema and put into a catalog. This is where the SQL API shines, giving you a rich set of functions and operators out of the box, with low-level optimizations and broad connector support, all supported by standard SQL. And it works for streaming use cases, as well!
However, if you need explicit control over the execution graph, you want to manually control the state of your operations, or you need to be able to upgrade Flink (which applies to unbounded applications), the DataStream API is the right choice. If the DataStream API sounds like the best fit for your use cases, the next decision is what execution mode to run your program in.
When should you use the batch mode, then?
The simple answer is if you run your computation on bounded, historic data. The batch mode has a few benefits:
In bounded data there is no such thing as late data. You do not need to think about how to adjust the watermarking logic that you use in your application. In a streaming case, you need to maintain the order in which the records were written - which is often not possible to recreate when reading from e.g. historic files. In batch mode you don’t need to care about that, as the data will be sorted according to the timestamp and “perfect” watermarks will be injected automatically.
The way streaming applications are scheduled and react upon failure has significant performance implications that can be optimized when dealing with bounded data. We recommend reading through the blogposts on pipelined region scheduling and fine-grained fault tolerance to better understand these performance implications.
It can simplify the operational overhead of setting up and maintaining your pipelines. For example, there is no need to configure checkpointing, which otherwise requires things like choosing a state backend or setting up distributed storage for checkpoints.
How to use the batch execution mode # Once you have a good understanding of which execution mode is better suited to your use case, you can configure it via the execution.runtime-mode setting. There are three possible values:
STREAMING: The classic DataStream execution mode (default)
BATCH: Batch-style execution on the DataStream API
AUTOMATIC: Let the system decide based on the boundedness of the sources
This can be configured via command line parameters of bin/flink run ... when submitting a job:
$ bin/flink run -Dexecution.runtime-mode=BATCH examples/streaming/WordCount.jar
or programmatically, when creating/configuring the StreamExecutionEnvironment:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setRuntimeMode(RuntimeExecutionMode.BATCH);
We recommend passing the execution mode when submitting the job, in order to keep your code configuration-free and potentially be able to execute the same application in different execution modes.
Hello batch mode # Now that you know how to set the execution mode, let’s try to write a simple word count program and see how it behaves depending on the chosen mode. The program is a variation of a standard word count, where we count the number of orders placed in a given currency. We derive the number in 1-day windows. We read the input data from a new unified file source and then apply a window aggregation. Notice that we will be checking the side output for late-arriving data, which can illustrate how watermarks behave differently in the two execution modes.
public class WindowWordCount {

    private static final OutputTag<String[]> LATE_DATA = new OutputTag<>(
            "late-data", BasicArrayTypeInfo.STRING_ARRAY_TYPE_INFO);

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        ParameterTool config = ParameterTool.fromArgs(args);
        Path path = new Path(config.get("path"));

        SingleOutputStreamOperator<Tuple4<String, Integer, String, String>> dataStream = env
            .fromSource(
                FileSource.forRecordStreamFormat(new TsvFormat(), path).build(),
                WatermarkStrategy.<String[]>forBoundedOutOfOrderness(Duration.ofDays(1))
                    .withTimestampAssigner(new OrderTimestampAssigner()),
                "Text file"
            )
            .keyBy(value -> value[4]) // group by currency
            .window(TumblingEventTimeWindows.of(Time.days(1)))
            .sideOutputLateData(LATE_DATA)
            .aggregate(
                new CountFunction(), // count number of orders in a given currency
                new CombineWindow());

        int i = 0;
        DataStream<String[]> lateData = dataStream.getSideOutput(LATE_DATA);
        try (CloseableIterator<String[]> results = lateData.executeAndCollect()) {
            while (results.hasNext()) {
                String[] late = results.next();
                if (i < 100) {
                    System.out.println(Arrays.toString(late));
                }
                i++;
            }
        }
        System.out.println("Number of late records: " + i);

        try (CloseableIterator<Tuple4<String, Integer, String, String>> results = dataStream.executeAndCollect()) {
            while (results.hasNext()) {
                System.out.println(results.next());
            }
        }
    }
}
If we simply execute the above program with:
$ bin/flink run examples/streaming/WindowWordCount.jar
it will be executed in streaming mode by default. Because of that, it will use the given watermarking strategy and produce windows based on it. In real-time scenarios, it might happen that records do not adhere to watermarks and some records might actually be considered late, so you’ll get results like:
... [1431681, 130936, F, 135996.21, NOK, 2020-04-11 07:53:02.674, 2-HIGH, Clerk#000000922, 0, quests. slyly regular platelets cajole ironic deposits: blithely even depos] [1431744, 143957, F, 36391.24, CHF, 2020-04-11 07:53:27.631, 2-HIGH, Clerk#000000406, 0, eans. blithely special instructions are quickly. q] [1431812, 58096, F, 55292.05, CAD, 2020-04-11 07:54:16.956, 2-HIGH, Clerk#000000561, 0, , regular packages use. slyly even instr] [1431844, 77335, O, 415443.20, CAD, 2020-04-11 07:54:40.967, 2-HIGH, Clerk#000000446, 0, unts across the courts wake after the accounts! ruthlessly] [1431968, 122005, F, 44964.19, JPY, 2020-04-11 07:55:42.661, 1-URGENT, Clerk#000000001, 0, nal theodolites against the slyly special packages poach blithely special req] [1432097, 26035, F, 42464.15, CAD, 2020-04-11 07:57:13.423, 5-LOW, Clerk#000000213, 0, l accounts hang blithely. carefully blithe dependencies ] [1432193, 97537, F, 87856.63, NOK, 2020-04-11 07:58:06.862, 4-NOT SPECIFIED, Clerk#000000356, 0, furiously furiously brave foxes. bo] [1432291, 112045, O, 114327.52, JPY, 2020-04-11 07:59:12.912, 1-URGENT, Clerk#000000732, 0, ding to the fluffily ironic requests haggle carefully alongsid] Number of late records: 1514 (GBP,374,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z) (HKD,401,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z) (CNY,402,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z) (CAD,392,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z) (JPY,411,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z) (CHF,371,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z) (NOK,370,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z) (RUB,365,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z) ... However, if you execute the exact same code using the batch execution mode:
$ bin/flink run -Dexecution.runtime-mode=BATCH examples/streaming/WindowWordCount.jar
you’ll see that there won’t be any late records.
Number of late records: 0
(GBP,374,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z)
(HKD,401,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z)
(CNY,402,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z)
(CAD,392,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z)
(JPY,411,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z)
(CHF,371,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z)
(NOK,370,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z)
(RUB,365,2020-03-31T00:00:00Z,2020-04-01T00:00:00Z)
Also, if you compare the execution timelines of both runs, you’ll see that the jobs were scheduled differently. In the case of batch execution, the two stages were executed one after the other:
whereas for streaming both stages started at the same time.
Example: Two input operators # Operators that process data from multiple inputs can be executed in both execution modes as well. Let’s see how we may implement a join of two data sets on a common key. (Disclaimer: Make sure to think first if you should use the Table API/SQL for your join!) We will enrich a stream of orders with information about the customer and we will make it run in either of the two modes.
For this particular use case, the DataStream API provides a DataStream#join method that requires a window in which the join must happen; since we&rsquo;ll process the data in bulk, we can use a GlobalWindow (that would otherwise not be very useful on its own in an unbounded case due to state size concerns):
DataStreamSource<String[]> orders = env
    .fromSource(
        FileSource.forRecordStreamFormat(new TsvFormat(), ordersPath).build(),
        WatermarkStrategy.<String[]>noWatermarks()
            .withTimestampAssigner((record, previous) -> -1),
        "Text file"
    );

Path customersPath = new Path(config.get("customers"));
DataStreamSource<String[]> customers = env
    .fromSource(
        FileSource.forRecordStreamFormat(new TsvFormat(), customersPath).build(),
        WatermarkStrategy.<String[]>noWatermarks()
            .withTimestampAssigner((record, previous) -> -1),
        "Text file"
    );

DataStream<Tuple2<String, String>> dataStream = orders.join(customers)
    .where(order -> order[1]).equalTo(customer -> customer[0]) // join on customer id
    .window(GlobalWindows.create())
    .trigger(ContinuousProcessingTimeTrigger.of(Time.seconds(5)))
    .apply(new ProjectFunction());
You might notice the ContinuousProcessingTimeTrigger. It is there for the application to produce results in streaming mode: in a streaming application the GlobalWindow never finishes, so we need to add a processing-time trigger to emit results from time to time. We believe triggers are a way to control when to emit results, but are not part of the logic of what to emit. Therefore, we think it is safe to ignore them in batch mode, and that’s what we do. In batch mode you will just get one final result for the join.
Looking into the future # Support for efficient batch execution in the DataStream API was introduced in Flink 1.12 as a first step towards achieving a truly unified runtime for both batch and stream processing. This is not the end of the story yet! The community is still working on some optimizations and exploring more use cases that can be enabled with this new mode.
One of the first efforts we want to finalize is providing world-class support for transactional sinks in both execution modes, for bounded and unbounded streams. An experimental API for transactional sinks was already introduced in Flink 1.12, so we&rsquo;re working on stabilizing it and would be happy to hear feedback about its current state!
We are also thinking how the two modes can be brought closer together and benefit from each other. A common pattern that we hear from users is bootstrapping state of a streaming job from a batch one. There are two somewhat different approaches we are considering here:
Having a mixed graph, where one of the branches would have only bounded sources and the other would reflect the unbounded part — you can think of such a graph as effectively two separate jobs. The bounded part would be executed first and sink into the state of a common vertex of the two parts. This job’s purpose would be to populate the state of the common operator. Once that job is done, we could proceed to running the unbounded part.
Another approach is to run the exact same program first on the bounded data. However, this time we wouldn&rsquo;t assume completeness of the job; instead, we would produce the state of all operators up to a certain point in time and store it as a savepoint. Later on, we could use the savepoint to start the application on the unbounded data.
Lastly, to achieve feature parity with the DataSet API (Flink&rsquo;s legacy API for batch-style execution), we are looking into the topic of iterations and how to meet the different usage patterns depending on the mode. In STREAMING mode, iterations serve as a loopback edge, but we don&rsquo;t necessarily need to keep track of the iteration step. On the other hand, the iteration generation is vital for Machine Learning (ML) algorithms, which are the primary use case for iterations in BATCH mode.
Have you tried the new BATCH execution mode in the DataStream API? How was your experience? We are happy to hear your feedback and stories!
`}),e.add({id:122,href:"/2021/03/03/apache-flink-1.12.2-released/",title:"Apache Flink 1.12.2 Released",section:"Flink Blog",content:`The Apache Flink community released the next bugfix version of the Apache Flink 1.12 series.
This release includes 83 fixes and minor improvements for Flink 1.12.1. The list below includes a detailed list of all fixes and improvements.
We highly recommend that all users upgrade to Flink 1.12.2.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.12.2</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.12.2</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.12.2</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-21070] - Overloaded aggregate functions cause converter errors [FLINK-21486] - Add sanity check when switching from Rocks to Heap timers Bug [FLINK-12461] - Document binary compatibility situation with Scala beyond 2.12.8 [FLINK-16443] - Fix wrong fix for user-code CheckpointExceptions [FLINK-19771] - NullPointerException when accessing null array from postgres in JDBC Connector [FLINK-20309] - UnalignedCheckpointTestBase.execute is failed [FLINK-20462] - MailboxOperatorTest.testAvoidTaskStarvation [FLINK-20500] - UpsertKafkaTableITCase.testTemporalJoin test failed [FLINK-20565] - Fix typo in EXPLAIN Statements docs. [FLINK-20580] - Missing null value handling for SerializedValue&#39;s getByteArray() [FLINK-20654] - Unaligned checkpoint recovery may lead to corrupted data stream [FLINK-20663] - Managed memory may not be released in time when operators use managed memory frequently [FLINK-20675] - Asynchronous checkpoint failure would not fail the job anymore [FLINK-20680] - Fails to call var-arg function with no parameters [FLINK-20798] - Using PVC as high-availability.storageDir could not work [FLINK-20832] - Deliver bootstrap resouces ourselves for website and documentation [FLINK-20848] - Kafka consumer ID is not specified correctly in new KafkaSource [FLINK-20913] - Improve new HiveConf(jobConf, HiveConf.class) [FLINK-20921] - Fix Date/Time/Timestamp in Python DataStream [FLINK-20933] - Config Python Operator Use Managed Memory In Python DataStream [FLINK-20942] - Digest of FLOAT literals throws UnsupportedOperationException [FLINK-20944] - Launching in application mode requesting a ClusterIP rest service type results in an Exception [FLINK-20947] - Idle source doesn&#39;t work when pushing watermark into the source [FLINK-20961] - Flink throws NullPointerException for tables created from DataStream with no assigned timestamps and watermarks [FLINK-20992] - Checkpoint cleanup can kill JobMaster [FLINK-20998] - flink-raw-1.12.jar does not exist [FLINK-21009] - Can not disable certain options in Elasticsearch 7 connector [FLINK-21013] - Blink planner does not ingest timestamp into StreamRecord [FLINK-21024] - Dynamic properties get exposed to job&#39;s main method if user parameters are passed [FLINK-21028] - Streaming application didn&#39;t stop properly [FLINK-21030] - Broken job restart for job with disjoint graph [FLINK-21059] - KafkaSourceEnumerator does not honor consumer properties [FLINK-21069] - Configuration &quot;parallelism.default&quot; doesn&#39;t take effect for TableEnvironment#explainSql [FLINK-21071] - Snapshot branches running against flink-docker dev-master branch [FLINK-21104] - UnalignedCheckpointITCase.execute failed with &quot;IllegalStateException&quot; [FLINK-21132] - BoundedOneInput.endInput is called when taking synchronous savepoint [FLINK-21138] - KvStateServerHandler is not invoked with user code classloader [FLINK-21140] - Extract zip file dependencies before adding to PYTHONPATH [FLINK-21144] - KubernetesResourceManagerDriver#tryResetPodCreationCoolDown causes fatal error [FLINK-21155] - FileSourceTextLinesITCase.testBoundedTextFileSourceWithTaskManagerFailover does not pass [FLINK-21158] - wrong jvm metaspace and overhead size show in taskmanager metric page [FLINK-21163] - Python dependencies specified via CLI should not override the dependencies specified in configuration [FLINK-21169] - Kafka flink-connector-base dependency should be scope compile [FLINK-21208] - pyarrow exception when using window with pandas udaf [FLINK-21213] - e2e 
test fail with &#39;As task is already not running, no longer decline checkpoint&#39; [FLINK-21215] - Checkpoint was declined because one input stream is finished [FLINK-21216] - StreamPandasConversionTests Fails [FLINK-21225] - OverConvertRule does not consider distinct [FLINK-21226] - Reintroduce TableColumn.of for backwards compatibility [FLINK-21274] - At per-job mode, during the exit of the JobManager process, if ioExecutor exits at the end, the System.exit() method will not be executed. [FLINK-21277] - SQLClientSchemaRegistryITCase fails to download testcontainers/ryuk:0.3.0 [FLINK-21312] - SavepointITCase.testStopSavepointWithBoundedInputConcurrently is unstable [FLINK-21323] - Stop-with-savepoint is not supported by SourceOperatorStreamTask [FLINK-21351] - Incremental checkpoint data would be lost once a non-stop savepoint completed [FLINK-21361] - FlinkRelMdUniqueKeys matches on AbstractCatalogTable instead of CatalogTable [FLINK-21412] - pyflink DataTypes.DECIMAL is not available [FLINK-21452] - FLIP-27 sources cannot reliably downscale [FLINK-21453] - BoundedOneInput.endInput is NOT called when doing stop with savepoint WITH drain [FLINK-21490] - UnalignedCheckpointITCase fails on azure [FLINK-21492] - ActiveResourceManager swallows exception stack trace New Feature [FLINK-20359] - Support adding Owner Reference to Job Manager in native kubernetes setup Improvement [FLINK-9844] - PackagedProgram does not close URLClassLoader [FLINK-20417] - Handle &quot;Too old resource version&quot; exception in Kubernetes watch more gracefully [FLINK-20491] - Support Broadcast Operation in BATCH execution mode [FLINK-20517] - Support mixed keyed/non-keyed operations in BATCH execution mode [FLINK-20770] - Incorrect description for config option kubernetes.rest-service.exposed.type [FLINK-20907] - Table API documentation promotes deprecated syntax [FLINK-21020] - Bump Jackson to 20.10.5[.1] / 2.12.1 [FLINK-21034] - Rework jemalloc switch to use an environment variable [FLINK-21035] - Deduplicate copy_plugins_if_required calls [FLINK-21036] - Consider removing automatic configuration fo number of slots from docker [FLINK-21037] - Deduplicate configuration logic in docker entrypoint [FLINK-21042] - Fix code example in &quot;Aggregate Functions&quot; section in Table UDF page [FLINK-21048] - Refactor documentation related to switch memory allocator [FLINK-21123] - Upgrade Beanutils 1.9.x to 1.9.4 [FLINK-21164] - Jar handlers don&#39;t cleanup temporarily extracted jars [FLINK-21210] - ApplicationClusterEntryPoints should explicitly close PackagedProgram [FLINK-21381] - Kubernetes HA documentation does not state required service account and role Task [FLINK-20529] - Publish Dockerfiles for release 1.12.0 [FLINK-20534] - Add Flink 1.12 MigrationVersion [FLINK-20536] - Update migration tests in master to cover migration from release-1.12 [FLINK-20960] - Add warning in 1.12 release notes about potential corrupt data stream with unaligned checkpoint [FLINK-21358] - Missing snapshot version compatibility for 1.12 `}),e.add({id:123,href:"/2021/02/10/how-to-natively-deploy-flink-on-kubernetes-with-high-availability-ha/",title:"How to natively deploy Flink on Kubernetes with High-Availability (HA)",section:"Flink Blog",content:`Flink has supported resource management systems like YARN and Mesos since the early days; however, these were not designed for the fast-moving cloud-native architectures that are increasingly gaining popularity these days, or the growing need to support complex, mixed workloads 
(e.g. batch, streaming, deep learning, web services). For these reasons, more and more users are using Kubernetes to automate the deployment, scaling and management of their Flink applications.
From release to release, the Flink community has made significant progress in integrating natively with Kubernetes, from active resource management to “Zookeeperless” High Availability (HA). In this blogpost, we&rsquo;ll recap the technical details of deploying Flink applications natively on Kubernetes, diving deeper into Flink’s Kubernetes HA architecture. We&rsquo;ll then walk you through a hands-on example of running a Flink application cluster on Kubernetes with HA enabled. We’ll end with a conclusion covering the advantages of running Flink natively on Kubernetes, and an outlook into future work.
Native Flink on Kubernetes Integration # Before we dive into the technical details of how the Kubernetes-based HA service works, let us briefly explain what native means in the context of Flink deployments on Kubernetes:
Flink is self-contained. There will be an embedded Kubernetes client in the Flink client, and so you will not need other external tools (e.g. kubectl, Kubernetes dashboard) to create a Flink cluster on Kubernetes.
The Flink client will contact the Kubernetes API server directly to create the JobManager deployment. The configuration located on the client side will be shipped to the JobManager pod, as well as the log4j and Hadoop configurations.
Flink’s ResourceManager will talk to the Kubernetes API server to allocate and release the TaskManager pods dynamically on-demand.
All in all, this is similar to how Flink integrates with other resource management systems (e.g. YARN, Mesos), so it should be somewhat straightforward to integrate with Kubernetes if you’ve managed such deployments before — and especially if you already had some internal deployer for the lifecycle management of multiple Flink jobs.
Fig. 1: Architecture of Flink's native Kubernetes integration.
Kubernetes High Availability Service # High Availability (HA) is a common requirement when bringing Flink to production: it helps prevent a single point of failure for Flink clusters. Prior to the 1.12 release, Flink provided a Zookeeper HA service that has been widely used in production setups and that can be integrated in standalone cluster, YARN, or Kubernetes deployments. However, managing a Zookeeper cluster on Kubernetes for HA would require an additional operational cost that could be avoided because, in the end, Kubernetes also provides some public APIs for leader election and configuration storage (i.e. ConfigMap). From Flink 1.12, we leverage these features to make running an HA-configured Flink cluster on Kubernetes more convenient to users.
Fig. 2: Architecture of Flink's Kubernetes High Availability (HA) service.
The above diagram shows the architecture of Flink’s Kubernetes HA service, which works as follows: For the leader election, a set of eligible JobManagers is identified. They all race to declare themselves as the leader, with one eventually becoming the active leader. The active JobManager then continually “heartbeats” to renew its position as the leader. In the meantime, all other standby JobManagers periodically make new attempts to become the leader — this ensures that the JobManager can fail over quickly. Different components (e.g. ResourceManager, JobManager, Dispatcher, RestEndpoint) have separate leader election services and ConfigMaps.
The active leader publishes its address to the ConfigMap. It’s important to note that Flink uses the same ConfigMap for contending the lock and for storing the leader address. This ensures that no unexpected change sneaks in during a periodic update.
The leader retrieval service is used to find the active leader’s address and allow the components to then register themselves. For example, TaskManagers retrieve the address of ResourceManager and JobManager for registration and to offer slots. Flink uses a Kubernetes watch in the leader retrieval service — once the content of ConfigMap changes, it usually means that the leader has changed, and so the listener can get the latest leader address immediately.
All other meta information (e.g. running jobs, job graphs, completed checkpoints and the checkpoint counter) will be directly stored in the corresponding ConfigMaps. Only the leader can update the ConfigMap. The HA data will only be cleaned up once the Flink cluster reaches the global terminal state. Please note that only the pointers are stored in the ConfigMap; the concrete data will be stored in the DistributedStorage. This level of indirection is necessary to keep the amount of data in the ConfigMap small (ConfigMaps are built for data less than 1MB, whereas state can grow to multiple GBs).
Example: Application Cluster with HA # You’ll need a running Kubernetes cluster and a properly set kubeconfig to follow along. You can use kubectl get nodes to verify that you’re all set! In this blog post, we’re using minikube for local testing.
1. Build a Docker image with the Flink job (my-flink-job.jar) baked in
FROM flink:1.12.1
RUN mkdir -p $FLINK_HOME/usrlib
COPY /path/of/my-flink-job.jar $FLINK_HOME/usrlib/my-flink-job.jar
Use the above Dockerfile to build a user image (<user-image>) and then push it to your remote image repository:
$ docker build -t <user-image> .
$ docker push <user-image>
2. Start a Flink Application Cluster
$ ./bin/flink run-application \
    --detached \
    --parallelism 4 \
    --target kubernetes-application \
    -Dkubernetes.cluster-id=k8s-ha-app-1 \
    -Dkubernetes.container.image=<user-image> \
    -Dkubernetes.jobmanager.cpu=0.5 \
    -Dkubernetes.taskmanager.cpu=0.5 \
    -Dtaskmanager.numberOfTaskSlots=4 \
    -Dkubernetes.rest-service.exposed.type=NodePort \
    -Dhigh-availability=org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory \
    -Dhigh-availability.storageDir=s3://flink-bucket/flink-ha \
    -Drestart-strategy=fixed-delay \
    -Drestart-strategy.fixed-delay.attempts=10 \
    -Dcontainerized.master.env.ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-1.12.1.jar \
    -Dcontainerized.taskmanager.env.ENABLE_BUILT_IN_PLUGINS=flink-s3-fs-hadoop-1.12.1.jar \
    local:///opt/flink/usrlib/my-flink-job.jar
3. Access the Flink Web UI (http://minikube-ip-address:node-port) and check that the job is running!
2021-02-05 17:26:13,403 INFO org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Create flink application cluster k8s-ha-app-1 successfully, JobManager Web Interface: http://192.168.64.21:32388
You should be able to find a similar log in the Flink client and get the JobManager web interface URL.
4. Kill the JobManager to simulate failure
$ kubectl exec {jobmanager_pod_name} -- /bin/sh -c "kill 1"
5. Verify that the job recovers from the latest successful checkpoint
Refresh the Flink Web UI until the new JobManager is launched, and then search for the following JobManager logs to verify that the job recovers from the latest successful checkpoint:
2021-02-05 09:44:01,636 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Restoring job 00000000000000000000000000000000 from Checkpoint 101 @ 1612518074802 for 00000000000000000000000000000000 located at <checkpoint-not-externally-addressable>.
6. Cancel the job
The job can be cancelled through the Flink Web UI, or using the following command:
$ ./bin/flink cancel --target kubernetes-application -Dkubernetes.cluster-id=<ClusterID> <JobID>
When the job is cancelled, all the Kubernetes resources created by Flink (e.g. JobManager deployment, TaskManager pods, service, Flink configuration ConfigMap, leader-related ConfigMaps) will be deleted automatically.
Conclusion # The native Kubernetes integration was first introduced in Flink 1.10, abstracting a lot of the complexities of hosting, configuring, managing and operating Flink clusters in cloud-native environments. After three major releases, the community has made great progress in supporting multiple deployment modes (i.e. session and application) and an alternative HA setup that doesn’t depend on Zookeeper.
Compared with standalone Kubernetes deployments, the native integration is more user-friendly and requires less upfront knowledge about Kubernetes. Given that Flink is now aware of the underlying Kubernetes cluster, it also can benefit from dynamic resource allocation and make more efficient use of Kubernetes cluster resources. The next building block to deepen Flink’s native integration with Kubernetes is the pod template (FLINK-15656), which will greatly enhance the flexibility of using advanced Kubernetes features (e.g. volumes, init container, sidecar container). This work is already in progress and will be added in the upcoming 1.13 release!
`}),e.add({id:124,href:"/2021/01/29/apache-flink-1.10.3-released/",title:"Apache Flink 1.10.3 Released",section:"Flink Blog",content:`The Apache Flink community released the third bugfix version of the Apache Flink 1.10 series.
This release includes 36 fixes and minor improvements for Flink 1.10.2. The list below includes a detailed list of all fixes and improvements.
We highly recommend that all users upgrade to Flink 1.10.3.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.10.3</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.10.3</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.10.3</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Bug [FLINK-14087] - throws java.lang.ArrayIndexOutOfBoundsException when emiting the data using RebalancePartitioner. [FLINK-15170] - WebFrontendITCase.testCancelYarn fails on travis [FLINK-15467] - Should wait for the end of the source thread during the Task cancellation [FLINK-16246] - Exclude &quot;SdkMBeanRegistrySupport&quot; from dynamically loaded AWS connectors [FLINK-17341] - freeSlot in TaskExecutor.closeJobManagerConnection cause ConcurrentModificationException [FLINK-17458] - TaskExecutorSubmissionTest#testFailingScheduleOrUpdateConsumers [FLINK-17677] - FLINK_LOG_PREFIX recommended in docs is not always available [FLINK-18081] - Fix broken links in &quot;Kerberos Authentication Setup and Configuration&quot; doc [FLINK-18196] - flink throws \`NullPointerException\` when executeCheckpointing [FLINK-18212] - Init lookup join failed when use udf on lookup table [FLINK-18832] - BoundedBlockingSubpartition does not work with StreamTask [FLINK-18959] - Fail to archiveExecutionGraph because job is not finished when dispatcher close [FLINK-19022] - AkkaRpcActor failed to start but no exception information [FLINK-19109] - Split Reader eats chained periodic watermarks [FLINK-19135] - (Stream)ExecutionEnvironment.execute() should not throw ExecutionException [FLINK-19237] - LeaderChangeClusterComponentsTest.testReelectionOfJobMaster failed with &quot;NoResourceAvailableException: Could not allocate the required slot within slot request timeout&quot; [FLINK-19401] - Job stuck in restart loop due to excessive checkpoint recoveries which block the JobMaster [FLINK-19557] - Issue retrieving leader after zookeeper session reconnect [FLINK-19675] - The plan of is incorrect when Calc contains WHERE clause, composite fields access and Python UDF at the same time [FLINK-19901] - Unable to exclude metrics variables for the last metrics reporter. [FLINK-20013] - BoundedBlockingSubpartition may leak network buffer if task is failed or canceled [FLINK-20018] - pipeline.cached-files option cannot escape &#39;:&#39; in path [FLINK-20033] - Job fails when stopping JobMaster [FLINK-20065] - UnalignedCheckpointCompatibilityITCase.test failed with AskTimeoutException [FLINK-20076] - DispatcherTest.testOnRemovedJobGraphDoesNotCleanUpHAFiles does not test the desired functionality [FLINK-20183] - Fix the default PYTHONPATH is overwritten in client side [FLINK-20218] - AttributeError: module &#39;urllib&#39; has no attribute &#39;parse&#39; [FLINK-20875] - [CVE-2020-17518] Directory traversal attack: remote file writing through the REST API Improvement [FLINK-16753] - Exception from AsyncCheckpointRunnable should be wrapped in CheckpointException [FLINK-18287] - Correct the documentation of Python Table API in SQL pages [FLINK-19055] - MemoryManagerSharedResourcesTest contains three tests running extraordinary long [FLINK-19105] - Table API Sample Code Error [FLINK-19252] - Jaas file created under io.tmp.dirs - folder not created if not exists [FLINK-19339] - Support Avro&#39;s unions with logical types [FLINK-19523] - Hide sensitive command-line configurations Task [FLINK-20906] - Update copyright year to 2021 for NOTICE files `}),e.add({id:125,href:"/2021/01/19/apache-flink-1.12.1-released/",title:"Apache Flink 1.12.1 Released",section:"Flink Blog",content:`The Apache Flink community released the first bugfix version of the Apache Flink 1.12 series.
This release includes 79 fixes and minor improvements for Flink 1.12.0. The list below includes a detailed list of all fixes and improvements.
We highly recommend that all users upgrade to Flink 1.12.1.
Attention: Using unaligned checkpoints in Flink 1.12.0 combined with tasks that have two or more inputs, or with union inputs for single-input tasks, can result in corrupted state. This can happen if a new checkpoint is triggered before recovery is fully completed. For state to be corrupted, a task with two or more input gates must receive a checkpoint barrier at exactly the same time that it finishes recovering spilled in-flight data. In such a case, the new checkpoint can succeed with corrupted or missing in-flight data, which will result in various deserialisation/corrupted data stream errors when someone attempts to recover from such a corrupted checkpoint.
When using unaligned checkpoints in Flink 1.12.1, corruption may occur in the checkpoint that follows a declined checkpoint.
A late barrier of a canceled checkpoint may lead to buffers not being written into the subsequent checkpoint, such that recovery is not possible. This happens when the next checkpoint barrier arrives at a given operator before all previous barriers have arrived, which can only happen after cancellation in unaligned checkpoints.
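If an upgrade is not immediately possible, one conservative workaround (our suggestion here, not something prescribed by the notes above) is to simply keep unaligned checkpoints disabled, which is also the default; a minimal sketch:
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AlignedCheckpointsJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpoint every 60 seconds
        // Explicitly keep unaligned checkpoints disabled (the default) until running a fixed version.
        env.getCheckpointConfig().enableUnalignedCheckpoints(false);
        // ... define sources, transformations and sinks here ...
        env.execute("aligned-checkpoints-job");
    }
}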
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.12.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.12.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.12.1</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-18897] - Add documentation for the maxwell-json format [FLINK-20352] - Rework command line interface documentation page [FLINK-20353] - Rework logging documentation page [FLINK-20354] - Rework standalone deployment documentation page [FLINK-20355] - Rework K8s deployment documentation page [FLINK-20356] - Rework Mesos deployment documentation page [FLINK-20422] - Remove from .html files in flink documentation [FLINK-20485] - Map views are deserialized multiple times [FLINK-20601] - Rework PyFlink CLI documentation Bug [FLINK-19369] - BlobClientTest.testGetFailsDuringStreamingForJobPermanentBlob hangs [FLINK-19435] - Deadlock when loading different driver classes concurrently using Class.forName [FLINK-19725] - Logger cannot be initialized due to timeout: LoggerInitializationException is thrown [FLINK-19880] - Fix ignore-parse-errors not work for the legacy JSON format [FLINK-20213] - Partition commit is delayed when records keep coming [FLINK-20221] - DelimitedInputFormat does not restore compressed filesplits correctly leading to dataloss [FLINK-20273] - Fix Table api Kafka connector Sink Partitioner Document Error [FLINK-20372] - Update Kafka SQL connector page to mention properties.* options [FLINK-20389] - UnalignedCheckpointITCase failure caused by NullPointerException [FLINK-20404] - ZooKeeper quorum fails to start due to missing log4j library [FLINK-20419] - Insert fails due to failure to generate execution plan [FLINK-20428] - ZooKeeperLeaderElectionConnectionHandlingTest.testConnectionSuspendedHandlingDuringInitialization failed with &quot;No result is expected since there was no leader elected before stopping the server, yet&quot; [FLINK-20429] - KafkaTableITCase.testKafkaTemporalJoinChangelog failed with unexpected results [FLINK-20433] - UnalignedCheckpointTestBase.execute failed with &quot;TestTimedOutException: test timed out after 300 seconds&quot; [FLINK-20464] - Some Table examples are not built correctly [FLINK-20467] - Fix the Example in Python DataStream Doc [FLINK-20470] - MissingNode can&#39;t be casted to ObjectNode when deserializing JSON [FLINK-20476] - New File Sink end-to-end test Failed [FLINK-20486] - Hive temporal join should allow monitor interval smaller than 1 hour [FLINK-20492] - The SourceOperatorStreamTask should implement cancelTask() and finishTask() [FLINK-20493] - SQLClientSchemaRegistryITCase failed with &quot;Could not build the flink-dist image&quot; [FLINK-20521] - Null result values are being swallowed by RPC system [FLINK-20525] - StreamArrowPythonGroupWindowAggregateFunctionOperator doesn&#39;t handle rowtime and proctime properly [FLINK-20543] - Fix typo in upsert kafka docs [FLINK-20554] - The Checkpointed Data Size of the Latest Completed Checkpoint is incorrectly displayed on the Overview page of the UI [FLINK-20582] - Fix typos in \`CREATE Statements\` docs [FLINK-20607] - a wrong example in udfs page. 
[FLINK-20615] - Local recovery and sticky scheduling end-to-end test timeout with &quot;IOException: Stream Closed&quot; [FLINK-20626] - Canceling a job when it is failing will result in job hanging in CANCELING state [FLINK-20630] - [Kinesis][DynamoDB] DynamoDB Streams Consumer fails to consume from Latest [FLINK-20646] - ReduceTransformation does not work with RocksDBStateBackend [FLINK-20648] - Unable to restore job from savepoint when using Kubernetes based HA services [FLINK-20664] - Support setting service account for TaskManager pod [FLINK-20665] - FileNotFoundException when restore from latest Checkpoint [FLINK-20666] - Fix the deserialized Row losing the field_name information in PyFlink [FLINK-20669] - Add the jzlib LICENSE file in flink-python module [FLINK-20703] - HiveSinkCompactionITCase test timeout [FLINK-20704] - Some rel data type does not implement the digest correctly [FLINK-20756] - PythonCalcSplitConditionRule is not working as expected [FLINK-20764] - BatchGroupedReduceOperator does not emit results for singleton inputs [FLINK-20781] - UnalignedCheckpointITCase failure caused by NullPointerException [FLINK-20784] - .staging_xxx does not exist, when insert into hive [FLINK-20793] - Fix NamesTest due to code style refactor [FLINK-20803] - Version mismatch between spotless-maven-plugin and google-java-format plugin [FLINK-20841] - Fix compile error due to duplicated generated files Improvement [FLINK-19013] - Log start/end of state restoration [FLINK-19259] - Use classloader release hooks with Kinesis producer to avoid metaspace leak [FLINK-19832] - Improve handling of immediately failed physical slot in SlotSharingExecutionSlotAllocator [FLINK-20055] - Datadog API Key exposed in Flink JobManager logs [FLINK-20168] - Translate page &#39;Flink Architecture&#39; into Chinese [FLINK-20209] - Add missing checkpoint configuration to Flink UI [FLINK-20298] - Replace usage of in flink documentation [FLINK-20468] - Enable leadership control in MiniCluster to test JM failover [FLINK-20510] - Enable log4j2 monitor interval by default [FLINK-20519] - Extend HBase notice with transitively bundled dependencies [FLINK-20570] - The \`NOTE\` tip style is different from the others in process_function page. 
[FLINK-20588] - Add docker-compose as appendix to Mesos documentation [FLINK-20629] - [Kinesis][EFO] Migrate from DescribeStream to DescribeStreamSummary [FLINK-20647] - Use yield to generate output datas in ProcessFunction for Python DataStream [FLINK-20650] - Mark &quot;native-k8s&quot; as deprecated in docker-entrypoint.sh [FLINK-20651] - Use Spotless/google-java-format for code formatting/enforcement [FLINK-20682] - Add configuration options related to hadoop [FLINK-20697] - Correct the Type of &quot;lookup.cache.ttl&quot; in jdbc.md/jdbc.zh.md [FLINK-20790] - Generated classes should not be put under src/ directory [FLINK-20792] - Allow shorthand invocation of spotless [FLINK-20805] - Blink runtime classes partially ignored by spotless [FLINK-20822] - Don&#39;t check whether a function is generic in hive catalog [FLINK-20866] - Add how to list jobs in Yarn deployment documentation when HA enabled Task [FLINK-20300] - Create Flink 1.12 release notes [FLINK-20906] - Update copyright year to 2021 for NOTICE files `}),e.add({id:126,href:"/2021/01/18/using-rocksdb-state-backend-in-apache-flink-when-and-how/",title:"Using RocksDB State Backend in Apache Flink: When and How",section:"Flink Blog",content:`Stream processing applications are often stateful, “remembering” information from processed events and using it to influence further event processing. In Flink, the remembered information, i.e., state, is stored locally in the configured state backend. To prevent data loss in case of failures, the state backend periodically persists a snapshot of its contents to a pre-configured durable storage. The RocksDB state backend (i.e., RocksDBStateBackend) is one of the three built-in state backends in Flink. This blog post will guide you through the benefits of using RocksDB to manage your application’s state, explain when and how to use it and also clear up a few common misconceptions. Having said that, this is not a blog post to explain how RocksDB works in-depth or how to do advanced troubleshooting and performance tuning; if you need help with any of those topics, you can reach out to the Flink User Mailing List.
State in Flink # To best understand state and state backends in Flink, it’s important to distinguish between in-flight state and state snapshots. In-flight state, also known as working state, is the state a Flink job is working on. It is always stored locally in memory (with the possibility to spill to disk) and can be lost when jobs fail without impacting job recoverability. State snapshots, i.e., checkpoints and savepoints, are stored in a remote durable storage, and are used to restore the local state in the case of job failures. The appropriate state backend for a production deployment depends on scalability, throughput, and latency requirements.
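To make the distinction concrete, the following is a minimal sketch (class and state names are illustrative, not from any particular application) of a keyed function whose per-key counter is in-flight state kept by the configured state backend and included in state snapshots:
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class CountPerKey extends KeyedProcessFunction<String, String, Long> {
    private transient ValueState<Long> count; // in-flight state, stored in the state backend

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
        Long current = count.value();
        long updated = (current == null ? 0L : current) + 1L;
        count.update(updated); // persisted as part of the next state snapshot
        out.collect(updated);
    }
}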
What is RocksDB? # Thinking of RocksDB as a distributed database that needs to run on a cluster and to be managed by specialized administrators is a common misconception. RocksDB is an embeddable persistent key-value store for fast storage. It interacts with Flink via the Java Native Interface (JNI). The picture below shows where RocksDB fits in a Flink cluster node. Details are explained in the following sections.
RocksDB in Flink # Everything you need to use RocksDB as a state backend is bundled in the Apache Flink distribution, including the native shared library:
$ jar -tvf lib/flink-dist_2.12-1.12.0.jar | grep librocksdbjni-linux64
8695334 Wed Nov 27 02:27:06 CET 2019 librocksdbjni-linux64.so
At runtime, RocksDB is embedded in the TaskManager processes. It runs in native threads and works with local files. For example, if you have a job configured with RocksDBStateBackend running in your Flink cluster, you’ll see something similar to the following, where 32513 is the TaskManager process ID.
$ ps -T -p 32513 | grep -i rocksdb
32513 32633 ? 00:00:00 rocksdb:low0
32513 32634 ? 00:00:00 rocksdb:high0
Note The command is for Linux only. For other operating systems, please refer to their documentation.
When to use RocksDBStateBackend # In addition to RocksDBStateBackend, Flink has two other built-in state backends: MemoryStateBackend and FsStateBackend. Both are heap-based, meaning that in-flight state is stored in the JVM heap. For the time being, let’s ignore MemoryStateBackend, as it is intended only for local development and debugging, not for production use.
With RocksDBStateBackend, in-flight state is first written into off-heap/native memory, and then flushed to local disks when a configured threshold is reached. This means that RocksDBStateBackend can support state larger than the total configured heap capacity. The amount of state that you can store in RocksDBStateBackend is only limited by the amount of disk space available across the entire cluster. In addition, since RocksDBStateBackend doesn’t use the JVM heap to store in-flight state, it’s not affected by JVM Garbage Collection and therefore has predictable latency.
On top of full, self-contained state snapshots, RocksDBStateBackend also supports incremental checkpointing as a performance tuning option. An incremental checkpoint stores only the changes that occurred since the latest completed checkpoint. This dramatically reduces checkpointing time in comparison to performing a full snapshot. RocksDBStateBackend is currently the only state backend that supports incremental checkpointing.
RocksDB is a good option when:
The state of your job is larger than can fit in local memory (e.g., long windows, large keyed state); You’re looking into incremental checkpointing as a way to reduce checkpointing time; You expect to have more predictable latency without being impacted by JVM Garbage Collection. Otherwise, if your application has small state or requires very low latency, you should consider FsStateBackend. As a rule of thumb, RocksDBStateBackend is a few times slower than heap-based state backends, because it stores key/value pairs as serialized bytes. This means that any state access (reads/writes) needs to go through a de-/serialization process crossing the JNI boundary, which is more expensive than working directly with the on-heap representation of state. The upside is that, for the same amount of state, it has a low memory footprint compared to the corresponding on-heap representation.
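Conversely, if the heap-based backend is the better fit, configuring it for a job is a one-liner; a minimal sketch, with a placeholder checkpoint path:
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class HeapBackendJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // In-flight state stays on the JVM heap; snapshots go to the durable checkpoint location.
        env.setStateBackend(new FsStateBackend("hdfs:///flink-checkpoints"));
        // ... define the streaming program here ...
        env.execute("heap-backend-job");
    }
}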
How to use RocksDBStateBackend # RocksDB is fully embedded within and fully managed by the TaskManager process. RocksDBStateBackend can be configured at the cluster level as the default for the entire cluster, or at the job level for individual jobs. The job level configuration takes precedence over the cluster level configuration.
Cluster Level # Add the following configuration in conf/flink-conf.yaml:
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: hdfs:///flink-checkpoints # location to store checkpoints
Job Level # Add the following into your job’s code after StreamExecutionEnvironment is created:
# 'env' is the created StreamExecutionEnvironment # 'true' is to enable incremental checkpointing env.setStateBackend(new RocksDBStateBackend(&quot;hdfs:///fink-checkpoints&quot;, true)); Note In addition to HDFS, you can also use other on-premises or cloud-based object stores if the corresponding dependencies are added under [FLINK_HOME/plugins](//nightlies.apache.org/flink/flink-docs-stable/deployment/filesystems/plugins.html). Best Practices and Advanced Configuration # We hope this overview helped you gain a better understanding of the role of RocksDB in Flink and how to successfully run a job with RocksDBStateBackend. To round it off, we’ll explore some best practices and a few reference points for further troubleshooting and performance tuning.
State Location in RocksDB # As mentioned earlier, in-flight state in RocksDBStateBackend is spilled to files on disk. These files are located under the directory specified by the Flink configuration state.backend.rocksdb.localdir. Because disk performance has a direct impact on RocksDB’s performance, it’s recommended that this directory is located on a local disk. It’s discouraged to configure it to a remote network-based location like NFS or HDFS, as writing to remote disks is usually slower. Also high availability is not a requirement for in-flight state. Local SSD disks are preferred if high disk throughput is required.
State snapshots are persisted to remote durable storage. During state snapshotting, TaskManagers take a snapshot of the in-flight state and store it remotely. Transferring the state snapshot to remote storage is handled purely by the TaskManager itself without the involvement of the state backend. So, state.checkpoints.dir or the parameter you set in the code for a particular job can be different locations like an on-premises HDFS cluster or a cloud-based object store like Amazon S3, Azure Blob Storage, Google Cloud Storage, Alibaba OSS, etc.
Troubleshooting RocksDB # To check how RocksDB is behaving in production, you should look for the RocksDB log file named LOG. By default, this log file is located in the same directory as your data files, i.e., the directory specified by the Flink configuration state.backend.rocksdb.localdir. When enabled, RocksDB statistics are also logged there to help diagnose potential problems. For further information, check RocksDB Troubleshooting Guide in RocksDB Wiki. If you are interested in the RocksDB behavior trend over time, you can consider enabling RocksDB native metrics for your Flink job.
Note From Flink 1.10, RocksDB logging was effectively disabled by [setting the log level to HEADER](https://github.com/apache/flink/blob/master/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/PredefinedOptions.java#L64). To enable it, check [How to get RocksDB's LOG file back for advanced troubleshooting](https://ververica.zendesk.com/hc/en-us/articles/360015933320-How-to-get-RocksDB-s-LOG-file-back-for-advanced-troubleshooting).
Warning Enabling RocksDB's native metrics in Flink may have a negative performance impact on your job.
Tuning RocksDB # Since Flink 1.10, Flink configures RocksDB’s memory allocation to the amount of managed memory of each task slot by default. The primary mechanism for improving memory-related performance issues is to increase Flink’s managed memory via the Flink configuration taskmanager.memory.managed.size or taskmanager.memory.managed.fraction. For more fine-grained control, you should first disable the automatic memory management by setting state.backend.rocksdb.memory.managed to false, then start with the following Flink configuration: state.backend.rocksdb.block.cache-size (corresponding to block_cache_size in RocksDB), state.backend.rocksdb.writebuffer.size (corresponding to write_buffer_size in RocksDB), and state.backend.rocksdb.writebuffer.count (corresponding to max_write_buffer_number in RocksDB). For more details, check this blog post on how to manage RocksDB memory size in Flink and the RocksDB Memory Usage Wiki page.
While data is being written or overwritten in RocksDB, flushing from memory to local disks and data compaction are managed in the background by RocksDB threads. On a machine with many CPU cores, you should increase the parallelism of background flushing and compaction by setting the Flink configuration state.backend.rocksdb.thread.num (corresponding to max_background_jobs in RocksDB). The default configuration is usually too small for a production setup. If your job reads frequently from RocksDB, you should consider enabling bloom filters.
For other RocksDBStateBackend configurations, check the Flink documentation on Advanced RocksDB State Backends Options. For further tuning, check RocksDB Tuning Guide in RocksDB Wiki.
Conclusion # The RocksDB state backend (i.e., RocksDBStateBackend) is one of the three state backends bundled in Flink, and can be a powerful choice when configuring your streaming applications. It enables scalable applications maintaining up to many terabytes of state with exactly-once processing guarantees. If the state of your Flink job is too large to fit on the JVM heap, you are interested in incremental checkpointing, or you expect to have predictable latency, you should use RocksDBStateBackend. Since RocksDB is embedded in TaskManager processes as native threads and works with files on local disks, RocksDBStateBackend is supported out-of-the-box without the need to further setup and manage any external systems or processes.
`}),e.add({id:127,href:"/2021/01/11/exploring-fine-grained-recovery-of-bounded-data-sets-on-flink/",title:"Exploring fine-grained recovery of bounded data sets on Flink",section:"Flink Blog",content:`Apache Flink is a very versatile tool for all kinds of data processing workloads. It can process incoming data within a few milliseconds or crunch through petabytes of bounded datasets (also known as batch processing).
Processing efficiency is not the only parameter users of data processing systems care about. In the real world, system outages due to hardware or software failure are expected to happen all the time. For unbounded (or streaming) workloads, Flink is using periodic checkpoints to allow for reliable and correct recovery. In case of bounded data sets, having a reliable recovery mechanism is mission critical — as users do not want to potentially lose many hours of intermediate processing results.
Apache Flink 1.9 introduced fine-grained recovery into its internal workload scheduler. The Flink APIs that are made for bounded workloads benefit from this change by individually recovering failed operators, re-using results from the previous processing step.
In this blog post, we are going to give an overview over these changes, and we will experimentally validate their effectiveness.
How does fine-grained recovery work? # For streaming jobs (and in pipelined mode for batch jobs), Flink uses a coarse-grained restart strategy: upon failure, the entire job is restarted (streaming jobs, however, have an entirely different fault-tolerance model, based on checkpointing).
For batch jobs, we can use a more sophisticated recovery strategy, by using cached intermediate results, thus only restarting parts of the pipeline.
Let’s look at the topology below: Some connections are pipelined (between A1 and B1, as well as A2 and B2) – data is immediately streamed from operator A1 to B1.
However, the output of B1 and B2 is cached on disk (indicated by the grey box). We call such connections blocking. If there’s a failure in the steps succeeding B1 and B2 and the results of B1 and B2 have already been produced, we don’t need to reprocess this part of the pipeline – we can reuse the cached result.
Looking at the case of a failure (here of D2), we see that we do not need to restart the entire job. Restarting C2 and all dependent tasks is sufficient. This is possible because we can read the cached results of B1 and B2. We call this recovery mechanism “fine-grained”, as we only restart parts of the topology to recover from a failure – reducing the recovery time, resource consumption and overall job runtime.
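In the DataSet API, whether exchanges are pipelined or blocking is governed by the execution mode. The following is a minimal sketch (the actual program is omitted) of how a job could request batch-style, blocking exchanges:
import org.apache.flink.api.common.ExecutionMode;
import org.apache.flink.api.java.ExecutionEnvironment;

public class FineGrainedRecoveryJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // BATCH persists shuffles and broadcasts, so only the failed region needs to be restarted;
        // PIPELINED streams results downstream immediately and restarts the whole job on failure.
        env.getConfig().setExecutionMode(ExecutionMode.BATCH);
        // ... define the DataSet program (e.g. a TPC-H style query) here ...
        env.execute("fine-grained-recovery-example");
    }
}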
Experimenting with fine-grained recovery # To validate the implementation, we’ve conducted a small experiment. The following sections will describe the setup, the experiment and the results.
Setup # Hardware: The experiment was performed on an idle MacBook Pro 2016 (16 GB of memory, SSD storage).
Test Job: We used a modified version (for instrumentation only) of the TPC-H Query 3 example that is part of the Flink batch (DataSet API) examples, running on Flink 1.12
This is the topology of the query:
It has many blocking data exchanges where we cache intermediate results, if executed in batch mode.
Test Data: We generated a TPC-H dataset of 150 GB as the input.
Cluster: We were running 4 TaskManagers with 2 slots each and 1 JobManager in standalone mode.
Running this test job takes roughly 15 minutes with the given hardware and data.
For inducing failures into the job, we decided to randomly throw exceptions in the operators. This has a number of benefits compared to randomly killing entire TaskManagers:
Killing a TaskManager would require starting and registering a new TaskManager — which introduces an uncontrollable factor into our benchmark: we don’t want to test how quickly Flink is reconciling a cluster.
Killing an entire TaskManager would bring down on average 1/4th of all running operators. In larger production setups, a failure usually affects only a smaller fraction of all running operators. The differences between the execution modes would be less obvious if we killed entire TaskManagers.
Keeping TaskManagers across failures helps to better simulate using an external shuffle service, as intermediate results are preserved despite a failure.
The failures are controlled by a central “failure coordinator” which decides when to kill which operator.
Failures are artificially triggered based on a configured mean failure frequency. The failures follow an exponential distribution, which is suitable for simulating continuous and independent failures at a configured average rate.
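The failure coordinator’s code is not shown in this post, but drawing exponentially distributed failure intervals only takes a line of inverse transform sampling; a small illustrative sketch:
import java.util.Random;

public class FailureIntervals {
    // Returns the delay until the next induced failure, in seconds,
    // drawn from an exponential distribution with the given mean.
    static double nextFailureDelaySeconds(Random rnd, double meanSeconds) {
        return -meanSeconds * Math.log(1.0 - rnd.nextDouble());
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        // Mean failure frequency of 5 minutes (300 seconds), as in one of the benchmark settings.
        System.out.println(nextFailureDelaySeconds(rnd, 300.0));
    }
}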
The Experiment # We were running the job with two parameters which we varied in the benchmark:
Execution Mode: BATCH or PIPELINED.
In PIPELINED mode, all data exchanges are pipelined (upstream operators stream their results downstream immediately), except for exchanges susceptible to deadlocks. A failure means that we have to restart the entire job and start processing from scratch.
In BATCH mode, all shuffles and broadcasts are persisted before downstream consumption. You can imagine the execution to happen in steps. Since we are persisting intermediate results in BATCH mode, we do not have to reprocess all upstream operators in case of an (induced) failure. We just have to restart the step that was in progress during the failure.
Mean Failure Frequency: This parameter controls the frequency of failures induced into the running job. If the parameter is set to 5 minutes, on average, a failure occurs every 5 minutes. The failures are following an exponential distribution. We’ve chosen values between 15 minutes and 20 seconds.
Each configuration combination was executed at least 3 times. We report the average execution time. This is necessary due to the probabilistic behavior of the induced failures.
The Results # The chart below shows the execution time in seconds for each batch and pipelined execution with different failure frequencies.
We will now discuss some findings:
Execution time with rare failures: The first few results on the left compare the behavior with mean failure frequencies of 15 (=900s), 10 (=600s), 9 (=540s), 8 (=480s) and 7 (=420s) minutes. The execution times are mostly the same, around 15 minutes. The batch execution time is usually lower, and more predictable. This behavior is easy to explain: if an error occurs late in the execution, the pipelined mode needs to start from scratch, while the batch mode can re-use previous intermediate results. The variances in runtime can be explained by statistical effects: if an error happens to be induced close to the end of a pipelined-mode run, the entire job needs to rerun.
Execution time with frequent failures: The results “in the middle”, with failure frequencies of 6, 5, 4, 3 and 2 minutes, show that the pipelined mode becomes unfeasible at some point. If failures happen on average every 3 minutes, the average execution time reaches more than 60 minutes; for failures every 2 minutes, the time spikes to more than 120 minutes. The pipelined job can only finish if it happens to find a window where no failure is induced for 15 minutes; for more frequent failures, the pipelined mode did not manage to finish at all.
How many failures can the batch mode sustain? The last numbers, with failure frequencies between 60 and 20 seconds, are probably a bit unrealistic for real-world scenarios, but we wanted to investigate how frequent failures can become before the batch mode becomes unfeasible. With failures induced every 30 seconds, the average execution time is 30 minutes. In other words, even with two failures per minute, the execution time only doubles in this case. The batch mode is much more predictable and well behaved when it comes to execution times.
Conclusion # Based on these results, it makes a lot of sense to use the batch execution mode for batch workloads, as the resource consumption and overall execution times are substantially lower compared to the pipelined execution mode.
In general, we recommend conducting your own performance experiments on your own hardware and with your own workloads, as results might vary from what we’ve presented here. Despite the findings here, the pipelined mode probably has some performance advantages in environments with rare failures and slower I/O (for example when using spinning disks, or network attached disks). On the other hand, CPU intensive workloads might benefit from the batch mode even in slow I/O environments.
We should also note that the caching (and subsequent reprocessing on failure) only works if the cached results are still present – this is currently only the case if the TaskManager survives a failure. However, this is an unrealistic assumption, as many failures would cause the TaskManager process to die. To mitigate this limitation, data processing frameworks employ external shuffle services that persist the cached results independently of the data processing framework. Since Flink 1.9, there is support for a pluggable shuffle service, and there are tickets for adding implementations for YARN (FLINK-13247) and Kubernetes (FLINK-13246). Once these implementations are added, TaskManagers can recover cached results even if the process or machine was killed.
Despite these considerations, we believe that fine-grained recovery is a great improvement for Flink’s batch capabilities, as it makes the framework much more efficient, even in unstable environments.
`}),e.add({id:128,href:"/2021/01/07/whats-new-in-the-pulsar-flink-connector-2.7.0/",title:"What's New in the Pulsar Flink Connector 2.7.0",section:"Flink Blog",content:` About the Pulsar Flink Connector # In order for companies to access real-time data insights, they need unified batch and streaming capabilities. Apache Flink unifies batch and stream processing into one single computing engine with “streams” as the unified data representation. Although developers have done extensive work at the computing and API layers, very little work has been done at the data messaging and storage layers. In reality, data is segregated into data silos, created by various storage and messaging technologies. As a result, there is still no single source-of-truth and the overall operation for the developer teams poses significant challenges. To address such operational challenges, we need to store data in streams. Apache Pulsar (together with Apache BookKeeper) perfectly meets the criteria: data is stored as one copy (source-of-truth) and can be accessed in streams (via pub-sub interfaces) and segments (for batch processing). When Flink and Pulsar come together, the two open source technologies create a unified data architecture for real-time, data-driven businesses.
The Pulsar Flink connector provides elastic data processing with Apache Pulsar and Apache Flink, allowing Apache Flink to read/write data from/to Apache Pulsar. The Pulsar Flink Connector enables you to concentrate on your business logic without worrying about the storage details.
Challenges # When we first developed the Pulsar Flink Connector, it received wide adoption from both the Flink and Pulsar communities. Leveraging the Pulsar Flink connector, Hewlett Packard Enterprise (HPE) built a real-time computing platform, BIGO built a real-time message processing system, and Zhihu is in the process of assessing the Connector’s fit for a real-time computing system.
With more users adopting the Pulsar Flink Connector, it became clear that one of the common issues was evolving around data formats and specifically performing serialization and deserialization. While the Pulsar Flink connector leverages the Pulsar serialization, the previous connector versions did not support the Flink data format. As a result, users had to manually configure their setup in order to use the connector for real-time computing scenarios.
To improve the user experience and make the Pulsar Flink connector easier-to-use, we built the capabilities to fully support the Flink data format, so users of the connector do not spend time on manual tuning and configuration.
What’s New in the Pulsar Flink Connector 2.7.0? # The Pulsar Flink Connector 2.7.0 supports features in Apache Pulsar 2.7.0 and Apache Flink 1.12 and is fully compatible with the Flink connector and Flink message format. With the latest version, you can use important features in Flink, such as exactly-once sink, upsert Pulsar mechanism, Data Definition Language (DDL) computed columns, watermarks, and metadata. You can also leverage the Key-Shared subscription in Pulsar, and conduct serialization and deserialization without much configuration. Additionally, you can easily customize the configuration based on your business requirements.
Below, we provide more details about the key features in the Pulsar Flink Connector 2.7.0.
Ordered message queue with high performance # When users needed to strictly guarantee the ordering of messages, only one consumer was allowed to consume them, which had a severe impact on throughput. To address this, we designed the Key_Shared subscription model in Pulsar: it guarantees the ordering of messages while improving throughput by adding a Key to each message and routing messages with the same Key Hash to one consumer.
Pulsar Flink Connector 2.7.0 supports the Key_Shared subscription model. You can enable this feature by setting enable-key-hash-range to true. The Key Hash range processed by each consumer is decided by the parallelism of tasks.
Introducing exactly-once semantics for Pulsar sink (based on the Pulsar transaction) # In previous versions, sink operators only supported at-least-once semantics, which could not fully meet requirements for end-to-end consistency. To deduplicate messages, users had to do some dirty work, which was not user-friendly.
Transactions are supported in Pulsar 2.7.0, which greatly improves the fault tolerance capability of the Flink sink. In the Pulsar Flink Connector 2.7.0, we designed exactly-once semantics for sink operators based on Pulsar transactions. Flink uses the two-phase commit protocol to implement TwoPhaseCommitSinkFunction. The main life cycle methods are beginTransaction(), preCommit(), commit(), abort(), recoverAndCommit(), recoverAndAbort().
You can flexibly select semantics when creating a sink operator while the internal logic changes are transparent. Pulsar transactions are similar to the two-phase commit protocol in Flink, which greatly improves the reliability of the Connector Sink.
It’s easy to implement beginTransaction and preCommit. You only need to start a Pulsar transaction and persist the TID of the transaction after the checkpoint. In the preCommit phase, you need to ensure that all messages are flushed to Pulsar, while any pre-committed messages will be committed eventually.
Our implementation focuses on recoverAndCommit and recoverAndAbort. Limited by Kafka’s features, the Kafka connector has to resort to hacks for recoverAndCommit. Pulsar transactions do not rely on a specific Producer, so it’s easy to commit and abort transactions based on the TID.
Pulsar transactions are highly efficient and flexible. Taking advantage of both Pulsar and Flink, the Pulsar Flink connector is even more powerful. We will continue to improve the transactional sink in the Pulsar Flink connector.
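The connector’s actual sink implementation is not shown in this post, but the lifecycle described above maps directly onto Flink’s TwoPhaseCommitSinkFunction. The following is a skeletal, hypothetical sketch of that mapping; the transaction handle and all method bodies are placeholders rather than the real Pulsar implementation:
import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.base.VoidSerializer;
import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer;
import org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction;

// Hypothetical transaction handle; the real connector would carry the Pulsar transaction ID (TID).
class PulsarTxn {
    String tid;
}

public class SketchedExactlyOncePulsarSink extends TwoPhaseCommitSinkFunction<String, PulsarTxn, Void> {

    public SketchedExactlyOncePulsarSink() {
        super(new KryoSerializer<>(PulsarTxn.class, new ExecutionConfig()), VoidSerializer.INSTANCE);
    }

    @Override
    protected PulsarTxn beginTransaction() {
        // Start a new Pulsar transaction and remember its TID (persisted with the checkpoint).
        return new PulsarTxn();
    }

    @Override
    protected void invoke(PulsarTxn txn, String value, Context context) {
        // Send 'value' to Pulsar within the transaction 'txn'.
    }

    @Override
    protected void preCommit(PulsarTxn txn) {
        // Flush pending messages so everything written before the checkpoint is durable.
    }

    @Override
    protected void commit(PulsarTxn txn) {
        // Commit the transaction identified by txn.tid; recovery can re-commit based on the TID alone.
    }

    @Override
    protected void abort(PulsarTxn txn) {
        // Abort the transaction identified by txn.tid.
    }
}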
Introducing upsert-pulsar connector # Users in the Flink community expressed their need for an upsert Pulsar connector. After looking through mailing lists and issues, we’ve summarized the following three reasons.
Interpret a Pulsar topic as a changelog stream that interprets records with keys as upsert (aka insert/update) events.
As a part of a real-time pipeline, join multiple streams for enrichment and store the results into a Pulsar topic for further calculation later. However, the result may contain update events.
As a part of a real-time pipeline, aggregate on data streams and store the results into a Pulsar topic for further calculation later. However, the result may contain update events.
Based on these requirements, we added support for Upsert Pulsar. The upsert-pulsar connector allows for reading data from and writing data to Pulsar topics in the upsert fashion.
As a source, the upsert-pulsar connector produces a changelog stream, where each data record represents an update or delete event. More precisely, the value in a data record is interpreted as an UPDATE of the last value for the same key, if any (if a corresponding key does not exist yet, the update will be considered an INSERT). Using the table analogy, a data record in a changelog stream is interpreted as an UPSERT (aka INSERT/UPDATE) because any existing row with the same key is overwritten. Also, null values are interpreted in a special way: a record with a null value represents a “DELETE”.
As a sink, the upsert-pulsar connector can consume a changelog stream. It will write INSERT/UPDATE_AFTER data as normal Pulsar message values and write DELETE data as Pulsar message with null values (indicate tombstone for the key). Flink will guarantee the message ordering on the primary key by partitioning data on the values of the primary key columns, so the update/deletion messages on the same key will fall into the same partition.
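For illustration, an upsert-pulsar table might be declared as follows. Treat this as a hedged sketch: the connector identifier comes from this post, the topic/service-url/admin-url options mirror the pulsar connector options shown in the migration section below, and the key.format/value.format options are assumptions borrowed from the analogous upsert-kafka connector, so check the connector documentation for the exact option names:
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UpsertPulsarExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());
        // A changelog table backed by a Pulsar topic; the primary key determines upsert semantics.
        tEnv.executeSql(
                "CREATE TABLE user_scores (" +
                "  user_id STRING," +
                "  score BIGINT," +
                "  PRIMARY KEY (user_id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'upsert-pulsar'," +
                "  'topic' = 'persistent://public/default/user_scores'," +
                "  'service-url' = 'pulsar://xxx'," +
                "  'admin-url' = 'http://xxx'," +
                "  'key.format' = 'json'," +
                "  'value.format' = 'json'" +
                ")");
    }
}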
Support new source interface and Table API introduced in FLIP-27 and FLIP-95 # This feature unifies the source of the batch stream and optimizes the mechanism for task discovery and data reading. It is also the cornerstone of our implementation of Pulsar batch and streaming unification. The new Table API supports DDL computed columns, watermarks and metadata.
Support SQL read and write metadata as described in FLIP-107 # FLIP-107 enables users to access connector metadata as a metadata column in table definitions. In real-time computing, users normally need additional information, such as eventTime, or customized fields. The Pulsar Flink connector supports SQL read and write metadata, so it is flexible and easy for users to manage metadata of Pulsar messages in the Pulsar Flink Connector 2.7.0. For details on the configuration, refer to Pulsar Message metadata manipulation.
Add Flink format type atomic to support Pulsar primitive types # In the Pulsar Flink Connector 2.7.0, we add Flink format type atomic to support Pulsar primitive types. When processing with Flink requires a Pulsar primitive type, you can use atomic as the connector format. You can find more information on Pulsar primitive types here.
Migration # If you’re using the previous Pulsar Flink Connector version, you need to adjust your SQL and API parameters accordingly. Below we provide details on each.
SQL # In SQL, we’ve changed the Pulsar configuration parameters in the DDL declaration. The names of some parameters have changed, but their values have not.
Remove the connector. prefix from the parameter names.
Change the name of the connector.type parameter into connector.
Change the startup mode parameter name from connector.startup-mode into scan.startup.mode.
Adjust Pulsar properties as properties.pulsar.reader.readername=testReaderName.
If you use SQL in the Pulsar Flink Connector, you need to adjust your SQL configuration accordingly when migrating to Pulsar Flink Connector 2.7.0. The following sample shows the differences between previous versions and the 2.7.0 version for SQL.
SQL in previous versions:
create table topic1(
  \`rip\` VARCHAR,
  \`rtime\` VARCHAR,
  \`uid\` bigint,
  \`client_ip\` VARCHAR,
  \`day\` as TO_DATE(rtime),
  \`hour\` as date_format(rtime, 'HH')
) with (
  'connector.type' = 'pulsar',
  'connector.version' = '1',
  'connector.topic' = 'persistent://public/default/test_flink_sql',
  'connector.service-url' = 'pulsar://xxx',
  'connector.admin-url' = 'http://xxx',
  'connector.startup-mode' = 'earliest',
  'connector.properties.0.key' = 'pulsar.reader.readerName',
  'connector.properties.0.value' = 'testReaderName',
  'format.type' = 'json',
  'update-mode' = 'append'
);
SQL in Pulsar Flink Connector 2.7.0:
create table topic1(
  \`rip\` VARCHAR,
  \`rtime\` VARCHAR,
  \`uid\` bigint,
  \`client_ip\` VARCHAR,
  \`day\` as TO_DATE(rtime),
  \`hour\` as date_format(rtime, 'HH')
) with (
  'connector' = 'pulsar',
  'topic' = 'persistent://public/default/test_flink_sql',
  'service-url' = 'pulsar://xxx',
  'admin-url' = 'http://xxx',
  'scan.startup.mode' = 'earliest',
  'properties.pulsar.reader.readername' = 'testReaderName',
  'format' = 'json'
);
API # From an API perspective, we adjusted some classes and enabled easier customization.
To solve serialization issues, we changed the signature of the FlinkPulsarSink constructor and added PulsarSerializationSchema.
We removed the row-related classes that no longer fit, such as FlinkPulsarRowSink and FlinkPulsarRowSource. If you need to deal with Row formats, you can use Apache Flink’s Row-related serialization components.
You can build a PulsarSerializationSchema by using PulsarSerializationSchemaWrapper.Builder.
TopicKeyExtractor is moved into PulsarSerializationSchemaWrapper.
When you adjust your API usage, you can take the following sample as a reference.
new PulsarSerializationSchemaWrapper.Builder<>(new SimpleStringSchema())
    .setTopicExtractor(str -> getTopic(str))
    .build();
Future Plan # Future plans involve the design of a batch and stream solution integrated with Pulsar Source, based on the new Flink Source API (FLIP-27). The new solution will overcome the limitations of the current streaming source interface (SourceFunction) and simultaneously unify the source interfaces between the batch and streaming APIs.
Pulsar offers a hierarchical architecture where data is divided into streaming, batch, and cold data, which enables Pulsar to provide infinite capacity. This makes Pulsar an ideal solution for unified batch and streaming.
The batch and stream solution based on the new Flink Source API is divided into two simple parts: SplitEnumerator and Reader. SplitEnumerator discovers and assigns partitions, and Reader reads data from the partition.
Apache Pulsar stores messages in ledger blocks. Users can locate the ledgers through the Pulsar admin interface, and then obtain broker partitions, BookKeeper partitions, Offloader partitions, and other information through different partitioning policies. For more details, you can refer here.
Conclusion # The latest version of the Pulsar Flink Connector is now available and we encourage everyone to use/upgrade to the Pulsar Flink Connector 2.7.0. The new version provides significant user enhancements, enabled by various features in Pulsar 2.7 and Flink 1.12. We will be contributing the Pulsar Flink Connector 2.7.0 to the Apache Flink repository soon. If you have any questions or concerns about the Pulsar Flink Connector, feel free to open issues in this repository.
`}),e.add({id:129,href:"/2021/01/02/stateful-functions-2.2.2-release-announcement/",title:"Stateful Functions 2.2.2 Release Announcement",section:"Flink Blog",content:`The Apache Flink community released the second bugfix release of the Stateful Functions (StateFun) 2.2 series, version 2.2.2.
The most important change of this bugfix release is upgrading Apache Flink to version 1.11.3. In addition to many stability fixes to the Flink runtime itself, this also allows StateFun applications to safely use savepoints to upgrade from older versions earlier than StateFun 2.2.1. Previously, restoring from savepoints could have failed under certain conditions.
We strongly recommend that all users upgrade to 2.2.2.
You can find the binaries on the updated Downloads page.
This release includes 5 fixes and minor improvements since StateFun 2.2.1. Below is a detailed list of all fixes and improvements:
Improvement [FLINK-20699] - Feedback invocation_id must not be constant. Task [FLINK-20161] - Consider switching from Travis CI to Github Actions for flink-statefun&#39;s CI workflows [FLINK-20189] - Restored feedback events may be silently dropped if per key-group header bytes were not fully read [FLINK-20636] - Require unaligned checkpoints to be disabled in StateFun applications [FLINK-20689] - Upgrade StateFun to Flink 1.11.3 `}),e.add({id:130,href:"/2020/12/18/apache-flink-1.11.3-released/",title:"Apache Flink 1.11.3 Released",section:"Flink Blog",content:"The Apache Flink community released the third bugfix version of the Apache Flink 1.11 series.\nThis release includes 151 fixes and minor improvements for Flink 1.11.2. The list below includes a detailed list of all fixes and improvements.\nWe highly recommend all users to upgrade to Flink 1.11.3.\nUpdated Maven dependencies:\n&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.11.3&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.11.3&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.11.3&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.\nList of resolved issues:\nSub-task [FLINK-17393] - Improve the `FutureCompletingBlockingQueue` to wakeup blocking put() more elegantly. [FLINK-18604] - HBase ConnectorDescriptor can not work in Table API [FLINK-18673] - Calling ROW() in a UDF results in UnsupportedOperationException [FLINK-18680] - Improve RecordsWithSplitIds API [FLINK-18916] - Add a &quot;Operations&quot; link(linked to dev/table/tableApi.md) under the &quot;Python API&quot; -&gt; &quot;User Guide&quot; -&gt; &quot;Table API&quot; section [FLINK-18918] - Add a &quot;Connectors&quot; document under the &quot;Python API&quot; -&gt; &quot;User Guide&quot; -&gt; &quot;Table API&quot; section [FLINK-18922] - Add a &quot;Catalogs&quot; link (linked to dev/table/catalogs.md) under the &quot;Python API&quot; -&gt; &quot;User Guide&quot; -&gt; &quot;Table API&quot; section [FLINK-18926] - Add a &quot;Environment Variables&quot; document under the &quot;Python API&quot; -&gt; &quot;User Guide&quot; -&gt; &quot;Table API&quot; section [FLINK-19162] - Allow Split Reader based sources to reuse record batches [FLINK-19205] - SourceReaderContext should give access to Configuration and Hostbame [FLINK-20397] - Pass checkpointId to OperatorCoordinator.resetToCheckpoint(). 
Bug [FLINK-9992] - FsStorageLocationReferenceTest#testEncodeAndDecode failed in CI [FLINK-13733] - FlinkKafkaInternalProducerITCase.testHappyPath fails on Travis [FLINK-15170] - WebFrontendITCase.testCancelYarn fails on travis [FLINK-16246] - Exclude &quot;SdkMBeanRegistrySupport&quot; from dynamically loaded AWS connectors [FLINK-16268] - Failed to run rank over window with Hive built-in functions [FLINK-16768] - HadoopS3RecoverableWriterITCase.testRecoverWithStateWithMultiPart hangs [FLINK-17341] - freeSlot in TaskExecutor.closeJobManagerConnection cause ConcurrentModificationException [FLINK-17458] - TaskExecutorSubmissionTest#testFailingScheduleOrUpdateConsumers [FLINK-17677] - FLINK_LOG_PREFIX recommended in docs is not always available [FLINK-17825] - HA end-to-end gets killed due to timeout [FLINK-18128] - CoordinatedSourceITCase.testMultipleSources gets stuck [FLINK-18196] - flink throws `NullPointerException` when executeCheckpointing [FLINK-18222] - &quot;Avro Confluent Schema Registry nightly end-to-end test&quot; unstable with &quot;Kafka cluster did not start after 120 seconds&quot; [FLINK-18815] - AbstractCloseableRegistryTest.testClose unstable [FLINK-18818] - HadoopRenameCommitterHDFSTest.testCommitOneFile[Override: false] failed with &quot;java.io.IOException: The stream is closed&quot; [FLINK-18836] - Python UDTF doesn&#39;t work well when the return type isn&#39;t generator [FLINK-18915] - FIXED_PATH(dummy Hadoop Path) with WriterImpl may cause ORC writer OOM [FLINK-19022] - AkkaRpcActor failed to start but no exception information [FLINK-19121] - Avoid accessing HDFS frequently in HiveBulkWriterFactory [FLINK-19135] - (Stream)ExecutionEnvironment.execute() should not throw ExecutionException [FLINK-19138] - Python UDF supports directly specifying input_types as DataTypes.ROW [FLINK-19140] - Join with Table Function (UDTF) SQL example is incorrect [FLINK-19151] - Flink does not normalize container resource with correct configurations when Yarn FairScheduler is used [FLINK-19154] - Application mode deletes HA data in case of suspended ZooKeeper connection [FLINK-19170] - Parameter naming error [FLINK-19201] - PyFlink e2e tests is instable and failed with &quot;Connection broken: OSError&quot; [FLINK-19227] - The catalog is still created after opening failed in catalog registering [FLINK-19237] - LeaderChangeClusterComponentsTest.testReelectionOfJobMaster failed with &quot;NoResourceAvailableException: Could not allocate the required slot within slot request timeout&quot; [FLINK-19244] - CSV format can&#39;t deserialize null ROW field [FLINK-19250] - SplitFetcherManager does not propagate errors correctly [FLINK-19253] - SourceReaderTestBase.testAddSplitToExistingFetcher hangs [FLINK-19258] - Fix the wrong example of &quot;csv.line-delimiter&quot; in CSV documentation [FLINK-19280] - The option &quot;sink.buffer-flush.max-rows&quot; for JDBC can&#39;t be disabled by set to zero [FLINK-19281] - LIKE cannot recognize full table path [FLINK-19291] - Fix exception for AvroSchemaConverter#convertToSchema when RowType contains multiple row fields [FLINK-19295] - YARNSessionFIFOITCase.checkForProhibitedLogContents found a log with prohibited string [FLINK-19300] - Timer loss after restoring from savepoint [FLINK-19321] - CollectSinkFunction does not define serialVersionUID [FLINK-19338] - New source interface cannot unregister unregistered source [FLINK-19361] - Make HiveCatalog thread safe [FLINK-19398] - Hive connector fails with IllegalAccessError if submitted as usercode 
[FLINK-19401] - Job stuck in restart loop due to excessive checkpoint recoveries which block the JobMaster [FLINK-19423] - Fix ArrayIndexOutOfBoundsException when executing DELETE statement in JDBC upsert sink [FLINK-19433] - An Error example of FROM_UNIXTIME function in document [FLINK-19448] - CoordinatedSourceITCase.testEnumeratorReaderCommunication hangs [FLINK-19535] - SourceCoordinator should avoid fail job multiple times. [FLINK-19557] - Issue retrieving leader after zookeeper session reconnect [FLINK-19585] - UnalignedCheckpointCompatibilityITCase.test:97-&gt;runAndTakeSavepoint: &quot;Not all required tasks are currently running.&quot; [FLINK-19587] - Error result when casting binary type as varchar [FLINK-19618] - Broken link in docs [FLINK-19629] - Fix NullPointException when deserializing map field with null value for Avro format [FLINK-19675] - The plan of is incorrect when Calc contains WHERE clause, composite fields access and Python UDF at the same time [FLINK-19695] - Writing Table with RowTime Column of type TIMESTAMP(3) to Kafka fails with ClassCastException [FLINK-19717] - SourceReaderBase.pollNext may return END_OF_INPUT if SplitReader.fetch throws [FLINK-19740] - Error in to_pandas for table containing event time: class java.time.LocalDateTime cannot be cast to class java.sql.Timestamp [FLINK-19741] - InternalTimeServiceManager fails to restore due to corrupt reads if there are other users of raw keyed state streams [FLINK-19748] - KeyGroupRangeOffsets#KeyGroupOffsetsIterator should skip key groups that don&#39;t have a defined offset [FLINK-19750] - Deserializer is not opened in Kafka consumer when restoring from state [FLINK-19755] - Fix CEP documentation error of the example in &#39;After Match Strategy&#39; section [FLINK-19775] - SystemProcessingTimeServiceTest.testImmediateShutdown is instable [FLINK-19777] - Fix NullPointException for WindowOperator.close() [FLINK-19790] - Writing MAP&lt;STRING, STRING&gt; to Kafka with JSON format produces incorrect data. [FLINK-19806] - Job may try to leave SUSPENDED state in ExecutionGraph#failJob() [FLINK-19816] - Flink restored from a wrong checkpoint (a very old one and not the last completed one) [FLINK-19852] - Managed memory released check can block IterativeTask [FLINK-19867] - Validation fails for UDF that accepts var-args [FLINK-19894] - Use iloc for positional slicing instead of direct slicing in from_pandas [FLINK-19901] - Unable to exclude metrics variables for the last metrics reporter. [FLINK-19906] - Incorrect result when compare two binary fields [FLINK-19907] - Channel state (upstream) can be restored after emission of new elements (watermarks) [FLINK-19909] - Flink application in attach mode could not terminate when the only job is canceled [FLINK-19948] - Calling NOW() function throws compile exception [FLINK-20013] - BoundedBlockingSubpartition may leak network buffer if task is failed or canceled [FLINK-20018] - pipeline.cached-files option cannot escape &#39;:&#39; in path [FLINK-20033] - Job fails when stopping JobMaster [FLINK-20050] - SourceCoordinatorProviderTest.testCheckpointAndReset failed with NullPointerException [FLINK-20063] - File Source requests an additional split on every restore. 
[FLINK-20064] - Broken links in the documentation [FLINK-20065] - UnalignedCheckpointCompatibilityITCase.test failed with AskTimeoutException [FLINK-20068] - KafkaSubscriberTest.testTopicPatternSubscriber failed with unexpected results [FLINK-20069] - docs_404_check doesn&#39;t work properly [FLINK-20076] - DispatcherTest.testOnRemovedJobGraphDoesNotCleanUpHAFiles does not test the desired functionality [FLINK-20079] - Modified UnalignedCheckpointITCase...MassivelyParallel fails [FLINK-20081] - ExecutorNotifier should run handler in the main thread when receive an exception from the callable. [FLINK-20143] - use `yarn.provided.lib.dirs` config deploy job failed in yarn per job mode [FLINK-20165] - YARNSessionFIFOITCase.checkForProhibitedLogContents: Error occurred during initialization of boot layer java.lang.IllegalStateException: Module system already initialized [FLINK-20175] - Avro Confluent Registry SQL format does not support adding nullable columns [FLINK-20183] - Fix the default PYTHONPATH is overwritten in client side [FLINK-20193] - SourceCoordinator should catch exception thrown from SplitEnumerator.start() [FLINK-20194] - KafkaSourceFetcherManager.commitOffsets() should handle the case when there is no split fetcher. [FLINK-20200] - SQL Hints are not supported in &quot;Create View&quot; syntax [FLINK-20213] - Partition commit is delayed when records keep coming [FLINK-20221] - DelimitedInputFormat does not restore compressed filesplits correctly leading to dataloss [FLINK-20222] - The CheckpointCoordinator should reset the OperatorCoordinators when fail before the first checkpoint. [FLINK-20223] - The RecreateOnResetOperatorCoordinator and SourceCoordinator executor thread should use the user class loader. [FLINK-20243] - Remove useless words in documents [FLINK-20262] - Building flink-dist docker image does not work without python2 [FLINK-20266] - New Sources prevent JVM shutdown when running a job [FLINK-20270] - Fix the regression of missing ExternallyInducedSource support in FLIP-27 Source. [FLINK-20277] - flink-1.11.2 ContinuousFileMonitoringFunction cannot restore from failure [FLINK-20284] - Error happens in TaskExecutor when closing JobMaster connection if there was a python UDF [FLINK-20285] - LazyFromSourcesSchedulingStrategy is possible to schedule non-CREATED vertices [FLINK-20333] - Flink standalone cluster throws metaspace OOM after submitting multiple PyFlink UDF jobs. [FLINK-20351] - Execution.transitionState does not properly log slot location [FLINK-20382] - Exception thrown from JobMaster.startScheduling() may be ignored. [FLINK-20396] - Add &quot;OperatorCoordinator.resetSubtask()&quot; to fix order problems of &quot;subtaskFailed()&quot; [FLINK-20404] - ZooKeeper quorum fails to start due to missing log4j library [FLINK-20413] - Sources should add splits back in &quot;resetSubtask()&quot;, rather than in &quot;subtaskFailed()&quot;. 
[FLINK-20418] - NPE in IteratorSourceReader [FLINK-20442] - Fix license documentation mistakes in flink-python.jar [FLINK-20492] - The SourceOperatorStreamTask should implement cancelTask() and finishTask() [FLINK-20554] - The Checkpointed Data Size of the Latest Completed Checkpoint is incorrectly displayed on the Overview page of the UI New Feature [FLINK-19934] - [FLIP-27 source] add new API: SplitEnumeratorContext.runInCoordinatorThread(Runnable) Improvement [FLINK-16753] - Exception from AsyncCheckpointRunnable should be wrapped in CheckpointException [FLINK-18139] - Unaligned checkpoints checks wrong channels for inflight data. [FLINK-18500] - Make the legacy planner exception more clear when resolving computed columns types for schema [FLINK-18545] - Sql api cannot specify flink job name [FLINK-18715] - add cpu usage metric of jobmanager/taskmanager [FLINK-19193] - Recommend stop-with-savepoint in upgrade guidelines [FLINK-19225] - Improve code and logging in SourceReaderBase [FLINK-19245] - Set default queue capacity for FLIP-27 source handover queue to 2 [FLINK-19251] - Avoid confusing queue handling in &quot;SplitReader.handleSplitsChanges()&quot; [FLINK-19252] - Jaas file created under io.tmp.dirs - folder not created if not exists [FLINK-19265] - Simplify handling of &#39;NoMoreSplitsEvent&#39; [FLINK-19339] - Support Avro&#39;s unions with logical types [FLINK-19523] - Hide sensitive command-line configurations [FLINK-19569] - Upgrade ICU4J to 67.1+ [FLINK-19677] - TaskManager takes abnormally long time to register with JobManager on Kubernetes [FLINK-19698] - Add close() method and onCheckpointComplete() to the Source. [FLINK-19892] - Replace __metaclass__ field with metaclass keyword [FLINK-20049] - Simplify handling of &quot;request split&quot;. [FLINK-20055] - Datadog API Key exposed in Flink JobManager logs [FLINK-20142] - Update the document for CREATE TABLE LIKE that source table from different catalog is supported [FLINK-20152] - Document which execution.target values are supported [FLINK-20156] - JavaDocs of WatermarkStrategy.withTimestampAssigner are wrong wrt Java 8 [FLINK-20169] - Move emitting MAX_WATERMARK out of SourceOperator processing loop [FLINK-20207] - Improve the error message printed when submitting the pyflink jobs via &#39;flink run&#39; [FLINK-20296] - Explanation of keyBy was broken by find/replace of deprecated forms of keyBy Test [FLINK-18725] - &quot;Run Kubernetes test&quot; failed with &quot;30025: provided port is already allocated&quot; Task [FLINK-20455] - Add check to LicenseChecker for top level /LICENSE files in shaded jars "}),e.add({id:131,href:"/2020/12/10/apache-flink-1.12.0-release-announcement/",title:"Apache Flink 1.12.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is excited to announce the release of Flink 1.12.0! Close to 300 contributors worked on over 1k threads to bring significant improvements to usability as well as new features that simplify (and unify) Flink handling across the API stack.
Release Highlights
The community has added support for efficient batch execution in the DataStream API. This is the next major milestone towards achieving a truly unified runtime for both batch and stream processing.
Kubernetes-based High Availability (HA) was implemented as an alternative to ZooKeeper for highly available production setups.
The Kafka SQL connector has been extended to work in upsert mode, supported by the ability to handle connector metadata in SQL DDL. Temporal table joins can now also be fully expressed in SQL, no longer depending on the Table API.
Support for the DataStream API in PyFlink expands its usage to more complex scenarios that require fine-grained control over state and time, and it’s now possible to deploy PyFlink jobs natively on Kubernetes.
This blog post describes all major new features and improvements, important changes to be aware of and what to expect moving forward.
The binary distribution and source artifacts are now available on the updated Downloads page of the Flink website, and the most recent distribution of PyFlink is available on PyPI. Please review the release notes carefully, and check the complete release changelog and updated documentation for more details.
We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
New Features and Improvements # Batch Execution Mode in the DataStream API # Flink’s core APIs have developed organically over the lifetime of the project, and were initially designed with specific use cases in mind. And while the Table API/SQL already has unified operators, using lower-level abstractions still requires you to choose between two semantically different APIs for batch (DataSet API) and streaming (DataStream API). Since a batch is a subset of an unbounded stream, there are some clear advantages to consolidating them under a single API:
Reusability: efficient batch and stream processing under the same API would allow you to easily switch between both execution modes without rewriting any code. So, a job could be easily reused to process real-time and historical data.
Operational simplicity: providing a unified API would mean using a single set of connectors, maintaining a single codebase and being able to easily implement mixed execution pipelines e.g. for use cases like backfilling.
With these advantages in mind, the community has taken the first step towards the unification of the DataStream API: supporting efficient batch execution (FLIP-134). This means that, in the long run, the DataSet API will be deprecated and subsumed by the DataStream API and the Table API/SQL (FLIP-131). For an overview of the unification effort, refer to this recent Flink Forward talk.
Batch for Bounded Streams
You could already use the DataStream API to process bounded streams (e.g. files), with the limitation that the runtime is not “aware” that the job is bounded. To optimize the runtime for bounded input, the new BATCH mode execution uses sort-based shuffles with aggregations purely in-memory and an improved scheduling strategy (see Pipelined Region Scheduling). As a result, BATCH mode execution in the DataStream API already comes very close to the performance of the DataSet API in Flink 1.12. For more details on the performance benchmark, check the original proposal (FLIP-140).
In Flink 1.12, the default execution mode is STREAMING. To configure a job to run in BATCH mode, you can set the configuration when submitting a job:
bin/flink run -Dexecution.runtime-mode=BATCH examples/streaming/WordCount.jar
or do it programmatically:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setRuntimeMode(RuntimeExecutionMode.BATCH);
Note: Although the DataSet API has not been deprecated yet, we recommend that users give preference to the DataStream API with BATCH execution mode for new batch jobs, and consider migrating existing DataSet jobs.
New Data Sink API (Beta) # Ensuring that connectors can work for both execution modes has already been covered for data sources in the previous release, so in Flink 1.12 the community focused on implementing a unified Data Sink API (FLIP-143). The new abstraction introduces a write/commit protocol and a more modular interface where the individual components are transparently exposed to the framework.
A Sink implementor will have to provide the what and how: a SinkWriter that writes data and outputs what needs to be committed (i.e. committables); and a Committer and GlobalCommitter that encapsulate how to handle the committables. The framework is responsible for the when and where: at what time and on which machine or process to commit.
This more modular abstraction made it possible to support different runtime implementations for the BATCH and STREAMING execution modes that are efficient for their intended purpose, while using just one, unified sink implementation. In Flink 1.12, the FileSink connector is the unified drop-in replacement for StreamingFileSink (FLINK-19758). The remaining connectors will be ported to the new interfaces in future releases.
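For illustration, here is a minimal sketch of using the new FileSink from the DataStream API; the output path and the string encoder are placeholder choices rather than anything prescribed by the release. Because FileSink implements the unified Sink API, the same pipeline runs in both STREAMING and BATCH execution mode.
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FileSinkExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // A tiny bounded stream; any DataStream<String> works the same way.
        DataStream<String> events = env.fromElements("a", "b", "c");

        // Build the unified file sink (row-encoded, plain strings).
        FileSink<String> sink = FileSink
                .forRowFormat(new Path("/tmp/output"), new SimpleStringEncoder<String>("UTF-8"))
                .build();

        // sinkTo() attaches a sink that implements the new unified Sink API (FLIP-143).
        events.sinkTo(sink);

        env.execute("FileSink example");
    }
}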
Kubernetes High Availability (HA) Service # Kubernetes provides built-in functionalities that Flink can leverage for JobManager failover, instead of relying on ZooKeeper. To enable a “ZooKeeperless” HA setup, the community implemented a Kubernetes HA service in Flink 1.12 (FLIP-144). The service is built on the same base interface as the ZooKeeper implementation and uses Kubernetes’ ConfigMap objects to handle all the metadata needed to recover from a JobManager failure. For more details and examples on how to configure a highly available Kubernetes cluster, check out the documentation.
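As a rough sketch, assuming the configuration keys documented for Flink 1.12 (the cluster ID and storage path below are placeholders), enabling the Kubernetes HA services comes down to a few entries in flink-conf.yaml:
kubernetes.cluster-id: my-flink-cluster   # placeholder cluster ID
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: s3://my-bucket/flink/recovery   # placeholder path where HA metadata is persisted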
Note: This does not mean that the ZooKeeper dependency will be dropped, just that there will be an alternative for users of Flink on Kubernetes. Other Improvements # Migration of existing connectors to the new Data Source API
The previous release introduced a new Data Source API (FLIP-27), making it possible to implement connectors that work as both bounded (batch) and unbounded (streaming) sources. In Flink 1.12, the community started porting existing source connectors to the new interfaces, starting with the FileSystem connector (FLINK-19161).
Attention: The unified source implementations will be completely separate connectors that are not snapshot-compatible with their legacy counterparts. Pipelined Region Scheduling (FLIP-119)
Flink’s scheduler has been largely designed to address batch and streaming workloads separately. This release introduces a unified scheduling strategy that identifies blocking data exchanges to break down the execution graph into pipelined regions. Each region is scheduled only when there is data for it to work on and is deployed only once all the required resources are available; failed regions can also be restarted independently. In particular for batch jobs, the new strategy leads to more efficient resource utilization and eliminates deadlocks.
Support for Sort-Merge Shuffles (FLIP-148)
To improve the stability, performance and resource utilization of large-scale batch jobs, the community introduced sort-merge shuffle as an alternative to the original shuffle implementation that Flink already used. This approach can reduce shuffle time significantly, and uses fewer file handles and file write buffers, which can otherwise become problematic for large-scale jobs. Further optimizations will be implemented in upcoming releases (FLINK-19614).
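As the note below explains, the feature is opt-in. A minimal configuration sketch, assuming the option name from the Flink 1.12 TaskManager network options and an illustrative threshold, could look like this:
# Use sort-merge shuffle for batch data exchanges whose parallelism is at least 1
# (threshold chosen purely for illustration; tune it for your workload)
taskmanager.network.sort-shuffle.min-parallelism: 1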
Attention: This feature is experimental and not enabled by default. To enable sort-merge shuffles, you can configure a reasonable minimum parallelism threshold in the TaskManager network configuration options. Improvements to the Flink WebUI (FLIP-75)
As a continuation of the series of improvements to the Flink WebUI kicked off in the last release, the community worked on exposing JobManager’s memory-related metrics and configuration parameters on the WebUI (FLIP-104). The TaskManager’s metrics page has also been updated to reflect the changes to the TaskManager memory model introduced in Flink 1.10 (FLIP-102), adding new metrics for Managed Memory, Network Memory and Metaspace.
Table API/SQL: Metadata Handling in SQL Connectors # Some sources (and formats) expose additional fields as metadata that can be valuable for users to process along with record data. A common example is Kafka, where you might want to e.g. access offset, partition or topic information, read/write the record key or use embedded metadata timestamps for time-based operations. With the new release, Flink SQL supports metadata columns to read and write connector- and format-specific fields for every row of a table (FLIP-107). These columns are declared in the CREATE TABLE statement using the METADATA (reserved) keyword.
CREATE TABLE kafka_table (
  id BIGINT,
  name STRING,
  event_time TIMESTAMP(3) METADATA FROM 'timestamp', -- access Kafka 'timestamp' metadata
  headers MAP<STRING, BYTES> METADATA                -- access Kafka 'headers' metadata
) WITH (
  'connector' = 'kafka',
  'topic' = 'test-topic',
  'format' = 'avro'
);
In Flink 1.12, metadata is exposed for the Kafka and Kinesis connectors, with work on the FileSystem connector already planned (FLINK-19903). Due to the more complex structure of Kafka records, new properties were also specifically implemented for the Kafka connector to control how to handle the key/value pairs. For a complete overview of metadata support in Flink SQL, check the documentation for each connector, as well as the motivating use cases in the original proposal.
Table API/SQL: Upsert Kafka Connector # For some use cases, like interpreting compacted topics or writing out (updating) aggregated results, it’s necessary to handle Kafka record keys as true primary keys that can determine what should be inserted, deleted or updated. To enable this, the community created a dedicated upsert connector (upsert-kafka) that extends the base implementation to work in upsert mode (FLIP-149).
The new upsert-kafka connector can be used for sources and sinks, and provides the same base functionality and persistence guarantees as the existing Kafka connector, as it reuses most of its code under the hood. To use the upsert-kafka connector, you must define a primary key constraint on table creation, as well as specify the (de)serialization format for the key (key.format) and value (value.format).
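A minimal DDL sketch might look as follows; the table name, topic, broker list and formats are illustrative placeholders:
CREATE TABLE pageviews_per_region (
  region STRING,
  view_count BIGINT,
  PRIMARY KEY (region) NOT ENFORCED        -- required for upsert-kafka
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'pageviews_per_region',        -- placeholder topic
  'properties.bootstrap.servers' = '...',  -- placeholder broker list
  'key.format' = 'csv',                    -- (de)serialization format for the key
  'value.format' = 'avro'                  -- (de)serialization format for the value
);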
Table API/SQL: Support for Temporal Table Joins in SQL # Instead of creating a temporal table function to look up against a table at a certain point in time, you can now simply use the standard SQL clause FOR SYSTEM_TIME AS OF (SQL:2011) to express a temporal table join. In addition, temporal joins are now supported against any kind of table that has a time attribute and a primary key, and not just append-only tables. This unlocks a new set of use cases, like performing temporal joins directly against Kafka compacted topics or database changelogs (e.g. from Debezium).
-- Table backed by a Kafka topic
CREATE TABLE orders (
  order_id STRING,
  currency STRING,
  amount INT,
  order_time TIMESTAMP(3),
  WATERMARK FOR order_time AS order_time - INTERVAL '30' SECOND
) WITH (
  'connector' = 'kafka',
  ...
);
-- Table backed by a Kafka compacted topic
CREATE TABLE latest_rates (
  currency STRING,
  currency_rate DECIMAL(38, 10),
  currency_time TIMESTAMP(3),
  WATERMARK FOR currency_time AS currency_time - INTERVAL '5' SECOND,
  PRIMARY KEY (currency) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  ...
);
-- Event-time temporal table join
SELECT
  o.order_id,
  o.order_time,
  o.amount * r.currency_rate AS amount,
  r.currency
FROM orders AS o
JOIN latest_rates FOR SYSTEM_TIME AS OF o.order_time r
ON o.currency = r.currency;
The previous example also shows how you can take advantage of the new upsert-kafka connector in the context of temporal table joins.
Hive Tables in Temporal Table Joins
You can also perform temporal table joins against Hive tables by either automatically reading the latest table partition as a temporal table (FLINK-19644) or the whole table as a bounded stream tracking the latest version at execution time. Refer to the documentation for examples of using Hive tables in temporal table joins.
Other Improvements to the Table API/SQL # Kinesis Flink SQL Connector (FLINK-18858)
From Flink 1.12, Amazon Kinesis Data Streams (KDS) is natively supported as a source/sink also in the Table API/SQL. The new Kinesis SQL connector ships with support for Enhanced Fan-Out (EFO) and Sink Partitioning. For a complete overview of supported features, configuration options and exposed metadata, check the updated documentation.
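As an illustrative sketch (stream name, region and format are placeholders), a Kinesis-backed table is declared like any other SQL connector table:
CREATE TABLE kinesis_orders (
  order_id STRING,
  amount DOUBLE
) WITH (
  'connector' = 'kinesis',
  'stream' = 'orders',            -- placeholder Kinesis stream name
  'aws.region' = 'us-east-1',     -- placeholder AWS region
  'scan.stream.initpos' = 'LATEST',
  'format' = 'json'
);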
Streaming Sink Compaction in the FileSystem/Hive Connector (FLINK-19345)
Many bulk formats, such as Parquet, are most efficient when written as large files; this is a challenge when frequent checkpointing is enabled, as too many small files are created (and need to be rolled on checkpoint). In Flink 1.12, the file sink supports file compaction, allowing jobs to retain smaller checkpoint intervals without generating a large number of files. To enable file compaction, you can set auto-compaction=true in the properties of the FileSystem connector, as described in the documentation.
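A sketch of what this looks like for a filesystem table, assuming the option names documented for Flink 1.12 (the columns, path and target file size are illustrative):
CREATE TABLE compacted_sink (
  user_id BIGINT,
  event_time TIMESTAMP(3)
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/output',            -- placeholder output path
  'format' = 'parquet',
  'auto-compaction' = 'true',        -- merge small files produced between checkpoints
  'compaction.file-size' = '128MB'   -- illustrative target size for compacted files
);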
Watermark Pushdown in the Kafka Connector (FLINK-20041)
To ensure correctness when consuming from Kafka, it’s generally preferable to generate watermarks on a per-partition basis, since the out-of-orderness within a partition is usually lower than across all partitions. Flink will now push down watermark strategies to emit per-partition watermarks from within the Kafka consumer. The output watermark of the source will be determined by the minimum watermark across the partitions it reads, leading to better (i.e. closer to real-time) watermarking. Watermark pushdown also lets you configure per-partition idleness detection to prevent idle partitions from holding back the event time progress of the entire application.
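For reference, the same two ingredients, watermark generation with an out-of-orderness bound and idleness detection, are expressed in the DataStream API through a WatermarkStrategy; a minimal sketch with arbitrary example values for the bound and the idle timeout:
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class KafkaWatermarks {
    // Bounded out-of-orderness plus idleness detection (5 seconds and 1 minute are example values).
    // Handing a strategy like this to the Kafka consumer keeps watermarks per partition and
    // prevents idle partitions from holding back the event time of the whole job.
    public static WatermarkStrategy<String> strategy() {
        return WatermarkStrategy
                .<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                .withIdleness(Duration.ofMinutes(1));
    }
}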
Newly Supported Formats
Format | Description | Supported Connectors
Avro Schema Registry | Read and write data serialized with the Confluent Schema Registry KafkaAvroSerializer. | Kafka, Upsert Kafka
Debezium Avro | Read and write Debezium records serialized with the Confluent Schema Registry KafkaAvroSerializer. | Kafka
Maxwell (CDC) | Read and write Maxwell JSON records. | Kafka, FileSystem
Raw | Read and write raw (byte-based) values as a single column. | Kafka, Upsert Kafka, Kinesis, FileSystem
Multi-input Operator for Join Optimization (FLINK-19621)
To eliminate unnecessary serialization and data spilling and improve the performance of batch and streaming Table API/SQL jobs, the default planner now leverages the N-ary stream operator introduced in the last release (FLIP-92) to implement the “chaining” of operators connected by forward edges.
Type Inference for Table API UDAFs (FLIP-65)
This release concluded the work started in Flink 1.9 on a new data type system for the Table API, with the exposure of aggregate functions (UDAFs) to the new type system. From Flink 1.12, UDAFs behave similarly to scalar and table functions, and support all data types.
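As a rough sketch of what this enables (the weighted-average logic and class names below are purely illustrative), an aggregate function can now be written with plain Java types, letting the new type system extract the data types from the method signatures:
import org.apache.flink.table.functions.AggregateFunction;

public class WeightedAvg extends AggregateFunction<Long, WeightedAvg.Accumulator> {

    // Accumulator type; its fields are picked up by the reflective type extraction.
    public static class Accumulator {
        public long sum = 0;
        public long count = 0;
    }

    @Override
    public Accumulator createAccumulator() {
        return new Accumulator();
    }

    @Override
    public Long getValue(Accumulator acc) {
        if (acc.count == 0) {
            return null;
        }
        return acc.sum / acc.count;
    }

    // Called for every input row; argument types are inferred from the signature.
    public void accumulate(Accumulator acc, Long value, Integer weight) {
        acc.sum += value * weight;
        acc.count += weight;
    }
}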
PyFlink: Python DataStream API # To expand the usability of PyFlink, this release introduces a first version of the Python DataStream API (FLIP-130) with support for stateless operations (e.g. Map, FlatMap, Filter, KeyBy).
from pyflink.common.typeinfo import Types
from pyflink.datastream import MapFunction, StreamExecutionEnvironment

class MyMapFunction(MapFunction):
    def map(self, value):
        return value + 1

env = StreamExecutionEnvironment.get_execution_environment()
data_stream = env.from_collection([1, 2, 3, 4, 5], type_info=Types.INT())
mapped_stream = data_stream.map(MyMapFunction(), output_type=Types.INT())
mapped_stream.print()
env.execute("datastream job")
To give the Python DataStream API a try, you can install PyFlink and check out this tutorial that guides you through building a simple streaming application.
Other Improvements to PyFlink # PyFlink Jobs on Kubernetes (FLINK-17480)
In addition to standalone and YARN deployments, PyFlink jobs can now also be deployed natively on Kubernetes. The deployment documentation has detailed instructions on how to start a session or application cluster on Kubernetes.
User-defined Aggregate Functions (UDAFs)
From Flink 1.12, you can define and register UDAFs in PyFlink (FLIP-139). In contrast to a normal UDF, which doesn’t handle state and operates on a single row at a time, a UDAF is stateful and can be used to compute custom aggregations over multiple input rows. To benefit from vectorization, you can also use Pandas UDAFs (FLIP-137) (up to 10x faster).
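A minimal sketch of a vectorized (Pandas) UDAF that computes a mean over a batch of rows; the function body and result type are illustrative:
from pyflink.table import DataTypes
from pyflink.table.udf import udaf

# The input is a pandas.Series, so the aggregation runs over a whole batch of rows at once.
# Register it with the table environment (e.g. t_env.create_temporary_function) before use.
mean_udaf = udaf(lambda x: x.mean(),
                 result_type=DataTypes.FLOAT(),
                 func_type="pandas")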
Note: General UDAFs are only supported for group aggregations and in _streaming_ mode. For _batch_ mode or window aggregations, use Pandas UDAFs. Important Changes # [FLINK-19319] The default stream time characteristic has been changed to EventTime, so you no longer need to call StreamExecutionEnvironment.setStreamTimeCharacteristic() to enable event time support.
[FLINK-19278] Flink now relies on Scala Macros 2.1.1, so Scala versions < 2.11.11 are no longer supported.
[FLINK-19152] The Kafka 0.10.x and 0.11.x connectors have been removed with this release. If you’re still using these versions, please refer to the documentation to learn how to upgrade to the universal Kafka connector.
[FLINK-18795] The HBase connector has been upgraded to the last stable version (2.2.3).
[FLINK-17877] PyFlink now supports Python 3.8.
[FLINK-18738] To align with FLIP-53, managed memory is now the default also for Python workers. The configurations python.fn-execution.buffer.memory.size and python.fn-execution.framework.memory.size have been removed and will not take effect anymore.
Release Notes # Please review the release notes carefully for a detailed list of changes and new features if you plan to upgrade your setup to Flink 1.12. This version is API-compatible with previous 1.x releases for APIs annotated with the @Public annotation.
List of Contributors # The Apache Flink community would like to thank each and every one of the 300 contributors that have made this release possible:
Abhijit Shandilya, Aditya Agarwal, Alan Su, Alexander Alexandrov, Alexander Fedulov, Alexey Trenikhin, Aljoscha Krettek, Allen Madsen, Andrei Bulgakov, Andrey Zagrebin, Arvid Heise, Authuir, Bairos, Bartosz Krasinski, Benchao Li, Brandon, Brian Zhou, C08061, Canbin Zheng, Cedric Chen, Chesnay Schepler, Chris Nix, Congxian Qiu, DG-Wangtao, Da(Dash)Shen, Dan Hill, Daniel Magyar, Danish Amjad, Danny Chan, Danny Cranmer, David Anderson, Dawid Wysakowicz, Devin Thomson, Dian Fu, Dongxu Wang, Dylan Forciea, Echo Lee, Etienne Chauchot, Fabian Paul, Felipe Lolas, Fin-Chan, Fin-chan, Flavio Pompermaier, Flora Tao, Fokko Driesprong, Gao Yun, Gary Yao, Ghildiyal, GitHub, Grebennikov Roman, GuoWei Ma, Gyula Fora, Hequn Cheng, Herman, Hong Teoh, HuangXiao, HuangXingBo, Husky Zeng, Hyeonseop Lee, I. Raleigh, Ivan, Jacky Lau, Jark Wu, Jaskaran Bindra, Jeff Yang, Jeff Zhang, Jiangjie (Becket) Qin, Jiatao Tao, Jiayi Liao, Jiayi-Liao, Jiezhi.G, Jimmy.Zhou, Jindrich Vimr, Jingsong Lee, JingsongLi, Joey Echeverria, Juha Mynttinen, Jun Qin, Jörn Kottmann, Karim Mansour, Kevin Bohinski, Kezhu Wang, Konstantin Knauf, Kostas Kloudas, Kurt Young, Lee Do-Kyeong, Leonard Xu, Lijie Wang, Liu Jiangang, Lorenzo Nicora, LululuAlu, Luxios22, Marta Paes Moreira, Mateusz Sabat, Matthias Pohl, Maximilian Michels, Miklos Gergely, Milan Nikl, Nico Kruber, Niel Hu, Niels Basjes, Oleksandr Nitavskyi, Paul Lam, Peng, PengFei Li, PengchengLiu, Peter Huang, Piotr Nowojski, PoojaChandak, Qingsheng Ren, Qishang Zhong, Richard Deurwaarder, Richard Moorhead, Robert Metzger, Roc Marshal, Roey Shem Tov, Roman, Roman Khachatryan, Rong Rong, Rui Li, Seth Wiesman, Shawn Huang, ShawnHx, Shengkai, Shuiqiang Chen, Shuo Cheng, SteNicholas, Stephan Ewen, Steve Whelan, Steven Wu, Tartarus0zm, Terry Wang, Thesharing, Thomas Weise, Till Rohrmann, Timo Walther, TsReaper, Tzu-Li (Gordon) Tai, Ufuk Celebi, V1ncentzzZ, Vladimirs Kotovs, Wei Zhong, Weike DONG, XBaith, Xiaogang Zhou, Xiaoguang Sun, Xingcan Cui, Xintong Song, Xuannan, Yang Liu, Yangze Guo, Yichao Yang, Yikun Jiang, Yu Li, Yuan Mei, Yubin Li, Yun Gao, Yun Tang, Yun Wang, Zhenhua Yang, Zhijiang, Zhu Zhu, acesine, acqua.csq, austin ce, bigdata-ny, billyrrr, caozhen, caozhen1937, chaojianok, chenkai, chris, cpugputpu, dalong01.liu, darionyaphet, dijie, diohabara, dufeng1010, fangliang, felixzheng, gkrishna, gm7y8, godfrey he, godfreyhe, gsralex, haseeb1431, hequn.chq, hequn8128, houmaozheng, huangxiao, huangxingbo, huzekang, jPrest, jasonlee, jinfeng, jinhai, johnm, jxeditor, kecheng, kevin.cyj, kevinzwx, klion26, leiqiang, libenchao, lijiewang.wlj, liufangliang, liujiangang, liuyongvs, liuyufei9527, lsy, lzy3261944, mans2singh, molsionmo, openopen2, pengweibo, rinkako, sanshi@wwdz.onaliyun.com, secondChoice, seunjjs, shaokan.cao, shizhengchao, shizk233, shouweikun, spurthi chaganti, sujun, sunjincheng121, sxnan, tison, totorooo, venn, vthinkxie, wangsong2, wangtong, wangxiyuan, wangxlong, wangyang0918, wangzzu, weizheng92, whlwanghailong, wineandcheeze, wooplevip, wtog, wudi28, wxp, xcomp, xiaoHoly, xiaolong.wang, yangyichao-mango, yingshin, yushengnan, yushujun, yuzhao.cyz, zhangap, zhangmang, zhangzhanchum, zhangzhanchun, zhangzhanhua, zhangzp, zheyu, zhijiang, zhushang, zhuxiaoshang, zlzhang0122, zodo, zoudan, zouzhiye
`}),e.add({id:132,href:"/2020/12/02/improvements-in-task-scheduling-for-batch-workloads-in-apache-flink-1.12/",title:"Improvements in task scheduling for batch workloads in Apache Flink 1.12",section:"Flink Blog",content:`The Flink community has been working for some time on making Flink a truly unified batch and stream processing system. Achieving this involves touching a lot of different components of the Flink stack, from the user-facing APIs all the way to low-level operator processes such as task scheduling. In this blogpost, we’ll take a closer look at how far the community has come in improving scheduling for batch workloads, why this matters and what you can expect in the Flink 1.12 release with the new pipelined region scheduler.
Towards unified scheduling # Flink has an internal scheduler to distribute work to all available cluster nodes, taking resource utilization, state locality and recovery into account. How do you write a scheduler for a unified batch and streaming system? To answer this question, let’s first have a look into the high-level differences between batch and streaming scheduling requirements.
Streaming # Streaming jobs usually require that all operator subtasks are running in parallel at the same time, for an indefinite time. Therefore, all the required resources to run these jobs have to be provided upfront, and all operator subtasks must be deployed at once.
Flink: Streaming job example Because there are no finite intermediate results, a streaming job always has to be restarted fully from a checkpoint or a savepoint in case of failure.
Note A _streaming job_ may generally consist of multiple disjoint pipelines which can be restarted independently. Hence, the full job restart is not required in this case but you can think of each disjoint pipeline as if it were a separate job. Batch # In contrast to streaming jobs, batch jobs usually consist of one or more stages that can have dependencies between them. Each stage will only run for a finite amount of time and produce some finite output (i.e. at some point, the batch job will be finished). Independent stages can run in parallel to improve execution time, but for cases where there are dependencies between stages, a stage may have to wait for upstream results to be produced before it can run. These are called blocking results, and in this case stages cannot run in parallel.
Flink: Batch job example As an example, in the figure above Stage 0 and Stage 1 can run simultaneously, as there is no dependency between them. Stage 3, on the other hand, can only be scheduled once both its inputs are available. There are a few implications from this:
(a) You can use available resources more efficiently by only scheduling stages that have data to perform work;
(b) You can use this mechanism also for failover: if a stage fails, it can be restarted individually, without recomputing the results of other stages.
Scheduling Strategies in Flink before 1.12 # Given these differences, a unified scheduler would have to be good at resource management for each individual stage, be it finite (batch) or infinite (streaming), and also across multiple stages. The existing scheduling strategies in older Flink versions up to 1.11 have been largely designed to address these concerns separately.
“All at once (Eager)”
This strategy is the simplest: Flink just tries to allocate resources and deploy all subtasks at once. Up to Flink 1.11, this is the scheduling strategy used for all streaming jobs. For batch jobs, using “all at once” scheduling would lead to suboptimal resource utilization, since it’s unlikely that such jobs would require all resources upfront, and any resources allocated to subtasks that could not run at a given moment would be idle and therefore wasted.
“Lazy from sources”
To account for blocking results and make sure that no consumer is deployed before their respective producers are finished, Flink provides a different scheduling strategy for batch workloads. “Lazy from sources” scheduling deploys subtasks only once all their inputs are ready. This strategy operates on each subtask individually; it does not identify all subtasks which can (or have to) run at the same time.
A practical example # Let’s take a closer look at the specific case of batch jobs, using as motivation a simple SQL query:
CREATE TABLE customers (
  customerId int,
  name varchar(255)
);
CREATE TABLE orders (
  orderId int,
  orderCustomerId int
);
--fill tables with data
SELECT customerId, name
FROM customers, orders
WHERE customerId = orderCustomerId
Assume that two tables were created in some database: the customers table is relatively small and fits into the local memory (or also on disk). The orders table is bigger, as it contains all orders created by customers, and doesn’t fit in memory. To enrich the orders with the customer name, you have to join these two tables. There are basically two stages in this batch job:
1. Load the complete customers table into a local map (customerId, name), because this table is smaller.
2. Process the orders table record by record, enriching it with the name value from the map.
Executing the job # The batch job described above will have three operators. For simplicity, each operator is represented with a parallelism of 1, so the resulting ExecutionGraph will consist of three subtasks: A, B and C.
A: load full customers table
B: load orders table record by record in a streaming (pipelined) fashion
C: join order table records with the loaded customer table
This translates into A and C being connected with a blocking data exchange, because the customers table needs to be loaded locally (A) before we start processing the order table (B). B and C are connected with a pipelined data exchange, because the consumer (C) can run as soon as the first result records from B have been produced. You can think of B->C as a finite streaming job. It’s then possible to identify two separate stages within the ExecutionGraph: A and B->C.
Flink: SQL Join job example Scheduling Limitations # Imagine that the cluster this job will run in has only one slot and can therefore only execute one subtask. If Flink deploys B chained with C first into this one slot (as B and C are connected with a pipelined edge), C cannot run because A has not produced its blocking result yet. Flink will try to deploy A and the job will fail, because there are no more slots. If there were two slots available, Flink would be able to deploy A and the job would eventually succeed. Nonetheless, the resources of the first slot occupied by B and C would be wasted while A was running.
Both scheduling strategies available as of Flink 1.11 (“all at once” and “lazy from source”) would be affected by these limitations. What would be the optimal approach? In this case, if A was deployed first, then B and C could also complete afterwards using the same slot. The job would succeed even if only a single slot was available.
Note If we could load the \`orders\` table into local memory (making B -> C blocking), then the previous strategy would also succeed with one slot. Nonetheless, we would have to allocate a lot of resources to accommodate the table locally, which may not be required. Last but not least, let’s consider what happens in the case of failover: if the processing of the orders table fails (B->C), then we do not have to reload the customer table (A); we only need to restart B->C. This did not work prior to Flink 1.9.
To satisfy the scheduling requirements for batch and streaming and overcome these limitations, the Flink community has worked on a new unified scheduling and failover strategy that is suitable for both types of workloads: pipelined region scheduling.
The new pipelined region scheduling # As you read in the previous introductory sections, an optimal scheduler should efficiently allocate resources for the sub-stages of the pipeline, finite or infinite, running in a streaming fashion. Those stages are called pipelined regions in Flink. In this section, we will take a deeper dive into pipelined region scheduling and failover.
Pipelined regions # The new scheduling strategy analyses the ExecutionGraph before starting the subtask deployment in order to identify its pipelined regions. A pipelined region is a subset of subtasks in the ExecutionGraph connected by pipelined data exchanges. Subtasks from different pipelined regions are connected only by blocking data exchanges. The depicted example of an ExecutionGraph has four pipelined regions, made up of subtasks A to H:
Flink: Pipelined regions Why do we need the pipelined region? Within the pipelined region all consumers have to constantly consume the produced results to not block the producers and avoid backpressure. Hence, all subtasks of a pipelined region have to be scheduled, restarted in case of failure and run at the same time.
Note (out of scope) In certain cases the _subtasks_ can be connected by _[blocking](#intermediate-results)_ data exchanges within one region. Check [FLINK-17330](https://issues.apache.org/jira/browse/FLINK-17330) for details. Pipelined region scheduling strategy # Once the pipelined regions are identified, each region is scheduled only when all the regions it depends on (i.e. its inputs), have produced their blocking results (for the depicted graph: R2 and R3 after R1; R4 after R2 and R3). If the JobManager has enough resources available, it will try to run as many schedulable pipelined regions in parallel as possible. The subtasks of a pipelined region are either successfully deployed all at once or none at all. The job fails if there are not enough resources to run any of its pipelined regions. You can read more about this effort in the original FLIP-119 proposal.
Failover strategy # As mentioned before, only certain regions are running at the same time. Others have already produced their blocking results. The results are stored locally in TaskManagers where the corresponding subtasks run. If a currently running region fails, it gets restarted to consume its inputs again. If some input results got lost (e.g. the hosting TaskManager failed as well), Flink will rerun their producing regions. You can read more about this effort in the user documentation and the original FLIP-1 proposal.
Benefits # Run any batch job, possibly with limited resources
The subtasks of a pipelined region are deployed only when all necessary conditions for their success are fulfilled: inputs are ready and all needed resources are allocated. Hence, the batch job never gets stuck without notifying the user. The job either eventually finishes or fails after a timeout.
Depending on how the subtasks are allowed to share slots, it is often the case that the whole pipelined region can run within one slot, making it generally possible to run the whole batch job with only a single slot. At the same time, if the cluster provides more resources, Flink will run as many regions as possible in parallel to improve the overall job performance.
No resource waste
As mentioned in the definition of pipelined region, all its subtasks have to run simultaneously. The subtasks of other regions either cannot or do not have to run at the same time. This means that a pipelined region is the minimum subgraph of a batch job’s ExecutionGraph that has to be scheduled at once. There is no way to run the job with fewer resources than needed to run the largest region, and so there can be no resource waste.
Note (out of scope) The amount of resources required to run a region can be further optimized separately. It depends on _co-location constraints_ and _slot sharing groups_ of the region’s _subtasks_. Check [FLINK-18689](https://issues.apache.org/jira/browse/FLINK-18689) for details. Conclusion # Scheduling is a fundamental component of the Flink stack. In this blogpost, we recapped how scheduling affects resource utilization and failover as a part of the user experience. We described the limitations of Flink’s old scheduler and introduced a new approach to tackle them: the pipelined region scheduler, which ships with Flink 1.12. The blogpost also explained how pipelined region failover (introduced in Flink 1.11) works.
Stay tuned for more improvements to scheduling in upcoming releases. If you have any suggestions or questions for the community, we encourage you to sign up to the Apache Flink mailing lists and become part of the discussion.
Appendix # What is scheduling? # ExecutionGraph # A Flink job is a pipeline of connected operators to process data. Together, the operators form a JobGraph. Each operator has a certain number of subtasks executed in parallel. The subtask is the actual execution unit in Flink. Each subtask can consume user records from other subtasks (inputs), process them and produce records for further consumption by other subtasks (outputs) down the stream. There are source subtasks without inputs and sink subtasks without outputs. Hence, the subtasks form the nodes of the ExecutionGraph.
Intermediate results # There are also two major data-exchange types to produce and consume results by operators and their subtasks: pipelined and blocking. They are basically types of edges in the ExecutionGraph.
A pipelined result can be consumed record by record. This means that the consumer can already run once the first result records have been produced. A pipelined result can be a never ending output of records, e.g. in case of a streaming job.
A blocking result can be consumed only when its production is done. Hence, the blocking result is always finite and the consumer of the blocking result can run only when the producer has finished its execution.
Slots and resources # A TaskManager instance has a certain number of virtual slots. Each slot represents a certain part of the TaskManager’s physical resources to run the operator subtasks, and each subtask is deployed into a slot of the TaskManager. A slot can run multiple subtasks from different operators at the same time, usually chained together.
Scheduling strategy # Scheduling in Flink is a process of searching for and allocating appropriate resources (slots) from the TaskManagers to run the subtasks and produce results. The scheduling strategy reacts to scheduling events (such as a job starting, or a subtask failing or finishing) to decide which subtask to deploy next.
For instance, to avoid wasting resources, it does not make sense to schedule subtasks whose inputs are not yet ready to consume. Another example is to schedule subtasks that are connected by pipelined edges together, to avoid deadlocks caused by backpressure.
`}),e.add({id:133,href:"/2020/11/11/stateful-functions-2.2.1-release-announcement/",title:"Stateful Functions 2.2.1 Release Announcement",section:"Flink Blog",content:`The Apache Flink community released the first bugfix release of the Stateful Functions (StateFun) 2.2 series, version 2.2.1.
This release fixes a critical bug that causes restoring the Stateful Functions cluster from snapshots (checkpoints or savepoints) to fail under certain conditions. Starting from this release, StateFun now creates snapshots with a more robust format that allows it to be restored safely going forward.
We strongly recommend all users to upgrade to 2.2.1. Please see the following sections on instructions and things to keep in mind for this upgrade.
For new users just starting out with Stateful Functions # We strongly recommend to skip all previous versions and start using StateFun from version 2.2.1. This guarantees that failure recovery from checkpoints, or application upgrades using savepoints will work as expected for you.
For existing users on versions <= 2.2.0 # Users that are currently using older versions of StateFun may or may not be able to directly upgrade to 2.2.1 using savepoints taken with the older versions. The Flink community is working hard on a follow-up hotfix release, 2.2.2, that would guarantee that you can perform the upgrade smoothly. In the meantime, you may still try to upgrade to 2.2.1 first, but may encounter FLINK-19741 or FLINK-19748. If you do encounter this, do not worry about data loss; this simply means that the restore failed, and you’d have to wait until 2.2.2 is out in order to upgrade.
The follow-up hotfix release 2.2.2 is expected to be ready within another 2~3 weeks, as it requires a new hotfix release from Flink core, and ultimately an upgrade of the Flink dependency in StateFun. We’ll update the community via the Flink mailing lists as soon as this is ready, so please subscribe to the mailing lists for important updates for this!
You can find the binaries on the updated Downloads page.
This release includes 6 fixes and minor improvements since StateFun 2.2.0. Below is a detailed list of all fixes and improvements:
Bug [FLINK-19515] - Async RequestReply handler concurrency bug [FLINK-19692] - Can&#39;t restore feedback channel from savepoint [FLINK-19866] - FunctionsStateBootstrapOperator.createStateAccessor fails due to uninitialized runtimeContext Improvement [FLINK-19826] - StateFun Dockerfile copies plugins with a specific version instead of a wildcard [FLINK-19827] - Allow the harness to start with a user provided Flink configuration [FLINK-19840] - Add a rocksdb and heap timers configuration validation `}),e.add({id:134,href:"/2020/10/15/from-aligned-to-unaligned-checkpoints-part-1-checkpoints-alignment-and-backpressure/",title:"From Aligned to Unaligned Checkpoints - Part 1: Checkpoints, Alignment, and Backpressure",section:"Flink Blog",content:`Apache Flink’s checkpoint-based fault tolerance mechanism is one of its defining features. Because of that design, Flink unifies batch and stream processing, can easily scale to both very small and extremely large scenarios and provides support for many operational features like stateful upgrades with state evolution or roll-backs and time-travel.
Despite all these great properties, Flink’s checkpointing method has an Achilles heel: the speed of a completed checkpoint is determined by the speed at which data flows through the application. When the application backpressures, the processing of checkpoints is backpressured as well (Appendix 1 recaps what backpressure is and why it can be a good thing). In such cases, checkpoints may take longer to complete or even time out completely.
In Flink 1.11, the community introduced a first version of a new feature called “unaligned checkpoints” that aims at solving this issue, while Flink 1.12 plans to further expand its functionality. In this two-part blog post series, we discuss how Flink’s checkpointing mechanism has been modified to support unaligned checkpoints, how unaligned checkpoints work, and how this new mode impacts Flink users. In the first of the two posts, we start with a recap of the original checkpointing process in Flink, its core properties and issues under backpressure.
State in Streaming Applications # Simply put, state is the information that you need to remember across events. Even the most trivial streaming applications are typically stateful because of their need to “remember” the exact position they are processing data from, for example in the form of a Kafka Partition Offset or a File Offset. In addition, many applications hold state internally as a way to support their internal operations, such as windows, aggregations, joins, or state machines.
For the remainder of this article, we’ll use the following example showing a streaming application consisting of four operators, each one holding some state.
State Persistence through Checkpoints # Streaming applications are long-lived. They inevitably experience hardware and software failures but should, ideally, look from the outside as if no failure ever happened. Since applications are long-lived — and can potentially accumulate very large state —, recomputing partial results after failures can take quite some time, and so a way to persist and recover this (potentially very large) application state is necessary.
Flink relies on its state checkpointing and recovery mechanism to implement such behavior, as shown in the figure below. Periodic checkpoints store a snapshot of the application’s state on some Checkpoint Storage (commonly an Object Store or Distributed File System, like S3, HDFS, GCS, Azure Blob Storage, etc.). When a failure is detected, the affected parts of the application are reset to the state of the latest checkpoint (either by a local reset or by loading the state from the checkpoint storage).
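As a small configuration sketch (the state backend, bucket path and interval are placeholder choices, not recommendations), periodic checkpointing to an object store is typically set up along these lines in flink-conf.yaml:
state.backend: rocksdb                                   # example state backend
state.checkpoints.dir: s3://my-bucket/flink/checkpoints  # placeholder checkpoint storage location
execution.checkpointing.interval: 60s                    # example checkpoint interval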
Flink’s checkpoint-based approach differs from the approach taken by other stream processing systems that keep state in a distributed database or write state changes to a log, for example. The checkpoint-based approach has some nice properties, described below, which make it a great option for Flink.
Checkpointing has very simple external dependencies: An Object Storage or a Distributed FileSystem are probably the most available and easiest-to-administer services. Because these are available on all public cloud providers and among the first systems to provide on-premises, Flink becomes well-suited for a cloud-native stack. In addition, these storage systems are cheaper by an order of magnitude (GB/month) when compared to distributed databases, key/value stores, or event brokers.
Checkpoints are immutable and versioned: Together with immutable and versioned inputs (as input streams are, by nature), checkpoints support storing immutable application snapshots that can be used for rollbacks, debugging, testing, or as a cheap alternative to analyze application state outside the production setup.
Checkpoints decouple the “stream transport” from the persistence mechanism: “Stream transport” refers to how data is being exchanged between operators (e.g. during a shuffle). This decoupling is key to Flink’s batch <-> streaming unification in one system, because it allows Flink to implement a data transport that can take the shape of either a low-latency streaming exchange or a decoupled batch data exchange.
The Checkpointing Mechanism # The fundamental challenge solved by the checkpointing algorithm (details in this paper) is drawing a snapshot out of the ever-changing state of a streaming application without suspending the continuous processing of events. Because there are always events in-flight (on the network, in I/O buffers, etc.), up- and downstream operators can be processing events from different times: the sink may write data from 11:04, while the source already ingests events from 11:06. Ideally, all snapshotted data should belong to the same point-in-time, as if the input was paused and we waited until all in-flight data was drained (i.e. the pipeline becoming idle) before taking the snapshot.
To achieve that, Flink injects checkpoint barriers into the streams at the sources, which travel through the entire topology and eventually reach the sinks. These barriers divide the stream into a pre-checkpoint epoch (all events that are persisted in state or emitted into sinks) and a post-checkpoint epoch (events not reflected in the state, to be re-processed when resuming from the checkpoint).
The following figure shows what happens when a barrier reaches an operator.
Operators need to make sure that they take the checkpoint exactly when all pre-checkpoint events are processed and no post-checkpoint events have yet been processed. When the first barrier reaches the head of the input buffer queue and is consumed by the operator, the operator starts the so-called alignment phase. During that phase, the operator will not consume any data from the channels where it already received a barrier, until it has received a barrier from all input channels.
Once all barriers are received, the operator snapshots its state, forwards the barrier to the output, and ends the alignment phase, which unblocks all inputs. An operator state snapshot is written into the checkpoint storage, typically asynchronously while data processing continues. Once all operators have successfully written their state snapshot to the checkpoint storage, the checkpoint is successfully completed and can be used for recovery.
One important thing to note here is that the barriers flow with the events, strictly in line. In a healthy setup without backpressure, barriers flow and align in milliseconds. The checkpoint duration is dominated by the time it takes to write the state snapshots to the checkpoint storage, which becomes faster with incremental checkpoints. If the events flow slowly under backpressure, so will the barriers. That means that barriers can take a long time to travel from sources to sinks, causing the alignment phase to take even longer to complete.
Recovery # When operators restart from a checkpoint (automatically during recovery or manually during deployment from a savepoint), the operators first restore their state from the checkpoint storage before resuming the event stream processing.
Since sources are bound to the offsets persisted in the checkpoint, recovery time is often calculated as the sum of the time of the recovery process — outlined in the previous figure — and any additional time needed to process any remaining data up to the point right before the system failure. When an application experiences backpressure, recovery time can also include the total time from the very start of the recovery process until backpressure is fully eliminated.
Consistency Guarantees # The alignment phase is only necessary for checkpoints with exactly-once processing semantics, which is the default setting in Flink. If an application runs with at-least-once processing semantics, checkpoints will not block any channels with barriers during alignment, at the additional cost that events from the then-unblocked channels may be processed twice when the operator is recovered.
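In code, this trade-off is a single knob on the checkpoint configuration; a minimal sketch (the 60-second interval is an arbitrary example value):
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointModeExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Exactly-once (the default) requires barrier alignment; at-least-once does not
        // block already-received channels, at the cost of possible duplicates on recovery.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
        // env.enableCheckpointing(60_000, CheckpointingMode.AT_LEAST_ONCE);
    }
}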
This is not to be confused with having at-least-once semantics only in the sinks — something that many Flink users choose over transactional sinks — because many sink operations are idempotent or converge to the same result (like inputs/outputs to key/value stores). Having at-least-once semantics in an intermediate operator state is often not idempotent (for example a simple count aggregation) and hence using exactly-once checkpoints is advisable for the majority of Flink users.
Conclusion # This blog post recaps how Flink’s fault tolerance mechanism (based on aligned checkpoints) works, and why checkpointing is a fitting mechanism for a fault-tolerant stream processor. The checkpointing mechanism has been optimized over time to make checkpoints faster and cheaper (with both asynchronous and incremental checkpoints) and faster-to-recover (local caching), but the basic concepts (barriers, alignment, operator state snapshots) are still the same as in the original version.
The next part will dig into a major break with the original mechanism that avoids the alignment phase — the recently-introduced “unaligned checkpoints”. Stay tuned for the second part, which explains how unaligned checkpoints work and how they guarantee consistent checkpointing times under backpressure.
Appendix 1 - On Backpressure # Backpressure refers to the behavior where a slow receiver (e.g. of data/requests) makes the senders slow down in order to not overwhelm the receiver, something that can result in possibly dropping some of the processed data or requests. This is a crucial and very much desirable behavior for systems where completeness/correctness is important. Backpressure is implicitly implemented in many of the most basic building blocks of distributed communication, such as TCP Flow Control, bounded (blocking) I/O queues, poll-based consumers, etc.
Apache Flink implements backpressure across the entire data flow graph. A sink that (temporarily) cannot keep up with the data rate will result in the source connectors slowing down and pulling data out of the source systems more slowly. We believe that this is a good and desirable behavior, because backpressure is not only necessary in order to avoid overwhelming the memory of a receiver (thread), but can also prevent different stages of the streaming application from drifting apart too far.
Consider the example below:
We have a source (let’s say reading data from Apache Kafka), parsing data, grouping and aggregating data by a key, and writing it to a sink system (some database). The application needs to re-group data by key between the parsing and the grouping/aggregation step. Let’s assume we use a non-backpressure approach, like writing the data to a log/MQ for the data re-grouping over the network (the approach used by Kafka Streams). If the sink is now slower than the remaining parts of the streaming application (which can easily happen), the first stage (source and parse) will still work as fast as possible to pull data out of the source, parse it, and put it into the log for the shuffle. That intermediate log will accumulate a lot of data, meaning it needs significant capacity so that, in a worst-case scenario, it can hold a full copy of the input data; otherwise data will be lost (when the drift is greater than the retention time).
With backpressure, the source/parse stage slows down to match the speed of the sink, keeping both parts of the application closer together in their progress through the data, and avoiding the need to provision a lot of intermediate storage capacity.
We&rsquo;d like to thank Marta Paes Moreira and Markos Sfikas for the wonderful review process.
`}),e.add({id:135,href:"/2020/10/13/stateful-functions-internals-behind-the-scenes-of-stateful-serverless/",title:"Stateful Functions Internals: Behind the scenes of Stateful Serverless",section:"Flink Blog",content:`Stateful Functions (StateFun) simplifies the building of distributed stateful applications by combining the best of two worlds: the strong messaging and state consistency guarantees of stateful stream processing, and the elasticity and serverless experience of today&rsquo;s cloud-native architectures and popular event-driven FaaS platforms. Typical StateFun applications consist of functions deployed behind simple services using these modern platforms, with a separate StateFun cluster playing the role of an “event-driven database” that provides consistency and fault-tolerance for the functions&rsquo; state and messaging.
But how exactly does StateFun achieve that? How does the StateFun cluster communicate with the functions?
This blog dives deep into the internals of the StateFun runtime. The entire walkthrough is complemented by a demo application which can be completely deployed on AWS services. Most significantly, in the demo, the stateful functions are deployed and serviced using AWS Lambda, a popular FaaS platform among many others. The goal here is to allow readers to have a good grasp of the interaction between the StateFun runtime and the functions, how that works cohesively to provide a Stateful Serverless experience, and how they can apply what they&rsquo;ve learnt to deploy their StateFun applications on other public cloud offerings such as GCP or Microsoft Azure.
Introducing the example: Shopping Cart # Note You can find the full code [here](https://github.com/tzulitai/statefun-aws-demo/blob/master/app/shopping_cart.py), which uses StateFun's [Python SDK](//nightlies.apache.org/flink/flink-statefun-docs-master/sdk/python.html). Alternatively, if you are unfamiliar with StateFun's API, you can check out this [earlier blog](https://flink.apache.org/2020/08/19/statefun.html) on modeling applications and stateful entities using [StateFun's programming constructs](//nightlies.apache.org/flink/flink-statefun-docs-master/concepts/application-building-blocks.html). Let’s first take a look at a high-level overview of the motivating demo for this blog post: a shopping cart application. The diagram below covers the functions that build up the application, the state that the functions would keep, and the messages that flow between them. We’ll be referencing this example throughout the blog post.
Fig.1: An overly simplified shopping cart application. The application consists of two function types: a cart function and an inventory function. Each instance of the cart function is associated with a single user entity, with its state being the items in the cart for that user (ItemsInCart). In the same way, each instance of the inventory function represents a single inventory, maintaining as state the number of items in stock (NumInStock) as well as the number of items reserved across all user carts (NumReserved). Messages can be sent to function instances using their logical addresses, which consist of the function type and the instance&rsquo;s entity ID, e.g. (cart:Kim) or (inventory:socks).
There are two external messages being sent to and from the shopping cart application via ingresses and egresses: AddToCart, which is sent to the ingress when an item is added to a user’s cart (e.g. sent by a front-end web application), and AddToCartResult, which is sent back from our application to acknowledge the action.
AddToCart messages are handled by the cart function, which in turn invokes other functions to form the main logic of the application. To keep things simple, only two messages between functions are demonstrated: RequestItem, sent from the cart function to the inventory function, representing a request to reserve an item, and ItemReserved, which is a response from the inventory function to acknowledge the request.
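As a rough mental model (plain Python, not the StateFun SDK; the message and state names simply mirror the figure above), each invocation of the cart function can be thought of as a pure function from the current state plus an input message to a new state plus outgoing messages:

# Simplified model of the cart function: state in, new state + outgoing messages out.
def cart_function(items_in_cart, message):
    outgoing = []
    if message["type"] == "AddToCart":
        # Ask the inventory function to reserve the item; do not update the cart yet.
        outgoing.append(("inventory", message["item"],
                         {"type": "RequestItem", "item": message["item"], "quantity": message["quantity"]}))
    elif message["type"] == "ItemReserved":
        # Reservation acknowledged: now the item can be added to the cart state.
        items_in_cart[message["item"]] = items_in_cart.get(message["item"], 0) + message["quantity"]
        outgoing.append(("egress", None, {"type": "AddToCartResult", "status": "SUCCESS"}))
    return items_in_cart, outgoing

state, messages = cart_function({}, {"type": "AddToCart", "item": "socks", "quantity": 2})
print(state, messages)

This "state in, side-effects out" shape is exactly what makes the remote invocation protocol described below possible.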
What happens in the Stateful Functions runtime? # Now that we understand the business logic of the shopping cart application, let&rsquo;s take a closer look at what keeps the state of the functions and messages sent between them consistent and fault-tolerant: the StateFun cluster.
Fig.2: Simplified view of a StateFun app deployment. The StateFun runtime is built on-top of Apache Flink, and applies the same battle-tested technique that Flink uses as the basis for strongly consistent stateful streaming applications - co-location of state and messaging. In a StateFun application, all messages are routed through the StateFun cluster, including messages sent from ingresses, messages sent between functions, and messages sent from functions to egresses. Moreover, the state of all functions is maintained in the StateFun cluster as well. Like Flink, the message streams flowing through the StateFun cluster and function state are co-partitioned so that compute has local state access, and any updates to the state can be handled atomically with computed side-effects, i.e. messages to send to other functions.
In more concrete terms, take for example a message that is targeted for the logical address (cart, &quot;Kim&quot;) being routed through StateFun. Logical addresses are used in StateFun as the partitioning key for both message streams and state, so that the resulting StateFun cluster partition that this message ends up in will have the state for (cart, &quot;Kim&quot;) locally available.
The only difference here for StateFun, compared to Flink, is that the actual compute doesn&rsquo;t happen within the StateFun cluster partitions - computation happens remotely in the function services. So how does StateFun route input messages to the remote function services and provide them with state access, all the while preserving the same consistency guarantees as if state and compute were co-located?
Remote Invocation Request-Reply Protocol # A StateFun cluster partition communicates with the functions using a slim and well-defined request-reply protocol, as illustrated in Fig. 3. Upon receiving an input message, the cluster partition invokes the target functions via their HTTP service endpoint. The service request body carries input events and the current state for the functions, retrieved from local state. Any outgoing messages and state mutations resulting from the invocations are sent back through StateFun as part of the service response. When the StateFun cluster partition receives the response, all state mutations are written back to local state and outgoing messages are routed to other cluster partitions, which in turn invoke other function services.
Fig.3: The remote invocation request/reply protocol. Under the hood, StateFun SDKs like the Python SDK and other 3rd party SDKs for other languages all implement this protocol. From the user&rsquo;s perspective, they are programming with state local to their function deployment, whereas in reality, state is maintained in StateFun and supplied through this protocol. It is easy to add more language SDKs, as long as the language can handle HTTP requests and responses.
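For illustration, the sketch below shows one way a function service endpoint can be exposed with the Python SDK&rsquo;s request/reply handler behind a small Flask app; the route path and port are arbitrary choices, and the bound functions are omitted:

from flask import Flask, request, make_response
from statefun import StatefulFunctions, RequestReplyHandler

functions = StatefulFunctions()
# ... functions such as the cart and inventory functions would be bound here via @functions.bind(...)

handler = RequestReplyHandler(functions)

app = Flask(__name__)

@app.route('/statefun', methods=['POST'])
def handle():
    # The request body carries the input messages and current state for the invoked function;
    # the response carries outgoing messages and state mutations back to the StateFun cluster.
    response_data = handler(request.data)
    response = make_response(response_data)
    response.headers.set('Content-Type', 'application/octet-stream')
    return response

if __name__ == '__main__':
    app.run(port=8000)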
Function state consistency and fault-tolerance # The runtime makes sure that only one invocation per function instance (e.g. (cart, &quot;Kim&quot;)) is ongoing at any point in time (i.e. invocations per entity are serial). If an invocation is ongoing while new messages for the same function instance arrive, the messages are buffered in state and sent as a single batch as soon as the ongoing invocation completes.
In addition, since each request happens in complete isolation and all relevant information is encapsulated in each request and response, function invocations are effectively idempotent (i.e. results depend purely on the provided context of the invocation) and can be retried. This naturally avoids violating consistency in case any function service hiccups occur.
For fault tolerance, all function state managed in the StateFun cluster is periodically and asynchronously checkpointed to a blob storage (e.g. HDFS, S3, GCS) using Flink’s original distributed snapshot mechanism. These checkpoints contain a globally consistent view of state across all functions of the application, including the offset positions in ingresses and the ongoing transaction state in egresses. In the case of an abrupt failure, the system may restore from the latest available checkpoint: all function states will be restored and all events between the checkpoint and the crash will be re-processed (and the functions re-invoked) with identical routing, all as if the failure never happened.
Step-by-step walkthrough of function invocations # Let&rsquo;s conclude this section by going through the actual messages that flow between StateFun and the functions in our shopping cart demo application!
Customer &ldquo;Kim&rdquo; puts 2 socks into his shopping cart (Fig. 4):
Fig.4: Message flow walkthrough. An event AddToCart(&quot;Kim&quot;, &quot;socks&quot;, 2) comes through one of the ingress partitions (1). The ingress event router is configured to route AddToCart events to the function type cart, taking the user ID (&quot;Kim&quot;) as the instance ID. The function type and instance ID together define the logical address of the target function instance for the event (cart:Kim).
Let&rsquo;s assume the event is read by StateFun partition B, but the (cart:Kim) address is owned by partition A. The event is thus routed to partition A (2).
StateFun Partition A receives the event and processes it:
First, the runtime fetches the state for (cart:Kim) from the local state store, i.e. the existing items in Kim’s cart (3).

Next, it marks (cart:Kim) as &ldquo;busy&rdquo;, meaning an invocation is happening. This buffers other messages targeted for (cart:Kim) in state until this invocation is completed. The runtime grabs a free HTTP client connection and sends a request to the cart function type&rsquo;s service endpoint. The request contains the AddToCart(&quot;Kim&quot;, &quot;socks&quot;, 2) message and the current state for (cart:Kim) (4).

The remote cart function service receives the event and attempts to reserve socks with the inventory function. Therefore, it replies to the invocation with a new message RequestItem(&quot;socks&quot;, 2) targeted at the address (inventory:socks). Any state modifications will be included in the response as well, but in this case there aren’t any state modifications yet (i.e. Kim’s cart is still empty until a reservation acknowledgement is received from the inventory service) (5).

The StateFun runtime receives the response, routes the RequestItem message to other partitions, and marks (cart:Kim) as &ldquo;available&rdquo; again for invocation. Assuming that the (inventory:socks) address is owned by partition B, the message is routed to partition B (6).
Once partition B receives the RequestItem message, the runtime invokes the function (inventory:socks) in the same way as described above, and receives a reply with a modification of the state of the inventory (the number of reserved socks is now increased by 2). (inventory:socks) also wants to acknowledge the reservation of 2 socks for Kim, so an ItemReserved(&quot;socks&quot;, 2) message targeted for (cart:Kim) is also included in the response (7), which will again be routed by the StateFun runtime.
Stateful Serverless in the Cloud with FaaS and StateFun # We&rsquo;d like to wrap up this blog by re-emphasizing how the StateFun runtime works well with cloud-native architectures, and provide an overview of what your complete StateFun application deployment would look like using public cloud services.
As you&rsquo;ve already learnt in previous sections, invocation requests themselves are stateless, with all necessary information for an invocation included in the HTTP request (i.e. input events and state access), and all side-effects of the invocation included in the HTTP response (i.e. outgoing messages and state modifications).
Fig.5: Complete deployment example on AWS. A natural consequence of this characteristic is that there is no session-related dependency between individual HTTP requests, making it very easy to horizontally scale the function deployments. This also makes it very easy to deploy your stateful functions using FaaS platforms, allowing them to rapidly scale out, scale to zero, or be upgraded with zero downtime.
In our complementary demo code, you can find the exact code for exposing and servicing StateFun functions through AWS Lambda. Likewise, this is possible for any other FaaS platform that supports triggering the functions using HTTP endpoints (and other transports as well in the future).
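As a rough sketch of what such a Lambda entry point can look like (the exact wiring lives in the linked demo repository; the base64 handling below is an assumption about an API layer that delivers the binary protocol payload base64-encoded):

import base64
from statefun import StatefulFunctions, RequestReplyHandler

functions = StatefulFunctions()
# ... bind the cart and inventory functions here ...

handler = RequestReplyHandler(functions)

def lambda_handler(event, context):
    # Decode the binary request-reply payload, invoke the bound functions,
    # and return the binary response, base64-encoded again for the API layer.
    request_bytes = base64.b64decode(event['body'])
    response_bytes = handler(request_bytes)
    return {
        'statusCode': 200,
        'isBase64Encoded': True,
        'headers': {'Content-Type': 'application/octet-stream'},
        'body': base64.b64encode(response_bytes).decode('utf-8'),
    }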
Fig. 5 on the right illustrates what a complete AWS deployment of a StateFun application would look like, with functions serviced via AWS Lambda, AWS Kinesis streams as ingresses and egresses, AWS EKS managed Kubernetes cluster to run the StateFun cluster, and an AWS S3 bucket to store the periodic checkpoints. You can also follow the instructions in the demo code to try it out and deploy this yourself right away!
If you’d like to learn more about Stateful Functions, head over to the official documentation, where you can also find more hands-on tutorials to try out yourself!
`}),e.add({id:136,href:"/2020/09/28/stateful-functions-2.2.0-release-announcement/",title:"Stateful Functions 2.2.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is happy to announce the release of Stateful Functions (StateFun) 2.2.0! This release introduces major features that extend the SDKs, such as support for asynchronous functions in the Python SDK, new persisted state constructs, and a new SDK that allows embedding StateFun functions within a Flink DataStream job. Moreover, we&rsquo;ve also included important changes that improve out-of-the-box stability for common workloads, as well as increased observability for operational purposes.
We&rsquo;ve also seen new 3rd party SDKs for StateFun being developed since the last release. While they are not part of the release artifacts, it&rsquo;s great seeing these community-driven additions! We&rsquo;ve highlighted these efforts below in this announcement.
The binary distribution and source artifacts are now available on the updated Downloads page of the Flink website, and the most recent Python SDK distribution is available on PyPI. For more details, check the complete release changelog and the updated documentation. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA!
New Features # Asynchronous functions in Python SDK # This release enables registering asynchronous Python functions as stateful functions by introducing a new handler in the Python SDK: AsyncRequestReplyHandler. This allows serving StateFun functions with Python web frameworks that support asynchronous IO natively (for example, aiohttp):
from statefun import StatefulFunctions
from statefun import AsyncRequestReplyHandler

functions = StatefulFunctions()

@functions.bind(&#34;example/greeter&#34;)
async def greeter(context, message):
    html = await fetch(session, &#39;http://....&#39;)
    context.pack_and_reply(SomeProtobufMessage(html))

# expose this handler via an async web framework
handler = AsyncRequestReplyHandler(functions)

For more details, please see the docs on exposing Python functions.
Flink DataStream Integration SDK # Using this SDK, you may combine pipelines written with the Flink DataStream API or higher-level libraries (such as Table API, CEP etc., basically anything that can consume or produce a DataStream) with the programming constructs provided by Stateful Functions, as demonstrated below:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

DataStream&lt;RoutableMessage&gt; namesIngress = ...

StatefulFunctionEgressStreams egresses =
    StatefulFunctionDataStreamBuilder.builder(&#34;example&#34;)
        .withDataStreamAsIngress(namesIngress)
        .withRequestReplyRemoteFunction(
            RequestReplyFunctionBuilder.requestReplyFunctionBuilder(
                    REMOTE_GREET, URI.create(&#34;http://...&#34;))
                .withPersistedState(&#34;seen_count&#34;))
        .withFunctionProvider(GREET, unused -&gt; new MyFunction())
        .withEgressId(GREETINGS)
        .build(env);

DataStream&lt;String&gt; responsesEgress = getDataStreamForEgressId(GREETINGS);

Events from DataStream ingresses are routed to bound functions, and events sent to egresses are captured as DataStream egresses. This opens up the possibility of building complex streaming applications.
Construct for Dynamic State Registration # Prior to this release, the persisted state constructs in the Java SDK, such as PersistedValue, PersistedTable etc., had to be eagerly defined in a stateful function&rsquo;s class. In certain scenarios, what state a function requires is not known in advance, and may only be dynamically registered at runtime (e.g., when a function is invoked).
This release enables that by providing a new PersistedStateRegistry construct:
public class MyFunction implements StatefulFunction {

    @Persisted
    private final PersistedStateRegistry registry = new PersistedStateRegistry();

    private PersistedValue&lt;String&gt; myValue;

    public void invoke(Context context, Object input) {
        if (myValue == null) {
            myValue = registry.registerValue(PersistedValue.of(&#34;my-value&#34;, String.class));
        }
        ...
    }
}

Improvements # Remote Functions Communication Stability # After observing common workloads, a few configurations for communicating with remote functions were adjusted for better out-of-the-box connection stability. This includes the following:
- The underlying connection pool was tuned for low-latency, high-throughput workloads. This allows StateFun to reuse existing connections much more aggressively and avoid re-establishing a connection for each request.
- StateFun applies backpressure once the total number of uncompleted requests reaches a per-JVM threshold (statefun.async.max-per-task), but when observing typical workloads we discovered that the default value was set too high. In this release the default was reduced to improve stability and resource consumption in the face of a slow-responding remote function.

Operational observability of a StateFun Application # One major goal of this release was to take a necessary step towards supporting auto-scaling of remote functions. Towards that end, we&rsquo;ve exposed several metrics related to the workload of remote functions and the resulting backpressure applied by the function dispatchers. This includes the following:
- Per function type: invocation duration / latency histograms
- Per function type: backlog size
- Per JVM (StateFun worker) and per function type: number of in-flight invocations

The full list of metrics and their descriptions can be found here.
Fine-grained control over remote connection lifecycle # With this release, it&rsquo;s possible to set individual timeouts for overall duration and individual read and write IO operations of HTTP requests with remote functions. You can find the corresponding field names in a function spec that defines these timeout values here.
3rd Party SDKs # Since the last release, we&rsquo;ve seen new 3rd party SDKs for different languages being implemented on top of StateFun&rsquo;s remote function HTTP request-reply protocol, including Go and Rust implementations. While these SDKs are not endorsed or maintained by the Apache Flink PMC and are not part of the official releases, it is great to see these new additions that demonstrate the extensibility of the framework.
For that reason, we&rsquo;ve added a new page in the documentation to list the 3rd party SDKs that the community is aware of. If you&rsquo;ve also worked on a new language SDK for StateFun that is stable and you plan to continue maintaining, please consider letting the community know of your work by submitting a pull request to add your project to the list!
Important Patch Notes # Below is a list of user-facing interface and configuration changes, dependency version upgrades, or removal of supported versions that would be important to be aware of when upgrading your StateFun applications to this version:
- [FLINK-18812] The Flink version in StateFun 2.2 has been upgraded to 1.11.1.
- [FLINK-19203] Upgraded the Scala version to 2.12, and dropped support for 2.11.
- [FLINK-19190] All existing metric names have been renamed to be camel-cased instead of snake-cased, to conform with the Flink metric naming conventions. This breaks existing deployments if you depended on the previous metrics.
- [FLINK-19192] The connection pool size for remote function HTTP requests has been increased to 1024, with a stale TTL of 1 minute.
- [FLINK-19191] The default max number of asynchronous operations per JVM (StateFun worker) has been decreased to 1024.

Release Notes # Please review the release notes for a detailed list of changes and new features if you plan to upgrade your setup to Stateful Functions 2.2.0.
List of Contributors # The Apache Flink community would like to thank all contributors that have made this release possible:
abc863377, Authuir, Chesnay Schepler, Congxian Qiu, David Anderson, Dian Fu, Francesco Guardiani, Igal Shilman, Marta Paes Moreira, Patrick Wiener, Rafi Aroch, Seth Wiesman, Stephan Ewen, Tzu-Li (Gordon) Tai, Ufuk Celebi
If you’d like to get involved, we’re always looking for new contributors.
`}),e.add({id:137,href:"/2020/09/17/apache-flink-1.11.2-released/",title:"Apache Flink 1.11.2 Released",section:"Flink Blog",content:`The Apache Flink community released the second bugfix version of the Apache Flink 1.11 series.
This release includes 96 fixes and minor improvements for Flink 1.11.1. Below is a detailed list of all fixes and improvements.
We highly recommend all users to upgrade to Flink 1.11.2.
Updated Maven dependencies:
&lt;dependency&gt;
  &lt;groupId&gt;org.apache.flink&lt;/groupId&gt;
  &lt;artifactId&gt;flink-java&lt;/artifactId&gt;
  &lt;version&gt;1.11.2&lt;/version&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
  &lt;groupId&gt;org.apache.flink&lt;/groupId&gt;
  &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt;
  &lt;version&gt;1.11.2&lt;/version&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
  &lt;groupId&gt;org.apache.flink&lt;/groupId&gt;
  &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt;
  &lt;version&gt;1.11.2&lt;/version&gt;
&lt;/dependency&gt;

You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-16087] - Translate &quot;Detecting Patterns&quot; page of &quot;Streaming Concepts&quot; into Chinese [FLINK-18264] - Translate the &quot;External Resource Framework&quot; page into Chinese [FLINK-18628] - Invalid error message for overloaded methods with same parameter name [FLINK-18801] - Add a &quot;10 minutes to Table API&quot; document under the &quot;Python API&quot; -&gt; &quot;User Guide&quot; -&gt; &quot;Table API&quot; section [FLINK-18910] - Create the new document structure for Python documentation according to FLIP-133 [FLINK-18912] - Add a Table API tutorial link(linked to try-flink/python_table_api.md) under the &quot;Python API&quot; -&gt; &quot;GettingStart&quot; -&gt; &quot;Tutorial&quot; section [FLINK-18913] - Add a &quot;TableEnvironment&quot; document under the &quot;Python API&quot; -&gt; &quot;User Guide&quot; -&gt; &quot;Table API&quot; section [FLINK-18917] - Add a &quot;Built-in Functions&quot; link (linked to dev/table/functions/systemFunctions.md) under the &quot;Python API&quot; -&gt; &quot;User Guide&quot; -&gt; &quot;Table API&quot; section [FLINK-19110] - Flatten current PyFlink documentation structure Bug [FLINK-14087] - throws java.lang.ArrayIndexOutOfBoundsException when emiting the data using RebalancePartitioner. [FLINK-15467] - Should wait for the end of the source thread during the Task cancellation [FLINK-16510] - Task manager safeguard shutdown may not be reliable [FLINK-16827] - StreamExecTemporalSort should require a distribution trait in StreamExecTemporalSortRule [FLINK-18081] - Fix broken links in &quot;Kerberos Authentication Setup and Configuration&quot; doc [FLINK-18212] - Init lookup join failed when use udf on lookup table [FLINK-18341] - Building Flink Walkthrough Table Java 0.1 COMPILATION ERROR [FLINK-18421] - Elasticsearch (v6.3.1) sink end-to-end test instable [FLINK-18468] - TaskExecutorITCase.testJobReExecutionAfterTaskExecutorTermination fails with DuplicateJobSubmissionException [FLINK-18552] - Update migration tests in master to cover migration from release-1.11 [FLINK-18581] - Cannot find GC cleaner with java version previous jdk8u72(-b01) [FLINK-18588] - hive ddl create table should support &#39;if not exists&#39; [FLINK-18595] - Deadlock during job shutdown [FLINK-18600] - Kerberized YARN per-job on Docker test failed to download JDK 8u251 [FLINK-18608] - CustomizedConvertRule#convertCast drops nullability [FLINK-18612] - WordCount example failure when setting relative output path [FLINK-18632] - RowData&#39;s row kind do not assigned from input row data when sink code generate and physical type info is pojo type [FLINK-18639] - Error messages from BashJavaUtils are eaten [FLINK-18641] - &quot;Failure to finalize checkpoint&quot; error in MasterTriggerRestoreHook [FLINK-18646] - Managed memory released check can block RPC thread [FLINK-18650] - The description of dispatcher in Flink Architecture document is not accurate [FLINK-18655] - Set failOnUnableToExtractRepoInfo to false for git-commit-id-plugin in module flink-runtime [FLINK-18656] - Start Delay metric is always zero for unaligned checkpoints [FLINK-18659] - FileNotFoundException when writing Hive orc tables [FLINK-18663] - RestServerEndpoint may prevent server shutdown [FLINK-18665] - Filesystem connector should use TableSchema exclude computed columns [FLINK-18672] - Fix Scala code examples for UDF type inference annotations [FLINK-18677] - ZooKeeperLeaderRetrievalService does not invalidate leader in case of SUSPENDED connection [FLINK-18682] - 
Vector orc reader cannot read Hive 2.0.0 table [FLINK-18697] - Adding flink-table-api-java-bridge_2.11 to a Flink job kills the IDE logging [FLINK-18700] - Debezium-json format throws Exception when PG table&#39;s IDENTITY config is not FULL [FLINK-18705] - Debezium-JSON throws NPE when tombstone message is received [FLINK-18708] - The links of the connector sql jar of Kafka 0.10 and 0.11 are extinct [FLINK-18710] - ResourceProfileInfo is not serializable [FLINK-18748] - Savepoint would be queued unexpected if pendingCheckpoints less than maxConcurrentCheckpoints [FLINK-18749] - Correct dependencies in Kubernetes pom [FLINK-18750] - SqlValidatorException thrown when select from a view which contains a UDTF call [FLINK-18769] - MiniBatch doesn&#39;t work with FLIP-95 source [FLINK-18821] - Netty client retry mechanism may cause PartitionRequestClientFactory#createPartitionRequestClient to wait infinitely [FLINK-18832] - BoundedBlockingSubpartition does not work with StreamTask [FLINK-18856] - CheckpointCoordinator ignores checkpointing.min-pause [FLINK-18859] - ExecutionGraphNotEnoughResourceTest.testRestartWithSlotSharingAndNotEnoughResources failed with &quot;Condition was not met in given timeout.&quot; [FLINK-18862] - Fix LISTAGG throws BinaryRawValueData cannot be cast to StringData exception in runtime [FLINK-18867] - Generic table stored in Hive catalog is incompatible between 1.10 and 1.11 [FLINK-18900] - HiveCatalog should error out when listing partitions with an invalid spec [FLINK-18902] - Cannot serve results of asynchronous REST operations in per-job mode [FLINK-18941] - There are some typos in &quot;Set up JobManager Memory&quot; [FLINK-18942] - HiveTableSink shouldn&#39;t try to create BulkWriter factory when using MR writer [FLINK-18956] - StreamTask.invoke should catch Throwable instead of Exception [FLINK-18959] - Fail to archiveExecutionGraph because job is not finished when dispatcher close [FLINK-18992] - Table API renameColumns method annotation error [FLINK-18993] - Invoke sanityCheckTotalFlinkMemory method incorrectly in JobManagerFlinkMemoryUtils.java [FLINK-18994] - There is one typo in &quot;Set up TaskManager Memory&quot; [FLINK-19040] - SourceOperator is not closing SourceReader [FLINK-19061] - HiveCatalog fails to get partition column stats if partition value contains special characters [FLINK-19094] - Revise the description of watermark strategy in Flink Table document [FLINK-19108] - Stop expanding the identifiers with scope aliased by the system with &#39;EXPR$&#39; prefix [FLINK-19109] - Split Reader eats chained periodic watermarks [FLINK-19121] - Avoid accessing HDFS frequently in HiveBulkWriterFactory [FLINK-19133] - User provided kafka partitioners are not initialized correctly [FLINK-19148] - Table crashed in Flink Table API &amp; SQL Docs [FLINK-19166] - StreamingFileWriter should register Listener before the initialization of buckets Improvement [FLINK-16619] - Misleading SlotManagerImpl logging for slot reports of unknown task manager [FLINK-17075] - Add task status reconciliation between TM and JM [FLINK-17285] - Translate &quot;Python Table API&quot; page into Chinese [FLINK-17503] - Make memory configuration logging more user-friendly [FLINK-18598] - Add instructions for asynchronous execute in PyFlink doc [FLINK-18618] - Docker e2e tests are failing on CI [FLINK-18619] - Update training to use WatermarkStrategy [FLINK-18635] - Typo in &#39;concepts/timely stream processing&#39; part of the website [FLINK-18643] - Migrate Jenkins jobs to 
ci-builds.apache.org [FLINK-18644] - Remove obsolete doc for hive connector [FLINK-18730] - Remove Beta tag from SQL Client docs [FLINK-18772] - Hide submit job web ui elements when running in per-job/application mode [FLINK-18793] - Fix Typo for api.common.eventtime.WatermarkStrategy Description [FLINK-18797] - docs and examples use deprecated forms of keyBy [FLINK-18816] - Correct API usage in Pyflink Dependency Management page [FLINK-18831] - Improve the Python documentation about the operations in Table [FLINK-18839] - Add documentation about how to use catalog in Python Table API [FLINK-18847] - Add documentation about data types in Python Table API [FLINK-18849] - Improve the code tabs of the Flink documents [FLINK-18881] - Modify the Access Broken Link [FLINK-19055] - MemoryManagerSharedResourcesTest contains three tests running extraordinary long [FLINK-19105] - Table API Sample Code Error Task [FLINK-18666] - Update japicmp configuration for 1.11.1 [FLINK-18667] - Data Types documentation misunderstand users [FLINK-18678] - Hive connector fails to create vector orc reader if user specifies incorrect hive version `}),e.add({id:138,href:"/2020/09/04/flink-community-update-august20/",title:"Flink Community Update - August'20",section:"Flink Blog",content:`Ah, so much for a quiet August month. This time around, we bring you some new Flink Improvement Proposals (FLIPs), a preview of the upcoming Flink Stateful Functions 2.2 release and a look into how far Flink has come in comparison to 2019.
The Past Month in Flink # Flink Releases # Getting Ready for Flink Stateful Functions 2.2 # The details of the next release of Stateful Functions are under discussion in this @dev mailing list thread, and the feature freeze is set for September 10th — so, you can expect Stateful Functions 2.2 to be released soon after! Some of the most relevant features in the upcoming release are:
DataStream API interoperability, allowing users to embed Stateful Functions pipelines in regular DataStream API programs with DataStream ingress/egress.
Fine-grained control over state for remote functions, including the ability to configure different state expiration modes for each individual function.
As the community around StateFun grows, the release cycle will follow this pattern of smaller and more frequent releases to incorporate user feedback and allow for faster iteration. If you’d like to get involved, we’re always looking for new contributors!
Flink 1.10.2 # The community has announced the second patch version to cover some outstanding issues in Flink 1.10. You can find a detailed list with all the improvements and bugfixes that went into Flink 1.10.2 in the announcement blogpost.
New Flink Improvement Proposals (FLIPs) # The number of FLIPs being created and discussed in the @dev mailing list is growing week over week, as the Flink 1.12 release takes form and some longer-term efforts are kicked-off. Below are some of the new FLIPs to keep an eye out for!
FLIP-131 Consolidate User-Facing APIs and Deprecate the DataSet API The community proposes to deprecate the DataSet API in favor of the Table API/SQL and the DataStream API, in the long run. For this to be feasible, both APIs first need to be adapted and expanded to support the additional use cases currently covered by the DataSet API.
The first discussion to branch out of this "umbrella" FLIP is around support for a batch execution mode in the DataStream API (FLIP-134).
FLIP-135 Approximate Task-Local Recovery To better accommodate recovery scenarios where a certain amount of data loss is tolerable, but a full pipeline restart is not desirable, the community plans to introduce a new failover strategy that allows restarting only the failed task(s). Approximate task-local recovery will allow users to trade consistency for fast failure recovery, which is handy for use cases like online training.
FLIP-136 Improve the interoperability between DataStream and Table API The Table API has seen a great deal of refactoring and new features in recent releases, but the interfaces to and from the DataStream API haven't been updated accordingly. The work in this FLIP will cover multiple known gaps to improve interoperability and expose important functionality also to the DataStream API (e.g. changelog handling).
FLIP-139 Support Stateful Python UDFs Python UDFs have been supported in PyFlink since 1.10, but were so far limited to stateless functions. The community is now looking to introduce stateful aggregate functions (UDAFs) in the Python Table API.
Note: Pandas UDAFs are covered in a separate proposal (FLIP-137).
For a complete overview of the development threads coming up in the project, check the Flink 1.12 Release Wiki and follow the feature discussions in the @dev mailing list.
New Committers and PMC Members # The Apache Flink community has welcomed 1 new PMC Member and 1 new Committer since the last update. Congratulations!
New PMC Members # Dian Fu
New Committers # David Anderson
The Bigger Picture # Flink in 2019: the Aftermath # Roughly a year ago, we did a roundup of community stats to understand how far Flink (and the Flink community) had come in 2019. Where does Flink stand now? What changed?
Perhaps the most impressive result this time around is the surge in activity in the @user-zh mailing list. What started as an effort to better support the Chinese-speaking users early in 2019 is now even exceeding the level of activity of the (already very active) main @user mailing list. Also, @dev1 registered the highest-ever peaks in activity in the months leading up to the release of Flink 1.11!
For what it&rsquo;s worth, the Flink GitHub repository is now heading toward 15k stars, after reaching the 10k milestone last year. If you consider some other numbers we gathered previously on repository activity and releases over time, 2020 is looking like one for the books in the Flink community.
1. Excluding messages from &ldquo;jira@apache.org&rdquo;.
To put these numbers into perspective, the report for the financial year of 2020 from the Apache Software Foundation (ASF) features Flink as one of the most active open source projects, with mentions for:
- Most Active Sources: Visits (#2)
- Top Repositories by Number of Commits (#2)
- Top Most Active Apache Mailing Lists (@user (#1) and @dev (#2))

For more details on where Flink and other open source projects stand in the bigger ASF picture, check out the full report.
Google Season of Docs 2020 Results # In a previous update, we announced that Flink had been selected for Google Season of Docs (GSoD) 2020, an initiative to pair technical writers with mentors to work on documentation for open source projects. Today, we&rsquo;d like to welcome the two technical writers that will be working with the Flink community to improve the Table API/SQL documentation: Kartik Khare and Muhammad Haseeb Asif!
Kartik is a software engineer at Walmart Labs and a regular contributor to multiple Apache projects. He is also a prolific writer on Medium and has previously published on the Flink blog. Last year, he contributed to Apache Airflow as part of GSoD and he&rsquo;s currently revamping the Apache Pinot documentation.
Muhammad is a dual degree master student at KTH and TU Berlin, with a focus on distributed systems and data intensive processing (in particular, performance optimization of state backends). He writes frequently about Flink on Medium and you can catch him at Flink Forward later this year!
We&rsquo;re looking forward to the next 3 months of collaboration, and would like to thank again all the applicants that invested time into their applications for GSoD with Flink.
Upcoming Events (and More!) # With conference season in full swing, we&rsquo;re glad to see some great Flink content coming up in September! Here, we highlight some of the Flink talks happening soon in virtual events.
As usual, we also leave you with some resources to read and explore.
Events
- ODSC Europe (Sep. 17-19): Snakes on a Plane: Interactive Data Exploration with PyFlink and Zeppelin Notebooks
- Big Data LDN (Sep. 23-24): Flink SQL: From Real-Time Pattern Detection to Online View Maintenance
- ApacheCon @Home (Sep. 29-Oct.1):
  - Integrate Apache Flink with Cloud Native Ecosystem
  - Snakes on a Plane: Interactive Data Exploration with PyFlink and Zeppelin Notebooks
  - Interactive Streaming Data Analytics via Flink on Zeppelin
  - Flink SQL in 2020: Time to show off!
  - Change Data Capture with Flink SQL and Debezium
  - Real-Time Stock Processing With Apache NiFi, Apache Flink and Apache Kafka
  - Using the Mm FLaNK Stack for Edge AI (Apache MXNet, Apache Flink, Apache NiFi, Apache Kafka, Apache Kudu)

Blogposts
- Flink 1.11 Series:
  - The State of Flink on Docker
  - Accelerating your workload with GPU and other external resources
  - PyFlink: The integration of Pandas into PyFlink
- Other:
  - Monitoring and Controlling Networks of IoT Devices with Flink Stateful Functions
  - Advanced Flink Application Patterns Vol.3: Custom Window Processing

Flink Packages
Flink Packages is a website where you can explore (and contribute to) the Flink ecosystem of connectors, extensions, APIs, tools and integrations. New in:
- Flink CDC Connectors
- Flink File Source
- Flink DynamoDB Connector

If you’d like to keep a closer eye on what’s happening in the community, subscribe to the Flink @community mailing list to get fine-grained weekly updates, upcoming event announcements and more.
`}),e.add({id:139,href:"/2020/09/01/memory-management-improvements-for-flinks-jobmanager-in-apache-flink-1.11/",title:"Memory Management improvements for Flink’s JobManager in Apache Flink 1.11",section:"Flink Blog",content:`Apache Flink 1.11 comes with significant changes to the memory model of Flink’s JobManager and configuration options for your Flink clusters. These recently-introduced changes make Flink adaptable to all kinds of deployment environments (e.g. Kubernetes, Yarn, Mesos), providing better control over its memory consumption.
The previous blog post focused on the memory model of the TaskManagers and how it was improved in Flink 1.10. This post addresses the same topic but for the JobManager instead. Flink 1.11 unifies the memory model of Flink’s processes. The newly-introduced memory model of the JobManager follows a similar approach to that of the TaskManagers; it is simpler and has fewer components and tuning knobs. This post might consequently seem very similar to our previous story on Flink’s memory but aims at providing a complete overview of Flink’s JobManager memory model as of Flink 1.11. Read on for a full list of updates and changes below!
Introduction to Flink’s process memory model # Having a clear understanding of Apache Flink’s process memory model allows you to manage resources for the various workloads more efficiently. The following diagram illustrates the main memory components of a Flink process:
Flink: Total Process Memory The JobManager process is a JVM process. On a high level, its memory consists of the JVM Heap and Off-Heap memory. These types of memory are consumed by Flink directly or by the JVM for its specific purposes (i.e. metaspace). There are two major memory consumers within the JobManager process: the framework itself consuming memory for internal data structures, network communication, etc. and the user code which runs within the JobManager process, e.g. in certain batch sources or during checkpoint completion callbacks.
Note Please note that the user code has direct access to all memory types: JVM Heap, Direct and Native memory. Therefore, Flink cannot really control its allocation and usage. How to set up JobManager memory # With the release of Flink 1.11 and in order to provide a better user experience, the Flink community introduced three alternatives to setting up memory in JobManagers.
The first two — and simplest — alternatives are configuring one of the two following options for total memory available for the JVM process of the JobManager:
Total Process Memory: total memory consumed by the Flink Java application (including user code) and by the JVM to run the whole process. Total Flink Memory: only the memory consumed by the Flink Java application, including user code but excluding any memory allocated by the JVM to run it. It is advisable to configure the Total Flink Memory for standalone deployments where explicitly declaring how much memory is given to Flink is a common practice, while the outer JVM overhead is of little interest. For the cases of deploying Flink in containerized environments (such as Kubernetes, Yarn or Mesos), the Total Process Memory option is recommended instead, because it becomes the size for the total memory of the requested container. Containerized environments usually strictly enforce this memory limit.
If you want more fine-grained control over the size of the JVM Heap, there is also the third alternative of configuring it directly. This alternative gives a clear separation between the heap memory and any other memory types.
The remaining memory components will be automatically adjusted either based on their default values or additionally-configured parameters. Apache Flink also checks the overall consistency. You can find more information about the different memory components in the corresponding documentation. You can try different configuration options with the configuration spreadsheet (you have to make a copy of the spreadsheet to edit it) of FLIP-116 and check the corresponding results for your individual case.
If you are migrating from a Flink version older than 1.11, we suggest following the steps in the migration guide of the Flink documentation.
Additionally, you can configure separately the Off-heap memory (JVM direct and non-direct memory) as well as the JVM metaspace &amp; overhead. The JVM overhead is a fraction of the Total Process Memory. The JVM overhead can be configured in a similar way as other fractions described in our previous blog post about the TaskManager’s memory model.
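For illustration, a minimal flink-conf.yaml sketch is shown below; the sizes are hypothetical values chosen only for this example, and in practice you would typically pick just one of the first three options as the main knob:

# Hypothetical values for illustration only; pick ONE of the three main options.
jobmanager.memory.process.size: 1600m      # Total Process Memory (recommended for containerized deployments)
# jobmanager.memory.flink.size: 1280m      # Total Flink Memory (recommended for standalone deployments)
# jobmanager.memory.heap.size: 1024m       # direct control over the JVM Heap

# Optional fine-tuning of the remaining components:
jobmanager.memory.off-heap.size: 128m
jobmanager.memory.jvm-metaspace.size: 256m
jobmanager.memory.jvm-overhead.fraction: 0.1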
More hints to control the container memory limit # The heap and direct memory usage are managed by the JVM. There are also many other possible sources of native memory consumption in Apache Flink or its user applications which are not managed directly by Flink or the JVM. Controlling their limits is often difficult, which complicates debugging of potential memory leaks. If Flink’s process allocates too much memory in an unmanaged way, it can often result in its containers being killed in containerized environments. In this case, it might be hard to understand which type of memory consumption has exceeded its limit and how to resolve the problem. Flink 1.11 introduces some specific tuning options to clearly represent such components for the JobManager’s process. Although Flink cannot always enforce strict limits and borders among them, the idea here is to explicitly plan the memory usage. Below we provide some examples of how memory setup can prevent containers from exceeding their memory limit:
User code or its dependencies consume significant off-heap memory. Tuning the Off-heap option can assign additional direct or native memory to the user code or any of its dependencies. Flink cannot control native allocations but it sets the limit for JVM Direct memory allocations. The Direct memory limit is enforced by the JVM.
JVM metaspace requires additional memory. If you encounter OutOfMemoryError: Metaspace, Flink provides an option to increase its default limit and the JVM will ensure that it is not exceeded. The metaspace size of a Flink JVM process is always explicitly set in contrast to the default JVM settings where it is not limited.
JVM requires more internal memory. There is no direct control over certain types of JVM process allocations but Flink provides JVM Overhead options. The JVM Overhead options allow declaring an additional amount of memory, anticipated for those allocations and not covered by other options.
Conclusion # The latest Flink release (Flink 1.11) introduces some notable changes to the memory configuration of Flink’s JobManager, making its memory management significantly easier than before. Stay tuned for more additions and features in upcoming releases. If you have any suggestions or questions for the Flink community, we encourage you to sign up to the Apache Flink mailing lists and become part of the discussion.
`}),e.add({id:140,href:"/2020/08/25/apache-flink-1.10.2-released/",title:"Apache Flink 1.10.2 Released",section:"Flink Blog",content:"The Apache Flink community released the second bugfix version of the Apache Flink 1.10 series.\nThis release includes 73 fixes and minor improvements for Flink 1.10.1. The list below includes a detailed list of all fixes and improvements.\nWe highly recommend all users to upgrade to Flink 1.10.2.\nNote After FLINK-18242, the deprecated `OptionsFactory` and `ConfigurableOptionsFactory` classes are removed (not applicable for release-1.10), please use `RocksDBOptionsFactory` and `ConfigurableRocksDBOptionsFactory` instead. Please also recompile your application codes if any class extending `DefaultConfigurableOptionsFactory` Note After FLINK-17800 by default we will set `setTotalOrderSeek` to true for RocksDB's `ReadOptions`, to prevent user from miss using `optimizeForPointLookup`. Meantime we support customizing `ReadOptions` through `RocksDBOptionsFactory`. Please set `setTotalOrderSeek` back to false if any performance regression observed (normally won't happen according to our testing). Updated Maven dependencies:\n&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.10.2&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.10.2&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.10.2&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.\nList of resolved issues:\nSub-task [FLINK-15836] - Throw fatal error in KubernetesResourceManager when the pods watcher is closed with exception [FLINK-16160] - Schema#proctime and Schema#rowtime don&#39;t work in TableEnvironment#connect code path Bug [FLINK-13689] - Rest High Level Client for Elasticsearch6.x connector leaks threads if no connection could be established [FLINK-14369] - KafkaProducerAtLeastOnceITCase&gt;KafkaProducerTestBase.testOneToOneAtLeastOnceCustomOperator fails on Travis [FLINK-14836] - Unable to set yarn container number for scala shell in yarn mode [FLINK-14894] - HybridOffHeapUnsafeMemorySegmentTest#testByteBufferWrap failed on Travis [FLINK-15758] - Investigate potential out-of-memory problems due to managed unsafe memory allocation [FLINK-15849] - Update SQL-CLIENT document from type to data-type [FLINK-16309] - ElasticSearch 7 connector is missing in SQL connector list [FLINK-16346] - BlobsCleanupITCase.testBlobServerCleanupCancelledJob fails on Travis [FLINK-16432] - Building Hive connector gives problems [FLINK-16451] - Fix IndexOutOfBoundsException for DISTINCT AGG with constants [FLINK-16510] - Task manager safeguard shutdown may not be reliable [FLINK-17092] - Pyflink test BlinkStreamDependencyTests is instable [FLINK-17322] - Enable latency tracker would corrupt the broadcast state [FLINK-17420] - Cannot alias Tuple and Row fields when converting DataStream to Table [FLINK-17466] - toRetractStream doesn&#39;t work correctly with Pojo conversion class [FLINK-17555] - docstring of pyflink.table.descriptors.FileSystem:1:duplicate object description of pyflink.table.descriptors.FileSystem [FLINK-17558] - Partitions are released in TaskExecutor Main Thread [FLINK-17562] - POST /jars/:jarid/plan is not working [FLINK-17578] - 
Union of 2 SideOutputs behaviour incorrect [FLINK-17639] - Document which FileSystems are supported by the StreamingFileSink [FLINK-17643] - LaunchCoordinatorTest fails [FLINK-17700] - The callback client of JavaGatewayServer should run in a daemon thread [FLINK-17744] - StreamContextEnvironment#execute cannot be call JobListener#onJobExecuted [FLINK-17763] - No log files when starting scala-shell [FLINK-17788] - scala shell in yarn mode is broken [FLINK-17800] - RocksDB optimizeForPointLookup results in missing time windows [FLINK-17801] - TaskExecutorTest.testHeartbeatTimeoutWithResourceManager timeout [FLINK-17809] - BashJavaUtil script logic does not work for paths with spaces [FLINK-17822] - Nightly Flink CLI end-to-end test failed with &quot;JavaGcCleanerWrapper$PendingCleanersRunner cannot access class jdk.internal.misc.SharedSecrets&quot; in Java 11 [FLINK-17870] - dependent jars are missing to be shipped to cluster in scala shell [FLINK-17891] - FlinkYarnSessionCli sets wrong execution.target type [FLINK-17959] - Exception: &quot;CANCELLED: call already cancelled&quot; is thrown when run python udf [FLINK-18008] - HistoryServer does not log environment information on startup [FLINK-18012] - Deactivate slot timeout if TaskSlotTable.tryMarkSlotActive is called [FLINK-18035] - Executors#newCachedThreadPool could not work as expected [FLINK-18045] - Fix Kerberos credentials checking to unblock Flink on secured MapR [FLINK-18048] - &quot;--host&quot; option could not take effect for standalone application cluster [FLINK-18097] - History server doesn&#39;t clean all job json files [FLINK-18168] - Error results when use UDAF with Object Array return type [FLINK-18223] - AvroSerializer does not correctly instantiate GenericRecord [FLINK-18241] - Custom OptionsFactory in user code not working when configured via flink-conf.yaml [FLINK-18242] - Custom OptionsFactory settings seem to have no effect on RocksDB [FLINK-18297] - SQL client: setting execution.type to invalid value shuts down the session [FLINK-18329] - Dist NOTICE issues [FLINK-18352] - org.apache.flink.core.execution.DefaultExecutorServiceLoader not thread safe [FLINK-18517] - kubernetes session test failed with &quot;java.net.SocketException: Broken pipe&quot; [FLINK-18539] - StreamExecutionEnvironment#addSource(SourceFunction, TypeInformation) doesn&#39;t use the user defined type information [FLINK-18595] - Deadlock during job shutdown [FLINK-18646] - Managed memory released check can block RPC thread [FLINK-18663] - RestServerEndpoint may prevent server shutdown [FLINK-18677] - ZooKeeperLeaderRetrievalService does not invalidate leader in case of SUSPENDED connection [FLINK-18702] - Flink elasticsearch connector leaks threads and classloaders thereof [FLINK-18815] - AbstractCloseableRegistryTest.testClose unstable [FLINK-18821] - Netty client retry mechanism may cause PartitionRequestClientFactory#createPartitionRequestClient to wait infinitely [FLINK-18859] - ExecutionGraphNotEnoughResourceTest.testRestartWithSlotSharingAndNotEnoughResources failed with &quot;Condition was not met in given timeout.&quot; [FLINK-18902] - Cannot serve results of asynchronous REST operations in per-job mode New Feature [FLINK-17844] - Activate japicmp-maven-plugin checks for @PublicEvolving between bug fix releases (x.y.u -&gt; x.y.v) Improvement [FLINK-16217] - SQL Client crashed when any uncatched exception is thrown [FLINK-16225] - Metaspace Out Of Memory should be handled as Fatal Error in TaskManager [FLINK-16619] - Misleading 
SlotManagerImpl logging for slot reports of unknown task manager [FLINK-16717] - Use headless service for rpc and blob port when flink on K8S [FLINK-17248] - Make the thread nums of io executor of ClusterEntrypoint and MiniCluster configurable [FLINK-17503] - Make memory configuration logging more user-friendly [FLINK-17819] - Yarn error unhelpful when forgetting HADOOP_CLASSPATH [FLINK-17920] - Add the Python example of Interval Join in Table API doc [FLINK-17945] - Improve error reporting of Python CI tests [FLINK-17970] - Increase default value of IO pool executor to 4 * #cores [FLINK-18010] - Add more logging to HistoryServer [FLINK-18501] - Mapping of Pluggable Filesystems to scheme is not properly logged [FLINK-18644] - Remove obsolete doc for hive connector [FLINK-18772] - Hide submit job web ui elements when running in per-job/application mode "}),e.add({id:141,href:"/2020/08/20/the-state-of-flink-on-docker/",title:"The State of Flink on Docker",section:"Flink Blog",content:`With over 50 million downloads from Docker Hub, the Flink docker images are a very popular deployment option.
The Flink community recently put some effort into improving the Docker experience for our users with the goal to reduce confusion and improve usability.
Let&rsquo;s quickly break down the recent improvements:
Reduce confusion: Flink used to have 2 Dockerfiles and a 3rd file maintained outside of the official repository — all with different features and varying stability. Now, we have one central place for all images: apache/flink-docker.
Here, we keep all the Dockerfiles for the different releases. Check out the detailed readme of that repository for further explanation on the different branches, as well as the Flink Improvement Proposal (FLIP-111) that contains the detailed planning.
The apache/flink-docker repository also seeds the official Flink image on Docker Hub.
Improve Usability: The Dockerfiles are used for various purposes: Native Docker deployments, Flink on Kubernetes, the (unofficial) Flink helm example and the project&rsquo;s internal end to end tests. With one unified image, all these consumers of the images benefit from the same set of features, documentation and testing.
The new images support passing configuration variables via a FLINK_PROPERTIES environment variable. Users can enable default plugins with the ENABLE_BUILT_IN_PLUGINS environment variable. The images also allow loading custom jar paths and configuration files.
Looking into the future, there are already some interesting potential improvements lined up:
- Java 11 Docker images (already completed)
- Use vanilla docker-entrypoint with flink-kubernetes (in progress)
- History server support
- Support for OpenShift

How do I get started? # This is a short tutorial on how to start a Flink Session Cluster with Docker.
A Flink Session cluster can be used to run multiple jobs. Each job needs to be submitted to the cluster after it has been deployed. To deploy a Flink Session cluster with Docker, you need to start a JobManager container. To enable communication between the containers, we first set a required Flink configuration property and create a network:
FLINK_PROPERTIES=&#34;jobmanager.rpc.address: jobmanager&#34;
docker network create flink-network

Then we launch the JobManager:
docker run \\
  --rm \\
  --name=jobmanager \\
  --network flink-network \\
  -p 8081:8081 \\
  --env FLINK_PROPERTIES=&#34;\${FLINK_PROPERTIES}&#34; \\
  flink:1.11.1 jobmanager

and one or more TaskManager containers:
docker run \\
  --rm \\
  --name=taskmanager \\
  --network flink-network \\
  --env FLINK_PROPERTIES=&#34;\${FLINK_PROPERTIES}&#34; \\
  flink:1.11.1 taskmanager

You now have a fully functional Flink cluster running! You can access the web front end here: localhost:8081.
Let&rsquo;s now submit one of Flink&rsquo;s example jobs:
# 1: (optional) Download the Flink distribution, and unpack it wget https://archive.apache.org/dist/flink/flink-1.11.1/flink-1.11.1-bin-scala_2.12.tgz tar xf flink-1.11.1-bin-scala_2.12.tgz cd flink-1.11.1 # 2: Start the Flink job ./bin/flink run ./examples/streaming/TopSpeedWindowing.jar The main steps of the tutorial are also recorded in this short screencast:
Next steps: Now that you&rsquo;ve successfully completed this tutorial, we recommend checking out the full Flink on Docker documentation for implementing more advanced deployment scenarios, such as Job Clusters, Docker Compose or our native Kubernetes integration.
Conclusion # We encourage all readers to try out Flink on Docker and provide the community with feedback to further improve the experience. Please use the user@flink.apache.org mailing list (remember to subscribe first) for general questions, and our issue tracker for specific bugs, improvements, or ideas for contributions!
`}),e.add({id:142,href:"/2020/08/18/monitoring-and-controlling-networks-of-iot-devices-with-flink-stateful-functions/",title:"Monitoring and Controlling Networks of IoT Devices with Flink Stateful Functions",section:"Flink Blog",content:`In this blog post, we&rsquo;ll take a look at a class of use cases that is a natural fit for Flink Stateful Functions: monitoring and controlling networks of connected devices (often called the “Internet of Things” (IoT)).
IoT networks are composed of many individual, but interconnected components, which makes getting some kind of high-level insight into the status, problems, or optimization opportunities in these networks not trivial. Each individual device “sees” only its own state, which means that the status of groups of devices, or even the network as a whole, is often a complex aggregation of the individual devices’ state. Diagnosing, controlling, or optimizing these groups of devices thus requires distributed logic that analyzes the &ldquo;bigger picture&rdquo; and then acts upon it.
A powerful approach to implement this is using digital twins: each device has a corresponding virtual entity (i.e. the digital twin), which also captures their relationships and interactions. The digital twins track the status of their corresponding devices and send updates to other twins, representing groups (such as geographical regions) of devices. Those, in turn, handle the logic to obtain the network&rsquo;s aggregated view, or this &ldquo;bigger picture&rdquo; we mentioned before.
Our Scenario: Datacenter Monitoring and Alerting # Fig.1 An oversimplified view of a data center. There are many examples of the digital twins approach in the real world, such as smart grids of batteries, smart cities, or monitoring infrastructure software clusters. In this blogpost, we&rsquo;ll use the example of data center monitoring and alert correlation implemented with Stateful Functions.
Consider a very simplified view of a data center, consisting of many thousands of commodity servers arranged in server racks. Each server rack typically contains up to 40 servers, with a ToR (Top of the Rack) network switch connected to each server. The switches from all the racks connect through a larger switch (Fig. 1).
In this datacenter, many things can go wrong: a disk in a server can stop working, network cards can start dropping packets, or ToR switches might cease to function. The entire data center might also be affected by power supply degradation, causing servers to operate at reduced capacity. On-site engineers must be able to identify these incidents quickly and fix them promptly.
Diagnosing individual server failures is rather straightforward: take a recent history of metric reports from that particular server, analyse it and pinpoint the anomaly. On the other hand, other incidents only make sense &ldquo;together&rdquo;, because they share a common root cause. Diagnosing or predicting causes of networking degradation at a rack or datacenter level requires an aggregate view of metrics (such as package drop rates) from the individual machines and racks, and possibly some prediction model or diagnosis code that runs under certain conditions.
Monitoring a Virtual Datacenter via Digital Twins # For the sake of this blog post, our oversimplified data center has some servers and racks, each with a unique ID. Each server has a metrics-collecting daemon that publishes metrics to a message queue, and there is a provisioning service that operators will use to request server commissioning and decommissioning.
Our application will consume these server metrics and commission/decommission events, and produce server/rack/datacenter alerts. There will also be an operator consuming any alerts triggered by the monitoring system. In the next section, we&rsquo;ll show how this use case can be naturally modeled with Stateful Functions (StateFun).
Implementing the use case with Flink StateFun # You can find the code for this example at: https://github.com/igalshilman/iot-statefun-blogpost The basic building block for modeling a StateFun application is a stateful function, which has the following properties:
It has a unique logical address, and persisted, fault-tolerant state scoped to that address.
It can react to messages, both internal (or, sent from other stateful functions) and external (e.g. a message from Kafka).
Invocations of a specific function are serialized, so messages sent to a specific address are not executed concurrently.
There can be many billions of function instances in a single StateFun cluster.
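To make these properties concrete, here is a minimal sketch of an embedded stateful function written against the StateFun 2.x Java SDK. The function type, the message handling and the counter state are purely illustrative and are not taken from the example repository:

import org.apache.flink.statefun.sdk.Context;
import org.apache.flink.statefun.sdk.FunctionType;
import org.apache.flink.statefun.sdk.StatefulFunction;
import org.apache.flink.statefun.sdk.annotations.Persisted;
import org.apache.flink.statefun.sdk.state.PersistedValue;

public class CountingFun implements StatefulFunction {

  // The logical address of an instance is this type plus a caller-chosen id.
  public static final FunctionType TYPE = new FunctionType("blogpost.example", "counter");

  // Fault-tolerant state, scoped to the address of the instance being invoked.
  @Persisted
  private final PersistedValue<Integer> seenMessages =
      PersistedValue.of("seen-messages", Integer.class);

  @Override
  public void invoke(Context context, Object input) {
    // React to a message: update the per-address counter state.
    int seen = seenMessages.getOrDefault(0) + 1;
    seenMessages.set(seen);
    // From here the function could message other functions via context.send(...)
    // or answer the function that called it via context.reply(...).
  }
}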
To model our use case, we&rsquo;ll define three functions: ServerFun, RackFun and DataCenterFun.
ServerFun
Each physical server is represented with its digital twin stateful function. This function is responsible for:
Maintaining a sliding window of incoming metrics.
Applying a model that decides whether or not to trigger an alert.
Alerting if metrics are missing for too long.
Notifying its containing RackFun about any open incidents.
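A heavily trimmed-down skeleton of such a server twin might look like the following. The MetricReport type, the rack addressing and the evaluate() helper are assumptions made for this sketch and do not mirror the classes in the linked repository:

import java.util.Collections;
import java.util.List;
import org.apache.flink.statefun.sdk.Address;
import org.apache.flink.statefun.sdk.Context;
import org.apache.flink.statefun.sdk.FunctionType;
import org.apache.flink.statefun.sdk.StatefulFunction;

public class ServerFunSketch implements StatefulFunction {

  public static final FunctionType TYPE = new FunctionType("blogpost.example", "server");
  public static final FunctionType RACK_TYPE = new FunctionType("blogpost.example", "rack");

  @Override
  public void invoke(Context context, Object input) {
    if (!(input instanceof MetricReport)) {
      return;
    }
    MetricReport report = (MetricReport) input;
    // Apply some anomaly detection model to the report (and, in a real
    // implementation, to the sliding window of previous reports kept in state).
    List<String> openIncidents = evaluate(report);
    if (!openIncidents.isEmpty()) {
      // Notify the containing RackFun twin, addressed by the rack id.
      context.send(new Address(RACK_TYPE, report.rackId()), openIncidents);
    }
  }

  // Hypothetical placeholder type and model, not part of the StateFun SDK.
  interface MetricReport { String rackId(); }

  private List<String> evaluate(MetricReport report) {
    return Collections.emptyList();
  }
}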
RackFun
While the ServerFun is responsible for identifying server-local incidents, we need a function that correlates incidents happening on the different servers deployed in the same rack and:
Collects open incidents reported by the ServerFun functions.
Maintains a histogram of currently open incidents on this rack.
Applies a correlation model to the individual incidents sent by the ServerFun, and reports high-level, related incidents as a single incident to the DataCenterFun.
DataCenterFun
This function maintains a view of incidents across different racks in our datacenter.
To summarize our plan:
Leaf functions ingest raw metric data (blue lines), and apply localized logic to trigger an alert.
Intermediate functions operate on already summarized events (orange lines) and correlate them into high-level events.
A root function correlates the high-level events across the intermediate functions and into a single healthy/not healthy value.
How does it really look? # ServerFun # This section associates a behaviour with every message that the function expects to be invoked with. The metricsHistory buffer is our sliding window of the last 15 minutes&rsquo; worth of ServerMetricReports. Note that this buffer is configured to expire entries 15 minutes after they were written. serverHealthState represents the current physical server state, open incidents and so on. Let&rsquo;s take a look at what happens when a ServerMetricReport message arrives:
Retrieve the previously computed serverHealthState that is kept in state. Evaluate a model on the sliding window of the previous metric reports + the current metric reported + the previously computed server state to obtain an assessment of the current server health. If the server is not believed to be healthy, emit an alert via an alerts topic, and also send a message to our containing rack with all the open incidents that this server currently has. We'll omit the other handlers for brevity, but it's important to mention that onTimer makes sure that metric reports are coming in periodically, otherwise it'd trigger an alert stating that we didn’t hear from that server for a long time. RackFun # This function keeps a mapping between a ServerId and a set of open incidents on that server. When new alerts are received, this function tries to correlate the alert with any other open alerts on that rack. If a correlated rack alert is present, this function notifies the DataCenterFun about it. DataCenterFun # A persisted mapping between a RackId and the latest alert that rack reported. Throughout the usage of ingress/egress pairs, this function can report back its current view of the world of what racks are currently known to be unhealthy. An operator (via a front-end) can send a GetUnhealthyRacks message addressed to that DataCenterFun, and wait for the corresponding response message(UnhealthyRacks). Whenever a rack reports OK, it&rsquo;ll be removed from the unhealthy racks map. Conclusion # This pattern — where each layer of functions performs a stateful aggregation of events sent from the previous layer (or the input) — is useful for a whole class of problems. And, although we used connected devices to motivate this use case, it&rsquo;s not limited to the IoT domain.
Stateful Functions provides the building blocks necessary for building complex distributed applications (here the digital twins that support analysis and interactions of the physical entities), while removing common complexities of distributed systems like service discovery, retries, circuit breakers, state management, scalability and similar challenges. If you&rsquo;d like to learn more about Stateful Functions, head over to the official documentation, where you can also find more hands-on tutorials to try out yourself!
`}),e.add({id:143,href:"/2020/08/06/accelerating-your-workload-with-gpu-and-other-external-resources/",title:"Accelerating your workload with GPU and other external resources",section:"Flink Blog",content:`Apache Flink 1.11 introduces a new External Resource Framework, which allows you to request external resources from the underlying resource management systems (e.g., Kubernetes) and accelerate your workload with those resources. As Flink provides a first-party GPU plugin at the moment, we will take GPU as an example and show how it affects Flink applications in the AI field. Other external resources (e.g. RDMA and SSD) can also be supported in a pluggable manner.
End-to-end real-time AI with GPU # Recently, AI and Machine Learning have gained additional popularity and have been widely used in various scenarios, such as personalized recommendation and image recognition. Flink, with the ability to support GPU allocation, can be used to build an end-to-end real-time AI workflow.
Why Flink # Typical AI workloads fall into two categories: training and inference.
Typical AI Workflow The training workload is usually a batch task, in which we train a model from a bounded dataset. On the other hand, the inference workload tends to be a streaming job. It consumes an unbounded data stream, which contains image data, for example, and uses a model to produce the output of predictions. Both workloads need to do data preprocessing first. Flink, as a unified batch and stream processing engine, can be used to build an end-to-end AI workflow naturally.
In many cases, training and inference workloads can benefit significantly from leveraging GPUs. Research shows that a GPU cluster outperforms a CPU cluster of similar cost by about 400 percent. As training datasets get bigger and models more complex, GPU support has become mandatory for running AI workloads.
With the External Resource Framework and its GPU plugin, Flink can now request GPU resources from the external resource management system and expose GPU information to operators. With this feature, users can now easily build end-to-end training and real-time inference pipelines with GPU support on Flink.
Example: MNIST Inference with Flink # We take the MNIST inference task as an example to show how to use the External Resource Framework and how to leverage GPUs in Flink. MNIST is a database of handwritten digits, which is usually viewed as the HelloWorld of AI. The goal is to recognize a 28px*28px picture of a number from 0 to 9.
First, you need to set configurations for the external resource framework to enable GPU support:
external-resources: gpu # Define the driver factory class of gpu resource. external-resource.gpu.driver-factory.class: org.apache.flink.externalresource.gpu.GPUDriverFactory # Define the amount of gpu resource per TaskManager. external-resource.gpu.amount: 1 # Enable the coordination mode if you run it in standalone mode external-resource.gpu.param.discovery-script.args: --enable-coordination # If you run it on Yarn external-resource.gpu.yarn.config-key: yarn.io/gpu # If you run it on Kubernetes external-resource.gpu.kubernetes.config-key: nvidia.com/gpu For more details of the configuration, please refer to the official documentation.
In the MNIST inference task, we first need to read the images and do data preprocessing. You can download training or testing data from this site. We provide a simple MNISTReader. It will read the image data located in the provided file path and transform each image into a list of floating point numbers.
Then, we need a classifier to recognize those images. A one-layer pre-trained neural network, whose prediction accuracy is 92.14%, is used in our classify operator. To leverage GPUs in order to accelerate the matrix-vector multiplication, we use JCuda to call the native Cuda API. The prediction logic of the MNISTClassifier is shown below.
class MNISTClassifier extends RichMapFunction&lt;List&lt;Float&gt;, Integer&gt; { @Override public void open(Configuration parameters) { // Get the GPU information and select the first GPU. final Set&lt;ExternalResourceInfo&gt; externalResourceInfos = getRuntimeContext().getExternalResourceInfos(resourceName); final Optional&lt;String&gt; firstIndexOptional = externalResourceInfos.iterator().next().getProperty(&#34;index&#34;); // Initialize JCublas with the selected GPU JCuda.cudaSetDevice(Integer.parseInt(firstIndexOptional.get())); JCublas.cublasInit(); } @Override public Integer map(List&lt;Float&gt; value) { // Performs multiplication using JCublas. The matrixPointer points to our pre-trained model. JCublas.cublasSgemv(&#39;n&#39;, DIMENSIONS.f1, DIMENSIONS.f0, 1.0f, matrixPointer, DIMENSIONS.f1, inputPointer, 1, 0.0f, outputPointer, 1); // Read the result back from GPU. JCublas.cublasGetVector(DIMENSIONS.f1, Sizeof.FLOAT, outputPointer, 1, Pointer.to(output), 1); int result = 0; for (int i = 0; i &lt; DIMENSIONS.f1; ++i) { result = output[i] &gt; output[result] ? i : result; } return result; } } The complete MNIST inference project can be found here. In this project, we simply print the inference result to STDOUT. In the actual production environment, you could also write the result to Elasticsearch or Kafka, for example.
The MNIST inference task is just a simple case that shows you how the external resource framework works and what Flink can do with GPU support. With Flink’s open source extension Alink, which contains a lot of pre-built algorithms based on Flink, and Tensorflow on Flink, some complex AI workloads, e.g. online learning, real-time inference service, could be easily implemented as well.
Other external resources # In addition to GPU support, there are many other external resources that can be used to accelerate jobs in specific scenarios. For example, FPGAs, which are used for AI workloads, are supported by both Yarn and Kubernetes. Some low-latency network devices, like RDMA and Solarflare, also provide their own device plugins for Kubernetes. Currently, Yarn supports GPUs and FPGAs, while the list of Kubernetes’ device plugins can be found here.
With the external resource framework, you only need to implement a plugin that enables the operator to get the information for these external resources; see Custom Plugin for more details. If you just want to ensure that an external resource exists in the TaskManager, then you only need to find the configuration key of that resource in the underlying resource management system and configure the external resource framework accordingly.
Conclusion # In the latest Flink release (Flink 1.11), an external resource framework has been introduced to support requesting various types of resources from the underlying resource management systems, and supply all the necessary information for using these resources to the operators. The first-party GPU plugin expands the application prospects of Flink in the AI domain. Different resource types can be supported in a pluggable way. You can also implement your own plugins for custom resource types.
Future developments in this area include implementing operator level resource isolation and fine-grained external resource scheduling. The community may kick this work off once FLIP-56 is finished. If you have any suggestions or questions for the community, we encourage you to sign up to the Apache Flink mailing lists and join the discussion there.
`}),e.add({id:144,href:"/2020/08/04/pyflink-the-integration-of-pandas-into-pyflink/",title:"PyFlink: The integration of Pandas into PyFlink",section:"Flink Blog",content:`Python has evolved into one of the most important programming languages for many fields of data processing. Such has been Python’s popularity that it has pretty much become the default data processing language for data scientists. On top of that, there is a plethora of Python-based data processing tools such as NumPy, Pandas, and Scikit-learn that have gained additional popularity due to their flexibility and powerful functionality.
Pic source: VanderPlas 2017, slide 52. In an effort to meet the user needs and demands, the Flink community hopes to leverage and make better use of these tools. Along this direction, the Flink community put great effort into integrating Pandas into PyFlink with the latest Flink version 1.11. Some of the added features include support for Pandas UDF and the conversion between Pandas DataFrame and Table. Pandas UDFs not only greatly improve the execution performance of Python UDFs, but also make it more convenient for users to leverage libraries such as Pandas and NumPy in Python UDFs. Additionally, providing support for the conversion between Pandas DataFrame and Table enables users to switch processing engines seamlessly without the need for an intermediate connector. In the remainder of this article, we will introduce how these functionalities work and how to use them with a step-by-step example.
Note Currently, only Scalar Pandas UDFs are supported in PyFlink. Pandas UDF in Flink 1.11 # Using scalar Python UDF was already possible in Flink 1.10 as described in a previous article on the Flink blog. Scalar Python UDFs work based on three primary steps:
the Java operator serializes one input row to bytes and sends them to the Python worker;
the Python worker deserializes the input row and evaluates the Python UDF with it;
the resulting row is serialized and sent back to the Java operator
While providing support for Python UDFs in PyFlink greatly improved the user experience, it had some drawbacks, namely resulting in:
High serialization/deserialization overhead
Difficulty when leveraging popular Python libraries used by data scientists — such as Pandas or NumPy — that provide high-performance data structures and functions.
Pandas UDFs were introduced to address these drawbacks. For a Pandas UDF, a batch of rows is transferred between the JVM and the PVM in a columnar format (the Arrow memory format). The batch of rows is converted into a collection of Pandas Series and passed to the Pandas UDF, which can then leverage popular Python libraries (such as Pandas or NumPy) for the Python UDF implementation.
The performance of vectorized UDFs is usually much higher when compared to the normal Python UDF, as the serialization/deserialization overhead is minimized by falling back to Apache Arrow, while handling pandas.Series as input/output allows us to take full advantage of the Pandas and NumPy libraries, making it a popular solution to parallelize Machine Learning and other large-scale, distributed data science workloads (e.g. feature engineering, distributed model application).
Conversion between PyFlink Table and Pandas DataFrame # Pandas DataFrame is the de-facto standard for working with tabular data in the Python community while PyFlink Table is Flink’s representation of the tabular data in Python. Enabling the conversion between PyFlink Table and Pandas DataFrame allows switching between PyFlink and Pandas seamlessly when processing data in Python. Users can process data by utilizing one execution engine and switch to a different one effortlessly. For example, in case users already have a Pandas DataFrame at hand and want to perform some expensive transformation, they can easily convert it to a PyFlink Table and leverage the power of the Flink engine. On the other hand, users can also convert a PyFlink Table to a Pandas DataFrame and perform the same transformation with the rich functionalities provided by the Pandas ecosystem.
Examples # Using Python in Apache Flink requires installing PyFlink, which is available on PyPI and can be easily installed using pip. Before installing PyFlink, check the working version of Python running in your system using:
$ python --version Python 3.7.6 Note Please note that Python 3.5 or higher is required to install and run PyFlink $ python -m pip install apache-flink Using Pandas UDF # Pandas UDFs take pandas.Series as the input and return a pandas.Series of the same length as the output. Pandas UDFs can be used at the exact same place where non-Pandas functions are currently being utilized. To mark a UDF as a Pandas UDF, you only need to add an extra parameter udf_type=&#39;pandas&#39; in the udf decorator:
@udf(input_types=[DataTypes.STRING(), DataTypes.FLOAT()], result_type=DataTypes.FLOAT(), udf_type=&#39;pandas&#39;) def interpolate(id, temperature): # takes id: pandas.Series and temperature: pandas.Series as input df = pd.DataFrame({&#39;id&#39;: id, &#39;temperature&#39;: temperature}) # use interpolate() to interpolate the missing temperature interpolated_df = df.groupby(&#39;id&#39;).apply( lambda group: group.interpolate(limit_direction=&#39;both&#39;)) # output temperature: pandas.Series return interpolated_df[&#39;temperature&#39;] The Pandas UDF above uses the Pandas dataframe.interpolate() function to interpolate the missing temperature data for each equipment id. This is a common IoT scenario whereby each equipment/device reports its id and temperature to be analyzed, but the temperature field may be null due to various reasons. With the function, you can register and use it in the same way as the normal Python UDF. Below is a complete example of how to use the Pandas UDF in PyFlink.
from pyflink.datastream import StreamExecutionEnvironment from pyflink.table import StreamTableEnvironment, DataTypes from pyflink.table.udf import udf import pandas as pd env = StreamExecutionEnvironment.get_execution_environment() env.set_parallelism(1) t_env = StreamTableEnvironment.create(env) t_env.get_config().get_configuration().set_boolean(&#34;python.fn-execution.memory.managed&#34;, True) @udf(input_types=[DataTypes.STRING(), DataTypes.FLOAT()], result_type=DataTypes.FLOAT(), udf_type=&#39;pandas&#39;) def interpolate(id, temperature): # takes id: pandas.Series and temperature: pandas.Series as input df = pd.DataFrame({&#39;id&#39;: id, &#39;temperature&#39;: temperature}) # use interpolate() to interpolate the missing temperature interpolated_df = df.groupby(&#39;id&#39;).apply( lambda group: group.interpolate(limit_direction=&#39;both&#39;)) # output temperature: pandas.Series return interpolated_df[&#39;temperature&#39;] t_env.register_function(&#34;interpolate&#34;, interpolate) my_source_ddl = &#34;&#34;&#34; create table mySource ( id INT, temperature FLOAT ) with ( &#39;connector.type&#39; = &#39;filesystem&#39;, &#39;format.type&#39; = &#39;csv&#39;, &#39;connector.path&#39; = &#39;/tmp/input&#39; ) &#34;&#34;&#34; my_sink_ddl = &#34;&#34;&#34; create table mySink ( id INT, temperature FLOAT ) with ( &#39;connector.type&#39; = &#39;filesystem&#39;, &#39;format.type&#39; = &#39;csv&#39;, &#39;connector.path&#39; = &#39;/tmp/output&#39; ) &#34;&#34;&#34; t_env.execute_sql(my_source_ddl) t_env.execute_sql(my_sink_ddl) t_env.from_path(&#39;mySource&#39;)\\ .select(&#34;id, interpolate(id, temperature) as temperature&#34;) \\ .insert_into(&#39;mySink&#39;) t_env.execute(&#34;pandas_udf_demo&#34;) To submit the job:
Firstly, you need to prepare the input data in the &ldquo;/tmp/input&rdquo; file. For example, $ echo -e &#34;1,98.0\\n1,\\n1,100.0\\n2,99.0&#34; &gt; /tmp/input Next, you can run this example on the command line, $ python pandas_udf_demo.py The command builds and runs the Python Table API program in a local mini-cluster. You can also submit the Python Table API program to a remote cluster using different command lines, see more details here.
Finally, you can see the execution result on the command line. As you can see, all the temperature data with an empty value has been interpolated: $ cat /tmp/output 1,98.0 1,99.0 1,100.0 2,99.0 Conversion between PyFlink Table and Pandas DataFrame # You can use the from_pandas() method to create a PyFlink Table from a Pandas DataFrame or use the to_pandas() method to convert a PyFlink Table to a Pandas DataFrame.
from pyflink.datastream import StreamExecutionEnvironment from pyflink.table import StreamTableEnvironment import pandas as pd import numpy as np env = StreamExecutionEnvironment.get_execution_environment() t_env = StreamTableEnvironment.create(env) # Create a PyFlink Table pdf = pd.DataFrame(np.random.rand(1000, 2)) table = t_env.from_pandas(pdf, [&#34;a&#34;, &#34;b&#34;]).filter(&#34;a &gt; 0.5&#34;) # Convert the PyFlink Table to a Pandas DataFrame pdf = table.to_pandas() print(pdf) Conclusion &amp; Upcoming work # In this article, we introduce the integration of Pandas in Flink 1.11, including Pandas UDF and the conversion between Table and Pandas. In fact, in the latest Apache Flink release, there are many excellent features added to PyFlink, such as support of User-defined Table functions and User-defined Metrics for Python UDFs. What’s more, from Flink 1.11, you can build PyFlink with Cython support and &ldquo;Cythonize&rdquo; your Python UDFs to substantially improve code execution speed (up to 30x faster, compared to Python UDFs in Flink 1.10).
Future work by the community will focus on adding more features and bringing additional optimizations with follow up releases. Such optimizations and additions include a Python DataStream API and more integration with the Python ecosystem, such as support for distributed Pandas in Flink. Stay tuned for more information and updates with the upcoming releases!
`}),e.add({id:145,href:"/2020/07/30/advanced-flink-application-patterns-vol.3-custom-window-processing/",title:"Advanced Flink Application Patterns Vol.3: Custom Window Processing",section:"Flink Blog",content:` Introduction # In the previous articles of the series, we described how you can achieve flexible stream partitioning based on dynamically-updated configurations (a set of fraud-detection rules) and how you can utilize Flink's Broadcast mechanism to distribute processing configuration at runtime among the relevant operators. Following up directly where we left the discussion of the end-to-end solution last time, in this article we will describe how you can use the &quot;Swiss knife&quot; of Flink - the Process Function to create an implementation that is tailor-made to match your streaming business logic requirements. Our discussion will continue in the context of the Fraud Detection engine. We will also demonstrate how you can implement your own custom replacement for time windows for cases where the out-of-the-box windowing available from the DataStream API does not satisfy your requirements. In particular, we will look at the trade-offs that you can make when designing a solution which requires low-latency reactions to individual events.
This article will describe some high-level concepts that can be applied independently, but it is recommended that you review the material in part one and part two of the series as well as checkout the code base in order to make it easier to follow along.
ProcessFunction as a &ldquo;Window&rdquo; # Low Latency # Let&rsquo;s start with a reminder of the type of fraud detection rule that we would like to support:
&ldquo;Whenever the sum of payments from the same payer to the same beneficiary within a 24 hour period is greater than 200 000 $ - trigger an alert.&rdquo;
In other words, given a stream of transactions partitioned by a key that combines the payer and the beneficiary fields, we would like to look back in time and determine, for each incoming transaction, if the sum of all previous payments between the two specific participants exceeds the defined threshold. In effect, the computation window is always moved along to the position of the last observed event for a particular data partitioning key.
Figure 1: Time Windows One of the common key requirements for a fraud detection system is low response time. The sooner the fraudulent action gets detected, the higher the chances that it can be blocked and its negative consequences mitigated. This requirement is especially prominent in the financial domain, where you have one important constraint - any time spent evaluating a fraud detection model is time that a law-abiding user of your system will spend waiting for a response. Swiftness of processing often becomes a competitive advantage between various payment systems and the time limit for producing an alert could lie as low as 300-500 ms. This is all the time you get from the moment of ingestion of a transaction event into a fraud detection system until an alert has to become available to downstream systems. As you might know, Flink provides a powerful Window API that is applicable for a wide range of use cases. However, if you go over all of the available types of supported windows, you will realize that none of them exactly match our main requirement for this use case - the low-latency evaluation of each incoming transaction. There is no type of window in Flink that can express the &ldquo;x minutes/hours/days back from the current event&rdquo; semantic. In the Window API, events fall into windows (as defined by the window assigners), but they cannot themselves individually control the creation and evaluation of windows*. As described above, our goal for the fraud detection engine is to achieve immediate evaluation of the previous relevant data points as soon as the new event is received. This raises the question of feasibility of applying the Window API in this case. The Window API offers some options for defining custom triggers, evictors, and window assigners, which may get to the required result. However, it is usually difficult to get this right (and easy to break). Moreover, this approach does not provide access to broadcast state, which is required for implementing dynamic reconfiguration of business rules.
*) apart from the session windows, but they are limited to assignments based on the session gaps
Figure 2: Evaluation Delays Let&rsquo;s take an example of using a sliding window from Flink&rsquo;s Window API. Using sliding windows with the slide of S translates into an expected value of evaluation delay equal to S/2. This means that you would need to define a window slide of 600-1000 ms to fulfill the low-latency requirement of 300-500 ms delay, even before taking any actual computation time into account. The fact that Flink stores a separate window state for each sliding window pane renders this approach unfeasible under any moderately high load conditions.
In order to satisfy the requirements, we need to create our own low-latency window implementation. Luckily, Flink gives us all the tools required to do so. ProcessFunction is a low-level, but powerful building block in Flink's API. It has a simple contract:
public class SomeProcessFunction extends KeyedProcessFunction&lt;KeyType, InputType, OutputType&gt; { public void processElement(InputType event, Context ctx, Collector&lt;OutputType&gt; out){} public void onTimer(long timestamp, OnTimerContext ctx, Collector&lt;OutputType&gt; out) {} public void open(Configuration parameters){} } processElement() receives input events one by one. You can react to each input by producing one or more output events to the next operator by calling out.collect(someOutput). You can also pass data to a side output or ignore a particular input altogether.
onTimer() is called by Flink when a previously-registered timer fires. Both event time and processing time timers are supported.
open() is equivalent to a constructor. It is called inside of the TaskManager&rsquo;s JVM, and is used for initialization, such as registering Flink-managed state. It is also the right place to initialize fields that are not serializable and cannot be transferred from the JobManager&rsquo;s JVM.
Most importantly, ProcessFunction also has access to the fault-tolerant state, handled by Flink. This combination, together with Flink's message processing and delivery guarantees, makes it possible to build resilient event-driven applications with almost arbitrarily sophisticated business logic. This includes creation and processing of custom windows with state.
Implementation # State and Clean-up # In order to be able to process time windows, we need to keep track of data belonging to the window inside of our program. To ensure that this data is fault-tolerant and can survive failures in a distributed system, we should store it inside of Flink-managed state. As the time progresses, we do not need to keep all previous transactions. According to the sample rule, all events that are older than 24 hours become irrelevant. We are looking at a window of data that constantly moves and where stale transactions need to be constantly moved out of scope (in other words, cleaned up from state).
Figure 3: Window Clean-up We will use MapState to store the individual events of the window. In order to allow efficient clean-up of the out-of-scope events, we will utilize event timestamps as the MapState keys.
In a general case, we have to take into account the fact that there might be different events with exactly the same timestamp, therefore instead of individual Transaction per key(timestamp) we will store sets.
MapState&lt;Long, Set&lt;Transaction&gt;&gt; windowState; Side Note when any Flink-managed state is used inside a \`KeyedProcessFunction\`, the data returned by the \`state.value()\` call is automatically scoped by the key of the *currently-processed event* - see Figure 4. If \`MapState\` is used, the same principle applies, with the difference that a \`Map\` is returned instead of \`MyObject\`. If you are compelled to do something like \`mapState.value().get(inputEvent.getKey())\`, you should probably be using \`ValueState\` instead of the \`MapState\`. As we want to store *multiple values per event key*, in our case, \`MapState\` is the right choice. Figure 4: Keyed State Scoping As described in the first blog of the series, we are dispatching events based on the keys specified in the active fraud detection rules. Multiple distinct rules can be based on the same grouping key. This means that our alerting function can potentially receive transactions scoped by the same key (e.g. {payerId=25;beneficiaryId=12}), but destined to be evaluated according to different rules, which implies potentially different lengths of the time windows. This raises the question of how can we best store fault-tolerant window state within the KeyedProcessFunction. One approach would be to create and manage separate MapStates per rule. Such an approach, however, would be wasteful - we would separately hold state for overlapping time windows, and therefore unnecessarily store duplicate events. A better approach is to always store just enough data to be able to estimate all currently active rules which are scoped by the same key. In order to achieve that, whenever a new rule is added, we will determine if its time window has the largest span and store it in the broadcast state under the special reserved WIDEST_RULE_KEY. This information will later be used during the state clean-up procedure, as described later in this section.
@Override public void processBroadcastElement(Rule rule, Context ctx, Collector&lt;Alert&gt; out){ ... updateWidestWindowRule(rule, broadcastState); } private void updateWidestWindowRule(Rule rule, BroadcastState&lt;Integer, Rule&gt; broadcastState){ Rule widestWindowRule = broadcastState.get(WIDEST_RULE_KEY); if (widestWindowRule == null) { broadcastState.put(WIDEST_RULE_KEY, rule); return; } if (widestWindowRule.getWindowMillis() &lt; rule.getWindowMillis()) { broadcastState.put(WIDEST_RULE_KEY, rule); } } Let&rsquo;s now look at the implementation of the main method, processElement(), in some detail.
In the previous blog post, we described how DynamicKeyFunction allowed us to perform dynamic data partitioning based on the groupingKeyNames parameter in the rule definition. The subsequent description is focused around the DynamicAlertFunction, which makes use of the remaining rule settings.
Figure 5: Sample Rule Definition As described in the previous parts of the blog post series, our alerting process function receives events of type Keyed&lt;Transaction, String, Integer&gt;, where Transaction is the main &ldquo;wrapped&rdquo; event, String is the key (payer #x - beneficiary #y in Figure 1), and Integer is the ID of the rule that caused the dispatch of this event. This rule was previously stored in the broadcast state and has to be retrieved from that state by the ID. Here is the outline of the implementation:
public class DynamicAlertFunction extends KeyedBroadcastProcessFunction&lt; String, Keyed&lt;Transaction, String, Integer&gt;, Rule, Alert&gt; { private transient MapState&lt;Long, Set&lt;Transaction&gt;&gt; windowState; @Override public void processElement( Keyed&lt;Transaction, String, Integer&gt; value, ReadOnlyContext ctx, Collector&lt;Alert&gt; out){ // Add Transaction to state long currentEventTime = value.getWrapped().getEventTime(); // &lt;--- (1) addToStateValuesSet(windowState, currentEventTime, value.getWrapped()); // Calculate the aggregate value Rule rule = ctx.getBroadcastState(Descriptors.rulesDescriptor).get(value.getId()); // &lt;--- (2) Long windowStartForEvent = rule.getWindowStartTimestampFor(currentEventTime);// &lt;--- (3) SimpleAccumulator&lt;BigDecimal&gt; aggregator = RuleHelper.getAggregator(rule); // &lt;--- (4) for (Long stateEventTime : windowState.keys()) { if (isStateValueInWindow(stateEventTime, windowStartForEvent, currentEventTime)) { aggregateValuesInState(stateEventTime, aggregator, rule); } } // Evaluate the rule and trigger an alert if violated BigDecimal aggregateResult = aggregator.getLocalValue(); // &lt;--- (5) boolean isRuleViolated = rule.apply(aggregateResult); if (isRuleViolated) { long decisionTime = System.currentTimeMillis(); out.collect(new Alert&lt;&gt;(rule.getRuleId(), rule, value.getKey(), decisionTime, value.getWrapped(), aggregateResult)); } // Register timers to ensure state cleanup long cleanupTime = (currentEventTime / 1000) * 1000; // &lt;--- (6) ctx.timerService().registerEventTimeTimer(cleanupTime); } Here are the details of the steps: 1) We first add each new event to our window state: static &lt;K, V&gt; Set&lt;V&gt; addToStateValuesSet(MapState&lt;K, Set&lt;V&gt;&gt; mapState, K key, V value) throws Exception { Set&lt;V&gt; valuesSet = mapState.get(key); if (valuesSet != null) { valuesSet.add(value); } else { valuesSet = new HashSet&lt;&gt;(); valuesSet.add(value); } mapState.put(key, valuesSet); return valuesSet; } Next, we retrieve the previously-broadcasted rule, according to which the incoming transaction needs to be evaluated.
getWindowStartTimestampFor determines, given the window span defined in the rule, and the current transaction timestamp, how far back in time our evaluation should span.
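Conceptually, this boils down to a single subtraction. A minimal sketch of such a method on the Rule class, assuming the window span is stored in milliseconds (the actual implementation in the repository may differ), could be:

// Returns the inclusive lower bound of the evaluation window for a given event time.
public Long getWindowStartTimestampFor(Long currentEventTime) {
  // getWindowMillis() is the window span configured in the rule, in milliseconds.
  return currentEventTime - getWindowMillis();
}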
The aggregate value is calculated by iterating over all window state entries and applying an aggregate function. It could be an average, max, min or, as in the example rule from the beginning of this section, a sum.
private boolean isStateValueInWindow( Long stateEventTime, Long windowStartForEvent, long currentEventTime) { return stateEventTime &gt;= windowStartForEvent &amp;&amp; stateEventTime &lt;= currentEventTime; } private void aggregateValuesInState( Long stateEventTime, SimpleAccumulator&lt;BigDecimal&gt; aggregator, Rule rule) throws Exception { Set&lt;Transaction&gt; inWindow = windowState.get(stateEventTime); for (Transaction event : inWindow) { BigDecimal aggregatedValue = FieldsExtractor.getBigDecimalByName(rule.getAggregateFieldName(), event); aggregator.add(aggregatedValue); } } Having an aggregate value, we can compare it to the threshold value that is specified in the rule definition and fire an alert, if necessary.
At the end, we register a clean-up timer using ctx.timerService().registerEventTimeTimer(). This timer will be responsible for removing the current transaction when it is going to move out of scope.
Note Notice the rounding during timer creation. It is an important technique which enables a reasonable trade-off between the precision with which the timers will be triggered, and the number of timers being used. Timers are stored in Flink's fault-tolerant state, and managing them with millisecond-level precision can be wasteful. In our case, with this rounding, we will create at most one timer per key in any given second. Flink documentation provides some additional [details](//nightlies.apache.org/flink/flink-docs-release-1.11/dev/stream/operators/process_function.html#timer-coalescing). The onTimer method will trigger the clean-up of the window state. As previously described, we are always keeping as many events in the state as required for the evaluation of an active rule with the widest window span. This means that during the clean-up, we only need to remove the state which is out of scope of this widest window.
Figure 6: Widest Window This is how the clean-up procedure can be implemented:
@Override public void onTimer(final long timestamp, final OnTimerContext ctx, final Collector&lt;Alert&gt; out) throws Exception { Rule widestWindowRule = ctx.getBroadcastState(Descriptors.rulesDescriptor).get(WIDEST_RULE_KEY); Optional&lt;Long&gt; cleanupEventTimeWindow = Optional.ofNullable(widestWindowRule).map(Rule::getWindowMillis); Optional&lt;Long&gt; cleanupEventTimeThreshold = cleanupEventTimeWindow.map(window -&gt; timestamp - window); // Remove events that are older than (timestamp - widestWindowSpan)ms cleanupEventTimeThreshold.ifPresent(this::evictOutOfScopeElementsFromWindow); } private void evictOutOfScopeElementsFromWindow(Long threshold) { try { Iterator&lt;Long&gt; keys = windowState.keys().iterator(); while (keys.hasNext()) { Long stateEventTime = keys.next(); if (stateEventTime &lt; threshold) { keys.remove(); } } } catch (Exception ex) { throw new RuntimeException(ex); } } Note You might be wondering why we did not use \`ListState\` , as we are always iterating over all of the values of the window state? This is actually an optimization for the case when \`RocksDBStateBackend\` [is used](//nightlies.apache.org/flink/flink-docs-release-1.11/ops/state/state_backends.html#the-rocksdbstatebackend). Iterating over a \`ListState\` would cause all of the \`Transaction\` objects to be deserialized. Using \`MapState\`\\'s keys iterator only causes deserialization of the keys (type \`long\`), and therefore reduces the computational overhead. This concludes the description of the implementation details. Our approach triggers evaluation of a time window as soon as a new transaction arrives. It therefore fulfills the main requirement that we have targeted - low delay for potentially issuing an alert. For the complete implementation, please have a look at the project on github.
Improvements and Optimizations # What are the pros and cons of the described approach?
Pros:
Low latency capabilities
Tailored solution with potential use-case specific optimizations
Efficient state reuse (shared state for the rules with the same key)
Cons:
Cannot make use of potential future optimizations in the existing Window API
No late event handling, which is available out of the box in the Window API
Quadratic computation complexity and potentially large state
Let&rsquo;s now look at the latter two drawbacks and see if we can address them.
Late events: # Processing late events poses a certain question - is it still meaningful to re-evaluate the window in case of a late event arrival? In case this is required, you would need to extend the widest window used for the clean-up by your maximum expected out-of-orderness. This would avoid having potentially incomplete time window data for such late firings (see Figure 7).
Figure 7: Late Events Handling It can be argued, however, that for a use case that puts emphasis on low latency processing, such late triggering would be meaningless. In this case, we could keep track of the most recent timestamp that we have observed so far, and for events that do not monotonically increase this value, only add them to the state and skip the aggregate calculation and the alert triggering logic.
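As a sketch, such a guard could be added at the top of processElement(), assuming an additional ValueState&lt;Long&gt; that tracks the largest event timestamp seen so far for the current key (the state name and wiring below are illustrative):

// Inside DynamicAlertFunction: extra keyed state for the monotonicity check.
private transient ValueState<Long> latestSeenTimestamp;

@Override
public void open(Configuration parameters) {
  latestSeenTimestamp = getRuntimeContext().getState(
      new ValueStateDescriptor<>("latest-seen-timestamp", Long.class));
  // ... existing state initialization ...
}

// At the beginning of processElement(), right after adding the event to windowState:
Long latest = latestSeenTimestamp.value();
if (latest != null && currentEventTime <= latest) {
  // Late event: keep it in state for future evaluations, but skip the
  // aggregate calculation and the alert triggering logic.
  return;
}
latestSeenTimestamp.update(currentEventTime);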
Redundant Re-computations and State Size: # In our described implementation we keep individual transactions in state and go over them to calculate the aggregate again and again on every new event. This is obviously not optimal in terms of wasting computational resources on repeated calculations.
What is the main reason to keep the individual transactions in state? The granularity of stored events directly corresponds to the precision of the time window calculation. Because we store transactions individually, we can precisely ignore individual transactions as soon as they leave the exact 2592000000 ms time window (30 days in ms). At this point, it is worth raising the question - do we really need this milliseconds precision when estimating such a long time window, or is it OK to accept potential false positives in exceptional cases? If the answer for your use case is that such precision is not needed, you could implement additional optimization based on bucketing and pre-aggregation. The idea of this optimization can be broken down as follows:
Instead of storing individual events, create a parent class that can either contain fields of a single transaction, or combined values, calculated based on applying an aggregate function to a set of transactions.
Instead of using timestamps in milliseconds as MapState keys, round them to the level of &ldquo;resolution&rdquo; that you are willing to accept (for instance, a full minute). Each entry therefore represents a bucket.
Whenever a window is evaluated, append the new transaction&rsquo;s data to the bucket aggregate instead of storing individual data points per transaction.
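A rough sketch of this bucketing idea, here specialized to the sum aggregation with one-minute buckets (names and types are illustrative, not from the reference implementation):

// Bucketed window state: one pre-aggregated sum per minute instead of individual events.
private transient MapState<Long, BigDecimal> bucketedWindowState;

private static final long BUCKET_SIZE_MS = 60_000L; // the accepted "resolution"

// Called for every incoming transaction instead of storing the event itself.
private void addToBucket(long eventTime, BigDecimal amount) throws Exception {
  long bucket = (eventTime / BUCKET_SIZE_MS) * BUCKET_SIZE_MS;
  BigDecimal current = bucketedWindowState.get(bucket);
  bucketedWindowState.put(bucket, current == null ? amount : current.add(amount));
}

// Evaluation now iterates over at most (window span / bucket size) entries. The bucket
// containing the window start is included as a whole, which is where the accepted
// imprecision (potential false positives at the window edge) comes from.
private BigDecimal sumBucketsInWindow(long windowStart, long currentEventTime) throws Exception {
  long firstBucket = (windowStart / BUCKET_SIZE_MS) * BUCKET_SIZE_MS;
  BigDecimal sum = BigDecimal.ZERO;
  for (Long bucket : bucketedWindowState.keys()) {
    if (bucket >= firstBucket && bucket <= currentEventTime) {
      sum = sum.add(bucketedWindowState.get(bucket));
    }
  }
  return sum;
}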
Figure 8: Pre-aggregation State Data and Serializers # Another question that we can ask ourselves in order to further optimize the implementation is how probable is it to get different events with exactly the same timestamp. In the described implementation, we demonstrated one way of approaching this question by storing sets of transactions per timestamp in MapState&lt;Long, Set&lt;Transaction&gt;&gt;. Such a choice, however, might have a more significant effect on performance than might be anticipated. The reason is that Flink does not currently provide a native Set serializer and will enforce a fallback to the less efficient Kryo serializer instead (FLINK-16729). A meaningful alternative strategy is to assume that, in a normal scenario, no two discrepant events can have exactly the same timestamp and to turn the window state into a MapState&lt;Long, Transaction&gt; type. You can use side-outputs to collect and monitor any unexpected occurrences which contradict your assumption. During performance optimizations, I generally recommend you to disable the fallback to Kryo and verify where your application might be further optimized by ensuring that more efficient serializers are being used.
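Disabling the Kryo fallback is a one-line change on the execution environment; with generic types disabled, the job fails with an exception as soon as a type would otherwise be handled by the generic Kryo serializer, which makes such cases easy to spot:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Throw instead of silently falling back to Kryo for types Flink cannot handle natively.
env.getConfig().disableGenericTypes();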
Tip: You can quickly determine which serializer is going to be used for your classes by setting a breakpoint and verifying the type of the returned TypeInformation. PojoTypeInfo indicates that an efficient Flink POJO serializer will be used. GenericTypeInfo indicates the fallback to a Kryo serializer. Event pruning: Instead of storing complete events and putting additional stress on the ser/de machinery, we can reduce individual events data to only relevant information. This would potentially require &ldquo;unpacking&rdquo; individual events as fields, and storing those fields into a generic Map&lt;String, Object&gt; data structure, based on the configurations of active rules.
While this adjustment could potentially produce significant improvements for objects of large size, it should not be your first pick as it can easily turn into a premature optimization.
Summary: # This article concludes the description of the implementation of the fraud detection engine that we started in part one. In this blog post we demonstrated how ProcessFunction can be utilized to &quot;impersonate&quot; a window with sophisticated custom logic. We have discussed the pros and cons of such an approach and elaborated on how custom use-case-specific optimizations can be applied - something that would not be directly possible with the Window API.
The goal of this blog post was to illustrate the power and flexibility of Apache Flink&rsquo;s APIs. At the core of it are the pillars of Flink that spare you, as a developer, very significant amounts of work and generalize well to a wide range of use cases by providing:
Efficient data exchange in a distributed cluster
Horizontal scalability via data partitioning
Fault-tolerant state with quick, local access
Convenient abstraction for working with this state, which is as simple as using a local variable
Multi-threaded, parallel execution engine. ProcessFunction code runs in a single thread, without the need for synchronization. Flink handles all the parallel execution aspects and correct access to the shared state, without you, as a developer, having to think about it (concurrency is hard).
All these aspects make it possible to build applications with Flink that go well beyond trivial streaming ETL use cases and enable implementation of arbitrarily-sophisticated, distributed event-driven applications. With Flink, you can rethink approaches to a wide range of use cases which normally would rely on using stateless parallel execution nodes and &ldquo;pushing&rdquo; the concerns of state fault tolerance to a database, an approach that is often destined to run into scalability issues in the face of ever-increasing data volumes.
`}),e.add({id:146,href:"/2020/07/29/flink-community-update-july20/",title:"Flink Community Update - July'20",section:"Flink Blog",content:`As July draws to an end, we look back at a monthful of activity in the Flink community, including two releases (!) and some work around improving the first-time contribution experience in the project.
Also, events are starting to pick up again, so we&rsquo;ve put together a list of some great ones you can (virtually) attend in August!
The Past Month in Flink # Flink Releases # Flink 1.11 # A couple of weeks ago, Flink 1.11 was announced in what was (again) the biggest Flink release to date (see &ldquo;A Look Into the Evolution of Flink Releases&rdquo;)! The new release brought significant improvements to usability as well as new features to Flink users across the API stack. Some highlights of Flink 1.11 are:
Unaligned checkpoints to cope with high backpressure scenarios;
The new source API, that simplifies and unifies the implementation of (custom) sources;
Support for Change Data Capture (CDC) and other common use cases in the Table API/SQL;
Pandas UDFs and other performance optimizations in PyFlink, making it more powerful for data science and ML workloads.
For a more detailed look into the release, you can recap the announcement blogpost and join the upcoming meetup on “What’s new in Flink 1.11?”, where you’ll be able to ask anything release-related to Aljoscha Krettek (Flink PMC Member). The community has also been working on a series of blogposts that deep-dive into the most significant features and improvements in 1.11, so keep an eye on the Flink blog!
Flink 1.11.1 # Shortly after releasing Flink 1.11, the community announced the first patch version to cover some outstanding issues in the major release. This version is particularly important for users of the Table API/SQL, as it addresses known limitations that affect the usability of new features like changelog sources and support for JDBC catalogs.
You can find a detailed list with all the improvements and bugfixes that went into Flink 1.11.1 in the announcement blogpost.
Gearing up for Flink 1.12 # The Flink 1.12 release cycle was kicked off last week, and a discussion about what features will go into the upcoming release is underway in this @dev Mailing List thread. While we wait for more of these ideas to turn into proposals and JIRA issues, here are some recent FLIPs that are already being discussed in the Flink community:
FLIP FLIP-130 Support Python DataStream API Python support in Flink has so far been bounded to the Table API/SQL. These APIs are high-level and convenient, but have some limitations for more complex stream processing use cases. To expand the usability of PyFlink to a broader set of use cases, FLIP-130 proposes to support it also in the DataStream API, starting with stateless operations.
FLIP-132 Temporal Table DDL Flink SQL users can't currently create temporal tables using SQL DDL, which forces them to change context frequently for use cases that require them. FLIP-132 proposes to extend the DDL syntax to support temporal tables, which in turn will allow to also bring temporal joins with changelog sources to Flink SQL.
New Committers and PMC Members # The Apache Flink community has welcomed 2 new PMC Members since the last update. Congratulations!
New PMC Members # Piotr Nowojski
Yu Li
The Bigger Picture # A Look Into the Evolution of Flink Releases # It’s been a while since we had a look at community numbers, so this time we’d like to shed some light on the evolution of contributors and, well, work across releases. Let’s have a look at some git data:
If we consider Flink 1.8 (Apr. 2019) as the baseline, the Flink community more than tripled the number of implemented and/or resolved issues in a single release with the support of an additional ~100 contributors in Flink 1.11. This is pretty impressive on its own, and even more so if you consider that Flink contributors are distributed around the globe, working across different locations and timezones!
First-time Contributor Guide # Flink has an extensive guide for code and non-code contributions that helps new community members navigate the project and get familiar with existing contribution guidelines. In particular for code contributions, knowing where to start can be difficult, given the sheer size of the Flink codebase and the pace of development of the project.
To better guide new contributors, a brief section was added to the guide on how to look for what to contribute and the starter label has been revived in Jira to highlight issues that are suitable for first-time contributors.
Note As a reminder, you no longer need to ask for contributor permissions to start contributing to Flink. Once you’ve found something you’d like to work on, read the contribution guide carefully and reach out to a Flink Committer, who will be able to help you get started. Replacing “charged” words in the Flink repo # The community is working on gradually replacing words that are outdated and carry a negative connotation in the Flink codebase, such as “master/slave” and “whitelist/blacklist”. The progress of this work can be tracked in FLINK-18209.
Upcoming Events (and More!) # We&rsquo;re happy to see the &ldquo;high season&rdquo; of virtual events approaching, with a lot of great conferences taking place in the coming month, as well as some meetups. Here, we highlight some of the Flink talks happening in those events, but we recommend checking out the complete event programs!
As usual, we also leave you with some resources to read and explore.
Events #
Virtual Flink Meetup (Jul. 29): What’s new in Flink 1.11? + Q&A with Aljoscha Krettek
DC Thursday (Jul. 30): Interview and Community Q&A with Stephan Ewen
KubeCon + CloudNativeCon Europe (Aug. 17-20): Stateful Serverless and the Elephant in the Room
DataEngBytes (Aug. 20-21): Change Data Capture with Flink SQL and Debezium; Sweet Streams are Made of These: Data Driven Development with Stream Processing
Beam Summit (Aug. 24-29): Streaming, Fast and Slow; Building Stateful Streaming Pipelines With Beam
Blogposts #
Flink 1.11 Series: Application Deployment in Flink: Current State and the new Application Mode; Sharing is caring - Catalogs in Flink SQL (Tutorial); Flink SQL Demo: Building an End-to-End Streaming Application (Tutorial)
Other: Streaming analytics with Java and Apache Flink (Tutorial); Flink for online Machine Learning and real-time processing at Weibo; Data-driven Matchmaking at Azar with Apache Flink
Flink Packages # Flink Packages is a website where you can explore (and contribute to) the Flink ecosystem of connectors, extensions, APIs, tools and integrations. New in: SignalFx Metrics Reporter; Yauaa: Yet Another UserAgent Analyzer
If you’d like to keep a closer eye on what’s happening in the community, subscribe to the Flink @community mailing list to get fine-grained weekly updates, upcoming event announcements and more.
`}),e.add({id:147,href:"/2020/07/28/flink-sql-demo-building-an-end-to-end-streaming-application/",title:"Flink SQL Demo: Building an End-to-End Streaming Application",section:"Flink Blog",content:`Apache Flink 1.11 ships with many exciting new features, including many developments in Flink SQL, which is evolving at a fast pace. This article takes a closer look at how to quickly build streaming applications with Flink SQL from a practical point of view.
In the following sections, we describe how to integrate Kafka, MySQL, Elasticsearch, and Kibana with Flink SQL to analyze e-commerce user behavior in real-time. All exercises in this blogpost are performed in the Flink SQL CLI, and the entire process uses standard SQL syntax, without a single line of Java/Scala code or IDE installation. The final result of this demo is shown in the following figure:
Preparation # Prepare a Linux or macOS computer with Docker installed.
Starting the Demo Environment # The components required in this demo are all managed in containers, so we will use docker-compose to start them. First, download the docker-compose.yml file that defines the demo environment, for example by running the following commands:
mkdir flink-sql-demo; cd flink-sql-demo; wget https://raw.githubusercontent.com/wuchong/flink-sql-demo/v1.11-EN/docker-compose.yml The Docker Compose environment consists of the following containers:
Flink SQL CLI: used to submit queries and visualize their results. Flink Cluster: a Flink JobManager and a Flink TaskManager container to execute queries. MySQL: MySQL 5.7 and a pre-populated category table in the database. The category table will be joined with data in Kafka to enrich the real-time data. Kafka: mainly used as a data source. The DataGen component automatically writes data into a Kafka topic. Zookeeper: this component is required by Kafka. Elasticsearch: mainly used as a data sink. Kibana: used to visualize the data in Elasticsearch. DataGen: the data generator. After the container is started, user behavior data is automatically generated and sent to the Kafka topic. By default, 2000 data entries are generated each second for about 1.5 hours. You can modify DataGen’s speedup parameter in docker-compose.yml to adjust the generation rate (which takes effect after Docker Compose is restarted). Note Before starting the containers, we recommend configuring Docker so that sufficient resources are available and the environment does not become unresponsive. We suggest running Docker with 3-4 GB of memory and 3-4 CPU cores. To start all containers, run the following command in the directory that contains the docker-compose.yml file.
docker-compose up -d This command automatically starts all the containers defined in the Docker Compose configuration in a detached mode. Run docker ps to check whether the 9 containers are running properly. You can also visit http://localhost:5601/ to see if Kibana is running normally.
Don’t forget to run the following command to stop all containers after you have finished the tutorial:
docker-compose down Entering the Flink SQL CLI client # To enter the SQL CLI client run:
docker-compose exec sql-client ./sql-client.sh The command starts the SQL CLI client in the container. You should see the welcome screen of the CLI client.
Creating a Kafka table using DDL # The DataGen container continuously writes events into the Kafka user_behavior topic. This data contains the user behavior on the day of November 27, 2017 (behaviors include “click”, “like”, “purchase” and “add to shopping cart” events). Each row represents a user behavior event, with the user ID, product ID, product category ID, event type, and timestamp in JSON format. Note that the dataset is from the Alibaba Cloud Tianchi public dataset.
In the directory that contains docker-compose.yml, run the following command to view the first 10 data entries generated in the Kafka topic:
docker-compose exec kafka bash -c &#39;kafka-console-consumer.sh --topic user_behavior --bootstrap-server kafka:9094 --from-beginning --max-messages 10&#39; {&#34;user_id&#34;: &#34;952483&#34;, &#34;item_id&#34;:&#34;310884&#34;, &#34;category_id&#34;: &#34;4580532&#34;, &#34;behavior&#34;: &#34;pv&#34;, &#34;ts&#34;: &#34;2017-11-27T00:00:00Z&#34;} {&#34;user_id&#34;: &#34;794777&#34;, &#34;item_id&#34;:&#34;5119439&#34;, &#34;category_id&#34;: &#34;982926&#34;, &#34;behavior&#34;: &#34;pv&#34;, &#34;ts&#34;: &#34;2017-11-27T00:00:00Z&#34;} ... In order to make the events in the Kafka topic accessible to Flink SQL, we run the following DDL statement in SQL CLI to create a table that connects to the topic in the Kafka cluster:
CREATE TABLE user_behavior ( user_id BIGINT, item_id BIGINT, category_id BIGINT, behavior STRING, ts TIMESTAMP(3), proctime AS PROCTIME(), -- generates processing-time attribute using computed column WATERMARK FOR ts AS ts - INTERVAL &#39;5&#39; SECOND -- defines watermark on ts column, marks ts as event-time attribute ) WITH ( &#39;connector&#39; = &#39;kafka&#39;, -- using kafka connector &#39;topic&#39; = &#39;user_behavior&#39;, -- kafka topic &#39;scan.startup.mode&#39; = &#39;earliest-offset&#39;, -- reading from the beginning &#39;properties.bootstrap.servers&#39; = &#39;kafka:9094&#39;, -- kafka broker address &#39;format&#39; = &#39;json&#39; -- the data format is json ); The above snippet declares five fields based on the data format. In addition, it uses the computed column syntax and built-in PROCTIME() function to declare a virtual column that generates the processing-time attribute. It also uses the WATERMARK syntax to declare the watermark strategy on the ts field (tolerate 5-seconds out-of-order). Therefore, the ts field becomes an event-time attribute. For more information about time attributes and DDL syntax, see the following official documents:
Time attributes in Flink’s Table API & SQL; DDL Syntax in Flink SQL. After creating the user_behavior table in the SQL CLI, run SHOW TABLES; and DESCRIBE user_behavior; to see registered tables and table details. Also, run the command SELECT * FROM user_behavior; directly in the SQL CLI to preview the data (press q to exit).
Next, we discover more about Flink SQL through three real-world scenarios.
Hourly Trading Volume # Creating an Elasticsearch table using DDL # Let’s create an Elasticsearch result table in the SQL CLI. We need two columns in this case: hour_of_day and buy_cnt (trading volume).
CREATE TABLE buy_cnt_per_hour ( hour_of_day BIGINT, buy_cnt BIGINT ) WITH ( &#39;connector&#39; = &#39;elasticsearch-7&#39;, -- using elasticsearch connector &#39;hosts&#39; = &#39;http://elasticsearch:9200&#39;, -- elasticsearch address &#39;index&#39; = &#39;buy_cnt_per_hour&#39; -- elasticsearch index name, similar to database table name ); There is no need to create the buy_cnt_per_hour index in Elasticsearch in advance since Elasticsearch will automatically create the index if it does not exist.
Submitting a Query # The hourly trading volume is the number of “buy” behaviors completed each hour. Therefore, we can use a TUMBLE window function to assign data into hourly windows. Then, we count the number of “buy” records in each window. To implement this, we can first filter the data for “buy” records and then apply COUNT(*).
INSERT INTO buy_cnt_per_hour SELECT HOUR(TUMBLE_START(ts, INTERVAL &#39;1&#39; HOUR)), COUNT(*) FROM user_behavior WHERE behavior = &#39;buy&#39; GROUP BY TUMBLE(ts, INTERVAL &#39;1&#39; HOUR); Here, we use the built-in HOUR function to extract the value for each hour in the day from a TIMESTAMP column. Use INSERT INTO to start a Flink SQL job that continuously writes results into the Elasticsearch buy_cnt_per_hour index. The Elasticsearch result table can be seen as a materialized view of the query. You can find more information about Flink’s window aggregation in the Apache Flink documentation.
After running the previous query in the Flink SQL CLI, we can observe the submitted task on the Flink Web UI. This task is a streaming task and therefore runs continuously.
Using Kibana to Visualize Results # Access Kibana at http://localhost:5601. First, configure an index pattern by clicking “Management” in the left-side toolbar and finding “Index Patterns”. Next, click “Create Index Pattern” and enter the full index name buy_cnt_per_hour to create the index pattern. After creating the index pattern, we can explore data in Kibana.
Note Since we are using a TUMBLE window of one hour here, it might take about four minutes after the containers start until the first row is emitted. Until then, the index does not exist and Kibana is unable to find it. Click “Discover” in the left-side toolbar. Kibana lists the content of the created index.
Next, create a dashboard to display various views. Click “Dashboard” on the left side of the page to create a dashboard named “User Behavior Analysis”. Then, click “Create New” to create a new view. Select “Area” (area graph), then select the buy_cnt_per_hour index, and draw the trading volume area chart as illustrated in the configuration on the left side of the following diagram. Apply the changes by clicking the “▶” play button. Then, save it as “Hourly Trading Volume”.
You can see that during the early morning hours the number of transactions has its lowest value for the entire day.
As real-time data is added into the indices, you can enable auto-refresh in Kibana to see real-time visualization changes and updates. You can do so by clicking the time picker and entering a refresh interval (e.g. 3 seconds) in the “Refresh every” field.
Cumulative number of Unique Visitors every 10-min # Another interesting visualization is the cumulative number of unique visitors (UV). For example, the UV count at 10:00 represents the total number of unique visitors from 00:00 to 10:00. Therefore, the curve is monotonically increasing.
Let’s create another Elasticsearch table in the SQL CLI to store the UV results. This table contains 3 columns: date, time and cumulative UVs. The date_str and time_str columns are defined as the primary key; the Elasticsearch sink will use them to calculate the document ID and work in upsert mode to update the UV values under that document ID.
CREATE TABLE cumulative_uv ( date_str STRING, time_str STRING, uv BIGINT, PRIMARY KEY (date_str, time_str) NOT ENFORCED ) WITH ( &#39;connector&#39; = &#39;elasticsearch-7&#39;, &#39;hosts&#39; = &#39;http://elasticsearch:9200&#39;, &#39;index&#39; = &#39;cumulative_uv&#39; ); We can extract the date and time using the DATE_FORMAT function based on the ts field. As the section title describes, we only need to report every 10 minutes. So, we can use SUBSTR and the string concat function || to convert the time value into a 10-minute interval time string, such as 12:00, 12:10. Next, we group data by date_str and perform a COUNT DISTINCT aggregation on user_id to get the current cumulative UV for that day. Additionally, we perform a MAX aggregation on the time_str field to get the current stream time: the maximum event time observed so far. As the maximum time is also a part of the primary key of the sink, the final result is that a new point will be inserted into the Elasticsearch index every 10 minutes, and the latest point will be continuously updated until the next 10-minute point is generated.
INSERT INTO cumulative_uv SELECT date_str, MAX(time_str), COUNT(DISTINCT user_id) as uv FROM ( SELECT DATE_FORMAT(ts, &#39;yyyy-MM-dd&#39;) as date_str, SUBSTR(DATE_FORMAT(ts, &#39;HH:mm&#39;),1,4) || &#39;0&#39; as time_str, user_id FROM user_behavior) GROUP BY date_str; After submitting this query, we create a cumulative_uv index pattern in Kibana. We then create a &ldquo;Line&rdquo; (line graph) on the dashboard, by selecting the cumulative_uv index, and drawing the cumulative UV curve according to the configuration on the left side of the following figure before finally saving the curve.
Top Categories # The last visualization represents the category rankings to inform us about the most popular categories on our e-commerce site. Since our data source offers events for more than 5,000 categories without providing any additional significance to our analytics, we would like to reduce the set so that it only includes the top-level categories. We will use the data in our MySQL database by joining it as a dimension table with our Kafka events to map sub-categories to top-level categories.
Create a table in the SQL CLI to make the data in MySQL accessible to Flink SQL.
CREATE TABLE category_dim ( sub_category_id BIGINT, parent_category_name STRING ) WITH ( &#39;connector&#39; = &#39;jdbc&#39;, &#39;url&#39; = &#39;jdbc:mysql://mysql:3306/flink&#39;, &#39;table-name&#39; = &#39;category&#39;, &#39;username&#39; = &#39;root&#39;, &#39;password&#39; = &#39;123456&#39;, &#39;lookup.cache.max-rows&#39; = &#39;5000&#39;, &#39;lookup.cache.ttl&#39; = &#39;10min&#39; ); The underlying JDBC connector implements the LookupTableSource interface, so the created JDBC table category_dim can be used as a temporal table (i.e. lookup table) out-of-the-box in the data enrichment.
In addition, create an Elasticsearch table to store the category statistics.
CREATE TABLE top_category ( category_name STRING PRIMARY KEY NOT ENFORCED, buy_cnt BIGINT ) WITH ( &#39;connector&#39; = &#39;elasticsearch-7&#39;, &#39;hosts&#39; = &#39;http://elasticsearch:9200&#39;, &#39;index&#39; = &#39;top_category&#39; ); In order to enrich the category names, we use Flink SQL’s temporal table joins to join a dimension table. You can access more information about temporal joins in the Flink documentation.
Additionally, we use the CREATE VIEW syntax to register the query as a logical view, allowing us to easily reference this query in subsequent queries and simplify nested queries. Please note that creating a logical view does not trigger the execution of the job and the view results are not persisted. Therefore, this statement is lightweight and does not have additional overhead.
CREATE VIEW rich_user_behavior AS SELECT U.user_id, U.item_id, U.behavior, C.parent_category_name as category_name FROM user_behavior AS U LEFT JOIN category_dim FOR SYSTEM_TIME AS OF U.proctime AS C ON U.category_id = C.sub_category_id; Finally, we group the enriched view by category name to count the number of buy events and write the result to Elasticsearch’s top_category index.
INSERT INTO top_category SELECT category_name, COUNT(*) buy_cnt FROM rich_user_behavior WHERE behavior = &#39;buy&#39; GROUP BY category_name; After submitting the query, we create a top_category index pattern in Kibana. We then create a “Horizontal Bar” (bar graph) on the dashboard, by selecting the top_category index and drawing the category ranking according to the configuration on the left side of the following diagram before finally saving the list.
As illustrated in the diagram, the clothing and shoes categories far exceed the other categories on the e-commerce website.
We have now implemented three practical applications and created charts for them. We can now return to the dashboard page and drag-and-drop each view to give our dashboard a more formal and intuitive style, as illustrated in the beginning of the blogpost. Of course, Kibana also provides a rich set of graphics and visualization features, and the user_behavior logs contain a lot more interesting information to explore. Using Flink SQL, you can analyze data in more dimensions, while using Kibana allows you to display more views and observe real-time changes in its charts!
Summary # In the previous sections, we described how to use Flink SQL to integrate Kafka, MySQL, Elasticsearch, and Kibana to quickly build a real-time analytics application. The entire process can be completed using standard SQL syntax, without a line of Java or Scala code. We hope that this article provides some clear and practical examples of the convenience and power of Flink SQL, featuring an easy connection to various external systems, native support for event time and out-of-order handling, dimension table joins and a wide range of built-in functions. We hope you have fun following the examples in this blogpost!
`}),e.add({id:148,href:"/2020/07/23/sharing-is-caring-catalogs-in-flink-sql/",title:"Sharing is caring - Catalogs in Flink SQL",section:"Flink Blog",content:"With an ever-growing number of people working with data, it&rsquo;s a common practice for companies to build self-service platforms with the goal of democratizing their access across different teams and — especially — to enable users from any background to be independent in their data needs. In such environments, metadata management becomes a crucial aspect. Without it, users often work blindly, spending too much time searching for datasets and their location, figuring out data formats and similar cumbersome tasks.\nIn this blog post, we want to give you a high level overview of catalogs in Flink. We&rsquo;ll describe why you should consider using them and what you can achieve with one in place. To round it up, we&rsquo;ll also showcase how simple it is to combine catalogs and Flink, in the form of an end-to-end example that you can try out yourself.\nWhy do I need a catalog? # Frequently, companies start building a data platform with a metastore, catalog, or schema registry of some sort already in place. Those let you clearly separate making the data available from consuming it. That separation has a few benefits:\nImproved productivity - The most obvious one. Making data reusable and shifting the focus on building new models/pipelines rather than data cleansing and discovery. Security - You can control the access to certain features of the data. For example, you can make the schema of the dataset publicly available, but limit the actual access to the underlying data only to particular teams. Compliance - If you have all the metadata in a central entity, it&rsquo;s much easier to ensure compliance with GDPR and similar regulations and legal requirements. What is stored in a catalog? # Almost all data sets can be described by certain properties that must be known in order to consume them. Those include:\nSchema - It describes the actual contents of the data, what columns it has, what are the constraints (e.g. keys) on which the updates should be performed, which fields can act as time attributes, what are the rules for watermark generation and so on.\nLocation - Does the data come from Kafka or a file in a filesystem? How do you connect to the external system? Which topic or file name do you use?\nFormat - Is the data serialized as JSON, CSV, or maybe Avro records?\nStatistics - You can also store additional information that can be useful when creating an execution plan of your query. For example, you can choose the best join algorithm, based on the number of rows in joined datasets.\nCatalogs don’t have to be limited to the metadata of datasets. You can usually store other objects that can be reused in different scenarios, such as:\nFunctions - It&rsquo;s very common to have domain specific functions that can be helpful in different use cases. Instead of having to create them in each place separately, you can just create them once and share them with others.\nQueries - Those can be useful when you don’t want to persist a data set, but want to provide a recipe for creating it from other sources instead.\nCatalogs support in Flink SQL # Starting from version 1.9, Flink has a set of Catalog APIs that allows to integrate Flink with various catalog implementations. With the help of those APIs, you can query tables in Flink that were created in your external catalogs (e.g. Hive Metastore). 
Additionally, depending on the catalog implementation, you can create new objects such as tables or views from Flink, reuse them across different jobs, and possibly even use them in other tools compatible with that catalog. In other words, you can see catalogs as having a two-fold purpose:\nProvide an out-of-the box integration with ecosystems such as RDBMSs or Hive that allows you to query external objects like tables, views, or functions with no additional connector configuration. The connector properties are automatically derived from the catalog itself.\nAct as a persistent store for Flink-specific metadata. In this mode, we additionally store connector properties alongside the logical metadata (e.g. schema, object name). That approach enables you to, for example, store a full definition of a Kafka-backed table with records serialized with Avro in Hive that can be later on used by Flink. However, as it incorporates Flink-specific properties, it can not be used by other tools that leverage Hive Metastore.\nAs of Flink 1.11, there are two catalog implementations supported by the community:\nA comprehensive Hive catalog\nA Postgres catalog (preview, read-only, for now)\nNote Flink does not store data at rest; it is a compute engine and requires other systems to consume input from and write its output. This means that Flink does not own the lifecycle of the data. Integration with Catalogs does not change that. Flink uses catalogs for metadata management only. All you need to do to start querying your tables defined in either of these metastores is to create the corresponding catalogs with connection parameters. Once this is done, you can use them the way you would in any relational database management system.\n-- create a catalog which gives access to the backing Postgres installation CREATE CATALOG postgres WITH ( &#39;type&#39;=&#39;jdbc&#39;, &#39;property-version&#39;=&#39;1&#39;, &#39;base-url&#39;=&#39;jdbc:postgresql://postgres:5432/&#39;, &#39;default-database&#39;=&#39;postgres&#39;, &#39;username&#39;=&#39;postgres&#39;, &#39;password&#39;=&#39;example&#39; ); -- create a catalog which gives access to the backing Hive installation CREATE CATALOG hive WITH ( &#39;type&#39;=&#39;hive&#39;, &#39;property-version&#39;=&#39;1&#39;, &#39;hive-version&#39;=&#39;2.3.6&#39;, &#39;hive-conf-dir&#39;=&#39;/opt/hive-conf&#39; ); After creating the catalogs, you can confirm that they are available to Flink and also list the databases or tables in each of these catalogs:\n&gt; show catalogs; default_catalog hive postgres -- switch the default catalog to Hive &gt; use catalog hive; &gt; show databases; default -- hive&#39;s default database &gt; show tables; dev_orders &gt; use catalog postgres; &gt; show tables; prod_customer prod_nation prod_rates prod_region region_stats -- desribe a schema of a table in Postgres, the Postgres types are automatically mapped to -- Flink&#39;s type system &gt; describe prod_customer root |-- c_custkey: INT NOT NULL |-- c_name: VARCHAR(25) NOT NULL |-- c_address: VARCHAR(40) NOT NULL |-- c_nationkey: INT NOT NULL |-- c_phone: CHAR(15) NOT NULL |-- c_acctbal: DOUBLE NOT NULL |-- c_mktsegment: CHAR(10) NOT NULL |-- c_comment: VARCHAR(117) NOT NULL Now that you know which tables are available, you can write your first query. In this scenario, we keep customer orders in Hive (dev_orders) because of their volume, and reference customer data in Postgres (prod_customer) to be able to easily update it. 
Let’s write a query that shows customers and their orders by region and order priority for a specific day.\nUSE CATALOG postgres; SELECT r_name AS `region`, o_orderpriority AS `priority`, COUNT(DISTINCT c_custkey) AS `number_of_customers`, COUNT(o_orderkey) AS `number_of_orders` FROM `hive`.`default`.dev_orders -- we need to fully qualify the table in hive because we set the -- current catalog to Postgres JOIN prod_customer ON o_custkey = c_custkey JOIN prod_nation ON c_nationkey = n_nationkey JOIN prod_region ON n_regionkey = r_regionkey WHERE FLOOR(o_ordertime TO DAY) = TIMESTAMP &#39;2020-04-01 0:00:00.000&#39; AND NOT o_orderpriority = &#39;4-NOT SPECIFIED&#39; GROUP BY r_name, o_orderpriority ORDER BY r_name, o_orderpriority; Flink&rsquo;s catalog support also covers storing Flink-specific objects in external catalogs that might not be fully usable by the corresponding external tools. The most notable use case for this is, for example, storing a table that describes a Kafka topic in a Hive catalog. Take the following DDL statement, that contains a watermark declaration as well as a set of connector properties that are not recognizable by Hive. You won&rsquo;t be able to query the table with Hive, but it will be persisted and can be reused by different Flink jobs.\nUSE CATALOG hive; CREATE TABLE prod_lineitem ( l_orderkey INTEGER, l_partkey INTEGER, l_suppkey INTEGER, l_linenumber INTEGER, l_quantity DOUBLE, l_extendedprice DOUBLE, l_discount DOUBLE, l_tax DOUBLE, l_currency STRING, l_returnflag STRING, l_linestatus STRING, l_ordertime TIMESTAMP(3), l_shipinstruct STRING, l_shipmode STRING, l_comment STRING, l_proctime AS PROCTIME(), WATERMARK FOR l_ordertime AS l_ordertime - INTERVAL &#39;5&#39; SECONDS ) WITH ( &#39;connector&#39;=&#39;kafka&#39;, &#39;topic&#39;=&#39;lineitem&#39;, &#39;scan.startup.mode&#39;=&#39;earliest-offset&#39;, &#39;properties.bootstrap.servers&#39;=&#39;kafka:9092&#39;, &#39;properties.group.id&#39;=&#39;testGroup&#39;, &#39;format&#39;=&#39;csv&#39;, &#39;csv.field-delimiter&#39;=&#39;|&#39; ); With prod_lineitem stored in Hive, you can now write a query that will enrich the incoming stream with static data kept in Postgres. To illustrate how this works, let&rsquo;s calculate the item prices based on the current currency rates:\nUSE CATALOG postgres; SELECT l_proctime AS `querytime`, l_orderkey AS `order`, l_linenumber AS `linenumber`, l_currency AS `currency`, rs_rate AS `cur_rate`, (l_extendedprice * (1 - l_discount) * (1 + l_tax)) / rs_rate AS `open_in_euro` FROM hive.`default`.prod_lineitem JOIN prod_rates FOR SYSTEM_TIME AS OF l_proctime ON rs_symbol = l_currency WHERE l_linestatus = &#39;O&#39;; The query above uses a SYSTEM AS OF clause for executing a temporal join. If you&rsquo;d like to learn more about the different kind of joins you can do in Flink I highly encourage you to check this documentation page.\nConclusion # Catalogs can be extremely powerful when building data platforms aimed at reusing the work of different teams in an organization. Centralizing the metadata is a common practice for improving productivity, security, and compliance when working with data.\nFlink provides flexible metadata management capabilities, that aim at reducing the cumbersome, repetitive work needed before querying the data such as defining schemas, connection properties etc. 
As of version 1.11, Flink provides a native, comprehensive integration with Hive Metastore and a read-only version for Postgres catalogs.\nYou can get started with Flink and catalogs by reading the docs. If you want to play around with Flink SQL (e.g. try out how catalogs work in Flink yourself), you can check this demo prepared by our colleagues Fabian and Timo — it runs in a dockerized environment, and we used it for the examples in this blog post.\n"}),e.add({id:149,href:"/2020/07/21/apache-flink-1.11.1-released/",title:"Apache Flink 1.11.1 Released",section:"Flink Blog",content:`The Apache Flink community released the first bugfix version of the Apache Flink 1.11 series.
This release includes 44 fixes and minor improvements for Flink 1.11.0. Below is a detailed list of all fixes and improvements.
We highly recommend that all users upgrade to Flink 1.11.1.
Updated Maven dependencies:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.11.1&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.11.1&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.11.1&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-15794] - Rethink default value of kubernetes.container.image [FLINK-18324] - Translate updated data type and function page into Chinese [FLINK-18387] - Translate &quot;BlackHole SQL Connector&quot; page into Chinese [FLINK-18388] - Translate &quot;CSV Format&quot; page into Chinese [FLINK-18391] - Translate &quot;Avro Format&quot; page into Chinese [FLINK-18395] - Translate &quot;ORC Format&quot; page into Chinese [FLINK-18469] - Add Application Mode to release notes. [FLINK-18524] - Scala varargs cause exception for new inference Bug [FLINK-15414] - KafkaITCase#prepare failed in travis [FLINK-16181] - IfCallGen will throw NPE for primitive types in blink [FLINK-16572] - CheckPubSubEmulatorTest is flaky on Azure [FLINK-17543] - Rerunning failed azure jobs fails when uploading logs [FLINK-17636] - SingleInputGateTest.testConcurrentReadStateAndProcessAndClose: Trying to read from released RecoveredInputChannel [FLINK-18097] - History server doesn&#39;t clean all job json files [FLINK-18419] - Can not create a catalog from user jar [FLINK-18434] - Can not select fields with JdbcCatalog [FLINK-18440] - ROW_NUMBER function: ROW/RANGE not allowed with RANK, DENSE_RANK or ROW_NUMBER functions [FLINK-18461] - Changelog source can&#39;t be insert into upsert sink [FLINK-18470] - Tests RocksKeyGroupsRocksSingleStateIteratorTest#testMergeIteratorByte &amp; RocksKeyGroupsRocksSingleStateIteratorTest#testMergeIteratorShort fail locally [FLINK-18471] - flink-runtime lists &quot;org.uncommons.maths:uncommons-maths:1.2.2a&quot; as a bundled dependency, but it isn&#39;t [FLINK-18477] - ChangelogSocketExample does not work [FLINK-18478] - AvroDeserializationSchema does not work with types generated by avrohugger [FLINK-18485] - Kerberized YARN per-job on Docker test failed during unzip jce_policy-8.zip [FLINK-18519] - Propagate exception to client when execution fails for REST submission [FLINK-18520] - New Table Function type inference fails [FLINK-18529] - Query Hive table and filter by timestamp partition can fail [FLINK-18539] - StreamExecutionEnvironment#addSource(SourceFunction, TypeInformation) doesn&#39;t use the user defined type information [FLINK-18573] - InfluxDB reporter cannot be loaded as plugin [FLINK-18583] - The _id field is incorrectly set to index in Elasticsearch6 DynamicTableSink [FLINK-18585] - Dynamic index can not work in new DynamicTableSink [FLINK-18591] - Fix the format issue for metrics web page Improvement [FLINK-18186] - Various updates on Kubernetes standalone document [FLINK-18422] - Update Prefer tag in documentation &#39;Fault Tolerance training lesson&#39; [FLINK-18457] - Fix invalid links in &quot;Detecting Patterns&quot; page of &quot;Streaming Concepts&quot; [FLINK-18472] - Local Installation Getting Started Guide [FLINK-18484] - RowSerializer arity error does not provide specific information about the mismatch [FLINK-18501] - Mapping of Pluggable Filesystems to scheme is not properly logged [FLINK-18526] - Add the configuration of Python UDF using Managed Memory in the doc of Pyflink [FLINK-18532] - Remove Beta tag from MATCH_RECOGNIZE docs [FLINK-18561] - Build manylinux1 with better compatibility instead of manylinux2014 Python Wheel Packages [FLINK-18593] - Hive bundle jar URLs are broken Test [FLINK-18534] - KafkaTableITCase.testKafkaDebeziumChangelogSource failed with &quot;Topic &#39;changelog_topic&#39; already exists&quot; Task [FLINK-18502] - Add the page &#39;legacySourceSinks.zh.md&#39; into the directory &#39;docs/dev/table&#39; 
[FLINK-18505] - Correct the content of &#39;sourceSinks.zh.md&#39; `}),e.add({id:150,href:"/2020/07/14/application-deployment-in-flink-current-state-and-the-new-application-mode/",title:"Application Deployment in Flink: Current State and the new Application Mode",section:"Flink Blog",content:`With the rise of stream processing and real-time analytics as a critical tool for modern businesses, an increasing number of organizations build platforms with Apache Flink at their core and offer it internally as a service. Many talks with related topics from companies like Uber, Netflix and Alibaba in the latest editions of Flink Forward further illustrate this trend.
These platforms aim at simplifying application submission internally by lifting all the operational burden from the end user. To submit Flink applications, these platforms usually expose only a centralized or low-parallelism endpoint (e.g. a Web frontend) for application submission that we will call the Deployer.
One of the roadblocks that platform developers and maintainers often mention is that the Deployer can be a heavy resource consumer that is difficult to provision for. Provisioning for average load can lead to the Deployer service being overwhelmed with deployment requests (in the worst case, for all production applications in a short period of time), while planning for peak load leads to unnecessary costs. Building on this observation, Flink 1.11 introduces the Application Mode as a deployment option, which allows for a lightweight, more scalable application submission process that spreads the application deployment load more evenly across the nodes in the cluster.
In order to understand the problem and how the Application Mode solves it, we start by describing briefly the current status of application execution in Flink, before describing the architectural changes introduced by the new deployment mode and how to leverage them.
Application Execution in Flink # The execution of an application in Flink mainly involves three entities: the Client, the JobManager and the TaskManagers. The Client is responsible for submitting the application to the cluster, the JobManager is responsible for the necessary bookkeeping during execution, and the TaskManagers are the ones doing the actual computation. For more details please refer to Flink’s Architecture documentation page.
Current Deployment Modes # Before the introduction of the Application Mode in version 1.11, Flink allowed users to execute an application either on a Session or a Per-Job Cluster. The differences between the two have to do with the cluster lifecycle and the resource isolation guarantees they provide.
Session Mode # Session Mode assumes an already running cluster and uses the resources of that cluster to execute any submitted application. Applications executed in the same (session) cluster use, and consequently compete for, the same resources. This has the advantage that you do not pay the resource overhead of spinning up a full cluster for every submitted job. But, if one of the jobs misbehaves or brings down a TaskManager, then all jobs running on that TaskManager will be affected by the failure. Apart from a negative impact on the job that caused the failure, this implies a potential massive recovery process with all the restarting jobs accessing the file system concurrently and making it unavailable to other services. Additionally, having a single cluster running multiple jobs implies more load for the JobManager, which is responsible for the bookkeeping of all the jobs in the cluster. This mode is ideal for short jobs where startup latency is of high importance, e.g. interactive queries.
Per-Job Mode # In Per-Job Mode, the available cluster manager framework (e.g. YARN or Kubernetes) is used to spin up a Flink cluster for each submitted job, which is available to that job only. When the job finishes, the cluster is shut down and any lingering resources (e.g. files) are cleaned up. This mode allows for better resource isolation, as a misbehaving job cannot affect any other job. In addition, it spreads the load of bookkeeping across multiple entities, as each application has its own JobManager. Given the aforementioned resource isolation concerns of the Session Mode, users often opt for the Per-Job Mode for long-running jobs, where they are willing to accept some increase in startup latency in favor of resilience.
To summarize, in Session Mode, the cluster lifecycle is independent of any job running on the cluster and all jobs running on the cluster share its resources. The per-job mode chooses to pay the price of spinning up a cluster for every submitted job, in order to provide better resource isolation guarantees as the resources are not shared across jobs. In this case, the lifecycle of the cluster is bound to that of the job.
Application Submission # Flink application execution consists of two stages: pre-flight, when the users’ main() method is called; and runtime, which is triggered as soon as the user code calls execute(). The main() method constructs the user program using one of Flink’s APIs (DataStream API, Table API, DataSet API). When the main() method calls env.execute(), the user-defined pipeline is translated into a form that Flink&rsquo;s runtime can understand, called the job graph, and it is shipped to the cluster.
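To make the two phases concrete, here is a minimal sketch (the pipeline and job name are invented for illustration): everything before execute() belongs to the pre-flight phase and runs wherever main() is called, while execute() triggers the translation into a job graph and its submission to the cluster.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PreFlightVsRuntime {
    public static void main(String[] args) throws Exception {
        // Pre-flight phase: build the user-defined pipeline. No data is
        // processed yet; this code runs wherever main() is executed.
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> words = env.fromElements("flink", "deployment", "modes");
        words.map(String::toUpperCase).print();

        // Runtime phase: the pipeline is translated into a job graph and
        // shipped to the cluster for execution.
        env.execute("pre-flight-vs-runtime-example");
    }
}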
Despite their differences, both session and per-job modes execute the application’s main() method, i.e. the pre-flight phase, on the client side [1].
This is usually not a problem for individual users who already have all the dependencies of their jobs locally, and then submit their applications through a client running on their machine. But in the case of submission through a remote entity like the Deployer, this process includes:
downloading the application’s dependencies locally,
executing the main() method to extract the job graph,
shipping the job graph and its dependencies to the cluster for execution, and
potentially, waiting for the result.
This makes the Client a heavy resource consumer as it may need substantial network bandwidth to download dependencies and ship binaries to the cluster, and CPU cycles to execute the main() method. This problem is even more pronounced as more users share the same Client.
The figure above illustrates the two deployment modes using 3 applications depicted in red, blue and green. Each one has a parallelism of 3. The black rectangles represent different processes: TaskManagers, JobManagers and the Deployer; and we assume a single Deployer process in all scenarios. The colored triangles represent the load of the submission process, while the colored rectangles represent the load of the TaskManager and JobManager processes. As shown in the figure, the Deployer bears the same load in both per-job and session mode. Their difference lies in the distribution of the tasks and the JobManager load. In Session Mode, there is a single JobManager for all the jobs in the cluster, while in Per-Job Mode there is one for each job. In addition, tasks in Session Mode are assigned randomly to TaskManagers, while in Per-Job Mode each TaskManager can only have tasks of a single job.
Application Mode # The Application Mode builds on the above observations and tries to combine the resource isolation of the per-job mode with a lightweight and scalable application submission process. To achieve this, it creates a cluster per submitted application, but this time, the main() method of the application is executed on the JobManager.
Creating a cluster per application can be seen as creating a session cluster shared only among the jobs of a particular application and torn down when the application finishes. With this architecture, the Application Mode provides the same resource isolation and load balancing guarantees as the Per-Job Mode, but at the granularity of a whole application. This makes sense, as jobs belonging to the same application are expected to be correlated and treated as a unit.
Executing the main() method on the JobManager allows saving the CPU cycles required for extracting the job graph, but also the bandwidth required on the client for downloading the dependencies locally and shipping the job graph and its dependencies to the cluster. Furthermore, it spreads the network load more evenly, as there is one JobManager per application. This is illustrated in the figure above, where we have the same scenario as in the session and per-job deployment mode section, but this time the client load has shifted to the JobManager of each application.
Note In the Application Mode, the main() method is executed on the cluster and not on the Client, as in the other modes. This may have implications for your code as, for example, any paths you register in your environment using the registerCachedFile() must be accessible by the JobManager of your application. Compared to the Per-Job Mode, the Application Mode allows the submission of applications consisting of multiple jobs. The order of job execution is not affected by the deployment mode but by the call used to launch the job. Using the blocking execute() method establishes an order and will lead to the execution of the “next” job being postponed until “this” job finishes. In contrast, the non-blocking executeAsync() method will immediately continue to submit the “next” job as soon as the current job is submitted.
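As a rough sketch of such a multi-job application (the pipelines and job names below are made up for illustration), the blocking and non-blocking submission calls can be mixed in a single main() method:

import org.apache.flink.core.execution.JobClient;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MultiJobApplication {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Job 1: execute() blocks, so the next job is only submitted after
        // this one has finished.
        env.fromElements(1, 2, 3).map(i -> i * 2).print();
        env.execute("first-job");

        // Job 2: executeAsync() returns as soon as the job is submitted and
        // hands back a JobClient for further interaction with the running job.
        env.fromElements("a", "b", "c").print();
        JobClient client = env.executeAsync("second-job");
        System.out.println("Submitted job " + client.getJobID());
    }
}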
Reducing Network Requirements # As described above, by executing the application’s main() method on the JobManager, the Application Mode manages to save a lot of the resources previously required during job submission. But there is still room for improvement.
Focusing on YARN, which already supports all the optimizations mentioned here [2], even with the Application Mode in place, the Client is still required to send the user jar to the JobManager. In addition, for each application, the Client has to ship to the cluster the “flink-dist” directory, which contains the binaries of the framework itself, including the flink-dist.jar, lib/ and plugin/ directories. These two can account for a substantial amount of bandwidth on the client side. Furthermore, shipping the same flink-dist binaries on every submission is a waste of both bandwidth and storage space, which can be alleviated by simply allowing applications to share the same binaries.
In Flink 1.11, we introduce options that allow the user to:
Specify a remote path to a directory where YARN can find the Flink distribution binaries, and
Specify a remote path where YARN can find the user jar.
For 1., we leverage YARN’s distributed cache and allow applications to share these binaries. So, if an application happens to find copies of Flink on the local storage of its TaskManager due to a previous application that was executed on the same TaskManager, it will not even have to download it internally.
Note Both optimizations are available to all deployment modes on YARN, and not only the Application Mode. Example: Application Mode on Yarn # For a full description, please refer to the official Flink documentation and more specifically to the page that refers to your cluster management framework, e.g. YARN or Kubernetes. Here we will give some examples around YARN, where all the above features are available.
To launch an application in Application Mode, you can use:
./bin/flink run-application -t yarn-application ./MyApplication.jar With this command, all configuration parameters, such as the path to a savepoint to be used to bootstrap the application’s state or the required JobManager/TaskManager memory sizes, can be specified by their configuration option, prefixed by -D. For a catalog of the available configuration options, please refer to Flink’s configuration page.
As an example, the command to specify the memory sizes of the JobManager and the TaskManager would look like:
./bin/flink run-application -t yarn-application \\ -Djobmanager.memory.process.size=2048m \\ -Dtaskmanager.memory.process.size=4096m \\ ./MyApplication.jar As discussed earlier, the above will make sure that your application’s main() method will be executed on the JobManager.
To further save the bandwidth of shipping the Flink distribution to the cluster, consider pre-uploading the Flink distribution to a location accessible by YARN and using the yarn.provided.lib.dirs configuration option, as shown below:
./bin/flink run-application -t yarn-application \\ -Djobmanager.memory.process.size=2048m \\ -Dtaskmanager.memory.process.size=4096m \\ -Dyarn.provided.lib.dirs="hdfs://myhdfs/remote-flink-dist-dir" \\ ./MyApplication.jar Finally, in order to further save the bandwidth required to submit your application jar, you can pre-upload it to HDFS, and specify the remote path that points to ./MyApplication.jar, as shown below:
./bin/flink run-application -t yarn-application \\ -Djobmanager.memory.process.size=2048m \\ -Dtaskmanager.memory.process.size=4096m \\ -Dyarn.provided.lib.dirs="hdfs://myhdfs/remote-flink-dist-dir" \\ hdfs://myhdfs/jars/MyApplication.jar This will make the job submission extra lightweight as the needed Flink jars and the application jar are going to be picked up from the specified remote locations rather than be shipped to the cluster by the Client. The only thing the Client will ship to the cluster is the configuration of your application which includes all the aforementioned paths.
Conclusion # We hope that this discussion helped you understand the differences between the various deployment modes offered by Flink and will help you to make informed decisions about which one is suitable in your own setup. Feel free to play around with them and report any issues you may find. If you have any questions or requests, do not hesitate to post them in the mailing lists and, hopefully, see you (virtually) at one of our conferences or meetups soon!
[1] The only exceptions are the Web Submission and the Standalone per-job implementation.
[2] Support for Kubernetes will come soon.
`}),e.add({id:151,href:"/2020/07/06/apache-flink-1.11.0-release-announcement/",title:"Apache Flink 1.11.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is proud to announce the release of Flink 1.11.0! More than 200 contributors worked on over 1.3k issues to bring significant improvements to usability as well as new features to Flink users across the whole API stack. Some highlights that we’re particularly excited about are:
The core engine is introducing unaligned checkpoints, a major change to Flink’s fault tolerance mechanism that improves checkpointing performance under heavy backpressure.
A new Source API that simplifies the implementation of (custom) sources by unifying batch and streaming execution, as well as offloading internals such as event-time handling, watermark generation or idleness detection to Flink.
Flink SQL is introducing Support for Change Data Capture (CDC) to easily consume and interpret database changelogs from tools like Debezium. The renewed FileSystem Connector also expands the set of use cases and formats supported in the Table API/SQL, enabling scenarios like streaming data directly from Kafka to Hive.
Multiple performance optimizations to PyFlink, including support for vectorized User-defined Functions (Pandas UDFs). This improves interoperability with libraries like Pandas and NumPy, making Flink more powerful for data science and ML workloads.
Read on for all major new features and improvements, important changes to be aware of and what to expect moving forward!
The binary distribution and source artifacts are now available on the updated Downloads page of the Flink website, and the most recent distribution of PyFlink is available on PyPI. Please review the release notes carefully, and check the complete release changelog and updated documentation for more details.
We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
New Features and Improvements # Unaligned Checkpoints (Beta) # Triggering a checkpoint in Flink will cause a checkpoint barrier to flow from the sources of your topology all the way towards the sinks. For operators that receive more than one input stream, the barriers flowing through each channel need to be aligned before the operator can snapshot its state and forward the checkpoint barrier — typically, this alignment will take just a few milliseconds to complete, but it can become a bottleneck in backpressured pipelines as:
Checkpoint barriers will flow much slower through backpressured channels, effectively blocking the remaining channels and their upstream operators during checkpointing;
Slow checkpoint barrier propagation leads to longer checkpointing times and can, worst case, result in little to no progress in the application.
To improve the performance of checkpointing under backpressure scenarios, the community is rolling out the first iteration of unaligned checkpoints (FLIP-76) with Flink 1.11. Compared to the original checkpointing mechanism (Fig. 1), this approach doesn’t wait for barrier alignment across input channels, instead allowing barriers to overtake in-flight records (i.e., data stored in buffers) and forwarding them downstream before the synchronous part of the checkpoint takes place (Fig. 2).
Fig.1: Aligned Checkpoints Fig.2: Unaligned Checkpoints Because in-flight records have to be persisted as part of the snapshot, unaligned checkpoints will lead to increased checkpoint sizes. On the upside, checkpointing times are heavily reduced, so users will see more progress (even in unstable environments) as more up-to-date checkpoints will lighten the recovery process. You can learn more about the current limitations of unaligned checkpoints in the documentation, and track the improvement work planned for this feature in FLINK-14551.
As with any beta feature, we appreciate early feedback that you might want to share with the community after giving unaligned checkpoints a try!
Info To enable this feature, you need to configure the enableUnalignedCheckpoints option in your checkpoint config. Please note that unaligned checkpoints can only be enabled if checkpointingMode is set to CheckpointingMode.EXACTLY_ONCE.
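For illustration, a minimal sketch of enabling the feature through the checkpoint config could look like the following (the one-minute checkpoint interval is an arbitrary example value):

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UnalignedCheckpointsSetup {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Checkpoint every 60 seconds with exactly-once guarantees, which is
        // a prerequisite for unaligned checkpoints.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        // Allow checkpoint barriers to overtake buffered in-flight records.
        env.getCheckpointConfig().enableUnalignedCheckpoints();
    }
}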
Unified Watermark Generators # So far, watermark generation (prev. also called assignment) relied on two different interfaces: AssignerWithPunctuatedWatermarks and AssignerWithPeriodicWatermarks; that were closely intertwined with timestamp extraction. This made it difficult to implement long-requested features like support for idleness detection, besides leading to code duplication and maintenance burden. With FLIP-126, the legacy watermark assigners are unified into a single interface: the WatermarkGenerator; and detached from the TimestampAssigner.
This gives users more control over watermark emission and simplifies the implementation of new connectors that need to support watermark assignment and timestamp extraction at the source (see New Data Source API). Multiple strategies for watermarking are available out-of-the-box as convenience methods in Flink 1.11 (e.g. forBoundedOutOfOrderness, forMonotonousTimestamps), though you can also choose to customize your own.
Support for Watermark Idleness Detection
The WatermarkStrategy.withIdleness() method allows you to mark a stream as idle if no events arrive within a configured time (i.e. a timeout duration), which in turn allows handling event time skew properly and preventing idle partitions from holding back the event time progress of the entire application. Users can already benefit from per-partition idleness detection in the Kafka connector, which has been adapted to use the new interfaces (FLINK-17669).
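As a small sketch of how these pieces fit together (the ClickEvent type, its timestamp field and the chosen durations are assumptions made for this example), a strategy combining bounded out-of-orderness, a timestamp assigner and idleness detection can be composed like this and then passed to a source or applied via DataStream#assignTimestampsAndWatermarks:

import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;

public class WatermarkStrategyExample {

    // A hypothetical event type carrying an epoch-millis event timestamp.
    public static class ClickEvent {
        public long timestampMillis;
    }

    public static void main(String[] args) {
        WatermarkStrategy<ClickEvent> strategy =
            WatermarkStrategy
                // tolerate events that arrive up to 5 seconds out of order
                .<ClickEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                // extract the event-time timestamp from the record itself
                .withTimestampAssigner((event, recordTimestamp) -> event.timestampMillis)
                // mark a partition as idle after 1 minute without events so it
                // does not hold back the event-time progress of the pipeline
                .withIdleness(Duration.ofMinutes(1));

        System.out.println("Built watermark strategy: " + strategy);
    }
}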
Note FLIP-126 introduces no breaking changes, but we recommend that users give preference to the new WatermarkGenerator interface moving forward, in preparation for the deprecation of the legacy watermark assigners in future releases.
New Data Source API (Beta) # Up to this point, writing a production-grade source connector for Flink was a non-trivial task that required users to be somewhat familiar with Flink internals and account for implementation details like event time assignment, watermark generation or idleness detection in their code. Flink 1.11 introduces a new Data Source API (FLIP-27) to overcome these limitations, as well as the need to rewrite separate code for batch and streaming execution.
Separating the work of split discovery and the actual reading of the consumed data (i.e. the splits) in different components — resp. the SplitEnumerator and SourceReader — allows mixing and matching different enumeration strategies and split readers.
As an example, the existing Kafka connector has multiple strategies for partition discovery that are intermingled with the rest of the code. With the new interfaces in place, it would only need a single reader implementation and there could be several split enumerators for the different partition discovery strategies.
Batch and Streaming Unification
Source connectors implemented using the Data Source API will be able to work both as a bounded (batch) and unbounded (streaming) source. The difference between both cases is minimal: for bounded input, the SplitEnumerator will generate a fixed set of splits and each split is finite; for unbounded input, either the splits are not finite or the SplitEnumerator keeps generating new splits.
Implicit Watermark and Event Time Handling
The TimestampAssigner and WatermarkGenerator run transparently as part of the SourceReader component, so users also don’t have to implement any timestamp extraction or watermark generation code.
Note The existing source connectors have not yet been reimplemented using the Data Source API — this is planned for upcoming releases. If you’re looking to implement a new source, please refer to the Data Source documentation and the tips on source development.
Application Mode Deployments # Prior to Flink 1.11, jobs in a Flink application could either be submitted to a long-running Flink Session Cluster (session mode) or a dedicated Flink Job Cluster (job mode). For both these modes, the main() method of user programs runs on the client side. This presents some challenges: on one hand, if the client is part of a large installation, it can easily become a bottleneck for JobGraph generation; and on the other, it’s not a good fit for containerized environments like Docker or Kubernetes.
From this release on, Flink gets an additional deployment mode: Application Mode (FLIP-85); where the main() method runs on the cluster, rather than the client. The job submission becomes a one-step process: you package your application logic and dependencies into an executable job JAR and the cluster entrypoint (ApplicationClusterEntryPoint) is responsible for calling the main() method to extract the JobGraph.
In Flink 1.11, the community also worked on supporting Application Mode in Kubernetes (FLINK-10934).
Other Improvements # Unified Memory Configuration for JobManagers (FLIP-116)
Following the work started in Flink 1.10 to improve memory management and configuration, this release introduces a new memory model that aligns the JobManagers’ configuration options and terminology with that introduced in FLIP-49 for TaskManagers. This affects all deployment types: standalone, YARN, Mesos and the new active Kubernetes integration.
Attention Reusing a previous Flink configuration without any adjustments can result in differently computed memory parameters for the JVM and, as a result, performance changes or even failures. Make sure to check the migration guide if you’re planning to update to the latest version.
Improvements to the Flink WebUI (FLIP-75)
In Flink 1.11, the community kicked off a series of improvements to the Flink WebUI. The first to roll out are better TaskManager and JobManager log display (FLIP-103), as well as a new thread dump utility (FLINK-14816). Some additional work planned for upcoming releases includes better backpressure detection, more flexible and configurable exception display and support for displaying the history of subtask failure attempts.
Docker Image Unification (FLIP-111)
With this release, all Docker-related resources have been consolidated into apache/flink-docker and the entry point script has been extended to allow users to run the default Docker image in different modes without the need to create a custom image. The updated documentation describes in detail how to use and customize the official Flink Docker image for different environments and deployment modes.
Table API/SQL: Support for Change Data Capture (CDC) # Change Data Capture (CDC) has become a popular pattern to capture committed changes from a database and propagate those changes to downstream consumers, for example to keep multiple datastores in sync and avoid common pitfalls such as dual writes. Being able to easily ingest and interpret these changelogs with the Table API/SQL has been a highly demanded feature in the Flink community — and it’s now possible with Flink 1.11.
To extend the scope of the Table API/SQL to use cases like CDC, Flink 1.11 introduces new table source and sink interfaces with changelog mode (see New TableSource and TableSink Interfaces) and support for the Debezium and Canal formats (FLIP-105). This means that dynamic table sources are no longer limited to append-only operations and can ingest these external changelogs (INSERT events), interpret them into change operations (INSERT, UPDATE, DELETE events) and emit them downstream with the change type.
To consume changelogs using SQL DDL, users have to specify either 'format' = 'debezium-json' or 'format' = 'canal-json' in their CREATE TABLE statement.
CREATE TABLE my_table (
  ...
) WITH (
  'connector' = '...', -- e.g. 'kafka'
  'format' = 'debezium-json',
  'debezium-json.schema-include' = 'true', -- default: false (Debezium can be configured to include or exclude the message schema)
  'debezium-json.ignore-parse-errors' = 'true' -- default: false
);

Flink 1.11 only supports Kafka as a changelog source out-of-the-box and JSON-encoded changelogs, with Avro (Debezium) and Protobuf (Canal) planned for future releases. There are also plans to support MySQL binlogs and Kafka compacted topics as sources, as well as to extend changelog support to batch execution.
Attention There is a known issue (FLINK-18461) that prevents changelog sources from being used to write to upsert sinks (e.g. MySQL, HBase, Elasticsearch). This will be fixed in the next patch release (1.11.1).
Table API/SQL: JDBC Catalog Interface and Postgres Catalog # Flink 1.11 introduces a generic JDBC catalog interface (FLIP-93) that enables users of the Table API/SQL to derive table schemas automatically from connections to relational databases over JDBC. This eliminates the previous need for manual schema definition and type conversion, and also makes it possible to check for schema errors at compile time instead of at runtime.
The first implementation, rolling out with the new release, is the Postgres catalog.
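As a rough sketch of how this could look from a PyFlink program, the catalog can be registered with a CREATE CATALOG statement and then used like any other catalog. The catalog name, database and connection details below are placeholders, and the property keys are assumed to follow the Flink 1.11 JDBC catalog documentation:

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, EnvironmentSettings

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(
    env,
    environment_settings=EnvironmentSettings.new_instance()
        .in_streaming_mode().use_blink_planner().build())

# Register a Postgres-backed JDBC catalog (connection details are placeholders).
t_env.execute_sql("""
    CREATE CATALOG my_postgres WITH (
        'type' = 'jdbc',
        'default-database' = 'mydb',
        'username' = 'flink',
        'password' = 'secret',
        'base-url' = 'jdbc:postgresql://localhost:5432/'
    )
""")

# Tables of the connected database become queryable without any manual
# schema definition, e.g. SELECT * FROM my_postgres.mydb.my_table.
t_env.execute_sql("USE CATALOG my_postgres")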
Table API/SQL: FileSystem Connector with Support for Avro, ORC and Parquet # To improve the user experience for end-to-end streaming ETL use cases, the Flink community worked on a new FileSystem Connector for the Table API/SQL (FLIP-115). The implementation is based on Flink’s FileSystem abstraction and reuses StreamingFileSink to ensure the same set of capabilities and consistent behaviour with the DataStream API.
This also means that Table API/SQL users can now make use of all formats already supported by StreamingFileSink, like (Avro) Parquet, as well as the new formats introduced with this release, like Avro (FLINK-11395) and ORC (FLINK-10114).
CREATE TABLE my_table (
  column_name1 INT,
  column_name2 STRING,
  ...
  part_name1 INT,
  part_name2 STRING
) PARTITIONED BY (part_name1, part_name2) WITH (
  'connector' = 'filesystem',
  'path' = 'file:///path/to/file',
  'format' = '...', -- supported formats: Avro, ORC, Parquet, CSV, JSON
  ...
);

The new all-rounder FileSystem Connector transparently handles batch and streaming execution, provides exactly-once guarantees and has full partition support, greatly expanding the scope of usage of the legacy connector. This allows users to easily implement common use cases like directly streaming data from Kafka to Hive.
You can track the upcoming improvements to the FileSystem Connector in FLINK-17778.
Table API/SQL: Support for Python UDFs # Prior to this release, users of the Table API/SQL were limited to defining UDFs in either Java or Scala. In Flink 1.11, the community worked on expanding the usage scope of the Python language beyond PyFlink, adding support for Python UDFs in the SQL DDL syntax (FLIP-106), as well as in the SQL Client (FLIP-114). Users can also register Python UDFs in the system catalog via SQL DDL or the Java Catalog API, so that functions can be shared between jobs.
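As a hedged sketch of the DDL-based registration: it assumes a hypothetical module my_udfs.py that defines a udf-decorated function add_one and is shipped with the job, and that the FLIP-106 CREATE FUNCTION ... LANGUAGE PYTHON syntax is used as shown.

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, EnvironmentSettings

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(
    env,
    environment_settings=EnvironmentSettings.new_instance()
        .in_streaming_mode().use_blink_planner().build())

# Register a Python UDF via SQL DDL (FLIP-106). 'my_udfs.add_one' is a
# hypothetical module path; the module has to be available to the job,
# e.g. via the python.files configuration option.
t_env.execute_sql(
    "CREATE TEMPORARY SYSTEM FUNCTION add_one AS 'my_udfs.add_one' LANGUAGE PYTHON")

# The function can then be used like any other function, e.g.:
#   SELECT add_one(my_column) FROM my_table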
Other Improvements to the Table API/SQL # DDL and DML Compatibility for the Hive Connector (FLIP-123)
Starting from Flink 1.11, users can write SQL statements directly using Hive syntax (HiveQL) in the Table API/SQL and the SQL Client. For this purpose, an additional dialect was introduced and users can now dynamically switch between Flink (default) and Hive (hive) on a per-statement basis. For a complete list of supported DDL and DML statements, check the Hive dialect documentation.
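For illustration, switching the dialect from a PyFlink program might look like the following sketch; it assumes the dialect is controlled by the 'table.sql-dialect' option introduced with FLIP-123 and that a HiveCatalog is already registered and set as the current catalog.

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, EnvironmentSettings

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(
    env,
    environment_settings=EnvironmentSettings.new_instance()
        .in_streaming_mode().use_blink_planner().build())

# Switch to the Hive dialect before issuing HiveQL statements
# (assumes a HiveCatalog is registered and set as the current catalog).
t_env.get_config().get_configuration().set_string("table.sql-dialect", "hive")
t_env.execute_sql("CREATE TABLE hive_users (user_id STRING, cnt BIGINT)")

# Switch back to the default Flink dialect for regular Flink SQL.
t_env.get_config().get_configuration().set_string("table.sql-dialect", "default")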
Extensions and Improvements to the Flink SQL Syntax
Flink 1.11 introduces the concept of primary key constraints to leverage runtime optimizations in Flink SQL DDL (FLIP-87);
View objects are now fully supported in SQL DDL using the CREATE/ALTER/DROP VIEW statements (FLIP-71);
Users can now specify or override table options in their DQL/DML statements using dynamic table options (FLIP-113); a short example follows after this list.
To make connector properties less verbose and improve exception handling, some key properties have been refactored (FLIP-122). This change does not break compatibility, so users can still use the old property keys.
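The following PyFlink sketch illustrates dynamic table options (FLIP-113) by overriding a connector option for a single query with an OPTIONS hint. The table name and the 'scan.startup.mode' key are placeholders for a Kafka-backed table, and the hint support is assumed to be gated by the 'table.dynamic-table-options.enabled' option.

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, EnvironmentSettings

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(
    env,
    environment_settings=EnvironmentSettings.new_instance()
        .in_streaming_mode().use_blink_planner().build())

# Dynamic table options are disabled by default and have to be switched on first.
t_env.get_config().get_configuration().set_string(
    "table.dynamic-table-options.enabled", "true")

# Override the startup mode of a (hypothetical) Kafka-backed table for this query only.
result = t_env.sql_query(
    "SELECT * FROM kafka_table /*+ OPTIONS('scan.startup.mode'='earliest-offset') */")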
New TableSource and TableSink Interfaces (FLIP-95)
Flink 1.11 introduces new table source and sink interfaces (DynamicTableSource and DynamicTableSink, respectively) that unify batch and streaming execution, provide more efficient data processing with the Blink planner and offer support for handling changelogs (see Support for Change Data Capture (CDC)). The new interfaces also make it easier for users to implement custom connectors or modify existing ones. For an end-to-end example of how to implement a custom scan table source with a decoding format that supports changelog semantics, check out the documentation.
Note Although compatibility is not immediately affected, we recommend that Table API/SQL users update any sources and sinks to the new interface stack.
Refactored TableEnvironment Interface (FLIP-84)
The semantics to describe similar behaviours in the TableEnvironment and Table interfaces have diverged over time, leading to an inconsistent and sometimes unclear user experience. To improve this and make programming more fluent in the Table API/SQL, Flink 1.11 introduces new methods that unify behaviours like execution triggering (e.g. executeSql()) and result representation (e.g. print(), collect()), and also lay the groundwork for important features like multi-statement execution support in future releases.
Note The methods deprecated with FLIP-84 will not be immediately removed, but we recommend that users adopt the newly introduced methods. For a complete list of new and deprecated methods, check the “Summary” section of FLIP-84.
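To illustrate the new execution-triggering behaviour from PyFlink, here is a minimal sketch. It assumes the Python TableEnvironment exposes execute_sql() as the counterpart of executeSql(), and uses the datagen and print connectors as stand-ins for real sources and sinks:

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, EnvironmentSettings

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(
    env,
    environment_settings=EnvironmentSettings.new_instance()
        .in_streaming_mode().use_blink_planner().build())

# DDL statements are executed eagerly by execute_sql().
t_env.execute_sql("""
    CREATE TABLE source_table (user_id STRING, amount BIGINT)
    WITH ('connector' = 'datagen')
""")
t_env.execute_sql("""
    CREATE TABLE sink_table (user_id STRING, amount BIGINT)
    WITH ('connector' = 'print')
""")

# An INSERT INTO submitted through execute_sql() triggers job execution directly,
# without a separate call to an execute() method.
t_env.execute_sql("INSERT INTO sink_table SELECT user_id, amount FROM source_table")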
New Type Inference for Table API UDFs (FLIP-65)
In Flink 1.9, the community started working on a new data type system for the Table API to improve its compliance with the SQL standard (FLIP-37). This work is now close to being completed in Flink 1.11, with the exposure of Table API UDFs to the new type system (scalar and table functions, with aggregate functions planned for the next release).
PyFlink: Support for Pandas UDFs # Up to this release, Python UDFs in PyFlink only supported scalar values of standard Python types. This presented some limitations:
High serialization/deserialization overhead in the process of transferring data between the JVM and the Python processes;
Difficulty integrating with common Python libraries for high-performance numerical processing, like pandas and NumPy.
To overcome these limitations, the community introduced support for (scalar) vectorized Python UDFs based on pandas in Flink 1.11 (FLIP-97). The performance of vectorized UDFs is usually much higher, as the serialization/deserialization overhead is minimized by relying on Apache Arrow; and handling pandas.Series as input/output allows users to take full advantage of the pandas and NumPy libraries. This makes Pandas UDFs a popular solution to parallelize Machine Learning and other large-scale, distributed data science workloads (e.g. feature engineering, distributed model application).
@udf(input_types=[DataTypes.BIGINT(), DataTypes.BIGINT()],
     result_type=DataTypes.BIGINT(), udf_type="pandas")
def add(i, j):
    return i + j

To mark a UDF as a Pandas UDF, you only need to add an extra parameter udf_type="pandas" in the udf decorator, as described in the documentation.
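For completeness, here is a self-contained sketch of how such a Pandas UDF could be registered and used from a table program; the source table 'my_source' and its BIGINT columns a and b are hypothetical.

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, EnvironmentSettings, DataTypes
from pyflink.table.udf import udf

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(
    env,
    environment_settings=EnvironmentSettings.new_instance()
        .in_streaming_mode().use_blink_planner().build())

@udf(input_types=[DataTypes.BIGINT(), DataTypes.BIGINT()],
     result_type=DataTypes.BIGINT(), udf_type="pandas")
def pandas_add(i, j):
    # i and j arrive as pandas.Series, so the addition is vectorized.
    return i + j

t_env.register_function("pandas_add", pandas_add)

# 'my_source' is a hypothetical table with BIGINT columns a and b.
result = t_env.sql_query("SELECT pandas_add(a, b) AS s FROM my_source")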
Other Improvements to PyFlink # Conversion fromPandas/toPandas (FLIP-120)
Arrow is also supported as an optimization to convert between PyFlink tables and pandas.DataFrames, enabling users to switch processing engines seamlessly without the need for an intermediate connector. For examples on how to use the new fromPandas() and toPandas() methods in PyFlink, check out the documentation.
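A small, self-contained sketch of the round trip (assuming pandas and PyArrow are installed alongside PyFlink, and that the Python method names are the snake_case from_pandas/to_pandas counterparts):

import pandas as pd

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, EnvironmentSettings

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(
    env,
    environment_settings=EnvironmentSettings.new_instance()
        .in_streaming_mode().use_blink_planner().build())

# pandas DataFrame -> PyFlink Table (the conversion goes through Arrow).
pdf = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
table = t_env.from_pandas(pdf)

# PyFlink Table -> pandas DataFrame (collects the bounded result to the client).
result_pdf = table.to_pandas()
print(result_pdf)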
Support for User-defined Table Functions (UDTFs) (FLINK-14500)
From Flink 1.11, you can define and register custom UDTFs in PyFlink. Similar to a Python UDF, a UDTF takes zero, one or multiple scalar values as input, but can return an arbitrary number of rows as output instead of a single value.
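A hedged sketch of a Python UDTF that splits a line into words; the decorator arguments follow the pattern of the PyFlink UDF API, and the 'lines' table with its 'line' column is hypothetical.

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, EnvironmentSettings, DataTypes
from pyflink.table.udf import udtf

env = StreamExecutionEnvironment.get_execution_environment()
t_env = StreamTableEnvironment.create(
    env,
    environment_settings=EnvironmentSettings.new_instance()
        .in_streaming_mode().use_blink_planner().build())

# Emits one row (word, length) per word of the input string.
@udtf(input_types=DataTypes.STRING(),
      result_types=[DataTypes.STRING(), DataTypes.BIGINT()])
def split(line):
    for word in line.split(" "):
        yield word, len(word)

t_env.register_function("split", split)

# 'lines' is a hypothetical table with a single STRING column named line.
result = t_env.sql_query(
    "SELECT word, length FROM lines, LATERAL TABLE(split(line)) AS T(word, length)")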
Cython Performance Optimization for UDFs (FLIP-121)
Cython is a compiled superset of the Python language that is often used to improve the performance of large-scale numeric processing in Python, as it optimizes execution to machine code-level speed and pairs well with popular C-based libraries like NumPy. From Flink 1.11, you can build PyFlink with Cython support and “Cythonize” your Python UDFs to substantially improve code execution speed (up to 30x faster, compared to Python UDFs in Flink 1.10).
User-defined Metrics in Python UDFs (FLIP-112)
To make it easier for users to monitor and debug the execution of Python UDFs, PyFlink now allows gathering and exposing metrics to external systems, as well as defining user scopes and variables. You can access the metrics system from a UDF by calling function_context.get_metric_group() in the open method, as described in the documentation.
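As a sketch of this pattern (the counter API and metric name are assumptions based on the description above), a scalar function that counts its invocations could look like this:

from pyflink.table import DataTypes
from pyflink.table.udf import ScalarFunction, udf

class CountingUpper(ScalarFunction):

    def open(self, function_context):
        # The metric group returned by get_metric_group() exposes user-defined
        # metrics; 'invocations' is an arbitrary example counter name.
        self.counter = function_context.get_metric_group().counter("invocations")

    def eval(self, s):
        self.counter.inc()
        return s.upper()

# Wrap and register it like any other Python UDF, e.g. with
# t_env.register_function("counting_upper", counting_upper).
counting_upper = udf(CountingUpper(), DataTypes.STRING(), DataTypes.STRING())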
Important Changes # [FLINK-17339] The Blink planner is the default in the Table API/SQL starting from Flink 1.11. This was already the case for the SQL Client since Flink 1.10. The old Flink planner is still supported, but not actively developed.
[FLINK-5763] Savepoints now contain all their state inside a single directory (both metadata and program state). This makes it straightforward to figure out which files make up the state of a savepoint and allows users to relocate savepoints by simply moving a directory.
[FLINK-16408] To reduce pressure on the JVM metaspace, the user code class loader is being reused by a TaskExecutor as long as there is at least a single slot allocated for the respective job. This changes Flink’s recovery behaviour slightly, so that it will not reload static fields.
[FLINK-11086] Flink now supports Hadoop versions above Hadoop 3.0.0. Note that the Flink project does not provide any updated “flink-shaded-hadoop-*” jars. Users need to provide Hadoop dependencies through the HADOOP_CLASSPATH environment variable (recommended) or the lib/ folder.
[FLINK-16963] All MetricReporters that come with Flink have been converted to plugins. These should no longer be placed into /lib (which may result in dependency conflicts), but /plugins/<some_directory> instead.
[FLINK-12639] The Flink documentation is undergoing some rework, so you might notice that the navigation and organization of content look slightly different starting from Flink 1.11.
Release Notes # Please review the release notes carefully for a detailed list of changes and new features if you plan to upgrade your setup to Flink 1.11. This version is API-compatible with previous 1.x releases for APIs annotated with the @Public annotation.
List of Contributors # The Apache Flink community would like to thank all the 200+ contributors that have made this release possible:
Aitozi, Alexander Fedulov, Alexey Trenikhin, Aljoscha Krettek, Andrey Zagrebin, Arvid Heise, Ayush Saxena, Bairos, Bartosz Krasinski, Benchao Li, Benoit Hanotte, Benoît Paris, Bhagavan Das, Canbin Zheng, Cedric Chen, Chesnay Schepler, Colm O hEigeartaigh, Congxian Qiu, CrazyTomatoOo, Danish Amjad, Danny Chan, David Anderson, Dawid Wysakowicz, Dian Fu, Dominik Wosiński, Echo Lee, Ethan Marsh, Etienne Chauchot, Fabian Hueske, Fabian Paul, Flavio Pompermaier, Gao Yun, Gary Yao, Ghildiyal, Grebennikov Roman, GuoWei Ma, Guru Prasad, Gyula Fora, Hequn Cheng, Hu Guang, HuFeiHu, HuangXingBo, Igal Shilman, Ismael Juma, Jacob Sevart, Jark Wu, Jaskaran Bindra, Jason K, Jeff Yang, Jeff Zhang, Jerry Wang, Jiangjie (Becket) Qin, Jiayi, Jiayi Liao, Jiayi-Liao, Jincheng Sun, Jing Zhang, Jingsong Lee, JingsongLi, Jun Qin, JunZhang, Jörn Kottmann, Kevin Bohinski, Konstantin Knauf, Kostas Kloudas, Kurt Young, Leonard Xu, Lining Jing, Liupengcheng, LululuAlu, Marta Paes Moreira, Matt Welke, Max Kuklinski, Maximilian Michels, Nico Kruber, Niels Basjes, Oleksandr Nitavskyi, Paul Lam, Paul Lin, PengFei Li, PengchengLiu, Piotr Nowojski, Prem Santosh, Qingsheng Ren, Rafi Aroch, Raymond Farrelly, Richard Deurwaarder, Robert Metzger, RocMarshal, Roey Shem Tov, Roman, Roman Khachatryan, Rong Rong, RoyRuan, Rui Li, Seth Wiesman, Shaobin.Ou, Shengkai, Shuiqiang Chen, Shuo Cheng, Sivaprasanna, Sivaprasanna S, SteNicholas, Stefan Richter, Stephan Ewen, Steve OU, Steve Whelan, Tartarus, Terry Wang, Thomas Weise, Till Rohrmann, Timo Walther, TsReaper, Tzu-Li (Gordon) Tai, Victor Wong, Wei Zhong, Weike DONG, Xiaogang Zhou, Xintong Song, Xu Bai, Xuannan, Yadong Xie, Yang Wang, Yangze Guo, Yichao Yang, Ying, Yu Li, Yuan Mei, Yun Gao, Yun Tang, Yuval Itzchakov, Zakelly, Zhao, Zhenghua Gao, Zhijiang, Zhu Zhu, acqua.csq, austin ce, azagrebin, bdine, bowen.li, caoyingjie, caozhen, caozhen1937, chaojianok, chen, chendonglin, comsir, cpugputpu, czhang2, dianfu, edu05, eduardowt, fangliang, felixzheng, fmyblack, gauss, gk0916, godfrey he, godfreyhe, guliziduo, guowei.mgw, hehuiyuan, hequn8128, hpeter, huangxingbo, huzheng, ifndef-SleePy, jingwen-ywb, jrthe42, kevin.cyj, klion26, lamber-ken, leesf, libenchao, lijiewang.wlj, liuyongvs, lsy, lumen, machinedoll, mans2singh, molsionmo, oliveryunchang, openinx, paul8263, ptmagic, qqibrow, sev7e0, shuai-xu, shuai.xu, shuiqiangchen, snuyanzin, spafka, sunhaibotb, sunjincheng121, testfixer, tison, vinoyang, vthinkxie, wangtong, wangxianghu, wangxiyuan, wangxlong, wangyang0918, wenlong.lwl, whlwanghailong, william, windWheel, wooplevip, wuxuyang, xushiwei, xuyang1706, yanghua, yangyichao-mango, yuzhao.cyz, zentol, zhanglibing, zhangmang, zhangzhanchun, zhengcanbin, zhengshuli, zhenxianyimeng, zhijiang, zhongyong jin, zhule, zhuxiaoshang, zjuwangg, zoudan, zoudaokoulife, zzchun, “lzh576177775”, 骚sir, 厉颖, 张军, 曹建华, 漫步云端
`}),e.add({id:152,href:"/2020/06/23/flink-on-zeppelin-notebooks-for-interactive-data-analysis-part-2/",title:"Flink on Zeppelin Notebooks for Interactive Data Analysis - Part 2",section:"Flink Blog",content:`In a previous post, we introduced the basics of Flink on Zeppelin and how to do Streaming ETL. In this second part of the “Flink on Zeppelin” series of posts, I will share how to perform streaming data visualization via Flink on Zeppelin and how to use Apache Flink UDFs in Zeppelin.
Streaming Data Visualization # With Zeppelin, you can build a real-time streaming dashboard without writing a single line of JavaScript/HTML/CSS code.
Overall, Zeppelin supports 3 kinds of streaming data analytics:
Single Mode
Update Mode
Append Mode

Single Mode # Single mode is used for cases where the result of a SQL statement is always a single row, such as the following example. The output format is translated into HTML, and you can specify a paragraph-local property template for the final output content template. You can use {i} as a placeholder for the i-th column of the result.
Update Mode # Update mode is suitable for cases where the output is more than one row and will be continuously updated. Here’s one example where we use GROUP BY.
Append Mode # Append mode is suitable for cases where the output data is always appended. For instance, the example below uses a tumble window.
UDF # SQL is a very powerful language, especially for expressing data flow. But most of the time, you need to handle complicated business logic that cannot be expressed by SQL. In these cases, UDFs (user-defined functions) come in particularly handy. In Zeppelin, you can write Scala or Python UDFs, and you can also import Scala, Python and Java UDFs. Here are 2 examples of Scala and Python UDFs:
Scala UDF

%flink

class ScalaUpper extends ScalarFunction {
  def eval(str: String) = str.toUpperCase
}
btenv.registerFunction("scala_upper", new ScalaUpper())

Python UDF

%flink.pyflink

class PythonUpper(ScalarFunction):
    def eval(self, s):
        return s.upper()

bt_env.register_function("python_upper", udf(PythonUpper(), DataTypes.STRING(), DataTypes.STRING()))

After you define the UDFs, you can use them directly in SQL:
Use Scala UDF in SQL. Use Python UDF in SQL. Summary # In this post, we explained how to perform streaming data visualization via Flink on Zeppelin and how to use UDFs. Besides that, you can do more in Zeppelin with Flink, such as batch processing, Hive integration and more. You can check the following articles for more details, and here’s a list of Flink on Zeppelin tutorial videos for your reference.
References # Apache Zeppelin official website Flink on Zeppelin tutorials - Part 1 Flink on Zeppelin tutorials - Part 2 Flink on Zeppelin tutorials - Part 3 Flink on Zeppelin tutorials - Part 4 Flink on Zeppelin tutorial videos `}),e.add({id:153,href:"/2020/06/15/flink-on-zeppelin-notebooks-for-interactive-data-analysis-part-1/",title:"Flink on Zeppelin Notebooks for Interactive Data Analysis - Part 1",section:"Flink Blog",content:`The latest release of Apache Zeppelin comes with a redesigned interpreter for Apache Flink (only Flink 1.10+ is supported moving forward) that allows developers to use Flink directly on Zeppelin notebooks for interactive data analysis. I wrote 2 posts about how to use Flink in Zeppelin. This is part 1, where I explain how the Flink interpreter in Zeppelin works, and provide a tutorial for running Streaming ETL with Flink on Zeppelin.
The Flink Interpreter in Zeppelin 0.9 # The Flink interpreter can be accessed and configured from Zeppelin’s interpreter settings page. The interpreter has been refactored so that Flink users can now take advantage of Zeppelin to write Flink applications in three languages, namely Scala, Python (PyFlink) and SQL (for both batch & streaming executions). Zeppelin 0.9 now comes with the Flink interpreter group, consisting of the below five interpreters:
%flink - Provides a Scala environment
%flink.pyflink - Provides a Python environment
%flink.ipyflink - Provides an IPython environment
%flink.ssql - Provides a stream SQL environment
%flink.bsql - Provides a batch SQL environment

Not only has the interpreter been extended to support writing Flink applications in three languages, but it has also extended the available execution modes for Flink, which now include:
Running Flink in Local Mode
Running Flink in Remote Mode
Running Flink in Yarn Mode

You can find more information about how to get started with Zeppelin and all the execution modes for Flink applications in Zeppelin notebooks in this post.
Flink on Zeppelin for Stream processing # Performing stream processing jobs with Apache Flink on Zeppelin allows you to run most major streaming use cases, such as streaming ETL and real-time data analytics, with the use of Flink SQL and specific UDFs. Below we showcase how you can execute streaming ETL using Flink on Zeppelin:
You can use Flink SQL to perform streaming ETL by following the steps below (for the full tutorial, please refer to the Flink Tutorial/Streaming ETL tutorial of the Zeppelin distribution):
Step 1. Create a source table to represent the source data.
Step 2. Create a sink table to represent the processed data.
Step 3. After creating the source and sink tables, write an INSERT INTO statement to trigger the stream processing job, as follows:
Step 4. After initiating the streaming job, you can use another SQL statement to query the sink table to verify the results of your job. Here you can see the top 10 records, which will be refreshed every 3 seconds.

Summary # In this post, we explained how the redesigned Flink interpreter works in Zeppelin 0.9.0 and provided some examples for performing streaming ETL jobs with Flink and Zeppelin. In the next post, I will talk about how to do streaming data visualization via Flink on Zeppelin. Besides that, you can find an additional tutorial for batch processing with Flink on Zeppelin, as well as using Flink on Zeppelin for more advanced operations like resource isolation, job concurrency & parallelism, multiple Hadoop & Hive environments and more, in our series of posts on Medium. And here’s a list of Flink on Zeppelin tutorial videos for your reference.
References # Apache Zeppelin official website Flink on Zeppelin tutorials - Part 1 Flink on Zeppelin tutorials - Part 2 Flink on Zeppelin tutorials - Part 3 Flink on Zeppelin tutorials - Part 4 Flink on Zeppelin tutorial videos `}),e.add({id:154,href:"/2020/06/10/flink-community-update-june20/",title:"Flink Community Update - June'20",section:"Flink Blog",content:`And suddenly it’s June. The previous month has been calm on the surface, but quite hectic underneath — the final testing phase for Flink 1.11 is moving at full speed, Stateful Functions 2.1 is out in the wild and Flink has made it into Google Season of Docs 2020.
To top it off, a piece of good news: Flink Forward is back on October 19-22 as a free virtual event!
The Past Month in Flink # Flink Stateful Functions 2.1 Release # It might seem like Stateful Functions 2.0 was announced only a handful of weeks ago (and it was!), but the Flink community has just released Stateful Functions 2.1! This release introduces two new features: state expiration for any kind of persisted state and support for UNIX Domain Sockets (UDS) to improve the performance of inter-container communication in co-located deployments; as well as other important changes that improve the overall stability and testability of the project. You can read the announcement blogpost for more details on the release!
As the community around StateFun grows, the release cycle will follow this pattern of smaller and more frequent releases to incorporate user feedback and allow for faster iteration. If you’d like to get involved, we’re always looking for new contributors — especially around SDKs for other languages (e.g. Go, Rust, Javascript).
Testing is ON for Flink 1.11 # Things have been pretty quiet in the Flink community, as all efforts shifted to testing the newest features shipping with Flink 1.11. While we wait for a voting Release Candidate (RC) to be out, you can check the progress of testing in this JIRA burndown board and learn more about some of the upcoming features in these Flink Forward videos:
Rethinking of fault tolerance in Flink: what lies ahead?
It’s finally here: Python on Flink & Flink on Zeppelin
A deep dive into Flink SQL
Production-Ready Flink and Hive Integration - what story you can tell now?
We encourage the wider community to also get involved in testing once the voting RC is out. Keep an eye on the @dev mailing list for updates!
Flink Minor Releases # Flink 1.10.1 # The community released Flink 1.10.1, covering some outstanding bugs in Flink 1.10. You can find more in the announcement blogpost!
New Committers and PMC Members # The Apache Flink community has welcomed 2 new Committers since the last update. Congratulations!
New Committers # Benchao Li
Xintong Song
The Bigger Picture # Flink Forward Global Virtual Conference 2020 # After a first successful virtual conference last April, Flink Forward will be hosting a second free virtual edition on October 19-22. This time around, the conference will feature two days of hands-on training and two full days of conference talks!
Got a Flink story to share? Maybe your recent adventures with Stateful Functions? The Call for Presentations is now open and accepting submissions from the community until June 19th, 11:59 PM CEST.
Google Season of Docs 2020 # In the last update, we announced that Flink was applying to Google Season of Docs (GSoD) again this year. The good news: we’ve made it into the shortlist of accepted projects! This represents an invaluable opportunity for the Flink community to collaborate with technical writers to improve the Table API & SQL documentation. We’re honored to have seen a great number of people reach out over the last couple of weeks, and look forward to receiving applications from this week on!
If you’re interested in learning more about our project idea or want to get involved in GSoD as a technical writer, check out the announcement blogpost and submit your application. The deadline for GSoD applications is July 9th, 18:00 UTC.
If you’d like to keep a closer eye on what’s happening in the community, subscribe to the Flink @community mailing list to get fine-grained weekly updates, upcoming event announcements and more.
`}),e.add({id:155,href:"/2020/06/09/stateful-functions-2.1.0-release-announcement/",title:"Stateful Functions 2.1.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is happy to announce the release of Stateful Functions (StateFun) 2.1.0! This release introduces new features around state expiration and performance improvements for co-located deployments, as well as other important changes that improve the stability and testability of the project. As the community around StateFun grows, the release cycle will follow this pattern of smaller and more frequent releases to incorporate user feedback and allow for faster iteration.
The binary distribution and source artifacts are now available on the updated Downloads page of the Flink website, and the most recent Python SDK distribution is available on PyPI. For more details, check the complete release changelog and the updated documentation. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA!
New Features and Improvements # Support for State Time-To-Live (TTL) # Being able to define state expiration and a state cleanup strategy is a useful feature for stateful applications — for example, to keep state size from growing indefinitely or to work with sensitive data. In previous StateFun versions, users could implement this behavior manually using delayed messages as state expiration callbacks. For StateFun 2.1, the community has worked on enabling users to configure any persisted state to expire and be purged after a given duration (i.e. the state time-to-live) (FLINK-17644, FLINK-17875).
Persisted state can be configured to expire after the last write operation (AFTER_WRITE) or after the last read or write operation (AFTER_READ_AND_WRITE). For the Java SDK, users can configure State TTL in the definition of their persisted fields:
@Persisted
PersistedValue<Integer> table = PersistedValue.of(
    "my-value",
    Integer.class,
    Expiration.expireAfterWriting(Duration.ofHours(1)));

For remote functions using e.g. the Python SDK, users can configure State TTL in their module.yaml:
functions:
  - function:
      states:
        - name: xxxx
          expireAfter: 5min # optional key

Note: The state expiration mode for remote functions is currently restricted to AFTER_READ_AND_WRITE, and the actual TTL being set is the longest duration across all registered state, not for each individual state entry. This is planned to be improved in upcoming releases (FLINK-17954).

Improved Performance with UNIX Domain Sockets (UDS) # Stateful functions can be deployed in multiple ways, even within the same application. For deployments where functions are co-located with the Flink StateFun workers, it’s common to use Kubernetes to deploy pods consisting of a Flink StateFun container and the function sidecar container, communicating via the pod-local network. To improve the performance of such deployments, StateFun 2.1 allows using Unix Domain Sockets (UDS) to communicate between containers in the same pod (i.e. the same machine) (FLINK-17611), which drastically reduces the overhead of going through the network stack.
Users can enable transport via UDS in a remote module by specifying the following in their module.yaml:
functions:
  - function:
      spec:
        - endpoint: http(s)+unix://<socket-file-path>/<serve-url-path>

Important Changes # [FLINK-17712] The Flink version in StateFun 2.1 has been upgraded to 1.10.1, the most recent patch version.
[FLINK-17533] StateFun 2.1 now supports concurrent checkpoints, which means applications will no longer fail on savepoints that are triggered concurrently to a checkpoint.
[FLINK-16928] StateFun 2.0 was using the Flink legacy scheduler due to a bug in Flink 1.10. In 2.1, this change is reverted to using the new Flink scheduler again.
[FLINK-17516] The coverage for end-to-end StateFun tests has been extended to also include exactly-once semantics verification (with failure recovery).
Release Notes # Please review the release notes for a detailed list of changes and new features if you plan to upgrade your setup to Stateful Functions 2.1.
List of Contributors # The Apache Flink community would like to thank all contributors that have made this release possible:
abc863377, Authuir, Chesnay Schepler, Congxian Qiu, David Anderson, Dian Fu, Francesco Guardiani, Igal Shilman, Marta Paes Moreira, Patrick Wiener, Rafi Aroch, Seth Wiesman, Stephan Ewen, Tzu-Li (Gordon) Tai
If you’d like to get involved, we’re always looking for new contributors — especially around SDKs for other languages like Go, Rust or Javascript.
`}),e.add({id:156,href:"/2020/05/12/apache-flink-1.10.1-released/",title:"Apache Flink 1.10.1 Released",section:"Flink Blog",content:`The Apache Flink community released the first bugfix version of the Apache Flink 1.10 series.
This release includes 158 fixes and minor improvements for Flink 1.10.0. The list below includes a detailed list of all fixes and improvements.
We highly recommend that all users upgrade to Flink 1.10.1.
Note FLINK-16684 changed the builders of the StreamingFileSink to make them compilable in Scala. This change is source compatible but binary incompatible. If using the StreamingFileSink, please recompile your user code against 1.10.1 before upgrading. Note FLINK-16683 Flink no longer supports starting clusters with .bat scripts. Users should instead use environments like WSL or Cygwin and work with the .sh scripts. Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.10.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.10.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.10.1</version>
</dependency>

You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-14126] - Elasticsearch Xpack Machine Learning doesn&#39;t support ARM [FLINK-15143] - Create document for FLIP-49 TM memory model and configuration guide [FLINK-15561] - Unify Kerberos credentials checking [FLINK-15790] - Make FlinkKubeClient and its implementations asynchronous [FLINK-15817] - Kubernetes Resource leak while deployment exception happens [FLINK-16049] - Remove outdated &quot;Best Practices&quot; section from Application Development Section [FLINK-16131] - Translate &quot;Amazon S3&quot; page of &quot;File Systems&quot; into Chinese [FLINK-16389] - Bump Kafka 0.10 to 0.10.2.2 Bug [FLINK-2336] - ArrayIndexOufOBoundsException in TypeExtractor when mapping [FLINK-10918] - incremental Keyed state with RocksDB throws cannot create directory error in windows [FLINK-11193] - Rocksdb timer service factory configuration option is not settable per job [FLINK-13483] - PrestoS3FileSystemITCase.testDirectoryListing fails on Travis [FLINK-14038] - ExecutionGraph deploy failed due to akka timeout [FLINK-14311] - Streaming File Sink end-to-end test failed on Travis [FLINK-14316] - Stuck in &quot;Job leader ... lost leadership&quot; error [FLINK-15417] - Remove the docker volume or mount when starting Mesos e2e cluster [FLINK-15669] - SQL client can&#39;t cancel flink job [FLINK-15772] - Shaded Hadoop S3A with credentials provider end-to-end test fails on travis [FLINK-15811] - StreamSourceOperatorWatermarksTest.testNoMaxWatermarkOnAsyncCancel fails on Travis [FLINK-15812] - HistoryServer archiving is done in Dispatcher main thread [FLINK-15838] - Dangling CountDownLatch.await(timeout) [FLINK-15852] - Job is submitted to the wrong session cluster [FLINK-15904] - Make Kafka Consumer work with activated &quot;disableGenericTypes()&quot; [FLINK-15936] - TaskExecutorTest#testSlotAcceptance deadlocks [FLINK-15953] - Job Status is hard to read for some Statuses [FLINK-16013] - List and map config options could not be parsed correctly [FLINK-16014] - S3 plugin ClassNotFoundException SAXParser [FLINK-16025] - Service could expose blob server port mismatched with JM Container [FLINK-16026] - Travis failed due to python setup [FLINK-16047] - Blink planner produces wrong aggregate results with state clean up [FLINK-16067] - Flink&#39;s CalciteParser swallows error position information [FLINK-16068] - table with keyword-escaped columns and computed_column_expression columns [FLINK-16070] - Blink planner can not extract correct unique key for UpsertStreamTableSink [FLINK-16108] - StreamSQLExample is failed if running in blink planner [FLINK-16111] - Kubernetes deployment does not respect &quot;taskmanager.cpu.cores&quot;. 
[FLINK-16113] - ExpressionReducer shouldn&#39;t escape the reduced string value [FLINK-16115] - Aliyun oss filesystem could not work with plugin mechanism [FLINK-16139] - Co-location constraints are not reset on task recovery in DefaultScheduler [FLINK-16161] - Statistics zero should be unknown in HiveCatalog [FLINK-16170] - SearchTemplateRequest ClassNotFoundException when use flink-sql-connector-elasticsearch7 [FLINK-16220] - JsonRowSerializationSchema throws cast exception : NullNode cannot be cast to ArrayNode [FLINK-16231] - Hive connector is missing jdk.tools exclusion against Hive 2.x.x [FLINK-16234] - Fix unstable cases in StreamingJobGraphGeneratorTest [FLINK-16241] - Remove the license and notice file in flink-ml-lib module on release-1.10 branch [FLINK-16242] - BinaryGeneric serialization error cause checkpoint failure [FLINK-16262] - Class loader problem with FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory [FLINK-16269] - Generic type can not be matched when convert table to stream. [FLINK-16281] - parameter &#39;maxRetryTimes&#39; can not work in JDBCUpsertTableSink [FLINK-16301] - Annoying &quot;Cannot find FunctionDefinition&quot; messages with SQL for f_proctime or = [FLINK-16308] - SQL connector download links are broken [FLINK-16313] - flink-state-processor-api: surefire execution unstable on Azure [FLINK-16331] - Remove source licenses for old WebUI [FLINK-16345] - Computed column can not refer time attribute column [FLINK-16360] - connector on hive 2.0.1 don&#39;t support type conversion from STRING to VARCHAR [FLINK-16371] - HadoopCompressionBulkWriter fails with &#39;java.io.NotSerializableException&#39; [FLINK-16373] - EmbeddedLeaderService: IllegalStateException: The RPC connection is already closed [FLINK-16413] - Reduce hive source parallelism when limit push down [FLINK-16414] - create udaf/udtf function using sql casuing ValidationException: SQL validation failed. null [FLINK-16433] - TableEnvironmentImpl doesn&#39;t clear buffered operations when it fails to translate the operation [FLINK-16435] - Replace since decorator with versionadd to mark the version an API was introduced [FLINK-16467] - MemorySizeTest#testToHumanReadableString() is not portable [FLINK-16526] - Fix exception when computed column expression references a keyword column name [FLINK-16541] - Document of table.exec.shuffle-mode is incorrect [FLINK-16550] - HadoopS3* tests fail with NullPointerException exceptions [FLINK-16560] - Forward Configuration in PackagedProgramUtils#getPipelineFromProgram [FLINK-16567] - Get the API error of the StreamQueryConfig on Page &quot;Query Configuration&quot; [FLINK-16573] - Kinesis consumer does not properly shutdown RecordFetcher threads [FLINK-16576] - State inconsistency on restore with memory state backends [FLINK-16626] - Prevent REST handler from being closed more than once [FLINK-16632] - SqlDateTimeUtils#toSqlTimestamp(String, String) may yield incorrect result [FLINK-16635] - Incompatible okio dependency in flink-metrics-influxdb module [FLINK-16646] - flink read orc file throw a NullPointerException [FLINK-16647] - Miss file extension when inserting to hive table with compression [FLINK-16652] - BytesColumnVector should init buffer in Hive 3.x [FLINK-16662] - Blink Planner failed to generate JobGraph for POJO DataStream converting to Table (Cannot determine simple type name) [FLINK-16664] - Unable to set DataStreamSource parallelism to default (-1) [FLINK-16675] - TableEnvironmentITCase. 
testClearOperation fails on travis nightly build [FLINK-16684] - StreamingFileSink builder does not work with Scala [FLINK-16696] - Savepoint trigger documentation is insufficient [FLINK-16703] - AkkaRpcActor state machine does not record transition to terminating state. [FLINK-16705] - LocalExecutor tears down MiniCluster before client can retrieve JobResult [FLINK-16718] - KvStateServerHandlerTest leaks Netty ByteBufs [FLINK-16727] - Fix cast exception when having time point literal as parameters [FLINK-16732] - Failed to call Hive UDF with constant return value [FLINK-16740] - OrcSplitReaderUtil::logicalTypeToOrcType fails to create decimal type with precision &lt; 10 [FLINK-16759] - HiveModuleTest failed to compile on release-1.10 [FLINK-16767] - Failed to read Hive table with RegexSerDe [FLINK-16771] - NPE when filtering by decimal column [FLINK-16821] - Run Kubernetes test failed with invalid named &quot;minikube&quot; [FLINK-16822] - The config set by SET command does not work [FLINK-16825] - PrometheusReporterEndToEndITCase should rely on path returned by DownloadCache [FLINK-16836] - Losing leadership does not clear rpc connection in JobManagerLeaderListener [FLINK-16860] - Failed to push filter into OrcTableSource when upgrading to 1.9.2 [FLINK-16888] - Re-add jquery license file under &quot;/licenses&quot; [FLINK-16901] - Flink Kinesis connector NOTICE should have contents of AWS KPL&#39;s THIRD_PARTY_NOTICES file manually merged in [FLINK-16913] - ReadableConfigToConfigurationAdapter#getEnum throws UnsupportedOperationException [FLINK-16916] - The logic of NullableSerializer#copy is wrong [FLINK-16944] - Compile error in. DumpCompiledPlanTest and PreviewPlanDumpTest [FLINK-16980] - Python UDF doesn&#39;t work with protobuf 3.6.1 [FLINK-16981] - flink-runtime tests are crashing the JVM on Java11 because of PowerMock [FLINK-17062] - Fix the conversion from Java row type to Python row type [FLINK-17066] - Update pyarrow version bounds less than 0.14.0 [FLINK-17093] - Python UDF doesn&#39;t work when the input column is from composite field [FLINK-17107] - CheckpointCoordinatorConfiguration#isExactlyOnce() is inconsistent with StreamConfig#getCheckpointMode() [FLINK-17114] - When the pyflink job runs in local mode and the command &quot;python&quot; points to Python 2.7, the startup of the Python UDF worker will fail. 
[FLINK-17124] - The PyFlink Job runs into infinite loop if the Python UDF imports job code [FLINK-17152] - FunctionDefinitionUtil generate wrong resultType and acc type of AggregateFunctionDefinition [FLINK-17308] - ExecutionGraphCache cachedExecutionGraphs not cleanup cause OOM Bug [FLINK-17313] - Validation error when insert decimal/varchar with precision into sink using TypeInformation of row [FLINK-17334] - Flink does not support HIVE UDFs with primitive return types [FLINK-17338] - LocalExecutorITCase.testBatchQueryCancel test timeout [FLINK-17359] - Entropy key is not resolved if flink-s3-fs-hadoop is added as a plugin [FLINK-17403] - Fix invalid classpath in BashJavaUtilsITCase [FLINK-17471] - Move LICENSE and NOTICE files to root directory of python distribution [FLINK-17483] - Update flink-sql-connector-elasticsearch7 NOTICE file to correctly reflect bundled dependencies [FLINK-17496] - Performance regression with amazon-kinesis-producer 0.13.1 in Flink 1.10.x [FLINK-17499] - LazyTimerService used to register timers via State Processing API incorrectly mixes event time timers with processing time timers [FLINK-17514] - TaskCancelerWatchdog does not kill TaskManager New Feature [FLINK-17275] - Add core training exercises Improvement [FLINK-9656] - Environment java opts for flink run [FLINK-15094] - Warning about using private constructor of java.nio.DirectByteBuffer in Java 11 [FLINK-15584] - Give nested data type of ROWs in ValidationException [FLINK-15616] - Move boot error messages from python-udf-boot.log to taskmanager&#39;s log file [FLINK-15989] - Rewrap OutOfMemoryError in allocateUnpooledOffHeap with better message [FLINK-16018] - Improve error reporting when submitting batch job (instead of AskTimeoutException) [FLINK-16125] - Make zookeeper.connect optional for Kafka connectors [FLINK-16167] - Update documentation about python shell execution [FLINK-16191] - Improve error message on Windows when RocksDB Paths are too long [FLINK-16280] - Fix sample code errors in the documentation about elasticsearch connector [FLINK-16288] - Setting the TTL for discarding task pods on Kubernetes. 
[FLINK-16293] - Document using plugins in Kubernetes [FLINK-16343] - Improve exception message when reading an unbounded source in batch mode [FLINK-16406] - Increase default value for JVM Metaspace to minimise its OutOfMemoryError [FLINK-16538] - Restructure Python Table API documentation [FLINK-16604] - Column key in JM configuration is too narrow [FLINK-16683] - Remove scripts for starting Flink on Windows [FLINK-16697] - Disable JMX rebinding [FLINK-16763] - Should not use BatchTableEnvironment for Python UDF in the document of flink-1.10 [FLINK-16772] - Bump derby to 10.12.1.1+ or exclude it [FLINK-16790] - enables the interpretation of backslash escapes [FLINK-16862] - Remove example url in quickstarts [FLINK-16874] - Respect the dynamic options when calculating memory options in taskmanager.sh [FLINK-16942] - ES 5 sink should allow users to select netty transport client [FLINK-17065] - Add documentation about the Python versions supported for PyFlink [FLINK-17125] - Add a Usage Notes Page to Answer Common Questions Encountered by PyFlink Users [FLINK-17254] - Improve the PyFlink documentation and examples to use SQL DDL for source/sink definition [FLINK-17276] - Add checkstyle to training exercises [FLINK-17277] - Apply IntelliJ recommendations to training exercises [FLINK-17278] - Add Travis to the training exercises [FLINK-17279] - Use gradle build scans for training exercises [FLINK-17316] - Have HourlyTips solutions use TumblingEventTimeWindows.of Task [FLINK-15741] - Fix TTL docs after enabling RocksDB compaction filter by default (needs Chinese translation) [FLINK-15933] - update content of how generic table schema is stored in hive via HiveCatalog [FLINK-15991] - Create Chinese documentation for FLIP-49 TM memory model [FLINK-16004] - Exclude flink-rocksdb-state-memory-control-test jars from the dist [FLINK-16454] - Update the copyright year in NOTICE files [FLINK-16530] - Add documentation about &quot;GROUPING SETS&quot; and &quot;CUBE&quot; support in streaming mode [FLINK-16592] - The doc of Streaming File Sink has a mistake of grammar `}),e.add({id:157,href:"/2020/05/06/flink-community-update-may20/",title:"Flink Community Update - May'20",section:"Flink Blog",content:`Can you smell it? It’s release month! It took a while, but now that we’re all caught up with the past, the Community Update is here to stay. This time around, we’re warming up for Flink 1.11 and peeping back to the month of April in the Flink community — with the release of Stateful Functions 2.0, a new self-paced Flink training and some efforts to improve the Flink documentation experience.
Last month also marked the debut of Flink Forward Virtual Conference 2020: what did you think? If you missed it altogether or just want to recap some of the sessions, the videos and slides are now available!
The Past Month in Flink # Flink Stateful Functions 2.0 is out! # In the beginning of April, the Flink community announced the release of Stateful Functions 2.0 — the first as part of the Apache Flink project. From this release, you can use Flink as the base of a (stateful) serverless platform with out-of-the-box consistent and scalable state, and efficient messaging between functions. You can even run your stateful functions on platforms like AWS Lambda, as Gordon (@tzulitai) demonstrated in his Flink Forward talk.
It’s been encouraging to see so many questions about Stateful Functions popping up in the mailing list and Stack Overflow! If you’d like to get involved, we’re always looking for new contributors — especially around SDKs for other languages like Go, Javascript and Rust.
Warming up for Flink 1.11 # The final preparations for the release of Flink 1.11 are well underway, with the feature freeze scheduled for May 15th, and there’s a lot of new features and improvements to look out for:
On the usability side, you can expect a big focus on smoothing data ingestion with contributions like support for Change Data Capture (CDC) in the Table API/SQL (FLIP-105), easy streaming data ingestion into Apache Hive (FLIP-115) or support for Pandas DataFrames in PyFlink (FLIP-120). A great deal of effort has also gone into maturing PyFlink, with the introduction of user-defined metrics in Python UDFs (FLIP-112) and the extension of Python UDF support beyond the Python Table API (FLIP-106, FLIP-114).
On the operational side, the much anticipated new Source API (FLIP-27) will unify batch and streaming sources, and improve out-of-the-box event-time behavior; while unaligned checkpoints (FLIP-76) and changes to network memory management will make it possible to speed up checkpointing under backpressure — this is part of a bigger effort to rethink fault tolerance that will introduce many other non-trivial changes to Flink. You can learn more about it in this recent Flink Forward talk!
Throw into the mix improvements around type systems, the WebUI, metrics reporting, supported formats and… we can’t wait! To get an overview of the ongoing developments, have a look at this thread. We encourage the community to get involved in testing once an RC (Release Candidate) is out. Keep an eye on the @dev mailing list for updates!
Flink Minor Releases # Flink 1.9.3 # The community released Flink 1.9.3, covering some outstanding bugs from Flink 1.9! You can find more in the announcement blogpost.
Flink 1.10.1 # Also in the pipeline is the release of Flink 1.10.1, already in the RC voting phase. So, you can expect Flink 1.10.1 to be released soon!
New Committers and PMC Members # The Apache Flink community has welcomed 3 PMC Members and 2 new Committers since the last update. Congratulations!
New PMC Members # Dawid Wysakowicz
Hequn Cheng
Zhijiang Wang
New Committers # Konstantin Knauf
Seth Wiesman
The Bigger Picture # A new self-paced Apache Flink training # This week, the Flink website received the invaluable contribution of a self-paced training course curated by David (@alpinegizmo) — or, what used to be the entire training materials under training.ververica.com. The new materials guide you through the very basics of Flink and the DataStream API, and round off every concepts section with hands-on exercises to help you better assimilate what you learned.
Whether you’re new to Flink or just looking to strengthen your foundations, this training is the most comprehensive way to get started and is now completely open source: https://flink.apache.org/training.html. For now, the materials are only available in English, but the community intends to also provide a Chinese translation in the future.
Google Season of Docs 2020 # Google Season of Docs (GSoD) is a great initiative organized by Google Open Source to pair technical writers with mentors to work on documentation for open source projects. Last year, the Flink community submitted an application that unfortunately didn’t make the cut — but we are trying again! This time, with a project idea to improve the Table API & SQL documentation:
1) Restructure the Table API & SQL Documentation
Reworking the current documentation structure would allow to:
Lower the entry barrier to Flink for non-programmatic (i.e. SQL) users.
Make the available features more easily discoverable.
Improve the flow and logical correlation of topics.
FLIP-60 contains a detailed proposal on how to reorganize the existing documentation, which can be used as a starting point.
2) Extend the Table API & SQL Documentation
Some areas of the documentation have insufficient detail or are not accessible for new Flink users. Examples of topics and sections that require attention are: planners, built-in functions, connectors, overview and concepts sections. There is a lot of work to be done and the technical writer could choose what areas to focus on — these improvements could then be added to the documentation rework umbrella issue (FLINK-12639).
If you’re interested in learning more about this project idea or want to get involved in GSoD as a technical writer, check out the announcement blogpost.
…and something to read! # Events across the globe have pretty much come to a halt, so we’ll leave you with some interesting resources to read and explore instead. In addition to this written content, you can also recap the sessions from the Flink Forward Virtual Conference!
Type | Links
Blogposts | Event-Driven Supply Chain for Crisis with FlinkSQL and Zeppelin; Memory Management Improvements with Apache Flink 1.10; Flink Serialization Tuning Vol. 1: Choosing your Serializer — if you can
Tutorials | PyFlink: Introducing Python Support for UDFs in Flink's Table API; Flink Stateful Functions: where to start?
Flink Packages | Flink Packages is a website where you can explore (and contribute to) the Flink ecosystem of connectors, extensions, APIs, tools and integrations. New in: Spillable State Backend for Flink; Flink Memory Calculator; Ververica Platform Community Edition

If you’d like to keep a closer eye on what’s happening in the community, subscribe to the Flink @community mailing list to get fine-grained weekly updates, upcoming event announcements and more.
`}),e.add({id:158,href:"/2020/05/04/applying-to-google-season-of-docs-2020/",title:"Applying to Google Season of Docs 2020",section:"Flink Blog",content:`The Flink community is thrilled to share that the project is applying again to Google Season of Docs (GSoD) this year! If you’re unfamiliar with the program, GSoD is a great initiative organized by Google Open Source to pair technical writers with mentors to work on documentation for open source projects. The first edition supported over 40 projects, including some other cool Apache Software Foundation (ASF) members like Apache Airflow and Apache Cassandra.
Why Apply? # As one of the most active projects in the ASF, Flink is experiencing a boom in contributions and some major changes to its codebase. And, while the project has also seen a significant increase in activity when it comes to writing, reviewing and translating documentation, it’s hard to keep up with the pace.
Since last year, the community has been working on FLIP-42 to improve the documentation experience and bring a more accessible information architecture to Flink. After some discussion, we agreed that GSoD would be a valuable opportunity to double down on this effort and collaborate with someone who is passionate about technical writing… and Flink!
How can you contribute? # If working shoulder to shoulder with the Flink community on documentation sounds exciting, we’d love to hear from you! You can read more about our idea for this year’s project below and, depending on whether it is accepted, apply as a technical writer. If you have any questions or just want to know more about the project idea, ping us at dev@flink.apache.org!
Please subscribe to the Apache Flink mailing list before reaching out. If you are not subscribed, responses to your message will not go through. You can always unsubscribe at any time. Project: Improve the Table API & SQL Documentation # Apache Flink is a stateful stream processor supporting a broad set of use cases and featuring APIs at different levels of abstraction that allow users to trade off expressiveness and usability, as well as work with their language of choice (Java/Scala, SQL or Python). The Table API & SQL are Flink’s high-level relational abstractions and focus on data analytics use cases. A core principle is that either API can be used to process static (batch) and continuous (streaming) data with the same syntax and yielding the same results.
As the Flink community works on extending the scope of the Table API & SQL, a lot of new features are being added and some underlying structures are also being refactored. At the same time, the documentation for these APIs is growing into a somewhat unruly structure and has potential for improvement in some areas.
The project has two main workstreams: restructuring and extending the Table API & SQL documentation. These can be worked on by one person as a bigger effort or assigned to different technical writers.
1) Restructure the Table API & SQL Documentation
Reworking the current documentation structure would allow to:
Lower the entry barrier to Flink for non-programmatic (i.e. SQL) users.
Make the available features more easily discoverable.
Improve the flow and logical correlation of topics.

FLIP-60 contains a detailed proposal on how to reorganize the existing documentation, which can be used as a starting point.
2) Extend the Table API & SQL Documentation
Some areas of the documentation have insufficient detail or are not accessible for new Flink users. Examples of topics and sections that require attention are: planners, built-in functions, connectors, overview and concepts sections. There is a lot of work to be done and the technical writer could choose what areas to focus on — these improvements could then be added to the documentation rework umbrella issue (FLINK-12639).
Project Mentors # Aljoscha Krettek (Apache Flink and Apache Beam PMC Member); Seth Wiesman (Apache Flink Committer)
Related Resources # FLIP-60: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=127405685
Table API & SQL Documentation: //nightlies.apache.org/flink/flink-docs-release-1.10/dev/table/
How to Contribute Documentation: https://flink.apache.org/contributing/contribute-documentation.html
Documentation Style Guide: https://flink.apache.org/contributing/docs-style.html
We look forward to receiving feedback on this GSoD application and also to continue improving the documentation experience for Flink users. Join us!
`}),e.add({id:159,href:"/2020/04/24/apache-flink-1.9.3-released/",title:"Apache Flink 1.9.3 Released",section:"Flink Blog",content:`The Apache Flink community released the third bugfix version of the Apache Flink 1.9 series.
This release includes 38 fixes and minor improvements for Flink 1.9.2. The list below details all fixes and improvements.
We highly recommend that all users upgrade to Flink 1.9.3.
Updated Maven dependencies:
&lt;dependency&gt;
  &lt;groupId&gt;org.apache.flink&lt;/groupId&gt;
  &lt;artifactId&gt;flink-java&lt;/artifactId&gt;
  &lt;version&gt;1.9.3&lt;/version&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
  &lt;groupId&gt;org.apache.flink&lt;/groupId&gt;
  &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt;
  &lt;version&gt;1.9.3&lt;/version&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
  &lt;groupId&gt;org.apache.flink&lt;/groupId&gt;
  &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt;
  &lt;version&gt;1.9.3&lt;/version&gt;
&lt;/dependency&gt;
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-15143] - Create document for FLIP-49 TM memory model and configuration guide [FLINK-16389] - Bump Kafka 0.10 to 0.10.2.2 Bug [FLINK-11193] - Rocksdb timer service factory configuration option is not settable per job [FLINK-14316] - Stuck in &quot;Job leader ... lost leadership&quot; error [FLINK-14560] - The value of taskmanager.memory.size in flink-conf.yaml is set to zero will cause taskmanager not to work [FLINK-15010] - Temp directories flink-netty-shuffle-* are not cleaned up [FLINK-15085] - HistoryServer dashboard config json out of sync [FLINK-15386] - SingleJobSubmittedJobGraphStore.putJobGraph has a logic error [FLINK-15575] - Azure Filesystem Shades Wrong Package &quot;httpcomponents&quot; [FLINK-15638] - releasing/create_release_branch.sh does not set version in flink-python/pyflink/version.py [FLINK-15812] - HistoryServer archiving is done in Dispatcher main thread [FLINK-15844] - Removal of JobWithJars.buildUserCodeClassLoader method without Configuration breaks backwards compatibility [FLINK-15863] - Fix docs stating that savepoints are relocatable [FLINK-16047] - Blink planner produces wrong aggregate results with state clean up [FLINK-16242] - BinaryGeneric serialization error cause checkpoint failure [FLINK-16308] - SQL connector download links are broken [FLINK-16373] - EmbeddedLeaderService: IllegalStateException: The RPC connection is already closed [FLINK-16573] - Kinesis consumer does not properly shutdown RecordFetcher threads [FLINK-16576] - State inconsistency on restore with memory state backends [FLINK-16696] - Savepoint trigger documentation is insufficient [FLINK-16703] - AkkaRpcActor state machine does not record transition to terminating state. [FLINK-16836] - Losing leadership does not clear rpc connection in JobManagerLeaderListener [FLINK-16860] - Failed to push filter into OrcTableSource when upgrading to 1.9.2 [FLINK-16916] - The logic of NullableSerializer#copy is wrong [FLINK-17062] - Fix the conversion from Java row type to Python row type Improvement [FLINK-14278] - Pass in ioExecutor into AbstractDispatcherResourceManagerComponentFactory [FLINK-15908] - Add description of support &#39;pip install&#39; to 1.9.x documents [FLINK-15909] - Add PyPI release process into the subsequent release of 1.9.x [FLINK-15938] - Idle state not cleaned in StreamingJoinOperator and StreamingSemiAntiJoinOperator [FLINK-16018] - Improve error reporting when submitting batch job (instead of AskTimeoutException) [FLINK-16031] - Improve the description in the README file of PyFlink 1.9.x [FLINK-16167] - Update documentation about python shell execution [FLINK-16280] - Fix sample code errors in the documentation about elasticsearch connector [FLINK-16697] - Disable JMX rebinding [FLINK-16862] - Remove example url in quickstarts [FLINK-16942] - ES 5 sink should allow users to select netty transport client Task [FLINK-11767] - Introduce new TypeSerializerUpgradeTestBase, new PojoSerializerUpgradeTest [FLINK-16454] - Update the copyright year in NOTICE files `}),e.add({id:160,href:"/2020/04/21/memory-management-improvements-with-apache-flink-1.10/",title:"Memory Management Improvements with Apache Flink 1.10",section:"Flink Blog",content:`Apache Flink 1.10 comes with significant changes to the memory model of the Task Managers and configuration options for your Flink applications. These recently-introduced changes make Flink more adaptable to all kinds of deployment environments (e.g. 
Kubernetes, Yarn, Mesos), providing strict control over its memory consumption. In this post, we describe Flink’s memory model, as it stands in Flink 1.10, how to set up and manage memory consumption of your Flink applications and the recent changes the community implemented in the latest Apache Flink release.
Introduction to Flink’s memory model # Having a clear understanding of Apache Flink’s memory model allows you to manage resources for the various workloads more efficiently. The following diagram illustrates the main memory components in Flink:
Flink: Total Process Memory The Task Manager process is a JVM process. On a high level, its memory consists of the JVM Heap and Off-Heap memory. These types of memory are consumed by Flink directly or by the JVM for its own purposes (e.g. metaspace). There are two major memory consumers within Flink: the user code of job operator tasks and the framework itself, which consumes memory for internal data structures, network buffers, etc.
Please note that the user code has direct access to all memory types: JVM Heap, Direct and Native memory. Therefore, Flink cannot really control its allocation and usage. There are however two types of Off-Heap memory which are consumed by tasks and controlled explicitly by Flink:
Managed Memory (Off-Heap) Network Buffers The latter is part of the JVM Direct Memory, allocated for user record data exchange between operator tasks.
How to set up Flink memory # With the latest release of Flink 1.10, and in order to provide a better user experience, the framework comes with both high-level and fine-grained tuning of memory components. There are essentially three alternatives for setting up memory in Task Managers.
The first two — and simplest — alternatives are configuring one of the two following options for total memory available for the JVM process of the Task Manager:
Total Process Memory: total memory consumed by the Flink Java application (including user code) and by the JVM to run the whole process. Total Flink Memory: only memory consumed by the Flink Java application, including user code but excluding memory allocated by the JVM to run it. It is advisable to configure the Total Flink Memory for standalone deployments, where explicitly declaring how much memory is given to Flink is common practice, while the outer JVM overhead is of little interest. When deploying Flink in containerized environments (such as Kubernetes, Yarn or Mesos), the Total Process Memory option is recommended instead, because it becomes the total memory size of the requested container. Containerized environments usually strictly enforce this memory limit.
If you want more fine-grained control over the size of the JVM Heap and Managed Memory (Off-Heap), there is a third alternative: configuring both the Task Heap and Managed Memory explicitly. This alternative gives a clear separation between the heap memory and any other memory types.
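As a rough illustration of these alternatives, the flink-conf.yaml sketch below sets the high-level and the fine-grained options. The Flink 1.10 option keys and the example sizes are assumptions based on the memory configuration documentation, not values taken from this post; only one of the alternatives should be configured at a time, which is why two of them are commented out.
# Alternative 1 (containerized deployments): cap the whole TaskManager process.
taskmanager.memory.process.size: 4096m

# Alternative 2 (standalone clusters): declare only the memory given to Flink itself.
# taskmanager.memory.flink.size: 3584m

# Alternative 3 (fine-grained): set Task Heap and Managed Memory explicitly.
# taskmanager.memory.task.heap.size: 2048m
# taskmanager.memory.managed.size: 1024m
Check the Flink 1.10 configuration documentation for the exact option names and defaults that apply to your setup.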
In line with the community’s efforts to unify batch and stream processing, this model works universally for both scenarios. It allows sharing the JVM Heap memory between the user code of operator tasks in any workload and the heap state backend in stream processing scenarios. In a similar way, the Managed Memory can be used for batch spilling and for the RocksDB state backend in streaming.
The remaining memory components are automatically adjusted either based on their default values or additionally configured parameters. Flink also checks the overall consistency. You can find more information about the different memory components in the corresponding documentation. Additionally, you can try different configuration options with the configuration spreadsheet of FLIP-49 and check the corresponding results for your individual case.
If you are migrating from a Flink version older than 1.10, we suggest following the steps in the migration guide of the Flink documentation.
Other components # While configuring Flink’s memory, the size of different memory components can either be fixed with the value of the respective option or tuned using multiple options. Below we provide some more insight about the memory setup.
Fractions of the Total Flink Memory # This method allows a proportional breakdown of the Total Flink Memory where the Managed Memory (if not set explicitly) and Network Buffers can take certain fractions of it. The remaining memory is then assigned to the Task Heap (if not set explicitly) and other fixed JVM Heap and Off-Heap components. The following picture represents an example of such a setup:
Flink: Example of Memory Setup Please note that
Flink will verify that the size of the derived Network Memory is between its minimum and maximum value, otherwise Flink’s startup will fail. The maximum and minimum limits have default values which can be overwritten by the respective configuration options. In general, the configured fractions are treated by Flink as hints. Under certain scenarios, the derived value might not match the fraction. For example, if the Total Flink Memory and the Task Heap are configured to fixed values, the Managed Memory will get a certain fraction and the Network Memory will get the remaining memory which might not exactly match its fraction. More hints to control the container memory limit # The heap and direct memory usage are managed by the JVM. There are also many other possible sources of native memory consumption in Apache Flink or its user applications which are not managed by Flink or the JVM. Controlling their limits is often difficult which complicates debugging of potential memory leaks. If Flink’s process allocates too much memory in an unmanaged way, it can often result in killing Task Manager containers in containerized environments. In this case, it may be hard to understand which type of memory consumption has exceeded its limit. Flink 1.10 introduces some specific tuning options to clearly represent such components. Although Flink cannot always enforce strict limits and borders among them, the idea here is to explicitly plan the memory usage. Below we provide some examples of how memory setup can prevent containers exceeding their memory limit:
RocksDB state cannot grow too big. The memory consumption of RocksDB state backend is accounted for in the Managed Memory. RocksDB respects its limit by default (only since Flink 1.10). You can increase the Managed Memory size to improve RocksDB’s performance or decrease it to save resources.
User code or its dependencies consume significant off-heap memory. Tuning the Task Off-Heap option can assign additional direct or native memory to the user code or any of its dependencies. Flink cannot control native allocations but it sets the limit for JVM Direct memory allocations. The Direct memory limit is enforced by the JVM.
JVM metaspace requires additional memory. If you encounter OutOfMemoryError: Metaspace, Flink provides an option to increase its limit and the JVM will ensure that it is not exceeded.
JVM requires more internal memory. There is no direct control over certain types of JVM process allocations, but Flink provides JVM Overhead options. These options allow declaring an additional amount of memory anticipated for those allocations and not covered by other options. A configuration sketch covering these cases follows below.
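To make the planning above concrete, here is a hedged flink-conf.yaml sketch of Flink 1.10 options that address the cases listed above. The keys and example sizes are assumptions drawn from the configuration documentation rather than from this post.
# Managed Memory: keep RocksDB (or batch spilling) within a planned budget.
taskmanager.memory.managed.size: 1536m

# Task Off-Heap: reserve direct/native memory for user code and its dependencies.
taskmanager.memory.task.off-heap.size: 256m

# JVM Metaspace: raise the limit if you hit OutOfMemoryError: Metaspace.
taskmanager.memory.jvm-metaspace.size: 256m

# JVM Overhead: leave headroom for thread stacks, GC, and other internal JVM allocations.
taskmanager.memory.jvm-overhead.fraction: 0.1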
Conclusion # The latest Flink release (Flink 1.10) introduces some significant changes to Flink’s memory configuration, making it possible to manage your application memory and debug Flink significantly better than before. Future developments in this area also include adopting a similar memory model for the job manager process in FLIP-116, so stay tuned for more additions and features in upcoming releases. If you have any suggestions or questions for the community, we encourage you to sign up to the Apache Flink mailing lists and become part of the discussion there.
`}),e.add({id:161,href:"/2020/04/15/flink-serialization-tuning-vol.-1-choosing-your-serializer-if-you-can/",title:"Flink Serialization Tuning Vol. 1: Choosing your Serializer — if you can",section:"Flink Blog",content:`Almost every Flink job has to exchange data between its operators and since these records may not only be sent to another instance in the same JVM but instead to a separate process, records need to be serialized to bytes first. Similarly, Flink’s off-heap state-backend is based on a local embedded RocksDB instance which is implemented in native C++ code and thus also needs transformation into bytes on every state access. Wire and state serialization alone can easily cost a lot of your job’s performance if not executed correctly and thus, whenever you look into the profiler output of your Flink job, you will most likely see serialization in the top places for using CPU cycles.
Since serialization is so crucial to your Flink job, we would like to highlight Flink’s serialization stack in a series of blog posts starting with looking at the different ways Flink can serialize your data types.
Recap: Flink Serialization # Flink handles data types and serialization with its own type descriptors, generic type extraction, and type serialization framework. We recommend reading through the documentation first in order to be able to follow the arguments we present below. In essence, Flink tries to infer information about your job’s data types for wire and state serialization, and to be able to use grouping, joining, and aggregation operations by referring to individual field names, e.g. stream.keyBy(“ruleId”) or dataSet.join(another).where(&quot;name&quot;).equalTo(&quot;personName&quot;). It also allows optimizations in the serialization format as well as reducing unnecessary de/serializations (mainly in certain Batch operations as well as in the SQL/Table APIs).
Choice of Serializer # Apache Flink&rsquo;s out-of-the-box serialization can be roughly divided into the following groups:
Flink-provided special serializers for basic types (Java primitives and their boxed form), arrays, composite types (tuples, Scala case classes, Rows), and a few auxiliary types (Option, Either, Lists, Maps, …),
POJOs; a public, standalone class with a public no-argument constructor and all non-static, non-transient fields in the class hierarchy either public or with a public getter- and a setter-method; see POJO Rules,
Generic types; user-defined data types that are not recognized as a POJO and then serialized via Kryo.
Alternatively, you can also register custom serializers for user-defined data types. This includes writing your own serializers or integrating other serialization systems like Google Protobuf or Apache Thrift via Kryo. Overall, this gives quite a number of different options for serializing user-defined data types, and we will elaborate on seven of them in the sections below.
PojoSerializer # As outlined above, if your data type is not covered by a specialized serializer but follows the POJO Rules, it will be serialized with the PojoSerializer which uses Java reflection to access an object’s fields. It is fast, generic, Flink-specific, and supports state schema evolution out of the box. If a composite data type cannot be serialized as a POJO, you will find the following message (or similar) in your cluster logs:
15:45:51,460 INFO org.apache.flink.api.java.typeutils.TypeExtractor - Class … cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on &ldquo;Data Types &amp; Serialization&rdquo; for details of the effect on performance.
This means that the PojoSerializer will not be used; instead, Flink will fall back to Kryo for serialization (see below). We will take a more detailed look into a few (more) situations that can lead to unexpected Kryo fallbacks in the second part of this blog post series.
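To make the POJO rules more tangible, here is a minimal sketch of a type that the PojoSerializer can handle; the class and field names are hypothetical and not taken from this post.
// Public, standalone class with a public no-argument constructor.
public class SensorReading {

    // Public field: a valid POJO field.
    public long sensorId;

    // Non-public field: still valid because it has a public getter and setter.
    private double temperature;

    public SensorReading() {}

    public double getTemperature() {
        return temperature;
    }

    public void setTemperature(double temperature) {
        this.temperature = temperature;
    }
}
Dropping any of these properties (for example, removing the no-argument constructor) would make Flink fall back to Kryo, as described above.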
Tuple Data Types # Flink comes with a predefined set of tuple types which all have a fixed length and contain a set of strongly-typed fields of potentially different types. There are implementations for Tuple0, Tuple1&lt;T0&gt;, …, Tuple25&lt;T0, T1, ..., T24&gt; and they may serve as easy-to-use wrappers that spare the creation of POJOs for each and every combination of objects you need to pass between computations. With the exception of Tuple0, these are serialized and deserialized with the TupleSerializer and the according fields’ serializers. Since tuple classes are completely under the control of Flink, both actions can be performed without reflection by accessing the appropriate fields directly. This certainly is a (performance) advantage when working with tuples instead of POJOs. Tuples, however, are not as flexible and certainly less descriptive in code.
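As a small, hedged sketch of the difference in practice, the snippet below uses a Tuple2 instead of a dedicated POJO; the stream and field names are made up for illustration, and words is assumed to be an existing DataStream&lt;String&gt;.
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;

// Wrap each word into a (word, 1) pair; the TupleSerializer accesses f0/f1 directly,
// without reflection, which is where the performance advantage over POJOs comes from.
DataStream&lt;Tuple2&lt;String, Integer&gt;&gt; pairs = words
        .map(new MapFunction&lt;String, Tuple2&lt;String, Integer&gt;&gt;() {
            @Override
            public Tuple2&lt;String, Integer&gt; map(String word) {
                return Tuple2.of(word, 1);
            }
        });
Using an anonymous MapFunction (rather than a lambda) lets Flink's type extraction infer the tuple's field types without an explicit returns(...) hint.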
Note Since \`Tuple0\` does not contain any data and therefore is probably a bit special anyway, it will use a special serializer implementation: [Tuple0Serializer](https://github.com/apache/flink/blob/release-1.10.0/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/Tuple0Serializer.java). Row Data Types # Row types are mainly used by the Table and SQL APIs of Flink. A Row groups an arbitrary number of objects together similar to the tuples above. These fields are not strongly typed and may all be of different types. Because field types are missing, Flink’s type extraction cannot automatically extract type information and users of a Row need to manually tell Flink about the row&rsquo;s field types. The RowSerializer will then make use of these types for efficient serialization.
Row type information can be provided in two ways:
you can have your source or operator implement ResultTypeQueryable&lt;Row&gt;: public static class RowSource implements SourceFunction&lt;Row&gt;, ResultTypeQueryable&lt;Row&gt; { // ... @Override public TypeInformation&lt;Row&gt; getProducedType() { return Types.ROW(Types.INT, Types.STRING, Types.OBJECT_ARRAY(Types.STRING)); } } you can provide the types when building the job graph by using SingleOutputStreamOperator#returns() DataStream&lt;Row&gt; sourceStream = env.addSource(new RowSource()) .returns(Types.ROW(Types.INT, Types.STRING, Types.OBJECT_ARRAY(Types.STRING))); Warning If you fail to provide the type information for a \`Row\`, Flink identifies that \`Row\` is not a valid POJO type according to the rules above and falls back to Kryo serialization (see below) which you will also see in the logs as: 13:10:11,148 INFO org.apache.flink.api.java.typeutils.TypeExtractor - Class class org.apache.flink.types.Row cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on &quot;Data Types &amp; Serialization&quot; for details of the effect on performance.
Avro # Flink offers built-in support for the Apache Avro serialization framework (currently using version 1.8.2) by adding the org.apache.flink:flink-avro dependency into your job. Flink’s AvroSerializer can then use Avro’s specific, generic, and reflective data serialization and make use of Avro’s performance and flexibility, especially in terms of evolving the schema when the classes change over time.
Avro Specific # Avro specific records will be automatically detected by checking that the given type’s type hierarchy contains the SpecificRecordBase class. You can either specify your concrete Avro type, or—if you want to be more generic and allow different types in your operator—use the SpecificRecordBase type (or a subtype) in your user functions, in ResultTypeQueryable#getProducedType(), or in SingleOutputStreamOperator#returns(). Since specific records use generated Java code, they are strongly typed and allow direct access to the fields via known getters and setters.
Warning If you specify the Flink type as \`SpecificRecord\` and not \`SpecificRecordBase\`, Flink will not see this as an Avro type. Instead, it will use Kryo to de/serialize any objects which may be considerably slower. Avro Generic # Avro’s GenericRecord types cannot, unfortunately, be used automatically since they require the user to specify a schema (either manually or by retrieving it from some schema registry). With that schema, you can provide the right type information by either of the following options just like for the Row Types above:
implement ResultTypeQueryable&lt;GenericRecord&gt;: public static class AvroGenericSource implements SourceFunction&lt;GenericRecord&gt;, ResultTypeQueryable&lt;GenericRecord&gt; { private final GenericRecordAvroTypeInfo producedType; public AvroGenericSource(Schema schema) { this.producedType = new GenericRecordAvroTypeInfo(schema); } @Override public TypeInformation&lt;GenericRecord&gt; getProducedType() { return producedType; } } provide type information when building the job graph by using SingleOutputStreamOperator#returns() DataStream&lt;GenericRecord&gt; sourceStream = env.addSource(new AvroGenericSource()) .returns(new GenericRecordAvroTypeInfo(schema)); Without this type information, Flink will fall back to Kryo for serialization which would serialize the schema into every record, over and over again. As a result, the serialized form will be bigger and more costly to create.
Note Since Avro’s \`Schema\` class is not serializable, it can not be sent around as is. You can work around this by converting it to a String and parsing it back when needed. If you only do this once on initialization, there is practically no difference to sending it directly. Avro Reflect # The third way of using Avro is to exchange Flink’s PojoSerializer (for POJOs according to the rules above) for Avro’s reflection-based serializer. This can be enabled by calling
env.getConfig().enableForceAvro(); Kryo # Any class or object which does not fall into the categories above or is covered by a Flink-provided special serializer is de/serialized with a fallback to Kryo (currently version 2.24.0) which is a powerful and generic serialization framework in Java. Flink calls such a type a generic type and you may stumble upon GenericTypeInfo when debugging code. If you are using Kryo serialization, make sure to register your types with kryo:
env.getConfig().registerKryoType(MyCustomType.class); Registering types adds them to an internal map of classes to tags so that, during serialization, Kryo does not have to add the fully qualified class names as a prefix into the serialized form. Instead, Kryo uses these (integer) tags to identify the underlying classes and reduce serialization overhead.
Note Flink will store Kryo serializer mappings from type registrations in its checkpoints and savepoints and will retain them across job (re)starts. Disabling Kryo # If desired, you can disable the Kryo fallback, i.e. the ability to serialize generic types, by calling
env.getConfig().disableGenericTypes(); This is mostly useful for finding out where these fallbacks are applied and replacing them with better serializers. If your job has any generic types with this configuration, it will fail with
Exception in thread &ldquo;main&rdquo; java.lang.UnsupportedOperationException: Generic types have been disabled in the ExecutionConfig and type … is treated as a generic type.
If you cannot immediately see from the type where it is being used, this log message also gives you a stacktrace that can be used to set breakpoints and find out more details in your IDE.
Apache Thrift (via Kryo) # In addition to the variants above, Flink also allows you to register other type serialization frameworks with Kryo. After adding the appropriate dependencies from the documentation (com.twitter:chill-thrift and org.apache.thrift:libthrift), you can use Apache Thrift like the following:
env.getConfig().addDefaultKryoSerializer(MyCustomType.class, TBaseSerializer.class); This only works if generic types are not disabled and MyCustomType is a Thrift-generated data type. If the data type is not generated by Thrift, Flink will fail at runtime with an exception like this:
java.lang.ClassCastException: class MyCustomType cannot be cast to class org.apache.thrift.TBase (MyCustomType and org.apache.thrift.TBase are in unnamed module of loader &lsquo;app&rsquo;)
Note Please note that \`TBaseSerializer\` can be registered as a default Kryo serializer as above (and as specified in [its documentation](https://github.com/twitter/chill/blob/v0.7.6/chill-thrift/src/main/java/com/twitter/chill/thrift/TBaseSerializer.java)) or via \`registerTypeWithKryoSerializer\`. In practice, we found both ways working. We also saw no difference between registering Thrift classes in addition to the call above. Both may be different in your scenario. Protobuf (via Kryo) # In a way similar to Apache Thrift, Google Protobuf may be registered as a custom serializer after adding the right dependencies (com.twitter:chill-protobuf and com.google.protobuf:protobuf-java):
env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, ProtobufSerializer.class); This will work as long as generic types have not been disabled (this would disable Kryo for good). If MyCustomType is not a Protobuf-generated class, your Flink job will fail at runtime with the following exception:
java.lang.ClassCastException: class MyCustomType cannot be cast to class com.google.protobuf.Message (MyCustomType and com.google.protobuf.Message are in unnamed module of loader &lsquo;app&rsquo;)
Note Please note that \`ProtobufSerializer\` can be registered as a default Kryo serializer (as specified in the [Protobuf documentation](https://github.com/twitter/chill/blob/v0.7.6/chill-thrift/src/main/java/com/twitter/chill/thrift/TBaseSerializer.java)) or via \`registerTypeWithKryoSerializer\` (as presented here). In practice, we found both ways working. We also saw no difference between registering your Protobuf classes in addition to the call above. Both may be different in your scenario. State Schema Evolution # Before taking a closer look at the performance of each of the serializers described above, we would like to emphasize that performance is not everything that counts inside a real-world Flink job. Types for storing state, for example, should be able to evolve their schema (add/remove/change fields) throughout the lifetime of the job without losing previous state. This is what Flink calls State Schema Evolution. Currently, as of Flink 1.10, there are only two serializers that support out-of-the-box schema evolution: POJO and Avro. For anything else, if you want to change the state schema, you will have to either implement your own custom serializers or use the State Processor API to modify your state for the new code.
Performance Comparison # With so many options for serialization, it is actually not easy to make the right choice. We already saw some technical advantages and disadvantages of each of them outlined above. Since serializers are at the core of your Flink jobs and usually also sit on the hot path (per record invocations), let us actually take a deeper look into their performance with the help of the Flink benchmarks project at https://github.com/dataArtisans/flink-benchmarks. This project adds a few micro-benchmarks on top of Flink (some more low-level than others) to track performance regressions and improvements. Flink’s continuous benchmarks for monitoring the serialization stack’s performance are implemented in SerializationFrameworkMiniBenchmarks.java. This is only a subset of all available serialization benchmarks though and you will find the complete set in SerializationFrameworkAllBenchmarks.java. All of these use the same definition of a small POJO that may cover average use cases. Essentially (without constructors, getters, and setters), these are the data types that it uses for evaluating performance:
public class MyPojo {
  public int id;
  private String name;
  private String[] operationNames;
  private MyOperation[] operations;
  private int otherId1;
  private int otherId2;
  private int otherId3;
  private Object someObject;
}

public class MyOperation {
  int id;
  protected String name;
}
This is mapped to tuples, rows, Avro specific records, Thrift and Protobuf representations appropriately and sent through a simple Flink job at parallelism 4 where the data type is used during network communication like this:
env.setParallelism(4);
env.addSource(new PojoSource(RECORDS_PER_INVOCATION, 10))
    .rebalance()
    .addSink(new DiscardingSink&lt;&gt;());
After running this through the jmh micro-benchmarks defined in SerializationFrameworkAllBenchmarks.java, I retrieved the following performance results for Flink 1.10 on my machine (in number of operations per millisecond): A few takeaways from these numbers:
The default fallback from POJO to Kryo reduces performance by 75%.
Registering types with Kryo significantly improves its performance, bringing it to only 64% fewer operations than the PojoSerializer.
Avro GenericRecord and SpecificRecord are roughly serialized at the same speed.
Avro Reflect serialization is even slower than Kryo default (-45%).
Tuples are the fastest, closely followed by Rows. Both leverage fast specialized serialization code based on direct access without Java reflection.
Using a (nested) Tuple instead of a POJO may speed up your job by 42% (but is less flexible!). Having code-generation for the PojoSerializer (FLINK-3599) may actually close that gap (or at least move closer to the RowSerializer). If you feel like giving the implementation a go, please give the Flink community a note and we will see whether we can make that happen.
If you cannot use POJOs, try to define your data type with one of the serialization frameworks that generate specific code for it: Protobuf, Avro, Thrift (in that order, performance-wise).
Note As with all benchmarks, please bear in mind that these numbers only give a hint on Flink’s serializer performance in a specific scenario. They may be different with your data types but the rough classification is probably the same. If you want to be sure, please verify the results with your data types. You should be able to copy from \`SerializationFrameworkAllBenchmarks.java\` to set up your own micro-benchmarks or integrate different serialization benchmarks into your own tooling. Conclusion # In the sections above, we looked at how Flink performs serialization for different sorts of data types and elaborated the technical advantages and disadvantages. For data types used in Flink state, you probably want to leverage either POJO or Avro types which, currently, are the only ones supporting state evolution out of the box and allow your stateful application to develop over time. POJOs are usually faster in the de/serialization while Avro may support more flexible schema evolution and may integrate better with external systems. Please note, however, that you can use different serializers for external vs. internal components or even state vs. network communication.
The fastest de/serialization is achieved with Flink’s internal tuple and row serializers which can access these types&rsquo; fields directly without going via reflection. With roughly 30% decreased throughput as compared to tuples, Protobuf and POJO types do not perform too badly on their own and are more flexible and maintainable. Avro (specific and generic) records as well as Thrift data types further reduce performance by 20% and 30%, respectively. You definitely want to avoid Kryo as that reduces throughput further by around 50% and more!
The next article in this series will use this finding as a starting point to look into a few common pitfalls and obstacles of avoiding Kryo, how to get the most out of the PojoSerializer, and a few more tuning techniques with respect to serialization. Stay tuned for more.
`}),e.add({id:162,href:"/2020/04/09/pyflink-introducing-python-support-for-udfs-in-flinks-table-api/",title:"PyFlink: Introducing Python Support for UDFs in Flink's Table API",section:"Flink Blog",content:`Flink 1.9 introduced the Python Table API, allowing developers and data engineers to write Python Table API jobs for Table transformations and analysis, such as Python ETL or aggregate jobs. However, Python users faced some limitations when it came to support for Python UDFs in Flink 1.9, preventing them from extending the system’s built-in functionality.
In Flink 1.10, the community further extended the support for Python by adding Python UDFs in PyFlink. Additionally, both the Python UDF environment and dependency management are now supported, allowing users to import third-party libraries in the UDFs, leveraging Python&rsquo;s rich set of third-party libraries.
Python Support for UDFs in Flink 1.10 # Before diving into how you can define and use Python UDFs, we explain the motivation and background behind how UDFs work in PyFlink and provide some additional context about the implementation of our approach. Below we give a brief introduction on the PyFlink architecture from job submission, all the way to executing the Python UDF.
The PyFlink architecture mainly includes two parts — local and cluster — as shown in the architecture visual below. The local phase is the compilation of the job, and the cluster is the execution of the job.
For the local part, the Python API is a mapping of the Java API: each time Python executes a method in the figure above, it synchronously calls the corresponding Java method through Py4J, and finally generates a Java JobGraph before submitting it to the cluster.
For the cluster part, just like ordinary Java jobs, the JobMaster schedules tasks to TaskManagers. The tasks that include Python UDF in a TaskManager involve the execution of Java and Python operators. In the Python UDF operator, various gRPC services are used to provide different communications between the Java VM and the Python VM, such as DataService for data transmissions, StateService for state requirements, and Logging and Metrics Services. These services are built on Beam&rsquo;s Fn API. While currently only Process mode is supported for Python workers, support for Docker mode and External service mode is also considered for future Flink releases.
How to use PyFlink with UDFs in Flink 1.10 # This section provides some Python user defined function (UDF) examples, including how to install PyFlink, how to define/register/invoke UDFs in PyFlink and how to execute the job.
Install PyFlink # Using Python in Apache Flink requires installing PyFlink. PyFlink is available through PyPI and can be easily installed using pip:
$ python -m pip install apache-flink Note Please note that Python 3.5 or higher is required to install and run PyFlink Define a Python UDF # There are many ways to define a Python scalar function, besides extending the base class ScalarFunction. The following example shows the different ways of defining a Python scalar function that takes two columns of BIGINT as input parameters and returns the sum of them as the result.
# option 1: extending the base class \`ScalarFunction\`
class Add(ScalarFunction):
  def eval(self, i, j):
    return i + j

add = udf(Add(), [DataTypes.BIGINT(), DataTypes.BIGINT()], DataTypes.BIGINT())

# option 2: Python function
@udf(input_types=[DataTypes.BIGINT(), DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
def add(i, j):
  return i + j

# option 3: lambda function
add = udf(lambda i, j: i + j, [DataTypes.BIGINT(), DataTypes.BIGINT()], DataTypes.BIGINT())

# option 4: callable function
class CallableAdd(object):
  def __call__(self, i, j):
    return i + j

add = udf(CallableAdd(), [DataTypes.BIGINT(), DataTypes.BIGINT()], DataTypes.BIGINT())

# option 5: partial function
def partial_add(i, j, k):
  return i + j + k

add = udf(functools.partial(partial_add, k=1), [DataTypes.BIGINT(), DataTypes.BIGINT()], DataTypes.BIGINT())
Register a Python UDF #
# register the Python function
table_env.register_function(&#34;add&#34;, add)
Invoke a Python UDF #
# use the function in Python Table API
my_table.select(&#34;add(a, b)&#34;)
Below, you can find a complete example of using Python UDF.
from pyflink.datastream import StreamExecutionEnvironment from pyflink.table import StreamTableEnvironment, DataTypes from pyflink.table.descriptors import Schema, OldCsv, FileSystem from pyflink.table.udf import udf env = StreamExecutionEnvironment.get_execution_environment() env.set_parallelism(1) t_env = StreamTableEnvironment.create(env) add = udf(lambda i, j: i + j, [DataTypes.BIGINT(), DataTypes.BIGINT()], DataTypes.BIGINT()) t_env.register_function(&#34;add&#34;, add) t_env.connect(FileSystem().path(&#39;/tmp/input&#39;)) \\ .with_format(OldCsv() .field(&#39;a&#39;, DataTypes.BIGINT()) .field(&#39;b&#39;, DataTypes.BIGINT())) \\ .with_schema(Schema() .field(&#39;a&#39;, DataTypes.BIGINT()) .field(&#39;b&#39;, DataTypes.BIGINT())) \\ .create_temporary_table(&#39;mySource&#39;) t_env.connect(FileSystem().path(&#39;/tmp/output&#39;)) \\ .with_format(OldCsv() .field(&#39;sum&#39;, DataTypes.BIGINT())) \\ .with_schema(Schema() .field(&#39;sum&#39;, DataTypes.BIGINT())) \\ .create_temporary_table(&#39;mySink&#39;) t_env.from_path(&#39;mySource&#39;)\\ .select(&#34;add(a, b)&#34;) \\ .insert_into(&#39;mySink&#39;) t_env.execute(&#34;tutorial_job&#34;) Submit the job # Firstly, you need to prepare the input data in the “/tmp/input” file. For example,
$ echo &quot;1,2&quot; &gt; /tmp/input
Next, you can run this example on the command line,
$ python python_udf_sum.py
The command builds and runs the Python Table API program in a local mini-cluster. You can also submit the Python Table API program to a remote cluster using different command lines, (see more details here).
Finally, you can see the execution result on the command line:
$ cat /tmp/output 3
Python UDF dependency management # In many cases, you would like to import third-party dependencies in the Python UDF. The example below provides detailed guidance on how to manage such dependencies.
Suppose you want to use the mpmath library to perform the sum in the example above. The Python UDF may look like this:
@udf(input_types=[DataTypes.BIGINT(), DataTypes.BIGINT()], result_type=DataTypes.BIGINT()) def add(i, j): from mpmath import fadd # add third-party dependency return int(fadd(i, j)) To make it available on the worker node that does not contain the dependency, you can specify the dependencies with the following commands and API:
$ cd /tmp $ echo mpmath==1.1.0 &gt; requirements.txt $ pip download -d cached_dir -r requirements.txt --no-binary :all: t_env.set_python_requirements(&#34;/tmp/requirements.txt&#34;, &#34;/tmp/cached_dir&#34;) A requirements.txt file that defines the third-party dependencies is used. If the dependencies cannot be accessed in the cluster, then you can specify a directory containing the installation packages of these dependencies by using the parameter &ldquo;requirements_cached_dir&rdquo;, as illustrated in the example above. The dependencies will be uploaded to the cluster and installed offline.
Conclusion &amp; Upcoming work # In this blog post, we introduced the architecture of Python UDFs in PyFlink and provided some examples on how to define, register and invoke UDFs. Flink 1.10 brings Python support in the framework to new levels, allowing Python users to write even more magic with their preferred language. The community is actively working towards continuously improving the functionality and performance of PyFlink. Future work in upcoming releases will introduce support for Pandas UDFs in scalar and aggregate functions, add support to use Python UDFs through the SQL client to further expand the usage scope of Python UDFs, provide support for a Python ML Pipeline API and finally work towards even more performance improvements. The picture below provides more details on the roadmap for succeeding releases.
`}),e.add({id:163,href:"/2020/04/07/stateful-functions-2.0-an-event-driven-database-on-apache-flink/",title:"Stateful Functions 2.0 - An Event-driven Database on Apache Flink",section:"Flink Blog",content:`Today, we are announcing the release of Stateful Functions (StateFun) 2.0 — the first release of Stateful Functions as part of the Apache Flink project. This release marks a big milestone: Stateful Functions 2.0 is not only an API update, but the first version of an event-driven database that is built on Apache Flink.
Stateful Functions 2.0 makes it possible to combine StateFun’s powerful approach to state and composition with the elasticity, rapid scaling/scale-to-zero and rolling upgrade capabilities of FaaS implementations like AWS Lambda and modern resource orchestration frameworks like Kubernetes.
With these features, Stateful Functions 2.0 addresses two of the most cited shortcomings of many FaaS setups today: consistent state and efficient messaging between functions.
An Event-driven Database # When Stateful Functions joined Apache Flink at the beginning of this year, the project had started as a library on top of Flink to build general-purpose event-driven applications. Users would implement functions that receive and send messages, and maintain state in persistent variables. Flink provided the runtime with efficient exactly-once state and messaging. Stateful Functions 1.0 was a FaaS-inspired mix between stream processing and actor programming — on steroids.
Fig.1: A ride-sharing app as a Stateful Functions example. In version 2.0, Stateful Functions now physically decouples the functions from Flink and the JVM, to invoke them through simple services. That makes it possible to execute functions on a FaaS platform, a Kubernetes deployment or behind a (micro) service.
Flink invokes the functions through a service endpoint via HTTP or gRPC based on incoming events, and supplies state access. The system makes sure that only one invocation per entity (type+ID) is ongoing at any point in time, thus guaranteeing consistency through isolation. By supplying state access as part of the function invocation, the functions themselves behave like stateless applications and can be managed with the same simplicity and benefits: rapid scalability, scale-to-zero, rolling/zero-downtime upgrades and so on.
Fig.2: In Stateful Functions 2.0, functions are stateless and state access is part of the function invocation. The functions can be implemented in any programming language that can handle HTTP requests or bring up a gRPC server. The StateFun project includes a very slim SDK for Python, taking requests and dispatching them to annotated functions. We aim to provide similar SDKs for other languages, such as Go, JavaScript or Rust. Users do not need to write any Flink code (or JVM code) at all; data ingresses/egresses and function endpoints can be defined in a compact YAML spec.
Fig.3: A module declaring a remote endpoint and a function type. Fig.4: A Python implementation of a simple classifier function. The Flink processes (and the JVM) are not executing any user-code at all — though this is possible, for performance reasons (see Embedded Functions). Rather than running application-specific dataflows, Flink here stores the state of the functions and provides the dynamic messaging plane through which functions message each other, carefully dispatching messages/invocations to the event-driven functions/services to maintain consistency guarantees.
Effectively, Flink takes the role of the database, but tailored towards event-driven functions and services. It integrates state storage with the messaging between (and the invocations of) functions and services. Because of this, Stateful Functions 2.0 can be thought of as an “Event-driven Database” on Apache Flink.
“Event-driven Database” vs. “Request/Response Database” # In the case of a traditional database or key/value store (let’s call them request/response databases), the application issues queries to the database (e.g. SQL via JDBC, GET/PUT via HTTP). In contrast, an event-driven database like StateFun inverts that relationship between database and application: the database invokes the functions/services based on arriving messages. This fits very naturally with FaaS and many event-driven application architectures.
Fig.5: Stateful Functions 2.0 inverts the relationship between database and application. In the case of applications built on request/response databases, the database is responsible only for the state. Communication between different functions/services is a separate concern handled within the application layer. In contrast to that, an event-driven database takes care of both state storage and message transport, in a tightly integrated manner.
Similar to Actor Programming, Stateful Functions uses the idea of addressable entities - here, the entity is a function type with an invocation scoped to an ID. These addressable entities own the state and are the targets of messages. The difference from actor systems is that the application logic is external and the addressable entities are not physical objects in memory (i.e. actors), but rows in Flink&rsquo;s managed state, together with the entities’ mailboxes.
State and Consistency # Besides matching the needs of serverless applications and FaaS well, the event-driven database approach also helps with simplifying consistent state management.
Consider the example below, with two entities of an application — for example two microservices (Service 1, Service 2). Service 1 is invoked, updates the state in the database, and sends a request to Service 2. Assume that this request fails. There is, in general, no way for Service 1 to know whether Service 2 processed the request and updated its state or not (c.f. Two Generals Problem). To work around that, many techniques exist — making requests idempotent and retrying, commit/rollback protocols, or external transaction coordinators, for example. Solving this in the application layer is complex enough, and including the database into these approaches only adds more complexity.
In the scenario where the event-driven database takes care of state and messaging, we have a much easier problem to solve. Assume one shard of the database receives the initial message, updates its state, invokes Service 1, and routes the message produced by the function to another shard, to be delivered to Service 2. Now assume message transport errored — it may have failed or not, we cannot know for certain. Because the database is in charge of state and messaging, it can offer a generic solution to make sure that either both go through or none does, for example through transactions or consistent snapshots. The application functions are stateless and their invocations without side effects, which means they can be re-invoked again without implications on consistency.
Fig.6: The event-driven database integrates state access and messaging, guaranteeing consistency. That is the big lesson we learned from working on stream processing technology in the past years: state access/updates and messaging need to be integrated. This gives you consistency and scalable behavior, and it applies backpressure based on both state access and compute bottlenecks.
Despite state and computation being physically separated here, the scheduling/dispatching of function invocations is still integrated and physically co-located with state access, preserving the consistency guarantees given by physical state/compute co-location.
Remote, Co-located or Embedded Functions # Functions can be deployed in various ways that trade off loose coupling and independent scaling with performance overhead. Each module of functions can be of a different kind, so some functions can run remote, while others could run embedded.
Remote Functions # Remote Functions are the mechanism described so far, where functions are deployed separately from the Flink StateFun cluster. The state/messaging tier (i.e. the Flink processes) and the function tier can be deployed and scaled independently. All function invocations are remote and have to go through the endpoint service.
In a similar way as databases are accessed via a standardized protocol (e.g. ODBC/JDBC for relational databases, REST for many key/value stores), StateFun 2.0 invokes functions and services through a standardized protocol: HTTP or gRPC with data in a well-defined ProtoBuf schema.
Co-located Functions # An alternative way of deploying functions is co-location with the Flink JVM processes. In such a setup, each Flink TaskManager would talk to one function process sitting “next to it”. A common way to do this is to use a system like Kubernetes and deploy pods consisting of a Flink container and the function container that communicate via the pod-local network.
This mode supports different languages while avoiding routing invocations through a Service/Gateway/LoadBalancer, but it cannot scale the state and compute parts independently.
This style of deployment is similar to how Apache Beam’s portability layer and Flink’s Python API deploy their non-JVM language SDKs.
Embedded Functions # Embedded Functions are the mode of Stateful Functions 1.0 and Flink’s Java/Scala stream processing APIs. Functions are deployed into the JVM and are directly invoked with the messages and state access. This is the most performant way, though at the cost of only supporting JVM languages.
Following the database analogy, embedded functions are a bit like stored procedures, but in a principled way: the functions here are normal Java/Scala/Kotlin functions implementing standard interfaces and can be developed or tested in any IDE.
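As a rough sketch of what such an embedded function can look like, assuming the StateFun 2.0 Java SDK types StatefulFunction, Context, @Persisted and PersistedValue (all class and state names below are hypothetical), a function that counts the messages seen per address might be written as:
import org.apache.flink.statefun.sdk.Context;
import org.apache.flink.statefun.sdk.StatefulFunction;
import org.apache.flink.statefun.sdk.annotations.Persisted;
import org.apache.flink.statefun.sdk.state.PersistedValue;

public class SeenCountFunction implements StatefulFunction {

    // State is declared as a persisted value; Flink stores it in its managed state.
    @Persisted
    private final PersistedValue&lt;Integer&gt; seenCount =
            PersistedValue.of(&quot;seen-count&quot;, Integer.class);

    @Override
    public void invoke(Context context, Object input) {
        Integer seen = seenCount.get();
        int next = (seen == null ? 0 : seen) + 1;
        seenCount.set(next);
        // A real function would typically message another address or an egress here,
        // e.g. via context.send(...).
    }
}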
Loading Data into the Database # When building a new stateful application, you usually don’t start from a completely blank slate. Often, the application has initial state, such as initial “bootstrap” state, or state from previous versions of the application. When using a database, one could simply bulk load the data to prepare the application.
The equivalent step for Flink would be to write a savepoint that contains the initial state. Savepoints are snapshots of the state of the distributed stream processing application and can be passed to Flink to start processing from that state. Think of them as a database dump, but of a distributed streaming database. In the case of StateFun, the savepoint would contain the state of the functions.
To create a savepoint for a Stateful Functions program, check out the State Bootstrapping API that is part of StateFun 2.0. The State Bootstrapping API uses Flink’s DataSet API, but we plan to expand this to use SQL in the next versions.
Try it out and get involved! # We hope that we could convey some of the excitement we feel about Stateful Functions. If we managed to pique your curiosity, try it out — for example, starting with this walkthrough.
The project is still in a comparatively early stage, so if you want to get involved, there is lots to work on: SDKs for other languages (e.g. Go, JavaScript, Rust), ingresses/egresses and tools for testing, among others.
To follow the project and learn more, please check out these resources:
Code: https://github.com/apache/flink-statefun Docs: //nightlies.apache.org/flink/flink-statefun-docs-release-2.0/ Apache Flink project site: https://flink.apache.org/ Apache Flink on Twitter: @ApacheFlink Stateful Functions Webpage: https://statefun.io Stateful Functions on Twitter: @StateFun_IO Thank you! # The Apache Flink community would like to thank all contributors that have made this release possible:
David Anderson, Dian Fu, Igal Shilman, Seth Wiesman, Stephan Ewen, Tzu-Li (Gordon) Tai, hequn8128
`}),e.add({id:164,href:"/2020/03/30/flink-community-update-april20/",title:"Flink Community Update - April'20",section:"Flink Blog",content:`While things slow down around us, the Apache Flink community is privileged to remain as active as ever. This blogpost combs through the past few months to give you an update on the state of things in Flink — from core releases to Stateful Functions; from some good old community stats to a new development blog.
And since now it&rsquo;s more important than ever to keep up the spirits, we’d like to invite you to join the Flink Forward Virtual Conference, on April 22-24 (see Upcoming Events). Hope to see you there!
The Year (so far) in Flink # Flink 1.10 Release # To kick off the new year, the Flink community released Flink 1.10 with the record contribution of over 200 engineers. This release introduced significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and advances in Python support (PyFlink). Flink 1.10 also marked the completion of the Blink integration, hardening streaming SQL and bringing mature batch processing to Flink with production-ready Hive integration and TPC-DS coverage.
The community is now discussing the release of Flink 1.10.1, covering some outstanding bugs from Flink 1.10.
Stateful Functions Contribution and 2.0 Release # Last January, the first version of Stateful Functions (statefun.io) code was pushed to the Flink repository. Stateful Functions started out as an API to build general purpose event-driven applications on Flink, taking advantage of its advanced state management mechanism to cut the “middleman” that usually handles state coordination in such applications (e.g. a database).
In a recent update, some new features were announced, like multi-language support (including a Python SDK), function unit testing and Stateful Functions’ own flavor of the State Processor API. The release cycle will be independent from core Flink releases and the Release Candidate (RC) has been created — so, you can expect Stateful Functions 2.0 to be released very soon!
Building up to Flink 1.11 # Amidst the usual outpouring of discussion threads, JIRA tickets and FLIPs, the community is working full steam on bringing Flink 1.11 to life in the next few months. The feature freeze is currently scheduled for late April, so the release is expected around mid-May. The upcoming release will focus on new features and integrations that broaden the scope of Flink use cases, as well as core runtime enhancements to streamline the operations of complex deployments.
Some of the plans on the use case side include support for changelog streams in the Table API/SQL (FLIP-105), easy streaming data ingestion into Apache Hive (FLIP-115) and support for Pandas DataFrames in PyFlink. On the operational side, the much anticipated new Source API (FLIP-27) will unify batch and streaming sources, and improve out-of-the-box event-time behavior; while unaligned checkpoints (FLIP-76) and some changes to network memory management will allow to speed up checkpointing under backpressure.
Throw in improvements around the type system, the WebUI, metrics reporting and supported formats, and this release is bound to keep the community busy. For a complete overview of the ongoing development, check this discussion and follow the weekly updates on the Flink @community mailing list.
New Committers and PMC Members # The Apache Flink community has welcomed 1 PMC (Project Management Committee) Member and 5 new Committers since the last update (September 2019):
New PMC Members # Jark Wu New Committers # Zili Chen, Jingsong Lee, Yu Li, Dian Fu, Zhu Zhu Congratulations to all and thank you for your hardworking commitment to Flink!
The Bigger Picture # A Look into the Flink Repository # In the last update, we shared some numbers around Flink releases and mailing list activity. This time, we’re looking into the activity in the Flink repository and how it’s evolving.
There is a clear upward trend in the number of contributions to the repository, based on the number of commits. This reflects the fast pace of development the project is experiencing and also the successful integration of the China-based Flink contributors that started early last year. To complement these observations, the repository registered a 1.5x increase in the number of individual contributors in 2019, compared to the previous year.
But did this increase in capacity produce any other measurable benefits?
If we look at the average time of Pull Request (PR) “resolution”, it seems like it did: the average time it takes to close a PR these days has been steadily decreasing since last year, sitting between 5-6 days for the past few months.
These are great indicators of the health of Flink as an open source project!
Flink Community Packages # If you missed the launch of flink-packages.org, here’s a reminder! Ververica has created (and open sourced) a website that showcases the work of the community to push forward the ecosystem surrounding Flink. There, you can explore existing packages (like the Pravega and Pulsar Flink connectors, or the Flink Kubernetes operators developed by Google and Lyft) and also submit your own contributions to the ecosystem.
Flink &ldquo;Engine Room&rdquo; # The community has recently launched the “Engine Room”, a dedicated space in Flink’s Wiki for knowledge sharing between contributors. The goal of this initiative is to make ongoing development on Flink internals more transparent across different work streams, and also to help new contributors get on board with best practices. The first blogpost is already up and sheds light on the migration of Flink’s CI infrastructure from Travis to Azure Pipelines.
Upcoming Events # Flink Forward Virtual Conference # The organization of Flink Forward had to make the hard decision of cancelling this year’s event in San Francisco. But all is not lost! Flink Forward SF will be held online on April 22-24 and you can register (for free) here. Join the community for interactive talks and Q&amp;A sessions with core Flink contributors and companies like Splunk, Lyft, Netflix or Google.
Others # Events across the globe have come to a halt due to the growing concerns around COVID-19, so this time we’ll leave you with some interesting content to read instead. In addition to this written content, you can also recap last year’s sessions from Flink Forward Berlin and Flink Forward China!
Type Links Blogposts Replayable Process Functions: Time, Ordering, and Timers @Bird Application Log Intelligence & Performance Insights at Salesforce Using Flink @Salesforce State Unlocked: Interacting with State in Apache Flink Advanced Flink Application Patterns Vol.1: Case Study of a Fraud Detection System Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic Apache Beam: How Beam Runs on Top of Flink Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration Tutorials Flink on Zeppelin — (Part 3). Streaming Streaming ETL with Apache Flink and Amazon Kinesis Data Analytics No Java Required: Configuring Sources and Sinks in SQL A Guide for Unit Testing in Apache Flink If you’d like to keep a closer eye on what’s happening in the community, subscribe to the Flink @community mailing list to get fine-grained weekly updates, upcoming event announcements and more.
`}),e.add({id:165,href:"/2020/03/27/flink-as-unified-engine-for-modern-data-warehousing-production-ready-hive-integration/",title:"Flink as Unified Engine for Modern Data Warehousing: Production-Ready Hive Integration",section:"Flink Blog",content:`In this blog post, you will learn our motivation behind the Flink-Hive integration, and how Flink 1.10 can help modernize your data warehouse.
Introduction # What are some of the latest requirements for your data warehouse and data infrastructure in 2020?
We’ve come up with some for you.
Firstly, today’s businesses are shifting to a more real-time fashion and thus demand the ability to process online streaming data with low latency for near-real-time or even real-time analytics. People are becoming less and less tolerant of delays between when data is generated and when it arrives at their hands, ready to use. Hours or even days of delay are not acceptable anymore. Users expect minutes, or even seconds, of end-to-end latency for data in their warehouse, to get quicker-than-ever insights.
Secondly, the infrastructure should be able to handle both offline batch data for offline analytics and exploration, and online streaming data for more timely analytics. Both are indispensable, as they both have very valid use cases. Apart from the real-time processing mentioned above, batch processing will still exist, as it is good for ad hoc queries, explorations, and full-scale calculations. A modern infrastructure should not force users to choose between one or the other; it should offer both options for a world-class data infrastructure.
Thirdly, data practitioners, including data engineers, data scientists, analysts, and operations teams, are calling for a more unified infrastructure than ever before, for easier ramp-up and higher productivity. The big data landscape has been fragmented for years - companies may have one set of infrastructure for real-time processing, one set for batch, one set for OLAP, etc. That often comes as a legacy of the lambda architecture, which was popular in an era when stream processors were not as mature as today and users had to periodically run batch processing to correct streaming pipelines. Well, it&rsquo;s a different era now! As stream processing becomes mainstream and dominant, end users no longer want to learn scattered sets of skills and maintain many moving parts with all kinds of tools and pipelines. Instead, what they really need is a unified analytics platform that can be mastered easily and that minimizes operational complexity.
If any of these resonate with you, you just found the right post to read: by strengthening Flink’s integration with Hive to production grade, we have never been closer to that vision.
Flink and Its Integration With Hive Comes into the Scene # Apache Flink has proven to be a scalable system for handling extremely high-volume streaming workloads at very low latency at many large tech companies.
Despite its huge success in the real-time processing domain, at its deep root Flink has faithfully followed its founding philosophy of being a unified data processing engine for both batch and streaming, taking a streaming-first approach in its architecture to do batch processing. By treating batch as a special case of streaming, Flink leverages its cutting-edge streaming capabilities and applies them to batch scenarios to gain the best offline performance. Flink’s batch performance was already outstanding in the early days and has become even more impressive as the community started merging Blink, Alibaba’s fork of Flink, back into Flink in 1.9 and finished the effort in 1.10.
On the other hand, Apache Hive has established itself as a focal point of the data warehousing ecosystem. It serves not only as a SQL engine for big data analytics and ETL, but also as a data management platform where data is discovered and defined. As businesses evolve, they put new requirements on the data warehouse.
Thus we started integrating Flink and Hive as a beta version in Flink 1.9. Over the past few months, we have been listening to users’ requests and feedback, extensively enhancing our product, and running rigorous benchmarks (which will be published soon separately). I’m glad to announce that the integration between Flink and Hive is at production grade in Flink 1.10 and we can’t wait to walk you through the details.
Unified Metadata Management # Hive Metastore has evolved into the de facto metadata hub over the years in the Hadoop, or even the cloud, ecosystem. Many companies have a single Hive Metastore service instance in production to manage all of their schemas, either Hive or non-Hive metadata, as the single source of truth.
In 1.9 we introduced Flink’s HiveCatalog, connecting Flink to users’ rich metadata pool. The significance of HiveCatalog is two-fold. First, it allows Apache Flink users to utilize Hive Metastore to store and manage Flink’s metadata, including tables, UDFs, and data statistics. Second, it enables Flink to access Hive’s existing metadata, so that Flink itself can read and write Hive tables.
In Flink 1.10, users can store Flink&rsquo;s own tables, views, UDFs, and statistics in Hive Metastore on all compatible Hive versions. Here’s an end-to-end example of how to store a Flink Kafka source table in Hive Metastore and later query the table in Flink SQL.
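As a rough sketch of that flow (not the blog&rsquo;s exact example: the catalog name, Hive conf directory, Hive version, table schema and connector properties below are illustrative assumptions), registering a HiveCatalog and persisting a Kafka-backed table from the Table API could look like this:
// Minimal sketch: persist a Kafka source table in Hive Metastore and query it (Flink 1.10 Table API).
// Catalog name, database, conf dir, Hive version and connector properties are illustrative assumptions.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class HiveCatalogExample {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // Point Flink at an existing Hive Metastore and make it the current catalog.
        HiveCatalog hive = new HiveCatalog("myhive", "default", "/opt/hive-conf", "2.3.4");
        tableEnv.registerCatalog("myhive", hive);
        tableEnv.useCatalog("myhive");

        // The table definition is stored in Hive Metastore and survives session restarts.
        tableEnv.sqlUpdate(
            "CREATE TABLE clicks (" +
            "  user_id BIGINT," +
            "  url STRING," +
            "  click_time TIMESTAMP(3)" +
            ") WITH (" +
            "  'connector.type' = 'kafka'," +
            "  'connector.version' = 'universal'," +
            "  'connector.topic' = 'clicks'," +
            "  'connector.properties.bootstrap.servers' = 'localhost:9092'," +
            "  'format.type' = 'json'" +
            ")");

        // Later, even from a different session, the table can be queried by name in Flink SQL.
        Table result = tableEnv.sqlQuery(
            "SELECT user_id, COUNT(url) AS cnt FROM clicks GROUP BY user_id");
    }
}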
Stream Processing # The Hive integration feature in Flink 1.10 empowers users to re-imagine what they can accomplish with their Hive data and unlock stream processing use cases:
join real-time streaming data in Flink with offline Hive data for more complex data processing; backfill Hive data with Flink directly in a unified fashion; and leverage Flink to move real-time data into Hive more quickly, greatly shortening the end-to-end latency between when data is generated and when it arrives at your data warehouse for analytics, from hours — or even days — to minutes. Compatible with More Hive Versions # In Flink 1.10, we brought compatibility with most Hive versions, including 1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 2.3, and 3.1. Take a look here.
Reuse Hive User Defined Functions (UDFs) # Users can reuse all kinds of Hive UDFs in Flink since Flink 1.9.
This is a great win for Flink users with a history in the Hive ecosystem, as they may have developed custom business logic in their Hive UDFs. Being able to run these functions without any rewrite saves users a lot of time and brings them a much smoother experience when they migrate to Flink.
To take it a step further, Flink 1.10 introduces compatibility with Hive built-in functions via HiveModule. Over the years, the Hive community has developed a few hundred built-in functions that are super handy for users. For built-in functions that don&rsquo;t exist in Flink yet, users can now leverage the familiar Hive equivalents and complete their jobs seamlessly.
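As a minimal sketch (the Hive version string, table name and column name are illustrative assumptions), loading the Hive built-in functions alongside Flink&rsquo;s own could look as follows:
// Minimal sketch: expose Hive built-in functions as Flink system functions via HiveModule (FLIP-68).
// The Hive version, table name and column name are illustrative assumptions.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.module.hive.HiveModule;

public class HiveModuleExample {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build());

        // Flink's core module is loaded by default; adding HiveModule makes Hive built-ins resolvable too.
        tableEnv.loadModule("hive", new HiveModule("2.3.4"));

        // A Hive built-in that has no Flink counterpart, e.g. get_json_object, can now be used directly.
        tableEnv.sqlQuery("SELECT get_json_object(payload, '$.name') FROM json_events");
    }
}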
Enhanced Read and Write on Hive Data # Flink 1.10 extends its read and write capabilities on Hive data to all the common use cases with better performance.
On the reading side, Flink can now read Hive regular tables, partitioned tables, and views. A number of optimization techniques have been developed around reading, including partition pruning and projection pushdown to transport less data from file storage, limit pushdown for faster experimentation and exploration, and a vectorized reader for ORC files.
On the writing side, Flink 1.10 introduces “INSERT INTO” and “INSERT OVERWRITE” to its syntax, and can write to not only Hive’s regular tables, but also partitioned tables with either static or dynamic partitions.
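For illustration only (the table names and the partition column below are hypothetical), the new write syntax and the read-side optimizations come together in Flink SQL roughly like this:
-- Hypothetical tables: page_views is partitioned by dt, staging_page_views is a regular table.
-- Static partition: the partition value is fixed in the statement.
INSERT OVERWRITE TABLE page_views PARTITION (dt = '2020-03-27')
SELECT user_id, url FROM staging_page_views;

-- Dynamic partition: the partition value is taken from the last column of the query result.
INSERT INTO TABLE page_views
SELECT user_id, url, dt FROM staging_page_views;

-- Reads benefit from partition pruning (only dt = '2020-03-27' is scanned) and LIMIT pushdown.
SELECT user_id, url FROM page_views WHERE dt = '2020-03-27' LIMIT 100;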
Formats # Your engine should be able to handle all common types of file formats to give you the freedom of choosing one over another in order to fit your business needs. It’s no exception for Flink. We have tested the following table storage formats: text, csv, SequenceFile, ORC, and Parquet.
More Data Types # In Flink 1.10, we added support for a few more frequently-used Hive data types that were not covered by Flink 1.9. Flink users now should have a full, smooth experience to query and manipulate Hive data from Flink.
Roadmap # Integration between any two systems is a never-ending story.
We are constantly improving Flink itself and the Flink-Hive integration also gets improved by collecting user feedback and working with folks in this vibrant community.
After careful consideration and prioritization of the feedback we received, we have planned many of the following requests for the next Flink release, 1.11.
a Hive streaming sink so that Flink can stream data into Hive tables, bringing a real streaming experience to Hive; a native Parquet reader for better performance; additional interoperability - support for creating Hive tables, views, and functions in Flink; a better out-of-the-box experience with built-in dependencies, including documentation; a JDBC driver so that users can reuse their existing tooling to run SQL jobs on Flink; and a Hive syntax and semantics compatibility mode. If you have more feature requests or discover bugs, please reach out to the community through the mailing list and JIRA.
Summary # Data warehousing is shifting to a more real-time fashion, and Apache Flink can make a difference for your organization in this space.
Flink 1.10 brings production-ready Hive integration and empowers users to achieve more in both metadata management and unified stream/batch data processing.
We encourage all our users to get their hands on Flink 1.10. You are very welcome to join the community in development, discussions, and all other kinds of collaborations in this topic.
`}),e.add({id:166,href:"/2020/03/24/advanced-flink-application-patterns-vol.2-dynamic-updates-of-application-logic/",title:"Advanced Flink Application Patterns Vol.2: Dynamic Updates of Application Logic",section:"Flink Blog",content:`In the first article of the series, we gave a high-level description of the objectives and required functionality of a Fraud Detection engine. We also described how to make data partitioning in Apache Flink customizable based on modifiable rules instead of using a hardcoded KeysExtractor implementation.
We intentionally omitted details of how the applied rules are initialized and what possibilities exist for updating them at runtime. In this post, we will address exactly these details. You will learn how the approach to data partitioning described in Part 1 can be applied in combination with a dynamic configuration. These two patterns, when used together, can eliminate the need to recompile the code and redeploy your Flink job for a wide range of modifications of the business logic.
Rules Broadcasting # Let&rsquo;s first have a look at the previously-defined data-processing pipeline:
DataStream&lt;Alert&gt; alerts = transactions .process(new DynamicKeyFunction()) .keyBy((keyed) -&gt; keyed.getKey()) .process(new DynamicAlertFunction()); DynamicKeyFunction provides dynamic data partitioning, while DynamicAlertFunction is responsible for executing the main logic of processing transactions and sending alert messages according to the defined rules.
Vol.1 of this series simplified the use case and assumed that the applied set of rules is pre-initialized and accessible via a List&lt;Rule&gt; within DynamicKeyFunction.
public class DynamicKeyFunction extends ProcessFunction&lt;Transaction, Keyed&lt;Transaction, String, Integer&gt;&gt; { /* Simplified */ List&lt;Rule&gt; rules = /* Rules that are initialized somehow.*/; ... } Adding rules to this list is obviously possible directly inside the code of the Flink Job at the stage of its initialization (create a List object; use its add method). A major drawback of doing so is that it will require recompilation of the job with each rule modification. In a real Fraud Detection system, rules are expected to change on a frequent basis, making this approach unacceptable from the point of view of business and operational requirements. A different approach is needed.
Next, let&rsquo;s take a look at a sample rule definition that we introduced in the previous post of the series:
Figure 1: Rule definition The previous post covered use of groupingKeyNames by DynamicKeyFunction to extract message keys. Parameters from the second part of this rule are used by DynamicAlertFunction: they define the actual logic of the performed operations and their parameters (such as the alert-triggering limit). This means that the same rule must be present in both DynamicKeyFunction and DynamicAlertFunction. To achieve this result, we will use the broadcast data distribution mechanism of Apache Flink.
Figure 2 presents the final job graph of the system that we are building:
Figure 2: Job Graph of the Fraud Detection Flink Job The main blocks of the Transactions processing pipeline are:
Transaction Source that consumes transaction messages from Kafka partitions in parallel. Dynamic Key Function that performs data enrichment with a dynamic key. The subsequent keyBy hashes this dynamic key and partitions the data accordingly among all parallel instances of the following operator.
Dynamic Alert Function that accumulates a data window and creates Alerts based on it.
Data Exchange inside Apache Flink # The job graph above also indicates various data exchange patterns between the operators. In order to understand how the broadcast pattern works, let&rsquo;s take a short detour and discuss what methods of message propagation exist in Apache Flink&rsquo;s distributed runtime.
The FORWARD connection after the Transaction Source means that all data consumed by one of the parallel instances of the Transaction Source operator is transferred to exactly one instance of the subsequent DynamicKeyFunction operator. It also indicates the same level of parallelism of the two connected operators (12 in the above case). This communication pattern is illustrated in Figure 3. Orange circles represent transactions, and dotted rectangles depict parallel instances of the conjoined operators. Figure 3: FORWARD message passing across operator instances The HASH connection between DynamicKeyFunction and DynamicAlertFunction means that for each message a hash code is calculated and messages are evenly distributed among available parallel instances of the next operator. Such a connection needs to be explicitly &ldquo;requested&rdquo; from Flink by using keyBy. Figure 4: HASHED message passing across operator instances (via \`keyBy\`) A REBALANCE distribution is either caused by an explicit call to rebalance() or by a change of parallelism (12 -&gt; 1 in the case of the job graph from Figure 2). Calling rebalance() causes data to be repartitioned in a round-robin fashion and can help to mitigate data skew in certain scenarios. Figure 5: REBALANCE message passing across operator instances The Fraud Detection job graph in Figure 2 contains an additional data source: Rules Source. It also consumes from Kafka. Rules are &ldquo;mixed into&rdquo; the main processing data flow through the BROADCAST channel. Unlike other methods of transmitting data between operators, such as forward, hash or rebalance that make each message available for processing in only one of the parallel instances of the receiving operator, broadcast makes each message available at the input of all of the parallel instances of the operator to which the broadcast stream is connected. This makes broadcast applicable to a wide range of tasks that need to affect the processing of all messages, regardless of their key or source partition.
Figure 6: BROADCAST message passing across operator instances Note There are actually a few more specialized data partitioning schemes in Flink which we did not mention here. If you want to find out more, please refer to Flink's documentation on __[stream partitioning](//nightlies.apache.org/flink/flink-docs-stable/dev/stream/operators/#physical-partitioning)__. Broadcast State Pattern # In order to make use of the Rules Source, we need to &ldquo;connect&rdquo; it to the main data stream:
// Streams setup DataStream&lt;Transaction&gt; transactions = [...] DataStream&lt;Rule&gt; rulesUpdateStream = [...] BroadcastStream&lt;Rule&gt; rulesStream = rulesUpdateStream.broadcast(RULES_STATE_DESCRIPTOR); // Processing pipeline setup DataStream&lt;Alert&gt; alerts = transactions .connect(rulesStream) .process(new DynamicKeyFunction()) .keyBy((keyed) -&gt; keyed.getKey()) .connect(rulesStream) .process(new DynamicAlertFunction()) As you can see, the broadcast stream can be created from any regular stream by calling the broadcast method and specifying a state descriptor. Flink assumes that broadcasted data needs to be stored and retrieved while processing events of the main data flow and, therefore, always automatically creates a corresponding broadcast state from this state descriptor. This is different from any other Apache Flink state type in which you need to initialize it in the open() method of the processing function. Also note that broadcast state always has a key-value format (MapState).
public static final MapStateDescriptor&lt;Integer, Rule&gt; RULES_STATE_DESCRIPTOR = new MapStateDescriptor&lt;&gt;(&#34;rules&#34;, Integer.class, Rule.class); Connecting to rulesStream causes some changes in the signature of the processing functions. The previous article presented it in a slightly simplified way as a ProcessFunction. However, DynamicKeyFunction is actually a BroadcastProcessFunction.
public abstract class BroadcastProcessFunction&lt;IN1, IN2, OUT&gt; { public abstract void processElement(IN1 value, ReadOnlyContext ctx, Collector&lt;OUT&gt; out) throws Exception; public abstract void processBroadcastElement(IN2 value, Context ctx, Collector&lt;OUT&gt; out) throws Exception; } The difference is the addition of the processBroadcastElement method through which messages of the rules stream will arrive. The following new version of DynamicKeyFunction allows modifying the list of data-distribution keys at runtime through this stream:
public class DynamicKeyFunction extends BroadcastProcessFunction&lt;Transaction, Rule, Keyed&lt;Transaction, String, Integer&gt;&gt; { @Override public void processBroadcastElement(Rule rule, Context ctx, Collector&lt;Keyed&lt;Transaction, String, Integer&gt;&gt; out) { BroadcastState&lt;Integer, Rule&gt; broadcastState = ctx.getBroadcastState(RULES_STATE_DESCRIPTOR); broadcastState.put(rule.getRuleId(), rule); } @Override public void processElement(Transaction event, ReadOnlyContext ctx, Collector&lt;Keyed&lt;Transaction, String, Integer&gt;&gt; out){ ReadOnlyBroadcastState&lt;Integer, Rule&gt; rulesState = ctx.getBroadcastState(RULES_STATE_DESCRIPTOR); for (Map.Entry&lt;Integer, Rule&gt; entry : rulesState.immutableEntries()) { final Rule rule = entry.getValue(); out.collect( new Keyed&lt;&gt;( event, KeysExtractor.getKey(rule.getGroupingKeyNames(), event), rule.getRuleId())); } } } In the above code, processElement() receives Transactions, and processBroadcastElement() receives Rule updates. When a new rule is created, it is distributed as depicted in Figure 6 and saved in all parallel instances of the operator in processBroadcastElement(). We use a Rule&rsquo;s ID as the key to store and reference individual rules. Instead of iterating over a hardcoded List&lt;Rule&gt;, we iterate over entries in the dynamically-updated broadcast state.
DynamicAlertFunction follows the same logic with respect to storing the rules in the broadcast MapState. As described in Part 1, each message in the processElement input is intended to be processed by one specific rule and comes &ldquo;pre-marked&rdquo; with a corresponding ID by DynamicKeyFunction. All we need to do is retrieve the definition of the corresponding rule from BroadcastState by using the provided ID and process it according to the logic required by that rule. At this stage, we will also add messages to the internal function state in order to perform calculations on the required time window of data. We will consider how this is done in the final blog of the series about Fraud Detection.
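A condensed sketch of that lookup is shown below; the getter names, the Rule fields and the simple threshold check are assumptions for illustration and stand in for the real windowed calculations:
// Simplified sketch of DynamicAlertFunction: look up the rule that pre-marked this record and apply it.
// Transaction, Rule, Alert and Keyed are the example project's types; their getters and the alert
// condition below are assumptions for illustration. RULES_STATE_DESCRIPTOR is defined as shown earlier.
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class DynamicAlertFunction
        extends KeyedBroadcastProcessFunction<String, Keyed<Transaction, String, Integer>, Rule, Alert> {

    @Override
    public void processBroadcastElement(Rule rule, Context ctx, Collector<Alert> out) throws Exception {
        // Same pattern as in DynamicKeyFunction: store or update the rule in broadcast state.
        ctx.getBroadcastState(RULES_STATE_DESCRIPTOR).put(rule.getRuleId(), rule);
    }

    @Override
    public void processElement(Keyed<Transaction, String, Integer> value,
                               ReadOnlyContext ctx,
                               Collector<Alert> out) throws Exception {
        // The record was "pre-marked" with a rule ID by DynamicKeyFunction; fetch that rule's definition.
        Rule rule = ctx.getBroadcastState(RULES_STATE_DESCRIPTOR).get(value.getId());
        if (rule == null) {
            return; // the rule may have been removed in the meantime
        }
        // The real implementation aggregates a time window of data here; a plain threshold check stands in.
        if (value.getWrapped().getPaymentAmount() > rule.getLimit()) {
            out.collect(new Alert(rule.getRuleId(), value.getKey(), value.getWrapped()));
        }
    }
}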
Summary # In this blog post, we continued our investigation of the use case of a Fraud Detection System built with Apache Flink. We looked into different ways in which data can be distributed between parallel operator instances and, most importantly, examined broadcast state. We demonstrated how dynamic partitioning — a pattern described in the first part of the series — can be combined and enhanced by the functionality provided by the broadcast state pattern. The ability to send dynamic updates at runtime is a powerful feature of Apache Flink that is applicable in a variety of other use cases, such as controlling state (cleanup/insert/fix), running A/B experiments or executing updates of ML model coefficients.
`}),e.add({id:167,href:"/2020/02/22/apache-beam-how-beam-runs-on-top-of-flink/",title:"Apache Beam: How Beam Runs on Top of Flink",section:"Flink Blog",content:`Note: This blog post is based on the talk &ldquo;Beam on Flink: How Does It Actually Work?&rdquo;.
Apache Flink and Apache Beam are open-source frameworks for parallel, distributed data processing at scale. Unlike Flink, Beam does not come with a full-blown execution engine of its own but plugs into other execution engines, such as Apache Flink, Apache Spark, or Google Cloud Dataflow. In this blog post we discuss the reasons to use Flink together with Beam for your batch and stream processing needs. We also take a closer look at how Beam works with Flink to provide an idea of the technical aspects of running Beam pipelines with Flink. We hope you find some useful information on how and why the two frameworks can be utilized in combination. For more information, you can refer to the corresponding documentation on the Beam website or contact the community through the Beam mailing list.
What is Apache Beam # Apache Beam is an open-source, unified model for defining batch and streaming data-parallel processing pipelines. It is unified in the sense that you use a single API, in contrast to using a separate API for batch and streaming, as is the case in Flink. Beam was originally developed by Google, which released it in 2014 as the Cloud Dataflow SDK. In 2016, it was donated to the Apache Software Foundation under the name Beam. It has been developed by the open-source community ever since. With Apache Beam, developers can write data processing jobs, also known as pipelines, in multiple languages, e.g. Java, Python, Go, SQL. A pipeline is then executed by one of Beam’s Runners. A Runner is responsible for translating Beam pipelines such that they can run on an execution engine. Every supported execution engine has a Runner. The following Runners are available: Apache Flink, Apache Spark, Apache Samza, Hazelcast Jet, Google Cloud Dataflow, and others.
The execution model, as well as the API, of Apache Beam is similar to Flink&rsquo;s. Both frameworks are inspired by the MapReduce, MillWheel, and Dataflow papers. Like Flink, Beam is designed for parallel, distributed data processing. Both have similar transformations, support for windowing, event/processing time, watermarks, timers, triggers, and much more. However, not being a full runtime, Beam focuses on providing the framework for building portable, multi-language batch and stream processing pipelines that can be run across several execution engines. The idea is that you write your pipeline once and feed it with either batch or streaming data. When you run it, you just pick one of the supported backends to execute. A large integration test suite in Beam called &ldquo;ValidatesRunner&rdquo; ensures that the results will be the same, regardless of which backend you choose for the execution.
One of the most exciting developments in the Beam technology is the framework’s support for multiple programming languages including Java, Python, Go, Scala and SQL. Essentially, developers can write their applications in a programming language of their choice. Beam, with the help of the Runners, translates the program to one of the execution engines, as shown in the diagram below.
Reasons to use Beam with Flink # Why would you want to use Beam with Flink instead of directly using Flink? Ultimately, Beam and Flink complement each other and provide additional value to the user. The main reasons for using Beam with Flink are the following:
Beam provides a unified API for both batch and streaming scenarios. Beam comes with native support for different programming languages, like Python or Go with all their libraries like Numpy, Pandas, Tensorflow, or TFX. You get the power of Apache Flink like its exactly-once semantics, strong memory management and robustness. Beam programs run on your existing Flink infrastructure or infrastructure for other supported Runners, like Spark or Google Cloud Dataflow. You get additional features like side inputs and cross-language pipelines that are not supported natively in Flink but only supported when using Beam with Flink. The Flink Runner in Beam # The Flink Runner in Beam translates Beam pipelines into Flink jobs. The translation can be parameterized using Beam&rsquo;s pipeline options which are parameters for settings like configuring the job name, parallelism, checkpointing, or metrics reporting.
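As a small illustration of those pipeline options (the concrete values are placeholders, not recommendations), selecting the Flink Runner and tuning a few settings from Java might look like this:
// Minimal sketch: configure Beam's Flink Runner through pipeline options; values are placeholders.
import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class FlinkRunnerOptionsExample {
    public static void main(String[] args) {
        FlinkPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().as(FlinkPipelineOptions.class);
        options.setRunner(FlinkRunner.class);       // execute on Flink instead of the direct runner
        options.setJobName("beam-on-flink");        // job name shown in the Flink WebUI
        options.setParallelism(4);                  // default parallelism of the translated Flink job
        options.setCheckpointingInterval(60_000L);  // enable Flink checkpointing every 60 seconds

        Pipeline pipeline = Pipeline.create(options);
        // ... apply transforms here and call pipeline.run() as usual
    }
}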
If you are familiar with a DataSet or a DataStream, you will have no problem understanding what a PCollection is. PCollection stands for parallel collection in Beam and is exactly what a DataSet/DataStream would be in Flink. Due to Beam&rsquo;s unified API, there is only one result type for transformations: PCollection.
Beam pipelines are composed of transforms. Transforms are like operators in Flink and come in two flavors: primitive and composite transforms. The beauty of all this is that Beam only comes with a small set of primitive transforms which are:
Source (for loading data) ParDo (think of a flat map operator on steroids) GroupByKey (think of keyBy() in Flink) AssignWindows (windows can be assigned at any point in time in Beam) Flatten (like a union() operation in Flink) Composite transforms are built by combining the above primitive transforms. For example, Combine = GroupByKey + ParDo.
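To make that mapping concrete, here is a tiny sketch (the data and the per-key sum are made up for illustration) that composes two of those primitives, GroupByKey and ParDo, much like keyBy() followed by a rich flatMap in Flink:
// Minimal sketch composing the primitive transforms GroupByKey and ParDo; data and logic are illustrative.
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.KvCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.GroupByKey;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;

public class PrimitiveTransformsExample {
    public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        pipeline
            .apply(Create.of(KV.of("user-a", 1), KV.of("user-a", 2), KV.of("user-b", 5))
                .withCoder(KvCoder.of(StringUtf8Coder.of(), VarIntCoder.of())))
            // GroupByKey ~ keyBy() in Flink: groups all values that share the same key.
            .apply(GroupByKey.<String, Integer>create())
            // ParDo ~ a rich flatMap in Flink: arbitrary per-element logic, here a per-key sum.
            .apply(ParDo.of(new DoFn<KV<String, Iterable<Integer>>, String>() {
                @ProcessElement
                public void processElement(ProcessContext ctx) {
                    int sum = 0;
                    for (Integer v : ctx.element().getValue()) {
                        sum += v;
                    }
                    ctx.output(ctx.element().getKey() + ": " + sum);
                }
            }));

        pipeline.run().waitUntilFinish();
    }
}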
Flink Runner Internals # Although using the Flink Runner in Beam does not require understanding its internals, we provide more details on how the Flink Runner works in Beam to share knowledge of how the two frameworks can integrate and work together to provide state-of-the-art streaming data pipelines.
The Flink Runner has two translation paths. Depending on whether we execute in batch or streaming mode, the Runner either translates into Flink&rsquo;s DataSet or into Flink&rsquo;s DataStream API. Since multi-language support has been added to Beam, another two translation paths have been added. To summarize the four modes:
The Classic Flink Runner for batch jobs: Executes batch Java pipelines The Classic Flink Runner for streaming jobs: Executes streaming Java pipelines The Portable Flink Runner for batch jobs: Executes Java as well as Python, Go and other supported SDK pipelines for batch scenarios The Portable Flink Runner for streaming jobs: Executes Java as well as Python, Go and other supported SDK pipelines for streaming scenarios The “Classic” Flink Runner in Beam # The classic Flink Runner was the initial version of the Runner, hence the &ldquo;classic&rdquo; name. Beam pipelines are represented as a graph in Java which is composed of the aforementioned composite and primitive transforms. Beam provides translators which traverse the graph in topological order. Topological order means that we start from all the sources first as we iterate through the graph. Presented with a transform from the graph, the Flink Runner generates the API calls as you would normally when writing a Flink job.
While Beam and Flink share very similar concepts, there are enough differences between the two frameworks that Beam pipelines cannot be translated 1:1 into a Flink program. In the following sections, we will present the key differences:
Serializers vs Coders # When data is transferred over the wire in Flink, it has to be turned into bytes. This is done with the help of serializers. Flink has a type system to instantiate the correct coder for a given type, e.g. StringTypeSerializer for a String. Apache Beam also has its own type system which is similar to Flink&rsquo;s but uses slightly different interfaces. Serializers are called Coders in Beam. In order to make a Beam Coder run in Flink, we have to make the two serializer types compatible. This is done by creating a special Flink type information that looks like the one in Flink but calls the appropriate Beam coder. That way, we can use Beam&rsquo;s coders although we are executing the Beam job with Flink. Flink operators expect a TypeInformation, e.g. StringTypeInformation, for which we use a CoderTypeInformation in Beam. The type information returns the serializer for which we return a CoderTypeSerializer, which calls the underlying Beam Coder.
Read # The Read transform provides a way to read data into your pipeline in Beam. The Read transform is supported by two wrappers in Beam, the SourceInputFormat for batch processing and the UnboundedSourceWrapper for stream processing.
ParDo # ParDo is the swiss army knife of Beam and can be compared to a RichFlatMapFunction in Flink with additional features such as SideInputs, SideOutputs, State and Timers. ParDo is essentially translated by the Flink runner using the FlinkDoFnFunction for batch processing or the FlinkStatefulDoFnFunction, while for streaming scenarios the translation is executed with the DoFnOperator that takes care of checkpointing and buffering of data during checkpoints, watermark emissions and maintenance of state and timers. This is all executed by Beam’s interface, called the DoFnRunner, that encapsulates Beam-specific execution logic, like retrieving state, executing state and timers, or reporting metrics.
Side Inputs # In addition to the main input, ParDo transforms can have a number of side inputs. A side input can be a static set of data that you want to have available at all parallel instances. However, it is more flexible than that. You can have keyed and even windowed side input which updates based on the window size. This is a very powerful concept which does not exist in Flink but is added on top of Flink using Beam.
AssignWindows # In Flink, windows are assigned by the WindowOperator when you use the window() in the API. In Beam, windows can be assigned at any point in time. Any element is implicitly part of a window. If no window is assigned explicitly, the element is part of the GlobalWindow. Window information is stored for each element in a wrapper called WindowedValue. The window information is only used once we issue a GroupByKey.
GroupByKey # Most of the time it is useful to partition the data by a key. In Flink, this is done via the keyBy() API call. In Beam the GroupByKey transform can only be applied if the input is of the form KV&lt;Key, Value&gt;. Unlike Flink where the key can even be nested inside the data, Beam enforces the key to always be explicit. The GroupByKey transform then groups the data by key and by window which is similar to what keyBy(..).window(..) would give us in Flink. Beam has its own set of libraries to do that because Beam has its own set of window functions and triggers. Essentially, GroupByKey is very similar to what the WindowOperator does in Flink.
Flatten # The Flatten operator takes multiple DataSet/DataStreams, called P[arallel]Collections in Beam, and combines them into one collection. This is equivalent to Flink&rsquo;s union() operation.
The “Portable” Flink Runner in Beam # The portable Flink Runner in Beam is the evolution of the classic Runner. Classic Runners are tied to the JVM ecosystem, but the Beam community wanted to move past this and also execute Python, Go and other languages. This adds another dimension to Beam in terms of portability because, like previously mentioned, Beam already had portability across execution engines. It was necessary to change the translation logic of the Runner to be able to support language portability.
There are two important building blocks for portable Runners:
A common pipeline format across all the languages: The Runner API A common interface during execution for the communication between the Runner and the code written in any language: The Fn API The Runner API provides a universal representation of the pipeline as Protobuf which contains the transforms, types, and user code. Protobuf was chosen as the format because every language has libraries available for it. Similarly, for the execution part, Beam introduced the Fn API interface to handle the communication between the Runner/execution engine and the user code that may be written in a different language and executes in a different process. Fn API is pronounced &ldquo;fun API&rdquo;, you may guess why.
How Are Beam Programs Translated In Language Portability? # Users write their Beam pipelines in one language, but they may get executed in an environment based on a completely different language. How does that work? To explain that, let&rsquo;s follow the lifecycle of a pipeline. Let&rsquo;s suppose we use the Python SDK to write the pipeline. Before submitting the pipeline via the Job API to Beam&rsquo;s JobServer, Beam would convert it to the Runner API, the language-agnostic format we described before. The JobServer is also a Beam component that handles the staging of the required dependencies during execution. The JobServer will then kick-off the translation which is similar to the classic Runner. However, an important change is the so-called ExecutableStage transform. It is essentially a ParDo transform that we already know but designed for holding language-dependent code. Beam tries to combine as many of these transforms into one &ldquo;executable stage&rdquo;. The result again is a Flink program which is then sent to the Flink cluster and executed there. The major difference compared to the classic Runner is that during execution we will start environments to execute the aforementioned ExecutableStages. The following environments are available:
Docker-based (the default) Process-based (a simple process is started) Externally-provided (K8s or other schedulers) Embedded (intended for testing and only works with Java) Environments hold the SDK Harness which is the code that handles the execution and the communication with the Runner over the Fn API. For example, when Flink executes Python code, it sends the data to the Python environment containing the Python SDK Harness. Sending data to an external process involves a minor overhead which we have measured to be 5-10% slower than the classic Java pipelines. However, Beam uses a fusion of transforms to execute as many transforms as possible in the same environment which share the same input or output. That&rsquo;s why in real-world scenarios the overhead could be much lower.
Environments can be present for many languages. This opens up an entirely new type of pipelines: cross-language pipelines. In cross-language pipelines we can combine transforms of two or more languages, e.g. a machine learning pipeline with the feature generation written in Java and the learning written in Python. All this can be run on top of Flink.
Conclusion # Using Apache Beam with Apache Flink combines (a.) the power of Flink with (b.) the flexibility of Beam. All it takes to run Beam is a Flink cluster, which you may already have. Apache Beam&rsquo;s fully-fledged Python API is probably the most compelling argument for using Beam with Flink, but the unified API, which allows you to &ldquo;write once&rdquo; and &ldquo;execute anywhere&rdquo;, is also very appealing to Beam users. On top of this, features like side inputs and a rich connector ecosystem are also reasons why people like Beam.
With the introduction of schemas, a new format for handling type information, Beam is heading in a similar direction as Flink with its type system, which is essential for the Table API or SQL. Speaking of which, the next Flink release will include a Python version of the Table API that is based on the language portability of Beam. Looking ahead, the Beam community plans to extend its support for interactive programs like notebooks. TFX, which is built with Beam, is a very powerful way to solve many problems around training and validating machine learning models.
For many years, Beam and Flink have inspired and learned from each other. With the Python support being based on Beam in Flink, they only seem to come closer to each other. That&rsquo;s all the better for the community, and also users have more options and functionality to choose from.
`}),e.add({id:168,href:"/2020/02/20/no-java-required-configuring-sources-and-sinks-in-sql/",title:"No Java Required: Configuring Sources and Sinks in SQL",section:"Flink Blog",content:` Introduction # The recent Apache Flink 1.10 release includes many exciting features. In particular, it marks the end of the community&rsquo;s year-long effort to merge in the Blink SQL contribution from Alibaba. The reason the community chose to spend so much time on the contribution is that SQL works. It allows Flink to offer a truly unified interface over batch and streaming and makes stream processing accessible to a broad audience of developers and analysts. Best of all, Flink SQL is ANSI-SQL compliant, which means if you&rsquo;ve ever used a database in the past, you already know it1!
A lot of work focused on improving runtime performance and progressively extending its coverage of the SQL standard. Flink now supports the full TPC-DS query set for batch queries, reflecting the readiness of its SQL engine to address the needs of modern data warehouse-like workloads. Its streaming SQL supports an almost equal set of features - those that are well defined on a streaming runtime - including complex joins and MATCH_RECOGNIZE.
As important as this work is, the community also strives to make these features generally accessible to the broadest audience possible. That is why the Flink community is excited in 1.10 to offer production-ready DDL syntax (e.g., CREATE TABLE, DROP TABLE) and a refactored catalog interface.
Accessing Your Data Where It Lives # Flink does not store data at rest; it is a compute engine that requires other systems to consume input from and write its output to. Those who have used Flink&rsquo;s DataStream API in the past will be familiar with connectors that allow for interacting with external systems. Flink has a vast connector ecosystem that includes all major message queues, filesystems, and databases.
If your favorite system does not have a connector maintained in the central Apache Flink repository, check out the flink packages website, which has a growing number of community-maintained components. While these connectors are battle-tested and production-ready, they are written in Java and configured in code, which means they are not amenable to pure SQL or Table applications. For a holistic SQL experience, not only the queries but also the table definitions need to be written in SQL.
CREATE TABLE Statements # While Flink SQL has long provided table abstractions atop some of Flink&rsquo;s most popular connectors, configurations were not always so straightforward. Beginning in 1.10, Flink supports defining tables through CREATE TABLE statements. With this feature, users can now create logical tables, backed by various external systems, in pure SQL.
By defining tables in SQL, developers can write queries against logical schemas that are abstracted away from the underlying physical data store. Coupled with Flink SQL&rsquo;s unified approach to batch and stream processing, Flink provides a straight line from discovery to production.
Users can define tables over static data sets, anything from a local CSV file to a full-fledged data lake or even Hive. Leveraging Flink&rsquo;s efficient batch processing capabilities, they can perform ad-hoc queries searching for exciting insights. Once something interesting is identified, businesses can gain real-time and continuous insights by merely altering the table so that it is powered by a message queue such as Kafka. Because Flink guarantees SQL queries have unified semantics over batch and streaming, users can be confident that redeploying this query as a continuous streaming application over a message queue will output identical results.
-- Define a table called orders that is backed by a Kafka topic -- The definition includes all relevant Kafka properties, -- the underlying format (JSON) and even defines a -- watermarking algorithm based on one of the fields -- so that this table can be used with event time. CREATE TABLE orders ( user_id BIGINT, product STRING, order_time TIMESTAMP(3), WATERMARK FOR order_time AS order_time - INTERVAL &#39;5&#39; SECOND ) WITH ( &#39;connector.type&#39; = &#39;kafka&#39;, &#39;connector.version&#39; = &#39;universal&#39;, &#39;connector.topic&#39; = &#39;orders&#39;, &#39;connector.startup-mode&#39; = &#39;earliest-offset&#39;, &#39;connector.properties.bootstrap.servers&#39; = &#39;localhost:9092&#39;, &#39;format.type&#39; = &#39;json&#39; ); -- Define a table called product_analysis -- on top of ElasticSearch 7 where we -- can write the results of our query. CREATE TABLE product_analysis ( product STRING, tracking_time TIMESTAMP(3), units_sold BIGINT ) WITH ( &#39;connector.type&#39; = &#39;elasticsearch&#39;, &#39;connector.version&#39; = &#39;7&#39;, &#39;connector.hosts&#39; = &#39;localhost:9200&#39;, &#39;connector.index&#39; = &#39;ProductAnalysis&#39;, &#39;connector.document.type&#39; = &#39;analysis&#39; ); -- A simple query that analyzes order data -- from Kafka and writes results into -- ElasticSearch. INSERT INTO product_analysis SELECT product, TUMBLE_START(order_time, INTERVAL &#39;1&#39; DAY) as tracking_time, COUNT(*) as units_sold FROM orders GROUP BY product, TUMBLE(order_time, INTERVAL &#39;1&#39; DAY); Catalogs # While being able to create tables is important, it often isn&rsquo;t enough. A business analyst, for example, shouldn&rsquo;t have to know what properties to set for Kafka, or even have to know what the underlying data source is, to be able to write a query.
To solve this problem, Flink 1.10 also ships with a revamped catalog system for managing metadata about tables and user-defined functions. With catalogs, users can create tables once and reuse them across Jobs and Sessions. Now, the team managing a data set can create a table and immediately make it accessible to other groups within their organization.
The most notable catalog that Flink integrates with today is Hive Metastore. The Hive catalog allows Flink to fully interoperate with Hive and serve as a more efficient query engine. Flink supports reading and writing Hive tables, using Hive UDFs, and even leveraging Hive&rsquo;s metastore catalog to persist Flink specific metadata.
Looking Ahead # Flink SQL has made enormous strides to democratize stream processing, and 1.10 marks a significant milestone in that development. However, we are not ones to rest on our laurels, and the community is committed to raising the bar on standards while lowering the barriers to entry. The community is looking to add more catalogs, such as JDBC and Apache Pulsar. We encourage you to sign up for the mailing list and stay on top of the announcements and new features in upcoming releases.
My colleague Timo, who has worked on Flink SQL from the beginning, has the entire SQL standard printed on his desk and references it before any changes are merged. It&rsquo;s enormous.
`}),e.add({id:169,href:"/2020/02/11/apache-flink-1.10.0-release-announcement/",title:"Apache Flink 1.10.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is excited to hit the double digits and announce the release of Flink 1.10.0! As a result of the biggest community effort to date, with over 1.2k issues implemented and more than 200 contributors, this release introduces significant improvements to the overall performance and stability of Flink jobs, a preview of native Kubernetes integration and great advances in Python support (PyFlink).
Flink 1.10 also marks the completion of the Blink integration, hardening streaming SQL and bringing mature batch processing to Flink with production-ready Hive integration and TPC-DS coverage. This blog post describes all major new features and improvements, important changes to be aware of and what to expect moving forward.
The binary distribution and source artifacts are now available on the updated Downloads page of the Flink website. For more details, check the complete release changelog and the updated documentation. We encourage you to download the release and share your feedback with the community through the Flink mailing lists or JIRA.
New Features and Improvements # Improved Memory Management and Configuration # The current TaskExecutor memory configuration in Flink has some shortcomings that make it hard to reason about or optimize resource utilization, such as:
Different configuration models for memory footprint in Streaming and Batch execution;
Complex and user-dependent configuration of off-heap state backends (i.e. RocksDB) in Streaming execution.
To make memory options more explicit and intuitive to users, Flink 1.10 introduces significant changes to the TaskExecutor memory model and configuration logic (FLIP-49). These changes make Flink more adaptable to all kinds of deployment environments (e.g. Kubernetes, Yarn, Mesos), giving users strict control over its memory consumption.
Managed Memory Extension
Managed memory was extended to also account for memory usage of RocksDBStateBackend. While batch jobs can use either on-heap or off-heap memory, streaming jobs with RocksDBStateBackend can use off-heap memory only. Therefore, to allow users to switch between Streaming and Batch execution without having to modify cluster configurations, managed memory is now always off-heap.
Simplified RocksDB Configuration
Configuring an off-heap state backend like RocksDB used to involve a good deal of manual tuning, like decreasing the JVM heap size or setting Flink to use off-heap memory. This can now be achieved through Flink&rsquo;s out-of-the-box configuration, and adjusting the memory budget for RocksDBStateBackend is as simple as resizing the managed memory size.
Another important improvement was to allow Flink to bind RocksDB native memory usage (FLINK-7289), preventing it from exceeding its total memory budget — this is especially relevant in containerized environments like Kubernetes. For details on how to enable and tune this feature, refer to Tuning RocksDB.
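As a rough illustration of how little tuning this now requires (the sizes below are arbitrary example values, not recommendations), a flink-conf.yaml for a RocksDB-backed streaming job can be as simple as:
# Illustrative flink-conf.yaml snippet; sizes are example values, not recommendations.
taskmanager.memory.process.size: 4096m      # total memory budget of the TaskManager process
taskmanager.memory.managed.fraction: 0.4    # share of Flink memory reserved as managed (off-heap) memory
state.backend: rocksdb
state.backend.rocksdb.memory.managed: true  # keep RocksDB within the managed memory budget (default in 1.10)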
Note FLIP-49 changes the process of cluster resource configuration, which may require tuning your clusters for upgrades from previous Flink versions. For a comprehensive overview of the changes introduced and tuning guidance, consult this setup.
Unified Logic for Job Submission # Prior to this release, job submission was part of the duties of the Execution Environments and closely tied to the different deployment targets (e.g. Yarn, Kubernetes, Mesos). This led to a poor separation of concerns and, over time, to a growing number of customized environments that users needed to configure and manage separately.
In Flink 1.10, job submission logic is abstracted into the generic Executor interface (FLIP-73). The addition of the ExecutorCLI (FLIP-81) introduces a unified way to specify configuration parameters for any execution target. To round out this effort, the process of result retrieval was also decoupled from job submission with the introduction of a JobClient (FLIP-74), responsible for fetching the JobExecutionResult.
In particular, these changes make it much easier to programmatically use Flink in downstream frameworks — for example, Apache Beam or Zeppelin interactive notebooks — by providing users with a unified entry point to Flink. For users working with Flink across multiple target environments, the transition to a configuration-based execution process also significantly reduces boilerplate code and maintenance overhead.
Native Kubernetes Integration (Beta) # For users looking to get started with Flink on a containerized environment, deploying and managing a standalone cluster on top of Kubernetes requires some upfront knowledge about containers, operators and environment-specific tools like kubectl.
In Flink 1.10, we rolled out the first phase of Active Kubernetes Integration (FLINK-9953) with support for session clusters (with per-job planned). In this context, “active” means that Flink’s ResourceManager (K8sResMngr) natively communicates with Kubernetes to allocate new pods on-demand, similar to Flink’s Yarn and Mesos integration. Users can also leverage namespaces to launch Flink clusters for multi-tenant environments with limited aggregate resource consumption. RBAC roles and service accounts with enough permission should be configured beforehand.
As introduced in Unified Logic For Job Submission, all command-line options in Flink 1.10 are mapped to a unified configuration. For this reason, users can simply refer to the Kubernetes config options and submit a job to an existing Flink session on Kubernetes in the CLI using:
./bin/flink run -d -e kubernetes-session -Dkubernetes.cluster-id=&lt;ClusterId&gt; examples/streaming/WindowJoin.jar If you want to try out this preview feature, we encourage you to walk through the Native Kubernetes setup, play around with it and share feedback with the community.
Table API/SQL: Production-ready Hive Integration # Hive integration was announced as a preview feature in Flink 1.9. This preview allowed users to persist Flink-specific metadata (e.g. Kafka tables) in Hive Metastore using SQL DDL, call UDFs defined in Hive and use Flink for reading and writing Hive tables. Flink 1.10 rounds up this effort with further developments that bring production-ready Hive integration to Flink with full compatibility of most Hive versions.
Native Partition Support for Batch SQL # So far, only writes to non-partitioned Hive tables were supported. In Flink 1.10, the Flink SQL syntax has been extended with INSERT OVERWRITE and PARTITION (FLIP-63), enabling users to write into both static and dynamic partitions in Hive.
Static Partition Writing
INSERT { INTO | OVERWRITE } TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement; Dynamic Partition Writing
INSERT { INTO | OVERWRITE } TABLE tablename1 select_statement1 FROM from_statement; Fully supporting partitioned tables allows users to take advantage of partition pruning on read, which significantly increases the performance of these operations by reducing the amount of data that needs to be scanned.
Further Optimizations # Besides partition pruning, Flink 1.10 introduces more read optimizations to Hive integration, such as:
Projection pushdown: Flink leverages projection pushdown to minimize data transfer between Flink and Hive tables by omitting unnecessary fields from table scans. This is especially beneficial for tables with a large number of columns.
LIMIT pushdown: for queries with the LIMIT clause, Flink will limit the number of output records wherever possible to minimize the amount of data transferred across the network.
ORC Vectorization on Read: to boost read performance for ORC files, Flink now uses the native ORC Vectorized Reader by default for Hive versions above 2.0.0 and columns with non-complex data types.
Pluggable Modules as Flink System Objects (Beta) # Flink 1.10 introduces a generic mechanism for pluggable modules in the Flink table core, with a first focus on system functions (FLIP-68). With modules, users can extend Flink’s system objects — for example use Hive built-in functions that behave like Flink system functions. This release ships with a pre-implemented HiveModule, supporting multiple Hive versions, but users are also given the possibility to write their own pluggable modules.
Other Improvements to the Table API/SQL # Watermarks and Computed Columns in SQL DDL # Flink 1.10 supports stream-specific syntax extensions to define time attributes and watermark generation in Flink SQL DDL (FLIP-66). This allows time-based operations, like windowing, and the definition of watermark strategies on tables created using DDL statements.
CREATE TABLE table_name ( WATERMARK FOR columnName AS &lt;watermark_strategy_expression&gt; ) WITH ( ... ) This release also introduces support for virtual computed columns (FLIP-70) that can be derived based on other columns in the same table or deterministic expressions (i.e. literal values, UDFs and built-in functions). In Flink, computed columns are useful to define time attributes upon table creation.
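For example (the table and column names are made up for illustration), a single DDL statement can combine a virtual computed column with a watermark strategy:
-- Illustrative only: a virtual computed column plus a watermark defined in DDL.
CREATE TABLE orders (
  order_id BIGINT,
  price DECIMAL(10, 2),
  quantity INT,
  order_total AS price * quantity,                              -- virtual computed column
  order_time TIMESTAMP(3),
  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND  -- event-time attribute with a 5 second watermark delay
) WITH (
  ...
);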
Additional Extensions to SQL DDL # There is now a clear distinction between temporary/persistent and system/catalog functions (FLIP-57). This not only eliminates ambiguity in function reference, but also allows for deterministic function resolution order (i.e. in case of naming collision, system functions will precede catalog functions, with temporary functions taking precedence over persistent functions for both dimensions).
Following the groundwork in FLIP-57, we extended the SQL DDL syntax to support the creation of catalog functions, temporary functions and temporary system functions (FLIP-79):
CREATE [TEMPORARY|TEMPORARY SYSTEM] FUNCTION [IF NOT EXISTS] [catalog_name.][db_name.]function_name AS identifier [LANGUAGE JAVA|SCALA] For a complete overview of the current state of DDL support in Flink SQL, check the updated documentation.
Note In order to correctly handle and guarantee a consistent behavior across meta-objects (tables, views, functions) in the future, some object declaration methods in the Table API have been deprecated in favor of methods that are closer to standard SQL DDL (FLIP-64).
Full TPC-DS Coverage for Batch # TPC-DS is a widely used industry-standard decision support benchmark to evaluate and measure the performance of SQL-based data processing engines. In Flink 1.10, all TPC-DS queries are supported end-to-end (FLINK-11491), reflecting the readiness of its SQL engine to address the needs of modern data warehouse-like workloads.
PyFlink: Support for Native User Defined Functions (UDFs) # A preview of PyFlink was introduced in the previous release, making headway towards the goal of full Python support in Flink. For this release, the focus was to enable users to register and use Python User-Defined Functions (UDF, with UDTF/UDAF planned) in the Table API/SQL (FLIP-58).
If you are interested in the underlying implementation — leveraging Apache Beam’s Portability Framework — refer to the “Architecture” section of FLIP-58 and also to FLIP-78. These efforts lay the required foundation for Pandas support and for PyFlink to eventually reach the DataStream API.
From Flink 1.10, users can also easily install PyFlink through pip using:
pip install apache-flink For a preview of other improvements planned for PyFlink, check FLINK-14500 and get involved in the discussion for requested user features.
Important Changes # [FLINK-10725] Flink can now be compiled and run on Java 11.
[FLINK-15495] The Blink planner is now the default in the SQL Client, so that users can benefit from all the latest features and improvements. Switching away from the old planner as the default in the Table API is also planned for the next release, so we recommend that users start getting familiar with the Blink planner; a short sketch of how to select it explicitly in the Table API follows this list of changes.
[FLINK-13025] There is a new Elasticsearch sink connector, fully supporting Elasticsearch 7.x versions.
[FLINK-15115] The connectors for Kafka 0.8 and 0.9 have been marked as deprecated and will no longer be actively supported. If you are still using these versions or have any other related concerns, please reach out to the @dev mailing list.
[FLINK-14516] The non-credit-based network flow control code was removed, along with the configuration option taskmanager.network.credit.model. Moving forward, Flink will always use credit-based flow control.
[FLINK-12122] FLIP-6 was rolled out with Flink 1.5.0 and introduced a code regression related to the way slots are allocated from TaskManagers. To use a scheduling strategy that is closer to the pre-FLIP behavior, where Flink tries to spread out the workload across all currently available TaskManagers, users can set cluster.evenly-spread-out-slots: true in the flink-conf.yaml.
[FLINK-11956] The s3-hadoop and s3-presto filesystems no longer use class relocations and should be loaded through plugins; they now seamlessly integrate with all credential providers. We strongly recommend using other filesystems only as plugins as well, since we will continue to remove relocations.
Flink 1.9 shipped with a refactored Web UI, with the legacy one being kept around as backup in case something wasn’t working as expected. No issues have been reported so far, so the community voted to drop the legacy Web UI in Flink 1.10.
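Picking up the Blink planner note above: for Table API programs that do not go through the SQL Client, the planner can be chosen explicitly. The following is a minimal sketch using the EnvironmentSettings-based setup available since Flink 1.9; it shows one common way to opt in and is not the only possible configuration.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class BlinkPlannerExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // explicitly opt in to the Blink planner for streaming programs
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();

        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env, settings);
        // register tables and run queries against tableEnv as usual
    }
}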
Release Notes # Please review the release notes carefully for a detailed list of changes and new features if you plan to upgrade your setup to Flink 1.10. This version is API-compatible with previous 1.x releases for APIs annotated with the @Public annotation.
List of Contributors # The Apache Flink community would like to thank all contributors that have made this release possible:
Achyuth Samudrala, Aitozi, Alberto Romero, Alec.Ch, Aleksey Pak, Alexander Fedulov, Alice Yan, Aljoscha Krettek, Aloys, Andrey Zagrebin, Arvid Heise, Benchao Li, Benoit Hanotte, Benoît Paris, Bhagavan Das, Biao Liu, Chesnay Schepler, Congxian Qiu, Cyrille Chépélov, César Soto Valero, David Anderson, David Hrbacek, David Moravek, Dawid Wysakowicz, Dezhi Cai, Dian Fu, Dyana Rose, Eamon Taaffe, Fabian Hueske, Fawad Halim, Fokko Driesprong, Frey Gao, Gabor Gevay, Gao Yun, Gary Yao, GatsbyNewton, GitHub, Grebennikov Roman, GuoWei Ma, Gyula Fora, Haibo Sun, Hao Dang, Henvealf, Hongtao Zhang, HuangXingBo, Hwanju Kim, Igal Shilman, Jacob Sevart, Jark Wu, Jeff Martin, Jeff Yang, Jeff Zhang, Jiangjie (Becket) Qin, Jiayi, Jiayi Liao, Jincheng Sun, Jing Zhang, Jingsong Lee, JingsongLi, Joao Boto, John Lonergan, Kaibo Zhou, Konstantin Knauf, Kostas Kloudas, Kurt Young, Leonard Xu, Ling Wang, Lining Jing, Liupengcheng, LouisXu, Mads Chr. Olesen, Marco Zühlke, Marcos Klein, Matyas Orhidi, Maximilian Bode, Maximilian Michels, Nick Pavlakis, Nico Kruber, Nicolas Deslandes, Pablo Valtuille, Paul Lam, Paul Lin, PengFei Li, Piotr Nowojski, Piotr Przybylski, Piyush Narang, Ricco Chen, Richard Deurwaarder, Robert Metzger, Roman, Roman Grebennikov, Roman Khachatryan, Rong Rong, Rui Li, Ryan Tao, Scott Kidder, Seth Wiesman, Shannon Carey, Shaobin.Ou, Shuo Cheng, Stefan Richter, Stephan Ewen, Steve OU, Steven Wu, Terry Wang, Thesharing, Thomas Weise, Till Rohrmann, Timo Walther, Tony Wei, TsReaper, Tzu-Li (Gordon) Tai, Victor Wong, WangHengwei, Wei Zhong, WeiZhong94, Wind (Jiayi Liao), Xintong Song, XuQianJin-Stars, Xuefu Zhang, Xupingyong, Yadong Xie, Yang Wang, Yangze Guo, Yikun Jiang, Ying, YngwieWang, Yu Li, Yuan Mei, Yun Gao, Yun Tang, Zhanchun Zhang, Zhenghua Gao, Zhijiang, Zhu Zhu, a-suiniaev, azagrebin, beyond1920, biao.liub, blueszheng, bowen.li, caoyingjie, catkint, chendonglin, chenqi, chunpinghe, cyq89051127, danrtsey.wy, dengziming, dianfu, eskabetxe, fanrui, forideal, gentlewang, godfrey he, godfreyhe, haodang, hehuiyuan, hequn8128, hpeter, huangxingbo, huzheng, ifndef-SleePy, jiemotongxue, joe, jrthe42, kevin.cyj, klion26, lamber-ken, libenchao, liketic, lincoln-lil, lining, liuyongvs, liyafan82, lz, mans2singh, mojo, openinx, ouyangwulin, shining-huang, shuai-xu, shuo.cs, stayhsfLee, sunhaibotb, sunjincheng121, tianboxiu, tianchen, tianchen92, tison, tszkitlo40, unknown, vinoyang, vthinkxie, wangpeibin, wangxiaowei, wangxiyuan, wangxlong, wangyang0918, whlwanghailong, xuchao0903, xuyang1706, yanghua, yangjf2019, yongqiang chai, yuzhao.cyz, zentol, zhangzhanchum, zhengcanbin, zhijiang, zhongyong jin, zhuzhu.zz, zjuwangg, zoudaokoulife, 砚田, 谢磊, 张志豪, 曹建华
`}),e.add({id:170,href:"/2020/02/03/a-guide-for-unit-testing-in-apache-flink/",title:"A Guide for Unit Testing in Apache Flink",section:"Flink Blog",content:`Writing unit tests is one of the essential tasks of designing a production-grade application. Without tests, a single change in code can result in cascades of failure in production. Thus unit tests should be written for all types of applications, be it a simple job cleaning data and training a model or a complex multi-tenant, real-time data processing system. In the following sections, we provide a guide for unit testing of Apache Flink applications. Apache Flink provides a robust unit testing framework to make sure your applications behave in production as expected during development. You need to include the following dependencies to utilize the provided framework.
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-test-utils_\${scala.binary.version}&lt;/artifactId&gt; &lt;version&gt;\${flink.version}&lt;/version&gt; &lt;scope&gt;test&lt;/scope&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-runtime_\${scala.binary.version}&lt;/artifactId&gt; &lt;version&gt;\${flink.version}&lt;/version&gt; &lt;scope&gt;test&lt;/scope&gt; &lt;classifier&gt;tests&lt;/classifier&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_\${scala.binary.version}&lt;/artifactId&gt; &lt;version&gt;\${flink.version}&lt;/version&gt; &lt;scope&gt;test&lt;/scope&gt; &lt;classifier&gt;tests&lt;/classifier&gt; &lt;/dependency&gt; The strategy of writing unit tests differs for various operators. You can break down the strategy into the following three buckets:
Stateless Operators Stateful Operators Timed Process Operators Stateless Operators # Writing unit tests for a stateless operator is a breeze. You need to follow the basic norm of writing a test case, i.e., create an instance of the function class and test the appropriate methods. Let’s take an example of a simple Map operator.
public class MyStatelessMap implements MapFunction&lt;String, String&gt; { @Override public String map(String in) throws Exception { String out = &#34;hello &#34; + in; return out; } } The test case for the above operator should look like
@Test public void testMap() throws Exception { MyStatelessMap statelessMap = new MyStatelessMap(); String out = statelessMap.map(&#34;world&#34;); Assert.assertEquals(&#34;hello world&#34;, out); } Pretty simple, right? Let’s take a look at one for the FlatMap operator.
public class MyStatelessFlatMap implements FlatMapFunction&lt;String, String&gt; { @Override public void flatMap(String in, Collector&lt;String&gt; collector) throws Exception { String out = &#34;hello &#34; + in; collector.collect(out); } } FlatMap operators require a Collector object along with the input. For the test case, we have two options:
Mock the Collector object using Mockito Use the ListCollector provided by Flink I prefer the second method, as it requires fewer lines of code and is suitable for most cases.
@Test public void testFlatMap() throws Exception { MyStatelessFlatMap statelessFlatMap = new MyStatelessFlatMap(); List&lt;String&gt; out = new ArrayList&lt;&gt;(); ListCollector&lt;String&gt; listCollector = new ListCollector&lt;&gt;(out); statelessFlatMap.flatMap(&#34;world&#34;, listCollector); Assert.assertEquals(Lists.newArrayList(&#34;hello world&#34;), out); } Stateful Operators # Writing test cases for stateful operators requires more effort. You need to check whether the operator state is updated correctly and if it is cleaned up properly along with the output of the operator.
Let’s take an example of a stateful FlatMap function.
public class StatefulFlatMap extends RichFlatMapFunction&lt;String, String&gt; { ValueState&lt;String&gt; previousInput; @Override public void open(Configuration parameters) throws Exception { previousInput = getRuntimeContext().getState( new ValueStateDescriptor&lt;String&gt;(&#34;previousInput&#34;, Types.STRING)); } @Override public void flatMap(String in, Collector&lt;String&gt; collector) throws Exception { String out = &#34;hello &#34; + in; if(previousInput.value() != null){ out = out + &#34; &#34; + previousInput.value(); } previousInput.update(in); collector.collect(out); } } The intricate part of writing tests for the above class is to mock the configuration as well as the runtime context of the application. Flink provides TestHarness classes so that users don’t have to create the mock objects themselves. Using the KeyedOneInputStreamOperatorTestHarness, the test looks like:
import org.apache.flink.streaming.api.operators.StreamFlatMap; import org.apache.flink.streaming.runtime.streamrecord.StreamRecord; import org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness; import org.apache.flink.streaming.util.OneInputStreamOperatorTestHarness; @Test public void testFlatMap() throws Exception{ StatefulFlatMap statefulFlatMap = new StatefulFlatMap(); // OneInputStreamOperatorTestHarness takes the input and output types as type parameters OneInputStreamOperatorTestHarness&lt;String, String&gt; testHarness = // KeyedOneInputStreamOperatorTestHarness takes three arguments: // Flink operator object, key selector and key type new KeyedOneInputStreamOperatorTestHarness&lt;&gt;( new StreamFlatMap&lt;&gt;(statefulFlatMap), x -&gt; &#34;1&#34;, Types.STRING); testHarness.open(); // test first record testHarness.processElement(&#34;world&#34;, 10); ValueState&lt;String&gt; previousInput = statefulFlatMap.getRuntimeContext().getState( new ValueStateDescriptor&lt;&gt;(&#34;previousInput&#34;, Types.STRING)); String stateValue = previousInput.value(); Assert.assertEquals( Lists.newArrayList(new StreamRecord&lt;&gt;(&#34;hello world&#34;, 10)), testHarness.extractOutputStreamRecords()); Assert.assertEquals(&#34;world&#34;, stateValue); // test second record testHarness.processElement(&#34;parallel&#34;, 20); Assert.assertEquals( Lists.newArrayList( new StreamRecord&lt;&gt;(&#34;hello world&#34;, 10), new StreamRecord&lt;&gt;(&#34;hello parallel world&#34;, 20)), testHarness.extractOutputStreamRecords()); Assert.assertEquals(&#34;parallel&#34;, previousInput.value()); } The test harness provides many helper methods, three of which are being used here:
open: calls the open of the FlatMap function with relevant parameters. It also initializes the context. processElement: allows users to pass an input element as well as the timestamp associated with the element. extractOutputStreamRecords: gets the output records along with their timestamps from the Collector. The test harness greatly simplifies unit testing of stateful functions.
You might also need to check whether the state value is being set correctly. You can get the state value directly from the operator using a mechanism similar to the one used while creating the state. This is also demonstrated in the previous example.
Timed Process Operators # Writing tests for process functions that work with time is quite similar to writing tests for stateful functions because you can also use the test harness. However, you need to take care of another aspect: providing timestamps for events and controlling the current time of the application. By setting the current (processing or event) time, you can trigger registered timers, which will call the onTimer method of the function:
public class MyProcessFunction extends KeyedProcessFunction&lt;String, String, String&gt; { @Override public void processElement(String in, Context context, Collector&lt;String&gt; collector) throws Exception { context.timerService().registerProcessingTimeTimer(50); String out = &#34;hello &#34; + in; collector.collect(out); } @Override public void onTimer(long timestamp, OnTimerContext ctx, Collector&lt;String&gt; out) throws Exception { out.collect(String.format(&#34;Timer triggered at timestamp %d&#34;, timestamp)); } } We need to test both the methods in the KeyedProcessFunction, i.e., processElement as well as onTimer. Using a test harness, we can control the current time of the function. Thus, we can trigger the timer at will rather than waiting for a specific time.
Let’s take a look at the test case
@Test public void testProcessElement() throws Exception{ MyProcessFunction myProcessFunction = new MyProcessFunction(); OneInputStreamOperatorTestHarness&lt;String, String&gt; testHarness = new KeyedOneInputStreamOperatorTestHarness&lt;&gt;( new KeyedProcessOperator&lt;&gt;(myProcessFunction), x -&gt; &#34;1&#34;, Types.STRING); // Function time is initialized to 0 testHarness.open(); testHarness.processElement(&#34;world&#34;, 10); Assert.assertEquals( Lists.newArrayList(new StreamRecord&lt;&gt;(&#34;hello world&#34;, 10)), testHarness.extractOutputStreamRecords()); } @Test public void testOnTimer() throws Exception { MyProcessFunction myProcessFunction = new MyProcessFunction(); OneInputStreamOperatorTestHarness&lt;String, String&gt; testHarness = new KeyedOneInputStreamOperatorTestHarness&lt;&gt;( new KeyedProcessOperator&lt;&gt;(myProcessFunction), x -&gt; &#34;1&#34;, Types.STRING); testHarness.open(); testHarness.processElement(&#34;world&#34;, 10); Assert.assertEquals(1, testHarness.numProcessingTimeTimers()); // Function time is set to 50 testHarness.setProcessingTime(50); Assert.assertEquals( Lists.newArrayList( new StreamRecord&lt;&gt;(&#34;hello world&#34;, 10), new StreamRecord&lt;&gt;(&#34;Timer triggered at timestamp 50&#34;)), testHarness.extractOutputStreamRecords()); } The mechanism to test the multi-input stream operators such as CoProcess functions is similar to the ones described in this article. You should use the TwoInput variant of the harness for these operators, such as TwoInputStreamOperatorTestHarness.
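As a rough sketch of what a two-input test could look like, the example below defines a trivial KeyedCoProcessFunction purely for illustration; the harness and operator classes come from Flink's test utilities, and their exact constructors may differ slightly between versions.

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.functions.co.KeyedCoProcessFunction;
import org.apache.flink.streaming.api.operators.co.KeyedCoProcessOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
import org.apache.flink.streaming.util.KeyedTwoInputStreamOperatorTestHarness;
import org.apache.flink.util.Collector;
import org.junit.Assert;
import org.junit.Test;

public class MyCoProcessFunctionTest {

    // a tiny illustrative function: prefixes records from each input
    static class MyCoProcessFunction
            extends KeyedCoProcessFunction<String, String, String, String> {
        @Override
        public void processElement1(String in, Context ctx, Collector<String> out) {
            out.collect("left: " + in);
        }
        @Override
        public void processElement2(String in, Context ctx, Collector<String> out) {
            out.collect("right: " + in);
        }
    }

    @Test
    public void testCoProcessFunction() throws Exception {
        KeyedTwoInputStreamOperatorTestHarness<String, String, String, String> testHarness =
            new KeyedTwoInputStreamOperatorTestHarness<>(
                new KeyedCoProcessOperator<>(new MyCoProcessFunction()),
                x -> "1",          // key selector for the first input
                x -> "1",          // key selector for the second input
                Types.STRING);

        testHarness.open();

        // push one record into each of the two inputs, with timestamps
        testHarness.processElement1(new StreamRecord<>("world", 10));
        testHarness.processElement2(new StreamRecord<>("parallel", 20));

        // one output record per input record in this illustrative function
        Assert.assertEquals(2, testHarness.extractOutputStreamRecords().size());
    }
}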
Summary # In the previous sections, we showcased how unit testing in Apache Flink works for stateless, stateful and time-aware operators. We hope you found the steps easy to follow and execute while developing your Flink applications. If you have any questions or feedback, you can reach out to me here or contact the community on the Apache Flink user mailing list.
`}),e.add({id:171,href:"/2020/01/30/apache-flink-1.9.2-released/",title:"Apache Flink 1.9.2 Released",section:"Flink Blog",content:`The Apache Flink community released the second bugfix version of the Apache Flink 1.9 series.
This release includes 117 fixes and minor improvements for Flink 1.9.1. A detailed list of all fixes and improvements can be found below.
We highly recommend that all users upgrade to Flink 1.9.2.
Updated Maven dependencies:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.9.2&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.9.2&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.9.2&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-12122] - Spread out tasks evenly across all available registered TaskManagers [FLINK-13360] - Add documentation for HBase connector for Table API &amp; SQL [FLINK-13361] - Add documentation for JDBC connector for Table API &amp; SQL [FLINK-13723] - Use liquid-c for faster doc generation [FLINK-13724] - Remove unnecessary whitespace from the docs&#39; sidenav [FLINK-13725] - Use sassc for faster doc generation [FLINK-13726] - Build docs with jekyll 4.0.0.pre.beta1 [FLINK-13791] - Speed up sidenav by using group_by [FLINK-13817] - Expose whether web submissions are enabled [FLINK-13818] - Check whether web submission are enabled [FLINK-14535] - Cast exception is thrown when count distinct on decimal fields [FLINK-14735] - Improve batch schedule check input consumable performance Bug [FLINK-10377] - Remove precondition in TwoPhaseCommitSinkFunction.notifyCheckpointComplete [FLINK-10435] - Client sporadically hangs after Ctrl + C [FLINK-11120] - TIMESTAMPADD function handles TIME incorrectly [FLINK-11835] - ZooKeeperLeaderElectionITCase.testJobExecutionOnClusterWithLeaderChange failed [FLINK-12342] - Yarn Resource Manager Acquires Too Many Containers [FLINK-12399] - FilterableTableSource does not use filters on job run [FLINK-13184] - Starting a TaskExecutor blocks the YarnResourceManager&#39;s main thread [FLINK-13589] - DelimitedInputFormat index error on multi-byte delimiters with whole file input splits [FLINK-13702] - BaseMapSerializerTest.testDuplicate fails on Travis [FLINK-13708] - Transformations should be cleared because a table environment could execute multiple job [FLINK-13740] - TableAggregateITCase.testNonkeyedFlatAggregate failed on Travis [FLINK-13749] - Make Flink client respect classloading policy [FLINK-13758] - Failed to submit JobGraph when registered hdfs file in DistributedCache [FLINK-13799] - Web Job Submit Page displays stream of error message when web submit is disables in the config [FLINK-13827] - Shell variable should be escaped in start-scala-shell.sh [FLINK-13862] - Update Execution Plan docs [FLINK-13945] - Instructions for building flink-shaded against vendor repository don&#39;t work [FLINK-13969] - Resuming Externalized Checkpoint (rocks, incremental, scale down) end-to-end test fails on Travis [FLINK-13995] - Fix shading of the licence information of netty [FLINK-13999] - Correct the documentation of MATCH_RECOGNIZE [FLINK-14066] - Pyflink building failure in master and 1.9.0 version [FLINK-14074] - MesosResourceManager can&#39;t create new taskmanagers in Session Cluster Mode. 
[FLINK-14175] - Upgrade KPL version in flink-connector-kinesis to fix application OOM [FLINK-14200] - Temporal Table Function Joins do not work on Tables (only TableSources) on the query side [FLINK-14235] - Kafka010ProducerITCase&gt;KafkaProducerTestBase.testOneToOneAtLeastOnceCustomOperator fails on travis [FLINK-14315] - NPE with JobMaster.disconnectTaskManager [FLINK-14337] - HistoryServer does not handle NPE on corruped archives properly [FLINK-14347] - YARNSessionFIFOITCase.checkForProhibitedLogContents found a log with prohibited string [FLINK-14355] - Example code in state processor API docs doesn&#39;t compile [FLINK-14370] - KafkaProducerAtLeastOnceITCase&gt;KafkaProducerTestBase.testOneToOneAtLeastOnceRegularSink fails on Travis [FLINK-14382] - Incorrect handling of FLINK_PLUGINS_DIR on Yarn [FLINK-14398] - Further split input unboxing code into separate methods [FLINK-14413] - Shade-plugin ApacheNoticeResourceTransformer uses platform-dependent encoding [FLINK-14434] - Dispatcher#createJobManagerRunner should not start JobManagerRunner [FLINK-14445] - Python module build failed when making sdist [FLINK-14447] - Network metrics doc table render confusion [FLINK-14459] - Python module build hangs [FLINK-14524] - PostgreSQL JDBC sink generates invalid SQL in upsert mode [FLINK-14547] - UDF cannot be in the join condition in blink planner [FLINK-14561] - Don&#39;t write FLINK_PLUGINS_DIR ENV variable to Flink configuration [FLINK-14562] - RMQSource leaves idle consumer after closing [FLINK-14574] - flink-s3-fs-hadoop doesn&#39;t work with plugins mechanism [FLINK-14589] - Redundant slot requests with the same AllocationID leads to inconsistent slot table [FLINK-14641] - Fix description of metric \`fullRestarts\` [FLINK-14673] - Shouldn&#39;t expect HMS client to throw NoSuchObjectException for non-existing function [FLINK-14683] - RemoteStreamEnvironment&#39;s construction function has a wrong method [FLINK-14701] - Slot leaks if SharedSlotOversubscribedException happens [FLINK-14784] - CsvTableSink miss delimiter when row start with null member [FLINK-14817] - &quot;Streaming Aggregation&quot; document contains misleading code examples [FLINK-14846] - Correct the default writerbuffer size documentation of RocksDB [FLINK-14910] - DisableAutoGeneratedUIDs fails on keyBy [FLINK-14930] - OSS Filesystem Uses Wrong Shading Prefix [FLINK-14949] - Task cancellation can be stuck against out-of-thread error [FLINK-14951] - State TTL backend end-to-end test fail when taskManager has multiple slot [FLINK-14953] - Parquet table source should use schema type to build FilterPredicate [FLINK-14960] - Dependency shading of table modules test fails on Travis [FLINK-14976] - Cassandra Connector leaks Semaphore on Throwable; hangs on close [FLINK-15001] - The digest of sub-plan reuse should contain retraction traits for stream physical nodes [FLINK-15013] - Flink (on YARN) sometimes needs too many slots [FLINK-15030] - Potential deadlock for bounded blocking ResultPartition. 
[FLINK-15036] - Container startup error will be handled out side of the YarnResourceManager&#39;s main thread [FLINK-15063] - Input group and output group of the task metric are reversed [FLINK-15065] - RocksDB configurable options doc description error [FLINK-15076] - Source thread should be interrupted during the Task cancellation [FLINK-15234] - Hive table created from flink catalog table shouldn&#39;t have null properties in parameters [FLINK-15240] - is_generic key is missing for Flink table stored in HiveCatalog [FLINK-15259] - HiveInspector.toInspectors() should convert Flink constant to Hive constant [FLINK-15266] - NPE in blink planner code gen [FLINK-15361] - ParquetTableSource should pass predicate in projectFields [FLINK-15412] - LocalExecutorITCase#testParameterizedTypes failed in travis [FLINK-15413] - ScalarOperatorsTest failed in travis [FLINK-15418] - StreamExecMatchRule not set FlinkRelDistribution [FLINK-15421] - GroupAggsHandler throws java.time.LocalDateTime cannot be cast to java.sql.Timestamp [FLINK-15435] - ExecutionConfigTests.test_equals_and_hash in pyFlink fails when cpu core numbers is 6 [FLINK-15443] - Use JDBC connector write FLOAT value occur ClassCastException [FLINK-15478] - FROM_BASE64 code gen type wrong [FLINK-15489] - WebUI log refresh not working [FLINK-15522] - Misleading root cause exception when cancelling the job [FLINK-15523] - ConfigConstants generally excluded from japicmp [FLINK-15543] - Apache Camel not bundled but listed in flink-dist NOTICE [FLINK-15549] - Integer overflow in SpillingResettableMutableObjectIterator [FLINK-15577] - WindowAggregate RelNodes missing Window specs in digest [FLINK-15615] - Docs: wrong guarantees stated for the file sink Improvement [FLINK-11135] - Reorder Hadoop config loading in HadoopUtils [FLINK-12848] - Method equals() in RowTypeInfo should consider fieldsNames [FLINK-13729] - Update website generation dependencies [FLINK-14008] - Auto-generate binary licensing [FLINK-14104] - Bump Jackson to 2.10.1 [FLINK-14123] - Lower the default value of taskmanager.memory.fraction [FLINK-14206] - Let fullRestart metric count fine grained restarts as well [FLINK-14215] - Add Docs for TM and JM Environment Variable Setting [FLINK-14251] - Add FutureUtils#forward utility [FLINK-14334] - ElasticSearch docs refer to non-existent ExceptionUtils.containsThrowable [FLINK-14335] - ExampleIntegrationTest in testing docs is incorrect [FLINK-14408] - In OldPlanner, UDF open method can not be invoke when SQL is optimized [FLINK-14557] - Clean up the package of py4j [FLINK-14639] - Metrics User Scope docs refer to wrong class [FLINK-14646] - Check non-null for key in KeyGroupStreamPartitioner [FLINK-14825] - Rework state processor api documentation [FLINK-14995] - Kinesis NOTICE is incorrect [FLINK-15113] - fs.azure.account.key not hidden from global configuration [FLINK-15554] - Bump jetty-util-ajax to 9.3.24 [FLINK-15657] - Fix the python table api doc link in Python API tutorial [FLINK-15700] - Improve Python API Tutorial doc [FLINK-15726] - Fixing error message in StreamExecTableSourceScan `}),e.add({id:172,href:"/2020/01/29/state-unlocked-interacting-with-state-in-apache-flink/",title:"State Unlocked: Interacting with State in Apache Flink",section:"Flink Blog",content:` Introduction # With stateful stream-processing becoming the norm for complex event-driven applications and real-time analytics, Apache Flink is often the backbone for running business logic and managing an organization’s most valuable asset — its data — as 
application state in Flink.
In order to provide a state-of-the-art experience to Flink developers, the Apache Flink community makes significant efforts to provide the safety and future-proof guarantees organizations need while managing state in Flink. In particular, Flink developers should have sufficient means to access and modify their state, and bootstrapping state with existing data from external systems should be straightforward. These efforts span multiple Flink major releases and consist of the following:
Evolvable state schema in Apache Flink Flexibility in swapping state backends, and The State processor API, an offline tool to read, write and modify state in Flink This post discusses the community’s efforts related to state management in Flink, provides some practical examples of how the different features and APIs can be utilized and covers some future ideas for new and improved ways of managing state in Apache Flink.
Stream processing: What is State? # To set the tone for the remainder of the post, let us first try to explain the very definition of state in stream processing. When it comes to stateful stream processing, state comprises the information that an application or stream processing engine will remember across events and streams as more realtime (unbounded) and/or offline (bounded) data flows through the system. Even seemingly trivial applications are inherently stateful; take the example of a simple COUNT operation: to count up to 10, you essentially need to remember that you have already counted up to 9.
To better understand how Flink manages state, one can think of Flink as having a three-layered state abstraction, as illustrated in the diagram below.
On the top layer sits the Flink user code, for example, a KeyedProcessFunction that contains some value state. This is a simple variable that, by being registered as value state, is automatically made fault-tolerant, re-scalable and queryable by the runtime. These variables are backed by the configured state backend that sits either on-heap or on-disk (RocksDB State Backend) and provides data locality, proximity to the computation and speed when it comes to per-record computations. Finally, when it comes to upgrades, the introduction of new features or bug fixes, and in order to keep your existing state intact, this is where savepoints come in.
A savepoint is a snapshot of the distributed, global state of an application at a logical point-in-time and is stored in an external distributed file system or blob storage such as HDFS or S3. Upon upgrading an application or implementing a code change, such as adding a new operator or changing a field, the Flink job can restart by re-loading the application state from the savepoint into the state backend, making it local and available for the computation, and continue processing as if nothing had ever happened.
It is important to remember here that state is one of the most valuable components of a Flink application carrying all the information about both where you are now and where you are going. State is among the most long-lived components in a Flink service since it can be carried across jobs, operators, configurations, new features and bug fixes. Schema Evolution with Apache Flink # In the previous section, we explained how state is stored and persisted in a Flink application. Let’s now take a look at what happens when evolving state in a stateful Flink streaming application becomes necessary.
Imagine an Apache Flink application that implements a KeyedProcessFunction and contains some ValueState. As illustrated below, within the state descriptor, when registering the type, Flink users specify their TypeInformation that informs Flink about how to serialize the bytes and represents Flink’s internal type system, used to serialize data when shipped across the network or stored in state backends. Flink’s type system has built-in support for all the basic types such as longs, strings, doubles, arrays and basic collection types like lists and maps. Additionally, Flink supports most of the major composite types including Tuples, POJOs, Scala Case Classes and Apache AvroⓇ. Finally, if an application’s type does not match any of the above, developers can either plug in their own serializer or Flink will then fall back to Kryo.
State registration with built-in serialization in Apache Flink # public class MyFunction extends KeyedProcessFunction&lt;Key, Input, Output&gt; { ​ private transient ValueState&lt;MyState&gt; valueState; ​ public void open(Configuration parameters) { ValueStateDescriptor&lt;MyState&gt; descriptor = new ValueStateDescriptor&lt;&gt;(&#34;my-state&#34;, TypeInformation.of(MyState.class)); ​ valueState = getRuntimeContext().getState(descriptor); } } Typically, evolving the schema of an application’s state happens because of some business logic change (adding or dropping fields or changing data types). In all cases, the schema is determined by means of its serializer, and can be thought of in terms of an alter table statement when compared with a database. When a state variable is first introduced it is like running a CREATE_TABLE command, there is a lot of freedom with its execution. However, having data in that table (registered rows) limits developers in what they can do and what rules they follow in order to make updates or changes by an ALTER_TABLE statement. Schema migration in Apache Flink follows a similar principle since the framework is essentially running an ALTER_TABLE statement across savepoints.
Flink 1.8 comes with built-in support for Apache Avro (specifically the 1.7.7 specification) and evolves state schema according to Avro specifications by adding and removing types or even by swapping between generic and specific Avro record types.
In Flink 1.9 the community added support for schema evolution for POJOs, including the ability to remove existing fields from POJO types or add new fields. POJO schema evolution tends to be less flexible than Avro's, since it is not possible to change either the declared field types or the class name of a POJO type, including its namespace.
With the community’s efforts related to schema evolution, Flink developers can now expect out-of-the-box support for both Avro and POJO formats, with backwards compatibility for all Flink state backends. Future work revolves around adding support for Scala Case Classes, Tuples and other formats. Make sure to subscribe to the Flink mailing list to contribute and stay on top of any upcoming additions in this space.
Peeking Under the Hood # Now that we have explained how schema evolution in Flink works, let’s describe the challenges of performing schema serialization with Flink under the hood. Flink considers state as a core part of its API stability, in a way that developers should always be able to take a savepoint from one version of Flink and restart it on the next. With schema evolution, every migration needs to be backwards compatible and also compatible with the different state backends. While in the Flink code the state backends are represented as interfaces detailing how to store and retrieve bytes, in practice, they behave vastly differently, something that adds extra complexity to how schema evolution is executed in Flink.
For instance, the heap state backend supports lazy serialization and eager deserialization, meaning that the per-record code path always works with Java objects and serialization happens on a background thread. When restoring, Flink will eagerly deserialize all the data and then start the user code. If a developer plugs in a new serializer, the deserialization happens before Flink ever receives the information.
The RocksDB state backend behaves in the exact opposite manner: it supports eager serialization, because items are stored on disk and RocksDB only consumes byte arrays. RocksDB provides lazy deserialization simply by downloading files to the local disk, making Flink unaware of what the bytes mean until a serializer is registered.
An additional challenge stems from the fact that different versions of user code contain different classes on their classpath, potentially making the serializer that was used to write a savepoint unavailable at runtime.
To overcome the previously mentioned challenges, we introduced what we call the TypeSerializerSnapshot. The TypeSerializerSnapshot stores the configuration of the writer serializer in the snapshot. When restoring, Flink will use that configuration to read back the previous state and check its compatibility with the current version. This allows Flink to:
Read the configuration used to write out a snapshot Consume the new user code Check if both items above are compatible Consume the bytes from the snapshot and move forward or alert the user otherwise public interface TypeSerializerSnapshot&lt;T&gt; { ​ int getCurrentVersion(); ​ void writeSnapshot(DataOutputView out) throws IOException; ​ void readSnapshot( int readVersion, DataInputView in, ClassLoader userCodeClassLoader) throws IOException; ​ TypeSerializer&lt;T&gt; restoreSerializer(); ​ TypeSerializerSchemaCompatibility&lt;T&gt; resolveSchemaCompatibility( TypeSerializer&lt;T&gt; newSerializer); } Implementing Apache Avro Serialization in Flink # Apache Avro is a data serialization format that has very well-defined schema migration semantics and supports both reader and writer schemas. During normal Flink execution the reader and writer schemas will be the same. However, when upgrading an application they may be different and with schema evolution, Flink will be able to migrate objects with their schemas.
public class AvroSerializerSnapshot&lt;T&gt; implements TypeSerializerSnapshot&lt;T&gt; { private Schema runtimeSchema; private Schema previousSchema; ​ @SuppressWarnings(&#34;WeakerAccess&#34;) public AvroSerializerSnapshot() { } ​ AvroSerializerSnapshot(Schema schema) { this.runtimeSchema = schema; } This is a sketch of our Avro serializer. It uses the provided schemas and delegates to Apache Avro for all (de)-serialization. Let’s take a look at one possible implementation of a TypeSerializerSnapshot that supports schema migration for Avro.
Writing out the snapshot # When serializing out the snapshot, the snapshot configuration will write two pieces of information: the current snapshot configuration version and the serializer configuration.
@Override public int getCurrentVersion() { return 1; } ​ @Override public void writeSnapshot(DataOutputView out) throws IOException { out.writeUTF(runtimeSchema.toString(false)); } The version is used to version the snapshot configuration object itself while the writeSnapshot method writes out all the information we need to understand the current format; the runtime schema.
@Override public void readSnapshot( int readVersion, DataInputView in, ClassLoader userCodeClassLoader) throws IOException { assert readVersion == 1; final String previousSchemaDefinition = in.readUTF(); this.previousSchema = parseAvroSchema(previousSchemaDefinition); this.runtimeType = findClassOrFallbackToGeneric( userCodeClassLoader, previousSchema.getFullName()); ​ this.runtimeSchema = tryExtractAvroSchema(userCodeClassLoader, runtimeType); } Now when Flink restores it is able to read back in the writer schema used to serialize the data. The current runtime schema is discovered on the class path using some Java reflection magic.
Once we have both of these we can compare them for compatibility. Perhaps nothing has changed and the schemas are compatible as is.
@Override public TypeSerializerSchemaCompatibility&lt;T&gt; resolveSchemaCompatibility( TypeSerializer&lt;T&gt; newSerializer) { ​ if (!(newSerializer instanceof AvroSerializer)) { return TypeSerializerSchemaCompatibility.incompatible(); } ​ if (Objects.equals(previousSchema, runtimeSchema)) { return TypeSerializerSchemaCompatibility.compatibleAsIs(); } Otherwise, the schemas are compared using Avro’s compatibility checks and they may either be compatible with a migration or incompatible.
final SchemaPairCompatibility compatibility = SchemaCompatibility .checkReaderWriterCompatibility(previousSchema, runtimeSchema); ​ return avroCompatibilityToFlinkCompatibility(compatibility); } If they are compatible with migration then Flink will restore a new serializer that can read the old schema and deserialize into the new runtime type which is in effect a migration.
@Override public TypeSerializer&lt;T&gt; restoreSerializer() { if (previousSchema != null) { return new AvroSerializer&lt;&gt;(runtimeType, runtimeSchema, previousSchema); } else { return new AvroSerializer&lt;&gt;(runtimeType, runtimeSchema, runtimeSchema); } } } The State Processor API: Reading, writing and modifying Flink state # The State Processor API allows reading from and writing to Flink savepoints. Some of the interesting use cases it can be used for are:
Analyzing state for interesting patterns Troubleshooting or auditing jobs by checking for state discrepancies Bootstrapping state for new applications Modifying savepoints such as: Changing the maximum parallelism of a savepoint after deploying a Flink job Introducing breaking schema updates to a Flink application Correcting invalid state in a Flink savepoint In a previous blog post, we discussed the State Processor API in detail, the community’s motivation behind introducing the feature in Flink 1.9, what you can use the API for and how you can use it. Essentially, the State Processor API is based around a relational model of mapping your Flink job state to a database, as illustrated in the diagram below. We encourage you to read the previous story for more information on the API and how to use it. In a follow up post, we will provide detailed tutorials on:
Reading Keyed and Operator State with the State Processor API and Writing and Bootstrapping Keyed and Operator State with the State Processor API Stay tuned for more details and guidance around this feature of Flink.
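Until those tutorials are available, the sketch below gives a rough idea of reading keyed state from a savepoint with the Flink 1.9 State Processor API; the savepoint path, operator uid and state name are placeholders, and the class names follow the API as documented at the time.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.util.Collector;

public class ReadKeyedStateSketch {

    // reads a hypothetical "previousInput" ValueState of a keyed operator
    static class ReaderFunction extends KeyedStateReaderFunction<String, String> {

        private ValueState<String> previousInput;

        @Override
        public void open(Configuration parameters) {
            previousInput = getRuntimeContext().getState(
                new ValueStateDescriptor<>("previousInput", Types.STRING));
        }

        @Override
        public void readKey(String key, Context ctx, Collector<String> out) throws Exception {
            // emit one record per key, pairing the key with its stored value
            out.collect(key + " -> " + previousInput.value());
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // savepoint path and operator uid are placeholders
        ExistingSavepoint savepoint =
            Savepoint.load(env, "hdfs:///savepoints/savepoint-abc", new MemoryStateBackend());

        DataSet<String> keyedState =
            savepoint.readKeyedState("my-keyed-operator-uid", new ReaderFunction());

        keyedState.print();
    }
}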
Looking ahead: More ways to interact with State in Flink # There is a lot of discussion happening in the community related to extending the way Flink developers interact with state in their Flink applications. Regarding the State Processor API, some thoughts revolve around further broadening the API’s scope beyond its current ability to read from and write to both keyed and operator state. In upcoming releases, the State processor API will be extended to support both reading from and writing to windows and have a first-class integration with Flink’s Table API and SQL.
Beyond widening the scope of the State Processor API, the Flink community is discussing a few additional ways to improve how developers interact with state in Flink. One of them is the proposal for a Unified Savepoint Format (FLIP-41) for all keyed state backends. This proposal aims at introducing a unified binary format across all savepoints in all keyed state backends, which drastically reduces the overhead of swapping the state backend in a Flink application. Such an improvement would allow developers to take a savepoint in their application and restart it in a different state backend, for example moving it from the heap to disk (RocksDB state backend) and back, depending on the scalability and evolution of the application at different points in time.
The community is also discussing the ability to have upgradability dry runs in upcoming Flink releases. Having such functionality in Flink allows developers to detect incompatible updates offline without the need to start a new Flink job from scratch. For example, Flink users will be able to uncover topology or schema incompatibilities upon upgrading a Flink job, without having to load the state back into a running Flink job in the first place. Additionally, with upgradability dry runs, Flink users will be able to get information about the registered state through the streaming graph, without needing to access the state in the state backend.
With all the exciting new functionality added in Flink 1.9 as well as some solid ideas and discussions around bringing state in Flink to the next level, the community is committed to making state in Apache Flink a fundamental element of the framework, something that is ever-present across versions and upgrades of your application and a component that is a true first-class citizen in Apache Flink. We encourage you to sign up to the mailing list and stay on top of the announcements and new features in upcoming releases.
`}),e.add({id:173,href:"/2020/01/15/advanced-flink-application-patterns-vol.1-case-study-of-a-fraud-detection-system/",title:"Advanced Flink Application Patterns Vol.1: Case Study of a Fraud Detection System",section:"Flink Blog",content:`In this series of blog posts you will learn about three powerful Flink patterns for building streaming applications:
Dynamic updates of application logic Dynamic data partitioning (shuffle), controlled at runtime Low latency alerting based on custom windowing logic (without using the window API) These patterns expand the possibilities of what is achievable with statically defined data flows and provide the building blocks to fulfill complex business requirements.
Dynamic updates of application logic allow Flink jobs to change at runtime, without downtime from stopping and resubmitting the code.
Dynamic data partitioning provides the ability to change how events are distributed and grouped by Flink at runtime. Such functionality often becomes a natural requirement when building jobs with dynamically reconfigurable application logic.
Custom window management demonstrates how you can utilize the low-level process function API when the native window API does not exactly match your requirements. Specifically, you will learn how to implement low latency alerting on windows and how to limit state growth with timers.
These patterns build on top of core Flink functionality; however, they might not be immediately apparent from the framework&rsquo;s documentation, as explaining and presenting the motivation behind them is not always trivial without a concrete use case. That is why we will showcase these patterns with a practical example that offers a real-world usage scenario for Apache Flink: a Fraud Detection engine. We hope that this series will place these powerful approaches into your tool belt and enable you to take on new and exciting tasks.
In the first blog post of the series we will look at the high-level architecture of the demo application, describe its components and their interactions. We will then deep dive into the implementation details of the first pattern in the series - dynamic data partitioning.
You will be able to run the full Fraud Detection Demo application locally and look into the details of the implementation by using the accompanying GitHub repository.
Fraud Detection Demo # The full source code for our fraud detection demo is open source and available online. To run it locally, check out the following repository and follow the steps in the README:
https://github.com/afedulov/fraud-detection-demo
You will see that the demo is a self-contained application - it requires only docker and docker-compose to be built from sources - and it includes the following components:
Apache Kafka (message broker) with ZooKeeper Apache Flink (application cluster) Fraud Detection Web App The high-level goal of the Fraud Detection engine is to consume a stream of financial transactions and evaluate them against a set of rules. These rules are subject to frequent changes and tweaks. In a real production system, it is important to be able to add and remove them at runtime, without incurring an expensive penalty of stopping and restarting the job.
When you navigate to the demo URL in your browser, you will be presented with the following UI:
Figure 1: Fraud Detection Demo UI On the left side, you can see a visual representation of financial transactions flowing through the system after you click the &ldquo;Start&rdquo; button. The slider at the top allows you to control the number of generated transactions per second. The middle section is devoted to managing the rules evaluated by Flink. From here, you can create new rules as well as issue control commands, such as clearing Flink&rsquo;s state.
The demo out-of-the-box comes with a set of predefined sample rules. You can click the Start button and, after some time, will observe alerts displayed in the right section of the UI. These alerts are the result of Flink evaluating the generated transactions stream against the predefined rules.
Our sample fraud detection system consists of three main components:
Frontend (React) Backend (SpringBoot) Fraud Detection application (Apache Flink) Interactions between the main elements are depicted in Figure 2.
Figure 2: Fraud Detection Demo Components The Backend exposes a REST API to the Frontend for creating/deleting rules as well as issuing control commands for managing the demo execution. It then relays those Frontend actions to Flink by sending them via a &ldquo;Control&rdquo; Kafka topic. The Backend additionally includes a Transactions Generator component, which sends an emulated stream of money transfer events to Flink via a separate &ldquo;Transactions&rdquo; topic. Alerts generated by Flink are consumed by the Backend from &ldquo;Alerts&rdquo; topic and relayed to the UI via WebSockets.
Now that you are familiar with the overall layout and the goal of our Fraud Detection engine, let&rsquo;s go into the details of what is required to implement such a system.
Dynamic Data Partitioning # The first pattern we will look into is Dynamic Data Partitioning.
If you have used Flink&rsquo;s DataStream API in the past, you are undoubtedly familiar with the keyBy method. Keying a stream shuffles all the records such that elements with the same key are assigned to the same partition. This means all records with the same key are processed by the same physical instance of the next operator.
In a typical streaming application, the choice of key is fixed, determined by some static field within the elements. For instance, when building a simple window-based aggregation of a stream of transactions, we might always group by the transaction&rsquo;s account id.
DataStream&lt;Transaction&gt; input = // [...] DataStream&lt;...&gt; windowed = input .keyBy(Transaction::getAccountId) .window(/*window specification*/); This approach is the main building block for achieving horizontal scalability in a wide range of use cases. However, in the case of an application striving to provide flexibility in business logic at runtime, this is not enough. To understand why this is the case, let us start with articulating a realistic sample rule definition for our fraud detection system in the form of a functional requirement:
&ldquo;Whenever the sum of the accumulated payment amount from the same payer to the same beneficiary within the duration of a week is greater than 1 000 000 $ - fire an alert.&rdquo;
In this formulation we can spot a number of parameters that we would like to be able to specify in a newly-submitted rule and possibly even later modify or tweak at runtime:
Aggregation field (payment amount) Grouping fields (payer + beneficiary) Aggregation function (sum) Window duration (1 week) Limit (1 000 000) Limit operator (greater) Accordingly, we will use the following simple JSON format to define the aforementioned parameters:
{ &#34;ruleId&#34;: 1, &#34;ruleState&#34;: &#34;ACTIVE&#34;, &#34;groupingKeyNames&#34;: [&#34;payerId&#34;, &#34;beneficiaryId&#34;], &#34;aggregateFieldName&#34;: &#34;paymentAmount&#34;, &#34;aggregatorFunctionType&#34;: &#34;SUM&#34;, &#34;limitOperatorType&#34;: &#34;GREATER&#34;, &#34;limit&#34;: 1000000, &#34;windowMinutes&#34;: 10080 } At this point, it is important to understand that groupingKeyNames determine the actual physical grouping of events - all Transactions with the same values of specified parameters (e.g. payer #25 -&gt; beneficiary #12) have to be aggregated in the same physical instance of the evaluating operator. Naturally, the process of distributing data in such a way in Flink&rsquo;s API is realised by a keyBy() function.
Most examples in Flink&rsquo;s keyBy() documentation use a hard-coded KeySelector, which extracts specific fixed fields from the events. However, to support the desired flexibility, we have to extract the keys in a more dynamic fashion, based on the specifications of the rules. For this, we will have to use one additional operator that prepares every event for dispatching to the correct aggregating instance.
On a high level, our main processing pipeline looks like this:
DataStream&lt;Alert&gt; alerts = transactions .process(new DynamicKeyFunction()) .keyBy(/* some key selector */) .process(/* actual calculations and alerting */); We have previously established that each rule defines a groupingKeyNames parameter that specifies which combination of fields will be used for the incoming events&rsquo; grouping. Each rule might use an arbitrary combination of these fields. At the same time, every incoming event potentially needs to be evaluated against multiple rules. This implies that events might simultaneously need to be present at multiple parallel instances of evaluating operators that correspond to different rules and hence will need to be forked. Ensuring such event dispatching is the purpose of DynamicKeyFunction().
Figure 3: Forking events with Dynamic Key Function DynamicKeyFunction iterates over a set of defined rules and prepares every event to be processed by a keyBy() function by extracting the required grouping keys:
public class DynamicKeyFunction extends ProcessFunction&lt;Transaction, Keyed&lt;Transaction, String, Integer&gt;&gt; { ... /* Simplified */ List&lt;Rule&gt; rules = /* Rules that are initialized somehow. Details will be discussed in a future blog post. */; @Override public void processElement( Transaction event, Context ctx, Collector&lt;Keyed&lt;Transaction, String, Integer&gt;&gt; out) { for (Rule rule :rules) { out.collect( new Keyed&lt;&gt;( event, KeysExtractor.getKey(rule.getGroupingKeyNames(), event), rule.getRuleId())); } } ... } KeysExtractor.getKey() uses reflection to extract the required values of groupingKeyNames fields from events and combines them as a single concatenated String key, e.g &quot;{payerId=25;beneficiaryId=12}&quot;. Flink will calculate the hash of this key and assign the processing of this particular combination to a specific server in the cluster. This will allow tracking all transactions between payer #25 and beneficiary #12 and evaluating defined rules within the desired time window.
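The actual KeysExtractor lives in the demo repository; a simplified, illustrative sketch of what such a reflection-based extractor might look like is shown below (field access, error handling and key formatting are kept deliberately minimal and differ from the real implementation).

import java.lang.reflect.Field;
import java.util.List;

// Simplified, illustrative sketch of a reflection-based key extractor.
// The real KeysExtractor in the fraud-detection-demo repository differs in details.
public class KeysExtractor {

    // builds a key like "{payerId=25;beneficiaryId=12}" from the listed fields
    public static String getKey(List<String> keyNames, Object event)
            throws NoSuchFieldException, IllegalAccessException {
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < keyNames.size(); i++) {
            String fieldName = keyNames.get(i);
            Field field = event.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            sb.append(fieldName).append("=").append(field.get(event));
            if (i < keyNames.size() - 1) {
                sb.append(";");
            }
        }
        sb.append("}");
        return sb.toString();
    }
}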
Notice that a wrapper class Keyed with the following signature was introduced as the output type of DynamicKeyFunction:
public class Keyed&lt;IN, KEY, ID&gt; { private IN wrapped; private KEY key; private ID id; ... public KEY getKey(){ return key; } } Fields of this POJO carry the following information: wrapped is the original transaction event, key is the result of using KeysExtractor and id is the ID of the Rule that caused the dispatch of the event (according to the rule-specific grouping logic).
Events of this type will be the input to the keyBy() function in the main processing pipeline and allow the use of a simple lambda-expression as a KeySelector for the final step of implementing dynamic data shuffle.
DataStream&lt;Alert&gt; alerts = transactions .process(new DynamicKeyFunction()) .keyBy((keyed) -&gt; keyed.getKey()) .process(new DynamicAlertFunction()); By applying DynamicKeyFunction we are implicitly copying events for performing parallel per-rule evaluation within a Flink cluster. By doing so, we achieve an important property - horizontal scalability of rules&rsquo; processing. Our system will be capable of handling more rules by adding more servers to the cluster, i.e. increasing the parallelism. This property is achieved at the cost of data duplication, which might become an issue depending on the specific set of parameters, such as incoming data rate, available network bandwidth, event payload size etc. In a real-life scenario, additional optimizations can be applied, such as combined evaluation of rules which have the same groupingKeyNames, or a filtering layer, which would strip events of all the fields that are not required for processing of a particular rule.
Summary: # In this blog post, we have discussed the motivation behind supporting dynamic, runtime changes to a Flink application by looking at a sample use case - a Fraud Detection engine. We have described the overall architecture and interactions between its components as well as provided references for building and running a demo Fraud Detection application in a dockerized setup. We then showed the details of implementing a dynamic data partitioning pattern as the first underlying building block to enable flexible runtime configurations.
To remain focused on describing the core mechanics of the pattern, we kept the complexity of the DSL and the underlying rules engine to a minimum. Going forward, it is easy to imagine adding extensions such as allowing more sophisticated rule definitions, including filtering of certain events, logical rules chaining, and other more advanced functionality.
In the second part of this series, we will describe how the rules make their way into the running Fraud Detection engine. Additionally, we will go over the implementation details of the main processing function of the pipeline - DynamicAlertFunction().
Figure 4: End-to-end pipeline In the next article, we will see how Flink&rsquo;s broadcast streams can be utilized to help steer the processing within the Fraud Detection engine at runtime (Dynamic Application Updates pattern).
`}),e.add({id:174,href:"/2019/12/11/apache-flink-1.8.3-released/",title:"Apache Flink 1.8.3 Released",section:"Flink Blog",content:`The Apache Flink community released the third bugfix version of the Apache Flink 1.8 series.
This release includes 45 fixes and minor improvements for Flink 1.8.2. A detailed list of all fixes and improvements can be found below.
We highly recommend that all users upgrade to Flink 1.8.3.
Updated Maven dependencies:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.8.3&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.8.3&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.8.3&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-13723] - Use liquid-c for faster doc generation [FLINK-13724] - Remove unnecessary whitespace from the docs&#39; sidenav [FLINK-13725] - Use sassc for faster doc generation [FLINK-13726] - Build docs with jekyll 4.0.0.pre.beta1 [FLINK-13791] - Speed up sidenav by using group_by Bug [FLINK-12342] - Yarn Resource Manager Acquires Too Many Containers [FLINK-13184] - Starting a TaskExecutor blocks the YarnResourceManager&#39;s main thread [FLINK-13728] - Fix wrong closing tag order in sidenav [FLINK-13746] - Elasticsearch (v2.3.5) sink end-to-end test fails on Travis [FLINK-13749] - Make Flink client respect classloading policy [FLINK-13892] - HistoryServerTest failed on Travis [FLINK-13936] - NOTICE-binary is outdated [FLINK-13966] - Jar sorting in collect_license_files.sh is locale dependent [FLINK-13995] - Fix shading of the licence information of netty [FLINK-13999] - Correct the documentation of MATCH_RECOGNIZE [FLINK-14009] - Cron jobs broken due to verifying incorrect NOTICE-binary file [FLINK-14010] - Dispatcher &amp; JobManagers don&#39;t give up leadership when AM is shut down [FLINK-14043] - SavepointMigrationTestBase is super slow [FLINK-14107] - Kinesis consumer record emitter deadlock under event time alignment [FLINK-14175] - Upgrade KPL version in flink-connector-kinesis to fix application OOM [FLINK-14235] - Kafka010ProducerITCase&gt;KafkaProducerTestBase.testOneToOneAtLeastOnceCustomOperator fails on travis [FLINK-14315] - NPE with JobMaster.disconnectTaskManager [FLINK-14337] - HistoryServerTest.testHistoryServerIntegration failed on Travis [FLINK-14347] - YARNSessionFIFOITCase.checkForProhibitedLogContents found a log with prohibited string [FLINK-14370] - KafkaProducerAtLeastOnceITCase&gt;KafkaProducerTestBase.testOneToOneAtLeastOnceRegularSink fails on Travis [FLINK-14398] - Further split input unboxing code into separate methods [FLINK-14413] - shade-plugin ApacheNoticeResourceTransformer uses platform-dependent encoding [FLINK-14434] - Dispatcher#createJobManagerRunner should not start JobManagerRunner [FLINK-14562] - RMQSource leaves idle consumer after closing [FLINK-14589] - Redundant slot requests with the same AllocationID leads to inconsistent slot table [FLINK-15036] - Container startup error will be handled out side of the YarnResourceManager&#39;s main thread Improvement [FLINK-12848] - Method equals() in RowTypeInfo should consider fieldsNames [FLINK-13729] - Update website generation dependencies [FLINK-13965] - Keep hasDeprecatedKeys and deprecatedKeys methods in ConfigOption and mark it with @Deprecated annotation [FLINK-13967] - Generate full binary licensing via collect_license_files.sh [FLINK-13968] - Add travis check for the correctness of the binary licensing [FLINK-13991] - Add git exclusion for 1.9+ features to 1.8 [FLINK-14008] - Auto-generate binary licensing [FLINK-14104] - Bump Jackson to 2.10.1 [FLINK-14123] - Lower the default value of taskmanager.memory.fraction [FLINK-14215] - Add Docs for TM and JM Environment Variable Setting [FLINK-14334] - ElasticSearch docs refer to non-existent ExceptionUtils.containsThrowable [FLINK-14639] - Fix the document of Metrics that has an error for \`User Scope\` [FLINK-14646] - Check non-null for key in KeyGroupStreamPartitioner [FLINK-14995] - Kinesis NOTICE is incorrect `}),e.add({id:175,href:"/2019/11/25/how-to-query-pulsar-streams-using-apache-flink/",title:"How to query Pulsar Streams using Apache Flink",section:"Flink Blog",content:`In a previous story on the Flink blog, we explained the 
different ways that Apache Flink and Apache Pulsar can integrate to provide elastic data processing at large scale. This blog post discusses the new developments and integrations between the two frameworks and showcases how you can leverage Pulsar’s built-in schema to query Pulsar streams in real time using Apache Flink.
A short intro to Apache Pulsar # Apache Pulsar is a flexible pub/sub messaging system, backed by durable log storage. Some of the framework’s highlights include multi-tenancy, a unified message model, structured event streams and a cloud-native architecture that make it a perfect fit for a wide set of use cases, ranging from billing, payments and trading services all the way to the unification of the different messaging architectures in an organization. If you are interested in finding out more about Pulsar, you can visit the Apache Pulsar documentation or get in touch with the Pulsar community on Slack.
Existing Pulsar &amp; Flink integration (Apache Flink 1.6+) # The existing integration between Pulsar and Flink exploits Pulsar as a message queue in a Flink application. Flink developers can utilize Pulsar as a streaming source and streaming sink for their Flink applications by selecting a specific Pulsar source and connecting to their desired Pulsar cluster and topic:
// create and configure Pulsar consumer
PulsarSourceBuilder<String> builder = PulsarSourceBuilder
    .builder(new SimpleStringSchema())
    .serviceUrl(serviceUrl)
    .topic(inputTopic)
    .subscriptionName(subscription);
SourceFunction<String> src = builder.build();
// ingest DataStream with Pulsar consumer
DataStream<String> words = env.addSource(src);
Pulsar streams can then be connected to the Flink processing logic…
// perform computation on DataStream (here a simple WordCount)
DataStream<WordWithCount> wc = words
    .flatMap((FlatMapFunction<String, WordWithCount>) (word, collector) -> {
        collector.collect(new WordWithCount(word, 1));
    })
    .returns(WordWithCount.class)
    .keyBy("word")
    .timeWindow(Time.seconds(5))
    .reduce((ReduceFunction<WordWithCount>) (c1, c2) ->
        new WordWithCount(c1.word, c1.count + c2.count));
…and then emitted back to Pulsar (now used as a sink), sending the computation results downstream to a Pulsar topic:
// emit result via Pulsar producer
wc.addSink(new FlinkPulsarProducer<>(
    serviceUrl,
    outputTopic,
    new AuthenticationDisabled(),
    wordWithCount -> wordWithCount.toString().getBytes(UTF_8),
    wordWithCount -> wordWithCount.word)
);
Although this is a great first integration step, the existing design does not leverage the full power of Pulsar. Some shortcomings of the integration with Flink 1.6.0 are that Pulsar is neither utilized as durable storage nor integrated with Flink at the schema level, which means users have to describe the application's schema manually.
Pulsar’s integration with Flink 1.9: Using Pulsar as a Flink catalog # The latest integration between Flink 1.9.0 and Pulsar addresses most of the previously mentioned shortcomings. The contribution of Alibaba’s Blink to the Flink repository adds many enhancements and new features to the processing framework that make the integration with Pulsar significantly more powerful and impactful. Flink 1.9.0 brings Pulsar schema integration into the picture, makes the Table API a first-class citizen and provides an exactly-once streaming source and at-least-once streaming sink with Pulsar. Lastly, with schema integration, Pulsar can now be registered as a Flink catalog, making running Flink queries on top of Pulsar streams a matter of a few commands. In the following sections, we will take a closer look at the new integrations and provide examples of how to query Pulsar streams using Flink SQL.
Leveraging the Flink <> Pulsar Schema Integration # Before delving into the integration details and how you can use Pulsar schema with Flink, let us describe how schema in Pulsar works. Schema in Apache Pulsar already co-exists with the data and serves as its representation on the broker side of the framework, which makes a schema registry in an external system obsolete. Additionally, the data schema in Pulsar is associated with each topic, so both producers and consumers send data with predefined schema information, while the broker performs schema validation and manages schema multi-versioning and evolution through compatibility checks.
Below you can find an example of Pulsar’s schema on both the producer and consumer side. On the producer side, you can specify which schema you want to use and Pulsar then sends a POJO class without the need to perform any serialization/deserialization. Similarly, on the consumer end, you can also specify the data schema and upon receiving the data, Pulsar will automatically validate the schema information, fetch the schema of the given version and then deserialize the data back to a POJO structure. Pulsar stores the schema information in the metadata of a Pulsar topic.
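The snippet below assumes a simple User POJO with a builder. A minimal, hand-rolled sketch of such a class (the fields userName and userId come from the example; everything else is an assumption for illustration):
// Hypothetical POJO backing the schema example below; only the fields used in the snippet are shown.
public class User {
    private String userName;
    private long userId;

    public User() {} // no-arg constructor so Avro/POJO serialization can instantiate it

    public String getUserName() { return userName; }
    public void setUserName(String userName) { this.userName = userName; }
    public long getUserId() { return userId; }
    public void setUserId(long userId) { this.userId = userId; }

    public static Builder builder() { return new Builder(); }

    public static class Builder {
        private final User user = new User();
        public Builder userName(String userName) { user.setUserName(userName); return this; }
        public Builder userId(long userId) { user.setUserId(userId); return this; }
        public User build() { return user; }
    }
}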
// Create producer with Struct schema and send messages
Producer<User> producer = client.newProducer(Schema.AVRO(User.class)).create();
producer.newMessage()
    .value(User.builder()
        .userName("pulsar-user")
        .userId(1L)
        .build())
    .send();
// Create consumer with Struct schema and receive messages
Consumer<User> consumer = client.newConsumer(Schema.AVRO(User.class)).subscribe();
consumer.receive();
Let's assume we have an application that specifies a schema to the producer and/or consumer. Upon receiving the schema information, the producer (or consumer) that is connected to the broker will transfer this information so that the broker can perform schema registration, validation and compatibility checks before accepting or rejecting the schema, as illustrated in the diagram below:
Not only is Pulsar able to handle and store the schema information, it is additionally able to handle schema evolution where necessary. Pulsar effectively manages schema evolution in the broker, keeping track of all the different versions of your schema while performing any necessary compatibility checks.
Moreover, when messages are published on the producer side, Pulsar tags each message with the schema version as part of the message's metadata. On the consumer side, when the message is received and the metadata is deserialized, Pulsar checks the schema version associated with the message and fetches the corresponding schema information from the broker. As a result, when Pulsar integrates with a Flink application, it uses the pre-existing schema information and maps each message, together with its schema information, to a row in Flink's type system.
For cases where Flink users do not interact with schema directly or use a primitive schema (for example, a topic storing a string or a long number), Pulsar converts the message payload into a single Flink row field called 'value'. For structured schema types, like JSON and AVRO, Pulsar extracts the individual fields from the schema information and maps them to Flink's type system. Finally, all metadata associated with each message, such as the message key, topic, publish time, or event time, is converted into metadata fields in the Flink row. Below we provide two examples of how a primitive schema and a structured schema are transformed from a Pulsar topic to Flink's type system.
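As a rough illustration of that mapping (the column names below are assumptions chosen for readability, not the connector's actual field names):
// Illustration only; the actual column names are defined by the connector.
// Topic with a primitive schema (e.g. Schema.STRING):
//   value : STRING                                 -- the message payload
//   plus metadata columns such as key, topic, publishTime, eventTime
// Topic with a structured schema (e.g. Schema.AVRO(User.class)):
//   userName : STRING, userId : BIGINT             -- fields extracted from the Avro schema
//   plus the same metadata columns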
Once all the schema information is mapped to Flink’s type system, you can start building a Pulsar source, sink or catalog in Flink based on the specified schema information as illustrated below:
Flink & Pulsar: Read data from Pulsar # Create a Pulsar source for streaming queries
val env = StreamExecutionEnvironment.getExecutionEnvironment
val props = new Properties()
props.setProperty("service.url", "pulsar://...")
props.setProperty("admin.url", "http://...")
props.setProperty("partitionDiscoveryIntervalMillis", "5000")
props.setProperty("startingOffsets", "earliest")
props.setProperty("topic", "test-source-topic")
val source = new FlinkPulsarSource(props)
// you don't need to provide a type information to addSource since FlinkPulsarSource is ResultTypeQueryable
val dataStream = env.addSource(source)(null)
// chain operations on dataStream of Row and sink the output
// end method chaining
env.execute()
Register topics in Pulsar as streaming tables
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = StreamTableEnvironment.create(env)
val prop = new Properties()
prop.setProperty("service.url", serviceUrl)
prop.setProperty("admin.url", adminUrl)
prop.setProperty("partitionDiscoveryIntervalMillis", "5000")
prop.setProperty("startingOffsets", "earliest")
prop.setProperty("topic", "test-source-topic")
tEnv
  .connect(new Pulsar().properties(prop))
  .inAppendMode()
  .registerTableSource("source-table")
val sql = "SELECT ..... FROM source-table"
tEnv.sqlQuery(sql)
env.execute()
Flink & Pulsar: Write data to Pulsar # Create a Pulsar sink for streaming queries
val env = StreamExecutionEnvironment.getExecutionEnvironment
val stream = .....
val prop = new Properties()
prop.setProperty("service.url", serviceUrl)
prop.setProperty("admin.url", adminUrl)
prop.setProperty("flushOnCheckpoint", "true")
prop.setProperty("failOnWrite", "true")
prop.setProperty("topic", "test-sink-topic")
stream.addSink(new FlinkPulsarSink(prop, DummyTopicKeyExtractor))
env.execute()
Write a streaming table to Pulsar
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = StreamTableEnvironment.create(env)
val prop = new Properties()
prop.setProperty("service.url", serviceUrl)
prop.setProperty("admin.url", adminUrl)
prop.setProperty("flushOnCheckpoint", "true")
prop.setProperty("failOnWrite", "true")
prop.setProperty("topic", "test-sink-topic")
tEnv
  .connect(new Pulsar().properties(prop))
  .inAppendMode()
  .registerTableSource("sink-table")
val sql = "INSERT INTO sink-table ....."
tEnv.sqlUpdate(sql)
env.execute()
In every instance, Flink developers only need to specify how Flink should connect to a Pulsar cluster, without worrying about any schema registry or serialization/deserialization actions, and can register the Pulsar cluster as a source, sink or streaming table in Flink. Once all three elements are in place, Pulsar can be registered as a catalog in Flink, which drastically simplifies how you process and query data, for example by writing a program that queries data from Pulsar or by using the Table API and SQL to query Pulsar data streams.
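To make that last point concrete, here is a small, hypothetical Java sketch of what querying a Pulsar-backed table could look like once topics are exposed through a registered catalog; the catalog, database, table and column names are all assumptions, not part of the connector's documented API:
// Hypothetical sketch: query a Pulsar topic that is exposed as a table through a registered catalog.
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class QueryPulsarTopic {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        tEnv.useCatalog("pulsar");           // assumes a Pulsar catalog was registered under this name
        tEnv.useDatabase("public/default");  // assumes Pulsar namespaces are exposed as databases

        // Plain Flink SQL over the topic-backed table; "word" is an assumed column.
        Table wordCounts = tEnv.sqlQuery(
                "SELECT word, COUNT(*) AS cnt FROM `test-source-topic` GROUP BY word");
        // ... convert the result to a DataStream or emit it to a sink from here ...
    }
}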
Next Steps & Future Integration # The goal of the integration between Pulsar and Flink is to simplify how developers use the two frameworks to build a unified data processing stack. As we progress from the classical Lambda architectures, where an online speed layer is combined with an offline batch layer to run data computations, Flink and Pulsar present a great combination in providing a truly unified data processing stack. We see Flink as a unified computation engine, handling both online (streaming) and offline (batch) workloads, and Pulsar as the unified data storage layer of a truly unified data processing stack that simplifies developer workloads.
There is still a lot of ongoing work and effort from both communities to make the integration even better, such as a new source API (FLIP-27) that will allow the Pulsar connectors to be contributed to the Flink community, and a new Pulsar subscription type called Key_Shared that will allow efficient scaling of the source parallelism. Additional efforts focus on providing end-to-end, exactly-once guarantees (currently available only in the Pulsar source connector, not the sink connector) and on using Pulsar/BookKeeper as a Flink state backend.
You can find a more detailed overview of the integration work between the two communities in this recorded talk from Flink Forward Europe 2019, or sign up to the Flink dev mailing list for the latest contribution and integration efforts between Flink and Pulsar.
`}),e.add({id:176,href:"/2019/11/06/running-apache-flink-on-kubernetes-with-kudo/",title:"Running Apache Flink on Kubernetes with KUDO",section:"Flink Blog",content:`A common use case for Apache Flink is streaming data analytics together with Apache Kafka, which provides a pub/sub model and durability for data streams. To achieve elastic scalability, both are typically deployed in clustered environments, and increasingly on top of container orchestration platforms like Kubernetes. The Operator pattern provides an extension mechanism to Kubernetes that captures human operator knowledge about an application, like Flink, in software to automate its operation. KUDO is an open source toolkit for building Operators using declarative YAML specs, with a focus on ease of use for cluster admins and developers.
In this blog post we demonstrate how to orchestrate a streaming data analytics application based on Flink and Kafka with KUDO. It consists of a Flink job that checks financial transactions for fraud, and two microservices that generate and display the transactions. You can find more details about this demo in the KUDO Operators repository, including instructions for installing the dependencies.
Prerequisites # You can run this demo on your local machine using minikube. The instructions below were tested with minikube v1.5.1 and Kubernetes v1.16.2 but should work on any Kubernetes version above v1.15.0. First, start a minikube cluster with enough capacity:
minikube start --cpus=6 --memory=9216 --disk-size=10g
If you’re using a different way to provision Kubernetes, make sure you have at least 6 CPU Cores, 9 GB of RAM and 10 GB of disk space available.
Install the kubectl CLI tool. The KUDO CLI is a plugin for the Kubernetes CLI. The official instructions for installing and setting up kubectl are here.
Next, let’s install the KUDO CLI. At the time of this writing, the latest KUDO version is v0.10.0. You can find the CLI binaries for download here. Download the kubectl-kudo binary for your OS and architecture.
If you’re using Homebrew on MacOS, you can install the CLI via:
$ brew tap kudobuilder/tap
$ brew install kudo-cli
Now, let's initialize KUDO on our Kubernetes cluster:
$ kubectl kudo init
$KUDO_HOME has been configured at /Users/gerred/.kudo
This will create several resources. First, it will create the Custom Resource Definitions, service account, and role bindings necessary for KUDO to operate. It will also create an instance of the KUDO controller so that we can begin creating instances of applications.
The KUDO CLI leverages the kubectl plugin system, which gives you all its functionality under kubectl kudo. This is a convenient way to install and deal with your KUDO Operators. For our demo, we use Kafka and Flink which depend on ZooKeeper. To make the ZooKeeper Operator available on the cluster, run:
$ kubectl kudo install zookeeper --version=0.3.0 --skip-instance
The --skip-instance flag skips the creation of a ZooKeeper instance. The flink-demo Operator that we're going to install below will create it as a dependency instead. Now let's make the Kafka and Flink Operators available the same way:
$ kubectl kudo install kafka --version=1.2.0 --skip-instance
$ kubectl kudo install flink --version=0.2.1 --skip-instance
This installs all the Operator versions needed for our demo.
Financial Fraud Demo # In our financial fraud demo we have two microservices, called “generator” and “actor”. The generator produces transactions with random amounts and writes them into a Kafka topic. Occasionally, the value will be over 10,000 which is considered fraud for the purpose of this demo. The Flink job subscribes to the Kafka topic and detects fraudulent transactions. When it does, it submits them to another Kafka topic which the actor consumes. The actor simply displays each fraudulent transaction.
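To give a feel for the shape of such a job, here is a heavily simplified, hypothetical Java sketch; the real FinancialFraudJob in the KUDO operators repository aggregates transactions over time and uses its own topic names and serialization, so the topic names, numeric payload and threshold check below are illustrative assumptions only:
// Simplified, hypothetical sketch of the fraud-detection job; not the actual FinancialFraudJob from the demo.
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

public class FraudDetectionSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "flink-demo-kafka-kafka-0.flink-demo-kafka-svc:9093");
        props.setProperty("group.id", "fraud-detector");

        // Assume the generator writes plain numeric amounts into a "transactions" topic.
        DataStream<String> transactions =
                env.addSource(new FlinkKafkaConsumer<>("transactions", new SimpleStringSchema(), props));

        // Anything above 10,000 is considered fraud for the purpose of this demo.
        DataStream<String> fraud = transactions.filter(amount -> Long.parseLong(amount.trim()) > 10_000L);

        // Forward detected fraud to the topic the "actor" microservice consumes.
        fraud.addSink(new FlinkKafkaProducer<>("fraud", new SimpleStringSchema(), props));

        env.execute("fraud-detection-sketch");
    }
}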
The KUDO CLI by default installs Operators from the official repository, but it also supports installation from your local filesystem. This is useful if you want to develop your own Operator, or modify this demo for your own purposes.
First, clone the “kudobuilder/operators” repository via:
$ git clone https://github.com/kudobuilder/operators.git
Next, change into the "operators" directory and install the demo-operator from your local filesystem:
$ cd operators
$ kubectl kudo install repository/flink/docs/demo/financial-fraud/demo-operator --instance flink-demo
instance.kudo.dev/v1beta1/flink-demo created
This time we didn't include the --skip-instance flag, so KUDO will actually deploy all the components, including Flink, Kafka, and ZooKeeper. KUDO orchestrates deployments and other lifecycle operations using plans that were defined by the Operator developer. Plans are similar to runbooks and encapsulate all the procedures required to operate the software. We can track the status of the deployment using this KUDO command:
$ kubectl kudo plan status --instance flink-demo
Plan(s) for "flink-demo" in namespace "default":
.
└── flink-demo (Operator-Version: "flink-demo-0.1.4" Active-Plan: "deploy")
    └── Plan deploy (serial strategy) [IN_PROGRESS]
        ├── Phase dependencies [IN_PROGRESS]
        │   ├── Step zookeeper (COMPLETE)
        │   └── Step kafka (IN_PROGRESS)
        ├── Phase flink-cluster [PENDING]
        │   └── Step flink (PENDING)
        ├── Phase demo [PENDING]
        │   ├── Step gen (PENDING)
        │   └── Step act (PENDING)
        └── Phase flink-job [PENDING]
            └── Step submit (PENDING)
The output shows that the "deploy" plan is in progress and that it consists of 4 phases: "dependencies", "flink-cluster", "demo" and "flink-job". The "dependencies" phase includes steps for "zookeeper" and "kafka". This is where both dependencies get installed, before KUDO continues to install the Flink cluster and the demo itself. We also see that ZooKeeper installation completed, and that Kafka installation is currently in progress. We can view details about Kafka's deployment plan via:
$ kubectl kudo plan status --instance flink-demo-kafka
Plan(s) for "flink-demo-kafka" in namespace "default":
.
└── flink-demo-kafka (Operator-Version: "kafka-1.2.0" Active-Plan: "deploy")
    ├── Plan deploy (serial strategy) [IN_PROGRESS]
    │   └── Phase deploy-kafka [IN_PROGRESS]
    │       └── Step deploy (IN_PROGRESS)
    └── Plan not-allowed (serial strategy) [NOT ACTIVE]
        └── Phase not-allowed (serial strategy) [NOT ACTIVE]
            └── Step not-allowed (serial strategy) [NOT ACTIVE]
                └── not-allowed [NOT ACTIVE]
After Kafka is successfully installed, the next phase, "flink-cluster", will start and bring up, you guessed it, your Flink cluster. After this is done, the "demo" phase creates the generator and actor pods that generate and display transactions for this demo. Lastly, we have the "flink-job" phase in which we submit the actual FinancialFraudJob to the Flink cluster. Once the Flink job is submitted, we will be able to see fraud logs in our actor pod shortly after.
After a while, the state of all plans, phases and steps will change to “COMPLETE”. Now we can view the Flink dashboard to verify that our job is running. To access it from outside the Kubernetes cluster, first start the client proxy, then open the URL below in your browser:
$ kubectl proxy
http://127.0.0.1:8001/api/v1/namespaces/default/services/flink-demo-flink-jobmanager:ui/proxy/#/overview
It should look similar to this, depending on your local machine and how many cores you have available:
The job is up and running and we should now be able to see fraudulent transactions in the logs of the actor pod:
$ kubectl logs $(kubectl get pod -l actor=flink-demo -o jsonpath="{.items[0].metadata.name}")
Broker: flink-demo-kafka-kafka-0.flink-demo-kafka-svc:9093
Topic: fraud
Detected Fraud: TransactionAggregate {startTimestamp=0, endTimestamp=1563395831000, totalAmount=19895:
Transaction{timestamp=1563395778000, origin=1, target='3', amount=8341}
Transaction{timestamp=1563395813000, origin=1, target='3', amount=8592}
Transaction{timestamp=1563395817000, origin=1, target='3', amount=2802}
Transaction{timestamp=1563395831000, origin=1, target='3', amount=160}}
If you add the "-f" flag to the previous command, you can follow along as more transactions are streaming in and are evaluated by our Flink job.
Conclusion # In this blog post we demonstrated how to easily deploy an end-to-end streaming data application on Kubernetes using KUDO. We deployed a Flink job and two microservices, as well as all the required infrastructure (Flink, Kafka, and ZooKeeper), using just a few kubectl commands. To find out more about KUDO, visit the project website or join the community on Slack.
`}),e.add({id:177,href:"/2019/10/18/apache-flink-1.9.1-released/",title:"Apache Flink 1.9.1 Released",section:"Flink Blog",content:`The Apache Flink community released the first bugfix version of the Apache Flink 1.9 series.
This release includes 96 fixes and minor improvements for Flink 1.9.0. Below is a detailed list of all fixes and improvements.
We highly recommend all users upgrade to Flink 1.9.1.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.9.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.9.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.9.1</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Bug [FLINK-11630] - TaskExecutor does not wait for Task termination when terminating itself [FLINK-13490] - Fix if one column value is null when reading JDBC, the following values are all null [FLINK-13941] - Prevent data-loss by not cleaning up small part files from S3. [FLINK-12501] - AvroTypeSerializer does not work with types generated by avrohugger [FLINK-13386] - Fix some frictions in the new default Web UI [FLINK-13526] - Switching to a non existing catalog or database crashes sql-client [FLINK-13568] - DDL create table doesn&#39;t allow STRING data type [FLINK-13805] - Bad Error Message when TaskManager is lost [FLINK-13806] - Metric Fetcher floods the JM log with errors when TM is lost [FLINK-14010] - Dispatcher &amp; JobManagers don&#39;t give up leadership when AM is shut down [FLINK-14145] - CompletedCheckpointStore#getLatestCheckpoint(true) returns wrong checkpoint [FLINK-13059] - Cassandra Connector leaks Semaphore on Exception and hangs on close [FLINK-13534] - Unable to query Hive table with decimal column [FLINK-13562] - Throws exception when FlinkRelMdColumnInterval meets two stage stream group aggregate [FLINK-13563] - TumblingGroupWindow should implement toString method [FLINK-13564] - Throw exception if constant with YEAR TO MONTH resolution was used for group windows [FLINK-13588] - StreamTask.handleAsyncException throws away the exception cause [FLINK-13653] - ResultStore should avoid using RowTypeInfo when creating a result [FLINK-13711] - Hive array values not properly displayed in SQL CLI [FLINK-13737] - flink-dist should add provided dependency on flink-examples-table [FLINK-13738] - Fix NegativeArraySizeException in LongHybridHashTable [FLINK-13742] - Fix code generation when aggregation contains both distinct aggregate with and without filter [FLINK-13760] - Fix hardcode Scala version dependency in hive connector [FLINK-13761] - \`SplitStream\` should be deprecated because \`SplitJavaStream\` is deprecated [FLINK-13789] - Transactional Id Generation fails due to user code impacting formatting string [FLINK-13823] - Incorrect debug log in CompileUtils [FLINK-13825] - The original plugins dir is not restored after e2e test run [FLINK-13831] - Free Slots / All Slots display error [FLINK-13887] - Ensure defaultInputDependencyConstraint to be non-null when setting it in ExecutionConfig [FLINK-13897] - OSS FS NOTICE file is placed in wrong directory [FLINK-13933] - Hive Generic UDTF can not be used in table API both stream and batch mode [FLINK-13936] - NOTICE-binary is outdated [FLINK-13966] - Jar sorting in collect_license_files.sh is locale dependent [FLINK-14009] - Cron jobs broken due to verifying incorrect NOTICE-binary file [FLINK-14049] - Update error message for failed partition updates to include task name [FLINK-14076] - &#39;ClassNotFoundException: KafkaException&#39; on Flink v1.9 w/ checkpointing [FLINK-14107] - Kinesis consumer record emitter deadlock under event time alignment [FLINK-14119] - Clean idle state for RetractableTopNFunction [FLINK-14139] - Fix potential memory leak of rest server when using session/standalone cluster [FLINK-14140] - The Flink Logo Displayed in Flink Python Shell is Broken [FLINK-14150] - Unnecessary __pycache__ directories appears in pyflink.zip [FLINK-14288] - Add Py4j NOTICE for source release [FLINK-13892] - HistoryServerTest failed on Travis [FLINK-14043] - SavepointMigrationTestBase is super slow [FLINK-12164] - JobMasterTest.testJobFailureWhenTaskExecutorHeartbeatTimeout is unstable [FLINK-9900] - Fix unstable test 
ZooKeeperHighAvailabilityITCase#testRestoreBehaviourWithFaultyStateHandles [FLINK-13484] - ConnectedComponents end-to-end test instable with NoResourceAvailableException [FLINK-13489] - Heavy deployment end-to-end test fails on Travis with TM heartbeat timeout [FLINK-13514] - StreamTaskTest.testAsyncCheckpointingConcurrentCloseAfterAcknowledge unstable [FLINK-13530] - AbstractServerTest failed on Travis [FLINK-13585] - Fix sporadical deallock in TaskAsyncCallTest#testSetsUserCodeClassLoader() [FLINK-13599] - Kinesis end-to-end test failed on Travis [FLINK-13663] - SQL Client end-to-end test for modern Kafka failed on Travis [FLINK-13688] - HiveCatalogUseBlinkITCase.testBlinkUdf constantly failed [FLINK-13739] - BinaryRowTest.testWriteString() fails in some environments [FLINK-13746] - Elasticsearch (v2.3.5) sink end-to-end test fails on Travis [FLINK-13769] - BatchFineGrainedRecoveryITCase.testProgram failed on Travis [FLINK-13807] - Flink-avro unit tests fails if the character encoding in the environment is not default to UTF-8 Improvement [FLINK-13965] - Keep hasDeprecatedKeys and deprecatedKeys methods in ConfigOption and mark it with @Deprecated annotation [FLINK-9941] - Flush in ScalaCsvOutputFormat before close method [FLINK-13336] - Remove the legacy batch fault tolerance page and redirect it to the new task failure recovery page [FLINK-13380] - Improve the usability of Flink session cluster on Kubernetes [FLINK-13819] - Introduce RpcEndpoint State [FLINK-13845] - Drop all the content of removed &quot;Checkpointed&quot; interface [FLINK-13957] - Log dynamic properties on job submission [FLINK-13967] - Generate full binary licensing via collect_license_files.sh [FLINK-13968] - Add travis check for the correctness of the binary licensing [FLINK-13449] - Add ARM architecture to MemoryArchitecture Documentation [FLINK-13105] - Add documentation for blink planner&#39;s built-in functions [FLINK-13277] - add documentation of Hive source/sink [FLINK-13354] - Add documentation for how to use blink planner [FLINK-13355] - Add documentation for Temporal Table Join in blink planner [FLINK-13356] - Add documentation for TopN and Deduplication in blink planner [FLINK-13359] - Add documentation for DDL introduction [FLINK-13362] - Add documentation for Kafka &amp; ES &amp; FileSystem DDL [FLINK-13363] - Add documentation for streaming aggregate performance tunning. 
[FLINK-13706] - add documentation of how to use Hive functions in Flink [FLINK-13942] - Add Overview page for Getting Started section [FLINK-13863] - Update Operations Playground to Flink 1.9.0 [FLINK-13937] - Fix wrong hive dependency version in documentation [FLINK-13830] - The Document about Cluster on yarn have some problems [FLINK-14160] - Extend Operations Playground with --backpressure option [FLINK-13388] - Update UI screenshots in the documentation to the new default Web Frontend [FLINK-13415] - Document how to use hive connector in scala shell [FLINK-13517] - Restructure Hive Catalog documentation [FLINK-13643] - Document the workaround for users with a different minor Hive version [FLINK-13757] - Fix wrong description of "IS NOT TRUE" function documentation `}),e.add({id:178,href:"/2019/09/13/the-state-processor-api-how-to-read-write-and-modify-the-state-of-flink-applications/",title:"The State Processor API: How to Read, write and modify the state of Flink applications",section:"Flink Blog",content:`Whether you are running Apache FlinkⓇ in production or evaluated Flink as a computation framework in the past, you&rsquo;ve probably found yourself asking the question: How can I access, write or update state in a Flink savepoint? Ask no more! Apache Flink 1.9.0 introduces the State Processor API, a powerful extension of the DataSet API that allows reading, writing and modifying state in Flink&rsquo;s savepoints and checkpoints.
In this post, we explain why this feature is a big step for Flink, what you can use it for, and how to use it. Finally, we will discuss the future of the State Processor API and how it aligns with our plans to evolve Flink into a system for unified batch and stream processing.
Stateful Stream Processing with Apache Flink until Flink 1.9 # All non-trivial stream processing applications are stateful and most of them are designed to run for months or years. Over time, many of them accumulate a lot of valuable state that can be very expensive or even impossible to rebuild if it gets lost due to a failure. In order to guarantee the consistency and durability of application state, Flink featured a sophisticated checkpointing and recovery mechanism from very early on. With every release, the Flink community has added more and more state-related features to improve checkpointing and recovery speed, the maintenance of applications, and practices to manage applications.
However, a feature that was commonly requested by Flink users was the ability to access the state of an application “from the outside”. This request was motivated by the need to validate or debug the state of an application, to migrate the state of an application to another application, to evolve an application from the Heap State Backend to the RocksDB State Backend, or to import the initial state of an application from an external system like a relational database.
Despite all those convincing reasons to expose application state externally, your access options have been fairly limited until now. Flink&rsquo;s Queryable State feature only supports key-lookups (point queries) and does not guarantee the consistency of returned values (the value of a key might be different before and after an application recovered from a failure). Moreover, queryable state cannot be used to add or modify the state of an application. Also, savepoints, which are consistent snapshots of an application&rsquo;s state, were not accessible because the application state is encoded with a custom binary format.
Reading and Writing Application State with the State Processor API # The State Processor API that comes with Flink 1.9 is a true game-changer in how you can work with application state! In a nutshell, it extends the DataSet API with Input and OutputFormats to read and write savepoint or checkpoint data. Due to the interoperability of DataSet and Table API, you can even use relational Table API or SQL queries to analyze and process state data.
For example, you can take a savepoint of a running stream processing application and analyze it with a DataSet batch program to verify that the application behaves correctly. Or you can read a batch of data from any store, preprocess it, and write the result to a savepoint that you use to bootstrap the state of a streaming application. It&rsquo;s also possible to fix inconsistent state entries now. Finally, the State Processor API opens up many ways to evolve a stateful application that were previously blocked by parameter and design choices that could not be changed without losing all the state of the application after it was started. For example, you can now arbitrarily modify the data types of states, adjust the maximum parallelism of operators, split or merge operator state, re-assign operator UIDs, and so on.
Mapping Application State to DataSets # The State Processor API maps the state of a streaming application to one or more data sets that can be separately processed. In order to be able to use the API, you need to understand how this mapping works.
But let&rsquo;s first have a look at what a stateful Flink job looks like. A Flink job is composed of operators, typically one or more source operators, a few operators for the actual processing, and one or more sink operators. Each operator runs in parallel in one or more tasks and can work with different types of state. An operator can have zero, one, or more “operator states” which are organized as lists that are scoped to the operator&rsquo;s tasks. If the operator is applied on a keyed stream, it can also have zero, one, or more “keyed states” which are scoped to a key that is extracted from each processed record. You can think of keyed state as a distributed key-value map.
The following figure shows the application “MyApp” which consists of three operators called “Src”, “Proc”, and “Snk”. Src has one operator state (os1), Proc has one operator state (os2) and two keyed states (ks1, ks2) and Snk is stateless.
A savepoint or checkpoint of MyApp consists of the data of all states, organized in a way that the states of each task can be restored. When processing the data of a savepoint (or checkpoint) with a batch job, we need a mental model that maps the data of the individual tasks&rsquo; states into data sets or tables. In fact, we can think of a savepoint as a database. Every operator (identified by its UID) represents a namespace. Each operator state of an operator is mapped to a dedicated table in the namespace with a single column that holds the state&rsquo;s data of all tasks. All keyed states of an operator are mapped to a single table consisting of a column for the key, and one column for each keyed state. The following figure shows how a savepoint of MyApp is mapped to a database.
The figure shows how the values of Src's operator state are mapped to a table with one column and five rows, one row for each list entry across all parallel tasks of Src. Operator state os2 of the operator "Proc" is similarly mapped to an individual table. The keyed states ks1 and ks2 are combined into a single table with three columns, one for the key, one for ks1, and one for ks2. The keyed table holds one row for each distinct key of both keyed states. Since the operator "Snk" does not have any state, its namespace is empty.
The State Processor API now offers methods to create, load, and write a savepoint. You can read a DataSet from a loaded savepoint or convert a DataSet into a state and add it to a savepoint. DataSets can be processed with the full feature set of the DataSet API. With these building blocks, all of the before-mentioned use cases (and more) can be addressed. Please have a look at the documentation if you&rsquo;d like to learn how to use the State Processor API in detail.
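To give a flavor of the API, here is a minimal sketch of reading one keyed state with the State Processor API; the savepoint path, operator UID ("proc-uid") and state name ("ks1" as a Long-valued ValueState) are assumptions chosen to match the MyApp example, not something prescribed by the API:
// Minimal sketch: load a savepoint and read keyed state "ks1" of the operator with UID "proc-uid".
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.ExistingSavepoint;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateReaderFunction;
import org.apache.flink.util.Collector;

public class ReadKs1FromSavepoint {

    // Emits the value of "ks1" for every key of the operator.
    static class Ks1Reader extends KeyedStateReaderFunction<String, Long> {
        private transient ValueState<Long> ks1;

        @Override
        public void open(Configuration parameters) {
            ks1 = getRuntimeContext().getState(new ValueStateDescriptor<>("ks1", Long.class));
        }

        @Override
        public void readKey(String key, Context ctx, Collector<Long> out) throws Exception {
            out.collect(ks1.value());
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        ExistingSavepoint savepoint =
                Savepoint.load(env, "hdfs:///savepoints/myapp-savepoint", new MemoryStateBackend());

        DataSet<Long> ks1Values = savepoint.readKeyedState("proc-uid", new Ks1Reader());
        ks1Values.print();
    }
}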
Why DataSet API? # In case you are familiar with Flink's roadmap, you might be surprised that the State Processor API is based on the DataSet API. The Flink community plans to extend the DataStream API with the concept of BoundedStreams and deprecate the DataSet API. When designing this feature, we also evaluated the DataStream API and the Table API, but neither could provide the right feature set yet. Since we didn't want to block this feature on the progress of Flink's APIs, we decided to build it on the DataSet API, but kept its dependencies on the DataSet API to a minimum. Hence, migrating it to another API should be fairly easy.
Summary # Flink users have requested a feature to access and modify the state of streaming applications from the outside for a long time. With the State Processor API, Flink 1.9.0 finally exposes application state as a data format that can be manipulated. This feature opens up many new possibilities for how users can maintain and manage Flink streaming applications, including arbitrary evolution of stream applications and exporting and bootstrapping of application state. To put it concisely, the State Processor API unlocks the black box that savepoints used to be.
`}),e.add({id:179,href:"/2019/09/11/apache-flink-1.8.2-released/",title:"Apache Flink 1.8.2 Released",section:"Flink Blog",content:`The Apache Flink community released the second bugfix version of the Apache Flink 1.8 series.
This release includes 23 fixes and minor improvements for Flink 1.8.1. Below is a detailed list of all fixes and improvements.
We highly recommend all users upgrade to Flink 1.8.2.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.8.2</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.8.2</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.8.2</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Bug [FLINK-13941] - Prevent data-loss by not cleaning up small part files from S3. [FLINK-9526] - BucketingSink end-to-end test failed on Travis [FLINK-10368] - &#39;Kerberized YARN on Docker test&#39; unstable [FLINK-12319] - StackOverFlowError in cep.nfa.sharedbuffer.SharedBuffer [FLINK-12736] - ResourceManager may release TM with allocated slots [FLINK-12889] - Job keeps in FAILING state [FLINK-13059] - Cassandra Connector leaks Semaphore on Exception; hangs on close [FLINK-13159] - java.lang.ClassNotFoundException when restore job [FLINK-13367] - Make ClosureCleaner detect writeReplace serialization override [FLINK-13369] - Recursive closure cleaner ends up with stackOverflow in case of circular dependency [FLINK-13394] - Use fallback unsafe secure MapR in nightly.sh [FLINK-13484] - ConnectedComponents end-to-end test instable with NoResourceAvailableException [FLINK-13499] - Remove dependency on MapR artifact repository [FLINK-13508] - CommonTestUtils#waitUntilCondition() may attempt to sleep with negative time [FLINK-13586] - Method ClosureCleaner.clean broke backward compatibility between 1.8.0 and 1.8.1 [FLINK-13761] - \`SplitStream\` should be deprecated because \`SplitJavaStream\` is deprecated [FLINK-13789] - Transactional Id Generation fails due to user code impacting formatting string [FLINK-13806] - Metric Fetcher floods the JM log with errors when TM is lost [FLINK-13807] - Flink-avro unit tests fails if the character encoding in the environment is not default to UTF-8 [FLINK-13897] - OSS FS NOTICE file is placed in wrong directory Improvement [FLINK-12578] - Use secure URLs for Maven repositories [FLINK-12741] - Update docs about Kafka producer fault tolerance guarantees [FLINK-12749] - Add Flink Operations Playground documentation `}),e.add({id:180,href:"/2019/09/05/flink-community-update-september19/",title:"Flink Community Update - September'19",section:"Flink Blog",content:`This has been an exciting, fast-paced year for the Apache Flink community. But with over 10k messages across the mailing lists, 3k Jira tickets and 2k pull requests, it is not easy to keep up with the latest state of the project. Plus everything happening around it. With that in mind, we want to bring back regular community updates to the Flink blog.
The first post in the series takes you on a little detour across the year, to freshen up and make sure you're all up to date.
The Year (so far) in Flink # Two major versions were released this year: Flink 1.8 and Flink 1.9; paving the way for the goal of making Flink the first framework to seamlessly support stream and batch processing with a single, unified runtime. The contribution of Blink to Apache Flink was key in accelerating the path to this vision and reduced the waiting time for long-pending user requests — such as Hive integration, (better) Python support, the rework of Flink&rsquo;s Machine Learning library and&hellip;fine-grained failure recovery (FLIP-1).
The 1.9 release was the result of the biggest community effort the project has experienced so far, with the number of contributors soaring to 190 (see The Bigger Picture). For a quick overview of the upcoming work for Flink 1.10 (and beyond), have a look at the updated roadmap!
Integration of the Chinese-speaking community # As the number of Chinese-speaking Flink users rapidly grows, the community is working on translating resources and creating dedicated spaces for discussion to invite and include these users in the wider Flink community. Part of the ongoing work is described in FLIP-35 and has resulted in:
A new user mailing list (user-zh@f.a.o) dedicated to Chinese-speakers.
A Chinese translation of the Apache Flink website (https://flink.apache.org/zh/) and documentation (//nightlies.apache.org/flink/flink-docs-master/zh/).
Multiple meetups organized all over China, with the biggest one reaching a whopping number of 500+ participants. Some of these meetups were also organized in collaboration with communities from other projects, like Apache Pulsar and Apache Kafka.
In case you're interested in knowing more about this work in progress, Robert Metzger and Fabian Hueske will be diving into "Inviting Apache Flink's Chinese User Community" at the upcoming ApacheCon Europe 2019 (see Upcoming Flink Community Events).
Improving Flink&rsquo;s Documentation # Besides the translation effort, the community has also been working quite hard on a Flink docs overhaul. The main goals are to:
Organize and clean up the structure of the docs;
Align the content with the overall direction of the project;
Improve the getting-started material and make the content more accessible to different levels of Flink experience.
Given that there has been some confusion in the past regarding the unclear definition of core Flink concepts, one of the first completed efforts was to introduce a Glossary in the docs. To get up to speed with the roadmap for the remaining efforts, you can refer to FLIP-42 and the corresponding umbrella Jira ticket.
Adjusting the Contribution Process and Experience # The guidelines to contribute to Apache Flink have been reworked on the website, in an effort to lower the entry barrier for new contributors and reduce the overall friction in the contribution process. In addition, the Flink community discussed and adopted bylaws to help the community collaborate and coordinate more smoothly.
For code contributors, a Code Style and Quality Guide that captures the expected standards for contributions was also added to the &ldquo;Contributing&rdquo; section of the Flink website.
It&rsquo;s important to stress that contributions are not restricted to code. Non-code contributions such as mailing list support, documentation work or organization of community events are equally as important to the development of the project and highly encouraged.
New Committers and PMC Members # The Apache Flink community has welcomed 5 new Committers and 4 PMC (Project Management Committee) Members in 2019, so far:
New PMC Members # Jincheng Sun, Kete (Kurt) Young, Kostas Kloudas, Thomas Weise
New Committers # Andrey Zagrebin, Hequn, Jiangjie (Becket) Qin, Rong Rong, Zhijiang Wang
Congratulations and thank you for your hard work and commitment to Flink!
The Bigger Picture # Flink continues to push the boundaries of (stream) data processing, and the community is proud to see an ever-increasingly diverse set of contributors, users and technologies join the ecosystem.
In the timeframe of three releases, the project jumped from 112 to 190 contributors, also doubling down on the number of requested changes and improvements. To top it off, the Flink GitHub repository recently reached the milestone of 10k stars, all the way up from the incubation days in 2014.
The activity across the user@ and dev@1 mailing lists shows a healthy heartbeat, and the gradual ramp up of user-zh@ suggests that this was a well-received community effort. Looking at the numbers for the same period in 2018, the dev@ mailing list has seen the biggest surge in activity, with an average growth of 2.5x in the number of messages and distinct users — a great reflection of the hyperactive pace of development of the Flink codebase.
In support of these observations, the report for the financial year of 2019 from the Apache Software Foundation (ASF) features Flink as one of the most thriving open source projects, with mentions for:
Most Active Visits and Downloads
Most Active Sources: Visits
Most Active Sources: Clones
Top Repositories by Number of Commits
Top Most Active Apache Mailing Lists (user@ and dev@)
Hats off to our fellows at Apache Beam for an astounding year, too! For more detailed insights, check the full report.
1. Excluding messages from "jira@apache.org". Upcoming Events # As the conference and meetup season ramps up again, here are some events to keep an eye out for talks about Flink and opportunities to mingle with the wider stream processing community.
North America # [Conference] Strata Data Conference 2019, September 23-26, New York, USA [Meetup] Apache Flink Bay Area Meetup, September 24, San Francisco, USA [Conference] Scale By The Bay 2019, November 13-15, San Francisco, USA Europe # [Meetup] Apache Flink London Meetup, September 23, London, UK
[Conference] Flink Forward Europe 2019, October 7-9, Berlin, Germany
The next edition of Flink Forward Europe is around the corner and the program (https://europe-2019.flink-forward.org/conference-program) has been announced, featuring 70+ talks as well as panel discussions and interactive "Ask Me Anything" sessions with core Flink committers. If you're looking to learn more about Flink and share your experience with other community members, there really is no better place (https://vimeo.com/296403091) than Flink Forward! Note: if you are a committer for any Apache project, you can get a free ticket by registering with your Apache email address and using the discount code: FFEU19-ApacheCommitter.
[Conference] ApacheCon Berlin 2019 (https://aceu19.apachecon.com/), October 22-24, Berlin, Germany
[Conference] Data2Day 2019 (https://www.data2day.de/), October 22-24, Ludwigshafen, Germany
[Conference] Big Data Tech Warsaw 2020 (https://bigdatatechwarsaw.eu), February 7, Warsaw, Poland
The Call For Presentations (CFP) is now open (https://bigdatatechwarsaw.eu/cfp/).
Asia # [Conference] Flink Forward Asia 2019, November 28-30, Beijing, China
The second edition of Flink Forward Asia is also happening later this year, in Beijing, and the CFP is open (https://developer.aliyun.com/special/ffa2019) until September 20.
If you'd like to keep a closer eye on what's happening in the community, subscribe to the community mailing list to get fine-grained weekly updates, upcoming event announcements and more. Also, please reach out if you're interested in organizing or being part of Flink events in your area!
`}),e.add({id:181,href:"/2019/08/22/apache-flink-1.9.0-release-announcement/",title:"Apache Flink 1.9.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is proud to announce the release of Apache Flink 1.9.0.
The Apache Flink project&rsquo;s goal is to develop a stream processing system to unify and power many forms of real-time and offline data processing applications as well as event-driven applications. In this release, we have made a huge step forward in that effort, by integrating Flink’s stream and batch processing capabilities under a single, unified runtime.
Significant features on this path are batch-style recovery for batch jobs and a preview of the new Blink-based query engine for Table API and SQL queries. We are also excited to announce the availability of the State Processor API, which is one of the most frequently requested features and enables users to read and write savepoints with Flink DataSet jobs. Finally, Flink 1.9 includes a reworked WebUI and previews of Flink’s new Python Table API and its integration with the Apache Hive ecosystem.
This blog post describes all major new features and improvements, important changes to be aware of and what to expect moving forward. For more details, check the complete release changelog.
The binary distribution and source artifacts for this release are now available via the Downloads page of the Flink project, along with the updated documentation. Flink 1.9 is API-compatible with previous 1.x releases for APIs annotated with the @Public annotation.
Please feel encouraged to download the release and share your thoughts with the community through the Flink mailing lists or JIRA. As always, feedback is very much appreciated!
New Features and Improvements # Fine-grained Batch Recovery (FLIP-1) # The time to recover a batch (DataSet, Table API and SQL) job from a task failure was significantly reduced. Until Flink 1.9, task failures in batch jobs were recovered by canceling all tasks and restarting the whole job, i.e., the job was started from scratch and all progress was voided. With this release, Flink can be configured to limit the recovery to only those tasks that are in the same failover region. A failover region is the set of tasks that are connected via pipelined data exchanges. Hence, the batch-shuffle connections of a job define the boundaries of its failover regions. More details are available in FLIP-1. To use this new failover strategy, you need to configure the following settings:
Make sure you have the entry jobmanager.execution.failover-strategy: region in your flink-conf.yaml. Note: The configuration of the 1.9 distribution has that entry by default, but when reusing a configuration file from previous setups, you have to add it manually.
Moreover, you need to set the ExecutionMode of batch jobs in the ExecutionConfig to BATCH so that data shuffles are not pipelined and jobs have more than one failover region.
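Put together, a DataSet job using the new strategy might look like the following minimal sketch (the job body itself is omitted; the flink-conf.yaml entry is the one quoted above):
// flink-conf.yaml (cluster configuration):
//   jobmanager.execution.failover-strategy: region
import org.apache.flink.api.common.ExecutionMode;
import org.apache.flink.api.java.ExecutionEnvironment;

public class RegionFailoverBatchJob {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Avoid pipelined shuffles so that the job is split into more than one failover region.
        env.getConfig().setExecutionMode(ExecutionMode.BATCH);
        // ... define sources, transformations and sinks here ...
        env.execute("batch-job-with-region-failover");
    }
}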
The &ldquo;Region&rdquo; failover strategy also improves the recovery of “embarrassingly parallel” streaming jobs, i.e., jobs without any shuffle like keyBy() or rebalance. When such a job is recovered, only the tasks of the affected pipeline (failover region) are restarted. For all other streaming jobs, the recovery behavior is the same as in prior Flink versions.
State Processor API (FLIP-43) # Up to Flink 1.9, accessing the state of a job from the outside was limited to the (still) experimental Queryable State. This release introduces a new, powerful library to read, write and modify state snapshots using the batch DataSet API. In practice, this means:
Flink job state can be bootstrapped by reading data from external systems, such as external databases, and converting it into a savepoint. State in savepoints can be queried using any of Flink’s batch APIs (DataSet, Table, SQL), for example to analyze relevant state patterns or check for discrepancies in state that can support application auditing or troubleshooting. The schema of state in savepoints can be migrated offline, compared to the previous approach requiring online migration on schema access. Invalid data in savepoints can be identified and corrected. The new State Processor API covers all variations of snapshots: savepoints, full checkpoints and incremental checkpoints. More details are available in FLIP-43
Stop-with-Savepoint (FLIP-34) # Cancelling with a savepoint is a common operation for stopping/restarting, forking or updating Flink jobs. However, the existing implementation did not guarantee output persistence to external storage systems for exactly-once sinks. To improve the end-to-end semantics when stopping a job, Flink 1.9 introduces a new SUSPEND mode to stop a job with a savepoint that is consistent with the emitted data. You can suspend a job with Flink’s CLI client as follows:
bin/flink stop -p [:targetDirectory] :jobId
The final job state is set to FINISHED on success, allowing users to detect failures of the requested operation.
More details are available in FLIP-34
Flink WebUI Rework # After a discussion about modernizing the internals of Flink’s WebUI, this component was reconstructed using the latest stable version of Angular — basically, a bump from Angular 1.x to 7.x. The redesigned version is the default in 1.9.0, however there is a link to switch to the old WebUI.
Note: Moving forward, feature parity for the old version of the WebUI will not be guaranteed.
Preview of the new Blink SQL Query Processor # Following the donation of Blink to Apache Flink, the community worked on integrating Blink’s query optimizer and runtime for the Table API and SQL. As a first step, we refactored the monolithic flink-table module into smaller modules (FLIP-32). This resulted in a clear separation of and well-defined interfaces between the Java and Scala API modules and the optimizer and runtime modules.
Next, we extended Blink’s planner to implement the new optimizer interface such that there are now two pluggable query processors to execute Table API and SQL statements: the pre-1.9 Flink processor and the new Blink-based query processor. The Blink-based query processor offers better SQL coverage (full TPC-H coverage in 1.9, TPC-DS coverage is planned for the next release) and improved performance for batch queries as the result of more extensive query optimization (cost-based plan selection and more optimization rules), improved code-generation, and tuned operator implementations. The Blink-based query processor also provides a more powerful streaming runner, with some new features (e.g. dimension table join, TopN, deduplication) and optimizations to solve data-skew in aggregation and more useful built-in functions.
Note: The semantics and set of supported operations of the query processors are mostly, but not fully aligned.
However, the integration of Blink’s query processor is not fully completed yet. Therefore, the pre-1.9 Flink processor is still the default processor in Flink 1.9 and recommended for production settings. You can enable the Blink processor by configuring it via the EnvironmentSettings when creating a TableEnvironment. The selected processor must be on the classpath of the executing Java process. For cluster setups, both query processors are automatically loaded with the default configuration. When running a query from your IDE you need to explicitly add a planner dependency to your project.
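In code, selecting a planner looks roughly like this (a sketch following the EnvironmentSettings mechanism described above; remember to add the chosen planner dependency to your project when running from the IDE):
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class BlinkPlannerExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Pick the Blink-based query processor; use useOldPlanner() for the pre-1.9 processor.
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();

        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);
        // ... register tables and run Table API / SQL statements with tEnv ...
    }
}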
Other Improvements to the Table API and SQL # Besides the exciting progress around the Blink planner, the community worked on a whole set of other improvements to these interfaces, including:
Scala-free Table API and SQL for Java users (FLIP-32)
As part of the refactoring and splitting of the flink-table module, two separate API modules for Java and Scala were created. For Scala users, nothing really changes, but Java users can use the Table API and/or SQL now without pulling in a Scala dependency.
Rework of the Table API Type System (FLIP-37)
The community implemented a new data type system to detach the Table API from Flink’s TypeInformation class and improve its compliance with the SQL standard. This is still a work in progress and expected to be completed in the next release. In Flink 1.9, UDFs are―among other things―not ported to the new type system yet.
Multi-column and Multi-row Transformations for Table API (FLIP-29)
The functionality of the Table API was extended with a set of transformations that support multi-row and/or multi-column inputs and outputs. These transformations significantly ease the implementation of processing logic that would be cumbersome to implement with relational operators.
New, Unified Catalog APIs (FLIP-30)
We reworked the catalog APIs to store metadata and unified the handling of internal and external catalogs. This effort was mainly initiated as a prerequisite for the Hive integration (see below), but improves the overall convenience of managing catalog metadata in Flink. Besides improving the catalog interfaces, we also extended their functionality. Previously table definitions for Table API or SQL queries were volatile. With Flink 1.9, the metadata of tables which are registered with a SQL DDL statement can be persisted in a catalog. This means you can add a table that is backed by a Kafka topic to a Metastore catalog and from then on query this table whenever your catalog is connected to Metastore.
DDL Support in the SQL API (FLINK-10232)
Up to this point, Flink SQL only supported DML statements (e.g. SELECT, INSERT). External tables (table sources and sinks) had to be registered via Java/Scala code or configuration files. For 1.9, we added support for SQL DDL statements to register and remove tables and views (CREATE TABLE, DROP TABLE). However, we did not add stream-specific syntax extensions to define timestamp extraction and watermark generation, yet. Full support for streaming use cases is planned for the next release.
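A rough sketch of how the new DDL support can be used from Java; the table name, schema, and connector properties below are illustrative placeholders, and a real definition needs connector-specific properties:
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SqlDdlExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().inStreamingMode().build());

        // Register a table backed by an external system via SQL DDL.
        tEnv.sqlUpdate(
            "CREATE TABLE user_actions (" +
            "  user_id BIGINT," +
            "  action VARCHAR" +
            ") WITH (" +
            "  'connector.type' = 'kafka'" +  // placeholder; more connector
            ")");                             // properties are required in practice

        // ... query the table via Table API or SQL ...

        tEnv.sqlUpdate("DROP TABLE user_actions");
    }
}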
Preview of Full Hive Integration (FLINK-10556) # Apache Hive is widely used in Hadoop’s ecosystem to store and query large amounts of structured data. Besides being a query processor, Hive features a catalog called Metastore to manage and organize large datasets. A common integration point for query processors is to integrate with Hive’s Metastore in order to be able to tap into the data managed by Hive.
Recently, the community started implementing an external catalog for Flink’s Table API and SQL that connects to Hive’s Metastore. In Flink 1.9, users will be able to query and process all data that is stored in Hive. As described earlier, you will also be able to persist metadata of Flink tables in Metastore. Moreover, the Hive integration includes support to use Hive’s UDFs in Flink Table API or SQL queries. More details are available in FLINK-10556.
While, previously, table definitions for Table API or SQL queries were always volatile, the new catalog connector additionally allows persisting a table in Metastore that is created with a SQL DDL statement (see above). This means that you connect to Metastore and register a table that is, for example, backed by a Kafka topic. From now on, you can query that table whenever your catalog is connected to Metastore.
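As a sketch of how connecting to a Metastore can look, assuming the flink-connector-hive module is on the classpath; the catalog name, default database, Hive conf directory, and Hive version below are placeholders:
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

public class HiveCatalogExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inBatchMode()
                .build());

        // Connect to an existing Hive Metastore (all arguments are placeholders).
        HiveCatalog hive = new HiveCatalog(
            "myhive", "default", "/opt/hive-conf", "2.3.4");
        tEnv.registerCatalog("myhive", hive);
        tEnv.useCatalog("myhive");

        // Tables registered from now on (e.g. via SQL DDL) are persisted
        // in the Metastore and remain visible to future sessions.
    }
}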
Please note that the Hive support in Flink 1.9 is experimental. We are planning to stabilize these features for the next release and are looking forward to your feedback.
Preview of the new Python Table API (FLIP-38) # This release also introduces a first version of a Python Table API (FLIP-38). This marks the start towards our goal of bringing full-fledged Python support to Flink. The feature was designed as a slim Python API wrapper around the Table API, basically translating Python Table API method calls into Java Table API calls. In the initial version that ships with Flink 1.9, the Python Table API does not support UDFs yet, but just standard relational operations. Support for UDFs implemented in Python is on the roadmap for future releases.
If you’d like to try the new Python API, you have to manually install PyFlink. From there, you can have a look at this walkthrough or explore it on your own. The community is currently working on preparing a pyflink Python package that will be made available for installation via pip.
Important Changes # The Table API and SQL are now part of the default configuration of the Flink distribution. Before, the Table API and SQL had to be enabled by moving the corresponding JAR file from ./opt to ./lib.
The machine learning library (flink-ml) has been removed in preparation for FLIP-39.
The old DataSet and DataStream Python APIs have been removed in favor of FLIP-38.
Flink can be compiled and run on Java 9. Note that certain components interacting with external systems (connectors, filesystems, reporters) may not work since the respective projects may have skipped Java 9 support.
Release Notes # Please review the release notes for a more detailed list of changes and new features if you plan to upgrade your Flink setup to Flink 1.9.0.
List of Contributors # We would like to thank all contributors who have made this release possible:
Abdul Qadeer (abqadeer), Aitozi, Alberto Romero, Aleksey Pak, Alexander Fedulov, Alice Yan, Aljoscha Krettek, Aloys, Andrew Duffy, Andrey Zagrebin, Ankur, Artsem Semianenka, Benchao Li, Biao Liu, Bo WANG, Bowen L, Chesnay Schepler, Clark Yang, Congxian Qiu, Cristian, Danny Chan, David Moravek, Dawid Wysakowicz, Dian Fu, EronWright, Fabian Hueske, Fabio Lombardelli, Fokko Driesprong, Gao Yun, Gary Yao, Gen Luo, Gyula Fora, Hequn Cheng, Hongtao Zhang, Huang Xingbo, HuangXingBo, Hugo Da Cruz Louro, Humberto Rodríguez A, Hwanju Kim, Igal Shilman, Jamie Grier, Jark Wu, Jason, Jasper Yue, Jeff Zhang, Jiangjie (Becket) Qin, Jiezhi.G, Jincheng Sun, Jing Zhang, Jingsong Lee, Juan Gentile, Jungtaek Lim, Kailash Dayanand, Kevin Bohinski, Konstantin Knauf, Konstantinos Papadopoulos, Kostas Kloudas, Kurt Young, Lakshmi, Lakshmi Gururaja Rao, Leeviiii, LouisXu, Maximilian Michels, Nico Kruber, Niels Basjes, Paul Lam, PengFei Li, Peter Huang, Pierre Zemb, Piotr Nowojski, Piyush Narang, Richard Deurwaarder, Robert Metzger, Robert Stoll, Romano Vacca, Rong Rong, Rui Li, Ryantaocer, Scott Mitchell, Seth Wiesman, Shannon Carey, Shimin Yang, Stefan Richter, Stephan Ewen, Stephen Connolly, Steven Wu, SuXingLee, TANG Wen-hui, Thomas Weise, Till Rohrmann, Timo Walther, Tom Goong, TsReaper, Tzu-Li (Gordon) Tai, Ufuk Celebi, Victor Wong, WangHengwei, Wei Zhong, WeiZhong94, Xintong Song, Xpray, XuQianJin-Stars, Xuefu Zhang, Xupingyong, Yangze Guo, Yu Li, Yun Gao, Yun Tang, Zhanchun Zhang, Zhenghua Gao, Zhijiang, Zhu Zhu, Zili Chen, aloys, arganzheng, azagrebin, bd2019us, beyond1920, biao.liub, blueszheng, boshu Zheng, chenqi, chummyhe89, chunpinghe, dcadmin, dianfu, godfrey he, guanghui01.rong, hehuiyuan, hello, hequn8128, jackyyin, joongkeun.yang, klion26, lamber-ken, leesf, liguowei, lincoln-lil, liyafan82, luoqi, mans2singh, maqingxiang, maxin, mjl, okidogi, ozan, potseluev, qiangsi.lq, qiaoran, robbinli, shaoxuan-wang, shengqian.zhou, shenlang.sl, shuai-xu, sunhaibotb, tianchen, tianchen92, tison, tom_gong, vinoyang, vthinkxie, wanggeng3, wenhuitang, winifredtamg, xl38154, xuyang1706, yangfei5, yanghua, yuzhao.cyz, zhangxin516, zhangxinxing, zhaofaxian, zhijiang, zjuwangg, 林小铂, 黄培松, 时无两丶.
`}),e.add({id:182,href:"/2019/07/23/flink-network-stack-vol.-2-monitoring-metrics-and-that-backpressure-thing/",title:"Flink Network Stack Vol. 2: Monitoring, Metrics, and that Backpressure Thing",section:"Flink Blog",content:` In a previous blog post, we presented how Flink’s network stack works from the high-level abstractions to the low-level details. This second blog post in the series of network stack posts extends on this knowledge and discusses monitoring network-related metrics to identify effects such as backpressure or bottlenecks in throughput and latency. Although this post briefly covers what to do with backpressure, the topic of tuning the network stack will be further examined in a future post. If you are unfamiliar with the network stack we highly recommend reading the network stack deep-dive first and then continuing here.
Monitoring # Probably the most important part of network monitoring is monitoring backpressure, a situation where a system is receiving data at a higher rate than it can process¹. Such behaviour will result in the sender being backpressured and may be caused by two things:
The receiver is slow.
This can happen because the receiver is backpressured itself, is unable to keep processing at the same rate as the sender, or is temporarily blocked by garbage collection, lack of system resources, or I/O.
The network channel is slow.
Even though in such case the receiver is not (directly) involved, we call the sender backpressured due to a potential oversubscription on network bandwidth shared by all subtasks running on the same machine. Beware that, in addition to Flink’s network stack, there may be more network users, such as sources and sinks, distributed file systems (checkpointing, network-attached storage), logging, and metrics. A previous capacity planning blog post provides some more insights.
1 In case you are unfamiliar with backpressure and how it interacts with Flink, we recommend reading through this blog post on backpressure from 2015.
If backpressure occurs, it will bubble upstream and eventually reach your sources and slow them down. This is not a bad thing per se and merely states that you lack resources for the current load. However, you may want to improve your job so that it can cope with higher loads without using more resources. In order to do so, you need to find (1) where (at which task/operator) the bottleneck is and (2) what is causing it. Flink offers two mechanisms for identifying where the bottleneck is: directly via Flink’s web UI and its backpressure monitor, or indirectly through some of the network metrics. Flink’s web UI is likely the first entry point for quick troubleshooting but has some disadvantages that we will explain below. On the other hand, Flink’s network metrics are better suited for continuous monitoring and reasoning about the exact nature of the bottleneck causing backpressure. We will cover both in the sections below. In both cases, you need to identify the origin of backpressure from the sources to the sinks. Your starting point for the current and future investigations will most likely be the operator after the last one that is experiencing backpressure. This specific operator is also highly likely to cause the backpressure in the first place.
Backpressure Monitor # The backpressure monitor is only exposed via Flink’s web UI². Since it’s an active component that is only triggered on request, it is currently not available via metrics. The backpressure monitor samples the running tasks’ threads on all TaskManagers via Thread.getStackTrace() and computes the number of samples where tasks were blocked on a buffer request. These tasks were either unable to send network buffers at the rate they were produced, or the downstream task(s) were slow at processing them and gave no credits for sending. The backpressure monitor will show the ratio of blocked to total requests. Since some backpressure is considered normal / temporary, it will show a status of
OK for ratio ≤ 0.10, LOW for 0.10 < ratio ≤ 0.5, and HIGH for 0.5 < ratio ≤ 1. Although you can tune things like the refresh-interval, the number of samples, or the delay between samples, normally, you would not need to touch these since the defaults already give good-enough results.
2 You may also access the backpressure monitor via the REST API: /jobs/:jobid/vertices/:vertexid/backpressure
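The thresholds above are easy to mirror in a few lines; the following illustrative helper (not part of Flink’s API) maps a sampled blocked-to-total ratio to the status shown in the web UI:
public class BackpressureStatus {

    enum Status { OK, LOW, HIGH }

    // Mirror of the thresholds used by the backpressure monitor.
    static Status fromRatio(double blockedToTotalRatio) {
        if (blockedToTotalRatio <= 0.10) {
            return Status.OK;
        } else if (blockedToTotalRatio <= 0.5) {
            return Status.LOW;
        } else {
            return Status.HIGH;
        }
    }

    public static void main(String[] args) {
        System.out.println(fromRatio(0.07)); // OK
        System.out.println(fromRatio(0.30)); // LOW
        System.out.println(fromRatio(0.80)); // HIGH
    }
}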
The backpressure monitor can help you find where (at which task/operator) backpressure originates from. However, it does not support you in further reasoning about the causes of it. Additionally, for larger jobs or higher parallelism, the backpressure monitor becomes too crowded to use and may also take some time to gather all information from all TaskManagers. Please also note that sampling may affect your running job’s performance. Network Metrics # Network and task I/O metrics are more lightweight than the backpressure monitor and are continuously published for each running job. We can leverage those and get even more insights, not only for backpressure monitoring. The most relevant metrics for users are:
up to Flink 1.8: outPoolUsage, inPoolUsage
An estimate on the ratio of buffers used vs. buffers available in the respective local buffer pools. While interpreting inPoolUsage in Flink 1.5 - 1.8 with credit-based flow control, please note that this only relates to floating buffers (exclusive buffers are not part of the pool).
Flink 1.9 and above: outPoolUsage, inPoolUsage, floatingBuffersUsage, exclusiveBuffersUsage
An estimate on the ratio of buffers used vs. buffers available in the respective local buffer pools. Starting with Flink 1.9, inPoolUsage is the sum of floatingBuffersUsage and exclusiveBuffersUsage.
numRecordsOut, numRecordsIn
Each metric comes with two scopes: one scoped to the operator and one scoped to the subtask. For network monitoring, the subtask-scoped metric is relevant and shows the total number of records it has sent/received. You may need to further look into these figures to extract the number of records within a certain time span or use the equivalent …PerSecond metrics.
numBytesOut, numBytesInLocal, numBytesInRemote
The total number of bytes this subtask has emitted or read from a local/remote source. These are also available as meters via …PerSecond metrics.
numBuffersOut, numBuffersInLocal, numBuffersInRemote
Similar to numBytes… but counting the number of network buffers.
Warning For the sake of completeness and since they have been used in the past, we will briefly look at the \`outputQueueLength\` and \`inputQueueLength\` metrics. These are somewhat similar to the \`[out,in]PoolUsage\` metrics but show the number of buffers sitting in a sender subtask’s output queues and in a receiver subtask’s input queues, respectively. Reasoning about absolute numbers of buffers, however, is difficult and there is also a special subtlety with local channels: since a local input channel does not have its own queue (it works with the output queue directly), its value will always be \`0\` for that channel (see [FLINK-12576](https://issues.apache.org/jira/browse/FLINK-12576)) and for the case where you only have local input channels, then \`inputQueueLength = 0\`. Overall, we discourage the use of outputQueueLength and inputQueueLength because their interpretation highly depends on the current parallelism of the operator and the configured numbers of exclusive and floating buffers. Instead, we recommend using the various *PoolUsage metrics which even reveal more detailed insight.
Note If you reason about buffer usage, please keep the following in mind: Any outgoing channel which has been used at least once will always occupy one buffer (since Flink 1.5).
up to Flink 1.8: This buffer (even if empty!) was always counted as a backlog of 1 and thus receivers tried to reserve a floating buffer for it.
Flink 1.9 and above: A buffer is only counted in the backlog if it is ready for consumption, i.e. it is full or was flushed (see FLINK-11082).
The receiver will only release a received buffer after deserialising the last record in it. The following sections make use of and combine these metrics to reason about backpressure and resource usage / efficiency with respect to throughput. A separate section will detail latency-related metrics.
Backpressure # Backpressure may be indicated by two different sets of metrics: (local) buffer pool usages as well as input/output queue lengths. They provide a different level of granularity but, unfortunately, none of these are exhaustive and there is room for interpretation. Because of the inherent problems with interpreting these queue lengths we will focus on the usage of input and output pools below which also provides more detail.
If a subtask’s outPoolUsage is 100%, it is backpressured. Whether the subtask is already blocking or is still writing records into network buffers depends on how full the buffers are that the RecordWriters are currently writing into.
This is different from what the backpressure monitor is showing!
An inPoolUsage of 100% means that all floating buffers are assigned to channels and eventually backpressure will be exercised upstream. These floating buffers are in one of the following states: they are reserved for future use on a channel due to an exclusive buffer being utilised (remote input channels always try to maintain #exclusive buffers credits), they are reserved for a sender’s backlog and wait for data, they may contain data and are enqueued in an input channel, or they may contain data and are being read by the receiver’s subtask (one record at a time).
up to Flink 1.8: Due to FLINK-11082, an inPoolUsage of 100% is quite common even in normal situations.
Flink 1.9 and above: If inPoolUsage is constantly around 100%, this is a strong indicator for exercising backpressure upstream.
The following table summarises all combinations and their interpretation. Bear in mind, though, that backpressure may be minor or temporary (no need to look into it), on particular channels only, or caused by other JVM processes on a particular TaskManager, such as GC, synchronisation, I/O, resource shortage, instead of a specific subtask.
|  | outPoolUsage low | outPoolUsage high |
| --- | --- | --- |
| inPoolUsage low |  | (backpressured, temporary situation: upstream is not backpressured yet or not anymore) |
| inPoolUsage high (Flink 1.9+) | if all upstream tasks’ outPoolUsage are low: (may eventually cause backpressure); if any upstream task’s outPoolUsage is high: (may exercise backpressure upstream and may be the source of backpressure) | (backpressured by downstream task(s) or network, probably forwarding backpressure upstream) |

We may even reason more about the cause of backpressure by looking at the network metrics of the subtasks of two consecutive tasks: If all subtasks of the receiver task have low inPoolUsage values and any upstream subtask’s outPoolUsage is high, then there may be a network bottleneck causing backpressure. Since network is a shared resource among all subtasks of a TaskManager, this may not directly originate from this subtask, but rather from various concurrent operations, e.g. checkpoints, other streams, external connections, or other TaskManagers/processes on the same machine. Backpressure can also be caused by all parallel instances of a task or by a single task instance. The first usually happens because the task is performing some time consuming operation that applies to all input partitions. The latter is usually the result of some kind of skew, either data skew or resource availability/allocation skew. In either case, you can find some hints on how to handle such situations in the What to do with backpressure? box below.
Flink 1.9 and above # If floatingBuffersUsage is not 100%, it is unlikely that there is backpressure. If it is 100% and any upstream task is backpressured, it suggests that this input is exercising backpressure on either a single, some or all input channels. To differentiate between those three situations you can use exclusiveBuffersUsage: Assuming that floatingBuffersUsage is around 100%, the higher the exclusiveBuffersUsage the more input channels are backpressured. In an extreme case of exclusiveBuffersUsage being close to 100%, it means that all channels are backpressured.

The relation between \`exclusiveBuffersUsage\`, \`floatingBuffersUsage\`, and the upstream tasks' \`outPoolUsage\` is summarised in the following table and extends on the table above with \`inPoolUsage = floatingBuffersUsage + exclusiveBuffersUsage\`:

|  | exclusiveBuffersUsage low | exclusiveBuffersUsage high |
| --- | --- | --- |
| floatingBuffersUsage low + all upstream outPoolUsage low |  | -³ |
| floatingBuffersUsage low + any upstream outPoolUsage high | (potential network bottleneck) | -³ |
| floatingBuffersUsage high + all upstream outPoolUsage low | (backpressure eventually appears on only some of the input channels) | (backpressure eventually appears on most or all of the input channels) |
| floatingBuffersUsage high + any upstream outPoolUsage high | (backpressure on only some of the input channels) | (backpressure on most or all of the input channels) |

³ this should not happen
Resource Usage / Throughput # Besides the obvious use of each individual metric mentioned above, there are also a few combinations providing useful insight into what is happening in the network stack:
Low throughput with frequent outPoolUsage values around 100% but low inPoolUsage on all receivers is an indicator that the round-trip-time of our credit-notification (depends on your network’s latency) is too high for the default number of exclusive buffers to make use of your bandwidth. Consider increasing the buffers-per-channel parameter or try disabling credit-based flow control to verify.
Combining numRecordsOut and numBytesOut helps to identify the average serialised record size, which supports you in capacity planning for peak scenarios (see the short example after this list).
If you want to reason about buffer fill rates and the influence of the output flusher, you may combine numBytesInRemote with numBuffersInRemote. When tuning for throughput (and not latency!), low buffer fill rates may indicate reduced network efficiency. In such cases, consider increasing the buffer timeout. Please note that, as of Flink 1.8 and 1.9, numBuffersOut only increases for buffers getting full or for an event cutting off a buffer (e.g. a checkpoint barrier) and may lag behind. Please also note that reasoning about buffer fill rates on local channels is unnecessary since buffering is an optimisation technique for remote channels with limited effect on local channels.
You may also separate local from remote traffic using numBytesInLocal and numBytesInRemote but in most cases this is unnecessary.
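As a back-of-the-envelope illustration of the record-size calculation mentioned above (the metric values are made up for this sketch):
public class AvgRecordSize {
    public static void main(String[] args) {
        // Example values read from a subtask's numBytesOut and numRecordsOut metrics.
        long numBytesOut = 1_250_000_000L;
        long numRecordsOut = 5_000_000L;

        long avgSerialisedRecordSize = numBytesOut / numRecordsOut;
        System.out.println(avgSerialisedRecordSize + " bytes per record on average"); // 250
    }
}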
What to do with Backpressure? # Assuming that you identified where the source of backpressure — a bottleneck — is located, the next step is to analyse why this is happening. Below, we list some potential causes of backpressure from the more basic to the more complex ones. We recommend checking the basic causes first before diving deeper into the more complex ones and potentially drawing false conclusions.
Please also recall that backpressure might be temporary and the result of a load spike, checkpointing, or a job restart with a data backlog waiting to be processed. In that case, you can often just ignore it. Alternatively, keep in mind that the process of analysing and solving the issue can be affected by the intermittent nature of your bottleneck. Having said that, here are a couple of things to check.
System Resources # Firstly, you should check the incriminated machines’ basic resource usage like CPU, network, or disk I/O. If some resource is fully or heavily utilised you can do one of the following:
Try to optimise your code. Code profilers are helpful in this case.
Tune Flink for that specific resource.
Scale out by increasing the parallelism and/or increasing the number of machines in the cluster.
Garbage Collection # Oftentimes, performance issues arise from long GC pauses. You can verify whether you are in such a situation by either printing debug GC logs (via -XX:+PrintGCDetails) or by using some memory/GC profilers. Since dealing with GC issues is highly application-dependent and independent of Flink, we will not go into details here (Oracle’s Garbage Collection Tuning Guide or Plumbr’s Java Garbage Collection handbook seem like a good start).
CPU/Thread Bottleneck # Sometimes a CPU bottleneck is not visible at first glance because the overall CPU usage of the machine remains relatively low while one or a couple of threads are fully saturated. For instance, a single CPU-bottlenecked thread on a 48-core machine would result in only about 2% CPU use. Consider using code profilers for this as they can identify hot threads by showing each thread’s CPU usage, for example.
Thread Contention # Similarly to the CPU/thread bottleneck issue above, a subtask may be bottlenecked due to high thread contention on shared resources. Again, CPU profilers are your best friend here! Consider looking for synchronisation overhead / lock contention in user code — although adding synchronisation in user code should be avoided and may even be dangerous! Also consider investigating shared system resources. The default JVM’s SSL implementation, for example, can become contended around the shared /dev/urandom resource.
Load Imbalance # If your bottleneck is caused by data skew, you can try to remove it or mitigate its impact by changing the data partitioning to separate heavy keys or by implementing local/pre-aggregation.
This list is far from exhaustive. Generally, in order to reduce a bottleneck and thus backpressure, first analyse where it is happening and then find out why. The best place to start reasoning about the “why” is by checking what resources are fully utilised. Latency Tracking # Tracking latencies at the various locations they may occur is a topic of its own. In this section, we will focus on the time records wait inside Flink’s network stack — including the system’s network connections. In low throughput scenarios, these latencies are influenced directly by the output flusher via the buffer timeout parameter or indirectly by any application code latencies. When processing a record takes longer than expected or when (multiple) timers fire at the same time — and block the receiver from processing incoming records — the time inside the network stack for following records is extended dramatically. We highly recommend adding your own metrics to your Flink job for better latency tracking in your job’s components and a broader view on the cause of delays.
Flink offers some support for tracking the latency of records passing through the system (outside of user code). However, this is disabled by default (see below why!) and must be enabled by setting a latency tracking interval either in Flink’s configuration via metrics.latency.interval or via ExecutionConfig#setLatencyTrackingInterval(). Once enabled, Flink will collect latency histograms based on the granularity defined via metrics.latency.granularity:
single: one histogram for each operator subtask
operator (default): one histogram for each combination of source task and operator subtask
subtask: one histogram for each combination of source subtask and operator subtask (quadratic in the parallelism!)
These metrics are collected through special “latency markers”: each source subtask will periodically emit a special record containing the timestamp of its creation. The latency markers then flow alongside normal records while not overtaking them on the wire or inside a buffer queue. However, a latency marker does not enter application logic and overtakes records there. Latency markers therefore only measure the waiting times between the pieces of user code and not a full “end-to-end” latency. User code indirectly influences these waiting times, though!
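For completeness, here is a minimal sketch of enabling latency tracking programmatically; the 60-second interval is just an example value, and the granularity can alternatively be set via metrics.latency.granularity in the Flink configuration:
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LatencyTrackingExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // Emit a latency marker from every source subtask every 60 seconds.
        env.getConfig().setLatencyTrackingInterval(60_000L);

        env.fromElements(1, 2, 3).print();
        env.execute("latency-tracking-example");
    }
}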
Since LatencyMarkers sit in network buffers just like normal records, they will also wait for the buffer to be full or flushed due to buffer timeouts. When a channel is on high load, there is no added latency by the network buffering data. However, as soon as one channel is under low load, records and latency markers will experience an expected average delay of at most buffer_timeout / 2. This delay will add to each network connection towards a subtask and should be taken into account when analysing a subtask’s latency metric.
By looking at the exposed latency tracking metrics for each subtask, for example at the 95th percentile, you should nevertheless be able to identify subtasks which are adding substantially to the overall source-to-sink latency and continue with optimising there.
Note Flink's latency markers assume that the clocks on all machines in the cluster are in sync. We recommend setting up an automated clock synchronisation service (like NTP) to avoid false latency results. Warning Enabling latency metrics can significantly impact the performance of the cluster (in particular for \`subtask\` granularity) due to the sheer amount of metrics being added as well as the use of histograms which are quite expensive to maintain. It is highly recommended to only use them for debugging purposes. Conclusion # In the previous sections we discussed how to monitor Flink’s network stack which primarily involves identifying backpressure: where it occurs, where it originates from, and (potentially) why it occurs. This can be done in two ways: for simple cases and debugging sessions by using the backpressure monitor; for continuous monitoring, more in-depth analysis, and less runtime overhead by using Flink’s task and network stack metrics. Backpressure can be caused by the network layer itself but, in most cases, is caused by some subtask under high load. These two scenarios can be distinguished from one another by analysing the metrics as described above. We also provided some hints for monitoring resource usage and tracking network latencies that may add up from sources to sinks.
Stay tuned for the third blog post in the series of network stack posts that will focus on tuning techniques and anti-patterns to avoid.
`}),e.add({id:183,href:"/2019/07/02/apache-flink-1.8.1-released/",title:"Apache Flink 1.8.1 Released",section:"Flink Blog",content:`The Apache Flink community released the first bugfix version of the Apache Flink 1.8 series.
This release includes more than 40 fixes and minor improvements for Flink 1.8.1. The list below gives a detailed overview of all improvements, sub-tasks and bug fixes.
We highly recommend all users upgrade to Flink 1.8.1.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.8.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.8.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.8.1</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task
[FLINK-10921] - Prioritize shard consumers in Kinesis Consumer by event time
[FLINK-12617] - StandaloneJobClusterEntrypoint should default to random JobID for non-HA setups
Bug
[FLINK-9445] - scala-shell uses plain java command
[FLINK-10455] - Potential Kafka producer leak in case of failures
[FLINK-10941] - Slots prematurely released which still contain unconsumed data
[FLINK-11059] - JobMaster may continue using an invalid slot if releasing idle slot meet a timeout
[FLINK-11107] - Avoid memory stateBackend to create arbitrary folders under HA path when no checkpoint path configured
[FLINK-11897] - ExecutionGraphSuspendTest does not wait for all tasks to be submitted
[FLINK-11915] - DataInputViewStream skip returns wrong value
[FLINK-11987] - Kafka producer occasionally throws NullpointerException
[FLINK-12009] - Wrong check message about heartbeat interval for HeartbeatServices
[FLINK-12042] - RocksDBStateBackend mistakenly uses default filesystem
[FLINK-12112] - AbstractTaskManagerProcessFailureRecoveryTest process output logging does not work properly
[FLINK-12132] - The example in /docs/ops/deployment/yarn_setup.md should be updated due to the change FLINK-2021
[FLINK-12184] - HistoryServerArchiveFetcher isn't compatible with old version
[FLINK-12219] - Yarn application can't stop when flink job failed in per-job yarn cluster mode
[FLINK-12247] - fix NPE when writing an archive file to a FileSystem
[FLINK-12260] - Slot allocation failure by taskmanager registration timeout and race
[FLINK-12296] - Data loss silently in RocksDBStateBackend when more than one operator(has states) chained in a single task
[FLINK-12297] - Make ClosureCleaner recursive
[FLINK-12301] - Scala value classes inside case classes cannot be serialized anymore in Flink 1.8.0
[FLINK-12342] - Yarn Resource Manager Acquires Too Many Containers
[FLINK-12375] - flink-container job jar does not have read permissions
[FLINK-12416] - Docker build script fails on symlink creation ln -s
[FLINK-12544] - Deadlock while releasing memory and requesting segment concurrent in SpillableSubpartition
[FLINK-12547] - Deadlock when the task thread downloads jars using BlobClient
[FLINK-12646] - Use reserved IP as unrouteable IP in RestClientTest
[FLINK-12688] - Make serializer lazy initialization thread safe in StateDescriptor
[FLINK-12740] - SpillableSubpartitionTest deadlocks on Travis
[FLINK-12835] - Time conversion is wrong in ManualClock
[FLINK-12863] - Race condition between slot offerings and AllocatedSlotReport
[FLINK-12865] - State inconsistency between RM and TM on the slot status
[FLINK-12871] - Wrong SSL setup examples in docs
[FLINK-12895] - TaskManagerProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure failed on travis
[FLINK-12896] - TaskCheckpointStatisticDetailsHandler uses wrong value for JobID when archiving
Improvement
[FLINK-11126] - Filter out AMRMToken in the TaskManager credentials
[FLINK-12137] - Add more proper explanation on flink streaming connectors
[FLINK-12169] - Improve Javadoc of MessageAcknowledgingSourceBase
[FLINK-12378] - Consolidate FileSystem Documentation
[FLINK-12391] - Add timeout to transfer.sh
[FLINK-12539] - StreamingFileSink: Make the class extendable to customize for different usecases
Test
[FLINK-12350] - RocksDBStateBackendTest doesn't cover the incremental checkpoint code path
Task
[FLINK-12460] - Change taskmanager.tmp.dirs to io.tmp.dirs in configuration docs
`}),e.add({id:184,href:"/2019/06/26/a-practical-guide-to-broadcast-state-in-apache-flink/",title:"A Practical Guide to Broadcast State in Apache Flink",section:"Flink Blog",content:`Since version 1.5.0, Apache Flink features a new type of state which is called Broadcast State. In this post, we explain what Broadcast State is, and show an example of how it can be applied to an application that evaluates dynamic patterns on an event stream. We walk you through the processing steps and the source code to implement this application in practice.
What is Broadcast State? # The Broadcast State can be used to combine and jointly process two streams of events in a specific way. The events of the first stream are broadcasted to all parallel instances of an operator, which maintains them as state. The events of the other stream are not broadcasted but sent to individual instances of the same operator and processed together with the events of the broadcasted stream. The new broadcast state is a natural fit for applications that need to join a low-throughput and a high-throughput stream or need to dynamically update their processing logic. We will use a concrete example of the latter use case to explain the broadcast state and show its API in more detail in the remainder of this post.
Dynamic Pattern Evaluation with Broadcast State # Imagine an e-commerce website that captures the interactions of all users as a stream of user actions. The company that operates the website is interested in analyzing the interactions to increase revenue, improve the user experience, and detect and prevent malicious behavior. The website implements a streaming application that detects a pattern on the stream of user events. However, the company wants to avoid modifying and redeploying the application every time the pattern changes. Instead, the application ingests a second stream of patterns and updates its active pattern when it receives a new pattern from the pattern stream. In the following, we discuss this application step-by-step and show how it leverages the broadcast state feature in Apache Flink.
Our example application ingests two data streams. The first stream provides user actions on the website and is illustrated on the top left side of the above figure. A user interaction event consists of the type of the action (user login, user logout, add to cart, or complete payment) and the id of the user, which is encoded by color. The user action event stream in our illustration contains a logout action of User 1001 followed by a payment-complete event for User 1003, and an “add-to-cart” action of User 1002.
The second stream provides action patterns that the application will evaluate. A pattern consists of two consecutive actions. In the figure above, the pattern stream contains the following two:
Pattern #1: A user logs in and immediately logs out without browsing additional pages on the e-commerce website. Pattern #2: A user adds an item to the shopping cart and logs out without completing the purchase. Such patterns help a business in better analyzing user behavior, detecting malicious actions, and improving the website experience. For example, in the case of items being added to a shopping cart with no follow up purchase, the website team can take appropriate actions to understand better the reasons why users don’t complete a purchase and initiate specific programs to improve the website conversion (such as providing discount codes, limited free shipping offers etc.)
On the right-hand side, the figure shows three parallel tasks of an operator that ingest the pattern and user action streams, evaluate the patterns on the action stream, and emit pattern matches downstream. For the sake of simplicity, the operator in our example only evaluates a single pattern with exactly two subsequent actions. The currently active pattern is replaced when a new pattern is received from the pattern stream. In principle, the operator could also be implemented to evaluate more complex patterns or multiple patterns concurrently which could be individually added or removed.
We will describe how the pattern matching application processes the user action and pattern streams.
First a pattern is sent to the operator. The pattern is broadcasted to all three parallel tasks of the operator. The tasks store the pattern in their broadcast state. Since the broadcast state should only be updated using broadcasted data, the state of all tasks is always expected to be the same.
Next, the first user actions are partitioned on the user id and shipped to the operator tasks. The partitioning ensures that all actions of the same user are processed by the same task. The figure above shows the state of the application after the first pattern and the first three action events were consumed by the operator tasks.
When a task receives a new user action, it evaluates the currently active pattern by looking at the user’s latest and previous actions. For each user, the operator stores the previous action in the keyed state. Since the tasks in the figure above only received a single action for each user so far (we just started the application), the pattern does not need to be evaluated. Finally, the previous action in the user’s keyed state is updated to the latest action, to be able to look it up when the next action of the same user arrives.
After the first three actions are processed, the next event, the logout action of User 1001, is shipped to the task that processes the events of User 1001. When the task receives the actions, it looks up the current pattern from the broadcast state and the previous action of User 1001. Since the pattern matches both actions, the task emits a pattern match event. Finally, the task updates its keyed state by overriding the previous event with the latest action.
When a new pattern arrives in the pattern stream, it is broadcasted to all tasks and each task updates its broadcast state by replacing the current pattern with the new one.
Once the broadcast state is updated with a new pattern, the matching logic continues as before, i.e., user action events are partitioned by key and evaluated by the responsible task.
How to Implement an Application with Broadcast State? # Until now, we conceptually discussed the application and explained how it uses broadcast state to evaluate dynamic patterns over event streams. Next, we’ll show how to implement the example application with Flink’s DataStream API and the broadcast state feature.
Let’s start with the input data of the application. We have two data streams, actions and patterns. At this point, we don’t really care where the streams come from. The streams could be ingested from Apache Kafka or Kinesis or any other system.
DataStream<Action> actions = ???
DataStream<Pattern> patterns = ???
Action and Pattern are Pojos with two fields each:
Action: Long userId, String action
Pattern: String firstAction, String secondAction
As a first step, we key the action stream on the userId attribute.
KeyedStream<Action, Long> actionsByUser = actions
  .keyBy((KeySelector<Action, Long>) action -> action.userId);
Next, we prepare the broadcast state. Broadcast state is always represented as MapState, the most versatile state primitive that Flink provides.
MapStateDescriptor<Void, Pattern> bcStateDescriptor =
  new MapStateDescriptor<>("patterns", Types.VOID, Types.POJO(Pattern.class));
Since our application only evaluates and stores a single Pattern at a time, we configure the broadcast state as a MapState with key type Void and value type Pattern. The Pattern is always stored in the MapState with null as key.
BroadcastStream<Pattern> bcedPatterns = patterns.broadcast(bcStateDescriptor);
Using the MapStateDescriptor for the broadcast state, we apply the broadcast() transformation on the patterns stream and receive a BroadcastStream bcedPatterns.
DataStream<Tuple2<Long, Pattern>> matches = actionsByUser
  .connect(bcedPatterns)
  .process(new PatternEvaluator());
After we obtained the keyed actionsByUser stream and the broadcasted bcedPatterns stream, we connect() both streams and apply a PatternEvaluator on the connected streams. PatternEvaluator is a custom function that implements the KeyedBroadcastProcessFunction interface. It applies the pattern matching logic that we discussed before and emits Tuple2<Long, Pattern> records which contain the user id and the matched pattern.
public static class PatternEvaluator
    extends KeyedBroadcastProcessFunction<Long, Action, Pattern, Tuple2<Long, Pattern>> {

  // handle for keyed state (per user)
  ValueState<String> prevActionState;
  // broadcast state descriptor
  MapStateDescriptor<Void, Pattern> patternDesc;

  @Override
  public void open(Configuration conf) {
    // initialize keyed state
    prevActionState = getRuntimeContext().getState(
      new ValueStateDescriptor<>("lastAction", Types.STRING));
    patternDesc =
      new MapStateDescriptor<>("patterns", Types.VOID, Types.POJO(Pattern.class));
  }

  /**
   * Called for each user action.
   * Evaluates the current pattern against the previous and
   * current action of the user.
   */
  @Override
  public void processElement(
      Action action,
      ReadOnlyContext ctx,
      Collector<Tuple2<Long, Pattern>> out) throws Exception {
    // get current pattern from broadcast state
    Pattern pattern = ctx
      .getBroadcastState(this.patternDesc)
      // access MapState with null as VOID default value
      .get(null);
    // get previous action of current user from keyed state
    String prevAction = prevActionState.value();
    if (pattern != null && prevAction != null) {
      // user had an action before, check if pattern matches
      if (pattern.firstAction.equals(prevAction) &&
          pattern.secondAction.equals(action.action)) {
        // MATCH
        out.collect(new Tuple2<>(ctx.getCurrentKey(), pattern));
      }
    }
    // update keyed state and remember action for next pattern evaluation
    prevActionState.update(action.action);
  }

  /**
   * Called for each new pattern.
   * Overwrites the current pattern with the new pattern.
   */
  @Override
  public void processBroadcastElement(
      Pattern pattern,
      Context ctx,
      Collector<Tuple2<Long, Pattern>> out) throws Exception {
    // store the new pattern by updating the broadcast state
    BroadcastState<Void, Pattern> bcState = ctx.getBroadcastState(patternDesc);
    // storing in MapState with null as VOID default value
    bcState.put(null, pattern);
  }
}
The KeyedBroadcastProcessFunction interface provides three methods to process records and emit results.
processBroadcastElement() is called for each record of the broadcasted stream. In our PatternEvaluator function, we simply put the received Pattern record into the broadcast state using the null key (remember, we only store a single pattern in the MapState). processElement() is called for each record of the keyed stream. It provides read-only access to the broadcast state to prevent modifications that would result in different broadcast states across the parallel instances of the function. The processElement() method of the PatternEvaluator retrieves the current pattern from the broadcast state and the previous action of the user from the keyed state. If both are present, it checks whether the previous and current action match with the pattern and emits a pattern match record if that is the case. Finally, it updates the keyed state to the current user action. onTimer() is called when a previously registered timer fires. Timers can be registered in the processElement method and are used to perform computations or to clean up state in the future. We did not implement this method in our example to keep the code concise. However, it could be used to remove the last action of a user when the user was not active for a certain period of time to avoid growing state due to inactive users. You might have noticed the context objects of the KeyedBroadcastProcessFunction’s processing method. The context objects give access to additional functionality such as:
The broadcast state (read-write or read-only, depending on the method), A TimerService, which gives access to the record’s timestamp, the current watermark, and which can register timers, The current key (only available in processElement()), and A method to apply a function to the keyed state of each registered key (only available in processBroadcastElement()) The KeyedBroadcastProcessFunction has full access to Flink state and time features just like any other ProcessFunction and hence can be used to implement sophisticated application logic. Broadcast state was designed to be a versatile feature that adapts to different scenarios and use cases. Although we only discussed a fairly simple and restricted application, you can use broadcast state in many ways to implement the requirements of your application.
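To make the state clean-up idea mentioned above a bit more concrete, here is a rough, hypothetical sketch of additions to the PatternEvaluator shown earlier that register a timer per user and clear the keyed state of inactive users; the timeout constant and the timer handling are assumptions of this sketch, not part of the original example:
// Hypothetical additions to the PatternEvaluator class above.
private static final long INACTIVITY_TIMEOUT_MS = 60 * 60 * 1000L; // 1 hour (example value)

@Override
public void processElement(
    Action action,
    ReadOnlyContext ctx,
    Collector<Tuple2<Long, Pattern>> out) throws Exception {
  // ... pattern matching logic as shown above ...

  // (Re-)register a processing-time timer for the current user.
  // A production version would also delete the previously registered timer.
  ctx.timerService().registerProcessingTimeTimer(
      ctx.timerService().currentProcessingTime() + INACTIVITY_TIMEOUT_MS);
}

@Override
public void onTimer(
    long timestamp,
    OnTimerContext ctx,
    Collector<Tuple2<Long, Pattern>> out) throws Exception {
  // Drop the previous action of a user who has been inactive, keeping the keyed state small.
  prevActionState.clear();
}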
Conclusion # In this blog post, we walked you through an example application to explain what Apache Flink’s broadcast state is and how it can be used to evaluate dynamic patterns on event streams. We’ve also discussed the API and showed the source code of our example application.
We invite you to check the documentation of this feature and provide feedback or suggestions for further improvements through our mailing list.
`}),e.add({id:185,href:"/2019/06/05/a-deep-dive-into-flinks-network-stack/",title:"A Deep-Dive into Flink's Network Stack",section:"Flink Blog",content:` Flink’s network stack is one of the core components that make up the flink-runtime module and sit at the heart of every Flink job. It connects individual work units (subtasks) from all TaskManagers. This is where your streamed-in data flows through and it is therefore crucial to the performance of your Flink job for both the throughput as well as latency you observe. In contrast to the coordination channels between TaskManagers and JobManagers which are using RPCs via Akka, the network stack between TaskManagers relies on a much lower-level API using Netty.
This blog post is the first in a series of posts about the network stack. In the sections below, we will first have a high-level look at what abstractions are exposed to the stream operators and then go into detail on the physical implementation and various optimisations Flink did. We will briefly present the result of these optimisations and Flink’s trade-off between throughput and latency. Future blog posts in this series will elaborate more on monitoring and metrics, tuning parameters, and common anti-patterns.
Logical View # Flink’s network stack provides the following logical view to the subtasks when communicating with each other, for example during a network shuffle as required by a keyBy().
It abstracts over the different settings of the following three concepts:
Subtask output type (ResultPartitionType):
pipelined (bounded or unbounded): Sending data downstream as soon as it is produced, potentially one-by-one, either as a bounded or unbounded stream of records.
blocking: Sending data downstream only when the full result was produced.
Scheduling type:
all at once (eager): Deploy all subtasks of the job at the same time (for streaming applications).
next stage on first output (lazy): Deploy downstream tasks as soon as any of their producers generated output.
next stage on complete output: Deploy downstream tasks when any or all of their producers have generated their full output set.
Transport:
high throughput: Instead of sending each record one-by-one, Flink buffers a bunch of records into its network buffers and sends them altogether. This reduces the costs per record and leads to higher throughput.
low latency via buffer timeout: By reducing the timeout of sending an incompletely filled buffer, you may sacrifice throughput for latency.
We will have a look at the throughput and low latency optimisations in the sections below which look at the physical layers of the network stack. For this part, let us elaborate a bit more on the output and scheduling types. First of all, it is important to know that the subtask output type and the scheduling type are closely intertwined making only specific combinations of the two valid.
Pipelined result partitions are streaming-style outputs which need a live target subtask to send data to. The target can be scheduled before results are produced or at first output. Batch jobs produce bounded result partitions while streaming jobs produce unbounded results.
Batch jobs may also produce results in a blocking fashion, depending on the operator and connection pattern that is used. In that case, the complete result must be produced first before the receiving task can be scheduled. This allows batch jobs to work more efficiently and with lower resource usage.
The following table summarises the valid combinations:

| Output Type | Scheduling Type | Applies to… |
| --- | --- | --- |
| pipelined, unbounded | all at once | Streaming jobs |
| pipelined, unbounded | next stage on first output | n/a¹ |
| pipelined, bounded | all at once | n/a² |
| pipelined, bounded | next stage on first output | Batch jobs |
| blocking | next stage on complete output | Batch jobs |

1 Currently not used by Flink. 2 This may become applicable to streaming jobs once the Batch/Streaming unification is done.
Additionally, for subtasks with more than one input, scheduling can start in two ways: after *all* or after *any* input producers have produced a record/their complete dataset. For tuning the output types and scheduling decisions in batch jobs, please have a look at [ExecutionConfig#setExecutionMode()](//nightlies.apache.org/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionConfig.html#setExecutionMode-org.apache.flink.api.common.ExecutionMode-) - and [ExecutionMode](//nightlies.apache.org/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionMode.html#enum.constant.detail) in particular - as well as [ExecutionConfig#setDefaultInputDependencyConstraint()](//nightlies.apache.org/flink/flink-docs-release-1.8/api/java/org/apache/flink/api/common/ExecutionConfig.html#setDefaultInputDependencyConstraint-org.apache.flink.api.common.InputDependencyConstraint-). Physical Transport # In order to understand the physical data connections, please recall that, in Flink, different tasks may share the same slot via slot sharing groups. TaskManagers may also provide more than one slot to allow multiple subtasks of the same task to be scheduled onto the same TaskManager.
For the example pictured below, we will assume a parallelism of 4 and a deployment with two task managers offering 2 slots each. TaskManager 1 executes subtasks A.1, A.2, B.1, and B.2 and TaskManager 2 executes subtasks A.3, A.4, B.3, and B.4. In a shuffle-type connection between task A and task B, for example from a keyBy(), there are 2x4 logical connections to handle on each TaskManager, some of which are local, some remote:

|  | B.1 | B.2 | B.3 | B.4 |
| --- | --- | --- | --- | --- |
| A.1 | local | local | remote | remote |
| A.2 | local | local | remote | remote |
| A.3 | remote | remote | local | local |
| A.4 | remote | remote | local | local |

Each (remote) network connection between different tasks will get its own TCP channel in Flink’s network stack. However, if different subtasks of the same task are scheduled onto the same TaskManager, their network connections towards the same TaskManagers will be multiplexed and share a single TCP channel for reduced resource usage. In our example, this would apply to A.1 → B.3, A.1 → B.4, as well as A.2 → B.3, and A.2 → B.4 as pictured below: The results of each subtask are called ResultPartition, each split into separate ResultSubpartitions — one for each logical channel. At this point in the stack, Flink is not dealing with individual records anymore but instead with a group of serialised records assembled together into network buffers. The number of buffers available to each subtask in its own local buffer pool (one per sending and receiving side each) is limited to at most
#channels * buffers-per-channel + floating-buffers-per-gate The total number of buffers on a single TaskManager usually does not need configuration. See the Configuring the Network Buffers documentation for details on how to do so if needed.
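As a quick worked example of the formula above, assuming the default settings of 2 exclusive buffers per channel and 8 floating buffers per gate (these defaults may differ in your setup), a gate with 4 channels would be limited to 16 buffers:
public class BufferLimitExample {
    public static void main(String[] args) {
        int channels = 4;               // e.g. 4 (remote) channels towards one gate
        int buffersPerChannel = 2;      // exclusive buffers per channel (assumed default)
        int floatingBuffersPerGate = 8; // shared floating buffers (assumed default)

        int maxBuffers = channels * buffersPerChannel + floatingBuffersPerGate;
        System.out.println(maxBuffers); // 16
    }
}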
Inflicting Backpressure (1) # Whenever a subtask’s sending buffer pool is exhausted — buffers reside in either a result subpartition’s buffer queue or inside the lower, Netty-backed network stack — the producer is blocked, cannot continue, and experiences backpressure. The receiver works in a similar fashion: any incoming Netty buffer in the lower network stack needs to be made available to Flink via a network buffer. If there is no network buffer available in the appropriate subtask’s buffer pool, Flink will stop reading from this channel until a buffer becomes available. This would effectively backpressure all sending subtasks on this multiplex and therefore also throttle other receiving subtasks. The following picture illustrates this for an overloaded subtask B.4 which would cause backpressure on the multiplex and also stop subtask B.3 from receiving and processing further buffers, even though it still has capacity.
To prevent this situation from even happening, Flink 1.5 introduced its own flow control mechanism.
Credit-based Flow Control # Credit-based flow control makes sure that whatever is “on the wire” will have capacity at the receiver to handle. It is based on the availability of network buffers as a natural extension of the mechanisms Flink had before. Instead of only having a shared local buffer pool, each remote input channel now has its own set of exclusive buffers. Conversely, buffers in the local buffer pool are called floating buffers as they will float around and are available to every input channel.
Receivers will announce the availability of buffers as credits to the sender (1 buffer = 1 credit). Each result subpartition will keep track of its channel credits. Buffers are only forwarded to the lower network stack if credit is available and each sent buffer reduces the credit score by one. In addition to the buffers, we also send information about the current backlog size which specifies how many buffers are waiting in this subpartition’s queue. The receiver will use this to ask for an appropriate number of floating buffers for faster backlog processing. It will try to acquire as many floating buffers as the backlog size but this may not always be possible and we may get some or no buffers at all. The receiver will make use of the retrieved buffers and will listen for further buffers becoming available to continue.

Credit-based flow control will use buffers-per-channel to specify how many buffers are exclusive (mandatory) and floating-buffers-per-gate for the local buffer pool (optional³) thus achieving the same buffer limit as without flow control. The default values for these two parameters have been chosen so that the maximum (theoretical) throughput with flow control is at least as good as without flow control, given a healthy network with usual latencies. You may need to adjust these depending on your actual round-trip-time and bandwidth.

³ If there are not enough buffers available, each buffer pool will get the same share of the globally available ones (± 1).
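The following, highly simplified sketch illustrates the credit accounting described above from the sender's point of view; the class and method names are invented for this illustration and do not correspond to Flink's internal implementation:
import java.util.ArrayDeque;
import java.util.Queue;

class CreditBasedSenderSketch {

    private final Queue<byte[]> subpartitionQueue = new ArrayDeque<>();
    private int credits; // announced by the receiver: 1 buffer = 1 credit

    void onCreditAnnouncement(int newCredits) {
        credits += newCredits;
        flushIfPossible();
    }

    void addBuffer(byte[] buffer) {
        subpartitionQueue.add(buffer);
        flushIfPossible();
    }

    private void flushIfPossible() {
        // Only hand buffers to the lower network stack while credit is available.
        while (credits > 0 && !subpartitionQueue.isEmpty()) {
            byte[] buffer = subpartitionQueue.poll();
            credits--;
            // Piggy-back the current backlog size so the receiver can
            // request additional floating buffers.
            send(buffer, subpartitionQueue.size());
        }
    }

    private void send(byte[] buffer, int backlog) {
        // actual network transfer omitted in this sketch
    }
}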
Inflicting Backpressure (2) # As opposed to the receiver’s backpressure mechanisms without flow control, credits provide a more direct control: If a receiver cannot keep up, its available credits will eventually hit 0 and stop the sender from forwarding buffers to the lower network stack. There is backpressure on this logical channel only and there is no need to block reading from a multiplexed TCP channel. Other receivers are therefore not affected in processing available buffers.
What do we Gain? Where is the Catch? # Since, with flow control, a channel in a multiplex cannot block another of its logical channels, the overall resource utilisation should increase. In addition, by having full control over how much data is “on the wire”, we are also able to improve checkpoint alignments: without flow control, it would take a while for the channel to fill the network stack’s internal buffers and propagate that the receiver is not reading anymore. During that time, a lot of buffers could be sitting around. Any checkpoint barrier would have to queue up behind these buffers and would thus have to wait until all of those have been processed before it can start (“Barriers never overtake records!”).
However, the additional announce messages from the receiver may come at some additional cost, especially in setups using SSL-encrypted channels. Also, a single input channel cannot make use of all buffers in the buffer pool because exclusive buffers are not shared. It also cannot start sending as much data as is available right away, so during ramp-up (if you are producing data faster than credits are announced in return) it may take longer to send data through. While this may affect your job’s performance, it is usually better to have flow control because of all its advantages. You may want to increase the number of exclusive buffers via buffers-per-channel at the cost of using more memory. The overall memory use compared to the previous implementation, however, may still be lower because lower network stacks do not need to buffer much data any more since we can always transfer it to Flink immediately.
There is one more thing you may notice when using credit-based flow control: since we buffer less data between the sender and receiver, you may experience backpressure earlier. This is, however, desired and you do not really get any advantage by buffering more data. If you want to buffer more but keep flow control, you could consider increasing the number of floating buffers via floating-buffers-per-gate.
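If you want to experiment with these settings yourself, the sketch below shows one way to pass them to a local environment for testing. Note that this is only a sketch: the taskmanager.network.memory.* keys shown here are the names under which buffers-per-channel and floating-buffers-per-gate are exposed in recent Flink versions, so double-check them against the configuration reference of the release you actually run.

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

Configuration networkConf = new Configuration();
// exclusive buffers per remote input channel (buffers-per-channel)
networkConf.setInteger("taskmanager.network.memory.buffers-per-channel", 4);
// additional floating buffers per input gate (floating-buffers-per-gate)
networkConf.setInteger("taskmanager.network.memory.floating-buffers-per-gate", 16);

// local environment with parallelism 4, picking up the configuration above
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.createLocalEnvironment(4, networkConf);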
Advantages:
• better resource utilisation with data skew in multiplexed connections
• improved checkpoint alignment
• reduced memory use (less data in lower network layers)

Disadvantages:
• additional credit-announce messages
• additional backlog-announce messages (piggy-backed with buffer messages, almost no overhead)
• potential round-trip latency
• backpressure appears earlier

Note If you need to turn off credit-based flow control, you can add this to your \`flink-conf.yaml\`: taskmanager.network.credit-model: false
This parameter, however, is deprecated and will eventually be removed along with the non-credit-based flow control code.
Writing Records into Network Buffers and Reading them again # The following picture extends the slightly more high-level view from above with further details of the network stack and its surrounding components, from the collection of a record in your sending operator to the receiving operator getting it: After creating a record and passing it along, for example via Collector#collect(), it is given to the RecordWriter which serialises the record from a Java object into a sequence of bytes which eventually ends up in a network buffer that is handed along as described above. The RecordWriter first serialises the record to a flexible on-heap byte array using the SpanningRecordSerializer. Afterwards, it tries to write these bytes into the associated network buffer of the target network channel. We will come back to this last part in the section below.
On the receiver’s side, the lower network stack (Netty) is writing received buffers into the appropriate input channels. The (stream) task’s thread eventually reads from these queues and tries to deserialise the accumulated bytes into Java objects with the help of the RecordReader, going through the SpillingAdaptiveSpanningRecordDeserializer. Similar to the serialiser, this deserialiser must also deal with special cases like records spanning multiple network buffers, either because the record is just bigger than a network buffer (32KiB by default, set via taskmanager.memory.segment-size) or because the serialised record was added to a network buffer which did not have enough remaining bytes. Flink will nevertheless use these bytes and continue writing the rest to a new network buffer. Flushing Buffers to Netty # In the picture above, the credit-based flow control mechanics actually sit inside the “Netty Server” (and “Netty Client”) components and the buffer the RecordWriter is writing to is always added to the result subpartition in an empty state and then gradually filled with (serialised) records. But when does Netty actually get the buffer? Obviously, it cannot take bytes whenever they become available since that would not only add substantial costs due to cross-thread communication and synchronisation, but also make the whole buffering obsolete.
In Flink, there are three situations that make a buffer available for consumption by the Netty server:
a buffer becomes full when writing a record to it, or
the buffer timeout hits, or
a special event such as a checkpoint barrier is sent.
Flush after Buffer Full # The RecordWriter works with a local serialisation buffer for the current record and will gradually write these bytes to one or more network buffers sitting at the appropriate result subpartition queue. Although a RecordWriter can work on multiple subpartitions, each subpartition has only one RecordWriter writing data to it. The Netty server, on the other hand, is reading from multiple result subpartitions and multiplexing the appropriate ones into a single channel as described above. This is a classical producer-consumer pattern with the network buffers in the middle, as shown by the next picture. After (1) serialising and (2) writing data to the buffer, the RecordWriter updates the buffer’s writer index accordingly. Once the buffer is completely filled, the record writer will (3) acquire a new buffer from its local buffer pool for any remaining bytes of the current record - or for the next one - and add the new one to the subpartition queue. This will (4) notify the Netty server of data being available if it is not aware yet (we can assume it already got the notification if there are more finished buffers in the queue). Whenever Netty has capacity to handle this notification, it will (5) take the buffer and send it along the appropriate TCP channel.

Flush after Buffer Timeout # In order to support low-latency use cases, we cannot rely only on buffers being full to send data downstream. There may be cases where a certain communication channel does not have many records flowing through, which would unnecessarily increase the latency of the few records you actually have. Therefore, a periodic process flushes whatever data is available down the stack: the output flusher. The periodic interval can be configured via StreamExecutionEnvironment#setBufferTimeout and acts as an upper bound on the latency for low-throughput channels. Strictly speaking, the output flusher does not give any guarantees - it only sends a notification to Netty, which can pick it up at will and as capacity allows; this also means that the output flusher has no effect if the channel is backpressured. The following picture shows how it interacts with the other components: the RecordWriter serialises and writes into network buffers as before but, concurrently, the output flusher may (3,4) notify the Netty server of data being available if Netty is not already aware (similar to the “buffer full” scenario above). When Netty handles this notification (5) it will consume the available data from the buffer and update the buffer’s reader index. The buffer stays in the queue - any further operation on this buffer from the Netty server side will continue reading from the reader index next time.

Flush after special event # Some special events also trigger immediate flushes if they are sent through the RecordWriter. The most important ones are checkpoint barriers or end-of-partition events which obviously should go quickly and not wait for the output flusher to kick in.

Further remarks # In contrast to Flink < 1.5, please note that (a) network buffers are now placed in the subpartition queues directly and (b) we are not closing the buffer on each flush. This gives us a few advantages:
less synchronisation overhead (output flusher and RecordWriter are independent)
in high-load scenarios where Netty is the bottleneck (either through backpressure or directly), we can still accumulate data in incomplete buffers
significant reduction of Netty notifications

However, you may notice an increased CPU use and TCP packet rate during low-load scenarios. This is because, with the changes, Flink will use any available CPU cycles to try to maintain the desired latency. Once the load increases, this will self-adjust by buffers filling up more. High-load scenarios are not affected and even get a better throughput because of the reduced synchronisation overhead.

Buffer Builder & Buffer Consumer # If you want to dig deeper into how the producer-consumer mechanics are implemented in Flink, please take a closer look at the BufferBuilder and BufferConsumer classes which have been introduced in Flink 1.5. While reading is potentially only per buffer, writing to it is per record and thus on the hot path for all network communication in Flink. Therefore, it was very clear to us that we needed a lightweight connection between the task’s thread and the Netty thread which does not imply too much synchronisation overhead. For further details, we suggest checking out the source code.
Latency vs. Throughput # Network buffers were introduced to get higher resource utilisation and higher throughput at the cost of having some records wait in buffers a little longer. Although an upper limit to this wait time can be given via the buffer timeout, you may be curious to find out more about the trade-off between these two dimensions, latency and throughput, as, obviously, you cannot maximise both at the same time. The following plot shows various values for the buffer timeout, ranging from 0 (flush with every record) to 100ms (the default), and shows the resulting throughput rates on a cluster with 100 nodes and 8 slots each running a job that has no business logic and thus only tests the network stack. For comparison, we also plot Flink 1.4 before the low-latency improvements (as described above) were added. As you can see, with Flink 1.5+, even very low buffer timeouts such as 1ms (for low-latency scenarios) provide a maximum throughput as high as 75% of what is achieved with the default timeout, where more data is buffered before being sent over the wire.
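If you want to experiment with this trade-off in your own job, the buffer timeout is set per application; a minimal sketch (the 5 ms value is only an example, not a recommendation):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// flush incomplete buffers at least every 5 ms instead of the default 100 ms
env.setBufferTimeout(5);

// special values: 0 flushes after every record (lowest latency),
// -1 flushes only when buffers are full (maximum throughput)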
Conclusion # Now you know about result partitions, the different network connections and scheduling types for both batch and streaming. You also know about credit-based flow control and how the network stack works internally, in order to reason about network-related tuning parameters and about certain job behaviours. Future blog posts in this series will build upon this knowledge and go into more operational details including relevant metrics to look at, further network stack tuning, and common antipatterns to avoid. Stay tuned for more.
`}),e.add({id:186,href:"/2019/05/17/state-ttl-in-flink-1.8.0-how-to-automatically-cleanup-application-state-in-apache-flink/",title:"State TTL in Flink 1.8.0: How to Automatically Cleanup Application State in Apache Flink",section:"Flink Blog",content:`A common requirement for many stateful streaming applications is to automatically cleanup application state for effective management of your state size, or to control how long the application state can be accessed (e.g. due to legal regulations like the GDPR). The state time-to-live (TTL) feature was initiated in Flink 1.6.0 and enabled application state cleanup and efficient state size management in Apache Flink.
In this post, we motivate the State TTL feature and discuss its use cases. Moreover, we show how to use and configure it. We explain how Flink internally manages state with TTL and present some exciting additions to the feature in Flink 1.8.0. The blog post concludes with an outlook on future improvements and extensions.
The Transient Nature of State # There are two major reasons why state should be maintained only for a limited time. For example, let’s imagine a Flink application that ingests a stream of user login events and stores for each user the time of the last login to improve the experience of frequent visitors.
Controlling the size of state. Being able to efficiently manage an ever-growing state size is a primary use case for state TTL. Oftentimes, data needs to be persisted temporarily while there is some user activity around it, e.g. web sessions. When the activity ends there is no longer interest in that data while it still occupies storage. Flink 1.8.0 introduces background cleanup of old state based on TTL that makes the eviction of no-longer-necessary data frictionless. Previously, the application developer had to take extra actions and explicitly remove useless state to free storage space. This manual clean up procedure was not only error prone but also less efficient than the new lazy method to remove state. Following our previous example of storing the time of the last login, this might not be necessary after some time because the user can be treated as “infrequent” later on.
Complying with data protection and sensitive data requirements. Recent developments around data privacy regulations, such as the General Data Protection Regulation (GDPR) introduced by the European Union, make compliance with such data requirements or treating sensitive data a top priority for many use cases and applications. An example of such use cases includes applications that require keeping data for a specific timeframe and preventing access to it thereafter. This is a common challenge for companies providing short-term services to their customers. The state TTL feature gives guarantees for how long an application can access state and hence can help to comply with data protection regulations.
Both requirements can be addressed by a feature that periodically, yet continuously, removes the state for a key once it becomes unnecessary or unimportant and there is no requirement to keep it in storage any more.
State TTL for continuous cleanup of application state # The 1.6.0 release of Apache Flink introduced the State TTL feature. It enabled developers of stream processing applications to configure the state of operators to expire and be cleaned up after a defined timeout (time-to-live). In Flink 1.8.0 the feature was extended, including continuous cleanup of old entries for both the RocksDB and the heap state backends (FSStateBackend and MemoryStateBackend), enabling a continuous cleanup process of old entries (according to the TTL setting).
In Flink’s DataStream API, application state is defined by a state descriptor. State TTL is configured by passing a StateTtlConfig object to a state descriptor. The following Java example shows how to create a state TTL configuration and provide it to the state descriptor that holds the last login time of a user as a Long value:
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.api.common.state.ValueStateDescriptor;

StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
    .build();

ValueStateDescriptor<Long> lastUserLogin =
    new ValueStateDescriptor<>("lastUserLogin", Long.class);
lastUserLogin.enableTimeToLive(ttlConfig);

Flink provides multiple options to configure the behavior of the state TTL functionality.
When is the Time-to-Live reset? By default, the expiration time of a state entry is updated when the state is modified. Optionally, it can also be updated on read access at the cost of an additional write operation to update the timestamp.
Can the expired state be accessed one last time? State TTL employs a lazy strategy to clean up expired state. This can lead to the situation that an application attempts to read state which is expired but hasn’t been removed yet. You can configure whether such a read request returns the expired state or not. In either case, the expired state is immediately removed afterwards. While the option of returning expired state favors data availability, not returning expired state can be required for data protection regulations.
Which time semantics are used for the Time-to-Live timers? With Flink 1.8.0, users can only define a state TTL in terms of processing time. The support for event time is planned for future Apache Flink releases.
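Putting the non-default choices from above together, a minimal sketch of a TTL configuration that also refreshes the timeout on read access and returns expired but not yet removed state could look like this:

StateTtlConfig readTolerantTtlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    // also refresh the expiration time on read access (costs an extra write per read)
    .setUpdateType(StateTtlConfig.UpdateType.OnReadAndWrite)
    // hand out state that is expired but has not been cleaned up yet
    .setStateVisibility(StateTtlConfig.StateVisibility.ReturnExpiredIfNotCleanedUp)
    .build();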
You can read more about how to use state TTL in the Apache Flink documentation.
Internally, the State TTL feature is implemented by storing an additional timestamp of the last relevant state access, along with the actual state value. While this approach adds some storage overhead, it allows Flink to check for the expired state during state access, checkpointing, recovery, or dedicated storage cleanup procedures.
“Taking out the Garbage” # When a state object is accessed in a read operation, Flink will check its timestamp and clear the state if it is expired (depending on the configured state visibility, the expired state is returned or not). Due to this lazy removal, expired state that is never accessed again will forever occupy storage space unless it is garbage collected.
So how can the expired state be removed without the application logic explicitly taking care of it? In general, there are different possible strategies to remove it in the background.
Keep full state snapshots clean # Flink 1.6.0 already supported automatic eviction of the expired state when a full snapshot for a checkpoint or savepoint is taken. Note that state eviction is not applied for incremental checkpoints. State eviction on full snapshots must be explicitly enabled as shown in the following example:
StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    .cleanupFullSnapshot()
    .build();

The local storage stays untouched but the size of the stored snapshot is reduced. The local state of an operator will only be cleaned up when the operator reloads its state from a snapshot, i.e. in case of recovery or when starting from a savepoint.
Due to these limitations, applications still need to actively remove state after it expired in Flink 1.6.0. To improve the user experience, Flink 1.8.0 introduces two more autonomous cleanup strategies, one for each of Flink’s two state backend types. We describe them below.
Incremental cleanup in Heap state backends # This approach is specific to the Heap state backends (FSStateBackend and MemoryStateBackend). The idea is that the storage backend keeps a lazy global iterator over all state entries. Certain events, for instance state access, trigger an incremental cleanup. Every time an incremental cleanup is triggered, the iterator is advanced. The traversed state entries are checked and the expired ones are removed. The following code example shows how to enable incremental cleanup:
StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    // check 10 keys for every state access
    .cleanupIncrementally(10, false)
    .build();

If enabled, every state access triggers a cleanup step. For every cleanup step, a certain number of state entries are checked for expiration. There are two tuning parameters. The first defines the number of state entries to check for each cleanup step. The second parameter is a flag to trigger a cleanup step after each processed record, in addition to each state access.
There are two important caveats about this approach:
The first one is that the time spent on the incremental cleanup increases the record processing latency. The second one should be practically negligible but still worth mentioning: if no state is accessed or no records are processed, expired state won’t be removed. RocksDB background compaction to filter out expired state # If your application uses the RocksDB state backend, you can enable another cleanup strategy which is based on a Flink-specific compaction filter. RocksDB periodically runs asynchronous compactions to merge state updates and reduce storage. The Flink compaction filter checks the expiration timestamp of state entries with TTL and discards all expired values.
The first step to activate this feature is to configure the RocksDB state backend by setting the following Flink configuration option: state.backend.rocksdb.ttl.compaction.filter.enabled. Once the RocksDB state backend is configured, the compaction cleanup strategy is enabled for a state as shown in the following code example:
StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    .cleanupInRocksdbCompactFilter()
    .build();

Keep in mind that calling the Flink TTL filter slows down the RocksDB compaction.
Eager State Cleanup with Timers # Another way to clean up state eagerly is based on Flink timers. This is an idea that the community is currently evaluating for future releases. With this approach, a cleanup timer is registered for every state access. This approach is more predictable because state is eagerly removed as soon as it expires. However, it is more expensive because the timers consume storage along with the original state.
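Until such a built-in strategy ships, a similar effect can be approximated by hand with a KeyedProcessFunction. The following is only a sketch of that manual pattern (class and state names are hypothetical), not the mechanism under discussion:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TimerBasedStateCleanup extends KeyedProcessFunction<String, Long, Long> {

    private static final long TTL_MILLIS = 7L * 24 * 60 * 60 * 1000; // 7 days

    private transient ValueState<Long> lastLogin;

    @Override
    public void open(Configuration parameters) {
        lastLogin = getRuntimeContext().getState(
                new ValueStateDescriptor<>("lastUserLogin", Long.class));
    }

    @Override
    public void processElement(Long loginTime, Context ctx, Collector<Long> out) throws Exception {
        lastLogin.update(loginTime);
        out.collect(loginTime);
        // register a cleanup timer for this key; a complete implementation would also
        // track previously registered timers to avoid clearing state too early
        ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + TTL_MILLIS);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<Long> out) {
        // eagerly drop the state for this key as soon as the timer fires
        lastLogin.clear();
    }
}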
Future work # Apart from including the timer-based cleanup strategy mentioned above, the Flink community has plans to further improve the state TTL feature. The possible improvements include adding support for TTL based on event time (only processing time is supported at the moment) and enabling State TTL for queryable state.
We encourage you to join the conversation and share your thoughts and ideas in the Apache Flink JIRA board or by subscribing to the Apache Flink dev mailing list. Feedback or suggestions are always appreciated and we look forward to hearing your thoughts on the Flink mailing lists.
Summary # Time-based state access restrictions and controlling the size of application state are common challenges in the world of stateful stream processing. Flink’s 1.8.0 release significantly improves the State TTL feature by adding support for continuous background cleanup of expired state objects. The new cleanup mechanisms relieve you from manually implementing state cleanup. They are also more efficient due to their lazy nature. State TTL gives you control over the size of your application state so that you can focus on the core logic of your applications.
`}),e.add({id:187,href:"/2019/05/14/flux-capacitor-huh-temporal-tables-and-joins-in-streaming-sql/",title:"Flux capacitor, huh? Temporal Tables and Joins in Streaming SQL",section:"Flink Blog",content:`Figuring out how to manage and model temporal data for effective point-in-time analysis was a longstanding battle, dating as far back as the early 80’s, that culminated with the introduction of temporal tables in the SQL standard in 2011. Up to that point, users were doomed to implement this as part of the application logic, often hurting the length of the development lifecycle as well as the maintainability of the code. And, although there isn’t a single, commonly accepted definition of temporal data, the challenge it represents is one and the same: how do we validate or enrich data against dynamically changing, historical datasets?
For example: given a stream with Taxi Fare events tied to the local currency of the ride location, we might want to convert the fare price to a common currency for further processing. As conversion rates excel at fluctuating over time, each Taxi Fare event would need to be matched to the rate that was valid at the time the event occurred in order to produce a reliable result.
Modelling Temporal Data with Flink # In the 1.7 release, Flink has introduced the concept of temporal tables into its streaming SQL and Table API: parameterized views on append-only tables — or, any table that only allows records to be inserted, never updated or deleted — that are interpreted as a changelog and keep data closely tied to time context, so that it can be interpreted as valid only within a specific period of time. Transforming a stream into a temporal table requires:
Defining a primary key and a versioning field that can be used to keep track of the changes that happen over time;
Exposing the stream as a temporal table function that maps each point in time to a static relation.
Going back to our example use case, a temporal table is just what we need to model the conversion rate data such as to make it useful for point-in-time querying. Temporal table functions are implemented as an extension of Flink’s generic table function class and can be defined in the same straightforward way to be used with the Table API or SQL parser.
import org.apache.flink.table.functions.TemporalTableFunction;
(...)
// Get the stream and table environments.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StreamTableEnvironment tEnv = StreamTableEnvironment.getTableEnvironment(env);

// Provide a sample static data set of the rates history table.
List<Tuple2<String, Long>> ratesHistoryData = new ArrayList<>();
ratesHistoryData.add(Tuple2.of("USD", 102L));
ratesHistoryData.add(Tuple2.of("EUR", 114L));
ratesHistoryData.add(Tuple2.of("YEN", 1L));
ratesHistoryData.add(Tuple2.of("EUR", 116L));
ratesHistoryData.add(Tuple2.of("USD", 105L));

// Create and register an example table using the sample data set.
DataStream<Tuple2<String, Long>> ratesHistoryStream = env.fromCollection(ratesHistoryData);
Table ratesHistory = tEnv.fromDataStream(ratesHistoryStream, "r_currency, r_rate, r_proctime.proctime");
tEnv.registerTable("RatesHistory", ratesHistory);

// Create and register the temporal table function "rates".
// Define "r_proctime" as the versioning field and "r_currency" as the primary key.
TemporalTableFunction rates = ratesHistory.createTemporalTableFunction("r_proctime", "r_currency");
tEnv.registerFunction("Rates", rates);
(...)

What does this Rates function do, in practice? Imagine we would like to check what the conversion rates looked like at a given time — say, 11:00. We could simply do something like:
SELECT * FROM Rates('11:00');

Even though Flink does not yet support querying temporal table functions with a constant time attribute parameter, these functions can be used to cover a much more interesting scenario: temporal table joins.
Streaming Joins using Temporal Tables # Temporal tables reach their full potential when used in combination — erm, joined — with streaming data, for instance to power applications that must continuously whitelist against a reference dataset that changes over time for auditing or regulatory compliance. While efficient joins have long been an enduring challenge for query processors due to computational cost and resource consumption, joins over streaming data carry some additional challenges:
The unbounded nature of streams means that inputs are continuously evaluated and intermediate join results can consume memory resources indefinitely. Flink gracefully manages its memory consumption out-of-the-box (even for heavier cases where joins require spilling to disk) and supports time-windowed joins to bound the amount of data that needs to be kept around as state; Streaming data might be out-of-order and late, so it is not possible to enforce an ordering upfront and time handling requires some thinking to avoid unnecessary outputs and retractions. In the particular case of temporal data, time-windowed joins are not enough (well, at least not without getting into some expensive tweaking): sooner or later, each reference record will fall outside of the window and be wiped from state, no longer being considered for future join results. To address this limitation, Flink has introduced support for temporal table joins to cover time-varying relations.
Each record from the append-only table on the probe side (Taxi Fare) is joined with the version of the record from the temporal table on the build side (Conversion Rate) that most closely matches the probe side record time attribute (time) for the same value of the primary key (currency). Remember the temporal table function (Rates) we registered earlier? It can now be used to express this join as a simple SQL statement that would otherwise require a heavier statement with a subquery.
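For illustration, here is a hedged sketch of what such a join could look like in Flink SQL; the TaxiFares table and its attribute names are hypothetical, only the Rates function is the one registered above:

// assumes a registered append-only table "TaxiFares" with attributes
// (f_currency, f_price, f_time), where f_time is a time attribute
Table faresInTargetCurrency = tEnv.sqlQuery(
    "SELECT f.f_price * r.r_rate AS convertedPrice, f.f_time " +
    "FROM TaxiFares AS f, " +
    "     LATERAL TABLE (Rates(f.f_time)) AS r " +
    "WHERE f.f_currency = r.r_currency");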
Temporal table joins support both processing and event time semantics and effectively limit the amount of data kept in state while also allowing records on the build side to be arbitrarily old, as opposed to time-windowed joins. Probe-side records only need to be kept in state for a very short time to ensure correct semantics in presence of out-of-order records. The challenges mentioned in the beginning of this section are overcome by:
Narrowing the scope of the join: only the time-matching version of ratesHistory is visible for a given taxiFare.time;
Pruning unneeded records from state: for cases using event time, records between current time and the watermark delay are persisted for both the probe and build side. These are discarded as soon as the watermark arrives and the results are emitted — allowing the join operation to move forward in time and the build table to “refresh” its version in state.

Conclusion # All this means it is now possible to express continuous stream enrichment in relational and time-varying terms using Flink without dabbling into syntactic patchwork or compromising performance. In other words: stream time-travelling minus the flux capacitor. Extending this syntax to batch processing for enriching historic data with proper (event) time semantics is also part of the Flink roadmap!
If you’d like to get some hands-on practice in joining streams with Flink SQL (and Flink SQL in general), check out this free training for Flink SQL. The training environment is based on Docker and can be set up in just a few minutes.
Subscribe to the Apache Flink mailing lists to stay up-to-date with the latest developments in this space.
`}),e.add({id:188,href:"/2019/05/03/when-flink-pulsar-come-together/",title:"When Flink & Pulsar Come Together",section:"Flink Blog",content:`The open source data technology frameworks Apache Flink and Apache Pulsar can integrate in different ways to provide elastic data processing at large scale. I recently gave a talk at Flink Forward San Francisco 2019 and presented some of the integrations between the two frameworks for batch and streaming applications. In this post, I will give a short introduction to Apache Pulsar and its differentiating elements from other messaging systems and describe the ways that Pulsar and Flink can work together to provide a seamless developer experience for elastic data processing at scale.
A brief introduction to Apache Pulsar # Apache Pulsar is an open-source distributed pub-sub messaging system under the stewardship of the Apache Software Foundation. Pulsar is a multi-tenant, high-performance solution for server-to-server messaging including multiple features such as native support for multiple clusters in a Pulsar instance, with seamless geo-replication of messages across clusters, very low publish and end-to-end latency, seamless scalability to over a million topics, and guaranteed message delivery with persistent message storage provided by Apache BookKeeper among others. Let’s now discuss the primary differentiators between Pulsar and other pub-sub messaging frameworks:
The first differentiating factor stems from the fact that although Pulsar provides a flexible pub-sub messaging system it is also backed by durable log storage — hence combining both messaging and storage under one framework. Because of that layered architecture, Pulsar provides instant failure recovery, independent scalability and balance-free cluster expansion.
Pulsar’s architecture follows a similar pattern to other pub-sub systems as the framework is organized in topics as the main data entity, with producers sending data to, and consumers receiving data from a topic as shown in the diagram below.
The second differentiator of Pulsar is that the framework is built from the get-go with multi-tenancy in mind. What that means is that each Pulsar topic has a hierarchical management structure making the allocation of resources as well as the resource management and coordination between teams efficient and easy. With Pulsar’s multi-tenancy structure, data platform maintainers can onboard new teams with no friction as Pulsar provides resource isolation at the property (tenant), namespace or topic level, while at the same time data can be shared across the cluster for easy collaboration and coordination.
Finally, Pulsar’s flexible messaging framework unifies the streaming and queuing data consumption models and provides greater flexibility. As shown in the below diagram, Pulsar holds the data in the topic while multiple teams can consume the data independently depending on their workloads and data consumption patterns.
Pulsar’s view on data: Segmented data streams # Apache Flink is a streaming-first computation framework that perceives batch processing as a special case of streaming. Flink’s view on data streams distinguishes batch and stream processing between bounded and unbounded data streams, assuming that for batch workloads the data stream is finite, with a beginning and an end.
Apache Pulsar has a similar perspective to that of Apache Flink with regards to the data layer. The framework also uses streams as a unified view on all data, while its layered architecture allows traditional pub-sub messaging for streaming workloads and continuous data processing or usage of Segmented Streams and bounded data stream for batch and static workloads.
With Pulsar, once a producer sends data to a topic, it is partitioned depending on the data traffic and then further segmented under those partitions — using Apache Bookkeeper as segment store — to allow for parallel data processing as illustrated in the diagram below. This allows a combination of traditional pub-sub messaging and distributed parallel computations in one framework.
When Flink + Pulsar come together # Apache Flink and Apache Pulsar integrate in multiple ways already. In the following sections, I will present some potential future integrations between the frameworks and share examples of existing ways in which you can utilize the frameworks together.
Potential Integrations # Pulsar can integrate with Apache Flink in different ways. Some potential integrations include providing support for streaming workloads with the use of Streaming Connectors and support for batch workloads with the use of Batch Source Connectors. Pulsar also comes with native support for schema that can integrate with Flink and provide structured access to the data, for example by using Flink SQL as a way of querying data in Pulsar. Finally, an alternative way of integrating the technologies could include using Pulsar as a state backend with Flink. Since Pulsar has a layered architecture (Streams and Segmented Streams, powered by Apache Bookkeeper), it becomes natural to use Pulsar as a storage layer and store Flink state.
From an architecture point of view, we can imagine the integration between the two frameworks as one that uses Apache Pulsar for a unified view of the data layer and Apache Flink as a unified computation and data processing framework and API.
Existing Integrations # Integration between the two frameworks is ongoing and developers can already use Pulsar with Flink in multiple ways. For example, Pulsar can be used as a streaming source and streaming sink in Flink DataStream applications. Developers can ingest data from Pulsar into a Flink job that makes computations and processes real-time data, to then send the data back to a Pulsar topic as a streaming sink. Such an example is shown below:
// create and configure Pulsar consumer
PulsarSourceBuilder<String> builder = PulsarSourceBuilder
    .builder(new SimpleStringSchema())
    .serviceUrl(serviceUrl)
    .topic(inputTopic)
    .subscriptionName(subscription);
SourceFunction<String> src = builder.build();

// ingest DataStream with Pulsar consumer
DataStream<String> words = env.addSource(src);

// perform computation on DataStream (here a simple WordCount)
DataStream<WordWithCount> wc = words
    .flatMap((FlatMapFunction<String, WordWithCount>) (word, collector) -> {
        collector.collect(new WordWithCount(word, 1));
    })
    .returns(WordWithCount.class)
    .keyBy("word")
    .timeWindow(Time.seconds(5))
    .reduce((ReduceFunction<WordWithCount>) (c1, c2) ->
        new WordWithCount(c1.word, c1.count + c2.count));

// emit result via Pulsar producer
wc.addSink(new FlinkPulsarProducer<>(
    serviceUrl,
    outputTopic,
    new AuthenticationDisabled(),
    wordWithCount -> wordWithCount.toString().getBytes(UTF_8),
    wordWithCount -> wordWithCount.word)
);

Another integration between the two frameworks that developers can take advantage of includes using Pulsar as both a streaming source and a streaming table sink for Flink SQL or Table API queries as shown in the example below:
// obtain a DataStream with words
DataStream<String> words = ...

// register DataStream as Table "words" with two attributes ("word", "ts").
// "ts" is an event-time timestamp.
tableEnvironment.registerDataStream("words", words, "word, ts.rowtime");

// create a TableSink that produces to Pulsar
TableSink sink = new PulsarJsonTableSink(
    serviceUrl,
    outputTopic,
    new AuthenticationDisabled(),
    ROUTING_KEY);

// register Pulsar TableSink as table "wc"
tableEnvironment.registerTableSink(
    "wc",
    sink.configure(
        new String[]{"word", "cnt"},
        new TypeInformation[]{Types.STRING, Types.LONG}));

// count words per 5 seconds and write result to table "wc"
tableEnvironment.sqlUpdate(
    "INSERT INTO wc " +
    "SELECT word, COUNT(*) AS cnt " +
    "FROM words " +
    "GROUP BY word, TUMBLE(ts, INTERVAL '5' SECOND)");

Finally, Flink integrates with Pulsar for batch workloads as a batch sink where all results get pushed to Pulsar after Apache Flink has completed the computation in a static data set. Such an example is shown below:
// obtain DataSet from arbitrary computation
DataSet<WordWithCount> wc = ...

// create PulsarOutputFormat instance
OutputFormat pulsarOutputFormat = new PulsarOutputFormat(
    serviceUrl,
    topic,
    new AuthenticationDisabled(),
    wordWithCount -> wordWithCount.toString().getBytes());

// write DataSet to Pulsar
wc.output(pulsarOutputFormat);

Conclusion # Both Pulsar and Flink share a similar view on how the data and the computation level of an application can be “streaming-first”, with batch as a special case of streaming. With Pulsar’s Segmented Streams approach and Flink’s steps to unify batch and stream processing workloads under one framework, there are numerous ways of integrating the two technologies together to provide elastic data processing at massive scale. Subscribe to the Apache Flink and Apache Pulsar mailing lists to stay up-to-date with the latest developments in this space or share your thoughts and recommendations with both communities.
`}),e.add({id:189,href:"/2019/04/17/apache-flinks-application-to-season-of-docs/",title:"Apache Flink's Application to Season of Docs",section:"Flink Blog",content:`The Apache Flink community is happy to announce its application to the first edition of Season of Docs by Google. The program is bringing together Open Source projects and technical writers to raise awareness for and improve documentation of Open Source projects. While the community is continuously looking for new contributors to collaborate on our documentation, we would like to take this chance to work with one or two technical writers to extend and restructure parts of our documentation (details below).
The community has discussed this opportunity on the dev mailing list and agreed on three project ideas to submit to the program. We have a great team of mentors (Stephan, Fabian, David, Jark & Konstantin) lined up and are very much looking forward to the first proposals by potential technical writers (given we are admitted to the program ;)). In case of questions feel free to reach out to the community via dev@flink.apache.org.
Project Ideas List # Project 1: Improve Documentation of Stream Processing Concepts # Description: Stream processing is the processing of data in motion―in other words, computing on data directly as it is produced or received. Apache Flink has pioneered the field of distributed, stateful stream processing over the last several years. As the community has pushed the boundaries of stream processing, we have introduced new concepts that users need to become familiar with to develop and operate Apache Flink applications efficiently. The Apache Flink documentation [1] already contains a “concepts” section, but it is a) incomplete and b) lacks an overall structure & reading flow. In addition, “concepts”-content is also spread over the development [2] & operations [3] documentation without references to the “concepts” section. An example of this can be found in [4] and [5].
In this project, we would like to restructure, consolidate and extend the concepts documentation for Apache Flink to better guide users who want to become productive as quickly as possible. This includes better conceptual introductions to topics such as event time, state, and fault tolerance with proper linking to and from relevant deployment and development guides.
Related material:
[1] //nightlies.apache.org/flink/flink-docs-release-1.8/
[2] //nightlies.apache.org/flink/flink-docs-release-1.8/dev
[3] //nightlies.apache.org/flink/flink-docs-release-1.8/ops
[4] //nightlies.apache.org/flink/flink-docs-release-1.8/concepts/programming-model.html#time
[5] //nightlies.apache.org/flink/flink-docs-release-1.8/dev/event_time.html

Project 2: Improve Documentation of Flink Deployments & Operations # Description: Stream processing is the processing of data in motion―in other words, computing on data directly as it is produced or received. Apache Flink has pioneered the field of distributed, stateful stream processing for the last few years. As a stateful distributed system in general and a continuously running, low-latency system in particular, Apache Flink deployments are non-trivial to set up and manage. Unfortunately, the operations [1] and monitoring documentation [2] are arguably the weakest spots of the Apache Flink documentation. While it is comprehensive and often goes into a lot of detail, it lacks an overall structure and does not address common overarching concerns of operations teams in an efficient way.
In this project, we would like to restructure this part of the documentation and extend it if possible. Ideas for extension include: discussion of session and per-job clusters, better documentation for containerized deployments (incl. K8s), capacity planning &amp; integration into CI/CD pipelines.
Related material:
[1] //nightlies.apache.org/flink/flink-docs-release-1.8/ops
[2] //nightlies.apache.org/flink/flink-docs-release-1.8/monitoring

Project 3: Improve Documentation for Relational APIs (Table API & SQL) # Description: Apache Flink features APIs at different levels of abstraction which enable its users to trade conciseness for expressiveness. Flink’s relational APIs, SQL and the Table API, are “younger” than the DataStream and DataSet APIs, more high-level and focus on data analytics use cases. A core principle of Flink’s SQL and Table API is that they can be used to process static (batch) and continuous (streaming) data and that a program or query produces the same result in both cases. The documentation of Flink’s relational APIs has organically grown and can be improved in a few areas. There are several on-going development efforts (e.g. Hive Integration, Python Support or Support for Interactive Programming) that aim to extend the scope of the Table API and SQL.
The existing documentation could be reorganized to prepare for covering the new features. Moreover, it could be improved by adding a concepts section that describes the use cases and internals of the APIs in more detail. Moreover, the documentation of built-in functions could be improved by adding more concrete examples.
Related material:
Table API &amp; SQL docs main page Built-in functions Concepts Streaming Concepts `}),e.add({id:190,href:"/2019/04/09/apache-flink-1.8.0-release-announcement/",title:"Apache Flink 1.8.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is pleased to announce Apache Flink 1.8.0. The latest release includes more than 420 resolved issues and some exciting additions to Flink that we describe in the following sections of this post. Please check the complete changelog for more details.
Flink 1.8.0 is API-compatible with previous 1.x.y releases for APIs annotated with the @Public annotation. The release is available now and we encourage everyone to download the release and check out the updated documentation. Feedback through the Flink mailing lists or JIRA is, as always, very much appreciated!
You can find the binaries on the updated Downloads page on the Flink project site.
With Flink 1.8.0 we come closer to our goals of enabling fast data processing and building data-intensive applications for the Flink community in a seamless way. We do this by cleaning up and refactoring Flink under the hood to allow more efficient feature development in the future. This includes removal of the legacy runtime components that were subsumed in the major rework of Flink’s underlying distributed system architecture (FLIP-6) as well as refactorings on the Table API that prepare it for the future addition of the Blink enhancements (FLINK-11439).
Nevertheless, this release includes some important new features and bug fixes. The most interesting of those are highlighted below. Please consult the complete changelog and the release notes for more details.
New Features and Improvements # Finalized State Schema Evolution Story: This release completes the community driven effort to provide a schema evolution story for user state managed by Flink. This has been an effort that spanned 2 releases, starting from 1.7.0 with the introduction of support for Avro state schema evolution as well as a revamped serialization compatibility abstraction.
Flink 1.8.0 finalizes this effort by extending support for schema evolution to POJOs, upgrading all Flink built-in serializers to use the new serialization compatibility abstractions, as well as making it easier for advanced users who use custom state serializers to implement the abstractions. These different aspects for a complete out-of-the-box schema evolution story are explained in detail below:
Support for POJO state schema evolution: The pool of data types that support state schema evolution has been expanded to include POJOs. For state types that use POJOs, you can now add or remove fields from your POJO while retaining backwards compatibility. For a full overview of the list of data types that now support schema evolution as well as their evolution specifications and limitations, please refer to the State Schema Evolution documentation page.
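To make this more concrete, the following is a hedged sketch of a hypothetical POJO used as a state type after its schema has evolved; the class is illustrative and not taken from the release notes:

// a hypothetical POJO used as a keyed state type, shown after its schema evolved
public class UserProfile {
    public String userId;
    public long lastLogin;

    // added in a newer job version: when restoring from a savepoint taken before
    // this field existed, restored entries get the default value (null); conversely,
    // a removed field would simply be dropped from restored entries
    public String country;

    public UserProfile() {} // POJOs need a public no-argument constructor
}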
Upgrade all Flink serializers to use new serialization compatibility abstractions: Back in 1.7.0, we introduced the new serialization compatibility abstractions TypeSerializerSnapshot and TypeSerializerSchemaCompatibility. Besides providing a more expressive API to reflect schema compatibility between the data stored in savepoints and the data registered at runtime, another important aspect about the new abstraction is that it avoids the need for Flink to Java-serialize the state serializer as state metadata in savepoints.
In 1.8.0, all of Flink’s built-in serializers have been upgraded to use the new abstractions, and therefore the serializers themselves are no longer Java-serialized into savepoints. This greatly improves interoperability of Flink savepoints, in terms of state schema evolvability. For example, one outcome was the support for POJO schema evolution, as previously mentioned above. Another outcome is that all composite data types supported by Flink (such as Either, Scala case classes, Flink Java Tuples, etc.) are generally evolve-able as well when they have a nested evolvable type, such as a POJO. For example, the MyPojo type in ValueState<Tuple2<Integer, MyPojo>> or ListState<Either<Integer, MyPojo>>, which is a POJO, is allowed to evolve its schema.
For users who are using custom TypeSerializer implementations for their state serializer and are still using the outdated abstractions (i.e. TypeSerializerConfigSnapshot and CompatibilityResult), we highly recommend upgrading to the new abstractions to be future-proof. Please refer to the Custom State Serialization documentation page for a detailed description on the new abstractions.
Provide pre-defined snapshot implementations for common serializers: For convenience, Flink 1.8.0 comes with two predefined implementations for the TypeSerializerSnapshot that make the task of implementing these new abstractions easier for most implementations of TypeSerializers - SimpleTypeSerializerSnapshot and CompositeTypeSerializerSnapshot. This section in the documentation provides information on how to use these classes.
Continuous cleanup of old state based on TTL (FLINK-7811): We introduced TTL (time-to-live) for Keyed state in Flink 1.6 (FLINK-9510). This feature enabled cleanup and made keyed state entries inaccessible after a defined timeout. In addition state would now also be cleaned up when writing a savepoint/checkpoint.
Flink 1.8 introduces continuous cleanup of old entries for both the RocksDB state backend (FLINK-10471) and the heap state backend (FLINK-10473). This means that old entries (according to the TTL setting) are continuously cleaned up.
SQL pattern detection with user-defined functions and aggregations: The support of the MATCH_RECOGNIZE clause has been extended by multiple features. The addition of user-defined functions allows for custom logic during pattern detection (FLINK-10597), while adding aggregations allows for more complex CEP definitions, such as the following (FLINK-7599).
SELECT *
FROM Ticker
    MATCH_RECOGNIZE (
        ORDER BY rowtime
        MEASURES
            AVG(A.price) AS avgPrice
        ONE ROW PER MATCH
        AFTER MATCH SKIP TO FIRST B
        PATTERN (A+ B)
        DEFINE
            A AS AVG(A.price) < 15
    ) MR;

RFC-compliant CSV format (FLINK-9964): The SQL tables can now be read and written in an RFC-4180 standard compliant CSV table format. The format might also be useful for general DataStream API users.
New KafkaDeserializationSchema that gives direct access to ConsumerRecord (FLINK-8354): For the Flink KafkaConsumers, we introduced a new KafkaDeserializationSchema that gives direct access to the Kafka ConsumerRecord. This now allows access to all data that Kafka provides for a record, including the headers. This subsumes the KeyedSerializationSchema functionality, which is deprecated but still available for now.
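As a hedged sketch of how the new interface might be used, the class below extracts the record value together with one (hypothetical) header; only the KafkaDeserializationSchema interface itself comes from the release:

import java.nio.charset.StandardCharsets;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;

public class ValueWithHeaderSchema implements KafkaDeserializationSchema<String> {

    @Override
    public boolean isEndOfStream(String nextElement) {
        return false; // read the topic indefinitely
    }

    @Override
    public String deserialize(ConsumerRecord<byte[], byte[]> record) {
        // the full ConsumerRecord is available, including headers, topic, partition and offset
        Header traceHeader = record.headers().lastHeader("trace-id"); // hypothetical header key
        String trace = traceHeader == null
                ? "n/a"
                : new String(traceHeader.value(), StandardCharsets.UTF_8);
        return trace + ":" + new String(record.value(), StandardCharsets.UTF_8);
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return Types.STRING;
    }
}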
Per-shard watermarking option in FlinkKinesisConsumer (FLINK-5697): The Kinesis Consumer can now emit periodic watermarks that are derived from per-shard watermarks, for correct event time processing with subtasks that consume multiple Kinesis shards.
New consumer for DynamoDB Streams to capture table changes (FLINK-4582): FlinkDynamoDBStreamsConsumer is a variant of the Kinesis consumer that supports retrieval of CDC-like streams from DynamoDB tables.
Support for global aggregates for subtask coordination (FLINK-10887): Designed as a solution for global source watermark tracking, GlobalAggregateManager allows sharing of information between parallel subtasks. This feature will be integrated into streaming connectors for watermark synchronization and can be used for other purposes with a user defined aggregator.
Important Changes # Changes to bundling of Hadoop libraries with Flink (FLINK-11266): Convenience binaries that include hadoop are no longer released.
If a deployment relies on flink-shaded-hadoop2 being included in flink-dist, then you must manually download a pre-packaged Hadoop jar from the optional components section of the download page and copy it into the /lib directory. Alternatively, a Flink distribution that includes hadoop can be built by packaging flink-dist and activating the include-hadoop maven profile.
As hadoop is no longer included in flink-dist by default, specifying -DwithoutHadoop when packaging flink-dist no longer impacts the build.
FlinkKafkaConsumer will now filter restored partitions based on topic specification (FLINK-10342): Starting from Flink 1.8.0, the FlinkKafkaConsumer now always filters out restored partitions that are no longer associated with a specified topic to subscribe to in the restored execution. This behaviour did not exist in previous versions of the FlinkKafkaConsumer. If you wish to retain the previous behaviour, please use the disableFilterRestoredPartitionsWithSubscribedTopics() configuration method on the FlinkKafkaConsumer.
Consider this example: if you had a Kafka Consumer that was consuming from topic A, you did a savepoint, then changed your Kafka consumer to instead consume from topic B, and then restarted your job from the savepoint. Before this change, your consumer would now consume from both topic A and B because it was stored in state that the consumer was consuming from topic A. With the change, your consumer would only consume from topic B after restore because it now filters the topics that are stored in state using the configured topics.
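A hedged sketch of opting back into the old behaviour; the topic name and properties are placeholders:

Properties kafkaProps = new Properties(); // broker address, group id, etc.

FlinkKafkaConsumer<String> consumer =
    new FlinkKafkaConsumer<>("topic-b", new SimpleStringSchema(), kafkaProps);

// restore state for partitions of topics that are no longer subscribed,
// i.e. keep the pre-1.8.0 behaviour described above
consumer.disableFilterRestoredPartitionsWithSubscribedTopics();

DataStream<String> stream = env.addSource(consumer);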
Change in the Maven modules of Table API (FLINK-11064): Users that had a flink-table dependency before, need to update their dependencies to flink-table-planner and the correct dependency of flink-table-api-*, depending on whether Java or Scala is used: one of flink-table-api-java-bridge or flink-table-api-scala-bridge.
Known Issues # Discarded checkpoint can cause Tasks to fail (FLINK-11662): There is a race condition that can lead to erroneous checkpoint failures. This mostly occurs when restarting from a savepoint or checkpoint takes a long time at the sources of a job. If you see random checkpointing failures that don’t seem to have a good explanation you might be affected. Please see the Jira issue for more details and a workaround for the problem. Release Notes # Please review the release notes for a more detailed list of changes and new features if you plan to upgrade your Flink setup to Flink 1.8.
List of Contributors # We would like to acknowledge all community members for contributing to this release. Special credits go to the following members for contributing to the 1.8.0 release (according to git log --pretty="%an" release-1.7.0..release-1.8.0 | sort | uniq without manual deduplication):
Addison Higham, Aitozi, Aleksey Pak, Alexander Fedulov, Alexey Trenikhin, Aljoscha Krettek, Andrey Zagrebin, Artsem Semianenka, Asura7969, Avi, Barisa Obradovic, Benchao Li, Bo WANG, Chesnay Schepler, Congxian Qiu, Cristian, David Anderson, Dawid Wysakowicz, Dian Fu, DuBin, EAlexRojas, EronWright, Eugen Yushin, Fabian Hueske, Fokko Driesprong, Gary Yao, Hequn Cheng, Igal Shilman, Jamie Grier, JaryZhen, Jeff Zhang, Jihyun Cho, Jinhu Wu, Joerg Schad, KarmaGYZ, Kezhu Wang, Konstantin Knauf, Kostas Kloudas, Lakshmi, Lakshmi Gururaja Rao, Lavkesh Lahngir, Li, Shuangjiang, Mai Nakagawa, Matrix42, Matt, Maximilian Michels, Mododo, Nico Kruber, Paul Lin, Piotr Nowojski, Qi Yu, Qin, Robert, Robert Metzger, Romano Vacca, Rong Rong, Rune Skou Larsen, Seth Wiesman, Shannon Carey, Shimin Yang, Shuyi Chen, Stefan Richter, Stephan Ewen, SuXingLee, TANG Wen-hui, Tao Yang, Thomas Weise, Till Rohrmann, Timo Walther, Tom Goong, Tony Feng, Tony Wei, Tzu-Li (Gordon) Tai, Tzu-Li Chen, Ufuk Celebi, Xingcan Cui, Xpray, XuQianJin-Stars, Xue Yu, Yangze Guo, Ying Xu, Yiqun Lin, Yu Li, Yuanyang Wu, Yun Tang, ZILI CHEN, Zhanchun Zhang, Zhijiang, ZiLi Chen, acqua.csq, alex04.wang, ap, azagrebin, blueszheng, boshu Zheng, chengjie.wu, chensq, chummyhe89, eaglewatcherwb, hequn8128, ifndef-SleePy, intsmaze, jackyyin, jinhu.wjh, jparkie, jrthe42, junsheng.wu, kgorman, kkloudas, kkolman, klion26, lamber-ken, leesf, libenchao, lining, liuzhaokun, lzh3636, maqingxiang, mb-datadome, okidogi, park.yq, sunhaibotb, sunjincheng121, tison, unknown, vinoyang, wenhuitang, wind, xueyu, xuqianjin, yanghua, zentol, zhangzhanchun, zhijiang, zhuzhu.zz, zy, 仲炜, 砚田, 谢磊
`}),e.add({id:191,href:"/2019/03/11/flink-and-prometheus-cloud-native-monitoring-of-streaming-applications/",title:"Flink and Prometheus: Cloud-native monitoring of streaming applications",section:"Flink Blog",content:`This blog post describes how developers can leverage Apache Flink&rsquo;s built-in metrics system together with Prometheus to observe and monitor streaming applications in an effective way. This is a follow-up post from my Flink Forward Berlin 2018 talk (slides, video). We will cover some basic Prometheus concepts and why it is a great fit for monitoring Apache Flink stream processing jobs. There is also an example to showcase how you can utilize Prometheus with Flink to gain insights into your applications and be alerted on potential degradations of your Flink jobs.
Why Prometheus? # Prometheus is a metrics-based monitoring system that was originally created in 2012. The system is completely open-source (under the Apache License 2.0) with a vibrant community behind it, and it graduated from the Cloud Native Computing Foundation last year – a sign of maturity, stability and production-readiness. As we mentioned, the system is based on metrics and it is designed to measure the overall health, behavior and performance of a service. Prometheus features a multi-dimensional data model as well as a flexible query language. It is designed for reliability and can easily be deployed in traditional or containerized environments. Some of the important Prometheus concepts are:
Metrics: Prometheus represents metrics as floating-point values that change over time. These time series have millisecond precision.
Labels are the key-value pairs associated with time series that support Prometheus&rsquo; flexible and powerful data model – in contrast to hierarchical data structures that one might experience with traditional metrics systems.
Scrape: Prometheus is a pull-based system and fetches (&ldquo;scrapes&rdquo;) metrics data from specified sources that expose HTTP endpoints with a text-based format.
PromQL is Prometheus&rsquo; query language. It can be used for both building dashboards and setting up alert rules that will trigger when specific conditions are met.
When considering metrics and monitoring systems for your Flink jobs, there are many options. Flink offers native support for exposing data to Prometheus via the PrometheusReporter configuration. Setting up this integration is very easy.
Prometheus is a great choice because Flink jobs usually do not run in isolation but as part of a larger landscape of microservices. There are two options for making metrics from the other parts of such a system available to Prometheus: client libraries exist for all major languages to instrument your own applications, and a wide variety of exporters expose the metrics of third-party systems (like databases or Apache Kafka) as Prometheus metrics.
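As a rough sketch of the first option (this example is not from the original post and assumes the io.prometheus simpleclient and simpleclient_httpserver dependencies), a plain Java service can be instrumented with the Prometheus Java client and expose its metrics over HTTP for scraping:

import io.prometheus.client.Counter;
import io.prometheus.client.exporter.HTTPServer;

// Minimal sketch: instrumenting a non-Flink service with the Prometheus Java client.
public class InstrumentedService {

    // Counter for handled requests, registered with the default collector registry.
    private static final Counter REQUESTS = Counter.build()
            .name("requests_total")
            .help("Total number of handled requests.")
            .register();

    public static void main(String[] args) throws Exception {
        // Expose all registered metrics on http://localhost:1234/metrics for Prometheus to scrape.
        HTTPServer metricsEndpoint = new HTTPServer(1234);
        while (true) {
            handleRequest();
            REQUESTS.inc();
        }
    }

    private static void handleRequest() throws InterruptedException {
        Thread.sleep(100); // stand-in for real work
    }
}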
Prometheus and Flink in Action # We have provided a GitHub repository that demonstrates the integration described above. To have a look, clone the repository, make sure Docker is installed and run:
./gradlew composeUp This builds a Flink job using the build tool Gradle and starts up a local environment based on Docker Compose running the job in a Flink job cluster (reachable at http://localhost:8081) as well as a Prometheus instance (http://localhost:9090).
Job graph and custom metric for example job in Flink web interface. The PrometheusExampleJob has three operators: Random numbers up to 10,000 are generated, then a map counts the events and creates a histogram of the values passed through. Finally, the events are discarded without further output. The very simple code below is from the second operator. It illustrates how easy it is to add custom metrics relevant to your business logic into your Flink job.
class FlinkMetricsExposingMapFunction extends RichMapFunction&lt;Integer, Integer&gt; { private transient Counter eventCounter; @Override public void open(Configuration parameters) { eventCounter = getRuntimeContext().getMetricGroup().counter(&#34;events&#34;); } @Override public Integer map(Integer value) { eventCounter.inc(); return value; } } Excerpt from FlinkMetricsExposingMapFunction.java demonstrating custom Flink metric. Configuring Prometheus with Flink # To start monitoring Flink with Prometheus, the following steps are necessary:
Make the PrometheusReporter jar available to the classpath of the Flink cluster (it comes with the Flink distribution):
cp /opt/flink/opt/flink-metrics-prometheus-1.7.2.jar /opt/flink/lib Configure the reporter in Flink&rsquo;s flink-conf.yaml. All job managers and task managers will expose the metrics on the configured port.
metrics.reporters: prom metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter metrics.reporter.prom.port: 9999 Prometheus needs to know where to scrape metrics. In a static scenario, you can simply configure Prometheus in prometheus.yml with the following:
scrape_configs: - job_name: 'flink' static_configs: - targets: ['job-cluster:9999', 'taskmanager1:9999', 'taskmanager2:9999'] In more dynamic scenarios we recommend using Prometheus&rsquo; service discovery support for different platforms such as Kubernetes, AWS EC2 and more.
Both custom metrics are now available in Prometheus:
Example metric in Prometheus web UI. More technical metrics from the Flink cluster (like checkpoint sizes or duration, Kafka offsets or resource consumption) are also available. If you are interested, you can check out the HTTP endpoints exposing all Prometheus metrics for the job managers and the two task managers on http://localhost:9249, http://localhost:9250 and http://localhost:9251, respectively.
To test Prometheus&rsquo; alerting feature, kill one of the Flink task managers via
docker kill taskmanager1 Our Flink job can recover from this partial failure via the mechanism of Checkpointing. Nevertheless, after roughly one minute (as configured in the alert rule) the following alert will fire:
Example alert in Prometheus web UI. In real-world situations alerts like this one can be routed through a component called Alertmanager and be grouped into notifications to systems like email, PagerDuty or Slack.
Go ahead and play around with the setup, and check out the Grafana instance reachable at http://localhost:3000 (credentials admin:flink) for visualizing Prometheus metrics. If there are any questions or problems, feel free to create an issue. Once finished, do not forget to tear down the setup via
./gradlew composeDown Conclusion # Using Prometheus together with Flink provides an easy way for effective monitoring and alerting of your Flink jobs. Both projects have exciting and vibrant communities behind them with new developments and additions scheduled for upcoming releases. We encourage you to try the two technologies together as it has immensely improved our insights into Flink jobs running in production.
`}),e.add({id:192,href:"/2019/03/06/what-to-expect-from-flink-forward-san-francisco-2019/",title:"What to expect from Flink Forward San Francisco 2019",section:"Flink Blog",content:`The third annual Flink Forward San Francisco is just a few weeks away! As always, Flink Forward will be the right place to meet and mingle with experienced Flink users, contributors, and committers. Attendees will hear and chat about the latest developments around Flink and learn from technical deep-dive sessions and exciting use cases that were put into production with Flink. The event will take place on April 1-2, 2019 at Hotel Nikko in San Francisco. The program committee assembled an amazing lineup of speakers who will cover many different aspects of Apache Flink and stream processing.
Some highlights of the program are:
Realtime Store Visit Predictions at Scale: Luca Giovagnoli from Yelp will talk about a &ldquo;multidisciplinary&rdquo; Flink application that combines geospatial clustering algorithms, Machine Learning models, and cutting-edge stream-processing technology.
Real-time Processing with Flink for Machine Learning at Netflix: Elliot Chow will discuss the practical aspects of using Apache Flink to power Machine Learning algorithms for video recommendations, search results ranking, and selection of artwork images at Netflix.
Building production Flink jobs with Airstream at Airbnb: Pala Muthiah and Hao Wang will reveal how Airbnb builds real time data pipelines with Airstream, Airbnb&rsquo;s computation framework that is powered by Flink SQL.
When Table meets AI: Build Flink AI Ecosystem on Table API: Shaoxuan Wang from Alibaba will discuss how they are building a solid AI ecosystem for unified batch/streaming Machine Learning data pipelines on top of Flink&rsquo;s Table API.
Adventures in Scaling from Zero to 5 Billion Data Points per Day: Dave Torok will take us through Comcast&rsquo;s journey in scaling the company&rsquo;s operationalized Machine Learning framework from the very early days in production to processing more than 5 billion data points per day.
If you&rsquo;re new to Apache Flink or want to deepen your knowledge of the framework, Flink Forward again features a full day of training.
You can choose from 3 training tracks:
Introduction to Streaming with Apache Flink: A hands-on, in-depth introduction to stream processing and Apache Flink, this course emphasizes those features of Flink that make it easy to build and manage accurate, fault tolerant applications on streams.
Analyzing Streaming Data with Flink SQL: In this hands-on training, you will learn what it means to run SQL queries on data streams and how to fully leverage the potential of SQL on Flink. We&rsquo;ll also cover some of the more recent features such as time-versioned joins and the MATCH_RECOGNIZE clause.
Troubleshooting and Operating Flink at large scale: In this training, we will focus on everything you need to run Apache Flink applications reliably and efficiently in production including topics like capacity planning, monitoring, troubleshooting and tuning Apache Flink.
If you haven&rsquo;t done so yet, check out the full schedule and register your attendance. I&rsquo;m looking forward to meeting you at Flink Forward San Francisco.
Fabian
`}),e.add({id:193,href:"/2019/02/25/apache-flink-1.6.4-released/",title:"Apache Flink 1.6.4 Released",section:"Flink Blog",content:`The Apache Flink community released the fourth bugfix version of the Apache Flink 1.6 series.
This release includes more than 25 fixes and minor improvements for Flink 1.6.3. The list below details all of them.
We highly recommend that all users upgrade to Flink 1.6.4.
Updated Maven dependencies:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.6.4&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.6.4&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.6.4&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
List of resolved issues:
Bug [FLINK-10721] - Kafka discovery-loop exceptions may be swallowed [FLINK-10761] - MetricGroup#getAllVariables can deadlock [FLINK-10774] - connection leak when partition discovery is disabled and open throws exception [FLINK-10848] - Flink&#39;s Yarn ResourceManager can allocate too many excess containers [FLINK-11022] - Update LICENSE and NOTICE files for older releases [FLINK-11071] - Dynamic proxy classes cannot be resolved when deserializing job graph [FLINK-11084] - Incorrect ouput after two consecutive split and select [FLINK-11119] - Incorrect Scala example for Table Function [FLINK-11134] - Invalid REST API request should not log the full exception in Flink logs [FLINK-11151] - FileUploadHandler stops working if the upload directory is removed [FLINK-11173] - Proctime attribute validation throws an incorrect exception message [FLINK-11224] - Log is missing in scala-shell [FLINK-11232] - Empty Start Time of sub-task on web dashboard [FLINK-11234] - ExternalTableCatalogBuilder unable to build a batch-only table [FLINK-11235] - Elasticsearch connector leaks threads if no connection could be established [FLINK-11251] - Incompatible metric name on prometheus reporter [FLINK-11389] - Incorrectly use job information when call getSerializedTaskInformation in class TaskDeploymentDescriptor [FLINK-11584] - ConfigDocsCompletenessITCase fails DescriptionBuilder#linebreak() is used [FLINK-11585] - Prefix matching in ConfigDocsGenerator can result in wrong assignments Improvement [FLINK-10910] - Harden Kubernetes e2e test [FLINK-11079] - Skip deployment for flnk-storm-examples [FLINK-11207] - Update Apache commons-compress from 1.4.1 to 1.18 [FLINK-11262] - Bump jython-standalone to 2.7.1 [FLINK-11289] - Rework example module structure to account for licensing [FLINK-11304] - Typo in time attributes doc [FLINK-11469] - fix Tuning Checkpoints and Large State doc `}),e.add({id:194,href:"/2019/02/21/monitoring-apache-flink-applications-101/",title:"Monitoring Apache Flink Applications 101",section:"Flink Blog",content:` This blog post provides an introduction to Apache Flink’s built-in monitoring and metrics system, that allows developers to effectively monitor their Flink jobs. Oftentimes, the task of picking the relevant metrics to monitor a Flink application can be overwhelming for a DevOps team that is just starting with stream processing and Apache Flink. Having worked with many organizations that deploy Flink at scale, I would like to share my experience and some best practice with the community.
With business-critical applications running on Apache Flink, performance monitoring becomes an increasingly important part of a successful production deployment. It ensures that any degradation or downtime is immediately identified and resolved as quickly as possible.
Monitoring goes hand-in-hand with observability, which is a prerequisite for troubleshooting and performance tuning. Nowadays, with the complexity of modern enterprise applications and the speed of delivery increasing, an engineering team must understand and have a complete overview of its applications’ status at any given point in time.
Flink’s Metrics System # The foundation for monitoring Flink jobs is its metrics system, which consists of two components: Metrics and MetricsReporters.
Metrics # Flink comes with a comprehensive set of built-in metrics such as:
Used JVM Heap / NonHeap / Direct Memory (per Task-/JobManager) Number of Job Restarts (per Job) Number of Records Per Second (per Operator) &hellip; These metrics have different scopes and measure more general (e.g. JVM or operating system) as well as Flink-specific aspects.
As a user, you can and should add application-specific metrics to your functions. Typically these include counters for the number of invalid records or the number of records temporarily buffered in managed state. Besides counters, Flink offers additional metrics types like gauges and histograms. For instructions on how to register your own metrics with Flink’s metrics system please check out Flink’s documentation. In this blog post, we will focus on how to get the most out of Flink’s built-in metrics.
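As a minimal sketch of such application-specific metrics (class and metric names below are illustrative, not taken from the post), a function can register a counter for invalid records and a gauge for the number of currently buffered records against its metric group:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;
import org.apache.flink.metrics.Gauge;
import org.apache.flink.util.Collector;

// Sketch of a function exposing two application-specific metrics:
// a counter for invalid records and a gauge for the current buffer size.
public class BufferingFlatMap extends RichFlatMapFunction<String, String> {

    private transient Counter invalidRecords;
    private int bufferedRecords;

    @Override
    public void open(Configuration parameters) {
        invalidRecords = getRuntimeContext()
                .getMetricGroup()
                .counter("invalidRecords");

        // A gauge reports its current value whenever the metric is scraped or reported.
        getRuntimeContext()
                .getMetricGroup()
                .gauge("bufferedRecords", (Gauge<Integer>) () -> bufferedRecords);
    }

    @Override
    public void flatMap(String value, Collector<String> out) {
        if (value.isEmpty()) {
            invalidRecords.inc();
            return;
        }
        bufferedRecords++; // illustrative only; real buffering would live in managed state
        out.collect(value);
    }
}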
MetricsReporters # All metrics can be queried via Flink’s REST API. However, users can configure MetricsReporters to send the metrics to external systems. Apache Flink provides reporters to the most common monitoring tools out-of-the-box including JMX, Prometheus, Datadog, Graphite and InfluxDB. For information about how to configure a reporter check out Flink’s MetricsReporter documentation.
In the remaining part of this blog post, we will go over some of the most important metrics to monitor your Apache Flink application.
Monitoring General Health # The first thing you want to monitor is whether your job is actually in a RUNNING state. In addition, it pays off to monitor the number of restarts and the time since the last restart.
Generally speaking, successful checkpointing is a strong indicator of the general health of your application. For each checkpoint, checkpoint barriers need to flow through the whole topology of your Flink job and events and barriers cannot overtake each other. Therefore, a successful checkpoint shows that no channel is fully congested.
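Checkpoint counters and restart counts only move if checkpointing and a restart strategy are actually enabled; a minimal sketch of such a job setup (intervals are arbitrary example values, not from the post) might look like this:

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedJobSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Take a checkpoint every 60 seconds, so numberOfCompletedCheckpoints
        // and numberOfFailedCheckpoints become meaningful signals.
        env.enableCheckpointing(60_000L);

        // Allow up to 3 restarts with a 10 second delay; failures show up in fullRestarts.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.seconds(10)));

        // ... define sources, transformations and sinks, then call env.execute(...) ...
    }
}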
Key Metrics
Metric Scope Description uptime job The time that the job has been running without interruption. fullRestarts job The total number of full restarts since this job was submitted. numberOfCompletedCheckpoints job The number of successfully completed checkpoints. numberOfFailedCheckpoints job The number of failed checkpoints. Example Dashboard Panels
Uptime (35 minutes), Restarting Time (3 milliseconds) and Number of Full Restarts (7) Completed Checkpoints (18336), Failed (14) Possible Alerts
ΔfullRestarts &gt; threshold ΔnumberOfFailedCheckpoints &gt; threshold Monitoring Progress &amp; Throughput # Knowing that your application is RUNNING and checkpointing is working fine is good, but it does not tell you whether the application is actually making progress and keeping up with the upstream systems.
Throughput # Flink provides multiple metrics to measure the throughput of our application. For each operator or task (remember: a task can contain multiple chained operators), Flink counts the number of records and bytes going in and out. Out of those metrics, the rate of outgoing records per operator is often the most intuitive and easiest to reason about.
Key Metrics
Metric Scope Description numRecordsOutPerSecond task The number of records this operator/task sends per second. numRecordsOutPerSecond operator The number of records this operator sends per second. Example Dashboard Panels
Mean Records Out per Second per Operator Possible Alerts
recordsOutPerSecond = 0 (for a non-Sink operator) Note: Source operators always have zero incoming records. Sink operators always have zero outgoing records because the metrics only count Flink-internal communication. There is a JIRA ticket to change this behavior.
Progress # For applications that use event-time semantics, it is important that watermarks progress over time. A watermark of time t tells the framework that it should no longer expect to receive events with a timestamp earlier than t and that, in turn, it can trigger all operations that were scheduled for a timestamp &lt; t. For example, an event-time window that ends at t = 30 will be closed and evaluated once the watermark passes 30.
As a consequence, you should monitor the watermark at event-time-sensitive operators in your application, such as process functions and windows. If the difference between the current processing time and the watermark, known as event-time skew, is unusually high, it typically implies one of two issues. First, it could mean that you are simply processing old events, for example during catch-up after a downtime or when your job is simply not able to keep up and events are queuing up. Second, it could mean that a single upstream sub-task has not sent a watermark for a long time (for example because it did not receive any events to base the watermark on), which also prevents the watermark in downstream operators from progressing. This JIRA ticket provides further information and a workaround for the latter.
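One way to keep an eye on this is to expose the event-time skew directly from an operator. The sketch below (the metric name is illustrative) reads the current watermark and processing time from the timer service of a ProcessFunction and reports their difference as a gauge:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Gauge;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

// Sketch: report the difference between processing time and the current watermark
// (the event-time skew) as a gauge, per parallel subtask.
public class EventTimeSkewFunction<T> extends ProcessFunction<T, T> {

    private volatile long eventTimeSkew;

    @Override
    public void open(Configuration parameters) {
        getRuntimeContext()
                .getMetricGroup()
                .gauge("eventTimeSkew", (Gauge<Long>) () -> eventTimeSkew);
    }

    @Override
    public void processElement(T value, Context ctx, Collector<T> out) {
        // currentWatermark() returns Long.MIN_VALUE until the first watermark arrives.
        long watermark = ctx.timerService().currentWatermark();
        if (watermark > Long.MIN_VALUE) {
            eventTimeSkew = ctx.timerService().currentProcessingTime() - watermark;
        }
        out.collect(value);
    }
}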
Key Metrics
Metric Scope Description currentOutputWatermark operator The last watermark this operator has emitted. Example Dashboard Panels
Event Time Lag per Subtask of a single operator in the topology. In this case, the watermark is lagging a few seconds behind for each subtask. Possible Alerts
currentProcessingTime - currentOutputWatermark &gt; threshold &ldquo;Keeping Up&rdquo; # When consuming from a message queue, there is often a direct way to monitor if your application is keeping up. By using connector-specific metrics you can monitor how far behind the head of the message queue your current consumer group is. Flink forwards the underlying metrics from most sources.
Key Metrics
Metric Scope Description records-lag-max user applies to FlinkKafkaConsumer. The maximum lag in terms of the number of records for any partition in this window. An increasing value over time is your best indication that the consumer group is not keeping up with the producers. millisBehindLatest user applies to FlinkKinesisConsumer. The number of milliseconds a consumer is behind the head of the stream. For any consumer and Kinesis shard, this indicates how far it is behind the current time. Possible Alerts
records-lag-max &gt; threshold millisBehindLatest &gt; threshold Monitoring Latency # Generally speaking, latency is the delay between the creation of an event and the time at which results based on this event become visible. Once the event is created it is usually stored in a persistent message queue, before it is processed by Apache Flink, which then writes the results to a database or calls a downstream system. In such a pipeline, latency can be introduced at each stage and for various reasons including the following:
It might take a varying amount of time until events are persisted in the message queue. During periods of high load or during recovery, events might spend some time in the message queue until they are processed by Flink (see previous section). Some operators in a streaming topology need to buffer events for some time (e.g. in a time window) for functional reasons. Each computation in your Flink topology (framework or user code), as well as each network shuffle, takes time and adds to latency. If the application emits through a transactional sink, the sink will only commit and publish transactions upon successful checkpoints of Flink, adding latency usually up to the checkpointing interval for each record. In practice, it has proven invaluable to add timestamps to your events at multiple stages (at least at creation, persistence, ingestion by Flink, publication by Flink, possibly sampling those to save bandwidth). The differences between these timestamps can be exposed as a user-defined metric in your Flink topology to derive the latency distribution of each stage.
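As a sketch of this approach (the event type, field and metric names are made up for illustration; a real setup would more likely feed the values into a histogram to get a distribution), a function can compare a creation timestamp carried in each record with the current wall-clock time and expose the last observed difference as a gauge:

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Gauge;

// Hypothetical event type carrying the timestamp of when it was created upstream.
class MyEvent {
    long creationTimestamp;
}

// Sketch: measure the time between event creation and ingestion by this operator
// and expose the last observed value as a gauge.
public class CreationToIngestionLag extends RichMapFunction<MyEvent, MyEvent> {

    private volatile long lastObservedLagMillis;

    @Override
    public void open(Configuration parameters) {
        getRuntimeContext()
                .getMetricGroup()
                .gauge("creationToIngestionLagMillis", (Gauge<Long>) () -> lastObservedLagMillis);
    }

    @Override
    public MyEvent map(MyEvent event) {
        lastObservedLagMillis = System.currentTimeMillis() - event.creationTimestamp;
        return event;
    }
}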
In the rest of this section, we will only consider latency that is introduced inside the Flink topology and cannot be attributed to transactional sinks or to events being buffered for functional reasons (see the list above).
To this end, Flink comes with a feature called Latency Tracking. When enabled, Flink will insert so-called latency markers periodically at all sources. For each sub-task, a latency distribution from each source to this operator will be reported. The granularity of these histograms can be further controlled by setting metrics.latency.granularity as desired.
Due to the potentially high number of histograms (in particular for metrics.latency.granularity: subtask), enabling latency tracking can significantly impact the performance of the cluster. It is recommended to only enable it to locate sources of latency during debugging.
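Besides the metrics.latency.* options in the Flink configuration, latency tracking can also be enabled per job through the ExecutionConfig; the interval below is just an example value:

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class LatencyTrackingSetup {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Emit latency markers from all sources every 5 seconds (interval in milliseconds).
        env.getConfig().setLatencyTrackingInterval(5_000L);
    }
}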
Metrics
Metric Scope Description latency operator The latency from the source operator to this operator. restartingTime job The time it took to restart the job, or how long the current restart has been in progress. Example Dashboard Panel
Latency distribution between a source and a single sink subtask. JVM Metrics # So far we have only looked at Flink-specific metrics. As long as latency &amp; throughput of your application are in line with your expectations and it is checkpointing consistently, this is probably everything you need. On the other hand, if your job’s performance is starting to degrade, among the first metrics you want to look at are the memory consumption and CPU load of your Task- &amp; JobManager JVMs.
Memory # Flink reports the usage of Heap, NonHeap, Direct &amp; Mapped memory for JobManagers and TaskManagers.
Heap memory - as with most JVM applications - is the most volatile and important metric to watch. This is especially true when using Flink’s filesystem statebackend as it keeps all state objects on the JVM Heap. If the size of long-living objects on the Heap increases significantly, this can usually be attributed to the size of your application state (check the checkpointing metrics for an estimated size of the on-heap state). The possible reasons for growing state are very application-specific. Typically, an increasing number of keys, a large event-time skew between different input streams or simply missing state cleanup may cause growing state.
NonHeap memory is dominated by the metaspace, which holds class metadata as well as static content and whose size is unlimited by default. There is a JIRA ticket to limit the size to 250 megabytes by default.
The biggest driver of Direct memory is by far the number of Flink’s network buffers, which can be configured.
Mapped memory is usually close to zero as Flink does not use memory-mapped files.
In a containerized environment, you should additionally monitor the overall memory consumption of the Job- and TaskManager containers to ensure they don’t exceed their resource limits. This is particularly important when using the RocksDB statebackend, since RocksDB allocates a considerable amount of memory off heap. To understand how much memory RocksDB might use, you can check out this blog post by Stefan Richter.
Key Metrics
Metric Scope Description Status.JVM.Memory.NonHeap.Committed job-/taskmanager The amount of non-heap memory guaranteed to be available to the JVM (in bytes). Status.JVM.Memory.Heap.Used job-/taskmanager The amount of heap memory currently used (in bytes). Status.JVM.Memory.Heap.Committed job-/taskmanager The amount of heap memory guaranteed to be available to the JVM (in bytes). Status.JVM.Memory.Direct.MemoryUsed job-/taskmanager The amount of memory used by the JVM for the direct buffer pool (in bytes). Status.JVM.Memory.Mapped.MemoryUsed job-/taskmanager The amount of memory used by the JVM for the mapped buffer pool (in bytes). Status.JVM.GarbageCollector.G1 Young Generation.Time job-/taskmanager The total time spent performing G1 Young Generation garbage collection. Status.JVM.GarbageCollector.G1 Old Generation.Time job-/taskmanager The total time spent performing G1 Old Generation garbage collection. Example Dashboard Panel
TaskManager memory consumption and garbage collection times. JobManager memory consumption and garbage collection times. Possible Alerts
container memory limit &lt; container memory + safety margin CPU # Besides memory, you should also monitor the CPU load of the TaskManagers. If your TaskManagers are constantly under very high load, you might be able to improve the overall performance by decreasing the number of task slots per TaskManager (in case of a Standalone setup), by providing more resources to the TaskManager (in case of a containerized setup), or by providing more TaskManagers. In general, a system already running under very high load during normal operations will need much more time to catch up after recovering from a downtime. During this time, you will see a much higher latency (event-time skew) than usual.
A sudden increase in the CPU load might also be attributed to high garbage collection pressure, which should be visible in the JVM memory metrics as well.
If one or a few TaskManagers are constantly under very high load, this can slow down the whole topology due to long checkpoint alignment times and increasing event-time skew. A common reason is skew in the partition key of the data, which can be mitigated by pre-aggregating before the shuffle or keying on a more evenly distributed key.
Key Metrics
Metric Scope Description Status.JVM.CPU.Load job-/taskmanager The recent CPU usage of the JVM. Example Dashboard Panel
TaskManager & JobManager CPU load. System Resources # In addition to the JVM metrics above, it is also possible to use Flink’s metrics system to gather insights about system resources, i.e. memory, CPU &amp; network-related metrics for the whole machine as opposed to the Flink processes alone. System resource monitoring is disabled by default and requires additional dependencies on the classpath. Please check out the Flink system resource metrics documentation for additional guidance and details. System resource monitoring in Flink can be very helpful in setups without existing host monitoring capabilities.
Conclusion # This post tries to shed some light on Flink’s metrics and monitoring system. You can utilise it as a starting point when you first think about how to successfully monitor your Flink application. I highly recommend starting to monitor your Flink application early in the development phase. This way, you will be able to improve your dashboards and alerts over time and, more importantly, observe the performance impact of changes to your application throughout development. By doing so, you can ask the right questions about the runtime behaviour of your application and learn much more about Flink’s internals early on.
Last but not least, this post only scratches the surface of the overall metrics and monitoring capabilities of Apache Flink. I highly recommend going over Flink’s metrics documentation for a full reference of Flink’s metrics system.
`}),e.add({id:195,href:"/2019/02/15/apache-flink-1.7.2-released/",title:"Apache Flink 1.7.2 Released",section:"Flink Blog",content:`The Apache Flink community released the second bugfix version of the Apache Flink 1.7 series.
This release includes more than 40 fixes and minor improvements for Flink 1.7.1, covering several critical recovery issues as well as problems in the Flink streaming connectors.
The list below details all fixes. We highly recommend that all users upgrade to Flink 1.7.2.
Updated Maven dependencies:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.7.2&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.7.2&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.7.2&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-11179] - JoinCancelingITCase#testCancelSortMatchWhileDoingHeavySorting test error [FLINK-11180] - ProcessFailureCancelingITCase#testCancelingOnProcessFailure [FLINK-11181] - SimpleRecoveryITCaseBase test error Bug [FLINK-10721] - Kafka discovery-loop exceptions may be swallowed [FLINK-10761] - MetricGroup#getAllVariables can deadlock [FLINK-10774] - connection leak when partition discovery is disabled and open throws exception [FLINK-10848] - Flink&#39;s Yarn ResourceManager can allocate too many excess containers [FLINK-11046] - ElasticSearch6Connector cause thread blocked when index failed with retry [FLINK-11071] - Dynamic proxy classes cannot be resolved when deserializing job graph [FLINK-11083] - CRowSerializerConfigSnapshot is not instantiable [FLINK-11084] - Incorrect ouput after two consecutive split and select [FLINK-11100] - Presto S3 FileSystem E2E test broken [FLINK-11119] - Incorrect Scala example for Table Function [FLINK-11134] - Invalid REST API request should not log the full exception in Flink logs [FLINK-11145] - Fix Hadoop version handling in binary release script [FLINK-11151] - FileUploadHandler stops working if the upload directory is removed [FLINK-11168] - LargePlanTest times out on Travis [FLINK-11173] - Proctime attribute validation throws an incorrect exception message [FLINK-11187] - StreamingFileSink with S3 backend transient socket timeout issues [FLINK-11191] - Exception in code generation when ambiguous columns in MATCH_RECOGNIZE [FLINK-11194] - missing Scala 2.12 build of HBase connector [FLINK-11201] - Document SBT dependency requirements when using MiniClusterResource [FLINK-11224] - Log is missing in scala-shell [FLINK-11227] - The DescriptorProperties contains some bounds checking errors [FLINK-11232] - Empty Start Time of sub-task on web dashboard [FLINK-11234] - ExternalTableCatalogBuilder unable to build a batch-only table [FLINK-11235] - Elasticsearch connector leaks threads if no connection could be established [FLINK-11246] - Fix distinct AGG visibility issues [FLINK-11251] - Incompatible metric name on prometheus reporter [FLINK-11279] - Invalid week interval parsing in ExpressionParser [FLINK-11302] - FlinkS3FileSystem uses an incorrect path for temporary files. 
[FLINK-11389] - Incorrectly use job information when call getSerializedTaskInformation in class TaskDeploymentDescriptor [FLINK-11419] - StreamingFileSink fails to recover after taskmanager failure [FLINK-11436] - Java deserialization failure of the AvroSerializer when used in an old CompositeSerializers New Feature [FLINK-10457] - Support SequenceFile for StreamingFileSink Improvement [FLINK-10910] - Harden Kubernetes e2e test [FLINK-11023] - Update LICENSE and NOTICE files for flink-connectors [FLINK-11079] - Skip deployment for flink-storm-examples [FLINK-11207] - Update Apache commons-compress from 1.4.1 to 1.18 [FLINK-11216] - Back to top button is missing in the Joining document and is not properly placed in the Process Function document [FLINK-11262] - Bump jython-standalone to 2.7.1 [FLINK-11289] - Rework example module structure to account for licensing [FLINK-11304] - Typo in time attributes doc [FLINK-11331] - Fix errors in tableApi.md and functions.md [FLINK-11469] - fix Tuning Checkpoints and Large State doc [FLINK-11473] - Clarify Documenation on Latency Tracking [FLINK-11628] - Cache maven on travis `}),e.add({id:196,href:"/2019/02/13/batch-as-a-special-case-of-streaming-and-alibabas-contribution-of-blink/",title:"Batch as a Special Case of Streaming and Alibaba's contribution of Blink",section:"Flink Blog",content:`Last week, we broke the news that Alibaba decided to contribute its Flink-fork, called Blink, back to the Apache Flink project. Why is that a big thing for Flink, what will it mean for users and the community, and how does it fit into Flink’s overall vision? Let&rsquo;s take a step back to understand this better&hellip;
A Unified Approach to Batch and Streaming # Since its early days, Apache Flink has followed the philosophy of taking a unified approach to batch and streaming data processing. The core building block is &ldquo;continuous processing of unbounded data streams&rdquo;: if you can do that, you can also do offline processing of bounded data sets (batch processing use cases), because these are just streams that happen to end at some point.
The &ldquo;streaming first, with batch as a special case of streaming&rdquo; philosophy is supported by various projects (for example Flink, Beam, etc.) and has often been cited as a powerful way to build data applications that generalize across real-time and offline processing and to help greatly reduce the complexity of data infrastructures.
Why are there still batch processors? # However, &ldquo;batch is just a special case of streaming&rdquo; does not mean that any stream processor is now the right tool for your batch processing use cases - the introduction of stream processors did not render batch processors obsolete:
Pure stream processing systems are very slow at batch processing workloads. No one would consider it a good idea to use a stream processor that shuffles through message queues to analyze large amounts of available data.
Unified APIs like Apache Beam often delegate to different runtimes depending on whether the data is continuous/unbounded or finite/bounded. For example, the batch and streaming runtime implementations of Google Cloud Dataflow are different, to get the desired performance and resilience in each case.
Apache Flink has a streaming API that can do bounded/unbounded use cases, but still offers a separate DataSet API and runtime stack that is faster for batch use cases.
What is the reason for the above? Where did &ldquo;batch is just a special case of streaming&rdquo; go wrong?
The answer is simple: nothing is wrong with that paradigm. Unifying batch and streaming in the API is one aspect. One also needs to exploit certain characteristics of the special case “bounded data” in the runtime to competitively handle batch processing use cases. After all, batch processors have been built specifically for that special case.
Batch on top of a Streaming Runtime # We always believed that it is possible to have a runtime that is state-of-the-art for both stream processing and batch processing use cases at the same time. A runtime that is streaming-first, but can exploit just the right amount of special properties of bounded streams to be as fast for batch use cases as dedicated batch processors. This is the unique approach that Flink takes.
Apache Flink has a network stack that supports both low-latency/high-throughput streaming data exchanges, as well as high-throughput batch shuffles. Flink has streaming runtime operators for many operations, but also specialized operators for bounded inputs, which get used when you choose the DataSet API or select the batch environment in the Table API.
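To make the API side of this concrete, here is a rough sketch (using the pre-Blink 1.x Table API; table and field names are made up) of how the same query text can run against a bounded DataSet or an unbounded DataStream, letting the planner pick batch-specific or continuous streaming operators, respectively:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class BoundedVersusUnbounded {
    public static void main(String[] args) {
        // Bounded input: the planner can pick batch-style operators (e.g. a hybrid hash join).
        ExecutionEnvironment batchEnv = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment batchTableEnv = TableEnvironment.getTableEnvironment(batchEnv);
        DataSet<String> boundedWords = batchEnv.fromElements("flink", "blink", "flink");
        batchTableEnv.registerDataSet("words", boundedWords, "word");
        Table batchResult = batchTableEnv.sqlQuery("SELECT word, COUNT(*) FROM words GROUP BY word");

        // Unbounded input: the identical query text runs with continuous streaming operators.
        StreamExecutionEnvironment streamEnv = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment streamTableEnv = TableEnvironment.getTableEnvironment(streamEnv);
        DataStream<String> unboundedWords = streamEnv.fromElements("flink", "blink", "flink");
        streamTableEnv.registerDataStream("words", unboundedWords, "word");
        Table streamResult = streamTableEnv.sqlQuery("SELECT word, COUNT(*) FROM words GROUP BY word");
    }
}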
The figure illustrates a streaming join and a batch join. The batch join can read one input fully into a hash table and then probe with the other input. The stream join needs to build tables for both sides, because it needs to continuously process both inputs. For data larger than memory, the batch join can partition both data sets into subsets that fit in memory (data hits disk once) whereas the continuous nature of the stream join requires it to always keep all data in the table and repeatedly hit disk on cache misses. Because of that, Apache Flink has been actually demonstrating some pretty impressive batch processing performance since its early days. The below benchmark is a bit older, but validated our architectural approach early on.
Time to sort 3.2 TB (80 GB/node), in seconds
(Presentation by Dongwon Kim, Flink Forward Berlin 2015.) What is still missing? # To conclude the approach and make Flink&rsquo;s experience on bounded data (batch) state-of-the-art, we need to add a few more enhancements. We believe that these features are key to realizing our vision:
(1) A truly unified runtime operator stack: Currently the bounded and unbounded operators have a different network and threading model and don&rsquo;t mix and match. The original reason was that batch operators followed a &ldquo;pull model&rdquo; (easier for batch algorithms), while streaming operators followed a &ldquo;push model&rdquo; (better latency/throughput characteristics). In a unified stack, continuous streaming operators are the foundation. When operating on bounded data without latency constraints, the API or the query optimizer can select from a larger set of operators. The optimizer can pick, for example, a specialized join operator that first consumes one input stream entirely before reading the second input stream.
(2) Exploiting bounded streams to reduce the scope of fault tolerance: When input data is bounded, it is possible to completely buffer data during shuffles (memory or disk) and replay that data after a failure. This makes recovery more fine grained and thus much more efficient.
(3) Exploiting bounded stream operator properties for scheduling: A continuous unbounded streaming application needs (by definition) all operators running at the same time. An application on bounded data can schedule operations after another, depending on how the operators consume data (e.g., first build hash table, then probe hash table). This increases resource efficiency.
(4) Enabling these special case optimizations for the DataStream API: Currently, only the Table API (which is unified across bounded/unbounded streams) activates these optimizations when working on bounded data.
(5) Performance and coverage for SQL: SQL is the de-facto standard data language, and while it is also being rapidly adopted for continuous streaming use cases, there is absolutely no way past it for bounded/batch use cases. To be competitive with the best batch engines, Flink needs more coverage and performance for the SQL query execution. While the core data-plane in Flink is high performance, the speed of SQL execution ultimately depends a lot also on optimizer rules, a rich set of operators, and features like code generation.
Enter Blink # Blink is a fork of Apache Flink, originally created inside Alibaba to improve Flink’s behavior for internal use cases. Blink adds a series of improvements and integrations (see the Readme for details), many of which fall into the category of improved bounded-data/batch processing and SQL. In fact, of the above list of features for a unified batch/streaming system, Blink implements significant steps forward in all except (4):
Unified Stream Operators: Blink extends the Flink streaming runtime operator model to support selectively reading from different inputs, while keeping the push model for very low latency. This control over the inputs helps to now support algorithms like hybrid hash-joins on the same operator and threading model as continuous symmetric joins through RocksDB. These operators also form the basis for future features like “Side Inputs”.
Table API &amp; SQL Query Processor: The SQL query processor is the component that changed the most compared to the latest Flink master branch:
While Flink currently translates queries either into DataSet or DataStream programs (depending on the characteristics of their inputs), Blink translates queries to a data flow of the aforementioned stream operators.
Blink adds many more runtime operators for common SQL operations like semi-joins, anti-joins, etc.
The query planner (optimizer) is still based on Apache Calcite, but has many more optimization rules (incl. join reordering) and uses a proper cost model for planning.
Stream operators are more aggressively chained.
The common data structures (sorters, hash tables) and serializers are extended to go even further in operating on binary data and saving serialization overhead. Code generation is used for the row serializers.
Improved Scheduling and Failure Recovery: Finally, Blink implements several improvements for task scheduling and fault tolerance. The scheduling strategies use resources better by exploiting how the operators process their input data. The failover strategies recover more fine-grained along the boundaries of persistent shuffles. A failed JobManager can be replaced without restarting a running application.
The changes in Blink result in a big improvement in performance. The below numbers were reported by the developers of Blink to give a rough impression of the performance gains.
Relative performance of Blink versus Flink 1.6.0 in the TPC-H benchmark, query by query.
The performance improvement is on average 10x.
(Presentation by Xiaowei Jiang at Flink Forward Berlin, 2018.) Performance of Blink versus Spark in the TPC-DS benchmark, aggregate time for all queries together.
(Presentation by Xiaowei Jiang at Flink Forward Beijing, 2018.) How do we plan to merge Blink and Flink? # Blink’s code is currently available as a branch in the Apache Flink repository. It is a challenge to merge such a large amount of changes while making the merge process as non-disruptive as possible and keeping public APIs as stable as possible.
The community’s merge plan focuses initially on the bounded/batch processing features mentioned above and follows the following approach to ensure a smooth integration:
To merge Blink’s SQL/Table API query processor enhancements, we exploit the fact that both Flink and Blink have the same APIs: SQL and the Table API. Following some restructuring of the Table/SQL module (FLIP-32) we plan to merge the Blink query planner (optimizer) and runtime (operators) as an additional query processor next to the current SQL runtime. Think of it as two different runners for the same APIs.
Initially, users will be able to select which query processor to use. After a transition period in which the new query processor will be developed to subsume the current one, the current processor will most likely be deprecated and eventually dropped. Given that SQL is such a well-defined interface, we anticipate that this transition will cause little friction for users: mostly a pleasant surprise in the form of broader SQL feature coverage and a boost in performance.
To support the merge of Blink’s enhancements to scheduling and recovery for jobs on bounded data, the Flink community is already working on refactoring its current scheduler and adding support for pluggable scheduling and fail-over strategies.
Once this effort is finished, we can add Blink’s scheduling and recovery strategies as a new scheduling strategy that is used by the new query processor. Eventually, we plan to use the new scheduling strategy also for bounded DataStream programs.
The extended catalog support, DDL support, and support for Hive’s catalog and integrations are currently going through separate design discussions. We plan to leverage existing code here whenever it makes sense.
Summary # We believe that the data processing stack of the future is based on stream processing: The elegance of stream processing with its ability to model offline processing (batch), real-time data processing, and event-driven applications in the same way, while offering high performance and consistency is simply too compelling.
Exploiting certain properties of bounded data is important for a stream processor to achieve the same performance as dedicated batch processors. While Flink always supported batch processing, the project is taking the next step in building a unified runtime and towards becoming a stream processor that is competitive with batch processing systems even on their home turf: OLAP SQL. The contribution of Alibaba’s Blink code helps the Flink community to pick up the speed on this development.
`}),e.add({id:197,href:"/2018/12/26/apache-flink-1.5.6-released/",title:"Apache Flink 1.5.6 Released",section:"Flink Blog",content:`The Apache Flink community released the sixth and last bugfix version of the Apache Flink 1.5 series.
This release includes more than 47 fixes and minor improvements for Flink 1.5.5. The list below details all of them.
We highly recommend that all users upgrade to Flink 1.5.6.
Updated Maven dependencies:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.5.6&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.5.6&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.5.6&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-10252] - Handle oversized metric messages [FLINK-10863] - Assign uids to all operators Bug [FLINK-8336] - YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability [FLINK-9646] - ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart failed on Travis [FLINK-10166] - Dependency problems when executing SQL query in sql-client [FLINK-10309] - Cancel with savepoint fails with java.net.ConnectException when using the per job-mode [FLINK-10419] - ClassNotFoundException while deserializing user exceptions from checkpointing [FLINK-10455] - Potential Kafka producer leak in case of failures [FLINK-10482] - java.lang.IllegalArgumentException: Negative number of in progress checkpoints [FLINK-10491] - Deadlock during spilling data in SpillableSubpartition [FLINK-10566] - Flink Planning is exponential in the number of stages [FLINK-10581] - YarnConfigurationITCase.testFlinkContainerMemory test instability [FLINK-10642] - CodeGen split fields errors when maxGeneratedCodeLength equals 1 [FLINK-10655] - RemoteRpcInvocation not overwriting ObjectInputStream&#39;s ClassNotFoundException [FLINK-10669] - Exceptions &amp; errors are not properly checked in logs in e2e tests [FLINK-10670] - Fix Correlate codegen error [FLINK-10674] - Fix handling of retractions after clean up [FLINK-10690] - Tests leak resources via Files.list [FLINK-10693] - Fix Scala EitherSerializer duplication [FLINK-10715] - E2e tests fail with ConcurrentModificationException in MetricRegistryImpl [FLINK-10750] - SocketClientSinkTest.testRetry fails on Travis [FLINK-10752] - Result of AbstractYarnClusterDescriptor#validateClusterResources is ignored [FLINK-10753] - Propagate and log snapshotting exceptions [FLINK-10770] - Some generated functions are not opened properly. [FLINK-10773] - Resume externalized checkpoint end-to-end test fails [FLINK-10821] - Resuming Externalized Checkpoint E2E test does not resume from Externalized Checkpoint [FLINK-10839] - Fix implementation of PojoSerializer.duplicate() w.r.t. 
subclass serializer [FLINK-10856] - Harden resume from externalized checkpoint E2E test [FLINK-10857] - Conflict between JMX and Prometheus Metrics reporter [FLINK-10880] - Failover strategies should not be applied to Batch Execution [FLINK-10913] - ExecutionGraphRestartTest.testRestartAutomatically unstable on Travis [FLINK-10925] - NPE in PythonPlanStreamer [FLINK-10990] - Enforce minimum timespan in MeterView [FLINK-10998] - flink-metrics-ganglia has LGPL dependency [FLINK-11011] - Elasticsearch 6 sink end-to-end test unstable Improvement [FLINK-4173] - Replace maven-assembly-plugin by maven-shade-plugin in flink-metrics [FLINK-9869] - Send PartitionInfo in batch to Improve perfornance [FLINK-10613] - Remove logger casts in HBaseConnectorITCase [FLINK-10614] - Update test_batch_allround.sh e2e to new testing infrastructure [FLINK-10637] - Start MiniCluster with random REST port [FLINK-10678] - Add a switch to run_test to configure if logs should be checked for errors/excepions [FLINK-10906] - docker-entrypoint.sh logs credentails during startup [FLINK-10916] - Include duplicated user-specified uid into error message [FLINK-11005] - Define flink-sql-client uber-jar dependencies via artifactSet Test [FLINK-10606] - Construct NetworkEnvironment simple for tests [FLINK-10607] - Unify to remove duplicated NoOpResultPartitionConsumableNotifier [FLINK-10827] - Add test for duplicate() to SerializerTestBase `}),e.add({id:198,href:"/2018/12/22/apache-flink-1.6.3-released/",title:"Apache Flink 1.6.3 Released",section:"Flink Blog",content:`The Apache Flink community released the third bugfix version of the Apache Flink 1.6 series.
This release includes more than 80 fixes and minor improvements for Flink 1.6.2. The list below details all of them.
We highly recommend that all users upgrade to Flink 1.6.3.
Updated Maven dependencies:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.6.3&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.11&lt;/artifactId&gt; &lt;version&gt;1.6.3&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.11&lt;/artifactId&gt; &lt;version&gt;1.6.3&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-10097] - More tests to increase StreamingFileSink test coverage [FLINK-10252] - Handle oversized metric messages [FLINK-10367] - Avoid recursion stack overflow during releasing SingleInputGate [FLINK-10863] - Assign uids to all operators in general purpose testing job Bug [FLINK-8336] - YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability [FLINK-9635] - Local recovery scheduling can cause spread out of tasks [FLINK-9646] - ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart failed on Travis [FLINK-9878] - IO worker threads BLOCKED on SSL Session Cache while CMS full gc [FLINK-10149] - Fink Mesos allocates extra port when not configured to do so. [FLINK-10166] - Dependency problems when executing SQL query in sql-client [FLINK-10309] - Cancel with savepoint fails with java.net.ConnectException when using the per job-mode [FLINK-10357] - Streaming File Sink end-to-end test failed with mismatch [FLINK-10359] - Scala example in DataSet docs is broken [FLINK-10364] - Test instability in NonHAQueryableStateFsBackendITCase#testMapState [FLINK-10419] - ClassNotFoundException while deserializing user exceptions from checkpointing [FLINK-10425] - taskmanager.host is not respected [FLINK-10455] - Potential Kafka producer leak in case of failures [FLINK-10463] - Null literal cannot be properly parsed in Java Table API function call [FLINK-10481] - Wordcount end-to-end test in docker env unstable [FLINK-10482] - java.lang.IllegalArgumentException: Negative number of in progress checkpoints [FLINK-10491] - Deadlock during spilling data in SpillableSubpartition [FLINK-10566] - Flink Planning is exponential in the number of stages [FLINK-10567] - Lost serialize fields when ttl state store with the mutable serializer [FLINK-10570] - State grows unbounded when &quot;within&quot; constraint not applied [FLINK-10581] - YarnConfigurationITCase.testFlinkContainerMemory test instability [FLINK-10642] - CodeGen split fields errors when maxGeneratedCodeLength equals 1 [FLINK-10655] - RemoteRpcInvocation not overwriting ObjectInputStream&#39;s ClassNotFoundException [FLINK-10663] - Closing StreamingFileSink can cause NPE [FLINK-10669] - Exceptions &amp; errors are not properly checked in logs in e2e tests [FLINK-10670] - Fix Correlate codegen error [FLINK-10674] - Fix handling of retractions after clean up [FLINK-10681] - elasticsearch6.ElasticsearchSinkITCase fails if wrong JNA library installed [FLINK-10690] - Tests leak resources via Files.list [FLINK-10693] - Fix Scala EitherSerializer duplication [FLINK-10715] - E2e tests fail with ConcurrentModificationException in MetricRegistryImpl [FLINK-10750] - SocketClientSinkTest.testRetry fails on Travis [FLINK-10752] - Result of AbstractYarnClusterDescriptor#validateClusterResources is ignored [FLINK-10753] - Propagate and log snapshotting exceptions [FLINK-10763] - Interval join produces wrong result type in Scala API [FLINK-10770] - Some generated functions are not opened properly. [FLINK-10773] - Resume externalized checkpoint end-to-end test fails [FLINK-10809] - Using DataStreamUtils.reinterpretAsKeyedStream produces corrupted keyed state after restore [FLINK-10816] - Fix LockableTypeSerializer.duplicate() [FLINK-10821] - Resuming Externalized Checkpoint E2E test does not resume from Externalized Checkpoint [FLINK-10839] - Fix implementation of PojoSerializer.duplicate() w.r.t. 
subclass serializer [FLINK-10842] - Waiting loops are broken in e2e/common.sh [FLINK-10856] - Harden resume from externalized checkpoint E2E test [FLINK-10857] - Conflict between JMX and Prometheus Metrics reporter [FLINK-10880] - Failover strategies should not be applied to Batch Execution [FLINK-10913] - ExecutionGraphRestartTest.testRestartAutomatically unstable on Travis [FLINK-10925] - NPE in PythonPlanStreamer [FLINK-10946] - Resuming Externalized Checkpoint (rocks, incremental, scale up) end-to-end test failed on Travis [FLINK-10990] - Enforce minimum timespan in MeterView [FLINK-10992] - Jepsen: Do not use /tmp as HDFS Data Directory [FLINK-10997] - Avro-confluent-registry does not bundle any dependency [FLINK-10998] - flink-metrics-ganglia has LGPL dependency [FLINK-11011] - Elasticsearch 6 sink end-to-end test unstable [FLINK-11017] - Time interval for window aggregations in SQL is wrongly translated if specified with YEAR_MONTH resolution [FLINK-11029] - Incorrect parameter in Working with state doc [FLINK-11041] - ReinterpretDataStreamAsKeyedStreamITCase.testReinterpretAsKeyedStream failed on Travis [FLINK-11045] - UserCodeClassLoader has not been set correctly for RuntimeUDFContext in CollectionExecutor [FLINK-11083] - CRowSerializerConfigSnapshot is not instantiable [FLINK-11087] - Broadcast state migration Incompatibility from 1.5.3 to 1.7.0 [FLINK-11123] - Missing import in ML quickstart docs [FLINK-11136] - Fix the logical of merge for DISTINCT aggregates Improvement [FLINK-4173] - Replace maven-assembly-plugin by maven-shade-plugin in flink-metrics [FLINK-10353] - Restoring a KafkaProducer with Semantic.EXACTLY_ONCE from a savepoint written with Semantic.AT_LEAST_ONCE fails with NPE [FLINK-10608] - Add avro files generated by datastream-allround-test to RAT exclusions [FLINK-10613] - Remove logger casts in HBaseConnectorITCase [FLINK-10614] - Update test_batch_allround.sh e2e to new testing infrastructure [FLINK-10637] - Start MiniCluster with random REST port [FLINK-10678] - Add a switch to run_test to configure if logs should be checked for errors/excepions [FLINK-10692] - Harden Confluent schema E2E test [FLINK-10883] - Submitting a jobs without enough slots times out due to a unspecified timeout [FLINK-10906] - docker-entrypoint.sh logs credentails during startup [FLINK-10916] - Include duplicated user-specified uid into error message [FLINK-10951] - Disable enforcing of YARN container virtual memory limits in tests [FLINK-11005] - Define flink-sql-client uber-jar dependencies via artifactSet Test [FLINK-10606] - Construct NetworkEnvironment simple for tests [FLINK-10607] - Unify to remove duplicated NoOpResultPartitionConsumableNotifier [FLINK-10827] - Add test for duplicate() to SerializerTestBase Wish [FLINK-10793] - Change visibility of TtlValue and TtlSerializer to public for external tools `}),e.add({id:199,href:"/2018/12/21/apache-flink-1.7.1-released/",title:"Apache Flink 1.7.1 Released",section:"Flink Blog",content:`The Apache Flink community released the first bugfix version of the Apache Flink 1.7 series.
This release includes 27 fixes and minor improvements for Flink 1.7.0. Below is a detailed list of all fixes.
We highly recommend that all users upgrade to Flink 1.7.1.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.7.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.7.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.7.1</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-10252] - Handle oversized metric messages [FLINK-10367] - Avoid recursion stack overflow during releasing SingleInputGate [FLINK-10522] - Check if RecoverableWriter supportsResume and act accordingly. [FLINK-10963] - Cleanup small objects uploaded to S3 as independent objects Bug [FLINK-8336] - YarnFileStageTestS3ITCase.testRecursiveUploadForYarnS3 test instability [FLINK-9646] - ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart failed on Travis [FLINK-10149] - Fink Mesos allocates extra port when not configured to do so. [FLINK-10359] - Scala example in DataSet docs is broken [FLINK-10482] - java.lang.IllegalArgumentException: Negative number of in progress checkpoints [FLINK-10566] - Flink Planning is exponential in the number of stages [FLINK-10997] - Avro-confluent-registry does not bundle any dependency [FLINK-11011] - Elasticsearch 6 sink end-to-end test unstable [FLINK-11013] - Fix distinct aggregates for group window in Table API [FLINK-11017] - Time interval for window aggregations in SQL is wrongly translated if specified with YEAR_MONTH resolution [FLINK-11029] - Incorrect parameter in Working with state doc [FLINK-11032] - Elasticsearch (v6.3.1) sink end-to-end test unstable on Travis [FLINK-11033] - Elasticsearch (v6.3.1) sink end-to-end test unstable on Travis [FLINK-11041] - ReinterpretDataStreamAsKeyedStreamITCase.testReinterpretAsKeyedStream failed on Travis [FLINK-11044] - RegisterTableSink docs incorrect [FLINK-11045] - UserCodeClassLoader has not been set correctly for RuntimeUDFContext in CollectionExecutor [FLINK-11047] - CoGroupGroupSortTranslationTest does not compile with scala 2.12 [FLINK-11085] - NoClassDefFoundError in presto-s3 filesystem [FLINK-11087] - Broadcast state migration Incompatibility from 1.5.3 to 1.7.0 [FLINK-11094] - Restored state in RocksDBStateBackend that has not been accessed in restored execution causes NPE on snapshot [FLINK-11123] - Missing import in ML quickstart docs [FLINK-11136] - Fix the logical of merge for DISTINCT aggregates Improvement [FLINK-11080] - Define flink-connector-elasticsearch6 uber-jar dependencies via artifactSet `}),e.add({id:200,href:"/2018/11/30/apache-flink-1.7.0-release-announcement/",title:"Apache Flink 1.7.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is pleased to announce Apache Flink 1.7.0. The latest release includes more than 420 resolved issues and some exciting additions to Flink that we describe in the following sections of this post. Please check the complete changelog for more details.
Flink 1.7.0 is API-compatible with previous 1.x.y releases for APIs annotated with the @Public annotation. The release is available now and we encourage everyone to download the release and check out the updated documentation. Feedback through the Flink mailing lists or JIRA is, as always, very much appreciated!
You can find the binaries on the updated Downloads page on the Flink project site.
Flink 1.7.0 - Extending the reach of Stream Processing # With Flink 1.7.0 we move closer to our goal of enabling the Flink community to process data fast and build data-intensive applications seamlessly. Our latest release includes some exciting new features and improvements, such as support for Scala 2.12, an exactly-once S3 file sink, the integration of complex event processing with streaming SQL, and more features that we explain below.
New Features and Improvements # Scala 2.12 Support in Apache Flink (FLINK-7811): Apache Flink 1.7.0 is the first release which comes with full support for Scala 2.12. This allows users to write Flink applications with a newer Scala version and to leverage the Scala 2.12 ecosystem.
State Evolution (FLINK-9376): In many cases, a long-running Flink application needs to evolve during its lifetime because of changing requirements. Being able to change the user state without losing the application's current progress, which is captured in that state, is a crucial requirement for application evolution.
With Flink 1.7.0, the community added state evolution, which allows you to flexibly adapt a long-running application's user state schema while maintaining compatibility with previous savepoints. With state evolution it is possible to add or remove fields in your state schema in order to change which business features are captured by your application after it has been deployed.
State schema evolution now works out-of-the-box when using Avro's generated classes as user state, meaning that the schema of the state can be evolved according to Avro's specifications. While Avro types are the only built-in types that support schema evolution as of Flink 1.7, the community continues working to extend support to other types in future Flink releases.
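To make this concrete, here is a minimal, hedged sketch (not taken from the release post) of a keyed function whose state is declared against a hypothetical Avro-generated class UserProfile; adding a field with a default value to the Avro schema would then be handled when the job is restored from a savepoint.
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class ProfileUpdater extends RichFlatMapFunction<UserProfile, UserProfile> {
    // Keyed state backed by an Avro-generated class; its schema may evolve between savepoints.
    private transient ValueState<UserProfile> profileState;

    @Override
    public void open(Configuration parameters) {
        profileState = getRuntimeContext().getState(
            new ValueStateDescriptor<>("user-profile", UserProfile.class));
    }

    @Override
    public void flatMap(UserProfile update, Collector<UserProfile> out) throws Exception {
        profileState.update(update); // serialized with Avro, so the schema can evolve
        out.collect(update);
    }
}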
Exactly-once S3 StreamingFileSink (FLINK-9752): The StreamingFileSink which was introduced in Flink 1.6.0 is now extended to also support writing to S3 filesystems with exactly-once processing guarantees. Using this feature allows users to build exactly-once end-to-end pipelines writing to S3.
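As a rough illustration (the bucket path and the socket source are placeholders, not taken from the announcement), a minimal job writing row-encoded strings to S3 could look like the sketch below; the exactly-once guarantee relies on checkpointing being enabled.
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class S3SinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // part files are committed on checkpoints

        DataStream<String> events = env.socketTextStream("localhost", 9999); // placeholder source

        StreamingFileSink<String> sink = StreamingFileSink
            .forRowFormat(new Path("s3://my-bucket/output"), new SimpleStringEncoder<String>("UTF-8"))
            .build();

        events.addSink(sink);
        env.execute("exactly-once-s3-sink");
    }
}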
MATCH_RECOGNIZE Support in Streaming SQL (FLINK-6935): This is a major addition to Apache Flink 1.7.0 that provides initial support for the MATCH_RECOGNIZE standard in Flink SQL. This feature combines complex event processing (CEP) and SQL for easy pattern matching on data streams, thus enabling a whole new set of use cases.
This feature is currently in a beta phase, so we welcome any feedback and suggestions from the community for future iterations and improvements.
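For a flavor of the syntax, the hedged sketch below detects a price that drops below a threshold and then recovers, over a hypothetical Ticker table (symbol, price, rowtime time attribute); tableEnv is assumed to be a StreamTableEnvironment with Ticker already registered.
// V-shaped pattern: a row above the threshold, a dip below it, and a recovery above it.
Table result = tableEnv.sqlQuery(
    "SELECT * FROM Ticker " +
    "MATCH_RECOGNIZE ( " +
    "  PARTITION BY symbol " +
    "  ORDER BY rowtime " +
    "  MEASURES A.price AS priceBeforeDrop, C.price AS priceAfterRecovery " +
    "  ONE ROW PER MATCH " +
    "  AFTER MATCH SKIP PAST LAST ROW " +
    "  PATTERN (A B C) " +
    "  DEFINE " +
    "    A AS A.price > 100, " +
    "    B AS B.price < 100, " +
    "    C AS C.price > 100 " +
    ") AS T");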
Temporal Tables and Temporal Joins in Streaming SQL (FLINK-9712): Temporal Tables are a new concept in Apache Flink that provide a (parameterized) view on a table's changing history and return the content of the table at a specific point in time.
As an example, we can use a table with historical currency exchange rates. Such a table is constantly growing/evolving as time progresses and newly updated exchange rates are added. A Temporal Table is a view that can return the state of those exchange rates at any given point in time. With such a table it is possible to convert a stream of orders in different currencies to a common currency using the correct exchange rate.
Temporal Joins allow for memory- and compute-efficient joins of streaming data with an ever-changing/updating table, using either processing time or event time, while being ANSI SQL compliant.
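A hedged sketch of how this might look in the Table API: RatesHistory and Orders are hypothetical registered tables, ratesHistory is the Table object for RatesHistory, and tableEnv is assumed to be a StreamTableEnvironment.
// Turn the append-only rates history into a temporal table function,
// versioned by the r_proctime time attribute and keyed by r_currency.
TemporalTableFunction rates =
    ratesHistory.createTemporalTableFunction("r_proctime", "r_currency");
tableEnv.registerFunction("Rates", rates);

// Join each order against the exchange rate that was valid at the order's processing time.
Table eurAmounts = tableEnv.sqlQuery(
    "SELECT o.amount * r.r_rate AS amount_eur " +
    "FROM Orders AS o, LATERAL TABLE (Rates(o.o_proctime)) AS r " +
    "WHERE o.o_currency = r.r_currency");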
Miscellaneous Features for Streaming SQL: Besides the major features mentioned above, Flink's Table & SQL API has been extended to serve more use cases.
The following built-in functions were added to the APIs: TO_BASE64, LOG2, LTRIM, REPEAT, REPLACE, COSH, SINH, TANH
The SQL Client now supports the definition of views both in an environment file and within a CLI session. Furthermore, basic SQL statement auto-completion has been added to the CLI.
The community added an Elasticsearch 6 table sink, which allows storing the updating results of a dynamic table.
Versioned REST API (FLINK-7551): Beginning with Flink 1.7.0, the REST API is versioned. This guarantees the stability of Flink’s REST API so that third-party applications can be developed against a stable API in Flink. Thus, future Flink upgrades will not require changes to existing third-party integrations.
Kafka 2.0 Connector (FLINK-10598): Apache Flink 1.7.0 continues to add more connectors, making it even easier to interact with more external systems. In this release, the community added the Kafka 2.0 connector, which allows reading from and writing to Kafka 2.0 with exactly-once guarantees.
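As a hedged sketch (broker address, topic, and group id are placeholders), the universal connector from the flink-connector-kafka module is used like the earlier version-specific consumers, just without a version suffix in the class name.
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaSourceJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.setProperty("group.id", "demo");

        DataStream<String> stream = env.addSource(
            new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props));

        stream.print();
        env.execute("kafka-2.0-source");
    }
}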
Local Recovery (FLINK-9635): Apache Flink 1.7.0 completes the local recovery feature by extending Flink’s scheduling to take previous deployment locations into account in case of recovery.
If local recovery is enabled, Flink keeps a local copy of the latest checkpoint on the machine where the task is running. By scheduling tasks to their previous locations, Flink can restore state from the local disk instead of transferring it over the network, which considerably improves recovery speed.
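For reference, local recovery is off by default and is controlled by the state.backend.local-recovery setting, either in flink-conf.yaml or programmatically; a minimal sketch for a local test setup follows.
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

// Equivalent to putting "state.backend.local-recovery: true" into flink-conf.yaml.
Configuration conf = new Configuration();
conf.setString("state.backend.local-recovery", "true");

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.createLocalEnvironment(4, conf);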
Removal of Flink’s Legacy Mode (FLINK-10392): Apache Flink 1.7.0 marks the release in which the FLIP-6 effort has been fully completed and has reached feature parity with the legacy mode. Consequently, this release removes support for the legacy mode.
Release Notes # Please review the release notes if you plan to upgrade your Flink setup to Flink 1.7.
List of Contributors # We would like to acknowledge all community members for contributing to this release. Special credits go to the following members for contributing to the 1.7.0 release (according to git):
Aitozi, Alex Arkhipov, Alexander Koltsov, Alexey Trenikhin, Alice, Alice Yan, Aljoscha Krettek, Andrei Poluliakh, Andrey Zagrebin, Ashwin Sinha, Barisa Obradovic, Ben La Monica, Benoit Meriaux, Bowen Li, Chesnay Schepler, Christophe Jolif, Congxian Qiu, Craig Foster, David Anderson, Dawid Wysakowicz, Dian Fu, Diego Carvallo, Dimitris Palyvos, Eugen Yushin, Fabian Hueske, Florian Schmidt, Gary Yao, Guibo Pan, Hequn Cheng, Hiroaki Yoshida, Igal Shilman, JIN SUN, Jamie Grier, Jayant Ameta, Jeff Zhang, Jeffrey Chung, Jicaar, Jin Sun, Joe Malt, Johannes Dillmann, Jun Zhang, Kostas Kloudas, Krzysztof Białek, Lakshmi Gururaja Rao, Liu Biao, Mahesh Senniappan, Manuel Hoffmann, Mark Cho, Max Feng, Mike Pedersen, Mododo, Nico Kruber, Oleksandr Nitavskyi, Osman Şamil AKÇELİK, Patrick Lucas, Paul Lam, Piotr Nowojski, Rick Hofstede, Rong R, Rong Rong, Sayat Satybaldiyev, Sebastian Klemke, Seth Wiesman, Shimin Yang, Shuyi Chen, Stefan Richter, Stephan Ewen, Stephen Jason, Thomas Weise, Till Rohrmann, Timo Walther, Tzu-Li &ldquo;tison&rdquo; Chen, Tzu-Li (Gordon) Tai, Tzu-Li Chen, Wosin, Xingcan Cui, Xpray, Xue Yu, Yangze Guo, Ying Xu, Yun Tang, Zhijiang, blues Zheng, hequn8128, ifndef-SleePy, jerryjzhang, jrthe42, jyc.jia, kkolman, lihongli, linjun, linzhaoming, liurenjie1024, liuxianjiao, lrl, lsy, lzqdename, maqingxiang, maqingxiang-it, minwenjun, shuai-xu, sihuazhou, snuyanzin, wind, xuewei.linxuewei, xueyu, xuqianjin, yanghua, yangshimin, zhijiang, 谢磊, 陈梓立
`}),e.add({id:201,href:"/2018/10/29/apache-flink-1.5.5-released/",title:"Apache Flink 1.5.5 Released",section:"Flink Blog",content:`The Apache Flink community released the fifth bugfix version of the Apache Flink 1.5 series.
This release includes more than 20 fixes and minor improvements for Flink 1.5.4. Below is a detailed list of all fixes.
We highly recommend that all users upgrade to Flink 1.5.5.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.5.5</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.5.5</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.5.5</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-10242] - Latency marker interval should be configurable [FLINK-10243] - Add option to reduce latency metrics granularity [FLINK-10331] - Reduce number of flush requests to the network stack [FLINK-10332] - Move data available notification in PipelinedSubpartition out of the synchronized block Bug [FLINK-5542] - YARN client incorrectly uses local YARN config to check vcore capacity [FLINK-9567] - Flink does not release resource in Yarn Cluster mode [FLINK-9788] - ExecutionGraph Inconsistency prevents Job from recovering [FLINK-9884] - Slot request may not be removed when it has already be assigned in slot manager [FLINK-9891] - Flink cluster is not shutdown in YARN mode when Flink client is stopped [FLINK-9932] - Timed-out TaskExecutor slot-offers to JobMaster leak the slot [FLINK-10135] - Certain cluster-level metrics are no longer exposed [FLINK-10222] - Table scalar function expression parses error when function name equals the exists keyword suffix [FLINK-10259] - Key validation for GroupWindowAggregate is broken [FLINK-10316] - Add check to KinesisProducer that aws.region is set [FLINK-10354] - Savepoints should be counted as retained checkpoints [FLINK-10400] - Return failed JobResult if job terminates in state FAILED or CANCELED [FLINK-10415] - RestClient does not react to lost connection [FLINK-10451] - TableFunctionCollector should handle the life cycle of ScalarFunction [FLINK-10469] - FileChannel may not write the whole buffer in a single call to FileChannel.write(Buffer buffer) [FLINK-10487] - fix invalid Flink SQL example [FLINK-10516] - YarnApplicationMasterRunner does not initialize FileSystem with correct Flink Configuration during setup [FLINK-10524] - MemoryManagerConcurrentModReleaseTest.testConcurrentModificationWhileReleasing failed on travis [FLINK-10544] - Remove custom settings.xml for snapshot deployments Improvement [FLINK-10075] - HTTP connections to a secured REST endpoint flood the log [FLINK-10260] - Confusing log messages during TaskManager registration [FLINK-10282] - Provide separate thread-pool for REST endpoint [FLINK-10312] - Wrong / missing exception when submitting job [FLINK-10375] - ExceptionInChainedStubException hides wrapped exception in cause [FLINK-10582] - Make REST executor thread priority configurable `}),e.add({id:202,href:"/2018/10/29/apache-flink-1.6.2-released/",title:"Apache Flink 1.6.2 Released",section:"Flink Blog",content:`The Apache Flink community released the second bugfix version of the Apache Flink 1.6 series.
This release includes more than 30 fixes and minor improvements for Flink 1.6.1. Below is a detailed list of all fixes.
We highly recommend that all users upgrade to Flink 1.6.2.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.6.2</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.6.2</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.6.2</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-10242] - Latency marker interval should be configurable [FLINK-10243] - Add option to reduce latency metrics granularity [FLINK-10331] - Reduce number of flush requests to the network stack [FLINK-10332] - Move data available notification in PipelinedSubpartition out of the synchronized block Bug [FLINK-5542] - YARN client incorrectly uses local YARN config to check vcore capacity [FLINK-9567] - Flink does not release resource in Yarn Cluster mode [FLINK-9788] - ExecutionGraph Inconsistency prevents Job from recovering [FLINK-9884] - Slot request may not be removed when it has already be assigned in slot manager [FLINK-9891] - Flink cluster is not shutdown in YARN mode when Flink client is stopped [FLINK-9932] - Timed-out TaskExecutor slot-offers to JobMaster leak the slot [FLINK-10135] - Certain cluster-level metrics are no longer exposed [FLINK-10157] - Allow \`null\` user values in map state with TTL [FLINK-10222] - Table scalar function expression parses error when function name equals the exists keyword suffix [FLINK-10259] - Key validation for GroupWindowAggregate is broken [FLINK-10263] - User-defined function with LITERAL paramters yields CompileException [FLINK-10316] - Add check to KinesisProducer that aws.region is set [FLINK-10354] - Savepoints should be counted as retained checkpoints [FLINK-10363] - S3 FileSystem factory prints secrets into logs [FLINK-10379] - Can not use Table Functions in Java Table API [FLINK-10383] - Hadoop configurations on the classpath seep into the S3 file system configs [FLINK-10390] - DataDog MetricReporter leaks connections [FLINK-10400] - Return failed JobResult if job terminates in state FAILED or CANCELED [FLINK-10415] - RestClient does not react to lost connection [FLINK-10444] - Make S3 entropy injection work with FileSystem safety net [FLINK-10451] - TableFunctionCollector should handle the life cycle of ScalarFunction [FLINK-10465] - Jepsen: runit supervised sshd is stopped on tear down [FLINK-10469] - FileChannel may not write the whole buffer in a single call to FileChannel.write(Buffer buffer) [FLINK-10487] - fix invalid Flink SQL example [FLINK-10516] - YarnApplicationMasterRunner does not initialize FileSystem with correct Flink Configuration during setup [FLINK-10524] - MemoryManagerConcurrentModReleaseTest.testConcurrentModificationWhileReleasing failed on travis [FLINK-10532] - Broken links in documentation [FLINK-10544] - Remove custom settings.xml for snapshot deployments Improvement [FLINK-9061] - Add entropy to s3 path for better scalability [FLINK-10075] - HTTP connections to a secured REST endpoint flood the log [FLINK-10260] - Confusing log messages during TaskManager registration [FLINK-10282] - Provide separate thread-pool for REST endpoint [FLINK-10291] - Generate JobGraph with fixed/configurable JobID in StandaloneJobClusterEntrypoint [FLINK-10311] - HA end-to-end/Jepsen tests for standby Dispatchers [FLINK-10312] - Wrong / missing exception when submitting job [FLINK-10371] - Allow to enable SSL mutual authentication on REST endpoints by configuration [FLINK-10375] - ExceptionInChainedStubException hides wrapped exception in cause [FLINK-10582] - Make REST executor thread priority configurable `}),e.add({id:203,href:"/2018/09/20/apache-flink-1.5.4-released/",title:"Apache Flink 1.5.4 Released",section:"Flink Blog",content:`The Apache Flink community released the fourth bugfix version of the Apache Flink 1.5 series.
This release includes more than 20 fixes and minor improvements for Flink 1.5.3. Below is a detailed list of all fixes.
We highly recommend that all users upgrade to Flink 1.5.4.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.5.4</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.5.4</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.5.4</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Bug [FLINK-9878] - IO worker threads BLOCKED on SSL Session Cache while CMS full gc [FLINK-10011] - Old job resurrected during HA failover [FLINK-10101] - Mesos web ui url is missing. [FLINK-10115] - Content-length limit is also applied to FileUploads [FLINK-10116] - createComparator fails on case class with Unit type fields prior to the join-key [FLINK-10141] - Reduce lock contention introduced with 1.5 [FLINK-10142] - Reduce synchronization overhead for credit notifications [FLINK-10150] - Chained batch operators interfere with each other other [FLINK-10172] - Inconsistentcy in ExpressionParser and ExpressionDsl for order by asc/desc [FLINK-10193] - Default RPC timeout is used when triggering savepoint via JobMasterGateway [FLINK-10204] - StreamElementSerializer#copy broken for LatencyMarkers [FLINK-10255] - Standby Dispatcher locks submitted JobGraphs [FLINK-10261] - INSERT INTO does not work with ORDER BY clause [FLINK-10267] - [State] Fix arbitrary iterator access on RocksDBMapIterator [FLINK-10293] - RemoteStreamEnvironment does not forward port to RestClusterClient [FLINK-10314] - Blocking calls in Execution Graph creation bring down cluster [FLINK-10328] - Stopping the ZooKeeperSubmittedJobGraphStore should release all currently held locks [FLINK-10329] - Fail with exception if job cannot be removed by ZooKeeperSubmittedJobGraphStore#removeJobGraph Improvement [FLINK-10082] - Initialize StringBuilder in Slf4jReporter with estimated size [FLINK-10131] - Improve logging around ResultSubpartition [FLINK-10137] - YARN: Log completed Containers [FLINK-10185] - Make ZooKeeperStateHandleStore#releaseAndTryRemove synchronous [FLINK-10223] - TaskManagers should log their ResourceID during startup [FLINK-10301] - Allow a custom Configuration in StreamNetworkBenchmarkEnvironment `}),e.add({id:204,href:"/2018/09/20/apache-flink-1.6.1-released/",title:"Apache Flink 1.6.1 Released",section:"Flink Blog",content:`The Apache Flink community released the first bugfix version of the Apache Flink 1.6 series.
This release includes 60 fixes and minor improvements for Flink 1.6.0. Below is a detailed list of all fixes.
We highly recommend that all users upgrade to Flink 1.6.1.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.6.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.6.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.6.1</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-9637] - Add public user documentation for TTL feature [FLINK-10068] - Add documentation for async/RocksDB-based timers [FLINK-10085] - Update AbstractOperatorRestoreTestBase [FLINK-10087] - Update BucketingSinkMigrationTest [FLINK-10089] - Update FlinkKafkaConsumerBaseMigrationTest [FLINK-10090] - Update ContinuousFileProcessingMigrationTest [FLINK-10091] - Update WindowOperatorMigrationTest [FLINK-10092] - Update StatefulJobSavepointMigrationITCase [FLINK-10109] - Add documentation for StreamingFileSink Bug [FLINK-9289] - Parallelism of generated operators should have max parallism of input [FLINK-9546] - The heartbeatTimeoutIntervalMs of HeartbeatMonitor should be larger than 0 [FLINK-9693] - Possible memory leak in jobmanager retaining archived checkpoints [FLINK-9972] - Debug memory logging not working [FLINK-10011] - Old job resurrected during HA failover [FLINK-10063] - Jepsen: Automatically restart Mesos Processes [FLINK-10101] - Mesos web ui url is missing. [FLINK-10105] - Test failure because of jobmanager.execution.failover-strategy is outdated [FLINK-10115] - Content-length limit is also applied to FileUploads [FLINK-10116] - createComparator fails on case class with Unit type fields prior to the join-key [FLINK-10141] - Reduce lock contention introduced with 1.5 [FLINK-10142] - Reduce synchronization overhead for credit notifications [FLINK-10150] - Chained batch operators interfere with each other other [FLINK-10151] - [State TTL] Fix false recursion call in TransformingStateTableKeyGroupPartitioner.tryAddToSource [FLINK-10154] - Make sure we always read at least one record in KinesisConnector [FLINK-10169] - RowtimeValidator fails with custom TimestampExtractor [FLINK-10172] - Inconsistentcy in ExpressionParser and ExpressionDsl for order by asc/desc [FLINK-10192] - SQL Client table visualization mode does not update correctly [FLINK-10193] - Default RPC timeout is used when triggering savepoint via JobMasterGateway [FLINK-10204] - StreamElementSerializer#copy broken for LatencyMarkers [FLINK-10255] - Standby Dispatcher locks submitted JobGraphs [FLINK-10261] - INSERT INTO does not work with ORDER BY clause [FLINK-10267] - [State] Fix arbitrary iterator access on RocksDBMapIterator [FLINK-10269] - Elasticsearch 6 UpdateRequest fail because of binary incompatibility [FLINK-10283] - FileCache logs unnecessary warnings [FLINK-10293] - RemoteStreamEnvironment does not forward port to RestClusterClient [FLINK-10314] - Blocking calls in Execution Graph creation bring down cluster [FLINK-10328] - Stopping the ZooKeeperSubmittedJobGraphStore should release all currently held locks [FLINK-10329] - Fail with exception if job cannot be removed by ZooKeeperSubmittedJobGraphStore#removeJobGraph New Feature [FLINK-10022] - Add metrics for input/output buffers Improvement [FLINK-9013] - Document yarn.containers.vcores only being effective when adapting YARN config [FLINK-9446] - Compatibility table not up-to-date [FLINK-9795] - Update Mesos documentation for flip6 [FLINK-9859] - More Akka config options [FLINK-9899] - Add more metrics to the Kinesis source connector [FLINK-9962] - allow users to specify TimeZone in DateTimeBucketer [FLINK-10001] - Improve Kubernetes documentation [FLINK-10006] - Improve logging in BarrierBuffer [FLINK-10020] - Kinesis Consumer listShards should support more recoverable exceptions [FLINK-10082] - Initialize StringBuilder in Slf4jReporter with estimated size [FLINK-10094] - Always backup default config for end-to-end tests [FLINK-10110] - 
Harden e2e Kafka shutdown [FLINK-10131] - Improve logging around ResultSubpartition [FLINK-10137] - YARN: Log completed Containers [FLINK-10164] - Add support for resuming from savepoints to StandaloneJobClusterEntrypoint [FLINK-10170] - Support string representation for map and array types in descriptor-based Table API [FLINK-10185] - Make ZooKeeperStateHandleStore#releaseAndTryRemove synchronous [FLINK-10223] - TaskManagers should log their ResourceID during startup [FLINK-10301] - Allow a custom Configuration in StreamNetworkBenchmarkEnvironment [FLINK-10325] - [State TTL] Refactor TtlListState to use only loops, no java stream API for performance Test [FLINK-10084] - Migration tests weren&#39;t updated for 1.5 `}),e.add({id:205,href:"/2018/08/21/apache-flink-1.5.3-released/",title:"Apache Flink 1.5.3 Released",section:"Flink Blog",content:`The Apache Flink community released the third bugfix version of the Apache Flink 1.5 series.
This release includes more than 20 fixes and minor improvements for Flink 1.5.2. Below is a detailed list of all fixes.
We highly recommend that all users upgrade to Flink 1.5.3.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.5.3</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.5.3</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.5.3</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-9951] - Update scm developerConnection Bug [FLINK-5750] - Incorrect translation of n-ary Union [FLINK-9289] - Parallelism of generated operators should have max parallism of input [FLINK-9546] - The heartbeatTimeoutIntervalMs of HeartbeatMonitor should be larger than 0 [FLINK-9655] - Externalized checkpoint E2E test fails on travis [FLINK-9693] - Possible memory leak in jobmanager retaining archived checkpoints [FLINK-9694] - Potentially NPE in CompositeTypeSerializerConfigSnapshot constructor [FLINK-9923] - OneInputStreamTaskTest.testWatermarkMetrics fails on Travis [FLINK-9935] - Batch Table API: grouping by window and attribute causes java.lang.ClassCastException: [FLINK-9936] - Mesos resource manager unable to connect to master after failover [FLINK-9946] - Quickstart E2E test archetype version is hard-coded [FLINK-9969] - Unreasonable memory requirements to complete examples/batch/WordCount [FLINK-9972] - Debug memory logging not working [FLINK-9978] - Source release sha contains absolute file path [FLINK-9985] - Incorrect parameter order in document [FLINK-9988] - job manager does not respect property jobmanager.web.address [FLINK-10013] - Fix Kerberos integration for FLIP-6 YarnTaskExecutorRunner [FLINK-10033] - Let Task release reference to Invokable on shutdown [FLINK-10070] - Flink cannot be compiled with maven 3.0.x New Feature [FLINK-10022] - Add metrics for input/output buffers Improvement [FLINK-9446] - Compatibility table not up-to-date [FLINK-9765] - Improve CLI responsiveness when cluster is not reachable [FLINK-9806] - Add a canonical link element to documentation HTML [FLINK-9859] - More Akka config options [FLINK-9942] - Guard handlers against null fields in requests [FLINK-9986] - Remove unnecessary information from .version.properties file [FLINK-9987] - Rework ClassLoader E2E test to not rely on .version.properties file [FLINK-10006] - Improve logging in BarrierBuffer [FLINK-10016] - Make YARN/Kerberos end-to-end test stricter `}),e.add({id:206,href:"/2018/08/09/apache-flink-1.6.0-release-announcement/",title:"Apache Flink 1.6.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is proud to announce the 1.6.0 release. Over the past 2 months, the Flink community has worked hard to resolve more than 360 issues. Please check the complete changelog for more details.
Flink 1.6.0 is the seventh major release in the 1.x.y series. It is API-compatible with previous 1.x.y releases for APIs annotated with the @Public annotation.
We encourage everyone to download the release and check out the documentation. Feedback through the Flink mailing lists or JIRA is, as always, very much appreciated!
You can find the binaries on the updated Downloads page on the Flink project site.
Flink 1.6 - The next step in stateful stream processing # In Flink 1.6.0 we continue the groundwork we laid out in earlier versions: Enabling Flink users to seamlessly run fast data processing and build data-driven and data-intensive applications effortlessly.
Flink's state support is one of the key features that make Flink so versatile and powerful when it comes to implementing all kinds of use cases. To make it even easier, the community added native support for state TTL (FLINK-9510, FLINK-9938), which allows state to be cleaned up after it has expired. With Flink 1.6.0, timer state can now go out of core (FLINK-9485) by storing the relevant state in RocksDB. Last but not least, we also improved the deletion of timers (FLINK-9423) significantly.
With Flink 1.5.0 we reworked Flink's distributed architecture to add support for resource elasticity and different deployment scenarios, most notably better container integration. In Flink 1.6.0 we follow up on some of the unfinished aspects of this work: all external communication, including job submission, is now HTTP/REST based (FLINK-9280), which eases container setups considerably. Flink 1.6.0 also comes with a container entrypoint (FLINK-9488) that makes it easy to bootstrap a containerized job cluster.
Streaming SQL is one of the features with the most disruptive potential, because it makes Flink much more accessible. In Apache Flink 1.6.0 the community further improved the SQL CLI (FLINK-8863), making the execution of streaming and batch queries (FLINK-8861) against a multitude of data sources a piece of cake. In addition, full Avro support (FLINK-9444) makes reading any kind of Avro data seamless. Last but not least, the community hardened Flink's CEP library (FLINK-9418), which can now handle significantly larger use cases.
What would be a distributed processing engine without its connectors to talk to the outside world? In the latest Flink release we added a new StreamingFileSink (FLINK-9750) that succeeds the BucketingSink as the standard file sink. The community also added support for ElasticSearch 6.x (FLINK-7386) and implemented multiple AvroDeserializationSchemas (FLINK-9338) to easily ingest Avro data.
New Features and Improvements # Improving Flink's State Support # Support for State TTL (FLINK-9510, FLINK-9938): This feature allows specifying a time-to-live (TTL) for Flink state. Once the time-to-live has been exceeded, Flink no longer gives access to the respective state values. The expired data is cleaned up on access so that the operator's keyed state doesn't grow infinitely, and expired state won't be included in subsequent checkpoints. This feature helps applications comply with new data protection regulations (e.g., GDPR).
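A hedged sketch of how a seven-day TTL could be attached to a state descriptor (the descriptor name and value type are made up for illustration):
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;

// Expire each keyed value seven days after its last write and never return expired values.
StateTtlConfig ttlConfig = StateTtlConfig
    .newBuilder(Time.days(7))
    .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite)
    .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
    .build();

ValueStateDescriptor<String> lastLogin = new ValueStateDescriptor<>("last-login", String.class);
lastLogin.enableTimeToLive(ttlConfig);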
Scalable Timers Based on RocksDB (FLINK-9485): Flink's timer state can now be stored in RocksDB, allowing Flink to support significantly bigger timer state since it can go out of core/spill to disk. Previously, users were limited by the available heap memory. On top of that, snapshots of the timer state are now asynchronous, i.e., they no longer block the processing pipeline during checkpoints and can be incremental.
Faster Timer Deletions (FLINK-9423): Flink's internal timer data structure was improved such that the deletion complexity is reduced from O(n) to O(log n). This significantly improves Flink jobs that use timers. Deleting timers is now also exposed through a user-facing API.
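A hedged sketch of the user-facing deletion API inside a ProcessFunction applied to a keyed stream (Event and Alert are hypothetical types with the accessors shown):
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class CleanupFunction extends ProcessFunction<Event, Alert> {
    @Override
    public void processElement(Event event, Context ctx, Collector<Alert> out) {
        long cleanupTime = event.getTimestamp() + 60_000L;
        // Register a timer one minute after the event...
        ctx.timerService().registerEventTimeTimer(cleanupTime);
        // ...and, if the event turns out to be irrelevant, cheaply delete it again.
        if (event.isObsolete()) {
            ctx.timerService().deleteEventTimeTimer(cleanupTime);
        }
    }
}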
Extending Flink's Deployment Options # Job Cluster Container Entrypoint (FLINK-9488): Flink 1.6.0 provides an easy-to-use container entrypoint to bootstrap a job cluster. Combining this entrypoint with a user-code jar creates a self-contained image which automatically executes the contained Flink job when deployed. Since the image already contains the Flink job, client communication is no longer necessary. Avoiding additional communication steps with the client reduces the number of moving parts and improves operations in a container environment significantly.
Fully RESTified Job Submission (FLINK-9280): The Flink client now sends all job-relevant content via a single POST call to the server. This allows a much easier integration with cluster management frameworks and container environments, since opening custom ports is no longer necessary.
Enhancing SQL and Table API # User-Defined Function in SQL Client CLI (FLINK-8863): The SQL Client CLI now supports the registration of user-defined functions. This considerably improves the CLI’s expressiveness, because SQL queries can be enriched with more powerful custom table, aggregate, and scalar functions.
Support for Batch Queries in SQL Client CLI (FLINK-8861): The SQL Client CLI now supports the execution of batch queries.
Support for INSERT INTO Statements in SQL Client CLI (FLINK-8858): By supporting SQL’s INSERT INTO statements, the SQL Client CLI can be used to submit long-running SQL queries to Flink that sink their results in external systems. The SQL Client itself can be shut down after submission without stopping the job.
Unified Table Sinks and Formats (FLINK-8866, FLINK-8558): In the past, table sinks had to be configured programmatically and were tied to a specific format and implementation. This release reworked these aspects by decoupling formats from connectors and improving how table sinks are discovered and configured. Table sinks can now be defined in a YAML file using string-based properties without having to write a single line of code.
New Kafka Table Sink (FLINK-9846): The Kafka table sink now uses the new unified APIs and supports both JSON and Avro formats.
Full SQL Avro Support (FLINK-9444): Flink's Table & SQL API now understands the full spectrum of Avro types, including generic/specific records and logical types. The types are automatically mapped from and to Flink-equivalent types, allowing end-to-end ETL pipelines to be specified in SQL.
Improved Expressiveness of SQL and Table API (FLINK-5878, FLINK-8688, FLINK-6810): Flink's Table & SQL API supports left, right, and full outer joins that allow for continuous result-updating queries. SQL aggregate functions support the DISTINCT keyword. Queries such as COUNT(DISTINCT column) are supported for windowed and non-windowed aggregations. Both SQL and Table API now include more built-in functions such as MD5, SHA1, SHA2, LOG, and UNNEST for multisets.
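For example, a non-windowed aggregation using the DISTINCT keyword over a hypothetical Orders table could be phrased as follows (tableEnv is assumed to be a StreamTableEnvironment with Orders registered):
// Count distinct buyers per product; the result is a continuously updating table.
Table buyersPerProduct = tableEnv.sqlQuery(
    "SELECT product, COUNT(DISTINCT userId) AS buyers FROM Orders GROUP BY product");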
More Connectors # New StreamingFileSink (FLINK-9750): The new StreamingFileSink is an exactly-once sink for writing to filesystems which capitalizes on the knowledge acquired from the previous BucketingSink. Exactly-once is supported through integration of the sink with Flink’s checkpointing mechanism. The new sink is built upon Flink’s own FileSystem abstraction and it supports local file system and HDFS, with plans for S3 support in the near future. It exposes pluggable file rolling and bucketing policies. Apart from row-wise encoding formats, the new StreamingFileSink comes with support for Parquet. Other bulk-encoding formats like ORC can be easily added using the exposed APIs.
ElasticSearch 6.x Connector and Improved Support for Older Versions (FLINK-7386): Flink now comes with a connector for ElasticSearch 6.x that is built on top of Elasticsearch's new high-level REST client. For older ElasticSearch versions which still use the native Java TransportClient, Flink's Elasticsearch connectors now support up to Elasticsearch version 5.6.10. Some APIs in the RequestIndexer's public interface of the ElasticSearch connector have been deprecated. Please refer to the Javadoc / documentation for the new preferred API.
Avro Deserialization Schemas (FLINK-9338): Flink now comes with a DeserializationSchema that allows deserializing Avro-encoded messages. It also adds out-of-the-box integration with Confluent’s schema registry.
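A hedged sketch of plugging such a schema into a Kafka 0.11 source: MyAvroRecord is a hypothetical Avro-generated SpecificRecord class, and env and kafkaProperties are assumed to be set up elsewhere.
import org.apache.flink.formats.avro.AvroDeserializationSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

// Deserialize Avro-encoded Kafka messages directly into the generated record class.
AvroDeserializationSchema<MyAvroRecord> schema =
    AvroDeserializationSchema.forSpecific(MyAvroRecord.class);

DataStream<MyAvroRecord> records =
    env.addSource(new FlinkKafkaConsumer011<>("avro-topic", schema, kafkaProperties));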
Jepsen Based Distributed Tests Suite # The Flink community added a Jepsen based test suite (FLINK-9004) which validates the behavior of Flink’s distributed cluster components under real-world faults. It is a first step towards higher test coverage for Flink's fault tolerance mechanisms. The community intends to incrementally improve test coverage with it.
Various Other Features and Improvements # Hardened CEP Library (FLINK-9418): The CEP operator’s internal NFA state is now backed by Flink state. That way it can go out of core to support much larger use cases.
More Expressive DataStream Joins (FLINK-8478): Flink 1.6.0 adds support for interval joins in the DataStream API. With this feature it is now possible to join together events from different streams where elements from one stream lie in a specified time interval relative to elements from the other stream. Check out the documentation for more details.
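A hedged sketch of such an interval join (Click and Impression are hypothetical POJOs with an adId key and a timestamp, and both streams are assumed to use event time):
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

// For every click, find impressions of the same ad that happened at most ten minutes earlier.
clicks.keyBy(c -> c.adId)
    .intervalJoin(impressions.keyBy(i -> i.adId))
    .between(Time.minutes(-10), Time.seconds(0))
    .process(new ProcessJoinFunction<Click, Impression, String>() {
        @Override
        public void processElement(Click click, Impression imp, Context ctx, Collector<String> out) {
            out.collect("click on " + click.adId + " attributed to impression at " + imp.timestamp);
        }
    });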
Intra-Cluster Mutual Authentication (FLINK-9312): Flink’s cluster components now enforce mutual authentication with their peers. This allows only Flink components to talk to each other, making it impossible for malicious actors to impersonate Flink components in order to eavesdrop on the cluster communication.
Release Notes # Please review the release notes if you plan to upgrade your Flink setup to Flink 1.6.
List of Contributors # According to git shortlog, the following 112 people contributed to the 1.6.0 release. Thanks to all contributors!
Alejandro Alcalde, Alexander Koltsov, Alexey Tsitkin, Aljoscha Krettek, Andreas Fink, Andrey Zagrebin, Arunan Sugunakumar, Ashwin Sinha, Bill Lee, Bowen Li, Chesnay Schepler, Christophe Jolif, Clément Tamisier, Craig Foster, David Anderson, Dawid Wysakowicz, Deepak Sharnma, Dmitrii_Kniazev, EAlexRojas, Elias Levy, Eron Wright, Ethan Li, Fabian Hueske, Florian Schmidt, Franz Thoma, Gabor Gevay, Georgii Gobozov, Haohui Mai, Jamie Grier, Jeff Zhang, Jelmer Kuperus, Jiayi Liao, Jungtaek Lim, Kailash HD, Ken Geis, Ken Krugler, Lakshmi Gururaja Rao, Leonid Ishimnikov, Matrix42, Michael Gendelman, MichealShin, Moser Thomas W, Nico Duldhardt, Nico Kruber, Oleksandr Nitavskyi, PJ Fanning, Patrick Lucas, Pavel Shvetsov, Philippe Duveau, Piotr Nowojski, Qiu Congxian/klion26, Rinat Sharipov, Rong Rong, Rune Skou Larsen, Sayat Satybaldiyev, Shuyi Chen, Stefan Richter, Stephan Ewen, Stephen Parente, Thomas Weise, Till Rohrmann, Timo Walther, Tobii42, Tzu-Li (Gordon) Tai, Viktor Vlasov, Wosin, Xingcan Cui, Xpray, Yan Zhou, Yazdan.JS, Yun Tang, Zhijiang, Zsolt Donca, an4828, aria, binlijin, blueszheng, davidxdh, gyao, hequn8128, hzyuqi1, jerryjzhang, jparkie, juhoautio, kai-chi, kkloudas, klion26, lamber-ken, lincoln-lil, linjun, liurenjie1024, lsy, maqingxiang-it, maxbelov, mayyamus, minwenjun, neoremind, sampathBhat, shankarganesh1234, shuai.xus, sihuazhou, snuyanzin, triones.deng, vinoyang, xueyu, yangshimin, yuemeng, zhangminglei, zhouhai02, zjureel, 军长, 陈梓立
`}),e.add({id:207,href:"/2018/07/31/apache-flink-1.5.2-released/",title:"Apache Flink 1.5.2 Released",section:"Flink Blog",content:`The Apache Flink community released the second bugfix version of the Apache Flink 1.5 series.
This release includes more than 20 fixes and minor improvements for Flink 1.5.1. Below is a detailed list of all fixes.
We highly recommend that all users upgrade to Flink 1.5.2.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.5.2</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.5.2</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.5.2</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-9839] - End-to-end test: Streaming job with SSL Bug [FLINK-5750] - Incorrect translation of n-ary Union [FLINK-8161] - Flakey YARNSessionCapacitySchedulerITCase on Travis [FLINK-8731] - TwoInputStreamTaskTest flaky on travis [FLINK-9091] - Failure while enforcing releasability in building flink-json module [FLINK-9380] - Failing end-to-end tests should not clean up logs [FLINK-9439] - DispatcherTest#testJobRecovery dead locks [FLINK-9575] - Potential race condition when removing JobGraph in HA [FLINK-9584] - Unclosed streams in Bucketing-/RollingSink [FLINK-9658] - Test data output directories are no longer cleaned up [FLINK-9706] - DispatcherTest#testSubmittedJobGraphListener fails on Travis [FLINK-9743] - PackagedProgram.extractContainedLibraries fails on Windows [FLINK-9754] - Release scripts refers to non-existing profile [FLINK-9755] - Exceptions in RemoteInputChannel#notifyBufferAvailable() are not propagated to the responsible thread [FLINK-9762] - CoreOptions.TMP_DIRS wrongly managed on Yarn [FLINK-9766] - Incomplete/incorrect cleanup in RemoteInputChannelTest [FLINK-9771] - &quot;Show Plan&quot; option under Submit New Job in WebUI not working [FLINK-9772] - Documentation of Hadoop API outdated [FLINK-9784] - Inconsistent use of &#39;static&#39; in AsyncIOExample.java [FLINK-9793] - When submitting a flink job with yarn-cluster, flink-dist*.jar is repeatedly uploaded [FLINK-9810] - JarListHandler does not close opened jars [FLINK-9838] - Slot request failed Exceptions after completing a job [FLINK-9841] - Web UI only show partial taskmanager log [FLINK-9842] - Job submission fails via CLI with SSL enabled [FLINK-9847] - OneInputStreamTaskTest.testWatermarksNotForwardedWithinChainWhenIdle unstable [FLINK-9857] - Processing-time timers fire too early [FLINK-9860] - Netty resource leak on receiver side [FLINK-9872] - SavepointITCase#testSavepointForJobWithIteration does not properly cancel jobs [FLINK-9908] - Inconsistent state of SlotPool after ExecutionGraph cancellation [FLINK-9910] - Non-queued scheduling failure sometimes does not return the slot [FLINK-9911] - SlotPool#failAllocation is called outside of main thread New Feature [FLINK-9499] - Allow REST API for running a job to provide job configuration as body of POST request Improvement [FLINK-9659] - Remove hard-coded sleeps in bucketing sink E2E test [FLINK-9748] - create_source_release pollutes flink root directory [FLINK-9768] - Only build flink-dist for binary releases [FLINK-9785] - Add remote addresses to LocalTransportException instances [FLINK-9801] - flink-dist is missing dependency on flink-examples [FLINK-9804] - KeyedStateBackend.getKeys() does not work on RocksDB MapState [FLINK-9811] - Add ITCase for interactions of Jar handlers [FLINK-9873] - Log actual state when aborting checkpoint due to task not running [FLINK-9881] - Typo in a function name in table.scala [FLINK-9888] - Remove unsafe defaults from release scripts [FLINK-9909] - Remove cancellation of input futures from ConjunctFutures `}),e.add({id:208,href:"/2018/07/12/apache-flink-1.5.1-released/",title:"Apache Flink 1.5.1 Released",section:"Flink Blog",content:`The Apache Flink community released the first bugfix version of the Apache Flink 1.5 series.
This release includes more than 60 fixes and minor improvements for Flink 1.5.0. Below is a detailed list of all fixes.
We highly recommend that all users upgrade to Flink 1.5.1.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.5.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.5.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.5.1</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-8977] - End-to-end test: Manually resume job after terminal failure [FLINK-8982] - End-to-end test: Queryable state [FLINK-8989] - End-to-end test: ElasticSearch connector [FLINK-8996] - Include an operator with broadcast and union state [FLINK-9008] - End-to-end test: Quickstarts [FLINK-9320] - Update \`test-ha.sh\` end-to-end test to use general purpose DataStream job [FLINK-9322] - Add exception throwing map function that simulates failures to the general purpose DataStream job [FLINK-9394] - Let externalized checkpoint resume e2e also test rescaling Bug [FLINK-8785] - JobSubmitHandler does not handle JobSubmissionExceptions [FLINK-8795] - Scala shell broken for Flip6 [FLINK-8946] - TaskManager stop sending metrics after JobManager failover [FLINK-9174] - The type of state created in ProccessWindowFunction.proccess() is inconsistency [FLINK-9215] - TaskManager Releasing - org.apache.flink.util.FlinkException [FLINK-9257] - End-to-end tests prints &quot;All tests PASS&quot; even if individual test-script returns non-zero exit code [FLINK-9258] - ConcurrentModificationException in ComponentMetricGroup.getAllVariables [FLINK-9326] - TaskManagerOptions.NUM_TASK_SLOTS does not work for local/embedded mode [FLINK-9374] - Flink Kinesis Producer does not backpressure [FLINK-9398] - Flink CLI list running job returns all jobs except in CREATE state [FLINK-9437] - Revert cypher suite update [FLINK-9458] - Unable to recover from job failure on YARN with NPE [FLINK-9467] - No Watermark display on Web UI [FLINK-9468] - Wrong calculation of outputLimit in LimitedConnectionsFileSystem [FLINK-9493] - Forward exception when releasing a TaskManager at the SlotPool [FLINK-9494] - Race condition in Dispatcher with concurrent granting and revoking of leaderhship [FLINK-9500] - FileUploadHandler does not handle EmptyLastHttpContent [FLINK-9524] - NPE from ProcTimeBoundedRangeOver.scala [FLINK-9530] - Task numRecords metrics broken for chains [FLINK-9554] - flink scala shell doesn&#39;t work in yarn mode [FLINK-9567] - Flink does not release resource in Yarn Cluster mode [FLINK-9570] - SQL Client merging environments uses AbstractMap [FLINK-9580] - Potentially unclosed ByteBufInputStream in RestClient#readRawResponse [FLINK-9627] - Extending &#39;KafkaJsonTableSource&#39; according to comments will result in NPE [FLINK-9629] - Datadog metrics reporter does not have shaded dependencies [FLINK-9633] - Flink doesn&#39;t use the Savepoint path&#39;s filesystem to create the OuptutStream on Task. 
[FLINK-9634] - Deactivate previous location based scheduling if local recovery is disabled [FLINK-9636] - Network buffer leaks in requesting a batch of segments during canceling [FLINK-9646] - ExecutionGraphCoLocationRestartTest.testConstraintsAfterRestart failed on Travis [FLINK-9654] - Internal error while deserializing custom Scala TypeSerializer instances [FLINK-9655] - Externalized checkpoint E2E test fails on travis [FLINK-9665] - PrometheusReporter does not properly unregister metrics [FLINK-9676] - Deadlock during canceling task and recycling exclusive buffer [FLINK-9677] - RestClient fails for large uploads [FLINK-9684] - HistoryServerArchiveFetcher not working properly with secure hdfs cluster [FLINK-9693] - Possible memory leak in jobmanager retaining archived checkpoints [FLINK-9708] - Network buffer leaks when buffer request fails during buffer redistribution [FLINK-9769] - FileUploads may be shared across requests [FLINK-9770] - UI jar list broken [FLINK-9789] - Watermark metrics for an operator&amp;task shadow each other New Feature [FLINK-9153] - TaskManagerRunner should support rpc port range [FLINK-9280] - Extend JobSubmitHandler to accept jar files [FLINK-9316] - Expose operator unique ID to the user defined functions in DataStream . [FLINK-9564] - Expose end-to-end module directory to test scripts [FLINK-9599] - Implement generic mechanism to receive files via rest [FLINK-9669] - Introduce task manager assignment store [FLINK-9670] - Introduce slot manager factory [FLINK-9671] - Add configuration to enable task manager isolation. Improvement [FLINK-4301] - Parameterize Flink version in Quickstart bash script [FLINK-8650] - Add tests and documentation for WINDOW clause [FLINK-8654] - Extend quickstart docs on how to submit jobs [FLINK-9109] - Add flink modify command to documentation [FLINK-9355] - Simplify configuration of local recovery to a simple on/off [FLINK-9372] - Typo on Elasticsearch website link (elastic.io --&gt; elastic.co) [FLINK-9409] - Remove flink-avro and flink-json from /opt [FLINK-9456] - Let ResourceManager notify JobManager about failed/killed TaskManagers [FLINK-9508] - General Spell Check on Flink Docs [FLINK-9517] - Fixing broken links on CLI and Upgrade Docs [FLINK-9518] - SSL setup Docs config example has wrong keys password [FLINK-9549] - Fix FlickCEP Docs broken link and minor style changes [FLINK-9573] - Check for leadership with leader session id [FLINK-9594] - Add documentation for e2e test changes introduced with FLINK-9257 [FLINK-9595] - Add instructions to docs about ceased support of KPL version used in Kinesis connector [FLINK-9638] - Add helper script to run single e2e test [FLINK-9672] - Fail fatally if we cannot submit job on added JobGraph signal [FLINK-9707] - LocalFileSystem does not support concurrent directory creations [FLINK-9729] - Duplicate lines for &quot;Weekday name (Sunday .. Saturday)&quot; [FLINK-9734] - Typo &#39;field-deleimiter&#39; in SQL client docs `}),e.add({id:209,href:"/2018/05/18/apache-flink-1.5.0-release-announcement/",title:"Apache Flink 1.5.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is thrilled to announce the 1.5.0 release. Over the past 5 months, the Flink community has been working hard to resolve more than 780 issues. Please check the complete changelog for more detail.
Flink 1.5.0 is the sixth major release in the 1.x.y series. As usual, it is API-compatible with previous 1.x.y releases for APIs annotated with the @Public annotation.
We encourage everyone to download the release and check out the documentation. Feedback through the Flink mailing lists or JIRA is, as always, very much appreciated!
You can find the binaries on the updated Downloads page on the Flink project site.
Flink 1.5 - Streaming Evolved # We believe that the field of stream processing, and Apache Flink with it, is taking another major leap at the moment. Stream processing is not just faster analytics and a more principled way of building fast continuous data pipelines. Stream processing is becoming a paradigm to build data-driven and data-intensive applications - it brings together data processing logic and application/business logic.
To help users realize the potential of this change, we spent a lot of effort in this release to rework some fundamental pieces of Flink. We want Flink to feel natural to users who do data engineering / data processing, as well as users who build data/event-driven applications (and of course those who combine both aspects inside their applications). This is an ongoing journey, but here are the first steps on this way:
We have redesigned and reimplemented large parts of Flink's process model. This effort has been tracked under the name FLIP-6. While not all is completed yet, the changes in Flink 1.5 enable more natural Kubernetes deployments and switch to HTTP/REST for all external communication (to naturally interact with service proxies). Simultaneously, Flink 1.5 simplifies deployments on common cluster managers (YARN, Mesos) and features dynamic resource allocation.
Streaming broadcast state (FLINK-4940) connects a broadcasted stream (e.g., context data, machine learning models, rules/patterns, triggers, …) with other streams that may maintain (large) keyed state, such as feature vectors, state machines, etc. Prior to Flink 1.5, such use cases could not be easily built.
To improve support for real-time applications with tight latency constraints, we made major improvements to Flink’s network stack (FLINK-7315). Flink 1.5 achieves even lower latencies while maintaining a high throughput. In addition, we improved checkpoint stability under backpressure.
Streaming SQL is more and more recognized as a simple and powerful way to perform streaming analytics, build data pipelines, do feature engineering, or incrementally keep applications updated on changing data. We added a SQL CLI for streaming SQL queries (FLIP-24) to make this feature easier to get started with.
New Features and Improvements # Rewrite of Flink’s Deployment and Process Model # The rewrite of Flink’s deployment and process model (internally known as FLIP-6) has been in the works for more than a year and was a substantial effort from the Flink community. Many contributors from several organizations, such as data Artisans, Alibaba, and Dell EMC, collaborated on the design and implementation of this feature, which has been the most significant improvement of a Flink core component since the project’s inception.
In a nutshell, the improvements add support for dynamic resource allocation and dynamic release of resources on YARN and Mesos schedulers for better resource utilization, failure recovery, and also dynamic scaling. Moreover, deployments on container management infrastructures like Kubernetes have been simplified and all requests to the JobManager now happen through REST. This includes job submission, cancellation, requesting job status, taking a savepoint, and so on.
The work also builds the foundation for future improvements of Flink’s integration with Kubernetes. In a later version it will be possible to dockerize jobs and deploy them in a natural way as part of the container deployment, i.e., without starting a Flink cluster first. In addition, the work is a big step towards support for applications that are able to automatically adjust their parallelism.
Note that Flink’s programming APIs are not affected by these improvements.
Broadcast State # Support for broadcast state, i.e., state that is replicated across all parallel instances of a function, has been a frequently requested feature. Typical use cases for broadcast state involve two streams: a control or configuration stream that serves rules, patterns, or other configuration messages, and a regular data stream. The processing of the regular stream is configured by the messages of the control stream. By broadcasting rules or patterns to all parallel instances of a function, they can be applied to all events of the regular stream.
Of course, broadcast state can be checkpointed and restored just like any other state in Flink with exactly-once state consistency guarantees. Moreover, broadcast state unblocks the implementation of the “dynamic patterns” feature for Flink’s CEP library.
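As a concrete illustration of this pattern, here is a minimal sketch using the DataStream broadcast state API that ships with this release; the stream contents (string events matched against string rules) and all names are made up for the example.
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class BroadcastStateSketch {

    // Descriptor for the replicated "rules" state: rule id -> rule pattern.
    static final MapStateDescriptor<String, String> RULES = new MapStateDescriptor<>(
            "rules", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

    static DataStream<String> apply(DataStream<String> events, DataStream<String> ruleUpdates) {
        // Broadcast the control stream so that every parallel instance sees every rule.
        BroadcastStream<String> broadcastRules = ruleUpdates.broadcast(RULES);

        return events
                .connect(broadcastRules)
                .process(new BroadcastProcessFunction<String, String, String>() {

                    @Override
                    public void processElement(String event, ReadOnlyContext ctx, Collector<String> out) throws Exception {
                        // Regular stream: read-only access to the broadcast state.
                        for (java.util.Map.Entry<String, String> rule : ctx.getBroadcastState(RULES).immutableEntries()) {
                            if (event.contains(rule.getValue())) {
                                out.collect(rule.getKey() + " matched: " + event);
                            }
                        }
                    }

                    @Override
                    public void processBroadcastElement(String rule, Context ctx, Collector<String> out) throws Exception {
                        // Control stream: update the replicated state on every parallel instance.
                        ctx.getBroadcastState(RULES).put("rule-" + rule.hashCode(), rule);
                    }
                });
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> events = env.fromElements("user login", "payment failed", "user logout");
        DataStream<String> rules = env.fromElements("failed", "login");
        apply(events, rules).print();
        env.execute("broadcast-state-sketch");
    }
}
Note how the control stream gets read-write access to the replicated state in processBroadcastElement, while the regular stream only reads it.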
Improvements to Flink’s Network Stack # The performance of a distributed streaming application heavily depends on the component that transfers events from one operator to another via a network connection. In the context of stream processing, two performance metrics, latency and throughput, are important.
For Flink 1.5, the community worked on two efforts to improve Flink’s network stack, credit-based flow control and improving the transfer latency. Credit-based flow control reduces the amount of data “on the wire” to a minimum while preserving high throughput. This significantly reduces the time to complete a checkpoint in back pressure situations. Moreover, Flink is now able to achieve much lower latencies without a reduction in throughput.
Task-Local State Recovery # Flink’s checkpointing mechanism writes copies of an application’s state to a remote, persistent storage and loads it back in case of a failure. This mechanism ensures that state is not lost when an application fails. However, in case of a failure, it might take a while to load the state from the remote storage to recover the application.
Improving the checkpointing and recovery efficiency is an ongoing effort in the Flink community. Prominent features of previous releases were asynchronous and incremental checkpointing. In this release, we improved the efficiency of failure recovery.
Task-local state recovery leverages the fact that a job typically fails due to a single crashed operator, TaskManager, or machine. When writing the state of operators to the remote storage, Flink can now also keep a copy on the local disk of each machine. In case of failover, the scheduler tries to reschedule tasks to their previous machine and load the state from the local disk instead of the remote storage, resulting in faster recovery.
Extending Join Support for SQL and Table API # With the 1.5.0 release, Flink adds support for windowed outer equi-joins. Queries like the one shown below allow for joining of tables on bounded time ranges in both event-time and processing-time.
SELECT d.rideId, d.departureTime, a.arrivalTime
FROM Departures d LEFT OUTER JOIN Arrivals a
  ON d.rideId = a.rideId
  AND a.arrivalTime BETWEEN d.departureTime AND d.departureTime + '2' HOURS
For cases where two streaming tables should not be joined within a bounded time interval, Flink SQL also now supports non-windowed inner joins. This enables full-history matching, which is common in many standard SQL statements.
SELECT u.name, u.address, o.productId, o.amount
FROM Users u JOIN Orders o
  ON u.userId = o.userId
SQL CLI Client # A few months ago, the community started an effort to add a service to execute streaming and batch SQL queries (FLIP-24). The new SQL CLI client is the first step of this effort and provides a SQL shell to run exploratory queries on data streams. The animation below shows a preview of this feature.
Various Other Features and Improvements # OpenStack provides software for creating public and private clouds on pools of resources. Flink now supports OpenStack’s S3-like file system, Swift, for checkpoint and savepoint storage. Swift can be used without Hadoop dependencies. Reading and writing JSON messages from and to connectors has been improved. It’s now possible to parse a standard JSON schema in order to configure serializers and deserializers. The SQL CLI Client is able to read JSON records from Kafka. Applications can be rescaled without manually triggering a savepoint. Under the hood, Flink will still take a savepoint, stop the application, and rescale it to the new parallelism. Improved metrics for watermarks and latency. Flink now reports the minimum watermark in all operators, including sources. Moreover, the latency metrics were reworked for better integration with common metrics systems. The FileInputFormat (and many derived input formats) now supports reading files from multiple paths. The BucketingSink supports the specification of custom extensions for multiple parts. The CassandraOutputFormat can be used to emit Row objects. The Kinesis consumer allows for more customization. Release Notes # Please review the release notes if you plan to upgrade your Flink setup to Flink 1.5.
List of Contributors # According to git shortlog, the following 106 people contributed to the 1.5.0 release. Thanks to all contributors!
Aegeaner, Alejandro Alcalde, Aljoscha Krettek, Andreas Fink, Andrey Zagrebin, Ankit Parashar, Arunan Sugunakumar, Bartłomiej Tartanus, Bowen Li, Cristian, Dan Kelley, David Anderson, Dawid Wysakowicz, Dian Fu, Dmitrii_Kniazev, Dyana Rose, EAlexRojas, Eron Wright, Fabian Hueske, Florian Schmidt, Gabor Gevay, Greg Hogan, Gyula Fora, Jark Wu, Jelmer Kuperus, Joerg Schad, John Eismeier, Kailash HD, Ken Geis, Ken Krugler, Kent Murra, Leonid Ishimnikov, Malcolm Taylor, Matrix42, Michael Fong, Michael Gendelman, Moser Thomas W, Nico Kruber, PJ Fanning, Patrick Lucas, Pavel Shvetsov, Phetsarath, Sourigna, Philip Luppens, Piotr Nowojski, Qiu Congxian/klion26, Razvan, Robert Metzger, Rong Rong, Shuyi Chen, Stefan Richter, Stephan Ewen, Stephen Parente, Steven Langbroek, Thomas Weise, Till Rohrmann, Timo Walther, Tony Wei, Tzu-Li (Gordon) Tai, Ufuk Celebi, Vetriselvan1187, Xingcan Cui, Xpray, Yazdan.JS, Zhijiang, Zohar Mizrahi, aria, biao.liub, binlijin, davidxdh, eastcirclek, eskabetxe, gyao, hequn8128, hzyuqi1, ifndef-SleePy, jparkie, juhoautio, kkloudas, maqingxiang-it, maxbelov, mayyamus, mingleiZhang, neoremind, nichuanlei, okumin, shankarganesh1234, shuai.xus, sihuazhou, summerleafs, sunjincheng121, triones.deng, twalthr, uybhatti, vinoyang, wenlong.lwl, yanghua, yew1eb, yuemeng, zentol, zhangminglei, zhouhai02, zjureel, 军长, 金竹, 王振涛, 陈梓立
`}),e.add({id:210,href:"/2018/03/15/apache-flink-1.3.3-released/",title:"Apache Flink 1.3.3 Released",section:"Flink Blog",content:`The Apache Flink community released the third bugfix version of the Apache Flink 1.3 series.
This release includes 4 critical fixes related to checkpointing and recovery. The list below includes a detailed list of all fixes.
We highly recommend all Flink 1.3 series users to upgrade to Flink 1.3.3.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.3.3</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.10</artifactId>
  <version>1.3.3</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.10</artifactId>
  <version>1.3.3</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-7783] - Don&#39;t always remove checkpoints in ZooKeeperCompletedCheckpointStore#recover() Bug [FLINK-7283] - PythonPlanBinderTest issues with python paths [FLINK-8487] - State loss after multiple restart attempts [FLINK-8807] - ZookeeperCompleted checkpoint store can get stuck in infinite loop Improvement [FLINK-8890] - Compare checkpoints with order in CompletedCheckpoint.checkpointsMatch() `}),e.add({id:211,href:"/2018/03/08/apache-flink-1.4.2-released/",title:"Apache Flink 1.4.2 Released",section:"Flink Blog",content:`The Apache Flink community released the second bugfix version of the Apache Flink 1.4 series.
This release includes more than 10 fixes and minor improvements for Flink 1.4.1. The list below includes a detailed list of all fixes.
We highly recommend all users to upgrade to Flink 1.4.2.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.4.2</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.4.2</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.4.2</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-6321] - RocksDB state backend Checkpointing is not working with KeyedCEP. [FLINK-7756] - RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP. Bug [FLINK-8423] - OperatorChain#pushToOperator catch block may fail with NPE [FLINK-8451] - CaseClassSerializer is not backwards compatible in 1.4 [FLINK-8520] - CassandraConnectorITCase.testCassandraTableSink unstable on Travis [FLINK-8621] - PrometheusReporterTest.endpointIsUnavailableAfterReporterIsClosed unstable on Travis [FLINK-8692] - Mistake in MyMapFunction code snippet [FLINK-8735] - Add savepoint migration ITCase that covers operator state [FLINK-8741] - KafkaFetcher09/010/011 uses wrong user code classloader [FLINK-8772] - FlinkKafkaConsumerBase partitions discover missing a log parameter [FLINK-8791] - Fix documentation on how to link dependencies [FLINK-8798] - Make commons-logging a parent-first pattern [FLINK-8849] - Wrong link from concepts/runtime to doc on chaining Improvement [FLINK-8202] - Update queryable section on configuration page [FLINK-8574] - Add timestamps to travis logging messages [FLINK-8576] - Log message for QueryableState loading failure too verbose [FLINK-8652] - Reduce log level of QueryableStateClient.getKvState() to DEBUG Task [FLINK-8308] - Update yajl-ruby dependency to 1.3.1 or higher `}),e.add({id:212,href:"/2018/02/28/an-overview-of-end-to-end-exactly-once-processing-in-apache-flink-with-apache-kafka-too/",title:"An Overview of End-to-End Exactly-Once Processing in Apache Flink (with Apache Kafka, too!)",section:"Flink Blog",content:`This post is an adaptation of Piotr Nowojski&rsquo;s presentation from Flink Forward Berlin 2017. You can find the slides and a recording of the presentation on the Flink Forward Berlin website.
Apache Flink 1.4.0, released in December 2017, introduced a significant milestone for stream processing with Flink: a new feature called TwoPhaseCommitSinkFunction (relevant Jira here) that extracts the common logic of the two-phase commit protocol and makes it possible to build end-to-end exactly-once applications with Flink and a selection of data sources and sinks, including Apache Kafka versions 0.11 and beyond. It provides a layer of abstraction and requires a user to implement only a handful of methods to achieve end-to-end exactly-once semantics.
If that&rsquo;s all you need to hear, let us point you to the relevant place in the Flink documentation, where you can read about how to put TwoPhaseCommitSinkFunction to use.
But if you&rsquo;d like to learn more, in this post, we&rsquo;ll share an in-depth overview of the new feature and what is happening behind the scenes in Flink.
Throughout the rest of this post, we&rsquo;ll:
Describe the role of Flink’s checkpoints for guaranteeing exactly-once results within a Flink application.
Show how Flink interacts with data sources and data sinks via the two-phase commit protocol to deliver end-to-end exactly-once guarantees.
Walk through a simple example on how to use TwoPhaseCommitSinkFunction to implement an exactly-once file sink.
Exactly-once Semantics Within an Apache Flink Application # When we say “exactly-once semantics”, what we mean is that each incoming event affects the final results exactly once. Even in case of a machine or software failure, there’s no duplicate data and no data that goes unprocessed.
Flink has long provided exactly-once semantics within a Flink application. Over the past few years, we&rsquo;ve written in depth about Flink&rsquo;s checkpointing, which is at the core of Flink&rsquo;s ability to provide exactly-once semantics. The Flink documentation also provides a thorough overview of the feature.
Before we continue, here&rsquo;s a quick summary of the checkpointing algorithm because understanding checkpoints is necessary for understanding this broader topic.
A checkpoint in Flink is a consistent snapshot of:
The current state of an application
The position in an input stream
Flink generates checkpoints on a regular, configurable interval and then writes the checkpoint to a persistent storage system, such as S3 or HDFS. Writing the checkpoint data to the persistent storage happens asynchronously, which means that a Flink application continues to process data during the checkpointing process.
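For orientation, here is a minimal sketch of how an application typically enables this; the interval, the state backend, and the storage path are placeholder assumptions rather than values from this post.
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingSetupSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Draw a consistent snapshot every 10 seconds with exactly-once guarantees (the default mode).
        env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);

        // Persist the snapshots to durable storage (HDFS here; an S3 URI works the same way).
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));

        // A trivial pipeline so the example runs end to end; real jobs plug in their sources and sinks here.
        env.fromElements(1, 2, 3).print();
        env.execute("checkpointed-application");
    }
}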
In the event of a machine or software failure and upon restart, a Flink application resumes processing from the most recent successfully-completed checkpoint; Flink restores application state and rolls back to the correct position in the input stream from a checkpoint before processing starts again. This means that Flink computes results as though the failure never occurred.
Before Flink 1.4.0, exactly-once semantics were limited to the scope of a Flink application only and did not extend to most of the external systems to which Flink sends data after processing.
But Flink applications operate in conjunction with a wide range of data sinks, and developers should be able to maintain exactly-once semantics beyond the context of one component.
To provide end-to-end exactly-once semantics&ndash;that is, semantics that also apply to the external systems that Flink writes to in addition to the state of the Flink application&ndash;these external systems must provide a means to commit or roll back writes that coordinate with Flink&rsquo;s checkpoints.
One common approach for coordinating commits and rollbacks in a distributed system is the two-phase commit protocol. In the next section, we&rsquo;ll go behind the scenes and discuss how Flink&rsquo;s TwoPhaseCommitSinkFunction utilizes the two-phase commit protocol to provide end-to-end exactly-once semantics.
End-to-end Exactly Once Applications with Apache Flink # We&rsquo;ll walk through the two-phase commit protocol and how it enables end-to-end exactly-once semantics in a sample Flink application that reads from and writes to Kafka. Kafka is a popular messaging system to use along with Flink, and Kafka recently added support for transactions with its 0.11 release. This means that Flink now has the necessary mechanism to provide end-to-end exactly-once semantics in applications when receiving data from and writing data to Kafka.
Flink&rsquo;s support for end-to-end exactly-once semantics is not limited to Kafka and you can use it with any source / sink that provides the necessary coordination mechanism. For example, Pravega, an open-source streaming storage system from Dell/EMC, also supports end-to-end exactly-once semantics with Flink via the TwoPhaseCommitSinkFunction.
In the sample Flink application that we&rsquo;ll discuss today, we have:
A data source that reads from Kafka (in Flink, a KafkaConsumer)
A windowed aggregation
A data sink that writes data back to Kafka (in Flink, a KafkaProducer)
For the data sink to provide exactly-once guarantees, it must write all data to Kafka within the scope of a transaction. A commit bundles all writes between two checkpoints.
This ensures that writes are rolled back in case of a failure.
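To sketch the shape of such a job, here is a hedged example that wires the pieces together with the Kafka 0.11 connector classes discussed later in this post; the topic names, broker address, and checkpoint interval are illustrative assumptions, and the producer constructor shown here reflects the 0.11 connector as we understand it, so verify it against the connector version you use.
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011;
import org.apache.flink.streaming.util.serialization.KeyedSerializationSchemaWrapper;

public class ExactlyOnceKafkaPipelineSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Kafka transactions are committed when a checkpoint completes, so checkpointing must be on.
        env.enableCheckpointing(60_000L);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker address

        env
            .addSource(new FlinkKafkaConsumer011<String>("input-topic", new SimpleStringSchema(), props))
            // ... a windowed aggregation or other stateful logic would sit here ...
            .addSink(new FlinkKafkaProducer011<String>(
                "output-topic",
                new KeyedSerializationSchemaWrapper<String>(new SimpleStringSchema()),
                props,
                FlinkKafkaProducer011.Semantic.EXACTLY_ONCE));

        env.execute("exactly-once-kafka-pipeline");
    }
}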
However, in a distributed system with multiple, concurrently-running sink tasks, a simple commit or rollback is not sufficient, because all of the components must &ldquo;agree&rdquo; together on committing or rolling back to ensure a consistent result. Flink uses the two-phase commit protocol and its pre-commit phase to address this challenge.
The starting of a checkpoint represents the &ldquo;pre-commit&rdquo; phase of our two-phase commit protocol. When a checkpoint starts, the Flink JobManager injects a checkpoint barrier (which separates the records in the data stream into the set that goes into the current checkpoint vs. the set that goes into the next checkpoint) into the data stream.
The barrier is passed from operator to operator. For every operator, it triggers the operator&rsquo;s state backend to take a snapshot of its state.
The data source stores its Kafka offsets, and after completing this, it passes the checkpoint barrier to the next operator.
This approach works if an operator has internal state only. Internal state is everything that is stored and managed by Flink&rsquo;s state backends - for example, the windowed sums in the second operator. When a process has only internal state, there is no need to perform any additional action during pre-commit aside from updating the data in the state backends before it is checkpointed. Flink takes care of correctly committing those writes in case of checkpoint success or aborting them in case of failure.
However, when a process has external state, this state must be handled a bit differently. External state usually comes in the form of writes to an external system such as Kafka. In that case, to provide exactly-once guarantees, the external system must provide support for transactions that integrates with a two-phase commit protocol.
We know that the data sink in our example has such external state because it&rsquo;s writing data to Kafka. In this case, in the pre-commit phase, the data sink must pre-commit its external transaction in addition to writing its state to the state backend.
The pre-commit phase finishes when the checkpoint barrier passes through all of the operators and the triggered snapshot callbacks complete. At this point the checkpoint completed successfully and consists of the state of the entire application, including pre-committed external state. In case of a failure, we would re-initialize the application from this checkpoint.
The next step is to notify all operators that the checkpoint has succeeded. This is the commit phase of the two-phase commit protocol and the JobManager issues checkpoint-completed callbacks for every operator in the application. The data source and window operator have no external state, and so in the commit phase, these operators don&rsquo;t have to take any action. The data sink does have external state, though, and commits the transaction with the external writes.
So let&rsquo;s put all of these different pieces together:
Once all of the operators complete their pre-commit, they issue a commit. If at least one pre-commit fails, all others are aborted, and we roll back to the previous successfully-completed checkpoint. After a successful pre-commit, the commit must be guaranteed to eventually succeed &ndash; both our operators and our external system need to make this guarantee. If a commit fails (for example, due to an intermittent network issue), the entire Flink application fails, restarts according to the user&rsquo;s restart strategy, and there is another commit attempt. This process is critical because if the commit does not eventually succeed, data loss occurs. Therefore, we can be sure that all operators agree on the final outcome of the checkpoint: all operators agree that the data is either committed or that the commit is aborted and rolled back.
Implementing the Two-Phase Commit Operator in Flink # All the logic required to put a two-phase commit protocol together can be a little bit complicated and that&rsquo;s why Flink extracts the common logic of the two-phase commit protocol into the abstract TwoPhaseCommitSinkFunction class. Let&rsquo;s discuss how to extend a TwoPhaseCommitSinkFunction on a simple file-based example. We need to implement only four methods and present their implementations for an exactly-once file sink:
beginTransaction - to begin the transaction, we create a temporary file in a temporary directory on our destination file system. Subsequently, we can write data to this file as we process it.
preCommit - on pre-commit, we flush the file, close it, and never write to it again. We’ll also start a new transaction for any subsequent writes that belong to the next checkpoint.
commit - on commit, we atomically move the pre-committed file to the actual destination directory. Please note that this increases the latency in the visibility of the output data.
abort - on abort, we delete the temporary file.
As we know, if there’s any failure, Flink restores the state of the application to the latest successful checkpoint. One potential catch is in a rare case when the failure occurs after a successful pre-commit but before notification of that fact (a commit) reaches our operator. In that case, Flink restores our operator to the state that has already been pre-committed but not yet committed.
We must save enough information about pre-committed transactions in checkpointed state to be able to either abort or commit transactions after a restart. In our example, this would be the path to the temporary file and target directory.
The TwoPhaseCommitSinkFunction takes this scenario into account, and it always issues a preemptive commit when restoring state from a checkpoint. It is our responsibility to implement a commit in an idempotent way. Generally, this shouldn&rsquo;t be an issue. In our example, we can recognize such a situation: the temporary file is not in the temporary directory, but has already been moved to the target directory.
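To make those four hooks and the idempotent commit more tangible, here is a small, self-contained Java sketch of the file-based transaction lifecycle described above. It is an illustration of the protocol only, not Flink’s TwoPhaseCommitSinkFunction API, and all paths and names are assumptions.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// A standalone illustration of the four hooks for a file sink; it sketches the protocol steps,
// not the actual TwoPhaseCommitSinkFunction API.
public class FileTransactionSketch {

    private final Path tempDir;
    private final Path targetDir;
    private Path currentTempFile;

    FileTransactionSketch(Path tempDir, Path targetDir) {
        this.tempDir = tempDir;
        this.targetDir = targetDir;
    }

    // beginTransaction: create a fresh temporary file that buffers all writes for this checkpoint interval.
    void beginTransaction() throws IOException {
        currentTempFile = Files.createTempFile(tempDir, "txn-", ".pending");
    }

    // Regular processing: append records to the open transaction file.
    void write(String record) throws IOException {
        Files.write(currentTempFile, (record + "\n").getBytes(StandardCharsets.UTF_8), StandardOpenOption.APPEND);
    }

    // preCommit: flush and stop writing; the file now only waits for the global commit decision.
    Path preCommit() {
        Path preCommitted = currentTempFile;
        currentTempFile = null; // subsequent writes belong to the next transaction
        return preCommitted;
    }

    // commit: atomically publish the pre-committed file. Written to be idempotent: if the file was
    // already moved before a failure, a repeated commit after restart is treated as already done.
    void commit(Path preCommitted) throws IOException {
        Path target = targetDir.resolve(preCommitted.getFileName());
        if (Files.exists(target) && !Files.exists(preCommitted)) {
            return; // already committed before the restart
        }
        Files.move(preCommitted, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // abort: discard the temporary file so its data never becomes visible.
    void abort(Path preCommitted) throws IOException {
        Files.deleteIfExists(preCommitted);
    }
}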
There are a handful of other edge cases that TwoPhaseCommitSinkFunction takes into account, too. Learn more in the Flink documentation.
Wrapping Up # If you&rsquo;ve made it this far, thanks for staying with us through a detailed post. Here are some key points that we covered:
Flink’s checkpointing system serves as Flink’s basis for supporting a two-phase commit protocol and providing end-to-end exactly-once semantics.
An advantage of this approach is that Flink does not materialize data in transit the way that some other systems do: there’s no need to write every stage of the computation to disk as is the case in most batch processing.
Flink’s new TwoPhaseCommitSinkFunction extracts the common logic of the two-phase commit protocol and makes it possible to build end-to-end exactly-once applications with Flink and external systems that support transactions.
Starting with Flink 1.4.0, both the Pravega and Kafka 0.11 producers provide exactly-once semantics; Kafka introduced transactions for the first time in Kafka 0.11, which is what made the Kafka exactly-once producer possible in Flink.
The Kafka 0.11 producer is implemented on top of the TwoPhaseCommitSinkFunction, and it offers very low overhead compared to the at-least-once Kafka producer.
We’re very excited about what this new feature enables, and we look forward to being able to support additional producers with the TwoPhaseCommitSinkFunction in the future.
This post first appeared on the data Artisans blog and was contributed to Apache Flink and the Flink blog by the original authors Piotr Nowojski and Mike Winters.
`}),e.add({id:213,href:"/2018/02/15/apache-flink-1.4.1-released/",title:"Apache Flink 1.4.1 Released",section:"Flink Blog",content:`The Apache Flink community released the first bugfix version of the Apache Flink 1.4 series.
This release includes more than 60 fixes and minor improvements for Flink 1.4.0. The list below includes a detailed list of all fixes.
We highly recommend all users to upgrade to Flink 1.4.1.
Updated Maven dependencies:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.4.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.11</artifactId>
  <version>1.4.1</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.4.1</version>
</dependency>
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-6321] - RocksDB state backend Checkpointing is not working with KeyedCEP. [FLINK-7499] - double buffer release in SpillableSubpartitionView [FLINK-7756] - RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP. [FLINK-7760] - Restore failing from external checkpointing metadata. [FLINK-8323] - Fix Mod scala function bug Bug [FLINK-5506] - Java 8 - CommunityDetection.java:158 - java.lang.NullPointerException [FLINK-6951] - Incompatible versions of httpcomponents jars for Flink kinesis connector [FLINK-7949] - AsyncWaitOperator is not restarting when queue is full [FLINK-8145] - IOManagerAsync not properly shut down in various tests [FLINK-8200] - RocksDBAsyncSnapshotTest should use temp fold instead of fold with fixed name [FLINK-8226] - Dangling reference generated after NFA clean up timed out SharedBufferEntry [FLINK-8230] - NPE in OrcRowInputFormat on nested structs [FLINK-8235] - Cannot run spotbugs for single module [FLINK-8242] - ClassCastException in OrcTableSource.toOrcPredicate [FLINK-8248] - RocksDB state backend Checkpointing is not working with KeyedCEP in 1.4 [FLINK-8249] - Kinesis Producer didnt configure region [FLINK-8261] - Typos in the shading exclusion for jsr305 in the quickstarts [FLINK-8263] - Wrong packaging of flink-core in scala quickstarty [FLINK-8265] - Missing jackson dependency for flink-mesos [FLINK-8270] - TaskManagers do not use correct local path for shipped Keytab files in Yarn deployment modes [FLINK-8275] - Flink YARN deployment with Kerberos enabled not working [FLINK-8278] - Scala examples in Metric documentation do not compile [FLINK-8283] - FlinkKafkaConsumerBase failing on Travis with no output in 10min [FLINK-8295] - Netty shading does not work properly [FLINK-8306] - FlinkKafkaConsumerBaseTest has invalid mocks on final methods [FLINK-8318] - Conflict jackson library with ElasticSearch connector [FLINK-8325] - Add COUNT AGG support constant parameter, i.e. COUNT(*), COUNT(1) [FLINK-8352] - Flink UI Reports No Error on Job Submission Failures [FLINK-8355] - DataSet Should not union a NULL row for AGG without GROUP BY clause. [FLINK-8371] - Buffers are not recycled in a non-spilled SpillableSubpartition upon release [FLINK-8398] - Stabilize flaky KinesisDataFetcherTests [FLINK-8406] - BucketingSink does not detect hadoop file systems [FLINK-8409] - Race condition in KafkaConsumerThread leads to potential NPE [FLINK-8419] - Kafka consumer&#39;s offset metrics are not registered for dynamically discovered partitions [FLINK-8421] - HeapInternalTimerService should reconfigure compatible key / namespace serializers on restore [FLINK-8433] - Update code example for &quot;Managed Operator State&quot; documentation [FLINK-8461] - Wrong logger configurations for shaded Netty [FLINK-8466] - ErrorInfo needs to hold Exception as SerializedThrowable [FLINK-8484] - Kinesis consumer re-reads closed shards on job restart [FLINK-8485] - Running Flink inside Intellij no longer works after upgrading from 1.3.2 to 1.4.0 [FLINK-8489] - Data is not emitted by second ElasticSearch connector [FLINK-8496] - WebUI does not display TM MemorySegment metrics [FLINK-8499] - Kryo must not be child-first loaded [FLINK-8522] - DefaultOperatorStateBackend writes data in checkpoint that is never read. [FLINK-8559] - Exceptions in RocksDBIncrementalSnapshotOperation#takeSnapshot cause job to get stuck [FLINK-8561] - SharedBuffer line 573 uses == to compare BufferEntries instead of .equals. 
Improvement [FLINK-8079] - Skip remaining E2E tests if one failed [FLINK-8202] - Update queryable section on configuration page [FLINK-8243] - OrcTableSource should recursively read all files in nested directories of the input path. [FLINK-8260] - Document API of Kafka 0.11 Producer [FLINK-8264] - Add Scala to the parent-first loading patterns [FLINK-8271] - upgrade from deprecated classes to AmazonKinesis [FLINK-8287] - Flink Kafka Producer docs should clearly state what partitioner is used by default [FLINK-8296] - Rework FlinkKafkaConsumerBestTest to not use Java reflection for dependency injection [FLINK-8346] - add S3 signature v4 workaround to docs [FLINK-8362] - Shade Elasticsearch dependencies away [FLINK-8455] - Add Hadoop to the parent-first loading patterns [FLINK-8473] - JarListHandler may fail with NPE if directory is deleted [FLINK-8571] - Provide an enhanced KeyedStream implementation to use ForwardPartitioner Test [FLINK-8472] - Extend migration tests for Flink 1.4 `}),e.add({id:214,href:"/2018/01/30/managing-large-state-in-apache-flink-an-intro-to-incremental-checkpointing/",title:"Managing Large State in Apache Flink: An Intro to Incremental Checkpointing",section:"Flink Blog",content:`Apache Flink was purpose-built for stateful stream processing. However, what is state in a stream processing application? I defined state and stateful stream processing in a previous blog post, and in case you need a refresher, state is defined as memory in an application&rsquo;s operators that stores information about previously-seen events that you can use to influence the processing of future events.
State is a fundamental, enabling concept in stream processing required for a majority of complex use cases. Some examples highlighted in the Flink documentation:
When an application searches for certain event patterns, the state stores the sequence of events encountered so far. When aggregating events per minute, the state holds the pending aggregates. When training a machine learning model over a stream of data points, the state holds the current version of the model parameters. However, stateful stream processing is only useful in production environments if the state is fault tolerant. &ldquo;Fault tolerance&rdquo; means that even if there&rsquo;s a software or machine failure, the computed end-result is accurate, with no data loss or double-counting of events.
Flink&rsquo;s fault tolerance has always been a powerful and popular feature, minimizing the impact of software or machine failure on your business and making it possible to guarantee exactly-once results from a Flink application.
Core to this is checkpointing, which is the mechanism Flink uses to make application state fault tolerant. A checkpoint in Flink is a global, asynchronous snapshot of application state that&rsquo;s taken on a regular interval and sent to durable storage (usually, a distributed file system). In the event of a failure, Flink restarts an application using the most recently completed checkpoint as a starting point. Some Apache Flink users run applications with gigabytes or even terabytes of application state. These users reported that with such large state, creating a checkpoint was often a slow and resource intensive operation, which is why in Flink 1.3 we introduced &lsquo;incremental checkpointing.&rsquo;
Before incremental checkpointing, every single Flink checkpoint consisted of the full state of an application. We created the incremental checkpointing feature after we noticed that writing the full state for every checkpoint was often unnecessary, as the state changes from one checkpoint to the next were rarely that large. Incremental checkpointing instead maintains the differences (or &lsquo;delta&rsquo;) between each checkpoint and stores only the differences between the last checkpoint and the current state.
Incremental checkpoints can provide a significant performance improvement for jobs with a very large state. Early testing of the feature by a production user with terabytes of state shows a drop in checkpoint time from more than 3 minutes down to 30 seconds after implementing incremental checkpoints. This is because the checkpoint doesn&rsquo;t need to transfer the full state to durable storage on each checkpoint.
How to Start # Currently, you can only use incremental checkpointing with a RocksDB state back-end, and Flink uses RocksDB&rsquo;s internal backup mechanism to consolidate checkpoint data over time. As a result, the incremental checkpoint history in Flink does not grow indefinitely, and Flink eventually consumes and prunes old checkpoints automatically.
To enable incremental checkpointing in your application, I recommend you read the Apache Flink documentation on checkpointing for full details; in summary, you enable checkpointing as normal and additionally enable incremental checkpointing in the state backend constructor by setting the second parameter to true.
Java Example #
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStateBackend(new RocksDBStateBackend(filebackend, true));
Scala Example #
val env = StreamExecutionEnvironment.getExecutionEnvironment()
env.setStateBackend(new RocksDBStateBackend(filebackend, true))
By default, Flink retains 1 completed checkpoint, so if you need a higher number, you can configure it with the following flag:
state.checkpoints.num-retained
How it Works # Flink’s incremental checkpointing uses RocksDB checkpoints as a foundation. RocksDB is a key-value store based on ‘log-structured-merge’ (LSM) trees that collects all changes in a mutable (changeable) in-memory buffer called a ‘memtable’. Any updates to the same key in the memtable replace previous values, and once the memtable is full, RocksDB writes it to disk with all entries sorted by their key and with light compression applied. Once RocksDB writes the memtable to disk it is immutable (unchangeable) and is now called a ‘sorted-string-table’ (sstable).
A &lsquo;compaction&rsquo; background task merges sstables to consolidate potential duplicates for each key, and over time RocksDB deletes the original sstables, with the merged sstable containing all information from across all the other sstables.
On top of this, Flink tracks which sstable files RocksDB has created and deleted since the previous checkpoint, and because the sstables are immutable, Flink can use this to figure out the state changes. To do this, Flink triggers a flush in RocksDB, forcing all memtables into sstables on disk, which are then hard-linked into a local temporary directory. The flush is synchronous to the processing pipeline; Flink performs all further steps asynchronously and does not block processing.
Then Flink copies all new sstables to stable storage (e.g., HDFS, S3) to reference in the new checkpoint. Flink doesn’t copy sstables that already existed in the previous checkpoint to stable storage but re-references them instead. Any new checkpoints will no longer reference deleted files, because deleted sstables in RocksDB are always the result of compaction, which eventually replaces old tables with a merged sstable. This is how Flink’s incremental checkpoints prune the checkpoint history.
Uploading the consolidated tables is redundant for the purpose of tracking changes between checkpoints, but Flink performs this work incrementally and it typically adds only a small overhead, so we consider it worthwhile because it allows Flink to keep a shorter history of checkpoints to consider in a recovery.
An Example # Example setup
Take an example with a subtask of one operator that has a keyed state, and the number of retained checkpoints set at 2. The columns in the figure above show the state of the local RocksDB instance for each checkpoint, the files it references, and the counts in the shared state registry after the checkpoint completes.
For checkpoint ‘CP 1’, the local RocksDB directory contains two sstable files; Flink considers these new and uploads them to stable storage using directory names that match the checkpoint name. When the checkpoint completes, Flink creates the two entries in the shared state registry and sets their counts to ‘1’. The key in the shared state registry is a composite of an operator, subtask, and the original sstable file name. The registry also keeps a mapping from the key to the file path in stable storage.
For checkpoint ‘CP 2’, RocksDB has created two new sstable files, and the two older ones still exist. Flink adds the two new files to stable storage and can reference the previous two files. When the checkpoint completes, Flink increases the counts for all referenced files by 1.
For checkpoint ‘CP 3’, RocksDB’s compaction has merged sstable-(1), sstable-(2), and sstable-(3) into sstable-(1,2,3) and deleted the original files. This merged file contains the same information as the source files, with all duplicate entries eliminated. In addition to this merged file, sstable-(4) still exists and there is now a new sstable-(5) file. Flink adds the new sstable-(1,2,3) and sstable-(5) files to stable storage, re-references sstable-(4) from checkpoint ‘CP 2’, and increases the counts for all referenced files by 1. The older ‘CP 1’ checkpoint is now deleted as the number of retained checkpoints (2) has been reached. As part of this deletion, Flink decreases the counts for all files referenced by ‘CP 1’ (sstable-(1) and sstable-(2)) by 1.
For checkpoint ‘CP 4’, RocksDB has merged sstable-(4), sstable-(5), and a new sstable-(6) into sstable-(4,5,6). Flink adds this new table to stable storage, references it together with sstable-(1,2,3), increases the counts for sstable-(1,2,3) and sstable-(4,5,6) by 1, and then deletes ‘CP 2’ as the number of retained checkpoints has been reached. As the counts for sstable-(1), sstable-(2), and sstable-(3) have now dropped to 0, Flink deletes them from stable storage.
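To make the bookkeeping in this walkthrough easier to follow, here is a tiny, self-contained Java toy that replays the counting above; it only illustrates the reference-counting idea, and the file and checkpoint names are taken from the example rather than from Flink’s actual shared state registry implementation.
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SharedStateCountingToy {

    static final Map<String, Integer> counts = new HashMap<>();

    // A completed checkpoint increases the count of every sstable it references.
    static void register(List<String> referencedFiles) {
        for (String file : referencedFiles) {
            counts.merge(file, 1, Integer::sum);
        }
    }

    // Dropping an old checkpoint decreases the counts; files that reach 0 can be removed from stable storage.
    static void release(List<String> referencedFiles) {
        for (String file : referencedFiles) {
            int remaining = counts.merge(file, -1, Integer::sum);
            if (remaining == 0) {
                counts.remove(file);
                System.out.println("delete " + file + " from stable storage");
            }
        }
    }

    public static void main(String[] args) {
        register(Arrays.asList("sstable-(1)", "sstable-(2)"));                                // CP 1 completes
        register(Arrays.asList("sstable-(1)", "sstable-(2)", "sstable-(3)", "sstable-(4)"));  // CP 2 completes
        register(Arrays.asList("sstable-(1,2,3)", "sstable-(4)", "sstable-(5)"));             // CP 3 completes
        release(Arrays.asList("sstable-(1)", "sstable-(2)"));                                 // CP 1 is dropped (retain 2)
        register(Arrays.asList("sstable-(1,2,3)", "sstable-(4,5,6)"));                        // CP 4 completes
        release(Arrays.asList("sstable-(1)", "sstable-(2)", "sstable-(3)", "sstable-(4)"));   // CP 2 is dropped
        System.out.println("remaining: " + counts);
    }
}
Running it prints the deletion of sstable-(1), sstable-(2), and sstable-(3) when ‘CP 2’ is dropped, matching the walkthrough.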
Race Conditions and Concurrent Checkpoints # As Flink can execute multiple checkpoints in parallel, sometimes a new checkpoint starts before the previous checkpoint has been confirmed as completed. Because of this, you should consider which previous checkpoint to use as a basis for a new incremental checkpoint. Flink only references state from a checkpoint confirmed by the checkpoint coordinator so that it doesn’t unintentionally reference a deleted shared file.
Restoring Checkpoints and Performance Considerations # If you enable incremental checkpointing, there are no further configuration steps needed to recover your state in case of failure. If a failure occurs, Flink’s JobManager tells all tasks to restore from the last completed checkpoint, be it a full or incremental checkpoint. Each TaskManager then downloads its share of the state from the checkpoint on the distributed file system.
Though the feature can lead to a substantial improvement in checkpoint time for users with a large state, there are trade-offs to consider with incremental checkpointing. Overall, the process reduces the checkpointing time during normal operations but can lead to a longer recovery time depending on the size of your state. If the cluster failure is particularly severe and the Flink TaskManagers have to read from multiple checkpoints, recovery can be a slower operation than when using non-incremental checkpointing. You can also no longer delete old checkpoints as newer checkpoints need them, and the history of differences between checkpoints can grow indefinitely over time. You need to plan for larger distributed storage to maintain the checkpoints and the network overhead to read from it.
There are some strategies for improving the convenience/performance trade-off, and I recommend you read the Flink documentation for more details.
This post originally appeared on the data Artisans blog and was contributed to the Flink blog by Stefan Richter and Chris Ward.
`}),e.add({id:215,href:"/2017/12/21/apache-flink-in-2017-year-in-review/",title:"Apache Flink in 2017: Year in Review",section:"Flink Blog",content:`2017 was another exciting year for the Apache Flink® community, with 3 major version releases (Flink 1.2.0 in February, Flink 1.3.0 in June, and Flink 1.4.0 in December) and the first-ever Flink Forward in San Francisco, giving Flink community members in another corner of the globe an opportunity to connect. Users shared details about their innovative production deployments, redefining what is possible with a modern stream processing framework like Flink.
In this post, we&rsquo;ll look back on the project&rsquo;s progress over the course of 2017, and we&rsquo;ll also preview what 2018 has in store.
Community Growth # Github # First, here&rsquo;s a summary of community statistics from GitHub. At the time of writing:
Contributors have increased from 258 in December 2016 to 352 in December 2017 (up 36%) Stars have increased from 1830 in December 2016 to 3036 in December 2017 (up 65%) Forks have increased from 1255 in December 2016 to 2070 in December 2017 (up 65%) The community also welcomed 10 new committers in 2017: Kostas Kloudas, Jark Wu, Stefan Richter, Kurt Young, Theodore Vasiloudis, Xiaogang Shi, Dawid Wysakowicz, Shaoxuan Wang, Jincheng Sun and Haohui Mai.
We also welcomed 3 new members to the project management committee (PMC): Greg Hogan, Tzu-Li (Gordon) Tai and Chesnay Schepler.
Next, let&rsquo;s take a look at a few other project stats, starting with number of commits. If we run:
git log --pretty=oneline --after=12/31/2016 | wc -l
inside the Flink repository, we’ll see a total of 2316 commits so far in 2017, bringing the all-time total commits to 12,532.
Now, let&rsquo;s go a bit deeper, here are instructions to take a look at this data yourself.
Download and install gitstats from the project homepage, then clone the Apache Flink git repository:
git clone git@github.com:apache/flink.git
Generate the statistics:
gitstats flink/ flink-stats/
View all the statistics as an HTML page using your default browser:
open flink-stats/index.html
Flink surpassed 1 million lines of code in 2016, and that trend continued in 2017 with the code base now clocking in at 1,257,949 lines.
Monday remains the day of the week with the most commits over the project&rsquo;s history, but Wednesday is catching up:
5 pm remains the preferred commit time, closely followed by 4 pm:
Meetups # Apache Flink Meetup membership grew by 20% this year to a total of 19,767 members at 39 meetups listing Flink as a topic. With meetups on five continents, the Flink community is proud to be truly global.
Flink Forward 2017 # 2017 was the first year we ran a Flink Forward conference in both Berlin (September 11-13) and San Francisco (April 10-11), and over 350 members of our community attended each event for speaker sessions, training, and discussion about Flink.
Slides and videos are available for all speaker sessions, and if you&rsquo;re interested in learning more about how organizations use Flink in production, we encourage you to browse and watch a couple.
For 2018, Flink Forward will be back in September in Berlin, and in April in San Francisco.
Features and Ecosystem # Flink Ecosystem Growth # Flink was added to a selection of distributions and integrations during 2017, making it easier for a wider user base to get started with Flink:
Official Docker image
Official DC/OS and Mesos support
A Flink connector for Pravega, Dell/EMC’s streaming storage system.
Uber announced AthenaX, a streaming SQL platform powered by Apache Flink.
dataArtisans announced an early access program for a SaaS product based on Apache Flink, dA Platform 2.
Feature Timeline in 2017 # Just in time for the end of the year, our 1.4 release (read the full release announcement) landed in mid-December, culminating 5 months of work and the resolution of more than 900 issues. This is the fifth major release in the 1.x.y series.
Here&rsquo;s a selection of major features added to Flink over the course of 2017:
If you take a look at the resolved issues and enhancements for 2017 on Jira you can see that the community resolved over 1,831 issues and feature additions.
Regarding roadmap commitments from 2016, there is mixed news, with some items a part of current releases, others scheduled for upcoming releases and some that remain under discussion.
Looking ahead to 2018 # A good source of information about the Flink community&rsquo;s roadmap is the list of Flink Improvement Proposals (FLIPs) in the project wiki. Below, we&rsquo;ll highlight a selection of FLIPs accepted by the community as well as some that are still under discussion.
Work is already underway on a number of these features, and some will be included in Flink 1.5 at the beginning of 2018.
Improved BLOB storage architecture, as described in FLIP-19 to consolidate API usage and improve concurrency. Integration of SQL and CEP, as described in FLIP-20 to allow developers to create complex event processing (CEP) patterns using SQL statements. Unified checkpoints and savepoints, as described in FLIP-10, to allow savepoints to be triggered automatically–important for program updates for the sake of error handling because savepoints allow the user to modify both the job and Flink version whereas checkpoints can only be recovered with the same job. An improved Flink deployment and process model, as described in FLIP-6, to allow for better integration with Flink and cluster managers and deployment technologies such as Mesos, Docker, and Kubernetes. Fine-grained recovery from task failures, as described in FLIP-1 to improve recovery efficiency and only re-execute failed tasks, reducing the amount of state that Flink needs to transfer on recovery. An SQL Client, as described in FLIP-24 to add a service and a client to execute SQL queries against batch and streaming tables. Serving of machine learning models, as described in FLIP-23 to add a library that allows users to apply offline-trained machine learning models to data streams. If you&rsquo;re interested in getting involved with Flink, we encourage you to take a look at the FLIPs and to join the discussion via the Flink mailing lists.
Lastly, we&rsquo;d like to extend a sincere thank you to all the Flink community for making 2017 a great year!
`}),e.add({id:216,href:"/2017/12/12/apache-flink-1.4.0-release-announcement/",title:"Apache Flink 1.4.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is pleased to announce the 1.4.0 release. Over the past 5 months, the Flink community has been working hard to resolve more than 900 issues. See the complete changelog for more detail.
This is the fifth major release in the 1.x.y series. It is API-compatible with the other 1.x.y releases for APIs annotated with the @Public annotation.
We encourage everyone to download the release and check out the documentation.
Feedback through the Flink mailing lists is, as always, gladly encouraged!
You can find the binaries on the updated Downloads page on the Flink project site.
The release includes improvements to many different aspects of Flink, including:
The ability to build end-to-end exactly-once applications with Flink and popular data sources and sinks such as Apache Kafka.
A more developer-friendly dependency structure as well as Hadoop-free Flink for Flink users who do not have Hadoop dependencies.
Support for JOIN and for new sources and sinks in the Table API and SQL, expanding the range of logic that can be expressed with these APIs.
A summary of some of the features in the release is available below.
For more background on the Flink 1.4.0 release and the work planned for the Flink 1.5.0 release, please refer to this blog post on the Apache Flink blog.
New Features and Improvements # End-to-end Exactly Once Applications with Apache Flink and Apache Kafka and TwoPhaseCommitSinkFunction # Flink 1.4 includes a first version of an exactly-once producer for Apache Kafka 0.11. This producer enables developers who build Flink applications with Kafka as a data source and sink to compute exactly-once results not just within the Flink program, but truly “end-to-end” in the application.
The common pattern used for exactly-once applications in Kafka and in other sinks&ndash;the two-phase commit algorithm&ndash;has been extracted in Flink 1.4.0 into a common class, the TwoPhaseCommitSinkFunction (FLINK-7210). This will make it easier for users to create their own exactly-once data sinks in the future.
Table API and Streaming SQL Enhancements # Flink SQL now supports windowed joins based on processing time and event time (FLINK-5725). Users will be able to execute a join between 2 streaming tables and compute windowed results according to these 2 different concepts of time. The syntax and semantics in Flink are the same as standard SQL with JOIN and with Flink’s streaming SQL more broadly.
Flink SQL also now supports “INSERT INTO SELECT” queries, which makes it possible to write results from SQL directly into a data sink (an external system that receives data from a Flink application). This improves operability and ease-of-use of Flink SQL.
The Table API now supports aggregations on streaming tables; previously, the only supported operations on streaming tables were projection, selection, and union (FLINK-4557). This feature was initially discussed in Flink Improvement Proposal 11: FLIP-11.
The release also adds support for new table API and SQL sources and sinks, including a Kafka 0.11 source and JDBC sink.
Lastly, Flink SQL now uses Apache Calcite 1.14, which was just released in October 2017 (FLINK-7051).
A Significantly-Improved Dependency Structure and Reversed Class Loading # Flink 1.4.0 shades a number of dependencies to avoid subtle runtime conflicts, including:
ASM
Guava
Jackson
Netty
Apache Zookeeper
These changes improve Flink’s overall stability and remove friction when embedding Flink or calling Flink “library style”.
The release also introduces default reversed (child-first) class loading for dynamically-loaded user code, allowing for different dependencies than those included in the core framework.
For details on those changes please check out the relevant Jira issues:
FLINK-7442 FLINK-6529 Hadoop-free Flink # Apache Flink users without any Apache Hadoop dependencies can now run Flink without Hadoop. Flink programs that do not rely on Hadoop components can now be much smaller, a benefit particularly in a container-based setup resulting in less network traffic and better performance.
This includes the addition of Flink’s own Amazon S3 filesystem implementations based on Hadoop&rsquo;s S3a and Presto&rsquo;s S3 file system with properly shaded dependencies (FLINK-5706).
The details of these changes regarding Hadoop-free Flink are available in the Jira issue: FLINK-2268.
Improvements to Flink Internals # Flink 1.4.0 introduces a new blob storage architecture that was first discussed in Flink Improvement Proposal 19 (FLINK-6916).
This will enable easier integration with both the work being done in Flink Improvement Proposal 6 in the future and with other improvements in the 1.4.0 release, such as support for messages larger than the maximum Akka Framesize (FLINK-6046).
The improvement also enables Flink to leverage distributed file systems in high availability settings for optimized distribution of deployment data to TaskManagers.
Improvements to the Queryable State Client # Flink’s queryable state makes it possible for users to access application state directly in Flink before the state has been sent to an external database or key-value store.
Flink 1.4.0 introduces a range of improvements to the queryable state client, including a more container-friendly architecture, a more user-friendly API that hides configuration parameters, and the groundwork to be able to expose window state (the state of an in-flight window) in the future.
For details about the changes to queryable state please refer to the umbrella Jira issue: FLINK-5675.
Metrics and Monitoring # Flink’s metrics system now also includes support for Prometheus, an increasingly-popular metrics and reporting system within the Flink community (FLINK-6221).
And the Apache Kafka connector in Flink now exposes metrics for failed and successful offset commits in the Kafka consumer callback (FLINK-6998).
Connector improvements and fixes # Flink 1.4.0 introduces an Apache Kafka 0.11 connector and, as described above, support for an exactly-once producer for Kafka 0.11 (FLINK-6988).
Additionally, the Flink-Kafka consumer now supports dynamic partition discovery &amp; topic discovery based on regex. This means that the Flink-Kafka consumer can pick up new Kafka partitions without needing to restart the job and while maintaining exactly-once guarantees (FLINK-4022).
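As a hedged illustration of how these discovery features are switched on, the sketch below subscribes to a topic pattern and sets the discovery interval; the pattern, broker address, and interval are made-up values, and the property key follows the 1.4 consumer documentation as we recall it, so please double-check it against your connector version.
import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class PartitionDiscoverySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder broker address
        // Probe for newly created partitions (and matching topics) every 30 seconds.
        props.setProperty("flink.partition-discovery.interval-millis", "30000");

        // Subscribe to every topic matching the pattern; topics created later are picked up at runtime.
        FlinkKafkaConsumer011<String> consumer = new FlinkKafkaConsumer011<String>(
                Pattern.compile("orders-.*"), new SimpleStringSchema(), props);

        // The consumer would then be handed to env.addSource(consumer) as usual.
    }
}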
Flink’s Apache Kinesis connector now uses updated versions of the Kinesis Client Library and the Kinesis Producer Library. This introduces improved retry logic to the connector and should significantly reduce the number of failures caused by Flink writing too quickly to Kinesis (FLINK-7366).
Flink’s Apache Cassandra connector now supports Scala tuples&ndash;previously, only streams of Java tuples were supported (FLINK-4497). Also, a bug was fixed in the Cassandra connector that caused messages to be lost in certain instances (FLINK-4500).
Release Notes - Please Read # Some of these changes will require updating the configuration or Maven dependencies for existing programs. Please read below to see if you might be affected.
Changes to dynamic class loading of user code # As mentioned above, we changed the way Flink loads user code from the previous default of parent-first class loading (the default for Java) to child-first classloading, which is a common practice in Java Application Servers, where this is also referred to as inverted or reversed class loading.
This should not affect regular user code but will enable programs to use a different version of dependencies that come with Flink &ndash; for example Akka, netty, or Jackson. If you want to change back to the previous default, you can use the configuration setting classloader.resolve-order: parent-first, the new default being child-first.
No more Avro dependency included by default # Flink previously included Avro by default so user programs could simply use Avro and not worry about adding any dependencies. This behavior was changed in Flink 1.4 because it can lead to dependency clashes.
You now must manually include the Avro dependency (flink-avro) with your program jar (or add it to the Flink lib folder) if you want to use Avro.
Hadoop-free Flink # Starting with version 1.4, Flink can run without any Hadoop dependencies present in the Classpath. Along with simply running without Hadoop, this enables Flink to dynamically use whatever Hadoop version is available in the classpath.
You could, for example, download the Hadoop-free release of Flink but use that to run on any supported version of YARN, and Flink would dynamically use the Hadoop dependencies from YARN.
This also means that in cases where you used connectors to HDFS, such as the BucketingSink or RollingSink, you now have to ensure that you either use a Flink distribution with bundled Hadoop dependencies or make sure to include Hadoop dependencies when building a jar file for your application.
Bundled S3 FileSystems # Flink 1.4 comes bundled with two different S3 FileSystems based on the Presto S3 FileSystem and the Hadoop S3A FileSystem. They don&rsquo;t have dependencies (because all dependencies are shaded/relocated) and you can use them by dropping the respective file from the opt directory into the lib directory of your Flink installation. For more information about this, please refer to the documentation.
List of Contributors # According to git shortlog, the following 106 people contributed to the 1.4.0 release. Thank you to all contributors!
Ajay Tripathy, Alejandro Alcalde, Aljoscha Krettek, Bang, Phiradet, Bowen Li, Chris Ward, Cristian, Dan Kelley, David Anderson, Dawid Wysakowicz, Dian Fu, Dmitrii Kniazev, DmytroShkvyra, Fabian Hueske, FlorianFan, Fokko Driesprong, Gabor Gevay, Gary Yao, Greg Hogan, Haohui Mai, Hequn Cheng, James Lafa, Jark Wu, Jie Shen, Jing Fan, JingsongLi, Joerg Schad, Juan Paulo Gutierrez, Ken Geis, Kent Murra, Kurt Young, Lim Chee Hau, Maximilian Bode, Michael Fong, Mike Kobit, Mikhail Lipkovich, Nico Kruber, Novotnik, Petr, Nycholas de Oliveira e Oliveira, Patrick Lucas, Piotr Nowojski, Robert Metzger, Rodrigo Bonifacio, Rong Rong, Scott Kidder, Sebastian Klemke, Shuyi Chen, Stefan Richter, Stephan Ewen, Svend Vanderveken, Till Rohrmann, Tony Wei, Tzu-Li (Gordon) Tai, Ufuk Celebi, Usman Younas, Vetriselvan1187, Vishnu Viswanath, Wright, Eron, Xingcan Cui, Xpray, Yestin, Yonatan Most, Zhenzhong Xu, Zhijiang, adebski, asdf2014, bbayani, biao.liub, cactuslrd.lird, dawidwys, desktop, fengyelei, godfreyhe, gosubpl, gyao, hongyuhong, huafengw, kkloudas, kl0u, lincoln-lil, lingjinjiang, mengji.fy, minwenjun, mtunique, p1tz, paul, rtudoran, shaoxuan-wang, sirko bretschneider, sunjincheng121, tedyu, twalthr, uybhatti, wangmiao1981, yew1eb, z00376786, zentol, zhangminglei, zhe li, zhouhai02, zjureel, 付典, 军长, 宝牛, 淘江, 金竹
`}),e.add({id:217,href:"/2017/11/21/looking-ahead-to-apache-flink-1.4.0-and-1.5.0/",title:"Looking Ahead to Apache Flink 1.4.0 and 1.5.0",section:"Flink Blog",content:`The Apache Flink 1.4.0 release is on track to happen in the next couple of weeks, and for all of the readers out there who haven’t been following the release discussion on Flink’s developer mailing list, we’d like to provide some details on what’s coming in Flink 1.4.0 as well as a preview of what the Flink community will save for 1.5.0.
Both releases include ambitious features that we believe will move Flink to an entirely new level in terms of the types of problems it can solve and applications it can support. The community deserves lots of credit for its hard work over the past few months, and we’re excited to see these features in the hands of users.
This post will describe how the community plans to get there and the rationale behind the approach.
Coming soon: Major Changes to Flink’s Runtime # There are three significant improvements to the Apache Flink engine that the community has nearly completed and that will have a meaningful impact on Flink’s operability and performance:
• A rework of the deployment model and distributed processing
• A transition from configurable, fixed-interval network I/O to event-driven network I/O and application-level flow control, for better backpressure handling
• Faster recovery from failure
Next, we’ll go through each of these improvements in more detail.
Reworking Flink’s Deployment Model and Distributed Processing # FLIP-6 (FLIP is short for FLink Improvement Proposal, and FLIPs are proposals for bigger changes to Flink) is an initiative that’s been in the works for more than a year and represents a major refactoring of Flink’s deployment model and distributed processing. The underlying motivation for FLIP-6 is that Flink is being adopted by a wider range of developer communities &ndash; both developers coming from the big data and analytics space and developers coming from the event-driven applications space.
Modern, stateful stream processing has served as a convergence for these two developer communities. Despite a significant overlap of the core concepts in the applications being built, each group of developers has its own set of common tools, deployment models, and expected behaviors when working with a stream processing framework like Flink.
FLIP-6 will ensure that Flink fits naturally in both of these contexts, behaving as though it’s native to each ecosystem and operating seamlessly within a broader technology stack. A few of the specific changes in FLIP-6 that will have such an impact:
• Leveraging cluster management frameworks to support full resource elasticity
• First-class support for containerized environments such as Kubernetes and Docker
• REST-based client-cluster communication to ease operations and 3rd-party integrations
FLIP-6, along with already-introduced features like rescalable state, lays the groundwork for dynamic scaling in Flink, meaning that Flink programs will be able to scale up or down automatically based on required resources &ndash; a huge step forward in terms of ease of operability and the efficiency of Flink applications.
Lower Latency via Improvements to the Apache Flink Network Stack # Speed will always be a key consideration for users who build stream processing applications, and Flink 1.5 will include a rework of the network stack that will further improve Flink&rsquo;s latency. At the heart of this work is a transition from configurable, fixed-interval network I/O to event-driven network I/O and application-level flow control, ensuring that Flink uses all available network capacity, as well as credit-based flow control, which offers more fine-grained backpressuring for improved checkpoint alignments.
In our testing (see slide 26 here), we’ve seen a substantial improvement in latency using event-driven network I/O, and the community is also doing work to make sure we’re able to provide this increase in speed without a measurable throughput tradeoff.
Faster Recovery from Failures # Flink 1.3.0 introduced incremental checkpoints, making it possible to checkpoint only the state changes since the last successfully-completed checkpoint, rather than the previous behavior of always checkpointing the entire state of the application. This has led to significant performance improvements for users with large state.
Flink 1.5 will introduce task-local recovery, which means that Flink will store a second copy of the most recent checkpoint on the local disk (or even in main memory) of a task manager. The primary copy still goes to durable storage so that it’s resilient to machine failures.
In case of failover, the scheduler will try to reschedule tasks to their previous task manager (in other words, to the same machine again) if this is possible. The task can then recover from the locally-kept state. This makes it possible to avoid reading all state from the distributed file system (which is remote over the network). Especially in applications with very large state, not having to read many gigabytes over the network and instead from local disk will result in significant performance gains in recovery.
The Proposed Timeline for Flink 1.4 and Flink 1.5 # The good news is that all 3 of the features described above are well underway, and in fact, much of the work is already covered by open pull requests.
But given these features’ importance and the complexity of the work involved, the community expected that the QA and testing required would be extensive and would delay the release of the otherwise-ready features also on the list for the next release.
And so the community decided to hold back the 3 features above (deployment model rework, improvements to the network stack, and faster recovery) and include them in a separate Flink 1.5 release that will come shortly after the Flink 1.4 release. Flink 1.5 is estimated to come just a couple of months after 1.4 rather than the typical 4-month cycle between major releases.
The soon-to-be-released Flink 1.4 represents the current state of Flink without merging those 3 features. And Flink 1.4 is a substantial release in its own right, including, but not limited to, the following:
• A significantly improved dependency structure, removing many of Flink’s dependencies and subtle runtime conflicts. This increases overall stability and removes friction when embedding Flink or calling Flink &ldquo;library style&rdquo;.
• Reversed class loading for dynamically-loaded user code, allowing for different dependencies than those included in the core framework.
• An Apache Kafka 0.11 exactly-once producer, making it possible to build end-to-end exactly-once applications with Flink and Kafka.
• Streaming SQL JOIN based on processing time and event time, which gives users the full advantage of Flink’s time handling while using a SQL JOIN.
• Table API / Streaming SQL source and sink additions, including a Kafka 0.11 source and a JDBC sink.
• Hadoop-free Flink, meaning that users who don’t rely on any Hadoop components (such as YARN or HDFS) in their Flink applications can use Flink without Hadoop for the first time.
• Improvements to queryable state, including a more container-friendly architecture, a more user-friendly API that hides configuration parameters, and the groundwork to be able to expose window state (the state of an in-flight window) in the future.
• Connector improvements and fixes for a range of connectors, including Kafka, Apache Cassandra, Amazon Kinesis, and more.
• Improved RPC performance for faster recovery from failure.
The community decided it was best to get these features into a stable version of Flink as soon as possible, and the separation of what could have been a single (and very substantial) Flink 1.4 release into 1.4 and 1.5 serves that purpose.
We’re excited by what each of these represents for Apache Flink, and we’d like to extend our thanks to the Flink community for all of their hard work.
If you’d like to follow along with release discussions, please subscribe to the dev@ mailing list.
`}),e.add({id:218,href:"/2017/08/05/apache-flink-1.3.2-released/",title:"Apache Flink 1.3.2 Released",section:"Flink Blog",content:`The Apache Flink community released the second bugfix version of the Apache Flink 1.3 series.
This release includes more than 60 fixes and minor improvements for Flink 1.3.1. The list below includes a detailed list of all fixes.
We highly recommend all users to upgrade to Flink 1.3.2.
Important Notice: A user reported a bug in the FlinkKafkaConsumer (FLINK-7143) that is causing incorrect partition assignment in large Kafka deployments in the presence of inconsistent broker metadata. In that case multiple parallel instances of the FlinkKafkaConsumer may read from the same topic partition, leading to data duplication. In Flink 1.3.2 this bug is fixed but incorrect assignments from Flink 1.3.0 and 1.3.1 cannot be automatically fixed by upgrading to Flink 1.3.2 via a savepoint because the upgraded version would resume the wrong partition assignment from the savepoint. If you believe you are affected by this bug (seeing messages from some partitions duplicated) please refer to the JIRA issue for an upgrade path that works around that.
Before attempting the more elaborate upgrade path, we suggest checking whether you are actually affected by this bug. We did not manage to reproduce it in various testing clusters, and according to the reporting user, it only appeared in rare cases on their very large setup. This leads us to believe that most likely only a minority of setups would be affected by this bug.
Notable changes:
The default Kafka version for Flink Kafka Consumer 0.10 was bumped from 0.10.0.1 to 0.10.2.1. Some default values for configurations of AWS API call behaviors in the Flink Kinesis Consumer were adapted for better default consumption performance: 1) SHARD_GETRECORDS_MAX default changed to 10,000, and 2) SHARD_GETRECORDS_INTERVAL_MILLIS default changed to 200ms. Updated Maven dependencies:
&lt;dependency&gt;
  &lt;groupId&gt;org.apache.flink&lt;/groupId&gt;
  &lt;artifactId&gt;flink-java&lt;/artifactId&gt;
  &lt;version&gt;1.3.2&lt;/version&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
  &lt;groupId&gt;org.apache.flink&lt;/groupId&gt;
  &lt;artifactId&gt;flink-streaming-java_2.10&lt;/artifactId&gt;
  &lt;version&gt;1.3.2&lt;/version&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
  &lt;groupId&gt;org.apache.flink&lt;/groupId&gt;
  &lt;artifactId&gt;flink-clients_2.10&lt;/artifactId&gt;
  &lt;version&gt;1.3.2&lt;/version&gt;
&lt;/dependency&gt;
You can find the binaries on the updated Downloads page.
List of resolved issues:
Sub-task [FLINK-6665] - Pass a ScheduledExecutorService to the RestartStrategy [FLINK-6667] - Pass a callback type to the RestartStrategy, rather than the full ExecutionGraph [FLINK-6680] - App &amp; Flink migration guide: updates for the 1.3 release Bug [FLINK-5488] - yarnClient should be closed in AbstractYarnClusterDescriptor for error conditions [FLINK-6376] - when deploy flink cluster on the yarn, it is lack of hdfs delegation token. [FLINK-6541] - Jar upload directory not created [FLINK-6654] - missing maven dependency on &quot;flink-shaded-hadoop2-uber&quot; in flink-dist [FLINK-6655] - Misleading error message when HistoryServer path is empty [FLINK-6742] - Improve error message when savepoint migration fails due to task removal [FLINK-6774] - build-helper-maven-plugin version not set [FLINK-6806] - rocksdb is not listed as state backend in doc [FLINK-6843] - ClientConnectionTest fails on travis [FLINK-6867] - Elasticsearch 1.x ITCase still instable due to embedded node instability [FLINK-6918] - Failing tests: ChainLengthDecreaseTest and ChainLengthIncreaseTest [FLINK-6945] - TaskCancelAsyncProducerConsumerITCase.testCancelAsyncProducerAndConsumer instable test case [FLINK-6964] - Fix recovery for incremental checkpoints in StandaloneCompletedCheckpointStore [FLINK-6965] - Avro is missing snappy dependency [FLINK-6987] - TextInputFormatTest fails when run in path containing spaces [FLINK-6996] - FlinkKafkaProducer010 doesn&#39;t guarantee at-least-once semantic [FLINK-7005] - Optimization steps are missing for nested registered tables [FLINK-7011] - Instable Kafka testStartFromKafkaCommitOffsets failures on Travis [FLINK-7025] - Using NullByteKeySelector for Unbounded ProcTime NonPartitioned Over [FLINK-7034] - GraphiteReporter cannot recover from lost connection [FLINK-7038] - Several misused &quot;KeyedDataStream&quot; term in docs and Javadocs [FLINK-7041] - Deserialize StateBackend from JobCheckpointingSettings with user classloader [FLINK-7132] - Fix BulkIteration parallelism [FLINK-7133] - Fix Elasticsearch version interference [FLINK-7137] - Flink table API defaults top level fields as nullable and all nested fields within CompositeType as non-nullable [FLINK-7143] - Partition assignment for Kafka consumer is not stable [FLINK-7154] - Missing call to build CsvTableSource example [FLINK-7158] - Wrong test jar dependency in flink-clients [FLINK-7177] - DataSetAggregateWithNullValuesRule fails creating null literal for non-nullable type [FLINK-7178] - Datadog Metric Reporter Jar is Lacking Dependencies [FLINK-7180] - CoGroupStream perform checkpoint failed [FLINK-7195] - FlinkKafkaConsumer should not respect fetched partitions to filter restored partition states [FLINK-7216] - ExecutionGraph can perform concurrent global restarts to scheduling [FLINK-7225] - Cutoff exception message in StateDescriptor [FLINK-7226] - REST responses contain invalid content-encoding header [FLINK-7231] - SlotSharingGroups are not always released in time for new restarts [FLINK-7234] - Fix CombineHint documentation [FLINK-7241] - Fix YARN high availability documentation [FLINK-7255] - ListStateDescriptor example uses wrong constructor [FLINK-7258] - IllegalArgumentException in Netty bootstrap with large memory state segment size [FLINK-7266] - Don&#39;t attempt to delete parent directory on S3 [FLINK-7268] - Zookeeper Checkpoint Store interacting with Incremental State Handles can lead to loss of handles [FLINK-7281] - Fix various issues in (Maven) release infrastructure Improvement [FLINK-6365] - 
Adapt default values of the Kinesis connector [FLINK-6575] - Disable all tests on Windows that use HDFS [FLINK-6682] - Improve error message in case parallelism exceeds maxParallelism [FLINK-6789] - Remove duplicated test utility reducer in optimizer [FLINK-6874] - Static and transient fields ignored for POJOs [FLINK-6898] - Limit size of operator component in metric name [FLINK-6937] - Fix link markdown in Production Readiness Checklist doc [FLINK-6940] - Clarify the effect of configuring per-job state backend [FLINK-6998] - Kafka connector needs to expose metrics for failed/successful offset commits in the Kafka Consumer callback [FLINK-7004] - Switch to Travis Trusty image [FLINK-7032] - Intellij is constantly changing language level of sub projects back to 1.6 [FLINK-7069] - Catch exceptions for each reporter separately [FLINK-7149] - Add checkpoint ID to &#39;sendValues()&#39; in GenericWriteAheadSink [FLINK-7164] - Extend integration tests for (externalised) checkpoints, checkpoint store [FLINK-7174] - Bump dependency of Kafka 0.10.x to the latest one [FLINK-7211] - Exclude Gelly javadoc jar from release [FLINK-7224] - Incorrect Javadoc description in all Kafka consumer versions [FLINK-7228] - Harden HistoryServerStaticFileHandlerTest [FLINK-7233] - TaskManagerHeapSizeCalculationJavaBashTest failed on Travis [FLINK-7287] - test instability in Kafka010ITCase.testCommitOffsetsToKafka [FLINK-7290] - Make release scripts modular `}),e.add({id:219,href:"/2017/07/04/a-deep-dive-into-rescalable-state-in-apache-flink/",title:"A Deep Dive into Rescalable State in Apache Flink",section:"Flink Blog",content:`Apache Flink 1.2.0, released in February 2017, introduced support for rescalable state. This post provides a detailed overview of stateful stream processing and rescalable state in Flink. An Intro to Stateful Stream Processing # At a high level, we can consider state in stream processing as memory in operators that remembers information about past input and can be used to influence the processing of future input.
In contrast, operators in stateless stream processing only consider their current inputs, without further context and knowledge about the past. A simple example to illustrate this difference: let us consider a source stream that emits events with schema e = {event_id:int, event_value:int}. Our goal is, for each event, to extract and output the event_value. We can easily achieve this with a simple source-map-sink pipeline, where the map function extracts the event_value from the event and emits it downstream to an outputting sink. This is an instance of stateless stream processing.
But what if we want to modify our job to output the event_value only if it is larger than the value from the previous event? In this case, our map function obviously needs some way to remember the event_value from a past event — and so this is an instance of stateful stream processing.
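As a minimal sketch (assuming, for illustration, that events arrive as Tuple2&lt;Integer, Integer&gt; pairs of (event_id, event_value) and that the stream has been keyed, for example by event_id, so that keyed state can be used), such a stateful function could look roughly like this:

public class LargerThanPrevious
    extends RichFlatMapFunction&lt;Tuple2&lt;Integer, Integer&gt;, Integer&gt; {

  // keyed state: remembers the last event_value seen for the current key
  private transient ValueState&lt;Integer&gt; lastValue;

  @Override
  public void open(Configuration parameters) {
    lastValue = getRuntimeContext().getState(
        new ValueStateDescriptor&lt;&gt;(&#34;last-value&#34;, Integer.class));
  }

  @Override
  public void flatMap(Tuple2&lt;Integer, Integer&gt; event, Collector&lt;Integer&gt; out) throws Exception {
    Integer previous = lastValue.value();
    // emit the current event_value only if it is larger than the previous one
    // (the very first event per key is emitted as well)
    if (previous == null || event.f1 &gt; previous) {
      out.collect(event.f1);
    }
    lastValue.update(event.f1);
  }
}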
This example should demonstrate that state is a fundamental, enabling concept in stream processing that is required for a majority of interesting use cases.
State in Apache Flink # Apache Flink is a massively parallel distributed system that allows stateful stream processing at large scale. For scalability, a Flink job is logically decomposed into a graph of operators, and the execution of each operator is physically decomposed into multiple parallel operator instances. Conceptually, each parallel operator instance in Flink is an independent task that can be scheduled on its own machine in a network-connected cluster of shared-nothing machines.
For high throughput and low latency in this setting, network communications among tasks must be minimized. In Flink, network communication for stream processing only happens along the logical edges in the job’s operator graph (vertically), so that the stream data can be transferred from upstream to downstream operators.
However, there is no communication between the parallel instances of an operator (horizontally). To avoid such network communication, data locality is a key principle in Flink and strongly affects how state is stored and accessed.
For the sake of data locality, all state data in Flink is always bound to the task that runs the corresponding parallel operator instance and is co-located on the same machine that runs the task.
Through this design, all state data for a task is local, and no network communication between tasks is required for state access. Avoiding this kind of traffic is crucial for the scalability of a massively parallel distributed system like Flink.
For Flink’s stateful stream processing, we differentiate between two different types of state: operator state and keyed state. Operator state is scoped per parallel instance of an operator (sub-task), and keyed state can be thought of as “operator state that has been partitioned, or sharded, with exactly one state-partition per key”. We could have easily implemented our previous example as operator state: all events that are routed through the operator instance can influence its value.
Rescaling Stateful Stream Processing Jobs # Changing the parallelism (that is, changing the number of parallel subtasks that perform work for an operator) in stateless streaming is very easy. It requires only starting or stopping parallel instances of stateless operators and connecting them to, or disconnecting them from, their upstream and downstream operators, as shown in Figure 1A.
On the other hand, changing the parallelism of stateful operators is much more involved because we must also (i) redistribute the previous operator state in a (ii) consistent, (iii) meaningful way. Remember that in Flink’s shared-nothing architecture, all state is local to the task that runs the owning parallel operator instance, and there is no communication between parallel operator instances at job runtime.
However, there is already one mechanism in Flink that allows the exchange of operator state between tasks, in a consistent way, with exactly-once guarantees — Flink’s checkpointing!
You can see detail about Flink’s checkpoints in the documentation. In a nutshell, a checkpoint is triggered when a checkpoint coordinator injects a special event (a so-called checkpoint barrier) into a stream.
Checkpoint barriers flow downstream with the event stream from sources to sinks, and whenever an operator instance receives a barrier, the operator instance immediately snapshots its current state to a distributed storage system, e.g. HDFS.
On restore, the new tasks for the job (which potentially run on different machines now) can again pick up the state data from the distributed storage system.
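As a quick sketch of the user-facing side of this mechanism (the interval and checkpoint directory below are illustrative values):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// draw a consistent snapshot of all operator state every 10 seconds
env.enableCheckpointing(10000);
// keep the snapshots in a distributed file system, e.g. HDFS
env.setStateBackend(new FsStateBackend(&#34;hdfs:///flink/checkpoints&#34;));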
Figure 1
We can piggyback rescaling of stateful jobs on checkpointing, as shown in Figure 1B. First, a checkpoint is triggered and sent to a distributed storage system. Next, the job is restarted with a changed parallelism and can access a consistent snapshot of all previous state from the distributed storage. While this solves (i) the redistribution of a (ii) consistent state across machines, there is still one problem: without a clear 1:1 relationship between previous state and new parallel operator instances, how can we assign the state in a (iii) meaningful way?
We could again assign the state from previous map_1 and map_2 to the new map_1 and map_2. But this would leave map_3 with empty state. Depending on the type of state and concrete semantics of the job, this naive approach could lead to anything from inefficiency to incorrect results.
In the following section, we’ll explain how we solved the problem of efficient, meaningful state reassignment in Flink. Each of Flink state’s two flavours, operator state and keyed state, requires a different approach to state assignment.
Reassigning Operator State When Rescaling # First, we’ll discuss how state reassignment in rescaling works for operator state. A common real-world use case of operator state in Flink is to maintain the current offsets for Kafka partitions in Kafka sources. Each Kafka source instance would maintain &lt;PartitionID, Offset&gt; pairs – one pair for each Kafka partition that the source is reading – as operator state. How would we redistribute this operator state in case of rescaling? Ideally, we would like to reassign all &lt;PartitionID, Offset&gt; pairs from the checkpoint in round-robin fashion across all parallel operator instances after the rescaling.
As a user, we are aware of the “meaning” of Kafka partition offsets, and we know that we can treat them as independent, redistributable units of state. The problem of how we can share this domain-specific knowledge with Flink remains.
Figure 2A illustrates the previous interface for checkpointing operator state in Flink. On snapshot, each operator instance returned an object that represented its complete state. In the case of a Kafka source, this object was a list of partition offsets.
This snapshot object was then written to the distributed store. On restore, the object was read from distributed storage and passed to the operator instance as a parameter to the restore function.
This approach was problematic for rescaling: how could Flink decompose the operator state into meaningful, redistributable partitions? Even though the state of the Kafka source was actually always a list of partition offsets, the previously-returned state object was a black box to Flink and therefore could not be redistributed.
As a generalized approach to solving this black box problem, we slightly modified the checkpointing interface; the new interface is called ListCheckpointed. Figure 2B shows the new checkpointing interface, which returns and receives a list of state partitions. Introducing a list instead of a single object makes the meaningful partitioning of state explicit: each item in the list still remains a black box to Flink, but it is considered an atomic, independently redistributable part of the operator state.
Figure 2
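To give a feel for the interface, here is a small illustrative sketch (not the Kafka source itself) of a function whose state is a single counter: on snapshot it returns its state as a one-element list, and on restore it merges whatever list elements were assigned to it after a rescale.

public class CountingMapper&lt;T&gt; implements MapFunction&lt;T, T&gt;, ListCheckpointed&lt;Long&gt; {

  // the operator state of this parallel instance
  private long count;

  @Override
  public T map(T value) {
    count++;
    return value;
  }

  @Override
  public List&lt;Long&gt; snapshotState(long checkpointId, long timestamp) {
    // one atomic, redistributable unit of state (here just a single element)
    return Collections.singletonList(count);
  }

  @Override
  public void restoreState(List&lt;Long&gt; state) {
    // after rescaling, this instance may receive zero, one, or several units to merge
    count = 0;
    for (Long c : state) {
      count += c;
    }
  }
}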
Our approach provides a simple API with which implementing operators can encode domain-specific knowledge about how to partition and merge units of state. With our new checkpointing interface, the Kafka source makes individual partition offsets explicit, and state reassignment becomes as easy as splitting and merging lists.
public class FlinkKafkaConsumer&lt;T&gt; extends RichParallelSourceFunction&lt;T&gt; implements CheckpointedFunction {

  // ...

  private transient ListState&lt;Tuple2&lt;KafkaTopicPartition, Long&gt;&gt; offsetsOperatorState;

  @Override
  public void initializeState(FunctionInitializationContext context) throws Exception {

    OperatorStateStore stateStore = context.getOperatorStateStore();
    // register the state with the backend
    this.offsetsOperatorState = stateStore.getSerializableListState(&#34;kafka-offsets&#34;);

    // if the job was restarted, we set the restored offsets
    if (context.isRestored()) {
      for (Tuple2&lt;KafkaTopicPartition, Long&gt; kafkaOffset : offsetsOperatorState.get()) {
        // ... restore logic
      }
    }
  }

  @Override
  public void snapshotState(FunctionSnapshotContext context) throws Exception {

    this.offsetsOperatorState.clear();

    // write the partition offsets to the list of operator states
    for (Map.Entry&lt;KafkaTopicPartition, Long&gt; partition : this.subscribedPartitionOffsets.entrySet()) {
      this.offsetsOperatorState.add(Tuple2.of(partition.getKey(), partition.getValue()));
    }
  }

  // ...
}
Reassigning Keyed State When Rescaling # The second flavour of state in Flink is keyed state. In contrast to operator state, keyed state is scoped by key, where the key is extracted from each stream event.
To illustrate how keyed state differs from operator state, let’s use the following example. Assume we have a stream of events, where each event has the schema {customer_id:int, value:int}. We have already learned that we can use operator state to compute and emit the running sum of values for all customers.
Now assume we want to slightly modify our goal and compute a running sum of values for each individual customer_id. This is a use case for keyed state, as one aggregated state must be maintained for each unique key in the stream.
Note that keyed state is only available for keyed streams, which are created through the keyBy() operation in Flink. The keyBy() operation (i) specifies how to extract a key from each event and (ii) ensures that all events with the same key are always processed by the same parallel operator instance. As a result, all keyed state is transitively also bound to one parallel operator instance, because for each key, exactly one operator instance is responsible. This mapping from key to operator is deterministically computed through hash partitioning on the key.
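For illustration, assuming events arrive as Tuple2&lt;Integer, Integer&gt; pairs of (customer_id, value), a per-customer running sum along these lines can be sketched as:

DataStream&lt;Tuple2&lt;Integer, Integer&gt;&gt; events = ...; // (customer_id, value)

DataStream&lt;Tuple2&lt;Integer, Integer&gt;&gt; runningSums = events
    .keyBy(0)   // all events with the same customer_id go to the same parallel instance
    .sum(1);    // keyed state: one running sum per customer_id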
We can see that keyed state has one clear advantage over operator state when it comes to rescaling: we can easily figure out how to correctly split and redistribute the state across parallel operator instances. State reassignment simply follows the partitioning of the keyed stream. After rescaling, the state for each key must be assigned to the operator instance that is now responsible for that key, as determined by the hash partitioning of the keyed stream.
While this automatically solves the problem of logically remapping the state to sub-tasks after rescaling, there is one more practical problem left to solve: how can we efficiently transfer the state to the subtasks’ local backends?
When we’re not rescaling, each subtask can simply read the whole state as written to the checkpoint by a previous instance in one sequential read.
When rescaling, however, this is no longer possible – the state for each subtask is now potentially scattered across the files written by all subtasks (think about what happens if you change the parallelism in hash(key) mod parallelism). We have illustrated this problem in Figure 3A. In this example, we show how keys are shuffled when rescaling from parallelism 3 to 4 for a key space of 0 to 20, using the identity function as the hash function to keep it easy to follow.
A naive approach might be to read all the previous subtask state from the checkpoint in all sub-tasks and filter out the matching keys for each sub-task. While this approach can benefit from a sequential read pattern, each subtask potentially reads a large fraction of irrelevant state data, and the distributed file system receives a huge number of parallel read requests.
Another approach could be to build an index that tracks the location of the state for each key in the checkpoint. With this approach, all sub-tasks could locate and read the matching keys very selectively. This approach would avoid reading irrelevant data, but it has two major downsides. A materialized index for all keys, i.e. a key-to-read-offset mapping, can potentially grow very large. Furthermore, this approach can also introduce a huge amount of random I/O (when seeking to the data for individual keys, see Figure 3A), which typically entails very bad performance in distributed file systems.
Flink’s approach sits in between those two extremes by introducing key-groups as the atomic unit of state assignment. How does this work? The number of key-groups must be determined before the job is started and (currently) cannot be changed after the fact. As key-groups are the atomic unit of state assignment, this also means that the number of key-groups is the upper limit for parallelism. In a nutshell, key-groups give us a way to trade between flexibility in rescaling (by setting an upper limit for parallelism) and the maximum overhead involved in indexing and restoring the state.
We assign key-groups to subtasks as ranges. This makes the reads on restore not only sequential within each key-group, but often also across multiple key-groups. An additional benefit: this also keeps the metadata of key-group-to-subtask assignments very small. We do not maintain explicit lists of key-groups because it is sufficient to track the range boundaries.
We have illustrated rescaling from parallelism 3 to 4 with 10 key-groups in Figure 3B. As we can see, introducing key-groups and assigning them as ranges greatly improves the access pattern over the naive approach. Equations 2 and 3 in Figure 3B also detail how we compute key-groups and the range assignment.
Figure 3
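As a simplified sketch of the computations behind Equations 2 and 3 (Flink's actual implementation additionally applies a murmur hash to the key's hash code, but the structure is the same):

public class KeyGroupAssignmentSketch {

  // which key-group a key falls into; maxParallelism is the (fixed) number of key-groups
  static int keyGroupOf(Object key, int maxParallelism) {
    return (key.hashCode() &amp; 0x7FFFFFFF) % maxParallelism;
  }

  // which subtask is responsible for a key-group; key-groups map to subtasks as contiguous ranges
  static int subtaskFor(int keyGroup, int maxParallelism, int parallelism) {
    return keyGroup * parallelism / maxParallelism;
  }
}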
Wrapping Up # Thanks for staying with us, and we hope you now have a clear idea of how rescalable state works in Apache Flink and how to make use of rescaling in real-world scenarios.
Flink 1.3.0, which was released earlier this month, adds more tooling for state management and fault tolerance in Flink, including incremental checkpoints. And the community is exploring features such as…
• State replication
• State that isn’t bound to the lifecycle of a Flink job
• Automatic rescaling (with no savepoints required)
…for Flink 1.4.0 and beyond.
If you’d like to learn more, we recommend starting with the Apache Flink documentation.
This is an excerpt from a post that originally appeared on the data Artisans blog. If you&rsquo;d like to read the original post in its entirety, you can find it here (external link).
`}),e.add({id:220,href:"/2017/06/23/apache-flink-1.3.1-released/",title:"Apache Flink 1.3.1 Released",section:"Flink Blog",content:"The Apache Flink community released the first bugfix version of the Apache Flink 1.3 series.\nThis release includes 50 fixes and minor improvements for Flink 1.3.0. The list below includes a detailed list of all fixes.\nWe highly recommend all users to upgrade to Flink 1.3.1.\n&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.3.1&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.10&lt;/artifactId&gt; &lt;version&gt;1.3.1&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.10&lt;/artifactId&gt; &lt;version&gt;1.3.1&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.\nBug [FLINK-6492] - Unclosed DataOutputViewStream in GenericArraySerializerConfigSnapshot#write() [FLINK-6602] - Table source with defined time attributes allows empty string [FLINK-6652] - Problem with DelimitedInputFormat [FLINK-6659] - RocksDBMergeIteratorTest, SavepointITCase leave temporary directories behind [FLINK-6669] - [Build] Scala style check errror on Windows [FLINK-6685] - SafetyNetCloseableRegistry is closed prematurely in Task::triggerCheckpointBarrier [FLINK-6772] - Incorrect ordering of matched state events in Flink CEP [FLINK-6775] - StateDescriptor cannot be shared by multiple subtasks [FLINK-6780] - ExternalTableSource should add time attributes in the row type [FLINK-6783] - Wrongly extracted TypeInformations for WindowedStream::aggregate [FLINK-6797] - building docs fails with bundler 1.15 [FLINK-6801] - PojoSerializerConfigSnapshot cannot deal with missing Pojo fields [FLINK-6804] - Inconsistent state migration behaviour between different state backends [FLINK-6807] - Elasticsearch 5 connector artifact not published to maven [FLINK-6808] - Stream join fails when checkpointing is enabled [FLINK-6809] - side outputs documentation: wrong variable name in java example code [FLINK-6812] - Elasticsearch 5 release artifacts not published to Maven central [FLINK-6815] - Javadocs don&#39;t work anymore in Flink 1.4-SNAPSHOT [FLINK-6816] - Fix wrong usage of Scala string interpolation in Table API [FLINK-6833] - Race condition: Asynchronous checkpointing task can fail completed StreamTask [FLINK-6844] - TraversableSerializer should implement compatibility methods [FLINK-6848] - Extend the managed state docs with a Scala example [FLINK-6853] - Migrating from Flink 1.1 fails for FlinkCEP [FLINK-6869] - Scala serializers do not have the serialVersionUID specified [FLINK-6875] - Remote DataSet API job submission timing out [FLINK-6881] - Creating a table from a POJO and defining a time attribute fails [FLINK-6883] - Serializer for collection of Scala case classes are generated with different anonymous class names in 1.3 [FLINK-6886] - Fix Timestamp field can not be selected in event time case when toDataStream[T], `T` not a `Row` Type. 
[FLINK-6896] - Creating a table from a POJO and use table sink to output fail [FLINK-6899] - Wrong state array size in NestedMapsStateTable [FLINK-6914] - TrySerializer#ensureCompatibility causes StackOverflowException [FLINK-6915] - EnumValueSerializer broken [FLINK-6921] - EnumValueSerializer cannot properly handle appended enum values [FLINK-6922] - Enum(Value)SerializerConfigSnapshot uses Java serialization to store enum values [FLINK-6930] - Selecting window start / end on row-based Tumble/Slide window causes NPE [FLINK-6932] - Update the inaccessible Dataflow Model paper link [FLINK-6941] - Selecting window start / end on over window causes field not resolve exception [FLINK-6948] - EnumValueSerializer cannot handle removed enum values Improvement [FLINK-5354] - Split up Table API documentation into multiple pages [FLINK-6038] - Add deep links to Apache Bahir Flink streaming connector documentations [FLINK-6796] - Allow setting the user code class loader for AbstractStreamOperatorTestHarness [FLINK-6803] - Add test for PojoSerializer when Pojo changes [FLINK-6859] - StateCleaningCountTrigger should not delete timer [FLINK-6929] - Add documentation for Table API OVER windows [FLINK-6952] - Add link to Javadocs [FLINK-6748] - Table API / SQL Docs: Table API Page Test [FLINK-6830] - Add ITTests for savepoint migration from 1.3 [FLINK-6320] - Flakey JobManagerHAJobGraphRecoveryITCase [FLINK-6744] - Flaky ExecutionGraphSchedulingTest [FLINK-6913] - Instable StatefulJobSavepointMigrationITCase.testRestoreSavepoint "}),e.add({id:221,href:"/2017/06/01/apache-flink-1.3.0-release-announcement/",title:"Apache Flink 1.3.0 Release Announcement",section:"Flink Blog",content:`The Apache Flink community is pleased to announce the 1.3.0 release. Over the past 4 months, the Flink community has been working hard to resolve more than 680 issues. See the complete changelog for more detail.
This is the fourth major release in the 1.x.y series. It is API compatible with the other 1.x.y releases for APIs annotated with the @Public annotation.
Users can now expect Flink releases in a 4-month cycle. At the beginning of the 1.3 release cycle, the community decided to follow a strict time-based release model.
We encourage everyone to download the release and check out the documentation. Feedback through the Flink mailing lists is, as always, gladly encouraged!
You can find the binaries on the updated Downloads page. Some highlights of the release are listed below.
Large State Handling/Recovery # Incremental Checkpointing for RocksDB: It is now possible to checkpoint only the difference from the previous successful checkpoint, rather than checkpointing the entire application state. This speeds up checkpointing and saves disk space, because the individual checkpoints are smaller. (FLINK-5053).
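A sketch of how a job opts in (the boolean constructor argument enables incremental snapshots; the checkpoint path is illustrative):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// the second constructor argument enables incremental checkpoints
env.setStateBackend(new RocksDBStateBackend(&#34;hdfs:///flink/checkpoints&#34;, true));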
Asynchronous snapshots for heap-based state backends: The filesystem and memory state backends now also support asynchronous snapshots using a copy-on-write HashMap implementation. Asynchronous snapshotting makes Flink more resilient to slow storage systems and expensive serialization. The time an operator blocks on a snapshot is reduced to a minimum (FLINK-6048, FLINK-5715).
Allow upgrades to state serializers: Users can now upgrade serializers, while keeping their application state. One use case of this is upgrading custom serializers used for managed operator state/keyed state. Also, registration order for POJO types/Kryo types is now no longer fixed (Documentation, FLINK-6178).
Recover job state at the granularity of operator: Before Flink 1.3, operator state was bound to Flink’s internal &ldquo;Task&rdquo; representation. This made it hard to change a job’s topology while keeping its state around. With this change, users are allowed to do more topology changes (un-chain operators) by restoring state into logical operators instead of “Tasks” (FLINK-5892).
Fine-grained recovery (beta): Instead of restarting the complete ExecutionGraph in case of a task failure, Flink is now able to restart only the affected subgraph and thereby significantly decrease recovery time (FLINK-4256).
DataStream API # Side Outputs: This change allows users to have more than one output stream for an operator. Operator metadata, internal system information (debugging, performance etc.) or rejected/late elements are potential use-cases for this new API feature. The Window operator is now using this new feature for late window elements (Side Outputs Documentation, FLINK-4460).
Union Operator State: Flink 1.2.0 introduced broadcast state functionality, but this had not yet been exposed via a public API. Flink 1.3.0 provides the Union Operator State API for exposing broadcast operator state. The union state will send the entire state across all parallel instances to each instance on restore, giving each operator a full view of the state (FLINK-5991).
Per-Window State: Previously, the state that a WindowFunction or ProcessWindowFunction could access was scoped to the key of the window but not the window itself. With this new feature, users can keep window state independent of the key (FLINK-5929).
Deployment and Tooling # Flink HistoryServer: Flink’s HistoryServer now allows you to query the status and statistics of completed jobs that have been archived by a JobManager (FLINK-1579).
Watermark Monitoring in Web Front-end: For easier diagnosis of watermark issues, the Flink JobManager front-end now provides a new tab to track the watermark of each operator (FLINK-3427).
Datadog HTTP Metrics Reporter: Datadog is a widely-used metrics system, and Flink now offers a Datadog reporter that contacts the Datadog http endpoint directly (FLINK-6013).
Network Buffer Configuration: We finally got rid of the tedious network buffer configuration and replaced it with a more generic approach. First of all, you may now follow the idiom &ldquo;more is better&rdquo; without the latency penalty that could previously occur due to excessive buffering in incoming and outgoing channels. Secondly, instead of defining an absolute number of network buffers, we now use fractions of the available JVM memory (10% by default). This should cover more use cases by default and may also be tweaked by defining a minimum and maximum size.
→ See Configuring the Network Buffers in the Flink documentation.
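A sketch of what this looks like in flink-conf.yaml (the min/max values here are illustrative; see the linked documentation for the authoritative details):

taskmanager.network.memory.fraction: 0.1
taskmanager.network.memory.min: 67108864      # 64 MB
taskmanager.network.memory.max: 1073741824    # 1 GB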
Table API / SQL # Support for Retractions in Table API / SQL: As part of our endeavor to support continuous queries on Dynamic Tables, Retraction is an important building block that will enable a whole range of new applications which require updating previously-emitted results. Examples for such use cases are computation of early results for long-running windows, updates due to late arriving data, or maintaining constantly changing results similar to materialized views in relational database systems. Flink 1.3.0 supports retraction for non-windowed aggregates. Results with updates can be either converted into a DataStream or materialized to external data stores using TableSinks with upsert or retraction support.
Extended support for aggregations in Table API / SQL: With Flink 1.3.0, the Table API and SQL support many more types of aggregations, including
• GROUP BY window aggregations in SQL (via the TUMBLE, HOP, and SESSION window functions) for both batch and streaming.
• SQL OVER window aggregations (only for streaming).
• Non-windowed aggregations (in streaming with retractions).
• User-defined aggregation functions for custom aggregation logic.
External catalog support: The Table API &amp; SQL now allow registering external catalogs. Table API and SQL queries can then access table sources and their schemas from the external catalogs without registering those tables one by one.
→ See the Flink documentation for details about these features.
The Table API / SQL documentation is currently being reworked. The community plans to publish the updated docs in the week of June 5th. Connectors # ElasticSearch 5.x support: The ElasticSearch connectors have been restructured to have a common base module and specific modules for ES 1, 2 and 5, similar to how the Kafka connectors are organized. This will make fixes and future improvements available across all ES versions (FLINK-4988).
Allow rescaling the Kinesis Consumer: Flink 1.2.0 introduced rescalable state for DataStream programs. With Flink 1.3, the Kinesis Consumer also makes use of that engine feature (FLINK-4821).
Transparent shard discovery for Kinesis Consumer: The Kinesis consumer can now discover new shards without failing / restarting jobs when a resharding is happening (FLINK-4577).
Allow setting custom start positions for the Kafka consumer: With this change, you can instruct Flink’s Kafka consumer to start reading messages from a specific offset (FLINK-3123) or earliest / latest offset (FLINK-4280) without respecting committed offsets in Kafka.
Allow opt-out from offset committing for the Kafka consumer: By default, the Kafka consumer commits the offsets to the Kafka broker once a checkpoint has been completed. This change allows users to disable this mechanism (FLINK-3398).
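Putting both of these consumer features together, a configured consumer can be sketched as follows (topic, schema, and properties are placeholders):

FlinkKafkaConsumer010&lt;String&gt; consumer =
    new FlinkKafkaConsumer010&lt;&gt;(&#34;my-topic&#34;, new SimpleStringSchema(), properties);

// start from the earliest (or latest) offset, ignoring offsets committed in Kafka
consumer.setStartFromEarliest();
// consumer.setStartFromLatest();

// optionally disable committing offsets back to Kafka on completed checkpoints
consumer.setCommitOffsetsOnCheckpoints(false);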
CEP Library # The CEP library has been greatly enhanced and is now able to accommodate more use cases out-of-the-box (expressivity enhancements), make more efficient use of the available resources, and adjust to changing runtime conditions &ndash; all without breaking backwards compatibility of operator state.
Please note that the API of the CEP library has been updated with this release.
Below are some of the main features of the revamped CEP library:
Make CEP operators rescalable: Flink 1.2.0 introduced rescalable state for DataStream programs. With Flink 1.3, the CEP library also makes use of that engine feature (FLINK-5420).
New operators for the CEP library:
• Quantifiers (*,+,?) for the pattern API (FLINK-3318)
• Support for different continuity requirements (FLINK-6208)
• Support for iterative conditions (FLINK-6197)
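For example, a pattern using one of the new quantifiers together with a condition can be sketched like this (the Event type and the condition are illustrative):

Pattern&lt;Event, ?&gt; warnings = Pattern.&lt;Event&gt;begin(&#34;start&#34;)
    .where(new SimpleCondition&lt;Event&gt;() {
      @Override
      public boolean filter(Event event) {
        return event.getTemperature() &gt; 100;
      }
    })
    .oneOrMore()               // one of the new quantifiers (+)
    .within(Time.seconds(10));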
Gelly Library # Unified driver for running Gelly examples (FLINK-4949). PageRank algorithm for directed graphs (FLINK-4896). Add Circulant and Echo graph generators (FLINK-6393). Known Issues # There are two known issues in Flink 1.3.0, both of which will be addressed in the 1.3.1 release: FLINK-6783 (Wrongly extracted TypeInformations for WindowedStream::aggregate) and FLINK-6775 (StateDescriptor cannot be shared by multiple subtasks). List of Contributors # According to git shortlog, the following 103 people contributed to the 1.3.0 release. Thank you to all contributors!
Addison Higham, Alexey Diomin, Aljoscha Krettek, Andrea Sella, Andrey Melentyev, Anton Mushin, barcahead, biao.liub, Bowen Li, Chen Qin, Chico Sokol, David Anderson, Dawid Wysakowicz, DmytroShkvyra, Fabian Hueske, Fabian Wollert, fengyelei, Flavio Pompermaier, FlorianFan, Fokko Driesprong, Geoffrey Mon, godfreyhe, gosubpl, Greg Hogan, guowei.mgw, hamstah, Haohui Mai, Hequn Cheng, hequn.chq, heytitle, hongyuhong, Jamie Grier, Jark Wu, jingzhang, Jinkui Shi, Jin Mingjian, Joerg Schad, Joshua Griffith, Jürgen Thomann, kaibozhou, Kathleen Sharp, Ken Geis, kkloudas, Kurt Young, lincoln-lil, lingjinjiang, liuyuzhong7, Lorenz Buehmann, manuzhang, Marc Tremblay, Mauro Cortellazzi, Max Kuklinski, mengji.fy, Mike Dias, mtunique, Nico Kruber, Omar Erminy, Patrick Lucas, paul, phoenixjiangnan, rami-alisawi, Ramkrishna, Rick Cox, Robert Metzger, Rodrigo Bonifacio, rtudoran, Seth Wiesman, Shaoxuan Wang, shijinkui, shuai.xus, Shuyi Chen, spkavuly, Stefano Bortoli, Stefan Richter, Stephan Ewen, Stephen Gran, sunjincheng121, tedyu, Till Rohrmann, tonycox, Tony Wei, twalthr, Tzu-Li (Gordon) Tai, Ufuk Celebi, Ventura Del Monte, Vijay Srinivasaraghavan, WangTaoTheTonic, wenlong.lwl, xccui, xiaogang.sxg, Xpray, zcb, zentol, zhangminglei, Zhenghua Gao, Zhijiang, Zhuoluo Yang, zjureel, Zohar Mizrahi, 士远, 槿瑜, 淘江, 金竹
`}),e.add({id:222,href:"/2017/05/16/introducing-docker-images-for-apache-flink/",title:"Introducing Docker Images for Apache Flink",section:"Flink Blog",content:`For some time, the Apache Flink community has provided scripts to build a Docker image to run Flink. Now, starting with version 1.2.1, Flink will have a Docker image on the Docker Hub. This image is maintained by the Flink community and curated by the Docker team to ensure it meets the quality standards for container images of the Docker community.
A community-maintained way to run Apache Flink on Docker and other container runtimes and orchestrators is part of the ongoing effort by the Flink community to make Flink a first-class citizen of the container world.
If you want to use the Docker image today you can get the latest version by running:
docker pull flink
And to run a local Flink cluster with one TaskManager and the Web UI exposed on port 8081, run:
docker run -t -p 8081:8081 flink local
With this image there are various ways to start a Flink cluster, both locally and in a distributed environment. Take a look at the documentation that shows how to run a Flink cluster with multiple TaskManagers locally using Docker Compose or across multiple machines using Docker Swarm. You can also use the examples as a reference to create configurations for other platforms like Mesos and Kubernetes.
While this announcement is an important milestone, it’s just the first step toward helping users run containerized Flink in production. There are improvements to be made in Flink itself, and we will continue to improve these Docker images as well as the documentation and examples surrounding them.
This is of course a team effort, so any contribution is welcome. The docker-flink GitHub organization hosts the source files to generate the images and the documentation that is presented alongside the images on Docker Hub.
Disclaimer: The docker images are provided as a community project by individuals on a best-effort basis. They are not official releases by the Apache Flink PMC.
`}),e.add({id:223,href:"/2017/04/26/apache-flink-1.2.1-released/",title:"Apache Flink 1.2.1 Released",section:"Flink Blog",content:`The Apache Flink community released the first bugfix version of the Apache Flink 1.2 series.
This release includes many critical fixes for Flink 1.2.0. The list below includes a detailed list of all fixes.
We highly recommend all users to upgrade to Flink 1.2.1.
Please note that there are two unresolved major issues in Flink 1.2.1 and 1.2.0:
• FLINK-6353 - Restoring using CheckpointedRestoring does not work from 1.2 to 1.2
• FLINK-6188 - Some setParallelism() methods can&rsquo;t cope with default parallelism
&lt;dependency&gt;
  &lt;groupId&gt;org.apache.flink&lt;/groupId&gt;
  &lt;artifactId&gt;flink-java&lt;/artifactId&gt;
  &lt;version&gt;1.2.1&lt;/version&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
  &lt;groupId&gt;org.apache.flink&lt;/groupId&gt;
  &lt;artifactId&gt;flink-streaming-java_2.10&lt;/artifactId&gt;
  &lt;version&gt;1.2.1&lt;/version&gt;
&lt;/dependency&gt;
&lt;dependency&gt;
  &lt;groupId&gt;org.apache.flink&lt;/groupId&gt;
  &lt;artifactId&gt;flink-clients_2.10&lt;/artifactId&gt;
  &lt;version&gt;1.2.1&lt;/version&gt;
&lt;/dependency&gt;
You can find the binaries on the updated Downloads page.
Release Notes - Flink - Version 1.2.1 Sub-task [FLINK-5546] - java.io.tmpdir setted as project build directory in surefire plugin [FLINK-5640] - configure the explicit Unit Test file suffix [FLINK-5723] - Use &quot;Used&quot; instead of &quot;Initial&quot; to make taskmanager tag more readable [FLINK-5825] - In yarn mode, a small pic can not be loaded Bug [FLINK-4813] - Having flink-test-utils as a dependency outside Flink fails the build [FLINK-4848] - keystoreFilePath should be checked against null in SSLUtils#createSSLServerContext [FLINK-5628] - CheckpointStatsTracker implements Serializable but isn&#39;t [FLINK-5644] - Task#lastCheckpointSize metric broken [FLINK-5650] - Flink-python tests executing cost too long time [FLINK-5652] - Memory leak in AsyncDataStream [FLINK-5669] - flink-streaming-contrib DataStreamUtils.collect in local environment mode fails when offline [FLINK-5678] - User-defined TableFunctions do not support all types of parameters [FLINK-5699] - Cancel with savepoint fails with a NPE if savepoint target directory not set [FLINK-5701] - FlinkKafkaProducer should check asyncException on checkpoints [FLINK-5708] - we should remove duplicated configuration options [FLINK-5732] - Java quick start mvn command line is incorrect [FLINK-5749] - unset HADOOP_HOME and HADOOP_CONF_DIR to avoid env in build machine failing the UT and IT [FLINK-5751] - 404 in documentation [FLINK-5771] - DelimitedInputFormat does not correctly handle multi-byte delimiters [FLINK-5773] - Cannot cast scala.util.Failure to org.apache.flink.runtime.messages.Acknowledge [FLINK-5806] - TaskExecutionState toString format have wrong key [FLINK-5814] - flink-dist creates wrong symlink when not used with cleaned before [FLINK-5817] - Fix test concurrent execution failure by test dir conflicts. 
[FLINK-5828] - BlobServer create cache dir has concurrency safety problem [FLINK-5885] - Java code snippet instead of scala in documentation [FLINK-5907] - RowCsvInputFormat bug on parsing tsv [FLINK-5934] - Scheduler in ExecutionGraph null if failure happens in ExecutionGraph.restoreLatestCheckpointedState [FLINK-5940] - ZooKeeperCompletedCheckpointStore cannot handle broken state handles [FLINK-5942] - Harden ZooKeeperStateHandleStore to deal with corrupted data [FLINK-5945] - Close function in OuterJoinOperatorBase#executeOnCollections [FLINK-5949] - Flink on YARN checks for Kerberos credentials for non-Kerberos authentication methods [FLINK-5962] - Cancel checkpoint canceller tasks in CheckpointCoordinator [FLINK-5965] - Typo on DropWizard wrappers [FLINK-5972] - Don&#39;t allow shrinking merging windows [FLINK-5985] - Flink treats every task as stateful (making topology changes impossible) [FLINK-6000] - Can not start HA cluster with start-cluster.sh [FLINK-6001] - NPE on TumblingEventTimeWindows with ContinuousEventTimeTrigger and allowedLateness [FLINK-6002] - Documentation: &#39;MacOS X&#39; under &#39;Download and Start Flink&#39; in Quickstart page is not rendered correctly [FLINK-6006] - Kafka Consumer can lose state if queried partition list is incomplete on restore [FLINK-6025] - User code ClassLoader not used when KryoSerializer fallbacks to serialization for copying [FLINK-6051] - Wrong metric scope names in documentation [FLINK-6084] - Cassandra connector does not declare all dependencies [FLINK-6133] - fix build status in README.md [FLINK-6170] - Some checkpoint metrics rely on latest stat snapshot [FLINK-6181] - Zookeeper scripts use invalid regex [FLINK-6182] - Fix possible NPE in SourceStreamTask [FLINK-6183] - TaskMetricGroup may not be cleanup when Task.run() is never called or exits early [FLINK-6184] - Buffer metrics can cause NPE [FLINK-6203] - DataSet Transformations [FLINK-6207] - Duplicate type serializers for async snapshots of CopyOnWriteStateTable [FLINK-6308] - Task managers are not attaching to job manager on macos Improvement [FLINK-4326] - Flink start-up scripts should optionally start services on the foreground [FLINK-5217] - Deprecated interface Checkpointed make clear suggestion [FLINK-5331] - PythonPlanBinderTest idling extremely long [FLINK-5581] - Improve Kerberos security related documentation [FLINK-5639] - Clarify License implications of RabbitMQ Connector [FLINK-5680] - Document env.ssh.opts [FLINK-5681] - Make ReaperThread for SafetyNetCloseableRegistry a singleton [FLINK-5702] - Kafka Producer docs should warn if using setLogFailuresOnly, at-least-once is compromised [FLINK-5705] - webmonitor&#39;s request/response use UTF-8 explicitly [FLINK-5713] - Protect against NPE in WindowOperator window cleanup [FLINK-5721] - Add FoldingState to State Documentation [FLINK-5800] - Make sure that the CheckpointStreamFactory is instantiated once per operator only [FLINK-5805] - improve docs for ProcessFunction [FLINK-5807] - improved wording for doc home page [FLINK-5837] - improve readability of the queryable state docs [FLINK-5876] - Mention Scala type fallacies for queryable state client serializers [FLINK-5877] - Fix Scala snippet in Async I/O API doc [FLINK-5894] - HA docs are misleading re: state backends [FLINK-5895] - Reduce logging aggressiveness of FileSystemSafetyNet [FLINK-5938] - Replace ExecutionContext by Executor in Scheduler [FLINK-6212] - Missing reference to flink-avro dependency New Feature [FLINK-6139] - Documentation for building / 
preparing Flink for MapR Task [FLINK-2883] - Add documentation to forbid key-modifying ReduceFunction [FLINK-3903] - Homebrew Installation `}),e.add({id:224,href:"/2017/03/30/continuous-queries-on-dynamic-tables/",title:"Continuous Queries on Dynamic Tables",section:"Flink Blog",content:` Analyzing Data Streams with SQL # More and more companies are adopting stream processing and are migrating existing batch applications to streaming or implementing streaming solutions for new use cases. Many of those applications focus on analyzing streaming data. The data streams that are analyzed come from a wide variety of sources such as database transactions, clicks, sensor measurements, or IoT devices.
Apache Flink is very well suited to power streaming analytics applications because it provides support for event-time semantics, stateful exactly-once processing, and achieves high throughput and low latency at the same time. Due to these features, Flink is able to compute exact and deterministic results from high-volume input streams in near real-time while providing exactly-once semantics in case of failures.
Flink&rsquo;s core API for stream processing, the DataStream API, is very expressive and provides primitives for many common operations. Among other features, it offers highly customizable windowing logic, different state primitives with varying performance characteristics, hooks to register and react to timers, and tooling for efficient asynchronous requests to external systems. On the other hand, many stream analytics applications follow similar patterns and do not require the level of expressiveness provided by the DataStream API. They could be expressed in a more natural and concise way using a domain-specific language. As we all know, SQL is the de-facto standard for data analytics. For streaming analytics, SQL would enable a larger pool of people to specify applications on data streams in less time. However, no open source stream processor offers decent SQL support yet.
Why is SQL on Streams a Big Deal? # SQL is the most widely used language for data analytics for many good reasons:
SQL is declarative: You specify what you want but not how to compute it. SQL can be effectively optimized: An optimizer figures out an efficient plan to compute your result. SQL can be efficiently evaluated: The processing engine knows exactly what to compute and how to do so efficiently. And finally, everybody knows and many tools speak SQL. So being able to process and analyze data streams with SQL makes stream processing technology available to many more users. Moreover, it significantly reduces the time and effort to define efficient stream analytics applications due to SQL’s declarative nature and potential to be automatically optimized.
However, SQL (and the relational data model and algebra) were not designed with streaming data in mind. Relations are (multi-)sets and not infinite sequences of tuples. When executing a SQL query, conventional database systems and query engines read and process a data set, which is completely available, and produce a fixed-size result. In contrast, data streams continuously provide new records such that data arrives over time. Hence, streaming queries have to continuously process the arriving data and never “complete”.
That being said, processing streams with SQL is not impossible. Some relational database systems feature eager maintenance of materialized views, which is similar to evaluating SQL queries on streams of data. A materialized view is defined as a SQL query just like a regular (virtual) view. However, the result of the query is actually stored (or materialized) in memory or on disk such that the view does not need to be computed on-the-fly when it is queried. To prevent a materialized view from becoming stale, the database system needs to update the view whenever its base relations (the tables referenced in its definition query) are modified. If we consider the changes on the view’s base relations as a stream of modifications (or as a changelog stream), it becomes obvious that materialized view maintenance and SQL on streams are closely related.
Flink’s Relational APIs: Table API and SQL # Since version 1.1.0 (released in August 2016), Flink features two semantically equivalent relational APIs, the language-embedded Table API (for Java and Scala) and standard SQL. Both APIs are designed as unified APIs for online streaming and historic batch data. This means that,
a query produces exactly the same result regardless of whether its input is static batch data or streaming data.
Unified APIs for stream and batch processing are important for several reasons. First of all, users only need to learn a single API to process static and streaming data. Moreover, the same query can be used to analyze batch and streaming data, which makes it possible to jointly analyze historic and live data in the same query. We haven’t achieved complete unification of batch and streaming semantics yet, but the community is making very good progress towards this goal.
The following code snippet shows two equivalent Table API and SQL queries that compute a simple windowed aggregate on a stream of temperature sensor measurements. The syntax of the SQL query is based on Apache Calcite’s syntax for grouped window functions and will be supported in version 1.3.0 of Flink.
val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
val tEnv = TableEnvironment.getTableEnvironment(env)
// define a table source to read sensor data (sensorId, time, room, temp)
val sensorTable = ??? // can be a CSV file, Kafka topic, database, or ...
// register the table source
tEnv.registerTableSource("sensors", sensorTable)
// Table API
val tapiResult: Table = tEnv.scan("sensors")    // scan sensors table
  .window(Tumble over 1.hour on 'rowtime as 'w) // define 1-hour window
  .groupBy('w, 'room)                           // group by window and room
  .select('room, 'w.end, 'temp.avg as 'avgTemp) // compute average temperature
// SQL
val sqlResult: Table = tEnv.sql("""
  |SELECT room, TUMBLE_END(rowtime, INTERVAL '1' HOUR), AVG(temp) AS avgTemp
  |FROM sensors
  |GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR), room
  |""".stripMargin)
As you can see, both APIs are tightly integrated with each other and Flink’s primary DataStream and DataSet APIs. A Table can be generated from and converted to a DataSet or DataStream. Hence, it is easily possible to scan an external table source such as a database or Parquet file, do some preprocessing with a Table API query, convert the result into a DataSet and run a Gelly graph algorithm on it. The queries defined in the example above can also be used to process batch data by changing the execution environment.
Internally, both APIs are translated into the same logical representation, optimized by Apache Calcite, and compiled into DataStream or DataSet programs. In fact, the optimization and translation process does not know whether a query was defined using the Table API or SQL. If you are curious about the details of the optimization process, have a look at a blog post that we published last year. Since the Table API and SQL are equivalent in terms of semantics and only differ in syntax, we always refer to both APIs when we talk about SQL in this post.
In its current state (version 1.2.0), Flink’s relational APIs support a limited set of relational operators on data streams, including projections, filters, and windowed aggregates. All supported operators have in common that they never update result records that have been emitted. This is clearly not an issue for record-at-a-time operators such as projection and filter. However, it affects operators that collect and process multiple records, such as windowed aggregates. Since emitted results cannot be updated, input records that arrive after a result has been emitted have to be discarded in Flink 1.2.0.
The limitations of the current version are acceptable for applications that emit data to storage systems such as Kafka topics, message queues, or files which only support append operations and no updates or deletes. Common use cases that follow this pattern are for example continuous ETL and stream archiving applications that persist streams to an archive or prepare data for further online (streaming) analysis or later offline analysis. Since it is not possible to update previously emitted results, these kinds of applications have to make sure that the emitted results are correct and will not need to be corrected in the future. The following figure illustrates such applications.
While queries that only support appends are useful for some kinds of applications and certain types of storage systems, there are many streaming analytics use cases that need to update results. This includes streaming applications that cannot discard late-arriving records, need early results for (long-running) windowed aggregates, or require non-windowed aggregates. In each of these cases, previously emitted result records need to be updated. Result-updating queries often materialize their result to an external database or key-value store in order to make it accessible and queryable for external applications. Applications that implement this pattern include dashboards, reporting applications, and other applications that require timely access to continuously updated results. The following figure illustrates these kinds of applications.
Continuous Queries on Dynamic Tables # Support for queries that update previously emitted results is the next big step for Flink’s relational APIs. This feature is so important because it vastly increases the scope of the APIs and the range of supported use cases. Moreover, many of the newly supported use cases can be challenging to implement using the DataStream API.
So when adding support for result-updating queries, we must of course preserve the unified semantics for stream and batch inputs. We achieve this with the concept of Dynamic Tables. A dynamic table is a table that is continuously updated and can be queried like a regular, static table. However, in contrast to a query on a batch table, which terminates and returns a static table as its result, a query on a dynamic table runs continuously and produces a table that is continuously updated depending on the modifications of the input table. Hence, the resulting table is a dynamic table as well. This concept is very similar to the materialized view maintenance we discussed before.
Assuming we can run queries on dynamic tables which produce new dynamic tables, the next question is, How do streams and dynamic tables relate to each other? The answer is that streams can be converted into dynamic tables and dynamic tables can be converted into streams. The following figure shows the conceptual model of processing a relational query on a stream.
First, the stream is converted into a dynamic table. The dynamic table is queried with a continuous query, which produces a new dynamic table. Finally, the resulting table is converted back into a stream. It is important to note that this is only the logical model and does not imply how the query is actually executed. In fact, a continuous query is internally translated into a conventional DataStream program.
In the following, we describe the different steps of this model:
Defining a dynamic table on a stream, Querying a dynamic table, and Emitting a dynamic table. Defining a Dynamic Table on a Stream # The first step of evaluating a SQL query on a dynamic table is to define a dynamic table on a stream. This means we have to specify how the records of a stream modify the dynamic table. The stream must carry records with a schema that is mapped to the relational schema of the table. There are two modes to define a dynamic table on a stream: Append Mode and Update Mode.
In append mode each stream record is an insert modification to the dynamic table. Hence, all records of a stream are appended to the dynamic table such that it is ever-growing and infinite in size. The following figure illustrates the append mode.
In update mode a stream record can represent an insert, update, or delete modification on the dynamic table (append mode is in fact a special case of update mode). When defining a dynamic table on a stream via update mode, we can specify a unique key attribute on the table. In that case, update and delete operations are performed with respect to the key attribute. The update mode is visualized in the following figure.
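To make the two modes concrete, here is a small sketch (not part of the original post) that defines a dynamic table on the sensor stream from the earlier example in append mode. The fromDataStream conversion, the registerTable call, and the field names are assumptions about the Table API of this Flink version, so treat it as illustrative only:
// every stream record becomes an insert into the (ever-growing) dynamic table
val sensorStream: DataStream[(Int, Long, String, Double)] = ??? // (sensorId, time, room, temp)
val appendedSensors: Table = tEnv.fromDataStream(sensorStream, 'sensorId, 'time, 'room, 'temp)
// register it so that it can be referenced by Table API and SQL queries
tEnv.registerTable("appendedSensors", appendedSensors)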
Querying a Dynamic Table # Once we have defined a dynamic table, we can run a query on it. Since dynamic tables change over time, we have to define what it means to query a dynamic table. Let’s imagine we take a snapshot of a dynamic table at a specific point in time. This snapshot can be treated as a regular static batch table. We denote a snapshot of a dynamic table A at a point t as A[t]. The snapshot can be queried with any SQL query. The query produces a regular static table as result. We denote the result of a query q on a dynamic table A at time t as q(A[t]). If we repeatedly compute the result of a query on snapshots of a dynamic table for progressing points in time, we obtain many static result tables which are changing over time and effectively constitute a dynamic table. We define the semantics of a query on a dynamic table as follows.
A query q on a dynamic table A produces a dynamic table R, which is at each point in time t equivalent to the result of applying q on A[t], i.e., R[t] = q(A[t]). This definition implies that running the same query q on a batch table and on a streaming table produces the same result. In the following, we show two examples to illustrate the semantics of queries on dynamic tables.
In the figure below, we see a dynamic input table A on the left side, which is defined in append mode. At time t = 8, A consists of six rows (colored in blue). At time t = 9 and t = 12, one row is appended to A (visualized in green and orange, respectively). We run a simple query on table A which is shown in the center of the figure. The query groups by attribute k and counts the records per group. On the right hand side we see the result of query q at time t = 8 (blue), t = 9 (green), and t = 12 (orange). At each point in time t, the result table is equivalent to a batch query on the dynamic table A at time t.
The query in this example is a simple grouped (but not windowed) aggregation query. Hence, the size of the result table depends on the number of distinct grouping keys of the input table. Moreover, it is worth noting that the query continuously updates result rows that it had previously emitted instead of merely adding new rows.
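For reference, the non-windowed query q of this example could be written with the SQL API shown earlier roughly as follows. This is a sketch of the semantics only: the table and column names (A, k) are taken from the figure, and, as discussed above, Flink 1.2.0 cannot yet execute such a result-updating query on a stream:
val q: Table = tEnv.sql("""
  |SELECT k, COUNT(*) AS cnt
  |FROM A
  |GROUP BY k
  |""".stripMargin)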
The second example shows a similar query which differs in one important aspect. In addition to grouping on the key attribute k, the query also groups records into tumbling windows of five seconds, which means that it computes a count for each value of k every five seconds. Again, we use Calcite’s group window functions to specify this query. On the left side of the figure we see the input table A and how it changes over time in append mode. On the right we see the result table and how it evolves over time.
In contrast to the result of the first example, the result table grows over time, i.e., every five seconds new result rows are computed (provided that the input table received new records in the last five seconds). While the non-windowed query (mostly) updates rows of the result table, the windowed aggregation query only appends new rows to the result table.
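The windowed variant of the query could be expressed with Calcite’s group window functions, analogous to the windowed aggregate at the beginning of this post. Again a sketch, assuming that table A carries a rowtime attribute:
val qWindowed: Table = tEnv.sql("""
  |SELECT k, TUMBLE_END(rowtime, INTERVAL '5' SECOND) AS wEnd, COUNT(*) AS cnt
  |FROM A
  |GROUP BY TUMBLE(rowtime, INTERVAL '5' SECOND), k
  |""".stripMargin)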
Although this blog post focuses on the semantics of SQL queries on dynamic tables and not on how to efficiently process such a query, we’d like to point out that it is not possible to compute the complete result of a query from scratch whenever an input table is updated. Instead, the query is compiled into a streaming program that continuously updates its result based on the changes to its input. This implies that not all valid SQL queries are supported, but only those that can be continuously, incrementally, and efficiently computed. We plan to discuss the details of evaluating SQL queries on dynamic tables in a follow-up blog post.
Emitting a Dynamic Table # Querying a dynamic table yields another dynamic table, which represents the query’s results. Depending on the query and its input tables, the result table is continuously modified by insert, update, and delete changes just like a regular database table. It might be a table with a single row, which is constantly updated, an insert-only table without update modifications, or anything in between.
Traditional database systems use logs to rebuild tables in case of failures and for replication. There are different logging techniques, such as UNDO, REDO, and UNDO/REDO logging. In a nutshell, UNDO logs record the previous value of a modified element to revert incomplete transactions, REDO logs record the new value of a modified element to redo lost changes of completed transactions, and UNDO/REDO logs record the old and the new value of a changed element to undo incomplete transactions and redo lost changes of completed transactions. Based on the principles of these logging techniques, a dynamic table can be converted into two types of changelog streams, a REDO Stream and a REDO+UNDO Stream.
A dynamic table is converted into a redo+undo stream by converting the modifications on the table into stream messages. An insert modification is emitted as an insert message with the new row, a delete modification is emitted as a delete message with the old row, and an update modification is emitted as a delete message with the old row and an insert message with the new row. This behavior is illustrated in the following figure.
The left side shows a dynamic table that is maintained in append mode and serves as input to the query in the center. The result of the query is converted into a redo+undo stream, which is shown at the bottom. The first record (1, A) of the input table results in a new record in the result table and hence in an insert message +(A, 1) to the stream. The second input record with k = ‘A’, (4, A), produces an update of the (A, 1) record in the result table and hence yields a delete message -(A, 1) and an insert message +(A, 2). All downstream operators or data sinks need to be able to correctly handle both types of messages.
A dynamic table can be converted into a redo stream in two cases: either it is an append-only table (i.e., it only has insert modifications) or it has a unique key attribute. Each insert modification on the dynamic table results in an insert message with the new row to the redo stream. Due to the restriction of redo streams, only tables with unique keys can have update and delete modifications. If a key is removed from the keyed dynamic table, either because a row is deleted or because the key attribute of a row was modified, a delete message with the removed key is emitted to the redo stream. An update modification yields an update message with the updated, i.e., new, row. Since delete and update modifications are defined with respect to the unique key, the downstream operators need to be able to access previous values by key. The figure below shows how the result table of the same query as above is converted into a redo stream.
The row (1, A) which yields an insert into the dynamic table results in the +(A, 1) insert message. The row (4, A) which produces an update yields the *(A, 2) update message.
Common use cases for redo streams are to write the result of a query to an append-only storage system, like rolling files or a Kafka topic, or to a data store with keyed access, such as Cassandra, a relational DBMS, or a compacted Kafka topic. It is also possible to materialize a dynamic table as keyed state inside of the streaming application that evaluates the continuous query and make it queryable from external systems. With this design Flink itself maintains the result of a continuous SQL query on a stream and serves key lookups on the result table, for instance from a dashboard application.
What will Change When Switching to Dynamic Tables? # In version 1.2, all streaming operators of Flink’s relational APIs, like filter, project, and group window aggregates, only emit new rows and are not capable of updating previously emitted results. In contrast, dynamic tables are able to handle update and delete modifications. Now you might ask yourself: How does the processing model of the current version relate to the new dynamic table model? Will the semantics of the APIs completely change, and do we need to reimplement the APIs from scratch to achieve the desired semantics?
The answer to all these questions is simple. The current processing model is a subset of the dynamic table model. Using the terminology we introduced in this post, the current model converts a stream into a dynamic table in append mode, i.e., an infinitely growing table. Since all operators only accept insert changes and produce insert changes on their result table (i.e., emit new rows), all supported queries result in dynamic append tables, which are converted back into DataStreams using the redo model for append-only tables. Consequently, the semantics of the current model are completely covered and preserved by the new dynamic table model.
Conclusion and Outlook # Flink’s relational APIs are great for implementing stream analytics applications in no time and are already used in several production settings. In this blog post we discussed the future of the Table API and SQL. This effort will make Flink and stream processing accessible to more people. Moreover, the unified semantics for querying historic and real-time data, as well as the concept of querying and maintaining dynamic tables, will enable and significantly ease the implementation of many exciting use cases and applications. As this post focused on the semantics of relational queries on streams and dynamic tables, we did not discuss the details of how a query will be executed, which include the internal implementation of retractions, handling of late events, support for early results, and bounding space requirements. We plan to publish a follow-up blog post on this topic at a later point in time.
In recent months, many members of the Flink community have been discussing and contributing to the relational APIs. We made great progress so far. While most work has focused on processing streams in append mode, the next steps on the agenda are to work on dynamic tables to support queries that update their results. If you are excited about the idea of processing streams with SQL and would like to contribute to this effort, please give feedback, join the discussions on the mailing list, or grab a JIRA issue to work on.
`}),e.add({id:225,href:"/2017/03/29/from-streams-to-tables-and-back-again-an-update-on-flinks-table-sql-api/",title:"From Streams to Tables and Back Again: An Update on Flink's Table & SQL API",section:"Flink Blog",content:`Stream processing can deliver a lot of value. Many organizations have recognized the benefit of managing large volumes of data in real-time, reacting quickly to trends, and providing customers with live services at scale. Streaming applications with well-defined business logic can deliver a competitive advantage.
Flink’s DataStream abstraction is a powerful API which lets you flexibly define both basic and complex streaming pipelines. Additionally, it offers low-level operations such as Async IO and ProcessFunctions. However, many users do not need such a deep level of flexibility. They need an API that quickly solves 80% of their use cases, where simple tasks can be defined using little code.
To deliver the power of stream processing to a broader set of users, the Apache Flink community is developing APIs that provide simpler abstractions and more concise syntax so that users can focus on their business logic instead of advanced streaming concepts. Along with other APIs (such as CEP for complex event processing on streams), Flink offers a relational API that aims to unify stream and batch processing: the Table & SQL API, often referred to as the Table API.
Recently, contributors working for companies such as Alibaba, Huawei, data Artisans, and more decided to further develop the Table API. Over the past year, the Table API has been rewritten entirely. Since Flink 1.1, its core has been based on Apache Calcite, which parses SQL and optimizes all relational queries. Today, the Table API can address a wide range of use cases in both batch and stream environments with unified semantics.
This blog post summarizes the current status of Flink’s Table API and showcases some of the recently-added features in Apache Flink. Among the features presented here are the unified access to batch and streaming data, data transformation, and window operators. The following paragraphs are not only supposed to give you a general overview of the Table API, but also to illustrate the potential of relational APIs in the future.
Because the Table API is built on top of Flink’s core APIs, DataStreams and DataSets can be converted to a Table and vice-versa without much overhead. Hereafter, we show how to create tables from different sources and specify programs that can be executed locally or in a distributed setting. In this post, we will use the Scala version of the Table API, but there is also a Java version as well as a SQL API with an equivalent set of features.
Data Transformation and ETL # A common task in every data processing pipeline is importing data from one or multiple systems, applying some transformations to it, and then exporting the data to another system. The Table API can help manage these recurring tasks. For reading data, the API provides a set of ready-to-use TableSources such as a CsvTableSource and KafkaTableSource; however, it also allows the implementation of custom TableSources that can hide configuration specifics (e.g. watermark generation) from users who are less familiar with streaming concepts.
Let’s assume we have a CSV file that stores customer information. The values are delimited by a “|”-character and contain a customer identifier, name, timestamp of the last update, and preferences encoded in a comma-separated key-value string:
42|Bob Smith|2016-07-23 16:10:11|color=12,length=200,size=200 The following example illustrates how to read a CSV file and perform some data cleansing before converting it to a regular DataStream program.
// set up execution environment
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tEnv = TableEnvironment.getTableEnvironment(env)
// configure table source
val customerSource = CsvTableSource.builder()
  .path("/path/to/customer_data.csv")
  .ignoreFirstLine()
  .fieldDelimiter("|")
  .field("id", Types.LONG)
  .field("name", Types.STRING)
  .field("last_update", Types.TIMESTAMP)
  .field("prefs", Types.STRING)
  .build()
// name your table source
tEnv.registerTableSource("customers", customerSource)
// define your table program
val table = tEnv
  .scan("customers")
  .filter('name.isNotNull && 'last_update > "2016-01-01 00:00:00".toTimestamp)
  .select('id, 'name.lowerCase(), 'prefs)
// convert it to a data stream
val ds = table.toDataStream[Row]
ds.print()
env.execute()
The Table API comes with a large set of built-in functions that make it easy to specify business logic using a language integrated query (LINQ) syntax. In the example above, we filter out customers with invalid names and only select those that updated their preferences recently. We convert names to lowercase for normalization. For debugging purposes, we convert the table into a DataStream and print it.
The CsvTableSource supports both batch and stream environments. To execute the program above in a batch application, all you have to do is replace the StreamExecutionEnvironment with an ExecutionEnvironment and change the output conversion from DataStream to DataSet. The Table API program itself doesn’t change.
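For illustration, a sketch of the batch variant of the program above could look as follows (assuming the same imports and the customerSource defined earlier); only the environments and the final conversion change:
// batch execution environment instead of the streaming one
val batchEnv = ExecutionEnvironment.getExecutionEnvironment
val batchTableEnv = TableEnvironment.getTableEnvironment(batchEnv)
batchTableEnv.registerTableSource("customers", customerSource)
// the table program itself is identical to the streaming version
val cleansed = batchTableEnv
  .scan("customers")
  .filter('name.isNotNull && 'last_update > "2016-01-01 00:00:00".toTimestamp)
  .select('id, 'name.lowerCase(), 'prefs)
  .toDataSet[Row]
cleansed.print()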
In the example, we converted the table program to a data stream of Row objects. However, we are not limited to row data types. The Table API supports all types from the underlying APIs such as Java and Scala Tuples, Case Classes, POJOs, or generic types that are serialized using Kryo. Let’s assume that we want to have a regular object (POJO) with the following format instead of generic rows:
class Customer { var id: Int = _ var name: String = _ var update: Long = _ var prefs: java.util.Properties = _ } We can use the following table program to convert the CSV file into Customer objects. Flink takes care of creating objects and mapping fields for us.
val ds = tEnv
  .scan("customers")
  .select('id, 'name, 'last_update as 'update, parseProperties('prefs) as 'prefs)
  .toDataStream[Customer]
You might have noticed that the query above uses a function to parse the preferences field. Even though Flink’s Table API is shipped with a large set of built-in functions, it is often necessary to define custom user-defined scalar functions. In the above example we use a user-defined function parseProperties. The following code snippet shows how easily we can implement a scalar function.
object parseProperties extends ScalarFunction {
  def eval(str: String): Properties = {
    val props = new Properties()
    str
      .split(",")
      .map(_.split("="))
      .foreach(split => props.setProperty(split(0), split(1)))
    props
  }
}
Scalar functions can be used to deserialize, extract, or convert values (and more). By overriding the open() method we can even have access to runtime information such as distributed cached files or metrics. The open() method is only called once during the task’s lifecycle at runtime.
Unified Windowing for Static and Streaming Data # Another very common task, especially when working with continuous data, is the definition of windows to split a stream into pieces of finite size, over which we can apply computations. At the moment, the Table API supports three types of windows: sliding windows, tumbling windows, and session windows (for general definitions of the different types of windows, we recommend Flink’s documentation). All three window types work on event or processing time. Session windows can be defined over time intervals, sliding and tumbling windows can be defined over time intervals or a number of rows.
Let’s assume that our customer data from the example above is an event stream of updates generated whenever the customer updated his or her preferences. We assume that events come from a TableSource that has assigned timestamps and watermarks. The definition of a window happens again in a LINQ-style fashion. The following example could be used to count the updates to the preferences during one day.
table
  .window(Tumble over 1.day on 'rowtime as 'w)
  .groupBy('id, 'w)
  .select('id, 'w.start as 'from, 'w.end as 'to, 'prefs.count as 'updates)
By using the on() parameter, we can specify whether the window is supposed to work on event-time or not. The Table API assumes that timestamps and watermarks are assigned correctly when using event-time. Elements with timestamps smaller than the last received watermark are dropped. Since the extraction of timestamps and generation of watermarks depends on the data source and requires some deeper knowledge of their origin, the TableSource or the upstream DataStream is usually responsible for assigning these properties.
The following code shows how to define other types of windows:
// using processing-time
table.window(Tumble over 100.rows as 'manyRowWindow)
// using event-time
table.window(Session withGap 15.minutes on 'rowtime as 'sessionWindow)
table.window(Slide over 1.day every 1.hour on 'rowtime as 'dailyWindow)
Since batch is just a special case of streaming (where a batch happens to have a defined start and end point), it is also possible to apply all of these windows in a batch execution environment. Without any modification of the table program itself, we can run the code on a DataSet given that we specified a column named “rowtime”. This is particularly interesting if we want to compute exact results from time to time, so that late events that are heavily out-of-order can be included in the computation.
At the moment, the Table API only supports so-called “group windows” that also exist in the DataStream API. Other windows such as SQL’s OVER clause windows are in development and planned for Flink 1.3.
In order to demonstrate the expressiveness and capabilities of the API, here’s a snippet with a more advanced example of an exponentially decaying moving average over a sliding window of one hour which returns aggregated results every second. The table program weighs recent orders more heavily than older orders. This example is borrowed from Apache Calcite and shows what will be possible in future Flink releases for both the Table API and SQL.
table
  .window(Slide over 1.hour every 1.second as 'w)
  .groupBy('productId, 'w)
  .select(
    'w.end,
    'productId,
    ('unitPrice * ('rowtime - 'w.start).exp() / 1.hour).sum / (('rowtime - 'w.start).exp() / 1.hour).sum)
User-defined Table Functions # User-defined table functions were added in Flink 1.2. These can be quite useful for table columns containing non-atomic values which need to be extracted and mapped to separate fields before processing. Table functions take an arbitrary number of scalar values and allow for returning an arbitrary number of rows as output instead of a single value, similar to a flatMap function in the DataStream or DataSet API. The output of a table function can then be joined with the original row in the table by using either a left-outer join or cross join.
Using the previously-mentioned customer table, let’s assume we want to produce a table that contains the color and size preferences as separate columns. The table program would look like this:
// create an instance of the table function
val extractPrefs = new PropertiesExtractor()
// derive rows and join them with original row
table
  .join(extractPrefs('prefs) as ('color, 'size))
  .select('id, 'username, 'color, 'size)
The PropertiesExtractor is a user-defined table function that extracts the color and size. We are not interested in customers that haven’t set these preferences and thus don’t emit anything if both properties are not present in the string value. Since we are using a (cross) join in the program, customers without a result on the right side of the join will be filtered out.
class PropertiesExtractor extends TableFunction[Row] {
  def eval(prefs: String): Unit = {
    // split string into (key, value) pairs
    val pairs = prefs
      .split(",")
      .map { kv =>
        val split = kv.split("=")
        (split(0), split(1))
      }
    val color = pairs.find(_._1 == "color").map(_._2)
    val size = pairs.find(_._1 == "size").map(_._2)
    // emit a row if color and size are specified
    (color, size) match {
      case (Some(c), Some(s)) => collect(Row.of(c, s))
      case _ => // skip
    }
  }
  override def getResultType = new RowTypeInfo(Types.STRING, Types.STRING)
}
Conclusion # There is significant interest in making streaming more accessible and easier to use. Flink’s Table API development is happening quickly, and we believe that soon, you will be able to implement large batch or streaming pipelines using purely relational APIs or even convert existing Flink jobs to table programs. The Table API is already a very useful tool since you can work around limitations and missing features at any time by switching back and forth between the DataSet/DataStream abstraction and the Table abstraction.
Contributions like support of Apache Hive UDFs, external catalogs, more TableSources, additional windows, and more operators will make the Table API an even more useful tool. Particularly, the upcoming introduction of Dynamic Tables, which is worth a blog post of its own, shows that even in 2017, new relational APIs open the door to a number of possibilities.
Try it out, or even better, join the design discussions on the mailing lists and JIRA and start contributing!
`}),e.add({id:226,href:"/2017/03/23/apache-flink-1.1.5-released/",title:"Apache Flink 1.1.5 Released",section:"Flink Blog",content:`The Apache Flink community released the next bugfix version of the Apache Flink 1.1 series.
This release includes critical fixes for HA recovery robustness, the fault tolerance guarantees of the Flink Kafka Connector, and classloading issues with the Kryo serializer. We highly recommend that all users upgrade to Flink 1.1.5.
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.1.5</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.10</artifactId>
  <version>1.1.5</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.10</artifactId>
  <version>1.1.5</version>
</dependency>
You can find the binaries on the updated Downloads page.
Release Notes - Flink - Version 1.1.5 # Bug # [FLINK-5701] - FlinkKafkaProducer should check asyncException on checkpoints [FLINK-6006] - Kafka Consumer can lose state if queried partition list is incomplete on restore [FLINK-5940] - ZooKeeperCompletedCheckpointStore cannot handle broken state handles [FLINK-5942] - Harden ZooKeeperStateHandleStore to deal with corrupted data [FLINK-6025] - User code ClassLoader not used when KryoSerializer fallbacks to serialization for copying [FLINK-5945] - Close function in OuterJoinOperatorBase#executeOnCollections [FLINK-5934] - Scheduler in ExecutionGraph null if failure happens in ExecutionGraph.restoreLatestCheckpointedState [FLINK-5771] - DelimitedInputFormat does not correctly handle multi-byte delimiters [FLINK-5647] - Fix RocksDB Backend Cleanup [FLINK-2662] - CompilerException: "Bug: Plan generation for Unions picked a ship strategy between binary plan operators." [FLINK-5585] - NullPointer Exception in JobManager.updateAccumulators [FLINK-5484] - Add test for registered Kryo types [FLINK-5518] - HadoopInputFormat throws NPE when close() is called before open() Improvement # [FLINK-5575] - in old releases, warn users and guide them to the latest stable docs [FLINK-5639] - Clarify License implications of RabbitMQ Connector [FLINK-5466] - Make production environment default in gulpfile `}),e.add({id:227,href:"/2017/02/06/announcing-apache-flink-1.2.0/",title:"Announcing Apache Flink 1.2.0",section:"Flink Blog",content:`The Apache Flink community is pleased to announce the 1.2.0 release. Over the past months, the Flink community has been working hard to resolve 650 issues. See the complete changelog for more detail.
This is the third major release in the 1.x.y series. It is API compatible with the other 1.x.y releases for APIs annotated with the @Public annotation.
We encourage everyone to download the release and check out the documentation. Feedback through the Flink mailing lists is, as always, gladly encouraged!
You can find the binaries on the updated Downloads page. Some highlights of the release are listed below.
Dynamic Scaling / Key Groups # Flink now supports changing the parallelism of a streaming job by restoring it from a savepoint with a different parallelism. Both changing the parallelism of the entire job and changing the parallelism of individual operators are supported. In the StreamExecutionEnvironment, users can set a new per-job configuration parameter called “max parallelism”. It determines the upper limit for the parallelism.
By default, the value is set to:
128: for all parallelism <= 128
MIN(nextPowerOfTwo(parallelism + (parallelism / 2)), 2^15): for all parallelism > 128
The following built-in functions and operators support rescaling:
Window operator Rolling/Bucketing sink Kafka consumers Continuous File Processing source The write-ahead log Cassandra sink and the CEP operator are currently not rescalable. Users of the keyed state interfaces can use dynamic scaling without changing their code.
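As a minimal sketch of how the new setting is used (the method names reflect our reading of the feature and are not taken from the release notes), the maximum parallelism is configured on the StreamExecutionEnvironment before the job is submitted:
val env = StreamExecutionEnvironment.getExecutionEnvironment
// upper bound to which the job can later be rescaled (example value)
env.setMaxParallelism(512)
// the parallelism the job starts with; it can later be changed by restoring from a savepoint
env.setParallelism(64)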
Rescalable Non-Partitioned State # As part of the dynamic scaling effort, the community has also added rescalable non-partitioned state for operators like the Kafka consumer that don’t use keyed state but instead use operator state.
In case of rescaling, the operator state needs to be redistributed among the parallel consumer instances. In case of the Kafka consumer, the assigned partitions and their offsets are redistributed.
ProcessFunction # The ProcessFunction is a low-level stream processing operation giving access to the basic building blocks of all (acyclic) streaming applications:
Events (stream elements) State (fault tolerant, consistent) Timers (event time and processing time) The ProcessFunction can be thought of as a FlatMapFunction with access to keyed state and timers.
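As a rough sketch (not taken from this announcement), a ProcessFunction that forwards events and registers an event-time timer for each element could look like the following. It assumes event time is configured, timestamps are assigned upstream, and the function is applied to a keyed stream, e.g. via something like stream.keyBy(_._1).process(new ReminderFunction):
import org.apache.flink.streaming.api.functions.ProcessFunction
import org.apache.flink.util.Collector

class ReminderFunction extends ProcessFunction[(String, Long), String] {

  override def processElement(
      value: (String, Long),
      ctx: ProcessFunction[(String, Long), String]#Context,
      out: Collector[String]): Unit = {
    // forward the event ...
    out.collect("seen key " + value._1)
    // ... and ask to be called back one minute (event time) after the element timestamp
    val ts: Long = ctx.timestamp()
    ctx.timerService().registerEventTimeTimer(ts + 60 * 1000)
  }

  override def onTimer(
      timestamp: Long,
      ctx: ProcessFunction[(String, Long), String]#OnTimerContext,
      out: Collector[String]): Unit = {
    // invoked once the watermark passes the registered time
    out.collect("timer fired at " + timestamp)
  }
}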
ProcessFunction documentation
Async I/O # Flink now has a dedicated Async I/O operator for making blocking calls asynchronously and in a checkpointed fashion. For example, there are many Flink applications that need to query external datastores for each element in a stream. To avoid slowing down the stream to the speed of the external system, the async I/O operator allows requests to overlap.
Async I/O documentation
Run Flink with Apache Mesos # The latest release further extends Flink’s deployment flexibility by adding support for Apache Mesos and DC/OS. In combination with Marathon, it is now possible to run a highly available Flink cluster on Mesos.
Mesos documentation
Secure Data Access # Flink is now able to authenticate against external services such as Zookeeper, Kafka, HDFS and YARN using Kerberos. Also, experimental support for encryption over the wire has been added.
Kerberos documentation and SSL setup documentation.
Queryable State # This experimental feature allows users to query the current state of an operator. If you have, for example, a flatMap() operator that keeps a running aggregate per key, queryable state allows you to retrieve the current aggregate value at any time by directly connecting to the TaskManager and retrieving that value.
Queryable State documentation
Backwards compatible savepoints # Flink 1.2.0 allows users to restart a job from a 1.1.4 savepoint. This makes major Flink version upgrades possible without losing application state. The following built-in operators are backwards compatible:
Window operator Rolling/Bucketing sink Kafka consumers Continuous File Processing source Upgrading Flink applications documentation
Table API & SQL # This release significantly improved the performance, stability, and coverage of Flink’s Table API and SQL support for batch and streaming tables.
The community added tumbling, sliding, and session group-window aggregations over streaming tables e.g. table.window(Session withGap 10.minutes on 'rowtime as 'w)
SQL supports more built-in functions and operations e.g. EXISTS, VALUES, LIMIT, CURRENT_DATE, INITCAP, NULLIF
Both APIs support more data types and are better integrated e.g. access a POJO field myPojo.get('field'), myPojo.flatten()
Users can now define their own scalar and table functions e.g. table.select('uid, parse('field) as 'parsed).join(split('parsed) as 'atom)
Flink Table API & SQL documentation
Miscellaneous improvements # Metrics in Flink web interface: A metrics system was added in Flink 1.1, and with this release, Flink provides a new tab in the web frontend for viewing some of these metrics.
Kafka 0.10 support: Flink 1.2 now provides a connector for Apache Kafka 0.10.0.x, including support for consuming and producing messages with a timestamp using Flink’s internal event time (Kafka Connector Documentation)
Evictor Semantics: Flink 1.2 ships with more expressive evictor semantics that allow the programmer to evict elements from a window both before and after the application of the window function, and to remove elements arbitrarily (Evictor Semantics Documentation)
List of Contributors # According to git shortlog, the following 122 people contributed to the 1.2.0 release. Thank you to all contributors!
Abhishek R. Singh Ahmad Ragab Aleksandr Chermenin Alexander Pivovarov Alexander Shoshin Alexey Diomin Aljoscha Krettek Andrey Melentyev Anton Mushin Bob Thorman Boris Osipov Bram Vogelaar Bruno Aranda David Anderson Dominik Evgeny_Kincharov Fabian Hueske Fokko Driesprong Gabor Gevay George Gordon Tai Greg Hogan Gyula Fora Haohui Mai Holger Frydrych HungUnicorn Ismaël Mejía Ivan Mushketyk Jakub Havlik Jark Wu Jendrik Poloczek Jincheng Sun Josh Joshi Keiji Yoshida Kirill Morozov Kurt Young Liwei Lin Lorenz Buehmann Maciek Próchniak Makman2 Markus Müller Martin Junghanns Márton Balassi Max Kuklinski Maximilian Michels Milosz Tanski Nagarjun Neelesh Srinivas Salian Neil Derraugh Nick Chadwick Nico Kruber Niels Basjes Pattarawat Chormai Piotr Godek Raghav Ramkrishna Robert Metzger Rohit Agarwal Roman Maier Sachin Sachin Goel Scott Kidder Shannon Carey Stefan Richter Steffen Hausmann Stephan Epping Stephan Ewen Sunny T Suri Theodore Vasiloudis Till Rohrmann Tony Wei Tzu-Li (Gordon) Tai Ufuk Celebi Vijay Srinivasaraghavan Vishnu Viswanath WangTaoTheTonic William-Sang Yassine Marzougui anton solovev beyond1920 biao.liub chobeat danielblazevski f7753 fengyelei fengyelei 00406569 gallenvara gaolun.gl godfreyhe heytitle hzyuemeng1 iteblog kl0u larsbachmann lincoln-lil manuzhang medale miaoever mtunique radekg renkai sergey_sokur shijinkui shuai.xus smarthi swapnil-chougule tedyu tibor.moger tonycox twalthr vasia wenlong.lwl wrighe3 xiaogang.sxg yushi.wxg yuzhongliu zentol zhuhaifengleon 淘江 魏偉哲 `}),e.add({id:228,href:"/2016/12/21/apache-flink-1.1.4-released/",title:"Apache Flink 1.1.4 Released",section:"Flink Blog",content:`The Apache Flink community released the next bugfix version of the Apache Flink 1.1 series.
This release includes major robustness improvements for checkpoint cleanup on failures and consumption of intermediate streams. We highly recommend that all users upgrade to Flink 1.1.4.
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.1.4</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-streaming-java_2.10</artifactId>
  <version>1.1.4</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.10</artifactId>
  <version>1.1.4</version>
</dependency>
You can find the binaries on the updated Downloads page.
Note for RocksDB Backend Users # We updated Flink’s RocksDB dependency version from 4.5.1 to 4.11.2. Between these versions, some of RocksDB’s internal configuration defaults changed in a way that affects the memory footprint of running Flink with RocksDB. Therefore, we manually reset them to the previous defaults. If you want to run with the new RocksDB 4.11.2 defaults, you can do this via:
RocksDBStateBackend backend = new RocksDBStateBackend(&#34;...&#34;); // Use the new default options. Otherwise, the default for RocksDB 4.5.1 // \`PredefinedOptions.DEFAULT_ROCKS_4_5_1\` will be used. backend.setPredefinedOptions(PredefinedOptions.DEFAULT); Release Notes - Flink - Version 1.1.4 # Sub-task # [FLINK-4510] - Always create CheckpointCoordinator [FLINK-4984] - Add Cancellation Barriers to BarrierTracker and BarrierBuffer [FLINK-4985] - Report Declined/Canceled Checkpoints to Checkpoint Coordinator Bug # [FLINK-2662] - CompilerException: &quot;Bug: Plan generation for Unions picked a ship strategy between binary plan operators.&quot; [FLINK-3680] - Remove or improve (not set) text in the Job Plan UI [FLINK-3813] - YARNSessionFIFOITCase.testDetachedMode failed on Travis [FLINK-4108] - NPE in Row.productArity [FLINK-4506] - CsvOutputFormat defaults allowNullValues to false, even though doc and declaration says true [FLINK-4581] - Table API throws &quot;No suitable driver found for jdbc:calcite&quot; [FLINK-4586] - NumberSequenceIterator and Accumulator threading issue [FLINK-4619] - JobManager does not answer to client when restore from savepoint fails [FLINK-4727] - Kafka 0.9 Consumer should also checkpoint auto retrieved offsets even when no data is read [FLINK-4862] - NPE on EventTimeSessionWindows with ContinuousEventTimeTrigger [FLINK-4932] - Don&#39;t let ExecutionGraph fail when in state Restarting [FLINK-4933] - ExecutionGraph.scheduleOrUpdateConsumers can fail the ExecutionGraph [FLINK-4977] - Enum serialization does not work in all cases [FLINK-4991] - TestTask hangs in testWatchDogInterruptsTask [FLINK-4998] - ResourceManager fails when num task slots &gt; Yarn vcores [FLINK-5013] - Flink Kinesis connector doesn&#39;t work on old EMR versions [FLINK-5028] - Stream Tasks must not go through clean shutdown logic on cancellation [FLINK-5038] - Errors in the &quot;cancelTask&quot; method prevent closeables from being closed early [FLINK-5039] - Avro GenericRecord support is broken [FLINK-5040] - Set correct input channel types with eager scheduling [FLINK-5050] - JSON.org license is CatX [FLINK-5057] - Cancellation timeouts are picked from wrong config [FLINK-5058] - taskManagerMemory attribute set wrong value in FlinkShell [FLINK-5063] - State handles are not properly cleaned up for declined or expired checkpoints [FLINK-5073] - ZooKeeperCompleteCheckpointStore executes blocking delete operation in ZooKeeper client thread [FLINK-5075] - Kinesis consumer incorrectly determines shards as newly discovered when tested against Kinesalite [FLINK-5082] - Pull ExecutionService lifecycle management out of the JobManager [FLINK-5085] - Execute CheckpointCoodinator&#39;s state discard calls asynchronously [FLINK-5114] - PartitionState update with finished execution fails [FLINK-5142] - Resource leak in CheckpointCoordinator [FLINK-5149] - ContinuousEventTimeTrigger doesn&#39;t fire at the end of the window [FLINK-5154] - Duplicate TypeSerializer when writing RocksDB Snapshot [FLINK-5158] - Handle ZooKeeperCompletedCheckpointStore exceptions in CheckpointCoordinator [FLINK-5172] - In RocksDBStateBackend, set flink-core and flink-streaming-java to &quot;provided&quot; [FLINK-5173] - Upgrade RocksDB dependency [FLINK-5184] - Error result of compareSerialized in RowComparator class [FLINK-5193] - Recovering all jobs fails completely if a single recovery fails [FLINK-5197] - Late JobStatusChanged messages can interfere with running jobs [FLINK-5214] - Clean up checkpoint files when 
failing checkpoint operation on TM [FLINK-5215] - Close checkpoint streams upon cancellation [FLINK-5216] - CheckpointCoordinator&#39;s &#39;minPauseBetweenCheckpoints&#39; refers to checkpoint start rather then checkpoint completion [FLINK-5218] - Eagerly close checkpoint streams on cancellation [FLINK-5228] - LocalInputChannel re-trigger request and release deadlock [FLINK-5229] - Cleanup StreamTaskStates if a checkpoint operation of a subsequent operator fails [FLINK-5246] - Don&#39;t discard unknown checkpoint messages in the CheckpointCoordinator [FLINK-5248] - SavepointITCase doesn&#39;t catch savepoint restore failure [FLINK-5274] - LocalInputChannel throws NPE if partition reader is released [FLINK-5275] - InputChanelDeploymentDescriptors throws misleading Exception if producer failed/cancelled [FLINK-5276] - ExecutionVertex archiving can throw NPE with many previous attempts [FLINK-5285] - CancelCheckpointMarker flood when using at least once mode [FLINK-5326] - IllegalStateException: Bug in Netty consumer logic: reader queue got notified by partition about available data, but none was available [FLINK-5352] - Restore RocksDB 1.1.3 memory behavior Improvement # [FLINK-3347] - TaskManager (or its ActorSystem) need to restart in case they notice quarantine [FLINK-3787] - Yarn client does not report unfulfillable container constraints [FLINK-4445] - Ignore unmatched state when restoring from savepoint [FLINK-4715] - TaskManager should commit suicide after cancellation failure [FLINK-4894] - Don&#39;t block on buffer request after broadcastEvent [FLINK-4975] - Add a limit for how much data may be buffered during checkpoint alignment [FLINK-4996] - Make CrossHint @Public [FLINK-5046] - Avoid redundant serialization when creating the TaskDeploymentDescriptor [FLINK-5123] - Add description how to do proper shading to Flink docs. [FLINK-5169] - Make consumption of input channels fair [FLINK-5192] - Provide better log config templates [FLINK-5194] - Log heartbeats on TRACE level [FLINK-5196] - Don&#39;t log InputChannelDescriptor [FLINK-5198] - Overwrite TaskState toString [FLINK-5199] - Improve logging of submitted job graph actions in HA case [FLINK-5201] - Promote loaded config properties to INFO [FLINK-5207] - Decrease HadoopFileSystem logging [FLINK-5249] - description of datastream rescaling doesn&#39;t match the figure [FLINK-5259] - wrong execution environment in retry delays example [FLINK-5278] - Improve Task and checkpoint logging New Feature # [FLINK-4976] - Add a way to abort in flight checkpoints Task # [FLINK-4778] - Update program example in /docs/setup/cli.md due to the change in FLINK-2021 `}),e.add({id:229,href:"/2016/12/19/apache-flink-in-2016-year-in-review/",title:"Apache Flink in 2016: Year in Review",section:"Flink Blog",content:`2016 was an exciting year for the Apache Flink® community, and the release of Flink 1.0 in March marked the first time in Flink’s history that the community guaranteed API backward compatibility for all versions in a series. This step forward for Flink was followed by many new and exciting production deployments in organizations of all shapes and sizes, all around the globe.
In this post, we’ll look back on the project’s progress over the course of 2016, and we’ll also preview what 2017 has in store.
Community Growth # Github # First, here’s a summary of community statistics from GitHub. At the time of writing:
Contributors have increased from 150 in December 2015 to 258 in December 2016 (up 72%) Stars have increased from 813 in December 2015 to 1830 in December 2016 (up 125%) Forks have increased from 544 in December 2015 to 1255 in December 2016 (up 130%) The community also welcomed 3 new committers in 2016: Chengxiang Li, Greg Hogan, and Tzu-Li (Gordon) Tai.
Next, let’s take a look at a few other project stats, starting with the number of commits. If we run:
git log --pretty=oneline --after=12/31/2015 | wc -l
…inside the Flink repository, we’ll see a total of 1884 commits so far in 2016, bringing the all-time total commits to 10,015.
Now, let’s go a bit deeper. Here are instructions in case you’d like to take a look at this data yourself.
Download gitstats from the project homepage. Or, on OS X with homebrew, type: brew install --HEAD homebrew/head-only/gitstats
Clone the Apache Flink git repository: git clone git@github.com:apache/flink.git
Generate the statistics: gitstats flink/ flink-stats/
View all the statistics as an HTML page using your default browser: open flink-stats/index.html
2016 is the year that Flink surpassed 1 million lines of code, now clocking in at 1,034,137 lines.
Monday remains the day of the week with the most commits over the project’s history:
And 5pm is still solidly the preferred commit time:
Meetups # Apache Flink Meetup membership grew by 240% this year, and at the time of writing, there are 41 meetups comprising 16,541 members that list Flink as a topic, up from 16 groups with 4,864 members in December 2015. The Flink community is proud to be truly global in nature.
Flink Forward 2016 # The second annual Flink Forward conference took place in Berlin on September 12-14, and over 350 members of the Flink community came together for speaker sessions, training, and discussion about Flink. Slides and videos from speaker sessions are available online, and we encourage you to take a look if you’re interested in learning more about how Flink is used in production in a wide range of organizations.
Flink Forward will be expanding to San Francisco in April 2017, and the third-annual Berlin event is scheduled for September 2017.
Features and Ecosystem # Flink Ecosystem Growth # Flink was added to a selection of distributions during 2016, making it easier for an even larger base of users to start working with Flink:
Amazon EMR Google Cloud Dataproc Lightbend Fast Data Platform In addition, the Apache Beam and Flink communities teamed up to build a Flink runner for Beam that, according to the Google team, is “sophisticated enough to be a compelling alternative to Cloud Dataflow when running on premise or on non-Google clouds”.
Feature Timeline in 2016 # Here’s a selection of major features added to Flink over the course of 2016:
If you spend time in the Apache Flink JIRA project, you’ll see that the Flink community has addressed every single one of the roadmap items identified in 2015’s year in review post. Here’s to making that an annual tradition. :)
Looking ahead to 2017 # A good source of information about the Flink community’s roadmap is the list of Flink Improvement Proposals (FLIPs) in the project wiki. Below, we’ll highlight a selection of FLIPs that have been accepted by the community as well as some that are still under discussion.
We should note that work is already underway on a number of these features, and some will even be included in Flink 1.2 at the beginning of 2017.
A new Flink deployment and process model, as described in FLIP-6. This work ensures that Flink supports a wide range of deployment types and cluster managers, making it possible to run Flink smoothly in any environment.
Dynamic scaling for both key-value state (as described in this PR) and non-partitioned state (as described in FLIP-8), ensuring that it’s always possible to split or merge state when scaling up or down, respectively.
Asynchronous I/O, as described in FLIP-12, which makes I/O access a less time-consuming process without adding complexity or the need for extra checkpoint coordination.
Enhancements to the window evictor, as described in FLIP-4, to provide users with more control over how elements are evicted from a window.
Fine-grained recovery from task failures, as described in FLIP-1, to make it possible to restart only what needs to be restarted during recovery, building on cached intermediate results.
Unified checkpoints and savepoints, as described in FLIP-10, to allow savepoints to be triggered automatically&ndash;important for error handling during program updates, because savepoints allow the user to modify both the job and the Flink version, whereas checkpoints can only be recovered with the same job.
Table API window aggregations, as described in FLIP-11, to support group-window and row-window aggregates on streaming and batch tables.
Side inputs, as described in this design document, to enable the joining of a main, high-throughput stream with one or more inputs containing static or slowly-changing data.
If you&rsquo;re interested in getting involved with Flink, we encourage you to take a look at the FLIPs and to join the discussion via the Flink mailing lists.
Lastly, we&rsquo;d like to extend a sincere thank you to all of the Flink community for making 2016 a great year!
`}),e.add({id:230,href:"/2016/10/12/apache-flink-1.1.3-released/",title:"Apache Flink 1.1.3 Released",section:"Flink Blog",content:`The Apache Flink community released the next bugfix version of the Apache Flink 1.1 series.
We recommend that all users upgrade to Flink 1.1.3.
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.1.3&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.10&lt;/artifactId&gt; &lt;version&gt;1.1.3&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.10&lt;/artifactId&gt; &lt;version&gt;1.1.3&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
Note for RocksDB Backend Users # It is highly recommended to use the &ldquo;fully async&rdquo; mode for the RocksDB state backend. The &ldquo;fully async&rdquo; mode will most likely allow you to easily upgrade to Flink 1.2 (via savepoints) when it is released. The &ldquo;semi async&rdquo; mode will no longer be supported by Flink 1.2.
RocksDBStateBackend backend = new RocksDBStateBackend(&#34;...&#34;); backend.enableFullyAsyncSnapshots(); Release Notes - Flink - Version 1.1.3 # Bug [FLINK-2662] - CompilerException: &quot;Bug: Plan generation for Unions picked a ship strategy between binary plan operators.&quot; [FLINK-4311] - TableInputFormat fails when reused on next split [FLINK-4329] - Fix Streaming File Source Timestamps/Watermarks Handling [FLINK-4485] - Finished jobs in yarn session fill /tmp filesystem [FLINK-4513] - Kafka connector documentation refers to Flink 1.1-SNAPSHOT [FLINK-4514] - ExpiredIteratorException in Kinesis Consumer on long catch-ups to head of stream [FLINK-4540] - Detached job execution may prevent cluster shutdown [FLINK-4544] - TaskManager metrics are vulnerable to custom JMX bean installation [FLINK-4566] - ProducerFailedException does not properly preserve Exception causes [FLINK-4588] - Fix Merging of Covering Window in MergingWindowSet [FLINK-4589] - Fix Merging of Covering Window in MergingWindowSet [FLINK-4616] - Kafka consumer doesn&#39;t store last emmited watermarks per partition in state [FLINK-4618] - FlinkKafkaConsumer09 should start from the next record on startup from offsets in Kafka [FLINK-4619] - JobManager does not answer to client when restore from savepoint fails [FLINK-4636] - AbstractCEPPatternOperator fails to restore state [FLINK-4640] - Serialization of the initialValue of a Fold on WindowedStream fails [FLINK-4651] - Re-register processing time timers at the WindowOperator upon recovery. [FLINK-4663] - Flink JDBCOutputFormat logs wrong WARN message [FLINK-4672] - TaskManager accidentally decorates Kill messages [FLINK-4677] - Jars with no job executions produces NullPointerException in ClusterClient [FLINK-4702] - Kafka consumer must commit offsets asynchronously [FLINK-4727] - Kafka 0.9 Consumer should also checkpoint auto retrieved offsets even when no data is read [FLINK-4732] - Maven junction plugin security threat [FLINK-4777] - ContinuousFileMonitoringFunction may throw IOException when files are moved [FLINK-4788] - State backend class cannot be loaded, because fully qualified name converted to lower-case Improvement [FLINK-4396] - GraphiteReporter class not found at startup of jobmanager [FLINK-4574] - Strengthen fetch interval implementation in Kinesis consumer [FLINK-4723] - Unify behaviour of committed offsets to Kafka / ZK for Kafka 0.8 and 0.9 consumer `}),e.add({id:231,href:"/2016/09/05/apache-flink-1.1.2-released/",title:"Apache Flink 1.1.2 Released",section:"Flink Blog",content:`The Apache Flink community released another bugfix version of the Apache Flink 1.1. series.
We recommend that all users upgrade to Flink 1.1.2.
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.1.2&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.10&lt;/artifactId&gt; &lt;version&gt;1.1.2&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.10&lt;/artifactId&gt; &lt;version&gt;1.1.2&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
Release Notes - Flink - Version 1.1.2 [FLINK-4236] - Flink Dashboard stops showing list of uploaded jars if main method cannot be looked up [FLINK-4309] - Potential null pointer dereference in DelegatingConfiguration#keySet() [FLINK-4334] - Shaded Hadoop1 jar not fully excluded in Quickstart [FLINK-4341] - Kinesis connector does not emit maximum watermark properly [FLINK-4402] - Wrong metrics parameter names in documentation [FLINK-4409] - class conflict between jsr305-1.3.9.jar and flink-shaded-hadoop2-1.1.1.jar [FLINK-4411] - [py] Chained dual input children are not properly propagated [FLINK-4412] - [py] Chaining does not properly handle broadcast variables [FLINK-4425] - &quot;Out Of Memory&quot; during savepoint deserialization [FLINK-4454] - Lookups for JobManager address in config [FLINK-4480] - Incorrect link to elastic.co in documentation [FLINK-4486] - JobManager not fully running when yarn-session.sh finishes [FLINK-4488] - Prevent cluster shutdown after job execution for non-detached jobs [FLINK-4514] - ExpiredIteratorException in Kinesis Consumer on long catch-ups to head of stream [FLINK-4526] - ApplicationClient: remove redundant proxy messages [FLINK-3866] - StringArraySerializer claims type is immutable; shouldn&#39;t [FLINK-3899] - Document window processing with Reduce/FoldFunction + WindowFunction [FLINK-4302] - Add JavaDocs to MetricConfig [FLINK-4495] - Running multiple jobs on yarn (without yarn-session) `}),e.add({id:232,href:"/2016/08/24/flink-forward-2016-announcing-schedule-keynotes-and-panel-discussion/",title:"Flink Forward 2016: Announcing Schedule, Keynotes, and Panel Discussion",section:"Flink Blog",content:`An update for the Flink community: the Flink Forward 2016 schedule is now available online. This year's event will include 2 days of talks from stream processing experts at Google, MapR, Alibaba, Netflix, Cloudera, and more. Following the talks is a full day of hands-on Flink training.
Ted Dunning has been announced as a keynote speaker at the event. Ted is the VP of Incubator at the Apache Software Foundation, the Chief Application Architect at MapR Technologies, and a mentor on many recent projects. He'll present "How Can We Take Flink Forward?" on the second day of the conference.
Following Ted's keynote there will be a panel discussion on "Large Scale Streaming in Production". As stream processing systems become more mainstream, companies are looking to empower their users to take advantage of this technology. We welcome leading stream processing experts Xiaowei Jiang (Alibaba), Monal Daxini (Netflix), Maxim Fateev (Uber), and Ted Dunning (MapR Technologies) on stage to talk about the challenges they have faced and the solutions they have discovered while implementing stream processing systems at very large scale. The panel will be moderated by Jamie Grier (data Artisans).
The welcome keynote on Monday, September 12, will be presented by data Artisans' co-founders Kostas Tzoumas and Stephan Ewen. They will talk about "The maturing data streaming ecosystem and Apache Flink’s accelerated growth". In this talk, Kostas and Stephan discuss several large-scale stream processing use cases that the data Artisans team has seen over the past year.
And one more recent addition to the program: Maxim Fateev of Uber will present "Beyond the Watermark: On-Demand Backfilling in Flink". Flink’s time-progress model is built around a single watermark, which is incompatible with Uber’s business need for generating aggregates retroactively. Maxim's talk covers Uber's solution for on-demand backfilling.
We hope to see many community members at Flink Forward 2016. Registration is available online: flink-forward.org/registration `}),e.add({id:233,href:"/2016/08/04/announcing-apache-flink-1.1.0/",title:"Announcing Apache Flink 1.1.0",section:"Flink Blog",content:`Important: The Maven artifacts published with version 1.1.0 on Maven central have a Hadoop dependency issue. It is highly recommended to use 1.1.1 or 1.1.1-hadoop1 as the Flink version. The Apache Flink community is pleased to announce the availability of Flink 1.1.0.
This release is the first major release in the 1.X.X series of releases, which maintains API compatibility with 1.0.0. This means that your applications written against stable APIs of Flink 1.0.0 will compile and run with Flink 1.1.0. 95 contributors provided bug fixes, improvements, and new features, resolving more than 450 JIRA issues in total. See the complete changelog for more details.
We encourage everyone to download the release and check out the documentation. Feedback through the Flink mailing lists is, as always, very welcome!
Some highlights of the release are listed in the following sections.
Connectors # The streaming connectors are a major part of Flink&rsquo;s DataStream API. This release adds support for new external systems and further improves on the available connectors.
Continuous File System Sources # A frequently requested feature for Flink 1.0 was to be able to monitor directories and process files continuously. Flink 1.1 now adds support for this via FileProcessingModes:
DataStream&lt;String&gt; stream = env.readFile( textInputFormat, &#34;hdfs:///file-path&#34;, FileProcessingMode.PROCESS_CONTINUOUSLY, 5000, // monitoring interval (millis) FilePathFilter.createDefaultFilter()); // file path filter This will monitor hdfs:///file-path every 5000 milliseconds. Check out the DataSource documentation for more details.
Kinesis Source and Sink # Flink 1.1 adds a Kinesis connector for both consuming (FlinkKinesisConsumer) from and producing (FlinkKinesisProducer) to Amazon Kinesis Streams, which is a managed service purpose-built to make it easy to work with streaming data on AWS.
DataStream&lt;String&gt; kinesis = env.addSource( new FlinkKinesisConsumer&lt;&gt;(&#34;stream-name&#34;, schema, config)); Check out the Kinesis connector documentation for more details.
Cassandra Sink # The Apache Cassandra sink allows you to write from Flink to Cassandra. Flink can provide exactly-once guarantees if the query is idempotent, meaning it can be applied multiple times without changing the result.
CassandraSink.addSink(input) Check out the Cassandra Sink documentation for more details.
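For illustration, here is a minimal sketch of how such a sink can be wired up, assuming a tuple stream and a local Cassandra contact point (the keyspace, table, and field names below are made up):
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// a small example stream of (sensor id, reading count) pairs
DataStream<Tuple2<String, Long>> readings = env.fromElements(
    Tuple2.of("sensor-1", 10L),
    Tuple2.of("sensor-2", 20L));

CassandraSink.addSink(readings)
    // an idempotent INSERT, so repeated application does not change the result
    .setQuery("INSERT INTO example.readings (sensor_id, cnt) VALUES (?, ?);")
    .setClusterBuilder(new ClusterBuilder() {
        @Override
        protected Cluster buildCluster(Cluster.Builder builder) {
            return builder.addContactPoint("127.0.0.1").build();
        }
    })
    .build();

env.execute("cassandra-sink-example");
The sink classes come from the flink-connector-cassandra module and the Datastax driver; the exactly-once guarantee holds only because the INSERT above is idempotent.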
Table API and SQL # The Table API is a SQL-like expression language for relational stream and batch processing that can be easily embedded in Flink’s DataSet and DataStream APIs (for both Java and Scala).
Table custT = tableEnv .toTable(custDs, &#34;name, zipcode&#34;) .where(&#34;zipcode = &#39;12345&#39;&#34;) .select(&#34;name&#34;) An initial version of this API was already available in Flink 1.0. For Flink 1.1, the community put a lot of work into reworking the architecture of the Table API and integrating it with Apache Calcite.
In this first version, SQL (and Table API) queries on streams are limited to selection, filter, and union operators. Compared to Flink 1.0, the revised Table API supports many more scalar functions and is able to read tables from external sources and write them back to external sinks.
Table result = tableEnv.sql( &#34;SELECT STREAM product, amount FROM Orders WHERE product LIKE &#39;%Rubber%&#39;&#34;); A more detailed introduction can be found in the Flink blog and the Table API documentation.
DataStream API # The DataStream API now exposes session windows and allowed lateness as first-class citizens.
Session Windows # Session windows are ideal for cases where the window boundaries need to adjust to the incoming data. This enables you to have windows that start at individual points in time for each key and that end once there has been a certain period of inactivity. The configuration parameter is the session gap that specifies how long to wait for new data before considering a session as closed.
input.keyBy(&lt;key selector&gt;) .window(EventTimeSessionWindows.withGap(Time.minutes(10))) .&lt;windowed transformation&gt;(&lt;window function&gt;); Support for Late Elements # You can now specify how a windowed transformation should deal with late elements and how much lateness is allowed. The parameter for this is called allowed lateness. This specifies by how much time elements can be late.
input.keyBy(&lt;key selector&gt;).window(&lt;window assigner&gt;) .allowedLateness(&lt;time&gt;) .&lt;windowed transformation&gt;(&lt;window function&gt;); Elements that arrive within the allowed lateness are still put into windows and are considered when computing window results. If elements arrive after the allowed lateness they will be dropped. Flink will also make sure that any state held by the windowing operation is garbage collected once the watermark passes the end of a window plus the allowed lateness.
Check out the Windows documentation for more details.
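To make the two snippets above concrete, here is a rough sketch that combines session windows with allowed lateness on a keyed stream of (word, count) pairs; the input stream and field positions are illustrative and assume event-time timestamps and watermarks have already been assigned upstream:
DataStream<Tuple2<String, Long>> counts = ...; // event-time stream with timestamps/watermarks assigned

DataStream<Tuple2<String, Long>> sessionCounts = counts
    .keyBy(0)                                                  // key by the word (field 0)
    .window(EventTimeSessionWindows.withGap(Time.minutes(10))) // session closes after 10 minutes of inactivity
    .allowedLateness(Time.minutes(1))                          // elements up to 1 minute late still update the window
    .sum(1);                                                   // sum the counts (field 1) per session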
Scala API for Complex Event Processing (CEP) # Flink 1.0 added the initial version of the CEP library. The core of the library is a Pattern API, which allows you to easily specify patterns to match against in your event stream. While in Flink 1.0 this API was only available for Java, Flink 1.1 now exposes the same API for Scala, allowing you to specify your event patterns in a more concise manner.
A more detailed introduction can be found in the Flink blog and the CEP documentation.
Graph generators and new Gelly library algorithms # This release includes many enhancements and new features for graph processing. Gelly now provides a collection of scalable graph generators for common graph types, such as complete, cycle, grid, hypercube, and RMat graphs. A variety of new graph algorithms have been added to the Gelly library, including Global and Local Clustering Coefficient, HITS, and similarity measures (Jaccard and Adamic-Adar).
For a full list of new graph processing features, check out the Gelly documentation.
Metrics # Flink’s new metrics system allows you to easily gather and expose metrics from your user application to external systems. You can add counters, gauges, and histograms to your application via the runtime context:
Counter counter = getRuntimeContext() .getMetricGroup() .counter(&#34;my-counter&#34;); All registered metrics will be exposed via reporters. Out of the box, Flink comes with support for JMX, Ganglia, Graphite, and statsD. In addition to your custom metrics, Flink exposes many internal metrics like checkpoint sizes and JVM stats.
Check out the Metrics documentation for more details.
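As a rough, self-contained sketch of where that counter registration typically lives (the function and metric names here are illustrative):
public class EventCounter extends RichMapFunction<String, String> {

    private transient Counter counter;

    @Override
    public void open(Configuration parameters) {
        // register the metric once the function is running on a TaskManager
        counter = getRuntimeContext()
            .getMetricGroup()
            .counter("events-seen");
    }

    @Override
    public String map(String value) {
        counter.inc(); // the value is exposed through the configured reporters (JMX, Graphite, ...)
        return value;
    }
}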
List of Contributors # The following 95 people contributed to this release:
Abdullah Ozturk Ajay Bhat Alexey Savartsov Aljoscha Krettek Andrea Sella Andrew Palumbo Chenguang He Chiwan Park David Moravek Dominik Bruhn Dyana Rose Fabian Hueske Flavio Pompermaier Gabor Gevay Gabor Horvath Geoffrey Mon Gordon Tai Greg Hogan Gyula Fora Henry Saputra Ignacio N. Lucero Ascencio Igor Berman Ismaël Mejía Ivan Mushketyk Jark Wu Jiri Simsa Jonas Traub Josh Joshi Joshua Herman Ken Krugler Konstantin Knauf Lasse Dalegaard Li Fanxi MaBiao Mao Wei Mark Reddy Martin Junghanns Martin Liesenberg Maximilian Michels Michal Fijolek Márton Balassi Nathan Howell Niels Basjes Niels Zeilemaker Phetsarath, Sourigna Robert Metzger Scott Kidder Sebastian Klemke Shahin Shannon Carey Shannon Quinn Stefan Richter Stefano Baghino Stefano Bortoli Stephan Ewen Steve Cosenza Sumit Chawla Tatu Saloranta Tianji Li Till Rohrmann Todd Lisonbee Tony Baines Trevor Grant Ufuk Celebi Vasudevan Yijie Shen Zack Pierce Zhai Jia chengxiang li chobeat danielblazevski dawid dawidwys eastcirclek erli ding gallenvara kl0u mans2singh markreddy mjsax nikste omaralvarez philippgrulich ramkrishna sahitya-pavurala samaitra smarthi spkavuly subhankar twalthr vasia xueyan.li zentol 卫乐 `}),e.add({id:234,href:"/2016/08/04/flink-1.1.1-released/",title:"Flink 1.1.1 Released",section:"Flink Blog",content:`Today, the Flink community released Flink version 1.1.1.
The Maven artifacts published on Maven central for 1.1.0 had a Hadoop dependency issue: No Hadoop 1 specific version (with version 1.1.0-hadoop1) was deployed and 1.1.0 artifacts have a dependency on Hadoop 1 instead of Hadoop 2.
This was fixed in this release, and we highly recommend that all users use this version of Flink by bumping their Flink dependencies to version 1.1.1:
&lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-java&lt;/artifactId&gt; &lt;version&gt;1.1.1&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-streaming-java_2.10&lt;/artifactId&gt; &lt;version&gt;1.1.1&lt;/version&gt; &lt;/dependency&gt; &lt;dependency&gt; &lt;groupId&gt;org.apache.flink&lt;/groupId&gt; &lt;artifactId&gt;flink-clients_2.10&lt;/artifactId&gt; &lt;version&gt;1.1.1&lt;/version&gt; &lt;/dependency&gt; You can find the binaries on the updated Downloads page.
`}),e.add({id:235,href:"/2016/05/24/stream-processing-for-everyone-with-sql-and-apache-flink/",title:"Stream Processing for Everyone with SQL and Apache Flink",section:"Flink Blog",content:`The capabilities of open source systems for distributed stream processing have evolved significantly over the last years. Initially, the first systems in the field (notably Apache Storm) provided low latency processing, but were limited to at-least-once guarantees, processing-time semantics, and rather low-level APIs. Since then, several new systems emerged and pushed the state of the art of open source stream processing in several dimensions. Today, users of Apache Flink or Apache Beam can use fluent Scala and Java APIs to implement stream processing jobs that operate in event-time with exactly-once semantics at high throughput and low latency.
In the meantime, stream processing has taken off in the industry. We are witnessing a rapidly growing interest in stream processing, which is reflected by widespread deployments of stream processing infrastructure such as Apache Kafka and Apache Flink. The increasing number of available data streams results in a demand for people who can analyze streaming data and turn it into real-time insights. However, stream data analysis requires a special skill set, including knowledge of streaming concepts such as the characteristics of unbounded streams, windows, time, and state, as well as the skills to implement stream analysis jobs, usually against Java or Scala APIs. People with this skill set are rare and hard to find.
About six months ago, the Apache Flink community started an effort to add a SQL interface for stream data analysis. SQL is the standard language to access and process data. Everybody who occasionally analyzes data is familiar with SQL. Consequently, a SQL interface for stream data processing will make this technology accessible to a much wider audience. Moreover, SQL support for streaming data will also enable new use cases such as interactive and ad-hoc stream analysis and significantly simplify many applications including stream ingestion and simple transformations. In this blog post, we report on the current status, architectural design, and future plans of the Apache Flink community to implement support for SQL as a language for analyzing data streams.
Where did we come from? # With the 0.9.0-milestone1 release, Apache Flink added an API to process relational data with SQL-like expressions called the Table API. The central concept of this API is a Table, a structured data set or stream on which relational operations can be applied. The Table API is tightly integrated with the DataSet and DataStream API. A Table can be easily created from a DataSet or DataStream and can also be converted back into a DataSet or DataStream as the following example shows
val execEnv = ExecutionEnvironment.getExecutionEnvironment val tableEnv = TableEnvironment.getTableEnvironment(execEnv) // obtain a DataSet from somewhere val tempData: DataSet[(String, Long, Double)] = // convert the DataSet to a Table val tempTable: Table = tempData.toTable(tableEnv, &#39;location, &#39;time, &#39;tempF) // compute your result val avgTempCTable: Table = tempTable .where(&#39;location.like(&#34;room%&#34;)) .select( (&#39;time / (3600 * 24)) as &#39;day, &#39;Location as &#39;room, ((&#39;tempF - 32) * 0.556) as &#39;tempC ) .groupBy(&#39;day, &#39;room) .select(&#39;day, &#39;room, &#39;tempC.avg as &#39;avgTempC) // convert result Table back into a DataSet and print it avgTempCTable.toDataSet[Row].print() Although the example shows Scala code, there is also an equivalent Java version of the Table API. The following picture depicts the original architecture of the Table API.
A Table is created from a DataSet or DataStream and transformed into a new Table by applying relational transformations such as filter, join, or select on them. Internally, a logical table operator tree is constructed from the applied Table transformations. When a Table is translated back into a DataSet or DataStream, the respective translator translates the logical operator tree into DataSet or DataStream operators. Expressions like 'location.like(&quot;room%&quot;) are compiled into Flink functions via code generation.
However, the original Table API had a few limitations. First of all, it could not stand alone. Table API queries had to be always embedded into a DataSet or DataStream program. Queries against batch Tables did not support outer joins, sorting, and many scalar functions which are commonly used in SQL queries. Queries against streaming tables only supported filters, union, and projections and no aggregations or joins. Also, the translation process did not leverage query optimization techniques except for the physical optimization that is applied to all DataSet programs.
Table API joining forces with SQL # The discussion about adding support for SQL came up a few times in the Flink community. With Flink 0.9 and the availability of the Table API, code generation for relational expressions, and runtime operators, the foundation for such an extension seemed to be there and SQL support the next logical step. On the other hand, the community was also well aware of the multitude of dedicated &ldquo;SQL-on-Hadoop&rdquo; solutions in the open source landscape (Apache Hive, Apache Drill, Apache Impala, Apache Tajo, just to name a few). Given these alternatives, we figured that time would be better spent improving Flink in other ways than implementing yet another SQL-on-Hadoop solution.
However, with the growing popularity of stream processing and the increasing adoption of Flink in this area, the Flink community saw the need for a simpler API to enable more users to analyze streaming data. About half a year ago, we decided to take the Table API to the next level, extend the stream processing capabilities of the Table API, and add support for SQL on streaming data. What we came up with was a revised architecture for a Table API that supports SQL (and Table API) queries on streaming and static data sources. We did not want to reinvent the wheel and decided to build the new Table API on top of Apache Calcite, a popular SQL parser and optimizer framework. Apache Calcite is used by many projects including Apache Hive, Apache Drill, Cascading, and many more. Moreover, the Calcite community put SQL on streams on their roadmap which makes it a perfect fit for Flink&rsquo;s SQL interface.
Calcite is central in the new design as the following architecture sketch shows:
The new architecture features two integrated APIs to specify relational queries, the Table API and SQL. Queries of both APIs are validated against a catalog of registered tables and converted into Calcite&rsquo;s representation for logical plans. In this representation, stream and batch queries look exactly the same. Next, Calcite&rsquo;s cost-based optimizer applies transformation rules and optimizes the logical plans. Depending on the nature of the sources (streaming or static) we use different rule sets. Finally, the optimized plan is translated into a regular Flink DataStream or DataSet program. This step involves again code generation to compile relational expressions into Flink functions.
The new architecture of the Table API maintains the basic principles of the original Table API and improves it. It keeps a uniform interface for relational queries on streaming and static data. In addition, we take advantage of Calcite&rsquo;s query optimization framework and SQL parser. The design builds upon Flink&rsquo;s established APIs, i.e., the DataStream API that offers low-latency, high-throughput stream processing with exactly-once semantics and consistent results due to event-time processing, and the DataSet API with robust and efficient in-memory operators and pipelined data exchange. Any improvements to Flink&rsquo;s core APIs and engine will automatically improve the execution of Table API and SQL queries.
With this effort, we are adding SQL support for both streaming and static data to Flink. However, we do not want to see this as a competing solution to dedicated, high-performance SQL-on-Hadoop solutions, such as Impala, Drill, and Hive. Instead, we see the sweet spot of Flink&rsquo;s SQL integration primarily in providing access to streaming analytics to a wider audience. In addition, it will facilitate integrated applications that use Flink&rsquo;s API&rsquo;s as well as SQL while being executed on a single runtime engine.
What will Flink&rsquo;s SQL on streams look like? # So far we discussed the motivation for and architecture of Flink&rsquo;s stream SQL interface, but what will it actually look like? The new SQL interface is integrated into the Table API. DataStreams, DataSets, and external data sources can be registered as tables at the TableEnvironment in order to make them queryable with SQL. The TableEnvironment.sql() method states a SQL query and returns its result as a Table. The following example shows a complete program that reads a streaming table from a JSON encoded Kafka topic, processes it with a SQL query and writes the resulting stream into another Kafka topic. Please note that the KafkaJsonSource and KafkaJsonSink are under development and not available yet. In the future, TableSources and TableSinks can be persisted to and loaded from files to ease reuse of source and sink definitions and to reduce boilerplate code.
// get environments val execEnv = StreamExecutionEnvironment.getExecutionEnvironment val tableEnv = TableEnvironment.getTableEnvironment(execEnv) // configure Kafka connection val kafkaProps = ... // define a JSON encoded Kafka topic as external table val sensorSource = new KafkaJsonSource[(String, Long, Double)]( &#34;sensorTopic&#34;, kafkaProps, (&#34;location&#34;, &#34;time&#34;, &#34;tempF&#34;)) // register external table tableEnv.registerTableSource(&#34;sensorData&#34;, sensorSource) // define query in external table val roomSensors: Table = tableEnv.sql( &#34;SELECT STREAM time, location AS room, (tempF - 32) * 0.556 AS tempC &#34; + &#34;FROM sensorData &#34; + &#34;WHERE location LIKE &#39;room%&#39;&#34; ) // define a JSON encoded Kafka topic as external sink val roomSensorSink = new KafkaJsonSink(...) // define sink for room sensor data and execute query roomSensors.toSink(roomSensorSink) execEnv.execute() You might have noticed that this example left out the most interesting aspects of stream data processing: window aggregates and joins. How will these operations be expressed in SQL? Well, that is a very good question. The Apache Calcite community put out an excellent proposal that discusses the syntax and semantics of SQL on streams. It describes Calcite’s stream SQL as &ldquo;an extension to standard SQL, not another ‘SQL-like’ language&rdquo;. This has several benefits. First, people who are familiar with standard SQL will be able to analyze data streams without learning a new syntax. Queries on static tables and streams are (almost) identical and can be easily ported. Moreover it is possible to specify queries that reference static and streaming tables at the same time which goes well together with Flink’s vision to handle batch processing as a special case of stream processing, i.e., as processing finite streams. Finally, using standard SQL for stream data analysis means following a well established standard that is supported by many tools.
Although we haven’t completely fleshed out the details of how windows will be defined in Flink’s SQL syntax and Table API, the following examples show how a tumbling window query could look in SQL and the Table API.
SQL (following the syntax proposal of Calcite’s streaming SQL document) # SELECT STREAM TUMBLE_END(time, INTERVAL &#39;1&#39; DAY) AS day, location AS room, AVG((tempF - 32) * 0.556) AS avgTempC FROM sensorData WHERE location LIKE &#39;room%&#39; GROUP BY TUMBLE(time, INTERVAL &#39;1&#39; DAY), location Table API # val avgRoomTemp: Table = tableEnv.ingest(&#34;sensorData&#34;) .where(&#39;location.like(&#34;room%&#34;)) .partitionBy(&#39;location) .window(Tumbling every Days(1) on &#39;time as &#39;w) .select(&#39;w.end, &#39;location, ((&#39;tempF - 32) * 0.556).avg as &#39;avgTempCs) What&rsquo;s up next? # The Flink community is actively working on SQL support for the next minor version, Flink 1.1.0. In the first version, SQL (and Table API) queries on streams will be limited to selection, filter, and union operators. Compared to Flink 1.0.0, the revised Table API will support many more scalar functions and be able to read tables from external sources and write them back to external sinks. A lot of work went into reworking the architecture of the Table API and integrating Apache Calcite.
In Flink 1.2.0, the feature set of SQL on streams will be significantly extended. Among other things, we plan to support different types of window aggregates and maybe also streaming joins. For this effort, we want to closely collaborate with the Apache Calcite community and help extending Calcite&rsquo;s support for relational operations on streaming data when necessary.
If this post made you curious and you want to try out Flink’s SQL interface and the new Table API, we encourage you to do so! Simply clone the SNAPSHOT master branch and check out the Table API documentation for the SNAPSHOT version. Please note that the branch is under heavy development, and hence some code examples in this blog post might not work. We are looking forward to your feedback and welcome contributions.
`}),e.add({id:236,href:"/2016/05/11/flink-1.0.3-released/",title:"Flink 1.0.3 Released",section:"Flink Blog",content:`Today, the Flink community released Flink version 1.0.3, the third bugfix release of the 1.0 series.
We recommend that all users update to this release by bumping the version of their Flink dependencies to 1.0.3 and updating the binaries on the server. You can find the binaries on the updated Downloads page.
Fixed Issues # Bug # [FLINK-3790] [streaming] Use proper hadoop config in rolling sink [FLINK-3840] Remove Testing Files in RocksDB Backend [FLINK-3835] [optimizer] Add input id to JSON plan to resolve ambiguous input names [hotfix] OptionSerializer.duplicate to respect stateful element serializer [FLINK-3803] [runtime] Pass CheckpointStatsTracker to ExecutionGraph [hotfix] [cep] Make cep window border treatment consistent Improvement # [FLINK-3678] [dist, docs] Make Flink logs directory configurable Docs # [docs] Add note about S3AFileSystem &lsquo;buffer.dir&rsquo; property [docs] Update AWS S3 docs Tests # [FLINK-3860] [connector-wikiedits] Add retry loop to WikipediaEditsSourceTest [streaming-contrib] Fix port clash in DbStateBackend tests `}),e.add({id:237,href:"/2016/04/22/flink-1.0.2-released/",title:"Flink 1.0.2 Released",section:"Flink Blog",content:`Today, the Flink community released Flink version 1.0.2, the second bugfix release of the 1.0 series.
We recommend that all users update to this release by bumping the version of their Flink dependencies to 1.0.2 and updating the binaries on the server. You can find the binaries on the updated Downloads page.
Fixed Issues # Bug # [FLINK-3657] [dataSet] Change access of DataSetUtils.countElements() to &lsquo;public&rsquo; [FLINK-3762] [core] Enable Kryo reference tracking [FLINK-3732] [core] Fix potential null deference in ExecutionConfig#equals() [FLINK-3760] Fix StateDescriptor.readObject [FLINK-3730] Fix RocksDB Local Directory Initialization [FLINK-3712] Make all dynamic properties available to the CLI frontend [FLINK-3688] WindowOperator.trigger() does not emit Watermark anymore [FLINK-3697] Properly access type information for nested POJO key selection Improvement # [FLINK-3654] Disable Write-Ahead-Log in RocksDB State Docs # [FLINK-2544] [docs] Add Java 8 version for building PowerMock tests to docs [FLINK-3469] [docs] Improve documentation for grouping keys [FLINK-3634] [docs] Fix documentation for DataSetUtils.zipWithUniqueId() [FLINK-3711][docs] Documentation of Scala fold()() uses correct syntax Tests # [FLINK-3716] [kafka consumer] Decreasing socket timeout so testFailOnNoBroker() will pass before JUnit timeout `}),e.add({id:238,href:"/2016/04/14/flink-forward-2016-call-for-submissions-is-now-open/",title:"Flink Forward 2016 Call for Submissions Is Now Open",section:"Flink Blog",content:`We are happy to announce that the call for submissions for Flink Forward 2016 is now open! The conference will take place September 12-14, 2016 in Berlin, Germany, bringing together the open source stream processing community. Most Apache Flink committers will attend the conference, making it the ideal venue to learn more about the project and its roadmap and connect with the community.
The conference welcomes submissions on everything Flink-related, including experiences with using Flink, products based on Flink, technical talks on extending Flink, as well as connecting Flink with other open source or proprietary software.
Read more here.
`}),e.add({id:239,href:"/2016/04/06/introducing-complex-event-processing-cep-with-apache-flink/",title:"Introducing Complex Event Processing (CEP) with Apache Flink",section:"Flink Blog",content:`With the ubiquity of sensor networks and smart devices continuously collecting more and more data, we face the challenge to analyze an ever growing stream of data in near real-time. Being able to react quickly to changing trends or to deliver up to date business intelligence can be a decisive factor for a company’s success or failure. A key problem in real time processing is the detection of event patterns in data streams.
Complex event processing (CEP) addresses exactly this problem of matching continuously incoming events against a pattern. The result of a match is usually one or more complex events derived from the input events. In contrast to traditional DBMSs where a query is executed on stored data, CEP executes data on a stored query. All data which is not relevant for the query can be immediately discarded. The advantages of this approach are obvious, given that CEP queries are applied on a potentially infinite stream of data. Furthermore, inputs are processed immediately. Once the system has seen all events for a matching sequence, results are emitted straight away. This aspect effectively leads to CEP’s real-time analytics capability.
Consequently, CEP’s processing paradigm drew significant interest and found application in a wide variety of use cases. Most notably, CEP is used nowadays for financial applications such as stock market trend and credit card fraud detection. Moreover, it is used in RFID-based tracking and monitoring, for example, to detect thefts in a warehouse where items are not properly checked out. CEP can also be used to detect network intrusion by specifying patterns of suspicious user behaviour.
Apache Flink with its true streaming nature and its capabilities for low latency as well as high throughput stream processing is a natural fit for CEP workloads. Consequently, the Flink community has introduced the first version of a new CEP library with Flink 1.0. In the remainder of this blog post, we introduce Flink’s CEP library and we illustrate its ease of use through the example of monitoring a data center.
Monitoring and alert generation for data centers # Assume we have a data center with a number of racks. For each rack the power consumption and the temperature are monitored. Whenever such a measurement takes place, a new power or temperature event is generated, respectively. Based on this monitoring event stream, we want to detect racks that are about to overheat, and dynamically adapt their workload and cooling.
For this scenario we use a two-stage approach. First, we monitor the temperature events. Whenever we see two consecutive events whose temperature exceeds a threshold value, we generate a temperature warning with the current average temperature. A temperature warning does not necessarily indicate that a rack is about to overheat. But whenever we see two consecutive warnings with increasing temperatures, we want to issue an alert for this rack. This alert can then lead to countermeasures to cool the rack.
Implementation with Apache Flink # First, we define the messages of the incoming monitoring event stream. Every monitoring message contains its originating rack ID. The temperature event additionally contains the current temperature and the power consumption event contains the current voltage. We model the events as POJOs:
public abstract class MonitoringEvent { private int rackID; ... } public class TemperatureEvent extends MonitoringEvent { private double temperature; ... } public class PowerEvent extends MonitoringEvent { private double voltage; ... } Now we can ingest the monitoring event stream using one of Flink’s connectors (e.g. Kafka, RabbitMQ, etc.). This will give us a DataStream&lt;MonitoringEvent&gt; inputEventStream which we will use as the input for Flink’s CEP operator. But first, we have to define the event pattern to detect temperature warnings. The CEP library offers an intuitive Pattern API to easily define these complex patterns.
Every pattern consists of a sequence of events which can have optional filter conditions assigned. A pattern always starts with a first event to which we will assign the name “First Event”.
Pattern.&lt;MonitoringEvent&gt;begin(&#34;First Event&#34;); This pattern will match every monitoring event. Since we are only interested in TemperatureEvents whose temperature is above a threshold value, we have to add an additional subtype constraint and a where clause:
Pattern.&lt;MonitoringEvent&gt;begin(&#34;First Event&#34;) .subtype(TemperatureEvent.class) .where(evt -&gt; evt.getTemperature() &gt;= TEMPERATURE_THRESHOLD); As stated before, we want to generate a TemperatureWarning if and only if we see two consecutive TemperatureEvents for the same rack whose temperatures are too high. The Pattern API offers the next call which allows us to add a new event to our pattern. This event has to directly follow the first matching event in order for the whole pattern to match.
Pattern&lt;MonitoringEvent, ?&gt; warningPattern = Pattern.&lt;MonitoringEvent&gt;begin(&#34;First Event&#34;) .subtype(TemperatureEvent.class) .where(evt -&gt; evt.getTemperature() &gt;= TEMPERATURE_THRESHOLD) .next(&#34;Second Event&#34;) .subtype(TemperatureEvent.class) .where(evt -&gt; evt.getTemperature() &gt;= TEMPERATURE_THRESHOLD) .within(Time.seconds(10)); The final pattern definition also contains the within API call which defines that two consecutive TemperatureEvents have to occur within a time interval of 10 seconds for the pattern to match. Depending on the time characteristic setting, this can either be processing, ingestion or event time.
Having defined the event pattern, we can now apply it on the inputEventStream.
PatternStream&lt;MonitoringEvent&gt; tempPatternStream = CEP.pattern( inputEventStream.keyBy(&#34;rackID&#34;), warningPattern); Since we want to generate our warnings for each rack individually, we keyBy the input event stream by the “rackID” POJO field. This enforces that matching events of our pattern will all have the same rack ID.
The PatternStream&lt;MonitoringEvent&gt; gives us access to successfully matched event sequences. They can be accessed using the select API call. The select API call takes a PatternSelectFunction which is called for every matching event sequence. The event sequence is provided as a Map&lt;String, MonitoringEvent&gt; where each MonitoringEvent is identified by its assigned event name. Our pattern select function generates for each matching pattern a TemperatureWarning event.
public class TemperatureWarning { private int rackID; private double averageTemperature; ... } DataStream&lt;TemperatureWarning&gt; warnings = tempPatternStream.select( (Map&lt;String, MonitoringEvent&gt; pattern) -&gt; { TemperatureEvent first = (TemperatureEvent) pattern.get(&#34;First Event&#34;); TemperatureEvent second = (TemperatureEvent) pattern.get(&#34;Second Event&#34;); return new TemperatureWarning( first.getRackID(), (first.getTemperature() + second.getTemperature()) / 2); } ); Now we have generated a new complex event stream DataStream&lt;TemperatureWarning&gt; warnings from the initial monitoring event stream. This complex event stream can again be used as the input for another round of complex event processing. We use the TemperatureWarnings to generate TemperatureAlerts whenever we see two consecutive TemperatureWarnings for the same rack with increasing temperatures. The TemperatureAlerts have the following definition:
public class TemperatureAlert { private int rackID; ... } At first, we have to define our alert event pattern:
Pattern&lt;TemperatureWarning, ?&gt; alertPattern = Pattern.&lt;TemperatureWarning&gt;begin(&#34;First Event&#34;) .next(&#34;Second Event&#34;) .within(Time.seconds(20)); This definition says that we want to see two TemperatureWarnings within 20 seconds. The first event has the name “First Event” and the second consecutive event has the name “Second Event”. The individual events don’t have a where clause assigned, because we need access to both events in order to decide whether the temperature is increasing. Therefore, we apply the filter condition in the select clause. But first, we obtain again a PatternStream.
PatternStream&lt;TemperatureWarning&gt; alertPatternStream = CEP.pattern( warnings.keyBy(&#34;rackID&#34;), alertPattern); Again, we keyBy the warnings input stream by the &quot;rackID&quot; so that we generate our alerts for each rack individually. Next, we apply the flatSelect method which will give us access to matching event sequences and allows us to output an arbitrary number of complex events. Thus, we will generate a TemperatureAlert if and only if the temperature is increasing.
DataStream&lt;TemperatureAlert&gt; alerts = alertPatternStream.flatSelect( (Map&lt;String, TemperatureWarning&gt; pattern, Collector&lt;TemperatureAlert&gt; out) -&gt; { TemperatureWarning first = pattern.get(&#34;First Event&#34;); TemperatureWarning second = pattern.get(&#34;Second Event&#34;); if (first.getAverageTemperature() &lt; second.getAverageTemperature()) { out.collect(new TemperatureAlert(first.getRackID())); } }); The DataStream&lt;TemperatureAlert&gt; alerts is the data stream of temperature alerts for each rack. Based on these alerts we can now adapt the workload or cooling for overheating racks.
The full source code for the presented example, as well as an example data source which randomly generates monitoring events, can be found in this repository.
Conclusion # In this blog post we have seen how easy it is to reason about event streams using Flink’s CEP library. Using the example of monitoring and alert generation for a data center, we have implemented a short program which notifies us when a rack is about to overheat and potentially fail.
In the future, the Flink community will further extend the CEP library’s functionality and expressiveness. Next on the road map is support for a regular expression-like pattern specification, including Kleene star, lower and upper bounds, and negation. Furthermore, it is planned to allow the where-clause to access fields of previously matched events. This feature will make it possible to prune unpromising event sequences early.
Note: The example code requires Flink 1.0.1 or higher.
`}),e.add({id:240,href:"/2016/04/06/flink-1.0.1-released/",title:"Flink 1.0.1 Released",section:"Flink Blog",content:`Today, the Flink community released Flink version 1.0.1, the first bugfix release of the 1.0 series.
We recommend that all users update to this release by bumping the version of their Flink dependencies to 1.0.1 and updating the binaries on the server. You can find the binaries on the updated Downloads page.
Fixed Issues # Bug [FLINK-3179] - Combiner is not injected if Reduce or GroupReduce input is explicitly partitioned [FLINK-3472] - JDBCInputFormat.nextRecord(..) has misleading message on NPE [FLINK-3491] - HDFSCopyUtilitiesTest fails on Windows [FLINK-3495] - RocksDB Tests can&#39;t run on Windows [FLINK-3533] - Update the Gelly docs wrt examples and cluster execution [FLINK-3563] - .returns() doesn&#39;t compile when using .map() with a custom MapFunction [FLINK-3566] - Input type validation often fails on custom TypeInfo implementations [FLINK-3578] - Scala DataStream API does not support Rich Window Functions [FLINK-3595] - Kafka09 consumer thread does not interrupt when stuck in record emission [FLINK-3602] - Recursive Types are not supported / crash TypeExtractor [FLINK-3621] - Misleading documentation of memory configuration parameters [FLINK-3629] - In wikiedits Quick Start example, &quot;The first call, .window()&quot; should be &quot;The first call, .timeWindow()&quot; [FLINK-3651] - Fix faulty RollingSink Restore [FLINK-3653] - recovery.zookeeper.storageDir is not documented on the configuration page [FLINK-3663] - FlinkKafkaConsumerBase.logPartitionInfo is missing a log marker [FLINK-3681] - CEP library does not support Java 8 lambdas as select function [FLINK-3682] - CEP operator does not set the processing timestamp correctly [FLINK-3684] - CEP operator does not forward watermarks properly Improvement [FLINK-3570] - Replace random NIC selection heuristic by InetAddress.getLocalHost [FLINK-3575] - Update Working With State Section in Doc [FLINK-3591] - Replace Quickstart K-Means Example by Streaming Example Test [FLINK-2444] - Add tests for HadoopInputFormats [FLINK-2445] - Add tests for HadoopOutputFormats `}),e.add({id:241,href:"/2016/03/08/announcing-apache-flink-1.0.0/",title:"Announcing Apache Flink 1.0.0",section:"Flink Blog",content:`The Apache Flink community is pleased to announce the availability of the 1.0.0 release. The community put significant effort into improving and extending Apache Flink since the last release, focusing on improving the experience of writing and executing data stream processing pipelines in production.
Flink version 1.0.0 marks the beginning of the 1.X.X series of releases, which will maintain backwards compatibility with 1.0.0. This means that applications written against stable APIs of Flink 1.0.0 will compile and run with all Flink versions in the 1.x series. This is the first time we are formally guaranteeing compatibility in Flink&rsquo;s history, and we therefore see this release as a major milestone of the project, perhaps the most important since graduation as a top-level project.
Apart from backwards compatibility, Flink 1.0.0 brings a variety of new user-facing features, as well as tons of bug fixes. About 64 contributors provided bug fixes, improvements, and new features, resolving more than 450 JIRA issues in total.
We encourage everyone to download the release and check out the documentation. Feedback through the Flink mailing lists is, as always, very welcome!
Interface stability annotations # Flink 1.0.0 introduces interface stability annotations for API classes and methods. Interfaces defined as @Public are guaranteed to remain stable across all releases of the 1.x series. The @PublicEvolving annotation marks API features that may be subject to change in future versions.
Flink&rsquo;s stability annotations will help users to implement applications that compile and execute unchanged against future versions of Flink 1.x. This greatly reduces the complexity for users when upgrading to a newer Flink release.
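As an illustration of how the annotations read in code (the class below is hypothetical and not part of Flink&rsquo;s API):
import org.apache.flink.annotation.Public;
import org.apache.flink.annotation.PublicEvolving;

@Public
public class ExampleApi {

    // stable: guaranteed to keep compiling and behaving the same across all 1.x releases
    public void stableMethod() {
    }

    // evolving: may still change in a future 1.x version
    @PublicEvolving
    public void evolvingMethod() {
    }
}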
Out-of-core state support # Flink 1.0.0 adds a new state backend that uses RocksDB to store state (both windows and user-defined key-value state). RocksDB is an embedded key/value store, originally developed by Facebook. When using this backend, active state in streaming programs can grow well beyond memory. The RocksDB files are stored in a distributed file system such as HDFS or S3 for backups.
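A minimal sketch of enabling the new backend, assuming a placeholder HDFS path for checkpoint data and a 5 second checkpoint interval:
public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // working state lives in local RocksDB files; checkpoint data goes to the given file system URI
    env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints"));

    // draw a checkpoint of all state every 5 seconds
    env.enableCheckpointing(5000);

    // ... define sources, stateful operators, and sinks, then call env.execute() ...
}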
Savepoints and version upgrades # Savepoints are checkpoints of the state of a running streaming job that can be manually triggered by the user while the job is running. Savepoints solve several production headaches, including code upgrades (both application and framework), cluster maintenance and migration, A/B testing and what-if scenarios, as well as testing and debugging. Read more about savepoints at the data Artisans blog.
Library for Complex Event Processing (CEP) # Complex Event Processing is one of the oldest and most important use cases of stream processing. The new CEP functionality in Flink allows you to use a distributed general-purpose stream processor instead of a specialized CEP system to detect complex patterns in event streams. Get started with CEP on Flink.
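As a small taste of the Pattern API, here is a hedged sketch of detecting three consecutive failed logins within one minute (LoginEvent is a hypothetical POJO with userId and success fields, not a Flink class):
Pattern<LoginEvent, ?> threeFailures = Pattern.<LoginEvent>begin("first")
    .where(evt -> !evt.isSuccess())
    .next("second")
    .where(evt -> !evt.isSuccess())
    .next("third")
    .where(evt -> !evt.isSuccess())
    .within(Time.minutes(1));

// logins is a DataStream<LoginEvent>; keying by user ensures the pattern is matched per user
PatternStream<LoginEvent> suspicious = CEP.pattern(logins.keyBy("userId"), threeFailures);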
Enhanced monitoring interface: job submission, checkpoint statistics and backpressure monitoring # The web interface now allows users to submit jobs. Previous Flink releases had a separate service for submitting jobs. The new interface is part of the JobManager frontend. It also works on YARN now.
Backpressure monitoring allows users to trigger a sampling mechanism which analyzes the time operators are waiting for new network buffers. When senders are spending most of their time for new network buffers, they are experiencing backpressure from their downstream operators. Many users requested this feature for understanding bottlenecks in both batch and streaming applications.
Improved checkpointing control and monitoring # Checkpointing has been extended with a more fine-grained control mechanism: in previous versions, new checkpoints were triggered independently of the speed at which old checkpoints completed. This could lead to situations where new checkpoints pile up because they are triggered too frequently.
The checkpoint coordinator now exposes statistics through our REST monitoring API and the web interface. Users can review the checkpoint size and duration on a per-operator basis and see the last completed checkpoints. This is helpful for identifying performance issues, such as processing slowdown by the checkpoints.
Improved Kafka connector and support for Kafka 0.9 # Flink 1.0 supports both Kafka 0.8 and 0.9. With the new release, Flink exposes Kafka metrics for the producers and the 0.9 consumer through Flink’s accumulator system. We also enhanced the existing connector for Kafka 0.8, allowing users to subscribe to multiple topics in one source.
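For example, a rough sketch of subscribing a single Kafka 0.8 source to several topics (broker addresses, topic names, and the group id below are placeholders):
Properties props = new Properties();
props.setProperty("bootstrap.servers", "broker-1:9092");
props.setProperty("zookeeper.connect", "zookeeper-1:2181");
props.setProperty("group.id", "flink-consumer");

// env is the StreamExecutionEnvironment of the job
DataStream<String> stream = env.addSource(
    new FlinkKafkaConsumer08<>(
        Arrays.asList("clicks", "impressions"), // one consumer instance, multiple topics
        new SimpleStringSchema(),
        props));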
Changelog and known issues # This release resolves more than 450 issues, including bug fixes, improvements, and new features. See the complete changelog and known issues.
List of contributors # Abhishek Agarwal Ajay Bhat Aljoscha Krettek Andra Lungu Andrea Sella Chesnay Schepler Chiwan Park Daniel Pape Fabian Hueske Filipe Correia Frederick F. Kautz IV Gabor Gevay Gabor Horvath Georgios Andrianakis Greg Hogan Gyula Fora Henry Saputra Hilmi Yildirim Hubert Czerpak Jark Wu Johannes Jun Aoki Jun Aoki Kostas Kloudas Li Chengxiang Lun Gao Martin Junghanns Martin Liesenberg Matthias J. Sax Maximilian Michels Márton Balassi Nick Dimiduk Niels Basjes Omer Katz Paris Carbone Patrice Freydiere Peter Vandenabeele Piotr Godek Prez Cannady Robert Metzger Romeo Kienzler Sachin Goel Saumitra Shahapure Sebastian Klemke Stefano Baghino Stephan Ewen Stephen Samuel Subhobrata Dey Suneel Marthi Ted Yu Theodore Vasiloudis Till Rohrmann Timo Walther Trevor Grant Ufuk Celebi Ulf Karlsson Vasia Kalavri fversaci madhukar qingmeng.wyh ramkrishna rtudoran sahitya-pavurala zhangminglei `}),e.add({id:242,href:"/2016/02/11/flink-0.10.2-released/",title:"Flink 0.10.2 Released",section:"Flink Blog",content:`Today, the Flink community released Flink version 0.10.2, the second bugfix release of the 0.10 series.
We recommend that all users update to this release by bumping the version of their Flink dependencies to 0.10.2 and updating the binaries on the server.
Issues fixed # FLINK-3242: Adjust StateBackendITCase for 0.10 signatures of state backends FLINK-3236: Flink user code classloader as parent classloader from Flink core classes FLINK-2962: Cluster startup script refers to unused variable FLINK-3151: Downgrade to Netty version 4.0.27.Final FLINK-3224: Call setInputType() on output formats that implement InputTypeConfigurable FLINK-3218: Fix overriding of user parameters when merging Hadoop configurations FLINK-3189: Fix argument parsing of CLI client INFO action FLINK-3176: Improve documentation for window apply FLINK-3185: Log error on failure during recovery FLINK-3185: Don&rsquo;t swallow test failure Exception FLINK-3147: Expose HadoopOutputFormatBase fields as protected FLINK-3145: Pin Kryo version of transitive dependencies FLINK-3143: Update Closure Cleaner&rsquo;s ASM references to ASM5 FLINK-3136: Fix shaded imports in ClosureCleaner.scala FLINK-3108: JoinOperator&rsquo;s with() calls the wrong TypeExtractor method FLINK-3125: Web server starts also when JobManager log files cannot be accessed. FLINK-3080: Relax restrictions of DataStream.union() FLINK-3081: Properly stop periodic Kafka committer FLINK-3082: Fixed confusing error about an interface that no longer exists FLINK-3067: Enforce zkclient 0.7 for Kafka FLINK-3020: Set number of task slots to maximum parallelism in local execution `}),e.add({id:243,href:"/2015/12/18/flink-2015-a-year-in-review-and-a-lookout-to-2016/",title:"Flink 2015: A year in review, and a lookout to 2016",section:"Flink Blog",content:`With 2015 ending, we thought that this would be good time to reflect on the amazing work done by the Flink community over this past year, and how much this community has grown.
Overall, we have seen Flink grow in terms of functionality from an engine to one of the most complete open-source stream processing frameworks available. The community grew from a relatively small and geographically focused team to a truly global community, and one of the largest big data communities in the Apache Software Foundation.
We will also look at some interesting stats, including that the busiest days for Flink are Mondays (who would have thought :-).
Community growth # Let us start with some simple statistics from Flink&rsquo;s github repository. During 2015, the Flink community doubled in size, from about 75 contributors to over 150. Forks of the repository more than tripled from 160 in February 2015 to 544 in December 2015, and the number of stars of the repository almost tripled from 289 to 813.
Although Flink started out geographically in Berlin, Germany, the community is by now spread all around the globe, with many contributors from North America, Europe, and Asia. A simple search at meetup.com for groups that mention Flink as a focus area reveals 16 meetups around the globe:
Flink Forward 2015 # One of the highlights of the year for Flink was undoubtedly the Flink Forward conference, the first conference on Apache Flink that was held in October in Berlin. More than 250 participants (roughly half based outside Germany where the conference was held) attended more than 33 technical talks from organizations including Google, MongoDB, Bouygues Telecom, NFLabs, Euranova, RedHat, IBM, Huawei, Intel, Ericsson, Capital One, Zalando, Amadeus, the Otto Group, and ResearchGate. If you have not yet watched their talks, check out the slides and videos from Flink Forward.
Media coverage # And of course, interest in Flink was picked up by the tech media. During 2015, articles about Flink appeared in InfoQ, ZDNet, Datanami, Infoworld (including being one of the best open source big data tools of 2015), the Gartner blog, Dataconomy, SDTimes, the MapR blog, KDnuggets, and HadoopSphere.
It is interesting to see that Hadoop Summit EMEA 2016 had a whopping 17 (!) submitted talks that mention Flink in their title and abstract:
Fun with stats: when do committers commit? # To get some deeper insight on what is happening in the Flink community, let us do some analytics on the git log of the project :-) The easiest thing we can do is count the number of commits at the repository in 2015. Running
git log --pretty=oneline --after=1/1/2015 | wc -l on the Flink repository yields a total of 2203 commits in 2015.
To dig deeper, we will use an open source tool called gitstats that will give us some interesting statistics on committer behavior. You can also generate these statistics yourself, and see many more, by following four easy steps:
1. Download gitstats from the project homepage. E.g., on OS X with Homebrew, type brew install --HEAD homebrew/head-only/gitstats 2. Clone the Apache Flink git repository: git clone git@github.com:apache/flink.git 3. Generate the statistics: gitstats flink/ flink-stats/ 4. View all the statistics as an HTML page using your favorite browser (e.g., Chrome): chrome flink-stats/index.html First, we can see a steady growth of lines of code in Flink since the initial Apache incubator project. During 2015, the codebase almost doubled from 500,000 LOC to 900,000 LOC.
It is interesting to see when committers commit. For Flink, Monday afternoons are by far the most popular times to commit to the repository:
Feature timeline # So, what were the major features added to Flink and the Flink ecosystem during 2015? Here is a (non-exhaustive) chronological list:
Roadmap for 2016 # With 2015 coming to a close, the Flink community has already started discussing Flink&rsquo;s roadmap for the future. Some highlights are:
Runtime scaling of streaming jobs: streaming jobs are running forever, and need to react to a changing environment. Runtime scaling means dynamically increasing and decreasing the parallelism of a job to sustain certain SLAs, or react to changing input throughput.
SQL queries for static data sets and streams: building on top of Flink&rsquo;s Table API, users should be able to write SQL queries for static data sets, as well as SQL queries on data streams that continuously produce new results.
Streaming operators backed by managed memory: currently, streaming operator state, such as user-defined state and windows, is backed by JVM heap objects. Moving those to Flink managed memory will add the ability to spill to disk, improve GC efficiency, and give better control over memory utilization.
Library for detecting temporal event patterns: a common use case for stream processing is detecting patterns in an event stream with timestamps. Flink makes this possible with its support for event time, so many of these operators can be surfaced in the form of a library.
Support for Apache Mesos, and resource-dynamic YARN support: support for both Mesos and YARN, including dynamic allocation and release of resources for more resource elasticity (for both batch and stream processing).
Security: encrypt both the messages exchanged between TaskManagers and the JobManager and the connections for data exchange between workers.
More streaming connectors, more runtime metrics, and continuous DataStream API enhancements: add support for more sources and sinks (e.g., Amazon Kinesis, Cassandra, Flume, etc), expose more metrics to the user, and provide continuous improvements to the DataStream API.
If you are interested in these features, we highly encourage you to take a look at the current draft, and join the discussion on the Flink mailing lists.
`}),e.add({id:244,href:"/2015/12/11/storm-compatibility-in-apache-flink-how-to-run-existing-storm-topologies-on-flink/",title:"Storm Compatibility in Apache Flink: How to run existing Storm topologies on Flink",section:"Flink Blog",content:`Apache Storm was one of the first distributed and scalable stream processing systems available in the open source space offering (near) real-time tuple-by-tuple processing semantics. Initially released by the developers at Backtype in 2011 under the Eclipse open-source license, it became popular very quickly. Only shortly afterwards, Twitter acquired Backtype. Since then, Storm has been growing in popularity, is used in production at many big companies, and is the de-facto industry standard for big data stream processing. In 2013, Storm entered the Apache incubator program, followed by its graduation to top-level in 2014.
Apache Flink is a stream processing engine that improves upon older technologies like Storm in several dimensions, including strong consistency guarantees (&ldquo;exactly once&rdquo;), a higher level DataStream API, support for event time and a rich windowing system, as well as superior throughput with competitive low latency.
While Flink offers several technical benefits over Storm, an existing investment in a codebase of applications developed for Storm often makes it difficult to switch engines. For this reason, as part of the Flink 0.10 release, Flink ships with a Storm compatibility package that allows users to:
Run unmodified Storm topologies using Apache Flink benefiting from superior performance. Embed Storm code (spouts and bolts) as operators inside Flink DataStream programs. Only minor code changes are required in order to submit the program to Flink instead of Storm. This minimizes the work for developers to run existing Storm topologies while leveraging Apache Flink’s fast and robust execution engine.
We note that the Storm compatibility package is continuously improving and does not cover the full spectrum of Storm’s API. However, it is powerful enough to cover many use cases.
Executing Storm topologies with Flink # The easiest way to use the Storm compatibility package is by executing a whole Storm topology in Flink. For this, you only need to replace the dependency storm-core by flink-storm in your Storm project and change two lines of code in your original Storm program.
The following example shows a simple Storm-Word-Count-Program that can be executed in Flink. First, the program is assembled the Storm way without any code change to Spouts, Bolts, or the topology itself.
// assemble topology, the Storm way TopologyBuilder builder = new TopologyBuilder(); builder.setSpout("source", new StormFileSpout(inputFilePath)); builder.setBolt("tokenizer", new StormBoltTokenizer()) .shuffleGrouping("source"); builder.setBolt("counter", new StormBoltCounter()) .fieldsGrouping("tokenizer", new Fields("word")); builder.setBolt("sink", new StormBoltFileSink(outputFilePath)) .shuffleGrouping("counter"); In order to execute the topology, we need to translate it to a FlinkTopology and submit it to a local or remote Flink cluster, very similar to submitting the application to a Storm cluster.1
// transform Storm topology to Flink program // replaces: StormTopology topology = builder.createTopology(); FlinkTopology topology = FlinkTopology.createTopology(builder); Config conf = new Config(); if(runLocal) { // use FlinkLocalCluster instead of LocalCluster FlinkLocalCluster cluster = FlinkLocalCluster.getLocalCluster(); cluster.submitTopology("WordCount", conf, topology); } else { // use FlinkSubmitter instead of StormSubmitter FlinkSubmitter.submitTopology("WordCount", conf, topology); } As a shorter Flink-style alternative that replaces the Storm-style submission code, you can also use context-based job execution:
// transform Storm topology to Flink program (as above) FlinkTopology topology = FlinkTopology.createTopology(builder); // executes locally by default or remotely if submitted with Flink's command-line client topology.execute(); After the code is packaged in a jar file (e.g., StormWordCount.jar), it can be easily submitted to Flink via
bin/flink run StormWordCount.jar The Spouts and Bolts, as well as the topology assembly code, are not changed at all! Only the translation and submission steps have to be changed to their Storm-API-compatible Flink counterparts. This allows for minimal code changes and an easy adaptation to Flink.
Embedding Spouts and Bolts in Flink programs # It is also possible to use Spouts and Bolts within a regular Flink DataStream program. The compatibility package provides wrapper classes for Spouts and Bolts which are implemented as a Flink SourceFunction and StreamOperator respectively. Those wrappers automatically translate incoming Flink POJO and TupleXX records into Storm&rsquo;s Tuple type and emitted Values back into either POJOs or TupleXX types for further processing by Flink operators. As Storm is type agnostic, it is required to specify the output type of embedded Spouts/Bolts manually to get a fully typed Flink streaming program.
// use regular Flink streaming environment StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); // use Spout as source DataStream<Tuple1<String>> source = env.addSource(// Flink provided wrapper including original Spout new SpoutWrapper<String>(new FileSpout(localFilePath)), // specify output type manually TypeExtractor.getForObject(new Tuple1<String>(""))); // FileSpout cannot be parallelized DataStream<Tuple1<String>> text = source.setParallelism(1); // further processing with Flink DataStream<Tuple2<String,Integer>> tokens = text.flatMap(new Tokenizer()).keyBy(0); // use Bolt for counting DataStream<Tuple2<String,Integer>> counts = tokens.transform("Counter", // specify output type manually TypeExtractor.getForObject(new Tuple2<String,Integer>("",0)), // Flink provided wrapper including original Bolt new BoltWrapper<String,Tuple2<String,Integer>>(new BoltCounter())); // write result to file via Flink sink counts.writeAsText(outputPath); // start Flink job env.execute("WordCount with Spout source and Bolt counter"); Although some boilerplate code is needed (we plan to address this soon!), the actual embedded Spout and Bolt code can be used unmodified. We also note that the resulting program is fully typed, and type errors will be found by Flink’s type extractor even if the original Spouts and Bolts are not.
Outlook # The Storm compatibility package is currently in beta and undergoes continuous development. We are currently working on providing consistency guarantees for stateful Bolts. Furthermore, we want to provide a better API integration for embedded Spouts and Bolts by providing a &ldquo;StormExecutionEnvironment&rdquo; as a special extension of Flink&rsquo;s StreamExecutionEnvironment. We are also investigating the integration of Storm&rsquo;s higher-level programming API Trident.
Summary # Flink&rsquo;s compatibility package for Storm allows using unmodified Spouts and Bolts within Flink. This enables you to even embed third-party Spouts and Bolts where the source code is not available. While you can embed Spouts/Bolts in a Flink program and mix-and-match them with Flink operators, running whole topologies is the easiest way to get started and can be achieved with almost no code changes.
If you want to try out Flink’s Storm compatibility package, check out our Documentation.
1. We confess, there are three lines changed compared to a Storm project &mdash;because the example covers local and remote execution. ↩
`}),e.add({id:245,href:"/2015/12/04/introducing-stream-windows-in-apache-flink/",title:"Introducing Stream Windows in Apache Flink",section:"Flink Blog",content:`The data analysis space is witnessing an evolution from batch to stream processing for many use cases. Although batch can be handled as a special case of stream processing, analyzing never-ending streaming data often requires a shift in the mindset and comes with its own terminology (for example, “windowing” and “at-least-once”/“exactly-once” processing). This shift and the new terminology can be quite confusing for people who are new to the space of stream processing. Apache Flink is a production-ready stream processor with an easy-to-use yet very expressive API to define advanced stream analysis programs. Flink’s API features very flexible window definitions on data streams, which let it stand out among other open source stream processors.
In this blog post, we discuss the concept of windows for stream processing, present Flink&rsquo;s built-in windows, and explain its support for custom windowing semantics.
What are windows and what are they good for? # Consider the example of a traffic sensor that counts, every 15 seconds, the number of vehicles passing a certain location. The resulting stream could look like this:
If you would like to know how many vehicles passed that location, you would simply sum the individual counts. However, the nature of a sensor stream is that it continuously produces data. Such a stream never ends, and it is not possible to compute a final sum that can be returned. Instead, it is possible to compute rolling sums, i.e., to return an updated sum record for each input event. This would yield a new stream of partial sums.
However, a stream of partial sums might not be what we are looking for, because it constantly updates the count and, even more importantly, some information, such as the variation over time, is lost. Hence, we might want to rephrase our question and ask for the number of cars that pass the location every minute. This requires us to group the elements of the stream into finite sets, each set corresponding to sixty seconds. This operation is called a tumbling window operation.
Tumbling windows discretize a stream into non-overlapping windows. For certain applications it is important that windows are not disjoint, because an application might require smoothed aggregates. For example, we can compute, every thirty seconds, the number of cars that passed in the last minute. Such windows are called sliding windows.
Defining windows on a data stream as discussed before is a non-parallel operation. This is because each element of a stream must be processed by the same window operator that decides which windows the element should be added to. Windows on a full stream are called AllWindows in Flink. For many applications, a data stream needs to be grouped into multiple logical streams on each of which a window operator can be applied. Think for example about a stream of vehicle counts from multiple traffic sensors (instead of only one sensor as in our previous example), where each sensor monitors a different location. By grouping the stream by sensor id, we can compute windowed traffic statistics for each location in parallel. In Flink, we call such partitioned windows simply Windows, as they are the common case for distributed streams. The following figure shows tumbling windows that collect two elements over a stream of (sensorId, count) pair elements.
Generally speaking, a window defines a finite set of elements on an unbounded stream. This set can be based on time (as in our previous examples), element counts, a combination of counts and time, or some custom logic to assign elements to windows. Flink&rsquo;s DataStream API provides concise operators for the most common window operations as well as a generic windowing mechanism that allows users to define very custom windowing logic. In the following we present Flink&rsquo;s time and count windows before discussing its windowing mechanism in detail.
Time Windows # As their name suggests, time windows group stream elements by time. For example, a tumbling time window of one minute collects elements for one minute and applies a function on all elements in the window after one minute passed.
Defining tumbling and sliding time windows in Apache Flink is very easy:
// Stream of (sensorId, carCnt) val vehicleCnts: DataStream[(Int, Int)] = ... val tumblingCnts: DataStream[(Int, Int)] = vehicleCnts // key stream by sensorId .keyBy(0) // tumbling time window of 1 minute length .timeWindow(Time.minutes(1)) // compute sum over carCnt .sum(1) val slidingCnts: DataStream[(Int, Int)] = vehicleCnts .keyBy(0) // sliding time window of 1 minute length and 30 secs trigger interval .timeWindow(Time.minutes(1), Time.seconds(30)) .sum(1) There is one aspect that we haven’t discussed yet, namely the exact meaning of “collects elements for one minute”, which boils down to the question, “How does the stream processor interpret time?”.
Apache Flink features three different notions of time, namely processing time, event time, and ingestion time.
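Which notion applies is selected once on the execution environment. The following is a minimal sketch in the Java DataStream API; the setStreamTimeCharacteristic method and the TimeCharacteristic enum reflect how the API evolved in later releases, so treat the exact names as assumptions rather than as part of this post:

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TimeCharacteristicSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // interpret window boundaries by the timestamps attached to events
        // instead of the wall clock of the processing machines
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
    }
}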
In processing time, windows are defined with respect to the wall clock of the machine that builds and processes a window, i.e., a one minute processing time window collects elements for exactly one minute. In event time, windows are defined with respect to timestamps that are attached to each event record. This is common for many types of events, such as log entries, sensor data, etc., where the timestamp usually represents the time at which the event occurred. Event time has several benefits over processing time. First of all, it decouples the program semantics from the actual serving speed of the source and the processing performance of the system. Hence, you can process historic data, which is served at maximum speed, and continuously produced data with the same program. It also prevents semantically incorrect results in case of backpressure or delays due to failure recovery. Second, event time windows compute correct results, even if events arrive out of order with respect to their timestamps, which is common when a data stream gathers events from distributed sources. Ingestion time is a hybrid of processing and event time. It assigns wall clock timestamps to records as soon as they arrive in the system (at the source) and continues processing with event time semantics based on the attached timestamps. Count Windows # Apache Flink also features count windows. A tumbling count window of 100 will collect 100 events in a window and evaluate the window when the 100th element has been added.
In Flink&rsquo;s DataStream API, tumbling and sliding count windows are defined as follows:
// Stream of (sensorId, carCnt) val vehicleCnts: DataStream[(Int, Int)] = ... val tumblingCnts: DataStream[(Int, Int)] = vehicleCnts // key stream by sensorId .keyBy(0) // tumbling count window of 100 elements size .countWindow(100) // compute the carCnt sum .sum(1) val slidingCnts: DataStream[(Int, Int)] = vehicleCnts .keyBy(0) // sliding count window of 100 elements size and 10 elements trigger interval .countWindow(100, 10) .sum(1) Dissecting Flink’s windowing mechanics # Flink’s built-in time and count windows cover a wide range of common window use cases. However, there are of course applications that require custom windowing logic that cannot be addressed by Flink’s built-in windows. To also support applications that need very specific windowing semantics, the DataStream API exposes interfaces for the internals of its windowing mechanics. These interfaces give very fine-grained control over the way that windows are built and evaluated.
The following figure depicts Flink’s windowing mechanism and introduces the components involved.
Elements that arrive at a window operator are handed to a WindowAssigner. The WindowAssigner assigns elements to one or more windows, possibly creating new windows. A Window itself is just an identifier for a list of elements and may provide some optional meta information, such as begin and end time in case of a TimeWindow. Note that an element can be added to multiple windows, which also means that multiple windows can exist at the same time.
Each window owns a Trigger that decides when the window is evaluated or purged. The trigger is called for each element that is inserted into the window and when a previously registered timer times out. On each event, a trigger can decide to fire (i.e., evaluate), purge (remove the window and discard its content), or fire and then purge the window. A trigger that just fires evaluates the window and keeps it as it is, i.e., all elements remain in the window and are evaluated again when the trigger fires the next time. A window can be evaluated several times and exists until it is purged. Note that a window consumes memory until it is purged.
When a Trigger fires, the list of window elements can be given to an optional Evictor. The evictor can iterate through the list and decide to cut off some elements from the start of the list, i.e., remove some of the elements that entered the window first. The remaining elements are given to an evaluation function. If no Evictor was defined, the Trigger hands all the window elements directly to the evaluation function.
The evaluation function receives the elements of a window (possibly filtered by an Evictor) and computes one or more result elements for the window. The DataStream API accepts different types of evaluation functions, including predefined aggregation functions such as sum(), min(), max(), as well as a ReduceFunction, FoldFunction, or WindowFunction. A WindowFunction is the most generic evaluation function and receives the window object (i.e, the metadata of the window), the list of window elements, and the window key (in case of a keyed window) as parameters.
These are the components that constitute Flink&rsquo;s windowing mechanics. We now show step-by-step how to implement custom windowing logic with the DataStream API. We start with a stream of type DataStream[IN] and key it using a key selector function that extracts a key of type KEY to obtain a KeyedStream[IN, KEY].
val input: DataStream[IN] = ... // create a keyed stream using a key selector function val keyed: KeyedStream[IN, KEY] = input .keyBy(myKeySel: (IN) => KEY) We apply a WindowAssigner[IN, WINDOW] that creates windows of type WINDOW, resulting in a WindowedStream[IN, KEY, WINDOW]. In addition, a WindowAssigner also provides a default Trigger implementation.
// create windowed stream using a WindowAssigner var windowed: WindowedStream[IN, KEY, WINDOW] = keyed .window(myAssigner: WindowAssigner[IN, WINDOW]) We can explicitly specify a Trigger to overwrite the default Trigger provided by the WindowAssigner. Note that specifying a trigger does not add an additional trigger condition but replaces the current trigger.
// override the default trigger of the WindowAssigner windowed = windowed .trigger(myTrigger: Trigger[IN, WINDOW]) We may want to specify an optional Evictor as follows.
// specify an optional evictor windowed = windowed .evictor(myEvictor: Evictor[IN, WINDOW]) Finally, we apply a WindowFunction that returns elements of type OUT to obtain a DataStream[OUT].
// apply window function to windowed stream val output: DataStream[OUT] = windowed .apply(myWinFunc: WindowFunction[IN, OUT, KEY, WINDOW]) With Flink&rsquo;s internal windowing mechanics and its exposure through the DataStream API it is possible to implement very custom windowing logic such as session windows or windows that emit early results if the values exceed a certain threshold.
Conclusion # Support for various types of windows over continuous data streams is a must-have for modern stream processors. Apache Flink is a stream processor with a very strong feature set, including a very flexible mechanism to build and evaluate windows over continuous data streams. Flink provides pre-defined window operators for common use cases as well as a toolbox that allows users to define very custom windowing logic. The Flink community will add more pre-defined window operators as we learn the requirements from our users.
`}),e.add({id:246,href:"/2015/11/27/flink-0.10.1-released/",title:"Flink 0.10.1 released",section:"Flink Blog",content:`Today, the Flink community released the first bugfix release of the 0.10 series of Flink.
We recommend that all users update to this release by bumping the version of their Flink dependencies and updating the binaries on the server.
Issues fixed # [FLINK-2879] - Links in documentation are broken [FLINK-2938] - Streaming docs not in sync with latest state changes [FLINK-2942] - Dangling operators in web UI&#39;s program visualization (non-deterministic) [FLINK-2967] - TM address detection might not always detect the right interface on slow networks / overloaded JMs [FLINK-2977] - Cannot access HBase in a Kerberos secured Yarn cluster [FLINK-2987] - Flink 0.10 fails to start on YARN 2.6.0 [FLINK-2989] - Job Cancel button doesn&#39;t work on Yarn [FLINK-3005] - Commons-collections object deserialization remote command execution vulnerability [FLINK-3011] - Cannot cancel failing/restarting streaming job from the command line [FLINK-3019] - CLI does not list running/restarting jobs [FLINK-3020] - Local streaming execution: set number of task manager slots to the maximum parallelism [FLINK-3024] - TimestampExtractor Does not Work When returning Long.MIN_VALUE [FLINK-3032] - Flink does not start on Hadoop 2.7.1 (HDP), due to class conflict [FLINK-3043] - Kafka Connector description in Streaming API guide is wrong/outdated [FLINK-3047] - Local batch execution: set number of task manager slots to the maximum parallelism [FLINK-3052] - Optimizer does not push properties out of bulk iterations [FLINK-2966] - Improve the way job duration is reported on web frontend. [FLINK-2974] - Add periodic offset commit to Kafka Consumer if checkpointing is disabled [FLINK-3028] - Cannot cancel restarting job via web frontend [FLINK-3040] - Add docs describing how to configure State Backends [FLINK-3041] - Twitter Streaming Description section of Streaming Programming guide refers to an incorrect example &#39;TwitterLocal&#39; `}),e.add({id:247,href:"/2015/11/16/announcing-apache-flink-0.10.0/",title:"Announcing Apache Flink 0.10.0",section:"Flink Blog",content:`The Apache Flink community is pleased to announce the availability of the 0.10.0 release. The community put significant effort into improving and extending Apache Flink since the last release, focusing on data stream processing and operational features. About 80 contributors provided bug fixes, improvements, and new features such that in total more than 400 JIRA issues could be resolved.
For Flink 0.10.0, the focus of the community was to graduate the DataStream API from beta and to evolve Apache Flink into a production-ready stream data processor with a competitive feature set. These efforts resulted in support for event-time and out-of-order streams, exactly-once guarantees in the case of failures, a very flexible windowing mechanism, sophisticated operator state management, and a highly-available cluster operation mode. Flink 0.10.0 also brings a new monitoring dashboard with real-time system and job monitoring capabilities. Both batch and streaming modes of Flink benefit from the new high availability and improved monitoring features. Needless to say, Flink 0.10.0 includes many more features, improvements, and bug fixes.
We encourage everyone to download the release and check out the documentation. Feedback through the Flink mailing lists is, as always, very welcome!
New Features # Event-time Stream Processing # Many stream processing applications consume data from sources that produce events with associated timestamps, such as sensor or user-interaction events. Very often, events have to be collected from several sources, so it is usually not guaranteed that events arrive at the stream processor in the exact order of their timestamps. Consequently, stream processors must take out-of-order elements into account in order to produce results which are correct and consistent with respect to the timestamps of the events. With release 0.10.0, Apache Flink supports event-time processing as well as ingestion-time and processing-time processing. See FLINK-2674 for details.
Stateful Stream Processing # Operators that maintain and update state are a common pattern in many stream processing applications. Since streaming applications tend to run for a very long time, operator state can become very valuable and impossible to recompute. In order to enable fault-tolerance, operator state must be backed up to persistent storage in regular intervals. Flink 0.10.0 offers flexible interfaces to define, update, and query operator state and hooks to connect various state backends.
Highly-available Cluster Operations # Stream processing applications may be live for months. Therefore, a production-ready stream processor must be highly-available and continue to process data even in the face of failures. With release 0.10.0, Flink supports high availability modes for standalone cluster and YARN setups, eliminating any single point of failure. In this mode, Flink relies on Apache Zookeeper for leader election and persisting small sized meta-data of running jobs. You can check out the documentation to see how to enable high availability. See FLINK-2287 for details.
Graduated DataStream API # The DataStream API was revised based on user feedback and with foresight for upcoming features and graduated from beta status to fully supported. The most obvious changes are related to the methods for stream partitioning and window operations. The new windowing system is based on the concepts of window assigners, triggers, and evictors, inspired by the Dataflow Model. The new API is fully described in the DataStream API documentation. This migration guide will help to port your Flink 0.9 DataStream programs to the revised API of Flink 0.10.0. See FLINK-2674 and FLINK-2877 for details.
New Connectors for Data Streams # Apache Flink 0.10.0 features DataStream sources and sinks for many common data producers and stores. This includes an exactly-once rolling file sink which supports any file system, including HDFS, local FS, and S3. We also updated the Apache Kafka producer to use the new producer API, and added connectors for Elasticsearch and Apache NiFi. More connectors for DataStream programs will be added by the community in the future. See the following JIRA issues for details: FLINK-2583, FLINK-2386, FLINK-2372, FLINK-2740, and FLINK-2558.
New Web Dashboard &amp; Real-time Monitoring # The 0.10.0 release features a newly designed and significantly improved monitoring dashboard for Apache Flink. The new dashboard visualizes the progress of running jobs and shows real-time statistics of processed data volumes and record counts. Moreover, it gives access to resource usage and JVM statistics of TaskManagers including JVM heap usage and garbage collection details. The following screenshot shows the job view of the new dashboard.
The web server that provides all monitoring statistics has been designed with a REST interface allowing other systems to also access the internal system metrics. See FLINK-2357 for details.
Off-heap Managed Memory # Flink’s internal operators (such as its sort algorithm and hash tables) write data to and read data from managed memory to achieve memory-safe operations and reduce garbage collection overhead. Until version 0.10.0, managed memory was allocated only from JVM heap memory. With this release, managed memory can also be allocated from off-heap memory. This will facilitate shorter TaskManager start-up times as well as reduce garbage collection pressure. See the documentation to learn how to configure managed memory on off-heap memory. JIRA issue FLINK-1320 contains further details.
Outer Joins # Outer joins have been one of the most frequently requested features for Flink’s DataSet API. Although there was a workaround to implement outer joins as CoGroup function, it had significant drawbacks including added code complexity and not being fully memory-safe. With release 0.10.0, Flink adds native support for left, right, and full outer joins to the DataSet API. All outer joins are backed by a memory-safe operator implementation that leverages Flink’s managed memory. See FLINK-687 and FLINK-2107 for details.
Gelly: Major Improvements and Scala API # Gelly is Flink’s API and library for processing and analyzing large-scale graphs. Gelly was introduced with release 0.9.0 and has been very well received by users and contributors. Based on user feedback, Gelly has been improved since then. In addition, Flink 0.10.0 introduces a Scala API for Gelly. See FLINK-2857 and FLINK-1962 for details.
More Improvements and Fixes # The Flink community resolved more than 400 issues. The following list is a selection of new features and fixed bugs.
FLINK-1851 Java Table API does not support Casting FLINK-2152 Provide zipWithIndex utility in flink-contrib FLINK-2158 NullPointerException in DateSerializer. FLINK-2240 Use BloomFilter to minimize probe side records which are spilled to disk in Hybrid-Hash-Join FLINK-2533 Gap based random sample optimization FLINK-2555 Hadoop Input/Output Formats are unable to access secured HDFS clusters FLINK-2565 Support primitive arrays as keys FLINK-2582 Document how to build Flink with other Scala versions FLINK-2584 ASM dependency is not shaded away FLINK-2689 Reusing null object for joins with SolutionSet FLINK-2703 Remove log4j classes from fat jar / document how to use Flink with logback FLINK-2763 Bug in Hybrid Hash Join: Request to spill a partition with less than two buffers. FLINK-2767 Add support Scala 2.11 to Scala shell FLINK-2774 Import Java API classes automatically in Flink&rsquo;s Scala shell FLINK-2782 Remove deprecated features for 0.10 FLINK-2800 kryo serialization problem FLINK-2834 Global round-robin for temporary directories FLINK-2842 S3FileSystem is broken FLINK-2874 Certain Avro generated getters/setters not recognized FLINK-2895 Duplicate immutable object creation FLINK-2964 MutableHashTable fails when spilling partitions without overflow segments Notice # As previously announced, Flink 0.10.0 no longer supports Java 6. If you are still using Java 6, please consider upgrading to Java 8 (Java 7 ended its free support in April 2015). Also note that some methods in the DataStream API had to be renamed as part of the API rework. For example the groupBy method has been renamed to keyBy and the windowing API changed. This migration guide will help to port your Flink 0.9 DataStream programs to the revised API of Flink 0.10.0.
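As a small illustration of the groupBy-to-keyBy rename mentioned in the notice above (a hedged sketch; the words stream and the field indexes are made up for the example):

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;

// given: DataStream<Tuple2<String, Integer>> words (hypothetical input stream)
// Flink 0.9 style:  words.groupBy(0).sum(1)
// Flink 0.10 style:
DataStream<Tuple2<String, Integer>> counts = words.keyBy(0).sum(1);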
Contributors # Alexander Alexandrov Marton Balassi Enrique Bautista Faye Beligianni Bryan Bende Ajay Bhat Chris Brinkman Dmitry Buzdin Kun Cao Paris Carbone Ufuk Celebi Shivani Chandna Liang Chen Felix Cheung Hubert Czerpak Vimal Das Behrouz Derakhshan Suminda Dharmasena Stephan Ewen Fengbin Fang Gyula Fora Lun Gao Gabor Gevay Piotr Godek Sachin Goel Anton Haglund Gábor Hermann Greg Hogan Fabian Hueske Martin Junghanns Vasia Kalavri Ulf Karlsson Frederick F. Kautz Samia Khalid Johannes Kirschnick Kostas Kloudas Alexander Kolb Johann Kovacs Aljoscha Krettek Sebastian Kruse Andreas Kunft Chengxiang Li Chen Liang Andra Lungu Suneel Marthi Tamara Mendt Robert Metzger Maximilian Michels Chiwan Park Sahitya Pavurala Pietro Pinoli Ricky Pogalz Niraj Rai Lokesh Rajaram Johannes Reifferscheid Till Rohrmann Henry Saputra Matthias Sax Shiti Saxena Chesnay Schepler Peter Schrott Saumitra Shahapure Nikolaas Steenbergen Thomas Sun Peter Szabo Viktor Taranenko Kostas Tzoumas Pieter-Jan Van Aeken Theodore Vasiloudis Timo Walther Chengxuan Wang Huang Wei Dawid Wysakowicz Rerngvit Yanggratoke Nezih Yigitbasi Ted Yu Rucong Zhang Vyacheslav Zholudev Zoltán Zvara `}),e.add({id:248,href:"/2015/09/16/off-heap-memory-in-apache-flink-and-the-curious-jit-compiler/",title:"Off-heap Memory in Apache Flink and the curious JIT compiler",section:"Flink Blog",content:`Running data-intensive code in the JVM and making it well-behaved is tricky. Systems that put billions of data objects naively onto the JVM heap face unpredictable OutOfMemoryErrors and Garbage Collection stalls. Of course, you still want to to keep your data in memory as much as possible, for speed and responsiveness of the processing applications. In that context, &ldquo;off-heap&rdquo; has become almost something like a magic word to solve these problems.
In this blog post, we will look at how Flink exploits off-heap memory. The feature is part of the upcoming release, but you can try it out with the latest nightly builds. We will also give a few interesting insights into the behavior of Java’s JIT compiler for highly optimized methods and loops.
Recap: Memory Management in Flink # To understand Flink’s approach to off-heap memory, we need to recap Flink’s approach to custom managed memory. We have written an earlier blog post about how Flink manages JVM memory itself.
As a summary, the core part is that Flink implements its algorithms not against Java objects, arrays, or lists, but actually against a data structure similar to java.nio.ByteBuffer. Flink uses its own specialized version, called MemorySegment, on which algorithms put and get ints, longs, byte arrays, etc. at specific positions, and compare and copy memory. The memory segments are held and distributed by a central component (called MemoryManager) from which algorithms request segments according to their calculated memory budgets.
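To make the access pattern concrete, here is a small illustrative sketch; the putLong/getInt calls mirror the MemorySegment idea described above, but the wrap factory call and the exact signatures are assumptions, not the authoritative Flink API:

import org.apache.flink.core.memory.MemorySegment;
import org.apache.flink.core.memory.MemorySegmentFactory;

public class MemorySegmentSketch {
    public static void main(String[] args) {
        // one 32 KB chunk of managed memory, backed here by a plain byte array
        MemorySegment segment = MemorySegmentFactory.wrap(new byte[32 * 1024]);

        segment.putLong(0, 42L);   // write a long at byte offset 0
        segment.putInt(8, 7);      // write an int right behind it

        long key = segment.getLong(0);   // algorithms read values back by offset
        int value = segment.getInt(8);
        System.out.println(key + " / " + value);
    }
}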
Don&rsquo;t believe that this can be fast? Have a look at the benchmarks in the earlier blogpost, which show that it is actually often much faster than working on objects, due to better control over data layout (cache efficiency, data size), and reducing the pressure on Java&rsquo;s Garbage Collector.
This form of memory management has been in Flink for a long time. Anecdotally, the first public demo of Flink&rsquo;s predecessor project Stratosphere, at the VLDB conference in 2010, was running its programs with custom managed memory (although I believe few attendees were aware of that).
Why actually bother with off-heap memory? # Given that Flink has a sophisticated level of managing on-heap memory, why do we even bother with off-heap memory? It is true that &ldquo;out of memory&rdquo; has been much less of a problem for Flink because of its heap memory management techniques. Nonetheless, there are a few good reasons to offer the possibility to move Flink&rsquo;s managed memory out of the JVM heap:
Very large JVMs (100s of GBytes heap memory) tend to be tricky. It takes a long time to start them (allocate and initialize the heap) and garbage collection stalls can be huge (minutes). While newer incremental garbage collectors (like G1) mitigate this problem to some extent, an even better solution is to just make the heap much smaller and allocate Flink’s managed memory chunks outside the heap.
I/O and network efficiency: In many cases, we write MemorySegments to disk (spilling) or to the network (data transfer). Off-heap memory can be written/transferred with zero copies, while heap memory always incurs an additional memory copy.
Off-heap memory can actually be owned by other processes. That way, cached data survives process crashes (due to user code exceptions) and can be used for recovery. Flink does not exploit that, yet, but it is interesting future work.
The opposite question is also valid. Why should Flink ever not use off-heap memory?
On-heap is easier and interplays better with tools. Some container environments and monitoring tools get confused when the monitored heap size does not remotely reflect the amount of memory used by the process.
Short-lived memory segments are cheaper on the heap. Flink sometimes needs to allocate some short-lived buffers, which is cheaper on the heap than off-heap.
Some operations are actually a bit faster on heap memory (or the JIT compiler understands them better).
The off-heap Memory Implementation # Given that all memory intensive internal algorithms are already implemented against the MemorySegment, our implementation to switch to off-heap memory is actually trivial. You can compare it to replacing all ByteBuffer.allocate(numBytes) calls with ByteBuffer.allocateDirect(numBytes). In Flink&rsquo;s case it meant that we made the MemorySegment abstract and added the HeapMemorySegment and OffHeapMemorySegment subclasses. The OffHeapMemorySegment takes the off-heap memory pointer from a java.nio.DirectByteBuffer and implements its specialized access methods using sun.misc.Unsafe. We also made a few adjustments to the startup scripts and the deployment code to make sure that the JVM is permitted enough off-heap memory (direct memory, -XX:MaxDirectMemorySize).
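For readers less familiar with java.nio, the heap versus off-heap distinction at the ByteBuffer level looks like this (a minimal, self-contained sketch using only the standard library):

import java.nio.ByteBuffer;

public class DirectVsHeapBuffer {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(1024);          // backed by a byte[] on the JVM heap
        ByteBuffer offHeap = ByteBuffer.allocateDirect(1024); // backed by native memory outside the heap

        offHeap.putLong(0, 42L);   // same API, different backing memory
        System.out.println(heap.isDirect());    // prints false
        System.out.println(offHeap.isDirect()); // prints true
    }
}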
In practice we had to go one step further, to make the implementation perform well. While the ByteBuffer is used in I/O code paths to compose headers and move bulk memory into place, the MemorySegment is part of the innermost loops of many algorithms (sorting, hash tables, &hellip;). That means that the access methods have to be as fast as possible.
Understanding the JIT and tuning the implementation # The MemorySegment was (before our change) a standalone class; it was final (it had no subclasses). Via Class Hierarchy Analysis (CHA), the JIT compiler was able to determine that all of the accessor method calls go to one specific implementation. That way, all method calls can be perfectly de-virtualized and inlined, which is essential to performance, and the basis for all further optimizations (like vectorization of the calling loop).
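The effect can be sketched with a toy hierarchy (purely illustrative; the class and method names are made up): while only one subclass has ever been instantiated, the JIT can devirtualize and inline get() in the hot loop; loading a second subclass invalidates that assumption.

abstract class Segment {
    abstract long get(int i);
}

final class HeapSegment extends Segment {
    private final long[] data = new long[1024];
    long get(int i) { return data[i]; }
}

final class OtherSegment extends Segment {
    private final long[] data = new long[1024]; // stand-in for a second implementation
    long get(int i) { return data[i]; }
}

public class DevirtualizationSketch {
    static long sum(Segment s) {
        long acc = 0;
        for (int i = 0; i < 1024; i++) {
            acc += s.get(i); // monomorphic call site as long as only one subclass is loaded
        }
        return acc;
    }

    public static void main(String[] args) {
        Segment s = new HeapSegment();
        long total = 0;
        for (int i = 0; i < 100_000; i++) {
            total += sum(s); // gets JIT-compiled with get() inlined via class hierarchy analysis
        }
        System.out.println(total);
        // instantiating OtherSegment here would load a second subclass and
        // force the JIT to fall back to a virtual call in sum()
    }
}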
With two different memory segments loaded at the same time, the JIT compiler cannot perform the same level of optimization any more, which results in a noticeable difference in performance: A slowdown of about 2.7 x in the following example:
Writing 100000 x 32768 bytes to 32768 bytes segment: HeapMemorySegment (standalone) : 1,441 msecs OffHeapMemorySegment (standalone) : 1,628 msecs HeapMemorySegment (subclass) : 3,841 msecs OffHeapMemorySegment (subclass) : 3,847 msecs To get back to the original performance, we explored two approaches:
Approach 1: Make sure that only one memory segment implementation is ever loaded. # We re-structured the code a bit to make sure that all places that produce long-lived and short-lived memory segments instantiate the same MemorySegment subclass (Heap- or Off-Heap segment). Using factories rather than directly instantiating the memory segment classes, this was straightforward.
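A rough sketch of the idea (the real Flink factories differ, and the constructor signatures here are made up): all allocations go through a single factory that is configured once, so at runtime only one MemorySegment subclass is ever instantiated and the hot call sites stay monomorphic.

// illustrative sketch, not the actual Flink factory code;
// the HeapMemorySegment / OffHeapMemorySegment constructors are assumed
public final class SegmentFactory {

    private final boolean offHeap;

    public SegmentFactory(boolean offHeap) {
        this.offHeap = offHeap; // decided once, from the configuration
    }

    public MemorySegment allocate(int sizeBytes) {
        // in a given JVM only one of the two branches is ever taken,
        // so only one subclass is instantiated (and profiled by the JIT)
        return offHeap
                ? new OffHeapMemorySegment(sizeBytes)
                : new HeapMemorySegment(new byte[sizeBytes]);
    }
}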
Experiments (see appendix) showed that the JIT compiler properly detects this (via hierarchy analysis) and that it can perform the same level of aggressive optimization as before, when there was only one MemorySegment class.
Approach 2: Write one segment that handles both heap and off-heap memory # We created a class HybridMemorySegment which handles transparently both heap- and off-heap memory. It can be initialized either with a byte array (heap memory), or with a pointer to a memory region outside the heap (off-heap memory).
Fortunately, there is a nice trick to do this without introducing code branches and specialized handling of the two different memory types. The trick is based on the way that the sun.misc.Unsafe methods interpret object references. To illustrate this, we take the method that gets a long integer from a memory position:
sun.misc.Unsafe.getLong(Object reference, long offset) The method accepts an object reference, takes its memory address, and adds the offset to obtain a pointer. It then fetches the eight bytes at the address pointed to and interprets them as a long integer. Since the method accepts null as the reference (and interprets it as zero), one can write a method that fetches a long integer seamlessly from heap and off-heap memory as follows:
public class HybridMemorySegment { private final byte[] heapMemory; // non-null in heap case, null in off-heap case private final long address; // may be absolute, or relative to byte[] // method of interest public long getLong(int pos) { return UNSAFE.getLong(heapMemory, address + pos); } // initialize for heap memory public HybridMemorySegment(byte[] heapMemory) { this.heapMemory = heapMemory; this.address = UNSAFE.arrayBaseOffset(byte[].class); } // initialize for off-heap memory public HybridMemorySegment(long offheapPointer) { this.heapMemory = null; this.address = offheapPointer; } } To check whether both cases (heap and off-heap) really result in the same code paths (no hidden branches inside the Unsafe.getLong(Object, long) method), one can check out the C++ source code of sun.misc.Unsafe, available here: http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/prims/unsafe.cpp
Of particular interest is the macro in line 155, which is the base of all GET methods. Tracing the function calls (many are no-ops), one can see that both variants of Unsafe’s getLong() result in the same code: Either 0 + absolutePointer or objectRefAddress + offset.
Summary # We ended up choosing a combination of both techniques:
For off-heap memory, we use the HybridMemorySegment from approach (2), which can represent both heap and off-heap memory. That way, the same class represents both the long-lived off-heap memory and the short-lived temporary buffers allocated (or wrapped) on the heap.
We follow approach (1) and use factories to make sure that only one segment implementation is ever loaded, which gives peak performance. We can exploit the performance benefits of the HeapMemorySegment on individual byte operations, and we have a mechanism in place to add further implementations of MemorySegment for the case that Oracle really removes sun.misc.Unsafe in future Java versions.
The final code can be found in the Flink repository, under https://github.com/apache/flink/tree/master/flink-core/src/main/java/org/apache/flink/core/memory
Detailed micro benchmarks are in the appendix. A summary of the findings is as follows:
The HybridMemorySegment performs equally well in heap and off-heap memory, as is to be expected (the code paths are the same)
The HeapMemorySegment is quite a bit faster in reading individual bytes, not so much at writing them. Access to a byte[] is after all a bit cheaper than an invocation of a sun.misc.Unsafe method, even when JIT-ed.
The abstract class MemorySegment (with its subclasses HeapMemorySegment and HybridMemorySegment) performs as well as any specialized non-abstract class, as long as only one subclass is loaded. When both are loaded, performance may suffer by a factor of 2.7 x on certain operations.
How badly the performance degrades in cases where both MemorySegment subclasses are loaded seems to depend a lot on which subclass is loaded and operated on before and after which. Sometimes, performance is affected more than other times. It seems to be an artifact of the JIT’s code profiling and how heavily it performs optimistic specialization towards certain subclasses.
There is still a bit of mystery left, specifically why sometimes code is faster when it performs more checks (has more instructions and an additional branch). Even though the branch is perfectly predictable, this seems counter-intuitive. The only explanation that we could come up with is that the branch optimizations (such as optimistic elimination etc) result in code that does better register allocation (for whatever reason, maybe the intermediate instructions just fit the allocation algorithm better).
tl;dr # Off-heap memory in Flink complements the already very fast on-heap memory management. It improves the scalability to very large heap sizes and reduces memory copies for network and disk I/O.
Flink’s already present memory management infrastructure made the addition of off-heap memory simple. Off-heap memory is not only used for caching data, Flink can actually sort data off-heap and build hash tables off-heap.
We play a few nice tricks in the implementation to make sure the code is as friendly as possible to the JIT compiler and processor, and to make the managed memory accesses as fast as possible.
Understanding the JVM’s JIT compiler is tough - one needs a lot of (randomized) micro benchmarking to examine its behavior.
Appendix: Detailed Micro Benchmarks # These microbenchmarks test the performance of the different memory segment implementations on various operations.
Each experiment tests the different implementations multiple times in different orders, to balance the advantage/disadvantage of the JIT compiler specializing towards certain code paths. All experiments were run 5x, discarding the fastest and slowest run, and then averaged. This compensates for the delay before the JIT kicks in.
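The discard-and-average scheme is simple enough to sketch (an illustrative helper, not code from the original benchmark):

import java.util.Arrays;

public class BenchmarkAverage {
    /** Averages 5 measured run times after dropping the fastest and the slowest run. */
    static long robustAverageMsecs(long[] runTimesMsecs) {
        long[] sorted = runTimesMsecs.clone();
        Arrays.sort(sorted);
        return (sorted[1] + sorted[2] + sorted[3]) / 3;
    }

    public static void main(String[] args) {
        System.out.println(robustAverageMsecs(new long[] {1441, 1448, 1440, 1510, 1395}));
    }
}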
My setup:
Oracle Java 8 (1.8.0_25) 4 GBytes JVM heap (the experiments need 1.4 GBytes Heap + 1 GBytes direct memory) Intel Core i7-4700MQ CPU, 2.40GHz (4 cores, 8 hardware contexts) The tested implementations are
Type Description HeapMemorySegment (exclusive) The case where it is the only loaded MemorySegment subclass. HeapMemorySegment (mixed) The case where both the HeapMemorySegment and the HybridMemorySegment are loaded. HybridMemorySegment (heap-exclusive) Backed by heap memory, and the case where it is the only loaded MemorySegment class. HybridMemorySegment (heap-mixed) Backed by heap memory, and the case where both the HeapMemorySegment and the HybridMemorySegment are loaded. HybridMemorySegment (off-heap-exclusive) Backed by off-heap memory, and the case where it is the only loaded MemorySegment class. HybridMemorySegment (off-heap-mixed) Backed by off-heap memory, and the case where both the HeapMemorySegment and the HybridMemorySegment are loaded. PureHeapSegment Has no class hierarchy and virtual methods at all. PureHybridSegment (heap) Has no class hierarchy and virtual methods at all, backed by heap memory. PureHybridSegment (off-heap) Has no class hierarchy and virtual methods at all, backed by off-heap memory. Byte accesses Writing 100000 x 32768 bytes to 32768 bytes segment
Segment Time HeapMemorySegment, exclusive 1,441 msecs HeapMemorySegment, mixed 3,841 msecs HybridMemorySegment, heap, exclusive 1,626 msecs HybridMemorySegment, off-heap, exclusive 1,628 msecs HybridMemorySegment, heap, mixed 3,848 msecs HybridMemorySegment, off-heap, mixed 3,847 msecs PureHeapSegment 1,442 msecs PureHybridSegment, heap 1,623 msecs PureHybridSegment, off-heap 1,620 msecs Reading 100000 x 32768 bytes from 32768 bytes segment
Segment Time HeapMemorySegment, exclusive 1,326 msecs HeapMemorySegment, mixed 1,378 msecs HybridMemorySegment, heap, exclusive 2,029 msecs HybridMemorySegment, off-heap, exclusive 2,030 msecs HybridMemorySegment, heap, mixed 2,047 msecs HybridMemorySegment, off-heap, mixed 2,049 msecs PureHeapSegment 1,331 msecs PureHybridSegment, heap 2,030 msecs PureHybridSegment, off-heap 2,030 msecs Writing 10 x 1073741824 bytes to 1073741824 bytes segment
Segment Time HeapMemorySegment, exclusive 5,602 msecs HeapMemorySegment, mixed 12,570 msecs HybridMemorySegment, heap, exclusive 5,691 msecs HybridMemorySegment, off-heap, exclusive 5,691 msecs HybridMemorySegment, heap, mixed 12,566 msecs HybridMemorySegment, off-heap, mixed 12,556 msecs PureHeapSegment 5,599 msecs PureHybridSegment, heap 5,687 msecs PureHybridSegment, off-heap 5,681 msecs Reading 10 x 1073741824 bytes from 1073741824 bytes segment
Segment Time HeapMemorySegment, exclusive 4,243 msecs HeapMemorySegment, mixed 4,265 msecs HybridMemorySegment, heap, exclusive 6,730 msecs HybridMemorySegment, off-heap, exclusive 6,725 msecs HybridMemorySegment, heap, mixed 6,933 msecs HybridMemorySegment, off-heap, mixed 6,926 msecs PureHeapSegment 4,247 msecs PureHybridSegment, heap 6,919 msecs PureHybridSegment, off-heap 6,916 msecs Byte Array accesses Writing 100000 x 32 byte[1024] to 32768 bytes segment
Segment Time HeapMemorySegment, mixed 164 msecs HybridMemorySegment, heap, mixed 163 msecs HybridMemorySegment, off-heap, mixed 163 msecs PureHeapSegment 165 msecs PureHybridSegment, heap 182 msecs PureHybridSegment, off-heap 176 msecs Reading 100000 x 32 byte[1024] from 32768 bytes segment
Segment Time HeapMemorySegment, mixed 157 msecs HybridMemorySegment, heap, mixed 155 msecs HybridMemorySegment, off-heap, mixed 162 msecs PureHeapSegment 161 msecs PureHybridSegment, heap 175 msecs PureHybridSegment, off-heap 179 msecs Writing 10 x 1048576 byte[1024] to 1073741824 bytes segment Segment Time HeapMemorySegment, mixed 1,164 msecs HybridMemorySegment, heap, mixed 1,173 msecs HybridMemorySegment, off-heap, mixed 1,157 msecs PureHeapSegment 1,169 msecs PureHybridSegment, heap 1,174 msecs PureHybridSegment, off-heap 1,166 msecs Reading 10 x 1048576 byte[1024] from 1073741824 bytes segment
Segment Time HeapMemorySegment, mixed 854 msecs HybridMemorySegment, heap, mixed 853 msecs HybridMemorySegment, off-heap, mixed 854 msecs PureHeapSegment 857 msecs PureHybridSegment, heap 896 msecs PureHybridSegment, off-heap 887 msecs Long integer accesses (note that the heap and off-heap segments use the same or comparable code for this)
Writing 100000 x 4096 longs to 32768 bytes segment
Segment Time HeapMemorySegment, mixed 221 msecs HybridMemorySegment, heap, mixed 222 msecs HybridMemorySegment, off-heap, mixed 221 msecs PureHeapSegment 194 msecs PureHybridSegment, heap 220 msecs PureHybridSegment, off-heap 221 msecs Reading 100000 x 4096 longs from 32768 bytes segment
Segment Time HeapMemorySegment, mixed 233 msecs HybridMemorySegment, heap, mixed 232 msecs HybridMemorySegment, off-heap, mixed 231 msecs PureHeapSegment 232 msecs PureHybridSegment, heap 232 msecs PureHybridSegment, off-heap 233 msecs Writing 10 x 134217728 longs to 1073741824 bytes segment
Segment Time HeapMemorySegment, mixed 1,120 msecs HybridMemorySegment, heap, mixed 1,120 msecs HybridMemorySegment, off-heap, mixed 1,115 msecs PureHeapSegment 1,148 msecs PureHybridSegment, heap 1,116 msecs PureHybridSegment, off-heap 1,113 msecs Reading 10 x 134217728 longs from 1073741824 bytes segment
Segment Time HeapMemorySegment, mixed 1,097 msecs HybridMemorySegment, heap, mixed 1,099 msecs HybridMemorySegment, off-heap, mixed 1,093 msecs PureHeapSegment 917 msecs PureHybridSegment, heap 1,105 msecs PureHybridSegment, off-heap 1,097 msecs Integer accesses (note that the heap and off-heap segments use the same or comparable code for this)
Writing 100000 x 8192 ints to 32768 bytes segment
Segment Time HeapMemorySegment, mixed 578 msecs HybridMemorySegment, heap, mixed 580 msecs HybridMemorySegment, off-heap, mixed 576 msecs PureHeapSegment 624 msecs PureHybridSegment, heap 576 msecs PureHybridSegment, off-heap 578 msecs Reading 100000 x 8192 ints from 32768 bytes segment
Segment Time HeapMemorySegment, mixed 464 msecs HybridMemorySegment, heap, mixed 464 msecs HybridMemorySegment, off-heap, mixed 465 msecs PureHeapSegment 463 msecs PureHybridSegment, heap 464 msecs PureHybridSegment, off-heap 463 msecs Writing 10 x 268435456 ints to 1073741824 bytes segment
Segment Time HeapMemorySegment, mixed 2,187 msecs HybridMemorySegment, heap, mixed 2,161 msecs HybridMemorySegment, off-heap, mixed 2,152 msecs PureHeapSegment 2,770 msecs PureHybridSegment, heap 2,161 msecs PureHybridSegment, off-heap 2,157 msecs Reading 10 x 268435456 ints from 1073741824 bytes segment
Segment Time HeapMemorySegment, mixed 1,782 msecs HybridMemorySegment, heap, mixed 1,783 msecs HybridMemorySegment, off-heap, mixed 1,774 msecs PureHeapSegment 1,501 msecs PureHybridSegment, heap 1,774 msecs PureHybridSegment, off-heap 1,771 msecs `}),e.add({id:249,href:"/2015/09/03/announcing-flink-forward-2015/",title:"Announcing Flink Forward 2015",section:"Flink Blog",content:`Flink Forward 2015 is the first conference with Flink at its center that aims to bring together the Apache Flink community in a single place. The organizers are holding the conference on October 12 and 13 in Berlin, the place where Apache Flink started.
The conference program has been announced by the organizers and a program committee consisting of Flink PMC members. The agenda contains talks from industry and academia as well as a dedicated session on hands-on Flink training.
Some highlights of the talks include
A keynote by William Vambenepe, lead of the product management team responsible for Big Data services on Google Cloud Platform (BigQuery, Dataflow, etc&hellip;) on data streaming, Google Cloud Dataflow, and Apache Flink.
Talks by several practitioners on how they are putting Flink to work in their projects, including ResearchGate, Bouygues Telecom, Amadeus, Telefonica, Capital One, Ericsson, and Otto Group.
Talks on how open source projects, including Apache Mahout, Apache SAMOA (incubating), Apache Zeppelin (incubating), Apache BigTop, and Apache Storm integrate with Apache Flink.
Talks by Flink committers on several aspects of the system, such as fault tolerance, the internal runtime architecture, and others.
Check out the schedule and register for the conference.
`}),e.add({id:250,href:"/2015/09/01/apache-flink-0.9.1-available/",title:"Apache Flink 0.9.1 available",section:"Flink Blog",content:`The Flink community is happy to announce that Flink 0.9.1 is now available.
0.9.1 is a maintenance release, which includes a lot of minor fixes across several parts of the system. We recommend that all Flink users upgrade to this latest stable version.
Download the release and check out the documentation. Feedback through the Flink mailing lists is, as always, very welcome!
The following issues were fixed for this release:
FLINK-1916 EOFException when running delta-iteration job
FLINK-2089 “Buffer recycled” IllegalStateException during cancelling
FLINK-2189 NullPointerException in MutableHashTable
FLINK-2205 Confusing entries in JM Webfrontend Job Configuration section
FLINK-2229 Data sets involving non-primitive arrays cannot be unioned
FLINK-2238 Scala ExecutionEnvironment.fromCollection does not work with Sets
FLINK-2248 Allow disabling of sdtout logging output
FLINK-2257 Open and close of RichWindowFunctions is not called
FLINK-2262 ParameterTool API misnamed function
FLINK-2280 GenericTypeComparator.compare() does not respect ascending flag
FLINK-2285 Active policy emits elements of the last window twice
FLINK-2286 Window ParallelMerge sometimes swallows elements of the last window
FLINK-2293 Division by Zero Exception
FLINK-2298 Allow setting custom YARN application names through the CLI
FLINK-2347 Rendering problem with Documentation website
FLINK-2353 Hadoop mapred IOFormat wrappers do not respect JobConfigurable interface
FLINK-2356 Resource leak in checkpoint coordinator
FLINK-2361 CompactingHashTable loses entries
FLINK-2362 distinct is missing in DataSet API documentation
FLINK-2381 Possible class not found Exception on failed partition producer
FLINK-2384 Deadlock during partition spilling
FLINK-2386 Implement Kafka connector using the new Kafka Consumer API
FLINK-2394 HadoopOutFormat OutputCommitter is default to FileOutputCommiter
FLINK-2412 Race leading to IndexOutOfBoundsException when querying for buffer while releasing SpillablePartition
FLINK-2422 Web client is showing a blank page if “Meta refresh” is disabled in browser
FLINK-2424 InstantiationUtil.serializeObject(Object) does not close output stream
FLINK-2437 TypeExtractor.analyzePojo has some problems around the default constructor detection
FLINK-2442 PojoType fields not supported by field position keys
FLINK-2447 TypeExtractor returns wrong type info when a Tuple has two fields of the same POJO type
FLINK-2450 IndexOutOfBoundsException in KryoSerializer
FLINK-2460 ReduceOnNeighborsWithExceptionITCase failure
FLINK-2527 If a VertexUpdateFunction calls setNewVertexValue more than once, the MessagingFunction will only see the first value set
FLINK-2540 LocalBufferPool.requestBuffer gets into infinite loop
FLINK-2542 It should be documented that it is required from a join key to override hashCode(), when it is not a POJO
FLINK-2555 Hadoop Input/Output Formats are unable to access secured HDFS clusters
FLINK-2560 Flink-Avro Plugin cannot be handled by Eclipse
FLINK-2572 Resolve base path of symlinked executable
FLINK-2584 ASM dependency is not shaded away
`}),e.add({id:251,href:"/2015/08/24/introducing-gelly-graph-processing-with-apache-flink/",title:"Introducing Gelly: Graph Processing with Apache Flink",section:"Flink Blog",content:`This blog post introduces Gelly, Apache Flink’s graph-processing API and library. Flink’s native support for iterations makes it a suitable platform for large-scale graph analytics. By leveraging delta iterations, Gelly is able to map various graph processing models such as vertex-centric or gather-sum-apply to Flink dataflows.
Gelly allows Flink users to perform end-to-end data analysis in a single system. Gelly can be seamlessly used with Flink&rsquo;s DataSet API, which means that pre-processing, graph creation, analysis, and post-processing can be done in the same application. At the end of this post, we will go through a step-by-step example in order to demonstrate that loading, transformation, filtering, graph creation, and analysis can be performed in a single Flink program.
Overview
What is Gelly? Graph Representation and Creation Transformations and Utilities Iterative Graph Processing Library of Graph Algorithms Use-Case: Music Profiles Ongoing and Future Work What is Gelly? # Gelly is a Graph API for Flink. It is currently supported in both Java and Scala. The Scala methods are implemented as wrappers on top of the basic Java operations. The API contains a set of utility functions for graph analysis, supports iterative graph processing and introduces a library of graph algorithms.
Back to top
Graph Representation and Creation # In Gelly, a graph is represented by a DataSet of vertices and a DataSet of edges. A vertex is defined by its unique ID and a value, whereas an edge is defined by its source ID, target ID, and value. A vertex or edge for which a value is not specified will simply have the value type set to NullValue.
A graph can be created from:
DataSet of edges and an optional DataSet of vertices using Graph.fromDataSet() DataSet of Tuple3 and an optional DataSet of Tuple2 using Graph.fromTupleDataSet() Collection of edges and an optional Collection of vertices using Graph.fromCollection() In all three cases, if the vertices are not provided, Gelly will automatically produce the vertex IDs from the edge source and target IDs.
Back to top
Transformations and Utilities # These are methods of the Graph class and include common graph metrics, transformations and mutations as well as neighborhood aggregations.
Common Graph Metrics # These methods can be used to retrieve several graph metrics and properties, such as the number of vertices, edges and the node degrees.
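As a quick illustration, the tiny example below builds a graph from a collection of edges (one of the creation paths listed above) and queries a few metrics. The example graph and its types are made up for this sketch; the method names can be checked against the Gelly guide linked at the end of this post.

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.graph.Edge;
import org.apache.flink.graph.Graph;
import org.apache.flink.types.NullValue;

import java.util.Arrays;

public class GraphMetricsExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // A tiny example graph with three double-valued edges: 1->2, 1->3, 2->3.
        // Vertices are derived from the edge endpoints, so their value type is NullValue.
        Graph<Long, NullValue, Double> graph = Graph.fromCollection(Arrays.asList(
                new Edge<>(1L, 2L, 1.0),
                new Edge<>(1L, 3L, 2.0),
                new Edge<>(2L, 3L, 3.0)), env);

        System.out.println(graph.numberOfVertices()); // 3
        System.out.println(graph.numberOfEdges());    // 3

        // Per-vertex degrees as (vertexId, degree) pairs.
        graph.getDegrees().print();
    }
}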
Transformations # The transformation methods enable several Graph operations, using high-level functions similar to the ones provided by the batch processing API. These transformations can be applied one after the other, yielding a new Graph after each step, in a fashion similar to operators on DataSets:
inputGraph.getUndirected().mapEdges(new CustomEdgeMapper()); Transformations can be applied on:
Vertices: mapVertices, joinWithVertices, filterOnVertices, addVertex, &hellip; Edges: mapEdges, filterOnEdges, removeEdge, &hellip; Triplets (source vertex, target vertex, edge): getTriplets Neighborhood Aggregations # Neighborhood methods allow vertices to perform an aggregation on their first-hop neighborhood. This provides a vertex-centric view, where each vertex can access its neighboring edges and neighbor values.
reduceOnEdges() provides access to the neighboring edges of a vertex, i.e. the edge value and the vertex ID of the edge endpoint. In order to also access the neighboring vertices’ values, one should call the reduceOnNeighbors() function. The scope of the neighborhood is defined by the EdgeDirection parameter, which can be IN, OUT or ALL, to gather in-coming, out-going or all edges (neighbors) of a vertex.
The two neighborhood functions mentioned above can only be used when the aggregation function is associative and commutative. If the function does not comply with these restrictions, or if it is desirable to return zero, one or more values per vertex, the more general groupReduceOnEdges() and groupReduceOnNeighbors() functions must be called instead.
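To make that contract concrete, here is a rough sketch of a pairwise reducer such as the SumValues function used in the example below, assuming Long vertex values; the ReduceNeighborsFunction interface name and its reduceNeighbors signature are taken from the Gelly guide of this period and should be double-checked against the linked documentation.

// Sketch only: a pairwise, associative and commutative combiner of neighbor values.
public class SumValues implements ReduceNeighborsFunction<Long> {
    @Override
    public Long reduceNeighbors(Long firstNeighborValue, Long secondNeighborValue) {
        return firstNeighborValue + secondNeighborValue;
    }
}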
Consider the following graph, for instance:
Assume you would want to compute the sum of the values of all incoming neighbors for each vertex. We will call the reduceOnNeighbors() aggregation method since the sum is an associative and commutative operation and the neighbors’ values are needed:
graph.reduceOnNeighbors(new SumValues(), EdgeDirection.IN); The vertex with id 1 is the only node that has no incoming edges. The result is therefore:
Back to top
Iterative Graph Processing # During the past few years, many different programming models for distributed graph processing have been introduced: vertex-centric, partition-centric, gather-apply-scatter, edge-centric, neighborhood-centric. Each one of these models targets a specific class of graph applications and each corresponding system implementation optimizes the runtime respectively. In Gelly, we would like to exploit the flexible dataflow model and the efficient iterations of Flink, to support multiple distributed graph processing models on top of the same system.
Currently, Gelly has methods for writing vertex-centric programs and provides support for programs implemented using the gather-sum(accumulate)-apply model. We are also considering offering support for the partition-centric computation model, using Flink’s mapPartition() operator. This model exposes the partition structure to the user and allows local graph structure exploitation inside a partition to avoid unnecessary communication.
Vertex-centric # Gelly wraps Flink’s Spargel API to support the vertex-centric, Pregel-like programming model. Gelly’s runVertexCentricIteration method accepts two user-defined functions:
MessagingFunction: defines what messages a vertex sends out for the next superstep. VertexUpdateFunction: defines how a vertex will update its value based on the received messages. The method will execute the vertex-centric iteration on the input Graph and return a new Graph, with updated vertex values.
Gelly’s vertex-centric programming model exploits Flink’s efficient delta iteration operators. Many iterative graph algorithms expose non-uniform behavior, where some vertices converge to their final value faster than others. In such cases, the number of vertices that need to be recomputed during an iteration decreases as the algorithm moves towards convergence.
For example, consider a Single Source Shortest Paths problem on the following graph, where S is the source node, i is the iteration counter and the edge values represent distances between nodes:
In each iteration, a vertex receives distances from its neighbors and adopts the minimum of these distances and its current distance as the new value. Then, it propagates its new value to its neighbors. If a vertex does not change value during an iteration, there is no need for it to propagate its old distance to its neighbors, as they have already taken it into account.
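To sketch how the two user-defined functions fit together for this example: the class names below are made up for illustration, and the exact signatures of VertexUpdateFunction, MessagingFunction and runVertexCentricIteration changed across Flink releases, so treat this as an outline of the programming model rather than copy-paste code.

// Vertex update: keep the minimum of the current distance and all incoming candidates.
class VertexDistanceUpdater extends VertexUpdateFunction<Long, Double, Double> {
    public void updateVertex(Long vertexKey, Double vertexValue, MessageIterator<Double> inMessages) {
        double minDistance = Double.MAX_VALUE;
        while (inMessages.hasNext()) {
            minDistance = Math.min(minDistance, inMessages.next());
        }
        // Only update (and thereby stay in the workset) if a shorter distance arrived.
        if (vertexValue > minDistance) {
            setNewVertexValue(minDistance);
        }
    }
}

// Messaging: propagate the new distance plus the edge value to the neighbors.
class MinDistanceMessenger extends MessagingFunction<Long, Double, Double, Double> {
    public void sendMessages(Long vertexKey, Double newDistance) {
        for (Edge<Long, Double> edge : getEdges()) {
            sendMessageTo(edge.getTarget(), newDistance + edge.getValue());
        }
    }
}

// Running the iteration returns a new graph whose vertex values are the final distances.
Graph<Long, Double, Double> shortestPaths =
    graph.runVertexCentricIteration(new VertexDistanceUpdater(), new MinDistanceMessenger(), maxIterations);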
Flink’s IterateDelta operator permits exploitation of this property as well as the execution of computations solely on the active parts of the graph. The operator receives two inputs:
the Solution Set, which represents the current state of the input, and the Workset, which determines which parts of the graph will be recomputed in the next iteration. In the SSSP example above, the Workset contains the vertices which update their distances. The user-defined iterative function is applied on these inputs to produce state updates. These updates are efficiently applied on the state, which is kept in memory.
Internally, a vertex-centric iteration is a Flink delta iteration, where the initial Solution Set is the vertex set of the input graph and the Workset is created by selecting the active vertices, i.e. the ones that updated their value in the previous iteration. The messaging and vertex-update functions are user-defined functions wrapped inside coGroup operators. In each superstep, the active vertices (Workset) are coGrouped with the edges to generate the neighborhoods for each vertex. The messaging function is then applied on each neighborhood. Next, the result of the messaging function is coGrouped with the current vertex values (Solution Set) and the user-defined vertex-update function is applied on the result. The output of this coGroup operator is finally used to update the Solution Set and create the Workset input for the next iteration.
Gather-Sum-Apply # Gelly supports a variation of the popular Gather-Sum-Apply-Scatter computation model, introduced by PowerGraph. In GSA, a vertex pulls information from its neighbors as opposed to the vertex-centric approach where the updates are pushed from the incoming neighbors. The runGatherSumApplyIteration() accepts three user-defined functions:
GatherFunction: gathers neighboring partial values along in-edges. SumFunction: accumulates/reduces the values into a single one. ApplyFunction: uses the result computed in the sum phase to update the current vertex’s value. Similarly to vertex-centric, GSA leverages Flink’s delta iteration operators as, in many cases, vertex values do not need to be recomputed during an iteration.
Let us reconsider the Single Source Shortest Paths algorithm. In each iteration, a vertex:
Gather: retrieves distances from its neighbors summed up with the corresponding edge values; Sum: compares the newly obtained distances in order to extract the minimum; Apply: adopts the minimum distance computed in the sum step, provided that it is lower than its current value. If a vertex’s value does not change during an iteration, it no longer propagates its distance. Internally, a Gather-Sum-Apply Iteration is a Flink delta iteration where the initial solution set is the vertex input set and the workset is created by selecting the active vertices.
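In code, wiring the three phases together could look roughly like this; the three function classes are hypothetical names for this sketch, and the exact signature of runGatherSumApplyIteration should be checked against the Gelly guide.

// Hypothetical function classes implementing the three GSA phases for SSSP.
Graph<Long, Double, Double> gsaShortestPaths = graph.runGatherSumApplyIteration(
        new CalculateDistances(),  // gather: neighbor distance plus edge value
        new ChooseMinDistance(),   // sum: keep the minimum of two partial values
        new UpdateDistance(),      // apply: adopt the minimum if it improves the current value
        maxIterations);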
The three functions: gather, sum and apply are user-defined functions wrapped in map, reduce and join operators respectively. In each superstep, the active vertices are joined with the edges in order to create neighborhoods for each vertex. The gather function is then applied on the neighborhood values via a map function. Afterwards, the result is grouped by the vertex ID and reduced using the sum function. Finally, the outcome of the sum phase is joined with the current vertex values (solution set), the values are updated, thus creating a new workset that serves as input for the next iteration.
Back to top
Library of Graph Algorithms # We are building a library of graph algorithms in Gelly, to easily analyze large-scale graphs. These algorithms extend the GraphAlgorithm interface and can be simply executed on the input graph by calling a run() method.
We currently have implementations of the following algorithms:
PageRank Single-Source-Shortest-Paths Label Propagation Community Detection (based on this paper) Connected Components GSA Connected Components GSA PageRank GSA Single-Source-Shortest-Paths Gelly also offers implementations of common graph algorithms through examples. Among them, one can find graph weighting schemes, like Jaccard Similarity and Euclidean Distance Weighting, as well as computation of common graph metrics.
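Mirroring the LabelPropagation call used in the music-profiles example later in this post, running a library algorithm is a one-liner; graph and numIterations stand in for an existing graph and an iteration bound.

// Execute a library algorithm on an existing graph and inspect the result.
Graph communities = graph.run(new LabelPropagation(numIterations));
communities.getVertices().print();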
Back to top
Use-Case: Music Profiles # In the following section, we go through a use-case scenario that combines the Flink DataSet API with Gelly in order to process users’ music preferences to suggest additions to their playlist.
First, we read a user’s music profile which is in the form of user-id, song-id and the number of plays that each song has. We then filter out the list of songs the users do not wish to see in their playlist. Then we compute the top songs per user (i.e. the songs a user listened to the most). Finally, as a separate use-case on the same data set, we create a user-user similarity graph based on the common songs and use this resulting graph to detect communities by calling Gelly’s Label Propagation library method.
For running the example implementation, please use the 0.10-SNAPSHOT version of Flink as a dependency. The full example code base can be found here. The public data set used for testing can be found here. This data set contains 48,373,586 real user-id, song-id and play-count triplets.
Note: The code snippets in this post try to reduce verbosity by skipping type parameters of generic functions. Please have a look at the full example for the correct and complete code.
Filtering out Bad Records # After reading the (user-id, song-id, play-count) triplets from a CSV file and after parsing a text file in order to retrieve the list of songs that a user would not want to include in a playlist, we use a coGroup function to filter out the mismatches.
// read the user-song-play triplets.
DataSet<Tuple3<String, String, Integer>> triplets = getUserSongTripletsData(env);
// read the mismatches dataset and extract the songIDs
DataSet<Tuple3<String, String, Integer>> validTriplets = triplets
    .coGroup(mismatches).where(1).equalTo(0)
    .with(new CoGroupFunction() {
        public void coGroup(Iterable triplets, Iterable invalidSongs, Collector out) {
            if (!invalidSongs.iterator().hasNext()) {
                for (Tuple3 triplet : triplets) { // valid triplet
                    out.collect(triplet);
                }
            }
        }
    });
The coGroup groups the triplets and the mismatches by song-id (the second field of a triplet, the first field of a mismatch record); if the mismatch iterator of a group is empty, meaning that no mismatch was found for that song, the triplets of that group are collected.
Compute the Top Songs per User # As a next step, we would like to see which songs a user played more often. To this end, we build a user-song weighted, bipartite graph in which edge source vertices are users, edge target vertices are songs and where the weight represents the number of times the user listened to that certain song.
// create a user -> song weighted bipartite graph where the edge weights
// correspond to play counts
Graph<String, NullValue, Integer> userSongGraph = Graph.fromTupleDataSet(validTriplets, env);
Consult the Gelly guide for guidelines on how to create a graph from a given DataSet of edges or from a collection.
To retrieve the top songs per user, we call the groupReduceOnEdges function, as it performs an aggregation over the first-hop neighborhood taking just the edges into consideration. We basically iterate over the edge values and collect the target (song) of the maximum-weight edge.
//get the top track (most listened to) for each user
DataSet<Tuple2> usersWithTopTrack = userSongGraph
    .groupReduceOnEdges(new GetTopSongPerUser(), EdgeDirection.OUT);

class GetTopSongPerUser implements EdgesFunctionWithVertexValue {
    public void iterateEdges(Vertex vertex, Iterable<Edge> edges, Collector out) {
        int maxPlaycount = 0;
        String topSong = "";
        for (Edge edge : edges) {
            if (edge.getValue() > maxPlaycount) {
                maxPlaycount = edge.getValue();
                topSong = edge.getTarget();
            }
        }
        out.collect(new Tuple2(vertex.getId(), topSong));
    }
}
Creating a User-User Similarity Graph # Clustering users based on common interests, in this case, common top songs, could prove to be very useful for advertisements or for recommending new musical compilations.
To form the user-user graph in Flink, we will simply take the edges from the user-song graph (left-hand side of the image), group them by song-id, and then add all the users (source vertex ids) to an ArrayList.
We then match users who listened to the same song two by two, creating a new edge to mark their common interest (right-hand side of the image).
Afterwards, we perform a distinct() operation to avoid creating duplicate edges. Now that we have the DataSet of edges that represent shared interests, creating a graph is as straightforward as a call to the Graph.fromDataSet() method.
// create a user-user similarity graph:
// two users that listen to the same song are connected
DataSet<Edge> similarUsers = userSongGraph.getEdges()
    // filter out user-song edges that are below the playcount threshold
    .filter(new FilterFunction<Edge<String, Integer>>() {
        public boolean filter(Edge<String, Integer> edge) {
            return (edge.getValue() > playcountThreshold);
        }
    })
    .groupBy(1)
    .reduceGroup(new GroupReduceFunction() {
        public void reduce(Iterable<Edge> edges, Collector<Edge> out) {
            List users = new ArrayList();
            for (Edge edge : edges) {
                users.add(edge.getSource());
            }
            // connect every pair of users that share this song
            for (int i = 0; i < users.size() - 1; i++) {
                for (int j = i + 1; j < users.size(); j++) {
                    out.collect(new Edge(users.get(i), users.get(j)));
                }
            }
        }
    })
    .distinct();
Graph similarUsersGraph = Graph.fromDataSet(similarUsers, env).getUndirected();
After having created a user-user graph, it would make sense to detect the various communities formed. To do so, we first initialize each vertex with a numeric label using the joinWithVertices() function that takes a data set of Tuple2 as a parameter and joins the id of a vertex with the first element of the tuple, afterwards applying a map function. Finally, we call the run() method with the LabelPropagation library method passed as a parameter. In the end, the vertices will be updated to contain the most frequent label among their neighbors.
// detect user communities using label propagation
// initialize each vertex with a unique numeric label
DataSet<Tuple2<String, Long>> idsWithInitialLabels = DataSetUtils
    .zipWithUniqueId(similarUsersGraph.getVertexIds())
    .map(new MapFunction<Tuple2<Long, String>, Tuple2<String, Long>>() {
        @Override
        public Tuple2<String, Long> map(Tuple2<Long, String> tuple2) throws Exception {
            return new Tuple2<String, Long>(tuple2.f1, tuple2.f0);
        }
    });
// update the vertex values and run the label propagation algorithm
DataSet<Vertex> verticesWithCommunity = similarUsersGraph
    .joinWithVertices(idsWithInitialLabels, new MapFunction() {
        public Long map(Tuple2 idWithLabel) {
            return idWithLabel.f1;
        }
    })
    .run(new LabelPropagation(numIterations))
    .getVertices();
Back to top
Ongoing and Future Work # Currently, Gelly matches the basic functionalities provided by most state-of-the-art graph processing systems. Our vision is to turn Gelly into more than “yet another library for running PageRank-like algorithms” by supporting generic iterations, implementing graph partitioning, providing bipartite graph support and by offering numerous other features.
We are also enriching Flink Gelly with a set of operators suitable for highly skewed graphs as well as a Graph API built on Flink Streaming.
In the near future, we would like to see how Gelly can be integrated with graph visualization tools, graph database systems and sampling techniques.
Curious? Read more about our plans for Gelly in the roadmap.
Back to top
Links # Gelly Documentation
`}),e.add({id:252,href:"/2015/06/24/announcing-apache-flink-0.9.0/",title:"Announcing Apache Flink 0.9.0",section:"Flink Blog",content:`The Apache Flink community is pleased to announce the availability of the 0.9.0 release. The release is the result of many months of hard work within the Flink community. It contains many new features and improvements which were previewed in the 0.9.0-milestone1 release and have been polished since then. This is the largest Flink release so far.
Download the release and check out the documentation. Feedback through the Flink mailing lists is, as always, very welcome!
New Features # Exactly-once Fault Tolerance for streaming programs # This release introduces a new fault tolerance mechanism for streaming dataflows. The new checkpointing algorithm takes data sources and also user-defined state into account and recovers failures such that all records are reflected exactly once in the operator states.
The checkpointing algorithm is lightweight and driven by barriers that are periodically injected into the data streams at the sources. As such, it has an extremely low coordination overhead and is able to sustain very high throughput rates. User-defined state can be automatically backed up to configurable storage by the fault tolerance mechanism.
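Enabling the new mechanism in a streaming program is a one-line change; a minimal sketch, with the five-second interval chosen purely as an example value:

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Inject checkpoint barriers at the sources roughly every 5 seconds.
env.enableCheckpointing(5000);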
Please refer to the documentation on stateful computation for details on how to use fault-tolerant data streams with Flink.
The fault tolerance mechanism requires data sources that can replay recent parts of the stream, such as Apache Kafka. Read more about how to use the persistent Kafka source.
Table API # Flink’s new Table API offers a higher-level abstraction for interacting with structured data sources. The Table API allows users to execute logical, SQL-like queries on distributed data sets while allowing them to freely mix declarative queries with regular Flink operators. Here is an example that groups and joins two tables:
val clickCounts = clicks
  .groupBy('user).select('userId, 'url.count as 'count)

val activeUsers = users.join(clickCounts)
  .where('id === 'userId && 'count > 10).select('username, 'count, ...)

Tables consist of logical attributes that can be selected by name rather than physical Java and Scala data types. This alleviates a lot of boilerplate code for common ETL tasks and raises the abstraction for Flink programs. Tables are available for both static and streaming data sources (DataSet and DataStream APIs).
Check out the Table guide for Java and Scala.
Gelly Graph Processing API # Gelly is a Java Graph API for Flink. It contains a set of utilities for graph analysis, support for iterative graph processing and a library of graph algorithms. Gelly exposes a Graph data structure that wraps DataSets for vertices and edges, as well as methods for creating graphs from DataSets, graph transformations and utilities (e.g., in- and out- degrees of vertices), neighborhood aggregations, iterative vertex-centric graph processing, as well as a library of common graph algorithms, including PageRank, SSSP, label propagation, and community detection.
Gelly internally builds on top of Flink’s delta iterations. Iterative graph algorithms are executed leveraging mutable state, achieving similar performance with specialized graph processing systems.
Gelly will eventually subsume Spargel, Flink’s Pregel-like API.
Note: The Gelly library is still in beta status and subject to improvements and heavy performance tuning.
Check out the Gelly guide.
Flink Machine Learning Library # This release includes the first version of Flink’s Machine Learning library. The library’s pipeline approach, which has been strongly inspired by scikit-learn’s abstraction of transformers and predictors, makes it easy to quickly set up a data processing pipeline and to get your job done.
Flink distinguishes between transformers and predictors. Transformers are components which transform your input data into a new format allowing you to extract features, cleanse your data or to sample from it. Predictors on the other hand constitute the components which take your input data and train a model on it. The model you obtain from the learner can then be evaluated and used to make predictions on unseen data.
Currently, the machine learning library contains transformers and predictors to do multiple tasks. The library supports multiple linear regression using stochastic gradient descent to scale to large data sizes. Furthermore, it includes an alternating least squares (ALS) implementation to factorize large matrices. The matrix factorization can be used to do collaborative filtering. An implementation of the communication-efficient distributed dual coordinate ascent (CoCoA) algorithm is the latest addition to the library. The CoCoA algorithm can be used to train distributed soft-margin SVMs.
Note: The ML library is still in beta status and subject to improvements and heavy performance tuning.
Check out FlinkML
Flink on YARN leveraging Apache Tez # We are introducing a new execution mode for Flink to be able to run restricted Flink programs on top of Apache Tez. This mode retains Flink’s APIs, optimizer, as well as Flink’s runtime operators, but instead of wrapping those in Flink tasks that are executed by Flink TaskManagers, it wraps them in Tez runtime tasks and builds a Tez DAG that represents the program.
By using Flink on Tez, users have an additional choice for an execution platform for Flink programs. While Flink’s distributed runtime favors low latency, streaming shuffles, and iterative algorithms, Tez focuses on scalability and elastic resource usage in shared YARN clusters.
Get started with Flink on Tez.
Reworked Distributed Runtime on Akka # Flink’s RPC system has been replaced by the widely adopted Akka framework. Akka’s concurrency model offers the right abstraction to develop a fast and robust distributed system. By using Akka’s own failure detection mechanism the stability of Flink’s runtime is significantly improved, because the system can now react properly to node outages. Furthermore, Akka improves Flink’s scalability by introducing asynchronous messages to the system. These asynchronous messages allow Flink to be run on many more nodes than before.
Improved YARN support # Flink’s YARN client contains several improvements, such as a detached mode for starting a YARN session in the background and the ability to submit a single Flink job to a YARN cluster without starting a session, including a “fire and forget” mode. Flink is now also able to reallocate failed YARN containers to maintain the size of the requested cluster. This feature makes it possible to implement fault-tolerant setups on top of YARN. There is also an internal Java API to deploy and control a running YARN cluster. This is being used by system integrators to easily control Flink on YARN within their Hadoop 2 cluster.
See the YARN docs.
Static Code Analysis for the Flink Optimizer: Opening the UDF blackboxes # This release introduces a first version of a static code analyzer that pre-interprets functions written by the user to get information about the function’s internal dataflow. The code analyzer can provide useful information about forwarded fields to Flink’s optimizer and thus speed up job executions. It also points out obvious mistakes in the code. For stability reasons, the code analyzer is initially disabled by default. It can be activated through
ExecutionEnvironment.getExecutionConfig().setCodeAnalysisMode(...)
either as an assistant that gives hints during the implementation or by directly applying the optimizations that have been found.
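As a rough sketch of those two usage modes (the accessor and the CodeAnalysisMode enum values are assumptions based on the API of this era and may differ in your release):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// HINT only prints suggestions during development ...
env.getConfig().setCodeAnalysisMode(CodeAnalysisMode.HINT);
// ... while OPTIMIZE directly applies the detected forwarded-field information.
// env.getConfig().setCodeAnalysisMode(CodeAnalysisMode.OPTIMIZE);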
More Improvements and Fixes # FLINK-1605: Flink is not exposing its Guava and ASM dependencies to Maven projects depending on Flink. We use the maven-shade-plugin to relocate these dependencies into our own namespace. This allows users to use any Guava or ASM version.
FLINK-1417: Automatic recognition and registration of Java Types at Kryo and the internal serializers: Flink has its own type handling and serialization framework falling back to Kryo for types that it cannot handle. To get the best performance Flink is automatically registering all types a user is using in their program with Kryo. Flink also registers serializers for Protocol Buffers, Thrift, Avro and Joda-Time automatically. Users can also manually register serializers to Kryo (https://issues.apache.org/jira/browse/FLINK-1399)
FLINK-1296: Add support for sorting very large records
FLINK-1679: “degreeOfParallelism” methods renamed to “parallelism”
FLINK-1501: Add metrics library for monitoring TaskManagers
FLINK-1760: Add support for building Flink with Scala 2.11
FLINK-1648: Add a mode where the system automatically sets the parallelism to the available task slots
FLINK-1622: Add groupCombine operator
FLINK-1589: Add option to pass Configuration to LocalExecutor
FLINK-1504: Add support for accessing secured HDFS clusters in standalone mode
FLINK-1478: Add strictly local input split assignment
FLINK-1512: Add CsvReader for reading into POJOs.
FLINK-1461: Add sortPartition operator
FLINK-1450: Add Fold operator to the Streaming api
FLINK-1389: Allow setting custom file extensions for files created by the FileOutputFormat
FLINK-1236: Add support for localization of Hadoop Input Splits
FLINK-1179: Add button to JobManager web interface to request stack trace of a TaskManager
FLINK-1105: Add support for locally sorted output
FLINK-1688: Add socket sink
FLINK-1436: Improve usability of command line interface
FLINK-2174: Allow comments in ‘slaves’ file
FLINK-1698: Add polynomial base feature mapper to ML library
FLINK-1697: Add alternating least squares algorithm for matrix factorization to ML library
FLINK-1792: FLINK-456 Improve TM Monitoring: CPU utilization, hide graphs by default and show summary only
FLINK-1672: Refactor task registration/unregistration
FLINK-2001: DistanceMetric cannot be serialized
FLINK-1676: enableForceKryo() is not working as expected
FLINK-1959: Accumulators BROKEN after Partitioning
FLINK-1696: Add multiple linear regression to ML library
FLINK-1820: Bug in DoubleParser and FloatParser - empty String is not casted to 0
FLINK-1985: Streaming does not correctly forward ExecutionConfig to runtime
FLINK-1828: Impossible to output data to an HBase table
FLINK-1952: Cannot run ConnectedComponents example: Could not allocate a slot on instance
FLINK-1848: Paths containing a Windows drive letter cannot be used in FileOutputFormats
FLINK-1954: Task Failures and Error Handling
FLINK-2004: Memory leak in presence of failed checkpoints in KafkaSource
FLINK-2132: Java version parsing is not working for OpenJDK
FLINK-2098: Checkpoint barrier initiation at source is not aligned with snapshotting
FLINK-2069: writeAsCSV function in DataStream Scala API creates no file
FLINK-2092: Document (new) behavior of print() and execute()
FLINK-2177: NullPointer in task resource release
FLINK-2054: StreamOperator rework removed copy calls when passing output to a chained operator
FLINK-2196: Missplaced Class in flink-java SortPartitionOperator
FLINK-2191: Inconsistent use of Closure Cleaner in Streaming API
FLINK-2206: JobManager webinterface shows 5 finished jobs at most
FLINK-2188: Reading from big HBase Tables
FLINK-1781: Quickstarts broken due to Scala Version Variables
Notice # The 0.9 series of Flink is the last version to support Java 6. If you are still using Java 6, please consider upgrading to Java 8 (Java 7 ended its free support in April 2015).
Flink will require at least Java 7 in major releases after 0.9.0.
`}),e.add({id:253,href:"/2015/05/14/april-2015-in-the-flink-community/",title:"April 2015 in the Flink community",section:"Flink Blog",content:`April was a packed month for Apache Flink.
Flink runner for Google Cloud Dataflow # A Flink runner for Google Cloud Dataflow was announced. See the blog posts by data Artisans and the Google Cloud Platform Blog. Google Cloud Dataflow programs can be written using an open-source SDK and run in multiple backends, either as a managed service inside Google’s infrastructure, or leveraging open source runners, including Apache Flink.
Flink 0.9.0-milestone1 release # The highlight of April was of course the availability of Flink 0.9-milestone1. This was a release packed with new features, including a Python DataSet API, the new SQL-like Table API, FlinkML, a machine learning library on Flink, Gelly, Flink’s Graph API, as well as a mode to run Flink on YARN leveraging Tez. In case you missed it, check out the release announcement blog post for details.
Conferences and meetups # April kicked off the conference season. Apache Flink was presented at ApacheCon in Texas (slides) and at the Hadoop User Groups of the Netherlands (slides) and Stockholm, and the Hadoop Summit in Brussels featured two talks on Flink (see slides here and here). The brand new Apache Flink meetup Stockholm was also established.
Google Summer of Code # Three students will work on Flink during Google&rsquo;s Summer of Code program on distributed pattern matching, exact and approximate statistics for data streams and windows, as well as asynchronous iterations and updates.
Flink on the web # Fabian Hueske gave an interview at InfoQ on Apache Flink.
Upcoming events # Stay tuned for a wealth of upcoming events! Two Flink talks will be presented at Berlin Buzzwords, and Flink will be presented at the Hadoop Summit in San Jose. A training workshop on Apache Flink is being organized in Berlin. Finally, Flink Forward, the first conference to bring together the whole Flink community, is taking place in Berlin in October 2015.
`}),e.add({id:254,href:"/2015/05/11/juggling-with-bits-and-bytes/",title:"Juggling with Bits and Bytes",section:"Flink Blog",content:` How Apache Flink operates on binary data # Nowadays, a lot of open-source systems for analyzing large data sets are implemented in Java or other JVM-based programming languages. The most well-known example is Apache Hadoop, but also newer frameworks such as Apache Spark, Apache Drill, and also Apache Flink run on JVMs. A common challenge that JVM-based data analysis engines face is to store large amounts of data in memory - both for caching and for efficient processing such as sorting and joining of data. Managing the JVM memory well makes the difference between a system that is hard to configure and has unpredictable reliability and performance and a system that behaves robustly with few configuration knobs.
In this blog post we discuss how Apache Flink manages memory, talk about its custom data de/serialization stack, and show how it operates on binary data.
Data Objects? Let’s put them on the heap! # The most straight-forward approach to process lots of data in a JVM is to put it as objects on the heap and operate on these objects. Caching a data set as objects would be as simple as maintaining a list containing an object for each record. An in-memory sort would simply sort the list of objects. However, this approach has a few notable drawbacks. First of all it is not trivial to watch and control heap memory usage when a lot of objects are created and invalidated constantly. Memory overallocation instantly kills the JVM with an OutOfMemoryError. Another aspect is garbage collection on multi-GB JVMs which are flooded with new objects. The overhead of garbage collection in such environments can easily reach 50% and more. Finally, Java objects come with a certain space overhead depending on the JVM and platform. For data sets with many small objects this can significantly reduce the effectively usable amount of memory. Given proficient system design and careful, use-case specific system parameter tuning, heap memory usage can be more or less controlled and OutOfMemoryErrors avoided. However, such setups are rather fragile especially if data characteristics or the execution environment change.
What is Flink doing about that? # Apache Flink has its roots at a research project which aimed to combine the best technologies of MapReduce-based systems and parallel database systems. Coming from this background, Flink has always had its own way of processing data in-memory. Instead of putting lots of objects on the heap, Flink serializes objects into a fixed number of pre-allocated memory segments. Its DBMS-style sort and join algorithms operate as much as possible on this binary data to keep the de/serialization overhead at a minimum. If more data needs to be processed than can be kept in memory, Flink’s operators partially spill data to disk. In fact, a lot of Flink’s internal implementations look more like C/C++ rather than common Java. The following figure gives a high-level overview of how Flink stores data serialized in memory segments and spills to disk if necessary.
Flink’s style of active memory management and operating on binary data has several benefits:
Memory-safe execution & efficient out-of-core algorithms. Due to the fixed amount of allocated memory segments, it is trivial to monitor remaining memory resources. In case of memory shortage, processing operators can efficiently write larger batches of memory segments to disk and later read them back. Consequently, OutOfMemoryErrors are effectively prevented.
Reduced garbage collection pressure. Because all long-lived data is in binary representation in Flink’s managed memory, all data objects are either short-lived or mutable and can be reused. Short-lived objects can be more efficiently garbage-collected, which significantly reduces garbage collection pressure. Right now, the pre-allocated memory segments are long-lived objects on the JVM heap, but the Flink community is actively working on allocating off-heap memory for this purpose. This effort will result in much smaller JVM heaps and facilitate even faster garbage collection cycles.
Space efficient data representation. Java objects have a storage overhead which can be avoided if the data is stored in a binary representation.
Efficient binary operations & cache sensitivity. Binary data can be efficiently compared and operated on given a suitable binary representation. Furthermore, the binary representations can put related values, as well as hash codes, keys, and pointers, adjacently into memory. This usually gives data structures more cache-efficient access patterns.
These properties of active memory management are very desirable in data processing systems for large-scale data analytics but have a significant price tag attached. Active memory management and operating on binary data is not trivial to implement, i.e., using java.util.HashMap is much easier than implementing a spillable hash-table backed by byte arrays and a custom serialization stack. Of course Apache Flink is not the only JVM-based data processing system that operates on serialized binary data. Projects such as Apache Drill, Apache Ignite (incubating) or Apache Geode (incubating) apply similar techniques and it was recently announced that Apache Spark will also evolve in this direction with Project Tungsten.
In the following we discuss in detail how Flink allocates memory, de/serializes objects, and operates on binary data. We will also show some performance numbers comparing processing objects on the heap and operating on binary data.
How does Flink allocate memory? # A Flink worker, called TaskManager, is composed of several internal components such as an actor system for coordination with the Flink master, an IOManager that takes care of spilling data to disk and reading it back, and a MemoryManager that coordinates memory usage. In the context of this blog post, the MemoryManager is of most interest.
The MemoryManager takes care of allocating, accounting, and distributing MemorySegments to data processing operators such as sort and join operators. A MemorySegment is Flink’s distribution unit of memory and is backed by a regular Java byte array (size is 32 KB by default). A MemorySegment provides very efficient write and read access to its backed byte array using Java’s unsafe methods. You can think of a MemorySegment as a custom-tailored version of Java’s NIO ByteBuffer. In order to operate on multiple MemorySegments like on a larger chunk of consecutive memory, Flink uses logical views that implement Java’s java.io.DataOutput and java.io.DataInput interfaces.
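To illustrate the idea with a simplified stand-in (this is not Flink’s MemorySegment class, just a sketch of the concept), a segment boils down to typed, offset-based access on top of a pre-allocated byte array:

// Simplified illustration only: Flink's real MemorySegment uses Java's unsafe methods
// and adds bounds checks and many more typed access methods.
public class SimpleSegment {
    private final byte[] memory;

    public SimpleSegment(int size) {
        this.memory = new byte[size];
    }

    // Write a 4-byte int at the given byte offset (big-endian for readability).
    public void putInt(int offset, int value) {
        memory[offset]     = (byte) (value >>> 24);
        memory[offset + 1] = (byte) (value >>> 16);
        memory[offset + 2] = (byte) (value >>> 8);
        memory[offset + 3] = (byte) value;
    }

    // Read the 4-byte int back from the same offset.
    public int getInt(int offset) {
        return ((memory[offset] & 0xFF) << 24)
                | ((memory[offset + 1] & 0xFF) << 16)
                | ((memory[offset + 2] & 0xFF) << 8)
                | (memory[offset + 3] & 0xFF);
    }
}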
MemorySegments are allocated once at TaskManager start-up time and are destroyed when the TaskManager is shut down. Hence, they are reused and not garbage-collected over the whole lifetime of a TaskManager. After all internal data structures of a TaskManager have been initialized and all core services have been started, the MemoryManager starts creating MemorySegments. By default 70% of the JVM heap that is available after service initialization is allocated by the MemoryManager. It is also possible to configure an absolute amount of managed memory. The remaining JVM heap is used for objects that are instantiated during task processing, including objects created by user-defined functions. The following figure shows the memory distribution in the TaskManager JVM after startup.
How does Flink serialize objects? # The Java ecosystem offers several libraries to convert objects into a binary representation and back. Common alternatives are standard Java serialization, Kryo, Apache Avro, Apache Thrift, or Google’s Protobuf. Flink includes its own custom serialization framework in order to control the binary representation of data. This is important because operating on binary data, such as comparing or even manipulating it, requires exact knowledge of the serialization layout. Further, configuring the serialization layout with respect to operations that are performed on binary data can yield a significant performance boost. Flink’s serialization stack also leverages the fact that the types of the objects which go through de/serialization are exactly known before a program is executed.
Flink programs can process data represented as arbitrary Java or Scala objects. Before a program is optimized, the data types at each processing step of the program’s data flow need to be identified. For Java programs, Flink features a reflection-based type extraction component to analyze the return types of user-defined functions. Scala programs are analyzed with help of the Scala compiler. Flink represents each data type with a TypeInformation. Flink has TypeInformations for several kinds of data types, including:
BasicTypeInfo: Any (boxed) Java primitive type or java.lang.String.
BasicArrayTypeInfo: Any array of a (boxed) Java primitive type or java.lang.String.
WritableTypeInfo: Any implementation of Hadoop’s Writable interface.
TupleTypeInfo: Any Flink tuple (Tuple1 to Tuple25). Flink tuples are Java representations for fixed-length tuples with typed fields.
CaseClassTypeInfo: Any Scala CaseClass (including Scala tuples).
PojoTypeInfo: Any POJO (Java or Scala), i.e., an object with all fields either being public or accessible through getters and setters that follow the common naming conventions.
GenericTypeInfo: Any data type that cannot be identified as another type.
Each TypeInformation provides a serializer for the data type it represents. For example, a BasicTypeInfo returns a serializer that writes the respective primitive type, the serializer of a WritableTypeInfo delegates de/serialization to the write() and readFields() methods of the object implementing Hadoop’s Writable interface, and a GenericTypeInfo returns a serializer that delegates serialization to Kryo. Object serialization to a DataOutput which is backed by Flink MemorySegments goes automatically through Java’s efficient unsafe operations. For data types that can be used as keys, i.e., compared and hashed, the TypeInformation provides TypeComparators. TypeComparators compare and hash objects and can - depending on the concrete data type - also efficiently compare binary representations and extract fixed-length binary key prefixes.
Tuple, Pojo, and CaseClass types are composite types, i.e., containers for one or more possibly nested data types. As such, their serializers and comparators are also composite and delegate the serialization and comparison of their member data types to the respective serializers and comparators. The following figure illustrates the serialization of a (nested) Tuple3<Integer, Double, Person> object where Person is a POJO and defined as follows:
public class Person { public int id; public String name; } Flink’s type system can be easily extended by providing custom TypeInformations, Serializers, and Comparators to improve the performance of serializing and comparing custom data types.
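For concreteness, a record of the composite type from the figure can be put together as follows; when it is serialized, the tuple serializer writes the Integer and Double fields itself and delegates the nested Person field to the POJO serializer, as described above. The field values are arbitrary example data.

// Tuple3 is org.apache.flink.api.java.tuple.Tuple3; Person is the POJO defined above.
Person person = new Person();
person.id = 42;
person.name = "Alice";
Tuple3<Integer, Double, Person> record = new Tuple3<>(7, 0.25, person);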
How does Flink operate on binary data? # Similar to many other data processing APIs (including SQL), Flink’s APIs provide transformations to group, sort, and join data sets. These transformations operate on potentially very large data sets. Relational database systems feature very efficient algorithms for these purposes since several decades including external merge-sort, merge-join, and hybrid hash-join. Flink builds on this technology, but generalizes it to handle arbitrary objects using its custom serialization and comparison stack. In the following, we show how Flink operates with binary data by the example of Flink’s in-memory sort algorithm.
Flink assigns a memory budget to its data processing operators. Upon initialization, a sort algorithm requests its memory budget from the MemoryManager and receives a corresponding set of MemorySegments. The set of MemorySegments becomes the memory pool of a so-called sort buffer which collects the data that is to be sorted. The following figure illustrates how data objects are serialized into the sort buffer.
The sort buffer is internally organized into two memory regions. The first region holds the full binary data of all objects. The second region contains pointers to the full binary object data and - depending on the key data type - fixed-length sort keys. When an object is added to the sort buffer, its binary data is appended to the first region, and a pointer (and possibly a key) is appended to the second region. The separation of actual data and pointers plus fixed-length keys is done for two purposes. It enables efficient swapping of fixed-length entries (key + pointer) and also reduces the data that needs to be moved when sorting. If the sort key is a variable-length data type such as a String, the fixed-length sort key must be a prefix key such as the first n characters of a String. Note that not all data types provide a fixed-length (prefix) sort key. When serializing objects into the sort buffer, both memory regions are extended with MemorySegments from the memory pool. Once the memory pool is empty and no more objects can be added, the sort buffer is completely filled and can be sorted. Flink’s sort buffer provides methods to compare and swap elements. This makes the actual sort algorithm pluggable. By default, Flink uses a Quicksort implementation which can fall back to HeapSort. The following figure shows how two objects are compared.
The sort buffer compares two elements by comparing their binary fixed-length sort keys. The comparison is conclusive if it is either done on a full key (not a prefix key) or if the binary prefix keys are not equal. If the prefix keys are equal (or the sort key data type does not provide a binary prefix key), the sort buffer follows the pointers to the actual object data, deserializes both objects and compares the objects. Depending on the result of the comparison, the sort algorithm decides whether to swap the compared elements or not. The sort buffer swaps two elements by moving their fixed-length keys and pointers. The actual data is not moved. Once the sort algorithm finishes, the pointers in the sort buffer are correctly ordered. The following figure shows how the sorted data is returned from the sort buffer.
The sorted data is returned by sequentially reading the pointer region of the sort buffer, skipping the sort keys and following the sorted pointers to the actual data. This data is either deserialized and returned as objects or the binary representation is copied and written to disk in case of an external merge-sort (see this blog post on joins in Flink).
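The following self-contained toy class is a deliberately simplified illustration of this layout, not Flink’s implementation: record bytes go into one region, fixed-length (key, pointer) entries into another, sorting only rearranges the small fixed-length entries, and reading back follows the pointers. For brevity it packs the full int key and the record offset into a single long, whereas Flink stores pointers plus (possibly prefix) keys as described above.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TinySortBuffer {
    private final byte[] data = new byte[1 << 20];            // region 1: serialized records
    private final long[] keyAndPointer = new long[1 << 16];   // region 2: fixed-length entries
    private int dataPos = 0;
    private int count = 0;

    // "Serializes" a record with an int sort key and a short string payload.
    public void add(int key, String payload) {
        int offset = dataPos;
        byte[] bytes = payload.getBytes(StandardCharsets.UTF_8);
        data[dataPos++] = (byte) bytes.length;                 // length-prefixed payload
        System.arraycopy(bytes, 0, data, dataPos, bytes.length);
        dataPos += bytes.length;
        // Fixed-length entry: sort key in the upper 32 bits, record offset in the lower 32 bits.
        keyAndPointer[count++] = ((long) key << 32) | (offset & 0xFFFFFFFFL);
    }

    // Sorting only moves the fixed-length entries; the record bytes never move.
    public void sort() {
        Arrays.sort(keyAndPointer, 0, count);
    }

    // Returns records in sorted order by following the pointers into the data region.
    public void printSorted() {
        for (int i = 0; i < count; i++) {
            long entry = keyAndPointer[i];
            int key = (int) (entry >>> 32);
            int offset = (int) entry;
            int length = data[offset];
            String payload = new String(data, offset + 1, length, StandardCharsets.UTF_8);
            System.out.println(key + " -> " + payload);
        }
    }

    public static void main(String[] args) {
        TinySortBuffer buffer = new TinySortBuffer();
        buffer.add(3, "cherry");
        buffer.add(1, "apple");
        buffer.add(2, "banana");
        buffer.sort();
        buffer.printSorted();   // prints 1 -> apple, 2 -> banana, 3 -> cherry
    }
}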
Show me numbers! # So, what does operating on binary data mean for performance? We’ll run a benchmark that sorts 10 million Tuple2<Integer, String> objects to find out. The values of the Integer field are sampled from a uniform distribution. The String field values have a length of 12 characters and are sampled from a long-tail distribution. The input data is provided by an iterator that returns a mutable object, i.e., the same tuple object instance is returned with different field values. Flink uses this technique when reading data from memory, network, or disk to avoid unnecessary object instantiations. The benchmarks are run in a JVM with 900 MB heap size which is approximately the required amount of memory to store and sort 10 million tuple objects on the heap without dying of an OutOfMemoryError. We sort the tuples on the Integer field and on the String field using three sorting methods:
Object-on-heap. The tuples are stored in a regular java.util.ArrayList with initial capacity set to 10 million entries and sorted using Java’s regular collection sort.
Flink-serialized. The tuple fields are serialized into a sort buffer of 600 MB size using Flink’s custom serializers, sorted as described above, and finally deserialized again. When sorting on the Integer field, the full Integer is used as sort key such that the sort happens entirely on binary data (no deserialization of objects required). For sorting on the String field an 8-byte prefix key is used and tuple objects are deserialized if the prefix keys are equal.
Kryo-serialized. The tuple fields are serialized into a sort buffer of 600 MB size using Kryo serialization and sorted without binary sort keys. This means that each pair-wise comparison requires two objects to be deserialized.
All sort methods are implemented using a single thread. The reported times are averaged over ten runs. After each run, we call System.gc() to request a garbage collection run which does not go into measured execution time. The following figure shows the time to store the input data in memory, sort it, and read it back as objects.
We see that Flink’s sort on binary data using its own serializers significantly outperforms the other two methods. Compared to the object-on-heap method, loading the data into memory is much faster. Since we actually collect the objects, there is no opportunity to reuse the object instances; instead, every tuple has to be re-created. This is less efficient than Flink’s serializers (or Kryo serialization). On the other hand, reading objects from the heap comes for free compared to deserialization. In our benchmark, object cloning was more expensive than serialization and deserialization combined. Looking at the sorting time, we see that sorting on the binary representation is also faster than Java’s collection sort. Sorting data that was serialized using Kryo without binary sort keys is much slower than both other methods. This is due to the heavy deserialization overhead. Sorting the tuples on their String field is faster than sorting on the Integer field due to the long-tailed value distribution which significantly reduces the number of pair-wise comparisons. To get a better feeling of what is happening during sorting we monitored the executing JVM using VisualVM. The following screenshots show heap memory usage, garbage collection activity and CPU usage over the execution of 10 runs.
(Screenshots: garbage collection activity and memory usage for the Object-on-Heap (int), Flink-Serialized (int), and Kryo-Serialized (int) runs.)
The experiments run single-threaded on an 8-core machine, so full utilization of one core only corresponds to a 12.5% overall utilization. The screenshots show that operating on binary data significantly reduces garbage collection activity. For the object-on-heap approach, the garbage collector runs in very short intervals while filling the sort buffer and causes a lot of CPU usage even for a single processing thread (sorting itself does not trigger the garbage collector). The JVM garbage collects with multiple parallel threads, explaining the high overall CPU utilization. On the other hand, the methods that operate on serialized data rarely trigger the garbage collector and have a much lower CPU utilization. In fact the garbage collector does not run at all if the tuples are sorted on the Integer field using the flink-serialized method because no objects need to be deserialized for pair-wise comparisons. The kryo-serialized method requires slightly more garbage collection since it does not use binary sort keys and deserializes two objects for each comparison.
The memory usage charts show that the flink-serialized and kryo-serialized methods constantly occupy a high amount of memory (plus some objects for operation). This is due to the pre-allocation of MemorySegments. The actual memory usage is much lower, because the sort buffers are not completely filled. The following table shows the memory consumption of each method. 10 million records result in about 280 MB of binary data (object data plus pointers and sort keys) depending on the used serializer and presence and size of a binary sort key. Comparing this to the memory requirements of the object-on-heap approach we see that operating on binary data can significantly improve memory efficiency. In our benchmark more than twice as much data can be sorted in-memory if serialized into a sort buffer instead of holding it as objects on the heap.
Occupied Memory    Object-on-Heap          Flink-Serialized        Kryo-Serialized
Sort on Integer    approx. 700 MB (heap)   277 MB (sort buffer)    266 MB (sort buffer)
Sort on String     approx. 700 MB (heap)   315 MB (sort buffer)    266 MB (sort buffer)
To summarize, the experiments verify the previously stated benefits of operating on binary data.
We’re not done yet! # Apache Flink features quite a few advanced techniques to safely and efficiently process huge amounts of data with limited memory resources. However, there are a few points that could make Flink even more efficient. The Flink community is working on moving the managed memory to off-heap memory. This will allow for smaller JVMs, lower garbage collection overhead, and also easier system configuration. With Flink’s Table API, the semantics of all operations such as aggregations and projections are known (in contrast to black-box user-defined functions). Hence, we can generate code for Table API operations that operates directly on binary data. Further improvements include serialization layouts that are tailored to the operations applied on the binary data, as well as code generation for serializers and comparators.
The groundwork (and a lot more) for operating on binary data is done, but there is still some room for making Flink even better and faster. If you are crazy about performance and like to juggle with lots of bits and bytes, join the Flink community!
TL;DR; Give me three things to remember! # Flink’s active memory management avoids nasty OutOfMemoryErrors that kill your JVMs and reduces garbage collection overhead. Flink features a highly efficient data de/serialization stack that facilitates operations on binary data and makes more data fit into memory. Flink’s DBMS-style operators operate natively on binary data yielding high performance in-memory and destage gracefully to disk if necessary. `}),e.add({id:255,href:"/2015/04/13/announcing-flink-0.9.0-milestone1-preview-release/",title:"Announcing Flink 0.9.0-milestone1 preview release",section:"Flink Blog",content:`The Apache Flink community is pleased to announce the availability of the 0.9.0-milestone-1 release. The release is a preview of the upcoming 0.9.0 release. It contains many new features which will be available in the upcoming 0.9 release. Interested users are encouraged to try it out and give feedback. As the version number indicates, this release is a preview release that contains known issues.
You can download the release here and check out the latest documentation here. Feedback through the Flink mailing lists is, as always, very welcome!
New Features # Table API # Flink’s new Table API offers a higher-level abstraction for interacting with structured data sources. The Table API allows users to execute logical, SQL-like queries on distributed data sets while allowing them to freely mix declarative queries with regular Flink operators. Here is an example that groups and joins two tables:
val clickCounts = clicks
  .groupBy('user).select('userId, 'url.count as 'count)

val activeUsers = users.join(clickCounts)
  .where('id === 'userId && 'count > 10).select('username, 'count, ...)

Tables consist of logical attributes that can be selected by name rather than physical Java and Scala data types. This alleviates a lot of boilerplate code for common ETL tasks and raises the abstraction for Flink programs. Tables are available for both static and streaming data sources (DataSet and DataStream APIs).
Check out the Table guide for Java and Scala here.
Gelly Graph Processing API # Gelly is a Java Graph API for Flink. It contains a set of utilities for graph analysis, support for iterative graph processing and a library of graph algorithms. Gelly exposes a Graph data structure that wraps DataSets for vertices and edges. It provides methods for creating graphs from DataSets, graph transformations and utilities (e.g., in- and out-degrees of vertices), neighborhood aggregations, and iterative vertex-centric graph processing, as well as a library of common graph algorithms, including PageRank, SSSP, label propagation, and community detection.
Gelly internally builds on top of Flink’s delta iterations. Iterative graph algorithms are executed leveraging mutable state, achieving performance similar to specialized graph processing systems.
Gelly will eventually subsume Spargel, Flink’s Pregel-like API. Check out the Gelly guide here.
Flink Machine Learning Library # This release includes the first version of Flink’s Machine Learning library. The library’s pipeline approach, which has been strongly inspired by scikit-learn’s abstraction of transformers and estimators, makes it easy to quickly set up a data processing pipeline and to get your job done.
Flink distinguishes between transformers and learners. Transformers are components which transform your input data into a new format allowing you to extract features, cleanse your data or to sample from it. Learners on the other hand constitute the components which take your input data and train a model on it. The model you obtain from the learner can then be evaluated and used to make predictions on unseen data.
Currently, the machine learning library contains transformers and learners for several tasks. The library supports multiple linear regression using a stochastic gradient descent implementation to scale to large data sizes. Furthermore, it includes an alternating least squares (ALS) implementation to factorize large matrices. The matrix factorization can be used for collaborative filtering. An implementation of the communication-efficient distributed dual coordinate ascent (CoCoA) algorithm is the latest addition to the library. The CoCoA algorithm can be used to train distributed soft-margin SVMs.
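To give a feeling for the pipeline style, here is a rough sketch of training the ALS learner on (user, item, rating) triples. The method names (setNumFactors, setIterations, setLambda, fit, predict) follow later FlinkML releases and should be treated as assumptions; the milestone-1 API may differ in its details.

import org.apache.flink.api.scala._
import org.apache.flink.ml.recommendation.ALS

val env = ExecutionEnvironment.getExecutionEnvironment

// toy (user, item, rating) triples; in practice these would be read from files
val ratings: DataSet[(Int, Int, Double)] =
  env.fromElements((1, 10, 4.0), (1, 11, 3.5), (2, 10, 5.0))

// configure the ALS learner: factorization rank, iterations, regularization
val als = ALS()
  .setNumFactors(10)
  .setIterations(10)
  .setLambda(0.1)

// train the factorization model and predict ratings for unseen (user, item) pairs
als.fit(ratings)
val predictions = als.predict(env.fromElements((2, 11)))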
Flink on YARN leveraging Apache Tez # We are introducing a new execution mode for Flink to be able to run restricted Flink programs on top of Apache Tez. This mode retains Flink’s APIs, optimizer, as well as Flink’s runtime operators, but instead of wrapping those in Flink tasks that are executed by Flink TaskManagers, it wraps them in Tez runtime tasks and builds a Tez DAG that represents the program.
By using Flink on Tez, users have an additional choice for an execution platform for Flink programs. While Flink’s distributed runtime favors low latency, streaming shuffles, and iterative algorithms, Tez focuses on scalability and elastic resource usage in shared YARN clusters.
Get started with Flink on Tez here.
Reworked Distributed Runtime on Akka # Flink’s RPC system has been replaced by the widely adopted Akka framework. Akka’s concurrency model offers the right abstraction to develop a fast and robust distributed system. By using Akka’s own failure detection mechanism, the stability of Flink’s runtime is significantly improved, because the system can now react properly to node outages. Furthermore, Akka improves Flink’s scalability by introducing asynchronous messages to the system. These asynchronous messages allow Flink to run on many more nodes than before.
Exactly-once processing on Kafka Streaming Sources # This release introduces stream processing with exactly-once delivery guarantees for Flink streaming programs that analyze streaming sources persisted by Apache Kafka. The system internally tracks the Kafka offsets to ensure that Flink can pick up data from Kafka where it left off in case of a failure.
Read here on how to use the persistent Kafka source.
Improved YARN support # Flink’s YARN client contains several improvements, such as a detached mode for starting a YARN session in the background and the ability to submit a single Flink job to a YARN cluster without starting a session, including a “fire and forget” mode. Flink is now also able to reallocate failed YARN containers to maintain the size of the requested cluster. This feature makes it possible to implement fault-tolerant setups on top of YARN. There is also an internal Java API to deploy and control a running YARN cluster. It is used by system integrators to easily control Flink on YARN within their Hadoop 2 clusters.
See the YARN docs here.
More Improvements and Fixes # FLINK-1605: Flink no longer exposes its Guava and ASM dependencies to Maven projects depending on Flink. We use the maven-shade-plugin to relocate these dependencies into our own namespace. This allows users to use any Guava or ASM version.
FLINK-1417: Automatic recognition and registration of Java types with Kryo and the internal serializers: Flink has its own type handling and serialization framework, falling back to Kryo for types that it cannot handle. To get the best performance, Flink automatically registers all types a user uses in their program with Kryo. Flink also registers serializers for Protocol Buffers, Thrift, Avro and Joda-Time automatically. Users can also manually register serializers with Kryo (https://issues.apache.org/jira/browse/FLINK-1399); see the sketch after this list.
FLINK-1296: Add support for sorting very large records
FLINK-1679: “degreeOfParallelism” methods renamed to “parallelism”
FLINK-1501: Add metrics library for monitoring TaskManagers
FLINK-1760: Add support for building Flink with Scala 2.11
FLINK-1648: Add a mode where the system automatically sets the parallelism to the available task slots
FLINK-1622: Add groupCombine operator
FLINK-1589: Add option to pass Configuration to LocalExecutor
FLINK-1504: Add support for accessing secured HDFS clusters in standalone mode
FLINK-1478: Add strictly local input split assignment
FLINK-1512: Add CsvReader for reading into POJOs.
FLINK-1461: Add sortPartition operator
FLINK-1450: Add Fold operator to the Streaming api
FLINK-1389: Allow setting custom file extensions for files created by the FileOutputFormat
FLINK-1236: Add support for localization of Hadoop Input Splits
FLINK-1179: Add button to JobManager web interface to request stack trace of a TaskManager
FLINK-1105: Add support for locally sorted output
FLINK-1688: Add socket sink
FLINK-1436: Improve usability of command line interface
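To make the FLINK-1417/FLINK-1399 items above more concrete, here is a sketch of registering a custom type and a dedicated Kryo serializer for it. The type and the serializer are hypothetical, and the registration methods are named as in later Flink releases, so treat the exact signatures as assumptions.

import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import org.apache.flink.api.scala._

// hypothetical type that Flink's own serialization framework cannot handle
class LegacyPoint(var x: Double, var y: Double) { def this() = this(0.0, 0.0) }

// hypothetical Kryo serializer for that type
class LegacyPointSerializer extends Serializer[LegacyPoint] {
  override def write(kryo: Kryo, output: Output, p: LegacyPoint): Unit = {
    output.writeDouble(p.x); output.writeDouble(p.y)
  }
  override def read(kryo: Kryo, input: Input, cls: Class[LegacyPoint]): LegacyPoint =
    new LegacyPoint(input.readDouble(), input.readDouble())
}

val env = ExecutionEnvironment.getExecutionEnvironment
// register the type so Kryo assigns it a compact, stable id up front
env.registerType(classOf[LegacyPoint])
// or register the dedicated serializer for it
env.getConfig.registerTypeWithKryoSerializer(classOf[LegacyPoint], classOf[LegacyPointSerializer])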
`}),e.add({id:256,href:"/2015/04/07/march-2015-in-the-flink-community/",title:"March 2015 in the Flink community",section:"Flink Blog",content:`March has been a busy month in the Flink community.
Scaling ALS # Flink committers employed at data Artisans published a blog post on how they scaled matrix factorization with Flink and Google Compute Engine to matrices with 28 billion elements.
Learn about the internals of Flink # The community has started an effort to better document the internals of Flink. Check out the first articles on the Flink wiki on how Flink manages memory, how tasks in Flink exchange data, type extraction and serialization in Flink, as well as how Flink builds on Akka for distributed coordination.
Also check out the new blog post on how Flink executes joins, with several insights into Flink’s runtime.
Meetups and talks # Flink’s machine learning efforts were presented at the Machine Learning Stockholm meetup group. The regular Berlin Flink meetup featured a talk on the past, present, and future of Flink. The talk is available on YouTube.
In the Flink master # Table API in Scala and Java # The new Table API in Flink is now available in both Java and Scala. Check out the examples here (Java) and here (Scala).
Additions to the Machine Learning library # Flink’s Machine Learning library is seeing quite a bit of traction. Recent additions include the CoCoA algorithm for distributed optimization.
Exactly-once delivery guarantees for streaming jobs # Flink streaming jobs now provide exactly-once processing guarantees when coupled with persistent sources (notably Apache Kafka). Flink periodically checkpoints and persists the offsets of the sources and restarts from those checkpoints during failure recovery. This functionality is currently limited in that it does not yet handle large state and iterative programs.
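As a loose, Flink-independent illustration of that mechanism (this is not Flink's actual source interface), the idea can be sketched as a reader that remembers the offset belonging to the last completed checkpoint and rewinds to it on recovery:

// simplified conceptual sketch of checkpointed offset tracking
final class CheckpointedOffsetReader(fetch: Long => Option[String]) {
  private var offset = 0L                   // position of the next record to read
  private var lastCheckpointedOffset = 0L   // offset persisted with the last checkpoint

  def poll(): Option[String] =
    fetch(offset).map { record => offset += 1; record }

  def checkpoint(): Unit = { lastCheckpointedOffset = offset } // stored durably with the checkpoint
  def restore(): Unit = { offset = lastCheckpointedOffset }    // called on failure recovery
}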
`}),e.add({id:257,href:"/2015/03/13/peeking-into-apache-flinks-engine-room/",title:"Peeking into Apache Flink's Engine Room",section:"Flink Blog",content:` Join Processing in Apache Flink # Joins are prevalent operations in many data processing applications. Most data processing systems feature APIs that make joining data sets very easy. However, the internal algorithms for join processing are much more involved – especially if large data sets need to be efficiently handled. Therefore, join processing serves as a good example to discuss the salient design points and implementation details of a data processing system.
In this blog post, we cut through Apache Flink’s layered architecture and take a look at its internals with a focus on how it handles joins. Specifically, I will
show how easy it is to join data sets using Flink’s fluent APIs, discuss basic distributed join strategies, Flink’s join implementations, and its memory management, talk about Flink’s optimizer that automatically chooses join strategies, show some performance numbers for joining data sets of different sizes, and finally briefly discuss joining of co-located and pre-sorted data sets. Disclaimer: This blog post is exclusively about equi-joins. Whenever I say “join” in the following, I actually mean “equi-join”.
How do I join with Flink? # Flink provides fluent APIs in Java and Scala to write data flow programs. Flink’s APIs are centered around parallel data collections which are called data sets. Data sets are processed by applying transformations that compute new data sets. Flink’s transformations include Map and Reduce, as known from MapReduce [1], but also operators for joining, co-grouping, and iterative processing. The documentation gives an overview of all available transformations [2].
Joining two Scala case class data sets is very easy as the following example shows:
// define your data types
case class PageVisit(url: String, ip: String, userId: Long)
case class User(id: Long, name: String, email: String, country: String)

// get your data from somewhere
val visits: DataSet[PageVisit] = ...
val users: DataSet[User] = ...

// filter the users data set
val germanUsers = users.filter((u) => u.country.equals("de"))

// join data sets
val germanVisits: DataSet[(PageVisit, User)] =
  // equi-join condition (PageVisit.userId = User.id)
  visits.join(germanUsers).where("userId").equalTo("id")

Flink’s APIs also allow you to:
apply a user-defined join function to each pair of joined elements instead of returning a ($Left, $Right) tuple, select fields of pairs of joined Tuple elements (projection), and define composite join keys such as .where(“orderDate”, “zipCode”).equalTo(“date”, “zip”). See the documentation for more details on Flink’s join features [3].
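Building on the PageVisit/User example above, here is a small sketch of the first point: applying a user-defined join function so that the join emits (name, url) pairs instead of full (PageVisit, User) tuples.

// emit (user name, visited url) pairs instead of (PageVisit, User) tuples
val visitedUrls: DataSet[(String, String)] =
  visits.join(germanUsers)
    .where("userId")
    .equalTo("id") { (visit, user) => (user.name, visit.url) }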
How does Flink join my data? # Flink uses techniques which are well known from parallel database systems to efficiently execute parallel joins. A join operator must establish all pairs of elements from its input data sets for which the join condition evaluates to true. In a standalone system, the most straightforward implementation of a join is the so-called nested-loop join, which builds the full Cartesian product and evaluates the join condition for each pair of elements. This strategy has quadratic complexity and obviously does not scale to large inputs.
In a distributed system joins are commonly processed in two steps:
1. The data of both inputs is distributed across all parallel instances that participate in the join, and
2. each parallel instance performs a standard stand-alone join algorithm on its local partition of the overall data.
The distribution of data across parallel instances must ensure that each valid join pair can be locally built by exactly one instance. For both steps, there are multiple valid strategies that can be independently picked and which are favorable in different situations. In Flink terminology, the first phase is called Ship Strategy and the second phase Local Strategy. In the following I will describe Flink’s ship and local strategies to join two data sets R and S.
Ship Strategies # Flink features two ship strategies to establish a valid data partitioning for a join:
the Repartition-Repartition strategy (RR) and the Broadcast-Forward strategy (BF). The Repartition-Repartition strategy partitions both inputs, R and S, on their join key attributes using the same partitioning function. Each partition is assigned to exactly one parallel join instance and all data of that partition is sent to its associated instance. This ensures that all elements that share the same join key are shipped to the same parallel instance and can be locally joined. The cost of the RR strategy is a full shuffle of both data sets over the network.
The Broadcast-Forward strategy sends one complete data set (R) to each parallel instance that holds a partition of the other data set (S), i.e., each parallel instance receives the full data set R. Data set S remains local and is not shipped at all. The cost of the BF strategy depends on the size of R and the number of parallel instances it is shipped to. The size of S does not matter because S is not moved. The figure below illustrates how both ship strategies work.
The Repartition-Repartition and Broadcast-Forward ship strategies establish suitable data distributions to execute a distributed join. Depending on the operations that are applied before the join, one or even both inputs of a join are already distributed in a suitable way across parallel instances. In this case, Flink will reuse such distributions and only ship one or no input at all.
Flink’s Memory Management # Before delving into the details of Flink’s local join algorithms, I will briefly discuss Flink’s internal memory management. Data processing algorithms such as joining, grouping, and sorting need to hold portions of their input data in memory. While such algorithms perform best if there is enough memory available to hold all data, it is crucial to gracefully handle situations where the data size exceeds memory. Such situations are especially tricky in JVM-based systems such as Flink because the system needs to reliably recognize that it is short on memory. Failure to detect such situations can result in an OutOfMemoryException and kill the JVM.
Flink handles this challenge by actively managing its memory. When a worker node (TaskManager) is started, it allocates a fixed portion (70% by default) of the JVM’s heap memory that is available after initialization as 32KB byte arrays. These byte arrays are distributed as working memory to all algorithms that need to hold significant portions of data in memory. The algorithms receive their input data as Java data objects and serialize them into their working memory.
This design has several nice properties. First, the number of data objects on the JVM heap is much lower resulting in less garbage collection pressure. Second, objects on the heap have a certain space overhead and the binary representation is more compact. Especially data sets of many small elements benefit from that. Third, an algorithm knows exactly when the input data exceeds its working memory and can react by writing some of its filled byte arrays to the worker’s local filesystem. After the content of a byte array is written to disk, it can be reused to process more data. Reading data back into memory is as simple as reading the binary data from the local filesystem. The following figure illustrates Flink’s memory management.
This active memory management makes Flink extremely robust for processing very large data sets on limited memory resources while preserving all benefits of in-memory processing if data is small enough to fit in-memory. De/serializing data into and from memory has a certain cost overhead compared to simply holding all data elements on the JVM’s heap. However, Flink features efficient custom de/serializers which also allow to perform certain operations such as comparisons directly on serialized data without deserializing data objects from memory.
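To make the mechanism a bit more tangible, here is a deliberately simplified, self-contained sketch of the core idea (it does not use Flink's actual MemorySegment classes): records are serialized into a pre-allocated byte buffer of fixed size, and when the buffer is full the caller can sort and spill it to disk and then reuse the same memory for more data.

import java.nio.ByteBuffer

final class SimpleSortBuffer(capacityBytes: Int) {
  // stands in for Flink's pre-allocated memory segments
  private val memory = ByteBuffer.allocate(capacityBytes)

  /** Serializes a (key, payload) record; returns false if the working memory is exhausted. */
  def write(key: Int, payload: Array[Byte]): Boolean = {
    val required = 4 + 4 + payload.length // int key + length prefix + payload bytes
    if (memory.remaining() < required) {
      false // caller reacts by sorting, spilling to disk, and calling clear()
    } else {
      memory.putInt(key)
      memory.putInt(payload.length)
      memory.put(payload)
      true
    }
  }

  /** Makes the memory reusable after its content has been spilled. */
  def clear(): Unit = memory.clear()
}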
Local Strategies # After the data has been distributed across all parallel join instances using either a Repartition-Repartition or Broadcast-Forward ship strategy, each instance runs a local join algorithm to join the elements of its local partition. Flink’s runtime features two common join strategies to perform these local joins:
the Sort-Merge-Join strategy (SM) and the Hybrid-Hash-Join strategy (HH). The Sort-Merge-Join works by first sorting both input data sets on their join key attributes (Sort Phase) and merging the sorted data sets as a second step (Merge Phase). The sort is done in-memory if the local partition of a data set is small enough. Otherwise, an external merge-sort is done by collecting data until the working memory is filled, sorting it, writing the sorted data to the local filesystem, and starting over by filling the working memory again with more incoming data. After all input data has been received, sorted, and written as sorted runs to the local file system, a fully sorted stream can be obtained. This is done by reading the partially sorted runs from the local filesystem and sort-merging the records on the fly. Once the sorted streams of both inputs are available, both streams are sequentially read and merge-joined in a zig-zag fashion by comparing the sorted join key attributes, building join element pairs for matching keys, and advancing the sorted stream with the lower join key. The figure below shows how the Sort-Merge-Join strategy works.
The Hybrid-Hash-Join distinguishes its inputs as build-side and probe-side input and works in two phases, a build phase followed by a probe phase. In the build phase, the algorithm reads the build-side input and inserts all data elements into an in-memory hash table indexed by their join key attributes. If the hash table outgrows the algorithm’s working memory, parts of the hash table (ranges of hash indexes) are written to the local filesystem. The build phase ends after the build-side input has been fully consumed. In the probe phase, the algorithm reads the probe-side input and probes the hash table for each element using its join key attribute. If the element falls into a hash index range that was spilled to disk, the element is also written to disk. Otherwise, the element is immediately joined with all matching elements from the hash table. If the hash table completely fits into the working memory, the join is finished after the probe-side input has been fully consumed. Otherwise, the current hash table is dropped and a new hash table is built using spilled parts of the build-side input. This hash table is probed by the corresponding parts of the spilled probe-side input. Eventually, all data is joined. Hybrid-Hash-Joins perform best if the hash table completely fits into the working memory because an arbitrarily large probe-side input can be processed on-the-fly without materializing it. However, even if the build-side input does not fit into memory, the Hybrid-Hash-Join has very nice properties. In this case, in-memory processing is partially preserved and only a fraction of the build-side and probe-side data needs to be written to and read from the local filesystem. The next figure illustrates how the Hybrid-Hash-Join works.
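The two local strategies can be illustrated with compact, self-contained sketches (in-memory only; the spilling logic described above is omitted): a merge join over two inputs that are already sorted on an Int key, and a hash join that builds a table from the smaller input and streams the probe side over it.

// merge phase of a sort-merge join: advance both sorted sides in a zig-zag
// fashion and emit all pairs with matching keys
def mergeJoin[L, R](left: IndexedSeq[(Int, L)], right: IndexedSeq[(Int, R)]): Seq[(L, R)] = {
  val out = Seq.newBuilder[(L, R)]
  var i = 0
  var j = 0
  while (i < left.length && j < right.length) {
    val k1 = left(i)._1
    val k2 = right(j)._1
    if (k1 < k2) i += 1
    else if (k2 < k1) j += 1
    else {
      val groupStart = j                                     // collect the right-side key group
      while (j < right.length && right(j)._1 == k1) j += 1
      while (i < left.length && left(i)._1 == k1) {          // pair every left element with the group
        var g = groupStart
        while (g < j) { out += ((left(i)._2, right(g)._2)); g += 1 }
        i += 1
      }
    }
  }
  out.result()
}

// build and probe phases of a hash join, assuming the build side fits in memory
def hashJoin[L, R](build: Iterable[(Int, L)], probe: Iterator[(Int, R)]): Iterator[(L, R)] = {
  val table: Map[Int, Seq[L]] =
    build.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).toSeq } // build phase
  probe.flatMap { case (k, r) =>                                        // probe phase
    table.getOrElse(k, Seq.empty).iterator.map(l => (l, r))
  }
}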
How does Flink choose join strategies? # Ship and local strategies do not depend on each other and can be independently chosen. Therefore, Flink can execute a join of two data sets R and S in nine different ways by combining any of the three ship strategies (RR, BF with R being broadcasted, BF with S being broadcasted) with any of the three local strategies (SM, HH with R being build-side, HH with S being build-side). Each of these strategy combinations results in different execution performance depending on the data sizes and the available amount of working memory. In case of a small data set R and a much larger data set S, broadcasting R and using it as build-side input of a Hybrid-Hash-Join is usually a good choice because the much larger data set S is not shipped and not materialized (given that the hash table completely fits into memory). If both data sets are rather large or the join is performed on many parallel instances, repartitioning both inputs is a robust choice.
Flink features a cost-based optimizer which automatically chooses the execution strategies for all operators including joins. Without going into the details of cost-based optimization, this is done by computing cost estimates for execution plans with different strategies and picking the plan with the least estimated costs. Thereby, the optimizer estimates the amount of data which is shipped over the network and written to disk. If no reliable size estimates for the input data can be obtained, the optimizer falls back to robust default choices. A key feature of the optimizer is to reason about existing data properties. For example, if the data of one input is already partitioned in a suitable way, the generated candidate plans will not repartition this input. Hence, the choice of a RR ship strategy becomes more likely. The same applies for previously sorted data and the Sort-Merge-Join strategy. Flink programs can help the optimizer to reason about existing data properties by providing semantic information about user-defined functions [4]. While the optimizer is a killer feature of Flink, it can happen that a user knows better than the optimizer how to execute a specific join. Similar to relational database systems, Flink offers optimizer hints to tell the optimizer which join strategies to pick [5].
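The hint mechanism mentioned at the end looks roughly like the following in the Scala DataSet API (using the JoinHint enum of later Flink versions, so consider the exact names an assumption): the user forces broadcasting of the first input and using it as the hash table’s build side.

import org.apache.flink.api.scala._
import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint

// r is known to be small: broadcast it and build the hash table from it,
// instead of letting the optimizer estimate the input sizes
def hintedJoin(r: DataSet[(Int, String)], s: DataSet[(Int, Double)]) =
  r.join(s, JoinHint.BROADCAST_HASH_FIRST)
    .where(0)
    .equalTo(0)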
How is Flink’s join performance? # Alright, that sounds good, but how fast are joins in Flink? Let’s have a look. We start with a benchmark of the single-core performance of Flink’s Hybrid-Hash-Join implementation and run a Flink program that executes a Hybrid-Hash-Join with parallelism 1. We run the program on a n1-standard-2 Google Compute Engine instance (2 vCPUs, 7.5GB memory) with two locally attached SSDs. We give 4GB as working memory to the join. The join program generates 1KB records for both inputs on-the-fly, i.e., the data is not read from disk. We run 1:N (Primary-Key/Foreign-Key) joins and generate the smaller input with unique Integer join keys and the larger input with randomly chosen Integer join keys that fall into the key range of the smaller input. Hence, each tuple of the larger side joins with exactly one tuple of the smaller side. The result of the join is immediately discarded. We vary the size of the build-side input from 1 million to 12 million elements (1GB to 12GB). The probe-side input is kept constant at 64 million elements (64GB). The following chart shows the average execution time of three runs for each setup.
The joins with 1 to 3 GB build side (blue bars) are pure in-memory joins. The other joins partially spill data to disk (4 to 12GB, orange bars). The results show that the performance of Flink’s Hybrid-Hash-Join remains stable as long as the hash table completely fits into memory. As soon as the hash table becomes larger than the working memory, parts of the hash table and corresponding parts of the probe side are spilled to disk. The chart shows that the performance of the Hybrid-Hash-Join gracefully decreases in this situation, i.e., there is no sharp increase in runtime when the join starts spilling. In combination with Flink’s robust memory management, this execution behavior gives smooth performance without the need for fine-grained, data-dependent memory tuning.
So, Flink’s Hybrid-Hash-Join implementation performs well on a single thread even for limited memory resources, but how good is Flink’s performance when joining larger data sets in a distributed setting? For the next experiment we compare the performance of the most common join strategy combinations, namely:
Broadcast-Forward, Hybrid-Hash-Join (broadcasting and building with the smaller side),
Repartition, Hybrid-Hash-Join (building with the smaller side), and
Repartition, Sort-Merge-Join
for different input size ratios:
1GB : 1000GB
10GB : 1000GB
100GB : 1000GB
1000GB : 1000GB
The Broadcast-Forward strategy is only executed for up to 10GB. Building a hash table from 100GB of broadcasted data in 5GB of working memory would result in spilling approximately 95GB (build input) + 950GB (probe input) in each parallel thread and would require more than 8TB of local disk storage on each machine.
As in the single-core benchmark, we run 1:N joins, generate the data on-the-fly, and immediately discard the result after the join. We run the benchmark on 10 n1-highmem-8 Google Compute Engine instances. Each instance is equipped with 8 cores, 52GB RAM, 40GB of which are configured as working memory (5GB per core), and one local SSD for spilling to disk. All benchmarks are performed using the same configuration, i.e., no fine tuning for the respective data sizes is done. The programs are executed with a parallelism of 80.
As expected, the Broadcast-Forward strategy performs best for very small inputs because the large probe side is not shipped over the network and is joined locally. However, when the size of the broadcast side grows, two problems arise: the amount of data that is shipped increases, and each parallel instance has to process the full broadcast data set. The performance of both repartitioning strategies behaves similarly for growing input sizes, which indicates that these strategies are mainly limited by the cost of the data transfer (at most 2TB are shipped over the network and joined). Although the Sort-Merge-Join strategy shows the worst performance in all shown cases, it has a right to exist because it can nicely exploit sorted input data.
I’ve got sooo much data to join, do I really need to ship it? # We have seen that off-the-shelf distributed joins work really well in Flink. But what if your data is so huge that you do not want to shuffle it across your cluster? We recently added some features to Flink for specifying semantic properties (partitioning and sorting) on input splits and co-located reading of local input files. With these tools at hand, it is possible to join pre-partitioned data sets from your local filesystem without sending a single byte over your cluster’s network. If the input data is even pre-sorted, the join can be done as a Sort-Merge-Join without sorting, i.e., the join is essentially done on-the-fly. Exploiting co-location requires a very special setup though. Data needs to be stored on the local filesystem because HDFS does not feature data co-location and might move file blocks across data nodes. That means you need to take care of many things yourself which HDFS would have done for you, including replication to avoid data loss. On the other hand, the performance gains of joining co-located and pre-sorted data can be quite substantial.
tl;dr: What should I remember from all of this? # Flink’s fluent Scala and Java APIs make joins and other data transformations easy as cake. The optimizer makes the hard choices for you, but gives you control in case you know better. Flink’s join implementations perform very well in-memory and degrade gracefully when going to disk. Due to Flink’s robust memory management, there is no need for job- or data-specific memory tuning to avoid a nasty OutOfMemoryException. It just runs out-of-the-box. References # [1] “MapReduce: Simplified data processing on large clusters”, Dean, Ghemawat, 2004 [2] Flink 0.8.1 documentation: Data Transformations [3] Flink 0.8.1 documentation: Joins [4] Flink 1.0 documentation: Semantic annotations [5] Flink 1.0 documentation: Optimizer join hints `}),e.add({id:258,href:"/2015/03/02/february-2015-in-the-flink-community/",title:"February 2015 in the Flink community",section:"Flink Blog",content:`February might be the shortest month of the year, but this does not mean that the Flink community has not been busy adding features to the system and fixing bugs. Here’s a rundown of the activity in the Flink community last month.
0.8.1 release # Flink 0.8.1 was released. This bugfixing release resolves a total of 22 issues.
New committer # Max Michels has been voted a committer by the Flink PMC.
Flink adapter for Apache SAMOA # Apache SAMOA (incubating) is a distributed streaming machine learning (ML) framework with a programming abstraction for distributed streaming ML algorithms. SAMOA runs on a variety of backend engines, currently Apache Storm and Apache S4. A pull request is available at the SAMOA repository that adds a Flink adapter for SAMOA.
Easy Flink deployment on Google Compute Cloud # Flink is now integrated into bdutil, Google’s open source tool for creating and configuring (Hadoop) clusters in Google Compute Engine. Deployment of Flink clusters is now supported starting with bdutil 1.2.0.
Flink on the Web # A new blog post on Flink Streaming was published on the Flink blog. Flink was mentioned in several articles on the web. Here are some examples:
How Flink became an Apache Top-Level Project
Stale Synchronous Parallelism: The new frontier for Apache Flink?
Distributed data processing with Apache Flink
Ciao latency, hello speed
In the Flink master # The following features have been now merged in Flink’s master repository.
Gelly # Gelly, Flink’s Graph API allows users to manipulate graph-shaped data directly. Here’s for example a calculation of shortest paths in a graph:
Graph<Long, Double, Double> graph = Graph.fromDataSet(vertices, edges, env);

DataSet<Vertex<Long, Double>> singleSourceShortestPaths = graph
    .run(new SingleSourceShortestPaths<Long>(srcVertexId, maxIterations)).getVertices();

See more Gelly examples here.
Flink Expressions # The newly merged flink-table module is the first step in Flink’s roadmap towards logical queries and SQL support. Here’s a preview of how you can read two CSV files, assign a logical schema to them, and apply transformations like filters and joins using logical attributes rather than physical data types.
val customers = getCustomerDataSet(env)
  .as('id, 'mktSegment)
  .filter( 'mktSegment === "AUTOMOBILE" )

val orders = getOrdersDataSet(env)
  .filter( o => dateFormat.parse(o.orderDate).before(date) )
  .as('orderId, 'custId, 'orderDate, 'shipPrio)

val items = orders.join(customers)
  .where('custId === 'id)
  .select('orderId, 'orderDate, 'shipPrio)

Access to HCatalog tables # With the flink-hcatalog module, you can now conveniently access HCatalog/Hive tables. The module supports projection (selection and order of fields) and partition filters.
Access to secured YARN clusters/HDFS. # With this change, users can access Kerberos-secured YARN (and HDFS) Hadoop clusters. Also, basic support for accessing secured HDFS with a standalone Flink setup is now available.
`}),e.add({id:259,href:"/2015/02/09/introducing-flink-streaming/",title:"Introducing Flink Streaming",section:"Flink Blog",content:`This post is the first of a series of blog posts on Flink Streaming, the recent addition to Apache Flink that makes it possible to analyze continuous data sources in addition to static files. Flink Streaming uses the pipelined Flink engine to process data streams in real time and offers a new API including definition of flexible windows.
In this post, we go through an example that uses the Flink Streaming API to compute statistics on stock market data that arrive continuously and combine the stock market data with Twitter streams. See the Streaming Programming Guide for a detailed presentation of the Streaming API.
First, we read a bunch of stock price streams and combine them into one stream of market data. We apply several transformations on this market data stream, like rolling aggregations per stock. Then we emit price warning alerts when the prices are rapidly changing. Moving towards more advanced features, we compute rolling correlations between the market data streams and a Twitter stream with stock mentions.
For running the example implementation please use the 0.9-SNAPSHOT version of Flink as a dependency. The full example code base can be found here in Scala and here in Java7.
Back to top
Reading from multiple inputs # First, let us create the stream of stock prices:
Read a socket stream of stock prices Parse the text in the stream to create a stream of StockPrice objects Add four other sources tagged with the stock symbol. Finally, merge the streams to create a unified stream. def main(args: Array[String]) { val env = StreamExecutionEnvironment.getExecutionEnvironment //Read from a socket stream at map it to StockPrice objects val socketStockStream = env.socketTextStream(&#34;localhost&#34;, 9999).map(x =&gt; { val split = x.split(&#34;,&#34;) StockPrice(split(0), split(1).toDouble) }) //Generate other stock streams val SPX_Stream = env.addSource(generateStock(&#34;SPX&#34;)(10) _) val FTSE_Stream = env.addSource(generateStock(&#34;FTSE&#34;)(20) _) val DJI_Stream = env.addSource(generateStock(&#34;DJI&#34;)(30) _) val BUX_Stream = env.addSource(generateStock(&#34;BUX&#34;)(40) _) //Merge all stock streams together val stockStream = socketStockStream.merge(SPX_Stream, FTSE_Stream, DJI_Stream, BUX_Stream) stockStream.print() env.execute(&#34;Stock stream&#34;) } public static void main(String[] args) throws Exception { final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); //Read from a socket stream at map it to StockPrice objects DataStream&lt;StockPrice&gt; socketStockStream = env .socketTextStream(&#34;localhost&#34;, 9999) .map(new MapFunction&lt;String, StockPrice&gt;() { private String[] tokens; @Override public StockPrice map(String value) throws Exception { tokens = value.split(&#34;,&#34;); return new StockPrice(tokens[0], Double.parseDouble(tokens[1])); } }); //Generate other stock streams DataStream&lt;StockPrice&gt; SPX_stream = env.addSource(new StockSource(&#34;SPX&#34;, 10)); DataStream&lt;StockPrice&gt; FTSE_stream = env.addSource(new StockSource(&#34;FTSE&#34;, 20)); DataStream&lt;StockPrice&gt; DJI_stream = env.addSource(new StockSource(&#34;DJI&#34;, 30)); DataStream&lt;StockPrice&gt; BUX_stream = env.addSource(new StockSource(&#34;BUX&#34;, 40)); //Merge all stock streams together DataStream&lt;StockPrice&gt; stockStream = socketStockStream .merge(SPX_stream, FTSE_stream, DJI_stream, BUX_stream); stockStream.print(); env.execute(&#34;Stock stream&#34;); See here on how you can create streaming sources for Flink Streaming programs. Flink, of course, has support for reading in streams from external sources such as Apache Kafka, Apache Flume, RabbitMQ, and others. For the sake of this example, the data streams are simply generated using the generateStock method:
val symbols = List(&#34;SPX&#34;, &#34;FTSE&#34;, &#34;DJI&#34;, &#34;DJT&#34;, &#34;BUX&#34;, &#34;DAX&#34;, &#34;GOOG&#34;) case class StockPrice(symbol: String, price: Double) def generateStock(symbol: String)(sigma: Int)(out: Collector[StockPrice]) = { var price = 1000. while (true) { price = price + Random.nextGaussian * sigma out.collect(StockPrice(symbol, price)) Thread.sleep(Random.nextInt(200)) } } private static final ArrayList&lt;String&gt; SYMBOLS = new ArrayList&lt;String&gt;( Arrays.asList(&#34;SPX&#34;, &#34;FTSE&#34;, &#34;DJI&#34;, &#34;DJT&#34;, &#34;BUX&#34;, &#34;DAX&#34;, &#34;GOOG&#34;)); public static class StockPrice implements Serializable { public String symbol; public Double price; public StockPrice() { } public StockPrice(String symbol, Double price) { this.symbol = symbol; this.price = price; } @Override public String toString() { return &#34;StockPrice{&#34; + &#34;symbol=&#39;&#34; + symbol + &#39;\\&#39;&#39; + &#34;, count=&#34; + price + &#39;}&#39;; } } public final static class StockSource implements SourceFunction&lt;StockPrice&gt; { private Double price; private String symbol; private Integer sigma; public StockSource(String symbol, Integer sigma) { this.symbol = symbol; this.sigma = sigma; } @Override public void invoke(Collector&lt;StockPrice&gt; collector) throws Exception { price = DEFAULT_PRICE; Random random = new Random(); while (true) { price = price + random.nextGaussian() * sigma; collector.collect(new StockPrice(symbol, price)); Thread.sleep(random.nextInt(200)); } } } To read from the text socket stream please make sure that you have a socket running. For the sake of the example executing the following command in a terminal does the job. You can get netcat here if it is not available on your machine.
nc -lk 9999

If we execute the program from our IDE, we see the stock prices being generated:
INFO Job execution switched to status RUNNING.
INFO Socket Stream(1/1) switched to SCHEDULED
INFO Socket Stream(1/1) switched to DEPLOYING
INFO Custom Source(1/1) switched to SCHEDULED
INFO Custom Source(1/1) switched to DEPLOYING
…
1> StockPrice{symbol='SPX', count=1011.3405732645239}
2> StockPrice{symbol='SPX', count=1018.3381290039248}
1> StockPrice{symbol='DJI', count=1036.7454894073978}
3> StockPrice{symbol='DJI', count=1135.1170217478427}
3> StockPrice{symbol='BUX', count=1053.667523187687}
4> StockPrice{symbol='BUX', count=1036.552601487263}

Back to top
Window aggregations # We first compute aggregations on time-based windows of the data. Flink provides flexible windowing semantics where windows can also be defined based on count of records or any custom user defined logic.
We partition our stream into windows of 10 seconds and slide the window every 5 seconds. We compute three statistics every 5 seconds. The first is the minimum price of all stocks, the second is the maximum price per stock, and the third is the mean stock price (using a map window function). Aggregations and groupings can be performed on named fields of POJOs, making the code more readable.
//Define the desired time window val windowedStream = stockStream .window(Time.of(10, SECONDS)).every(Time.of(5, SECONDS)) //Compute some simple statistics on a rolling window val lowest = windowedStream.minBy(&#34;price&#34;) val maxByStock = windowedStream.groupBy(&#34;symbol&#34;).maxBy(&#34;price&#34;) val rollingMean = windowedStream.groupBy(&#34;symbol&#34;).mapWindow(mean _) //Compute the mean of a window def mean(ts: Iterable[StockPrice], out: Collector[StockPrice]) = { if (ts.nonEmpty) { out.collect(StockPrice(ts.head.symbol, ts.foldLeft(0: Double)(_ + _.price) / ts.size)) } } //Define the desired time window WindowedDataStream&lt;StockPrice&gt; windowedStream = stockStream .window(Time.of(10, TimeUnit.SECONDS)) .every(Time.of(5, TimeUnit.SECONDS)); //Compute some simple statistics on a rolling window DataStream&lt;StockPrice&gt; lowest = windowedStream.minBy(&#34;price&#34;).flatten(); DataStream&lt;StockPrice&gt; maxByStock = windowedStream.groupBy(&#34;symbol&#34;) .maxBy(&#34;price&#34;).flatten(); DataStream&lt;StockPrice&gt; rollingMean = windowedStream.groupBy(&#34;symbol&#34;) .mapWindow(new WindowMean()).flatten(); //Compute the mean of a window public final static class WindowMean implements WindowMapFunction&lt;StockPrice, StockPrice&gt; { private Double sum = 0.0; private Integer count = 0; private String symbol = &#34;&#34;; @Override public void mapWindow(Iterable&lt;StockPrice&gt; values, Collector&lt;StockPrice&gt; out) throws Exception { if (values.iterator().hasNext()) {s for (StockPrice sp : values) { sum += sp.price; symbol = sp.symbol; count++; } out.collect(new StockPrice(symbol, sum / count)); } } } Let us note that to print a windowed stream one has to flatten it first, thus getting rid of the windowing logic. For example execute maxByStock.flatten().print() to print the stream of maximum prices of the time windows by stock. For Scala flatten() is called implicitly when needed.
Back to top
Data-driven windows # The most interesting event in the stream is when the price of a stock is changing rapidly. We can send a warning when a stock price changes more than 5% since the last warning. To do that, we use a delta-based window providing a threshold on when the computation will be triggered, a function to compute the difference and a default value with which the first record is compared. We also create a Count data type to count the warnings every 30 seconds.
case class Count(symbol: String, count: Int) val defaultPrice = StockPrice(&#34;&#34;, 1000) //Use delta policy to create price change warnings val priceWarnings = stockStream.groupBy(&#34;symbol&#34;) .window(Delta.of(0.05, priceChange, defaultPrice)) .mapWindow(sendWarning _) //Count the number of warnings every half a minute val warningsPerStock = priceWarnings.map(Count(_, 1)) .groupBy(&#34;symbol&#34;) .window(Time.of(30, SECONDS)) .sum(&#34;count&#34;) def priceChange(p1: StockPrice, p2: StockPrice): Double = { Math.abs(p1.price / p2.price - 1) } def sendWarning(ts: Iterable[StockPrice], out: Collector[String]) = { if (ts.nonEmpty) out.collect(ts.head.symbol) } private static final Double DEFAULT_PRICE = 1000.; private static final StockPrice DEFAULT_STOCK_PRICE = new StockPrice(&#34;&#34;, DEFAULT_PRICE); //Use delta policy to create price change warnings DataStream&lt;String&gt; priceWarnings = stockStream.groupBy(&#34;symbol&#34;) .window(Delta.of(0.05, new DeltaFunction&lt;StockPrice&gt;() { @Override public double getDelta(StockPrice oldDataPoint, StockPrice newDataPoint) { return Math.abs(oldDataPoint.price - newDataPoint.price); } }, DEFAULT_STOCK_PRICE)) .mapWindow(new SendWarning()).flatten(); //Count the number of warnings every half a minute DataStream&lt;Count&gt; warningsPerStock = priceWarnings.map(new MapFunction&lt;String, Count&gt;() { @Override public Count map(String value) throws Exception { return new Count(value, 1); } }).groupBy(&#34;symbol&#34;).window(Time.of(30, TimeUnit.SECONDS)).sum(&#34;count&#34;).flatten(); public static class Count implements Serializable { public String symbol; public Integer count; public Count() { } public Count(String symbol, Integer count) { this.symbol = symbol; this.count = count; } @Override public String toString() { return &#34;Count{&#34; + &#34;symbol=&#39;&#34; + symbol + &#39;\\&#39;&#39; + &#34;, count=&#34; + count + &#39;}&#39;; } } public static final class SendWarning implements MapWindowFunction&lt;StockPrice, String&gt; { @Override public void mapWindow(Iterable&lt;StockPrice&gt; values, Collector&lt;String&gt; out) throws Exception { if (values.iterator().hasNext()) { out.collect(values.iterator().next().symbol); } } } Back to top
Combining with a Twitter stream # Next, we will read a Twitter stream and correlate it with our stock price stream. Flink has support for connecting to Twitter’s API, but for the sake of this example we generate dummy tweet data.
//Read a stream of tweets val tweetStream = env.addSource(generateTweets _) //Extract the stock symbols val mentionedSymbols = tweetStream.flatMap(tweet =&gt; tweet.split(&#34; &#34;)) .map(_.toUpperCase()) .filter(symbols.contains(_)) //Count the extracted symbols val tweetsPerStock = mentionedSymbols.map(Count(_, 1)) .groupBy(&#34;symbol&#34;) .window(Time.of(30, SECONDS)) .sum(&#34;count&#34;) def generateTweets(out: Collector[String]) = { while (true) { val s = for (i &lt;- 1 to 3) yield (symbols(Random.nextInt(symbols.size))) out.collect(s.mkString(&#34; &#34;)) Thread.sleep(Random.nextInt(500)) } } //Read a stream of tweets DataStream&lt;String&gt; tweetStream = env.addSource(new TweetSource()); //Extract the stock symbols DataStream&lt;String&gt; mentionedSymbols = tweetStream.flatMap( new FlatMapFunction&lt;String, String&gt;() { @Override public void flatMap(String value, Collector&lt;String&gt; out) throws Exception { String[] words = value.split(&#34; &#34;); for (String word : words) { out.collect(word.toUpperCase()); } } }).filter(new FilterFunction&lt;String&gt;() { @Override public boolean filter(String value) throws Exception { return SYMBOLS.contains(value); } }); //Count the extracted symbols DataStream&lt;Count&gt; tweetsPerStock = mentionedSymbols.map(new MapFunction&lt;String, Count&gt;() { @Override public Count map(String value) throws Exception { return new Count(value, 1); } }).groupBy(&#34;symbol&#34;).window(Time.of(30, TimeUnit.SECONDS)).sum(&#34;count&#34;).flatten(); public static final class TweetSource implements SourceFunction&lt;String&gt; { Random random; StringBuilder stringBuilder; @Override public void invoke(Collector&lt;String&gt; collector) throws Exception { random = new Random(); stringBuilder = new StringBuilder(); while (true) { stringBuilder.setLength(0); for (int i = 0; i &lt; 3; i++) { stringBuilder.append(&#34; &#34;); stringBuilder.append(SYMBOLS.get(random.nextInt(SYMBOLS.size()))); } collector.collect(stringBuilder.toString()); Thread.sleep(500); } } } Back to top
Streaming joins # Finally, we join real-time tweets and stock prices and compute a rolling correlation between the number of price warnings and the number of mentions of a given stock in the Twitter stream. As both of these data streams are potentially infinite, we apply the join on a 30-second window.
//Join warnings and parsed tweets val tweetsAndWarning = warningsPerStock.join(tweetsPerStock) .onWindow(30, SECONDS) .where(&#34;symbol&#34;) .equalTo(&#34;symbol&#34;) { (c1, c2) =&gt; (c1.count, c2.count) } val rollingCorrelation = tweetsAndWarning.window(Time.of(30, SECONDS)) .mapWindow(computeCorrelation _) rollingCorrelation print //Compute rolling correlation def computeCorrelation(input: Iterable[(Int, Int)], out: Collector[Double]) = { if (input.nonEmpty) { val var1 = input.map(_._1) val mean1 = average(var1) val var2 = input.map(_._2) val mean2 = average(var2) val cov = average(var1.zip(var2).map(xy =&gt; (xy._1 - mean1) * (xy._2 - mean2))) val d1 = Math.sqrt(average(var1.map(x =&gt; Math.pow((x - mean1), 2)))) val d2 = Math.sqrt(average(var2.map(x =&gt; Math.pow((x - mean2), 2)))) out.collect(cov / (d1 * d2)) } } //Join warnings and parsed tweets DataStream&lt;Tuple2&lt;Integer, Integer&gt;&gt; tweetsAndWarning = warningsPerStock .join(tweetsPerStock) .onWindow(30, TimeUnit.SECONDS) .where(&#34;symbol&#34;) .equalTo(&#34;symbol&#34;) .with(new JoinFunction&lt;Count, Count, Tuple2&lt;Integer, Integer&gt;&gt;() { @Override public Tuple2&lt;Integer, Integer&gt; join(Count first, Count second) throws Exception { return new Tuple2&lt;Integer, Integer&gt;(first.count, second.count); } }); //Compute rolling correlation DataStream&lt;Double&gt; rollingCorrelation = tweetsAndWarning .window(Time.of(30, TimeUnit.SECONDS)) .mapWindow(new WindowCorrelation()); rollingCorrelation.print(); public static final class WindowCorrelation implements WindowMapFunction&lt;Tuple2&lt;Integer, Integer&gt;, Double&gt; { private Integer leftSum; private Integer rightSum; private Integer count; private Double leftMean; private Double rightMean; private Double cov; private Double leftSd; private Double rightSd; @Override public void mapWindow(Iterable&lt;Tuple2&lt;Integer, Integer&gt;&gt; values, Collector&lt;Double&gt; out) throws Exception { leftSum = 0; rightSum = 0; count = 0; cov = 0.; leftSd = 0.; rightSd = 0.; //compute mean for both sides, save count for (Tuple2&lt;Integer, Integer&gt; pair : values) { leftSum += pair.f0; rightSum += pair.f1; count++; } leftMean = leftSum.doubleValue() / count; rightMean = rightSum.doubleValue() / count; //compute covariance &amp; std. deviations for (Tuple2&lt;Integer, Integer&gt; pair : values) { cov += (pair.f0 - leftMean) * (pair.f1 - rightMean) / count; } for (Tuple2&lt;Integer, Integer&gt; pair : values) { leftSd += Math.pow(pair.f0 - leftMean, 2) / count; rightSd += Math.pow(pair.f1 - rightMean, 2) / count; } leftSd = Math.sqrt(leftSd); rightSd = Math.sqrt(rightSd); out.collect(cov / (leftSd * rightSd)); } } Back to top
Other things to try # For a full feature overview, please check the Streaming Guide, which describes all the available API features. You are very welcome to try out our features for different use cases; we are looking forward to your experiences. Feel free to contact us.
Upcoming for streaming # There are some aspects of Flink Streaming that are subject to change by the next release, which will make this application look even nicer.
Stay tuned for later blog posts on how Flink Streaming works internally, fault tolerance, and performance measurements!
Back to top
`}),e.add({id:260,href:"/2015/02/04/january-2015-in-the-flink-community/",title:"January 2015 in the Flink community",section:"Flink Blog",content:`Happy 2015! Here is a (hopefully digestible) summary of what happened last month in the Flink community.
0.8.0 release # Flink 0.8.0 was released. See here for the release notes.
Flink roadmap # The community has published a roadmap for 2015 on the Flink wiki. Check it out to see what is coming up in Flink, and pick up an issue to contribute!
Articles in the press # The Apache Software Foundation announced Flink as a Top-Level Project. The announcement was picked up by the media, e.g., here, here, and here.
Hadoop Summit # A submitted abstract on Flink Streaming won the community vote at “The Future of Hadoop” track.
Meetups and talks # Flink was presented at the Paris Hadoop User Group, the Bay Area Hadoop User Group, the Apache Tez User Group, and FOSDEM 2015. The January Flink meetup in Berlin had talks on recent community updates and new features.
Notable code contributions # Note: Code contributions listed here may not be part of a release or even the Flink master repository yet.
Using off-heap memory # This pull request enables Flink to use off-heap memory for its internal memory uses (sort, hash, caching of intermediate data sets).
Gelly, Flink’s Graph API # This pull request introduces Gelly, Flink’s brand new Graph API. Gelly offers a native graph programming abstraction with functionality for vertex-centric programming, as well as available graph algorithms. See this slide set for an overview of Gelly.
Semantic annotations # Semantic annotations are a powerful mechanism to expose information about the behavior of Flink functions to Flink’s optimizer. The optimizer can leverage this information to generate more efficient execution plans. For example, the output of a Reduce operator that groups on the second field of a tuple is still partitioned on that field if the Reduce function does not modify the value of that field. By exposing this information, the optimizer can generate plans that avoid expensive data shuffling and reuse the partitioned output of Reduce. Semantic annotations can be defined for most data types, including (nested) tuples and POJOs. See the snapshot documentation for details (not online yet).
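As an illustration (using the forwarded-fields form of later Flink releases, so the exact method name is an assumption): declaring that a map function copies a grouping field through unchanged lets the optimizer keep an existing partitioning on that field.

import org.apache.flink.api.scala._

case class Order(id: Long, customerId: Long, amount: Double)

def amountsPerCustomer(orders: DataSet[Order]): DataSet[(Long, Double)] =
  orders
    .map(o => (o.customerId, o.amount))
    // customerId is forwarded unchanged into position 1 of the output tuple,
    // so a partitioning on it survives the map and need not be re-established
    .withForwardedFields("customerId->_1")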
New YARN client # The improved YARN client of Flink now allows users to deploy Flink on YARN for executing a single job. Older versions only supported a long-running YARN session. The code of the YARN client has been refactored to provide an (internal) Java API for controlling YARN clusters more easily.
`}),e.add({id:261,href:"/2015/01/21/apache-flink-0.8.0-available/",title:"Apache Flink 0.8.0 available",section:"Flink Blog",content:`We are pleased to announce the availability of Flink 0.8.0. This release includes new user-facing features as well as performance and bug fixes, extends the support for filesystems and introduces the Scala API and flexible windowing semantics for Flink Streaming. A total of 33 people have contributed to this release, a big thanks to all of them!
Download Flink 0.8.0
See the release changelog
Overview of major new features # Extended filesystem support: The former DistributedFileSystem interface has been generalized to HadoopFileSystem, which now supports all subclasses of org.apache.hadoop.fs.FileSystem. This allows users to use all file systems supported by Hadoop with Apache Flink. See connecting to other systems
Streaming Scala API: As an alternative to the existing Java API, streaming programs can now also be written in Scala. The Java and Scala APIs now have the same syntax and transformations and will be kept in sync in every future release.
Streaming windowing semantics: The new windowing API offers an expressive way to define custom logic for triggering the execution of a stream window and removing elements. The new features include, among others, out-of-the-box support for windows based on logical or physical time and on data-driven properties of the events themselves. Read more here
Mutable and immutable objects in runtime: All Flink versions before 0.8.0 always passed the same objects to user-written functions. This is a common performance optimization, also used in other systems such as Hadoop. However, it is error-prone for new users because one has to carefully check that references to the object are not kept in the user function. Starting from 0.8.0, Flink allows users to configure a mode that disables this mechanism.
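In later Flink versions this switch is exposed on the ExecutionConfig; a minimal sketch follows (the exact location of the switch in 0.8.0 may differ):

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
// reuse objects passed to user functions (fast, but references must not be kept)
env.getConfig.enableObjectReuse()
// ...or always hand out fresh objects (the safer choice for new users)
// env.getConfig.disableObjectReuse()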
Performance and usability improvements: The new Apache Flink 0.8.0 release brings several new features which will significantly improve the performance and the usability of the system. Amongst others, these features include:
Improved input split assignment which maximizes computation locality
Smart broadcasting mechanism which minimizes network I/O
Custom partitioners which let the user control how the data is partitioned within the cluster. This helps to prevent data skew and makes it possible to implement highly efficient algorithms (see the sketch after this list).
coGroup operator now supports group sorting for its inputs
Kryo is the new fallback serializer: Apache Flink has a sophisticated type analysis and serialization framework that is able to handle commonly used types very efficiently. In addition to that, there is a fallback serializer for types which are not supported. Older versions of Flink used the reflective Avro serializer for that purpose. With this release, Flink uses the powerful Kryo and twitter-chill libraries for support of types such as Java collections and Scala-specific types.
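As a sketch of the custom partitioner hook mentioned in the list above (assuming the partitionCustom method of the DataSet API), the user supplies the function that maps a key to a parallel instance, for example to spread out a skewed key range:

import org.apache.flink.api.common.functions.Partitioner
import org.apache.flink.api.scala._

// route keys to parallel instances by their first character to counter skew on field 0
val byFirstChar = new Partitioner[String] {
  override def partition(key: String, numPartitions: Int): Int =
    if (key.isEmpty) 0 else key.charAt(0) % numPartitions
}

def repartitionWords(words: DataSet[(String, Int)]): DataSet[(String, Int)] =
  words.partitionCustom(byFirstChar, 0)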
Hadoop 2.2.0+ is now the default Hadoop dependency: With Flink 0.8.0 we made the “hadoop2” build profile the default build for Flink. This means that all users using Hadoop 1 (0.2X or 1.2.X versions) have to specify version “0.8.0-hadoop1” in their pom files.
HBase module updated: The HBase version has been updated to 0.98.6.1. Also, HBase is now available for both the Hadoop1 and Hadoop2 profiles of Flink.
Contributors # Marton Balassi Daniel Bali Carsten Brandt Moritz Borgmann Stefan Bunk Paris Carbone Ufuk Celebi Nils Engelbach Stephan Ewen Gyula Fora Gabor Hermann Fabian Hueske Vasiliki Kalavri Johannes Kirschnick Aljoscha Krettek Suneel Marthi Robert Metzger Felix Neutatz Chiwan Park Flavio Pompermaier Mingliang Qi Shiva Teja Reddy Till Rohrmann Henry Saputra Kousuke Saruta Chesney Schepler Erich Schubert Peter Szabo Jonas Traub Kostas Tzoumas Timo Walther Daniel Warneke Chen Xu `}),e.add({id:262,href:"/2015/01/06/december-2014-in-the-flink-community/",title:"December 2014 in the Flink community",section:"Flink Blog",content:`This is the first blog post of a “newsletter” like series where we give a summary of the monthly activity in the Flink community. As the Flink project grows, this can serve as a &ldquo;tl;dr&rdquo; for people that are not following the Flink dev and user mailing lists, or those that are simply overwhelmed by the traffic.
Flink graduation # The biggest news is that the Apache board approved Flink as a top-level Apache project! The Flink team is working closely with the Apache press team for an official announcement, so stay tuned for details!
New Flink website # The Flink website got a total make-over, both in terms of appearance and content.
Flink IRC channel # A new IRC channel called #flink was created at irc.freenode.org. An easy way to access the IRC channel is through the web client. Feel free to stop by to ask anything or share your ideas about Apache Flink!
Meetups and Talks # Apache Flink was presented in the Amsterdam Hadoop User Group
Notable code contributions # Note: Code contributions listed here may not be part of a release or even the current snapshot yet.
Streaming Scala API # The Flink Streaming Java API recently got its Scala counterpart. Once merged, Flink Streaming users can use both Scala and Java for their development. The Flink Streaming Scala API is built as a thin layer on top of the Java API, making sure that the APIs are kept easily in sync.
Intermediate datasets # This pull request introduces a major change in the Flink runtime. Currently, the Flink runtime is based on the notion of operators that exchange data through channels. With the PR, intermediate data sets that are produced by operators become first-class citizens in the runtime. While this does not have any user-facing impact yet, it lays the groundwork for a slew of future features such as blocking execution, fine-grained fault-tolerance, and more efficient data sharing between cluster and client.
Configurable execution mode # This pull request allows the user to change the object-reuse behaviour. Before this pull request, some operations would reuse objects passed to the user function while others would always create new objects. This introduces a system-wide switch and changes all operators to consistently either reuse objects or not reuse them.
Distributed Coordination via Akka # Another major change is a complete rewrite of the JobManager / TaskManager components in Scala. In addition to that, the old RPC service was replaced by Actors, using the Akka framework.
Sorting of very large records # Flink&rsquo;s internal sort algorithms were improved to better handle large records (multiple hundreds of megabytes or larger). Previously, the system would in some cases hold multiple large records in memory at once, resulting in high memory consumption and JVM heap thrashing. With this fix, large records are streamed through the operators, reducing memory consumption and GC pressure. The system now requires much less memory to support algorithms that work on such large records.
Kryo Serialization as the new default fallback # Flink’s built-in type serialization framework handles all common types very efficiently. Prior versions used Avro to serialize types that the built-in framework could not handle. Flink’s serialization system has improved a lot over time and by now surpasses the capabilities of Avro in many cases. Kryo now serves as the default fallback serialization framework, supporting a much broader range of types.
Hadoop FileSystem support # This change permits users to use all file systems supported by Hadoop with Flink. In practice this means that users can use Flink with Tachyon, Google Cloud Storage (including out-of-the-box Flink YARN support on Google Compute Cloud), FTP, and all the other file system implementations available for Hadoop.
Heading to the 0.8.0 release # The community is working hard together with the Apache infra team to migrate the Flink infrastructure to a top-level project. At the same time, the Flink community is working on the Flink 0.8.0 release which should be out very soon.
`}),e.add({id:263,href:"/2014/11/18/hadoop-compatibility-in-flink/",title:"Hadoop Compatibility in Flink",section:"Flink Blog",content:`Apache Hadoop is an industry standard for scalable analytical data processing. Many data analysis applications have been implemented as Hadoop MapReduce jobs and run in clusters around the world. Apache Flink can be an alternative to MapReduce and improves it in many dimensions. Among other features, Flink provides much better performance and offers APIs in Java and Scala, which are very easy to use. Similar to Hadoop, Flink’s APIs provide interfaces for Mapper and Reducer functions, as well as Input- and OutputFormats along with many more operators. While being conceptually equivalent, Hadoop’s MapReduce and Flink’s interfaces for these functions are unfortunately not source compatible.
Flink’s Hadoop Compatibility Package # To close this gap, Flink provides a Hadoop Compatibility package to wrap functions implemented against Hadoop’s MapReduce interfaces and embed them in Flink programs. This package was developed as part of a Google Summer of Code 2014 project.
With the Hadoop Compatibility package, you can reuse all your Hadoop
InputFormats (mapred and mapreduce APIs) OutputFormats (mapred and mapreduce APIs) Mappers (mapred API) Reducers (mapred API) in Flink programs without changing a line of code. Moreover, Flink also natively supports all Hadoop data types (Writables and WritableComparable).
The following code snippet shows a simple Flink WordCount program that solely uses Hadoop data types, InputFormat, OutputFormat, Mapper, and Reducer functions.
// Definition of Hadoop Mapper function public class Tokenizer implements Mapper&lt;LongWritable, Text, Text, LongWritable&gt; { ... } // Definition of Hadoop Reducer function public class Counter implements Reducer&lt;Text, LongWritable, Text, LongWritable&gt; { ... } public static void main(String[] args) { final String inputPath = args[0]; final String outputPath = args[1]; final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); // Setup Hadoop’s TextInputFormat HadoopInputFormat&lt;LongWritable, Text&gt; hadoopInputFormat = new HadoopInputFormat&lt;LongWritable, Text&gt;( new TextInputFormat(), LongWritable.class, Text.class, new JobConf()); TextInputFormat.addInputPath(hadoopInputFormat.getJobConf(), new Path(inputPath)); // Read a DataSet with the Hadoop InputFormat DataSet&lt;Tuple2&lt;LongWritable, Text&gt;&gt; text = env.createInput(hadoopInputFormat); DataSet&lt;Tuple2&lt;Text, LongWritable&gt;&gt; words = text // Wrap Tokenizer Mapper function .flatMap(new HadoopMapFunction&lt;LongWritable, Text, Text, LongWritable&gt;(new Tokenizer())) .groupBy(0) // Wrap Counter Reducer function (used as Reducer and Combiner) .reduceGroup(new HadoopReduceCombineFunction&lt;Text, LongWritable, Text, LongWritable&gt;( new Counter(), new Counter())); // Setup Hadoop’s TextOutputFormat HadoopOutputFormat&lt;Text, LongWritable&gt; hadoopOutputFormat = new HadoopOutputFormat&lt;Text, LongWritable&gt;( new TextOutputFormat&lt;Text, LongWritable&gt;(), new JobConf()); hadoopOutputFormat.getJobConf().set(&#34;mapred.textoutputformat.separator&#34;, &#34; &#34;); TextOutputFormat.setOutputPath(hadoopOutputFormat.getJobConf(), new Path(outputPath)); // Output &amp; Execute words.output(hadoopOutputFormat); env.execute(&#34;Hadoop Compat WordCount&#34;); } As you can see, Flink represents Hadoop key-value pairs as Tuple2&lt;key, value&gt; tuples. Note, that the program uses Flink’s groupBy() transformation to group data on the key field (field 0 of the Tuple2&lt;key, value&gt;) before it is given to the Reducer function. At the moment, the compatibility package does not evaluate custom Hadoop partitioners, sorting comparators, or grouping comparators.
Hadoop functions can be used at any position within a Flink program and of course also be mixed with native Flink functions. This means that instead of assembling a workflow of Hadoop jobs in an external driver method or using a workflow scheduler such as Apache Oozie, you can implement an arbitrarily complex Flink program consisting of multiple Hadoop Input- and OutputFormats, Mapper and Reducer functions. When executing such a Flink program, data will be pipelined between your Hadoop functions and will not be written to HDFS just for the purpose of data exchange.
What comes next? # While the Hadoop compatibility package is already very useful, we are currently working on a dedicated Hadoop Job operation to embed and execute Hadoop jobs as a whole in Flink programs, including their custom partitioning, sorting, and grouping code. With this feature, you will be able to chain multiple Hadoop jobs, mix them with Flink functions, and other operations such as Spargel operations (Pregel/Giraph-style jobs).
Summary # Flink lets you reuse a lot of the code you wrote for Hadoop MapReduce, including all data types, all Input- and OutputFormats, and the Mappers and Reducers of the mapred API. Hadoop functions can be used within Flink programs and mixed with all other Flink functions. Due to Flink’s pipelined execution, Hadoop functions can be arbitrarily assembled without data exchange via HDFS. Moreover, the Flink community is currently working on a dedicated Hadoop Job operation to support the execution of Hadoop jobs as a whole.
If you want to use Flink’s Hadoop compatibility package, check out our documentation.
`}),e.add({id:264,href:"/2014/11/04/apache-flink-0.7.0-available/",title:"Apache Flink 0.7.0 available",section:"Flink Blog",content:`We are pleased to announce the availability of Flink 0.7.0. This release includes new user-facing features as well as performance and bug fixes, brings the Scala and Java APIs in sync, and introduces Flink Streaming. A total of 34 people have contributed to this release, a big thanks to all of them!
Download Flink 0.7.0 here
See the release changelog here
Overview of major new features # Flink Streaming: The gem of the 0.7.0 release is undoubtedly Flink Streaming. Available currently in alpha, Flink Streaming provides a Java API on top of Apache Flink that can consume streaming data sources (e.g., from Apache Kafka, Apache Flume, and others) and process them in real time. A dedicated blog post on Flink Streaming and its performance is coming up here soon. You can check out the Streaming programming guide here.
New Scala API: The Scala API has been completely rewritten. The Java and Scala APIs now have the same syntax and transformations and will be kept in sync in every future release. See the new Scala API here.
Logical key expressions: You can now specify grouping and joining keys with logical names for member variables of POJO data types. For example, you can join two data sets as persons.join(cities).where(“zip”).equalTo(“zipcode”). Read more here.
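For instance, with hypothetical POJOs like the following (field names invented for illustration), the logical names in where()/equalTo() refer directly to member variables:

// Flink POJOs need public fields (or getters/setters) and a public no-argument constructor.
public class Person {
    public String name;
    public String zip;

    public Person() {}
}

public class City {
    public String zipcode;
    public String cityName;

    public City() {}
}

// persons.join(cities).where("zip").equalTo("zipcode")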
Hadoop MapReduce compatibility: You can run unmodified Hadoop Mappers and Reducers (mapred API) in Flink, use all Hadoop data types, and read data with all Hadoop InputFormats.
Collection-based execution backend: The collection-based execution backend enables you to execute a Flink job as a simple Java collections program, completely bypassing the Flink runtime and optimizer. This feature is extremely useful for prototyping and for embedding Flink jobs in projects in a very lightweight manner.
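A minimal sketch of what this looks like in today's DataSet API (createCollectionsEnvironment runs the whole program in-process on plain Java collections):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.ExecutionEnvironment;

public class CollectionBackendExample {
    public static void main(String[] args) throws Exception {
        // Executes the program on Java collections, bypassing the distributed runtime and optimizer.
        ExecutionEnvironment env = ExecutionEnvironment.createCollectionsEnvironment();

        env.fromElements(1, 2, 3)
            .map(new MapFunction<Integer, Integer>() {
                @Override
                public Integer map(Integer value) {
                    return value * 2;
                }
            })
            .print();
    }
}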
Record API deprecated: The (old) Stratosphere Record API has been marked as deprecated and is planned for removal in the 0.9.0 release.
BLOB service: This release contains a new service to distribute jar files and other binary data among the JobManager, TaskManagers and the client.
Intermediate data sets: A major rewrite of the system internals introduces intermediate data sets as first class citizens. The internal state machine that tracks the distributed tasks has also been completely rewritten for scalability. While this is not visible as a user-facing feature yet, it is the foundation for several upcoming exciting features.
Note: Currently, there is limited support for Java 8 lambdas when compiling and running from an IDE. The problem is due to type erasure and whether Java compilers retain type information. We are currently working with the Eclipse and OpenJDK communities to resolve this.
Contributors # Tamas Ambrus Mariem Ayadi Marton Balassi Daniel Bali Ufuk Celebi Hung Chang David Eszes Stephan Ewen Judit Feher Gyula Fora Gabor Hermann Fabian Hueske Vasiliki Kalavri Kristof Kovacs Aljoscha Krettek Sebastian Kruse Sebastian Kunert Matyas Manninger Robert Metzger Mingliang Qi Till Rohrmann Henry Saputra Chesnay Schelper Moritz Schubotz Hung Sendoh Chang Peter Szabo Jonas Traub Fabian Tschirschnitz Artem Tsikiridis Kostas Tzoumas Timo Walther Daniel Warneke Tobias Wiens Yingjun Wu `}),e.add({id:265,href:"/2014/10/03/upcoming-events/",title:"Upcoming Events",section:"Flink Blog",content:`We are happy to announce several upcoming Flink events both in Europe and the US. Starting with a Flink hackathon in Stockholm (Oct 8-9) and a talk about Flink at the Stockholm Hadoop User Group (Oct 8). This is followed by the very first Flink Meetup in Berlin (Oct 15). In the US, there will be two Flink Meetup talks: the first one at the Pasadena Big Data User Group (Oct 29) and the second one at Silicon Valley Hands On Programming Events (Nov 4).
We are looking forward to seeing you at any of these events. The following is an overview of each event and links to the respective Meetup pages.
Flink Hackathon, Stockholm (Oct 8-9) # The hackathon will take place at KTH/SICS from Oct 8th-9th. You can sign up here: https://docs.google.com/spreadsheet/viewform?formkey=dDZnMlRtZHJ3Z0hVTlFZVjU2MWtoX0E6MA.
Here is a rough agenda and a list of topics to work upon or look into. Suggestions and more topics are welcome.
Wednesday (8th) # 9:00 - 10:00 Introduction to Apache Flink, System overview, and Dev environment (by Stephan)
10:15 - 11:00 Introduction to the topics (Streaming API and system by Gyula &amp; Marton), (Graphs by Vasia / Martin / Stephan)
11:00 - 12:30 Happy hacking (part 1)
12:30 - Lunch (Food will be provided by KTH / SICS. A big thank you to them and also to Paris, for organizing that)
13:xx - Happy hacking (part 2)
Thursday (9th) # Happy hacking (continued)
Suggestions for topics # Streaming # Sample streaming applications (e.g. continuous heavy hitters and topics on the twitter stream)
Implement a simple SQL to Streaming program parser. Possibly using Apache Calcite (http://optiq.incubator.apache.org/)
Implement different windowing methods (count-based, time-based, &hellip;)
Implement different windowed operations (windowed-stream-join, windowed-stream-co-group)
Streaming state, and interaction with other programs (that access state of a stream program)
Graph Analysis # Prototype a Graph DSL (simple graph building, filters, graph properties, some algorithms)
Prototype abstractions for different graph processing paradigms (vertex-centric, partition-centric).
Generalize the delta iterations, allow flexible state access.
Meetup: Hadoop User Group Talk, Stockholm (Oct 8) # Hosted by Spotify, opens at 6 PM.
http://www.meetup.com/stockholm-hug/events/207323222/
1st Flink Meetup, Berlin (Oct 15) # We are happy to announce the first Flink meetup in Berlin. You are very welcome to sign up and attend. The event will be held at the Betahaus Cafe.
http://www.meetup.com/Apache-Flink-Meetup/events/208227422/
Meetup: Pasadena Big Data User Group (Oct 29) # http://www.meetup.com/Pasadena-Big-Data-Users-Group/
Meetup: Silicon Valley Hands On Programming Events (Nov 4) # http://www.meetup.com/HandsOnProgrammingEvents/events/210504392/
`}),e.add({id:266,href:"/2014/09/26/apache-flink-0.6.1-available/",title:"Apache Flink 0.6.1 available",section:"Flink Blog",content:`We are happy to announce the availability of Flink 0.6.1.
0.6.1 is a maintenance release that includes minor fixes across several parts of the system. We suggest that all users of Flink work with this newest version.
Download the release today.
`}),e.add({id:267,href:"/2014/08/26/apache-flink-0.6-available/",title:"Apache Flink 0.6 available",section:"Flink Blog",content:`We are happy to announce the availability of Flink 0.6. This is the first release of the system inside the Apache Incubator and under the name Flink. Releases up to 0.5 were under the name Stratosphere, the academic and open source project that Flink originates from.
What is Flink? # Apache Flink is a general-purpose data processing engine for clusters. It runs on YARN clusters on top of data stored in Hadoop, as well as stand-alone. Flink currently has programming APIs in Java and Scala. Jobs are executed via Flink&rsquo;s own runtime engine. Flink features:
Robust in-memory and out-of-core processing: once read, data stays in memory as much as possible, and is gracefully de-staged to disk in the presence of memory pressure from limited memory or other applications. The runtime is designed to perform very well both in setups with abundant memory and in setups where memory is scarce.
POJO-based APIs: when programming, you do not have to pack your data into key-value pairs or some other framework-specific data model. Rather, you can use arbitrary Java and Scala types to model your data.
Efficient iterative processing: Flink contains explicit &ldquo;iterate&rdquo; operators that enable very efficient loops over data sets, e.g., for machine learning and graph applications.
A modular system stack: Flink is not a direct implementation of its APIs but a layered system. All programming APIs are translated to an intermediate program representation that is compiled and optimized via a cost-based optimizer. Lower-level layers of Flink also expose programming APIs for extending the system.
Data pipelining/streaming: Flink&rsquo;s runtime is designed as a pipelined data processing engine rather than a batch processing engine. Operators do not wait for their predecessors to finish in order to start processing data. This results in very efficient handling of large data sets.
Release 0.6 # Flink 0.6 builds on the latest Stratosphere 0.5 release. It includes many bug fixes and improvements that make the system more stable and robust, as well as breaking API changes.
The full release notes are available here.
Download the release here.
Contributors # Wilson Cao Ufuk Celebi Stephan Ewen Jonathan Hasenburg Markus Holzemer Fabian Hueske Sebastian Kunert Vikhyat Korrapati Aljoscha Krettek Sebastian Kruse Raymond Liu Robert Metzger Mingliang Qi Till Rohrmann Henry Saputra Chesnay Schepler Kostas Tzoumas Robert Waury Timo Walther Daniel Warneke Tobias Wiens `}),e.add({id:268,href:"/how-to-contribute/code-style-and-quality-common/",title:"Code Style and Quality Guide — Common Rules",section:"How to Contribute",content:` Code Style and Quality Guide — Common Rules # Preamble # Pull Requests &amp; Changes # Common Coding Guide # Java Language Guide # Scala Language Guide # Components Guide # Formatting Guide # 1. Copyright # Each file must include the Apache license information as a header.
/* * Licensed to the Apache Software Foundation (ASF) under one * or more contributor license agreements. See the NOTICE file * distributed with this work for additional information * regarding copyright ownership. The ASF licenses this file * to you under the Apache License, Version 2.0 (the * &#34;License&#34;); you may not use this file except in compliance * with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an &#34;AS IS&#34; BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ 2. Tools # We recommend to follow the IDE Setup Guide to get IDE tooling configured.
Warnings # We strive for zero warnings Even though there are many warnings in existing code, new changes should not add any additional compiler warnings If it is not possible to address the warning in a sane way (in some cases when working with generics) add an annotation to suppress the warning When deprecating methods, check that this does not introduce additional warnings 3. Comments And Code Readability # Comments # Golden rule: Comment as much as necessary to support code understanding, but don’t add redundant information.
Think about
What is the code doing? How does the code do this? Why is the code like that? The code alone should explain as much as possible the “what” and the “how”
Use JavaDocs to describe the roles of classes and the contracts of methods, in cases where the contract is not obvious or intuitive from the method name (the “what”). The flow of the code should give a good description of the “how”. Think of variable and method names as part of the code documenting itself. It often makes reading the code easier if larger blocks that form a unit are moved into a private method with a descriptive name of what that block is doing In-code comments help explain the “why”
For example // this specific code layout helps the JIT to better do this or that Or // nulling out this field here means future write attempts are fail-fast Or // for arguments with which this method is actually called, this seemingly naive approach works actually better than any optimized/smart version In-code comments should not state redundant information about the “what” and “how” that is already obvious in the code itself.
JavaDocs should not state meaningless information (just to satisfy the Checkstyle checker).
Don’t:
/** * The symbol expression. */ public class CommonSymbolExpression {} Do:
/** * An expression that wraps a single specific symbol. * A symbol could be a unit, an alias, a variable, etc. */ public class CommonSymbolExpression {} Branches and Nesting # Avoid deep nesting of scopes, by flipping the if condition and exiting early.
Don’t:
if (a) { if (b) { if (c) { the main path } } } Do:
if (!a) { return ... } if (!b) { return ... } if (!c) { return ... } the main path 4. Design and Structure # While it is hard to exactly specify what constitutes a good design, there are some properties that can serve as a litmus test for a good design. If these properties are given, the chances are good that the design is heading in a good direction. If these properties cannot be achieved, there is a high probability that the design is flawed.
Immutability and Eager Initialization # Try to use immutable types where possible, especially for APIs, messages, identifiers, properties, configuration, etc. A good general approach is to try and make as many fields of a class final as possible. Classes that are used as keys in maps should be strictly immutable and only have final fields (except maybe auxiliary fields, like lazy cached hash codes). Eagerly initialize classes. There should be no init() or setup() methods. Once the constructor completes, the object should be usable. Nullability of the Mutable Parts # For nullability, the Flink codebase aims to follow these conventions:
Fields, parameters, and return types are always non-null, unless indicated otherwise All fields, parameters and method types that can be null should be annotated with @javax.annotation.Nullable. That way, you get warnings from IntelliJ about all sections where you have to reason about potential null values. For all mutable (non-final) fields that are not annotated, the assumption is that while the field value changes, there always is a value. It should be double-checked that these fields can in fact never be null throughout the lifetime of the object. Note: This means that @Nonnull annotations are usually not necessary, but can be used in certain cases to override a previous annotation, or to point non-nullability out in a context where one would expect a nullable value.
Optional is a good solution as a return type for a method that may or may not have a result, so nullable return types are good candidates to be replaced with Optional. See also usage of Java Optional.
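A small illustrative sketch of these conventions (the class is made up, not taken from the Flink codebase): the nullable mutable field is annotated, parameters stay non-null by default, and the public accessor returns Optional instead of null.

import java.util.Optional;
import javax.annotation.Nullable;

public class UserCache {

    // Mutable field that may legitimately have no value yet, hence the annotation.
    @Nullable
    private String lastLookedUpKey;

    // Public API: prefer Optional over a nullable return type.
    public Optional<String> lastLookedUpKey() {
        return Optional.ofNullable(lastLookedUpKey);
    }

    // Parameters are non-null by default; no annotation needed.
    public void recordLookup(String key) {
        this.lastLookedUpKey = key;
    }
}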
Avoid Code Duplication # Whenever you are about to copy/paste some code, or reproduce a similar type of functionality in a different place, think about the ways how to refactor/reuse/abstract the changes to avoid the duplication. Common behavior between different specializations should be shared in a common component (or a shared superclass). Always use “private static final” constants instead of duplicating strings or other special values at different locations. Constants should be declared in the top member area of a class. Design for Testability # Code that is easily testable typically has good separation of concerns and is structured to be reusable outside the original context (by being easily reusable in tests).
A good summary of problems / symptoms and recommended refactorings is in the PDF linked below. Please note that while the examples in the PDF often use a dependency injection framework (Guice), this works in the same way without such a framework.1
http://misko.hevery.com/attachments/Guide-Writing%20Testable%20Code.pdf
Here is a compact summary of the most important aspects.
Inject dependencies
Reusability becomes easier if constructors don’t create their dependencies (the objects assigned to the fields), but accept them as parameters.
Effectively, constructors should contain no new keyword. Exceptions are creating a new empty collection (new ArrayList&lt;&gt;()) or similar auxiliary fields (objects that have only primitive dependencies). To make instantiation easy / readable, add factory methods or additional convenience constructors that construct whole objects with their dependencies.
In no case should it ever be required to use reflection or a “Whitebox” util to change the fields of an object in a test, or to use PowerMock to intercept a “new” call and supply a mock.
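A compact sketch of this pattern (all names are invented for illustration): the collaborator is injected through the constructor, a factory method wires up the production default, and a test can pass in a simple stub instead.

import java.util.ArrayList;
import java.util.List;

public class RecordArchiver {

    private final StorageClient storage;                         // injected collaborator
    private final List<String> archivedKeys = new ArrayList<>(); // auxiliary field: "new" is fine here

    public RecordArchiver(StorageClient storage) {
        this.storage = storage;
    }

    // Convenience factory for production use; tests call the constructor with a stub.
    public static RecordArchiver createDefault() {
        return new RecordArchiver(new HttpStorageClient());
    }

    public void archive(String key, byte[] payload) {
        storage.put(key, payload);
        archivedKeys.add(key);
    }

    /** Minimal collaborator interface, so tests can supply a stub. */
    public interface StorageClient {
        void put(String key, byte[] payload);
    }

    /** Hypothetical production implementation. */
    public static class HttpStorageClient implements StorageClient {
        @Override
        public void put(String key, byte[] payload) {
            // talk to the real storage system here
        }
    }
}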
Avoid “too many collaborators”
If you have to take a big set of other components into account during testing (“too many collaborators”), consider refactoring.
The component/class you want to test probably depends on another broad component (and its implementation), rather than on the minimal interface (abstraction) required for its work.
In that case, segregate the interfaces (factor out the minimal required interface) and supply a test stub.
For example, if testing an S3RecoverableMultiPartUploader requires actual S3 access, then the S3 access should be factored out into an interface and the test should replace it with a test stub. This naturally requires the ability to inject dependencies (see above) ⇒ Please note that these steps often require more effort in implementing tests (factoring out interfaces, creating dedicated test stubs), but they make the tests more resilient to changes in other components, i.e., you do not need to touch the tests when making unrelated changes.
Performance Awareness # We can conceptually distinguish between code that “coordinates” and code that “processes data”. Code that coordinates should always favor simplicity and cleanness. Data processing code is highly performance critical and should optimize for performance.
That means still applying the general idea of the sections above, but possibly forgoing some aspects in some places, in order to achieve higher performance.
Which code paths are Data Processing paths?
Per-record code paths: Methods and code paths that are called for each record. Found for example in Connectors, Serializers, State Backends, Formats, Tasks, Operators, Metrics, runtime data structures, etc. I/O methods: Transferring messages or chunks of data in buffers. Examples are in the RPC system, Network Stack, FileSystems, Encoders / Decoders, etc. Things that performance critical code may do that we would otherwise avoid
Using (and reusing) mutable objects to take pressure off the GC (and sometimes help with cache locality), thus forgoing the strive for immutability. Using primitive types, arrays of primitive types, or MemorySegment/ByteBuffer and encoding meaning into the primitive types and byte sequences, rather than encapsulating the behavior in dedicated classes and using objects. Structuring the code to amortize expensive work (allocations, lookups, virtual method calls, …) across multiple records, for example by doing the work once per buffer/bundle/batch. Code layout optimized for the JIT rather than for readability. Examples are inlining fields from other classes (in cases where it is doubtful whether the JIT would do that optimization at runtime), or structuring code to help the JIT compiler with inlining, loop unrolling, vectorization, etc. 5. Concurrency and Threading # Most code paths should not require any concurrency. The right internal abstractions should obviate the need for concurrency in almost all cases.
The Flink core and runtime use concurrency to provide these building blocks. Examples are in the RPC system, the Network Stack, the Task’s mailbox model, or some predefined Source / Sink utilities. We are not fully there yet, but any new addition that implements its own concurrency should be under scrutiny, unless it falls into the above category of core system building blocks. Contributors should reach out to committers if they feel they need to implement concurrent code, to see if there is an existing abstraction/building-block or if one should be added. When developing a component, think about the threading model and synchronization points ahead of time.
For example: single-threaded, blocking, non-blocking, synchronous, asynchronous, multi-threaded, thread pool, message queues, volatile, synchronized blocks/methods, mutexes, atomics, callbacks, … Getting those things right and thinking about them ahead of time is even more important than designing class interfaces/responsibilities, since it’s much harder to change later on. Try to avoid using threads altogether if at all possible.
If you feel you have a case for spawning a thread, point this out in the pull request as something to be explicitly reviewed. Be aware that using threads is in fact much harder than it initially looks
Clean shutdown of threads is very tricky. Handling interruptions in a rock-solid fashion (avoiding both slow shutdown and live locks) requires almost a Java wizard. Ensuring clean error propagation out of threads in all cases needs thorough design. The complexity of a multi-threaded application/component/class grows exponentially with each additional synchronisation point/block/critical section. Your code initially might be easy enough to understand, but can quickly grow beyond that point. Proper testing of multi-threaded code is basically impossible, while alternative approaches (like asynchronous code, non-blocking code, or an actor model with message queues) are quite easy to test. Multi-threaded code is also often even less efficient than alternative approaches on modern hardware. Be aware of the java.util.concurrent.CompletableFuture
Like with other concurrent code, there should rarely be the need to use a CompletableFuture Completing a future would also complete on the calling thread any chained futures that are waiting for the result to be completed, unless a completion executor specified explicitly. This can be intentional, if the entire execution should be synchronous / single-threaded, as for example in parts of the Scheduler / ExecutionGraph. Flink even makes use of a “main-thread executor” to allow calling chained handlers in the same thread as a single-threaded RPC endpoint runs This can be unexpected, if the thread that completes the future is a sensitive thread. It may be better to use CompletableFuture.supplyAsync(value, executor) in that case, instead of future.complete(value) when an executor is available When blocking on a future awaiting completion, always supply a timeout for a result instead of waiting indefinitely, and handle timeouts explicitly. Use CompletableFuture.allOf()/anyOf(), ExecutorCompletionService, or org.apache.flink.runtime.concurrent.FutureUtils#waitForAll if you need to wait for: all the results/any of the results/all the results but handled by (approximate) completion order. 6. Dependencies and Modules # Keep the dependency footprint small The more dependencies the harder it gets for the community to manage them as a whole. Dependency management includes dependency conflicts, maintaining licenses and related notices, and handling security vulnerabilities. Discuss whether the dependency should be shaded/relocated to avoid future conflicts. Don’t add a dependency for just one method Use Java built-in means if possible. If the method is Apache-licensed, you can copy the method into a Flink utility class with proper attribution. Declaration of dependencies Declare dependencies that you explicitly rely on, whether it provides classes you directly import and use or it&rsquo;s something that provides a service you directly use, like Log4J. Transitive dependencies should only supply dependencies that are needed at runtime but that you don&rsquo;t use yourself. [source] Location of classes in the Maven modules Whenever you create a new class, think about where to put it. A class might be used by multiple modules in the future and might belong into a common module in this case. 7. Testing # Tooling # We are moving our codebase to JUnit 5 and AssertJ as our testing framework and assertions library of choice.
Unless there is a specific reason, make sure you use JUnit 5 and AssertJ when contributing to Flink with new tests and even when modifying existing tests. Don&rsquo;t use Hamcrest, JUnit assertions, or the assert directive. Make your tests readable and don&rsquo;t duplicate assertion logic provided by AssertJ or by custom assertions provided by some Flink modules. For example, avoid:
assert list.size() == 10; for (String item : list) { assertTrue(item.length() &lt; 10); } And instead use:
assertThat(list) .hasSize(10) .allMatch(item -&gt; item.length() &lt; 10); Write targeted tests # Test contracts not implementations: Test that after a sequence of actions, the components are in a certain state, rather than testing that the components followed a sequence of internal state modifications.
For example, a typical antipattern is to check whether one specific method was called as part of the test. A way to enforce this is to try to follow the Arrange, Act, Assert test structure when writing a unit test (https://xp123.com/articles/3a-arrange-act-assert/)
This helps to communicate the intention of the test (what is the scenario under test) rather than the mechanics of the tests. The technical bits go into static methods at the bottom of the test class.
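A minimal sketch of this structure with JUnit 5 and AssertJ (the class under test and its helper are invented for illustration):

import org.junit.jupiter.api.Test;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import static org.assertj.core.api.Assertions.assertThat;

class DeduplicatorTest {

    @Test
    void duplicatesAreRemovedWhilePreservingOrder() {
        // Arrange
        List<String> input = Arrays.asList("a", "b", "a", "c", "b");

        // Act
        List<String> result = deduplicate(input);

        // Assert: verify the observable outcome, not which internal methods were called.
        assertThat(result).containsExactly("a", "b", "c");
    }

    // Technical bits live in static helpers at the bottom of the test class.
    private static List<String> deduplicate(List<String> input) {
        List<String> out = new ArrayList<>();
        for (String s : input) {
            if (!out.contains(s)) {
                out.add(s);
            }
        }
        return out;
    }
}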
Examples of tests in Flink that follow this pattern are:
https://github.com/apache/flink/blob/master/flink-core/src/test/java/org/apache/flink/util/LinkedOptionalMapTest.java https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-base/src/test/java/org/apache/flink/fs/s3/common/writer/RecoverableMultiPartUploadImplTest.java Avoid Mockito - Use reusable test implementations # Mockito-based tests tend to be costly to maintain in the long run by encouraging duplication of functionality and testing for implementation rather than effect More details: https://docs.google.com/presentation/d/1fZlTjOJscwmzYadPGl23aui6zopl94Mn5smG-rB0qT8 Instead, create reusable test implementations and utilities That way, when some class changes, we only have to update a few test utils or mocks Avoid timeouts in JUnit tests # Generally speaking, we should avoid setting local timeouts in JUnit tests but rather depend on the global timeout in Azure. The global timeout benefits from taking thread dumps just before timing out the build, easing debugging.
At the same time, any timeout value that you manually set is arbitrary. If it&rsquo;s set too low, you get test instabilities. What too low means depends on numerous factors, such as hardware and current utilization (especially I/O). Moreover, a local timeout is more maintenance-intensive. It&rsquo;s one more knob where you can tweak a build. If you change the test a bit, you also need to double-check the timeout. Hence, there have been quite a few commits that just increase timeouts.
We are keeping such frameworks out of Flink, to make debugging easier and avoid dependency clashes.
`}),e.add({id:269,href:"/how-to-contribute/code-style-and-quality-components/",title:"Code Style and Quality Guide — Components Guide",section:"How to Contribute",content:` Code Style and Quality Guide — Components Guide # Preamble # Pull Requests &amp; Changes # Common Coding Guide # Java Language Guide # Scala Language Guide # Components Guide # Formatting Guide # Component Specific Guidelines # Additional guidelines about changes in specific components.
Configuration Changes # Where should the config option go?
‘flink-conf.yaml’: All configuration that pertains to execution behavior that one may want to standardize across jobs. Think of it as parameters someone would set wearing an “ops” hat, or someone that provides a stream processing platform to other teams.
‘ExecutionConfig’: Parameters specific to an individual Flink application, needed by the operators during execution. Typical examples are watermark interval, serializer parameters, object reuse.
ExecutionEnvironment (in code): Everything that is specific to an individual Flink application and is only needed to build program / dataflow, not needed inside the operators during execution.
How to name config keys:
Config key names should be hierarchical. Think of the configuration as nested objects (JSON style)
taskmanager: { jvm-exit-on-oom: true, network: { detailed-metrics: false, request-backoff: { initial: 100, max: 10000 }, memory: { fraction: 0.1, min: 64MB, max: 1GB, buffers-per-channel: 2, floating-buffers-per-gate: 16 } } } The resulting config keys should hence be:
NOT &quot;taskmanager.detailed.network.metrics&quot;
But rather &quot;taskmanager.network.detailed-metrics&quot;
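As a small illustration of how such a hierarchical key could be declared in code (using the ConfigOptions builder available in recent Flink versions; the description text is made up):

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public class NetworkOptions {

    public static final ConfigOption<Boolean> DETAILED_METRICS =
            ConfigOptions.key("taskmanager.network.detailed-metrics")
                    .booleanType()
                    .defaultValue(false)
                    .withDescription("Whether to register detailed network metrics.");
}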
Connectors # Connectors are historically hard to implement and need to deal with many aspects of threading, concurrency, and checkpointing.
As part of FLIP-27 we are working on making this much simpler for sources. New sources should not have to deal with any aspect of concurrency/threading and checkpointing any more.
A similar FLIP can be expected for sinks in the near future.
Examples # Examples should be self-contained and not require systems other than Flink to run. Except for examples that show how to use specific connectors, like the Kafka connector. Sources/sinks that are ok to use are StreamExecutionEnvironment.socketTextStream, which should not be used in production but is quite handy for exploring how things work, and file-based sources/sinks. (For streaming, there is the continuous file source)
Examples should also not be pure toy-examples but strike a balance between real-world code and purely abstract examples. The WordCount example is quite long in the tooth by now but it’s a good showcase of simple code that highlights functionality and can do useful things.
Examples should also be heavy in comments. They should describe the general idea of the example in the class-level Javadoc and describe what is happening and what functionality is used throughout the code. The expected input data and output data should also be described.
Examples should include parameter parsing, so that you can run an example (from the Jar that is created for each example) using bin/flink run path/to/myExample.jar --param1 … --param2.
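A compact sketch of an example program following these guidelines (the class name and parameters are invented; it assumes the ParameterTool utility for argument parsing and the socketTextStream source mentioned above):

import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

/**
 * Reads text lines from a socket and prints them.
 * Run with: bin/flink run path/to/thisExample.jar --host localhost --port 9999
 */
public class SocketEchoExample {

    public static void main(String[] args) throws Exception {
        // Parse --host and --port so the example can be configured from the command line.
        ParameterTool params = ParameterTool.fromArgs(args);
        String host = params.get("host", "localhost");
        int port = params.getInt("port", 9999);

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // socketTextStream is handy for exploration but should not be used in production.
        DataStream<String> lines = env.socketTextStream(host, port);
        lines.print();

        env.execute("Socket Echo Example");
    }
}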
Table &amp; SQL API # Semantics # The SQL standard should be the main source of truth.
Syntax, semantics, and features should be aligned with SQL! We don’t need to reinvent the wheel. Most problems have already been discussed industry-wide and written down in the SQL standard. We rely on the newest standard (SQL:2016, or ISO/IEC 9075:2016, at the time of writing this document (download)). Not every part is available online but a quick web search might help here. Discuss divergence from the standard or vendor-specific interpretations.
Once a syntax or behavior is defined, it cannot be undone easily. Contributions that need to extend or interpret the standard need a thorough discussion with the community. Please help committers by performing some initial research about how other vendors such as Postgres, Microsoft SQL Server, Oracle, Hive, Calcite, or Beam are handling such cases. Consider the Table API as a bridge between the SQL and Java/Scala programming world.
The Table API is an Embedded Domain Specific Language for analytical programs following the relational model. It is not required to strictly follow the SQL standard in regards of syntax and names, but can be closer to the way a programming language would do/name functions and features, if that helps make it feel more intuitive. The Table API might have some non-SQL features (e.g. map(), flatMap(), etc.) but should nevertheless “feel like SQL”. Functions and operations should have equal semantics and naming if possible. Common mistakes # Support SQL’s type system when adding a feature. A SQL function, connector, or format should natively support most SQL types from the very beginning. Unsupported types lead to confusion, limit the usability, and create overhead by touching the same code paths multiple times. For example, when adding a SHIFT_LEFT function, make sure that the contribution is general enough not only for INT but also BIGINT or TINYINT. Testing # Test for nullability.
SQL natively supports NULL for almost every operation and has a 3-valued boolean logic. Make sure to test every feature for nullability as well. Avoid full integration tests
Spawning a Flink mini-cluster and performing compilation of generated code for a SQL query is expensive. Avoid integration tests for planner tests or variations of API calls. Instead, use unit tests that validate the optimized plan which comes out of a planner. Or test the behavior of a runtime operator directly. Compatibility # Don’t introduce physical plan changes in patch releases!
Backwards compatibility for state in streaming SQL relies on the fact that the physical execution plan remains stable. Otherwise, the generated Operator Names/IDs change and state cannot be matched and restored. Every bug fix that leads to changes in the optimized physical plan of a streaming pipeline hence breaks compatibility. As a consequence, changes of the kind that lead to different optimizer plans can only be merged in major releases for now.
Consider whether a class will need to interact with a Java class in the future. Use Java collections and Java Optional in interfaces for a smooth integration with Java code. Don’t use features of case classes such as .copy() or apply() for construction if a class is subjected to be converted to Java. Pure Scala user-facing APIs should use pure Scala collections/iterables/etc. for natural and idiomatic (“scalaesk”) integration with Scala. `}),e.add({id:270,href:"/how-to-contribute/code-style-and-quality-formatting/",title:"Code Style and Quality Guide — Formatting Guide",section:"How to Contribute",content:` Code Style and Quality Guide — Formatting Guide # Preamble # Pull Requests &amp; Changes # Common Coding Guide # Java Language Guide # Scala Language Guide # Components Guide # Formatting Guide # Java Code Formatting Style # We recommend to set up the IDE to automatically check the code style. Please follow the IDE Setup Guide to set up spotless and checkstyle .
License # Apache license headers. Make sure you have Apache License headers in your files. The RAT plugin is checking for that when you build the code. Imports # Empty line before and after package declaration. No unused imports. No redundant imports. No wildcard imports. They can cause problems when adding to the code and in some cases even during refactoring. Import order. Imports must be ordered alphabetically, grouped into the following blocks, with each block separated by an empty line: &lt;imports from org.apache.flink.*&gt; &lt;imports from org.apache.flink.shaded.*&gt; &lt;imports from other libraries&gt; &lt;imports from javax.*&gt; &lt;imports from java.*&gt; &lt;imports from scala.*&gt; &lt;static imports&gt; Naming # Package names must start with a letter, and must not contain upper-case letters or special characters. Non-private static final fields must be upper-case, with words being separated by underscores.(MY_STATIC_VARIABLE) Non-static fields/methods must be in lower camel case. (myNonStaticField) Whitespaces # Tabs vs. spaces. We are using spaces for indentation, not tabs. No trailing whitespace. Spaces around operators/keywords. Operators (+, =, &gt;, …) and keywords (if, for, catch, …) must have a space before and after them, provided they are not at the start or end of the line. Breaking the lines of too long statements # In general long lines should be avoided for the better readability. Try to use short statements which operate on the same level of abstraction. Break the long statements by creating more local variables, defining helper functions etc.
Two major sources of long lines are:
Long list of arguments in function declaration or call: void func(type1 arg1, type2 arg2, ...) Long sequence of chained calls: list.stream().map(...).reduce(...).collect(...)... Rules about breaking the long lines:
Break the argument list or chain of calls if the line exceeds limit or earlier if you believe that the breaking would improve the code readability If you break the line then each argument/call should have a separate line, including the first one Each new line should have one extra indentation (or two for a function declaration) relative to the line of the parent function name or the called entity Additionally for function arguments:
The opening parenthesis always stays on the line of the parent function name The possible thrown exception list is never broken and stays on the same last line, even if the line length exceeds its limit The line of the function argument should end with a comma staying on the same line except the last argument Example of breaking the list of function arguments:
public void func( int arg1, int arg2, ...) throws E1, E2, E3 { } The dot of a chained call is always on the line of that chained call proceeding the call at the beginning.
Example of breaking the list of chained calls:
values .stream() .map(...) .collect(...); Braces # Left curly braces ({) must not be placed on a new line. Right curly braces (}) must always be placed at the beginning of the line. Blocks. All statements after if, for, while, do, … must always be encapsulated in a block with curly braces (even if the block contains one statement). Javadocs # All public/protected methods and classes must have a Javadoc. The first sentence of the Javadoc must end with a period. Paragraphs must be separated with a new line, and started with . Modifiers # No redundant modifiers. For example, public modifiers in interface methods. Follow JLS3 modifier order. Modifiers must be ordered in the following order: public, protected, private, abstract, static, final, transient, volatile, synchronized, native, strictfp. Files # All files must end with \\n. File length must not exceed 3000 lines. Misc # Arrays must be defined Java-style. For example, public String[] array. Use Flink Preconditions. To increase homogeneity, consistently use the org.apache.flink.Preconditions methods checkNotNull and checkArgument rather than Apache Commons Validate or Google Guava. `}),e.add({id:271,href:"/how-to-contribute/code-style-and-quality-java/",title:"Code Style and Quality Guide — Java",section:"How to Contribute",content:` Code Style and Quality Guide — Java # Preamble # Pull Requests &amp; Changes # Common Coding Guide # Java Language Guide # Scala Language Guide # Components Guide # Formatting Guide # Java Language Features and Libraries # Preconditions and Log Statements # Never concatenate strings in the parameters Don’t: Preconditions.checkState(value &lt;= threshold, &quot;value must be below &quot; + threshold) Don’t: LOG.debug(&quot;value is &quot; + value) Do: Preconditions.checkState(value &lt;= threshold, &quot;value must be below %s&quot;, threshold) Do: LOG.debug(&quot;value is {}&quot;, value) Generics # No raw types: Do not use raw types, unless strictly necessary (sometimes necessary for signature matches, arrays). Suppress warnings for unchecked conversions: Add annotations to suppress warnings, if they cannot be avoided (such as “unchecked”, or “serial”). Otherwise warnings about generics flood the build and drown relevant warnings. equals() / hashCode() # equals() / hashCode() should be added when they are well defined only. They should not be added to enable a simpler assertion in tests when they are not well defined. Use hamcrest matchers in that case: https://github.com/junit-team/junit4/wiki/matchers-and-assertthat A common indicator that the methods are not well defined is when they take a subset of the fields into account (other than fields that are purely auxiliary). When the methods take mutable fields into account, you often have a design issue. The equals()/hashCode() methods suggest to use the type as a key, but the signatures suggest it is safe to keep mutating the type. Java Serialization # Do not use Java Serialization for anything !!!
Do not use Java Serialization for anything !!! !!!
Do not use Java Serialization for anything !!! !!! !!!
Internal to Flink, Java serialization is used to transport messages and programs through RPC. This is the only case where we use Java serialization. Because of that, some classes need to be serializable (if they are transported via RPC).
Serializable classes must define a Serial Version UID:
private static final long serialVersionUID = 1L;
The Serial Version UID for new classes should start at 1 and should generally be bumped on every incompatible change to the class according to the Java serialization compatibility definition (i.e: changing the type of a field, or moving the position of a class in the class hierarchy).
Java Reflection # Avoid using Java’s Reflection API
Java’s Reflection API can be a very useful tool in certain cases but in all cases it is a hack and one should research for alternatives. The only cases where Flink should use reflection are Dynamically loading implementations from another module (like webUI, additional serializers, pluggable query processors). Extracting types inside the TypeExtractor class. This is fragile enough and should not be done outside the TypeExtractor class. Some cases of cross-JDK version features, where we need to use reflection because we cannot assume a class/method to be present in all versions. If you need reflection for accessing methods or fields in tests, it usually indicates some deeper architectural issues, like wrong scoping, bad separation of concerns, or that there is no clean way to provide components / dependencies to the class that is tested Collections # ArrayList and ArrayDeque are almost always superior to LinkedList, except when frequently insert and deleting in the middle of the list For Maps, avoid patterns that require multiple lookups contains() before get() → get() and check null contains() before put() → putIfAbsent() or computeIfAbsent() Iterating over keys, getting values → iterate over entrySet() Set the initial capacity for a collection only if there is a good proven reason for that, otherwise do not clutter the code. In case of Maps it can be even deluding because the Map&rsquo;s load factor effectively reduces the capacity. Java Optional # Use @Nullable annotation where you do not use Optional for the nullable values. If you can prove that Optional usage would lead to a performance degradation in critical code then fallback to @Nullable. Always use Optional to return nullable values in the API/public methods except the case of a proven performance concern. Do not use Optional as a function argument, instead either overload the method or use the Builder pattern for the set of function arguments. Note: an Optional argument can be allowed in a private helper method if you believe that it simplifies the code (example). Do not use Optional for class fields. Lambdas # Prefer non-capturing lambdas (lambdas that do not contain references to the outer scope). Capturing lambdas need to create a new object instance for every call. Non-capturing lambdas can use the same instance for each invocation.
don’t:
map.computeIfAbsent(key, x -&gt; key.toLowerCase()) do:
map.computeIfAbsent(key, k -&gt; k.toLowerCase()); Consider method references instead of inline lambdas
don’t:
map.computeIfAbsent(key, k-&gt; Loader.load(k)); do:
map.computeIfAbsent(key, Loader::load); Java Streams # Avoid Java Streams in any performance critical code. The main motivation to use Java Streams would be to improve code readability. As such, they can be a good match in parts of the code that are not data-intensive but deal with coordination. Even in the latter case, try to limit the scope to a method, or a few private methods within an internal class.
Reviews are much faster and thus contributions get merged sooner. We can ensure higher code quality by overlooking fewer issues in the contributions. Committers can review more contributions in the same time, which helps to keep up with the high rate of contributions that Flink is experiencing. Please understand that contributions that do not follow this guide will take longer to review and thus will typically be picked up with lower priority by the community. That is not ill intent; it is due to the added complexity of reviewing unstructured Pull Requests.
1. JIRA issue and Naming # Make sure that the pull request corresponds to a JIRA issue.
Exceptions are hotfixes, like fixing typos in JavaDocs or documentation files.
Name the pull request in the form [FLINK-XXXX][component] Title of the pull request, where FLINK-XXXX should be replaced by the actual issue number. The components should be the same as used in the JIRA issue.
Hotfixes should be named for example [hotfix][docs] Fix typo in event time introduction or [hotfix][javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator.
2. Description # Please fill out the pull request template to describe the contribution. Please describe it such that the reviewer understands the problem and solution from the description, not only from the code.
A stellar example of a well-described pull request is https://github.com/apache/flink/pull/7264
Make sure that the description is adequate for the problem solved by the PR. Small changes do not need a wall of text. In ideal cases, the problem was described in the Jira issue and the description can be mostly copied from there.
If additional open questions / issues were discovered during the implementation and you made a choice regarding those, describe them in the pull request text so that reviewers can double-check the assumptions. An example is in https://github.com/apache/flink/pull/8290 (Section “Open Architecture Questions”).
3. Separate Refactoring, Cleanup and Independent Changes # NOTE: This is not an optimization; it is a critical requirement.
Pull Requests must put cleanup, refactoring, and core changes into separate commits. That way, the reviewer can look independently at the cleanup and refactoring and ensure that those changes do not alter the behavior. Then the reviewer can look at the core changes in isolation (without the noise of other changes) and ensure that this is a clean and robust change.
Examples for changes that strictly need to go into a separate commit include:
Cleanup, fixing style and warnings in pre-existing code; renaming packages, classes, or methods; moving code (to other packages or classes); refactoring structure or changing design patterns; consolidating related tests or utilities; changing the assumptions in existing tests (add a commit message that describes why the changed assumptions make sense). There should be no cleanup commits that fix issues that have been introduced in previous commits of the same PR. Commits should be clean in themselves.
In addition, any larger contributions should split the changes into a set of independent changes that can be independently reviewed.
Two great examples of splitting issues into separate commits are:
https://github.com/apache/flink/pull/6692 (splits cleanup and refactoring from main changes) https://github.com/apache/flink/pull/7264 (also splits the main changes into independently reviewable pieces). If a pull request still contains big commits (e.g. a commit with more than 1000 changed lines), it might be worth thinking about how to split the commit into multiple subproblems, as in the examples above.
4. Commit Naming Conventions # Commit messages should follow a similar pattern as the pull request as a whole: [FLINK-XXXX][component] Commit description.
In some cases, the issue referenced in a commit might be a subtask, and the component may be different from the Pull Request’s main component. For example, when the commit introduces an end-to-end test for a runtime change, the PR would be tagged as [runtime], but the individual commit would be tagged as [e2e].
Examples for commit messages:
[hotfix] Fix update_branch_version.sh to allow version suffixes
[hotfix] [table] Remove unused geometry dependency
[FLINK-11704][tests] Improve AbstractCheckpointStateOutputStreamTestBase
[FLINK-10569][runtime] Remove Instance usage in ExecutionVertexCancelTest
[FLINK-11702][table-planner-blink] Introduce a new table type system
5. Changes to the observable behavior of the system # Contributors should be aware of changes in their PRs that break the observable behavior of Flink in any way, because in many cases such changes can break existing setups. Red flags that should raise questions while coding or in reviews with respect to this problem are, for example:
Assertions have been changed to make tests pass again with the breaking change. Configuration settings that must suddenly be set to (non-default) values to keep existing tests passing. This can happen in particular for new settings with a breaking default. Existing scripts or configurations have to be adjusted. `}),e.add({id:273,href:"/how-to-contribute/code-style-and-quality-scala/",title:"Code Style and Quality Guide — Scala",section:"How to Contribute",content:` Code Style and Quality Guide — Scala # Preamble # Pull Requests &amp; Changes # Common Coding Guide # Java Language Guide # Scala Language Guide # Components Guide # Formatting Guide # Scala Language Features # Where to use (and not use) Scala # We use Scala for Scala APIs or pure Scala libraries.
We do not use Scala in the core APIs and runtime components. We aim to remove existing Scala use (code and dependencies) from those components.
⇒ This is not because we do not like Scala; it is a consequence of the “right tool for the right job” approach (see below).
For APIs, we develop the foundation in Java, and layer Scala on top.
This has traditionally given the best interoperability for both Java and Scala. It does mean dedicated effort to keep the Scala API up to date. Why don’t we use Scala in the core APIs and runtime?
The past has shown that Scala evolves too quickly with tricky changes in functionality. Each Scala version upgrade was a rather big effort for the Flink community. Scala does not always interact nicely with Java classes, e.g. Scala’s visibility scopes work differently and often expose more to Java consumers than desired. Scala adds an additional layer of complexity to artifact/dependency management. We may want to keep Scala-dependent libraries like Akka in the runtime, but abstract them via an interface and load them in a separate classloader, to keep them shielded and avoid version conflicts. Scala makes it very easy for knowledgeable Scala programmers to write code that is very hard to understand for programmers that are less knowledgeable in Scala. That is especially tricky for an open source project with a broad community of diverse experience levels. Working around this means restricting the Scala feature set by a lot, which defeats a good amount of the purpose of using Scala in the first place. API Parity # Keep the Java API and the Scala API in sync in terms of functionality and code quality.
The Scala API should cover all the features of the Java APIs as well.
Scala APIs should have a “completeness test”, like the following example from the DataStream API: https://github.com/apache/flink/blob/master/flink-streaming-scala/src/test/scala/org/apache/flink/streaming/api/scala/StreamingScalaAPICompletenessTest.scala
Language Features # Avoid Scala implicits. Scala’s implicits should only be used for user-facing API improvements such as the Table API expressions or type information extraction. Don’t use them for internal “magic”. Add explicit types for class members. Don’t rely on implicit type inference for class fields and method return types:
Don’t:
var expressions = new java.util.ArrayList[String]() Do:
var expressions: java.util.List[String] = new java.util.ArrayList[String]()
Type inference for local variables on the stack is fine.
Use strict visibility. Avoid Scala’s package private features (such as private[flink]) and use regular private/protected instead. Keep in mind that private[flink] and protected members are public in Java. Keep in mind that private[flink] still exposes all members in Flink-provided examples. Coding Formatting # Use line wrapping to structure your code.
Scala’s functional nature allows for long transformation chains (x.map().map().foreach()). In order to force implementers to structure their code, the line length is limited to 100 characters. Use one line per transformation for better maintainability. `}),e.add({id:274,href:"/flink-packages/",title:"flink-packages.org",section:"Apache Flink® — Stateful Computations over Data Streams",content:` What is flink-packages.org? # All information on flink-packages can be found on the flink-packages.org website.
`}),e.add({id:275,href:"/material/",title:"Material",section:"Apache Flink® — Stateful Computations over Data Streams",content:` Material # Apache Flink Logos # We provide the Apache Flink logo in different sizes and formats. You can download all variants (7.4 MB) or just pick the one you need from this page.
Portable Network Graphics (PNG) # Colored logo, white filled logo, and black outline logo, each available in the sizes (px) 50x50, 100x100, 200x200, 500x500, 1000x1000.
You can find more variants of the logo in this directory or download all variants (7.4 MB).
Scalable Vector Graphics (SVG) # Colored logo with black text (color_black.svg), white filled logo (white_filled.svg), black outline logo (black_outline.svg). You can find more variants of the logo in this directory or download all variants (7.4 MB).
Photoshop (PSD) # You can download the logo in PSD format as well:
Colored logo: 1000x1000. Black outline logo with text: 1000x1000, 5000x5000. You can find more variants of the logo in this directory or download all variants (7.4 MB).
Color Scheme # You can use the provided color scheme which incorporates some colors of the Flink logo:
PDF color scheme PowerPoint color scheme `}),e.add({id:276,href:"/what-is-flink-ml/",title:"What is Flink ML?",section:"Apache Flink® — Stateful Computations over Data Streams",content:` What is Flink ML? # All information on Flink ML can be found on the Flink ML website.
`}),e.add({id:277,href:"/what-is-flink-table-store/",title:"What is Paimon (incubating) (formerly Flink Table Store)?",section:"Apache Flink® — Stateful Computations over Data Streams",content:` What is Apache Paimon (formerly Flink Table Store)? # The Flink Table Store has joined the Apache Incubator as Apache Paimon (incubating). All information on Apache Paimon (incubating) can be found on the Paimon website.
`}),e.add({id:278,href:"/what-is-stateful-functions/",title:"What is Stateful Functions?",section:"Apache Flink® — Stateful Computations over Data Streams",content:` What is Stateful Functions? # All information on Stateful Functions can be found on the Stateful Functions project website.
`}),e.add({id:279,href:"/what-is-the-flink-kubernetes-operator/",title:"What is the Flink Kubernetes Operator?",section:"Apache Flink® — Stateful Computations over Data Streams",content:` What is the Flink Kubernetes Operator? # All information on the Flink Kubernetes Operator can be found on the Flink Kubernetes Operator website.
`})})()