These release notes discuss important aspects, such as configuration, behavior, or dependencies, that changed between Flink 1.15 and Flink 1.16. Please read these notes carefully if you are planning to upgrade your Flink version to 1.16.
The host/web-ui-port parameters of the jobmanager.sh script have been deprecated. These can (and should) be specified with the corresponding options as dynamic properties.
The deprecated String expression DSL has been removed from Java/Scala/Python Table API.
Adds retryable lookup join to support both async and sync lookups in order to solve the delayed updates issue in external systems.
AsyncDataStream.OutputMode configurable for table moduleIt is recommend to set the new option ‘table.exec.async-lookup.output-mode’ to ‘ALLOW_UNORDERED’ when no stritctly output order is needed, this will yield significant performance gains on append-only streams
For complex streaming jobs, now it's possible to detect and resolve potential correctness issues before running.
The Elasticsearch connector has been copied from the Flink repository to its own individual repository at https://github.com/apache/flink-connector-elasticsearch. For this release, identical Elasticsearch connector artifacts will be available from both repositories but with different versions. For example, the first releases will be 1.16.0 and the externally versioned and maintained artifact 3.0.0. Developers are encouraged to move to the latter during this release cycle.
Support for Hive 1.*, 2.1.* and 2.2.* has been dropped from Flink. These Hive versions are no longer supported by the Hive community and therefore are also no longer supported by Flink.
In batch mode, Hive sink now will report statistics to Hive metastore by default for written tables and partitions. This might be time-consuming when there are many written files. You can disable this feature by setting table.exec.hive.sink.statistic-auto-gather.enable to false.
A number of breaking changes were made to the Pulsar Connector cursor APIs:
CursorPosition#seekPosition() has been removed.StartCursor#seekPosition() has been removed.StopCursor#shouldStop now returns a StopCondition instead of a boolean.The StreamingFileSink has been deprecated in favor of the unified FileSink since Flink 1.12.
Avro schemas generated by Flink now use the “org.apache.flink.avro.generated” namespace for compatibility with the Avro Python SDK.
Supports configurable RateLimitingStrategy for the AsyncSinkWriter. This change allows sink implementers to change the behaviour of an AsyncSink when requests fail, for a specific sink. If no RateLimitingStrategy is specified, it will use the current default of AIMDRateLimitingStrategy.
Configuring reporters by their class has been deprecated. Reporter implementations should provide a MetricReporterFactory, and all configurations should be migrated to such a factory.
If the reporter is loaded from the plugins directory, setting metrics.reporter.reporter_name.class no longer works.
The ‘tags’ option from the DatadogReporter has been deprecated in favor of the generic ‘scope.variables.additional’ option.
The REST API now returns a 503 Service Unavailable error when a request is made but the backing component isn't ready yet. Previously this returned a 500 Internal Server error.
The JobID in application mode is no longer 0000000000, but instead based on the cluster ID.
New concept of overdraft network buffers was introduced to mitigate effects of uninterruptible blocking a subtask thread during back pressure. Starting from 1.16.0 Flink subtask can request by default up to 5 extra (overdraft) buffers over the regular configured amount(you can read more about this in the documentation: https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/network_mem_tuning/#overdraft-buffers). This change can slightly increase memory consumption of the Flink Job. To restore the older behaviour you can set taskmanager.network.memory.max-overdraft-buffers-per-gate to zero.
1.15.0 and 1.15.1 generated non-deterministic UIDs for operators, which make it difficult/impossible to restore state or upgrade to next patch version. A new table.exec.uid.generation config option (with correct default behavior) disables setting a UID for new pipelines from non-compiled plans. Existing pipelines can set table.exec.uid.generation=ALWAYS if the 1.15.0/1 behavior was acceptable.
Python 3.6 extended support ended on 23 December 2021. We plan that PyFlink 1.16 will be the last version support Python3.6.
The Hadoop implementation used for Flink's filesystem implementation has been updated. This provides Flink users with the features that are listed under https://issues.apache.org/jira/browse/HADOOP-17566. One of these features is to enable client-side encryption of Flink state via https://issues.apache.org/jira/browse/HADOOP-13887.
Kafka connector uses Kafka client 3.1.1 by default now.
Upgrade Hive 2.3 connector to version 2.3.9
For support of Python3.9 and M1, PyFlink updates a series dependencies version: apache-beam==2.38.0 arrow==5.0.0 pemja==0.2.6
System resource metrics dependencies has been updated to the following:
com.github.oshi:oshi-core:6.1.5 (licensed under MIT license) net.java.dev.jna:jna-platform:jar:5.10.0 net.java.dev.jna:jna:jar:5.10.0