* …`BoundedSourceAsSDFWrapperFn` and `UnboundedSourceAsSDFWrapper`.
* …`javax.jms.Message` (Java) (BEAM-16308). `topicNameMapper` must be set to extract the topic name from the input value. `valueMapper` must be set to convert the input value to a JMS message.
* `DataFrame.unstack()`, `DataFrame.pivot()` and `Series.unstack()` implemented for the DataFrame API (BEAM-13948, BEAM-13966).
* `ShallowCloneParDoPayload()`, `ShallowCloneSideInput()`, and `ShallowCloneFunctionSpec()` have been removed from the Go SDK's `pipelinex` package (BEAM-13739).
* …`valueMapper` to be set (BEAM-16308). You can use the `TextMessageMapper` to convert `String` inputs to JMS `TextMessage`s:

  ```java
  JmsIO.<String>write()
      .withConnectionFactory(jmsConnectionFactory)
      .withValueMapper(new TextMessageMapper());
  ```
* `metadata()` added to `io.filesystem.FileSystem` in the Python SDK. (BEAM-14314)
* …`--experiments=disable_projection_pushdown`.
* `amazon-web-services2` has reached feature parity and is finally recommended over the earlier `amazon-web-services` and `kinesis` modules (Java). These will be deprecated in one of the next releases (BEAM-13174).
* …`Kinesis` was added (BEAM-13175).
* …`AwsOptions` (BEAM-13563, BEAM-13663, BEAM-13587).
* …`S3` Filesystem (BEAM-13245, BEAM-13246, BEAM-13441, BEAM-13445, BEAM-14011), `DynamoDB` IO (BEAM-13209), `SQS` IO (BEAM-13631, BEAM-13510) and others.
* Pipeline dependencies supplied through `--requirements_file` will now be staged to the runner using binary distributions (wheels) of the PyPI packages for the linux_x86_64 platform (BEAM-4032). To restore the behavior of using source distributions, set pipeline option `--requirements_cache_only_sources`. To skip staging the packages at submission time, set pipeline option `--requirements_cache=skip` (Python).
* Previously `DoFn.infer_output_types` was expected to return `Iterable[element_type]` where `element_type` is the PCollection element type. It is now expected to return `element_type`. Take care if you have overridden `infer_output_type` in a `DoFn` (this is not common). See BEAM-13860.
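The contract change above can be illustrated with a minimal sketch using plain Python classes (toy stand-ins, not real Beam `DoFn` subclasses; `str` stands in for the PCollection element type):

```python
import typing

class OldStyleDoFn:
    def infer_output_type(self, input_type):
        # Pre-change contract: wrap the element type in Iterable[...]
        return typing.Iterable[input_type]

class NewStyleDoFn:
    def infer_output_type(self, input_type):
        # Post-change contract: return the element type itself
        return input_type

assert OldStyleDoFn().infer_output_type(str) == typing.Iterable[str]
assert NewStyleDoFn().infer_output_type(str) is str
```

If you have overridden this method, the fix is usually just to drop the `Iterable[...]` wrapper from the returned type.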
* (`amazon-web-services2`) The types of `awsRegion` / `endpoint` in `AwsOptions` changed from `String` to `Region` / `URI` (BEAM-13563).
* (`amazon-web-services2`) Client providers (`withXYZClientProvider()`) as well as IO-specific `RetryConfiguration`s are deprecated; instead use `withClientConfiguration()` or `AwsOptions` to configure AWS IOs / clients. Custom implementations of client providers shall be replaced with a respective `ClientBuilderFactory` and configured through `AwsOptions` (BEAM-13563).
* …`pyarrow` version parsing (Python) (BEAM-14235).
* …`SqsMessage` for AWS IOs for SDK v2 was changed from `String` to `long`; the visibility of all fields was fixed from package-private to `public` (BEAM-13638).
* …`JdbcIO.readWithPartitions` from `int` to `long` (BEAM-13149). This is a relatively minor breaking change, which we're implementing to improve the usability of the transform without increasing cruft. This transform is relatively new, so we may implement other breaking changes in the future to improve its usability.
* …`with_exception_handling` option for easily ignoring bad records and implementing the dead-letter pattern.
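The dead-letter pattern mentioned above can be sketched in plain Python (a toy illustration of the idea, not the Beam API itself):

```python
def apply_with_dead_letter(fn, elements):
    """Minimal sketch of the dead-letter pattern: elements that raise an
    exception are diverted to a separate collection instead of failing
    the whole batch."""
    good, bad = [], []
    for element in elements:
        try:
            good.append(fn(element))
        except Exception:
            bad.append(element)
    return good, bad

good, bad = apply_with_dead_letter(int, ["1", "2", "oops"])
# good == [1, 2]; bad == ["oops"]
```

In a pipeline, the "bad" collection would typically be written to a separate sink for later inspection rather than dropped.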
* `ReadFromBigQuery` and `ReadAllFromBigQuery` now run queries with BATCH priority by default. The `query_priority` parameter is introduced to the same transforms to allow configuring the query priority (Python) (BEAM-12913).
* …`ReadFromBigQuery`. The newly introduced `method` parameter can be set as `DIRECT_READ` to use the Storage Read API. The default is `EXPORT`, which invokes a BigQuery export request. (Python) (BEAM-10917)
* Added the `use_native_datetime` parameter to `ReadFromBigQuery` to configure the return type of DATETIME fields when using `ReadFromBigQuery`. This parameter can only be used when `method = DIRECT_READ` (Python) (BEAM-10917).
* Added a `dataframe` extra to the Python SDK that tracks `pandas` versions we've verified compatibility with. We now recommend installing Beam with `pip install apache-beam[dataframe]` when you intend to use the DataFrame API (BEAM-12906).
* `go.mod` files will need to change to require `github.com/apache/beam/sdks/v2`.
* …`.../sdks/go/...` to `.../sdks/v2/go/...`.
* …`useReflectApi` setting to control it (BEAM-12628).
* …`--allow_unsafe_triggers`. (BEAM-9487)
* …`--allow_unsafe_triggers` flag starting with Beam 2.34. (BEAM-9487)
* …(`spark.jackson.version`) to at least version 2.9.2, due to Beam updating its dependencies.
* …`VARCHAR`, `NVARCHAR`, `LONGVARCHAR`, `LONGNVARCHAR`, `DATE`, `TIME` (Java) (BEAM-12385).
* …`--allow_unsafe_triggers`. (BEAM-9487)
* …`--allow_unsafe_triggers` flag starting with Beam 2.33. (BEAM-9487)
* `CREATE FUNCTION` DDL statement added to Calcite SQL syntax. `JAR` and `AGGREGATE` are now reserved keywords. (BEAM-12339)
* `TriggerFn` has a new `may_lose_data` method to signal potential data loss. Default behavior assumes safe (necessary for backwards compatibility). See Deprecations for potential impact of overriding this. (BEAM-9487)
* `Row(x=3, y=4)` is no longer considered equal to `Row(y=4, x=3)` (BEAM-11929).
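The field-order-sensitive equality described above can be illustrated with a toy row class (a stand-in for demonstration, not Beam's actual `Row` implementation):

```python
class ToyRow:
    """Illustrative row type whose equality depends on field order,
    mirroring the behavior change noted above."""
    def __init__(self, **fields):
        # dicts preserve keyword-argument insertion order in Python 3.7+
        self.fields = list(fields.items())

    def __eq__(self, other):
        return isinstance(other, ToyRow) and self.fields == other.fields

assert ToyRow(x=3, y=4) == ToyRow(x=3, y=4)
assert ToyRow(x=3, y=4) != ToyRow(y=4, x=3)  # order now matters
```

If your code constructed equivalent rows with differing field orders, those comparisons need revisiting.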
* `TopCombineFn` now disallows `compare` as its argument (Python) (BEAM-7372).
* …`--allow_unsafe_triggers`. (BEAM-9487)
* …`--allow_unsafe_triggers` flag starting with Beam 2.33. (BEAM-9487)
* …`FakeDeterministicFastPrimitivesCoder` with `beam.coders.registry.register_fallback_coder(beam.coders.coders.FakeDeterministicFastPrimitivesCoder())` or use the `allow_non_deterministic_key_coders` pipeline option.
* …`dependencyManagement` in Maven and `force` in Gradle.
* …`ReadAllFromBigQuery` that can receive multiple requests to read data from BigQuery at pipeline runtime. See PR 13170 and BEAM-9650.
* The `--region` flag in `amazon-web-services2` was replaced by `--awsRegion` (BEAM-11331).
* …`--experiments=use_deprecated_read`. The Apache Beam community is looking for feedback on this change, as the community is planning to make it permanent with no opt-out. If you run into an issue requiring the opt-out, please send an e-mail to user@beam.apache.org specifically referencing BEAM-10670 in the subject line and why you needed to opt out. (Java) (BEAM-10670)
* …`--HTTPWriteTimeout=0` to revert to the old behavior. (BEAM-6103)
* …`--experiments=use_runner_v2` before using this feature. (BEAM-3736)
* …`CombineFn.from_callable()` or `CombineFn.maybe_from_callable()` can lead to incorrect behavior. (BEAM-11522)
* …`--experiments=use_deprecated_read`. The Apache Beam community is looking for feedback on this change, as the community is planning to make it permanent with no opt-out. If you run into an issue requiring the opt-out, please send an e-mail to user@beam.apache.org specifically referencing BEAM-10670 in the subject line and why you needed to opt out. (Java) (BEAM-10670)
* …`apache_beam.io.kinesis` (BEAM-10138, BEAM-10137).
* …`apache_beam.io.snowflake` (BEAM-9898).
* …`ElasticsearchIO#Write`. Now, Java's ElasticsearchIO can be used to selectively delete documents using the `withIsDeleteFn` function (BEAM-5757).
* …`ReadFromBigQuery` added. (Python) (BEAM-10524)
* …`@ptransform_fn` decorators in the Python SDK. (BEAM-4091) This is not enabled by default, to preserve backwards compatibility; use the `--type_check_additional=ptransform_fn` flag to enable it. It may be enabled by default in future versions of Beam.
* …`apache_beam.io.external.snowflake` to `apache_beam.io.snowflake`. The previous path will be removed in future versions.
* …`--experiments=use_legacy_bq_sink`.
* …`apache_beam.io.jdbc` (BEAM-10135, BEAM-10136).
* …`apache_beam.io.external.snowflake` (BEAM-9897).
* …`typing.FrozenSet` type hints, which are not interchangeable with `typing.Set`. You may need to update your pipelines if type checking fails. (BEAM-10197)
* …`apache_beam.io.gcp.bigquery.ReadFromBigQuery`. This transform is experimental. It reads data from BigQuery by exporting data to Avro files and reading those files. It also supports reading data by exporting to JSON files. This has small differences in behavior for Time and Date-related fields. See Pydoc for more information.
* `RowJson.RowJsonDeserializer`, `JsonToRow`, and `PubsubJsonTableProvider` now accept "implicit nulls" by default when deserializing JSON (Java) (BEAM-10220). Previously nulls could only be represented with explicit null values, as in `{"foo": "bar", "baz": null}`, whereas an implicit null like `{"foo": "bar"}` would raise an exception. Now both JSON strings will yield the same result by default. This behavior can be overridden with `RowJson.RowJsonDeserializer#withNullBehavior`.
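The implicit-null behavior above can be sketched in a few lines of plain Python (toy code for illustration, not Beam's `RowJson` implementation):

```python
import json

def row_from_json(raw, field_names):
    """Sketch of 'implicit null' deserialization: a key absent from the
    JSON yields None, the same as an explicit null."""
    data = json.loads(raw)
    return {name: data.get(name) for name in field_names}

explicit = row_from_json('{"foo": "bar", "baz": null}', ["foo", "baz"])
implicit = row_from_json('{"foo": "bar"}', ["foo", "baz"])
assert explicit == implicit == {"foo": "bar", "baz": None}
```

Under the old default, the second input would have been rejected instead of yielding a null field.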
* …`GroupIntoBatches` experimental transform in Python to actually group batches by key. This changes the output type for this transform (BEAM-6696).
* The `--workerCacheMB` flag is supported in Dataflow streaming pipelines (BEAM-9964).
* `--direct_num_workers=0` is supported for the FnApi runner. It will set the number of threads/subprocesses to the number of cores of the machine executing the pipeline (BEAM-9443).
* …`--job_endpoint` to be set when using `--runner=PortableRunner` (BEAM-9860). Users seeking the old default behavior should set `--runner=FlinkRunner` instead.
* …`apache_beam.io.gcp.datastore.v1` has been removed, as the client it uses is out of date and does not support Python 3 (BEAM-9529). Please migrate your code to use `apache_beam.io.gcp.datastore.v1new`. See the updated `datastore_wordcount` for example usage.
* The Python SDK will now use Python 3 type annotations as pipeline type hints. (#10717) If you suspect that this feature is causing your pipeline to fail, calling `apache_beam.typehints.disable_type_annotations()` before pipeline creation will disable it completely, and decorating specific functions (such as `process()`) with `@apache_beam.typehints.no_annotations` will disable it for that function. More details will be in Ensuring Python Type Safety and an upcoming blog post.
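These are the standard Python 3 annotations that the feature reads; the snippet below shows a function annotated the way a `DoFn.process` method might be, and how such annotations are introspected (plain Python only, no Beam import):

```python
import typing

def process(element: str) -> typing.Iterable[int]:
    # Annotations like these can now serve as pipeline type hints.
    yield len(element)

hints = typing.get_type_hints(process)
assert hints["element"] is str
assert hints["return"] == typing.Iterable[int]
```

Functions decorated with `@apache_beam.typehints.no_annotations` would simply be skipped by this introspection.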
* Java SDK: Introducing the concept of options in Beam Schemas. These options add extra context to fields and schemas. This replaces the current Beam metadata that is present in a FieldType only; options are available in fields and row schemas. Schema options are fully typed and can contain complex rows. Remark: Schema aware is still experimental. (BEAM-9035)
* Java SDK: The protobuf extension is fully schema aware and also includes protobuf option conversion to Beam schema options. Remark: Schema aware is still experimental. (BEAM-9044)
* Added ability to write to BigQuery via Avro file loads (Python) (BEAM-8841). By default, file loads will be done using JSON, but it is possible to specify the `temp_file_format` parameter to perform file exports with AVRO. AVRO-based file loads work by exporting Python types into Avro types, so to switch to Avro-based loads, you will need to change your data types from JSON-compatible types (string-type dates and timestamps, long numeric values as strings) into Python native types that are written to Avro (Python's `date` and `datetime` types, `decimal`, etc). For more information see https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#avro_conversions.
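The type switch described above looks roughly like this (hypothetical field names, shown only to contrast the two representations):

```python
import datetime
import decimal

# JSON-based loads: dates/timestamps and large numerics as strings
json_row = {"day": "2020-05-01", "amount": "3.14"}

# Avro-based loads: native Python types instead of their string forms
avro_row = {
    "day": datetime.date(2020, 5, 1),    # native date, not a string
    "amount": decimal.Decimal("3.14"),   # native decimal, not a string
}

# The native values correspond to the same textual representations
assert avro_row["day"].isoformat() == json_row["day"]
assert str(avro_row["amount"]) == json_row["amount"]
```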
* Added integration of Java SDK with Google Cloud AI VideoIntelligence service (BEAM-9147).
* Added integration of Java SDK with Google Cloud AI natural language processing API (BEAM-9634).
* `docker-pull-licenses` tag was introduced. Licenses/notices of third-party dependencies will be added to the docker images when `docker-pull-licenses` is set. The files are added to `/opt/apache/beam/third_party_licenses/`. By default, no licenses/notices are added to the docker images. (BEAM-9136)
* …`--region` option to be set, unless a default value is set in the environment (BEAM-9199). See here for more details.
* …2.23.0. (BEAM-9704)
* The `--zone` option in the Dataflow runner is now deprecated. Please use `--worker_zone` instead. (BEAM-9716)
* `SpannerConfig.connectToSpanner` has been moved to `SpannerAccessor.create`. (BEAM-9310)
* …`force_generated_pcollection_output_ids` experiment.