* …`--experiments=use_legacy_bq_sink`.
* …`apache_beam.io.external.jdbc` (BEAM-10135, BEAM-10136).
* A new transform to read from BigQuery has been added: `apache_beam.io.gcp.bigquery.ReadFromBigQuery`. This transform is experimental. It reads data from BigQuery by exporting data to Avro files and reading those files. It also supports reading data by exporting to JSON files. This has small differences in behavior for Time- and Date-related fields. See the Pydoc for more information.
* `RowJson.RowJsonDeserializer`, `JsonToRow`, and `PubsubJsonTableProvider` now accept "implicit nulls" by default when deserializing JSON (Java) (BEAM-10220). Previously, nulls could only be represented with explicit null values, as in `{"foo": "bar", "baz": null}`, whereas an implicit null like `{"foo": "bar"}` would raise an exception. Now both JSON strings yield the same result by default. This behavior can be overridden with `RowJson.RowJsonDeserializer#withNullBehavior`.
* Fixed a bug in the `GroupIntoBatches` experimental transform in Python so that it actually groups batches by key. This changes the output type of this transform (BEAM-6696).
* The `--workerCacheMB` flag is supported in Dataflow streaming pipelines (BEAM-9964).
* `--direct_num_workers=0` is supported for the FnApi runner. It will set the number of threads/subprocesses to the number of cores of the machine executing the pipeline (BEAM-9443).
* The Python SDK now requires `--job_endpoint` to be set when using `--runner=PortableRunner` (BEAM-9860). Users seeking the old default behavior should set `--runner=FlinkRunner` instead.
* `apache_beam.io.gcp.datastore.v1` has been removed, as the client it uses is out of date and does not support Python 3 (BEAM-9529). Please migrate your code to use `apache_beam.io.gcp.datastore.v1new`. See the updated `datastore_wordcount` example for usage.
* The Python SDK will now use Python 3 type annotations as pipeline type hints. (#10717)
If you suspect that this feature is causing your pipeline to fail, calling `apache_beam.typehints.disable_type_annotations()` before pipeline creation will disable it completely, and decorating specific functions (such as `process()`) with `@apache_beam.typehints.no_annotations` will disable it for that function.
More details will be in Ensuring Python Type Safety and an upcoming blog post.
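As a rough illustration of what "annotations as type hints" means, here is a plain-Python sketch (it does not use Beam itself, and `split_words` is an invented example function): the SDK can now derive hints from standard Python 3 annotations, recoverable with ordinary `typing` machinery.

```python
from typing import List, get_type_hints

# A hypothetical element-wise function, annotated the way a DoFn's
# process() method might be. Beam 2.21+ can read such annotations as
# pipeline type hints instead of requiring explicit typehint decorators.
def split_words(line: str) -> List[str]:
    return line.split()

# Standard typing machinery recovers the annotations:
hints = get_type_hints(split_words)
print(hints["line"])    # <class 'str'>
print(hints["return"])  # typing.List[str]
```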
* Java SDK: Introducing the concept of options in Beam Schemas. These options add extra context to fields and schemas. This replaces the current Beam metadata, which is present only in a `FieldType`; options are available on both fields and row schemas. Schema options are fully typed and can contain complex rows. Remark: schema awareness is still experimental. (BEAM-9035)
* Java SDK: The protobuf extension is fully schema-aware and also includes conversion of protobuf options to Beam schema options. Remark: schema awareness is still experimental. (BEAM-9044)
* Added the ability to write to BigQuery via Avro file loads (Python) (BEAM-8841).
By default, file loads will be done using JSON, but it is possible to specify the `temp_file_format` parameter to perform file loads with Avro. Avro-based file loads work by exporting Python types into Avro types, so to switch to Avro-based loads you will need to change your data types from JSON-compatible types (string-typed dates and timestamps, long numeric values as strings) to the Python-native types that are written to Avro (Python's `date` and `datetime` types, `decimal`, etc.). For more information see https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#avro_conversions.
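To make the type change concrete, here is a minimal sketch (the field names and values are invented for illustration) of the same row shaped for the default JSON-based loads versus Avro-based loads:

```python
import datetime
import decimal

# Row shaped for the default JSON-based file loads: dates/timestamps as
# strings, a long numeric value as a string.
json_row = {"day": "2020-06-01", "ts": "2020-06-01T12:30:00", "amount": "19.99"}

# The same row shaped for Avro-based loads (temp_file_format="AVRO"):
# Python-native date/datetime/Decimal values instead of strings.
avro_row = {
    "day": datetime.date.fromisoformat(json_row["day"]),
    "ts": datetime.datetime.fromisoformat(json_row["ts"]),
    "amount": decimal.Decimal(json_row["amount"]),
}
print(avro_row["day"], type(avro_row["day"]).__name__)  # 2020-06-01 date
```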
* Added integration of the Java SDK with the Google Cloud AI VideoIntelligence service (BEAM-9147).
* Added integration of the Java SDK with the Google Cloud AI natural language processing API (BEAM-9634).
* The `docker-pull-licenses` tag was introduced. Licenses/notices of third-party dependencies will be added to the Docker images when `docker-pull-licenses` is set. The files are added to `/opt/apache/beam/third_party_licenses/`. By default, no licenses/notices are added to the Docker images. (BEAM-9136)
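Assuming the Gradle property spelling matches the tag name and that the container task path is the usual one (both assumptions, not confirmed by this note), enabling it at image-build time would look something like:

```shell
# Hypothetical invocation: build the Java SDK container image with
# third-party licenses/notices baked in under
# /opt/apache/beam/third_party_licenses/.
# The task path and -P property name are assumptions based on the tag name.
./gradlew :sdks:java:container:docker -Pdocker-pull-licenses
```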
* The Dataflow runner now requires the `--region` option to be set, unless a default value is set in the environment (BEAM-9199). See here for more details.
* …2.23.0. (BEAM-9704)
* The `--zone` option in the Dataflow runner is now deprecated. Please use `--worker_zone` instead. (BEAM-9716)
* `SpannerConfig.connectToSpanner` has been moved to `SpannerAccessor.create`. (BEAM-9310)
* …the `force_generated_pcollection_output_ids` experiment.